
getarxivdaily's Introduction

Hello There 👋 A-suozhang Here

I am Tianchen Zhao, currently a Ph.D. student at NICS-EFC, EE Dept., Tsinghua University. My main research focus is efficient deep learning algorithms and software-hardware co-design. My recent research interest is efficient AIGC (diffusion-based generation methods).

📤 Feel free to contact me at [email protected] for discussion and cooperation. Our lab is open to visiting students; please refer here

I've done research about these compression techniques:

  • Network Pruning: structural pruning for CNNs
  • Quantization: binary networks, low-bit training
  • Neural Architecture Search: participated in developing aw_nas, a modularized NAS framework
  • Adaptive Inference: adaptive inference for 3D voxel-based CNNs

and these applications:

  • Autonomous Driving (3D Transformer & efficient lidar-based perception, intern at Novauto Inc.)
  • AIGC (efficient Diffusion methods)



getarxivdaily's Issues

New submissions for Mon, 20 Mar 23

Keyword: pruning

Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution

  • Authors: Jiamian Wang, Huan Wang, Yulun Zhang, Yun Fu, Zhiqiang Tao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09650
  • Pdf link: https://arxiv.org/pdf/2303.09650
  • Abstract
    The field of image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures. However, prevailing SR models suffer from prohibitive memory footprint and intensive computations, which limits further deployment on computationally constrained platforms. In this work, we investigate the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead. Two main challenges remain in applying pruning methods for SR. First, the widely-used filter pruning technique reflects limited granularity and restricted adaptability to diverse network structures. Second, existing pruning methods generally operate upon a pre-trained network for the sparse structure determination, failing to get rid of dense model training in the traditional SR paradigm. To address these challenges, we adopt unstructured pruning with sparse models directly trained from scratch. Specifically, we propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly. We observe that the proposed ISS-P could dynamically learn sparse structures adapting to the optimization process and preserve the sparse model's trainability by yielding a more regularized gradient throughput. Experiments on benchmark datasets demonstrate the effectiveness of the proposed ISS-P compared with state-of-the-art methods over diverse network architectures.
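
As a concrete illustration of the soft-shrinkage idea (decaying unimportant weights by an amount proportional to their magnitude instead of hard-zeroing them), here is a minimal PyTorch sketch; the threshold rule, shrinkage factor, and schedule are illustrative assumptions, not the paper's exact ISS-P procedure.

```python
import torch

@torch.no_grad()
def soft_shrink_(weight: torch.Tensor, sparsity: float, shrink: float = 0.1) -> None:
    """Shrink the smallest-magnitude weights in place rather than zeroing them.

    sparsity: fraction of weights treated as unimportant (assumed given).
    shrink:   fraction of each unimportant weight's magnitude removed per call.
    """
    k = int(sparsity * weight.numel())
    if k == 0:
        return
    threshold = weight.abs().flatten().kthvalue(k).values
    unimportant = weight.abs() <= threshold
    # Proportional decay keeps pruned weights trainable, unlike hard masking.
    weight[unimportant] *= (1.0 - shrink)
```

Called once per training iteration, this lets the sparse structure keep evolving with the optimization, which is the dynamic behavior the abstract highlights.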

Dynamic Structure Pruning for Compressing CNNs

  • Authors: Jun-Hyung Park, Yeachan Kim, Junho Kim, Joon-Young Choi, SangKeun Lee
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.09736
  • Pdf link: https://arxiv.org/pdf/2303.09736
  • Abstract
    Structure pruning is an effective method to compress and accelerate neural networks. While filter and channel pruning are preferable to other structure pruning methods in terms of realistic acceleration and hardware compatibility, pruning methods with a finer granularity, such as intra-channel pruning, are expected to be capable of yielding more compact and computationally efficient networks. Typical intra-channel pruning methods utilize a static and hand-crafted pruning granularity due to a large search space, which leaves room for improvement in their pruning performance. In this work, we introduce a novel structure pruning method, termed as dynamic structure pruning, to identify optimal pruning granularities for intra-channel pruning. In contrast to existing intra-channel pruning methods, the proposed method automatically optimizes dynamic pruning granularities in each layer while training deep neural networks. To achieve this, we propose a differentiable group learning method designed to efficiently learn a pruning granularity based on gradient-based learning of filter groups. The experimental results show that dynamic structure pruning achieves state-of-the-art pruning performance and better realistic acceleration on a GPU compared with channel pruning. In particular, it reduces the FLOPs of ResNet50 by 71.85% without accuracy degradation on the ImageNet dataset. Our code is available at https://github.com/irishev/DSP.

Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration

  • Authors: Zheng Qin, Hao Yu, Changjian Wang, Yuxing Peng, Kai Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09950
  • Pdf link: https://arxiv.org/pdf/2303.09950
  • Abstract
    We study the problem of outlier correspondence pruning for non-rigid point cloud registration. In rigid registration, spatial consistency has been a commonly used criterion to discriminate outliers from inliers. It measures the compatibility of two correspondences by the discrepancy between the respective distances in two point clouds. However, spatial consistency no longer holds in non-rigid cases and outlier rejection for non-rigid registration has not been well studied. In this work, we propose Graph-based Spatial Consistency Network (GraphSCNet) to filter outliers for non-rigid registration. Our method is based on the fact that non-rigid deformations are usually locally rigid, or local shape preserving. We first design a local spatial consistency measure over the deformation graph of the point cloud, which evaluates the spatial compatibility only between the correspondences in the vicinity of a graph node. An attention-based non-rigid correspondence embedding module is then devised to learn a robust representation of non-rigid correspondences from local spatial consistency. Despite its simplicity, GraphSCNet effectively improves the quality of the putative correspondences and attains state-of-the-art performance on three challenging benchmarks. Our code and models are available at https://github.com/qinzheng93/GraphSCNet.
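
The rigid-case spatial consistency that the abstract generalizes compares, for two correspondences, the distance between their points in the source cloud against the distance in the target cloud. A minimal NumPy sketch of that pairwise measure follows; the Gaussian kernel width is an illustrative assumption, and GraphSCNet additionally evaluates it only among correspondences in the vicinity of the same deformation-graph node.

```python
import numpy as np

def spatial_consistency(p: np.ndarray, q: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Pairwise compatibility of putative correspondences p[i] <-> q[i].

    p, q: (N, 3) matched points in the source and target clouds.
    Returns an (N, N) matrix: values near 1 mean two correspondences preserve
    inter-point distances (consistent); values near 0 mean they conflict.
    """
    dp = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)  # source distances
    dq = np.linalg.norm(q[:, None, :] - q[None, :, :], axis=-1)  # target distances
    return np.exp(-((dp - dq) ** 2) / (2.0 * sigma ** 2))
```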

Keyword: neural\ architecture\ search

There is no result

Keyword: 3d object detection

GOOD: General Optimization-based Fusion for 3D Object Detection via LiDAR-Camera Object Candidates

  • Authors: Bingqi Shen, Shuwei Dai, Yuyin Chen, Rong Xiong, Yue Wang, Yanmei Jiao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.09800
  • Pdf link: https://arxiv.org/pdf/2303.09800
  • Abstract
    3D object detection serves as the core basis of the perception tasks in autonomous driving. Recent years have seen rapid progress in multi-modal fusion strategies for more robust and accurate 3D object detection. However, current research on robust fusion is all learning-based, demanding large amounts of training data and being inconvenient to implement in new scenes. In this paper, we propose GOOD, a general optimization-based fusion framework that can achieve satisfying detection without training additional models and is available for any combination of 2D and 3D detectors to improve the accuracy and robustness of 3D detection. First, we apply the mutual-sided nearest-neighbor probability model to achieve the 3D-2D data association. Then we design an optimization pipeline that can optimize different kinds of instances separately based on the matching result. Apart from this, the 3D MOT method is also introduced to enhance the performance aided by previous frames. To the best of our knowledge, this is the first optimization-based late fusion framework for multi-modal 3D object detection, which can serve as a baseline for subsequent research. Experiments on both the nuScenes and KITTI datasets show that GOOD outperforms PointPillars by 9.1% in mAP and achieves competitive results with the learning-based late fusion CLOCs.
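
The 3D-2D association step rests on mutual nearest-neighbor matching; a minimal sketch of that filtering rule is below. The cost definition (e.g., one minus the IoU between a projected 3D box and a 2D box) is our illustrative stand-in, and the paper's probability model is more involved than this.

```python
import numpy as np

def mutual_nearest_pairs(cost: np.ndarray) -> list[tuple[int, int]]:
    """Keep only pairs (i, j) where i is j's best match and j is i's best match.

    cost: (N3d, N2d) association costs, e.g. 1 - IoU between each projected
    3D detection and each 2D detection (the cost choice is illustrative).
    """
    best_j = cost.argmin(axis=1)  # best 2D candidate for each 3D detection
    best_i = cost.argmin(axis=0)  # best 3D candidate for each 2D detection
    return [(i, int(j)) for i, j in enumerate(best_j) if best_i[j] == i]
```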

A Simple Attempt for 3D Occupancy Estimation in Autonomous Driving

  • Authors: Wanshui Gan, Ningkai Mo, Hongbin Xu, Naoto Yokoya
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.10076
  • Pdf link: https://arxiv.org/pdf/2303.10076
  • Abstract
    The task of estimating 3D occupancy from surrounding-view images is an exciting development in the field of autonomous driving, following the success of Bird's Eye View (BEV) perception. This task provides crucial 3D attributes of the driving environment, enhancing the overall understanding and perception of the surrounding space. However, there is still a lack of a baseline to define the task, such as network design, optimization, and evaluation. In this work, we present a simple attempt for 3D occupancy estimation, which is a CNN-based framework designed to reveal several key factors for 3D occupancy estimation. In addition, we explore the relationship between 3D occupancy estimation and other related tasks, such as monocular depth estimation, stereo matching, and BEV perception (3D object detection and map segmentation), which could advance the study on 3D occupancy estimation. For evaluation, we propose a simple sampling strategy to define the metric for occupancy evaluation, which is flexible for current public datasets. Moreover, we establish a new benchmark in terms of the depth estimation metric, where we compare our proposed method with monocular depth estimation methods on the DDAD and nuScenes datasets. The relevant code will be available at https://github.com/GANWANSHUI/SimpleOccupancy

Keyword: voxel

A Dynamic Multi-Scale Voxel Flow Network for Video Prediction

  • Authors: Xiaotao Hu, Zhewei Huang, Ailin Huang, Jun Xu, Shuchang Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09875
  • Pdf link: https://arxiv.org/pdf/2303.09875
  • Abstract
    The performance of video prediction has been greatly boosted by advanced deep neural networks. However, most of the current methods suffer from large model sizes and require extra inputs, e.g., semantic/depth maps, for promising performance. For efficiency consideration, in this paper, we propose a Dynamic Multi-scale Voxel Flow Network (DMVFN) to achieve better video prediction performance at lower computational costs with only RGB images, than previous methods. The core of our DMVFN is a differentiable routing module that can effectively perceive the motion scales of video frames. Once trained, our DMVFN selects adaptive sub-networks for different inputs at the inference stage. Experiments on several benchmarks demonstrate that our DMVFN is an order of magnitude faster than Deep Voxel Flow and surpasses the state-of-the-art iterative-based OPT on generated image quality. Our code and demo are available at https://huxiaotaostasy.github.io/DMVFN/.

Semantic Scene Completion with Cleaner Self

  • Authors: Fengyun Wang, Dong Zhang, Hanwang Zhang, Jinhui Tang, Qianru Sun
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09977
  • Pdf link: https://arxiv.org/pdf/2303.09977
  • Abstract
    Semantic Scene Completion (SSC) transforms an image of single-view depth and/or RGB 2D pixels into 3D voxels, each of whose semantic labels are predicted. SSC is a well-known ill-posed problem as the prediction model has to "imagine" what is behind the visible surface, which is usually represented by Truncated Signed Distance Function (TSDF). Due to the sensory imperfection of the depth camera, most existing methods based on the noisy TSDF estimated from depth values suffer from 1) incomplete volumetric predictions and 2) confused semantic labels. To this end, we use the ground-truth 3D voxels to generate a perfect visible surface, called TSDF-CAD, and then train a "cleaner" SSC model. As the model is noise-free, it is expected to focus more on the "imagination" of unseen voxels. Then, we propose to distill the intermediate "cleaner" knowledge into another model with noisy TSDF input. In particular, we use the 3D occupancy feature and the semantic relations of the "cleaner self" to supervise the counterparts of the "noisy self" to respectively address the above two incorrect predictions. Experimental results validate that our method improves the noisy counterparts with 3.1% IoU and 2.2% mIoU for measuring scene completion and SSC, and also achieves new state-of-the-art accuracy on the popular NYU dataset.

Gyroid-like metamaterials: Topology optimization and Deep Learning

  • Authors: Asha Viswanath, Diab W Abueidda, Mohamad Modrek, Kamran A Khan, Seid Koric, Rashid K. Abu Al-Rub
  • Subjects: Computational Engineering, Finance, and Science (cs.CE)
  • Arxiv link: https://arxiv.org/abs/2303.10007
  • Pdf link: https://arxiv.org/pdf/2303.10007
  • Abstract
    Triply periodic minimal surface (TPMS) metamaterials characterized by mathematically-controlled topologies exhibit better mechanical properties compared to uniform structures. The unit cell topology of such metamaterials can be further optimized to improve a desired mechanical property for a specific application. However, such inverse design involves multiple costly 3D finite element analyses in topology optimization and hence has not been attempted. Data-driven models have recently gained popularity as surrogate models in the geometrical design of metamaterials. Gyroid-like unit cells are designed using a novel voxel algorithm, a homogenization-based topology optimization, and a Heaviside filter to attain optimized densities of 0-1 configuration. Few optimization data are used as input-output for supervised learning of the topology optimization process from a 3D CNN model. These models could then be used to instantaneously predict the optimized unit cell geometry for any topology parameters, thus alleviating the need to run any topology optimization for future design. The high accuracy of the model was demonstrated by a low mean square error metric and a high dice coefficient metric. This accelerated design of 3D metamaterials opens the possibility of designing any computationally costly problems involving complex geometry of metamaterials with multi-objective properties or multi-scale applications.

Keyword: lidar

Exorcising "Wraith": Protecting LiDAR-based Object Detector in Automated Driving System from Appearing Attacks

  • Authors: Qifan Xiao, Xudong Pan, Yifan Lu, Mi Zhang, Jiarun Dai, Min Yang
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.09731
  • Pdf link: https://arxiv.org/pdf/2303.09731
  • Abstract
    Automated driving systems rely on 3D object detectors to recognize possible obstacles from LiDAR point clouds. However, recent works show that an adversary can forge non-existent cars in the prediction results with a few fake points (i.e., appearing attack). By removing statistical outliers, existing defenses are however designed for specific attacks or biased by predefined heuristic rules. Towards more comprehensive mitigation, we first systematically inspect the mechanism of recent appearing attacks: Their common weaknesses are observed in crafting fake obstacles which (i) have obvious differences in the local parts compared with real obstacles and (ii) violate the physical relation between depth and point density. In this paper, we propose a novel plug-and-play defensive module which works alongside a trained LiDAR-based object detector to eliminate forged obstacles where a major proportion of local parts have low objectness, i.e., to what degree it belongs to a real object. At the core of our module is a local objectness predictor, which explicitly incorporates the depth information to model the relation between depth and point density, and predicts each local part of an obstacle with an objectness score. Extensive experiments show that our proposed defense eliminates at least 70% of cars forged by three known appearing attacks in most cases, while the best previous defense eliminates less than 30% of forged cars. Meanwhile, under the same circumstance, our defense incurs less overhead for AP/precision on cars compared with existing defenses. Furthermore, we validate the effectiveness of our proposed defense on simulation-based closed-loop control driving tests in the open-source system of Baidu's Apollo.

Identifying Occluded Agents in Dynamic Games with Noise-Corrupted Observations

  • Authors: Tianyu Qiu, David Fridovich-Keil
  • Subjects: Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2303.09744
  • Pdf link: https://arxiv.org/pdf/2303.09744
  • Abstract
    To provide safe and efficient services, robots must rely on observations from sensors (lidar, camera, etc.) to have a clear knowledge of the environment. In multi-agent scenarios, robots must further reason about the intrinsic motivation underlying the behavior of other agents in order to make inferences about their future behavior. Occlusions, which often occur in robot operating scenarios, make the decision-making of robots even more challenging. In scenarios without occlusions, dynamic game theory provides a solid theoretical framework for predicting the behavior of agents with different objectives interacting with each other over time. Prior work proposed an inverse dynamic game method to recover the game model that best explains observed behavior. However, an apparent shortcoming is that it does not account for agents that may be occluded. Neglecting these agents may result in risky navigation decisions. To address this problem, we propose a novel inverse dynamic game technique to infer the behavior of occluded, unobserved agents that best explains the observation of visible agents' behavior, and simultaneously to predict the agents' future behavior based on the recovered game model. We demonstrate our method in several simulated scenarios. Results reveal that our method robustly estimates agents' objectives and predicts trajectories for both visible and occluded agents from a short sequence of noise-corrupted trajectory observations of only the visible agents.

LCE-Calib: Automatic LiDAR-Frame/Event Camera Extrinsic Calibration With A Globally Optimal Solution

  • Authors: Jianhao Jiao, Feiyi Chen, Hexiang Wei, Jin Wu, Ming Liu
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09825
  • Pdf link: https://arxiv.org/pdf/2303.09825
  • Abstract
    The combination of LiDARs and cameras enables a mobile robot to perceive environments with multi-modal data, becoming a key factor in achieving robust perception. Traditional frame cameras are sensitive to changing illumination conditions, motivating us to introduce novel event cameras to make LiDAR-camera fusion more complete and robust. However, to jointly exploit these sensors, the challenging extrinsic calibration problem should be addressed. This paper proposes an automatic checkerboard-based approach to calibrate extrinsics between a LiDAR and a frame/event camera, where four contributions are presented. Firstly, we present an automatic feature extraction and checkerboard tracking method from LiDAR's point clouds. Secondly, we reconstruct realistic frame images from event streams, applying traditional corner detectors to event cameras. Thirdly, we propose an initialization-refinement procedure to estimate extrinsics using point-to-plane and point-to-line constraints in a coarse-to-fine manner. Fourthly, we introduce a unified and globally optimal solution to address two optimization problems in calibration. Our approach has been validated with extensive experiments on 19 simulated and real-world datasets and outperforms the state-of-the-art.
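
The point-to-plane and point-to-line constraints mentioned for the refinement stage have standard closed forms; a small NumPy sketch of the residuals minimized over the extrinsics (R, t) is below (variable names and conventions are ours, not the paper's).

```python
import numpy as np

def point_to_plane_residual(R, t, p, q, n):
    """Signed distance from the transformed LiDAR point R @ p + t to the
    plane passing through q with unit normal n."""
    return float(n @ (R @ p + t - q))

def point_to_line_residual(R, t, p, q, d):
    """Distance from R @ p + t to the line through q with unit direction d."""
    v = R @ p + t - q
    return float(np.linalg.norm(v - (v @ d) * d))
```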

Privacy-preserving Pedestrian Tracking using Distributed 3D LiDARs

  • Authors: Masakazu Ohno, Riki Ukyo, Tatsuya Amano, Hamada Rizk, Hirozumi Yamaguchi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2303.09915
  • Pdf link: https://arxiv.org/pdf/2303.09915
  • Abstract
    The growing demand for intelligent environments unleashes an extraordinary cycle of privacy-aware applications that makes individuals' lives more comfortable and safe. Examples of these applications include pedestrian tracking systems in large areas. Despite the ubiquity of camera-based systems, they are not a preferable solution due to the risk of leaking pedestrians' privacy. In this paper, we introduce a novel privacy-preserving system for pedestrian tracking in smart environments using multiple distributed LiDARs of non-overlapping views. The system is designed to leverage LiDAR devices to track pedestrians in partially covered areas due to practical constraints, e.g., occlusion or cost. Therefore, the system uses the point cloud captured by different LiDARs to extract discriminative features that are used to train a metric learning model for pedestrian matching purposes. To boost the system's robustness, we leverage a probabilistic approach to model and adapt the dynamic mobility patterns of individuals and thus connect their sub-trajectories. We deployed the system in a large-scale testbed with 70 colorless LiDARs and conducted three different experiments. The evaluation result at the entrance hall confirms the system's ability to accurately track the pedestrians with a 0.98 F-measure even with zero-covered areas. This result highlights the promise of the proposed system as the next generation of privacy-preserving tracking means in smart environments.

New submissions for Fri, 7 Apr 23

Keyword: efficient

Adopting Two Supervisors for Efficient Use of Large-Scale Remote Deep Neural Networks

  • Authors: Michael Weiss, Paolo Tonella
  • Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.02654
  • Pdf link: https://arxiv.org/pdf/2304.02654
  • Abstract
    Recent decades have seen the rise of large-scale Deep Neural Networks (DNNs) to achieve human-competitive performance in a variety of artificial intelligence tasks. Often consisting of hundreds of millions, if not hundreds of billions, of parameters, these DNNs are too large to be deployed to, or efficiently run on, resource-constrained devices such as mobile phones or IoT microcontrollers. Systems relying on large-scale DNNs thus have to call the corresponding model over the network, leading to substantial costs for hosting and running the large-scale remote model, costs which are often charged on a per-use basis. In this paper, we propose BiSupervised, a novel architecture where, before relying on a large remote DNN, a system attempts to make a prediction on a small-scale local model. A DNN supervisor monitors said prediction process and identifies easy inputs for which the local prediction can be trusted. For these inputs, the remote model does not have to be invoked, thus saving costs, while only marginally impacting the overall system accuracy. Our architecture furthermore foresees a second supervisor to monitor the remote predictions and identify inputs for which not even these can be trusted, allowing the system to raise an exception or run a fallback strategy instead. We evaluate the cost savings, and the ability to detect incorrectly predicted inputs, on four diverse case studies: IMDB movie review sentiment classification, GitHub issue triaging, ImageNet image classification, and SQuADv2 free-text question answering.
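
The two-supervisor control flow described above can be summarized in a few lines; the sketch below uses hypothetical model and supervisor interfaces and thresholds, since the paper's concrete supervisors are task-specific.

```python
def bisupervised_predict(x, local_model, remote_model,
                         supervisor_local, supervisor_remote,
                         t_local=0.9, t_remote=0.5):
    """Control-flow sketch of BiSupervised (all interfaces hypothetical).

    supervisor_local scores trust in the cheap local prediction, so only
    low-trust inputs pay for the remote call; supervisor_remote scores the
    remote prediction and triggers a fallback when even it is untrusted.
    """
    y_local = local_model(x)
    if supervisor_local(x, y_local) >= t_local:
        return y_local                 # easy input: remote cost avoided
    y_remote = remote_model(x)         # hard input: invoke large remote DNN
    if supervisor_remote(x, y_remote) >= t_remote:
        return y_remote
    raise RuntimeError("prediction untrusted; run fallback strategy")
```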

nD-PDPA: nDimensional Probability Density Profile Analysis

  • Authors: Arjang Fahim, Stephanie Irausquin, Homayoun Valafar
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
  • Arxiv link: https://arxiv.org/abs/2304.02682
  • Pdf link: https://arxiv.org/pdf/2304.02682
  • Abstract
    Despite the recent advances in various Structural Genomics Projects, a large gap remains between the number of sequenced and structurally characterized proteins. Some reasons for this discrepancy include technical difficulties, labor, and the cost related to determining a structure by experimental methods such as NMR spectroscopy. Several computational methods have been developed to expand the applicability of NMR spectroscopy by addressing temporal and economical problems more efficiently. While these methods demonstrate successful outcomes to solve more challenging and structurally novel proteins, the cost has not been reduced significantly. Probability Density Profile Analysis (PDPA) has been previously introduced by our lab to directly address the economics of structure determination of routine proteins and the identification of novel structures from a minimal set of unassigned NMR data. 2D-PDPA (in which 2D denotes incorporation of data from two alignment media) has been successful in identifying the structural homolog of an unknown protein within a library of ~1000 decoy structures. In order to further expand the selectivity and sensitivity of PDPA, the incorporation of additional data was necessary. However, the expansion of the original PDPA approach was limited by its computational requirements where the inclusion of additional data would render it computationally intractable. Here we present the most recent developments of PDPA method (nD-PDPA: n Dimensional Probability Density Profile Analysis) that eliminate 2D-PDPA's computational limitations, and allows inclusion of RDC data from multiple vector types in multiple alignment media.

A Certified Radius-Guided Attack Framework to Image Segmentation Models

  • Authors: Wenjie Qu, Youqi Li, Binghui Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02693
  • Pdf link: https://arxiv.org/pdf/2304.02693
  • Abstract
    Image segmentation is an important problem in many safety-critical applications. Recent studies show that modern image segmentation models are vulnerable to adversarial perturbations, while existing attack methods mainly follow the idea of attacking image classification models. We argue that image segmentation and classification have inherent differences, and design an attack framework specially for image segmentation models. Our attack framework is inspired by certified radius, which was originally used by defenders to defend against adversarial perturbations to classification models. We are the first, from the attacker perspective, to leverage the properties of certified radius and propose a certified radius guided attack framework against image segmentation models. Specifically, we first adapt randomized smoothing, the state-of-the-art certification method for classification models, to derive the pixel's certified radius. We then focus more on disrupting pixels with relatively smaller certified radii and design a pixel-wise certified radius guided loss which, when plugged into any existing white-box attack, yields our certified radius-guided white-box attack. Next, we propose the first black-box attack to image segmentation models via bandit. We design a novel gradient estimator, based on bandit feedback, which is query-efficient and provably unbiased and stable. We use this gradient estimator to design a projected bandit gradient descent (PBGD) attack, as well as a certified radius-guided PBGD (CR-PBGD) attack. We prove our PBGD and CR-PBGD attacks can achieve asymptotically optimal attack performance with an optimal rate. We evaluate our certified-radius guided white-box and black-box attacks on multiple modern image segmentation models and datasets. Our results validate the effectiveness of our certified radius-guided attack framework.
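
For context, randomized smoothing certifies, per prediction, a radius of the form $R = \frac{\sigma}{2}(\Phi^{-1}(p_A) - \Phi^{-1}(p_B))$, where $p_A$ and $p_B$ are the top-two class probabilities under Gaussian noise of scale $\sigma$ (Cohen et al.). A Monte-Carlo sketch of the per-pixel version for a segmentation model follows; the sample count, clipping, and interfaces are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def pixel_certified_radius(predict, x, num_classes, sigma=0.25, n=100):
    """Empirical per-pixel certified radius under Gaussian smoothing.

    predict(x) -> (H, W) hard labels from a segmentation model (assumed).
    Returns (H, W) radii sigma/2 * (Phi^-1(pA) - Phi^-1(pB)), with pA, pB
    the top-two empirical class frequencies under noise.
    """
    votes = np.zeros(x.shape[:2] + (num_classes,))
    for _ in range(n):
        labels = predict(x + sigma * np.random.randn(*x.shape))
        votes += np.eye(num_classes)[labels]
    probs = np.clip(votes / n, 1e-4, 1.0 - 1e-4)
    top2 = np.sort(probs, axis=-1)[..., -2:]   # (H, W, 2): [pB, pA]
    return sigma / 2.0 * (norm.ppf(top2[..., 1]) - norm.ppf(top2[..., 0]))
```

The attack described above would then concentrate its loss on pixels with small radii, which are the cheapest to flip.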

Recovering Continuous Scene Dynamics from A Single Blurry Image with Events

  • Authors: Zhangyi Cheng, Xiang Zhang, Lei Yu, Jianzhuang Liu, Wen Yang, Gui-Song Xia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02695
  • Pdf link: https://arxiv.org/pdf/2304.02695
  • Abstract
    This paper aims at demystifying a single motion-blurred image with events and revealing temporally continuous scene dynamics encrypted behind motion blurs. To achieve this end, an Implicit Video Function (IVF) is learned to represent a single motion blurred image with concurrent events, enabling the latent sharp image restoration of arbitrary timestamps in the range of imaging exposures. Specifically, a dual attention transformer is proposed to efficiently leverage merits from both modalities, i.e., the high temporal resolution of event features and the smoothness of image features, alleviating temporal ambiguities while suppressing the event noise. The proposed network is trained only with the supervision of ground-truth images of limited referenced timestamps. Motion- and texture-guided supervisions are employed simultaneously to enhance restorations of the non-referenced timestamps and improve the overall sharpness. Experiments on synthetic, semi-synthetic, and real-world datasets demonstrate that our proposed method outperforms state-of-the-art methods by a large margin in terms of both objective PSNR and SSIM measurements and subjective evaluations.

Agnostic proper learning of monotone functions: beyond the black-box correction barrier

  • Authors: Jane Lange, Arsen Vasilyan
  • Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02700
  • Pdf link: https://arxiv.org/pdf/2304.02700
  • Abstract
    We give the first agnostic, efficient, proper learning algorithm for monotone Boolean functions. Given $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$ uniformly random examples of an unknown function $f:\{\pm 1\}^n \rightarrow \{\pm 1\}$, our algorithm outputs a hypothesis $g:\{\pm 1\}^n \rightarrow \{\pm 1\}$ that is monotone and $(\mathrm{opt} + \varepsilon)$-close to $f$, where $\mathrm{opt}$ is the distance from $f$ to the closest monotone function. The running time of the algorithm (and consequently the size and evaluation time of the hypothesis) is also $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$, nearly matching the lower bound of Blais et al (RANDOM '15). We also give an algorithm for estimating up to additive error $\varepsilon$ the distance of an unknown function $f$ to monotone using a run-time of $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$. Previously, for both of these problems, sample-efficient algorithms were known, but these algorithms were not run-time efficient. Our work thus closes this gap in our knowledge between the run-time and sample complexity. This work builds upon the improper learning algorithm of Bshouty and Tamon (JACM '96) and the proper semiagnostic learning algorithm of Lange, Rubinfeld, and Vasilyan (FOCS '22), which obtains a non-monotone Boolean-valued hypothesis, then "corrects" it to monotone using query-efficient local computation algorithms on graphs. This black-box correction approach can achieve no error better than $2\mathrm{opt} + \varepsilon$ information-theoretically; we bypass this barrier by a) augmenting the improper learner with a convex optimization step, and b) learning and correcting a real-valued function before rounding its values to Boolean. Our real-valued correction algorithm solves the "poset sorting" problem of [LRV22] for functions over general posets with non-Boolean labels.

A Unified Taxonomy for Automated Vehicles: Individual, Cooperative, Collaborative, On-Road, and Off-Road

  • Authors: Fredrik Warg, Anders Thorsén, Victoria Vu, Carl Bergenhem
  • Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02705
  • Pdf link: https://arxiv.org/pdf/2304.02705
  • Abstract
    Various types of vehicle automation are increasingly used in a variety of environments, including road vehicles such as cars or automated shuttles, confined areas such as mines or harbours, and agriculture and forestry. In many use cases, the benefits are greater if several automated vehicles (AVs) cooperate to help each other reach their goals more efficiently, or collaborate to complete a common task. Taxonomies and definitions create a common framework that helps researchers and practitioners advance the field. However, most existing work focuses on road vehicles. In this paper, we review and extend taxonomies and definitions to encompass individually acting as well as cooperative and collaborative AVs for both on-road and off-road use cases. In particular, we introduce classes of collaborative vehicles not defined in existing literature, and define levels of automation suitable for vehicles where automation applies to functions beyond the driving task.

Efficient OCR for Building a Diverse Digital History

  • Authors: Jacob Carlson, Tom Bryan, Melissa Dell
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Digital Libraries (cs.DL); General Economics (econ.GN)
  • Arxiv link: https://arxiv.org/abs/2304.02737
  • Pdf link: https://arxiv.org/pdf/2304.02737
  • Abstract
    Thousands of users consult digital archives daily, but the information they can access is unrepresentative of the diversity of documentary history. The sequence-to-sequence architecture typically used for optical character recognition (OCR) - which jointly learns a vision and language model - is poorly extensible to low-resource document collections, as learning a language-vision model requires extensive labeled sequences and compute. This study models OCR as a character-level image retrieval problem, using a contrastively trained vision encoder. Because the model only learns characters' visual features, it is more sample-efficient and extensible than existing architectures, enabling accurate OCR in settings where existing solutions fail. Crucially, the model opens new avenues for community engagement in making digital history more representative of documentary history.
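
Framing OCR as retrieval reduces recognition to a nearest-neighbor lookup in the learned embedding space; a minimal NumPy sketch is below, assuming the contrastively trained encoder and a reference set of glyph embeddings are given.

```python
import numpy as np

def recognize_chars(crop_embs, ref_embs, ref_chars):
    """OCR as retrieval: map each character-crop embedding to the character
    of its nearest reference glyph by cosine similarity.

    crop_embs: (N, d) embeddings of character crops from a page.
    ref_embs:  (M, d) embeddings of rendered reference glyphs.
    ref_chars: length-M sequence of the corresponding characters.
    """
    a = crop_embs / np.linalg.norm(crop_embs, axis=1, keepdims=True)
    b = ref_embs / np.linalg.norm(ref_embs, axis=1, keepdims=True)
    nearest = (a @ b.T).argmax(axis=1)   # index of best reference per crop
    return "".join(ref_chars[i] for i in nearest)
```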

The History and Development of Natural Language Processing (NLP) Techniques for Indonesian: A Review of the History, Technological Developments, and Applications of NLP in the Indonesian Language

  • Authors: Mukhlis Amien
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.02746
  • Pdf link: https://arxiv.org/pdf/2304.02746
  • Abstract
    This study provides an overview of the history of the development of Natural Language Processing (NLP) in the context of the Indonesian language, with a focus on the basic technologies, methods, and practical applications that have been developed. This review covers developments in basic NLP technologies such as stemming, part-of-speech tagging, and related methods; practical applications in cross-language information retrieval systems, information extraction, and sentiment analysis; and methods and techniques used in Indonesian language NLP research, such as machine learning, statistics-based machine translation, and conflict-based approaches. This study also explores the application of NLP in Indonesian language industry and research and identifies challenges and opportunities in Indonesian language NLP research and development. Recommendations for future Indonesian language NLP research and development include developing more efficient methods and technologies, expanding NLP applications, increasing sustainability, further research into the potential of NLP, and promoting interdisciplinary collaboration. It is hoped that this review will help researchers, practitioners, and the government to understand the development of Indonesian language NLP and identify opportunities for further research and development.

Robust, privacy-preserving, transparent, and auditable on-device blocklisting

  • Authors: Kurt Thomas, Sarah Meiklejohn, Michael A. Specter, Xiang Wang, Xavier Llorà, Stephan Somogyi, David Kleidermacher
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.02810
  • Pdf link: https://arxiv.org/pdf/2304.02810
  • Abstract
    With the accelerated adoption of end-to-end encryption, there is an opportunity to re-architect security and anti-abuse primitives in a manner that preserves new privacy expectations. In this paper, we consider two novel protocols for on-device blocklisting that allow a client to determine whether an object (e.g., URL, document, image, etc.) is harmful based on threat information possessed by a so-called remote enforcer in a way that is both privacy-preserving and trustworthy. Our protocols leverage a unique combination of private set intersection to promote privacy, cryptographic hashes to ensure resilience to false positives, cryptographic signatures to improve transparency, and Merkle inclusion proofs to ensure consistency and auditability. We benchmark our protocols -- one that is time-efficient, and the other space-efficient -- to demonstrate their practical use for applications such as email, messaging, storage, and other applications. We also highlight remaining challenges, such as privacy and censorship tensions that exist with logging or reporting. We consider our work to be a critical first step towards enabling complex, multi-stakeholder discussions on how best to provide on-device protections.
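
Of the primitives listed, the Merkle inclusion proof is the piece that gives auditability: a client can check that a blocklist entry it was served belongs to the same published tree everyone else sees. A schematic verifier follows (hash layout and proof encoding are illustrative, not the paper's exact format).

```python
import hashlib

def verify_inclusion(leaf: bytes, proof: list[tuple[bytes, str]], root: bytes) -> bool:
    """Verify a Merkle inclusion proof by hashing upward from the leaf.

    proof: sibling hashes from bottom to top, each tagged 'L' or 'R'
    according to which side the sibling occupies.
    """
    h = hashlib.sha256(leaf).digest()
    for sibling, side in proof:
        pair = sibling + h if side == "L" else h + sibling
        h = hashlib.sha256(pair).digest()
    return h == root
```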

GIF: A General Graph Unlearning Strategy via Influence Function

  • Authors: Jiancan Wu, Yi Yang, Yuchun Qian, Yongduo Sui, Xiang Wang, Xiangnan He
  • Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.02835
  • Pdf link: https://arxiv.org/pdf/2304.02835
  • Abstract
    With the greater emphasis on privacy and security in our society, the problem of graph unlearning -- revoking the influence of specific data on the trained GNN model -- is drawing increasing attention. However, ranging from machine unlearning to recently emerged graph unlearning methods, existing efforts either resort to the retraining paradigm, or perform approximate erasure that fails to consider the inter-dependency between connected neighbors or imposes constraints on GNN structure, and therefore struggle to achieve satisfying performance-complexity trade-offs. In this work, we explore the influence function tailored for graph unlearning, so as to improve unlearning efficacy and efficiency. We first present a unified problem formulation of diverse graph unlearning tasks w.r.t. node, edge, and feature. Then, we recognize the crux of the inability of the traditional influence function for graph unlearning, and devise Graph Influence Function (GIF), a model-agnostic unlearning method that can efficiently and accurately estimate parameter changes in response to an $\epsilon$-mass perturbation in deleted data. The idea is to supplement the objective of the traditional influence function with an additional loss term of the influenced neighbors due to the structural dependency. Further deductions on the closed-form solution of parameter changes provide a better understanding of the unlearning mechanism. We conduct extensive experiments on four representative GNN models and three benchmark datasets to justify the superiority of GIF for diverse graph unlearning tasks in terms of unlearning efficacy, model utility, and unlearning efficiency. Our implementations are available at https://github.com/wujcan/GIF-torch/
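
For reference, the traditional influence function that GIF builds on estimates the effect of removing a training point $z$ without retraining; in the standard notation of Koh and Liang:

```latex
% Classical influence-function estimate of the parameter change from
% removing a training point z (GIF supplements the right-hand side with a
% loss term over structurally dependent neighbors of the deleted data).
\hat{\theta}_{-z} - \hat{\theta} \;\approx\; \frac{1}{n}\, H_{\hat{\theta}}^{-1}\, \nabla_\theta\, \ell(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} \;=\; \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2\, \ell(z_i, \hat{\theta}).
```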

Robustmix: Improving Robustness by Regularizing the Frequency Bias of Deep Nets

  • Authors: Jonas Ngnawe, Marianne ABEMGNIGNI NJIFON, Jonathan Heek, Yann Dauphin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02847
  • Pdf link: https://arxiv.org/pdf/2304.02847
  • Abstract
    Deep networks have achieved impressive results on a range of well-curated benchmark datasets. Surprisingly, their performance remains sensitive to perturbations that have little effect on human performance. In this work, we propose a novel extension of Mixup called Robustmix that regularizes networks to classify based on lower-frequency spatial features. We show that this type of regularization improves robustness on a range of benchmarks such as Imagenet-C and Stylized Imagenet. It adds little computational overhead and, furthermore, does not require a priori knowledge of a large set of image transformations. We find that this approach further complements recent advances in model architecture and data augmentation, attaining a state-of-the-art mCE of 44.8 with an EfficientNet-B8 model and RandAugment, which is a reduction of 16 mCE compared to the baseline.
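
A plausible reading of the low-frequency regularization is band-limited mixing: take the low-frequency content of one image and the high-frequency content of another, and make the label follow the low-frequency source. The sketch below is our illustrative reconstruction; the cutoff, radial mask, and labeling rule are assumptions, not the paper's exact recipe.

```python
import numpy as np

def frequency_mix(x_low, x_high, cutoff=0.1):
    """Combine frequencies below `cutoff` (fraction of the sampling rate)
    from x_low with the remaining frequencies from x_high; the training
    label would follow x_low under this regularization.
    x_low, x_high: (H, W, C) images.
    """
    h, w = x_low.shape[:2]
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    low_pass = (np.sqrt(fy ** 2 + fx ** 2) <= cutoff)[..., None]
    f_low = np.fft.fft2(x_low, axes=(0, 1))
    f_high = np.fft.fft2(x_high, axes=(0, 1))
    return np.fft.ifft2(np.where(low_pass, f_low, f_high), axes=(0, 1)).real
```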

Towards an Effective and Efficient Transformer for Rain-by-snow Weather Removal

  • Authors: Tao Gao, Yuanbo Wen, Kaihao Zhang, Peng Cheng, Ting Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02860
  • Pdf link: https://arxiv.org/pdf/2304.02860
  • Abstract
    Rain-by-snow weather removal is a specialized task in weather-degraded image restoration aiming to eliminate coexisting rain streaks and snow particles. In this paper, we propose RSFormer, an efficient and effective Transformer that addresses this challenge. Initially, we explore the proximity of convolution networks (ConvNets) and vision Transformers (ViTs) in hierarchical architectures and experimentally find that they perform approximately equally at intra-stage feature learning. On this basis, we utilize a Transformer-like convolution block (TCB) that replaces the computationally expensive self-attention while preserving attention characteristics for adapting to input content. We also demonstrate that cross-stage progression is critical for performance improvement, and propose a global-local self-attention sampling mechanism (GLASM) that down-/up-samples features while capturing both global and local dependencies. Finally, we synthesize two novel rain-by-snow datasets, RSCityScape and RS100K, to evaluate our proposed RSFormer. Extensive experiments verify that RSFormer achieves the best trade-off between performance and time consumption compared to other restoration methods. For instance, it outperforms Restormer with a 1.53% reduction in the number of parameters and a 15.6% reduction in inference time. Datasets, source code and pre-trained models are available at https://github.com/chdwyb/RSFormer

VPFusion: Towards Robust Vertical Representation Learning for 3D Object Detection

  • Authors: Yuhao Huang, Sanping Zhou, Junjie Zhang, Jinpeng Dong, Nanning Zheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02867
  • Pdf link: https://arxiv.org/pdf/2304.02867
  • Abstract
    Efficient point cloud representation is a fundamental element of Lidar-based 3D object detection. Recent grid-based detectors usually divide point clouds into voxels or pillars and construct single-stream networks in Bird's Eye View. However, these point cloud encoding paradigms underestimate the point representation in the vertical direction, which causes the loss of semantic or fine-grained information, especially for vertically sensitive objects like pedestrians and cyclists. In this paper, we propose an explicit vertical multi-scale representation learning framework, VPFusion, to combine the complementary information from both voxel and pillar streams. Specifically, VPFusion first builds upon a sparse voxel-pillar-based backbone. The backbone divides point clouds into voxels and pillars, then encodes features with 3D and 2D sparse convolution simultaneously. Next, we introduce the Sparse Fusion Layer (SFL), which establishes a bidirectional pathway for sparse voxel and pillar features to enable the interaction between them. Additionally, we present the Dense Fusion Neck (DFN) to effectively combine the multi-scale dense feature maps from the voxel and pillar branches. Extensive experiments on the large-scale Waymo Open Dataset and nuScenes Dataset demonstrate that VPFusion surpasses the single-stream baselines by a large margin and achieves state-of-the-art performance with real-time inference speed.

Object-centric Inference for Language Conditioned Placement: A Foundation Model based Approach

  • Authors: Zhixuan Xu, Kechun Xu, Yue Wang, Rong Xiong
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02893
  • Pdf link: https://arxiv.org/pdf/2304.02893
  • Abstract
    We focus on the task of language-conditioned object placement, in which a robot should generate placements that satisfy all the spatial relational constraints in language instructions. Previous works based on rule-based language parsing or scene-centric visual representation have restrictions on the form of instructions and reference objects or require large amounts of training data. We propose an object-centric framework that leverages foundation models to ground the reference objects and spatial relations for placement, which is more sample efficient and generalizable. Experiments indicate that our model can achieve a 97.75% success rate of placement with only ~0.26M trainable parameters. Besides, our method generalizes better to both unseen objects and instructions. Moreover, with only 25% training data, we still outperform the top competing approach.

Affect as a proxy for literary mood

  • Authors: Emily Öhman, Riikka Rossi
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.02894
  • Pdf link: https://arxiv.org/pdf/2304.02894
  • Abstract
    We propose to use affect as a proxy for mood in literary texts. In this study, we explore the differences in computationally detecting tone versus detecting mood. Methodologically, we utilize affective word embeddings to look at the affective distribution in different text segments. We also present a simple yet efficient and effective method of enhancing emotion lexicons to take both semantic shift and the domain of the text into account, producing real-world congruent results that closely match both contemporary and modern qualitative analyses.

LSketch: A Label-Enabled Graph Stream Sketch Toward Time-Sensitive Queries

  • Authors: Yiling Zeng, Chunyao Song, Yuhan Li, Tingjian Ge
  • Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.02897
  • Pdf link: https://arxiv.org/pdf/2304.02897
  • Abstract
    Graph streams represent data interactions in real applications. The mining of graph streams plays an important role in network security, social network analysis, and traffic control, among others. However, the sheer volume and high dynamics cause great challenges for efficient storage and subsequent query analysis on them. Current studies apply sketches to summarize graph streams. We propose LSketch that works for heterogeneous graph streams, which effectively preserves the label information carried by the streams in real scenes, thereby enriching the expressive ability of sketches. In addition, as graph streams continue to evolve over time, edges too old may lose their practical significance. Therefore, we introduce the sliding window model into LSketch to eliminate the expired edges automatically. LSketch uses sub-linear storage space and can support structure based queries and time-sensitive queries with high accuracy. We perform extensive experiments over four real datasets, demonstrating the superiority of the proposed method over state-of-the-art methods, in aspects of query accuracy and time efficiency.
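
To make the ingredients concrete, here is a toy label-aware sketch with lazy sliding-window expiry. It is a schematic of the general idea only; the hash layout, sizes, and expiry policy are our assumptions, not LSketch's actual structure.

```python
import time
from collections import defaultdict

class ToyLabeledSketch:
    """Toy label-aware graph-stream sketch with a sliding window.

    Edges are folded into a small w x w counter grid per label, so storage
    is sub-linear in the number of distinct edges (hash collisions trade
    accuracy for space); per-cell timestamps age out expired edges lazily.
    """
    def __init__(self, w=256, window=3600.0):
        self.w, self.window = w, window
        self.counts = defaultdict(lambda: [[0.0] * w for _ in range(w)])
        self.stamps = defaultdict(lambda: [[0.0] * w for _ in range(w)])

    def add(self, src, dst, label, weight=1.0, now=None):
        now = time.time() if now is None else now
        c, s = self.counts[label], self.stamps[label]
        i, j = hash(src) % self.w, hash(dst) % self.w
        if now - s[i][j] > self.window:   # cell expired: drop stale mass
            c[i][j] = 0.0
        c[i][j] += weight
        s[i][j] = now

    def edge_weight(self, src, dst, label, now=None):
        now = time.time() if now is None else now
        i, j = hash(src) % self.w, hash(dst) % self.w
        if now - self.stamps[label][i][j] > self.window:
            return 0.0                    # everything in this cell expired
        return self.counts[label][i][j]   # an overestimate under collisions
```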

InterFormer: Real-time Interactive Image Segmentation

  • Authors: You Huang, Hao Yang, Ke Sun, Shengchuan Zhang, Guannan Jiang, Rongrong Ji, Liujuan Cao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.02942
  • Pdf link: https://arxiv.org/pdf/2304.02942
  • Abstract
    Interactive image segmentation enables annotators to efficiently perform pixel-level annotation for segmentation tasks. However, the existing interactive segmentation pipeline suffers from inefficient computations of interactive models because of the following two issues. First, annotators' later clicks are based on the model's feedback to their former clicks. This serial interaction is unable to utilize the model's parallelism capabilities. Second, the model has to repeatedly process the image, the annotator's current click, and the model's feedback to the annotator's former clicks at each step of interaction, resulting in redundant computations. For efficient computation, we propose a method named InterFormer that follows a new pipeline to address these issues. InterFormer extracts and preprocesses the computationally time-consuming part, i.e., image processing, from the existing process. Specifically, InterFormer employs a large vision transformer (ViT) on high-performance devices to preprocess images in parallel, and then uses a lightweight module called interactive multi-head self attention (I-MSA) for interactive segmentation. Furthermore, the I-MSA module's deployment on low-power devices extends the practical application of interactive segmentation. The I-MSA module utilizes the preprocessed features to efficiently respond to annotator inputs in real time. The experiments on several datasets demonstrate the effectiveness of InterFormer, which outperforms previous interactive segmentation models in terms of computational efficiency and segmentation quality, achieving real-time high-quality interactive segmentation on CPU-only devices.

When approximate design for fast homomorphic computation provides differential privacy guarantees

  • Authors: Arnaud Grivet Sébert, Martin Zuber, Oana Stan, Renaud Sirdey, Cédric Gouy-Pailler
  • Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02959
  • Pdf link: https://arxiv.org/pdf/2304.02959
  • Abstract
    While machine learning has become pervasive in fields as diversified as industry, healthcare, and social networks, privacy concerns regarding the training data have gained critical importance. In settings where several parties wish to collaboratively train a common model without jeopardizing their sensitive data, the need for a private training protocol is particularly stringent and implies protecting the data against both the model's end-users and the actors of the training phase. Differential privacy (DP) and cryptographic primitives are complementary popular countermeasures against privacy attacks. Among these cryptographic primitives, fully homomorphic encryption (FHE) offers ciphertext malleability at the cost of time-consuming operations in the homomorphic domain. In this paper, we design SHIELD, a probabilistic approximation algorithm for the argmax operator which is both fast when homomorphically executed and whose inaccuracy is used as a feature to ensure DP guarantees. Even if SHIELD could have other applications, we here focus on one setting and seamlessly integrate it in the SPEED collaborative training framework from "SPEED: Secure, PrivatE, and Efficient Deep learning" (Grivet Sébert et al., 2021) to improve its computational efficiency. After thoroughly describing the FHE implementation of our algorithm and its DP analysis, we present experimental results. To the best of our knowledge, it is the first work in which relaxing the accuracy of a homomorphic calculation is constructively usable as a degree of freedom to achieve better FHE performances.
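
The principle being exploited, that a deliberately noisy argmax is itself a differentially private mechanism, has a classical non-FHE instance: report-noisy-max with Gumbel noise, which coincides with the exponential mechanism. A sketch follows; SHIELD's actual approximation is designed to be cheap under FHE, which this plain-Python version is not.

```python
import numpy as np

def noisy_argmax(scores, epsilon, sensitivity=1.0, rng=None):
    """Report-noisy-max with Gumbel noise: equivalent to the exponential
    mechanism, hence epsilon-DP when each score changes by at most
    `sensitivity` under a one-record change of the underlying data.
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    noise = rng.gumbel(scale=2.0 * sensitivity / epsilon, size=scores.shape)
    return int(np.argmax(scores + noise))
```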

A Fast and Lightweight Network for Low-Light Image Enhancement

  • Authors: Yu Zhang, Xiaoguang Di, Junde Wu, RAO FU, Yong Li, Yue Wang, Yanwu Xu, Guohui YANG, Chunhui Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.02978
  • Pdf link: https://arxiv.org/pdf/2304.02978
  • Abstract
    Low-light images often suffer from severe noise, low brightness, low contrast, and color deviation. While several low-light image enhancement methods have been proposed, there remains a lack of efficient methods that can simultaneously solve all of these problems. In this paper, we introduce FLW-Net, a Fast and LightWeight Network for low-light image enhancement that significantly improves processing speed and overall effect. To achieve efficient low-light image enhancement, we recognize the challenges of the lack of an absolute reference and the need for a large receptive field to obtain global contrast. Therefore, we propose an efficient global feature information extraction component and design loss functions based on relative information to overcome these challenges. Finally, we conduct comparative experiments to demonstrate the effectiveness of the proposed method, and the results confirm that FLW-Net can significantly reduce the complexity of supervised low-light image enhancement networks while improving processing effect. Code is available at https://github.com/hitzhangyu/FLW-Net

IoT Federated Blockchain Learning at the Edge

  • Authors: James Calo, Benny Lo
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.03006
  • Pdf link: https://arxiv.org/pdf/2304.03006
  • Abstract
    IoT devices are sorely underutilized in the medical field, especially within machine learning for medicine, yet they offer unrivaled benefits. IoT devices are low-cost, energy-efficient, small and intelligent devices. In this paper, we propose a distributed federated learning framework for IoT devices, more specifically for IoMT (Internet of Medical Things), using blockchain to allow for a decentralized scheme improving privacy and efficiency over a centralized system; this allows us to move from the cloud-based architectures, that are prevalent, to the edge. The system is designed for three paradigms: 1) Training neural networks on IoT devices to allow for collaborative training of a shared model whilst decoupling the learning from the dataset to ensure privacy. Training is performed in an online manner simultaneously amongst all participants, allowing for the training of actual data that may not have been present in a dataset collected in the traditional way and dynamically adapt the system whilst it is being trained. 2) Training of an IoMT system in a fully private manner such as to mitigate the issue with confidentiality of medical data and to build robust, and potentially bespoke, models where not much, if any, data exists. 3) Distribution of the actual network training, something federated learning itself does not do, to allow hospitals, for example, to utilize their spare computing resources to train network models.

PointCAT: Cross-Attention Transformer for point cloud

  • Authors: Xincheng Yang, Mingze Jin, Weiji He, Qian Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03012
  • Pdf link: https://arxiv.org/pdf/2304.03012
  • Abstract
    Transformer-based models have significantly advanced natural language processing and computer vision in recent years. However, due to the irregular and disordered structure of point cloud data, transformer-based models for 3D deep learning are still in their infancy compared to other methods. In this paper we present Point Cross-Attention Transformer (PointCAT), a novel end-to-end network architecture using a cross-attention mechanism for point cloud representation. Our approach combines multi-scale features via two separate cross-attention transformer branches. To reduce the computational increase brought by the multi-branch structure, we further introduce an efficient model for shape classification, which only processes a single class token of one branch as a query to calculate the attention map with the other. Extensive experiments demonstrate that our method outperforms or achieves comparable performance to several approaches in shape classification, part segmentation and semantic segmentation tasks.
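
The efficiency trick in the shape-classification model is easy to state: use one branch's class token as the sole attention query over the other branch's tokens, so the attention map costs O(N) rather than O(N^2). A single-head NumPy sketch is below (projection weights are illustrative).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def class_token_cross_attention(cls_a, tokens_b, Wq, Wk, Wv):
    """Branch A's class token (d,) attends over branch B's tokens (N, d).

    With a single query vector the attention map is (N,) instead of (N, N),
    which is the computational saving the abstract describes.
    """
    q = cls_a @ Wq                        # (d,)
    k, v = tokens_b @ Wk, tokens_b @ Wv   # (N, d) each
    attn = softmax(q @ k.T / np.sqrt(len(q)))  # (N,) attention weights
    return attn @ v                       # updated class token, (d,)
```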

Tensor Slicing and Optimization for Multicore NPUs

  • Authors: Rafael Sousa, Marcio Pereira, Yongin Kwon, Taeho Kim, Namsoon Jung, Chang Soo Kim, Michael Frank, Guido Araujo
  • Subjects: Performance (cs.PF); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03013
  • Pdf link: https://arxiv.org/pdf/2304.03013
  • Abstract
    Although code generation for Convolutional Neural Network (CNN) models has been extensively studied, performing efficient data slicing and parallelization for highly-constrained Multicore Neural Processor Units (NPUs) is still a challenging problem. Given the size of convolutions' input/output tensors and the small footprint of NPU on-chip memories, minimizing memory transactions while maximizing parallelism and MAC utilization are central to any effective solution. This paper proposes a TensorFlow XLA/LLVM compiler optimization pass for Multicore NPUs, called Tensor Slicing Optimization (TSO), which: (a) maximizes convolution parallelism and memory usage across NPU cores; and (b) reduces data transfers between host and NPU on-chip memories by using DRAM memory burst time estimates to guide tensor slicing. To evaluate the proposed approach, a set of experiments was performed using the NeuroMorphic Processor (NMP), a multicore NPU containing 32 RISC-V cores extended with novel CNN instructions. Experimental results show that TSO is capable of identifying the best tensor slicing that minimizes execution time for a set of CNN models. Speed-ups of up to 21.7% result when comparing the TSO burst-based technique to a no-burst data slicing approach. To validate the generality of the TSO approach, the algorithm was also ported to the Glow Machine Learning framework. The performance of the models was measured on both the Glow and TensorFlow XLA/LLVM compilers, revealing similar results.

A computation of D(9) using FPGA Supercomputing

  • Authors: Lennart Van Hirtum, Patrick De Causmaecker, Jens Goemaere, Tobias Kenter, Heinrich Riebler, Michael Lass, Christian Plessl
  • Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)
  • Arxiv link: https://arxiv.org/abs/2304.03039
  • Pdf link: https://arxiv.org/pdf/2304.03039
  • Abstract
    This preprint makes the claim of having computed the $9^{th}$ Dedekind number. This was done by building an efficient FPGA accelerator for the core operation of the process, and parallelizing it on the Noctua 2 supercluster at Paderborn University. The resulting value is 286386577668298411128469151667598498812366. This value can be verified in two steps: we have made available the data file containing the 490M results, each of which can be verified separately on a CPU, and the whole file sums to our proposed value.
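
For context, the Dedekind number $D(n)$ counts the monotone Boolean functions on $n$ variables. The brute-force sketch below (plain Python, nothing like the paper's FPGA pipeline) reproduces the known values for tiny $n$; it becomes infeasible long before $n = 9$.

```python
from itertools import product

def dedekind(n: int) -> int:
    """Count monotone Boolean functions on n variables by brute force.
    Feasible only for tiny n; n = 4 already takes a few seconds."""
    points = range(2 ** n)                     # inputs encoded as bitmasks
    # x precedes y in the subset order iff every bit of x is also set in y.
    pairs = [(x, y) for x in points for y in points if x & y == x]
    count = 0
    for f in product((0, 1), repeat=2 ** n):   # every Boolean function
        if all(f[x] <= f[y] for x, y in pairs):
            count += 1                         # f is monotone
    return count

for n in range(5):
    print(n, dedekind(n))   # expected: 2, 3, 6, 20, 168
```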

Data-driven HVAC Control Using Symbolic Regression: Design and Implementation

  • Authors: Yuki Ozawa, Dafang Zhao, Daichi Watari, Ittetsu Taniguchi, Toshihiro Suzuki, Yoshiyuki Shimoda, Takao Onoye
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.03078
  • Pdf link: https://arxiv.org/pdf/2304.03078
  • Abstract
    The large amount of data collected in buildings makes energy management smarter and more energy efficient. This study proposes a design and implementation methodology for data-driven heating, ventilation, and air conditioning (HVAC) control. Building thermodynamics is modeled using a symbolic regression model (SRM) built from the collected data. Additionally, an HVAC system model is developed with a data-driven approach. A model predictive control (MPC) based HVAC scheduling problem is formulated with the developed models to minimize energy consumption and peak power demand and maximize thermal comfort. The performance of the proposed framework is demonstrated in a workspace in an actual campus building. The HVAC system using the proposed framework reduces the peak power by 16.1% compared to a widely used thermostat controller.

Offline Uncertainty Sampling in Data-driven Stochastic MPC

  • Authors: Johannes Teutsch, Sebastian Kerz, Tim Brüdigam, Dirk Wollherr, Marion Leibold
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.03088
  • Pdf link: https://arxiv.org/pdf/2304.03088
  • Abstract
    In this work, we exploit an offline-sampling based strategy for the constrained data-driven predictive control of an unknown linear system subject to random measurement noise. The strategy uses only past measured, potentially noisy data in a non-parametric system representation and does not require any prior model identification. The approximation of chance constraints using uncertainty sampling leads to efficient constraint tightening. Under mild assumptions, robust recursive feasibility and closed-loop constraint satisfaction are shown. In a simulation example, we provide evidence for the improved control performance of the proposed control scheme in comparison to a purely robust data-driven predictive control approach.
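
To illustrate the offline-sampling idea in its simplest form: sample the uncertainty once offline, take an empirical quantile, and tighten the nominal constraint by that amount. The sketch below is a hedged, generic illustration with an assumed Gaussian noise model, not the paper's exact tightening scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.05                    # allowed constraint-violation probability
x_max = 1.0                   # chance constraint: P(x + w <= x_max) >= 1 - eps
w_samples = rng.normal(0.0, 0.1, size=10_000)   # assumed noise model

# Tighten the nominal bound by the (1 - eps) empirical quantile of the noise.
tightening = np.quantile(w_samples, 1.0 - eps)
x_max_tight = x_max - tightening
print(f"nominal bound {x_max:.3f} tightened to {x_max_tight:.3f}")

# At run time, the noise-free nominal prediction only needs to satisfy the
# tightened bound for the chance constraint to hold approximately.
```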

Inductive Graph Unlearning

  • Authors: Cheng-Long Wang, Mengdi Huai, Di Wang
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.03093
  • Pdf link: https://arxiv.org/pdf/2304.03093
  • Abstract
    As a way to implement the "right to be forgotten" in machine learning, \textit{machine unlearning} aims to completely remove the contributions and information of the samples to be deleted from a trained model without affecting the contributions of other samples. Recently, many frameworks for machine unlearning have been proposed, and most of them focus on image and text data. To extend machine unlearning to graph data, \textit{GraphEraser} has been proposed. However, a critical issue is that \textit{GraphEraser} is specifically designed for the transductive graph setting, where the graph is static and attributes and edges of test nodes are visible during training. It is unsuitable for the inductive setting, where the graph could be dynamic and the test graph information is invisible in advance. Such inductive capability is essential for production machine learning systems with evolving graphs like social media and transaction networks. To fill this gap, we propose the \underline{{\bf G}}\underline{{\bf U}}ided \underline{{\bf I}}n\underline{{\bf D}}uctiv\underline{{\bf E}} Graph Unlearning framework (GUIDE). GUIDE consists of three components: guided graph partitioning with fairness and balance, efficient subgraph repair, and similarity-based aggregation. Empirically, we evaluate our method on several inductive benchmarks and evolving transaction graphs. GUIDE can be applied efficiently to inductive graph learning tasks thanks to its low graph partitioning cost, in terms of both computation and structural information. The code will be available here: https://github.com/Happy2Git/GUIDE.

FABRID: Flexible Attestation-Based Routing for Inter-Domain Networks

  • Authors: Cyrill Krähenbühl (ETH Zürich), Marc Wyss (ETH Zürich), David Basin (ETH Zürich), Vincent Lenders (armasuisse), Adrian Perrig (ETH Zürich), Martin Strohmeier (armasuisse)
  • Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.03108
  • Pdf link: https://arxiv.org/pdf/2304.03108
  • Abstract
    In its current state, the Internet does not provide end users with transparency and control regarding on-path forwarding devices. In particular, the lack of network device information reduces the trustworthiness of the forwarding path and prevents end-user applications requiring specific router capabilities from reaching their full potential. Moreover, the inability to influence the traffic's forwarding path results in applications communicating over undesired routes, while alternative paths with more desirable properties remain unusable. In this work, we present FABRID, a system that enables applications to forward traffic flexibly, potentially on multiple paths selected to comply with user-defined preferences, where information about forwarding devices is exposed and transparently attested by autonomous systems (ASes). The granularity of this information is chosen by each AS individually, protecting them from leaking sensitive network details, while the secrecy and authenticity of preferences embedded within the users' packets are protected through efficient cryptographic operations. We show the viability of FABRID by deploying it on a global SCION network test bed, and we demonstrate high throughput on commodity hardware.

Simplifying Content-Based Neural News Recommendation: On User Modeling and Training Objectives

  • Authors: Andreea Iana, Goran Glavaš, Heiko Paulheim
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.03112
  • Pdf link: https://arxiv.org/pdf/2304.03112
  • Abstract
    The advent of personalized news recommendation has given rise to increasingly complex recommender architectures. Most neural news recommenders rely on user click behavior and typically introduce dedicated user encoders that aggregate the content of clicked news into user embeddings (early fusion). These models are predominantly trained with standard point-wise classification objectives. The existing body of work exhibits two main shortcomings: (1) despite general design homogeneity, direct comparisons between models are hindered by varying evaluation datasets and protocols; (2) alternative model designs and training objectives remain largely unexplored. In this work, we present a unified framework for news recommendation, allowing for a systematic and fair comparison of news recommenders across several crucial design dimensions: (i) candidate-awareness in user modeling, (ii) click behavior fusion, and (iii) training objectives. Our findings challenge the status quo in neural news recommendation. We show that replacing sizable user encoders with parameter-efficient dot products between candidate and clicked news embeddings (late fusion) often yields substantial performance gains. Moreover, our results render contrastive training a viable alternative to point-wise classification objectives.
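
As a concrete picture of the late-fusion scoring this paper advocates, the sketch below ranks candidates by a dot product between each candidate embedding and a parameter-free user vector pooled from clicked-news embeddings. Shapes and the mean-pooling choice are illustrative assumptions.

```python
import numpy as np

def late_fusion_scores(clicked: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    """clicked: (H, D) embeddings of the user's clicked news.
    candidates: (C, D) embeddings of candidate news.
    Returns one relevance score per candidate."""
    user_vec = clicked.mean(axis=0)     # parameter-free user representation
    return candidates @ user_vec        # (C,) dot-product scores

rng = np.random.default_rng(1)
scores = late_fusion_scores(rng.normal(size=(10, 64)), rng.normal(size=(5, 64)))
print(scores.argsort()[::-1])           # candidate ranking, best first
```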

Zero-shot Generative Model Adaptation via Image-specific Prompt Learning

  • Authors: Jiayi Guo, Chaofei Wang, You Wu, Eric Zhang, Kai Wang, Xingqian Xu, Shiji Song, Humphrey Shi, Gao Huang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03119
  • Pdf link: https://arxiv.org/pdf/2304.03119
  • Abstract
    Recently, CLIP-guided image synthesis has shown appealing performance on adapting a pre-trained source-domain generator to an unseen target domain. It does not require any target-domain samples but only the textual domain labels. The training is highly efficient, e.g., a few minutes. However, existing methods still have some limitations in the quality of generated images and may suffer from the mode collapse issue. A key reason is that a fixed adaptation direction is applied for all cross-domain image pairs, which leads to identical supervision signals. To address this issue, we propose an Image-specific Prompt Learning (IPL) method, which learns specific prompt vectors for each source-domain image. This produces a more precise adaptation direction for every cross-domain image pair, endowing the target-domain generator with greatly enhanced flexibility. Qualitative and quantitative evaluations on various domains demonstrate that IPL effectively improves the quality and diversity of synthesized images and alleviates the mode collapse. Moreover, IPL is independent of the structure of the generative model, such as generative adversarial networks or diffusion models. Code is available at https://github.com/Picsart-AI-Research/IPL-Zero-Shot-Generative-Model-Adaptation.

BotTriNet: A Unified and Efficient Embedding for Social Bots Detection via Metric Learning

  • Authors: Jun Wu, Xuesong Ye, Man Yan Yuet
  • Subjects: Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.03144
  • Pdf link: https://arxiv.org/pdf/2304.03144
  • Abstract
    A persistently popular topic in online social networks is the rapid and accurate discovery of bot accounts to prevent their invasion and harassment of genuine users. We propose a unified embedding framework called BOTTRINET, which utilizes textual content posted by accounts for bot detection, based on the assumption that contexts naturally reveal account personalities and habits. Content is abundant and valuable if the system efficiently extracts bot-related information using embedding techniques. Beyond the general embedding framework that generates word, sentence, and account embeddings, we design a triplet network to tune the raw embeddings (produced by traditional natural language processing techniques) for better classification performance. We evaluate detection accuracy and F1-score on the real-world dataset CRESCI2017, comprising three bot account categories and five bot sample sets. Our system achieves the highest average accuracy of 98.34% and F1-score of 97.99% on two content-intensive bot sets, outperforming previous work and setting a new state of the art. It also makes a breakthrough on four content-less bot sets, with an average accuracy improvement of 11.52% and an average F1-score increase of 16.70%.
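
The triplet tuning step can be pictured with a few lines of PyTorch. The sketch below is a hedged illustration of training an embedding encoder with a triplet margin loss; the encoder architecture, margin, and batch construction are assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

# Hypothetical encoder mapping 300-d raw embeddings to a 64-d tuned space.
encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 64))
criterion = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# Anchor and positive come from the same class (bots), negative from the other.
anchor = torch.randn(32, 300)     # raw embeddings of bot accounts
positive = torch.randn(32, 300)   # other bot accounts
negative = torch.randn(32, 300)   # genuine accounts

loss = criterion(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()                   # pull bot/bot together, push bot/human apart
optimizer.step()
print(float(loss))
```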

Parameterized Approximation Schemes for Clustering with General Norm Objectives

  • Authors: Fateme Abbasi, Sandip Banerjee, Jarosław Byrka, Parinya Chalermsook, Ameet Gadekar, Kamyar Khodamoradi, Dániel Marx, Roohani Sharma, Joachim Spoerhase
  • Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03146
  • Pdf link: https://arxiv.org/pdf/2304.03146
  • Abstract
    This paper considers the well-studied algorithmic regime of designing a $(1+\epsilon)$-approximation algorithm for a $k$-clustering problem that runs in time $f(k,\epsilon)poly(n)$ (sometimes called an efficient parameterized approximation scheme or EPAS for short). Notable results of this kind include EPASes in the high-dimensional Euclidean setting for $k$-center [Bad\u{o}iu, Har-Peled, Indyk; STOC'02] as well as $k$-median, and $k$-means [Kumar, Sabharwal, Sen; J. ACM 2010]. However, existing EPASes handle only basic objectives (such as $k$-center, $k$-median, and $k$-means) and are tailored to the specific objective and metric space. Our main contribution is a clean and simple EPAS that settles more than ten clustering problems (across multiple well-studied objectives as well as metric spaces) and unifies well-known EPASes. Our algorithm gives EPASes for a large variety of clustering objectives (for example, $k$-means, $k$-center, $k$-median, priority $k$-center, $\ell$-centrum, ordered $k$-median, socially fair $k$-median aka robust $k$-median, or more generally monotone norm $k$-clustering) and metric spaces (for example, continuous high-dimensional Euclidean spaces, metrics of bounded doubling dimension, bounded treewidth metrics, and planar metrics). Key to our approach is a new concept that we call bounded $\epsilon$-scatter dimension--an intrinsic complexity measure of a metric space that is a relaxation of the standard notion of bounded doubling dimension. Our main technical result shows that two conditions are essentially sufficient for our algorithm to yield an EPAS on the input metric $M$ for any clustering objective: (i) The objective is described by a monotone (not necessarily symmetric!) norm, and (ii) the $\epsilon$-scatter dimension of $M$ is upper bounded by a function of $\epsilon$.

Spectral Toolkit of Algorithms for Graphs: Technical Report (1)

  • Authors: Peter Macgregor, He Sun
  • Subjects: Social and Information Networks (cs.SI); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Mathematical Software (cs.MS)
  • Arxiv link: https://arxiv.org/abs/2304.03170
  • Pdf link: https://arxiv.org/pdf/2304.03170
  • Abstract
    Spectral Toolkit of Algorithms for Graphs (STAG) is an open-source library for efficient spectral graph algorithms; its development started in September 2022. We have so far finished the component on local graph clustering, and this technical report presents a user's guide to STAG, showcase studies, and several technical considerations behind our development.

Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream

  • Authors: Yuheng Jiang, Kaixin Yao, Zhuo Su, Zhehao Shen, Haimin Luo, Lan Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03184
  • Pdf link: https://arxiv.org/pdf/2304.03184
  • Abstract
    Convenient 4D modeling of human-object interactions is essential for numerous applications. However, monocular tracking and rendering of complex interaction scenarios remain challenging. In this paper, we propose Instant-NVR, a neural approach for instant volumetric human-object tracking and rendering using a single RGBD camera. It bridges traditional non-rigid tracking with recent instant radiance field techniques via a multi-thread tracking-rendering mechanism. In the tracking front-end, we adopt a robust human-object capture scheme to provide sufficient motion priors. We further introduce a separated instant neural representation with a novel hybrid deformation module for the interacting scene. We also provide an on-the-fly reconstruction scheme of the dynamic/static radiance fields via efficient motion-prior searching. Moreover, we introduce an online key frame selection scheme and a rendering-aware refinement strategy to significantly improve the appearance details for online novel-view synthesis. Extensive experiments demonstrate the effectiveness and efficiency of our approach for the instant generation of human-object radiance fields on the fly, notably achieving real-time photo-realistic novel view synthesis under complex human-object interactions.

Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster

  • Authors: Nolan Dey, Gurpreet Gosal, Zhiming (Charles) Chen, Hemant Khachane, William Marshall, Ribhu Pathria, Marvin Tom, Joel Hestness
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.03208
  • Pdf link: https://arxiv.org/pdf/2304.03208
  • Abstract
    We study recent research advances that improve large language models through efficient pre-training and scaling, and open datasets and tools. We combine these advances to introduce Cerebras-GPT, a family of open compute-optimal language models scaled from 111M to 13B parameters. We train Cerebras-GPT models on the Eleuther Pile dataset following DeepMind Chinchilla scaling rules for efficient pre-training (highest accuracy for a given compute budget). We characterize the predictable power-law scaling and compare Cerebras-GPT with other publicly-available models to show all Cerebras-GPT models have state-of-the-art training efficiency on both pre-training and downstream objectives. We describe our learnings including how Maximal Update Parameterization ($\mu$P) can further improve large model scaling, improving accuracy and hyperparameter predictability at scale. We release our pre-trained models and code, making this paper the first open and reproducible work comparing compute-optimal model scaling to models trained on fixed dataset sizes. Cerebras-GPT models are available on HuggingFace: https://huggingface.co/cerebras.
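
A back-of-the-envelope sketch of the Chinchilla-style sizing mentioned above: the published rules of thumb are roughly 20 training tokens per parameter, with training compute approximated as $C \approx 6ND$ FLOPs for $N$ parameters and $D$ tokens. The snippet below applies those generic rules; it is not Cerebras-specific.

```python
def chinchilla_plan(n_params: float, tokens_per_param: float = 20.0):
    """Rule-of-thumb compute-optimal plan: ~20 tokens/parameter, C ~= 6*N*D."""
    tokens = tokens_per_param * n_params
    flops = 6.0 * n_params * tokens
    return tokens, flops

for n in (111e6, 1.3e9, 13e9):    # a few of the model sizes in the family
    tokens, flops = chinchilla_plan(n)
    print(f"{n/1e9:5.2f}B params -> {tokens/1e9:7.1f}B tokens, ~{flops:.2e} FLOPs")
```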

Hierarchical Graph Neural Network with Cross-Attention for Cross-Device User Matching

  • Authors: Ali Taghibakhshi, Mingyuan Ma, Ashwath Aithal, Onur Yilmaz, Haggai Maron, Matthew West
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.03215
  • Pdf link: https://arxiv.org/pdf/2304.03215
  • Abstract
    Cross-device user matching is a critical problem in numerous domains, including advertising, recommender systems, and cybersecurity. It involves identifying and linking different devices belonging to the same person, utilizing sequence logs. Previous data mining techniques have struggled to address the long-range dependencies and higher-order connections between the logs. Recently, researchers have modeled this problem as a graph problem and proposed a two-tier graph contextual embedding (TGCE) neural network architecture, which outperforms previous methods. In this paper, we propose a novel hierarchical graph neural network architecture (HGNN), which has a more computationally efficient second level design than TGCE. Furthermore, we introduce a cross-attention (Cross-Att) mechanism in our model, which improves performance by 5% compared to the state-of-the-art TGCE method.

FedBot: Enhancing Privacy in Chatbots with Federated Learning

  • Authors: Addi Ait-Mlouk, Sadi Alawadi, Salman Toor, Andreas Hellander
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03228
  • Pdf link: https://arxiv.org/pdf/2304.03228
  • Abstract
    Chatbots are mainly data-driven and usually based on utterances that might be sensitive. However, training deep learning models on shared data can violate user privacy. Such issues have commonly existed in chatbots since their inception. In the literature, there have been many approaches to deal with privacy, such as differential privacy and secure multi-party computation, but most of them need to have access to users' data. In this context, Federated Learning (FL) aims to protect data privacy through distributed learning methods that keep the data in its location. This paper presents Fedbot, a proof-of-concept (POC) privacy-preserving chatbot that leverages large-scale customer support data. The POC combines Deep Bidirectional Transformer models and federated learning algorithms to protect customer data privacy during collaborative model training. The results of the proof-of-concept showcase the potential for privacy-preserving chatbots to transform the customer support industry by delivering personalized and efficient customer service that meets data privacy regulations and legal requirements. Furthermore, the system is specifically designed to improve its performance and accuracy over time by leveraging its ability to learn from previous interactions.

DiffMimic: Efficient Motion Mimicking with Differentiable Physics

  • Authors: Jiawei Ren, Cunjun Yu, Siwei Chen, Xiao Ma, Liang Pan, Ziwei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03274
  • Pdf link: https://arxiv.org/pdf/2304.03274
  • Abstract
    Motion mimicking is a foundational task in physics-based character animation. However, most existing motion mimicking methods are built upon reinforcement learning (RL) and suffer from heavy reward engineering, high variance, and slow convergence with hard explorations. Specifically, they usually take tens of hours or even days of training to mimic a simple motion sequence, resulting in poor scalability. In this work, we leverage differentiable physics simulators (DPS) and propose an efficient motion mimicking method dubbed DiffMimic. Our key insight is that DPS casts a complex policy learning task as a much simpler state-matching problem. In particular, DPS learns a stable policy via analytical gradients with ground-truth physical priors, hence leading to significantly faster and more stable convergence than RL-based methods. Moreover, to escape from local optima, we utilize a Demonstration Replay mechanism to enable stable gradient backpropagation over a long horizon. Extensive experiments on standard benchmarks show that DiffMimic has better sample efficiency and time efficiency than existing methods (e.g., DeepMimic). Notably, DiffMimic allows a physically simulated character to learn a backflip after 10 minutes of training and to cycle it after 3 hours of training, while the existing approach may require about a day of training to cycle a backflip. More importantly, we hope DiffMimic can benefit more differentiable animation systems with techniques like differentiable clothes simulation in future research.
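
The state-matching idea can be illustrated with a toy differentiable "simulator". In the hedged sketch below, a one-dimensional point mass is rolled out with Euler steps, and the analytic gradient of a state-matching loss flows back through the physics into a tiny policy; the dynamics and policy are illustrative assumptions only.

```python
import torch

target = torch.sin(torch.linspace(0, 3.14, 50))   # reference trajectory
policy = torch.nn.Linear(2, 1)                    # maps [pos, vel] -> force
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for step in range(200):
    pos, vel = torch.zeros(1), torch.zeros(1)
    loss = torch.zeros(1)
    for t in range(50):                           # differentiable rollout
        force = policy(torch.cat([pos, vel]))
        vel = vel + 0.1 * force                   # simple Euler physics step
        pos = pos + 0.1 * vel
        loss = loss + (pos - target[t]) ** 2      # state-matching loss
    opt.zero_grad()
    loss.backward()                               # analytic gradients through physics
    opt.step()
print(float(loss))                                # tracking error after training
```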

Keyword: faster

DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model

  • Authors: Hoigi Seo, Hayeon Kim, Gwanghyun Kim, Se Young Chun
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.02827
  • Pdf link: https://arxiv.org/pdf/2304.02827
  • Abstract
    The increasing demand for high-quality 3D content creation has motivated the development of automated methods for creating 3D object models from a single image and/or from a text prompt. However, the reconstructed 3D objects using state-of-the-art image-to-3D methods still exhibit low correspondence to the given image and low multi-view consistency. Recent state-of-the-art text-to-3D methods are also limited, yielding 3D samples with low diversity per prompt and long synthesis times. To address these challenges, we propose DITTO-NeRF, a novel pipeline to generate a high-quality 3D NeRF model from a text prompt or a single image. Our DITTO-NeRF consists of constructing a high-quality partial 3D object for limited in-boundary (IB) angles using the given or text-generated 2D image from the frontal view, and then iteratively reconstructing the remaining 3D NeRF using an inpainting latent diffusion model. We propose progressive 3D object reconstruction schemes in terms of scales (low to high resolution), angles (IB angles initially to outer-boundary (OB) later), and masks (object to background boundary) in our DITTO-NeRF so that high-quality information on IB can be propagated into OB. Our DITTO-NeRF outperforms state-of-the-art methods in terms of fidelity and diversity, qualitatively and quantitatively, with much faster training times than prior art on image/text-to-3D such as DreamFusion and NeuralLift-360.

Convolutional neural networks for crack detection on flexible road pavements

  • Authors: Hermann Tapamo, Anna Bosman, James Maina, Emile Horak
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02933
  • Pdf link: https://arxiv.org/pdf/2304.02933
  • Abstract
    Flexible road pavements deteriorate primarily due to traffic and adverse environmental conditions. Cracking is the most common deterioration mechanism; the surveying thereof is typically conducted manually using internationally defined classification standards. In South Africa, the use of high-definition video images has been introduced, which allows for safer road surveying. However, surveying is still a tedious manual process. Automation of the detection of defects such as cracks would allow for faster analysis of road networks and potentially reduce human bias and error. This study performs a comparison of six state-of-the-art convolutional neural network models for the purpose of crack detection. The models are pretrained on the ImageNet dataset, and fine-tuned using a new real-world binary crack dataset consisting of 14000 samples. The effects of dataset augmentation are also investigated. Of the six models trained, five achieved accuracy above 97%. The highest recorded accuracy was 98%, achieved by the ResNet and VGG16 models. The dataset is available at the following URL: https://zenodo.org/record/7795975

Boundary-Denoising for Video Activity Localization

  • Authors: Mengmeng Xu, Mattia Soldan, Jialin Gao, Shuming Liu, Juan-Manuel Pérez-Rúa, Bernard Ghanem
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02934
  • Pdf link: https://arxiv.org/pdf/2304.02934
  • Abstract
    Video activity localization aims at understanding the semantic content in long untrimmed videos and retrieving actions of interest. The retrieved action with its start and end locations can be used for highlight generation, temporal action detection, etc. Unfortunately, learning the exact boundary location of activities is highly challenging because temporal activities are continuous in time, and there are often no clear-cut transitions between actions. Moreover, the definition of the start and end of events is subjective, which may confuse the model. To alleviate the boundary ambiguity, we propose to study the video activity localization problem from a denoising perspective. Specifically, we propose an encoder-decoder model named DenoiseLoc. During training, a set of action spans is randomly generated from the ground truth with a controlled noise scale. Then we attempt to reverse this process by boundary denoising, allowing the localizer to predict activities with precise boundaries and resulting in faster convergence. Experiments show that DenoiseLoc advances several video activity understanding tasks. For example, we observe a gain of +12.36% average mAP on the QV-Highlights dataset and +1.64% [email protected] on the THUMOS'14 dataset over the baseline. Moreover, DenoiseLoc achieves state-of-the-art performance on the TACoS and MAD datasets, but with much fewer predictions compared to other current methods.

Training a Two Layer ReLU Network Analytically

  • Authors: Adrian Barbu
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.02972
  • Pdf link: https://arxiv.org/pdf/2304.02972
  • Abstract
    Neural networks are usually trained with different variants of gradient descent based optimization algorithms such as stochastic gradient descent or the Adam optimizer. Recent theoretical work states that the critical points (where the gradient of the loss is zero) of two-layer ReLU networks with the square loss are not all local minima. However, in this work we explore an algorithm for training two-layer neural networks with ReLU-like activation and the square loss that alternately finds the critical points of the loss function analytically for one layer while keeping the other layer and the neuron activation pattern fixed. Experiments indicate that this simple algorithm can find deeper optima than Stochastic Gradient Descent or the Adam optimizer, obtaining significantly smaller training loss values on four out of the five real datasets evaluated. Moreover, the method is faster than the gradient descent methods and has virtually no tuning parameters.
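
A hedged numpy sketch of the alternating idea (an illustration, not the paper's exact algorithm): once the ReLU activation pattern is frozen, the network output is linear in either layer's weights, so each layer admits a closed-form least-squares update.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                     # inputs
y = rng.normal(size=(200, 1))                     # regression targets
W1 = rng.normal(size=(5, 16))
W2 = rng.normal(size=(16, 1))

for it in range(10):
    H = np.maximum(X @ W1, 0.0)
    mask = (H > 0).astype(float)                  # freeze the activation pattern
    # Layer 2 given layer 1: ordinary least squares on the hidden features.
    W2 = np.linalg.lstsq(H, y, rcond=None)[0]
    # Layer 1 given layer 2: with the pattern frozen, the output
    # ((X @ W1) * mask) @ W2 is linear in W1, so vec(W1) also has a
    # closed-form least-squares solution.
    A = np.einsum('ik,ij,j->ikj', X, mask, W2[:, 0]).reshape(len(X), -1)
    W1 = np.linalg.lstsq(A, y[:, 0], rcond=None)[0].reshape(W1.shape)
    loss = float(np.mean((np.maximum(X @ W1, 0.0) @ W2 - y) ** 2))
    print(it, loss)
```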

Patch-wise Features for Blur Image Classification

  • Authors: Sri Charan Kattamuru, Kshitij Agrawal, Shyam Prasad Adhikari, Abhishek Bose, Hemant Misra
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03156
  • Pdf link: https://arxiv.org/pdf/2304.03156
  • Abstract
    Images captured through smartphone cameras often suffer from degradation, blur being one of the major ones, posing a challenge in processing these images for downstream tasks. In this paper we propose low-compute lightweight patch-wise features for image quality assessment. Using our method, we can discriminate between blurred and sharp images. To this end, we train a decision-tree based XGBoost model on various intuitive image features like gray level variance, first and second order gradients, and texture features like local binary patterns. Experiments conducted on an open dataset show that the proposed low-compute method achieves 90.1% mean accuracy on the validation set, which is comparable to the 94% mean accuracy of a compute-intensive VGG16 network fine-tuned to this task. To demonstrate the generalizability of our proposed features and model, we test the model on the BHBID dataset and an internal dataset, where we attain accuracies of 98% and 91%, respectively. The proposed method is 10x faster than the VGG16-based model on CPU and scales linearly with the input image size, making it suitable for implementation on low-compute edge devices.
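
The flavor of these low-compute patch features can be sketched in a few lines: per-patch gray-level variance plus first- and second-order gradient energy, pooled per image and fed to an XGBoost classifier. The patch size, feature set, and the crude pixelation stand-in for blur below are illustrative assumptions (local binary patterns are omitted for brevity).

```python
import numpy as np
from xgboost import XGBClassifier

def patch_features(img: np.ndarray, patch: int = 32) -> np.ndarray:
    """Per-patch variance and gradient-energy features, mean-pooled per image."""
    feats = []
    for i in range(0, img.shape[0] - patch + 1, patch):
        for j in range(0, img.shape[1] - patch + 1, patch):
            p = img[i:i + patch, j:j + patch].astype(np.float64)
            gy, gx = np.gradient(p)                                  # first-order
            lap = np.gradient(gx, axis=1) + np.gradient(gy, axis=0)  # ~second-order
            feats.append([p.var(), (gx**2 + gy**2).mean(), (lap**2).mean()])
    return np.asarray(feats).mean(axis=0)

rng = np.random.default_rng(0)
sharp = [rng.integers(0, 255, (128, 128)) for _ in range(20)]
# Crude stand-in for blur: 4x pixelation of each sharp image.
blurry = [np.repeat(np.repeat(s[::4, ::4], 4, axis=0), 4, axis=1) for s in sharp]
X = np.array([patch_features(im) for im in sharp + blurry])
y = np.array([1] * 20 + [0] * 20)                 # 1 = sharp, 0 = blurred
clf = XGBClassifier(n_estimators=50).fit(X, y)
print(clf.score(X, y))                            # training accuracy
```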

DiffMimic: Efficient Motion Mimicking with Differentiable Physics

  • Authors: Jiawei Ren, Cunjun Yu, Siwei Chen, Xiao Ma, Liang Pan, Ziwei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03274
  • Pdf link: https://arxiv.org/pdf/2304.03274
  • Abstract
    Motion mimicking is a foundational task in physics-based character animation. However, most existing motion mimicking methods are built upon reinforcement learning (RL) and suffer from heavy reward engineering, high variance, and slow convergence with hard explorations. Specifically, they usually take tens of hours or even days of training to mimic a simple motion sequence, resulting in poor scalability. In this work, we leverage differentiable physics simulators (DPS) and propose an efficient motion mimicking method dubbed DiffMimic. Our key insight is that DPS casts a complex policy learning task as a much simpler state-matching problem. In particular, DPS learns a stable policy via analytical gradients with ground-truth physical priors, hence leading to significantly faster and more stable convergence than RL-based methods. Moreover, to escape from local optima, we utilize a Demonstration Replay mechanism to enable stable gradient backpropagation over a long horizon. Extensive experiments on standard benchmarks show that DiffMimic has better sample efficiency and time efficiency than existing methods (e.g., DeepMimic). Notably, DiffMimic allows a physically simulated character to learn a backflip after 10 minutes of training and to cycle it after 3 hours of training, while the existing approach may require about a day of training to cycle a backflip. More importantly, we hope DiffMimic can benefit more differentiable animation systems with techniques like differentiable clothes simulation in future research.

Keyword: mobile

Adopting Two Supervisors for Efficient Use of Large-Scale Remote Deep Neural Networks

  • Authors: Michael Weiss, Paolo Tonella
  • Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.02654
  • Pdf link: https://arxiv.org/pdf/2304.02654
  • Abstract
    Recent decades have seen the rise of large-scale Deep Neural Networks (DNNs) to achieve human-competitive performance in a variety of artificial intelligence tasks. Often consisting of hundreds of millions, if not hundreds of billions, of parameters, these DNNs are too large to be deployed to, or efficiently run on, resource-constrained devices such as mobile phones or IoT microcontrollers. Systems relying on large-scale DNNs thus have to call the corresponding model over the network, leading to substantial costs for hosting and running the large-scale remote model, costs which are often charged on a per-use basis. In this paper, we propose BiSupervised, a novel architecture where, before relying on a large remote DNN, a system attempts to make a prediction on a small-scale local model. A DNN supervisor monitors said prediction process and identifies easy inputs for which the local prediction can be trusted. For these inputs, the remote model does not have to be invoked, thus saving costs, while only marginally impacting the overall system accuracy. Our architecture furthermore foresees a second supervisor to monitor the remote predictions and identify inputs for which not even these can be trusted, allowing the system to raise an exception or run a fallback strategy instead. We evaluate the cost savings and the ability to detect incorrectly predicted inputs on four diverse case studies: IMDB movie review sentiment classification, GitHub issue triaging, ImageNet image classification, and SQuADv2 free-text question answering.
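
The two-supervisor routing can be summarized in a short sketch. Below, a softmax-confidence supervisor decides whether the local prediction can be trusted, and a second one guards the remote prediction; the thresholds and the stand-in models are assumptions, not the paper's exact supervisors.

```python
import numpy as np

def predict(x, local_model, remote_model,
            local_threshold=0.9, remote_threshold=0.6):
    probs = local_model(x)                   # cheap on-device prediction
    if probs.max() >= local_threshold:       # supervisor 1: trust local?
        return int(probs.argmax()), "local"
    probs = remote_model(x)                  # costly remote call
    if probs.max() >= remote_threshold:      # supervisor 2: trust remote?
        return int(probs.argmax()), "remote"
    raise RuntimeError("neither model is confident; run a fallback strategy")

# Toy stand-ins returning softmax-like confidence vectors:
local_model = lambda x: np.array([0.95, 0.05])
remote_model = lambda x: np.array([0.30, 0.70])
print(predict(None, local_model, remote_model))   # (0, 'local')
```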

Adaptive Headway Motion Control and Motion Prediction for Safe Unicycle Motion Design

  • Authors: Aykut İşleyen, Nathan van de Wouw, Ömür Arslan
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02760
  • Pdf link: https://arxiv.org/pdf/2304.02760
  • Abstract
    Differential drive robots that can be modeled as a kinematic unicycle are a standard mobile base platform for many service and logistics robots. Safe and smooth autonomous motion around obstacles is a crucial skill for unicycle robots to perform diverse tasks in complex environments. A classical control approach for unicycle control is feedback linearization using a headway point at a fixed headway distance in front of the unicycle. The unicycle headway control brings the headway point to a desired goal location by embedding a linear headway reference dynamics, which often results in an undesired offset for the actual unicycle position. In this paper, we introduce a new unicycle headway control approach with an adaptive headway distance that overcomes this limitation, i.e., when the headway point reaches the goal the unicycle position is also at the goal. By systematically analyzing the closed-loop unicycle motion under the adaptive headway controller, we design analytical feedback motion prediction methods that bound the closed-loop unicycle position trajectory and so can be effectively used for safety assessment and safe unicycle motion design around obstacles. We present an application of adaptive headway motion control and motion prediction for safe unicycle path following around obstacles in numerical simulations.
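
For reference, the classical fixed-headway baseline the paper improves on can be written in a few lines: place a headway point at distance $d$ in front of the unicycle, feedback-linearize it, and drive it to the goal with linear dynamics. The sketch below (gains and step size are assumptions) also exhibits the offset the abstract mentions: the body stops a distance $d$ short of the goal.

```python
import numpy as np

d, k, dt = 0.3, 1.0, 0.01            # headway distance, gain, time step
p = np.array([0.0, 0.0])             # unicycle position
theta = 0.0                          # heading
goal = np.array([2.0, 1.0])

for _ in range(2000):
    h = p + d * np.array([np.cos(theta), np.sin(theta)])   # headway point
    u = -k * (h - goal)                                    # desired headway velocity
    # Feedback linearization: h_dot = R @ [v, w], so invert the decoupling matrix.
    R = np.array([[np.cos(theta), -d * np.sin(theta)],
                  [np.sin(theta),  d * np.cos(theta)]])
    v, w = np.linalg.solve(R, u)
    p = p + dt * v * np.array([np.cos(theta), np.sin(theta)])
    theta = theta + dt * w

print(p, "offset from goal:", np.linalg.norm(p - goal))    # ~d, the fixed-headway offset
```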

Evaluating Customization of Remote Tele-operation Interfaces for Assistive Robots

  • Authors: Vinitha Ranganeni, Noah Ponto, Maya Cakmak
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02771
  • Pdf link: https://arxiv.org/pdf/2304.02771
  • Abstract
    Mobile manipulator platforms, like the Stretch RE1 robot, make the promise of in-home robotic assistance feasible. For people with severe physical limitations, like those with quadriplegia, the ability to tele-operate these robots themselves means that they can perform physical tasks they cannot otherwise do themselves, thereby increasing their level of independence. In order for users with physical limitations to operate these robots, their interfaces must be accessible and cater to the specific needs of all users. As physical limitations vary amongst users, it is difficult to make a single interface that will accommodate all users. Instead, such interfaces should be customizable to each individual user. In this paper we explore the value of customization of a browser-based interface for tele-operating the Stretch RE1 robot. More specifically, we evaluate the usability and effectiveness of a customized interface in comparison to the default interface configurations from prior work. We present a user study involving participants with motor impairments (N=10) and without motor impairments, who could serve as a caregiver, (N=13) that use the robot to perform mobile manipulation tasks in a real kitchen environment. Our study demonstrates that no single interface configuration satisfies all users' needs and preferences. Users perform better when using the customized interface for navigation, but not for manipulation due to higher complexity of learning to manipulate through the robot. All participants are able to use the robot to complete all tasks and participants with motor impairments believe that having the robot in their home would make them more independent.

Gotta Assess 'Em All: A Risk Analysis of Criminal Offenses Facilitated through PokemonGO

  • Authors: Ashly Fuller, Martin Lo, Angelica Holmes, Lu Lemanski, Marie Vasek, Enrico Mariconti
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.02952
  • Pdf link: https://arxiv.org/pdf/2304.02952
  • Abstract
    Location-based games have come to the forefront of popularity in casual and mobile gaming over the past six years. However, there is no hard data on crimes that these games enable, ranging from assault to cyberstalking to grooming. Given these potential harms, we conduct a risk assessment and quasi-experiment on the game features of location-based games. Using PokemonGO as a case study, we identify cyber-enabled stalking as the main risk event, where in-game features such as an innocent function to share in-game postcards can be exploited by malicious users. Users obtain postcards that are unique to each Pokestop and represent gifts that can be shared with in-game friends. The number of postcards that each user can retain is limited, so they send the excess to their friends with items that boost their friends' game activities. The postcard often also unintentionally leaks the users' commonly visited locations to their in-game friends. We analyze these in-game features using risk assessment and identify cyber-enabled stalking as one of the main threats. We further evaluate the feasibility of this crime through a quasi-experiment. Our results show that participants' routine locations, such as home and work, can be reliably re-identified within days of the first gift exchange. This exploitation of a previously unconsidered in-game feature enables physical stalking of previously unknown persons, which can escalate into more serious crimes. Given current data protection legislation in Europe, further preventive measures are required by Niantic to protect pseudonymized users from being re-identified by in-game features and (potentially) stalked.

SwarmGear: Heterogeneous Swarm of Drones with Reconfigurable Leader Drone and Virtual Impedance Links for Multi-Robot Inspection

  • Authors: Zhanibek Darush, Mikhail Martynov, Aleksey Fedoseev, Aleksei Shcherbak, Dzmitry Tsetserukou
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02956
  • Pdf link: https://arxiv.org/pdf/2304.02956
  • Abstract
    The continuous monitoring by drone swarms remains a challenging problem due to the lack of power supply and the inability of drones to land on uneven surfaces. Heterogeneous swarms, including ground and aerial vehicles, can support longer inspections and carry a higher number of sensors on board. However, their capabilities are limited by the mobility of wheeled and legged robots in a cluttered environment. In this paper, we propose a novel concept for autonomous inspection that we call SwarmGear. SwarmGear utilizes a heterogeneous swarm that investigates the environment in a leader-follower formation. The leader drone is able to land on rough terrain and traverse it with four compliant robotic legs, possessing the functionalities of both an aerial and a mobile robot. To preserve the formation of the swarm during its motion, virtual impedance links were developed between the leader and the follower drones. We experimentally evaluated the accuracy of the hybrid leader drone's ground locomotion. By varying the step parameters, we found the optimal step configuration. Two types of gaits were evaluated. The experiments revealed low crosstrack error (mean of 2 cm and max of 4.8 cm) and the ability of the leader drone to move with a 190 mm step length and a yaw standard deviation of 3 degrees. Four types of drone formations were considered. The best formation was used for experiments with SwarmGear, and it showed low overall crosstrack error for the swarm (mean of 7.9 cm for the type 1 gait and 5.1 cm for the type 2 gait). The proposed system can potentially improve the performance of autonomous swarms in cluttered and unstructured environments by allowing all agents of the swarm to switch between aerial and ground formations to overcome various obstacles and perform missions over a large area.

Spritz-PS: Validation of Synthetic Face Images Using a Large Dataset of Printed Documents

  • Authors: Ehsan Nowroozi, Yoosef Habibi, Mauro Conti
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computers and Society (cs.CY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02982
  • Pdf link: https://arxiv.org/pdf/2304.02982
  • Abstract
    The capability of performing effective forensic analysis on printed and scanned (PS) images is essential in many applications. PS documents may be used to conceal the artifacts that betray an image's synthetic nature, since these artifacts are typically present in manipulated images and the main artifacts of synthetic images can be removed by printing and scanning. Due to the appeal of Generative Adversarial Networks (GANs), synthetic face images generated with GAN models are difficult to differentiate from genuine human faces and may be used to create counterfeit identities. Additionally, since GAN models do not account for the physiological constraints of generating human faces and their impact on human irises, distinguishing genuine from synthetic irises in the PS scenario becomes extremely difficult. As a result of the lack of large-scale reference iris datasets in the PS scenario, we aim at developing a novel dataset to become a standard for Multimedia Forensics (MFs) investigation, which is available at [45]. In this paper, we provide a novel dataset made up of a large number of synthetic and natural printed irises taken from VIPPrint printed and scanned face images. We extracted irises from the face images; due to eyelid occlusion, some of the captured irises are incomplete. To fill the missing pixels of the extracted irises, we applied techniques that discover the complex links between the iris images. To highlight the problems involved with the evaluation of the dataset's iris images, we conducted a large number of analyses employing Siamese Neural Networks with backbones such as ResNet50, Xception, VGG16, and MobileNet-v2 to assess the similarities between genuine and synthetic human irises. For instance, using the Xception network, we achieved 56.76% similarity of irises for synthetic images and 92.77% similarity of irises for real images.

Keyword: pruning

To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency

  • Authors: Daniel Campos, ChengXiang Zhai
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.02721
  • Pdf link: https://arxiv.org/pdf/2304.02721
  • Abstract
    Sequence-to-sequence language models can be used to produce abstractive summaries which are coherent, relevant, and concise. Still, model sizes can make deployment in latency-sensitive or web-scale implementations difficult. This paper studies the relationship between model size, structured pruning, inference efficiency, and summarization accuracy on widely used summarization datasets. We show that model accuracy is tied to the encoder size while inference efficiency is connected to the decoder. Using asymmetric pruning can lead to nearly 3x improvement in inference latency with ~1 point loss in Rouge-2. Moreover, we find both the average degradation and the role of asymmetry to be consistent across model sizes and variations in datasets.

NTK-SAP: Improving neural network pruning by aligning training dynamics

  • Authors: Yite Wang, Dawei Li, Ruoyu Sun
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.02840
  • Pdf link: https://arxiv.org/pdf/2304.02840
  • Abstract
    Pruning neural networks before training has received increasing interest due to its potential to reduce training time and memory. One popular method is to prune the connections based on a certain metric, but it is not entirely clear what metric is the best choice. Recent advances in neural tangent kernel (NTK) theory suggest that the training dynamics of large enough neural networks is closely related to the spectrum of the NTK. Motivated by this finding, we propose to prune the connections that have the least influence on the spectrum of the NTK. This method can help maintain the NTK spectrum, which may help align the training dynamics to that of its dense counterpart. However, one possible issue is that the fixed-weight-NTK corresponding to a given initial point can be very different from the NTK corresponding to later iterates during the training phase. We further propose to sample multiple realizations of random weights to estimate the NTK spectrum. Note that our approach is weight-agnostic, which is different from most existing methods that are weight-dependent. In addition, we use random inputs to compute the fixed-weight-NTK, making our method data-agnostic as well. We name our foresight pruning algorithm Neural Tangent Kernel Spectrum-Aware Pruning (NTK-SAP). Empirically, our method achieves better performance than all baselines on multiple datasets.

Learning to Learn with Indispensable Connections

  • Authors: Sambhavi Tiwari, Manas Gogoi, Shekhar Verma, Krishna Pratap Singh
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02862
  • Pdf link: https://arxiv.org/pdf/2304.02862
  • Abstract
    Meta-learning aims to solve unseen tasks with few labelled instances. Nevertheless, despite its effectiveness for quick learning in existing optimization-based methods, it has several flaws. Inconsequential connections are frequently seen during meta-training, which results in an over-parameterized neural network. Because of this, meta-testing incurs unnecessary computations and extra memory overhead. To overcome these flaws, we propose a novel meta-learning method called Meta-LTH that retains only indispensable (necessary) connections. We apply the lottery ticket hypothesis technique known as magnitude pruning to generate these crucial connections that can effectively solve the few-shot learning problem. We aim to perform two things: (a) to find a sub-network capable of more adaptive meta-learning and (b) to learn new low-level features of unseen tasks and recombine those features with the already learned features during the meta-test phase. Experimental results show that our proposed Meta-LTH method outperforms the existing first-order MAML algorithm on three different classification datasets. Our method improves the classification accuracy by approximately 2% (20-way 1-shot task setting) on the Omniglot dataset.
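
The magnitude-pruning step at the heart of the lottery-ticket technique is simple to sketch: zero out the smallest-magnitude weights and keep a binary mask over the rest. The PyTorch snippet below is a hedged illustration; the sparsity level and layer-wise scope are assumptions.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.8):
    """Zero the smallest-magnitude weights; return pruned weights and mask."""
    k = int(sparsity * weight.numel())
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).float()
    return weight * mask, mask

layer = torch.nn.Linear(128, 64)
pruned_w, mask = magnitude_prune(layer.weight.data)
layer.weight.data = pruned_w              # keep only the "indispensable" connections
print(f"kept {int(mask.sum())} of {mask.numel()} weights")
```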

Keyword: voxel

VPFusion: Towards Robust Vertical Representation Learning for 3D Object Detection

  • Authors: Yuhao Huang, Sanping Zhou, Junjie Zhang, Jinpeng Dong, Nanning Zheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02867
  • Pdf link: https://arxiv.org/pdf/2304.02867
  • Abstract
    Efficient point cloud representation is a fundamental element of Lidar-based 3D object detection. Recent grid-based detectors usually divide point clouds into voxels or pillars and construct single-stream networks in Bird's Eye View. However, these point cloud encoding paradigms underestimate the point representation in the vertical direction, which causes the loss of semantic or fine-grained information, especially for vertically sensitive objects like pedestrians and cyclists. In this paper, we propose an explicit vertical multi-scale representation learning framework, VPFusion, to combine the complementary information from both voxel and pillar streams. Specifically, VPFusion first builds upon a sparse voxel-pillar-based backbone. The backbone divides point clouds into voxels and pillars, then encodes features with 3D and 2D sparse convolution simultaneously. Next, we introduce the Sparse Fusion Layer (SFL), which establishes a bidirectional pathway for sparse voxel and pillar features to enable the interaction between them. Additionally, we present the Dense Fusion Neck (DFN) to effectively combine the dense feature maps from the voxel and pillar branches at multiple scales. Extensive experiments on the large-scale Waymo Open Dataset and nuScenes Dataset demonstrate that VPFusion surpasses the single-stream baselines by a large margin and achieves state-of-the-art performance with real-time inference speed.

Keyword: lidar

VPFusion: Towards Robust Vertical Representation Learning for 3D Object Detection

  • Authors: Yuhao Huang, Sanping Zhou, Junjie Zhang, Jinpeng Dong, Nanning Zheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02867
  • Pdf link: https://arxiv.org/pdf/2304.02867
  • Abstract
    Efficient point cloud representation is a fundamental element of Lidar-based 3D object detection. Recent grid-based detectors usually divide point clouds into voxels or pillars and construct single-stream networks in Bird's Eye View. However, these point cloud encoding paradigms underestimate the point representation in the vertical direction, which causes the loss of semantic or fine-grained information, especially for vertically sensitive objects like pedestrians and cyclists. In this paper, we propose an explicit vertical multi-scale representation learning framework, VPFusion, to combine the complementary information from both voxel and pillar streams. Specifically, VPFusion first builds upon a sparse voxel-pillar-based backbone. The backbone divides point clouds into voxels and pillars, then encodes features with 3D and 2D sparse convolution simultaneously. Next, we introduce the Sparse Fusion Layer (SFL), which establishes a bidirectional pathway for sparse voxel and pillar features to enable the interaction between them. Additionally, we present the Dense Fusion Neck (DFN) to effectively combine the dense feature maps from the voxel and pillar branches at multiple scales. Extensive experiments on the large-scale Waymo Open Dataset and nuScenes Dataset demonstrate that VPFusion surpasses the single-stream baselines by a large margin and achieves state-of-the-art performance with real-time inference speed.

Geometric-aware Pretraining for Vision-centric 3D Object Detection

  • Authors: Linyan Huang, Huijie Wang, Jia Zeng, Shengchuan Zhang, Liujuan Cao, Rongrong Ji, Junchi Yan, Hongyang Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03105
  • Pdf link: https://arxiv.org/pdf/2304.03105
  • Abstract
    Multi-camera 3D object detection for autonomous driving is a challenging problem that has garnered notable attention from both academia and industry. An obstacle encountered in vision-based techniques involves the precise extraction of geometry-conscious features from RGB images. Recent approaches have utilized geometric-aware image backbones pretrained on depth-relevant tasks to acquire spatial information. However, these approaches overlook the critical aspect of view transformation, resulting in inadequate performance due to the misalignment of spatial knowledge between the image backbone and view transformation. To address this issue, we propose a novel geometric-aware pretraining framework called GAPretrain. Our approach incorporates spatial and structural cues to camera networks by employing the geometric-rich modality as guidance during the pretraining phase. The transference of modal-specific attributes across different modalities is non-trivial, but we bridge this gap by using a unified bird's-eye-view (BEV) representation and structural hints derived from LiDAR point clouds to facilitate the pretraining process. GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors. Our experiments demonstrate the effectiveness and generalization ability of the proposed method. We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively. We also conduct experiments on various image backbones and view transformations to validate the efficacy of our approach. Code will be released at https://github.com/OpenDriveLab/BEVPerception-Survey-Recipe.

SALUDA: Surface-based Automotive Lidar Unsupervised Domain Adaptation

  • Authors: Bjoern Michele, Alexandre Boulch, Gilles Puy, Tuan-Hung Vu, Renaud Marlet, Nicolas Courty
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03251
  • Pdf link: https://arxiv.org/pdf/2304.03251
  • Abstract
    Learning models on one labeled dataset that generalize well on another domain is a difficult task, as several shifts might happen between the data domains. This is notably the case for lidar data, for which models can exhibit large performance discrepancies due for instance to different lidar patterns or changes in acquisition conditions. This paper addresses the corresponding Unsupervised Domain Adaptation (UDA) task for semantic segmentation. To mitigate this problem, we introduce an unsupervised auxiliary task of learning an implicit underlying surface representation simultaneously on source and target data. As both domains share the same latent representation, the model is forced to accommodate discrepancies between the two sources of data. This novel strategy differs from classical minimization of statistical divergences or lidar-specific state-of-the-art domain adaptation techniques. Our experiments demonstrate that our method achieves a better performance than the current state of the art in synthetic-to-real and real-to-real scenarios.

Keyword: diffusion

DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model

  • Authors: Hoigi Seo, Hayeon Kim, Gwanghyun Kim, Se Young Chun
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.02827
  • Pdf link: https://arxiv.org/pdf/2304.02827
  • Abstract
    The increasing demand for high-quality 3D content creation has motivated the development of automated methods for creating 3D object models from a single image and/or a text prompt. However, the 3D objects reconstructed by state-of-the-art image-to-3D methods still exhibit low correspondence to the given image and low multi-view consistency. Recent state-of-the-art text-to-3D methods are also limited, yielding 3D samples with low diversity per prompt and long synthesis times. To address these challenges, we propose DITTO-NeRF, a novel pipeline to generate a high-quality 3D NeRF model from a text prompt or a single image. DITTO-NeRF first constructs a high-quality partial 3D object for limited in-boundary (IB) angles using the given or text-generated 2D image from the frontal view, and then iteratively reconstructs the remaining 3D NeRF using an inpainting latent diffusion model. We propose progressive 3D object reconstruction schemes in terms of scales (low to high resolution), angles (IB angles initially, outer-boundary (OB) angles later), and masks (object to background boundary), so that high-quality information on the IB region can be propagated to the OB region. DITTO-NeRF outperforms state-of-the-art methods in terms of fidelity and diversity, qualitatively and quantitatively, with much faster training times than prior image/text-to-3D methods such as DreamFusion and NeuralLift-360.

Benchmarking Robustness to Text-Guided Corruptions

  • Authors: Mohammadreza Mofayezi, Yasamin Medghalchi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02963
  • Pdf link: https://arxiv.org/pdf/2304.02963
  • Abstract
    This study investigates the robustness of image classifiers to text-guided corruptions. We utilize diffusion models to edit images to different domains. Unlike other works that use synthetic or hand-picked data for benchmarking, we use diffusion models because they are generative models capable of learning to edit images while preserving their semantic content. Thus, the corruptions are more realistic and the comparison more informative. Also, there is no need for manual labeling, and we can create large-scale benchmarks with less effort. We define a prompt hierarchy based on the original ImageNet hierarchy to apply edits in different domains. In addition to introducing a new benchmark, we investigate the robustness of different vision models. The results of this study demonstrate that the performance of image classifiers decreases significantly under different language-based corruptions and edit domains. We also observe that convolutional models are more robust than transformer architectures. Additionally, we see that common data augmentation techniques can improve performance on both the original data and the edited images. The findings of this research can help improve the design of image classifiers and contribute to the development of more robust machine learning systems. The code for generating the benchmark will be made available online upon publication.

DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance

  • Authors: Longwen Zhang, Qiwei Qiu, Hongyang Lin, Qixuan Zhang, Cheng Shi, Wei Yang, Ye Shi, Sibei Yang, Lan Xu, Jingyi Yu
  • Subjects: Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.03117
  • Pdf link: https://arxiv.org/pdf/2304.03117
  • Abstract
    Emerging Metaverse applications demand accessible, accurate, and easy-to-use tools for 3D digital human creation in order to depict different cultures and societies as if in the physical world. Recent large-scale vision-language advances pave the way for novices to conveniently customize 3D content. However, the generated CG-friendly assets still cannot represent the desired facial traits of human characters. In this paper, we present DreamFace, a progressive scheme to generate personalized 3D faces under text guidance. It enables lay users to naturally customize 3D facial assets that are compatible with CG pipelines, with desired shapes, textures, and fine-grained animation capabilities. From a text input describing the facial traits, we first introduce a coarse-to-fine scheme to generate the neutral facial geometry with a unified topology. We employ a selection strategy in the CLIP embedding space, and subsequently optimize both the detail displacements and normals using Score Distillation Sampling from a generic Latent Diffusion Model (LDM). Then, for neutral appearance generation, we introduce a dual-path mechanism, which combines the generic LDM with a novel texture LDM to ensure both diversity and textural specification in the UV space. We also employ a two-stage optimization to perform SDS in both the latent and image spaces to provide compact priors for fine-grained synthesis. Our generated neutral assets naturally support blendshape-based facial animations. We further improve the animation ability with personalized deformation characteristics by learning the universal expression prior using a cross-identity hypernetwork. Notably, DreamFace can generate realistic 3D facial assets with physically-based rendering quality and rich animation ability from video footage, even for fashion icons or exotic characters in cartoons and fiction movies.

Zero-shot Generative Model Adaptation via Image-specific Prompt Learning

  • Authors: Jiayi Guo, Chaofei Wang, You Wu, Eric Zhang, Kai Wang, Xingqian Xu, Shiji Song, Humphrey Shi, Gao Huang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03119
  • Pdf link: https://arxiv.org/pdf/2304.03119
  • Abstract
    Recently, CLIP-guided image synthesis has shown appealing performance on adapting a pre-trained source-domain generator to an unseen target domain. It does not require any target-domain samples but only the textual domain labels. The training is highly efficient, e.g., a few minutes. However, existing methods still have some limitations in the quality of generated images and may suffer from the mode collapse issue. A key reason is that a fixed adaptation direction is applied for all cross-domain image pairs, which leads to identical supervision signals. To address this issue, we propose an Image-specific Prompt Learning (IPL) method, which learns specific prompt vectors for each source-domain image. This produces a more precise adaptation direction for every cross-domain image pair, endowing the target-domain generator with greatly enhanced flexibility. Qualitative and quantitative evaluations on various domains demonstrate that IPL effectively improves the quality and diversity of synthesized images and alleviates the mode collapse. Moreover, IPL is independent of the structure of the generative model, such as generative adversarial networks or diffusion models. Code is available at https://github.com/Picsart-AI-Research/IPL-Zero-Shot-Generative-Model-Adaptation.

SketchFFusion: Sketch-guided image editing with diffusion model

  • Authors: Weihang Mao, Bo Han, Zihao Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03174
  • Pdf link: https://arxiv.org/pdf/2304.03174
  • Abstract
    Sketch-guided image editing aims to achieve local fine-tuning of an image based on the sketch information provided by the user, while maintaining the original status of the unedited areas. Due to the high cost of acquiring human sketches, previous works mostly relied on edge maps as a substitute for sketches, but sketches possess richer structural information. In this paper, we propose a sketch generation scheme that preserves the main contours of an image and closely adheres to the actual sketch style drawn by the user. At the same time, current image editing methods often face challenges such as image distortion, training cost, and loss of fine details in the sketch. To address these limitations, we propose a conditional diffusion model (SketchFFusion) based on the sketch structure vector. We evaluate the generative performance of our model and demonstrate that it outperforms existing methods.

Face Animation with an Attribute-Guided Diffusion Model

  • Authors: Bohan Zeng, Xuhui Liu, Sicheng Gao, Boyu Liu, Hong Li, Jianzhuang Liu, Baochang Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03199
  • Pdf link: https://arxiv.org/pdf/2304.03199
  • Abstract
    Face animation has achieved much progress in computer vision. However, prevailing GAN-based methods suffer from unnatural distortions and artifacts due to sophisticated motion deformation. In this paper, we propose a Face Animation framework with an attribute-guided Diffusion Model (FADM), which is the first work to exploit the superior modeling capacity of diffusion models for photo-realistic talking-head generation. To mitigate the uncontrollable synthesis effect of the diffusion model, we design an Attribute-Guided Conditioning Network (AGCN) to adaptively combine the coarse animation features and 3D face reconstruction results, which can incorporate appearance and motion conditions into the diffusion process. These specific designs help FADM rectify unnatural artifacts and distortions, and also enrich high-fidelity facial details through iterative diffusion refinements with accurate animation attributes. FADM can flexibly and effectively improve existing animation videos. Extensive experiments on widely used talking-head benchmarks validate the effectiveness of FADM over prior arts.

Inst-Inpaint: Instructing to Remove Objects with Diffusion Models

  • Authors: Ahmet Burak Yildirim, Vedat Baday, Erkut Erdem, Aykut Erdem, Aysegul Dundar
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03246
  • Pdf link: https://arxiv.org/pdf/2304.03246
  • Abstract
    The image inpainting task refers to erasing unwanted pixels from images and filling them in a semantically consistent and realistic way. Traditionally, the pixels to be erased are defined with binary masks. From the application point of view, a user needs to generate masks for the objects they would like to remove, which can be time-consuming and error-prone. In this work, we are interested in an image inpainting algorithm that both estimates which object to remove based on natural language input and removes it. For this purpose, we first construct a dataset named GQA-Inpaint for this task, which will be released soon. Second, we present a novel inpainting framework, Inst-Inpaint, that can remove objects from images based on instructions given as text prompts. We set up various GAN- and diffusion-based baselines and run experiments on synthetic and real image datasets. We compare methods with different evaluation metrics that measure the quality and accuracy of the models and show significant quantitative and qualitative improvements.

Diffusion Models as Masked Autoencoders

  • Authors: Chen Wei, Karttikeya Mangalam, Po-Yao Huang, Yanghao Li, Haoqi Fan, Hu Xu, Huiyu Wang, Cihang Xie, Alan Yuille, Christoph Feichtenhofer
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03283
  • Pdf link: https://arxiv.org/pdf/2304.03283
  • Abstract
    There has been a longstanding belief that generation can facilitate a true understanding of visual data. In line with this, we revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models. While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE). Our approach is capable of (i) serving as a strong initialization for downstream recognition tasks, (ii) conducting high-quality image inpainting, and (iii) being effortlessly extended to video where it produces state-of-the-art classification accuracy. We further perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
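
A minimal PyTorch sketch of the masked-diffusion idea described above: noise only the masked patches, keep the visible patches clean, and compute the loss on the masked region. The mask ratio, the cosine noise schedule, and the `model(x_in, t, mask)` signature are illustrative assumptions, not the paper's actual setup.

```python
import torch

def diffmae_style_step(model, x_patches, mask_ratio=0.75, T=1000):
    """One training step: denoise masked patches conditioned on visible ones.
    x_patches: (B, N, D) patch embeddings; `model` is a placeholder."""
    B, N, D = x_patches.shape
    n_mask = int(mask_ratio * N)
    idx = torch.rand(B, N).argsort(dim=1)
    mask = torch.zeros(B, N, dtype=torch.bool)
    mask.scatter_(1, idx[:, :n_mask], True)              # True = masked patch

    t = torch.randint(0, T, (B, 1, 1))
    abar = torch.cos(0.5 * torch.pi * t / T) ** 2        # assumed cosine schedule
    eps = torch.randn_like(x_patches)
    noised = abar.sqrt() * x_patches + (1 - abar).sqrt() * eps
    x_in = torch.where(mask.unsqueeze(-1), noised, x_patches)

    eps_hat = model(x_in, t.flatten(), mask)             # assumed signature
    return ((eps_hat - eps)[mask] ** 2).mean()           # loss on masked patches
```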

Keyword: dynamic

Abstraction-based Probabilistic Stability Analysis of Polyhedral Probabilistic Hybrid Systems

  • Authors: Spandan Das, Pavithra Prabhakar
  • Subjects: Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02647
  • Pdf link: https://arxiv.org/pdf/2304.02647
  • Abstract
    In this paper, we consider the problem of probabilistic stability analysis of a subclass of Stochastic Hybrid Systems, namely, Polyhedral Probabilistic Hybrid Systems (PPHS), where the flow dynamics is given by a polyhedral inclusion, the discrete switching between modes happens probabilistically at the boundaries of their invariant regions, and the continuous state is not reset during switching. We present an abstraction-based analysis framework that consists of constructing a finite Markov Decision Process (MDP) such that verification of a certain property on the finite MDP ensures the satisfaction of probabilistic stability on the PPHS. Further, we present a polynomial-time algorithm for verifying the corresponding property on the MDP. Our experimental analysis demonstrates the feasibility of the approach in successfully verifying probabilistic stability on PPHS of various dimensions and sizes.

Emergent Coordination through Game-Induced Nonlinear Opinion Dynamics

  • Authors: Haimin Hu, Kensuke Nakamura, Kai-Chieh Hsu, Naomi Ehrich Leonard, Jaime Fernández Fisac
  • Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02687
  • Pdf link: https://arxiv.org/pdf/2304.02687
  • Abstract
    We present a multi-agent decision-making framework for the emergent coordination of autonomous agents whose intents are initially undecided. Dynamic non-cooperative games have been used to encode multi-agent interaction, but ambiguity arising from factors such as goal preference or the presence of multiple equilibria may lead to coordination issues, ranging from the "freezing robot" problem to unsafe behavior in safety-critical events. The recently developed nonlinear opinion dynamics (NOD) provide guarantees for breaking deadlocks. However, choosing the appropriate model parameters automatically in general multi-agent settings remains a challenge. In this paper, we first propose a novel and principled procedure for synthesizing NOD based on the value functions of dynamic games conditioned on agents' intents. In particular, we provide, for the two-player two-option case, precise stability conditions for equilibria of the game-induced NOD based on the mismatch between agents' opinions and their game values. We then propose a trajectory optimization algorithm that computes agents' policies guided by the evolution of opinions. The efficacy of our method is illustrated with a simulated toll station coordination example.
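
For readers new to NOD, the deadlock-breaking mechanism can be seen in a one-dimensional toy version of the opinion equation used in this line of work; the reduction to a single agent and option and all parameter values below are illustrative assumptions, not the paper's game-induced construction.

```python
import numpy as np

# z is the agent's opinion; z = 0 is the undecided (deadlocked) state.
# For u * alpha > d the neutral point becomes unstable, so even a tiny
# bias b makes the opinion commit to one option, breaking the deadlock.
d, u, alpha, b, dt = 1.0, 2.0, 1.0, 1e-3, 0.01
z = 0.0
for _ in range(5000):
    z += dt * (-d * z + u * np.tanh(alpha * z) + b)   # Euler step of the NOD
print(z)  # converges to a committed (nonzero) opinion
```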

Going Further: Flatness at the Rescue of Early Stopping for Adversarial Example Transferability

  • Authors: Martin Gubri, Maxime Cordy, Yves Le Traon
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.02688
  • Pdf link: https://arxiv.org/pdf/2304.02688
  • Abstract
    Transferability is the property of adversarial examples to be misclassified by models other than the surrogate model for which they were crafted. Previous research has shown that transferability is substantially increased when the training of the surrogate model has been early stopped. A common hypothesis to explain this is that the later training epochs are when models learn the non-robust features that adversarial attacks exploit. Hence, an early-stopped model is more robust (hence, a better surrogate) than fully trained models. We demonstrate that the reasons why early stopping improves transferability lie in the side effects it has on the learning dynamics of the model. We first show that early stopping benefits transferability even on models learning from data with non-robust features. We then establish links between transferability and the exploration of the loss landscape in the parameter space, on which early stopping has an inherent effect. More precisely, we observe that transferability peaks when the learning rate decays, which is also the time at which the sharpness of the loss significantly drops. This leads us to propose RFN, a new approach for transferability that minimizes loss sharpness during training in order to maximize transferability. We show that by searching for large flat neighborhoods, RFN always improves over early stopping (by up to 47 points of transferability rate) and is competitive with (if not better than) strong state-of-the-art baselines.
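
Since the abstract does not spell out RFN's update rule, here is a generic sharpness-reducing training step in the same flat-minima spirit (a SAM-style two-pass update; treat it as a sketch of the idea, not RFN's actual procedure):

```python
import torch

def flatness_seeking_step(model, loss_fn, x, y, optimizer, rho=0.05):
    """Perturb weights toward the locally sharpest direction, then descend
    from the perturbed point, which favors flat neighborhoods."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                                   # gradient at current weights
    grads = [p.grad.detach().clone() for p in model.parameters()]
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)) + 1e-12
    with torch.no_grad():                             # ascend to worst-case point
        for p, g in zip(model.parameters(), grads):
            p.add_(rho * g / norm)
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()                   # gradient at perturbed point
    with torch.no_grad():                             # undo the perturbation
        for p, g in zip(model.parameters(), grads):
            p.sub_(rho * g / norm)
    optimizer.step()                                  # sharpness-aware descent
    return loss.item()
```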

ACTION++: Improving Semi-supervised Medical Image Segmentation with Adaptive Anatomical Contrast

  • Authors: Chenyu You, Weicheng Dai, Yifei Min, Lawrence Staib, Jas Sekhon, James S. Duncan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.02689
  • Pdf link: https://arxiv.org/pdf/2304.02689
  • Abstract
    Medical data often exhibits long-tail distributions with heavy class imbalance, which naturally leads to difficulty in classifying the minority classes (i.e., boundary regions or rare objects). Recent work has significantly improved semi-supervised medical image segmentation in long-tailed scenarios by equipping them with unsupervised contrastive criteria. However, it remains unclear how well they will perform in the labeled portion of data where class distribution is also highly imbalanced. In this work, we present ACTION++, an improved contrastive learning framework with adaptive anatomical contrast for semi-supervised medical segmentation. Specifically, we propose an adaptive supervised contrastive loss, where we first compute the optimal locations of class centers uniformly distributed on the embedding space (i.e., off-line), and then perform online contrastive matching training by encouraging different class features to adaptively match these distinct and uniformly distributed class centers. Moreover, we argue that blindly adopting a constant temperature $\tau$ in the contrastive loss on long-tailed medical data is not optimal, and propose to use a dynamic $\tau$ via a simple cosine schedule to yield better separation between majority and minority classes. Empirically, we evaluate ACTION++ on ACDC and LA benchmarks and show that it achieves state-of-the-art across two semi-supervised settings. Theoretically, we analyze the performance of adaptive anatomical contrast and confirm its superiority in label efficiency.
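
The dynamic temperature is the most directly reusable ingredient here: a half-cosine anneal of the contrastive temperature over training. A minimal sketch (the endpoint values are assumptions, not the paper's):

```python
import math

def cosine_temperature(step, total_steps, tau_max=0.3, tau_min=0.07):
    """Anneal the contrastive temperature from tau_max down to tau_min
    along a half-cosine; plug the result into the contrastive loss."""
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return tau_min + (tau_max - tau_min) * cos
```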

Recovering Continuous Scene Dynamics from A Single Blurry Image with Events

  • Authors: Zhangyi Cheng, Xiang Zhang, Lei Yu, Jianzhuang Liu, Wen Yang, Gui-Song Xia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02695
  • Pdf link: https://arxiv.org/pdf/2304.02695
  • Abstract
    This paper aims at demystifying a single motion-blurred image with events and revealing temporally continuous scene dynamics encrypted behind motion blurs. To achieve this end, an Implicit Video Function (IVF) is learned to represent a single motion blurred image with concurrent events, enabling the latent sharp image restoration of arbitrary timestamps in the range of imaging exposures. Specifically, a dual attention transformer is proposed to efficiently leverage merits from both modalities, i.e., the high temporal resolution of event features and the smoothness of image features, alleviating temporal ambiguities while suppressing the event noise. The proposed network is trained only with the supervision of ground-truth images of limited referenced timestamps. Motion- and texture-guided supervisions are employed simultaneously to enhance restorations of the non-referenced timestamps and improve the overall sharpness. Experiments on synthetic, semi-synthetic, and real-world datasets demonstrate that our proposed method outperforms state-of-the-art methods by a large margin in terms of both objective PSNR and SSIM measurements and subjective evaluations.

Efficient and Accurate Automatic Python Bindings with cppyy & Cling

  • Authors: Baidyanath Kundu (1 and 2), Vassil Vassilev (1 and 2), Wim Lavrijsen (3) ((1) European Council for Nuclear Research, (2) Princeton University (US), (3) LBNL (US))
  • Subjects: Programming Languages (cs.PL)
  • Arxiv link: https://arxiv.org/abs/2304.02712
  • Pdf link: https://arxiv.org/pdf/2304.02712
  • Abstract
    The simplicity of Python and the power of C++ force stark choices on a scientific software stack. There have been multiple developments to mitigate language boundaries by implementing language bindings, but the impedance mismatch between the static nature of C++ and the dynamic one of Python hinders their implementation; examples include the use of user-defined Python types with templated C++ and advanced memory management. The development of the C++ interpreter Cling has changed the way we can think of language bindings as it provides an incremental compilation infrastructure available at runtime. That is, Python can interrogate C++ on demand, and bindings can be lazily constructed at runtime. This automatic binding provision requires no direct support from library authors and offers better performance than alternative solutions, such as PyBind11. ROOT pioneered this approach with PyROOT, which was later enhanced with its successor, cppyy. However, until now, cppyy relied on the reflection layer of ROOT, which is limited in terms of provided features and performance. This paper presents the next step for language interoperability with cppyy, enabling research into uniform cross-language execution environments and boosting optimization opportunities across language boundaries. We illustrate the use of advanced C++ in Numba-accelerated Python through cppyy. We outline a path forward for re-engineering parts of cppyy to use upstream LLVM components to improve performance and sustainability. We demonstrate cppyy purely based on a C++ reflection library, InterOp, which offers interoperability primitives based on Cling and Clang-Repl.
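
For readers who have not used cppyy, the runtime-binding workflow described above looks roughly like this toy example: C++ is declared at runtime, Cling JIT-compiles it, and the Python proxies are constructed lazily on first use.

```python
import cppyy

cppyy.cppdef("""
class Counter {
public:
    void tick() { ++n_; }
    int count() const { return n_; }
private:
    int n_ = 0;
};

template <typename T>
T twice(T x) { return x + x; }
""")

c = cppyy.gbl.Counter()       # Python proxy for the JIT-compiled C++ class
c.tick(); c.tick()
print(c.count())              # -> 2
print(cppyy.gbl.twice(21))    # template instantiated on demand -> 42
```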

Software and Analysis for Dynamic Voronoi Diagrams in the Hilbert Metric

  • Authors: Madeline Bumpus, Caesar Dai, Auguste H. Gezalyan, Sam Munoz, Renita Santhoshkumar, Songyu Ye, David M. Mount
  • Subjects: Computational Geometry (cs.CG)
  • Arxiv link: https://arxiv.org/abs/2304.02745
  • Pdf link: https://arxiv.org/pdf/2304.02745
  • Abstract
    The Hilbert metric is a projective metric defined on a convex body which generalizes the Cayley-Klein model of hyperbolic geometry to any convex set. In this paper we analyze Hilbert Voronoi diagrams in the dynamic setting. In addition, we introduce dynamic visualization software for Voronoi diagrams in the Hilbert metric on user-specified convex polygons.
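
As background for the metric itself: the Hilbert distance between two interior points of a convex polygon is half the log of the cross-ratio of the chord through them. A small self-contained sketch (no handling of degenerate or boundary cases) follows.

```python
import math
import numpy as np

def hilbert_distance(p, q, polygon):
    """Hilbert distance between distinct interior points p, q of a convex
    polygon given as an (n, 2) vertex array in order. Illustrative only."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    d = q - p                      # line x(t) = p + t*d, p at t=0, q at t=1
    ts = []
    n = len(polygon)
    for i in range(n):
        a = np.asarray(polygon[i], float)
        e = np.asarray(polygon[(i + 1) % n], float) - a
        denom = d[0] * e[1] - d[1] * e[0]        # cross(d, e)
        if abs(denom) < 1e-12:
            continue                             # line parallel to this edge
        w = a - p
        t = (w[0] * e[1] - w[1] * e[0]) / denom  # param along the line
        s = (w[0] * d[1] - w[1] * d[0]) / denom  # param along the edge
        if 0.0 <= s <= 1.0:
            ts.append(t)
    t_min, t_max = min(ts), max(ts)              # boundary hits: t_min < 0, t_max > 1
    # Cross-ratio expressed through the line parameter.
    ratio = ((1.0 - t_min) * t_max) / ((-t_min) * (t_max - 1.0))
    return 0.5 * math.log(ratio)

# Example: Hilbert distance inside the unit square.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hilbert_distance((0.3, 0.5), (0.7, 0.5), square))
```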

Adaptive Headway Motion Control and Motion Prediction for Safe Unicycle Motion Design

  • Authors: Aykut İşleyen, Nathan van de Wouw, Ömür Arslan
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02760
  • Pdf link: https://arxiv.org/pdf/2304.02760
  • Abstract
    Differential drive robots that can be modeled as a kinematic unicycle are a standard mobile base platform for many service and logistics robots. Safe and smooth autonomous motion around obstacles is a crucial skill for unicycle robots to perform diverse tasks in complex environments. A classical approach for unicycle control is feedback linearization using a headway point at a fixed headway distance in front of the unicycle. Unicycle headway control brings the headway point to a desired goal location by embedding linear headway reference dynamics, which often results in an undesired offset for the actual unicycle position. In this paper, we introduce a new unicycle headway control approach with an adaptive headway distance that overcomes this limitation, i.e., when the headway point reaches the goal, the unicycle position is also at the goal. By systematically analyzing the closed-loop unicycle motion under the adaptive headway controller, we design analytical feedback motion prediction methods that bound the closed-loop unicycle position trajectory and so can be effectively used for safety assessment and safe unicycle motion design around obstacles. We present an application of adaptive headway motion control and motion prediction for safe unicycle path following around obstacles in numerical simulations.
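
To make the headway construction concrete, here is a minimal sketch of the control law with an adaptive headway distance that shrinks with the distance to the goal; the gains, the specific choice d = eps * ||pos - goal||, and the neglect of the headway-distance rate in the linearization are all simplifying assumptions.

```python
import numpy as np

def adaptive_headway_control(x, y, theta, goal, k=1.0, eps=0.5):
    """Return (v, omega) driving the headway point to `goal`.
    The headway point is h = pos + d * heading, with adaptive d."""
    pos = np.array([x, y])
    d = eps * np.linalg.norm(pos - goal) + 1e-9        # adaptive headway distance
    heading = np.array([np.cos(theta), np.sin(theta)])
    h = pos + d * heading                              # headway point
    hdot = -k * (h - goal)                             # desired linear dynamics
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    v, d_omega = R.T @ hdot                            # [v, d*omega] (d-dot neglected)
    return v, d_omega / d
```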

A Robust Observer with Gyroscopic Bias Correction for Rotational Dynamics

  • Authors: Erjen Lefeber, Marcus Greiff, Anders Robertsson
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02763
  • Pdf link: https://arxiv.org/pdf/2304.02763
  • Abstract
    We propose an observer for rotational dynamics subject to directional and gyroscopic measurements, which simultaneously estimates the gyroscopic biases and attitude rates. We show uniform almost global asymptotic and local exponential stability of the resulting error dynamics, implying robustness against bounded disturbances. This robustness is quantified with respect to a popular nonlinear complementary filter in quantitative simulation studies, and we explore how the measurement noise propagates to the asymptotic errors as a function of tuning. This is an extended version of a paper with the same title (to appear at IFAC WC 2023). Additional mathematical details are provided in this extended version.

MoStGAN-V: Video Generation with Temporal Motion Styles

  • Authors: Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02777
  • Pdf link: https://arxiv.org/pdf/2304.02777
  • Abstract
    Video generation remains a challenging task due to spatiotemporal complexity and the requirement of synthesizing diverse motions with temporal consistency. Previous works attempt to generate videos of arbitrary length either in an autoregressive manner or by regarding time as a continuous signal. However, they struggle to synthesize detailed and diverse motions with temporal coherence and tend to generate repetitive scenes after a few time steps. In this work, we argue that a single time-agnostic latent vector of a style-based generator is insufficient to model diverse and temporally consistent motions. Hence, we introduce additional time-dependent motion styles to model diverse motion patterns. In addition, a Motion Style Attention modulation mechanism, dubbed MoStAtt, is proposed to augment frames with vivid dynamics at each specific scale (i.e., layer); it assigns an attention score to each motion style w.r.t. the deconvolution filter weights in the target synthesis layer and softly attends to different motion styles for weight modulation. Experimental results show our model achieves state-of-the-art performance on four unconditional $256^2$ video synthesis benchmarks trained with only 3 frames per clip and produces better qualitative results with respect to dynamic motions. Code and videos have been made available at https://github.com/xiaoqian-shen/MoStGAN-V.

Enhanced Grid Following Inverter: A Uniform Control Design Framework

  • Authors: Alireza Askarian, Jaesang Park, Srinivasa Salapaka
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02792
  • Pdf link: https://arxiv.org/pdf/2304.02792
  • Abstract
    This article presents a novel grid following (GFL) inverter control design framework that exploits the line dynamics structure in the $dq$ frame and treats the inverter as an actuator. The proposed framework imposes a structure on the line's coupled dynamics and captures the effect of coupling on the GFL inverter's closed-loop stability and performance. One of the main features of our work is the use of the Bode sensitivity integral to characterize the fundamental limitations of control design. These constraints translate into fundamental trade-offs between performance objectives such as reference tracking, closed-loop bandwidth, robust synchronization, and resilience to grid anomalies. The article develops design considerations to ensure specific trade-offs. We assess the performance of our proposed framework through simulation and experimental results.

Unveiling the Dynamics of Censorship, COVID-19 Regulations, and Protest: An Empirical Study of Chinese Subreddit r/china_irl

  • Authors: Siyi Zhou, Luca Luceri, Emilio Ferrara
  • Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2304.02800
  • Pdf link: https://arxiv.org/pdf/2304.02800
  • Abstract
    The COVID-19 pandemic has intensified numerous social issues that warrant academic investigation. Although information dissemination has been extensively studied, the silenced voices and censored content also merit attention due to their role in mobilizing social movements. In this paper, we provide empirical evidence to explore the relationships among COVID-19 regulations, censorship, and protest through a series of social incidents that occurred in China during 2022. We analyze the similarities and differences between censored articles and discussions on r/china_irl, the most popular Chinese-speaking subreddit, and scrutinize the temporal dynamics of government censorship activities and their impact on user engagement within the subreddit. Furthermore, we examine users' linguistic patterns under the influence of a censorship-driven environment. Our findings reveal patterns in topic recurrence, the complex interplay between censorship activities, user subscription, and collective commenting behavior, as well as potential linguistic adaptation strategies to circumvent censorship. These insights hold significant implications for researchers interested in understanding the survival mechanisms of marginalized groups within censored information ecosystems.

Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit Diversity Modeling

  • Authors: Haotao Wang, Ziyu Jiang, Yan Han, Zhangyang Wang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02806
  • Pdf link: https://arxiv.org/pdf/2304.02806
  • Abstract
    Graph neural networks (GNNs) have been widely applied to learning over graph data. Yet, real-world graphs commonly exhibit diverse graph structures and contain heterogeneous nodes and edges. Moreover, to enhance the generalization ability of GNNs, it has become common practice to further increase the diversity of training graph structures by incorporating graph augmentations and/or performing large-scale pre-training on more graphs. Therefore, it becomes essential for a GNN to simultaneously model diverse graph structures. Yet, naively increasing the GNN model capacity will suffer from both higher inference costs and the notorious trainability issue of GNNs. This paper introduces the Mixture-of-Experts (MoE) idea to GNNs, aiming to enhance their ability to accommodate the diversity of training graph structures, without incurring computational overheads. Our new Graph Mixture of Experts (GMoE) model enables each node in the graph to dynamically select its own optimal *information aggregation experts*. These experts are trained to model different subgroups of graph structures in the training set. Additionally, GMoE includes information aggregation experts with varying aggregation hop sizes, where the experts with larger hop sizes are specialized in capturing information over longer ranges. The effectiveness of GMoE is verified through experimental results on a large variety of graph, node, and link prediction tasks in the OGB benchmark. For instance, it enhances ROC-AUC by 1.81% in ogbg-molhiv and by 1.40% in ogbg-molbbbp, compared to the non-MoE baselines. Our code is available at https://github.com/VITA-Group/Graph-Mixture-of-Experts.
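
A toy per-node mixture-of-experts aggregation layer in this spirit might look as follows; the dense normalized adjacency and experts that differ only by hop size are simplifying assumptions for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphMoELayer(nn.Module):
    """Each node softly selects among aggregation experts with different
    hop sizes; real implementations would use sparse ops and top-k gating."""

    def __init__(self, dim, hops=(1, 2, 3)):
        super().__init__()
        self.hops = hops
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in hops)
        self.gate = nn.Linear(dim, len(hops))

    def forward(self, X, A_hat):                   # X: (N, dim), A_hat: (N, N)
        w = F.softmax(self.gate(X), dim=-1)        # per-node gates, (N, E)
        outs = []
        for hop, expert in zip(self.hops, self.experts):
            H = X
            for _ in range(hop):                   # hop-step neighborhood average
                H = A_hat @ H
            outs.append(expert(H))
        out = torch.stack(outs, dim=1)             # (N, E, dim)
        return (w.unsqueeze(-1) * out).sum(dim=1)  # gated mixture per node
```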

Causal Repair of Learning-enabled Cyber-physical Systems

  • Authors: Pengyuan Lu, Ivan Ruchkin, Matthew Cleaveland, Oleg Sokolsky, Insup Lee
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.02813
  • Pdf link: https://arxiv.org/pdf/2304.02813
  • Abstract
    Models of actual causality leverage domain knowledge to generate convincing diagnoses of events that caused an outcome. It is promising to apply these models to diagnose and repair run-time property violations in cyber-physical systems (CPS) with learning-enabled components (LEC). However, given the high diversity and complexity of LECs, it is challenging to encode domain knowledge (e.g., the CPS dynamics) in a scalable actual causality model that could generate useful repair suggestions. In this paper, we focus causal diagnosis on the input/output behaviors of LECs. Specifically, we aim to identify which subset of I/O behaviors of the LEC is an actual cause for a property violation. An important by-product is a counterfactual version of the LEC that repairs the run-time property by fixing the identified problematic behaviors. Based on these insights, we design a two-step diagnostic pipeline: (1) construct a Halpern-Pearl causality model that reflects the dependency of the property outcome on the component's I/O behaviors, and (2) perform a search for an actual cause and the corresponding repair on the model. We prove that our pipeline has the following guarantee: if an actual cause is found, the system is guaranteed to be repaired; otherwise, we have high probabilistic confidence that the LEC under analysis did not cause the property violation. We demonstrate that our approach successfully repairs learned controllers on a standard OpenAI Gym benchmark.

NTK-SAP: Improving neural network pruning by aligning training dynamics

  • Authors: Yite Wang, Dawei Li, Ruoyu Sun
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.02840
  • Pdf link: https://arxiv.org/pdf/2304.02840
  • Abstract
    Pruning neural networks before training has received increasing interest due to its potential to reduce training time and memory. One popular method is to prune the connections based on a certain metric, but it is not entirely clear what metric is the best choice. Recent advances in neural tangent kernel (NTK) theory suggest that the training dynamics of large enough neural networks is closely related to the spectrum of the NTK. Motivated by this finding, we propose to prune the connections that have the least influence on the spectrum of the NTK. This method can help maintain the NTK spectrum, which may help align the training dynamics to that of its dense counterpart. However, one possible issue is that the fixed-weight-NTK corresponding to a given initial point can be very different from the NTK corresponding to later iterates during the training phase. We further propose to sample multiple realizations of random weights to estimate the NTK spectrum. Note that our approach is weight-agnostic, which is different from most existing methods that are weight-dependent. In addition, we use random inputs to compute the fixed-weight-NTK, making our method data-agnostic as well. We name our foresight pruning algorithm Neural Tangent Kernel Spectrum-Aware Pruning (NTK-SAP). Empirically, our method achieves better performance than all baselines on multiple datasets.
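
A rough sketch of the weight- and data-agnostic scoring idea (not the paper's exact estimator): use a parameter's averaged squared output-gradient, over several random weight draws and random inputs, as a proxy for its contribution to the NTK trace, and prune the smallest scores.

```python
import torch

def foresight_scores(make_model, in_shape=(3, 32, 32), n_draws=5, n_inputs=8):
    """make_model() must return a freshly, randomly initialized network."""
    scores = None
    for _ in range(n_draws):
        model = make_model()                    # new random weight realization
        x = torch.randn(n_inputs, *in_shape)    # random, data-free inputs
        out = model(x).sum()
        grads = torch.autograd.grad(out, tuple(model.parameters()))
        sq = [g.detach() ** 2 for g in grads]
        scores = sq if scores is None else [s + q for s, q in zip(scores, sq)]
    return [s / n_draws for s in scores]        # prune weights with smallest score
```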

Design and Control of a Ballbot Drivetrain with High Agility, Minimal Footprint, and High Payload

  • Authors: Chenzhang Xiao, Mahshid Mansouri, David Lam, Joao Ramos, Elizabeth T. Hsiao-Wecksler
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02887
  • Pdf link: https://arxiv.org/pdf/2304.02887
  • Abstract
    This paper presents the design and control of a ballbot drivetrain that aims to achieve high agility, minimal footprint, and high payload capacity while maintaining dynamic stability. Two hardware platforms and analytical models were developed to test design and control methodologies. The full-scale ballbot prototype (MiaPURE) was constructed using off-the-shelf components and designed to have agility, footprint, and balance similar to that of a walking human. The planar inverted pendulum testbed (PIPTB) was developed as a reduced-order testbed for quick validation of system performance. We then proposed a simple yet robust LQR-PI controller to balance and maneuver the ballbot drivetrain with a heavy payload. This is crucial because the drivetrain is often subject to high stiction due to elastomeric components in the torque transmission system. This controller was first tested in the PIPTB to compare with traditional LQR and cascaded PI-PD controllers, and then implemented in the ballbot drivetrain. The MiaPURE drivetrain was able to carry a payload of 60 kg, achieve a maximum speed of 2.3 m/s, and come to a stop from a speed of 1.4 m/s in 2 seconds in a selected translation direction. Finally, we demonstrated the omnidirectional movement of the ballbot drivetrain in an indoor environment as a payload-carrying robot and a human-riding mobility device. Our experiments demonstrated the feasibility of using the ballbot drivetrain as a universal mobility platform with agile movements, minimal footprint, and high payload capacity using our proposed design and control methodologies.

LSketch: A Label-Enabled Graph Stream Sketch Toward Time-Sensitive Queries

  • Authors: Yiling Zeng, Chunyao Song, Yuhan Li, Tingjian Ge
  • Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.02897
  • Pdf link: https://arxiv.org/pdf/2304.02897
  • Abstract
    Graph streams represent data interactions in real applications. The mining of graph streams plays an important role in network security, social network analysis, and traffic control, among others. However, their sheer volume and high dynamics pose great challenges for efficient storage and subsequent query analysis. Current studies apply sketches to summarize graph streams. We propose LSketch, which works for heterogeneous graph streams and effectively preserves the label information carried by the streams in real scenes, thereby enriching the expressive ability of sketches. In addition, as graph streams continue to evolve over time, edges that are too old may lose their practical significance. Therefore, we introduce the sliding window model into LSketch to eliminate expired edges automatically. LSketch uses sub-linear storage space and can support structure-based queries and time-sensitive queries with high accuracy. We perform extensive experiments over four real datasets, demonstrating the superiority of the proposed method over state-of-the-art methods in terms of query accuracy and time efficiency.
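
A toy labeled sliding-window sketch conveys the two ingredients (label-preserving hashing plus automatic expiry); note that this illustration keeps an explicit arrival log, whereas the actual LSketch is designed for sub-linear space.

```python
import time
from collections import deque

class LabeledWindowSketch:
    """Hashed adjacency counts keyed by (h(src), h(dst), label), with a
    sliding window that expires edges older than `window` seconds."""

    def __init__(self, width=1024, window=3600.0):
        self.width, self.window = width, window
        self.counts = {}        # (h_src, h_dst, label) -> accumulated weight
        self.log = deque()      # (timestamp, key, weight) in arrival order

    def _key(self, u, v, label):
        return (hash(u) % self.width, hash(v) % self.width, label)

    def add(self, u, v, label, w=1.0, ts=None):
        ts = time.time() if ts is None else ts
        self._expire(ts)
        k = self._key(u, v, label)
        self.counts[k] = self.counts.get(k, 0.0) + w
        self.log.append((ts, k, w))

    def _expire(self, now):
        while self.log and now - self.log[0][0] > self.window:
            _, k, w = self.log.popleft()
            self.counts[k] -= w
            if self.counts[k] <= 0:
                del self.counts[k]

    def edge_weight(self, u, v, label):
        return self.counts.get(self._key(u, v, label), 0.0)
```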

Quantifying and Defending against Privacy Threats on Federated Knowledge Graph Embedding

  • Authors: Yuke Hu, Wei Liang, Ruofan Wu, Kai Xiao, Weiqiang Wang, Xiaochen Li, Jinfei Liu, Zhan Qin
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.02932
  • Pdf link: https://arxiv.org/pdf/2304.02932
  • Abstract
    Knowledge Graph Embedding (KGE) is a fundamental technique that extracts expressive representations from a knowledge graph (KG) to facilitate diverse downstream tasks. The emerging federated KGE (FKGE) collaboratively trains from distributed KGs held among clients while avoiding the exchange of clients' sensitive raw KGs, but it can still suffer from privacy threats, as evidenced in the federated training of other models (e.g., neural networks). However, quantifying and defending against such privacy threats remain unexplored for FKGE, which possesses unique properties not shared by previously studied models. In this paper, we conduct the first holistic study of the privacy threat on FKGE from both attack and defense perspectives. For the attack, we quantify the privacy threat by proposing three new inference attacks, which reveal substantial privacy risk by successfully inferring the existence of KG triples from victim clients. For the defense, we propose DP-Flames, a novel differentially private FKGE with private selection, which offers a better privacy-utility tradeoff by exploiting the entity-binding sparse gradient property of FKGE and comes with a tight privacy accountant by incorporating the state-of-the-art private selection technique. We further propose an adaptive privacy budget allocation policy to dynamically adjust the defense magnitude across the training procedure. Comprehensive evaluations demonstrate that the proposed defense can successfully mitigate the privacy threat by effectively reducing the success rate of inference attacks from 83.1% to 59.4% on average with only a modest utility decrease.

Adaptable and Interpretable Framework for Novelty Detection in Real-Time IoT Systems

  • Authors: Marek Wadinger, Michal Kvasnica
  • Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.02947
  • Pdf link: https://arxiv.org/pdf/2304.02947
  • Abstract
    This paper presents the Real-time Adaptive and Interpretable Detection (RAID) algorithm. The novel approach addresses the limitations of state-of-the-art anomaly detection methods for multivariate dynamic processes, which are restricted to detecting anomalies within the scope of the model training conditions. The RAID algorithm adapts to non-stationary effects such as data drift and change points that may not be accounted for during model development, resulting in a prolonged service life. A dynamic model based on a joint probability distribution handles anomalous-behavior detection and root-cause isolation based on adaptive process limits. The RAID algorithm does not require changes to existing process automation infrastructures, making it highly deployable across different domains. Two case studies involving real dynamic system data demonstrate the benefits of the RAID algorithm, including change point adaptation, root cause isolation, and improved detection accuracy.
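
The adaptive-limit idea can be illustrated with a scalar online detector whose mean and variance are updated with exponential forgetting, so slow drift is absorbed into the limits while abrupt changes are flagged; this is a sketch, not the paper's joint-distribution model.

```python
class AdaptiveLimits:
    """Online anomaly detector with adaptive n-sigma process limits.
    alpha sets the adaptation speed (an illustrative choice)."""

    def __init__(self, alpha=0.01, n_sigma=3.0):
        self.alpha, self.n_sigma = alpha, n_sigma
        self.mean, self.var = 0.0, 1.0
        self.initialized = False

    def update(self, x):
        if not self.initialized:
            self.mean, self.initialized = x, True
            return False
        half = self.n_sigma * self.var ** 0.5
        anomaly = abs(x - self.mean) > half      # outside the current limits?
        d = x - self.mean                        # then adapt the limits
        self.mean += self.alpha * d
        self.var = (1 - self.alpha) * (self.var + self.alpha * d * d)
        return anomaly
```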

FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead

  • Authors: Kang Chen, Tao Han, Junchao Gong, Lei Bai, Fenghua Ling, Jing-Jia Luo, Xi Chen, Leiming Ma, Tianning Zhang, Rui Su, Yuanzheng Ci, Bin Li, Xiaokang Yang, Wanli Ouyang
  • Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
  • Arxiv link: https://arxiv.org/abs/2304.02948
  • Pdf link: https://arxiv.org/pdf/2304.02948
  • Abstract
    We present FengWu, an advanced data-driven global medium-range weather forecast system based on Artificial Intelligence (AI). Different from existing data-driven weather forecast methods, FengWu solves the medium-range forecast problem from a multi-modal and multi-task perspective. Specifically, a deep learning architecture equipped with model-specific encoder-decoders and cross-modal fusion Transformer is elaborately designed, which is learned under the supervision of an uncertainty loss to balance the optimization of different predictors in a region-adaptive manner. Besides this, a replay buffer mechanism is introduced to improve medium-range forecast performance. With 39-year data training based on the ERA5 reanalysis, FengWu is able to accurately reproduce the atmospheric dynamics and predict the future land and atmosphere states at 37 vertical levels on a 0.25° latitude-longitude resolution. Hindcasts of 6-hourly weather in 2018 based on ERA5 demonstrate that FengWu performs better than GraphCast in predicting 80% of the 880 reported predictands, e.g., reducing the root mean square error (RMSE) of 10-day lead global z500 prediction from 733 to 651 $m^{2}/s^2$. In addition, the inference cost of each iteration is merely 600ms on NVIDIA Tesla A100 hardware. The results suggest that FengWu can significantly improve the forecast skill and extend the skillful global medium-range weather forecast out to 10.75 days lead (with ACC of z500 > 0.6) for the first time.

Deep Long-Short Term Memory networks: Stability properties and Experimental validation

  • Authors: Fabio Bonassi, Alessio La Bella, Giulio Panzani, Marcello Farina, Riccardo Scattolini
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.02975
  • Pdf link: https://arxiv.org/pdf/2304.02975
  • Abstract
    The aim of this work is to investigate the use of Incrementally Input-to-State Stable ($\delta$ISS) deep Long Short-Term Memory networks (LSTMs) for the identification of nonlinear dynamical systems. We show that suitable sufficient conditions on the weights of the network can be leveraged to set up a training procedure able to learn provably $\delta$ISS LSTM models from data. The proposed approach is tested on a real brake-by-wire apparatus to identify a model of the system from experimentally collected input-output data. Results show satisfactory modeling performance.

Distributed Model Predictive Control for Periodic Cooperation of Multi-Agent Systems

  • Authors: Matthias Köhler, Matthias A. Müller, Frank Allgöwer
  • Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.03002
  • Pdf link: https://arxiv.org/pdf/2304.03002
  • Abstract
    We consider multi-agent systems with heterogeneous, nonlinear agents subject to individual constraints that want to achieve a periodic, dynamic cooperative control goal which can be characterised by a set and a suitable cost. We propose a sequential distributed model predictive control (MPC) scheme in which agents sequentially solve an individual optimisation problem to track an artificial periodic output trajectory. The optimisation problems are coupled through these artificial periodic output trajectories, which are communicated and penalised using the cost that characterises the cooperative goal. The agents communicate only their artificial trajectories and only once per time step. We show that under suitable assumptions, the agents can incrementally move their artificial output trajectories towards the cooperative goal, and, hence, their closed-loop output trajectories asymptotically achieve it. We illustrate the scheme with a simulation example.

IoT Federated Blockchain Learning at the Edge

  • Authors: James Calo, Benny Lo
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.03006
  • Pdf link: https://arxiv.org/pdf/2304.03006
  • Abstract
    IoT devices are sorely underutilized in the medical field, especially within machine learning for medicine, yet they offer unrivaled benefits. IoT devices are low-cost, energy-efficient, small, and intelligent devices. In this paper, we propose a distributed federated learning framework for IoT devices, more specifically for the IoMT (Internet of Medical Things), using blockchain to allow for a decentralized scheme that improves privacy and efficiency over a centralized system; this allows us to move from the prevalent cloud-based architectures to the edge. The system is designed for three paradigms: 1) Training neural networks on IoT devices to allow for collaborative training of a shared model whilst decoupling the learning from the dataset to ensure privacy. Training is performed in an online manner simultaneously amongst all participants, allowing for training on actual data that may not have been present in a dataset collected in the traditional way, and dynamically adapting the system whilst it is being trained. 2) Training of an IoMT system in a fully private manner so as to mitigate the confidentiality issues of medical data and to build robust, and potentially bespoke, models where little, if any, data exists. 3) Distribution of the actual network training, something federated learning itself does not do, to allow hospitals, for example, to utilize their spare computing resources to train network models.

Data-driven HVAC Control Using Symbolic Regression: Design and Implementation

  • Authors: Yuki Ozawa, Dafang Zhao, Daichi Watari, Ittetsu Taniguchi, Toshihiro Suzuki, Yoshiyuki Shimoda, Takao Onoye
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.03078
  • Pdf link: https://arxiv.org/pdf/2304.03078
  • Abstract
    The large amount of data collected in buildings makes energy management smarter and more energy-efficient. This study proposes a design and implementation methodology for data-driven heating, ventilation, and air conditioning (HVAC) control. Building thermodynamics is modeled using a symbolic regression model (SRM) built from the collected data. Additionally, an HVAC system model is also developed with a data-driven approach. A model predictive control (MPC) based HVAC scheduling is formulated with the developed models to minimize energy consumption and peak power demand and maximize thermal comfort. The performance of the proposed framework is demonstrated in a workspace in an actual campus building. The HVAC system using the proposed framework reduces peak power by 16.1% compared to the widely used thermostat controller.
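
A stripped-down version of such an MPC problem can be posed with cvxpy; the first-order thermal model and every coefficient below are made-up placeholders standing in for the identified symbolic-regression models.

```python
import cvxpy as cp
import numpy as np

# Assumed zone model: T[k+1] = a*T[k] + b*u[k] + c*T_out[k]
a, b, c = 0.9, -0.05, 0.1          # placeholder identified coefficients
H, T0, T_ref = 24, 26.0, 23.0      # horizon (hours), initial and target temp
T_out = 30.0 * np.ones(H)          # assumed outdoor-temperature forecast

u = cp.Variable(H)                 # cooling power (kW)
T = cp.Variable(H + 1)             # zone temperature
constraints = [T[0] == T0, u >= 0, u <= 5]
for k in range(H):
    constraints.append(T[k + 1] == a * T[k] + b * u[k] + c * T_out[k])

cost = (cp.sum(u)                             # energy consumption
        + 10 * cp.sum_squares(T[1:] - T_ref)  # thermal-comfort deviation
        + 5 * cp.max(u))                      # peak power demand
cp.Problem(cp.Minimize(cost), constraints).solve()
print(np.round(u.value[:4], 2))    # first control moves of the schedule
```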

Inductive Graph Unlearning

  • Authors: Cheng-Long Wang, Mengdi Huai, Di Wang
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.03093
  • Pdf link: https://arxiv.org/pdf/2304.03093
  • Abstract
    As a way to implement the "right to be forgotten" in machine learning, machine unlearning aims to completely remove the contributions and information of the samples to be deleted from a trained model without affecting the contributions of other samples. Recently, many frameworks for machine unlearning have been proposed, and most of them focus on image and text data. To extend machine unlearning to graph data, GraphEraser has been proposed. However, a critical issue is that GraphEraser is specifically designed for the transductive graph setting, where the graph is static and the attributes and edges of test nodes are visible during training. It is unsuitable for the inductive setting, where the graph can be dynamic and the test graph information is invisible in advance. Such inductive capability is essential for production machine learning systems with evolving graphs like social media and transaction networks. To fill this gap, we propose the GUided InDuctivE Graph Unlearning framework (GUIDE). GUIDE consists of three components: guided graph partitioning with fairness and balance, efficient subgraph repair, and similarity-based aggregation. Empirically, we evaluate our method on several inductive benchmarks and evolving transaction graphs. GUIDE can be efficiently applied to inductive graph learning tasks thanks to its low graph-partition cost, in terms of both computation and structural information. The code will be available here: https://github.com/Happy2Git/GUIDE.

Constrained Exploration in Reinforcement Learning with Optimality Preservation

  • Authors: Peter C. Y. Chen
  • Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.03104
  • Pdf link: https://arxiv.org/pdf/2304.03104
  • Abstract
    We consider a class of reinforcement-learning systems in which the agent follows a behavior policy to explore a discrete state-action space to find an optimal policy while adhering to some restriction on its behavior. Such restriction may prevent the agent from visiting some state-action pairs, possibly leading to the agent finding only a sub-optimal policy. To address this problem we introduce the concept of constrained exploration with optimality preservation, whereby the exploration behavior of the agent is constrained to meet a specification while the optimality of the (original) unconstrained learning process is preserved. We first establish a feedback-control structure that models the dynamics of the unconstrained learning process. We then extend this structure by adding a supervisor to ensure that the behavior of the agent meets the specification, and establish (for a class of reinforcement-learning problems with a known deterministic environment) a necessary and sufficient condition under which optimality is preserved. This work demonstrates the utility and the prospect of studying reinforcement-learning problems in the context of the theories of discrete-event systems, automata and formal languages.
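
A minimal tabular version of the idea: an epsilon-greedy Q-learning loop where a supervisor restricts exploration to a per-state set of permitted actions. The `env` interface and the direct masking are assumptions; the paper's point is precisely the conditions under which such a restriction still preserves optimality.

```python
import numpy as np

def supervised_q_learning(env, allowed, n_states, n_actions,
                          episodes=500, lr=0.1, gamma=0.99, eps=0.1):
    """allowed[s] is the supervisor-approved action set for state s.
    env.reset() -> s; env.step(a) -> (s', reward, done) (assumed API)."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            acts = list(allowed[s])                  # constrained exploration
            if np.random.rand() < eps:
                a = int(np.random.choice(acts))
            else:
                a = max(acts, key=lambda a_: Q[s, a_])
            s2, r, done = env.step(a)
            Q[s, a] += lr * (r + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q
```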

A self-organizing robotic aggregate using solid and liquid-like collective states

  • Authors: Baudouin Saintyves, Matthew Spenko, Heinrich M. Jaeger
  • Subjects: Robotics (cs.RO); Soft Condensed Matter (cond-mat.soft); Adaptation and Self-Organizing Systems (nlin.AO)
  • Arxiv link: https://arxiv.org/abs/2304.03125
  • Pdf link: https://arxiv.org/pdf/2304.03125
  • Abstract
    Designing robotic systems that can change their physical form factor as well as their compliance to adapt to environmental constraints remains a major conceptual and technical challenge. To address this, we introduce the Granulobot, a modular system that blurs the distinction between soft, modular, and swarm robotics. The system consists of gear-like units that each contain a single actuator such that units can self-assemble into larger, granular aggregates using magnetic coupling. These aggregates can reconfigure dynamically and also split up into subsystems that might later recombine. Aggregates can self-organize into collective states with solid- and liquid-like properties, thus displaying widely differing compliances. These states can be perturbed locally via actuators or externally via mechanical feedback from the environment to produce adaptive shape shifting in a decentralized manner. This in turn can generate locomotion strategies adapted to different conditions. Aggregates can move over obstacles without using external sensors or coordinate to maintain a steady gait over different surfaces without electronic communication among units. The modular design highlights a physical, morphological form of control that advances the development of resilient robotic systems with the ability to morph and adapt to different functions and conditions.

From Saliency to DINO: Saliency-guided Vision Transformer for Few-shot Keypoint Detection

  • Authors: Changsheng Lu, Hao Zhu, Piotr Koniusz
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03140
  • Pdf link: https://arxiv.org/pdf/2304.03140
  • Abstract
    Unlike current deep keypoint detectors that are trained to recognize a limited number of body parts, few-shot keypoint detection (FSKD) attempts to localize any keypoints, including novel or base keypoints, depending on the reference samples. FSKD requires semantically meaningful relations for keypoint similarity learning to overcome the ubiquitous noise and ambiguous local patterns. One rescue comes from the vision transformer (ViT), as it captures long-range relations well. However, ViT may model irrelevant features outside of the region of interest due to the global attention matrix, thus degrading similarity learning between support and query features. In this paper, we present a novel saliency-guided vision transformer, dubbed SalViT, for few-shot keypoint detection. Our SalViT enjoys a uniquely designed masked self-attention and a morphology learner, where the former introduces the saliency map as a soft mask to constrain the self-attention to foregrounds, while the latter leverages so-called power normalization to adjust the morphology of the saliency map, realizing a "dynamically changing receptive field". Moreover, as saliency detectors add computation, we show that attentive masks of the DINO transformer can replace saliency. On top of SalViT, we also investigate i) transductive FSKD that enhances keypoint representations with unlabelled data and ii) FSKD under occlusions. We show that our model performs well on five public datasets and achieves ~10% higher PCK than the normally trained model under severe occlusions.
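
The saliency-as-soft-mask mechanism can be sketched as a bias added to the attention logits before the softmax, so low-saliency tokens are softly suppressed rather than hard-masked; the temperature and the log-bias form are illustrative choices, and the paper's morphology learner is omitted.

```python
import torch
import torch.nn.functional as F

def saliency_masked_attention(q, k, v, saliency, tau=1.0):
    """q, k, v: (B, N, D); saliency: (B, N) in [0, 1] per token."""
    d = q.shape[-1]
    logits = q @ k.transpose(-2, -1) / d ** 0.5              # (B, N, N)
    bias = torch.log(saliency.clamp_min(1e-6)).unsqueeze(1)  # (B, 1, N)
    attn = F.softmax(logits + bias / tau, dim=-1)            # soft foreground mask
    return attn @ v
```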

Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream

  • Authors: Yuheng Jiang, Kaixin Yao, Zhuo Su, Zhehao Shen, Haimin Luo, Lan Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03184
  • Pdf link: https://arxiv.org/pdf/2304.03184
  • Abstract
    Convenient 4D modeling of human-object interactions is essential for numerous applications. However, monocular tracking and rendering of complex interaction scenarios remain challenging. In this paper, we propose Instant-NVR, a neural approach for instant volumetric human-object tracking and rendering using a single RGBD camera. It bridges traditional non-rigid tracking with recent instant radiance field techniques via a multi-thread tracking-rendering mechanism. In the tracking front-end, we adopt a robust human-object capture scheme to provide sufficient motion priors. We further introduce a separated instant neural representation with a novel hybrid deformation module for the interacting scene. We also provide an on-the-fly reconstruction scheme of the dynamic/static radiance fields via efficient motion-prior searching. Moreover, we introduce an online key frame selection scheme and a rendering-aware refinement strategy to significantly improve the appearance details for online novel-view synthesis. Extensive experiments demonstrate the effectiveness and efficiency of our approach for the instant generation of human-object radiance fields on the fly, notably achieving real-time photo-realistic novel view synthesis under complex human-object interactions.

LANe: Lighting-Aware Neural Fields for Compositional Scene Synthesis

  • Authors: Akshay Krishnan, Amit Raj, Xianling Zhang, Alexandra Carlson, Nathan Tseng, Sandhya Sridhar, Nikita Jaipuria, James Hays
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03280
  • Pdf link: https://arxiv.org/pdf/2304.03280
  • Abstract
    Neural fields have recently enjoyed great success in representing and rendering 3D scenes. However, most state-of-the-art implicit representations model static or dynamic scenes as a whole, with minor variations. Existing work on learning disentangled world and object neural fields does not consider the problem of composing objects into different world neural fields in a lighting-aware manner. We present Lighting-Aware Neural Field (LANe) for the compositional synthesis of driving scenes in a physically consistent manner. Specifically, we learn a scene representation that disentangles the static background and transient elements into a world-NeRF and class-specific object-NeRFs to allow compositional synthesis of multiple objects in the scene. Furthermore, we explicitly design both the world and object models to handle lighting variation, which allows us to compose objects into scenes with spatially varying lighting. This is achieved by constructing a light field of the scene and using it in conjunction with a learned shader to modulate the appearance of the object NeRFs. We demonstrate the performance of our model on a synthetic dataset of diverse lighting conditions rendered with the CARLA simulator, as well as a novel real-world dataset of cars collected at different times of the day. Our approach outperforms state-of-the-art compositional scene synthesis on the challenging dataset setup, composing object-NeRFs learned from one scene into an entirely different scene while still respecting the lighting variations in the novel scene. For more results, please visit our project website https://lane-composition.github.io/.

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention

  • Authors: Mingyu Ding, Yikang Shen, Lijie Fan, Zhenfang Chen, Zitian Chen, Ping Luo, Joshua B. Tenenbaum, Chuang Gan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03282
  • Pdf link: https://arxiv.org/pdf/2304.03282
  • Abstract
    Humans possess a versatile mechanism for extracting structured representations of our visual world. When looking at an image, we can decompose the scene into entities and their parts as well as obtain the dependencies between them. To mimic such capability, we propose Visual Dependency Transformers (DependencyViT) that can induce visual dependencies without any labels. We achieve this with a novel neural operator called *reversed attention* that can naturally capture long-range visual dependencies between image patches. Specifically, we formulate it as a dependency graph where a child token in reversed attention is trained to attend to its parent tokens and send information following a normalized probability distribution rather than gathering information as in conventional self-attention. With such a design, hierarchies naturally emerge from reversed attention layers, and a dependency tree is progressively induced from leaf nodes to the root node without supervision. DependencyViT offers several appealing benefits. (i) Entities and their parts in an image are represented by different subtrees, enabling part partitioning from dependencies; (ii) Dynamic visual pooling is made possible. The leaf nodes which rarely send messages can be pruned without hindering the model performance, based on which we propose the lightweight DependencyViT-Lite to reduce the computational and memory footprints; (iii) DependencyViT works well on both self- and weakly-supervised pretraining paradigms on ImageNet, and demonstrates its effectiveness on 8 datasets and 5 tasks, such as unsupervised part and saliency segmentation, recognition, and detection.
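
A hedged sketch of the reversed-attention idea described above: instead of each token gathering information via a softmax over keys, the softmax is taken over the sending dimension, so each token distributes its message across candidate parents. The function and shapes below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def reversed_attention(q, k, v):
    """q, k, v: (B, N, D). logits[b, i, j] scores token j sending to token i."""
    logits = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5   # (B, N, N)
    # Conventional attention normalizes over dim=-1 (each token gathers);
    # reversed attention normalizes over dim=-2, so each column j becomes a
    # distribution describing how token j *sends* its message upward.
    send = F.softmax(logits, dim=-2)
    return send @ v        # each receiver aggregates what was sent to it

x = torch.randn(2, 16, 32)
print(reversed_attention(x, x, x).shape)   # torch.Size([2, 16, 32])
```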

New submissions for Fri, 17 Mar 23

Keyword: pruning

There is no result

Keyword: nas

Modeling and Analysis on Efficiency Degradation of Lithium-ion Batteries

  • Authors: Zihui Lin, Dagang Li
  • Subjects: Systems and Control (eess.SY); Data Analysis, Statistics and Probability (physics.data-an)
  • Arxiv link: https://arxiv.org/abs/2303.09456
  • Pdf link: https://arxiv.org/pdf/2303.09456
  • Abstract
    Efficiency of Battery Energy Storage Systems (BESSs) is increasingly critical as renewable energy generation becomes more prevalent on the grid. Therefore, it is necessary to study the energy efficiency of Lithium-ion Batteries (LIBs), which are typically used in BESSs. The purpose of this study is to propose the State of Efficiency (SOE) as a measure of how efficiently LIBs transfer energy, and to examine what factors affect the SOE of a battery throughout its lifetime. Using NASA's data set, we calculate the SOE of NCA LIBs as the ratio of energy generated during the discharge phase to energy consumed during the charge phase. A linear trend is observed in the SOE trajectories, which is confirmed by the Mann-Kendall (MK) trend test. Based on this, a linear SOE degradation model is presented. Ambient temperature, discharge current, and cutoff voltage all affect the SOE in different ways. Using the SOE and its behavior observed in this study, Battery Management Systems (BMS) can improve the energy efficiency of LIBs by adjusting operating conditions or developing better management strategies.
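
As a worked illustration of the SOE definition, a minimal sketch assuming SOE is the per-cycle ratio of discharge energy to charge energy, integrated from voltage/current time series (variable names are ours, not the paper's):

```python
import numpy as np

def energy(voltage, current, t):
    """Integrate instantaneous power V*I over time t (trapezoidal rule)."""
    power = voltage * current
    return np.sum(0.5 * (power[1:] + power[:-1]) * np.diff(t))

def state_of_efficiency(v_chg, i_chg, t_chg, v_dis, i_dis, t_dis):
    return energy(v_dis, i_dis, t_dis) / energy(v_chg, i_chg, t_chg)

# Toy cycle: constant-current charge at 4.0 V and discharge at 3.6 V.
t = np.linspace(0, 3600, 100)                      # one hour, in seconds
soe = state_of_efficiency(
    v_chg=np.full_like(t, 4.0), i_chg=np.full_like(t, 1.0), t_chg=t,
    v_dis=np.full_like(t, 3.6), i_dis=np.full_like(t, 1.0), t_dis=t)
print(f"SOE = {soe:.3f}")                          # 0.900 for this toy cycle
```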

Gate Recurrent Unit Network based on Hilbert-Schmidt Independence Criterion for State-of-Health Estimation

  • Authors: Ziyue Huang, Lujuan Dang, Yuqing Xie, Wentao Ma, Badong Chen
  • Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2303.09497
  • Pdf link: https://arxiv.org/pdf/2303.09497
  • Abstract
    State-of-health (SOH) estimation is a key step in ensuring the safe and reliable operation of batteries. Due to issues such as varying data distributions and sequence lengths across cycles, most existing methods require a health feature extraction technique, which can be time-consuming and labor-intensive. The Gated Recurrent Unit (GRU) can address this problem well thanks to its simple structure and strong performance, and has therefore received widespread attention. However, redundant information still exists within the network and impacts the accuracy of SOH estimation. To address this issue, a new GRU network based on the Hilbert-Schmidt Independence Criterion (GRU-HSIC) is proposed. First, a zero masking network is used to transform all battery data measured with varying lengths every cycle into sequences of the same length, while still retaining information about the original data size in each cycle. Second, the Hilbert-Schmidt Independence Criterion (HSIC) bottleneck, which evolved from Information Bottleneck (IB) theory, is extended to GRU to compress the information from hidden layers. To evaluate the proposed method, we conducted experiments on datasets from the Center for Advanced Life Cycle Engineering (CALCE) of the University of Maryland and the NASA Ames Prognostics Center of Excellence. Experimental results demonstrate that our model achieves higher accuracy than other recurrent models.
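
For reference, a minimal sketch of the empirical HSIC that the HSIC bottleneck builds on (the biased estimator with Gaussian kernels; the bandwidth and names are our assumptions, not the paper's code):

```python
import numpy as np

def gaussian_gram(x, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = x.shape[0]
    K, L = gaussian_gram(x, sigma), gaussian_gram(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
# Dependent pair (x, x^2) scores much higher than an independent pair.
print(hsic(x, x ** 2), hsic(x, rng.normal(size=(200, 1))))
```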

Large Population Games on Constrained Unreliable Networks

  • Authors: Shubham Aggarwal, Muhammad Aneeq uz Zaman, Melih Bastopcu, Tamer Başar
  • Subjects: Systems and Control (eess.SY); Computer Science and Game Theory (cs.GT); Social and Information Networks (cs.SI); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2303.09515
  • Pdf link: https://arxiv.org/pdf/2303.09515
  • Abstract
    This paper studies an $N$-agent cost-coupled game where the agents are connected via an unreliable capacity-constrained network. Each agent receives state information over that network, which loses packets with probability $p$. A base station (BS) actively schedules agent communications over the network by minimizing a weighted Age of Information (WAoI) based cost function under a capacity limit $\mathcal{C} < N$ on the number of transmission attempts at each instant. Under a standard information structure, we show that the problem can be decoupled into a scheduling problem for the BS and a game problem for the $N$ agents. Since the scheduling problem is an NP-hard combinatorial problem, we propose an approximately optimal solution which approaches the optimal solution as $N \rightarrow \infty$. In the process, we also provide some insights on the case without channel erasure. Next, to solve the large population game problem, we use the mean-field game framework to compute an approximate decentralized Nash equilibrium. Finally, we validate the theoretical results using a numerical example.
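
To make the setup concrete, here is a hedged sketch of a max-weight-style scheduler for the WAoI objective: at each slot the BS attempts transmission to the $\mathcal{C}$ agents with the largest weighted age, and each attempt fails with probability $p$. This greedy rule is purely an illustration of the model, not the paper's approximately optimal policy.

```python
import numpy as np

def waoi_schedule(weights, ages, C, p, rng):
    """One slot: serve the C agents with the largest weighted age."""
    order = np.argsort(-weights * ages)        # largest weighted age first
    chosen = order[:C]
    ages = ages + 1                            # every agent ages by one slot
    delivered = chosen[rng.random(len(chosen)) > p]   # erasure w.p. p
    ages[delivered] = 1                        # fresh state after a success
    return ages

rng = np.random.default_rng(0)
N, C, p = 10, 3, 0.2
weights, ages = rng.uniform(0.5, 1.5, N), np.ones(N)
for _ in range(1000):
    ages = waoi_schedule(weights, ages, C, p, rng)
print("mean WAoI:", np.mean(weights * ages))
```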

New submissions for Wed, 19 Apr 23

Keyword: efficient

Model-Driven Quantum Federated Learning (QFL)

  • Authors: Armin Moin, Atta Badii, Moharram Challenger
  • Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08496
  • Pdf link: https://arxiv.org/pdf/2304.08496
  • Abstract
    Recently, several studies have proposed frameworks for Quantum Federated Learning (QFL). For instance, the Google TensorFlow Quantum (TFQ) and TensorFlow Federated (TFF) libraries have been deployed for realizing QFL. However, most developers are not yet familiar with Quantum Computing (QC) libraries and frameworks. A Domain-Specific Modeling Language (DSML) that provides an abstraction layer over the underlying QC and Federated Learning (FL) libraries would be beneficial. This could enable practitioners to carry out software development and data science tasks efficiently while deploying the state of the art in Quantum Machine Learning (QML). In this position paper, we propose extending existing domain-specific Model-Driven Engineering (MDE) tools for Machine Learning (ML) enabled systems, such as MontiAnna, ML-Quadrat, and GreyCat, to support QFL.

CyFormer: Accurate State-of-Health Prediction of Lithium-Ion Batteries via Cyclic Attention

  • Authors: Zhiqiang Nie, Jiankun Zhao, Qicheng Li, Yong Qin
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.08502
  • Pdf link: https://arxiv.org/pdf/2304.08502
  • Abstract
    Predicting the State-of-Health (SoH) of lithium-ion batteries is a fundamental task of battery management systems on electric vehicles. It aims at estimating future SoH based on historical aging data. Most existing deep learning methods rely on filter-based feature extractors (e.g., CNN or Kalman filters) and recurrent time sequence models. Though efficient, they generally ignore cyclic features and the domain gap between training and testing batteries. To address this problem, we present CyFormer, a transformer-based cyclic time sequence model for SoH prediction. Instead of the conventional CNN-RNN structure, we adopt an encoder-decoder architecture. In the encoder, row-wise and column-wise attention blocks effectively capture intra-cycle and inter-cycle connections and extract cyclic features. In the decoder, the SoH queries cross-attend to these features to form the final predictions. We further utilize a transfer learning strategy to narrow the domain gap between the training and testing sets. To be specific, we use fine-tuning to shift the model to a target working condition. Finally, we make our model more efficient through pruning. Experiments show that our method attains an MAE of 0.75% with only 10% of the data used for fine-tuning on a testing battery, surpassing prior methods by a large margin. Effective and robust, our method provides a potential solution for all cyclic time sequence prediction tasks.

Schottky Barrier MOSFET Enabled Ultra-Low Power Real-Time Neuron for Neuromorphic Computing

  • Authors: Shubham Patil, Jayatika Sakhuja, Ajay Kumar Singh, Anmol Biswas, Vivek Saraswat, Sandeep Kumar, Sandip Lashkare, Udayan Ganguly
  • Subjects: Emerging Technologies (cs.ET); Applied Physics (physics.app-ph)
  • Arxiv link: https://arxiv.org/abs/2304.08504
  • Pdf link: https://arxiv.org/pdf/2304.08504
  • Abstract
    Energy-efficient real-time synapses and neurons are essential to enable large-scale neuromorphic computing. In this paper, we propose and demonstrate a Schottky-Barrier MOSFET-based ultra-low power voltage-controlled current source to enable real-time neurons for neuromorphic computing. The Schottky-Barrier MOSFET is fabricated on a silicon-on-insulator platform with polycrystalline silicon as the channel and nickel/platinum as the source/drain. The poly-Si and nickel form a back-to-back Schottky junction, enabling the ultra-low ON current required for energy-efficient neurons.

Popular Support for Balancing Equity and Efficiency in Resource Allocation: A Case Study in Online Advertising to Increase Welfare Program Awareness

  • Authors: Allison Koenecke, Eric Giannella, Robb Willer, Sharad Goel
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.08530
  • Pdf link: https://arxiv.org/pdf/2304.08530
  • Abstract
    Algorithmically optimizing the provision of limited resources is commonplace across domains from healthcare to lending. Optimization can lead to efficient resource allocation, but, if deployed without additional scrutiny, can also exacerbate inequality. Little is known about popular preferences regarding acceptable efficiency-equity trade-offs, making it difficult to design algorithms that are responsive to community needs and desires. Here we examine this trade-off and concomitant preferences in the context of GetCalFresh, an online service that streamlines the application process for California's Supplementary Nutrition Assistance Program (SNAP, formerly known as food stamps). GetCalFresh runs online advertisements to raise awareness of their multilingual SNAP application service. We first demonstrate that when ads are optimized to garner the most enrollments per dollar, a disproportionately small number of Spanish speakers enroll due to relatively higher costs of non-English language advertising. Embedding these results in a survey (N = 1,532) of a diverse set of Americans, we find broad popular support for valuing equity in addition to efficiency: respondents generally preferred reducing total enrollments to facilitate increased enrollment of Spanish speakers. These results buttress recent calls to reevaluate the efficiency-centric paradigm popular in algorithmic resource allocation.

LIMIT: Learning Interfaces to Maximize Information Transfer

  • Authors: Benjamin A. Christie, Dylan P. Losey
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.08539
  • Pdf link: https://arxiv.org/pdf/2304.08539
  • Abstract
    Robots can use auditory, visual, or haptic interfaces to convey information to human users. The way these interfaces select signals is typically pre-defined by the designer: for instance, a haptic wristband might vibrate when the robot is moving and squeeze when the robot stops. But different people interpret the same signals in different ways, so that what makes sense to one person might be confusing or unintuitive to another. In this paper we introduce a unified algorithmic formalism for learning co-adaptive interfaces from scratch. Our method does not need to know the human's task (i.e., what the human is using these signals for). Instead, our insight is that interpretable interfaces should select signals that maximize correlation between the human's actions and the information the interface is trying to convey. Applying this insight we develop LIMIT: Learning Interfaces to Maximize Information Transfer. LIMIT optimizes a tractable, real-time proxy of information gain in continuous spaces. The first time a person works with our system the signals may appear random; but over repeated interactions the interface learns a one-to-one mapping between displayed signals and human responses. Our resulting approach is both personalized to the current user and not tied to any specific interface modality. We compare LIMIT to state-of-the-art baselines across controlled simulations, an online survey, and an in-person user study with auditory, visual, and haptic interfaces. Overall, our results suggest that LIMIT learns interfaces that enable users to complete the task more quickly and efficiently, and users subjectively prefer LIMIT to the alternatives. See videos here: https://youtu.be/IvQ3TM1_2fA.

GrOVe: Ownership Verification of Graph Neural Networks using Embeddings

  • Authors: Asim Waheed, Vasisht Duddu, N. Asokan
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.08566
  • Pdf link: https://arxiv.org/pdf/2304.08566
  • Abstract
    Graph neural networks (GNNs) have emerged as a state-of-the-art approach to model and draw inferences from large scale graph-structured data in various application settings such as social networking. The primary goal of a GNN is to learn an embedding for each graph node in a dataset that encodes both the node features and the local graph structure around the node. Embeddings generated by a GNN for a graph node are unique to that GNN. Prior work has shown that GNNs are prone to model extraction attacks. Model extraction attacks and defenses have been explored extensively in other non-graph settings. While detecting or preventing model extraction appears to be difficult, deterring it via effective ownership verification techniques offers a potential defense. In non-graph settings, fingerprinting models, or the data used to build them, has been shown to be a promising approach toward ownership verification. We present GrOVe, a state-of-the-art GNN model fingerprinting scheme that, given a target model and a suspect model, can reliably determine if the suspect model was trained independently of the target model or if it is a surrogate of the target model obtained via model extraction. We show that GrOVe can distinguish between surrogate and independent models even when the independent model uses the same training dataset and architecture as the original target model. Using six benchmark datasets and three model architectures, we show that GrOVe consistently achieves low false-positive and false-negative rates. We demonstrate that GrOVe is robust against known fingerprint evasion techniques while remaining computationally efficient.

Traversing combinatorial 0/1-polytopes via optimization

  • Authors: Arturo Merino, Torsten Mütze
  • Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
  • Arxiv link: https://arxiv.org/abs/2304.08567
  • Pdf link: https://arxiv.org/pdf/2304.08567
  • Abstract
    In this paper, we present a new framework that exploits combinatorial optimization for efficiently generating a large variety of combinatorial objects based on graphs, matroids, posets and polytopes. Our method relies on a simple and versatile algorithm for computing a Hamilton path on the skeleton of any 0/1-polytope ${\rm conv}(X)$, where $X\subseteq \{0,1\}^n$. The algorithm uses as a black box any algorithm that solves a variant of the classical linear optimization problem $\min\{w\cdot x\mid x\in X\}$, and the resulting delay, i.e., the running time per visited vertex on the Hamilton path, is only by a factor of $\log n$ larger than the running time of the optimization algorithm. When $X$ encodes a particular class of combinatorial objects, then traversing the skeleton of the polytope ${\rm conv}(X)$ along a Hamilton path corresponds to listing the combinatorial objects by local change operations, i.e., we obtain Gray code listings. As concrete results of our general framework, we obtain efficient algorithms for generating all ($c$-optimal) bases in a matroid; ($c$-optimal) spanning trees, forests, ($c$-optimal) matchings in a general graph; ($c$-optimal) vertex covers, ($c$-optimal) stable sets in a bipartite graph; as well as ($c$-optimal) antichains and ideals of a poset. The delay and space required by these algorithms are polynomial in the size of the matroid, graph, or poset, respectively, and these listings correspond to Hamilton paths on the corresponding combinatorial polytopes. We also obtain an $O(t_{\rm LP} \log n)$ delay algorithm for the vertex enumeration problem on 0/1-polytopes $\{x\in\mathbb{R}^n\mid Ax\leq b\}$, where $A\in \mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^m$, and $t_{\rm LP}$ is the time needed to solve the linear program $\min\{w\cdot x\mid Ax\leq b\}$. This improves upon the 25-year old $O(t_{\rm LP}\, n)$ delay algorithm of Bussieck and Lübbecke.

Diagnosing applications' I/O behavior through system call observability

  • Authors: Tânia Esteves, Ricardo Macedo, Rui Oliveira, João Paulo
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.08569
  • Pdf link: https://arxiv.org/pdf/2304.08569
  • Abstract
    We present DIO, a generic tool for observing inefficient and erroneous I/O interactions between applications and in-kernel storage systems that lead to performance, dependability, and correctness issues. DIO facilitates the analysis and enables near real-time visualization of complex I/O patterns for data-intensive applications generating millions of storage requests. This is achieved by non-intrusively intercepting system calls, enriching collected data with relevant context, and providing timely analysis and visualization for traced events. We demonstrate its usefulness by analyzing two production-level applications. Results show that DIO enables diagnosing resource contention in multi-threaded I/O that leads to high tail latency and erroneous file accesses that cause data loss.

Energy-Efficient Lane Changes Planning and Control for Connected Autonomous Vehicles on Urban Roads

  • Authors: Eunhyek Joa, Hotae Lee, Eric Yongkeun Choi, Francesco Borrelli
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.08576
  • Pdf link: https://arxiv.org/pdf/2304.08576
  • Abstract
    This paper presents a novel energy-efficient motion planning algorithm for Connected Autonomous Vehicles (CAVs) on urban roads. The approach consists of two components: a decision-making algorithm and an optimization-based trajectory planner. The decision-making algorithm leverages Signal Phase and Timing (SPaT) information from connected traffic lights to select a lane with the aim of reducing energy consumption. The algorithm is based on a heuristic rule which is learned from human driving data. The optimization-based trajectory planner generates a safe, smooth, and energy-efficient trajectory toward the selected lane. The proposed strategy is experimentally evaluated in a Vehicle-in-the-Loop (VIL) setting, where a real test vehicle receives SPaT information from both actual and virtual traffic lights and autonomously drives on a testing site, while the surrounding vehicles are simulated. The results demonstrate that the use of SPaT information in autonomous driving leads to improved energy efficiency, with the proposed strategy saving 37.1% energy consumption compared to a lane-keeping algorithm.

Graph Sparsification by Approximate Matrix Multiplication

  • Authors: Neophytos Charalambides, Alfred O. Hero III
  • Subjects: Numerical Analysis (math.NA); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Signal Processing (eess.SP); Spectral Theory (math.SP)
  • Arxiv link: https://arxiv.org/abs/2304.08581
  • Pdf link: https://arxiv.org/pdf/2304.08581
  • Abstract
    Graphs arising in statistical problems, signal processing, large networks, combinatorial optimization, and data analysis are often dense, which causes both computational and storage bottlenecks. One way of *sparsifying* a *weighted* graph, while sharing the same vertices as the original graph but reducing the number of edges, is through *spectral sparsification*. We study this problem through the perspective of RandNLA. Specifically, we utilize randomized matrix multiplication to give a clean and simple analysis of how sampling according to edge weights gives a spectral approximation to graph Laplacians. Through the $CR$-MM algorithm, we attain a simple and computationally efficient sparsifier whose resulting Laplacian estimate is unbiased and of minimum variance. Furthermore, we define a new notion of *additive spectral sparsifiers*, which has not been considered in the literature.
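
A minimal sketch of the sampling idea in spirit: draw edges with probability proportional to their weight and reweight inversely, so the sparsifier's Laplacian is an unbiased estimate of the original. The function names and with-replacement sampling are our assumptions, not the paper's $CR$-MM algorithm itself.

```python
import numpy as np

def laplacian(n, edges, weights):
    L = np.zeros((n, n))
    for (u, v), w in zip(edges, weights):
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

def sparsify(edges, weights, s, rng):
    """Sample s edges (with replacement) w.p. proportional to weight."""
    p = weights / weights.sum()
    idx = rng.choice(len(edges), size=s, p=p)
    new_w = np.zeros(len(edges))
    for i in idx:
        new_w[i] += weights[i] / (s * p[i])     # unbiased reweighting
    keep = np.flatnonzero(new_w)
    return [edges[i] for i in keep], new_w[keep]

rng = np.random.default_rng(0)
n = 50
edges = [(u, v) for u in range(n) for v in range(u + 1, n)]   # dense graph
weights = rng.uniform(0.5, 2.0, len(edges))
se, sw = sparsify(edges, weights, s=400, rng=rng)
L, Ls = laplacian(n, edges, weights), laplacian(n, se, sw)
print(len(se), np.linalg.norm(L - Ls, 2) / np.linalg.norm(L, 2))
```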

Safe Navigation and Obstacle Avoidance Using Differentiable Optimization Based Control Barrier Functions

  • Authors: Bolun Dai, Rooholla Khorrambakht, Prashanth Krishnamurthy, Vinícius Gonçalves, Anthony Tzes, Farshad Khorrami
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.08586
  • Pdf link: https://arxiv.org/pdf/2304.08586
  • Abstract
    Control barrier functions (CBFs) have been widely applied to safety-critical robotic applications. However, the construction of control barrier functions for robotic systems remains a challenging task. Recently, collision detection using differentiable optimization has provided a way to compute the minimum uniform scaling factor that results in an intersection between two convex shapes and to also compute the Jacobian of the scaling factor. In this paper, we propose a framework that uses this scaling factor, with an offset, to systematically define a CBF for obstacle avoidance tasks. We provide a theoretical analysis that proves the continuity of the proposed CBF. Empirically, we show that the proposed CBF is continuously differentiable, and the resulting optimal control problem is computationally efficient, which makes it applicable for real-time robotic control. We validate our approach, first using a 2D mobile robot example, then on the Franka-Emika Research 3 (FR3) robot manipulator both in simulation and experiment.
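
A hedged sketch of the safety-filter pattern that CBF controllers follow, here for single-integrator dynamics with one distance-based barrier solved in closed form. The paper's contribution (a CBF built from a differentiable-optimization scaling factor between convex shapes) is more general than this toy.

```python
import numpy as np

def cbf_filter(x, u_des, obstacle, radius, gamma=1.0):
    """Project u_des onto {u : dh/dx . u >= -gamma * h(x)} for
    h(x) = ||x - obstacle|| - radius, assuming x_dot = u."""
    diff = x - obstacle
    h = np.linalg.norm(diff) - radius           # barrier value
    grad = diff / np.linalg.norm(diff)          # dh/dx (unit vector)
    slack = grad @ u_des + gamma * h
    if slack >= 0:                              # desired input already safe
        return u_des
    return u_des - slack * grad                 # minimal-norm correction

x, goal = np.array([0.0, 0.0]), np.array([4.0, 0.0])
u = cbf_filter(x, u_des=goal - x, obstacle=np.array([2.0, 0.1]), radius=1.0)
print(u)   # steers around the obstacle instead of through it
```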

Revisiting Block-Diagonal SDP Relaxations for the Clique Number of the Paley Graphs

  • Authors: Vladimir A. Kobzar, Krishnan Mody
  • Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Combinatorics (math.CO); Number Theory (math.NT); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.08615
  • Pdf link: https://arxiv.org/pdf/2304.08615
  • Abstract
    This work addresses the block-diagonal semidefinite program (SDP) relaxations for the clique number of the Paley graphs. Computing the size of the maximum clique (the clique number) of a graph is a classic NP-complete problem; a Paley graph is a deterministic graph where two vertices are connected if their difference is a quadratic residue modulo certain prime powers. Improving the upper bound for the Paley graph clique number for odd prime powers is an open problem in combinatorics. Moreover, since quadratic residues exhibit pseudorandom properties, Paley graphs are related to the construction of deterministic restricted isometries, an open problem in compressed sensing and sparse recovery. Recent work provides evidence that the current upper bounds can be improved by the sum-of-squares (SOS) relaxations. In particular, the bounds given by the SOS relaxations of degree 4 (SOS-4) grow asymptotically at an order smaller than the square root of the prime. However, computations of SOS-4 become intractable for large graphs. Gvozdenovic et al. introduced a more computationally efficient block-diagonal hierarchy of SDPs that refines the SOS hierarchy. They computed the values of these SDPs of degrees 2 and 3 (L2 and L3, respectively) for the Paley graph clique numbers associated with primes p less than or equal to 809. These values bound from above the values of the corresponding SOS-4 and SOS-6 relaxations, respectively. We revisit these computations and determine the values of the L2 relaxation for larger p's. Our results provide additional numerical evidence that the L2 relaxations, and therefore also the SOS-4 relaxations, grow asymptotically at an order smaller than the square root of p.
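
For context, a small sketch of the object being bounded: the Paley graph on a prime p ≡ 1 (mod 4) connects vertices whose difference is a quadratic residue, and a greedy search gives a quick lower bound on its clique number (the paper's SDP relaxations bound it from above).

```python
def paley_graph(p):
    """Adjacency of the Paley graph on Z_p; needs p ≡ 1 (mod 4) so that
    -1 is a quadratic residue and the relation is symmetric."""
    residues = {(x * x) % p for x in range(1, p)}
    return {u: {v for v in range(p) if v != u and (v - u) % p in residues}
            for u in range(p)}

def greedy_clique(adj):
    clique = []
    for v in sorted(adj, key=lambda u: -len(adj[u])):
        if all(v in adj[c] for c in clique):
            clique.append(v)
    return clique

adj = paley_graph(13)
print(len(greedy_clique(adj)))   # 3: a lower bound on the clique number
```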

Dynamic Vector Bin Packing for Online Resource Allocation in the Cloud

  • Authors: Aniket Murhekar, David Arbour, Tung Mai, Anup Rao
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.08648
  • Pdf link: https://arxiv.org/pdf/2304.08648
  • Abstract
    Several cloud-based applications, such as cloud gaming, rent servers to execute jobs which arrive in an online fashion. Each job has a resource demand and must be dispatched to a cloud server which has enough resources to execute the job, which departs after its completion. Under the 'pay-as-you-go' billing model, the server rental cost is proportional to the total time that servers are actively running jobs. The problem of efficiently allocating a sequence of online jobs to servers without exceeding the resource capacity of any server while minimizing total server usage time can be modelled as a variant of the dynamic bin packing problem (DBP), called MinUsageTime DBP. In this work, we initiate the study of the problem with multi-dimensional resource demands (e.g. CPU/GPU usage, memory requirement, bandwidth usage, etc.), called MinUsageTime Dynamic Vector Bin Packing (DVBP). We study the competitive ratio (CR) of Any Fit packing algorithms for this problem. We show almost-tight bounds on the CR of three specific Any Fit packing algorithms, namely First Fit, Next Fit, and Move To Front. We prove that the CR of Move To Front is at most $(2\mu+1)d +1$, where $\mu$ is the ratio of the max/min item durations. For $d=1$, this significantly improves the previously known upper bound of $6\mu+7$ (Kamali & Lopez-Ortiz, 2015). We then prove the CR of First Fit and Next Fit are bounded by $(\mu+2)d+1$ and $2\mu d+1$, respectively. Next, we prove a lower bound of $(\mu+1)d$ on the CR of any Any Fit packing algorithm, an improved lower bound of $2\mu d$ for Next Fit, and a lower bound of $2\mu$ for Move To Front in the 1-D case. All our bounds improve or match the best-known bounds for the 1-D case. Finally, we experimentally study the average-case performance of these algorithms on randomly generated synthetic data, and observe that Move To Front outperforms other Any Fit packing algorithms.
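
A hedged sketch of the Any Fit family in this multi-dimensional setting, using First Fit: a job with a d-dimensional demand and a departure time goes to the first open server that can hold it in every dimension, and a new server is rented otherwise. Class and function names are illustrative, and the usage-time accounting that the competitive analysis measures is omitted.

```python
import numpy as np

class Server:
    def __init__(self, capacity):
        self.capacity = np.asarray(capacity, float)
        self.jobs = []                       # list of (demand, departure_time)

    def release(self, now):
        self.jobs = [(d, t) for d, t in self.jobs if t > now]

    def fits(self, demand):
        load = sum((d for d, _ in self.jobs), np.zeros_like(self.capacity))
        return np.all(load + demand <= self.capacity)

def first_fit(servers, demand, departure, now, capacity):
    demand = np.asarray(demand, float)
    for s in servers:
        s.release(now)
        if s.fits(demand):
            s.jobs.append((demand, departure))
            return s
    s = Server(capacity)                     # no fit: rent a new server
    s.jobs.append((demand, departure))
    servers.append(s)
    return s

servers = []
cap = [1.0, 1.0]                             # e.g. normalized CPU, memory
first_fit(servers, demand=[0.6, 0.2], departure=5.0, now=0.0, capacity=cap)
first_fit(servers, demand=[0.5, 0.5], departure=7.0, now=1.0, capacity=cap)
print(len(servers))   # 2: the second job does not fit on server 0
```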

An Ethereum-compatible blockchain that explicates and ensures design-level safety properties for smart contracts

  • Authors: Nikolaj Bjørner, Shuo Chen, Yang Chen, Zhongxin Guo, Peng Liu, Nanqing Luo
  • Subjects: Cryptography and Security (cs.CR); Programming Languages (cs.PL)
  • Arxiv link: https://arxiv.org/abs/2304.08655
  • Pdf link: https://arxiv.org/pdf/2304.08655
  • Abstract
    Smart contracts are crucial elements of decentralized technologies, but they face significant obstacles to trustworthiness due to security bugs and trapdoors. To address the core issue, we propose a technology that enables programmers to focus on design-level properties rather than specific low-level attack patterns. Our proposed technology, called Theorem-Carrying-Transaction (TCT), combines the benefits of runtime checking and symbolic proof. Under the TCT protocol, every transaction must carry a theorem that proves its adherence to the safety properties in the invoked contracts, and the blockchain checks the proof before executing the transaction. The unique design of TCT ensures that the theorems are provable and checkable in an efficient manner. We believe that TCT holds a great promise for enabling provably secure smart contracts in the future. As such, we call for collaboration toward this vision.

Space Efficient Sequence Alignment for SRAM-Based Computing: X-Drop on the Graphcore IPU

  • Authors: Luk Burchard, Max Xiaohang Zhao, Johannes Langguth, Aydın Buluç, Giulia Guidi
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Genomics (q-bio.GN)
  • Arxiv link: https://arxiv.org/abs/2304.08662
  • Pdf link: https://arxiv.org/pdf/2304.08662
  • Abstract
    Dedicated accelerator hardware has become essential for processing AI-based workloads, leading to the rise of novel accelerator architectures. Furthermore, fundamental differences in memory architecture and parallelism have made these accelerators targets for scientific computing. The sequence alignment problem is fundamental in bioinformatics; we have implemented the $X$-Drop algorithm, a heuristic method for pairwise alignment that reduces search space, on the Graphcore Intelligence Processor Unit (IPU) accelerator. The $X$-Drop algorithm has an irregular computational pattern, which makes it difficult to accelerate due to load balancing. Here, we introduce a graph-based partitioning and queue-based batch system to improve load balancing. Our implementation achieves $10\times$ speedup over a state-of-the-art GPU implementation and up to $4.65\times$ compared to CPU. In addition, we introduce a memory-restricted $X$-Drop algorithm that reduces memory footprint by $55\times$ and efficiently uses the IPU's limited low-latency SRAM. This optimization further improves the strong scaling performance by $3.6\times$.
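
To illustrate the heuristic being accelerated, here is a hedged sketch of X-drop in its simplest (ungapped) form: extend a seed while accumulating a match/mismatch score and abandon the extension once the score falls more than X below the best seen so far, which is what prunes the search space. The paper implements the full gapped X-Drop dynamic program on the IPU; this toy is only the core termination rule.

```python
def xdrop_extend(a, b, i, j, x=5, match=1, mismatch=-1):
    """Ungapped extension of a seed at (i, j) with X-drop termination."""
    score = best = 0
    best_end = (i, j)
    while i < len(a) and j < len(b):
        score += match if a[i] == b[j] else mismatch
        if score > best:
            best, best_end = score, (i + 1, j + 1)
        if best - score > x:                 # X-drop: stop extending
            break
        i, j = i + 1, j + 1
    return best, best_end

a = "ACGTACG" + "T" * 20
b = "ACGTACG" + "A" * 20
print(xdrop_extend(a, b, 0, 0))   # (7, (7, 7)): tail is pruned early
```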

Continuous Versatile Jumping Using Learned Action Residuals

  • Authors: Yuxiang Yang, Xiangyun Meng, Wenhao Yu, Tingnan Zhang, Jie Tan, Byron Boots
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08663
  • Pdf link: https://arxiv.org/pdf/2304.08663
  • Abstract
    Jumping is essential for legged robots to traverse difficult terrain. In this work, we propose a hierarchical framework that combines optimal control and reinforcement learning to learn continuous jumping motions for quadrupedal robots. The core of our framework is a stance controller, which combines a manually designed acceleration controller with a learned residual policy. As the acceleration controller warm-starts the policy for efficient training, the trained policy overcomes the limitations of the acceleration controller and improves jumping stability. In addition, a low-level whole-body controller converts the body pose command from the stance controller to motor commands. After training in simulation, our framework can be deployed directly to the real robot and perform versatile, continuous jumping motions, including omni-directional jumps up to 50 cm high and 60 cm forward, and jump-turning up to 90 degrees. Please visit our website for more results: https://sites.google.com/view/learning-to-jump.

A Voice Disease Detection Method Based on MFCCs and Shallow CNN

  • Authors: Xiaoping Xie, Hao Cai, Can Li, Fei Ding
  • Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2304.08708
  • Pdf link: https://arxiv.org/pdf/2304.08708
  • Abstract
    The incidence rate of voice diseases is increasing year by year. The use of software for remote diagnosis is a technical development trend and has important practical value. Among voice diseases, common diseases that cause hoarseness include spasmodic dysphonia, vocal cord paralysis, vocal nodules, and vocal cord polyps. This paper presents a voice disease detection method that can be applied in a wide range of clinical settings. We cooperated with Xiangya Hospital of Central South University to collect voice samples from sixty-one different patients. Mel Frequency Cepstrum Coefficient (MFCC) parameters are extracted as input features to describe the voice in the form of data. An innovative model combining MFCC parameters and a single-convolution-layer CNN is proposed for fast calculation and classification. The highest accuracy we achieved was 92%, surpassing previously reported results. We also use the Advanced Voice Function Assessment Databases (AVFAD) to evaluate the generalization ability of the method, which achieves an accuracy of 98%. Experiments on clinical and standard datasets show that, for the pathological detection of voice diseases, our method greatly improves accuracy and computational efficiency.
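
A hedged sketch of the pipeline the abstract describes, MFCC features fed to a single-convolution-layer CNN; the layer sizes, the four-class head, and the example audio are our assumptions, not the paper's configuration.

```python
import librosa
import numpy as np
import torch
import torch.nn as nn

class ShallowCNN(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # single conv layer
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)))
        self.fc = nn.Linear(16 * 4 * 4, n_classes)

    def forward(self, x):                    # x: (B, 1, n_mfcc, frames)
        return self.fc(self.conv(x).flatten(1))

y, sr = librosa.load(librosa.ex("trumpet"))  # stand-in for a voice sample
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
x = torch.from_numpy(mfcc[None, None].astype(np.float32))
print(ShallowCNN()(x).shape)                 # torch.Size([1, 4])
```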

InversOS: Efficient Control-Flow Protection for AArch64 Applications with Privilege Inversion

  • Authors: Zhuojia Shen, John Criswell
  • Subjects: Cryptography and Security (cs.CR); Operating Systems (cs.OS)
  • Arxiv link: https://arxiv.org/abs/2304.08717
  • Pdf link: https://arxiv.org/pdf/2304.08717
  • Abstract
    With the increasing popularity of AArch64 processors in general-purpose computing, securing software running on AArch64 systems against control-flow hijacking attacks has become a critical part toward secure computation. Shadow stacks keep shadow copies of function return addresses and, when protected from illegal modifications and coupled with forward-edge control-flow integrity, form an effective and proven defense against such attacks. However, AArch64 lacks native support for write-protected shadow stacks, while software alternatives either incur prohibitive performance overhead or provide weak security guarantees. We present InversOS, the first hardware-assisted write-protected shadow stacks for AArch64 user-space applications, utilizing commonly available features of AArch64 to achieve efficient intra-address space isolation (called Privilege Inversion) required to protect shadow stacks. Privilege Inversion adopts unconventional design choices that run protected applications in the kernel mode and mark operating system (OS) kernel memory as user-accessible; InversOS therefore uses a novel combination of OS kernel modifications, compiler transformations, and another AArch64 feature to ensure the safety of doing so and to support legacy applications. We show that InversOS is secure by design, effective against various control-flow hijacking attacks, and performant on selected benchmarks and applications (incurring overhead of 7.0% on LMBench, 7.1% on SPEC CPU 2017, and 3.0% on Nginx web server).

Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets

  • Authors: Maximilian Du, Suraj Nair, Dorsa Sadigh, Chelsea Finn
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08742
  • Pdf link: https://arxiv.org/pdf/2304.08742
  • Abstract
    Enabling robots to learn novel visuomotor skills in a data-efficient manner remains an unsolved problem with myriad challenges. A popular paradigm for tackling this problem is through leveraging large unlabeled datasets that have many behaviors in them and then adapting a policy to a specific task using a small amount of task-specific human supervision (i.e. interventions or demonstrations). However, how best to leverage the narrow task-specific supervision and balance it with offline data remains an open question. Our key insight in this work is that task-specific data not only provides new data for an agent to train on but can also inform the type of prior data the agent should use for learning. Concretely, we propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset (including many sub-optimal behaviors). The agent is then jointly trained on the expert and queried data. We observe that our method learns to query only the relevant transitions to the task, filtering out sub-optimal or task-irrelevant data. By doing so, it is able to learn more effectively from the mix of task-specific and offline data compared to naively mixing the data or only using the task-specific data. Furthermore, we find that our simple querying approach outperforms more complex goal-conditioned methods by 20% across simulated and real robotic manipulation tasks from images. See https://sites.google.com/view/behaviorretrieval for videos and code.

A Survey on Biomedical Text Summarization with Pre-trained Language Model

  • Authors: Qianqian Xie, Zheheng Luo, Benyou Wang, Sophia Ananiadou
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.08763
  • Pdf link: https://arxiv.org/pdf/2304.08763
  • Abstract
    The exponential growth of biomedical texts such as biomedical literature and electronic health records (EHRs) poses a major challenge for clinicians and researchers seeking to access clinical information efficiently. To address the problem, biomedical text summarization has been proposed to support clinical information retrieval and management, aiming at generating concise summaries that distill key information from single or multiple biomedical documents. In recent years, pre-trained language models (PLMs) have been the de facto standard of various natural language processing tasks in the general domain. Most recently, PLMs have been further investigated in the biomedical field and brought new insights into the biomedical text summarization task. In this paper, we systematically summarize recent advances that explore PLMs for biomedical text summarization, to help understand recent progress, challenges, and future directions. We categorize PLMs-based approaches according to how they utilize PLMs and what PLMs they use. We then review available datasets, recent approaches and evaluation metrics of the task. We finally discuss existing challenges and promising future directions. To facilitate the research community, we line up open resources including available datasets, recent approaches, codes, evaluation metrics, and the leaderboard in a public project: https://github.com/KenZLuo/Biomedical-Text-Summarization-Survey/tree/master.

Sparks of GPTs in Edge Intelligence for Metaverse: Caching and Inference for Mobile AIGC Services

  • Authors: Minrui Xu, Dusit Niyato, Hongliang Zhang, Jiawen Kang, Zehui Xiong, Shiwen Mao, Zhu Han
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.08782
  • Pdf link: https://arxiv.org/pdf/2304.08782
  • Abstract
    Aiming at achieving artificial general intelligence (AGI) for the Metaverse, pretrained foundation models (PFMs), e.g., generative pretrained transformers (GPTs), can effectively provide various AI services, such as autonomous driving, digital twins, and AI-generated content (AIGC) for extended reality. With the advantages of low latency and privacy preservation, serving PFMs for mobile AI services in edge intelligence is a viable solution for caching and executing PFMs on edge servers with limited computing resources and GPU memory. However, PFMs typically consist of billions of parameters that are computation- and memory-intensive for edge servers during loading and execution. In this article, we investigate edge PFM serving problems for mobile AIGC services in the Metaverse. First, we introduce the fundamentals of PFMs and discuss their characteristic fine-tuning and inference methods in edge intelligence. Then, we propose a novel framework of joint model caching and inference for managing models and allocating resources to satisfy users' requests efficiently. Furthermore, considering the in-context learning ability of PFMs, we propose a new metric to evaluate the freshness and relevance between examples in demonstrations and executing tasks, namely the Age of Context (AoC). Finally, we propose a least context algorithm for managing cached models at edge servers by balancing the tradeoff among latency, energy consumption, and accuracy.

Connectivity in the presence of an opponent

  • Authors: Zihui Liang, Bakh Khoussainov, Toru Takisaka, Mingyu Xiao
  • Subjects: Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2304.08783
  • Pdf link: https://arxiv.org/pdf/2304.08783
  • Abstract
    The paper introduces two-player connectivity games played on finite bipartite graphs. Algorithms that solve these connectivity games can be used as subroutines for solving Müller games. Müller games constitute a well established class of games in model checking and verification. In connectivity games, the objective of one of the players is to visit every node of the game graph infinitely often. The first contribution of this paper is our proof that solving connectivity games can be reduced to the incremental strongly connected component maintenance (ISCCM) problem, an important problem in graph algorithms and data structures. The second contribution is that we non-trivially adapt two known algorithms for the ISCCM problem to provide two efficient algorithms that solve the connectivity games problem. Finally, based on the techniques developed, we recast Horn's polynomial-time algorithm that solves explicitly given Müller games and provide an alternative proof of its correctness. Our algorithms are more efficient than Horn's algorithm. Our solution for connectivity games is used as a subroutine in the algorithm.

Large-scale Dynamic Network Representation via Tensor Ring Decomposition

  • Authors: Qu Wang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08798
  • Pdf link: https://arxiv.org/pdf/2304.08798
  • Abstract
    Large-scale Dynamic Networks (LDNs) are becoming increasingly important in the Internet age, yet the dynamic nature of these networks captures the evolution of the network structure and how edge weights change over time, posing unique challenges for data analysis and modeling. A Latent Factorization of Tensors (LFT) model facilitates efficient representation learning for an LDN. However, existing LFT models are almost all based on Canonical Polyadic Factorization (CPF). Therefore, this work proposes a model based on Tensor Ring (TR) decomposition for efficient representation learning for an LDN. Specifically, we incorporate the principle of single latent factor-dependent, non-negative, and multiplicative update (SLF-NMU) into the TR decomposition model, and analyze the particular bias form of TR decomposition. Experimental studies on two real LDNs demonstrate that the proposed method achieves higher accuracy than existing models.

Neuromorphic computing for attitude estimation onboard quadrotors

  • Authors: Stein Stroobants, Julien Dupeyroux, Guido C.H.E. de Croon
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.08802
  • Pdf link: https://arxiv.org/pdf/2304.08802
  • Abstract
    Compelling evidence has been given for the high energy efficiency and update rates of neuromorphic processors, with performance beyond what standard Von Neumann architectures can achieve. Such promising features could be advantageous in critical embedded systems, especially in robotics. To date, the constraints inherent in robots (e.g., size and weight, battery autonomy, available sensors, computing resources, processing time, etc.), and particularly in aerial vehicles, severely hamper the performance of fully-autonomous on-board control, including sensor processing and state estimation. In this work, we propose a spiking neural network (SNN) capable of estimating the pitch and roll angles of a quadrotor in highly dynamic movements from 6-degree of freedom Inertial Measurement Unit (IMU) data. With only 150 neurons and a limited training dataset obtained using a quadrotor in a real world setup, the network shows competitive results as compared to state-of-the-art, non-neuromorphic attitude estimators. The proposed architecture was successfully tested on the Loihi neuromorphic processor on-board a quadrotor to estimate the attitude when flying. Our results show the robustness of neuromorphic attitude estimation and pave the way towards energy-efficient, fully autonomous control of quadrotors with dedicated neuromorphic computing systems.

Implicit representation priors meet Riemannian geometry for Bayesian robotic grasping

  • Authors: Norman Marlier, Julien Gustin, Gilles Louppe, Olivier Brüls
  • Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08805
  • Pdf link: https://arxiv.org/pdf/2304.08805
  • Abstract
    Robotic grasping in highly noisy environments presents complex challenges, especially with limited prior knowledge about the scene. In particular, identifying good grasping poses with Bayesian inference becomes difficult due to two reasons: i) generating data from uninformative priors proves to be inefficient, and ii) the posterior often entails a complex distribution defined on a Riemannian manifold. In this study, we explore the use of implicit representations to construct scene-dependent priors, thereby enabling the application of efficient simulation-based Bayesian inference algorithms for determining successful grasp poses in unstructured environments. Results from both simulation and physical benchmarks showcase the high success rate and promising potential of this approach.

Revisiting the Role of Similarity and Dissimilarity in Best Counter Argument Retrieval

  • Authors: Hongguang Shi, Shuirong Cao, Cam-Tu Nguyen
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.08807
  • Pdf link: https://arxiv.org/pdf/2304.08807
  • Abstract
    This paper studies the task of best counter-argument retrieval given an input argument. Following the definition that the best counter-argument addresses the same aspects as the input argument while having the opposite stance, we aim to develop an efficient and effective model for scoring counter-arguments based on similarity and dissimilarity metrics. We first conduct an experimental study on the effectiveness of available scoring methods, including traditional Learning-To-Rank (LTR) and recent neural scoring models. We then propose Bipolar-encoder, a novel BERT-based model to learn an optimal representation for simultaneous similarity and dissimilarity. Experimental results show that our proposed method achieves an accuracy@1 of 88.9%, outperforming other baselines by a large margin. When combined with an appropriate caching technique, Bipolar-encoder is comparably efficient at prediction time.

DILI: A Distribution-Driven Learned Index

  • Authors: Pengfei Li, Hua Lu, Rong Zhu, Bolin Ding, Long Yang, Gang Pan
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.08817
  • Pdf link: https://arxiv.org/pdf/2304.08817
  • Abstract
    Targeting in-memory one-dimensional search keys, we propose a novel DIstribution-driven Learned Index tree (DILI), where a concise and computation-efficient linear regression model is used for each node. An internal node's key range is equally divided by its child nodes such that a key search enjoys perfect model prediction accuracy to find the relevant leaf node. A leaf node uses machine learning models to generate searchable data layout and thus accurately predicts the data record position for a key. To construct DILI, we first build a bottom-up tree with linear regression models according to global and local key distributions. Using the bottom-up tree, we build DILI in a top-down manner, individualizing the fanouts for internal nodes according to local distributions. DILI strikes a good balance between the number of leaf nodes and the height of the tree, two critical factors of key search time. Moreover, we design flexible algorithms for DILI to efficiently insert and delete keys and automatically adjust the tree structure when necessary. Extensive experimental results show that DILI outperforms the state-of-the-art alternatives on different kinds of workloads.
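
A minimal sketch of the building block DILI uses per node, a linear regression from key to position, with a tiny local search to correct the prediction error; this is the generic learned-index idea under our own naming, not DILI's tree construction or insertion machinery.

```python
import numpy as np

class LinearLeaf:
    """Leaf node: predict a key's array position with linear regression."""
    def __init__(self, keys):
        self.keys = np.sort(np.asarray(keys, float))
        pos = np.arange(len(self.keys))
        self.slope, self.intercept = np.polyfit(self.keys, pos, deg=1)

    def lookup(self, key):
        guess = int(round(self.slope * key + self.intercept))
        guess = min(max(guess, 0), len(self.keys) - 1)
        while guess > 0 and self.keys[guess] > key:
            guess -= 1                         # walk left on overshoot
        while guess < len(self.keys) - 1 and self.keys[guess] < key:
            guess += 1                         # walk right on undershoot
        return guess

leaf = LinearLeaf(np.random.default_rng(0).uniform(0, 1e6, 10_000))
print(leaf.lookup(leaf.keys[1234]))   # 1234
```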

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

  • Authors: Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08818
  • Pdf link: https://arxiv.org/pdf/2304.08818
  • Abstract
    Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution video generation, a particularly resource-intensive task. We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos. Similarly, we temporally align diffusion model upsamplers, turning them into temporally consistent video super resolution models. We focus on two relevant real-world applications: Simulation of in-the-wild driving data and creative content creation with text-to-video modeling. In particular, we validate our Video LDM on real driving videos of resolution 512 x 1024, achieving state-of-the-art performance. Furthermore, our approach can easily leverage off-the-shelf pre-trained image LDMs, as we only need to train a temporal alignment model in that case. Doing so, we turn the publicly available, state-of-the-art text-to-image LDM Stable Diffusion into an efficient and expressive text-to-video model with resolution up to 1280 x 2048. We show that the temporal layers trained in this way generalize to different fine-tuned text-to-image LDMs. Utilizing this property, we show the first results for personalized text-to-video generation, opening exciting directions for future content creation. Project page: https://research.nvidia.com/labs/toronto-ai/VideoLDM/

Motion-state Alignment for Video Semantic Segmentation

  • Authors: Jinming Su, Ruihong Yin, Shuaibin Zhang, Junfeng Luo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.08820
  • Pdf link: https://arxiv.org/pdf/2304.08820
  • Abstract
    In recent years, video semantic segmentation has made great progress with advanced deep neural networks. However, there still exist two main challenges, i.e., information inconsistency and computation cost. To deal with these two difficulties, we propose a novel motion-state alignment framework for video semantic segmentation to keep both motion and state consistency. In the framework, we first construct a motion alignment branch armed with an efficient decoupled transformer to capture dynamic semantics, guaranteeing region-level temporal consistency. Then, a state alignment branch composed of a stage transformer is designed to enrich feature spaces for the current frame to extract static semantics and achieve pixel-level state consistency. Next, by a semantic assignment mechanism, the region descriptor of each semantic category is gained from dynamic semantics and linked with pixel descriptors from static semantics. Benefiting from the alignment of these two kinds of effective information, the proposed method picks up dynamic and static semantics in a targeted way, so that video semantic regions are consistently segmented to obtain precise locations with low computational complexity. Extensive experiments on the Cityscapes and CamVid datasets show that the proposed approach outperforms state-of-the-art methods and validates the effectiveness of the motion-state alignment framework.

Contact Tracing over Uncertain Indoor Positioning Data (Extended Version)

  • Authors: Tiantian Liu, Huan Li, Hua Lu, Muhammad Aamir Cheema, Harry Kai-Ho Chan
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.08838
  • Pdf link: https://arxiv.org/pdf/2304.08838
  • Abstract
    Pandemics often cause dramatic losses of human lives and impact our societies in many aspects such as public health, tourism, and economy. To contain the spread of an epidemic like COVID-19, efficient and effective contact tracing is important, especially in indoor venues where the risk of infection is higher. In this work, we formulate and study a novel query called Indoor Contact Query (ICQ) over raw, uncertain indoor positioning data that digitalizes people's movements indoors. Given a query object o, e.g., a person confirmed to be a virus carrier, an ICQ analyzes uncertain indoor positioning data to find objects that most likely had close contact with o for a long period of time. To process ICQ, we propose a set of techniques. First, we design an enhanced indoor graph model to organize different types of data necessary for ICQ. Second, for indoor moving objects, we devise methods to determine uncertain regions and to derive positioning samples missing in the raw data. Third, we propose a query processing framework with a close contact determination method, a search algorithm, and acceleration strategies. We conduct extensive experiments on synthetic and real datasets to evaluate our proposals. The results demonstrate the efficiency and effectiveness of our proposals.
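
To make the close-contact determination at the core of an ICQ concrete, here is a minimal sketch over certain (noise-free) positioning samples. The trajectory format, the Euclidean distance, and the 2 m threshold are illustrative assumptions, not the paper's uncertainty-aware, indoor-graph-based method:

```python
# Toy close-contact test: given (t, x, y) samples for the query object o and a
# candidate c at shared timestamps, accumulate the time they spent within a
# distance threshold. Real ICQ data is uncertain and indoor-graph-based.
import numpy as np

def contact_duration(traj_o, traj_c, dist_thresh=2.0):
    """Trajectories: arrays of (t, x, y) rows sampled at the same timestamps."""
    close = np.linalg.norm(traj_o[:, 1:] - traj_c[:, 1:], axis=1) <= dist_thresh
    dt = np.diff(traj_o[:, 0], append=traj_o[-1, 0])   # interval lengths
    return float((close * dt).sum())

t = np.arange(0, 60, 5.0)
o = np.c_[t, t * 0.1, np.zeros_like(t)]
c = np.c_[t, t * 0.1 + 1.0, np.zeros_like(t)]          # stays ~1 m away throughout
print(contact_duration(o, c))                          # ~55 s of close contact
```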

GoferBot: A Visual Guided Human-Robot Collaborative Assembly System

  • Authors: Zheyu Zhuang, Yizhak Ben-Shabat, Jiahao Zhang, Stephen Gould, Robert Mahony
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.08840
  • Pdf link: https://arxiv.org/pdf/2304.08840
  • Abstract
    The current transformation towards smart manufacturing has led to a growing demand for human-robot collaboration (HRC) in the manufacturing process. Perceiving and understanding the human co-worker's behaviour introduces challenges for collaborative robots to efficiently and effectively perform tasks in unstructured and dynamic environments. Integrating recent data-driven machine vision capabilities into HRC systems is a logical next step in addressing these challenges. However, in these cases, off-the-shelf components struggle due to generalisation limitations. Real-world evaluation is required in order to fully appreciate the maturity and robustness of these approaches. Furthermore, understanding the pure-vision aspects is a crucial first step before combining multiple modalities in order to understand the limitations. In this paper, we propose GoferBot, a novel vision-based semantic HRC system for a real-world assembly task. It is composed of a visual servoing module that reaches and grasps assembly parts in an unstructured multi-instance and dynamic environment, an action recognition module that performs human action prediction for implicit communication, and a visual handover module that uses the perceptual understanding of human behaviour to produce an intuitive and efficient collaborative assembly experience. GoferBot is a novel assembly system that seamlessly integrates all sub-modules by utilising implicit semantic information purely from visual perception.

Two-stage Denoising Diffusion Model for Source Localization in Graph Inverse Problems

  • Authors: Bosong Huang, Weihao Yu, Ruzhong Xie, Jing Xiao, Jin Huang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08841
  • Pdf link: https://arxiv.org/pdf/2304.08841
  • Abstract
    Source localization is the inverse problem of graph information dissemination and has broad practical applications. However, the inherent intricacy and uncertainty in information dissemination pose significant challenges, and the ill-posed nature of the source localization problem further exacerbates these challenges. Recently, deep generative models, particularly diffusion models inspired by classical non-equilibrium thermodynamics, have made significant progress. While diffusion models have proven to be powerful in solving inverse problems and producing high-quality reconstructions, applying them directly to source localization is infeasible for two reasons. Firstly, it is impossible to calculate the posterior disseminated results on a large-scale network for iterative denoising sampling, which would incur enormous computational costs. Secondly, in the existing methods for this field, the training data itself is ill-posed (many-to-one); thus simply transferring the diffusion model would only lead to local optima. To address these challenges, we propose a two-stage optimization framework, the source localization denoising diffusion model (SL-Diff). In the coarse stage, we devise the source proximity degrees as the supervised signals to generate coarse-grained source predictions. This aims to efficiently initialize the next stage, significantly reducing its convergence time and calibrating the convergence process. Furthermore, the introduction of cascade temporal information in this training method transforms the many-to-one mapping relationship into a one-to-one relationship, perfectly addressing the ill-posed problem. In the fine stage, we design a diffusion model for the graph inverse problem that can quantify the uncertainty in the dissemination. The proposed SL-Diff yields excellent prediction results within a reasonable sampling time in extensive experiments.

Revisiting Fast Fourier multiplication algorithms on quotient rings

  • Authors: Ramiro Martínez, Paz Morillo
  • Subjects: Discrete Mathematics (cs.DM)
  • Arxiv link: https://arxiv.org/abs/2304.08860
  • Pdf link: https://arxiv.org/pdf/2304.08860
  • Abstract
    This work formalizes efficient Fast Fourier-based multiplication algorithms for polynomials in quotient rings such as $\mathbb{Z}_{m}[x]/\left<x^{n}-a\right>$, with $n$ a power of 2 and $m$ a not necessarily prime integer. We also present a meticulous study of the necessary and/or sufficient conditions required for the applicability of these multiplication algorithms. This paper unifies the different approaches to the problem of efficiently computing the product of two polynomials in these quotient rings.
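
To make the quotient-ring structure concrete, here is a schoolbook-multiplication sketch in $\mathbb{Z}_{m}[x]/\left<x^{n}-a\right>$; the paper's contribution is performing this product with FFT-style algorithms, which this naive version only illustrates:

```python
# Multiply polynomials in Z_m[x]/<x^n - a>: plain convolution followed by the
# reduction x^n = a. Coefficients are lists [c_0, ..., c_{n-1}].
def mul_mod(f, g, n, m, a):
    prod = [0] * (2 * n - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            prod[i + j] = (prod[i + j] + fi * gj) % m
    # Fold back: the coefficient of x^(n+k) contributes a * c to x^k.
    res = prod[:n]
    for k in range(n, 2 * n - 1):
        res[k - n] = (res[k - n] + a * prod[k]) % m
    return res

# (1 + x)^2 in Z_7[x]/<x^2 - 3>: 1 + 2x + x^2 = (1 + 3) + 2x = 4 + 2x.
print(mul_mod([1, 1], [1, 1], n=2, m=7, a=3))  # [4, 2]
```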

Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition

  • Authors: Maurits Bleeker, Pawel Swietojanski, Stefan Braun, Xiaodan Zhuang
  • Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2304.08862
  • Pdf link: https://arxiv.org/pdf/2304.08862
  • Abstract
    This paper presents an extension to train end-to-end Context-Aware Transformer Transducer (CATT) models by using a simple, yet efficient method of mining hard negative phrases from the latent space of the context encoder. During training, given a reference query, we mine a number of similar phrases using approximate nearest neighbour search. These sampled phrases are then used as negative examples in the context list alongside random and ground truth contextual information. By including approximate nearest neighbour phrases (ANN-P) in the context list, we encourage the learned representation to disambiguate between similar, but not identical, biasing phrases. This improves biasing accuracy when there are several similar phrases in the biasing inventory. We carry out experiments in a large-scale data regime, obtaining up to 7% relative word error rate reductions for the contextual portion of test data. We also extend and evaluate the CATT approach in streaming applications.
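
A minimal sketch of the hard-negative mining step, assuming phrase embeddings from the context encoder are already available; the random vectors and exact top-k search here stand in for the real encoder and an approximate index:

```python
# Mine the k phrases most similar to a reference query in embedding space,
# to be used as hard negatives in the context list.
import numpy as np

rng = np.random.default_rng(0)
phrase_bank = rng.normal(size=(10_000, 64))       # stand-in phrase embeddings
phrase_bank /= np.linalg.norm(phrase_bank, axis=1, keepdims=True)

def mine_hard_negatives(query_vec, k=5):
    q = query_vec / np.linalg.norm(query_vec)
    sims = phrase_bank @ q                        # cosine similarity
    # Exact top-k here; an approximate index (e.g., FAISS) replaces this at scale.
    return np.argpartition(-sims, k)[:k]

print(mine_hard_negatives(rng.normal(size=64)))
```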

Romanization-based Large-scale Adaptation of Multilingual Language Models

  • Authors: Sukannya Purkayastha, Sebastian Ruder, Jonas Pfeiffer, Iryna Gurevych, Ivan Vulić
  • Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08865
  • Pdf link: https://arxiv.org/pdf/2304.08865
  • Abstract
    Large multilingual pretrained language models (mPLMs) have become the de facto state of the art for cross-lingual transfer in NLP. However, their large-scale deployment to many languages, besides pretraining data scarcity, is also hindered by the increase in vocabulary size and limitations in their parameter budget. In order to boost the capacity of mPLMs to deal with low-resource and unseen languages, we explore the potential of leveraging transliteration on a massive scale. In particular, we explore the UROMAN transliteration tool, which provides mappings from UTF-8 to Latin characters for all the writing systems, enabling inexpensive romanization for virtually any language. We first focus on establishing how UROMAN compares against other language-specific and manually curated transliterators for adapting multilingual PLMs. We then study and compare a plethora of data- and parameter-efficient strategies for adapting the mPLMs to romanized and non-romanized corpora of 14 diverse low-resource languages. Our results reveal that UROMAN-based transliteration can offer strong performance for many languages, with particular gains achieved in the most challenging setups: on languages with unseen scripts and with limited training data without any vocabulary augmentation. Further analyses reveal that an improved tokenizer based on romanized data can even outperform non-transliteration-based methods in the majority of languages.

Differentiable Genetic Programming for High-dimensional Symbolic Regression

  • Authors: Peng Zeng, Xiaotian Song, Andrew Lensen, Yuwei Ou, Yanan Sun, Mengjie Zhang, Jiancheng Lv
  • Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08915
  • Pdf link: https://arxiv.org/pdf/2304.08915
  • Abstract
    Symbolic regression (SR) is the process of discovering hidden relationships from data with mathematical expressions, which is considered an effective way to reach interpretable machine learning (ML). Genetic programming (GP) has been the dominant approach to solving SR problems. However, as the scale of SR problems increases, GP often performs poorly and cannot effectively address real-world high-dimensional problems. This limitation is mainly caused by the stochastic evolutionary nature of traditional GP in constructing the trees. In this paper, we propose a differentiable approach named DGP to construct GP trees towards high-dimensional SR for the first time. Specifically, a new data structure called differentiable symbolic tree is proposed to relax the discrete structure to be continuous, so that a gradient-based optimizer can be applied for efficient optimization. In addition, a sampling method is proposed to eliminate the discrepancy caused by the above relaxation for valid symbolic expressions. Furthermore, a diversification mechanism is introduced to promote the optimizer escaping from local optima for globally better solutions. With these designs, the proposed DGP method can efficiently search for GP trees with higher performance, thus being capable of dealing with high-dimensional SR. To demonstrate the effectiveness of DGP, we conducted various experiments against state-of-the-art methods based on both GP and deep neural networks. The experimental results reveal that DGP can outperform these chosen peer competitors on high-dimensional regression benchmarks with dimensions varying from tens to thousands. In addition, on the synthetic SR problems, the proposed DGP method can also achieve the best recovery rate even with different noise levels. We believe this work can facilitate SR as a powerful alternative to interpretable ML for a broader range of real-world problems.

Coefficient Synthesis for Threshold Automata

  • Authors: A. R. Balasubramanian
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.08917
  • Pdf link: https://arxiv.org/pdf/2304.08917
  • Abstract
    Threshold automata are a formalism for modeling fault-tolerant distributed algorithms. The main feature of threshold automata is the notion of a threshold guard, which allows us to compare the number of received messages with the total number of different types of processes. In this paper, we consider the coefficient synthesis problem for threshold automata, in which we are given a sketch of a threshold automaton (with the constants in the threshold guards left unspecified) and a specification and we want to synthesize a set of constants which when plugged into the sketch, gives a threshold automaton satisfying the specification. Our main result is that this problem is undecidable, even when the specification is a coverability specification and the underlying sketch is acyclic.

Quantum Annealing for Single Image Super-Resolution

  • Authors: Han Yao Choong, Suryansh Kumar, Luc Van Gool
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08924
  • Pdf link: https://arxiv.org/pdf/2304.08924
  • Abstract
    This paper proposes a quantum computing-based algorithm to solve the single image super-resolution (SISR) problem. One of the well-known classical approaches for SISR relies on the well-established patch-wise sparse modeling of the problem. Yet, the current state of affairs in this field is that deep neural networks (DNNs) have demonstrated far superior results to traditional approaches. Nevertheless, quantum computing is expected to become increasingly prominent for machine learning problems soon. As a result, in this work, we take the opportunity to perform an early exploration of applying a quantum computing algorithm to this important image enhancement problem, i.e., SISR. Among the two paradigms of quantum computing, namely universal gate quantum computing and adiabatic quantum computing (AQC), the latter has been successfully applied to practical computer vision problems, in which quantum parallelism has been exploited to solve combinatorial optimization efficiently. This work demonstrates formulating quantum SISR as a sparse coding optimization problem, which is solved using quantum annealers accessed via the D-Wave Leap platform. The proposed AQC-based algorithm is demonstrated to achieve improved speed-up over a classical analog while maintaining comparable SISR accuracy.

Understand Data Preprocessing for Effective End-to-End Training of Deep Neural Networks

  • Authors: Ping Gong, Yuxin Ma, Cheng Li, Xiaosong Ma, Sam H. Noh
  • Subjects: Machine Learning (cs.LG); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.08925
  • Pdf link: https://arxiv.org/pdf/2304.08925
  • Abstract
    In this paper, we primarily focus on understanding the data preprocessing pipeline for DNN training in the public cloud. First, we run experiments to test the performance implications of the two major data preprocessing methods using either raw data or record files. The preliminary results show that data preprocessing is a clear bottleneck, even with the most efficient software and hardware configuration enabled by NVIDIA DALI, a highly optimized data preprocessing library. Second, we identify the potential causes, exercise a variety of optimization methods, and present their pros and cons. We hope this work will shed light on the new co-design of the "data storage and loading pipeline" and the "training framework", and on flexible resource configurations between them, so that the resources can be fully exploited and performance can be maximized.

Multitenant Containers as a Service (CaaS) for Clouds and Edge Clouds

  • Authors: Berat Can Senel, Maxime Mouchet, Justin Cappos, Olivier Fourmaux, Timur Friedman, Rick McGeer
  • Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.08927
  • Pdf link: https://arxiv.org/pdf/2304.08927
  • Abstract
    Cloud computing, offering on-demand access to computing resources through the Internet and the pay-as-you-go model, has marked the last decade with its three main service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). The lightweight nature of containers compared to virtual machines has led to the rapid uptake of another service model in recent years, called Containers as a Service (CaaS), which falls between IaaS and PaaS regarding control abstraction. However, when CaaS is offered to multiple independent users, or tenants, a multi-instance approach is used, in which each tenant receives its own separate cluster; this reimposes significant overhead due to employing virtual machines for isolation. If CaaS is to be offered not just at the cloud, but also at the edge cloud, where resources are limited, another solution is required. We introduce a native CaaS multitenancy framework, meaning that tenants share a cluster, which is more efficient than the one-tenant-per-cluster model. Whenever there are shared resources, isolation of multitenant workloads is an issue. Such workloads can be isolated by Kata Containers today. In addition, our framework accommodates application requirements that demand complete isolation and a fully customized environment. Node-level slicing empowers tenants to programmatically reserve isolated subclusters where they can choose the container runtime that suits application needs. The framework is publicly available as liberally-licensed, free, open-source software that extends Kubernetes, the de facto standard container orchestration system. It is in production use within the EdgeNet testbed for researchers.

Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning

  • Authors: Dingwen Kong, Lin F. Yang
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.08944
  • Pdf link: https://arxiv.org/pdf/2304.08944
  • Abstract
    An appropriate reward function is of paramount importance in specifying a task in reinforcement learning (RL). Yet, it is known to be extremely challenging in practice to design a correct reward function for even simple tasks. Human-in-the-loop (HiL) RL allows humans to communicate complex goals to the RL agent by providing various types of feedback. However, despite achieving great empirical successes, HiL RL usually requires too much feedback from a human teacher and also suffers from insufficient theoretical understanding. In this paper, we focus on addressing this issue from a theoretical perspective, aiming to provide provably feedback-efficient algorithmic frameworks that take human-in-the-loop to specify rewards of given tasks. We provide an active-learning-based RL algorithm that first explores the environment without specifying a reward function and then asks a human teacher for only a few queries about the rewards of a task at some state-action pairs. After that, the algorithm guarantees to provide a nearly optimal policy for the task with high probability. We show that, even with the presence of random noise in the feedback, the algorithm only takes $\widetilde{O}(H \dim_{R}^{2})$ queries on the reward function to provide an $\epsilon$-optimal policy for any $\epsilon > 0$. Here $H$ is the horizon of the RL environment, and $\dim_{R}$ specifies the complexity of the function class representing the reward function. In contrast, standard RL algorithms require querying the reward function for at least $\Omega(\operatorname{poly}(d, 1/\epsilon))$ state-action pairs, where $d$ depends on the complexity of the environmental transition.

Generative modeling of living cells with SO(3)-equivariant implicit neural representations

  • Authors: David Wiesner, Julian Suk, Sven Dummer, Tereza Nečasová, Vladimír Ulman, David Svoboda, Jelmer M. Wolterink
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM)
  • Arxiv link: https://arxiv.org/abs/2304.08960
  • Pdf link: https://arxiv.org/pdf/2304.08960
  • Abstract
    Data-driven cell tracking and segmentation methods in biomedical imaging require diverse and information-rich training data. In cases where the number of training samples is limited, synthetic computer-generated data sets can be used to improve these methods. This requires the synthesis of cell shapes as well as corresponding microscopy images using generative models. To synthesize realistic living cell shapes, the shape representation used by the generative model should be able to accurately represent fine details and changes in topology, which are common in cells. These requirements are not met by 3D voxel masks, which are restricted in resolution, and polygon meshes, which do not easily model processes like cell growth and mitosis. In this work, we propose to represent living cell shapes as level sets of signed distance functions (SDFs) which are estimated by neural networks. We optimize a fully-connected neural network to provide an implicit representation of the SDF value at any point in a 3D+time domain, conditioned on a learned latent code that is disentangled from the rotation of the cell shape. We demonstrate the effectiveness of this approach on cells that exhibit rapid deformations (Platynereis dumerilii), cells that grow and divide (C. elegans), and cells that have growing and branching filopodial protrusions (A549 human lung carcinoma cells). A quantitative evaluation using shape features, Hausdorff distance, and Dice similarity coefficients of real and synthetic cell shapes shows that our model can generate topologically plausible complex cell shapes in 3D+time with high similarity to real living cell shapes. Finally, we show how microscopy images of living cells that correspond to our generated cell shapes can be synthesized using an image-to-image model.

SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes

  • Authors: Yiming Gao, Yan-Pei Cao, Ying Shan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.08971
  • Pdf link: https://arxiv.org/pdf/2304.08971
  • Abstract
    Online reconstructing and rendering of large-scale indoor scenes is a long-standing challenge. SLAM-based methods can reconstruct 3D scene geometry progressively in real time but cannot render photorealistic results. While NeRF-based methods produce promising novel view synthesis results, their long offline optimization time and lack of geometric constraints pose challenges to efficiently handling online input. Inspired by the complementary advantages of classical 3D reconstruction and NeRF, we thus investigate marrying explicit geometric representation with NeRF rendering to achieve efficient online reconstruction and high-quality rendering. We introduce SurfelNeRF, a variant of neural radiance field which employs a flexible and scalable neural surfel representation to store geometric attributes and extracted appearance features from input images. We further extend the conventional surfel-based fusion scheme to progressively integrate incoming input frames into the reconstructed global neural scene representation. In addition, we propose a highly-efficient differentiable rasterization scheme for rendering neural surfel radiance fields, which helps SurfelNeRF achieve $10\times$ speedups in both training and inference time. Experimental results show that our method achieves the state-of-the-art 23.82 PSNR and 29.58 PSNR on ScanNet in feedforward inference and per-scene optimization settings, respectively.

Neural Architecture Search for Visual Anomaly Segmentation

  • Authors: Tommie Kerssies
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.08975
  • Pdf link: https://arxiv.org/pdf/2304.08975
  • Abstract
    This paper presents AutoPatch, the first application of neural architecture search to the complex task of segmenting visual anomalies. Measurement of anomaly segmentation quality is challenging due to imbalanced anomaly pixels, varying region areas, and various types of anomalies. First, the weighted average precision (wAP) metric is proposed as an alternative to AUROC and AUPRO, which does not need to be limited to a specific maximum FPR. Second, a novel neural architecture search method is proposed, which enables efficient segmentation of visual anomalies without any training. By leveraging a pre-trained supernet, a black-box optimization algorithm can directly minimize FLOPS and maximize wAP on a small validation set of anomalous examples. Finally, compelling results on the widely studied MVTec [3] dataset are presented, demonstrating that AutoPatch outperforms the current state-of-the-art method PatchCore [12] with more than 18x fewer FLOPS, using only one example per anomaly type. These results highlight the potential of automated machine learning to optimize throughput in industrial quality control. The code for AutoPatch is available at: https://github.com/tommiekerssies/AutoPatch

A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese

  • Authors: Hugo Sousa, Arian Pasquali, Alípio Jorge, Catarina Sousa Santos, Mário Amorim Lopes
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.08999
  • Pdf link: https://arxiv.org/pdf/2304.08999
  • Abstract
    Textual health records of cancer patients are usually protracted and highly unstructured, making it very time-consuming for health professionals to get a complete overview of the patient's therapeutic course. As such limitations can lead to suboptimal and/or inefficient treatment procedures, healthcare providers would greatly benefit from a system that effectively summarizes the information of those records. With the advent of deep neural models, this objective has been partially attained for English clinical texts, however, the research community still lacks an effective solution for languages with limited resources. In this paper, we present the approach we developed to extract procedures, drugs, and diseases from oncology health records written in European Portuguese. This project was conducted in collaboration with the Portuguese Institute for Oncology which, besides holding over $10$ years of duly protected medical records, also provided oncologist expertise throughout the development of the project. Since there is no annotated corpus for biomedical entity extraction in Portuguese, we also present the strategy we followed in annotating the corpus for the development of the models. The final models, which combined a neural architecture with entity linking, achieved $F_1$ scores of $88.6$, $95.0$, and $55.8$ per cent in the mention extraction of procedures, drugs, and diseases, respectively.

An Augmented Subspace Based Adaptive Proper Orthogonal Decomposition Method for Time Dependent Partial Differential Equations

  • Authors: Xiaoying Dai, Miao Hu, Jack Xin, Aihui Zhou
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.09007
  • Pdf link: https://arxiv.org/pdf/2304.09007
  • Abstract
    In this paper, we propose an augmented subspace based adaptive proper orthogonal decomposition (POD) method for solving time-dependent partial differential equations. By augmenting the POD subspace with some auxiliary modes, we obtain an augmented subspace. We use the difference between the approximation obtained in this augmented subspace and that obtained in the original POD subspace to construct an error indicator, by which we obtain a general framework for the augmented subspace based adaptive POD method. We then provide two strategies to obtain specific augmented subspaces: the random vector based augmented subspace and the coarse-grid approximations based augmented subspace. We apply our new method to two typical 3D advection-diffusion equations, with the advection being the Kolmogorov flow and the ABC flow. Numerical results show that our method is more efficient than existing adaptive POD methods, especially for advection-dominated models.

GUILGET: GUI Layout GEneration with Transformer

  • Authors: Andrey Sobolevsky, Guillaume-Alexandre Bilodeau, Jinghui Cheng, Jin L.C. Guo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09012
  • Pdf link: https://arxiv.org/pdf/2304.09012
  • Abstract
    Sketching out a Graphical User Interface (GUI) layout is part of the pipeline of designing a GUI and a crucial task for the success of a software application. Arranging all components inside a GUI layout manually is a time-consuming task. In order to assist designers, we developed a method named GUILGET to automatically generate GUI layouts from positional constraints represented as GUI arrangement graphs (GUI-AGs). The goal is to support the initial step of GUI design by producing realistic and diverse GUI layouts. Existing image layout generation techniques often cannot incorporate GUI design constraints. Thus, GUILGET needs to adapt existing techniques to generate GUI layouts that obey constraints specific to GUI designs. GUILGET is based on transformers in order to capture the semantics of relationships between elements from the GUI-AG. Moreover, the model learns constraints through the minimization of losses responsible for placing each component inside its parent layout, for not letting components overlap if they are inside the same parent, and for component alignment. Our experiments, which are conducted on the CLAY dataset, reveal that our model has the best understanding of relationships from GUI-AGs and achieves the best performance on most evaluation metrics. Therefore, our work contributes to improved GUI layout generation by proposing a novel method that effectively accounts for the constraints on GUI elements and paves the way for a more efficient GUI design pipeline.

DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables

  • Authors: Darshan C. Ganji, Saad Ashfaq, Ehsan Saboori, Sudhakar Sah, Saptarshi Mitra, MohammadHossein AskariHemmat, Alexander Hoffman, Ahmed Hassanien, Mathieu Léonardon
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09049
  • Pdf link: https://arxiv.org/pdf/2304.09049
  • Abstract
    A lot of recent progress has been made in ultra low-bit quantization, promising significant improvements in latency, memory footprint and energy consumption on edge devices. Quantization methods such as Learned Step Size Quantization can achieve model accuracy that is comparable to full-precision floating-point baselines even with sub-byte quantization. However, it is extremely challenging to deploy these ultra low-bit quantized models on mainstream CPU devices because commodity SIMD (Single Instruction, Multiple Data) hardware typically supports no less than 8-bit precision. To overcome this limitation, we propose DeepGEMM, a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware. The proposed method precomputes all possible products of weights and activations, stores them in a lookup table, and efficiently accesses them at inference time to avoid costly multiply-accumulate operations. Our 2-bit implementation outperforms corresponding 8-bit integer kernels in the QNNPACK framework by up to 1.74x on x86 platforms.
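
The core lookup-table trick is easy to sketch. A minimal illustration, assuming 2-bit unsigned weight and activation codes; the table shape, indexing scheme, and accumulation are illustrative, not DeepGEMM's SIMD implementation:

```python
# Replace multiplies with table lookups: precompute all products of the 4x4
# possible 2-bit weight/activation code pairs, then index and accumulate.
import numpy as np

BITS = 2
LEVELS = 1 << BITS  # 4 quantization levels

lut = np.array([[w * a for a in range(LEVELS)] for w in range(LEVELS)])

def lut_matmul(W_q, A_q):
    """Matrix product of quantized codes via lookups instead of multiplies."""
    return lut[W_q[:, :, None], A_q[None, :, :]].sum(axis=1)

W_q = np.random.randint(0, LEVELS, size=(8, 16))
A_q = np.random.randint(0, LEVELS, size=(16, 4))
assert np.array_equal(lut_matmul(W_q, A_q), W_q @ A_q)  # matches a real GEMM
```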

Revisiting k-NN for Pre-trained Language Models

  • Authors: Lei Li, Jing Chen, Bozhong Tian, Ningyu Zhang
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Databases (cs.DB); Information Retrieval (cs.IR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09058
  • Pdf link: https://arxiv.org/pdf/2304.09058
  • Abstract
    Pre-trained Language Models (PLMs), as parametric-based eager learners, have become the de-facto choice for current paradigms of Natural Language Processing (NLP). In contrast, k-Nearest-Neighbor (k-NN) classifiers, as the lazy learning paradigm, tend to mitigate over-fitting and isolated noise. In this paper, we revisit k-NN classifiers for augmenting the PLMs-based classifiers. From the methodological level, we propose to adopt k-NN with textual representations of PLMs in two steps: (1) Utilize k-NN as prior knowledge to calibrate the training process. (2) Linearly interpolate the probability distribution predicted by k-NN with that of the PLMs' classifier. At the heart of our approach is the implementation of k-NN-calibrated training, which treats predicted results as indicators for easy versus hard examples during the training process. From the perspective of the diversity of application scenarios, we conduct extensive experiments on fine-tuning, prompt-tuning paradigms and zero-shot, few-shot and fully-supervised settings, respectively, across eight diverse end-tasks. We hope our exploration will encourage the community to revisit the power of classical methods for efficient NLP. Code and datasets are available at https://github.com/zjunlp/Revisit-KNN.
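
The interpolation step (2) is straightforward to sketch; the mixing weight `lam` is a hypothetical hyperparameter, and the neighbour labels stand in for a real retrieval step over PLM representations:

```python
# Linearly interpolate the k-NN label distribution with the PLM classifier's
# predicted distribution, as in step (2) above.
import numpy as np

def interpolate(plm_probs, neighbour_labels, num_classes, lam=0.5):
    knn_probs = np.bincount(neighbour_labels, minlength=num_classes).astype(float)
    knn_probs /= knn_probs.sum()                  # empirical k-NN distribution
    return lam * knn_probs + (1.0 - lam) * plm_probs

plm_probs = np.array([0.7, 0.2, 0.1])             # PLM classifier output
neighbours = np.array([1, 1, 2, 1, 0])            # labels of 5 nearest neighbours
print(interpolate(plm_probs, neighbours, num_classes=3))  # ~[0.45 0.40 0.15]
```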

Always Strengthen Your Strengths: A Drift-Aware Incremental Learning Framework for CTR Prediction

  • Authors: Congcong Liu, Fei Teng, Xiwei Zhao, Zhangang Lin, Jinghe Hu, Jingping Shao
  • Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09062
  • Pdf link: https://arxiv.org/pdf/2304.09062
  • Abstract
    Click-through rate (CTR) prediction is of great importance in recommendation systems and online advertising platforms. When served in industrial scenarios, the user-generated data observed by the CTR model typically arrives as a stream. Streaming data has the characteristic that the underlying distribution drifts over time and may recur. This can lead to catastrophic forgetting if the model simply adapts to the new data distribution all the time. It is also inefficient to relearn distributions that have occurred before. Due to memory constraints and the diversity of data distributions in large-scale industrial applications, conventional strategies against catastrophic forgetting, such as replay, parameter isolation, and knowledge distillation, are difficult to deploy. In this work, we design a novel drift-aware incremental learning framework based on ensemble learning to address catastrophic forgetting in CTR prediction. With explicit error-based drift detection on streaming data, the framework further strengthens well-adapted ensembles and freezes ensembles that do not match the input distribution, avoiding catastrophic interference. Evaluations on both offline experiments and an online A/B test show that our method outperforms all considered baselines.
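
A toy sketch of the error-based drift handling described above; the threshold, the single-best selection rule, and the interfaces are illustrative assumptions, not the paper's algorithm:

```python
# On a detected jump in streaming error, freeze the ill-matched ensemble
# members and keep training only the best-fitting one.
def route_batch(errors, baseline, frozen, threshold=0.1):
    """Given per-member errors on the newest batch, decide who keeps learning."""
    mean_err = sum(errors) / len(errors)
    if baseline is not None and mean_err - baseline > threshold:
        best = min(range(len(errors)), key=errors.__getitem__)
        frozen = {i for i in range(len(errors)) if i != best}  # drift: freeze rest
    trainable = [i for i in range(len(errors)) if i not in frozen]
    return trainable, frozen, mean_err

print(route_batch([0.10, 0.11, 0.12], baseline=None, frozen=set()))  # all train
print(route_batch([0.40, 0.45, 0.15], baseline=0.11, frozen=set()))  # only member 2
```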

METAM: Goal-Oriented Data Discovery

  • Authors: Sainyam Galhotra, Yue Gong, Raul Castro Fernandez
  • Subjects: Databases (cs.DB); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09068
  • Pdf link: https://arxiv.org/pdf/2304.09068
  • Abstract
    Data is a central component of machine learning and causal inference tasks. The availability of large amounts of data from sources such as open data repositories, data lakes, and data marketplaces creates an opportunity to augment data and boost those tasks' performance. However, augmentation techniques rely on a user manually discovering and shortlisting useful candidate augmentations. Existing solutions do not leverage the synergy between discovery and augmentation, thus under-exploiting the data. In this paper, we introduce METAM, a novel goal-oriented framework that queries the downstream task with a candidate dataset, forming a feedback loop that automatically steers the discovery and augmentation process. To select candidates efficiently, METAM leverages properties of the: i) data, ii) utility function, and iii) solution set size. We show METAM's theoretical guarantees and demonstrate them empirically on a broad set of tasks. All in all, we demonstrate the promise of goal-oriented data discovery for modern data science applications.
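
The feedback loop is simple to sketch; the greedy accept rule, the budget, and the toy utility are assumptions for illustration, not METAM's actual candidate-selection strategy:

```python
# Goal-oriented discovery: query the downstream task with each candidate
# augmentation and keep the ones that improve the task's score.
def goal_oriented_discovery(base_table, candidates, train_and_score, budget=10):
    chosen, best = [], train_and_score(base_table, [])
    for cand in candidates[:budget]:
        score = train_and_score(base_table, chosen + [cand])  # feedback signal
        if score > best:
            chosen, best = chosen + [cand], score
    return chosen, best

# Toy utility: every even-numbered candidate adds 0.1 to the score.
score_fn = lambda base, sel: 0.5 + 0.1 * sum(1 for c in sel if c % 2 == 0)
print(goal_oriented_discovery(None, [1, 2, 3, 4], score_fn))  # ([2, 4], 0.7)
```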

DRIFT: A Federated Recommender System with Implicit Feedback on the Items

  • Authors: Theo Nommay
  • Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09084
  • Pdf link: https://arxiv.org/pdf/2304.09084
  • Abstract
    Nowadays, more and more items are available online, which makes it hard for users to find items that they like. Recommender systems aim to find the item that best suits the user, using their historical interactions. Depending on the context, these interactions may be more or less sensitive, and collecting them raises an important problem concerning users' privacy. Federated systems have shown that it is possible to make accurate and efficient recommendations without storing users' personal information. However, these systems use instantaneous feedback from the user. In this report, we propose DRIFT, a federated architecture for recommender systems using implicit feedback. Our learning model is based on a recent algorithm for recommendation with implicit feedback, SAROS. We aim to make recommendations as precise as SAROS without compromising users' privacy. We support this claim both with experiments and with a theoretical analysis of the convergence. We also show that the computation time is linear in the number of interactions made. Finally, we show that our algorithm is secure: participants in our federated system cannot guess the interactions made by the user, except for DOs that hold the item involved in the interaction.

Balancing Unobserved Confounding with a Few Unbiased Ratings in Debiased Recommendations

  • Authors: Haoxuan Li, Yanghao Xiao, Chunyuan Zheng, Peng Wu
  • Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09085
  • Pdf link: https://arxiv.org/pdf/2304.09085
  • Abstract
    Recommender systems are seen as an effective tool to address information overload, but it is widely known that the presence of various biases makes direct training on large-scale observational data result in sub-optimal prediction performance. In contrast, unbiased ratings obtained from randomized controlled trials or A/B tests are considered to be the gold standard, but are costly and small in scale in reality. To exploit both types of data, recent works proposed to use unbiased ratings to correct the parameters of the propensity or imputation models trained on the biased dataset. However, the existing methods fail to obtain accurate predictions in the presence of unobserved confounding or model misspecification. In this paper, we propose a theoretically guaranteed model-agnostic balancing approach that can be applied to any existing debiasing method with the aim of combating unobserved confounding and model misspecification. The proposed approach makes full use of unbiased data by alternatively correcting model parameters learned with biased data, and adaptively learning balance coefficients of biased samples for further debiasing. Extensive real-world experiments are conducted along with the deployment of our proposal on four representative debiasing methods to demonstrate the effectiveness.

MATURE-HEALTH: HEALTH Recommender System for MAndatory FeaTURE choices

  • Authors: Ritu Shandilya, Sugam Sharma, Johnny Wong
  • Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09099
  • Pdf link: https://arxiv.org/pdf/2304.09099
  • Abstract
    Balancing electrolytes is of utmost importance for the appropriate functioning of organs in the human body, as an electrolyte imbalance can be an indication of developing underlying pathophysiology. Efficient monitoring of electrolyte imbalance not only increases the chances of early disease detection, but also prevents further deterioration of health through a strictly followed, nutrient-controlled diet after detection. In this research, a recommender system, MATURE Health, is proposed and implemented, which predicts the imbalance of mandatory electrolytes and other substances present in the blood and recommends food items with balanced nutrients to avoid the occurrence of electrolyte imbalance. The proposed model takes the user's most recent laboratory results and daily food intake into account to predict electrolyte imbalance. MATURE Health relies on the MATURE Food algorithm to recommend food items, as the latter recommends only those food items that satisfy all mandatory nutrient requirements while also considering the user's past food preferences. To validate the proposed method, sodium, potassium, and BUN levels in particular were predicted with the Random Forest prediction algorithm for dialysis patients, using their laboratory report history and daily food intake. The proposed model demonstrates 99.53, 96.94, and 95.35 percent accuracy for sodium, potassium, and BUN, respectively. MATURE Health is a novel health recommender system that implements machine learning models to predict the imbalance of mandatory electrolytes and other substances in the blood and recommends food items which contain the required amount of nutrients to prevent, or at least reduce, the risk of electrolyte imbalance.

LaSNN: Layer-wise ANN-to-SNN Distillation for Effective and Efficient Training in Deep Spiking Neural Networks

  • Authors: Di Hong, Jiangrong Shen, Yu Qi, Yueming Wang
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09101
  • Pdf link: https://arxiv.org/pdf/2304.09101
  • Abstract
    Spiking Neural Networks (SNNs) are biologically realistic and practically promising for low-power computation because of their event-driven mechanism. Usually, the training of SNNs suffers accuracy loss on various tasks, yielding inferior performance compared with ANNs. Conversion schemes obtain competitive accuracy by mapping trained ANNs' parameters to SNNs with the same structure. However, an enormous number of time steps are required for these converted SNNs, forfeiting the energy-efficiency benefit. Utilizing both the accuracy advantage of ANNs and the computing efficiency of SNNs, we propose a novel SNN training framework, namely layer-wise ANN-to-SNN knowledge distillation (LaSNN). In order to achieve competitive accuracy and reduced inference latency, LaSNN transfers the learning from a well-trained ANN to a small SNN by distilling knowledge rather than converting the parameters of the ANN. The information gap between the heterogeneous ANN and SNN is bridged by introducing an attention scheme; the knowledge in the ANN is effectively compressed and then efficiently transferred by our layer-wise distillation paradigm. We conduct detailed experiments that demonstrate the effectiveness, efficacy, and scalability of LaSNN on three benchmark data sets (CIFAR-10, CIFAR-100, and Tiny ImageNet). We achieve competitive top-1 accuracy compared to ANNs and 20x faster inference than converted SNNs with similar performance. More importantly, LaSNN is dexterous and extensible, and can be effortlessly applied to SNNs with different architectures/depths and input encoding methods, contributing to their potential development.
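
A rough sketch of a layer-wise, attention-based distillation loss in the spirit described above; the attention-map definition and L2 matching are assumptions borrowed from standard attention transfer, not LaSNN's exact formulation:

```python
# Match per-layer attention maps of an ANN teacher against spike-rate maps of
# an SNN student, summed over layers.
import numpy as np

def attention_map(feat):
    """Collapse a (C, H, W) feature tensor to a normalized (H, W) map."""
    amap = (feat ** 2).sum(axis=0)
    return amap / (np.linalg.norm(amap) + 1e-8)

def layerwise_distill_loss(ann_feats, snn_rates):
    return sum(
        np.sum((attention_map(a) - attention_map(s)) ** 2)
        for a, s in zip(ann_feats, snn_rates)
    )

ann = [np.random.rand(16, 8, 8) for _ in range(3)]  # teacher features, 3 layers
snn = [np.random.rand(16, 8, 8) for _ in range(3)]  # student firing rates
print(layerwise_distill_loss(ann, snn))
```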

Fast Neural Scene Flow

  • Authors: Xueqian Li, Jianqiao Zheng, Francesco Ferroni, Jhony Kaesemodel Pontes, Simon Lucey
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09121
  • Pdf link: https://arxiv.org/pdf/2304.09121
  • Abstract
    Scene flow is an important problem as it provides low-level motion cues for many downstream tasks. State-of-the-art learning methods are usually fast and can achieve impressive performance on in-domain data, but usually fail to generalize to out-of-distribution (OOD) data or to handle dense point clouds. In this paper, we focus on a runtime optimization-based neural scene flow pipeline, with applications such as the densification of lidar; its major drawback is the extensive computation time. We identify that the common speedup strategy for coordinate-network architectures, effective for image reconstruction, has little effect on scene flow acceleration. The dominant computational burden stems instead from the Chamfer loss function, so we propose a distance transform-based loss function to accelerate the pipeline, achieving up to a 30x speedup and on-par estimation performance compared to NSFP. When tested on 8k points, it is as efficient as leading learning methods, achieving real-time performance.
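
The acceleration idea, replacing a per-point nearest-neighbour search with a precomputed distance-transform lookup, can be sketched in 2D; the grid size, domain, and scipy-based transform are illustrative assumptions:

```python
# Precompute, once, the distance from every grid cell to the nearest target
# point; the loss then costs one table lookup per point instead of an O(N)
# nearest-neighbour search (the Chamfer bottleneck).
import numpy as np
from scipy.ndimage import distance_transform_edt

GRID, CELL = 64, 1.0 / 64
target = np.random.rand(500, 2)                   # "target" cloud in [0, 1)^2

occ = np.ones((GRID, GRID), dtype=bool)
idx = np.minimum((target / CELL).astype(int), GRID - 1)
occ[idx[:, 0], idx[:, 1]] = False                 # mark occupied cells
dt = distance_transform_edt(occ) * CELL           # distance field, in world units

def dt_loss(points):
    ij = np.clip((points / CELL).astype(int), 0, GRID - 1)
    return dt[ij[:, 0], ij[:, 1]].mean()

print(dt_loss(np.random.rand(300, 2)))
```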

Keyword: faster

Agent-Based Modeling and its Tradeoffs: An Introduction & Examples

  • Authors: G. Wade McDonald, Nathaniel D. Osgood
  • Subjects: Multiagent Systems (cs.MA); Computational Engineering, Finance, and Science (cs.CE)
  • Arxiv link: https://arxiv.org/abs/2304.08497
  • Pdf link: https://arxiv.org/pdf/2304.08497
  • Abstract
    Agent-based modeling is a computational dynamic modeling technique that may be less familiar to some readers. Agent-based modeling seeks to understand the behaviour of complex systems by situating agents in an environment and studying the emergent outcomes of agent-agent and agent-environment interactions. In comparison with compartmental models, agent-based models offer simpler, more scalable and flexible representation of heterogeneity, the ability to capture dynamic and static network and spatial context, and the ability to consider history of individuals within the model. In contrast, compartmental models offer faster development time with less programming required, lower computational requirements that do not scale with population, and the option for concise mathematical formulation with ordinary, delay or stochastic differential equations supporting derivation of properties of the system behaviour. In this chapter, basic characteristics of agent-based models are introduced, advantages and disadvantages of agent-based models, as compared with compartmental models, are discussed, and two example agent-based infectious disease models are reviewed.

Hybrid Materialization in a Disk-Based Column-Store

  • Authors: Evgeniy Klyuchikov, Elena Mikhailova, George Chernishev
  • Subjects: Databases (cs.DB); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.08532
  • Pdf link: https://arxiv.org/pdf/2304.08532
  • Abstract
    In column-oriented query processing, a materialization strategy determines when lightweight positions (row IDs) are translated into tuples. It is an important part of column-store architecture, since it defines the class of supported query plans, and, therefore, impacts the overall system performance. In this paper we continue investigating materialization strategies for a distributed disk-based column-store. We start with demonstrating cases when existing approaches impose fundamental limitations on the resulting system performance. Then, in order to address them, we propose a new hybrid materialization model. The main feature of hybrid materialization is the ability to manipulate both positions and values at the same time. This way, the query engine can flexibly combine advantages of all the existing strategies and support a new class of query plans. Moreover, hybrid materialization allows the query engine to flexibly customize the materialization policy of individual attributes. We describe our vision of how hybrid materialization can be implemented in a columnar system. As an example, we use PosDB -- a distributed, disk-based column-store. We present the necessary data structures, the internals of a hybrid operator, and describe the algebra of such operators. Based on this implementation, we evaluate the performance of late, ultra-late, and hybrid materialization strategies in several scenarios based on TPC-H queries. Our experiments demonstrate that hybrid materialization is almost two times faster than its counterparts, while providing a more flexible query model.

Stochastic Subgraph Neighborhood Pooling for Subgraph Classification

  • Authors: Shweta Ann Jacob, Paul Louis, Amirali Salehi-Abari
  • Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.08556
  • Pdf link: https://arxiv.org/pdf/2304.08556
  • Abstract
    Subgraph classification is an emerging field in graph representation learning where the task is to classify a group of nodes (i.e., a subgraph) within a graph. Subgraph classification has applications such as predicting the cellular function of a group of proteins or identifying rare diseases given a collection of phenotypes. Graph neural networks (GNNs) are the de facto solution for node, link, and graph-level tasks but fail to perform well on subgraph classification tasks. Even GNNs tailored for graph classification are not directly transferable to subgraph classification as they ignore the external topology of the subgraph, thus failing to capture how the subgraph is located within the larger graph. The current state-of-the-art models for subgraph classification address this shortcoming through either labeling tricks or multiple message-passing channels, both of which impose a computation burden and are not scalable to large graphs. To address the scalability issue while maintaining generalization, we propose Stochastic Subgraph Neighborhood Pooling (SSNP), which jointly aggregates the subgraph and its neighborhood (i.e., external topology) information without any computationally expensive operations such as labeling tricks. To improve scalability and generalization further, we also propose a simple data augmentation pre-processing step for SSNP that creates multiple sparse views of the subgraph neighborhood. We show that our model is more expressive than GNNs without labeling tricks. Our extensive experiments demonstrate that our models outperform current state-of-the-art methods (with a margin of up to 2%) while being up to 3X faster in training.
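
A toy sketch of the pooling idea, assuming node embeddings from some upstream GNN; the ring-like adjacency and the sample size are illustrative stand-ins:

```python
# Jointly average embeddings of the subgraph nodes and a random sample of
# their external neighbourhood, so external topology informs the classifier
# without labeling tricks.
import numpy as np

rng = np.random.default_rng(1)
emb = rng.normal(size=(100, 32))                   # node embeddings from a GNN
adj = {i: list(range(max(0, i - 3), min(100, i + 4))) for i in range(100)}

def ssnp_pool(subgraph_nodes, sample_size=10):
    hood = {n for v in subgraph_nodes for n in adj[v]} - set(subgraph_nodes)
    sampled = rng.choice(list(hood), size=min(sample_size, len(hood)), replace=False)
    nodes = list(subgraph_nodes) + list(sampled)   # subgraph + stochastic hood
    return emb[nodes].mean(axis=0)                 # one vector for the classifier

print(ssnp_pool([10, 11, 12]).shape)               # (32,)
```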

LaSNN: Layer-wise ANN-to-SNN Distillation for Effective and Efficient Training in Deep Spiking Neural Networks

  • Authors: Di Hong, Jiangrong Shen, Yu Qi, Yueming Wang
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09101
  • Pdf link: https://arxiv.org/pdf/2304.09101
  • Abstract
    Spiking Neural Networks (SNNs) are biologically realistic and practically promising for low-power computation because of their event-driven mechanism. Usually, the training of SNNs suffers accuracy loss on various tasks, yielding inferior performance compared with ANNs. Conversion schemes obtain competitive accuracy by mapping trained ANNs' parameters to SNNs with the same structure. However, an enormous number of time steps are required for these converted SNNs, forfeiting the energy-efficiency benefit. Utilizing both the accuracy advantage of ANNs and the computing efficiency of SNNs, we propose a novel SNN training framework, namely layer-wise ANN-to-SNN knowledge distillation (LaSNN). In order to achieve competitive accuracy and reduced inference latency, LaSNN transfers the learning from a well-trained ANN to a small SNN by distilling knowledge rather than converting the parameters of the ANN. The information gap between the heterogeneous ANN and SNN is bridged by introducing an attention scheme; the knowledge in the ANN is effectively compressed and then efficiently transferred by our layer-wise distillation paradigm. We conduct detailed experiments that demonstrate the effectiveness, efficacy, and scalability of LaSNN on three benchmark data sets (CIFAR-10, CIFAR-100, and Tiny ImageNet). We achieve competitive top-1 accuracy compared to ANNs and 20x faster inference than converted SNNs with similar performance. More importantly, LaSNN is dexterous and extensible, and can be effortlessly applied to SNNs with different architectures/depths and input encoding methods, contributing to their potential development.

Keyword: mobile

Coordinated Multi-Agent Reinforcement Learning for Unmanned Aerial Vehicle Swarms in Autonomous Mobile Access Applications

  • Authors: Chanyoung Park, Haemin Lee, Won Joon Yun, Soyi Jung, Joongheon Kim
  • Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.08493
  • Pdf link: https://arxiv.org/pdf/2304.08493
  • Abstract
    This paper proposes a novel centralized training and distributed execution (CTDE)-based multi-agent deep reinforcement learning (MADRL) method for multiple unmanned aerial vehicles (UAVs) control in autonomous mobile access applications. For this purpose, a single neural network is utilized in centralized training for cooperation among multiple agents while maximizing the total quality of service (QoS) in mobile access applications.

Safe Navigation and Obstacle Avoidance Using Differentiable Optimization Based Control Barrier Functions

  • Authors: Bolun Dai, Rooholla Khorrambakht, Prashanth Krishnamurthy, Vinícius Gonçalves, Anthony Tzes, Farshad Khorrami
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.08586
  • Pdf link: https://arxiv.org/pdf/2304.08586
  • Abstract
    Control barrier functions (CBFs) have been widely applied to safety-critical robotic applications. However, the construction of control barrier functions for robotic systems remains a challenging task. Recently, collision detection using differentiable optimization has provided a way to compute the minimum uniform scaling factor that results in an intersection between two convex shapes and to also compute the Jacobian of the scaling factor. In this paper, we propose a framework that uses this scaling factor, with an offset, to systematically define a CBF for obstacle avoidance tasks. We provide a theoretical analysis that proves the continuity of the proposed CBF. Empirically, we show that the proposed CBF is continuously differentiable, and the resulting optimal control problem is computationally efficient, which makes it applicable for real-time robotic control. We validate our approach, first using a 2D mobile robot example, then on the Franka-Emika Research 3 (FR3) robot manipulator both in simulation and experiment.
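
A toy CBF safety filter for a 2D single integrator illustrates the mechanics; the circular-obstacle barrier $h(x) = \lVert x - c \rVert^2 - r^2$ is a stand-in for the paper's scaling-factor-based CBF:

```python
# Enforce h_dot >= -alpha * h on x_dot = u. With a single linear constraint,
# the CBF-QP has the closed-form projection used below.
import numpy as np

c, r, alpha = np.array([1.0, 0.0]), 0.5, 2.0

def safe_input(x, u_nom):
    h = np.dot(x - c, x - c) - r**2               # barrier value
    grad_h = 2.0 * (x - c)
    slack = grad_h @ u_nom + alpha * h            # constraint residual
    if slack >= 0:
        return u_nom                              # nominal input already safe
    return u_nom - (slack / (grad_h @ grad_h)) * grad_h

x = np.array([0.2, 0.05])
print(safe_input(x, u_nom=np.array([1.0, 0.0])))  # steered away from the obstacle
```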

Graceful User Following for Mobile Balance Assistive Robot in Daily Activities Assistance

  • Authors: Yifan Wang, Meng Yuan, Lei Li, Karen Sui Geok Chua, Seng Kwee Wee, Wei Tech Ang
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.08695
  • Pdf link: https://arxiv.org/pdf/2304.08695
  • Abstract
    Numerous diseases and aging can cause degeneration of people's balance ability, resulting in limited mobility and even a high risk of falls. Robotic technologies can provide more intensive rehabilitation exercises or be used as assistive devices to compensate for balance ability. However, with the new healthcare paradigm shifting from hospital care to home care, there is a gap in robotic systems that can provide care at home. This paper introduces Mobile Robotic Balance Assistant (MRBA), a compact and cost-effective balance assistive robot that can provide both rehabilitation training and activities of daily living (ADLs) assistance at home. A three degrees of freedom (3-DoF) robotic arm was designed to mimic the function of a therapist's arm to provide balance assistance to the user. To minimize the interference to users' natural pelvis movements and gait patterns, the robot must have a Human-Robot Interface (HRI) that can detect user intention accurately and follow the user's movement smoothly and in a timely manner. Thus, we propose a graceful user-following control rule. The overall control architecture consists of two parts: an observer for human input estimation and an LQR-based controller with disturbance rejection. The proposed controller is validated in high-fidelity simulation with actual human trajectories, and the results successfully show the effectiveness of the method in different walking modes.
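
A minimal sketch of an LQR gain for a following task like the one described, assuming a double-integrator error model; the matrices and weights are illustrative, not MRBA's identified dynamics:

```python
# Solve the continuous-time algebraic Riccati equation for u = -K x, where
# x = [position error, velocity error] between robot and user.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([10.0, 1.0])          # penalize position error most
R = np.array([[0.1]])

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.inv(R) @ B.T @ P
print("LQR gain:", K)
```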

AoI-Delay Tradeoff in Mobile Edge Caching: A Mixed-Order Drift-Plus-Penalty Algorithm

  • Authors: Ran Li, Chuan Huang, Xiaoqi Qin
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.08781
  • Pdf link: https://arxiv.org/pdf/2304.08781
  • Abstract
    We consider a scheduling problem in a Mobile Edge Caching (MEC) network, where a base station (BS) uploads messages from multiple source nodes (SNs) and transmits them to mobile users (MUs) via downlinks, aiming to jointly optimize the average service Age of Information (AoI) and service delay over MUs. This problem is formulated as a difficult sequential decision-making problem with discrete-valued and linearly-constrained design variables. To solve this problem, we first approximate its achievable region by characterizing its superset and subset. The superset is derived based on the rate stability theorem, while the subset is obtained using a novel stochastic policy. We also validate that this subset is substantially identical to the achievable region when the number of scheduling resources is large. Additionally, we propose a sufficient condition to check the existence of a solution to the problem. Then, we propose the mixed-order drift-plus-penalty algorithm that uses a dynamic programming (DP) method to optimize the summation over a linear and quadratic Lyapunov drift and a penalty term, to handle the product term over different queue backlogs in the objective function. Finally, by associating the proposed algorithm with the stochastic policy, we demonstrate that it achieves an $O(1/V)$ versus $O(V)$ tradeoff for the average AoI and average delay.
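
For context, the classical linear-drift version of the rule the paper extends chooses, in each slot, the action minimizing V times the penalty plus the queue-weighted drift. The sketch below shows that baseline with made-up queues and actions; the paper's mixed-order variant adds a quadratic drift term and a dynamic programming solver, neither of which is shown.

```python
import numpy as np

def drift_plus_penalty_step(Q, actions, V):
    """One slot of the classic drift-plus-penalty rule: choose the action
    minimizing V * penalty + sum_i Q_i * (arrivals_i - services_i),
    then update the virtual queues."""
    def score(a):
        return V * a["penalty"] + Q @ (a["arrivals"] - a["services"])
    best = min(actions, key=score)
    Q_next = np.maximum(Q + best["arrivals"] - best["services"], 0.0)
    return best, Q_next

# Two virtual queues, two candidate schedules per slot (numbers are made up).
Q = np.array([3.0, 1.0])
actions = [
    {"penalty": 2.0, "arrivals": np.array([1.0, 1.0]), "services": np.array([2.0, 0.0])},
    {"penalty": 0.5, "arrivals": np.array([1.0, 1.0]), "services": np.array([0.0, 2.0])},
]
chosen, Q = drift_plus_penalty_step(Q, actions, V=1.0)
print(chosen["penalty"], Q)   # a larger V favors low penalty over queue stability
```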

Sparks of GPTs in Edge Intelligence for Metaverse: Caching and Inference for Mobile AIGC Services

  • Authors: Minrui Xu, Dusit Niyato, Hongliang Zhang, Jiawen Kang, Zehui Xiong, Shiwen Mao, Zhu Han
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.08782
  • Pdf link: https://arxiv.org/pdf/2304.08782
  • Abstract
    Aiming at achieving artificial general intelligence (AGI) for Metaverse, pretrained foundation models (PFMs), e.g., generative pretrained transformers (GPTs), can effectively provide various AI services, such as autonomous driving, digital twins, and AI-generated content (AIGC) for extended reality. With the advantages of low latency and privacy preservation, serving PFMs for mobile AI services in edge intelligence is a viable solution for caching and executing PFMs on edge servers with limited computing resources and GPU memory. However, PFMs typically consist of billions of parameters that are computation- and memory-intensive for edge servers during loading and execution. In this article, we investigate edge PFM serving problems for mobile AIGC services of Metaverse. First, we introduce the fundamentals of PFMs and discuss their characteristic fine-tuning and inference methods in edge intelligence. Then, we propose a novel framework of joint model caching and inference for managing models and allocating resources to satisfy users' requests efficiently. Furthermore, considering the in-context learning ability of PFMs, we propose a new metric to evaluate the freshness and relevance between examples in demonstrations and executing tasks, namely the Age of Context (AoC). Finally, we propose a least context algorithm for managing cached models at edge servers by balancing the tradeoff among latency, energy consumption, and accuracy.
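
A speculative sketch of what a least-context eviction rule could look like, with a toy Age-of-Context taken as the mean staleness of a model's in-context examples; the paper's actual AoC metric and least context algorithm balance latency, energy, and accuracy, and every field and scoring choice below is an assumption for illustration.

```python
import time
from dataclasses import dataclass, field

@dataclass
class CachedModel:
    name: str
    hits: int = 0
    context_ts: list = field(default_factory=list)  # timestamps of demo examples

    def age_of_context(self, now):
        """Toy Age-of-Context: mean staleness of the in-context examples."""
        if not self.context_ts:
            return float("inf")
        return sum(now - t for t in self.context_ts) / len(self.context_ts)

def evict_least_context(cache, now):
    """Evict the cached model whose context is stalest per cache hit."""
    victim = max(cache, key=lambda m: m.age_of_context(now) / (1 + m.hits))
    cache.remove(victim)
    return victim

cache = [CachedModel("pfm-small", hits=10, context_ts=[time.time() - 5]),
         CachedModel("pfm-large", hits=1, context_ts=[time.time() - 500])]
print(evict_least_context(cache, time.time()).name)  # pfm-large
```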

Full-Duplex Wireless for 6G: Progress Brings New Opportunities and Challenges

  • Authors: Besma Smida, Ashutosh Sabharwal, Gabor Fodor, George C. Alexandropoulos, Himal A. Suraweera, Chan-Byoung Chae
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.08789
  • Pdf link: https://arxiv.org/pdf/2304.08789
  • Abstract
    The use of in-band full-duplex (FD) enables nodes to simultaneously transmit and receive on the same frequency band, which challenges the traditional half-duplex assumption in wireless network design. The full-duplex capability enhances spectral efficiency and decreases latency, which are two key drivers pushing the performance expectations of next-generation mobile networks. In less than ten years, in-band FD has advanced from being demonstrated in research labs to being implemented in standards and products, presenting new opportunities to utilize its foundational concepts. Some of the most significant opportunities include using FD to enable wireless networks to sense the physical environment, integrate sensing and communication applications, develop integrated access and backhaul solutions, and work with smart signal propagation environments powered by reconfigurable intelligent surfaces. However, these new opportunities also come with new challenges for large-scale commercial deployment of FD technology, such as managing self-interference, combating cross-link interference in multi-cell networks, and coexistence of dynamic time division duplex, subband FD and FD networks.

Event Camera and LiDAR based Human Tracking for Adverse Lighting Conditions in Subterranean Environments

  • Authors: Mario A.V. Saucedo, Akash Patel, Rucha Sawlekar, Akshit Saradagi, Christoforos Kanellakis, Ali-Akbar Agha-Mohammadi, George Nikolakopoulos
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.08908
  • Pdf link: https://arxiv.org/pdf/2304.08908
  • Abstract
    In this article, we propose a novel LiDAR and event camera fusion modality for subterranean (SubT) environments for fast and precise object and human detection in a wide variety of adverse lighting conditions, such as low or no light, high-contrast zones and in the presence of blinding light sources. In the proposed approach, information from the event camera and LiDAR are fused to localize a human or an object-of-interest in a robot's local frame. The local detection is then transformed into the inertial frame and used to set references for a Nonlinear Model Predictive Controller (NMPC) for reactive tracking of humans or objects in SubT environments. The proposed novel fusion uses intensity filtering and K-means clustering on the LiDAR point cloud and frequency filtering and connectivity clustering on the events induced in an event camera by the returning LiDAR beams. The centroids of the clusters in the event camera and LiDAR streams are then paired to localize reflective markers present on safety vests and signs in SubT environments. The efficacy of the proposed scheme has been experimentally validated in a real SubT environment (a mine) with a Pioneer 3AT mobile robot. The experimental results show real-time performance for human detection and the NMPC-based controller allows for reactive tracking of a human or object of interest, even in complete darkness.
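
A compact sketch of the centroid-pairing stage, assuming the intensity/frequency filtering has already been applied and the LiDAR centroids have been projected into the image plane; KMeans stands in for both clustering steps, and the greedy pairing assumes equal cluster counts in the two streams.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_centroids(points, k):
    """Centroids of k clusters (stand-in for the paper's two clustering stages)."""
    return KMeans(n_clusters=k, n_init=10).fit(points).cluster_centers_

def pair_centroids(lidar_uv, event_uv):
    """Greedy nearest-neighbour pairing of projected LiDAR centroids with
    event-camera centroids, both in pixel coordinates (equal counts assumed)."""
    pairs, remaining = [], list(range(len(event_uv)))
    for i, c in enumerate(lidar_uv):
        j = min(remaining, key=lambda j: np.linalg.norm(c - event_uv[j]))
        pairs.append((i, j))
        remaining.remove(j)
    return pairs

rng = np.random.default_rng(0)
lidar_uv = cluster_centroids(rng.random((200, 2)), k=3)
event_uv = cluster_centroids(rng.random((300, 2)), k=3)
print(pair_centroids(lidar_uv, event_uv))
```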

Continuous-Time Range-Only Pose Estimation

  • Authors: Abhishek Goudar, Timothy D. Barfoot, Angela P. Schoellig
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.09043
  • Pdf link: https://arxiv.org/pdf/2304.09043
  • Abstract
    Range-only (RO) localization involves determining the position of a mobile robot by measuring the distance to specific anchors. RO localization is challenging since the measurements are low-dimensional and a single range sensor does not have enough information to estimate the full pose of the robot. As such, range sensors are typically coupled with other sensing modalities such as wheel encoders or inertial measurement units (IMUs) to estimate the full pose. In this work, we propose a continuous-time Gaussian process (GP)-based trajectory estimation method to estimate the full pose of a robot using only range measurements from multiple range sensors. Results from simulation and real experiments show that our proposed method, using off-the-shelf range sensors, is able to achieve comparable performance and in some cases outperform alternative state-of-the-art sensor-fusion methods that use additional sensing modalities.

Designing the mobile robot Kevin for a life science laboratory

  • Authors: Sarah Kleine-Wechelmann, Kim Bastiaanse, Matthias Freundel, Christian Becker-Asano
  • Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.09090
  • Pdf link: https://arxiv.org/pdf/2304.09090
  • Abstract
    Laboratories are being increasingly automated. In small laboratories individual processes can be fully automated, but this is usually not economically viable. Nevertheless, individual process steps can be performed by flexible, mobile robots to relieve the laboratory staff. As a contribution to the requirements of a life science laboratory, the mobile, dextrous robot Kevin was designed by the Fraunhofer IPA research institute in Stuttgart, Germany. Kevin is a mobile service robot which is able to fulfill non-value adding activities such as transportation of labware. This paper gives an overview of Kevin's functionalities and its development process, and presents a preliminary study on how its lights and sounds improve user interaction.

Keyword: pruning

CyFormer: Accurate State-of-Health Prediction of Lithium-Ion Batteries via Cyclic Attention

  • Authors: Zhiqiang Nie, Jiankun Zhao, Qicheng Li, Yong Qin
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.08502
  • Pdf link: https://arxiv.org/pdf/2304.08502
  • Abstract
    Predicting the State-of-Health (SoH) of lithium-ion batteries is a fundamental task of battery management systems on electric vehicles. It aims at estimating future SoH based on historical aging data. Most existing deep learning methods rely on filter-based feature extractors (e.g., CNN or Kalman filters) and recurrent time sequence models. Though efficient, they generally ignore cyclic features and the domain gap between training and testing batteries. To address this problem, we present CyFormer, a transformer-based cyclic time sequence model for SoH prediction. Instead of the conventional CNN-RNN structure, we adopt an encoder-decoder architecture. In the encoder, row-wise and column-wise attention blocks effectively capture intra-cycle and inter-cycle connections and extract cyclic features. In the decoder, the SoH queries cross-attend to these features to form the final predictions. We further utilize a transfer learning strategy to narrow the domain gap between the training and testing set. To be specific, we use fine-tuning to shift the model to a target working condition. Finally, we make our model more efficient through pruning. Experiments show that our method attains an MAE of 0.75% with only 10% of the data for fine-tuning on a testing battery, surpassing prior methods by a large margin. Effective and robust, our method provides a potential solution for all cyclic time sequence prediction tasks.

Keyword: voxel

Generative modeling of living cells with SO(3)-equivariant implicit neural representations

  • Authors: David Wiesner, Julian Suk, Sven Dummer, Tereza Nečasová, Vladimír Ulman, David Svoboda, Jelmer M. Wolterink
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM)
  • Arxiv link: https://arxiv.org/abs/2304.08960
  • Pdf link: https://arxiv.org/pdf/2304.08960
  • Abstract
    Data-driven cell tracking and segmentation methods in biomedical imaging require diverse and information-rich training data. In cases where the number of training samples is limited, synthetic computer-generated data sets can be used to improve these methods. This requires the synthesis of cell shapes as well as corresponding microscopy images using generative models. To synthesize realistic living cell shapes, the shape representation used by the generative model should be able to accurately represent fine details and changes in topology, which are common in cells. These requirements are not met by 3D voxel masks, which are restricted in resolution, and polygon meshes, which do not easily model processes like cell growth and mitosis. In this work, we propose to represent living cell shapes as level sets of signed distance functions (SDFs) which are estimated by neural networks. We optimize a fully-connected neural network to provide an implicit representation of the SDF value at any point in a 3D+time domain, conditioned on a learned latent code that is disentangled from the rotation of the cell shape. We demonstrate the effectiveness of this approach on cells that exhibit rapid deformations (Platynereis dumerilii), cells that grow and divide (C. elegans), and cells that have growing and branching filopodial protrusions (A549 human lung carcinoma cells). A quantitative evaluation using shape features, Hausdorff distance, and Dice similarity coefficients of real and synthetic cell shapes shows that our model can generate topologically plausible complex cell shapes in 3D+time with high similarity to real living cell shapes. Finally, we show how microscopy images of living cells that correspond to our generated cell shapes can be synthesized using an image-to-image model.

Unsupervised Semantic Segmentation of 3D Point Clouds via Cross-modal Distillation and Super-Voxel Clustering

  • Authors: Zisheng Chen, Hongbin Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.08965
  • Pdf link: https://arxiv.org/pdf/2304.08965
  • Abstract
    Semantic segmentation of point clouds usually requires exhaustive human annotation effort, hence it attracts wide attention to the challenging topic of learning from unlabeled or weaker forms of annotation. In this paper, we make the first attempt at fully unsupervised semantic segmentation of point clouds, which aims to delineate semantically meaningful objects without any form of annotation. Previous unsupervised pipelines for 2D images fail on point clouds due to: 1) Clustering Ambiguity caused by the limited amount of data and imbalanced class distribution; 2) Irregularity Ambiguity caused by the irregular sparsity of point clouds. Therefore, we propose a novel framework, PointDC, which comprises two steps that handle the aforementioned problems respectively: Cross-Modal Distillation (CMD) and Super-Voxel Clustering (SVC). In the first stage of CMD, multi-view visual features are back-projected into 3D space and aggregated into a unified point feature to distill the training of the point representation. In the second stage of SVC, the point features are aggregated into super-voxels and then fed to an iterative clustering process to excavate semantic classes. PointDC yields a significant improvement over prior state-of-the-art unsupervised methods on both the ScanNet-v2 (+18.4 mIoU) and S3DIS (+11.5 mIoU) semantic segmentation benchmarks.

Keyword: lidar

PALF: Pre-Annotation and Camera-LiDAR Late Fusion for the Easy Annotation of Point Clouds

  • Authors: Yucheng Zhang, Masaki Fukuda, Yasunori Ishii, Kyoko Ohshima, Takayoshi Yamashita
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.08591
  • Pdf link: https://arxiv.org/pdf/2304.08591
  • Abstract
    3D object detection has become indispensable in the field of autonomous driving. To date, gratifying breakthroughs have been recorded in 3D object detection research, attributed to deep learning. However, deep learning algorithms are data-driven and require large amounts of annotated point cloud data for training and evaluation. Unlike 2D image labels, annotating point cloud data is difficult due to the limitations of sparsity, irregularity, and low resolution, which require more manual work, and the annotation efficiency is much lower than for 2D images. Therefore, we propose an annotation algorithm for point cloud data that combines pre-annotation with a camera-LiDAR late fusion algorithm for easy and accurate annotation. The contributions of this study are as follows. We propose (1) a pre-annotation algorithm that employs 3D object detection and auto fitting for the easy annotation of point clouds, (2) a camera-LiDAR late fusion algorithm using 2D and 3D results for easy error checking, which helps annotators identify missing objects, and (3) a point cloud annotation evaluation pipeline to evaluate our experiments. The experimental results show that the proposed algorithm improves the annotating speed by 6.5 times and the annotation quality in terms of the 3D Intersection over Union and precision by 8.2 points and 5.6 points, respectively; additionally, the miss rate is reduced by 31.9 points.

(LC)$^2$: LiDAR-Camera Loop Constraints For Cross-Modal Place Recognition

  • Authors: Alex Junho Lee, Seungwon Song, Hyungtae Lim, Woojoo Lee, Hyun Myung
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.08660
  • Pdf link: https://arxiv.org/pdf/2304.08660
  • Abstract
    Localization has been a challenging task for autonomous navigation. A loop detection algorithm must overcome environmental changes for the place recognition and re-localization of robots. Therefore, deep learning has been extensively studied for the consistent transformation of measurements into localization descriptors. Street view images are easily accessible; however, images are vulnerable to appearance changes. LiDAR can robustly provide precise structural information. However, constructing a point cloud database is expensive, and point clouds exist only in limited places. Different from previous works that train networks to directly produce a shared embedding between the 2D image and 3D point cloud, we transform both data modalities into 2.5D depth images for matching. In this work, we propose a novel cross-matching method, called (LC)$^2$, for achieving LiDAR localization without a prior point cloud map. To this end, LiDAR measurements are expressed in the form of range images before matching them to reduce the modality discrepancy. Subsequently, the network is trained to extract localization descriptors from disparity and range images. Next, the best matches are employed as a loop factor in a pose graph. Using public datasets that include multiple sessions in significantly different lighting conditions, we demonstrated that LiDAR-based navigation systems could be optimized from image databases and vice versa.

Event Camera and LiDAR based Human Tracking for Adverse Lighting Conditions in Subterranean Environments

  • Authors: Mario A.V. Saucedo, Akash Patel, Rucha Sawlekar, Akshit Saradagi, Christoforos Kanellakis, Ali-Akbar Agha-Mohammadi, George Nikolakopoulos
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.08908
  • Pdf link: https://arxiv.org/pdf/2304.08908
  • Abstract
    In this article, we propose a novel LiDAR and event camera fusion modality for subterranean (SubT) environments for fast and precise object and human detection in a wide variety of adverse lighting conditions, such as low or no light, high-contrast zones and in the presence of blinding light sources. In the proposed approach, information from the event camera and LiDAR are fused to localize a human or an object-of-interest in a robot's local frame. The local detection is then transformed into the inertial frame and used to set references for a Nonlinear Model Predictive Controller (NMPC) for reactive tracking of humans or objects in SubT environments. The proposed novel fusion uses intensity filtering and K-means clustering on the LiDAR point cloud and frequency filtering and connectivity clustering on the events induced in an event camera by the returning LiDAR beams. The centroids of the clusters in the event camera and LiDAR streams are then paired to localize reflective markers present on safety vests and signs in SubT environments. The efficacy of the proposed scheme has been experimentally validated in a real SubT environment (a mine) with a Pioneer 3AT mobile robot. The experimental results show real-time performance for human detection and the NMPC-based controller allows for reactive tracking of a human or object of interest, even in complete darkness.

Visual-LiDAR Odometry and Mapping with Monocular Scale Correction and Motion Compensation

  • Authors: Hanyu Cai, Ni Ou, Junzheng Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.08978
  • Pdf link: https://arxiv.org/pdf/2304.08978
  • Abstract
    This paper presents a novel visual-LiDAR odometry and mapping method with low-drift characteristics. The proposed method is based on two popular approaches, ORB-SLAM and A-LOAM, with monocular scale correction and visual-assisted LiDAR motion compensation modifications. The scale corrector calculates the proportion between the depth of image keypoints recovered by triangulation and that provided by LiDAR, using an outlier rejection process for accuracy improvement. Concerning LiDAR motion compensation, the visual odometry approach gives the initial guesses of LiDAR motions for better performance. This methodology is not only applicable to high-resolution LiDAR but can also adapt to low-resolution LiDAR. To evaluate the proposed SLAM system's robustness and accuracy, we conducted experiments on the KITTI Odometry and S3E datasets. Experimental results illustrate that our method significantly outperforms standalone ORB-SLAM2 and A-LOAM. Furthermore, regarding the accuracy of visual odometry with scale correction, our method performs similarly to the stereo-mode ORB-SLAM2.
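
A minimal sketch of the scale-correction idea: compare LiDAR depths against triangulated monocular depths at common keypoints and take a robust ratio. The MAD-based outlier rejection below is an assumption, as the abstract does not spell out the exact rejection process.

```python
import numpy as np

def estimate_scale(depth_vo, depth_lidar, mad_k=3.0):
    """Global scale between monocular-VO depths and LiDAR depths at the
    same keypoints, with simple MAD-based outlier rejection."""
    ratios = depth_lidar / depth_vo
    med = np.median(ratios)
    mad = np.median(np.abs(ratios - med)) + 1e-9
    inliers = np.abs(ratios - med) < mad_k * 1.4826 * mad
    return np.median(ratios[inliers])

# Triangulated depths are off by an unknown global scale (~2.5 here),
# plus a few gross mismatches the MAD filter should discard.
rng = np.random.default_rng(0)
d_true = rng.uniform(2.0, 30.0, 200)
d_vo = d_true / 2.5 * (1 + 0.01 * rng.standard_normal(200))
d_vo[:5] *= 10.0                     # outliers
print(estimate_scale(d_vo, d_true))  # ~2.5
```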

Fast Neural Scene Flow

  • Authors: Xueqian Li, Jianqiao Zheng, Francesco Ferroni, Jhony Kaesemodel Pontes, Simon Lucey
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09121
  • Pdf link: https://arxiv.org/pdf/2304.09121
  • Abstract
    Scene flow is an important problem as it provides low-level motion cues for many downstream tasks. State-of-the-art learning methods are usually fast and can achieve impressive performance on in-domain data, but usually fail to generalize to out-of-the-distribution (OOD) data or handle dense point clouds. In this paper, we focus on a runtime optimization-based neural scene flow pipeline. In (a) one can see its application in the densification of lidar. However, in (c) one sees that the major drawback is the extensive computation time. We identify that the common speedup strategy in network architectures for coordinate networks has little effect on scene flow acceleration [see green (b)] unlike image reconstruction [see pink (b)]. With the dominant computational burden stemming instead from the Chamfer loss function, we propose to use a distance transform-based loss function to accelerate [see purple (b)], which achieves up to 30x speedup and on-par estimation performance compared to NSFP [see (c)]. When tested on 8k points, it is as efficient [see (c)] as leading learning methods, achieving real-time performance.
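
The claimed speedup comes from replacing the per-iteration nearest-neighbour search inside the Chamfer loss with a distance transform that is precomputed once and merely indexed afterwards. A rough numpy/scipy sketch of that precompute-then-look-up pattern follows; the paper's loss is interpolated so it stays differentiable, whereas this version uses a nearest-cell read.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def build_dt(target_pts, lo, hi, res=64):
    """Distance transform of the target cloud on a res^3 grid. Building it
    once replaces the nearest-neighbour search inside every Chamfer call."""
    occ = np.ones((res, res, res), dtype=bool)
    idx = np.clip(((target_pts - lo) / (hi - lo) * (res - 1)).astype(int), 0, res - 1)
    occ[idx[:, 0], idx[:, 1], idx[:, 2]] = False  # zeros mark occupied voxels
    return distance_transform_edt(occ, sampling=(hi - lo) / (res - 1))

def dt_loss(pts, dt, lo, hi):
    """Mean distance-to-target read straight from the grid (one lookup per point)."""
    res = dt.shape[0]
    idx = np.clip(((pts - lo) / (hi - lo) * (res - 1)).astype(int), 0, res - 1)
    return dt[idx[:, 0], idx[:, 1], idx[:, 2]].mean()

rng = np.random.default_rng(0)
target = rng.uniform(-1, 1, (5000, 3))
lo, hi = np.full(3, -1.0), np.full(3, 1.0)
dt = build_dt(target, lo, hi)
pred = rng.uniform(-1, 1, (2048, 3))
print(dt_loss(pred, dt, lo, hi))  # cheap grid reads, no kNN search
```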

Keyword: diffusion

Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model

  • Authors: Yuming Du, Robin Kips, Albert Pumarola, Sebastian Starke, Ali Thabet, Artsiom Sanakoyeu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.08577
  • Pdf link: https://arxiv.org/pdf/2304.08577
  • Abstract
    With the recent surge in popularity of AR/VR applications, realistic and accurate control of 3D full-body avatars has become a highly demanded feature. A particular challenge is that only a sparse tracking signal is available from standalone HMDs (Head Mounted Devices), often limited to tracking the user's head and wrists. While this signal is resourceful for reconstructing the upper body motion, the lower body is not tracked and must be synthesized from the limited information provided by the upper body joints. In this paper, we present AGRoL, a novel conditional diffusion model specifically designed to track full bodies given sparse upper-body tracking signals. Our model is based on a simple multi-layer perceptron (MLP) architecture and a novel conditioning scheme for motion data. It can predict accurate and smooth full-body motion, particularly the challenging lower body movement. Unlike common diffusion architectures, our compact architecture can run in real-time, making it suitable for online body-tracking applications. We train and evaluate our model on AMASS motion capture dataset, and demonstrate that our approach outperforms state-of-the-art methods in generated motion accuracy and smoothness. We further justify our design choices through extensive experiments and ablation studies.
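
To make the "simple MLP plus diffusion" recipe concrete, here is a minimal DDPM-style training step with an MLP denoiser conditioned on a sparse tracking signal and the timestep; the dimensions, noise schedule, and conditioning scheme are placeholders rather than AGRoL's.

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class MLPDenoiser(nn.Module):
    def __init__(self, motion_dim=135, cond_dim=54, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(motion_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, motion_dim),
        )

    def forward(self, x_t, cond, t):
        # condition on the sparse upper-body tracking signal + normalized timestep
        return self.net(torch.cat([x_t, cond, t[:, None] / T], dim=-1))

model = MLPDenoiser()
x0 = torch.randn(32, 135)        # full-body pose targets (dummy data)
cond = torch.randn(32, 54)       # head + wrist tracking signal (dummy data)
t = torch.randint(0, T, (32,))
noise = torch.randn_like(x0)
x_t = alphas_bar[t].sqrt()[:, None] * x0 + (1 - alphas_bar[t]).sqrt()[:, None] * noise
loss = ((model(x_t, cond, t.float()) - noise) ** 2).mean()  # predict the noise
loss.backward()
```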

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

  • Authors: Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08818
  • Pdf link: https://arxiv.org/pdf/2304.08818
  • Abstract
    Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution video generation, a particularly resource-intensive task. We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos. Similarly, we temporally align diffusion model upsamplers, turning them into temporally consistent video super resolution models. We focus on two relevant real-world applications: Simulation of in-the-wild driving data and creative content creation with text-to-video modeling. In particular, we validate our Video LDM on real driving videos of resolution 512 x 1024, achieving state-of-the-art performance. Furthermore, our approach can easily leverage off-the-shelf pre-trained image LDMs, as we only need to train a temporal alignment model in that case. Doing so, we turn the publicly available, state-of-the-art text-to-image LDM Stable Diffusion into an efficient and expressive text-to-video model with resolution up to 1280 x 2048. We show that the temporal layers trained in this way generalize to different fine-tuned text-to-image LDMs. Utilizing this property, we show the first results for personalized text-to-video generation, opening exciting directions for future content creation. Project page: https://research.nvidia.com/labs/toronto-ai/VideoLDM/

TTIDA: Controllable Generative Data Augmentation via Text-to-Text and Text-to-Image Models

  • Authors: Yuwei Yin, Jean Kaddour, Xiang Zhang, Yixin Nie, Zhenguang Liu, Lingpeng Kong, Qi Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08821
  • Pdf link: https://arxiv.org/pdf/2304.08821
  • Abstract
    Data augmentation has been established as an efficacious approach to supplement useful information for low-resource datasets. Traditional augmentation techniques such as noise injection and image transformations have been widely used. In addition, generative data augmentation (GDA) has been shown to produce more diverse and flexible data. While generative adversarial networks (GANs) have been frequently used for GDA, they lack diversity and controllability compared to text-to-image diffusion models. In this paper, we propose TTIDA (Text-to-Text-to-Image Data Augmentation) to leverage the capabilities of large-scale pre-trained Text-to-Text (T2T) and Text-to-Image (T2I) generative models for data augmentation. By conditioning the T2I model on detailed descriptions produced by T2T models, we are able to generate photo-realistic labeled images in a flexible and controllable manner. Experiments on in-domain classification, cross-domain classification, and image captioning tasks show consistent improvements over other data augmentation baselines. Analytical studies in varied settings, including few-shot, long-tail, and adversarial, further reinforce the effectiveness of TTIDA in enhancing performance and increasing robustness.

Two-stage Denoising Diffusion Model for Source Localization in Graph Inverse Problems

  • Authors: Bosong Huang, Weihao Yu, Ruzhong Xie, Jing Xiao, Jin Huang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08841
  • Pdf link: https://arxiv.org/pdf/2304.08841
  • Abstract
    Source localization is the inverse problem of graph information dissemination and has broad practical applications. However, the inherent intricacy and uncertainty in information dissemination pose significant challenges, and the ill-posed nature of the source localization problem further exacerbates these challenges. Recently, deep generative models, particularly diffusion models inspired by classical non-equilibrium thermodynamics, have made significant progress. While diffusion models have proven to be powerful in solving inverse problems and producing high-quality reconstructions, applying them directly to source localization is infeasible for two reasons. Firstly, it is impossible to calculate the posterior disseminated results on a large-scale network for iterative denoising sampling, which would incur enormous computational costs. Secondly, in the existing methods for this field, the training data itself is ill-posed (many-to-one); thus simply transferring the diffusion model would only lead to local optima. To address these challenges, we propose a two-stage optimization framework, the source localization denoising diffusion model (SL-Diff). In the coarse stage, we devise the source proximity degrees as the supervised signals to generate coarse-grained source predictions. This aims to efficiently initialize the next stage, significantly reducing its convergence time and calibrating the convergence process. Furthermore, the introduction of cascade temporal information in this training method transforms the many-to-one mapping relationship into a one-to-one relationship, perfectly addressing the ill-posed problem. In the fine stage, we design a diffusion model for the graph inverse problem that can quantify the uncertainty in the dissemination. The proposed SL-Diff yields excellent prediction results within a reasonable sampling time in extensive experiments.

UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer

  • Authors: Soon Yau Cheong, Armin Mustafa, Andrew Gilbert
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.08870
  • Pdf link: https://arxiv.org/pdf/2304.08870
  • Abstract
    Existing person image generative models can do either image generation or pose transfer but not both. We propose a unified diffusion model, UPGPT to provide a universal solution to perform all the person image tasks - generative, pose transfer, and editing. With fine-grained multimodality and disentanglement capabilities, our approach offers fine-grained control over the generation and the editing process of images using a combination of pose, text, and image, all without needing a semantic segmentation mask which can be challenging to obtain or edit. We also pioneer the parameterized body SMPL model in pose-guided person image generation to demonstrate new capability - simultaneous pose and camera view interpolation while maintaining a person's appearance. Results on the benchmark DeepFashion dataset show that UPGPT is the new state-of-the-art while simultaneously pioneering new capabilities of edit and pose transfer in human image generation.

An Augmented Subspace Based Adaptive Proper Orthogonal Decomposition Method for Time Dependent Partial Differential Equations

  • Authors: Xiaoying Dai, Miao Hu, Jack Xin, Aihui Zhou
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.09007
  • Pdf link: https://arxiv.org/pdf/2304.09007
  • Abstract
    In this paper, we propose an augmented subspace based adaptive proper orthogonal decomposition (POD) method for solving time-dependent partial differential equations. By augmenting the POD subspace with some auxiliary modes, we obtain an augmented subspace. We use the difference between the approximation obtained in this augmented subspace and that obtained in the original POD subspace to construct an error indicator, by which we obtain a general framework for the augmented subspace based adaptive POD method. We then provide two strategies to obtain specific augmented subspaces: the random-vector-based augmented subspace and the coarse-grid-approximation-based augmented subspace. We apply our new method to two typical 3D advection-diffusion equations with the advection being the Kolmogorov flow and the ABC flow. Numerical results show that our method is more efficient than existing adaptive POD methods, especially for advection-dominated models.
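
A toy numpy sketch of the error-indicator idea: build a POD basis from snapshots, augment it (here with random vectors, one of the paper's two strategies), and use the gap between the two subspace approximations as the indicator. In the actual method the two approximations come from solving the reduced equations in each subspace, not from projecting a single state vector as below.

```python
import numpy as np

def pod_basis(snapshots, m):
    """Leading m POD modes of a snapshot matrix (columns = states in time)."""
    U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :m]

rng = np.random.default_rng(1)
S = rng.standard_normal((400, 60))      # dummy snapshot matrix
V = pod_basis(S, m=10)

# Augment with a few random vectors, orthonormalized together with V.
W = rng.standard_normal((400, 3))
Vaug, _ = np.linalg.qr(np.hstack([V, W]))

u = rng.standard_normal(400)            # current full-order state (dummy)
u_pod = V @ (V.T @ u)                   # approximation in the POD subspace
u_aug = Vaug @ (Vaug.T @ u)             # approximation in the augmented subspace
eta = np.linalg.norm(u_aug - u_pod)     # error indicator driving adaptivity
print(eta)
```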

Look ATME: The Discriminator Mean Entropy Needs Attention

  • Authors: Edgardo Solano-Carrillo, Angel Bueno Rodriguez, Borja Carrillo-Perez, Yannik Steiniger, Jannis Stoppe
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09024
  • Pdf link: https://arxiv.org/pdf/2304.09024
  • Abstract
    Generative adversarial networks (GANs) are successfully used for image synthesis but are known to face instability during training. In contrast, probabilistic diffusion models (DMs) are stable and generate high-quality images, at the cost of an expensive sampling procedure. In this paper, we introduce a simple method to allow GANs to stably converge to their theoretical optimum, while bringing in the denoising machinery from DMs. These models are combined into a simpler model (ATME) that only requires a forward pass during inference, making predictions cheaper and more accurate than DMs and popular GANs. ATME breaks an information asymmetry existing in most GAN models in which the discriminator has spatial knowledge of where the generator is failing. To restore the information symmetry, the generator is endowed with knowledge of the entropic state of the discriminator, which is leveraged to allow the adversarial game to converge towards equilibrium. We demonstrate the power of our method in several image-to-image translation tasks, showing superior performance to state-of-the-art methods at a lower cost. Code is available at https://github.com/DLR-MI/atme

Keyword: dynamic

Agent-Based Modeling and its Tradeoffs: An Introduction & Examples

  • Authors: G. Wade McDonald, Nathaniel D. Osgood
  • Subjects: Multiagent Systems (cs.MA); Computational Engineering, Finance, and Science (cs.CE)
  • Arxiv link: https://arxiv.org/abs/2304.08497
  • Pdf link: https://arxiv.org/pdf/2304.08497
  • Abstract
    Agent-based modeling is a computational dynamic modeling technique that may be less familiar to some readers. Agent-based modeling seeks to understand the behaviour of complex systems by situating agents in an environment and studying the emergent outcomes of agent-agent and agent-environment interactions. In comparison with compartmental models, agent-based models offer simpler, more scalable and flexible representation of heterogeneity, the ability to capture dynamic and static network and spatial context, and the ability to consider history of individuals within the model. In contrast, compartmental models offer faster development time with less programming required, lower computational requirements that do not scale with population, and the option for concise mathematical formulation with ordinary, delay or stochastic differential equations supporting derivation of properties of the system behaviour. In this chapter, basic characteristics of agent-based models are introduced, advantages and disadvantages of agent-based models, as compared with compartmental models, are discussed, and two example agent-based infectious disease models are reviewed.

A comparison between Recurrent Neural Networks and classical machine learning approaches In Laser induced breakdown spectroscopy

  • Authors: Fatemeh Rezaei, Pouriya Khaliliyan, Mohsen Rezaei, Parvin Karimi, Behnam Ashrafkhani
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08500
  • Pdf link: https://arxiv.org/pdf/2304.08500
  • Abstract
    Recurrent Neural Networks are a class of Artificial Neural Networks that establish connections between different nodes to form a directed or undirected graph for temporal dynamic analysis. In this research, the laser induced breakdown spectroscopy (LIBS) technique is used for quantitative analysis of aluminum alloys with different Recurrent Neural Network (RNN) architectures. The fundamental harmonic (1064 nm) of a nanosecond Nd:YAG laser pulse is employed to generate the LIBS plasma for the prediction of constituent concentrations of the aluminum standard samples. Here, Recurrent Neural Networks based on different units, such as Long Short Term Memory (LSTM), Gated Recurrent Unit (GRU), and Simple Recurrent Neural Network (Simple RNN), as well as Recurrent Convolutional Networks comprising Conv-SimpleRNN, Conv-LSTM, and Conv-GRU, are utilized for concentration prediction. A comparison is then performed against prediction by the classical machine learning methods of support vector regression (SVR), the Multi Layer Perceptron (MLP), the Decision Tree algorithm, Gradient Boosting Regression (GBR), Random Forest Regression (RFR), Linear Regression, and the k-Nearest Neighbor (KNN) algorithm. Results showed that the machine learning tools based on Convolutional Recurrent Networks were the most efficient in predicting most of the elements among the multivariate methods.

Robust Control Barrier Functions with Uncertainty Estimation

  • Authors: Ersin Daş, Skylar X. Wei, Joel W. Burdick
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.08538
  • Pdf link: https://arxiv.org/pdf/2304.08538
  • Abstract
    This paper proposes a safety controller for control-affine nonlinear systems with unmodelled dynamics and disturbances to improve closed-loop robustness. Uncertainty estimation-based control barrier functions (CBFs) are utilized to ensure robust safety in the presence of model uncertainties, which may depend on the control input and states. We present a new uncertainty/disturbance estimator with theoretical upper bounds on the estimation error and estimated outputs, which are used to ensure robust safety by formulating a convex optimization problem using a high-order CBF. The possibly unsafe nominal feedback controller is augmented with the proposed estimator in two frameworks: (1) an uncertainty compensator and (2) a robustifying reformulation of the CBF constraint with respect to the estimator outputs. The former scheme ensures safety with performance improvement by adaptively rejecting the matched uncertainty. The second method uses uncertainty estimation to robustify higher-order CBFs for safety-critical control. The proposed methods are demonstrated in simulations of an uncertain adaptive cruise control problem and a multirotor obstacle avoidance situation.

RS2G: Data-Driven Scene-Graph Extraction and Embedding for Robust Autonomous Perception and Scenario Understanding

  • Authors: Arnav Vaibhav Malawade, Shih-Yuan Yu, Junyao Wang, Mohammad Abdullah Al Faruque
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08600
  • Pdf link: https://arxiv.org/pdf/2304.08600
  • Abstract
    Human drivers naturally reason about interactions between road users to understand and safely navigate through traffic. Thus, developing autonomous vehicles necessitates the ability to mimic such knowledge and model interactions between road users to understand and navigate unpredictable, dynamic environments. However, since real-world scenarios often differ from training datasets, effectively modeling the behavior of various road users in an environment remains a significant research challenge. This reality necessitates models that generalize to a broad range of domains and explicitly model interactions between road users and the environment to improve scenario understanding. Graph learning methods address this problem by modeling interactions using graph representations of scenarios. However, existing methods cannot effectively transfer knowledge gained from the training domain to real-world scenarios. This constraint is caused by the domain-specific rules used for graph extraction that can vary in effectiveness across domains, limiting generalization ability. To address these limitations, we propose RoadScene2Graph (RS2G): a data-driven graph extraction and modeling approach that learns to extract the best graph representation of a road scene for solving autonomous scene understanding tasks. We show that RS2G enables better performance at subjective risk assessment than rule-based graph extraction methods and deep-learning-based models. RS2G also improves generalization and Sim2Real transfer learning, which denotes the ability to transfer knowledge gained from simulation datasets to unseen real-world scenarios. We also present ablation studies showing how RS2G produces a more useful graph representation for downstream classifiers. Finally, we show how RS2G can identify the relative importance of rule-based graph edges and enables intelligent graph sparsity tuning.

Dynamic Vector Bin Packing for Online Resource Allocation in the Cloud

  • Authors: Aniket Murhekar, David Arbour, Tung Mai, Anup Rao
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.08648
  • Pdf link: https://arxiv.org/pdf/2304.08648
  • Abstract
    Several cloud-based applications, such as cloud gaming, rent servers to execute jobs which arrive in an online fashion. Each job has a resource demand and must be dispatched to a cloud server which has enough resources to execute the job, which departs after its completion. Under the 'pay-as-you-go' billing model, the server rental cost is proportional to the total time that servers are actively running jobs. The problem of efficiently allocating a sequence of online jobs to servers without exceeding the resource capacity of any server while minimizing total server usage time can be modelled as a variant of the dynamic bin packing problem (DBP), called MinUsageTime DBP. In this work, we initiate the study of the problem with multi-dimensional resource demands (e.g. CPU/GPU usage, memory requirement, bandwidth usage, etc.), called MinUsageTime Dynamic Vector Bin Packing (DVBP). We study the competitive ratio (CR) of Any Fit packing algorithms for this problem. We show almost-tight bounds on the CR of three specific Any Fit packing algorithms, namely First Fit, Next Fit, and Move To Front. We prove that the CR of Move To Front is at most $(2\mu+1)d +1$, where $\mu$ is the ratio of the max/min item durations. For $d=1$, this significantly improves the previously known upper bound of $6\mu+7$ (Kamali & Lopez-Ortiz, 2015). We then prove the CR of First Fit and Next Fit are bounded by $(\mu+2)d+1$ and $2\mu d+1$, respectively. Next, we prove a lower bound of $(\mu+1)d$ on the CR of any Any Fit packing algorithm, an improved lower bound of $2\mu d$ for Next Fit, and a lower bound of $2\mu$ for Move To Front in the 1-D case. All our bounds improve or match the best-known bounds for the 1-D case. Finally, we experimentally study the average-case performance of these algorithms on randomly generated synthetic data, and observe that Move To Front outperforms other Any Fit packing algorithms.
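
For intuition, a sketch of the Move To Front rule in the multi-dimensional setting: scan open servers from the front, place the job on the first server that fits in all d dimensions (opening a new one otherwise), move that server to the front, and release a server the moment it goes idle. The competitive-ratio analysis is the paper's contribution and is not reflected in the code.

```python
def move_to_front_pack(events, capacity, d):
    """MinUsageTime DVBP with the Move To Front rule. `events` is a
    time-ordered list of ('arrive', job, demand) / ('depart', job, None);
    `capacity` is the per-server capacity in each of the d dimensions."""
    servers, where = [], {}          # open servers (front first), job -> server
    for kind, job, demand in events:
        if kind == "arrive":
            for srv in servers:      # first fit, scanning from the front
                if all(srv["free"][i] >= demand[i] for i in range(d)):
                    break
            else:
                srv = {"free": list(capacity), "jobs": set()}
                servers.append(srv)
            for i in range(d):
                srv["free"][i] -= demand[i]
            srv["jobs"].add(job)
            where[job] = (srv, demand)
            servers.remove(srv)
            servers.insert(0, srv)   # move the serving server to the front
        else:
            srv, demand = where.pop(job)
            srv["jobs"].discard(job)
            for i in range(d):
                srv["free"][i] += demand[i]
            if not srv["jobs"]:
                servers.remove(srv)  # an idle server is released immediately
    return servers

events = [("arrive", 1, (2, 1)), ("arrive", 2, (2, 3)),
          ("depart", 1, None), ("arrive", 3, (3, 3))]
print(len(move_to_front_pack(events, capacity=(4, 4), d=2)))  # 2 open servers
```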

Mechanical Intelligence Simplifies Control in Terrestrial Limbless Locomotion

  • Authors: Tianyu Wang, Christopher Pierce, Velin Kojouharov, Baxi Chong, Kelimar Diaz, Hang Lu, Daniel I. Goldman
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.08652
  • Pdf link: https://arxiv.org/pdf/2304.08652
  • Abstract
    Limbless locomotors, from microscopic worms to macroscopic snakes, traverse complex, heterogeneous natural environments typically using undulatory body wave propagation. Theoretical and robophysical models typically emphasize body kinematics and active neural/electronic control. However, we contend that because such approaches often neglect the role of passive, mechanically controlled processes (i.e., those involving mechanical intelligence), they fail to reproduce the performance of even the simplest organisms. To discover principles of how mechanical intelligence aids limbless locomotion in heterogeneous terradynamic regimes, here we conduct a comparative study of locomotion in a model of heterogeneous terrain (lattices of rigid posts). We use a model biological system, the highly studied nematode worm C. elegans, and a novel robophysical device whose bilateral actuator morphology models that of limbless organisms across scales. The robot's kinematics quantitatively reproduce the performance of the nematodes with purely open-loop control; mechanical intelligence simplifies control of obstacle navigation and exploitation by reducing the need for active sensing and feedback. An active behavior observed in C. elegans, undulatory wave reversal upon head collisions, robustifies locomotion via exploitation of the systems' mechanical intelligence. Our study provides insights into how neurally simple limbless organisms like nematodes can leverage mechanical intelligence via appropriately tuned bilateral actuation to locomote in complex environments. These principles likely apply to neurally more sophisticated organisms and also provide a new design and control paradigm for limbless robots for applications like search and rescue and planetary exploration.

RPDP: An Efficient Data Placement based on Residual Performance for P2P Storage Systems

  • Authors: Fitrio Pakana, Nasrin Sohrabi, Chenhao Xu, Zahir Tari, Hai Dong
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.08692
  • Pdf link: https://arxiv.org/pdf/2304.08692
  • Abstract
    Storage systems using Peer-to-Peer (P2P) architecture are an alternative to traditional client-server systems. They offer better scalability and fault tolerance while at the same time eliminating the single point of failure. The nature of P2P storage systems (which consist of heterogeneous nodes) introduces, however, data placement challenges that create implementation trade-offs (e.g., between performance and scalability). The existing Kademlia-based DHT data placement method stores data at the closest node, where the distance is measured by a bit-wise XOR operation between the data and a given node. This approach is highly scalable because it requires global knowledge neither for placing data nor for retrieving it. It does not, however, consider the heterogeneous performance of the nodes, which can result in imbalanced resource usage affecting the overall latency of the system. Other works implement criteria-based selection that addresses the heterogeneity of nodes, but often cause subsequent data retrieval to require global knowledge of where the data is stored. This paper introduces Residual Performance-based Data Placement (RPDP), a novel data placement method based on the dynamic temporal residual performance of data nodes. RPDP places data at the most appropriate nodes based on their throughput and latency, with the aim of achieving lower overall latency by balancing data distribution with respect to the individual performance of nodes. RPDP relies on a Kademlia-based DHT with a modified data structure to allow data to be subsequently retrieved without the need for global knowledge. The experimental results indicate that RPDP reduces the overall latency of the baseline Kademlia-based P2P storage system (by 4.87%) and also reduces the variance of latency among the nodes, with minimal impact on data retrieval complexity.
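
A sketch contrasting plain XOR placement with an RPDP-flavoured variant: shortlist the k XOR-closest nodes (preserving Kademlia locality for retrieval), then pick by a residual-performance score. The shortlist-then-score rule and the throughput/latency mix below are assumptions for illustration, not the paper's exact algorithm.

```python
import hashlib

def node_id(name: str) -> int:
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def xor_closest(key: int, nodes):
    """Plain Kademlia placement: store at the node minimizing XOR distance."""
    return min(nodes, key=lambda n: key ^ n["id"])

def rpdp_like(key: int, nodes, k=3, w=0.5):
    """Illustrative RPDP-flavoured rule: among the k XOR-closest nodes,
    pick by residual performance, here a weighted mix of throughput
    (higher is better) and latency (lower is better), both in [0, 1]."""
    shortlist = sorted(nodes, key=lambda n: key ^ n["id"])[:k]
    return max(shortlist, key=lambda n: w * n["throughput"] - (1 - w) * n["latency"])

nodes = [{"id": node_id(f"node-{i}"), "throughput": 0.2 * i, "latency": 0.1 * i}
         for i in range(5)]
key = node_id("some-object")
print(xor_closest(key, nodes)["id"], rpdp_like(key, nodes)["id"])
```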

Super-Logarithmic Lower Bounds for Dynamic Graph Problems

  • Authors: Kasper Green Larsen, Huacheng Yu
  • Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
  • Arxiv link: https://arxiv.org/abs/2304.08745
  • Pdf link: https://arxiv.org/pdf/2304.08745
  • Abstract
    In this work, we prove a $\tilde{\Omega}(\lg^{3/2} n )$ unconditional lower bound on the maximum of the query time and update time for dynamic data structures supporting reachability queries in $n$-node directed acyclic graphs under edge insertions. This is the first super-logarithmic lower bound for any natural graph problem. In proving the lower bound, we also make novel contributions to the state-of-the-art data structure lower bound techniques that we hope may lead to further progress in proving lower bounds.

Cooperative Multi-Agent Reinforcement Learning for Inventory Management

  • Authors: Madhav Khirwar, Karthik S. Gurumoorthy, Ankit Ajit Jain, Shantala Manchenahally
  • Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2304.08769
  • Pdf link: https://arxiv.org/pdf/2304.08769
  • Abstract
    With Reinforcement Learning (RL) for inventory management (IM) being a nascent field of research, approaches tend to be limited to simple, linear environments with implementations that are minor modifications of off-the-shelf RL algorithms. Scaling these simplistic environments to a real-world supply chain comes with a few challenges, such as: minimizing the computational requirements of the environment, specifying agent configurations that are representative of dynamics at real-world stores and warehouses, and specifying a reward framework that encourages desirable behavior across the whole supply chain. In this work, we present a system with a custom GPU-parallelized environment that consists of one warehouse and multiple stores, a novel architecture for agent-environment dynamics incorporating enhanced state and action spaces, and a shared reward specification that seeks to optimize for a large retailer's supply chain needs. Each vertex in the supply chain graph is an independent agent that, based on its own inventory, is able to place replenishment orders with the vertex upstream. The warehouse agent, aside from placing orders with the supplier, has the special property of also being able to constrain replenishment to stores downstream, which results in it learning an additional allocation sub-policy. We achieve a system that outperforms standard inventory control policies such as a base-stock policy, as well as other RL-based specifications, for a single product, and lay out a future direction of work for multiple products.

Neuromorphic Control using Input-Weighted Threshold Adaptation

  • Authors: Stein Stroobants, Christophe De Wagter, Guido C.H.E. de Croon
  • Subjects: Robotics (cs.RO); Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.08778
  • Pdf link: https://arxiv.org/pdf/2304.08778
  • Abstract
    Neuromorphic processing promises high energy efficiency and rapid response rates, making it an ideal candidate for achieving autonomous flight of resource-constrained robots. It will be especially beneficial for complex neural networks as are involved in high-level visual perception. However, fully neuromorphic solutions will also need to tackle low-level control tasks. Remarkably, it is currently still challenging to replicate even basic low-level controllers such as proportional-integral-derivative (PID) controllers. Specifically, it is difficult to incorporate the integral and derivative parts. To address this problem, we propose a neuromorphic controller that incorporates proportional, integral, and derivative pathways during learning. Our approach includes a novel input threshold adaptation mechanism for the integral pathway. This Input-Weighted Threshold Adaptation (IWTA) introduces an additional weight per synaptic connection, which is used to adapt the threshold of the post-synaptic neuron. We tackle the derivative term by employing neurons with different time constants. We first analyze the performance and limits of the proposed mechanisms and then put our controller to the test by implementing it on a microcontroller connected to the open-source tiny Crazyflie quadrotor, replacing the innermost rate controller. We demonstrate the stability of our bio-inspired algorithm with flights in the presence of disturbances. The current work represents a substantial step towards controlling highly dynamic systems with neuromorphic algorithms, thus advancing neuromorphic processing and robotics. In addition, integration is an important part of any temporal task, so the proposed Input-Weighted Threshold Adaptation (IWTA) mechanism may have implications well beyond control tasks.
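
One possible reading of the IWTA mechanism, sketched below as a leaky integrate-and-fire neuron whose firing threshold is shifted by an extra per-synapse weight applied to the same inputs; the paper's precise adaptation rule may differ, and all weights here are made up.

```python
import numpy as np

def lif_iwta(inputs, w, w_thr, tau=0.9, v_thr0=1.0):
    """Leaky integrate-and-fire neuron with an input-weighted threshold:
    an extra weight per synapse shifts the firing threshold each step.
    `inputs` is a (T, n_syn) array of spikes/currents."""
    v, spikes = 0.0, []
    for x in inputs:
        v = tau * v + w @ x                 # leaky membrane integration
        v_thr = v_thr0 + w_thr @ x          # input-weighted threshold shift
        if v >= v_thr:
            spikes.append(1)                # fire...
            v = 0.0                         # ...and reset
        else:
            spikes.append(0)
    return np.array(spikes)

rng = np.random.default_rng(2)
x = (rng.random((100, 4)) < 0.3).astype(float)
out = lif_iwta(x, w=np.array([0.6, 0.6, 0.4, 0.4]),
               w_thr=np.array([0.0, 0.0, 0.5, 0.5]))
print(out.sum(), "spikes in 100 steps")
```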

AoI-Delay Tradeoff in Mobile Edge Caching: A Mixed-Order Drift-Plus-Penalty Algorithm

  • Authors: Ran Li, Chuan Huang, Xiaoqi Qin
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.08781
  • Pdf link: https://arxiv.org/pdf/2304.08781
  • Abstract
    We consider a scheduling problem in a Mobile Edge Caching (MEC) network, where a base station (BS) uploads messages from multiple source nodes (SNs) and transmits them to mobile users (MUs) via downlinks, aiming to jointly optimize the average service Age of Information (AoI) and service delay over MUs. This problem is formulated as a difficult sequential decision-making problem with discrete-valued and linearly-constrained design variables. To solve this problem, we first approximate its achievable region by characterizing its superset and subset. The superset is derived based on the rate stability theorem, while the subset is obtained using a novel stochastic policy. We also validate that this subset is substantially identical to the achievable region when the number of scheduling resources is large. Additionally, we propose a sufficient condition to check the existence of a solution to the problem. Then, we propose the mixed-order drift-plus-penalty algorithm that uses a dynamic programming (DP) method to optimize the summation over a linear and quadratic Lyapunov drift and a penalty term, to handle the product term over different queue backlogs in the objective function. Finally, by associating the proposed algorithm with the stochastic policy, we demonstrate that it achieves an $O(1/V)$ versus $O(V)$ tradeoff for the average AoI and average delay.

Full-Duplex Wireless for 6G: Progress Brings New Opportunities and Challenges

  • Authors: Besma Smida, Ashutosh Sabharwal, Gabor Fodor, George C. Alexandropoulos, Himal A. Suraweera, Chan-Byoung Chae
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.08789
  • Pdf link: https://arxiv.org/pdf/2304.08789
  • Abstract
    The use of in-band full-duplex (FD) enables nodes to simultaneously transmit and receive on the same frequency band, which challenges the traditional half-duplex assumption in wireless network design. The full-duplex capability enhances spectral efficiency and decreases latency, which are two key drivers pushing the performance expectations of next-generation mobile networks. In less than ten years, in-band FD has advanced from being demonstrated in research labs to being implemented in standards and products, presenting new opportunities to utilize its foundational concepts. Some of the most significant opportunities include using FD to enable wireless networks to sense the physical environment, integrate sensing and communication applications, develop integrated access and backhaul solutions, and work with smart signal propagation environments powered by reconfigurable intelligent surfaces. However, these new opportunities also come with new challenges for large-scale commercial deployment of FD technology, such as managing self-interference, combating cross-link interference in multi-cell networks, and coexistence of dynamic time division duplex, subband FD and FD networks.

Large-scale Dynamic Network Representation via Tensor Ring Decomposition

  • Authors: Qu Wang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08798
  • Pdf link: https://arxiv.org/pdf/2304.08798
  • Abstract
    Large-scale Dynamic Networks (LDNs) are becoming increasingly important in the Internet age, yet their dynamic nature, which captures the evolution of the network structure and how edge weights change over time, poses unique challenges for data analysis and modeling. A Latent Factorization of Tensors (LFT) model facilitates efficient representation learning for an LDN. However, existing LFT models are almost all based on Canonical Polyadic Factorization (CPF). Therefore, this work proposes a model based on Tensor Ring (TR) decomposition for efficient representation learning for an LDN. Specifically, we incorporate the principle of single latent factor-dependent, non-negative, and multiplicative update (SLF-NMU) into the TR decomposition model, and analyze the particular bias form of TR decomposition. Experimental studies on two real LDNs demonstrate that the proposed method achieves higher accuracy than existing models.
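
For readers unfamiliar with the format, a Tensor Ring decomposition represents X[i_1, ..., i_K] as the trace of a product of core slices G_k[:, i_k, :]. The sketch below only reconstructs a tensor from random cores; the SLF-NMU update that the paper uses to fit the cores is not shown.

```python
import numpy as np

def tr_reconstruct(cores):
    """Rebuild X from TR cores G_k of shape (r_k, n_k, r_{k+1}), r_{K+1}=r_1:
    X[i1, ..., iK] = trace(G_1[:, i1, :] @ ... @ G_K[:, iK, :])."""
    out = cores[0]
    for G in cores[1:]:
        # contract the trailing rank axis of the running chain with the next core
        out = np.einsum("a...b,bcd->a...cd", out, G)
    return np.trace(out, axis1=0, axis2=-1)  # close the ring with a trace

rng = np.random.default_rng(3)
r, dims = 4, (5, 6, 7)  # toy ring rank and tensor sizes
cores = [rng.standard_normal((r, n, r)) for n in dims]
print(tr_reconstruct(cores).shape)  # (5, 6, 7)
```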

Neuromorphic computing for attitude estimation onboard quadrotors

  • Authors: Stein Stroobants, Julien Dupeyroux, Guido C.H.E. de Croon
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.08802
  • Pdf link: https://arxiv.org/pdf/2304.08802
  • Abstract
    Compelling evidence has been given for the high energy efficiency and update rates of neuromorphic processors, with performance beyond what standard Von Neumann architectures can achieve. Such promising features could be advantageous in critical embedded systems, especially in robotics. To date, the constraints inherent in robots (e.g., size and weight, battery autonomy, available sensors, computing resources, processing time, etc.), and particularly in aerial vehicles, severely hamper the performance of fully-autonomous on-board control, including sensor processing and state estimation. In this work, we propose a spiking neural network (SNN) capable of estimating the pitch and roll angles of a quadrotor in highly dynamic movements from 6-degree of freedom Inertial Measurement Unit (IMU) data. With only 150 neurons and a limited training dataset obtained using a quadrotor in a real world setup, the network shows competitive results as compared to state-of-the-art, non-neuromorphic attitude estimators. The proposed architecture was successfully tested on the Loihi neuromorphic processor on-board a quadrotor to estimate the attitude when flying. Our results show the robustness of neuromorphic attitude estimation and pave the way towards energy-efficient, fully autonomous control of quadrotors with dedicated neuromorphic computing systems.

Towards the Transferable Audio Adversarial Attack via Ensemble Methods

  • Authors: Feng Guo, Zheng Sun, Yuxuan Chen, Lei Ju
  • Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2304.08811
  • Pdf link: https://arxiv.org/pdf/2304.08811
  • Abstract
    In recent years, deep learning (DL) models have achieved significant progress in many domains, such as autonomous driving, facial recognition, and speech recognition. However, the vulnerability of deep learning models to adversarial attacks has raised serious concerns in the community because of their insufficient robustness and generalization. Also, transferable attacks have become a prominent method for black-box attacks. In this work, we explore the potential factors that impact the transferability of adversarial examples (AEs) in DL-based speech recognition. We also discuss the vulnerability of different DL systems and the irregular nature of decision boundaries. Our results show a remarkable difference in the transferability of AEs between speech and images, with data relevance being low for images but high for speech recognition. Motivated by dropout-based ensemble approaches, we propose random gradient ensembles and dynamic gradient-weighted ensembles, and we evaluate the impact of ensembles on the transferability of AEs. The results show that the AEs created by both approaches successfully transfer to a black-box API.
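
A random gradient ensemble step might look roughly like the following PyTorch sketch; the sub-ensemble size `k` and the FGSM-style sign update are assumptions, not the paper's exact construction.

```python
import torch

def random_gradient_ensemble_step(models, x, y, loss_fn, step_size, k=3):
    """Craft one attack step from the averaged gradient of a randomly chosen
    sub-ensemble of surrogate models, to encourage transferability."""
    idx = torch.randperm(len(models))[:k].tolist()   # random sub-ensemble
    x_adv = x.clone().detach().requires_grad_(True)
    loss = sum(loss_fn(models[i](x_adv), y) for i in idx) / k
    loss.backward()
    # Ascend the averaged loss so the example crosses shared decision boundaries
    return (x_adv + step_size * x_adv.grad.sign()).detach()
```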

Motion-state Alignment for Video Semantic Segmentation

  • Authors: Jinming Su, Ruihong Yin, Shuaibin Zhang, Junfeng Luo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.08820
  • Pdf link: https://arxiv.org/pdf/2304.08820
  • Abstract
    In recent years, video semantic segmentation has made great progress with advanced deep neural networks. However, there still exist two main challenges, i.e., information inconsistency and computation cost. To deal with the two difficulties, we propose a novel motion-state alignment framework for video semantic segmentation to keep both motion and state consistency. In the framework, we first construct a motion alignment branch armed with an efficient decoupled transformer to capture dynamic semantics, guaranteeing region-level temporal consistency. Then, a state alignment branch composed of a stage transformer is designed to enrich feature spaces for the current frame to extract static semantics and achieve pixel-level state consistency. Next, by a semantic assignment mechanism, the region descriptor of each semantic category is gained from dynamic semantics and linked with pixel descriptors from static semantics. Benefiting from the alignment of these two kinds of effective information, the proposed method picks up dynamic and static semantics in a targeted way, so that video semantic regions are consistently segmented to obtain precise locations with low computational complexity. Extensive experiments on Cityscapes and CamVid datasets show that the proposed approach outperforms state-of-the-art methods and validates the effectiveness of the motion-state alignment framework.

GoferBot: A Visual Guided Human-Robot Collaborative Assembly System

  • Authors: Zheyu Zhuang, Yizhak Ben-Shabat, Jiahao Zhang, Stephen Gould, Robert Mahony
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.08840
  • Pdf link: https://arxiv.org/pdf/2304.08840
  • Abstract
    The current transformation towards smart manufacturing has led to a growing demand for human-robot collaboration (HRC) in the manufacturing process. Perceiving and understanding the human co-worker's behaviour introduces challenges for collaborative robots to efficiently and effectively perform tasks in unstructured and dynamic environments. Integrating recent data-driven machine vision capabilities into HRC systems is a logical next step in addressing these challenges. However, in these cases, off-the-shelf components struggle due to generalisation limitations. Real-world evaluation is required in order to fully appreciate the maturity and robustness of these approaches. Furthermore, understanding the pure-vision aspects is a crucial first step before combining multiple modalities in order to understand the limitations. In this paper, we propose GoferBot, a novel vision-based semantic HRC system for a real-world assembly task. It is composed of a visual servoing module that reaches and grasps assembly parts in an unstructured multi-instance and dynamic environment, an action recognition module that performs human action prediction for implicit communication, and a visual handover module that uses the perceptual understanding of human behaviour to produce an intuitive and efficient collaborative assembly experience. GoferBot is a novel assembly system that seamlessly integrates all sub-modules by utilising implicit semantic information purely from visual perception.

Two-stage Denoising Diffusion Model for Source Localization in Graph Inverse Problems

  • Authors: Bosong Huang, Weihao Yu, Ruzhong Xie, Jing Xiao, Jin Huang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.08841
  • Pdf link: https://arxiv.org/pdf/2304.08841
  • Abstract
    Source localization is the inverse problem of graph information dissemination and has broad practical applications. However, the inherent intricacy and uncertainty in information dissemination pose significant challenges, and the ill-posed nature of the source localization problem further exacerbates these challenges. Recently, deep generative models, particularly diffusion models inspired by classical non-equilibrium thermodynamics, have made significant progress. While diffusion models have proven to be powerful in solving inverse problems and producing high-quality reconstructions, applying them directly to source localization is infeasible for two reasons. Firstly, it is impossible to calculate the posterior disseminated results on a large-scale network for iterative denoising sampling, which would incur enormous computational costs. Secondly, in the existing methods for this field, the training data itself is ill-posed (many-to-one); thus, simply transferring the diffusion model would only lead to local optima. To address these challenges, we propose a two-stage optimization framework, the source localization denoising diffusion model (SL-Diff). In the coarse stage, we devise the source proximity degrees as the supervised signals to generate coarse-grained source predictions. This aims to efficiently initialize the next stage, significantly reducing its convergence time and calibrating the convergence process. Furthermore, the introduction of cascade temporal information in this training method transforms the many-to-one mapping relationship into a one-to-one relationship, addressing the ill-posed problem. In the fine stage, we design a diffusion model for the graph inverse problem that can quantify the uncertainty in the dissemination. In extensive experiments, the proposed SL-Diff yields excellent prediction results within a reasonable sampling time.

PEGA: Personality-Guided Preference Aggregator for Ephemeral Group Recommendation

  • Authors: Guangze Ye, Wen Wu, Liye Shi, Wenxin Hu, Xin Chen, Liang He
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.08851
  • Pdf link: https://arxiv.org/pdf/2304.08851
  • Abstract
    Recently, making recommendations for ephemeral groups, which contain dynamic users and few historical interactions, has received increasing attention. The main challenge for ephemeral group recommenders is how to aggregate individual preferences to represent the group's overall preference. Score aggregation and preference aggregation are two commonly used methods that adopt hand-crafted predefined strategies and data-driven strategies, respectively. However, they neglect the importance of individual inherent factors such as personality in the group. In addition, they fail to work well given the small number of interaction records. To address these issues, we propose a Personality-Guided Preference Aggregator (PEGA) for ephemeral group recommendation. Concretely, we first adopt a hyper-rectangle to define the concept of Group Personality. We then use a personality attention mechanism to aggregate group preferences. The role of personality in our approach is twofold: (1) to estimate individual users' importance in a group and provide explainability; (2) to alleviate the data sparsity issue that occurs in ephemeral groups. The experimental results demonstrate that our model significantly outperforms state-of-the-art methods w.r.t. both Recall and NDCG on the Amazon and Yelp datasets.
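
One plausible shape for a personality-guided aggregator is scaled dot-product attention over member embeddings, with personalities as keys; the sketch below is a guess at the mechanism and does not reproduce the paper's hyper-rectangle Group Personality construction.

```python
import torch
import torch.nn.functional as F

def personality_attention(pref, personality, query):
    """Aggregate member preferences into one group vector, weighting each
    member by how well their personality embedding matches the group query.
    pref, personality: (m, d) member embeddings; query: (d,) learned vector."""
    scores = personality @ query / personality.shape[1] ** 0.5  # scaled dot product
    weights = F.softmax(scores, dim=0)       # member importance (also explainable)
    return weights @ pref                    # (d,) group preference embedding
```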

Secured and Cooperative Publish/Subscribe Scheme in Autonomous Vehicular Networks

  • Authors: Yuntao Wang, Zhou Su, Qichao Xu, Tom H. Luan, Rongxing Lu
  • Subjects: Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2304.08875
  • Pdf link: https://arxiv.org/pdf/2304.08875
  • Abstract
    In order to save computing power yet enhance safety, there is a strong intention for future autonomous vehicles (AVs) to drive collaboratively by sharing sensory data and computing results among neighbors. However, the intense collaborative computing and data transmissions among unknown others will inevitably introduce severe security concerns. Aiming at addressing security concerns in future AVs, in this paper, we develop SPAD, a secured framework to forbid free-riders and promote trustworthy data dissemination in collaborative autonomous driving. Specifically, we first introduce a publish/subscribe framework for inter-vehicle data transmissions. To defend against free-riding attacks, we formulate the interactions between publisher AVs and subscriber AVs as a vehicular publish/subscribe game, and incentivize AVs to deliver high-quality data by analyzing the Stackelberg equilibrium of the game. We also design a reputation evaluation mechanism in the game to identify malicious AVs that disseminate fake information. Furthermore, given the lack of sufficient knowledge on parameters of the network model and user cost model in dynamic game scenarios, a two-tier reinforcement learning based algorithm with hotbooting is developed to obtain the optimal strategies of subscriber AVs and publisher AVs with free-rider prevention. Extensive simulations are conducted, and the results validate that our SPAD can effectively prevent free-riders and enhance the dependability of disseminated contents, compared with conventional schemes.

Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection

  • Authors: Chang Xu, Jian Ding, Jinwang Wang, Wen Yang, Huai Yu, Lei Yu, Gui-Song Xia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.08876
  • Pdf link: https://arxiv.org/pdf/2304.08876
  • Abstract
    Detecting arbitrarily oriented tiny objects poses intense challenges to existing detectors, especially for label assignment. Despite the exploration of adaptive label assignment in recent oriented object detectors, the extreme geometric shapes and limited features of oriented tiny objects still induce severe mismatch and imbalance issues. Specifically, the position prior, positive sample features, and instances are mismatched, and the learning of extreme-shaped objects is biased and unbalanced due to little proper feature supervision. To tackle these issues, we propose a dynamic prior along with a coarse-to-fine assigner, dubbed DCFL. For one thing, we model the prior, label assignment, and object representation all in a dynamic manner to alleviate the mismatch issue. For another, we leverage coarse prior matching and a finer posterior constraint to dynamically assign labels, providing appropriate and relatively balanced supervision for diverse instances. Extensive experiments on six datasets show substantial improvements over the baseline. Notably, we obtain state-of-the-art performance for one-stage detectors on the DOTA-v1.5, DOTA-v2.0, and DIOR-R datasets under single-scale training and testing. Code is available at https://github.com/Chasel-Tsui/mmrotate-dcfl.

NPS: A Framework for Accurate Program Sampling Using Graph Neural Network

  • Authors: Yuanwei Fang, Zihao Liu, Yanheng Lu, Jiawei Liu, Jiajie Li, Yi Jin, Jian Chen, Yenkuang Chen, Hongzhong Zheng, Yuan Xie
  • Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Programming Languages (cs.PL)
  • Arxiv link: https://arxiv.org/abs/2304.08880
  • Pdf link: https://arxiv.org/pdf/2304.08880
  • Abstract
    With the end of Moore's Law, there is a growing demand for rapid architectural innovations in modern processors, such as RISC-V custom extensions, to continue performance scaling. Program sampling is a crucial step in microprocessor design, as it selects representative simulation points for workload simulation. While SimPoint has been the de-facto approach for decades, its limited expressiveness with Basic Block Vector (BBV) requires time-consuming human tuning, often taking months, which impedes fast innovation and agile hardware development. This paper introduces Neural Program Sampling (NPS), a novel framework that learns execution embeddings using dynamic snapshots of a Graph Neural Network. NPS deploys AssemblyNet for embedding generation, leveraging an application's code structures and runtime states. AssemblyNet serves as NPS's graph model and neural architecture, capturing a program's behavior in aspects such as data computation, code path, and data flow. AssemblyNet is trained with a data prefetch task that predicts consecutive memory addresses. In the experiments, NPS outperforms SimPoint by up to 63%, reducing the average error by 38%. Additionally, NPS demonstrates strong robustness with increased accuracy, reducing the expensive accuracy tuning overhead. Furthermore, NPS shows higher accuracy and generality than the state-of-the-art GNN approach in code behavior learning, enabling the generation of high-quality execution embeddings.

Safe reinforcement learning with self-improving hard constraints for multi-energy management systems

  • Authors: Glenn Ceusters, Muhammad Andy Putratama, Rüdiger Franke, Ann Nowé, Maarten Messagie
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.08897
  • Pdf link: https://arxiv.org/pdf/2304.08897
  • Abstract
    Safe reinforcement learning (RL) with hard constraint guarantees is a promising optimal control direction for multi-energy management systems. It requires only the environment-specific constraint functions themselves a priori, and not a complete model (i.e., plant, disturbance, and noise models, and prediction models for states not included in the plant model, e.g., demand, weather, and price forecasts). The project-specific upfront and ongoing engineering efforts are therefore reduced, better representations of the underlying system dynamics can still be learned, and modeling bias is kept to a minimum (no model-based objective function). However, even the constraint functions alone are not always trivial to provide accurately in advance (e.g., an energy balance constraint requires the detailed determination of all energy inputs and outputs), leading to potentially unsafe behavior. In this paper, we present two novel advancements: (I) combining the OptLayer and SafeFallback methods, named OptLayerPolicy, to increase the initial utility while keeping a high sample efficiency; (II) introducing self-improving hard constraints, to increase the accuracy of the constraint functions as more data becomes available so that better policies can be learned. Both advancements keep the constraint formulation decoupled from the RL formulation, so that new (presumably better) RL algorithms can act as drop-in replacements. We show that, in a simulated multi-energy system case study, the initial utility is increased to 92.4% (OptLayerPolicy) compared to 86.1% (OptLayer), and that the policy after training is increased to 104.9% (GreyOptLayerPolicy) compared to 103.4% (OptLayer), all relative to a vanilla RL benchmark. While introducing surrogate functions into the optimization problem requires special attention, we conclude that the newly presented GreyOptLayerPolicy method is the most advantageous.
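
The hard-constraint idea behind an OptLayer-style safety layer can be sketched as a Euclidean projection of the RL action onto a linearly constrained safe set; this is a minimal stand-in, not the paper's OptLayerPolicy or its self-improving constraints.

```python
import numpy as np
from scipy.optimize import minimize

def project_to_safe(a_rl, A, b):
    """Return the safe action closest to the RL proposal, i.e. the Euclidean
    projection of a_rl onto the hard-constraint set {a : A @ a <= b}."""
    res = minimize(
        fun=lambda a: 0.5 * np.sum((a - a_rl) ** 2),   # stay near the RL action
        x0=a_rl,
        jac=lambda a: a - a_rl,
        constraints=[{"type": "ineq", "fun": lambda a: b - A @ a}],  # A @ a <= b
        method="SLSQP",
    )
    return res.x
```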

Distributed Search Planning in 3-D Environments With a Dynamically Varying Number of Agents

  • Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.08932
  • Pdf link: https://arxiv.org/pdf/2304.08932
  • Abstract
    In this work, a novel distributed search-planning framework is proposed, in which a dynamically varying team of autonomous agents cooperates to search for multiple objects of interest in three dimensions (3-D). It is assumed that agents can enter and exit the mission space at any point in time, and as a result the number of agents that actively participate in the mission varies over time. The proposed distributed search-planning framework takes into account the agents' dynamical and sensing models and the dynamically varying number of agents, and utilizes model predictive control (MPC) to generate cooperative search trajectories over a finite rolling planning horizon. This enables the agents to adapt their decisions online while considering the plans of their peers, maximizing their search-planning performance and reducing the duplication of work.
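
The receding-horizon mechanics can be illustrated with a toy single-agent MPC step that enumerates a small discrete control set; the paper's distributed, multi-agent formulation is substantially richer, and `step` and `reward` are placeholder models.

```python
import itertools
import numpy as np

def mpc_step(state, controls, step, reward, H=5):
    """Enumerate control sequences of length H, roll the dynamics model
    forward, and apply only the first control of the best plan."""
    best_u, best_val = None, -np.inf
    for seq in itertools.product(controls, repeat=H):
        s, val = state, 0.0
        for u in seq:
            s = step(s, u)        # agent dynamical model
            val += reward(s)      # e.g. expected newly observed search volume
        if val > best_val:
            best_u, best_val = seq[0], val
    return best_u                 # re-plan at the next time step (rolling horizon)
```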

Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes

  • Authors: Rui Li, Dong Gong, Wei Yin, Hao Chen, Yu Zhu, Kaixuan Wang, Xiaozhi Chen, Jinqiu Sun, Yanning Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.08993
  • Pdf link: https://arxiv.org/pdf/2304.08993
  • Abstract
    Multi-frame depth estimation generally achieves high accuracy by relying on multi-view geometric consistency. When applied in dynamic scenes, e.g., autonomous driving, this consistency is usually violated in dynamic areas, leading to corrupted estimations. Many multi-frame methods handle dynamic areas by identifying them with explicit masks and compensating the multi-view cues with monocular cues represented as local monocular depth or features. The improvements are limited due to the uncontrolled quality of the masks and the underutilized benefits of fusing the two types of cues. In this paper, we propose a novel method to learn to fuse the multi-view and monocular cues encoded as volumes, without needing heuristically crafted masks. As unveiled in our analyses, the multi-view cues capture more accurate geometric information in static areas, while the monocular cues capture more useful contexts in dynamic areas. To let the geometric perception learned from multi-view cues in static areas propagate to the monocular representation in dynamic areas, and to let monocular cues enhance the representation of the multi-view cost volume, we propose a cross-cue fusion (CCF) module, which includes cross-cue attention (CCA) to encode the spatially non-local relative intra-relations from each source to enhance the representation of the other. Experiments on real-world datasets demonstrate the significant effectiveness and generalization ability of the proposed method.
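
One direction of a cross-cue attention module, monocular features attending to the multi-view volume, might look roughly like this sketch over flattened features; it only conveys the mechanism, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def cross_cue_attention(mono, multi_view):
    """Enhance monocular features with non-local relations drawn from the
    multi-view cost volume (one direction of a cross-cue fusion module).
    mono, multi_view: (n, d) flattened volume features."""
    d = multi_view.shape[1]
    attn = F.softmax(mono @ multi_view.T / d ** 0.5, dim=-1)  # (n, n) relations
    return mono + attn @ multi_view                           # residual enhancement
```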

PaTeCon: A Pattern-Based Temporal Constraint Mining Method for Conflict Detection on Knowledge Graphs

  • Authors: Jianhao Chen, Junyang Ren, Wentao Ding, Yuzhong Qu
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09015
  • Pdf link: https://arxiv.org/pdf/2304.09015
  • Abstract
    Temporal facts, which characterize events that hold in specific time periods, are attracting rising attention in the knowledge graph (KG) research community. In terms of quality management, the introduction of time restrictions brings new challenges to maintaining the temporal consistency of KGs and detecting potential temporal conflicts. Previous studies rely on manually enumerated temporal constraints to detect conflicts, which is labor-intensive and may have granularity issues. We start from the common patterns of temporal facts and constraints and propose a pattern-based temporal constraint mining method, PaTeCon. PaTeCon uses automatically determined graph patterns and their relevant statistical information over the given KG, instead of human experts, to generate time constraints. Specifically, PaTeCon dynamically attaches class restrictions to candidate constraints according to their measuring scores. We evaluate PaTeCon on two large-scale datasets based on Wikidata and Freebase, respectively. The experimental results show that pattern-based automatic constraint mining is powerful in generating valuable temporal constraints.
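
The detection side can be illustrated with a tiny interval-overlap check against a mined "disjoint validity periods" constraint; the fact and constraint formats here are invented for the example.

```python
def overlaps(a, b):
    """True if two (start, end) validity intervals intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def find_conflicts(facts):
    """Flag pairs of temporal facts that violate a mined 'disjoint intervals'
    constraint: same subject and relation, overlapping validity periods."""
    conflicts = []
    for i, (s1, r1, _, t1) in enumerate(facts):
        for s2, r2, _, t2 in facts[i + 1:]:
            if s1 == s2 and r1 == r2 and overlaps(t1, t2):
                conflicts.append(((s1, r1, t1), (s2, r2, t2)))
    return conflicts

# e.g. find_conflicts([("X", "presidentOf", "A", (2001, 2005)),
#                      ("X", "presidentOf", "B", (2004, 2008))]) flags one pair
```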

Neural Lumped Parameter Differential Equations with Application in Friction-Stir Processing

  • Authors: James Koch, WoongJo Choi, Ethan King, David Garcia, Hrishikesh Das, Tianhao Wang, Ken Ross, Keerti Kappagantula
  • Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
  • Arxiv link: https://arxiv.org/abs/2304.09047
  • Pdf link: https://arxiv.org/pdf/2304.09047
  • Abstract
    Lumped parameter methods aim to simplify the evolution of spatially extended or continuous physical systems to that of a "lumped" element representative of the physical scales of the modeled system. For systems where the definition of a lumped element or its associated physics may be unknown, modeling tasks may be restricted to full-fidelity simulations of the physics of a system. In this work, we consider data-driven modeling tasks with limited point-wise measurements of otherwise continuous systems. We build upon the notion of the Universal Differential Equation (UDE) to construct data-driven models that reduce the dynamics to those of a lumped parameter and infer its properties. The flexibility of UDEs allows for composing various known physical priors suitable for application-specific modeling tasks, including lumped parameter methods. The motivating example for this work is the plunge and dwell stages of friction-stir welding; specifically, (i) mapping the power input into the tool to a point measurement of temperature, and (ii) using this learned mapping for process control.
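
A minimal neural lumped-parameter model in the UDE spirit might combine a known first-order heat-loss term with a small network for the unmodeled power-to-temperature effect; the constants, shapes, and explicit Euler integrator below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LumpedUDE(nn.Module):
    """dT/dt = -k*(T - T_amb) + NN(T, P): a known lumped heat-loss term plus a
    small network learning the unmodeled effect of power input P on temperature T."""
    def __init__(self, k=0.1, T_amb=25.0):
        super().__init__()
        self.k, self.T_amb = k, T_amb
        self.net = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))

    def forward(self, T0, P_seq, dt=0.1):
        T, out = T0, []
        for P in P_seq:                        # explicit Euler roll-out
            rhs = -self.k * (T - self.T_amb) + self.net(torch.stack([T, P])).squeeze()
            T = T + dt * rhs
            out.append(T)
        return torch.stack(out)

# e.g. T_traj = LumpedUDE()(torch.tensor(30.0), torch.linspace(0.0, 1.0, 50))
```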

A Field Test of Bandit Algorithms for Recommendations: Understanding the Validity of Assumptions on Human Preferences in Multi-armed Bandits

  • Authors: Liu Leqi, Giulio Zhou, Fatma Kılınç-Karzan, Zachary C. Lipton, Alan L. Montgomery
  • Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09088
  • Pdf link: https://arxiv.org/pdf/2304.09088
  • Abstract
    Personalized recommender systems suffuse modern life, shaping what media we read and what products we consume. Algorithms powering such systems tend to consist of supervised learning-based heuristics, such as latent factor models with a variety of heuristically chosen prediction targets. Meanwhile, theoretical treatments of recommendation frequently address the decision-theoretic nature of the problem, including the need to balance exploration and exploitation, via the multi-armed bandits (MABs) framework. However, MAB-based approaches rely heavily on assumptions about human preferences. These preference assumptions are seldom tested using human subject studies, partly due to the lack of publicly available toolkits to conduct such studies. In this work, we conduct a study with crowdworkers in a comics recommendation MAB setting. Each arm represents a comic category, and users provide feedback after each recommendation. We check the validity of a core MAB assumption, namely that human preferences (reward distributions) are fixed over time, and find that it does not hold. This finding suggests that any MAB algorithm used for recommender systems should account for human preference dynamics. While answering these questions, we provide a flexible experimental framework for understanding human preference dynamics and testing MAB algorithms with human users. The code for our experimental framework and the collected data can be found at https://github.com/HumainLab/human-bandit-evaluation.
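
For context, the stationarity assumption being tested is baked into even the simplest MAB baseline, e.g. an epsilon-greedy recommender over comic categories; `pull` is a placeholder for collecting one round of user feedback.

```python
import numpy as np

rng = np.random.default_rng(0)

def eps_greedy(n_arms, pull, T=1000, eps=0.1):
    """Standard epsilon-greedy MAB. The incremental-mean estimate assumes each
    arm's reward distribution is fixed over time, the very assumption the
    study found violated for human raters."""
    counts, means = np.zeros(n_arms), np.zeros(n_arms)
    for _ in range(T):
        a = rng.integers(n_arms) if rng.random() < eps else int(np.argmax(means))
        r = pull(a)                               # user feedback for arm a
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]    # incremental mean update
    return means
```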

Safety Guaranteed Manipulation Based on Reinforcement Learning Planner and Model Predictive Control Actor

  • Authors: Zhenshan Bing, Aleksandr Mavrichev, Sicong Shen, Xiangtong Yao, Kejia Chen, Kai Huang, Alois Knoll
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.09119
  • Pdf link: https://arxiv.org/pdf/2304.09119
  • Abstract
    Deep reinforcement learning (RL) has been endowed with high expectations in tackling challenging manipulation tasks in an autonomous and self-directed fashion. Despite the significant strides made in the development of reinforcement learning, the practical deployment of this paradigm is hindered by at least two barriers, namely, the engineering of a reward function and ensuring the safety guarantees of learning-based controllers. In this paper, we address these challenging limitations by proposing a framework that merges a reinforcement learning planner, trained using sparse rewards, with a model predictive control (MPC) actor, thereby offering a safe policy. On the one hand, the RL planner learns from sparse rewards by selecting intermediate goals that are easy to achieve in the short term and promising to lead to target goals in the long term. On the other hand, the MPC actor takes the suggested intermediate goals from the RL planner as input and predicts how the robot's action will enable it to reach that goal while avoiding any obstacles over a short period of time. We evaluated our method on four challenging manipulation tasks with dynamic obstacles, and the results demonstrate that, by leveraging the complementary strengths of these two components, the agent can solve manipulation tasks in complex, dynamic environments safely with a 100% success rate. Videos are available at https://videoviewsite.wixsite.com/mpc-hgg.

Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning using Passive Langevin Dynamics

  • Authors: Luke Snow, Vikram Krishnamurthy
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.09123
  • Pdf link: https://arxiv.org/pdf/2304.09123
  • Abstract
    Stochastic gradient Langevin dynamics (SGLD) are a useful methodology for sampling from probability distributions. This paper provides a finite sample analysis of a passive stochastic gradient Langevin dynamics algorithm (PSGLD) designed to achieve inverse reinforcement learning. By "passive", we mean that the noisy gradients available to the PSGLD algorithm (inverse learning process) are evaluated at randomly chosen points by an external stochastic gradient algorithm (forward learner). The PSGLD algorithm thus acts as a randomized sampler which recovers the cost function being optimized by this external process. Previous work has analyzed the asymptotic performance of this passive algorithm using stochastic approximation techniques; in this work we analyze the non-asymptotic performance. Specifically, we provide finite-time bounds on the 2-Wasserstein distance between the passive algorithm and its stationary measure, from which the reconstructed cost function is obtained.
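
The (non-passive) SGLD iteration underlying this analysis is short enough to state directly; in the passive variant, the gradients would instead arrive at points chosen by the external forward learner.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgld(grad_log_p, theta, eta=1e-3, n_steps=10_000):
    """Stochastic gradient Langevin dynamics: a gradient step on log p plus
    Gaussian noise scaled so the iterates approximately sample from p."""
    samples = []
    for _ in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        theta = theta + 0.5 * eta * grad_log_p(theta) + np.sqrt(eta) * noise
        samples.append(theta.copy())
    return np.array(samples)
```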

New submissions for Fri, 17 Mar 23

Keyword: pruning

There is no result

Keyword: neural\ architecture\ search

There is no result

Keyword: 3d object detection

Among Us: Adversarially Robust Collaborative Perception by Consensus

  • Authors: Yiming Li, Qi Fang, Jiamu Bai, Siheng Chen, Felix Juefei-Xu, Chen Feng
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2303.09495
  • Pdf link: https://arxiv.org/pdf/2303.09495
  • Abstract
    Multiple robots can perceive a scene (e.g., detect objects) collaboratively better than individuals, although they easily suffer from adversarial attacks when using deep learning. This could be addressed by adversarial defense, but its training requires the often-unknown attacking mechanism. Instead, we propose ROBOSAC, a novel sampling-based defense strategy generalizable to unseen attackers. Our key idea is that collaborative perception should lead to consensus rather than dissensus in results compared to individual perception. This leads to our hypothesize-and-verify framework: perception results with and without collaboration from a random subset of teammates are compared until a consensus is reached. In such a framework, more teammates in the sampled subset often entail better perception performance but require a longer sampling time to reject potential attackers. Thus, we derive how many sampling trials are needed to ensure the desired size of an attacker-free subset, or equivalently, the maximum size of such a subset that we can successfully sample within a given number of trials. We validate our method on the task of collaborative 3D object detection in autonomous driving scenarios.
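
The hypothesize-and-verify loop reduces to a RANSAC-like sketch; the `fuse` and `agree` callables (e.g. an IoU-based consensus test between collaborative and ego detections) are placeholders.

```python
import random

def robosac_like(ego_result, teammates, fuse, agree, n_trials=20, k=3):
    """Hypothesize-and-verify: fuse detections from a random subset of teammates
    and accept the collaboration only if it stays consistent with ego perception."""
    for _ in range(n_trials):
        subset = random.sample(teammates, k)
        fused = fuse(ego_result, subset)       # collaborative detection result
        if agree(fused, ego_result):           # consensus reached, no dissensus
            return fused                       # attacker-free subset found
    return ego_result                          # fall back to individual perception
```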

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

  • Authors: Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09551
  • Pdf link: https://arxiv.org/pdf/2303.09551
  • Abstract
    3D scene understanding plays a vital role in vision-based autonomous driving. While most existing methods focus on 3D object detection, they have difficulty describing real-world objects of arbitrary shapes and infinite classes. Towards a more comprehensive perception of a 3D scene, in this paper, we propose a SurroundOcc method to predict the 3D occupancy with multi-camera images. We first extract multi-scale features for each image and adopt spatial 2D-3D attention to lift them to the 3D volume space. Then we apply 3D convolutions to progressively upsample the volume features and impose supervision on multiple levels. To obtain dense occupancy prediction, we design a pipeline to generate dense occupancy ground truth without expensive occupancy annotations. Specifically, we fuse multi-frame LiDAR scans of dynamic objects and static scenes separately. Then we adopt Poisson Reconstruction to fill the holes and voxelize the mesh to get dense occupancy labels. Extensive experiments on nuScenes and SemanticKITTI datasets demonstrate the superiority of our method. Code and dataset are available at https://github.com/weiyithu/SurroundOcc

Keyword: voxel

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

  • Authors: Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09551
  • Pdf link: https://arxiv.org/pdf/2303.09551
  • Abstract
    3D scene understanding plays a vital role in vision-based autonomous driving. While most existing methods focus on 3D object detection, they have difficulty describing real-world objects of arbitrary shapes and infinite classes. Towards a more comprehensive perception of a 3D scene, in this paper, we propose a SurroundOcc method to predict the 3D occupancy with multi-camera images. We first extract multi-scale features for each image and adopt spatial 2D-3D attention to lift them to the 3D volume space. Then we apply 3D convolutions to progressively upsample the volume features and impose supervision on multiple levels. To obtain dense occupancy prediction, we design a pipeline to generate dense occupancy ground truth without expensive occupancy annotations. Specifically, we fuse multi-frame LiDAR scans of dynamic objects and static scenes separately. Then we adopt Poisson Reconstruction to fill the holes and voxelize the mesh to get dense occupancy labels. Extensive experiments on nuScenes and SemanticKITTI datasets demonstrate the superiority of our method. Code and dataset are available at https://github.com/weiyithu/SurroundOcc

Keyword: lidar

SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments

  • Authors: Yudi Dai (1), Yitai Lin (1), Xiping Lin (2), Chenglu Wen (1), Lan Xu (2), Hongwei Yi (3), Siqi Shen (1), Yuexin Ma (2), Cheng Wang (1) ((1) Xiamen University, China, (2) ShanghaiTech University, China, (3) Max Planck Institute for Intelligent Systems, Germany)
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09095
  • Pdf link: https://arxiv.org/pdf/2303.09095
  • Abstract
    We present SLOPER4D, a novel scene-aware dataset collected in large urban environments to facilitate research on global human pose estimation (GHPE) with human-scene interaction in the wild. Employing a head-mounted device integrated with a LiDAR and a camera, we record 12 human subjects' activities over 10 diverse urban scenes from an egocentric view. Frame-wise annotations for 2D key points, 3D pose parameters, and global translations are provided, together with reconstructed scene point clouds. To obtain accurate 3D ground truth in such large dynamic scenes, we propose a joint optimization method to fit local SMPL meshes to the scene and fine-tune the camera calibration during dynamic motions frame by frame, resulting in plausible and scene-natural 3D human poses. In total, SLOPER4D consists of 15 sequences of human motions, each of which has a trajectory length of more than 200 meters (up to 1,300 meters) and covers an area of more than 2,000 $m^2$ (up to 13,000 $m^2$), including more than 100K LiDAR frames, 300K video frames, and 500K IMU-based motion frames. With SLOPER4D, we provide a detailed and thorough analysis of two critical tasks, camera-based 3D HPE and LiDAR-based 3D HPE in urban environments, and benchmark a new task, GHPE. The in-depth analysis demonstrates that SLOPER4D poses significant challenges to existing methods and opens up great research opportunities. The dataset and code are released at this http URL

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

  • Authors: Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09551
  • Pdf link: https://arxiv.org/pdf/2303.09551
  • Abstract
    3D scene understanding plays a vital role in vision-based autonomous driving. While most existing methods focus on 3D object detection, they have difficulty describing real-world objects of arbitrary shapes and infinite classes. Towards a more comprehensive perception of a 3D scene, in this paper, we propose a SurroundOcc method to predict the 3D occupancy with multi-camera images. We first extract multi-scale features for each image and adopt spatial 2D-3D attention to lift them to the 3D volume space. Then we apply 3D convolutions to progressively upsample the volume features and impose supervision on multiple levels. To obtain dense occupancy prediction, we design a pipeline to generate dense occupancy ground truth without expensive occupancy annotations. Specifically, we fuse multi-frame LiDAR scans of dynamic objects and static scenes separately. Then we adopt Poisson Reconstruction to fill the holes and voxelize the mesh to get dense occupancy labels. Extensive experiments on nuScenes and SemanticKITTI datasets demonstrate the superiority of our method. Code and dataset are available at https://github.com/weiyithu/SurroundOcc

New submissions for Fri, 28 Apr 23

Keyword: efficient

SamurAI: A Versatile IoT Node With Event-Driven Wake-Up and Embedded ML Acceleration

  • Authors: Ivan Miro-Panades (LSTA), Benoit Tain (LECA), Jean-Frederic Christmann (LFIM), David Coriat (LIIM), Romain Lemaire (LIIM), Clement Jany, Baudouin Martineau (DSYS), Fabrice Chaix (DSYS), Guillaume Waltener (DSYS), Emmanuel Pluchart (LSTA), Jean-Philippe Noel (LFIM), Adam Makosiej, Maxime Montoya, Simone Bacles-Min (LIIM), David Briand (LIAE), Jean-Marc Philippe, Yvain Thonnart (LFIM), Alexandre Valentian (LSTA), Frederic Heitzmann (DSYS), Fabien Clermidy (DSCIN)
  • Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13726
  • Pdf link: https://arxiv.org/pdf/2304.13726
  • Abstract
    Increased capabilities such as recognition and self-adaptability are now required from IoT applications. While IoT node power consumption is a major concern for these applications, cloud-based processing is becoming unsustainable due to continuous sensor or image data transmission over the wireless network. Thus, optimized ML capabilities and data transfers should be integrated into the IoT node. Moreover, IoT applications are torn between sporadic data-logging and energy-hungry data processing (e.g., image classification). Thus, the versatility of the node is key in addressing this wide diversity of energy and processing needs. This paper presents SamurAI, a versatile IoT node bridging this gap in processing and in energy by leveraging two on-chip sub-systems: a low-power, clock-less, event-driven Always-Responsive (AR) part and an energy-efficient On-Demand (OD) part. AR contains a 1.7MOPS event-driven, asynchronous Wake-up Controller (WuC) with a 207ns wake-up time optimized for sporadic computing, while OD combines a deep-sleep RISC-V CPU and a 1.3TOPS/W Machine Learning (ML) accelerator for more complex tasks up to 36GOPS. This architecture partitioning achieves best-in-class versatility metrics such as the peak-performance-to-idle-power ratio. In an applicative classification scenario, it demonstrates system power gains of up to 3.5x compared to cloud-based processing, and thus extended battery lifetime.

A Unified Approach to Lane Change Intention Recognition and Driving Status Prediction through TCN-LSTM and Multi-Task Learning Models

  • Authors: Renteng Yuan, Mohamed Abdel-Aty, Xin Gu, Ou Zheng, Qiaojun Xiang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13732
  • Pdf link: https://arxiv.org/pdf/2304.13732
  • Abstract
    Lane change (LC) is a continuous and complex operation process. Accurately detecting and predicting LC processes can help traffic participants better understand their surrounding environment, recognize potential LC safety hazards, and improve traffic safety. This paper focuses on LC processes, developing an LC intention recognition (LC-IR) model and an LC status prediction (LC-SP) model. A novel ensemble temporal convolutional network with Long Short-Term Memory units (TCN-LSTM) is first proposed to capture long-range dependencies in sequential data. Then, three multi-task models (MTL-LSTM, MTL-TCN, MTL-TCN-LSTM) are developed to capture the intrinsic relationships among output indicators. Furthermore, a unified modeling framework for LC intention recognition and driving status prediction (LC-IR-SP) is developed. To validate the performance of the proposed models, a total of 1,023 vehicle trajectories are extracted from the CitySim dataset. The Pearson coefficient is employed to determine the related indicators. The results indicate that, using 150 frames as the input length, the TCN-LSTM model with 96.67% accuracy outperforms the TCN and LSTM models in LC intention classification and provides more balanced results for each class. The three proposed multi-task learning models provide markedly increased performance compared to the corresponding single-task models, with average reductions of 24.24% and 22.86% in the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), respectively. The developed LC-IR-SP model has promising applications for autonomous vehicles to identify lane-change behaviors, calculate a real-time traffic conflict index, and improve vehicle control strategies.
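
One plausible PyTorch arrangement of a TCN-LSTM classifier is sketched below; the layer counts, widths, and dilation pattern are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TCNLSTM(nn.Module):
    """Dilated temporal convolutions for long-range context, followed by an
    LSTM, ending in a linear head over lane-change intention classes."""
    def __init__(self, n_feats, n_classes, hidden=64):
        super().__init__()
        self.tcn = nn.Sequential(
            nn.Conv1d(n_feats, hidden, kernel_size=3, dilation=1, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=2, padding=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                          # x: (batch, time, n_feats)
        h = self.tcn(x.transpose(1, 2))            # convolve over the time axis
        _, (h_n, _) = self.lstm(h.transpose(1, 2)) # back to (batch, time, hidden)
        return self.head(h_n[-1])                  # logits from last LSTM state

# e.g. logits = TCNLSTM(n_feats=8, n_classes=3)(torch.randn(4, 150, 8))
```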

Surrogate Assisted Generation of Human-Robot Interaction Scenarios

  • Authors: Varun Bhatt, Heramb Nemlekar, Matthew Fontaine, Bryon Tjanaka, Hejia Zhang, Ya-Chuan Hsu, Stefanos Nikolaidis
  • Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13787
  • Pdf link: https://arxiv.org/pdf/2304.13787
  • Abstract
    As human-robot interaction (HRI) systems advance, so does the difficulty of evaluating and understanding the strengths and limitations of these systems in different environments and with different users. To this end, previous methods have algorithmically generated diverse scenarios that reveal system failures in a shared control teleoperation task. However, these methods require directly evaluating generated scenarios by simulating robot policies and human actions. The computational cost of these evaluations limits their applicability in more complex domains. Thus, we propose augmenting scenario generation systems with surrogate models that predict both human and robot behaviors. In the shared control teleoperation domain and a more complex shared workspace collaboration task, we show that surrogate assisted scenario generation efficiently synthesizes diverse datasets of challenging scenarios. We demonstrate that these failures are reproducible in real-world interactions.

A Data-Driven Hybrid Automaton Framework to Modeling Complex Dynamical Systems

  • Authors: Yejiang Yang, Zihao Mo, Weiming Xiang
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.13811
  • Pdf link: https://arxiv.org/pdf/2304.13811
  • Abstract
    In this paper, a computationally efficient data-driven hybrid automaton model is proposed to capture unknown complex dynamical system behaviors using multiple neural networks. The sampled data of the system are divided by valid partitions into groups corresponding to their topologies, based on which transition guards are defined. Then, a collection of computationally efficient small-scale neural networks is trained as the local dynamical descriptions for their corresponding topologies. After modeling the system with a neural-network-based hybrid automaton, set-valued reachability analysis with low computation cost is provided based on interval analysis and a split-and-combine process. Finally, a numerical example of a limit cycle is presented to illustrate that the developed models can significantly reduce the computational cost of reachable set computation without sacrificing any modeling precision.

Programmatically Grounded, Compositionally Generalizable Robotic Manipulation

  • Authors: Renhao Wang, Jiayuan Mao, Joy Hsu, Hang Zhao, Jiajun Wu, Yang Gao
  • Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.13826
  • Pdf link: https://arxiv.org/pdf/2304.13826
  • Abstract
    Robots operating in the real world require both rich manipulation skills as well as the ability to semantically reason about when to apply those skills. Towards this goal, recent works have integrated semantic representations from large-scale pretrained vision-language (VL) models into manipulation models, imparting them with more general reasoning capabilities. However, we show that the conventional pretraining-finetuning pipeline for integrating such representations entangles the learning of domain-specific action information and domain-general visual information, leading to less data-efficient training and poor generalization to unseen objects and tasks. To this end, we propose ProgramPort, a modular approach to better leverage pretrained VL models by exploiting the syntactic and semantic structures of language instructions. Our framework uses a semantic parser to recover an executable program, composed of functional modules grounded on vision and action across different modalities. Each functional module is realized as a combination of deterministic computation and learnable neural networks. Program execution produces parameters to general manipulation primitives for a robotic end-effector. The entire modular network can be trained with end-to-end imitation learning objectives. Experiments show that our model successfully disentangles action and perception, translating to improved zero-shot and compositional generalization in a variety of manipulation behaviors. Project webpage at: https://progport.github.io.

Physics-informed Data-driven Discovery of Constitutive Models with Application to Strain-Rate-sensitive Soft Materials

  • Authors: Kshitiz Upadhyay, Jan N. Fuhg, Nikolaos Bouklas, K.T. Ramesh
  • Subjects: Computational Engineering, Finance, and Science (cs.CE); Materials Science (cond-mat.mtrl-sci); Soft Condensed Matter (cond-mat.soft)
  • Arxiv link: https://arxiv.org/abs/2304.13897
  • Pdf link: https://arxiv.org/pdf/2304.13897
  • Abstract
    A novel data-driven constitutive modeling approach is proposed, which combines the physics-informed nature of modeling based on continuum thermodynamics with the benefits of machine learning. This approach is demonstrated on strain-rate-sensitive soft materials. This model is based on the viscous dissipation-based visco-hyperelasticity framework where the total stress is decomposed into volumetric, isochoric hyperelastic, and isochoric viscous overstress contributions. It is shown that each of these stress components can be written as linear combinations of the components of an irreducible integrity basis. Three Gaussian process regression-based surrogate models are trained (one per stress component) between principal invariants of strain and strain rate tensors and the corresponding coefficients of the integrity basis components. It is demonstrated that this type of model construction enforces key physics-based constraints on the predicted responses: the second law of thermodynamics, the principles of local action and determinism, objectivity, the balance of angular momentum, an assumed reference state, isotropy, and limited memory. The three surrogate models that constitute our constitutive model are evaluated by training them on small-size numerically generated data sets corresponding to a single deformation mode and then analyzing their predictions over a much wider testing regime comprising multiple deformation modes. Our physics-informed data-driven constitutive model predictions are compared with the corresponding predictions of classical continuum thermodynamics-based and purely data-driven models. It is shown that our surrogate models can reasonably capture the stress-strain-strain rate responses in both training and testing regimes, and provide improvements in terms of prediction accuracy, generalizability to multiple deformation modes, and compatibility with limited data.
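
The surrogate step, regressing integrity-basis coefficients on strain and strain-rate invariants with a Gaussian process, can be sketched with scikit-learn on placeholder data; the paper trains three such surrogates, one per stress contribution.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# X: principal invariants of strain and strain-rate tensors (n_samples, n_invariants)
# y: coefficients multiplying the integrity-basis tensors (n_samples, n_coeffs)
X = np.random.rand(50, 3)          # placeholder training invariants
y = np.random.rand(50, 2)          # placeholder basis coefficients

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(X, y)                        # one such surrogate per stress contribution

coeffs, std = gp.predict(np.random.rand(5, 3), return_std=True)  # with uncertainty
```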

MIPI 2023 Challenge on RGB+ToF Depth Completion: Methods and Results

  • Authors: Qingpeng Zhu, Wenxiu Sun, Yuekun Dai, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Qianhui Sun, Chen Change Loy, Jinwei Gu, Yi Yu, Yangke Huang, Kang Zhang, Meiya Chen, Yu Wang, Yongchao Li, Hao Jiang, Amrit Kumar Muduli, Vikash Kumar, Kunal Swami, Pankaj Kumar Bajpai, Yunchao Ma, Jiajun Xiao, Zhi Ling
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13916
  • Pdf link: https://arxiv.org/pdf/2304.13916
  • Abstract
    Depth completion from RGB images and sparse Time-of-Flight (ToF) measurements is an important problem in computer vision and robotics. While traditional methods for depth completion have relied on stereo vision or structured light techniques, recent advances in deep learning have enabled more accurate and efficient completion of depth maps from RGB images and sparse ToF measurements. To evaluate the performance of different depth completion methods, we organized an RGB+sparse ToF depth completion competition. The competition aimed to encourage research in this area by providing a standardized dataset and evaluation metrics to compare the accuracy of different approaches. In this report, we present the results of the competition and analyze the strengths and weaknesses of the top-performing methods. We also discuss the implications of our findings for future research in RGB+sparse ToF depth completion. We hope that this competition and report will help to advance the state-of-the-art in this important area of research. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2023.

Proportionally Representative Clustering

  • Authors: Haris Aziz, Barton E. Lee, Sean Morota Chu
  • Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2304.13917
  • Pdf link: https://arxiv.org/pdf/2304.13917
  • Abstract
    In recent years, there has been a surge in effort to formalize notions of fairness in machine learning. We focus on clustering -- one of the fundamental tasks in unsupervised machine learning. We propose a new axiom that captures proportional representation fairness (PRF). We make a case that the concept achieves the raison d'être of several existing concepts in the literature in an arguably more convincing manner. Our fairness concept is not satisfied by existing fair clustering algorithms. We design efficient algorithms to achieve PRF both for unconstrained and discrete clustering problems.

SkinSAM: Empowering Skin Cancer Segmentation with Segment Anything Model

  • Authors: Mingzhe Hu, Yuheng Li, Xiaofeng Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13973
  • Pdf link: https://arxiv.org/pdf/2304.13973
  • Abstract
    Skin cancer is a prevalent and potentially fatal disease that requires accurate and efficient diagnosis and treatment. Although manual tracing is the current standard in clinics, automated tools are desired to reduce human labor and improve accuracy. However, developing such tools is challenging due to the highly variable appearance of skin cancers and complex objects in the background. In this paper, we present SkinSAM, a fine-tuned model based on the Segment Anything Model that showed outstanding segmentation performance. The models are validated on the HAM10000 dataset, which includes 10015 dermatoscopic images. While larger models (ViT_L, ViT_H) performed better than the smaller one (ViT_b), the finetuned model (ViT_b_finetuned) exhibited the greatest improvement, with a mean pixel accuracy of 0.945, a mean Dice score of 0.8879, and a mean IoU score of 0.7843. Among the lesion types, vascular lesions showed the best segmentation results. Our research demonstrates the great potential of adapting SAM to medical image segmentation tasks.

An FPTAS for Budgeted Laminar Matroid Independent Set

  • Authors: Ilan Doron-Arad, Ariel Kulik, Hadas Shachnai
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.13984
  • Pdf link: https://arxiv.org/pdf/2304.13984
  • Abstract
    We study the budgeted laminar matroid independent set problem. The input is a ground set, where each element has a cost and a non-negative profit, along with a laminar matroid over the elements and a budget. The goal is to select a maximum profit independent set of the matroid whose total cost is bounded by the budget. Several well known special cases, where we have, e.g., no matroid constraint (the classic knapsack problem) or a uniform matroid constraint (knapsack with a cardinality constraint), admit a fully polynomial-time approximation scheme (FPTAS). In contrast, the budgeted matroid independent set (BMI) problem with a general matroid has an efficient polynomial-time approximation scheme (EPTAS) but does not admit an FPTAS. This implies an EPTAS for our problem, which is the best known result prior to this work. We present an FPTAS for budgeted laminar matroid independent set, improving the previous EPTAS for this matroid family and generalizing the FPTAS known for knapsack with a cardinality constraint and multiple-choice knapsack. Our scheme is based on a simple dynamic program which utilizes the tree-like structure of laminar matroids.
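

For readers unfamiliar with the problem family: the scheme generalizes, among others, the classic dynamic program for knapsack with a cardinality constraint (a uniform matroid). A minimal sketch of that special case, purely as a point of reference and not the authors' algorithm:

```python
# Sketch: exact DP for knapsack with a cardinality constraint -- the
# uniform-matroid special case that the laminar-matroid FPTAS above
# generalizes. (Illustrative only; not the authors' scheme.)

def knapsack_with_cardinality(items, budget, k):
    """items: list of (cost, profit); select at most k items of total
    cost <= budget, maximizing profit. O(n * budget * k) pseudo-poly DP."""
    NEG = float("-inf")
    # best[j][c] = max profit using exactly j items with total cost c
    best = [[NEG] * (budget + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for cost, profit in items:
        # iterate j, c downwards so each item is used at most once
        for j in range(k, 0, -1):
            for c in range(budget, cost - 1, -1):
                if best[j - 1][c - cost] != NEG:
                    best[j][c] = max(best[j][c], best[j - 1][c - cost] + profit)
    return max(v for row in best for v in row)

print(knapsack_with_cardinality([(2, 3), (3, 4), (4, 5), (5, 8)], budget=7, k=2))  # -> 11
```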

Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification

  • Authors: Thanh-Tung Nguyen, Viktor Schlegel, Abhinav Kashyap, Stefan Winkler, Shao-Syuan Huang, Jie-Jyun Liu, Chih-Jen Lin
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13998
  • Pdf link: https://arxiv.org/pdf/2304.13998
  • Abstract
    Clinical notes are assigned ICD codes - sets of codes for diagnoses and procedures. In recent years, predictive machine learning models have been built for automatic ICD coding. However, there is a lack of widely accepted benchmarks for automated ICD coding models based on large-scale public EHR data. This paper proposes a public benchmark suite for ICD-10 coding using a large EHR dataset derived from MIMIC-IV, the most recent public EHR dataset. We implement and compare several popular methods for ICD coding prediction tasks to standardize data preprocessing and establish a comprehensive ICD coding benchmark dataset. This approach fosters reproducibility and model comparison, accelerating progress toward employing automated ICD coding in future studies. Furthermore, we create a new ICD-9 benchmark using MIMIC-IV data, providing more data points and a higher number of ICD codes than MIMIC-III. Our open-source code offers easy access to data processing steps, benchmark creation, and experiment replication for those with MIMIC-IV access, providing insights, guidance, and protocols to efficiently develop ICD coding models.

A Supervised Machine Learning Approach to Operator Intent Recognition for Teleoperated Mobile Robot Navigation

  • Authors: Evangelos Tsagkournis, Dimitris Panagopoulos, Giannis Petousakis, Grigoris Nikolaou, Rustam Stolkin, Manolis Chiou
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14003
  • Pdf link: https://arxiv.org/pdf/2304.14003
  • Abstract
    In applications that involve human-robot interaction (HRI), human-robot teaming (HRT), and cooperative human-machine systems, the inference of the human partner's intent is of critical importance. This paper presents a method for the inference of the human operator's navigational intent, in the context of mobile robots that provide full or partial (e.g., shared control) teleoperation. We propose the Machine Learning Operator Intent Inference (MLOII) method, which a) processes spatial data collected by the robot's sensors; b) utilizes a supervised machine learning algorithm to estimate the operator's most probable navigational goal online. The proposed method's ability to reliably and efficiently infer the intent of the human operator is experimentally evaluated in realistically simulated exploration and remote inspection scenarios. The results in terms of accuracy and uncertainty indicate that the proposed method is comparable to another state-of-the-art method found in the literature.
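
The pattern the abstract describes -- a supervised classifier turning spatial features into a distribution over candidate navigation goals -- can be sketched in a few lines. The features and model choice below are hypothetical stand-ins, not the MLOII design.

```python
# Sketch of the online-intent-inference pattern described above: a
# supervised classifier over spatial features returns a distribution over
# candidate navigation goals. Features and model choice are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
# toy training data: (distance to goal, bearing to goal, speed) -> intended goal id
X = rng.normal(size=(500, 3))
y = rng.integers(0, 3, size=500)
clf = RandomForestClassifier(n_estimators=50).fit(X, y)

stream = rng.normal(size=(5, 3))              # features arriving online
for probs in clf.predict_proba(stream):
    print(int(np.argmax(probs)), probs.round(2))   # most probable goal + confidence
```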

Diagonalization Based Parallel-in-Time Method for a Class of Fourth Order Time Dependent PDEs

  • Authors: Gobinda Garai, Bankim C. Mandal
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14021
  • Pdf link: https://arxiv.org/pdf/2304.14021
  • Abstract
    In this paper, we design, analyze and implement an efficient time parallel method for a class of fourth order time-dependent partial differential equations (PDEs), namely the biharmonic heat equation, the linearized Cahn-Hilliard (CH) equation and the nonlinear CH equation. We use a diagonalization technique on the all-at-once system to develop efficient iterative time parallel methods for investigating the solution behaviour of said equations. We present the convergence analysis of Parallel-in-Time (PinT) algorithms. We verify our findings by presenting numerical results.

Attacks on Robust Distributed Learning Schemes via Sensitivity Curve Maximization

  • Authors: Christian A. Schroth, Stefan Vlaski, Abdelhak M. Zoubir
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.14024
  • Pdf link: https://arxiv.org/pdf/2304.14024
  • Abstract
    Distributed learning paradigms, such as federated or decentralized learning, allow a collection of agents to solve global learning and optimization problems through limited local interactions. Most such strategies rely on a mixture of local adaptation and aggregation steps, either among peers or at a central fusion center. Classically, aggregation in distributed learning is based on averaging, which is statistically efficient, but susceptible to attacks by even a small number of malicious agents. This observation has motivated a number of recent works, which develop robust aggregation schemes by employing robust variations of the mean. We present a new attack based on sensitivity curve maximization (SCM), and demonstrate that it is able to disrupt existing robust aggregation schemes by injecting small, but effective perturbations.
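
The sensitivity curve itself is straightforward to compute empirically. A minimal sketch for the mean and the median (the diagnostic the attack maximizes, not the authors' attack code):

```python
# Sketch: empirical sensitivity curve SC(x) = n * (T(z_1..z_{n-1}, x) - T(z_1..z_{n-1}))
# for the mean vs. the median. The attack described above searches for the
# contamination x that maximizes this curve; this is the diagnostic only,
# not the authors' attack implementation.
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=99)                      # clean agent updates (1-D toy)

def sensitivity_curve(estimator, z, xs):
    base = estimator(z)
    n = len(z) + 1
    return np.array([n * (estimator(np.append(z, x)) - base) for x in xs])

xs = np.linspace(-10, 10, 201)
sc_mean = sensitivity_curve(np.mean, z, xs)      # unbounded: grows linearly in x
sc_median = sensitivity_curve(np.median, z, xs)  # bounded, but has a maximizer
print(xs[np.argmax(np.abs(sc_median))], sc_median.max())
```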

COSST: Multi-organ Segmentation with Partially Labeled Datasets Using Comprehensive Supervisions and Self-training

  • Authors: Han Liu, Zhoubing Xu, Riqiang Gao, Hao Li, Jianing Wang, Guillaume Chabin, Ipek Oguz, Sasa Grbic
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14030
  • Pdf link: https://arxiv.org/pdf/2304.14030
  • Abstract
    Deep learning models have demonstrated remarkable success in multi-organ segmentation but typically require large-scale datasets with all organs of interest annotated. However, medical image datasets are often low in sample size and only partially labeled, i.e., only a subset of organs are annotated. Therefore, it is crucial to investigate how to learn a unified model on the available partially labeled datasets to leverage their synergistic potential. In this paper, we empirically and systematically study the partial-label segmentation with in-depth analyses on the existing approaches and identify three distinct types of supervision signals, including two signals derived from ground truth and one from pseudo label. We propose a novel training framework termed COSST, which effectively and efficiently integrates comprehensive supervision signals with self-training. Concretely, we first train an initial unified model using two ground truth-based signals and then iteratively incorporate the pseudo label signal to the initial model using self-training. To mitigate performance degradation caused by unreliable pseudo labels, we assess the reliability of pseudo labels via outlier detection in latent space and exclude the most unreliable pseudo labels from each self-training iteration. Extensive experiments are conducted on six CT datasets for three partial-label segmentation tasks. Experimental results show that our proposed COSST achieves significant improvement over the baseline method, i.e., individual networks trained on each partially labeled dataset. Compared to the state-of-the-art partial-label segmentation methods, COSST demonstrates consistent superior performance on various segmentation tasks and with different training data size.

A Parameterized Theory of PAC Learning

  • Authors: Cornelius Brand, Robert Ganian, Kirill Simonov
  • Subjects: Computational Complexity (cs.CC); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14058
  • Pdf link: https://arxiv.org/pdf/2304.14058
  • Abstract
    Probably Approximately Correct (i.e., PAC) learning is a core concept of sample complexity theory, and efficient PAC learnability is often seen as a natural counterpart to the class P in classical computational complexity. But while the nascent theory of parameterized complexity has allowed us to push beyond the P-NP "dichotomy" in classical computational complexity and identify the exact boundaries of tractability for numerous problems, there is no analogue in the domain of sample complexity that could push beyond efficient PAC learnability. As our core contribution, we fill this gap by developing a theory of parameterized PAC learning which allows us to shed new light on several recent PAC learning results that incorporated elements of parameterized complexity. Within the theory, we identify not one but two notions of fixed-parameter learnability that both form distinct counterparts to the class FPT -- the core concept at the center of the parameterized complexity paradigm -- and develop the machinery required to exclude fixed-parameter learnability. We then showcase the applications of this theory to identify refined boundaries of tractability for CNF and DNF learning as well as for a range of learning problems on graphs.

Fourier-Gegenbauer Pseudospectral Method for Solving Time-Dependent One-Dimensional Fractional Partial Differential Equations with Variable Coefficients and Periodic Solutions

  • Authors: Kareem T. Elgindy
  • Subjects: Numerical Analysis (math.NA); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.14061
  • Pdf link: https://arxiv.org/pdf/2304.14061
  • Abstract
    In this paper, we present a novel pseudospectral (PS) method for solving a new class of initial-value problems (IVPs) of time-dependent one-dimensional fractional partial differential equations (FPDEs) with variable coefficients and periodic solutions. A main ingredient of our work is the use of the recently developed periodic RL/Caputo fractional derivative (FD) operators with sliding positive fixed memory length of Bourafa et al. [1] or their reduced forms obtained by Elgindy [2] as the natural FD operators to accurately model FPDEs with periodic solutions. The proposed method converts the IVP into a well-conditioned linear system of equations using the PS method based on Fourier collocations and Gegenbauer quadratures. The reduced linear system has a simple special structure and can be solved accurately and rapidly by using standard linear system solvers. A rigorous study of the error and convergence of the proposed method is presented. The idea and results presented in this paper are expected to be useful in the future to address more general problems involving FPDEs with periodic solutions.

Lightweight, Pre-trained Transformers for Remote Sensing Timeseries

  • Authors: Gabriel Tseng, Ivan Zvonkov, Mirali Purohit, David Rolnick, Hannah Kerner
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14065
  • Pdf link: https://arxiv.org/pdf/2304.14065
  • Abstract
    Machine learning algorithms for parsing remote sensing data have a wide range of societally relevant applications, but labels used to train these algorithms can be difficult or impossible to acquire. This challenge has spurred research into self-supervised learning for remote sensing data aiming to unlock the use of machine learning in geographies or application domains where labelled datasets are small. Current self-supervised learning approaches for remote sensing data draw significant inspiration from techniques applied to natural images. However, remote sensing data has important differences from natural images -- for example, the temporal dimension is critical for many tasks and data is collected from many complementary sensors. We show that designing models and self-supervised training techniques specifically for remote sensing data results in both smaller and more performant models. We introduce the Pretrained Remote Sensing Transformer (Presto), a transformer-based model pre-trained on remote sensing pixel-timeseries data. Presto excels at a wide variety of globally distributed remote sensing tasks and outperforms much larger models. Presto can be used for transfer learning or as a feature extractor for simple models, enabling efficient deployment at scale.

Linear and Nonlinear Parareal Methods for the Cahn-Hilliard Equation

  • Authors: Gobinda Garai, Bankim C. Mandal
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14074
  • Pdf link: https://arxiv.org/pdf/2304.14074
  • Abstract
    In this paper, we propose, analyze and implement efficient time parallel methods for the Cahn-Hilliard (CH) equation. It is of great importance to develop efficient numerical methods for the CH equation, given the wide range of applicability the CH equation has. The CH equation generally needs to be simulated for a very long time to get the solution of the phase coarsening stage. Therefore it is desirable to accelerate the computation using a parallel method in time. We present linear and nonlinear Parareal methods for the CH equation depending on the choice of fine approximation. We illustrate our results by numerical experiments.
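
For context, the Parareal iteration alternates a cheap coarse propagator with an expensive fine propagator that can run in parallel across time slices. A minimal sketch on a scalar ODE, illustrative of the method class rather than the authors' CH solver:

```python
# Minimal Parareal sketch on the scalar ODE u' = -u (illustrative of the
# method class discussed above, not the authors' Cahn-Hilliard solver).
import numpy as np

def coarse(u, t0, t1):          # one backward-Euler step (cheap propagator G)
    return u / (1.0 + (t1 - t0))

def fine(u, t0, t1, m=100):     # m backward-Euler substeps (expensive propagator F)
    dt = (t1 - t0) / m
    for _ in range(m):
        u = u / (1.0 + dt)
    return u

T, N, K = 2.0, 10, 5            # horizon, time slices, Parareal iterations
ts = np.linspace(0.0, T, N + 1)
U = np.zeros(N + 1); U[0] = 1.0
for n in range(N):              # initial coarse sweep
    U[n + 1] = coarse(U[n], ts[n], ts[n + 1])
for k in range(K):              # correction: U_{n+1} = G(U_n^new) + F(U_n^old) - G(U_n^old)
    F = [fine(U[n], ts[n], ts[n + 1]) for n in range(N)]      # parallelizable
    G_old = [coarse(U[n], ts[n], ts[n + 1]) for n in range(N)]
    for n in range(N):
        U[n + 1] = coarse(U[n], ts[n], ts[n + 1]) + F[n] - G_old[n]
print(U[-1], np.exp(-T))        # converges toward the fine solution ~ e^{-T}
```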

Lowering the Entry Bar to HPC-Scale Uncertainty Quantification

  • Authors: Linus Seelinger, Anne Reinarz, Jean Benezech, Mikkel Bue Lykkegaard, Lorenzo Tamellini, Robert Scheichl
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14087
  • Pdf link: https://arxiv.org/pdf/2304.14087
  • Abstract
    Treating uncertainties in models is essential in many fields of science and engineering. Uncertainty quantification (UQ) on complex and computationally costly numerical models necessitates a combination of efficient model solvers, advanced UQ methods and HPC-scale resources. The resulting technical complexities, as well as the lack of separation of concerns between UQ and model experts, are holding back many interesting UQ applications. The aim of this paper is to close the gap between advanced UQ methods and advanced models by removing the hurdle of complex software stack integration, which in turn will offer a straightforward way to scale even prototype-grade UQ applications to high-performance resources. We achieve this goal by introducing a parallel software architecture based on UM-Bridge, a universal interface for linking UQ and models. We present three realistic applications from different areas of science and engineering, scaling from single machines to large clusters on the Google Cloud Platform.

Securing Autonomous Air Traffic Management: Blockchain Networks Driven by Explainable AI

  • Authors: Louise Axon, Dimitrios Panagiotakopoulos, Samuel Ayo, Carolina Sanchez-Hernandez, Yan Zong, Simon Brown, Lei Zhang, Michael Goldsmith, Sadie Creese, Weisi Guo
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.14095
  • Pdf link: https://arxiv.org/pdf/2304.14095
  • Abstract
    Air Traffic Management data systems today are inefficient and not scalable to enable future unmanned systems. Current data is fragmented, siloed, and not easily accessible. There is data conflict, misuse, and eroding levels of trust in provenance and accuracy. With increased autonomy in aviation, Artificially Intelligent (AI) enabled unmanned traffic management (UTM) will be more reliant on secure data from diverse stakeholders. There is an urgent need to develop a secure network that has trustworthy data chains and works with the requirements generated by UTM. Here, we review existing research in 3 key interconnected areas: (1) blockchain development for secure data transfer between competing aviation stakeholders, (2) self-learning networking architectures that distribute consensus to achieve secure air traffic control, (3) explainable AI to build trust with human stakeholders and backpropagate requirements for blockchain and network optimisation. When connected together, this new digital ecosystem blueprint is tailored for safety critical UTM sectors. We motivate the readers with a case study, where a federated learning UTM using real air traffic and weather data is secured and explained to human operators. This emerging area still requires significant research and development by the community to ensure it can enable future autonomous air mobility.

Learning Neural PDE Solvers with Parameter-Guided Channel Attention

  • Authors: Makoto Takamoto, Francesco Alesiani, Mathias Niepert
  • Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn); Geophysics (physics.geo-ph)
  • Arxiv link: https://arxiv.org/abs/2304.14118
  • Pdf link: https://arxiv.org/pdf/2304.14118
  • Abstract
    Scientific Machine Learning (SciML) is concerned with the development of learned emulators of physical systems governed by partial differential equations (PDE). In application domains such as weather forecasting, molecular dynamics, and inverse design, ML-based surrogate models are increasingly used to augment or replace inefficient and often non-differentiable numerical simulation algorithms. While a number of ML-based methods for approximating the solutions of PDEs have been proposed in recent years, they typically do not adapt to the parameters of the PDEs, making it difficult to generalize to PDE parameters not seen during training. We propose a Channel Attention mechanism guided by PDE Parameter Embeddings (CAPE) component for neural surrogate models and a simple yet effective curriculum learning strategy. The CAPE module can be combined with neural PDE solvers allowing them to adapt to unseen PDE parameters. The curriculum learning strategy provides a seamless transition between teacher-forcing and fully auto-regressive training. We compare CAPE in conjunction with the curriculum learning strategy using a popular PDE benchmark and obtain consistent and significant improvements over the baseline models. The experiments also show several advantages of CAPE, such as its increased ability to generalize to unseen PDE parameters without large increases in inference time and parameter count.
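
A parameter-guided channel attention of the kind described can be sketched compactly. The shapes and layer sizes below are assumptions for illustration; the authors' CAPE module may be structured differently.

```python
# Minimal sketch of PDE-parameter-guided channel attention in PyTorch.
# Shapes and layer sizes are hypothetical; the authors' CAPE module may
# be structured differently.
import torch
import torch.nn as nn

class ParamChannelAttention(nn.Module):
    def __init__(self, n_params: int, channels: int):
        super().__init__()
        # embed the PDE parameters, then map them to per-channel gates
        self.embed = nn.Sequential(
            nn.Linear(n_params, channels), nn.GELU(), nn.Linear(channels, channels)
        )

    def forward(self, feats: torch.Tensor, pde_params: torch.Tensor) -> torch.Tensor:
        # feats: (batch, channels, nx), pde_params: (batch, n_params)
        gate = torch.sigmoid(self.embed(pde_params)).unsqueeze(-1)  # (B, C, 1)
        return feats * gate   # reweight solver channels by the PDE parameters

feats = torch.randn(4, 32, 128)   # features from a neural PDE solver layer
params = torch.randn(4, 2)        # e.g., diffusion and advection coefficients
print(ParamChannelAttention(2, 32)(feats, params).shape)  # torch.Size([4, 32, 128])
```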

Exploiting Inductive Bias in Transformer for Point Cloud Classification and Segmentation

  • Authors: Zihao Li, Pan Gao, Hui Yuan, Ran Wei, Manoranjan Paul
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14124
  • Pdf link: https://arxiv.org/pdf/2304.14124
  • Abstract
    Discovering inter-point connections for efficient high-dimensional feature extraction from point coordinates is a key challenge in point cloud processing. Most existing methods focus on designing efficient local feature extractors while ignoring global connection, or vice versa. In this paper, we design a new Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point relations, which considers both local and global attentions. Specifically, considering local spatial coherence, local feature learning is performed through Relative Position Encoding and Attentive Feature Pooling. We incorporate the learned locality into the Transformer module. The local feature affects the value component in the Transformer to modulate the relationship between channels of each point, which can enhance the self-attention mechanism with locality-based channel interaction. We demonstrate its superiority experimentally on classification and segmentation tasks. The code is available at: https://github.com/jiamang/IBT

Human Semantic Segmentation using Millimeter-Wave Radar Sparse Point Clouds

  • Authors: Pengfei Song, Luoyu MEI, Han Cheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); General Topology (math.GN)
  • Arxiv link: https://arxiv.org/abs/2304.14132
  • Pdf link: https://arxiv.org/pdf/2304.14132
  • Abstract
    This paper presents a framework for semantic segmentation on sparse sequential point clouds of millimeter-wave radar. Compared with cameras and lidars, millimeter-wave radars have the advantage of not revealing privacy, having a strong anti-interference ability, and having long detection distance. However, the sparsity of mmWave data and the difficulty of capturing its temporal-topological coupling features prevent previous advanced segmentation methods (e.g., PointNet, PointCNN, Point Transformer) from being well utilized in practical scenarios. To address the challenges caused by the sparsity and temporal-topological features of the data, we (i) introduce graph structure and topological features to the point cloud, (ii) propose a semantic segmentation framework including a global feature-extracting module and a sequential feature-extracting module. In addition, we design an efficient and more fitting loss function for a better training process and segmentation results based on graph clustering. Experimentally, we deploy representative semantic segmentation algorithms (Transformer, GCNN, etc.) on a custom dataset. Experimental results indicate that our model achieves a mean accuracy of 82.31% on the custom dataset and outperforms the state-of-the-art algorithms. Moreover, to validate the model's robustness, we deploy our model on the well-known S3DIS dataset. On the S3DIS dataset, our model achieves a mean accuracy of 92.6%, outperforming baseline algorithms.

Multiplicity Problems on Algebraic Series and Context-Free Grammars

  • Authors: Nikhil Balaji, Lorenzo Clemente, Klara Nosan, Mahsa Shirmohammadi, James Worrell
  • Subjects: Formal Languages and Automata Theory (cs.FL); Computational Complexity (cs.CC)
  • Arxiv link: https://arxiv.org/abs/2304.14145
  • Pdf link: https://arxiv.org/pdf/2304.14145
  • Abstract
    In this paper we obtain complexity bounds for computational problems on algebraic power series over several commuting variables. The power series are specified by systems of polynomial equations: a formalism closely related to weighted context-free grammars. We focus on three problems -- decide whether a given algebraic series is identically zero, determine whether all but finitely many coefficients are zero, and compute the coefficient of a specific monomial. We relate these questions to well-known computational problems on arithmetic circuits and thereby show that all three problems lie in the counting hierarchy. Our main result improves the best known complexity bound on deciding zeroness of an algebraic series. This problem is known to lie in PSPACE by reduction to the decision problem for the existential fragment of the theory of real closed fields. Here we show that the problem lies in the counting hierarchy by reduction to the problem of computing the degree of a polynomial given by an arithmetic circuit. As a corollary we obtain new complexity bounds on multiplicity equivalence of context-free grammars restricted to a bounded language, language inclusion of a nondeterministic finite automaton in an unambiguous context-free grammar, and language inclusion of a non-deterministic context-free grammar in an unambiguous finite automaton.

Tractability of sampling recovery on unweighted function classes

  • Authors: David Krieg
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14169
  • Pdf link: https://arxiv.org/pdf/2304.14169
  • Abstract
    It is well-known that the problem of sampling recovery in the $L_2$-norm on unweighted Korobov spaces (Sobolev spaces with mixed smoothness) as well as classical smoothness classes such as Hölder classes suffers from the curse of dimensionality. We show that the problem is tractable for those classes if they are intersected with the Wiener algebra of functions with summable Fourier coefficients. In fact, this is a relatively simple implication of powerful results by Rauhut and Ward [Appl. Comput. Harmon. Anal. 40 (2016), pp. 321--351]. Tractability is achieved by the use of non-linear algorithms, while linear algorithms cannot do the job.

The Mutual Information In The Vicinity of Capacity-Achieving Input Distributions

  • Authors: Hao-Chung Cheng, Barış Nakiboğlu
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2304.14219
  • Pdf link: https://arxiv.org/pdf/2304.14219
  • Abstract
    The mutual information is analyzed as a function of the input distribution using an identity due to Topsøe for channels with (possibly multiple) linear cost constraints and finite input and output sets. The mutual information is bounded above by a function decreasing quadratically with the distance to the set of all capacity-achieving input distributions for the case when the distance is less than a certain threshold. The closed-form expressions for the threshold and the coefficient of the quadratic decrease are derived. A counter-example demonstrating the non-existence of such a quadratic bound in the case of infinitely many linear cost constraints is provided. Implications of these observations for the channel coding problem and applications of the proof technique to related problems are discussed.
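
Schematically, and in our own notation rather than the paper's exact statement, the bound has the following shape:

```latex
% Schematic shape of the bound (our notation, not the paper's exact statement):
% for input distributions p whose distance to the set \Pi of
% capacity-achieving distributions is below the threshold \tau,
I(p) \;\le\; C - K \, d(p, \Pi)^2, \qquad d(p, \Pi) < \tau,
% where C is the channel capacity and K, \tau > 0 are the closed-form
% coefficient and threshold derived in the paper.
```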

Developing Distributed High-performance Computing Capabilities of an Open Science Platform for Robust Epidemic Analysis

  • Authors: Nicholson Collier, Justin M. Wozniak, Abby Stevens, Yadu Babuji, Mickaël Binois, Ardindam Fadikar, Alexandra Würth, Kyle Chard, Jonathan Ozik
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.14244
  • Pdf link: https://arxiv.org/pdf/2304.14244
  • Abstract
    COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among domain experts, mathematical modelers, and scientific computing specialists. Computationally, however, it also revealed critical gaps in the ability of researchers to exploit advanced computing systems. These challenging areas include gaining access to scalable computing systems, porting models and workflows to new systems, sharing data of varying sizes, and producing results that can be reproduced and validated by others. Informed by our team's work in supporting public health decision makers during the COVID-19 pandemic and by the identified capability gaps in applying high-performance computing (HPC) to the modeling of complex social systems, we present the goals, requirements, and initial implementation of OSPREY, an open science platform for robust epidemic analysis. The prototype implementation demonstrates an integrated, algorithm-driven HPC workflow architecture, coordinating tasks across federated HPC resources, with robust, secure and automated access to each of the resources. We demonstrate scalable and fault-tolerant task execution, an asynchronous API to support fast time-to-solution algorithms, an inclusive, multi-language approach, and efficient wide-area data management. The example OSPREY code is made available on a public repository.

Evaluating the Impact of Pair Documentation on Requirements Quality and Team Productivity

  • Authors: Nosheen Qamar, Nosheen Sabahat, Amir Mashmool, Amir Mosavi
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.14255
  • Pdf link: https://arxiv.org/pdf/2304.14255
  • Abstract
    The most important deliverable of the requirements engineering process is the software requirements specification (SRS) document. Requirements documentation is important during the complete software development lifecycle to share the vision and enable effective communication between major stakeholders. The Standish Group reported that the top factors behind project failures are related to requirements. By giving the right level of attention to key requirements, good quality software can be produced. Therefore, more research is needed in this area and this study is trying to fill this gap. This empirical study aims to examine the impact of pair documentation -- an unconventional approach in which two persons work collaboratively on the same requirements document, just like pair programming -- on requirements quality and team productivity. Twenty pairs of documentation writers worked in two groups: one group using pair documentation (the experimental group) and the other using conventional documentation (the control group). The resulting requirements documents, produced by both groups for the same project, were then compared. It is observed that there is a significant improvement in the quality and productivity of the experimental group using pair documentation. The findings of this study may assist requirement engineers in forming efficient teams that can create high-quality SRS documents.

A Survey on Approximate Edge AI for Energy Efficient Autonomous Driving Services

  • Authors: Dewant Katare, Diego Perino, Jari Nurmi, Martijn Warnier, Marijn Janssen, Aaron Yi Ding
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14271
  • Pdf link: https://arxiv.org/pdf/2304.14271
  • Abstract
    Autonomous driving services rely heavily on sensors such as cameras, LiDAR, radar, and communication modules. A common practice of processing the sensed data is using a high-performance computing unit placed inside the vehicle, which deploys AI models and algorithms to act as the brain or administrator of the vehicle. The vehicular data generated from average hours of driving can be up to 20 Terabytes depending on the data rate and specification of the sensors. Given the scale and fast growth of services for autonomous driving, it is essential to improve the overall energy and environmental efficiency, especially in the trend towards vehicular electrification (e.g., battery-powered). Although the areas have seen significant advancements in sensor technologies, wireless communications, computing and AI/ML algorithms, the challenge still exists in how to apply and integrate those technology innovations to achieve energy efficiency. This survey reviews and compares the connected vehicular applications, vehicular communications, approximation and Edge AI techniques. The focus is on energy efficiency by covering newly proposed approximation and enabling frameworks. To the best of our knowledge, this survey is the first to review the latest approximate Edge AI frameworks and publicly available datasets in energy-efficient autonomous driving. The insights and vision from this survey can be beneficial for the collaborative driving service development on low-power and memory-constrained systems and also for the energy optimization of autonomous vehicles.

On Solution Discovery via Reconfiguration

  • Authors: Michael R. Fellows, Mario Grobler, Nicole Megow, Amer E. Mouawad, Vijayaragunathan Ramamoorthi, Frances A. Rosamond, Daniel Schmand, Sebastian Siebertz
  • Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.14295
  • Pdf link: https://arxiv.org/pdf/2304.14295
  • Abstract
    The dynamics of real-world applications and systems require efficient methods for improving infeasible solutions or restoring corrupted ones by making modifications to the current state of a system in a restricted way. We propose a new framework of solution discovery via reconfiguration for constructing a feasible solution for a given problem by executing a sequence of small modifications starting from a given state. Our framework integrates and formalizes different aspects of classical local search, reoptimization, and combinatorial reconfiguration. We exemplify our framework on a multitude of fundamental combinatorial problems, namely Vertex Cover, Independent Set, Dominating Set, and Coloring. We study the classical as well as the parameterized complexity of the solution discovery variants of those problems and explore the boundary between tractable and intractable instances.

Incremental Generalized Category Discovery

  • Authors: Bingchen Zhao, Oisin Mac Aodha
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14310
  • Pdf link: https://arxiv.org/pdf/2304.14310
  • Abstract
    We explore the problem of Incremental Generalized Category Discovery (IGCD). This is a challenging category incremental learning setting where the goal is to develop models that can correctly categorize images from previously seen categories, in addition to discovering novel ones. Learning is performed over a series of time steps where the model obtains new labeled and unlabeled data, and discards old data, at each iteration. The difficulty of the problem is compounded in our generalized setting as the unlabeled data can contain images from categories that may or may not have been observed before. We present a new method for IGCD which combines non-parametric categorization with efficient image sampling to mitigate catastrophic forgetting. To quantify performance, we propose a new benchmark dataset named iNatIGCD that is motivated by a real-world fine-grained visual categorization task. In our experiments we outperform existing related methods.

Empirical Individual State Observability

  • Authors: Benjamin Cellini, Burak Boyacıoğlu, Floris van Breugel
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.14313
  • Pdf link: https://arxiv.org/pdf/2304.14313
  • Abstract
    A dynamical system is observable if there is a one-to-one mapping from the system's measured outputs and inputs to all of the system's states. Analytical and empirical tools exist for quantifying the (full state) observability of linear and nonlinear systems; however, empirical tools for evaluating the observability of individual state variables are lacking. Here, a new empirical approach termed Empirical Individual State Observability (E-ISO) is developed to quantify the level of observability of individual state variables. E-ISO first builds an empirical observability matrix via simulation, then applies convex optimization to efficiently determine the subset of its rows required to estimate each state variable individually. Finally, (un)observability measures for these subsets are calculated to provide independent estimates of the observability of each state variable. Multiple example applications of E-ISO on linear and nonlinear systems are shown to be consistent with analytical results. Broadly, E-ISO will be an invaluable tool both for designing active sensing control laws or optimizing sensor placement to increase the observability of individual state variables for engineered systems, and analyzing the trajectory decisions made by organisms.
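
The first stage, building an empirical observability matrix from perturbed simulations, can be sketched as below; the convex row-selection step that makes E-ISO per-state is not reproduced here, and the toy system is our own.

```python
# Sketch of an empirical observability matrix via perturbed simulations --
# the kind of matrix E-ISO builds before its convex row-selection step,
# which is not reproduced here. System and perturbation size are toy choices.
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # toy linear system x+ = A x
C = np.array([[1.0, 0.0]])               # only the first state is measured

def output_traj(x0, steps=20):
    x, ys = x0.copy(), []
    for _ in range(steps):
        ys.append(C @ x)
        x = A @ x
    return np.concatenate(ys)

eps, n = 1e-4, 2
x0 = np.zeros(n)
cols = []
for i in range(n):                        # finite-difference output sensitivities
    e = np.zeros(n); e[i] = eps
    cols.append((output_traj(x0 + e) - output_traj(x0 - e)) / (2 * eps))
O = np.stack(cols, axis=1)                # empirical observability matrix
print(np.linalg.matrix_rank(O))           # rank 2 -> both states observable here
```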

SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection

  • Authors: Yichen Xie, Chenfeng Xu, Marie-Julie Rakotosaona, Patrick Rim, Federico Tombari, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14340
  • Pdf link: https://arxiv.org/pdf/2304.14340
  • Abstract
    By identifying four important components of existing LiDAR-camera 3D object detection methods (LiDAR and camera candidates, transformation, and fusion outputs), we observe that all existing methods either find dense candidates or yield dense representations of scenes. However, given that objects occupy only a small part of a scene, finding dense candidates and generating dense representations is noisy and inefficient. We propose SparseFusion, a novel multi-sensor 3D detection method that exclusively uses sparse candidates and sparse representations. Specifically, SparseFusion utilizes the outputs of parallel detectors in the LiDAR and camera modalities as sparse candidates for fusion. We transform the camera candidates into the LiDAR coordinate space by disentangling the object representations. Then, we can fuse the multi-modality candidates in a unified 3D space by a lightweight self-attention module. To mitigate negative transfer between modalities, we propose novel semantic and geometric cross-modality transfer modules that are applied prior to the modality-specific detectors. SparseFusion achieves state-of-the-art performance on the nuScenes benchmark while also running at the fastest speed, even outperforming methods with stronger backbones. We perform extensive experiments to demonstrate the effectiveness and efficiency of our modules and overall method pipeline. Our code will be made publicly available at https://github.com/yichen928/SparseFusion.
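
The fusion step -- a lightweight self-attention over the union of sparse per-modality candidates -- can be sketched as follows. The dimensions and single-layer design are assumptions, not SparseFusion's exact module.

```python
# Sketch of fusing sparse per-modality candidates with a lightweight
# self-attention block, as the pipeline above describes. Dimensions and the
# single-layer design are assumptions, not SparseFusion's exact module.
import torch
import torch.nn as nn

d = 128
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)

lidar_cands = torch.randn(2, 30, d)    # (batch, candidates, feat) from LiDAR head
cam_cands = torch.randn(2, 30, d)      # camera candidates already in LiDAR frame

tokens = torch.cat([lidar_cands, cam_cands], dim=1)   # unified sparse token set
fused, _ = attn(tokens, tokens, tokens)               # cross-candidate interaction
print(fused.shape)                                    # torch.Size([2, 60, 128])
```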

$π$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation

  • Authors: Chengyue Wu, Teng Wang, Yixiao Ge, Zeyu Lu, Ruisong Zhou, Ping Luo, Ying Shan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14381
  • Pdf link: https://arxiv.org/pdf/2304.14381
  • Abstract
    Foundation models have achieved great advances in multi-task learning with a unified interface of unimodal and multimodal tasks. However, the potential of such multi-task learners has not been exploited during transfer learning. In this work, we present a universal parameter-efficient transfer learning method, termed Predict-Interpolate Tuning ($\pi$-Tuning), for vision, language, and vision-language tasks. It aggregates the parameters of lightweight task-specific experts learned from similar tasks to aid the target downstream task. The task similarities are predicted in a unified modality-independent space, yielding a scalable graph to demonstrate task relationships. $\pi$-Tuning has several appealing benefits. First, it flexibly explores both intra- and inter-modal transferability between similar tasks to improve the accuracy and robustness of transfer learning, especially in data-scarce scenarios. Second, it offers a systematical solution for transfer learning with multi-task prediction-and-then-interpolation, compatible with diverse types of parameter-efficient experts, such as prompt and adapter. Third, an extensive study of task-level mutual benefits on 14 unimodal and 6 multimodal datasets shows that $\pi$-Tuning surpasses fine-tuning and other parameter-efficient transfer learning methods both in full-shot and low-shot regimes. The task graph also enables an in-depth interpretable analysis of task transferability across modalities.

Dynamic Pricing and Learning with Bayesian Persuasion

  • Authors: Shipra Agrawal, Yiding Feng, Wei Tang
  • Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14385
  • Pdf link: https://arxiv.org/pdf/2304.14385
  • Abstract
    We consider a novel dynamic pricing and learning setting where in addition to setting prices of products in sequential rounds, the seller also ex-ante commits to 'advertising schemes'. That is, at the beginning of each round the seller can decide what kind of signal they will provide to the buyer about the product's quality upon realization. Using the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses, we formulate the problem of finding an optimal design of the advertising scheme along with a pricing scheme that maximizes the seller's expected revenue. Without any a priori knowledge of the buyers' demand function, our goal is to design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy. We study the regret of the algorithm when compared to the optimal clairvoyant price and advertising scheme. Our main result is a computationally efficient online algorithm that achieves an $O(T^{2/3}(m\log T)^{1/3})$ regret bound when the valuation function is linear in the product quality. Here $m$ is the cardinality of the discrete product quality domain and $T$ is the time horizon. This result requires some natural monotonicity and Lipschitz assumptions on the valuation function, but no Lipschitz or smoothness assumption on the buyers' demand function. For constant $m$, our result matches the regret lower bound for dynamic pricing within logarithmic factors, which is a special case of our problem. We also obtain several improved results for the widely considered special case of additive valuations, including an $\tilde{O}(T^{2/3})$ regret bound independent of $m$ when $m\le T^{1/3}$.

string2string: A Modern Python Library for String-to-String Algorithms

  • Authors: Mirac Suzgun, Stuart M. Shieber, Dan Jurafsky
  • Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL)
  • Arxiv link: https://arxiv.org/abs/2304.14395
  • Pdf link: https://arxiv.org/pdf/2304.14395
  • Abstract
    We introduce string2string, an open-source library that offers a comprehensive suite of efficient algorithms for a broad range of string-to-string problems. It includes traditional algorithmic solutions as well as recent advanced neural approaches to tackle various problems in string alignment, distance measurement, lexical and semantic search, and similarity analysis -- along with several helpful visualization tools and metrics to facilitate the interpretation and analysis of these methods. Notable algorithms featured in the library include the Smith-Waterman algorithm for pairwise local alignment, the Hirschberg algorithm for global alignment, the Wagner-Fischer algorithm for edit distance, BARTScore and BERTScore for similarity analysis, the Knuth-Morris-Pratt algorithm for lexical search, and Faiss for semantic search. In addition, it wraps existing efficient and widely-used implementations of certain frameworks and metrics, such as sacreBLEU and ROUGE, whenever it is appropriate and suitable. Overall, the library aims to provide extensive coverage and increased flexibility in comparison to existing libraries for strings. It can be used for many downstream applications, tasks, and problems in natural-language processing, bioinformatics, and computational social sciences. It is implemented in Python, easily installable via pip, and accessible through a simple API. Source code, documentation, and tutorials are all available on our GitHub page: https://github.com/stanfordnlp/string2string.
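
Rather than guess at the library's API, here is a self-contained sketch of the Wagner-Fischer dynamic program it lists; see the project's GitHub page for string2string's actual interface.

```python
# Self-contained sketch of the Wagner-Fischer edit-distance DP that the
# library includes (this is the textbook algorithm, not string2string's
# actual API -- see the project's GitHub page for that).

def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                 # delete ca
                curr[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),    # substitute (or match for free)
            ))
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))   # -> 3
```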

Maximizing Model Generalization for Manufacturing with Self-Supervised Learning and Federated Learning

  • Authors: Matthew Russell, Peng Wang
  • Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.14398
  • Pdf link: https://arxiv.org/pdf/2304.14398
  • Abstract
    Deep Learning (DL) can diagnose faults and assess machine health from raw condition monitoring data without manually designed statistical features. However, practical manufacturing applications remain extremely difficult for existing DL methods. Machine data is often unlabeled and from very few health conditions (e.g., only normal operating data). Furthermore, models often encounter shifts in domain as process parameters change and new categories of faults emerge. Traditional supervised learning may struggle to learn compact, discriminative representations that generalize to these unseen target domains since it depends on having plentiful classes to partition the feature space with decision boundaries. Transfer Learning (TL) with domain adaptation attempts to adapt these models to unlabeled target domains but assumes similar underlying structure that may not be present if new faults emerge. This study proposes focusing on maximizing the feature generality on the source domain and applying TL via weight transfer to copy the model to the target domain. Specifically, Self-Supervised Learning (SSL) with Barlow Twins may produce more discriminative features for monitoring health condition than supervised learning by focusing on semantic properties of the data. Furthermore, Federated Learning (FL) for distributed training may also improve generalization by efficiently expanding the effective size and diversity of training data by sharing information across multiple client machines. Results show that Barlow Twins outperforms supervised learning in an unlabeled target domain with emerging motor faults when the source training data contains very few distinct categories. Incorporating FL may also provide a slight advantage by diffusing knowledge of health conditions between machines.
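
The Barlow Twins objective mentioned above is compact enough to sketch directly; the off-diagonal weight below is a hypothetical choice, and this is not the authors' training code.

```python
# Minimal sketch of the Barlow Twins objective: decorrelate the embeddings
# of two augmented views of the same signals. lambda_off is a hypothetical
# setting; this is not the authors' training code.
import torch

def barlow_twins_loss(z1: torch.Tensor, z2: torch.Tensor, lambda_off: float = 5e-3):
    # z1, z2: (batch, dim) embeddings of two views of the same signals
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / z1.shape[0]                    # cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()   # push diagonal toward 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # push rest to 0
    return on_diag + lambda_off * off_diag

z1, z2 = torch.randn(64, 128), torch.randn(64, 128)
print(barlow_twins_loss(z1, z2))
```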

Keyword: faster

Physics-informed neural networks for predicting gas flow dynamics and unknown parameters in diesel engines

  • Authors: Kamaljyoti Nath, Xuhui Meng, Daniel J Smith, George Em Karniadakis
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13799
  • Pdf link: https://arxiv.org/pdf/2304.13799
  • Abstract
    This paper presents a physics-informed neural network (PINN) approach for monitoring the health of diesel engines. The aim is to evaluate the engine dynamics, identify unknown parameters in a "mean value" model, and anticipate maintenance requirements. The PINN model is applied to diesel engines with a variable-geometry turbocharger and exhaust gas recirculation, using measurement data of selected state variables. The results demonstrate the ability of the PINN model to predict simultaneously both unknown parameters and dynamics accurately with both clean and noisy data, and the importance of the self-adaptive weight in the loss function for faster convergence. The input data for these simulations are derived from actual engine running conditions, while the outputs are simulated data, making this a practical case study of PINN's ability to predict real-world dynamical systems. The mean value model of the diesel engine incorporates empirical formulae to represent certain states, but these formulae may not be generalizable to other engines. To address this, the study considers the use of deep neural networks (DNNs) in addition to the PINN model. The DNNs are trained using laboratory test data and are used to model the engine-specific empirical formulae in the mean value model, allowing for a more flexible and adaptive representation of the engine's states. In other words, the mean value model uses both the PINN model and the DNNs to represent the engine's states, with the PINN providing a physics-based understanding of the engine's overall dynamics and the DNNs offering a more engine-specific and adaptive representation of the empirical formulae. By combining these two approaches, the study aims to offer a comprehensive and versatile approach to monitoring the health and performance of diesel engines.
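
One common form of the self-adaptive loss weighting the abstract credits for faster convergence is to learn a log-variance per loss term jointly with the network (Kendall-style uncertainty weighting). A generic sketch under that assumption, not the authors' diesel-engine PINN:

```python
# Sketch of self-adaptive loss weighting: learn a log-variance per loss term
# jointly with the model. This is a generic pattern (our assumption about the
# mechanism), not the authors' diesel-engine PINN.
import torch

log_w = torch.zeros(2, requires_grad=True)      # one learnable weight per loss term
params = [torch.randn(10, requires_grad=True)]  # stand-in for network weights
opt = torch.optim.Adam(params + [log_w], lr=1e-3)

for step in range(100):
    data_loss = params[0].pow(2).mean()         # stand-in for the data mismatch
    ode_loss = (params[0].sum() - 1).pow(2)     # stand-in for the physics residual
    # uncertainty-style weighting: exp(-s) * L + s, with s a log-variance
    loss = (torch.exp(-log_w[0]) * data_loss + log_w[0]
            + torch.exp(-log_w[1]) * ode_loss + log_w[1])
    opt.zero_grad()
    loss.backward()
    opt.step()
print(log_w.detach())   # the learned balance between data and physics terms
```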

A Survey on Solving and Discovering Differential Equations Using Deep Neural Networks

  • Authors: Hyeonjung (Tari) Jung, Jayant Gupta, Bharat Jayaprakash, Matthew Eagon, Harish Panneer Selvam, Carl Molnar, William Northrop, Shashi Shekhar
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.13807
  • Pdf link: https://arxiv.org/pdf/2304.13807
  • Abstract
    Ordinary and partial differential equations (DE) are used extensively in scientific and mathematical domains to model physical systems. Current literature has focused primarily on deep neural network (DNN) based methods for solving a specific DE or a family of DEs. Research communities with a history of using DE models may view DNN-based differential equation solvers (DNN-DEs) as a faster and transferable alternative to current numerical methods. However, there is a lack of systematic surveys detailing the use of DNN-DE methods across physical application domains and a generalized taxonomy to guide future research. This paper surveys and classifies previous works and provides an educational tutorial for senior practitioners, professionals, and graduate students in engineering and computer science. First, we propose a taxonomy to navigate domains of DE systems studied under the umbrella of DNN-DE. Second, we examine the theory and performance of the Physics Informed Neural Network (PINN) to demonstrate how the influential DNN-DE architecture mathematically solves a system of equations. Third, to reinforce the key ideas of solving and discovery of DEs using DNN, we provide a tutorial using DeepXDE, a Python package for developing PINNs, to develop DNN-DEs for solving and discovering a classic DE, the linear transport equation.

Variational Bayes Made Easy

  • Authors: Mohammad Emtiyaz Khan
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.14251
  • Pdf link: https://arxiv.org/pdf/2304.14251
  • Abstract
    Variational Bayes is a popular method for approximate inference but its derivation can be cumbersome. To simplify the process, we give a 3-step recipe to identify the posterior form by explicitly looking for linearity with respect to expectations of well-known distributions. We can then directly write the update by simply "reading off" the terms in front of those expectations. The recipe makes the derivation easier, faster, shorter, and more general.
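
A one-line worked instance of the recipe, on a conjugate-Gaussian model in our notation (not the paper's):

```latex
% Worked instance of the 3-step recipe on a conjugate-Gaussian model
% (our notation, not the paper's). Model: x_i ~ N(mu, 1), prior mu ~ N(0, 1).
% Steps 1-2: the log-joint is linear in the sufficient statistics mu, mu^2:
\log p(x, \mu) = \Big( \textstyle\sum_i x_i \Big)\, \mu
  - \frac{n + 1}{2}\, \mu^2 + \text{const}.
% Step 3: "read off" the coefficients -> a Gaussian posterior with
\sigma^2 = \frac{1}{n + 1}, \qquad m = \frac{\sum_i x_i}{n + 1}.
```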

Keyword: mobile

AI-based Predictive Analytic Approaches for safeguarding the Future of Electric/Hybrid Vehicles

  • Authors: Ishan Shivansh Bangroo
  • Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.13841
  • Pdf link: https://arxiv.org/pdf/2304.13841
  • Abstract
    In response to the global need for sustainable energy, green technology may help fight climate change, but before green infrastructure can be easily integrated into the world's energy system, it needs upgrading. By improving energy infrastructure and decision-making, artificial intelligence (AI) may help solve this challenge. Electric and hybrid vehicles (EHVs) have grown in popularity due to concerns about global warming and the need for more ecologically friendly transportation, and they may work better with cutting-edge technologies like AI. Electric vehicles (EVs) reduce greenhouse gas emissions and promote sustainable mobility. Unfortunately, EV production consumes a lot of energy and materials, which may harm nature; it is being improved using green technologies like AI and predictive analysis. The Battery Management System (BMS) controls EHV performance and longevity, and AI may improve EHV energy efficiency, emissions reduction, and sustainability. Remote hijacking, security breaches, and unauthorized access are EHV cybersecurity vulnerabilities addressed in the article. AI research and development may help make transportation more sustainable, as may optimizing EHVs and charging infrastructure.

Detecting inner-LAN anomalies using hierarchical forecasting

  • Authors: Sevvandi Kandanaarachchi, Mahdi Abolghasemi, Hideya Ochiai, Asha Rao
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.13941
  • Pdf link: https://arxiv.org/pdf/2304.13941
  • Abstract
    Increasing activity and the number of devices online are leading to increasing and more diverse cyber attacks. This continuously evolving attack activity makes signature-based detection methods ineffective. Once malware has infiltrated into a LAN, bypassing an external gateway or entering via an unsecured mobile device, it can potentially infect all nodes in the LAN as well as carry out nefarious activities such as stealing valuable data, leading to financial damage and loss of reputation. Such infiltration could be viewed as an insider attack, increasing the need for LAN monitoring and security. In this paper we aim to detect such inner-LAN activity by studying the variations in Address Resolution Protocol (ARP) calls within the LAN. We find anomalous nodes by modelling inner-LAN traffic using hierarchical forecasting methods. We substantially reduce the false positives ever present in anomaly detection, by using an extreme value theory based method. We use a dataset from a real inner-LAN monitoring project, containing over 10M ARP calls from 362 nodes. Furthermore, the small number of false positives generated using our methods, is a potential solution to the "alert fatigue" commonly reported by security experts.
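
The extreme-value step can be sketched with a standard peaks-over-threshold fit; the thresholds below are hypothetical, not the paper's settings.

```python
# Sketch of the peaks-over-threshold idea for cutting false positives:
# fit a Generalized Pareto tail to anomaly scores and flag only extreme
# exceedances. Thresholds here are hypothetical, not the paper's settings.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
scores = rng.exponential(scale=1.0, size=10_000)   # stand-in for per-node ARP anomaly scores

u = np.quantile(scores, 0.95)                      # initial high threshold
excess = scores[scores > u] - u
shape, loc, scale = genpareto.fit(excess, floc=0)  # fit the GPD tail

# flag a new score only if its tail probability is below, say, 1e-4
def is_anomaly(score, p=1e-4):
    return score > u and genpareto.sf(score - u, shape, loc=0, scale=scale) < p

print(is_anomaly(12.0), is_anomaly(2.0))
```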

A Review of Panoptic Segmentation for Mobile Mapping Point Clouds

  • Authors: Binbin Xiang, Yuanwen Yue, Torben Peters, Konrad Schindler
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13980
  • Pdf link: https://arxiv.org/pdf/2304.13980
  • Abstract
    3D point cloud panoptic segmentation is the combined task to (i) assign each point to a semantic class and (ii) separate the points in each class into object instances. Recently there has been an increased interest in such comprehensive 3D scene understanding, building on the rapid advances of semantic segmentation due to the advent of deep 3D neural networks. Yet, to date there is very little work about panoptic segmentation of outdoor mobile-mapping data, and no systematic comparisons. The present paper tries to close that gap. It reviews the building blocks needed to assemble a panoptic segmentation pipeline and the related literature. Moreover, a modular pipeline is set up to perform comprehensive, systematic experiments to assess the state of panoptic segmentation in the context of street mapping. As a byproduct, we also provide the first public dataset for that task, by extending the NPM3D dataset to include instance labels.

A Supervised Machine Learning Approach to Operator Intent Recognition for Teleoperated Mobile Robot Navigation

  • Authors: Evangelos Tsagkournis, Dimitris Panagopoulos, Giannis Petousakis, Grigoris Nikolaou, Rustam Stolkin, Manolis Chiou
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14003
  • Pdf link: https://arxiv.org/pdf/2304.14003
  • Abstract
    In applications that involve human-robot interaction (HRI), human-robot teaming (HRT), and cooperative human-machine systems, the inference of the human partner's intent is of critical importance. This paper presents a method for the inference of the human operator's navigational intent, in the context of mobile robots that provide full or partial (e.g., shared control) teleoperation. We propose the Machine Learning Operator Intent Inference (MLOII) method, which a) processes spatial data collected by the robot's sensors; b) utilizes a supervised machine learning algorithm to estimate the operator's most probable navigational goal online. The proposed method's ability to reliably and efficiently infer the intent of the human operator is experimentally evaluated in realistically simulated exploration and remote inspection scenarios. The results in terms of accuracy and uncertainty indicate that the proposed method is comparable to another state-of-the-art method found in the literature.

MCLFIQ: Mobile Contactless Fingerprint Image Quality

  • Authors: Jannis Priesnitz, Axel Weißenfeld, Christian Rathgeb, Bernhard Strobl, Ralph Lessmann, Christoph Busch
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14123
  • Pdf link: https://arxiv.org/pdf/2304.14123
  • Abstract
    We propose MCLFIQ: Mobile Contactless Fingerprint Image Quality, the first quality assessment algorithm for mobile contactless fingerprint samples. To this end, we retrained the NIST Fingerprint Image Quality (NFIQ) 2 method, which was originally designed for contact-based fingerprints, with a synthetic contactless fingerprint database. We evaluate the predictive performance of the resulting MCLFIQ model in terms of Error-vs.-Discard Characteristic (EDC) curves on three real-world contactless fingerprint databases using two recognition algorithms. In experiments, the MCLFIQ method is compared against the original NFIQ 2 method and a sharpness-based quality assessment algorithm developed for contactless fingerprint images. Obtained results show that the re-training of NFIQ 2 on synthetic data is a viable alternative to training on real databases. Moreover, the evaluation shows that our MCLFIQ method performs more accurately and robustly than NFIQ 2 and the sharpness-based quality assessment. We suggest considering the proposed MCLFIQ method as a candidate for a new standard algorithm for contactless fingerprint quality assessment.

Combining HoloLens with Instant-NeRFs: Advanced Real-Time 3D Mobile Mapping

  • Authors: Dennis Haitz, Boris Jutzi, Markus Ulrich, Miriam Jaeger, Patrick Huebner
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14301
  • Pdf link: https://arxiv.org/pdf/2304.14301
  • Abstract
    This work represents a large step into modern ways of fast 3D reconstruction based on RGB camera images. Utilizing a Microsoft HoloLens 2 as a multisensor platform that includes an RGB camera and an inertial measurement unit for SLAM-based camera-pose determination, we train a Neural Radiance Field (NeRF) as a neural scene representation in real-time with the acquired data from the HoloLens. The HoloLens is connected via Wi-Fi to a high-performance PC that is responsible for the training and 3D reconstruction. After the data stream ends, the training is stopped and the 3D reconstruction is initiated, which extracts a point cloud of the scene. With our specialized inference algorithm, five million scene points can be extracted within 1 second. In addition, the point cloud also includes radiometry per point. Our method of 3D reconstruction outperforms grid point sampling with NeRFs by multiple orders of magnitude and can be regarded as a complete real-time 3D reconstruction method in a mobile mapping setup.

A Versatile Low-Complexity Feedback Scheme for FDD Systems via Generative Modeling

  • Authors: Nurettin Turan, Benedikt Fesl, Michael Koller, Michael Joham, Wolfgang Utschick
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.14373
  • Pdf link: https://arxiv.org/pdf/2304.14373
  • Abstract
    In this work, we propose a versatile feedback scheme which can be deployed for both single- and multi-user multiple-input multiple-output (MIMO) frequency division duplex (FDD) systems. Particularly, we propose to use a Gaussian mixture model (GMM) with a reduced number of parameters for codebook construction, feedback encoding, and precoder design. The GMM is fitted offline at the base station (BS) to uplink (UL) training samples to approximate the channel distribution of all possible mobile terminals (MTs) located inside the BS cell. Afterwards, a codebook is constructed, where each codebook entry is based on one GMM component. By extracting directional information from the constructed codebook, the proposed GMM-based feedback approach allows the precoders of a multi-user MIMO (MU-MIMO) system to be designed jointly using common precoding algorithms. Alternatively, the GMM's sample generation ability can be utilized to design the precoders using a state-of-the-art stochastic iterative algorithm. After offloading the GMM to the MTs, they determine their feedback simply as the index of the GMM component with the highest responsibility for their received pilot signal. This strategy exhibits low complexity and allows for parallelization. Simulation results show that the proposed approach outperforms conventional methods, especially for a reduced number of pilots.
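
The feedback rule itself is simple enough to sketch with standard tooling. The toy below is real-valued with made-up dimensions (the actual system works on complex-valued channels and received pilot signals): fit the GMM offline, then compute the feedback as the component with the highest responsibility.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
h_ul = rng.standard_normal((5000, 32))      # stand-in UL training samples at the BS

# Offline at the BS: fit the GMM; one codebook entry per component.
gmm = GaussianMixture(n_components=64, covariance_type="diag").fit(h_ul)

# Online at the MT: feedback = index of the component with the highest
# responsibility for the received pilot observation.
y_pilot = rng.standard_normal((1, 32))
feedback_index = int(gmm.predict(y_pilot)[0])
```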

Keyword: pruning

Fine Tuning with Abnormal Examples

  • Authors: Will Rieger
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.13783
  • Pdf link: https://arxiv.org/pdf/2304.13783
  • Abstract
    Given the prevalence of crowd-sourced labor in creating Natural Language Processing datasets, these datasets have become increasingly large. For instance, the SQUAD dataset currently sits at over 80,000 records. However, because the English language is rather repetitive in structure, the distribution of word frequencies in the SQUAD dataset's contexts is relatively unchanged. By measuring each sentence's distance from the covariate distribution of frequencies of all sentences in the dataset, we identify 10,500 examples that create a more uniform distribution for training. Fine-tuning ELECTRA [4] on this subset of examples reaches better performance than a model trained on all 87,000 examples. Herein we introduce a methodology for systematically pruning datasets for fine-tuning that reaches better out-of-sample performance.
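
One plausible reading of the selection step is sketched below; the helper name, the 500-feature cap, and the keep-the-farthest rule are our assumptions, not the paper's specification:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def prune_by_frequency_distance(texts, keep=10_500):
    # Bag-of-words frequency profile per example, normalized to proportions.
    X = CountVectorizer(max_features=500).fit_transform(texts).toarray().astype(float)
    X /= np.maximum(X.sum(axis=1, keepdims=True), 1.0)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    # Squared Mahalanobis distance of each profile from the dataset bulk.
    d = np.einsum("ij,jk,ik->i", X - mu, np.linalg.inv(cov), X - mu)
    return np.argsort(d)[-keep:]   # keep the examples that flatten the distribution
```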

JaxPruner: A concise library for sparsity research

  • Authors: Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Karolina Dziugaite, Pablo Samuel Castro, Utku Evci
  • Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.14082
  • Pdf link: https://arxiv.org/pdf/2304.14082
  • Abstract
    This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which, in turn, enables easy integration with existing JAX based libraries. We demonstrate this ease of integration by providing examples in four different codebases: Scenic, t5x, Dopamine and FedJAX and provide baseline experiments on popular benchmarks.
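
JaxPruner's actual API is not reproduced here; as a concept-level sketch in plain NumPy, one-shot magnitude pruning, the simplest algorithm such libraries implement, looks like this:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    # One-shot magnitude pruning: zero out the smallest-|w| fraction of weights.
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = (np.abs(weights) >= threshold).astype(weights.dtype)
    return weights * mask, mask

w = np.random.default_rng(0).standard_normal((256, 256)).astype(np.float32)
w_sparse, mask = magnitude_prune(w, sparsity=0.9)
# In sparse training the mask is reapplied after every optimizer step;
# dynamic-sparsity methods additionally regrow and re-prune it periodically.
```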

Keyword: voxel

There is no result

Keyword: lidar

Human Semantic Segmentation using Millimeter-Wave Radar Sparse Point Clouds

  • Authors: Pengfei Song, Luoyu MEI, Han Cheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); General Topology (math.GN)
  • Arxiv link: https://arxiv.org/abs/2304.14132
  • Pdf link: https://arxiv.org/pdf/2304.14132
  • Abstract
    This paper presents a framework for semantic segmentation on sparse, sequential point clouds from millimeter-wave radar. Compared with cameras and lidars, millimeter-wave radars have the advantages of preserving privacy, strong anti-interference ability, and long detection distance. However, the sparsity of mmWave data and the difficulty of capturing its temporal-topological features remain problems: the need to capture temporal-topological coupling features in the human semantic segmentation task prevents previous advanced segmentation methods (e.g., PointNet, PointCNN, Point Transformer) from being well utilized in practical scenarios. To address the challenges caused by the sparsity and temporal-topological features of the data, we (i) introduce graph structure and topological features to the point cloud and (ii) propose a semantic segmentation framework including a global feature-extracting module and a sequential feature-extracting module. In addition, we design an efficient and better-fitting loss function for an improved training process and segmentation results based on graph clustering. Experimentally, we deploy representative semantic segmentation algorithms (Transformer, GCNN, etc.) on a custom dataset. Experimental results indicate that our model achieves a mean accuracy of $\mathbf{82.31}\%$ on the custom dataset and outperforms the state-of-the-art algorithms. Moreover, to validate the model's robustness, we deploy our model on the well-known S3DIS dataset, where it achieves a mean accuracy of $\mathbf{92.6}\%$, outperforming baseline algorithms.

Quadric Representations for LiDAR Odometry, Mapping and Localization

  • Authors: Chao Xia, Chenfeng Xu, Patrick Rim, Mingyu Ding, Nanning Zheng, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14190
  • Pdf link: https://arxiv.org/pdf/2304.14190
  • Abstract
    Current LiDAR odometry, mapping and localization methods leverage point-wise representations of 3D scenes and achieve high accuracy in autonomous driving tasks. However, the space-inefficiency of methods that use point-wise representations limits their development and usage in practical applications. In particular, scan-submap matching and global map representation methods are restricted by the inefficiency of nearest neighbor searching (NNS) for large-volume point clouds. To improve space-time efficiency, we propose a novel method of describing scenes using quadric surfaces, which are far more compact representations of 3D objects than conventional point clouds. In contrast to point cloud-based methods, our quadric representation-based method decomposes a 3D scene into a collection of sparse quadric patches, which improves storage efficiency and avoids the slow point-wise NNS process. Our method first segments a given point cloud into patches and fits each of them to a quadric implicit function. Each function is then coupled with other geometric descriptors of the patch, such as its center position and covariance matrix. Collectively, these patch representations fully describe a 3D scene, which can be used in place of the original point cloud and employed in LiDAR odometry, mapping and localization algorithms. We further design a novel incremental growing method for quadric representations, which eliminates the need to repeatedly re-fit quadric surfaces from the original point cloud. Extensive odometry, mapping and localization experiments on large-volume point clouds in the KITTI and UrbanLoco datasets demonstrate that our method maintains low latency and memory utility while achieving competitive, and even superior, accuracy.
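
The per-patch fitting step admits a compact sketch via the standard algebraic least-squares quadric fit (our simplification; the paper additionally couples each fit with descriptors such as the patch center and covariance matrix):

```python
import numpy as np

def fit_quadric(points):
    """Least-squares fit of an implicit quadric q(x, y, z) = 0 to an (N, 3) patch."""
    x, y, z = points.T
    # Design matrix of quadric monomials; the coefficient vector minimizing
    # ||D q|| subject to ||q|| = 1 is the last right singular vector.
    D = np.column_stack([x*x, y*y, z*z, x*y, x*z, y*z, x, y, z, np.ones_like(x)])
    return np.linalg.svd(D, full_matrices=False)[2][-1]  # 10 coefficients, up to scale

patch = np.random.default_rng(0).standard_normal((200, 3)) * [1.0, 1.0, 0.05]
coeffs = fit_quadric(patch)   # nearly planar patch -> near-degenerate quadric
```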

A Survey on Approximate Edge AI for Energy Efficient Autonomous Driving Services

  • Authors: Dewant Katare, Diego Perino, Jari Nurmi, Martijn Warnier, Marijn Janssen, Aaron Yi Ding
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14271
  • Pdf link: https://arxiv.org/pdf/2304.14271
  • Abstract
    Autonomous driving services rely heavily on sensors such as cameras, LiDAR, radar, and communication modules. A common practice of processing the sensed data is using a high-performance computing unit placed inside the vehicle, which deploys AI models and algorithms to act as the brain or administrator of the vehicle. The vehicular data generated from average hours of driving can be up to 20 Terabytes depending on the data rate and specification of the sensors. Given the scale and fast growth of services for autonomous driving, it is essential to improve the overall energy and environmental efficiency, especially in the trend towards vehicular electrification (e.g., battery-powered). Although these areas have seen significant advancements in sensor technologies, wireless communications, computing and AI/ML algorithms, the challenge still exists in how to apply and integrate those technology innovations to achieve energy efficiency. This survey reviews and compares the connected vehicular applications, vehicular communications, approximation and Edge AI techniques. The focus is on energy efficiency by covering newly proposed approximation and enabling frameworks. To the best of our knowledge, this survey is the first to review the latest approximate Edge AI frameworks and publicly available datasets in energy-efficient autonomous driving. The insights and vision from this survey can be beneficial for the collaborative driving service development on low-power and memory-constrained systems and also for the energy optimization of autonomous vehicles.

SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection

  • Authors: Yichen Xie, Chenfeng Xu, Marie-Julie Rakotosaona, Patrick Rim, Federico Tombari, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14340
  • Pdf link: https://arxiv.org/pdf/2304.14340
  • Abstract
    By identifying four important components of existing LiDAR-camera 3D object detection methods (LiDAR and camera candidates, transformation, and fusion outputs), we observe that all existing methods either find dense candidates or yield dense representations of scenes. However, given that objects occupy only a small part of a scene, finding dense candidates and generating dense representations is noisy and inefficient. We propose SparseFusion, a novel multi-sensor 3D detection method that exclusively uses sparse candidates and sparse representations. Specifically, SparseFusion utilizes the outputs of parallel detectors in the LiDAR and camera modalities as sparse candidates for fusion. We transform the camera candidates into the LiDAR coordinate space by disentangling the object representations. Then, we can fuse the multi-modality candidates in a unified 3D space by a lightweight self-attention module. To mitigate negative transfer between modalities, we propose novel semantic and geometric cross-modality transfer modules that are applied prior to the modality-specific detectors. SparseFusion achieves state-of-the-art performance on the nuScenes benchmark while also running at the fastest speed, even outperforming methods with stronger backbones. We perform extensive experiments to demonstrate the effectiveness and efficiency of our modules and overall method pipeline. Our code will be made publicly available at https://github.com/yichen928/SparseFusion.

SMAT: A Self-Reinforcing Framework for Simultaneous Mapping and Tracking in Unbounded Urban Environments

  • Authors: Tingxiang Fan, Bowen Shen, Yinqiang Zhang, Chuye Zhang, Lei Yang, Hua Chen, Wei Zhang, Jia Pan
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14356
  • Pdf link: https://arxiv.org/pdf/2304.14356
  • Abstract
    With the increasing prevalence of robots in daily life, it is crucial to enable robots to construct a reliable map online to navigate in unbounded and changing environments. Although existing methods can individually achieve the goals of spatial mapping and dynamic object detection and tracking, limited research has been conducted on an effective combination of these two important abilities. The proposed framework, SMAT (Simultaneous Mapping and Tracking), integrates the front-end dynamic object detection and tracking module with the back-end static mapping module using a self-reinforcing mechanism, which promotes mutual improvement of mapping and tracking performance. The conducted experiments demonstrate the framework's effectiveness in real-world applications, achieving successful long-range navigation and mapping in multiple urban environments using only one LiDAR, a CPU-only onboard computer, and a consumer-level GPS receiver.

Keyword: diffusion

Towards ethical multimodal systems

  • Authors: Alexis Roger, Esma Aïmeur, Irina Rish
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13765
  • Pdf link: https://arxiv.org/pdf/2304.13765
  • Abstract
    The impact of artificial intelligence systems on our society is increasing at an unprecedented speed. For instance, ChatGPT is being tested in mental health treatment applications such as Koko, Stable Diffusion generates pieces of art competitive with (or outperforming) human artists, and so on. Ethical concerns regarding the behavior and applications of generative AI systems have been increasing over the past years, and the field of AI alignment - steering the behavior of AI systems towards being aligned with human values - is a rapidly growing subfield of modern AI. In this paper, we address the challenges involved in the ethical evaluation of a multimodal artificial intelligence system. The multimodal systems we focus on take both text and an image as input and output text, completing the sentence or answering the question asked as input. We perform the evaluation of these models in two steps: we first discuss the creation of a multimodal ethical database and then use this database to construct morality-evaluating algorithms. The creation of the multimodal ethical database is done interactively through human feedback. Users are presented with multiple examples and vote on whether they are ethical or not. Once these answers have been aggregated into a dataset, we built and tested different algorithms to automatically evaluate the morality of multimodal systems. These algorithms aim to classify the answers as ethical or not. The models we tested are a RoBERTa-large classifier and a multilayer perceptron classifier.

Preserving Superconvergence of Spectral Elements for Curved Domains via $h$ and $p$-Geometric Refinement

  • Authors: Jacob Jones, Rebecca Conley, Xiangmin Jiao
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13766
  • Pdf link: https://arxiv.org/pdf/2304.13766
  • Abstract
    Spectral element methods (SEM), which are extensions of finite element methods (FEM), are important emerging techniques for solving partial differential equations in physics and engineering. SEM can potentially deliver better accuracy due to the potential superconvergence for well-shaped tensor-product elements. However, for complex geometries, the accuracy of SEM often degrades due to a combination of geometric inaccuracies near curved boundaries and the loss of superconvergence with simplicial or non-tensor-product elements. We propose to overcome the first issue by using $h$- and $p$-geometric refinement, to refine the mesh near high-curvature regions and increase the degree of geometric basis functions, respectively. We show that when using mixed-meshes with tensor-product elements in the interior of the domain, curvature-based geometric refinement near boundaries can improve the accuracy of the interior elements by reducing pollution errors and preserving the superconvergence. To overcome the second issue, we apply a post-processing technique to recover the accuracy near the curved boundaries by using the adaptive extended stencil finite element method (AES-FEM). The combination of curvature-based geometric refinement and accurate post-processing delivers an effective and easier-to-implement alternative to other methods based on exact geometries. We demonstrate our techniques by solving the convection-diffusion equation in 2D and show one to two orders of magnitude of improvement in the solution accuracy, even when the elements are poorly shaped near boundaries.

Multimodal Composite Association Score: Measuring Gender Bias in Generative Multimodal Models

  • Authors: Abhishek Mandal, Susan Leavy, Suzanne Little
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13855
  • Pdf link: https://arxiv.org/pdf/2304.13855
  • Abstract
    Generative multimodal models based on diffusion models have seen tremendous growth and advances in recent years. Models such as DALL-E and Stable Diffusion have become increasingly popular and successful at creating images from texts, often combining abstract ideas. However, like other deep learning models, they also reflect social biases they inherit from their training data, which is often crawled from the internet. Manually auditing models for biases can be very time- and resource-consuming and is further complicated by the unbounded and unconstrained nature of inputs these models can take. Research into bias measurement and quantification has generally focused on small single-stage models working on a single modality. Thus the emergence of multistage multimodal models requires a different approach. In this paper, we propose Multimodal Composite Association Score (MCAS) as a new method of measuring gender bias in multimodal generative models. Evaluating both DALL-E 2 and Stable Diffusion using this approach uncovered the presence of gendered associations of concepts embedded within the models. We propose MCAS as an accessible and scalable method of quantifying potential bias for models with different modalities and a range of potential biases.

Two kinds of numerical algorithms for ultra-slow diffusion equations

  • Authors: Min Cai, Changpin Li, Yu Wang
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13966
  • Pdf link: https://arxiv.org/pdf/2304.13966
  • Abstract
    In this article, two kinds of numerical algorithms are derived for the ultra-slow (or superslow) diffusion equation in one and two space dimensions, where the ultra-slow diffusion is characterized by the Caputo-Hadamard fractional derivative of order $\alpha \in (0,1)$. To describe the spatial interaction, the Riesz fractional derivative and the fractional Laplacian are used in one and two space dimensions, respectively. The Caputo-Hadamard derivative is discretized by two typical approximate formulae, i.e., the L2-1$_\sigma$ and L1-2 methods. The spatial fractional derivatives are discretized by second-order finite difference methods. When the L2-1$_\sigma$ discretization is used, the derived numerical scheme is unconditionally stable with error estimate $\mathcal{O}(\tau^{2}+h^{2})$ for all $\alpha \in (0, 1)$, in which $\tau$ and $h$ are the temporal and spatial stepsizes, respectively. When the L1-2 discretization is used, the derived numerical scheme is stable with error estimate $\mathcal{O}(\tau^{3-\alpha}+h^{2})$ for $\alpha \in (0, 0.3738)$. The illustrative examples displayed are in line with the theoretical analysis.

Edit Everything: A Text-Guided Generative System for Images Editing

  • Authors: Defeng Xie, Ruichen Wang, Jian Ma, Chen Chen, Haonan Lu, Dong Yang, Fobo Shi, Xiaodong Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14006
  • Pdf link: https://arxiv.org/pdf/2304.14006
  • Abstract
    We introduce a new generative system called Edit Everything, which can take image and text inputs and produce image outputs. Edit Everything allows users to edit images using simple text instructions. Our system designs prompts to guide the visual module in generating requested images. Experiments demonstrate that Edit Everything facilitates the implementation of the visual aspects of Stable Diffusion with the use of the Segment Anything model and CLIP. Our system is publicly available at https://github.com/DefengXie/Edit_Everything

Localized orthogonal decomposition for a multiscale parabolic stochastic partial differential equation

  • Authors: Annika Lang, Per Ljung, Axel Målqvist
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14049
  • Pdf link: https://arxiv.org/pdf/2304.14049
  • Abstract
    A multiscale method is proposed for a parabolic stochastic partial differential equation with additive noise and highly oscillatory diffusion. The framework is based on the localized orthogonal decomposition (LOD) method and computes a coarse-scale representation of the elliptic operator, enriched by fine-scale information on the diffusion. Optimal order strong convergence is derived. The LOD technique is combined with a (multilevel) Monte-Carlo estimator and the weak error is analyzed. Numerical examples that confirm the theoretical findings are provided, and the computational efficiency of the method is highlighted.

DataComp: In search of the next generation of multimodal datasets

  • Authors: Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14108
  • Pdf link: https://arxiv.org/pdf/2304.14108
  • Abstract
    Large multimodal datasets have been instrumental in recent breakthroughs such as CLIP, Stable Diffusion, and GPT-4. At the same time, datasets rarely receive the same research attention as model architectures or training algorithms. To address this shortcoming in the machine learning ecosystem, we introduce DataComp, a benchmark where the training code is fixed and researchers innovate by proposing new training sets. We provide a testbed for dataset experiments centered around a new candidate pool of 12.8B image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing on 38 downstream test sets. Our benchmark consists of multiple scales, with four candidate pool sizes and associated compute budgets ranging from 12.8M to 12.8B samples seen during training. This multi-scale design facilitates the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow is a promising way of improving multimodal datasets. We introduce DataComp-1B, a dataset created by applying a simple filtering algorithm to the 12.8B candidate pool. The resulting 1.4B subset enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet. Our new ViT-L/14 model outperforms a larger ViT-g/14 trained on LAION-2B by 0.7 percentage points while requiring 9x less training compute. We also outperform OpenAI's CLIP ViT-L/14 by 3.7 percentage points, which is trained with the same compute budget as our model. These gains highlight the potential for improving model performance by carefully curating training sets. We view DataComp-1B as only the first step and hope that DataComp paves the way toward the next generation of multimodal datasets.

Functional Diffusion Maps

  • Authors: María Barroso, Carlos María Alaíz, Ángela Fernández, Jose Luis Torrecilla
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.14378
  • Pdf link: https://arxiv.org/pdf/2304.14378
  • Abstract
    Nowadays many real-world datasets can be considered as functional, in the sense that the processes which generate them are continuous. A fundamental property of this type of data is that in theory they belong to an infinite-dimensional space. Although in practice we usually receive finite observations, they are still high-dimensional and hence dimensionality reduction methods are crucial. In this vein, the main state-of-the-art method for functional data analysis is Functional PCA. Nevertheless, this classic technique assumes that the data lie in a linear manifold, and hence it could have problems when this hypothesis is not fulfilled. In this research, attention has been placed on a non-linear manifold learning method: Diffusion Maps. The article explains how to extend this multivariate method to functional data and compares its behavior against Functional PCA over different simulated and real examples.
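
For reference, the classic multivariate diffusion-map construction that the article extends to functional data goes roughly as follows (rows of `X` would be discretized curves; `eps` and `t` are tuning parameters; taking real parts is a simplification for the sketch):

```python
import numpy as np

def diffusion_map(X, eps=1.0, n_components=2, t=1):
    # Gaussian kernel on pairwise squared distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)
    # Row-normalize into a Markov transition matrix.
    P = K / K.sum(axis=1, keepdims=True)
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Drop the trivial constant eigenvector; scale coordinates by eigenvalues^t.
    return vecs[:, 1:n_components + 1] * vals[1:n_components + 1] ** t

emb = diffusion_map(np.random.default_rng(0).standard_normal((300, 20)))
```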

Motion-Conditioned Diffusion Model for Controllable Video Synthesis

  • Authors: Tsai-Shien Chen, Chieh Hubert Lin, Hung-Yu Tseng, Tsung-Yi Lin, Ming-Hsuan Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14404
  • Pdf link: https://arxiv.org/pdf/2304.14404
  • Abstract
    Recent advancements in diffusion models have greatly improved the quality and diversity of synthesized content. To harness the expressive power of diffusion models, researchers have explored various controllable mechanisms that allow users to intuitively guide the content synthesis process. Although the latest efforts have primarily focused on video synthesis, there has been a lack of effective methods for controlling and describing desired content and motion. In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis. To tackle the ambiguity of sparse motion inputs and achieve better synthesis quality, MCDiff first utilizes a flow completion model to predict the dense video motion based on the semantic understanding of the video frame and the sparse motion control. Then, the diffusion model synthesizes high-quality future frames to form the output video. We qualitatively and quantitatively show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis. Additional experiments on MPII Human Pose further exhibit the capability of our model on diverse content and motion synthesis.

Putting People in Their Place: Affordance-Aware Human Insertion into Scenes

  • Authors: Sumith Kulal, Tim Brooks, Alex Aiken, Jiajun Wu, Jimei Yang, Jingwan Lu, Alexei A. Efros, Krishna Kumar Singh
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14406
  • Pdf link: https://arxiv.org/pdf/2304.14406
  • Abstract
    We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes. Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances. Our model can infer the set of realistic poses given the scene context, re-pose the reference person, and harmonize the composition. We set up the task in a self-supervised fashion by learning to re-pose humans in video clips. We train a large-scale diffusion model on a dataset of 2.4M video clips that produces diverse plausible poses while respecting the scene context. Given the learned human-scene composition, our model can also hallucinate realistic people and scenes when prompted without conditioning and also enables interactive editing. A quantitative evaluation shows that our method synthesizes more realistic human appearance and more natural human-scene interactions than prior work.

Keyword: dynamic

TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation

  • Authors: Zhaoyan Liu, Noel Vouitsis, Satya Krishna Gorti, Jimmy Ba, Gabriel Loaiza-Ganem
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.13742
  • Pdf link: https://arxiv.org/pdf/2304.13742
  • Abstract
    We propose TR0N, a highly general framework to turn pre-trained unconditional generative models, such as GANs and VAEs, into conditional models. The conditioning can be highly arbitrary, and requires only a pre-trained auxiliary model. For example, we show how to turn unconditional models into class-conditional ones with the help of a classifier, and also into text-to-image models by leveraging CLIP. TR0N learns a lightweight stochastic mapping which "translates" between the space of conditions and the latent space of the generative model, in such a way that the generated latent corresponds to a data sample satisfying the desired condition. The translated latent samples are then further improved upon through Langevin dynamics, enabling us to obtain higher-quality data samples. TR0N requires no training data nor fine-tuning, yet can achieve a zero-shot FID of 10.9 on MS-COCO, outperforming competing alternatives not only on this metric, but also in sampling speed -- all while retaining a much higher level of generality. Our code is available at https://github.com/layer6ai-labs/tr0n.
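
The Langevin refinement step is worth a standalone sketch (the toy target below is a standard normal, so the score is simply $-z$; in TR0N the gradient would come from the condition-matching objective in latent space):

```python
import numpy as np

def langevin_refine(z, grad_log_p, steps=50, eps=1e-3, seed=0):
    # Unadjusted Langevin dynamics: drift up the log-density plus Gaussian noise.
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        z = z + eps * grad_log_p(z) + np.sqrt(2 * eps) * rng.standard_normal(z.shape)
    return z

z0 = np.full((4, 8), 5.0)                 # poorly placed initial latents
z = langevin_refine(z0, lambda z: -z)     # drifts toward the toy target's mode
```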

Physics-informed neural networks for predicting gas flow dynamics and unknown parameters in diesel engines

  • Authors: Kamaljyoti Nath, Xuhui Meng, Daniel J Smith, George Em Karniadakis
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13799
  • Pdf link: https://arxiv.org/pdf/2304.13799
  • Abstract
    This paper presents a physics-informed neural network (PINN) approach for monitoring the health of diesel engines. The aim is to evaluate the engine dynamics, identify unknown parameters in a "mean value" model, and anticipate maintenance requirements. The PINN model is applied to diesel engines with a variable-geometry turbocharger and exhaust gas recirculation, using measurement data of selected state variables. The results demonstrate the ability of the PINN model to predict simultaneously both unknown parameters and dynamics accurately with both clean and noisy data, and the importance of the self-adaptive weight in the loss function for faster convergence. The input data for these simulations are derived from actual engine running conditions, while the outputs are simulated data, making this a practical case study of PINN's ability to predict real-world dynamical systems. The mean value model of the diesel engine incorporates empirical formulae to represent certain states, but these formulae may not be generalizable to other engines. To address this, the study considers the use of deep neural networks (DNNs) in addition to the PINN model. The DNNs are trained using laboratory test data and are used to model the engine-specific empirical formulae in the mean value model, allowing for a more flexible and adaptive representation of the engine's states. In other words, the mean value model uses both the PINN model and the DNNs to represent the engine's states, with the PINN providing a physics-based understanding of the engine's overall dynamics and the DNNs offering a more engine-specific and adaptive representation of the empirical formulae. By combining these two approaches, the study aims to offer a comprehensive and versatile approach to monitoring the health and performance of diesel engines.
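
A minimal sketch of the PINN idea on a toy first-order ODE (nothing like the paper's mean value engine model; names and constants here are ours): the loss combines a data-fit term with the ODE residual, and the unknown parameter is optimized jointly with the network weights.

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
log_k = torch.zeros(1, requires_grad=True)        # unknown rate, log-parameterized
opt = torch.optim.Adam(list(net.parameters()) + [log_k], lr=1e-3)

t_data = torch.linspace(0, 3, 50).reshape(-1, 1)
x_data = torch.exp(-0.7 * t_data)                 # synthetic measurements, true k = 0.7

for _ in range(2000):
    t = t_data.clone().requires_grad_(True)
    x = net(t)
    dxdt = torch.autograd.grad(x.sum(), t, create_graph=True)[0]
    loss_phys = ((dxdt + log_k.exp() * x) ** 2).mean()   # residual of dx/dt = -k x
    loss_data = ((net(t_data) - x_data) ** 2).mean()     # fit to the measurements
    opt.zero_grad(); (loss_data + loss_phys).backward(); opt.step()
```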

A Data-Driven Hybrid Automaton Framework to Modeling Complex Dynamical Systems

  • Authors: Yejiang Yang, Zihao Mo, Weiming Xiang
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.13811
  • Pdf link: https://arxiv.org/pdf/2304.13811
  • Abstract
    In this paper, a computationally efficient data-driven hybrid automaton model is proposed to capture unknown complex dynamical system behaviors using multiple neural networks. The sampled data of the system is divided by valid partitions into groups corresponding to their topologies, based on which transition guards are defined. Then, a collection of small-scale neural networks that are computationally efficient are trained as the local dynamical description for their corresponding topologies. After modeling the system with a neural-network-based hybrid automaton, the set-valued reachability analysis with low computation cost is provided based on interval analysis and a split and combined process. Finally, a numerical example of a limit cycle is presented to illustrate that the developed models can significantly reduce the computational cost in reachable set computation without sacrificing any modeling precision.

Controlled density transport using Perron Frobenius generators

  • Authors: Jake Buzhardt, Phanindra Tallapragada
  • Subjects: Systems and Control (eess.SY); Robotics (cs.RO); Fluid Dynamics (physics.flu-dyn)
  • Arxiv link: https://arxiv.org/abs/2304.13829
  • Pdf link: https://arxiv.org/pdf/2304.13829
  • Abstract
    We consider the problem of the transport of a density of states from an initial state distribution to a desired final state distribution through a dynamical system with actuation. In particular, we consider the case where the control signal is a function of time, but not space; that is, the same actuation is applied at every point in the state space. This is motivated by several problems in fluid mechanics, such as mixing and manipulation of a collection of particles by a global control input such as a uniform magnetic field, as well as by more general control problems where a density function describes an uncertainty distribution or a distribution of agents in a multi-agent system. We formulate this problem using the generators of the Perron-Frobenius operator associated with the drift and control vector fields of the system. By considering finite-dimensional approximations of these operators, the density transport problem can be expressed as a control problem for a bilinear system in a high-dimensional, lifted state. With this system, we frame the density control problem as a problem of driving moments of the density function to the moments of a desired density function, where the moments of the density can be expressed as an output which is linear in the lifted state. This output tracking problem for the lifted bilinear system is then solved using differential dynamic programming, an iterative trajectory optimization scheme.

Understand the Dynamic World: An End-to-End Knowledge Informed Framework for Open Domain Entity State Tracking

  • Authors: Mingchen Li, Lifu Huang
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13854
  • Pdf link: https://arxiv.org/pdf/2304.13854
  • Abstract
    Open domain entity state tracking aims to predict reasonable state changes of entities (i.e., [attribute] of [entity] was [before_state] and [after_state] afterwards) given the action descriptions. This is important for many reasoning tasks that support everyday human activities. However, it is challenging because the model needs to predict an arbitrary number of entity state changes caused by the action, while most of the entities are implicitly relevant to the actions and their attributes as well as states are drawn from open vocabularies. To tackle these challenges, we propose a novel end-to-end Knowledge Informed framework for open domain Entity State Tracking, namely KIEST, which explicitly retrieves the relevant entities and attributes from an external knowledge graph (i.e., ConceptNet) and incorporates them to autoregressively generate all the entity state changes with a novel dynamic knowledge grained encoder-decoder framework. To enforce the logical coherence among the predicted entities, attributes, and states, we design a new constraint decoding strategy and employ a coherence reward to improve the decoding process. Experimental results show that our proposed KIEST framework significantly outperforms the strong baselines on the public benchmark dataset OpenPI.

Ensoul: A framework for the creation of self organizing intelligent ultra low power systems (SOULS) through evolutionary enerstatic networks

  • Authors: Ty Roachford
  • Subjects: Artificial Intelligence (cs.AI); Adaptation and Self-Organizing Systems (nlin.AO)
  • Arxiv link: https://arxiv.org/abs/2304.13863
  • Pdf link: https://arxiv.org/pdf/2304.13863
  • Abstract
    Ensoul is a framework proposed for the purpose of creating technologies that create more technologies through the combined use of networks, and nests, of energy homeostatic (enerstatic) loops and open-ended evolutionary techniques. Generative technologies developed by such an approach serve as both simple, yet insightful models of thermodynamically driven complex systems and as powerful sources of novel technologies. "Self Organizing intelligent Ultra Low power Systems" (SOULS) is a term that well describes the technologies produced by such a generative technology, as well as the generative technology itself. The term is meant to capture the abstract nature of such technologies as being independent of the substrate in which they are embedded. In other words, SOULS can be biological, artificial or hybrid in form.

Physics-informed Data-driven Discovery of Constitutive Models with Application to Strain-Rate-sensitive Soft Materials

  • Authors: Kshitiz Upadhyay, Jan N. Fuhg, Nikolaos Bouklas, K.T. Ramesh
  • Subjects: Computational Engineering, Finance, and Science (cs.CE); Materials Science (cond-mat.mtrl-sci); Soft Condensed Matter (cond-mat.soft)
  • Arxiv link: https://arxiv.org/abs/2304.13897
  • Pdf link: https://arxiv.org/pdf/2304.13897
  • Abstract
    A novel data-driven constitutive modeling approach is proposed, which combines the physics-informed nature of modeling based on continuum thermodynamics with the benefits of machine learning. This approach is demonstrated on strain-rate-sensitive soft materials. This model is based on the viscous dissipation-based visco-hyperelasticity framework where the total stress is decomposed into volumetric, isochoric hyperelastic, and isochoric viscous overstress contributions. It is shown that each of these stress components can be written as linear combinations of the components of an irreducible integrity basis. Three Gaussian process regression-based surrogate models are trained (one per stress component) between principal invariants of strain and strain rate tensors and the corresponding coefficients of the integrity basis components. It is demonstrated that this type of model construction enforces key physics-based constraints on the predicted responses: the second law of thermodynamics, the principles of local action and determinism, objectivity, the balance of angular momentum, an assumed reference state, isotropy, and limited memory. The three surrogate models that constitute our constitutive model are evaluated by training them on small-size numerically generated data sets corresponding to a single deformation mode and then analyzing their predictions over a much wider testing regime comprising multiple deformation modes. Our physics-informed data-driven constitutive model predictions are compared with the corresponding predictions of classical continuum thermodynamics-based and purely data-driven models. It is shown that our surrogate models can reasonably capture the stress-strain-strain rate responses in both training and testing regimes, and provide improvements in terms of prediction accuracy, generalizability to multiple deformation modes, and compatibility with limited data.

Conditional dominance in games with unawareness

  • Authors: Martin Meier, Burkhard C. Schipper
  • Subjects: Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2304.13901
  • Pdf link: https://arxiv.org/pdf/2304.13901
  • Abstract
    Heifetz, Meier, and Schipper (2013) introduced dynamic games with unawareness, consisting of a partially ordered set of games in extensive form. Here, we study the normal form of dynamic games with unawareness. The generalized normal form associated with a dynamic game with unawareness consists of a partially ordered set of games in normal form. We use the generalized normal form to characterize extensive-form rationalizability (resp., prudent rationalizability) in dynamic games with unawareness by iterated conditional strict (resp., weak) dominance in the associated generalized normal form. We also show that the analogue to iterated admissibility for dynamic games with unawareness depends on the extensive-form structure. This is because under unawareness, a player's information set not only determines which nodes she considers possible but also of which game tree(s) she is aware.

Level Assembly as a Markov Decision Process

  • Authors: Colan F. Biemer, Seth Cooper
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13922
  • Pdf link: https://arxiv.org/pdf/2304.13922
  • Abstract
    Many games feature a progression of levels that doesn't adapt to the player. This can be problematic because some players may get stuck if the progression is too difficult, while others may find it boring if the progression is too slow to get to more challenging levels. This can be addressed by building levels based on the player's performance and preferences. In this work, we formulate the problem of generating levels for a player as a Markov Decision Process (MDP) and use adaptive dynamic programming (ADP) to solve the MDP before assembling a level. We tested with two case studies and found that using an ADP outperforms two baselines. Furthermore, we experimented with player proxies and switched them in the middle of play, and we show that a simple modification prior to running ADP results in quick adaptation. By using ADP, which searches the entire MDP, we produce a dynamic progression of levels that adapts to the player.
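
As a baseline sketch of solving such an MDP (plain value iteration over known transitions; the paper's ADP instead updates its estimates online from the player's revealed behavior):

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """P: (A, S, S) transition probabilities, R: (S, A) rewards -> greedy policy."""
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * np.einsum("ast,t->sa", P, V)   # expected return per (s, a)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return Q.argmax(axis=1)   # e.g., which level segment to assemble next
        V = V_new

rng = np.random.default_rng(0)
P = rng.random((3, 5, 5)); P /= P.sum(axis=2, keepdims=True)
policy = value_iteration(P, rng.random((5, 3)))
```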

A One-Dimensional Symmetric Force-Based Blending Method for Atomistic-to-Continuum Coupling

  • Authors: Elaine Gorom-Alexander, Xingjie Helen Li
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13939
  • Pdf link: https://arxiv.org/pdf/2304.13939
  • Abstract
    Inspired by the blending method developed by [P. Seleson, S. Beneddine, and S. Prudhome, \emph{A Force-Based Coupling Scheme for Peridynamics and Classical Elasticity}, (2013)] for the nonlocal-to-local coupling, we create a symmetric and consistent blended force-based Atomistic-to-Continuum (a/c) scheme for the atomistic chain in one-dimensional space. The conditions for the well-posedness of the underlying model are established by analyzing an optimal blending size and blending type to ensure the $H^1$ semi-norm stability for the blended force-based operator. We present several numerical experiments to test and confirm the theoretical findings.

Provably Stabilizing Global-Position Tracking Control for Hybrid Models of Multi-Domain Bipedal Walking via Multiple Lyapunov Analysis

  • Authors: Yuan Gao, Kentaro Barhydt, Christopher Niezrecki, Yan Gu
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.13943
  • Pdf link: https://arxiv.org/pdf/2304.13943
  • Abstract
    Accurate control of a humanoid robot's global position (i.e., its three-dimensional position in the world) is critical to the reliable execution of high-risk tasks such as avoiding collision with pedestrians in a crowded environment. This paper introduces a time-based nonlinear control method that achieves accurate global-position tracking (GPT) for multi-domain bipedal walking. Deriving a tracking controller for bipedal robots is challenging due to the highly complex robot dynamics that are time-varying and hybrid, especially for multi-domain walking that involves multiple phases/domains of full actuation, overactuation, and underactuation. To tackle this challenge, we introduce a continuous-phase GPT control law for multi-domain walking, which provably ensures the exponential convergence of the entire error state within the full-actuation and overactuation domains and that of the directly regulated error state within the underactuation domain. We then construct sufficient multiple-Lyapunov stability conditions for the hybrid multi-domain tracking error system under the proposed GPT control law. We illustrate the proposed controller design through both three-domain walking with all motors activated and two-domain gait with inactive ankle motors. Simulations of a ROBOTIS OP3 bipedal humanoid robot demonstrate the satisfactory accuracy and convergence rate of the proposed control approach under two different cases of multi-domain walking as well as various walking speeds and desired paths.

A central scheme for coupled hyperbolic systems

  • Authors: Michael Herty, Niklas Kolbe, Siegfried Müller
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13946
  • Pdf link: https://arxiv.org/pdf/2304.13946
  • Abstract
    A novel numerical scheme to solve coupled systems of conservation laws is introduced. The scheme is derived based on a relaxation approach and does not require information on the Lax curves of the coupled systems, which simplifies the computation of suitable coupling data. The coupling condition for the underlying relaxation system plays a crucial role as it determines the behavior of the scheme in the zero relaxation limit. The role of this condition is discussed, a consistency concept with respect to the original problem is introduced, well-posedness is analyzed and explicit, nodal Riemann solvers are provided. Based on a case study considering the p-system of gas dynamics a strategy for the design of the relaxation coupling condition within the new scheme is provided.

Data-driven time-scale separation of ODE right-hand sides using dynamic mode decomposition and time delay embedding

  • Authors: Cody J. Balos
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13971
  • Pdf link: https://arxiv.org/pdf/2304.13971
  • Abstract
    Multi-physics simulations often involve multiple different scales. The ARKODE ODE solver package in the SUNDIALS library addresses multi-scale problems with a multi-rate time-integrator that can work with a right-hand side that has fast-scale and slow-scale components. In this report, we use dynamic mode decomposition and time delay embedding to extract the fast and slow components of the right-hand sides of a simple ODE from data. We then use the extracted components to solve the ODE with ARKODE. Finally, to move towards a real-world use case, we attempt to extract fast- and slow-scale dynamics from synthetic seismic modeling data.
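
A standard Hankel-DMD sketch of the extraction step (a toy signal with one slow and one fast component; the report's exact separation of the right-hand side may differ):

```python
import numpy as np

def delay_embed(x, d):
    # Hankel matrix: d time-shifted copies of the signal, snapshots as columns.
    n = x.shape[0] - d + 1
    return np.column_stack([x[i:i + n] for i in range(d)]).T

def dmd_eigs(X, rank=4):
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vt = np.linalg.svd(X1, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]
    A_tilde = U.T @ X2 @ Vt.T / s           # reduced linear operator
    return np.linalg.eigvals(A_tilde)

t = np.linspace(0, 20, 2000)
x = np.sin(t) + 0.05 * np.sin(40 * t)       # slow + fast dynamics
eigs = dmd_eigs(delay_embed(x, 64))
freqs = np.abs(np.angle(eigs)) / (t[1] - t[0])   # small -> slow, large -> fast
```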

An FPTAS for Budgeted Laminar Matroid Independent Set

  • Authors: Ilan Doron-Arad, Ariel Kulik, Hadas Shachnai
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.13984
  • Pdf link: https://arxiv.org/pdf/2304.13984
  • Abstract
    We study the budgeted laminar matroid independent set problem. The input is a ground set, where each element has a cost and a non-negative profit, along with a laminar matroid over the elements and a budget. The goal is to select a maximum profit independent set of the matroid whose total cost is bounded by the budget. Several well known special cases, where we have, e.g., no matroid constraint (the classic knapsack problem) or a uniform matroid constraint (knapsack with a cardinality constraint), admit a fully polynomial-time approximation scheme (FPTAS). In contrast, the budgeted matroid independent set (BMI) problem with a general matroid has an efficient polynomial-time approximation scheme (EPTAS) but does not admit an FPTAS. This implies an EPTAS for our problem, which is the best known result prior to this work. We present an FPTAS for budgeted laminar matroid independent set, improving the previous EPTAS for this matroid family and generalizing the FPTAS known for knapsack with a cardinality constraint and multiple-choice knapsack. Our scheme is based on a simple dynamic program which utilizes the tree-like structure of laminar matroids.
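
The cardinality-constrained special case mentioned above makes the underlying dynamic program easy to sketch (the FPTAS additionally scales and rounds profits; this is only the exact DP it builds on):

```python
def knapsack_with_cardinality(items, budget, k):
    """Max profit using at most k items within budget; items = [(cost, profit)]."""
    NEG = float("-inf")
    # dp[j][b] = best profit using exactly j items with total cost exactly b.
    dp = [[NEG] * (budget + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for cost, profit in items:
        for j in range(k, 0, -1):                  # backwards: each item used once
            for b in range(budget, cost - 1, -1):
                if dp[j - 1][b - cost] > NEG:
                    dp[j][b] = max(dp[j][b], dp[j - 1][b - cost] + profit)
    return max(v for row in dp for v in row)

best = knapsack_with_cardinality([(3, 10.0), (2, 7.0), (4, 12.0)], budget=6, k=2)
# best == 19.0: costs 2 + 4 <= 6, profits 7 + 12
```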

Communication of information in systems of heterogeneous agents and systems' dynamics

  • Authors: Inga Ivanova
  • Subjects: Computers and Society (cs.CY); Information Theory (cs.IT); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.14013
  • Pdf link: https://arxiv.org/pdf/2304.14013
  • Abstract
    Communication of information in complex systems can be considered a major driver of systems' evolution. What matters is not the communicated information by itself but rather the meaning that is supplied to the information. However, informational exchange in a system of heterogeneous agents, which code and decode information with different meaning-processing structures, is more complex than a simple input-output model. The structural difference of coding and decoding algorithms in a system of three or more groups of agents, entertaining different sets of communication codes, provides a source of additional options that has an impact on the system's dynamics. The mechanisms of meaning and information processing can be evaluated analytically in a model framework. The results show that model predictions accurately fit empirically observed data in systems of different origins.

Unification of Lagrangian staggered-grid hydrodynamics and cell-centered hydrodynamics in one dimension

  • Authors: Xihua Xu
  • Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
  • Arxiv link: https://arxiv.org/abs/2304.14054
  • Pdf link: https://arxiv.org/pdf/2304.14054
  • Abstract
    This paper focuses on a novel scheme to unify both Lagrangian staggered-grid and cell-centered hydrodynamic methods in one dimension. The scheme neither contains empirical parameters nor solves the Riemann problem. It includes two key points: one is the relationship between pressure and velocity, and the other is Newton's second law. The two methods that make use of this scheme satisfy the entropy condition and are conservative in total mass, momentum, and energy. Numerical results show the robustness and accuracy of both methods.

Comparison of Optimization-Based Methods for Energy-Optimal Quadrotor Motion Planning

  • Authors: Welf Rehberg, Joaquim Ortiz-Haro, Marc Toussaint, Wolfgang Hönig
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14062
  • Pdf link: https://arxiv.org/pdf/2304.14062
  • Abstract
    Quadrotors are agile flying robots that are challenging to control. Considering the full dynamics of quadrotors during motion planning is crucial to achieving good solution quality and small tracking errors during flight. Optimization-based methods scale well with high-dimensional state spaces and can handle dynamic constraints directly; therefore, they are often used in these scenarios. The resulting optimization problem is notoriously difficult to solve due to its nonconvex constraints. In this work, we present an analysis of four solvers for nonlinear trajectory optimization (KOMO, direct collocation with SCvx, direct collocation with CasADi, Crocoddyl) and evaluate their performance in scenarios where the solvers are tasked to find minimum-effort solutions to geometrically complex problems and problems requiring highly dynamic solutions. Benchmarking these methods helps to determine the best algorithm structures for these kinds of problems.

Compositional 3D Human-Object Neural Animation

  • Authors: Zhi Hou, Baosheng Yu, Dacheng Tao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14070
  • Pdf link: https://arxiv.org/pdf/2304.14070
  • Abstract
    Human-object interactions (HOIs) are crucial for human-centric scene understanding applications such as human-centric visual generation, AR/VR, and robotics. Since existing methods mainly explore capturing HOIs, rendering HOIs remains less investigated. In this paper, we address this challenge in HOI animation from a compositional perspective, i.e., animating novel HOIs including novel interaction, novel human and/or novel object driven by a novel pose sequence. Specifically, we adopt neural human-object deformation to model and render HOI dynamics based on implicit neural representations. To enable interaction pose transfer among different persons and objects, we then devise a new compositional conditional neural radiance field (or CC-NeRF), which decomposes the interdependence between human and object using latent codes to enable compositional animation control of novel HOIs. Experiments show that the proposed method can generalize well to various novel HOI animation settings. Our project page is https://zhihou7.github.io/CHONA/

Inferring Preferences from Demonstrations in Multi-objective Reinforcement Learning: A Dynamic Weight-based Approach

  • Authors: Junlin Lu, Patrick Mannion, Karl Mason
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14115
  • Pdf link: https://arxiv.org/pdf/2304.14115
  • Abstract
    Many decision-making problems feature multiple objectives. In such problems, it is not always possible to know the preferences of a decision-maker for different objectives. However, it is often possible to observe the behavior of decision-makers. In multi-objective decision-making, preference inference is the process of inferring the preferences of a decision-maker for different objectives. This research proposes a Dynamic Weight-based Preference Inference (DWPI) algorithm that can infer the preferences of agents acting in multi-objective decision-making problems, based on observed behavior trajectories in the environment. The proposed method is evaluated on three multi-objective Markov decision processes: Deep Sea Treasure, Traffic, and Item Gathering. The performance of the proposed DWPI approach is compared to two existing preference inference methods from the literature, and empirical results demonstrate significant improvements over the baseline algorithms, in terms of both time requirements and accuracy of the inferred preferences. DWPI also maintains its performance when inferring preferences from sub-optimal behavior demonstrations. In addition to its strong performance, DWPI does not require any interactions during training with the agent whose preferences are inferred; all that is required is a trajectory of observed behavior.

Learning Neural PDE Solvers with Parameter-Guided Channel Attention

  • Authors: Makoto Takamoto, Francesco Alesiani, Mathias Niepert
  • Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn); Geophysics (physics.geo-ph)
  • Arxiv link: https://arxiv.org/abs/2304.14118
  • Pdf link: https://arxiv.org/pdf/2304.14118
  • Abstract
    Scientific Machine Learning (SciML) is concerned with the development of learned emulators of physical systems governed by partial differential equations (PDE). In application domains such as weather forecasting, molecular dynamics, and inverse design, ML-based surrogate models are increasingly used to augment or replace inefficient and often non-differentiable numerical simulation algorithms. While a number of ML-based methods for approximating the solutions of PDEs have been proposed in recent years, they typically do not adapt to the parameters of the PDEs, making it difficult to generalize to PDE parameters not seen during training. We propose a Channel Attention mechanism guided by PDE Parameter Embeddings (CAPE) component for neural surrogate models and a simple yet effective curriculum learning strategy. The CAPE module can be combined with neural PDE solvers, allowing them to adapt to unseen PDE parameters. The curriculum learning strategy provides a seamless transition between teacher-forcing and fully auto-regressive training. We compare CAPE in conjunction with the curriculum learning strategy using a popular PDE benchmark and obtain consistent and significant improvements over the baseline models. The experiments also show several advantages of CAPE, such as its increased ability to generalize to unseen PDE parameters without large increases in inference time and parameter count.

A particle method for non-local advection-selection-mutation equations

  • Authors: Frank Ernesto Alvarez, Jules Guilberteau
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14210
  • Pdf link: https://arxiv.org/pdf/2304.14210
  • Abstract
    The well-posedness of a non-local advection-selection-mutation problem deriving from adaptive dynamics models is shown for a wide family of initial data. A particle method is then developed, in order to approximate the solution of such a problem by a regularised sum of weighted Dirac masses whose characteristics solve a suitably defined ODE system. The convergence of the particle method over any finite interval is shown and an explicit rate of convergence is given. Furthermore, we investigate the asymptotic-preserving properties of the method at large times, providing sufficient conditions for them to hold as well as examples and counter-examples. Finally, we illustrate the method in two cases taken from the literature.
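
As a toy illustration of the particle ansatz (a minimal sketch under an assumed advection field, not the paper's scheme), the following approximates a transported density by weighted Dirac masses whose positions solve a characteristic ODE, regularised by a Gaussian mollifier:

```python
# Particle method sketch: u(t, x) ~ sum_i w_i * delta(x_i(t)), positions
# follow the characteristic ODE x' = b(x); the field b is an assumption.
import numpy as np

def b(x):                           # illustrative advection field
    return -x

x = np.linspace(-3, 3, 200)         # particle positions (characteristics)
w = np.exp(-x**2) * (x[1] - x[0])   # weights ~ initial mass near each particle
dt, eps = 0.01, 0.1

for _ in range(100):                # forward-Euler integration of the ODE system
    x = x + dt * b(x)

def u(grid, x, w, eps):
    """Regularise the sum of weighted Dirac masses with a Gaussian of width eps."""
    K = np.exp(-((grid[:, None] - x[None, :]) / eps) ** 2 / 2)
    return (K / (eps * np.sqrt(2 * np.pi))) @ w

grid = np.linspace(-3, 3, 50)
print(u(grid, x, w, eps).max())     # reconstructed density on the grid
```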

Some of the variables, some of the parameters, some of the times, with some physics known: Identification with partial information

  • Authors: Saurabh Malani, Tom S. Bertalan, Tianqi Cui, Jose L. Avalos, Michael Betenbaugh, Ioannis G. Kevrekidis
  • Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.14214
  • Pdf link: https://arxiv.org/pdf/2304.14214
  • Abstract
    Experimental data are often composed of variables measured independently, at different sampling rates (non-uniform $\Delta t$ between successive measurements); at a specific time point, only a subset of all variables may be sampled. Approaches to identifying dynamical systems from such data typically use interpolation, imputation or subsampling to reorganize or modify the training data $\textit{prior}$ to learning. Partial physical knowledge may also be available $\textit{a priori}$ (accurately or approximately), and data-driven techniques can complement this knowledge. Here we exploit neural network architectures based on numerical integration methods and $\textit{a priori}$ physical knowledge to identify the right-hand side of the underlying governing differential equations. Iterates of such neural-network models allow for learning from data sampled at arbitrary time points $\textit{without}$ data modification. Importantly, we integrate the network with available partial physical knowledge in "physics informed gray-boxes"; this enables learning unknown kinetic rates or microbial growth functions while simultaneously estimating experimental parameters.
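
A minimal sketch of the gray-box idea follows; the assumed known term, the network architecture, and the placeholder data are all illustrative assumptions, not the paper's models:

```python
# Gray-box sketch: right-hand side = known physics term (with an unknown rate
# to estimate) + neural correction; an explicit integrator step compares states
# sampled at arbitrary, per-sample gaps dt, so no resampling is needed.
import torch, torch.nn as nn

class GrayBoxRHS(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, dim))
        self.k = nn.Parameter(torch.tensor(0.5))   # unknown physical rate to estimate

    def forward(self, y):
        known = -self.k * y            # partially known physics (assumed form)
        return known + self.net(y)     # network learns the unmodelled remainder

def euler_step(rhs, y, dt):
    return y + dt * rhs(y)             # dt may differ per sample

rhs = GrayBoxRHS(dim=2)
opt = torch.optim.Adam(rhs.parameters(), lr=1e-3)
y0, y1 = torch.randn(16, 2), torch.randn(16, 2)   # placeholder state pairs
dt = torch.rand(16, 1)                             # non-uniform sampling gaps
opt.zero_grad()
loss = ((euler_step(rhs, y0, dt) - y1) ** 2).mean()
loss.backward(); opt.step()
```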

Fast Sampling of $b$-Matchings and $b$-Edge Covers

  • Authors: Zongchen Chen, Yuzhou Gu
  • Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Combinatorics (math.CO); Probability (math.PR)
  • Arxiv link: https://arxiv.org/abs/2304.14289
  • Pdf link: https://arxiv.org/pdf/2304.14289
  • Abstract
    For integer $b \ge 1$, a $b$-matching (resp. $b$-edge cover) of a graph $G=(V,E)$ is a subset $S\subseteq E$ of edges such that every vertex is incident with at most (resp. at least) $b$ edges from $S$. We prove that for any $b \ge 1$ the simple Glauber dynamics for sampling (weighted) $b$-matchings and $b$-edge covers mixes in $O(n\log n)$ time on all $n$-vertex bounded-degree graphs. This significantly improves upon previous results which have worse running time and only work for $b$-matchings with $b \le 7$ and for $b$-edge covers with $b \le 2$. More generally, we prove spectral independence for a broad class of binary symmetric Holant problems with log-concave signatures, including $b$-matchings, $b$-edge covers, and antiferromagnetic $2$-spin edge models. We hence deduce optimal mixing time of Glauber dynamics from spectral independence.
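
For intuition, here is a minimal sketch of the single-edge Glauber (heat-bath) dynamics for weighted $b$-matchings analyzed above; the graph, edge weight, and step count are illustrative:

```python
# Glauber dynamics for weighted b-matchings: stationary distribution ~ lam^|S|.
# Each step resamples one uniformly random edge conditioned on the rest.
import random

def glauber_b_matching(edges, b, lam, steps, seed=0):
    rng = random.Random(seed)
    in_S = [False] * len(edges)
    deg = {}                                     # current S-degree of each vertex
    for _ in range(steps):
        i = rng.randrange(len(edges))
        u, v = edges[i]
        if in_S[i]:                              # resample: first drop the edge
            in_S[i] = False
            deg[u] -= 1; deg[v] -= 1
        feasible = deg.get(u, 0) < b and deg.get(v, 0) < b
        # heat-bath: include with prob lam/(1+lam) if feasible, else exclude
        if feasible and rng.random() < lam / (1 + lam):
            in_S[i] = True
            deg[u] = deg.get(u, 0) + 1
            deg[v] = deg.get(v, 0) + 1
    return [e for e, s in zip(edges, in_S) if s]

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(glauber_b_matching(edges, b=2, lam=1.0, steps=10_000))
```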

Structured interpolation for multivariate transfer functions of quadratic-bilinear systems

  • Authors: Peter Benner, Serkan Gugercin, Steffen W. R. Werner
  • Subjects: Numerical Analysis (math.NA); Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.14292
  • Pdf link: https://arxiv.org/pdf/2304.14292
  • Abstract
    High-dimensional/high-fidelity nonlinear dynamical systems appear naturally when the goal is to accurately model real-world phenomena. Many physical properties are thereby encoded in the internal differential structure of these resulting large-scale nonlinear systems. The high-dimensionality of the dynamics causes computational bottlenecks, especially when these large-scale systems need to be simulated for a variety of situations such as different forcing terms. This motivates model reduction where the goal is to replace the full-order dynamics with accurate reduced-order surrogates. Interpolation-based model reduction has been proven to be an effective tool for the construction of cheap-to-evaluate surrogate models that preserve the internal structure in the case of weak nonlinearities. In this paper, we consider the construction of multivariate interpolants in frequency domain for structured quadratic-bilinear systems. We propose definitions for structured variants of the symmetric subsystem and generalized transfer functions of quadratic-bilinear systems and provide conditions for structure-preserving interpolation by projection. The theoretical results are illustrated using two numerical examples including the simulation of molecular dynamics in crystal structures.

On Solution Discovery via Reconfiguration

  • Authors: Michael R. Fellows, Mario Grobler, Nicole Megow, Amer E. Mouawad, Vijayaragunathan Ramamoorthi, Frances A. Rosamond, Daniel Schmand, Sebastian Siebertz
  • Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.14295
  • Pdf link: https://arxiv.org/pdf/2304.14295
  • Abstract
    The dynamics of real-world applications and systems require efficient methods for improving infeasible solutions or restoring corrupted ones by making modifications to the current state of a system in a restricted way. We propose a new framework of solution discovery via reconfiguration for constructing a feasible solution for a given problem by executing a sequence of small modifications starting from a given state. Our framework integrates and formalizes different aspects of classical local search, reoptimization, and combinatorial reconfiguration. We exemplify our framework on a multitude of fundamental combinatorial problems, namely Vertex Cover, Independent Set, Dominating Set, and Coloring. We study the classical as well as the parameterized complexity of the solution discovery variants of those problems and explore the boundary between tractable and intractable instances.

Learning Absorption Rates in Glucose-Insulin Dynamics from Meal Covariates

  • Authors: Ke Alexander Wang, Matthew E. Levine, Jiaxin Shi, Emily B. Fox
  • Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Quantitative Methods (q-bio.QM)
  • Arxiv link: https://arxiv.org/abs/2304.14300
  • Pdf link: https://arxiv.org/pdf/2304.14300
  • Abstract
    Traditional models of glucose-insulin dynamics rely on heuristic parameterizations chosen to fit observations within a laboratory setting. However, these models cannot describe glucose dynamics in daily life. One source of failure is in their descriptions of glucose absorption rates after meal events. A meal's macronutritional content has nuanced effects on the absorption profile, which is difficult to model mechanistically. In this paper, we propose to learn the effects of macronutrition content from glucose-insulin data and meal covariates. Given macronutrition information and meal times, we use a neural network to predict an individual's glucose absorption rate. We use this neural rate function as the control function in a differential equation of glucose dynamics, enabling end-to-end training. On simulated data, our approach is able to closely approximate true absorption rates, resulting in better forecasts than heuristic parameterizations, despite only observing glucose, insulin, and macronutritional information. Our work readily generalizes to meal events with higher-dimensional covariates, such as images, setting the stage for glucose dynamics models that are personalized to each individual's daily life.
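
A toy sketch of the core idea (not the paper's simulator; the one-compartment dynamics, units, and network are placeholder assumptions): a network maps meal covariates to an absorption-rate curve that enters a glucose ODE as the control term, so the whole model is differentiable end-to-end.

```python
# Neural absorption rate as the control function of a toy glucose ODE.
import torch, torch.nn as nn

class AbsorptionRate(nn.Module):
    """Maps (carbs, fat, protein, time since meal) -> absorption rate >= 0."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1),
                                 nn.Softplus())
    def forward(self, macros, t_since_meal):
        return self.net(torch.cat([macros, t_since_meal], dim=-1)).squeeze(-1)

def glucose_step(g, rate, dt, decay=0.1):
    # toy one-compartment dynamics: dG/dt = -decay * G + absorption rate
    return g + dt * (-decay * g + rate)

rate_fn = AbsorptionRate()
macros = torch.tensor([[60.0, 10.0, 20.0]])     # carbs/fat/protein (assumed units)
g, dt, traj = torch.tensor([90.0]), 5.0, []
for step in range(24):                          # simulate 2 hours after the meal
    t = torch.tensor([[step * dt]])
    g = glucose_step(g, rate_fn(macros, t), dt)
    traj.append(g)                              # differentiable: loss on traj trains rate_fn
```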

Empirical Individual State Observability

  • Authors: Benjamin Cellini, Burak Boyacıoğlu, Floris van Breugel
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.14313
  • Pdf link: https://arxiv.org/pdf/2304.14313
  • Abstract
    A dynamical system is observable if there is a one-to-one mapping from the system's measured outputs and inputs to all of the system's states. Analytical and empirical tools exist for quantifying the (full state) observability of linear and nonlinear systems; however, empirical tools for evaluating the observability of individual state variables are lacking. Here, a new empirical approach termed Empirical Individual State Observability (E-ISO) is developed to quantify the level of observability of individual state variables. E-ISO first builds an empirical observability matrix via simulation, then applies convex optimization to efficiently determine the subset of its rows required to estimate each state variable individually. Finally, (un)observability measures for these subsets are calculated to provide independent estimates of the observability of each state variable. Multiple example applications of E-ISO on linear and nonlinear systems are shown to be consistent with analytical results. Broadly, E-ISO will be an invaluable tool both for designing active sensing control laws or optimizing sensor placement to increase the observability of individual state variables in engineered systems, and for analyzing the trajectory decisions made by organisms.
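
The first step, building an empirical observability matrix by simulation, can be sketched as below (a toy system and finite-difference construction under our assumptions; E-ISO's convex row-selection step is omitted):

```python
# Empirical observability matrix: perturb each initial state, simulate the
# measured output trajectory, and central-difference the outputs.
import numpy as np

def simulate(x0, steps=50, dt=0.01):
    """Toy nonlinear system: damped pendulum, measuring only the angle."""
    x, ys = np.asarray(x0, float), []
    for _ in range(steps):
        dx = np.array([x[1], -np.sin(x[0]) - 0.1 * x[1]])
        x = x + dt * dx
        ys.append(x[0])                          # scalar measurement y = angle
    return np.array(ys)

def empirical_observability_matrix(x0, eps=1e-4):
    x0 = np.asarray(x0, float)
    cols = []
    for i in range(len(x0)):
        e = np.zeros(len(x0)); e[i] = eps
        cols.append((simulate(x0 + e) - simulate(x0 - e)) / (2 * eps))
    return np.column_stack(cols)                 # rows: times, columns: states

O = empirical_observability_matrix([0.5, 0.0])
print(np.linalg.svd(O, compute_uv=False))        # small singular value => weak observability
```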

An Audit Framework for Adopting AI-Nudging on Children

  • Authors: Marianna Ganapini, Enrico Panai
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14338
  • Pdf link: https://arxiv.org/pdf/2304.14338
  • Abstract
    This is an audit framework for AI-nudging. Unlike the static form of nudging usually discussed in the literature, we focus here on a type of nudging that uses large amounts of data to provide personalized, dynamic feedback and interfaces. We call this AI-nudging (Lanzing, 2019, p. 549; Yeung, 2017). The ultimate goal of the audit outlined here is to ensure that an AI system that uses nudges will maintain a level of moral inertia and neutrality by complying with the recommendations, requirements, or suggestions of the audit (in other words, the criteria of the audit). In the case of unintended negative consequences, the audit suggests risk mitigation mechanisms that can be put in place. In the case of unintended positive consequences, it suggests some reinforcement mechanisms. Sponsored by the IBM-Notre Dame Tech Ethics Lab

SMAT: A Self-Reinforcing Framework for Simultaneous Mapping and Tracking in Unbounded Urban Environments

  • Authors: Tingxiang Fan, Bowen Shen, Yinqiang Zhang, Chuye Zhang, Lei Yang, Hua Chen, Wei Zhang, Jia Pan
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14356
  • Pdf link: https://arxiv.org/pdf/2304.14356
  • Abstract
    With the increasing prevalence of robots in daily life, it is crucial to enable robots to construct a reliable map online to navigate in unbounded and changing environments. Although existing methods can individually achieve the goals of spatial mapping and dynamic object detection and tracking, limited research has been conducted on an effective combination of these two important abilities. The proposed framework, SMAT (Simultaneous Mapping and Tracking), integrates the front-end dynamic object detection and tracking module with the back-end static mapping module using a self-reinforcing mechanism, which promotes mutual improvement of mapping and tracking performance. The conducted experiments demonstrate the framework's effectiveness in real-world applications, achieving successful long-range navigation and mapping in multiple urban environments using only one LiDAR, a CPU-only onboard computer, and a consumer-level GPS receiver.

Measuring and Modeling the Free Content Web

  • Authors: Abdulrahman Alabduljabbar, Runyu Ma, Ahmed Abusnaina, Rhongho Jang, Songqing Chen, DaeHun Nyang, and David Mohaisen
  • Subjects: Computers and Society (cs.CY); Cryptography and Security (cs.CR); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.14359
  • Pdf link: https://arxiv.org/pdf/2304.14359
  • Abstract
    Free content websites that provide free books, music, games, movies, etc., have existed on the Internet for many years. While it is a common belief that such websites might be different from premium websites providing the same content types, an analysis that supports this belief is lacking in the literature. In particular, it is unclear if those websites are as safe as their premium counterparts. In this paper, we set out to investigate, by analysis and quantification, the similarities and differences between free content and premium websites, including their risk profiles. To conduct this analysis, we assembled a list of 834 free content websites offering books, games, movies, music, and software, and 728 premium websites offering content of the same type. We then contribute domain-, content-, and risk-level analysis, examining and contrasting the websites' domain names, creation times, SSL certificates, HTTP requests, page size, average load time, and content type. For risk analysis, we consider and examine the maliciousness of these websites at the website- and component-level. Among other interesting findings, we show that free content websites tend to be vastly distributed across the TLDs and exhibit more dynamics with an upward trend for newly registered domains. Moreover, the free content websites are 4.5 times more likely to utilize an expired certificate, 19 times more likely to be malicious at the website level, and 2.64 times more likely to be malicious at the component level. Encouraged by the clear differences between the two types of websites, we explore the automation and generalization of the risk modeling of the free content risky websites, showing that a simple machine learning-based technique can produce 86.81% accuracy in identifying them.

Learning Neural Constitutive Laws From Motion Observations for Generalizable PDE Dynamics

  • Authors: Pingchuan Ma, Peter Yichen Chen, Bolei Deng, Joshua B. Tenenbaum, Tao Du, Chuang Gan, Wojciech Matusik
  • Subjects: Machine Learning (cs.LG); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.14369
  • Pdf link: https://arxiv.org/pdf/2304.14369
  • Abstract
    We propose a hybrid neural network (NN) and PDE approach for learning generalizable PDE dynamics from motion observations. Many NN approaches learn an end-to-end model that implicitly models both the governing PDE and constitutive models (or material models). Without explicit PDE knowledge, these approaches cannot guarantee physical correctness and have limited generalizability. We argue that the governing PDEs are often well-known and should be explicitly enforced rather than learned. Instead, constitutive models are particularly suitable for learning due to their data-fitting nature. To this end, we introduce a new framework termed "Neural Constitutive Laws" (NCLaw), which utilizes a network architecture that strictly guarantees standard constitutive priors, including rotation equivariance and undeformed state equilibrium. We embed this network inside a differentiable simulation and train the model by minimizing a loss function based on the difference between the simulation and the motion observation. We validate NCLaw on various large-deformation dynamical systems, ranging from solids to fluids. After training on a single motion trajectory, our method generalizes to new geometries, initial/boundary conditions, temporal ranges, and even multi-physics systems. On these extremely out-of-distribution generalization tasks, NCLaw is orders-of-magnitude more accurate than previous NN approaches. Real-world experiments demonstrate our method's ability to learn constitutive laws from videos.

Pseudo-Hamiltonian neural networks for learning partial differential equations

  • Authors: Sølve Eidnes, Kjetil Olsen Lye
  • Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14374
  • Pdf link: https://arxiv.org/pdf/2304.14374
  • Abstract
    Pseudo-Hamiltonian neural networks (PHNN) were recently introduced for learning dynamical systems that can be modelled by ordinary differential equations. In this paper, we extend the method to partial differential equations. The resulting model comprises up to three neural networks, modelling terms representing conservation, dissipation and external forces, and discrete convolution operators that can either be learned or be prior knowledge. We demonstrate numerically the superior performance of PHNN compared to a baseline model that models the full dynamics by a single neural network. Moreover, since the PHNN model consists of three parts with different physical interpretations, these can be studied separately to gain insight into the system, and the learned model remains applicable if external forces are removed or changed.
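
A schematic sketch of the three-part structure described above; how the terms are wired together here is our assumption for illustration, not the paper's exact discretization:

```python
# Three separate networks (conservation, dissipation, external force) plus a
# learnable discrete convolution standing in for a spatial operator. Purely
# schematic: the composition below is an illustrative assumption.
import torch, torch.nn as nn

class PHNNSketch(nn.Module):
    def __init__(self, width=32, kernel=3):
        super().__init__()
        mlp = lambda: nn.Sequential(nn.Conv1d(1, width, 1), nn.Tanh(),
                                    nn.Conv1d(width, 1, 1))
        self.conserve, self.dissipate, self.force = mlp(), mlp(), mlp()
        self.dx = nn.Conv1d(1, 1, kernel, padding=kernel // 2, bias=False)

    def forward(self, u):
        # u: (batch, 1, grid). Each term can be inspected, removed, or swapped
        # separately -- the property the abstract highlights.
        return self.dx(self.conserve(u)) - self.dissipate(u) + self.force(u)

u = torch.randn(4, 1, 64)
du_dt = PHNNSketch()(u)      # learned right-hand side of the PDE, u_t = ...
```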

Dynamic Pricing and Learning with Bayesian Persuasion

  • Authors: Shipra Agrawal, Yiding Feng, Wei Tang
  • Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14385
  • Pdf link: https://arxiv.org/pdf/2304.14385
  • Abstract
    We consider a novel dynamic pricing and learning setting where in addition to setting prices of products in sequential rounds, the seller also ex-ante commits to 'advertising schemes'. That is, in the beginning of each round the seller can decide what kind of signal they will provide to the buyer about the product's quality upon realization. Using the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses, we formulate the problem of finding an optimal design of the advertising scheme along with a pricing scheme that maximizes the seller's expected revenue. Without any apriori knowledge of the buyers' demand function, our goal is to design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy. We study the regret of the algorithm when compared to the optimal clairvoyant price and advertising scheme. Our main result is a computationally efficient online algorithm that achieves an $O(T^{2/3}(m\log T)^{1/3})$ regret bound when the valuation function is linear in the product quality. Here $m$ is the cardinality of the discrete product quality domain and $T$ is the time horizon. This result requires some natural monotonicity and Lipschitz assumptions on the valuation function, but no Lipschitz or smoothness assumption on the buyers' demand function. For constant $m$, our result matches the regret lower bound for dynamic pricing within logarithmic factors, which is a special case of our problem. We also obtain several improved results for the widely considered special case of additive valuations, including an $\tilde{O}(T^{2/3})$ regret bound independent of $m$ when $m\le T^{1/3}$.

SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos

  • Authors: John Z. Zhang, Shuo Yang, Gengshan Yang, Arun L. Bishop, Deva Ramanan, Zachary Manchester
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14389
  • Pdf link: https://arxiv.org/pdf/2304.14389
  • Abstract
    We present SLoMo: a first-of-its-kind framework for transferring skilled motions from casually captured "in the wild" video footage of humans and animals to legged robots. SLoMo works in three stages: 1) synthesize a physically plausible reconstructed key-point trajectory from monocular videos; 2) optimize a dynamically feasible reference trajectory for the robot offline that includes body and foot motion, as well as contact sequences that closely track the key points; 3) track the reference trajectory online using a general-purpose model-predictive controller on robot hardware. Traditional motion imitation for legged motor skills often requires expert animators, collaborative demonstrations, and/or expensive motion capture equipment, all of which limit scalability. Instead, SLoMo only relies on easy-to-obtain monocular video footage, readily available in online repositories such as YouTube. It converts videos into motion primitives that can be executed reliably by real-world robots. We demonstrate our approach by transferring the motions of cats, dogs, and humans to example robots including a quadruped (on hardware) and a humanoid (in simulation). To the best of the authors' knowledge, this is the first attempt at a general-purpose motion transfer framework that imitates animal and human motions on legged robots directly from casual videos without artificial markers or labels.

Motion-Conditioned Diffusion Model for Controllable Video Synthesis

  • Authors: Tsai-Shien Chen, Chieh Hubert Lin, Hung-Yu Tseng, Tsung-Yi Lin, Ming-Hsuan Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14404
  • Pdf link: https://arxiv.org/pdf/2304.14404
  • Abstract
    Recent advancements in diffusion models have greatly improved the quality and diversity of synthesized content. To harness the expressive power of diffusion models, researchers have explored various controllable mechanisms that allow users to intuitively guide the content synthesis process. Although the latest efforts have primarily focused on video synthesis, there has been a lack of effective methods for controlling and describing desired content and motion. In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis. To tackle the ambiguity of sparse motion inputs and achieve better synthesis quality, MCDiff first utilizes a flow completion model to predict the dense video motion based on the semantic understanding of the video frame and the sparse motion control. Then, the diffusion model synthesizes high-quality future frames to form the output video. We qualitatively and quantitatively show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis. Additional experiments on MPII Human Pose further exhibit the capability of our model on diverse content and motion synthesis.

New submissions for Fri, 24 Mar 23

Keyword: pruning

Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation

  • Authors: Bingyi Zhang, Viktor Prasanna
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2303.12901
  • Pdf link: https://arxiv.org/pdf/2303.12901
  • Abstract
    Graph Neural Network (GNN) inference is used in many real-world applications. Data sparsity in GNN inference, including sparsity in the input graph and the GNN model, offers opportunities to further speed up inference. Also, many pruning techniques have been proposed for model compression that increase the data sparsity of GNNs. We propose Dynasparse, a comprehensive hardware-software codesign on FPGA to accelerate GNN inference through dynamic sparsity exploitation. For this, we decouple the GNN computation kernels from the basic computation primitives, and explore hardware-software codesign as follows: 1) Hardware design: We propose a novel unified accelerator design on FPGA to efficiently execute various computation primitives. We develop a customized soft processor that is tightly coupled with the accelerator to execute a runtime system. Moreover, we develop efficient hardware mechanisms to profile the data sparsity and perform on-the-fly data format transformation to prepare the input data for various computation primitives; 2) Software design: We develop a runtime system that works synergistically with the accelerator to perform dynamic kernel-to-primitive mapping based on data sparsity. We implement Dynasparse on a state-of-the-art FPGA platform, Xilinx Alveo U250, and evaluate the design using widely used GNN models (GCN, GraphSAGE, GIN and SGC). For the above GNN models and various input graphs, the proposed accelerator and dynamic kernel-to-primitive mapping reduces the inference latency by $3.73\times$ on the average compared with the static mapping strategies employed in the state-of-the-art GNN accelerators. Compared with state-of-the-art CPU (GPU) implementations, Dynasparse achieves up to $56.9\times$ ($2.37\times$) speedup in end-to-end latency.
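
The software side of dynamic sparsity exploitation can be pictured with a small sketch: profile operand density at runtime and route the kernel to a sparse or dense primitive. The threshold and routing below are illustrative assumptions, not Dynasparse's actual runtime system:

```python
# Runtime kernel-to-primitive mapping based on on-the-fly sparsity profiling.
import numpy as np
from scipy.sparse import csr_matrix

def matmul_auto(A, B, density_threshold=0.1):
    """Pick a primitive for A @ B based on the profiled density of A."""
    density = np.count_nonzero(A) / A.size       # sparsity profiling
    if density < density_threshold:
        return csr_matrix(A) @ B                 # sparse-dense primitive
    return A @ B                                 # dense-dense primitive

adj = (np.random.rand(512, 512) < 0.01).astype(np.float32)   # sparse graph
feats = np.random.rand(512, 64).astype(np.float32)
out = matmul_auto(adj, feats)                    # routed to the sparse kernel
```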

CP$^3$: Channel Pruning Plug-in for Point-based Networks

  • Authors: Yaomin Huang, Ning Liu, Zhengping Che, Zhiyuan Xu, Chaomin Shen, Yaxin Peng, Guixu Zhang, Xinmei Liu, Feifei Feng, Jian Tang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.13097
  • Pdf link: https://arxiv.org/pdf/2303.13097
  • Abstract
    Channel pruning can effectively reduce both computational cost and memory footprint of the original network while keeping a comparable accuracy performance. Though great success has been achieved in channel pruning for 2D image-based convolutional networks (CNNs), existing works seldom extend the channel pruning methods to 3D point-based neural networks (PNNs). Directly applying 2D CNN channel pruning methods to PNNs undermines the performance of PNNs because of the different representations of 2D images and 3D point clouds as well as the network architecture disparity. In this paper, we propose CP$^3$, a Channel Pruning Plug-in for Point-based networks. CP$^3$ is elaborately designed to leverage the characteristics of point clouds and PNNs in order to enable 2D channel pruning methods for PNNs. Specifically, it presents a coordinate-enhanced channel importance metric to reflect the correlation between dimensional information and individual channel features, and it recycles the discarded points in PNN's sampling process and reconsiders their potentially-exclusive information to enhance the robustness of channel pruning. Experiments on various PNN architectures show that CP$^3$ consistently improves state-of-the-art 2D CNN pruning approaches on different point cloud tasks. For instance, our compressed PointNeXt-S on ScanObjectNN achieves an accuracy of 88.52% with a pruning rate of 57.8%, outperforming the baseline pruning methods with an accuracy gain of 1.94%.

DetOFA: Efficient Training of Once-for-All Networks for Object Detection by Using Pre-trained Supernet and Path Filter

  • Authors: Yuiko Sakuma, Masato Ishii, Takuya Narihira
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13121
  • Pdf link: https://arxiv.org/pdf/2303.13121
  • Abstract
    We address the challenge of training a large supernet for the object detection task, using a relatively small amount of training data. Specifically, we propose an efficient supernet-based neural architecture search (NAS) method that uses transfer learning and search space pruning. First, the supernet is pre-trained on a classification task, for which large datasets are available. Second, the search space defined by the supernet is pruned by removing candidate models that are predicted to perform poorly. To effectively remove the candidates over a wide range of resource constraints, we particularly design a performance predictor, called path filter, which can accurately predict the relative performance of the models that satisfy similar resource constraints. Hence, supernet training is more focused on the best-performing candidates. Our path filter handles prediction for paths with different resource budgets. Compared to once-for-all, our proposed method reduces the computational cost of the optimal network architecture by 30% and 63%, while yielding better accuracy-floating point operations Pareto front (0.85 and 0.45 points of improvement on average precision for Pascal VOC and COCO, respectively).

Keyword: neural architecture search

There is no result

Keyword: 3d object detection

MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer

  • Authors: Yunsong Zhou, Hongzi Zhu, Quan Liu, Shan Chang, Minyi Guo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13018
  • Pdf link: https://arxiv.org/pdf/2303.13018
  • Abstract
    Mobile monocular 3D object detection (Mono3D) (e.g., on a vehicle, a drone, or a robot) is an important yet challenging task. Existing transformer-based offline Mono3D models adopt grid-based vision tokens, which is suboptimal when using coarse tokens due to the limited available computational power. In this paper, we propose an online Mono3D framework, called MonoATT, which leverages a novel vision transformer with heterogeneous tokens of varying shapes and sizes to facilitate mobile Mono3D. The core idea of MonoATT is to adaptively assign finer tokens to areas of more significance before utilizing a transformer to enhance Mono3D. To this end, we first use prior knowledge to design a scoring network for selecting the most important areas of the image, and then propose a token clustering and merging network with an attention mechanism to gradually merge tokens around the selected areas in multiple stages. Finally, a pixel-level feature map is reconstructed from heterogeneous tokens before employing a SOTA Mono3D detector as the underlying detection core. Experiment results on the real-world KITTI dataset demonstrate that MonoATT can effectively improve the Mono3D accuracy for both near and far objects and guarantee low latency. MonoATT yields the best performance compared with the state-of-the-art methods by a large margin and is ranked number one on the KITTI 3D benchmark.

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

  • Authors: Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13510
  • Pdf link: https://arxiv.org/pdf/2303.13510
  • Abstract
    This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset. Inspired by the scene-voxel-point hierarchy in downstream 3D object detectors, we design masking and reconstruction strategies accounting for voxel distributions in the scene and local point distributions within the voxel. We employ a Reversed-Furthest-Voxel-Sampling strategy to address the uneven distribution of LiDAR points and propose MV-JAR, which combines two techniques for modeling the aforementioned distributions, resulting in superior performance. Our experiments reveal limitations in previous data-efficient experiments, which uniformly sample fine-tuning splits with varying data proportions from each LiDAR sequence, leading to similar data diversity across splits. To address this, we propose a new benchmark that samples scene sequences for diverse fine-tuning splits, ensuring adequate model convergence and providing a more accurate evaluation of pre-training methods. Experiments on our Waymo benchmark and the KITTI dataset demonstrate that MV-JAR consistently and significantly improves 3D detection performance across various data scales, achieving up to a 6.3% increase in mAPH compared to training from scratch. Codes and the benchmark will be available at https://github.com/SmartBot-PJLab/MV-JAR .

Keyword: voxel

Marching-Primitives: Shape Abstraction from Signed Distance Function

  • Authors: Weixiao Liu, Yuwei Wu, Sipu Ruan, Gregory S. Chirikjian
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13190
  • Pdf link: https://arxiv.org/pdf/2303.13190
  • Abstract
    Representing complex objects with basic geometric primitives has long been a topic in computer vision. Primitive-based representations have the merits of compactness and computational efficiency in higher-level tasks such as physics simulation, collision checking, and robotic manipulation. Unlike previous works which extract polygonal meshes from a signed distance function (SDF), in this paper, we present a novel method, named Marching-Primitives, to obtain a primitive-based abstraction directly from an SDF. Our method grows geometric primitives (such as superquadrics) iteratively by analyzing the connectivity of voxels while marching at different levels of signed distance. For each valid connected volume of interest, we march on the scope of voxels from which a primitive is able to be extracted in a probabilistic sense and simultaneously solve for the parameters of the primitive to capture the underlying local geometry. We evaluate the performance of our method on both synthetic and real-world datasets. The results show that the proposed method outperforms the state-of-the-art in terms of accuracy, and is directly generalizable among different categories and scales. The code is open-sourced at https://github.com/ChirikjianLab/Marching-Primitives.git.
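
One ingredient of the marching procedure, finding connected volumes of interest among voxels below a given signed-distance level, can be sketched as follows (toy SDF of two spheres; the primitive fitting itself is omitted):

```python
# Connectivity of interior voxels at different signed-distance levels.
import numpy as np
from scipy.ndimage import label

ax = np.linspace(-1, 1, 64)
X, Y, Z = np.meshgrid(ax, ax, ax, indexing="ij")
s1 = np.sqrt((X + 0.5)**2 + Y**2 + Z**2) - 0.3
s2 = np.sqrt((X - 0.5)**2 + Y**2 + Z**2) - 0.3
sdf = np.minimum(s1, s2)                         # toy SDF: two disjoint spheres

for level in (-0.2, -0.1, 0.0):                  # march at different levels
    _, n = label(sdf < level)                    # connected volumes of interest
    print(f"level {level:+.1f}: {n} connected volume(s) of interest")
```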

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

  • Authors: Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13510
  • Pdf link: https://arxiv.org/pdf/2303.13510
  • Abstract
    This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset. Inspired by the scene-voxel-point hierarchy in downstream 3D object detectors, we design masking and reconstruction strategies accounting for voxel distributions in the scene and local point distributions within the voxel. We employ a Reversed-Furthest-Voxel-Sampling strategy to address the uneven distribution of LiDAR points and propose MV-JAR, which combines two techniques for modeling the aforementioned distributions, resulting in superior performance. Our experiments reveal limitations in previous data-efficient experiments, which uniformly sample fine-tuning splits with varying data proportions from each LiDAR sequence, leading to similar data diversity across splits. To address this, we propose a new benchmark that samples scene sequences for diverse fine-tuning splits, ensuring adequate model convergence and providing a more accurate evaluation of pre-training methods. Experiments on our Waymo benchmark and the KITTI dataset demonstrate that MV-JAR consistently and significantly improves 3D detection performance across various data scales, achieving up to a 6.3% increase in mAPH compared to training from scratch. Codes and the benchmark will be available at https://github.com/SmartBot-PJLab/MV-JAR .

Keyword: lidar

MMFormer: Multimodal Transformer Using Multiscale Self-Attention for Remote Sensing Image Classification

  • Authors: Bo Zhang, Zuheng Ming, Wei Feng, Yaqian Liu, Liang He, Kaixing Zhao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13101
  • Pdf link: https://arxiv.org/pdf/2303.13101
  • Abstract
    To exploit the complementary information between heterogeneous data, we introduce a new Multimodal Transformer (MMFormer) for Remote Sensing (RS) image classification using Hyperspectral Image (HSI) accompanied by another source of data such as Light Detection and Ranging (LiDAR). Compared with the traditional Vision Transformer (ViT) lacking the inductive biases of convolutions, we first introduce convolutional layers to our MMFormer to tokenize patches from multimodal data of HSI and LiDAR. Then we propose a Multi-scale Multi-head Self-Attention (MSMHSA) module to address the compatibility problem that often limits the fusion of HSI with high spectral resolution and LiDAR with relatively low spatial resolution. The proposed MSMHSA module can incorporate HSI to LiDAR data in a coarse-to-fine manner, enabling us to learn a fine-grained representation. Extensive experiments on widely used benchmarks (e.g., Trento and MUUFL) demonstrate the effectiveness and superiority of our proposed MMFormer for RS image classification.

Position-Guided Point Cloud Panoptic Segmentation Transformer

  • Authors: Zeqi Xiao, Wenwei Zhang, Tai Wang, Chen Change Loy, Dahua Lin, Jiangmiao Pang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13509
  • Pdf link: https://arxiv.org/pdf/2303.13509
  • Abstract
    DEtection TRansformer (DETR) started a trend that uses a group of learnable queries for unified visual perception. This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline. Although the naive adaptation obtains fair results, the instance segmentation performance is noticeably inferior to previous works. By diving into the details, we observe that instances in the sparse point clouds are relatively small compared to the whole scene and often have similar geometry but lack distinctive appearance for segmentation, a situation rare in the image domain. Considering that instances in 3D are more characterized by their positional information, we emphasize their roles during the modeling and design a robust Mixed-parameterized Positional Embedding (MPE) to guide the segmentation process. It is embedded into backbone features and later guides the mask prediction and query update processes iteratively, leading to Position-Aware Segmentation (PA-Seg) and Masked Focal Attention (MFA). All these designs impel the queries to attend to specific regions and identify various instances. The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 3.4% and 1.2% PQ on the SemanticKITTI and nuScenes benchmarks, respectively. The source code and models are available at https://github.com/SmartBot-PJLab/P3Former .

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

  • Authors: Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13510
  • Pdf link: https://arxiv.org/pdf/2303.13510
  • Abstract
    This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset. Inspired by the scene-voxel-point hierarchy in downstream 3D object detectors, we design masking and reconstruction strategies accounting for voxel distributions in the scene and local point distributions within the voxel. We employ a Reversed-Furthest-Voxel-Sampling strategy to address the uneven distribution of LiDAR points and propose MV-JAR, which combines two techniques for modeling the aforementioned distributions, resulting in superior performance. Our experiments reveal limitations in previous data-efficient experiments, which uniformly sample fine-tuning splits with varying data proportions from each LiDAR sequence, leading to similar data diversity across splits. To address this, we propose a new benchmark that samples scene sequences for diverse fine-tuning splits, ensuring adequate model convergence and providing a more accurate evaluation of pre-training methods. Experiments on our Waymo benchmark and the KITTI dataset demonstrate that MV-JAR consistently and significantly improves 3D detection performance across various data scales, achieving up to a 6.3% increase in mAPH compared to training from scratch. Codes and the benchmark will be available at https://github.com/SmartBot-PJLab/MV-JAR .

New submissions for Mon, 17 Apr 23

Keyword: efficient

End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs

  • Authors: Javier Campos, Zhen Dong, Javier Duarte, Amir Gholami, Michael W. Mahoney, Jovan Mitrevski, Nhan Tran
  • Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); High Energy Physics - Experiment (hep-ex); Instrumentation and Detectors (physics.ins-det)
  • Arxiv link: https://arxiv.org/abs/2304.06745
  • Pdf link: https://arxiv.org/pdf/2304.06745
  • Abstract
    We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs) for efficient field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) hardware. Our approach leverages Hessian-aware quantization (HAWQ) of NNs, the Quantized Open Neural Network Exchange (QONNX) intermediate representation, and the hls4ml tool flow for transpiling NNs into FPGA and ASIC firmware. This makes efficient NN implementations in hardware accessible to nonexperts, in a single open-sourced workflow that can be deployed for real-time machine learning applications in a wide range of scientific and industrial settings. We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the CERN Large Hadron Collider (LHC). Given the high collision rate, all data processing must be implemented on custom ASIC and FPGA hardware within strict area and latency constraints. Based on these constraints, we implement an optimized mixed-precision NN classifier for high-momentum particle jets in simulated LHC proton-proton collisions.

A Polynomial Time, Pure Differentially Private Estimator for Binary Product Distributions

  • Authors: Vikrant Singhal
  • Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.06787
  • Pdf link: https://arxiv.org/pdf/2304.06787
  • Abstract
    We present the first $\varepsilon$-differentially private, computationally efficient algorithm that estimates the means of product distributions over $\{0,1\}^d$ accurately in total-variation distance, whilst attaining the optimal sample complexity to within polylogarithmic factors. The prior work had either solved this problem efficiently and optimally under weaker notions of privacy, or had solved it optimally while having exponential running times.
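
For contrast with the result above, the textbook $\varepsilon$-DP baseline for this task is the Laplace mechanism applied to the coordinate means; a minimal sketch (this is the standard baseline, not the paper's optimal estimator):

```python
# Laplace mechanism on coordinate means of samples from a product distribution
# over {0,1}^d. One user changes each coordinate mean by at most 1/n, so the
# L1 sensitivity of the d-dimensional mean vector is d/n.
import numpy as np

def dp_product_mean(samples, eps):
    """samples: (n, d) binary array; returns an eps-DP estimate of the mean."""
    n, d = samples.shape
    sensitivity = d / n
    noise = np.random.laplace(scale=sensitivity / eps, size=d)
    return samples.mean(axis=0) + noise

rng = np.random.default_rng(0)
p = rng.uniform(0.2, 0.8, size=20)
data = (rng.random((100_000, 20)) < p).astype(float)
print(np.abs(dp_product_mean(data, eps=1.0) - p).max())   # small estimation error
```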

Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction

  • Authors: Guillaume Jaume, Anurag Vaidya, Richard Chen, Drew Williamson, Paul Liang, Faisal Mahmood
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Genomics (q-bio.GN); Quantitative Methods (q-bio.QM); Tissues and Organs (q-bio.TO)
  • Arxiv link: https://arxiv.org/abs/2304.06819
  • Pdf link: https://arxiv.org/pdf/2304.06819
  • Abstract
    Integrating whole-slide images (WSIs) and bulk transcriptomics for predicting patient survival can improve our understanding of patient prognosis. However, this multimodal task is particularly challenging due to the different nature of these data: WSIs represent a very high-dimensional spatial description of a tumor, while bulk transcriptomics represent a global description of gene expression levels within that tumor. In this context, our work aims to address two key challenges: (1) how can we tokenize transcriptomics in a semantically meaningful and interpretable way?, and (2) how can we capture dense multimodal interactions between these two modalities? Specifically, we propose to learn biological pathway tokens from transcriptomics that can encode specific cellular functions. Together with histology patch tokens that encode the different morphological patterns in the WSI, we argue that they form appropriate reasoning units for downstream interpretability analyses. We propose fusing both modalities using a memory-efficient multimodal Transformer that can model interactions between pathway and histology patch tokens. Our proposed model, SURVPATH, achieves state-of-the-art performance when evaluated against both unimodal and multimodal baselines on five datasets from The Cancer Genome Atlas. Our interpretability framework identifies key multimodal prognostic factors, and, as such, can provide valuable insights into the interaction between genotype and phenotype, enabling a deeper understanding of the underlying biological mechanisms at play. We make our code public at: https://github.com/ajv012/SurvPath.

Reachability Analysis of Nonlinear Systems Using Hybrid Zonotopes and Functional Decomposition

  • Authors: Jacob A. Siefert, Trevor J. Bird, Justin P. Koeln, Neera Jain, Herschel C. Pangborn
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06827
  • Pdf link: https://arxiv.org/pdf/2304.06827
  • Abstract
    This paper proposes methods for reachability analysis of nonlinear systems in both open loop and closed loop with advanced controllers. The methods combine hybrid zonotopes, a construct called a state-update set, functional decomposition, and special ordered set approximations to enable linear growth in both reachable set memory complexity and computational complexity with time. Facilitating this combination are new identities for constructing nonconvex sets that contain nonlinear functions and for efficiently converting a collection of polytopes from vertex representation to hybrid zonotope representation. Numerical examples demonstrate reachability analysis of a continuous-time nonlinear system in closed loop with a neural network controller trained using nonlinear model predictive control and a high-dimensional logical system.

Multi-Layer Continuum Deformation Optimization of Multi-Agent Systems

  • Authors: Harshvardhan Uppaluru, Hossein Rastgoftar
  • Subjects: Multiagent Systems (cs.MA); Robotics (cs.RO); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.06839
  • Pdf link: https://arxiv.org/pdf/2304.06839
  • Abstract
    This paper studies the problem of safe and optimal continuum deformation of a large-scale multi-agent system (MAS). We present a novel approach for MAS continuum deformation coordination that aims to achieve safe and efficient agent movement using a leader-follower multi-layer hierarchical optimization framework with a single input layer, multiple hidden layers, and a single output layer. The input layer receives the reference (material) positions of the primary leaders, the hidden layers compute the desired positions of the interior leader agents and followers, and the output layer computes the nominal position of the MAS configuration. By introducing a lower bound on the major principal strains of the MAS deformation field, we obtain linear inequality safety constraints and ensure inter-agent collision avoidance. The continuum deformation optimization is formulated as a quadratic programming problem. It consists of the following components: (i) decision variables that represent the weights in the first hidden layer; (ii) a quadratic cost function that penalizes deviation of the nominal MAS trajectory from the desired MAS trajectory; and (iii) inequality safety constraints that ensure inter-agent collision avoidance. To validate the proposed approach, we simulate and present the results of continuum deformation on a large-scale quadcopter team tracking a desired helix trajectory, demonstrating improvements in safety and efficiency.

Application of the Bell polynomials for the solution of some differential-algebraic equations

  • Authors: Hari Mohan Srivastava, Giriraj Methi, Anil Kumar, Mohammad Izadi, Vishnu Narayan Mishra, Brahim Benhammouda
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.06856
  • Pdf link: https://arxiv.org/pdf/2304.06856
  • Abstract
    The differential transform method is used to find numerical approximations of solutions to a class of certain nonlinear differential-algebraic equations. The method is based on Taylor's theorem. Coefficients of the Taylor series are determined by constructing a recurrence relation. To deal with the nonlinearity of the problems, Faà di Bruno's formula containing the partial ordinary Bell polynomials is applied within the differential transform to avoid computation of symbolic derivatives. Error estimation results are presented as well. Four concrete problems are studied to show the efficiency and reliability of the method. The obtained results are compared to those of other methods.
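
The basic differential transform idea is easy to see on a linear example: the ODE induces a recurrence on the Taylor coefficients, which are then summed. For $y' = y$, $y(0) = 1$ the transform gives $Y_{k+1} = Y_k/(k+1)$, recovering $e^t$; the Bell-polynomial machinery in the paper handles nonlinear terms, which this sketch omits:

```python
# Differential transform method on y' = y, y(0) = 1: Taylor coefficients
# Y[k] = 1/k! are generated by the recurrence Y[k+1] = Y[k] / (k + 1).
from math import exp

def dtm_exponential(t, order=20):
    Y = [1.0]                                   # Y[0] = y(0)
    for k in range(order):
        Y.append(Y[k] / (k + 1))                # transform of the ODE y' = y
    return sum(c * t**k for k, c in enumerate(Y))

print(dtm_exponential(1.0), exp(1.0))           # 2.71828... vs 2.71828...
```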

Quantum Algorithms for Multiscale Partial Differential Equations

  • Authors: Junpeng Hu, Shi Jin, Lei Zhang
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.06902
  • Pdf link: https://arxiv.org/pdf/2304.06902
  • Abstract
    Partial differential equation (PDE) models with multiple temporal/spatial scales are prevalent in several disciplines such as physics, engineering, and many others. These models are of great practical importance but notoriously difficult to solve due to prohibitively small mesh and time step sizes limited by the scaling parameter and CFL condition. Another challenge in scientific computing could come from the curse of dimensionality. In this paper, we aim to provide a quantum algorithm, based on either direct approximations of the original PDEs or their homogenized models, for prototypical multiscale problems in partial differential equations (PDEs), including elliptic, parabolic and hyperbolic PDEs. To achieve this, we will lift these problems to higher dimensions and leverage the recently developed Schrödingerization-based quantum simulation algorithms to efficiently reduce the computational cost of the resulting high-dimensional and multiscale problems. We will examine the error contributions arising from discretization, homogenization, and relaxation, and analyze and compare the complexities of these algorithms in order to identify the best algorithms in terms of complexities for different equations in different regimes.

Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding

  • Authors: Yu-Qi Yang, Yu-Xiao Guo, Jian-Yu Xiong, Yang Liu, Hao Pan, Peng-Shuai Wang, Xin Tong, Baining Guo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06906
  • Pdf link: https://arxiv.org/pdf/2304.06906
  • Abstract
    Pretrained backbones with fine-tuning have been widely adopted in 2D vision and natural language processing tasks and have demonstrated significant advantages over task-specific networks. In this paper, we present a pretrained 3D backbone, named Swin3D, that for the first time outperforms all state-of-the-art methods on downstream 3D indoor scene understanding tasks. Our backbone network is based on a 3D Swin transformer and is carefully designed to efficiently conduct self-attention on sparse voxels with linear memory complexity and to capture the irregularity of point signals via generalized contextual relative positional embedding. Based on this backbone design, we pretrained a large Swin3D model on a synthetic Structured3D dataset that is 10 times larger than the ScanNet dataset and fine-tuned the pretrained model on various downstream real-world indoor scene understanding tasks. The results demonstrate that our model pretrained on the synthetic dataset not only exhibits good generality in both downstream segmentation and detection on real 3D point datasets, but also surpasses the state-of-the-art methods on downstream tasks after fine-tuning, with +2.3 mIoU and +2.2 mIoU on S3DIS Area5 and 6-fold semantic segmentation, +2.1 mIoU on ScanNet segmentation (val), +1.9 mAP@0.5 on ScanNet detection, and +8.1 mAP@0.5 on S3DIS detection. Our method demonstrates the great potential of pretrained 3D backbones with fine-tuning for 3D understanding tasks. The code and models are available at https://github.com/microsoft/Swin3D .

An NMPC-ECBF Framework for Dynamic Motion Planning and Execution in vision-based Human-Robot Collaboration

  • Authors: Dianhao Zhang, Mien Van, Pantelis Sopasakis, Seán McLoone
  • Subjects: Robotics (cs.RO); Image and Video Processing (eess.IV); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06923
  • Pdf link: https://arxiv.org/pdf/2304.06923
  • Abstract
    To enable safe and effective human-robot collaboration (HRC) in smart manufacturing, the seamless integration of sensing, cognition, and prediction into the robot controller is critical for real-time awareness, response, and communication inside a heterogeneous environment (robots, humans, and equipment). The proposed approach takes advantage of the prediction capabilities of nonlinear model predictive control (NMPC) to execute safe path planning based on feedback from a vision system. To satisfy the requirement of real-time path planning, an embedded solver based on a penalty method is applied. However, due to the tight sampling times, NMPC solutions are approximate, and hence the safety of the system cannot be guaranteed. To address this, we formulate a novel safety-critical paradigm with an exponential control barrier function (ECBF) used as a safety filter. We also design a simple human-robot collaboration scenario in V-REP to evaluate the performance of the proposed controller and investigate whether integrating human pose prediction can help with safe and efficient collaboration. The robot uses OptiTrack cameras for perception and dynamically generates collision-free trajectories to the predicted target interactive position. Results for a number of different configurations confirm the efficiency of the proposed motion planning and execution framework, yielding a 19.8% reduction in execution time for the HRC task considered.

Scale Federated Learning for Label Set Mismatch in Medical Image Classification

  • Authors: Zhipeng Deng, Luyang Luo, Hao Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06931
  • Pdf link: https://arxiv.org/pdf/2304.06931
  • Abstract
    Federated learning (FL) has been introduced to the healthcare domain as a decentralized learning paradigm that allows multiple parties to train a model collaboratively without privacy leakage. However, most previous studies have assumed that every client holds an identical label set. In reality, medical specialists tend to annotate only diseases within their knowledge domain or interest. This implies that the label sets of different clients can differ and even be disjoint. In this paper, we propose FedLSM, a framework to solve the problem of label set mismatch. FedLSM adopts different training strategies on data with different uncertainty levels to efficiently utilize unlabeled or partially labeled data, as well as class-wise adaptive aggregation in the classification layer to avoid inaccurate aggregation when clients have missing labels. We evaluate FedLSM on two public real-world medical image datasets, covering chest X-ray (CXR) diagnosis with 112,120 CXR images and skin lesion diagnosis with 10,015 dermoscopy images, and show that it significantly outperforms other state-of-the-art FL algorithms. Code will be made available upon acceptance.

Groebner.jl: A package for Gröbner bases computations in Julia

  • Authors: Alexander Demin, Shashi Gowda
  • Subjects: Mathematical Software (cs.MS); Symbolic Computation (cs.SC); Commutative Algebra (math.AC)
  • Arxiv link: https://arxiv.org/abs/2304.06935
  • Pdf link: https://arxiv.org/pdf/2304.06935
  • Abstract
    We introduce the Julia package Groebner.jl for computing Gröbner bases with the F4 algorithm. Groebner.jl is efficient, lightweight, portable, thoroughly tested, and documented open-source software. The package works over integers modulo a prime and over the rationals, and supports various monomial orderings. The implementation incorporates modern symbolic computation techniques and leverages the Julia type system and tooling, which allows Groebner.jl to be on par in performance with the leading computer algebra systems. Our package is freely available at https://github.com/sumiya11/Groebner.jl .
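
For readers without a Julia setup, the flavor of a Gröbner basis computation can be sketched with SymPy's `groebner` function (used here as a stand-in; this is not Groebner.jl's API), which likewise supports rational coefficients and several monomial orderings:

```python
from sympy import groebner, symbols

x, y = symbols("x y")
# Groebner basis of the ideal <x^2 + y, x*y - 1> in lexicographic order
G = groebner([x**2 + y, x*y - 1], x, y, order="lex")
print(G)   # e.g. GroebnerBasis([x + y**2, y**3 + 1], ...)
```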

AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks

  • Authors: Abhisek Kundu, Naveen K. Mellempudi, Dharma Teja Vooturi, Bharat Kaul, Pradeep Dubey
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06941
  • Pdf link: https://arxiv.org/pdf/2304.06941
  • Abstract
    Sparse training is emerging as a promising avenue for reducing the computational cost of training neural networks. Several recent studies have proposed pruning methods using learnable thresholds to efficiently explore the non-uniform distribution of sparsity inherent within the models. In this paper, we propose Gradient Annealing (GA), where gradients of masked weights are scaled down in a non-linear manner. GA provides an elegant trade-off between sparsity and accuracy without the need for additional sparsity-inducing regularization. We integrated GA with the latest learnable pruning methods to create an automated sparse training algorithm called AutoSparse, which achieves better accuracy and/or training/inference FLOPS reduction than existing learnable pruning methods for sparse ResNet50 and MobileNetV1 on ImageNet-1K: AutoSparse achieves (2x, 7x) reduction in (training, inference) FLOPS for ResNet50 on ImageNet at 80% sparsity. Finally, AutoSparse outperforms the sparse-to-sparse SotA method MEST (uniform sparsity) for 80% sparse ResNet50 with similar accuracy, where MEST uses 12% more training FLOPS and 50% more inference FLOPS.
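
A minimal PyTorch sketch of the gradient-annealing idea (the paper's exact non-linear annealing schedule is not reproduced here; a single scalar `alpha` stands in for it): kept weights receive full gradients, while masked weights receive scaled-down gradients so they can still recover during training.

```python
import torch

class AnnealedMask(torch.autograd.Function):
    """Forward: apply the sparsity mask. Backward: full gradient to kept
    weights, alpha-scaled gradient to masked (pruned) weights."""

    @staticmethod
    def forward(ctx, weight, mask, alpha):
        ctx.save_for_backward(mask)
        ctx.alpha = alpha
        return weight * mask

    @staticmethod
    def backward(ctx, grad_out):
        (mask,) = ctx.saved_tensors
        grad_w = grad_out * (mask + ctx.alpha * (1.0 - mask))
        return grad_w, None, None

w = torch.randn(4, 4, requires_grad=True)
mask = (w.abs() > 0.5).float()            # hypothetical magnitude-threshold mask
AnnealedMask.apply(w, mask, 0.1).sum().backward()
print(w.grad)                             # 1.0 where kept, 0.1 where masked
```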

Cultural-aware Machine Learning based Analysis of COVID-19 Vaccine Hesitancy

  • Authors: Raed Alharbi, Sylvia Chan-Olmsted, Huan Chen, My T. Thai
  • Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06953
  • Pdf link: https://arxiv.org/pdf/2304.06953
  • Abstract
    Understanding COVID-19 vaccine hesitancy, including who is hesitant and why, is crucial, since large-scale vaccine adoption remains one of the most efficient methods of controlling the pandemic. Such an understanding also provides insights into designing successful vaccination campaigns for future pandemics. Unfortunately, many factors are involved in the decision to take a vaccine, especially from the cultural point of view. To this end, we design a novel culture-aware machine learning (ML) model, based on our new data collection, for predicting vaccination willingness. We further analyze the features that contribute most to the ML model's predictions using advanced AI explainers such as the Probabilistic Graphical Model (PGM) and Shapley Additive Explanations (SHAP). These analyses reveal the key factors that most likely impact vaccine adoption decisions. Our findings show that Hispanic and African American communities are most impacted by cultural characteristics such as religion and ethnic affiliation, whereas vaccine trust and approval influence Asian communities the most. Our results also show that cultural characteristics, rumors, and political affiliation are associated with increased vaccine rejection.

Self-Supervised Learning based Depth Estimation from Monocular Images

  • Authors: Mayank Poddar, Akash Mishra, Mohit Kewlani, Haoyang Pei
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06966
  • Pdf link: https://arxiv.org/pdf/2304.06966
  • Abstract
    Depth estimation has wide-reaching applications in computer vision, such as target tracking, augmented reality, and self-driving cars. The goal of monocular depth estimation is to predict the depth map given a 2D monocular RGB image as input. Traditional depth estimation methods are based on depth cues and use concepts like epipolar geometry. With the evolution of convolutional neural networks, depth estimation has made tremendous strides. In this project, we aim to explore possible extensions to existing SoTA deep-learning-based depth estimation models and to see whether performance metrics can be further improved. In a broader sense, we look at the possibility of implementing pose estimation, efficient sub-pixel convolution interpolation, and semantic segmentation estimation techniques to further enhance our proposed architecture and provide fine-grained and more globally coherent depth map predictions. We also plan to do away with camera intrinsic parameters during training and apply weather augmentations to further generalize our model.

LightRW: FPGA Accelerated Graph Dynamic Random Walks

  • Authors: Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, Weng-Fai Wong
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.07004
  • Pdf link: https://arxiv.org/pdf/2304.07004
  • Abstract
    Graph dynamic random walks (GDRWs) have recently emerged as a powerful paradigm for graph analytics and learning applications, including graph embedding and graph neural networks. Despite the fact that many existing studies optimize the performance of GDRWs on multi-core CPUs, massive random memory accesses and costly synchronizations cause severe resource underutilization, and the processing of GDRWs is usually the key performance bottleneck in many graph applications. This paper studies an alternative architecture, FPGA, to address these issues in GDRWs, as FPGA has the ability of hardware customization so that we are able to explore fine-grained pipeline execution and specialized memory access optimizations. Specifically, we propose {LightRW}, a novel FPGA-based accelerator for GDRWs. LightRW embraces a series of optimizations to enable fine-grained pipeline execution on the chip and to exploit the massive parallelism of FPGA while significantly reducing memory accesses. As current commonly used sampling methods in GDRWs do not efficiently support fine-grained pipeline execution, we develop a parallelized reservoir sampling method to sample multiple vertices per cycle for efficient pipeline execution. To address the random memory access issues, we propose a degree-aware configurable caching method that buffers hot vertices on-chip to alleviate random memory accesses and a dynamic burst access engine that efficiently retrieves neighbors. Experimental results show that our optimization techniques are able to improve the performance of GDRWs on FPGA significantly. Moreover, LightRW delivers up to 9.55x and 9.10x speedup over the state-of-the-art CPU-based MetaPath and Node2vec random walks, respectively. This work is open-sourced on GitHub at https://github.com/Xtra-Computing/LightRW.
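
The sampling primitive that LightRW parallelizes is reservoir sampling, which draws a uniform sample from a stream of unknown length. The sequential textbook version is sketched below; the paper's contribution is a parallelized, pipeline-friendly variant of this idea for FPGAs.

```python
import random

def reservoir_sample(stream, k):
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)        # fill the reservoir first
        else:
            j = random.randint(0, i)      # inclusive on both ends
            if j < k:
                reservoir[j] = item       # replace with decreasing probability
    return reservoir

print(reservoir_sample(range(10_000), k=5))
```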

DIPNet: Efficiency Distillation and Iterative Pruning for Image Super-Resolution

  • Authors: Lei Yu, Xinpeng Li, Youwei Li, Ting Jiang, Qi Wu, Haoqiang Fan, Shuaicheng Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.07018
  • Pdf link: https://arxiv.org/pdf/2304.07018
  • Abstract
    Efficient deep learning-based approaches have achieved remarkable performance in single image super-resolution. However, recent studies on efficient super-resolution have mainly focused on reducing the number of parameters and floating-point operations through various network designs. Although these methods can decrease the number of parameters and floating-point operations, they may not necessarily reduce actual running time. To address this issue, we propose a novel multi-stage lightweight network boosting method, which can enable lightweight networks to achieve outstanding performance. Specifically, we leverage enhanced high-resolution output as additional supervision to improve the learning ability of lightweight student networks. Upon convergence of the student network, we further simplify our network structure to a more lightweight level using reparameterization techniques and iterative network pruning. Meanwhile, we adopt an effective lightweight network training strategy that combines multi-anchor distillation and progressive learning. Ultimately, our proposed method achieves the fastest inference time among all participants in the NTIRE 2023 efficient super-resolution challenge while maintaining competitive super-resolution performance. Additionally, extensive experiments are conducted to demonstrate the effectiveness of the proposed components. The results show that our approach achieves comparable performance on the representative DIV2K dataset, both qualitatively and quantitatively, with faster inference and fewer network parameters.

FairRec: Fairness Testing for Deep Recommender Systems

  • Authors: Huizhong Guo, Jinfeng Li, Jingyi Wang, Xiangyu Liu, Dongxia Wang, Zehong Hu, Rong Zhang, Hui Xue
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.07030
  • Pdf link: https://arxiv.org/pdf/2304.07030
  • Abstract
    Deep learning-based recommender systems (DRSs) are increasingly and widely deployed in industry, bringing significant convenience to people's daily lives in many ways. However, recommender systems have also been shown to suffer from multiple issues, e.g., the echo chamber and the Matthew effect, in which the notion of "fairness" plays a core role. While many fairness notions and corresponding fairness testing approaches have been developed for traditional deep classification models, they are largely inapplicable to DRSs. One major difficulty is that there is still no systematic understanding of, or mapping between, the existing fairness notions and the diverse testing requirements of deep recommender systems, not to mention further testing or debugging activities. To address this gap, we propose FairRec, a unified framework that supports fairness testing of DRSs from multiple customized perspectives, e.g., model utility, item diversity, and item popularity. We also propose a novel, efficient search-based testing approach, a double-ended discrete particle swarm optimization (DPSO) algorithm, to effectively search for hidden fairness issues in the form of disadvantaged groups among a vast number of candidate groups. Given the testing report, we show that adopting a simple re-ranking mitigation strategy on the identified disadvantaged groups can significantly improve the fairness of DRSs. We conducted extensive experiments on multiple industry-level DRSs adopted by leading companies. The results confirm that FairRec is effective and efficient in identifying deeply hidden fairness issues, e.g., achieving 95% testing accuracy in half to one-eighth of the time.

Task-oriented Document-Grounded Dialog Systems by HLTPR@RWTH for DSTC9 and DSTC10

  • Authors: David Thulke, Nico Daheim, Christian Dugast, Hermann Ney
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.07101
  • Pdf link: https://arxiv.org/pdf/2304.07101
  • Abstract
    This paper summarizes our contributions to the document-grounded dialog tasks at the 9th and 10th Dialog System Technology Challenges (DSTC9 and DSTC10). In both iterations the task consists of three subtasks: first, detect whether the current turn is knowledge seeking; second, select a relevant knowledge document; and third, generate a response grounded in the selected document. For DSTC9 we proposed different approaches to make the selection task more efficient. The best method, Hierarchical Selection, actually improves the results compared to the original baseline and gives a speedup of 24x. In the DSTC10 iteration of the task, the challenge was to adapt systems trained on written dialogs to perform well on noisy automatic speech recognition transcripts. Therefore, we proposed data augmentation techniques to increase the robustness of the models as well as methods to adapt the style of generated responses so that they fit well into the preceding dialog. Additionally, we proposed a noisy channel model that allows for increasing the factuality of the generated responses. In addition to summarizing our previous contributions, in this work we also report on a few small improvements and reconsider the automatic evaluation metrics for the generation task, which have shown a low correlation with human judgments.

Grouping Shapley Value Feature Importances of Random Forests for explainable Yield Prediction

  • Authors: Florian Huber, Hannes Engler, Anna Kicherer, Katja Herzog, Reinhard Töpfer, Volker Steinhage
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.07111
  • Pdf link: https://arxiv.org/pdf/2304.07111
  • Abstract
    Explainability in yield prediction helps us fully explore the potential of machine learning models that already achieve high accuracy for a variety of yield prediction scenarios. The data included for the prediction of yields are intricate, and the models are often difficult to understand. However, understanding the models can be simplified by using natural groupings of the input features. Grouping can be achieved, for example, by the time the features are captured or by the sensor used to do so. The state of the art for interpreting machine learning models is currently defined by the game-theoretic approach of Shapley values. To handle groups of features, the calculated Shapley values are typically added together, ignoring the theoretical limitations of this approach. We explain the concept of Shapley values computed directly for predefined groups of features and introduce an algorithm to compute them efficiently on tree structures. We provide a blueprint for designing swarm plots that combine many local explanations for global understanding. An extensive evaluation on two different yield prediction problems demonstrates the value of our approach and shows how it can enable a better understanding of yield prediction models in the future, ultimately leading to mutual enrichment of research and application.
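
A grouped Shapley value treats each feature group as a single player in the cooperative game. Below is a minimal, exponential-time sketch straight from the definition; the paper's contribution, an efficient algorithm for tree ensembles, is not reproduced here.

```python
from itertools import combinations
from math import factorial

def grouped_shapley(value_fn, n_groups):
    """Exact Shapley value per feature group. value_fn maps a frozenset of
    group indices to the model value achieved using only those groups."""
    phi = [0.0] * n_groups
    for i in range(n_groups):
        others = [g for g in range(n_groups) if g != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                S = frozenset(S)
                w = (factorial(len(S)) * factorial(n_groups - len(S) - 1)
                     / factorial(n_groups))
                phi[i] += w * (value_fn(S | {i}) - value_fn(S))
    return phi

# Toy game: value is the squared size of the coalition; symmetry gives 3.0 each.
print(grouped_shapley(lambda S: len(S) ** 2, n_groups=3))
```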

Resource Allocation and Passive Beamforming for IRS-assisted URLLC Systems

  • Authors: Yangyi Zhang, Xinrong Guan, Zhi Ji, Qingqing Wu, Yueming Cai
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.07120
  • Pdf link: https://arxiv.org/pdf/2304.07120
  • Abstract
    In this correspondence, we investigate an intelligent reflective surface (IRS) assisted downlink ultra-reliable and low-latency communication (URLLC) system, where an access point (AP) sends short packets to multiple devices with the help of an IRS. Specifically, a performance comparison between frequency division multiple access (FDMA) and time division multiple access (TDMA) is conducted for the considered system from the perspective of the average age of information (AoI). We aim to minimize the maximum average AoI among all devices by jointly optimizing the resource allocation and passive beamforming. However, the formulated problem is difficult to solve due to the non-convex objective function and coupled variables. Thus, we propose an alternating-optimization-based algorithm that divides the original problem into two sub-problems that can be solved efficiently. Simulation results show that TDMA can achieve lower AoI by exploiting the time-selective passive beamforming of the IRS to maximize the signal-to-noise ratio (SNR) of each device consecutively. They also show that as the length of the information bits becomes sufficiently large compared to the available bandwidth, the proposed FDMA transmission scheme becomes more favorable instead, due to its more effective utilization of bandwidth.

A Dynamic Heterogeneous Team-based Non-iterative Approach for Online Pick-up and Just-In-Time Delivery Problems

  • Authors: Shridhar Velhal, Srikrishna B R, Mukunda Bharatheesha, Suresh Sundaram
  • Subjects: Multiagent Systems (cs.MA); Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.07124
  • Pdf link: https://arxiv.org/pdf/2304.07124
  • Abstract
    This paper presents a non-iterative approach for finding the assignment of heterogeneous robots to efficiently execute online Pickup and Just-In-Time Delivery (PJITD) tasks with optimal resource utilization. The PJITD assignments problem is formulated as a spatio-temporal multi-task assignment (STMTA) problem. The physical constraints on the map and vehicle dynamics are incorporated in the cost formulation. The linear sum assignment problem is formulated for the heterogeneous STMTA problem. The recently proposed Dynamic Resource Allocation with Multi-task assignments (DREAM) approach has been modified to solve the heterogeneous PJITD problem. At the start, it computes the minimum number of robots required (with their types) to execute given heterogeneous PJITD tasks. These required robots are added to the team to guarantee the feasibility of all PJITD tasks. Then robots in an updated team are assigned to execute the PJITD tasks while minimizing the total cost for the team to execute all PJITD tasks. The performance of the proposed non-iterative approach has been validated using high-fidelity software-in-loop simulations and hardware experiments. The simulations and experimental results clearly indicate that the proposed approach is scalable and provides optimal resource utilization.
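
The linear sum assignment problem at the heart of the STMTA formulation is solvable in polynomial time. A minimal sketch with SciPy follows; the cost matrix here is a random placeholder, not the paper's spatio-temporal cost incorporating map and dynamics constraints.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
cost = rng.uniform(1.0, 10.0, size=(4, 6))   # 4 robots, 6 tasks (hypothetical costs)

rows, cols = linear_sum_assignment(cost)     # optimal robot-task assignment
for r, c in zip(rows, cols):
    print(f"robot {r} -> task {c} (cost {cost[r, c]:.2f})")
print("total cost:", cost[rows, cols].sum())
```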

TUM-FAÇADE: Reviewing and enriching point cloud benchmarks for façade segmentation

  • Authors: Olaf Wysocki, Ludwig Hoegner, Uwe Stilla
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.07140
  • Pdf link: https://arxiv.org/pdf/2304.07140
  • Abstract
    Point clouds are widely regarded as one of the best dataset types for urban mapping purposes. Hence, point cloud datasets are commonly investigated as benchmarks for various urban interpretation methods. Yet, few researchers have addressed the use of point cloud benchmarks for façade segmentation. Robust façade segmentation is becoming a key factor in applications ranging from simulating autonomous driving functions to preserving cultural heritage. In this work, we present a method of enriching existing point cloud datasets with façade-related classes that have been designed to facilitate façade segmentation testing. We propose how to efficiently extend existing datasets and comprehensively assess their potential for façade segmentation. We use the method to create the TUM-FAÇADE dataset, which extends the capabilities of TUM-MLS-2016. Not only can TUM-FAÇADE facilitate the development of point-cloud-based façade segmentation tasks, but our procedure can also be applied to enrich further datasets.

Eunomia: Enabling User-specified Fine-Grained Search in Symbolically Executing WebAssembly Binaries

  • Authors: Ningyu He, Zhehao Zhao, Jikai Wang, Yubin Hu, Shengjian Guo, Haoyu Wang, Guangtai Liang, Ding Li, Xiangqun Chen, Yao Guo
  • Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.07204
  • Pdf link: https://arxiv.org/pdf/2304.07204
  • Abstract
    Although existing work has proposed automated approaches to alleviate the path explosion problem of symbolic execution, users still need to optimize symbolic execution by carefully applying various searching strategies. As existing approaches mainly support only coarse-grained global searching strategies, they cannot efficiently traverse complex code structures. In this paper, we propose Eunomia, a symbolic execution technique that allows users to specify local domain knowledge to enable fine-grained search. In Eunomia, we design an expressive DSL, Aes, that lets users precisely pinpoint local searching strategies for different parts of the target program. To further optimize local searching strategies, we design an interval-based algorithm that automatically isolates the context of variables for different local searching strategies, avoiding conflicts between local searching strategies for the same variable. We implement Eunomia as a symbolic execution platform targeting WebAssembly, which enables us to analyze applications written in various languages (like C and Go) that can be compiled to WebAssembly. To the best of our knowledge, Eunomia is the first symbolic execution engine that supports the full features of the WebAssembly runtime. We evaluate Eunomia with a dedicated microbenchmark suite for symbolic execution and six real-world applications. Our evaluation shows that Eunomia accelerates bug detection in real-world applications by up to three orders of magnitude. According to the results of a comprehensive user study, users can significantly improve the efficiency and effectiveness of symbolic execution by writing a simple and intuitive Aes script. Besides verifying six known real-world bugs, Eunomia also detected two new zero-day bugs in a popular open-source project, Collections-C.

Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models

  • Authors: Yaohua Zha, Jinpeng Wang, Tao Dai, Bin Chen, Zhi Wang, Shu-Tao Xia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.07221
  • Pdf link: https://arxiv.org/pdf/2304.07221
  • Abstract
    Recently, pre-trained point cloud models have found extensive applications in downstream tasks like object classification. However, these tasks often require full fine-tuning of models and lead to storage-intensive procedures, thus limiting the real applications of pre-trained models. Inspired by the great success of visual prompt tuning (VPT) in vision, we explore prompt tuning, an efficient alternative to full fine-tuning for large-scale models, for point cloud pre-trained models to reduce storage costs. However, it is non-trivial to apply traditional static VPT to point clouds, owing to the distribution diversity of point cloud data. For instance, scanned point clouds exhibit various types of missing or noisy points. To address this issue, we propose Instance-aware Dynamic Prompt Tuning (IDPT) for point cloud pre-trained models, which utilizes a prompt module to perceive the semantic prior features of each instance. This semantic prior facilitates the learning of unique prompts for each instance, thus enabling downstream tasks to robustly adapt to pre-trained point cloud models. Notably, extensive experiments conducted on downstream tasks demonstrate that IDPT outperforms full fine-tuning in most tasks with a mere 7% of the trainable parameters, thus significantly reducing the storage pressure. Code is available at https://github.com/zyh16143998882/IDPT .
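
A minimal sketch of the instance-aware prompt idea (a hypothetical module for illustration, not the paper's exact architecture): a small network maps a pooled per-instance feature to prompt tokens that are prepended to the token sequence of a frozen pre-trained transformer.

```python
import torch
import torch.nn as nn

class InstancePrompt(nn.Module):
    """Generate per-instance prompt tokens from a pooled instance feature."""
    def __init__(self, token_dim=384, prompt_len=4):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(token_dim, token_dim), nn.GELU(),
                                  nn.Linear(token_dim, prompt_len * token_dim))
        self.prompt_len, self.token_dim = prompt_len, token_dim

    def forward(self, tokens):                      # tokens: (B, N, token_dim)
        inst_feat = tokens.mean(dim=1)              # crude per-instance summary
        prompts = self.proj(inst_feat).view(-1, self.prompt_len, self.token_dim)
        return torch.cat([prompts, tokens], dim=1)  # prepend prompts to sequence

tokens = torch.randn(2, 128, 384)                   # 2 point clouds, 128 tokens each
print(InstancePrompt()(tokens).shape)               # torch.Size([2, 132, 384])
```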

Separating Key Agreement and Computational Differential Privacy

  • Authors: Eldon Chung, Vipul Arora, Thomas Tan, Zeyong Li
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.07239
  • Pdf link: https://arxiv.org/pdf/2304.07239
  • Abstract
    Two-party differential privacy allows two parties who do not trust each other to come together and perform a joint analysis on their data whilst maintaining individual-level privacy. We show that any efficient, computationally differentially private protocol that has black-box access to key agreement (and nothing stronger) is also an efficient, information-theoretically differentially private protocol. In other words, the existence of efficient key agreement protocols is insufficient for efficient, computationally differentially private protocols. In doing so, we make progress in answering an open question posed by Vadhan about the minimal computational assumption needed for computational differential privacy. Combined with the information-theoretic lower bound due to McGregor, Mironov, Pitassi, Reingold, Talwar, and Vadhan in [FOCS'10], we show that there is no fully black-box reduction from efficient, computationally differentially private protocols for computing the Hamming distance (or equivalently the inner product over the integers) on $n$ bits, with additive error lower than $O\left(\frac{\sqrt{n}}{e^{\epsilon}\log(n)}\right)$, to key agreement. This complements the result by Haitner, Mazor, Silbak, and Tsfadia in [STOC'22], which showed that computing the Hamming distance implies key agreement. We conclude that key agreement is strictly weaker than computational differential privacy for computing the inner product, thereby answering their open question on whether key agreement is sufficient.

Covidia: COVID-19 Interdisciplinary Academic Knowledge Graph

  • Authors: Cheng Deng, Jiaxin Ding, Luoyi Fu, Weinan Zhang, Xinbing Wang, Chenghu Zhou
  • Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.07242
  • Pdf link: https://arxiv.org/pdf/2304.07242
  • Abstract
    The COVID-19 pandemic has inspired extensive work across different research fields. Existing literature and knowledge platforms on COVID-19 focus only on collecting papers in biology and medicine, neglecting interdisciplinary efforts, which hinders knowledge sharing and research collaboration between fields to address the problem. Studying interdisciplinary research requires effective paper category classification and efficient cross-domain knowledge extraction and integration. In this work, we propose Covidia, a COVID-19 interdisciplinary academic knowledge graph, to bridge the gap between knowledge of COVID-19 across different domains. We design frameworks based on contrastive learning for disciplinary classification, and propose a new academic knowledge graph scheme for entity extraction, relation classification, and ontology management in accordance with interdisciplinary research. Based on Covidia, we also establish knowledge discovery benchmarks for finding COVID-19 research communities and predicting potential links.

Dynamic Mobile-Former: Strengthening Dynamic Convolution with Attention and Residual Connection in Kernel Space

  • Authors: Seokju Yun, Youngmin Ro
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.07254
  • Pdf link: https://arxiv.org/pdf/2304.07254
  • Abstract
    We introduce Dynamic Mobile-Former (DMF), which maximizes the capabilities of dynamic convolution by harmonizing it with efficient operators. Our Dynamic Mobile-Former effectively utilizes the advantages of Dynamic MobileNet (MobileNet equipped with dynamic convolution) using global information from light-weight attention. The Transformer in Dynamic Mobile-Former only requires a few randomly initialized tokens to calculate global features, making it computationally efficient. A bridge between Dynamic MobileNet and the Transformer allows for bidirectional integration of local and global features. We also simplify the optimization process of vanilla dynamic convolution by splitting the convolution kernel into an input-agnostic kernel and an input-dependent kernel. This allows for optimization in a wider kernel space, resulting in enhanced capacity. By integrating lightweight attention and enhanced dynamic convolution, our Dynamic Mobile-Former achieves not only high efficiency but also strong performance. We benchmark Dynamic Mobile-Former on a series of vision tasks and show that it achieves impressive performance on image classification, COCO detection, and instance segmentation. For example, our DMF hits a top-1 accuracy of 79.4% on ImageNet-1K, higher than PVT-Tiny by 4.3% with only 1/4 of the FLOPs. Additionally, our proposed DMF-S model performs well on challenging vision datasets such as COCO, achieving 39.0% mAP, which is 1% higher than that of the Mobile-Former 508M model, despite using 3 GFLOPs less computation. Code and models are available at https://github.com/ysj9909/DMF
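
A minimal sketch of the kernel-splitting idea (a hypothetical module, not the paper's exact block): the effective convolution kernel is the sum of a learned input-agnostic kernel and an input-dependent part whose coefficients are predicted from the input itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitDynamicConv(nn.Module):
    """Effective kernel = static (input-agnostic) kernel + input-dependent kernel."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.static_k = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)
        self.dyn_basis = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)
        # light-weight gate predicting per-sample, per-output-channel coefficients
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(in_ch, out_ch), nn.Sigmoid())
        self.pad = k // 2

    def forward(self, x):                            # x: (B, in_ch, H, W)
        coeff = self.gate(x)                         # (B, out_ch), input-dependent
        outs = []
        for i in range(x.size(0)):                   # per-sample kernel; fine for a sketch
            kern = self.static_k + coeff[i].view(-1, 1, 1, 1) * self.dyn_basis
            outs.append(F.conv2d(x[i:i + 1], kern, padding=self.pad))
        return torch.cat(outs, dim=0)

print(SplitDynamicConv(16, 32)(torch.randn(2, 16, 8, 8)).shape)  # (2, 32, 8, 8)
```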

Keyword: faster

Sample Average Approximation for Black-Box VI

  • Authors: Javier Burroni, Justin Domke, Daniel Sheldon
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.06803
  • Pdf link: https://arxiv.org/pdf/2304.06803
  • Abstract
    We present a novel approach for black-box VI that bypasses the difficulties of stochastic gradient ascent, including the task of selecting step-sizes. Our approach involves using a sequence of sample average approximation (SAA) problems. SAA approximates the solution of stochastic optimization problems by transforming them into deterministic ones. We use quasi-Newton methods and line search to solve each deterministic optimization problem and present a heuristic policy to automate hyperparameter selection. Our experiments show that our method simplifies the VI problem and achieves faster performance than existing methods.
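
A minimal sketch of the SAA idea for a Gaussian variational family on a toy one-dimensional target (all choices here are illustrative, not the paper's algorithm): freezing a set of base samples makes the ELBO a deterministic function, which a quasi-Newton method with line search can then optimize.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
eps = rng.standard_normal(256)             # frozen base samples -> deterministic objective

def log_p(z):                              # toy target: N(3, 1)
    return -0.5 * (z - 3.0) ** 2 - 0.5 * np.log(2 * np.pi)

def neg_elbo(params):
    mu, log_sig = params
    sig = np.exp(log_sig)
    z = mu + sig * eps                     # reparameterized samples
    log_q = -0.5 * ((z - mu) / sig) ** 2 - log_sig - 0.5 * np.log(2 * np.pi)
    return -np.mean(log_p(z) - log_q)

res = minimize(neg_elbo, x0=np.array([0.0, 0.0]), method="L-BFGS-B")
print(res.x)                               # approaches [3.0, 0.0], i.e. mu=3, sigma=1
```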

Automated Translation and Accelerated Solving of Differential Equations on Multiple GPU Platforms

  • Authors: Utkarsh Utkarsh, Valentin Churavy, Yingbo Ma, Tim Besard, Tim Gymnich, Adam R. Gerlach, Alan Edelman, Christopher Rackauckas
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.06835
  • Pdf link: https://arxiv.org/pdf/2304.06835
  • Abstract
    We demonstrate a high-performance vendor-agnostic method for massively parallel solving of ensembles of ordinary differential equations (ODEs) and stochastic differential equations (SDEs) on GPUs. The method is integrated with a widely used differential equation solver library in a high-level language (Julia's DifferentialEquations.jl) and enables GPU acceleration without requiring code changes by the user. Our approach achieves state-of-the-art performance compared to hand-optimized CUDA-C++ kernels, while performing $20-100\times$ faster than the vectorized-map (`vmap`) approach implemented in JAX and PyTorch. Performance evaluation on NVIDIA, AMD, Intel, and Apple GPUs demonstrates performance portability and vendor-agnosticism. We show composability with MPI to enable distributed multi-GPU workflows. The implemented solvers are fully featured, supporting event handling, automatic differentiation, and the incorporation of datasets via the GPU's texture memory, allowing scientists to take advantage of GPU acceleration on all major current architectures without changing their model code and without loss of performance.

Evaluation of Social Biases in Recent Large Pre-Trained Models

  • Authors: Swapnil Sharma, Nikita Anand, Kranthi Kiran G.V., Alind Jain
  • Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06861
  • Pdf link: https://arxiv.org/pdf/2304.06861
  • Abstract
    Large pre-trained language models are widely used in the community. These models are usually trained on unmoderated and unfiltered data from open sources like the Internet. As a result, biases seen on online platforms, which are a reflection of those in society, are captured and learned by these models. The models are deployed in applications that affect millions of people, and their inherent biases are harmful to the targeted social groups. In this work, we study the general trend in bias reduction as newer pre-trained models are released. Three recent models (ELECTRA, DeBERTa, and DistilBERT) are chosen and evaluated against two bias benchmarks, StereoSet and CrowS-Pairs, and compared to the BERT baseline using the associated metrics. We explore whether, as advancements are made and newer, faster, lighter models are released, they are being developed responsibly, such that their inherent social biases are reduced compared to their older counterparts. We find that all the models under study do exhibit biases but have generally improved compared to BERT.

Collaborative Ground-Aerial Multi-Robot System for Disaster Response Missions with a Low-Cost Drone Add-On for Off-the-Shelf Drones

  • Authors: Shalutha Rajapakshe, Dilanka Wickramasinghe, Sahan Gurusinghe, Deepana Ishtaweera, Bhanuka Silva, Peshala Jayasekara, Nick Panitz, Paul Flick, Navinda Kottege
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06992
  • Pdf link: https://arxiv.org/pdf/2304.06992
  • Abstract
    In disaster-stricken environments, it is vital to assess the damage quickly, analyse the stability of the environment, and allocate resources to the most vulnerable areas where victims might be present. These missions are difficult and dangerous for humans to conduct directly. We investigate a collaborative approach in which aerial and ground robots combine their complementary capabilities to address this problem. With an increased field of view, faster speed, and compact size, the aerial robot explores the area and creates a 3D feature-based map graph of the environment while providing a live video stream to the ground control station. Once the aerial robot finishes the exploration run, the ground control station processes the map and sends it to the ground robot. The ground robot, with its longer operation time, static stability, payload delivery, and tele-conference capabilities, can then autonomously navigate to identified high-vulnerability locations. We have conducted experiments using a quadcopter and a hexapod robot in an indoor modelled environment with obstacles and uneven ground. Additionally, we have developed a low-cost drone add-on with value-added capabilities, such as victim detection, that can be attached to an off-the-shelf drone. The system was assessed for cost-effectiveness, energy efficiency, and scalability.

DIPNet: Efficiency Distillation and Iterative Pruning for Image Super-Resolution

  • Authors: Lei Yu, Xinpeng Li, Youwei Li, Ting Jiang, Qi Wu, Haoqiang Fan, Shuaicheng Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.07018
  • Pdf link: https://arxiv.org/pdf/2304.07018
  • Abstract
    Efficient deep learning-based approaches have achieved remarkable performance in single image super-resolution. However, recent studies on efficient super-resolution have mainly focused on reducing the number of parameters and floating-point operations through various network designs. Although these methods can decrease the number of parameters and floating-point operations, they may not necessarily reduce actual running time. To address this issue, we propose a novel multi-stage lightweight network boosting method, which can enable lightweight networks to achieve outstanding performance. Specifically, we leverage enhanced high-resolution output as additional supervision to improve the learning ability of lightweight student networks. Upon convergence of the student network, we further simplify our network structure to a more lightweight level using reparameterization techniques and iterative network pruning. Meanwhile, we adopt an effective lightweight network training strategy that combines multi-anchor distillation and progressive learning. Ultimately, our proposed method achieves the fastest inference time among all participants in the NTIRE 2023 efficient super-resolution challenge while maintaining competitive super-resolution performance. Additionally, extensive experiments are conducted to demonstrate the effectiveness of the proposed components. The results show that our approach achieves comparable performance on the representative DIV2K dataset, both qualitatively and quantitatively, with faster inference and fewer network parameters.

GreedyGD: Enhanced Generalized Deduplication for Direct Analytics in IoT

  • Authors: Aaron Hurst, Daniel E. Lucani, Qi Zhang
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.07240
  • Pdf link: https://arxiv.org/pdf/2304.07240
  • Abstract
    Exponential growth in the amount of data generated by the Internet of Things currently poses significant challenges for data communication, storage, and analytics, and leads to high costs for organisations hoping to leverage their data. Novel techniques are therefore needed to holistically improve the efficiency of data storage and analytics in IoT systems. The emerging compression technique generalized deduplication (GD) has been shown to deliver high compression and enable direct compressed-data analytics with low storage and memory requirements. In this paper, we propose a new GD-based data compression algorithm called GreedyGD that is designed for analytics. Compared to existing versions of GD, GreedyGD enables more reliable analytics with less data, while running 11.2x faster and delivering even better compression.
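
The transform behind generalized deduplication splits each chunk into a base, deduplicated across chunks, and a small per-chunk deviation. A minimal sketch on integers with a fixed bit split follows; GreedyGD's contribution is choosing the split greedily with analytics in mind, which this sketch does not do.

```python
def gd_transform(values, dev_bits=4):
    """Split each integer into a deduplicated base and a per-value deviation."""
    bases, base_ids, deviations = {}, [], []
    for v in values:
        base, dev = v >> dev_bits, v & ((1 << dev_bits) - 1)
        base_ids.append(bases.setdefault(base, len(bases)))
        deviations.append(dev)
    return list(bases), base_ids, deviations

def gd_restore(bases, base_ids, deviations, dev_bits=4):
    return [(bases[i] << dev_bits) | d for i, d in zip(base_ids, deviations)]

data = [1000, 1003, 1010, 2047, 2040]
bases, ids, devs = gd_transform(data)
assert gd_restore(bases, ids, devs) == data
print(f"{len(bases)} unique bases for {len(data)} values")  # deduplication gain
```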

Keyword: mobile

AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks

  • Authors: Abhisek Kundu, Naveen K. Mellempudi, Dharma Teja Vooturi, Bharat Kaul, Pradeep Dubey
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06941
  • Pdf link: https://arxiv.org/pdf/2304.06941
  • Abstract
    Sparse training is emerging as a promising avenue for reducing the computational cost of training neural networks. Several recent studies have proposed pruning methods using learnable thresholds to efficiently explore the non-uniform distribution of sparsity inherent within the models. In this paper, we propose Gradient Annealing (GA), where gradients of masked weights are scaled down in a non-linear manner. GA provides an elegant trade-off between sparsity and accuracy without the need for additional sparsity-inducing regularization. We integrated GA with the latest learnable pruning methods to create an automated sparse training algorithm called AutoSparse, which achieves better accuracy and/or training/inference FLOPS reduction than existing learnable pruning methods for sparse ResNet50 and MobileNetV1 on ImageNet-1K: AutoSparse achieves (2x, 7x) reduction in (training,inference) FLOPS for ResNet50 on ImageNet at 80% sparsity. Finally, AutoSparse outperforms sparse-to-sparse SotA method MEST (uniform sparsity) for 80% sparse ResNet50 with similar accuracy, where MEST uses 12% more training FLOPS and 50% more inference FLOPS.

Entropy-Based Energy Dissipation Analysis of Mobile Communication Systems

  • Authors: Litao Yan, Xiaohu Ge
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.06988
  • Pdf link: https://arxiv.org/pdf/2304.06988
  • Abstract
    Compared with the energy efficiency of conventional mobile communication systems, the energy efficiency of fifth generation (5G) communication systems has improved more than 30 times. However, the energy consumption of 5G communication systems is 3 times that of fourth generation (4G) communication systems, while wireless traffic has increased more than 100 times in the last decade. It is anticipated that the traffic of future sixth generation (6G) communication systems will keep growing exponentially in the next decade. A key question is how much room is left for improving the energy efficiency of mobile communication systems. To answer this question, an entropy-based energy dissipation model based on non-equilibrium thermodynamics is first proposed for mobile communication systems. Moreover, theoretical minimal energy dissipation limits are derived for typical modulations in mobile communication systems. Simulation results show that the practical energy dissipation of information processing and information transmission is three and seven orders of magnitude away from the respective theoretical minimal energy dissipation limits in mobile communication systems. These results provide guidelines for energy efficiency optimization in future mobile communication systems.
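
For a sense of scale, the classic Landauer bound (a standard thermodynamic result, not the paper's modulation-specific derivation) puts the minimal dissipation for erasing one bit at k_B T ln 2:

```python
import math

k_B = 1.380649e-23                       # Boltzmann constant, J/K
T = 300.0                                # room temperature, K

e_bit = k_B * T * math.log(2)            # Landauer limit per bit
print(f"{e_bit:.3e} J/bit")              # ~2.87e-21 J/bit

# A link delivering 1 Gbit/s at this limit would dissipate only:
print(f"{e_bit * 1e9:.3e} W")            # ~2.9e-12 W
```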

A Framework for Fast Prototyping of Photo-realistic Environments with Multiple Pedestrians

  • Authors: Sara Casao, Andrés Otero, Álvaro Serra-Gómez, Ana C. Murillo, Javier Alonso-Mora, Eduardo Montijano
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.07059
  • Pdf link: https://arxiv.org/pdf/2304.07059
  • Abstract
    Robotic applications involving people often require advanced perception systems to better understand complex real-world scenarios. To address this challenge, photo-realistic physics simulators are gaining popularity as a means of generating accurate data labeling and designing scenarios for evaluating generalization capabilities, e.g., lighting changes, camera movements, or different weather conditions. We develop a photo-realistic framework built on Unreal Engine and AirSim to easily generate scenarios with pedestrians and mobile robots. The framework can generate random and customized trajectories for each person and provides up to 50 ready-to-use people models along with an API for their metadata retrieval. We demonstrate the usefulness of the proposed framework with a use case of multi-target tracking, a popular problem in real pedestrian scenarios. The notable feature variability in the obtained perception data is presented and evaluated.

DroidBot-GPT: GPT-powered UI Automation for Android

  • Authors: Hao Wen, Hongming Wang, Jiaxuan Liu, Yuanchun Li
  • Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.07061
  • Pdf link: https://arxiv.org/pdf/2304.07061
  • Abstract
    This paper introduces DroidBot-GPT, a tool that utilizes GPT-like large language models (LLMs) to automate the interactions with Android mobile applications. Given a natural language description of a desired task, DroidBot-GPT can automatically generate and execute actions that navigate the app to complete the task. It works by translating the app GUI state information and the available actions on the smartphone screen to natural language prompts and asking the LLM to make a choice of actions. Since the LLM is typically trained on a large amount of data including the how-to manuals of diverse software applications, it has the ability to make reasonable choices of actions based on the provided information. We evaluate DroidBot-GPT with a self-created dataset that contains 33 tasks collected from 17 Android applications spanning 10 categories. It can successfully complete 39.39% of the tasks, and the average partial completion progress is about 66.76%. Given the fact that our method is fully unsupervised (no modification required from both the app and the LLM), we believe there is great potential to enhance automation performance with better app development paradigms and/or custom model training.

Dynamic Mobile-Former: Strengthening Dynamic Convolution with Attention and Residual Connection in Kernel Space

  • Authors: Seokju Yun, Youngmin Ro
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.07254
  • Pdf link: https://arxiv.org/pdf/2304.07254
  • Abstract
    We introduce Dynamic Mobile-Former (DMF), which maximizes the capabilities of dynamic convolution by harmonizing it with efficient operators. Our Dynamic Mobile-Former effectively utilizes the advantages of Dynamic MobileNet (MobileNet equipped with dynamic convolution) using global information from light-weight attention. The Transformer in Dynamic Mobile-Former only requires a few randomly initialized tokens to calculate global features, making it computationally efficient. A bridge between Dynamic MobileNet and the Transformer allows for bidirectional integration of local and global features. We also simplify the optimization process of vanilla dynamic convolution by splitting the convolution kernel into an input-agnostic kernel and an input-dependent kernel. This allows for optimization in a wider kernel space, resulting in enhanced capacity. By integrating lightweight attention and enhanced dynamic convolution, our Dynamic Mobile-Former achieves not only high efficiency but also strong performance. We benchmark Dynamic Mobile-Former on a series of vision tasks and show that it achieves impressive performance on image classification, COCO detection, and instance segmentation. For example, our DMF hits a top-1 accuracy of 79.4% on ImageNet-1K, higher than PVT-Tiny by 4.3% with only 1/4 of the FLOPs. Additionally, our proposed DMF-S model performs well on challenging vision datasets such as COCO, achieving 39.0% mAP, which is 1% higher than that of the Mobile-Former 508M model, despite using 3 GFLOPs less computation. Code and models are available at https://github.com/ysj9909/DMF

Keyword: pruning

Structured Pruning for Multi-Task Deep Neural Networks

  • Authors: Siddhant Garg, Lijun Zhang, Hui Guan
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06840
  • Pdf link: https://arxiv.org/pdf/2304.06840
  • Abstract
    Although multi-task deep neural network (DNN) models have computation and storage benefits over individual single-task DNN models, they can be further optimized via model compression. Numerous structured pruning methods are already developed that can readily achieve speedups in single-task models, but the pruning of multi-task networks has not yet been extensively studied. In this work, we investigate the effectiveness of structured pruning on multi-task models. We use an existing single-task filter pruning criterion and also introduce an MTL-based filter pruning criterion for estimating the filter importance scores. We prune the model using an iterative pruning strategy with both pruning methods. We show that, with careful hyper-parameter tuning, architectures obtained from different pruning methods do not have significant differences in their performances across tasks when the number of parameters is similar. We also show that iterative structure pruning may not be the best way to achieve a well-performing pruned model because, at extreme pruning levels, there is a high drop in performance across all tasks. But when the same models are randomly initialized and re-trained, they show better results.

AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks

  • Authors: Abhisek Kundu, Naveen K. Mellempudi, Dharma Teja Vooturi, Bharat Kaul, Pradeep Dubey
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06941
  • Pdf link: https://arxiv.org/pdf/2304.06941
  • Abstract
    Sparse training is emerging as a promising avenue for reducing the computational cost of training neural networks. Several recent studies have proposed pruning methods using learnable thresholds to efficiently explore the non-uniform distribution of sparsity inherent within the models. In this paper, we propose Gradient Annealing (GA), where gradients of masked weights are scaled down in a non-linear manner. GA provides an elegant trade-off between sparsity and accuracy without the need for additional sparsity-inducing regularization. We integrated GA with the latest learnable pruning methods to create an automated sparse training algorithm called AutoSparse, which achieves better accuracy and/or training/inference FLOPS reduction than existing learnable pruning methods for sparse ResNet50 and MobileNetV1 on ImageNet-1K: AutoSparse achieves (2x, 7x) reduction in (training, inference) FLOPS for ResNet50 on ImageNet at 80% sparsity. Finally, AutoSparse outperforms the sparse-to-sparse SotA method MEST (uniform sparsity) for 80% sparse ResNet50 with similar accuracy, where MEST uses 12% more training FLOPS and 50% more inference FLOPS.

DIPNet: Efficiency Distillation and Iterative Pruning for Image Super-Resolution

  • Authors: Lei Yu, Xinpeng Li, Youwei Li, Ting Jiang, Qi Wu, Haoqiang Fan, Shuaicheng Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.07018
  • Pdf link: https://arxiv.org/pdf/2304.07018
  • Abstract
    Efficient deep learning-based approaches have achieved remarkable performance in single image super-resolution. However, recent studies on efficient super-resolution have mainly focused on reducing the number of parameters and floating-point operations through various network designs. Although these methods can decrease the number of parameters and floating-point operations, they may not necessarily reduce actual running time. To address this issue, we propose a novel multi-stage lightweight network boosting method, which can enable lightweight networks to achieve outstanding performance. Specifically, we leverage enhanced high-resolution output as additional supervision to improve the learning ability of lightweight student networks. Upon convergence of the student network, we further simplify our network structure to a more lightweight level using reparameterization techniques and iterative network pruning. Meanwhile, we adopt an effective lightweight network training strategy that combines multi-anchor distillation and progressive learning. Ultimately, our proposed method achieves the fastest inference time among all participants in the NTIRE 2023 efficient super-resolution challenge while maintaining competitive super-resolution performance. Additionally, extensive experiments are conducted to demonstrate the effectiveness of the proposed components. The results show that our approach achieves comparable performance on the representative DIV2K dataset, both qualitatively and quantitatively, with faster inference and fewer network parameters.

Keyword: voxel

Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding

  • Authors: Yu-Qi Yang, Yu-Xiao Guo, Jian-Yu Xiong, Yang Liu, Hao Pan, Peng-Shuai Wang, Xin Tong, Baining Guo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06906
  • Pdf link: https://arxiv.org/pdf/2304.06906
  • Abstract
    Pretrained backbones with fine-tuning have been widely adopted in 2D vision and natural language processing tasks and have demonstrated significant advantages over task-specific networks. In this paper, we present a pretrained 3D backbone, named Swin3D, that for the first time outperforms all state-of-the-art methods on downstream 3D indoor scene understanding tasks. Our backbone network is based on a 3D Swin transformer and is carefully designed to efficiently conduct self-attention on sparse voxels with linear memory complexity and to capture the irregularity of point signals via generalized contextual relative positional embedding. Based on this backbone design, we pretrained a large Swin3D model on a synthetic Structured3D dataset that is 10 times larger than the ScanNet dataset and fine-tuned the pretrained model on various downstream real-world indoor scene understanding tasks. The results demonstrate that our model pretrained on the synthetic dataset not only exhibits good generality in both downstream segmentation and detection on real 3D point datasets, but also surpasses the state-of-the-art methods on downstream tasks after fine-tuning, with +2.3 mIoU and +2.2 mIoU on S3DIS Area5 and 6-fold semantic segmentation, +2.1 mIoU on ScanNet segmentation (val), +1.9 mAP@0.5 on ScanNet detection, and +8.1 mAP@0.5 on S3DIS detection. Our method demonstrates the great potential of pretrained 3D backbones with fine-tuning for 3D understanding tasks. The code and models are available at https://github.com/microsoft/Swin3D .

Keyword: lidar

Near Field iToF LIDAR Depth Improvement from Limited Number of Shots

  • Authors: Mena Nagiub, Thorsten Beuth, Ganesh Sistu, Heinrich Gotzig, Ciarán Eising
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.07047
  • Pdf link: https://arxiv.org/pdf/2304.07047
  • Abstract
    Indirect Time of Flight LiDARs can indirectly calculate the scene's depth from the phase shift angle between transmitted and received laser signals with amplitudes modulated at a predefined frequency. Unfortunately, this method generates ambiguity in the calculated depth when the phase shift angle exceeds $2\pi$. Current state-of-the-art methods use raw samples generated using two distinct modulation frequencies to overcome this ambiguity problem. However, this comes at the cost of increasing laser components' stress and raising their temperature, which reduces their lifetime and increases power consumption. In our work, we study two different methods to recover the entire depth range of the LiDAR using fewer raw data sample shots from a single modulation frequency, with the support of the sensor's grayscale output, to reduce the laser components' stress and power consumption.
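
For orientation, the phase-to-depth relation behind the ambiguity is compact: depth is proportional to phase shift, and any depth beyond c/(2f) wraps around. The snippet below uses the standard iToF formulas (not code from the paper) with an assumed 20 MHz modulation frequency.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def itof_depth(phase_rad: float, f_mod_hz: float) -> float:
    """Depth from phase shift: d = c * phi / (4 * pi * f)."""
    return C * phase_rad / (4 * math.pi * f_mod_hz)

def ambiguity_range(f_mod_hz: float) -> float:
    """Depths separated by c / (2 * f) produce the same wrapped phase."""
    return C / (2 * f_mod_hz)

f = 20e6                          # 20 MHz modulation (illustrative)
print(ambiguity_range(f))         # ~7.49 m unambiguous range
print(itof_depth(math.pi, f))     # ~3.75 m at a phase shift of pi
```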

Prior based Sampling for Adaptive LiDAR

  • Authors: Amit Shomer, Shai Avidan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.07099
  • Pdf link: https://arxiv.org/pdf/2304.07099
  • Abstract
    We propose SampleDepth, a Convolutional Neural Network (CNN), that is suited for an adaptive LiDAR. Typically, a LiDAR sampling strategy is pre-defined, constant, and independent of the observed scene. Instead of letting a LiDAR sample the scene in this agnostic fashion, SampleDepth determines, adaptively, where it is best to sample the current frame. To do that, SampleDepth uses depth samples from previous time steps to predict a sampling mask for the current frame. Crucially, SampleDepth is trained to optimize the performance of a depth completion downstream task. SampleDepth is evaluated on two different depth completion networks and two LiDAR datasets, KITTI Depth Completion and the newly introduced synthetic dataset, SHIFT. We show that SampleDepth is effective and suitable for different depth completion downstream tasks.
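
A predicted importance map becomes a budgeted sampling mask of the kind described with a simple top-k selection; the snippet is an illustrative stand-in for the learned mask head, assuming the importance scores were predicted from previous depth frames.

```python
import numpy as np

def budgeted_mask(importance: np.ndarray, budget: int) -> np.ndarray:
    """Binary LiDAR sampling mask selecting the `budget` highest-scoring pixels."""
    flat = importance.ravel()
    top = np.argpartition(flat, -budget)[-budget:]   # indices of the top-k scores
    mask = np.zeros_like(flat, dtype=bool)
    mask[top] = True
    return mask.reshape(importance.shape)

importance = np.random.rand(64, 64)          # assumed output of a mask predictor
mask = budgeted_mask(importance, budget=256)
print(mask.sum())                            # exactly 256 of 4096 locations sampled
```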

Sub-meter resolution canopy height maps using self-supervised learning and a vision transformer trained on Aerial and GEDI Lidar

  • Authors: Jamie Tolan, Hung-I Yang, Ben Nosarzewski, Guillaume Couairon, Huy Vo, John Brandt, Justine Spore, Sayantan Majumdar, Daniel Haziza, Janaki Vamaraju, Theo Moutakani, Piotr Bojanowski, Tracy Johns, Brian White, Tobias Tiecke, Camille Couprie
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.07213
  • Pdf link: https://arxiv.org/pdf/2304.07213
  • Abstract
    Vegetation structure mapping is critical for understanding the global carbon cycle and monitoring nature-based approaches to climate adaptation and mitigation. Repeat measurements of these data allow for the observation of deforestation or degradation of existing forests, natural forest regeneration, and the implementation of sustainable agricultural practices like agroforestry. Assessments of tree canopy height and crown projected area at a high spatial resolution are also important for monitoring carbon fluxes and assessing tree-based land uses, since forest structures can be highly spatially heterogeneous, especially in agroforestry systems. Very high resolution satellite imagery (less than one meter (1m) ground sample distance) makes it possible to extract information at the tree level while allowing monitoring at a very large scale. This paper presents the first high-resolution canopy height map concurrently produced for multiple sub-national jurisdictions. Specifically, we produce canopy height maps for the states of California and São Paulo, at sub-meter resolution, a significant improvement over the ten meter (10m) resolution of previous Sentinel / GEDI based worldwide maps of canopy height. The maps are generated by applying a vision transformer to features extracted from a self-supervised model in Maxar imagery from 2017 to 2020, and are trained against aerial lidar and GEDI observations. We evaluate the proposed maps with set-aside validation lidar data as well as by comparing with other remotely sensed maps and field-collected data, and find our model produces an average Mean Absolute Error (MAE) within set-aside validation areas of 3.0 meters.

Keyword: diffusion

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

  • Authors: Hanze Dong, Wei Xiong, Deepanshu Goyal, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.06767
  • Pdf link: https://arxiv.org/pdf/2304.06767
  • Abstract
    Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially significant repercussions. Consequently, aligning these models with human ethics and preferences is an essential step toward ensuring their responsible and effective deployment in real-world applications. Prior research has primarily employed Reinforcement Learning from Human Feedback (RLHF) as a means of addressing this problem, wherein generative models are fine-tuned using RL algorithms guided by a human-feedback-informed reward model. However, the inefficiencies and instabilities associated with RL algorithms frequently present substantial obstacles to the successful alignment of generative models, necessitating the development of a more robust and streamlined approach. To this end, we introduce a new framework, Reward rAnked FineTuning (RAFT), designed to align generative models more effectively. Utilizing a reward model and a sufficient number of samples, our approach selects the high-quality samples, discarding those that exhibit undesired behavior, and subsequently assembles a streaming dataset. This dataset serves as the basis for aligning the generative model and can be employed under both offline and online settings. Notably, the sample generation process within RAFT is gradient-free, rendering it compatible with black-box generators. Through extensive experiments, we demonstrate that our proposed algorithm exhibits strong performance in the context of both large language models and diffusion models.
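
The alignment loop reads directly as code: sample several candidates per prompt, score them with the reward model, keep the best, and fine-tune on the survivors. The sketch below is a schematic rather than the authors' implementation; `generate`, `reward`, and `finetune_on` are hypothetical stand-ins for the generator, the reward model, and a supervised fine-tuning step.

```python
import random

def raft_round(prompts, generate, reward, finetune_on, n_samples=8, k_best=1):
    """One round of reward-ranked finetuning (schematic)."""
    keep = []
    for p in prompts:
        candidates = [generate(p) for _ in range(n_samples)]  # gradient-free
        ranked = sorted(candidates, key=reward, reverse=True)
        keep.extend(ranked[:k_best])       # discard low-reward behavior
    finetune_on(keep)                      # align on the filtered subset
    return keep

# Toy usage: the "reward" simply favors longer strings.
best = raft_round(
    prompts=["a", "b"],
    generate=lambda p: p * random.randint(1, 5),
    reward=len,
    finetune_on=lambda batch: None,
)
print(best)
```

Because ranking only needs samples and scores, the generator can stay a black box, which matches the gradient-free property claimed in the abstract.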

Inpaint Anything: Segment Anything Meets Image Inpainting

  • Authors: Tao Yu, Runseng Feng, Ruoyu Feng, Jinming Liu, Xin Jin, Wenjun Zeng, Zhibo Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06790
  • Pdf link: https://arxiv.org/pdf/2304.06790
  • Abstract
    Modern image inpainting systems, despite the significant progress, often struggle with mask selection and hole filling. Based on the Segment-Anything Model (SAM), we make the first attempt at mask-free image inpainting and propose a new paradigm of "clicking and filling", named Inpaint Anything (IA). The core idea behind IA is to combine the strengths of different models in order to build a very powerful and user-friendly pipeline for solving inpainting-related problems. IA supports three main features: (i) Remove Anything: users could click on an object and IA will remove it and smooth the "hole" with the context; (ii) Fill Anything: after certain objects are removed, users could provide text-based prompts to IA, and it will then fill the hole with the corresponding generative content via driving AIGC models like Stable Diffusion; (iii) Replace Anything: with IA, users have another option to retain the click-selected object and replace the remaining background with newly generated scenes. We are also very willing to help everyone share and promote new projects based on our Inpaint Anything (IA). Our codes are available at https://github.com/geekyutao/Inpaint-Anything.
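
All three features compose the same two-stage pattern: a click becomes a mask, and the mask is routed to a filler. The sketch below is schematic; `segment_at_click`, `inpaint`, and `generate_fill` are hypothetical wrappers (around SAM, a context-based inpainting model, and a text-to-image model such as Stable Diffusion), and masks are assumed to be boolean NumPy arrays.

```python
def remove_anything(image, click_xy, segment_at_click, inpaint):
    mask = segment_at_click(image, click_xy)   # click -> object mask (SAM)
    return inpaint(image, mask)                # smooth the "hole" from context

def fill_anything(image, click_xy, prompt, segment_at_click, generate_fill):
    mask = segment_at_click(image, click_xy)
    return generate_fill(image, mask, prompt)  # text-guided generative fill

def replace_anything(image, click_xy, prompt, segment_at_click, generate_fill):
    mask = segment_at_click(image, click_xy)
    return generate_fill(image, ~mask, prompt)  # invert: keep object, redo background
```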

Soundini: Sound-Guided Diffusion for Natural Video Editing

  • Authors: Seung Hyun Lee, Sieun Kim, Innfarn Yoo, Feng Yang, Donghyeon Cho, Youngseo Kim, Huiwen Chang, Jinkyu Kim, Sangpil Kim
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06818
  • Pdf link: https://arxiv.org/pdf/2304.06818
  • Abstract
    We propose a method for adding sound-guided visual effects to specific regions of videos in a zero-shot setting. Animating the appearance of the visual effect is challenging because each frame of the edited video should have visual changes while maintaining temporal consistency. Moreover, existing video editing solutions focus on temporal consistency across frames, ignoring the visual style variations over time, e.g., thunderstorm, wave, fire crackling. To overcome this limitation, we utilize temporal sound features for the dynamic style. Specifically, we guide denoising diffusion probabilistic models with an audio latent representation in the audio-visual latent space. To the best of our knowledge, our work is the first to explore sound-guided natural video editing from various sound sources with sound-specialized properties, such as intensity, timbre, and volume. Additionally, we design optical flow-based guidance to generate temporally consistent video frames, capturing the pixel-wise relationship between adjacent frames. Experimental results show that our method outperforms existing video editing techniques, producing more realistic visual effects that reflect the properties of sound. Please visit our page: https://kuai-lab.github.io/soundini-gallery/.

A Diffusion model for POI recommendation

  • Authors: Yifang Qin, Hongjun Wu, Wei Ju, Xiao Luo, Ming Zhang
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.07041
  • Pdf link: https://arxiv.org/pdf/2304.07041
  • Abstract
    Next Point-of-Interest (POI) recommendation is a critical task in location-based services that aims to provide personalized suggestions for the user's next destination. Previous works on POI recommendation have largely focused on modeling the user's spatial preference. However, existing works that leverage spatial information are only based on the aggregation of users' previously visited positions, which discourages the model from recommending POIs in novel areas. This trait of position-based methods will harm the model's performance in many situations. Additionally, incorporating sequential information into the user's spatial preference remains a challenge. In this paper, we propose Diff-POI: a Diffusion-based model that samples the user's spatial preference for the next POI recommendation. Inspired by the wide application of diffusion algorithms in sampling from distributions, Diff-POI encodes the user's visiting sequence and spatial character with two tailor-designed graph encoding modules, followed by a diffusion-based sampling strategy to explore the user's spatial visiting trends. We leverage the diffusion process and its reverse form to sample from the posterior distribution and optimize the corresponding score function. We design a joint training and inference framework to optimize and evaluate the proposed Diff-POI. Extensive experiments on four real-world POI recommendation datasets demonstrate the superiority of our Diff-POI over state-of-the-art baseline methods. Further ablation and parameter studies on Diff-POI reveal the functionality and effectiveness of the proposed diffusion-based sampling strategy for addressing the limitations of existing methods.

DCFace: Synthetic Face Generation with Dual Condition Diffusion Model

  • Authors: Minchul Kim, Feng Liu, Anil Jain, Xiaoming Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.07060
  • Pdf link: https://arxiv.org/pdf/2304.07060
  • Abstract
    Generating synthetic datasets for training face recognition models is challenging because dataset generation entails more than creating high fidelity images. It involves generating multiple images of the same subjects under different factors (e.g., variations in pose, illumination, expression, aging and occlusion) that follow the real image conditional distribution. Previous works have studied the generation of synthetic datasets using GANs or 3D models. In this work, we approach the problem from the aspect of combining subject appearance (ID) and external factor (style) conditions. These two conditions provide a direct way to control the inter-class and intra-class variations. To this end, we propose a Dual Condition Face Generator (DCFace) based on a diffusion model. Our novel patch-wise style extractor and time-step dependent ID loss enable DCFace to consistently produce face images of the same subject under different styles with precise control. Face recognition models trained on synthetic images from the proposed DCFace provide higher verification accuracies compared to previous works by $6.11\%$ on average on $4$ out of $5$ test datasets: LFW, CFP-FP, CPLFW, AgeDB and CALFW. Code is available at https://github.com/mk-minchul/dcface

Memory Efficient Diffusion Probabilistic Models via Patch-based Generation

  • Authors: Shinei Arakawa, Hideki Tsunashima, Daichi Horita, Keitaro Tanaka, Shigeo Morishima
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.07087
  • Pdf link: https://arxiv.org/pdf/2304.07087
  • Abstract
    Diffusion probabilistic models have been successful in generating high-quality and diverse images. However, traditional models, whose input and output are high-resolution images, suffer from excessive memory requirements, making them less practical for edge devices. Previous approaches for generative adversarial networks proposed a patch-based method that uses positional encoding and global content information. Nevertheless, designing a patch-based approach for diffusion probabilistic models is non-trivial. In this paper, we present a diffusion probabilistic model that generates images on a patch-by-patch basis. We propose two conditioning methods for patch-based generation. First, we propose position-wise conditioning using a one-hot representation to ensure patches are in proper positions. Second, we propose Global Content Conditioning (GCC) to ensure patches have coherent content when concatenated together. We evaluate our model qualitatively and quantitatively on the CelebA and LSUN bedroom datasets and demonstrate a moderate trade-off between maximum memory consumption and generated image quality. Specifically, when an entire image is divided into 2 x 2 patches, our proposed approach can reduce the maximum memory consumption by half while maintaining comparable image quality.
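
Position-wise conditioning with a one-hot code for a 2 x 2 grid can be shown in a few lines; the sketch below prepares only the conditioning inputs (patches plus position codes), with the DDPM itself omitted and shapes chosen for illustration.

```python
import numpy as np

def to_patches_with_position(image: np.ndarray, grid: int = 2):
    """Split an (H, W, C) image into grid*grid patches, pairing each patch
    with a one-hot code that tells the model where the patch belongs."""
    h, w = image.shape[0] // grid, image.shape[1] // grid
    patches, codes = [], []
    for i in range(grid):
        for j in range(grid):
            patches.append(image[i * h:(i + 1) * h, j * w:(j + 1) * w])
            onehot = np.zeros(grid * grid, dtype=np.float32)
            onehot[i * grid + j] = 1.0
            codes.append(onehot)
    return patches, codes

patches, codes = to_patches_with_position(np.random.rand(64, 64, 3))
print(len(patches), patches[0].shape, codes[0])  # 4 patches of (32, 32, 3)
```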

Delta Denoising Score

  • Authors: Amir Hertz, Kfir Aberman, Daniel Cohen-Or
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.07090
  • Pdf link: https://arxiv.org/pdf/2304.07090
  • Abstract
    We introduce Delta Denoising Score (DDS), a novel scoring function for text-based image editing that guides minimal modifications of an input image towards the content described in a target prompt. DDS leverages the rich generative prior of text-to-image diffusion models and can be used as a loss term in an optimization problem to steer an image towards a desired direction dictated by a text. DDS utilizes the Score Distillation Sampling (SDS) mechanism for the purpose of image editing. We show that using only SDS often produces non-detailed and blurry outputs due to noisy gradients. To address this issue, DDS uses a prompt that matches the input image to identify and remove undesired erroneous directions of SDS. Our key premise is that SDS should be zero when calculated on pairs of matched prompts and images, meaning that if the score is non-zero, its gradients can be attributed to the erroneous component of SDS. Our analysis demonstrates the competence of DDS for text based image-to-image translation. We further show that DDS can be used to train an effective zero-shot image translation model. Experimental results indicate that DDS outperforms existing methods in terms of stability and quality, highlighting its potential for real-world applications in text-based image editing.
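
The key premise translates to subtracting two Score Distillation Sampling terms so the noisy component common to both cancels. The sketch below is a simplified reading under stated assumptions: `predict_noise(latent, prompt, t)` is a hypothetical frozen diffusion denoiser, and the proper alpha_t/sigma_t noising schedule is collapsed to plain additive noise for brevity.

```python
import numpy as np

def dds_direction(z_edit, z_src, tgt_prompt, src_prompt, t, predict_noise):
    """Delta Denoising Score, sketched: SDS(target pair) - SDS(source pair).

    The SDS gradient is roughly predicted_noise - injected_noise; on the
    matched (source image, source prompt) pair it should be zero, so any
    residual is the erroneous, blur-inducing component and is subtracted.
    """
    eps = np.random.randn(*z_edit.shape)                   # shared noise sample
    sds_tgt = predict_noise(z_edit + eps, tgt_prompt, t) - eps
    sds_src = predict_noise(z_src + eps, src_prompt, t) - eps
    return sds_tgt - sds_src        # the clean edit direction toward the target
```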

Towards Controllable Diffusion Models via Reward-Guided Exploration

  • Authors: Hengtong Zhang, Tingyang Xu
  • Subjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM)
  • Arxiv link: https://arxiv.org/abs/2304.07132
  • Pdf link: https://arxiv.org/pdf/2304.07132
  • Abstract
    By formulating data samples' formation as a Markov denoising process, diffusion models achieve state-of-the-art performances in a collection of tasks. Recently, many variants of diffusion models have been proposed to enable controlled sample generation. Most of these existing methods either formulate the controlling information as an input (i.e., a conditional representation) for the noise approximator, or introduce a pre-trained classifier in the test phase to guide the Langevin dynamics towards the conditional goal. However, the former line of methods only works when the controlling information can be formulated as conditional representations, while the latter requires the pre-trained guidance classifier to be differentiable. In this paper, we propose a novel framework named RGDM (Reward-Guided Diffusion Model) that guides the training phase of diffusion models via reinforcement learning (RL). The proposed training framework bridges the objective of weighted log-likelihood and maximum entropy RL, which enables calculating policy gradients via samples from a pay-off distribution proportional to exponentially scaled rewards, rather than from policies themselves. Such a framework alleviates the high gradient variances and enables diffusion models to explore for highly rewarded samples in the reverse process. Experiments on 3D shape and molecule generation tasks show significant improvements over existing conditional diffusion models.
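
The phrase "samples from a pay-off distribution proportional to exponentially scaled rewards" has a compact numerical reading; the sketch below resamples candidates with softmax-style weights exp(beta * reward) and illustrates only that single idea, not the paper's RL training loop.

```python
import numpy as np

def payoff_resample(samples, rewards, beta=1.0, n=4):
    """Draw n samples with probability proportional to exp(beta * reward)."""
    r = np.asarray(rewards, dtype=float)
    w = np.exp(beta * (r - r.max()))   # subtract max for numerical stability
    p = w / w.sum()
    idx = np.random.choice(len(samples), size=n, p=p)
    return [samples[i] for i in idx]

pool = ["mol_a", "mol_b", "mol_c"]
print(payoff_resample(pool, rewards=[0.1, 2.0, 0.5]))  # mostly "mol_b"
```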

A Comparative Study on Generative Models for High Resolution Solar Observation Imaging

  • Authors: Mehdi Cherti, Alexander Czernik, Stefan Kesselheim, Frederic Effenberger, Jenia Jitsev
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.07169
  • Pdf link: https://arxiv.org/pdf/2304.07169
  • Abstract
    Solar activity is one of the main drivers of variability in our solar system and the key source of space weather phenomena that affect Earth and near Earth space. The extensive record of high resolution extreme ultraviolet (EUV) observations from the Solar Dynamics Observatory (SDO) offers an unprecedented, very large dataset of solar images. In this work, we make use of this comprehensive dataset to investigate capabilities of current state-of-the-art generative models to accurately capture the data distribution behind the observed solar activity states. Starting from StyleGAN-based methods, we uncover severe deficits of this model family in handling fine-scale details of solar images when training on high resolution samples, contrary to training on natural face images. When switching to the diffusion based generative model family, we observe strong improvements of fine-scale detail generation. For the GAN family, we are able to achieve similar improvements in fine-scale generation when turning to ProjectedGANs, which uses multi-scale discriminators with a pre-trained frozen feature extractor. We conduct ablation studies to clarify mechanisms responsible for proper fine-scale handling. Using distributed training on supercomputers, we are able to train generative models for up to 1024x1024 resolution that produce high quality samples indistinguishable to human experts, as suggested by the evaluation we conduct. We make all code, models and workflows used in this study publicly available at \url{https://github.com/SLAMPAI/generative-models-for-highres-solar-images}.

Keyword: dynamic

GradMDM: Adversarial Attack on Dynamic Networks

  • Authors: Jianhong Pan, Lin Geng Foo, Qichen Zheng, Zhipeng Fan, Hossein Rahmani, Qiuhong Ke, Jun Liu
  • Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06724
  • Pdf link: https://arxiv.org/pdf/2304.06724
  • Abstract
    Dynamic neural networks can greatly reduce computation redundancy without compromising accuracy by adapting their structures based on the input. In this paper, we explore the robustness of dynamic neural networks against energy-oriented attacks targeted at reducing their efficiency. Specifically, we attack dynamic models with our novel algorithm GradMDM. GradMDM is a technique that adjusts the direction and the magnitude of the gradients to effectively find a small perturbation for each input that activates more computational units of dynamic models during inference. We evaluate GradMDM on multiple datasets and dynamic models, where it outperforms previous energy-oriented attack techniques, significantly increasing computation complexity while reducing the perceptibility of the perturbations.

Online Recognition of Incomplete Gesture Data to Interface Collaborative Robots

  • Authors: M. A. Simão, O. Gibaru, P. Neto
  • Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.06777
  • Pdf link: https://arxiv.org/pdf/2304.06777
  • Abstract
    Online recognition of gestures is critical for intuitive human-robot interaction (HRI) and for further pushing collaborative robotics into the market, making robots accessible to more people. The problem is that it is difficult to achieve accurate gesture recognition in real unstructured environments, often using distorted and incomplete multisensory data. This paper introduces an HRI framework to classify large vocabularies of interwoven static gestures (SGs) and dynamic gestures (DGs) captured with wearable sensors. DG features are obtained by applying data dimensionality reduction to raw data from sensors (resampling with cubic interpolation and principal component analysis). Experimental tests were conducted using the UC2017 hand gesture dataset with samples from eight different subjects. The classification models show an accuracy of 95.6% for a library of 24 SGs with a random forest and 99.3% for 10 DGs using artificial neural networks. These results compare equally or favorably with different commonly used classifiers. Long short-term memory deep networks achieved similar performance in online frame-by-frame classification using raw incomplete data, performing better in terms of accuracy than static models with specially crafted features, but worse in training and inference time. The recognized gestures are used to teleoperate a robot in a collaborative process that consists of preparing a breakfast meal.
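
The DG feature pipeline described above (resampling with cubic interpolation, then principal component analysis) maps directly onto standard tooling; a sketch with illustrative shapes (in practice the PCA would be fit on the training set rather than per sample):

```python
import numpy as np
from scipy.interpolate import interp1d
from sklearn.decomposition import PCA

def gesture_features(sample: np.ndarray, target_len: int = 50, n_comp: int = 10):
    """Resample a variable-length (T, D) sensor sequence to a fixed length
    with cubic interpolation, then reduce dimensionality with PCA."""
    t_old = np.linspace(0.0, 1.0, len(sample))
    t_new = np.linspace(0.0, 1.0, target_len)
    resampled = interp1d(t_old, sample, kind="cubic", axis=0)(t_new)
    return PCA(n_components=n_comp).fit_transform(resampled)

seq = np.random.randn(73, 24)   # e.g. 73 frames from 24 wearable-sensor channels
print(gesture_features(seq).shape)  # (50, 10) fixed-size feature matrix
```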

Soundini: Sound-Guided Diffusion for Natural Video Editing

  • Authors: Seung Hyun Lee, Sieun Kim, Innfarn Yoo, Feng Yang, Donghyeon Cho, Youngseo Kim, Huiwen Chang, Jinkyu Kim, Sangpil Kim
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06818
  • Pdf link: https://arxiv.org/pdf/2304.06818
  • Abstract
    We propose a method for adding sound-guided visual effects to specific regions of videos in a zero-shot setting. Animating the appearance of the visual effect is challenging because each frame of the edited video should have visual changes while maintaining temporal consistency. Moreover, existing video editing solutions focus on temporal consistency across frames, ignoring the visual style variations over time, e.g., thunderstorm, wave, fire crackling. To overcome this limitation, we utilize temporal sound features for the dynamic style. Specifically, we guide denoising diffusion probabilistic models with an audio latent representation in the audio-visual latent space. To the best of our knowledge, our work is the first to explore sound-guided natural video editing from various sound sources with sound-specialized properties, such as intensity, timbre, and volume. Additionally, we design optical flow-based guidance to generate temporally consistent video frames, capturing the pixel-wise relationship between adjacent frames. Experimental results show that our method outperforms existing video editing techniques, producing more realistic visual effects that reflect the properties of sound. Please visit our page: https://kuai-lab.github.io/soundini-gallery/.

DGNN-Booster: A Generic FPGA Accelerator Framework For Dynamic Graph Neural Network Inference

  • Authors: Hanqiu Chen, Cong Hao
  • Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06831
  • Pdf link: https://arxiv.org/pdf/2304.06831
  • Abstract
    Dynamic Graph Neural Networks (DGNNs) are becoming increasingly popular due to their effectiveness in analyzing and predicting the evolution of complex interconnected graph-based systems. However, hardware deployment of DGNNs still remains a challenge. First, DGNNs do not fully utilize hardware resources because temporal data dependencies cause low hardware parallelism. Additionally, there is currently a lack of generic DGNN hardware accelerator frameworks, and existing GNN accelerator frameworks have limited ability to handle dynamic graphs with changing topologies and node features. To address the aforementioned challenges, in this paper, we propose DGNN-Booster, which is a novel Field-Programmable Gate Array (FPGA) accelerator framework for real-time DGNN inference using High-Level Synthesis (HLS). It includes two different FPGA accelerator designs with different dataflows that can support the most widely used DGNNs. We showcase the effectiveness of our designs by implementing and evaluating two representative DGNN models on ZCU102 board and measuring the end-to-end performance. The experiment results demonstrate that DGNN-Booster can achieve a speedup of up to 5.6x compared to the CPU baseline (6226R), 8.4x compared to the GPU baseline (A6000) and 2.1x compared to the FPGA baseline without applying optimizations proposed in this paper. Moreover, DGNN-Booster can achieve over 100x and over 1000x runtime energy efficiency than the CPU and GPU baseline respectively. Our implementation code and on-board measurements are publicly available at https://github.com/sharc-lab/DGNN-Booster.

Video alignment using unsupervised learning of local and global features

  • Authors: Niloofar Fakhfour, Mohammad ShahverdiKondori, Hoda Mohammadzade
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06841
  • Pdf link: https://arxiv.org/pdf/2304.06841
  • Abstract
    In this paper, we tackle the problem of video alignment, the process of matching the frames of a pair of videos containing similar actions. The main challenge in video alignment is that accurate correspondence should be established despite the differences in the execution processes and appearances between the two videos. We introduce an unsupervised method for alignment that uses global and local features of the frames. In particular, we introduce effective features for each video frame by means of three machine vision tools: person detection, pose estimation, and VGG network. Then the features are processed and combined to construct a multidimensional time series that represents the video. The resulting time series are used to align videos of the same actions using a novel version of dynamic time warping named Diagonalized Dynamic Time Warping (DDTW). The main advantage of our approach is that no training is required, which makes it applicable for any new type of action without any need to collect training samples for it. For evaluation, we considered video synchronization and phase classification tasks on the Penn action dataset. Also, for an effective evaluation of the video synchronization task, we present a new metric called Enclosed Area Error (EAE). The results show that our method outperforms previous state-of-the-art methods, such as TCC and other self-supervised and supervised methods.
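
The paper's Diagonalized DTW is its own contribution; as a general illustration of keeping warping paths near the diagonal, the sketch below implements plain DTW with a Sakoe-Chiba-style diagonal band over multidimensional feature sequences.

```python
import numpy as np

def band_dtw(x: np.ndarray, y: np.ndarray, band: int = 10) -> float:
    """DTW cost between sequences x (N, D) and y (M, D), with the warping
    path restricted to a diagonal band of half-width `band`."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        center = int(i * m / n)   # the diagonal, rescaled for unequal lengths
        for j in range(max(1, center - band), min(m, center + band) + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            acc[i, j] = d + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])

a, b = np.random.randn(40, 8), np.random.randn(50, 8)
print(band_dtw(a, b))
```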

Multiagent Incentive Design for Dynamic Task Delegation with Off-Menu Actions

  • Authors: Tao Zhang, Quanyan Zhu
  • Subjects: Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2304.06842
  • Pdf link: https://arxiv.org/pdf/2304.06842
  • Abstract
    Surprisingly, the literature on dynamic mechanism design seems to give relatively little attention to agents' dynamic participation, particularly regarding incentive compatibility with respect to these decisions. This paper addresses the gap by studying a dynamic mechanism design problem of task delegation. We expand upon the classic state mechanism model by incorporating agents' dual decisions concerning participation (off-menu actions) and regular action selections across multiple periods. The principal faces adverse selection, as agents receive private information over time that remains unobserved by the principal, and designs a mechanism of a task policy profile, which outlines the evolution of available action menus for agents; a coupling policy profile that directly impacts agents' utilities; and an off-switch function profile that assigns compensation or penalties if an agent withdraws. First, we present a sufficient condition called "payoff-flow conservation" for ensuring dynamic incentive compatibility concerning regular actions. Second, we propose a unique process called persistence transformation, which allows us to derive a closed-form formulation for each off-switch function based on the task policy and carrier functions. This enables us to obtain a sufficient condition for incentive compatibility concerning agents' combined decisions of off-menu and regular actions, aligning with the principal's preferences. Third, we provide a necessary condition for incentive compatibility that goes beyond canonical envelope conditions by leveraging the coupled optimality of the principal-desired off-menu and regular actions. This method allows us to obtain a set of sufficient conditions for incentive compatibility by pinning down an explicit expression for each carrier function, thereby determining the precise closed-form formulations of both the coupling functions and the off-switch functions.

Adaptive Safety-critical Control with Uncertainty Estimation for Human-robot Collaboration

  • Authors: Dianhao Zhang, Mien Van, Stephen Mcllvanna, Yuzhu Sun, Seán McLoone
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06867
  • Pdf link: https://arxiv.org/pdf/2304.06867
  • Abstract
    In advanced manufacturing, strict safety guarantees are required to allow humans and robots to work together in a shared workspace. One of the challenges in this application field is the variety and unpredictability of human behavior, leading to potential dangers for human coworkers. This paper presents a novel control framework by adopting safety-critical control and uncertainty estimation for human-robot collaboration. Additionally, to select the shortest path during collaboration, a novel quadratic penalty method is presented. The innovation of the proposed approach is that the proposed controller will prevent the robot from violating any safety constraints even in cases where humans move accidentally in a collaboration task. This is implemented by the combination of a time-varying integral barrier Lyapunov function (TVIBLF) and an adaptive exponential control barrier function (AECBF) to achieve a flexible mode switch between path tracking and collision avoidance with guaranteed closed-loop system stability. The performance of our approach is demonstrated in simulation studies on a 7-DOF robot manipulator. Additionally, a comparison between the tasks involving static and dynamic targets is provided.

Submerse: Visualizing Storm Surge Flooding Simulations in Immersive Display Ecologies

  • Authors: Saeed Boorboor, Yoonsang Kim, Ping Hu, Josef M. Moses, Brian A. Colle, Arie E. Kaufman
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.06872
  • Pdf link: https://arxiv.org/pdf/2304.06872
  • Abstract
    We present Submerse, an end-to-end framework for visualizing flooding scenarios on large and immersive display ecologies. Specifically, we reconstruct a surface mesh from input flood simulation data and generate a to-scale 3D virtual scene by incorporating geographical data such as terrain, textures, buildings, and additional scene objects. To optimize computation and memory performance for large simulation datasets, we discretize the data on an adaptive grid using dynamic quadtrees and support level-of-detail based rendering. Moreover, to provide a perception of flooding direction for a time instance, we animate the surface mesh by synthesizing water waves. As interaction is key for effective decision-making and analysis, we introduce two novel techniques for flood visualization in immersive systems: (1) an automatic scene-navigation method using optimal camera viewpoints generated for marked points-of-interest based on the display layout, and (2) an AR-based focus+context technique using an auxiliary display system. Submerse is developed in collaboration between computer scientists and atmospheric scientists. We evaluate the effectiveness of our system and application by conducting workshops with emergency managers, domain experts, and concerned stakeholders in the Stony Brook Reality Deck, an immersive gigapixel facility, to visualize a superstorm flooding scenario in New York City.

Sampling-based Reactive Synthesis for Nondeterministic Hybrid Systems

  • Authors: Qi Heng Ho, Zachary N. Sunberg, Morteza Lahijanian
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06876
  • Pdf link: https://arxiv.org/pdf/2304.06876
  • Abstract
    This paper introduces a sampling-based strategy synthesis algorithm for nondeterministic hybrid systems with complex continuous dynamics under temporal and reachability constraints. We view the evolution of the hybrid system as a two-player game, where the nondeterminism is an adversarial player whose objective is to prevent achieving temporal and reachability goals. The aim is to synthesize a winning strategy -- a reactive (robust) strategy that guarantees the satisfaction of the goals under all possible moves of the adversarial player. The approach is based on growing a (search) game-tree in the hybrid space by combining a sampling-based planning method with a novel bandit-based technique to select and improve on partial strategies. We provide conditions under which the algorithm is probabilistically complete, i.e., if a winning strategy exists, the algorithm will almost surely find it. The case studies and benchmark results show that the algorithm is general and consistently outperforms the state of the art.

Energy-Efficient UAV Communications in the Presence of Wind: 3D Modeling and Trajectory Design

  • Authors: Xinhong Dai, Bin Duo, Xiaojun Yuan, Marco Di Renzo
  • Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06909
  • Pdf link: https://arxiv.org/pdf/2304.06909
  • Abstract
    The rapid development of unmanned aerial vehicle (UAV) technology provides flexible communication services to terrestrial nodes. Energy efficiency is crucial to the deployment of UAVs, especially rotary-wing UAVs whose propulsion power is sensitive to the wind effect. In this paper, we first derive a three-dimensional (3D) generalised propulsion energy consumption model (GPECM) for rotary-wing UAVs under the consideration of stochastic wind modeling and 3D force analysis. Based on the GPECM, we study a UAV-enabled downlink communication system, where a rotary-wing UAV flies subject to stochastic wind disturbance and provides communication services for ground users (GUs). We aim to maximize the energy efficiency (EE) of the UAV by jointly optimizing the 3D trajectory and user scheduling among the GUs based on the GPECM. We formulate the problem as stochastic optimization, which is difficult to solve due to the lack of real-time wind information. To address this issue, we propose an offline-based online adaptive (OBOA) design with two phases, namely, an offline phase and an online phase. In the offline phase, we average the wind effect on the UAV by leveraging stochastic programming (SP) based on wind statistics; then, in the online phase, we further optimize the instantaneous velocity to adapt to the real-time wind. Simulation results show that the optimized trajectories of the UAV in both phases can better adapt to wind of changing speed and direction, and achieve a higher EE compared with the windless scheme. In particular, our proposed OBOA design can be applied in scenarios with dramatic wind changes, making the UAV adjust its velocity dynamically to achieve better performance in terms of EE.

SMAE: Few-shot Learning for HDR Deghosting with Saturation-Aware Masked Autoencoders

  • Authors: Qingsen Yan, Song Zhang, Weiye Chen, Hao Tang, Yu Zhu, Jinqiu Sun, Luc Van Gool, Yanning Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06914
  • Pdf link: https://arxiv.org/pdf/2304.06914
  • Abstract
    Generating a high-quality High Dynamic Range (HDR) image from dynamic scenes has recently been extensively studied by exploiting Deep Neural Networks (DNNs). Most DNN-based methods require a large amount of training data with ground truth, requiring tedious and time-consuming work. Few-shot HDR imaging aims to generate satisfactory images with limited data. However, it is difficult for modern DNNs to avoid overfitting when trained on only a few images. In this work, we propose a novel semi-supervised approach to realize few-shot HDR imaging via two stages of training, called SSHDR. Unlike previous methods that directly recover content and remove ghosts simultaneously, which makes it hard to achieve an optimum, we first generate the content of saturated regions with a self-supervised mechanism and then address ghosts via an iterative semi-supervised learning framework. Concretely, considering that saturated regions can be regarded as masked Low Dynamic Range (LDR) input regions, we design a Saturated Mask AutoEncoder (SMAE) to learn a robust feature representation and reconstruct a non-saturated HDR image. We also propose an adaptive pseudo-label selection strategy to pick high-quality HDR pseudo-labels in the second stage to avoid the effect of mislabeled samples. Experiments demonstrate that SSHDR outperforms state-of-the-art methods quantitatively and qualitatively within and across different datasets, achieving appealing HDR visualization with few labeled samples.

An NMPC-ECBF Framework for Dynamic Motion Planning and Execution in vision-based Human-Robot Collaboration

  • Authors: Dianhao Zhang, Mien Van, Pantelis Sopasakis, Seán McLoone
  • Subjects: Robotics (cs.RO); Image and Video Processing (eess.IV); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06923
  • Pdf link: https://arxiv.org/pdf/2304.06923
  • Abstract
    To enable safe and effective human-robot collaboration (HRC) in smart manufacturing, seamless integration of sensing, cognition, and prediction into the robot controller is critical for real-time awareness, response, and communication inside a heterogeneous environment (robots, humans, and equipment). The proposed approach takes advantage of the prediction capabilities of nonlinear model predictive control (NMPC) to execute a safe path planning based on feedback from a vision system. In order to satisfy the requirement of real-time path planning, an embedded solver based on a penalty method is applied. However, due to tight sampling times, NMPC solutions are approximate, and hence the safety of the system cannot be guaranteed. To address this, we formulate a novel safety-critical paradigm with an exponential control barrier function (ECBF) used as a safety filter. We also design a simple human-robot collaboration scenario using V-REP to evaluate the performance of the proposed controller and investigate whether integrating human pose prediction can help with safe and efficient collaboration. The robot uses OptiTrack cameras for perception and dynamically generates collision-free trajectories to the predicted target interactive position. Results for a number of different configurations confirm the efficiency of the proposed motion planning and execution framework. It yields a 19.8% reduction in execution time for the HRC task considered.

Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events

  • Authors: Yangguang Wang, Xiang Zhang, Mingyuan Lin, Lei Yu, Boxin Shi, Wen Yang, Gui-Song Xia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06930
  • Pdf link: https://arxiv.org/pdf/2304.06930
  • Abstract
    Scene Dynamic Recovery (SDR) by inverting distorted Rolling Shutter (RS) images to an undistorted high frame-rate Global Shutter (GS) video is a severely ill-posed problem, particularly when prior knowledge about camera/object motions is unavailable. Commonly used artificial assumptions on motion linearity and data-specific characteristics, regarding the temporal dynamics information embedded in the RS scanlines, are prone to producing sub-optimal solutions in real-world scenarios. To address this challenge, we propose an event-based RS2GS framework within a self-supervised learning paradigm that leverages the extremely high temporal resolution of event cameras to provide accurate inter/intra-frame information, where real-world events and RS images can be exploited to alleviate the performance degradation caused by the domain gap between synthesized and real data. Specifically, an Event-based Inter/intra-frame Compensator (E-IC) is proposed to predict the per-pixel dynamic between arbitrary time intervals, including the temporal transition and spatial translation. Exploring connections in terms of RS-RS, RS-GS, and GS-RS, we explicitly formulate mutual constraints with the proposed E-IC, resulting in supervision without ground-truth GS images. Extensive evaluations over synthetic and real datasets demonstrate that the proposed method achieves state-of-the-art results and shows remarkable performance for event-based RS2GS inversion in real-world scenarios. The dataset and code are available at https://w3un.github.io/selfunroll/.

A Unified HDR Imaging Method with Pixel and Patch Level

  • Authors: Qingsen Yan, Weiye Chen, Song Zhang, Yu Zhu, Jinqiu Sun, Yanning Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06943
  • Pdf link: https://arxiv.org/pdf/2304.06943
  • Abstract
    Mapping Low Dynamic Range (LDR) images with different exposures to High Dynamic Range (HDR) remains nontrivial and challenging on dynamic scenes due to ghosting caused by object motion or camera jitter. With the success of Deep Neural Networks (DNNs), several DNN-based methods have been proposed to alleviate ghosting, but they cannot generate satisfactory results when motion and saturation occur. To generate visually pleasing HDR images in various cases, we propose a hybrid HDR deghosting network, called HyHDRNet, to learn the complicated relationship between reference and non-reference images. The proposed HyHDRNet consists of a content alignment subnetwork and a Transformer-based fusion subnetwork. Specifically, to effectively avoid ghosting from the source, the content alignment subnetwork uses patch aggregation and ghost attention to integrate similar content from other non-reference images at the patch level and suppress undesired components at the pixel level. To achieve mutual guidance between the patch level and the pixel level, we leverage a gating module to sufficiently swap useful information in both ghosted and saturated regions. Furthermore, to obtain a high-quality HDR image, the Transformer-based fusion subnetwork uses a Residual Deformable Transformer Block (RDTB) to adaptively merge information for differently exposed regions. We examined the proposed method on four widely used public HDR image deghosting datasets. Experiments demonstrate that HyHDRNet outperforms state-of-the-art methods both quantitatively and qualitatively, achieving appealing HDR visualization with unified textures and colors.

Entropy-Based Energy Dissipation Analysis of Mobile Communication Systems

  • Authors: Litao Yan, Xiaohu Ge
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.06988
  • Pdf link: https://arxiv.org/pdf/2304.06988
  • Abstract
    Compared with the energy efficiency of conventional mobile communication systems, the energy efficiency of fifth generation (5G) communication systems has been improved more than 30 times. However, the energy consumption of 5G communication systems is 3 times the energy consumption of fourth generation (4G) communication systems, while wireless traffic has increased more than 100 times in the last decade. It is anticipated that the traffic of future sixth generation (6G) communication systems will keep growing exponentially in the next decade. A key question is how much room is left for improving the energy efficiency of mobile communication systems. To answer this question, an entropy-based energy dissipation model based on nonequilibrium thermodynamics is first proposed for mobile communication systems. Moreover, the theoretical minimal energy dissipation limits are derived for typical modulations in mobile communication systems. Simulation results show that the practical energy dissipation of information processing and information transmission is three and seven orders of magnitude away from the theoretical minimal energy dissipation limits in mobile communication systems, respectively. These results provide some guidelines for energy efficiency optimization in future mobile communication systems.
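
As a reference point for "how much room is left", the thermodynamic floor for irreversible information processing is the Landauer limit of k_B T ln 2 joules per bit; the quick calculation below uses this textbook bound, not the paper's modulation-specific derivation.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def landauer_joules_per_bit(temp_k: float = 300.0) -> float:
    """Minimum energy dissipated to erase one bit at temperature T."""
    return K_B * temp_k * math.log(2)

print(landauer_joules_per_bit())        # ~2.87e-21 J per bit at 300 K
print(landauer_joules_per_bit() * 1e9)  # ~2.87e-12 J for one gigabit
```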

LightRW: FPGA Accelerated Graph Dynamic Random Walks

  • Authors: Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, Weng-Fai Wong
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.07004
  • Pdf link: https://arxiv.org/pdf/2304.07004
  • Abstract
    Graph dynamic random walks (GDRWs) have recently emerged as a powerful paradigm for graph analytics and learning applications, including graph embedding and graph neural networks. Despite the fact that many existing studies optimize the performance of GDRWs on multi-core CPUs, massive random memory accesses and costly synchronizations cause severe resource underutilization, and the processing of GDRWs is usually the key performance bottleneck in many graph applications. This paper studies an alternative architecture, FPGA, to address these issues in GDRWs, as FPGA has the ability of hardware customization so that we are able to explore fine-grained pipeline execution and specialized memory access optimizations. Specifically, we propose LightRW, a novel FPGA-based accelerator for GDRWs. LightRW embraces a series of optimizations to enable fine-grained pipeline execution on the chip and to exploit the massive parallelism of FPGA while significantly reducing memory accesses. As current commonly used sampling methods in GDRWs do not efficiently support fine-grained pipeline execution, we develop a parallelized reservoir sampling method to sample multiple vertices per cycle for efficient pipeline execution. To address the random memory access issues, we propose a degree-aware configurable caching method that buffers hot vertices on-chip to alleviate random memory accesses and a dynamic burst access engine that efficiently retrieves neighbors. Experimental results show that our optimization techniques are able to improve the performance of GDRWs on FPGA significantly. Moreover, LightRW delivers up to 9.55x and 9.10x speedup over the state-of-the-art CPU-based MetaPath and Node2vec random walks, respectively. This work is open-sourced on GitHub at https://github.com/Xtra-Computing/LightRW.
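
Reservoir sampling, which LightRW parallelizes to emit multiple vertices per cycle, draws a uniform sample from a stream in a single pass; below is a scalar sketch of the classic Algorithm R (the hardware-parallel variant in the paper is more involved).

```python
import random

def reservoir_sample(stream, k: int):
    """Uniformly sample k items from a stream of unknown length in one pass."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randint(0, i)   # item survives with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

neighbors = range(10_000)                 # e.g. a vertex's neighbor list
print(reservoir_sample(neighbors, k=4))   # 4 uniformly chosen next-hop candidates
```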

Learning Graph ODE for Continuous-Time Sequential Recommendation

  • Authors: Yifang Qin, Wei Ju, Hongjun Wu, Xiao Luo, Ming Zhang
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.07042
  • Pdf link: https://arxiv.org/pdf/2304.07042
  • Abstract
    Sequential recommendation aims at understanding user preference by capturing successive behavior correlations, which are usually represented as the item purchasing sequences based on their past interactions. Existing efforts generally predict the next item via modeling the sequential patterns. Despite effectiveness, there exist two natural deficiencies: (i) user preference is dynamic in nature, and the evolution of collaborative signals is often ignored; and (ii) the observed interactions are often irregularly-sampled, while existing methods model item transitions assuming uniform intervals. Thus, how to effectively model and predict the underlying dynamics for user preference becomes a critical research problem. To tackle the above challenges, in this paper, we focus on continuous-time sequential recommendation and propose a principled graph ordinary differential equation framework named GDERec. Technically, GDERec is characterized by an autoregressive graph ordinary differential equation consisting of two components, which are parameterized by two tailored graph neural networks (GNNs) respectively to capture user preference from the perspective of hybrid dynamical systems. The two customized GNNs are trained alternately in an autoregressive manner to track the evolution of the underlying system from irregular observations, and thus learn effective representations of users and items beneficial to the sequential recommendation. Extensive experiments on five benchmark datasets demonstrate the superiority of our model over various state-of-the-art recommendation methods.

Nonlinear feedback stabilisation and stochastic disturbance suppression of actively Q-switched lasers

  • Authors: Lukas Tarra, Andreas Deutschmann-Olek, Andreas Kugi
  • Subjects: Systems and Control (eess.SY); Chaotic Dynamics (nlin.CD); Optics (physics.optics)
  • Arxiv link: https://arxiv.org/abs/2304.07075
  • Pdf link: https://arxiv.org/pdf/2304.07075
  • Abstract
    Actively Q-switched lasers are widely used tools which are required to produce stable output pulse energies for many applications. In this paper, a model-based control concept for actively Q-switched lasers is presented which stabilises their nonlinear pulse-to-pulse dynamics and rejects stochastic disturbances arising from amplified spontaneous emission. The feasibility of the control task is demonstrated to strongly depend on the design of the semi-active prelasing approach. In contrast to state-of-the-art hardware-based controllers, the proposed concept is flexible and cost-effective as it is not tailored to specific operation parameters.

A Dynamic Heterogeneous Team-based Non-iterative Approach for Online Pick-up and Just-In-Time Delivery Problems

  • Authors: Shridhar Velhal, Srikrishna B R, Mukunda Bharatheesha, Suresh Sundaram
  • Subjects: Multiagent Systems (cs.MA); Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.07124
  • Pdf link: https://arxiv.org/pdf/2304.07124
  • Abstract
    This paper presents a non-iterative approach for finding the assignment of heterogeneous robots to efficiently execute online Pickup and Just-In-Time Delivery (PJITD) tasks with optimal resource utilization. The PJITD assignment problem is formulated as a spatio-temporal multi-task assignment (STMTA) problem. The physical constraints on the map and vehicle dynamics are incorporated in the cost formulation. The linear sum assignment problem is formulated for the heterogeneous STMTA problem. The recently proposed Dynamic Resource Allocation with Multi-task assignments (DREAM) approach has been modified to solve the heterogeneous PJITD problem. At the start, it computes the minimum number of robots required (with their types) to execute the given heterogeneous PJITD tasks. These required robots are added to the team to guarantee the feasibility of all PJITD tasks. Then the robots in the updated team are assigned to execute the PJITD tasks while minimizing the total cost for the team to execute all PJITD tasks. The performance of the proposed non-iterative approach has been validated using high-fidelity software-in-the-loop simulations and hardware experiments. The simulation and experimental results clearly indicate that the proposed approach is scalable and provides optimal resource utilization.
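
The linear sum assignment subproblem mentioned above has standard off-the-shelf solvers; the example below uses SciPy's Hungarian-style solver on a random cost matrix (the paper builds its costs from map constraints and vehicle dynamics, which is omitted here).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.random.rand(5, 5)                   # cost[r, t]: robot r executes task t
rows, cols = linear_sum_assignment(cost)      # optimal one-to-one matching
print(list(zip(rows, cols)), cost[rows, cols].sum())
```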

Towards Controllable Diffusion Models via Reward-Guided Exploration

  • Authors: Hengtong Zhang, Tingyang Xu
  • Subjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM)
  • Arxiv link: https://arxiv.org/abs/2304.07132
  • Pdf link: https://arxiv.org/pdf/2304.07132
  • Abstract
    By formulating data samples' formation as a Markov denoising process, diffusion models achieve state-of-the-art performances in a collection of tasks. Recently, many variants of diffusion models have been proposed to enable controlled sample generation. Most of these existing methods either formulate the controlling information as an input (i.e., a conditional representation) for the noise approximator, or introduce a pre-trained classifier in the test phase to guide the Langevin dynamics towards the conditional goal. However, the former line of methods only works when the controlling information can be formulated as conditional representations, while the latter requires the pre-trained guidance classifier to be differentiable. In this paper, we propose a novel framework named RGDM (Reward-Guided Diffusion Model) that guides the training phase of diffusion models via reinforcement learning (RL). The proposed training framework bridges the objective of weighted log-likelihood and maximum entropy RL, which enables calculating policy gradients via samples from a pay-off distribution proportional to exponentially scaled rewards, rather than from policies themselves. Such a framework alleviates the high gradient variances and enables diffusion models to explore for highly rewarded samples in the reverse process. Experiments on 3D shape and molecule generation tasks show significant improvements over existing conditional diffusion models.

On Data Sampling Strategies for Training Neural Network Speech Separation Models

  • Authors: William Ravenscroft, Stefan Goetze, Thomas Hain
  • Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2304.07142
  • Pdf link: https://arxiv.org/pdf/2304.07142
  • Abstract
    Speech separation remains an important area of multi-speaker signal processing. Deep neural network (DNN) models have attained the best performance on many speech separation benchmarks. Some of these models can take significant time to train and have high memory requirements. Previous work has proposed shortening training examples to address these issues, but the impact of this on model performance is not yet well understood. In this work, the impact of applying these training signal length (TSL) limits is analysed for two speech separation models: SepFormer, a transformer model, and Conv-TasNet, a convolutional model. The WSJ0-2Mix, WHAMR and Libri2Mix datasets are analysed in terms of signal length distribution and its impact on training efficiency. It is demonstrated that, for specific distributions, applying specific TSL limits results in better performance. This is shown to be mainly due to randomly sampling the start index of the waveforms, resulting in more unique examples for training. A SepFormer model trained using a TSL limit of 4.42s and dynamic mixing (DM) is shown to match the best-performing SepFormer model trained with DM and unlimited signal lengths. Furthermore, the 4.42s TSL limit results in a 44% reduction in training time with WHAMR.
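
The mechanism credited for the gain, drawing a random start index when cropping each waveform to the TSL limit, fits in a few lines; the sketch below is illustrative, using the reported 4.42 s limit and an assumed 8 kHz sample rate.

```python
import numpy as np

def random_tsl_crop(wave: np.ndarray, tsl_s: float = 4.42, sr: int = 8000):
    """Crop a waveform to the training-signal-length limit, drawing the start
    index at random so each epoch sees a different, more unique segment."""
    limit = int(tsl_s * sr)
    if len(wave) <= limit:
        return wave
    start = np.random.randint(0, len(wave) - limit + 1)
    return wave[start:start + limit]

mixture = np.random.randn(8000 * 10)   # a 10 s mixture at 8 kHz
segment = random_tsl_crop(mixture)     # 4.42 s example with a random offset
print(segment.shape)
```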

FOCUS : A framework for energy system optimization from prosumer to district and city scale

  • Authors: Jingyu Gong, Yi Nie, Jonas van Ouwerkerk, Felix Wege, Mauricio Celi Cortés, Christoph von Oy, Jonas Brucksch, Christian Bußar, Thomas Schreiber, Dirk Uwe Sauer, Dirk Müller, Antonello Monti
  • Subjects: Systems and Control (eess.SY); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2304.07150
  • Pdf link: https://arxiv.org/pdf/2304.07150
  • Abstract
    Decarbonizing the energy sector is one of the main challenges to combat the climate crisis. Cities play an important role to reach climate neutrality as more than 70% of global CO2 emissions originate from urban areas. Decarbonization of energy supply systems can be achieved through various means, including the use of renewable energy sources, improving the efficiency of technologies, the coupling of different energy sectors, and the use of flexibility considering individual prosumer behaviour. This leads to an increasingly decentralized energy system, which is challenging to operate in a robust and cost-effective way. The evaluation of technologies and subsystems can only be done from the perspective of the system in which they are embedded, and it is highly dependent on their networking and application scenarios. Therefore, the design and operation of energy systems require adequate computation and evaluation tools, which offer a holistic view of all interconnected components. The currently available optimization tools have limitations, such as limited scope of technologies and sectors, high requirements on data, high computational cost, and difficulty in handling multi-objective optimization. To overcome these limitations, a software framework called FOCUS for the flexible and dynamic modeling of any urban sector-coupled energy system is developed. The framework includes a library containing models for different technologies and offers a variety of parameter sets for each technology. FOCUS can handle multi-objective problems by returning Pareto-optimal fronts, which helps users to discover the trade-off between criteria and objectives. The developed tool can identify new flexibility potentials in the energy system, actively support companies in the respective field to optimize urban energy system planning solutions, and determine possible threats to the stable operation of such systems.

Unsupervised Learning Optical Flow in Multi-frame Dynamic Environment Using Temporal Dynamic Modeling

  • Authors: Zitang Sun, Shin'ya Nishida, Zhengbo Luo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.07159
  • Pdf link: https://arxiv.org/pdf/2304.07159
  • Abstract
    For visual estimation of optical flow, a crucial function for many vision tasks, unsupervised learning using the supervision of view synthesis has emerged as a promising alternative to supervised methods, since ground-truth flow is not readily available in many cases. However, unsupervised learning is likely to be unstable when pixel tracking is lost due to occlusion and motion blur, or when pixel matching is impaired due to variation in image content and spatial structure over time. In natural environments, dynamic occlusion or object variation is a relatively slow temporal process spanning several frames. We therefore explore optical flow estimation from multiple-frame sequences of dynamic scenes, whereas most existing unsupervised approaches are based on temporally static models. We handle unsupervised optical flow estimation with a temporal dynamic model by introducing a spatial-temporal dual recurrent block based on the predictive coding structure, which feeds the previous high-level motion prior to the current optical flow estimator. Assuming temporal smoothness of optical flow, we use motion priors of the adjacent frames to provide more reliable supervision of the occluded regions. To grasp the essence of challenging scenes, we simulate various scenarios across long sequences, including dynamic occlusion, content variation, and spatial variation, and adopt self-supervised distillation to make the model understand the object's motion patterns in a prolonged dynamic environment. Experiments on the KITTI 2012, KITTI 2015, Sintel Clean, and Sintel Final datasets demonstrate the effectiveness of our methods for unsupervised optical flow estimation. The proposed approach achieves state-of-the-art performance with advantages in memory overhead.

A Comparative Study on Generative Models for High Resolution Solar Observation Imaging

  • Authors: Mehdi Cherti, Alexander Czernik, Stefan Kesselheim, Frederic Effenberger, Jenia Jitsev
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.07169
  • Pdf link: https://arxiv.org/pdf/2304.07169
  • Abstract
    Solar activity is one of the main drivers of variability in our solar system and the key source of space weather phenomena that affect Earth and near Earth space. The extensive record of high resolution extreme ultraviolet (EUV) observations from the Solar Dynamics Observatory (SDO) offers an unprecedented, very large dataset of solar images. In this work, we make use of this comprehensive dataset to investigate capabilities of current state-of-the-art generative models to accurately capture the data distribution behind the observed solar activity states. Starting from StyleGAN-based methods, we uncover severe deficits of this model family in handling fine-scale details of solar images when training on high resolution samples, contrary to training on natural face images. When switching to the diffusion-based generative model family, we observe strong improvements in fine-scale detail generation. For the GAN family, we are able to achieve similar improvements in fine-scale generation when turning to ProjectedGANs, which use multi-scale discriminators with a pre-trained frozen feature extractor. We conduct ablation studies to clarify mechanisms responsible for proper fine-scale handling. Using distributed training on supercomputers, we are able to train generative models for up to 1024x1024 resolution that produce high-quality samples which human experts find indistinguishable from real observations, as suggested by the evaluation we conduct. We make all code, models and workflows used in this study publicly available at \url{https://github.com/SLAMPAI/generative-models-for-highres-solar-images}.

EV-Catcher: High-Speed Object Catching Using Low-latency Event-based Neural Networks

  • Authors: Ziyun Wang, Fernando Cladera Ojeda, Anthony Bisulco, Daewon Lee, Camillo J. Taylor, Kostas Daniilidis, M. Ani Hsieh, Daniel D. Lee, Volkan Isler
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.07200
  • Pdf link: https://arxiv.org/pdf/2304.07200
  • Abstract
    Event-based sensors have recently drawn increasing interest in robotic perception due to their lower latency, higher dynamic range, and lower bandwidth requirements compared to standard CMOS-based imagers. These properties make them ideal tools for real-time perception tasks in highly dynamic environments. In this work, we demonstrate an application where event cameras excel: accurately estimating the impact location of fast-moving objects. We introduce a lightweight event representation called Binary Event History Image (BEHI) to encode event data at low latency, as well as a learning-based approach that allows real-time inference of a confidence-enabled control signal to the robot. To validate our approach, we present an experimental catching system in which we catch fast-flying ping-pong balls. We show that the system is capable of achieving a success rate of 81% in catching balls targeted at different locations, with a velocity of up to 13 m/s even on compute-constrained embedded platforms such as the Nvidia Jetson NX.
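
    A simplified sketch of the Binary Event History Image idea: mark every pixel that fired at least one event within a time window. The exact construction in the paper may differ; array shapes below are illustrative.

```python
# Simplified sketch of a Binary Event History Image (BEHI): set a pixel
# to 1 if any event occurred there within the window. This illustrates
# the idea only; the paper's exact construction may differ.
import numpy as np

def behi(events: np.ndarray, height: int, width: int) -> np.ndarray:
    """events: (N, 2) integer array of (x, y) pixel coordinates."""
    img = np.zeros((height, width), dtype=np.uint8)
    img[events[:, 1], events[:, 0]] = 1
    return img

rng = np.random.default_rng(1)
events = np.column_stack([rng.integers(0, 64, 200),   # x coordinates
                          rng.integers(0, 48, 200)])  # y coordinates
print(behi(events, height=48, width=64).sum(), "active pixels")
```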

Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models

  • Authors: Yaohua Zha, Jinpeng Wang, Tao Dai, Bin Chen, Zhi Wang, Shu-Tao Xia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.07221
  • Pdf link: https://arxiv.org/pdf/2304.07221
  • Abstract
    Recently, pre-trained point cloud models have found extensive applications in downstream tasks like object classification. However, these tasks often require full fine-tuning of models and lead to storage-intensive procedures, thus limiting the real applications of pre-trained models. Inspired by the great success of visual prompt tuning (VPT) in vision, we explore prompt tuning, which serves as an efficient alternative to full fine-tuning for large-scale models, for point cloud pre-trained models to reduce storage costs. However, it is non-trivial to apply the traditional static VPT to point clouds, owing to the distribution diversity of point cloud data. For instance, scanned point clouds exhibit various types of missing or noisy points. To address this issue, we propose Instance-aware Dynamic Prompt Tuning (IDPT) for point cloud pre-trained models, which utilizes a prompt module to perceive the semantic prior features of each instance. This semantic prior facilitates the learning of unique prompts for each instance, thus enabling downstream tasks to robustly adapt to pre-trained point cloud models. Notably, extensive experiments conducted on downstream tasks demonstrate that IDPT outperforms full fine-tuning in most tasks with a mere 7% of the trainable parameters, thus significantly reducing the storage pressure. Code is available at \url{https://github.com/zyh16143998882/IDPT}.

Dynamic Mobile-Former: Strengthening Dynamic Convolution with Attention and Residual Connection in Kernel Space

  • Authors: Seokju Yun, Youngmin Ro
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.07254
  • Pdf link: https://arxiv.org/pdf/2304.07254
  • Abstract
    We introduce Dynamic Mobile-Former (DMF), which maximizes the capabilities of dynamic convolution by harmonizing it with efficient operators. Our Dynamic Mobile-Former effectively utilizes the advantages of Dynamic MobileNet (MobileNet equipped with dynamic convolution) using global information from lightweight attention. The Transformer in Dynamic Mobile-Former only requires a few randomly initialized tokens to calculate global features, making it computationally efficient. A bridge between Dynamic MobileNet and the Transformer allows for bidirectional integration of local and global features. We also simplify the optimization process of vanilla dynamic convolution by splitting the convolution kernel into an input-agnostic kernel and an input-dependent kernel. This allows for optimization in a wider kernel space, resulting in enhanced capacity. By integrating lightweight attention and enhanced dynamic convolution, our Dynamic Mobile-Former achieves not only high efficiency but also strong performance. We benchmark Dynamic Mobile-Former on a series of vision tasks and show that it achieves impressive performance on image classification, COCO detection, and instance segmentation. For example, our DMF reaches a top-1 accuracy of 79.4% on ImageNet-1K, higher than PVT-Tiny by 4.3% with only 1/4 of the FLOPs. Additionally, our proposed DMF-S model performs well on challenging vision datasets such as COCO, achieving 39.0% mAP, which is 1% higher than that of the Mobile-Former 508M model, despite using 3 GFLOPs fewer computations. Code and models are available at https://github.com/ysj9909/DMF
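
    A hedged PyTorch sketch of the kernel-splitting idea described above: an input-agnostic shared kernel combined with an input-dependent, per-channel modulation predicted from the input. Layer sizes and the gating design are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of splitting a dynamic convolution into an input-agnostic kernel
# and an input-dependent part. Here the input-dependent part is a
# per-channel scale, equivalent to rescaling output-channel kernels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitDynamicConv(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        k = kernel_size
        # Input-agnostic kernel: shared across all inputs.
        self.static_weight = nn.Parameter(
            torch.randn(channels, channels, k, k) * 0.02)
        # Input-dependent part: per-channel scale predicted from the input.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels), nn.Sigmoid())
        self.padding = k // 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = self.gate(x)                        # (B, C)
        out = F.conv2d(x, self.static_weight, padding=self.padding)
        return out * scale[:, :, None, None]        # input-dependent modulation

x = torch.randn(2, 16, 32, 32)
print(SplitDynamicConv(16)(x).shape)  # torch.Size([2, 16, 32, 32])
```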

New submissions for Fri, 14 Apr 23

Keyword: efficient

RELS-DQN: A Robust and Efficient Local Search Framework for Combinatorial Optimization

  • Authors: Yuanhang Shao, Tonmoy Dey, Nikola Vuckovic, Luke Van Popering, Alan Kuhnle
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06048
  • Pdf link: https://arxiv.org/pdf/2304.06048
  • Abstract
    Combinatorial optimization (CO) aims to efficiently find the best solution to NP-hard problems ranging from statistical physics to social media marketing. A wide range of CO applications can benefit from local search methods because, unlike greedy policies, they allow reversible actions. Deep Q-learning (DQN) using message-passing neural networks (MPNN) has shown promise in replicating local search behavior and obtaining results comparable to local search algorithms. However, over-smoothing and information loss during the iterations of message passing limit its robustness across applications, and the large message vectors result in memory inefficiency. Our paper introduces RELS-DQN, a lightweight DQN framework that exhibits local search behavior while providing practical scalability. A RELS-DQN model trained on one application can generalize to various applications by providing solution values higher than or equal to both the local search algorithms and the existing DQN models, while remaining efficient in runtime and memory.

Exploiting Symmetry and Heuristic Demonstrations in Off-policy Reinforcement Learning for Robotic Manipulation

  • Authors: Amir M. Soufi Enayati, Zengjie Zhang, Kashish Gupta, Homayoun Najjaran
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06055
  • Pdf link: https://arxiv.org/pdf/2304.06055
  • Abstract
    Reinforcement learning demonstrates significant potential in automatically building control policies in numerous domains, but shows low efficiency when applied to robot manipulation tasks due to the curse of dimensionality. To facilitate the learning of such tasks, prior knowledge or heuristics that incorporate inherent simplification can effectively improve the learning performance. This paper aims to define and incorporate the natural symmetry present in physical robotic environments. Then, sample-efficient policies are trained by exploiting expert demonstrations in symmetrical environments through an amalgamation of reinforcement learning and behavior cloning, which gives the off-policy learning process a diverse yet compact initiation. Furthermore, it presents a rigorous framework for a recent concept and explores its scope for robot manipulation tasks. The proposed method is validated via two point-to-point reaching tasks of an industrial arm, with and without an obstacle, in a simulation experiment study. A PID controller, which tracks the linear joint-space trajectories with hard-coded temporal logic to produce interim midpoints, is used to generate demonstrations in the study. The results of the study present the effect of the number of demonstrations and quantify the magnitude of behavior cloning to exemplify the possible improvement of model-free reinforcement learning in common manipulation tasks. A comparison study between the proposed method and a traditional off-policy reinforcement learning algorithm indicates its advantage in learning performance and potential value for applications.

Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays

  • Authors: Chen Xie, Francesco Daghero, Yukai Chen, Marco Castellano, Luca Gandolfi, Andrea Calimera, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06059
  • Pdf link: https://arxiv.org/pdf/2304.06059
  • Abstract
    Ultra-low-resolution Infrared (IR) array sensors offer a low-cost, energy-efficient, and privacy-preserving solution for people counting, with applications such as occupancy monitoring. Previous work has shown that Deep Learning (DL) can yield superior performance on this task. However, the literature was missing an extensive comparative analysis of various efficient DL architectures for IR array-based people counting, that considers not only their accuracy, but also the cost of deploying them on memory- and energy-constrained Internet of Things (IoT) edge nodes. In this work, we address this need by comparing 6 different DL architectures on a novel dataset composed of IR images collected from a commercial 8x8 array, which we made openly available. With a wide architectural exploration of each model type, we obtain a rich set of Pareto-optimal solutions, spanning cross-validated balanced accuracy scores in the 55.70-82.70% range. When deployed on a commercial Microcontroller (MCU) by STMicroelectronics, the STM32L4A6ZG, these models occupy 0.41-9.28kB of memory, and require 1.10-7.74ms per inference, while consuming 17.18-120.43 $\mu$J of energy. Our models are significantly more accurate than a previous deterministic method (up to +39.9%), while being up to 3.53x faster and more energy efficient. Further, our models' accuracy is comparable to state-of-the-art DL solutions on similar resolution sensors, despite a much lower complexity. All our models enable continuous, real-time inference on a MCU-based IoT node, with years of autonomous operation without battery recharging.

Energy-guided Entropic Neural Optimal Transport

  • Authors: Petr Mokrov, Alexander Korotin, Evgeny Burnaev
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.06094
  • Pdf link: https://arxiv.org/pdf/2304.06094
  • Abstract
    Energy-Based Models (EBMs) have been known in the Machine Learning community for decades. Since the seminal works on EBMs dating back to the noughties, many efficient methods have appeared that solve the generative modelling problem by means of energy potentials (unnormalized likelihood functions). In contrast, the realm of Optimal Transport (OT), and in particular neural OT solvers, is much less explored and limited to a few recent works (excluding WGAN-based approaches which utilize OT as a loss function and do not model OT maps themselves). In our work, we bridge the gap between EBMs and entropy-regularized OT. We present a novel methodology which allows utilizing the recent developments and technical improvements of the former in order to enrich the latter. We validate the applicability of our method on toy 2D scenarios as well as standard unpaired image-to-image translation problems. For the sake of simplicity, we choose simple short- and long-run EBMs as the backbone of our energy-guided entropic OT method, leaving the application of more sophisticated EBMs for future research.

IoT trust and reputation: a survey and taxonomy

  • Authors: Muhammad Aaqib, Aftab Ali, Liming Chen, Omar Nibouche
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06119
  • Pdf link: https://arxiv.org/pdf/2304.06119
  • Abstract
    IoT is one of the fastest-growing technologies, and it is estimated that more than a billion devices will be in use across the globe by the end of 2030. To maximize the capability of these connected entities, trust and reputation among IoT entities is essential. Several trust management models have been proposed in the IoT environment; however, these schemes have not fully addressed the features of IoT devices, such as device role, device type, and dynamic behavior in a smart environment. As a result, traditional trust and reputation models are insufficient to tackle these characteristics and uncertainty risks while connecting nodes to the network. Whilst continuous study has been carried out and various articles suggest promising solutions in constrained environments, research on trust and reputation is still in its infancy. In this paper, we carry out a comprehensive literature review of state-of-the-art research on the trust and reputation of IoT devices and systems. Specifically, we first propose a new structure, namely a new taxonomy, to organize trust and reputation models based on the ways trust is managed. The proposed taxonomy comprises traditional trust management-based systems and artificial intelligence-based systems, and combines both classes, which encourages existing schemes to adapt these emerging concepts. This collaboration between conventional mathematical and advanced ML models results in design schemes that are more robust and efficient. Then we drill down to compare and analyse the methods and applications of these systems based on community-accepted performance metrics, e.g. scalability, delay, cooperativeness and efficiency. Finally, building upon the findings of the analysis, we identify and discuss open research issues and challenges, and further speculate on and point out future research directions.

Label-Free Concept Bottleneck Models

  • Authors: Tuomas Oikarinen, Subhro Das, Lam M. Nguyen, Tsui-Wei Weng
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06129
  • Pdf link: https://arxiv.org/pdf/2304.06129
  • Abstract
    Concept bottleneck models (CBM) are a popular way of creating more interpretable neural networks by having hidden layer neurons correspond to human-understandable concepts. However, existing CBMs and their variants have two crucial limitations: first, they need to collect labeled data for each of the predefined concepts, which is time consuming and labor intensive; second, the accuracy of a CBM is often significantly lower than that of a standard neural network, especially on more complex datasets. This poor performance creates a barrier for adopting CBMs in practical real world applications. Motivated by these challenges, we propose Label-free CBM which is a novel framework to transform any neural network into an interpretable CBM without labeled concept data, while retaining a high accuracy. Our Label-free CBM has many advantages, it is: scalable - we present the first CBM scaled to ImageNet, efficient - creating a CBM takes only a few hours even for very large datasets, and automated - training it for a new dataset requires minimal human effort. Our code is available at https://github.com/Trustworthy-ML-Lab/Label-free-CBM.

AGI for Agriculture

  • Authors: Guoyu Lu, Sheng Li, Gengchen Mai, Jin Sun, Dajiang Zhu, Lilong Chai, Haijian Sun, Xianqiao Wang, Haixing Dai, Ninghao Liu, Rui Xu, Daniel Petti, Changying Li, Tianming Liu, Changying Li
  • Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06136
  • Pdf link: https://arxiv.org/pdf/2304.06136
  • Abstract
    Artificial General Intelligence (AGI) is poised to revolutionize a variety of sectors, including healthcare, finance, transportation, and education. Within healthcare, AGI is being utilized to analyze clinical medical notes, recognize patterns in patient data, and aid in patient management. Agriculture is another critical sector that impacts the lives of individuals worldwide. It serves as a foundation for providing food, fiber, and fuel, yet faces several challenges, such as climate change, soil degradation, water scarcity, and food security. AGI has the potential to tackle these issues by enhancing crop yields, reducing waste, and promoting sustainable farming practices. It can also help farmers make informed decisions by leveraging real-time data, leading to more efficient and effective farm management. This paper delves into the potential future applications of AGI in agriculture, such as agriculture image processing, natural language processing (NLP), robotics, knowledge graphs, and infrastructure, and their impact on precision livestock and precision crops. By leveraging the power of AGI, these emerging technologies can provide farmers with actionable insights, allowing for optimized decision-making and increased productivity. The transformative potential of AGI in agriculture is vast, and this paper aims to highlight its potential to revolutionize the industry.

Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

  • Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.06178
  • Pdf link: https://arxiv.org/pdf/2304.06178
  • Abstract
    Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of the voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data, while maintaining computational efficiency, and is substantially faster than the baseline method NeuralRGBD.

SePEnTra: A secure and privacy-preserving energy trading mechanisms in transactive energy market

  • Authors: Rumpa Dasgupta, Amin Sakzad, Carsten Rudolph, Rafael Dowsley
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.06179
  • Pdf link: https://arxiv.org/pdf/2304.06179
  • Abstract
    In this paper, we design and present a novel model called SePEnTra to ensure the security and privacy of energy data while it is shared with other entities during energy trading to determine optimal price signals. Furthermore, the market operator can use this data to detect malicious activities of users at a later stage without violating privacy (e.g., deviation of actual energy generation/consumption from the forecast beyond a threshold). We use two cryptographic primitives, additive secret sharing and Pedersen commitments, in SePEnTra. The performance of our model is evaluated theoretically and numerically. We compare the performance of SePEnTra with the same transactive energy market (TEM) framework without security mechanisms. The results show that, even though advanced cryptographic primitives are used in a large market framework, SePEnTra has very low computational complexity and communication overhead. Moreover, it is storage-efficient for all parties.
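
    For illustration, a minimal sketch of additive secret sharing, one of the two cryptographic primitives named above (Pedersen commitments are not shown); the modulus, party count, and meter value are illustrative.

```python
# Minimal sketch of additive secret sharing over the integers mod P.
# Any strict subset of shares reveals nothing about the secret; only
# the sum of all shares reconstructs it.
import secrets

P = 2**61 - 1  # a Mersenne prime, used as the modulus

def share(secret: int, n_parties: int) -> list[int]:
    """Split `secret` into n additive shares that sum to it mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % P

energy_reading = 1234  # e.g. a metered value in watt-hours (illustrative)
shares = share(energy_reading, n_parties=3)
assert reconstruct(shares) == energy_reading
print(shares)
```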

SURFSUP: Learning Fluid Simulation for Novel Surfaces

  • Authors: Arjun Mani, Ishaan Preetam Chandratreya, Elliot Creager, Carl Vondrick, Richard Zemel
  • Subjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)
  • Arxiv link: https://arxiv.org/abs/2304.06197
  • Pdf link: https://arxiv.org/pdf/2304.06197
  • Abstract
    Modeling the mechanics of fluid in complex scenes is vital to applications in design, graphics, and robotics. Learning-based methods provide fast and differentiable fluid simulators; however, most prior work is unable to accurately model how fluids interact with genuinely novel surfaces not seen during training. We introduce SURFSUP, a framework that represents objects implicitly using signed distance functions (SDFs), rather than an explicit representation of meshes or particles. This continuous representation of geometry enables more accurate simulation of fluid-object interactions over long time periods while simultaneously making computation more efficient. Moreover, SURFSUP trained on simple shape primitives generalizes considerably out-of-distribution, even to complex real-world scenes and objects. Finally, we show we can invert our model to design simple objects that manipulate fluid flow.
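
    A minimal sketch of the implicit geometry representation: a signed distance function returns negative values inside an object, zero on its surface, and positive values outside. A sphere is used for illustration; SURFSUP itself learns fluid dynamics around such representations.

```python
# Sketch of a signed distance function (SDF) for a sphere: the implicit
# object representation used in place of meshes or particles.
import numpy as np

def sphere_sdf(points: np.ndarray, center: np.ndarray, radius: float) -> np.ndarray:
    """points: (N, 3) query positions -> (N,) signed distances."""
    return np.linalg.norm(points - center, axis=1) - radius

pts = np.array([[0.0, 0.0, 0.0],   # inside
                [1.0, 0.0, 0.0],   # on the surface
                [2.0, 0.0, 0.0]])  # outside
print(sphere_sdf(pts, center=np.zeros(3), radius=1.0))  # [-1.  0.  1.]
```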

Space-Time Tradeoffs for Conjunctive Queries with Access Patterns

  • Authors: Hangdong Zhao, Shaleen Deep, Paraschos Koutris
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.06221
  • Pdf link: https://arxiv.org/pdf/2304.06221
  • Abstract
    In this paper, we investigate space-time tradeoffs for answering conjunctive queries with access patterns (CQAPs). The goal is to create a space-efficient data structure in an initial preprocessing phase and use it for answering (multiple) queries in an online phase. Previous work has developed data structures that trade off space usage for answering time for queries of practical interest, such as the path and triangle queries. However, these approaches lack a comprehensive framework and are not generalizable. Our main contribution is a general algorithmic framework for obtaining space-time tradeoffs for any CQAP. Our framework builds upon the PANDA algorithm and tree decomposition techniques. We demonstrate that our framework captures all state-of-the-art tradeoffs that were independently produced for various queries. Further, we show surprising improvements over the state-of-the-art tradeoffs known in the existing literature for reachability queries.

Improving Segmentation of Objects with Varying Sizes in Biomedical Images using Instance-wise and Center-of-Instance Segmentation Loss Function

  • Authors: Muhammad Febrian Rachmadi, Charissa Poon, Henrik Skibbe
  • Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06229
  • Pdf link: https://arxiv.org/pdf/2304.06229
  • Abstract
    In this paper, we propose a novel two-component loss for biomedical image segmentation tasks called the Instance-wise and Center-of-Instance (ICI) loss, which addresses the instance imbalance problem commonly encountered when using pixel-wise loss functions such as the Dice loss. The Instance-wise component improves the detection of small instances, or "blobs", in image datasets with both large and small instances. The Center-of-Instance component improves the overall detection accuracy. We compared the ICI loss with two existing losses, the Dice loss and the blob loss, on the task of stroke lesion segmentation using the ATLAS R2.0 challenge dataset from MICCAI 2022. Compared to the other losses, the ICI loss provided better-balanced segmentation, and significantly outperformed the Dice loss with an improvement of 1.7-3.7% and the blob loss by 0.6-5.0% in terms of the Dice similarity coefficient on both the validation and test sets, suggesting that the ICI loss is a potential solution to the instance imbalance problem.
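
    A simplified sketch of the instance-wise idea: label the connected components of the ground truth and average a Dice score per instance, so small blobs weigh as much as large ones. This illustrates the concept only; it is not the paper's exact ICI loss.

```python
# Per-instance Dice: each connected component of the ground truth
# contributes equally to the average, so missing a tiny blob is
# penalized as heavily as missing a large one.
import numpy as np
from scipy.ndimage import label

def dice(a: np.ndarray, b: np.ndarray, eps: float = 1e-6) -> float:
    inter = np.logical_and(a, b).sum()
    return (2 * inter + eps) / (a.sum() + b.sum() + eps)

def instance_wise_dice(pred: np.ndarray, target: np.ndarray) -> float:
    instances, n = label(target)          # connected components of GT
    if n == 0:
        return 1.0
    pred_bin = pred > 0.5
    return float(np.mean([dice(pred_bin, instances == i)
                          for i in range(1, n + 1)]))

target = np.zeros((8, 8), dtype=bool)
target[1, 1] = True            # tiny instance
target[4:8, 4:8] = True        # large instance
pred = target.astype(float)
pred[1, 1] = 0.0               # miss only the tiny blob
print(instance_wise_dice(pred, target))  # ~0.5: heavy penalty
```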

Physics-informed radial basis network (PIRBN): A local approximation neural network for solving nonlinear PDEs

  • Authors: Jinshuai Bai, Gui-Rong Liu, Ashish Gupta, Laith Alzubaidi, Xi-Qiao Feng, YuanTong Gu
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06234
  • Pdf link: https://arxiv.org/pdf/2304.06234
  • Abstract
    Our recent intensive study has found that physics-informed neural networks (PINNs) tend to be local approximators after training. This observation leads to the novel physics-informed radial basis network (PIRBN), which can maintain the local property throughout the entire training process. In contrast to deep neural networks, a PIRBN comprises only one hidden layer and a radial basis "activation" function. Under appropriate conditions, we demonstrate that the training of PIRBNs using gradient descent methods can converge to Gaussian processes. Besides, we study the training dynamics of PIRBN via neural tangent kernel (NTK) theory. In addition, comprehensive investigations regarding the initialisation strategies of PIRBN were conducted. Based on numerical examples, PIRBN has been demonstrated to be more effective and efficient than PINN in solving PDEs with high-frequency features and ill-posed computational domains. Moreover, existing PINN numerical techniques, such as adaptive learning, decomposition and different types of loss functions, are applicable to PIRBN. The programs that can regenerate all numerical results can be found at https://github.com/JinshuaiBai/PIRBN.
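
    A minimal sketch of a single-hidden-layer radial basis network of the kind described above: Gaussian RBF units followed by a linear readout, fit here to a high-frequency 1-D target. Centers, widths, and the target function are illustrative, and the physics-informed loss is not shown.

```python
# One hidden layer of Gaussian RBF units plus a linear readout. The
# local support of each unit is what gives the network its local
# approximation property.
import numpy as np

def rbf_features(x: np.ndarray, centers: np.ndarray, width: float) -> np.ndarray:
    """x: (N, 1), centers: (M, 1) -> (N, M) Gaussian activations."""
    return np.exp(-((x - centers.T) ** 2) / (2 * width ** 2))

x = np.linspace(0, 1, 200)[:, None]
y = np.sin(8 * np.pi * x)                  # a high-frequency target
centers = np.linspace(0, 1, 40)[:, None]   # local basis centers
Phi = rbf_features(x, centers, width=0.03)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # fit the linear readout
print("max abs error:", np.abs(Phi @ w - y).max())
```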

Cross-View Hierarchy Network for Stereo Image Super-Resolution

  • Authors: Wenbin Zou, Hongxia Gao, Liang Chen, Yunchen Zhang, Mingchao Jiang, Zhongxin Yu, Ming Tan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06236
  • Pdf link: https://arxiv.org/pdf/2304.06236
  • Abstract
    Stereo image super-resolution aims to reconstruct high-quality, high-resolution stereo image pairs by exploiting complementary information across views. To attain superior performance, many methods have prioritized designing complex modules to fuse similar information across views, yet overlook the importance of intra-view information for high-resolution reconstruction, which leads to incorrect textures in the recovered images. To address this issue, we explore the interdependencies between various hierarchies within a view and propose a novel method, named Cross-View-Hierarchy Network for Stereo Image Super-Resolution (CVHSSR). Specifically, we design a cross-hierarchy information mining block (CHIMB) that leverages channel attention and large-kernel convolution attention to extract both global and local features from the intra-view, enabling the efficient restoration of accurate texture details. Additionally, a cross-view interaction module (CVIM) is proposed to fuse similar features from different views by utilizing cross-view attention mechanisms, effectively adapting to the binocular scene. Extensive experiments demonstrate the effectiveness of our method. CVHSSR achieves better stereo image super-resolution performance than other state-of-the-art methods while using fewer parameters. The source code and pre-trained models are available at https://github.com/AlexZou14/CVHSSR.

EWT: Efficient Wavelet-Transformer for Single Image Denoising

  • Authors: Juncheng Li, Bodong Cheng, Ying Chen, Guangwei Gao, Tieyong Zeng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06274
  • Pdf link: https://arxiv.org/pdf/2304.06274
  • Abstract
    Transformer-based image denoising methods have achieved encouraging results in the past year. However, they must use linear operations to model long-range dependencies, which greatly increases model inference time and consumes GPU storage space. Compared with convolutional neural network-based methods, current Transformer-based image denoising methods cannot achieve a balance between performance improvement and resource consumption. In this paper, we propose an Efficient Wavelet Transformer (EWT) for image denoising. Specifically, we use the Discrete Wavelet Transform (DWT) and Inverse Wavelet Transform (IWT) for downsampling and upsampling, respectively. This method can fully preserve the image features while reducing the image resolution, thereby greatly reducing the device resource consumption of the Transformer model. Furthermore, we propose a novel Dual-stream Feature Extraction Block (DFEB) to extract image features at different levels, which can further reduce model inference time and GPU memory usage. Experiments show that our method speeds up the original Transformer by more than 80%, reduces GPU memory usage by more than 60%, and achieves excellent denoising results. All code will be made public.
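
    A minimal sketch of the wavelet down/upsampling idea using PyWavelets: a 2-D DWT halves the spatial resolution into four subbands, and the inverse transform restores the image exactly. The wavelet choice and image are illustrative.

```python
# A 2-D discrete wavelet transform splits an image into four half-
# resolution subbands; the inverse transform reconstructs it exactly,
# so downsampling this way loses no information.
import numpy as np
import pywt

img = np.random.rand(64, 64)
cA, (cH, cV, cD) = pywt.dwt2(img, 'haar')    # each subband is 32x32
print(cA.shape)                               # (32, 32)

recon = pywt.idwt2((cA, (cH, cV, cD)), 'haar')
print(np.allclose(recon, img))                # True: lossless round trip
```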

Optimizing Multi-Domain Performance with Active Learning-based Improvement Strategies

  • Authors: Anand Gokul Mahalingam, Aayush Shah, Akshay Gulati, Royston Mascarenhas, Rakshitha Panduranga
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06277
  • Pdf link: https://arxiv.org/pdf/2304.06277
  • Abstract
    Improving performance in multiple domains is a challenging task, and often requires significant amounts of data to train and test models. Active learning techniques provide a promising solution by enabling models to select the most informative samples for labeling, thus reducing the amount of labeled data required to achieve high performance. In this paper, we present an active learning-based framework for improving performance across multiple domains. Our approach consists of two stages: first, we use an initial set of labeled data to train a base model, and then we iteratively select the most informative samples for labeling to refine the model. We evaluate our approach on several multi-domain datasets, including image classification, sentiment analysis, and object recognition. Our experiments demonstrate that our approach consistently outperforms baseline methods and achieves state-of-the-art performance on several datasets. We also show that our method is highly efficient, requiring significantly fewer labeled samples than other active learning-based methods. Overall, our approach provides a practical and effective solution for improving performance across multiple domains using active learning techniques.
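
    A minimal sketch of such an iterative selection loop, using least-confidence uncertainty sampling with scikit-learn; the dataset, model, and query budget are illustrative stand-ins for the paper's setup.

```python
# Active learning loop: train on a small labeled pool, then repeatedly
# query the most uncertain unlabeled samples and retrain.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
labeled = list(range(20))                 # small initial labeled pool
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for round_ in range(5):
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    uncertainty = 1.0 - proba.max(axis=1)     # least-confidence score
    pick = np.argsort(uncertainty)[-10:]      # 10 most uncertain samples
    newly = [pool[i] for i in pick]
    labeled += newly                          # "query the oracle"
    pool = [i for i in pool if i not in newly]
    # Accuracy on all data, for illustration only.
    print(f"round {round_}: accuracy {model.score(X, y):.3f}")
```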

Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning

  • Authors: Wenli Xiao, Yiwei Lyu, John Dolan
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06281
  • Pdf link: https://arxiv.org/pdf/2304.06281
  • Abstract
    Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases. Although shielding with Linear Temporal Logic (LTL) is a promising formal method to ensure safety in single-agent Reinforcement Learning (RL), it results in conservative behaviors when scaling to multi-agent scenarios. Additionally, it poses computational challenges for synthesizing shields in complex multi-agent environments. This work introduces Model-based Dynamic Shielding (MBDS) to support MARL algorithm design. Our algorithm synthesizes distributive shields, which are reactive systems running in parallel with each MARL agent, to monitor and rectify unsafe behaviors. The shields can dynamically split, merge, and recompute based on agents' states. This design enables efficient synthesis of shields to monitor agents in complex environments without coordination overheads. We also propose an algorithm to synthesize shields without prior knowledge of the dynamics model. The proposed algorithm obtains an approximate world model by interacting with the environment during the early stage of exploration, making our MBDS enjoy formal safety guarantees with high probability. We demonstrate in simulations that our framework can surpass existing baselines in terms of safety guarantees and learning performance.

ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis

  • Authors: Hongchen Tan, Baocai Yin, Kun Wei, Xiuping Liu, Xin Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06297
  • Pdf link: https://arxiv.org/pdf/2304.06297
  • Abstract
    We propose a novel text-to-image generation network, the Adaptive Layout Refinement Generative Adversarial Network (ALR-GAN), to adaptively refine the layout of synthesized images without any auxiliary information. The ALR-GAN includes an Adaptive Layout Refinement (ALR) module and a Layout Visual Refinement (LVR) loss. The ALR module aligns the layout structure (which refers to the locations of objects and background) of a synthesized image with that of its corresponding real image. In the ALR module, we propose an Adaptive Layout Refinement (ALR) loss to balance the matching of hard and easy features, for more efficient layout structure matching. Based on the refined layout structure, the LVR loss further refines the visual representation within the layout area. Experimental results on two widely-used datasets show that ALR-GAN performs competitively on the text-to-image generation task.

Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution

  • Authors: Zhuo Su, Jiehua Zhang, Tianpeng Liu, Zhen Liu, Shuanghui Zhang, Matti Pietikäinen, Li Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06305
  • Pdf link: https://arxiv.org/pdf/2304.06305
  • Abstract
    This paper proposes a novel module called middle spectrum grouped convolution (MSGC) for efficient deep convolutional neural networks (DCNNs) with the mechanism of grouped convolution. It explores the broad "middle spectrum" area between channel pruning and conventional grouped convolution. Compared with channel pruning, MSGC can retain most of the information from the input feature maps due to the group mechanism; compared with grouped convolution, MSGC benefits from the learnability, the core of channel pruning, for constructing its group topology, leading to better channel division. The middle spectrum area is unfolded along four dimensions: group-wise, layer-wise, sample-wise, and attention-wise, making it possible to reveal more powerful and interpretable structures. As a result, the proposed module acts as a booster that can reduce the computational cost of the host backbones for general image recognition with even improved predictive accuracy. For example, in the experiments on ImageNet dataset for image classification, MSGC can reduce the multiply-accumulates (MACs) of ResNet-18 and ResNet-50 by half but still increase the Top-1 accuracy by more than 1%. With 35% reduction of MACs, MSGC can also increase the Top-1 accuracy of the MobileNetV2 backbone. Results on MS COCO dataset for object detection show similar observations. Our code and trained models are available at https://github.com/hellozhuo/msgc.
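
    A minimal sketch of the grouped-convolution mechanism underlying MSGC: with groups=g, each output channel sees only 1/g of the input channels, cutting parameters and multiply-accumulates by roughly a factor of g. Note that MSGC additionally learns the channel-to-group topology, which this fixed grouping does not show.

```python
# Grouped convolution in PyTorch: groups=4 partitions the 64 input
# channels into 4 groups of 16, so the layer uses ~1/4 the weights
# and MACs of its dense counterpart.
import torch
import torch.nn as nn

dense = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=1)
grouped = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=4)

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(dense), n_params(grouped))  # 36928 vs 9280

x = torch.randn(1, 64, 32, 32)
print(grouped(x).shape)  # torch.Size([1, 64, 32, 32])
```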

Efficient Multimodal Fusion via Interactive Prompting

  • Authors: Yaowei Li, Ruijie Quan, Linchao Zhu, Yi Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06306
  • Pdf link: https://arxiv.org/pdf/2304.06306
  • Abstract
    Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era. Following this trend, the size of multi-modal learning models constantly increases, leading to an urgent need to reduce the massive computational cost of finetuning these models for downstream tasks. In this paper, we propose an efficient and flexible multimodal fusion method, namely PMF, tailored for fusing unimodally pre-trained transformers. Specifically, we first present a modular multimodal fusion framework that exhibits high flexibility and facilitates mutual interactions among different modalities. In addition, we disentangle vanilla prompts into three types in order to learn different optimizing objectives for multimodal learning. It is also worth noting that we propose to add prompt vectors only on the deep layers of the unimodal transformers, thus significantly reducing the training memory usage. Experiment results show that our proposed method achieves comparable performance to several other multimodal finetuning methods with less than 3% trainable parameters and up to 66% saving of training memory usage.

Out-of-distribution Few-shot Learning For Edge Devices without Model Fine-tuning

  • Authors: Xinyun Zhang, Lanqing Hong
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06309
  • Pdf link: https://arxiv.org/pdf/2304.06309
  • Abstract
    Few-shot learning (FSL) via customization of a deep learning network with limited data has emerged as a promising technique to achieve personalized user experiences on edge devices. However, existing FSL methods primarily assume independent and identically distributed (IID) data and utilize either computationally expensive backpropagation updates for each task or a common model with task-specific prototypes. Unfortunately, the former solution is infeasible for edge devices that lack on-device backpropagation capabilities, while the latter often struggles with limited generalization ability, especially for out-of-distribution (OOD) data. This paper proposes a lightweight, plug-and-play FSL module called Task-aware Normalization (TANO) that enables efficient and task-aware adaptation of a deep neural network without backpropagation. TANO covers the properties of multiple user groups by coordinating the updates of several groups of normalization statistics during meta-training and automatically identifies the appropriate normalization group for a downstream few-shot task. Consequently, TANO provides stable but task-specific estimates of the normalization statistics to close the distribution gaps and achieve efficient model adaptation. Results on both intra-domain and out-of-domain generalization experiments demonstrate that TANO outperforms recent methods in terms of accuracy, inference speed, and model size. Moreover, TANO achieves promising results on widely-used FSL benchmarks and on data from real applications.

Universally Optimal Deterministic Broadcasting in the HYBRID Distributed Model

  • Authors: Yi-Jun Chang, Oren Hecht, Dean Leitersdorf
  • Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.06317
  • Pdf link: https://arxiv.org/pdf/2304.06317
  • Abstract
    In theoretical computer science, it is a common practice to show existential lower bounds for problems, meaning there is a family of pathological inputs on which no algorithm can do better. However, most inputs of interest can be solved much more efficiently, giving rise to the notion of universally optimal algorithms, which run as fast as possible on every input. Questions on the existence of universally optimal algorithms were first raised by Garay, Kutten, and Peleg in FOCS '93. This research direction reemerged recently through a series of works, including the influential work of Haeupler, Wajc, and Zuzic in STOC '21, which resolves some of these decades-old questions in the supported CONGEST model. We work in the HYBRID distributed model, which analyzes networks combining both global and local communication. Much attention has recently been devoted to solving distance related problems, such as All-Pairs Shortest Paths (APSP) in HYBRID, culminating in a $\tilde \Theta(n^{1/2})$ round algorithm for exact APSP. However, by definition, every problem in HYBRID is solvable in $D$ (diameter) rounds, showing that it is far from universally optimal. We show the first universally optimal algorithms in HYBRID, by presenting a fundamental tool that solves any broadcasting problem in a universally optimal number of rounds, deterministically. Specifically, we consider the problem in a graph $G$ where a set of $k$ messages $M$ is distributed arbitrarily across $G$ and every node is required to learn all of $M$. We show a universal lower bound and a matching deterministic upper bound, for any graph $G$, any value $k$, and any distribution of $M$ across $G$. This broadcasting tool opens a new exciting direction of research into showing universally optimal algorithms in HYBRID. As an example, we use it to obtain algorithms for approximate and exact APSP in general and sparse graphs.

Continual Learning of Hand Gestures for Human-Robot Interaction

  • Authors: Xavier Cucurull, Anaís Garrell
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06319
  • Pdf link: https://arxiv.org/pdf/2304.06319
  • Abstract
    In this paper, we present an efficient method to incrementally learn to classify static hand gestures. This method allows users to teach a robot to recognize new symbols in an incremental manner. Contrary to other works which use special sensors or external devices such as color or data gloves, our proposed approach makes use of a single RGB camera to perform static hand gesture recognition from 2D images. Furthermore, our system is able to incrementally learn up to 38 new symbols using only 5 samples for each old class, achieving a final average accuracy of over 90%. In addition, the incremental training time can be reduced to 10% of the time required when using all available data.

An Automotive Case Study on the Limits of Approximation for Object Detection

  • Authors: Martí Caro, Hamid Tabani, Jaume Abella, Francesc Moll, Enric Morancho, Ramon Canal, Josep Altet, Antonio Calomarde, Francisco J. Cazorla, Antonio Rubio, Pau Fontova, Jordi Fornt
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.06327
  • Pdf link: https://arxiv.org/pdf/2304.06327
  • Abstract
    The accuracy of camera-based object detection (CBOD) built upon deep learning is often evaluated against the real objects in frames only. However, such simplistic evaluation ignores the fact that many unimportant objects are small, distant, or background, and hence, their misdetections have less impact than those for closer, larger, and foreground objects in domains such as autonomous driving. Moreover, sporadic misdetections are irrelevant since confidence on detections is typically averaged across consecutive frames, and detection devices (e.g. cameras, LiDARs) are often redundant, thus providing fault tolerance. This paper exploits such intrinsic fault tolerance of the CBOD process, and assesses in an automotive case study to what extent CBOD can tolerate approximation coming from multiple sources such as lower precision arithmetic, approximate arithmetic units, and even random faults due to, for instance, low voltage operation. We show that the accuracy impact of those sources of approximation is within 1% of the baseline even when considering the three approximate domains simultaneously, and hence, multiple sources of approximation can be exploited to build highly efficient accelerators for CBOD in cars.

EF/CF: High Performance Smart Contract Fuzzing for Exploit Generation

  • Authors: Michael Rodler, David Paaßen, Wenting Li, Lukas Bernhard, Thorsten Holz, Ghassan Karame, Lucas Davi
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.06341
  • Pdf link: https://arxiv.org/pdf/2304.06341
  • Abstract
    Smart contracts are increasingly being used to manage large numbers of high-value cryptocurrency accounts. There is a strong demand for automated, efficient, and comprehensive methods to detect security vulnerabilities in a given contract. While the literature features a plethora of analysis methods for smart contracts, the existing proposals do not address the increasing complexity of contracts. Existing analysis tools suffer from false alarms and missed bugs in today's smart contracts that are increasingly defined by complexity and interdependencies. To scale accurate analysis to modern smart contracts, we introduce EF/CF, a high-performance fuzzer for Ethereum smart contracts. In contrast to previous work, EF/CF efficiently and accurately models complex smart contract interactions, such as reentrancy and cross-contract interactions, at a very high fuzzing throughput rate. To achieve this, EF/CF transpiles smart contract bytecode into native C++ code, thereby enabling the reuse of existing, optimized fuzzing toolchains. Furthermore, EF/CF increases fuzzing efficiency by employing a structure-aware mutation engine for smart contract transaction sequences and using a contract's ABI to generate valid transaction inputs. In a comprehensive evaluation, we show that EF/CF scales better -- without compromising accuracy -- to complex contracts compared to state-of-the-art approaches, including other fuzzers, symbolic/concolic execution, and hybrid approaches. Moreover, we show that EF/CF can automatically generate transaction sequences that exploit reentrancy bugs to steal Ether.

DDT: Dual-branch Deformable Transformer for Image Denoising

  • Authors: Kangliang Liu, Xiangcheng Du, Sijie Liu, Yingbin Zheng, Xingjiao Wu, Cheng Jin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06346
  • Pdf link: https://arxiv.org/pdf/2304.06346
  • Abstract
    Transformer is beneficial for image denoising tasks since it can model long-range dependencies to overcome the limitations presented by inductive convolutional biases. However, directly applying the transformer structure to remove noise is challenging because its complexity grows quadratically with the spatial resolution. In this paper, we propose an efficient Dual-branch Deformable Transformer (DDT) denoising network which captures both local and global interactions in parallel. We divide features with a fixed patch size and a fixed number of patches in local and global branches, respectively. In addition, we apply deformable attention operation in both branches, which helps the network focus on more important regions and further reduces computational complexity. We conduct extensive experiments on real-world and synthetic denoising tasks, and the proposed DDT achieves state-of-the-art performance with significantly fewer computational costs.

ODAM: Gradient-based instance-specific visual explanations for object detection

  • Authors: Chenyang Zhao, Antoni B. Chan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06354
  • Pdf link: https://arxiv.org/pdf/2304.06354
  • Abstract
    We propose gradient-weighted Object Detector Activation Maps (ODAM), a visual explanation technique for interpreting the predictions of object detectors. Utilizing the gradients of detector targets flowing into the intermediate feature maps, ODAM produces heat maps that show the influence of regions on the detector's decision for each predicted attribute. Compared to previous work on class activation maps (CAM), ODAM generates instance-specific explanations rather than class-specific ones. We show that ODAM is applicable to both one-stage and two-stage detectors with different types of detector backbones and heads, and produces higher-quality visual explanations than the state of the art, both effectively and efficiently. We next propose a training scheme, Odam-Train, to improve the explanation ability on object discrimination of the detector by encouraging consistency between explanations for detections on the same object, and distinct explanations for detections on different objects. Based on the heat maps produced by ODAM with Odam-Train, we propose Odam-NMS, which considers the information of the model's explanation for each prediction to distinguish duplicate detected objects. We present a detailed analysis of the visualized explanations of detectors and carry out extensive experiments to validate the effectiveness of the proposed ODAM.
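
    For context, a sketch of gradient-weighted activation mapping in the spirit of the CAM family (this is plain Grad-CAM on a classifier, not ODAM's instance-specific variant for detectors); the model, layer, and target class are illustrative, and a recent torchvision is assumed.

```python
# Grad-CAM-style heat map: weight the last feature maps by the spatial
# average of the gradients of a target score, then sum over channels.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights=None).eval()
feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(a=o))

x = torch.randn(1, 3, 224, 224, requires_grad=True)
score = model(x)[0, 0]                   # score for an arbitrary class
grads = torch.autograd.grad(score, feats["a"])[0]  # d score / d features

weights = grads.mean(dim=(2, 3), keepdim=True)     # global-average gradients
cam = F.relu((weights * feats["a"]).sum(dim=1))    # weighted channel sum
print(cam.shape)  # torch.Size([1, 7, 7]): heat map over the feature grid
```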

IBIA: An Incremental Build-Infer-Approximate Framework for Approximate Inference of Partition Function

  • Authors: Shivani Bathla, Vinita Vasudevan
  • Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06366
  • Pdf link: https://arxiv.org/pdf/2304.06366
  • Abstract
    Exact computation of the partition function is known to be intractable, necessitating approximate inference techniques. Existing methods for approximate inference are slow to converge for many benchmarks. The control of accuracy-complexity trade-off is also non-trivial in many of these methods. We propose a novel incremental build-infer-approximate (IBIA) framework for approximate inference that addresses these issues. In this framework, the probabilistic graphical model is converted into a sequence of clique tree forests (SCTF) with bounded clique sizes. We show that the SCTF can be used to efficiently compute the partition function. We propose two new algorithms which are used to construct the SCTF and prove the correctness of both. The first is an algorithm for incremental construction of CTFs that is guaranteed to give a valid CTF with bounded clique sizes and the second is an approximation algorithm that takes a calibrated CTF as input and yields a valid and calibrated CTF with reduced clique sizes as the output. We have evaluated our method using several benchmark sets from recent UAI competitions and our results show good accuracies with competitive runtimes.

An attack resilient policy on the tip pool for DAG-based distributed ledgers

  • Authors: Lianna Zhao, Andrew Culleny, Sebastian Muellerz, Olivia Saay, Robert Shorten
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.06369
  • Pdf link: https://arxiv.org/pdf/2304.06369
  • Abstract
    This paper discusses congestion control and inconsistency problems in DAG-based distributed ledgers and proposes an additional filter to mitigate these issues. Unlike traditional blockchains, DAG-based DLTs use a directed acyclic graph structure to organize transactions, allowing higher scalability and efficiency. However, this also introduces challenges in controlling the rate at which blocks are added to the network and preventing the influence of spam attacks. To address these challenges, we propose a filter to limit the tip pool size and to avoid referencing old blocks. Furthermore, we present experimental results to demonstrate the effectiveness of this filter in reducing the negative impacts of various attacks. Our approach offers a lightweight and efficient solution for managing the flow of blocks in DAG-based DLTs, which can enhance the consistency and reliability of these systems.
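
    Purely as an illustration of the two ingredients such a filter combines (a size cap and an age cut-off), a toy version might look like the following; `max_tips` and `max_age` are invented parameters, not values from the paper.

```python
# Toy tip-pool filter sketch: bound the pool size and expire old tips.
# This is illustrative only, not the paper's protocol specification.
import random
import time

class TipPoolFilter:
    def __init__(self, max_tips=64, max_age=5.0):
        self.max_tips = max_tips
        self.max_age = max_age      # seconds a block may remain referenceable
        self.tips = {}              # block_id -> arrival timestamp

    def add_tip(self, block_id):
        now = time.time()
        # Drop tips that are too old to be safely referenced.
        self.tips = {b: t for b, t in self.tips.items() if now - t <= self.max_age}
        self.tips[block_id] = now
        # Enforce the size cap by evicting the oldest tips first.
        while len(self.tips) > self.max_tips:
            oldest = min(self.tips, key=self.tips.get)
            del self.tips[oldest]

    def select_parents(self, k=2):
        return random.sample(list(self.tips), min(k, len(self.tips)))
```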

Contact Models in Robotics: a Comparative Analysis

  • Authors: Quentin Le Lidec, Wilson Jallet, Louis Montaut, Ivan Laptev, Cordelia Schmid, Justin Carpentier
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06372
  • Pdf link: https://arxiv.org/pdf/2304.06372
  • Abstract
    Physics simulation is ubiquitous in robotics. Whether in model-based approaches (e.g., trajectory optimization), or model-free algorithms (e.g., reinforcement learning), physics simulators are a central component of modern control pipelines in robotics. Over the past decades, several robotic simulators have been developed, each with dedicated contact modeling assumptions and algorithmic solutions. In this article, we survey the main contact models and the associated numerical methods commonly used in robotics for simulating advanced robot motions involving contact interactions. In particular, we recall the physical laws underlying contacts and friction (i.e., Signorini condition, Coulomb's law, and the maximum dissipation principle), and how they are transcribed in current simulators. For each physics engine, we expose their inherent physical relaxations along with their limitations due to the numerical techniques employed. Based on our study, we propose theoretically grounded quantitative criteria on which we build benchmarks assessing both the physical and computational aspects of simulation. We support our work with an open-source and efficient C++ implementation of the existing algorithmic variations. Our results demonstrate that some approximations or algorithms commonly used in robotics can severely widen the reality gap and impact target applications. We hope this work will help motivate the development of new contact models, contact solvers, and robotic simulators in general, at the root of recent progress in motion generation in robotics.

Learning Accurate Performance Predictors for Ultrafast Automated Model Compression

  • Authors: Ziwei Wang, Jiwen Lu, Han Xiao, Shengyu Liu, Jie Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06393
  • Pdf link: https://arxiv.org/pdf/2304.06393
  • Abstract
    In this paper, we propose an ultrafast automated model compression framework called SeerNet for flexible network deployment. Conventional non-differentiable methods discretely search for the desirable compression policy based on the accuracy of exhaustively trained lightweight models, and existing differentiable methods optimize an extremely large supernet to obtain the required compressed model for deployment. Both cause heavy computational cost due to the complex compression policy search and evaluation process. In contrast, we obtain the optimal efficient networks by directly optimizing the compression policy with an accurate performance predictor, where ultrafast automated model compression for various computational cost constraints is achieved without complex compression policy search and evaluation. Specifically, we first train the performance predictor based on the accuracy of uncertain compression policies actively selected by efficient evolutionary search, so that informative supervision is provided to learn an accurate performance predictor at acceptable cost. Then we leverage the gradient that maximizes the predicted performance under the barrier complexity constraint for ultrafast acquisition of the desirable compression policy, where adaptive update step sizes with momentum are employed to enhance the optimality of the acquired pruning and quantization strategy. Compared with state-of-the-art automated model compression methods, experimental results on image classification and object detection show that our method achieves competitive accuracy-complexity trade-offs with a significant reduction in search cost.
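
    A hedged sketch of the core loop: gradient-based optimization of a compression policy against a learned performance predictor under a log-barrier complexity constraint. The toy predictor, cost model, and barrier weight below are assumptions, not SeerNet's exact components.

```python
# Predictor-guided compression-policy search sketch (illustrative only).
import torch

num_layers = 8
predictor = torch.nn.Sequential(      # stands in for the trained accuracy predictor
    torch.nn.Linear(num_layers, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
layer_cost = torch.rand(num_layers)   # per-layer cost at full precision
budget = 0.5 * layer_cost.sum()       # target: half the full-precision cost

logits = torch.zeros(num_layers, requires_grad=True)
opt = torch.optim.SGD([logits], lr=0.1, momentum=0.9)  # momentum update steps

for _ in range(200):
    policy = torch.sigmoid(logits)                 # e.g., per-layer keep ratios
    complexity = (policy * layer_cost).sum()
    # Log barrier keeps the policy inside the complexity budget.
    barrier = -torch.log(torch.clamp(budget - complexity, min=1e-6))
    loss = -predictor(policy).squeeze() + 0.01 * barrier
    opt.zero_grad()
    loss.backward()
    opt.step()
```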

Fast And Automatic Floating Point Error Analysis With CHEF-FP

  • Authors: Garima Singh, Baidyanath Kundu, Harshitha Menon, Alexander Penev, David J. Lange, Vassil Vassilev
  • Subjects: Numerical Analysis (math.NA); Hardware Architecture (cs.AR); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.06441
  • Pdf link: https://arxiv.org/pdf/2304.06441
  • Abstract
    As we reach the limit of Moore's Law, researchers are exploring different paradigms to achieve unprecedented performance. Approximate Computing (AC), which relies on the ability of applications to tolerate some error in the results to trade-off accuracy for performance, has shown significant promise. Despite the success of AC in domains such as Machine Learning, its acceptance in High-Performance Computing (HPC) is limited due to stringent requirements for accuracy. We need tools and techniques to identify regions of code that are amenable to approximations and their impact on the application output quality to guide developers to employ selective approximation. To this end, we propose CHEF-FP, a flexible, scalable, and easy-to-use source-code transformation tool based on Automatic Differentiation (AD) for analyzing approximation errors in HPC applications. CHEF-FP uses Clad, an efficient AD tool built as a plugin to the Clang compiler and based on the LLVM compiler infrastructure, as a backend and utilizes its AD abilities to evaluate approximation errors in C++ code. CHEF-FP works at the source by injecting error estimation code into the generated adjoints. This enables the error-estimation code to undergo compiler optimizations resulting in improved analysis time and reduced memory usage. We also provide theoretical and architectural augmentations to source code transformation-based AD tools to perform FP error analysis. This paper primarily focuses on analyzing errors introduced by mixed-precision AC techniques. We also show the applicability of our tool in estimating other kinds of errors by evaluating our tool on codes that use approximate functions. Moreover, we demonstrate the speedups CHEF-FP achieved during analysis time compared to the existing state-of-the-art tool due to its ability to generate and insert approximation error estimate code directly into the derivative source.

SpectFormer: Frequency and Attention is what you need in a Vision Transformer

  • Authors: Badri N. Patro, Vinay P. Namboodiri, Vijay Srinivas Agneeswaran
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06446
  • Pdf link: https://arxiv.org/pdf/2304.06446
  • Abstract
    Vision transformers have been applied successfully to image recognition tasks. Existing architectures are either based on multi-headed self-attention (ViT, DeiT), similar to the original work on textual models, or, more recently, on spectral layers (FNet, GFNet, AFNO). We hypothesize that both spectral and multi-headed attention layers play a major role. We investigate this hypothesis in this work and observe that combining spectral and multi-headed attention layers indeed provides a better transformer architecture. We thus propose the novel SpectFormer architecture for transformers that combines spectral and multi-headed attention layers. We believe that the resulting representation allows the transformer to capture the feature representation appropriately, and it yields improved performance over other transformer representations. For instance, it improves the top-1 accuracy by 2% on ImageNet compared to both GFNet-H and LiT. SpectFormer-S reaches 84.25% top-1 accuracy on ImageNet-1K (state of the art for the small version). Further, SpectFormer-L achieves 85.7%, which is the state of the art for the comparable base version of transformers. We further ensure that we obtain reasonable results in other scenarios, such as transfer learning on standard datasets including CIFAR-10, CIFAR-100, Oxford-IIIT Flowers, and Stanford Cars. We then investigate its use in downstream tasks such as object detection and instance segmentation on the MS-COCO dataset and observe that SpectFormer shows consistent performance comparable to the best backbones and can be further optimized and improved. Hence, we believe that combined spectral and attention layers are what is needed for vision transformers.
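
    For intuition, a GFNet-style spectral mixing layer of the kind such architectures combine with self-attention can be sketched as below; the filter parameterization is illustrative rather than SpectFormer's exact design.

```python
# Sketch of a learnable spectral (Fourier) mixing layer, illustrative only.
import torch

class SpectralLayer(torch.nn.Module):
    def __init__(self, h, w, dim):
        super().__init__()
        # Learnable complex filter over rFFT frequencies (real/imag parts).
        self.filter = torch.nn.Parameter(torch.randn(h, w // 2 + 1, dim, 2) * 0.02)

    def forward(self, x):             # x: (B, H, W, C)
        f = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")
        f = f * torch.view_as_complex(self.filter)   # element-wise frequency mixing
        return torch.fft.irfft2(f, s=x.shape[1:3], dim=(1, 2), norm="ortho")

x = torch.randn(2, 14, 14, 64)
y = SpectralLayer(14, 14, 64)(x)      # same shape as x
```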

CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input

  • Authors: Senmao Tian, Ming Lu, Jiaming Liu, Yandong Guo, Yurong Chen, Shunli Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06454
  • Pdf link: https://arxiv.org/pdf/2304.06454
  • Abstract
    With the development of high-definition display devices, the practical scenario of Super-Resolution (SR) usually needs to super-resolve large input like 2K to higher resolution (4K/8K). To reduce the computational and memory cost, current methods first split the large input into local patches and then merge the SR patches into the output. These methods adaptively allocate a subnet for each patch. Quantization is a very important technique for network acceleration and has been used to design the subnets. Current methods train an MLP bit selector to determine the proper bit for each layer. However, they uniformly sample subnets for training, making simple subnets overfitted and complicated subnets underfitted. Therefore, the trained bit selector fails to determine the optimal bit. Apart from this, the introduced bit selector brings additional cost to each layer of the SR network. In this paper, we propose a novel method named Content-Aware Bit Mapping (CABM), which can remove the bit selector without any performance loss. CABM also learns a bit selector for each layer during training. After training, we analyze the relation between the edge information of an input patch and the bit of each layer. We observe that the edge information can be an effective metric for the selected bit. Therefore, we design a strategy to build an Edge-to-Bit lookup table that maps the edge score of a patch to the bit of each layer during inference. The bit configuration of the SR network can be determined by the lookup tables of all layers. Our strategy can find better bit configurations, resulting in more efficient mixed precision networks. We conduct detailed experiments to demonstrate the generalization ability of our method. The code will be released.
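
    A toy version of the Edge-to-Bit idea: score a patch's edge content and look up per-layer bit-widths from a precomputed table. The edge metric, thresholds, and table values below are invented for illustration.

```python
# Illustrative Edge-to-Bit mapping; the paper's edge measure and tables differ.
import numpy as np

def edge_score(patch):
    # Mean gradient magnitude as a simple edge metric.
    gy, gx = np.gradient(patch.astype(np.float32))
    return float(np.sqrt(gx ** 2 + gy ** 2).mean())

# Hypothetical lookup table: edge-score thresholds -> bits per layer.
thresholds = [2.0, 8.0]                                    # low / mid / high detail
bit_table = {0: [4, 4, 4], 1: [6, 6, 4], 2: [8, 8, 6]}     # bits for 3 layers

def bits_for_patch(patch):
    idx = int(np.searchsorted(thresholds, edge_score(patch)))
    return bit_table[idx]

print(bits_for_patch(np.random.rand(48, 48) * 255))
```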

Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages

  • Authors: Israel Abebe Azime, Sana Sabah Al-Azzawi, Atnafu Lambebo Tonja, Iyanuoluwa Shode, Jesujoba Alabi, Ayodele Awokoya, Mardiyyah Oduwole, Tosin Adewumi, Samuel Fanijo, Oyinkansola Awosan, Oreen Yousuf
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06459
  • Pdf link: https://arxiv.org/pdf/2304.06459
  • Abstract
    This paper presents our work on AfriSenti-SemEval Shared Task 12 of SemEval-2023. The task aims to perform monolingual sentiment classification (sub-task A) for 12 African languages, multilingual sentiment classification (sub-task B), and zero-shot sentiment classification (sub-task C). For sub-task A, we conducted experiments using classical machine learning classifiers, Afro-centric language models, and language-specific models. For sub-task B, we fine-tuned multilingual pre-trained language models that support many of the languages in the task. For sub-task C, we made use of a parameter-efficient Adapter approach that leverages monolingual texts in the target language for effective zero-shot transfer. Our findings suggest that using pre-trained Afro-centric language models improves performance for low-resource African languages. We also ran experiments using adapters for the zero-shot tasks, and the results suggest that we can obtain promising results by using adapters with a limited amount of resources.

Repositioning Tiered HotSpot Execution Performance Relative to the Interpreter

  • Authors: Jonathan Lambert, Kevin Casey, Rosemary Monahan
  • Subjects: Programming Languages (cs.PL); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.06460
  • Pdf link: https://arxiv.org/pdf/2304.06460
  • Abstract
    Although the advantages of just-in-time compilation over traditional interpretive execution are widely recognised, there is a need for current research that investigates and repositions the performance differences between these two execution models relative to contemporary workloads. Specifically, there is a need to examine the performance differences between Java Runtime Environment (JRE) Java Virtual Machine (JVM) tiered execution and JRE JVM interpretive execution relative to modern multicore architectures and modern concurrent and parallel benchmark workloads. This article aims to fill this research gap by presenting the results of a study that compares the performance of these two execution models under load from the Renaissance Benchmark Suite. This research is relevant to anyone interested in understanding the performance differences between just-in-time compiled code and interpretive execution. It provides a contemporary assessment of the interpretive JVM core, the entry and starting point for bytecode execution, relative to just-in-time tiered execution. The study considers factors such as the JRE version, the GNU GCC version used in the JRE build toolchain, and the garbage collector algorithm specified at runtime, and their impact on the performance difference envelope between interpretive and tiered execution. Our findings indicate that tiered execution is considerably more efficient than interpretive execution, and the performance gap has increased, ranging from 4 to 37 times more efficient. On average, tiered execution is approximately 15 times more efficient than interpretive execution. Additionally, the performance differences between interpretive and tiered execution are influenced by workload category, with narrower performance differences observed for web-based workloads and more significant differences for Functional and Scala-type workloads.

Towards Understanding the Benefits and Challenges of Demand Responsive Public Transit- A Case Study in the City of Charlotte, NC

  • Authors: Sanaz Sadat Hosseini, Mona Azarbayjani, Jason Lawrence, Hamed Tabkhi
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06467
  • Pdf link: https://arxiv.org/pdf/2304.06467
  • Abstract
    Access to adequate public transportation plays a critical role in inequity and socio-economic mobility, particularly in low-income communities. Low-income workers who rely heavily on public transportation face a spatial disparity between home and work, which leads to higher unemployment, longer job searches, and longer commute times. The overarching goal of this study is to get initial data that would result in creating a connected, coordinated, demand-responsive, and efficient public bus system that minimizes transit gaps for low-income, transit-dependent communities. To create equitable metropolitan public transportation, this paper evaluates existing CATS mobile applications that assist passengers in finding bus routes and arrival times. Our community survey methodology includes filling out questionnaires on Charlotte's current bus system on specific bus lines and determining user acceptance for a future novel smart technology. We have also collected data on the demand and transit gap for a real-world pilot study, Sprinter bus line, Bus line 7, Bus line 9, and Bus lines 97-99. These lines connect all of Charlotte City's main areas and are the most important bus lines in the system. On the studied routes, the primary survey results indicate that the current bus system has many flaws, the major one being the lack of proper timing to meet the needs of passengers. The most common problems are long commutes and long waiting times at stations. Moreover, the existing application provides inaccurate information, and on average, 80 percent of travelers and respondents are inclined to use new technology.

An Efficient Transfer Learning-based Approach for Apple Leaf Disease Classification

  • Authors: Md. Hamjajul Ashmafee, Tasnim Ahmed, Sabbir Ahmed, Md. Bakhtiar Hasan, Mst Nura Jahan, A.B.M. Ashikur Rahman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06520
  • Pdf link: https://arxiv.org/pdf/2304.06520
  • Abstract
    Correct identification and categorization of plant diseases are crucial for ensuring the safety of the global food supply and the overall financial success of stakeholders. In this regard, a wide range of solutions has been made available by introducing deep learning-based classification systems for different staple crops. Despite being one of the most important commercial crops in many parts of the globe, research proposing a smart solution for automatically classifying apple leaf diseases remains relatively unexplored. This study presents a technique for identifying apple leaf diseases based on transfer learning. The system extracts features using a pretrained EfficientNetV2S architecture and passes them to a classifier block for effective prediction. The class imbalance issues are tackled by utilizing runtime data augmentation. The effect of various hyperparameters, such as input resolution, learning rate, and number of epochs, has been investigated carefully. The competence of the proposed pipeline has been evaluated on the apple leaf disease subset from the publicly available `PlantVillage' dataset, where it achieved an accuracy of 99.21%, outperforming the existing works.
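
    A minimal transfer-learning sketch in this spirit, assuming a Keras setup with a frozen EfficientNetV2S backbone and an invented four-class head; the paper's classifier block and hyperparameters may differ.

```python
# Transfer-learning sketch: frozen EfficientNetV2S extractor + small head.
import tensorflow as tf

base = tf.keras.applications.EfficientNetV2S(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False                      # freeze the pretrained extractor

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(4, activation="softmax"),   # hypothetical 4 classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
```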

Multi-kernel Correntropy-based Orientation Estimation of IMUs: Gradient Descent Methods

  • Authors: Shilei Li, Lijing Li, Dawei Shi, Yunjiang Lou, Ling Shi
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.06548
  • Pdf link: https://arxiv.org/pdf/2304.06548
  • Abstract
    This paper presents two computationally efficient algorithms for the orientation estimation of inertial measurement units (IMUs): the correntropy-based gradient descent (CGD) and the correntropy-based decoupled orientation estimation (CDOE). Traditional methods, such as gradient descent (GD) and decoupled orientation estimation (DOE), rely on the mean squared error (MSE) criterion, making them vulnerable to external acceleration and magnetic interference. To address this issue, we demonstrate that the multi-kernel correntropy loss (MKCL) is an optimal objective function for maximum likelihood estimation (MLE) when the noise follows a type of heavy-tailed distribution. In certain situations, the estimation error of the MKCL is bounded even in the presence of arbitrarily large outliers. By replacing the standard MSE cost function with MKCL, we develop the CGD and CDOE algorithms. We evaluate the effectiveness of our proposed methods by comparing them with existing algorithms in various situations. Experimental results indicate that our proposed methods (CGD and CDOE) outperform their conventional counterparts (GD and DOE), especially when faced with external acceleration and magnetic disturbances. Furthermore, the new algorithms demonstrate significantly lower computational complexity than Kalman filter-based approaches, making them suitable for applications with low-cost microprocessors.
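
    For intuition, a multi-kernel correntropy loss can be written as a weighted sum of Gaussian kernel terms with different bandwidths, each of which saturates for large errors; the bandwidths and weights below are placeholders, not the paper's parameterization.

```python
# Sketch of a multi-kernel correntropy-style loss (illustrative form).
import numpy as np

def mkcl(error, sigmas=(0.5, 2.0), weights=(0.5, 0.5)):
    loss = 0.0
    for s, w in zip(sigmas, weights):
        # Each term saturates for large |error|, bounding the influence
        # of outliers (unlike the unbounded MSE penalty).
        loss += w * (s ** 2) * (1.0 - np.exp(-error ** 2 / (2.0 * s ** 2)))
    return loss

print(mkcl(np.array([0.1, 1.0, 100.0])))   # the outlier contributes a bounded term
```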

Multiscale Finite Element Formulations for 2D/1D Problems

  • Authors: Karl Hollaus, Markus Schöbinger
  • Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
  • Arxiv link: https://arxiv.org/abs/2304.06553
  • Pdf link: https://arxiv.org/pdf/2304.06553
  • Abstract
    Multiscale finite element methods for 2D/1D problems have been studied in this work to demonstrate their excellent ability to solve real-world problems. These methods are much more efficient than conventional 3D finite element methods and just as accurate. The 2D/1D multiscale finite element methods are based on a magnetic vector potential or a current vector potential. Known currents for excitation can be replaced by the Biot-Savart field. Boundary conditions allow planes of symmetry to be integrated. All presented approaches consider eddy currents and an insulation layer, and preserve the edge effect. A segment of a fictitious electrical machine has been studied to demonstrate all of the above options, as well as the accuracy and low computational cost of the 2D/1D multiscale finite element methods.

Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation

  • Authors: Mohit Sharma, Claudio Fantacci, Yuxiang Zhou, Skanda Koppula, Nicolas Heess, Jon Scholz, Yusuf Aytar
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06600
  • Pdf link: https://arxiv.org/pdf/2304.06600
  • Abstract
    Recent works have shown that large models pretrained on common visual learning tasks can provide useful representations for a wide range of specialized perception problems, as well as a variety of robotic manipulation tasks. While prior work on robotic manipulation has predominantly used frozen pretrained features, we demonstrate that in robotics this approach can fail to reach optimal performance, and that fine-tuning of the full model can lead to significantly better results. Unfortunately, fine-tuning disrupts the pretrained visual representation, and causes representational drift towards the fine-tuned task thus leading to a loss of the versatility of the original model. We introduce "lossless adaptation" to address this shortcoming of classical fine-tuning. We demonstrate that appropriate placement of our parameter efficient adapters can significantly reduce the performance gap between frozen pretrained representations and full end-to-end fine-tuning without changes to the original representation and thus preserving original capabilities of the pretrained model. We perform a comprehensive investigation across three major model architectures (ViTs, NFNets, and ResNets), supervised (ImageNet-1K classification) and self-supervised pretrained weights (CLIP, BYOL, Visual MAE) in 3 task domains and 35 individual tasks, and demonstrate that our claims are strongly validated in various settings.
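
    A common way to make adapter insertion lossless is to zero-initialize the adapter's output path so the pretrained representation is untouched at insertion time; the sketch below illustrates that pattern and is not the paper's exact adapter design.

```python
# Zero-initialized residual adapter sketch: an identity mapping before training.
import torch

class Adapter(torch.nn.Module):
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.down = torch.nn.Linear(dim, bottleneck)
        self.up = torch.nn.Linear(bottleneck, dim)
        torch.nn.init.zeros_(self.up.weight)   # output path starts at zero
        torch.nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

x = torch.randn(4, 768)
adapter = Adapter(768)
assert torch.allclose(adapter(x), x)            # no representational drift yet
```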

Robustness Measures and Monitors for Time Window Temporal Logic

  • Authors: Ahmad Ahmad, Cristian-Ioan Vasile, Roberto Tron, Calin Belta
  • Subjects: Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.06645
  • Pdf link: https://arxiv.org/pdf/2304.06645
  • Abstract
    Temporal logics (TLs) have been widely used to formalize interpretable tasks for cyber-physical systems. Time Window Temporal Logic (TWTL) has been recently proposed as a specification language for dynamical systems. In particular, it can easily express robotic tasks, and it allows for efficient, automata-based verification and synthesis of control policies for such systems. In this paper, we define two quantitative semantics for this logic, and two corresponding monitoring algorithms, which allow for real-time quantification of satisfaction of formulas by trajectories of discrete-time systems. We demonstrate the new semantics and their runtime monitors on numerical examples.
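
    As a simple instance of quantitative semantics, the robustness of a "hold a predicate throughout a time window" task can be monitored as the worst-case margin over that window; the TWTL semantics defined in the paper are more general than this sketch.

```python
# Toy robustness monitor for 'signal >= threshold for every step in [a, b]'.
def hold_robustness(signal, a, b, threshold):
    """Worst-case satisfaction margin over the window [a, b]."""
    return min(signal[t] - threshold for t in range(a, b + 1))

trace = [0.2, 0.8, 1.1, 0.9, 1.4]
print(hold_robustness(trace, 1, 4, 1.0))   # negative => the window is violated
```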

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

  • Authors: Enze Xie, Lewei Yao, Han Shi, Zhili Liu, Daquan Zhou, Zhaoqiang Liu, Jiawei Li, Zhenguo Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06648
  • Pdf link: https://arxiv.org/pdf/2304.06648
  • Abstract
    Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications. This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models, enabling fast adaptation to new domains. DiffFit is embarrassingly simple: it only fine-tunes the bias terms and newly added scaling factors in specific layers, yet this results in significant training speed-ups and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves a 2$\times$ training speed-up and only needs to store approximately 0.12% of the total model parameters. An intuitive theoretical analysis is provided to justify the efficacy of the scaling factors for fast adaptation. On 8 downstream datasets, DiffFit achieves superior or competitive performance compared to full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one at minimal cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on the ImageNet 512$\times$512 benchmark by fine-tuning for only 25 epochs from a public pre-trained ImageNet 256$\times$256 checkpoint, while being 30$\times$ more training efficient than the closest competitor.
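
    The recipe is simple enough to sketch: freeze the model, keep only bias terms trainable, and add scale factors initialized to one. Where exactly the scales are inserted below is an assumption, not the paper's layer placement.

```python
# DiffFit-style parameter selection sketch: biases + added gamma scales only.
import torch

class ScaledBlock(torch.nn.Module):
    """Wraps a block with a newly added, trainable scale factor."""
    def __init__(self, block):
        super().__init__()
        self.block = block
        self.gamma = torch.nn.Parameter(torch.ones(1))   # init to 1: a no-op at start

    def forward(self, x):
        return self.gamma * self.block(x)

def mark_difffit_trainable(model):
    # Freeze everything except bias vectors and the added gamma factors.
    for name, p in model.named_parameters():
        p.requires_grad = name.endswith("bias") or name.endswith("gamma")

model = torch.nn.Sequential(
    ScaledBlock(torch.nn.Linear(8, 8)),
    torch.nn.GELU(),
    ScaledBlock(torch.nn.Linear(8, 8)))
mark_difffit_trainable(model)
print(sum(p.numel() for p in model.parameters() if p.requires_grad))  # tiny fraction
```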

DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer

  • Authors: Amit Kumar Rana, Sabarinath Mahadevan, Alexander Hermans, Bastian Leibe
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06668
  • Pdf link: https://arxiv.org/pdf/2304.06668
  • Abstract
    Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth annotations for training, which are expensive to create. Interactive segmentation networks help generate such annotations based on an image and the corresponding user interactions such as clicks. Existing methods for this task can only process a single instance at a time and each user interaction requires a full forward pass through the entire deep network. We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries to a Transformer decoder with a potential to segment multiple object instances in a single iteration. Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image when compared to other methods. DynaMITe achieves state-of-the-art results on multiple existing interactive segmentation benchmarks, and also on the new multi-instance benchmark that we propose in this paper.

Inertia-Aware Microgrid Investment Planning Using Tractable Decomposition Algorithms

  • Authors: Agnes Marjorie Nakiganda, Shahab Dehghan, Petros Aristidou
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06674
  • Pdf link: https://arxiv.org/pdf/2304.06674
  • Abstract
    The integration of the frequency dynamics into Micro-Grid (MG) investment and operational planning problems is vital in ensuring the security of the system in the post-contingency states. However, the task of including transient security constraints in MG planning problems is non-trivial. This is due to the highly non-linear and non-convex nature of the analytical closed form of the frequency metrics (e.g., frequency nadir) and power flow constraints. To handle this issue, this paper presents two algorithms for decomposing the MG investment planning problem into multiple levels to enhance computational tractability and optimality. Furthermore, the sensitivity of the decisions made at each level is captured by corresponding dual cutting planes to model feasible secure regions. This, in turn, ensures both the optimal determination and placement of inertia services and accelerates the convergence of the proposed decomposition algorithms. The efficient and effective performance of the proposed algorithms is tested and verified on an 18-bus Low Voltage (LV) network and a 30-bus Medium Voltage (MV) network under various operating scenarios.

OKRidge: Scalable Optimal k-Sparse Ridge Regression for Learning Dynamical Systems

  • Authors: Jiachang Liu, Sam Rosen, Chudi Zhong, Cynthia Rudin
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.06686
  • Pdf link: https://arxiv.org/pdf/2304.06686
  • Abstract
    We consider an important problem in scientific discovery, identifying sparse governing equations for nonlinear dynamical systems. This involves solving sparse ridge regression problems to provable optimality in order to determine which terms drive the underlying dynamics. We propose a fast algorithm, OKRidge, for sparse ridge regression, using a novel lower bound calculation involving, first, a saddle point formulation, and from there, either solving (i) a linear system or (ii) using an ADMM-based approach, where the proximal operators can be efficiently evaluated by solving another linear system and an isotonic regression problem. We also propose a method to warm-start our solver, which leverages a beam search. Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.

Representing Volumetric Videos as Dynamic MLP Maps

  • Authors: Sida Peng, Yunzhi Yan, Qing Shuai, Hujun Bao, Xiaowei Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06717
  • Pdf link: https://arxiv.org/pdf/2304.06717
  • Abstract
    This paper introduces a novel representation of volumetric videos for real-time view synthesis of dynamic scenes. Recent advances in neural scene representations demonstrate their remarkable capability to model and render complex static scenes, but extending them to represent dynamic scenes is not straightforward due to their slow rendering speed or high storage cost. To solve this problem, our key idea is to represent the radiance field of each frame as a set of shallow MLP networks whose parameters are stored in 2D grids, called MLP maps, and dynamically predicted by a 2D CNN decoder shared by all frames. Representing 3D scenes with shallow MLPs significantly improves the rendering speed, while dynamically predicting MLP parameters with a shared 2D CNN instead of explicitly storing them leads to low storage cost. Experiments show that the proposed approach achieves state-of-the-art rendering quality on the NHR and ZJU-MoCap datasets, while being efficient for real-time rendering with a speed of 41.7 fps for $512 \times 512$ images on an RTX 3090 GPU. The code is available at https://zju3dv.github.io/mlp_maps/.

Keyword: faster

Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays

  • Authors: Chen Xie, Francesco Daghero, Yukai Chen, Marco Castellano, Luca Gandolfi, Andrea Calimera, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06059
  • Pdf link: https://arxiv.org/pdf/2304.06059
  • Abstract
    Ultra-low-resolution Infrared (IR) array sensors offer a low-cost, energy-efficient, and privacy-preserving solution for people counting, with applications such as occupancy monitoring. Previous work has shown that Deep Learning (DL) can yield superior performance on this task. However, the literature was missing an extensive comparative analysis of various efficient DL architectures for IR array-based people counting, that considers not only their accuracy, but also the cost of deploying them on memory- and energy-constrained Internet of Things (IoT) edge nodes. In this work, we address this need by comparing 6 different DL architectures on a novel dataset composed of IR images collected from a commercial 8x8 array, which we made openly available. With a wide architectural exploration of each model type, we obtain a rich set of Pareto-optimal solutions, spanning cross-validated balanced accuracy scores in the 55.70-82.70% range. When deployed on a commercial Microcontroller (MCU) by STMicroelectronics, the STM32L4A6ZG, these models occupy 0.41-9.28kB of memory, and require 1.10-7.74ms per inference, while consuming 17.18-120.43 $\mu$J of energy. Our models are significantly more accurate than a previous deterministic method (up to +39.9%), while being up to 3.53x faster and more energy efficient. Further, our models' accuracy is comparable to state-of-the-art DL solutions on similar resolution sensors, despite a much lower complexity. All our models enable continuous, real-time inference on a MCU-based IoT node, with years of autonomous operation without battery recharging.

Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

  • Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.06178
  • Pdf link: https://arxiv.org/pdf/2304.06178
  • Abstract
    Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning finer-scale voxels to regions of higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of the voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data, while maintaining computational efficiency, and is substantially faster than the baseline method NeuralRGBD.

Beyond the Quadratic Time Barrier for Network Unreliability

  • Authors: Ruoxu Cen, William He, Jason Li, Debmalya Panigrahi
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.06552
  • Pdf link: https://arxiv.org/pdf/2304.06552
  • Abstract
    Karger (STOC 1995) gave the first FPTAS for the network (un)reliability problem, setting in motion research over the next three decades that obtained increasingly faster running times, eventually leading to a $\tilde{O}(n^2)$-time algorithm (Karger, STOC 2020). This represented a natural culmination of this line of work because the algorithmic techniques used can enumerate $\Theta(n^2)$ (near)-minimum cuts. In this paper, we go beyond this quadratic barrier and obtain a faster algorithm for the network unreliability problem. Our algorithm runs in $m^{1+o(1)} + \tilde{O}(n^{1.5})$ time. Our main contribution is a new estimator for network unreliability in very reliable graphs. These graphs are usually the bottleneck for network unreliability since the disconnection event is elusive. Our estimator is obtained by defining an appropriate importance sampling subroutine on a dual spanning tree packing of the graph. To complement this estimator for very reliable graphs, we use recursive contraction for moderately reliable graphs. We show that an interleaving of sparsification and contraction can be used to obtain a better parametrization of the recursive contraction algorithm that yields a faster running time matching the one obtained for the very reliable case.

Class-Incremental Learning of Plant and Disease Detection: Growing Branches with Knowledge Distillation

  • Authors: Mathieu Pagé Fortin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06619
  • Pdf link: https://arxiv.org/pdf/2304.06619
  • Abstract
    This paper investigates the problem of class-incremental object detection for agricultural applications where a model needs to learn new plant species and diseases incrementally without forgetting the previously learned ones. We adapt two public datasets to include new categories over time, simulating a more realistic and dynamic scenario. We then compare three class-incremental learning methods that leverage different forms of knowledge distillation to mitigate catastrophic forgetting. Our experiments show that all three methods suffer from catastrophic forgetting, but the recent Dynamic Y-KD approach, which additionally uses a dynamic architecture that grows new branches to learn new tasks, outperforms ILOD and Faster-ILOD in most scenarios both on new and old classes. These results highlight the challenges and opportunities of continual object detection for agricultural applications. In particular, the large intra-class and small inter-class variability that is typical of plant images exacerbate the difficulty of learning new categories without interfering with previous knowledge. We publicly release our code to encourage future work.
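
    The knowledge-distillation ingredient shared by these methods can be sketched generically: the new model's logits on the old classes are regressed toward the frozen previous model's soft targets. The temperature and head sizes below are illustrative, not taken from the paper.

```python
# Generic class-incremental distillation loss sketch.
import torch
import torch.nn.functional as F

def distillation_loss(new_logits, old_logits, T=2.0):
    # Soft targets from the frozen previous-task model, old classes only.
    p_old = F.softmax(old_logits / T, dim=1)
    log_p_new = F.log_softmax(new_logits[:, : old_logits.size(1)] / T, dim=1)
    return F.kl_div(log_p_new, p_old, reduction="batchmean") * T * T

old = torch.randn(4, 10)        # logits from the frozen old model's head
new = torch.randn(4, 13)        # new head: 10 old classes + 3 new ones
print(distillation_loss(new, old))
```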

OKRidge: Scalable Optimal k-Sparse Ridge Regression for Learning Dynamical Systems

  • Authors: Jiachang Liu, Sam Rosen, Chudi Zhong, Cynthia Rudin
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.06686
  • Pdf link: https://arxiv.org/pdf/2304.06686
  • Abstract
    We consider an important problem in scientific discovery, identifying sparse governing equations for nonlinear dynamical systems. This involves solving sparse ridge regression problems to provable optimality in order to determine which terms drive the underlying dynamics. We propose a fast algorithm, OKRidge, for sparse ridge regression, using a novel lower bound calculation involving, first, a saddle point formulation, and from there, either solving (i) a linear system or (ii) using an ADMM-based approach, where the proximal operators can be efficiently evaluated by solving another linear system and an isotonic regression problem. We also propose a method to warm-start our solver, which leverages a beam search. Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.

Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields

  • Authors: Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, Peter Hedman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06706
  • Pdf link: https://arxiv.org/pdf/2304.06706
  • Abstract
    Neural Radiance Field training can be accelerated through the use of grid-based representations in NeRF's learned mapping from spatial coordinates to colors and volumetric density. However, these grid-based approaches lack an explicit understanding of scale and therefore often introduce aliasing, usually in the form of jaggies or missing scene content. Anti-aliasing has previously been addressed by mip-NeRF 360, which reasons about sub-volumes along a cone rather than points along a ray, but this approach is not natively compatible with current grid-based techniques. We show how ideas from rendering and signal processing can be used to construct a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP to yield error rates that are 8% - 76% lower than either prior technique, and that trains 22x faster than mip-NeRF 360.

Keyword: mobile

Situational-Aware Multi-Graph Convolutional Recurrent Network (SA-MGCRN) for Travel Demand Forecasting During Wildfires

  • Authors: Xiaojian Zhang, Xilei Zhao, Yiming Xu, Ruggiero Lovreglio, Daniel Nilsson
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06233
  • Pdf link: https://arxiv.org/pdf/2304.06233
  • Abstract
    Real-time forecasting of travel demand during wildfire evacuations is crucial for emergency managers and transportation planners to make timely and better-informed decisions. However, few studies focus on accurate travel demand forecasting in large-scale emergency evacuations. Therefore, this study develops and tests a new methodological framework for modeling trip generation in wildfire evacuations by using (a) large-scale GPS data generated by mobile devices and (b) state-of-the-art AI technologies. The proposed methodology aims at forecasting evacuation trips and other types of trips. Based on the travel demand inferred from the GPS data, we develop a new deep learning model, i.e., Situational-Aware Multi-Graph Convolutional Recurrent Network (SA-MGCRN), along with a model updating scheme to achieve real-time forecasting of travel demand during wildfire evacuations. The proposed methodological framework is tested in this study for a real-world case study: the 2019 Kincade Fire in Sonoma County, CA. The results show that SA-MGCRN significantly outperforms all the selected state-of-the-art benchmarks in terms of prediction performance. Our finding suggests that the most important model components of SA-MGCRN are evacuation order/warning information, proximity to fire, and population change, which are consistent with behavioral theories and empirical findings.

Loosely Coupled Odometry, UWB Ranging, and Cooperative Spatial Detection for Relative Monte-Carlo Multi-Robot Localization

  • Authors: Xianjia Yu, Paola Torrico Morón, Sahar Salimpour, Jorge Peña Queralta, Tomi Westerlund
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06264
  • Pdf link: https://arxiv.org/pdf/2304.06264
  • Abstract
    As mobile robots become more ubiquitous, their deployments grow across use cases where GNSS positioning is either unavailable or unreliable. This has led to increased interest in multi-modal relative localization methods. Complementing onboard odometry, ranging allows for relative state estimation, with ultra-wideband (UWB) ranging having gained widespread recognition due to its low cost and centimeter-level out-of-box accuracy. Infrastructure-free localization methods allow for more dynamic, ad-hoc, and flexible deployments, yet they have received less attention from the research community. In this work, we propose a cooperative relative multi-robot localization where we leverage inter-robot ranging and simultaneous spatial detections of objects in the environment. To achieve this, we equip robots with a single UWB transceiver and a stereo camera. We propose a novel Monte-Carlo approach to estimate relative states by either employing only UWB ranges or dynamically integrating simultaneous spatial detections from the stereo cameras. We also address the challenges for UWB ranging error mitigation, especially in non-line-of-sight, with a study on different LSTM networks to estimate the ranging error. The proposed approach has multiple benefits. First, we show that a single range is enough to estimate the accurate relative states of two robots when fusing odometry measurements. Second, our experiments also demonstrate that our approach surpasses traditional methods such as multilateration in terms of accuracy. Third, to increase accuracy even further, we allow for the integration of cooperative spatial detections. Finally, we show how ROS 2 and Zenoh can be integrated to build a scalable wireless communication solution for multi-robot systems. The experimental validation includes real-time deployment and autonomous navigation based on the relative positioning method.
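
    A toy Monte-Carlo update from a single UWB range, in the spirit of the proposed estimator: particles over the relative position are reweighted by how well they explain the measured range and then resampled. The prior, noise level, and particle count are invented for illustration.

```python
# Toy particle-filter update for relative localization from one UWB range.
import numpy as np

rng = np.random.default_rng(0)
particles = rng.uniform(-10, 10, size=(1000, 2))   # relative-position hypotheses
weights = np.ones(len(particles)) / len(particles)

measured_range, sigma = 4.2, 0.3                   # UWB range and its std dev
pred = np.linalg.norm(particles, axis=1)           # range each particle implies
weights *= np.exp(-0.5 * ((pred - measured_range) / sigma) ** 2)
weights /= weights.sum()

# Multinomial resampling concentrates particles near the range circle;
# fusing odometry or spatial detections would break the remaining ambiguity.
idx = np.searchsorted(np.cumsum(weights), rng.uniform(0, 1, len(particles)))
particles = particles[idx]
```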

Gamifying Math Education using Object Detection

  • Authors: Yueqiu Sun, Rohitkrishna Nambiar, Vivek Vidyasagaran
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06270
  • Pdf link: https://arxiv.org/pdf/2304.06270
  • Abstract
    Manipulatives used in the right way help improve mathematical concepts leading to better learning outcomes. In this paper, we present a phygital (physical + digital) curriculum inspired teaching system for kids aged 5-8 to learn geometry using shape tile manipulatives. Combining smaller shapes to form larger ones is an important skill kids learn early on which requires shape tiles to be placed close to each other in the play area. This introduces a challenge of oriented object detection for densely packed objects with arbitrary orientations. Leveraging simulated data for neural network training and light-weight mobile architectures, we enable our system to understand user interactions and provide real-time audiovisual feedback. Experimental results show that our network runs real-time with high precision/recall on consumer devices, thereby providing a consistent and enjoyable learning experience.

Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution

  • Authors: Zhuo Su, Jiehua Zhang, Tianpeng Liu, Zhen Liu, Shuanghui Zhang, Matti Pietikäinen, Li Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06305
  • Pdf link: https://arxiv.org/pdf/2304.06305
  • Abstract
    This paper proposes a novel module called middle spectrum grouped convolution (MSGC) for efficient deep convolutional neural networks (DCNNs) with the mechanism of grouped convolution. It explores the broad "middle spectrum" area between channel pruning and conventional grouped convolution. Compared with channel pruning, MSGC can retain most of the information from the input feature maps due to the group mechanism; compared with grouped convolution, MSGC benefits from the learnability, the core of channel pruning, for constructing its group topology, leading to better channel division. The middle spectrum area is unfolded along four dimensions: group-wise, layer-wise, sample-wise, and attention-wise, making it possible to reveal more powerful and interpretable structures. As a result, the proposed module acts as a booster that can reduce the computational cost of the host backbones for general image recognition with even improved predictive accuracy. For example, in the experiments on ImageNet dataset for image classification, MSGC can reduce the multiply-accumulates (MACs) of ResNet-18 and ResNet-50 by half but still increase the Top-1 accuracy by more than 1%. With 35% reduction of MACs, MSGC can also increase the Top-1 accuracy of the MobileNetV2 backbone. Results on MS COCO dataset for object detection show similar observations. Our code and trained models are available at https://github.com/hellozhuo/msgc.

Towards Understanding the Benefits and Challenges of Demand Responsive Public Transit- A Case Study in the City of Charlotte, NC

  • Authors: Sanaz Sadat Hosseini, Mona Azarbayjani, Jason Lawrence, Hamed Tabkhi
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06467
  • Pdf link: https://arxiv.org/pdf/2304.06467
  • Abstract
    Access to adequate public transportation plays a critical role in inequity and socio-economic mobility, particularly in low-income communities. Low-income workers who rely heavily on public transportation face a spatial disparity between home and work, which leads to higher unemployment, longer job searches, and longer commute times. The overarching goal of this study is to get initial data that would result in creating a connected, coordinated, demand-responsive, and efficient public bus system that minimizes transit gaps for low-income, transit-dependent communities. To create equitable metropolitan public transportation, this paper evaluates existing CATS mobile applications that assist passengers in finding bus routes and arrival times. Our community survey methodology includes filling out questionnaires on Charlotte's current bus system on specific bus lines and determining user acceptance for a future novel smart technology. We have also collected data on the demand and transit gap for a real-world pilot study, Sprinter bus line, Bus line 7, Bus line 9, and Bus lines 97-99. These lines connect all of Charlotte City's main areas and are the most important bus lines in the system. On the studied routes, the primary survey results indicate that the current bus system has many flaws, the major one being the lack of proper timing to meet the needs of passengers. The most common problems are long commutes and long waiting times at stations. Moreover, the existing application provides inaccurate information, and on average, 80 percent of travelers and respondents are inclined to use new technology.

IoT-Based Water Quality Assessment System for Industrial Waste Water: Healthcare Perspective

  • Authors: Abdur Rab Dhruba, Kazi Nabiul Alam, Md. Shakib Khan, Sananda Saha, Mohammad Monirujjaman Khan, Mohammed Baz, Mehedi Masud, Mohammed A. AlZain
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06491
  • Pdf link: https://arxiv.org/pdf/2304.06491
  • Abstract
    The environment, especially water, gets polluted due to industrialization and urbanization. Pollution due to industrialization and urbanization has harmful effects on both the environment and the lives on Earth. This polluted water can cause food poisoning, diarrhea, short-term gastrointestinal problems, respiratory diseases, skin problems, and other serious health complications. In a developing country like Bangladesh, where the ready-made garments sector is one of the major sources of the total Gross Domestic Product (GDP), most of the waste released from the garment factories is dumped into the nearest rivers or canals. Hence, the quality of the water in these bodies becomes very incompatible with living beings, and so it has become one of the major threats to the environment and human health. In addition, the amount of fish in the rivers and canals of Bangladesh is decreasing day by day as a result of water pollution. Therefore, to save fish and other aquatic animals and the environment, we need to monitor the quality of the water and find out the reasons for the pollution. Real-time monitoring of water quality is vital for controlling water pollution. Most of the approaches for controlling water pollution are mainly biological and lab-based, which takes a lot of time and resources. To address this issue, we developed an Internet of Things (IoT)-based real-time water quality monitoring system, integrated with a mobile application. The proposed system measures some of the most important indexes of water quality, including the potential of hydrogen (pH), total dissolved solids (TDS), turbidity, and temperature. The results of the proposed system will be very helpful in protecting the environment, and thus, improving the health of living creatures on Earth.

IoT-Based Remote Health Monitoring System Employing Smart Sensors for Asthma Patients during COVID-19 Pandemic

  • Authors: Nafisa Shamim Rafa, Basma Binte Azmal, Abdur Rab Dhruba, Mohammad Monirujjaman Khan, Turki M. Alanazi, Faris A. Almalki, Othman AlOmeir
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06511
  • Pdf link: https://arxiv.org/pdf/2304.06511
  • Abstract
    COVID19 and asthma are respiratory diseases that can be life threatening in uncontrolled circumstances and require continuous monitoring. A poverty stricken South Asian country like Bangladesh has been bearing the brunt of the COVID19 pandemic since its beginning. The majority of the country's population resides in rural areas, where proper healthcare is difficult to access. This emphasizes the necessity of telemedicine, implementing the concept of the Internet of Things (IoT), which is still under development in Bangladesh. This paper demonstrates how the current challenges in the healthcare system are resolvable through the design of a remote health and environment monitoring system, specifically for asthma patients who are at an increased risk of COVID19. Since on-time treatment is essential, this system will allow doctors and medical staff to receive patient information in real time and deliver their services immediately to the patient regardless of their location. The proposed system consists of various sensors collecting heart rate, body temperature, ambient temperature, humidity, and air quality data and processing them through the Arduino Microcontroller. It is integrated with a mobile application. All this data is sent to the mobile application via a Bluetooth module and updated every few seconds so that the medical staff can instantly track patients' conditions and emergencies. The developed prototype is portable and easily usable by anyone. The system has been applied to five people of different ages and medical histories over a particular period. Upon analyzing all their data, it became clear which participants were particularly vulnerable to health deterioration and needed constant observation. Through this research, awareness about asthmatic symptoms will improve and help prevent their severity through effective treatment anytime, anywhere.

Keyword: pruning

Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution

  • Authors: Zhuo Su, Jiehua Zhang, Tianpeng Liu, Zhen Liu, Shuanghui Zhang, Matti Pietikäinen, Li Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06305
  • Pdf link: https://arxiv.org/pdf/2304.06305
  • Abstract
    This paper proposes a novel module called middle spectrum grouped convolution (MSGC) for efficient deep convolutional neural networks (DCNNs) with the mechanism of grouped convolution. It explores the broad "middle spectrum" area between channel pruning and conventional grouped convolution. Compared with channel pruning, MSGC can retain most of the information from the input feature maps due to the group mechanism; compared with grouped convolution, MSGC benefits from the learnability, the core of channel pruning, for constructing its group topology, leading to better channel division. The middle spectrum area is unfolded along four dimensions: group-wise, layer-wise, sample-wise, and attention-wise, making it possible to reveal more powerful and interpretable structures. As a result, the proposed module acts as a booster that can reduce the computational cost of the host backbones for general image recognition with even improved predictive accuracy. For example, in the experiments on ImageNet dataset for image classification, MSGC can reduce the multiply-accumulates (MACs) of ResNet-18 and ResNet-50 by half but still increase the Top-1 accuracy by more than 1%. With 35% reduction of MACs, MSGC can also increase the Top-1 accuracy of the MobileNetV2 backbone. Results on MS COCO dataset for object detection show similar observations. Our code and trained models are available at https://github.com/hellozhuo/msgc.

Learning Accurate Performance Predictors for Ultrafast Automated Model Compression

  • Authors: Ziwei Wang, Jiwen Lu, Han Xiao, Shengyu Liu, Jie Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06393
  • Pdf link: https://arxiv.org/pdf/2304.06393
  • Abstract
    In this paper, we propose an ultrafast automated model compression framework called SeerNet for flexible network deployment. Conventional non-differentiable methods discretely search for the desirable compression policy based on the accuracy of exhaustively trained lightweight models, and existing differentiable methods optimize an extremely large supernet to obtain the required compressed model for deployment. Both cause heavy computational cost due to the complex compression policy search and evaluation process. In contrast, we obtain the optimal efficient networks by directly optimizing the compression policy with an accurate performance predictor, where ultrafast automated model compression for various computational cost constraints is achieved without complex compression policy search and evaluation. Specifically, we first train the performance predictor based on the accuracy of uncertain compression policies actively selected by efficient evolutionary search, so that informative supervision is provided to learn an accurate performance predictor at acceptable cost. Then we leverage the gradient that maximizes the predicted performance under the barrier complexity constraint for ultrafast acquisition of the desirable compression policy, where adaptive update step sizes with momentum are employed to enhance the optimality of the acquired pruning and quantization strategy. Compared with state-of-the-art automated model compression methods, experimental results on image classification and object detection show that our method achieves competitive accuracy-complexity trade-offs with a significant reduction in search cost.

Keyword: voxel

$E(3) \times SO(3)$-Equivariant Networks for Spherical Deconvolution in Diffusion MRI

  • Authors: Axel Elaldi, Guido Gerig, Neel Dey
  • Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06103
  • Pdf link: https://arxiv.org/pdf/2304.06103
  • Abstract
    We present Roto-Translation Equivariant Spherical Deconvolution (RT-ESD), an $E(3)\times SO(3)$ equivariant framework for sparse deconvolution of volumes where each voxel contains a spherical signal. Such 6D data naturally arises in diffusion MRI (dMRI), a medical imaging modality widely used to measure microstructure and structural connectivity. As each dMRI voxel is typically a mixture of various overlapping structures, there is a need for blind deconvolution to recover crossing anatomical structures such as white matter tracts. Existing dMRI work takes either an iterative or a deep learning approach to sparse spherical deconvolution, yet it typically does not account for relationships between neighboring measurements. This work constructs equivariant deep learning layers that respect the symmetries of spatial rotations, reflections, and translations, alongside the symmetries of voxelwise spherical rotations. As a result, RT-ESD improves on previous work across several tasks, including fiber recovery on the DiSCo dataset, deconvolution-derived partial volume estimation on real-world in vivo human brain dMRI, and downstream reconstruction of fiber tractograms on the Tractometer dataset. Our implementation is available at https://github.com/AxelElaldi/e3so3_conv

Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

  • Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.06178
  • Pdf link: https://arxiv.org/pdf/2304.06178
  • Abstract
    Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning finer-scale voxels to regions of higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of the voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data while maintaining computational efficiency, running substantially faster than the baseline method NeuralRGBD.
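
A toy illustration of the complexity-driven subdivision idea (our own sketch, not the authors' code): voxels whose local complexity score exceeds a threshold are split octree-style into eight children, while the rest are kept as-is. `complexity_fn` is an assumed callback, e.g. local color/depth variance.

```python
import numpy as np

def subdivide(centers, sizes, complexity_fn, threshold):
    """Split voxels with high local complexity into 8 octree children."""
    new_centers, new_sizes = [], []
    for c, s in zip(centers, sizes):
        if complexity_fn(c, s) > threshold:
            q = s / 4.0  # child centers sit a quarter of the parent size away
            for dx in (-q, q):
                for dy in (-q, q):
                    for dz in (-q, q):
                        new_centers.append(c + np.array([dx, dy, dz]))
                        new_sizes.append(s / 2.0)
        else:
            new_centers.append(c)
            new_sizes.append(s)
    return np.asarray(new_centers), np.asarray(new_sizes)
```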

Brain Structure Ages -- A new biomarker for multi-disease classification

  • Authors: Huy-Dung Nguyen, Michaël Clément, Boris Mansencal, Pierrick Coupé
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06591
  • Pdf link: https://arxiv.org/pdf/2304.06591
  • Abstract
    Age is an important variable for describing the brain's expected anatomical status across the normal aging trajectory. Deviation from that normative aging trajectory may provide insights into neurological diseases. In neuroimaging, predicted brain age is widely used to analyze different diseases. However, the brain age gap alone (i.e., the difference between the chronological age and the estimated age) may not be informative enough for disease classification problems. In this paper, we propose to extend the notion of global brain age by estimating brain structure ages using structural magnetic resonance imaging. To this end, an ensemble of deep learning models is first used to estimate a 3D aging map (i.e., voxel-wise age estimation). Then, a 3D segmentation mask is used to obtain the final brain structure ages. This biomarker can be used in several situations. First, it enables accurate estimation of the brain age for the purpose of anomaly detection at the population level, where our approach outperforms several state-of-the-art methods. Second, brain structure ages can be used to compute the deviation from the normal aging process of each brain structure. This feature can be used in a multi-disease classification task for accurate differential diagnosis at the subject level. Finally, the brain structure age deviations of individuals can be visualized, providing insights into brain abnormality and helping clinicians in real medical contexts.
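
The final step, turning a voxel-wise age map into per-structure ages, is essentially a masked average. A minimal sketch, assuming `age_map` and `seg_mask` are aligned 3D arrays with label 0 as background:

```python
import numpy as np

def structure_ages(age_map, seg_mask):
    """Average the voxel-wise predicted ages within each labeled structure."""
    return {int(label): float(age_map[seg_mask == label].mean())
            for label in np.unique(seg_mask) if label != 0}
```

Subtracting the chronological age from each structure age then gives the per-structure deviations used for differential diagnosis.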

Keyword: lidar

Survey on LiDAR Perception in Adverse Weather Conditions

  • Authors: Mariella Dreissig, Dominik Scheuble, Florian Piewak, Joschka Boedecker
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06312
  • Pdf link: https://arxiv.org/pdf/2304.06312
  • Abstract
    Autonomous vehicles rely on a variety of sensors to gather information about their surroundings. The vehicle's behavior is planned based on this environment perception, making its reliability crucial for safety reasons. The active LiDAR sensor is able to create an accurate 3D representation of a scene, making it a valuable addition to environment perception for autonomous vehicles. Due to light scattering and occlusion, the LiDAR's performance changes under adverse weather conditions such as fog, snow, or rain. This limitation has recently fostered a large body of research on approaches to alleviate the decrease in perception performance. In this survey, we gather, analyze, and discuss different aspects of dealing with adverse weather conditions in LiDAR-based environment perception. We address topics such as the availability of appropriate data, raw point cloud processing and denoising, robust perception algorithms, and sensor fusion to mitigate shortcomings induced by adverse weather. We furthermore identify the most pressing gaps in the current literature and pinpoint promising research directions.

An Automotive Case Study on the Limits of Approximation for Object Detection

  • Authors: Martí Caro, Hamid Tabani, Jaume Abella, Francesc Moll, Enric Morancho, Ramon Canal, Josep Altet, Antonio Calomarde, Francisco J. Cazorla, Antonio Rubio, Pau Fontova, Jordi Fornt
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.06327
  • Pdf link: https://arxiv.org/pdf/2304.06327
  • Abstract
    The accuracy of camera-based object detection (CBOD) built upon deep learning is often evaluated against the real objects in frames only. However, such simplistic evaluation ignores the fact that many unimportant objects are small, distant, or in the background, and hence their misdetections have less impact than those of closer, larger, foreground objects in domains such as autonomous driving. Moreover, sporadic misdetections are irrelevant since confidence in detections is typically averaged across consecutive frames, and detection devices (e.g. cameras, LiDARs) are often redundant, thus providing fault tolerance. This paper exploits such intrinsic fault tolerance of the CBOD process and assesses, in an automotive case study, to what extent CBOD can tolerate approximation coming from multiple sources such as lower-precision arithmetic, approximate arithmetic units, and even random faults due to, for instance, low-voltage operation. We show that the accuracy impact of those sources of approximation is within 1% of the baseline even when considering the three approximate domains simultaneously, and hence multiple sources of approximation can be exploited to build highly efficient accelerators for CBOD in cars.
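
The fault-tolerance argument rests on confidence being averaged across consecutive frames, so a sporadic misdetection is smoothed out. A hypothetical sketch of that smoothing (names are ours, not the paper's):

```python
from collections import deque

class TemporalConfidence:
    """Average per-object detection confidence over the last few frames."""
    def __init__(self, window=5):
        self.window = window
        self.history = {}  # object id -> recent confidences

    def update(self, object_id, confidence):
        buf = self.history.setdefault(object_id, deque(maxlen=self.window))
        buf.append(confidence)
        return sum(buf) / len(buf)  # one low frame barely moves the average
```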

RadarGNN: Transformation Invariant Graph Neural Network for Radar-based Perception

  • Authors: Felix Fent, Philipp Bauerschmidt, Markus Lienkamp
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06547
  • Pdf link: https://arxiv.org/pdf/2304.06547
  • Abstract
    A reliable perception system has to be robust against challenging environmental conditions. Therefore, recent efforts have focused on the use of radar sensors in addition to camera and lidar sensors for perception applications. However, the sparsity of radar point clouds and the poor data availability remain challenging for current perception methods. To address these challenges, a novel graph neural network is proposed that uses not only the information of the points themselves but also the relationships between the points. The model is designed to consider both point features and point-pair features, embedded in the edges of the graph. Furthermore, a general approach for achieving transformation invariance is proposed which is robust against unseen scenarios and also counteracts the limited data availability. The transformation invariance is achieved by an invariant data representation rather than an invariant model architecture, making it applicable to other methods. The proposed RadarGNN model outperforms all previous methods on the RadarScenes dataset. In addition, the effects of different invariances on object detection and semantic segmentation quality are investigated. The code is made available as open-source software under https://github.com/TUMFTM/RadarGNN.
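
Achieving invariance through the data representation rather than the architecture can be illustrated with a toy edge-feature extractor: pair features such as the inter-point distance are unchanged by any rotation or translation of the whole point cloud. A hedged sketch with invented names:

```python
import numpy as np

def invariant_edge_features(points, edges):
    """Per-edge features that survive rigid transforms of the point cloud."""
    return np.array([[np.linalg.norm(points[i] - points[j])] for i, j in edges])
```

A real pipeline would append further invariants (e.g. differences in Doppler velocity or radar cross-section) to each edge.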

Keyword: diffusion

Social Biases through the Text-to-Image Generation Lens

  • Authors: Ranjita Naik, Besmira Nushi
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06034
  • Pdf link: https://arxiv.org/pdf/2304.06034
  • Abstract
    Text-to-Image (T2I) generation is enabling new applications that support creators, designers, and general end users of productivity software by generating illustrative content with high photorealism starting from a given descriptive text as a prompt. Such models are, however, trained on massive amounts of web data, which raises the peril of potentially harmful biases leaking into the generation process itself. In this paper, we take a multi-dimensional approach to studying and quantifying common social biases as reflected in the generated images, by focusing on how occupations, personality traits, and everyday situations are depicted across representations of (perceived) gender, age, race, and geographical location. Through an extensive set of both automated and human evaluation experiments, we present findings for two popular T2I models: DALLE-v2 and Stable Diffusion. Our results reveal severe occupational biases, with neutral prompts largely excluding groups of people from the results of both models. Such biases can be mitigated by increasing the amount of specification in the prompt itself, although prompt-based mitigation will not address discrepancies in image quality or other usages of the model or its representations in other scenarios. Further, we observe personality traits being associated with only a limited set of people at the intersection of race, gender, and age. Finally, an analysis of geographical location representations in everyday situations (e.g., park, food, weddings) shows that, for most situations, images generated through default location-neutral prompts are closest and most similar to images generated for the United States and Germany.

$E(3) \times SO(3)$-Equivariant Networks for Spherical Deconvolution in Diffusion MRI

  • Authors: Axel Elaldi, Guido Gerig, Neel Dey
  • Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06103
  • Pdf link: https://arxiv.org/pdf/2304.06103
  • Abstract
    We present Roto-Translation Equivariant Spherical Deconvolution (RT-ESD), an $E(3)\times SO(3)$ equivariant framework for sparse deconvolution of volumes where each voxel contains a spherical signal. Such 6D data naturally arises in diffusion MRI (dMRI), a medical imaging modality widely used to measure microstructure and structural connectivity. As each dMRI voxel is typically a mixture of various overlapping structures, there is a need for blind deconvolution to recover crossing anatomical structures such as white matter tracts. Existing dMRI work takes either an iterative or a deep learning approach to sparse spherical deconvolution, yet it typically does not account for relationships between neighboring measurements. This work constructs equivariant deep learning layers that respect the symmetries of spatial rotations, reflections, and translations, alongside the symmetries of voxelwise spherical rotations. As a result, RT-ESD improves on previous work across several tasks, including fiber recovery on the DiSCo dataset, deconvolution-derived partial volume estimation on real-world in vivo human brain dMRI, and downstream reconstruction of fiber tractograms on the Tractometer dataset. Our implementation is available at https://github.com/AxelElaldi/e3so3_conv

PATMAT: Person Aware Tuning of Mask-Aware Transformer for Face Inpainting

  • Authors: Saman Motamed, Jianjin Xu, Chen Henry Wu, Fernando De la Torre
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06107
  • Pdf link: https://arxiv.org/pdf/2304.06107
  • Abstract
    Generative models such as StyleGAN2 and Stable Diffusion have achieved state-of-the-art performance in computer vision tasks such as image synthesis, inpainting, and de-noising. However, current generative models for face inpainting often fail to preserve fine facial details and the identity of the person, despite creating aesthetically convincing image structures and textures. In this work, we propose Person Aware Tuning (PAT) of Mask-Aware Transformer (MAT) for face inpainting, which addresses this issue. Our proposed method, PATMAT, effectively preserves identity by incorporating reference images of a subject and fine-tuning a MAT architecture trained on faces. By using ~40 reference images, PATMAT creates anchor points in MAT's style module, and tunes the model using the fixed anchors to adapt the model to a new face identity. Moreover, PATMAT's use of multiple images per anchor during training allows the model to use fewer reference images than competing methods. We demonstrate that PATMAT outperforms state-of-the-art models in terms of image quality, the preservation of person-specific details, and the identity of the subject. Our results suggest that PATMAT can be a promising approach for improving the quality of personalized face inpainting.

An Edit Friendly DDPM Noise Space: Inversion and Manipulations

  • Authors: Inbar Huberman-Spiegelglas, Vladimir Kulikov, Tomer Michaeli
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06140
  • Pdf link: https://arxiv.org/pdf/2304.06140
  • Abstract
    Denoising diffusion probabilistic models (DDPMs) employ a sequence of white Gaussian noise samples to generate an image. In analogy with GANs, those noise maps could be considered as the latent code associated with the generated image. However, this native noise space does not possess a convenient structure, and is thus challenging to work with in editing tasks. Here, we propose an alternative latent noise space for DDPM that enables a wide range of editing operations via simple means, and present an inversion method for extracting these edit-friendly noise maps for any given image (real or synthetically generated). As opposed to the native DDPM noise space, the edit-friendly noise maps do not have a standard normal distribution and are not statistically independent across timesteps. However, they allow perfect reconstruction of any desired image, and simple transformations on them translate into meaningful manipulations of the output image (e.g., shifting, color edits). Moreover, in text-conditional models, fixing those noise maps while changing the text prompt, modifies semantics while retaining structure. We illustrate how this property enables text-based editing of real images via the diverse DDPM sampling scheme (in contrast to the popular non-diverse DDIM inversion). We also show how it can be used within existing diffusion-based editing methods to improve their quality and diversity.
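
Our reading of the inversion idea, as a rough sketch: sample each $x_t$ independently from $q(x_t|x_0)$, then solve the DDPM transition $x_{t-1} = \mu_t(x_t) + \sigma_t z_t$ for the noise map $z_t$ that reproduces the trajectory exactly. Here `posterior_mean_fn`, `alphas_bar`, and `sigmas` are assumed stand-ins for the model's reverse-step mean and the usual DDPM schedule, not the paper's API.

```python
import math
import torch

def edit_friendly_inversion(x0, alphas_bar, sigmas, posterior_mean_fn):
    """Extract noise maps z_t that exactly reproduce a fixed trajectory."""
    # Independent (rather than consecutive) noising of x0 at every timestep.
    xs = [math.sqrt(a) * x0 + math.sqrt(1 - a) * torch.randn_like(x0)
          for a in alphas_bar]
    zs = {}
    for t in range(len(xs) - 1, 0, -1):
        mu = posterior_mean_fn(xs[t], t)        # mean of p(x_{t-1} | x_t)
        zs[t] = (xs[t - 1] - mu) / sigmas[t]    # solve for the noise map
    return xs, zs
```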

Intriguing properties of synthetic images: from generative adversarial networks to diffusion models

  • Authors: Riccardo Corvi, Davide Cozzolino, Giovanni Poggi, Koki Nagano, Luisa Verdoliva
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06408
  • Pdf link: https://arxiv.org/pdf/2304.06408
  • Abstract
    Detecting fake images is becoming a major goal of computer vision. This need is becoming more and more pressing with the continuous improvement of synthesis methods based on Generative Adversarial Networks (GAN), and even more with the appearance of powerful methods based on Diffusion Models (DM). Towards this end, it is important to gain insight into which image features better discriminate fake images from real ones. In this paper we report on our systematic study of a large number of image generators of different families, aimed at discovering the most forensically relevant characteristics of real and generated images. Our experiments provide a number of interesting observations and shed light on some intriguing properties of synthetic images: (1) not only the GAN models but also the DM and VQ-GAN (Vector Quantized Generative Adversarial Networks) models give rise to visible artifacts in the Fourier domain and exhibit anomalous regular patterns in the autocorrelation; (2) when the dataset used to train the model lacks sufficient variety, its biases can be transferred to the generated images; (3) synthetic and real images exhibit significant differences in the mid-high frequency signal content, observable in their radial and angular spectral power distributions.
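
Observation (3) concerns radial spectral power distributions, which are easy to compute; a minimal single-channel sketch (our own illustration, not the authors' tooling):

```python
import numpy as np

def radial_power_spectrum(img, n_bins=64):
    """Azimuthally averaged Fourier power spectrum of a 2D image."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h / 2, x - w / 2)
    bins = np.linspace(0, r.max() + 1e-9, n_bins + 1)
    idx = np.clip(np.digitize(r.ravel(), bins) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return sums / np.maximum(counts, 1)
```

Peaks or regular structure in this curve (and in the autocorrelation) are the kind of forensic cues the paper reports for GAN, DM, and VQ-GAN outputs.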

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

  • Authors: Enze Xie, Lewei Yao, Han Shi, Zhili Liu, Daquan Zhou, Zhaoqiang Liu, Jiawei Li, Zhenguo Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06648
  • Pdf link: https://arxiv.org/pdf/2304.06648
  • Abstract
    Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications. This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enables fast adaptation to new domains. DiffFit is embarrassingly simple: it only fine-tunes the bias terms and newly-added scaling factors in specific layers, yet it results in significant training speed-up and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves 2$\times$ training speed-up and only needs to store approximately 0.12% of the total model parameters. An intuitive theoretical analysis is provided to justify the efficacy of the scaling factors for fast adaptation. On 8 downstream datasets, DiffFit achieves superior or competitive performance compared to full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one at minimal cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on the ImageNet 512$\times$512 benchmark by fine-tuning for only 25 epochs from a public pre-trained ImageNet 256$\times$256 checkpoint, while being 30$\times$ more training-efficient than the closest competitor.
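
The recipe (freeze everything except bias terms and newly inserted scaling factors) is easy to prototype. A minimal PyTorch sketch under our own naming; the paper's exact placement of the scale factors may differ:

```python
import torch
import torch.nn as nn

class ScaledLayer(nn.Module):
    """Wrap an existing layer with a learnable scalar initialized to 1."""
    def __init__(self, layer):
        super().__init__()
        self.layer = layer
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.gamma * self.layer(x)

def freeze_all_but_bias_and_gamma(model: nn.Module):
    """DiffFit-style selection: train only biases and the added gammas."""
    for name, p in model.named_parameters():
        p.requires_grad = name.endswith("bias") or name.endswith("gamma")
```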

Learning Controllable 3D Diffusion Models from Single-view Images

  • Authors: Jiatao Gu, Qingzhe Gao, Shuangfei Zhai, Baoquan Chen, Lingjie Liu, Josh Susskind
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06700
  • Pdf link: https://arxiv.org/pdf/2304.06700
  • Abstract
    Diffusion models have recently become the de-facto approach for generative modeling in the 2D domain. However, extending diffusion models to 3D is challenging due to the difficulty of acquiring 3D ground truth data for training. On the other hand, 3D GANs that integrate implicit 3D representations into GANs have shown remarkable 3D-aware generation when trained only on single-view image datasets. However, 3D GANs do not provide straightforward ways to precisely control image synthesis. To address these challenges, we present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis on single-view datasets. Control3Diff explicitly models the underlying latent distribution (optionally conditioned on external inputs), thus enabling direct control during the diffusion process. Moreover, our approach is general and applicable to any type of controlling input, allowing us to train it with the same diffusion objective without any auxiliary supervision. We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet, using various conditioning inputs such as images, sketches, and text prompts. Please see the project website (https://jiataogu.me/control3diff) for video comparisons.

DiffusionRig: Learning Personalized Priors for Facial Appearance Editing

  • Authors: Zheng Ding, Xuaner Zhang, Zhihao Xia, Lars Jebe, Zhuowen Tu, Xiuming Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.06711
  • Pdf link: https://arxiv.org/pdf/2304.06711
  • Abstract
    We address the problem of learning person-specific facial priors from a small number (e.g., 20) of portrait photos of the same person. This enables us to edit this specific person's facial appearance, such as expression and lighting, while preserving their identity and high-frequency facial details. Key to our approach, which we dub DiffusionRig, is a diffusion model conditioned on, or "rigged by," crude 3D face models estimated from single in-the-wild images by an off-the-shelf estimator. On a high level, DiffusionRig learns to map simplistic renderings of 3D face models to realistic photos of a given person. Specifically, DiffusionRig is trained in two stages: It first learns generic facial priors from a large-scale face dataset and then person-specific priors from a small portrait photo collection of the person of interest. By learning the CGI-to-photo mapping with such personalized priors, DiffusionRig can "rig" the lighting, facial expression, head pose, etc. of a portrait photo, conditioned only on coarse 3D models while preserving this person's identity and other high-frequency characteristics. Qualitative and quantitative experiments show that DiffusionRig outperforms existing approaches in both identity preservation and photorealism. Please see the project website: https://diffusionrig.github.io for the supplemental material, video, code, and data.

Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction

  • Authors: Hansheng Chen, Jiatao Gu, Anpei Chen, Wei Tian, Zhuowen Tu, Lingjie Liu, Hao Su
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06714
  • Pdf link: https://arxiv.org/pdf/2304.06714
  • Abstract
    3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images. Despite numerous task-specific methods, developing a comprehensive model remains challenging. In this paper, we present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects. Previous studies have used two-stage approaches that rely on pretrained NeRFs as real data to train diffusion models. In contrast, we propose a new single-stage training paradigm with an end-to-end objective that jointly optimizes a NeRF auto-decoder and a latent diffusion model, enabling simultaneous 3D reconstruction and prior learning, even from sparsely available views. At test time, we can directly sample the diffusion prior for unconditional generation, or combine it with arbitrary observations of unseen objects for NeRF reconstruction. SSDNeRF demonstrates robust results comparable to or better than leading task-specific methods in unconditional generation and single/sparse-view 3D reconstruction.

Expressive Text-to-Image Generation with Rich Text

  • Authors: Songwei Ge, Taesung Park, Jun-Yan Zhu, Jia-Bin Huang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06720
  • Pdf link: https://arxiv.org/pdf/2304.06720
  • Abstract
    Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnote. We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region based on cross-attention maps of a vanilla diffusion process using plain text. For each region, we enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance. We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines with quantitative evaluations.

Keyword: dynamic

Fairness: from the ethical principle to the practice of Machine Learning development as an ongoing agreement with stakeholders

  • Authors: Georgina Curto, Flavio Comim
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06031
  • Pdf link: https://arxiv.org/pdf/2304.06031
  • Abstract
    This paper clarifies why bias cannot be completely mitigated in Machine Learning (ML) and proposes an end-to-end methodology for translating the ethical principle of justice and fairness into the practice of ML development as an ongoing agreement with stakeholders. The pro-ethical iterative process presented in the paper aims to challenge asymmetric power dynamics in fairness decision-making within ML design and to support ML development teams in identifying, mitigating, and monitoring bias at each step of ML system development. The process also provides guidance on how to explain the always-imperfect bias trade-offs to users.

Web 3.0: The Future of Internet

  • Authors: Wensheng Gan, Zhenqiang Ye, Shicheng Wan, Philip S. Yu
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06032
  • Pdf link: https://arxiv.org/pdf/2304.06032
  • Abstract
    With the rapid growth of the Internet, daily human life has become deeply bound to it. To take advantage of the massive amounts of data and information on the internet, the Web architecture is continuously being reinvented and upgraded. From the static informative characteristics of Web 1.0 to the dynamic interactive features of Web 2.0, scholars and engineers have worked hard to make the internet world more open, inclusive, and equal. Indeed, the next generation of Web evolution (i.e., Web 3.0) is already arriving and shaping our lives. Web 3.0 is a decentralized Web architecture that is more intelligent and safer than before. The risks and ruin posed by monopolists or criminals will be greatly reduced by a complete reconstruction of the Internet and IT infrastructure. In a word, Web 3.0 can address web data ownership through distributed technology. It will optimize the internet world from the perspectives of economy, culture, and technology, and promote novel content production methods, organizational structures, and economic forms. However, Web 3.0 is not yet mature and remains disputed. Herein, this paper presents a comprehensive survey of Web 3.0, with a focus on current technologies, challenges, opportunities, and outlook. The article first gives a brief overview of the history of the World Wide Web as well as several differences among Web 1.0, Web 2.0, Web 3.0, and Web3. Then, some technical implementations of Web 3.0 are illustrated in detail. We discuss the revolution and benefits that Web 3.0 brings. Finally, we explore several challenges and issues in this promising area.

Learning solution of nonlinear constitutive material models using physics-informed neural networks: COMM-PINN

  • Authors: Shahed Rezaei, Ahmad Moeineddin, Ali Harandi
  • Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06044
  • Pdf link: https://arxiv.org/pdf/2304.06044
  • Abstract
    We applied physics-informed neural networks to solve the constitutive relations for nonlinear, path-dependent material behavior. As a result, the trained network not only satisfies all thermodynamic constraints but also instantly provides information about the current material state (i.e., free energy, stress, and the evolution of internal variables) under any given loading scenario without requiring initial data. One advantage of this work is that it bypasses the repetitive Newton iterations needed to solve nonlinear equations in complex material models. Additionally, strategies are provided to reduce the required order of derivation for obtaining the tangent operator. The trained model can be directly used in any finite element package (or other numerical methods) as a user-defined material model. However, challenges remain in the proper definition of collocation points and in integrating several non-equality constraints that become active or non-active simultaneously. We tested this methodology on rate-independent processes such as the classical von Mises plasticity model with a nonlinear hardening law, as well as local damage models for interface cracking behavior with a nonlinear softening law. Finally, we discuss the potential and remaining challenges for future developments of this new approach.

Primal-Dual Contextual Bayesian Optimization for Control System Online Optimization with Time-Average Constraints

  • Authors: Wenjie Xu, Yuning Jiang, Bratislav Svetozarevic, Colin N. Jones
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.06104
  • Pdf link: https://arxiv.org/pdf/2304.06104
  • Abstract
    This paper studies the problem of online performance optimization of constrained closed-loop control systems, where both the objective and the constraints are unknown black-box functions affected by exogenous time-varying contextual disturbances. A primal-dual contextual Bayesian optimization algorithm is proposed that achieves sublinear cumulative regret with respect to the dynamic optimal solution under certain regularity conditions. Furthermore, the algorithm achieves zero time-average constraint violation, ensuring that the average value of the constraint function satisfies the desired constraint. The method is applied to both sampled instances from Gaussian processes and a continuous stirred tank reactor parameter tuning problem; simulation results show that the method simultaneously provides close-to-optimal performance and maintains constraint feasibility on average. This contrasts with current state-of-the-art methods, which either suffer from large cumulative regret or severe constraint violations in the case studies presented.
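
The primal-dual structure can be conveyed with a toy round on a finite candidate set. `mu_f` and `mu_g` stand in for GP surrogate means (a faithful version would use confidence bounds and the observed context); the names are ours:

```python
import numpy as np

def primal_dual_round(candidates, mu_f, mu_g, lmbda, eta):
    """Pick the Lagrangian minimizer, then take a projected dual step."""
    scores = mu_f(candidates) + lmbda * mu_g(candidates)  # min f s.t. g <= 0
    x = candidates[np.argmin(scores)]
    g_obs = mu_g(x[None])[0]                # stand-in for a real measurement
    lmbda = max(0.0, lmbda + eta * g_obs)   # multiplier grows under violation
    return x, lmbda
```

Averaged over rounds, the dual update is what drives the time-average constraint violation toward zero.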

IoT trust and reputation: a survey and taxonomy

  • Authors: Muhammad Aaqib, Aftab Ali, Liming Chen, Omar Nibouche
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06119
  • Pdf link: https://arxiv.org/pdf/2304.06119
  • Abstract
    IoT is one of the fastest-growing technologies, and it is estimated that more than a billion devices will be in use across the globe by the end of 2030. To maximize the capability of these connected entities, trust and reputation among IoT entities is essential. Several trust management models have been proposed for the IoT environment; however, these schemes do not fully address the features of IoT devices, such as device role, device type, and dynamic behavior in a smart environment. As a result, traditional trust and reputation models are insufficient to handle these characteristics and the uncertainty risks involved in connecting nodes to the network. Whilst study is ongoing and various articles suggest promising solutions in constrained environments, research on trust and reputation is still in its infancy. In this paper, we carry out a comprehensive literature review of state-of-the-art research on the trust and reputation of IoT devices and systems. Specifically, we first propose a new structure, namely a new taxonomy, to organize trust and reputation models based on the ways trust is managed. The proposed taxonomy comprises traditional trust management-based systems and artificial intelligence-based systems, and combines both classes, encouraging existing schemes to adopt these emerging concepts. This collaboration between conventional mathematical models and advanced ML models results in schemes that are more robust and efficient. We then drill down to compare and analyse the methods and applications of these systems based on community-accepted performance metrics, e.g. scalability, delay, cooperativeness, and efficiency. Finally, building upon the findings of the analysis, we identify and discuss open research issues and challenges, and further speculate on and point out future research directions.

Robust and Context-Aware Real-Time Collaborative Robot Handling via Dynamic Gesture Commands

  • Authors: Rui Chen, Alvin Shek, Changliu Liu
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06175
  • Pdf link: https://arxiv.org/pdf/2304.06175
  • Abstract
    This paper studies real-time collaborative robot (cobot) handling, where the cobot maneuvers an object under human dynamic gesture commands. Enabling dynamic gesture commands is useful when the human needs to avoid direct contact with the robot or the object it handles. However, the key challenge lies in the heterogeneity of human behaviors and the stochasticity in the perception of dynamic gestures, which requires the robot handling policy to be adaptable and robust. To address these challenges, we introduce the Conditional Collaborative Handling Process (CCHP) to encode a context-aware cobot handling policy, together with a procedure to learn such a policy from human-human collaboration. We thoroughly evaluate the adaptability and robustness of CCHP and apply our approach to a real-time cobot assembly task with a Kinova Gen3 robot arm. Results show that our method leads to significantly less human effort and smoother human-robot collaboration than a state-of-the-art rule-based approach, even with first-time users.

Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

  • Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.06178
  • Pdf link: https://arxiv.org/pdf/2304.06178
  • Abstract
    Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning finer-scale voxels to regions of higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of the voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data while maintaining computational efficiency, running substantially faster than the baseline method NeuralRGBD.

Do "bad" citations have "good" effects?

  • Authors: Honglin Bao, Misha Teplitskiy
  • Subjects: Digital Libraries (cs.DL); Computers and Society (cs.CY); Multiagent Systems (cs.MA); Adaptation and Self-Organizing Systems (nlin.AO)
  • Arxiv link: https://arxiv.org/abs/2304.06190
  • Pdf link: https://arxiv.org/pdf/2304.06190
  • Abstract
    The scientific community generally discourages authors of research papers from citing papers that did not influence them, because such "rhetorical" citations are assumed to degrade the literature and the incentives for good work. Intuitively, a world where authors cite only substantively appears attractive. We argue that mandating substantive citing may have underappreciated consequences for the allocation of attention and for dynamism. We develop a novel agent-based model in which agents cite substantively and rhetorically. Agents first select papers to read based on their expected quality, read them and observe their actual quality, become influenced by those that are sufficiently good, and substantively cite them. Next, agents fill any remaining slots in their reference lists with papers that support their claims, regardless of whether they were actually influential. By turning rhetorical citing on and off, we find that rhetorical citing increases the correlation between quality and citations, increases citation churn, and reduces citation inequality. This occurs because rhetorical citing redistributes some citations from a stable set of elite-quality papers to a more dynamic set with high-to-moderate quality and high rhetorical value. Increasing the size of reference lists, often seen as an undesirable trend, amplifies these effects. In sum, rhetorical citing helps deconcentrate attention and makes it easier to displace incumbent ideas, so whether it is indeed undesirable depends on the metrics used to judge desirability.
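
The on/off mechanism at the heart of the model is simple enough to show as a toy step; names here are invented, not the authors' code:

```python
import random

def compose_reference_list(influential, supportive_pool, list_size, rhetorical):
    """Substantive citations first; optionally fill leftover slots rhetorically."""
    refs = list(influential)[:list_size]
    if rhetorical and len(refs) < list_size:
        pool = [p for p in supportive_pool if p not in refs]
        refs += random.sample(pool, min(list_size - len(refs), len(pool)))
    return refs
```

Running the simulation with `rhetorical=False` versus `True` is the comparison that yields the churn and inequality effects described above.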

Learning Over All Contracting and Lipschitz Closed-Loops for Partially-Observed Nonlinear Systems

  • Authors: Nicholas H. Barbara, Ruigang Wang, Ian R. Manchester
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.06193
  • Pdf link: https://arxiv.org/pdf/2304.06193
  • Abstract
    This paper presents a policy parameterization for learning-based control on nonlinear, partially-observed dynamical systems. The parameterization is based on a nonlinear version of the Youla parameterization and the recently proposed Recurrent Equilibrium Network (REN) class of models. We prove that the resulting Youla-REN parameterization automatically satisfies stability (contraction) and user-tunable robustness (Lipschitz) conditions on the closed-loop system. This means it can be used for safe learning-based control with no additional constraints or projections required to enforce stability or robustness. We test the new policy class in simulation on two reinforcement learning tasks: 1) magnetic suspension, and 2) inverting a rotary-arm pendulum. We find that the Youla-REN performs similarly to existing learning-based and optimal control methods while also ensuring stability and exhibiting improved robustness to adversarial disturbances.

Sub-Optimal Moving Horizon Estimation in Feedback Control of Linear Constrained Systems

  • Authors: Yujia Yang, Chris Manzie, Ye Pu
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06216
  • Pdf link: https://arxiv.org/pdf/2304.06216
  • Abstract
    Moving horizon estimation (MHE) offers benefits relative to other estimation approaches through its ability to explicitly handle constraints, but suffers from increased computational cost. To help enable MHE on platforms with limited computation power, we propose to solve the optimization problem underlying MHE sub-optimally for a fixed number of optimization iterations per time step. The stability of the closed-loop system is analyzed using the small-gain theorem by considering the closed-loop controlled system, the optimization algorithm dynamics, and the estimation error dynamics as three interconnected subsystems. By assuming incremental input/output-to-state stability ($\delta$-IOSS) of the system and imposing standard ISS conditions on the controller, we derive conditions on the iteration number such that the interconnected system is input-to-state stable (ISS) w.r.t. the external disturbances. A simulation using an MHE-MPC estimator-controller pair is used to validate the results.

Physics-informed radial basis network (PIRBN): A local approximation neural network for solving nonlinear PDEs

  • Authors: Jinshuai Bai, Gui-Rong Liu, Ashish Gupta, Laith Alzubaidi, Xi-Qiao Feng, YuanTong Gu
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06234
  • Pdf link: https://arxiv.org/pdf/2304.06234
  • Abstract
    Our recent intensive study has found that physics-informed neural networks (PINNs) tend to be local approximators after training. This observation leads to this novel physics-informed radial basis network (PIRBN), which can maintain the local property throughout the entire training process. In contrast to deep neural networks, a PIRBN comprises only one hidden layer and a radial basis "activation" function. Under appropriate conditions, we demonstrate that the training of PIRBNs using gradient descent methods can converge to Gaussian processes. We also study the training dynamics of PIRBN via neural tangent kernel (NTK) theory. In addition, comprehensive investigations regarding the initialisation strategies of PIRBN were conducted. Based on numerical examples, PIRBN has been demonstrated to be more effective and efficient than PINN in solving PDEs with high-frequency features and ill-posed computational domains. Moreover, existing PINN numerical techniques, such as adaptive learning, decomposition, and different types of loss functions, are applicable to PIRBN. The programs that reproduce all numerical results can be found at https://github.com/JinshuaiBai/PIRBN.
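
The architecture itself is tiny: one hidden layer of radial basis units. A minimal PyTorch sketch for a 1D problem; centers, widths, and initialization are our own choices for illustration:

```python
import torch

class PIRBN1D(torch.nn.Module):
    """A single hidden layer of Gaussian radial basis functions."""
    def __init__(self, n_basis=100, x_min=0.0, x_max=1.0):
        super().__init__()
        self.centers = torch.linspace(x_min, x_max, n_basis)
        self.log_gamma = torch.nn.Parameter(torch.zeros(n_basis))
        self.weights = torch.nn.Parameter(torch.zeros(n_basis))

    def forward(self, x):  # x: (N,)
        phi = torch.exp(-torch.exp(self.log_gamma)
                        * (x.unsqueeze(-1) - self.centers) ** 2)
        return phi @ self.weights  # (N,)
```

Training attaches the usual physics-informed loss: PDE residuals (via autograd) at collocation points plus boundary terms.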

Loosely Coupled Odometry, UWB Ranging, and Cooperative Spatial Detection for Relative Monte-Carlo Multi-Robot Localization

  • Authors: Xianjia Yu, Paola Torrico Morón, Sahar Salimpour, Jorge Peña Queralta, Tomi Westerlund
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06264
  • Pdf link: https://arxiv.org/pdf/2304.06264
  • Abstract
    As mobile robots become more ubiquitous, their deployments grow across use cases where GNSS positioning is either unavailable or unreliable. This has led to increased interest in multi-modal relative localization methods. Complementing onboard odometry, ranging allows for relative state estimation, with ultra-wideband (UWB) ranging having gained widespread recognition due to its low cost and centimeter-level out-of-the-box accuracy. Infrastructure-free localization methods allow for more dynamic, ad-hoc, and flexible deployments, yet they have received less attention from the research community. In this work, we propose a cooperative relative multi-robot localization approach in which we leverage inter-robot ranging and simultaneous spatial detections of objects in the environment. To achieve this, we equip robots with a single UWB transceiver and a stereo camera. We propose a novel Monte-Carlo approach to estimate relative states by either employing only UWB ranges or dynamically integrating simultaneous spatial detections from the stereo cameras. We also address the challenges of UWB ranging error mitigation, especially in non-line-of-sight, with a study on different LSTM networks to estimate the ranging error. The proposed approach has multiple benefits. First, we show that a single range is enough to accurately estimate the relative states of two robots when fusing odometry measurements. Second, our experiments also demonstrate that our approach surpasses traditional methods such as multilateration in terms of accuracy. Third, to increase accuracy even further, we allow for the integration of cooperative spatial detections. Finally, we show how ROS 2 and Zenoh can be integrated to build a scalable wireless communication solution for multi-robot systems. The experimental validation includes real-time deployment and autonomous navigation based on the relative positioning method.
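
The core of the Monte-Carlo estimator, reweighting relative-pose particles by a single UWB range, fits in a few lines. A simplified 2D sketch with assumed Gaussian ranging noise (particles encode the peer's pose relative to the observer):

```python
import numpy as np

def uwb_range_update(particles, weights, measured_range, sigma=0.1):
    """Reweight relative-pose particles (x, y, theta) by one UWB range."""
    predicted = np.linalg.norm(particles[:, :2], axis=1)  # range to the peer
    likelihood = np.exp(-0.5 * ((measured_range - predicted) / sigma) ** 2)
    weights = weights * likelihood + 1e-300               # avoid total collapse
    return weights / weights.sum()
```

Odometry drives the motion model between such updates, and an LSTM-based ranging-error estimate would correct `measured_range` or inflate `sigma` in non-line-of-sight conditions.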

Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning

  • Authors: Wenli Xiao, Yiwei Lyu, John Dolan
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06281
  • Pdf link: https://arxiv.org/pdf/2304.06281
  • Abstract
    Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases. Although shielding with Linear Temporal Logic (LTL) is a promising formal method to ensure safety in single-agent Reinforcement Learning (RL), it results in conservative behaviors when scaling to multi-agent scenarios. Additionally, it poses computational challenges for synthesizing shields in complex multi-agent environments. This work introduces Model-based Dynamic Shielding (MBDS) to support MARL algorithm design. Our algorithm synthesizes distributive shields, which are reactive systems running in parallel with each MARL agent, to monitor and rectify unsafe behaviors. The shields can dynamically split, merge, and recompute based on agents' states. This design enables efficient synthesis of shields to monitor agents in complex environments without coordination overheads. We also propose an algorithm to synthesize shields without prior knowledge of the dynamics model. The proposed algorithm obtains an approximate world model by interacting with the environment during the early stage of exploration, making our MBDS enjoy formal safety guarantees with high probability. We demonstrate in simulations that our framework can surpass existing baselines in terms of safety guarantees and learning performance.

Neural State-Space Models: Empirical Evaluation of Uncertainty Quantification

  • Authors: Marco Forgione, Dario Piga
  • Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06349
  • Pdf link: https://arxiv.org/pdf/2304.06349
  • Abstract
    Effective quantification of uncertainty is an essential and still missing step towards a greater adoption of deep-learning approaches in different applications, including mission-critical ones. In particular, investigations on the predictive uncertainty of deep-learning models describing non-linear dynamical systems are very limited to date. This paper is aimed at filling this gap and presents preliminary results on uncertainty quantification for system identification with neural state-space models. We frame the learning problem in a Bayesian probabilistic setting and obtain posterior distributions for the neural network's weights and outputs through approximate inference techniques. Based on the posterior, we construct credible intervals on the outputs and define a surprise index which can effectively diagnose usage of the model in a potentially dangerous out-of-distribution regime, where predictions cannot be trusted.

Emergence of Symbols in Neural Networks for Semantic Understanding and Communication

  • Authors: Yang Chen, Liangxuan Guo, Shan Yu
  • Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Symbolic Computation (cs.SC); Neurons and Cognition (q-bio.NC)
  • Arxiv link: https://arxiv.org/abs/2304.06377
  • Pdf link: https://arxiv.org/pdf/2304.06377
  • Abstract
    Being able to create meaningful symbols and proficiently use them for higher cognitive functions such as communication, reasoning, and planning is essential to and unique for human intelligence. Current deep neural networks are still far behind humans' ability to create symbols for such higher cognitive functions. Here we propose a solution, named SEA-net, to endow neural networks with the abilities of symbol creation, semantic understanding, and communication. SEA-net generates symbols that dynamically configure the network to perform specific tasks. These symbols capture compositional semantic information that enables the system to acquire new functions purely by symbolic manipulation or communication. In addition, we find that these self-generated symbols exhibit an intrinsic structure resembling that of natural language, suggesting a common framework underlying the generation and understanding of symbols in both human brains and artificial neural networks. We hope this will be instrumental in producing more capable systems in the future that synergize the strengths of connectionist and symbolic approaches to AI.

Energy-Efficient GPU Clusters Scheduling for Deep Learning

  • Authors: Diandian Gu, Xintong Xie, Gang Huang, Xin Jin, Xuanzhe Liu
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.06381
  • Pdf link: https://arxiv.org/pdf/2304.06381
  • Abstract
    Training deep neural networks (DNNs) is a major workload in datacenters today, resulting in tremendously fast growth in energy consumption. It is important to reduce energy consumption while still completing DL training jobs early in data centers. In this paper, we propose PowerFlow, a GPU cluster scheduler that reduces the average Job Completion Time (JCT) under an energy budget. We first present performance models for DL training jobs that predict throughput and energy consumption under different configurations. Based on these performance models, PowerFlow dynamically allocates GPUs and adjusts the GPU-level or job-level configurations of DL training jobs. PowerFlow applies network packing and buddy allocation to job placement, thus avoiding the extra energy consumed by cluster fragmentation. Evaluation results show that, under the same energy consumption, PowerFlow improves the average JCT by up to 1.57-3.39x compared to competitive baselines.

TransHP: Image Classification with Hierarchical Prompting

  • Authors: Wenhao Wang, Yifan Sun, Wei Li, Yi Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06385
  • Pdf link: https://arxiv.org/pdf/2304.06385
  • Abstract
    This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task. Different from prior HIC methods, our hierarchical prompting is the first to explicitly inject ancestor-class information as a tokenized hint that benefits descendant-class discrimination. We think it well imitates human visual recognition, i.e., humans may use the ancestor class as a prompt to draw focus on the subtle differences among descendant classes. We model this prompting mechanism as a Transformer with Hierarchical Prompting (TransHP). TransHP consists of three steps: 1) learning a set of prompt tokens to represent the coarse (ancestor) classes, 2) on-the-fly predicting the coarse class of the input image at an intermediate block, and 3) injecting the prompt token of the predicted coarse class into the intermediate feature. Though the parameters of TransHP remain the same for all input images, the injected coarse-class prompt conditions (modifies) the subsequent feature extraction and encourages a dynamic focus on relatively subtle differences among the descendant classes. Extensive experiments show that TransHP improves image classification accuracy (e.g., improving ViT-B/16 by +2.83% on ImageNet classification), training data efficiency (e.g., +12.69% improvement under 10% ImageNet training data), and model explainability. Moreover, TransHP also performs favorably against prior HIC methods, showing that TransHP well exploits the hierarchical information.
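
A loose sketch of the injection step (the paper injects at an intermediate block and trains the coarse head jointly; the names below are ours):

```python
import torch
import torch.nn as nn

class CoarsePromptInjector(nn.Module):
    """One learnable prompt token per coarse class; prepend the predicted one."""
    def __init__(self, n_coarse, dim):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_coarse, dim) * 0.02)
        self.coarse_head = nn.Linear(dim, n_coarse)

    def forward(self, tokens):  # tokens: (B, N, dim)
        logits = self.coarse_head(tokens.mean(dim=1))    # predict coarse class
        prompt = self.prompts[logits.argmax(dim=-1)]     # (B, dim)
        return torch.cat([prompt.unsqueeze(1), tokens], dim=1), logits
```

The prepended token then conditions all subsequent attention layers, which is what "prompting" means here.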

Communicating Actor Automata -- Modelling Erlang Processes as Communicating Machines

  • Authors: Dominic Orchard (University of Kent, UK), Mihail Munteanu (Masabi Ltd.), Paulo Torrens (University of Kent, UK)
  • Subjects: Programming Languages (cs.PL)
  • Arxiv link: https://arxiv.org/abs/2304.06395
  • Pdf link: https://arxiv.org/pdf/2304.06395
  • Abstract
    Brand and Zafiropulo's notion of Communicating Finite-State Machines (CFSMs) provides a succinct and powerful model of message-passing concurrency, based around channels. However, a major variant of message-passing concurrency is not readily captured by CFSMs: the actor model. In this work, we define a variant of CFSMs, called Communicating Actor Automata, to capture the actor model of concurrency as provided by Erlang: with mailboxes, from which messages are received according to repeated application of pattern matching. Furthermore, this variant of CFSMs supports dynamic process topologies, capturing common programming idioms in the context of actor-based message-passing concurrency. This gives a new basis for modelling, specifying, and verifying Erlang programs. We also consider a class of CAAs that give rise to freedom from race conditions.

Event-based tracking of human hands

  • Authors: Laura Duarte, Mohammad Safeea, Pedro Neto
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06534
  • Pdf link: https://arxiv.org/pdf/2304.06534
  • Abstract
    This paper proposes a novel method for tracking human hands using data from an event camera. The event camera detects changes in brightness, measuring motion with low latency, no motion blur, low power consumption, and high dynamic range. Captured frames are analysed using lightweight algorithms reporting 3D hand position data. The chosen pick-and-place scenario serves as an example input for collaborative human-robot interactions and for obstacle avoidance in human-robot safety applications. Event data are pre-processed into intensity frames. The regions of interest (ROI) are defined through object-edge event activity, reducing noise. ROI features are extracted for use in depth perception. Event-based tracking of human hands is demonstrated to be feasible in real time and at a low computational cost. The proposed ROI-finding method reduces noise from intensity images, achieving up to 89% data reduction relative to the original while preserving the features. The depth estimation error with respect to ground truth (measured with wearables), computed using dynamic time warping from a single event camera, ranges from 15 to 30 millimetres, depending on the plane in which it is measured. The result is tracking of human hands in 3D space using data from a single event camera and lightweight algorithms to define ROI features.
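
The pre-processing pipeline (accumulate events into an intensity frame, then keep only pixels with enough activity) can be sketched as follows; the thresholds and the event format are assumptions of ours:

```python
import numpy as np

def events_to_roi(events, shape, activity_thresh=5):
    """Count events per pixel and return the active-region bounding box."""
    frame = np.zeros(shape, dtype=np.int32)
    for x, y, _polarity in events:      # polarity ignored in this toy version
        frame[y, x] += 1
    ys, xs = np.nonzero(frame >= activity_thresh)
    if xs.size == 0:
        return frame, None              # no sufficiently active region
    return frame, (xs.min(), ys.min(), xs.max(), ys.max())
```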

DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos

  • Authors: Qi Zhao, M. Salman Asif, Zhan Ma
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06544
  • Pdf link: https://arxiv.org/pdf/2304.06544
  • Abstract
    Existing implicit neural representation (INR) methods do not fully exploit spatiotemporal redundancies in videos. Index-based INRs ignore the content-specific spatial features and hybrid INRs ignore the contextual dependency on adjacent frames, leading to poor modeling capability for scenes with large motion or dynamics. We analyze this limitation from the perspective of function fitting and reveal the importance of frame difference. To use explicit motion information, we propose Difference Neural Representation for Videos (DNeRV), which consists of two streams for content and frame difference. We also introduce a collaborative content unit for effective feature fusion. We test DNeRV for video compression, inpainting, and interpolation. DNeRV achieves competitive results against the state-of-the-art neural compression approaches and outperforms existing implicit methods on downstream inpainting and interpolation for $960 \times 1920$ videos.

Class-Incremental Learning of Plant and Disease Detection: Growing Branches with Knowledge Distillation

  • Authors: Mathieu Pagé Fortin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06619
  • Pdf link: https://arxiv.org/pdf/2304.06619
  • Abstract
    This paper investigates the problem of class-incremental object detection for agricultural applications, where a model needs to learn new plant species and diseases incrementally without forgetting the previously learned ones. We adapt two public datasets to include new categories over time, simulating a more realistic and dynamic scenario. We then compare three class-incremental learning methods that leverage different forms of knowledge distillation to mitigate catastrophic forgetting. Our experiments show that all three methods suffer from catastrophic forgetting, but the recent Dynamic Y-KD approach, which additionally uses a dynamic architecture that grows new branches to learn new tasks, outperforms ILOD and Faster-ILOD in most scenarios on both new and old classes. These results highlight the challenges and opportunities of continual object detection for agricultural applications. In particular, the large intra-class and small inter-class variability that is typical of plant images exacerbates the difficulty of learning new categories without interfering with previous knowledge. We publicly release our code to encourage future work.

Robustness Measures and Monitors for Time Window Temporal Logic

  • Authors: Ahmad Ahmad, Cristian-Ioan Vasile, Roberto Tron, Calin Belta
  • Subjects: Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.06645
  • Pdf link: https://arxiv.org/pdf/2304.06645
  • Abstract
    Temporal logics (TLs) have been widely used to formalize interpretable tasks for cyber-physical systems. Time Window Temporal Logic (TWTL) has been recently proposed as a specification language for dynamical systems. In particular, it can easily express robotic tasks, and it allows for efficient, automata-based verification and synthesis of control policies for such systems. In this paper, we define two quantitative semantics for this logic, and two corresponding monitoring algorithms, which allow for real-time quantification of satisfaction of formulas by trajectories of discrete-time systems. We demonstrate the new semantics and their runtime monitors on numerical examples.
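
The paper's quantitative semantics for TWTL are not reproduced here; as a hedged illustration of what such a monitor computes, the sketch below scores a time-windowed "always x >= threshold" requirement on a discrete-time trajectory as the minimum predicate robustness over the window, in the spirit of robustness semantics:

```python
def predicate_robustness(state, threshold):
    # Positive iff "state >= threshold" holds; magnitude measures the margin.
    return state - threshold

def within_window_always(trajectory, a, b, threshold):
    """Quantitative score of 'x >= threshold throughout steps [a, b]'."""
    return min(predicate_robustness(x, threshold) for x in trajectory[a:b + 1])

# A trajectory that dips below the threshold inside the window [1, 3]:
traj = [1.0, 2.0, 0.5, 3.0, 2.5]
print(within_window_always(traj, 1, 3, 1.0))   # -0.5: violated at t = 2
```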

ProtoDiv: Prototype-guided Division of Consistent Pseudo-bags for Whole-slide Image Classification

  • Authors: Rui Yang, Pei Liu, Luping Ji
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06652
  • Pdf link: https://arxiv.org/pdf/2304.06652
  • Abstract
    Due to the limitations of inadequate Whole-Slide Image (WSI) samples with weak labels, pseudo-bag-based multiple instance learning (MIL) appears as a vibrant prospect in WSI classification. However, the pseudo-bag dividing scheme, often crucial for classification performance, is still an open topic worth exploring. Therefore, this paper proposes a novel scheme, ProtoDiv, using a bag prototype to guide the division of WSI pseudo-bags. Rather than designing a complex network architecture, this scheme takes a plug-and-play approach to safely augment WSI data for effective training while preserving sample consistency. Furthermore, we devise an attention-based prototype that can be optimized dynamically during training to adapt to a given classification task. We apply our ProtoDiv scheme to seven baseline models and then carry out a group of comparison experiments on two public WSI datasets. The experiments confirm that ProtoDiv usually brings clear performance improvements to WSI classification.

D-SVM over Networked Systems with Non-Ideal Linking Conditions

  • Authors: Mohammadreza Doostmohammadian, Alireza Aghasi, Houman Zarrabi
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Dynamical Systems (math.DS); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.06667
  • Pdf link: https://arxiv.org/pdf/2304.06667
  • Abstract
    This paper considers distributed optimization algorithms, with application to binary classification via distributed support vector machines (D-SVM) over multi-agent networks subject to some link nonlinearities. The agents cooperatively solve a consensus-constrained distributed optimization problem via continuous-time dynamics, while the links are subject to strongly sign-preserving odd nonlinear conditions. Logarithmic quantization and clipping (saturation) are two examples of such nonlinearities. In contrast to the existing literature, which mostly considers ideal links and perfect information exchange over linear channels, we show how general sector-bounded models affect the convergence to the optimizer (i.e., the SVM classifier) over dynamic balanced directed networks. In general, any odd sector-bounded nonlinear mapping can be applied to our dynamics. The main challenge is to show that the proposed system dynamics always have one zero eigenvalue (associated with the consensus) while the other eigenvalues all have negative real parts; this is done by recalling arguments from matrix perturbation theory. The solution is then shown to converge to the agreement state under certain conditions. For example, the bound on the gradient tracking (GT) step size is tighter than in the linear case by factors related to the upper/lower sector bounds. To the best of our knowledge, no existing work in the distributed optimization and learning literature considers non-ideal link conditions.
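
The two link nonlinearities named in the abstract are easy to write down; the sketch below applies one of them to plain consensus dynamics as a hedged illustration (this is not the paper's full D-SVM/gradient-tracking dynamics, and the parameters are illustrative):

```python
import numpy as np

def log_quantize(z, rho=0.1):
    """Logarithmic quantization: an odd, strongly sign-preserving mapping."""
    mag = np.abs(z)
    levels = np.exp(rho * np.round(np.log(np.maximum(mag, 1e-12)) / rho))
    return np.sign(z) * np.where(mag > 0, levels, 0.0)

def clip(z, limit=1.0):
    """Saturation (clipping): another odd, sector-bounded mapping."""
    return np.clip(z, -limit, limit)

def consensus_step(x, adjacency, phi, dt=0.01):
    """Euler step of x_i' = sum_j a_ij * phi(x_j - x_i), with the nonlinearity
    phi applied to every exchanged difference on the links."""
    diffs = x[None, :] - x[:, None]            # diffs[i, j] = x_j - x_i
    return x + dt * np.sum(adjacency * phi(diffs), axis=1)

x = np.array([0.0, 1.0, 4.0])
A = np.ones((3, 3)) - np.eye(3)
for _ in range(2000):
    x = consensus_step(x, A, clip)
print(x)   # the agents approach agreement despite the clipped links
```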

Inertia-Aware Microgrid Investment Planning Using Tractable Decomposition Algorithms

  • Authors: Agnes Marjorie Nakiganda, Shahab Dehghan, Petros Aristidou
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06674
  • Pdf link: https://arxiv.org/pdf/2304.06674
  • Abstract
    The integration of the frequency dynamics into Micro-Grid (MG) investment and operational planning problems is vital in ensuring the security of the system in the post-contingency states. However, the task of including transient security constraints in MG planning problems is non-trivial. This is due to the highly non-linear and non-convex nature of the analytical closed form of the frequency metrics (e.g., frequency nadir) and power flow constraints. To handle this issue, this paper presents two algorithms for decomposing the MG investment planning problem into multiple levels to enhance computational tractability and optimality. Furthermore, the sensitivity of the decisions made at each level is captured by corresponding dual cutting planes to model feasible secure regions. This, in turn, ensures both the optimal determination and placement of inertia services and accelerates the convergence of the proposed decomposition algorithms. The efficient and effective performance of the proposed algorithms is tested and verified on an 18-bus Low Voltage (LV) network and a 30-bus Medium Voltage (MV) network under various operating scenarios.

OKRidge: Scalable Optimal k-Sparse Ridge Regression for Learning Dynamical Systems

  • Authors: Jiachang Liu, Sam Rosen, Chudi Zhong, Cynthia Rudin
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.06686
  • Pdf link: https://arxiv.org/pdf/2304.06686
  • Abstract
    We consider an important problem in scientific discovery, identifying sparse governing equations for nonlinear dynamical systems. This involves solving sparse ridge regression problems to provable optimality in order to determine which terms drive the underlying dynamics. We propose a fast algorithm, OKRidge, for sparse ridge regression, using a novel lower bound calculation involving, first, a saddle point formulation, and from there, either solving (i) a linear system or (ii) using an ADMM-based approach, where the proximal operators can be efficiently evaluated by solving another linear system and an isotonic regression problem. We also propose a method to warm-start our solver, which leverages a beam search. Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.
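
For scale, a certifiably optimal answer to the k-sparse ridge problem can be obtained on tiny instances by exhaustive enumeration; the baseline below (not the OKRidge algorithm, which uses saddle-point lower bounds and beam-search warm starts to scale far beyond this) is useful only as a ground-truth check:

```python
import numpy as np
from itertools import combinations

def ridge_on_support(X, y, support, lam):
    """Ridge solution restricted to the given coordinate support."""
    Xs = X[:, support]
    w = np.linalg.solve(Xs.T @ Xs + lam * np.eye(len(support)), Xs.T @ y)
    residual = y - Xs @ w
    return w, residual @ residual + lam * w @ w

def k_sparse_ridge_bruteforce(X, y, k, lam=1e-2):
    """Exhaustive search over all k-subsets: exponential in the dimension."""
    best_support, best_obj = None, np.inf
    for support in combinations(range(X.shape[1]), k):
        _, obj = ridge_on_support(X, y, list(support), lam)
        if obj < best_obj:
            best_support, best_obj = support, obj
    return best_support, best_obj
```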

Representing Volumetric Videos as Dynamic MLP Maps

  • Authors: Sida Peng, Yunzhi Yan, Qing Shuai, Hujun Bao, Xiaowei Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06717
  • Pdf link: https://arxiv.org/pdf/2304.06717
  • Abstract
    This paper introduces a novel representation of volumetric videos for real-time view synthesis of dynamic scenes. Recent advances in neural scene representations demonstrate their remarkable capability to model and render complex static scenes, but extending them to represent dynamic scenes is not straightforward due to their slow rendering speed or high storage cost. To solve this problem, our key idea is to represent the radiance field of each frame as a set of shallow MLP networks whose parameters are stored in 2D grids, called MLP maps, and dynamically predicted by a 2D CNN decoder shared by all frames. Representing 3D scenes with shallow MLPs significantly improves the rendering speed, while dynamically predicting MLP parameters with a shared 2D CNN instead of explicitly storing them leads to low storage cost. Experiments show that the proposed approach achieves state-of-the-art rendering quality on the NHR and ZJU-MoCap datasets, while being efficient for real-time rendering with a speed of 41.7 fps for $512 \times 512$ images on an RTX 3090 GPU. The code is available at https://zju3dv.github.io/mlp_maps/.
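
A hedged sketch of the core data structure, assuming an "MLP map" is a 2D grid holding flattened parameters of a shallow two-layer MLP per cell; the layer sizes and parameter layout here are illustrative, and in the paper the grids are predicted by a shared 2D CNN decoder:

```python
import numpy as np

IN_DIM, HIDDEN, OUT_DIM = 3, 8, 4                       # illustrative sizes
N_PARAMS = IN_DIM * HIDDEN + HIDDEN + HIDDEN * OUT_DIM + OUT_DIM

def eval_mlp_map(points, uv, mlp_map):
    """points: (N, IN_DIM) samples; uv: (N, 2) integer grid coords;
    mlp_map: (H, W, N_PARAMS) per-cell shallow-MLP parameters."""
    outputs = np.empty((points.shape[0], OUT_DIM))
    for i, (p, (u, v)) in enumerate(zip(points, uv)):
        theta, o = mlp_map[v, u], 0
        W1 = theta[o:o + IN_DIM * HIDDEN].reshape(IN_DIM, HIDDEN); o += IN_DIM * HIDDEN
        b1 = theta[o:o + HIDDEN]; o += HIDDEN
        W2 = theta[o:o + HIDDEN * OUT_DIM].reshape(HIDDEN, OUT_DIM); o += HIDDEN * OUT_DIM
        b2 = theta[o:o + OUT_DIM]
        outputs[i] = np.maximum(p @ W1 + b1, 0.0) @ W2 + b2   # ReLU MLP
    return outputs

grid = np.random.randn(32, 32, N_PARAMS).astype(np.float32)
pts = np.random.randn(5, IN_DIM)
uv = np.random.randint(0, 32, size=(5, 2))
print(eval_mlp_map(pts, uv, grid).shape)                # (5, OUT_DIM)
```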

New submissions for Thu, 27 Apr 23

Keyword: efficient

VeML: An End-to-End Machine Learning Lifecycle for Large-scale and High-dimensional Data

  • Authors: Van-Duc Le
  • Subjects: Machine Learning (cs.LG); Databases (cs.DB); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.13037
  • Pdf link: https://arxiv.org/pdf/2304.13037
  • Abstract
    An end-to-end machine learning (ML) lifecycle consists of many iterative processes, from data preparation and ML model design to model training and then deploying the trained model for inference. When building an end-to-end lifecycle for an ML problem, many ML pipelines must be designed and executed that produce a huge number of lifecycle versions. Therefore, this paper introduces VeML, a Version management system dedicated to end-to-end ML Lifecycle. Our system tackles several crucial problems that other systems have not solved. First, we address the high cost of building an ML lifecycle, especially for large-scale and high-dimensional dataset. We solve this problem by proposing to transfer the lifecycle of similar datasets managed in our system to the new training data. We design an algorithm based on the core set to compute similarity for large-scale, high-dimensional data efficiently. Another critical issue is the model accuracy degradation by the difference between training data and testing data during the ML lifetime, which leads to lifecycle rebuild. Our system helps to detect this mismatch without getting labeled data from testing data and rebuild the ML lifecycle for a new data version. To demonstrate our contributions, we conduct experiments on real-world, large-scale datasets of driving images and spatiotemporal sensor data and show promising results.

Diffusion Probabilistic Model Based Accurate and High-Degree-of-Freedom Metasurface Inverse Design

  • Authors: Zezhou Zhang, Chuanchuan Yang, Yifeng Qin, Hao Feng, Jiqiang Feng, Hongbin Li
  • Subjects: Machine Learning (cs.LG); Optics (physics.optics)
  • Arxiv link: https://arxiv.org/abs/2304.13038
  • Pdf link: https://arxiv.org/pdf/2304.13038
  • Abstract
    Conventional meta-atom designs rely heavily on researchers' prior knowledge and trial-and-error searches using full-wave simulations, resulting in time-consuming and inefficient processes. Inverse design methods based on optimization algorithms, such as evolutionary algorithms and topology optimization, have been introduced to design metamaterials. However, none of these algorithms is general enough to fulfill multi-objective tasks. Recently, deep learning methods represented by Generative Adversarial Networks (GANs) have been applied to the inverse design of metamaterials, and they can directly generate high-degree-of-freedom meta-atoms based on S-parameter requirements. However, the adversarial training process of GANs makes the network unstable and results in high modeling costs. This paper proposes a novel metamaterial inverse design method based on the theory of diffusion probabilistic models. By learning the Markov process that transforms the original structure into a Gaussian distribution, the proposed method can gradually remove the noise, starting from the Gaussian distribution, and generate new high-degree-of-freedom meta-atoms that meet the S-parameter conditions. This avoids the model instability introduced by the adversarial training of GANs and ensures more accurate and higher-quality generation results. Experiments have shown that our method is superior to representative GAN-based methods in terms of model convergence speed, generation accuracy, and quality.
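
The Markov noising process the abstract refers to has the standard diffusion closed form; a generic DDPM-style forward sample is sketched below (the network, schedule, and meta-atom encoding used in the paper are not reproduced; the binary pattern and linear schedule here are assumptions):

```python
import torch

def ddpm_forward_sample(x0, t, betas):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t]
    noise = torch.randn_like(x0)
    return alpha_bar.sqrt() * x0 + (1.0 - alpha_bar).sqrt() * noise, noise

betas = torch.linspace(1e-4, 0.02, 1000)                 # common linear schedule
x0 = (torch.rand(1, 1, 32, 32) > 0.5).float() * 2 - 1    # a {-1,+1} pixel pattern
x_t, eps = ddpm_forward_sample(x0, t=500, betas=betas)
```

Generation then runs the learned reverse process from pure Gaussian noise back toward a structure that satisfies the S-parameter conditions.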

Optimizing Deep Learning Models For Raspberry Pi

  • Authors: Salem Ameen, Kangaranmulle Siriwardana, Theo Theodoridis
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.13039
  • Pdf link: https://arxiv.org/pdf/2304.13039
  • Abstract
    Deep learning models have become increasingly popular for a wide range of applications, including computer vision, natural language processing, and speech recognition. However, these models typically require large amounts of computational resources, making them challenging to run on low-power devices such as the Raspberry Pi. One approach to addressing this challenge is to use pruning techniques to reduce the size of the deep learning models. Pruning involves removing unimportant weights and connections from the model, resulting in a smaller and more efficient model. Pruning can be done during training or after the model has been trained. Another approach is to optimize the deep learning models specifically for the Raspberry Pi architecture. This can include optimizing the model's architecture and parameters to take advantage of the Raspberry Pi's hardware capabilities, such as its CPU and GPU. Additionally, the model can be optimized for energy efficiency by minimizing the amount of computation required. Pruning and optimizing deep learning models for the Raspberry Pi can help overcome the computational and energy constraints of low-power devices, making it possible to run deep learning models on a wider range of devices. In the following sections, we will explore these approaches in more detail and discuss their effectiveness for optimizing deep learning models for the Raspberry Pi.
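
As one concrete instance of the pruning approach discussed above, magnitude-based unstructured pruning is available out of the box in PyTorch; the model and the 50% ratio below are placeholders:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small CNN standing in for a model destined for a Raspberry Pi.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

# Zero out the 50% smallest-magnitude weights in each conv/linear layer.
for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")        # make the pruning permanent

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"overall weight sparsity: {zeros / total:.1%}")
```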

Organizational Governance of Emerging Technologies: AI Adoption in Healthcare

  • Authors: Jee Young Kim, William Boag, Freya Gulamali, Alifia Hasan, Henry David Jeffry Hogg, Mark Lifson, Deirdre Mulligan, Manesh Patel, Inioluwa Deborah Raji, Ajai Sehgal, Keo Shaw, Danny Tobey, Alexandra Valladares, David Vidal, Suresh Balu, Mark Sendak
  • Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13081
  • Pdf link: https://arxiv.org/pdf/2304.13081
  • Abstract
    Private and public sector structures and norms refine how emerging technology is used in practice. In healthcare, despite a proliferation of AI adoption, the organizational governance surrounding its use and integration is often poorly understood. In this research, the Health AI Partnership (HAIP) aims to better define the requirements for adequate organizational governance of AI systems in healthcare settings and to support health system leaders in making more informed decisions around AI adoption. To work towards this understanding, we first identify how standards for AI adoption in healthcare may be designed to be used easily and efficiently. Then, we map out the precise decision points involved in the practical institutional adoption of AI technology within specific health systems. Practically, we achieve this through a multi-organizational collaboration with leaders from major health systems across the United States and key informants from related fields. Working with the consultancy IDEO.org, we conducted usability-testing sessions with healthcare and AI ethics professionals. The usability analysis revealed a prototype structured around mock key decision points that align with how organizational leaders approach technology adoption. Concurrently, we conducted semi-structured interviews with 89 professionals in healthcare and other relevant fields. Using a modified grounded theory approach, we identified 8 key decision points and comprehensive procedures throughout the AI adoption lifecycle. This is one of the most detailed qualitative analyses to date of the current governance structures and processes involved in AI adoption by health systems in the United States. We hope these findings can inform future efforts to build capabilities to promote the safe, effective, and responsible adoption of emerging technologies in healthcare.

Bridging graph data models: RDF, RDF-star, and property graphs as directed acyclic graphs

  • Authors: Ewout Gelling, George Fletcher, Michael Schmidt
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.13097
  • Pdf link: https://arxiv.org/pdf/2304.13097
  • Abstract
    Graph database users today face a choice between two technology stacks: the Resource Description Framework (RDF), on one side, is a data model with built-in semantics that was originally developed by the W3C to exchange interconnected data on the Web; on the other side, Labeled Property Graphs (LPGs) are geared towards efficient graph processing and have strong roots in developer and engineering communities. The two models look at graphs from different abstraction layers (triples in RDF vs. edges connecting vertices with inlined properties in LPGs), expose - at least at the surface - distinct features, come with different query languages, and are embedded into their own software ecosystems. In this short paper, we introduce a novel unifying graph data model called Statement Graphs, which combines the traits of both RDF and LPG and achieves interoperability at different levels: it (a) provides the ability to manage RDF and LPG data as a single, interconnected graph, (b) supports querying over the integrated graph using any RDF or LPG query language, while (c) clearing the way for graph stack independent data exchange mechanisms and formats. We formalize our new model as directed acyclic graphs and sketch a system of bidirectional mappings between RDF, LPGs, and Statement Graphs. Our mappings implicitly define read query semantics for RDF and LPGs query languages over the unified data model, thus providing graph users with the flexibility to use the query language of their choice for their graph use cases. As a proof of concept for our ideas, we also present the 1G Playground; an in-memory DBMS built on the concepts of Statement Graphs, which facilitates storage of both RDF and LPG data, and allows for cross-model querying using both SPARQL and Gremlin.

Exponentially Convergent Numerical Method for Abstract Cauchy Problem with Fractional Derivative of Caputo Type

  • Authors: Dmytro Sytnyk, Barbara Wohlmuth
  • Subjects: Numerical Analysis (math.NA); Mathematical Software (cs.MS); Analysis of PDEs (math.AP); Classical Analysis and ODEs (math.CA)
  • Arxiv link: https://arxiv.org/abs/2304.13099
  • Pdf link: https://arxiv.org/pdf/2304.13099
  • Abstract
    We present an exponentially convergent numerical method to approximate the solution of the Cauchy problem for the inhomogeneous fractional differential equation with an unbounded operator coefficient and Caputo fractional derivative in time. The numerical method is based on the newly obtained solution formula that consolidates the mild solution representations of sub-parabolic, parabolic and sub-hyperbolic equations with sectorial operator coefficient $A$ and non-zero initial data. The involved integral operators are approximated using the sinc-quadrature formulas that are tailored to the spectral parameters of $A$, fractional order $\alpha$ and the smoothness of the first initial condition, as well as to the properties of the equation's right-hand side $f(t)$. The resulting method possesses exponential convergence for positive sectorial $A$, any finite $t$, including $t = 0$, and the whole range $\alpha \in (0,2)$. It is suitable for a practically important case, when no knowledge of $f(t)$ is available outside the considered interval $t \in [0, T]$. The algorithm of the method is capable of multi-level parallelism. We provide numerical examples that confirm the theoretical error estimates.

Application of Transformers for Nonlinear Channel Compensation in Optical Systems

  • Authors: Behnam Behinaein Hamgini, Hossein Najafi, Ali Bakhshali, Zhuhong Zhang
  • Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.13119
  • Pdf link: https://arxiv.org/pdf/2304.13119
  • Abstract
    In this paper, we introduce a new nonlinear channel equalization method for coherent long-haul transmission based on Transformers. We show that, due to their capability to attend directly to the memory across a sequence of symbols, Transformers can be used effectively with a parallelized structure. We present an implementation of the encoder part of the Transformer for nonlinear equalization and analyze its performance over a wide range of hyper-parameters. It is shown that by processing blocks of symbols at each iteration and carefully selecting subsets of the encoder's output to be processed together, efficient nonlinear compensation can be achieved. We also propose the use of a physics-informed mask inspired by nonlinear perturbation theory to reduce the computational complexity of Transformer-based nonlinear equalization.
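
A hedged sketch of an encoder-only Transformer operating on blocks of received symbols (real/imaginary parts as two features); all sizes are illustrative, and the paper's block selection and physics-informed mask are not reproduced:

```python
import torch
import torch.nn as nn

class TransformerEqualizer(nn.Module):
    def __init__(self, d_model=64, nhead=4, layers=2):
        super().__init__()
        self.embed = nn.Linear(2, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(d_model, 2)     # estimated nonlinear distortion

    def forward(self, symbols):               # symbols: (batch, block_len, 2)
        return self.head(self.encoder(self.embed(symbols)))

eq = TransformerEqualizer()
rx = torch.randn(8, 128, 2)                   # a block of 128 received symbols
corrected = rx - eq(rx)                       # subtract estimated distortion
```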

Directed Chain Generative Adversarial Networks

  • Authors: Ming Min, Ruimeng Hu, Tomoyuki Ichiba
  • Subjects: Machine Learning (cs.LG); Probability (math.PR)
  • Arxiv link: https://arxiv.org/abs/2304.13131
  • Pdf link: https://arxiv.org/pdf/2304.13131
  • Abstract
    Real-world data can follow multimodal distributions, e.g., data describing opinion divergence in a community, the interspike interval distribution of neurons, and the natural frequencies of oscillators. Generating multimodally distributed real-world data has become a challenge for existing generative adversarial networks (GANs). For example, neural stochastic differential equations (Neural SDEs), treated as infinite-dimensional GANs, have demonstrated successful performance mainly in generating unimodal time series data. In this paper, we propose a novel time series generator, named directed chain GANs (DC-GANs), which inserts a time series dataset (called a neighborhood process of the directed chain, or input) into the drift and diffusion coefficients of the directed chain SDEs with distributional constraints. DC-GANs can generate new time series of the same distribution as the neighborhood process, and the neighborhood process provides the key step in learning and generating multimodal distributed time series. The proposed DC-GANs are examined on four datasets, including two stochastic models from social sciences and computational neuroscience, and two real-world datasets on stock prices and energy consumption. To the best of our knowledge, DC-GANs are the first work that can generate multimodal time series data, and they consistently outperform state-of-the-art benchmarks with respect to measures of distribution, data similarity, and predictive ability.

ESimCSE Unsupervised Contrastive Learning Jointly with UDA Semi-Supervised Learning for Large Label System Text Classification Mode

  • Authors: Ruan Lu, Zhou HangCheng, Ran Meng, Zhao Jin, Qin JiaoYu, Wei Feng, Wang ChenZi
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.13140
  • Pdf link: https://arxiv.org/pdf/2304.13140
  • Abstract
    Text classification with large label systems in natural language processing faces challenges including multiple label systems, uneven data distribution, and high noise. To address these problems, the ESimCSE unsupervised contrastive learning model and the UDA semi-supervised contrastive learning model are combined through joint training. ESimCSE efficiently learns text vector representations from unlabeled data to achieve better classification results, while UDA is trained on unlabeled data through semi-supervised learning to improve the prediction performance and stability of the model and further improve its generalization ability. In addition, the adversarial training techniques FGM and PGD are used during model training to improve the robustness and reliability of the model. The experimental results show an 8% and 10% accuracy improvement relative to the baseline on the public Ruesters dataset and on the operational dataset, respectively, and a 15% improvement in manual validation accuracy on the operational dataset, indicating that the method is effective.

LumiGAN: Unconditional Generation of Relightable 3D Human Faces

  • Authors: Boyang Deng, Yifan Wang, Gordon Wetzstein
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13153
  • Pdf link: https://arxiv.org/pdf/2304.13153
  • Abstract
    Unsupervised learning of 3D human faces from unstructured 2D image data is an active research area. While recent works have achieved an impressive level of photorealism, they commonly lack control of lighting, which prevents the generated assets from being deployed in novel environments. To this end, we introduce LumiGAN, an unconditional Generative Adversarial Network (GAN) for 3D human faces with a physically based lighting module that enables relighting under novel illumination at inference time. Unlike prior work, LumiGAN can create realistic shadow effects using an efficient visibility formulation that is learned in a self-supervised manner. LumiGAN generates plausible physical properties for relightable faces, including surface normals, diffuse albedo, and specular tint without any ground truth data. In addition to relightability, we demonstrate significantly improved geometry generation compared to state-of-the-art non-relightable 3D GANs and notably better photorealism than existing relightable GANs.

LEMaRT: Label-Efficient Masked Region Transform for Image Harmonization

  • Authors: Sheng Liu, Cong Phuoc Huynh, Cong Chen, Maxim Arap, Raffay Hamid
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13166
  • Pdf link: https://arxiv.org/pdf/2304.13166
  • Abstract
    We present a simple yet effective self-supervised pre-training method for image harmonization which can leverage large-scale unannotated image datasets. To achieve this goal, we first generate pre-training data online with our Label-Efficient Masked Region Transform (LEMaRT) pipeline. Given an image, LEMaRT generates a foreground mask and then applies a set of transformations to perturb various visual attributes, e.g., defocus blur, contrast, saturation, of the region specified by the generated mask. We then pre-train image harmonization models by recovering the original image from the perturbed image. Secondly, we introduce an image harmonization model, namely SwinIH, by retrofitting the Swin Transformer [27] with a combination of local and global self-attention mechanisms. Pre-training SwinIH with LEMaRT results in a new state of the art for image harmonization, while being label-efficient, i.e., consuming less annotated data for fine-tuning than existing methods. Notably, on iHarmony4 dataset [8], SwinIH outperforms the state of the art, i.e., SCS-Co [16] by a margin of 0.4 dB when it is fine-tuned on only 50% of the training data, and by 1.0 dB when it is trained on the full training dataset.

Connector 0.5: A unified framework for graph representation learning

  • Authors: Thanh Sang Nguyen, Jooho Lee, Van Thuy Hoang, O-Joun Lee
  • Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.13195
  • Pdf link: https://arxiv.org/pdf/2304.13195
  • Abstract
    Graph representation learning models aim to represent the graph structure and its features as low-dimensional vectors in a latent space, which can benefit various downstream tasks, such as node classification and link prediction. Due to their powerful graph data modelling capabilities, various graph embedding models and libraries have been proposed to learn embeddings and help researchers conduct experiments more easily. In this paper, we introduce a novel graph representation framework covering various graph embedding models, ranging from shallow to state-of-the-art models, namely Connector. First, we consider graph generation by constructing various types of graphs with different structural relations, including homogeneous, signed, heterogeneous, and knowledge graphs. Second, we introduce various graph representation learning models, ranging from shallow to deep graph embedding models. Finally, we plan to build an efficient open-source framework that can provide deep graph embedding models to represent structural relations in graphs. The framework is available at https://github.com/NSLab-CUK/Connector.

Numerical methods for computing the discrete and continuous Laplace transforms

  • Authors: Yupeng Zhang, Yueyang Shen, Rongqian Zhang, Yuyao Liu, Yunjie Guo, Daxuan Deng, Ivo D. Dinov
  • Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph); Complex Variables (math.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13204
  • Pdf link: https://arxiv.org/pdf/2304.13204
  • Abstract
    We propose a numerical method to spline-interpolate discrete signals and then apply the integral transforms to the corresponding analytical spline functions. This represents a robust and computationally efficient technique for estimating the Laplace transform for noisy data. We revisited a Meijer-G symbolic approach to compute the Laplace transform and alternative approaches to extend canonical observed time-series. A discrete quantization scheme provides the foundation for rapid and reliable estimation of the inverse Laplace transform. We derive theoretic estimates for the inverse Laplace transform of analytic functions and demonstrate empirical results validating the algorithmic performance using observed and simulated data. We also introduce a generalization of the Laplace transform in higher dimensional space-time. We tested the discrete LT algorithm on data sampled from analytic functions with known exact Laplace transforms. The validation of the discrete ILT involves using complex functions with known analytic ILTs.
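
The spline-then-transform idea can be sketched directly with SciPy; this is a generic quadrature version under the assumption that the signal is only integrated over the observed interval, not the authors' exact estimator:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.integrate import quad

def laplace_of_samples(t, y, s):
    """Estimate L{f}(s) = int_0^inf f(tau) e^{-s tau} d tau by cubic-spline
    interpolating the samples and integrating over the observed interval."""
    spline = CubicSpline(t, y)
    value, _err = quad(lambda tau: spline(tau) * np.exp(-s * tau), t[0], t[-1])
    return value

# Sanity check on f(t) = e^{-t}, whose transform is 1/(s + 1):
t = np.linspace(0.0, 20.0, 400)
print(laplace_of_samples(t, np.exp(-t), s=2.0))   # ~ 1/3
```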

Cooperative Hierarchical Deep Reinforcement Learning based Joint Sleep, Power, and RIS Control for Energy-Efficient HetNet

  • Authors: Hao Zhou, Medhat Elsayed, Majid Bavand, Raimundas Gaigalas, Steve Furr, Melike Erol-Kantarci
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.13226
  • Pdf link: https://arxiv.org/pdf/2304.13226
  • Abstract
    Energy efficiency (EE) is one of the most important metrics for 5G and future 6G networks to reduce energy costs and control the carbon footprint. Sleep control, as a cost-efficient approach, can significantly lower power consumption by selectively switching off network devices. Meanwhile, the reconfigurable intelligent surface (RIS) has emerged as a promising technique to enhance the EE of 5G-and-beyond and 6G networks. In this work, we jointly consider sleep and transmission power control for RIS-aided energy-efficient heterogeneous networks (HetNets). In particular, we first propose a fractional programming (FP) method for RIS phase-shift control, which aims to maximize the sum-rate under given transmission power levels. Then, considering the timescale difference between sleep control and power control, we introduce a cooperative hierarchical deep reinforcement learning (Co-HDRL) algorithm, including a cross-entropy-enabled meta-controller for sleep control and correlated-equilibrium-based sub-controllers for power control. Moreover, we propose a surrogate optimization method as one baseline for RIS control, and conventional HDRL as another baseline for sleep and power control. Finally, simulations show that RIS-assisted sleep control can achieve more than 16% lower energy consumption and 30% higher energy efficiency than the baseline algorithms.

Generating Adversarial Examples with Task Oriented Multi-Objective Optimization

  • Authors: Anh Bui, Trung Le, He Zhao, Quan Tran, Paul Montague, Dinh Phung
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13229
  • Pdf link: https://arxiv.org/pdf/2304.13229
  • Abstract
    Deep learning models, even state-of-the-art ones, are highly vulnerable to adversarial examples. Adversarial training is one of the most efficient methods for improving a model's robustness. The key factor for the success of adversarial training is the capability to generate qualified and divergent adversarial examples that satisfy some objectives/goals (e.g., finding adversarial examples that maximize the model losses for simultaneously attacking multiple models). Therefore, multi-objective optimization (MOO) is a natural tool for adversarial example generation to achieve multiple objectives/goals simultaneously. However, we observe that a naive application of MOO tends to maximize all objectives/goals equally, without caring whether an objective/goal has been achieved yet. This leads to wasted effort on further improving the goal-achieved tasks, while putting less focus on the goal-unachieved tasks. In this paper, we propose \emph{Task Oriented MOO} to address this issue, in the context where we can explicitly define goal achievement for a task. Our principle is to only maintain the goal-achieved tasks, while letting the optimizer spend more effort on improving the goal-unachieved tasks. We conduct comprehensive experiments on our Task Oriented MOO for various adversarial example generation schemes. The experimental results firmly demonstrate the merit of our proposed approach. Our code is available at \url{https://github.com/tuananhbui89/TAMOO}.

Numerical Approximation of Andrews Plots with Optimal Spatial-Spectral Smoothing

  • Authors: Mitchell Rimerman, Nate Strawn
  • Subjects: Numerical Analysis (math.NA); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.13239
  • Pdf link: https://arxiv.org/pdf/2304.13239
  • Abstract
    Andrews plots provide aesthetically pleasing visualizations of high-dimensional datasets. This work proves that Andrews plots (when defined in terms of the principal component scores of a dataset) are "optimally smooth" on average, and solve an infinite-dimensional quadratic minimization program over the set of linear isometries from the Euclidean data space to $L^2([0,1])$. By building technical machinery that characterizes the solutions to general infinite-dimensional quadratic minimization programs over linear isometries, we further show that the solution set is (in the generic case) a manifold. To avoid the ambiguities presented by this manifold of solutions, we add "spectral smoothing" terms to the infinite-dimensional optimization program to induce Andrews plots with optimal spatial-spectral smoothing. We characterize the (generic) set of solutions to this program and prove that the resulting plots admit efficient numerical approximations. These spatial-spectral smooth Andrews plots tend to avoid some of the "visual clutter" that arises from the oscillation of trigonometric polynomials.
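
Andrews plots themselves are simple to compute; the sketch below builds the classical curves from principal component scores obtained via an SVD (plotting is omitted, and the smoothing-optimal variants of the paper are not reproduced):

```python
import numpy as np

def andrews_curve(scores, t):
    """f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ..."""
    f = np.full_like(t, scores[0] / np.sqrt(2.0))
    for j, x in enumerate(scores[1:], start=1):
        k = (j + 1) // 2
        f += x * (np.sin(k * t) if j % 2 == 1 else np.cos(k * t))
    return f

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
U, S, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
scores = U * S                            # principal component scores
t = np.linspace(0.0, 2.0 * np.pi, 200)
curves = np.array([andrews_curve(row, t) for row in scores])   # (20, 200)
```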

Structure Diagram Recognition in Financial Announcements

  • Authors: Meixuan Qiao, Jun Wang, Junfu Xiang, Qiyu Hou, Ruixuan Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13240
  • Pdf link: https://arxiv.org/pdf/2304.13240
  • Abstract
    Accurately extracting structured data from structure diagrams in financial announcements is of great practical importance for building financial knowledge graphs and further improving the efficiency of various financial applications. First, we propose a new method for recognizing structure diagrams in financial announcements, which can better detect and extract different types of connecting lines, including straight lines, curves, and polylines of different orientations and angles. Second, we develop a two-stage method to efficiently generate the industry's first benchmark of structure diagrams from Chinese financial announcements: a large number of diagrams are synthesized and annotated using an automated tool to train a preliminary recognition model with fairly good performance, and a high-quality benchmark is then obtained by automatically annotating real-world structure diagrams with the preliminary model and making a few manual corrections. Finally, we experimentally verify the significant performance advantage of our structure diagram recognition method over previous methods.

ESCM: An Efficient and Secure Communication Mechanism for UAV Networks

  • Authors: Haoxiang Luo, Yifan Wu, Gang Sun, Hongfang Yu, Shizhong Xu, Mohsen Guizani
  • Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.13244
  • Pdf link: https://arxiv.org/pdf/2304.13244
  • Abstract
    UAVs (unmanned aerial vehicles) are gradually entering various human activities. They have also become an important part of the satellite-air-ground-sea integrated network (SAGS) for 6G communication. To achieve high mobility, UAVs have strict requirements on communication latency, and they must not be illegally controlled and used as weapons of attack with malicious intent. Therefore, an efficient and secure communication method specifically designed for UAV networks is required. This paper proposes a communication mechanism named ESCM to meet the above requirements. For communication efficiency, ESCM designs a routing protocol based on the artificial bee colony (ABC) algorithm for UAV networks to accelerate communication between UAVs. Meanwhile, we use blockchain to guarantee the communication security of UAV networks. However, blockchain has unstable links in high-mobility network scenarios, resulting in low consensus efficiency and high communication overhead. Therefore, ESCM also introduces the concept of the digital twin, mapping UAVs from the physical world into cyberspace and transforming the UAV network into a static network; this virtual UAV network is called CyberUAV. Then, in CyberUAV, we design a blockchain system and propose a consensus algorithm based on network coding, named proof of network coding (PoNC). PoNC not only ensures the security of ESCM but also further improves its performance through network coding. Simulation results show that ESCM has obvious advantages in communication efficiency and security. Moreover, encoding messages through PoNC consensus can increase network throughput, and making the mobile blockchain static through the digital twin can improve the consensus success rate.

CrowdCache: A Decentralized Game-Theoretic Framework for Mobile Edge Content Sharing

  • Authors: Duong Thuy Anh Nguyen, Jiaming Cheng, Duong Tung Nguyen, Angelia Nedich
  • Subjects: Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2304.13246
  • Pdf link: https://arxiv.org/pdf/2304.13246
  • Abstract
    Mobile edge computing (MEC) is a promising solution for enhancing the user experience, minimizing content delivery expenses, and reducing backhaul traffic. In this paper, we propose a novel privacy-preserving decentralized game-theoretic framework for resource crowdsourcing in MEC. Our framework models the interactions between a content provider (CP) and multiple mobile edge device users (MEDs) as a non-cooperative game, in which MEDs offer idle storage resources for content caching in exchange for rewards. We introduce efficient decentralized gradient play algorithms for Nash equilibrium (NE) computation by exchanging local information among neighboring MEDs only, thus preventing attackers from learning users' private information. The key challenge in designing such algorithms is that communication among MEDs is not fixed and is facilitated by a sequence of undirected time-varying graphs. Our approach achieves linear convergence to the NE without imposing any assumptions on the values of parameters in the local objective functions, such as requiring strong monotonicity to be stronger than its dependence on other MEDs' actions, which is commonly required in existing literature when the graph is directed time-varying. Extensive simulations demonstrate the effectiveness of our approach in achieving efficient resource outsourcing decisions while preserving the privacy of the edge devices.

Solution of planar elastic stress problems using stress basis functions

  • Authors: Sankalp Tiwari, Anindya Chatterjee
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13251
  • Pdf link: https://arxiv.org/pdf/2304.13251
  • Abstract
    The use of global displacement basis functions to solve boundary-value problems in linear elasticity is well established. No prior work uses a global stress tensor basis for such solutions. We present two such methods for solving stress problems in linear elasticity. In both methods, we split the sought stress $\sigma$ into two parts, where neither part is required to satisfy strain compatibility. The first part, $\sigma_p$, is any stress in equilibrium with the loading. The second part, $\sigma_h$, is a self-equilibrated stress field on the unloaded body. In both methods, $\sigma_h$ is expanded using tensor-valued global stress basis functions developed elsewhere. In the first method, the coefficients in the expansion are found by minimizing the strain energy based on the well-known complementary energy principle. For the second method, which is restricted to planar homogeneous isotropic bodies, we show that we merely need to minimize the squared $L^2$ norm of the trace of stress. For demonstration, we solve eight stress problems involving sharp corners, multiple-connectedness, non-zero net force and/or moment on an internal hole, body force, discontinuous surface traction, material inhomogeneity, and anisotropy. The first method presents a new application of a known principle. The second method presents a hitherto unreported principle, to the best of our knowledge.

C2PI: An Efficient Crypto-Clear Two-Party Neural Network Private Inference

  • Authors: Yuke Zhang, Dake Chen, Souvik Kundu, Haomei Liu, Ruiheng Peng, Peter A. Beerel
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.13266
  • Pdf link: https://arxiv.org/pdf/2304.13266
  • Abstract
    Recently, private inference (PI) has addressed the rising concern over data and model privacy in machine learning inference as a service. However, existing PI frameworks suffer from high computational and communication costs due to the expensive multi-party computation (MPC) protocols. Existing literature has developed lighter MPC protocols to yield more efficient PI schemes. We, in contrast, propose to lighten them by introducing an empirically-defined privacy evaluation. To that end, we reformulate the threat model of PI and use inference data privacy attacks (IDPAs) to evaluate data privacy. We then present an enhanced IDPA, named distillation-based inverse-network attack (DINA), for improved privacy evaluation. Finally, we leverage the findings from DINA and propose C2PI, a two-party PI framework presenting an efficient partitioning of the neural network model and requiring only the initial few layers to be performed with MPC protocols. Based on our experimental evaluations, by relaxing the formal data privacy guarantees, C2PI can speed up existing PI frameworks, including Delphi [1] and Cheetah [2], by up to 2.89x and 3.88x under LAN and WAN settings, respectively, and save up to 2.75x in communication costs.

Making Models Shallow Again: Jointly Learning to Reduce Non-Linearity and Depth for Latency-Efficient Private Inference

  • Authors: Souvik Kundu, Yuke Zhang, Dake Chen, Peter A. Beerel
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.13274
  • Pdf link: https://arxiv.org/pdf/2304.13274
  • Abstract
    The large number of ReLU and MAC operations in deep neural networks makes them ill-suited for latency- and compute-efficient private inference. In this paper, we present a model optimization method that allows a model to learn to be shallow. In particular, we leverage the ReLU sensitivity of a convolutional block to remove a ReLU layer and merge its succeeding and preceding convolution layers into a shallow block. Unlike existing ReLU reduction methods, our joint reduction method can yield models with improved reduction of both ReLUs and linear operations, by up to 1.73x and 1.47x respectively, evaluated with ResNet18 on CIFAR-100 without any significant accuracy drop.
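
The merge step is a linear-algebra identity once the intervening ReLU is gone; the sketch below shows it for fully connected layers for brevity (the paper merges convolution layers, which requires the analogous kernel-composition bookkeeping):

```python
import torch
import torch.nn as nn

def merge_linear(first: nn.Linear, second: nn.Linear) -> nn.Linear:
    """Fold two consecutive linear layers (no nonlinearity in between):
    y = W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)."""
    merged = nn.Linear(first.in_features, second.out_features)
    with torch.no_grad():
        merged.weight.copy_(second.weight @ first.weight)
        merged.bias.copy_(second.weight @ first.bias + second.bias)
    return merged

f, g = nn.Linear(8, 16), nn.Linear(16, 4)
x = torch.randn(2, 8)
assert torch.allclose(g(f(x)), merge_linear(f, g)(x), atol=1e-5)
```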

Membrane Potential Distribution Adjustment and Parametric Surrogate Gradient in Spiking Neural Networks

  • Authors: Siqi Wang, Tee Hiang Cheng, Meng-Hiot Lim
  • Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.13289
  • Pdf link: https://arxiv.org/pdf/2304.13289
  • Abstract
    As an emerging network model, spiking neural networks (SNNs) have attracted significant research attention in recent years. However, the energy-efficient binary spikes do not pair well with gradient-descent-based training approaches. The surrogate gradient (SG) strategy has been investigated and applied to circumvent this issue and train SNNs from scratch. Due to the lack of a well-recognized SG selection rule, most SGs are chosen intuitively. We propose the parametric surrogate gradient (PSG) method to iteratively update the SG and eventually determine an optimal surrogate gradient parameter, which calibrates the shape of candidate SGs. In SNNs, the neural potential distribution tends to deviate unpredictably due to quantization error. We evaluate this potential shift and propose a methodology for potential distribution adjustment (PDA) to minimize the loss of undesired pre-activations. Experimental results demonstrate that the proposed methods can be readily integrated with the backpropagation-through-time (BPTT) algorithm and help the modulated SNNs achieve state-of-the-art performance on both static and dynamic datasets with fewer timesteps.
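
A minimal sketch of a surrogate gradient in PyTorch: the forward pass emits a hard spike, while the backward pass substitutes a sigmoid-shaped derivative whose slope parameter plays the role that PSG calibrates (here it is fixed rather than learned, so this is an illustration, not the paper's PSG/PDA method):

```python
import torch

class SpikeWithSurrogate(torch.autograd.Function):
    @staticmethod
    def forward(ctx, membrane_potential, alpha):
        ctx.save_for_backward(membrane_potential)
        ctx.alpha = alpha
        return (membrane_potential > 0).float()    # hard Heaviside spike

    @staticmethod
    def backward(ctx, grad_output):
        (u,) = ctx.saved_tensors
        sig = torch.sigmoid(ctx.alpha * u)
        surrogate = ctx.alpha * sig * (1.0 - sig)  # d/du sigmoid(alpha * u)
        return grad_output * surrogate, None       # no gradient w.r.t. alpha

u = torch.randn(4, requires_grad=True)
spikes = SpikeWithSurrogate.apply(u, 4.0)
spikes.sum().backward()
print(u.grad)   # smooth surrogate gradients despite the binary forward pass
```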

Scene Graph Lossless Compression with Adaptive Prediction for Objects and Relations

  • Authors: Yufeng Zhang, Weiyao Lin, Wenrui Dai, Huabin Liu, Hongkai Xiong
  • Subjects: Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2304.13359
  • Pdf link: https://arxiv.org/pdf/2304.13359
  • Abstract
    The scene graph is a new data structure describing objects and their pairwise relationships within image scenes. As the size of scene graphs in vision applications grows, how to losslessly and efficiently store such data on disk or transmit it over the network becomes an unavoidable problem. However, the compression of scene graphs has seldom been studied because of their complicated data structures and distributions. Existing solutions usually involve general-purpose compressors or graph structure compression methods, which are weak at reducing redundancy in scene graph data. This paper introduces a new lossless compression framework with adaptive predictors for the joint compression of objects and relations in scene graph data. The proposed framework consists of a unified prior extractor and specialized element predictors to adapt to different data elements. Furthermore, to exploit the context information within and between graph elements, Graph Context Convolution is proposed to support different graph context modeling schemes for different graph elements. Finally, a learned distribution model is devised to predict numerical data under complicated conditional constraints. Experiments conducted on labeled and generated scene graphs prove the effectiveness of the proposed framework in the scene graph lossless compression task.

Efficient Explainable Face Verification based on Similarity Score Argument Backpropagation

  • Authors: Marco Huber, Anh Thi Luu, Philipp Terhörst, Naser Damer
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13409
  • Pdf link: https://arxiv.org/pdf/2304.13409
  • Abstract
    Explainable Face Recognition is gaining growing attention as the technology gains ground in security-critical applications. Understanding why two face images are matched or not matched by a given face recognition system is important to operators, users, and developers to increase trust and accountability, develop better systems, and highlight unfair behavior. In this work, we propose xSSAB, an approach to back-propagate similarity-score-based arguments that support or oppose the face matching decision, visualizing spatial maps that indicate similar and dissimilar areas as interpreted by the underlying FR model. Furthermore, we present Patch-LFW, a new explainable face verification benchmark that enables, along with a novel evaluation protocol, the first quantitative evaluation of the validity of similarity and dissimilarity maps in explainable face recognition approaches. We compare our efficient approach to state-of-the-art approaches, demonstrating a superior trade-off between efficiency and performance. The code as well as the proposed Patch-LFW is publicly available at: https://github.com/marcohuber/xSSAB.

Fair Selection of Edge Nodes to Participate in Clustered Federated Multitask Learning

  • Authors: Abdullatif Albaseer, Mohamed Abdallah, Ala Al-Fuqaha, Abegaz Mohammed, Aiman Erbad, Octavia A. Dobre
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.13423
  • Pdf link: https://arxiv.org/pdf/2304.13423
  • Abstract
    Clustered federated multitask learning is introduced as an efficient technique when data is unbalanced and distributed among clients in a non-independent and identically distributed manner. While a similarity metric can provide client groups with specialized models according to their data distribution, this process can be time-consuming because the server needs to capture the full data distribution from all clients first to perform the correct clustering. Due to resource and time constraints at the network edge, only a fraction of devices is selected every round, necessitating an efficient scheduling technique to address these issues. Thus, this paper introduces a two-phased client selection and scheduling approach to improve the convergence speed while capturing all data distributions. This approach ensures correct clustering and fairness between clients by leveraging bandwidth reuse for participants that spent a longer time training their models and by exploiting the heterogeneity in the devices to schedule the participants according to their delay. The server then performs the clustering depending on predetermined thresholds and stopping criteria. When a specified cluster approaches a stopping point, the server employs a greedy selection for that cluster by picking the devices with lower delay and better resources. A convergence analysis is provided, showing the relationship between the proposed scheduling approach and the convergence rate of the specialized models, to obtain convergence bounds under non-i.i.d. data distribution. We carry out extensive simulations, and the results demonstrate that the proposed algorithms reduce training time and improve the convergence speed while equipping every user with a customized model tailored to its data distribution.

FLEX: an Adaptive Exploration Algorithm for Nonlinear Systems

  • Authors: Matthieu Blanke, Marc Lelarge
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.13426
  • Pdf link: https://arxiv.org/pdf/2304.13426
  • Abstract
    Model-based reinforcement learning is a powerful tool, but collecting data to fit an accurate model of the system can be costly. Exploring an unknown environment in a sample-efficient manner is hence of great importance. However, the complexity of dynamics and the computational limitations of real systems make this task challenging. In this work, we introduce FLEX, an exploration algorithm for nonlinear dynamics based on optimal experimental design. Our policy maximizes the information of the next step and results in an adaptive exploration algorithm, compatible with generic parametric learning models and requiring minimal resources. We test our method on a number of nonlinear environments covering different settings, including time-varying dynamics. Keeping in mind that exploration is intended to serve an exploitation objective, we also test our algorithm on downstream model-based classical control tasks and compare it to other state-of-the-art model-based and model-free approaches. The performance achieved by FLEX is competitive and its computational cost is low.

An efficient multiple harmonic balance method for computing quasi-periodic responses of nonlinear systems

  • Authors: Qisi Wang, Zipu Yan, Honghua Dai
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13446
  • Pdf link: https://arxiv.org/pdf/2304.13446
  • Abstract
    Quasi-periodic responses composed of multiple base frequencies widely exist in science and engineering problems. The multiple harmonic balance (MHB) method is one of the most commonly used approaches for such problems. However, it is limited to low-order estimations in practice because of complex symbolic operations. Many variants have been developed to improve the MHB method, among which the time-domain MHB-like methods are regarded as crucial improvements because of their high efficiency and simple derivation. But one main drawback remains to be addressed: the time-domain MHB-like methods suffer from non-physical solutions, which have been shown to be caused by aliasing (mixtures of the high-order into the low-order harmonics). Inspired by the collocation-based harmonic balancing framework recently established by our group, we herein propose a reconstruction multiple harmonic balance (RMHB) method to reconstruct the conventional MHB method using discrete time-domain collocations. Our study shows that the relation between the MHB and time-domain MHB-like methods is determined by an aliasing matrix, which is non-zero when aliasing occurs. On this basis, a conditional equivalence is established to form the RMHB method. Three numerical examples demonstrate that this new method is more robust and efficient than state-of-the-art methods.
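
For reference, the standard multiple-harmonic ansatz behind MHB-type methods for a quasi-periodic response with two base frequencies can be written as follows (the truncation set is illustrative):

```latex
x(t) \approx \sum_{|j| + |k| \le N}
  \Big[ a_{jk} \cos\!\big((j\omega_1 + k\omega_2)\,t\big)
      + b_{jk} \sin\!\big((j\omega_1 + k\omega_2)\,t\big) \Big]
```

with the coefficients $a_{jk}, b_{jk}$ determined by balancing harmonics of the governing equations; aliasing arises when high-order combination frequencies fold onto low-order ones under discrete sampling.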

An Improved Modular Addition Checksum Algorithm

  • Authors: Philip Koopman
  • Subjects: Data Structures and Algorithms (cs.DS); Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.13496
  • Pdf link: https://arxiv.org/pdf/2304.13496
  • Abstract
    This paper introduces a checksum algorithm that provides a new point in the performance/complexity/effectiveness checksum tradeoff space. It has better fault detection properties than single-sum and dual-sum modular addition checksums. It is also simpler to compute efficiently than a cyclic redundancy check (CRC) due to exploiting commonly available hardware and programming language support for unsigned integer division. The key idea is to compute a single running sum, but introduce a left shift by the size (in bits) of the modulus before performing the modular reduction after each addition step. This approach provides a Hamming Distance of 3 for longer data word lengths than dual-sum approaches such as the Fletcher checksum. Moreover, it provides this capability using a single running sum that is only twice the size of the final computed check value, while providing fault detection capabilities even better than large-block variants of dual-sum approaches that require larger division operations.
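
The abstract's description is concrete enough to transcribe directly; the sketch below is a literal reading of it (the modulus and word size are illustrative placeholders, not the values analyzed in the paper):

```python
def shift_then_reduce_checksum(data_words, modulus=0xFFFB, word_bits=16):
    """Single running sum; left-shift by the modulus size in bits, add the
    next word, then reduce modulo the modulus -- after every addition step."""
    shift = modulus.bit_length()
    checksum = 0
    for word in data_words:
        assert 0 <= word < (1 << word_bits)
        checksum = ((checksum << shift) + word) % modulus
    return checksum

print(hex(shift_then_reduce_checksum([0x1234, 0xABCD, 0x0042])))
```

Because the sum is reduced after each step, the running value before reduction never exceeds roughly twice the width of the final check value, matching the storage claim in the abstract.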

Leveraging Compositional Methods for Modeling and Verification of an Autonomous Taxi System

  • Authors: Alessandro Pinto, Anthony Corso, Edward Schmerling
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.13517
  • Pdf link: https://arxiv.org/pdf/2304.13517
  • Abstract
    We apply a compositional formal modeling and verification method to an autonomous aircraft taxi system. We provide insights into the modeling approach and we identify several research areas where further development is needed. Specifically, we identify the following needs: (1) semantics of composition of viewpoints expressed in different specification languages, and tools to reason about heterogeneous declarative models; (2) libraries of formal models for autonomous systems to speed up modeling and enable efficient reasoning; (3) methods to lift verification results generated by automated reasoning tools to the specification level; (4) probabilistic contract frameworks to reason about imperfect implementations; (5) standard high-level functional architectures for autonomous systems; and (6) a theory of higher-order contracts. We believe that addressing these research needs, among others, could improve the adoption of formal methods in the design of autonomous systems including learning-enabled systems, and increase confidence in their safe operations.

Konzeption und Umsetzung einer mobilen Applikation zur Validierung von fälschungssicheren Produktlabeln (Design and Implementation of a Mobile Application for Validating Counterfeit-Proof Product Labels)

  • Authors: Oliver Linne
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.13519
  • Pdf link: https://arxiv.org/pdf/2304.13519
  • Abstract
    Due to increasing numbers of product piracy worldwide, a cost-effective method for verifying the origin of a product is to be developed. For this purpose, a certificate of authenticity can be created using precisely measurable, unique properties of special physical objects that are difficult to reconstruct. In the context of the present work, this is a counterfeit-proof label composed of randomly distributed gold nanospheres or rods in a semi-transparent material. The characteristic positioning of the label's elements can be precisely measured using a smartphone's camera and additional technologies. This can create an offline usable verification method for the general public without the need for an existing network connection. The present work provides a first part of the proof of concept that such a system, and especially the associated algorithmic computation method, can be implemented and efficiently used in a mobile application. In addition, a method suitable in practice for transmitting and securing the required information is determined in each case. Furthermore, the results of the validation of counterfeit-proof product labels are analyzed in detail and existing weaknesses are pointed out.

Integrated Architecture for Neural Networks and Security Primitives using RRAM Crossbar

  • Authors: Simranjeet Singh, Furqan Zahoor, Gokulnath Rajendran, Vikas Rana, Sachin Patkar, Anupam Chattopadhyay, Farhad Merchant
  • Subjects: Emerging Technologies (cs.ET)
  • Arxiv link: https://arxiv.org/abs/2304.13531
  • Pdf link: https://arxiv.org/pdf/2304.13531
  • Abstract
    This paper proposes an architecture that integrates neural networks (NNs) and hardware security modules using a single resistive random access memory (RRAM) crossbar. The proposed architecture enables using a single crossbar to implement NN, true random number generator (TRNG), and physical unclonable function (PUF) applications while exploiting the multi-state storage characteristic of the RRAM crossbar for the vector-matrix multiplication operation required for the implementation of NN. The TRNG is implemented by utilizing the crossbar's variation in device switching thresholds to generate random bits. The PUF is implemented using the same crossbar initialized as an entropy source for the TRNG. Additionally, the weights locking concept is introduced to enhance the security of NNs by preventing unauthorized access to the NN weights. The proposed architecture provides flexibility to configure the RRAM device in multiple modes to suit different applications. It shows promise in achieving a more efficient and compact design for the hardware implementation of NNs and security primitives.

A Two-Step Rule for Backpropagation

  • Authors: Ahmed Boughammoura
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.13537
  • Pdf link: https://arxiv.org/pdf/2304.13537
  • Abstract
    We present a simplified computational rule for the back-propagation formulas for artificial neural networks. In this work, we provide a generic two-step rule for the back-propagation algorithm in matrix notation. Moreover, this rule incorporates both the forward and backward phases of the computations involved in the learning process. Specifically, this recursive computing rule permits the propagation of the changes to all synaptic weights in the network, layer by layer, efficiently. In particular, we use this rule to compute both the up and down partial derivatives of the cost function of all the connections feeding into the output layer.
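
As a point of reference, here is a generic matrix-form forward/backward pass for a sigmoid MLP with squared-error cost in NumPy. It illustrates layer-by-layer propagation of weight gradients in matrix notation; it is a standard textbook formulation, not the paper's specific two-step rule.

```python
import numpy as np

def mlp_forward_backward(x, y, Ws):
    """Generic matrix-form backpropagation for a sigmoid MLP with
    squared-error cost. Ws is a list of weight matrices, layer l having
    shape (n_out, n_in). Returns one gradient matrix per layer."""
    sigma = lambda z: 1.0 / (1.0 + np.exp(-z))
    # Forward phase: cache activations layer by layer.
    a, acts = x, [x]
    for W in Ws:
        a = sigma(W @ a)
        acts.append(a)
    # Backward phase: propagate delta = dC/dz from the output layer down.
    delta = (acts[-1] - y) * acts[-1] * (1 - acts[-1])
    grads = [None] * len(Ws)
    for l in range(len(Ws) - 1, -1, -1):
        grads[l] = np.outer(delta, acts[l])          # dC/dW_l
        if l > 0:
            # Chain rule through W_l and the sigmoid of the layer below.
            delta = (Ws[l].T @ delta) * acts[l] * (1 - acts[l])
    return grads
```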

ElegansNet: a brief scientific report and initial experiments

  • Authors: Francesco Bardozzo, Andrea Terlizzi, Pietro Liò, Roberto Tagliaferri
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13538
  • Pdf link: https://arxiv.org/pdf/2304.13538
  • Abstract
    This research report introduces ElegansNet, a neural network that mimics real-world neuronal network circuitry, with the goal of better understanding the interplay between connectome topology and deep learning systems. The proposed approach utilizes the powerful representational capabilities of living beings' neuronal circuitry to design and generate improved deep learning systems with a topology similar to natural networks. The Caenorhabditis elegans connectome is used as a reference due to its completeness, reasonable size, and functional neuron class annotations. It is demonstrated that the connectome of simple organisms exhibits specific functional relationships between neurons, and once transformed into learnable tensor networks and integrated into modern architectures, it offers bio-plausible structures that efficiently solve complex tasks. The performance of the models is demonstrated against randomly wired networks and compared to artificial networks ranked on global benchmarks. In the first case, ElegansNet outperforms randomly wired networks. Interestingly, ElegansNet models show comparable performance only to those based on the Watts-Strogatz small-world property. When compared to state-of-the-art artificial neural networks, such as transformers or attention-based autoencoders, ElegansNet outperforms well-known deep learning and traditional models in both supervised image classification tasks and unsupervised handwritten digit reconstruction, achieving top-1 accuracy of 99.99% on Cifar10 and 99.84% on MNIST Unsup on the validation sets.

D-STACK: High Throughput DNN Inference by Effective Multiplexing and Spatio-Temporal Scheduling of GPUs

  • Authors: Aditya Dhakal, Sameer G. Kulkarni, K. K. Ramakrishnan
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.13541
  • Pdf link: https://arxiv.org/pdf/2304.13541
  • Abstract
    Hardware accelerators such as GPUs are required for real-time, low-latency inference with Deep Neural Networks (DNN). However, due to the inherent limits to the parallelism they can exploit, DNNs often under-utilize the capacity of today's high-end accelerators. Although spatial multiplexing of the GPU leads to higher GPU utilization and higher inference throughput, there remain a number of challenges. Finding the GPU percentage for right-sizing the GPU for each DNN through profiling, determining an optimal batching of requests to balance throughput improvement while meeting application-specific deadlines and service level objectives (SLOs), and maximizing throughput by appropriately scheduling DNNs are still significant challenges. This paper introduces a dynamic and fair spatio-temporal scheduler (D-STACK) that enables multiple DNNs to run in the GPU concurrently. To help allocate the appropriate GPU percentage (we call it the "Knee"), we develop and validate a model that estimates the parallelism each DNN can utilize. We also develop a lightweight optimization formulation to find an efficient batch size for each DNN operating with D-STACK. We bring together our optimizations and our spatio-temporal scheduler to provide a holistic inference framework. We demonstrate its ability to provide high throughput while meeting application SLOs. We compare D-STACK with an ideal scheduler that can allocate the right GPU percentage for every DNN kernel. D-STACK achieves more than 90 percent of the throughput and GPU utilization of this ideal scheduler. We also compare D-STACK with other GPU multiplexing and scheduling methods (e.g., NVIDIA Triton, Clipper, Nexus), using popular DNN models. Our controlled experiments with multiplexing several popular DNN models achieve up to 1.6X improvement in GPU utilization and up to 4X improvement in inference throughput.
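
A minimal sketch of how such a "Knee" could be read off a profiled throughput curve follows, assuming one interprets it as the smallest GPU percentage reaching a fixed fraction of peak throughput; the 0.95 threshold is an assumption, not a value from the paper.

```python
def find_knee(gpu_pcts, throughputs, threshold=0.95):
    """Return the smallest profiled GPU percentage whose throughput
    reaches `threshold` times the peak throughput of this DNN."""
    peak = max(throughputs)
    for pct, thr in sorted(zip(gpu_pcts, throughputs)):
        if thr >= threshold * peak:
            return pct
    return 100  # fall back to the whole GPU
```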

On the Order of Power Series and the Sum of Square Roots Problem

  • Authors: Louis Gaillard, Gorav Jindal
  • Subjects: Computational Complexity (cs.CC); Symbolic Computation (cs.SC)
  • Arxiv link: https://arxiv.org/abs/2304.13605
  • Pdf link: https://arxiv.org/pdf/2304.13605
  • Abstract
    This paper focuses on the study of the order of power series that are linear combinations of a given finite set of power series. The order of a formal power series, known as $\textrm{ord}(f)$, is defined as the minimum exponent of $x$ that has a non-zero coefficient in $f(x)$. Our first result is that the order of the Wronskian of these power series is equivalent up to a polynomial factor, to the maximum order which occurs in the linear combination of these power series. This implies that the Wronskian approach used in (Kayal and Saha, TOCT'2012) to upper bound the order of sum of square roots is optimal up to a polynomial blowup. We also demonstrate similar upper bounds, similar to those of (Kayal and Saha, TOCT'2012), for the order of power series in a variety of other scenarios. We also solve a special case of the inequality testing problem outlined in (Etessami et al., TOCT'2014). In the second part of the paper, we study the equality variant of the sum of square roots problem, which is decidable in polynomial time due to (Blömer, FOCS'1991). We investigate a natural generalization of this problem when the input integers are given as straight line programs. Under the assumption of the Generalized Riemann Hypothesis (GRH), we show that this problem can be reduced to the so-called "one dimensional" variant. We identify the key mathematical challenges for solving this "one dimensional" variant.
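
For reference, the two objects the abstract relies on are standard: the order of a formal power series and the Wronskian of a family of power series.

```latex
% Order of a formal power series, and the Wronskian of k power series:
\operatorname{ord}(f) = \min\{\, n \ge 0 : [x^n]\, f(x) \neq 0 \,\},
\qquad
W(f_1, \dots, f_k) = \det
\begin{pmatrix}
 f_1 & \cdots & f_k \\
 f_1' & \cdots & f_k' \\
 \vdots & & \vdots \\
 f_1^{(k-1)} & \cdots & f_k^{(k-1)}
\end{pmatrix}
```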

The Roles of Symbols in Neural-based AI: They are Not What You Think!

  • Authors: Daniel L. Silver, Tom M. Mitchell
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13626
  • Pdf link: https://arxiv.org/pdf/2304.13626
  • Abstract
    We propose that symbols are first and foremost external communication tools used between intelligent agents that allow knowledge to be transferred in a more efficient and effective manner than having to experience the world directly. But, they are also used internally within an agent through a form of self-communication to help formulate, describe and justify subsymbolic patterns of neural activity that truly implement thinking. Symbols, and our languages that make use of them, not only allow us to explain our thinking to others and ourselves, but also provide beneficial constraints (inductive bias) on learning about the world. In this paper we present relevant insights from neuroscience and cognitive science, about how the human brain represents symbols and the concepts they refer to, and how today's artificial neural networks can do the same. We then present a novel neuro-symbolic hypothesis and a plausible architecture for intelligent agents that combines subsymbolic representations for symbols and concepts for learning and reasoning. Our hypothesis and associated architecture imply that symbols will remain critical to the future of intelligent systems NOT because they are the fundamental building blocks of thought, but because they are characterizations of subsymbolic processes that constitute thought.

Experimental Validation of Model-less Robust Voltage Control using Measurement-based Estimated Voltage Sensitivity Coefficients

  • Authors: Rahul Gupta, Mario Paolone
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.13638
  • Pdf link: https://arxiv.org/pdf/2304.13638
  • Abstract
    Increasing adoption of smart meters and phasor measurement units (PMUs) in power distribution networks are enabling the adoption of data-driven/model-less control schemes to mitigate grid issues such as over/under voltages and power-flow congestions. However, such a scheme can lead to infeasible/inaccurate control decisions due to measurement inaccuracies. In this context, the authors' previous work proposed a robust measurement-based control scheme accounting for the uncertainties of the estimated models. In this scheme, a recursive least squares (RLS)-based method estimates the grid model (in the form of voltage magnitude sensitivity coefficients). Then, a robust control problem optimizes power set-points of distributed energy resources (DERs) such that the nodal voltage limits are satisfied. The estimated voltage sensitivity coefficients are used to model the nodal voltages, and the control robustness is achieved by accounting for their uncertainties. This work presents the first experimental validation of such a robust model-less control scheme on a real power distribution grid. The scheme is applied for voltage control by regulating two photovoltaic (PV) inverters connected in a real microgrid which is a replica of the CIGRE benchmark microgrid network at the EPFL Distributed Electrical Systems Laboratory.
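
Recursive least squares is a standard algorithm, so a minimal sketch of how voltage-sensitivity coefficients could be tracked from streaming measurements is shown below; the forgetting factor, initialization, and input/output conventions are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class RLSEstimator:
    """Textbook recursive least squares with a forgetting factor,
    sketching how voltage-sensitivity coefficients might be estimated
    from nodal power/voltage measurements."""

    def __init__(self, n_coeffs, lam=0.98, p0=1e3):
        self.theta = np.zeros(n_coeffs)   # estimated sensitivity coefficients
        self.P = p0 * np.eye(n_coeffs)    # inverse-covariance matrix
        self.lam = lam                    # forgetting factor

    def update(self, x, y):
        # x: vector of nodal power-injection changes;
        # y: measured voltage-magnitude change at the controlled node.
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)              # gain vector
        self.theta += k * (y - x @ self.theta)    # correct the estimate
        self.P = (self.P - np.outer(k, Px)) / self.lam
        return self.theta
```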

PVP: Pre-trained Visual Parameter-Efficient Tuning

  • Authors: Zhao Song, Ke Yang, Naiyang Guan, Junjie Zhu, Peng Qiao, Qingyong Hu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.13639
  • Pdf link: https://arxiv.org/pdf/2304.13639
  • Abstract
    Large-scale pre-trained transformers have demonstrated remarkable success in various computer vision tasks. However, it is still highly challenging to fully fine-tune these models for downstream tasks due to their high computational and storage costs. Recently, Parameter-Efficient Tuning (PETuning) techniques, e.g., Visual Prompt Tuning (VPT) and Low-Rank Adaptation (LoRA), have significantly reduced the computation and storage cost by inserting lightweight prompt modules into the pre-trained models and tuning these prompt modules with a small number of trainable parameters, while keeping the transformer backbone frozen. Although only a few parameters need to be adjusted, most PETuning methods still require a significant amount of downstream task training data to achieve good results. The performance is inadequate on low-data regimes, especially when there are only one or two examples per class. To this end, we first empirically identify the poor performance is mainly due to the inappropriate way of initializing prompt modules, which has also been verified in the pre-trained language models. Next, we propose a Pre-trained Visual Parameter-efficient (PVP) Tuning framework, which pre-trains the parameter-efficient tuning modules first and then leverages the pre-trained modules along with the pre-trained transformer backbone to perform parameter-efficient tuning on downstream tasks. Experiment results on five Fine-Grained Visual Classification (FGVC) and VTAB-1k datasets demonstrate that our proposed method significantly outperforms state-of-the-art PETuning methods.

Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

  • Authors: Tuomas Haarnoja, Ben Moran, Guy Lever, Sandy H. Huang, Dhruva Tirumala, Markus Wulfmeier, Jan Humplik, Saran Tunyasuvunakool, Noah Y. Siegel, Roland Hafner, Michael Bloesch, Kristian Hartikainen, Arunkumar Byravan, Leonard Hasenclever, Yuval Tassa, Fereshteh Sadeghi, Nathan Batchelor, Federico Casarini, Stefano Saliceti, Charles Game, Neil Sreendra, Kushal Patel, Marlon Gwira, Andrea Huber, Nicole Hurley, Francesco Nori, Raia Hadsell, Nicolas Heess
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13653
  • Pdf link: https://arxiv.org/pdf/2304.13653
  • Abstract
    We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. We first trained individual skills in isolation and then composed those skills end-to-end in a self-play setting. The resulting policy exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and transitions between them in a smooth, stable, and efficient manner - well beyond what is intuitively expected from the robot. The agents also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. The full range of behaviors emerged from a small set of simple rewards. Our agents were trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer, despite significant unmodeled effects and variations across robot instances. Although the robots are inherently fragile, minor hardware modifications together with basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way. Indeed, even though the agents were optimized for scoring, in experiments they walked 156% faster, took 63% less time to get up, and kicked 24% faster than a scripted baseline, while efficiently combining the skills to achieve the longer term objectives. Examples of the emergent behaviors and full 1v1 matches are available on the supplementary website.

A Personalized Dense Retrieval Framework for Unified Information Access

  • Authors: Hansi Zeng, Surya Kallumadi, Zaid Alibadi, Rodrigo Nogueira, Hamed Zamani
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.13654
  • Pdf link: https://arxiv.org/pdf/2304.13654
  • Abstract
    Developing a universal model that can efficiently and effectively respond to a wide range of information access requests -- from retrieval to recommendation to question answering -- has been a long-standing goal in the information retrieval community. This paper argues that the flexibility, efficiency, and effectiveness brought by the recent development in dense retrieval and approximate nearest neighbor search have smoothed the path towards achieving this goal. We develop a generic and extensible dense retrieval framework, called \framework, that can handle a wide range of (personalized) information access requests, such as keyword search, query by example, and complementary item recommendation. Our proposed approach extends the capabilities of dense retrieval models for ad-hoc retrieval tasks by incorporating user-specific preferences through the development of a personalized attentive network. This allows for a more tailored and accurate personalized information access experience. Our experiments on real-world e-commerce data suggest the feasibility of developing universal information access models by demonstrating significant improvements even compared to competitive baselines specifically developed for each of these individual information access tasks. This work opens up a number of fundamental research directions for future exploration.
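
To make the idea of an attentive personalization layer concrete, here is a hedged NumPy sketch: attend over a user's interaction-history embeddings with the query, fold the attended profile into the query, and score items by dot product. The attention form and the 0.5 mixing weight are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def personalized_score(q, item, history):
    """Score an item for a user: softmax-attend over the user's history
    embeddings with the query, then mix the attended profile into the
    query before taking the dot product with the item embedding."""
    H = np.stack(history)                 # (n, d) history embeddings
    logits = H @ q
    w = np.exp(logits - logits.max())     # numerically stable softmax
    w /= w.sum()
    profile = w @ H                       # attention-weighted user profile
    return (q + 0.5 * profile) @ item     # personalized relevance score
```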

Building K-Anonymous User Cohorts with Consecutive Consistent Weighted Sampling (CCWS)

  • Authors: Xinyi Zheng, Weijie Zhao, Xiaoyun Li, Ping Li
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.13677
  • Pdf link: https://arxiv.org/pdf/2304.13677
  • Abstract
    To retrieve personalized campaigns and creatives while protecting user privacy, digital advertising is shifting from member-based identity to cohort-based identity. Under such an identity regime, an accurate and efficient cohort-building algorithm is desired to group users with similar characteristics. In this paper, we propose a scalable $K$-anonymous cohort-building algorithm called consecutive consistent weighted sampling (CCWS). The proposed method combines the spirit of the ($p$-powered) consistent weighted sampling and hierarchical clustering, so that $K$-anonymity is ensured by enforcing a lower bound on the size of cohorts. Evaluations on a LinkedIn dataset consisting of $>70$M users and ad campaigns demonstrate that CCWS achieves substantial improvements over several hashing-based methods including sign random projections (SignRP), minwise hashing (MinHash), as well as the vanilla CWS.
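
A heavily simplified sketch of the size-constrained cohort idea follows. It substitutes MinHash-style signatures for the paper's $p$-powered consistent weighted sampling and coarsens signatures instead of running true hierarchical clustering, so treat it only as an illustration of enforcing the cohort-size lower bound.

```python
import hashlib
from collections import defaultdict

def build_k_anonymous_cohorts(user_features, k=1000, sig_len=4):
    """Group users into cohorts of size >= k. Users with equal hash
    signatures share a cohort; if any cohort is too small, signatures
    are shortened (coarsened) so cohorts grow until all meet the bound."""
    def signature(features, length):
        # MinHash-style: per-slot minimum over salted feature hashes.
        return tuple(
            min(hashlib.sha1(f"{i}:{f}".encode()).hexdigest() for f in features)
            for i in range(length))

    length = sig_len
    while length > 0:
        cohorts = defaultdict(list)
        for uid, feats in user_features.items():
            cohorts[signature(feats, length)].append(uid)
        if min(len(v) for v in cohorts.values()) >= k:
            return cohorts          # every cohort satisfies K-anonymity
        length -= 1                 # coarsen signatures to grow cohorts
    return {("all",): list(user_features)}
```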

Hitting Subgraphs in Sparse Graphs and Geometric Intersection Graphs

  • Authors: Daniel Lokshtanov, Fahad Panolan, Saket Saurabh, Jie Xue, Meirav Zehavi
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.13695
  • Pdf link: https://arxiv.org/pdf/2304.13695
  • Abstract
    We investigate a fundamental vertex-deletion problem called (Induced) Subgraph Hitting: given a graph $G$ and a set $\mathcal{F}$ of forbidden graphs, the goal is to compute a minimum-sized set $S$ of vertices of $G$ such that $G-S$ does not contain any graph in $\mathcal{F}$ as an (induced) subgraph. This is a generic problem that encompasses many well-known problems that were extensively studied on their own, particularly (but not only) from the perspectives of both approximation and parameterization. We focus on the design of efficient approximation schemes, i.e., with running time $f(\varepsilon,\mathcal{F}) \cdot n^{O(1)}$, which are also of significant interest to both communities. Technically, our main contribution is a linear-time approximation-preserving reduction from (Induced) Subgraph Hitting on any graph class $\mathcal{G}$ of bounded expansion to the same problem on bounded degree graphs within $\mathcal{G}$. This yields a novel algorithmic technique to design (efficient) approximation schemes for the problem on very broad graph classes, well beyond the state-of-the-art. Specifically, applying this reduction, we derive approximation schemes with (almost) linear running time for the problem on any graph classes that have strongly sublinear separators and many important classes of geometric intersection graphs (such as fat-object graphs, pseudo-disk graphs, etc.). Our proofs introduce novel concepts and combinatorial observations that may be of independent interest (and, which we believe, will find other uses) for studies of approximation algorithms, parameterized complexity, sparse graph classes, and geometric intersection graphs. As a byproduct, we also obtain the first robust algorithm for $k$-Subgraph Isomorphism on intersection graphs of fat objects and pseudo-disks, with running time $f(k) \cdot n \log n + O(m)$.

An Investigation into Active Control for Accessible Orbital Flight

  • Authors: Timothy Cai
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.13704
  • Pdf link: https://arxiv.org/pdf/2304.13704
  • Abstract
    Recently, a practical and publicly accessible satellite standard called the SmallSat has amplified public involvement in orbital research. This allows for flexible and efficient deployments of impactful low-earth-orbit experiments that would otherwise never be flown. However, the launch industry responsible for flying these experiments is not flexible nor efficient. This project aims to make orbital technologies accessible at the miniature scale, specifically thrust-vector-control, through an iterative engineering process simplifying and miniaturizing technologies from launch vehicles such as the Space Shuttle and Falcon 9. An Arduino-based custom flight computer was developed alongside state machine control software and active-control hardware, all designed to scale. Together, these three major components emulate the methods used in the aerospace industry. Initial test flights and recent ground test data have indicated stable control with a maximum of 7° and 2.62° of deviation from the intended flight path, respectively, an acceptable stability range when compared to similar finned flights. Results show that scalable thrust vectoring is possible at a small scale, giving adaptability and control applicable to both small and large test vehicles. With accessible orbital flight, countless experiments can be completed concurrently, allowing for faster amateur rocket development and opening another path to space.

Association Rules Mining with Auto-Encoders

  • Authors: Théophile Berteloot, Richard Khoury, Audrey Durand
  • Subjects: Machine Learning (cs.LG); Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.13717
  • Pdf link: https://arxiv.org/pdf/2304.13717
  • Abstract
    Association rule mining is one of the most studied research fields of data mining, with applications ranging from grocery basket problems to explainable classification systems. Classical association rule mining algorithms have several limitations, especially with regard to their high execution times and the number of rules produced. Over the past decade, neural network solutions have been used to solve various optimization problems, such as classification, regression or clustering. However, there is still no efficient way to mine association rules using neural networks. In this paper, we present an auto-encoder solution to mine association rules, called ARM-AE. We compare our algorithm to FP-Growth and NSGAII on three categorical datasets, and show that our algorithm discovers high-support and high-confidence rule sets and has a better execution time than classical methods while preserving the quality of the rule sets produced.
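
The abstract evaluates mined rules by support and confidence; for readers new to association rule mining, these standard quality metrics (not ARM-AE itself) are computed as below, with transactions represented as Python sets of items.

```python
def support_confidence(transactions, antecedent, consequent):
    """Compute the standard support and confidence of the rule
    antecedent -> consequent over a list of transactions (sets)."""
    a = set(antecedent)
    ac = a | set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)    # transactions with antecedent
    n_ac = sum(1 for t in transactions if ac <= t)  # transactions with both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    return support, confidence
```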

Controllable Image Generation via Collage Representations

  • Authors: Arantxa Casanova, Marlène Careil, Adriana Romero-Soriano, Christopher J. Pal, Jakob Verbeek, Michal Drozdzal
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13722
  • Pdf link: https://arxiv.org/pdf/2304.13722
  • Abstract
    Recent advances in conditional generative image models have enabled impressive results. On the one hand, text-based conditional models have achieved remarkable generation quality, by leveraging large-scale datasets of image-text pairs. To enable fine-grained controllability, however, text-based models require long prompts, whose details may be ignored by the model. On the other hand, layout-based conditional models have also witnessed significant advances. These models rely on bounding boxes or segmentation maps for precise spatial conditioning in combination with coarse semantic labels. The semantic labels, however, cannot be used to express detailed appearance characteristics. In this paper, we approach fine-grained scene controllability through image collages which allow a rich visual description of the desired scene as well as the appearance and location of the objects therein, without the need for class or attribute labels. We introduce "mixing and matching scenes" (M&Ms), an approach that consists of an adversarially trained generative image model which is conditioned on appearance features and spatial positions of the different elements in a collage, and integrates these into a coherent image. We train our model on the OpenImages (OI) dataset and evaluate it on collages derived from OI and MS-COCO datasets. Our experiments on the OI dataset show that M&Ms outperforms baselines in terms of fine-grained scene controllability while being very competitive in terms of image quality and sample diversity. On the MS-COCO dataset, we highlight the generalization ability of our model by outperforming DALL-E in terms of the zero-shot FID metric, despite using two orders of magnitude fewer parameters and data. Collage based generative models have the potential to advance content creation in an efficient and effective way as they are intuitive to use and yield high quality generations.

Keyword: faster

GENIE-NF-AI: Identifying Neurofibromatosis Tumors using Liquid Neural Network (LTC) trained on AACR GENIE Datasets

  • Authors: Michael Bidollahkhani, Ferhat Atasoy, Elnaz Abedini, Ali Davar, Omid Hamza, Fırat Sefaoğlu, Amin Jafari, Muhammed Nadir Yalçın, Hamdan Abdellatef
  • Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.13429
  • Pdf link: https://arxiv.org/pdf/2304.13429
  • Abstract
    In recent years, the field of medicine has been increasingly adopting artificial intelligence (AI) technologies to provide faster and more accurate disease detection, prediction, and assessment. In this study, we propose an interpretable AI approach to diagnose patients with neurofibromatosis using blood tests and pathogenic variables. We evaluated the proposed method using a dataset from the AACR GENIE project and compared its performance with modern approaches. Our proposed approach outperformed existing models with 99.86% accuracy. We also conducted NF1 and interpretable AI tests to validate our approach. Our work provides an explainable model using logistic regression and explanatory stimuli, as well as a black-box model. The explainable models help to explain the predictions of black-box models while the glass-box models provide information about the best-fit features. Overall, our study presents an interpretable AI approach for diagnosing patients with neurofibromatosis and demonstrates the potential of AI in the medical field.

From Chaos Comes Order: Ordering Event Representations for Object Detection

  • Authors: Nikola Zubić, Daniel Gehrig, Mathias Gehrig, Davide Scaramuzza
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13455
  • Pdf link: https://arxiv.org/pdf/2304.13455
  • Abstract
    Today, state-of-the-art deep neural networks that process events first convert them into dense, grid-like input representations before using an off-the-shelf network. However, selecting the appropriate representation for the task traditionally requires training a neural network for each representation and selecting the best one based on the validation score, which is very time-consuming. In this work, we eliminate this bottleneck by selecting the best representation based on the Gromov-Wasserstein Discrepancy (GWD) between the raw events and their representation. It is approximately 200 times faster to compute than training a neural network and preserves the task performance ranking of event representations across multiple representations, network backbones, and datasets. This means that finding a representation with a high task score is equivalent to finding a representation with a low GWD. We use this insight to, for the first time, perform a hyperparameter search on a large family of event representations, revealing new and powerful representations that exceed the state-of-the-art. On object detection, our optimized representation outperforms existing representations by 1.9% mAP on the 1 Mpx dataset and 8.6% mAP on the Gen1 dataset and even outperforms the state-of-the-art by 1.8% mAP on Gen1 and state-of-the-art feed-forward methods by 6.0% mAP on the 1 Mpx dataset. This work opens a new unexplored field of explicit representation optimization for event-based learning methods.
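
The selection procedure itself reduces to a small loop: score every candidate representation by its discrepancy against the raw events and keep the lowest. A sketch follows, where the `gwd` scoring function and the `candidates` mapping of names to representation functions are assumed to be provided, not part of any published API.

```python
def select_representation(events, candidates, gwd):
    """Pick the event representation with the lowest discrepancy score.
    candidates: dict mapping a name to a function events -> representation;
    gwd: callable scoring (raw events, representation) -> float."""
    scored = [(gwd(events, rep(events)), name)
              for name, rep in candidates.items()]
    best_score, best_name = min(scored)   # lower discrepancy = better task score
    return best_name, best_score
```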

Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

  • Authors: Tuomas Haarnoja, Ben Moran, Guy Lever, Sandy H. Huang, Dhruva Tirumala, Markus Wulfmeier, Jan Humplik, Saran Tunyasuvunakool, Noah Y. Siegel, Roland Hafner, Michael Bloesch, Kristian Hartikainen, Arunkumar Byravan, Leonard Hasenclever, Yuval Tassa, Fereshteh Sadeghi, Nathan Batchelor, Federico Casarini, Stefano Saliceti, Charles Game, Neil Sreendra, Kushal Patel, Marlon Gwira, Andrea Huber, Nicole Hurley, Francesco Nori, Raia Hadsell, Nicolas Heess
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13653
  • Pdf link: https://arxiv.org/pdf/2304.13653
  • Abstract
    We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. We first trained individual skills in isolation and then composed those skills end-to-end in a self-play setting. The resulting policy exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and transitions between them in a smooth, stable, and efficient manner - well beyond what is intuitively expected from the robot. The agents also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. The full range of behaviors emerged from a small set of simple rewards. Our agents were trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer, despite significant unmodeled effects and variations across robot instances. Although the robots are inherently fragile, minor hardware modifications together with basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way. Indeed, even though the agents were optimized for scoring, in experiments they walked 156% faster, took 63% less time to get up, and kicked 24% faster than a scripted baseline, while efficiently combining the skills to achieve the longer term objectives. Examples of the emergent behaviors and full 1v1 matches are available on the supplementary website.

An Investigation into Active Control for Accessible Orbital Flight

  • Authors: Timothy Cai
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.13704
  • Pdf link: https://arxiv.org/pdf/2304.13704
  • Abstract
    Recently, a practical and publicly accessible satellite standard called the SmallSat has amplified public involvement in orbital research. This allows for flexible and efficient deployments of impactful low-earth-orbit experiments that would otherwise never be flown. However, the launch industry responsible for flying these experiments is not flexible nor efficient. This project aims to make orbital technologies accessible at the miniature scale, specifically thrust-vector-control, through an iterative engineering process simplifying and miniaturizing technologies from launch vehicles such as the Space Shuttle and Falcon 9. An Arduino-based custom flight computer was developed alongside state machine control software and active-control hardware, all designed to scale. Together, these three major components emulate the methods used in the aerospace industry. Initial test flights and recent ground test data have indicated stable control with a maximum of 7° and 2.62° of deviation from the intended flight path, respectively, an acceptable stability range when compared to similar finned flights. Results show that scalable thrust vectoring is possible at a small scale, giving adaptability and control applicable to both small and large test vehicles. With accessible orbital flight, countless experiments can be completed concurrently, allowing for faster amateur rocket development and opening another path to space.

Keyword: mobile

Learning to Predict Navigational Patterns from Partial Observations

  • Authors: Robin Karlsson, Alexander Carballo, Francisco Lepe-Salazar, Keisuke Fujii, Kento Ohtani, Kazuya Takeda
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.13242
  • Pdf link: https://arxiv.org/pdf/2304.13242
  • Abstract
    Human beings cooperatively navigate rule-constrained environments by adhering to mutually known navigational patterns, which may be represented as directional pathways or road lanes. Inferring these navigational patterns from incompletely observed environments is required for intelligent mobile robots operating in unmapped locations. However, algorithmically defining these navigational patterns is nontrivial. This paper presents the first self-supervised learning (SSL) method for learning to infer navigational patterns in real-world environments from partial observations only. We explain how geometric data augmentation, predictive world modeling, and an information-theoretic regularizer enable our model to predict an unbiased local directional soft lane probability (DSLP) field in the limit of infinite data. We demonstrate how to infer global navigational patterns by fitting a maximum likelihood graph to the DSLP field. Experiments show that our SSL model outperforms two SOTA supervised lane graph prediction models on the nuScenes dataset. We propose our SSL method as a scalable and interpretable continual learning paradigm for navigation by perception. Code released upon publication.

ESCM: An Efficient and Secure Communication Mechanism for UAV Networks

  • Authors: Haoxiang Luo, Yifan Wu, Gang Sun, Hongfang Yu, Shizhong Xu, Mohsen Guizani
  • Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.13244
  • Pdf link: https://arxiv.org/pdf/2304.13244
  • Abstract
    UAV (unmanned aerial vehicle) is gradually entering various human activities. It has also become an important part of the satellite-air-ground-sea integrated network (SAGS) for 6G communication. In order to achieve high mobility, UAV has strict requirements on communication latency, and it must not be possible to illegally control UAVs as weapons of attack with malicious intent. Therefore, an efficient and secure communication method specifically designed for UAV networks is required. This paper proposes a communication mechanism named ESCM to meet these requirements. For high communication efficiency, ESCM designs a routing protocol based on the artificial bee colony algorithm (ABC) for UAV networks to accelerate communication between UAVs. Meanwhile, we use blockchain to guarantee the communication security of UAV networks. However, blockchain has unstable links in high-mobility network scenarios, resulting in low consensus efficiency and high communication overhead. Therefore, ESCM also introduces the concept of the digital twin, mapping the UAVs from the physical world into cyberspace and transforming the UAV network into a static network. This virtual UAV network is called CyberUAV. Then, in CyberUAV, we design a blockchain system and propose a consensus algorithm based on network coding, named proof of network coding (PoNC). PoNC not only ensures the security of ESCM, but also further improves the performance of ESCM through network coding. Simulation results show that ESCM has obvious advantages in communication efficiency and security. Moreover, encoding messages through PoNC consensus increases network throughput, and making the mobile blockchain static through the digital twin improves the consensus success rate.

CrowdCache: A Decentralized Game-Theoretic Framework for Mobile Edge Content Sharing

  • Authors: Duong Thuy Anh Nguyen, Jiaming Cheng, Duong Tung Nguyen, Angelia Nedich
  • Subjects: Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2304.13246
  • Pdf link: https://arxiv.org/pdf/2304.13246
  • Abstract
    Mobile edge computing (MEC) is a promising solution for enhancing the user experience, minimizing content delivery expenses, and reducing backhaul traffic. In this paper, we propose a novel privacy-preserving decentralized game-theoretic framework for resource crowdsourcing in MEC. Our framework models the interactions between a content provider (CP) and multiple mobile edge device users (MEDs) as a non-cooperative game, in which MEDs offer idle storage resources for content caching in exchange for rewards. We introduce efficient decentralized gradient play algorithms for Nash equilibrium (NE) computation by exchanging local information among neighboring MEDs only, thus preventing attackers from learning users' private information. The key challenge in designing such algorithms is that communication among MEDs is not fixed and is facilitated by a sequence of undirected time-varying graphs. Our approach achieves linear convergence to the NE without imposing assumptions on the parameters of the local objective functions, such as strong monotonicity conditions relative to the dependence on other MEDs' actions, which are commonly required in the existing literature when the graph is directed and time-varying. Extensive simulations demonstrate the effectiveness of our approach in achieving efficient resource outsourcing decisions while preserving the privacy of the edge devices.
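
For orientation, a generic two-phase template for decentralized gradient play over a time-varying graph is shown below: each MED $i$ first mixes its estimate $\hat{x}^i$ of the joint action with its neighbors' estimates, then descends its own cost $J_i$ along its own action block (remaining blocks are carried over unchanged). This is a standard form from the consensus-optimization literature, not necessarily the paper's exact update.

```latex
% Phase 1 (consensus mixing), then Phase 2 (own-block gradient step):
\hat{x}^i\!\left(t+\tfrac{1}{2}\right) = \sum_{j \in \mathcal{N}_i(t)} w_{ij}(t)\, \hat{x}^j(t),
\qquad
\left[\hat{x}^i(t+1)\right]_i = \left[\hat{x}^i\!\left(t+\tfrac{1}{2}\right)\right]_i
  - \alpha \, \nabla_{x_i} J_i\!\left(\hat{x}^i\!\left(t+\tfrac{1}{2}\right)\right)
```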

Digital technologies in the context of university transition and disability: Theoretical and empirical advances

  • Authors: Edgar Pacheco
  • Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.13262
  • Pdf link: https://arxiv.org/pdf/2304.13262
  • Abstract
    Since transition to higher education emerged as a research topic in the early 1970s, scholarly inquiry has focused on students without impairments and, what is more, little attention has been paid to the role of digital technologies. This article seeks to address this knowledge gap by looking at the university experiences of a group of first-year students with vision impairments from New Zealand, and the way they use digital tools, such as social media and mobile devices, to manage their transition-related challenges. The article summarises the findings from a longitudinal qualitative project which was methodologically informed by action research (AR). The article explores and discusses scholarly inquiry of transition to university and introduces a conceptual framework which includes five overlapping stages, the transition issues faced by the students and the roles played by digital technologies. The article updates and expands the theoretical understanding of transition to higher education and provides empirical evidence for practitioners to support the needs, inclusion, and participation of young people with disabilities in the tertiary setting.

Konzeption und Umsetzung einer mobilen Applikation zur Validierung von fälschungssicheren Produktlabeln (Design and Implementation of a Mobile Application for Validating Counterfeit-Proof Product Labels)

  • Authors: Oliver Linne
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.13519
  • Pdf link: https://arxiv.org/pdf/2304.13519
  • Abstract
    Due to increasing numbers of product piracy worldwide, a cost-effective method for verifying the origin of a product is to be developed. For this purpose, a certificate of authenticity can be created using precisely measurable, unique properties of special physical objects that are difficult to reconstruct. In the context of the present work, this is a counterfeit-proof label composed of randomly distributed gold nanospheres or rods in a semi-transparent material. The characteristic positioning of the label's elements can be precisely measured using a smartphone's camera and additional technologies. This can create an offline usable verification method for the general public without the need for an existing network connection. The present work provides a first part of the proof of concept that such a system, and especially the associated algorithmic computation method, can be implemented and efficiently used in a mobile application. In addition, a method suitable in practice for transmitting and securing the required information is determined in each case. Furthermore, the results of the validation of counterfeit-proof product labels are analyzed in detail and existing weaknesses are pointed out.

Thermal Vision for Soil Assessment in a Multipurpose Environmental Chamber under Martian Conditions towards Robot Navigation

  • Authors: Raúl Castilla-Arquillo, Anthony Mandow, Carlos J. Pérez-del-Pulgar, César Álvarez-Llamas, José M. Vadillo, Javier Laserna
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.13525
  • Pdf link: https://arxiv.org/pdf/2304.13525
  • Abstract
    Soil assessment is important for mobile robot planning and navigation on natural and planetary environments. Terramechanic characteristics can be inferred from the thermal behaviour of soils under the influence of sunlight using remote sensors such as Long-Wave Infrared cameras. However, this behaviour is greatly affected by the low atmospheric pressures of planets such as Mars, so practical models are needed to relate robot remote sensing data on Earth to target planetary exploration conditions. This article proposes a general framework based on multipurpose environmental chambers to generate representative diurnal cycle dataset pairs that can be useful to relate the thermal behaviour of a soil on Earth to the corresponding behaviour under planetary pressure conditions using remote sensing. Furthermore, we present an application of the proposed framework to generate datasets using the UMA-Laserlab chamber, which can replicate the atmospheric CO2 composition of Mars. In particular, we analyze the thermal behaviour of four soil samples of different granularity by comparing replicated Martian surface conditions and their Earth's diurnal cycle equivalent. Results indicate a correlation between granularity and thermal inertia that is consistent with available Mars surface measurements recorded by rovers. The resulting dataset pairs, consisting of representative diurnal cycle thermal images with heater, air, and subsurface temperatures, have been made available for the scientific community.

Keyword: pruning

Optimizing Deep Learning Models For Raspberry Pi

  • Authors: Salem Ameen, Kangaranmulle Siriwardana, Theo Theodoridis
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.13039
  • Pdf link: https://arxiv.org/pdf/2304.13039
  • Abstract
    Deep learning models have become increasingly popular for a wide range of applications, including computer vision, natural language processing, and speech recognition. However, these models typically require large amounts of computational resources, making them challenging to run on low-power devices such as the Raspberry Pi. One approach to addressing this challenge is to use pruning techniques to reduce the size of the deep learning models. Pruning involves removing unimportant weights and connections from the model, resulting in a smaller and more efficient model. Pruning can be done during training or after the model has been trained. Another approach is to optimize the deep learning models specifically for the Raspberry Pi architecture. This can include optimizing the model's architecture and parameters to take advantage of the Raspberry Pi's hardware capabilities, such as its CPU and GPU. Additionally, the model can be optimized for energy efficiency by minimizing the amount of computation required. Pruning and optimizing deep learning models for the Raspberry Pi can help overcome the computational and energy constraints of low-power devices, making it possible to run deep learning models on a wider range of devices. In the following sections, we will explore these approaches in more detail and discuss their effectiveness for optimizing deep learning models for the Raspberry Pi.
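
As one concrete instance of the pruning described above, PyTorch ships built-in unstructured magnitude pruning. A minimal sketch follows; the 50% sparsity level is an illustrative assumption, not a recommendation from the text.

```python
import torch
import torch.nn.utils.prune as prune

def prune_for_edge(model: torch.nn.Module, amount: float = 0.5):
    """Zero out the smallest-magnitude weights of every conv/linear
    layer using PyTorch's built-in unstructured L1 pruning, then make
    the resulting sparsity permanent."""
    for module in model.modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # bake the mask into the weights
    return model
```

Note that unstructured pruning shrinks compute only when paired with sparse kernels or weight encoding; on a Raspberry Pi it mainly reduces model storage unless combined with structured pruning or quantization.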

Towards Compute-Optimal Transfer Learning

  • Authors: Massimo Caccia, Alexandre Galashov, Arthur Douillard, Amal Rannen-Triki, Dushyant Rao, Michela Paganini, Laurent Charlin, Marc'Aurelio Ranzato, Razvan Pascanu
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13164
  • Pdf link: https://arxiv.org/pdf/2304.13164
  • Abstract
    The field of transfer learning is undergoing a significant shift with the introduction of large pretrained models which have demonstrated strong adaptability to a variety of downstream tasks. However, the high computational and memory requirements to finetune or use these models can be a hindrance to their widespread use. In this study, we present a solution to this issue by proposing a simple yet effective way to trade computational efficiency for asymptotic performance which we define as the performance a learning algorithm achieves as compute tends to infinity. Specifically, we argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance. We evaluate our method on the Nevis'22 continual learning benchmark that offers a diverse set of transfer scenarios. Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.

Machine Vision-Based Crop-Load Estimation Using YOLOv8

  • Authors: Dawood Ahmed, Ranjan Sapkota, Martin Churuvija, Manoj Karkee
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.13282
  • Pdf link: https://arxiv.org/pdf/2304.13282
  • Abstract
    Labor shortages in fruit crop production have prompted the development of mechanized and automated machines as alternatives to labor-intensive orchard operations such as harvesting, pruning, and thinning. Agricultural robots capable of identifying tree canopy parts and estimating geometric and topological parameters, such as branch diameter, length, and angles, can optimize crop yields through automated pruning and thinning platforms. In this study, we proposed a machine vision system to estimate canopy parameters in apple orchards and determine an optimal number of fruit for individual branches, providing a foundation for robotic pruning, flower thinning, and fruitlet thinning to achieve desired yield and quality. Using color and depth information from an RGB-D sensor (Microsoft Azure Kinect DK), a YOLOv8-based instance segmentation technique was developed to identify trunks and branches of apple trees during the dormant season. Principal Component Analysis was applied to estimate branch diameter (used to calculate limb cross-sectional area, or LCSA) and orientation. The estimated branch diameter was utilized to calculate LCSA, which served as an input for crop-load estimation, with larger LCSA values indicating a higher potential fruit-bearing capacity. RMSE for branch diameter estimation was 2.08 mm, and for crop-load estimation, 3.95. Based on commercial apple orchard management practices, the target crop-load (number of fruit) for each segmented branch was estimated with a mean absolute error (MAE) of 2.99 (ground truth crop-load was 6 apples per LCSA). This study demonstrated a promising workflow with high performance in identifying trunks and branches of apple trees in dynamic commercial orchard environments and integrating farm management practices into automated decision-making.

Concept-Monitor: Understanding DNN training through individual neurons

  • Authors: Mohammad Ali Khan, Tuomas Oikarinen, Tsui-Wei Weng
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13346
  • Pdf link: https://arxiv.org/pdf/2304.13346
  • Abstract
    In this work, we propose a general framework called Concept-Monitor to help demystify black-box DNN training processes automatically using a novel unified embedding space and concept diversity metric. Concept-Monitor enables human-interpretable visualization and indicators of the DNN training processes and facilitates transparency as well as a deeper understanding of how DNNs develop during training. Inspired by these findings, we also propose a new training regularizer that incentivizes hidden neurons to learn diverse concepts, which we show to improve training performance. Finally, we apply Concept-Monitor to conduct several case studies on different training paradigms including adversarial training, fine-tuning, and network pruning via the Lottery Ticket Hypothesis.

Filter Pruning via Filters Similarity in Consecutive Layers

  • Authors: Xiaorui Wang, Jun Wang, Xin Tang, Peng Gao, Rui Fang, Guotong Xie
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13397
  • Pdf link: https://arxiv.org/pdf/2304.13397
  • Abstract
    Filter pruning is widely adopted to compress and accelerate Convolutional Neural Networks (CNNs), but most previous works ignore the relationship between filters and channels in different layers. Processing each layer independently fails to utilize the collaborative relationship across layers. In this paper, we propose a novel pruning method that explicitly leverages the Filters Similarity in Consecutive Layers (FSCL). FSCL compresses models by pruning filters whose corresponding features contribute least to the model. Extensive experiments demonstrate the effectiveness of FSCL: it yields remarkable improvements over the state of the art in accuracy, FLOPs, and parameter reduction on several benchmark models and datasets.
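
The abstract does not spell out the similarity measure; as one plausible cross-layer importance score (an assumption for illustration, not FSCL's actual definition), a filter can be ranked by combining its own magnitude with how strongly the next layer's kernels read its output channel:

```python
import torch
import torch.nn as nn

def cross_layer_scores(conv_a: nn.Conv2d, conv_b: nn.Conv2d) -> torch.Tensor:
    """Score each output filter of conv_a jointly with the next layer:
    a filter that is small AND barely used downstream gets a low score."""
    assert conv_a.out_channels == conv_b.in_channels
    own = conv_a.weight.detach().abs().sum(dim=(1, 2, 3))         # (C_out,)
    downstream = conv_b.weight.detach().abs().sum(dim=(0, 2, 3))  # (C_in of conv_b,)
    return own * downstream

a, b = nn.Conv2d(16, 32, 3), nn.Conv2d(32, 64, 3)
to_prune = torch.argsort(cross_layer_scores(a, b))[:8]  # 8 least important filters
```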

Sparsified Model Zoo Twins: Investigating Populations of Sparsified Neural Network Models

  • Authors: Dominik Honegger, Konstantin Schürholt, Damian Borth
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13718
  • Pdf link: https://arxiv.org/pdf/2304.13718
  • Abstract
    With the growing size of Neural Networks (NNs), model sparsification to reduce the computational cost and memory demand of model inference has become of vital interest for both research and production. While many sparsification methods have been proposed and successfully applied to individual models, to the best of our knowledge their behavior and robustness have not yet been studied on large populations of models. With this paper, we address that gap by applying two popular sparsification methods to populations of models (so-called model zoos) to create sparsified versions of the original zoos. We investigate the performance of these two methods for each zoo, compare sparsification layer-wise, and analyse agreement between original and sparsified populations. We find both methods to be very robust, with magnitude pruning able to outperform variational dropout except at high sparsification ratios above 80%. Further, we find that sparsified models agree to a high degree with their original non-sparsified counterparts, and that the performance of the original and sparsified models is highly correlated. Finally, all models of the model zoos and their sparsified model twins are publicly available: modelzoos.cc.
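
A minimal sketch of one of the two studied methods (magnitude pruning, via PyTorch's built-in pruning utilities) and of a simple prediction-agreement metric between a model and its sparsified twin; the paper's exact agreement measure may differ:

```python
import torch
import torch.nn.utils.prune as prune

def sparsify_global(model, amount=0.8):
    """Global unstructured magnitude pruning across all Linear/Conv weights."""
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d))]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured,
                              amount=amount)
    return model

@torch.no_grad()
def agreement(model_a, model_b, loader):
    """Fraction of inputs on which the two models predict the same class."""
    same = total = 0
    for x, _ in loader:
        same += (model_a(x).argmax(1) == model_b(x).argmax(1)).sum().item()
        total += x.shape[0]
    return same / total
```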

Keyword: voxel

VGOS: Voxel Grid Optimization for View Synthesis from Sparse Inputs

  • Authors: Jiakai Sun, Zhanjie Zhang, Jiafu Chen, Guangyuan Li, Boyan Ji, Lei Zhao, Wei Xing
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.13386
  • Pdf link: https://arxiv.org/pdf/2304.13386
  • Abstract
    Neural Radiance Fields (NeRF) has shown great success in novel view synthesis due to its state-of-the-art quality and flexibility. However, NeRF requires dense input views (tens to hundreds) and a long training time (hours to days) for a single scene to generate high-fidelity images. Although using voxel grids to represent the radiance field can significantly accelerate the optimization process, we observe that for sparse inputs, the voxel grids are more prone to overfitting to the training views and will have holes and floaters, which leads to artifacts. In this paper, we propose VGOS, an approach for fast (3-5 minutes) radiance field reconstruction from sparse inputs (3-10 views) to address these issues. To improve the performance of voxel-based radiance fields in sparse input scenarios, we propose two methods: (a) We introduce an incremental voxel training strategy, which prevents overfitting by suppressing the optimization of peripheral voxels in the early stage of reconstruction. (b) We use several regularization techniques to smooth the voxels, which avoids degenerate solutions. Experiments demonstrate that VGOS achieves state-of-the-art performance for sparse inputs with super-fast convergence. Code will be available at https://github.com/SJoJoK/VGOS.
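
A small sketch of the incremental-training idea in (a): optimize only a central region of the voxel grid at first and grow it over training. The linear schedule below is a guess for illustration; VGOS's actual schedule may differ:

```python
import torch

def incremental_voxel_mask(grid_shape, step, total_steps, min_frac=0.25):
    """Boolean mask that starts as a small central cube and grows to cover the
    full grid, suppressing peripheral voxels early in reconstruction."""
    frac = min_frac + (1.0 - min_frac) * min(step / total_steps, 1.0)
    mask = torch.zeros(grid_shape, dtype=torch.bool)
    lo = [int(s * (1 - frac) / 2) for s in grid_shape]
    hi = [int(s * (1 + frac) / 2) for s in grid_shape]
    mask[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = True
    return mask

# During training, one could zero the gradients of peripheral voxels:
# voxel_grid.grad *= incremental_voxel_mask(voxel_grid.shape, step, 5000)
```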

Keyword: lidar

Single-View Height Estimation with Conditional Diffusion Probabilistic Models

  • Authors: Isaac Corley, Peyman Najafirad
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.13214
  • Pdf link: https://arxiv.org/pdf/2304.13214
  • Abstract
    Digital Surface Models (DSM) offer a wealth of height information for understanding the Earth's surface and for monitoring the existence or change of natural and man-made structures. Classical height estimation requires multi-view geospatial imagery or LiDAR point clouds, which can be expensive to acquire. Single-view height estimation using neural-network-based models shows promise; however, it can struggle to reconstruct high-resolution features. The latest advancements in diffusion models for high-resolution image synthesis and editing have yet to be utilized for remote sensing imagery, particularly height estimation. Our approach involves training a generative diffusion model to learn the joint distribution of optical and DSM images across both domains as a Markov chain. This is accomplished by minimizing a denoising score matching objective while being conditioned on the source image to generate realistic high-resolution 3D surfaces. In this paper we experiment with conditional denoising diffusion probabilistic models (DDPM) for height estimation from a single remotely sensed image and show promising results on the Vaihingen benchmark dataset.
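
For concreteness, a standard conditional DDPM training step of the kind the abstract describes; `eps_model` is a hypothetical denoising network conditioned on the optical image, and the noise schedule is illustrative:

```python
import torch
import torch.nn.functional as F

def ddpm_loss(eps_model, dsm_target, optical, betas):
    """Noise the DSM (height) image and train eps_model(noisy, t, optical)
    to predict the added noise -- the usual DDPM denoising objective."""
    b = dsm_target.shape[0]
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    t = torch.randint(0, len(betas), (b,), device=dsm_target.device)
    a = alphas_bar[t].view(b, 1, 1, 1)
    noise = torch.randn_like(dsm_target)
    noisy = a.sqrt() * dsm_target + (1 - a).sqrt() * noise
    return F.mse_loss(eps_model(noisy, t, optical), noise)
```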

Keyword: diffusion

Diffusion Probabilistic Model Based Accurate and High-Degree-of-Freedom Metasurface Inverse Design

  • Authors: Zezhou Zhang, Chuanchuan Yang, Yifeng Qin, Hao Feng, Jiqiang Feng, Hongbin Li
  • Subjects: Machine Learning (cs.LG); Optics (physics.optics)
  • Arxiv link: https://arxiv.org/abs/2304.13038
  • Pdf link: https://arxiv.org/pdf/2304.13038
  • Abstract
    Conventional meta-atom designs rely heavily on researchers' prior knowledge and trial-and-error searches using full-wave simulations, resulting in time-consuming and inefficient processes. Inverse design methods based on optimization algorithms, such as evolutionary algorithms and topology optimization, have been introduced to design metamaterials. However, none of these algorithms are general enough to fulfill multi-objective tasks. Recently, deep learning methods represented by Generative Adversarial Networks (GANs) have been applied to the inverse design of metamaterials, as they can directly generate high-degree-of-freedom meta-atoms based on S-parameter requirements. However, the adversarial training process of GANs makes the network unstable and results in high modeling costs. This paper proposes a novel metamaterial inverse design method based on diffusion probability theory. By learning the Markov process that transforms the original structure into a Gaussian distribution, the proposed method can gradually remove the noise starting from the Gaussian distribution and generate new high-degree-of-freedom meta-atoms that meet S-parameter conditions, which avoids the model instability introduced by the adversarial training process of GANs and ensures more accurate and higher-quality generation results. Experiments show that our method is superior to representative GAN-based methods in terms of model convergence speed, generation accuracy, and quality.

Directed Chain Generative Adversarial Networks

  • Authors: Ming Min, Ruimeng Hu, Tomoyuki Ichiba
  • Subjects: Machine Learning (cs.LG); Probability (math.PR)
  • Arxiv link: https://arxiv.org/abs/2304.13131
  • Pdf link: https://arxiv.org/pdf/2304.13131
  • Abstract
    Real-world data can be multimodally distributed, e.g., data describing the opinion divergence in a community, the interspike interval distribution of neurons, and the natural frequencies of oscillators. Generating multimodally distributed real-world data has become a challenge for existing generative adversarial networks (GANs). For example, neural stochastic differential equations (Neural SDEs), treated as infinite-dimensional GANs, have demonstrated successful performance mainly in generating unimodal time series data. In this paper, we propose a novel time series generator, named directed chain GANs (DC-GANs), which inserts a time series dataset (called a neighborhood process of the directed chain, or input) into the drift and diffusion coefficients of directed chain SDEs with distributional constraints. DC-GANs can generate new time series of the same distribution as the neighborhood process, and the neighborhood process provides the key step in learning and generating multimodally distributed time series. The proposed DC-GANs are examined on four datasets, including two stochastic models from the social sciences and computational neuroscience, and two real-world datasets on stock prices and energy consumption. To the best of our knowledge, DC-GANs are the first work that can generate multimodal time series data, and they consistently outperform state-of-the-art benchmarks with respect to measures of distribution, data similarity, and predictive ability.

Single-View Height Estimation with Conditional Diffusion Probabilistic Models

  • Authors: Isaac Corley, Peyman Najafirad
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.13214
  • Pdf link: https://arxiv.org/pdf/2304.13214
  • Abstract
    Digital Surface Models (DSM) offer a wealth of height information for understanding the Earth's surface and for monitoring the existence or change of natural and man-made structures. Classical height estimation requires multi-view geospatial imagery or LiDAR point clouds, which can be expensive to acquire. Single-view height estimation using neural-network-based models shows promise; however, it can struggle to reconstruct high-resolution features. The latest advancements in diffusion models for high-resolution image synthesis and editing have yet to be utilized for remote sensing imagery, particularly height estimation. Our approach involves training a generative diffusion model to learn the joint distribution of optical and DSM images across both domains as a Markov chain. This is accomplished by minimizing a denoising score matching objective while being conditioned on the source image to generate realistic high-resolution 3D surfaces. In this paper we experiment with conditional denoising diffusion probabilistic models (DDPM) for height estimation from a single remotely sensed image and show promising results on the Vaihingen benchmark dataset.

Score-based Generative Modeling Through Backward Stochastic Differential Equations: Inversion and Generation

  • Authors: Zihao Wang
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13224
  • Pdf link: https://arxiv.org/pdf/2304.13224
  • Abstract
    The proposed BSDE-based diffusion model represents a novel approach to diffusion modeling, which extends the application of stochastic differential equations (SDEs) in machine learning. Unlike traditional SDE-based diffusion models, our model can determine the initial conditions necessary to reach a desired terminal distribution by adapting an existing score function. We demonstrate the theoretical guarantees of the model, the benefits of using Lipschitz networks for score matching, and its potential applications in various areas such as diffusion inversion, conditional diffusion, and uncertainty quantification. Our work represents a contribution to the field of score-based generative learning and offers a promising direction for solving real-world problems.

Preconditioned discontinuous Galerkin method and convection-diffusion-reaction problems with guaranteed bounds to resulting spectra

  • Authors: Liya Gaynutdinova, Martin Ladecký, Ivana Pultarová, Miloslav Vlasák, Jan Zeman
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13315
  • Pdf link: https://arxiv.org/pdf/2304.13315
  • Abstract
    This paper focuses on the design, analysis and implementation of a new preconditioning concept for linear second order partial differential equations, including the convection-diffusion-reaction problems discretized by Galerkin or discontinuous Galerkin methods. We expand on the approach introduced by Gergelits et al. and adapt it to more general settings, assuming that both the original and preconditioning matrices are composed of sparse matrices of very low ranks, representing local contributions to the global matrices. When applied to a symmetric problem, the method provides bounds on all individual eigenvalues of the preconditioned matrix. We show that this preconditioning strategy works not only for Galerkin discretization, but also for the discontinuous Galerkin discretization, where local contributions are associated with individual edges of the triangulation. In the case of non-symmetric problems, the method yields guaranteed bounds on the real and imaginary parts of the resulting eigenvalues. We include some numerical experiments illustrating the method and its implementation, showcasing its effectiveness for the two variants of discretized (convection-)diffusion-reaction problems.

Event-triggered Boundary Control of a Class of Reaction-Diffusion PDEs with Time-dependent Reactivity

  • Authors: Bhathiya Rathnayake, Mamadou Diagne
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.13322
  • Pdf link: https://arxiv.org/pdf/2304.13322
  • Abstract
    This paper presents an event-triggered boundary control strategy for a class of reaction-diffusion PDEs with time-varying reactivity under Robin actuation. The control approach consists of a backstepping full-state feedback boundary controller and a dynamic event-triggering condition, which determines the time instants when the control input needs to be updated. It is proved that under the proposed event-triggered boundary control approach, there is a uniform minimal dwell-time between two event times. Furthermore, the well-posedness and the global exponential convergence of the closed-loop system to zero in $L^2$-sense are established. A simulation is conducted to validate the theoretical developments.
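
A toy sketch of the event-triggered pattern in general (not the paper's backstepping design): the actuator holds the last computed input and recomputes only when the state has drifted sufficiently from its value at the last event. The paper's trigger is dynamic and comes with a proven minimal dwell-time; the static threshold below is a simplification:

```python
import numpy as np

def simulate(plant_step, control_law, x0, dt, steps, gamma=0.1, delta=1e-3):
    """Event-triggered closed loop with a relative-plus-absolute threshold."""
    x = np.asarray(x0, dtype=float)
    u = control_law(x)           # control value used between events
    x_event = x.copy()           # state at the last triggering event
    events = []
    for k in range(steps):
        if np.linalg.norm(x - x_event) > gamma * np.linalg.norm(x) + delta:
            u, x_event = control_law(x), x.copy()   # event: update the input
            events.append(k * dt)
        x = plant_step(x, u, dt)                    # plant evolves continuously
    return x, events
```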

Mixed finite element methods for nonlinear reaction-diffusion equations with interfaces

  • Authors: Jeonghun J. Lee, Xinran Jin
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13376
  • Pdf link: https://arxiv.org/pdf/2304.13376
  • Abstract
    We develop mixed finite element methods for nonlinear reaction-diffusion equations with interfaces which have Robin-type interface conditions. We introduce the velocity of chemicals as new variables and reformulate the governing equations. The stability of semidiscrete solutions and the existence and a priori error estimates of fully discrete solutions are proved by a fixed point theorem and continuous/discrete Grönwall inequalities. Numerical results illustrating our theoretical analysis are included.

DiffuseExpand: Expanding dataset for 2D medical image segmentation using diffusion models

  • Authors: Shitong Shao, Xiaohan Yuan, Zhen Huang, Ziming Qiu, Shuai Wang, Kevin Zhou
  • Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13416
  • Pdf link: https://arxiv.org/pdf/2304.13416
  • Abstract
    Dataset expansion can effectively alleviate the problem of data scarcity for medical image segmentation, which arises from privacy concerns and labeling difficulties. However, existing expansion algorithms still face great challenges due to their inability to guarantee the diversity of synthesized images with paired segmentation masks. In recent years, Diffusion Probabilistic Models (DPMs) have shown powerful image synthesis performance, even better than Generative Adversarial Networks. Based on this insight, we propose an approach called DiffuseExpand for expanding datasets for 2D medical image segmentation using DPMs, which first samples a variety of masks from Gaussian noise to ensure diversity, and then synthesizes images to ensure the alignment of images and masks. After that, DiffuseExpand chooses high-quality samples to further enhance the effectiveness of data expansion. Our comparison and ablation experiments on the COVID-19 and CGMH Pelvis datasets demonstrate the effectiveness of DiffuseExpand. Our code is released at https://anonymous.4open.science/r/DiffuseExpand.

Training-Free Location-Aware Text-to-Image Synthesis

  • Authors: Jiafeng Mao, Xueting Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13427
  • Pdf link: https://arxiv.org/pdf/2304.13427
  • Abstract
    Current large-scale generative models have impressive efficiency in generating high-quality images based on text prompts. However, they lack the ability to precisely control the size and position of objects in the generated image. In this study, we analyze the generative mechanism of the stable diffusion model and propose a new interactive generation paradigm that allows users to specify the position of generated objects without additional training. Moreover, we propose an object detection-based evaluation metric to assess the control capability of the location-aware generation task. Our experimental results show that our method outperforms state-of-the-art methods in both control capability and image quality.

Keyword: dynamic

Model Extraction Attacks Against Reinforcement Learning Based Controllers

  • Authors: Momina Sajid, Yanning Shen, Yasser Shoukry
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.13090
  • Pdf link: https://arxiv.org/pdf/2304.13090
  • Abstract
    We introduce the problem of model-extraction attacks in cyber-physical systems in which an attacker attempts to estimate (or extract) the feedback controller of the system. Extracting (or estimating) the controller provides an unmatched edge to attackers since it allows them to predict the future control actions of the system and plan their attack accordingly. Hence, it is important to understand the ability of the attackers to perform such an attack. In this paper, we focus on the setting when a Deep Neural Network (DNN) controller is trained using Reinforcement Learning (RL) algorithms and is used to control a stochastic system. We play the role of the attacker that aims to estimate such an unknown DNN controller, and we propose a two-phase algorithm. In the first phase, also called the offline phase, the attacker uses side-channel information about the RL-reward function and the system dynamics to identify a set of candidate estimates of the unknown DNN. In the second phase, also called the online phase, the attacker observes the behavior of the unknown DNN and uses these observations to shortlist the set of final policy estimates. We provide theoretical analysis of the error between the unknown DNN and the estimated one. We also provide numerical results showing the effectiveness of the proposed algorithm.

Uncovering the Representation of Spiking Neural Networks Trained with Surrogate Gradient

  • Authors: Yuhang Li, Youngeun Kim, Hyoungseob Park, Priyadarshini Panda
  • Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.13098
  • Pdf link: https://arxiv.org/pdf/2304.13098
  • Abstract
    Spiking Neural Networks (SNNs) are recognized as the candidate for the next-generation neural networks due to their bio-plausibility and energy efficiency. Recently, researchers have demonstrated that SNNs are able to achieve nearly state-of-the-art performance in image recognition tasks using surrogate gradient training. However, some essential questions exist pertaining to SNNs that are little studied: Do SNNs trained with surrogate gradient learn different representations from traditional Artificial Neural Networks (ANNs)? Does the time dimension in SNNs provide unique representation power? In this paper, we aim to answer these questions by conducting a representation similarity analysis between SNNs and ANNs using Centered Kernel Alignment (CKA). We start by analyzing the spatial dimension of the networks, including both the width and the depth. Furthermore, our analysis of residual connections shows that SNNs learn a periodic pattern, which rectifies the representations in SNNs to be ANN-like. We additionally investigate the effect of the time dimension on SNN representation, finding that deeper layers encourage more dynamics along the time dimension. We also investigate the impact of input data such as event-stream data and adversarial attacks. Our work uncovers a host of new findings of representations in SNNs. We hope this work will inspire future research to fully comprehend the representation power of SNNs. Code is released at https://github.com/Intelligent-Computing-Lab-Yale/SNNCKA.
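
The CKA comparison at the heart of this analysis is easy to state; here is the standard linear CKA between two activation matrices (the textbook formula, not code from the paper):

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear Centered Kernel Alignment between representation matrices
    X (n_samples, d1) and Y (n_samples, d2), e.g. SNN vs. ANN activations.
    Returns a similarity in [0, 1]; 1 means identical up to rotation/scale."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den
```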

Time-Selective RNN for Device-Free Multi-Room Human Presence Detection Using WiFi CSI

  • Authors: Fang-Yu Chu, Li-Hsiang Shen, An-Hung Hsiao, Kai-Ten Feng
  • Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13107
  • Pdf link: https://arxiv.org/pdf/2304.13107
  • Abstract
    Human presence detection is a crucial technology for various applications, including home automation, security, and healthcare. While camera-based systems have traditionally been used for this purpose, they raise privacy concerns. To address this issue, recent research has explored the use of channel state information (CSI) approaches that can be extracted from commercial WiFi access points (APs) and provide detailed channel characteristics. In this thesis, we propose a device-free human presence detection system for multi-room scenarios using a time-selective conditional dual-feature-extraction recurrent network (TCD-FERN). Our system is designed to capture significant temporal features conditioned on current human features, using a dynamic and static (DaS) data preprocessing technique to extract moving and spatial features of people and differentiate between line-of-sight (LoS) path blocking and non-blocking cases. To mitigate the feature attenuation problem caused by room partitions, we employ a voting scheme. Evaluation and real-time experiments demonstrate that the proposed TCD-FERN system can achieve human presence detection in multi-room scenarios using fewer commodity WiFi APs.

Analysis and Mitigation of Shared Resource Contention on Heterogeneous Multicore: An Industrial Case Study

  • Authors: Michael Bechtel, Heechul Yun
  • Subjects: Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.13110
  • Pdf link: https://arxiv.org/pdf/2304.13110
  • Abstract
    In this paper, we address the industrial challenge put forth by ARM in ECRTS 2022. We systematically analyze the effect of shared resource contention on an augmented reality head-up display (AR-HUD) case-study application of the industrial challenge on a heterogeneous multicore platform, the NVIDIA Jetson Nano. We configure the AR-HUD application such that it can process incoming image frames in real-time at 20Hz on the platform. We use micro-architectural denial-of-service (DoS) attacks as aggressor tasks of the challenge and show that they can dramatically impact the latency and accuracy of the AR-HUD application, which results in significant deviations of the estimated trajectories from the ground truth, despite our best effort to mitigate their influence by using cache partitioning and real-time scheduling of the AR-HUD application. We show that dynamic LLC (or DRAM, depending on the aggressor) bandwidth throttling of the aggressor tasks is an effective means of ensuring real-time performance of the AR-HUD application without resorting to over-provisioning the system.

Roll-Drop: accounting for observation noise with a single parameter

  • Authors: Luigi Campanaro, Daniele De Martini, Siddhant Gangapurwala, Wolfgang Merkt, Ioannis Havoutis
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13150
  • Pdf link: https://arxiv.org/pdf/2304.13150
  • Abstract
    This paper proposes a simple strategy for sim-to-real in Deep Reinforcement Learning (DRL) -- called Roll-Drop -- that uses dropout during simulation to account for observation noise during deployment without explicitly modelling its distribution for each state. DRL is a promising approach to control robots for highly dynamic and feedback-based manoeuvres, and accurate simulators are crucial to providing cheap and abundant data to learn the desired behaviour. Nevertheless, the simulated data are noiseless and generally show a distributional shift that challenges deployment on real machines, where sensor readings are affected by noise. The standard solution is to model the noise and inject it during training; while this requires a thorough system identification, Roll-Drop enhances robustness to sensor noise by tuning only a single parameter. We demonstrate an 80% success rate when up to 25% noise is injected into the observations, with twice the robustness of the baselines. We deploy the controller trained in simulation on a Unitree A1 platform and assess this improved robustness on the physical system.
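
Conceptually, Roll-Drop amounts to applying dropout to the observation vector during simulated training; a minimal sketch under that reading, with the drop probability as the single tunable parameter (the paper's exact placement of the dropout is assumed here):

```python
import torch

class RollDropObs(torch.nn.Module):
    """Randomly zero out observation entries during training so the policy
    learns to tolerate noisy/missing sensor readings at deployment."""
    def __init__(self, p: float = 0.05):
        super().__init__()
        self.drop = torch.nn.Dropout(p)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Active only in .train() mode; identity (no noise) in .eval() mode.
        return self.drop(obs)

policy_input = RollDropObs(p=0.05)   # prepend to the policy network
```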

The Limited Integrator Model Regulator And its Use in Vehicle Steering Control

  • Authors: Bilin Aksun-Guvenc, Levent Guvenc
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.13161
  • Pdf link: https://arxiv.org/pdf/2304.13161
  • Abstract
    Unexpected yaw disturbances like braking on a unilaterally icy road, side wind forces, and tire rupture are very difficult for the driver of a road vehicle to handle, due to his/her large panic reaction period, ranging from 0.5 to 2 seconds. Automatic driver assist systems provide counteracting yaw moments during this driver panic reaction period to maintain the stability of the yaw dynamics of the vehicle. An active steering based driver assist system that uses the model regulator control architecture is introduced and used here for yaw dynamics stabilization in such situations. The model regulator, which is a special form of a two degree of freedom control architecture, is introduced and explained in detail in a tutorial fashion, whereby its integral action capability, among others, is also shown. An auxiliary steering actuation system is assumed, and a limited integrator version of the model regulator based steering controller is developed in order not to saturate the auxiliary steering actuator. This low frequency limited integrator implementation also allows the driver to take care of low frequency steering and disturbance rejection tasks. Linear simulation results are used to demonstrate the effectiveness of the proposed method.

How to design, and tune, a computed torque controller: An introduction and a Matlab example

  • Authors: Lluís Ros
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.13167
  • Pdf link: https://arxiv.org/pdf/2304.13167
  • Abstract
    This note briefly introduces the computed torque control method for trajectory tracking. The method is applicable to fully actuated robots, i.e, those whose inverse dynamics can be solved for any feasible acceleration. This includes many systems, like robot arms or hands, or any tree-like mechanism with all its joints actuated. Using simple explanations, we see how such a controller can be obtained using feedback linearization, and how its gains can be tuned to satisfy a desired settling time for the error signal. We end up discussing the advantages and shortcomings of the controller. A companion Matlab script can be downloaded from https://bit.ly/3QShxYi that implements and tests the controller on a simple actuated pendulum.
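
The note's companion script is in Matlab; as a language-neutral illustration, here is a computed-torque law for a single actuated pendulum. The feedback-linearizing structure is standard; the specific gains and parameters below are illustrative:

```python
import numpy as np

# Computed torque control: tau = M(q)*v + g(q), with
# v = qdd_ref + Kd*(qd_ref - qd) + Kp*(q_ref - q).
# Feedback linearization gives error dynamics e'' + Kd e' + Kp e = 0;
# choosing Kp = w**2 and Kd = 2*w places a double pole at -w
# (critically damped), so w sets the speed of the error decay.
m, l, g = 1.0, 1.0, 9.81   # pendulum mass, length, gravity
w = 4.0                    # desired closed-loop bandwidth [rad/s]
Kp, Kd = w**2, 2 * w

def computed_torque(q, qd, q_ref, qd_ref, qdd_ref):
    v = qdd_ref + Kd * (qd_ref - qd) + Kp * (q_ref - q)
    M = m * l**2                   # inertia about the pivot
    grav = m * g * l * np.sin(q)   # gravity torque to cancel
    return M * v + grav
```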

Dynamic Datasets and Market Environments for Financial Reinforcement Learning

  • Authors: Xiao-Yang Liu, Ziyi Xia, Hongyang Yang, Jiechao Gao, Daochen Zha, Ming Zhu, Christina Dan Wang, Zhaoran Wang, Jian Guo
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13174
  • Pdf link: https://arxiv.org/pdf/2304.13174
  • Abstract
    The financial market is a particularly challenging playground for deep reinforcement learning due to its unique feature of dynamic datasets. Building high-quality market environments for training financial reinforcement learning (FinRL) agents is difficult due to major factors such as the low signal-to-noise ratio of financial data, survivorship bias of historical data, and model overfitting. In this paper, we present FinRL-Meta, a data-centric and openly accessible library that processes dynamic datasets from real-world markets into gym-style market environments and has been actively maintained by the AI4Finance community. First, following a DataOps paradigm, we provide hundreds of market environments through an automatic data curation pipeline. Second, we provide homegrown examples and reproduce popular research papers as stepping stones for users to design new trading strategies. We also deploy the library on cloud platforms so that users can visualize their own results and assess the relative performance via community-wise competitions. Third, we provide dozens of Jupyter/Python demos organized into a curriculum and a documentation website to serve the rapidly growing community. The open-source codes for the data curation pipeline are available at https://github.com/AI4Finance-Foundation/FinRL-Meta

Splitting physics-informed neural networks for inferring the dynamics of integer- and fractional-order neuron models

  • Authors: Simin Shekarpaz, Fanhai Zeng, George Karniadakis
  • Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Computational Physics (physics.comp-ph)
  • Arxiv link: https://arxiv.org/abs/2304.13205
  • Pdf link: https://arxiv.org/pdf/2304.13205
  • Abstract
    We introduce a new approach for solving forward systems of differential equations using a combination of splitting methods and physics-informed neural networks (PINNs). The proposed method, splitting PINN, effectively addresses the challenge of applying PINNs to forward dynamical systems and demonstrates improved accuracy through its application to neuron models. Specifically, we apply operator splitting to decompose the original neuron model into sub-problems that are then solved using PINNs. Moreover, we develop an $L^1$ scheme for discretizing fractional derivatives in fractional neuron models, leading to improved accuracy and efficiency. The results of this study highlight the potential of splitting PINNs in solving both integer- and fractional-order neuron models, as well as other similar systems in computational science and engineering.
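
The $L^1$ scheme mentioned here is a standard discretization; a short sketch of the L1 approximation of the Caputo derivative of order $0 < \alpha < 1$ on a uniform grid (independent of the paper's code):

```python
import math
import numpy as np

def l1_caputo(u: np.ndarray, tau: float, alpha: float) -> float:
    """L1 approximation of the Caputo derivative of order 0 < alpha < 1 at the
    last grid point, given values u[0..n] on a uniform grid of step tau:
    D^a u(t_n) ~ tau^{-a}/Gamma(2-a) * sum_j b_j (u_{n-j} - u_{n-j-1}),
    with weights b_j = (j+1)^{1-a} - j^{1-a}."""
    n = len(u) - 1
    j = np.arange(n)
    b = (j + 1) ** (1 - alpha) - j ** (1 - alpha)
    diffs = u[n - j] - u[n - j - 1]
    return diffs @ b * tau ** (-alpha) / math.gamma(2 - alpha)

# Sanity check against the exact result D^a t = t^(1-a) / Gamma(2-a):
tau, alpha = 1e-3, 0.5
t = np.arange(0, 1 + tau, tau)
print(l1_caputo(t, tau, alpha), t[-1] ** (1 - alpha) / math.gamma(2 - alpha))
```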

Performance of the Gittins Policy in the G/G/1 and G/G/k, With and Without Setup Times

  • Authors: Yige Hong, Ziv Scully
  • Subjects: Performance (cs.PF); Probability (math.PR)
  • Arxiv link: https://arxiv.org/abs/2304.13231
  • Pdf link: https://arxiv.org/pdf/2304.13231
  • Abstract
    How should we schedule jobs to minimize mean queue length? In the preemptive M/G/1 queue, we know the optimal policy is the Gittins policy, which uses any available information about jobs' remaining service times to dynamically prioritize jobs. For models more complex than the M/G/1, optimal scheduling is generally intractable. This leads us to ask: beyond the M/G/1, does Gittins still perform well? Recent results indicate that Gittins performs well in the M/G/k, meaning that its additive suboptimality gap is bounded by an expression which is negligible in heavy traffic. But allowing multiple servers is just one way to extend the M/G/1, and most other extensions remain open. Does Gittins still perform well with non-Poisson arrival processes? Or if servers require setup times when transitioning from idle to busy? In this paper, we give the first analysis of the Gittins policy that can handle any combination of (a) multiple servers, (b) non-Poisson arrivals, and (c) setup times. Our results thus cover the G/G/1 and G/G/k, with and without setup times, bounding Gittins's suboptimality gap in each case. Each of (a), (b), and (c) adds a term to our bound, but all the terms are negligible in heavy traffic, thus implying Gittins's heavy-traffic optimality in all the systems we consider. Another consequence of our results is that Gittins is optimal in the M/G/1 with setup times at all loads.

Analyzing In-browser Cryptojacking

  • Authors: Muhammad Saad, David Mohaisen
  • Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Machine Learning (cs.LG); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.13253
  • Pdf link: https://arxiv.org/pdf/2304.13253
  • Abstract
    Cryptojacking is the permissionless use of a target device to covertly mine cryptocurrencies. With cryptojacking, attackers use malicious JavaScript codes to force web browsers into solving proof-of-work puzzles, thus making money by exploiting the resources of the website visitors. To understand and counter such attacks, we systematically analyze the static, dynamic, and economic aspects of in-browser cryptojacking. For static analysis, we perform content, currency, and code-based categorization of cryptojacking samples to 1) measure their distribution across websites, 2) highlight their platform affinities, and 3) study their code complexities. We apply machine learning techniques to distinguish cryptojacking scripts from benign and malicious JavaScript samples with 100% accuracy. For dynamic analysis, we analyze the effect of cryptojacking on critical system resources, such as CPU and battery usage. We also perform web browser fingerprinting to analyze the information exchange between the victim node and the dropzone cryptojacking server. We also build an analytical model to empirically evaluate the feasibility of cryptojacking as an alternative to online advertisement. Our results show a sizeable negative profit and loss gap, indicating that the model is economically infeasible. Finally, leveraging insights from our analyses, we build countermeasures for in-browser cryptojacking that improve the existing remedies.

Bayesian Federated Learning: A Survey

  • Authors: Longbing Cao, Hui Chen, Xuhui Fan, Joao Gama, Yew-Soon Ong, Vipin Kumar
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.13267
  • Pdf link: https://arxiv.org/pdf/2304.13267
  • Abstract
    Federated learning (FL) demonstrates its advantages in integrating distributed infrastructure, communication, computing and learning in a privacy-preserving manner. However, the robustness and capabilities of existing FL methods are challenged by limited and dynamic data and conditions, complexities including heterogeneities and uncertainties, and analytical explainability. Bayesian federated learning (BFL) has emerged as a promising approach to address these issues. This survey presents a critical overview of BFL, including its basic concepts, its relations to Bayesian learning in the context of FL, and a taxonomy of BFL from both Bayesian and federated perspectives. We categorize and discuss client- and server-side and FL-based BFL methods and their pros and cons. The limitations of the existing BFL methods and the future directions of BFL research further address the intricate requirements of real-life FL applications.

Game-based Platforms for Artificial Intelligence Research

  • Authors: Chengpeng Hu, Yunlong Zhao, Ziqi Wang, Haocheng Du, Jialin Liu
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13269
  • Pdf link: https://arxiv.org/pdf/2304.13269
  • Abstract
    Games have been ideal test-beds for artificial intelligence research because they exhibit characteristics that widely exist in real-world scenarios. Learning and optimisation, decision making in dynamic and uncertain environments, game theory, planning and scheduling, design and education are common research areas shared between games and real-world problems. Numerous open-sourced games or game-based environments have been implemented for studying artificial intelligence. In addition to single- or multi-player, collaborative or adversarial games, there has also been growing interest in implementing platforms for creative design in recent years. Those platforms provide ideal benchmarks for exploring and comparing artificial intelligence ideas and techniques. This paper reviews game-based platforms for artificial intelligence research, discusses the research trends induced by the evolution of those platforms, and gives an outlook.

Machine Vision-Based Crop-Load Estimation Using YOLOv8

  • Authors: Dawood Ahmed, Ranjan Sapkota, Martin Churuvija, Manoj Karkee
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.13282
  • Pdf link: https://arxiv.org/pdf/2304.13282
  • Abstract
    Labor shortages in fruit crop production have prompted the development of mechanized and automated machines as alternatives to labor-intensive orchard operations such as harvesting, pruning, and thinning. Agricultural robots capable of identifying tree canopy parts and estimating geometric and topological parameters, such as branch diameter, length, and angles, can optimize crop yields through automated pruning and thinning platforms. In this study, we proposed a machine vision system to estimate canopy parameters in apple orchards and determine an optimal number of fruit for individual branches, providing a foundation for robotic pruning, flower thinning, and fruitlet thinning to achieve desired yield and quality. Using color and depth information from an RGB-D sensor (Microsoft Azure Kinect DK), a YOLOv8-based instance segmentation technique was developed to identify trunks and branches of apple trees during the dormant season. Principal Component Analysis was applied to estimate branch diameter (used to calculate limb cross-sectional area, or LCSA) and orientation. The estimated branch diameter was utilized to calculate LCSA, which served as an input for crop-load estimation, with larger LCSA values indicating a higher potential fruit-bearing capacity. RMSE for branch diameter estimation was 2.08 mm, and for crop-load estimation, 3.95. Based on commercial apple orchard management practices, the target crop-load (number of fruit) for each segmented branch was estimated with a mean absolute error (MAE) of 2.99 (ground truth crop-load was 6 apples per LCSA). This study demonstrated a promising workflow with high performance in identifying trunks and branches of apple trees in dynamic commercial orchard environments and integrating farm management practices into automated decision-making.

Membrane Potential Distribution Adjustment and Parametric Surrogate Gradient in Spiking Neural Networks

  • Authors: Siqi Wang, Tee Hiang Cheng, Meng-Hiot Lim
  • Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.13289
  • Pdf link: https://arxiv.org/pdf/2304.13289
  • Abstract
    As an emerging network model, spiking neural networks (SNNs) have attracted significant research attention in recent years. However, the energy-efficient binary spikes do not pair well with gradient descent-based training approaches. The surrogate gradient (SG) strategy is investigated and applied to circumvent this issue and train SNNs from scratch. Due to the lack of a well-recognized SG selection rule, most SGs are chosen intuitively. We propose the parametric surrogate gradient (PSG) method to iteratively update the SG and eventually determine an optimal surrogate gradient parameter, which calibrates the shape of candidate SGs. In SNNs, the neural potential distribution tends to deviate unpredictably due to quantization error. We evaluate this potential shift and propose a methodology for potential distribution adjustment (PDA) to minimize the loss of undesired pre-activations. Experimental results demonstrate that the proposed methods can be readily integrated with the backpropagation through time (BPTT) algorithm and help modulated SNNs achieve state-of-the-art performance on both static and dynamic datasets with fewer timesteps.
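
A sketch of a spike activation with a shape-parameterized surrogate gradient; in a PSG-style method the width parameter `k` would itself be updated during training, whereas this illustration keeps it as a plain argument (the exact SG family used by the paper is an assumption here):

```python
import torch

class SpikeFn(torch.autograd.Function):
    """Heaviside spike in the forward pass; a smooth surrogate derivative
    (exponential kernel of width k) in the backward pass."""
    @staticmethod
    def forward(ctx, v, k):
        ctx.save_for_backward(v)
        ctx.k = k
        return (v >= 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        k = ctx.k
        sg = 0.5 * k * torch.exp(-k * v.abs())  # surrogate derivative, peaks at v = 0
        return grad_out * sg, None              # no gradient w.r.t. k in this sketch

membrane = torch.randn(8, requires_grad=True)
spikes = SpikeFn.apply(membrane, 5.0)   # larger k -> sharper surrogate
```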

Systems Modeling for novice engineers to comprehend software products better

  • Authors: Mrityunjay Kumar, Venkatesh Choppella
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.13294
  • Pdf link: https://arxiv.org/pdf/2304.13294
  • Abstract
    One of the key challenges for a novice engineer in a product company is to comprehend the product sufficiently and quickly. It can take anywhere from six months to several years for them to attain mastery, but they need to start delivering results much sooner. SaaS (Software-as-a-Service) products have sophisticated system architectures, which add to the time and effort of understanding them. On the other hand, the time available to new hires for product understanding continues to be short and getting shorter, given the pressure to deliver more in less time. Constructivist theory views learning as a personal process in which the learner constructs new knowledge for themselves. Building and refining a mental model is the key way in which they learn, similar to how the brain operates. This paper presents an approach to improve the system comprehension process by using a system model that a) acts as a transitional object to aid and refine the mental model of the learner, and b) captures the current understanding of the dynamics of the software system in a way that can be reasoned with and simulated. We have adapted discrete systems modeling techniques and used a transition system as a lightweight modeling language. Such a model can be used by novice engineers during their product ramp-up phase to build a model of the software system that captures their knowledge of the system and aids their mental model. The paper also presents a learning approach in which learners create and refine these models iteratively, using available and newly uncovered knowledge about the software system. We hypothesize that by leveraging this modeling language and approach, novice engineers can reduce the time it takes them to achieve the desired proficiency level of system comprehension. This paper presents early ideas on this language and approach.

HiQ -- A Declarative, Non-intrusive, Dynamic and Transparent Observability and Optimization System

  • Authors: Fuheng Wu, Ivan Davchev, Jun Qian
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.13302
  • Pdf link: https://arxiv.org/pdf/2304.13302
  • Abstract
    This paper proposes a non-intrusive, declarative, dynamic and transparent system called HiQ to track Python program runtime information without compromising run-time system performance or losing insight. HiQ can be used for monolithic and distributed systems, and for offline and online applications. HiQ was developed while we optimized our large deep neural network (DNN) models, which are written in Python, but it can be generalized to any Python program or distributed system, or even other languages like Java. We have implemented the system and adopted it in our deep learning model life cycle management system to catch bottlenecks while keeping our production code clean and highly performant. The implementation is open-sourced at: https://github.com/oracle/hiq.

The physical Church thesis and the sensitivity to initial conditions

  • Authors: Gilles Dowek
  • Subjects: Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.13318
  • Pdf link: https://arxiv.org/pdf/2304.13318
  • Abstract
    The physical Church thesis is a thesis about nature that expresses that all that can be computed by a physical system -- a machine -- is computable in the sense of computability theory. At first look, this thesis seems contradicted by the existence, in nature, of chaotic dynamical systems, that is, systems whose evolution cannot be "computed" because of their sensitivity to initial conditions. The goal of this note is to show that there exist dynamical systems that are both computable and chaotic, and thus that the existence in nature of chaotic dynamical systems is not, per se, a refutation of the physical Church thesis. Thus, chaos seems to be compatible with computability, in the same way as it is compatible with determinism.
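
The logistic map is a classic example of a system given by a computable function that is nonetheless chaotic; the floating-point demo below merely illustrates sensitivity to initial conditions (the note's argument concerns exact computability, which floats only approximate):

```python
# The logistic map x -> 4x(1-x) is computable, yet two nearby initial
# conditions diverge after a few dozen iterations.
def logistic(x: float, n: int) -> float:
    for _ in range(n):
        x = 4.0 * x * (1.0 - x)
    return x

x0 = 0.3
print(logistic(x0, 50), logistic(x0 + 1e-12, 50))  # already far apart
```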

Event-triggered Boundary Control of a Class of Reaction-Diffusion PDEs with Time-dependent Reactivity

  • Authors: Bhathiya Rathnayake, Mamadou Diagne
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.13322
  • Pdf link: https://arxiv.org/pdf/2304.13322
  • Abstract
    This paper presents an event-triggered boundary control strategy for a class of reaction-diffusion PDEs with time-varying reactivity under Robin actuation. The control approach consists of a backstepping full-state feedback boundary controller and a dynamic event-triggering condition, which determines the time instants when the control input needs to be updated. It is proved that under the proposed event-triggered boundary control approach, there is a uniform minimal dwell-time between two event times. Furthermore, the well-posedness and the global exponential convergence of the closed-loop system to zero in $L^2$-sense are established. A simulation is conducted to validate the theoretical developments.

Evaluation of Regularization-based Continual Learning Approaches: Application to HAR

  • Authors: Bonpagna Kann (UGA, M-PSI), Sandra Castellanos-Paez (UGA, M-PSI), Philippe Lalanda (UGA, M-PSI)
  • Subjects: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.13327
  • Pdf link: https://arxiv.org/pdf/2304.13327
  • Abstract
    Pervasive computing allows the provision of services in many important areas, including the relevant and dynamic field of health and well-being. In this domain, Human Activity Recognition (HAR) has gained a lot of attention in recent years. Current solutions rely on Machine Learning (ML) models and achieve impressive results. However, the evolution of these models remains difficult, as long as a complete retraining is not performed. To overcome this problem, the concept of Continual Learning is very promising today and, more particularly, the techniques based on regularization. These techniques are particularly interesting for their simplicity and their low cost. Initial studies have been conducted and have shown promising outcomes. However, they remain very specific and difficult to compare. In this paper, we provide a comprehensive comparison of three regularization-based methods that we adapted to the HAR domain, highlighting their strengths and limitations. Our experiments were conducted on the UCI HAR dataset and the results showed that no single technique outperformed all others in all scenarios considered.
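
The abstract does not name the three methods compared; Elastic Weight Consolidation (EWC) is a canonical member of the regularization-based family and illustrates the common pattern of penalizing drift from previously learned weights:

```python
import torch

def ewc_penalty(model, old_params, fisher, lam=100.0):
    """Elastic Weight Consolidation: quadratic penalty anchoring each weight
    to its value after the previous task, scaled by its diagonal Fisher
    information. Used as: loss = task_loss + ewc_penalty(...)."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty
```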

Group Equivariant BEV for 3D Object Detection

  • Authors: Hongwei Liu, Jian Yang, Jianfeng Zhang, Dongheng Shao, Jielong Guo, Shaobo Li, Xuan Tang, Xian Wei
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13390
  • Pdf link: https://arxiv.org/pdf/2304.13390
  • Abstract
    Recently, 3D object detection has attracted significant attention and achieved continuous improvement in real road scenarios. Environmental information is collected from a single sensor or multi-sensor fusion to detect objects of interest. However, most current 3D object detection approaches focus on developing advanced network architectures to improve detection precision rather than considering dynamic driving scenes, where data collected from sensors equipped in the vehicle contain various perturbation features. As a result, existing work still cannot tackle the perturbation issue. In order to solve this problem, we propose a group equivariant bird's eye view network (GeqBevNet) based on group equivariant theory, which introduces the concept of group equivariance into the BEV fusion object detection network. The group equivariant network is embedded into the fused BEV feature map to facilitate BEV-level rotational equivariant feature extraction, thus leading to lower average orientation error. In order to demonstrate the effectiveness of GeqBevNet, the network is verified on the nuScenes validation dataset, on which mAOE can be decreased to 0.325. Experimental results demonstrate that GeqBevNet can extract more rotational equivariant features in 3D object detection of actual road scenes and improve the performance of object orientation prediction.

Acceleration for Timing-Aware Gate-Level Logic Simulation with One-Pass GPU Parallelism

  • Authors: Weijie Fang, Yanggeng Fu, Jiaquan Gao, Longkun Guo, Gregory Gutin, Xiaoyan Zhang
  • Subjects: Data Structures and Algorithms (cs.DS); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.13398
  • Pdf link: https://arxiv.org/pdf/2304.13398
  • Abstract
    Witnessing the advancing scale and complexity of chip design, and benefiting from high-performance computation technologies, the simulation of Very Large Scale Integration (VLSI) circuits imposes an increasing requirement for acceleration through parallel computing with GPU devices. However, conventional parallel strategies do not fully align with modern GPU abilities, leading to new challenges in the parallelism of VLSI simulation on GPUs, despite some previous successful demonstrations of significant acceleration. In this paper, we propose a novel approach to accelerate 4-value logic timing-aware gate-level logic simulation using waveform-based GPU parallelism. Our approach utilizes a new strategy that can effectively handle the dependency between tasks during parallelization, reducing the synchronization requirement between CPU and GPU when parallelizing the simulation of combinational circuits. This approach requires only one round of data transfer and hence achieves one-pass parallelism. Moreover, to overcome the difficulty of adopting our strategy on GPU devices, we design a series of data structures and tune them to dynamically allocate and store newly generated output of uncertain scale. Finally, experiments are carried out on industrial-scale open-source benchmarks to demonstrate the performance gain of our approach compared to several state-of-the-art baselines.

Secure Communication Model For Quantum Federated Learning: A Post Quantum Cryptography (PQC) Framework

  • Authors: Dev Gurung, Shiva Raj Pokhrel, Gang Li
  • Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13413
  • Pdf link: https://arxiv.org/pdf/2304.13413
  • Abstract
    We design a model of Post Quantum Cryptography (PQC) Quantum Federated Learning (QFL). We develop a framework with dynamic server selection and study convergence and security conditions. The implementation and results are publicly available.

FLEX: an Adaptive Exploration Algorithm for Nonlinear Systems

  • Authors: Matthieu Blanke, Marc Lelarge
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.13426
  • Pdf link: https://arxiv.org/pdf/2304.13426
  • Abstract
    Model-based reinforcement learning is a powerful tool, but collecting data to fit an accurate model of the system can be costly. Exploring an unknown environment in a sample-efficient manner is hence of great importance. However, the complexity of dynamics and the computational limitations of real systems make this task challenging. In this work, we introduce FLEX, an exploration algorithm for nonlinear dynamics based on optimal experimental design. Our policy maximizes the information of the next step and results in an adaptive exploration algorithm, compatible with generic parametric learning models and requiring minimal resources. We test our method on a number of nonlinear environments covering different settings, including time-varying dynamics. Keeping in mind that exploration is intended to serve an exploitation objective, we also test our algorithm on downstream model-based classical control tasks and compare it to other state-of-the-art model-based and model-free approaches. The performance achieved by FLEX is competitive and its computational cost is low.

On MPC-based Strategies for Optimal Voltage References in DC Microgrids

  • Authors: Pol Jané-Soneira, Ionela Prodan, Albertus Johannes Malan, Sören Hohmann
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.13495
  • Pdf link: https://arxiv.org/pdf/2304.13495
  • Abstract
    Modern power systems are characterized by low inertia and fast voltage dynamics due to the increase of sources connecting via power electronics and the removal of large traditional thermal generators. Power electronics are commonly equipped with fast controllers that are able to reach a desired voltage setpoint within seconds. In this paper, we propose and compare two approaches using Model Predictive Control (MPC) to compute optimal voltage references for the power electronic devices in order to minimize the losses in a DC microgrid: i) a traditional setpoint-tracking MPC which receives a previously computed optimal setpoint; ii) an economic MPC which does not require a priori computed setpoints. We show that the economic MPC outperforms the setpoint-tracking MPC in simulations with the CIGRE benchmark system when multiple load disturbances occur. Some insights and discussions related to the stability of the closed-loop system using its dissipativity properties are highlighted for both approaches.

Techno-Economic Assessment in Communications: New Challenges

  • Authors: Carlos Bendicho, Daniel Bendicho
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.13505
  • Pdf link: https://arxiv.org/pdf/2304.13505
  • Abstract
    This article shows a brief history of Techno-Economic Assessment (TEA) in Communications, a proposed redefinition of TEA as well as the new challenges derived from a dynamic context with cloud-native virtualized networks, the Helium Network & alike blockchain-based decentralized networks, the new network as a platform (NaaP) paradigm, carbon pricing, network sharing, and web3, metaverse and blockchain technologies. The authors formulate the research question and show the need to improve TEA models to integrate and manage all this increasing complexity. This paper also proposes the characteristics TEA models should have and their current degree of compliance for several use cases: 5G and beyond, software-defined wide area network (SD-WAN), secure access service edge (SASE), secure service edge (SSE), and cloud cybersecurity risk assessment. The authors also present TEA extensibility to request for proposals (RFP) processes and other industries, to conclude that there is an urgent need for agile and effective TEA in Comms that allows industrialization of agile decision-making for all market stakeholders to choose the optimal solution for any technology, scenario and use case.

A Secure Medical Record Sharing Scheme Based on Blockchain and Two-fold Encryption

  • Authors: Md. Ahsan Habib, Kazi Md. Rokibul Alam, Yasuhiko Morimoto
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.13511
  • Pdf link: https://arxiv.org/pdf/2304.13511
  • Abstract
    Usually, a medical record (MR) contains a patient's disease-oriented sensitive information. In addition, the MR needs to be shared among different bodies, e.g., diagnostic centres, hospitals, physicians, etc. Hence, preserving the privacy and integrity of MRs is crucial. A blockchain based secure MR sharing system can manage these aspects properly. This paper proposes a blockchain based electronic (e-) MR sharing scheme that (i) considers the medical image and the text as the input, (ii) enriches data privacy through a two-fold encryption mechanism consisting of an asymmetric cryptosystem and dynamic DNA encoding, (iii) assures data integrity by storing the encrypted e-MR in the distinct block designated for each user in the blockchain, and (iv) eventually enables authorized entities to regain the e-MR through decryption. Preliminary evaluations, analyses, and comparisons with state-of-the-art works imply the efficacy of the proposed scheme.
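
For intuition, here is a minimal *static* DNA encoding of bytes (two bits per nucleotide); the paper's *dynamic* DNA encoding would additionally vary the bit-to-base mapping, e.g., with key material, which this sketch omits:

```python
import secrets

BASES = "ACGT"  # 2 bits per nucleotide: 00->A, 01->C, 10->G, 11->T

def dna_encode(data: bytes) -> str:
    """Map each pair of bits in each byte to a base, high bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def dna_decode(seq: str) -> bytes:
    vals = [BASES.index(c) for c in seq]
    return bytes((vals[i] << 6) | (vals[i + 1] << 4) |
                 (vals[i + 2] << 2) | vals[i + 3]
                 for i in range(0, len(vals), 4))

ct = secrets.token_bytes(8)           # stand-in for the asymmetric ciphertext
assert dna_decode(dna_encode(ct)) == ct
```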

D-STACK: High Throughput DNN Inference by Effective Multiplexing and Spatio-Temporal Scheduling of GPUs

  • Authors: Aditya Dhakal, Sameer G. Kulkarni, K. K. Ramakrishnan
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.13541
  • Pdf link: https://arxiv.org/pdf/2304.13541
  • Abstract
    Hardware accelerators such as GPUs are required for real-time, low-latency inference with Deep Neural Networks (DNN). However, due to the inherent limits to the parallelism they can exploit, DNNs often under-utilize the capacity of today's high-end accelerators. Although spatial multiplexing of the GPU leads to higher GPU utilization and higher inference throughput, a number of challenges remain: finding the GPU percentage for right-sizing the GPU for each DNN through profiling, determining an optimal batching of requests that balances throughput improvement against application-specific deadlines and service level objectives (SLOs), and maximizing throughput by appropriately scheduling DNNs. This paper introduces a dynamic and fair spatio-temporal scheduler (D-STACK) that enables multiple DNNs to run in the GPU concurrently. To help allocate the appropriate GPU percentage (we call it the "Knee"), we develop and validate a model that estimates the parallelism each DNN can utilize. We also develop a lightweight optimization formulation to find an efficient batch size for each DNN operating with D-STACK. We bring together our optimizations and our spatio-temporal scheduler to provide a holistic inference framework. We demonstrate its ability to provide high throughput while meeting application SLOs. We compare D-STACK with an ideal scheduler that can allocate the right GPU percentage for every DNN kernel: D-STACK achieves more than 90 percent of the ideal scheduler's throughput and GPU utilization. We also compare D-STACK with other GPU multiplexing and scheduling methods (e.g., NVIDIA Triton, Clipper, Nexus), using popular DNN models. Our controlled experiments with multiplexing several popular DNN models achieve up to 1.6X improvement in GPU utilization and up to 4X improvement in inference throughput.
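
A hypothetical sketch of the Knee idea: given profiled (GPU share, throughput) pairs for one DNN, pick the smallest share beyond which extra GPU yields little additional throughput. The tolerance and numbers are illustrative, not from the paper:

```python
# Hypothetical sketch of locating the "Knee": the smallest GPU share beyond
# which a DNN's measured throughput stops improving meaningfully.
def find_knee(gpu_shares, throughputs, tol=0.05):
    """gpu_shares: ascending GPU percentages profiled for one DNN;
    throughputs: measured inference throughput at each share."""
    for i in range(1, len(gpu_shares)):
        gain = (throughputs[i] - throughputs[i - 1]) / throughputs[i - 1]
        if gain < tol:               # marginal gain has flattened out
            return gpu_shares[i - 1]
    return gpu_shares[-1]            # never flattened: give it the whole GPU

# Profiled points for one model (illustrative numbers):
print(find_knee([10, 20, 30, 40, 50], [100, 180, 240, 250, 252]))  # -> 30
```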

FLCC: Efficient Distributed Federated Learning on IoMT over CSMA/CA

  • Authors: Abdelaziz Salama, Syed Ali Zaidi, Des McLernon, Mohammed M. H. Qazzaz
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.13549
  • Pdf link: https://arxiv.org/pdf/2304.13549
  • Abstract
    Federated Learning (FL) has emerged as a promising approach for privacy preservation, allowing the sharing of model parameters between users and the cloud server rather than raw local data. FL approaches have been adopted as a cornerstone of distributed machine learning (ML) to solve several complex use cases. FL presents an interesting interplay between communication and ML performance when implemented over distributed wireless nodes, where the dynamics of both networking and learning play an important role. In this article, we investigate the performance of FL on an application that might be used to improve a remote healthcare system over ad hoc networks that employ CSMA/CA to schedule their transmissions. Our FL over CSMA/CA (FLCC) model is designed to eliminate untrusted devices and harness frequency reuse and spatial clustering techniques to improve the throughput required for coordinating a distributed implementation of FL in the wireless network. In our proposed model, frequency allocation is performed on the basis of spatial clustering using virtual cells. Each cell assigns an FL server and dedicated carrier frequencies to exchange the updated model parameters within the cell. We present two metrics to evaluate the network performance: 1) the probability of successful transmission while minimizing interference, and 2) the performance of the distributed FL model in terms of accuracy and loss while accounting for the networking dynamics. We benchmark the proposed approach using the well-known MNIST dataset. We demonstrate that the proposed approach outperforms baseline FL algorithms by explicitly defining the criteria for choosing users, and achieves high accuracy in a robust network.
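
The aggregation step each cell's FL server would run is, at its core, a weighted parameter average (FedAvg). A minimal sketch, with trust filtering and CSMA/CA scheduling abstracted away:

```python
# Minimal sketch of the server-side aggregation an FL cell would run after
# collecting updated parameters from its scheduled clients.
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Weighted average of per-client parameter arrays (FedAvg)."""
    total = sum(client_sizes)
    return [
        sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in range(len(client_weights[0]))
    ]

# Two clients, one weight matrix and one bias vector each (toy numbers):
w_a = [np.ones((2, 2)), np.zeros(2)]
w_b = [3 * np.ones((2, 2)), np.ones(2)]
print(fed_avg([w_a, w_b], client_sizes=[100, 300]))  # biased toward client b
```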

Turning block-sequential automata networks into smaller parallel networks with isomorphic limit dynamics

  • Authors: Pacôme Perrotin, Sylvain Sené
  • Subjects: Discrete Mathematics (cs.DM)
  • Arxiv link: https://arxiv.org/abs/2304.13550
  • Pdf link: https://arxiv.org/pdf/2304.13550
  • Abstract
    We state an algorithm that, given an automata network and a block-sequential update schedule, produces an automata network of the same size or smaller with the same limit dynamics under the parallel update schedule. Then, we focus on the family of automata cycles that share a unique path of automata, called tangential cycles, and show that a restriction of our algorithm can reduce any instance of these networks under a block-sequential update schedule into a smaller parallel network of the family, and can characterize the number of reductions operated while conserving their limit dynamics. We also show that any tangential cycles reduced by our main algorithm are transformed into a network whose size is that of the largest cycle of the initial network. We end by showing that the restricted algorithm allows the direct characterization of block-sequential double cycles as parallel ones.

Latency Target based Analysis of the DASH.js Player

  • Authors: Piers O'Hanlon, Adil Aslam
  • Subjects: Multimedia (cs.MM); Networking and Internet Architecture (cs.NI); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.13551
  • Pdf link: https://arxiv.org/pdf/2304.13551
  • Abstract
    We analyse the low latency performance of the three Adaptive Bitrate (ABR) algorithms in the dash.js Dynamic Adaptive Streaming over HTTP (DASH) player with respect to a range of latency targets and configuration options. We perform experiments on our DASH Testbed which allows for testing with a range of real world derived network profiles. Our experiments enable a better understanding of how latency targets affect quality of experience (QoE), and how well the different algorithms adhere to their targets. We find that with dash.js v4.5.0 the default Dynamic algorithm achieves the best overall QoE. We show that whilst the other algorithms can achieve higher video quality at lower latencies, they do so only at the expense of increased stalling. We analyse the poor performance of L2A-LL in our tests and develop modifications which demonstrate significant improvements. We also highlight how some low latency configuration settings can be detrimental to performance.

Leapfrog methods for relativistic charged-particle dynamics

  • Authors: Ernst Hairer, Christian Lubich, Yanyan Shi
  • Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph); Plasma Physics (physics.plasm-ph)
  • Arxiv link: https://arxiv.org/abs/2304.13578
  • Pdf link: https://arxiv.org/pdf/2304.13578
  • Abstract
    A basic leapfrog integrator and its energy-preserving and variational / symplectic variants are proposed and studied for the numerical integration of the equations of motion of relativistic charged particles in an electromagnetic field. The methods are based on a four-dimensional formulation of the equations of motion. Structure-preserving properties of the numerical methods are analysed, in particular conservation and long-time near-conservation of energy and mass shell as well as preservation of volume in phase space. In the non-relativistic limit, the considered methods reduce to the Boris algorithm for non-relativistic charged-particle dynamics and its energy-preserving and variational / symplectic variants.
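
For reference, here is a compact sketch of the relativistic Boris push that the proposed leapfrog methods reduce to in the non-relativistic limit (the standard textbook formulation, not code from the paper):

```python
# Relativistic Boris push: half electric kick, magnetic rotation, half kick.
# q, m are charge and mass, dt the step, E and B the fields at the position.
import numpy as np

def boris_push(u, E, B, q, m, dt, c=299792458.0):
    """Advance u = gamma*v by one time step; returns the updated u."""
    qmdt2 = q * dt / (2 * m)
    u_minus = u + qmdt2 * E                      # first half electric kick
    gamma = np.sqrt(1 + np.dot(u_minus, u_minus) / c**2)
    t = qmdt2 * B / gamma                        # magnetic rotation vector
    s = 2 * t / (1 + np.dot(t, t))
    u_prime = u_minus + np.cross(u_minus, t)
    u_plus = u_minus + np.cross(u_prime, s)      # full magnetic rotation
    return u_plus + qmdt2 * E                    # second half electric kick
```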

Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

  • Authors: Tuomas Haarnoja, Ben Moran, Guy Lever, Sandy H. Huang, Dhruva Tirumala, Markus Wulfmeier, Jan Humplik, Saran Tunyasuvunakool, Noah Y. Siegel, Roland Hafner, Michael Bloesch, Kristian Hartikainen, Arunkumar Byravan, Leonard Hasenclever, Yuval Tassa, Fereshteh Sadeghi, Nathan Batchelor, Federico Casarini, Stefano Saliceti, Charles Game, Neil Sreendra, Kushal Patel, Marlon Gwira, Andrea Huber, Nicole Hurley, Francesco Nori, Raia Hadsell, Nicolas Heess
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13653
  • Pdf link: https://arxiv.org/pdf/2304.13653
  • Abstract
    We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. We first trained individual skills in isolation and then composed those skills end-to-end in a self-play setting. The resulting policy exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and transitions between them in a smooth, stable, and efficient manner - well beyond what is intuitively expected from the robot. The agents also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. The full range of behaviors emerged from a small set of simple rewards. Our agents were trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer, despite significant unmodeled effects and variations across robot instances. Although the robots are inherently fragile, minor hardware modifications together with basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way. Indeed, even though the agents were optimized for scoring, in experiments they walked 156% faster, took 63% less time to get up, and kicked 24% faster than a scripted baseline, while efficiently combining the skills to achieve the longer term objectives. Examples of the emergent behaviors and full 1v1 matches are available on the supplementary website.

Learning battery model parameter dynamics from data with recursive Gaussian process regression

  • Authors: Antti Aitio, Dominik Jöst, Dirk Uwe Sauer, David A. Howey
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13666
  • Pdf link: https://arxiv.org/pdf/2304.13666
  • Abstract
    Estimating state of health is a critical function of a battery management system but remains challenging due to the variability of operating conditions and usage requirements of real applications. As a result, techniques based on fitting equivalent circuit models may exhibit inaccuracy at extremes of performance and over long-term ageing, or instability of parameter estimates. Pure data-driven techniques, on the other hand, suffer from a lack of generality beyond their training dataset. In this paper, we propose a hybrid approach combining data- and model-driven techniques for battery health estimation. Specifically, we demonstrate a Bayesian data-driven method, Gaussian process regression, to estimate model parameters as functions of states, operating conditions, and lifetime. Computational efficiency is ensured through a recursive approach yielding a unified joint state-parameter estimator that learns parameter dynamics from data and is robust to gaps and varying operating conditions. Results show the efficacy of the method on both simulated and measured data, including accurate estimates and forecasts of battery capacity and internal resistance. This opens up new opportunities to understand battery ageing in real applications.
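
A much-simplified illustration of the idea: learn an equivalent-circuit parameter such as internal resistance as a smooth function of state of charge, temperature, and age via GP regression. The paper's recursive joint state-parameter estimator is replaced here by a plain batch fit on synthetic data:

```python
# Illustrative (non-recursive) version: fit R0 as a function of SOC,
# temperature, and ageing time with GP regression on synthetic data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform([0.1, 5.0, 0.0], [0.9, 45.0, 1000.0], size=(200, 3))  # SOC, degC, hours
r0 = 0.02 + 0.01 * (1 - X[:, 0]) + 1e-5 * X[:, 2] + 0.001 * rng.standard_normal(200)

gp = GaussianProcessRegressor(kernel=RBF([0.2, 10.0, 200.0]) + WhiteKernel(1e-6))
gp.fit(X, r0)
mean, std = gp.predict([[0.5, 25.0, 500.0]], return_std=True)
print(f"R0 estimate: {mean[0]:.4f} +/- {std[0]:.4f} ohm")
```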

A Control-Centric Benchmark for Video Prediction

  • Authors: Stephen Tian, Chelsea Finn, Jiajun Wu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.13723
  • Pdf link: https://arxiv.org/pdf/2304.13723
  • Abstract
    Video is a promising source of knowledge for embodied agents to learn models of the world's dynamics. Large deep networks have become increasingly effective at modeling complex video data in a self-supervised manner, as evaluated by metrics based on human perceptual similarity or pixel-wise comparison. However, it remains unclear whether current metrics are accurate indicators of performance on downstream tasks. We find empirically that for planning robotic manipulation, existing metrics can be unreliable at predicting execution success. To address this, we propose a benchmark for action-conditioned video prediction in the form of a control benchmark that evaluates a given model for simulated robotic manipulation through sampling-based planning. Our benchmark, Video Prediction for Visual Planning ($VP^2$), includes simulated environments with 11 task categories and 310 task instance definitions, a full planning implementation, and training datasets containing scripted interaction trajectories for each task category. A central design goal of our benchmark is to expose a simple interface -- a single forward prediction call -- so it is straightforward to evaluate almost any action-conditioned video prediction model. We then leverage our benchmark to study the effects of scaling model size, quantity of training data, and model ensembling by analyzing five highly-performant video prediction models, finding that while scale can improve perceptual quality when modeling visually diverse settings, other attributes such as uncertainty awareness can also aid planning performance.
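
The "single forward prediction call" design goal suggests an interface shaped roughly like the following. This is a hypothetical signature for illustration, not the benchmark's actual API:

```python
# Hypothetical shape of the single forward-prediction entry point a model
# would expose to the benchmark's sampling-based planner.
import numpy as np

class MyVideoPredictionModel:
    def __call__(self, context_frames: np.ndarray, actions: np.ndarray) -> np.ndarray:
        """context_frames: (B, T_ctx, H, W, C); actions: (B, T, A).
        Returns predicted frames of shape (B, T, H, W, C)."""
        T = actions.shape[1]
        # Trivial baseline: repeat the last context frame for every future step.
        last = context_frames[:, -1:]
        return np.repeat(last, T, axis=1)
```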

New submissions for Fri, 17 Mar 23

Keyword: pruning

There is no result

Keyword: neural\ architecture\ search

There is no result

Keyword: 3d object detection

Among Us: Adversarially Robust Collaborative Perception by Consensus

  • Authors: Yiming Li, Qi Fang, Jiamu Bai, Siheng Chen, Felix Juefei-Xu, Chen Feng
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2303.09495
  • Pdf link: https://arxiv.org/pdf/2303.09495
  • Abstract
    Multiple robots can perceive a scene (e.g., detect objects) collaboratively better than individuals, but are vulnerable to adversarial attacks when using deep learning. This could be addressed by adversarial defense, but training such a defense requires the often-unknown attacking mechanism. In contrast, we propose ROBOSAC, a novel sampling-based defense strategy generalizable to unseen attackers. Our key idea is that collaborative perception should lead to consensus rather than dissensus in results compared to individual perception. This leads to our hypothesize-and-verify framework: perception results with and without collaboration from a random subset of teammates are compared until a consensus is reached. In such a framework, more teammates in the sampled subset often entail better perception performance but require longer sampling time to reject potential attackers. Thus, we derive how many sampling trials are needed to ensure the desired size of an attacker-free subset, or equivalently, the maximum size of such a subset that we can successfully sample within a given number of trials. We validate our method on the task of collaborative 3D object detection in autonomous driving scenarios.
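
A RANSAC-style back-of-the-envelope version of the trials-vs-subset-size trade-off (our reading of the setup, not the paper's exact derivation):

```latex
% With attacker ratio \eta among teammates and sampled subset size s, a subset
% is attacker-free with probability roughly (1-\eta)^s, so to find at least one
% clean subset with confidence p one needs about
n \;\ge\; \frac{\log(1 - p)}{\log\!\left(1 - (1-\eta)^{s}\right)}
% sampling trials -- larger subsets s drive the required n up quickly.
```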

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

  • Authors: Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09551
  • Pdf link: https://arxiv.org/pdf/2303.09551
  • Abstract
    3D scene understanding plays a vital role in vision-based autonomous driving. While most existing methods focus on 3D object detection, they have difficulty describing real-world objects of arbitrary shapes and infinite classes. Towards a more comprehensive perception of a 3D scene, in this paper, we propose a SurroundOcc method to predict the 3D occupancy with multi-camera images. We first extract multi-scale features for each image and adopt spatial 2D-3D attention to lift them to the 3D volume space. Then we apply 3D convolutions to progressively upsample the volume features and impose supervision on multiple levels. To obtain dense occupancy prediction, we design a pipeline to generate dense occupancy ground truth without expensive occupancy annotations. Specifically, we fuse multi-frame LiDAR scans of dynamic objects and static scenes separately. Then we adopt Poisson Reconstruction to fill the holes and voxelize the mesh to get dense occupancy labels. Extensive experiments on the nuScenes and SemanticKITTI datasets demonstrate the superiority of our method. Code and dataset are available at https://github.com/weiyithu/SurroundOcc
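
A rough Open3D sketch of the dense-label pipeline described above: fuse (already aligned) LiDAR points, Poisson-reconstruct a watertight mesh, then voxelize it into occupancy. Scan alignment, the dynamic/static split, and semantic labels are omitted, and the parameters are illustrative:

```python
# Sketch of: fused LiDAR points -> Poisson mesh -> occupancy voxel grid.
import numpy as np
import open3d as o3d

def dense_occupancy(fused_points: np.ndarray, voxel_size: float = 0.5):
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(fused_points)
    pcd.estimate_normals()                     # Poisson needs oriented normals
    mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)
    return o3d.geometry.VoxelGrid.create_from_triangle_mesh(mesh, voxel_size)
```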

Keyword: voxel

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

  • Authors: Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09551
  • Pdf link: https://arxiv.org/pdf/2303.09551
  • Abstract
    3D scene understanding plays a vital role in vision-based autonomous driving. While most existing methods focus on 3D object detection, they have difficulty describing real-world objects of arbitrary shapes and infinite classes. Towards a more comprehensive perception of a 3D scene, in this paper, we propose a SurroundOcc method to predict the 3D occupancy with multi-camera images. We first extract multi-scale features for each image and adopt spatial 2D-3D attention to lift them to the 3D volume space. Then we apply 3D convolutions to progressively upsample the volume features and impose supervision on multiple levels. To obtain dense occupancy prediction, we design a pipeline to generate dense occupancy ground truth without expensive occupancy annotations. Specifically, we fuse multi-frame LiDAR scans of dynamic objects and static scenes separately. Then we adopt Poisson Reconstruction to fill the holes and voxelize the mesh to get dense occupancy labels. Extensive experiments on the nuScenes and SemanticKITTI datasets demonstrate the superiority of our method. Code and dataset are available at https://github.com/weiyithu/SurroundOcc

Keyword: lidar

SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments

  • Authors: Yudi Dai (1), Yitai Lin (1), Xiping Lin (2), Chenglu Wen (1), Lan Xu (2), Hongwei Yi (3), Siqi Shen (1), Yuexin Ma (2), Cheng Wang (1) ((1) Xiamen University, China, (2) ShanghaiTech University, China, (3) Max Planck Institute for Intelligent Systems, Germany)
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09095
  • Pdf link: https://arxiv.org/pdf/2303.09095
  • Abstract
    We present SLOPER4D, a novel scene-aware dataset collected in large urban environments to facilitate the research of global human pose estimation (GHPE) with human-scene interaction in the wild. Employing a head-mounted device integrated with a LiDAR and camera, we record 12 human subjects' activities over 10 diverse urban scenes from an egocentric view. Frame-wise annotations for 2D key points, 3D pose parameters, and global translations are provided, together with reconstructed scene point clouds. To obtain accurate 3D ground truth in such large dynamic scenes, we propose a joint optimization method to fit local SMPL meshes to the scene and fine-tune the camera calibration during dynamic motions frame by frame, resulting in plausible and scene-natural 3D human poses. Eventually, SLOPER4D consists of 15 sequences of human motions, each of which has a trajectory length of more than 200 meters (up to 1,300 meters) and covers an area of more than 2,000 $m^2$ (up to 13,000 $m^2$), including more than 100K LiDAR frames, 300K video frames, and 500K IMU-based motion frames. With SLOPER4D, we provide a detailed and thorough analysis of two critical tasks, including camera-based 3D HPE and LiDAR-based 3D HPE in urban environments, and benchmark a new task, GHPE. The in-depth analysis demonstrates SLOPER4D poses significant challenges to existing methods and offers great research opportunities. The dataset and code are released at \url{this http URL}

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

  • Authors: Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09551
  • Pdf link: https://arxiv.org/pdf/2303.09551
  • Abstract
    3D scene understanding plays a vital role in vision-based autonomous driving. While most existing methods focus on 3D object detection, they have difficulty describing real-world objects of arbitrary shapes and infinite classes. Towards a more comprehensive perception of a 3D scene, in this paper, we propose a SurroundOcc method to predict the 3D occupancy with multi-camera images. We first extract multi-scale features for each image and adopt spatial 2D-3D attention to lift them to the 3D volume space. Then we apply 3D convolutions to progressively upsample the volume features and impose supervision on multiple levels. To obtain dense occupancy prediction, we design a pipeline to generate dense occupancy ground truth without expensive occupancy annotations. Specifically, we fuse multi-frame LiDAR scans of dynamic objects and static scenes separately. Then we adopt Poisson Reconstruction to fill the holes and voxelize the mesh to get dense occupancy labels. Extensive experiments on the nuScenes and SemanticKITTI datasets demonstrate the superiority of our method. Code and dataset are available at https://github.com/weiyithu/SurroundOcc

New submissions for Fri, 28 Apr 23

Keyword: efficient

SamurAI: A Versatile IoT Node With Event-Driven Wake-Up and Embedded ML Acceleration

  • Authors: Ivan Miro-Panades (LSTA), Benoit Tain (LECA), Jean-Frederic Christmann (LFIM), David Coriat (LIIM), Romain Lemaire (LIIM), Clement Jany, Baudouin Martineau (DSYS), Fabrice Chaix (DSYS), Guillaume Waltener (DSYS), Emmanuel Pluchart (LSTA), Jean-Philippe Noel (LFIM), Adam Makosiej, Maxime Montoya, Simone Bacles-Min (LIIM), David Briand (LIAE), Jean-Marc Philippe, Yvain Thonnart (LFIM), Alexandre Valentian (LSTA), Frederic Heitzmann (DSYS), Fabien Clermidy (DSCIN)
  • Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13726
  • Pdf link: https://arxiv.org/pdf/2304.13726
  • Abstract
    Increased capabilities such as recognition and self-adaptability are now required from IoT applications. While IoT node power consumption is a major concern for these applications, cloud-based processing is becoming unsustainable due to continuous sensor or image data transmission over the wireless network. Thus, optimized ML capabilities and data transfers should be integrated in the IoT node. Moreover, IoT applications are torn between sporadic data-logging and energy-hungry data processing (e.g. image classification). The versatility of the node is therefore key in addressing this wide diversity of energy and processing needs. This paper presents SamurAI, a versatile IoT node bridging this gap in processing and in energy by leveraging two on-chip sub-systems: a low power, clock-less, event-driven Always-Responsive (AR) part and an energy-efficient On-Demand (OD) part. AR contains a 1.7MOPS event-driven, asynchronous Wake-up Controller (WuC) with a 207ns wake-up time optimized for sporadic computing, while OD combines a deep-sleep RISC-V CPU and 1.3TOPS/W Machine Learning (ML) for more complex tasks up to 36GOPS. This architecture partitioning achieves best-in-class versatility metrics such as peak performance to idle power ratio. On an applicative classification scenario, it demonstrates system power gains of up to 3.5x compared to cloud-based processing, and thus extended battery lifetime.

A Unified Approach to Lane Change Intention Recognition and Driving Status Prediction through TCN-LSTM and Multi-Task Learning Models

  • Authors: Renteng Yuan, Mohamed Abdel-Aty, Xin Gu, Ou Zheng, Qiaojun Xiang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13732
  • Pdf link: https://arxiv.org/pdf/2304.13732
  • Abstract
    Lane change (LC) is a continuous and complex operation process. Accurately detecting and predicting LC processes can help traffic participants better understand their surrounding environment, recognize potential LC safety hazards, and improve traffic safety. This paper focuses on LC processes, developing an LC intention recognition (LC-IR) model and an LC status prediction (LC-SP) model. A novel ensemble temporal convolutional network with Long Short-Term Memory units (TCN-LSTM) is first proposed to capture long-range dependencies in sequential data. Then, three multi-task models (MTL-LSTM, MTL-TCN, MTL-TCN-LSTM) are developed to capture the intrinsic relationship among output indicators. Furthermore, a unified modeling framework for LC intention recognition and driving status prediction (LC-IR-SP) is developed. To validate the performance of the proposed models, a total of 1,023 vehicle trajectories are extracted from the CitySim dataset. The Pearson coefficient is employed to determine the related indicators. The results indicate that, using 150 frames as the input length, the TCN-LSTM model with 96.67% accuracy outperforms the TCN and LSTM models in LC intention classification and provides more balanced results for each class. The three proposed multi-task learning models provide markedly increased performance compared to the corresponding single-task models, with average reductions of 24.24% and 22.86% in the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), respectively. The developed LC-IR-SP model has promising applications for autonomous vehicles to identify lane-change behaviors, calculate a real-time traffic conflict index and improve vehicle control strategies.
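
A minimal PyTorch sketch of a TCN-LSTM stack of the kind described: dilated 1-D convolutions feed an LSTM, whose last hidden state is classified into intention classes. Layer sizes and feature counts are illustrative, not the paper's:

```python
# Illustrative TCN-LSTM: dilated temporal convolutions -> LSTM -> classifier.
import torch
import torch.nn as nn

class TCNLSTM(nn.Module):
    def __init__(self, n_features=8, n_classes=3, channels=32, hidden=64):
        super().__init__()
        self.tcn = nn.Sequential(
            nn.Conv1d(n_features, channels, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=4, dilation=4),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, time, features)
        z = self.tcn(x.transpose(1, 2))   # convolve over time: (batch, ch, time)
        out, _ = self.lstm(z.transpose(1, 2))
        return self.head(out[:, -1])      # logits from the last time step

logits = TCNLSTM()(torch.randn(4, 150, 8))  # 150-frame windows, as in the paper
print(logits.shape)                          # torch.Size([4, 3])
```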

Surrogate Assisted Generation of Human-Robot Interaction Scenarios

  • Authors: Varun Bhatt, Heramb Nemlekar, Matthew Fontaine, Bryon Tjanaka, Hejia Zhang, Ya-Chuan Hsu, Stefanos Nikolaidis
  • Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13787
  • Pdf link: https://arxiv.org/pdf/2304.13787
  • Abstract
    As human-robot interaction (HRI) systems advance, so does the difficulty of evaluating and understanding the strengths and limitations of these systems in different environments and with different users. To this end, previous methods have algorithmically generated diverse scenarios that reveal system failures in a shared control teleoperation task. However, these methods require directly evaluating generated scenarios by simulating robot policies and human actions. The computational cost of these evaluations limits their applicability in more complex domains. Thus, we propose augmenting scenario generation systems with surrogate models that predict both human and robot behaviors. In the shared control teleoperation domain and a more complex shared workspace collaboration task, we show that surrogate assisted scenario generation efficiently synthesizes diverse datasets of challenging scenarios. We demonstrate that these failures are reproducible in real-world interactions.

A Data-Driven Hybrid Automaton Framework to Modeling Complex Dynamical Systems

  • Authors: Yejiang Yang, Zihao Mo, Weiming Xiang
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.13811
  • Pdf link: https://arxiv.org/pdf/2304.13811
  • Abstract
    In this paper, a computationally efficient data-driven hybrid automaton model is proposed to capture unknown complex dynamical system behaviors using multiple neural networks. The sampled data of the system is divided by valid partitions into groups corresponding to their topologies, based on which transition guards are defined. Then, a collection of computationally efficient small-scale neural networks is trained as the local dynamical description for the corresponding topologies. After modeling the system with a neural-network-based hybrid automaton, a set-valued reachability analysis with low computation cost is provided based on interval analysis and a split-and-combine process. Finally, a numerical example of the limit cycle is presented to illustrate that the developed models can significantly reduce the computational cost in reachable set computation without sacrificing any modeling precision.

Programmatically Grounded, Compositionally Generalizable Robotic Manipulation

  • Authors: Renhao Wang, Jiayuan Mao, Joy Hsu, Hang Zhao, Jiajun Wu, Yang Gao
  • Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.13826
  • Pdf link: https://arxiv.org/pdf/2304.13826
  • Abstract
    Robots operating in the real world require both rich manipulation skills as well as the ability to semantically reason about when to apply those skills. Towards this goal, recent works have integrated semantic representations from large-scale pretrained vision-language (VL) models into manipulation models, imparting them with more general reasoning capabilities. However, we show that the conventional pretraining-finetuning pipeline for integrating such representations entangles the learning of domain-specific action information and domain-general visual information, leading to less data-efficient training and poor generalization to unseen objects and tasks. To this end, we propose ProgramPort, a modular approach to better leverage pretrained VL models by exploiting the syntactic and semantic structures of language instructions. Our framework uses a semantic parser to recover an executable program, composed of functional modules grounded on vision and action across different modalities. Each functional module is realized as a combination of deterministic computation and learnable neural networks. Program execution produces parameters to general manipulation primitives for a robotic end-effector. The entire modular network can be trained with end-to-end imitation learning objectives. Experiments show that our model successfully disentangles action and perception, translating to improved zero-shot and compositional generalization in a variety of manipulation behaviors. Project webpage at: \url{https://progport.github.io}.

Physics-informed Data-driven Discovery of Constitutive Models with Application to Strain-Rate-sensitive Soft Materials

  • Authors: Kshitiz Upadhyay, Jan N. Fuhg, Nikolaos Bouklas, K.T. Ramesh
  • Subjects: Computational Engineering, Finance, and Science (cs.CE); Materials Science (cond-mat.mtrl-sci); Soft Condensed Matter (cond-mat.soft)
  • Arxiv link: https://arxiv.org/abs/2304.13897
  • Pdf link: https://arxiv.org/pdf/2304.13897
  • Abstract
    A novel data-driven constitutive modeling approach is proposed, which combines the physics-informed nature of modeling based on continuum thermodynamics with the benefits of machine learning. This approach is demonstrated on strain-rate-sensitive soft materials. This model is based on the viscous dissipation-based visco-hyperelasticity framework where the total stress is decomposed into volumetric, isochoric hyperelastic, and isochoric viscous overstress contributions. It is shown that each of these stress components can be written as linear combinations of the components of an irreducible integrity basis. Three Gaussian process regression-based surrogate models are trained (one per stress component) between principal invariants of strain and strain rate tensors and the corresponding coefficients of the integrity basis components. It is demonstrated that this type of model construction enforces key physics-based constraints on the predicted responses: the second law of thermodynamics, the principles of local action and determinism, objectivity, the balance of angular momentum, an assumed reference state, isotropy, and limited memory. The three surrogate models that constitute our constitutive model are evaluated by training them on small-size numerically generated data sets corresponding to a single deformation mode and then analyzing their predictions over a much wider testing regime comprising multiple deformation modes. Our physics-informed data-driven constitutive model predictions are compared with the corresponding predictions of classical continuum thermodynamics-based and purely data-driven models. It is shown that our surrogate models can reasonably capture the stress-strain-strain rate responses in both training and testing regimes, and provide improvements in terms of prediction accuracy, generalizability to multiple deformation modes, and compatibility with limited data.

MIPI 2023 Challenge on RGB+ToF Depth Completion: Methods and Results

  • Authors: Qingpeng Zhu, Wenxiu Sun, Yuekun Dai, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Qianhui Sun, Chen Change Loy, Jinwei Gu, Yi Yu, Yangke Huang, Kang Zhang, Meiya Chen, Yu Wang, Yongchao Li, Hao Jiang, Amrit Kumar Muduli, Vikash Kumar, Kunal Swami, Pankaj Kumar Bajpai, Yunchao Ma, Jiajun Xiao, Zhi Ling
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13916
  • Pdf link: https://arxiv.org/pdf/2304.13916
  • Abstract
    Depth completion from RGB images and sparse Time-of-Flight (ToF) measurements is an important problem in computer vision and robotics. While traditional methods for depth completion have relied on stereo vision or structured light techniques, recent advances in deep learning have enabled more accurate and efficient completion of depth maps from RGB images and sparse ToF measurements. To evaluate the performance of different depth completion methods, we organized an RGB+sparse ToF depth completion competition. The competition aimed to encourage research in this area by providing a standardized dataset and evaluation metrics to compare the accuracy of different approaches. In this report, we present the results of the competition and analyze the strengths and weaknesses of the top-performing methods. We also discuss the implications of our findings for future research in RGB+sparse ToF depth completion. We hope that this competition and report will help to advance the state-of-the-art in this important area of research. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2023.

Proportionally Representative Clustering

  • Authors: Haris Aziz, Barton E. Lee, Sean Morota Chu
  • Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2304.13917
  • Pdf link: https://arxiv.org/pdf/2304.13917
  • Abstract
    In recent years, there has been a surge in effort to formalize notions of fairness in machine learning. We focus on clustering -- one of the fundamental tasks in unsupervised machine learning. We propose a new axiom that captures proportional representation fairness (PRF). We make a case that the concept achieves the raison d'être of several existing concepts in the literature in an arguably more convincing manner. Our fairness concept is not satisfied by existing fair clustering algorithms. We design efficient algorithms to achieve PRF both for unconstrained and discrete clustering problems.

SkinSAM: Empowering Skin Cancer Segmentation with Segment Anything Model

  • Authors: Mingzhe Hu, Yuheng Li, Xiaofeng Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13973
  • Pdf link: https://arxiv.org/pdf/2304.13973
  • Abstract
    Skin cancer is a prevalent and potentially fatal disease that requires accurate and efficient diagnosis and treatment. Although manual tracing is the current standard in clinics, automated tools are desired to reduce human labor and improve accuracy. However, developing such tools is challenging due to the highly variable appearance of skin cancers and complex objects in the background. In this paper, we present SkinSAM, a fine-tuned model based on the Segment Anything Model that showed outstanding segmentation performance. The models are validated on the HAM10000 dataset, which includes 10,015 dermatoscopic images. While the larger models (ViT_L, ViT_H) performed better than the smaller one (ViT_b), the fine-tuned model (ViT_b_finetuned) exhibited the greatest improvement, with a mean pixel accuracy of 0.945, a mean Dice score of 0.8879, and a mean IoU score of 0.7843. Among the lesion types, vascular lesions showed the best segmentation results. Our research demonstrates the great potential of adapting SAM to medical image segmentation tasks.

An FPTAS for Budgeted Laminar Matroid Independent Set

  • Authors: Ilan Doron-Arad, Ariel Kulik, Hadas Shachnai
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.13984
  • Pdf link: https://arxiv.org/pdf/2304.13984
  • Abstract
    We study the budgeted laminar matroid independent set problem. The input is a ground set, where each element has a cost and a non-negative profit, along with a laminar matroid over the elements and a budget. The goal is to select a maximum profit independent set of the matroid whose total cost is bounded by the budget. Several well known special cases, where we have, e.g., no matroid constraint (the classic knapsack problem) or a uniform matroid constraint (knapsack with a cardinality constraint), admit a fully polynomial-time approximation scheme (FPTAS). In contrast, the budgeted matroid independent set (BMI) problem with a general matroid has an efficient polynomial-time approximation scheme (EPTAS) but does not admit an FPTAS. This implies an EPTAS for our problem, which is the best known result prior to this work. We present an FPTAS for budgeted laminar matroid independent set, improving the previous EPTAS for this matroid family and generalizing the FPTAS known for knapsack with a cardinality constraint and multiple-choice knapsack. Our scheme is based on a simple dynamic program which utilizes the tree-like structure of laminar matroids.
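
For intuition, the simplest special case the paper generalizes -- knapsack with a cardinality constraint -- admits a small exact dynamic program over (items used, budget spent). An illustrative sketch assuming integer costs, not the paper's FPTAS itself:

```python
# Exact DP for knapsack with a cardinality constraint (integer costs assumed).
def knapsack_cardinality(costs, profits, budget, k):
    NEG = float("-inf")
    # best[j][b] = max profit using exactly j items with total cost b
    best = [[NEG] * (budget + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c, p in zip(costs, profits):
        for j in range(k, 0, -1):               # reverse: each item used once
            for b in range(budget, c - 1, -1):
                if best[j - 1][b - c] > NEG:
                    best[j][b] = max(best[j][b], best[j - 1][b - c] + p)
    return max(v for row in best for v in row)

print(knapsack_cardinality([3, 5, 4, 2], [6, 10, 7, 3], budget=8, k=2))  # 16
```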

Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification

  • Authors: Thanh-Tung Nguyen, Viktor Schlegel, Abhinav Kashyap, Stefan Winkler, Shao-Syuan Huang, Jie-Jyun Liu, Chih-Jen Lin
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13998
  • Pdf link: https://arxiv.org/pdf/2304.13998
  • Abstract
    Clinical notes are assigned ICD codes - sets of codes for diagnoses and procedures. In recent years, predictive machine learning models have been built for automatic ICD coding. However, there is a lack of widely accepted benchmarks for automated ICD coding models based on large-scale public EHR data. This paper proposes a public benchmark suite for ICD-10 coding using a large EHR dataset derived from MIMIC-IV, the most recent public EHR dataset. We implement and compare several popular methods for ICD coding prediction tasks to standardize data preprocessing and establish a comprehensive ICD coding benchmark dataset. This approach fosters reproducibility and model comparison, accelerating progress toward employing automated ICD coding in future studies. Furthermore, we create a new ICD-9 benchmark using MIMIC-IV data, providing more data points and a higher number of ICD codes than MIMIC-III. Our open-source code offers easy access to data processing steps, benchmark creation, and experiment replication for those with MIMIC-IV access, providing insights, guidance, and protocols to efficiently develop ICD coding models.

A Supervised Machine Learning Approach to Operator Intent Recognition for Teleoperated Mobile Robot Navigation

  • Authors: Evangelos Tsagkournis, Dimitris Panagopoulos, Giannis Petousakis, Grigoris Nikolaou, Rustam Stolkin, Manolis Chiou
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14003
  • Pdf link: https://arxiv.org/pdf/2304.14003
  • Abstract
    In applications that involve human-robot interaction (HRI), human-robot teaming (HRT), and cooperative human-machine systems, the inference of the human partner's intent is of critical importance. This paper presents a method for the inference of the human operator's navigational intent, in the context of mobile robots that provide full or partial (e.g., shared control) teleoperation. We propose the Machine Learning Operator Intent Inference (MLOII) method, which a) processes spatial data collected by the robot's sensors; b) utilizes a supervised machine learning algorithm to estimate the operator's most probable navigational goal online. The proposed method's ability to reliably and efficiently infer the intent of the human operator is experimentally evaluated in realistically simulated exploration and remote inspection scenarios. The results in terms of accuracy and uncertainty indicate that the proposed method is comparable to another state-of-the-art method found in the literature.

Diagonalization Based Parallel-in-Time Method for a Class of Fourth Order Time Dependent PDEs

  • Authors: Gobinda Garai, Bankim C. Mandal
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14021
  • Pdf link: https://arxiv.org/pdf/2304.14021
  • Abstract
    In this paper, we design, analyze and implement efficient time parallel methods for a class of fourth order time-dependent partial differential equations (PDEs), namely the biharmonic heat equation, the linearized Cahn-Hilliard (CH) equation and the nonlinear CH equation. We use the diagonalization technique on the all-at-once system to develop efficient iterative time parallel methods for investigating the solution behaviour of these equations. We present the convergence analysis of the Parallel-in-Time (PinT) algorithms. We verify our findings by presenting numerical results.

Attacks on Robust Distributed Learning Schemes via Sensitivity Curve Maximization

  • Authors: Christian A. Schroth, Stefan Vlaski, Abdelhak M. Zoubir
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.14024
  • Pdf link: https://arxiv.org/pdf/2304.14024
  • Abstract
    Distributed learning paradigms, such as federated or decentralized learning, allow a collection of agents to solve global learning and optimization problems through limited local interactions. Most such strategies rely on a mixture of local adaptation and aggregation steps, either among peers or at a central fusion center. Classically, aggregation in distributed learning is based on averaging, which is statistically efficient, but susceptible to attacks by even a small number of malicious agents. This observation has motivated a number of recent works, which develop robust aggregation schemes by employing robust variations of the mean. We present a new attack based on sensitivity curve maximization (SCM), and demonstrate that it is able to disrupt existing robust aggregation schemes by injecting small, but effective perturbations.
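
The sensitivity curve itself is easy to state: it measures how far one injected value shifts an aggregate of honest values. A minimal sketch contrasting the (non-robust) mean with the median; the attack described above picks its perturbation by maximizing this curve:

```python
# Sensitivity curve of an aggregation rule: the (scaled) shift in the
# aggregate caused by one injected value x among n-1 honest values.
import numpy as np

def sensitivity_curve(aggregate, honest, x):
    n = len(honest) + 1
    return n * (aggregate(np.append(honest, x)) - aggregate(honest))

honest = np.random.default_rng(1).standard_normal(100)
grid = np.linspace(-10, 10, 401)
sc_mean = [sensitivity_curve(np.mean, honest, x) for x in grid]
sc_med = [sensitivity_curve(np.median, honest, x) for x in grid]
print(f"max |SC|, mean:   {np.max(np.abs(sc_mean)):.2f}")  # grows with |x|
print(f"max |SC|, median: {np.max(np.abs(sc_med)):.2f}")   # bounded (robust)
```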

COSST: Multi-organ Segmentation with Partially Labeled Datasets Using Comprehensive Supervisions and Self-training

  • Authors: Han Liu, Zhoubing Xu, Riqiang Gao, Hao Li, Jianing Wang, Guillaume Chabin, Ipek Oguz, Sasa Grbic
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14030
  • Pdf link: https://arxiv.org/pdf/2304.14030
  • Abstract
    Deep learning models have demonstrated remarkable success in multi-organ segmentation but typically require large-scale datasets with all organs of interest annotated. However, medical image datasets are often low in sample size and only partially labeled, i.e., only a subset of organs are annotated. Therefore, it is crucial to investigate how to learn a unified model on the available partially labeled datasets to leverage their synergistic potential. In this paper, we empirically and systematically study the partial-label segmentation with in-depth analyses on the existing approaches and identify three distinct types of supervision signals, including two signals derived from ground truth and one from pseudo label. We propose a novel training framework termed COSST, which effectively and efficiently integrates comprehensive supervision signals with self-training. Concretely, we first train an initial unified model using two ground truth-based signals and then iteratively incorporate the pseudo label signal to the initial model using self-training. To mitigate performance degradation caused by unreliable pseudo labels, we assess the reliability of pseudo labels via outlier detection in latent space and exclude the most unreliable pseudo labels from each self-training iteration. Extensive experiments are conducted on six CT datasets for three partial-label segmentation tasks. Experimental results show that our proposed COSST achieves significant improvement over the baseline method, i.e., individual networks trained on each partially labeled dataset. Compared to the state-of-the-art partial-label segmentation methods, COSST demonstrates consistent superior performance on various segmentation tasks and with different training data size.

A Parameterized Theory of PAC Learning

  • Authors: Cornelius Brand, Robert Ganian, Kirill Simonov
  • Subjects: Computational Complexity (cs.CC); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14058
  • Pdf link: https://arxiv.org/pdf/2304.14058
  • Abstract
    Probably Approximately Correct (i.e., PAC) learning is a core concept of sample complexity theory, and efficient PAC learnability is often seen as a natural counterpart to the class P in classical computational complexity. But while the nascent theory of parameterized complexity has allowed us to push beyond the P-NP "dichotomy" in classical computational complexity and identify the exact boundaries of tractability for numerous problems, there is no analogue in the domain of sample complexity that could push beyond efficient PAC learnability. As our core contribution, we fill this gap by developing a theory of parameterized PAC learning which allows us to shed new light on several recent PAC learning results that incorporated elements of parameterized complexity. Within the theory, we identify not one but two notions of fixed-parameter learnability that both form distinct counterparts to the class FPT -- the core concept at the center of the parameterized complexity paradigm -- and develop the machinery required to exclude fixed-parameter learnability. We then showcase the applications of this theory to identify refined boundaries of tractability for CNF and DNF learning as well as for a range of learning problems on graphs.

Fourier-Gegenbauer Pseudospectral Method for Solving Time-Dependent One-Dimensional Fractional Partial Differential Equations with Variable Coefficients and Periodic Solutions

  • Authors: Kareem T. Elgindy
  • Subjects: Numerical Analysis (math.NA); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.14061
  • Pdf link: https://arxiv.org/pdf/2304.14061
  • Abstract
    In this paper, we present a novel pseudospectral (PS) method for solving a new class of initial-value problems (IVPs) of time-dependent one-dimensional fractional partial differential equations (FPDEs) with variable coefficients and periodic solutions. A main ingredient of our work is the use of the recently developed periodic RL/Caputo fractional derivative (FD) operators with sliding positive fixed memory length of Bourafa et al. [1] or their reduced forms obtained by Elgindy [2] as the natural FD operators to accurately model FPDEs with periodic solutions. The proposed method converts the IVP into a well-conditioned linear system of equations using the PS method based on Fourier collocations and Gegenbauer quadratures. The reduced linear system has a simple special structure and can be solved accurately and rapidly by using standard linear system solvers. A rigorous study of the error and convergence of the proposed method is presented. The idea and results presented in this paper are expected to be useful in the future to address more general problems involving FPDEs with periodic solutions.

Lightweight, Pre-trained Transformers for Remote Sensing Timeseries

  • Authors: Gabriel Tseng, Ivan Zvonkov, Mirali Purohit, David Rolnick, Hannah Kerner
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14065
  • Pdf link: https://arxiv.org/pdf/2304.14065
  • Abstract
    Machine learning algorithms for parsing remote sensing data have a wide range of societally relevant applications, but labels used to train these algorithms can be difficult or impossible to acquire. This challenge has spurred research into self-supervised learning for remote sensing data aiming to unlock the use of machine learning in geographies or application domains where labelled datasets are small. Current self-supervised learning approaches for remote sensing data draw significant inspiration from techniques applied to natural images. However, remote sensing data has important differences from natural images -- for example, the temporal dimension is critical for many tasks and data is collected from many complementary sensors. We show that designing models and self-supervised training techniques specifically for remote sensing data results in both smaller and more performant models. We introduce the Pretrained Remote Sensing Transformer (Presto), a transformer-based model pre-trained on remote sensing pixel-timeseries data. Presto excels at a wide variety of globally distributed remote sensing tasks and outperforms much larger models. Presto can be used for transfer learning or as a feature extractor for simple models, enabling efficient deployment at scale.

Linear and Nonlinear Parareal Methods for the Cahn-Hilliard Equation

  • Authors: Gobinda Garai, Bankim C. Mandal
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14074
  • Pdf link: https://arxiv.org/pdf/2304.14074
  • Abstract
    In this paper, we propose, analyze and implement efficient time parallel methods for the Cahn-Hilliard (CH) equation. Developing efficient numerical methods for the CH equation is of great importance, given the wide range of applicability the CH equation has. The CH equation generally needs to be simulated for a very long time to resolve the phase coarsening stage, so it is desirable to accelerate the computation using parallel-in-time methods. We present linear and nonlinear Parareal methods for the CH equation, depending on the choice of the fine approximation. We illustrate our results by numerical experiments.
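
For readers unfamiliar with Parareal, its standard correction iteration (the generic form, not specific to this paper's CH variants) is:

```latex
% Parareal: a cheap coarse propagator G run serially, corrected by an accurate
% fine propagator F applied in parallel over the time slices; U_k^n
% approximates u(t_n) at iteration k.
U_{k+1}^{n+1} \;=\; \mathcal{G}(U_{k+1}^{n}) \;+\; \mathcal{F}(U_{k}^{n}) \;-\; \mathcal{G}(U_{k}^{n})
```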

Lowering the Entry Bar to HPC-Scale Uncertainty Quantification

  • Authors: Linus Seelinger, Anne Reinarz, Jean Benezech, Mikkel Bue Lykkegaard, Lorenzo Tamellini, Robert Scheichl
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14087
  • Pdf link: https://arxiv.org/pdf/2304.14087
  • Abstract
    Treating uncertainties in models is essential in many fields of science and engineering. Uncertainty quantification (UQ) on complex and computationally costly numerical models necessitates a combination of efficient model solvers, advanced UQ methods and HPC-scale resources. The resulting technical complexities as well as lack of separation of concerns between UQ and model experts is holding back many interesting UQ applications. The aim of this paper is to close the gap between advanced UQ methods and advanced models by removing the hurdle of complex software stack integration, which in turn will offer a straightforward way to scale even prototype-grade UQ applications to high-performance resources. We achieve this goal by introducing a parallel software architecture based on UM-Bridge, a universal interface for linking UQ and models. We present three realistic applications from different areas of science and engineering, scaling from single machines to large clusters on the Google Cloud Platform.
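
A minimal client-side sketch of the UM-Bridge pattern, to the best of our reading of its Python interface; the URL and the model name "forward" are placeholders for whatever a given model server exposes:

```python
# The UQ code talks to any model server through one uniform interface,
# whether the model runs locally or on a remote cluster.
import umbridge

model = umbridge.HTTPModel("http://localhost:4242", "forward")
print(model.get_input_sizes(), model.get_output_sizes())
output = model([[0.25, 1.5]])   # one evaluation: list of input vectors in,
print(output)                   # list of output vectors back
```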

Securing Autonomous Air Traffic Management: Blockchain Networks Driven by Explainable AI

  • Authors: Louise Axon, Dimitrios Panagiotakopoulos, Samuel Ayo, Carolina Sanchez-Hernandez, Yan Zong, Simon Brown, Lei Zhang, Michael Goldsmith, Sadie Creese, Weisi Guo
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.14095
  • Pdf link: https://arxiv.org/pdf/2304.14095
  • Abstract
    Air Traffic Management data systems today are inefficient and not scalable to enable future unmanned systems. Current data is fragmented, siloed, and not easily accessible. There is data conflict, misuse, and eroding levels of trust in provenance and accuracy. With increased autonomy in aviation, Artificially Intelligent (AI) enabled unmanned traffic management (UTM) will be more reliant on secure data from diverse stakeholders. There is an urgent need to develop a secure network that has trustworthy data chains and works with the requirements generated by UTM. Here, we review existing research in 3 key interconnected areas: (1) blockchain development for secure data transfer between competing aviation stakeholders, (2) self-learning networking architectures that distribute consensus to achieve secure air traffic control, (3) explainable AI to build trust with human stakeholders and backpropagate requirements for blockchain and network optimisation. When connected together, this new digital ecosystem blueprint is tailored for safety critical UTM sectors. We motivate the readers with a case study in which a federated learning UTM that uses real air traffic and weather data is secured and explained to human operators. This emerging area still requires significant research and development by the community to ensure it can enable future autonomous air mobility.

Learning Neural PDE Solvers with Parameter-Guided Channel Attention

  • Authors: Makoto Takamoto, Francesco Alesiani, Mathias Niepert
  • Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn); Geophysics (physics.geo-ph)
  • Arxiv link: https://arxiv.org/abs/2304.14118
  • Pdf link: https://arxiv.org/pdf/2304.14118
  • Abstract
    Scientific Machine Learning (SciML) is concerned with the development of learned emulators of physical systems governed by partial differential equations (PDE). In application domains such as weather forecasting, molecular dynamics, and inverse design, ML-based surrogate models are increasingly used to augment or replace inefficient and often non-differentiable numerical simulation algorithms. While a number of ML-based methods for approximating the solutions of PDEs have been proposed in recent years, they typically do not adapt to the parameters of the PDEs, making it difficult to generalize to PDE parameters not seen during training. We propose a Channel Attention mechanism guided by PDE Parameter Embeddings (CAPE) component for neural surrogate models and a simple yet effective curriculum learning strategy. The CAPE module can be combined with neural PDE solvers, allowing them to adapt to unseen PDE parameters. The curriculum learning strategy provides a seamless transition between teacher-forcing and fully auto-regressive training. We compare CAPE in conjunction with the curriculum learning strategy using a popular PDE benchmark and obtain consistent and significant improvements over the baseline models. The experiments also show several advantages of CAPE, such as its increased ability to generalize to unseen PDE parameters without large increases in inference time and parameter count.
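
A hypothetical distillation of the conditioning pattern CAPE describes: embed the PDE parameters and let the embedding gate the channels of a solver layer's feature map. This is not the paper's actual module, just the pattern it names:

```python
# PDE-parameter-conditioned channel attention: an MLP embeds the parameters
# into per-channel gates that rescale the solver's feature map.
import torch
import torch.nn as nn

class ParamChannelAttention(nn.Module):
    def __init__(self, n_params=2, channels=64):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(n_params, channels), nn.ReLU(),
            nn.Linear(channels, channels), nn.Sigmoid(),  # gates in (0, 1)
        )

    def forward(self, features, pde_params):
        # features: (batch, channels, grid); pde_params: (batch, n_params)
        gates = self.embed(pde_params).unsqueeze(-1)
        return features * gates          # scale channels by parameter embedding

out = ParamChannelAttention()(torch.randn(8, 64, 128), torch.randn(8, 2))
print(out.shape)  # torch.Size([8, 64, 128])
```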

Exploiting Inductive Bias in Transformer for Point Cloud Classification and Segmentation

  • Authors: Zihao Li, Pan Gao, Hui Yuan, Ran Wei, Manoranjan Paul
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14124
  • Pdf link: https://arxiv.org/pdf/2304.14124
  • Abstract
    Discovering inter-point connections for efficient high-dimensional feature extraction from point coordinates is a key challenge in processing point clouds. Most existing methods focus on designing efficient local feature extractors while ignoring global connections, or vice versa. In this paper, we design a new Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point relations, which considers both local and global attention. Specifically, considering local spatial coherence, local feature learning is performed through Relative Position Encoding and Attentive Feature Pooling. We incorporate the learned locality into the Transformer module. The local feature affects the value component in the Transformer to modulate the relationship between channels of each point, which enhances the self-attention mechanism with locality-based channel interaction. We demonstrate its superiority experimentally on classification and segmentation tasks. The code is available at: https://github.com/jiamang/IBT

Human Semantic Segmentation using Millimeter-Wave Radar Sparse Point Clouds

  • Authors: Pengfei Song, Luoyu MEI, Han Cheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); General Topology (math.GN)
  • Arxiv link: https://arxiv.org/abs/2304.14132
  • Pdf link: https://arxiv.org/pdf/2304.14132
  • Abstract
    This paper presents a framework for semantic segmentation on sparse sequential point clouds of millimeter-wave radar. Compared with cameras and lidars, millimeter-wave radars have the advantage of not revealing privacy, having strong anti-interference ability, and having long detection distance. However, the sparsity of mmWave data and the difficulty of capturing its temporal-topological features prevent previous advanced segmentation methods (e.g., PointNet, PointCNN, Point Transformer) from being well utilized in practical scenarios. To address the challenges caused by the sparsity and temporal-topological features of the data, we (i) introduce graph structure and topological features to the point cloud and (ii) propose a semantic segmentation framework including a global feature-extracting module and a sequential feature-extracting module. In addition, we design an efficient and better-fitting loss function, based on graph clustering, for a better training process and segmentation results. Experimentally, we deploy representative semantic segmentation algorithms (Transformer, GCNN, etc.) on a custom dataset. Experimental results indicate that our model achieves a mean accuracy of $\mathbf{82.31}\%$ on the custom dataset and outperforms the state-of-the-art algorithms. Moreover, to validate the model's robustness, we deploy our model on the well-known S3DIS dataset, where it achieves a mean accuracy of $\mathbf{92.6}\%$, outperforming baseline algorithms.

Multiplicity Problems on Algebraic Series and Context-Free Grammars

  • Authors: Nikhil Balaji, Lorenzo Clemente, Klara Nosan, Mahsa Shirmohammadi, James Worrell
  • Subjects: Formal Languages and Automata Theory (cs.FL); Computational Complexity (cs.CC)
  • Arxiv link: https://arxiv.org/abs/2304.14145
  • Pdf link: https://arxiv.org/pdf/2304.14145
  • Abstract
    In this paper we obtain complexity bounds for computational problems on algebraic power series over several commuting variables. The power series are specified by systems of polynomial equations: a formalism closely related to weighted context-free grammars. We focus on three problems -- decide whether a given algebraic series is identically zero, determine whether all but finitely many coefficients are zero, and compute the coefficient of a specific monomial. We relate these questions to well-known computational problems on arithmetic circuits and thereby show that all three problems lie in the counting hierarchy. Our main result improves the best known complexity bound on deciding zeroness of an algebraic series. This problem is known to lie in PSPACE by reduction to the decision problem for the existential fragment of the theory of real closed fields. Here we show that the problem lies in the counting hierarchy by reduction to the problem of computing the degree of a polynomial given by an arithmetic circuit. As a corollary we obtain new complexity bounds on multiplicity equivalence of context-free grammars restricted to a bounded language, language inclusion of a nondeterministic finite automaton in an unambiguous context-free grammar, and language inclusion of a non-deterministic context-free grammar in an unambiguous finite automaton.
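
A textbook instance of the formalism (our illustration, not an example from the paper): the weighted grammar $S \to aSS \mid b$ corresponds to the polynomial system

```latex
\[
  S \;=\; x\,S^{2} + y,
  \qquad
  S(x,y) \;=\; \sum_{n \ge 0} \frac{1}{n+1}\binom{2n}{n}\, x^{n} y^{\,n+1},
\]
```

whose unique power-series solution has the Catalan numbers as coefficients: the coefficient of $x^{n} y^{n+1}$ counts the derivation trees with $n$ occurrences of $a$ and $n+1$ occurrences of $b$. Zeroness then asks whether all such coefficients vanish.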

Tractability of sampling recovery on unweighted function classes

  • Authors: David Krieg
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14169
  • Pdf link: https://arxiv.org/pdf/2304.14169
  • Abstract
    It is well-known that the problem of sampling recovery in the $L_2$-norm on unweighted Korobov spaces (Sobolev spaces with mixed smoothness) as well as classical smoothness classes such as Hölder classes suffers from the curse of dimensionality. We show that the problem is tractable for those classes if they are intersected with the Wiener algebra of functions with summable Fourier coefficients. In fact, this is a relatively simple implication of powerful results by Rauhut and Ward [Appl. Comput. Harmon. Anal. 40 (2016), pp. 321--351]. Tractability is achieved by the use of non-linear algorithms, while linear algorithms cannot do the job.
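
For readers unfamiliar with the terminology: the Wiener algebra referred to above is the standard class of functions with absolutely summable Fourier coefficients,

```latex
\[
  A(\mathbb{T}^{d}) \;=\; \Big\{\, f \;:\; \|f\|_{A} := \sum_{k \in \mathbb{Z}^{d}} |\hat{f}(k)| < \infty \,\Big\}.
\]
```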

The Mutual Information In The Vicinity of Capacity-Achieving Input Distributions

  • Authors: Hao-Chung Cheng, Barış Nakiboğlu
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2304.14219
  • Pdf link: https://arxiv.org/pdf/2304.14219
  • Abstract
    The mutual information is analyzed as a function of the input distribution using an identity due to Topsøe for channels with (possibly multiple) linear cost constraints and finite input and output sets. The mutual information is bounded above by a function decreasing quadratically with the distance to the set of all capacity-achieving input distributions for the case when the distance is less than a certain threshold. The closed-form expressions for the threshold and the coefficient of the quadratic decrease are derived. A counter-example demonstrating the non-existence of such a quadratic bound in the case of infinitely many linear cost constraints is provided. Implications of these observations for the channel coding problem and applications of the proof technique to related problems are discussed.
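
Schematically (our paraphrase; the symbols here are ours, not the paper's notation), the quadratic bound has the form

```latex
\[
  I(p; W) \;\le\; C \;-\; K\, d^{2}\big(p, \Pi^{\mathrm{ca}}\big)
  \qquad \text{whenever } d\big(p, \Pi^{\mathrm{ca}}\big) < \tau,
\]
```

where $C$ is the capacity, $\Pi^{\mathrm{ca}}$ is the set of capacity-achieving input distributions, and the paper derives closed-form expressions for the coefficient $K$ and the threshold $\tau$.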

Developing Distributed High-performance Computing Capabilities of an Open Science Platform for Robust Epidemic Analysis

  • Authors: Nicholson Collier, Justin M. Wozniak, Abby Stevens, Yadu Babuji, Mickaël Binois, Ardindam Fadikar, Alexandra Würth, Kyle Chard, Jonathan Ozik
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.14244
  • Pdf link: https://arxiv.org/pdf/2304.14244
  • Abstract
    COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among domain experts, mathematical modelers, and scientific computing specialists. Computationally, however, it also revealed critical gaps in the ability of researchers to exploit advanced computing systems. These challenging areas include gaining access to scalable computing systems, porting models and workflows to new systems, sharing data of varying sizes, and producing results that can be reproduced and validated by others. Informed by our team's work in supporting public health decision makers during the COVID-19 pandemic and by the identified capability gaps in applying high-performance computing (HPC) to the modeling of complex social systems, we present the goals, requirements, and initial implementation of OSPREY, an open science platform for robust epidemic analysis. The prototype implementation demonstrates an integrated, algorithm-driven HPC workflow architecture, coordinating tasks across federated HPC resources, with robust, secure and automated access to each of the resources. We demonstrate scalable and fault-tolerant task execution, an asynchronous API to support fast time-to-solution algorithms, an inclusive, multi-language approach, and efficient wide-area data management. The example OSPREY code is made available on a public repository.

Evaluating the Impact of Pair Documentation on Requirements Quality and Team Productivity

  • Authors: Nosheen Qamar, Nosheen Sabahat, Amir Mashmool, Amir Mosavi
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.14255
  • Pdf link: https://arxiv.org/pdf/2304.14255
  • Abstract
    The most important deliverable of the requirements engineering process is the software requirements specification (SRS) document. Requirements documentation is important throughout the complete software development lifecycle for sharing the vision and enabling effective communication between major stakeholders. The Standish Group reported that the top factors behind project failures are related to requirements. By giving the right level of attention to key requirements, good-quality software can be produced. Therefore, more research is needed in this area, and this study tries to fill this gap. This empirical study examines the effect of pair documentation (an unconventional approach in which two persons work collaboratively on the same requirements document, just like pair programming) on requirements quality and team productivity. Twenty pairs of documentation writers were divided into two groups: one group used pair documentation (the experimental group) and the other used conventional documentation (the control group). The resulting requirements documents for the same project, produced by both groups, were then compared. A significant improvement in quality and productivity is observed for the experimental group using pair documentation. The findings of this study may assist requirement engineers in forming efficient teams that can create high-quality SRS documents.

A Survey on Approximate Edge AI for Energy Efficient Autonomous Driving Services

  • Authors: Dewant Katare, Diego Perino, Jari Nurmi, Martijn Warnier, Marijn Janssen, Aaron Yi Ding
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14271
  • Pdf link: https://arxiv.org/pdf/2304.14271
  • Abstract
    Autonomous driving services rely heavily on sensors such as cameras, LiDAR, radar, and communication modules. A common practice of processing the sensed data is using a high-performance computing unit placed inside the vehicle, which deploys AI models and algorithms to act as the brain or administrator of the vehicle. The vehicular data generated from average hours of driving can be up to 20 Terabytes depending on the data rate and specification of the sensors. Given the scale and fast growth of services for autonomous driving, it is essential to improve the overall energy and environmental efficiency, especially in the trend towards vehicular electrification (e.g., battery-powered). Although the areas have seen significant advancements in sensor technologies, wireless communications, computing and AI/ML algorithms, the challenge still exists in how to apply and integrate those technology innovations to achieve energy efficiency. This survey reviews and compares the connected vehicular applications, vehicular communications, approximation and Edge AI techniques. The focus is on energy efficiency by covering newly proposed approximation and enabling frameworks. To the best of our knowledge, this survey is the first to review the latest approximate Edge AI frameworks and publicly available datasets in energy-efficient autonomous driving. The insights and vision from this survey can be beneficial for the collaborative driving service development on low-power and memory-constrained systems and also for the energy optimization of autonomous vehicles.

On Solution Discovery via Reconfiguration

  • Authors: Michael R. Fellows, Mario Grobler, Nicole Megow, Amer E. Mouawad, Vijayaragunathan Ramamoorthi, Frances A. Rosamond, Daniel Schmand, Sebastian Siebertz
  • Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.14295
  • Pdf link: https://arxiv.org/pdf/2304.14295
  • Abstract
    The dynamics of real-world applications and systems require efficient methods for improving infeasible solutions or restoring corrupted ones by making modifications to the current state of a system in a restricted way. We propose a new framework of solution discovery via reconfiguration for constructing a feasible solution for a given problem by executing a sequence of small modifications starting from a given state. Our framework integrates and formalizes different aspects of classical local search, reoptimization, and combinatorial reconfiguration. We exemplify our framework on a multitude of fundamental combinatorial problems, namely Vertex Cover, Independent Set, Dominating Set, and Coloring. We study the classical as well as the parameterized complexity of the solution discovery variants of those problems and explore the boundary between tractable and intractable instances.

Incremental Generalized Category Discovery

  • Authors: Bingchen Zhao, Oisin Mac Aodha
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14310
  • Pdf link: https://arxiv.org/pdf/2304.14310
  • Abstract
    We explore the problem of Incremental Generalized Category Discovery (IGCD). This is a challenging category incremental learning setting where the goal is to develop models that can correctly categorize images from previously seen categories, in addition to discovering novel ones. Learning is performed over a series of time steps where the model obtains new labeled and unlabeled data, and discards old data, at each iteration. The difficulty of the problem is compounded in our generalized setting as the unlabeled data can contain images from categories that may or may not have been observed before. We present a new method for IGCD which combines non-parametric categorization with efficient image sampling to mitigate catastrophic forgetting. To quantify performance, we propose a new benchmark dataset named iNatIGCD that is motivated by a real-world fine-grained visual categorization task. In our experiments, we outperform existing related methods.

Empirical Individual State Observability

  • Authors: Benjamin Cellini, Burak Boyacıoğlu, Floris van Breugel
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.14313
  • Pdf link: https://arxiv.org/pdf/2304.14313
  • Abstract
    A dynamical system is observable if there is a one-to-one mapping from the system's measured outputs and inputs to all of the system's states. Analytical and empirical tools exist for quantifying the (full state) observability of linear and nonlinear systems; however, empirical tools for evaluating the observability of individual state variables are lacking. Here, a new empirical approach termed Empirical Individual State Observability (E-ISO) is developed to quantify the level of observability of individual state variables. E-ISO first builds an empirical observability matrix via simulation, then applies convex optimization to efficiently determine the subset of its rows required to estimate each state variable individually. Finally, (un)observability measures for these subsets are calculated to provide independent estimates of the observability of each state variable. Multiple example applications of E-ISO on linear and nonlinear systems are shown to be consistent with analytical results. Broadly, E-ISO will be an invaluable tool both for designing active sensing control laws or optimizing sensor placement to increase the observability of individual state variables for engineered systems, and analyzing the trajectory decisions made by organisms.
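
The empirical observability matrix mentioned above can be built by finite differencing simulated output trajectories with respect to initial-state perturbations; a minimal sketch (assuming a user-supplied `simulate` function, not the authors' E-ISO code):

```python
import numpy as np

def empirical_observability_matrix(simulate, x0, eps=1e-4):
    """Central-difference approximation of d(outputs)/d(x0).
    `simulate(x0)` is assumed to return an output trajectory of shape
    (T, n_outputs); each column of the result corresponds to one state."""
    n = len(x0)
    cols = []
    for i in range(n):
        d = np.zeros(n)
        d[i] = eps
        y_plus, y_minus = simulate(x0 + d), simulate(x0 - d)
        cols.append((y_plus - y_minus).reshape(-1) / (2 * eps))
    return np.stack(cols, axis=1)  # rows: stacked output samples, cols: states
```

E-ISO then selects, per state variable, the subset of rows needed to estimate that state, e.g. via convex optimization.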

SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection

  • Authors: Yichen Xie, Chenfeng Xu, Marie-Julie Rakotosaona, Patrick Rim, Federico Tombari, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14340
  • Pdf link: https://arxiv.org/pdf/2304.14340
  • Abstract
    By identifying four important components of existing LiDAR-camera 3D object detection methods (LiDAR and camera candidates, transformation, and fusion outputs), we observe that all existing methods either find dense candidates or yield dense representations of scenes. However, given that objects occupy only a small part of a scene, finding dense candidates and generating dense representations is noisy and inefficient. We propose SparseFusion, a novel multi-sensor 3D detection method that exclusively uses sparse candidates and sparse representations. Specifically, SparseFusion utilizes the outputs of parallel detectors in the LiDAR and camera modalities as sparse candidates for fusion. We transform the camera candidates into the LiDAR coordinate space by disentangling the object representations. Then, we can fuse the multi-modality candidates in a unified 3D space by a lightweight self-attention module. To mitigate negative transfer between modalities, we propose novel semantic and geometric cross-modality transfer modules that are applied prior to the modality-specific detectors. SparseFusion achieves state-of-the-art performance on the nuScenes benchmark while also running at the fastest speed, even outperforming methods with stronger backbones. We perform extensive experiments to demonstrate the effectiveness and efficiency of our modules and overall method pipeline. Our code will be made publicly available at https://github.com/yichen928/SparseFusion.

$π$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation

  • Authors: Chengyue Wu, Teng Wang, Yixiao Ge, Zeyu Lu, Ruisong Zhou, Ping Luo, Ying Shan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14381
  • Pdf link: https://arxiv.org/pdf/2304.14381
  • Abstract
    Foundation models have achieved great advances in multi-task learning with a unified interface of unimodal and multimodal tasks. However, the potential of such multi-task learners has not been exploited during transfer learning. In this work, we present a universal parameter-efficient transfer learning method, termed Predict-Interpolate Tuning ($\pi$-Tuning), for vision, language, and vision-language tasks. It aggregates the parameters of lightweight task-specific experts learned from similar tasks to aid the target downstream task. The task similarities are predicted in a unified modality-independent space, yielding a scalable graph to demonstrate task relationships. $\pi$-Tuning has several appealing benefits. First, it flexibly explores both intra- and inter-modal transferability between similar tasks to improve the accuracy and robustness of transfer learning, especially in data-scarce scenarios. Second, it offers a systematic solution for transfer learning with multi-task prediction-and-then-interpolation, compatible with diverse types of parameter-efficient experts, such as prompt and adapter. Third, an extensive study of task-level mutual benefits on 14 unimodal and 6 multimodal datasets shows that $\pi$-Tuning surpasses fine-tuning and other parameter-efficient transfer learning methods both in full-shot and low-shot regimes. The task graph also enables an in-depth interpretable analysis of task transferability across modalities.
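
The core operation is a weighted interpolation of expert parameters; a minimal sketch of that step (names are assumptions, not the authors' code):

```python
import torch

def interpolate_experts(expert_states, weights):
    """Blend task-expert parameters (e.g. adapter or prompt weights stored
    as state dicts) with the predicted task-similarity coefficients."""
    keys = expert_states[0].keys()
    return {k: sum(w * sd[k] for w, sd in zip(weights, expert_states))
            for k in keys}

# Toy usage: blend two experts 70/30.
e1 = {"adapter.weight": torch.ones(4, 4)}
e2 = {"adapter.weight": torch.zeros(4, 4)}
blended = interpolate_experts([e1, e2], [0.7, 0.3])
```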

Dynamic Pricing and Learning with Bayesian Persuasion

  • Authors: Shipra Agrawal, Yiding Feng, Wei Tang
  • Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14385
  • Pdf link: https://arxiv.org/pdf/2304.14385
  • Abstract
    We consider a novel dynamic pricing and learning setting where in addition to setting prices of products in sequential rounds, the seller also ex-ante commits to 'advertising schemes'. That is, in the beginning of each round the seller can decide what kind of signal they will provide to the buyer about the product's quality upon realization. Using the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses, we formulate the problem of finding an optimal design of the advertising scheme along with a pricing scheme that maximizes the seller's expected revenue. Without any apriori knowledge of the buyers' demand function, our goal is to design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy. We study the regret of the algorithm when compared to the optimal clairvoyant price and advertising scheme. Our main result is a computationally efficient online algorithm that achieves an $O(T^{2/3}(m\log T)^{1/3})$ regret bound when the valuation function is linear in the product quality. Here $m$ is the cardinality of the discrete product quality domain and $T$ is the time horizon. This result requires some natural monotonicity and Lipschitz assumptions on the valuation function, but no Lipschitz or smoothness assumption on the buyers' demand function. For constant $m$, our result matches the regret lower bound for dynamic pricing within logarithmic factors, which is a special case of our problem. We also obtain several improved results for the widely considered special case of additive valuations, including an $\tilde{O}(T^{2/3})$ regret bound independent of $m$ when $m\le T^{1/3}$.

string2string: A Modern Python Library for String-to-String Algorithms

  • Authors: Mirac Suzgun, Stuart M. Shieber, Dan Jurafsky
  • Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL)
  • Arxiv link: https://arxiv.org/abs/2304.14395
  • Pdf link: https://arxiv.org/pdf/2304.14395
  • Abstract
    We introduce string2string, an open-source library that offers a comprehensive suite of efficient algorithms for a broad range of string-to-string problems. It includes traditional algorithmic solutions as well as recent advanced neural approaches to tackle various problems in string alignment, distance measurement, lexical and semantic search, and similarity analysis -- along with several helpful visualization tools and metrics to facilitate the interpretation and analysis of these methods. Notable algorithms featured in the library include the Smith-Waterman algorithm for pairwise local alignment, the Hirschberg algorithm for global alignment, the Wagner-Fischer algorithm for edit distance, BARTScore and BERTScore for similarity analysis, the Knuth-Morris-Pratt algorithm for lexical search, and Faiss for semantic search. In addition, it wraps existing efficient and widely-used implementations of certain frameworks and metrics, such as sacreBLEU and ROUGE, whenever it is appropriate and suitable. Overall, the library aims to provide extensive coverage and increased flexibility in comparison to existing libraries for strings. It can be used for many downstream applications, tasks, and problems in natural-language processing, bioinformatics, and computational social sciences. It is implemented in Python, easily installable via pip, and accessible through a simple API. Source code, documentation, and tutorials are all available on our GitHub page: https://github.com/stanfordnlp/string2string.
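
As a flavor of the algorithms covered, here is the classic Wagner-Fischer dynamic program for Levenshtein edit distance in plain Python (a textbook re-implementation for illustration, not the library's own code or API):

```python
def wagner_fischer(a: str, b: str) -> int:
    """Edit distance between a and b using a single rolling row."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # dp[j] = distance between a[:0] and b[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds the old dp[j-1]
        for j in range(1, n + 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,                     # deletion
                dp[j - 1] + 1,                 # insertion
                prev + (a[i - 1] != b[j - 1])  # substitution or match
            )
    return dp[n]

assert wagner_fischer("kitten", "sitting") == 3
```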

Maximizing Model Generalization for Manufacturing with Self-Supervised Learning and Federated Learning

  • Authors: Matthew Russell, Peng Wang
  • Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.14398
  • Pdf link: https://arxiv.org/pdf/2304.14398
  • Abstract
    Deep Learning (DL) can diagnose faults and assess machine health from raw condition monitoring data without manually designed statistical features. However, practical manufacturing applications remain extremely difficult for existing DL methods. Machine data is often unlabeled and from very few health conditions (e.g., only normal operating data). Furthermore, models often encounter shifts in domain as process parameters change and new categories of faults emerge. Traditional supervised learning may struggle to learn compact, discriminative representations that generalize to these unseen target domains since it depends on having plentiful classes to partition the feature space with decision boundaries. Transfer Learning (TL) with domain adaptation attempts to adapt these models to unlabeled target domains but assumes similar underlying structure that may not be present if new faults emerge. This study proposes focusing on maximizing the feature generality on the source domain and applying TL via weight transfer to copy the model to the target domain. Specifically, Self-Supervised Learning (SSL) with Barlow Twins may produce more discriminative features for monitoring health condition than supervised learning by focusing on semantic properties of the data. Furthermore, Federated Learning (FL) for distributed training may also improve generalization by efficiently expanding the effective size and diversity of training data by sharing information across multiple client machines. Results show that Barlow Twins outperforms supervised learning in an unlabeled target domain with emerging motor faults when the source training data contains very few distinct categories. Incorporating FL may also provide a slight advantage by diffusing knowledge of health conditions between machines.
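
The Barlow Twins objective used in the study decorrelates embedding dimensions across two augmented views; a standard formulation looks like this (a sketch of the published loss, not the authors' exact code):

```python
import torch

def barlow_twins_loss(z1, z2, lam=5e-3):
    """z1, z2: (batch, dim) embeddings of two augmented views."""
    n, d = z1.shape
    z1 = (z1 - z1.mean(0)) / z1.std(0)       # normalize along the batch
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    c = (z1.T @ z2) / n                      # (d, d) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()               # invariance
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # redundancy
    return on_diag + lam * off_diag
```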

Keyword: faster

Physics-informed neural networks for predicting gas flow dynamics and unknown parameters in diesel engines

  • Authors: Kamaljyoti Nath, Xuhui Meng, Daniel J Smith, George Em Karniadakis
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13799
  • Pdf link: https://arxiv.org/pdf/2304.13799
  • Abstract
    This paper presents a physics-informed neural network (PINN) approach for monitoring the health of diesel engines. The aim is to evaluate the engine dynamics, identify unknown parameters in a "mean value" model, and anticipate maintenance requirements. The PINN model is applied to diesel engines with a variable-geometry turbocharger and exhaust gas recirculation, using measurement data of selected state variables. The results demonstrate the ability of the PINN model to predict simultaneously both unknown parameters and dynamics accurately with both clean and noisy data, and the importance of the self-adaptive weight in the loss function for faster convergence. The input data for these simulations are derived from actual engine running conditions, while the outputs are simulated data, making this a practical case study of PINN's ability to predict real-world dynamical systems. The mean value model of the diesel engine incorporates empirical formulae to represent certain states, but these formulae may not be generalizable to other engines. To address this, the study considers the use of deep neural networks (DNNs) in addition to the PINN model. The DNNs are trained using laboratory test data and are used to model the engine-specific empirical formulae in the mean value model, allowing for a more flexible and adaptive representation of the engine's states. In other words, the mean value model uses both the PINN model and the DNNs to represent the engine's states, with the PINN providing a physics-based understanding of the engine's overall dynamics and the DNNs offering a more engine-specific and adaptive representation of the empirical formulae. By combining these two approaches, the study aims to offer a comprehensive and versatile approach to monitoring the health and performance of diesel engines.
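
One common way to realize such self-adaptive loss weights is to make them learnable alongside the network, e.g. uncertainty-style weighting (an illustrative sketch; the paper's exact scheme may differ):

```python
import torch

# Learnable log-weights, optimized jointly with the PINN parameters.
log_w_data = torch.zeros(1, requires_grad=True)
log_w_phys = torch.zeros(1, requires_grad=True)

def weighted_pinn_loss(res_data, res_phys):
    # exp(-log_w) scales each term; adding log_w back penalizes weights
    # that simply shrink a term to zero, so the balance is learned.
    return (torch.exp(-log_w_data) * res_data.pow(2).mean() + log_w_data
            + torch.exp(-log_w_phys) * res_phys.pow(2).mean() + log_w_phys)
```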

A Survey on Solving and Discovering Differential Equations Using Deep Neural Networks

  • Authors: Hyeonjung (Tari)Jung, Jayant Gupta, Bharat Jayaprakash, Matthew Eagon, Harish Panneer Selvam, Carl Molnar, William Northrop, Shashi Shekhar
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.13807
  • Pdf link: https://arxiv.org/pdf/2304.13807
  • Abstract
    Ordinary and partial differential equations (DE) are used extensively in scientific and mathematical domains to model physical systems. Current literature has focused primarily on deep neural network (DNN) based methods for solving a specific DE or a family of DEs. Research communities with a history of using DE models may view DNN-based differential equation solvers (DNN-DEs) as a faster and transferable alternative to current numerical methods. However, there is a lack of systematic surveys detailing the use of DNN-DE methods across physical application domains and a generalized taxonomy to guide future research. This paper surveys and classifies previous works and provides an educational tutorial for senior practitioners, professionals, and graduate students in engineering and computer science. First, we propose a taxonomy to navigate domains of DE systems studied under the umbrella of DNN-DE. Second, we examine the theory and performance of the Physics Informed Neural Network (PINN) to demonstrate how the influential DNN-DE architecture mathematically solves a system of equations. Third, to reinforce the key ideas of solving and discovery of DEs using DNN, we provide a tutorial using DeepXDE, a Python package for developing PINNs, to develop DNN-DEs for solving and discovering a classic DE, the linear transport equation.
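
For the linear transport equation $u_t + c\,u_x = 0$ used as the tutorial example, a PINN forms the PDE residual with automatic differentiation; a generic PyTorch sketch (not the DeepXDE code from the survey):

```python
import torch

def transport_residual(model, x, t, c=1.0):
    """Residual u_t + c * u_x evaluated at collocation points (x, t);
    `model` maps (x, t) pairs to the scalar solution u."""
    x = x.detach().requires_grad_(True)
    t = t.detach().requires_grad_(True)
    u = model(torch.stack([x, t], dim=-1)).squeeze(-1)
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    return u_t + c * u_x  # driven to zero by the training loss
```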

Variational Bayes Made Easy

  • Authors: Mohammad Emtiyaz Khan
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.14251
  • Pdf link: https://arxiv.org/pdf/2304.14251
  • Abstract
    Variational Bayes is a popular method for approximate inference but its derivation can be cumbersome. To simplify the process, we give a 3-step recipe to identify the posterior form by explicitly looking for linearity with respect to expectations of well-known distributions. We can then directly write the update by simply "reading off" the terms in front of those expectations. The recipe makes the derivation easier, faster, shorter, and more general.
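
A toy instance of the recipe (our illustration): for $x_i \sim \mathcal{N}(\theta, 1)$, $i = 1, \dots, N$, with prior $\theta \sim \mathcal{N}(0, 1)$, the expected log-joint under $q(\theta)$ is

```latex
\[
  \mathbb{E}_{q}\big[\log p(x, \theta)\big]
  \;=\; \Big(\textstyle\sum_{i} x_{i}\Big)\,\mathbb{E}_{q}[\theta]
  \;-\; \frac{N+1}{2}\,\mathbb{E}_{q}[\theta^{2}] \;+\; \mathrm{const},
\]
```

which is linear in the expectations $\mathbb{E}_{q}[\theta]$ and $\mathbb{E}_{q}[\theta^{2}]$, so one can read off a Gaussian posterior with natural parameters $\big(\sum_i x_i,\, -(N+1)/2\big)$.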

Keyword: mobile

AI-based Predictive Analytic Approaches for safeguarding the Future of Electric/Hybrid Vehicles

  • Authors: Ishan Shivansh Bangroo
  • Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.13841
  • Pdf link: https://arxiv.org/pdf/2304.13841
  • Abstract
    In response to the global need for sustainable energy, green technology may help fight climate change. Before green infrastructure can be easily integrated into the world's energy system, it needs upgrading. By improving energy infrastructure and decision-making, artificial intelligence (AI) may help solve this challenge. Electric and hybrid vehicles (EHVs) have grown in popularity due to concerns about global warming and the need for more ecologically friendly transportation, and they may work better with cutting-edge technologies like AI. Electric vehicles (EVs) reduce greenhouse gas emissions and promote sustainable mobility, and are growing in popularity due to their benefits for climate change mitigation. Unfortunately, EV production consumes a lot of energy and materials, which may harm nature; it is therefore being improved using green technologies such as artificial intelligence and predictive analysis. EHVs may help meet the need for ecologically friendly transportation, but the Battery Management System (BMS) controls EHV performance and longevity, and AI may improve EHV energy efficiency, emissions reduction, and sustainability. The article also addresses EHV cybersecurity vulnerabilities such as remote hijacking, security breaches, and unauthorized access. AI research and development may help make transportation more sustainable, as may optimizing EHVs and charging infrastructure.

Detecting inner-LAN anomalies using hierarchical forecasting

  • Authors: Sevvandi Kandanaarachchi, Mahdi Abolghasemi, Hideya Ochiai, Asha Rao
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.13941
  • Pdf link: https://arxiv.org/pdf/2304.13941
  • Abstract
    Increasing online activity and numbers of connected devices are leading to more frequent and more diverse cyber attacks. This continuously evolving attack activity makes signature-based detection methods ineffective. Once malware has infiltrated into a LAN, bypassing an external gateway or entering via an unsecured mobile device, it can potentially infect all nodes in the LAN as well as carry out nefarious activities such as stealing valuable data, leading to financial damage and loss of reputation. Such infiltration could be viewed as an insider attack, increasing the need for LAN monitoring and security. In this paper we aim to detect such inner-LAN activity by studying the variations in Address Resolution Protocol (ARP) calls within the LAN. We find anomalous nodes by modelling inner-LAN traffic using hierarchical forecasting methods. We substantially reduce the false positives ever present in anomaly detection by using an extreme value theory based method. We use a dataset from a real inner-LAN monitoring project, containing over 10M ARP calls from 362 nodes. Furthermore, the small number of false positives generated using our methods is a potential solution to the "alert fatigue" commonly reported by security experts.

A Review of Panoptic Segmentation for Mobile Mapping Point Clouds

  • Authors: Binbin Xiang, Yuanwen Yue, Torben Peters, Konrad Schindler
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13980
  • Pdf link: https://arxiv.org/pdf/2304.13980
  • Abstract
    3D point cloud panoptic segmentation is the combined task to (i) assign each point to a semantic class and (ii) separate the points in each class into object instances. Recently there has been an increased interest in such comprehensive 3D scene understanding, building on the rapid advances of semantic segmentation due to the advent of deep 3D neural networks. Yet, to date there is very little work about panoptic segmentation of outdoor mobile-mapping data, and no systematic comparisons. The present paper tries to close that gap. It reviews the building blocks needed to assemble a panoptic segmentation pipeline and the related literature. Moreover, a modular pipeline is set up to perform comprehensive, systematic experiments to assess the state of panoptic segmentation in the context of street mapping. As a byproduct, we also provide the first public dataset for that task, by extending the NPM3D dataset to include instance labels.

A Supervised Machine Learning Approach to Operator Intent Recognition for Teleoperated Mobile Robot Navigation

  • Authors: Evangelos Tsagkournis, Dimitris Panagopoulos, Giannis Petousakis, Grigoris Nikolaou, Rustam Stolkin, Manolis Chiou
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14003
  • Pdf link: https://arxiv.org/pdf/2304.14003
  • Abstract
    In applications that involve human-robot interaction (HRI), human-robot teaming (HRT), and cooperative human-machine systems, the inference of the human partner's intent is of critical importance. This paper presents a method for the inference of the human operator's navigational intent, in the context of mobile robots that provide full or partial (e.g., shared control) teleoperation. We propose the Machine Learning Operator Intent Inference (MLOII) method, which a) processes spatial data collected by the robot's sensors; b) utilizes a supervised machine learning algorithm to estimate the operator's most probable navigational goal online. The proposed method's ability to reliably and efficiently infer the intent of the human operator is experimentally evaluated in realistically simulated exploration and remote inspection scenarios. The results in terms of accuracy and uncertainty indicate that the proposed method is comparable to another state-of-the-art method found in the literature.

MCLFIQ: Mobile Contactless Fingerprint Image Quality

  • Authors: Jannis Priesnitz, Axel Weißenfeld, Christian Rathgeb, Bernhard Strobl, Ralph Lessmann, Christoph Busch
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14123
  • Pdf link: https://arxiv.org/pdf/2304.14123
  • Abstract
    We propose MCLFIQ: Mobile Contactless Fingerprint Image Quality, the first quality assessment algorithm for mobile contactless fingerprint samples. To this end, we retrained the NIST Fingerprint Image Quality (NFIQ) 2 method, which was originally designed for contact-based fingerprints, with a synthetic contactless fingerprint database. We evaluate the predictive performance of the resulting MCLFIQ model in terms of Error-vs.-Discard Characteristic (EDC) curves on three real-world contactless fingerprint databases using two recognition algorithms. In experiments, the MCLFIQ method is compared against the original NFIQ 2 method and a sharpness-based quality assessment algorithm developed for contactless fingerprint images. The obtained results show that re-training NFIQ 2 on synthetic data is a viable alternative to training on real databases. Moreover, the evaluation shows that our MCLFIQ method works more accurately and robustly than NFIQ 2 and the sharpness-based quality assessment. We suggest considering the proposed MCLFIQ method as a candidate for a new standard algorithm for contactless fingerprint quality assessment.

Combining HoloLens with Instant-NeRFs: Advanced Real-Time 3D Mobile Mapping

  • Authors: Dennis Haitz, Boris Jutzi, Markus Ulrich, Miriam Jaeger, Patrick Huebner
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14301
  • Pdf link: https://arxiv.org/pdf/2304.14301
  • Abstract
    This work represents a large step into modern ways of fast 3D reconstruction based on RGB camera images. Utilizing a Microsoft HoloLens 2 as a multisensor platform that includes an RGB camera and an inertial measurement unit for SLAM-based camera-pose determination, we train a Neural Radiance Field (NeRF) as a neural scene representation in real-time with the acquired data from the HoloLens. The HoloLens is connected via Wifi to a high-performance PC that is responsible for the training and 3D reconstruction. After the data stream ends, the training is stopped and the 3D reconstruction is initiated, which extracts a point cloud of the scene. With our specialized inference algorithm, five million scene points can be extracted within 1 second. In addition, the point cloud also includes radiometry per point. Our method of 3D reconstruction outperforms grid point sampling with NeRFs by multiple orders of magnitude and can be regarded as a complete real-time 3D reconstruction method in a mobile mapping setup.

A Versatile Low-Complexity Feedback Scheme for FDD Systems via Generative Modeling

  • Authors: Nurettin Turan, Benedikt Fesl, Michael Koller, Michael Joham, Wolfgang Utschick
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.14373
  • Pdf link: https://arxiv.org/pdf/2304.14373
  • Abstract
    In this work, we propose a versatile feedback scheme which can be deployed for both single- and multi-user multiple-input multiple-output (MIMO) frequency division duplex (FDD) systems. Particularly, we propose to use a Gaussian mixture model (GMM) with a reduced number of parameters for codebook construction, feedback encoding, and precoder design. The GMM is fitted offline at the base station (BS) to uplink (UL) training samples to approximate the channel distribution of all possible mobile terminals (MTs) located inside the BS cell. Afterwards, a codebook is constructed, where each codebook entry is based on one GMM component. By extracting directional information of the constructed codebook, the proposed GMM-based feedback approach allows to jointly design the precoders of a multi-user MIMO (MU-MIMO) system using common precoding algorithms. Alternatively, the GMM's sample generation ability can be utilized to design the precoders using a state-of-the-art stochastic iterative algorithm. After offloading the GMM to the MTs, they determine their feedback simply as the index of the GMM component with the highest responsibility for their received pilot signal. This strategy exhibits low complexity and allows for parallelization. Simulation results show that the proposed approach outperforms conventional methods, especially for a reduced number of pilots.
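
The feedback step reduces to reporting the index of the most responsible GMM component; an illustrative sketch with scikit-learn (real channels and pilot observations are complex-valued, so the random data here is only a shape-level stand-in):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
H_train = rng.standard_normal((10_000, 16))   # stand-in for UL training samples

# Offline at the BS: fit the GMM that approximates the channel distribution.
gmm = GaussianMixture(n_components=64, covariance_type="full").fit(H_train)

# At the MT: feedback = index of the component with highest responsibility.
y_pilot = rng.standard_normal((1, 16))        # stand-in for a received pilot
feedback_index = int(gmm.predict(y_pilot)[0])
```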

Keyword: pruning

Fine Tuning with Abnormal Examples

  • Authors: Will Rieger
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.13783
  • Pdf link: https://arxiv.org/pdf/2304.13783
  • Abstract
    Given the prevalence of crowd-sourced labor in creating natural language processing datasets, these datasets have become increasingly large. For instance, the SQuAD dataset currently sits at over 80,000 records. However, because the English language is rather repetitive in structure, the distribution of word frequencies in the SQuAD dataset's contexts is relatively unchanged. By measuring each sentence's distance from the covariate distribution of frequencies of all sentences in the dataset, we identify 10,500 examples that create a more uniform distribution for training. Fine-tuning ELECTRA [4] on this subset of examples reaches better performance than a model trained on all 87,000 examples. Herein we introduce a methodology for systematically pruning datasets for fine-tuning, reaching better out-of-sample performance.

JaxPruner: A concise library for sparsity research

  • Authors: Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Karolina Dziugaite, Pablo Samuel Castro, Utku Evci
  • Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.14082
  • Pdf link: https://arxiv.org/pdf/2304.14082
  • Abstract
    This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which, in turn, enables easy integration with existing JAX based libraries. We demonstrate this ease of integration by providing examples in four different codebases: Scenic, t5x, Dopamine and FedJAX and provide baseline experiments on popular benchmarks.
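
To illustrate the kind of algorithm such a library provides, here is global magnitude pruning written in plain JAX (our sketch of the technique, not JaxPruner's actual API):

```python
import jax
import jax.numpy as jnp

def magnitude_prune(params, sparsity=0.9):
    """Zero out the globally smallest-magnitude weights in a pytree."""
    leaves, treedef = jax.tree_util.tree_flatten(params)
    flat = jnp.concatenate([jnp.abs(l).ravel() for l in leaves])
    threshold = jnp.quantile(flat, sparsity)       # global magnitude cutoff
    pruned = [l * (jnp.abs(l) >= threshold) for l in leaves]
    return jax.tree_util.tree_unflatten(treedef, pruned)
```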

Keyword: voxel

There is no result

Keyword: lidar

Human Semantic Segmentation using Millimeter-Wave Radar Sparse Point Clouds

  • Authors: Pengfei Song, Luoyu MEI, Han Cheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); General Topology (math.GN)
  • Arxiv link: https://arxiv.org/abs/2304.14132
  • Pdf link: https://arxiv.org/pdf/2304.14132
  • Abstract
    This paper presents a framework for semantic segmentation on sparse sequential point clouds of millimeter-wave radar. Compared with cameras and lidars, millimeter-wave radars have the advantage of not revealing privacy, having strong anti-interference ability, and having long detection distance. However, the sparsity of mmWave data and the difficulty of capturing its temporal-topological features prevent previous advanced segmentation methods (e.g., PointNet, PointCNN, Point Transformer) from being well utilized in practical scenarios. To address the challenges caused by the sparsity and temporal-topological features of the data, we (i) introduce graph structure and topological features to the point cloud and (ii) propose a semantic segmentation framework including a global feature-extracting module and a sequential feature-extracting module. In addition, we design an efficient and better-fitting loss function, based on graph clustering, for a better training process and segmentation results. Experimentally, we deploy representative semantic segmentation algorithms (Transformer, GCNN, etc.) on a custom dataset. Experimental results indicate that our model achieves a mean accuracy of $\mathbf{82.31}\%$ on the custom dataset and outperforms the state-of-the-art algorithms. Moreover, to validate the model's robustness, we deploy our model on the well-known S3DIS dataset, where it achieves a mean accuracy of $\mathbf{92.6}\%$, outperforming baseline algorithms.

Quadric Representations for LiDAR Odometry, Mapping and Localization

  • Authors: Chao Xia, Chenfeng Xu, Patrick Rim, Mingyu Ding, Nanning Zheng, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14190
  • Pdf link: https://arxiv.org/pdf/2304.14190
  • Abstract
    Current LiDAR odometry, mapping and localization methods leverage point-wise representations of 3D scenes and achieve high accuracy in autonomous driving tasks. However, the space-inefficiency of methods that use point-wise representations limits their development and usage in practical applications. In particular, scan-submap matching and global map representation methods are restricted by the inefficiency of nearest neighbor searching (NNS) for large-volume point clouds. To improve space-time efficiency, we propose a novel method of describing scenes using quadric surfaces, which are far more compact representations of 3D objects than conventional point clouds. In contrast to point cloud-based methods, our quadric representation-based method decomposes a 3D scene into a collection of sparse quadric patches, which improves storage efficiency and avoids the slow point-wise NNS process. Our method first segments a given point cloud into patches and fits each of them to a quadric implicit function. Each function is then coupled with other geometric descriptors of the patch, such as its center position and covariance matrix. Collectively, these patch representations fully describe a 3D scene, which can be used in place of the original point cloud and employed in LiDAR odometry, mapping and localization algorithms. We further design a novel incremental growing method for quadric representations, which eliminates the need to repeatedly re-fit quadric surfaces from the original point cloud. Extensive odometry, mapping and localization experiments on large-volume point clouds in the KITTI and UrbanLoco datasets demonstrate that our method maintains low latency and memory utility while achieving competitive, and even superior, accuracy.
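
Fitting an implicit quadric to a patch is a small homogeneous least-squares problem; a standard formulation sketching the idea (our illustration, not the paper's code):

```python
import numpy as np

def fit_quadric(points):
    """Fit q(x, y, z) = 0 over the 10 quadric monomials to an (N, 3)
    patch; the coefficient vector is the right-singular vector of the
    design matrix with the smallest singular value."""
    x, y, z = points.T
    M = np.column_stack([x*x, y*y, z*z, x*y, x*z, y*z,
                         x, y, z, np.ones_like(x)])
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    return vt[-1]  # 10 quadric coefficients, up to scale
```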

A Survey on Approximate Edge AI for Energy Efficient Autonomous Driving Services

  • Authors: Dewant Katare, Diego Perino, Jari Nurmi, Martijn Warnier, Marijn Janssen, Aaron Yi Ding
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14271
  • Pdf link: https://arxiv.org/pdf/2304.14271
  • Abstract
    Autonomous driving services rely heavily on sensors such as cameras, LiDAR, radar, and communication modules. A common practice of processing the sensed data is using a high-performance computing unit placed inside the vehicle, which deploys AI models and algorithms to act as the brain or administrator of the vehicle. The vehicular data generated from average hours of driving can be up to 20 Terabytes depending on the data rate and specification of the sensors. Given the scale and fast growth of services for autonomous driving, it is essential to improve the overall energy and environmental efficiency, especially in the trend towards vehicular electrification (e.g., battery-powered). Although the areas have seen significant advancements in sensor technologies, wireless communications, computing and AI/ML algorithms, the challenge still exists in how to apply and integrate those technology innovations to achieve energy efficiency. This survey reviews and compares the connected vehicular applications, vehicular communications, approximation and Edge AI techniques. The focus is on energy efficiency by covering newly proposed approximation and enabling frameworks. To the best of our knowledge, this survey is the first to review the latest approximate Edge AI frameworks and publicly available datasets in energy-efficient autonomous driving. The insights and vision from this survey can be beneficial for the collaborative driving service development on low-power and memory-constrained systems and also for the energy optimization of autonomous vehicles.

SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection

  • Authors: Yichen Xie, Chenfeng Xu, Marie-Julie Rakotosaona, Patrick Rim, Federico Tombari, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14340
  • Pdf link: https://arxiv.org/pdf/2304.14340
  • Abstract
    By identifying four important components of existing LiDAR-camera 3D object detection methods (LiDAR and camera candidates, transformation, and fusion outputs), we observe that all existing methods either find dense candidates or yield dense representations of scenes. However, given that objects occupy only a small part of a scene, finding dense candidates and generating dense representations is noisy and inefficient. We propose SparseFusion, a novel multi-sensor 3D detection method that exclusively uses sparse candidates and sparse representations. Specifically, SparseFusion utilizes the outputs of parallel detectors in the LiDAR and camera modalities as sparse candidates for fusion. We transform the camera candidates into the LiDAR coordinate space by disentangling the object representations. Then, we can fuse the multi-modality candidates in a unified 3D space by a lightweight self-attention module. To mitigate negative transfer between modalities, we propose novel semantic and geometric cross-modality transfer modules that are applied prior to the modality-specific detectors. SparseFusion achieves state-of-the-art performance on the nuScenes benchmark while also running at the fastest speed, even outperforming methods with stronger backbones. We perform extensive experiments to demonstrate the effectiveness and efficiency of our modules and overall method pipeline. Our code will be made publicly available at https://github.com/yichen928/SparseFusion.

SMAT: A Self-Reinforcing Framework for Simultaneous Mapping and Tracking in Unbounded Urban Environments

  • Authors: Tingxiang Fan, Bowen Shen, Yinqiang Zhang, Chuye Zhang, Lei Yang, Hua Chen, Wei Zhang, Jia Pan
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14356
  • Pdf link: https://arxiv.org/pdf/2304.14356
  • Abstract
    With the increasing prevalence of robots in daily life, it is crucial to enable robots to construct a reliable map online to navigate in unbounded and changing environments. Although existing methods can individually achieve the goals of spatial mapping and dynamic object detection and tracking, limited research has been conducted on an effective combination of these two important abilities. The proposed framework, SMAT (Simultaneous Mapping and Tracking), integrates the front-end dynamic object detection and tracking module with the back-end static mapping module using a self-reinforcing mechanism, which promotes mutual improvement of mapping and tracking performance. The conducted experiments demonstrate the framework's effectiveness in real-world applications, achieving successful long-range navigation and mapping in multiple urban environments using only one LiDAR, a CPU-only onboard computer, and a consumer-level GPS receiver.

Keyword: diffusion

Towards ethical multimodal systems

  • Authors: Alexis Roger, Esma Aïmeur, Irina Rish
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13765
  • Pdf link: https://arxiv.org/pdf/2304.13765
  • Abstract
    The impact of artificial intelligence systems on our society is increasing at an unprecedented speed. For instance, ChatGPT is being tested in mental health treatment applications such as Koko, Stable Diffusion generates pieces of art competitive with (or outperforming) human artists, and so on. Ethical concerns regarding the behavior and applications of generative AI systems have been increasing over the past years, and the field of AI alignment - steering the behavior of AI systems towards being aligned with human values - is a rapidly growing subfield of modern AI. In this paper, we address the challenges involved in ethical evaluation of a multimodal artificial intelligence system. The multimodal systems we focus on take both text and an image as input and output text, completing the sentence or answering the question asked as input. We perform the evaluation of these models in two steps: we first discuss the creation of a multimodal ethical database and then use this database to construct morality-evaluating algorithms. The creation of the multimodal ethical database is done interactively through human feedback. Users are presented with multiple examples and vote on whether they are ethical or not. Once these answers have been aggregated into a dataset, we build and test different algorithms to automatically evaluate the morality of multimodal systems. These algorithms aim to classify the answers as ethical or not. The models we test are a RoBERTa-large classifier and a multilayer perceptron classifier.

Preserving Superconvergence of Spectral Elements for Curved Domains via $h$ and $p$-Geometric Refinement

  • Authors: Jacob Jones, Rebecca Conley, Xiangmin Jiao
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13766
  • Pdf link: https://arxiv.org/pdf/2304.13766
  • Abstract
    Spectral element methods (SEM), which are extensions of finite element methods (FEM), are important emerging techniques for solving partial differential equations in physics and engineering. SEM can potentially deliver better accuracy due to the potential superconvergence for well-shaped tensor-product elements. However, for complex geometries, the accuracy of SEM often degrades due to a combination of geometric inaccuracies near curved boundaries and the loss of superconvergence with simplicial or non-tensor-product elements. We propose to overcome the first issue by using $h$- and $p$-geometric refinement, to refine the mesh near high-curvature regions and increase the degree of geometric basis functions, respectively. We show that when using mixed-meshes with tensor-product elements in the interior of the domain, curvature-based geometric refinement near boundaries can improve the accuracy of the interior elements by reducing pollution errors and preserving the superconvergence. To overcome the second issue, we apply a post-processing technique to recover the accuracy near the curved boundaries by using the adaptive extended stencil finite element method (AES-FEM). The combination of curvature-based geometric refinement and accurate post-processing delivers an effective and easier-to-implement alternative to other methods based on exact geometries. We demonstrate our techniques by solving the convection-diffusion equation in 2D and show one to two orders of magnitude of improvement in the solution accuracy, even when the elements are poorly shaped near boundaries.

Multimodal Composite Association Score: Measuring Gender Bias in Generative Multimodal Models

  • Authors: Abhishek Mandal, Susan Leavy, Suzanne Little
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13855
  • Pdf link: https://arxiv.org/pdf/2304.13855
  • Abstract
    Generative multimodal models based on diffusion models have seen tremendous growth and advances in recent years. Models such as DALL-E and Stable Diffusion have become increasingly popular and successful at creating images from texts, often combining abstract ideas. However, like other deep learning models, they also reflect social biases they inherit from their training data, which is often crawled from the internet. Manually auditing models for biases can be very time and resource consuming and is further complicated by the unbounded and unconstrained nature of inputs these models can take. Research into bias measurement and quantification has generally focused on small single-stage models working on a single modality. Thus the emergence of multistage multimodal models requires a different approach. In this paper, we propose Multimodal Composite Association Score (MCAS) as a new method of measuring gender bias in multimodal generative models. Evaluating both DALL-E 2 and Stable Diffusion using this approach uncovered the presence of gendered associations of concepts embedded within the models. We propose MCAS as an accessible and scalable method of quantifying potential bias for models with different modalities and a range of potential biases.
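
At its core, an association score of this kind is a differential cosine similarity between a concept embedding and two sets of gendered attribute embeddings. The sketch below shows that building block, assuming a hypothetical embed() function standing in for a CLIP-style encoder; MCAS itself composes several such scores across the stages and modalities of the generative pipeline.

```python
# WEAT-style association between a concept and gendered attribute terms.
# embed() is a hypothetical encoder (e.g., a CLIP text tower) returning
# 1-D numpy vectors; it is not part of the paper's released code.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(concept, male_terms, female_terms, embed):
    c = embed(concept)
    s_male = np.mean([cosine(c, embed(t)) for t in male_terms])
    s_female = np.mean([cosine(c, embed(t)) for t in female_terms])
    return s_male - s_female  # > 0: male-leaning; < 0: female-leaning
```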

Two kinds of numerical algorithms for ultra-slow diffusion equations

  • Authors: Min Cai, Changpin Li, Yu Wang
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13966
  • Pdf link: https://arxiv.org/pdf/2304.13966
  • Abstract
    In this article, two kinds of numerical algorithms are derived for the ultra-slow (or superslow) diffusion equation in one and two space dimensions, where the ultra-slow diffusion is characterized by the Caputo-Hadamard fractional derivative of order $\alpha \in (0,1)$. To describe the spatial interaction, the Riesz fractional derivative and the fractional Laplacian are used in one and two space dimensions, respectively. The Caputo-Hadamard derivative is discretized by two typical approximate formulae, i.e., the L2-1$_\sigma$ and L1-2 methods. The spatial fractional derivatives are discretized by second-order finite difference methods. When the L2-1$_\sigma$ discretization is used, the derived numerical scheme is unconditionally stable with error estimate $\mathcal{O}(\tau^{2}+h^{2})$ for all $\alpha \in (0, 1)$, in which $\tau$ and $h$ are the temporal and spatial stepsizes, respectively. When the L1-2 discretization is used, the derived numerical scheme is stable with error estimate $\mathcal{O}(\tau^{3-\alpha}+h^{2})$ for $\alpha \in (0, 0.3738)$. The illustrative examples displayed are in line with the theoretical analysis.
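
For reference, the Caputo-Hadamard fractional derivative of order $\alpha \in (0,1)$ that characterizes the ultra-slow diffusion is commonly defined (up to minor differences in convention) as follows; its logarithmic kernel is what produces the ultra-slow relaxation in time.

```latex
% Caputo-Hadamard fractional derivative of order \alpha \in (0,1)
{}_{CH}D^{\alpha}_{a,t}\, u(t)
  = \frac{1}{\Gamma(1-\alpha)} \int_{a}^{t}
      \left( \log \frac{t}{s} \right)^{-\alpha} u'(s)\, \mathrm{d}s,
  \qquad t > a > 0.
```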

Edit Everything: A Text-Guided Generative System for Images Editing

  • Authors: Defeng Xie, Ruichen Wang, Jian Ma, Chen Chen, Haonan Lu, Dong Yang, Fobo Shi, Xiaodong Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14006
  • Pdf link: https://arxiv.org/pdf/2304.14006
  • Abstract
    We introduce a new generative system called Edit Everything, which can take image and text inputs and produce image outputs. Edit Everything allows users to edit images using simple text instructions. Our system designs prompts to guide the visual module in generating requested images. Experiments demonstrate that Edit Everything facilitates the implementation of the visual aspects of Stable Diffusion with the use of the Segment Anything model and CLIP. Our system is publicly available at https://github.com/DefengXie/Edit_Everything.
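
A plausible reading of such a pipeline is segment-then-inpaint: SAM proposes masks, CLIP picks the mask matching a source prompt, and Stable Diffusion inpainting rewrites that region per a target prompt. The sketch below illustrates this flow; segment_with_sam() and pick_mask_with_clip() are hypothetical helpers, so consult the authors' repository for the actual implementation.

```python
# Hedged sketch of a segment-then-inpaint editing flow; not the authors' code.
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

def edit(image: Image.Image, source_prompt: str, target_prompt: str,
         segment_with_sam, pick_mask_with_clip):
    masks = segment_with_sam(image)                          # SAM mask proposals
    mask = pick_mask_with_clip(image, masks, source_prompt)  # CLIP-scored match
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting")
    # Rewrite only the selected region according to the target prompt.
    return pipe(prompt=target_prompt, image=image, mask_image=mask).images[0]
```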

Localized orthogonal decomposition for a multiscale parabolic stochastic partial differential equation

  • Authors: Annika Lang, Per Ljung, Axel Målqvist
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14049
  • Pdf link: https://arxiv.org/pdf/2304.14049
  • Abstract
    A multiscale method is proposed for a parabolic stochastic partial differential equation with additive noise and highly oscillatory diffusion. The framework is based on the localized orthogonal decomposition (LOD) method and computes a coarse-scale representation of the elliptic operator, enriched by fine-scale information on the diffusion. Optimal order strong convergence is derived. The LOD technique is combined with a (multilevel) Monte-Carlo estimator and the weak error is analyzed. Numerical examples that confirm the theoretical findings are provided, and the computational efficiency of the method is highlighted.

DataComp: In search of the next generation of multimodal datasets

  • Authors: Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14108
  • Pdf link: https://arxiv.org/pdf/2304.14108
  • Abstract
    Large multimodal datasets have been instrumental in recent breakthroughs such as CLIP, Stable Diffusion, and GPT-4. At the same time, datasets rarely receive the same research attention as model architectures or training algorithms. To address this shortcoming in the machine learning ecosystem, we introduce DataComp, a benchmark where the training code is fixed and researchers innovate by proposing new training sets. We provide a testbed for dataset experiments centered around a new candidate pool of 12.8B image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing on 38 downstream test sets. Our benchmark consists of multiple scales, with four candidate pool sizes and associated compute budgets ranging from 12.8M to 12.8B samples seen during training. This multi-scale design facilitates the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow is a promising way of improving multimodal datasets. We introduce DataComp-1B, a dataset created by applying a simple filtering algorithm to the 12.8B candidate pool. The resulting 1.4B subset enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet. Our new ViT-L/14 model outperforms a larger ViT-g/14 trained on LAION-2B by 0.7 percentage points while requiring 9x less training compute. We also outperform OpenAI's CLIP ViT-L/14 by 3.7 percentage points, which is trained with the same compute budget as our model. These gains highlight the potential for improving model performance by carefully curating training sets. We view DataComp-1B as only the first step and hope that DataComp paves the way toward the next generation of multimodal datasets.
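
A simple filtering baseline of the kind the benchmark invites is thresholding on CLIP image-text agreement. A sketch under assumptions: embed_image and embed_text stand in for a pretrained CLIP model, and the threshold value is illustrative rather than the one used for DataComp-1B.

```python
# Sketch: keep image-text pairs whose CLIP embeddings agree strongly.
import numpy as np

def clip_score(img_emb, txt_emb):
    return float(img_emb @ txt_emb /
                 (np.linalg.norm(img_emb) * np.linalg.norm(txt_emb)))

def filter_pool(pairs, embed_image, embed_text, threshold=0.28):
    # pairs: iterable of (image, caption); encoders are hypothetical stand-ins
    return [(img, cap) for img, cap in pairs
            if clip_score(embed_image(img), embed_text(cap)) >= threshold]
```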

Functional Diffusion Maps

  • Authors: María Barroso, Carlos María Alaíz, Ángela Fernández, Jose Luis Torrecilla
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.14378
  • Pdf link: https://arxiv.org/pdf/2304.14378
  • Abstract
    Nowadays many real-world datasets can be considered as functional, in the sense that the processes which generate them are continuous. A fundamental property of this type of data is that in theory they belong to an infinite-dimensional space. Although in practice we usually receive finite observations, they are still high-dimensional and hence dimensionality reduction methods are crucial. In this vein, the main state-of-the-art method for functional data analysis is Functional PCA. Nevertheless, this classic technique assumes that the data lie in a linear manifold, and hence it could have problems when this hypothesis is not fulfilled. In this research, attention has been placed on a non-linear manifold learning method: Diffusion Maps. The article explains how to extend this multivariate method to functional data and compares its behavior against Functional PCA over different simulated and real examples.
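
For orientation, the multivariate Diffusion Maps algorithm that the paper extends to functional data reduces to three steps: build a Gaussian affinity kernel, row-normalize it into a Markov transition matrix, and embed the points with its leading non-trivial eigenvectors. A minimal numpy sketch with illustrative parameter values:

```python
# Minimal multivariate Diffusion Maps; the functional extension replaces the
# Euclidean distance with a distance between functions (e.g., an L2 norm).
import numpy as np

def diffusion_maps(X, epsilon=1.0, n_components=2, t=1):
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq_dists / epsilon)          # Gaussian affinity kernel
    P = K / K.sum(axis=1, keepdims=True)     # row-stochastic Markov matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)           # decreasing eigenvalues
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Drop the trivial constant eigenvector (eigenvalue 1).
    return vecs[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
```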

Motion-Conditioned Diffusion Model for Controllable Video Synthesis

  • Authors: Tsai-Shien Chen, Chieh Hubert Lin, Hung-Yu Tseng, Tsung-Yi Lin, Ming-Hsuan Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14404
  • Pdf link: https://arxiv.org/pdf/2304.14404
  • Abstract
    Recent advancements in diffusion models have greatly improved the quality and diversity of synthesized content. To harness the expressive power of diffusion models, researchers have explored various controllable mechanisms that allow users to intuitively guide the content synthesis process. Although the latest efforts have primarily focused on video synthesis, there has been a lack of effective methods for controlling and describing desired content and motion. In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis. To tackle the ambiguity of sparse motion inputs and achieve better synthesis quality, MCDiff first utilizes a flow completion model to predict the dense video motion based on the semantic understanding of the video frame and the sparse motion control. Then, the diffusion model synthesizes high-quality future frames to form the output video. We qualitatively and quantitatively show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis. Additional experiments on MPII Human Pose further exhibit the capability of our model on diverse content and motion synthesis.

Putting People in Their Place: Affordance-Aware Human Insertion into Scenes

  • Authors: Sumith Kulal, Tim Brooks, Alex Aiken, Jiajun Wu, Jimei Yang, Jingwan Lu, Alexei A. Efros, Krishna Kumar Singh
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14406
  • Pdf link: https://arxiv.org/pdf/2304.14406
  • Abstract
    We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes. Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances. Our model can infer the set of realistic poses given the scene context, re-pose the reference person, and harmonize the composition. We set up the task in a self-supervised fashion by learning to re-pose humans in video clips. We train a large-scale diffusion model on a dataset of 2.4M video clips that produces diverse plausible poses while respecting the scene context. Given the learned human-scene composition, our model can also hallucinate realistic people and scenes when prompted without conditioning and also enables interactive editing. A quantitative evaluation shows that our method synthesizes more realistic human appearance and more natural human-scene interactions than prior work.

Keyword: dynamic

TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation

  • Authors: Zhaoyan Liu, Noel Vouitsis, Satya Krishna Gorti, Jimmy Ba, Gabriel Loaiza-Ganem
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.13742
  • Pdf link: https://arxiv.org/pdf/2304.13742
  • Abstract
    We propose TR0N, a highly general framework to turn pre-trained unconditional generative models, such as GANs and VAEs, into conditional models. The conditioning can be highly arbitrary, and requires only a pre-trained auxiliary model. For example, we show how to turn unconditional models into class-conditional ones with the help of a classifier, and also into text-to-image models by leveraging CLIP. TR0N learns a lightweight stochastic mapping which "translates" between the space of conditions and the latent space of the generative model, in such a way that the generated latent corresponds to a data sample satisfying the desired condition. The translated latent samples are then further improved upon through Langevin dynamics, enabling us to obtain higher-quality data samples. TR0N requires no training data nor fine-tuning, yet can achieve a zero-shot FID of 10.9 on MS-COCO, outperforming competing alternatives not only on this metric, but also in sampling speed -- all while retaining a much higher level of generality. Our code is available at https://github.com/layer6ai-labs/tr0n.
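
The Langevin refinement step mentioned above is standard: repeatedly nudge the latent along the gradient of a log-density while injecting Gaussian noise. A minimal sketch, where log_p is a placeholder for whatever (unnormalized) differentiable log-density TR0N actually optimizes:

```python
# Unadjusted Langevin dynamics on a latent z; log_p is a hypothetical
# differentiable log-density, not TR0N's actual objective.
import torch

def langevin_refine(z, log_p, steps=20, step_size=1e-2):
    for _ in range(steps):
        z = z.detach().requires_grad_(True)
        grad = torch.autograd.grad(log_p(z).sum(), z)[0]
        z = z + 0.5 * step_size * grad \
              + (step_size ** 0.5) * torch.randn_like(z)
    return z.detach()
```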

Physics-informed neural networks for predicting gas flow dynamics and unknown parameters in diesel engines

  • Authors: Kamaljyoti Nath, Xuhui Meng, Daniel J Smith, George Em Karniadakis
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13799
  • Pdf link: https://arxiv.org/pdf/2304.13799
  • Abstract
    This paper presents a physics-informed neural network (PINN) approach for monitoring the health of diesel engines. The aim is to evaluate the engine dynamics, identify unknown parameters in a "mean value" model, and anticipate maintenance requirements. The PINN model is applied to diesel engines with a variable-geometry turbocharger and exhaust gas recirculation, using measurement data of selected state variables. The results demonstrate the ability of the PINN model to predict simultaneously both unknown parameters and dynamics accurately with both clean and noisy data, and the importance of the self-adaptive weight in the loss function for faster convergence. The input data for these simulations are derived from actual engine running conditions, while the outputs are simulated data, making this a practical case study of PINN's ability to predict real-world dynamical systems. The mean value model of the diesel engine incorporates empirical formulae to represent certain states, but these formulae may not be generalizable to other engines. To address this, the study considers the use of deep neural networks (DNNs) in addition to the PINN model. The DNNs are trained using laboratory test data and are used to model the engine-specific empirical formulae in the mean value model, allowing for a more flexible and adaptive representation of the engine's states. In other words, the mean value model uses both the PINN model and the DNNs to represent the engine's states, with the PINN providing a physics-based understanding of the engine's overall dynamics and the DNNs offering a more engine-specific and adaptive representation of the empirical formulae. By combining these two approaches, the study aims to offer a comprehensive and versatile approach to monitoring the health and performance of diesel engines.
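
One common way to realize self-adaptive loss weights (the paper's exact scheme may differ) is to give each residual term a learnable log-weight trained jointly with the network, with a regularizer that prevents the weights from collapsing to zero:

```python
# Sketch: uncertainty-style self-adaptive weighting of PINN loss terms.
# residual_losses is a hypothetical list of scalar losses, e.g.
# [data mismatch, ODE residual, initial/boundary residual].
import torch

s = torch.zeros(3, requires_grad=True)      # one log-weight per loss term
opt = torch.optim.Adam([s], lr=1e-3)        # in practice, joint with the PINN

def weighted_loss(residual_losses):
    losses = torch.stack(residual_losses)
    # exp(-s) scales each term; the +s penalty stops exp(-s) -> 0.
    return (torch.exp(-s) * losses + s).sum()
```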

A Data-Driven Hybrid Automaton Framework to Modeling Complex Dynamical Systems

  • Authors: Yejiang Yang, Zihao Mo, Weiming Xiang
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.13811
  • Pdf link: https://arxiv.org/pdf/2304.13811
  • Abstract
    In this paper, a computationally efficient data-driven hybrid automaton model is proposed to capture unknown complex dynamical system behaviors using multiple neural networks. The sampled data of the system is divided by valid partitions into groups corresponding to their topologies, based on which transition guards are defined. Then, a collection of small-scale neural networks that are computationally efficient are trained as the local dynamical descriptions for their corresponding topologies. After modeling the system with a neural-network-based hybrid automaton, set-valued reachability analysis with low computation cost is provided based on interval analysis and a split-and-combine process. Finally, a numerical example of the limit cycle is presented to illustrate that the developed models can significantly reduce the computational cost in reachable set computation without sacrificing any modeling precision.

Controlled density transport using Perron Frobenius generators

  • Authors: Jake Buzhardt, Phanindra Tallapragada
  • Subjects: Systems and Control (eess.SY); Robotics (cs.RO); Fluid Dynamics (physics.flu-dyn)
  • Arxiv link: https://arxiv.org/abs/2304.13829
  • Pdf link: https://arxiv.org/pdf/2304.13829
  • Abstract
    We consider the problem of the transport of a density of states from an initial state distribution to a desired final state distribution through a dynamical system with actuation. In particular, we consider the case where the control signal is a function of time, but not space; that is, the same actuation is applied at every point in the state space. This is motivated by several problems in fluid mechanics, such as mixing and manipulation of a collection of particles by a global control input such as a uniform magnetic field, as well as by more general control problems where a density function describes an uncertainty distribution or a distribution of agents in a multi-agent system. We formulate this problem using the generators of the Perron-Frobenius operator associated with the drift and control vector fields of the system. By considering finite-dimensional approximations of these operators, the density transport problem can be expressed as a control problem for a bilinear system in a high-dimensional, lifted state. With this system, we frame the density control problem as a problem of driving moments of the density function to the moments of a desired density function, where the moments of the density can be expressed as an output which is linear in the lifted state. This output tracking problem for the lifted bilinear system is then solved using differential dynamic programming, an iterative trajectory optimization scheme.

Understand the Dynamic World: An End-to-End Knowledge Informed Framework for Open Domain Entity State Tracking

  • Authors: Mingchen Li, Lifu Huang
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13854
  • Pdf link: https://arxiv.org/pdf/2304.13854
  • Abstract
    Open domain entity state tracking aims to predict reasonable state changes of entities (i.e., [attribute] of [entity] was [before_state] and [after_state] afterwards) given the action descriptions. It's important to many reasoning tasks to support human everyday activities. However, it's challenging as the model needs to predict an arbitrary number of entity state changes caused by the action while most of the entities are implicitly relevant to the actions and their attributes as well as states are from open vocabularies. To tackle these challenges, we propose a novel end-to-end Knowledge Informed framework for open domain Entity State Tracking, namely KIEST, which explicitly retrieves the relevant entities and attributes from external knowledge graph (i.e., ConceptNet) and incorporates them to autoregressively generate all the entity state changes with a novel dynamic knowledge grained encoder-decoder framework. To enforce the logical coherence among the predicted entities, attributes, and states, we design a new constraint decoding strategy and employ a coherence reward to improve the decoding process. Experimental results show that our proposed KIEST framework significantly outperforms the strong baselines on the public benchmark dataset OpenPI.

Ensoul: A framework for the creation of self organizing intelligent ultra low power systems (SOULS) through evolutionary enerstatic networks

  • Authors: Ty Roachford
  • Subjects: Artificial Intelligence (cs.AI); Adaptation and Self-Organizing Systems (nlin.AO)
  • Arxiv link: https://arxiv.org/abs/2304.13863
  • Pdf link: https://arxiv.org/pdf/2304.13863
  • Abstract
    Ensoul is a framework proposed for the purpose of creating technologies that create more technologies through the combined use of networks, and nests, of energy homeostatic (enerstatic) loops and open-ended evolutionary techniques. Generative technologies developed by such an approach serve as both simple, yet insightful models of thermodynamically driven complex systems and as powerful sources of novel technologies. "Self Organizing intelligent Ultra Low power Systems" (SOULS) is a term that well describes the technologies produced by such a generative technology, as well as the generative technology itself. The term is meant to capture the abstract nature of such technologies as being independent of the substrate in which they are embedded. In other words, SOULS can be biological, artificial or hybrid in form.

Physics-informed Data-driven Discovery of Constitutive Models with Application to Strain-Rate-sensitive Soft Materials

  • Authors: Kshitiz Upadhyay, Jan N. Fuhg, Nikolaos Bouklas, K.T. Ramesh
  • Subjects: Computational Engineering, Finance, and Science (cs.CE); Materials Science (cond-mat.mtrl-sci); Soft Condensed Matter (cond-mat.soft)
  • Arxiv link: https://arxiv.org/abs/2304.13897
  • Pdf link: https://arxiv.org/pdf/2304.13897
  • Abstract
    A novel data-driven constitutive modeling approach is proposed, which combines the physics-informed nature of modeling based on continuum thermodynamics with the benefits of machine learning. This approach is demonstrated on strain-rate-sensitive soft materials. This model is based on the viscous dissipation-based visco-hyperelasticity framework where the total stress is decomposed into volumetric, isochoric hyperelastic, and isochoric viscous overstress contributions. It is shown that each of these stress components can be written as linear combinations of the components of an irreducible integrity basis. Three Gaussian process regression-based surrogate models are trained (one per stress component) between principal invariants of strain and strain rate tensors and the corresponding coefficients of the integrity basis components. It is demonstrated that this type of model construction enforces key physics-based constraints on the predicted responses: the second law of thermodynamics, the principles of local action and determinism, objectivity, the balance of angular momentum, an assumed reference state, isotropy, and limited memory. The three surrogate models that constitute our constitutive model are evaluated by training them on small-size numerically generated data sets corresponding to a single deformation mode and then analyzing their predictions over a much wider testing regime comprising multiple deformation modes. Our physics-informed data-driven constitutive model predictions are compared with the corresponding predictions of classical continuum thermodynamics-based and purely data-driven models. It is shown that our surrogate models can reasonably capture the stress-strain-strain rate responses in both training and testing regimes, and provide improvements in terms of prediction accuracy, generalizability to multiple deformation modes, and compatibility with limited data.

Conditional dominance in games with unawareness

  • Authors: Martin Meier, Burkhard C. Schipper
  • Subjects: Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2304.13901
  • Pdf link: https://arxiv.org/pdf/2304.13901
  • Abstract
    Heifetz, Meier, and Schipper (2013) introduced dynamic games with unawareness, consisting of a partially ordered set of games in extensive form. Here, we study the normal form of dynamic games with unawareness. The generalized normal form associated with a dynamic game with unawareness consists of a partially ordered set of games in normal form. We use the generalized normal form to characterize extensive-form rationalizability (resp., prudent rationalizability) in dynamic games with unawareness by iterated conditional strict (resp., weak) dominance in the associated generalized normal form. We also show that the analogue to iterated admissibility for dynamic games with unawareness depends on the extensive-form structure. This is because under unawareness, a player's information set not only determines which nodes she considers possible but also which game tree(s) she is aware of.

Level Assembly as a Markov Decision Process

  • Authors: Colan F. Biemer, Seth Cooper
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13922
  • Pdf link: https://arxiv.org/pdf/2304.13922
  • Abstract
    Many games feature a progression of levels that doesn't adapt to the player. This can be problematic because some players may get stuck if the progression is too difficult, while others may find it boring if the progression is too slow to get to more challenging levels. This can be addressed by building levels based on the player's performance and preferences. In this work, we formulate the problem of generating levels for a player as a Markov Decision Process (MDP) and use adaptive dynamic programming (ADP) to solve the MDP before assembling a level. We tested with two case studies and found that using an ADP outperforms two baselines. Furthermore, we experimented with player proxies and switched them in the middle of play, and we show that a simple modification prior to running ADP results in quick adaptation. By using ADP, which searches the entire MDP, we produce a dynamic progression of levels that adapts to the player.
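
When the MDP is small enough to enumerate, the ADP solve can be as simple as value iteration over the level-segment states. A sketch with hypothetical inputs (a transition tensor P and a reward matrix R encoding the player proxy's predicted performance):

```python
# Value iteration: one simple instance of adaptive dynamic programming.
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """P[a] is an (S, S) transition matrix per action; R is (S, A) rewards."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        Q = np.stack([R[:, a] + gamma * P[a] @ V
                      for a in range(n_actions)], axis=1)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)  # values and greedy policy
        V = V_new
```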

A One-Dimensional Symmetric Force-Based Blending Method for Atomistic-to-Continuum Coupling

  • Authors: Elaine Gorom-Alexander, Xingjie Helen Li
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13939
  • Pdf link: https://arxiv.org/pdf/2304.13939
  • Abstract
    Inspired by the blending method developed by [P. Seleson, S. Beneddine, and S. Prudhome, \emph{A Force-Based Coupling Scheme for Peridynamics and Classical Elasticity}, (2013)] for the nonlocal-to-local coupling, we create a symmetric and consistent blended force-based Atomistic-to-Continuum (a/c) scheme for the atomistic chain in one-dimensional space. The conditions for the well-posedness of the underlying model are established by analyzing an optimal blending size and blending type to ensure the $H^1$ semi-norm stability for the blended force-based operator. We present several numerical experiments to test and confirm the theoretical findings.

Provably Stabilizing Global-Position Tracking Control for Hybrid Models of Multi-Domain Bipedal Walking via Multiple Lyapunov Analysis

  • Authors: Yuan Gao, Kentaro Barhydt, Christopher Niezrecki, Yan Gu
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.13943
  • Pdf link: https://arxiv.org/pdf/2304.13943
  • Abstract
    Accurate control of a humanoid robot's global position (i.e., its three-dimensional position in the world) is critical to the reliable execution of high-risk tasks such as avoiding collision with pedestrians in a crowded environment. This paper introduces a time-based nonlinear control method that achieves accurate global-position tracking (GPT) for multi-domain bipedal walking. Deriving a tracking controller for bipedal robots is challenging due to the highly complex robot dynamics that are time-varying and hybrid, especially for multi-domain walking that involves multiple phases/domains of full actuation, over actuation, and underactuation. To tackle this challenge, we introduce a continuous-phase GPT control law for multi-domain walking, which provably ensures the exponential convergence of the entire error state within the full and over actuation domains and that of the directly regulated error state within the underactuation domain. We then construct sufficient multiple-Lyapunov stability conditions for the hybrid multi-domain tracking error system under the proposed GPT control law. We illustrate the proposed controller design through both three-domain walking with all motors activated and two-domain gait with inactive ankle motors. Simulations of a ROBOTIS OP3 bipedal humanoid robot demonstrate the satisfactory accuracy and convergence rate of the proposed control approach under two different cases of multi-domain walking as well as various walking speeds and desired paths.

A central scheme for coupled hyperbolic systems

  • Authors: Michael Herty, Niklas Kolbe, Siegfried Müller
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13946
  • Pdf link: https://arxiv.org/pdf/2304.13946
  • Abstract
    A novel numerical scheme to solve coupled systems of conservation laws is introduced. The scheme is derived based on a relaxation approach and does not require information on the Lax curves of the coupled systems, which simplifies the computation of suitable coupling data. The coupling condition for the underlying relaxation system plays a crucial role as it determines the behavior of the scheme in the zero relaxation limit. The role of this condition is discussed, a consistency concept with respect to the original problem is introduced, well-posedness is analyzed and explicit, nodal Riemann solvers are provided. Based on a case study considering the p-system of gas dynamics a strategy for the design of the relaxation coupling condition within the new scheme is provided.

Data-driven time-scale separation of ODE right-hand sides using dynamic mode decomposition and time delay embedding

  • Authors: Cody J. Balos
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13971
  • Pdf link: https://arxiv.org/pdf/2304.13971
  • Abstract
    Multi-physics simulations often involve multiple different scales. The ARKODE ODE solver package in the SUNDIALS library addresses multi-scale problems with a multi-rate time-integrator that can work with a right-hand side that has fast-scale and slow-scale components. In this report, we use dynamic mode decomposition and time delay embedding to extract the fast and slow components of the right-hand side of a simple ODE from data. We then use the extracted components to solve the ODE with ARKODE. Finally, to move towards a real-world use case, we attempt to extract fast and slow scale dynamics from synthetic seismic modeling data.
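
The two ingredients combine naturally: delay-embed the scalar data into a Hankel matrix, run exact DMD on it, and split the DMD eigenvalues by magnitude and frequency into fast and slow groups. A minimal numpy sketch, with the rank and the fast/slow split left to the user:

```python
# Hankel (time-delay) embedding followed by exact DMD.
import numpy as np

def hankel(x, delays):
    """Stack `delays` shifted copies of the 1-D series x."""
    n = len(x) - delays + 1
    return np.stack([x[i:i + n] for i in range(delays)])

def dmd(X, rank):
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    A_tilde = U.conj().T @ X2 @ Vh.conj().T / s  # reduced linear operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X2 @ Vh.conj().T / s @ W
    return eigvals, modes  # |eigval|, arg(eigval) encode decay and frequency
```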

An FPTAS for Budgeted Laminar Matroid Independent Set

  • Authors: Ilan Doron-Arad, Ariel Kulik, Hadas Shachnai
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.13984
  • Pdf link: https://arxiv.org/pdf/2304.13984
  • Abstract
    We study the budgeted laminar matroid independent set problem. The input is a ground set, where each element has a cost and a non-negative profit, along with a laminar matroid over the elements and a budget. The goal is to select a maximum profit independent set of the matroid whose total cost is bounded by the budget. Several well known special cases, where we have, e.g., no matroid constraint (the classic knapsack problem) or a uniform matroid constraint (knapsack with a cardinality constraint), admit a fully polynomial-time approximation scheme (FPTAS). In contrast, the budgeted matroid independent set (BMI) problem with a general matroid has an efficient polynomial-time approximation scheme (EPTAS) but does not admit an FPTAS. This implies an EPTAS for our problem, which is the best known result prior to this work. We present an FPTAS for budgeted laminar matroid independent set, improving the previous EPTAS for this matroid family and generalizing the FPTAS known for knapsack with a cardinality constraint and multiple-choice knapsack. Our scheme is based on a simple dynamic program which utilizes the tree-like structure of laminar matroids.
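
For intuition, the simplest special case the FPTAS generalizes, knapsack with a cardinality constraint (a rank-$k$ uniform matroid), already admits an exact pseudo-polynomial dynamic program; an FPTAS then controls the table size by scaling and rounding. A sketch of that base table:

```python
# dp[j][b] = max profit using at most j items with total cost at most b.
# Laminar matroids generalize the single counter j to nested capacities.
def knapsack_with_cardinality(items, budget, k):
    """items: list of (cost, profit) pairs with integer costs."""
    dp = [[0] * (budget + 1) for _ in range(k + 1)]
    for cost, profit in items:
        for j in range(k, 0, -1):                   # backwards so each item
            for b in range(budget, cost - 1, -1):   # is used at most once
                dp[j][b] = max(dp[j][b], dp[j - 1][b - cost] + profit)
    return dp[k][budget]
```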

Communication of information in systems of heterogeneous agents and systems' dynamics

  • Authors: Inga Ivanova
  • Subjects: Computers and Society (cs.CY); Information Theory (cs.IT); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.14013
  • Pdf link: https://arxiv.org/pdf/2304.14013
  • Abstract
    Communication of information in complex systems can be considered a major driver of systems evolution. What matters is not the communicated information by itself but rather the meaning that is supplied to the information. However, informational exchange in a system of heterogeneous agents, which code and decode information with different meaning-processing structures, is more complex than a simple input-output model. The structural difference of coding and decoding algorithms in a system of three or more groups of agents, entertaining different sets of communication codes, provides a source of additional options which has an impact on the system's dynamics. The mechanisms of meaning and information processing can be evaluated analytically in a model framework. The results show that model predictions accurately fit empirically observed data in systems of different origins.

Unification of Lagrangian staggered-grid hydrodynamics and cell-centered hydrodynamics in one dimension

  • Authors: Xihua Xu
  • Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
  • Arxiv link: https://arxiv.org/abs/2304.14054
  • Pdf link: https://arxiv.org/pdf/2304.14054
  • Abstract
    This paper focuses on a novel scheme that unifies both Lagrangian staggered-grid and cell-centered hydrodynamic methods in one dimension. The scheme neither contains empirical parameters nor solves the Riemann problem. It includes two key points: one is the relationship between pressure and velocity, and the other is Newton's second law. The two methods that make use of this scheme satisfy the entropy condition and are conservative in total mass, momentum, and energy. Numerical results show the robustness and accuracy of both methods.

Comparison of Optimization-Based Methods for Energy-Optimal Quadrotor Motion Planning

  • Authors: Welf Rehberg, Joaquim Ortiz-Haro, Marc Toussaint, Wolfgang Hönig
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14062
  • Pdf link: https://arxiv.org/pdf/2304.14062
  • Abstract
    Quadrotors are agile flying robots that are challenging to control. Considering the full dynamics of quadrotors during motion planning is crucial to achieving good solution quality and small tracking errors during flight. Optimization-based methods scale well with high-dimensional state spaces and can handle dynamic constraints directly, therefore they are often used in these scenarios. The resulting optimization problem is notoriously difficult to solve due to its nonconvex constraints. In this work, we present an analysis of four solvers for nonlinear trajectory optimization (KOMO, direct collocation with SCvx, direct collocation with CasADi, Crocoddyl) and evaluate their performance in scenarios where the solvers are tasked to find minimum-effort solutions to geometrically complex problems and problems requiring highly dynamic solutions. Benchmarking these methods helps to determine the best algorithm structures for these kinds of problems.

Compositional 3D Human-Object Neural Animation

  • Authors: Zhi Hou, Baosheng Yu, Dacheng Tao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14070
  • Pdf link: https://arxiv.org/pdf/2304.14070
  • Abstract
    Human-object interactions (HOIs) are crucial for human-centric scene understanding applications such as human-centric visual generation, AR/VR, and robotics. Since existing methods mainly explore capturing HOIs, rendering HOI remains less investigated. In this paper, we address this challenge in HOI animation from a compositional perspective, i.e., animating novel HOIs including novel interaction, novel human and/or novel object driven by a novel pose sequence. Specifically, we adopt neural human-object deformation to model and render HOI dynamics based on implicit neural representations. To enable the interaction pose transferring among different persons and objects, we then devise a new compositional conditional neural radiance field (or CC-NeRF), which decomposes the interdependence between human and object using latent codes to enable compositionally animation control of novel HOIs. Experiments show that the proposed method can generalize well to various novel HOI animation settings. Our project page is https://zhihou7.github.io/CHONA/

Inferring Preferences from Demonstrations in Multi-objective Reinforcement Learning: A Dynamic Weight-based Approach

  • Authors: Junlin Lu, Patrick Mannion, Karl Mason
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14115
  • Pdf link: https://arxiv.org/pdf/2304.14115
  • Abstract
    Many decision-making problems feature multiple objectives. In such problems, it is not always possible to know the preferences of a decision-maker for different objectives. However, it is often possible to observe the behavior of decision-makers. In multi-objective decision-making, preference inference is the process of inferring the preferences of a decision-maker for different objectives. This research proposes a Dynamic Weight-based Preference Inference (DWPI) algorithm that can infer the preferences of agents acting in multi-objective decision-making problems, based on observed behavior trajectories in the environment. The proposed method is evaluated on three multi-objective Markov decision processes: Deep Sea Treasure, Traffic, and Item Gathering. The performance of the proposed DWPI approach is compared to two existing preference inference methods from the literature, and empirical results demonstrate significant improvements compared to the baseline algorithms, in terms of both time requirements and accuracy of the inferred preferences. The Dynamic Weight-based Preference Inference algorithm also maintains its performance when inferring preferences for sub-optimal behavior demonstrations. In addition to its impressive performance, the Dynamic Weight-based Preference Inference algorithm does not require any interactions during training with the agent whose preferences are inferred, all that is required is a trajectory of observed behavior.

Learning Neural PDE Solvers with Parameter-Guided Channel Attention

  • Authors: Makoto Takamoto, Francesco Alesiani, Mathias Niepert
  • Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn); Geophysics (physics.geo-ph)
  • Arxiv link: https://arxiv.org/abs/2304.14118
  • Pdf link: https://arxiv.org/pdf/2304.14118
  • Abstract
    Scientific Machine Learning (SciML) is concerned with the development of learned emulators of physical systems governed by partial differential equations (PDE). In application domains such as weather forecasting, molecular dynamics, and inverse design, ML-based surrogate models are increasingly used to augment or replace inefficient and often non-differentiable numerical simulation algorithms. While a number of ML-based methods for approximating the solutions of PDEs have been proposed in recent years, they typically do not adapt to the parameters of the PDEs, making it difficult to generalize to PDE parameters not seen during training. We propose a Channel Attention mechanism guided by PDE Parameter Embeddings (CAPE) component for neural surrogate models and a simple yet effective curriculum learning strategy. The CAPE module can be combined with neural PDE solvers allowing them to adapt to unseen PDE parameters. The curriculum learning strategy provides a seamless transition between teacher-forcing and fully auto-regressive training. We compare CAPE in conjunction with the curriculum learning strategy using a popular PDE benchmark and obtain consistent and significant improvements over the baseline models. The experiments also show several advantages of CAPE, such as its increased ability to generalize to unseen PDE parameters without large increases in inference time and parameter count.
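
The gist of a parameter-guided channel attention block is easy to state: embed the PDE parameters with a small MLP and use the result to gate the channels of the neural solver's feature maps. A hedged sketch follows; the layer sizes and gating choice are illustrative, not the paper's exact architecture.

```python
# Parameter-conditioned channel gating in the spirit of CAPE.
import torch
import torch.nn as nn

class ParamChannelAttention(nn.Module):
    def __init__(self, n_params, channels):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(n_params, channels), nn.GELU(),
            nn.Linear(channels, channels))

    def forward(self, features, pde_params):
        # features: (batch, channels, grid), pde_params: (batch, n_params)
        gate = torch.sigmoid(self.embed(pde_params))  # per-channel weights
        return features * gate.unsqueeze(-1)
```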

A particle method for non-local advection-selection-mutation equations

  • Authors: Frank Ernesto Alvarez, Jules Guilberteau
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14210
  • Pdf link: https://arxiv.org/pdf/2304.14210
  • Abstract
    The well-posedness of a non-local advection-selection-mutation problem deriving from adaptive dynamics models is shown for a wide family of initial data. A particle method is then developed, in order to approximate the solution of such problem by a regularised sum of weighted Dirac masses whose characteristics solve a suitably defined ODE system. The convergence of the particle method over any finite interval is shown and an explicit rate of convergence is given. Furthermore, we investigate the asymptotic-preserving properties of the method in large times, providing sufficient conditions for it to hold true as well as examples and counter-examples. Finally, we illustrate the method in two cases taken from the literature.

Some of the variables, some of the parameters, some of the times, with some physics known: Identification with partial information

  • Authors: Saurabh Malani, Tom S. Bertalan, Tianqi Cui, Jose L. Avalos, Michael Betenbaugh, Ioannis G. Kevrekidis
  • Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.14214
  • Pdf link: https://arxiv.org/pdf/2304.14214
  • Abstract
    Experimental data is often comprised of variables measured independently, at different sampling rates (non-uniform ${\Delta}$t between successive measurements); and at a specific time point only a subset of all variables may be sampled. Approaches to identifying dynamical systems from such data typically use interpolation, imputation or subsampling to reorganize or modify the training data $\textit{prior}$ to learning. Partial physical knowledge may also be available $\textit{a priori}$ (accurately or approximately), and data-driven techniques can complement this knowledge. Here we exploit neural network architectures based on numerical integration methods and $\textit{a priori}$ physical knowledge to identify the right-hand side of the underlying governing differential equations. Iterates of such neural-network models allow for learning from data sampled at arbitrary time points $\textit{without}$ data modification. Importantly, we integrate the network with available partial physical knowledge in "physics informed gray-boxes"; this enables learning unknown kinetic rates or microbial growth functions while simultaneously estimating experimental parameters.
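
The simplest instance of such an integrator-based gray-box is a forward-Euler template whose right-hand side is known physics plus a neural correction; iterating it with per-sample step sizes accommodates non-uniformly sampled data. A sketch under that assumption (the paper considers more general integration templates):

```python
# Forward-Euler "physics-informed gray-box": known RHS + neural correction.
import torch
import torch.nn as nn

class GrayBoxEuler(nn.Module):
    def __init__(self, dim, known_rhs):
        super().__init__()
        self.known_rhs = known_rhs  # a-priori (possibly approximate) physics
        self.nn_rhs = nn.Sequential(
            nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, x, dt):
        rhs = self.known_rhs(x) + self.nn_rhs(x)  # gray-box right-hand side
        return x + dt * rhs                       # one Euler step of size dt
```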

Fast Sampling of $b$-Matchings and $b$-Edge Covers

  • Authors: Zongchen Chen, Yuzhou Gu
  • Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Combinatorics (math.CO); Probability (math.PR)
  • Arxiv link: https://arxiv.org/abs/2304.14289
  • Pdf link: https://arxiv.org/pdf/2304.14289
  • Abstract
    For integer $b \ge 1$, a $b$-matching (resp. $b$-edge cover) of a graph $G=(V,E)$ is a subset $S\subseteq E$ of edges such that every vertex is incident with at most (resp. at least) $b$ edges from $S$. We prove that for any $b \ge 1$ the simple Glauber dynamics for sampling (weighted) $b$-matchings and $b$-edge covers mixes in $O(n\log n)$ time on all $n$-vertex bounded-degree graphs. This significantly improves upon previous results which have worse running time and only work for $b$-matchings with $b \le 7$ and for $b$-edge covers with $b \le 2$. More generally, we prove spectral independence for a broad class of binary symmetric Holant problems with log-concave signatures, including $b$-matchings, $b$-edge covers, and antiferromagnetic $2$-spin edge models. We hence deduce optimal mixing time of Glauber dynamics from spectral independence.
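
For the unweighted case, the Glauber dynamics in question is a one-line Markov chain: pick a uniformly random edge and re-sample its membership conditional on the rest of the configuration. A sketch for $b$-matchings (edge weights would replace the fair coin with a bias $\lambda/(1+\lambda)$):

```python
# Glauber dynamics for uniform sampling of b-matchings (unweighted sketch).
import random

def degree(S, v):
    return sum(v in e for e in S)  # edges of S incident to vertex v

def glauber_b_matching(edges, b, steps, S=None):
    """edges: list of (u, v) tuples; returns the final b-matching."""
    S = set() if S is None else set(S)
    for _ in range(steps):
        u, v = e = random.choice(edges)
        S.discard(e)  # re-sample the state of e conditional on S \ {e}
        if degree(S, u) < b and degree(S, v) < b and random.random() < 0.5:
            S.add(e)
    return S
```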

Structured interpolation for multivariate transfer functions of quadratic-bilinear systems

  • Authors: Peter Benner, Serkan Gugercin, Steffen W. R. Werner
  • Subjects: Numerical Analysis (math.NA); Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.14292
  • Pdf link: https://arxiv.org/pdf/2304.14292
  • Abstract
    High-dimensional/high-fidelity nonlinear dynamical systems appear naturally when the goal is to accurately model real-world phenomena. Many physical properties are thereby encoded in the internal differential structure of these resulting large-scale nonlinear systems. The high-dimensionality of the dynamics causes computational bottlenecks, especially when these large-scale systems need to be simulated for a variety of situations such as different forcing terms. This motivates model reduction where the goal is to replace the full-order dynamics with accurate reduced-order surrogates. Interpolation-based model reduction has been proven to be an effective tool for the construction of cheap-to-evaluate surrogate models that preserve the internal structure in the case of weak nonlinearities. In this paper, we consider the construction of multivariate interpolants in frequency domain for structured quadratic-bilinear systems. We propose definitions for structured variants of the symmetric subsystem and generalized transfer functions of quadratic-bilinear systems and provide conditions for structure-preserving interpolation by projection. The theoretical results are illustrated using two numerical examples including the simulation of molecular dynamics in crystal structures.

On Solution Discovery via Reconfiguration

  • Authors: Michael R. Fellows, Mario Grobler, Nicole Megow, Amer E. Mouawad, Vijayaragunathan Ramamoorthi, Frances A. Rosamond, Daniel Schmand, Sebastian Siebertz
  • Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.14295
  • Pdf link: https://arxiv.org/pdf/2304.14295
  • Abstract
    The dynamics of real-world applications and systems require efficient methods for improving infeasible solutions or restoring corrupted ones by making modifications to the current state of a system in a restricted way. We propose a new framework of solution discovery via reconfiguration for constructing a feasible solution for a given problem by executing a sequence of small modifications starting from a given state. Our framework integrates and formalizes different aspects of classical local search, reoptimization, and combinatorial reconfiguration. We exemplify our framework on a multitude of fundamental combinatorial problems, namely Vertex Cover, Independent Set, Dominating Set, and Coloring. We study the classical as well as the parameterized complexity of the solution discovery variants of those problems and explore the boundary between tractable and intractable instances.

Learning Absorption Rates in Glucose-Insulin Dynamics from Meal Covariates

  • Authors: Ke Alexander Wang, Matthew E. Levine, Jiaxin Shi, Emily B. Fox
  • Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Quantitative Methods (q-bio.QM)
  • Arxiv link: https://arxiv.org/abs/2304.14300
  • Pdf link: https://arxiv.org/pdf/2304.14300
  • Abstract
    Traditional models of glucose-insulin dynamics rely on heuristic parameterizations chosen to fit observations within a laboratory setting. However, these models cannot describe glucose dynamics in daily life. One source of failure is in their descriptions of glucose absorption rates after meal events. A meal's macronutritional content has nuanced effects on the absorption profile, which is difficult to model mechanistically. In this paper, we propose to learn the effects of macronutrition content from glucose-insulin data and meal covariates. Given macronutrition information and meal times, we use a neural network to predict an individual's glucose absorption rate. We use this neural rate function as the control function in a differential equation of glucose dynamics, enabling end-to-end training. On simulated data, our approach is able to closely approximate true absorption rates, resulting in better forecasts than heuristic parameterizations, despite only observing glucose, insulin, and macronutritional information. Our work readily generalizes to meal events with higher-dimensional covariates, such as images, setting the stage for glucose dynamics models that are personalized to each individual's daily life.

Empirical Individual State Observability

  • Authors: Benjamin Cellini, Burak Boyacıoğlu, Floris van Breugel
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.14313
  • Pdf link: https://arxiv.org/pdf/2304.14313
  • Abstract
    A dynamical system is observable if there is a one-to-one mapping from the system's measured outputs and inputs to all of the system's states. Analytical and empirical tools exist for quantifying the (full state) observability of linear and nonlinear systems; however, empirical tools for evaluating the observability of individual state variables are lacking. Here, a new empirical approach termed Empirical Individual State Observability (E-ISO) is developed to quantify the level of observability of individual state variables. E-ISO first builds an empirical observability matrix via simulation, then applies convex optimization to efficiently determine the subset of its rows required to estimate each state variable individually. Finally, (un)observability measures for these subsets are calculated to provide independent estimates of the observability of each state variable. Multiple example applications of E-ISO on linear and nonlinear systems are shown to be consistent with analytical results. Broadly, E-ISO will be an invaluable tool both for designing active sensing control laws or optimizing sensor placement to increase the observability of individual state variables for engineered systems, and analyzing the trajectory decisions made by organisms.
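
The starting point of E-ISO is the standard empirical observability construction: perturb each initial state by ±eps, simulate, and stack the scaled output differences into a matrix whose rows E-ISO then subselects via convex optimization. A sketch of that first step, where simulate(x0) is a user-supplied function returning the output trajectory:

```python
# Empirical observability matrix via central differences of simulated outputs.
import numpy as np

def empirical_observability_matrix(simulate, x0, eps=1e-4):
    """simulate(x0) -> array of outputs, shape (T, p)."""
    n = len(x0)
    cols = []
    for i in range(n):
        d = np.zeros(n); d[i] = eps
        dy = np.asarray(simulate(x0 + d)) - np.asarray(simulate(x0 - d))
        cols.append(dy.ravel() / (2 * eps))
    return np.column_stack(cols)  # one column per state variable
```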

An Audit Framework for Adopting AI-Nudging on Children

  • Authors: Marianna Ganapini, Enrico Panai
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14338
  • Pdf link: https://arxiv.org/pdf/2304.14338
  • Abstract
    This is an audit framework for AI-nudging. Unlike the static form of nudging usually discussed in the literature, we focus here on a type of nudging that uses large amounts of data to provide personalized, dynamic feedback and interfaces. We call this AI-nudging (Lanzing, 2019, p. 549; Yeung, 2017). The ultimate goal of the audit outlined here is to ensure that an AI system that uses nudges will maintain a level of moral inertia and neutrality by complying with the recommendations, requirements, or suggestions of the audit (in other words, the criteria of the audit). In the case of unintended negative consequences, the audit suggests risk mitigation mechanisms that can be put in place. In the case of unintended positive consequences, it suggests some reinforcement mechanisms. Sponsored by the IBM-Notre Dame Tech Ethics Lab

SMAT: A Self-Reinforcing Framework for Simultaneous Mapping and Tracking in Unbounded Urban Environments

  • Authors: Tingxiang Fan, Bowen Shen, Yinqiang Zhang, Chuye Zhang, Lei Yang, Hua Chen, Wei Zhang, Jia Pan
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14356
  • Pdf link: https://arxiv.org/pdf/2304.14356
  • Abstract
    With the increasing prevalence of robots in daily life, it is crucial to enable robots to construct a reliable map online to navigate in unbounded and changing environments. Although existing methods can individually achieve the goals of spatial mapping and dynamic object detection and tracking, limited research has been conducted on an effective combination of these two important abilities. The proposed framework, SMAT (Simultaneous Mapping and Tracking), integrates the front-end dynamic object detection and tracking module with the back-end static mapping module using a self-reinforcing mechanism, which promotes mutual improvement of mapping and tracking performance. The conducted experiments demonstrate the framework's effectiveness in real-world applications, achieving successful long-range navigation and mapping in multiple urban environments using only one LiDAR, a CPU-only onboard computer, and a consumer-level GPS receiver.

Measuring and Modeling the Free Content Web

  • Authors: Abdulrahman Alabduljabbar, Runyu Ma, Ahmed Abusnaina, Rhongho Jang, Songqing Chen, DaeHun Nyang, and David Mohaisen
  • Subjects: Computers and Society (cs.CY); Cryptography and Security (cs.CR); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.14359
  • Pdf link: https://arxiv.org/pdf/2304.14359
  • Abstract
    Free content websites that provide free books, music, games, movies, etc., have existed on the Internet for many years. While it is a common belief that such websites might be different from premium websites providing the same content types, an analysis that supports this belief is lacking in the literature. In particular, it is unclear if those websites are as safe as their premium counterparts. In this paper, we set out to investigate, by analysis and quantification, the similarities and differences between free content and premium websites, including their risk profiles. To conduct this analysis, we assembled a list of 834 free content websites offering books, games, movies, music, and software, and 728 premium websites offering content of the same type. We then contribute domain-, content-, and risk-level analysis, examining and contrasting the websites' domain names, creation times, SSL certificates, HTTP requests, page size, average load time, and content type. For risk analysis, we consider and examine the maliciousness of these websites at the website- and component-level. Among other interesting findings, we show that free content websites tend to be vastly distributed across the TLDs and exhibit more dynamics with an upward trend for newly registered domains. Moreover, the free content websites are 4.5 times more likely to utilize an expired certificate, 19 times more likely to be malicious at the website level, and 2.64 times more likely to be malicious at the component level. Encouraged by the clear differences between the two types of websites, we explore the automation and generalization of the risk modeling of the free content risky websites, showing that a simple machine learning-based technique can produce 86.81% accuracy in identifying them.

Learning Neural Constitutive Laws From Motion Observations for Generalizable PDE Dynamics

  • Authors: Pingchuan Ma, Peter Yichen Chen, Bolei Deng, Joshua B. Tenenbaum, Tao Du, Chuang Gan, Wojciech Matusik
  • Subjects: Machine Learning (cs.LG); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.14369
  • Pdf link: https://arxiv.org/pdf/2304.14369
  • Abstract
    We propose a hybrid neural network (NN) and PDE approach for learning generalizable PDE dynamics from motion observations. Many NN approaches learn an end-to-end model that implicitly models both the governing PDE and constitutive models (or material models). Without explicit PDE knowledge, these approaches cannot guarantee physical correctness and have limited generalizability. We argue that the governing PDEs are often well-known and should be explicitly enforced rather than learned. Instead, constitutive models are particularly suitable for learning due to their data-fitting nature. To this end, we introduce a new framework termed "Neural Constitutive Laws" (NCLaw), which utilizes a network architecture that strictly guarantees standard constitutive priors, including rotation equivariance and undeformed state equilibrium. We embed this network inside a differentiable simulation and train the model by minimizing a loss function based on the difference between the simulation and the motion observation. We validate NCLaw on various large-deformation dynamical systems, ranging from solids to fluids. After training on a single motion trajectory, our method generalizes to new geometries, initial/boundary conditions, temporal ranges, and even multi-physics systems. On these extremely out-of-distribution generalization tasks, NCLaw is orders-of-magnitude more accurate than previous NN approaches. Real-world experiments demonstrate our method's ability to learn constitutive laws from videos.

Pseudo-Hamiltonian neural networks for learning partial differential equations

  • Authors: Sølve Eidnes, Kjetil Olsen Lye
  • Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14374
  • Pdf link: https://arxiv.org/pdf/2304.14374
  • Abstract
    Pseudo-Hamiltonian neural networks (PHNN) were recently introduced for learning dynamical systems that can be modelled by ordinary differential equations. In this paper, we extend the method to partial differential equations. The resulting model comprises up to three neural networks, modelling terms representing conservation, dissipation and external forces, and discrete convolution operators that can either be learned or be prior knowledge. We demonstrate numerically the superior performance of PHNN compared to a baseline model that models the full dynamics by a single neural network. Moreover, since the PHNN model consists of three parts with different physical interpretations, these can be studied separately to gain insight into the system, and the learned model remains applicable even if external forces are removed or changed.
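
As a rough illustration of the three-part decomposition, the sketch below (assuming a periodic 1-D grid) sums three small convolutional networks for the conservative, dissipative, and forcing terms to approximate u_t. It keeps only the additive split; the actual PHNN additionally imposes Hamiltonian and dissipative structure on the first two terms.

```python
import torch.nn as nn

def _branch(hidden=32):
    # small conv net; the convolutions play the role of the (learnable
    # or fixed) discrete spatial operators mentioned in the abstract
    return nn.Sequential(
        nn.Conv1d(1, hidden, 3, padding=1, padding_mode="circular"),
        nn.Tanh(),
        nn.Conv1d(hidden, 1, 3, padding=1, padding_mode="circular"),
    )

class ThreePartPDEModel(nn.Module):
    """Models u_t as conservative(u) + dissipative(u) + force(u)."""
    def __init__(self):
        super().__init__()
        self.conservative = _branch()
        self.dissipative = _branch()
        self.force = _branch()    # could instead take (x, t) as input

    def forward(self, u):         # u: (batch, 1, n_gridpoints)
        return self.conservative(u) + self.dissipative(u) + self.force(u)

# one explicit-Euler style training target from two snapshots u0, u1:
#   loss = ((model(u0) - (u1 - u0) / dt) ** 2).mean()
```

Because the three parts are separate modules, each can be inspected on its own, and the force branch can simply be dropped when external forces are removed.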

Dynamic Pricing and Learning with Bayesian Persuasion

  • Authors: Shipra Agrawal, Yiding Feng, Wei Tang
  • Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14385
  • Pdf link: https://arxiv.org/pdf/2304.14385
  • Abstract
    We consider a novel dynamic pricing and learning setting where, in addition to setting prices of products in sequential rounds, the seller also ex-ante commits to 'advertising schemes'. That is, at the beginning of each round the seller can decide what kind of signal they will provide to the buyer about the product's quality upon realization. Using the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses, we formulate the problem of finding an optimal design of the advertising scheme along with a pricing scheme that maximizes the seller's expected revenue. Without any a priori knowledge of the buyers' demand function, our goal is to design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy. We study the regret of the algorithm when compared to the optimal clairvoyant price and advertising scheme. Our main result is a computationally efficient online algorithm that achieves an $O(T^{2/3}(m\log T)^{1/3})$ regret bound when the valuation function is linear in the product quality. Here $m$ is the cardinality of the discrete product quality domain and $T$ is the time horizon. This result requires some natural monotonicity and Lipschitz assumptions on the valuation function, but no Lipschitz or smoothness assumption on the buyers' demand function. For constant $m$, our result matches the regret lower bound for dynamic pricing (a special case of our problem) within logarithmic factors. We also obtain several improved results for the widely considered special case of additive valuations, including an $\tilde{O}(T^{2/3})$ regret bound independent of $m$ when $m\le T^{1/3}$.

SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos

  • Authors: John Z. Zhang, Shuo Yang, Gengshan Yang, Arun L. Bishop, Deva Ramanan, Zachary Manchester
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14389
  • Pdf link: https://arxiv.org/pdf/2304.14389
  • Abstract
    We present SLoMo: a first-of-its-kind framework for transferring skilled motions from casually captured "in the wild" video footage of humans and animals to legged robots. SLoMo works in three stages: 1) synthesize a physically plausible reconstructed key-point trajectory from monocular videos; 2) optimize a dynamically feasible reference trajectory for the robot offline that includes body and foot motion, as well as contact sequences, and closely tracks the key points; 3) track the reference trajectory online using a general-purpose model-predictive controller on robot hardware. Traditional motion imitation for legged motor skills often requires expert animators, collaborative demonstrations, and/or expensive motion capture equipment, all of which limit scalability. Instead, SLoMo only relies on easy-to-obtain monocular video footage, readily available in online repositories such as YouTube. It converts videos into motion primitives that can be executed reliably by real-world robots. We demonstrate our approach by transferring the motions of cats, dogs, and humans to example robots including a quadruped (on hardware) and a humanoid (in simulation). To the best of the authors' knowledge, this is the first attempt at a general-purpose motion transfer framework that imitates animal and human motions on legged robots directly from casual videos without artificial markers or labels.

Motion-Conditioned Diffusion Model for Controllable Video Synthesis

  • Authors: Tsai-Shien Chen, Chieh Hubert Lin, Hung-Yu Tseng, Tsung-Yi Lin, Ming-Hsuan Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14404
  • Pdf link: https://arxiv.org/pdf/2304.14404
  • Abstract
    Recent advancements in diffusion models have greatly improved the quality and diversity of synthesized content. To harness the expressive power of diffusion models, researchers have explored various controllable mechanisms that allow users to intuitively guide the content synthesis process. Although the latest efforts have primarily focused on video synthesis, there has been a lack of effective methods for controlling and describing desired content and motion. In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis. To tackle the ambiguity of sparse motion inputs and achieve better synthesis quality, MCDiff first utilizes a flow completion model to predict the dense video motion based on the semantic understanding of the video frame and the sparse motion control. Then, the diffusion model synthesizes high-quality future frames to form the output video. We qualitatively and quantitatively show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis. Additional experiments on MPII Human Pose further exhibit the capability of our model on diverse content and motion synthesis.

New submissions for Thu, 4 May 23

Keyword: efficient

Physics-Informed and Data-Driven Discovery of Governing Equations for Complex Phenomena in Heterogeneous Media

  • Authors: Muhammad Sahimi
  • Subjects: Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph)
  • Arxiv link: https://arxiv.org/abs/2305.01653
  • Pdf link: https://arxiv.org/pdf/2305.01653
  • Abstract
    Rapid evolution of sensor technology, advances in instrumentation, and progress in devising data-acquisition software/hardware are providing vast amounts of data for various complex phenomena, ranging from those in the atmospheric environment, to large-scale porous formations, and biological systems. The tremendous increase in the speed of scientific computing has also made it possible to emulate diverse high-dimensional, multiscale and multiphysics phenomena that contain elements of stochasticity, and to generate large volumes of numerical data for them in heterogeneous systems. The difficulty is, however, that often the governing equations for such phenomena are not known. A prime example is flow, transport, and deformation processes in macroscopically-heterogeneous materials and geomedia. In other cases, the governing equations are only partially known, in the sense that they either contain various coefficients that must be evaluated based on data, or that they require constitutive relations, such as the relationship between the stress tensor and the velocity gradients for non-Newtonian fluids in the momentum conservation equation, in order for them to be useful to the modeling. Several classes of approaches are emerging to address such problems that are based on machine learning, symbolic regression, the Mori-Zwanzig projection operator formulation, sparse identification of nonlinear dynamics, data assimilation, and stochastic optimization and analysis, or a combination of two or more of such approaches. This Perspective describes the latest developments in this highly important area, and discusses possible future directions.

Scalable Data Point Valuation in Decentralized Learning

  • Authors: Konstantin D. Pandl, Chun-Yin Huang, Ivan Beschastnikh, Xiaoxiao Li, Scott Thiebes, Ali Sunyaev
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2305.01657
  • Pdf link: https://arxiv.org/pdf/2305.01657
  • Abstract
    Existing research on data valuation in federated and swarm learning focuses on valuing client contributions and works best when data across clients is independent and identically distributed (IID). In practice, data is rarely distributed IID. We develop an approach called DDVal for decentralized data valuation, capable of valuing individual data points in federated and swarm learning. DDVal is based on sharing deep features and approximating Shapley values through a k-nearest neighbor approximation method. This allows for novel applications, for example, to simultaneously reward institutions and individuals for providing data to a decentralized machine learning task. The valuation of data points through DDVal also allows drawing hierarchical conclusions on the contribution of institutions, and we empirically show that the accuracy of DDVal in estimating institutional contributions is higher than existing Shapley value approximation methods for federated learning. Specifically, it reaches a cosine similarity in approximating Shapley values of 99.969% in both IID and non-IID data distributions across institutions, compared with 99.301% and 97.250% for the best state-of-the-art methods. DDVal scales with the number of data points instead of the number of clients, and has log-linear complexity. This scales more favorably than existing approaches with exponential complexity. We show that DDVal is especially efficient in data distribution scenarios with many clients that have few data points, for example, more than 16 clients with 8,000 data points each. By integrating DDVal into a decentralized system, we show that it is not only suitable for centralized federated learning, but also for decentralized swarm learning, which aligns well with the research on emerging internet technologies such as web3 to reward users for providing data to algorithms.
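
For intuition about the k-nearest-neighbor approximation at DDVal's core: for the utility "accuracy of an unweighted K-NN classifier on one test point", the Shapley value of every training point has an exact closed-form recursion (Jia et al., 2019), sketched below. DDVal applies this style of valuation to shared deep features in a decentralized setting, so the details differ.

```python
import numpy as np

def knn_shapley(K, X_train, y_train, x_test, y_test):
    """Exact Shapley values for the K-NN utility on a single test point
    (recursion of Jia et al., 2019); illustrative, not DDVal itself."""
    n = len(X_train)
    order = np.argsort(np.linalg.norm(X_train - x_test, axis=1))  # nearest first
    s = np.zeros(n)
    far = order[-1]                                  # farthest point first
    s[far] = float(y_train[far] == y_test) / n
    for j in range(n - 2, -1, -1):                   # walk toward the nearest
        i, nxt = order[j], order[j + 1]
        s[i] = s[nxt] + (
            float(y_train[i] == y_test) - float(y_train[nxt] == y_test)
        ) / K * min(K, j + 1) / (j + 1)
    return s
```

Averaging these values over many test points gives per-point valuations, which can then be aggregated per institution to obtain the hierarchical contributions mentioned above.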

FlightBERT++: A Non-autoregressive Multi-Horizon Flight Trajectory Prediction Framework

  • Authors: Dongyue Guo, Zheng Zhang, Jianwei Zhang, Yi Lin
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.01658
  • Pdf link: https://arxiv.org/pdf/2305.01658
  • Abstract
    Flight Trajectory Prediction (FTP) is an essential task in Air Traffic Control (ATC), which can assist air traffic controllers in managing airspace more safely and efficiently. Existing approaches generally perform multi-horizon FTP tasks in an autoregressive manner, which is prone to suffer from error accumulation and low-efficiency problems. In this paper, a novel framework, called FlightBERT++, is proposed to i) forecast multi-horizon flight trajectories directly in a non-autoregressive way, and ii) address the limitations of the binary encoding (BE) representation in the FlightBERT framework. Specifically, the proposed framework is implemented by a generalized Encoder-Decoder architecture, in which the encoder learns the temporal-spatial patterns from historical observations and the decoder predicts the flight status for the future time steps. Compared to conventional architectures, an extra horizon-aware contexts generator (HACG) is specifically designed to incorporate prior horizon information, enabling multi-horizon non-autoregressive prediction. Additionally, a differential prediction strategy is designed by carefully considering both the stationarity of the differential sequence and the high-bit errors of the BE representation. Moreover, the Bit-wise Weighted Binary Cross Entropy loss function is proposed to optimize the proposed framework, further constraining the high-bit errors of the predictions. Finally, the proposed framework is validated on a real-world flight trajectory dataset. The experimental results show that the proposed framework outperforms the competitive baselines.
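
The Bit-wise Weighted Binary Cross Entropy loss is only named in the abstract; one plausible form, sketched under the assumption that each bit's weight grows with its significance (so high-bit errors in the binary encoding cost more), is:

```python
import torch
import torch.nn.functional as F

def bitwise_weighted_bce(logits, target_bits, base=2.0):
    """Weighted BCE over the bits of a binary-encoded attribute.
    The geometric weighting (most-significant bit first) is an
    assumption for illustration; the paper defines its own weights.
    logits, target_bits: float tensors of shape (batch, n_bits)."""
    n_bits = logits.shape[-1]
    w = base ** torch.arange(n_bits - 1, -1, -1,
                             dtype=logits.dtype, device=logits.device)
    w = w / w.sum()                                  # normalize the weights
    bce = F.binary_cross_entropy_with_logits(logits, target_bits,
                                             reduction="none")
    return (bce * w).mean(dim=0).sum()               # batch mean, bit-weighted sum
```
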

Computer-Vision Based Real Time Waypoint Generation for Autonomous Vineyard Navigation with Quadruped Robots

  • Authors: Lee Milburn, Juan Gamba, Miguel Fernandes, Claudio Semini
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.01700
  • Pdf link: https://arxiv.org/pdf/2305.01700
  • Abstract
    The VINUM project seeks to address the shortage of skilled labor in modern vineyards by introducing a cutting-edge mobile robotic solution. Leveraging the capabilities of the quadruped robot HyQReal, this system, equipped with an arm and vision sensors, offers autonomous navigation and winter pruning of grapevines, reducing the need for human intervention. At the heart of this approach lies an architecture that empowers the robot to easily navigate vineyards, identify grapevines with unparalleled accuracy, and approach them for pruning with precision. A state machine drives the process, deftly switching between various stages to ensure seamless and efficient task completion. The system's performance was assessed through experimentation, focusing on waypoint precision and optimizing the robot's workspace for single-plant operations. Results indicate that the architecture is highly reliable, with a mean error of 21.5 cm and a standard deviation of 17.6 cm for HyQReal. However, improvements in grapevine detection accuracy are necessary for optimal performance. This work is based on a computer-vision-based navigation method for quadruped robots in vineyards, opening up new possibilities for selective task automation. The system's architecture works well in ideal weather conditions, generating and arriving at precise waypoints that maximize the attached robotic arm's workspace. This work is an extension of our short paper presented at the Italian Conference on Robotics and Intelligent Machines (I-RIM).

Stars Are All You Need: A Distantly Supervised Pyramid Network for Document-Level End-to-End Sentiment Analysis

  • Authors: Wenchang Li, Yixing Chen, John P. Lalor
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2305.01710
  • Pdf link: https://arxiv.org/pdf/2305.01710
  • Abstract
    In this paper, we propose document-level end-to-end sentiment analysis to efficiently understand aspect and review sentiment expressed in online reviews in a unified manner. In particular, we assume that star rating labels are a "coarse-grained synthesis" of aspect ratings across the review. We propose a Distantly Supervised Pyramid Network (DSPN) to efficiently perform Aspect-Category Detection, Aspect-Category Sentiment Analysis, and Rating Prediction using only document star rating labels for training. By performing these three related sentiment subtasks in an end-to-end manner, DSPN can extract aspects mentioned in the review, identify the corresponding sentiments, and predict the star rating labels. We evaluate DSPN on multi-aspect review datasets in English and Chinese and find that with only star rating labels for supervision, DSPN can perform comparably well to a variety of benchmark models. We also demonstrate the interpretability of DSPN's outputs on reviews to show the pyramid structure inherent in document-level end-to-end sentiment analysis.

Cross-view Action Recognition via Contrastive View-invariant Representation

  • Authors: Yuexi Zhang, Dan Luo, Balaji Sundareshan, Octavia Camps, Mario Sznaier
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.01733
  • Pdf link: https://arxiv.org/pdf/2305.01733
  • Abstract
    Cross-view action recognition (CVAR) seeks to recognize a human action when observed from a previously unseen viewpoint. This is a challenging problem since the appearance of an action changes significantly with the viewpoint. Applications of CVAR include surveillance and monitoring of assisted living facilities, where it is not practical or feasible to collect large amounts of training data when adding a new camera. We present a simple yet efficient CVAR framework to learn invariant features from either RGB videos, 3D skeleton data, or both. The proposed approach outperforms the current state-of-the-art, achieving similar levels of performance across input modalities: 99.4% (RGB) and 99.9% (3D skeletons), 99.4% (RGB) and 99.9% (3D skeletons), 97.3% (RGB) and 99.2% (3D skeletons), and 84.4% (RGB) for the N-UCLA, NTU-RGB+D 60, NTU-RGB+D 120, and UWA3DII datasets, respectively.

Connectivity Queries under Vertex Failures: Not Optimal, but Practical

  • Authors: Evangelos Kosinas
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2305.01756
  • Pdf link: https://arxiv.org/pdf/2305.01756
  • Abstract
    We revisit once more the problem of designing an oracle for answering connectivity queries in undirected graphs in the presence of vertex failures. Specifically, given an undirected graph $G$ with $n$ vertices and $m$ edges and an integer $d_{\star}\ll n$, the goal is to preprocess the graph in order to construct a data structure $\mathcal{D}$ such that, given a set of vertices $F$ with $|F|=d\leq d_{\star}$, we can derive an oracle from $\mathcal{D}$ that can efficiently answer queries of the form "is $x$ connected with $y$ in $G\setminus F$?". Very recently, Long and Saranurak (FOCS 2022) provided a solution to this problem that is almost optimal with respect to the preprocessing time, the space usage, the update time, and the query time. However, their solution is highly complicated, and it seems very difficult to implement efficiently. Furthermore, it does not settle the complexity of the problem in the regime where $d_{\star}$ is a constant. Here, we provide a much simpler solution to this problem that uses only textbook data structures. Our algorithm is deterministic, it has preprocessing time and space complexity $O(d_{\star}m\log n)$, update time $O(d^4 \log n)$, and query time $O(d)$. These bounds compare very well with the previous best, especially considering the simplicity of our approach. In fact, if we assume that $d_{\star}$ is a constant ($d_{\star}\geq 4$), then our algorithm improves on the state-of-the-art in every respect, except space. Nevertheless, even our space usage in this case is almost linear. Finally, the data structure that we provide is flexible with respect to $d_{\star}$: it can be adapted to increases and decreases, in time and space that are almost proportional to the change in $d_{\star}$ and the size of the graph.

Cheap and Deterministic Inference for Deep State-Space Models of Interacting Dynamical Systems

  • Authors: Andreas Look, Melih Kandemir, Barbara Rakitsch, Jan Peters
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2305.01773
  • Pdf link: https://arxiv.org/pdf/2305.01773
  • Abstract
    Graph neural networks are often used to model interacting dynamical systems since they gracefully scale to systems with a varying and high number of agents. While there has been much progress made for deterministic interacting systems, modeling is much more challenging for stochastic systems in which one is interested in obtaining a predictive distribution over future trajectories. Existing methods are either computationally slow since they rely on Monte Carlo sampling or make simplifying assumptions such that the predictive distribution is unimodal. In this work, we present a deep state-space model which employs graph neural networks in order to model the underlying interacting dynamical system. The predictive distribution is multimodal and has the form of a Gaussian mixture model, where the moments of the Gaussian components can be computed via deterministic moment matching rules. Our moment matching scheme can be exploited for sample-free inference, leading to more efficient and stable training compared to Monte Carlo alternatives. Furthermore, we propose structured approximations to the covariance matrices of the Gaussian components in order to scale up to systems with many agents. We benchmark our novel framework on two challenging autonomous driving datasets. Both confirm the benefits of our method compared to state-of-the-art methods. We further demonstrate the usefulness of our individual contributions in a carefully designed ablation study and provide a detailed runtime analysis of our proposed covariance approximations. Finally, we empirically demonstrate the generalization ability of our method by evaluating its performance on unseen scenarios.
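
The sample-free inference rests on pushing Gaussian moments through the network deterministically. For a linear map the rule is exact, which conveys the principle (nonlinear layers then need approximate moment-matching rules):

```python
import numpy as np

def linear_moment_matching(mu, Sigma, W, b):
    """If x ~ N(mu, Sigma), then y = W x + b is exactly
    N(W mu + b, W Sigma W^T): no Monte Carlo samples needed."""
    return W @ mu + b, W @ Sigma @ W.T

# a 2-D state pushed through a 3x2 map
mu_y, Sigma_y = linear_moment_matching(
    np.zeros(2), np.eye(2),
    np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]), np.zeros(3))
print(mu_y, Sigma_y)
```

Applying such rules layer by layer, per mixture component, yields the deterministic Gaussian-mixture predictions described above.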

Fairly Allocating Goods and (Terrible) Chores

  • Authors: Hadi Hosseini, Aghaheybat Mammadov, Tomasz Wąs
  • Subjects: Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2305.01786
  • Pdf link: https://arxiv.org/pdf/2305.01786
  • Abstract
    We study the fair allocation of mixtures of indivisible goods and chores under lexicographic preferences, a subdomain of additive preferences. A prominent fairness notion for allocating indivisible items is envy-freeness up to any item (EFX). Yet, its existence and computation have remained a notable open problem. By identifying a class of instances with "terrible chores", we show that determining the existence of an EFX allocation is NP-complete. This result immediately implies the intractability of EFX under additive preferences. Nonetheless, we propose a natural subclass of lexicographic preferences for which an EFX and Pareto optimal (PO) allocation is guaranteed to exist and can be computed efficiently for any mixed instance. Focusing on two weaker fairness notions, we investigate finding EF1 and PO allocations for special instances with terrible chores, and show that MMS and PO allocations can be computed efficiently for any mixed instance with lexicographic preferences.

Characterizing Compositionality of LQR from the Categorical Perspective

  • Authors: Baike She, Tyler Hanks, James Fairbanks, Matthew Hale
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.01811
  • Pdf link: https://arxiv.org/pdf/2305.01811
  • Abstract
    Composing systems is a fundamental concept in modern control systems, yet it remains challenging to formally analyze how controllers designed for individual subsystems can differ from controllers designed for the composition of those subsystems. To address this challenge, we propose a novel approach to composing control systems based on resource sharing machines, a concept from applied category theory. We use resource sharing machines to investigate the differences between (i) the linear-quadratic regulator (LQR) designed directly for a composite system and (ii) the LQR that is attained through the composition of LQRs designed for each subsystem. We first establish novel formalisms to compose LQR control designs using resource sharing machines. Then we develop new sufficient conditions to guarantee that the LQR designed for a composite system is equal to the LQR attained through composition of LQRs for its subsystems. In addition, we reduce the developed condition to that of checking the controllability and observability of a certain linear, time-invariant system, which provides a simple, computationally efficient procedure for evaluating the equivalence of controllers for composed systems.

Design Space Exploration and Optimization for Carbon-Efficient Extended Reality Systems

  • Authors: Mariam Elgamal, Doug Carmean, Elnaz Ansari, Okay Zed, Ramesh Peri, Srilatha Manne, Udit Gupta, Gu-Yeon Wei, David Brooks, Gage Hills, Carole-Jean Wu
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2305.01831
  • Pdf link: https://arxiv.org/pdf/2305.01831
  • Abstract
    As computing hardware becomes more specialized, designing environmentally sustainable computing systems requires accounting for both hardware and software parameters. Our goal is to design low-carbon computing systems while maintaining a competitive level of performance and operational efficiency. Despite previous carbon modeling efforts for computing systems, there is a distinct lack of holistic design strategies to simultaneously optimize for carbon, performance, power and energy. In this work, we take a data-driven approach to characterize the carbon impact (quantified in units of CO2e) of various artificial intelligence (AI) and extended reality (XR) production-level hardware and application use-cases. We propose a holistic design exploration framework to optimize and design for carbon-efficient computing systems and hardware. Our framework identifies significant opportunities for carbon efficiency improvements in application-specific and general purpose hardware design and optimization. Using our framework, we demonstrate 10$\times$ carbon efficiency improvement for specialized AI and XR accelerators (quantified by a key metric, tCDP: the product of total CO2e and total application execution time), up to 21% total life cycle carbon savings for existing general-purpose hardware and applications due to hardware over-provisioning, and up to 7.86$\times$ carbon efficiency improvement using advanced 3D integration techniques for resource-constrained XR systems.
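
The key metric, tCDP, is the product of total CO2e and total application execution time (lower is better). A toy computation with made-up numbers shows how reducing both terms compounds:

```python
def tcdp(total_co2e_kg, total_exec_time_s):
    """tCDP as described in the abstract: total CO2e times total
    execution time; the units here are an assumption for illustration."""
    return total_co2e_kg * total_exec_time_s

baseline = tcdp(total_co2e_kg=12.0, total_exec_time_s=3600)   # general-purpose
accel = tcdp(total_co2e_kg=4.0, total_exec_time_s=1080)       # specialized
print(baseline / accel)   # 10.0x carbon-efficiency improvement in this toy case
```
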

Bio-Inspired Simple Neural Network for Low-Light Image Restoration: A Minimalist Approach

  • Authors: Junjie Ye, Jilin Zhao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2305.01844
  • Pdf link: https://arxiv.org/pdf/2305.01844
  • Abstract
    In this study, we explore the potential of using a straightforward neural network inspired by the retina model to efficiently restore low-light images. The retina model imitates the neurophysiological principles and dynamics of various optical neurons. Our proposed neural network model reduces the computational overhead compared to traditional signal-processing models while achieving results similar to complex deep learning models from a subjective perceptual perspective. By directly simulating retinal neuron functionalities with neural networks, we not only avoid manual parameter optimization but also lay the groundwork for constructing artificial versions of specific neurobiological organizations.

Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models

  • Authors: Daochen Zha, Louis Feng, Liang Luo, Bhargav Bhushanam, Zirui Liu, Yusuo Hu, Jade Nie, Yuzhen Huang, Yuandong Tian, Arun Kejariwal, Xia Hu
  • Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2305.01868
  • Pdf link: https://arxiv.org/pdf/2305.01868
  • Abstract
    Sharding a large machine learning model across multiple devices to balance the costs is important in distributed training. This is challenging because partitioning is NP-hard, and estimating the costs accurately and efficiently is difficult. In this work, we explore a "pre-train, and search" paradigm for efficient sharding. The idea is to pre-train a universal and once-for-all neural network to predict the costs of all the possible shards, which serves as an efficient sharding simulator. Built upon this pre-trained cost model, we then perform an online search to identify the best sharding plans given any specific sharding task. We instantiate this idea in deep learning recommendation models (DLRMs) and propose NeuroShard for embedding table sharding. NeuroShard pre-trains neural cost models on augmented tables to cover various sharding scenarios. Then it identifies the best column-wise and table-wise sharding plans with beam search and greedy grid search, respectively. Experiments show that NeuroShard significantly and consistently outperforms the state-of-the-art on the benchmark sharding dataset, achieving up to 23.8% improvement. When deployed in an ultra-large production DLRM with multi-terabyte embedding tables, NeuroShard achieves 11.6% improvement in embedding costs over the state-of-the-art, which translates to 6.6% end-to-end training throughput improvement. To facilitate future research of the "pre-train, and search" paradigm in ML for Systems, we open-source our code at https://github.com/daochenzha/neuroshard
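
The "search" half of the paradigm can be pictured as a placement loop driven by a learned cost model. The greedy sketch below uses a stand-in cost function; NeuroShard's actual search uses beam search and greedy grid search over its pre-trained neural cost models.

```python
def greedy_shard(table_ids, n_devices, cost_model):
    """Place each embedding table on the device whose predicted cost
    after placement is smallest; cost_model maps a list of tables on
    one device to a predicted cost (a stand-in for the neural model)."""
    shards = [[] for _ in range(n_devices)]
    # place the most expensive tables first for better balance
    for t in sorted(table_ids, key=lambda t: cost_model([t]), reverse=True):
        best = min(range(n_devices), key=lambda d: cost_model(shards[d] + [t]))
        shards[best].append(t)
    return shards, max(cost_model(s) for s in shards)  # bottleneck device cost

# toy cost model: device cost = sum of per-table costs
table_cost = {0: 5.0, 1: 3.0, 2: 2.0, 3: 2.0}
plan, cost = greedy_shard(list(table_cost), 2,
                          lambda s: sum(table_cost[t] for t in s))
print(plan, cost)   # e.g. [[0, 3], [1, 2]] 7.0
```
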

Prediction of Performance and Power Consumption of GPGPU Applications

  • Authors: Gargi Alavani, Santonu Sarkar
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2305.01886
  • Pdf link: https://arxiv.org/pdf/2305.01886
  • Abstract
    Graphics Processing Units (GPUs) have become an integral part of High-Performance Computing to achieve Exascale performance. The main goal of GPU application developers is to tune their code extensively to obtain optimal performance, making efficient use of the different resources available. While extracting optimal performance of applications on an HPC infrastructure, developers should also ensure the applications have the least energy usage, considering the massive power consumption of data centres and HPC servers. This thesis presents two models that developers can utilize in analysing a CUDA kernel's energy dissipation. The first is a model that predicts the CUDA kernel's execution time. Here a PTX code is statically analysed to extract instruction features, control flow, and data dependence. We propose two scheduling algorithm approaches that satisfy the performance and hardware constraints. The second model is a static analysis-based power prediction model built by utilizing machine learning techniques. Features used for building the model are derived using static analysis of PTX code. These features are chosen to understand the relationship between GPU power consumption and program features that can aid developers in building energy-efficient, sustainable applications. The dataset used for validating both models includes kernels from different benchmark suites, sizes, natures (e.g., compute-bound, memory-bound), and complexities (e.g., control divergence, memory access patterns). We also present a tool that has practically validated the effectiveness and ease of using the two models as design assistance tools for GPU.
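
The second model's idea can be sketched as a regression from statically counted PTX features to measured power. The feature set, numbers, and random-forest learner below are illustrative assumptions, not the thesis's exact choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# rows: kernels; columns (assumed): [#fp32 ops, #ld.global, #st.global,
# #branches, #barriers, static shared-memory bytes] counted from PTX
X = np.array([
    [1200, 300, 120, 40,  8, 4096],
    [ 500, 800, 400, 10,  2,    0],
    [3000, 100,  60, 90, 16, 8192],
])
y = np.array([92.5, 110.3, 85.1])   # measured average power (W); made-up values

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print(model.predict(np.array([[900, 450, 200, 25, 4, 2048]])))  # unseen kernel
```
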

Revolutionizing Agrifood Systems with Artificial Intelligence: A Survey

  • Authors: Tao Chen, Liang Lv, Di Wang, Jing Zhang, Yue Yang, Zeyang Zhao, Chen Wang, Xiaowei Guo, Hao Chen, Qingye Wang, Yufei Xu, Qiming Zhang, Bo Du, Liangpei Zhang, Dacheng Tao
  • Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2305.01899
  • Pdf link: https://arxiv.org/pdf/2305.01899
  • Abstract
    With the world population rapidly increasing, transforming our agrifood systems to be more productive, efficient, safe, and sustainable is crucial to mitigate potential food shortages. Recently, artificial intelligence (AI) techniques such as deep learning (DL) have demonstrated their strong abilities in various areas, including language, vision, remote sensing (RS), and agrifood systems applications. However, the overall impact of AI on agrifood systems remains unclear. In this paper, we thoroughly review how AI techniques can transform agrifood systems and contribute to the modern agrifood industry. Firstly, we summarize the data acquisition methods in agrifood systems, including acquisition, storage, and processing techniques. Secondly, we present a progress review of AI methods in agrifood systems, specifically in agriculture, animal husbandry, and fishery, covering topics such as agrifood classification, growth monitoring, yield prediction, and quality assessment. Furthermore, we highlight potential challenges and promising research opportunities for transforming modern agrifood systems with AI. We hope this survey could offer an overall picture to newcomers in the field and serve as a starting point for their further research.

Hybrid Active-Passive IRS Assisted Energy-Efficient Wireless Communication

  • Authors: Qiaoyan Peng, Guangji Chen, Qingqing Wu, Ruiqi Liu, Shaodan Ma, Wen Chen
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.01924
  • Pdf link: https://arxiv.org/pdf/2305.01924
  • Abstract
    Deploying active reflecting elements at the intelligent reflecting surface (IRS) increases signal amplification capability but incurs higher power consumption. Therefore, it remains a challenging and open problem to determine the optimal number of active/passive elements for maximizing energy efficiency (EE). To answer this question, we consider a hybrid active-passive IRS (H-IRS) assisted wireless communication system, where the H-IRS consists of both active and passive reflecting elements. Specifically, we study the optimization of the number of active/passive elements at the H-IRS to maximize EE. To this end, we first derive the closed-form expression for a near-optimal solution under the line-of-sight (LoS) channel case and obtain its optimal solution under the Rayleigh fading channel case. Then, an efficient algorithm is employed to obtain a high-quality sub-optimal solution for the EE maximization under the general Rician channel case. Simulation results demonstrate the effectiveness of the H-IRS for maximizing EE under different Rician factors and IRS locations.

Illicit item detection in X-ray images for security applications

  • Authors: Georgios Batsis, Ioannis Mademlis, Georgios Th. Papadopoulos
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.01936
  • Pdf link: https://arxiv.org/pdf/2305.01936
  • Abstract
    Automated detection of contraband items in X-ray images can significantly increase public safety, by enhancing the productivity and alleviating the mental load of security officers in airports, subways, customs/post offices, etc. The large volume and high throughput of passengers, mailed parcels, etc., during rush hours make it a Big Data analysis task. Modern computer vision algorithms relying on Deep Neural Networks (DNNs) have proven capable of undertaking this task even under resource-constrained and embedded execution scenarios, e.g., as is the case with fast, single-stage, anchor-based object detectors. This paper proposes a two-fold improvement of such algorithms for the X-ray analysis domain, introducing two complementary novelties. Firstly, more efficient anchors are obtained by hierarchically clustering the sizes of the ground-truth training set bounding boxes; thus, the resulting anchors follow a natural hierarchy aligned with the semantic structure of the data. Secondly, the default Non-Maximum Suppression (NMS) algorithm at the end of the object detection pipeline is modified to better handle occluded object detection and to reduce the number of false predictions, by inserting the Efficient Intersection over Union (E-IoU) metric into the Weighted Cluster NMS method. E-IoU provides more discriminative geometrical correlations between the candidate bounding boxes/Regions-of-Interest (RoIs). The proposed method is implemented on a common single-stage object detector (YOLOv5) and its experimental evaluation on a relevant public dataset indicates significant accuracy gains over both the baseline and competing approaches. This highlights the potential of Big Data analysis in enhancing public safety.
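
The first novelty reads directly as a SciPy recipe: hierarchically cluster the ground-truth box sizes and take one anchor per cluster. The 'ward' linkage and nine anchors below are assumptions for illustration (YOLOv5 distributes nine anchors across three detection scales).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def anchors_from_hierarchical_clustering(wh, n_anchors=9):
    """wh: (N, 2) array of ground-truth box widths and heights."""
    Z = linkage(wh, method="ward")                    # build the hierarchy
    labels = fcluster(Z, t=n_anchors, criterion="maxclust")
    anchors = np.array([wh[labels == c].mean(axis=0)  # one anchor per cluster
                        for c in range(1, n_anchors + 1)])
    # sort by area so anchors map naturally onto the detection scales
    return anchors[np.argsort(anchors.prod(axis=1))]
```
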

Optimal Resource Management for Hierarchical Federated Learning over HetNets with Wireless Energy Transfer

  • Authors: Rami Hamdi, Ahmed Ben Said, Emna Baccour, Aiman Erbad, Amr Mohamed, Mounir Hamdi, Mohsen Guizani
  • Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.01953
  • Pdf link: https://arxiv.org/pdf/2305.01953
  • Abstract
    Remote monitoring systems analyze the environment dynamics in different smart industrial applications, such as occupational health and safety, and environmental monitoring. Specifically, in industrial Internet of Things (IoT) systems, the huge number of devices and the expected performance put pressure on resources, such as computational, network, and device energy. Distributed training of Machine and Deep Learning (ML/DL) models for intelligent industrial IoT applications is very challenging for resource-limited devices over heterogeneous wireless networks (HetNets). Hierarchical Federated Learning (HFL) performs training at multiple layers, offloading the tasks to nearby Multi-Access Edge Computing (MEC) units. In this paper, we propose a novel energy-efficient HFL framework enabled by Wireless Energy Transfer (WET) and designed for heterogeneous networks with massive Multiple-Input Multiple-Output (MIMO) wireless backhaul. Our energy-efficiency approach is formulated as a Mixed-Integer Non-Linear Programming (MINLP) problem, where we optimize the HFL device association and manage the wireless transmitted energy. However, due to its high complexity, we design a Heuristic Resource Management Algorithm, namely H2RMA, that respects energy, channel quality, and accuracy constraints, while presenting a low computational complexity. We also improve the energy consumption of the network using an efficient device scheduling scheme. Finally, we investigate device mobility and its impact on the HFL performance. Our extensive experiments confirm the high performance of the proposed resource management approach in HFL over HetNets, in terms of training loss and grid energy costs.

Putting collective intelligence to the enforcement of the Digital Services Act

  • Authors: Suzanne Vergnolle (LISE)
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2305.01959
  • Pdf link: https://arxiv.org/pdf/2305.01959
  • Abstract
    While underlining the many ways to build strong cooperation settings between regulators and CSOs, this report focuses on making concrete recommendations for the design of an efficient and influential expert group with the European Commission. The creation of an expert group finds its roots in article 64 and recital 137 of the DSA, which require the Commission to develop Union expertise and capabilities. Once established, the experts of this group will be able to bring evidence-based information directly to the Commission, along with specific expertise on the protection of fundamental rights and the safety of users online. By instituting an expert group, the Commission will not only benefit from valuable expert knowledge but will also demonstrate its willingness to put in place an efficient enforcement system based on collective intelligence. Aside from the establishment of an expert group, other cumulative mechanisms will also help the DSA's enforcement to thrive. Civil society organisations should, for instance, consider organising regular crowdsourcing events to deep-dive into and analyse the data published by entities covered by the transparency obligations. As it has done in the past, the Commission can sponsor these events and be a direct beneficiary of their results. Another way for civil society organisations to bring information to the Regulator is by legal action, including by making complaints to the regulators.

"Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization

  • Authors: Zhixi Cai, Shreya Ghosh, Tom Gedeon, Abhinav Dhall, Kalin Stefanov, Munawar Hayat
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.01979
  • Pdf link: https://arxiv.org/pdf/2305.01979
  • Abstract
    Most deepfake detection methods focus on detecting spatial and/or spatio-temporal changes in facial attributes. This is because available benchmark datasets contain mostly visual-only modifications. However, a sophisticated deepfake may include small segments of audio or audio-visual manipulations that can completely change the meaning of the content. To address this gap, we propose and benchmark a new dataset, Localized Audio Visual DeepFake (LAV-DF), consisting of strategic content-driven audio, visual and audio-visual manipulations. The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture which efficiently captures multimodal manipulations. We further improve the baseline method (i.e., BA-TFD+) by replacing the backbone with a Multiscale Vision Transformer and guiding the training process with contrastive, frame classification, boundary matching and multimodal boundary matching loss functions. The quantitative analysis demonstrates the superiority of BA-TFD+ on temporal forgery localization and deepfake detection tasks using several benchmark datasets including our newly proposed dataset. The dataset, models and code are available at https://github.com/ControlNet/LAV-DF.

Computing paths of large rank in planar frameworks deterministically

  • Authors: Fedor V. Fomin, Petr A. Golovach, Tuukka Korhonen, Giannos Stamoulis
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2305.01993
  • Pdf link: https://arxiv.org/pdf/2305.01993
  • Abstract
    A framework consists of an undirected graph $G$ and a matroid $M$ whose elements correspond to the vertices of $G$. Recently, Fomin et al. [SODA 2023] and Eiben et al. [arXiv 2023] developed parameterized algorithms for computing paths of rank $k$ in frameworks. More precisely, for vertices $s$ and $t$ of $G$, and an integer $k$, they gave FPT algorithms parameterized by $k$ deciding whether there is an $(s,t)$-path in $G$ whose vertex set contains a subset of elements of $M$ of rank $k$. These algorithms are based on the Schwartz-Zippel lemma for polynomial identity testing and thus are randomized, and therefore the existence of a deterministic FPT algorithm for this problem remains open. We present the first deterministic FPT algorithm that solves the problem in frameworks whose underlying graph $G$ is planar. While the running time of our algorithm is worse than the running times of the recent randomized algorithms, our algorithm works on more general classes of matroids. In particular, this is the first FPT algorithm for the case when matroid $M$ is represented over the rationals. Our main technical contribution is the nontrivial adaptation of the classic irrelevant vertex technique to frameworks to reduce the given instance to one of bounded treewidth. This allows us to employ the toolbox of representative sets to design a dynamic programming procedure solving the problem efficiently on instances of bounded treewidth.

Approximating Long Cycle Above Dirac's Guarantee

  • Authors: Fedor V. Fomin, Petr A. Golovach, Danil Sagunov, Kirill Simonov
  • Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
  • Arxiv link: https://arxiv.org/abs/2305.02011
  • Pdf link: https://arxiv.org/pdf/2305.02011
  • Abstract
    Parameterization above (or below) a guarantee is a successful concept in parameterized algorithms. The idea is that many computational problems admit "natural" guarantees, raising the algorithmic question of whether a better solution (above the guarantee) could be obtained efficiently. The above-guarantee paradigm has led to several exciting discoveries in the areas of parameterized algorithms and kernelization. We argue that this paradigm could bring forth fresh perspectives on well-studied problems in approximation algorithms. Our example is the longest cycle problem. One of the oldest results in extremal combinatorics is the celebrated Dirac's theorem from 1952. Dirac's theorem provides the following guarantee on the length of the longest cycle: for every 2-connected n-vertex graph G with minimum degree \delta(G)\leq n/2, the length of a longest cycle L is at least 2\delta(G). Thus, the "essential" part in finding the longest cycle is in approximating the "offset" k = L - 2\delta(G). The main result of this paper is the above-guarantee approximation theorem for k. Informally, the theorem says that approximating the offset k is not harder than approximating the total length L of a cycle. In other words, for any (reasonably well-behaved) function f, a polynomial-time algorithm constructing a cycle of length f(L) in an undirected graph with a cycle of length L yields a polynomial-time algorithm constructing a cycle of length 2\delta(G)+\Omega(f(k)).

Deep Learning-Based Multiband Signal Fusion for 3-D SAR Super-Resolution

  • Authors: Josiah Smith, Murat Torlak
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.02017
  • Pdf link: https://arxiv.org/pdf/2305.02017
  • Abstract
    Three-dimensional (3-D) synthetic aperture radar (SAR) is widely used in many security and industrial applications requiring high-resolution imaging of concealed or occluded objects. The ability to resolve intricate 3-D targets is essential to the performance of such applications and depends directly on system bandwidth. However, because high-bandwidth systems face several prohibitive hurdles, an alternative solution is to operate multiple radars at distinct frequency bands and fuse the multiband signals. Current multiband signal fusion methods assume a simple target model and a small number of point reflectors, which is invalid for realistic security screening and industrial imaging scenarios wherein the target model effectively consists of a large number of reflectors. To the best of our knowledge, this study presents the first use of deep learning for multiband signal fusion. The proposed network, called kR-Net, employs a hybrid, dual-domain complex-valued convolutional neural network (CV-CNN) to fuse multiband signals and impute the missing samples in the frequency gaps between subbands. By exploiting the relationships in both the wavenumber domain and wavenumber spectral domain, the proposed framework overcomes the drawbacks of existing multiband imaging techniques for realistic scenarios at a fraction of the computation time of existing multiband fusion algorithms. Our method achieves high-resolution imaging of intricate targets previously impossible using conventional techniques and enables finer resolution capacity for concealed weapon detection and occluded object classification using multiband signaling without requiring more advanced hardware. Furthermore, a fully integrated multiband imaging system is developed using commercially available millimeter-wave (mmWave) radars for efficient multiband imaging.

Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model

  • Authors: Di Wang, Jing Zhang, Bo Du, Dacheng Tao, Liangpei Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.02034
  • Pdf link: https://arxiv.org/pdf/2305.02034
  • Abstract
    The success of the Segment Anything Model (SAM) demonstrates the significance of data-centric machine learning. However, due to the difficulties and high costs associated with annotating Remote Sensing (RS) images, a large amount of valuable RS data remains unlabeled, particularly at the pixel level. In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS. SAMRS surpasses existing high-resolution RS segmentation datasets in size by several orders of magnitude, and provides object category, location, and instance information that can be used for semantic segmentation, instance segmentation, and object detection, either individually or in combination. We also provide a comprehensive analysis of SAMRS from various aspects. We hope it could facilitate research in RS segmentation, particularly in large model pre-training.
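
The pipeline as described reduces to prompting SAM with each ground-truth detection box and keeping the returned mask. A sketch using the public segment-anything API (the checkpoint path is a placeholder):

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # placeholder path
predictor = SamPredictor(sam)

def boxes_to_masks(image, boxes_xyxy):
    """image: HxWx3 uint8 RGB array; boxes_xyxy: (N, 4) detection boxes.
    Returns one binary instance mask per box; each mask inherits the
    box's category label from the source detection dataset."""
    predictor.set_image(image)
    masks = []
    for box in boxes_xyxy:
        m, _, _ = predictor.predict(box=np.asarray(box), multimask_output=False)
        masks.append(m[0])               # (H, W) boolean mask for this instance
    return np.stack(masks)
```
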

Improved Static Hand Gesture Classification on Deep Convolutional Neural Networks using Novel Sterile Training Technique

  • Authors: Josiah Smith, Shiva Thiagarajan, Richard Willis, Yiorgos Makris, Murat Torlak
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.02039
  • Pdf link: https://arxiv.org/pdf/2305.02039
  • Abstract
    In this paper, we investigate novel data collection and training techniques towards improving classification accuracy of non-moving (static) hand gestures using a convolutional neural network (CNN) and frequency-modulated-continuous-wave (FMCW) millimeter-wave (mmWave) radars. Recently, non-contact hand pose and static gesture recognition have received considerable attention in many applications, including human-computer interaction (HCI), augmented/virtual reality (AR/VR), and even therapeutic range of motion for medical applications. While most current solutions rely on optical or depth cameras, these methods require ideal lighting and temperature conditions. mmWave radar devices have recently emerged as a promising alternative offering low-cost system-on-chip sensors whose output signals contain precise spatial information even in non-ideal imaging conditions. Additionally, deep convolutional neural networks have been employed extensively in image recognition by learning both feature extraction and classification simultaneously. However, little work has been done towards static gesture recognition using mmWave radars and CNNs due to the difficulty involved in extracting meaningful features from the radar return signal, and the results are inferior compared with dynamic gesture classification. This article presents an efficient data collection approach and a novel technique for deep CNN training by introducing "sterile" images which aid in distinguishing distinct features among the static gestures and subsequently improve the classification accuracy. Applying the proposed data collection and training methods yields an increase in classification rate of static hand gestures from 85% to 93% and from 90% to 95% for range and range-angle profiles, respectively.

Approximate Evaluation of Quantitative Second Order Queries

  • Authors: Jan Dreier, Robert Ganian, Thekla Hamm
  • Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2305.02056
  • Pdf link: https://arxiv.org/pdf/2305.02056
  • Abstract
    Courcelle's theorem and its adaptations to cliquewidth have shaped the field of exact parameterized algorithms and are widely considered the archetype of algorithmic meta-theorems. In the past decade, there has been growing interest in developing parameterized approximation algorithms for problems which are not captured by Courcelle's theorem and, in particular, are considered not fixed-parameter tractable under the associated widths. We develop a generalization of Courcelle's theorem that yields efficient approximation schemes for any problem that can be captured by an expanded logic we call Blocked CMSO, capable of making logical statements about the sizes of set variables via so-called weight comparisons. The logic controls weight comparisons via the quantifier-alternation depth of the involved variables, allowing full comparisons for zero-alternation variables and limited comparisons for one-alternation variables. We show that the developed framework threads the very needle of tractability: on one hand it can describe a broad range of approximable problems, while on the other hand we show that the restrictions of our logic cannot be relaxed under well-established complexity assumptions. The running time of our approximation scheme is polynomial in $1/\varepsilon$, allowing us to fully interpolate between faster approximate algorithms and slower exact algorithms. This provides a unified framework to explain the tractability landscape of graph problems parameterized by treewidth and cliquewidth, as well as classical non-graph problems such as Subset Sum and Knapsack.

A survey of modularized backstepping control design approaches to nonlinear ODE systems

  • Authors: Zhengru Ren
  • Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.02066
  • Pdf link: https://arxiv.org/pdf/2305.02066
  • Abstract
    Backstepping is a mature and powerful Lyapunov-based design approach for a specific set of systems. Over three decades of development, innovative theories and practices have extended backstepping to stabilization and tracking problems for nonlinear systems of growing complexity. The attractions of the backstepping-like approach are its recursive design process and modularized design: a nonlinear system can be transformed into a group of simple subproblems, each solved by a sequential superposition of the corresponding approaches. To handle the complexities, backstepping designs are commonly combined with adaptive and robust control. This survey reviews the milestone theoretical achievements, among thousands of publications, that make state-feedback backstepping designs of complex ODE systems systematic and modularized. Several selected elegant methods are reviewed, starting from the general designs, followed by finite-time control to enhance the convergence rate, fuzzy logic systems and neural networks to estimate system unknowns, the Nussbaum function to handle unknown control coefficients, barrier Lyapunov functions to address state constraints, and the hyperbolic tangent function as applied in robust designs. The associated assumptions, Lyapunov function candidates, inequalities, and key deduction points are reviewed. The nonlinearities and complexities lie in state constraints, disturbances, input nonlinearities, time-delay effects, pure-feedback systems, event-triggered systems, and stochastic systems. Rather than networked systems, the survey focuses on stand-alone systems.
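
For readers new to the technique, the canonical two-step design for the strict-feedback chain with states x1, x2 and input u shows the recursive, modular pattern the survey organizes:

```latex
% System: \dot{x}_1 = x_2, \quad \dot{x}_2 = u, with gains k_1, k_2 > 0.
\begin{align*}
  &\text{Step 1: } V_1 = \tfrac{1}{2}x_1^2,\quad
    \alpha_1 = -k_1 x_1,\quad z_2 = x_2 - \alpha_1
    \;\Rightarrow\; \dot V_1 = -k_1 x_1^2 + x_1 z_2, \\
  &\text{Step 2: } V_2 = V_1 + \tfrac{1}{2}z_2^2
    \;\Rightarrow\; \dot V_2 = -k_1 x_1^2
      + z_2\bigl(x_1 + u - \dot\alpha_1\bigr), \\
  &\text{Control: } u = \dot\alpha_1 - x_1 - k_2 z_2
      = -k_1 x_2 - x_1 - k_2 (x_2 + k_1 x_1)
    \;\Rightarrow\; \dot V_2 = -k_1 x_1^2 - k_2 z_2^2 \le 0.
\end{align*}
```

Each step stabilizes one subsystem and hands a virtual control to the next, which is exactly the modularity that lets adaptive, robust, and constraint-handling elements be swapped in per step.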

A Vision Transformer Approach for Efficient Near-Field Irregular SAR Super-Resolution

  • Authors: Josiah Smith, Yusef Alimam, Geetika Vedula
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.02074
  • Pdf link: https://arxiv.org/pdf/2305.02074
  • Abstract
    In this paper, we develop a novel super-resolution algorithm for near-field synthetic-aperture radar (SAR) under irregular scanning geometries. As fifth-generation (5G) millimeter-wave (mmWave) devices are becoming increasingly affordable and available, high-resolution SAR imaging is feasible for end-user applications and non-laboratory environments. Emerging applications such as freehand imaging, wherein a handheld radar is scanned throughout space by a user, unmanned aerial vehicle (UAV) imaging, and automotive SAR face several unique challenges for high-resolution imaging. First, recovering a SAR image requires knowledge of the array positions throughout the scan. While recent work has introduced camera-based positioning systems capable of adequately estimating the position, running the recovery algorithm efficiently is a requirement to enable edge and Internet of Things (IoT) technologies. Efficient algorithms for non-cooperative near-field SAR sampling have been explored in recent work, but they suffer image defocusing under position estimation error and can only produce medium-fidelity images. In this paper, we introduce a mobile-friendly vision transformer (ViT) architecture to address position estimation error and perform SAR image super-resolution (SR) under irregular sampling geometries. The proposed algorithm, Mobile-SRViT, is the first to employ a ViT approach for SAR image enhancement and is validated in simulation and via empirical studies.

Rethinking the Encoding of Satellite Image Time Series

  • Authors: Xin Cai, Yaxin Bi, Peter Nicholl, Roy Sterritt
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.02086
  • Pdf link: https://arxiv.org/pdf/2305.02086
  • Abstract
    Representation learning of Satellite Image Time Series (SITS) presents unique challenges, such as the prohibitive computational burden caused by high spatiotemporal resolutions, irregular acquisition times, and complex spatiotemporal interactions, leading to highly specialized neural network architectures for SITS analysis. Despite the promising results achieved by some pioneering work, we argue that satisfactory representation learning paradigms have not yet been established for SITS analysis, creating an isolated island that makes it arduous to transfer successful paradigms or the latest advances from Computer Vision (CV) to SITS. In this paper, we develop a unique perspective of SITS processing as a direct set prediction problem, inspired by the recent trend of adopting query-based transformer decoders to streamline object detection and image segmentation pipelines, and further propose to decompose the representation learning process of SITS into three explicit steps: collect--update--distribute, which is computationally efficient and well suited to irregularly sampled and asynchronous temporal observations. Facilitated by this reformulation and the proposed feature extraction framework, our models, pre-trained on pixel-set format input and then fine-tuned on downstream dense prediction tasks by simply appending a commonly used segmentation network, attain new state-of-the-art (SoTA) results on the PASTIS dataset compared to bespoke neural architectures such as U-TAE. Furthermore, the clear separation, conceptually and practically, between the temporal and spatial components in the panoptic segmentation pipeline of SITS allows us to leverage recent advances in CV, such as Mask2Former, a universal segmentation architecture, resulting in a noticeable 8.8-point increase in PQ compared to the best score reported so far.
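
A minimal sketch of what a "collect" step could look like under our reading of the abstract: a small set of learned queries cross-attends to date-embedded, irregularly sampled observations to produce a fixed-size summary. Shapes and module choices are illustrative assumptions, not the paper's architecture.

```python
# Learned queries summarizing an irregular satellite time series via attention.
import torch
import torch.nn as nn

B, T, D, Q = 2, 17, 32, 4                 # batch, irregular timesteps, dim, queries
obs = torch.randn(B, T, D)                # per-acquisition features
day_of_year = torch.randint(0, 365, (B, T))

time_embed = nn.Embedding(365, D)         # encodes acquisition dates
queries = nn.Parameter(torch.randn(Q, D)) # learned temporal queries
attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)

tokens = obs + time_embed(day_of_year)          # date-aware observation tokens
q = queries.unsqueeze(0).expand(B, -1, -1)      # (B, Q, D)
collected, _ = attn(q, tokens, tokens)          # (B, Q, D): fixed-size summary
print(collected.shape)
```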

Efficient CNN-based Super Resolution Algorithms for mmWave Mobile Radar Imaging

  • Authors: Christos Vasileiou, Josiah W. Smith, Shiva Thiagarajan, Matthew Nigh, Yiorgos Makris, Murat Torlak
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.02092
  • Pdf link: https://arxiv.org/pdf/2305.02092
  • Abstract
    In this paper, we introduce an innovative super resolution approach to emerging modes of near-field synthetic aperture radar (SAR) imaging. Recent research extends convolutional neural network (CNN) architectures from the optical to the electromagnetic domain to achieve super resolution on images generated from radar signaling. Specifically, near-field synthetic aperture radar (SAR) imaging, a method for generating high-resolution images by scanning a radar across space to create a synthetic aperture, is of interest due to its high-fidelity spatial sensing capability, low cost devices, and large application space. Since SAR imaging requires large aperture sizes to achieve high resolution, super-resolution algorithms are valuable for many applications. Freehand smartphone SAR, an emerging sensing modality, requires irregular SAR apertures in the near-field and computation on mobile devices. Achieving efficient high-resolution SAR images from irregularly sampled data collected by freehand motion of a smartphone is a challenging task. In this paper, we propose a novel CNN architecture to achieve SAR image super-resolution for mobile applications by employing state-of-the-art SAR processing and deep learning techniques. The proposed algorithm is verified via simulation and an empirical study. Our algorithm demonstrates high-efficiency and high-resolution radar imaging for near-field scenarios with irregular scanning geometries.

Heterogeneous GNN-RL Based Task Offloading for UAV-aided Smart Agriculture

  • Authors: Turgay Pamuklu, Aisha Syed, W. Sean Kennedy, Melike Erol-Kantarci
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2305.02112
  • Pdf link: https://arxiv.org/pdf/2305.02112
  • Abstract
    Having unmanned aerial vehicles (UAVs) with edge computing capability hover over smart farmlands supports Internet of Things (IoT) devices with low processing capacity and power in accomplishing their deadline-sensitive tasks efficiently and economically. In this work, we propose a graph neural network-based reinforcement learning solution to optimize the task offloading from these IoT devices to the UAVs. We conduct evaluations to show that our approach reduces task deadline violations while also increasing the mission time of the UAVs by optimizing their battery usage. Moreover, the proposed solution has increased robustness to network topology changes and is able to adapt to extreme cases, such as the failure of a UAV.

Automatic Parameterization for Aerodynamic Shape Optimization via Deep Geometric Learning

  • Authors: Zhen Wei, Pascal Fua, Michaël Bauerheim
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Fluid Dynamics (physics.flu-dyn)
  • Arxiv link: https://arxiv.org/abs/2305.02116
  • Pdf link: https://arxiv.org/pdf/2305.02116
  • Abstract
    We propose two deep learning models that fully automate shape parameterization for aerodynamic shape optimization. Both models are optimized to parameterize via deep geometric learning to embed human prior knowledge into learned geometric patterns, eliminating the need for further handcrafting. The Latent Space Model (LSM) learns a low-dimensional latent representation of an object from a dataset of various geometries, while the Direct Mapping Model (DMM) builds parameterization on the fly using only one geometry of interest. We also devise a novel regularization loss that efficiently integrates volumetric mesh deformation into the parameterization model. The models directly manipulate the high-dimensional mesh data by moving vertices. LSM and DMM are fully differentiable, enabling gradient-based, end-to-end pipeline design and plug-and-play deployment of surrogate models or adjoint solvers. We perform shape optimization experiments on 2D airfoils and discuss the applicable scenarios for the two models.

On the Channel Correlation in Reconfigurable Intelligent Surface-Aided System

  • Authors: Kuang-Hao (Stanley) Liu
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.02125
  • Pdf link: https://arxiv.org/pdf/2305.02125
  • Abstract
    This work explores the correlation between channels in reconfigurable intelligent surface (RIS)-aided communication systems. In this type of system, an RIS made up of many passive elements with adjustable phases reflects the transmitter's signal to the receiver. Since the transmitter-RIS link may be shared by multiple receivers, the cascade channels of two receivers may experience correlated fading, which can negatively impact system performance. Using the mean correlation coefficient as a metric, we analyze the correlation between two cascade channels and derive an accurate approximation in closed form. We also consider the extreme case of an infinitely large number of RIS elements and obtain a convergence result. The accuracy of our analysis is validated by simulation results, which offer insights into the correlation characteristics of RIS-aided fading channels.
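
The shared-link effect is easy to reproduce in a Monte Carlo sketch (ours, with i.i.d. Rayleigh fading assumed): two receivers' cascade gains are positively correlated purely because they share the transmitter-RIS link.

```python
# Monte Carlo: correlation of two cascade channels sharing the Tx-RIS link h.
import numpy as np

rng = np.random.default_rng(0)
N, trials = 64, 20000                                  # RIS elements, samples

def rayleigh(shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

h = rayleigh((trials, N))                              # shared Tx-RIS link
g1, g2 = rayleigh((trials, N)), rayleigh((trials, N))  # independent RIS-Rx links
c1 = np.abs((h * g1).sum(axis=1))                      # cascade channel gains
c2 = np.abs((h * g2).sum(axis=1))

rho = np.corrcoef(c1, c2)[0, 1]
print(f"empirical correlation of cascade gains: {rho:.3f}")  # clearly > 0
```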

An identification method for oscillators with response-dependent inertia

  • Authors: Yuval Harduf (1), Eyal Setter (1), Izhak Bucher (1) ((1) Technion Israel Institute of Technology, Faculty of mechanical engineering)
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.02135
  • Pdf link: https://arxiv.org/pdf/2305.02135
  • Abstract
    This paper is concerned with identifying the instantaneous modal parameters of oscillatory systems with response-dependent inertia (mass, inductance, or equivalent) based on their measured dynamics. An identification method is proposed, which is a variation of the "FORCEVIB" method. The method utilizes the analytic signal representation and the properties of the Hilbert transform to obtain an analytic relationship between a system's natural frequency and damping coefficient and its response and excitation signals. The proposed method is validated by comparing the identification results to the asymptotic solution of a simple system with response-dependent inertia and is then demonstrated, numerically and experimentally, on other, more complicated nonlinear systems.
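
The analytic-signal machinery the method relies on is readily available; the hedged sketch below extracts an instantaneous envelope and frequency from a toy drifting-frequency response via scipy's Hilbert transform (the signal and rates are ours, not the paper's experiment).

```python
# Instantaneous amplitude/frequency from the analytic signal x + i*H[x].
import numpy as np
from scipy.signal import hilbert

fs = 1000.0
t = np.arange(0, 5.0, 1 / fs)
# decaying oscillation with drifting frequency, mimicking response-dependent inertia
x = np.exp(-0.2 * t) * np.cos(2 * np.pi * (5.0 + 0.5 * t) * t)

z = hilbert(x)                                   # analytic signal
amplitude = np.abs(z)                            # instantaneous envelope
phase = np.unwrap(np.angle(z))
inst_freq = np.gradient(phase, t) / (2 * np.pi)  # instantaneous frequency [Hz]
print(inst_freq[len(t) // 2])                    # ~7.5 Hz near t = 2.5 s
```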

Learning-Augmented Online TSP on Rings, Trees, Flowers and (almost) Everywhere Else

  • Authors: Evripidis Bampis, Bruno Escoffier, Themis Gouleakis, Niklas Hahn, Kostas Lakis, Golnoosh Shahkarami, Michalis Xefteris
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2305.02169
  • Pdf link: https://arxiv.org/pdf/2305.02169
  • Abstract
    We study the Online Traveling Salesperson Problem (OLTSP) with predictions. In OLTSP, a sequence of initially unknown requests arrive over time at points (locations) of a metric space. The goal is, starting from a particular point of the metric space (the origin), to serve all these requests while minimizing the total time spent. The server moves with unit speed or is "waiting" (zero speed) at some location. We consider two variants: in the open variant, the goal is achieved when the last request is served. In the closed one, the server additionally has to return to the origin. We adopt a prediction model, introduced for OLTSP on the line, in which the predictions correspond to the locations of the requests and extend it to more general metric spaces. We first propose an oracle-based algorithmic framework, inspired by previous work. This framework allows us to design online algorithms for general metric spaces that provide competitive ratio guarantees which, given perfect predictions, beat the best possible classical guarantee (consistency). Moreover, they degrade gracefully along with the increase in error (smoothness), but always within a constant factor of the best known competitive ratio in the classical case (robustness). Having reduced the problem to designing suitable efficient oracles, we describe how to achieve this for general metric spaces as well as specific metric spaces (rings, trees and flowers), the resulting algorithms being tractable in the latter case. The consistency guarantees of our algorithms are tight in almost all cases, and their smoothness guarantees only suffer a linear dependency on the error, which we show is necessary. Finally, we provide robustness guarantees improving previous results.

Evanescent Plane Wave Approximation of Helmholtz Solutions in Spherical Domains

  • Authors: Nicola Galante
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2305.02175
  • Pdf link: https://arxiv.org/pdf/2305.02175
  • Abstract
    The recent results presented in arXiv:2202.05608 have led to significant developments in achieving stable approximations of Helmholtz solutions by plane wave superposition. The study shows that the numerical instability and ill-conditioning inherent in plane wave-based Trefftz methods can be effectively overcome with regularization techniques, provided there exist accurate approximations in the form of expansions with bounded coefficients. Whenever the target solution contains high Fourier modes, propagative plane waves fail to yield stable approximations due to the exponential growth of the expansion coefficients. Conversely, evanescent plane waves, whose modal content covers high Fourier regimes, are able to provide both accurate and stable results. The developed numerical approach, which involves constructing evanescent plane wave approximation sets by sampling the parametric domain according to a probability density function, results in substantial improvements when compared to conventional propagative plane wave schemes. The following work extends this research to the three-dimensional setting, confirming the achieved results and introducing new ones. By generalizing the 3D Jacobi-Anger identity to complex-valued directions, we show that any Helmholtz solution in a ball can be represented as a continuous superposition of evanescent plane waves. This representation extends the classical Herglotz one and provides a relevant stability result that cannot be achieved with the use of propagative waves alone. The proposed numerical recipes have been tailored for the 3D setting and extended with new sampling strategies involving extremal systems of points. These methods are tested by numerical experiments, showing the desired accuracy and bounded-coefficient stability, in line with the two-dimensional case.
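
The building block here admits a quick numerical sanity check: a plane wave $e^{i k d \cdot x}$ satisfies $(\Delta + k^2)u = 0$ whenever $d \cdot d = 1$, and this remains true for complex-valued directions $d$, which yield evanescent waves. The 2D finite-difference check below is our simplification of the 3D setting.

```python
# Check that propagative and evanescent plane waves satisfy Helmholtz.
import numpy as np

k, h = 5.0, 1e-3
x, y = np.meshgrid(np.arange(-2, 3) * h, np.arange(-2, 3) * h, indexing="ij")

def residual(d):
    u = np.exp(1j * k * (d[0] * x + d[1] * y))
    # 5-point Laplacian at the grid center
    lap = (u[3, 2] + u[1, 2] + u[2, 3] + u[2, 1] - 4 * u[2, 2]) / h**2
    return abs(lap + k**2 * u[2, 2])

prop = np.array([0.6, 0.8])                     # real direction, |d| = 1
s = 1.3                                         # evanescence parameter
evan = np.array([np.cosh(s), 1j * np.sinh(s)])  # complex d with d.d = 1
print(residual(prop), residual(evan))           # both tiny: O(h^2) error
```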

Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity

  • Authors: Haoran Xu, Maha Elbayad, Kenton Murray, Jean Maillard, Vedanuj Goswami
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2305.02176
  • Pdf link: https://arxiv.org/pdf/2305.02176
  • Abstract
    Mixture-of-experts (MoE) models that employ sparse activation have demonstrated effectiveness in significantly increasing the number of parameters while maintaining low computational requirements per token. However, recent studies have established that MoE models are inherently parameter-inefficient, as the improvement in performance diminishes with an increasing number of experts. We hypothesize that this parameter inefficiency is a result of all experts having equal capacity, which may not adequately meet the varying complexity requirements of different tokens or tasks; e.g., in a multilingual setting, languages might require different capacities depending on their resource levels. In light of this, we propose Stratified Mixture of Experts (SMoE) models, which feature a stratified structure and can assign dynamic capacity to different tokens. We demonstrate the effectiveness of SMoE on two multilingual machine translation benchmarks, where it outperforms multiple state-of-the-art MoE models. On a diverse 15-language dataset, SMoE improves the translation quality over vanilla MoE by +0.93 BLEU points on average. Additionally, SMoE is parameter-efficient, matching vanilla MoE performance with around 50% fewer parameters.
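
A toy sketch of the stratified-capacity idea follows: experts with different hidden sizes behind a shared router. The hard argmax routing and all dimensions are our simplifications; the paper's SMoE and its training objective differ.

```python
# Experts of varying capacity; a router assigns each token to one expert.
import torch
import torch.nn as nn

class StratifiedMoE(nn.Module):  # hypothetical toy, not the paper's model
    def __init__(self, dim=32, hidden_sizes=(16, 64, 256)):  # small -> large
        super().__init__()
        self.router = nn.Linear(dim, len(hidden_sizes))
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, h), nn.ReLU(), nn.Linear(h, dim))
            for h in hidden_sizes
        )

    def forward(self, x):                    # x: (tokens, dim)
        # argmax routing is non-differentiable; real MoEs use soft gate scores
        choice = self.router(x).argmax(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = choice == i               # tokens assigned to expert i
            if mask.any():
                out[mask] = expert(x[mask])
        return out

tokens = torch.randn(10, 32)
print(StratifiedMoE()(tokens).shape)         # torch.Size([10, 32])
```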

Experiences with Remote Examination Formats in Light of GPT-4

  • Authors: Felix Dobslaw, Peter Bergh
  • Subjects: Computers and Society (cs.CY); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2305.02198
  • Pdf link: https://arxiv.org/pdf/2305.02198
  • Abstract
    Sudden access to the rapidly improving large language model GPT by OpenAI forces educational institutions worldwide to revisit their exam procedures. In the pre-GPT era, we successfully applied oral and open-book home exams for two courses in the third year of our predominantly remote Software Engineering BSc program. We ask in this paper whether our current open-book exams are still viable or whether a move back to a legally compliant but less scalable oral exam is the only workable alternative. We further compare work-effort estimates between oral and open-book exams and report on differences in throughput and grade distribution over eight years to better understand the impact of examination format on the outcome. Examining GPT-4 on the most recent open-book exams showed that our current Artificial Intelligence and Reactive Programming exams are not GPT-4 proof. Three potential weaknesses of GPT are outlined. We also found that grade distributions have largely been unaffected by the examination format, opening the door to a move to oral examinations if needed. Throughput was higher for open-book exam course instances (73% vs 64%), as were fail rates (12% vs 7%), with teacher workload increasing even for smaller classes. We also report on our experience regarding effort. Oral examinations are efficient for smaller groups but come with caveats regarding intensity and stress.

Stream Efficient Learning

  • Authors: Zhi-Hua Zhou
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.02217
  • Pdf link: https://arxiv.org/pdf/2305.02217
  • Abstract
    Data in many real-world applications are often accumulated over time, like a stream. In contrast to conventional machine learning studies that focus on learning from a given training data set, learning from data streams cannot ignore the fact that the incoming stream can be potentially endless, with overwhelming size and unknown changes, and it is impractical to assume sufficient computational/storage resources such that all received data can be handled in time. Thus, the generalization performance of learning from data streams depends not only on how much data has been received, but also on how much of it can be exploited in time, given resource and rapidity concerns, in addition to the ability of the learning algorithm and the complexity of the problem. For this purpose, in this article we introduce the notion of machine learning throughput, define Stream Efficient Learning, and present a preliminary theoretical framework.

LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning

  • Authors: Timothy Castiglia, Yi Zhou, Shiqiang Wang, Swanand Kadhe, Nathalie Baracaldo, Stacy Patterson
  • Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2305.02219
  • Pdf link: https://arxiv.org/pdf/2305.02219
  • Abstract
    We propose LESS-VFL, a communication-efficient feature selection method for distributed systems with vertically partitioned data. We consider a system of a server and several parties with local datasets that share a sample ID space but have different feature sets. The parties wish to collaboratively train a model for a prediction task. As part of the training, the parties wish to remove unimportant features in the system to improve generalization, efficiency, and explainability. In LESS-VFL, after a short pre-training period, the server optimizes its part of the global model to determine the relevant outputs from party models. This information is shared with the parties to then allow local feature selection without communication. We analytically prove that LESS-VFL removes spurious features from model training. We provide extensive empirical evidence that LESS-VFL can achieve high accuracy and remove spurious features at a fraction of the communication cost of other feature selection approaches.

Data Privacy with Homomorphic Encryption in Neural Networks Training and Inference

  • Authors: Ivone Amorim, Eva Maia, Pedro Barbosa, Isabel Praça
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.02225
  • Pdf link: https://arxiv.org/pdf/2305.02225
  • Abstract
    The use of Neural Networks (NNs) for sensitive data processing is becoming increasingly popular, raising concerns about data privacy and security. Homomorphic Encryption (HE) has the potential to be used as a solution to preserve data privacy in NNs. This study provides a comprehensive analysis of the use of HE for NN training and classification, focusing on the techniques and strategies used to enhance data privacy and security. The current state-of-the-art in HE for NNs is analysed, and the challenges and limitations that need to be addressed to make it a reliable and efficient approach for privacy preservation are identified. Also, the different categories of HE schemes and their suitability for NNs are discussed, as well as the techniques used to optimize the accuracy and efficiency of encrypted models. The review reveals that HE has the potential to provide strong data privacy guarantees for NNs, but several challenges need to be addressed, such as limited support for advanced NN operations, scalability issues, and performance trade-offs.
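
Because the review stays at survey level, here is a hedged toy of the additively homomorphic flavor of HE applied to one linear layer: a textbook Paillier scheme evaluating a dot product on encrypted inputs. The parameters are deliberately tiny and insecure; practical NN workloads typically use lattice-based schemes (e.g., CKKS) through hardened libraries.

```python
# Toy Paillier: E(a)*E(b) = E(a+b), E(a)^k = E(k*a), so <w, x> on ciphertexts.
import math, random

p, q = 1009, 1013                      # toy primes (insecure, demo only!)
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g = n + 1

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)    # decryption constant

def enc(m):
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:        # r must be a unit mod n
            return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (L(pow(c, lam, n2)) * mu) % n

# encrypted inner product <w, x>: only ciphertext ops on the encrypted side
w, x = [3, 1, 4], [2, 7, 1]
acc = enc(0)
for wi, ci in zip(w, (enc(xi) for xi in x)):
    acc = (acc * pow(ci, wi, n2)) % n2
print(dec(acc), sum(wi * xi for wi, xi in zip(w, x)))  # both 17
```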

Multi-dimensional Signal Recovery using Low-rank Deconvolution

  • Authors: David Reixach
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.02264
  • Pdf link: https://arxiv.org/pdf/2305.02264
  • Abstract
    In this work we present Low-rank Deconvolution, a powerful framework for low-level feature-map learning for efficient signal representation with application to signal recovery. Its formulation in multi-linear algebra inherits properties from convolutional sparse coding and low-rank approximation methods as in this setting signals are decomposed in a set of filters convolved with a set of low-rank tensors. We show its advantages by learning compressed video representations and solving image in-painting problems.

EFx Budget-Feasible Allocations with High Nash Welfare

  • Authors: Marius Garbea, Vasilis Gkatzelis, Xizhi Tan
  • Subjects: Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2305.02280
  • Pdf link: https://arxiv.org/pdf/2305.02280
  • Abstract
    We study the problem of allocating indivisible items to budget-constrained agents, aiming to provide fairness and efficiency guarantees. Specifically, our goal is to ensure that the resulting allocation is envy-free up to any item (EFx) while minimizing the amount of inefficiency that this needs to introduce. We first show that there exist two-agent problem instances for which no EFx allocation is Pareto efficient. We, therefore, turn to approximation and use the Nash social welfare maximizing allocation as a benchmark. For two-agent instances, we provide a procedure that always returns an EFx allocation while achieving the best possible approximation of the optimal Nash social welfare that EFx allocations can achieve. For the more complicated case of three-agent instances, we provide a procedure that guarantees EFx, while achieving a constant approximation of the optimal Nash social welfare for any number of items.
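
The EFx condition is compact enough to state as a checker, which the sketch below does for additive valuations: agent $i$ must value her own bundle at least as much as agent $j$'s bundle with any single item removed. Names and the toy instance are ours.

```python
# Checker for envy-freeness up to any item (EFx) under additive valuations.
def is_efx(values, bundles):
    """values[i][g]: agent i's value for item g; bundles[i]: list of item ids."""
    def v(i, items):
        return sum(values[i][g] for g in items)
    for i in range(len(bundles)):
        for j in range(len(bundles)):
            if i == j or not bundles[j]:
                continue
            for g in bundles[j]:  # envy must vanish after removing ANY item g
                if v(i, bundles[i]) < v(i, [h for h in bundles[j] if h != g]):
                    return False
    return True

vals = [[5, 3, 2], [4, 4, 1]]             # 2 agents, 3 items
print(is_efx(vals, [[0], [1, 2]]))        # True: no envy beyond any one item
```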

DynamicStereo: Consistent Dynamic Depth from Stereo Videos

  • Authors: Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.02296
  • Pdf link: https://arxiv.org/pdf/2305.02296
  • Abstract
    We consider the problem of reconstructing a dynamic scene observed from a stereo camera. Most existing methods for depth from stereo treat different stereo frames independently, leading to temporally inconsistent depth predictions. Temporal consistency is especially important for immersive AR or VR scenarios, where flickering greatly diminishes the user experience. We propose DynamicStereo, a novel transformer-based architecture to estimate disparity for stereo videos. The network learns to pool information from neighboring frames to improve the temporal consistency of its predictions. Our architecture is designed to process stereo videos efficiently through divided attention layers. We also introduce Dynamic Replica, a new benchmark dataset containing synthetic videos of people and animals in scanned environments, which provides complementary training and evaluation data for dynamic stereo closer to real applications than existing datasets. Training with this dataset further improves the quality of predictions of our proposed DynamicStereo as well as prior methods. Finally, it acts as a benchmark for consistent stereo methods.

Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime

  • Authors: Chuhan Zhang, Antoine Miech, Jiajun Shen, Jean-Baptiste Alayrac, Pauline Luc
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.02297
  • Pdf link: https://arxiv.org/pdf/2305.02297
  • Abstract
    Large-scale visual language models are widely used as pre-trained models and then adapted for various downstream tasks. While humans are known to efficiently learn new tasks from a few examples, deep learning models struggle with adaptation from few examples. In this work, we look into task adaptation in the low-data regime, and provide a thorough study of the existing adaptation methods for generative Visual Language Models. And we show important benefits of self-labelling, i.e. using the model's own predictions to self-improve when having access to a larger number of unlabelled images of the same distribution. Our study demonstrates significant gains using our proposed task adaptation pipeline across a wide range of visual language tasks such as visual classification (ImageNet), visual captioning (COCO), detailed visual captioning (Localised Narratives) and visual question answering (VQAv2).

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

  • Authors: Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.02301
  • Pdf link: https://arxiv.org/pdf/2305.02301
  • Abstract
    Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In response, researchers train smaller task-specific models by either finetuning with human labels or distilling using LLM-generated labels. However, finetuning and distillation require large amounts of training data to achieve performance comparable to LLMs. We introduce Distilling step-by-step, a new mechanism that (a) trains smaller models that outperform LLMs, and (b) does so while requiring less training data than finetuning or distillation. Our method extracts LLM rationales as additional supervision for small models within a multi-task training framework. We present three findings across 4 NLP benchmarks: First, compared to both finetuning and distillation, our mechanism achieves better performance with far fewer labeled/unlabeled training examples. Second, compared to LLMs, we achieve better performance using substantially smaller model sizes. Third, we reduce both the model size and the amount of data required to outperform LLMs; our 770M T5 model outperforms the 540B PaLM model using only 80% of the available data on a benchmark task.
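
Under our reading, the multi-task setup reduces to a weighted sum of two sequence losses on a shared student; the sketch below uses a dummy model and hypothetical task-prefix token ids purely to show the shape of the objective.

```python
# Shared student, two targets per input: task label and LLM-extracted rationale.
import torch
import torch.nn as nn

vocab, dim, B, T = 100, 32, 4, 8
student = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
ce = nn.CrossEntropyLoss()
alpha = 0.5                                        # rationale-task weight (ours)

x = torch.randint(2, vocab, (B, T))                # input token ids
lab_p, rat_p = 0, 1                                # hypothetical task-prefix ids
x_lab = torch.cat([torch.full((B, 1), lab_p), x], dim=1)
x_rat = torch.cat([torch.full((B, 1), rat_p), x], dim=1)

y_lab = torch.randint(0, vocab, (B, T + 1))        # label target tokens (dummy)
y_rat = torch.randint(0, vocab, (B, T + 1))        # rationale target tokens (dummy)

loss = ce(student(x_lab).flatten(0, 1), y_lab.flatten()) \
     + alpha * ce(student(x_rat).flatten(0, 1), y_rat.flatten())
loss.backward()
print(float(loss))
```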

CodeGen2: Lessons for Training LLMs on Programming and Natural Languages

  • Authors: Erik Nijkamp, Hiroaki Hayashi, Caiming Xiong, Silvio Savarese, Yingbo Zhou
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.02309
  • Pdf link: https://arxiv.org/pdf/2305.02309
  • Abstract
    Large language models (LLMs) have demonstrated remarkable abilities in representation learning for program synthesis and understanding tasks. The quality of the learned representations appears to be dictated by the neural scaling laws as a function of the number of model parameters and observations, while imposing upper bounds on the model performance by the amount of available data and compute, which is costly. In this study, we attempt to render the training of LLMs for program synthesis more efficient by unifying four key components: (1) model architectures, (2) learning methods, (3) infill sampling, and (4) data distributions. Specifically, for the model architecture, we attempt to unify encoder and decoder-based models into a single prefix-LM. For learning methods, (i) causal language modeling, (ii) span corruption, and (iii) infilling are unified into a simple learning algorithm. For infill sampling, we explore the claim of a "free lunch" hypothesis. For data distributions, the effect of a mixture distribution of programming and natural languages on model performance is explored. We conduct a comprehensive series of empirical experiments on 1B LLMs, for which failures and successes of this exploration are distilled into four lessons. We will provide a final recipe for training and release CodeGen2 models in sizes of 1B, 3.7B, 7B, and 16B parameters, along with the training framework as open-source: https://github.com/salesforce/CodeGen2.
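
Span corruption, one of the learning methods being unified, has a simple data-construction core; the sketch below builds one single-span infilling example with sentinel strings of our choosing (real pipelines use tokenizer sentinels and multiple spans).

```python
# Build one span-corruption / infilling training example from a token list.
import random

def make_infill_example(tokens, max_span=4):
    i = random.randrange(0, len(tokens) - 1)
    j = min(len(tokens), i + random.randint(1, max_span))
    corrupted = tokens[:i] + ["<mask_0>"] + tokens[j:]   # span replaced by sentinel
    target = ["<mask_0>"] + tokens[i:j] + ["<eos>"]      # model learns to emit this
    return corrupted + ["<sep>"] + target

code = "def add ( a , b ) : return a + b".split()
random.seed(7)
print(" ".join(make_infill_example(code)))
```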

AG3D: Learning to Generate 3D Avatars from 2D Image Collections

  • Authors: Zijian Dong, Xu Chen, Jinlong Yang, Michael J. Black, Otmar Hilliges, Andreas Geiger
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.02312
  • Pdf link: https://arxiv.org/pdf/2305.02312
  • Abstract
    While progress in 2D generative models of human appearance has been rapid, many applications require 3D avatars that can be animated and rendered. Unfortunately, most existing methods for learning generative models of 3D humans with diverse shape and appearance require 3D training data, which is limited and expensive to acquire. The key to progress is hence to learn generative models of 3D avatars from abundant unstructured 2D image collections. However, learning realistic and complete 3D appearance and geometry in this under-constrained setting remains challenging, especially in the presence of loose clothing such as dresses. In this paper, we propose a new adversarial generative model of realistic 3D people from 2D images. Our method captures shape and deformation of the body and loose clothing by adopting a holistic 3D generator and integrating an efficient and flexible articulation module. To improve realism, we train our model using multiple discriminators while also integrating geometric cues in the form of predicted 2D normal maps. We experimentally find that our method outperforms previous 3D- and articulation-aware methods in terms of geometry and appearance. We validate the effectiveness of our model and the importance of each component via systematic ablation studies.

Keyword: faster

Fast Deterministic Gathering with Detection on Arbitrary Graphs: The Power of Many Robots

  • Authors: Anisur Rahaman Molla, Kaushik Mondal, William K. Moses Jr
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2305.01753
  • Pdf link: https://arxiv.org/pdf/2305.01753
  • Abstract
    Over the years, much research involving mobile computational entities has been performed. From modeling actual microscopic (and smaller) robots, to modeling software processes on a network, many important problems have been studied in this context. Gathering is one such fundamental problem in this area. The problem of gathering $k$ robots, initially arbitrarily placed on the nodes of an $n$-node graph, asks that these robots coordinate and communicate in a local manner, as opposed to global, to move around the graph, find each other, and settle down on a single node as fast as possible. A more difficult problem to solve is gathering with detection, where once the robots gather, they must subsequently realize that gathering has occurred and then terminate. In this paper, we propose a deterministic approach to solve gathering with detection for any arbitrary connected graph that is faster than existing deterministic solutions for even just gathering (without the requirement of detection) for arbitrary graphs. In contrast to earlier work on gathering, it leverages the fact that there are more robots present in the system to achieve gathering with detection faster than those previous papers that focused on just gathering. The state of the art solution for deterministic gathering [Ta-Shma and Zwick, TALG, 2014] takes $\tilde{O}(n^5 \log \ell)$ rounds, where $\ell$ is the smallest label among robots and $\tilde{O}$ hides a polylog factor. We design a deterministic algorithm for gathering with detection with the following trade-offs depending on how many robots are present: (i) when $k \geq \lfloor n/2 \rfloor + 1$, the algorithm takes $O(n^3)$ rounds, (ii) when $k \geq \lfloor n/3 \rfloor + 1$, the algorithm takes $O(n^4 \log n)$ rounds, and (iii) otherwise, the algorithm takes $\tilde{O}(n^5)$ rounds. The algorithm is not required to know $k$, but only $n$.

A Lightweight CNN-Transformer Model for Learning Traveling Salesman Problems

  • Authors: Minseop Jung, Jaeseung Lee, Jibum Kim
  • Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG)
  • Arxiv link: https://arxiv.org/abs/2305.01883
  • Pdf link: https://arxiv.org/pdf/2305.01883
  • Abstract
    Transformer-based models show state-of-the-art performance even for large-scale Traveling Salesman Problems (TSPs). However, they are based on fully-connected attention models and suffer from large computational complexity and GPU memory usage. We propose a lightweight CNN-Transformer model based on a CNN embedding layer and partial self-attention. Our CNN-Transformer model is able to better learn spatial features from input data using a CNN embedding layer compared with the standard Transformer models. It also removes considerable redundancy in fully-connected attention models via the proposed partial self-attention. Experiments show that the proposed model outperforms other state-of-the-art Transformer-based models in terms of TSP solution quality, GPU memory usage, and inference time. Our model uses approximately 20% less GPU memory and has a 45% faster inference time compared with other state-of-the-art Transformer-based models. Our code is publicly available at https://github.com/cm8908/CNN_Transformer3

Approximate Evaluation of Quantitative Second Order Queries

  • Authors: Jan Dreier, Robert Ganian, Thekla Hamm
  • Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2305.02056
  • Pdf link: https://arxiv.org/pdf/2305.02056
  • Abstract
    Courcelle's theorem and its adaptations to cliquewidth have shaped the field of exact parameterized algorithms and are widely considered the archetype of algorithmic meta-theorems. In the past decade, there has been growing interest in developing parameterized approximation algorithms for problems which are not captured by Courcelle's theorem and, in particular, are considered not fixed-parameter tractable under the associated widths. We develop a generalization of Courcelle's theorem that yields efficient approximation schemes for any problem that can be captured by an expanded logic we call Blocked CMSO, capable of making logical statements about the sizes of set variables via so-called weight comparisons. The logic controls weight comparisons via the quantifier-alternation depth of the involved variables, allowing full comparisons for zero-alternation variables and limited comparisons for one-alternation variables. We show that the developed framework threads the very needle of tractability: on one hand it can describe a broad range of approximable problems, while on the other hand we show that the restrictions of our logic cannot be relaxed under well-established complexity assumptions. The running time of our approximation scheme is polynomial in $1/\varepsilon$, allowing us to fully interpolate between faster approximate algorithms and slower exact algorithms. This provides a unified framework to explain the tractability landscape of graph problems parameterized by treewidth and cliquewidth, as well as classical non-graph problems such as Subset Sum and Knapsack.

Removing Human Bottlenecks in Bird Classification Using Camera Trap Images and Deep Learning

  • Authors: Carl Chalmers, Paul Fergus, Serge Wich, Steven N Longmore, Naomi Davies Walsh, Philip Stephens, Chris Sutherland, Naomi Matthews, Jens Mudde, Amira Nuseibeh
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.02097
  • Pdf link: https://arxiv.org/pdf/2305.02097
  • Abstract
    Birds are important indicators for monitoring both biodiversity and habitat health; they also play a crucial role in ecosystem management. Decline in bird populations can result in reduced ecosystem services, including seed dispersal, pollination and pest control. Accurate and long-term monitoring of birds to identify species of concern while measuring the success of conservation interventions is essential for ecologists. However, monitoring is time consuming, costly and often difficult to manage over long durations and at meaningfully large spatial scales. Technology such as camera traps, acoustic monitors and drones provide methods for non-invasive monitoring. There are two main problems with using camera traps for monitoring: a) cameras generate many images, making it difficult to process and analyse the data in a timely manner; and b) the high proportion of false positives hinders the processing and analysis for reporting. In this paper, we outline an approach for overcoming these issues by utilising deep learning for real-time classification of bird species and automated removal of false positives in camera trap data. Images are classified in real-time using a Faster-RCNN architecture. Images are transmitted over 3/4G cameras and processed using Graphical Processing Units (GPUs) to provide conservationists with key detection metrics, thereby removing the requirement for manual observations. Our models achieved an average sensitivity of 88.79%, a specificity of 98.16% and accuracy of 96.71%. This demonstrates the effectiveness of using deep learning for automatic bird monitoring.
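
The detection step maps onto torchvision's stock Faster R-CNN; the hedged sketch below runs the generic COCO-pretrained model on a stand-in frame, since the study's bird-specific weights and classes are not bundled with this digest.

```python
# Faster R-CNN inference with torchvision's COCO-pretrained detector.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = torch.rand(3, 480, 640)                 # stand-in for a camera-trap frame
with torch.no_grad():
    out = model([img])[0]                     # dict of boxes, labels, scores

keep = out["scores"] > 0.8                    # drop low-confidence detections
print(out["boxes"][keep], out["labels"][keep])
```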

Keyword: mobile

Probabilistic Formal Modelling to Uncover and Interpret Interaction Styles

  • Authors: Oana Andrei, Muffy Calder, Matthew Chalmers, Alistair Morrison
  • Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2305.01656
  • Pdf link: https://arxiv.org/pdf/2305.01656
  • Abstract
    We present a study using new computational methods, based on a novel combination of machine learning for inferring admixture hidden Markov models and probabilistic model checking, to uncover interaction styles in a mobile app. These styles are then used to inform a redesign, which is implemented, deployed, and then analysed using the same methods. The data sets are logged user traces, collected over two six-month deployments of each version, involving thousands of users and segmented into different time intervals. The methods do not assume tasks or absolute metrics such as measures of engagement, but uncover the styles through unsupervised inference of clusters and analysis with probabilistic temporal logic. For both versions there was a clear distinction between the styles adopted by users during the first day/week/month of usage, and during the second and third months, a result we had not anticipated.

Computer-Vision Based Real Time Waypoint Generation for Autonomous Vineyard Navigation with Quadruped Robots

  • Authors: Lee Milburn, Juan Gamba, Miguel Fernandes, Claudio Semini
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.01700
  • Pdf link: https://arxiv.org/pdf/2305.01700
  • Abstract
    The VINUM project seeks to address the shortage of skilled labor in modern vineyards by introducing a cutting-edge mobile robotic solution. Leveraging the capabilities of the quadruped robot, HyQReal, this system, equipped with arm and vision sensors, offers autonomous navigation and winter pruning of grapevines reducing the need for human intervention. At the heart of this approach lies an architecture that empowers the robot to easily navigate vineyards, identify grapevines with unparalleled accuracy, and approach them for pruning with precision. A state machine drives the process, deftly switching between various stages to ensure seamless and efficient task completion. The system's performance was assessed through experimentation, focusing on waypoint precision and optimizing the robot's workspace for single-plant operations. Results indicate that the architecture is highly reliable, with a mean error of 21.5cm and a standard deviation of 17.6cm for HyQReal. However, improvements in grapevine detection accuracy are necessary for optimal performance. This work is based on a computer-vision-based navigation method for quadruped robots in vineyards, opening up new possibilities for selective task automation. The system's architecture works well in ideal weather conditions, generating and arriving at precise waypoints that maximize the attached robotic arm's workspace. This work is an extension of our short paper presented at the Italian Conference on Robotics and Intelligent Machines (I-RIM).

Fast Deterministic Gathering with Detection on Arbitrary Graphs: The Power of Many Robots

  • Authors: Anisur Rahaman Molla, Kaushik Mondal, William K. Moses Jr
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2305.01753
  • Pdf link: https://arxiv.org/pdf/2305.01753
  • Abstract
    Over the years, much research involving mobile computational entities has been performed. From modeling actual microscopic (and smaller) robots, to modeling software processes on a network, many important problems have been studied in this context. Gathering is one such fundamental problem in this area. The problem of gathering $k$ robots, initially arbitrarily placed on the nodes of an $n$-node graph, asks that these robots coordinate and communicate in a local manner, as opposed to global, to move around the graph, find each other, and settle down on a single node as fast as possible. A more difficult problem to solve is gathering with detection, where once the robots gather, they must subsequently realize that gathering has occurred and then terminate. In this paper, we propose a deterministic approach to solve gathering with detection for any arbitrary connected graph that is faster than existing deterministic solutions for even just gathering (without the requirement of detection) for arbitrary graphs. In contrast to earlier work on gathering, it leverages the fact that there are more robots present in the system to achieve gathering with detection faster than those previous papers that focused on just gathering. The state of the art solution for deterministic gathering [Ta-Shma and Zwick, TALG, 2014] takes $\tilde{O}(n^5 \log \ell)$ rounds, where $\ell$ is the smallest label among robots and $\tilde{O}$ hides a polylog factor. We design a deterministic algorithm for gathering with detection with the following trade-offs depending on how many robots are present: (i) when $k \geq \lfloor n/2 \rfloor + 1$, the algorithm takes $O(n^3)$ rounds, (ii) when $k \geq \lfloor n/3 \rfloor + 1$, the algorithm takes $O(n^4 \log n)$ rounds, and (iii) otherwise, the algorithm takes $\tilde{O}(n^5)$ rounds. The algorithm is not required to know $k$, but only $n$.

A Vision Transformer Approach for Efficient Near-Field Irregular SAR Super-Resolution

  • Authors: Josiah Smith, Yusef Alimam, Geetika Vedula
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.02074
  • Pdf link: https://arxiv.org/pdf/2305.02074
  • Abstract
    In this paper, we develop a novel super-resolution algorithm for near-field synthetic-aperture radar (SAR) under irregular scanning geometries. As fifth-generation (5G) millimeter-wave (mmWave) devices are becoming increasingly affordable and available, high-resolution SAR imaging is feasible for end-user applications and non-laboratory environments. Emerging applications such as freehand imaging, wherein a handheld radar is scanned throughout space by a user, unmanned aerial vehicle (UAV) imaging, and automotive SAR face several unique challenges for high-resolution imaging. First, recovering a SAR image requires knowledge of the array positions throughout the scan. While recent work has introduced camera-based positioning systems capable of adequately estimating the position, performing the recovery efficiently is a requirement for enabling edge and Internet of Things (IoT) technologies. Efficient algorithms for non-cooperative near-field SAR sampling have been explored in recent work, but they suffer from image defocusing under position estimation error and can only produce medium-fidelity images. In this paper, we introduce a mobile-friendly vision transformer (ViT) architecture to address position estimation error and perform SAR image super-resolution (SR) under irregular sampling geometries. The proposed algorithm, Mobile-SRViT, is the first to employ a ViT approach for SAR image enhancement and is validated in simulation and via empirical studies.

Efficient CNN-based Super Resolution Algorithms for mmWave Mobile Radar Imaging

  • Authors: Christos Vasileiou, Josiah W. Smith, Shiva Thiagarajan, Matthew Nigh, Yiorgos Makris, Murat Torlak
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.02092
  • Pdf link: https://arxiv.org/pdf/2305.02092
  • Abstract
    In this paper, we introduce an innovative super resolution approach to emerging modes of near-field synthetic aperture radar (SAR) imaging. Recent research extends convolutional neural network (CNN) architectures from the optical to the electromagnetic domain to achieve super resolution on images generated from radar signaling. Specifically, near-field synthetic aperture radar (SAR) imaging, a method for generating high-resolution images by scanning a radar across space to create a synthetic aperture, is of interest due to its high-fidelity spatial sensing capability, low cost devices, and large application space. Since SAR imaging requires large aperture sizes to achieve high resolution, super-resolution algorithms are valuable for many applications. Freehand smartphone SAR, an emerging sensing modality, requires irregular SAR apertures in the near-field and computation on mobile devices. Achieving efficient high-resolution SAR images from irregularly sampled data collected by freehand motion of a smartphone is a challenging task. In this paper, we propose a novel CNN architecture to achieve SAR image super-resolution for mobile applications by employing state-of-the-art SAR processing and deep learning techniques. The proposed algorithm is verified via simulation and an empirical study. Our algorithm demonstrates high-efficiency and high-resolution radar imaging for near-field scenarios with irregular scanning geometries.

Distributed Leader Follower Formation Control of Mobile Robots based on Bioinspired Neural Dynamics and Adaptive Sliding Innovation Filter

  • Authors: Zhe Xu, Tao Yan, Simon X. Yang, S. Andrew Gadsden
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.02288
  • Pdf link: https://arxiv.org/pdf/2305.02288
  • Abstract
    This paper investigates the distributed leader-follower formation control problem for multiple differentially driven mobile robots. A distributed estimator is first introduced, and it only requires the state information from each follower itself and its neighbors. Then, we propose a hybrid formation control method based on bioinspired neural dynamics, backstepping, and sliding mode control, with proof of its stability. The proposed control strategy resolves the impractical speed-jump issue that exists in the conventional backstepping design. Additionally, considering system and measurement noises, the proposed control strategy not only removes the chattering issue of conventional sliding mode control but also provides smooth control input with extra robustness. After that, an adaptive sliding innovation filter is integrated with the proposed control to provide accurate state estimates that are robust to modeling uncertainties. Finally, we performed multiple simulations to demonstrate the efficiency and effectiveness of the proposed formation control strategy.

Keyword: pruning

Computer-Vision Based Real Time Waypoint Generation for Autonomous Vineyard Navigation with Quadruped Robots

  • Authors: Lee Milburn, Juan Gamba, Miguel Fernandes, Claudio Semini
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.01700
  • Pdf link: https://arxiv.org/pdf/2305.01700
  • Abstract
    The VINUM project seeks to address the shortage of skilled labor in modern vineyards by introducing a cutting-edge mobile robotic solution. Leveraging the capabilities of the quadruped robot, HyQReal, this system, equipped with arm and vision sensors, offers autonomous navigation and winter pruning of grapevines reducing the need for human intervention. At the heart of this approach lies an architecture that empowers the robot to easily navigate vineyards, identify grapevines with unparalleled accuracy, and approach them for pruning with precision. A state machine drives the process, deftly switching between various stages to ensure seamless and efficient task completion. The system's performance was assessed through experimentation, focusing on waypoint precision and optimizing the robot's workspace for single-plant operations. Results indicate that the architecture is highly reliable, with a mean error of 21.5cm and a standard deviation of 17.6cm for HyQReal. However, improvements in grapevine detection accuracy are necessary for optimal performance. This work is based on a computer-vision-based navigation method for quadruped robots in vineyards, opening up new possibilities for selective task automation. The system's architecture works well in ideal weather conditions, generating and arriving at precise waypoints that maximize the attached robotic arm's workspace. This work is an extension of our short paper presented at the Italian Conference on Robotics and Intelligent Machines (I-RIM).

Bicubic++: Slim, Slimmer, Slimmest -- Designing an Industry-Grade Super-Resolution Network

  • Authors: Bahri Batuhan Bilecen, Mustafa Ayazoglu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2305.02126
  • Pdf link: https://arxiv.org/pdf/2305.02126
  • Abstract
    We propose a real-time and lightweight single-image super-resolution (SR) network named Bicubic++. Despite using the spatial dimensions of the input image across the whole network, Bicubic++ first learns quick, reversible, downgraded lower-resolution features of the image in order to decrease the number of computations. We also construct a training pipeline in which we apply an end-to-end global structured pruning of convolutional layers without using metrics like magnitude and gradient norms, and focus on optimizing the pruned network's PSNR on the validation set. Furthermore, we have experimentally shown that the bias terms take a considerable amount of the runtime while increasing PSNR only marginally, hence we have also applied bias removal to the convolutional layers. Our method adds ~1dB on Bicubic upscaling PSNR for all tested SR datasets and runs with ~1.17ms on RTX3090 and ~2.9ms on RTX3070, for 720p inputs and 4K outputs, both in FP16 precision. Bicubic++ won the NTIRE 2023 RTSR Track 2 (x3 SR) competition and is the fastest among all competitive methods. Being almost as fast as the standard Bicubic upsampling method, we believe that Bicubic++ can set a new industry standard.

Rethinking Graph Lottery Tickets: Graph Sparsity Matters

  • Authors: Bo Hui, Da Yan, Xiaolong Ma, Wei-Shinn Ku
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.02190
  • Pdf link: https://arxiv.org/pdf/2305.02190
  • Abstract
    Lottery Ticket Hypothesis (LTH) claims the existence of a winning ticket (i.e., a properly pruned sub-network together with original weight initialization) that can achieve competitive performance to the original dense network. A recent work, called UGS, extended LTH to prune graph neural networks (GNNs) for effectively accelerating GNN inference. UGS simultaneously prunes the graph adjacency matrix and the model weights using the same masking mechanism, but since the roles of the graph adjacency matrix and the weight matrices are very different, we find that their sparsifications lead to different performance characteristics. Specifically, we find that the performance of a sparsified GNN degrades significantly when the graph sparsity goes beyond a certain extent. Therefore, we propose two techniques to improve GNN performance when the graph sparsity is high. First, UGS prunes the adjacency matrix using a loss formulation which, however, does not properly involve all elements of the adjacency matrix; in contrast, we add a new auxiliary loss head to better guide the edge pruning by involving the entire adjacency matrix. Second, by regarding unfavorable graph sparsification as adversarial data perturbations, we formulate the pruning process as a min-max optimization problem to gain the robustness of lottery tickets when the graph sparsity is high. We further investigate the question: Can the "retrainable" winning ticket of a GNN be also effective for graph transfer learning? We call it the transferable graph lottery ticket (GLT) hypothesis. Extensive experiments demonstrate the superiority of our proposed sparsification method over UGS and empirically verify our transferable GLT hypothesis.
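
A toy rendering of the two masks involved: a graph lottery ticket jointly sparsifies the adjacency matrix and the GCN weights. Magnitude masking is used below only for brevity; the paper's point is precisely that graph sparsity and weight sparsity should not be treated identically.

```python
# Joint graph + weight masks applied to one toy GCN layer.
import torch

n, d = 6, 8
A = torch.rand(n, n); A = (A + A.T) / 2      # toy symmetric weighted adjacency
X = torch.randn(n, d)                        # node features
W = torch.randn(d, d)                        # layer weights

def sparsify(M, keep_frac):
    thresh = M.abs().flatten().quantile(1 - keep_frac)
    return (M.abs() >= thresh).float()

m_g = sparsify(A, keep_frac=0.7)             # graph mask: keep 70% of edges
m_w = sparsify(W, keep_frac=0.3)             # weight mask: keep 30% of weights

H = torch.relu((A * m_g) @ X @ (W * m_w))    # one masked GCN layer
print((m_g.sum() / A.numel()).item(), (m_w.sum() / W.numel()).item())
```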

Keyword: voxel

There is no result

Keyword: lidar

Direct LiDAR-Inertial Odometry and Mapping: Perceptive and Connective SLAM

  • Authors: Kenny Chen, Ryan Nemiroff, Brett T. Lopez
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.01843
  • Pdf link: https://arxiv.org/pdf/2305.01843
  • Abstract
    This paper presents Direct LiDAR-Inertial Odometry and Mapping (DLIOM), a robust SLAM algorithm with an explicit focus on computational efficiency, operational reliability, and real-world efficacy. DLIOM contains several key algorithmic innovations in both the front-end and back-end subsystems to design a resilient LiDAR-inertial architecture that is perceptive to the environment and produces accurate localization and high-fidelity 3D mapping for autonomous robotic platforms. Our ideas spawned after a deep investigation into modern LiDAR SLAM systems and their inabilities to generalize across different operating environments, in which we address several common algorithmic failure points by means of proactive safe-guards to provide long-term operational reliability in the unstructured real world. We detail several important innovations to localization accuracy and mapping resiliency distributed throughout a typical LiDAR SLAM pipeline to comprehensively increase algorithmic speed, accuracy, and robustness. In addition, we discuss insights gained from our ground-up approach while implementing such a complex system for real-time state estimation on resource-constrained systems, and we experimentally show the increased performance of our method as compared to the current state-of-the-art on both public benchmark and self-collected datasets.

On procedural urban digital twin generation and visualization of large scale data

  • Authors: Sanjay Somanath, Vasilis Naserentin, Orfeas Eleftheriou, Daniel Sjölie, Beata Stahre Wästberg, Anders Logg
  • Subjects: Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2305.02242
  • Pdf link: https://arxiv.org/pdf/2305.02242
  • Abstract
    The desired outcome for urban digital twins is an automatically generated detailed 3D model of a building from aerial imagery, footprints, LiDAR, or a fusion of these. Such 3D models have applications in architecture, civil engineering, urban planning, construction, real estate, GIS, and many others. Further, the visualization of large-scale data in conjunction with the generated 3D models is often a recurring and resource-intensive task. However, a completely automated end-to-end workflow is complex, requiring many steps to achieve a high-quality visualization. Methods for building reconstruction approaches have come a long way from previously manual approaches to semi-automatic or automatic approaches. The next step after reconstructing buildings is visualizing the buildings and their context. Advances in real-time rendering using game engines have enabled the extension of building reconstruction methods to procedurally generated context generation. This paper aims to complement existing methods of 3D building generation. First, we present a literature review covering different options for procedurally generated context generation and visualization methods in-depth, focusing on workflows and data pipelines. Next, we present a semi-automated workflow that extends the building reconstruction pipeline to include procedural context generation (terrain and vegetation) using Unreal Engine and, finally, the integration of various types of large-scale urban analysis data for visualization. We conclude with a series of challenges faced in achieving such pipelines and the limitations of the current approach. The steps for a complete, end-to-end solution involve developing robust systems for building detection, rooftop recognition, and geometry generation and importing and visualizing data in the same 3D environment.

Keyword: diffusion

DiffuSum: Generation Enhanced Extractive Summarization with Diffusion

  • Authors: Haopeng Zhang, Xiao Liu, Jiawei Zhang
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2305.01735
  • Pdf link: https://arxiv.org/pdf/2305.01735
  • Abstract
    Extractive summarization aims to form a summary by directly extracting sentences from the source document. Existing works mostly formulate it as a sequence labeling problem by making individual sentence label predictions. This paper proposes DiffuSum, a novel paradigm for extractive summarization, by directly generating the desired summary sentence representations with diffusion models and extracting sentences based on sentence representation matching. In addition, DiffuSum jointly optimizes a contrastive sentence encoder with a matching loss for sentence representation alignment and a multi-class contrastive loss for representation diversity. Experimental results show that DiffuSum achieves new state-of-the-art extractive results on CNN/DailyMail with ROUGE scores of 44.83/22.56/40.56. Experiments on the other two datasets with different summary lengths also demonstrate the effectiveness of DiffuSum. The strong performance of our framework shows the great potential of adapting generative models for extractive summarization.
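
As a hedged illustration of the matching step only (the diffusion-based representation generator and the contrastive encoder are out of scope here, and all array names are hypothetical), extraction by representation matching might look like:

```python
# Minimal sketch: extract the document sentences whose embeddings best match
# the generated summary-sentence representations (cosine similarity).
import numpy as np

def extract_by_matching(doc_embs, gen_embs):
    doc = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    gen = gen_embs / np.linalg.norm(gen_embs, axis=1, keepdims=True)
    sim = gen @ doc.T                        # (num_generated, num_sentences)
    return sorted(set(sim.argmax(axis=1)))   # indices of extracted sentences

rng = np.random.default_rng(0)
doc_embs = rng.normal(size=(12, 64))         # stand-in for encoded sentences
gen_embs = doc_embs[[2, 7, 9]] + 0.1 * rng.normal(size=(3, 64))
print(extract_by_matching(doc_embs, gen_embs))   # likely [2, 7, 9]
```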

Multimodal Procedural Planning via Dual Text-Image Prompting

  • Authors: Yujie Lu, Pan Lu, Zhiyu Chen, Wanrong Zhu, Xin Eric Wang, William Yang Wang
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2305.01795
  • Pdf link: https://arxiv.org/pdf/2305.01795
  • Abstract
    Embodied agents have achieved prominent performance in following human instructions to complete tasks. However, the potential of providing instructions informed by texts and images to assist humans in completing tasks remains underexplored. To uncover this capability, we present the multimodal procedural planning (MPP) task, in which models are given a high-level goal and generate plans of paired text-image steps, providing more complementary and informative guidance than unimodal plans. The key challenges of MPP are to ensure the informativeness, temporal coherence, and accuracy of plans across modalities. To tackle this, we propose Text-Image Prompting (TIP), a dual-modality prompting method that jointly leverages the zero-shot reasoning ability of large language models (LLMs) and the compelling text-to-image generation ability of diffusion-based models. TIP improves the interaction in the dual modalities using a Text-to-Image Bridge and an Image-to-Text Bridge, allowing LLMs to guide the textually-grounded image plan generation and leveraging the descriptions of image plans to ground the textual plan in reverse. To address the lack of relevant datasets, we collect WIKIPLAN and RECIPEPLAN as a testbed for MPP. Our results show compelling human preferences and automatic scores against unimodal and multimodal baselines on WIKIPLAN and RECIPEPLAN in terms of informativeness, temporal coherence, and plan accuracy. Our code and data: https://github.com/YujieLu10/MPP.

Unpaired Downscaling of Fluid Flows with Diffusion Bridges

  • Authors: Tobias Bischoff, Katherine Deck
  • Subjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn); Geophysics (physics.geo-ph)
  • Arxiv link: https://arxiv.org/abs/2305.01822
  • Pdf link: https://arxiv.org/pdf/2305.01822
  • Abstract
    We present a method to downscale idealized geophysical fluid simulations using generative models based on diffusion maps. By analyzing the Fourier spectra of images drawn from different data distributions, we show how one can chain together two independent conditional diffusion models for use in domain translation. The resulting transformation is a diffusion bridge between a low resolution and a high resolution dataset and allows for new sample generation of high-resolution images given specific low resolution features. The ability to generate new samples allows for the computation of any statistic of interest, without any additional calibration or training. Our unsupervised setup is also designed to downscale images without access to paired training data; this flexibility allows for the combination of multiple source and target domains without additional training. We demonstrate that the method enhances resolution and corrects context-dependent biases in geophysical fluid simulations, including in extreme events. We anticipate that the same method can be used to downscale the output of climate simulations, including temperature and precipitation fields, without needing to train a new model for each application, providing significant computational cost savings.
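
The spectral comparison underlying this kind of downscaling can be illustrated with a radially averaged power spectrum; this is a generic sketch, not the authors' code, and the random field below stands in for simulation output:

```python
# Sketch: radially averaged Fourier power spectrum used to compare
# low- and high-resolution data distributions.
import numpy as np

def radial_power_spectrum(field: np.ndarray) -> np.ndarray:
    """Radially averaged power spectrum of a square 2D field."""
    n = field.shape[0]
    power = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    # Radial wavenumber of each pixel, measured from the spectrum center.
    ky, kx = np.indices(power.shape) - n // 2
    k = np.hypot(kx, ky).astype(int)
    # Average power over annuli of constant |k|.
    sums = np.bincount(k.ravel(), weights=power.ravel())
    counts = np.bincount(k.ravel())
    return sums / np.maximum(counts, 1)

rng = np.random.default_rng(0)
low_res_field = rng.normal(size=(64, 64))    # stand-in for a simulated field
print(radial_power_spectrum(low_res_field)[:8])
```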

Multimodal Data Augmentation for Image Captioning using Diffusion Models

  • Authors: Changrong Xiao, Sean Xin Xu, Kunpeng Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.01855
  • Pdf link: https://arxiv.org/pdf/2305.01855
  • Abstract
    Image captioning, an important vision-language task, often requires a tremendous number of finely labeled image-caption pairs for learning the underlying alignment between images and texts. In this paper, we proposed a multimodal data augmentation method, leveraging a recent text-to-image model called Stable Diffusion, to expand the training set via high-quality generation of image-caption pairs. Extensive experiments on the MS COCO dataset demonstrate the advantages of our approach over several benchmark methods, and particularly a significant boost when having fewer training instances. In addition, models trained on our augmented datasets also outperform prior unpaired image captioning methods by a large margin. Finally, further improvement regarding the training efficiency and effectiveness can be obtained after intentionally filtering the generated data based on quality assessment.

The Impacts of Dimensionality, Diffusion, and Directedness on Intrinsic Cross-Model Simulation in Tile-Based Self-Assembly

  • Authors: Daniel Hader, Matthew J. Patitz
  • Subjects: Computational Geometry (cs.CG); Emerging Technologies (cs.ET)
  • Arxiv link: https://arxiv.org/abs/2305.01877
  • Pdf link: https://arxiv.org/pdf/2305.01877
  • Abstract
    Algorithmic self-assembly occurs when disorganized components autonomously combine to form structures and, by their design and the dynamics of the system, are forced to follow the execution of algorithms. Motivated by applications in DNA-nanotechnology, investigations in algorithmic tile-based self-assembly have blossomed into a mature theory with research leveraging tools from computability theory, complexity theory, information theory, and graph theory to develop a wide range of models and show that many are computationally universal, while also exposing powers and limitations of each. Beyond computational universality, the abstract Tile Assembly Model (aTAM) was shown to be intrinsically universal (IU), a strong notion of completeness where a single tile set is capable of simulating all systems within the model; however, this result required non-deterministic tile attachments. This was later confirmed necessary when it was shown that the class of directed aTAM systems is not IU. Building on these results to further investigate the impacts of other dynamics, Hader et al. examined several tile-assembly models which varied across (1) the numbers of dimensions used, (2) restrictions based on diffusion of tiles through space, and (3) whether each system is directed, and showed which models are IU. Such results have shed much light on the roles of various aspects of the dynamics of tile-assembly and their effects on the intrinsic universality of each model. Here we provide direct comparisons of the various models by considering intrinsic simulations between models. We show that in some cases one model is more powerful than another, and in others, pairs of models have mutually exclusive capabilities. This comparison helps to expose the impacts of these three important aspects and further helps define a hierarchy of tile-assembly models.

DiffFacto: Controllable Part-Based 3D Point Cloud Generation with Cross Diffusion

  • Authors: Kiyohiro Nakayama, Mikaela Angelina Uy, Jiahui Huang, Shi-Min Hu, Ke Li, Leonidas J Guibas
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.01921
  • Pdf link: https://arxiv.org/pdf/2305.01921
  • Abstract
    While the community of 3D point cloud generation has witnessed significant growth in recent years, there is still no effective way to enable intuitive user control in the generation process, hence limiting the general utility of such methods. Since an intuitive way of decomposing a shape is through its parts, we propose to tackle the task of controllable part-based point cloud generation. We introduce DiffFacto, a novel probabilistic generative model that learns the distribution of shapes with part-level control. We propose a factorization that models independent part style and part configuration distributions, and present a novel cross diffusion network that enables us to generate coherent and plausible shapes under our proposed factorization. Experiments show that our method is able to generate novel shapes with multiple axes of control. It achieves state-of-the-art part-level generation quality and generates plausible and coherent shapes, while enabling various downstream editing applications such as shape interpolation, mixing, and transformation editing. Code will be made publicly available.

Deep Graph Representation Learning and Optimization for Influence Maximization

  • Authors: Chen Ling, Junji Jiang, Junxiang Wang, My Thai, Lukas Xue, James Song, Meikang Qiu, Liang Zhao
  • Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.02200
  • Pdf link: https://arxiv.org/pdf/2305.02200
  • Abstract
    Influence maximization (IM) is formulated as selecting a set of initial users from a social network to maximize the expected number of influenced users. Researchers have made great progress in designing various traditional methods, and their theoretical design and performance gain are close to a limit. In the past few years, learning-based IM methods have emerged to achieve stronger generalization ability to unknown graphs than traditional ones. However, the development of learning-based IM methods is still limited by fundamental obstacles, including 1) the difficulty of effectively solving the objective function; 2) the difficulty of characterizing the diversified underlying diffusion patterns; and 3) the difficulty of adapting the solution under various node-centrality-constrained IM variants. To cope with the above challenges, we design a novel framework DeepIM to generatively characterize the latent representation of seed sets, and we propose to learn the diversified information diffusion pattern in a data-driven and end-to-end manner. Finally, we design a novel objective function to infer optimal seed sets under flexible node-centrality-based budget constraints. Extensive analyses are conducted over both synthetic and real-world datasets to demonstrate the overall performance of DeepIM. The code and data are available at: https://github.com/triplej0079/DeepIM.

Keyword: dynamic

Physics-Informed and Data-Driven Discovery of Governing Equations for Complex Phenomena in Heterogeneous Media

  • Authors: Muhammad Sahimi
  • Subjects: Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph)
  • Arxiv link: https://arxiv.org/abs/2305.01653
  • Pdf link: https://arxiv.org/pdf/2305.01653
  • Abstract
    Rapid evolution of sensor technology, advances in instrumentation, and progress in devising data-acquisition software and hardware are providing vast amounts of data for various complex phenomena, ranging from those in the atmospheric environment, to large-scale porous formations, and biological systems. The tremendous increase in the speed of scientific computing has also made it possible to emulate diverse high-dimensional, multiscale and multiphysics phenomena that contain elements of stochasticity, and to generate large volumes of numerical data for them in heterogeneous systems. The difficulty is, however, that often the governing equations for such phenomena are not known. A prime example is flow, transport, and deformation processes in macroscopically-heterogeneous materials and geomedia. In other cases, the governing equations are only partially known, in the sense that they either contain various coefficients that must be evaluated based on data, or that they require constitutive relations, such as the relationship between the stress tensor and the velocity gradients for non-Newtonian fluids in the momentum conservation equation, in order for them to be useful to the modeling. Several classes of approaches are emerging to address such problems that are based on machine learning, symbolic regression, the Mori-Zwanzig projection operator formulation, sparse identification of nonlinear dynamics, data assimilation, and stochastic optimization and analysis, or a combination of two or more of such approaches. This Perspective describes the latest developments in this highly important area, and discusses possible future directions.
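
Of the approaches listed, sparse identification of nonlinear dynamics (SINDy) is compact enough to sketch. The sequential thresholded least-squares loop below is a textbook variant on toy data, not any system from this Perspective:

```python
# Minimal SINDy-style sketch: sparse regression of dx/dt onto a library of
# candidate terms, repeatedly zeroing small coefficients and refitting.
import numpy as np

def sindy(theta, dx, threshold=0.1, iters=10):
    """Fit dx ~= theta @ xi with sequential thresholded least squares."""
    xi, *_ = np.linalg.lstsq(theta, dx, rcond=None)
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for j in range(dx.shape[1]):          # refit each equation on the
            big = ~small[:, j]                # surviving library terms
            if big.any():
                xi[big, j], *_ = np.linalg.lstsq(theta[:, big], dx[:, j],
                                                 rcond=None)
    return xi

# Toy example: recover dx/dt = -2x from clean data.
x = np.linspace(-1, 1, 200).reshape(-1, 1)
dx = -2.0 * x
theta = np.hstack([x, x**2, x**3])            # candidate term library
print(sindy(theta, dx).round(2))              # approx [-2, 0, 0]
```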

Visual Reasoning: from State to Transformation

  • Authors: Xin Hong, Yanyan Lan, Liang Pang, Jiafeng Guo, Xueqi Cheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.01668
  • Pdf link: https://arxiv.org/pdf/2305.01668
  • Abstract
    Most existing visual reasoning tasks, such as CLEVR in VQA, ignore an important factor, i.e., transformation. They are solely defined to test how well machines understand concepts and relations within static settings, like one image. Such state-driven visual reasoning has limitations in reflecting the ability to infer the dynamics between different states, which has been shown to be equally important for human cognition in Piaget's theory. To tackle this problem, we propose a novel transformation-driven visual reasoning (TVR) task. Given both the initial and final states, the target is to infer the corresponding intermediate transformation. Following this definition, a new synthetic dataset namely TRANCE is first constructed on the basis of CLEVR, including three levels of settings, i.e., Basic (single-step transformation), Event (multi-step transformation), and View (multi-step transformation with variant views). Next, we build another real dataset called TRANCO based on COIN, to cover the loss of transformation diversity on TRANCE. Inspired by human reasoning, we propose a three-staged reasoning framework called TranNet, including observing, analyzing, and concluding, to test how recent advanced techniques perform on TVR. Experimental results show that the state-of-the-art visual reasoning models perform well on Basic, but are still far from human-level intelligence on Event, View, and TRANCO. We believe the proposed new paradigm will boost the development of machine visual reasoning. More advanced methods and new problems need to be investigated in this direction. The resource of TVR is available at https://hongxin2019.github.io/TVR/.

Single-model uncertainty quantification in neural network potentials does not consistently outperform model ensembles

  • Authors: Aik Rui Tan, Shingo Urata, Samuel Goldman, Johannes C.B. Dietschreit, Rafael Gómez-Bombarelli
  • Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)
  • Arxiv link: https://arxiv.org/abs/2305.01754
  • Pdf link: https://arxiv.org/pdf/2305.01754
  • Abstract
    Neural networks (NNs) often assign high confidence to their predictions, even for points far out-of-distribution, making uncertainty quantification (UQ) a challenge. When they are employed to model interatomic potentials in materials systems, this problem leads to unphysical structures that disrupt simulations, or to biased statistics and dynamics that do not reflect the true physics. Differentiable UQ techniques can find new informative data and drive active learning loops for robust potentials. However, a variety of UQ techniques, including newly developed ones, exist for atomistic simulations, and there are no clear guidelines for which are most effective or suitable for a given case. In this work, we examine multiple UQ schemes for improving the robustness of NN interatomic potentials (NNIPs) through active learning. In particular, we compare incumbent ensemble-based methods against strategies that use single, deterministic NNs: mean-variance estimation, deep evidential regression, and Gaussian mixture models. We explore three datasets ranging from in-domain interpolative learning to more extrapolative out-of-domain generalization challenges: rMD17, ammonia inversion, and bulk silica glass. Performance is measured across multiple metrics relating model error to uncertainty. Our experiments show that no single method consistently outperformed the others across the various metrics. Ensembling remained better for generalization and NNIP robustness; MVE only proved effective for in-domain interpolation, while GMM was better out-of-domain; and evidential regression, despite its promise, was not the preferable alternative in any of the cases. More broadly, cost-effective, single deterministic models cannot yet consistently match or outperform ensembling for uncertainty quantification in NNIPs.
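
The ensemble baseline that the study compares against reduces to a few lines; here `models` is a hypothetical stand-in for independently trained NN potentials:

```python
# Hedged sketch of ensemble-based UQ: the spread of independently trained
# models serves as the uncertainty estimate.
import numpy as np

def ensemble_predict(models, x):
    preds = np.stack([m(x) for m in models])      # (n_models, ...) energies
    return preds.mean(axis=0), preds.std(axis=0)  # prediction, uncertainty

# Toy stand-in for an ensemble of trained potentials.
models = [lambda x, b=b: np.sin(x) + b for b in (0.00, 0.02, -0.01)]
mean, sigma = ensemble_predict(models, np.array([0.5, 1.5]))
print(mean, sigma)  # high sigma would flag configurations for active learning
```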

Fault Tolerant Processing Unit Using Gamma Distribution Sliding Window For Autonomous Landing Guidance System

  • Authors: Hossam O. Ahmed
  • Subjects: Systems and Control (eess.SY); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2305.01771
  • Pdf link: https://arxiv.org/pdf/2305.01771
  • Abstract
    To keep up with today's dense metropolitan areas and their accompanying traffic problems, a growing number of towns are looking for more advanced and swift urban taxi drones. The safety parameters that must be taken into consideration may be the most important element in the widespread use of such technology. Most recent aviation mishaps have happened during the landing phase, making this a particularly important safety consideration for Vertical and/or Short Take-Off and Landing (V/STOL) drones. In this study, we focused on improving the fault tolerance of the processor architectures used by the predecessors of Autonomous Landing Guidance Assistance Systems (ALGAS), which in turn improves their decision-making capabilities. Furthermore, this is achieved by proposing a fault-tolerant processing architecture that depends on the Gamma Distribution Sliding Window Unit (GDSWU). This proposed GDSWU has been designed completely in VHDL, and the targeted FPGA was the Intel Cyclone V 5CGXFC9D6F27C7 chip. The GDSWU can operate at a maximum frequency of 369.96 MHz, as calculated from the synthesis results of the Intel Quartus Prime software. The suggested GDSWU core only requires 20.36 mW for dynamic core and I/O power consumption.
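
The abstract does not specify the GDSWU's internals, so the following is only a speculative software analogue of a Gamma-distribution-weighted sliding window over sensor readings; the shape and scale values are invented, and the real design targets an FPGA:

```python
# Loose sketch: weight the samples in a sliding window by a Gamma pdf and
# return the weighted estimate (hypothetical parameters).
import numpy as np
from scipy.stats import gamma

def gamma_window(samples, a=2.0, scale=1.0):
    n = len(samples)
    w = gamma.pdf(np.arange(1, n + 1), a, scale=scale)  # per-sample weights
    w /= w.sum()
    return float(np.dot(w, samples))

readings = np.array([10.2, 10.1, 9.8, 9.9, 10.5])       # toy sensor stream
print(gamma_window(readings))
```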

Cheap and Deterministic Inference for Deep State-Space Models of Interacting Dynamical Systems

  • Authors: Andreas Look, Melih Kandemir, Barbara Rakitsch, Jan Peters
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2305.01773
  • Pdf link: https://arxiv.org/pdf/2305.01773
  • Abstract
    Graph neural networks are often used to model interacting dynamical systems since they gracefully scale to systems with a varying and high number of agents. While there has been much progress made for deterministic interacting systems, modeling is much more challenging for stochastic systems in which one is interested in obtaining a predictive distribution over future trajectories. Existing methods are either computationally slow since they rely on Monte Carlo sampling or make simplifying assumptions such that the predictive distribution is unimodal. In this work, we present a deep state-space model which employs graph neural networks in order to model the underlying interacting dynamical system. The predictive distribution is multimodal and has the form of a Gaussian mixture model, where the moments of the Gaussian components can be computed via deterministic moment matching rules. Our moment matching scheme can be exploited for sample-free inference, leading to more efficient and stable training compared to Monte Carlo alternatives. Furthermore, we propose structured approximations to the covariance matrices of the Gaussian components in order to scale up to systems with many agents. We benchmark our novel framework on two challenging autonomous driving datasets. Both confirm the benefits of our method compared to state-of-the-art methods. We further demonstrate the usefulness of our individual contributions in a carefully designed ablation study and provide a detailed runtime analysis of our proposed covariance approximations. Finally, we empirically demonstrate the generalization ability of our method by evaluating its performance on unseen scenarios.
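
As a stripped-down illustration of sample-free moment matching (the linear-Gaussian special case only, not the paper's GNN-based multimodal model):

```python
# Simplified sketch of deterministic moment matching: push a Gaussian state
# estimate through linear(ized) dynamics without Monte Carlo sampling.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])       # toy linearized dynamics
Q = 0.01 * np.eye(2)                          # process noise covariance

def moment_match_step(mu, cov):
    return A @ mu, A @ cov @ A.T + Q          # exact for linear-Gaussian

mu, cov = np.zeros(2), np.eye(2)
for _ in range(5):                            # sample-free multi-step rollout
    mu, cov = moment_match_step(mu, cov)
print(mu, np.diag(cov))
```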

Bio-Inspired Simple Neural Network for Low-Light Image Restoration: A Minimalist Approach

  • Authors: Junjie Ye, Jilin Zhao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2305.01844
  • Pdf link: https://arxiv.org/pdf/2305.01844
  • Abstract
    In this study, we explore the potential of using a straightforward neural network inspired by the retina model to efficiently restore low-light images. The retina model imitates the neurophysiological principles and dynamics of various optical neurons. Our proposed neural network model reduces the computational overhead compared to traditional signal-processing models while achieving results similar to complex deep learning models from a subjective perceptual perspective. By directly simulating retinal neuron functionalities with neural networks, we not only avoid manual parameter optimization but also lay the groundwork for constructing artificial versions of specific neurobiological organizations.

The Impacts of Dimensionality, Diffusion, and Directedness on Intrinsic Cross-Model Simulation in Tile-Based Self-Assembly

  • Authors: Daniel Hader, Matthew J. Patitz
  • Subjects: Computational Geometry (cs.CG); Emerging Technologies (cs.ET)
  • Arxiv link: https://arxiv.org/abs/2305.01877
  • Pdf link: https://arxiv.org/pdf/2305.01877
  • Abstract
    Algorithmic self-assembly occurs when disorganized components autonomously combine to form structures and, by their design and the dynamics of the system, are forced to follow the execution of algorithms. Motivated by applications in DNA-nanotechnology, investigations in algorithmic tile-based self-assembly have blossomed into a mature theory with research leveraging tools from computability theory, complexity theory, information theory, and graph theory to develop a wide range of models and show that many are computationally universal, while also exposing powers and limitations of each. Beyond computational universality, the abstract Tile Assembly Model (aTAM) was shown to be intrinsically universal (IU), a strong notion of completeness where a single tile set is capable of simulating all systems within the model; however, this result required non-deterministic tile attachments. This was later confirmed necessary when it was shown that the class of directed aTAM systems is not IU. Building on these results to further investigate the impacts of other dynamics, Hader et al. examined several tile-assembly models which varied across (1) the numbers of dimensions used, (2) restrictions based on diffusion of tiles through space, and (3) whether each system is directed, and showed which models are IU. Such results have shed much light on the roles of various aspects of the dynamics of tile-assembly and their effects on the intrinsic universality of each model. Here we provide direct comparisons of the various models by considering intrinsic simulations between models. We show that in some cases one model is more powerful than another, and in others, pairs of models have mutually exclusive capabilities. This comparison helps to expose the impacts of these three important aspects and further helps define a hierarchy of tile-assembly models.

Class adaptive threshold and negative class guided noisy annotation robust Facial Expression Recognition

  • Authors: Darshan Gera, Badveeti Naveen Siva Kumar, Bobbili Veerendra Raj Kumar, S Balasubramanian
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.01884
  • Pdf link: https://arxiv.org/pdf/2305.01884
  • Abstract
    The hindering problem in facial expression recognition (FER) is the presence of inaccurate annotations, referred to as noisy annotations, in the datasets. These noisy annotations are present in the datasets inherently because the labeling is subjective to the annotator, clarity of the image, etc. Recent works use sample selection methods to solve this noisy annotation problem in FER. In our work, we use a dynamic adaptive threshold to separate confident samples from non-confident ones so that learning is not hampered by non-confident samples. Instead of discarding the non-confident samples, we impose consistency on the negative classes of those non-confident samples to guide the model to learn better in the positive class. Since FER datasets usually come with 7 or 8 classes, we can correctly guess a negative class with 85% probability even by choosing randomly. By learning "which class a sample doesn't belong to", the model can learn "which class it belongs to" in a better manner. We demonstrate the proposed framework's effectiveness using quantitative as well as qualitative results. Our method performs better than the baseline by a margin of 4% to 28% on RAFDB and 3.3% to 31.4% on FERPlus for various levels of synthetic noisy labels in the aforementioned datasets.
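
A schematic sketch of the two ideas, under the assumption that the class-adaptive threshold is a per-class quantile of prediction confidence (the paper's exact rule may differ):

```python
# Split samples by a per-class confidence threshold; supervise non-confident
# samples only on the classes they should NOT belong to.
import torch
import torch.nn.functional as F

def split_by_class_threshold(logits, labels, quantile=0.5):
    """Confident iff confidence exceeds the per-class quantile (assumed rule)."""
    probs = F.softmax(logits, dim=1)
    conf = probs[torch.arange(len(labels)), labels]
    confident = torch.zeros_like(conf, dtype=torch.bool)
    for c in labels.unique():
        idx = labels == c
        confident[idx] = conf[idx] >= conf[idx].quantile(quantile)
    return confident

def negative_class_loss(logits, labels):
    """Push down probability mass on every class except the (noisy) label."""
    probs = F.softmax(logits, dim=1)
    neg = 1.0 - F.one_hot(labels, logits.size(1)).float()
    return -(torch.log(1.0 - probs + 1e-8) * neg).sum(dim=1).mean()

logits, labels = torch.randn(16, 7), torch.randint(0, 7, (16,))  # 7 FER classes
confident = split_by_class_threshold(logits, labels)
if (~confident).any():
    print(negative_class_loss(logits[~confident], labels[~confident]))
```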

Evolving Dictionary Representation for Few-shot Class-incremental Learning

  • Authors: Xuejun Han, Yuhong Guo
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.01885
  • Pdf link: https://arxiv.org/pdf/2305.01885
  • Abstract
    New objects are continuously emerging in the dynamically changing world, and a real-world artificial intelligence system should be capable of continual and effectual adaptation to newly emerging classes without forgetting old ones. In view of this, in this paper we tackle a challenging and practical continual learning scenario named few-shot class-incremental learning (FSCIL), in which labeled data are given for classes in a base session but very limited labeled instances are available for new incremental classes. To address this problem, we propose a novel and succinct approach by introducing deep dictionary learning, which is a hybrid learning architecture that combines dictionary learning and visual representation learning to provide a better space for characterizing different classes. We simultaneously optimize the dictionary and the feature extraction backbone in the base session, while only finetuning the dictionary in the incremental session for adaptation to novel classes, which can alleviate forgetting on base classes compared to finetuning the entire model. To further facilitate future adaptation, we also incorporate multiple pseudo classes into the base session training so that certain space projected by the dictionary can be reserved for future new concepts. The extensive experimental results on CIFAR100, miniImageNet and CUB200 validate the effectiveness of our approach compared to other SOTA methods.

PODTherm-GP: A Physics-based Data-Driven Approach for Effective Architecture-Level Thermal Simulation of Multi-Core CPUs

  • Authors: Lin Jiang, Anthony Dowling, Ming-C. Cheng, Yu Liu
  • Subjects: Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph)
  • Arxiv link: https://arxiv.org/abs/2305.01911
  • Pdf link: https://arxiv.org/pdf/2305.01911
  • Abstract
    A thermal simulation methodology derived from the proper orthogonal decomposition (POD) and the Galerkin projection (GP), hereafter referred to as PODTherm-GP, is evaluated in terms of its efficiency and accuracy in a multi-core CPU. The GP projects the heat transfer equation onto a mathematical space whose basis functions are generated from thermal data enabled by the POD learning algorithm. The thermal solution data are collected from FEniCS using the finite element method (FEM), accounting for appropriate parametric variations. The GP incorporates physical principles of heat transfer in the methodology to reach high accuracy and efficiency. The dynamic power map for the CPU in the FEM thermal simulation is generated from gem5 and McPAT, together with the SPLASH-2 benchmarks as the simulation workload. It is shown that PODTherm-GP offers an accurate thermal prediction of the CPU with a resolution as fine as the FEM's. It is also demonstrated that PODTherm-GP is capable of predicting the dynamic thermal profile of the chip with good accuracy beyond the training conditions. Additionally, the approach offers a reduction in degrees of freedom by more than 5 orders of magnitude and a speedup of 4 orders of magnitude, compared to the FEM.
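
The POD step itself is compact; a generic SVD-based sketch on a synthetic low-rank snapshot matrix (not gem5/FEniCS output) shows where the orders-of-magnitude reduction in degrees of freedom comes from:

```python
# Compact POD sketch: extract an orthonormal basis from snapshot data via SVD;
# the Galerkin projection onto this basis is what reduces the problem size.
import numpy as np

rng = np.random.default_rng(0)
space = rng.normal(size=(5000, 10))          # hidden low-rank spatial modes
time = rng.normal(size=(10, 200))
snapshots = space @ time + 0.01 * rng.normal(size=(5000, 200))  # (dofs, steps)

U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
m = int(np.searchsorted(energy, 0.999)) + 1  # modes capturing 99.9% of energy
basis = U[:, :m]                             # POD modes for the projection
print(f"kept {m} of {len(s)} modes")         # approx 10 here
```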

Optimal Resource Management for Hierarchical Federated Learning over HetNets with Wireless Energy Transfer

  • Authors: Rami Hamdi, Ahmed Ben Said, Emna Baccour, Aiman Erbad, Amr Mohamed, Mounir Hamdi, Mohsen Guizani
  • Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.01953
  • Pdf link: https://arxiv.org/pdf/2305.01953
  • Abstract
    Remote monitoring systems analyze the environment dynamics in different smart industrial applications, such as occupational health and safety, and environmental monitoring. Specifically, in industrial Internet of Things (IoT) systems, the huge number of devices and the expected performance put pressure on resources, such as computational, network, and device energy. Distributed training of Machine and Deep Learning (ML/DL) models for intelligent industrial IoT applications is very challenging for resource limited devices over heterogeneous wireless networks (HetNets). Hierarchical Federated Learning (HFL) performs training at multiple layers, offloading the tasks to nearby Multi-Access Edge Computing (MEC) units. In this paper, we propose a novel energy-efficient HFL framework enabled by Wireless Energy Transfer (WET) and designed for heterogeneous networks with massive Multiple-Input Multiple-Output (MIMO) wireless backhaul. Our energy-efficiency approach is formulated as a Mixed-Integer Non-Linear Programming (MINLP) problem, where we optimize the HFL device association and manage the wireless transmitted energy. However, due to its high complexity, we design a Heuristic Resource Management Algorithm, namely H2RMA, that respects energy, channel quality, and accuracy constraints while maintaining low computational complexity. We also improve the energy consumption of the network using an efficient device scheduling scheme. Finally, we investigate device mobility and its impact on the HFL performance. Our extensive experiments confirm the high performance of the proposed resource management approach in HFL over HetNets, in terms of training loss and grid energy costs.

District-scale surface temperatures generated from high-resolution longitudinal thermal infrared images

  • Authors: Subin Lin, Vasantha Ramani, Miguel Martin, Pandarasamy Arjunan, Adrian Chong, Filip Biljecki, Marcel Ignatius, Kameshwar Poolla, Clayton Miller
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.01971
  • Pdf link: https://arxiv.org/pdf/2305.01971
  • Abstract
    The paper describes a dataset that was collected by infrared thermography, a non-contact, non-intrusive technique to collect data and analyze the built environment in various aspects. While most studies focus on the city and building scales, the rooftop observatory provides high temporal and spatial resolution observations with dynamic interactions on the district scale. The rooftop infrared thermography observatory, with a multi-modal platform capable of assessing a wide range of dynamic processes in urban systems, was deployed in Singapore. It was placed on top of two buildings that overlook the outdoor context of the campus of the National University of Singapore. The platform collects remote sensing data from tropical areas on a temporal scale, allowing users to determine the temperature trend of individual features such as buildings, roads, and vegetation. The dataset includes 1,365,921 thermal images collected at approximately 10-second intervals on average from two locations during ten months.

Computing paths of large rank in planar frameworks deterministically

  • Authors: Fedor V. Fomin, Petr A. Golovach, Tuukka Korhonen, Giannos Stamoulis
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2305.01993
  • Pdf link: https://arxiv.org/pdf/2305.01993
  • Abstract
    A framework consists of an undirected graph $G$ and a matroid $M$ whose elements correspond to the vertices of $G$. Recently, Fomin et al. [SODA 2023] and Eiben et al. [arXiv 2023] developed parameterized algorithms for computing paths of rank $k$ in frameworks. More precisely, for vertices $s$ and $t$ of $G$, and an integer $k$, they gave FPT algorithms parameterized by $k$ deciding whether there is an $(s,t)$-path in $G$ whose vertex set contains a subset of elements of $M$ of rank $k$. These algorithms are based on the Schwartz-Zippel lemma for polynomial identity testing and thus are randomized, and therefore the existence of a deterministic FPT algorithm for this problem remains open. We present the first deterministic FPT algorithm that solves the problem in frameworks whose underlying graph $G$ is planar. While the running time of our algorithm is worse than the running times of the recent randomized algorithms, our algorithm works on more general classes of matroids. In particular, this is the first FPT algorithm for the case when matroid $M$ is represented over rationals. Our main technical contribution is the nontrivial adaptation of the classic irrelevant vertex technique to frameworks to reduce the given instance to one of bounded treewidth. This allows us to employ the toolbox of representative sets to design a dynamic programming procedure solving the problem efficiently on instances of bounded treewidth.

Gym-preCICE: Reinforcement Learning Environments for Active Flow Control

  • Authors: Mosayeb Shams, Ahmed H. Elsheikh
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.02033
  • Pdf link: https://arxiv.org/pdf/2305.02033
  • Abstract
    Active flow control (AFC) involves manipulating fluid flow over time to achieve a desired performance or efficiency. AFC, as a sequential optimisation task, can benefit from utilising Reinforcement Learning (RL) for dynamic optimisation. In this work, we introduce Gym-preCICE, a Python adapter fully compliant with Gymnasium (formerly known as OpenAI Gym) API to facilitate designing and developing RL environments for single- and multi-physics AFC applications. In an actor-environment setting, Gym-preCICE takes advantage of preCICE, an open-source coupling library for partitioned multi-physics simulations, to handle information exchange between a controller (actor) and an AFC simulation environment. The developed framework results in a seamless non-invasive integration of realistic physics-based simulation toolboxes with RL algorithms. Gym-preCICE provides a framework for designing RL environments to model AFC tasks, as well as a playground for applying RL algorithms in various AFC-related engineering applications.
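
A minimal Gymnasium-compliant environment skeleton in the spirit described here; the preCICE coupling is stubbed with a fake damped response, and all class and variable names are illustrative:

```python
# Sketch of an AFC-style Gymnasium environment: action = jet amplitude,
# reward = negative drag. The physics solver exchange is faked.
import numpy as np
import gymnasium as gym

class JetControlEnv(gym.Env):
    def __init__(self):
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(4,))
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(1,))
        self._state = np.zeros(4, dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._state = self.np_random.normal(size=4).astype(np.float32)
        return self._state, {}

    def step(self, action):
        # In Gym-preCICE, this is where preCICE would exchange actuation and
        # sensor data with the coupled solver; here we fake a damped response.
        self._state = 0.9 * self._state + 0.1 * float(action[0])
        drag = float(np.abs(self._state).mean())
        return self._state, -drag, False, False, {}

env = JetControlEnv()
obs, info = env.reset(seed=0)
obs, reward, *_ = env.step(env.action_space.sample())
```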

Improved Static Hand Gesture Classification on Deep Convolutional Neural Networks using Novel Sterile Training Technique

  • Authors: Josiah Smith, Shiva Thiagarajan, Richard Willis, Yiorgos Makris, Murat Torlak
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.02039
  • Pdf link: https://arxiv.org/pdf/2305.02039
  • Abstract
    In this paper, we investigate novel data collection and training techniques towards improving classification accuracy of non-moving (static) hand gestures using a convolutional neural network (CNN) and frequency-modulated-continuous-wave (FMCW) millimeter-wave (mmWave) radars. Recently, non-contact hand pose and static gesture recognition have received considerable attention in many applications ranging from human-computer interaction (HCI), augmented/virtual reality (AR/VR), and even therapeutic range of motion for medical applications. While most current solutions rely on optical or depth cameras, these methods require ideal lighting and temperature conditions. mmWave radar devices have recently emerged as a promising alternative offering low-cost system-on-chip sensors whose output signals contain precise spatial information even in non-ideal imaging conditions. Additionally, deep convolutional neural networks have been employed extensively in image recognition by learning both feature extraction and classification simultaneously. However, little work has been done towards static gesture recognition using mmWave radars and CNNs due to the difficulty involved in extracting meaningful features from the radar return signal, and the results are inferior compared with dynamic gesture classification. This article presents an efficient data collection approach and a novel technique for deep CNN training by introducing "sterile" images which aid in distinguishing distinct features among the static gestures and subsequently improve the classification accuracy. Applying the proposed data collection and training methods yields an increase in classification rate of static hand gestures from 85% to 93% and 90% to 95% for range and range-angle profiles, respectively.

What makes a good pause? Investigating the turn-holding effects of fillers

  • Authors: Bing'er Jiang, Erik Ekstedt, Gabriel Skantze
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2305.02101
  • Pdf link: https://arxiv.org/pdf/2305.02101
  • Abstract
    Filled pauses (or fillers), such as "uh" and "um", are frequent in spontaneous speech and can serve as a turn-holding cue for the listener, indicating that the current speaker is not done yet. In this paper, we use the recently proposed Voice Activity Projection (VAP) model, which is a deep learning model trained to predict the dynamics of conversation, to analyse the effects of filled pauses on the expected turn-hold probability. The results show that, while filled pauses do indeed have a turn-holding effect, it is perhaps not as strong as could be expected, probably due to the redundancy of other cues. We also find that the prosodic properties and position of the filler have a significant effect on the turn-hold probability. However, contrary to what has been suggested in previous work, there is no difference between "uh" and "um" in this regard.

Synergies Between Federated Learning and O-RAN: Towards an Elastic Virtualized Architecture for Multiple Distributed Machine Learning Services

  • Authors: Payam Abdisarabshali, Nicholas Accurso, Filippo Malandra, Weifeng Su, Seyyedali Hosseinalipour
  • Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.02109
  • Pdf link: https://arxiv.org/pdf/2305.02109
  • Abstract
    Federated learning (FL) is the most popular distributed machine learning technique. However, implementation of FL over modern wireless networks faces key challenges caused by (i) dynamics of the network conditions, (ii) coexistence of multiple FL services/tasks in the system, and (iii) concurrent execution of FL services with other network services, which are not jointly considered in prior works. Motivated by these challenges, we introduce a generic FL paradigm over next-generation (NextG) networks, called dynamic multi-service FL (DMS-FL). We identify three unexplored design considerations in DMS-FL: (i) FL service operator accumulation, (ii) wireless resource fragmentation, and (iii) signal strength fluctuations. We take the first steps towards addressing these design considerations through proposing a novel distributed ML architecture called elastic virtualized FL (EV-FL). EV-FL unleashes the full potential of Open RAN (O-RAN) systems and introduces an elastic resource provisioning methodology to execute FL services. It further constitutes a multi-time-scale FL management system that introduces three dimensions into existing FL architectures: (i) virtualization, (ii) scalability, and (iii) elasticity. Through investigating EV-FL, we reveal a series of open research directions for future work. We finally simulate EV-FL to demonstrate its potential to save wireless resources and increase fairness among FL services.

Automatic Parameterization for Aerodynamic Shape Optimization via Deep Geometric Learning

  • Authors: Zhen Wei, Pascal Fua, Michaël Bauerheim
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Fluid Dynamics (physics.flu-dyn)
  • Arxiv link: https://arxiv.org/abs/2305.02116
  • Pdf link: https://arxiv.org/pdf/2305.02116
  • Abstract
    We propose two deep learning models that fully automate shape parameterization for aerodynamic shape optimization. Both models are optimized to parameterize via deep geometric learning to embed human prior knowledge into learned geometric patterns, eliminating the need for further handcrafting. The Latent Space Model (LSM) learns a low-dimensional latent representation of an object from a dataset of various geometries, while the Direct Mapping Model (DMM) builds parameterization on the fly using only one geometry of interest. We also devise a novel regularization loss that efficiently integrates volumetric mesh deformation into the parameterization model. The models directly manipulate the high-dimensional mesh data by moving vertices. LSM and DMM are fully differentiable, enabling gradient-based, end-to-end pipeline design and plug-and-play deployment of surrogate models or adjoint solvers. We perform shape optimization experiments on 2D airfoils and discuss the applicable scenarios for the two models.

System Neural Diversity: Measuring Behavioral Heterogeneity in Multi-Agent Learning

  • Authors: Matteo Bettini, Ajay Shankar, Amanda Prorok
  • Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.02128
  • Pdf link: https://arxiv.org/pdf/2305.02128
  • Abstract
    Evolutionary science provides evidence that diversity confers resilience. Yet, traditional multi-agent reinforcement learning techniques commonly enforce homogeneity to increase training sample efficiency. When a system of learning agents is not constrained to homogeneous policies, individual agents may develop diverse behaviors, resulting in emergent complementarity that benefits the system. Despite this feat, there is a surprising lack of tools that measure behavioral diversity in systems of learning agents. Such techniques would pave the way towards understanding the impact of diversity in collective resilience and performance. In this paper, we introduce System Neural Diversity (SND): a measure of behavioral heterogeneity for multi-agent systems where agents have stochastic policies. We discuss and prove its theoretical properties, and compare it with alternate, state-of-the-art behavioral diversity metrics used in cross-disciplinary domains. Through simulations of a variety of multi-agent tasks, we show how our metric constitutes an important diagnostic tool to analyze latent properties of behavioral heterogeneity. By comparing SND with task reward in static tasks, where the problem does not change during training, we show that it is key to understanding the effectiveness of heterogeneous vs homogeneous agents. In dynamic tasks, where the problem is affected by repeated disturbances during training, we show that heterogeneous agents are first able to learn specialized roles that allow them to cope with the disturbance, and then retain these roles when the disturbance is removed. SND allows a direct measurement of this latent resilience, while other proxies such as task performance (reward) fail to.
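
A hedged sketch of a behavioral-diversity-style measurement: average pairwise distance between agents' action distributions over probe states. The Wasserstein-2 distance between 1D Gaussians used below is an assumption for illustration, not necessarily SND's exact construction:

```python
# Toy diversity measure: mean pairwise W2 distance between agents' Gaussian
# action distributions, averaged over a set of probe states.
import numpy as np

def w2_gaussian(mu1, s1, mu2, s2):
    # Closed-form Wasserstein-2 distance between 1D Gaussians.
    return np.sqrt((mu1 - mu2) ** 2 + (s1 - s2) ** 2)

# Each agent: action means and stds on two probe states (invented numbers).
agents = [(np.array([0.1, 0.4]), np.array([0.2, 0.2])),
          (np.array([0.0, 0.5]), np.array([0.3, 0.1])),
          (np.array([0.9, -0.2]), np.array([0.2, 0.2]))]

dists = [np.mean(w2_gaussian(m1, s1, m2, s2))
         for i, (m1, s1) in enumerate(agents)
         for m2, s2 in agents[i + 1:]]
print("system diversity:", float(np.mean(dists)))
```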

An identification method for oscillators with response-dependent inertia

  • Authors: Yuval Harduf (1), Eyal Setter (1), Izhak Bucher (1) ((1) Technion Israel Institute of Technology, Faculty of mechanical engineering)
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.02135
  • Pdf link: https://arxiv.org/pdf/2305.02135
  • Abstract
    This paper is concerned with identifying the instantaneous modal parameters of oscillatory systems with response-dependent inertia (mass, inductance, or equivalent) based on their measured dynamics. An identification method is proposed, which is a variation of the "FORCEVIB" method. The method utilizes analytic signal representation and the properties of the Hilbert transform to obtain an analytic relationship between a system's natural frequency and damping coefficient and its response and excitation signals. The proposed method is validated by comparing the identification results to the asymptotic solution of a simple system with response-dependent inertia and is then demonstrated, numerically and experimentally, for other, more complicated, nonlinear systems.

A Curriculum View of Robust Loss Functions

  • Authors: Zebin Ou, Yue Zhang
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2305.02139
  • Pdf link: https://arxiv.org/pdf/2305.02139
  • Abstract
    Robust loss functions are designed to combat the adverse impacts of label noise, whose robustness is typically supported by theoretical bounds agnostic to the training dynamics. However, these bounds may fail to characterize the empirical performance as it remains unclear why robust loss functions can underfit. We show that most loss functions can be rewritten into a form with the same class-score margin and different sample-weighting functions. The resulting curriculum view provides a straightforward analysis of the training dynamics, which helps attribute underfitting to diminished average sample weights and noise robustness to larger weights for clean samples. We show that simple fixes to the curriculums can make underfitting robust loss functions competitive with the state-of-the-art, and training schedules can substantially affect the noise robustness even with robust loss functions. Code is available at \url{github}.
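
The sample-weighting view can be made concrete for softmax classifiers: the gradient of cross-entropy with respect to the true-class logit scales like 1 - p_y, while for MAE (a robust loss) it scales like p_y(1 - p_y), which down-weights low-confidence (often noisy or hard) samples and is one standard explanation for underfitting:

```python
# Effective per-sample gradient weights for CE vs. MAE under a softmax model,
# as a function of the model's confidence p_y on the labeled class.
import numpy as np

p_y = np.linspace(0.01, 0.99, 5)       # confidence on the labeled class
w_ce = 1.0 - p_y                        # cross-entropy sample weight
w_mae = p_y * (1.0 - p_y)               # MAE (robust loss) sample weight
for p, a, b in zip(p_y, w_ce, w_mae):
    print(f"p_y={p:.2f}  CE weight={a:.2f}  MAE weight={b:.2f}")
```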

Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity

  • Authors: Haoran Xu, Maha Elbayad, Kenton Murray, Jean Maillard, Vedanuj Goswami
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2305.02176
  • Pdf link: https://arxiv.org/pdf/2305.02176
  • Abstract
    Mixture-of-experts (MoE) models that employ sparse activation have demonstrated effectiveness in significantly increasing the number of parameters while maintaining low computational requirements per token. However, recent studies have established that MoE models are inherently parameter-inefficient as the improvement in performance diminishes with an increasing number of experts. We hypothesize this parameter inefficiency is a result of all experts having equal capacity, which may not adequately meet the varying complexity requirements of different tokens or tasks, e.g., in a multilingual setting, languages based on their resource levels might require different capacities. In light of this, we propose Stratified Mixture of Experts (SMoE) models, which feature a stratified structure and can assign dynamic capacity to different tokens. We demonstrate the effectiveness of SMoE on two multilingual machine translation benchmarks, where it outperforms multiple state-of-the-art MoE models. On a diverse 15-language dataset, SMoE improves the translation quality over vanilla MoE by +0.93 BLEU points on average. Additionally, SMoE is parameter-efficient, matching vanilla MoE performance with around 50% fewer parameters.
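
A minimal sketch of the stratified idea, assuming experts with different hidden widths and top-1 routing (sizes and routing details are illustrative, not the released model):

```python
# Toy stratified MoE: routing a token to a wider expert gives it more capacity.
import torch
import torch.nn as nn

class StratifiedMoE(nn.Module):
    def __init__(self, d_model=64, widths=(32, 64, 128)):
        super().__init__()
        self.router = nn.Linear(d_model, len(widths))
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, w), nn.ReLU(), nn.Linear(w, d_model))
            for w in widths
        )

    def forward(self, x):                       # x: (tokens, d_model)
        gate = self.router(x).softmax(dim=-1)   # routing probabilities
        top = gate.argmax(dim=-1)               # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            sel = top == i
            if sel.any():
                # Scale by the gate value so routing stays differentiable.
                out[sel] = expert(x[sel]) * gate[sel, i].unsqueeze(-1)
        return out

tokens = torch.randn(10, 64)
print(StratifiedMoE()(tokens).shape)            # torch.Size([10, 64])
```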

A Multi-step Dynamics Modeling Framework For Autonomous Driving In Multiple Environments

  • Authors: Jason Gibson, Bogdan Vlahov, David Fan, Patrick Spieler, Daniel Pastor, Ali-akbar Agha-mohammadi, Evangelos A. Theodorou
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.02241
  • Pdf link: https://arxiv.org/pdf/2305.02241
  • Abstract
    Modeling dynamics is often the first step to making a vehicle autonomous. While on-road autonomous vehicles have been extensively studied, off-road vehicles pose many challenging modeling problems. An off-road vehicle encounters highly complex and difficult-to-model terrain/vehicle interactions, as well as having complex vehicle dynamics of its own. These complexities can create challenges for effective high-speed control and planning. In this paper, we introduce a framework for multistep dynamics prediction that explicitly handles the accumulation of modeling error and remains scalable for sampling-based controllers. Our method uses a specially-initialized Long Short-Term Memory (LSTM) over a limited time horizon as the learned component in a hybrid model to predict the dynamics of a four-seat all-terrain vehicle (Polaris S4 1000 RZR) in two distinct environments. By only having the LSTM predict over a fixed time horizon, we negate the need for long-term stability that is often a challenge when training recurrent neural networks. Our framework is flexible as it only requires odometry information for labels. Through extensive experimentation, we show that our method is able to predict millions of possible trajectories in real-time, with a time horizon of five seconds in challenging off-road driving scenarios.
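
A rough sketch of a fixed-horizon learned dynamics model in this spirit (all shapes and sizes are assumptions, and the hybrid physics component is omitted):

```python
# Fixed-horizon dynamics model: an LSTM consumes states and actions over a
# limited window and predicts per-step state deltas, avoiding long rollouts.
import torch
import torch.nn as nn

class MultiStepDynamics(nn.Module):
    def __init__(self, state_dim=6, action_dim=2, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(state_dim + action_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, state_dim)

    def forward(self, states, actions):
        x = torch.cat([states, actions], dim=-1)   # (batch, horizon, s+a)
        h, _ = self.lstm(x)
        return self.head(h)                        # per-step state deltas

model = MultiStepDynamics()
states, actions = torch.randn(4, 50, 6), torch.randn(4, 50, 2)
deltas = model(states, actions)    # odometry alone could supervise these
print(deltas.shape)                # torch.Size([4, 50, 6])
```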

Distributed Leader Follower Formation Control of Mobile Robots based on Bioinspired Neural Dynamics and Adaptive Sliding Innovation Filter

  • Authors: Zhe Xu, Tao Yan, Simon X. Yang, S. Andrew Gadsden
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.02288
  • Pdf link: https://arxiv.org/pdf/2305.02288
  • Abstract
    This paper investigates the distributed leader-follower formation control problem for multiple differentially driven mobile robots. A distributed estimator is first introduced, and it only requires the state information from each follower itself and its neighbors. Then, we propose a bioinspired neural dynamic based backstepping and sliding mode control hybrid formation control method with proof of its stability. The proposed control strategy resolves the impractical speed jump issue that exists in the conventional backstepping design. Additionally, considering the system and measurement noises, the proposed control strategy not only removes the chattering issue existing in the conventional sliding mode control but also provides smooth control input with extra robustness. After that, an adaptive sliding innovation filter is integrated with the proposed control to provide accurate state estimates that are robust to modeling uncertainties. Finally, we performed multiple simulations to demonstrate the efficiency and effectiveness of the proposed formation control strategy.

DynamicStereo: Consistent Dynamic Depth from Stereo Videos

  • Authors: Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.02296
  • Pdf link: https://arxiv.org/pdf/2305.02296
  • Abstract
    We consider the problem of reconstructing a dynamic scene observed from a stereo camera. Most existing methods for depth from stereo treat different stereo frames independently, leading to temporally inconsistent depth predictions. Temporal consistency is especially important for immersive AR or VR scenarios, where flickering greatly diminishes the user experience. We propose DynamicStereo, a novel transformer-based architecture to estimate disparity for stereo videos. The network learns to pool information from neighboring frames to improve the temporal consistency of its predictions. Our architecture is designed to process stereo videos efficiently through divided attention layers. We also introduce Dynamic Replica, a new benchmark dataset containing synthetic videos of people and animals in scanned environments, which provides complementary training and evaluation data for dynamic stereo closer to real applications than existing datasets. Training with this dataset further improves the quality of predictions of our proposed DynamicStereo as well as prior methods. Finally, it acts as a benchmark for consistent stereo methods.

New submissions for Wed, 22 Mar 23

Keyword: pruning

Greedy Pruning with Group Lasso Provably Generalizes for Matrix Sensing and Neural Networks with Quadratic Activations

  • Authors: Nived Rajaraman, Devvrit, Aryan Mokhtari, Kannan Ramchandran
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2303.11453
  • Pdf link: https://arxiv.org/pdf/2303.11453
  • Abstract
    Pruning schemes have been widely used in practice to reduce the complexity of trained models with a massive number of parameters. Several practical studies have shown that pruning an overparameterized model and fine-tuning generalizes well to new samples. Although the above pipeline, which we refer to as pruning + fine-tuning, has been extremely successful in lowering the complexity of trained models, there is very little known about the theory behind this success. In this paper we address this issue by investigating the pruning + fine-tuning framework on the overparameterized matrix sensing problem, with the ground truth denoted $U_\star \in \mathbb{R}^{d \times r}$ and the overparameterized model $U \in \mathbb{R}^{d \times k}$ with $k \gg r$. We study the approximate local minima of the empirical mean square error, augmented with a smooth version of a group Lasso regularizer, $\sum_{i=1}^k \| U e_i \|_2$, and show that pruning the low $\ell_2$-norm columns results in a solution $U_{\text{prune}}$ which has the minimum number of columns $r$, yet is close to the ground truth in training loss. Initializing the subsequent fine-tuning phase from $U_{\text{prune}}$, the resulting solution converges linearly to a generalization error of $O(\sqrt{rd/n})$ ignoring lower order terms, which is statistically optimal. While our analysis provides insights into the role of regularization in pruning, we also show that running gradient descent in the absence of regularization results in models which are not suitable for greedy pruning, i.e., many columns could have their $\ell_2$ norm comparable to that of the maximum. Lastly, we extend our results for the training and pruning of two-layer neural networks with quadratic activation functions. Our results provide the first rigorous insights on why greedy pruning + fine-tuning leads to smaller models which also generalize well.
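
A toy numpy rendering of the pipeline the analysis studies; the hyperparameters are arbitrary and the sensing operator is taken to be the identity (i.e., full observation of $UU^T$), so this is a sketch of the setting, not the paper's experiments:

```python
# Train an overparameterized factor U with a group-Lasso column penalty,
# greedily prune low-norm columns, then fine-tune the survivors.
import numpy as np

rng = np.random.default_rng(0)
d, r, k = 20, 2, 8
U_star = rng.normal(size=(d, r)) / d**0.5
M = U_star @ U_star.T                      # ground truth, rank r

U = 0.1 * rng.normal(size=(d, k))          # overparameterized, k >> r
lam, lr = 0.02, 0.05
for _ in range(3000):                      # training with the penalty
    grad = 4.0 * (U @ U.T - M) @ U         # gradient of ||UU^T - M||_F^2
    grad += lam * U / (np.linalg.norm(U, axis=0, keepdims=True) + 1e-8)
    U -= lr * grad

norms = np.linalg.norm(U, axis=0)
U_p = U[:, norms > 0.1 * norms.max()]      # greedy pruning of low-norm columns
for _ in range(1000):                      # fine-tune the pruned factor
    U_p -= lr * 4.0 * (U_p @ U_p.T - M) @ U_p
print("columns kept:", U_p.shape[1])       # typically close to r = 2
print("loss:", np.linalg.norm(U_p @ U_p.T - M))
```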

Dynamically Expandable Graph Convolution for Streaming Recommendation

  • Authors: Bowei He, Xu He, Yingxue Zhang, Ruiming Tang, Chen Ma
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2303.11700
  • Pdf link: https://arxiv.org/pdf/2303.11700
  • Abstract
    Personalized recommender systems have been widely studied and deployed to reduce information overload and satisfy users' diverse needs. However, conventional recommendation models solely conduct a one-time training-test fashion and can hardly adapt to evolving demands, considering user preference shifts and ever-increasing users and items in the real world. To tackle such challenges, the streaming recommendation is proposed and has attracted great attention recently. Among these, continual graph learning is widely regarded as a promising approach for the streaming recommendation by academia and industry. However, existing methods either rely on the historical data replay which is often not practical under increasingly strict data regulations, or can seldom solve the over-stability issue. To overcome these difficulties, we propose a novel Dynamically Expandable Graph Convolution (DEGC) algorithm from a model isolation perspective for the streaming recommendation which is orthogonal to previous methods. Based on the motivation of disentangling outdated short-term preferences from useful long-term preferences, we design a sequence of operations including graph convolution pruning, refining, and expanding to only preserve beneficial long-term preference-related parameters and extract fresh short-term preferences. Moreover, we model the temporal user preference, which is utilized as user embedding initialization, for better capturing the individual-level preference shifts. Extensive experiments on the three most representative GCN-based recommendation models and four industrial datasets demonstrate the effectiveness and robustness of our method.

Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning

  • Authors: Sung-Feng Huang, Chia-ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-yi Lee
  • Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2303.11816
  • Pdf link: https://arxiv.org/pdf/2303.11816
  • Abstract
    Personalized TTS is an exciting and highly desired application that allows users to train their TTS voice using only a few recordings. However, TTS training typically requires many hours of recording and a large model, making it unsuitable for deployment on mobile devices. To overcome this limitation, related works typically fine-tune a pre-trained TTS model to preserve its ability to generate high-quality audio samples while adapting to the target speaker's voice. This process is commonly referred to as "voice cloning." Although related works have achieved significant success in changing the TTS model's voice, they still need to fine-tune from a large pre-trained model, resulting in a significant size for the voice-cloned model. In this paper, we propose applying trainable structured pruning to voice cloning. By training the structured pruning masks with voice-cloning data, we can produce a unique pruned model for each target speaker. Our experiments demonstrate that using learnable structured pruning, we can compress the model to 7 times smaller while achieving comparable voice-cloning performance.
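
As a rough illustration of trainable structured pruning (the gating scheme, penalty weight, and threshold below are assumptions, not the authors' implementation), one sigmoid gate per output channel can be trained jointly with an L1 sparsity penalty and then thresholded to drop whole channels:

```python
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    def __init__(self, cin, cout):
        super().__init__()
        self.conv = nn.Conv1d(cin, cout, kernel_size=3, padding=1)
        self.gate_logits = nn.Parameter(torch.zeros(cout))  # one gate per output channel

    def forward(self, x):
        g = torch.sigmoid(self.gate_logits)                 # soft mask in (0, 1)
        return self.conv(x) * g.view(1, -1, 1)

layer = GatedConv(80, 256)
x = torch.randn(4, 80, 100)                                 # (batch, mel bins, frames)
out = layer(x)

# Sparsity-regularized objective: task loss + lambda * L1 norm of the gates.
task_loss = out.pow(2).mean()                               # placeholder for the TTS loss
loss = task_loss + 1e-3 * torch.sigmoid(layer.gate_logits).sum()
loss.backward()

# After training, channels whose gate stays below a threshold are removed,
# yielding a physically smaller model specialized to the target speaker.
keep = torch.sigmoid(layer.gate_logits) > 0.05
print(int(keep.sum()), "of", keep.numel(), "channels kept")
```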

Protective Self-Adaptive Pruning to Better Compress DNNs

  • Authors: Liang Li, Pengfei Zhao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.11881
  • Pdf link: https://arxiv.org/pdf/2303.11881
  • Abstract
    Adaptive network pruning approaches have recently drawn significant attention due to their excellent capability to identify the importance and redundancy of layers and filters and to customize a suitable pruning solution. However, they remain unsatisfactory because current adaptive pruning methods rely mostly on an additional monitor to score layer and filter importance, and thus face high complexity and weak interpretability. To tackle these issues, we have studied the weight reconstruction process in the iterative prune-train cycle in depth and propose a Protective Self-Adaptive Pruning (PSAP) method. First of all, PSAP can use its own information, the weight sparsity ratio, to adaptively adjust the pruning ratio of each layer before each pruning step. Moreover, we propose a protective reconstruction mechanism that prevents important filters from being pruned by supervising gradients, avoiding unrecoverable information loss. Our PSAP is handy and explicit because it depends only on the weights and gradients of the model itself, instead of requiring an additional monitor as in earlier works. Experiments on ImageNet and CIFAR-10 also demonstrate its superiority over current works in both accuracy and compression ratio, especially when compressing at a high ratio or pruning from scratch.
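
A toy sketch of the two PSAP ideas as described above (the exact scoring rules are assumptions of this sketch): the per-layer pruning ratio is derived from that layer's own weight-sparsity statistics, and filters whose gradients are large are protected from pruning because they appear to be under reconstruction:

```python
import torch

def layer_prune_ratio(weight, base_ratio=0.3, eps=1e-2):
    # The fraction of near-zero weights serves as a proxy for how prunable the layer is.
    sparsity = (weight.abs() < eps).float().mean().item()
    return min(0.9, base_ratio * (1.0 + sparsity))

def select_filters_to_prune(weight, grad, ratio):
    # Score filters by L1 norm; filters with above-median gradient magnitude are
    # protected, since they are likely being reconstructed and may regain importance.
    norms = weight.flatten(1).abs().sum(dim=1)
    grads = grad.flatten(1).abs().sum(dim=1)
    protected = grads > grads.median()
    n_prune = int(ratio * weight.shape[0])
    order = torch.argsort(norms)                    # weakest filters first
    return [i for i in order.tolist() if not protected[i]][:n_prune]

w = torch.randn(64, 32, 3, 3)                       # a conv layer's weights
g = torch.randn_like(w)                             # and their current gradients
print(select_filters_to_prune(w, g, layer_prune_ratio(w)))
```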

Performance-aware Approximation of Global Channel Pruning for Multitask CNNs

  • Authors: Hancheng Ye, Bo Zhang, Tao Chen, Jiayuan Fan, Bin Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.11923
  • Pdf link: https://arxiv.org/pdf/2303.11923
  • Abstract
    Global channel pruning (GCP) aims to remove a subset of channels (filters) across different layers from a deep model without hurting the performance. Previous works focus on either single-task model pruning or simply adapting it to the multitask scenario, and still face the following problems when handling multitask pruning: 1) Due to the task mismatch, a backbone well pruned for a classification task focuses on preserving filters that extract category-sensitive information, so filters that may be useful for other tasks can be pruned during the backbone pruning stage; 2) For multitask predictions, different filters within or between layers are more closely related and interact more than for single-task prediction, making multitask pruning more difficult. Therefore, aiming at multitask model compression, we propose a Performance-Aware Global Channel Pruning (PAGCP) framework. We first theoretically present the objective for achieving superior GCP by considering the joint saliency of filters from intra- and inter-layers. Then a sequentially greedy pruning strategy is proposed to optimize the objective, where a performance-aware oracle criterion is developed to evaluate the sensitivity of filters to each task and preserve the globally most task-related filters. Experiments on several multitask datasets show that the proposed PAGCP can reduce the FLOPs and parameters by over 60% with a minor performance drop, and achieves 1.2x~3.3x acceleration on both cloud and mobile platforms.
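
A simplified sketch of a performance-aware oracle criterion (the linear "model" and loss below are hypothetical stand-ins): ablate one filter at a time and take the summed loss increase across task batches as that filter's joint saliency:

```python
import torch

def filter_saliency(model_loss, weight, task_batches):
    base = sum(model_loss(weight, b) for b in task_batches)
    sal = []
    for f in range(weight.shape[0]):
        w = weight.clone()
        w[f] = 0.0                                  # temporarily ablate one filter
        sal.append((sum(model_loss(w, b) for b in task_batches) - base).item())
    return torch.tensor(sal)                        # higher = more task-related

# Toy linear "model": per-task loss = ||x @ W^T - y||^2 on that task's batch.
def model_loss(W, batch):
    x, y = batch
    return ((x @ W.T - y) ** 2).mean()

W = torch.randn(8, 16)
batches = [(torch.randn(32, 16), torch.randn(32, 8)) for _ in range(2)]
print(filter_saliency(model_loss, W, batches))
```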

Keyword: neural\ architecture\ search

There is no result

Keyword: 3d object detection

Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection

  • Authors: Shihao Wang, Yingfei Liu, Tiancai Wang, Ying Li, Xiangyu Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.11926
  • Pdf link: https://arxiv.org/pdf/2303.11926
  • Abstract
    In this paper, we propose a long-sequence modeling framework, named StreamPETR, for multi-view 3D object detection. Built upon the sparse query design in the PETR series, we systematically develop an object-centric temporal mechanism. The model runs in an online manner, and the long-term historical information is propagated through object queries frame by frame. Besides, we introduce a motion-aware layer normalization to model the movement of the objects. StreamPETR achieves significant performance improvements with only negligible computation cost, compared to the single-frame baseline. On the standard nuScenes benchmark, it reaches a new state-of-the-art performance (63.6% NDS). The lightweight version realizes 45.0% mAP and 31.7 FPS, outperforming the state-of-the-art method (SOLOFusion) by 2.3% mAP while running 1.8x faster. Code will be available at https://github.com/exiawsh/StreamPETR.git.

Keyword: voxel

Smart-Tree: Neural Medial Axis Approximation of Point Clouds for 3D Tree Skeletonization

  • Authors: Harry Dobbs, Oliver Batchelor, Richard Green, James Atlas
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.11560
  • Pdf link: https://arxiv.org/pdf/2303.11560
  • Abstract
    In this paper, we present Smart-Tree, a supervised method for approximating the medial axes of branch skeletons from a tree's point cloud. A sparse voxel convolutional neural network extracts each input point's radius and direction towards the medial axis. A greedy algorithm performs robust skeletonization using the estimated medial axis. The proposed method provides robustness to complex tree structures and improves fidelity when dealing with self-occlusions, complex geometry, touching branches, and varying point densities. We train and test the method using a multi-species synthetic tree data set and perform qualitative analysis on a real-life tree point cloud. Experimentation with synthetic and real-world datasets demonstrates the robustness of our approach over the current state-of-the-art method. Further research will focus on training the method on a broader range of tree species and improving robustness to point cloud gaps. The details to obtain the dataset are at https://github.com/uc-vision/synthetic-trees.

CurveCloudNet: Processing Point Clouds with 1D Structure

  • Authors: Colton Stearns, Jiateng Liu, Davis Rempe, Despoina Paschalidou, Jeong Joon Park, Sebastien Mascha, Leonidas J. Guibas
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.12050
  • Pdf link: https://arxiv.org/pdf/2303.12050
  • Abstract
    Modern depth sensors such as LiDAR operate by sweeping laser-beams across the scene, resulting in a point cloud with notable 1D curve-like structures. In this work, we introduce a new point cloud processing scheme and backbone, called CurveCloudNet, which takes advantage of the curve-like structure inherent to these sensors. While existing backbones discard the rich 1D traversal patterns and rely on Euclidean operations, CurveCloudNet parameterizes the point cloud as a collection of polylines (dubbed a "curve cloud"), establishing a local surface-aware ordering on the points. Our method applies curve-specific operations to process the curve cloud, including a symmetric 1D convolution, a ball grouping for merging points along curves, and an efficient 1D farthest point sampling algorithm on curves. By combining these curve operations with existing point-based operations, CurveCloudNet is an efficient, scalable, and accurate backbone with low GPU memory requirements. Evaluations on the ShapeNet, Kortx, Audi Driving, and nuScenes datasets demonstrate that CurveCloudNet outperforms both point-based and sparse-voxel backbones in various segmentation settings, notably scaling better to large scenes than point-based alternatives while exhibiting better single object performance than sparse-voxel alternatives.
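
An illustrative sketch (not the paper's code) of farthest point sampling restricted to a single polyline: measuring distance along arc length makes each FPS step a cheap 1D operation instead of a pairwise Euclidean search:

```python
import numpy as np

def curve_fps(points, m):
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])     # arc-length coordinate per point
    chosen = [0]
    dist = np.abs(s - s[0])                         # 1D distance to the chosen set
    for _ in range(m - 1):
        nxt = int(np.argmax(dist))                  # farthest remaining point
        chosen.append(nxt)
        dist = np.minimum(dist, np.abs(s - s[nxt]))
    return points[np.array(chosen)]

polyline = np.cumsum(np.random.default_rng(1).normal(size=(200, 3)), axis=0)
print(curve_fps(polyline, 8).shape)                 # (8, 3)
```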

Keyword: lidar

Dual-Weight Particle Filter for Radar-Based Dynamic Bayesian Grid Maps

  • Authors: Max Peter Ronecker, Michael Stolz, Daniel Watzenig
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.11390
  • Pdf link: https://arxiv.org/pdf/2303.11390
  • Abstract
    Through constant improvements in recent years, radar sensors have become a viable alternative to lidar as the main distancing sensor of an autonomous vehicle. Although robust and able to directly measure radial velocity, radar brings its own set of challenges, for which existing algorithms need to be adapted. One core algorithm of a perception system is dynamic occupancy grid mapping, which has traditionally relied on lidar. In this paper we present a dual-weight particle filter as an extension of a Bayesian occupancy grid mapping framework that allows it to operate with radar as its main sensor. It uses two separate particle weights that are computed differently, to compensate for the fact that a radial velocity measurement in many situations cannot capture the actual velocity of an object. We evaluate the method extensively with simulated data and show its advantages over existing single-weight solutions.

Lidar Line Selection with Spatially-Aware Shapley Value for Cost-Efficient Depth Completion

  • Authors: Kamil Adamczewski, Christos Sakaridis, Vaishakh Patil, Luc Van Gool
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.11720
  • Pdf link: https://arxiv.org/pdf/2303.11720
  • Abstract
    Lidar is a vital sensor for estimating the depth of a scene. Typical spinning lidars emit pulses arranged in several horizontal lines, and the monetary cost of the sensor increases with the number of these lines. In this work, we present the new problem of optimizing the positioning of lidar lines to find the most effective configuration for the depth completion task. We propose a solution to reduce the number of lines while retaining the quality of depth completion. Our method consists of two components: (1) line selection based on the marginal contribution of a line, computed via the Shapley value, and (2) the incorporation of line position spread to account for the need for image-wide coverage in depth completion. Spatially-aware Shapley values (SaS) succeed in selecting line subsets that yield a depth accuracy comparable to the full lidar input while using just half of the lines.
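
A simplified Monte-Carlo sketch of Shapley-value line selection (the permutation-sampling estimator and the toy diminishing-returns score are assumptions of this sketch): each line's value is its average marginal contribution to a depth-accuracy score over random line orderings:

```python
import numpy as np

def shapley_values(lines, score, n_perm=200, rng=np.random.default_rng(0)):
    phi = np.zeros(len(lines))
    for _ in range(n_perm):
        order = rng.permutation(len(lines))
        subset, prev = [], score([])
        for i in order:
            subset.append(lines[i])
            cur = score(subset)
            phi[i] += cur - prev                    # marginal contribution of line i
            prev = cur
    return phi / n_perm

# Toy score: depth accuracy improves with diminishing returns in line count.
lines = list(range(16))
print(shapley_values(lines, score=lambda s: np.sqrt(len(s))).round(3))
```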

LoRCoN-LO: Long-term Recurrent Convolutional Network-based LiDAR Odometry

  • Authors: Donghwi Jung, Jae-Kyung Cho, Younghwa Jung, Soohyun Shin, Seong-Woo Kim
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.11853
  • Pdf link: https://arxiv.org/pdf/2303.11853
  • Abstract
    We propose a deep learning-based LiDAR odometry estimation method called LoRCoN-LO that utilizes the long-term recurrent convolutional network (LRCN) structure. The LRCN layer is a structure that can process spatial and temporal information at once by using both CNN and LSTM layers. This feature is suitable for predicting continuous robot movements as it uses point clouds that contain spatial information. Therefore, we built a LoRCoN-LO model using the LRCN layer, and predicted the pose of the robot through this model. For performance verification, we conducted experiments exploiting a public dataset (KITTI). The results of the experiment show that LoRCoN-LO displays accurate odometry prediction in the dataset. The code is available at https://github.com/donghwijung/LoRCoN-LO.
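
A bare-bones LRCN block in PyTorch (channel counts, pooling size, and the 6-DoF pose head are illustrative assumptions, not the authors' architecture): a CNN extracts per-frame features, and an LSTM integrates them over the frame sequence:

```python
import torch
import torch.nn as nn

class LRCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(8), nn.Flatten())
        self.lstm = nn.LSTM(16 * 8 * 8, 128, batch_first=True)
        self.head = nn.Linear(128, 6)               # 6-DoF relative pose

    def forward(self, x):                           # x: (batch, time, C, H, W)
        B, T = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(B, T, -1)   # per-frame CNN features
        h, _ = self.lstm(feats)                     # temporal integration
        return self.head(h[:, -1])                  # pose from the last time step

model = LRCN()
print(model(torch.randn(2, 5, 2, 64, 64)).shape)    # (2, 6)
```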

Penalty-Based Imitation Learning With Cross Semantics Generation Sensor Fusion for Autonomous Driving

  • Authors: Hongkuan Zhou, Aifen Sui, Letian Shi
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.11888
  • Pdf link: https://arxiv.org/pdf/2303.11888
  • Abstract
    With the rapid development of Pattern Recognition and Computer Vision technologies, tasks like object detection and semantic segmentation have achieved even better accuracy than human beings. Based on these solid foundations, autonomous driving has become an important research direction, aiming to revolutionize the future of transportation and mobility. Sensors, which allow a vehicle to perceive its surrounding environment, are critical to the security and feasibility of autonomous driving. Multi-sensor fusion has become a current research hotspot because of its potential for multidimensional perception and integration ability. In this paper, we propose a novel feature-level multi-sensor fusion technology for end-to-end autonomous driving navigation with imitation learning. Our paper mainly focuses on fusion technologies for Lidar and RGB information. We also provide a brand-new penalty-based imitation learning method to reinforce the model's compliance with traffic rules and to unify the objective of imitation learning with the metric of autonomous driving.

Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion

  • Authors: Haisong Liu, Tao Lu, Yihui Xu, Jia Liu, Limin Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.12017
  • Pdf link: https://arxiv.org/pdf/2303.12017
  • Abstract
    In this paper, we study the problem of jointly estimating the optical flow and scene flow from synchronized 2D and 3D data. Previous methods either employ a complex pipeline that splits the joint task into independent stages, or fuse 2D and 3D information in an "early-fusion" or "late-fusion" manner. Such one-size-fits-all approaches suffer from a dilemma of failing to fully utilize the characteristic of each modality or to maximize the inter-modality complementarity. To address the problem, we propose a novel end-to-end framework, which consists of 2D and 3D branches with multiple bidirectional fusion connections between them in specific layers. Different from previous work, we apply a point-based 3D branch to extract the LiDAR features, as it preserves the geometric structure of point clouds. To fuse dense image features and sparse point features, we propose a learnable operator named bidirectional camera-LiDAR fusion module (Bi-CLFM). We instantiate two types of the bidirectional fusion pipeline, one based on the pyramidal coarse-to-fine architecture (dubbed CamLiPWC), and the other one based on the recurrent all-pairs field transforms (dubbed CamLiRAFT). On FlyingThings3D, both CamLiPWC and CamLiRAFT surpass all existing methods and achieve up to a 47.9% reduction in 3D end-point-error from the best published result. Our best-performing model, CamLiRAFT, achieves an error of 4.26% on the KITTI Scene Flow benchmark, ranking 1st among all submissions with much fewer parameters. Besides, our methods have strong generalization performance and the ability to handle non-rigid motion. Code is available at https://github.com/MCG-NJU/CamLiFlow.

CurveCloudNet: Processing Point Clouds with 1D Structure

  • Authors: Colton Stearns, Jiateng Liu, Davis Rempe, Despoina Paschalidou, Jeong Joon Park, Sebastien Mascha, Leonidas J. Guibas
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.12050
  • Pdf link: https://arxiv.org/pdf/2303.12050
  • Abstract
    Modern depth sensors such as LiDAR operate by sweeping laser-beams across the scene, resulting in a point cloud with notable 1D curve-like structures. In this work, we introduce a new point cloud processing scheme and backbone, called CurveCloudNet, which takes advantage of the curve-like structure inherent to these sensors. While existing backbones discard the rich 1D traversal patterns and rely on Euclidean operations, CurveCloudNet parameterizes the point cloud as a collection of polylines (dubbed a "curve cloud"), establishing a local surface-aware ordering on the points. Our method applies curve-specific operations to process the curve cloud, including a symmetric 1D convolution, a ball grouping for merging points along curves, and an efficient 1D farthest point sampling algorithm on curves. By combining these curve operations with existing point-based operations, CurveCloudNet is an efficient, scalable, and accurate backbone with low GPU memory requirements. Evaluations on the ShapeNet, Kortx, Audi Driving, and nuScenes datasets demonstrate that CurveCloudNet outperforms both point-based and sparse-voxel backbones in various segmentation settings, notably scaling better to large scenes than point-based alternatives while exhibiting better single object performance than sparse-voxel alternatives.

New submissions for Thu, 6 Apr 23

Keyword: efficient

A Compositional Resilience Index for Computationally Efficient Safety Analysis of Interconnected Systems

  • Authors: Luyao Niu, Abdullah Al Maruf, Andrew Clark, J. Sukarno Mertoguno, Radha Poovendran
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02058
  • Pdf link: https://arxiv.org/pdf/2304.02058
  • Abstract
    Interconnected systems such as power systems and chemical processes are often required to satisfy safety properties in the presence of faults and attacks. Verifying safety of these systems, however, is computationally challenging due to nonlinear dynamics, high dimensionality, and combinatorial number of possible faults and attacks that can be incurred by the subsystems interconnected within the network. In this paper, we develop a compositional resilience index to verify safety properties of interconnected systems under faults and attacks. The resilience index is a tuple serving the following two purposes. First, it quantifies how a safety property is impacted when a subsystem is compromised by faults and attacks. Second, the resilience index characterizes the needed behavior of a subsystem during normal operations to ensure safety violations will not occur when future adverse events occur. We develop a set of sufficient conditions on the dynamics of each subsystem to satisfy its safety constraint, and leverage these conditions to formulate an optimization program to compute the resilience index. When multiple subsystems are interconnected and their resilience indices are given, we show that the safety constraints of the interconnected system can be efficiently verified by solving a system of linear inequalities. We demonstrate our developed resilience index using a numerical case study on chemical reactors connected in series.

GUTS: Generalized Uncertainty-Aware Thompson Sampling for Multi-Agent Active Search

  • Authors: Nikhil Angad Bakshi, Tejus Gupta, Ramina Ghods, Jeff Schneider
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.02075
  • Pdf link: https://arxiv.org/pdf/2304.02075
  • Abstract
    Robotic solutions for quick disaster response are essential to ensure minimal loss of life, especially when the search area is too dangerous or too vast for human rescuers. We model this problem as an asynchronous multi-agent active-search task where each robot aims to efficiently seek objects of interest (OOIs) in an unknown environment. This formulation addresses the requirement that search missions should focus on quick recovery of OOIs rather than full coverage of the search region. Previous approaches fail to accurately model sensing uncertainty, account for occlusions due to foliage or terrain, or consider the requirement for heterogeneous search teams and robustness to hardware and communication failures. We present the Generalized Uncertainty-aware Thompson Sampling (GUTS) algorithm, which addresses these issues and is suitable for deployment on heterogeneous multi-robot systems for active search in large unstructured environments. We show through simulation experiments that GUTS consistently outperforms existing methods such as parallelized Thompson Sampling and exhaustive search, recovering all OOIs in 80% of all runs. In contrast, existing approaches recover all OOIs in less than 40% of all runs. We conduct field tests using our multi-robot system in an unstructured environment with a search area of approximately 75,000 sq. m. Our system demonstrates robustness to various failure modes, achieving full recovery of OOIs (where feasible) in every field run, and significantly outperforming our baseline.
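
A heavily simplified Thompson-sampling loop for active search (the grid world, Bernoulli sensing model, and all parameters are assumptions of this sketch; GUTS additionally handles sensing uncertainty, occlusion, heterogeneous teams, and asynchronous agents):

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 25
alpha, beta = np.ones(n_cells), np.ones(n_cells)    # Beta posterior per grid cell
truth = rng.random(n_cells) < 0.15                  # hidden objects of interest

for step in range(100):
    theta = rng.beta(alpha, beta)                   # sample a belief per cell
    cell = int(np.argmax(theta))                    # visit the most promising cell
    detected = bool(truth[cell]) and rng.random() < 0.8   # noisy detector
    alpha[cell] += detected                         # posterior update
    beta[cell] += not detected
    if detected:
        truth[cell] = False                         # OOI recovered and removed
```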

MadEye: Boosting Live Video Analytics Accuracy with Adaptive Camera Configurations

  • Authors: Mike Wong, Murali Ramanujam, Guha Balakrishnan, Ravi Netravali
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.02101
  • Pdf link: https://arxiv.org/pdf/2304.02101
  • Abstract
    Camera orientations (i.e., rotation and zoom) govern the content that a camera captures in a given scene, which in turn heavily influences the accuracy of live video analytics pipelines. However, existing analytics approaches leave this crucial adaptation knob untouched, instead opting to only alter the way that captured images from fixed orientations are encoded, streamed, and analyzed. We present MadEye, a camera-server system that automatically and continually adapts orientations to maximize accuracy for the workload and resource constraints at hand. To realize this using commodity pan-tilt-zoom (PTZ) cameras, MadEye embeds (1) a search algorithm that rapidly explores the massive space of orientations to identify a fruitful subset at each time, and (2) a novel knowledge distillation strategy to efficiently (with only camera resources) select the ones that maximize workload accuracy. Experiments on diverse workloads show that MadEye boosts accuracy by 2.9-25.7% for the same resource usage, or achieves the same accuracy with 2-3.7x lower resource costs.

DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation

  • Authors: Peiyao Wang, Haibin Ling
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02110
  • Pdf link: https://arxiv.org/pdf/2304.02110
  • Abstract
    Fully supervised action segmentation works on frame-wise action recognition with dense annotations and often suffers from the over-segmentation issue. Existing works have proposed a variety of solutions such as boundary-aware networks, multi-stage refinement, and temporal smoothness losses. However, most of them take advantage of frame-wise supervision, which cannot effectively tackle the evaluation metrics with different granularities. In this paper, for the desirable large receptive field, we first develop a novel local-global attention mechanism with temporal pyramid dilation and temporal pyramid pooling for efficient multi-scale attention. Then we decouple two inherent goals in action segmentation, i.e., (1) individual identification solved by frame-wise supervision, and (2) temporal reasoning tackled by action set prediction. Afterward, an action alignment module fuses these different granularity predictions, leading to more accurate and smoother action segmentation. We achieve state-of-the-art accuracy, e.g., 82.8% (+2.6%) on GTEA and 74.7% (+1.2%) on Breakfast, which demonstrates the effectiveness of our proposed method, accompanied by extensive ablation studies. The code will be made available later.

Initialization Approach for Nonlinear State-Space Identification via the Subspace Encoder Approach

  • Authors: Rishi Ramkannan, Gerben I. Beintema, Roland Tóth, Maarten Schoukens
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02119
  • Pdf link: https://arxiv.org/pdf/2304.02119
  • Abstract
    The SUBNET neural network architecture has been developed to identify nonlinear state-space models from input-output data. To achieve this, it combines the rolled-out nonlinear state-space equations and a state encoder function, both parameterised as a neural network. The encoder function is introduced to reconstruct the current state from past input-output data. Hence it enables the forward simulation of the rolled-out state-space model. While this approach has been shown to provide high-accuracy and consistent model estimation, its convergence can be significantly improved by efficient initialization of the training process. This paper focuses on such an initialisation of the subspace encoder approach using the Best Linear Approximation (BLA). Using the BLA-provided state-space matrices and the associated reconstructability map, both the state-transition part of the network and the encoder are initialized. The performance of the improved initialisation scheme is evaluated on a Wiener-Hammerstein simulation example and a benchmark dataset. The results show that for a weakly nonlinear system, the proposed initialisation based on the linear reconstructability map results in faster convergence and better model quality.

The Bit Complexity of Efficient Continuous Optimization

  • Authors: Mehrdad Ghadiri, Richard Peng, Santosh S. Vempala
  • Subjects: Data Structures and Algorithms (cs.DS); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.02124
  • Pdf link: https://arxiv.org/pdf/2304.02124
  • Abstract
    We analyze the bit complexity of efficient algorithms for fundamental optimization problems, such as linear regression, $p$-norm regression, and linear programming (LP). State-of-the-art algorithms are iterative, and in terms of the number of arithmetic operations, they match the current time complexity of multiplying two $n$-by-$n$ matrices (up to polylogarithmic factors). However, previous work has typically assumed infinite precision arithmetic, and due to complicated inverse maintenance techniques, the actual running times of these algorithms are unknown. To settle the running time and bit complexity of these algorithms, we demonstrate that a core common subroutine, known as \emph{inverse maintenance}, is backward-stable. Additionally, we show that iterative approaches for solving constrained weighted regression problems can be accomplished with bounded-error pre-conditioners. Specifically, we prove that linear programs can be solved approximately in matrix multiplication time multiplied by polylog factors that depend on the condition number $\kappa$ of the matrix and the inner and outer radius of the LP problem. $p$-norm regression can be solved approximately in matrix multiplication time multiplied by polylog factors in $\kappa$. Lastly, linear regression can be solved approximately in input-sparsity time multiplied by polylog factors in $\kappa$. Furthermore, we present results for achieving lower than matrix multiplication time for $p$-norm regression by utilizing faster solvers for sparse linear systems.

Sequential Linearithmic Time Optimal Unimodal Fitting When Minimizing Univariate Linear Losses

  • Authors: Kaan Gokcesu, Hakan Gokcesu
  • Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.02141
  • Pdf link: https://arxiv.org/pdf/2304.02141
  • Abstract
    This paper focuses on optimal unimodal transformation of the score outputs of a univariate learning model under linear loss functions. We demonstrate that the optimal mapping between score values and the target region is a rectangular function. To produce this optimal rectangular fit for the observed samples, we propose a sequential approach that can update its estimate with each incoming new sample. Our approach has logarithmic time complexity per iteration and is optimally efficient.

Dynamic Adversarial Resource Allocation: the dDAB Game

  • Authors: Daigo Shishika, Yue Guan, Jason R. Marden, Michael Dorothy, Panagiotis Tsiotras, Vijay Kumar
  • Subjects: Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2304.02172
  • Pdf link: https://arxiv.org/pdf/2304.02172
  • Abstract
    This work proposes a dynamic and adversarial resource allocation problem in a graph environment, which is referred to as the dynamic Defender-Attacker Blotto (dDAB) game. A team of defender robots is tasked to ensure numerical advantage at every node in the graph against a team of attacker robots. The engagement is formulated as a discrete-time dynamic game, where the two teams reallocate their robots in sequence and each robot can move at most one hop at each time step. The game terminates with the attacker's victory if any node has more attacker robots than defender robots. Our goal is to identify the necessary and sufficient number of defender robots to guarantee defense. Through a reachability analysis, we first solve the problem for the case where the attacker team stays as a single group. The results are then generalized to the case where the attacker team can freely split and merge into subteams. Crucially, our analysis indicates that there is no incentive for the attacker team to split, which significantly reduces the search space for the attacker's winning strategies and also enables us to design defender counter-strategies using superposition. We also present an efficient numerical algorithm to identify the necessary and sufficient number of defender robots to defend a given graph. Finally, we present illustrative examples to verify the efficacy of the proposed framework.

Explainable Automated Debugging via Large Language Model-driven Scientific Debugging

  • Authors: Sungmin Kang, Bei Chen, Shin Yoo, Jian-Guang Lou
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.02195
  • Pdf link: https://arxiv.org/pdf/2304.02195
  • Abstract
    Automated debugging techniques have the potential to reduce developer effort in debugging, and have matured enough to be adopted by industry. However, one critical issue with existing techniques is that, while developers want rationales for the provided automatic debugging results, existing techniques are ill-suited to provide them, as their deduction process differs significantly from that of human developers. Inspired by the way developers interact with code when debugging, we propose Automated Scientific Debugging (AutoSD), a technique that given buggy code and a bug-revealing test, prompts large language models to automatically generate hypotheses, uses debuggers to actively interact with buggy code, and thus automatically reach conclusions prior to patch generation. By aligning the reasoning of automated debugging more closely with that of human developers, we aim to produce intelligible explanations of how a specific patch has been generated, with the hope that the explanation will lead to more efficient and accurate developer decisions. Our empirical analysis on three program repair benchmarks shows that AutoSD performs competitively with other program repair baselines, and that it can indicate when it is confident in its results. Furthermore, we perform a human study with 20 participants, including six professional developers, to evaluate the utility of explanations from AutoSD. Participants with access to explanations could judge patch correctness in roughly the same time as those without, but their accuracy improved for five out of six real-world bugs studied: 70% of participants answered that they wanted explanations when using repair tools, while 55% answered that they were satisfied with the Scientific Debugging presentation.

PIKS: A Technique to Identify Actionable Trends for Policy-Makers Through Open Healthcare Data

  • Authors: A. Ravishankar Rao, Subrata Garai, Soumyabrata Dey, Hang Peng
  • Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.02208
  • Pdf link: https://arxiv.org/pdf/2304.02208
  • Abstract
    With calls for increasing transparency, governments are releasing greater amounts of data in multiple domains including finance, education and healthcare. The efficient exploratory analysis of healthcare data constitutes a significant challenge. Key concerns in public health include the quick identification and analysis of trends, and the detection of outliers. This allows policies to be rapidly adapted to changing circumstances. We present an efficient outlier detection technique, termed PIKS (Pruned iterative-k means searchlight), which combines an iterative k-means algorithm with a pruned searchlight-based scan. We apply this technique to identify outliers in two publicly available healthcare datasets from the New York Statewide Planning and Research Cooperative System, and California's Office of Statewide Health Planning and Development. We provide a comparison of our technique with three other existing outlier detection techniques, consisting of auto-encoders, isolation forests and feature bagging. We identified outliers in conditions including suicide rates, immunity disorders, social admissions, cardiomyopathies, and pregnancy in the third trimester. We demonstrate that the PIKS technique produces results consistent with other techniques such as the auto-encoder. However, the auto-encoder needs to be trained, which requires several parameters to be tuned. In comparison, the PIKS technique has far fewer parameters to tune. This makes it advantageous for fast, "out-of-the-box" data exploration. The PIKS technique is scalable and can readily ingest new datasets. Hence, it can provide valuable, up-to-date insights to citizens, patients and policy-makers. We have made our code open source, and with the availability of open data, other researchers can easily reproduce and extend our work. This will help promote a deeper understanding of healthcare policies and public health issues.
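
A toy sketch of the iterative k-means component of PIKS (the data, cluster count, and z-score threshold are assumptions of this sketch; the searchlight scan is omitted): cluster, flag points far from their centroid as outliers, remove them, and re-cluster until the flagged set stabilizes:

```python
import numpy as np
from sklearn.cluster import KMeans

def iterative_kmeans_outliers(X, k=3, z=3.0, max_iter=5):
    mask = np.ones(len(X), dtype=bool)              # True while considered an inlier
    for _ in range(max_iter):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X[mask])
        d = np.linalg.norm(X[mask] - km.cluster_centers_[km.labels_], axis=1)
        flagged = d > d.mean() + z * d.std()        # far from own centroid
        if not flagged.any():
            break
        idx = np.flatnonzero(mask)
        mask[idx[flagged]] = False                  # drop outliers and re-cluster
    return ~mask                                    # True where outlier

rng = np.random.default_rng(0)
blobs = [rng.normal(loc=c, size=(100, 2)) for c in ([0, 0], [6, 0], [0, 6])]
X = np.vstack(blobs + [np.array([[9.5, 9.5], [-6.0, 8.0]])])   # two injected outliers
print(np.flatnonzero(iterative_kmeans_outliers(X)))
```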

METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens

  • Authors: Zhanyu Wang, Lingqiao Liu, Lei Wang, Luping Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02211
  • Pdf link: https://arxiv.org/pdf/2304.02211
  • Abstract
    In clinical scenarios, multi-specialist consultation can significantly benefit the diagnosis, especially for intricate cases. This inspires us to explore a "multi-expert joint diagnosis" mechanism to upgrade the existing "single expert" framework commonly seen in the current literature. To this end, we propose METransformer, a method to realize this idea with a transformer-based backbone. The key design of our method is the introduction of multiple learnable "expert" tokens into both the transformer encoder and decoder. In the encoder, each expert token interacts with both vision tokens and other expert tokens to learn to attend to different image regions for image representation. These expert tokens are encouraged to capture complementary information by an orthogonal loss that minimizes their overlap. In the decoder, each attended expert token guides the cross-attention between input words and visual tokens, thus influencing the generated report. A metrics-based expert voting strategy is further developed to generate the final report. Through the multi-expert concept, our model enjoys the merits of an ensemble-based approach, but in a manner that is computationally more efficient and supports more sophisticated interactions among experts. Experimental results demonstrate the promising performance of our proposed model on two widely used benchmarks. Last but not least, the framework-level innovation makes our work ready to incorporate advances from existing "single-expert" models to further improve its performance.

BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation

  • Authors: Junheum Park, Jintae Kim, Chang-Su Kim
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02225
  • Pdf link: https://arxiv.org/pdf/2304.02225
  • Abstract
    A novel 4K video frame interpolator based on bilateral transformer (BiFormer) is proposed in this paper, which performs three steps: global motion estimation, local motion refinement, and frame synthesis. First, in global motion estimation, we predict symmetric bilateral motion fields at a coarse scale. To this end, we propose BiFormer, the first transformer-based bilateral motion estimator. Second, we refine the global motion fields efficiently using blockwise bilateral cost volumes (BBCVs). Third, we warp the input frames using the refined motion fields and blend them to synthesize an intermediate frame. Extensive experiments demonstrate that the proposed BiFormer algorithm achieves excellent interpolation performance on 4K datasets. The source codes are available at https://github.com/JunHeum/BiFormer.

Towards Efficient Task-Driven Model Reprogramming with Foundation Models

  • Authors: Shoukai Xu, Jiangchao Yao, Ran Luo, Shuhai Zhang, Zihao Lian, Mingkui Tan, Yaowei Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02263
  • Pdf link: https://arxiv.org/pdf/2304.02263
  • Abstract
    Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data. However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations. Moreover, the data used for pretraining foundation models are usually invisible and very different from the target data of downstream tasks. This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to a downstream model that has a quite different architecture, using only downstream target data. Existing transfer learning or knowledge distillation methods depend on either the same model structure or finetuning of the foundation model. Thus, naively introducing these methods can be either infeasible or very inefficient. To address this, we propose a Task-Driven Model Reprogramming (TDMR) framework. Specifically, we reprogram the foundation model to project the knowledge into a proxy space, which alleviates the adverse effect of task mismatch and domain inconsistency. Then, we reprogram the target model via progressive distillation from the proxy space to efficiently learn the knowledge from the reprogrammed foundation model. TDMR is compatible with different pre-trained model types (CNN, transformer or their mix) and limited target data, and promotes the wide application of vision foundation models to downstream tasks in a cost-effective manner. Extensive experiments on different downstream classification tasks and target model structures demonstrate the effectiveness of our methods with both CNNs and transformer foundation models.

About optimal loss function for training physics-informed neural networks under respecting causality

  • Authors: Vasiliy A. Es'kin, Danil V. Davydov, Ekaterina D. Egorova, Alexey O. Malkhanov, Mikhail A. Akhukov, Mikhail E. Smorkalov
  • Subjects: Numerical Analysis (math.NA); Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph)
  • Arxiv link: https://arxiv.org/abs/2304.02282
  • Pdf link: https://arxiv.org/pdf/2304.02282
  • Abstract
    A method is presented that allows reducing a problem described by differential equations with initial and boundary conditions to a problem described only by differential equations. The advantage of using the modified problem in the physics-informed neural networks (PINNs) methodology is that the loss function can be represented as a single term associated with the differential equations, thus eliminating the need to tune the scaling coefficients for the terms related to boundary and initial conditions. Weighted loss functions respecting causality are modified, and new weighted loss functions based on generalized functions are derived. Numerical experiments have been carried out for a number of problems, demonstrating the accuracy of the proposed methods.

Deep Quantigraphic Image Enhancement via Comparametric Equations

  • Authors: Xiaomeng Wu, Yongqing Sun, Akisato Kimura
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02285
  • Pdf link: https://arxiv.org/pdf/2304.02285
  • Abstract
    Most recent methods of deep image enhancement can be generally classified into two types: decompose-and-enhance and illumination estimation-centric. The former is usually less efficient, and the latter is constrained by a strong assumption regarding image reflectance as the desired enhancement result. To alleviate this constraint while retaining high efficiency, we propose a novel trainable module that diversifies the conversion from the low-light image and illumination map to the enhanced image. It formulates image enhancement as a comparametric equation parameterized by a camera response function and an exposure compensation ratio. By incorporating this module in an illumination estimation-centric DNN, our method improves the flexibility of deep image enhancement, limits the computational burden to illumination estimation, and allows for fully unsupervised learning adaptable to the diverse demands of different tasks.

A step towards the applicability of algorithms based on invariant causal learning on observational data

  • Authors: Borja Guerrero Santillan
  • Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
  • Arxiv link: https://arxiv.org/abs/2304.02286
  • Pdf link: https://arxiv.org/pdf/2304.02286
  • Abstract
    Machine learning can benefit from causal discovery for interpretation and from causal inference for generalization. In this line of research, a few invariant learning algorithms for out-of-distribution (OOD) generalization have been proposed by using multiple training environments to find invariant relationships. Some of them focus on causal discovery, such as Invariant Causal Prediction (ICP), which finds the causal parents of a variable of interest, and some directly provide a causally optimal predictor that generalizes well in OOD environments, such as Invariant Risk Minimization (IRM). This group of algorithms works under the assumption of multiple environments that represent different interventions in the causal inference context. Those environments are not normally available when working with observational data and real-world applications. Here we propose a method to generate them in an efficient way. We assess the performance of this unsupervised learning problem by implementing ICP on simulated data. We also show how to apply ICP efficiently integrated with our method for causal discovery. Finally, we propose an improved version of our method in combination with ICP for datasets with multiple covariates where ICP and other causal discovery methods normally degrade in performance.

Efficient Deduplication and Leakage Detection in Large Scale Image Datasets with a focus on the CrowdAI Mapping Challenge Dataset

  • Authors: Yeshwanth Kumar Adimoolam, Bodhiswatta Chatterjee, Charalambos Poullis, Melinos Averkiou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02296
  • Pdf link: https://arxiv.org/pdf/2304.02296
  • Abstract
    Recent advancements in deep learning and computer vision have led to widespread use of deep neural networks to extract building footprints from remote-sensing imagery. The success of such methods relies on the availability of large databases of high-resolution remote sensing images with high-quality annotations. The CrowdAI Mapping Challenge Dataset is one of these datasets that has been used extensively in recent years to train deep neural networks. This dataset consists of ~280k training images and ~60k testing images, with polygonal building annotations for all images. However, issues such as low-quality and incorrect annotations, extensive duplication of image samples, and data leakage significantly reduce the utility of deep neural networks trained on the dataset. Therefore, it is an imperative pre-condition to adopt a data validation pipeline that evaluates the quality of the dataset prior to its use. To this end, we propose a drop-in pipeline that employs perceptual hashing techniques for efficient de-duplication of the dataset and identification of instances of data leakage between training and testing splits. In our experiments, we demonstrate that nearly 250k (~90%) images in the training split were identical. Moreover, our analysis on the validation split demonstrates that roughly 56k of the 60k images also appear in the training split, resulting in a data leakage of 93%. The source code used for the analysis and de-duplication of the CrowdAI Mapping Challenge dataset is publicly available at https://github.com/yeshwanth95/CrowdAI_Hash_and_search .
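
A minimal sketch of perceptual-hash de-duplication in the spirit described above; the imagehash library, the pHash variant, and the Hamming threshold are assumptions of this sketch, not necessarily what the released code uses:

```python
from PIL import Image
import imagehash

def dedup(paths, max_hamming=4):
    seen, unique, dupes = {}, [], []
    for p in paths:
        h = imagehash.phash(Image.open(p))          # 64-bit perceptual hash
        match = next((q for q, hq in seen.items() if h - hq <= max_hamming), None)
        if match is None:
            seen[p] = h
            unique.append(p)
        else:
            dupes.append((p, match))                # near-duplicate pair
    return unique, dupes
```

Leakage between splits can then be flagged the same way: build the hash table from the training split and check each validation image's hash against it.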

FASTAGEDS: Fast Approximate Graph Entity Dependency Discovery

  • Authors: Guangtong Zhou, Selasi Kwashie, Yidi Zhang, Michael Bewong, Vincent M. Nofong, Debo Cheng, Keqing He, Zaiwen Feng
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.02323
  • Pdf link: https://arxiv.org/pdf/2304.02323
  • Abstract
    This paper studies the discovery of approximate rules in property graphs. We propose a semantically meaningful measure of error for mining graph entity dependencies (GEDs) that almost hold, in order to tolerate errors and inconsistencies that exist in real-world graphs. We present a new characterisation of GED satisfaction, and devise a depth-first search strategy to traverse the search space of candidate rules efficiently. Further, we perform experiments to demonstrate the feasibility and scalability of our solution, FASTAGEDS, with three real-world graphs.

Direction splitting of $\varphi$-functions in exponential integrators for $d$-dimensional problems in Kronecker form

  • Authors: Marco Caliari, Fabio Cassini
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.02327
  • Pdf link: https://arxiv.org/pdf/2304.02327
  • Abstract
    In this manuscript, we propose an efficient, practical and easy-to-implement way to approximate actions of $\varphi$-functions for matrices with $d$-dimensional Kronecker sum structure in the context of exponential integrators up to second order. The method is based on a direction splitting of the involved matrix functions, which lets us exploit the highly efficient level 3 BLAS for the actual computation of the required actions in a $\mu$-mode fashion. The approach has been successfully tested on two- and three-dimensional problems with various exponential integrators, resulting in a consistent speedup with respect to a technique designed to compute actions of $\varphi$-functions for Kronecker sums.

SMPConv: Self-moving Point Representations for Continuous Convolution

  • Authors: Sanghyeon Kim, Eunbyung Park
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02330
  • Pdf link: https://arxiv.org/pdf/2304.02330
  • Abstract
    Continuous convolution has recently gained prominence due to its ability to handle irregularly sampled data and model long-term dependency. Also, the promising experimental results of using large convolutional kernels have catalyzed the development of continuous convolution, since continuous convolutions can construct large kernels very efficiently. Leveraging neural networks, more specifically multilayer perceptrons (MLPs), is by far the most prevalent approach to implementing continuous convolution. However, there are a few drawbacks, such as high computational costs, complex hyperparameter tuning, and limited descriptive power of filters. This paper suggests an alternative approach to building a continuous convolution without neural networks, resulting in more computationally efficient and improved performance. We present self-moving point representations where weight parameters freely move, and interpolation schemes are used to implement continuous functions. When applied to construct convolutional kernels, the experimental results have shown improved performance with drop-in replacement in the existing frameworks. Due to its lightweight structure, we are the first to demonstrate the effectiveness of continuous convolution in a large-scale setting, e.g., ImageNet, presenting improvements over prior art. Our code is available on https://github.com/sangnekim/SMPConv
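
A rough reading of the self-moving point idea (the hat-function interpolation and all sizes are assumptions of this sketch): learnable (position, weight) pairs are interpolated onto the sampling grid to produce a continuous kernel without any MLP:

```python
import torch

class PointKernel1d(torch.nn.Module):
    def __init__(self, n_points=8, support=1.0):
        super().__init__()
        self.pos = torch.nn.Parameter(torch.linspace(-support, support, n_points))
        self.val = torch.nn.Parameter(torch.randn(n_points) * 0.1)
        self.width = torch.nn.Parameter(torch.tensor(0.3))

    def forward(self, coords):                      # coords: sampling grid in [-1, 1]
        # Hat-function interpolation from each self-moving point; both the point
        # positions and their values receive gradients, so the points can move.
        w = torch.clamp(1 - (coords[:, None] - self.pos).abs() / self.width, min=0)
        return w @ self.val                         # kernel sampled on the grid

k = PointKernel1d()
kernel = k(torch.linspace(-1, 1, 33))               # usable as F.conv1d weights
print(kernel.shape)                                 # torch.Size([33])
```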

Efficient Optimization-based Cable Force Allocation for Geometric Control of Multiple Quadrotors Transporting a Payload

  • Authors: Khaled Wahba, Wolfgang Hönig
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02359
  • Pdf link: https://arxiv.org/pdf/2304.02359
  • Abstract
    We consider transporting a heavy payload that is attached to multiple quadrotors. The current state-of-the-art controllers either do not avoid inter-robot collisions at all, leading to crashes when tasked with carrying payloads that are small in size compared to the cable lengths, or use computationally demanding nonlinear optimization. We propose an extension to an existing efficient geometric payload transport controller that effectively avoids such collisions by means of an optimized cable force allocation method, thereby retaining the original stability properties. Our approach introduces a cascade of carefully designed quadratic programs that can be solved efficiently on highly constrained embedded flight controllers. We demonstrate our method on challenging scenarios with up to three small quadrotors with various payloads and cable lengths, with our controller running in real-time directly on the robots.

Robust Performance Analysis for Time-Varying Multi-Agent Systems with Stochastic Packet Loss

  • Authors: Christian Hespe, Herbert Werner
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02393
  • Pdf link: https://arxiv.org/pdf/2304.02393
  • Abstract
    Recently, a scalable approach to system analysis and controller synthesis for homogeneous multi-agent systems with Bernoulli distributed packet loss has been proposed. As a key result of that line of work, it was shown how to obtain upper bounds on the $H_2$-norm that are robust with respect to uncertain interconnection topologies. The main contribution of the current paper is to show that the same upper bounds hold not only for uncertain but also time-varying topologies that are superimposed with the stochastic packet loss. Because the results are formulated in terms of linear matrix inequalities that are independent of the number of agents, multi-agent systems of any size can be analysed efficiently. The applicability of the approach is demonstrated on a numerical first-order consensus example, on which the obtained upper bounds are compared to estimates from Monte-Carlo simulations.

Relative Entropy-Based Waveform Optimization for Rician Target Detection with Dual-Function Radar Communication Systems

  • Authors: Xuyang Wang, Bo Tang, Wenjun Wu, Da Li
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.02409
  • Pdf link: https://arxiv.org/pdf/2304.02409
  • Abstract
    In this paper, we consider waveform design for dual-function radar-communication systems based on multiple-input multiple-output (MIMO) arrays. To achieve better Rician target detection performance, we use the relative entropy associated with the formulated detection problem as the design metric. We also impose a multiuser interference energy constraint on the waveforms to ensure the achievable sum-rate of the communications. Two algorithms are presented to tackle the nonlinear non-convex waveform design problem. In the first algorithm, we derive a quadratic function to minorize the objective function. To tackle the quadratically constrained quadratic programming problem at each iteration, a semidefinite relaxation approach followed by a rank-one decomposition procedure and an efficient alternating direction method of multipliers (ADMM) are proposed, respectively. In the second algorithm, we present a novel ADMM algorithm to tackle the optimization problem and employ an efficient minorization-maximization approach in the inner loop of the ADMM algorithm. Numerical results demonstrate the superiority of both algorithms. Moreover, the presented algorithms can be extended to synthesize peak-to-average-power ratio constrained waveforms, which allows the radio frequency amplifier to operate at an increased efficiency.

Payload Grasping and Transportation by a Quadrotor with a Hook-Based Manipulator

  • Authors: Péter Antal, Tamás Péni, Roland Tóth
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02444
  • Pdf link: https://arxiv.org/pdf/2304.02444
  • Abstract
    The paper proposes an efficient trajectory planning and control approach for payload grasping and transportation using an aerial manipulator. The proposed manipulator structure consists of a hook attached to a quadrotor using a 1 DoF revolute joint. To perform payload grasping, transportation, and release, first, time-optimal reference trajectories are designed through specific waypoints to ensure the fast and reliable execution of the tasks. Then, a two-stage motion control approach is developed based on a robust geometric controller for precise and reliable reference tracking and a linear-quadratic payload regulator for rapid setpoint stabilization of the payload swing. The proposed control architecture and design are evaluated in a high-fidelity physical simulator with external disturbances and also in real flight experiments.

Doubly Stochastic Matrix Models for Estimation of Distribution Algorithms

  • Authors: Valentino Santucci, Josu Ceberio
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.02458
  • Pdf link: https://arxiv.org/pdf/2304.02458
  • Abstract
    Problems with solutions represented by permutations are very prominent in combinatorial optimization. Thus, in recent decades, a number of evolutionary algorithms have been proposed to solve them, and among them, those based on probability models have received much attention. In that sense, most efforts have focused on introducing algorithms that are suited for solving problems of an ordering/ranking nature. However, when it comes to proposing probability-based evolutionary algorithms for assignment problems, the works have not gone beyond proposing simple and in most cases univariate models. In this paper, we explore the use of Doubly Stochastic Matrices (DSM) for optimizing permutation problems of a matching/assignment nature. To that end, we explore some learning and sampling methods to efficiently incorporate DSMs within the picture of evolutionary algorithms. Specifically, we adopt the framework of estimation of distribution algorithms and compare DSMs to some existing proposals for permutation problems. Preliminary experiments conducted on instances of the quadratic assignment problem validate this line of research and show that DSMs may obtain very competitive results, while computational cost issues still need to be further investigated.
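
Two primitives a DSM-based EDA plausibly needs, sketched under stated assumptions (the sequential sampling heuristic in particular is a simplification): Sinkhorn-Knopp projection of a nonnegative matrix onto the doubly stochastic set, and sampling a permutation from the resulting matrix:

```python
import numpy as np

def sinkhorn(M, iters=100):
    M = M.copy()
    for _ in range(iters):                          # alternate row/column normalization
        M /= M.sum(axis=1, keepdims=True)
        M /= M.sum(axis=0, keepdims=True)
    return M

def sample_permutation(P, rng):
    n = P.shape[0]
    perm, free = np.empty(n, dtype=int), list(range(n))
    for i in range(n):                              # pick an item for position i
        probs = P[i, free] / P[i, free].sum()
        j = rng.choice(len(free), p=probs)
        perm[i] = free.pop(j)
    return perm

rng = np.random.default_rng(0)
P = sinkhorn(rng.random((5, 5)) + 1e-3)             # rows and columns sum to ~1
print(sample_permutation(P, rng))
```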

Rediscovering Hashed Random Projections for Efficient Quantization of Contextualized Sentence Embeddings

  • Authors: Ulf A. Hamster, Ji-Ung Lee, Alexander Geyken, Iryna Gurevych
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.02481
  • Pdf link: https://arxiv.org/pdf/2304.02481
  • Abstract
    Training and inference on edge devices often require an efficient setup due to computational limitations. While pre-computing data representations and caching them on a server can mitigate extensive edge device computation, this leads to two challenges. First, the amount of storage required on the server scales linearly with the number of instances. Second, sending extensively large amounts of data to an edge device requires substantial bandwidth. To reduce the memory footprint of pre-computed data representations, we propose a simple, yet effective approach that uses randomly initialized hyperplane projections. To further reduce their size by up to 98.96%, we quantize the resulting floating-point representations into binary vectors. Despite the greatly reduced size, we show that the embeddings remain effective for training models across various English and German sentence classification tasks that retain 94%-99% of their floating-point performance.
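
The core idea compresses to a few lines: multiply embeddings by a random, data-independent hyperplane matrix and keep only the signs. Dimensions and bit-packing below are illustrative assumptions.

```python
import numpy as np

# Hashed random projection followed by sign binarization.
rng = np.random.default_rng(42)
emb = rng.standard_normal((1000, 768)).astype(np.float32)  # cached float embeddings
W = rng.standard_normal((768, 1024)).astype(np.float32)    # random hyperplanes

bits = (emb @ W) > 0                      # project, then binarize by sign
packed = np.packbits(bits, axis=1)        # 1024 bits -> 128 bytes per sentence
print(packed.shape, packed.nbytes / emb.nbytes)  # ~96% size reduction here
```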

Opening the random forest black box by the analysis of the mutual impact of features

  • Authors: Lucas F. Voges, Lukas C. Jarren, Stephan Seifert
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02490
  • Pdf link: https://arxiv.org/pdf/2304.02490
  • Abstract
    Random forest is a popular machine learning approach for the analysis of high-dimensional data because it is flexible and provides variable importance measures for the selection of relevant features. However, the complex relationships between the features are usually not considered for the selection and thus also neglected for the characterization of the analysed samples. Here we propose two novel approaches that focus on the mutual impact of features in random forests. Mutual forest impact (MFI) is a relation parameter that evaluates the mutual association of the features to the outcome and, hence, goes beyond the analysis of correlation coefficients. Mutual impurity reduction (MIR) is an importance measure that combines this relation parameter with the importance of the individual features. MIR and MFI are implemented together with testing procedures that generate p-values for the selection of related and important features. Applications to various simulated data sets and the comparison to other methods for feature selection and relation analysis show that MFI and MIR show great promise for shedding light on the complex relationships between features and outcome. In addition, they are not affected by common biases, e.g. that features with many possible splits or high minor allele frequencies are preferred.

Supporting Energy-Based Learning With An Ising Machine Substrate: A Case Study on RBM

  • Authors: Uday Kumar Reddy Vengalam, Yongchao Liu, Tong Geng, Hui Wu, Michael Huang
  • Subjects: Emerging Technologies (cs.ET)
  • Arxiv link: https://arxiv.org/abs/2304.02525
  • Pdf link: https://arxiv.org/pdf/2304.02525
  • Abstract
    Nature apparently does a lot of computation constantly. If we can harness some of that computation at an appropriate level, we can potentially perform certain types of computation (much) faster and more efficiently than we can with a von Neumann computer. Indeed, many powerful algorithms are inspired by nature and are thus prime candidates for nature-based computation. One particular branch of this effort that has seen some recent rapid advances is Ising machines. Some Ising machines are already showing better performance and energy efficiency for optimization problems. Through design iterations and co-evolution between hardware and algorithm, we expect more benefits from nature-based computing systems. In this paper, we make a case for an augmented Ising machine suitable for both training and inference using an energy-based machine learning algorithm. We show that with a small change, the Ising substrate accelerates key parts of the algorithm and achieves a non-trivial speedup and efficiency gain. With a more substantial change, we can turn the machine into a self-sufficient gradient follower to virtually complete training entirely in hardware. This can bring about a 29x speedup and about a 1000x reduction in energy compared to a Tensor Processing Unit (TPU) host.
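
For context, the energy-based model in question (an RBM) alternates Gibbs sampling between visible and hidden units; a minimal software sketch of one such step, with illustrative shapes, is shown below. This sampling is the kind of work an Ising substrate would accelerate in hardware.

```python
import numpy as np

# One Gibbs sampling step of a tiny restricted Boltzmann machine.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8)) * 0.1            # visible-hidden couplings
b_v, b_h = np.zeros(16), np.zeros(8)              # biases
v = (rng.random(16) < 0.5).astype(float)          # random visible state

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
h = (rng.random(8) < sigmoid(v @ W + b_h)).astype(float)     # sample hidden
v = (rng.random(16) < sigmoid(h @ W.T + b_v)).astype(float)  # sample visible

energy = -v @ W @ h - b_v @ v - b_h @ h           # RBM energy of the joint state
print(energy)
```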

Conformal Off-Policy Evaluation in Markov Decision Processes

  • Authors: Daniele Foffano, Alessio Russo, Alexandre Proutiere
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02574
  • Pdf link: https://arxiv.org/pdf/2304.02574
  • Abstract
    Reinforcement Learning aims at identifying and evaluating efficient control policies from data. In many real-world applications, the learner is not allowed to experiment and cannot gather data in an online manner (this is the case when experimenting is expensive, risky or unethical). For such applications, the reward of a given policy (the target policy) must be estimated using historical data gathered under a different policy (the behavior policy). Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees. We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty. The main challenge in OPE stems from the distribution shift due to the discrepancies between the target and the behavior policies. We propose and empirically evaluate different ways to deal with this shift. Some of these methods yield conformalized intervals with reduced length compared to existing approaches, while maintaining the same certainty level.
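
For orientation, a generic split-conformal interval looks as follows; the paper's off-policy construction additionally reweights the calibration scores to handle the shift between behavior and target policies, which this sketch omits.

```python
import numpy as np

# Generic split-conformal interval around a point prediction.
def conformal_interval(cal_residuals, y_pred, alpha=0.1):
    n = len(cal_residuals)
    level = np.ceil((n + 1) * (1 - alpha)) / n    # finite-sample correction
    q = np.quantile(cal_residuals, min(level, 1.0))
    return y_pred - q, y_pred + q

cal = np.abs(np.random.randn(500))                # |true - predicted| on held-out data
lo, hi = conformal_interval(cal, y_pred=1.7)
print(lo, hi)                                     # ~90% coverage interval
```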

Energy Efficiency of Unsourced Random Access over the Binary-Input Gaussian Channel

  • Authors: Anton Glebov, Pavel Rybin, Kirill Andreev, Alexey Frolov
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2304.02598
  • Pdf link: https://arxiv.org/pdf/2304.02598
  • Abstract
    We investigate the fundamental limits of the unsourced random access over the binary-input Gaussian channel. By fundamental limits, we mean the minimal energy per bit required to achieve the target per-user probability of error. The original method proposed by Y. Polyanskiy (2017) and based on Gallager's trick does not work well for binary signaling. We utilize Fano's method, which is based on the choice of the so-called "good" region. We apply this method for the cases of Gaussian and binary codebooks and obtain two achievability bounds. The first bound is very close to Polyanskiy's bound but does not lead to any improvement. At the same time, the numerical results show that the bound for the binary case practically coincides with the bound for the Gaussian codebook. Thus, we conclude that binary modulation does not lead to performance degradation, and energy-efficient schemes with binary modulation do exist.

A Checklist to Publish Collections as Data in GLAM Institutions

  • Authors: Gustavo Candela, Nele Gabriëls, Sally Chambers, Thuy-An Pham, Sarah Ames, Neil Fitzgerald, Katrine Hofmann, Victor Harbo, Abigail Potter, Meghan Ferriter, Eileen Manchester, Alba Irollo, Ellen Van Keer, Mahendra Mahey, Olga Holownia, Milena Dobreva
  • Subjects: Digital Libraries (cs.DL)
  • Arxiv link: https://arxiv.org/abs/2304.02603
  • Pdf link: https://arxiv.org/pdf/2304.02603
  • Abstract
    Large-scale digitization in Galleries, Libraries, Archives and Museums (GLAM) has created the conditions for providing access to collections as data. It opened new opportunities to explore, use and reuse digital collections. Strong proponents of collections as data are the Innovation Labs, which have provided numerous examples of publishing datasets under open licenses in order to reuse digital content in novel and creative ways. Within the current transition to the emerging data spaces, clouds for cultural heritage and open science, the need to identify practices which support more GLAM institutions in offering datasets becomes a priority, especially within the smaller and medium-sized institutions. This paper addresses the need to support GLAM institutions in transitioning to publishing their digital content and introducing collections-as-data services; this will also help them contribute efficiently to future data spaces and cultural heritage clouds. It offers a checklist that can be used for both creating and evaluating digital collections suitable for computational use. The main contributions of this paper are i) a methodology for devising a checklist to create and assess digital collections for computational use; ii) a checklist to create and assess digital collections suitable for use with computational methods; iii) the assessment of the checklist against the practice of institutions innovating in the Collections as data field; and iv) the results obtained after the application and recommendations for the use of the checklist in GLAM institutions.

Dynamic Point Fields

  • Authors: Sergey Prokudin, Qianli Ma, Maxime Raafat, Julien Valentin, Siyu Tang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02626
  • Pdf link: https://arxiv.org/pdf/2304.02626
  • Abstract
    Recent years have witnessed significant progress in the field of neural surface reconstruction. While extensive focus has been put on volumetric and implicit approaches, a number of works have shown that explicit graphics primitives such as point clouds can significantly reduce computational complexity, without sacrificing the reconstructed surface quality. However, less emphasis has been put on modeling dynamic surfaces with point primitives. In this work, we present a dynamic point field model that combines the representational benefits of explicit point-based graphics with implicit deformation networks to allow efficient modeling of non-rigid 3D surfaces. Using explicit surface primitives also allows us to easily incorporate well-established constraints such as as-isometric-as-possible regularisation. While learning this deformation model is prone to local optima when trained in a fully unsupervised manner, we propose to additionally leverage semantic information such as keypoint dynamics to guide the deformation learning. We demonstrate our model with an example application of creating an expressive animatable human avatar from a collection of 3D scans. Here, previous methods mostly rely on variants of the linear blend skinning paradigm, which fundamentally limits the expressivity of such models when dealing with complex cloth appearances such as long skirts. We show the advantages of our dynamic point field framework in terms of its representational power, learning efficiency, and robustness to out-of-distribution novel poses.

HNeRV: A Hybrid Neural Representation for Videos

  • Authors: Hao Chen, Matt Gwilliam, Ser-Nam Lim, Abhinav Shrivastava
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02633
  • Pdf link: https://arxiv.org/pdf/2304.02633
  • Abstract
    Implicit neural representations store videos as neural networks and have performed well for various vision tasks such as video compression and denoising. With frame index or positional index as input, implicit representations (NeRV, E-NeRV, etc.) reconstruct video from fixed and content-agnostic embeddings. Such embedding largely limits the regression capacity and internal generalization for video interpolation. In this paper, we propose a Hybrid Neural Representation for Videos (HNeRV), where a learnable encoder generates content-adaptive embeddings, which act as the decoder input. Besides the input embedding, we introduce HNeRV blocks, which ensure model parameters are evenly distributed across the entire network, such that higher layers (layers near the output) can have more capacity to store high-resolution content and video details. With content-adaptive embeddings and re-designed architecture, HNeRV outperforms implicit methods in video regression tasks for both reconstruction quality ($+4.7$ PSNR) and convergence speed ($16\times$ faster), and shows better internal generalization. As a simple and efficient video representation, HNeRV also shows decoding advantages for speed, flexibility, and deployment, compared to traditional codecs (H.264, H.265) and learning-based compression methods. Finally, we explore the effectiveness of HNeRV on downstream tasks such as video compression and video inpainting. The project page is at https://haochen-rye.github.io/HNeRV and the code is at https://github.com/haochen-rye/HNeRV

Segment Anything

  • Authors: Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02643
  • Pdf link: https://arxiv.org/pdf/2304.02643
  • Abstract
    We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy-respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at https://segment-anything.com to foster research into foundation models for computer vision.
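
A hedged usage sketch based on the released segment-anything package; the checkpoint path, the stand-in image, and the prompt point are placeholders.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM backbone from a downloaded checkpoint (path is a placeholder).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in for an RGB image
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),          # one foreground click
    point_labels=np.array([1]),
)
print(masks.shape, scores)
```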

Keyword: faster

Initialization Approach for Nonlinear State-Space Identification via the Subspace Encoder Approach

  • Authors: Rishi Ramkannan, Gerben I. Beintema, Roland Tóth, Maarten Schoukens
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02119
  • Pdf link: https://arxiv.org/pdf/2304.02119
  • Abstract
    The SUBNET neural network architecture has been developed to identify nonlinear state-space models from input-output data. To achieve this, it combines the rolled-out nonlinear state-space equations and a state encoder function, both parameterised as a neural network. The encoder function is introduced to reconstruct the current state from past input-output data. Hence it enables the forward simulation of the rolled-out state-space model. While this approach has been shown to provide high-accuracy and consistent model estimation, its convergence can be significantly improved by efficient initialization of the training process. This paper focuses on such an initialisation of the subspace encoder approach using the Best Linear Approximation (BLA). Using the BLA-provided state-space matrices and the associated reconstructability map, both the state-transition part of the network and the encoder are initialized. The performance of the improved initialisation scheme is evaluated on a Wiener-Hammerstein simulation example and a benchmark dataset. The results show that for a weakly nonlinear system, the proposed initialisation based on the linear reconstructability map results in faster convergence and better model quality.

The Bit Complexity of Efficient Continuous Optimization

  • Authors: Mehrdad Ghadiri, Richard Peng, Santosh S. Vempala
  • Subjects: Data Structures and Algorithms (cs.DS); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.02124
  • Pdf link: https://arxiv.org/pdf/2304.02124
  • Abstract
    We analyze the bit complexity of efficient algorithms for fundamental optimization problems, such as linear regression, $p$-norm regression, and linear programming (LP). State-of-the-art algorithms are iterative, and in terms of the number of arithmetic operations, they match the current time complexity of multiplying two $n$-by-$n$ matrices (up to polylogarithmic factors). However, previous work has typically assumed infinite precision arithmetic, and due to complicated inverse maintenance techniques, the actual running times of these algorithms are unknown. To settle the running time and bit complexity of these algorithms, we demonstrate that a core common subroutine, known as inverse maintenance, is backward-stable. Additionally, we show that iterative approaches for solving constrained weighted regression problems can be accomplished with bounded-error pre-conditioners. Specifically, we prove that linear programs can be solved approximately in matrix multiplication time multiplied by polylog factors that depend on the condition number $\kappa$ of the matrix and the inner and outer radius of the LP problem. $p$-norm regression can be solved approximately in matrix multiplication time multiplied by polylog factors in $\kappa$. Lastly, linear regression can be solved approximately in input-sparsity time multiplied by polylog factors in $\kappa$. Furthermore, we present results for achieving lower than matrix multiplication time for $p$-norm regression by utilizing faster solvers for sparse linear systems.

Efficient CNNs via Passive Filter Pruning

  • Authors: Arshdeep Singh, Mark D. Plumbley
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.02319
  • Pdf link: https://arxiv.org/pdf/2304.02319
  • Abstract
    Convolutional neural networks (CNNs) have shown state-of-the-art performance in various applications. However, CNNs are resource-hungry due to their requirement of high computational complexity and memory storage. Recent efforts toward achieving computational efficiency in CNNs involve filter pruning methods that eliminate some of the filters in CNNs based on the "importance" of the filters. The majority of existing filter pruning methods are either "active", which use a dataset and generate feature maps to quantify filter importance, or "passive", which compute filter importance using the entry-wise norm of the filters without involving data. Under a high pruning ratio, where a large number of filters are to be pruned from the network, the entry-wise norm methods eliminate relatively smaller-norm filters without considering their significance in producing the node output, resulting in degraded performance. To address this, we present a passive filter pruning method where the filters are pruned based on their contribution to producing the output, measured by the operator norm of the filters. The proposed pruning method generalizes better across various CNNs than the entry-wise norm-based pruning methods. In comparison to existing active filter pruning methods, the proposed pruning method is at least 4.5 times faster in computing filter importance and achieves similar performance. The efficacy of the proposed pruning method is evaluated on audio scene classification and image classification using various CNN architectures such as VGGish, DCASE21_Net, VGG-16 and ResNet-50.
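
A simplified reading of the ranking step: score each filter by its operator norm (largest singular value of the flattened filter) and prune the lowest-scoring fraction. The reshape convention and pruning ratio below are assumptions for illustration, not the authors' code.

```python
import numpy as np

# Passive, data-free filter ranking by operator norm.
filters = np.random.randn(64, 3, 3, 3)            # (out_channels, in, kH, kW)
scores = np.array([
    np.linalg.svd(f.reshape(f.shape[0], -1), compute_uv=False)[0]  # largest singular value
    for f in filters
])

prune_ratio = 0.5
keep = np.argsort(scores)[int(len(scores) * prune_ratio):]  # keep top-scoring half
print(sorted(keep)[:8])
```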

Convex Optimization-based Policy Adaptation to Compensate for Distributional Shifts

  • Authors: Navid Hashemi, Justin Ruths, Jyotirmoy V. Deshmukh
  • Subjects: Systems and Control (eess.SY); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02324
  • Pdf link: https://arxiv.org/pdf/2304.02324
  • Abstract
    Many real-world systems often involve physical components or operating environments with highly nonlinear and uncertain dynamics. A number of different control algorithms can be used to design optimal controllers for such systems, assuming a reasonably high-fidelity model of the actual system. However, the assumptions made on the stochastic dynamics of the model when designing the optimal controller may no longer be valid when the system is deployed in the real world. The problem addressed by this paper is the following: suppose we obtain an optimal trajectory by solving a control problem in the training environment; how do we ensure that the real-world system trajectory tracks this optimal trajectory with a minimal amount of error in a deployment environment? In other words, we want to learn how to adapt a trained optimal policy to distribution shifts in the environment. Distribution shifts are problematic in safety-critical systems, where a trained policy may lead to unsafe outcomes during deployment. We show that this problem can be cast as a nonlinear optimization problem that could be solved using heuristic methods such as particle swarm optimization (PSO). However, if we instead consider a convex relaxation of this problem, we can learn policies that track the optimal trajectory with much better error performance and faster computation times. We demonstrate the efficacy of our approach on tracking an optimal path using a Dubin's car model, and on collision avoidance using both a linear and a nonlinear model for adaptive cruise control.

Unfolded Self-Reconstruction LSH: Towards Machine Unlearning in Approximate Nearest Neighbour Search

  • Authors: Kim Yong Tan, Lyu Yueming, Yew-Soon Ong, Ivor Tsang
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.02350
  • Pdf link: https://arxiv.org/pdf/2304.02350
  • Abstract
    Approximate nearest neighbour (ANN) search is an essential component of search engines, recommendation systems, etc. Many recent works focus on learning-based data-distribution-dependent hashing and achieve good retrieval performance. However, due to increasing demand for users' privacy and security, we often need to remove users' data information from Machine Learning (ML) models to satisfy specific privacy and security requirements. This requires the ANN search algorithm to support fast online data deletion and insertion. Current learning-based hashing methods need to retrain the hash function, which is prohibitive due to the vast time cost of large-scale data. To address this problem, we propose a novel data-dependent hashing method named unfolded self-reconstruction locality-sensitive hashing (USR-LSH). Our USR-LSH unfolds the optimization update for instance-wise data reconstruction, which is better for preserving data information than data-independent LSH. Moreover, our USR-LSH supports fast online data deletion and insertion without retraining. To the best of our knowledge, we are the first to address machine unlearning for retrieval problems. Empirically, we demonstrate that USR-LSH outperforms the state-of-the-art data-distribution-independent LSH in ANN tasks in terms of precision and recall. We also show that USR-LSH has significantly faster data deletion and insertion times than learning-based data-dependent hashing.

On the Power of Threshold-Based Algorithms for Detecting Cycles in the CONGEST Model

  • Authors: Pierre Fraigniaud, Maël Luce, Ioan Todinca
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.02360
  • Pdf link: https://arxiv.org/pdf/2304.02360
  • Abstract
    It is known that, for every $k\geq 2$, $C_{2k}$-freeness can be decided by a generic Monte-Carlo algorithm running in $n^{1-1/\Theta(k^2)}$ rounds in the CONGEST model. For $2\leq k\leq 5$, faster Monte-Carlo algorithms do exist, running in $O(n^{1-1/k})$ rounds, based on upper bounding the number of messages to be forwarded, and aborting search sub-routines for which this number exceeds certain thresholds. We investigate the possible extension of these threshold-based algorithms, for the detection of larger cycles. We first show that, for every $k\geq 6$, there exists an infinite family of graphs containing a $2k$-cycle for which any threshold-based algorithm fails to detect that cycle. Hence, in particular, neither $C_{12}$-freeness nor $C_{14}$-freeness can be decided by threshold-based algorithms. Nevertheless, we show that $\{C_{12},C_{14}\}$-freeness can still be decided by a threshold-based algorithm, running in $O(n^{1-1/7})= O(n^{0.857\dots})$ rounds, which is faster than using the generic algorithm, which would run in $O(n^{1-1/22})\simeq O(n^{0.954\dots})$ rounds. Moreover, we exhibit an infinite collection of families of cycles such that threshold-based algorithms can decide $\mathcal{F}$-freeness for every $\mathcal{F}$ in this collection.

HyPFuzz: Formal-Assisted Processor Fuzzing

  • Authors: Chen Chen, Rahul Kande, Nathan Nyugen, Flemming Andersen, Aakash Tyagi, Ahmad-Reza Sadeghi, Jeyavijayan Rajendran
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.02485
  • Pdf link: https://arxiv.org/pdf/2304.02485
  • Abstract
    Recent research has shown that hardware fuzzers can effectively detect security vulnerabilities in modern processors. However, existing hardware fuzzers do not fuzz hard-to-reach design spaces well. Consequently, these fuzzers cannot effectively fuzz security-critical control- and data-flow logic in the processors, hence missing security vulnerabilities. To tackle this challenge, we present HyPFuzz, a hybrid fuzzer that leverages formal verification tools to help fuzz the hard-to-reach part of the processors. To increase the effectiveness of HyPFuzz, we perform optimizations in time and space. First, we develop a scheduling strategy to prevent under- or over-utilization of the capabilities of formal tools and fuzzers. Second, we develop heuristic strategies to select points in the design space for the formal tool to target. We evaluate HyPFuzz on five widely-used open-source processors. HyPFuzz detected all the vulnerabilities detected by the most recent processor fuzzer and found three new vulnerabilities that were missed by previous extensive fuzzing and formal verification. This led to two new common vulnerabilities and exposures (CVE) entries. HyPFuzz also achieves 11.68$\times$ faster coverage than the most recent processor fuzzer.

APIHarvest: Harvesting API Information from Various Online Sources

  • Authors: Ferdian Thung, Kisub Kim, Ting Zhang, Ivana Clairine Irsan, Ratnadira Widyasari, Zhou Yang, David Lo
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.02514
  • Pdf link: https://arxiv.org/pdf/2304.02514
  • Abstract
    Using APIs to develop software applications is the norm. APIs help developers to build applications faster as they do not need to reinvent the wheel. It is therefore important for developers to understand the APIs that they plan to use. Developers should also make themselves aware of relevant information updates about APIs. In order to do so, developers need to find and keep track of relevant information about the APIs that they are concerned with. Yet, the API information is scattered across various online sources, which makes it difficult to track by hand. Moreover, identifying content that is related to an API is not trivial. Motivated by these challenges, in this work, we introduce a tool named APIHarvest that aims to ease the process of finding API information from various online sources. APIHarvest is built on works that link APIs or libraries to various online sources. It supports finding API information on GitHub repositories, Stack Overflow's posts, tweets, YouTube videos, and common vulnerability and exposure (CVE) entries; and is extensible to support other sources.

Supporting Energy-Based Learning With An Ising Machine Substrate: A Case Study on RBM

  • Authors: Uday Kumar Reddy Vengalam, Yongchao Liu, Tong Geng, Hui Wu, Michael Huang
  • Subjects: Emerging Technologies (cs.ET)
  • Arxiv link: https://arxiv.org/abs/2304.02525
  • Pdf link: https://arxiv.org/pdf/2304.02525
  • Abstract
    Nature apparently does a lot of computation constantly. If we can harness some of that computation at an appropriate level, we can potentially perform certain types of computation (much) faster and more efficiently than we can with a von Neumann computer. Indeed, many powerful algorithms are inspired by nature and are thus prime candidates for nature-based computation. One particular branch of this effort that has seen some recent rapid advances is Ising machines. Some Ising machines are already showing better performance and energy efficiency for optimization problems. Through design iterations and co-evolution between hardware and algorithm, we expect more benefits from nature-based computing systems. In this paper, we make a case for an augmented Ising machine suitable for both training and inference using an energy-based machine learning algorithm. We show that with a small change, the Ising substrate accelerates key parts of the algorithm and achieves a non-trivial speedup and efficiency gain. With a more substantial change, we can turn the machine into a self-sufficient gradient follower to virtually complete training entirely in hardware. This can bring about a 29x speedup and about a 1000x reduction in energy compared to a Tensor Processing Unit (TPU) host.

HNeRV: A Hybrid Neural Representation for Videos

  • Authors: Hao Chen, Matt Gwilliam, Ser-Nam Lim, Abhinav Shrivastava
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02633
  • Pdf link: https://arxiv.org/pdf/2304.02633
  • Abstract
    Implicit neural representations store videos as neural networks and have performed well for various vision tasks such as video compression and denoising. With frame index or positional index as input, implicit representations (NeRV, E-NeRV, etc.) reconstruct video from fixed and content-agnostic embeddings. Such embedding largely limits the regression capacity and internal generalization for video interpolation. In this paper, we propose a Hybrid Neural Representation for Videos (HNeRV), where a learnable encoder generates content-adaptive embeddings, which act as the decoder input. Besides the input embedding, we introduce HNeRV blocks, which ensure model parameters are evenly distributed across the entire network, such that higher layers (layers near the output) can have more capacity to store high-resolution content and video details. With content-adaptive embeddings and re-designed architecture, HNeRV outperforms implicit methods in video regression tasks for both reconstruction quality ($+4.7$ PSNR) and convergence speed ($16\times$ faster), and shows better internal generalization. As a simple and efficient video representation, HNeRV also shows decoding advantages for speed, flexibility, and deployment, compared to traditional codecs (H.264, H.265) and learning-based compression methods. Finally, we explore the effectiveness of HNeRV on downstream tasks such as video compression and video inpainting. The project page is at https://haochen-rye.github.io/HNeRV and the code is at https://github.com/haochen-rye/HNeRV

Keyword: mobile

Coarse Grained FLS-based Processor with Prognostic Malfunction Feature for UAM Drones using FPGA

  • Authors: Hossam O. Ahmed
  • Subjects: Systems and Control (eess.SY); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.02099
  • Pdf link: https://arxiv.org/pdf/2304.02099
  • Abstract
    Many overall safety factors need to be considered in the next generation of Urban Air Mobility (UAM) systems, and addressing these can become the anchor point for such technology to reach consent for worldwide application. On the other hand, fulfilling the safety requirements of an exponentially increasing number of prolific UAM systems is extremely complicated and requires careful consideration of a variety of issues. One of the key goals of these Unmanned Air Systems (UAS) is the requirement to support the launch and control of hundreds of thousands of these advanced drones in the air simultaneously. Given the impracticalities of training the corresponding number of expert pilots, achieving this goal can only be realized through safe operation in either fully autonomous or semi-autonomous modes. According to many recent studies, the majority of flight accidents are concentrated in the last three stages of a flight trip, which include the Initial Approach, Final Approach, and Landing Phases of an airplane trip. Therefore, this paper proposes a novel decentralized processing system for enhancing the safety factors during the critical phases of Vertical and/or Short Take-Off and Landing (V/STOL) drones. This has been achieved by adopting several processing and control algorithms such as an Open Fuzzy Logic System (FLS) integrated with a Flight Rules Unit (FRU), FIR filters, and a novel Prognostic Malfunction processing unit. After applying several optimization techniques, this novel coarse-grained Autonomous Landing Guidance Assistance System (ALGAS3) processing architecture has been optimized to achieve a maximum computational processing performance of 70.82 Giga Operations per Second (GOPS). Also, the proposed ALGAS3 system shows an ultra-low dynamic thermal power dissipation (I/O and core) of 145.4 mW, which is ideal for mobile avionic systems using the INTEL 5CGXFC9D6F27C7 FPGA chip.

Proprioception and reaction for walking among entanglements

  • Authors: Justin K. Yim, Jiming Ren, David Ologan, Selvin Garcia Gonzalez, Aaron M. Johnson
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02129
  • Pdf link: https://arxiv.org/pdf/2304.02129
  • Abstract
    Entanglements like vines and branches in natural settings or cords and pipes in human spaces prevent mobile robots from accessing many environments. Legged robots should be effective in these settings, and more so than wheeled or tracked platforms, but naive controllers quickly become entangled and stuck. In this paper we present a method for proprioception aimed specifically at the task of sensing entanglements of a robot's legs, as well as a reaction strategy to disentangle legs during their swing phase as they advance to their next foothold. We demonstrate that our proprioception and reaction strategy enables traversal of entanglements of many stiffnesses and geometries, succeeding in 14 out of 16 trials in laboratory tests, as well as in a natural outdoor environment.

Minimum algorithm sizes for self-stabilizing gathering and related problems of autonomous mobile robots

  • Authors: Yuichi Asahiro, Masafumi Yamashita
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.02212
  • Pdf link: https://arxiv.org/pdf/2304.02212
  • Abstract
    We investigate a swarm of autonomous mobile robots in the Euclidean plane. A robot has a function, called its target function, that determines the destination point from the robots' positions. All robots in the swarm conventionally take the same target function, but there is an apparent limitation in problem-solving ability. We allow the robots to take different target functions. The number of different target functions necessary and sufficient to solve a problem $\Pi$ is called the minimum algorithm size (MAS) for $\Pi$. We establish the MASs for solving the gathering and related problems from any initial configuration, i.e., in a self-stabilizing manner. We show, for example, that for $1 \leq c \leq n$, there is a problem $\Pi_c$ such that the MAS for $\Pi_c$ is $c$, where $n$ is the size of the swarm. The MAS for the gathering problem is 2, and the MAS for the fault-tolerant gathering problem is 3, when $1 \leq f (< n)$ robots may crash, but the problem of gathering all robots (including faulty ones) at one point is unsolvable (even if all robots have distinct target functions), as long as a robot may crash.

DEFLOW: Self-supervised 3D Motion Estimation of Debris Flow

  • Authors: Liyuan Zhu, Yuru Jia, Shengyu Huang, Nicholas Meyer, Andreas Wieser, Konrad Schindler, Jordan Aaron
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02569
  • Pdf link: https://arxiv.org/pdf/2304.02569
  • Abstract
    Existing work on scene flow estimation focuses on autonomous driving and mobile robotics, while automated solutions are lacking for motion in nature, such as that exhibited by debris flows. We propose DEFLOW, a model for 3D motion estimation of debris flows, together with a newly captured dataset. We adopt a novel multi-level sensor fusion architecture and self-supervision to incorporate the inductive biases of the scene. We further adopt a multi-frame temporal processing module to enable flow speed estimation over time. Our model achieves state-of-the-art optical flow and depth estimation on our dataset, and fully automates the motion estimation for debris flows. The source code and dataset are available at project page.

Keyword: pruning

Semantic Communications for Image Recovery and Classification via Deep Joint Source and Channel Coding

  • Authors: Zhonghao Lyu, Guangxu Zhu, Jie Xu, Bo Ai, Shuguang Cui
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.02317
  • Pdf link: https://arxiv.org/pdf/2304.02317
  • Abstract
    With the recent advancements in edge artificial intelligence (AI), future sixth-generation (6G) networks need to support new AI tasks such as classification and clustering apart from data recovery. Motivated by the success of deep learning, the semantic-aware and task-oriented communications with deep joint source and channel coding (JSCC) have emerged as new paradigm shifts in 6G from the conventional data-oriented communications with separate source and channel coding (SSCC). However, most existing works focused on the deep JSCC designs for one task of data recovery or AI task execution independently, which cannot be transferred to other unintended tasks. Differently, this paper investigates JSCC semantic communications supporting multi-task services, by performing image data recovery and classification task execution simultaneously. First, we propose a new end-to-end deep JSCC framework by unifying the coding rate reduction maximization and the mean square error (MSE) minimization in the loss function. Here, the coding rate reduction maximization facilitates the learning of discriminative features, enabling classification tasks to be performed directly in the feature space, and the MSE minimization helps the learning of informative features for high-quality image data recovery. Next, to further improve the robustness against variational wireless channels, we propose a new gated deep JSCC design, in which a gated net is incorporated for adaptively pruning the output features to adjust their dimensions based on channel conditions. Finally, we present extensive numerical experiments to validate the performance of our proposed deep JSCC designs as compared to various benchmark schemes.

Efficient CNNs via Passive Filter Pruning

  • Authors: Arshdeep Singh, Mark D. Plumbley
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.02319
  • Pdf link: https://arxiv.org/pdf/2304.02319
  • Abstract
    Convolutional neural networks (CNNs) have shown state-of-the-art performance in various applications. However, CNNs are resource-hungry due to their requirement of high computational complexity and memory storage. Recent efforts toward achieving computational efficiency in CNNs involve filter pruning methods that eliminate some of the filters in CNNs based on the "importance" of the filters. The majority of existing filter pruning methods are either "active", which use a dataset and generate feature maps to quantify filter importance, or "passive", which compute filter importance using the entry-wise norm of the filters without involving data. Under a high pruning ratio, where a large number of filters are to be pruned from the network, the entry-wise norm methods eliminate relatively smaller-norm filters without considering their significance in producing the node output, resulting in degraded performance. To address this, we present a passive filter pruning method where the filters are pruned based on their contribution to producing the output, measured by the operator norm of the filters. The proposed pruning method generalizes better across various CNNs than the entry-wise norm-based pruning methods. In comparison to existing active filter pruning methods, the proposed pruning method is at least 4.5 times faster in computing filter importance and achieves similar performance. The efficacy of the proposed pruning method is evaluated on audio scene classification and image classification using various CNN architectures such as VGGish, DCASE21_Net, VGG-16 and ResNet-50.

Keyword: voxel

There is no result

Keyword: lidar

Re-Evaluating LiDAR Scene Flow for Autonomous Driving

  • Authors: Nathaniel Chodosh, Deva Ramanan, Simon Lucey
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02150
  • Pdf link: https://arxiv.org/pdf/2304.02150
  • Abstract
    Current methods for self-supervised LiDAR scene flow estimation work poorly on real data. A variety of flaws in common evaluation protocols have caused leading approaches to focus on problems that do not exist in real data. We analyze a suite of recent works and find that despite their focus on deep learning, the main challenges of the LiDAR scene flow problem -- removing the dominant rigid motion and robustly estimating the simple motions that remain -- can be more effectively solved with classical techniques such as ICP motion compensation and enforcing piecewise rigid assumptions. We combine these steps with a test-time optimization method to form a state-of-the-art system that does not require any training data. Because our final approach is dataless, it can be applied on different datasets with diverse LiDAR rigs without retraining. Our proposed approach outperforms all existing methods on Argoverse 2.0, halves the error rate on NuScenes, and even rivals the performance of supervised networks on Waymo and lidarKITTI.
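
The rigid-fit core of the advocated ICP motion compensation is the classical Kabsch alignment; a self-contained sketch follows, with correspondences assumed given (real ICP alternates this solve with nearest-neighbour matching).

```python
import numpy as np

# Kabsch rigid alignment between corresponding point sets P -> Q.
def kabsch(P, Q):
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

P = np.random.rand(100, 3)
t_true = np.array([0.5, 0.0, -0.2])
Q = P + t_true                                # pure translation for the demo
R, t = kabsch(P, Q)
print(np.allclose(R, np.eye(3)), np.allclose(t, t_true))
```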

GINA-3D: Learning to Generate Implicit Neural Assets in the Wild

  • Authors: Bokui Shen, Xinchen Yan, Charles R. Qi, Mahyar Najibi, Boyang Deng, Leonidas Guibas, Yin Zhou, Dragomir Anguelov
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02163
  • Pdf link: https://arxiv.org/pdf/2304.02163
  • Abstract
    Modeling the 3D world from sensor data for simulation is a scalable way of developing testing and validation environments for robotic learning problems such as autonomous driving. However, manually creating or re-creating real-world-like environments is difficult, expensive, and not scalable. Recent generative model techniques have shown promising progress to address such challenges by learning 3D assets using only plentiful 2D images -- but still suffer limitations as they leverage either human-curated image datasets or renderings from manually-created synthetic 3D environments. In this paper, we introduce GINA-3D, a generative model that uses real-world driving data from camera and LiDAR sensors to create realistic 3D implicit neural assets of diverse vehicles and pedestrians. Compared to the existing image datasets, the real-world driving setting poses new challenges due to occlusions, lighting variations, and long-tail distributions. GINA-3D tackles these challenges by decoupling representation learning and generative modeling into two stages with a learned tri-plane latent structure, inspired by recent advances in generative modeling of images. To evaluate our approach, we construct a large-scale object-centric dataset containing over 520K images of vehicles and pedestrians from the Waymo Open Dataset, and a new set of 80K images of long-tail instances such as construction equipment, garbage trucks, and cable cars. We compare our model with existing approaches and demonstrate that it achieves state-of-the-art performance in quality and diversity for both generated images and geometries.

Can a Laplace PDE Define Air Corridors through Low-Altitude Airspace?

  • Authors: Aeris El Asslouj, Ella Atkins, Hossein Rastgoftar
  • Subjects: Systems and Control (eess.SY); Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2304.02175
  • Pdf link: https://arxiv.org/pdf/2304.02175
  • Abstract
    This paper develops a high-density air corridor traffic flow model for Uncrewed Aircraft System (UAS) operation in urban low-altitude airspace. To maximize throughput with safe separation guarantees, we define an airspace spatiotemporal planning problem. For the spatial planning, we propose a multi-floor UAS coordination structure divided into a finite number of air corridors safely wrapping buildings and obstacles. We use the USGS Lidar data to map buildings and in turn generate air corridors by modeling UAS coordination as ideal fluid flow, with the streamlines obtained by solving the Laplace partial differential equation (PDE). Proper boundary conditions for the differential equations are imposed to direct air corridors along each floor's desired motion direction. For temporal planning, we use 4-dimensional path-finding through the corridor network with A* search to maximize airspace usability given each UAS's initial and destination waypoint pair.
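
As a toy illustration of the Laplace-PDE step, a Jacobi iteration on a grid with a building-footprint mask is sketched below; boundary values, grid size, and the obstacle are placeholders, and the rows are treated periodically for simplicity.

```python
import numpy as np

# Jacobi relaxation of the Laplace equation; contours of the resulting
# potential play the role of corridor streamlines.
phi = np.zeros((50, 50))
phi[:, 0], phi[:, -1] = 0.0, 1.0                 # inflow/outflow Dirichlet values
obstacle = np.zeros_like(phi, dtype=bool)
obstacle[20:30, 20:30] = True                    # a building footprint

for _ in range(2000):
    new = 0.25 * (np.roll(phi, 1, 0) + np.roll(phi, -1, 0)
                  + np.roll(phi, 1, 1) + np.roll(phi, -1, 1))
    new[:, 0], new[:, -1] = 0.0, 1.0             # re-impose boundary values
    new[obstacle] = phi[obstacle]                # keep obstacle cells fixed
    phi = new

print(phi[5, ::10])  # potential rises roughly from 0 to 1 across the domain
```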

Keyword: diffusion

Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

  • Authors: Alberto Baldrati, Davide Morelli, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2304.02051
  • Pdf link: https://arxiv.org/pdf/2304.02051
  • Abstract
    Fashion illustration is used by designers to communicate their vision and to bring the design idea from conceptualization to realization, showing how clothes interact with the human body. In this context, computer vision can thus be used to improve the fashion design process. Differently from previous works that mainly focused on the virtual try-on of garments, we propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images by following multimodal prompts, such as text, human body poses, and garment sketches. We tackle this problem by proposing a new architecture based on latent diffusion models, an approach that has not been used before in the fashion domain. Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets, namely Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner. Experimental results on these new datasets demonstrate the effectiveness of our proposal, both in terms of realism and coherence with the given multimodal inputs. Source code and collected multimodal annotations will be publicly released at: https://github.com/aimagelab/multimodal-garment-designer.

A Diffusion-based Method for Multi-turn Compositional Image Generation

  • Authors: Chao Wang, Xiaoyu Yang, Jinmiao Huang, Kevin Ferreira
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02192
  • Pdf link: https://arxiv.org/pdf/2304.02192
  • Abstract
    Multi-turn compositional image generation (M-CIG) is a challenging task that aims to iteratively manipulate a reference image given a modification text. While most of the existing methods for M-CIG are based on generative adversarial networks (GANs), recent advances in image generation have demonstrated the superiority of diffusion models over GANs. In this paper, we propose a diffusion-based method for M-CIG named conditional denoising diffusion with image compositional matching (CDD-ICM). We leverage CLIP as the backbone of image and text encoders, and incorporate a gated fusion mechanism, originally proposed for question answering, to compositionally fuse the reference image and the modification text at each turn of M-CIG. We introduce a conditioning scheme to generate the target image based on the fusion results. To prioritize the semantic quality of the generated target image, we learn an auxiliary image compositional match (ICM) objective, along with the conditional denoising diffusion (CDD) objective in a multi-task learning framework. Additionally, we also perform ICM guidance and classifier-free guidance to improve performance. Experimental results show that CDD-ICM achieves state-of-the-art results on two benchmark datasets for M-CIG, i.e., CoDraw and i-CLEVR.
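
The classifier-free guidance used here combines conditional and unconditional score estimates at each denoising step; a minimal sketch with a toy stand-in model follows (the model, shapes, and guidance weight are all illustrative placeholders).

```python
import numpy as np

# One classifier-free-guidance combination of noise predictions.
def guided_eps(model, x_t, t, cond, w=3.0):
    eps_uncond = model(x_t, t, None)             # unconditional branch
    eps_cond = model(x_t, t, cond)               # conditioned branch
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy stand-in so the sketch runs end to end.
model = lambda x, t, c: 0.1 * x + (0.0 if c is None else 0.01 * c)
x = np.random.randn(1, 4, 8, 8)
print(guided_eps(model, x, t=10, cond=np.ones((1, 4, 8, 8))).shape)
```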

JPEG Compressed Images Can Bypass Protections Against AI Editing

  • Authors: Pedro Sandoval-Segura, Jonas Geiping, Tom Goldstein
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02234
  • Pdf link: https://arxiv.org/pdf/2304.02234
  • Abstract
    Recently developed text-to-image diffusion models make it easy to edit or create high-quality images. Their ease of use has raised concerns about the potential for malicious editing or deepfake creation. Imperceptible perturbations have been proposed as a means of protecting images from malicious editing by preventing diffusion models from generating realistic images. However, we find that the aforementioned perturbations are not robust to JPEG compression, which poses a major weakness because of the common usage and availability of JPEG. We discuss the importance of robustness for additive imperceptible perturbations and encourage alternative approaches to protect images against editing.
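
The bypass itself is just standard JPEG re-encoding, e.g. via Pillow; the quality setting and the stand-in image below are illustrative.

```python
import io
import numpy as np
from PIL import Image

# "Purify" a (hypothetically) protected image via lossy JPEG re-encoding,
# which removes imperceptible protective perturbations.
arr = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)  # stand-in image
protected = Image.fromarray(arr)

buf = io.BytesIO()
protected.save(buf, format="JPEG", quality=75)   # compress
buf.seek(0)
purified = np.asarray(Image.open(buf))           # decode; perturbation largely gone
print(purified.shape)
```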

Few-shot Semantic Image Synthesis with Class Affinity Transfer

  • Authors: Marlène Careil, Jakob Verbeek, Stéphane Lathuilière
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02321
  • Pdf link: https://arxiv.org/pdf/2304.02321
  • Abstract
    Semantic image synthesis aims to generate photo realistic images given a semantic segmentation map. Despite much recent progress, training them still requires large datasets of images annotated with per-pixel label maps that are extremely tedious to obtain. To alleviate the high annotation cost, we propose a transfer method that leverages a model trained on a large source dataset to improve the learning ability on small target datasets via estimated pairwise relations between source and target classes. The class affinity matrix is introduced as a first layer to the source model to make it compatible with the target label maps, and the source model is then further finetuned for the target domain. To estimate the class affinities we consider different approaches to leverage prior knowledge: semantic segmentation on the source domain, textual label embeddings, and self-supervised vision features. We apply our approach to GAN-based and diffusion-based architectures for semantic synthesis. Our experiments show that the different ways to estimate class affinity can be effectively combined, and that our approach significantly improves over existing state-of-the-art transfer approaches for generative image models.

Goal-Conditioned Imitation Learning using Score-based Diffusion Policies

  • Authors: Moritz Reuss, Maximilian Li, Xiaogang Jia, Rudolf Lioutikov
  • Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02532
  • Pdf link: https://arxiv.org/pdf/2304.02532
  • Abstract
    We propose a new policy representation based on score-based diffusion models (SDMs). We apply our new policy representation in the domain of Goal-Conditioned Imitation Learning (GCIL) to learn general-purpose goal-specified policies from large uncurated datasets without rewards. Our new goal-conditioned policy architecture "BEhavior generation with ScOre-based Diffusion Policies" (BESO) leverages a generative, score-based diffusion model as its policy. BESO decouples the learning of the score model from the inference sampling process, and, hence allows for fast sampling strategies to generate goal-specified behavior in just 3 denoising steps, compared to 30+ steps of other diffusion based policies. Furthermore, BESO is highly expressive and can effectively capture multi-modality present in the solution space of the play data. Unlike previous methods such as Latent Plans or C-Bet, BESO does not rely on complex hierarchical policies or additional clustering for effective goal-conditioned behavior learning. Finally, we show how BESO can even be used to learn a goal-independent policy from play-data using classifier-free guidance. To the best of our knowledge this is the first work that a) represents a behavior policy based on such a decoupled SDM b) learns an SDM based policy in the domain of GCIL and c) provides a way to simultaneously learn a goal-dependent and a goal-independent policy from play-data. We evaluate BESO through detailed simulation and show that it consistently outperforms several state-of-the-art goal-conditioned imitation learning methods on challenging benchmarks. We additionally provide extensive ablation studies and experiments to demonstrate the effectiveness of our method for effective goal-conditioned behavior generation.

Generative Novel View Synthesis with 3D-Aware Diffusion Models

  • Authors: Eric R. Chan, Koki Nagano, Matthew A. Chan, Alexander W. Bergman, Jeong Joon Park, Axel Levy, Miika Aittala, Shalini De Mello, Tero Karras, Gordon Wetzstein
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.02602
  • Pdf link: https://arxiv.org/pdf/2304.02602
  • Abstract
    We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image. Our model samples from the distribution of possible renderings consistent with the input and, even in the presence of ambiguity, is capable of rendering diverse and plausible novel views. To achieve this, our method makes use of existing 2D diffusion backbones but, crucially, incorporates geometry priors in the form of a 3D feature volume. This latent feature field captures the distribution over possible scene representations and improves our method's ability to generate view-consistent novel renderings. In addition to generating novel views, our method has the ability to autoregressively synthesize 3D-consistent sequences. We demonstrate state-of-the-art results on synthetic renderings and room-scale scenes; we also show compelling results for challenging, real-world objects.

GenPhys: From Physical Processes to Generative Models

  • Authors: Ziming Liu, Di Luo, Yilun Xu, Tommi Jaakkola, Max Tegmark
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph); Data Analysis, Statistics and Probability (physics.data-an); Quantum Physics (quant-ph)
  • Arxiv link: https://arxiv.org/abs/2304.02637
  • Pdf link: https://arxiv.org/pdf/2304.02637
  • Abstract
    Since diffusion models (DM) and the more recent Poisson flow generative models (PFGM) are inspired by physical processes, it is reasonable to ask: Can physical processes offer additional new generative models? We show that the answer is yes. We introduce a general family, Generative Models from Physical Processes (GenPhys), where we translate partial differential equations (PDEs) describing physical processes to generative models. We show that generative models can be constructed from s-generative PDEs (s for smooth). GenPhys subsume the two existing generative models (DM and PFGM) and even give rise to new families of generative models, e.g., "Yukawa Generative Models" inspired by weak interactions. On the other hand, some physical processes by default do not belong to the GenPhys family, e.g., the wave equation and the Schrödinger equation, but could be made into the GenPhys family with some modifications. Our goal with GenPhys is to explore and expand the design space of generative models.

Keyword: dynamic

A Bibliometric Review of Large Language Models Research from 2017 to 2023

  • Authors: Lizhou Fan, Lingyao Li, Zihui Ma, Sanggyu Lee, Huizi Yu, Libby Hemphill
  • Subjects: Digital Libraries (cs.DL); Computation and Language (cs.CL); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.02020
  • Pdf link: https://arxiv.org/pdf/2304.02020
  • Abstract
    Large language models (LLMs) are a class of language models that have demonstrated outstanding performance across a range of natural language processing (NLP) tasks and have become a highly sought-after research area because of their ability to generate human-like language and their potential to revolutionize science and technology. In this study, we conduct bibliometric and discourse analyses of scholarly literature on LLMs. Synthesizing over 5,000 publications, this paper serves as a roadmap for researchers, practitioners, and policymakers to navigate the current landscape of LLMs research. We present the research trends from 2017 to early 2023, identifying patterns in research paradigms and collaborations. We start with analyzing the core algorithm developments and NLP tasks that are fundamental in LLMs research. We then investigate the applications of LLMs in various fields and domains including medicine, engineering, social science, and humanities. Our review also reveals the dynamic, fast-paced evolution of LLMs research. Overall, this paper offers valuable insights into the current state, impact, and potential of LLMs research and its applications.

Online Joint Assortment-Inventory Optimization under MNL Choices

  • Authors: Yong Liang, Xiaojie Mao, Shiyuan Wang
  • Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
  • Arxiv link: https://arxiv.org/abs/2304.02022
  • Pdf link: https://arxiv.org/pdf/2304.02022
  • Abstract
    We study an online joint assortment-inventory optimization problem, in which we assume that the choice behavior of each customer follows the Multinomial Logit (MNL) choice model, and the attraction parameters are unknown a priori. The retailer makes periodic assortment and inventory decisions to dynamically learn the attraction parameters from the realized demands while maximizing the expected total profit over time. In this paper, we propose a novel algorithm that can effectively balance exploration and exploitation in the online decision-making of assortment and inventory. Our algorithm builds on a new estimator for the MNL attraction parameters, a novel approach to incentivize exploration by adaptively tuning certain known and unknown parameters, and an optimization oracle for static single-cycle assortment-inventory planning problems with given parameters. We establish a regret upper bound for our algorithm and a lower bound for the online joint assortment-inventory optimization problem, suggesting that our algorithm achieves a nearly optimal regret rate, provided that the static optimization oracle is exact. We then incorporate more practical approximate static optimization oracles into our algorithm, and bound from above the impact of static optimization errors on its regret. Finally, we perform numerical studies to demonstrate the effectiveness of our proposed algorithm.
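
For readers unfamiliar with the choice model, a minimal sketch of standard MNL choice probabilities with a no-purchase option, which the unknown attraction parameters enter into; the numbers below are illustrative:

```python
import numpy as np

def mnl_choice_probs(v, assortment):
    """P(pick i | assortment S) = v_i / (1 + sum_{j in S} v_j); the '1' is the
    no-purchase option and v holds the attraction parameters the retailer
    must learn from realized demand."""
    v_s = np.array([v[i] for i in assortment])
    denom = 1.0 + v_s.sum()
    return dict(zip(assortment, v_s / denom)), 1.0 / denom  # item probs, no-purchase prob

probs, no_buy = mnl_choice_probs({0: 1.2, 1: 0.8, 2: 0.5}, [0, 2])
```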

Online augmentation of learned grasp sequence policies for more adaptable and data-efficient in-hand manipulation

  • Authors: Ethan K. Gordon, Rana Soltani Zarrin
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02052
  • Pdf link: https://arxiv.org/pdf/2304.02052
  • Abstract
    When using a tool, the grasps used for picking it up, reposing, and holding it in a suitable pose for the desired task could be distinct. Therefore, a key challenge for autonomous in-hand tool manipulation is finding a sequence of grasps that facilitates every step of the tool use process while continuously maintaining force closure and stability. Due to the complexity of modeling the contact dynamics, reinforcement learning (RL) techniques can provide a solution in this continuous space subject to highly parameterized physical models. However, these techniques impose a trade-off between adaptability and data efficiency. At test time, the tool properties, desired trajectory, and desired application forces could differ substantially from training scenarios. Adapting to this necessitates more data or computationally expensive online policy updates. In this work, we apply the principles of discrete dynamic programming (DP) to augment RL performance with domain knowledge. Specifically, we first design a computationally simple approximation of our environment. We then demonstrate in physical simulation that performing tree searches (i.e., lookaheads) and policy rollouts with this approximation can improve an RL-derived grasp sequence policy with minimal additional online computation. Additionally, we show that pretraining a deep RL network with the DP-derived solution to the discretized problem can speed up policy training.
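
A minimal sketch of the kind of model-based lookahead the abstract describes: a depth-limited tree search over a cheap approximate model, falling back to rollouts of the trained RL policy at the leaves. `model`, `reward`, and `rollout_value` are hypothetical placeholders, not the paper's interfaces.

```python
def lookahead(state, model, reward, rollout_value, depth):
    """Depth-limited tree search: expand actions under the approximate model,
    score leaves with base-policy rollouts, return (best action, value)."""
    if depth == 0:
        return None, rollout_value(state)         # rollout of the RL policy
    best_a, best_v = None, float("-inf")
    for a in model.actions(state):
        next_state = model.step(state, a)         # cheap approximate dynamics
        _, tail = lookahead(next_state, model, reward, rollout_value, depth - 1)
        v = reward(state, a) + tail
        if v > best_v:
            best_a, best_v = a, v
    return best_a, best_v
```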

A Compositional Resilience Index for Computationally Efficient Safety Analysis of Interconnected Systems

  • Authors: Luyao Niu, Abdullah Al Maruf, Andrew Clark, J. Sukarno Mertoguno, Radha Poovendran
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02058
  • Pdf link: https://arxiv.org/pdf/2304.02058
  • Abstract
    Interconnected systems such as power systems and chemical processes are often required to satisfy safety properties in the presence of faults and attacks. Verifying safety of these systems, however, is computationally challenging due to nonlinear dynamics, high dimensionality, and the combinatorial number of possible faults and attacks that can be incurred by the subsystems interconnected within the network. In this paper, we develop a compositional resilience index to verify safety properties of interconnected systems under faults and attacks. The resilience index is a tuple serving the following two purposes. First, it quantifies how a safety property is impacted when a subsystem is compromised by faults and attacks. Second, the resilience index characterizes the needed behavior of a subsystem during normal operations to ensure safety violations will not occur when future adverse events occur. We develop a set of sufficient conditions on the dynamics of each subsystem to satisfy its safety constraint, and leverage these conditions to formulate an optimization program to compute the resilience index. When multiple subsystems are interconnected and their resilience indices are given, we show that the safety constraints of the interconnected system can be efficiently verified by solving a system of linear inequalities. We demonstrate our developed resilience index using a numerical case study on chemical reactors connected in series.
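
The final verification step, checking a system of linear inequalities, is computationally cheap: feasibility of $\{x : Ax \le b\}$ can be posed as a zero-objective linear program. A generic sketch (not the paper's exact formulation):

```python
import numpy as np
from scipy.optimize import linprog

def inequalities_feasible(A_ub, b_ub):
    """Feasibility of {x : A_ub @ x <= b_ub} via an LP with a zero objective."""
    n = A_ub.shape[1]
    res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n, method="highs")
    return res.status == 0  # 0 = solved (feasible), 2 = infeasible
```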

Coarse Grained FLS-based Processor with Prognostic Malfunction Feature for UAM Drones using FPGA

  • Authors: Hossam O. Ahmed
  • Subjects: Systems and Control (eess.SY); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.02099
  • Pdf link: https://arxiv.org/pdf/2304.02099
  • Abstract
    Many overall safety factors need to be considered in the next generation of Urban Air Mobility (UAM) systems, and addressing these can become the anchor point for such technology to reach consent for worldwide application. On the other hand, fulfilling the safety requirements arising from an exponential increase of prolific UAM systems is extremely complicated and requires careful consideration of a variety of issues. One of the key goals of these Unmanned Air Systems (UAS) is the requirement to support the launch and control of hundreds of thousands of these advanced drones in the air simultaneously. Given the impracticality of training the corresponding number of expert pilots, achieving this goal can only be realized through safe operation in either fully autonomous or semi-autonomous modes. According to many recent studies, the majority of flight accidents are concentrated in the last three stages of a flight: the Initial Approach, Final Approach, and Landing phases. Therefore, this paper proposes a novel decentralized processing system for enhancing the safety factors during the critical phases of Vertical and/or Short Take-Off and Landing (V/STOL) drones. This has been achieved by adopting several processing and control algorithms, such as an Open Fuzzy Logic System (FLS) integrated with a Flight Rules Unit (FRU), FIR filters, and a novel Prognostic Malfunction processing unit. After applying several optimization techniques, this novel coarse-grained Autonomous Landing Guidance Assistance System (ALGAS3) processing architecture has been optimized to achieve a maximum computational processing performance of 70.82 Giga Operations per Second (GOPS). Also, the proposed ALGAS3 system shows an ultra-low dynamic thermal power dissipation (I/O and core) of 145.4 mW, which is ideal for mobile avionic systems using the INTEL 5CGXFC9D6F27C7 FPGA chip.

ConvFormer: Parameter Reduction in Transformer Models for 3D Human Pose Estimation by Leveraging Dynamic Multi-Headed Convolutional Attention

  • Authors: Alec Diaz-Arias, Dmitriy Shin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02147
  • Pdf link: https://arxiv.org/pdf/2304.02147
  • Abstract
    Recently, fully transformer-based architectures have replaced the de facto convolutional architecture for the 3D human pose estimation task. In this paper we propose \textbf{\textit{ConvFormer}}, a novel convolutional transformer that leverages a new \textbf{\textit{dynamic multi-headed convolutional self-attention}} mechanism for monocular 3D human pose estimation. We designed a spatial and temporal convolutional transformer to comprehensively model human joint relations within individual frames and globally across the motion sequence. Moreover, we introduce a novel notion of a \textbf{\textit{temporal joints profile}} for our temporal ConvFormer that fuses complete temporal information immediately for a local neighborhood of joint features. We have quantitatively and qualitatively validated our method on three common benchmark datasets: Human3.6M, MPI-INF-3DHP, and HumanEva. Extensive experiments have been conducted to identify the optimal hyper-parameter set. These experiments demonstrated that we achieved a \textbf{significant parameter reduction relative to prior transformer models} while attaining State-of-the-Art (SOTA) or near-SOTA results on all three datasets. Additionally, we achieved SOTA for Protocol III on H36M for both GT and CPN detection inputs. Finally, we obtained SOTA on all three metrics for the MPI-INF-3DHP dataset and for all three subjects on HumanEva under Protocol II.

Dynamic Adversarial Resource Allocation: the dDAB Game

  • Authors: Daigo Shishika, Yue Guan, Jason R. Marden, Michael Dorothy, Panagiotis Tsiotras, Vijay Kumar
  • Subjects: Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2304.02172
  • Pdf link: https://arxiv.org/pdf/2304.02172
  • Abstract
    This work proposes a dynamic and adversarial resource allocation problem in a graph environment, which is referred to as the dynamic Defender-Attacker Blotto (dDAB) game. A team of defender robots is tasked to ensure numerical advantage at every node in the graph against a team of attacker robots. The engagement is formulated as a discrete-time dynamic game, where the two teams reallocate their robots in sequence and each robot can move at most one hop at each time step. The game terminates with the attacker's victory if any node has more attacker robots than defender robots. Our goal is to identify the necessary and sufficient number of defender robots to guarantee defense. Through a reachability analysis, we first solve the problem for the case where the attacker team stays as a single group. The results are then generalized to the case where the attacker team can freely split and merge into subteams. Crucially, our analysis indicates that there is no incentive for the attacker team to split, which significantly reduces the search space for the attacker's winning strategies and also enables us to design defender counter-strategies using superposition. We also present an efficient numerical algorithm to identify the necessary and sufficient number of defender robots to defend a given graph. Finally, we present illustrative examples to verify the efficacy of the proposed framework.

Redrafting Requirements Modeling Using a Single Multilevel Diagram

  • Authors: Sabah Al-Fedaghi
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.02188
  • Pdf link: https://arxiv.org/pdf/2304.02188
  • Abstract
    The complexity of software-based systems has increased significantly, especially with regard to capturing requirements along with dependencies among requirements. A conceptual model is a way of thinking about and making sense of the real world's complexities. In this paper, we focused on two approaches in this context: (a) multiple models applied to the same system with simultaneous usage of dissimilar notations vs. (b) a single model that utilizes a single framework of notations. In the first approach, inconsistencies arise among models, which requires a great deal of painstaking discipline and coordination between them. The multiple-model notion is based on the claim that it is not possible to present all application views in a single representation, so diverse models are used, with each model representing a different view. This article advocates the second approach, which utilizes a single model with multilevel (static/dynamic and behavioral) specification. To substantiate this approach's feasibility, we embrace the occurrence-only model, which comprises (a) Stoic ontology, (b) the thinging machine (TM) language and (c) Lupascian logic. In this paper, we focus on TM modeling as the mechanism of single-model building. We claim that a TM can be a unifying diagrammatic language for virtually all current modeling languages. To demonstrate this claim, we redraft almost all the diagrammatic representations in The Handbook of Requirements Modeling of the International Requirements Engineering Board. This redrafting includes context, class, activity, use case, data flow and state diagrams. The results seem to indicate that there are no difficulties in representing all views in TM.

Folklore Sampling is Optimal for Exact Hopsets: Confirming the $\sqrt{n}$ Barrier

  • Authors: Greg Bodwin, Gary Hoppenworth
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.02193
  • Pdf link: https://arxiv.org/pdf/2304.02193
  • Abstract
    For a graph $G$, a $D$-diameter-reducing exact hopset is a small set of additional edges $H$ that, when added to $G$, maintains its graph metric but guarantees that all node pairs have a shortest path in $G \cup H$ using at most $D$ edges. A shortcut set is the analogous concept for reachability. These objects have been studied since the early '90s due to applications in parallel, distributed, dynamic, and streaming graph algorithms. For most of their history, the state-of-the-art construction for either object was a simple folklore algorithm, based on randomly sampling nodes to hit long paths in the graph. However, recent breakthroughs of Kogan and Parter [SODA '22] and Bernstein and Wein [SODA '23] have finally improved over the folklore diameter bound of $\widetilde{O}(n^{1/2})$ for shortcut sets and for $(1+\epsilon)$-approximate hopsets. For both objects it is now known that one can use $O(n)$ hop-edges to reduce diameter to $\widetilde{O}(n^{1/3})$. The only setting where folklore sampling remains unimproved is for exact hopsets. Can these improvements be continued? We settle this question negatively by constructing graphs on which any exact hopset of $O(n)$ edges has diameter $\widetilde{\Omega}(n^{1/2})$. This improves on the previous lower bound of $\widetilde{\Omega}(n^{1/3})$ by Kogan and Parter [FOCS '22]. Using similar ideas, we also polynomially improve the current lower bounds for shortcut sets, constructing graphs on which any shortcut set of $O(n)$ edges reduces diameter to $\widetilde{\Omega}(n^{1/4})$. This improves on the previous lower bound of $\Omega(n^{1/6})$ by Huang and Pettie [SIAM J. Disc. Math. '18]. We also extend our constructions to provide lower bounds against $O(p)$-size exact hopsets and shortcut sets for other values of $p$; in particular, we show that folklore sampling is near-optimal for exact hopsets in the entire range of $p \in [1, n^2]$.
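
For concreteness, the folklore sampling construction the abstract refers to can be sketched as follows: sample roughly $\sqrt{n}$ nodes and connect every sampled pair with a hop-edge carrying its exact distance, giving $O(n)$ extra edges and, with high probability, hop-diameter $\widetilde{O}(n^{1/2})$. An illustrative (not optimized) version using networkx:

```python
import itertools
import random
import networkx as nx

def folklore_hopset(G):
    """Sample each node with prob. ~ 1/sqrt(n) (so ~sqrt(n) nodes, ~n pairs),
    then add exact-distance edges between all sampled pairs; long shortest
    paths are hit by sampled nodes w.h.p. and can shortcut between them."""
    p = G.number_of_nodes() ** -0.5
    sampled = [v for v in G.nodes if random.random() < p]
    dist = dict(nx.all_pairs_dijkstra_path_length(G))
    return [(u, v, dist[u][v])
            for u, v in itertools.combinations(sampled, 2) if v in dist[u]]
```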

Algorithm and Hardness for Dynamic Attention Maintenance in Large Language Models

  • Authors: Jan van den Brand, Zhao Song, Tianyi Zhou
  • Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
  • Arxiv link: https://arxiv.org/abs/2304.02207
  • Pdf link: https://arxiv.org/pdf/2304.02207
  • Abstract
    Large language models (LLMs) have made fundamental changes in human life. The attention scheme is one of the key components of all LLMs, such as BERT, GPT-1, Transformers, GPT-2, 3, 3.5 and 4. Inspired by previous theoretical studies of the static version of the attention multiplication problem [Zandieh, Han, Daliri, and Karbasi arXiv 2023; Alman and Song arXiv 2023], we formally define a dynamic version of the attention matrix multiplication problem. There are matrices $Q, K, V \in \mathbb{R}^{n \times d}$; they represent the query, key and value in LLMs. In each iteration we update one entry in $K$ or $V$. In the query stage, we receive $(i,j) \in [n] \times [d]$ as input, and want to answer $(D^{-1} A V)_{i,j}$, where $A:=\exp(QK^\top) \in \mathbb{R}^{n \times n}$ is a square matrix and $D := \mathrm{diag}(A {\bf 1}_n) \in \mathbb{R}^{n \times n}$ is a diagonal matrix. Here ${\bf 1}_n$ denotes the length-$n$ all-ones vector. We provide two results: an algorithm and a conditional lower bound. $\bullet$ On the one hand, inspired by the lazy update idea from [Demetrescu and Italiano FOCS 2000, Sankowski FOCS 2004, Cohen, Lee and Song STOC 2019, Brand SODA 2020], we provide a data structure that uses $O(n^{\omega(1,1,\tau)-\tau})$ amortized update time, and $O(n^{1+\tau})$ worst-case query time. $\bullet$ On the other hand, we show that unless the hinted matrix vector multiplication conjecture [Brand, Nanongkai and Saranurak FOCS 2019] is false, there is no algorithm that can achieve both $O(n^{\omega(1,1,\tau) - \tau- \Omega(1)})$ amortized update time and $O(n^{1+\tau-\Omega(1)})$ worst-case query time. In conclusion, our algorithmic result is conditionally optimal unless the hinted matrix vector multiplication conjecture is false.
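
To make the maintained quantity concrete, here is the static target $(D^{-1} A V)$ and the naive from-scratch update the dynamic data structure competes against (a plain numpy sketch):

```python
import numpy as np

def attention_target(Q, K, V):
    """(D^{-1} A V) with A = exp(Q K^T) and D = diag(A 1_n), i.e. row-softmax
    attention (without the usual max-subtraction stabilization trick)."""
    A = np.exp(Q @ K.T)
    return (A @ V) / A.sum(axis=1, keepdims=True)

def naive_update_K(Q, K, V, i, j, val):
    """Baseline: change one entry of K, then recompute in O(n^2 d) time."""
    K = K.copy()
    K[i, j] = val
    return attention_target(Q, K, V)
```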

DiGA: Distil to Generalize and then Adapt for Domain Adaptive Semantic Segmentation

  • Authors: Fengyi Shen, Akhil Gurram, Ziyuan Liu, He Wang, Alois Knoll
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02222
  • Pdf link: https://arxiv.org/pdf/2304.02222
  • Abstract
    Domain adaptive semantic segmentation methods commonly utilize stage-wise training, consisting of a warm-up and a self-training stage. However, this popular approach still faces several challenges in each stage: for warm-up, the widely adopted adversarial training often results in limited performance gain, due to blind feature alignment; for self-training, finding proper categorical thresholds is very tricky. To alleviate these issues, we first propose to replace the adversarial training in the warm-up stage with a novel symmetric knowledge distillation module that only accesses the source domain data and makes the model domain generalizable. Surprisingly, this domain generalizable warm-up model brings substantial performance improvement, which can be further amplified via our proposed cross-domain mixture data augmentation technique. Then, for the self-training stage, we propose a threshold-free dynamic pseudo-label selection mechanism to ease the aforementioned threshold problem and make the model better adapted to the target domain. Extensive experiments demonstrate that our framework achieves remarkable and consistent improvements compared to prior art on popular benchmarks. Codes and models are available at https://github.com/fy-vision/DiGA

Topological Characterization of Consensus Solvability in Directed Dynamic Networks

  • Authors: Hugo Rincon Galeana, Ulrich Schmid, Kyrill Winkler, Ami Paz, Stefan Schmid
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.02316
  • Pdf link: https://arxiv.org/pdf/2304.02316
  • Abstract
    Consensus is one of the most fundamental problems in distributed computing. This paper studies the consensus problem in a synchronous dynamic directed network, in which communication is controlled by an oblivious message adversary. The question when consensus is possible in this model has already been studied thoroughly in the literature from a combinatorial perspective, and is known to be challenging. This paper presents a topological perspective on consensus solvability under oblivious message adversaries, which provides interesting new insights. Our main contribution is a topological characterization of consensus solvability, which also leads to explicit decision procedures. Our approach is based on the novel notion of a communication pseudosphere, which can be seen as the message-passing analog of the well-known standard chromatic subdivision for wait-free shared memory systems. We further push the elegance and expressiveness of the "geometric" reasoning enabled by the topological approach by dealing with uninterpreted complexes, which considerably reduce the size of the protocol complex, and by labeling facets with information flow arrows, which give an intuitive meaning to the implicit epistemic status of the faces in a protocol complex.

Convex Optimization-based Policy Adaptation to Compensate for Distributional Shifts

  • Authors: Navid Hashemi, Justin Ruths, Jyotirmoy V. Deshmukh
  • Subjects: Systems and Control (eess.SY); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02324
  • Pdf link: https://arxiv.org/pdf/2304.02324
  • Abstract
    Many real-world systems often involve physical components or operating environments with highly nonlinear and uncertain dynamics. A number of different control algorithms can be used to design optimal controllers for such systems, assuming a reasonably high-fidelity model of the actual system. However, the assumptions made on the stochastic dynamics of the model when designing the optimal controller may no longer be valid when the system is deployed in the real world. The problem addressed by this paper is the following: suppose we obtain an optimal trajectory by solving a control problem in the training environment; how do we ensure that the real-world system trajectory tracks this optimal trajectory with a minimal amount of error in a deployment environment? In other words, we want to learn how we can adapt an optimal trained policy to distribution shifts in the environment. Distribution shifts are problematic in safety-critical systems, where a trained policy may lead to unsafe outcomes during deployment. We show that this problem can be cast as a nonlinear optimization problem that could be solved using heuristic methods such as particle swarm optimization (PSO). However, if we instead consider a convex relaxation of this problem, we can learn policies that track the optimal trajectory with much better error performance and faster computation times. We demonstrate the efficacy of our approach on tracking an optimal path using a Dubins car model, and on collision avoidance using both a linear and a nonlinear model for adaptive cruise control.

Constructing and deconstructing bias: modeling privilege and mentorship in agent-based simulations

  • Authors: Andria L. Smith, Simon Heuschkel, Ksenia Keplinger, Charley M. Wu
  • Subjects: Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2304.02351
  • Pdf link: https://arxiv.org/pdf/2304.02351
  • Abstract
    Bias exists in how we pick leaders, whom we perceive as influential, and whom we interact with, not only in society, but in organizational contexts. Drawing from leadership emergence and social influence theories, we investigate potential interventions that support diverse leaders. Using agent-based simulations, we model a collective search process on a fitness landscape. Agents combine individual and social learning, and are represented as a feature vector blending relevant (e.g., individual learning characteristics) and irrelevant (e.g., race or gender) features. Agents use rational principles of learning to estimate feature weights on the basis of performance predictions, which are used to dynamically define social influence in their network. We show how biases arise based on historic privilege, but can be drastically reduced through the use of an intervention (e.g. mentorship). This work provides important insights into the cognitive mechanisms underlying bias construction and deconstruction, while pointing towards real-world interventions to be tested in future empirical work.

Impact Sensitivity Analysis of Cooperative Adaptive Cruise Control Against Resource-Limited Adversaries

  • Authors: Mischa Huisman, Carlos Murguia, Erjen Lefeber, Nathan van de Wouw
  • Subjects: Systems and Control (eess.SY); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.02395
  • Pdf link: https://arxiv.org/pdf/2304.02395
  • Abstract
    Cooperative Adaptive Cruise Control (CACC) is a promising technology that allows groups of vehicles to form automated, tightly-coupled platoons. CACC schemes exploit Vehicle-to-Vehicle (V2V) wireless communications to exchange kinematic information among adjacent vehicles. However, the use of communication networks brings security concerns, as cyberattacks could access the vehicles' internal networks and computers to disrupt their operation and even cause crashes. In this manuscript, we present a sensitivity analysis of standard CACC schemes against a class of resource-limited attacks. We present a modelling framework that allows us to systematically compute outer ellipsoidal approximations of reachable sets induced by attacks. We use the size of these sets as a security metric to quantify the potential damage of attacks entering the dynamics at different points, and study how two key system parameters (sampling and headway constant) change these metrics. We carry out the latter sensitivity analysis for two different controller implementations (since, given the available sensors, there is an infinite number of realizations of the same controller) and show how different implementations can significantly affect the impact of attacks. We present extensive simulation experiments to illustrate our ideas.

AutoRL Hyperparameter Landscapes

  • Authors: Aditya Mohan, Carolin Benjamins, Konrad Wienecke, Alexander Dockhorn, Marius Lindauer
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02396
  • Pdf link: https://arxiv.org/pdf/2304.02396
  • Abstract
    Although Reinforcement Learning (RL) has been shown to be capable of producing impressive results, its use is limited by the impact of its hyperparameters on performance. This often makes it difficult to achieve good results in practice. Automated RL (AutoRL) addresses this difficulty, yet little is known about the dynamics of the hyperparameter landscapes that hyperparameter optimization (HPO) methods traverse in search of optimal configurations. In view of existing AutoRL approaches dynamically adjusting hyperparameter configurations, we propose an approach to build and analyze these hyperparameter landscapes not just for one point in time but at multiple points in time throughout training. Addressing an important open question on the legitimacy of such dynamic AutoRL approaches, we provide thorough empirical evidence that the hyperparameter landscapes strongly vary over time across representative algorithms from the RL literature (DQN and SAC) in different kinds of environments (Cartpole and Hopper). This supports the theory that hyperparameters should be dynamically adjusted during training and shows the potential for more insights on AutoRL problems that can be gained through landscape analyses.

Adaptive Data Augmentation for Contrastive Learning

  • Authors: Yuhan Zhang, He Zhu, Shan Yu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02451
  • Pdf link: https://arxiv.org/pdf/2304.02451
  • Abstract
    In computer vision, contrastive learning is the most advanced unsupervised learning framework. Yet most previous methods simply apply a fixed composition of data augmentations to improve data efficiency, which ignores the changes in their optimal settings over training. Thus, the pre-determined parameters of augmentation operations cannot always fit well with an evolving network during the whole training period, which degrades the quality of the learned representations. In this work, we propose AdDA, which implements a closed-loop feedback structure to a generic contrastive learning network. AdDA works by allowing the network to adaptively adjust the augmentation compositions according to real-time feedback. This online adjustment helps maintain the dynamic optimal composition and enables the network to acquire more generalizable representations with minimal computational overhead. AdDA achieves competitive results under the common linear protocol on ImageNet-100 classification (+1.11% on MoCo v2).

FPGA-Patch: Mitigating Remote Side-Channel Attacks on FPGAs using Dynamic Patch Generation

  • Authors: Mahya Morid Ahmadi, Lilas Alrahis, Ozgur Sinanoglu, Muhammad Shafique
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.02510
  • Pdf link: https://arxiv.org/pdf/2304.02510
  • Abstract
    We propose FPGA-Patch, the first-of-its-kind defense that leverages automated program repair concepts to thwart power side-channel attacks on cloud FPGAs. FPGA-Patch generates isofunctional variants of the target hardware by injecting faults and finding transformations that eliminate failure. The obtained variants display different hardware characteristics, ensuring a maximal diversity in power traces once dynamically swapped at run-time. Yet, FPGA-Patch forces the variants to have enough similarity, enabling bitstream compression and minimizing dynamic exchange costs. Considering AES running on an AMD/Xilinx FPGA, FPGA-Patch increases the attacker's effort by three orders of magnitude, while preserving the performance of AES with a minimal area overhead of 14.2%.

Sensor-based Planning and Control for Robotic Systems: Introducing Clarity and Perceivability

  • Authors: Devansh R Agrawal, Dimitra Panagou
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02578
  • Pdf link: https://arxiv.org/pdf/2304.02578
  • Abstract
    We introduce an information measure, termed clarity, motivated by information entropy, and show that it has intuitive properties relevant to dynamic coverage control and informative path planning. Clarity defines the quality of the information we have about a variable of interest in an environment on a scale of $[0, 1]$, and has useful properties for control and planning, such as: (I) clarity lower bounds the expected estimation error of any estimator, and (II) given noisy measurements, clarity monotonically approaches a level $q_\infty < 1$. We establish a connection between coverage controllers and information theory via clarity, suggesting a coverage model that is physically consistent with how information is acquired. Next, we define the notion of perceivability of an environment under a given robotic (or, more generally, sensing and control) system, i.e., whether the system has sufficient sensing and actuation capabilities to gather desired information. We show that perceivability relates to the reachability of an augmented system, and derive the corresponding Hamilton-Jacobi-Bellman equations to determine perceivability. In simulations, we demonstrate how clarity is a useful concept for planning trajectories, how perceivability can be determined using reachability analysis, and how a Control Barrier Function (CBF) based controller can dramatically reduce the computational burden.

Dynamic Point Fields

  • Authors: Sergey Prokudin, Qianli Ma, Maxime Raafat, Julien Valentin, Siyu Tang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02626
  • Pdf link: https://arxiv.org/pdf/2304.02626
  • Abstract
    Recent years have witnessed significant progress in the field of neural surface reconstruction. While extensive focus has been put on volumetric and implicit approaches, a number of works have shown that explicit graphics primitives such as point clouds can significantly reduce computational complexity without sacrificing the reconstructed surface quality. However, less emphasis has been put on modeling dynamic surfaces with point primitives. In this work, we present a dynamic point field model that combines the representational benefits of explicit point-based graphics with implicit deformation networks to allow efficient modeling of non-rigid 3D surfaces. Using explicit surface primitives also allows us to easily incorporate well-established constraints such as as-isometric-as-possible regularisation. While learning this deformation model is prone to local optima when trained in a fully unsupervised manner, we propose to additionally leverage semantic information such as keypoint dynamics to guide the deformation learning. We demonstrate our model with an example application of creating an expressive animatable human avatar from a collection of 3D scans. Here, previous methods mostly rely on variants of the linear blend skinning paradigm, which fundamentally limits the expressivity of such models when dealing with complex cloth appearances such as long skirts. We show the advantages of our dynamic point field framework in terms of its representational power, learning efficiency, and robustness to out-of-distribution novel poses.

New submissions for Fri, 31 Mar 23

Keyword: efficient

Machine learning-based spin structure detection

  • Authors: Isaac Labrie-Boulay, Thomas Brian Winkler, Daniel Franzen, Alena Romanova, Hans Fangohr, Mathias Kläui
  • Subjects: Machine Learning (cs.LG); Emerging Technologies (cs.ET); Data Analysis, Statistics and Probability (physics.data-an)
  • Arxiv link: https://arxiv.org/abs/2303.16905
  • Pdf link: https://arxiv.org/pdf/2303.16905
  • Abstract
    One of the most important magnetic spin structures is the topologically stabilised skyrmion quasi-particle. Its interesting physical properties make it a candidate for memory and efficient neuromorphic computation schemes. For device operation, detection of the position, shape, and size of skyrmions is required, and magnetic imaging is typically employed. A frequently used technique is magneto-optical Kerr microscopy, where, depending on the sample's material composition, temperature, material growing procedures, etc., the measurements suffer from noise, low contrast, intensity gradients, or other optical artifacts. Conventional image analysis packages require manual treatment, and a more automatic solution is required. We report a convolutional neural network specifically designed for segmentation problems to detect the position and shape of skyrmions in our measurements. The network is tuned using selected techniques to optimize predictions, and in particular the number of detected classes is found to govern the performance. The results of this study show that a well-trained network is a viable method of automating data pre-processing in magnetic microscopy. The approach is easily extendable to other spin structures and other magnetic imaging methods.

Optimizing Reconfigurable Intelligent Surfaces for Short Transmissions: How Detailed Configurations can be Afforded?

  • Authors: Anders Enqvist, Özlem Tuğfe Demir, Cicek Cavdar, Emil Björnson
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2303.16913
  • Pdf link: https://arxiv.org/pdf/2303.16913
  • Abstract
    In this paper, we examine how to minimize the total energy consumption of a user equipment (UE) when it transmits a finite-sized data payload of a given length. The receiving base station (BS) controls a reconfigurable intelligent surface (RIS) that can be utilized to improve the channel conditions, but only if additional pilot signals are transmitted to configure the RIS. The challenge is that the pilot resources spent on configuring the RIS increase the energy consumption, especially when small payloads are transmitted, so it must be balanced against the energy savings during data transmission. We derive a formula for the energy consumption, taking both the pilot and data transmission power into account. It also includes the effects of imperfect channel state information, the use of phase-shifts with finite resolution at the RIS, and the passive circuit energy consumption. We also consider how dividing the RIS into subarrays consisting of multiple RIS elements using the same reflection coefficient can shorten the pilot length. In particular, the pilot power and subarray size are tuned to the payload length to minimize the energy consumption while maintaining parts of the aperture gain. Our analytical results show that, for a given geometry and transmission payload length, there exists a unique energy-minimizing subarray size and pilot power. For small payloads and when the channel conditions between the BS and UE are favorable compared to the path to the RIS, the energy consumption is minimized using subarrays with many elements and low pilot transmission power. On the other hand, when the channel conditions to the RIS are better and the data payloads are large, it is preferable to use fewer elements per subarray, potentially configuring each element individually and transmitting the pilot signals with additional power.
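
Since the paper shows that the energy-minimizing subarray size and pilot power are unique for a given geometry and payload, the operating point can in principle be found by a direct sweep of the derived energy formula. A hedged sketch, with `total_energy(L, q)` standing in for the paper's closed-form expression (an illustrative name, not the paper's notation):

```python
def minimize_energy(total_energy, subarray_sizes, pilot_powers):
    """Exhaustively sweep the two tuned knobs; total_energy(L, q) is a
    user-supplied model of pilot-plus-data consumption for subarray size L
    and pilot power q. Returns (energy, L, q) at the minimum."""
    return min((total_energy(L, q), L, q)
               for L in subarray_sizes for q in pilot_powers)
```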

T-FFTRadNet: Object Detection with Swin Vision Transformers from Raw ADC Radar Signals

  • Authors: James Giroux, Martin Bouchard, Robert Laganiere
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16940
  • Pdf link: https://arxiv.org/pdf/2303.16940
  • Abstract
    Object detection utilizing Frequency Modulated Continuous Wave radar is becoming increasingly popular in the field of autonomous systems. Radar does not possess the same drawbacks seen in other emission-based sensors such as LiDAR, primarily the degradation or loss of return signals due to weather conditions such as rain or snow. However, radar does possess traits that make it unsuitable for standard emission-based deep learning representations such as point clouds. Radar point clouds tend to be sparse, and therefore information extraction is not efficient. To overcome this, more traditional digital signal processing pipelines were adapted to form inputs residing directly in the frequency domain via Fast Fourier Transforms. Commonly, three transformations were used to form Range-Azimuth-Doppler cubes in which deep learning algorithms could perform object detection. This too has drawbacks, namely the pre-processing costs associated with performing multiple Fourier Transforms and normalization. We explore the possibility of operating on raw radar inputs from analog-to-digital converters via the utilization of complex transformation layers. Moreover, we introduce hierarchical Swin Vision transformers to the field of radar object detection and show their capability to operate on inputs varying in pre-processing, along with different radar configurations, i.e., relatively low and high numbers of transmitters and receivers, while obtaining on-par or better results than the state-of-the-art.

Concise QBF Encodings for Games on a Grid (extended version)

  • Authors: Irfansha Shaik, Jaco van de Pol
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.16949
  • Pdf link: https://arxiv.org/pdf/2303.16949
  • Abstract
    Encoding 2-player games in QBF correctly and efficiently is challenging and error-prone. To enable concise specifications and uniform encodings of games played on grid boards, like Tic-Tac-Toe, Connect-4, Domineering, Pursuer-Evader and Breakthrough, we introduce Board-game Domain Definition Language (BDDL), inspired by the success of PDDL in the planning domain. We provide an efficient translation from BDDL into QBF, encoding the existence of a winning strategy of bounded depth. Our lifted encoding treats board positions symbolically and allows concise definitions of conditions, effects and winning configurations, relative to symbolic board positions. The size of the encoding grows linearly in the input model and the considered depth. To show the feasibility of such a generic approach, we use QBF solvers to compute the critical depths of winning strategies for instances of several known games. For several games, our work provides the first QBF encoding. Unlike plan validation in SAT-based planning, validating QBF-based winning strategies is difficult. We show how to validate winning strategies using QBF certificates and interactive game play.

Fairness-Aware Data Valuation for Supervised Learning

  • Authors: José Pombal, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro
  • Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2303.16963
  • Pdf link: https://arxiv.org/pdf/2303.16963
  • Abstract
    Data valuation is an ML field that studies the value of training instances towards a given predictive task. Although data bias is one of the main sources of downstream model unfairness, previous work in data valuation does not consider how training instances may influence both the performance and the fairness of ML models. Thus, we propose Fairness-Aware Data valuatiOn (FADO), a data valuation framework that can be used to incorporate fairness concerns into a series of ML-related tasks (e.g., data pre-processing, exploratory data analysis, active learning). We propose an entropy-based data valuation metric suited to address our two-pronged goal of maximizing both performance and fairness, which is more computationally efficient than existing metrics. We then show how FADO can be applied as the basis for unfairness mitigation pre-processing techniques. Our methods achieve promising results -- up to a 40 p.p. improvement in fairness at a less than 1 p.p. loss in performance compared to a baseline -- and promote fairness in a data-centric way, where a deeper understanding of data quality takes center stage.

Computationally efficient sampling methods for sparsity promoting hierarchical Bayesian models

  • Authors: Daniela Calvetti, Erkki Somersalo
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.16988
  • Pdf link: https://arxiv.org/pdf/2303.16988
  • Abstract
    Bayesian hierarchical models have been demonstrated to provide efficient algorithms for finding sparse solutions to ill-posed inverse problems. The models typically comprise a conditionally Gaussian prior model for the unknown, augmented by a hyperprior model for the variances. A widely used choice for the hyperprior is a member of the family of generalized gamma distributions. Most of the work in the literature has concentrated on numerical approximation of the maximum a posteriori (MAP) estimates, and less attention has been paid to sampling methods or other means for uncertainty quantification. Sampling from the hierarchical models is challenging mainly for two reasons: the hierarchical models are typically high-dimensional, thus suffering from the curse of dimensionality, and the strong correlation between the unknown of interest and its variance can make sampling rather inefficient. This work addresses mainly the first of these obstacles. By using a novel reparametrization, it is shown how the posterior distribution can be transformed into one dominated by a Gaussian white noise, allowing sampling by using the preconditioned Crank-Nicolson (pCN) scheme, which has been shown to be efficient for sampling from distributions dominated by a Gaussian component. Furthermore, a novel idea for speeding up the pCN in a special case is developed, and the question of how strongly the hierarchical models are concentrated on sparse solutions is addressed in light of a computed example.
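
For reference, the pCN kernel itself is a few lines; the key property (and the reason the whitening reparametrization pays off) is that the acceptance ratio involves only the negative log-likelihood $\Phi$, never the Gaussian prior density. A minimal sketch, assuming a sampler for the Gaussian reference measure is supplied:

```python
import numpy as np

def pcn(neg_log_like, x0, sample_prior, beta=0.2, n_samples=10_000):
    """Preconditioned Crank-Nicolson MCMC: propose
    x' = sqrt(1 - beta^2) * x + beta * xi with xi ~ N(0, C),
    accept with probability min(1, exp(Phi(x) - Phi(x')))."""
    x, phi_x = x0, neg_log_like(x0)
    out = []
    for _ in range(n_samples):
        prop = np.sqrt(1.0 - beta**2) * x + beta * sample_prior()
        phi_p = neg_log_like(prop)
        if np.log(np.random.rand()) < phi_x - phi_p:  # likelihood-only ratio
            x, phi_x = prop, phi_p
        out.append(x)
    return np.array(out)
```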

The G-invariant graph Laplacian

  • Authors: Eitan Rosen, Yoel Shkolnisky
  • Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2303.17001
  • Pdf link: https://arxiv.org/pdf/2303.17001
  • Abstract
    Graph Laplacian based algorithms for data lying on a manifold have been proven effective for tasks such as dimensionality reduction, clustering, and denoising. In this work, we consider data sets whose data points not only lie on a manifold, but are also closed under the action of a continuous group. An example of such a data set is volumes that lie on a low-dimensional manifold, where each volume may be rotated in three-dimensional space. We introduce the G-invariant graph Laplacian that generalizes the graph Laplacian by accounting for the action of the group on the data set. We show that, like the standard graph Laplacian, the G-invariant graph Laplacian converges to the Laplace-Beltrami operator on the data manifold, but with a significantly improved convergence rate. Furthermore, we show that the eigenfunctions of the G-invariant graph Laplacian admit the form of tensor products between the group elements and eigenvectors of certain matrices, which can be computed efficiently using FFT-type algorithms. We demonstrate our construction and its advantages on the problem of filtering data on a noisy manifold closed under the action of the special unitary group SU(2).
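
As a baseline for what is being generalized, here is the standard graph Laplacian construction with Gaussian-kernel affinities; the G-invariant version additionally accounts for the group action on the data:

```python
import numpy as np

def graph_laplacian(X, eps):
    """L = D - W with W_ij = exp(-||x_i - x_j||^2 / eps), D = diag(W 1)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / eps)
    np.fill_diagonal(W, 0.0)       # no self-loops
    return np.diag(W.sum(axis=1)) - W
```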

The secret of immersion: actor driven camera movement generation for auto-cinematography

  • Authors: Xinyi Wu, Haohong Wang, Aggelos K. Katsaggelos
  • Subjects: Multimedia (cs.MM); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17041
  • Pdf link: https://arxiv.org/pdf/2303.17041
  • Abstract
    Immersion plays a vital role when designing cinematic creations, yet the difficulty of immersive shooting prevents designers from creating satisfactory outputs. In this work, we analyze the specific components that contribute to cinematographic immersion at the spatial, emotional, and aesthetic levels, and these components are then combined into a high-level evaluation mechanism. Guided by such an immersion mechanism, we propose a GAN-based camera control system that is able to generate actor-driven camera movements in a 3D virtual environment to obtain immersive film sequences. The proposed encoder-decoder architecture in the generation flow transfers character motion into a camera trajectory conditioned on an emotion factor. This ensures spatial and emotional immersion by performing actor-camera synchronization physically and psychologically. The emotional immersion is further strengthened by incorporating regularization that controls camera shakiness for expressing different mental statuses. To achieve aesthetic immersion, we make an effort to improve aesthetic frame compositions by modifying the synthesized camera trajectory. Based on a self-supervised adjustor, the adjusted camera placements can project the character to the appropriate on-frame locations following aesthetic rules. The experimental results indicate that our proposed camera control system can efficiently offer immersive cinematic videos, both quantitatively and qualitatively, based on fine-grained immersive shooting. Live examples are shown in the supplementary video.

Material-agnostic Shaping of Granular Materials with Optimal Transport

  • Authors: Nikhilesh Alatur, Olov Andersson, Roland Siegwart, Lionel Ott
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17047
  • Pdf link: https://arxiv.org/pdf/2303.17047
  • Abstract
    From construction materials, such as sand or asphalt, to kitchen ingredients, like rice, sugar, or salt, the world is full of granular materials. Despite impressive progress in robotic manipulation, manipulating and interacting with granular material remains a challenge due to difficulties in perceiving, representing, modelling, and planning for these variable materials that have complex internal dynamics. While some prior work has looked into estimating or learning accurate dynamics models for granular materials, the literature is still missing a more abstract planning method that can be used for planning manipulation actions for granular materials with unknown material properties. In this work, we leverage tools from optimal transport and connect them to robot motion planning. We propose a heuristics-based sweep planner that does not require knowledge of the material's properties and directly uses a height map representation to generate promising sweeps. These sweeps transform granular material from arbitrary start shapes into arbitrary target shapes. We apply the sweep planner in a fast and reactive feedback loop and avoid the need for model-based planning over multiple time steps. We validate our approach with a large set of simulation and hardware experiments where we show that our method is capable of efficiently solving several complex tasks, including gathering, separating, and shaping of several types of granular materials into different target shapes.
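
In one dimension, the optimal-transport cost between two normalized height profiles has a closed form (the monotone map is optimal), which hints at why optimal transport is a natural objective for scoring candidate sweeps. A toy sketch on unit-spaced bins, not the paper's planner:

```python
import numpy as np

def w1_heightmaps(h_src, h_tgt):
    """1-D Wasserstein-1 distance between two height profiles: normalize to
    unit mass, then sum the absolute CDF differences."""
    p = np.asarray(h_src, dtype=float); p /= p.sum()
    q = np.asarray(h_tgt, dtype=float); q /= q.sum()
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

# e.g. w1_heightmaps([3, 1, 0, 0], [0, 0, 1, 3]) == 2.5: mass travels 2.5 bins on average
```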

Transductive few-shot adapters for medical image segmentation

  • Authors: Julio Silva-Rodríguez, Jose Dolz, Ismail Ben Ayed
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17051
  • Pdf link: https://arxiv.org/pdf/2303.17051
  • Abstract
    With the recent rise of foundation models in computer vision and NLP, the pretrain-and-adapt strategy, where a large-scale model is fine-tuned on downstream tasks, is gaining popularity. However, traditional fine-tuning approaches may still require significant resources and yield sub-optimal results when the labeled data of the target task is scarce. This is especially the case in clinical settings. To address this challenge, we formalize few-shot efficient fine-tuning (FSEFT), a novel and realistic setting for medical image segmentation. Furthermore, we introduce a novel parameter-efficient fine-tuning strategy tailored to medical image segmentation, with (a) spatial adapter modules that are more appropriate for dense prediction tasks; and (b) a constrained transductive inference, which leverages task-specific prior knowledge. Our comprehensive experiments on a collection of public CT datasets for organ segmentation reveal the limitations of standard fine-tuning methods in few-shot scenarios, point to the potential of vision adapters and transductive inference, and confirm the suitability of foundation models.

A Tensor-based Convolutional Neural Network for Small Dataset Classification

  • Authors: Zhenhua Chen, David Crandall
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2303.17061
  • Pdf link: https://arxiv.org/pdf/2303.17061
  • Abstract
    Inspired by ConvNets with structured hidden representations, we propose a Tensor-based Neural Network, TCNN. Different from ConvNets, TCNNs are composed of structured neurons rather than scalar neurons, and the basic operation is the neuron tensor transformation. Unlike other structured ConvNets, where the part-whole relationships are modeled explicitly, the relationships are learned implicitly in TCNNs. Also, the structured neurons in TCNNs are high-rank tensors rather than vectors or matrices. We compare TCNNs with current popular ConvNets, including ResNets, MobileNets, EfficientNets, RegNets, etc., on CIFAR10, CIFAR100, and Tiny ImageNet. The experiments show that TCNNs have higher efficiency in terms of parameters. TCNNs also show higher robustness against white-box adversarial attacks on MNIST compared to ConvNets.

Reading Strategies for Graph Visualizations that Wrap Around in Torus Topology

  • Authors: Kun-Ting Chen, Quynh Quang Ngo, Kuno Kurzhals, Kim Marriott, Tim Dwyer, Michael Sedlmair, Daniel Weiskopf
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2303.17066
  • Pdf link: https://arxiv.org/pdf/2303.17066
  • Abstract
    We investigate reading strategies for node-link diagrams that wrap around the boundaries in a flattened torus topology by examining eye tracking data recorded in a previous controlled study. Prior work showed that torus drawing affords greater flexibility in clutter reduction than traditional node-link representations, but impedes link-and-path exploration tasks, while repeating tiles around boundaries aids comprehension. However, it remains unclear what strategies users apply in different wrapping settings. This is important for design implications for future work on more effective wrapped visualizations for network applications, and cyclic data that could benefit from wrapping. We perform visual-exploratory data analysis of gaze data, and conduct statistical tests derived from the patterns identified. Results show distinguishable gaze behaviors, with more visual glances and transitions between areas of interest in the non-replicated layout. Full-context has more successful visual searches than partial-context, but the gaze allocation indicates that the layout could be more space-efficient.

Dependent Task Offloading in Edge Computing Using GNN and Deep Reinforcement Learning

  • Authors: Zequn Cao, Xiaoheng Deng
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.17100
  • Pdf link: https://arxiv.org/pdf/2303.17100
  • Abstract
    Task offloading is a widely used technology in Mobile Edge Computing (MEC), which reduces the completion time of user tasks with the help of resourceful edge servers. Existing works mainly focus on the case where the computation density of a user task is homogeneous so that it can be offloaded in full or by percentage. However, various user tasks in real life consist of several interdependent subtasks, each of which is a minimum execution unit logically. Motivated by this gap, we aim to solve the Dependent Task Offloading (DTO) problem in the multi-user multi-edge scenario in this paper. We first use a Directed Acyclic Graph (DAG) to represent a dependent task, where nodes indicate subtasks and directed edges indicate dependencies among subtasks. Then we propose a scheme based on Graph Attention Network (GAT) and Deep Reinforcement Learning (DRL) to minimize the makespan of user tasks. To utilize GAT efficiently, we train it on the resource-rich cloud in an unsupervised style due to the large data and computation resource requirements. In addition, we design a multi-discrete action space for the DRL algorithm to enhance the applicability of our proposed scheme. Experiments are conducted on broadly distributed synthetic data. The results demonstrate that our proposed approach can be adapted to both simple and complex MEC environments and outperforms other methods.
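
A minimal sketch of the DAG representation: nodes are subtasks, directed edges are dependencies, and Kahn's algorithm yields the precedence order any offloading schedule must respect (the names below are illustrative, not the paper's interfaces):

```python
from collections import defaultdict, deque

def schedulable_order(subtasks, deps):
    """deps contains (u, v) meaning subtask u must finish before v starts;
    returns a topological order of the dependent-task DAG."""
    indeg = {s: 0 for s in subtasks}
    succ = defaultdict(list)
    for u, v in deps:
        succ[u].append(v)
        indeg[v] += 1
    ready = deque(s for s in subtasks if indeg[s] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order
```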

Deep Generative Model and Its Applications in Efficient Wireless Network Management: A Tutorial and Case Study

  • Authors: Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Abbas Jamalipour
  • Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17114
  • Pdf link: https://arxiv.org/pdf/2303.17114
  • Abstract
    With the phenomenal success of diffusion models and ChatGPT, deep generative models (DGMs) have been experiencing explosive growth since 2022. Not limited to content generation, DGMs are also widely adopted in the Internet of Things, the Metaverse, and digital twins, due to their outstanding ability to represent complex patterns and generate plausible samples. In this article, we explore the applications of DGMs in a crucial task, i.e., improving the efficiency of wireless network management. Specifically, we first overview generative AI, as well as three representative DGMs. Then, a DGM-empowered framework for wireless network management is proposed, in which we elaborate on the issues of conventional network management approaches, why DGMs can address them efficiently, and the step-by-step workflow for applying DGMs to managing wireless networks. Moreover, we conduct a case study on network economics, using the state-of-the-art DGM, i.e., the diffusion model, to generate effective contracts for incentivizing mobile AI-Generated Content (AIGC) services. Last but not least, we discuss important open directions for further research.

Conservation and stability in a discontinuous Galerkin method for the vector invariant spherical shallow water equations

  • Authors: Kieran Ricardo, David Lee, Kenneth Duru
  • Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
  • Arxiv link: https://arxiv.org/abs/2303.17120
  • Pdf link: https://arxiv.org/pdf/2303.17120
  • Abstract
    We develop a novel and efficient discontinuous Galerkin spectral element method (DG-SEM) for the spherical rotating shallow water equations in vector invariant form. We prove that the DG-SEM is energy stable and discretely conserves mass, vorticity, and linear geostrophic balance on general curvilinear meshes. These theoretical results are possible due to our novel entropy stable numerical DG fluxes for the shallow water equations in vector invariant form. We verify these results experimentally on a cubed sphere mesh. Additionally, we show that our method is robust, that is, it can be run stably without any added dissipation. The entropy stable fluxes are sufficient to control the grid-scale noise generated by geostrophic turbulence without the need for artificial stabilisation.
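
For reference, a standard statement of the rotating shallow water equations in vector invariant form, the formulation that the DG-SEM above discretizes (notation is ours and assumes a flat bottom; it is not quoted from the paper):

```latex
% Rotating shallow water equations, vector invariant form (flat bottom).
% u: velocity, h: fluid depth, f: Coriolis parameter, g: gravity,
% k: local vertical unit vector.
\begin{align}
  \partial_t \mathbf{u}
    + \left(f + \mathbf{k}\cdot\nabla\times\mathbf{u}\right)\mathbf{k}\times\mathbf{u}
    + \nabla\!\left(g h + \tfrac{1}{2}\lvert\mathbf{u}\rvert^{2}\right) &= 0, \\
  \partial_t h + \nabla\cdot\left(h\,\mathbf{u}\right) &= 0.
\end{align}
```

Mass, vorticity, and energy are invariants of this continuous form, which is what the entropy stable fluxes are designed to preserve discretely.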

C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation

  • Authors: Nazmul Karim, Niluthpol Chowdhury Mithun, Abhinav Rajvanshi, Han-pang Chiu, Supun Samarasekera, Nazanin Rahnavard
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17132
  • Pdf link: https://arxiv.org/pdf/2303.17132
  • Abstract
    Unsupervised domain adaptation (UDA) approaches focus on adapting models trained on a labeled source domain to an unlabeled target domain. UDA methods have a strong assumption that the source data is accessible during adaptation, which may not be feasible in many real-world scenarios due to privacy concerns and resource constraints of devices. In this regard, source-free domain adaptation (SFDA) excels, as access to source data is no longer required during adaptation. Recent state-of-the-art (SOTA) methods on SFDA mostly focus on pseudo-label-refinement-based self-training, which generally suffers from two issues: i) the inevitable occurrence of noisy pseudo-labels that can lead to memorization early in training, and ii) a refinement process that requires maintaining a memory bank, which creates a significant burden in resource-constrained scenarios. To address these concerns, we propose C-SFDA, a curriculum learning aided self-training framework for SFDA that adapts efficiently and reliably to changes across domains based on selective pseudo-labeling. Specifically, we employ a curriculum learning scheme to promote learning from a restricted amount of pseudo-labels selected based on their reliability. This simple yet effective step successfully prevents label noise propagation during different stages of adaptation and eliminates the need for costly memory-bank-based label refinement. Our extensive experimental evaluations on both image recognition and semantic segmentation tasks confirm the effectiveness of our method. C-SFDA is readily applicable to online test-time domain adaptation and also outperforms previous SOTA methods in this task.
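
A minimal sketch of the selective pseudo-labeling idea behind C-SFDA: admit only the most reliable target predictions, with the admitted fraction growing over curriculum stages. The confidence proxy, fractions, and shapes are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def select_pseudo_labels(probs: np.ndarray, keep_frac: float) -> np.ndarray:
    """Return indices of the most reliable predictions.

    probs: (N, C) softmax outputs on unlabeled target data.
    keep_frac: fraction of samples admitted at the current curriculum stage.
    """
    confidence = probs.max(axis=1)          # reliability proxy (illustrative)
    k = max(1, int(keep_frac * len(probs)))
    return np.argsort(-confidence)[:k]      # top-k most confident samples

rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Curriculum: admit more pseudo-labels as adaptation progresses.
for stage, frac in enumerate([0.1, 0.3, 0.6]):
    idx = select_pseudo_labels(probs, frac)
    pseudo = probs[idx].argmax(axis=1)      # labels used for self-training
    print(f"stage {stage}: {len(idx)} pseudo-labeled samples")
```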

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving

  • Authors: Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17144
  • Pdf link: https://arxiv.org/pdf/2303.17144
  • Abstract
    Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research. To address this gap, we present DAMO-StreamNet, an optimized framework that combines recent advances from the YOLO series with a comprehensive analysis of spatial and temporal perception mechanisms, delivering a cutting-edge solution. The key innovations of DAMO-StreamNet are: (1) A robust neck structure incorporating deformable convolution, enhancing the receptive field and feature alignment capabilities. (2) A dual-branch structure that integrates short-path semantic features and long-path temporal features, improving motion state prediction accuracy. (3) Logits-level distillation for efficient optimization, aligning the logits of teacher and student networks in semantic space. (4) A real-time forecasting mechanism that updates support frame features with the current frame, ensuring seamless streaming perception during inference. Our experiments demonstrate that DAMO-StreamNet surpasses existing state-of-the-art methods, achieving 37.8% (normal size (600, 960)) and 43.3% (large size (1200, 1920)) sAP without using extra data. This work not only sets a new benchmark for real-time perception but also provides valuable insights for future research. Additionally, DAMO-StreamNet can be applied to various autonomous systems, such as drones and robots, paving the way for real-time perception.
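
A minimal sketch of the logits-level distillation in point (3): align student logits with teacher logits via a temperature-scaled KL divergence. The temperature, shapes, and loss weighting are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distill_kl(teacher_logits, student_logits, tau=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits / tau)
    log_p = np.log(p + 1e-12)
    log_q = np.log(softmax(student_logits / tau) + 1e-12)
    # tau^2 keeps gradient magnitudes comparable across temperatures
    return tau**2 * np.mean((p * (log_p - log_q)).sum(axis=-1))

rng = np.random.default_rng(0)
t = rng.normal(size=(8, 80))   # teacher logits: 8 anchors, 80 classes (illustrative)
s = rng.normal(size=(8, 80))   # student logits
print("distillation loss:", distill_kl(t, s))
```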

Convergence of the CEM-GMsFEM for compressible flow in highly heterogeneous media

  • Authors: Leonardo A. Poveda, Shubin Fu, Eric T. Chung, Lina Zhao
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.17157
  • Pdf link: https://arxiv.org/pdf/2303.17157
  • Abstract
    This paper presents and analyses a Constraint Energy Minimization Generalized Multiscale Finite Element Method (CEM-GMsFEM) for solving single-phase non-linear compressible flows in highly heterogeneous media. The construction of CEM-GMsFEM hinges on two crucial steps: First, the auxiliary space is constructed by solving local spectral problems, where the basis functions corresponding to small eigenvalues are captured. Then the basis functions are obtained by solving local energy minimization problems over the oversampling domains using the auxiliary space. The basis functions have exponential decay outside the corresponding local oversampling regions. The convergence of the proposed method is provided, and we show that this convergence only depends on the coarse grid size and is independent of the heterogeneities. An online enrichment guided by an a posteriori error estimator is developed to enhance computational efficiency. Several numerical experiments on a three-dimensional case are presented to confirm the theoretical findings, illustrating the performance of the method and giving efficient and accurate numerical results.

Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models

  • Authors: Sifan Long, Zhen Zhao, Junkun Yuan, Zichang Tan, Jiangjiang Liu, Luping Zhou, Shengsheng Wang, Jingdong Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17169
  • Pdf link: https://arxiv.org/pdf/2303.17169
  • Abstract
    Prompt learning has become one of the most efficient paradigms for adapting large pre-trained vision-language models to downstream tasks. Current state-of-the-art methods, like CoOp and ProDA, tend to adopt soft prompts to learn an appropriate prompt for each specific task. The more recent CoCoOp further boosts base-to-new generalization performance via an image-conditional prompt. However, it directly fuses identical image semantics into the prompts of different labels and significantly weakens the discrimination among different classes, as shown in our experiments. Motivated by this observation, we first propose a class-aware text prompt (CTP) to enrich generated prompts with label-related image information. Unlike CoCoOp, CTP can effectively involve image semantics and avoid introducing extra ambiguities into different prompts. On the other hand, instead of retaining the complete image representations, we propose text-guided feature tuning (TFT) to make the image branch attend to class-related representations. A contrastive loss is employed to align such augmented text and image representations on downstream tasks. In this way, the image-to-text CTP and text-to-image TFT can be mutually promoted to enhance the adaptation of VLMs for downstream tasks. Extensive experiments demonstrate that our method outperforms the existing methods by a significant margin. In particular, compared to CoCoOp, we achieve an average improvement of 4.03% on new classes and 3.19% on harmonic mean over eleven classification benchmarks.

High-Performance Low-Complexity Hierarchical Frequency Synchronization for Distributed Massive MIMO-OFDMA Systems

  • Authors: Xiao-Yang Wang, Shaoshi Yang, Tian-Hao Yuan, Hou-Yu Zhai, Jianhua Zhang, Lajos Hanzo
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2303.17188
  • Pdf link: https://arxiv.org/pdf/2303.17188
  • Abstract
    We propose a high-performance yet low-complexity hierarchical frequency synchronization scheme for orthogonal frequency-division multiple-access (OFDMA) aided distributed massive multi-input multi-output (MIMO) systems, where multiple carrier frequency offsets (CFOs) have to be estimated in the uplink. To solve this multi-CFO estimation problem efficiently, we classify the active antenna units (AAUs) as the master and the slaves. Then, we split the scheme into two stages. During the first stage the distributed slave AAUs are synchronized with the master AAU, while the user equipment (UE) is synchronized with the closest slave AAU during the second stage. The mean square error (MSE) performance of our scheme is better than that of the representative state-of-the-art baseline schemes, while its computational complexity is substantially lower.
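
For background, a minimal sketch of classic correlation-based CFO estimation on a repeated preamble (a Moose-style estimator), the kind of building block a two-stage hierarchical scheme like this one stacks; the preamble design and values are illustrative, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64                                  # half-preamble length (samples)
eps_true = 0.005                        # CFO in cycles/sample; unambiguous for |eps| < 1/(2N)

half = np.exp(1j * 2 * np.pi * rng.random(N))    # unit-modulus preamble half
tx = np.concatenate([half, half])                # repeated preamble
n = np.arange(2 * N)
rx = tx * np.exp(1j * 2 * np.pi * eps_true * n)  # apply CFO
rx += 0.05 * (rng.normal(size=2 * N) + 1j * rng.normal(size=2 * N))  # noise

# The second half equals the first rotated by 2*pi*eps*N, so estimate
# eps from the angle of their correlation.
corr = np.vdot(rx[:N], rx[N:])          # sum of conj(r1) * r2
eps_hat = np.angle(corr) / (2 * np.pi * N)
print(f"true CFO {eps_true:.4f}, estimated {eps_hat:.4f}")
```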

Practical self-supervised continual learning with continual fine-tuning

  • Authors: Chi Ian Tang, Lorena Qendro, Dimitris Spathis, Fahim Kawsar, Cecilia Mascolo, Akhil Mathur
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17235
  • Pdf link: https://arxiv.org/pdf/2303.17235
  • Abstract
    Self-supervised learning (SSL) has shown remarkable performance in computer vision tasks when trained offline. However, in a Continual Learning (CL) scenario where new data is introduced progressively, models still suffer from catastrophic forgetting. Retraining a model from scratch to adapt to newly generated data is time-consuming and inefficient. Previous approaches suggested re-purposing self-supervised objectives with knowledge distillation to mitigate forgetting across tasks, assuming that labels from all tasks are available during fine-tuning. In this paper, we generalize self-supervised continual learning to a practical setting where available labels can be leveraged at any step of the SSL process. With an increasing number of continual tasks, this offers more flexibility in the pre-training and fine-tuning phases. We introduce Kaizen, a training architecture that is able to mitigate catastrophic forgetting for both the feature extractor and classifier with a carefully designed loss function. By using a set of comprehensive evaluation metrics reflecting different aspects of continual learning, we demonstrate that Kaizen significantly outperforms previous SSL models in competitive vision benchmarks, with up to 16.5% accuracy improvement on split CIFAR-100. Kaizen is able to balance the trade-off between knowledge retention and learning from new data with an end-to-end model, paving the way for practical deployment of continual learning systems.

Simultaneous reconstruction of sound speed and nonlinearity parameter in a paraxial model of vibro-acoustography in frequency domain

  • Authors: Barbara Kaltenbacher and Teresa Rauscher
  • Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
  • Arxiv link: https://arxiv.org/abs/2303.17236
  • Pdf link: https://arxiv.org/pdf/2303.17236
  • Abstract
    In this paper we consider the inverse problem of vibro-acoustography, a technique for enhancing ultrasound imaging by making use of nonlinear effects. It amounts to determining two spatially variable coefficients in a system of PDEs describing propagation of two directed sound beams and the wave resulting from their nonlinear interaction. To justify the use of Newton's method for solving this inverse problem, on one hand we verify well-definedness and differentiability of the forward operator corresponding to two versions of the PDE model; on the other hand we consider an all-at-once formulation of the inverse problem and prove convergence of Newton's method for its solution.

Computationally efficient predictive control based on ANN state-space model

  • Authors: Jan H. Hoekstra, Bence Cseppentő, Gerben I. Beintema, Maarten Schoukens, Zsolt Kollár, Roland Tóth
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.17305
  • Pdf link: https://arxiv.org/pdf/2303.17305
  • Abstract
    Artificial neural networks (ANN) have been shown to be flexible and effective function estimators for the identification of nonlinear state-space models. However, if the resulting models are used directly for nonlinear model predictive control (NMPC), the resulting nonlinear optimization problem is often overly complex due to the size of the network, requires the use of high-order observers to track the states of the ANN model, and the overall control scheme exploits little of the structural properties or available autograd tools for these models. In this paper, we propose an efficient approach to auto-convert ANN state-space models to linear parameter-varying (LPV) form and solve predictive control problems by successive solutions of linear model predictive problems, corresponding to quadratic programs (QPs). Furthermore, we show how existing ANN identification methods, such as the SUBNET method that uses a state encoder, can provide efficient implementations of such MPCs. The performance of the proposed approach is demonstrated via a simulation study on an unbalanced disc system.

Masked Autoencoders as Image Processors

  • Authors: Huiyu Duan, Wei Shen, Xiongkuo Min, Danyang Tu, Long Teng, Jia Wang, Guangtao Zhai
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17316
  • Pdf link: https://arxiv.org/pdf/2303.17316
  • Abstract
    Transformers have shown significant effectiveness for various vision tasks including both high-level vision and low-level vision. Recently, masked autoencoders (MAE) for feature pre-training have further unleashed the potential of Transformers, leading to state-of-the-art performances on various high-level vision tasks. However, the significance of MAE pre-training on low-level vision tasks has not been sufficiently explored. In this paper, we show that masked autoencoders are also scalable self-supervised learners for image processing tasks. We first present an efficient Transformer model considering both channel attention and shifted-window-based self-attention termed CSformer. Then we develop an effective MAE architecture for image processing (MAEIP) tasks. Extensive experimental results show that with the help of MAEIP pre-training, our proposed CSformer achieves state-of-the-art performance on various image processing tasks, including Gaussian denoising, real image denoising, single-image motion deblurring, defocus deblurring, and image deraining.

Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence

  • Authors: Anton Thielmann, Quentin Seifert, Arik Reuter, Elisabeth Bergherr, Benjamin Säfken
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2303.17324
  • Pdf link: https://arxiv.org/pdf/2303.17324
  • Abstract
    Extracting and identifying latent topics in large text corpora has gained increasing importance in Natural Language Processing (NLP). Most models, whether probabilistic models similar to Latent Dirichlet Allocation (LDA) or neural topic models, follow the same underlying approach of topic interpretability and topic extraction. We propose a method that incorporates a deeper understanding of both sentence and document themes, and goes beyond simply analyzing word frequencies in the data. This allows our model to detect latent topics that may include uncommon words or neologisms, as well as words not present in the documents themselves. Additionally, we propose several new evaluation metrics based on intruder words and similarity measures in the semantic space. We present correlation coefficients with human identification of intruder words and achieve near-human level results at the word-intrusion task. We demonstrate the competitive performance of our method with a large benchmark study, and achieve superior results compared to state-of-the-art topic modeling and document clustering models.

Linear Insertion Deletion Codes in the High-Noise and High-Rate Regimes

  • Authors: Kuan Cheng, Zhengzhong Jin, Xin Li, Zhide Wei, Yu Zheng
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2303.17370
  • Pdf link: https://arxiv.org/pdf/2303.17370
  • Abstract
    This work continues the study of linear error correcting codes against adversarial insertion deletion errors (insdel errors). Previously, the work of Cheng, Guruswami, Haeupler, and Li \cite{CGHL21} showed the existence of asymptotically good linear insdel codes that can correct arbitrarily close to $1$ fraction of errors over some constant size alphabet, or achieve rate arbitrarily close to $1/2$ even over the binary alphabet. As shown in \cite{CGHL21}, these bounds are also the best possible. However, known explicit constructions in \cite{CGHL21}, and subsequent improved constructions by Con, Shpilka, and Tamo \cite{9770830} all fall short of meeting these bounds. Over any constant size alphabet, they can only achieve rate $< 1/8$ or correct $< 1/4$ fraction of errors; over the binary alphabet, they can only achieve rate $< 1/1216$ or correct $< 1/54$ fraction of errors. Apparently, previous techniques face inherent barriers to achieve rate better than $1/4$ or correct more than $1/2$ fraction of errors. In this work we give new constructions of such codes that meet these bounds, namely, asymptotically good linear insdel codes that can correct arbitrarily close to $1$ fraction of errors over some constant size alphabet, and binary asymptotically good linear insdel codes that can achieve rate arbitrarily close to $1/2$. All our constructions are efficiently encodable and decodable. Our constructions are based on a novel approach of code concatenation, which embeds the index information implicitly into codewords. This significantly differs from previous techniques and may be of independent interest. Finally, we also prove the existence of linear concatenated insdel codes with parameters that match random linear codes, and propose a conjecture about linear insdel codes.

Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions

  • Authors: Yicheng Luo, Jackie Kay, Edward Grefenstette, Marc Peter Deisenroth
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17396
  • Pdf link: https://arxiv.org/pdf/2303.17396
  • Abstract
    Offline reinforcement learning (RL) allows for the training of competent agents from offline datasets without any interaction with the environment. Online finetuning of such offline models can further improve performance. But how should we ideally finetune agents obtained from offline RL training? While offline RL algorithms can in principle be used for finetuning, in practice, their online performance improves slowly. In contrast, we show that it is possible to use standard online off-policy algorithms for faster improvement. However, we find this approach may suffer from policy collapse, where the policy undergoes severe performance deterioration during initial online learning. We investigate the issue of policy collapse and how it relates to data diversity, algorithm choices and online replay distribution. Based on these insights, we propose a conservative policy optimization procedure that can achieve stable and sample-efficient online learning from offline pretraining.

An Efficient Mobile Gateway Selection and Discovery Based-Routing Protocol in Heterogeneous LTE-VANET Networks

  • Authors: Driss Abada, Rachid Adrdor, Omar Boutkhoum, Adil Bohouch
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2303.17439
  • Pdf link: https://arxiv.org/pdf/2303.17439
  • Abstract
    Coupling cellular communication networks with vehicular ad hoc networks (VANET) is a promising way to provide Internet access to vehicles on the road. However, due to several specific characteristics of VANETs, efficient multi-hop routing from vehicular sources to Internet gateways through Long Term Evolution (LTE) technology is still challenging. In this paper, an Internet mobile gateway selection scheme is proposed to elect the most suitable vehicles to act as gateways to the Internet in VANETs. The discovery and selection of routes to those mobile gateways are then carried out via an efficient multi-metric relay selection mechanism. The objective is to select the most reliable route to the mobile gateways, reducing communication overhead and performing seamless handover. The proposed protocol is compared with one recent protocol in terms of packet delivery ratio, average end-to-end delay, and overhead. The results show that the proposed protocol significantly improves network performance compared to the other protocol.

NN-Copula-CD: A Copula-Guided Interpretable Neural Network for Change Detection in Heterogeneous Remote Sensing Images

  • Authors: Weiming Li, Xueqian Wang, Gang Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2303.17448
  • Pdf link: https://arxiv.org/pdf/2303.17448
  • Abstract
    Change detection (CD) in heterogeneous remote sensing images is a practical and challenging issue for real-life emergencies. In the past decade, the heterogeneous CD problem has significantly benefited from the development of deep neural networks (DNN). However, data-driven DNNs always perform like a black box, where the lack of interpretability limits the trustworthiness and controllability of DNNs in most practical CD applications. As a strong knowledge-driven tool for measuring correlation between random variables, Copula theory has been introduced into CD, yet its performance is not robust without manual prior selection of the Copula function. To address the above issues, we propose a knowledge-data-driven heterogeneous CD method (NN-Copula-CD) based on a Copula-guided interpretable neural network. In our NN-Copula-CD, the mathematical characteristics of the Copula are designed as losses to supervise a simple fully connected neural network to learn the correlation between bi-temporal image patches, and then the changed regions are identified via binary classification of the correlation coefficients of all image patch pairs of the bi-temporal images. We conduct in-depth experiments on three datasets with multimodal images (e.g., Optical, SAR, and NIR), where the quantitative results and visualized analysis demonstrate both the effectiveness and interpretability of the proposed NN-Copula-CD.

HMES: A Scalable Human Mobility and Epidemic Simulation System with Fast Intervention Modeling

  • Authors: Haoyu Geng, Guanjie Zheng, Zhengqing Han, Hua Wei, Zhenhui Li
  • Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2303.17464
  • Pdf link: https://arxiv.org/pdf/2303.17464
  • Abstract
    Recently, the world has witnessed the most severe pandemic (COVID-19) in this century. Studies on epidemic prediction and simulation have received increasing attention. However, current methods suffer from three issues. First, most existing studies focus on epidemic prediction, which cannot provide adequate support for intervention policy making. Second, most current interventions are based on population groups rather than fine-grained individuals, so measures cannot be targeted at the infected people and medical resources may be wasted. Third, current simulations are not efficient and flexible enough for large-scale complex systems. In this paper, we propose a new epidemic simulation framework called HMES to address the above three challenges. The proposed framework covers a full pipeline of epidemic simulation and enables comprehensive fine-grained control at a large scale. In addition, we conduct experiments on real COVID-19 data. HMES demonstrates more accurate modeling of disease transmission for populations of up to 300 million people, with up to 3x acceleration compared to state-of-the-art methods.

PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation

  • Authors: Qitao Zhao, Ce Zheng, Mengyuan Liu, Pichao Wang, Chen Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17472
  • Pdf link: https://arxiv.org/pdf/2303.17472
  • Abstract
    Recently, transformer-based methods have gained significant success in sequential 2D-to-3D lifting human pose estimation. As a pioneering work, PoseFormer captures spatial relations of human joints in each video frame and human dynamics across frames with cascaded transformer layers and has achieved impressive performance. However, in real scenarios, the performance of PoseFormer and its follow-ups is limited by two factors: (a) The length of the input joint sequence; (b) The quality of 2D joint detection. Existing methods typically apply self-attention to all frames of the input sequence, causing a huge computational burden when the frame number is increased to obtain advanced estimation accuracy, and they are not robust to noise naturally brought by the limited capability of 2D joint detectors. In this paper, we propose PoseFormerV2, which exploits a compact representation of lengthy skeleton sequences in the frequency domain to efficiently scale up the receptive field and boost robustness to noisy 2D joint detection. With minimum modifications to PoseFormer, the proposed method effectively fuses features both in the time domain and frequency domain, enjoying a better speed-accuracy trade-off than its precursor. Extensive experiments on two benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) demonstrate that the proposed approach significantly outperforms the original PoseFormer and other transformer-based variants. Code is released at https://github.com/QitaoZhao/PoseFormerV2.
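
A minimal sketch of the frequency-domain idea: a lengthy, noisy joint trajectory is summarized by its low-frequency DCT coefficients, which is compact and suppresses high-frequency detector noise. The motion signal and cutoff are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
T, J = 243, 17                      # frames, joints (e.g., one coordinate per joint)
t = np.linspace(0, 1, T)[:, None]
clean = np.sin(2 * np.pi * (1 + np.arange(J)) * t / 4)    # smooth synthetic motion
noisy = clean + 0.1 * rng.normal(size=(T, J))             # 2D-detector-style noise

K = 27                              # low-frequency coefficients kept (~1/9 of T)
coef = dct(noisy, axis=0, norm="ortho")
compact = coef[:K]                  # (K, J): the compact representation

# Reconstruct to check how much motion the K coefficients retain.
padded = np.zeros_like(coef)
padded[:K] = compact
recon = idct(padded, axis=0, norm="ortho")
print("relative error vs clean:",
      np.linalg.norm(recon - clean) / np.linalg.norm(clean))
```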

Efficient distributed representations beyond negative sampling

  • Authors: Lorenzo Dall'Amico, Enrico Maria Belliardo
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2303.17475
  • Pdf link: https://arxiv.org/pdf/2303.17475
  • Abstract
    This article describes an efficient method to learn distributed representations, also known as embeddings. This is accomplished by minimizing an objective function similar to the one introduced in the Word2Vec algorithm and later adopted in several works. The computational bottleneck of the optimization is the calculation of the softmax normalization constants, which requires a number of operations scaling quadratically with the sample size. This complexity is unsuited for large datasets, and negative sampling is a popular workaround, allowing one to obtain distributed representations in linear time with respect to the sample size. Negative sampling, however, changes the loss function and hence solves a different optimization problem from the one originally proposed. Our contribution is to show that the softmax normalization constants can be estimated in linear time, allowing us to design an efficient optimization strategy to learn distributed representations. We test our approximation on two popular applications related to word and node embeddings. The results show accuracy competitive with negative sampling at a remarkably lower computational time.
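
The bottleneck named above, in a minimal numpy sketch: computing every softmax normalization constant exactly requires all n^2 pairwise scores. The paper's contribution is a linear-time estimate of these constants; only the exact quadratic baseline is shown here, with illustrative shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 64
U = rng.normal(size=(n, d)) / np.sqrt(d)   # "input" embeddings
V = rng.normal(size=(n, d)) / np.sqrt(d)   # "output" embeddings

# Exact normalization constants: Z_i = sum_j exp(u_i . v_j).
# The n x n score matrix makes this O(n^2 d) time and O(n^2) memory,
# which is exactly what becomes infeasible for large corpora/graphs.
Z = np.exp(U @ V.T).sum(axis=1)

# Softmax probability of pair (i, j) under the Word2Vec-style objective:
i, j = 3, 17
p_ij = np.exp(U[i] @ V[j]) / Z[i]
print(p_ij)
```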

Teaching contact-rich tasks from visual demonstrations by constraint extraction

  • Authors: Christian Hegeler, Filippo Rozzi, Loris Roveda, Kevin Haninger
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17481
  • Pdf link: https://arxiv.org/pdf/2303.17481
  • Abstract
    Contact-rich manipulation involves kinematic constraints on the task motion, typically with discrete transitions between these constraints during the task. Allowing the robot to detect and reason about these contact constraints can support robust and dynamic manipulation, but how can these contact models be efficiently learned? Purely visual observations are an attractive data source, allowing passive task demonstrations with unmodified objects. Existing approaches for vision-only learning from demonstration are effective in pick-and-place applications and planar tasks. Nevertheless, limited accuracy, occlusions, and unobserved task dynamics can limit their robustness in contact-rich manipulation. To use visual demonstrations for contact-rich robotic tasks, we consider the demonstration of pose trajectories with transitions between holonomic kinematic constraints, first clustering the trajectories into discrete contact modes, then fitting kinematic constraints for each mode. The fitted constraints are then used to (i) detect contact online with force/torque measurements and (ii) plan the robot policy with respect to the active constraint. We demonstrate the approach with real experiments on cabling and rake tasks, showing that it yields robust manipulation through contact transitions.

Edge Ranking of Graphs in Transportation Networks using a Graph Neural Network (GNN)

  • Authors: Debasish Jana, Sven Malama, Sriram Narasimhan, Ertugrul Taciroglu
  • Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17485
  • Pdf link: https://arxiv.org/pdf/2303.17485
  • Abstract
    Many networks, such as transportation, power, and water distribution, can be represented as graphs. A crucial challenge in graph representations is identifying the importance of graph edges and their influence on overall network efficiency and information flow performance. For example, important edges in a transportation network are those roads that, when affected, will significantly alter the network's overall efficiency. A commonly used approach to finding such important edges is "edge betweenness centrality" (EBC), an edge ranking measure that determines the influential edges of the graph based on connectivity and information spread. Computing the EBC with the common Brandes algorithm involves calculating the shortest paths for every node pair, which can be computationally expensive and restrictive, especially for large graphs. Changes in the graph parameters, e.g., in the edge weights or the addition and deletion of nodes or edges, require the recalculation of the EBC. As the main contribution, we propose an approximate method to estimate the EBC using a Graph Neural Network (GNN), a deep learning-based approach. We show that it is computationally efficient compared to the conventional method, especially for large graphs. The proposed method of GNN-based edge ranking is evaluated on several synthetic graphs and a real-world transportation data set. We show that this framework can estimate the approximate edge ranking much faster than the conventional method. This approach is inductive, i.e., training and testing are performed on different sets of graphs with varying numbers of nodes and edges. The proposed method is especially suitable for applications on large-scale networks when edge information is desired, for example, in urban infrastructure improvement projects, power, and water network resilience analyses, and optimizing resource allocations in engineering networks.
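
The conventional baseline referenced above can be computed directly with networkx's Brandes-based implementation; this exact quantity is what the proposed GNN learns to approximate. The toy weighted graph is illustrative.

```python
import networkx as nx

# Toy road network: nodes are intersections, weights are travel times.
G = nx.Graph()
G.add_weighted_edges_from([
    (0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0),
    (0, 4, 2.0), (4, 3, 2.0), (1, 4, 1.5),
])

# Exact edge betweenness centrality via Brandes' algorithm: it runs
# shortest paths from every node, which is what becomes expensive on
# large graphs and motivates the learned approximation.
ebc = nx.edge_betweenness_centrality(G, weight="weight", normalized=True)
for edge, score in sorted(ebc.items(), key=lambda kv: -kv[1]):
    print(edge, round(score, 3))
```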

3D Line Mapping Revisited

  • Authors: Shaohui Liu, Yifan Yu, Rémi Pautrat, Marc Pollefeys, Viktor Larsson
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17504
  • Pdf link: https://arxiv.org/pdf/2303.17504
  • Abstract
    In contrast to sparse keypoints, a handful of line segments can concisely encode the high-level scene layout, as they often delineate the main structural elements. In addition to offering strong geometric cues, they are also omnipresent in urban landscapes and indoor scenes. Despite their apparent advantages, current line-based reconstruction methods are far behind their point-based counterparts. In this paper we aim to close the gap by introducing LIMAP, a library for 3D line mapping that robustly and efficiently creates 3D line maps from multi-view imagery. This is achieved through revisiting the degeneracy problem of line triangulation, carefully crafted scoring and track building, and exploiting structural priors such as line coincidence, parallelism, and orthogonality. Our code integrates seamlessly with existing point-based Structure-from-Motion methods and can leverage their 3D points to further improve the line reconstruction. Furthermore, as a byproduct, the method is able to recover 3D association graphs between lines and points / vanishing points (VPs). In thorough experiments, we show that LIMAP significantly outperforms existing approaches for 3D line mapping. Our robust 3D line maps also open up new research directions. We show two example applications: visual localization and bundle adjustment, where integrating lines alongside points yields the best results. Code is available at https://github.com/cvg/limap.

Sum-of-Squares Lower Bounds for Densest $k$-Subgraph

  • Authors: Chris Jones, Aaron Potechin, Goutham Rajendran, Jeff Xu
  • Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2303.17506
  • Pdf link: https://arxiv.org/pdf/2303.17506
  • Abstract
    Given a graph and an integer $k$, Densest $k$-Subgraph is the algorithmic task of finding the subgraph on $k$ vertices with the maximum number of edges. This is a fundamental problem that has been subject to intense study for decades, with applications spanning a wide variety of fields. The state-of-the-art algorithm is an $O(n^{1/4 + \epsilon})$-factor approximation (for any $\epsilon > 0$) due to Bhaskara et al. [STOC '10]. Moreover, the so-called log-density framework predicts that this is optimal, i.e. it is impossible for an efficient algorithm to achieve an $O(n^{1/4 - \epsilon})$-factor approximation. In the average case, Densest $k$-Subgraph is a prototypical noisy inference task which is conjectured to exhibit a statistical-computational gap. In this work, we provide the strongest evidence yet of hardness for Densest $k$-Subgraph by showing matching lower bounds against the powerful Sum-of-Squares (SoS) algorithm, a meta-algorithm based on convex programming that achieves state-of-art algorithmic guarantees for many optimization and inference problems. For $k \leq n^{\frac{1}{2}}$, we obtain a degree $n^{\delta}$ SoS lower bound for the hard regime as predicted by the log-density framework. To show this, we utilize the modern framework for proving SoS lower bounds on average-case problems pioneered by Barak et al. [FOCS '16]. A key issue is that small denser-than-average subgraphs in the input will greatly affect the value of the candidate pseudoexpectation operator around the subgraph. To handle this challenge, we devise a novel matrix factorization scheme based on the positive minimum vertex separator. We then prove an intersection tradeoff lemma to show that the error terms when using this separator are indeed small.

Learning in Factored Domains with Information-Constrained Visual Representations

  • Authors: Tyler Malloy, Miao Liu, Matthew D. Riemer, Tim Klinger, Gerald Tesauro, Chris R. Sims
  • Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Neurons and Cognition (q-bio.NC)
  • Arxiv link: https://arxiv.org/abs/2303.17508
  • Pdf link: https://arxiv.org/pdf/2303.17508
  • Abstract
    Humans learn quickly even in tasks that contain complex visual information. This is due in part to the efficient formation of compressed representations of visual information, allowing for better generalization and robustness. However, compressed representations alone are insufficient for explaining the high speed of human learning. Reinforcement learning (RL) models that seek to replicate this impressive efficiency may do so through the use of factored representations of tasks. These informationally simplistic task representations are motivated similarly to the use of compressed representations of visual information. Recent studies have connected biological visual perception to disentangled and compressed representations. This raises the question of how humans learn to efficiently represent visual information in a manner useful for learning tasks. In this paper we present a model of human factored representation learning based on an altered form of a $\beta$-Variational Auto-encoder used in a visual learning task. Modelling results demonstrate a trade-off in the informational complexity of the model's latent space between the speed of learning and the accuracy of reconstructions.
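
A minimal sketch of the β-VAE objective underlying the model: reconstruction error plus a β-weighted KL term, where β controls the informational complexity of the latent space. Shapes, the diagonal-Gaussian posterior, and β are illustrative assumptions.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Reconstruction + beta * KL(q(z|x) || N(0, I)), averaged over the batch."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)                    # MSE term
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=-1)
    return np.mean(recon + beta * kl)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 784))
x_recon = x + 0.1 * rng.normal(size=(32, 784))
mu = rng.normal(scale=0.1, size=(32, 10))      # posterior means, 10-dim latent
logvar = rng.normal(scale=0.1, size=(32, 10))  # posterior log-variances

# Larger beta pressures the encoder toward simpler (more compressed,
# more disentangled) latents at the cost of reconstruction accuracy,
# which is exactly the trade-off the abstract describes.
print(beta_vae_loss(x, x_recon, mu, logvar, beta=4.0))
```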

Hybrid Dealiasing of Complex Convolutions

  • Authors: Noel Murasko, John C. Bowman
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.17510
  • Pdf link: https://arxiv.org/pdf/2303.17510
  • Abstract
    Efficient algorithms for computing linear convolutions based on the fast Fourier transform are developed. A hybrid approach is described that combines the conventional practice of explicit dealiasing (explicitly padding the input data with zeros) and implicit dealiasing (mathematically accounting for these zero values). The new approach generalizes implicit dealiasing to arbitrary padding ratios and includes explicit dealiasing as a special case. Unlike existing implementations of implicit dealiasing, hybrid dealiasing tailors its subtransform sizes to the convolution geometry. Multidimensional convolutions are implemented with hybrid dealiasing by decomposing them into lower-dimensional convolutions. Convolutions of complex-valued and Hermitian inputs of equal length are illustrated with pseudocode and implemented in the open-source FFTW++ library. Hybrid dealiasing is shown to outperform explicit dealiasing in one, two, and three dimensions.
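
A minimal sketch of the conventional explicit dealiasing that the paper generalizes: zero-pad two length-N inputs to 2N-1 so the circular convolution computed via the FFT equals the linear convolution. Implicit and hybrid dealiasing account for the zeros mathematically instead of materializing them.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
f = rng.normal(size=N) + 1j * rng.normal(size=N)   # complex-valued inputs
g = rng.normal(size=N) + 1j * rng.normal(size=N)

# Explicit dealiasing: pad inputs with zeros so the FFT's circular
# convolution coincides with the linear convolution of length 2N-1.
M = 2 * N - 1
conv_fft = np.fft.ifft(np.fft.fft(f, M) * np.fft.fft(g, M))

assert np.allclose(conv_fft, np.convolve(f, g))    # matches direct O(N^2) result
print(np.round(conv_fft[:4], 3))
```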

Power-Optimal HARQ Protocol for Reliable Free Space Optical Communication

  • Authors: Georgios D. Chondrogiannis, Nikos A. Mitsiou, Nestor D. Chatzidiamantis, Alexandros-Apostolos A. Boulogeorgos, George K. Karagiannidis
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2303.17512
  • Pdf link: https://arxiv.org/pdf/2303.17512
  • Abstract
    This paper investigates the use of hybrid automatic repeat request (HARQ) protocols for power-efficient and reliable communications over free space optical (FSO) links. By exploiting the large coherence time of the FSO channel, the proposed transmission schemes combat turbulence-induced fading by retransmitting the failed packets in the same coherence interval. To assess the performance of the presented HARQ technique, we derive a theoretical framework for the outage performance. In more detail, a closed-form expression for the outage probability (OP) is reported and an approximation for the high signal-to-noise ratio (SNR) region is extracted. Building upon the theoretical framework, we formulate a transmission power allocation problem throughout the retransmission rounds. This optimization problem is solved numerically through the use of an iterative algorithm. In addition, the average throughput of the HARQ schemes under consideration is examined. Simulation results validate the theoretical analysis under different turbulence conditions and demonstrate the performance improvement, in terms of both OP and throughput, of the proposed HARQ schemes compared to fixed transmit power HARQ benchmarks.

Nonlinear Approximation with Subsampled Rank-1 Lattices

  • Authors: Felix Bartel, Fabian Taubert
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.17541
  • Pdf link: https://arxiv.org/pdf/2303.17541
  • Abstract
    In this paper we approximate high-dimensional functions $f\colon\mathbb T^d\to\mathbb C$ by sparse trigonometric polynomials based on function evaluations. Recently it was shown that a dimension-incremental sparse Fourier transform (SFT) approach does not require the signal to be exactly sparse and is applicable in this setting. We combine this approach with subsampling techniques for rank-1 lattices. This way our approach benefits from the underlying structure in the sampling points making fast Fourier algorithms applicable whilst achieving the good sampling complexity of random points (logarithmic oversampling). In our analysis we show detection guarantees of the frequencies corresponding to the Fourier coefficients of largest magnitude. In numerical experiments we make a comparison to full rank-1 lattices and uniformly random points to confirm our findings.
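
A minimal sketch of rank-1 lattice sampling: all M nodes come from a single generating vector z, so a d-dimensional exponential restricted to the lattice collapses to a one-dimensional exponential, making FFT-based evaluation possible. The generating vector here is an illustrative choice, not one from the paper.

```python
import numpy as np

d, M = 3, 257                    # dimension, lattice size (prime)
z = np.array([1, 33, 105])       # illustrative generating vector

# Rank-1 lattice nodes: x_j = (j * z mod M) / M in the unit cube [0, 1)^d.
j = np.arange(M)[:, None]
X = (j * z % M) / M              # shape (M, d)

# A d-dimensional exponential exp(2*pi*i k.x_j) along the lattice reduces
# to exp(2*pi*i j (k.z mod M)/M): a pure 1D exponential, so sums over the
# lattice become a length-M FFT -- the structural advantage exploited above.
k = np.array([2, -1, 4])         # a frequency in Z^d
vals = np.exp(2j * np.pi * (X @ k))
check = np.exp(2j * np.pi * (np.arange(M) * (k @ z % M) % M) / M)
print(np.allclose(vals, check))  # True
```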

Active User Identification in Fast Fading Massive Random Access Channels

  • Authors: Jyotish Robin, Elza Erkip
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2303.17543
  • Pdf link: https://arxiv.org/pdf/2303.17543
  • Abstract
    Reliable and prompt identification of active users is critical for enabling random access in massive machine-to-machine type networks, which typically operate within stringent access delay and energy constraints. In this paper, an energy efficient active user identification protocol is envisioned in which the active users simultaneously transmit On-Off Keying (OOK) modulated preambles whereas the base station uses non-coherent detection to avoid the channel estimation overheads. The minimum number of channel-uses required for active user identification in the asymptotic regime of the total number of users $\ell$, when the number of active devices $k$ scales as $k = \Theta(1)$, is characterized along with an achievability scheme relying on the equivalence of activity detection to a group testing problem. A practical scheme for active user identification based on a belief propagation strategy is also proposed and its performance is compared against the theoretical bounds.
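
A minimal sketch of the group-testing equivalence: with OOK preambles and non-coherent detection, each channel use acts as a pooled test that is "hot" iff some scheduled active user transmits, and the simple COMP decoder rules out every user that appears in a silent test. The Bernoulli test design is an illustrative assumption, not the paper's scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, k_active, n_tests = 200, 3, 40

# Test matrix: user u transmits (On) in channel use t iff A[t, u] == 1.
A = rng.random((n_tests, n_users)) < 0.1
active = np.zeros(n_users, dtype=bool)
active[rng.choice(n_users, size=k_active, replace=False)] = True

# Non-coherent base station: a test is "hot" iff any active user is On.
hot = (A & active).any(axis=1)

# COMP decoding: anyone scheduled in a silent (non-hot) test is inactive.
candidate = np.ones(n_users, dtype=bool)
for t in range(n_tests):
    if not hot[t]:
        candidate &= ~A[t]

print("true actives :", np.flatnonzero(active))
print("COMP estimate:", np.flatnonzero(candidate))  # superset of the true actives
```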

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

  • Authors: Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao, Jiang Bian
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2303.17550
  • Pdf link: https://arxiv.org/pdf/2303.17550
  • Abstract
    While recent research has made significant progress in speech-driven talking face generation, the quality of the generated video still lags behind that of real recordings. One reason for this is the use of handcrafted intermediate representations like facial landmarks and 3DMM coefficients, which are designed based on human knowledge and are insufficient to precisely describe facial movements. Additionally, these methods require an external pretrained model for extracting these representations, whose performance sets an upper bound on talking face generation. To address these limitations, we propose a novel method called DAE-Talker that leverages data-driven latent representations obtained from a diffusion autoencoder (DAE). DAE contains an image encoder that encodes an image into a latent vector and a DDIM image decoder that reconstructs the image from it. We train our DAE on talking face video frames and then extract their latent representations as the training target for a Conformer-based speech2latent model. This allows DAE-Talker to synthesize full video frames and produce natural head movements that align with the content of speech, rather than relying on a predetermined head pose from a template video. We also introduce pose modelling in speech2latent for pose controllability. Additionally, we propose a novel method for generating continuous video frames with the DDIM image decoder trained on individual frames, eliminating the need for modelling the joint distribution of consecutive frames directly. Our experiments show that DAE-Talker outperforms existing popular methods in lip-sync, video fidelity, and pose naturalness. We also conduct ablation studies to analyze the effectiveness of the proposed techniques and demonstrate the pose controllability of DAE-Talker.

DDP: Diffusion Model for Dense Visual Prediction

  • Authors: Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17559
  • Pdf link: https://arxiv.org/pdf/2303.17559
  • Abstract
    We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a random Gaussian distribution, guided by the image. The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline. Without task-specific design and architecture customization, DDP is easy to generalize to most dense prediction tasks, e.g., semantic segmentation and depth estimation. In addition, DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods. We show top results on three representative tasks across six diverse benchmarks; without tricks, DDP achieves state-of-the-art or competitive performance on each task compared to its specialist counterparts, for example, in semantic segmentation (83.9 mIoU on Cityscapes), BEV map segmentation (70.6 mIoU on nuScenes), and depth estimation (0.05 REL on KITTI). We hope that our approach will serve as a solid baseline and facilitate future research.

Using AI to Measure Parkinson's Disease Severity at Home

  • Authors: Md Saiful Islam, Wasifur Rahman, Abdelrahman Abdelkader, Phillip T. Yang, Sangwu Lee, Jamie L. Adams, Ruth B. Schneider, E. Ray Dorsey, Ehsan Hoque
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17573
  • Pdf link: https://arxiv.org/pdf/2303.17573
  • Abstract
    We present an artificial intelligence system to remotely assess the motor performance of individuals with Parkinson's disease (PD). Participants performed a motor task (i.e., tapping fingers) in front of a webcam, and data from 250 global participants were rated by three expert neurologists following the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS). The neurologists' ratings were highly reliable, with an intra-class correlation coefficient (ICC) of 0.88. We developed computer algorithms to obtain objective measurements that align with the MDS-UPDRS guideline and are strongly correlated with the neurologists' ratings. Our machine learning model trained on these measures outperformed an MDS-UPDRS certified rater, with a mean absolute error (MAE) of 0.59 compared to the rater's MAE of 0.79. However, the model performed slightly worse than the expert neurologists (0.53 MAE). The methodology can be replicated for similar motor tasks, providing the possibility of evaluating individuals with PD and other movement disorders remotely, objectively, and in areas with limited access to neurological care.

Human-Robot Interaction using VAHR: Virtual Assistant, Human, and Robots in the Loop

  • Authors: Ahmad Amine, Mostafa Aldilati, Hadi Hasan, Noel Maalouf, Imad H. Elhajj
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17582
  • Pdf link: https://arxiv.org/pdf/2303.17582
  • Abstract
    Robots have become ubiquitous tools in various industries and households, highlighting the importance of human-robot interaction (HRI). This has increased the need for easy and accessible communication between humans and robots. Recent research has focused on the intersection of virtual assistant technology, such as Amazon's Alexa, with robots and its effect on HRI. This paper presents the Virtual Assistant, Human, and Robots in the loop (VAHR) system, which utilizes bidirectional communication to control multiple robots through Alexa. VAHR's performance was evaluated through a human-subjects experiment, comparing objective and subjective metrics of traditional keyboard and mouse interfaces to VAHR. The results showed that VAHR required 41% less Robot Attention Demand and ensured 91% more Fan-out time compared to the standard method. Additionally, VAHR led to a 62.5% improvement in multi-tasking, highlighting the potential for efficient human-robot interaction in physically- and mentally-demanding scenarios. However, subjective metrics revealed a need for human operators to build confidence and trust with this new method of operation.

Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models

  • Authors: Eric Zhang, Kai Wang, Xingqian Xu, Zhangyang Wang, Humphrey Shi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17591
  • Pdf link: https://arxiv.org/pdf/2303.17591
  • Abstract
    The unlearning problem of deep learning models, once primarily an academic concern, has become a prevalent issue in the industry. The significant advances in text-to-image generation techniques have prompted global discussions on privacy, copyright, and safety, as numerous unauthorized personal IDs, content, artistic creations, and potentially harmful materials have been learned by these models and later utilized to generate and distribute uncontrolled content. To address this challenge, we propose Forget-Me-Not, an efficient and low-cost solution designed to safely remove specified IDs, objects, or styles from a well-configured text-to-image model in as little as 30 seconds, without impairing its ability to generate other content. Alongside our method, we introduce the Memorization Score (M-Score) and ConceptBench to measure the models' capacity to generate general concepts, grouped into three primary categories: ID, object, and style. Using M-Score and ConceptBench, we demonstrate that Forget-Me-Not can effectively eliminate targeted concepts while maintaining the model's performance on other concepts. Furthermore, Forget-Me-Not offers two practical extensions: a) removal of potentially harmful or NSFW content, and b) enhancement of model accuracy, inclusion and diversity through concept correction and disentanglement. It can also be adapted as a lightweight model patch for Stable Diffusion, allowing for concept manipulation and convenient distribution. To encourage future research in this critical area and promote the development of safe and inclusive generative models, we will open-source our code and ConceptBench at https://github.com/SHI-Labs/Forget-Me-Not.

MobileInst: Video Instance Segmentation on the Mobile

  • Authors: Renhong Zhang, Tianheng Cheng, Shusheng Yang, Haoyi Jiang, Shuai Zhang, Jiancheng Lyu, Xin Li, Xiaowen Ying, Dashan Gao, Wenyu Liu, Xinggang Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17594
  • Pdf link: https://arxiv.org/pdf/2303.17594
  • Abstract
    Although recent approaches aiming for video instance segmentation have achieved promising results, it is still difficult to employ those approaches for real-world applications on mobile devices, which mainly suffer from (1) heavy computation and memory cost and (2) complicated heuristics for tracking objects. To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices. Firstly, MobileInst adopts a mobile vision transformer to extract multi-level semantic features and presents an efficient query-based dual-transformer instance decoder for mask kernels and a semantic-enhanced mask decoder to generate instance segmentation per frame. Secondly, MobileInst exploits simple yet effective kernel reuse and kernel association to track objects for video instance segmentation. Further, we propose temporal query passing to enhance the tracking ability for kernels. We conduct experiments on COCO and YouTube-VIS datasets to demonstrate the superiority of MobileInst and evaluate the inference latency on a mobile CPU core of Qualcomm Snapdragon-778G, without other methods of acceleration. On the COCO dataset, MobileInst achieves 30.5 mask AP and 176 ms on the mobile CPU, which reduces the latency by 50% compared to the previous SOTA. For video instance segmentation, MobileInst achieves 35.0 AP on YouTube-VIS 2019 and 30.1 AP on YouTube-VIS 2021. Code will be available to facilitate real-world applications and future research.

Token Merging for Fast Stable Diffusion

  • Authors: Daniel Bolya, Judy Hoffman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17604
  • Pdf link: https://arxiv.org/pdf/2303.17604
  • Abstract
    The landscape of image generation has been forever changed by open vocabulary diffusion models. However, at their core these models use transformers, which makes generation slow. Better implementations to increase the throughput of these transformers have emerged, but they still evaluate the entire model. In this paper, we instead speed up diffusion models by exploiting natural redundancy in generated images by merging redundant tokens. After making some diffusion-specific improvements to Token Merging (ToMe), our ToMe for Stable Diffusion can reduce the number of tokens in an existing Stable Diffusion model by up to 60% while still producing high quality images without any extra training. In the process, we speed up image generation by up to 2x and reduce memory consumption by up to 5.6x. Furthermore, this speed-up stacks with efficient implementations such as xFormers, minimally impacting quality while being up to 5.4x faster for large images. Code is available at https://github.com/dbolya/tomesd.
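
A minimal sketch of the core token-merging step: split tokens into two sets, match each token in one set to its most similar partner in the other by cosine similarity, and average the r most redundant tokens into their partners. This simplifies the paper's bipartite soft matching and omits the diffusion-specific improvements.

```python
import numpy as np

def merge_tokens(x: np.ndarray, r: int) -> np.ndarray:
    """Merge r redundant tokens in x (N, C), simplified ToMe-style."""
    a, b = x[0::2], x[1::2]                         # bipartite split
    an = a / np.linalg.norm(a, axis=1, keepdims=True)
    bn = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = an @ bn.T                                 # cosine similarities
    best = sim.argmax(axis=1)                       # best partner in b for each a
    score = sim.max(axis=1)
    merged_idx = np.argsort(-score)[:r]             # r most redundant a-tokens
    keep_idx = np.argsort(-score)[r:]

    out_b = b.copy()                                # merge a-tokens into partners
    counts = np.ones(len(b))
    for i in merged_idx:
        out_b[best[i]] += a[i]
        counts[best[i]] += 1
    out_b /= counts[:, None]                        # average merged groups
    return np.concatenate([a[keep_idx], out_b])

rng = np.random.default_rng(0)
tokens = rng.normal(size=(64, 16))
print(merge_tokens(tokens, r=16).shape)             # (48, 16): 25% fewer tokens
```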

SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

  • Authors: Xuanyao Chen, Zhijian Liu, Haotian Tang, Li Yi, Hang Zhao, Song Han
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17605
  • Pdf link: https://arxiv.org/pdf/2303.17605
  • Abstract
    High-resolution images enable neural networks to learn richer visual representations. However, this improved performance comes at the cost of growing computational complexity, hindering their usage in latency-sensitive applications. As not all pixels are equal, skipping computations for less-important regions offers a simple and effective measure to reduce the computation. This, however, is hard to translate into actual speedups for CNNs, since it breaks the regularity of the dense convolution workload. In this paper, we introduce SparseViT, which revisits activation sparsity for recent window-based vision transformers (ViTs). As window attentions are naturally batched over blocks, actual speedup with window activation pruning becomes possible: i.e., ~50% latency reduction with 60% sparsity. Different layers should be assigned different pruning ratios due to their diverse sensitivities and computational costs. We introduce sparsity-aware adaptation and apply evolutionary search to efficiently find the optimal layerwise sparsity configuration within the vast search space. SparseViT achieves speedups of 1.5x, 1.4x, and 1.3x compared to its dense counterpart in monocular 3D object detection, 2D instance segmentation, and 2D semantic segmentation, respectively, with negligible to no loss of accuracy.
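
A minimal sketch of window activation pruning: score each attention window by activation magnitude and keep only the densest fraction, so window attention runs on fewer, naturally batched windows. The L2 scoring and the fixed layer sparsity are illustrative stand-ins for the paper's sparsity-aware adaptation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_windows, tokens_per_window, dim = 64, 49, 96        # e.g., 7x7 token windows
x = rng.normal(size=(n_windows, tokens_per_window, dim))

sparsity = 0.6                                         # drop 60% of windows
scores = np.linalg.norm(x.reshape(n_windows, -1), axis=1)  # window importance
n_keep = int(n_windows * (1 - sparsity))
keep = np.argsort(-scores)[:n_keep]                    # top windows by magnitude

# Only the kept windows are fed through (batched) window attention;
# dropped windows would bypass the block unchanged via the skip path.
attended = x[keep]
print(attended.shape)                                  # (25, 49, 96): fewer windows
```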

Keyword: faster

Urgency-aware Routing in Single Origin-destination Itineraries through Artificial Currencies

  • Authors: Leonardo Pedroso, W.P.M.H. Heemels, Mauro Salazar
  • Subjects: Systems and Control (eess.SY); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2303.16945
  • Pdf link: https://arxiv.org/pdf/2303.16945
  • Abstract
    Within mobility systems, the presence of self-interested users can lead to aggregate routing patterns that are far from the societal optimum which could be achieved by centrally controlling the users' choices. In this paper, we design a fair incentive mechanism to steer the selfish behavior of the users to align with the societally optimal aggregate routing. The proposed mechanism is based on an artificial currency that cannot be traded or bought, but only spent or received when traveling. Specifically, we consider a parallel-arc network with a single origin and destination node within a repeated game setting whereby each user chooses from one of the available arcs to reach their destination on a daily basis. In this framework, taking faster routes comes at a cost, whereas taking slower routes is incentivized by a reward. The users are thus playing against their future selves when choosing their present actions. To capture this complex behavior, we assume the users to be rational and to minimize an urgency-weighted combination of their immediate and future discomfort. To design the optimal pricing, we first derive a closed-form expression for the best individual response strategy. Second, we formulate the pricing design problem for each arc to achieve the societally optimal aggregate flows, and reformulate it so that it can be solved with gradient-free optimization methods. Our numerical simulations show that it is possible to achieve a near-optimal routing whilst significantly reducing the users' perceived discomfort when compared to a centralized optimal but urgency-unaware policy.
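
A toy illustration of the daily best-response tradeoff; this is not the paper's closed-form strategy, and the names (t, p, u, b, lam) are all assumptions:

```python
# With per-arc travel times t and artificial-currency prices p
# (positive = spend, negative = earn), a user with urgency u and
# currency balance b picks the affordable arc that minimizes an
# urgency-weighted mix of today's travel time and the currency
# given up for future days (lam weighs the future term).
def best_response(t, p, u, b, lam=1.0):
    affordable = [i for i in range(len(t)) if p[i] <= b]
    return min(affordable, key=lambda i: u * t[i] + lam * p[i])

t = [10, 15, 20]   # fast, medium, slow arc
p = [5, 0, -5]     # the fast arc costs currency, the slow one earns it
print(best_response(t, p, u=2.0, b=3))    # 1: cannot afford the fast arc
print(best_response(t, p, u=2.0, b=10))   # 0: urgent enough to pay
```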

PopSparse: Accelerated block sparse matrix multiplication on IPU

  • Authors: Zhiyi Li, Douglas Orr, Valeriu Ohan, Godfrey Da costa, Tom Murray, Adam Sanders, Deniz Beker, Dominic Masters
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16999
  • Pdf link: https://arxiv.org/pdf/2303.16999
  • Abstract
    Reducing the computational cost of running large scale neural networks using sparsity has attracted great attention in the deep learning community. While much success has been achieved in reducing FLOP and parameter counts while maintaining acceptable task performance, achieving actual speed improvements has typically been much more difficult, particularly on general purpose accelerators (GPAs) such as NVIDIA GPUs using low precision number formats. In this work we introduce PopSparse, a library that enables fast sparse operations on Graphcore IPUs by leveraging both the unique hardware characteristics of IPUs as well as any block structure defined in the data. We target two different types of sparsity: static, where the sparsity pattern is fixed at compile-time; and dynamic, where it can change each time the model is run. We present benchmark results for matrix multiplication for both of these modes on IPU with a range of block sizes, matrix sizes and densities. Results indicate that the PopSparse implementations are faster than dense matrix multiplications on IPU at a range of sparsity levels with large matrix size and block size. Furthermore, static sparsity in general outperforms dynamic sparsity. While previous work on GPAs has shown speedups only for very high sparsity (typically 99% and above), the present work demonstrates that our static sparse implementation outperforms equivalent dense calculations in FP16 at lower sparsity (around 90%).
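
A minimal CPU sketch of a static block-sparse matmul in plain NumPy; the block format is illustrative and is not the IPU library's API:

```python
import numpy as np

def block_sparse_matmul(x, blocks, index, n_cols, bs):
    """y = x @ W, where W is stored as (bs, bs) nonzero blocks with
    (row, col) block coordinates in `index`."""
    y = np.zeros((x.shape[0], n_cols), dtype=x.dtype)
    for blk, (bi, bj) in zip(blocks, index):
        y[:, bj*bs:(bj+1)*bs] += x[:, bi*bs:(bi+1)*bs] @ blk
    return y

# Example: a 4x4 weight matrix with 2x2 blocks at 50% block density
bs = 2
blocks = [np.eye(bs), 2 * np.eye(bs)]
index = [(0, 0), (1, 1)]                 # coordinates of nonzero blocks
x = np.arange(8, dtype=float).reshape(2, 4)
print(block_sparse_matmul(x, blocks, index, n_cols=4, bs=bs))
```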

Overcoming Challenges to Continuous Integration in HPC

  • Authors: Todd Gamblin, Daniel S. Katz
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.17034
  • Pdf link: https://arxiv.org/pdf/2303.17034
  • Abstract
    Continuous integration (CI) has become a ubiquitous practice in modern software development, with major code hosting services offering free automation on popular platforms. CI offers major benefits, as it enables detecting bugs in code prior to committing changes. While high-performance computing (HPC) research relies heavily on software, HPC machines are not considered "common" platforms. This presents several challenges that hinder the adoption of CI in HPC environments, making it difficult to maintain bug-free HPC projects, and resulting in adverse effects on the research community. In this article, we explore the challenges that impede HPC CI, such as hardware diversity, security, isolation, administrative policies, and non-standard authentication, environments, and job submission mechanisms. We propose several solutions that could enhance the quality of HPC software and the experience of developers. Implementing these solutions would require significant changes at HPC centers, but if these changes are made, it would ultimately enable faster and better science.

ACM with Overlapping Partitions: Implementation and Periodicity Analysis

  • Authors: Anthony O'Dea
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2303.17069
  • Pdf link: https://arxiv.org/pdf/2303.17069
  • Abstract
    The Arnold Cat Map (ACM) is a popular chaotic map used in image encryption. Chaotic maps are known for their sensitivity to initial conditions and their ability to mix, or rearrange, pixels. However, ACM is periodic, and the period is relatively short. This periodicity decreases the effective key space for a cryptosystem. Further, ACM can only be performed on square matrices. For non-square images, this issue can be solved by performing ACM on multiple square partitions of the image. If these partitions overlap, the periodicity will greatly increase. The resulting system will be referred to as overlapping ACM or OACM. This paper will cover the implementation and periodicity analysis for these overlapping systems, which previous papers involving similar overlapping block partitions did not. Viewing OACM as a scan as opposed to a map allows for faster implementation and period analysis.
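
The classic map itself is compact: pixel (x, y) of an N x N image moves to ((x + y) mod N, (x + 2y) mod N). Below is a minimal NumPy sketch of the map and a brute-force period measurement; the overlapping-partition extension is not reproduced here:

```python
import numpy as np

def arnold_cat_map(img):
    """One ACM iteration on a square N x N array."""
    n = img.shape[0]
    out = np.empty_like(img)
    for x in range(n):
        for y in range(n):
            out[(x + y) % n, (x + 2 * y) % n] = img[x, y]
    return out

def acm_period(n):
    """Smallest k with ACM^k = identity on an N x N grid -- the short
    period that overlapping partitions are designed to enlarge."""
    img = np.arange(n * n).reshape(n, n)
    cur, k = arnold_cat_map(img), 1
    while not np.array_equal(cur, img):
        cur, k = arnold_cat_map(cur), k + 1
    return k

print(acm_period(8))  # 6: tiny for an 8x8 image, hence the weak key space
```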

TreePiece: Faster Semantic Parsing via Tree Tokenization

  • Authors: Sid Wang, Akshat Shrivastava, Sasha Livshits
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17161
  • Pdf link: https://arxiv.org/pdf/2303.17161
  • Abstract
    Autoregressive (AR) encoder-decoder neural networks have proved successful in many NLP problems, including Semantic Parsing -- a task that translates natural language to machine-readable parse trees. However, the sequential prediction process of AR models can be slow. To accelerate AR for semantic parsing, we introduce a new technique called TreePiece that tokenizes a parse tree into subtrees and generates one subtree per decoding step. On TopV2 benchmark, TreePiece shows 4.6 times faster decoding speed than standard AR, and comparable speed but significantly higher accuracy compared to Non-Autoregressive (NAR).

DPP-based Client Selection for Federated Learning with Non-IID Data

  • Authors: Yuxuan Zhang, Chao Xu, Howard H. Yang, Xijun Wang, Tony Q. S. Quek
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17358
  • Pdf link: https://arxiv.org/pdf/2303.17358
  • Abstract
    This paper proposes a client selection (CS) method to tackle the communication bottleneck of federated learning (FL) while concurrently coping with FL's data heterogeneity issue. Specifically, we first analyze the effect of CS in FL and show that FL training can be accelerated by adequately choosing participants to diversify the training dataset in each round of training. Based on this, we leverage data profiling and determinantal point process (DPP) sampling techniques to develop an algorithm termed Federated Learning with DPP-based Participant Selection (FL-DP$^3$S). This algorithm effectively diversifies the participants' datasets in each round of training while preserving their data privacy. We conduct extensive experiments to examine the efficacy of our proposed method. The results show that our scheme attains a faster convergence rate, as well as a smaller communication overhead than several baselines.
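
A greedy stand-in for the DPP-based selection step; the kernel construction and client profiles below are assumptions for illustration, not the paper's FL-DP$^3$S algorithm:

```python
import numpy as np

def greedy_dpp_select(K, k):
    """Greedy MAP inference for a DPP with PSD kernel K: repeatedly add
    the client whose inclusion most increases log det(K_S), which
    favors mutually dissimilar (diverse) clients."""
    selected, rest = [], list(range(K.shape[0]))
    for _ in range(k):
        def gain(i):
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(K[np.ix_(idx, idx)])
            return logdet if sign > 0 else -np.inf
        best = max(rest, key=gain)
        selected.append(best)
        rest.remove(best)
    return selected

# Hypothetical per-client data profiles: similar clients get high
# kernel values, so they are unlikely to be co-selected.
profiles = np.random.rand(10, 5)
K = np.exp(-np.linalg.norm(profiles[:, None] - profiles[None], axis=-1))
print(greedy_dpp_select(K, k=3))
```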

Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions

  • Authors: Yicheng Luo, Jackie Kay, Edward Grefenstette, Marc Peter Deisenroth
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17396
  • Pdf link: https://arxiv.org/pdf/2303.17396
  • Abstract
    Offline reinforcement learning (RL) allows for the training of competent agents from offline datasets without any interaction with the environment. Online finetuning of such offline models can further improve performance. But how should we ideally finetune agents obtained from offline RL training? While offline RL algorithms can in principle be used for finetuning, in practice, their online performance improves slowly. In contrast, we show that it is possible to use standard online off-policy algorithms for faster improvement. However, we find this approach may suffer from policy collapse, where the policy undergoes severe performance deterioration during initial online learning. We investigate the issue of policy collapse and how it relates to data diversity, algorithm choices and online replay distribution. Based on these insights, we propose a conservative policy optimization procedure that can achieve stable and sample-efficient online learning from offline pretraining.

Edge Ranking of Graphs in Transportation Networks using a Graph Neural Network (GNN)

  • Authors: Debasish Jana, Sven Malama, Sriram Narasimhan, Ertugrul Taciroglu
  • Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17485
  • Pdf link: https://arxiv.org/pdf/2303.17485
  • Abstract
    Many networks, such as transportation, power, and water distribution, can be represented as graphs. A crucial challenge in graph representations is identifying the importance of graph edges and their influence on overall network efficiency and information flow performance. For example, important edges in a transportation network are those roads that, when affected, will significantly alter the network's overall efficiency. A commonly used approach to finding such important edges is "edge betweenness centrality" (EBC), an edge ranking measure used to determine the influential edges of the graph based on connectivity and information spread. Computing the EBC with the common Brandes algorithm involves calculating the shortest paths for every node pair, which can be computationally expensive and restrictive, especially for large graphs. Changes in the graph parameters, e.g., in the edge weights or the addition and deletion of nodes or edges, require the recalculation of the EBC. As the main contribution, we propose an approximate method to estimate the EBC using a Graph Neural Network (GNN), a deep learning-based approach. We show that it is computationally efficient compared to the conventional method, especially for large graphs. The proposed GNN-based edge ranking method is evaluated on several synthetic graphs and a real-world transportation data set. We show that this framework can estimate the approximate edge ranking much faster than the conventional method. This approach is inductive, i.e., training and testing are performed on different sets of graphs with varying numbers of nodes and edges. The proposed method is especially suitable for applications on large-scale networks when edge information is desired, for example, in urban infrastructure improvement projects, power and water network resilience analyses, and optimizing resource allocations in engineering networks.
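
For reference, the exact EBC that such a GNN would learn to approximate can be computed with NetworkX's implementation of Brandes' algorithm; the training setup sketched in the comments is an assumption:

```python
import networkx as nx

# Exact EBC via Brandes' algorithm -- the expensive target quantity.
G = nx.karate_club_graph()
ebc = nx.edge_betweenness_centrality(G, weight=None)

# Rank edges by importance; in the paper's setting, scores like these
# would serve as regression labels for training the GNN edge ranker.
top = sorted(ebc.items(), key=lambda kv: -kv[1])[:5]
for (u, v), score in top:
    print(f"edge ({u}, {v}): {score:.4f}")
```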

Pgx: Hardware-accelerated parallel game simulation for reinforcement learning

  • Authors: Sotetsu Koyamada, Shinri Okano, Soichiro Nishimori, Yu Murata, Keigo Habara, Haruka Kita, Shin Ishii
  • Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17503
  • Pdf link: https://arxiv.org/pdf/2303.17503
  • Abstract
    We propose Pgx, a collection of board game simulators written in JAX. Thanks to JAX's auto-vectorization and Just-In-Time compilation, Pgx scales easily to thousands of parallel executions on GPU/TPU accelerators. We found that Pgx simulation on a single A100 GPU is 10x faster than that of existing reinforcement learning libraries. Pgx implements games considered vital benchmarks in artificial intelligence research, such as Backgammon, Shogi, and Go. Pgx is available at https://github.com/sotetsuk/pgx.
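
The speedup rests on a standard JAX pattern: a pure, jittable step function auto-vectorized over thousands of game states. A generic toy sketch follows (a made-up "counting game", not Pgx's actual API):

```python
import jax
import jax.numpy as jnp

def step(state, action):
    """Toy game: the score counts up and the game ends at 10."""
    new = jnp.clip(state + action, 0, 10)
    return new, new >= 10

batched_step = jax.jit(jax.vmap(step))      # run 4096 games in lockstep
states = jnp.zeros(4096, dtype=jnp.int32)
actions = jnp.ones(4096, dtype=jnp.int32)
states, dones = batched_step(states, actions)
```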

Token Merging for Fast Stable Diffusion

  • Authors: Daniel Bolya, Judy Hoffman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17604
  • Pdf link: https://arxiv.org/pdf/2303.17604
  • Abstract
    The landscape of image generation has been forever changed by open vocabulary diffusion models. However, at their core these models use transformers, which makes generation slow. Better implementations to increase the throughput of these transformers have emerged, but they still evaluate the entire model. In this paper, we instead speed up diffusion models by exploiting natural redundancy in generated images by merging redundant tokens. After making some diffusion-specific improvements to Token Merging (ToMe), our ToMe for Stable Diffusion can reduce the number of tokens in an existing Stable Diffusion model by up to 60% while still producing high quality images without any extra training. In the process, we speed up image generation by up to 2x and reduce memory consumption by up to 5.6x. Furthermore, this speed-up stacks with efficient implementations such as xFormers, minimally impacting quality while being up to 5.4x faster for large images. Code is available at https://github.com/dbolya/tomesd.

Keyword: mobile

A Tensor-based Convolutional Neural Network for Small Dataset Classification

  • Authors: Zhenhua Chen, David Crandall
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2303.17061
  • Pdf link: https://arxiv.org/pdf/2303.17061
  • Abstract
    Inspired by ConvNets with structured hidden representations, we propose a Tensor-based Neural Network, TCNN. Different from ConvNets, TCNNs are composed of structured neurons rather than scalar neurons, and the basic operation is neuron tensor transformation. Unlike other structured ConvNets, where the part-whole relationships are modeled explicitly, the relationships are learned implicitly in TCNNs. Also, the structured neurons in TCNNs are high-rank tensors rather than vectors or matrices. We compare TCNNs with current popular ConvNets, including ResNets, MobileNets, EfficientNets, RegNets, etc., on CIFAR10, CIFAR100, and Tiny ImageNet. The experiments show that TCNNs achieve higher efficiency in terms of parameters. TCNNs also show higher robustness against white-box adversarial attacks on MNIST compared to ConvNets.

Dependent Task Offloading in Edge Computing Using GNN and Deep Reinforcement Learning

  • Authors: Zequn Cao, Xiaoheng Deng
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.17100
  • Pdf link: https://arxiv.org/pdf/2303.17100
  • Abstract
    Task offloading is a widely used technology in Mobile Edge Computing (MEC), which reduces the completion time of user tasks with the help of resourceful edge servers. Existing works mainly focus on the case where the computation density of a user task is homogeneous, so that it can be offloaded in full or by percentage. However, various user tasks in real life consist of several inner dependent subtasks, each of which is logically a minimum execution unit. Motivated by this gap, we aim to solve the Dependent Task Offloading (DTO) problem in a multi-user multi-edge scenario in this paper. We first use a Directed Acyclic Graph (DAG) to represent a dependent task, where nodes indicate subtasks and directed edges indicate dependencies among subtasks. Then we propose a scheme based on Graph Attention Networks (GAT) and Deep Reinforcement Learning (DRL) to minimize the makespan of user tasks. To utilize the GAT efficiently, we train it on the resource-rich cloud in an unsupervised manner, given its large data and computation requirements. In addition, we design a multi-discrete action space for the DRL algorithm to enhance the applicability of our proposed scheme. Experiments are conducted on broadly distributed synthetic data. The results demonstrate that our proposed approach can be adapted to both simple and complex MEC environments and outperforms other methods.
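
A minimal sketch of the DAG representation and the earliest-finish-time bound it induces; the subtask costs are illustrative, and this is not the paper's GAT+DRL scheduler:

```python
import networkx as nx

# Nodes are subtasks (with execution costs), edges are dependencies.
g = nx.DiGraph()
g.add_nodes_from([(0, {"cost": 2}), (1, {"cost": 3}),
                  (2, {"cost": 1}), (3, {"cost": 2})])
g.add_edges_from([(0, 1), (0, 2), (1, 3), (2, 3)])

# Earliest finish time over a topological order: the critical-path
# makespan a scheduler with ample edge resources could approach.
finish = {}
for n in nx.topological_sort(g):
    ready = max((finish[p] for p in g.predecessors(n)), default=0)
    finish[n] = ready + g.nodes[n]["cost"]
print(max(finish.values()))   # 7, via the path 0 -> 1 -> 3
```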

Deep Generative Model and Its Applications in Efficient Wireless Network Management: A Tutorial and Case Study

  • Authors: Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Abbas Jamalipour
  • Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17114
  • Pdf link: https://arxiv.org/pdf/2303.17114
  • Abstract
    With the phenomenal success of diffusion models and ChatGPT, deep generative models (DGMs) have been experiencing explosive growth since 2022. Not limited to content generation, DGMs are also widely adopted in the Internet of Things, the Metaverse, and digital twins, due to their outstanding ability to represent complex patterns and generate plausible samples. In this article, we explore the applications of DGMs in a crucial task, i.e., improving the efficiency of wireless network management. Specifically, we first give an overview of generative AI, as well as three representative DGMs. Then, a DGM-empowered framework for wireless network management is proposed, in which we elaborate on the issues of conventional network management approaches, why DGMs can address them efficiently, and the step-by-step workflow for applying DGMs to managing wireless networks. Moreover, we conduct a case study on network economics, using the state-of-the-art DGM, i.e., the diffusion model, to generate effective contracts for incentivizing mobile AI-Generated Content (AIGC) services. Last but not least, we discuss important open directions for future research.

GAT-COBO: Cost-Sensitive Graph Neural Network for Telecom Fraud Detection

  • Authors: Xinxin Hu, Haotian Chen, Junjie Zhang, Hongchang Chen, Shuxin Liu, Xing Li, Yahui Wang, Xiangyang Xue
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17334
  • Pdf link: https://arxiv.org/pdf/2303.17334
  • Abstract
    Along with the rapid evolution of mobile communication technologies such as 5G, there has been a drastic increase in telecom fraud, which significantly dissipates individual fortunes and social wealth. In recent years, graph mining techniques have gradually become a mainstream solution for detecting telecom fraud. However, the graph imbalance problem, caused by the Pareto principle, brings severe challenges to graph data mining; it is a new and challenging problem that has received little attention in prior work. In this paper, we propose a Graph ATtention network with COst-sensitive BOosting (GAT-COBO) for the graph imbalance problem. First, we design a GAT-based base classifier to learn the embeddings of all nodes in the graph. Then, we feed the embeddings into a well-designed cost-sensitive learner for imbalanced learning. Next, we update the weights according to the misclassification cost to make the model focus more on the minority class. Finally, we sum the node embeddings obtained by multiple cost-sensitive learners to obtain a comprehensive node representation, which is used for the downstream anomaly detection task. Extensive experiments on two real-world telecom fraud detection datasets demonstrate that our proposed method is effective for the graph imbalance problem, outperforming state-of-the-art GNNs and GNN-based fraud detectors. In addition, our model also helps alleviate the widespread over-smoothing problem in GNNs. The GAT-COBO code and datasets are available at https://github.com/xxhu94/GAT-COBO.
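
A simplified, boosting-style view of the cost-sensitive reweighting step; the cost values and update rule here are assumptions for illustration, not the paper's exact learner:

```python
import numpy as np

def cost_sensitive_reweight(w, y_true, y_pred, cost_minority=5.0):
    """Upweight misclassified samples, with a larger penalty for the
    minority (fraud) class, so the next learner focuses on it."""
    miss = (y_true != y_pred).astype(float)
    cost = np.where(y_true == 1, cost_minority, 1.0)   # class 1 = fraud
    w = w * np.exp(miss * cost)
    return w / w.sum()

w = np.full(6, 1 / 6)
y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 1, 0, 0, 0, 1])   # one miss in each class
print(cost_sensitive_reweight(w, y_true, y_pred))
# The missed fraud sample (index 4) ends up with far more weight
# than the missed benign one (index 1).
```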

An Efficient Mobile Gateway Selection and Discovery Based-Routing Protocol in Heterogeneous LTE-VANET Networks

  • Authors: Driss Abada, Rachid Adrdor, Omar Boutkhoum, Adil Bohouch
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2303.17439
  • Pdf link: https://arxiv.org/pdf/2303.17439
  • Abstract
    Coupling cellular communication networks with vehicular ad hoc networks (VANETs) is a promising way to provide Internet access to vehicles on the road. However, due to the several specific characteristics of VANETs, making efficient multi-hop routing from vehicular sources to the Internet gateways through Long Term Evolution (LTE) technology is still challenging. In this paper, an Internet mobile gateway selection scheme is proposed to elect the more capable vehicles to act as gateways to the Internet in VANETs. The discovery and selection of routes to those mobile gateways are then carried out via an efficient multiple-metric relay selection mechanism. The objective is to select the more reliable route to the mobile gateways, reducing the communication overhead and performing seamless handover. The proposed protocol is compared with a recent protocol in terms of packet delivery ratio, average end-to-end delay, and overhead. The results show that the proposed protocol significantly improves network performance compared with the other protocol.

Cost Sensitive GNN-based Imbalanced Learning for Mobile Social Network Fraud Detection

  • Authors: Xinxin Hu, Haotian Chen, Hongchang Chen, Shuxin Liu, Xing Li, Shibo Zhang, Yahui Wang, Xiangyang Xue
  • Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17486
  • Pdf link: https://arxiv.org/pdf/2303.17486
  • Abstract
    With the rapid development of mobile networks, people's social contacts have been considerably facilitated. However, the rise of mobile social network fraud on those networks has caused a great deal of distress, depleting personal and social wealth and potentially doing significant economic harm. To detect fraudulent users, call detail record (CDR) data, which portrays the social behavior of users in mobile networks, has been widely utilized. But the imbalance problem in the aforementioned data, which could severely hinder the effectiveness of fraud detectors based on graph neural networks (GNNs), has hardly been addressed in previous work. In this paper, we present a novel Cost-Sensitive Graph Neural Network (CSGNN) by creatively combining cost-sensitive learning and graph neural networks. We conduct extensive experiments on two open-source real-world mobile network fraud datasets. The results show that CSGNN can effectively solve the graph imbalance problem and achieve better detection performance than state-of-the-art algorithms. We believe that our research can be applied to solve graph imbalance problems in other fields. The CSGNN code and datasets are publicly available at https://github.com/xxhu94/CSGNN.

MobileInst: Video Instance Segmentation on the Mobile

  • Authors: Renhong Zhang, Tianheng Cheng, Shusheng Yang, Haoyi Jiang, Shuai Zhang, Jiancheng Lyu, Xin Li, Xiaowen Ying, Dashan Gao, Wenyu Liu, Xinggang Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17594
  • Pdf link: https://arxiv.org/pdf/2303.17594
  • Abstract
    Although recent approaches aiming for video instance segmentation have achieved promising results, it is still difficult to employ those approaches for real-world applications on mobile devices, which mainly suffer from (1) heavy computation and memory cost and (2) complicated heuristics for tracking objects. To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices. Firstly, MobileInst adopts a mobile vision transformer to extract multi-level semantic features and presents an efficient query-based dual-transformer instance decoder for mask kernels and a semantic-enhanced mask decoder to generate instance segmentation per frame. Secondly, MobileInst exploits simple yet effective kernel reuse and kernel association to track objects for video instance segmentation. Further, we propose temporal query passing to enhance the tracking ability for kernels. We conduct experiments on COCO and YouTube-VIS datasets to demonstrate the superiority of MobileInst and evaluate the inference latency on a mobile CPU core of Qualcomm Snapdragon-778G, without other methods of acceleration. On the COCO dataset, MobileInst achieves 30.5 mask AP and 176 ms on the mobile CPU, which reduces the latency by 50% compared to the previous SOTA. For video instance segmentation, MobileInst achieves 35.0 AP on YouTube-VIS 2019 and 30.1 AP on YouTube-VIS 2021. Code will be available to facilitate real-world applications and future research.

Keyword: pruning

Explainable Intrusion Detection Systems Using Competitive Learning Techniques

  • Authors: Jesse Ables, Thomas Kirby, Sudip Mittal, Ioana Banicescu, Shahram Rahimi, William Anderson, Maria Seale
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17387
  • Pdf link: https://arxiv.org/pdf/2303.17387
  • Abstract
    The current state-of-the-art systems in Artificial Intelligence (AI) enabled intrusion detection use a variety of black box methods. These black box methods are generally trained using Error Based Learning (EBL) techniques with a focus on creating accurate models. These models have high performative costs and are not easily explainable. A white box Competitive Learning (CL) based eXplainable Intrusion Detection System (X-IDS) offers a potential solution to these problems. CL models utilize an entirely different learning paradigm than EBL approaches. This different learning process makes the CL family of algorithms innately explainable and less resource intensive. In this paper, we create an X-IDS architecture that is based on DARPA's recommendation for explainable systems. In our architecture we leverage CL algorithms such as Self-Organizing Maps (SOM), Growing Self-Organizing Maps (GSOM), and Growing Hierarchical Self-Organizing Maps (GHSOM). The resulting models can be data-mined to create statistical and visual explanations. Our architecture is tested using the NSL-KDD and CIC-IDS-2017 benchmark datasets, and produces accuracies that are 1% - 3% lower than EBL models. However, CL models are much more explainable than EBL models. Additionally, we use a pruning process that is able to significantly reduce the size of these CL based models. By pruning our models, we are able to increase prediction speeds. Lastly, we analyze the statistical and visual explanations generated by our architecture, and we give a strategy that users could use to help navigate the set of explanations. These explanations will help users build trust with an Intrusion Detection System (IDS), and allow users to discover ways to increase the IDS's potency.
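
For orientation, a minimal SOM training loop, the competitive-learning building block the architecture grows into GSOM/GHSOM; grid size and schedules are illustrative:

```python
import numpy as np

def train_som(data, grid=(8, 8), epochs=20, lr0=0.5, sigma0=3.0):
    h, w = grid
    weights = np.random.rand(h, w, data.shape[1])
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                  indexing="ij"), axis=-1)
    n_steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data:
            lr = lr0 * (1 - t / n_steps)
            sigma = sigma0 * (1 - t / n_steps) + 1e-3
            # Best-matching unit: the neuron closest to the input.
            bmu = np.unravel_index(
                np.argmin(((weights - x) ** 2).sum(-1)), grid)
            # Pull the BMU's grid neighborhood toward the input.
            d2 = ((coords - np.array(bmu)) ** 2).sum(-1)
            nb = np.exp(-d2 / (2 * sigma ** 2))[..., None]
            weights += lr * nb * (x - weights)
            t += 1
    return weights

som = train_som(np.random.rand(200, 4))  # stand-in for NSL-KDD features
```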

SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

  • Authors: Xuanyao Chen, Zhijian Liu, Haotian Tang, Li Yi, Hang Zhao, Song Han
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17605
  • Pdf link: https://arxiv.org/pdf/2303.17605
  • Abstract
    High-resolution images enable neural networks to learn richer visual representations. However, this improved performance comes at the cost of growing computational complexity, hindering their usage in latency-sensitive applications. As not all pixels are equal, skipping computations for less-important regions offers a simple and effective measure to reduce the computation. This, however, is hard to translate into actual speedup for CNNs, since it breaks the regularity of the dense convolution workload. In this paper, we introduce SparseViT, which revisits activation sparsity for recent window-based vision transformers (ViTs). As window attentions are naturally batched over blocks, actual speedup with window activation pruning becomes possible: i.e., ~50% latency reduction with 60% sparsity. Different layers should be assigned different pruning ratios due to their diverse sensitivities and computational costs. We introduce sparsity-aware adaptation and apply evolutionary search to efficiently find the optimal layerwise sparsity configuration within the vast search space. SparseViT achieves speedups of 1.5x, 1.4x, and 1.3x compared to its dense counterpart in monocular 3D object detection, 2D instance segmentation, and 2D semantic segmentation, respectively, with negligible to no loss of accuracy.
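
A toy sketch of window activation pruning by magnitude; the keep ratio is fixed here, whereas the paper searches layerwise ratios, and all shapes are illustrative:

```python
import numpy as np

def prune_windows(windows, keep_ratio=0.5):
    """Score each attention window by L2 activation magnitude and keep
    only the top fraction. windows: (num_windows, tokens, C)."""
    scores = np.linalg.norm(windows, axis=(1, 2))  # one score per window
    k = max(1, int(keep_ratio * len(windows)))
    keep = np.sort(np.argsort(scores)[-k:])        # most active windows
    return windows[keep], keep

x = np.random.randn(64, 49, 96)           # e.g. 64 windows of 7x7 tokens
kept, idx = prune_windows(x, keep_ratio=0.4)
print(kept.shape)                         # (25, 49, 96)
```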

Keyword: voxel

Robo3D: Towards Robust and Reliable 3D Perception against Corruptions

  • Authors: Lingdong Kong, Youquan Liu, Xin Li, Runnan Chen, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, Ziwei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17597
  • Pdf link: https://arxiv.org/pdf/2303.17597
  • Abstract
    The robustness of 3D perception systems under natural corruptions from environments and sensors is pivotal for safety-critical applications. Existing large-scale 3D perception datasets often contain data that are meticulously cleaned. Such configurations, however, cannot reflect the reliability of perception models during the deployment stage. In this work, we present Robo3D, the first comprehensive benchmark heading toward probing the robustness of 3D detectors and segmentors under out-of-distribution scenarios against natural corruptions that occur in real-world environments. Specifically, we consider eight corruption types stemming from adversarial weather conditions, external disturbances, and internal sensor failure. We uncover that, although promising results have been progressively achieved on standard benchmarks, state-of-the-art 3D perception models are at risk of being vulnerable to corruptions. We draw key observations on the use of data representations, augmentation schemes, and training strategies, that could severely affect the model's performance. To pursue better robustness, we propose a density-insensitive training framework along with a simple flexible voxelization strategy to enhance the model resiliency. We hope our benchmark and approach could inspire future research in designing more robust and reliable 3D perception models. Our robustness benchmark suite is publicly available.

Keyword: lidar

T-FFTRadNet: Object Detection with Swin Vision Transformers from Raw ADC Radar Signals

  • Authors: James Giroux, Martin Bouchard, Robert Laganiere
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16940
  • Pdf link: https://arxiv.org/pdf/2303.16940
  • Abstract
    Object detection utilizing Frequency Modulated Continuous Wave radar is becoming increasingly popular in the field of autonomous systems. Radar does not possess the same drawbacks seen in other emission-based sensors such as LiDAR, primarily the degradation or loss of return signals due to weather conditions such as rain or snow. However, radar does possess traits that make it unsuitable for standard emission-based deep learning representations such as point clouds. Radar point clouds tend to be sparse, and therefore information extraction is not efficient. To overcome this, more traditional digital signal processing pipelines were adapted to form inputs residing directly in the frequency domain via Fast Fourier Transforms. Commonly, three transformations were used to form Range-Azimuth-Doppler cubes in which deep learning algorithms could perform object detection. This too has drawbacks, namely the pre-processing costs associated with performing multiple Fourier Transforms and normalization. We explore the possibility of operating on raw radar inputs from analog-to-digital converters via the utilization of complex transformation layers. Moreover, we introduce hierarchical Swin Vision transformers to the field of radar object detection and show their capability to operate on inputs varying in pre-processing, along with different radar configurations, i.e., relatively low and high numbers of transmitters and receivers, while obtaining results on par with or better than the state-of-the-art.
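
The conventional pre-processing the paper seeks to bypass is three FFTs over the raw ADC cube; a NumPy sketch with illustrative shapes:

```python
import numpy as np

# Raw ADC cube: (samples per chirp, chirps, receive antennas).
adc = np.random.randn(256, 128, 8) + 1j * np.random.randn(256, 128, 8)

range_fft = np.fft.fft(adc, axis=0)           # fast time -> range
doppler_fft = np.fft.fft(range_fft, axis=1)   # slow time -> Doppler
rad_cube = np.fft.fftshift(
    np.fft.fft(doppler_fft, n=64, axis=2),    # antennas  -> azimuth
    axes=(1, 2))

print(rad_cube.shape)                         # (256, 128, 64) RAD cube
```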

BEVFusion4D: Learning LiDAR-Camera Fusion Under Bird's-Eye-View via Cross-Modality Guidance and Temporal Aggregation

  • Authors: Hongxiang Cai, Zeyuan Zhang, Zhenyu Zhou, Ziyin Li, Wenbo Ding, Jiuhua Zhao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17099
  • Pdf link: https://arxiv.org/pdf/2303.17099
  • Abstract
    Integrating LiDAR and Camera information into Bird's-Eye-View (BEV) has become an essential topic for 3D object detection in autonomous driving. Existing methods mostly adopt an independent dual-branch framework to generate LiDAR and camera BEV, then perform an adaptive modality fusion. Since point clouds provide more accurate localization and geometry information, they could serve as a reliable spatial prior to acquiring relevant semantic information from the images. Therefore, we design a LiDAR-Guided View Transformer (LGVT) to effectively obtain the camera representation in BEV space and thus benefit the whole dual-branch fusion system. LGVT takes camera BEV as the primitive semantic query, repeatedly leveraging the spatial cue of LiDAR BEV for extracting image features across multiple camera views. Moreover, we extend our framework into the temporal domain with our proposed Temporal Deformable Alignment (TDA) module, which aims to aggregate BEV features from multiple historical frames. Including these two modules, our framework dubbed BEVFusion4D achieves state-of-the-art results in 3D object detection, with 72.0% mAP and 73.5% NDS on the nuScenes validation set, and 73.3% mAP and 74.7% NDS on nuScenes test set, respectively.

Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving

  • Authors: Zijian Zhu, Yichi Zhang, Hai Chen, Yinpeng Dong, Shu Zhao, Wenbo Ding, Jiachen Zhong, Shibao Zheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2303.17297
  • Pdf link: https://arxiv.org/pdf/2303.17297
  • Abstract
    3D object detection is an essential perception task in autonomous driving to understand the environments. The Bird's-Eye-View (BEV) representations have significantly improved the performance of 3D detectors with camera inputs on popular benchmarks. However, there still lacks a systematic understanding of the robustness of these vision-dependent BEV models, which is closely related to the safety of autonomous driving systems. In this paper, we evaluate the natural and adversarial robustness of various representative models under extensive settings, to fully understand their behaviors influenced by explicit BEV features compared with those without BEV. In addition to the classic settings, we propose a 3D consistent patch attack by applying adversarial patches in the 3D space to guarantee the spatiotemporal consistency, which is more realistic for the scenario of autonomous driving. With substantial experiments, we draw several findings: 1) BEV models tend to be more stable than previous methods under different natural conditions and common corruptions due to the expressive spatial representations; 2) BEV models are more vulnerable to adversarial noises, mainly caused by the redundant BEV features; 3) Camera-LiDAR fusion models have superior performance under different settings with multi-modal inputs, but BEV fusion model is still vulnerable to adversarial noises of both point cloud and image. These findings alert the safety issue in the applications of BEV detectors and could facilitate the development of more robust models.

Event-based Agile Object Catching with a Quadrupedal Robot

  • Authors: Benedek Forrai, Takahiro Miki, Daniel Gehrig, Marco Hutter, Davide Scaramuzza
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17479
  • Pdf link: https://arxiv.org/pdf/2303.17479
  • Abstract
    Quadrupedal robots are conquering various indoor and outdoor applications due to their ability to navigate challenging uneven terrains. Exteroceptive information greatly enhances this capability since perceiving their surroundings allows them to adapt their controller and thus achieve higher levels of robustness. However, sensors such as LiDARs and RGB cameras do not provide sufficient information to quickly and precisely react in a highly dynamic environment since they suffer from a bandwidth-latency tradeoff. They require significant bandwidth at high frame rates while featuring significant perceptual latency at lower frame rates, thereby limiting their versatility on resource-constrained platforms. In this work, we tackle this problem by equipping our quadruped with an event camera, which does not suffer from this tradeoff due to its asynchronous and sparse operation. In leveraging the low latency of the events, we push the limits of quadruped agility and demonstrate high-speed ball catching for the first time. We show that our quadruped equipped with an event camera can catch objects with speeds up to 15 m/s from 4 meters, with a success rate of 83%. Using a VGA event camera, our method runs at 100 Hz on an NVIDIA Jetson Orin.

Keyword: diffusion

HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion

  • Authors: Ziya Erkoç, Fangchang Ma, Qi Shan, Matthias Nießner, Angela Dai
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17015
  • Pdf link: https://arxiv.org/pdf/2303.17015
  • Abstract
    Implicit neural fields, typically encoded by a multilayer perceptron (MLP) that maps from coordinates (e.g., xyz) to signals (e.g., signed distances), have shown remarkable promise as a high-fidelity and compact representation. However, the lack of a regular and explicit grid structure also makes it challenging to apply generative modeling directly on implicit neural fields in order to synthesize new data. To this end, we propose HyperDiffusion, a novel approach for unconditional generative modeling of implicit neural fields. HyperDiffusion operates directly on MLP weights and generates new neural implicit fields encoded by synthesized MLP parameters. Specifically, a collection of MLPs is first optimized to faithfully represent individual data samples. Subsequently, a diffusion process is trained in this MLP weight space to model the underlying distribution of neural implicit fields. HyperDiffusion enables diffusion modeling over an implicit, compact, and yet high-fidelity representation of complex signals across 3D shapes and 4D mesh animations within one single unified framework.
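
The underlying data representation is easy to picture: each fitted MLP is flattened into a single weight vector that becomes one diffusion training sample. A minimal sketch with hypothetical layer sizes:

```python
import numpy as np

def flatten_mlp(params):
    """Flatten one MLP's weights into a vector; keep shapes to undo it."""
    shapes = [p.shape for p in params]
    return np.concatenate([p.ravel() for p in params]), shapes

def unflatten_mlp(flat, shapes):
    out, i = [], 0
    for s in shapes:
        n = int(np.prod(s))
        out.append(flat[i:i + n].reshape(s))
        i += n
    return out

# Hypothetical 2-layer MLP mapping xyz -> signed distance.
params = [np.random.randn(3, 64), np.random.randn(64),
          np.random.randn(64, 1), np.random.randn(1)]
vec, shapes = flatten_mlp(params)
print(vec.shape)   # (321,): one sample for the weight-space diffusion
```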

DiffCollage: Parallel Generation of Large Content with Diffusion Models

  • Authors: Qinsheng Zhang, Jiaming Song, Xun Huang, Yongxin Chen, Ming-Yu Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17076
  • Pdf link: https://arxiv.org/pdf/2303.17076
  • Abstract
    We present DiffCollage, a compositional diffusion model that can generate large content by leveraging diffusion models trained on generating pieces of the large content. Our approach is based on a factor graph representation where each factor node represents a portion of the content and a variable node represents their overlap. This representation allows us to aggregate intermediate outputs from diffusion models defined on individual nodes to generate content of arbitrary size and shape in parallel without resorting to an autoregressive generation procedure. We apply DiffCollage to various tasks, including infinite image generation, panorama image generation, and long-duration text-guided motion generation. Extensive experimental results with a comparison to strong autoregressive baselines verify the effectiveness of our approach.

Deep Generative Model and Its Applications in Efficient Wireless Network Management: A Tutorial and Case Study

  • Authors: Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Abbas Jamalipour
  • Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17114
  • Pdf link: https://arxiv.org/pdf/2303.17114
  • Abstract
    With the phenomenal success of diffusion models and ChatGPT, deep generative models (DGMs) have been experiencing explosive growth since 2022. Not limited to content generation, DGMs are also widely adopted in the Internet of Things, the Metaverse, and digital twins, due to their outstanding ability to represent complex patterns and generate plausible samples. In this article, we explore the applications of DGMs in a crucial task, i.e., improving the efficiency of wireless network management. Specifically, we first give an overview of generative AI, as well as three representative DGMs. Then, a DGM-empowered framework for wireless network management is proposed, in which we elaborate on the issues of conventional network management approaches, why DGMs can address them efficiently, and the step-by-step workflow for applying DGMs to managing wireless networks. Moreover, we conduct a case study on network economics, using the state-of-the-art DGM, i.e., the diffusion model, to generate effective contracts for incentivizing mobile AI-Generated Content (AIGC) services. Last but not least, we discuss important open directions for future research.

Discriminative Class Tokens for Text-to-Image Diffusion Models

  • Authors: Idan Schwartz, Vésteinn Snæbjarnarson, Sagie Benaim, Hila Chefer, Ryan Cotterell, Lior Wolf, Serge Belongie
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17155
  • Pdf link: https://arxiv.org/pdf/2303.17155
  • Abstract
    Recent advances in text-to-image diffusion models have enabled the generation of diverse and high-quality images. However, generated images often fall short of depicting subtle details and are susceptible to errors due to ambiguity in the input text. One way of alleviating these issues is to train diffusion models on class-labeled datasets. This, however, comes with a downside: it limits their expressive power, since (i) supervised datasets are generally small compared to the large-scale scraped text-image datasets on which text-to-image models are trained, severely affecting the quality and diversity of the generated images, or (ii) the input is a hard-coded label, as opposed to free-form text, which limits control over the generated images. In this work, we propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text while achieving high accuracy through discriminative signals from a pretrained classifier, which guides the generation. This is done by iteratively modifying the embedding of a single input token of a text-to-image diffusion model, using the classifier, by steering generated images toward a given target class. Our method is fast compared to prior fine-tuning methods and does not require a collection of in-class images or retraining of a noise-tolerant classifier. We evaluate our method extensively, showing that the generated images: (i) are more accurate and of higher quality than standard diffusion models, (ii) can be used to augment training data in a low-resource setting, and (iii) reveal information about the data used to train the guiding classifier. The code is available at https://github.com/idansc/discriminative_class_tokens.

LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation

  • Authors: Guangcong Zheng, Xianpan Zhou, Xuewei Li, Zhongang Qi, Ying Shan, Xi Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17189
  • Pdf link: https://arxiv.org/pdf/2303.17189
  • Abstract
    Recently, diffusion models have achieved great success in image synthesis. However, when it comes to the layout-to-image generation where an image often has a complex scene of multiple objects, how to make strong control over both the global layout map and each detailed object remains a challenging task. In this paper, we propose a diffusion model named LayoutDiffusion that can obtain higher generation quality and greater controllability than the previous works. To overcome the difficult multimodal fusion of image and layout, we propose to construct a structural image patch with region information and transform the patched image into a special layout to fuse with the normal layout in a unified form. Moreover, Layout Fusion Module (LFM) and Object-aware Cross Attention (OaCA) are proposed to model the relationship among multiple objects and designed to be object-aware and position-sensitive, allowing for precisely controlling the spatial related information. Extensive experiments show that our LayoutDiffusion outperforms the previous SOTA methods on FID, CAS by relatively 46.35%, 26.70% on COCO-stuff and 44.29%, 41.82% on VG. Code is available at https://github.com/ZGCTroy/LayoutDiffusion.

PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models

  • Authors: Vidit Goel, Elia Peruzzo, Yifan Jiang, Dejia Xu, Nicu Sebe, Trevor Darrell, Zhangyang Wang, Humphrey Shi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17546
  • Pdf link: https://arxiv.org/pdf/2303.17546
  • Abstract
    Image editing using diffusion models has witnessed extremely fast-paced growth recently. There are various ways in which previous works enable controlling and editing images. Some works use high-level conditioning such as text, while others use low-level conditioning. Nevertheless, most of them lack fine-grained control over the properties of the different objects present in the image, i.e. object-level image editing. In this work, we consider an image as a composition of multiple objects, each defined by various properties. Out of these properties, we identify structure and appearance as the most intuitive to understand and useful for editing purposes. We propose Structure-and-Appearance Paired Diffusion model (PAIR-Diffusion), which is trained using structure and appearance information explicitly extracted from the images. The proposed model enables users to inject a reference image's appearance into the input image at both the object and global levels. Additionally, PAIR-Diffusion allows editing the structure while maintaining the style of individual components of the image unchanged. We extensively evaluate our method on LSUN datasets and the CelebA-HQ face dataset, and we demonstrate fine-grained control over both structure and appearance at the object level. We also applied the method to Stable Diffusion to edit any real image at the object level.

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

  • Authors: Chenpng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao, Jiang Bian
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2303.17550
  • Pdf link: https://arxiv.org/pdf/2303.17550
  • Abstract
    While recent research has made significant progress in speech-driven talking face generation, the quality of the generated video still lags behind that of real recordings. One reason for this is the use of handcrafted intermediate representations like facial landmarks and 3DMM coefficients, which are designed based on human knowledge and are insufficient to precisely describe facial movements. Additionally, these methods require an external pretrained model for extracting these representations, whose performance sets an upper bound on talking face generation. To address these limitations, we propose a novel method called DAE-Talker that leverages data-driven latent representations obtained from a diffusion autoencoder (DAE). DAE contains an image encoder that encodes an image into a latent vector and a DDIM image decoder that reconstructs the image from it. We train our DAE on talking face video frames and then extract their latent representations as the training target for a Conformer-based speech2latent model. This allows DAE-Talker to synthesize full video frames and produce natural head movements that align with the content of speech, rather than relying on a predetermined head pose from a template video. We also introduce pose modelling in speech2latent for pose controllability. Additionally, we propose a novel method for generating continuous video frames with the DDIM image decoder trained on individual frames, eliminating the need for modelling the joint distribution of consecutive frames directly. Our experiments show that DAE-Talker outperforms existing popular methods in lip-sync, video fidelity, and pose naturalness. We also conduct ablation studies to analyze the effectiveness of the proposed techniques and demonstrate the pose controllability of DAE-Talker.

DDP: Diffusion Model for Dense Visual Prediction

  • Authors: Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17559
  • Pdf link: https://arxiv.org/pdf/2303.17559
  • Abstract
    We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a random Gaussian distribution, guided by the image. The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline. Without task-specific design and architecture customization, DDP is easy to generalize to most dense prediction tasks, e.g., semantic segmentation and depth estimation. In addition, DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods. We show top results on three representative tasks with six diverse benchmarks; without tricks, DDP achieves state-of-the-art or competitive performance on each task compared to the specialist counterparts. For example, semantic segmentation (83.9 mIoU on Cityscapes), BEV map segmentation (70.6 mIoU on nuScenes), and depth estimation (0.05 REL on KITTI). We hope that our approach will serve as a solid baseline and facilitate future research.

Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models

  • Authors: Eric Zhang, Kai Wang, Xingqian Xu, Zhangyang Wang, Humphrey Shi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17591
  • Pdf link: https://arxiv.org/pdf/2303.17591
  • Abstract
    The unlearning problem of deep learning models, once primarily an academic concern, has become a prevalent issue in the industry. The significant advances in text-to-image generation techniques have prompted global discussions on privacy, copyright, and safety, as numerous unauthorized personal IDs, content, artistic creations, and potentially harmful materials have been learned by these models and later utilized to generate and distribute uncontrolled content. To address this challenge, we propose Forget-Me-Not, an efficient and low-cost solution designed to safely remove specified IDs, objects, or styles from a well-configured text-to-image model in as little as 30 seconds, without impairing its ability to generate other content. Alongside our method, we introduce the Memorization Score (M-Score) and ConceptBench to measure the models' capacity to generate general concepts, grouped into three primary categories: ID, object, and style. Using M-Score and ConceptBench, we demonstrate that Forget-Me-Not can effectively eliminate targeted concepts while maintaining the model's performance on other concepts. Furthermore, Forget-Me-Not offers two practical extensions: a) removal of potentially harmful or NSFW content, and b) enhancement of model accuracy, inclusion and diversity through concept correction and disentanglement. It can also be adapted as a lightweight model patch for Stable Diffusion, allowing for concept manipulation and convenient distribution. To encourage future research in this critical area and promote the development of safe and inclusive generative models, we will open-source our code and ConceptBench at https://github.com/SHI-Labs/Forget-Me-Not.

Consistent View Synthesis with Pose-Guided Diffusion Models

  • Authors: Hung-Yu Tseng, Qinbo Li, Changil Kim, Suhib Alsisan, Jia-Bin Huang, Johannes Kopf
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17598
  • Pdf link: https://arxiv.org/pdf/2303.17598
  • Abstract
    Novel view synthesis from a single image has been a cornerstone problem for many Virtual Reality applications that provide immersive experiences. However, most existing techniques can only synthesize novel views within a limited range of camera motion or fail to generate consistent and high-quality novel views under significant camera movement. In this work, we propose a pose-guided diffusion model to generate a consistent long-term video of novel views from a single image. We design an attention layer that uses epipolar lines as constraints to facilitate the association between different viewpoints. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of the proposed diffusion model against state-of-the-art transformer-based and GAN-based approaches.

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

  • Authors: Wen Wang, Kangyang Xie, Zide Liu, Hao Chen, Yue Cao, Xinlong Wang, Chunhua Shen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17599
  • Pdf link: https://arxiv.org/pdf/2303.17599
  • Abstract
    Large-scale text-to-image diffusion models achieve unprecedented success in image generation and editing. However, how to extend such success to video editing is unclear. Recent initial attempts at video editing require significant text-to-video data and computation resources for training, which are often not accessible. In this work, we propose vid2vid-zero, a simple yet effective method for zero-shot video editing. Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video. At the core of our method is a null-text inversion module for text-to-video alignment, a cross-frame modeling module for temporal consistency, and a spatial regularization module for fidelity to the original video. Without any training, we leverage the dynamic nature of the attention mechanism to enable bi-directional temporal modeling at test time. Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos. Code will be made available at https://github.com/baaivision/vid2vid-zero.

Token Merging for Fast Stable Diffusion

  • Authors: Daniel Bolya, Judy Hoffman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17604
  • Pdf link: https://arxiv.org/pdf/2303.17604
  • Abstract
    The landscape of image generation has been forever changed by open vocabulary diffusion models. However, at their core these models use transformers, which makes generation slow. Better implementations to increase the throughput of these transformers have emerged, but they still evaluate the entire model. In this paper, we instead speed up diffusion models by exploiting natural redundancy in generated images by merging redundant tokens. After making some diffusion-specific improvements to Token Merging (ToMe), our ToMe for Stable Diffusion can reduce the number of tokens in an existing Stable Diffusion model by up to 60% while still producing high quality images without any extra training. In the process, we speed up image generation by up to 2x and reduce memory consumption by up to 5.6x. Furthermore, this speed-up stacks with efficient implementations such as xFormers, minimally impacting quality while being up to 5.4x faster for large images. Code is available at https://github.com/dbolya/tomesd.

AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control

  • Authors: Ruixiang Jiang, Can Wang, Jingbo Zhang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17606
  • Pdf link: https://arxiv.org/pdf/2303.17606
  • Abstract
    Neural implicit fields are powerful for representing 3D scenes and generating high-quality novel views, but it remains challenging to use such implicit representations for creating a 3D human avatar with a specific identity and artistic style that can be easily animated. Our proposed method, AvatarCraft, addresses this challenge by using diffusion models to guide the learning of geometry and texture for a neural avatar based on a single text prompt. We carefully design the optimization framework of neural implicit fields, including a coarse-to-fine multi-bounding box training strategy, shape regularization, and diffusion-based constraints, to produce high-quality geometry and texture. Additionally, we make the human avatar animatable by deforming the neural implicit field with an explicit warping field that maps the target human mesh to a template human mesh, both represented using parametric human models. This simplifies animation and reshaping of the generated avatar by controlling pose and shape parameters. Extensive experiments on various text descriptions show that AvatarCraft is effective and robust in creating human avatars and rendering novel views, poses, and shapes. Our project page is: \url{https://avatar-craft.github.io/}.

Keyword: dynamic

Thrust vector control and state estimation architecture for low-cost small-scale launchers

  • Authors: Pedro dos Santos, Paulo Oliveira
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.16983
  • Pdf link: https://arxiv.org/pdf/2303.16983
  • Abstract
    This paper proposes an integrated architecture for Thrust Vector Control (TVC) and state estimation for low-cost small-scale launchers, which are naturally unstable and propelled by a solid motor. The architecture is based on a non-linear, six-degrees-of-freedom model for the generic thrust-vector-controlled launcher dynamics and kinematics, deduced and implemented in a realistic simulation environment. For estimation and control design purposes, a linearized version of the model is proposed. Single-nozzle TVC actuation is adopted, allowing for pitch and yaw control, with the control law being derived from the Linear Quadratic Regulator (LQR) with additional integral action (LQI). The control system is implemented through gain scheduling. Full state estimation is performed resorting to complementary kinematic filters, closely related to linear Kalman filtering theory. The architecture, composed of the navigation and control systems, is tested in a simulation environment, demonstrating satisfactory attitude tracking performance and robustness to both external disturbances and model uncertainties.
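
As background for the LQR ingredient above, the state-feedback gain comes from solving a continuous-time algebraic Riccati equation. A minimal Python sketch follows (assuming NumPy/SciPy); the A, B, Q, R matrices are illustrative placeholders rather than the paper's launcher model, and the integral augmentation of the LQI variant is omitted.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative linearized attitude dynamics x_dot = A x + B u.
# These matrices are placeholders, not the paper's launcher model.
A = np.array([[0.0, 1.0],
              [2.0, 0.0]])   # open-loop unstable (positive stiffness term)
B = np.array([[0.0],
              [1.0]])        # single-nozzle TVC deflection input

Q = np.diag([10.0, 1.0])     # state weighting
R = np.array([[0.1]])        # control effort weighting

# Solve the continuous-time algebraic Riccati equation, then form the gain.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)        # control law: u = -K x

print("LQR gain:", K)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```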

PopSparse: Accelerated block sparse matrix multiplication on IPU

  • Authors: Zhiyi Li, Douglas Orr, Valeriu Ohan, Godfrey Da costa, Tom Murray, Adam Sanders, Deniz Beker, Dominic Masters
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16999
  • Pdf link: https://arxiv.org/pdf/2303.16999
  • Abstract
    Reducing the computational cost of running large scale neural networks using sparsity has attracted great attention in the deep learning community. While much success has been achieved in reducing FLOP and parameter counts while maintaining acceptable task performance, achieving actual speed improvements has typically been much more difficult, particularly on general purpose accelerators (GPAs) such as NVIDIA GPUs using low precision number formats. In this work we introduce PopSparse, a library that enables fast sparse operations on Graphcore IPUs by leveraging both the unique hardware characteristics of IPUs and any block structure defined in the data. We target two different types of sparsity: static, where the sparsity pattern is fixed at compile-time; and dynamic, where it can change each time the model is run. We present benchmark results for matrix multiplication for both of these modes on IPU with a range of block sizes, matrix sizes and densities. Results indicate that the PopSparse implementations are faster than dense matrix multiplications on IPU at a range of sparsity levels with large matrix size and block size. Furthermore, static sparsity in general outperforms dynamic sparsity. While previous work on GPAs has shown speedups only for very high sparsity (typically 99% and above), the present work demonstrates that our static sparse implementation outperforms equivalent dense calculations in FP16 at lower sparsity (around 90%).
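
To illustrate the block-sparsity idea the library exploits, the sketch below performs block-sparse times dense multiplication in plain NumPy, where only the stored nonzero blocks contribute work. This is a conceptual illustration, not the IPU kernel.

```python
import numpy as np

def block_sparse_matmul(blocks, block_size, m, k, dense):
    """Multiply an (m x k) block-sparse matrix by a dense (k x n) matrix.

    `blocks` maps (block_row, block_col) -> (block_size x block_size) array;
    absent keys are all-zero blocks and are simply skipped, which is where
    the speedup comes from.
    """
    out = np.zeros((m, dense.shape[1]))
    for (br, bc), block in blocks.items():
        r0, c0 = br * block_size, bc * block_size
        out[r0:r0 + block_size, :] += block @ dense[c0:c0 + block_size, :]
    return out

# A 4x4 matrix with two nonzero 2x2 blocks, multiplied by a dense 4x2 matrix.
bs = 2
blocks = {(0, 0): np.ones((bs, bs)), (1, 1): 2.0 * np.ones((bs, bs))}
dense = np.arange(8.0).reshape(4, 2)
print(block_sparse_matmul(blocks, bs, 4, 4, dense))
```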

Scalable Implicit Solvers with Dynamic Mesh Adaptation for a Relativistic Drift-Kinetic Fokker-Planck-Boltzmann Model

  • Authors: Johann Rudi, Max Heldman, Emil M. Constantinescu, Qi Tang, Xian-Zhu Tang
  • Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph); Plasma Physics (physics.plasm-ph)
  • Arxiv link: https://arxiv.org/abs/2303.17019
  • Pdf link: https://arxiv.org/pdf/2303.17019
  • Abstract
    In this work we consider a relativistic drift-kinetic model for runaway electrons along with a Fokker-Planck operator for small-angle Coulomb collisions, a radiation damping operator, and a secondary knock-on (Boltzmann) collision source. We develop a new scalable fully implicit solver utilizing finite volume and conservative finite difference schemes and dynamic mesh adaptivity. A new data management framework in the PETSc library based on the p4est library is developed to enable simulations with dynamic adaptive mesh refinement (AMR), parallel computation, and load balancing. This framework is tested through the development of the runaway electron solver, which is able to dynamically capture both the bulk Maxwellian in the low-energy region and a runaway tail in the high-energy region. To effectively capture features via the AMR algorithm, a new AMR indicator prediction strategy is proposed that is performed alongside the implicit time evolution of the solution. This strategy is complemented by the introduction of computationally cheap feature-based AMR indicators that are analyzed theoretically. Numerical results quantify the advantages of the prediction strategy in better capturing features compared with nonpredictive strategies, and we demonstrate trade-offs regarding computational costs. The full solver is further verified through several benchmark problems including manufactured solutions and solutions of physics models. We particularly focus on demonstrating the advantages of using implicit time stepping and AMR for runaway electron simulations.

Stability bounds of droop-controlled inverters in power grid networks

  • Authors: Philipp C. Böttcher, Leonardo Rydin Gorjão, Dirk Witthaut
  • Subjects: Systems and Control (eess.SY); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2303.17032
  • Pdf link: https://arxiv.org/pdf/2303.17032
  • Abstract
    The energy mix of future power systems will include high shares of wind power and solar PV. These generation facilities are generally connected via power-electronic inverters. While conventional generation responds dynamically to the state of the electric power system, inverters are power-electronic hardware and need to be programmed to react to the state of the system. Choosing an appropriate control scheme and the corresponding parameters is necessary to guarantee that the system operates safely. A prominent control scheme for inverters is droop control, which mimics the response of conventional generation. In this work, we investigate the stability of coupled systems of droop-controlled inverters in arbitrary network topologies. Employing linear stability analysis, we derive effective local stability criteria that consider both the overall network topology and its interplay with the inverters' intrinsic parameters. First, we explore the stability of an inverter coupled to an infinite grid in an analytic fashion and uncover stability and instability regions. Second, we extend the analysis to a generic topology of inverters and provide mathematical criteria for stability and instability of the system. Last, we showcase the usefulness of the criteria by examining two model systems using numerical simulations. The developed criteria show which parameters might lead to an unstable operating state.
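
The kind of local stability check described above reduces, at its simplest, to verifying that the linearization around an operating point has eigenvalues with negative real parts. A minimal sketch follows; the 2x2 Jacobian is an assumed toy droop-like linearization, not the paper's network model.

```python
import numpy as np

def is_locally_stable(jacobian, tol=1e-9):
    """An operating point is asymptotically stable when every eigenvalue
    of its linearization has strictly negative real part."""
    eig = np.linalg.eigvals(jacobian)
    return bool(np.all(eig.real < -tol)), eig

# Toy linearization of one droop-controlled inverter against a stiff grid,
# with state (phase deviation, frequency deviation); numbers are assumed.
J = np.array([[0.0, 1.0],
              [-5.0, -0.8]])   # coupling strength and droop damping
stable, eig = is_locally_stable(J)
print(stable, eig)
```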

Material-agnostic Shaping of Granular Materials with Optimal Transport

  • Authors: Nikhilesh Alatur, Olov Andersson, Roland Siegwart, Lionel Ott
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17047
  • Pdf link: https://arxiv.org/pdf/2303.17047
  • Abstract
    From construction materials, such as sand or asphalt, to kitchen ingredients, like rice, sugar, or salt, the world is full of granular materials. Despite impressive progress in robotic manipulation, manipulating and interacting with granular material remains a challenge due to difficulties in perceiving, representing, modelling, and planning for these variable materials that have complex internal dynamics. While some prior work has looked into estimating or learning accurate dynamics models for granular materials, the literature is still missing a more abstract planning method that can be used for planning manipulation actions for granular materials with unknown material properties. In this work, we leverage tools from optimal transport and connect them to robot motion planning. We propose a heuristics-based sweep planner that does not require knowledge of the material's properties and directly uses a height map representation to generate promising sweeps. These sweeps transform granular material from arbitrary start shapes into arbitrary target shapes. We apply the sweep planner in a fast and reactive feedback loop and avoid the need for model-based planning over multiple time steps. We validate our approach with a large set of simulation and hardware experiments where we show that our method is capable of efficiently solving several complex tasks, including gathering, separating, and shaping of several types of granular materials into different target shapes.

Modularized Control Synthesis for Complex Signal Temporal Logic Specifications

  • Authors: Zengjie Zhang, Sofie Haesaert
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.17086
  • Pdf link: https://arxiv.org/pdf/2303.17086
  • Abstract
    The control synthesis of a dynamic system subject to signal temporal logic (STL) specifications is commonly formulated as a mixed-integer linear programming (MILP) problem. Solving a MILP problem is computationally expensive when the STL formulas are long and complex. In this paper, we propose a framework to transform a long and complex STL formula into a syntactically separate form, i.e., the logical combination of a series of short and simple subformulas with non-overlapping timing intervals. Using this framework, one can easily modularize the synthesis of a complex formula using the synthesis solutions of the subformulas, which improves the efficiency of solving a MILP problem. Specifically, we propose a group of separation principles to guarantee the syntactic equivalence between the original formula and its syntactically separate counterpart. Then, we propose novel methods to compute the largest satisfaction region and the open-loop controller of the specification in a modularized manner. The efficacy of the methods is validated with a robot monitoring case study in simulation. Our work promises to improve the efficiency of control synthesis for systems with complicated specifications.

Learning Reliable Representations for Incomplete Multi-View Partial Multi-Label Classification

  • Authors: Chengliang Liu, Jie Wen, Yong Xu, Liqiang Nie, Min Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17117
  • Pdf link: https://arxiv.org/pdf/2303.17117
  • Abstract
    As a cross-topic of multi-view learning and multi-label classification, multi-view multi-label classification has gradually gained traction in recent years. The application of multi-view contrastive learning has further facilitated this process; however, existing multi-view contrastive learning methods crudely separate so-called negative pairs, which largely results in the separation of samples belonging to the same or similar categories. Besides, plenty of multi-view multi-label learning methods ignore the possible absence of views and labels. To address these issues, in this paper, we propose an incomplete multi-view partial multi-label classification network named RANK. In this network, a label-driven multi-view contrastive learning strategy is proposed to leverage supervised information to preserve the structure within a view and perform consistent alignment across views. Furthermore, we break through the view-level weights inherent in existing methods and propose a quality-aware sub-network to dynamically assign quality scores to each view of each sample. The label correlation information is fully utilized in the final multi-label cross-entropy classification loss, effectively improving the discriminative power. Last but not least, our model is not only able to handle complete multi-view multi-label datasets, but also works on datasets with missing instances and labels. Extensive experiments confirm that our RANK outperforms existing state-of-the-art methods.

Weighted Scheduling of Time-Sensitive Coflows

  • Authors: Olivier Brun, Rachid El-Azouzi, Quang-Trung Luu, Francesco De Pellergrini, Balakrishna J. Prabhu, Cédric Richier
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.17175
  • Pdf link: https://arxiv.org/pdf/2303.17175
  • Abstract
    Datacenter networks routinely support the data transfers of distributed computing frameworks in the form of coflows, i.e., sets of concurrent flows related to a common task. The vast majority of the literature has focused on the problem of scheduling coflows for completion time minimization, i.e., to maximize the average rate at which coflows are dispatched in the network fabric. However, many modern applications generate coflows dedicated to online services and mission-critical computing tasks which have to comply with specific completion deadlines. In this paper, we introduce $\mathtt{WDCoflow}$, a new algorithm to maximize the weighted number of coflows that complete before their deadline. By combining a dynamic programming algorithm along with parallel inequalities, our heuristic solution performs coflow admission control and coflow prioritization at once, imposing a $\sigma$-order on the set of coflows. Through extensive simulations, we demonstrate the effectiveness of our algorithm, which admits up to $3\times$ more coflows that meet their deadline in comparison with the best state-of-the-art solution, namely $\mathtt{CS\text{-}MHA}$. Furthermore, when weights are used to differentiate coflow classes, $\mathtt{WDCoflow}$ is able to improve the per-class admission rate by up to $4\times$, while increasing the average weighted coflow admission rate.
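
To give a feel for the dynamic-programming ingredient, the sketch below solves a much simpler single-machine relative of the problem: maximizing the weighted number of jobs that finish by their deadlines. It illustrates the DP flavor only and is not the WDCoflow algorithm.

```python
def max_weighted_on_time(jobs):
    """Maximize the total weight of jobs finishing by their deadlines.

    jobs: list of (processing_time, deadline, weight) on a single machine.
    Some optimal schedule serves the accepted jobs in earliest-deadline-first
    order, so we process jobs sorted by deadline and key the DP table by
    accumulated processing time.
    """
    best = {0: 0}  # accumulated processing time -> best achievable weight
    for p, d, w in sorted(jobs, key=lambda j: j[1]):
        updated = dict(best)
        for t, weight in best.items():
            if t + p <= d and weight + w > updated.get(t + p, -1):
                updated[t + p] = weight + w   # accept this job on time
        best = updated
    return max(best.values())

# Three jobs (time, deadline, weight); at most two can be on time together.
print(max_weighted_on_time([(2, 3, 5), (2, 4, 4), (3, 5, 6)]))  # -> 11
```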

Innovative Countermeasures to Defeat Cyber Attacks Against Blockchain Wallets: A Crypto Terminal Use Case

  • Authors: Pascal Urien (LTCI)
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2303.17206
  • Pdf link: https://arxiv.org/pdf/2303.17206
  • Abstract
    Blockchain transactions are signed by private keys. Secure key storage and tamper-proof computers are essential requirements for deploying a trusted infrastructure. In this paper, we identify some threats against blockchain wallets and propose a set of physical and logical countermeasures to thwart them. We present the crypto terminal device, operating with a removable secure element, built on open software and hardware architectures, capable of detecting a cloned device or corrupted software. These technologies are based on tamper-resistant computing (Javacard), smart card anti-cloning, smart card content attestation, application firewall, bare-metal architecture, remote attestation, dynamic Physical Unclonable Function (dPUF), and programming tokens as a root of trust. This paper is an extended version of the paper "Innovative Countermeasures to Defeat Cyber Attacks Against Blockchain Wallets," 2021 5th Cyber Security in Networking Conference (CSNet), 2021, pp. 49-54, doi: 10.1109/CSNet52717.2021.9614649.

Multifactor Sequential Disentanglement via Structured Koopman Autoencoders

  • Authors: Nimrod Berman, Ilan Naiman, Omri Azencot
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17264
  • Pdf link: https://arxiv.org/pdf/2303.17264
  • Abstract
    Disentangling complex data into its latent factors of variation is a fundamental task in representation learning. Existing work on sequential disentanglement mostly provides two-factor representations, i.e., it separates the data into time-varying and time-invariant factors. In contrast, we consider multifactor disentanglement in which multiple (more than two) semantic disentangled components are generated. Key to our approach is a strong inductive bias where we assume that the underlying dynamics can be represented linearly in the latent space. Under this assumption, it becomes natural to exploit the recently introduced Koopman autoencoder models. However, disentangled representations are not guaranteed in Koopman approaches, and thus we propose a novel spectral loss term which leads to structured Koopman matrices and disentanglement. Overall, we propose a simple, easy-to-implement deep model that is fully unsupervised and supports multifactor disentanglement. We showcase new disentangling abilities such as swapping of individual static factors between characters, and an incremental swap of disentangled factors from the source to the target. Moreover, we evaluate our method extensively on standard two-factor benchmark tasks where we significantly improve over competing unsupervised approaches, and we perform competitively in comparison to weakly- and self-supervised state-of-the-art approaches. The code is available at https://github.com/azencot-group/SKD.
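
The core inductive bias, latent dynamics that evolve linearly as z_{t+1} = K z_t, can be previewed without the autoencoder: given latent trajectories, K is recoverable by least squares. The sketch below uses synthetic latents; the encoder, decoder, and the paper's spectral loss are omitted.

```python
import numpy as np

# Synthetic latent trajectory generated by a stable linear operator,
# standing in for the autoencoder's latent codes.
rng = np.random.default_rng(0)
K_true = np.array([[0.9, -0.2],
                   [0.2, 0.9]])
Z = [rng.normal(size=2)]
for _ in range(199):
    Z.append(K_true @ Z[-1] + 1e-3 * rng.normal(size=2))
Z = np.array(Z)                      # (200, 2) latent sequence

# Least-squares Koopman fit: Z[1:] ~= Z[:-1] @ K.T
K_fit, *_ = np.linalg.lstsq(Z[:-1], Z[1:], rcond=None)
K_fit = K_fit.T
err = np.linalg.norm(Z[1:] - Z[:-1] @ K_fit.T) / np.linalg.norm(Z[1:])
print(K_fit, "relative one-step error:", err)
```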

Improved a posteriori Error Bounds for Reduced port-Hamiltonian Systems

  • Authors: Johannes Rettberg, Dominik Wittwar, Patrick Buchfink, Robin Herkert, Jörg Fehr, Bernard Haasdonk
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.17329
  • Pdf link: https://arxiv.org/pdf/2303.17329
  • Abstract
    Projection-based model order reduction of dynamical systems usually introduces an error between the high-fidelity model and its counterpart of lower dimension. This unknown error can be bounded by residual-based methods, which are typically known to be highly pessimistic in the sense of largely overestimating the true error. This work applies two improved error bounding techniques, namely (a) a hierarchical error bound and (b) an error bound based on an auxiliary linear problem, to the case of port-Hamiltonian systems. The approaches rely on a second approximation of (a) the dynamical system and (b) the error system. In this paper, these methods are for the first time adapted to port-Hamiltonian systems by exploiting their structure. The mathematical relationship between the two methods is discussed both theoretically and numerically. The effectiveness of the described methods is demonstrated using a challenging three-dimensional port-Hamiltonian model of a classical guitar with fluid-structure interaction.

Uniform Substitution for Dynamic Logic with Communicating Hybrid Programs

  • Authors: Marvin Brieger, Stefan Mitsch, André Platzer
  • Subjects: Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2303.17333
  • Pdf link: https://arxiv.org/pdf/2303.17333
  • Abstract
    This paper introduces a uniform substitution calculus for $d\mathcal{L}_\text{CHP}$, the dynamic logic of communicating hybrid programs. Uniform substitution enables parsimonious prover kernels by using axioms instead of axiom schemata. Instantiations can be recovered from a single proof rule responsible for soundness-critical instantiation checks rather than being spread across axiom schemata in side conditions. Even though communication and parallelism reasoning are notorious for necessitating subtle soundness-critical side conditions, uniform substitution when generalized to $d\mathcal{L}_\text{CHP}$ manages to limit and isolate their conceptual overhead. Since uniform substitution has proven to simplify the implementation of hybrid systems provers substantially, uniform substitution for $d\mathcal{L}_\text{CHP}$ paves the way for a parsimonious implementation of theorem provers for hybrid systems with communication and parallelism.

The Essential Algorithms for the Matrix Chain

  • Authors: Francisco López, Lars Karlsson, Paolo Bientinesi
  • Subjects: Discrete Mathematics (cs.DM)
  • Arxiv link: https://arxiv.org/abs/2303.17352
  • Pdf link: https://arxiv.org/pdf/2303.17352
  • Abstract
    For a given product of $n$ matrices, the matrix chain multiplication problem asks for a parenthesisation that minimises the number of arithmetic operations. In 1973, Godbole presented a now classical dynamic programming formulation with cubic time complexity in the length of the chain. The best known algorithms run in linearithmic time, and the best known approximation algorithms run in linear time with an approximation factor smaller than two. All solutions have in common that they select an optimal parenthesisation from a set of $C_{n-1}$ (Catalan number $n - 1$) distinct parenthesisations. We studied the set of parenthesisations and discovered (a) that all of the exponentially many parenthesisations are useful in the sense that they are optimal in an infinite subset of the input space, (b) that only $n + 1$ parenthesisations are essential in the sense that they are arbitrarily better than the second best on an infinite subset of the input space, and (c) that the best essential parenthesisation is never more than twice as costly as the best non-essential parenthesisation. Through random sampling of the input space, we further discovered that the set of essential parenthesisations includes an optimal parenthesisation in the vast majority of inputs, and that the best essential parenthesisation is on average much closer to optimal than the worst-case bound. The results have direct consequences for the development of compilers for linear algebra expressions where the matrix sizes are unknown at compile-time.
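
For reference, Godbole's cubic dynamic program mentioned above is short enough to state in full:

```python
def matrix_chain(dims):
    """Godbole's O(n^3) dynamic program for the matrix chain.

    dims has length n + 1: matrix i has shape dims[i] x dims[i + 1].
    Returns the minimal scalar multiplication count and an optimal
    parenthesisation.
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):            # subchain length
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):             # split point
                c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if c < cost[i][j]:
                    cost[i][j], split[i][j] = c, k

    def paren(i, j):
        if i == j:
            return f"A{i}"
        k = split[i][j]
        return f"({paren(i, k)} {paren(k + 1, j)})"

    return cost[0][n - 1], paren(0, n - 1)

# A0: 10x30, A1: 30x5, A2: 5x60 -> ((A0 A1) A2), 4500 multiplications.
print(matrix_chain([10, 30, 5, 60]))
```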

Dynamic Conceptional Contrastive Learning for Generalized Category Discovery

  • Authors: Nan Pu, Zhun Zhong, Nicu Sebe
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17393
  • Pdf link: https://arxiv.org/pdf/2303.17393
  • Abstract
    Generalized category discovery (GCD) is a recently proposed open-world problem, which aims to automatically cluster partially labeled data. The main challenge is that the unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories. This leaves traditional novel category discovery (NCD) methods incapacitated for GCD, due to their assumption that unlabeled data come only from novel categories. One effective way to approach GCD is applying self-supervised learning to learn discriminative representations for unlabeled data. However, this approach largely ignores underlying relationships between instances of the same concepts (e.g., class, super-class, and sub-class), which results in inferior representation learning. In this paper, we propose a Dynamic Conceptional Contrastive Learning (DCCL) framework, which can effectively improve clustering accuracy by alternately estimating underlying visual conceptions and learning conceptional representations. In addition, we design a dynamic conception generation and update mechanism, which is able to ensure consistent conception learning and thus further facilitate the optimization of DCCL. Extensive experiments show that DCCL achieves new state-of-the-art performance on six generic and fine-grained visual recognition datasets, especially on fine-grained ones. For example, our method significantly surpasses the best competitor by 16.2% on the new classes of the CUB-200 dataset. Code is available at https://github.com/TPCD/DCCL.

Fast inference of latent space dynamics in huge relational event networks

  • Authors: Igor Artico, Ernst Wit
  • Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2303.17460
  • Pdf link: https://arxiv.org/pdf/2303.17460
  • Abstract
    Relational events are a type of social interaction sometimes referred to as a dynamic network. Their dynamics typically depend on emerging patterns, so-called endogenous variables, or external forces, referred to as exogenous variables. Comprehensive information on the actors in the network, especially for huge networks, is rare, however. A latent space approach in network analysis has been a popular way to account for unmeasured covariates that are driving network configurations. Bayesian and EM-type algorithms have been proposed for inferring the latent space, but both the sheer size of many social network applications and the dynamic nature of the process, and therefore of the latent space, make computations prohibitively expensive. In this work we propose a likelihood-based algorithm that can deal with huge relational event networks. We propose a hierarchical strategy for inferring network community dynamics embedded into an interpretable latent space. Node dynamics are described by smooth spline processes. To make the framework feasible for large networks we borrow from machine learning optimization methodology. Model-based clustering is carried out via a convex clustering penalization, encouraging shared trajectories for ease of interpretation. We propose a model-based approach for separating macro- and microstructures and perform a hierarchical analysis within successive hierarchies. The method can fit millions of nodes on a public Colab GPU in a few minutes. The code and a tutorial are available in a Github repository.

PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation

  • Authors: Qitao Zhao, Ce Zheng, Mengyuan Liu, Pichao Wang, Chen Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17472
  • Pdf link: https://arxiv.org/pdf/2303.17472
  • Abstract
    Recently, transformer-based methods have gained significant success in sequential 2D-to-3D lifting human pose estimation. As a pioneering work, PoseFormer captures spatial relations of human joints in each video frame and human dynamics across frames with cascaded transformer layers and has achieved impressive performance. However, in real scenarios, the performance of PoseFormer and its follow-ups is limited by two factors: (a) The length of the input joint sequence; (b) The quality of 2D joint detection. Existing methods typically apply self-attention to all frames of the input sequence, causing a huge computational burden when the frame number is increased to obtain advanced estimation accuracy, and they are not robust to noise naturally brought by the limited capability of 2D joint detectors. In this paper, we propose PoseFormerV2, which exploits a compact representation of lengthy skeleton sequences in the frequency domain to efficiently scale up the receptive field and boost robustness to noisy 2D joint detection. With minimum modifications to PoseFormer, the proposed method effectively fuses features both in the time domain and frequency domain, enjoying a better speed-accuracy trade-off than its precursor. Extensive experiments on two benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) demonstrate that the proposed approach significantly outperforms the original PoseFormer and other transformer-based variants. Code is released at \url{https://github.com/QitaoZhao/PoseFormerV2}.
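
The frequency-domain idea can be previewed with a discrete cosine transform: a long, smooth trajectory is well summarized by its first few DCT coefficients. The sketch below (assuming SciPy) compresses a synthetic 1D signal standing in for one joint coordinate; PoseFormerV2's learned architecture is not reproduced here.

```python
import numpy as np
from scipy.fft import dct, idct

T, keep = 243, 9                 # sequence length, retained coefficients
t = np.linspace(0, 4 * np.pi, T)
# Synthetic noisy 1D joint trajectory standing in for a pose sequence.
joint_x = np.sin(t) + 0.05 * np.random.default_rng(0).normal(size=T)

coeffs = dct(joint_x, norm="ortho")
coeffs[keep:] = 0.0              # keep only the lowest 9 of 243 coefficients
recon = idct(coeffs, norm="ortho")

rel_err = np.linalg.norm(recon - joint_x) / np.linalg.norm(joint_x)
print(f"compressed {T} samples to {keep} coefficients, rel. error {rel_err:.3f}")
```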

Differentiable Environment Primitives for Contact State Estimation

  • Authors: Kevin Haninger, Kangwagye Samuel, Filippo Rozzi, Sehoon Oh, Loris Roveda
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17476
  • Pdf link: https://arxiv.org/pdf/2303.17476
  • Abstract
    In contact-rich manipulation, the robot dynamics are coupled with an environment that has application-specific dynamic properties (stiffness, inertia) and geometry (contact normal). Knowledge of these environmental parameters can improve control and monitoring, but they are often unobserved and may vary, either online or between task instances. Observers, such as the extended Kalman filter, can be used to estimate these parameters, but such model-based techniques can require too much engineering work to scale up to complex environments, such as multi-point contact. To accelerate environment modeling, we propose environment primitives: parameterized environment dynamics that can be connected in parallel and are expressed in an automatic differentiation framework. This simplifies offline gradient-based optimization to fit model parameters and linearization of the coupled dynamics for an observer. This method is implemented for stiffness contact models, allowing the fitting of contact geometry and stiffness offline or their online estimation by an extended Kalman filter. This method is applied to a collaborative robot, estimating external force, contact stiffness, and contact geometry from the motor position and current. The estimates of external force and stiffness are compared with a momentum observer and direct force measurements.
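
As a toy version of the estimation problem, consider identifying a scalar contact stiffness k from noisy force readings f = k*x at known deflections x; because the measurement is linear in k, the extended Kalman filter reduces to a plain scalar Kalman filter here. All numbers below are synthetic assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
k_true, meas_noise = 800.0, 0.5            # true stiffness (N/m), force noise (N)

k_hat, P = 100.0, 1e4                      # initial estimate and variance
Q, R = 1e-3, meas_noise**2                 # process / measurement variance

for step in range(50):
    x = 0.01 * (1 + step % 5)              # known deflection (m), synthetic
    f = k_true * x + rng.normal(scale=meas_noise)
    P += Q                                 # predict: stiffness ~ constant
    H = x                                  # measurement Jacobian df/dk
    K = P * H / (H * P * H + R)            # Kalman gain
    k_hat += K * (f - k_hat * x)           # update with the innovation
    P *= 1 - K * H

print(f"estimated stiffness: {k_hat:.1f} N/m (true {k_true})")
```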

On the Analysis of Computational Delays in Reinforcement Learning-based Rate Adaptation Algorithms

  • Authors: Ricardo Trancoso, Ruben Queiros, Helder Fontes, Rui Campos
  • Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17477
  • Pdf link: https://arxiv.org/pdf/2303.17477
  • Abstract
    Several research works have applied Reinforcement Learning (RL) algorithms to solve the Rate Adaptation (RA) problem in Wi-Fi networks. The dynamic nature of the radio link requires the algorithms to be responsive to changes in link quality. Delays in the execution of the algorithm may be detrimental to its performance, which in turn may decrease network performance. This aspect has been overlooked in the state of the art. In this paper, we present an analysis of common computational delays in RL-based RA algorithms, and propose a methodology that may be applied to reduce these computational delays and increase the efficiency of this type of algorithms. We apply the proposed methodology to an existing RL-based RA algorithm. The obtained experimental results indicate a reduction of one order of magnitude in the execution time of the algorithm, improving its responsiveness to link quality changes.

Event-based Agile Object Catching with a Quadrupedal Robot

  • Authors: Benedek Forrai, Takahiro Miki, Daniel Gehrig, Marco Hutter, Davide Scaramuzza
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17479
  • Pdf link: https://arxiv.org/pdf/2303.17479
  • Abstract
    Quadrupedal robots are conquering various indoor and outdoor applications due to their ability to navigate challenging uneven terrains. Exteroceptive information greatly enhances this capability since perceiving their surroundings allows them to adapt their controller and thus achieve higher levels of robustness. However, sensors such as LiDARs and RGB cameras do not provide sufficient information to quickly and precisely react in a highly dynamic environment since they suffer from a bandwidth-latency tradeoff. They require significant bandwidth at high frame rates while featuring significant perceptual latency at lower frame rates, thereby limiting their versatility on resource-constrained platforms. In this work, we tackle this problem by equipping our quadruped with an event camera, which does not suffer from this tradeoff due to its asynchronous and sparse operation. By leveraging the low latency of the events, we push the limits of quadruped agility and demonstrate high-speed ball catching for the first time. We show that our quadruped equipped with an event camera can catch objects at speeds up to 15 m/s from 4 meters, with a success rate of 83%. Using a VGA event camera, our method runs at 100 Hz on an NVIDIA Jetson Orin.

Teaching contact-rich tasks from visual demonstrations by constraint extraction

  • Authors: Christian Hegeler, Filippo Rozzi, Loris Roveda, Kevin Haninger
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17481
  • Pdf link: https://arxiv.org/pdf/2303.17481
  • Abstract
    Contact-rich manipulation involves kinematic constraints on the task motion, typically with discrete transitions between these constraints during the task. Allowing the robot to detect and reason about these contact constraints can support robust and dynamic manipulation, but how can these contact models be efficiently learned? Purely visual observations are an attractive data source, allowing passive task demonstrations with unmodified objects. Existing approaches for vision-only learning from demonstration are effective in pick-and-place applications and planar tasks. Nevertheless, accuracy/occlusions and unobserved task dynamics can limit their robustness in contact-rich manipulation. To use visual demonstrations for contact-rich robotic tasks, we consider the demonstration of pose trajectories with transitions between holonomic kinematic constraints, first clustering the trajectories into discrete contact modes, then fitting kinematic constraints per each mode. The fit constraints are then used to (i) detect contact online with force/torque measurements and (ii) plan the robot policy with respect to the active constraint. We demonstrate the approach with real experiments, on cabling and rake tasks, showing the approach gives robust manipulation through contact transitions.

DDP: Diffusion Model for Dense Visual Prediction

  • Authors: Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17559
  • Pdf link: https://arxiv.org/pdf/2303.17559
  • Abstract
    We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a random Gaussian distribution, guided by the image. The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline. Without task-specific design and architecture customization, DDP is easy to generalize to most dense prediction tasks, e.g., semantic segmentation and depth estimation. In addition, DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods. We show top results on three representative tasks across six diverse benchmarks; without tricks, DDP achieves state-of-the-art or competitive performance on each task compared to the specialist counterparts: for example, semantic segmentation (83.9 mIoU on Cityscapes), BEV map segmentation (70.6 mIoU on nuScenes), and depth estimation (0.05 REL on KITTI). We hope that our approach will serve as a solid baseline and facilitate future research.

TiDy-PSFs: Computational Imaging with Time-Averaged Dynamic Point-Spread-Functions

  • Authors: Sachin Shah, Sakshum Kulshrestha, Christopher A. Metzler
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17583
  • Pdf link: https://arxiv.org/pdf/2303.17583
  • Abstract
    Point-spread-function (PSF) engineering is a powerful computational imaging technique wherein a custom phase mask is integrated into an optical system to encode additional information into captured images. Used in combination with deep learning, such systems now offer state-of-the-art performance at monocular depth estimation, extended depth-of-field imaging, lensless imaging, and other tasks. Inspired by recent advances in spatial light modulator (SLM) technology, this paper answers a natural question: can one encode additional information and achieve superior performance by changing a phase mask dynamically over time? We first prove that the set of PSFs described by static phase masks is non-convex and that, as a result, time-averaged PSFs generated by dynamic phase masks are fundamentally more expressive. We then demonstrate, in simulation, that time-averaged dynamic (TiDy) phase masks can offer substantially improved monocular depth estimation and extended depth-of-field imaging performance.
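
The underlying optics is compact: a phase mask's incoherent Fraunhofer PSF is proportional to |FFT(pupil * exp(i*phase))|^2, and a sensor integrating over SLM frames records the mean of the per-mask PSFs. A minimal NumPy sketch with purely illustrative random masks follows.

```python
import numpy as np

def psf_from_phase(phase, pupil):
    """Incoherent Fraunhofer PSF of a pupil carrying a phase mask."""
    field = pupil * np.exp(1j * phase)
    psf = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    return psf / psf.sum()

N = 64
yy, xx = np.mgrid[-1:1:N * 1j, -1:1:N * 1j]
pupil = (xx**2 + yy**2 <= 1.0).astype(float)        # circular aperture

# The sensor integrates over the SLM frames, so the effective PSF is the
# mean of the per-mask PSFs; the random masks here are purely illustrative.
rng = np.random.default_rng(0)
masks = [rng.uniform(0, 2 * np.pi, size=(N, N)) for _ in range(8)]
tidy_psf = np.mean([psf_from_phase(m, pupil) for m in masks], axis=0)
print(tidy_psf.shape, tidy_psf.sum())               # (64, 64), ~1.0
```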

Polarity is all you need to learn and transfer faster

  • Authors: Qingyang Wang, Michael A.Powell, Ali Geisa, Eric Bridgeford, Joshua T. Vogelstein
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
  • Arxiv link: https://arxiv.org/abs/2303.17589
  • Pdf link: https://arxiv.org/pdf/2303.17589
  • Abstract
    Natural intelligences (NIs) thrive in a dynamic world - they learn quickly, sometimes with only a few samples. In contrast, artificial intelligences (AIs) typically learn with prohibitive amounts of training samples and computational power. What difference in design principles between NIs and AIs could contribute to such a discrepancy? Here, we propose an angle from weight polarity: development processes initialize NIs with advantageous polarity configurations; as NIs grow and learn, synapse magnitudes update yet polarities are largely kept unchanged. We demonstrate with simulation and image classification tasks that if weight polarities are adequately set $\textit{a priori}$, then networks learn with less time and data. We also explicitly illustrate situations in which $\textit{a priori}$ setting of the weight polarities is disadvantageous for networks. Our work illustrates the value of weight polarities from the perspective of statistical and computational efficiency during learning.

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

  • Authors: Wen Wang, Kangyang Xie, Zide Liu, Hao Chen, Yue Cao, Xinlong Wang, Chunhua Shen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17599
  • Pdf link: https://arxiv.org/pdf/2303.17599
  • Abstract
    Large-scale text-to-image diffusion models achieve unprecedented success in image generation and editing. However, how to extend such success to video editing is unclear. Recent initial attempts at video editing require significant text-to-video data and computation resources for training, which are often inaccessible. In this work, we propose vid2vid-zero, a simple yet effective method for zero-shot video editing. Our vid2vid-zero leverages off-the-shelf image diffusion models and doesn't require training on any video. At the core of our method are a null-text inversion module for text-to-video alignment, a cross-frame modeling module for temporal consistency, and a spatial regularization module for fidelity to the original video. Without any training, we leverage the dynamic nature of the attention mechanism to enable bi-directional temporal modeling at test time. Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos. Code will be made available at \url{https://github.com/baaivision/vid2vid-zero}.

New submissions for Mon, 1 May 23

Keyword: efficient

MINN: Learning the dynamics of differential-algebraic equations and application to battery modeling

  • Authors: Yicun Huang, Changfu Zou, Yang Li, Torsten Wik
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14422
  • Pdf link: https://arxiv.org/pdf/2304.14422
  • Abstract
    The concept of integrating physics-based and data-driven approaches has become popular for modeling sustainable energy systems. However, the existing literature mainly focuses on the data-driven surrogates generated to replace physics-based models. These models often trade accuracy for speed but lack the generalisability, adaptability, and interpretability inherent in physics-based models, which are often indispensable in the modeling of real-world dynamic systems for optimization and control purposes. In this work, we propose a novel architecture for generating model-integrated neural networks (MINN) to allow integration on the level of learning physics-based dynamics of the system. The obtained hybrid model solves an unsettled research problem in control-oriented modeling, i.e., how to obtain an optimally simplified model that is physically insightful, numerically accurate, and computationally tractable simultaneously. We apply the proposed neural network architecture to model the electrochemical dynamics of lithium-ion batteries and show that MINN is extremely data-efficient to train while being sufficiently generalizable to previously unseen input data, owing to its underlying physical invariants. The MINN battery model has an accuracy comparable to the first principle-based model in predicting both the system outputs and any locally distributed electrochemical behaviors but achieves two orders of magnitude reduction in the solution time.

Model Explainability in Physiological and Healthcare-based Neural Networks

  • Authors: Rohit Sharma, Abhinav Gupta, Arnav Gupta, Bo Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.14495
  • Pdf link: https://arxiv.org/pdf/2304.14495
  • Abstract
    The estimation and monitoring of SpO2 are crucial for assessing lung function and treating chronic pulmonary diseases. The COVID-19 pandemic has highlighted the importance of early detection of changes in SpO2, particularly in asymptomatic patients with clinical deterioration. However, conventional SpO2 measurement methods rely on contact-based sensing, presenting the risk of cross-contamination and complications in patients with impaired limb perfusion. Additionally, pulse oximeters may not be available in marginalized communities and undeveloped countries. To address these limitations and provide a more comfortable and unobtrusive way to monitor SpO2, recent studies have investigated SpO2 measurement using videos. However, measuring SpO2 using cameras in a contactless way, particularly from smartphones, is challenging due to weaker physiological signals and the lower optical selectivity of smartphone camera sensors. Our system includes three main steps: 1) extraction of the region of interest (ROI), which includes the palm and back of the hand, from the smartphone-captured videos; 2) spatial averaging of the ROI to produce R, G, and B time series; and 3) feeding the time series into an optophysiology-inspired CNN for SpO2 estimation. Our proposed method can provide a more efficient and accurate way to monitor SpO2 using videos captured from consumer-grade smartphones, which can be especially useful in telehealth and health screening settings.
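
Step 2 of this pipeline, spatially averaging the ROI into R, G, and B time series, is simple to sketch in NumPy; the ROI extractor and the SpO2 CNN are omitted, and the frames and masks below are synthetic stand-ins.

```python
import numpy as np

def rgb_time_series(frames, roi_masks):
    """Spatially average the ROI of each frame into R, G, B signals.

    frames: (T, H, W, 3) video; roi_masks: (T, H, W) booleans marking the
    hand region per frame (produced by any ROI extractor). Returns (T, 3).
    """
    series = np.empty((frames.shape[0], 3))
    for t in range(frames.shape[0]):
        series[t] = frames[t][roi_masks[t]].mean(axis=0)
    return series

# Toy example: 100 frames of 32x32 video with a fixed square ROI.
frames = np.random.default_rng(0).integers(0, 256, size=(100, 32, 32, 3))
mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True
print(rgb_time_series(frames, np.broadcast_to(mask, (100, 32, 32))).shape)
```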

MWaste: A Deep Learning Approach to Manage Household Waste

  • Authors: Suman Kunwar
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14498
  • Pdf link: https://arxiv.org/pdf/2304.14498
  • Abstract
    While computer vision methods have been shown to be effective in classifying garbage into recycling categories for waste processing, existing methods are costly, imprecise, and unclear. To tackle this issue, we introduce MWaste, a mobile application that uses computer vision and deep learning techniques to classify waste materials as trash, plastic, paper, metal, glass or cardboard. Its effectiveness was tested on various neural network architectures and real-world images, achieving an average precision of 92% on the test set. This app can help combat climate change by enabling efficient waste processing and reducing the generation of greenhouse gases caused by incorrect waste disposal.

SRCNet: Seminal Representation Collaborative Network for Marine Oil Spill Segmentation

  • Authors: Fang Chen, Heiko Balzter, Peng Ren, Huiyu Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14500
  • Pdf link: https://arxiv.org/pdf/2304.14500
  • Abstract
    Effective oil spill segmentation in Synthetic Aperture Radar (SAR) images is critical for marine oil pollution cleanup, and proper image representation is helpful for accurate image segmentation. In this paper, we propose an effective oil spill image segmentation network named SRCNet by leveraging SAR image representation and the training for oil spill segmentation simultaneously. Specifically, our proposed segmentation network is constructed with a pair of deep neural nets with the collaboration of the seminal representation that describes SAR images, where one deep neural net is the generative net which strives to produce oil spill segmentation maps, and the other is the discriminative net which tries its best to distinguish between the produced and the true segmentations; the two thus play a two-player game. Particularly, the seminal representation exploited in our proposed SRCNet originates from SAR imagery, modelling the internal characteristics of SAR images. Thus, in the training process, the collaborated seminal representation empowers the mapped generative net to produce accurate oil spill segmentation maps efficiently with a small amount of training data, pushing the discriminative net to reach its optimal solution quickly. Therefore, our proposed SRCNet operates effective oil spill segmentation in an economical and efficient manner. Additionally, to increase the segmentation capability of the proposed network in terms of accurately delineating oil spill details in SAR images, a regularisation term that penalises the segmentation loss is devised. This encourages our proposed SRCNet to accurately segment oil spill areas from SAR images. Empirical experimental evaluations with different metrics validate the effectiveness of our proposed SRCNet for oil spill image segmentation.

Read My Mind: A Multi-Modal Dataset for Human Belief Prediction

  • Authors: Jiafei Duan, Samson Yu, Nicholas Tan, Yi Ru Wang, Cheston Tan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.14501
  • Pdf link: https://arxiv.org/pdf/2304.14501
  • Abstract
    Understanding human intentions is key to enabling effective and efficient human-robot interaction (HRI) in collaborative settings. To enable the development and evaluation of the ability of artificial intelligence (AI) systems to infer human beliefs, we introduce a large-scale multi-modal video dataset for intent prediction based on object-context relations.

Suspicious Vehicle Detection Using Licence Plate Detection And Facial Feature Recognition

  • Authors: Vrinda Agarwal, Aaron George Pichappa, Manideep Ramisetty, Bala Murugan MS, Manoj kumar Rajagopal
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.14507
  • Pdf link: https://arxiv.org/pdf/2304.14507
  • Abstract
    With the increasing need to strengthen vehicle safety and detection, the existing practice of catching criminals and identifying vehicles manually through various traffic surveillance cameras is not only time-consuming but also inefficient. With the advancement of technology in every field, the use of real-time traffic surveillance models will help facilitate an easier approach. Keeping this in mind, the main focus of our paper is to develop a combined face recognition and number plate recognition model to ensure vehicle safety and real-time tracking of fleeing criminals and stolen vehicles.

An Efficient Ensemble Explainable AI (XAI) Approach for Morphed Face Detection

  • Authors: Rudresh Dwivedi, Ritesh Kumar, Deepak Chopra, Pranay Kothari, Manjot Singh
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14509
  • Pdf link: https://arxiv.org/pdf/2304.14509
  • Abstract
    The extensive utilization of biometric authentication systems has prompted attackers/imposters to forge user identity based on morphed images. In this attack, a synthetic image is produced and merged with a genuine one. Next, the resultant image is used for authentication. Numerous deep neural convolutional architectures have been proposed in the literature for face Morphing Attack Detection (MAD) to prevent such attacks and lessen the risks associated with them. Although deep learning models achieve strong performance, it is difficult to understand and analyse these networks since they are black box/opaque in nature. As a consequence, incorrect judgments may be made. There is, however, a dearth of literature that explains the decision-making methods of black box deep learning models for biometric Presentation Attack Detection (PAD) or MAD that can aid the biometric community to have trust in deep learning-based biometric systems for identification and authentication in various security applications such as border control, criminal database establishment etc. In this work, we present a novel visual explanation approach named Ensemble XAI, integrating Saliency maps, Class Activation Maps (CAM) and Gradient-CAM (Grad-CAM) to provide a more comprehensive visual explanation for a deep learning prognostic model (EfficientNet-B1) that we have employed to predict whether the input presented to a biometric authentication system is morphed or genuine. The experiments were performed on three publicly available datasets, namely the Face Research Lab London Set, Wide Multi-Channel Presentation Attack (WMCA), and Makeup Induced Face Spoofing (MIFS). The experimental evaluations affirm that the resultant visual explanations highlight more fine-grained details of image features/areas focused on by EfficientNet-B1 to reach decisions, along with appropriate reasoning.

Visual Referential Games Further the Emergence of Disentangled Representations

  • Authors: Kevin Denamganaï, Sondess Missaoui, James Alfred Walker
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.14511
  • Pdf link: https://arxiv.org/pdf/2304.14511
  • Abstract
    Natural languages are powerful tools wielded by human beings to communicate information. Among their desirable properties, compositionality has been the main focus in the context of referential games and variants, as it promises to enable greater systematicity in the agents which would wield it. The concept of disentanglement has been shown to be of paramount importance to learned representations that generalise well in deep learning, and is thought to be a necessary condition for systematicity. Thus, this paper investigates how compositionality at the level of the emerging languages, disentanglement at the level of the learned representations, and systematicity relate to each other in the context of visual referential games. Firstly, we find that visual referential games based on the Obverter architecture outperform a state-of-the-art unsupervised learning approach in terms of many major disentanglement metrics. Secondly, we expand the previously proposed Positional Disentanglement (PosDis) metric for compositionality to (re-)incorporate some concerns pertaining to the informativeness and completeness features found in the Mutual Information Gap (MIG) disentanglement metric it stems from. This extension allows for further discrimination between the different kinds of compositional languages that emerge in the context of Obverter-based referential games, in a way that neither the referential game accuracy nor previous metrics were able to capture. Finally, we investigate whether the resulting (emergent) systematicity, as measured by zero-shot compositional learning tests, correlates with any of the disentanglement and compositionality metrics proposed so far. Throughout the training process, statistically significant correlation coefficients can be found, both positive and negative, depending on when the measurement is taken.

Multivariate Representation Learning for Information Retrieval

  • Authors: Hamed Zamani, Michael Bendersky
  • Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14522
  • Pdf link: https://arxiv.org/pdf/2304.14522
  • Abstract
    Dense retrieval models use bi-encoder network architectures for learning query and document representations. These representations often take the form of a vector, and their similarities are typically computed using the dot product function. In this paper, we propose a new representation learning framework for dense retrieval. Instead of learning a vector for each query and document, our framework learns a multivariate distribution and uses negative multivariate KL divergence to compute the similarity between distributions. For simplicity and efficiency reasons, we assume that the distributions are multivariate normals and then train large language models to produce mean and variance vectors for these distributions. We provide a theoretical foundation for the proposed framework and show that it can be seamlessly integrated into existing approximate nearest neighbor algorithms to perform retrieval efficiently. We conduct an extensive suite of experiments on a wide range of datasets, and demonstrate significant improvements compared to competitive dense retrieval models.
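
Because the distributions are assumed multivariate normal, the KL divergence has a closed form, which is especially simple for diagonal covariances. The sketch below computes a negative-KL similarity for toy 4-dimensional representations; the diagonal restriction and the dimensionality are illustrative assumptions, not necessarily the paper's configuration.

```python
import numpy as np

def kl_diag_gaussians(mu_q, var_q, mu_d, var_d):
    """KL( N(mu_q, diag(var_q)) || N(mu_d, diag(var_d)) ) in closed form:
    0.5 * sum( var_q/var_d + (mu_d - mu_q)^2/var_d - 1 + log(var_d/var_q) ).
    """
    return 0.5 * np.sum(
        var_q / var_d + (mu_d - mu_q) ** 2 / var_d - 1.0 + np.log(var_d / var_q)
    )

# Toy 4-dimensional query and document distributions.
mu_q, var_q = np.array([0.1, 0.3, -0.2, 0.0]), np.full(4, 0.5)
mu_d, var_d = np.array([0.2, 0.25, -0.1, 0.1]), np.full(4, 0.6)
print("negative-KL similarity:", -kl_diag_gaussians(mu_q, var_q, mu_d, var_d))
```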

It is all about where you start: Text-to-image generation with seed selection

  • Authors: Dvir Samuel, Rami Ben-Ari, Simon Raviv, Nir Darshan, Gal Chechik
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14530
  • Pdf link: https://arxiv.org/pdf/2304.14530
  • Abstract
    Text-to-image diffusion models can synthesize a large variety of concepts in new compositions and scenarios. However, they still struggle with generating uncommon concepts, rare unusual combinations, or structured concepts like hand palms. Their limitation is partly due to the long-tail nature of their training data: web-crawled data sets are strongly unbalanced, causing models to under-represent concepts from the tail of the distribution. Here we characterize the effect of unbalanced training data on text-to-image models and offer a remedy. We show that rare concepts can be correctly generated by carefully selecting suitable generation seeds in the noise space, a technique that we call SeedSelect. SeedSelect is efficient and does not require retraining the diffusion model. We evaluate the benefit of SeedSelect on a series of problems. First, in few-shot semantic data augmentation, where we generate semantically correct images for few-shot and long-tail benchmarks. We show classification improvement on all classes, both from the head and tail of the training data of diffusion models. We further evaluate SeedSelect on correcting images of hands, a well-known pitfall of current diffusion models, and show that it improves hand generation substantially.

Identifying Minimal Changes in the Zone Abstract Domain

  • Authors: Kenny Ballou, Elena Sherman
  • Subjects: Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.14550
  • Pdf link: https://arxiv.org/pdf/2304.14550
  • Abstract
    Verification techniques express program states as logical formulas over program variables. For example, symbolic execution and abstract interpretation encode program states as a set of integer inequalities. However, for real-world programs these formulas tend to become large, which affects the scalability of analyses. To address this problem, researchers developed complementary approaches which either remove redundant inequalities or extract a subset of inequalities sufficient for specific reasoning. For arbitrary integer inequalities, such reduction approaches either have high complexity or over-approximate. However, the efficiency and precision of these approaches can be improved for a restricted type of logical formulas used in relational numerical abstract domains. While previous work investigated custom efficient redundant inequality elimination for Zones states, our work examines custom semantic slicing algorithms that identify a minimal set of changed inequalities in Zones states. The client application of the minimal changes in Zones is an empirical study comparing invariants computed by data-flow analysis using the Zones, Intervals, and Predicates numerical domains. In particular, the evaluations compare how our proposed algorithms affect the precision of comparing Zones vs. Intervals and Zones vs. Predicates abstract domains. The results show our techniques reduce the number of variables by more than 70% and the number of inequalities by 30%, compared to full states. The approach refines the granularity of comparison between domains, reducing incomparable invariants between Zones and Predicates from 52% to 4%, and increasing equality of Intervals and Zones invariants from 27% to 71%. The techniques improve comparison efficiency by reducing the total runtime for all subject comparisons between Zones and Predicates from over 4 minutes to a few seconds.
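
For background, a Zones state is a conjunction of inequalities of the form v_i - v_j <= c, commonly stored as a difference-bound matrix and canonicalized with all-pairs shortest paths. The sketch below shows that representation and its closure; it is standard background machinery, not the paper's slicing algorithms.

```python
INF = float("inf")

def close_dbm(m):
    """Canonicalize a Zones state stored as a difference-bound matrix.

    m[i][j] = c encodes v_i - v_j <= c, with index 0 a special zero
    variable (so x <= 5 becomes x - v0 <= 5). Closure is all-pairs
    shortest paths (Floyd-Warshall) and makes every implied inequality
    explicit, which is what allows two states to be compared entry-wise.
    """
    n = len(m)
    m = [row[:] for row in m]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                m[i][j] = min(m[i][j], m[i][k] + m[k][j])
    return m

# Variables v0 (zero), x, y with x <= 4 and y - x <= 1; no direct bound on y.
state = [[0,   INF, INF],
         [4,   0,   INF],    # x - v0 <= 4
         [INF, 1,   0  ]]    # y - x  <= 1
print(close_dbm(state)[2][0])  # derived bound: y <= 5
```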

Neural Implicit Dense Semantic SLAM

  • Authors: Yasaman Haghighi, Suryansh Kumar, Jean Philippe Thiran, Luc Van Gool
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14560
  • Pdf link: https://arxiv.org/pdf/2304.14560
  • Abstract
    This paper presents an efficient online framework to solve the well-known semantic Visual Simultaneous Localization and Mapping (V-SLAM) problem for indoor scenes, leveraging the advantages of neural implicit scene representation. Existing methods along similar lines, such as NICE-SLAM, have critical practical limitations for such an important indoor scene understanding problem. To this end, and contrary to existing methods that assume RGB-D frames as input, we contend the following propositions for modern semantic V-SLAM: (i) for a rigid scene, robust and accurate camera motion can be computed with a disentangled tracking and 3D mapping pipeline; (ii) using neural fields, a dense and multifaceted scene representation of SDF, semantics, RGB, and depth can be provided memory-efficiently; (iii) rather than using every frame, a set of keyframes is sufficient to learn an excellent scene representation, thereby improving the pipeline's training time; (iv) multiple local mapping networks can be used to extend the pipeline to large-scale scenes. We show via extensive experiments on several popular benchmark datasets that our approach offers accurate tracking, mapping, and semantic labeling at test time even with noisy and highly sparse depth measurements. Later in the paper, we show that our pipeline easily extends to RGB image input. Overall, the proposed pipeline offers a favorable solution to an important scene understanding task that can assist diverse robot visual perception and related problems.

An Adaptive Channel Reservation MAC Protocol Based on Forwarding Traffic of Key Nodes

  • Authors: Ze Liu, Bo Li, Mao Yang, ZhongJiang Yan
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.14581
  • Pdf link: https://arxiv.org/pdf/2304.14581
  • Abstract
    Ad Hoc networks with multi-hop topology are widely used in military and civilian applications. One challenge for Ad Hoc networks is to design efficient Media Access Control (MAC) protocols to ensure quality of service (QoS). In Ad Hoc networks, there is a kind of node called a key node, which undertakes more forwarding traffic than the surrounding nodes. The number of neighbor nodes around key nodes is often large, and the surrounding channel environment and interference are often more complex. Thus, key nodes can hardly get enough channel access opportunities, resulting in poor end-to-end performance. Therefore, we propose an adaptive channel reservation MAC protocol based on the forwarding traffic of key nodes, aimed at alleviating congestion at key nodes. Nodes initiate reservations for future transmission time according to their buffer status before sending packets and then calculate the Weight of Reservation Ability (WRA). Each node adaptively adjusts its reservation opportunity by comparing its WRA with those of neighbor nodes, thus improving channel access efficiency and ensuring the transmission opportunity of key nodes. Extensive simulation confirms that our proposed FTKN-CRM provides significant improvements in end-to-end performance over the IEEE 802.11ax protocol and other reservation access protocols.
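
A hypothetical sketch of the adaptive reservation idea follows. The WRA formula below is our own illustrative stand-in (the abstract does not publish it), not the actual FTKN-CRM definition, and the slot-adjustment rule is likewise assumed.

```python
# Hypothetical WRA: a weighted mix of local queue backlog and relayed load.
def wra(buffer_backlog, forwarding_traffic, w_buf=0.5, w_fwd=0.5):
    """Weight of Reservation Ability from buffer status and forwarding traffic."""
    return w_buf * buffer_backlog + w_fwd * forwarding_traffic

def adjust_reservation(own_wra, neighbor_wras, current_slots, max_slots=8):
    """Claim more future slots than neighbors only if our weight justifies it."""
    avg = sum(neighbor_wras) / len(neighbor_wras)
    if own_wra > avg:
        return min(current_slots + 1, max_slots)   # key node: reserve more
    if own_wra < avg:
        return max(current_slots - 1, 1)           # yield channel time
    return current_slots

slots = adjust_reservation(wra(40, 120), [wra(10, 5), wra(8, 12)], current_slots=2)
print(slots)  # 3 -- the heavily loaded relay grows its reservation
```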

Learning adaptive manipulation of objects with revolute joint: A case study on varied cabinet doors opening

  • Authors: Hongxiang Yu, Dashun Guo, Zhongxiang Zhou, Yue Wang, Rong Xiong
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14602
  • Pdf link: https://arxiv.org/pdf/2304.14602
  • Abstract
    This paper introduces a learning-based framework for adaptive robot manipulation of objects with a revolute joint in unstructured environments. We concentrate our discussion on various cabinet door opening tasks. To improve the performance of Deep Reinforcement Learning in this setting, we analytically provide an efficient sampling scheme that exploits the constraints of the objects. To open various kinds of doors, we add encoded environment parameters, which define the various environments, to the input of our policy. To transfer the policy into the real world, we train an adaptation module in simulation and fine-tune it to reduce the impact of the policy-unaware environment parameters. We design a series of experiments to validate the efficacy of our framework. Additionally, we evaluate the model's real-world performance against a traditional door-opening method.

Timely Mobile Routing: An Experimental Study

  • Authors: Vishakha Ramani, Jiachen Chen, Roy D. Yates
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.14603
  • Pdf link: https://arxiv.org/pdf/2304.14603
  • Abstract
    Time-critical cyber-physical applications demand the timely delivery of information. In this work, we employ a high-speed packet processing testbed to quantitatively analyze a packet forwarding application running on a shared memory multi-processor architecture, where efficient synchronization of concurrent access to a Forwarding Information Base is essential for low-latency and timely delivery of information. While modern packet processing frameworks are optimized for maximum packet throughput, their ability to support timely delivery remains an open question. Here we focus on the age of information performance issues induced by throughput-focused packet processing frameworks. Our results underscore the importance of careful selection of offered load parameters and concurrency constructs in such frameworks.

Local-Global Transformer Enhanced Unfolding Network for Pan-sharpening

  • Authors: Mingsong Li, Yikun Liu, Tao Xiao, Yuwen Huang, Gongping Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.14612
  • Pdf link: https://arxiv.org/pdf/2304.14612
  • Abstract
    Pan-sharpening aims to increase the spatial resolution of the low-resolution multispectral (LrMS) image with the guidance of the corresponding panchromatic (PAN) image. Although deep learning (DL)-based pan-sharpening methods have achieved promising performance, most of them have a two-fold deficiency. First, the universally adopted black-box design limits model interpretability. Second, existing DL-based methods fail to efficiently capture local and global dependencies at the same time, which inevitably limits overall performance. To address these issues, we first formulate the degradation process of the high-resolution multispectral (HrMS) image as a unified variational optimization problem, and alternately solve its data and prior subproblems with a designed iterative proximal gradient descent (PGD) algorithm. Moreover, we customize a Local-Global Transformer (LGT) to simultaneously model local and global dependencies, and further build an LGT-based prior module for image denoising. Besides the prior module, we also design a lightweight data module. Finally, by serially integrating the data and prior modules in each iterative stage, we unfold the iterative algorithm into a stage-wise unfolding network, the Local-Global Transformer Enhanced Unfolding Network (LGTEUN), for interpretable MS pan-sharpening. Comprehensive experimental results on three satellite data sets demonstrate the effectiveness and efficiency of LGTEUN compared with state-of-the-art (SOTA) methods. The source code is available at https://github.com/lms-07/LGTEUN.
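
A minimal sketch of the deep-unfolding pattern the abstract describes: each stage runs a proximal gradient descent step on the data term and then a prior step. Here the operators `A`/`At` and the soft-shrinkage denoiser are toy stand-ins for the paper's data module and Local-Global Transformer prior module.

```python
import numpy as np

def pgd_stage(x, y, A, At, denoiser, step=0.1):
    x = x - step * At(A(x) - y)   # data subproblem: gradient of 0.5*||Ax - y||^2
    return denoiser(x)            # prior subproblem: learned proximal operator

A = At = lambda v: v                                   # toy degradation operator
denoise = lambda v: np.sign(v) * np.maximum(np.abs(v) - 0.05, 0.0)
x, y = np.zeros(8), np.linspace(0.0, 1.0, 8)
for _ in range(10):                                    # ten unrolled stages
    x = pgd_stage(x, y, A, At, denoise)
print(np.round(x, 2))                                  # approaches a shrunk y
```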

DataFlower: Exploiting the Data-flow Paradigm for Serverless Workflow Orchestration

  • Authors: Zijun Li, Chuhao Xu, Quan Chen, Jieru Zhao, Chen Chen, Minyi Guo
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.14629
  • Pdf link: https://arxiv.org/pdf/2304.14629
  • Abstract
    Serverless computing that runs functions with auto-scaling is a popular task execution pattern in the cloud-native era. By connecting serverless functions into workflows, tenants can achieve complex functionality. Prior research adopts the control-flow paradigm to orchestrate a serverless workflow. However, the control-flow paradigm inherently results in long response latency, due to the heavy data persistence overhead, sequential resource usage, and late function triggering. Our investigation shows that the data-flow paradigm has the potential to resolve the above problems, with careful design and optimization. We propose DataFlower, a scheme that achieves the data-flow paradigm for serverless workflows. In DataFlower, a container is abstracted into a function logic unit and a data logic unit. The function logic unit runs the functions, and the data logic unit handles the data transmission asynchronously. Moreover, a host-container collaborative communication mechanism is used to support efficient data transfer. Our experimental results show that compared to state-of-the-art serverless designs, DataFlower reduces the 99%-ile latency of the benchmarks by up to 35.4%, and improves the peak throughput by up to 3.8X.

An Adaptive Policy to Employ Sharpness-Aware Minimization

  • Authors: Weisen Jiang, Hansi Yang, Yu Zhang, James Kwok
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14647
  • Pdf link: https://arxiv.org/pdf/2304.14647
  • Abstract
    Sharpness-aware minimization (SAM), which searches for flat minima by min-max optimization, has been shown to be useful in improving model generalization. However, since each SAM update requires computing two gradients, its computational cost and training time are both doubled compared to standard empirical risk minimization (ERM). Recent state-of-the-art methods reduce the fraction of SAM updates, and thus accelerate SAM, by switching between SAM and ERM updates randomly or periodically. In this paper, we design an adaptive policy to employ SAM based on the loss landscape geometry. Two efficient algorithms, AE-SAM and AE-LookSAM, are proposed. We theoretically show that AE-SAM has the same convergence rate as SAM. Experimental results on various datasets and architectures demonstrate the efficiency and effectiveness of the adaptive policy.
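
The sketch below shows the standard two-gradient SAM update with an adaptive switch back to a plain one-gradient ERM step. The switching statistic here (a simple gradient-norm proxy for local sharpness) is our own simplification; AE-SAM's actual criterion on the loss landscape geometry is more refined.

```python
import torch

def sam_or_erm_step(model, loss_fn, batch, opt, rho=0.05, sharp_threshold=1.0):
    x, y = batch
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    grad_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    if grad_norm < sharp_threshold:          # flat region: one-gradient ERM step
        for p, g in zip(model.parameters(), grads):
            p.grad = g
        opt.step(); opt.zero_grad()
        return loss.item()
    with torch.no_grad():                    # sharp region: full two-gradient SAM
        eps = [rho * g / (grad_norm + 1e-12) for g in grads]
        for p, e in zip(model.parameters(), eps):
            p.add_(e)                        # ascend to the worst-case neighbor
    loss_adv = loss_fn(model(x), y)
    loss_adv.backward()                      # gradient at the perturbed point
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)                        # restore the original weights
    opt.step(); opt.zero_grad()
    return loss_adv.item()
```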

Effective Data Aggregation in WSN for Enhanced Security and Data Privacy

  • Authors: B. Murugeshwari, S. Aminta Sabatini, Lovelit Jose, S. Padmapriya
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.14654
  • Pdf link: https://arxiv.org/pdf/2304.14654
  • Abstract
    The two biggest problems with wireless sensor networks are security and energy usage. Malicious nodes can be present in large numbers among the sensing devices, and researchers have proposed several methods to find these rogue nodes. To prevent attacks on these networks and their data transmission, the data must be secured. Data aggregation helps reduce the number of messages transmitted within the network, which in turn lowers total network energy consumption. Additionally, when decrypting the aggregated data, the base station can distinguish between encrypted and consolidated analysis based on the cryptographic keys. This research examines the effectiveness of data aggregation. To solve the above problem, the system provides a method in which an efficient cluster agent is selected based on its location relative to the access point and its energy availability. The sensor network's energy consumption is reduced by selecting an effective cluster agent, extending the network's lifespan. The cluster agent is in charge of compiling data from each member node; it validates the data, discards any errors before aggregation, and aggregates only confirmed data. To provide end-to-end anonymity, ElGamal elliptic curve (ECE) encryption is used to secure the client data and forward the encrypted information to the cluster agent. Only the base station (BS) can decrypt the data. Furthermore, an ID-based signature system is utilized to enable authenticity. This research also presents a technique for recovering lost data: the access point employs a cache-based backup system to search for lost data.

Client Recruitment for Federated Learning in ICU Length of Stay Prediction

  • Authors: Vincent Scheltjens, Lyse Naomi Wamba Momo, Wouter Verbeke, Bart De Moor
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14663
  • Pdf link: https://arxiv.org/pdf/2304.14663
  • Abstract
    Machine and deep learning methods for medical and healthcare applications have shown significant progress and performance improvement in recent years. These methods require vast amounts of training data which are available in the medical sector, albeit decentralized. Medical institutions generate vast amounts of data for which sharing and centralizing remains a challenge as the result of data and privacy regulations. The federated learning technique is well-suited to tackle these challenges. However, federated learning comes with a new set of open problems related to communication overhead, efficient parameter aggregation, client selection strategies and more. In this work, we address the step prior to the initiation of a federated network for model training: client recruitment. By intelligently recruiting clients, communication overhead and overall cost of training can be reduced without sacrificing predictive performance. Client recruitment aims at pre-excluding potential clients from partaking in the federation based on a set of criteria indicative of their eventual contributions to the federation. In this work, we propose a client recruitment approach using only the output distribution and sample size at the client site. We show how a subset of clients can be recruited without sacrificing model performance whilst, at the same time, significantly improving computation time. By applying the recruitment approach to the training of federated models for accurate patient Length of Stay prediction using data from 189 Intensive Care Units, we show how the models trained in federations made up of recruited clients significantly outperform federated models trained with the standard procedure in terms of predictive power and training time.
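
An illustrative sketch of output-distribution-based recruitment: keep clients whose output histograms are close to the pooled distribution and whose sample size is adequate. The specific distance (total variation) and thresholds below are our assumptions, not the paper's exact criteria.

```python
import numpy as np

def recruit_clients(client_hists, client_sizes, min_size=100, max_dist=0.25):
    """Pre-exclude clients by output-distribution distance and sample size."""
    pooled = np.average(client_hists, axis=0, weights=client_sizes)
    recruited = []
    for cid, (hist, n) in enumerate(zip(client_hists, client_sizes)):
        dist = 0.5 * np.abs(hist - pooled).sum()     # total variation distance
        if n >= min_size and dist <= max_dist:
            recruited.append(cid)
    return recruited

hists = np.array([[0.7, 0.3], [0.6, 0.4], [0.05, 0.95]])  # per-client outputs
sizes = np.array([500, 800, 40])
print(recruit_clients(hists, sizes))  # [0, 1]: the small outlier client is excluded
```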

Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video Quality Assessment

  • Authors: Haoning Wu, Liang Liao, Annan Wang, Chaofeng Chen, Jingwen Hou, Wenxiu Sun, Qiong Yan, Weisi Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14672
  • Pdf link: https://arxiv.org/pdf/2304.14672
  • Abstract
    The proliferation of videos collected during in-the-wild natural settings has pushed the development of effective Video Quality Assessment (VQA) methodologies. Contemporary supervised opinion-driven VQA strategies predominantly hinge on training from expensive human annotations for quality scores, which has limited the scale and distribution of VQA datasets and consequently leads to unsatisfactory generalization capacity of methods driven by these data. On the other hand, although several handcrafted zero-shot quality indices do not require training from human opinions, they are unable to account for the semantics of videos, rendering them ineffective in comprehending complex authentic distortions (e.g., white balance, exposure) and assessing the quality of semantic content within videos. To address these challenges, we introduce the text-prompted Semantic Affinity Quality Index (SAQI) and its localized version (SAQI-Local), using Contrastive Language-Image Pre-training (CLIP) to ascertain the affinity between textual prompts and visual features, facilitating a comprehensive examination of semantic quality concerns without reliance on human quality annotations. By amalgamating SAQI with existing low-level metrics, we propose the unified Blind Video Quality Index (BVQI) and its improved version, BVQI-Local, which demonstrates unprecedented performance, surpassing existing zero-shot indices by at least 24% on all datasets. Moreover, we devise an efficient fine-tuning scheme for BVQI-Local that jointly optimizes text prompts and final fusion weights, resulting in state-of-the-art performance and superior generalization ability in comparison to prevalent opinion-driven VQA methods. We conduct comprehensive analyses to investigate different quality concerns of distinct indices, demonstrating the effectiveness and rationality of our design.
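
A sketch of the text-prompted affinity idea: score a frame by how much closer its CLIP embedding sits to a positive quality prompt than to a negative one. The prompt pair and softmax scoring below are our illustration, not SAQI's exact design.

```python
import torch
import clip  # OpenAI CLIP: pip install from github.com/openai/CLIP
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu")
prompts = clip.tokenize(["a high quality photo", "a low quality photo"])

def semantic_affinity_score(frame_path):
    image = preprocess(Image.open(frame_path)).unsqueeze(0)
    with torch.no_grad():
        img = model.encode_image(image)
        txt = model.encode_text(prompts)
        img = img / img.norm(dim=-1, keepdim=True)      # cosine-normalize features
        txt = txt / txt.norm(dim=-1, keepdim=True)
        sims = (img @ txt.T).squeeze(0)                 # affinity to each prompt
    return torch.softmax(100 * sims, dim=0)[0].item()   # P("high quality")

# A video-level score would average this over sampled keyframes.
```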

Quantum Cross Subspace Alignment Codes via the $N$-sum Box Abstraction

  • Authors: Yuxiang Lu, Syed Ali Jafar
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2304.14676
  • Pdf link: https://arxiv.org/pdf/2304.14676
  • Abstract
    Cross-subspace alignment (CSA) codes are used in various private information retrieval (PIR) schemes (e.g., with secure storage) and in secure distributed batch matrix multiplication (SDBMM). Using a recently developed $N$-sum box abstraction of a quantum multiple-access channel (QMAC), we translate CSA schemes over classical multiple-access channels into efficient quantum CSA schemes over a QMAC, achieving maximal superdense coding gain. Because of the $N$-sum box abstraction, the underlying problem of coding to exploit quantum entanglements for CSA schemes, becomes conceptually equivalent to that of designing a channel matrix for a MIMO MAC subject to given structural constraints imposed by the $N$-sum box abstraction, such that the resulting MIMO MAC is able to implement the functionality of a CSA scheme (encoding/decoding) over-the-air. Applications include Quantum PIR with secure and MDS-coded storage, as well as Quantum SDBMM.

Graph Neural Networks on Factor Graphs for Robust, Fast, and Scalable Linear State Estimation with PMUs

  • Authors: Ognjen Kundacina, Mirsad Cosovic, Dragisa Miskovic, Dejan Vukobratovic
  • Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.14680
  • Pdf link: https://arxiv.org/pdf/2304.14680
  • Abstract
    As phasor measurement units (PMUs) become more widely used in transmission power systems, a fast state estimation (SE) algorithm that can take advantage of their high sample rates is needed. To accomplish this, we present a method that uses graph neural networks (GNNs) to learn complex bus voltage estimates from PMU voltage and current measurements. We propose an original implementation of GNNs over the power system's factor graph to simplify the integration of various types and quantities of measurements on power system buses and branches. Furthermore, we augment the factor graph to improve the robustness of GNN predictions. This model is highly efficient and scalable, as its computational complexity is linear with respect to the number of nodes in the power system. Training and test examples were generated by randomly sampling sets of power system measurements and annotated with the exact solutions of linear SE with PMUs. The numerical results demonstrate that the GNN model provides an accurate approximation of the SE solutions. Furthermore, errors caused by PMU malfunctions or communication failures that would normally make the SE problem unobservable have a local effect and do not deteriorate the results in the rest of the power system.
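
The toy layer below only shows the propagation pattern whose cost is linear in the number of nodes: variable nodes (bus voltages) aggregate features from neighboring factor nodes (measurements). The random weights, features, and adjacency are stand-ins; the paper's GNN is trained end-to-end on labeled SE solutions.

```python
import numpy as np

def gnn_layer(h, adj, W_self, W_nbr):
    """One round of mean aggregation over factor-graph neighbors."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    msgs = adj @ h / deg
    return np.tanh(h @ W_self + msgs @ W_nbr)

rng = np.random.default_rng(0)
n_nodes, d = 6, 4                              # buses + measurement factors
adj = (rng.random((n_nodes, n_nodes)) < 0.4).astype(float)
adj = np.maximum(adj, adj.T); np.fill_diagonal(adj, 0)
h = rng.normal(size=(n_nodes, d))
for _ in range(2):                             # two propagation rounds
    h = gnn_layer(h, adj, rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(h.shape)  # (6, 4): per-node embeddings, later decoded to voltage estimates
```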

Zero Trust Chain: A Design Pattern for Improved Interoperability and Security in Polkadot

  • Authors: Santiago Márquez Solís
  • Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.14730
  • Pdf link: https://arxiv.org/pdf/2304.14730
  • Abstract
    This research article presents various design patterns for improving interoperability in Polkadot, a blockchain platform. These patterns include chain bridges, interoperability standards, common asset identifiers, governance agreements, oracle chains, and a hypothetical design pattern called Zero Trust Chain. Implementation of these design patterns can help improve security and confidence in transactions between different chains on the Polkadot network, allowing for faster and more efficient communication. The article also emphasizes the importance of interoperability in blockchain technology and highlights Polkadot's flexibility in creating customized specialized chains that can further improve interoperability on the network. Overall, this article highlights how design patterns can improve interoperability in Polkadot, which could lead to greater adoption of blockchain technology in various industries.

FlowTransformer: A Transformer Framework for Flow-based Network Intrusion Detection Systems

  • Authors: Liam Daly Manocchio, Siamak Layeghy, Wai Weng Lo, Gayan K. Kulatilleke, Mohanad Sarhan, Marius Portmann
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.14746
  • Pdf link: https://arxiv.org/pdf/2304.14746
  • Abstract
    This paper presents the FlowTransformer framework, a novel approach for implementing transformer-based Network Intrusion Detection Systems (NIDSs). FlowTransformer leverages the strengths of transformer models in identifying the long-term behaviour and characteristics of networks, which are often overlooked by most existing NIDSs. By capturing these complex patterns in network traffic, FlowTransformer offers a flexible and efficient tool for researchers and practitioners in the cybersecurity community who are seeking to implement NIDSs using transformer-based models. FlowTransformer allows the direct substitution of various transformer components, including the input encoding, transformer, classification head, and the evaluation of these across any flow-based network dataset. To demonstrate the effectiveness and efficiency of the FlowTransformer framework, we utilise it to provide an extensive evaluation of various common transformer architectures, such as GPT 2.0 and BERT, on three commonly used public NIDS benchmark datasets. We provide results for accuracy, model size and speed. A key finding of our evaluation is that the choice of classification head has the most significant impact on the model performance. Surprisingly, Global Average Pooling, which is commonly used in text classification, performs very poorly in the context of NIDS. In addition, we show that model size can be reduced by over 50%, and inference and training times improved, with no loss of accuracy, by making specific choices of input encoding and classification head instead of other commonly used alternatives.

Orthogonal polynomial bases in the Mixed Virtual Element Method

  • Authors: Stefano Berrone, Stefano Scialò, Gioana Teora
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14755
  • Pdf link: https://arxiv.org/pdf/2304.14755
  • Abstract
    The use of orthonormal polynomial bases has been found to be efficient in preventing ill-conditioning of the system matrix in the primal formulation of Virtual Element Methods (VEM) for high polynomial degrees and in the presence of badly-shaped polygons. However, we show that using the natural extension of an orthogonal polynomial basis built for the primal formulation is not sufficient to cure ill-conditioning in the mixed case. Thus, in the present work, we introduce an orthogonal vector-polynomial basis which is built ad hoc for the mixed formulation of VEM and which leads to very high-quality solutions in each tested case. Furthermore, a numerical experiment related to simulations in Discrete Fracture Networks (DFN), which are often characterised by very badly-shaped elements, is proposed to validate our procedures.

Hyperparameter Optimization through Neural Network Partitioning

  • Authors: Bruno Mlodozeniec, Matthias Reisser, Christos Louizos
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.14766
  • Pdf link: https://arxiv.org/pdf/2304.14766
  • Abstract
    Well-tuned hyperparameters are crucial for obtaining good generalization behavior in neural networks. They can enforce appropriate inductive biases, regularize the model and improve performance -- especially in the presence of limited data. In this work, we propose a simple and efficient way for optimizing hyperparameters inspired by the marginal likelihood, an optimization objective that requires no validation data. Our method partitions the training data and a neural network model into $K$ data shards and parameter partitions, respectively. Each partition is associated with and optimized only on specific data shards. Combining these partitions into subnetworks allows us to define the ``out-of-training-sample" loss of a subnetwork, i.e., the loss on data shards unseen by the subnetwork, as the objective for hyperparameter optimization. We demonstrate that we can apply this objective to optimize a variety of different hyperparameters in a single training run while being significantly computationally cheaper than alternative methods aiming to optimize the marginal likelihood for neural networks. Lastly, we also focus on optimizing hyperparameters in federated learning, where retraining and cross-validation are particularly challenging.
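
The sketch below shows the partitioning idea on a deliberately simplified linear model: split the data into K shards and the parameters into K groups, train each group only on its shard, and measure each group on the shards it never saw. The model and loss are our stand-ins for the paper's neural-network setup.

```python
import numpy as np

K = 4
rng = np.random.default_rng(1)
X, y = rng.normal(size=(200, 8)), rng.normal(size=200)
shards = np.array_split(rng.permutation(200), K)
param_groups = [rng.normal(size=8) for _ in range(K)]  # one weight vector per group

def fit_group(w, idx, lr=0.01, steps=50):
    for _ in range(steps):                      # group k only ever sees shard k
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        w = w - lr * grad
    return w

param_groups = [fit_group(w, shards[k]) for k, w in enumerate(param_groups)]

# "Out-of-training-sample" objective: evaluate group k on every shard j != k.
oos = np.mean([np.mean((X[shards[j]] @ param_groups[k] - y[shards[j]]) ** 2)
               for k in range(K) for j in range(K) if j != k])
print(round(oos, 3))  # drives hyperparameter selection without a validation set
```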

MCPrioQ: A lock-free algorithm for online sparse markov-chains

  • Authors: Jesper Derehag, Åke Johansson
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14801
  • Pdf link: https://arxiv.org/pdf/2304.14801
  • Abstract
    In high-performance systems it is sometimes hard to build very large graphs that are efficient both with respect to memory and compute. This paper proposes a data structure called Markov-chain-priority-queue (MCPrioQ), which is a lock-free sparse Markov chain that enables online and continuous learning with a time complexity of $O(1)$ for updates and $O(CDF^{-1}(t))$ for inference. MCPrioQ is especially suitable for recommender systems for lookups of $n$ items in descending probability order. The concurrent updates are achieved using hash tables and atomic instructions, and the lookups are achieved through a novel priority queue which allows for approximately correct results even during concurrent updates. The approximately-correct and lock-free properties are maintained by a read-copy-update scheme, in which the semantics have been slightly updated to allow for swapping of elements rather than the traditional pop-insert scheme.
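
A single-threaded sketch of the underlying idea: per-state transition counts kept in a hash table with O(1) updates, and successors returned in descending probability order. The lock-free part (atomics and the swap-based read-copy-update) is the paper's contribution and cannot be expressed in plain Python; this only shows the data-structure shape.

```python
import heapq
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))   # state -> successor -> count

def update(state, next_state):
    counts[state][next_state] += 1               # O(1) online count update

def top_n(state, n):
    """Return the n most likely successors, highest probability first."""
    total = sum(counts[state].values())
    best = heapq.nlargest(n, counts[state].items(), key=lambda kv: kv[1])
    return [(s, c / total) for s, c in best]

for a, b in [("home", "search"), ("home", "search"), ("home", "cart"),
             ("search", "item"), ("home", "search")]:
    update(a, b)
print(top_n("home", 2))  # [('search', 0.75), ('cart', 0.25)]
```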

Channel Orthogonalization with Reconfigurable Surfaces

  • Authors: Juan Vidal Alegria, Fredrik Rusek
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.14804
  • Pdf link: https://arxiv.org/pdf/2304.14804
  • Abstract
    Orthogonal multi-user multiple-input multiple-output (MU-MIMO) channels allow for optimum performance with simplified precoding/equalization, and they achieve maximum multiplexing gain which is shared fairly among users. A reconfigurable intelligent surface (RIS) constitutes a promising cost-efficient solution to improve the wireless channel, since it consists of passive reflecting elements able to adjust the phases of the incoming waves. However, it is still widely unclear how these surfaces can improve spatial multiplexing. In fact, the common RIS model cannot achieve perfect orthogonalization of MU-MIMO channels with a reasonable number of elements. Furthermore, efficient channel estimation algorithms for RIS, which are key to taking advantage of its benefits, are still a matter of research. We study two types of reconfigurable surfaces (RSs), namely the amplitude-reconfigurable intelligent surface (ARIS) and the fully-reconfigurable intelligent surface (FRIS), with extended capabilities over RIS. We show how these RSs allow for perfect channel orthogonalization, and, by minimizing the applied power, we show that they can potentially be implemented without the need for amplification. We also present an efficient channel estimation method for each of them that allows the base station (BS) to select the desired propagation channel.

NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields

  • Authors: Junge Zhang, Feihu Zhang, Shaochen Kuang, Li Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14811
  • Pdf link: https://arxiv.org/pdf/2304.14811
  • Abstract
    Labeling LiDAR point clouds for training autonomous driving is extremely expensive and difficult. LiDAR simulation aims at generating realistic LiDAR data with labels for training and verifying self-driving algorithms more efficiently. Recently, Neural Radiance Fields (NeRF) have been proposed for novel view synthesis using implicit reconstruction of 3D scenes. Inspired by this, we present NeRF-LiDAR, a novel LiDAR simulation method that leverages real-world information to generate realistic LiDAR point clouds. Different from existing LiDAR simulators, we use real images and point cloud data collected by self-driving cars to learn the 3D scene representation, point cloud generation, and label rendering. We verify the effectiveness of our NeRF-LiDAR by training different 3D segmentation models on the generated LiDAR point clouds. It reveals that the trained models are able to achieve similar accuracy when compared with the same model trained on the real LiDAR data. Besides, the generated data is capable of boosting accuracy through pre-training, which helps reduce the requirements for real labeled data.

Earning Extra Performance from Restrictive Feedbacks

  • Authors: Jing Li, Yuangang Pan, Yueming Lyu, Yinghua Yao, Yulei Sui, Ivor W. Tsang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14831
  • Pdf link: https://arxiv.org/pdf/2304.14831
  • Abstract
    Many machine learning applications encounter a situation where model providers are required to further refine the previously trained model so as to gratify the specific need of local users. This problem is reduced to the standard model tuning paradigm if the target data is permissibly fed to the model. However, it is rather difficult in a wide range of practical cases where target data is not shared with model providers but some evaluations of the model are commonly accessible. In this paper, we formally set up a challenge named *Earning eXtra PerformancE from restriCTive feEDbacks* (EXPECTED) to describe this form of model tuning problem. Concretely, EXPECTED admits a model provider to access the operational performance of the candidate model multiple times via feedback from a local user (or a group of users). The goal of the model provider is to eventually deliver a satisfactory model to the local user(s) by utilizing the feedback. Unlike existing model tuning methods where the target data is always ready for calculating model gradients, the model providers in EXPECTED only see some feedback, which could be as simple as scalars, such as inference accuracy or usage rate. To enable tuning in this restrictive circumstance, we propose to characterize the geometry of the model performance with regard to model parameters by exploring the parameters' distribution. In particular, for deep models whose parameters distribute across multiple layers, a more query-efficient algorithm is further tailor-designed that conducts layerwise tuning with more attention to those layers which pay off better. Our theoretical analyses justify the proposed algorithms from the aspects of both efficacy and efficiency. Extensive experiments on different applications demonstrate that our work forges a sound solution to the EXPECTED problem.

Regret Optimal Control for Uncertain Stochastic Systems

  • Authors: Andrea Martin, Luca Furieri, Florian Dörfler, John Lygeros, Giancarlo Ferrari-Trecate
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.14835
  • Pdf link: https://arxiv.org/pdf/2304.14835
  • Abstract
    We consider control of uncertain linear time-varying stochastic systems from the perspective of regret minimization. Specifically, we focus on the problem of designing a feedback controller that minimizes the loss relative to a clairvoyant optimal policy that has foreknowledge of the system dynamics and the exogenous disturbances. In this competitive framework, establishing robustness guarantees proves challenging as, differently from the case where the model is known, the benchmark policy is not only inapplicable, but also impossible to compute without knowledge of the system parameters. To overcome this issue, we embrace a scenario optimization approach, and we propose minimizing regret robustly over a finite set of randomly sampled system parameters. We prove that this policy optimization problem can be efficiently solved through semidefinite programming, and that the corresponding solution retains strong probabilistic out-of-sample regret guarantees in face of the uncertain dynamics. Our method naturally extends to include satisfaction of safety constraints with high probability. We validate our theoretical results and showcase the potential of our approach by means of numerical simulations.
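
A toy scenario-style sketch follows: sample N plausible system matrices, then pick a (static, for illustration) feedback gain minimizing the worst sampled cost via an epigraph formulation in cvxpy. The paper solves a richer semidefinite program over dynamic controllers with regret as the objective; this only shows the sampling-plus-robust-optimization idea.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
N, n = 20, 2
A_samples = [np.array([[0.9, 0.2], [0.0, 0.8]]) + 0.05 * rng.normal(size=(n, n))
             for _ in range(N)]                 # sampled uncertain dynamics
B = np.array([[0.0], [1.0]])
x0 = np.array([1.0, -1.0])

K = cp.Variable((1, n))          # static state-feedback gain (illustrative)
t = cp.Variable()                # epigraph variable = worst-case one-step cost
constraints = []
for A in A_samples:              # one constraint per sampled scenario
    x1 = (A @ x0) + B @ (K @ x0)
    constraints.append(cp.sum_squares(x1) + cp.sum_squares(K @ x0) <= t)
prob = cp.Problem(cp.Minimize(t), constraints)
prob.solve()
print(K.value, t.value)          # gain robust across the sampled scenarios
```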

Sampling-based Path Planning Algorithms: A Survey

  • Authors: Alka Choudhary
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14839
  • Pdf link: https://arxiv.org/pdf/2304.14839
  • Abstract
    Path planning is a classic problem for autonomous robots. To ensure safe and efficient point-to-point navigation, an appropriate algorithm should be chosen keeping the robot's dimensions and its classification in mind. Autonomous robots use path-planning algorithms to safely navigate dynamic, dense, and unknown environments. A few metrics for path-planning algorithms to be taken into account are safety, efficiency, lowest-cost path generation, and obstacle avoidance. Before path planning can take place, we need a map representation, which can be a discretized or an open configuration space. A discretized configuration space provides node/connectivity information from one point to another, while in an open/free configuration space it is up to the algorithm to create a list of nodes and then find a feasible path. Both types of maps are populated with obstacle positions obtained from perception-based obstacle detection, representing current obstacles from the robot's perspective. For open configuration spaces, sampling-based planning algorithms are used. This paper aims to explore various types of sampling-based path-planning algorithms such as the Probabilistic RoadMap (PRM) and Rapidly-exploring Random Trees (RRT). These two algorithms also have optimized versions, PRM* and RRT*, and this paper discusses how that optimization is achieved and why it is beneficial.
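
A minimal 2-D RRT sketch on an open configuration space with a circular obstacle; the step size, goal bias, and obstacle model below are illustrative choices, not from the survey.

```python
import math, random

random.seed(0)
OBSTACLES = [(5.0, 5.0, 1.5)]                    # (cx, cy, radius)
START, GOAL, STEP, GOAL_R = (1.0, 1.0), (9.0, 9.0), 0.5, 0.5

def collision_free(p):
    return all(math.dist(p, (cx, cy)) > r for cx, cy, r in OBSTACLES)

def steer(a, b):
    """Move at most STEP from a toward b."""
    d = math.dist(a, b)
    if d <= STEP:
        return b
    return (a[0] + STEP * (b[0] - a[0]) / d, a[1] + STEP * (b[1] - a[1]) / d)

parent = {START: None}
for _ in range(5000):
    sample = GOAL if random.random() < 0.05 else (random.uniform(0, 10),
                                                  random.uniform(0, 10))
    nearest = min(parent, key=lambda p: math.dist(p, sample))
    new = steer(nearest, sample)
    if collision_free(new):
        parent[new] = nearest                    # grow the tree toward the sample
        if math.dist(new, GOAL) < GOAL_R:
            path = [new]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            print("path found with", len(path), "nodes")
            break
```

RRT* improves on this loop by rewiring: when a node is added, nearby tree nodes are reconnected through it whenever that lowers their path cost, which is what yields asymptotic optimality.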

Wasserstein Dictionaries of Persistence Diagrams

  • Authors: Keanu Sisouk, Julie Delon, Julien Tierny
  • Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.14852
  • Pdf link: https://arxiv.org/pdf/2304.14852
  • Abstract
    This paper presents a computational framework for the concise encoding of an ensemble of persistence diagrams, in the form of weighted Wasserstein barycenters [99], [101] of a dictionary of atom diagrams. We introduce a multi-scale gradient descent approach for the efficient resolution of the corresponding minimization problem, which interleaves the optimization of the barycenter weights with the optimization of the atom diagrams. Our approach leverages the analytic expressions for the gradient of both sub-problems to ensure fast iterations and it additionally exploits shared-memory parallelism. Extensive experiments on public ensembles demonstrate the efficiency of our approach, with Wasserstein dictionary computations in the order of minutes for the largest examples. We show the utility of our contributions in two applications. First, we apply Wasserstein dictionaries to data reduction and reliably compress persistence diagrams by concisely representing them with their weights in the dictionary. Second, we present a dimensionality reduction framework based on a Wasserstein dictionary defined with a small number of atoms (typically three) and encode the dictionary as a low-dimensional simplex embedded in a visual space (typically in 2D). In both applications, quantitative experiments assess the relevance of our framework. Finally, we provide a C++ implementation that can be used to reproduce our results.

Dense Hybrid Proposal Modulation for Lane Detection

  • Authors: Yuejian Wu, Linqing Zhao, Jiwen Lu, Haibin Yan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14874
  • Pdf link: https://arxiv.org/pdf/2304.14874
  • Abstract
    In this paper, we present a dense hybrid proposal modulation (DHPM) method for lane detection. Most existing methods perform sparse supervision on a subset of high-scoring proposals, while other proposals fail to obtain effective shape and location guidance, resulting in poor overall quality. To address this, we densely modulate all proposals to generate topologically and spatially high-quality lane predictions with discriminative representations. Specifically, we first ensure that lane proposals are physically meaningful by applying single-lane shape and location constraints. Benefiting from the proposed proposal-to-label matching algorithm, we assign each proposal a target ground-truth lane to efficiently learn from spatial layout priors. To enhance generalization and model the inter-proposal relations, we diversify the shape difference of proposals matching the same ground-truth lane. In addition to the shape and location constraints, we design a quality-aware classification loss to adaptively supervise each positive proposal so that the discriminative power can be further boosted. Our DHPM achieves very competitive performance on four popular benchmark datasets. Moreover, we consistently outperform the baseline model on most metrics without introducing new parameters or reducing inference speed.

A novel reduced-order model for advection-dominated problems based on Radon-Cumulative-Distribution Transform

  • Authors: Tobias Long, Robert Barnett, Richard Jefferson-Loveday, Giovanni Stabile, Matteo Icardi
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14883
  • Pdf link: https://arxiv.org/pdf/2304.14883
  • Abstract
    Problems with dominant advection, discontinuities, travelling features, or shape variations are widespread in computational mechanics. However, classical linear model reduction and interpolation methods typically fail to reproduce even relatively small parameter variations, making the reduced models inefficient and inaccurate. In this work, a novel reduced-order modelling approach is proposed based on the Radon-Cumulative-Distribution transform (RCDT). We show that this non-linear transformation can significantly improve the dimensionality of proper orthogonal decomposition (POD) reconstructions and is capable of accurately interpolating some advection-dominated phenomena. The method is tested on various test cases in multiphase fluid dynamics.

Ensuring Reliable Robot Task Performance through Probabilistic Rare-Event Verification and Synthesis

  • Authors: Guy Scher, Sadra Sadraddini, Ariel Yadin, Hadas Kress-Gazit
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14886
  • Pdf link: https://arxiv.org/pdf/2304.14886
  • Abstract
    Providing guarantees on the safe operation of robots against edge cases is challenging, as testing methods such as traditional Monte-Carlo simulation require too many samples to provide reasonable statistics. Building upon recent advancements in rare-event sampling, we present a model-based method to verify whether a robotic system satisfies a Signal Temporal Logic (STL) specification in the face of environment variations and sensor/actuator noises. Our method is efficient and applicable to both linear and nonlinear and even black-box systems with arbitrary, but known, uncertainty distributions. For linear systems with Gaussian uncertainties, we exploit a feature to find optimal parameters that minimize the probability of failure. We demonstrate illustrative examples on applying our approach to real-world autonomous robotic systems.

The Power of Typed Affine Decision Structures: A Case Study

  • Authors: Gerrit Nolte, Maximilian Schlüter, Alnis Murtovi, Bernhard Steffen
  • Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.14888
  • Pdf link: https://arxiv.org/pdf/2304.14888
  • Abstract
    TADS are a novel, concise white-box representation of neural networks. In this paper, we apply TADS to the problem of neural network verification, using them to generate either proofs or concise error characterizations for desirable neural network properties. In a case study, we consider the robustness of neural networks to adversarial attacks, i.e., small changes to an input that drastically change a neural network's perception, and show that TADS can be used to provide precise diagnostics on how and where robustness errors occur. We achieve these results by introducing Precondition Projection, a technique that yields a TADS describing network behavior precisely on a given subset of its input space, and combining it with PCA, a traditional, well-understood dimensionality reduction technique. We show that PCA is easily compatible with TADS. All analyses can be implemented in a straightforward fashion using the rich algebraic properties of TADS, demonstrating the utility of the TADS framework for neural network explainability and verification. While TADS do not yet scale as efficiently as state-of-the-art neural network verifiers, we show that, using PCA-based simplifications, they can still scale to medium-sized problems and yield concise explanations for potential errors that can be used for other purposes such as debugging a network or generating new training samples.

Model Predictive Control of Wind Turbines with Piecewise-Affine Power Coefficient Approximation

  • Authors: Arnold Sterle, Aaron Grapentin, Christian A. Hans, Jörg Raisch
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.14906
  • Pdf link: https://arxiv.org/pdf/2304.14906
  • Abstract
    In this paper, an offset-free bilinear model predictive control approach for wind turbines is presented. State-of-the-art controllers employ different control loops for pitch angle and generator torque which switch depending on wind conditions. In contrast, the presented controller is based on one unified control law that works for all wind conditions. The inherent nonlinearity of wind turbines is addressed through a piecewise-affine approximation of the power coefficient, which is modelled in a mixed-integer fashion. The presented controller is compared to a state-of-the-art baseline controller in a numerical case study using OpenFAST. Simulation results show that the presented controller ensures accurate reference power tracking. Additionally, damage equivalent loads are reduced for higher wind speeds.

Representation Matters: The Game of Chess Poses a Challenge to Vision Transformers

  • Authors: Johannes Czech, Jannis Blüml, Kristian Kersting
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14918
  • Pdf link: https://arxiv.org/pdf/2304.14918
  • Abstract
    While transformers have gained the reputation as the "Swiss army knife of AI", no one has challenged them to master the game of chess, one of the classical AI benchmarks. Simply using vision transformers (ViTs) within AlphaZero does not master the game of chess, mainly because ViTs are too slow. Even making them more efficient using a combination of MobileNet and NextViT does not beat what actually matters: a simple change of the input representation and value loss, resulting in a greater boost of up to 180 Elo points over AlphaZero.

An Edge Assisted Robust Smart Traffic Management and Signalling System for Guiding Emergency Vehicles During Peak Hours

  • Authors: Shuvadeep Masanta, Ramyashree Pramanik, Sourav Ghosh, Tanmay Bhattacharya
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14924
  • Pdf link: https://arxiv.org/pdf/2304.14924
  • Abstract
    Congestion in traffic is an unavoidable circumstance in many cities in India and other countries. It is an issue of major concern. The steep rise in the number of automobiles on the roads, combined with old infrastructure, accidents, pedestrian traffic, and traffic rule violations, all add to challenging traffic conditions. Given these poor traffic conditions, there is a critical need for automatic detection and signaling systems. There are already various technologies used for traffic management and signaling systems, such as video analysis, infrared sensors, and wireless sensors. The main issue with these methods is that they are very costly and require high maintenance. In this paper, we propose a three-phase system that can guide emergency vehicles and manage traffic based on the degree of congestion. In the first phase, the system processes the captured images and calculates the Index value, which is used to discover the degree of congestion. The Index value of a particular road depends on its width and the length up to which the camera captures images of that road; we take input for these parameters (length and width) while setting up the system. In the second phase, the system checks whether any emergency vehicles are present in any lane. In the third phase, the whole processing and decision-making part is performed at the edge server. The proposed model is robust and takes into consideration adverse weather conditions such as haze, fog, and wind. It also works efficiently in low-light conditions. The edge server is a strategically placed server that provides low latency and better connectivity. Using edge technology in this traffic management system reduces the strain on cloud servers and makes the system more reliable in real time, because processing at the intermediate edge server reduces latency and bandwidth usage.

An Empirical Study of Multimodal Model Merging

  • Authors: Yi-Lin Sung, Linjie Li, Kevin Lin, Zhe Gan, Mohit Bansal, Lijuan Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14933
  • Pdf link: https://arxiv.org/pdf/2304.14933
  • Abstract
    Model merging (e.g., via interpolation or task arithmetic) fuses multiple models trained on different tasks to generate a multi-task solution. The technique has been proven successful in previous studies, where the models are trained on similar tasks and with the same initialization. In this paper, we expand on this concept to a multimodal setup by merging transformers trained on different modalities. Furthermore, we conduct our study for a novel goal where we can merge vision, language, and cross-modal transformers of a modality-specific architecture to create a parameter-efficient modality-agnostic architecture. Through comprehensive experiments, we systematically investigate the key factors impacting model performance after merging, including initialization, merging mechanisms, and model architectures. Our analysis leads to an effective training recipe for matching the performance of the modality-agnostic baseline (i.e. pre-trained from scratch) via model merging. Our code is available at: https://github.com/ylsung/vl-merging
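
Interpolation-style merging reduces to elementwise averaging of compatible weights, as in the sketch below; the 0.5 mixing weight is just a common default, and in the paper's setting the merged models share a modality-agnostic architecture and initialization.

```python
import torch

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Elementwise interpolation of two compatible state dicts."""
    assert sd_a.keys() == sd_b.keys(), "architectures must match to merge"
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

# Usage sketch (hypothetical names): vision_model and language_model share one
# skeleton; load the merged weights back into that shared architecture.
# merged = merge_state_dicts(vision_model.state_dict(), language_model.state_dict())
# shared_model.load_state_dict(merged)
```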

Information Redundancy and Biases in Public Document Information Extraction Benchmarks

  • Authors: Seif Laatiri, Pirashanth Ratnamogan, Joel Tang, Laurent Lam, William Vanhuffel, Fabien Caspani
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14936
  • Pdf link: https://arxiv.org/pdf/2304.14936
  • Abstract
    Advances in the Visually-rich Document Understanding (VrDU) field and particularly the Key-Information Extraction (KIE) task are marked with the emergence of efficient Transformer-based approaches such as the LayoutLM models. Despite the good performance of KIE models when fine-tuned on public benchmarks, they still struggle to generalize on complex real-life use-cases lacking sufficient document annotations. Our research highlighted that KIE standard benchmarks such as SROIE and FUNSD contain significant similarity between training and testing documents and can be adjusted to better evaluate the generalization of models. In this work, we designed experiments to quantify the information redundancy in public benchmarks, revealing a 75% template replication in the SROIE official test set and 16% in FUNSD. We also proposed resampling strategies to provide benchmarks more representative of the generalization ability of models. We showed that models not suited for document analysis struggle on the adjusted splits, dropping on average 10.5% F1 score on SROIE and 3.5% on FUNSD, compared to multi-modal models dropping only 7.5% F1 on SROIE and 0.5% F1 on FUNSD.

CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data

  • Authors: Michał Turski, Tomasz Stanisławek, Karol Kaczmarek, Paweł Dyda, Filip Graliński
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.14953
  • Pdf link: https://arxiv.org/pdf/2304.14953
  • Abstract
    In recent years, the field of document understanding has progressed a lot. A significant part of this progress has been possible thanks to the use of language models pretrained on large amounts of documents. However, pretraining corpora used in the domain of document understanding are single-domain, monolingual, or non-public. Our goal in this paper is to propose an efficient pipeline for creating a big-scale, diverse, multilingual corpus of PDF files from all over the Internet using Common Crawl, as PDF files are the most canonical type of document considered in document understanding. We analysed all of the steps of the pipeline extensively and proposed a solution which is a trade-off between data quality and processing time. We also share the CCpdf corpus in the form of an index of PDF files along with a script for downloading them, which produces a collection useful for language model pretraining. The dataset and tools published with this paper offer researchers the opportunity to develop even better multilingual language models.

Popularity Ratio Maximization: Surpassing Competitors through Influence Propagation

  • Authors: Hao Liao, Sheng Bi, Jiao Wu, Wei Zhang, Mingyang Zhou, Rui Mao, Wei Chen
  • Subjects: Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.14971
  • Pdf link: https://arxiv.org/pdf/2304.14971
  • Abstract
    In this paper, we present an algorithmic study on how to surpass competitors in popularity by strategic promotions in social networks. We first propose a novel model, called the PA-IC model, in which we integrate the Preferential Attachment (PA) model for popularity growth with the Independent Cascade (IC) model for influence propagation in social networks. In PA-IC, a popular item and a novice item grab shares of popularity from the natural popularity growth via the PA model, while the novice item tries to gain extra popularity via influence cascade in a social network. The *popularity ratio* is defined as the ratio of the popularity measure between the novice item and the popular item. We formulate *Popularity Ratio Maximization (PRM)* as the problem of selecting seeds in multiple rounds to maximize the popularity ratio in the end. We analyze the popularity ratio and show that it is monotone but not submodular. To provide an effective solution, we devise a surrogate objective function and show that, empirically, it is very close to the original objective function, while theoretically it is monotone and submodular. We design two efficient algorithms, one for the overlapping-influence and non-overlapping-seeds (across rounds) setting and the other for the non-overlapping-influence and overlapping-seeds setting, and further discuss how to deal with other models and problem variants. Our empirical evaluation further demonstrates that the proposed PRM-IMM method consistently achieves the best popularity promotion compared to other methods. Our theoretical and empirical analyses shed light on the interplay between influence maximization and preferential attachment in social networks.

Hierarchical and Decentralised Federated Learning

  • Authors: Omer Rana, Theodoros Spyridopoulos, Nathaniel Hudson, Matt Baughman, Kyle Chard, Ian Foster, Aftab Khan
  • Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.14982
  • Pdf link: https://arxiv.org/pdf/2304.14982
  • Abstract
    Federated learning has shown enormous promise as a way of training ML models in distributed environments while reducing communication costs and protecting data privacy. However, the rise of complex cyber-physical systems, such as the Internet-of-Things, presents new challenges that are not met with traditional FL methods. Hierarchical Federated Learning extends the traditional FL process to enable more efficient model aggregation based on application needs or characteristics of the deployment environment (e.g., resource capabilities and/or network connectivity). It illustrates the benefits of balancing processing across the cloud-edge continuum. Hierarchical Federated Learning is likely to be a key enabler for a wide range of applications, such as smart farming and smart energy management, as it can improve performance and reduce costs, whilst also enabling FL workflows to be deployed in environments that are not well-suited to traditional FL. Model aggregation algorithms, software frameworks, and infrastructures will need to be designed and implemented to make such solutions accessible to researchers and engineers across a growing set of domains. H-FL also introduces a number of new challenges. For instance, there are implicit infrastructural challenges. There is also a trade-off between having generalised models and personalised models. If there exist geographical patterns for data (e.g., soil conditions in a smart farm likely are related to the geography of the region itself), then it is crucial that models used locally can consider their own locality in addition to a globally-learned model. H-FL will be crucial to future FL solutions as it can aggregate and distribute models at multiple levels to optimally serve the trade-off between locality dependence and global anomaly robustness.
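
A minimal sketch of the two-level aggregation that distinguishes H-FL from flat FL: clients aggregate at their edge server first, then the edge models aggregate at the cloud. Sample-count weighting at both levels follows the standard FedAvg rule; the toy model vectors below are illustrative.

```python
import numpy as np

def fedavg(models, sizes):
    """Sample-count-weighted average of model parameter vectors."""
    return np.average(models, axis=0, weights=np.asarray(sizes, dtype=float))

# Three edge servers, each holding its clients' model vectors and sample counts.
edges = [
    ([np.array([1.0, 2.0]), np.array([3.0, 0.0])], [100, 300]),
    ([np.array([0.0, 1.0])], [50]),
    ([np.array([2.0, 2.0]), np.array([2.0, 4.0])], [200, 200]),
]
edge_models = [fedavg(m, s) for m, s in edges]          # level 1: at the edge
edge_sizes = [sum(s) for _, s in edges]
global_model = fedavg(edge_models, edge_sizes)          # level 2: at the cloud
print(global_model)
```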

Interpreting Vision and Language Generative Models with Semantic Visual Priors

  • Authors: Michele Cafagna, Lina M. Rojas-Barahona, Kees van Deemter, Albert Gatt
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.14986
  • Pdf link: https://arxiv.org/pdf/2304.14986
  • Abstract
    When applied to image-to-text models, interpretability methods often provide token-by-token explanations, namely, they compute a visual explanation for each token of the generated sequence. Those explanations are expensive to compute and unable to comprehensively explain the model's output. Therefore, these models often require some sort of approximation that eventually leads to misleading explanations. We develop a framework based on SHAP that allows for generating comprehensive, meaningful explanations leveraging the meaning representation of the output sequence as a whole. Moreover, by exploiting semantic priors in the visual backbone, we extract an arbitrary number of features that allow the efficient computation of Shapley values on large-scale models, generating at the same time highly meaningful visual explanations. We demonstrate that our method generates semantically more expressive explanations than traditional methods at a lower compute cost and that it can be generalized to other explainability methods.

Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards

  • Authors: Hao Qin, Kwang-Sung Jun, Chicheng Zhang
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.14989
  • Pdf link: https://arxiv.org/pdf/2304.14989
  • Abstract
    We study $K$-armed bandit problems where the reward distributions of the arms are all supported on the $[0,1]$ interval. It has been a challenge to design regret-efficient randomized exploration algorithms in this setting. Maillard sampling~\cite{maillard13apprentissage}, an attractive alternative to Thompson sampling, has recently been shown to achieve competitive regret guarantees in the sub-Gaussian reward setting~\cite{bian2022maillard} while maintaining closed-form action probabilities, which is useful for offline policy evaluation. In this work, we propose the Kullback-Leibler Maillard Sampling (KL-MS) algorithm, a natural extension of Maillard sampling that achieves a KL-style gap-dependent regret bound. We show that KL-MS enjoys asymptotic optimality when the rewards are Bernoulli and has a worst-case regret bound of the form $O(\sqrt{\mu^*(1-\mu^*) K T \ln K} + K \ln T)$, where $\mu^*$ is the expected reward of the optimal arm, and $T$ is the time horizon length.
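
For a concrete picture of the closed-form action probabilities, here is a minimal Python sketch in the spirit of KL Maillard sampling, assuming arm $a$ is weighted by $\exp(-N_a \, \mathrm{kl}(\hat{\mu}_a, \hat{\mu}_{\max}))$ with the binary KL divergence; this is an illustration inferred from the abstract, not the authors' reference implementation.

```python
import math

def binary_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ms_probs(means, counts):
    """Closed-form sampling probabilities: arm a gets weight
    exp(-N_a * kl(mu_a, mu_max)), so the empirically best arm always
    has weight 1 and clearly suboptimal arms decay exponentially."""
    best = max(means)
    weights = [math.exp(-n * binary_kl(m, best)) for m, n in zip(means, counts)]
    total = sum(weights)
    return [w / total for w in weights]

print(kl_ms_probs(means=[0.6, 0.5, 0.2], counts=[40, 35, 25]))
```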

Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs

  • Authors: George Pu, Anirudh Jain, Jihan Yin, Russell Kaplan
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14999
  • Pdf link: https://arxiv.org/pdf/2304.14999
  • Abstract
    As foundation models continue to exponentially scale in size, efficient methods of adaptation become increasingly critical. Parameter-efficient fine-tuning (PEFT), a recent class of techniques that require only modifying a small percentage of the model parameters, is currently the most popular method for adapting large language models (LLMs). Several PEFT techniques have recently been proposed with varying tradeoffs. We provide a comprehensive and uniform benchmark of various PEFT techniques across a representative LLM, the FLAN-T5 model, and evaluate model performance across different data scales of classification and generation datasets. Based on this, we provide a framework for choosing the optimal fine-tuning techniques given the task type and data availability. Contrary to popular belief, we also empirically prove that PEFT techniques converge slower than full tuning in low data scenarios, and posit the amount of data required for PEFT methods to both perform well and converge efficiently. Lastly, we further optimize these PEFT techniques by selectively choosing which parts of the model to train, and find that these techniques can be applied with significantly fewer parameters while maintaining and even improving performance.
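
As a concrete example of the kind of technique benchmarked here, the sketch below applies LoRA to FLAN-T5 via Hugging Face's `peft` library; the hyperparameters are illustrative defaults rather than the paper's settings.

```python
# pip install transformers peft
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
lora = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM,
                  r=8, lora_alpha=32, lora_dropout=0.05)
model = get_peft_model(model, lora)

# Only the low-rank adapter matrices are trainable; the base model
# stays frozen, which is what makes the method parameter-efficient.
model.print_trainable_parameters()
```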

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

  • Authors: Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2304.15010
  • Pdf link: https://arxiv.org/pdf/2304.15010
  • Abstract
    How to efficiently transform large language models (LLMs) into instruction followers has recently become a popular research direction, while training LLMs for multi-modal reasoning remains less explored. Although the recent LLaMA-Adapter demonstrates the potential to handle visual inputs with LLMs, it still cannot generalize well to open-ended visual instructions and lags behind GPT-4. In this paper, we present LLaMA-Adapter V2, a parameter-efficient visual instruction model. Specifically, we first augment LLaMA-Adapter by unlocking more learnable parameters (e.g., norm, bias and scale), which distribute the instruction-following ability across the entire LLaMA model besides adapters. Secondly, we propose an early fusion strategy to feed visual tokens only into the early LLM layers, contributing to better visual knowledge incorporation. Thirdly, a joint training paradigm of image-text pairs and instruction-following data is introduced by optimizing disjoint groups of learnable parameters. This strategy effectively alleviates the interference between the two tasks of image-text alignment and instruction following and achieves strong multi-modal reasoning with only a small-scale image-text and instruction dataset. During inference, we incorporate additional expert models (e.g. captioning/OCR systems) into LLaMA-Adapter to further enhance its image understanding capability without incurring training costs. Compared to the original LLaMA-Adapter, our LLaMA-Adapter V2 can perform open-ended multi-modal instructions by merely introducing 14M parameters over LLaMA. The newly designed framework also exhibits stronger language-only instruction-following capabilities and even excels in chat interactions. Our code and models are available at https://github.com/ZrrSkywalker/LLaMA-Adapter.
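
The "unlocking more learnable parameters (e.g., norm, bias and scale)" idea can be pictured with a generic PyTorch sketch that freezes a model and re-enables only those small parameter groups; this is a schematic illustration, not the authors' code.

```python
import torch.nn as nn

NORM_TYPES = (nn.LayerNorm, nn.BatchNorm1d, nn.BatchNorm2d)

def unlock_norm_and_bias(model: nn.Module) -> nn.Module:
    """Freeze all weights, then make only normalization layers and biases
    trainable, spreading a small learnable budget across the network."""
    for p in model.parameters():
        p.requires_grad = False
    for module in model.modules():
        if isinstance(module, NORM_TYPES):
            for p in module.parameters():
                p.requires_grad = True
    for name, p in model.named_parameters():
        if name.endswith(".bias") or name == "bias":
            p.requires_grad = True
    return model

# Tiny usage example on a stand-in model.
model = unlock_norm_and_bias(nn.Sequential(nn.Linear(16, 16), nn.LayerNorm(16)))
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("trainable parameters:", trainable)  # 48: LN weight + LN bias + Linear bias
```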

Keyword: faster

Robust and Fast Vehicle Detection using Augmented Confidence Map

  • Authors: Hamam Mokayed, Palaiahnakote Shivakumara, Lama Alkhaled, Rajkumar Saini, Muhammad Zeshan Afzal, Yan Chai Hum, Marcus Liwicki
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14462
  • Pdf link: https://arxiv.org/pdf/2304.14462
  • Abstract
    Vehicle detection in real-time scenarios is challenging because of the time constraints and the presence of multiple types of vehicles with different speeds, shapes, structures, etc. This paper presents a new method that relies on generating a confidence map for robust and faster vehicle detection. To reduce the adverse effect of different speeds, shapes, structures, and the presence of several vehicles in a single image, we introduce the concept of augmentation which highlights the region of interest containing the vehicles. The augmented map is generated by exploring the combination of multiresolution analysis and maximally stable extremal regions (MR-MSER). The output of MR-MSER is supplied to fast CNN to generate a confidence map, which results in candidate regions. Furthermore, unlike existing models that implement complicated models for vehicle detection, we explore the combination of a rough set and fuzzy-based models for robust vehicle detection. To show the effectiveness of the proposed method, we conduct experiments on our dataset captured by drones and on several vehicle detection benchmark datasets, namely, KITTI and UA-DETRAC. The results on our dataset and the benchmark datasets show that the proposed method outperforms the existing methods in terms of time efficiency and achieves a good detection rate.

Moccasin: Efficient Tensor Rematerialization for Neural Networks

  • Authors: Burak Bartan, Haoming Li, Harris Teague, Christopher Lott, Bistra Dilkina
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14463
  • Pdf link: https://arxiv.org/pdf/2304.14463
  • Abstract
    The deployment and training of neural networks on edge computing devices pose many challenges. The low memory nature of edge devices is often one of the biggest limiting factors encountered in the deployment of large neural network models. Tensor rematerialization or recompute is a way to address high memory requirements for neural network training and inference. In this paper we consider the problem of execution time minimization of compute graphs subject to a memory budget. In particular, we develop a new constraint programming formulation called Moccasin with only $O(n)$ integer variables, where $n$ is the number of nodes in the compute graph. This is a significant improvement over works in the recent literature that propose formulations with $O(n^2)$ Boolean variables. We present numerical studies showing that our approach is up to an order of magnitude faster than recent work, especially for large-scale graphs.

Zero Trust Chain: A Design Pattern for Improved Interoperability and Security in Polkadot

  • Authors: Santiago Márquez Solís
  • Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.14730
  • Pdf link: https://arxiv.org/pdf/2304.14730
  • Abstract
    This research article presents various design patterns for improving interoperability in Polkadot, a blockchain platform. These patterns include chain bridges, interoperability standards, common asset identifiers, governance agreements, oracle chains, and a hypothetical design pattern called Zero Trust Chain. Implementation of these design patterns can help improve security and confidence in transactions between different chains on the Polkadot network, allowing for faster and more efficient communication. The article also emphasizes the importance of interoperability in blockchain technology and highlights Polkadot's flexibility in creating customized specialized chains that can further improve interoperability on the network. Overall, this article highlights how design patterns can improve interoperability in Polkadot, which could lead to greater adoption of blockchain technology in various industries.

SFD2: Semantic-guided Feature Detection and Description

  • Authors: Fei Xue, Ignas Budvytis, Roberto Cipolla
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14845
  • Pdf link: https://arxiv.org/pdf/2304.14845
  • Abstract
    Visual localization is a fundamental task for various applications including autonomous driving and robotics. Prior methods focus on extracting large amounts of often redundant locally reliable features, resulting in limited efficiency and accuracy, especially in large-scale environments under challenging conditions. Instead, we propose to extract globally reliable features by implicitly embedding high-level semantics into both the detection and description processes. Specifically, our semantic-aware detector is able to detect keypoints from reliable regions (e.g. building, traffic lane) and suppress unreliable areas (e.g. sky, car) implicitly, instead of relying on explicit semantic labels. This boosts the accuracy of keypoint matching by reducing the number of features sensitive to appearance changes and avoiding the need for additional segmentation networks at test time. Moreover, our descriptors are augmented with semantics and have stronger discriminative ability, providing more inliers at test time. In particular, experiments on the long-term large-scale visual localization Aachen Day-Night and RobotCar-Seasons datasets demonstrate that our model outperforms previous local features and gives competitive accuracy to advanced matchers, while being about 2 and 3 times faster when using 2k and 4k keypoints, respectively.

Keyword: mobile

MWaste: A Deep Learning Approach to Manage Household Waste

  • Authors: Suman Kunwar
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14498
  • Pdf link: https://arxiv.org/pdf/2304.14498
  • Abstract
    Computer vision methods have been shown to be effective in classifying garbage into recycling categories for waste processing, but existing methods are costly, imprecise, and unclear. To tackle this issue, we introduce MWaste, a mobile application that uses computer vision and deep learning techniques to classify waste materials as trash, plastic, paper, metal, glass or cardboard. Its effectiveness was tested on various neural network architectures and real-world images, achieving an average precision of 92% on the test set. This app can help combat climate change by enabling efficient waste processing and reducing the generation of greenhouse gases caused by incorrect waste disposal.

Mobile Network Slicing under Demand Uncertainty: A Stochastic Programming Approach

  • Authors: Anousheh Gholami, Nariman Torkzaban, John S. Baras
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.14556
  • Pdf link: https://arxiv.org/pdf/2304.14556
  • Abstract
    Network slicing enables the deployment of multiple dedicated virtual sub-networks, i.e. slices on a shared physical infrastructure. Unlike traditional one-size-fits-all resource provisioning schemes, each network slice (NS) in 5G is tailored to the specific service requirements of a group of customers. An end-to-end (E2E) mobile NS orchestration requires the simultaneous provisioning of computing, storage, and networking resources across the core network (CN) and the radio access network (RAN). Constant temporospatial changes in mobile user demand profiles further complicate the E2E NSs resource provisioning beyond the limits of the existing best-effort schemes that are only effective under accurate demand forecasts for all slices. This paper proposes a practical two-time-scale resource provisioning framework for E2E network slicing under demand uncertainty. At each macro-scale instance, we assume that only the spatial probability distribution of the NS demands is available. We formulate the NSs resource allocation problem as a stochastic mixed integer program (SMIP) with the objective of minimizing the total resource cost at the CN and the RAN. At each microscale instance, utilizing the exact slice demand profiles, a linear program is solved to jointly minimize the unsupported traffic and the resource cost at the RAN. We verify the effectiveness of our resource allocation scheme through numerical experiments.

LNMesh: Who Said You need Internet to send Bitcoin? Offline Lightning Network Payments using Community Wireless Mesh Networks

  • Authors: Ahmet Kurt, Abdulhadi Sahin, Ricardo Harrilal-Parchment, Kemal Akkaya
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.14559
  • Pdf link: https://arxiv.org/pdf/2304.14559
  • Abstract
    Bitcoin is undoubtedly a great alternative to today's existing digital payment systems. Even though Bitcoin's scalability has been debated for a long time, we see that it is no longer a concern thanks to its layer-2 solution Lightning Network (LN). LN has been growing non-stop since its creation and has enabled fast, cheap, anonymous, censorship-resistant Bitcoin transactions. However, as is well known, LN nodes need an active Internet connection to operate securely, which may not always be possible. For example, in the aftermath of natural disasters or power outages, users may not have Internet access for a while. Thus, in this paper, we propose LNMesh which enables offline LN payments on top of wireless mesh networks. Users of a neighborhood or a community can establish a wireless mesh network and use it as an infrastructure to enable offline LN payments when they do not have any Internet connection. As such, we first present proof-of-concept implementations where we successfully perform offline LN payments utilizing Bluetooth Low Energy and WiFi. For larger networks with more users where users can also move around, channel assignments in the network need to be made strategically; thus, we propose 1) minimum connected dominating set; and 2) uniform spanning tree based channel assignment approaches. Finally, to test these approaches, we implemented a simulator in Python along with the support of the BonnMotion mobility tool. We then extensively tested the performance metrics of large-scale realistic offline LN payments on mobile wireless mesh networks. Our simulation results show that success rates of up to 95% are achievable with the proposed channel assignment approaches when channels have enough liquidity.
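
As a toy illustration of the spanning-tree channel assignment idea, the sketch below uses `networkx` and a minimum spanning tree over random edge weights as a cheap stand-in for sampling a uniform spanning tree; the graph and seeds are arbitrary.

```python
import random
import networkx as nx

random.seed(1)
mesh = nx.erdos_renyi_graph(20, 0.3, seed=42)   # stand-in wireless mesh
for u, v in mesh.edges:
    mesh[u][v]["w"] = random.random()

# One payment channel per tree edge: every pair of nodes is then joined
# by exactly one channel path, minimizing the number of funded channels.
tree = nx.minimum_spanning_tree(mesh, weight="w")
channels = sorted(tree.edges)
print(f"{len(channels)} channels connect {mesh.number_of_nodes()} nodes")
```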

Caught in the Game: On the History and Evolution of Web Browser Gaming

  • Authors: Naif Mehanna (CRIStAL, CNRS, SPIRALS), Walter Rudametkin (UR, IUF, CNRS, IRISA, DiverSe)
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.14791
  • Pdf link: https://arxiv.org/pdf/2304.14791
  • Abstract
    Web browsers have come a long way since their inception, evolving from a simple means of displaying text documents over the network to complex software stacks with advanced graphics and network capabilities. As personal computers grew in popularity, developers jumped at the opportunity to deploy cross-platform games with centralized management and a low barrier to entry. Simply going to the right address is now enough to start a game. From text-based to GPU-powered 3D games, browser gaming has evolved to become a strong alternative to traditional console and mobile-based gaming, targeting both casual and advanced gamers. Browser technology has also evolved to accommodate more demanding applications, sometimes even supplanting functions typically left to the operating system. Today, websites display rich, computationally intensive, hardware-accelerated graphics, allowing developers to build ever-more impressive applications and games. In this paper, we present the evolution of browser gaming and the technologies that enabled it, from the release of the first text-based games in the early 1990s to current open-world and game-engine-powered browser games. We discuss the societal impact of browser gaming and how it has allowed a new target audience to access digital gaming. Finally, we review the potential future evolution of the browser gaming industry.

Representation Matters: The Game of Chess Poses a Challenge to Vision Transformers

  • Authors: Johannes Czech, Jannis Blüml, Kristian Kersting
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14918
  • Pdf link: https://arxiv.org/pdf/2304.14918
  • Abstract
    While transformers have gained the reputation as the "Swiss army knife of AI", no one has challenged them to master the game of chess, one of the classical AI benchmarks. Simply using vision transformers (ViTs) within AlphaZero does not master the game of chess, mainly because ViTs are too slow. Even making them more efficient using a combination of MobileNet and NextViT does not beat what actually matters: a simple change of the input representation and value loss, resulting in a greater boost of up to 180 Elo points over AlphaZero.

An Edge Assisted Robust Smart Traffic Management and Signalling System for Guiding Emergency Vehicles During Peak Hours

  • Authors: Shuvadeep Masanta, Ramyashree Pramanik, Sourav Ghosh, Tanmay Bhattacharya
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14924
  • Pdf link: https://arxiv.org/pdf/2304.14924
  • Abstract
    Congestion in traffic is an unavoidable circumstance in many cities in India and other countries. It is an issue of major concern. The steep rise in the number of automobiles on the roads followed by old infrastructure, accidents, pedestrian traffic, and traffic rule violations all add to challenging traffic conditions. Given these poor traffic conditions, there is a critical need for automatic detection and signaling systems. There are already various technologies used for traffic management and signaling systems, such as video analysis, infrared sensors, and wireless sensors. The main issue with these methods is that they are very costly and require high maintenance. In this paper, we have proposed a three-phase system that can guide emergency vehicles and manage traffic based on the degree of congestion. In the first phase, the system processes the captured images and calculates the Index value which is used to discover the degree of congestion. The Index value of a particular road depends on its width and the length up to which the camera captures images of that road. We have to take input for the parameters (length and width) while setting up the system. In the second phase, the system checks whether there are any emergency vehicles present in any lane. In the third phase, the whole processing and decision-making part is performed at the edge server. The proposed model is robust and takes into consideration adverse weather conditions such as haze, fog, and wind. It also works very efficiently in low-light conditions. The edge server is a strategically placed server that provides us with low latency and better connectivity. Using edge technology in this traffic management system reduces the strain on cloud servers and makes the system more reliable in real-time because the latency and bandwidth get reduced due to processing at the intermediate edge server.
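
The abstract states only that a road's Index value depends on its width and the captured length; one plausible, purely hypothetical instantiation is the fraction of the captured road area occupied by detected vehicles:

```python
def congestion_index(vehicle_area_m2: float, road_length_m: float,
                     road_width_m: float) -> float:
    """Hypothetical congestion index in [0, 1]: share of the captured
    road area covered by detected vehicles. The exact formula is not
    given in the abstract; this is an illustrative guess."""
    return min(vehicle_area_m2 / (road_length_m * road_width_m), 1.0)

print(congestion_index(vehicle_area_m2=120.0, road_length_m=60.0, road_width_m=8.0))
```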

Keyword: pruning

There is no result

Keyword: voxel

There is no result

Keyword: lidar

HyperMODEST: Self-Supervised 3D Object Detection with Confidence Score Filtering

  • Authors: Jenny Xu, Steven L. Waslander
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14446
  • Pdf link: https://arxiv.org/pdf/2304.14446
  • Abstract
    Current LiDAR-based 3D object detectors for autonomous driving are almost entirely trained on human-annotated data collected in specific geographical domains with specific sensor setups, making it difficult to adapt to a different domain. MODEST is the first work to train 3D object detectors without any labels. Our work, HyperMODEST, proposes a universal method implemented on top of MODEST that can largely accelerate the self-training process and does not require tuning on a specific dataset. We filter intermediate pseudo-labels used for data augmentation with low confidence scores. On the nuScenes dataset, we observe a significant improvement of 1.6% in AP BEV in 0-80m range at IoU=0.25 and an improvement of 1.7% in AP BEV in 0-80m range at IoU=0.5 while only using one-fifth of the training time in the original approach by MODEST. On the Lyft dataset, we also observe an improvement over the baseline during the first round of iterative self-training. We explore the trade-off between high precision and high recall in the early stage of the self-training process by comparing our proposed method with two other score filtering methods: confidence score filtering for pseudo-labels with and without static label retention. The code and models of this work are available at https://github.com/TRAILab/HyperMODEST
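
The filtering step itself is easy to picture: drop low-confidence pseudo-labels before they drive data augmentation in the next self-training round. A minimal sketch, with an illustrative threshold:

```python
def filter_pseudo_labels(boxes, scores, threshold=0.7):
    """Keep only pseudo-labels whose detector confidence clears the
    threshold, trading recall for precision early in self-training."""
    return [box for box, score in zip(boxes, scores) if score >= threshold]

kept = filter_pseudo_labels(boxes=["car_0", "car_1", "pedestrian_0"],
                            scores=[0.91, 0.42, 0.88])
print(kept)  # ['car_0', 'pedestrian_0']
```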

Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration

  • Authors: Alexander Kyuroson, Niklas Dahlquist, Nikolaos Stathoulopoulos, Vignesh Kottayam Viswanathan, Anton Koval, George Nikolakopoulos
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14520
  • Pdf link: https://arxiv.org/pdf/2304.14520
  • Abstract
    Algorithms for autonomous navigation in environments without Global Navigation Satellite System (GNSS) coverage mainly rely on onboard perception systems. These systems commonly incorporate sensors like cameras and LiDARs, the performance of which may degrade in the presence of aerosol particles. Thus, there is a need to fuse acquired data from these sensors with data from RADARs, which can penetrate through such particles. Overall, this will improve the performance of localization and collision avoidance algorithms under such environmental conditions. This paper introduces a multimodal dataset from a harsh and unstructured underground environment with aerosol particles. A detailed description of the onboard sensors and the environment where the dataset is collected is presented to enable full evaluation of the acquired data. Furthermore, the dataset contains synchronized raw data measurements from all onboard sensors in Robot Operating System (ROS) format to facilitate the evaluation of navigation and localization algorithms in such environments. In contrast to the existing datasets, the focus of this paper is not only to capture both temporal and spatial data diversities but also to present the impact of harsh conditions on the captured data. Therefore, to validate the dataset, a preliminary comparison of odometry from the onboard LiDARs is presented.

Fusion is Not Enough: Single-Modal Attacks to Compromise Fusion Models in Autonomous Driving

  • Authors: Zhiyuan Cheng, Hongjun Choi, James Liang, Shiwei Feng, Guanhong Tao, Dongfang Liu, Michael Zuzak, Xiangyu Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.14614
  • Pdf link: https://arxiv.org/pdf/2304.14614
  • Abstract
    Multi-sensor fusion (MSF) is widely adopted for perception in autonomous vehicles (AVs), particularly for the task of 3D object detection with camera and LiDAR sensors. The rationale behind fusion is to capitalize on the strengths of each modality while mitigating their limitations. The exceptional and leading performance of fusion models has been demonstrated by advanced deep neural network (DNN)-based fusion techniques. Fusion models are also perceived as more robust to attacks compared to single-modal ones due to the redundant information in multiple modalities. In this work, we challenge this perspective with single-modal attacks that target the camera modality, which is considered less significant in fusion but more affordable for attackers. We argue that the weakest link of fusion models depends on their most vulnerable modality, and propose an attack framework that targets advanced camera-LiDAR fusion models with adversarial patches. Our approach employs a two-stage optimization-based strategy that first comprehensively assesses vulnerable image areas under adversarial attacks, and then applies customized attack strategies to different fusion models, generating deployable patches. Evaluations with five state-of-the-art camera-LiDAR fusion models on a real-world dataset show that our attacks successfully compromise all models. Our approach can either reduce the mean average precision (mAP) of detection performance from 0.824 to 0.353 or degrade the detection score of the target object from 0.727 to 0.151 on average, demonstrating the effectiveness and practicality of our proposed attack framework.

NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields

  • Authors: Junge Zhang, Feihu Zhang, Shaochen Kuang, Li Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14811
  • Pdf link: https://arxiv.org/pdf/2304.14811
  • Abstract
    Labeling LiDAR point clouds for training autonomous driving systems is extremely expensive and difficult. LiDAR simulation aims at generating realistic LiDAR data with labels for training and verifying self-driving algorithms more efficiently. Recently, Neural Radiance Fields (NeRF) have been proposed for novel view synthesis using implicit reconstruction of 3D scenes. Inspired by this, we present NeRF-LiDAR, a novel LiDAR simulation method that leverages real-world information to generate realistic LiDAR point clouds. Different from existing LiDAR simulators, we use real images and point cloud data collected by self-driving cars to learn the 3D scene representation, point cloud generation and label rendering. We verify the effectiveness of our NeRF-LiDAR by training different 3D segmentation models on the generated LiDAR point clouds. It reveals that the trained models are able to achieve similar accuracy when compared with the same model trained on the real LiDAR data. Besides, the generated data is capable of boosting the accuracy through pre-training, which helps reduce the requirements of real labeled data.

Keyword: diffusion

Learning a Diffusion Prior for NeRFs

  • Authors: Guandao Yang, Abhijit Kundu, Leonidas J. Guibas, Jonathan T. Barron, Ben Poole
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14473
  • Pdf link: https://arxiv.org/pdf/2304.14473
  • Abstract
    Neural Radiance Fields (NeRFs) have emerged as a powerful neural 3D representation for objects and scenes derived from 2D data. Generating NeRFs, however, remains difficult in many scenarios. For instance, training a NeRF with only a small number of views as supervision remains challenging since it is an under-constrained problem. Such settings call for some inductive prior to filter out bad local minima. One way to introduce such inductive priors is to learn a generative model for NeRFs modeling a certain class of scenes. In this paper, we propose to use a diffusion model to generate NeRFs encoded on a regularized grid. We show that our model can sample realistic NeRFs, while at the same time allowing conditional generations, given a certain observation as guidance.

It is all about where you start: Text-to-image generation with seed selection

  • Authors: Dvir Samuel, Rami Ben-Ari, Simon Raviv, Nir Darshan, Gal Chechik
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14530
  • Pdf link: https://arxiv.org/pdf/2304.14530
  • Abstract
    Text-to-image diffusion models can synthesize a large variety of concepts in new compositions and scenarios. However, they still struggle with generating uncommon concepts, rare unusual combinations, or structured concepts like hand palms. Their limitation is partly due to the long-tail nature of their training data: web-crawled data sets are strongly unbalanced, causing models to under-represent concepts from the tail of the distribution. Here we characterize the effect of unbalanced training data on text-to-image models and offer a remedy. We show that rare concepts can be correctly generated by carefully selecting suitable generation seeds in the noise space, a technique that we call SeedSelect. SeedSelect is efficient and does not require retraining the diffusion model. We evaluate the benefit of SeedSelect on a series of problems. First, in few-shot semantic data augmentation, where we generate semantically correct images for few-shot and long-tail benchmarks. We show classification improvement on all classes, both from the head and tail of the training data of diffusion models. We further evaluate SeedSelect on correcting images of hands, a well-known pitfall of current diffusion models, and show that it improves hand generation substantially.
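
To make the seed-selection idea concrete, here is a deliberately naive sketch that scores a pool of candidate seeds and keeps the best one; `generate` and `score` are placeholders (a real system might pair a diffusion sampler with CLIP similarity), and SeedSelect itself optimizes in the noise space rather than enumerating seeds.

```python
import torch

def generate(seed: int) -> torch.Tensor:
    """Placeholder for a diffusion sampler call seeded with `seed`."""
    g = torch.Generator().manual_seed(seed)
    return torch.randn(3, 64, 64, generator=g)

def score(image: torch.Tensor) -> float:
    """Placeholder semantic score (higher is better); stand-in for, e.g.,
    CLIP similarity between the image and the rare-concept prompt."""
    return float(image.mean())

best_seed = max(range(64), key=lambda s: score(generate(s)))
print("selected seed:", best_seed)
```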

SceneGenie: Scene Graph Guided Diffusion Models for Image Synthesis

  • Authors: Azade Farshad, Yousef Yeganeh, Yu Chi, Chengzhi Shen, Björn Ommer, Nassir Navab
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14573
  • Pdf link: https://arxiv.org/pdf/2304.14573
  • Abstract
    Text-conditioned image generation has made significant progress in recent years with generative adversarial networks and more recently, diffusion models. While diffusion models conditioned on text prompts have produced impressive and high-quality images, accurately representing complex text prompts such as the number of instances of a specific object remains challenging. To address this limitation, we propose a novel guidance approach for the sampling process in the diffusion model that leverages bounding box and segmentation map information at inference time without additional training data. Through a novel loss in the sampling process, our approach guides the model with semantic features from CLIP embeddings and enforces geometric constraints, leading to high-resolution images that accurately represent the scene. To obtain bounding box and segmentation map information, we structure the text prompt as a scene graph and enrich the nodes with CLIP embeddings. Our proposed model achieves state-of-the-art performance on two public benchmarks for image generation from scene graphs, surpassing both scene graph to image and text-based diffusion models in various metrics. Our results demonstrate the effectiveness of incorporating bounding box and segmentation map guidance in the diffusion model sampling process for more accurate text-to-image generation.
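
Schematically, this kind of guidance injects the gradient of a layout loss into each denoising step. The sketch below shows the generic pattern with placeholder callables; the paper's actual loss uses CLIP-enriched scene-graph features for bounding boxes and segmentation maps.

```python
import torch

def guided_denoise_step(x, t, denoiser, layout_loss, scale=1.0):
    """One diffusion sampling step with gradient guidance: predict the
    clean image, measure how badly it violates the box/mask constraints,
    and nudge the current sample against that gradient."""
    x = x.detach().requires_grad_(True)
    x0_hat = denoiser(x, t)                  # predicted clean image
    grad = torch.autograd.grad(layout_loss(x0_hat), x)[0]
    return (x - scale * grad).detach()

# Dummy usage with stand-in callables.
x = torch.randn(1, 3, 32, 32)
x = guided_denoise_step(x, t=10,
                        denoiser=lambda x, t: x * 0.9,
                        layout_loss=lambda img: img.square().mean())
print(x.shape)
```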

MUDiff: Unified Diffusion for Complete Molecule Generation

  • Authors: Chenqing Hua, Sitao Luan, Minkai Xu, Rex Ying, Jie Fu, Stefano Ermon, Doina Precup
  • Subjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM)
  • Arxiv link: https://arxiv.org/abs/2304.14621
  • Pdf link: https://arxiv.org/pdf/2304.14621
  • Abstract
    We present a new model for generating molecular data by combining discrete and continuous diffusion processes. Our model generates a comprehensive representation of molecules, including atom features, 2D discrete molecule structures, and 3D continuous molecule coordinates. The use of diffusion processes allows for capturing the probabilistic nature of molecular processes and exploring the effect of different factors on molecular structures and properties. Additionally, we propose a novel graph transformer architecture to denoise the diffusion process. The transformer is equivariant to Euclidean transformations, allowing it to learn invariant atom and edge representations while preserving the equivariance of atom coordinates. This transformer can be used to learn molecular representations robust to geometric transformations. We evaluate the performance of our model through experiments and comparisons with existing methods, showing its ability to generate more stable and valid molecules with good properties. Our model is a promising approach for designing molecules with desired properties and can be applied to a wide range of tasks in molecular modeling.

Keyword: dynamic

SSTM: Spatiotemporal Recurrent Transformers for Multi-frame Optical Flow Estimation

  • Authors: Fisseha Admasu Ferede, Madhusudhanan Balasubramanian
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14418
  • Pdf link: https://arxiv.org/pdf/2304.14418
  • Abstract
    Inaccurate optical flow estimates in and near occluded regions, and out-of-boundary regions are two of the current significant limitations of optical flow estimation algorithms. Recent state-of-the-art optical flow estimation algorithms are two-frame based methods where optical flow is estimated sequentially for each consecutive image pair in a sequence. While this approach gives good flow estimates, it fails to generalize optical flows in occluded regions mainly due to limited local evidence regarding moving elements in a scene. In this work, we propose a learning-based multi-frame optical flow estimation method that estimates two or more consecutive optical flows in parallel from multi-frame image sequences. Our underlying hypothesis is that by understanding temporal scene dynamics from longer sequences with more than two frames, we can characterize pixel-wise dependencies in a larger spatiotemporal domain, generalize complex motion patterns and thereby improve the accuracy of optical flow estimates in occluded regions. We present learning-based spatiotemporal recurrent transformers for multi-frame based optical flow estimation (SSTMs). Our method utilizes 3D Convolutional Gated Recurrent Units (3D-ConvGRUs) and spatiotemporal transformers to learn recurrent space-time motion dynamics and global dependencies in the scene and provide a generalized optical flow estimation. When compared with recent state-of-the-art two-frame and multi-frame methods on real-world and synthetic datasets, the performance of SSTMs was significantly higher in occluded and out-of-boundary regions. Among all published state-of-the-art multi-frame methods, SSTM achieved state-of-the-art results on the Sintel Final and KITTI2015 benchmark datasets.

One-Step Distributional Reinforcement Learning

  • Authors: Mastane Achab, Reda Alami, Yasser Abdelaziz Dahou Djilali, Kirill Fedyanin, Eric Moulines
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.14421
  • Pdf link: https://arxiv.org/pdf/2304.14421
  • Abstract
    Reinforcement learning (RL) allows an agent interacting sequentially with an environment to maximize its long-term expected return. In the distributional RL (DistrRL) paradigm, the agent goes beyond the limit of the expected value, to capture the underlying probability distribution of the return across all time steps. The set of DistrRL algorithms has led to improved empirical performance. Nevertheless, the theory of DistrRL is still not fully understood, especially in the control case. In this paper, we present the simpler one-step distributional reinforcement learning (OS-DistrRL) framework encompassing only the randomness induced by the one-step dynamics of the environment. Contrary to DistrRL, we show that our approach comes with a unified theory for both policy evaluation and control. Indeed, we propose two OS-DistrRL algorithms for which we provide an almost sure convergence analysis. The proposed approach compares favorably with categorical DistrRL on various environments.

MINN: Learning the dynamics of differential-algebraic equations and application to battery modeling

  • Authors: Yicun Huang, Changfu Zou, Yang Li, Torsten Wik
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14422
  • Pdf link: https://arxiv.org/pdf/2304.14422
  • Abstract
    The concept of integrating physics-based and data-driven approaches has become popular for modeling sustainable energy systems. However, the existing literature mainly focuses on the data-driven surrogates generated to replace physics-based models. These models often trade accuracy for speed but lack the generalisability, adaptability, and interpretability inherent in physics-based models, which are often indispensable in the modeling of real-world dynamic systems for optimization and control purposes. In this work, we propose a novel architecture for generating model-integrated neural networks (MINN) to allow integration on the level of learning physics-based dynamics of the system. The obtained hybrid model solves an unsettled research problem in control-oriented modeling, i.e., how to obtain an optimally simplified model that is physically insightful, numerically accurate, and computationally tractable simultaneously. We apply the proposed neural network architecture to model the electrochemical dynamics of lithium-ion batteries and show that MINN is extremely data-efficient to train while being sufficiently generalizable to previously unseen input data, owing to its underlying physical invariants. The MINN battery model has an accuracy comparable to the first principle-based model in predicting both the system outputs and any locally distributed electrochemical behaviors but achieves two orders of magnitude reduction in the solution time.

Co-Design of the Dense Linear Algebra Software Stack for Multicore Processors

  • Authors: Héctor Martínez, Sandra Catalán, Francisco D. Igual, José R. Herrero, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.14480
  • Pdf link: https://arxiv.org/pdf/2304.14480
  • Abstract
    This paper advocates for an intertwined design of the dense linear algebra software stack that breaks down the strict barriers between the high-level, blocked algorithms in LAPACK (Linear Algebra PACKage) and the low-level, architecture-dependent kernels in BLAS (Basic Linear Algebra Subprograms). Specifically, we propose customizing the GEMM (general matrix multiplication) kernel, which is invoked from the blocked algorithms for relevant matrix factorizations in LAPACK, to improve performance on modern multicore processors with hierarchical cache memories. To achieve this, we leverage an analytical model to dynamically adapt the cache configuration parameters of the GEMM to the shape of the matrix operands. Additionally, we accommodate a flexible development of architecture-specific micro-kernels that allow us to further improve the utilization of the cache hierarchy. Our experiments on two platforms, equipped with ARM (NVIDIA Carmel, Neon) and x86 (AMD EPYC, AVX2) multi-core processors, demonstrate the benefits of this approach in terms of better cache utilization and, in general, higher performance. However, they also reveal the delicate balance between optimizing for multi-threaded parallelism versus cache usage.
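
For a flavor of what adapting the GEMM cache configuration parameters involves, here is a rough Goto/BLIS-style heuristic that derives blocking factors from cache sizes; the paper's analytical model additionally adapts to the operand shapes, so treat this as a simplified sketch.

```python
def gemm_block_sizes(l1_bytes=32 * 1024, l2_bytes=512 * 1024,
                     elem_bytes=8, mr=8, nr=4):
    """Pick kc so a kc x nr micro-panel of B fills about half of L1, and
    mc so an mc x kc block of A fills about half of L2 (rounded to mr)."""
    kc = max(mr, (l1_bytes // 2) // (nr * elem_bytes))
    mc = max(mr, (((l2_bytes // 2) // (kc * elem_bytes)) // mr) * mr)
    return mc, kc

mc, kc = gemm_block_sizes()
print(f"mc={mc}, kc={kc}")
```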

Deep state-space modeling for explainable representation, analysis, and generation of professional human poses

  • Authors: Brenda Elizabeth Olivas-Padilla, Sotiris Manitsaris
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.14502
  • Pdf link: https://arxiv.org/pdf/2304.14502
  • Abstract
    The analysis of human movements has been extensively studied due to its wide variety of practical applications. Nevertheless, the state-of-the-art still faces scientific challenges while modeling human movements. Firstly, new models that account for the stochasticity of human movement and the physical structure of the human body are required to accurately predict the evolution of full-body motion descriptors over time. Secondly, the explainability of existing deep learning algorithms regarding their body posture predictions while generating human movements still needs to be improved as they lack comprehensible representations of human movement. This paper addresses these challenges by introducing three novel approaches for creating explainable representations of human movement. In this work, full-body movement is formulated as a state-space model of a dynamic system whose parameters are estimated using deep learning and statistical algorithms. The representations adhere to the structure of the Gesture Operational Model (GOM), which describes movement through its spatial and temporal assumptions. Two approaches correspond to deep state-space models that apply nonlinear network parameterization to provide interpretable posture predictions. The third method trains GOM representations using one-shot training with Kalman Filters. This training strategy enables users to model single movements and estimate their mathematical representation using procedures that require less computational power than deep learning algorithms. Ultimately, two applications of the generated representations are presented. The first is for the accurate generation of human movements, and the second is for body dexterity analysis of professional movements, where dynamic associations between body joints and meaningful motion descriptors are identified.

pyBibX -- A Python Library for Bibliometric and Scientometric Analysis Powered with Artificial Intelligence Tools

  • Authors: Valdecy Pereira, Marcio Pereira Basilio, Carlos Henrique Tarjano Santos
  • Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14516
  • Pdf link: https://arxiv.org/pdf/2304.14516
  • Abstract
    Bibliometric and Scientometric analyses offer invaluable perspectives on the complex research terrain and collaborative dynamics spanning diverse academic disciplines. This paper presents pyBibX, a Python library devised to conduct comprehensive bibliometric and scientometric analyses on raw data files sourced from Scopus, Web of Science, and PubMed, seamlessly integrating state-of-the-art AI capabilities into its core functionality. The library executes a comprehensive exploratory data analysis (EDA), presenting outcomes via visually appealing graphical illustrations. Network capabilities have been deftly integrated, encompassing Citation, Collaboration, and Similarity Analysis. Furthermore, the library incorporates AI capabilities, including Embedding vectors, Topic Modeling, Text Summarization, and other general Natural Language Processing tasks, employing models such as Sentence-BERT, BerTopic, BERT, chatGPT, and PEGASUS. As a demonstration, we have analyzed 184 documents associated with multiple-criteria decision analysis published between 1984 and 2023. The EDA emphasized a growing fascination with decision-making and fuzzy logic methodologies. Next, Network Analysis further accentuated the significance of central authors and intra-continental collaboration, identifying Canada and China as crucial collaboration hubs. Finally, AI Analysis identified two primary topics and highlighted chatGPT's preeminence in Text Summarization. It also proved to be an indispensable instrument for interpreting results, as our library enables researchers to pose inquiries to chatGPT regarding bibliometric outcomes. Even so, data homogeneity remains a daunting challenge due to database inconsistencies. PyBibX is the first application integrating cutting-edge AI capabilities for analyzing scientific publications, enabling researchers to examine and interpret these outcomes more effectively.

Ensemble Modeling with Contrastive Knowledge Distillation for Sequential Recommendation

  • Authors: Hanwen Du, Huanhuan Yuan, Pengpeng Zhao, Fuzhen Zhuang, Guanfeng Liu, Lei Zhao, Yanchi Liu, Victor S. Sheng
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.14668
  • Pdf link: https://arxiv.org/pdf/2304.14668
  • Abstract
    Sequential recommendation aims to capture users' dynamic interest and predicts the next item of users' preference. Most sequential recommendation methods use a deep neural network as sequence encoder to generate user and item representations. Existing works mainly center upon designing a stronger sequence encoder. However, few attempts have been made with training an ensemble of networks as sequence encoders, which is more powerful than a single network because an ensemble of parallel networks can yield diverse prediction results and hence better accuracy. In this paper, we present Ensemble Modeling with contrastive Knowledge Distillation for sequential recommendation (EMKD). Our framework adopts multiple parallel networks as an ensemble of sequence encoders and recommends items based on the output distributions of all these networks. To facilitate knowledge transfer between parallel networks, we propose a novel contrastive knowledge distillation approach, which performs knowledge transfer from the representation level via Intra-network Contrastive Learning (ICL) and Cross-network Contrastive Learning (CCL), as well as Knowledge Distillation (KD) from the logits level via minimizing the Kullback-Leibler divergence between the output distributions of the teacher network and the student network. To leverage contextual information, we train the primary masked item prediction task alongside the auxiliary attribute prediction task as a multi-task learning scheme. Extensive experiments on public benchmark datasets show that EMKD achieves a significant improvement compared with the state-of-the-art methods. Besides, we demonstrate that our ensemble method is a generalized approach that can also improve the performance of other sequential recommenders. Our code is available at this link: https://github.com/hw-du/EMKD.
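
The logits-level ingredient is standard knowledge distillation via the Kullback-Leibler divergence between teacher and student output distributions; a minimal PyTorch sketch follows (the representation-level ICL/CCL contrastive terms are omitted).

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
            tau: float = 2.0) -> torch.Tensor:
    """KL(teacher || student) over temperature-softened distributions,
    scaled by tau^2 as is conventional in distillation."""
    teacher_probs = F.softmax(teacher_logits / tau, dim=-1)
    student_log_probs = F.log_softmax(student_logits / tau, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * tau ** 2

loss = kd_loss(torch.randn(4, 100), torch.randn(4, 100))
print(loss.item())
```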

NeuralKG-ind: A Python Library for Inductive Knowledge Graph Representation Learning

  • Authors: Wen Zhang, Zhen Yao, Mingyang Chen, Zhiwei Huang, Huajun Chen
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14678
  • Pdf link: https://arxiv.org/pdf/2304.14678
  • Abstract
    Owing to the dynamic characteristics of knowledge graphs, many inductive knowledge graph representation learning (KGRL) works have been proposed in recent years, focusing on enabling prediction over new entities. NeuralKG-ind, an important update of the NeuralKG library, is the first library for inductive KGRL. It includes standardized processes, rich existing methods, decoupled modules, and comprehensive evaluation metrics. With NeuralKG-ind, it is easy for researchers and engineers to reproduce, redevelop, and compare inductive KGRL methods. The library, experimental methodologies, and model re-implementation results of NeuralKG-ind are all publicly released at https://github.com/zjukg/NeuralKG/tree/ind .

Metric Temporal Equilibrium Logic over Timed Traces

  • Authors: Arvid Becker, Pedro Cabalar, Martín Diéguez, Torsten Schaub, Anna Schuhmann
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14778
  • Pdf link: https://arxiv.org/pdf/2304.14778
  • Abstract
    In temporal extensions of Answer Set Programming (ASP) based on linear time, the behavior of dynamic systems is captured by sequences of states. While this representation reflects their relative order, it abstracts away the specific times associated with each state. However, timing constraints are important in many applications like, for instance, when planning and scheduling go hand in hand. We address this by developing a metric extension of linear-time temporal equilibrium logic, in which temporal operators are constrained by intervals over natural numbers. The resulting Metric Equilibrium Logic provides the foundation of an ASP-based approach for specifying qualitative and quantitative dynamic constraints. To this end, we define a translation of metric formulas into monadic first-order formulas and give a correspondence between their models in Metric Equilibrium Logic and Monadic Quantified Equilibrium Logic, respectively. Interestingly, our translation provides a blueprint for implementation in terms of ASP modulo difference constraints.
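
For intuition, metric operators attach numeric time windows to the usual temporal connectives. In standard metric temporal notation (which may differ cosmetically from the paper's), the formula $\Box\,(\mathit{request} \rightarrow \Diamond_{[0,3]}\,\mathit{grant})$ states that every request must be granted within three time units, a timing constraint that a purely linear-time encoding of relative state order cannot express.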

Regret Optimal Control for Uncertain Stochastic Systems

  • Authors: Andrea Martin, Luca Furieri, Florian Dörfler, John Lygeros, Giancarlo Ferrari-Trecate
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.14835
  • Pdf link: https://arxiv.org/pdf/2304.14835
  • Abstract
    We consider control of uncertain linear time-varying stochastic systems from the perspective of regret minimization. Specifically, we focus on the problem of designing a feedback controller that minimizes the loss relative to a clairvoyant optimal policy that has foreknowledge of the system dynamics and the exogenous disturbances. In this competitive framework, establishing robustness guarantees proves challenging as, differently from the case where the model is known, the benchmark policy is not only inapplicable, but also impossible to compute without knowledge of the system parameters. To overcome this issue, we embrace a scenario optimization approach, and we propose minimizing regret robustly over a finite set of randomly sampled system parameters. We prove that this policy optimization problem can be efficiently solved through semidefinite programming, and that the corresponding solution retains strong probabilistic out-of-sample regret guarantees in face of the uncertain dynamics. Our method naturally extends to include satisfaction of safety constraints with high probability. We validate our theoretical results and showcase the potential of our approach by means of numerical simulations.

IMP: Iterative Matching and Pose Estimation with Adaptive Pooling

  • Authors: Fei Xue, Ignas Budvytis, Roberto Cipolla
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14837
  • Pdf link: https://arxiv.org/pdf/2304.14837
  • Abstract
    Previous methods solve feature matching and pose estimation using a two-stage process by first finding matches and then estimating the pose. As they ignore the geometric relationships between the two tasks, they focus on either improving the quality of matches or filtering potential outliers, leading to limited efficiency or accuracy. In contrast, we propose an iterative matching and pose estimation framework (IMP) leveraging the geometric connections between the two tasks: a few good matches are enough for a roughly accurate pose estimation; a roughly accurate pose can be used to guide the matching by providing geometric constraints. To this end, we implement a geometry-aware recurrent attention-based module which jointly outputs sparse matches and camera poses. Specifically, for each iteration, we first implicitly embed geometric information into the module via a pose-consistency loss, allowing it to predict geometry-aware matches progressively. Second, we introduce an efficient variant of IMP, called EIMP, to dynamically discard keypoints without potential matches, avoiding redundant updating and significantly reducing the quadratic time complexity of attention computation in transformers. Experiments on YFCC100m, Scannet, and Aachen Day-Night datasets demonstrate that the proposed method outperforms previous approaches in terms of accuracy and efficiency.

Sampling-based Path Planning Algorithms: A Survey

  • Authors: Alka Choudhary
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14839
  • Pdf link: https://arxiv.org/pdf/2304.14839
  • Abstract
    Path planning is a classic problem for autonomous robots. To ensure safe and efficient point-to-point navigation, an appropriate algorithm should be chosen keeping the robot's dimensions and its classification in mind. Autonomous robots use path-planning algorithms to safely navigate a dynamic, dense, and unknown environment. A few metrics for path planning algorithms to be taken into account are safety, efficiency, lowest-cost path generation, and obstacle avoidance. Before path planning can take place, we need a map representation, which can be a discretized or an open configuration space. A discretized configuration space provides node/connectivity information from one point to another, while in an open/free configuration space it is up to the algorithm to create a list of nodes and then find a feasible path. Both types of maps are populated with obstacle positions, using perception-based obstacle detection techniques to represent current obstacles from the perspective of the robot. For open configuration spaces, sampling-based planning algorithms are used. This paper aims to explore various types of sampling-based path-planning algorithms such as the Probabilistic RoadMap (PRM) and Rapidly-exploring Random Trees (RRT). These two algorithms also have optimized versions, PRM* and RRT*, and this paper discusses how that optimization is achieved and why it is beneficial.
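
For readers new to sampling-based planners, here is a minimal 2D RRT sketch in Python; the collision check `is_free` is a placeholder, and RRT* would additionally rewire the tree to shorten paths.

```python
import math
import random

def rrt(start, goal, is_free, bounds=(0.0, 10.0),
        step=0.5, iters=2000, goal_tol=0.5):
    """Grow a tree by steering from the nearest node toward uniformly
    sampled points, keeping only collision-free extensions."""
    nodes, parent = [start], {0: None}
    for _ in range(iters):
        sample = (random.uniform(*bounds), random.uniform(*bounds))
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
        px, py = nodes[i]
        d = math.dist((px, py), sample)
        if d == 0:
            continue
        new = (px + step * (sample[0] - px) / d,
               py + step * (sample[1] - py) / d)
        if is_free(new):
            parent[len(nodes)] = i
            nodes.append(new)
            if math.dist(new, goal) < goal_tol:
                break
    return nodes, parent

nodes, parent = rrt((0.0, 0.0), (9.0, 9.0), is_free=lambda p: True)
print(len(nodes), "nodes grown")
```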

MASK-CNN-Transformer For Real-Time Multi-Label Weather Recognition

  • Authors: Shengchao Chen, Ting Shu, Huan Zhao, Yuan Yan Tan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14857
  • Pdf link: https://arxiv.org/pdf/2304.14857
  • Abstract
    Weather recognition is an essential support for many practical life applications, including traffic safety, environment, and meteorology. However, many existing related works cannot comprehensively describe weather conditions due to their complex co-occurrence dependencies. This paper proposes a novel multi-label weather recognition model considering these dependencies. The proposed model called MASK-Convolutional Neural Network-Transformer (MASK-CT) is based on the Transformer, the convolutional process, and the MASK mechanism. The model employs multiple convolutional layers to extract features from weather images and a Transformer encoder to calculate the probability of each weather condition based on the extracted features. To improve the generalization ability of MASK-CT, a MASK mechanism is used during the training phase. The effect of the MASK mechanism is explored and discussed. The Mask mechanism randomly withholds some information from one-pair training instances (one image and its corresponding label). There are two types of MASK methods. Specifically, MASK-I is designed and deployed on the image before feeding it into the weather feature extractor and MASK-II is applied to the image label. The Transformer encoder is then utilized on the randomly masked image features and labels. The experimental results from various real-world weather recognition datasets demonstrate that the proposed MASK-CT model outperforms state-of-the-art methods. Furthermore, the high-speed dynamic real-time weather recognition capability of the MASK-CT is evaluated.

Topic-oriented Adversarial Attacks against Black-box Neural Ranking Models

  • Authors: Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Yixing Fan, Xueqi Cheng
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.14867
  • Pdf link: https://arxiv.org/pdf/2304.14867
  • Abstract
    Neural ranking models (NRMs) have attracted considerable attention in information retrieval. Unfortunately, NRMs may inherit the adversarial vulnerabilities of general neural networks, which might be leveraged by black-hat search engine optimization practitioners. Recently, adversarial attacks against NRMs have been explored in the paired attack setting, generating an adversarial perturbation to a target document for a specific query. In this paper, we focus on a more general type of perturbation and introduce the topic-oriented adversarial ranking attack task against NRMs, which aims to find an imperceptible perturbation that can promote a target document in ranking for a group of queries with the same topic. We define both static and dynamic settings for the task and focus on decision-based black-box attacks. We propose a novel framework to improve topic-oriented attack performance based on a surrogate ranking model. The attack problem is formalized as a Markov decision process (MDP) and addressed using reinforcement learning. Specifically, a topic-oriented reward function guides the policy to find a successful adversarial example that can be promoted in rankings for as many queries as possible in a group. Experimental results demonstrate that the proposed framework can significantly outperform existing attack strategies, and we conclude by reiterating that there exist potential risks in applying NRMs in the real world.

A novel reduced-order model for advection-dominated problems based on Radon-Cumulative-Distribution Transform

  • Authors: Tobias Long, Robert Barnett, Richard Jefferson-Loveday, Giovanni Stabile, Matteo Icardi
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14883
  • Pdf link: https://arxiv.org/pdf/2304.14883
  • Abstract
    Problems with dominant advection, discontinuities, travelling features, or shape variations are widespread in computational mechanics. However, classical linear model reduction and interpolation methods typically fail to reproduce even relatively small parameter variations, making the reduced models inefficient and inaccurate. In this work, a novel reduced-order modelling approach is proposed based on the Radon-Cumulative-Distribution transform (RCDT). We show that this non-linear transformation can significantly improve the dimensionality of proper orthogonal decomposition (POD) reconstructions and is capable of accurately interpolating some advection-dominated phenomena. The method is tested on various test cases in multiphase fluid dynamics.

A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks

  • Authors: Marc Finzi, Andres Potapczynski, Matthew Choptuik, Andrew Gordon Wilson
  • Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.14994
  • Pdf link: https://arxiv.org/pdf/2304.14994
  • Abstract
    Unlike conventional grid and mesh based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult or impossible. While global minimization of the PDE residual over the network parameters works well for boundary value problems, catastrophic forgetting impairs the applicability of this approach to initial value problems (IVPs). In an alternative local-in-time approach, the optimization problem can be converted into an ordinary differential equation (ODE) on the network parameters and the solution propagated forward in time; however, we demonstrate that current methods based on this approach suffer from two key issues. First, following the ODE produces an uncontrolled growth in the conditioning of the problem, ultimately leading to unacceptably large numerical errors. Second, as the ODE methods scale cubically with the number of model parameters, they are restricted to small neural networks, significantly limiting their ability to represent intricate PDE initial conditions and solutions. Building on these insights, we develop Neural IVP, an ODE based IVP solver which prevents the network from getting ill-conditioned and runs in time linear in the number of parameters, enabling us to evolve the dynamics of challenging PDEs with neural networks.
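The local-in-time idea — turning residual minimization into an ODE on the parameters — can be seen in a toy form below. A linear-in-parameters RBF "network" evolves the heat equation u_t = u_xx by repeatedly least-squares-solving for the parameter velocities. This is only a sketch of the parameter-ODE viewpoint and includes none of Neural IVP's conditioning or scalability fixes; all sizes and basis choices are illustrative.

```python
import numpy as np

# Toy "network": u(x; theta) = sum_k theta_k * exp(-(x - c_k)^2 / (2 s^2)).
# At each step we solve the least-squares system  J @ theta_dot ≈ u_xx,
# where J[i, k] = d u(x_i) / d theta_k (here simply the k-th basis function).
c = np.linspace(0, 1, 20); s = 0.08
x = np.linspace(0, 1, 200)

def basis(x):            # J: each basis function evaluated at x
    return np.exp(-(x[:, None] - c[None, :])**2 / (2 * s**2))

def basis_xx(x):         # second spatial derivative of each basis function
    d = x[:, None] - c[None, :]
    return basis(x) * (d**2 / s**4 - 1 / s**2)

# fit the initial condition u(x, 0) = sin(pi x)
theta = np.linalg.lstsq(basis(x), np.sin(np.pi * x), rcond=None)[0]
dt = 1e-4
for _ in range(100):     # explicit Euler on the parameter ODE
    u_xx = basis_xx(x) @ theta
    theta_dot = np.linalg.lstsq(basis(x), u_xx, rcond=None)[0]
    theta += dt * theta_dot
```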

Maximizing Reachability Probabilities in Rectangular Automata with Random Clocks

  • Authors: Joanna Delicaris, Stefan Schupp, Erika Ábrahám, Anne Remke
  • Subjects: Formal Languages and Automata Theory (cs.FL)
  • Arxiv link: https://arxiv.org/abs/2304.14996
  • Pdf link: https://arxiv.org/pdf/2304.14996
  • Abstract
    This paper proposes an algorithm to maximize reachability probabilities for rectangular automata with random clocks via a history-dependent prophetic scheduler. This model class incorporates time-induced nondeterminism on discrete behavior and nondeterminism in the dynamic behavior. After computing reachable state sets via a forward flowpipe construction, we use backward refinement to compute maximum reachability probabilities. The feasibility of the presented approach is illustrated on a scalable model.

New submissions for Thu, 23 Mar 23

Keyword: pruning

GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering

  • Authors: Dhaval Taunk, Lakshya Khanna, Pavan Kandru, Vasudeva Varma, Charu Sharma, Makarand Tapaswi
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2303.12320
  • Pdf link: https://arxiv.org/pdf/2303.12320
  • Abstract
    Commonsense question-answering (QA) methods combine the power of pre-trained Language Models (LM) with the reasoning provided by Knowledge Graphs (KG). A typical approach collects nodes relevant to the QA pair from a KG to form a Working Graph (WG), followed by reasoning using Graph Neural Networks (GNNs). This approach faces two major challenges: (i) it is difficult to capture all the information from the QA in the WG, and (ii) the WG contains some irrelevant nodes from the KG. To address these, we propose GrapeQA with two simple improvements on the WG: (i) Prominent Entities for Graph Augmentation identifies relevant text chunks from the QA pair and augments the WG with corresponding latent representations from the LM, and (ii) Context-Aware Node Pruning removes nodes that are less relevant to the QA pair. We evaluate our results on OpenBookQA, CommonsenseQA and MedQA-USMLE and see that GrapeQA shows consistent improvements over its LM + KG predecessor (QA-GNN in particular) and large improvements on OpenBookQA.
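A minimal sketch of the node-pruning idea: score Working Graph nodes against a QA-pair embedding and keep the most relevant ones. The cosine scoring and keep ratio are illustrative assumptions; GrapeQA's actual pruning criterion may differ.

```python
import numpy as np

def prune_working_graph(node_emb, qa_emb, keep_ratio=0.7):
    """Context-aware node pruning (sketch): rank each WG node by cosine
    similarity to the QA-pair embedding and keep the top fraction."""
    node_emb = node_emb / np.linalg.norm(node_emb, axis=1, keepdims=True)
    qa_emb = qa_emb / np.linalg.norm(qa_emb)
    scores = node_emb @ qa_emb
    k = max(1, int(keep_ratio * len(scores)))
    return np.argsort(-scores)[:k]   # indices of nodes to keep

keep = prune_working_graph(np.random.randn(50, 128), np.random.randn(128))
```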

Edge Deep Learning Model Protection via Neuron Authorization

  • Authors: Jinyin Chen, Tao Liu, Rongchang Li, Yao Cheng, Xuhong Zhang, Shouling Ji, Haibin Zheng
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.12397
  • Pdf link: https://arxiv.org/pdf/2303.12397
  • Abstract
    With the development of deep learning processors and accelerators, deep learning models have been widely deployed on edge devices as part of the Internet of Things. Edge device models are generally considered valuable intellectual property that deserves careful protection. Unfortunately, these models are at great risk of being stolen or illegally copied. Existing model protections using encryption algorithms suffer from high computation overhead, which is impractical due to the limited computing capacity on edge devices. In this work, we propose a lightweight, practical, and general Edge device model Protection method at the neuron level, denoted as EdgePro. Specifically, we select several neurons as authorization neurons, set their activation values to locking values, and scale the neuron outputs as the "passwords" during training. EdgePro protects the model by ensuring it can only work correctly when the "passwords" are met, at the cost of encrypting and storing the information of the "passwords" instead of the whole model. Extensive experimental results indicate that EdgePro works well on protection tasks across datasets with different modes. The inference time increase of EdgePro is only 60% of state-of-the-art methods, and the accuracy loss is less than 1%. Additionally, EdgePro is robust against adaptive attacks including fine-tuning and pruning, which makes it more practical in real-world applications. EdgePro is also open sourced to facilitate future research: https://github.com/Leon022/Edg
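A toy PyTorch sketch of the neuron-authorization idea: a few designated neurons only carry the locking values the network was trained with when the stored "password" is supplied, so unauthorized inference degrades. The indices, locking values, and comparison logic are illustrative assumptions, not EdgePro's actual mechanism.

```python
import torch
import torch.nn as nn

class AuthorizedReLU(nn.Module):
    """Sketch: authorization neurons emit their trained locking values only
    when the correct password is set; otherwise they are zeroed out."""
    def __init__(self, auth_idx, locking_values):
        super().__init__()
        self.relu = nn.ReLU()
        self.auth_idx = auth_idx                   # which neurons are locked
        self.register_buffer("lock", locking_values)
        self.password = None                       # set only by authorized users

    def forward(self, x):
        x = self.relu(x)
        if self.password is not None and torch.equal(self.password, self.lock):
            x[:, self.auth_idx] = self.lock        # trained activations restored
        else:
            x[:, self.auth_idx] = 0.0              # wrong/no password: degraded
        return x

layer = AuthorizedReLU(auth_idx=[3, 17, 42],
                       locking_values=torch.tensor([2.5, 1.0, 3.0]))
out = layer(torch.randn(8, 64))                    # unauthorized inference
layer.password = torch.tensor([2.5, 1.0, 3.0])     # authorized inference
out = layer(torch.randn(8, 64))
```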

Keyword: neural\ architecture\ search

There is no result

Keyword: 3d object detection

OcTr: Octree-based Transformer for 3D Object Detection

  • Authors: Chao Zhou, Yanan Zhang, Jiaxin Chen, Di Huang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.12621
  • Pdf link: https://arxiv.org/pdf/2303.12621
  • Abstract
    A key challenge for LiDAR-based 3D object detection is to capture sufficient features from large-scale 3D scenes, especially for distant and/or occluded objects. Despite recent efforts made by Transformers with long-sequence modeling capability, they fail to properly balance accuracy and efficiency, suffering from inadequate receptive fields or coarse-grained holistic correlations. In this paper, we propose an Octree-based Transformer, named OcTr, to address this issue. It first constructs a dynamic octree on the hierarchical feature pyramid by conducting self-attention on the top level and then recursively propagates to the level below, restricted by the octants, which captures rich global context in a coarse-to-fine manner while keeping the computational complexity under control. Furthermore, for enhanced foreground perception, we propose a hybrid positional embedding, composed of a semantic-aware positional embedding and an attention mask, to fully exploit semantic and geometry clues. Extensive experiments are conducted on the Waymo Open Dataset and the KITTI Dataset, and OcTr reaches new state-of-the-art results.

DR.CPO: Diversified and Realistic 3D Augmentation via Iterative Construction, Random Placement, and HPR Occlusion

  • Authors: Jungwook Shin, Jaeill Kim, Kyungeun Lee, Hyunghun Cho, Wonjong Rhee
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.12743
  • Pdf link: https://arxiv.org/pdf/2303.12743
  • Abstract
    In autonomous driving, data augmentation is commonly used for improving 3D object detection. The most basic methods include insertion of copied objects and rotation and scaling of the entire training frame. Numerous variants have been developed as well. The existing methods, however, are considerably limited when compared to the variety of real-world possibilities. In this work, we develop a diversified and realistic augmentation method that can flexibly construct a whole-body object, freely locate and rotate the object, and apply self-occlusion and external occlusion accordingly. To improve the diversity of the whole-body object construction, we develop an iterative method that stochastically combines multiple objects observed from the real world into a single object. Unlike the existing augmentation methods, the constructed objects can be randomly located and rotated in the training frame because proper occlusions can be reflected to the whole-body objects in the final step. Finally, proper self-occlusion at each local object level and external occlusion at the global frame level are applied using the Hidden Point Removal (HPR) algorithm, which is computationally efficient. HPR is also used for adaptively controlling the point density of each object according to the object's distance from the LiDAR. Experimental results show that the proposed DR.CPO algorithm is data-efficient and model-agnostic without incurring any computational overhead. Also, DR.CPO can improve mAP performance by 2.08% when compared to the best 3D detection result known for the KITTI dataset. The code is available at https://github.com/SNU-DRL/DRCPO.git
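The HPR operator at the core of the occlusion step is compact enough to sketch. Below is the classical spherical-flipping construction (Katz et al., "Direct Visibility of Point Sets") in Python; the flipping-radius factor `gamma` is an illustrative choice, and the paper's full augmentation pipeline is not reproduced.

```python
import numpy as np
from scipy.spatial import ConvexHull

def hidden_point_removal(points, viewpoint, gamma=100.0):
    """HPR sketch: spherically flip points about the viewpoint, then keep
    the points that land on the convex hull of the flipped set plus the
    viewpoint -- those are the ones visible from the viewpoint."""
    p = points - viewpoint
    norm = np.linalg.norm(p, axis=1, keepdims=True)
    radius = gamma * norm.max()                    # flipping-sphere radius
    flipped = p + 2.0 * (radius - norm) * (p / norm)
    hull = ConvexHull(np.vstack([flipped, np.zeros(3)]))
    visible = hull.vertices[hull.vertices < len(points)]  # drop the viewpoint
    return points[visible]

pts = np.random.rand(2000, 3) * 10
vis = hidden_point_removal(pts, viewpoint=np.zeros(3))
```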

EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

  • Authors: Hansheng Chen, Wei Tian, Pichao Wang, Fan Wang, Lu Xiong, Hao Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.12787
  • Pdf link: https://arxiv.org/pdf/2303.12787
  • Abstract
    Locating 3D objects from a single RGB image via Perspective-n-Point (PnP) is a long-standing problem in computer vision. Driven by end-to-end deep learning, recent studies suggest interpreting PnP as a differentiable layer, allowing for partial learning of 2D-3D point correspondences by backpropagating the gradients of pose loss. Yet, learning the entire correspondences from scratch is highly challenging, particularly for ambiguous pose solutions, where the globally optimal pose is theoretically non-differentiable w.r.t. the points. In this paper, we propose EPro-PnP, a probabilistic PnP layer for general end-to-end pose estimation, which outputs a distribution of pose with differentiable probability density on the SE(3) manifold. The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distribution. The underlying principle generalizes previous approaches, and resembles the attention mechanism. EPro-PnP can enhance existing correspondence networks, closing the gap between PnP-based methods and the task-specific leaders on the LineMOD 6DoF pose estimation benchmark. Furthermore, EPro-PnP helps to explore new possibilities of network design, as we demonstrate a novel deformable correspondence network with the state-of-the-art pose accuracy on the nuScenes 3D object detection benchmark. Our code is available at https://github.com/tjiiv-cprg/EPro-PnP-v2.

Keyword: voxel

LiDARFormer: A Unified Transformer-based Multi-task Network for LiDAR Perception

  • Authors: Zixiang Zhou, Dongqiangzi Ye, Weijia Chen, Yufei Xie, Yu Wang, Panqu Wang, Hassan Foroosh
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.12194
  • Pdf link: https://arxiv.org/pdf/2303.12194
  • Abstract
    There is a recent trend in the LiDAR perception field towards unifying multiple tasks in a single strong network with improved performance, as opposed to using separate networks for each task. In this paper, we introduce a new LiDAR multi-task learning paradigm based on the transformer. The proposed LiDARFormer utilizes cross-space global contextual feature information and exploits cross-task synergy to boost the performance of LiDAR perception tasks across multiple large-scale datasets and benchmarks. Our novel transformer-based framework includes a cross-space transformer module that learns attentive features between the 2D dense Bird's Eye View (BEV) and 3D sparse voxel feature maps. Additionally, we propose a transformer decoder for the segmentation task to dynamically adjust the learned features by leveraging the categorical feature representations. Furthermore, we combine the segmentation and detection features in a shared transformer decoder with cross-task attention layers to enhance and integrate the object-level and class-level features. LiDARFormer is evaluated on the large-scale nuScenes and the Waymo Open datasets for both 3D detection and semantic segmentation tasks, and it outperforms all previously published methods on both tasks. Notably, LiDARFormer achieves the state-of-the-art performance of 76.4% L2 mAPH and 74.3% NDS on the challenging Waymo and nuScenes detection benchmarks for a single model LiDAR-only method.

Uni-Fusion: Universal Continuous Mapping

  • Authors: Yijun Yuan, Andreas Nuechter
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.12678
  • Pdf link: https://arxiv.org/pdf/2303.12678
  • Abstract
    We introduce Uni-Fusion, a universal continuous mapping framework for surfaces, surface properties (color, infrared, etc.) and more (latent features in CLIP embedding space, etc.). We propose the first Universal Implicit Encoding model that supports encoding of both geometry and various types of properties (RGB, infrared, features, etc.) without the need for any training. Based on that, our framework divides the point cloud into regular grid voxels and produces a latent feature in each voxel to form a Latent Implicit Map (LIM) for geometries and arbitrary properties. Then, by fusing the Local LIM of a new frame into the Global LIM, an incremental reconstruction is achieved. Encoded with the corresponding types of data, our Latent Implicit Map is capable of generating continuous surfaces, surface property fields, surface feature fields, and any other possible options. To demonstrate the capabilities of our model, we implement three applications: (1) incremental reconstruction for surfaces and color, (2) 2D-to-3D fabricated property transfer, and (3) open-vocabulary scene understanding by producing a text CLIP feature field on surfaces. We evaluate Uni-Fusion by comparison in the corresponding applications, in which Uni-Fusion shows high flexibility across applications while performing best or competitively. The project page of Uni-Fusion is available at https://jarrome.github.io/Uni-Fusion/

Optimizing CAD Models with Latent Space Manipulation

  • Authors: Jannes Elstner, Raoul G. C. Schönhof, Steffen Tauber, Marco F Huber
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.12739
  • Pdf link: https://arxiv.org/pdf/2303.12739
  • Abstract
    When it comes to the optimization of CAD models in the automation domain, neural networks currently play only a minor role. Optimizing abstract features such as automation capability is challenging, since they can be very difficult to simulate, are too complex for rule-based systems, and also have little to no data available for machine-learning methods. On the other hand, image manipulation methods that can manipulate abstract features in images, such as StyleCLIP, have seen much success. They rely on the latent space of pretrained generative adversarial networks, and could therefore also make use of the vast amount of unlabeled CAD data. In this paper, we show that such an approach is also suitable for optimizing abstract automation-related features of CAD parts. We achieved this by extending StyleCLIP to work with CAD models in the form of voxel models, which includes using a 3D StyleGAN and a custom classifier. Finally, we demonstrate the ability of our system for the optimization of automation-related features by optimizing the grabability of various CAD models. This is an open access article under the CC BY-NC-ND license (this http URL) Peer review under the responsibility of the scientific committee of the 33rd CIRP Design Conference.

Keyword: lidar

LiDARFormer: A Unified Transformer-based Multi-task Network for LiDAR Perception

  • Authors: Zixiang Zhou, Dongqiangzi Ye, Weijia Chen, Yufei Xie, Yu Wang, Panqu Wang, Hassan Foroosh
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.12194
  • Pdf link: https://arxiv.org/pdf/2303.12194
  • Abstract
    There is a recent trend in the LiDAR perception field towards unifying multiple tasks in a single strong network with improved performance, as opposed to using separate networks for each task. In this paper, we introduce a new LiDAR multi-task learning paradigm based on the transformer. The proposed LiDARFormer utilizes cross-space global contextual feature information and exploits cross-task synergy to boost the performance of LiDAR perception tasks across multiple large-scale datasets and benchmarks. Our novel transformer-based framework includes a cross-space transformer module that learns attentive features between the 2D dense Bird's Eye View (BEV) and 3D sparse voxel feature maps. Additionally, we propose a transformer decoder for the segmentation task to dynamically adjust the learned features by leveraging the categorical feature representations. Furthermore, we combine the segmentation and detection features in a shared transformer decoder with cross-task attention layers to enhance and integrate the object-level and class-level features. LiDARFormer is evaluated on the large-scale nuScenes and the Waymo Open datasets for both 3D detection and semantic segmentation tasks, and it outperforms all previously published methods on both tasks. Notably, LiDARFormer achieves the state-of-the-art performance of 76.4% L2 mAPH and 74.3% NDS on the challenging Waymo and nuScenes detection benchmarks for a single model LiDAR-only method.

RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration

  • Authors: Jiuming Liu, Guangming Wang, Zhe Liu, Chaokang Jiang, Marc Pollefeys, Hesheng Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.12384
  • Pdf link: https://arxiv.org/pdf/2303.12384
  • Abstract
    Although point cloud registration has achieved remarkable advances in object-level and indoor scenes, large-scale registration methods are rarely explored. Challenges mainly arise from the huge point number, complex distribution, and outliers of outdoor LiDAR scans. In addition, most existing registration works generally adopt a two-stage paradigm: They first find correspondences by extracting discriminative local features, and then leverage estimators (e.g., RANSAC) to filter outliers, which are highly dependent on well-designed descriptors and post-processing choices. To address these problems, we propose an end-to-end transformer network (RegFormer) for large-scale point cloud alignment without any further post-processing. Specifically, a projection-aware hierarchical transformer is proposed to capture long-range dependencies and filter outliers by extracting point features globally. Our transformer has linear complexity, which guarantees high efficiency even for large-scale scenes. Furthermore, to effectively reduce mismatches, a bijective association transformer is designed for regressing the initial transformation. Extensive experiments on KITTI and NuScenes datasets demonstrate that our RegFormer achieves state-of-the-art performance in terms of both accuracy and efficiency.

An Effective Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds

  • Authors: Chaoda Zheng, Xu Yan, Haiming Zhang, Baoyuan Wang, Shenghui Cheng, Shuguang Cui, Zhen Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.12535
  • Pdf link: https://arxiv.org/pdf/2303.12535
  • Abstract
    3D single object tracking in LiDAR point clouds (LiDAR SOT) plays a crucial role in autonomous driving. Current approaches all follow the Siamese paradigm based on appearance matching. However, LiDAR point clouds are usually textureless and incomplete, which hinders effective appearance matching. Besides, previous methods greatly overlook the critical motion clues among targets. In this work, beyond 3D Siamese tracking, we introduce a motion-centric paradigm to handle LiDAR SOT from a new perspective. Following this paradigm, we propose a matching-free two-stage tracker M^2-Track. At the 1st stage, M^2-Track localizes the target within successive frames via motion transformation. Then it refines the target box through motion-assisted shape completion at the 2nd stage. Due to the motion-centric nature, our method shows its impressive generalizability with limited training labels and provides good differentiability for end-to-end cycle training. This inspires us to explore semi-supervised LiDAR SOT by incorporating a pseudo-label-based motion augmentation and a self-supervised loss term. Under the fully-supervised setting, extensive experiments confirm that M^2-Track significantly outperforms previous state-of-the-art methods on three large-scale datasets while running at 57 FPS (~8%, ~17% and ~22% precision gains on KITTI, NuScenes, and Waymo Open Dataset respectively). While under the semi-supervised setting, our method performs on par with or even surpasses its fully-supervised counterpart using fewer than half the labels from KITTI. Further analysis verifies each component's effectiveness and shows the motion-centric paradigm's promising potential for auto-labeling and unsupervised domain adaptation.

OcTr: Octree-based Transformer for 3D Object Detection

  • Authors: Chao Zhou, Yanan Zhang, Jiaxin Chen, Di Huang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.12621
  • Pdf link: https://arxiv.org/pdf/2303.12621
  • Abstract
    A key challenge for LiDAR-based 3D object detection is to capture sufficient features from large-scale 3D scenes, especially for distant and/or occluded objects. Despite recent efforts made by Transformers with long-sequence modeling capability, they fail to properly balance accuracy and efficiency, suffering from inadequate receptive fields or coarse-grained holistic correlations. In this paper, we propose an Octree-based Transformer, named OcTr, to address this issue. It first constructs a dynamic octree on the hierarchical feature pyramid by conducting self-attention on the top level and then recursively propagates to the level below, restricted by the octants, which captures rich global context in a coarse-to-fine manner while keeping the computational complexity under control. Furthermore, for enhanced foreground perception, we propose a hybrid positional embedding, composed of a semantic-aware positional embedding and an attention mask, to fully exploit semantic and geometry clues. Extensive experiments are conducted on the Waymo Open Dataset and the KITTI Dataset, and OcTr reaches new state-of-the-art results.

DR.CPO: Diversified and Realistic 3D Augmentation via Iterative Construction, Random Placement, and HPR Occlusion

  • Authors: Jungwook Shin, Jaeill Kim, Kyungeun Lee, Hyunghun Cho, Wonjong Rhee
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.12743
  • Pdf link: https://arxiv.org/pdf/2303.12743
  • Abstract
    In autonomous driving, data augmentation is commonly used for improving 3D object detection. The most basic methods include insertion of copied objects and rotation and scaling of the entire training frame. Numerous variants have been developed as well. The existing methods, however, are considerably limited when compared to the variety of real-world possibilities. In this work, we develop a diversified and realistic augmentation method that can flexibly construct a whole-body object, freely locate and rotate the object, and apply self-occlusion and external occlusion accordingly. To improve the diversity of the whole-body object construction, we develop an iterative method that stochastically combines multiple objects observed from the real world into a single object. Unlike the existing augmentation methods, the constructed objects can be randomly located and rotated in the training frame because proper occlusions can be reflected to the whole-body objects in the final step. Finally, proper self-occlusion at each local object level and external occlusion at the global frame level are applied using the Hidden Point Removal (HPR) algorithm, which is computationally efficient. HPR is also used for adaptively controlling the point density of each object according to the object's distance from the LiDAR. Experimental results show that the proposed DR.CPO algorithm is data-efficient and model-agnostic without incurring any computational overhead. Also, DR.CPO can improve mAP performance by 2.08% when compared to the best 3D detection result known for the KITTI dataset. The code is available at https://github.com/SNU-DRL/DRCPO.git

Spherical Transformer for LiDAR-based 3D Recognition

  • Authors: Xin Lai, Yukang Chen, Fanbin Lu, Jianhui Liu, Jiaya Jia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.12766
  • Pdf link: https://arxiv.org/pdf/2303.12766
  • Abstract
    LiDAR-based 3D point cloud recognition has benefited various applications. Without specially considering the LiDAR point distribution, most current methods suffer from information disconnection and limited receptive field, especially for the sparse distant points. In this work, we study the varying-sparsity distribution of LiDAR points and present SphereFormer to directly aggregate information from dense close points to the sparse distant ones. We design radial window self-attention that partitions the space into multiple non-overlapping narrow and long windows. It overcomes the disconnection issue and enlarges the receptive field smoothly and dramatically, which significantly boosts the performance of sparse distant points. Moreover, to fit the narrow and long windows, we propose exponential splitting to yield fine-grained position encoding and dynamic feature selection to increase model representation ability. Notably, our method ranks 1st on both nuScenes and SemanticKITTI semantic segmentation benchmarks with 81.9% and 74.8% mIoU, respectively. Also, we achieve the 3rd place on nuScenes object detection benchmark with 72.8% NDS and 68.5% mAP. Code is available at https://github.com/dvlab-research/SphereFormer.git.
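The radial-window idea is easy to sketch: partition points by angle only, so each window stretches from dense near points out to sparse distant ones. The bin counts below are illustrative assumptions; the paper's exponential splitting and the attention computation itself are omitted.

```python
import numpy as np

def radial_windows(points, n_azimuth=36, n_polar=12):
    """Sketch of radial window partitioning: bucket points by azimuth and
    polar angle only (not range), so each window is a narrow, long wedge
    connecting dense close points with sparse distant ones."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1).clip(min=1e-9)
    azimuth = np.arctan2(y, x)                           # in [-pi, pi)
    polar = np.arccos(np.clip(z / r, -1.0, 1.0))         # in [0, pi]
    a = ((azimuth + np.pi) / (2 * np.pi) * n_azimuth).astype(int) % n_azimuth
    p = (polar / np.pi * n_polar).astype(int).clip(max=n_polar - 1)
    return a * n_polar + p                               # window id per point

ids = radial_windows(np.random.randn(1000, 3))
# self-attention would then run independently within each window id
```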

New submissions for Fri, 7 Apr 23

Keyword: efficient

Adopting Two Supervisors for Efficient Use of Large-Scale Remote Deep Neural Networks

  • Authors: Michael Weiss, Paolo Tonella
  • Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.02654
  • Pdf link: https://arxiv.org/pdf/2304.02654
  • Abstract
    Recent decades have seen the rise of large-scale Deep Neural Networks (DNNs) to achieve human-competitive performance in a variety of artificial intelligence tasks. Often consisting of hundreds of millions, if not hundreds of billions, of parameters, these DNNs are too large to be deployed to, or efficiently run on, resource-constrained devices such as mobile phones or IoT microcontrollers. Systems relying on large-scale DNNs thus have to call the corresponding model over the network, leading to substantial costs for hosting and running the large-scale remote model, costs which are often charged on a per-use basis. In this paper, we propose BiSupervised, a novel architecture, where, before relying on a large remote DNN, a system attempts to make a prediction on a small-scale local model. A DNN supervisor monitors said prediction process and identifies easy inputs for which the local prediction can be trusted. For these inputs, the remote model does not have to be invoked, thus saving costs, while only marginally impacting the overall system accuracy. Our architecture furthermore foresees a second supervisor to monitor the remote predictions and identify inputs for which not even these can be trusted, allowing to raise an exception or run a fallback strategy instead. We evaluate the cost savings, and the ability to detect incorrectly predicted inputs, on four diverse case studies: IMDB movie review sentiment classification, Github issue triaging, Imagenet image classification, and SQuADv2 free-text question answering.
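A minimal sketch of the BiSupervised decision flow; the supervisor scores and thresholds are illustrative placeholders (e.g., a softmax-confidence supervisor), not the paper's concrete supervisors.

```python
def bisupervised_predict(x, local_model, supervisor1, remote_model, supervisor2,
                         t1=0.9, t2=0.5):
    """Sketch: trust the cheap local model when its supervisor score is high;
    otherwise pay for the remote model; raise if even the remote prediction
    looks untrustworthy, so a fallback strategy can run instead."""
    local_pred = local_model(x)
    if supervisor1(x, local_pred) >= t1:       # easy input: free local answer
        return local_pred
    remote_pred = remote_model(x)              # hard input: costly remote call
    if supervisor2(x, remote_pred) >= t2:
        return remote_pred
    raise RuntimeError("prediction not trustworthy; run fallback strategy")

# toy usage with stand-in models and constant supervisor scores
pred = bisupervised_predict(
    "some input",
    local_model=lambda x: "positive",
    supervisor1=lambda x, p: 0.95,             # e.g. softmax confidence
    remote_model=lambda x: "positive",
    supervisor2=lambda x, p: 0.80)
```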

nD-PDPA: nDimensional Probability Density Profile Analysis

  • Authors: Arjang Fahim, Stephanie Irausquin, Homayoun Valafar
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
  • Arxiv link: https://arxiv.org/abs/2304.02682
  • Pdf link: https://arxiv.org/pdf/2304.02682
  • Abstract
    Despite the recent advances in various Structural Genomics Projects, a large gap remains between the number of sequenced and structurally characterized proteins. Some reasons for this discrepancy include technical difficulties, labor, and the cost related to determining a structure by experimental methods such as NMR spectroscopy. Several computational methods have been developed to expand the applicability of NMR spectroscopy by addressing temporal and economic problems more efficiently. While these methods demonstrate successful outcomes in solving more challenging and structurally novel proteins, the cost has not been reduced significantly. Probability Density Profile Analysis (PDPA) has been previously introduced by our lab to directly address the economics of structure determination of routine proteins and the identification of novel structures from a minimal set of unassigned NMR data. 2D-PDPA (in which 2D denotes the incorporation of data from two alignment media) has been successful in identifying the structural homolog of an unknown protein within a library of ~1000 decoy structures. In order to further expand the selectivity and sensitivity of PDPA, the incorporation of additional data was necessary. However, the expansion of the original PDPA approach was limited by its computational requirements, where the inclusion of additional data would render it computationally intractable. Here we present the most recent development of the PDPA method (nD-PDPA: n-Dimensional Probability Density Profile Analysis) that eliminates 2D-PDPA's computational limitations, and allows inclusion of RDC data from multiple vector types in multiple alignment media.

A Certified Radius-Guided Attack Framework to Image Segmentation Models

  • Authors: Wenjie Qu, Youqi Li, Binghui Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02693
  • Pdf link: https://arxiv.org/pdf/2304.02693
  • Abstract
    Image segmentation is an important problem in many safety-critical applications. Recent studies show that modern image segmentation models are vulnerable to adversarial perturbations, while existing attack methods mainly follow the idea of attacking image classification models. We argue that image segmentation and classification have inherent differences, and design an attack framework specially for image segmentation models. Our attack framework is inspired by certified radius, which was originally used by defenders to defend against adversarial perturbations to classification models. We are the first, from the attacker's perspective, to leverage the properties of the certified radius and propose a certified radius-guided attack framework against image segmentation models. Specifically, we first adapt randomized smoothing, the state-of-the-art certification method for classification models, to derive the pixel's certified radius. We then focus more on disrupting pixels with relatively smaller certified radii and design a pixel-wise certified radius-guided loss which, when plugged into any existing white-box attack, yields our certified radius-guided white-box attack. Next, we propose the first black-box attack on image segmentation models via bandits. We design a novel gradient estimator, based on bandit feedback, which is query-efficient and provably unbiased and stable. We use this gradient estimator to design a projected bandit gradient descent (PBGD) attack, as well as a certified radius-guided PBGD (CR-PBGD) attack. We prove our PBGD and CR-PBGD attacks can achieve asymptotically optimal attack performance with an optimal rate. We evaluate our certified radius-guided white-box and black-box attacks on multiple modern image segmentation models and datasets. Our results validate the effectiveness of our certified radius-guided attack framework.
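One way to read the certified-radius guidance is as a per-pixel weighting of an attack loss: pixels with small certified radii are cheap to flip, so they receive more of the perturbation budget. The exponential weighting below is an illustrative assumption, not necessarily the paper's exact formulation.

```python
import torch

def cr_guided_loss(logits, target, certified_radius, tau=1.0):
    """Sketch of a certified-radius-guided segmentation attack loss:
    down-weight pixels with large certified radii (hard to flip) and
    concentrate on pixels with small radii."""
    per_pixel = torch.nn.functional.cross_entropy(logits, target,
                                                  reduction="none")
    weights = torch.exp(-certified_radius / tau)   # small radius -> big weight
    return (weights * per_pixel).sum() / weights.sum()

logits = torch.randn(1, 21, 64, 64, requires_grad=True)  # 21-class seg model
target = torch.randint(0, 21, (1, 64, 64))
radius = torch.rand(1, 64, 64)          # per-pixel radii from rand. smoothing
loss = cr_guided_loss(logits, target, radius)
loss.backward()                         # its gradient drives the attack step
```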

Recovering Continuous Scene Dynamics from A Single Blurry Image with Events

  • Authors: Zhangyi Cheng, Xiang Zhang, Lei Yu, Jianzhuang Liu, Wen Yang, Gui-Song Xia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02695
  • Pdf link: https://arxiv.org/pdf/2304.02695
  • Abstract
    This paper aims at demystifying a single motion-blurred image with events and revealing temporally continuous scene dynamics encrypted behind motion blurs. To achieve this end, an Implicit Video Function (IVF) is learned to represent a single motion blurred image with concurrent events, enabling the latent sharp image restoration of arbitrary timestamps in the range of imaging exposures. Specifically, a dual attention transformer is proposed to efficiently leverage merits from both modalities, i.e., the high temporal resolution of event features and the smoothness of image features, alleviating temporal ambiguities while suppressing the event noise. The proposed network is trained only with the supervision of ground-truth images of limited referenced timestamps. Motion- and texture-guided supervisions are employed simultaneously to enhance restorations of the non-referenced timestamps and improve the overall sharpness. Experiments on synthetic, semi-synthetic, and real-world datasets demonstrate that our proposed method outperforms state-of-the-art methods by a large margin in terms of both objective PSNR and SSIM measurements and subjective evaluations.

Agnostic proper learning of monotone functions: beyond the black-box correction barrier

  • Authors: Jane Lange, Arsen Vasilyan
  • Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02700
  • Pdf link: https://arxiv.org/pdf/2304.02700
  • Abstract
    We give the first agnostic, efficient, proper learning algorithm for monotone Boolean functions. Given $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$ uniformly random examples of an unknown function $f:\{\pm 1\}^n \rightarrow \{\pm 1\}$, our algorithm outputs a hypothesis $g:\{\pm 1\}^n \rightarrow \{\pm 1\}$ that is monotone and $(\mathrm{opt} + \varepsilon)$-close to $f$, where $\mathrm{opt}$ is the distance from $f$ to the closest monotone function. The running time of the algorithm (and consequently the size and evaluation time of the hypothesis) is also $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$, nearly matching the lower bound of Blais et al. (RANDOM '15). We also give an algorithm for estimating up to additive error $\varepsilon$ the distance of an unknown function $f$ to monotone using a run-time of $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$. Previously, for both of these problems, sample-efficient algorithms were known, but these algorithms were not run-time efficient. Our work thus closes this gap in our knowledge between the run-time and sample complexity. This work builds upon the improper learning algorithm of Bshouty and Tamon (JACM '96) and the proper semiagnostic learning algorithm of Lange, Rubinfeld, and Vasilyan (FOCS '22), which obtains a non-monotone Boolean-valued hypothesis, then "corrects" it to monotone using query-efficient local computation algorithms on graphs. This black-box correction approach can achieve no error better than $2\mathrm{opt} + \varepsilon$ information-theoretically; we bypass this barrier by (a) augmenting the improper learner with a convex optimization step, and (b) learning and correcting a real-valued function before rounding its values to Boolean. Our real-valued correction algorithm solves the "poset sorting" problem of [LRV22] for functions over general posets with non-Boolean labels.

A Unified Taxonomy for Automated Vehicles: Individual, Cooperative, Collaborative, On-Road, and Off-Road

  • Authors: Fredrik Warg, Anders Thorsén, Victoria Vu, Carl Bergenhem
  • Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02705
  • Pdf link: https://arxiv.org/pdf/2304.02705
  • Abstract
    Various types of vehicle automation are increasingly used in a variety of environments, including road vehicles such as cars or automated shuttles, confined areas such as mines or harbours, and agriculture and forestry. In many use cases, the benefits are greater if several automated vehicles (AVs) cooperate to aid each other in reaching their goals more efficiently, or collaborate to complete a common task. Taxonomies and definitions create a common framework that helps researchers and practitioners advance the field. However, most existing work focuses on road vehicles. In this paper, we review and extend taxonomies and definitions to encompass individually acting as well as cooperative and collaborative AVs for both on-road and off-road use cases. In particular, we introduce classes of collaborative vehicles not defined in the existing literature, and define levels of automation suitable for vehicles where automation applies to functions in addition to the driving task.

Efficient OCR for Building a Diverse Digital History

  • Authors: Jacob Carlson, Tom Bryan, Melissa Dell
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Digital Libraries (cs.DL); General Economics (econ.GN)
  • Arxiv link: https://arxiv.org/abs/2304.02737
  • Pdf link: https://arxiv.org/pdf/2304.02737
  • Abstract
    Thousands of users consult digital archives daily, but the information they can access is unrepresentative of the diversity of documentary history. The sequence-to-sequence architecture typically used for optical character recognition (OCR) - which jointly learns a vision and language model - is poorly extensible to low-resource document collections, as learning a language-vision model requires extensive labeled sequences and compute. This study models OCR as a character-level image retrieval problem, using a contrastively trained vision encoder. Because the model only learns characters' visual features, it is more sample-efficient and extensible than existing architectures, enabling accurate OCR in settings where existing solutions fail. Crucially, the model opens new avenues for community engagement in making digital history more representative of documentary history.
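A sketch of the OCR-as-retrieval idea, with placeholder embeddings standing in for the contrastively trained vision encoder's outputs; the alphabet and nearest-neighbor scoring are illustrative assumptions.

```python
import numpy as np

def recognize(crop_embs, glyph_embs, glyph_chars):
    """Sketch of OCR as character-level image retrieval: embed each character
    crop with a (contrastively trained) vision encoder, then return the
    nearest reference glyph by cosine similarity."""
    c = crop_embs / np.linalg.norm(crop_embs, axis=1, keepdims=True)
    g = glyph_embs / np.linalg.norm(glyph_embs, axis=1, keepdims=True)
    nearest = (c @ g.T).argmax(axis=1)             # best glyph per crop
    return "".join(glyph_chars[i] for i in nearest)

glyphs = list("abcdefghijklmnopqrstuvwxyz")
# random stand-ins for encoder outputs: 5 character crops, 26 reference glyphs
text = recognize(np.random.randn(5, 256), np.random.randn(26, 256), glyphs)
```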

The History and Development of Indonesian Natural Language Processing (NLP) Techniques: A review of the history, technological development, and applications of NLP in the Indonesian language

  • Authors: Mukhlis Amien
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.02746
  • Pdf link: https://arxiv.org/pdf/2304.02746
  • Abstract
    This study provides an overview of the history of the development of Natural Language Processing (NLP) in the context of the Indonesian language, with a focus on the basic technologies, methods, and practical applications that have been developed. This review covers developments in basic NLP technologies such as stemming, part-of-speech tagging, and related methods; practical applications in cross-language information retrieval systems, information extraction, and sentiment analysis; and methods and techniques used in Indonesian language NLP research, such as machine learning, statistics-based machine translation, and conflict-based approaches. This study also explores the application of NLP in Indonesian language industry and research and identifies challenges and opportunities in Indonesian language NLP research and development. Recommendations for future Indonesian language NLP research and development include developing more efficient methods and technologies, expanding NLP applications, increasing sustainability, further research into the potential of NLP, and promoting interdisciplinary collaboration. It is hoped that this review will help researchers, practitioners, and the government to understand the development of Indonesian language NLP and identify opportunities for further research and development.

Robust, privacy-preserving, transparent, and auditable on-device blocklisting

  • Authors: Kurt Thomas, Sarah Meiklejohn, Michael A. Specter, Xiang Wang, Xavier Llorà, Stephan Somogyi, David Kleidermacher
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.02810
  • Pdf link: https://arxiv.org/pdf/2304.02810
  • Abstract
    With the accelerated adoption of end-to-end encryption, there is an opportunity to re-architect security and anti-abuse primitives in a manner that preserves new privacy expectations. In this paper, we consider two novel protocols for on-device blocklisting that allow a client to determine whether an object (e.g., URL, document, image, etc.) is harmful based on threat information possessed by a so-called remote enforcer in a way that is both privacy-preserving and trustworthy. Our protocols leverage a unique combination of private set intersection to promote privacy, cryptographic hashes to ensure resilience to false positives, cryptographic signatures to improve transparency, and Merkle inclusion proofs to ensure consistency and auditability. We benchmark our protocols -- one that is time-efficient, and the other space-efficient -- to demonstrate their practical use for applications such as email, messaging, storage, and other applications. We also highlight remaining challenges, such as privacy and censorship tensions that exist with logging or reporting. We consider our work to be a critical first step towards enabling complex, multi-stakeholder discussions on how best to provide on-device protections.
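Of the listed primitives, the Merkle inclusion proof is compact enough to show end-to-end; below is a standard SHA-256 verification sketch (the protocol's private-set-intersection and signature layers are omitted, and the example tree is a toy).

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_inclusion(leaf: bytes, proof, root: bytes) -> bool:
    """Merkle inclusion proof (sketch): hash the blocklisted object up the
    tree with the sibling hashes in `proof` (each a (sibling_hash, is_left)
    pair) and compare against the published/signed root."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

# tiny two-leaf tree: root = H(H(a) || H(b))
a, b = b"https://evil.example", b"https://other.example"
root = h(h(a) + h(b))
assert verify_inclusion(a, [(h(b), False)], root)   # a is in the blocklist
```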

GIF: A General Graph Unlearning Strategy via Influence Function

  • Authors: Jiancan Wu, Yi Yang, Yuchun Qian, Yongduo Sui, Xiang Wang, Xiangnan He
  • Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.02835
  • Pdf link: https://arxiv.org/pdf/2304.02835
  • Abstract
    With the greater emphasis on privacy and security in our society, the problem of graph unlearning -- revoking the influence of specific data on the trained GNN model -- is drawing increasing attention. However, ranging from machine unlearning to recently emerged graph unlearning methods, existing efforts either resort to the retraining paradigm, or perform approximate erasure that fails to consider the inter-dependency between connected neighbors or imposes constraints on GNN structure, and are therefore hard to achieve satisfying performance-complexity trade-offs with. In this work, we explore the influence function tailored for graph unlearning, so as to improve the unlearning efficacy and efficiency for graph unlearning. We first present a unified problem formulation of diverse graph unlearning tasks w.r.t. node, edge, and feature. Then, we recognize the crux of the inability of the traditional influence function for graph unlearning, and devise Graph Influence Function (GIF), a model-agnostic unlearning method that can efficiently and accurately estimate parameter changes in response to an $\epsilon$-mass perturbation in deleted data. The idea is to supplement the objective of the traditional influence function with an additional loss term of the influenced neighbors due to the structural dependency. Further deductions on the closed-form solution of parameter changes provide a better understanding of the unlearning mechanism. We conduct extensive experiments on four representative GNN models and three benchmark datasets to justify the superiority of GIF for diverse graph unlearning tasks in terms of unlearning efficacy, model utility, and unlearning efficiency. Our implementations are available at https://github.com/wujcan/GIF-torch/
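For orientation, the classical influence-function estimate that GIF builds on can be sketched in a few lines; GIF's additional loss term for structurally influenced neighbors is omitted here, and the toy Hessian is an illustrative stand-in.

```python
import numpy as np

def influence_parameter_change(grads_removed, hessian, eps=1.0):
    """Classical influence-function estimate of the parameter change when
    up-/down-weighting deleted samples by eps:
        delta_theta ≈ eps * H^{-1} @ grad_L(removed data)
    (GIF supplements this with a neighbor-loss term for graphs.)"""
    return eps * np.linalg.solve(hessian, grads_removed)

rng = np.random.default_rng(0)
H = np.eye(10) + 0.1 * rng.standard_normal((10, 10))
H = H @ H.T                     # symmetric positive definite toy Hessian
g = rng.standard_normal(10)     # gradient of the loss on the deleted data
delta = influence_parameter_change(g, H)
# unlearned parameters: theta_unlearned = theta + delta
```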

Robustmix: Improving Robustness by Regularizing the Frequency Bias of Deep Nets

  • Authors: Jonas Ngnawe, Marianne ABEMGNIGNI NJIFON, Jonathan Heek, Yann Dauphin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02847
  • Pdf link: https://arxiv.org/pdf/2304.02847
  • Abstract
    Deep networks have achieved impressive results on a range of well-curated benchmark datasets. Surprisingly, their performance remains sensitive to perturbations that have little effect on human performance. In this work, we propose a novel extension of Mixup called Robustmix that regularizes networks to classify based on lower-frequency spatial features. We show that this type of regularization improves robustness on a range of benchmarks such as Imagenet-C and Stylized Imagenet. It adds little computational overhead and, furthermore, does not require a priori knowledge of a large set of image transformations. We find that this approach further complements recent advances in model architecture and data augmentation, attaining a state-of-the-art mCE of 44.8 with an EfficientNet-B8 model and RandAugment, which is a reduction of 16 mCE compared to the baseline.
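A sketch of frequency-band mixing in the spirit of Robustmix: combine the low-frequency band of one image with the high-frequency band of another via FFT masks, and mix the labels by the share of retained spectral energy. The radial cutoff and the label-mixing rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def robustmix(x1, y1, x2, y2, cutoff=0.15, rng=np.random.default_rng()):
    """Frequency-band Mixup sketch: keep x1's low frequencies and x2's high
    frequencies, and mix the (scalar) labels by approximate energy share."""
    f1 = np.fft.fftshift(np.fft.fft2(x1))
    f2 = np.fft.fftshift(np.fft.fft2(x2))
    h, w = x1.shape
    yy, xx = np.mgrid[:h, :w]
    dist = np.hypot(yy - h / 2, xx - w / 2) / (min(h, w) / 2)
    low = dist <= cutoff                       # radial low-frequency mask
    mixed_f = np.where(low, f1, f2)            # low of x1 + high of x2
    x = np.real(np.fft.ifft2(np.fft.ifftshift(mixed_f)))
    e1, e2 = np.abs(f1[low]).sum(), np.abs(f2[~low]).sum()
    lam = e1 / (e1 + e2)                       # x1's share of retained energy
    return x, lam * y1 + (1 - lam) * y2

x, y = robustmix(np.random.rand(32, 32), 1.0, np.random.rand(32, 32), 0.0)
```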

Towards an Effective and Efficient Transformer for Rain-by-snow Weather Removal

  • Authors: Tao Gao, Yuanbo Wen, Kaihao Zhang, Peng Cheng, Ting Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02860
  • Pdf link: https://arxiv.org/pdf/2304.02860
  • Abstract
    Rain-by-snow weather removal is a specialized task in weather-degraded image restoration aiming to eliminate coexisting rain streaks and snow particles. In this paper, we propose RSFormer, an efficient and effective Transformer that addresses this challenge. Initially, we explore the proximity of convolution networks (ConvNets) and vision Transformers (ViTs) in hierarchical architectures and experimentally find that they perform approximately on par at intra-stage feature learning. On this basis, we utilize a Transformer-like convolution block (TCB) that replaces the computationally expensive self-attention while preserving attention characteristics for adapting to input content. We also demonstrate that cross-stage progression is critical for performance improvement, and propose a global-local self-attention sampling mechanism (GLASM) that down-/up-samples features while capturing both global and local dependencies. Finally, we synthesize two novel rain-by-snow datasets, RSCityScape and RS100K, to evaluate our proposed RSFormer. Extensive experiments verify that RSFormer achieves the best trade-off between performance and time consumption compared to other restoration methods. For instance, it outperforms Restormer with a 1.53% reduction in the number of parameters and a 15.6% reduction in inference time. Datasets, source code and pre-trained models are available at https://github.com/chdwyb/RSFormer

VPFusion: Towards Robust Vertical Representation Learning for 3D Object Detection

  • Authors: Yuhao Huang, Sanping Zhou, Junjie Zhang, Jinpeng Dong, Nanning Zheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02867
  • Pdf link: https://arxiv.org/pdf/2304.02867
  • Abstract
    Efficient point cloud representation is a fundamental element of Lidar-based 3D object detection. Recent grid-based detectors usually divide point clouds into voxels or pillars and construct single-stream networks in Bird's Eye View. However, these point cloud encoding paradigms underestimate the point representation in the vertical direction, which causes the loss of semantic or fine-grained information, especially for vertically sensitive objects like pedestrians and cyclists. In this paper, we propose an explicit vertical multi-scale representation learning framework, VPFusion, to combine the complementary information from both voxel and pillar streams. Specifically, VPFusion first builds upon a sparse voxel-pillar-based backbone. The backbone divides point clouds into voxels and pillars, then encodes features with 3D and 2D sparse convolution simultaneously. Next, we introduce the Sparse Fusion Layer (SFL), which establishes a bidirectional pathway for sparse voxel and pillar features to enable the interaction between them. Additionally, we present the Dense Fusion Neck (DFN) to effectively combine the dense feature maps from voxel and pillar branches with multi-scale. Extensive experiments on the large-scale Waymo Open Dataset and nuScenes Dataset demonstrate that VPFusion surpasses the single-stream baselines by a large margin and achieves state-of-the-art performance with real-time inference speed.

Object-centric Inference for Language Conditioned Placement: A Foundation Model based Approach

  • Authors: Zhixuan Xu, Kechun Xu, Yue Wang, Rong Xiong
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02893
  • Pdf link: https://arxiv.org/pdf/2304.02893
  • Abstract
    We focus on the task of language-conditioned object placement, in which a robot should generate placements that satisfy all the spatial relational constraints in language instructions. Previous works based on rule-based language parsing or scene-centric visual representation have restrictions on the form of instructions and reference objects or require large amounts of training data. We propose an object-centric framework that leverages foundation models to ground the reference objects and spatial relations for placement, which is more sample efficient and generalizable. Experiments indicate that our model can achieve a 97.75% success rate of placement with only ~0.26M trainable parameters. Besides, our method generalizes better to both unseen objects and instructions. Moreover, with only 25% training data, we still outperform the top competing approach.

Affect as a proxy for literary mood

  • Authors: Emily Öhman, Riikka Rossi
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.02894
  • Pdf link: https://arxiv.org/pdf/2304.02894
  • Abstract
    We propose to use affect as a proxy for mood in literary texts. In this study, we explore the differences in computationally detecting tone versus detecting mood. Methodologically we utilize affective word embeddings to look at the affective distribution in different text segments. We also present a simple yet efficient and effective method of enhancing emotion lexicons to take both semantic shift and the domain of the text into account producing real-world congruent results closely matching both contemporary and modern qualitative analyses.
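A minimal sketch of embedding-based lexicon enhancement: propagate seed-word affect scores to their nearest neighbors in a domain-specific embedding space, so semantic shift within the corpus is reflected in the lexicon. The similarity-weighted propagation rule is an illustrative assumption.

```python
import numpy as np

def expand_lexicon(lexicon, vocab, emb, k=3):
    """Sketch: for each seed word with an affect score, assign a similarity-
    weighted score to its k nearest neighbors in the embedding space."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    index = {w: i for i, w in enumerate(vocab)}
    expanded = dict(lexicon)
    for word, score in lexicon.items():
        sims = emb @ emb[index[word]]
        for j in np.argsort(-sims)[1:k + 1]:       # skip the word itself
            expanded.setdefault(vocab[j], float(sims[j]) * score)
    return expanded

vocab = ["joy", "glee", "grim", "dark", "mirth"]
lex = expand_lexicon({"joy": 1.0, "dark": -0.8}, vocab,
                     np.random.randn(5, 50))       # stand-in embeddings
```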

LSketch: A Label-Enabled Graph Stream Sketch Toward Time-Sensitive Queries

  • Authors: Yiling Zeng, Chunyao Song, Yuhan Li, Tingjian Ge
  • Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.02897
  • Pdf link: https://arxiv.org/pdf/2304.02897
  • Abstract
    Graph streams represent data interactions in real applications. The mining of graph streams plays an important role in network security, social network analysis, and traffic control, among others. However, their sheer volume and high dynamics pose great challenges for efficient storage and subsequent query analysis. Current studies apply sketches to summarize graph streams. We propose LSketch, which works for heterogeneous graph streams and effectively preserves the label information carried by the streams in real scenes, thereby enriching the expressive ability of sketches. In addition, as graph streams continue to evolve over time, edges that are too old may lose their practical significance. Therefore, we introduce the sliding window model into LSketch to eliminate expired edges automatically. LSketch uses sub-linear storage space and can support structure-based queries and time-sensitive queries with high accuracy. We perform extensive experiments over four real datasets, demonstrating the superiority of the proposed method over state-of-the-art methods in terms of query accuracy and time efficiency.
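A toy, label-aware sliding-window sketch illustrating the two ideas above (label preservation via label-keyed hashing, and lazy expiry of stale cells); the single hash row, cell layout, and sizes are illustrative, not the paper's construction.

```python
import hashlib, time

class LabeledWindowSketch:
    """Sketch: hash (src, dst, label) triples into a fixed-size array of
    (weight, last_update) cells so labeled edge weights can be estimated in
    sub-linear space, expiring cells older than `window` seconds lazily."""
    def __init__(self, width=1024, window=3600.0):
        self.width, self.window = width, window
        self.cells = [(0.0, 0.0)] * width          # (weight, last_update)

    def _slot(self, src, dst, label):
        key = f"{src}|{dst}|{label}".encode()
        return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % self.width

    def add(self, src, dst, label, w=1.0, now=None):
        now = time.time() if now is None else now
        i = self._slot(src, dst, label)
        weight, last = self.cells[i]
        if now - last > self.window:               # stale cell: reset it
            weight = 0.0
        self.cells[i] = (weight + w, now)

    def query(self, src, dst, label, now=None):
        now = time.time() if now is None else now
        weight, last = self.cells[self._slot(src, dst, label)]
        return 0.0 if now - last > self.window else weight

s = LabeledWindowSketch()
s.add("a", "b", "follows")
est = s.query("a", "b", "follows")                 # ~1.0 within the window
```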

InterFormer: Real-time Interactive Image Segmentation

  • Authors: You Huang, Hao Yang, Ke Sun, Shengchuan Zhang, Guannan Jiang, Rongrong Ji, Liujuan Cao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.02942
  • Pdf link: https://arxiv.org/pdf/2304.02942
  • Abstract
    Interactive image segmentation enables annotators to efficiently perform pixel-level annotation for segmentation tasks. However, the existing interactive segmentation pipeline suffers from inefficient computations of interactive models because of the following two issues. First, an annotator's next click depends on the model's feedback to the former clicks. This serial interaction is unable to utilize the model's parallelism capabilities. Second, the model has to repeatedly process the image, the annotator's current click, and the model's feedback to the annotator's former clicks at each step of interaction, resulting in redundant computations. For efficient computation, we propose a method named InterFormer that follows a new pipeline to address these issues. InterFormer extracts and preprocesses the computationally time-consuming part, i.e., image processing, from the existing process. Specifically, InterFormer employs a large vision transformer (ViT) on high-performance devices to preprocess images in parallel, and then uses a lightweight module called interactive multi-head self-attention (I-MSA) for interactive segmentation. Furthermore, the I-MSA module's deployment on low-power devices extends the practical application of interactive segmentation. The I-MSA module utilizes the preprocessed features to efficiently respond to annotator inputs in real time. Experiments on several datasets demonstrate the effectiveness of InterFormer, which outperforms previous interactive segmentation models in terms of computational efficiency and segmentation quality, achieving real-time high-quality interactive segmentation on CPU-only devices.

When approximate design for fast homomorphic computation provides differential privacy guarantees

  • Authors: Arnaud Grivet Sébert, Martin Zuber, Oana Stan, Renaud Sirdey, Cédric Gouy-Pailler
  • Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02959
  • Pdf link: https://arxiv.org/pdf/2304.02959
  • Abstract
    While machine learning has become pervasive in fields as diverse as industry, healthcare, and social networks, privacy concerns regarding the training data have gained critical importance. In settings where several parties wish to collaboratively train a common model without jeopardizing their sensitive data, the need for a private training protocol is particularly stringent and requires protecting the data against both the model's end-users and the actors of the training phase. Differential privacy (DP) and cryptographic primitives are complementary popular countermeasures against privacy attacks. Among these cryptographic primitives, fully homomorphic encryption (FHE) offers ciphertext malleability at the cost of time-consuming operations in the homomorphic domain. In this paper, we design SHIELD, a probabilistic approximation algorithm for the argmax operator which is both fast when homomorphically executed and whose inaccuracy is used as a feature to ensure DP guarantees. Even if SHIELD could have other applications, we here focus on one setting and seamlessly integrate it into the SPEED collaborative training framework from "SPEED: Secure, PrivatE, and Efficient Deep learning" (Grivet Sébert et al., 2021) to improve its computational efficiency. After thoroughly describing the FHE implementation of our algorithm and its DP analysis, we present experimental results. To the best of our knowledge, this is the first work in which relaxing the accuracy of a homomorphic calculation is constructively usable as a degree of freedom to achieve better FHE performance.
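
SHIELD's construction is specific to FHE circuits, but the underlying intuition, that a deliberately noisy argmax can buy differential privacy, is captured by the classical report-noisy-max (Gumbel/exponential) mechanism. A minimal sketch of that generic mechanism, not SHIELD's actual algorithm:

```python
import numpy as np

def noisy_argmax(scores, epsilon, sensitivity=1.0, rng=None):
    """Report-noisy-max: argmax over scores perturbed by Gumbel noise.
    With scale 2*sensitivity/epsilon this is equivalent to the exponential
    mechanism, so the released index is epsilon-differentially private."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.gumbel(scale=2.0 * sensitivity / epsilon, size=len(scores))
    return int(np.argmax(np.asarray(scores, dtype=float) + noise))

# Example: aggregated per-label votes; a smaller epsilon means more noise.
print(noisy_argmax([12, 15, 14], epsilon=0.5))
```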

A Fast and Lightweight Network for Low-Light Image Enhancement

  • Authors: Yu Zhang, Xiaoguang Di, Junde Wu, RAO FU, Yong Li, Yue Wang, Yanwu Xu, Guohui YANG, Chunhui Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.02978
  • Pdf link: https://arxiv.org/pdf/2304.02978
  • Abstract
    Low-light images often suffer from severe noise, low brightness, low contrast, and color deviation. While several low-light image enhancement methods have been proposed, there remains a lack of efficient methods that can simultaneously solve all of these problems. In this paper, we introduce FLW-Net, a Fast and LightWeight Network for low-light image enhancement that significantly improves processing speed and overall enhancement quality. To achieve efficient low-light image enhancement, we recognize the challenges posed by the lack of an absolute reference and the need for a large receptive field to obtain global contrast. Therefore, we propose an efficient global feature information extraction component and design loss functions based on relative information to overcome these challenges. Finally, we conduct comparative experiments to demonstrate the effectiveness of the proposed method, and the results confirm that FLW-Net can significantly reduce the complexity of supervised low-light image enhancement networks while improving the enhancement quality. Code is available at https://github.com/hitzhangyu/FLW-Net

IoT Federated Blockchain Learning at the Edge

  • Authors: James Calo, Benny Lo
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.03006
  • Pdf link: https://arxiv.org/pdf/2304.03006
  • Abstract
    IoT devices are sorely underutilized in the medical field, especially within machine learning for medicine, yet they offer unrivaled benefits. IoT devices are low-cost, energy-efficient, small, and intelligent devices. In this paper, we propose a distributed federated learning framework for IoT devices, more specifically for IoMT (Internet of Medical Things), using blockchain to allow for a decentralized scheme that improves privacy and efficiency over a centralized system; this allows us to move from the prevalent cloud-based architectures to the edge. The system is designed for three paradigms: 1) Training neural networks on IoT devices to allow for collaborative training of a shared model whilst decoupling the learning from the dataset to ensure privacy. Training is performed online and simultaneously amongst all participants, allowing for training on actual data that may not have been present in a dataset collected in the traditional way, and dynamically adapting the system whilst it is being trained. 2) Training an IoMT system in a fully private manner, so as to mitigate the confidentiality issues of medical data and to build robust, and potentially bespoke, models where little, if any, data exists. 3) Distributing the actual network training, something federated learning itself does not do, to allow hospitals, for example, to utilize their spare computing resources to train network models.

PointCAT: Cross-Attention Transformer for point cloud

  • Authors: Xincheng Yang, Mingze Jin, Weiji He, Qian Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03012
  • Pdf link: https://arxiv.org/pdf/2304.03012
  • Abstract
    Transformer-based models have significantly advanced natural language processing and computer vision in recent years. However, due to the irregular and disordered structure of point cloud data, transformer-based models for 3D deep learning are still in their infancy compared to other methods. In this paper we present Point Cross-Attention Transformer (PointCAT), a novel end-to-end network architecture using a cross-attention mechanism for point cloud representation. Our approach combines multi-scale features via two separate cross-attention transformer branches. To reduce the computational increase brought by the multi-branch structure, we further introduce an efficient model for shape classification, which processes only the single class token of one branch as a query to compute the attention map with the other. Extensive experiments demonstrate that our method outperforms or achieves comparable performance to several approaches in shape classification, part segmentation and semantic segmentation tasks.
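
The branch-fusion trick in the abstract, one branch's class token attending to the other branch's tokens, can be sketched with standard PyTorch attention. The module below is a hypothetical illustration with made-up dimensions, not PointCAT's actual implementation:

```python
import torch
import torch.nn as nn

class ClassTokenCrossAttention(nn.Module):
    """Sketch of a PointCAT-style efficient cross-attention: only the class
    token of one branch attends to the other branch's tokens, instead of
    full token-to-token attention."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cls_token, other_tokens):
        # cls_token: (B, 1, C) query from branch A
        # other_tokens: (B, N, C) keys/values from branch B
        out, _ = self.attn(cls_token, other_tokens, other_tokens)
        return self.norm(cls_token + out)   # residual + norm

# Toy usage: fuse a class token with 128 point features from the other branch.
fuse = ClassTokenCrossAttention(dim=64)
cls = torch.randn(2, 1, 64)
pts = torch.randn(2, 128, 64)
print(fuse(cls, pts).shape)  # torch.Size([2, 1, 64])
```

Because the query is a single token, the attention cost grows linearly with the other branch's length rather than quadratically as in full cross-attention.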

Tensor Slicing and Optimization for Multicore NPUs

  • Authors: Rafael Sousa, Marcio Pereira, Yongin Kwon, Taeho Kim, Namsoon Jung, Chang Soo Kim, Michael Frank, Guido Araujo
  • Subjects: Performance (cs.PF); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03013
  • Pdf link: https://arxiv.org/pdf/2304.03013
  • Abstract
    Although code generation for Convolution Neural Network (CNN) models has been extensively studied, performing efficient data slicing and parallelization for highly-constrained Multicore Neural Processor Units (NPUs) is still a challenging problem. Given the size of convolutions' input/output tensors and the small footprint of NPU on-chip memories, minimizing memory transactions while maximizing parallelism and MAC utilization are central to any effective solution. This paper proposes a TensorFlow XLA/LLVM compiler optimization pass for Multicore NPUs, called Tensor Slicing Optimization (TSO), which: (a) maximizes convolution parallelism and memory usage across NPU cores; and (b) reduces data transfers between host and NPU on-chip memories by using DRAM memory burst time estimates to guide tensor slicing. To evaluate the proposed approach, a set of experiments was performed using the NeuroMorphic Processor (NMP), a multicore NPU containing 32 RISC-V cores extended with novel CNN instructions. Experimental results show that TSO is capable of identifying the best tensor slicing that minimizes execution time for a set of CNN models. Speed-ups of up to 21.7% result when comparing the TSO burst-based technique to a no-burst data slicing approach. To validate the generality of the TSO approach, the algorithm was also ported to the Glow Machine Learning framework. The performance of the models was measured on both Glow and TensorFlow XLA/LLVM compilers, revealing similar results.

A computation of D(9) using FPGA Supercomputing

  • Authors: Lennart Van Hirtum, Patrick De Causmaecker, Jens Goemaere, Tobias Kenter, Heinrich Riebler, Michael Lass, Christian Plessl
  • Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)
  • Arxiv link: https://arxiv.org/abs/2304.03039
  • Pdf link: https://arxiv.org/pdf/2304.03039
  • Abstract
    This preprint makes the claim of having computed the $9^{th}$ Dedekind Number. This was done by building an efficient FPGA Accelerator for the core operation of the process, and parallelizing it on the Noctua 2 Supercluster at Paderborn University. The resulting value is 286386577668298411128469151667598498812366. This value can be verified in two steps. We have made the data file containing the 490M results available, each of which can be verified separately on CPU, and the whole file sums to our proposed value.

Data-driven HVAC Control Using Symbolic Regression: Design and Implementation

  • Authors: Yuki Ozawa, Dafang Zhao, Daichi Watari, Ittetsu Taniguchi, Toshihiro Suzuki, Yoshiyuki Shimoda, Takao Onoye
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.03078
  • Pdf link: https://arxiv.org/pdf/2304.03078
  • Abstract
    The large amount of data collected in buildings makes energy management smarter and more energy efficient. This study proposes a design and implementation methodology for data-driven heating, ventilation, and air conditioning (HVAC) control. Building thermodynamics is modeled using a symbolic regression model (SRM) built from the collected data. Additionally, an HVAC system model is also developed with a data-driven approach. A model predictive control (MPC) based HVAC scheduling is formulated with the developed models to minimize energy consumption and peak power demand and maximize thermal comfort. The performance of the proposed framework is demonstrated in a workspace of an actual campus building. The HVAC system using the proposed framework reduces the peak power by 16.1% compared to the widely used thermostat controller.

Offline Uncertainty Sampling in Data-driven Stochastic MPC

  • Authors: Johannes Teutsch, Sebastian Kerz, Tim Brüdigam, Dirk Wollherr, Marion Leibold
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.03088
  • Pdf link: https://arxiv.org/pdf/2304.03088
  • Abstract
    In this work, we exploit an offline-sampling based strategy for the constrained data-driven predictive control of an unknown linear system subject to random measurement noise. The strategy uses only past measured, potentially noisy data in a non-parametric system representation and does not require any prior model identification. The approximation of chance constraints using uncertainty sampling leads to efficient constraint tightening. Under mild assumptions, robust recursive feasibility and closed-loop constraint satisfaction are shown. In a simulation example, we provide evidence for the improved control performance of the proposed control scheme in comparison to a purely robust data-driven predictive control approach.

Inductive Graph Unlearning

  • Authors: Cheng-Long Wang, Mengdi Huai, Di Wang
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.03093
  • Pdf link: https://arxiv.org/pdf/2304.03093
  • Abstract
    As a way to implement the "right to be forgotten" in machine learning, \textit{machine unlearning} aims to completely remove the contributions and information of the samples to be deleted from a trained model without affecting the contributions of other samples. Recently, many frameworks for machine unlearning have been proposed, and most of them focus on image and text data. To extend machine unlearning to graph data, \textit{GraphEraser} has been proposed. However, a critical issue is that \textit{GraphEraser} is specifically designed for the transductive graph setting, where the graph is static and the attributes and edges of test nodes are visible during training. It is unsuitable for the inductive setting, where the graph could be dynamic and the test graph information is invisible in advance. Such inductive capability is essential for production machine learning systems with evolving graphs like social media and transaction networks. To fill this gap, we propose the \underline{{\bf G}}\underline{{\bf U}}ided \underline{{\bf I}}n\underline{{\bf D}}uctiv\underline{{\bf E}} Graph Unlearning framework (GUIDE). GUIDE consists of three components: guided graph partitioning with fairness and balance, efficient subgraph repair, and similarity-based aggregation. Empirically, we evaluate our method on several inductive benchmarks and evolving transaction graphs. Generally speaking, GUIDE can be efficiently applied to inductive graph learning tasks owing to its low graph-partition cost, with respect to both computation and structural information. The code will be available here: https://github.com/Happy2Git/GUIDE.

FABRID: Flexible Attestation-Based Routing for Inter-Domain Networks

  • Authors: Cyrill Krähenbühl (ETH Zürich), Marc Wyss (ETH Zürich), David Basin (ETH Zürich), Vincent Lenders (armasuisse), Adrian Perrig (ETH Zürich), Martin Strohmeier (armasuisse)
  • Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.03108
  • Pdf link: https://arxiv.org/pdf/2304.03108
  • Abstract
    In its current state, the Internet does not provide end users with transparency and control regarding on-path forwarding devices. In particular, the lack of network device information reduces the trustworthiness of the forwarding path and prevents end-user applications requiring specific router capabilities from reaching their full potential. Moreover, the inability to influence the traffic's forwarding path results in applications communicating over undesired routes, while alternative paths with more desirable properties remain unusable. In this work, we present FABRID, a system that enables applications to forward traffic flexibly, potentially on multiple paths selected to comply with user-defined preferences, where information about forwarding devices is exposed and transparently attested by autonomous systems (ASes). The granularity of this information is chosen by each AS individually, protecting them from leaking sensitive network details, while the secrecy and authenticity of preferences embedded within the users' packets are protected through efficient cryptographic operations. We show the viability of FABRID by deploying it on a global SCION network test bed, and we demonstrate high throughput on commodity hardware.

Simplifying Content-Based Neural News Recommendation: On User Modeling and Training Objectives

  • Authors: Andreea Iana, Goran Glavaš, Heiko Paulheim
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.03112
  • Pdf link: https://arxiv.org/pdf/2304.03112
  • Abstract
    The advent of personalized news recommendation has given rise to increasingly complex recommender architectures. Most neural news recommenders rely on user click behavior and typically introduce dedicated user encoders that aggregate the content of clicked news into user embeddings (early fusion). These models are predominantly trained with standard point-wise classification objectives. The existing body of work exhibits two main shortcomings: (1) despite general design homogeneity, direct comparisons between models are hindered by varying evaluation datasets and protocols; (2) it leaves alternative model designs and training objectives vastly unexplored. In this work, we present a unified framework for news recommendation, allowing for a systematic and fair comparison of news recommenders across several crucial design dimensions: (i) candidate-awareness in user modeling, (ii) click behavior fusion, and (iii) training objectives. Our findings challenge the status quo in neural news recommendation. We show that replacing sizable user encoders with parameter-efficient dot products between candidate and clicked news embeddings (late fusion) often yields substantial performance gains. Moreover, our results render contrastive training a viable alternative to point-wise classification objectives.
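
The paper's late-fusion finding, replacing a learned user encoder with a dot product between candidate and clicked-news embeddings, is simple enough to sketch directly. The shapes and the mean-pooling choice below are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def late_fusion_scores(candidate_emb, clicked_embs):
    """Score candidates by a dot product with the mean of the clicked-news
    embeddings (late fusion), replacing a dedicated user encoder.
    candidate_emb: (B, K, D) candidates; clicked_embs: (B, H, D) history."""
    user = clicked_embs.mean(dim=1)                      # (B, D) parameter-free user vector
    return torch.einsum("bkd,bd->bk", candidate_emb, user)

def contrastive_loss(scores):
    """Treat index 0 as the clicked positive and the rest as negatives."""
    target = torch.zeros(scores.size(0), dtype=torch.long)
    return F.cross_entropy(scores, target)

cand = torch.randn(8, 5, 128)   # 1 positive + 4 sampled negatives per user
hist = torch.randn(8, 30, 128)  # 30 clicked news items per user
print(contrastive_loss(late_fusion_scores(cand, hist)))
```

The contrastive objective here pits the clicked candidate against sampled negatives, the alternative to point-wise classification that the paper evaluates.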

Zero-shot Generative Model Adaptation via Image-specific Prompt Learning

  • Authors: Jiayi Guo, Chaofei Wang, You Wu, Eric Zhang, Kai Wang, Xingqian Xu, Shiji Song, Humphrey Shi, Gao Huang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03119
  • Pdf link: https://arxiv.org/pdf/2304.03119
  • Abstract
    Recently, CLIP-guided image synthesis has shown appealing performance on adapting a pre-trained source-domain generator to an unseen target domain. It does not require any target-domain samples but only the textual domain labels. The training is highly efficient, e.g., a few minutes. However, existing methods still have some limitations in the quality of generated images and may suffer from the mode collapse issue. A key reason is that a fixed adaptation direction is applied for all cross-domain image pairs, which leads to identical supervision signals. To address this issue, we propose an Image-specific Prompt Learning (IPL) method, which learns specific prompt vectors for each source-domain image. This produces a more precise adaptation direction for every cross-domain image pair, endowing the target-domain generator with greatly enhanced flexibility. Qualitative and quantitative evaluations on various domains demonstrate that IPL effectively improves the quality and diversity of synthesized images and alleviates the mode collapse. Moreover, IPL is independent of the structure of the generative model, such as generative adversarial networks or diffusion models. Code is available at https://github.com/Picsart-AI-Research/IPL-Zero-Shot-Generative-Model-Adaptation.

BotTriNet: A Unified and Efficient Embedding for Social Bots Detection via Metric Learning

  • Authors: Jun Wu, Xuesong Ye, Man Yan Yuet
  • Subjects: Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.03144
  • Pdf link: https://arxiv.org/pdf/2304.03144
  • Abstract
    A persistently popular topic in online social networks is the rapid and accurate discovery of bot accounts to prevent their invasion and harassment of genuine users. We propose a unified embedding framework called BOTTRINET, which utilizes textual content posted by accounts for bot detection, based on the assumption that contexts naturally reveal account personalities and habits. Content is abundant and valuable if the system efficiently extracts bot-related information using embedding techniques. Beyond the general embedding framework that generates word, sentence, and account embeddings, we design a triplet network to tune the raw embeddings (produced by traditional natural language processing techniques) for better classification performance. We evaluate detection accuracy and F1-score on a real-world dataset, CRESCI2017, comprising three bot account categories and five bot sample sets. Our system achieves the highest average accuracy of 98.34% and F1-score of 97.99% on two content-intensive bot sets, outperforming previous work and becoming the state of the art. It also makes a breakthrough on four content-less bot sets, with an average accuracy improvement of 11.52% and an average F1-score increase of 16.70%.
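
The triplet-tuning step described in the abstract can be illustrated with a generic metric-learning loop; the projection head, dimensions, and data below are placeholders, not BOTTRINET's architecture:

```python
import torch
import torch.nn as nn

# Minimal sketch: refine pre-computed account embeddings with a triplet loss
# so bot and genuine-user accounts separate in the embedding space.
proj = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 64))
triplet = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(proj.parameters(), lr=1e-3)

anchor = torch.randn(32, 300)    # raw embeddings of bot accounts
positive = torch.randn(32, 300)  # other bot accounts (same class)
negative = torch.randn(32, 300)  # genuine-user accounts

loss = triplet(proj(anchor), proj(positive), proj(negative))
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```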

Parameterized Approximation Schemes for Clustering with General Norm Objectives

  • Authors: Fateme Abbasi, Sandip Banerjee, Jarosław Byrka, Parinya Chalermsook, Ameet Gadekar, Kamyar Khodamoradi, Dániel Marx, Roohani Sharma, Joachim Spoerhase
  • Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03146
  • Pdf link: https://arxiv.org/pdf/2304.03146
  • Abstract
    This paper considers the well-studied algorithmic regime of designing a $(1+\epsilon)$-approximation algorithm for a $k$-clustering problem that runs in time $f(k,\epsilon) \cdot \mathrm{poly}(n)$ (sometimes called an efficient parameterized approximation scheme or EPAS for short). Notable results of this kind include EPASes in the high-dimensional Euclidean setting for $k$-center [Bădoiu, Har-Peled, Indyk; STOC'02] as well as $k$-median and $k$-means [Kumar, Sabharwal, Sen; J. ACM 2010]. However, existing EPASes handle only basic objectives (such as $k$-center, $k$-median, and $k$-means) and are tailored to the specific objective and metric space. Our main contribution is a clean and simple EPAS that settles more than ten clustering problems (across multiple well-studied objectives as well as metric spaces) and unifies well-known EPASes. Our algorithm gives EPASes for a large variety of clustering objectives (for example, $k$-means, $k$-center, $k$-median, priority $k$-center, $\ell$-centrum, ordered $k$-median, socially fair $k$-median aka robust $k$-median, or more generally monotone norm $k$-clustering) and metric spaces (for example, continuous high-dimensional Euclidean spaces, metrics of bounded doubling dimension, bounded treewidth metrics, and planar metrics). Key to our approach is a new concept that we call bounded $\epsilon$-scatter dimension--an intrinsic complexity measure of a metric space that is a relaxation of the standard notion of bounded doubling dimension. Our main technical result shows that two conditions are essentially sufficient for our algorithm to yield an EPAS on the input metric $M$ for any clustering objective: (i) The objective is described by a monotone (not necessarily symmetric!) norm, and (ii) the $\epsilon$-scatter dimension of $M$ is upper bounded by a function of $\epsilon$.

Spectral Toolkit of Algorithms for Graphs: Technical Report (1)

  • Authors: Peter Macgregor, He Sun
  • Subjects: Social and Information Networks (cs.SI); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Mathematical Software (cs.MS)
  • Arxiv link: https://arxiv.org/abs/2304.03170
  • Pdf link: https://arxiv.org/pdf/2304.03170
  • Abstract
    Spectral Toolkit of Algorithms for Graphs (STAG) is an open-source library for efficient spectral graph algorithms, whose development started in September 2022. We have so far finished the component on local graph clustering, and this technical report presents a user's guide to STAG, showcase studies, and several technical considerations behind our development.

Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream

  • Authors: Yuheng Jiang, Kaixin Yao, Zhuo Su, Zhehao Shen, Haimin Luo, Lan Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03184
  • Pdf link: https://arxiv.org/pdf/2304.03184
  • Abstract
    Convenient 4D modeling of human-object interactions is essential for numerous applications. However, monocular tracking and rendering of complex interaction scenarios remain challenging. In this paper, we propose Instant-NVR, a neural approach for instant volumetric human-object tracking and rendering using a single RGBD camera. It bridges traditional non-rigid tracking with recent instant radiance field techniques via a multi-thread tracking-rendering mechanism. In the tracking front-end, we adopt a robust human-object capture scheme to provide sufficient motion priors. We further introduce a separated instant neural representation with a novel hybrid deformation module for the interacting scene. We also provide an on-the-fly reconstruction scheme of the dynamic/static radiance fields via efficient motion-prior searching. Moreover, we introduce an online key frame selection scheme and a rendering-aware refinement strategy to significantly improve the appearance details for online novel-view synthesis. Extensive experiments demonstrate the effectiveness and efficiency of our approach for the instant generation of human-object radiance fields on the fly, notably achieving real-time photo-realistic novel view synthesis under complex human-object interactions.

Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster

  • Authors: Nolan Dey, Gurpreet Gosal, Zhiming (Charles) Chen, Hemant Khachane, William Marshall, Ribhu Pathria, Marvin Tom, Joel Hestness
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.03208
  • Pdf link: https://arxiv.org/pdf/2304.03208
  • Abstract
    We study recent research advances that improve large language models through efficient pre-training and scaling, and open datasets and tools. We combine these advances to introduce Cerebras-GPT, a family of open compute-optimal language models scaled from 111M to 13B parameters. We train Cerebras-GPT models on the Eleuther Pile dataset following DeepMind Chinchilla scaling rules for efficient pre-training (highest accuracy for a given compute budget). We characterize the predictable power-law scaling and compare Cerebras-GPT with other publicly-available models to show all Cerebras-GPT models have state-of-the-art training efficiency on both pre-training and downstream objectives. We describe our learnings including how Maximal Update Parameterization ($\mu$P) can further improve large model scaling, improving accuracy and hyperparameter predictability at scale. We release our pre-trained models and code, making this paper the first open and reproducible work comparing compute-optimal model scaling to models trained on fixed dataset sizes. Cerebras-GPT models are available on HuggingFace: https://huggingface.co/cerebras.
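
For intuition, the Chinchilla compute-optimal rule that these models follow is often summarized as roughly 20 training tokens per parameter, with total compute of about 6·N·D FLOPs. The constants below are the commonly cited approximations, not figures from the paper:

```python
def chinchilla_budget(params: float, tokens_per_param: float = 20.0):
    """Rule-of-thumb compute-optimal budget: train on ~20 tokens per
    parameter, with total compute C ~= 6 * N * D FLOPs."""
    tokens = tokens_per_param * params
    flops = 6.0 * params * tokens
    return tokens, flops

for n in (111e6, 1.3e9, 13e9):
    d, c = chinchilla_budget(n)
    print(f"{n / 1e9:5.2f}B params -> {d / 1e9:7.1f}B tokens, {c:.2e} FLOPs")
```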

Hierarchical Graph Neural Network with Cross-Attention for Cross-Device User Matching

  • Authors: Ali Taghibakhshi, Mingyuan Ma, Ashwath Aithal, Onur Yilmaz, Haggai Maron, Matthew West
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.03215
  • Pdf link: https://arxiv.org/pdf/2304.03215
  • Abstract
    Cross-device user matching is a critical problem in numerous domains, including advertising, recommender systems, and cybersecurity. It involves identifying and linking different devices belonging to the same person, utilizing sequence logs. Previous data mining techniques have struggled to address the long-range dependencies and higher-order connections between the logs. Recently, researchers have modeled this problem as a graph problem and proposed a two-tier graph contextual embedding (TGCE) neural network architecture, which outperforms previous methods. In this paper, we propose a novel hierarchical graph neural network architecture (HGNN), which has a more computationally efficient second level design than TGCE. Furthermore, we introduce a cross-attention (Cross-Att) mechanism in our model, which improves performance by 5% compared to the state-of-the-art TGCE method.

FedBot: Enhancing Privacy in Chatbots with Federated Learning

  • Authors: Addi Ait-Mlouk, Sadi Alawadi, Salman Toor, Andreas Hellander
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03228
  • Pdf link: https://arxiv.org/pdf/2304.03228
  • Abstract
    Chatbots are mainly data-driven and usually based on utterances that might be sensitive. However, training deep learning models on shared data can violate user privacy. Such issues have commonly existed in chatbots since their inception. In the literature, there have been many approaches to deal with privacy, such as differential privacy and secure multi-party computation, but most of them need to have access to users' data. In this context, Federated Learning (FL) aims to protect data privacy through distributed learning methods that keep the data in its location. This paper presents FedBot, a proof-of-concept (POC) privacy-preserving chatbot that leverages large-scale customer support data. The POC combines Deep Bidirectional Transformer models and federated learning algorithms to protect customer data privacy during collaborative model training. The results of the proof-of-concept showcase the potential for privacy-preserving chatbots to transform the customer support industry by delivering personalized and efficient customer service that meets data privacy regulations and legal requirements. Furthermore, the system is specifically designed to improve its performance and accuracy over time by leveraging its ability to learn from previous interactions.
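
The privacy mechanism rests on standard federated aggregation: clients train locally and only model parameters are shared. A minimal FedAvg sketch (the transformer QA model itself is out of scope here):

```python
import torch

def fed_avg(client_states, weights=None):
    """Plain FedAvg aggregation (sketch): average client model parameters so
    raw conversation data never leaves the clients."""
    n = len(client_states)
    weights = weights or [1.0 / n] * n
    keys = client_states[0].keys()
    return {k: sum(w * s[k] for w, s in zip(weights, client_states)) for k in keys}

m1, m2 = torch.nn.Linear(4, 2), torch.nn.Linear(4, 2)
global_state = fed_avg([m1.state_dict(), m2.state_dict()])
print({k: v.shape for k, v in global_state.items()})
```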

DiffMimic: Efficient Motion Mimicking with Differentiable Physics

  • Authors: Jiawei Ren, Cunjun Yu, Siwei Chen, Xiao Ma, Liang Pan, Ziwei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03274
  • Pdf link: https://arxiv.org/pdf/2304.03274
  • Abstract
    Motion mimicking is a foundational task in physics-based character animation. However, most existing motion mimicking methods are built upon reinforcement learning (RL) and suffer from heavy reward engineering, high variance, and slow convergence with hard explorations. Specifically, they usually take tens of hours or even days of training to mimic a simple motion sequence, resulting in poor scalability. In this work, we leverage differentiable physics simulators (DPS) and propose an efficient motion mimicking method dubbed DiffMimic. Our key insight is that DPS casts a complex policy learning task as a much simpler state matching problem. In particular, DPS learns a stable policy by analytical gradients with ground-truth physical priors, hence leading to significantly faster and more stable convergence than RL-based methods. Moreover, to escape from local optima, we utilize a Demonstration Replay mechanism to enable stable gradient backpropagation over a long horizon. Extensive experiments on standard benchmarks show that DiffMimic has better sample efficiency and time efficiency than existing methods (e.g., DeepMimic). Notably, DiffMimic allows a physically simulated character to learn a backflip after 10 minutes of training and to cycle it after 3 hours of training, while the existing approach may require about a day of training to cycle a backflip. More importantly, we hope DiffMimic can benefit more differentiable animation systems with techniques like differentiable cloth simulation in future research.
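
The core reduction, mimicking as state matching through a differentiable simulator, can be shown on a toy point mass. The dynamics, horizon, and policy below are stand-ins for a real physics engine and character, and the Demonstration Replay mechanism is omitted:

```python
import torch

# Toy illustration: with a differentiable simulator, motion mimicking reduces
# to a state-matching loss optimized by analytic gradients (no RL).
dt, T = 0.05, 40
ref = torch.stack([torch.sin(torch.linspace(0, 3.14, T)),
                   torch.zeros(T)], dim=1)          # reference states (pos, vel)

policy = torch.nn.Linear(2, 1)                       # state -> force
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for step in range(200):
    pos, vel = torch.zeros(1), torch.zeros(1)
    loss = 0.0
    for t in range(T):
        force = policy(torch.cat([pos, vel])).squeeze()
        vel = vel + dt * force                       # differentiable sim step
        pos = pos + dt * vel
        loss = loss + (pos - ref[t, 0]) ** 2 + (vel - ref[t, 1]) ** 2
    opt.zero_grad(); loss.backward(); opt.step()     # gradients flow through the sim
print(float(loss))
```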

Keyword: faster

DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model

  • Authors: Hoigi Seo, Hayeon Kim, Gwanghyun Kim, Se Young Chun
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.02827
  • Pdf link: https://arxiv.org/pdf/2304.02827
  • Abstract
    The increasing demand for high-quality 3D content creation has motivated the development of automated methods for creating 3D object models from a single image and/or from a text prompt. However, the 3D objects reconstructed using state-of-the-art image-to-3D methods still exhibit low correspondence to the given image and low multi-view consistency. Recent state-of-the-art text-to-3D methods are also limited, yielding 3D samples with low diversity per prompt and long synthesis times. To address these challenges, we propose DITTO-NeRF, a novel pipeline to generate a high-quality 3D NeRF model from a text prompt or a single image. Our DITTO-NeRF consists of constructing a high-quality partial 3D object for limited in-boundary (IB) angles using the given or text-generated 2D image from the frontal view, and then iteratively reconstructing the remaining 3D NeRF using an inpainting latent diffusion model. We propose progressive 3D object reconstruction schemes in terms of scales (low to high resolution), angles (IB angles initially to outer-boundary (OB) later), and masks (object to background boundary) in our DITTO-NeRF so that high-quality information on IB can be propagated into OB. Our DITTO-NeRF outperforms state-of-the-art methods in terms of fidelity and diversity, qualitatively and quantitatively, with much faster training times than prior art on image/text-to-3D such as DreamFusion and NeuralLift-360.

Convolutional neural networks for crack detection on flexible road pavements

  • Authors: Hermann Tapamo, Anna Bosman, James Maina, Emile Horak
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02933
  • Pdf link: https://arxiv.org/pdf/2304.02933
  • Abstract
    Flexible road pavements deteriorate primarily due to traffic and adverse environmental conditions. Cracking is the most common deterioration mechanism; the surveying thereof is typically conducted manually using internationally defined classification standards. In South Africa, the use of high-definition video images has been introduced, which allows for safer road surveying. However, surveying is still a tedious manual process. Automation of the detection of defects such as cracks would allow for faster analysis of road networks and potentially reduce human bias and error. This study performs a comparison of six state-of-the-art convolutional neural network models for the purpose of crack detection. The models are pretrained on the ImageNet dataset, and fine-tuned using a new real-world binary crack dataset consisting of 14000 samples. The effects of dataset augmentation are also investigated. Of the six models trained, five achieved accuracy above 97%. The highest recorded accuracy was 98%, achieved by the ResNet and VGG16 models. The dataset is available at the following URL: https://zenodo.org/record/7795975

Boundary-Denoising for Video Activity Localization

  • Authors: Mengmeng Xu, Mattia Soldan, Jialin Gao, Shuming Liu, Juan-Manuel Pérez-Rúa, Bernard Ghanem
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02934
  • Pdf link: https://arxiv.org/pdf/2304.02934
  • Abstract
    Video activity localization aims at understanding the semantic content in long untrimmed videos and retrieving actions of interest. The retrieved action with its start and end locations can be used for highlight generation, temporal action detection, etc. Unfortunately, learning the exact boundary location of activities is highly challenging because temporal activities are continuous in time, and there are often no clear-cut transitions between actions. Moreover, the definition of the start and end of events is subjective, which may confuse the model. To alleviate the boundary ambiguity, we propose to study the video activity localization problem from a denoising perspective. Specifically, we propose an encoder-decoder model named DenoiseLoc. During training, a set of action spans is randomly generated from the ground truth with a controlled noise scale. Then we attempt to reverse this process by boundary denoising, allowing the localizer to predict activities with precise boundaries and resulting in faster convergence. Experiments show that DenoiseLoc advances several video activity understanding tasks. For example, we observe a gain of +12.36% average mAP on the QV-Highlights dataset and +1.64% mAP@0.5 on the THUMOS'14 dataset over the baseline. Moreover, DenoiseLoc achieves state-of-the-art performance on TACoS and MAD datasets, but with much fewer predictions compared to other current methods.
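
The training-time perturbation is easy to sketch: jitter ground-truth spans with length-proportional noise and train the decoder to recover the clean boundaries. The noise model below is an assumption for illustration, not the paper's exact scheme:

```python
import torch

def add_span_noise(gt_spans, noise_scale=0.1):
    """DenoiseLoc-style span perturbation (sketch): jitter the center and
    width of each ground-truth action span; the decoder is then trained to
    recover the clean boundaries (boundary denoising)."""
    center = (gt_spans[:, 0] + gt_spans[:, 1]) / 2
    width = (gt_spans[:, 1] - gt_spans[:, 0]).clamp(min=1e-4)
    center = center + noise_scale * width * torch.randn_like(center)
    width = width * (1 + noise_scale * torch.randn_like(width)).clamp(min=0.1)
    noisy = torch.stack([center - width / 2, center + width / 2], dim=1)
    return noisy.clamp(0.0, 1.0)          # spans normalized to [0, 1]

gt = torch.tensor([[0.20, 0.35], [0.60, 0.90]])
print(add_span_noise(gt))
```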

Training a Two Layer ReLU Network Analytically

  • Authors: Adrian Barbu
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.02972
  • Pdf link: https://arxiv.org/pdf/2304.02972
  • Abstract
    Neural networks are usually trained with different variants of gradient descent based optimization algorithms such as stochastic gradient descent or the Adam optimizer. Recent theoretical work states that the critical points (where the gradient of the loss is zero) of two-layer ReLU networks with the square loss are not all local minima. However, in this work we explore an algorithm for training two-layer neural networks with ReLU-like activation and the square loss that alternately finds the critical points of the loss function analytically for one layer while keeping the other layer and the neuron activation pattern fixed. Experiments indicate that this simple algorithm can find deeper optima than Stochastic Gradient Descent or the Adam optimizer, obtaining significantly smaller training loss values on four out of the five real datasets evaluated. Moreover, the method is faster than the gradient descent methods and has virtually no tuning parameters.
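
With the activation pattern frozen, each layer's least-squares problem has a closed form, so training can alternate analytic solves. A simplified numpy sketch of that idea (ridge-regularized for stability; not the paper's exact algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 200, 5, 16
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)       # toy regression target

W1 = rng.normal(size=(d, h)) * 0.5                    # hidden weights
a = rng.normal(size=h) * 0.5                          # output weights
lam = 1e-3                                            # ridge term for stability

for it in range(20):
    Z = X @ W1
    M = (Z > 0).astype(float)                         # frozen activation pattern
    H = Z * M                                         # ReLU outputs under M
    # (1) output layer: closed-form ridge least squares for a
    a = np.linalg.solve(H.T @ H + lam * np.eye(h), H.T @ y)
    # (2) hidden layer: with M and a fixed, f(x_i) is linear in W1, so
    #     vec(W1) also has a closed-form least-squares solution.
    Phi = (M * a)[:, :, None] * X[:, None, :]         # (n, h, d) per-weight features
    Phi = Phi.reshape(n, h * d)
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(h * d), Phi.T @ y)
    W1 = w.reshape(h, d).T                            # pattern M is refreshed next iteration

print(np.mean((np.maximum(X @ W1, 0) @ a - y) ** 2))  # final training MSE
```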

Patch-wise Features for Blur Image Classification

  • Authors: Sri Charan Kattamuru, Kshitij Agrawal, Shyam Prasad Adhikari, Abhishek Bose, Hemant Misra
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03156
  • Pdf link: https://arxiv.org/pdf/2304.03156
  • Abstract
    Images captured through smartphone cameras often suffer from degradation, blur being one of the major ones, posing a challenge in processing these images for downstream tasks. In this paper we propose low-compute lightweight patch-wise features for image quality assessment, with which we can discriminate between blurred and sharp image degradation. To this end, we train a decision-tree based XGBoost model on various intuitive image features like gray-level variance, first- and second-order gradients, and texture features like local binary patterns. Experiments conducted on an open dataset show that the proposed low-compute method achieves 90.1% mean accuracy on the validation set, which is comparable to the accuracy of a compute-intensive VGG16 network with 94% mean accuracy fine-tuned to this task. To demonstrate the generalizability of our proposed features and model, we test the model on the BHBID dataset and an internal dataset, where we attain accuracies of 98% and 91%, respectively. The proposed method is 10x faster than the VGG16 based model on CPU and scales linearly with the input image size, making it suitable for implementation on low-compute edge devices.
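
The low-compute features named in the abstract (gray-level variance, first- and second-order gradients) are straightforward to extract per patch. A sketch of the extraction only; the LBP texture features and the XGBoost classifier are omitted, and the patch size is an assumption:

```python
import numpy as np

def patch_blur_features(gray, patch=32):
    """Per-patch sharpness cues: gray-level variance plus mean first- and
    second-order gradient magnitudes. These rows would feed a decision-tree
    classifier (e.g., XGBoost) to label patches as blurred or sharp."""
    feats = []
    H, W = gray.shape
    gy, gx = np.gradient(gray.astype(float))
    g2y, _ = np.gradient(gy)
    _, g2x = np.gradient(gx)
    for r in range(0, H - patch + 1, patch):
        for c in range(0, W - patch + 1, patch):
            sl = (slice(r, r + patch), slice(c, c + patch))
            feats.append([
                gray[sl].var(),                                   # gray-level variance
                np.abs(gx[sl]).mean() + np.abs(gy[sl]).mean(),    # 1st-order gradients
                np.abs(g2x[sl]).mean() + np.abs(g2y[sl]).mean(),  # 2nd-order gradients
            ])
    return np.asarray(feats)

img = np.random.rand(128, 128)           # stand-in for a grayscale photo
print(patch_blur_features(img).shape)    # (16, 3): one feature row per patch
```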

DiffMimic: Efficient Motion Mimicking with Differentiable Physics

  • Authors: Jiawei Ren, Cunjun Yu, Siwei Chen, Xiao Ma, Liang Pan, Ziwei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03274
  • Pdf link: https://arxiv.org/pdf/2304.03274
  • Abstract
    Motion mimicking is a foundational task in physics-based character animation. However, most existing motion mimicking methods are built upon reinforcement learning (RL) and suffer from heavy reward engineering, high variance, and slow convergence with hard explorations. Specifically, they usually take tens of hours or even days of training to mimic a simple motion sequence, resulting in poor scalability. In this work, we leverage differentiable physics simulators (DPS) and propose an efficient motion mimicking method dubbed DiffMimic. Our key insight is that DPS casts a complex policy learning task as a much simpler state matching problem. In particular, DPS learns a stable policy by analytical gradients with ground-truth physical priors, hence leading to significantly faster and more stable convergence than RL-based methods. Moreover, to escape from local optima, we utilize a Demonstration Replay mechanism to enable stable gradient backpropagation over a long horizon. Extensive experiments on standard benchmarks show that DiffMimic has better sample efficiency and time efficiency than existing methods (e.g., DeepMimic). Notably, DiffMimic allows a physically simulated character to learn a backflip after 10 minutes of training and to cycle it after 3 hours of training, while the existing approach may require about a day of training to cycle a backflip. More importantly, we hope DiffMimic can benefit more differentiable animation systems with techniques like differentiable cloth simulation in future research.

Keyword: mobile

Adopting Two Supervisors for Efficient Use of Large-Scale Remote Deep Neural Networks

  • Authors: Michael Weiss, Paolo Tonella
  • Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.02654
  • Pdf link: https://arxiv.org/pdf/2304.02654
  • Abstract
    Recent decades have seen the rise of large-scale Deep Neural Networks (DNNs) to achieve human-competitive performance in a variety of artificial intelligence tasks. Often consisting of hundreds of millions, if not hundreds of billions, of parameters, these DNNs are too large to be deployed to, or efficiently run on, resource-constrained devices such as mobile phones or IoT microcontrollers. Systems relying on large-scale DNNs thus have to call the corresponding model over the network, leading to substantial costs for hosting and running the large-scale remote model, costs which are often charged on a per-use basis. In this paper, we propose BiSupervised, a novel architecture where, before relying on a large remote DNN, a system attempts to make a prediction on a small-scale local model. A DNN supervisor monitors said prediction process and identifies easy inputs for which the local prediction can be trusted. For these inputs, the remote model does not have to be invoked, thus saving costs, while only marginally impacting the overall system accuracy. Our architecture furthermore foresees a second supervisor to monitor the remote predictions and identify inputs for which not even these can be trusted, allowing the system to raise an exception or run a fallback strategy instead. We evaluate the cost savings and the ability to detect incorrectly predicted inputs on four diverse case studies: IMDB movie review sentiment classification, GitHub issue triaging, ImageNet image classification, and SQuADv2 free-text question answering.
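
The first supervisor amounts to a confidence gate in front of the remote call. A minimal sketch, with a softmax-confidence threshold as the (assumed) trust criterion and a placeholder for the remote API:

```python
import numpy as np

def route_prediction(local_probs, threshold=0.9, remote_fn=None):
    """BiSupervised-style routing (sketch): trust the small local model when
    its confidence is high, otherwise pay for the large remote model.
    `remote_fn` is a stand-in for the network call; the second supervisor,
    which vets remote predictions, is omitted here."""
    conf = float(np.max(local_probs))
    if conf >= threshold:
        return int(np.argmax(local_probs)), "local"
    return remote_fn(), "remote"

probs = np.array([0.02, 0.95, 0.03])
print(route_prediction(probs, remote_fn=lambda: 1))   # trusted locally
probs = np.array([0.40, 0.35, 0.25])
print(route_prediction(probs, remote_fn=lambda: 0))   # falls back to remote
```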

Adaptive Headway Motion Control and Motion Prediction for Safe Unicycle Motion Design

  • Authors: Aykut İşleyen, Nathan van de Wouw, Ömür Arslan
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02760
  • Pdf link: https://arxiv.org/pdf/2304.02760
  • Abstract
    Differential drive robots that can be modeled as a kinematic unicycle are a standard mobile base platform for many service and logistics robots. Safe and smooth autonomous motion around obstacles is a crucial skill for unicycle robots to perform diverse tasks in complex environments. A classical control approach for unicycle control is feedback linearization using a headway point at a fixed headway distance in front of the unicycle. The unicycle headway control brings the headway point to a desired goal location by embedding a linear headway reference dynamics, which often results in an undesired offset for the actual unicycle position. In this paper, we introduce a new unicycle headway control approach with an adaptive headway distance that overcomes this limitation, i.e., when the headway point reaches the goal the unicycle position is also at the goal. By systematically analyzing the closed-loop unicycle motion under the adaptive headway controller, we design analytical feedback motion prediction methods that bound the closed-loop unicycle position trajectory and so can be effectively used for safety assessment and safe unicycle motion design around obstacles. We present an application of adaptive headway motion control and motion prediction for safe unicycle path following around obstacles in numerical simulations.
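
Classical headway feedback linearization, plus one plausible adaptive headway law, fits in a few lines. The shrinking-headway rule below (headway distance proportional to the distance to the goal) is an illustrative instance of the idea, not necessarily the paper's exact adaptive law:

```python
import numpy as np

def headway_control(state, goal, ell, k=1.0):
    """Feedback-linearizing unicycle control via a headway point at distance
    `ell` in front of the robot: drive the headway point straight to the goal."""
    x, y, th = state
    p = np.array([x + ell * np.cos(th), y + ell * np.sin(th)])
    J = np.array([[np.cos(th), -ell * np.sin(th)],
                  [np.sin(th),  ell * np.cos(th)]])
    v, w = np.linalg.solve(J, -k * (p - goal))        # linear velocity, yaw rate
    return v, w

# Adaptive headway (sketch): shrink ell with the distance to the goal so the
# unicycle position itself converges, not just the headway point.
state, goal, dt = np.array([0.0, 0.0, 0.0]), np.array([2.0, 1.0]), 0.01
for _ in range(3000):
    ell = max(0.5 * np.linalg.norm(state[:2] - goal), 1e-3)
    v, w = headway_control(state, goal, ell)
    state += dt * np.array([v * np.cos(state[2]), v * np.sin(state[2]), w])
print(state[:2])   # close to the goal, without the fixed-headway offset
```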

Evaluating Customization of Remote Tele-operation Interfaces for Assistive Robots

  • Authors: Vinitha Ranganeni, Noah Ponto, Maya Cakmak
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02771
  • Pdf link: https://arxiv.org/pdf/2304.02771
  • Abstract
    Mobile manipulator platforms, like the Stretch RE1 robot, make the promise of in-home robotic assistance feasible. For people with severe physical limitations, like those with quadriplegia, the ability to tele-operate these robots themselves means that they can perform physical tasks they cannot otherwise do themselves, thereby increasing their level of independence. In order for users with physical limitations to operate these robots, their interfaces must be accessible and cater to the specific needs of all users. As physical limitations vary amongst users, it is difficult to make a single interface that will accommodate all users. Instead, such interfaces should be customizable to each individual user. In this paper we explore the value of customization of a browser-based interface for tele-operating the Stretch RE1 robot. More specifically, we evaluate the usability and effectiveness of a customized interface in comparison to the default interface configurations from prior work. We present a user study involving participants with motor impairments (N=10) and without motor impairments, who could serve as a caregiver, (N=13) that use the robot to perform mobile manipulation tasks in a real kitchen environment. Our study demonstrates that no single interface configuration satisfies all users' needs and preferences. Users perform better when using the customized interface for navigation, but not for manipulation due to higher complexity of learning to manipulate through the robot. All participants are able to use the robot to complete all tasks and participants with motor impairments believe that having the robot in their home would make them more independent.

Gotta Assess 'Em All: A Risk Analysis of Criminal Offenses Facilitated through PokemonGO

  • Authors: Ashly Fuller, Martin Lo, Angelica Holmes, Lu Lemanski, Marie Vasek, Enrico Mariconti
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.02952
  • Pdf link: https://arxiv.org/pdf/2304.02952
  • Abstract
    Location-based games have come to the forefront of popularity in casual and mobile gaming over the past six years. However, there is no hard data on crimes that these games enable, ranging from assault to cyberstalking to grooming. Given these potential harms, we conduct a risk assessment and quasi-experiment on the game features of location-based games. Using PokemonGO as a case study, we identify and establish cyber-enabled stalking as the main risk event where in-game features such as an innocent function to share in-game postcards can be exploited by malicious users. Users obtain postcards that are unique to each Pokestop and represent gifts that can be shared with in-game friends. The number of postcards that each user can retain is limited, so they send the excess to their friends with items that boost their friends' game activities. The postcard often also unintentionally leaks the users' commonly visited locations to their in-game friends. We analyze these in-game features using risk assessment and identify cyber-enabled stalking as one of the main threats. We further evaluate the feasibility of this crime through a quasi-experiment. Our results show that participants' routine locations such as home and work can be reliably re-identified within days from the first gift exchange. This exploitation of a previously unconsidered in-game feature enables physical stalking of previously unknown persons which can escalate into more serious crimes. Given current data protection legislation in Europe, further preventive measures are required by Niantic to protect pseudonymized users from being re-identified by in-game features and (potentially) stalked.

SwarmGear: Heterogeneous Swarm of Drones with Reconfigurable Leader Drone and Virtual Impedance Links for Multi-Robot Inspection

  • Authors: Zhanibek Darush, Mikhail Martynov, Aleksey Fedoseev, Aleksei Shcherbak, Dzmitry Tsetserukou
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02956
  • Pdf link: https://arxiv.org/pdf/2304.02956
  • Abstract
    Continuous monitoring by drone swarms remains a challenging problem due to the lack of power supply and the inability of drones to land on uneven surfaces. Heterogeneous swarms, including ground and aerial vehicles, can support longer inspections and carry a higher number of sensors on board. However, their capabilities are limited by the mobility of wheeled and legged robots in a cluttered environment. In this paper, we propose a novel concept for autonomous inspection that we call SwarmGear. SwarmGear utilizes a heterogeneous swarm that investigates the environment in a leader-follower formation. The leader drone is able to land on rough terrain and traverse it by four compliant robotic legs, possessing the functionalities of both an aerial and a mobile robot. To preserve the formation of the swarm during its motion, virtual impedance links were developed between the leader and the follower drones. We evaluated experimentally the accuracy of the hybrid leader drone's ground locomotion. By changing the step parameters, the optimal step configuration was found. Two types of gaits were evaluated. The experiments revealed low crosstrack error (mean of 2 cm and max of 4.8 cm) and the ability of the leader drone to move with a 190 mm step length and a 3-degree standard yaw deviation. Four types of drone formations were considered. The best formation was used for experiments with SwarmGear, and it showed low overall crosstrack error for the swarm (mean 7.9 cm for the type 1 gait and 5.1 cm for the type 2 gait). The proposed system can potentially improve the performance of autonomous swarms in cluttered and unstructured environments by allowing all agents of the swarm to switch between aerial and ground formations to overcome various obstacles and perform missions over a large area.

Spritz-PS: Validation of Synthetic Face Images Using a Large Dataset of Printed Documents

  • Authors: Ehsan Nowroozi, Yoosef Habibi, Mauro Conti
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computers and Society (cs.CY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02982
  • Pdf link: https://arxiv.org/pdf/2304.02982
  • Abstract
    The capability of performing effective forensic analysis on printed and scanned (PS) images is essential in many applications. PS documents may be used to conceal the artifacts of images; such artifacts are typically present in manipulated images due to their synthetic nature, and the main artifacts in synthetic images can be removed after the PS process. Due to the appeal of Generative Adversarial Networks (GANs), synthetic face images generated with GAN models are difficult to differentiate from genuine human faces and may be used to create counterfeit identities. Additionally, since GAN models do not account for physiological constraints when generating human faces, or for their impact on human irises, distinguishing genuine from synthetic irises in the PS scenario becomes extremely difficult. As a result of the lack of large-scale reference iris datasets in the PS scenario, we aim at developing a novel dataset to become a standard for Multimedia Forensics (MFs) investigation, which is available at [45]. In this paper, we provide a novel dataset made up of a large number of synthetic and natural printed irises taken from VIPPrint printed and scanned face images. We extracted the irises from the face images; due to eyelid occlusion, the captured irises may be incomplete. To fill in the missing pixels of an extracted iris, we applied techniques that exploit the complex links between iris images. To highlight the problems involved with the evaluation of the dataset's iris images, we conducted a large number of analyses employing Siamese Neural Networks, such as ResNet50, Xception, VGG16, and MobileNet-v2, to assess the similarities between genuine and synthetic human irises. For instance, using the Xception network, we achieved 56.76% iris similarity for synthetic images and 92.77% iris similarity for real images.

Keyword: pruning

To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency

  • Authors: Daniel Campos, ChengXiang Zhai
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.02721
  • Pdf link: https://arxiv.org/pdf/2304.02721
  • Abstract
    Sequence-to-sequence language models can be used to produce abstractive summaries which are coherent, relevant, and concise. Still, model sizes can make deployment in latency-sensitive or web-scale implementations difficult. This paper studies the relationship between model size, structured pruning, inference efficiency, and summarization accuracy on widely used summarization datasets. We show that model accuracy is tied to the encoder size while inference efficiency is connected to the decoder. Using asymmetric pruning can lead to nearly 3x improvement in inference latency with ~1 point loss in Rouge-2. Moreover, we find both the average degradation and the role of asymmetry to be consistent across model sizes and variations in datasets.
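
The encoder/decoder asymmetry suggests pruning the decoder harder than the encoder. A toy sketch with PyTorch's pruning utilities, using unstructured L1 pruning as a stand-in for the paper's structured scheme; the modules and amounts are made up:

```python
import torch.nn.utils.prune as prune
from torch import nn

# Toy stand-ins for encoder/decoder blocks of a seq2seq model.
encoder = nn.Linear(512, 512)
decoder = nn.Linear(512, 512)

# Asymmetric sparsity: keep the encoder (tied to accuracy) mostly intact,
# prune the decoder (tied to inference latency) aggressively.
prune.l1_unstructured(encoder, name="weight", amount=0.1)
prune.l1_unstructured(decoder, name="weight", amount=0.6)

for name, m in [("encoder", encoder), ("decoder", decoder)]:
    sparsity = float((m.weight == 0).float().mean())
    print(f"{name}: {sparsity:.0%} weights pruned")
```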

NTK-SAP: Improving neural network pruning by aligning training dynamics

  • Authors: Yite Wang, Dawei Li, Ruoyu Sun
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.02840
  • Pdf link: https://arxiv.org/pdf/2304.02840
  • Abstract
    Pruning neural networks before training has received increasing interest due to its potential to reduce training time and memory. One popular method is to prune the connections based on a certain metric, but it is not entirely clear what metric is the best choice. Recent advances in neural tangent kernel (NTK) theory suggest that the training dynamics of large enough neural networks is closely related to the spectrum of the NTK. Motivated by this finding, we propose to prune the connections that have the least influence on the spectrum of the NTK. This method can help maintain the NTK spectrum, which may help align the training dynamics to that of its dense counterpart. However, one possible issue is that the fixed-weight-NTK corresponding to a given initial point can be very different from the NTK corresponding to later iterates during the training phase. We further propose to sample multiple realizations of random weights to estimate the NTK spectrum. Note that our approach is weight-agnostic, which is different from most existing methods that are weight-dependent. In addition, we use random inputs to compute the fixed-weight-NTK, making our method data-agnostic as well. We name our foresight pruning algorithm Neural Tangent Kernel Spectrum-Aware Pruning (NTK-SAP). Empirically, our method achieves better performance than all baselines on multiple datasets.
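
A cheap proxy for the paper's idea: score each parameter by its expected contribution to the NTK trace, estimated over several random-weight realizations and random inputs (hence weight- and data-agnostic), then prune the lowest scores. This sketch simplifies the actual method, which scores mask entries via a finite difference of the NTK spectrum:

```python
import torch
from torch import nn

def ntk_trace_saliency(make_model, n_realizations=4, n_inputs=8, in_dim=32):
    """Accumulate E[(df/dtheta)^2], each parameter's contribution to the NTK
    trace, over random weights and random probe inputs."""
    scores = None
    for _ in range(n_realizations):
        model = make_model()                      # fresh random weights
        params = list(model.parameters())
        for _ in range(n_inputs):
            x = torch.randn(1, in_dim)            # random probe input
            out = model(x).sum()
            grads = torch.autograd.grad(out, params)
            sq = [g.detach() ** 2 for g in grads]
            scores = sq if scores is None else [s + q for s, q in zip(scores, sq)]
    return scores

make = lambda: nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
scores = ntk_trace_saliency(make)
flat = torch.cat([s.flatten() for s in scores])
thresh = flat.kthvalue(int(0.9 * flat.numel())).values   # prune the 90% lowest
print(f"keep {(flat > thresh).float().mean():.0%} of weights")
```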

Learning to Learn with Indispensable Connections

  • Authors: Sambhavi Tiwari, Manas Gogoi, Shekhar Verma, Krishna Pratap Singh
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02862
  • Pdf link: https://arxiv.org/pdf/2304.02862
  • Abstract
    Meta-learning aims to solve unseen tasks with few labelled instances. Nevertheless, despite its effectiveness for quick learning in existing optimization-based methods, it has several flaws. Inconsequential connections are frequently seen during meta-training, which results in an over-parameterized neural network. Because of this, meta-testing incurs unnecessary computations and extra memory overhead. To overcome such flaws, we propose a novel meta-learning method called Meta-LTH that includes indispensable (necessary) connections. We apply the lottery ticket hypothesis technique known as magnitude pruning to generate these crucial connections that can effectively solve the few-shot learning problem. We aim to perform two things: (a) to find a sub-network capable of more adaptive meta-learning, and (b) to learn new low-level features of unseen tasks and recombine those features with the already learned features during the meta-test phase. Experimental results show that our proposed Meta-LTH method outperforms the existing first-order MAML algorithm on three different classification datasets. Our method improves the classification accuracy by approximately 2% (20-way 1-shot task setting) on the Omniglot dataset.

Keyword: voxel

VPFusion: Towards Robust Vertical Representation Learning for 3D Object Detection

  • Authors: Yuhao Huang, Sanping Zhou, Junjie Zhang, Jinpeng Dong, Nanning Zheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02867
  • Pdf link: https://arxiv.org/pdf/2304.02867
  • Abstract
    Efficient point cloud representation is a fundamental element of Lidar-based 3D object detection. Recent grid-based detectors usually divide point clouds into voxels or pillars and construct single-stream networks in Bird's Eye View. However, these point cloud encoding paradigms underestimate the point representation in the vertical direction, which causes the loss of semantic or fine-grained information, especially for vertically sensitive objects like pedestrians and cyclists. In this paper, we propose an explicit vertical multi-scale representation learning framework, VPFusion, to combine the complementary information from both voxel and pillar streams. Specifically, VPFusion first builds upon a sparse voxel-pillar-based backbone. The backbone divides point clouds into voxels and pillars, then encodes features with 3D and 2D sparse convolution simultaneously. Next, we introduce the Sparse Fusion Layer (SFL), which establishes a bidirectional pathway for sparse voxel and pillar features to enable the interaction between them. Additionally, we present the Dense Fusion Neck (DFN) to effectively combine the dense feature maps from the voxel and pillar branches at multiple scales. Extensive experiments on the large-scale Waymo Open Dataset and nuScenes Dataset demonstrate that VPFusion surpasses the single-stream baselines by a large margin and achieves state-of-the-art performance with real-time inference speed.
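
For readers unfamiliar with the two encodings being fused, the snippet below shows the basic difference: voxels discretise all three axes, while pillars collapse the vertical axis (the one the abstract argues single-stream encoders under-use). The grid resolution and range are illustrative, not the paper's configuration.

```python
import numpy as np

def grid_indices(points, voxel_size, pc_range):
    """Map each lidar point (x, y, z) to an integer grid cell."""
    lo = np.asarray(pc_range[:3])
    return np.floor((points[:, :3] - lo) / np.asarray(voxel_size)).astype(int)

points = np.random.uniform([0, -40, -3], [70, 40, 1], size=(1000, 3))
pc_range = [0, -40, -3, 70, 40, 1]
voxels = grid_indices(points, voxel_size=[0.1, 0.1, 0.15], pc_range=pc_range)
pillars = voxels[:, :2]   # pillar index = the (x, y) cell only, z collapsed
```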

Keyword: lidar

VPFusion: Towards Robust Vertical Representation Learning for 3D Object Detection

  • Authors: Yuhao Huang, Sanping Zhou, Junjie Zhang, Jinpeng Dong, Nanning Zheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02867
  • Pdf link: https://arxiv.org/pdf/2304.02867
  • Abstract
    Efficient point cloud representation is a fundamental element of Lidar-based 3D object detection. Recent grid-based detectors usually divide point clouds into voxels or pillars and construct single-stream networks in Bird's Eye View. However, these point cloud encoding paradigms underestimate the point representation in the vertical direction, which causes the loss of semantic or fine-grained information, especially for vertically sensitive objects like pedestrians and cyclists. In this paper, we propose an explicit vertical multi-scale representation learning framework, VPFusion, to combine the complementary information from both voxel and pillar streams. Specifically, VPFusion first builds upon a sparse voxel-pillar-based backbone. The backbone divides point clouds into voxels and pillars, then encodes features with 3D and 2D sparse convolution simultaneously. Next, we introduce the Sparse Fusion Layer (SFL), which establishes a bidirectional pathway for sparse voxel and pillar features to enable the interaction between them. Additionally, we present the Dense Fusion Neck (DFN) to effectively combine the dense feature maps from the voxel and pillar branches at multiple scales. Extensive experiments on the large-scale Waymo Open Dataset and nuScenes Dataset demonstrate that VPFusion surpasses the single-stream baselines by a large margin and achieves state-of-the-art performance with real-time inference speed.

Geometric-aware Pretraining for Vision-centric 3D Object Detection

  • Authors: Linyan Huang, Huijie Wang, Jia Zeng, Shengchuan Zhang, Liujuan Cao, Rongrong Ji, Junchi Yan, Hongyang Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03105
  • Pdf link: https://arxiv.org/pdf/2304.03105
  • Abstract
    Multi-camera 3D object detection for autonomous driving is a challenging problem that has garnered notable attention from both academia and industry. An obstacle encountered in vision-based techniques involves the precise extraction of geometry-conscious features from RGB images. Recent approaches have utilized geometric-aware image backbones pretrained on depth-relevant tasks to acquire spatial information. However, these approaches overlook the critical aspect of view transformation, resulting in inadequate performance due to the misalignment of spatial knowledge between the image backbone and view transformation. To address this issue, we propose a novel geometric-aware pretraining framework called GAPretrain. Our approach incorporates spatial and structural cues to camera networks by employing the geometric-rich modality as guidance during the pretraining phase. The transference of modal-specific attributes across different modalities is non-trivial, but we bridge this gap by using a unified bird's-eye-view (BEV) representation and structural hints derived from LiDAR point clouds to facilitate the pretraining process. GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors. Our experiments demonstrate the effectiveness and generalization ability of the proposed method. We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively. We also conduct experiments on various image backbones and view transformations to validate the efficacy of our approach. Code will be released at https://github.com/OpenDriveLab/BEVPerception-Survey-Recipe.

SALUDA: Surface-based Automotive Lidar Unsupervised Domain Adaptation

  • Authors: Bjoern Michele, Alexandre Boulch, Gilles Puy, Tuan-Hung Vu, Renaud Marlet, Nicolas Courty
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03251
  • Pdf link: https://arxiv.org/pdf/2304.03251
  • Abstract
    Learning models on one labeled dataset that generalize well on another domain is a difficult task, as several shifts might happen between the data domains. This is notably the case for lidar data, for which models can exhibit large performance discrepancies due for instance to different lidar patterns or changes in acquisition conditions. This paper addresses the corresponding Unsupervised Domain Adaptation (UDA) task for semantic segmentation. To mitigate this problem, we introduce an unsupervised auxiliary task of learning an implicit underlying surface representation simultaneously on source and target data. As both domains share the same latent representation, the model is forced to accommodate discrepancies between the two sources of data. This novel strategy differs from classical minimization of statistical divergences or lidar-specific state-of-the-art domain adaptation techniques. Our experiments demonstrate that our method achieves a better performance than the current state of the art in synthetic-to-real and real-to-real scenarios.

Keyword: diffusion

DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model

  • Authors: Hoigi Seo, Hayeon Kim, Gwanghyun Kim, Se Young Chun
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.02827
  • Pdf link: https://arxiv.org/pdf/2304.02827
  • Abstract
    The increasing demand for high-quality 3D content creation has motivated the development of automated methods for creating 3D object models from a single image and/or from a text prompt. However, the reconstructed 3D objects using state-of-the-art image-to-3D methods still exhibit low correspondence to the given image and low multi-view consistency. Recent state-of-the-art text-to-3D methods are also limited, yielding 3D samples with low diversity per prompt and long synthesis times. To address these challenges, we propose DITTO-NeRF, a novel pipeline to generate a high-quality 3D NeRF model from a text prompt or a single image. Our DITTO-NeRF consists of constructing a high-quality partial 3D object for limited in-boundary (IB) angles using the given or text-generated 2D image from the frontal view, and then iteratively reconstructing the remaining 3D NeRF using an inpainting latent diffusion model. We propose progressive 3D object reconstruction schemes in terms of scales (low to high resolution), angles (IB angles initially to outer-boundary (OB) later), and masks (object to background boundary) in our DITTO-NeRF so that high-quality information on IB can be propagated into OB. Our DITTO-NeRF outperforms state-of-the-art methods in terms of fidelity and diversity, qualitatively and quantitatively, with much faster training times than prior arts on image/text-to-3D such as DreamFusion and NeuralLift-360.

Benchmarking Robustness to Text-Guided Corruptions

  • Authors: Mohammadreza Mofayezi, Yasamin Medghalchi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02963
  • Pdf link: https://arxiv.org/pdf/2304.02963
  • Abstract
    This study investigates the robustness of image classifiers to text-guided corruptions. We utilize diffusion models to edit images to different domains. Unlike other works that use synthetic or hand-picked data for benchmarking, we use diffusion models as they are generative models capable of learning to edit images while preserving their semantic content. Thus, the corruptions are more realistic and the comparison more informative. Also, there is no need for manual labeling, and we can create large-scale benchmarks with less effort. We define a prompt hierarchy based on the original ImageNet hierarchy to apply edits in different domains. In addition to introducing a new benchmark, we investigate the robustness of different vision models. The results of this study demonstrate that the performance of image classifiers decreases significantly under different language-based corruptions and edit domains. We also observe that convolutional models are more robust than transformer architectures. Additionally, we see that common data augmentation techniques can improve the performance on both the original data and the edited images. The findings of this research can help improve the design of image classifiers and contribute to the development of more robust machine learning systems. The code for generating the benchmark will be made available online upon publication.
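
One plausible way to realise such text-guided corruptions is diffusion-based img2img editing, sketched below with Hugging Face diffusers. The model id, prompts, and strength are assumptions for illustration (the paper's exact pipeline and prompt hierarchy may differ), and argument names can vary across diffusers versions.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = Image.open("imagenet_sample.jpg").convert("RGB").resize((512, 512))
# each prompt shifts the domain while img2img keeps the semantic content
for domain in ["a watercolor painting of", "a foggy photo of", "a sketch of"]:
    edited = pipe(prompt=f"{domain} a dog", image=image,
                  strength=0.5, guidance_scale=7.5).images[0]
    edited.save(f"corrupted_{domain.split()[1]}.png")
```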

DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance

  • Authors: Longwen Zhang, Qiwei Qiu, Hongyang Lin, Qixuan Zhang, Cheng Shi, Wei Yang, Ye Shi, Sibei Yang, Lan Xu, Jingyi Yu
  • Subjects: Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.03117
  • Pdf link: https://arxiv.org/pdf/2304.03117
  • Abstract
    Emerging Metaverse applications demand accessible, accurate, and easy-to-use tools for 3D digital human creation in order to depict different cultures and societies as if in the physical world. Recent large-scale vision-language advances pave the way for novices to conveniently customize 3D content. However, the generated CG-friendly assets still cannot represent the desired facial traits for human characteristics. In this paper, we present DreamFace, a progressive scheme to generate personalized 3D faces under text guidance. It enables lay users to naturally customize 3D facial assets that are compatible with CG pipelines, with desired shapes, textures, and fine-grained animation capabilities. From a text input describing the facial traits, we first introduce a coarse-to-fine scheme to generate the neutral facial geometry with a unified topology. We employ a selection strategy in the CLIP embedding space, and subsequently optimize both the detailed displacements and normals using Score Distillation Sampling from a generic Latent Diffusion Model. Then, for neutral appearance generation, we introduce a dual-path mechanism, which combines the generic LDM with a novel texture LDM to ensure both diversity and textural specification in the UV space. We also employ a two-stage optimization to perform SDS in both the latent and image spaces, which provides compact priors for fine-grained synthesis. Our generated neutral assets naturally support blendshape-based facial animations. We further improve the animation ability with personalized deformation characteristics by learning the universal expression prior using the cross-identity hypernetwork. Notably, DreamFace can generate realistic 3D facial assets with physically-based rendering quality and rich animation ability from video footage, even for fashion icons or exotic characters in cartoons and fiction movies.

Zero-shot Generative Model Adaptation via Image-specific Prompt Learning

  • Authors: Jiayi Guo, Chaofei Wang, You Wu, Eric Zhang, Kai Wang, Xingqian Xu, Shiji Song, Humphrey Shi, Gao Huang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03119
  • Pdf link: https://arxiv.org/pdf/2304.03119
  • Abstract
    Recently, CLIP-guided image synthesis has shown appealing performance on adapting a pre-trained source-domain generator to an unseen target domain. It does not require any target-domain samples but only the textual domain labels. The training is highly efficient, e.g., a few minutes. However, existing methods still have some limitations in the quality of generated images and may suffer from the mode collapse issue. A key reason is that a fixed adaptation direction is applied for all cross-domain image pairs, which leads to identical supervision signals. To address this issue, we propose an Image-specific Prompt Learning (IPL) method, which learns specific prompt vectors for each source-domain image. This produces a more precise adaptation direction for every cross-domain image pair, endowing the target-domain generator with greatly enhanced flexibility. Qualitative and quantitative evaluations on various domains demonstrate that IPL effectively improves the quality and diversity of synthesized images and alleviates the mode collapse. Moreover, IPL is independent of the structure of the generative model, such as generative adversarial networks or diffusion models. Code is available at https://github.com/Picsart-AI-Research/IPL-Zero-Shot-Generative-Model-Adaptation.

SketchFFusion: Sketch-guided image editing with diffusion model

  • Authors: Weihang Mao, Bo Han, Zihao Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03174
  • Pdf link: https://arxiv.org/pdf/2304.03174
  • Abstract
    Sketch-guided image editing aims to achieve local fine-tuning of the image based on the sketch information provided by the user, while maintaining the original status of the unedited areas. Due to the high cost of acquiring human sketches, previous works mostly relied on edge maps as a substitute for sketches, but sketches possess richer structural information. In this paper, we propose a sketch generation scheme that can preserve the main contours of an image and closely adhere to the actual sketch style drawn by the user. Simultaneously, current image editing methods often face challenges such as image distortion, training cost, and loss of fine details in the sketch. To address these limitations, we propose a conditional diffusion model (SketchFFusion) based on the sketch structure vector. We evaluate the generative performance of our model and demonstrate that it outperforms existing methods.

Face Animation with an Attribute-Guided Diffusion Model

  • Authors: Bohan Zeng, Xuhui Liu, Sicheng Gao, Boyu Liu, Hong Li, Jianzhuang Liu, Baochang Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03199
  • Pdf link: https://arxiv.org/pdf/2304.03199
  • Abstract
    Face animation has achieved much progress in computer vision. However, prevailing GAN-based methods suffer from unnatural distortions and artifacts due to sophisticated motion deformation. In this paper, we propose a Face Animation framework with an attribute-guided Diffusion Model (FADM), which is the first work to exploit the superior modeling capacity of diffusion models for photo-realistic talking-head generation. To mitigate the uncontrollable synthesis effect of the diffusion model, we design an Attribute-Guided Conditioning Network (AGCN) to adaptively combine the coarse animation features and 3D face reconstruction results, which can incorporate appearance and motion conditions into the diffusion process. These specific designs help FADM rectify unnatural artifacts and distortions, and also enrich high-fidelity facial details through iterative diffusion refinements with accurate animation attributes. FADM can flexibly and effectively improve existing animation videos. Extensive experiments on widely used talking-head benchmarks validate the effectiveness of FADM over prior arts.

Inst-Inpaint: Instructing to Remove Objects with Diffusion Models

  • Authors: Ahmet Burak Yildirim, Vedat Baday, Erkut Erdem, Aykut Erdem, Aysegul Dundar
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03246
  • Pdf link: https://arxiv.org/pdf/2304.03246
  • Abstract
    The image inpainting task refers to erasing unwanted pixels from images and filling them in a semantically consistent and realistic way. Traditionally, the pixels to be erased are defined with binary masks. From the application point of view, a user needs to generate the masks for the objects they would like to remove, which can be time-consuming and prone to errors. In this work, we are interested in an image inpainting algorithm that estimates which object should be removed based on natural language input and removes it simultaneously. For this purpose, we first construct a dataset named GQA-Inpaint for this task, which will be released soon. Second, we present a novel inpainting framework, Inst-Inpaint, that can remove objects from images based on the instructions given as text prompts. We set various GAN and diffusion-based baselines and run experiments on synthetic and real image datasets. We compare methods with different evaluation metrics that measure the quality and accuracy of the models and show significant quantitative and qualitative improvements.

Diffusion Models as Masked Autoencoders

  • Authors: Chen Wei, Karttikeya Mangalam, Po-Yao Huang, Yanghao Li, Haoqi Fan, Hu Xu, Huiyu Wang, Cihang Xie, Alan Yuille, Christoph Feichtenhofer
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03283
  • Pdf link: https://arxiv.org/pdf/2304.03283
  • Abstract
    There has been a longstanding belief that generation can facilitate a true understanding of visual data. In line with this, we revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models. While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE). Our approach is capable of (i) serving as a strong initialization for downstream recognition tasks, (ii) conducting high-quality image inpainting, and (iii) being effortlessly extended to video where it produces state-of-the-art classification accuracy. We further perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.

Keyword: dynamic

Abstraction-based Probabilistic Stability Analysis of Polyhedral Probabilistic Hybrid Systems

  • Authors: Spandan Das, Pavithra Prabhakar
  • Subjects: Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02647
  • Pdf link: https://arxiv.org/pdf/2304.02647
  • Abstract
    In this paper, we consider the problem of probabilistic stability analysis of a subclass of Stochastic Hybrid Systems, namely, Polyhedral Probabilistic Hybrid Systems (PPHS), where the flow dynamics is given by a polyhedral inclusion, the discrete switching between modes happens probabilistically at the boundaries of their invariant regions, and the continuous state is not reset during switching. We present an abstraction-based analysis framework that consists of constructing a finite Markov Decision Process (MDP) such that verification of a certain property on the finite MDP ensures the satisfaction of probabilistic stability on the PPHS. Further, we present a polynomial-time algorithm for verifying the corresponding property on the MDP. Our experimental analysis demonstrates the feasibility of the approach in successfully verifying probabilistic stability on PPHS of various dimensions and sizes.

Emergent Coordination through Game-Induced Nonlinear Opinion Dynamics

  • Authors: Haimin Hu, Kensuke Nakamura, Kai-Chieh Hsu, Naomi Ehrich Leonard, Jaime Fernández Fisac
  • Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02687
  • Pdf link: https://arxiv.org/pdf/2304.02687
  • Abstract
    We present a multi-agent decision-making framework for the emergent coordination of autonomous agents whose intents are initially undecided. Dynamic non-cooperative games have been used to encode multi-agent interaction, but ambiguity arising from factors such as goal preference or the presence of multiple equilibria may lead to coordination issues, ranging from the "freezing robot" problem to unsafe behavior in safety-critical events. The recently developed nonlinear opinion dynamics (NOD) provide guarantees for breaking deadlocks. However, choosing the appropriate model parameters automatically in general multi-agent settings remains a challenge. In this paper, we first propose a novel and principled procedure for synthesizing NOD based on the value functions of dynamic games conditioned on agents' intents. In particular, we provide for the two-player two-option case precise stability conditions for equilibria of the game-induced NOD based on the mismatch between agents' opinions and their game values. We then propose a trajectory optimization algorithm that computes agents' policies guided by the evolution of opinions. The efficacy of our method is illustrated with a simulated toll station coordination example.

Going Further: Flatness at the Rescue of Early Stopping for Adversarial Example Transferability

  • Authors: Martin Gubri, Maxime Cordy, Yves Le Traon
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.02688
  • Pdf link: https://arxiv.org/pdf/2304.02688
  • Abstract
    Transferability is the property of adversarial examples to be misclassified by other models than the surrogate model for which they were crafted. Previous research has shown that transferability is substantially increased when the training of the surrogate model has been early stopped. A common hypothesis to explain this is that the later training epochs are when models learn the non-robust features that adversarial attacks exploit. Hence, an early stopped model is more robust (hence, a better surrogate) than fully trained models. We demonstrate that the reasons why early stopping improves transferability lie in the side effects it has on the learning dynamics of the model. We first show that early stopping benefits transferability even on models learning from data with non-robust features. We then establish links between transferability and the exploration of the loss landscape in the parameter space, on which early stopping has an inherent effect. More precisely, we observe that transferability peaks when the learning rate decays, which is also the time at which the sharpness of the loss significantly drops. This leads us to propose RFN, a new approach for transferability that minimizes loss sharpness during training in order to maximize transferability. We show that by searching for large flat neighborhoods, RFN always improves over early stopping (by up to 47 points of transferability rate) and is competitive to (if not better than) strong state-of-the-art baselines.
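
The abstract does not spell out RFN's update rule, but the "search for large flat neighborhoods" idea can be illustrated with a SAM-style sharpness-aware step, sketched below under that assumption; RFN's exact procedure may differ.

```python
import torch

def sharpness_aware_step(model, loss_fn, x, y, opt, rho=0.05):
    """One SAM-style update: climb to the locally worst weights inside an
    L2 ball of radius rho, take the gradient there, then descend from the
    original weights. This rewards flat neighborhoods of the loss."""
    loss_fn(model(x), y).backward()
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    eps = [rho * p.grad / (grad_norm + 1e-12) for p in params]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)                        # climb to the worst nearby point
    model.zero_grad()
    loss_fn(model(x), y).backward()          # gradient at the perturbed point
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)                        # restore the original weights
    opt.step()                               # descend with the flat-seeking gradient
    opt.zero_grad()

model = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.ReLU(), torch.nn.Linear(10, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
sharpness_aware_step(model, torch.nn.functional.cross_entropy, x, y, opt)
```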

ACTION++: Improving Semi-supervised Medical Image Segmentation with Adaptive Anatomical Contrast

  • Authors: Chenyu You, Weicheng Dai, Yifei Min, Lawrence Staib, Jas Sekhon, James S. Duncan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.02689
  • Pdf link: https://arxiv.org/pdf/2304.02689
  • Abstract
    Medical data often exhibits long-tail distributions with heavy class imbalance, which naturally leads to difficulty in classifying the minority classes (i.e., boundary regions or rare objects). Recent work has significantly improved semi-supervised medical image segmentation in long-tailed scenarios by equipping them with unsupervised contrastive criteria. However, it remains unclear how well they will perform in the labeled portion of data where class distribution is also highly imbalanced. In this work, we present ACTION++, an improved contrastive learning framework with adaptive anatomical contrast for semi-supervised medical segmentation. Specifically, we propose an adaptive supervised contrastive loss, where we first compute the optimal locations of class centers uniformly distributed on the embedding space (i.e., off-line), and then perform online contrastive matching training by encouraging different class features to adaptively match these distinct and uniformly distributed class centers. Moreover, we argue that blindly adopting a constant temperature $\tau$ in the contrastive loss on long-tailed medical data is not optimal, and propose to use a dynamic $\tau$ via a simple cosine schedule to yield better separation between majority and minority classes. Empirically, we evaluate ACTION++ on ACDC and LA benchmarks and show that it achieves state-of-the-art across two semi-supervised settings. Theoretically, we analyze the performance of adaptive anatomical contrast and confirm its superiority in label efficiency.
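
A cosine temperature schedule of the kind the abstract describes is a one-liner; the endpoint values below, and the annealing direction, are illustrative choices rather than the paper's settings.

```python
import math

def cosine_tau(step, total_steps, tau_min=0.07, tau_max=0.2):
    # anneal the contrastive temperature from tau_max down to tau_min
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return tau_min + (tau_max - tau_min) * cos

print(cosine_tau(0, 1000), cosine_tau(500, 1000), cosine_tau(1000, 1000))
# 0.2 (start) -> 0.135 (midway) -> 0.07 (end)
```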

Recovering Continuous Scene Dynamics from A Single Blurry Image with Events

  • Authors: Zhangyi Cheng, Xiang Zhang, Lei Yu, Jianzhuang Liu, Wen Yang, Gui-Song Xia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02695
  • Pdf link: https://arxiv.org/pdf/2304.02695
  • Abstract
    This paper aims at demystifying a single motion-blurred image with events and revealing temporally continuous scene dynamics encrypted behind motion blurs. To this end, an Implicit Video Function (IVF) is learned to represent a single motion blurred image with concurrent events, enabling the latent sharp image restoration of arbitrary timestamps in the range of imaging exposures. Specifically, a dual attention transformer is proposed to efficiently leverage merits from both modalities, i.e., the high temporal resolution of event features and the smoothness of image features, alleviating temporal ambiguities while suppressing the event noise. The proposed network is trained only with the supervision of ground-truth images of limited referenced timestamps. Motion- and texture-guided supervisions are employed simultaneously to enhance restorations of the non-referenced timestamps and improve the overall sharpness. Experiments on synthetic, semi-synthetic, and real-world datasets demonstrate that our proposed method outperforms state-of-the-art methods by a large margin in terms of both objective PSNR and SSIM measurements and subjective evaluations.

Efficient and Accurate Automatic Python Bindings with cppyy & Cling

  • Authors: Baidyanath Kundu (1 and 2), Vassil Vassilev (1 and 2), Wim Lavrijsen (3) ((1) European Council for Nuclear Research, (2) Princeton University (US), (3) LBNL (US))
  • Subjects: Programming Languages (cs.PL)
  • Arxiv link: https://arxiv.org/abs/2304.02712
  • Pdf link: https://arxiv.org/pdf/2304.02712
  • Abstract
    The simplicity of Python and the power of C++ force stark choices on a scientific software stack. There have been multiple developments to mitigate language boundaries by implementing language bindings, but the impedance mismatch between the static nature of C++ and the dynamic one of Python hinders their implementation; examples include the use of user-defined Python types with templated C++ and advanced memory management. The development of the C++ interpreter Cling has changed the way we can think of language bindings as it provides an incremental compilation infrastructure available at runtime. That is, Python can interrogate C++ on demand, and bindings can be lazily constructed at runtime. This automatic binding provision requires no direct support from library authors and offers better performance than alternative solutions, such as PyBind11. ROOT pioneered this approach with PyROOT, which was later enhanced with its successor, cppyy. However, until now, cppyy relied on the reflection layer of ROOT, which is limited in terms of provided features and performance. This paper presents the next step for language interoperability with cppyy, enabling research into uniform cross-language execution environments and boosting optimization opportunities across language boundaries. We illustrate the use of advanced C++ in Numba-accelerated Python through cppyy. We outline a path forward for re-engineering parts of cppyy to use upstream LLVM components to improve performance and sustainability. We demonstrate cppyy purely based on a C++ reflection library, InterOp, which offers interoperability primitives based on Cling and Clang-Repl.
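
A small usage sketch of the runtime-binding workflow the paper describes: C++ is declared at runtime, Cling JIT-compiles it, and cppyy binds it lazily, including on-demand template instantiation. The example function itself is made up for illustration.

```python
import cppyy

# Declare C++ on the fly; no wrapper-generation step is needed.
cppyy.cppdef("""
#include <vector>
template<typename T>
T clamped_sum(const std::vector<T>& v, T lo, T hi) {
    T s = 0;
    for (auto x : v) s += x;
    return s < lo ? lo : (s > hi ? hi : s);
}
""")

from cppyy.gbl import std, clamped_sum
v = std.vector['double']([1.5, 2.5, 10.0])
print(clamped_sum['double'](v, 0.0, 5.0))   # template instantiated at call time -> 5.0
```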

Software and Analysis for Dynamic Voronoi Diagrams in the Hilbert Metric

  • Authors: Madeline Bumpus, Caesar Dai, Auguste H. Gezalyan, Sam Munoz, Renita Santhoshkumar, Songyu Ye, David M. Mount
  • Subjects: Computational Geometry (cs.CG)
  • Arxiv link: https://arxiv.org/abs/2304.02745
  • Pdf link: https://arxiv.org/pdf/2304.02745
  • Abstract
    The Hilbert metric is a projective metric defined on a convex body which generalizes the Cayley-Klein model of hyperbolic geometry to any convex set. In this paper we analyze Hilbert Voronoi diagrams in the dynamic setting. In addition, we introduce dynamic visualization software for Voronoi diagrams in the Hilbert metric on user-specified convex polygons.

Adaptive Headway Motion Control and Motion Prediction for Safe Unicycle Motion Design

  • Authors: Aykut İşleyen, Nathan van de Wouw, Ömür Arslan
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02760
  • Pdf link: https://arxiv.org/pdf/2304.02760
  • Abstract
    Differential drive robots that can be modeled as a kinematic unicycle are a standard mobile base platform for many service and logistics robots. Safe and smooth autonomous motion around obstacles is a crucial skill for unicycle robots to perform diverse tasks in complex environments. A classical control approach for unicycle control is feedback linearization using a headway point at a fixed headway distance in front of the unicycle. The unicycle headway control brings the headway point to a desired goal location by embedding a linear headway reference dynamics, which often results in an undesired offset for the actual unicycle position. In this paper, we introduce a new unicycle headway control approach with an adaptive headway distance that overcomes this limitation, i.e., when the headway point reaches the goal the unicycle position is also at the goal. By systematically analyzing the closed-loop unicycle motion under the adaptive headway controller, we design analytical feedback motion prediction methods that bound the closed-loop unicycle position trajectory and so can be effectively used for safety assessment and safe unicycle motion design around obstacles. We present an application of adaptive headway motion control and motion prediction for safe unicycle path following around obstacles in numerical simulations.
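
For reference, the classical fixed-headway feedback linearisation that the paper's adaptive scheme improves on can be sketched in a few lines; the gains, headway distance, and Euler rollout below are illustrative. Note the residual offset of roughly the headway distance at the goal, which is exactly the limitation an adaptive headway distance removes.

```python
import numpy as np

def headway_control(state, goal, eps=0.3, k=1.0):
    """Drive the headway point, a distance eps ahead of the unicycle,
    toward the goal with linear dynamics hdot = -k (h - goal)."""
    x, y, th = state
    h = np.array([x + eps * np.cos(th), y + eps * np.sin(th)])
    hdot = -k * (h - goal)                       # desired headway-point velocity
    A = np.array([[np.cos(th), -eps * np.sin(th)],
                  [np.sin(th),  eps * np.cos(th)]])
    v, w = np.linalg.solve(A, hdot)              # invert the input map
    return v, w

state, goal, dt = np.array([0.0, 0.0, 0.0]), np.array([2.0, 1.0]), 0.01
for _ in range(2000):                            # forward-Euler rollout
    v, w = headway_control(state, goal)
    state += dt * np.array([v * np.cos(state[2]), v * np.sin(state[2]), w])
print(state[:2])   # near the goal, but offset by roughly eps
```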

A Robust Observer with Gyroscopic Bias Correction for Rotational Dynamics

  • Authors: Erjen Lefeber, Marcus Greiff, Anders Robertsson
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02763
  • Pdf link: https://arxiv.org/pdf/2304.02763
  • Abstract
    We propose an observer for rotational dynamics subject to directional and gyroscopic measurements, which simultaneously estimates the gyroscopic biases and attitude rates. We show uniform almost global asymptotic and local exponential stability of the resulting error dynamics, implying robustness against bounded disturbances. This robustness is quantified with respect to a popular nonlinear complementary filter in quantitative simulation studies, and we explore how the measurement noise propagates to the asymptotic errors as a function of tuning. This is an extended version of a paper with the same title (to appear at IFAC WC 2023). Additional mathematical details are provided in this extended version.

MoStGAN-V: Video Generation with Temporal Motion Styles

  • Authors: Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02777
  • Pdf link: https://arxiv.org/pdf/2304.02777
  • Abstract
    Video generation remains a challenging task due to spatiotemporal complexity and the requirement of synthesizing diverse motions with temporal consistency. Previous works attempt to generate videos of arbitrary lengths either in an autoregressive manner or by regarding time as a continuous signal. However, they struggle to synthesize detailed and diverse motions with temporal coherence and tend to generate repetitive scenes after a few time steps. In this work, we argue that a single time-agnostic latent vector of a style-based generator is insufficient to model various and temporally-consistent motions. Hence, we introduce additional time-dependent motion styles to model diverse motion patterns. In addition, a Motion Style Attention modulation mechanism, dubbed MoStAtt, is proposed to augment frames with vivid dynamics for each specific scale (i.e., layer), which assigns an attention score to each motion style w.r.t. deconvolution filter weights in the target synthesis layer and softly attends different motion styles for weight modulation. Experimental results show our model achieves state-of-the-art performance on four unconditional $256^2$ video synthesis benchmarks trained with only 3 frames per clip and produces better qualitative results with respect to dynamic motions. Code and videos have been made available at https://github.com/xiaoqian-shen/MoStGAN-V.

Enhanced Grid Following Inverter: A Uniform Control Design Framework

  • Authors: Alireza Askarian, Jaesang Park, Srinivasa Salapaka
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02792
  • Pdf link: https://arxiv.org/pdf/2304.02792
  • Abstract
    This article presents a novel grid following (GFL) inverter control design framework that exploits the line dynamics structure in $dq$ frame and treats the inverter as an actuator. The proposed framework imposes a structure on the line's coupled dynamics and captures the effect of coupling on the GFL inverter's closed-loop stability and performance. One of the main features of our work is using the bode sensitivity integral to characterize the fundamental limitations of control design. These constraints translate into fundamental trade-offs between performance objectives such as reference tracking, closed-loop bandwidth, robust synchronization, and resilience to grid anomalies. The article develops design considerations to ensure specific trade-offs. We assess the performance of our proposed framework through simulation and experimental results.

Unveiling the Dynamics of Censorship, COVID-19 Regulations, and Protest: An Empirical Study of Chinese Subreddit r/china_irl

  • Authors: Siyi Zhou, Luca Luceri, Emilio Ferrara
  • Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2304.02800
  • Pdf link: https://arxiv.org/pdf/2304.02800
  • Abstract
    The COVID-19 pandemic has intensified numerous social issues that warrant academic investigation. Although information dissemination has been extensively studied, the silenced voices and censored content also merit attention due to their role in mobilizing social movements. In this paper, we provide empirical evidence to explore the relationships among COVID-19 regulations, censorship, and protest through a series of social incidents that occurred in China during 2022. We analyze the similarities and differences between censored articles and discussions on r/china_irl, the most popular Chinese-speaking subreddit, and scrutinize the temporal dynamics of government censorship activities and their impact on user engagement within the subreddit. Furthermore, we examine users' linguistic patterns under the influence of a censorship-driven environment. Our findings reveal patterns in topic recurrence, the complex interplay between censorship activities, user subscription, and collective commenting behavior, as well as potential linguistic adaptation strategies to circumvent censorship. These insights hold significant implications for researchers interested in understanding the survival mechanisms of marginalized groups within censored information ecosystems.

Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit Diversity Modeling

  • Authors: Haotao Wang, Ziyu Jiang, Yan Han, Zhangyang Wang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02806
  • Pdf link: https://arxiv.org/pdf/2304.02806
  • Abstract
    Graph neural networks (GNNs) have been widely applied to learning over graph data. Yet, real-world graphs commonly exhibit diverse graph structures and contain heterogeneous nodes and edges. Moreover, to enhance the generalization ability of GNNs, it has become common practice to further increase the diversity of training graph structures by incorporating graph augmentations and/or performing large-scale pre-training on more graphs. Therefore, it becomes essential for a GNN to simultaneously model diverse graph structures. Yet, naively increasing the GNN model capacity will suffer from both higher inference costs and the notorious trainability issue of GNNs. This paper introduces the Mixture-of-Expert (MoE) idea to GNNs, aiming to enhance their ability to accommodate the diversity of training graph structures, without incurring computational overheads. Our new Graph Mixture of Expert (GMoE) model enables each node in the graph to dynamically select its own optimal *information aggregation experts*. These experts are trained to model different subgroups of graph structures in the training set. Additionally, GMoE includes information aggregation experts with varying aggregation hop sizes, where the experts with larger hop sizes are specialized in capturing information over longer ranges. The effectiveness of GMoE is verified through experimental results on a large variety of graph, node, and link prediction tasks in the OGB benchmark. For instance, it enhances ROC-AUC by 1.81% in ogbg-molhiv and by 1.40% in ogbg-molbbbp, as compared to the non-MoE baselines. Our code is available at https://github.com/VITA-Group/Graph-Mixture-of-Experts.
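
A minimal sketch of per-node expert gating follows; plain linear maps stand in for the paper's information aggregation experts with different hop sizes, so this shows only the routing mechanics, not a full GNN.

```python
import torch
import torch.nn as nn

class NodeMoE(nn.Module):
    """Each node's features produce softmax gating weights over several
    experts; the output is the gate-weighted mixture of expert outputs."""
    def __init__(self, dim, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))

    def forward(self, h):                                   # h: (num_nodes, dim)
        w = torch.softmax(self.gate(h), dim=-1)             # (N, E)
        out = torch.stack([e(h) for e in self.experts], 1)  # (N, E, dim)
        return (w.unsqueeze(-1) * out).sum(1)               # (N, dim)

h = torch.randn(100, 32)
print(NodeMoE(32)(h).shape)    # torch.Size([100, 32])
```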

Causal Repair of Learning-enabled Cyber-physical Systems

  • Authors: Pengyuan Lu, Ivan Ruchkin, Matthew Cleaveland, Oleg Sokolsky, Insup Lee
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.02813
  • Pdf link: https://arxiv.org/pdf/2304.02813
  • Abstract
    Models of actual causality leverage domain knowledge to generate convincing diagnoses of events that caused an outcome. It is promising to apply these models to diagnose and repair run-time property violations in cyber-physical systems (CPS) with learning-enabled components (LEC). However, given the high diversity and complexity of LECs, it is challenging to encode domain knowledge (e.g., the CPS dynamics) in a scalable actual causality model that could generate useful repair suggestions. In this paper, we focus causal diagnosis on the input/output behaviors of LECs. Specifically, we aim to identify which subset of I/O behaviors of the LEC is an actual cause for a property violation. An important by-product is a counterfactual version of the LEC that repairs the run-time property by fixing the identified problematic behaviors. Based on these insights, we design a two-step diagnostic pipeline: (1) construct a Halpern-Pearl causality model that reflects the dependency of the property outcome on the component's I/O behaviors, and (2) perform a search for an actual cause and a corresponding repair on the model. We prove that our pipeline has the following guarantee: if an actual cause is found, the system is guaranteed to be repaired; otherwise, we have high probabilistic confidence that the LEC under analysis did not cause the property violation. We demonstrate that our approach successfully repairs learned controllers on a standard OpenAI Gym benchmark.

NTK-SAP: Improving neural network pruning by aligning training dynamics

  • Authors: Yite Wang, Dawei Li, Ruoyu Sun
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.02840
  • Pdf link: https://arxiv.org/pdf/2304.02840
  • Abstract
    Pruning neural networks before training has received increasing interest due to its potential to reduce training time and memory. One popular method is to prune the connections based on a certain metric, but it is not entirely clear what metric is the best choice. Recent advances in neural tangent kernel (NTK) theory suggest that the training dynamics of sufficiently large neural networks are closely related to the spectrum of the NTK. Motivated by this finding, we propose to prune the connections that have the least influence on the spectrum of the NTK. This method can help maintain the NTK spectrum, which may help align the training dynamics to those of the dense counterpart. However, one possible issue is that the fixed-weight-NTK corresponding to a given initial point can be very different from the NTK corresponding to later iterates during the training phase. We further propose to sample multiple realizations of random weights to estimate the NTK spectrum. Note that our approach is weight-agnostic, which is different from most existing methods that are weight-dependent. In addition, we use random inputs to compute the fixed-weight-NTK, making our method data-agnostic as well. We name our foresight pruning algorithm Neural Tangent Kernel Spectrum-Aware Pruning (NTK-SAP). Empirically, our method achieves better performance than all baselines on multiple datasets.

Design and Control of a Ballbot Drivetrain with High Agility, Minimal Footprint, and High Payload

  • Authors: Chenzhang Xiao, Mahshid Mansouri, David Lam, Joao Ramos, Elizabeth T. Hsiao-Wecksler
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02887
  • Pdf link: https://arxiv.org/pdf/2304.02887
  • Abstract
    This paper presents the design and control of a ballbot drivetrain that aims to achieve high agility, minimal footprint, and high payload capacity while maintaining dynamic stability. Two hardware platforms and analytical models were developed to test design and control methodologies. The full-scale ballbot prototype (MiaPURE) was constructed using off-the-shelf components and designed to have agility, footprint, and balance similar to that of a walking human. The planar inverted pendulum testbed (PIPTB) was developed as a reduced-order testbed for quick validation of system performance. We then proposed a simple yet robust LQR-PI controller to balance and maneuver the ballbot drivetrain with a heavy payload. This is crucial because the drivetrain is often subject to high stiction due to elastomeric components in the torque transmission system. This controller was first tested in the PIPTB to compare with traditional LQR and cascaded PI-PD controllers, and then implemented in the ballbot drivetrain. The MiaPURE drivetrain was able to carry a payload of 60 kg, achieve a maximum speed of 2.3 m/s, and come to a stop from a speed of 1.4 m/s in 2 seconds in a selected translation direction. Finally, we demonstrated the omnidirectional movement of the ballbot drivetrain in an indoor environment as a payload-carrying robot and a human-riding mobility device. Our experiments demonstrated the feasibility of using the ballbot drivetrain as a universal mobility platform with agile movements, minimal footprint, and high payload capacity using our proposed design and control methodologies.
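
As a sketch of the LQR-PI idea (LQR on a model augmented with an output-error integrator, which helps reject stiction-like input disturbances), the snippet below uses SciPy's continuous-time Riccati solver on a toy plant; the matrices are placeholders, not the ballbot's identified dynamics.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [2.0, 0.0]])   # toy unstable second-order plant x' = Ax + Bu
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])               # tracked output

# augment the state with the integral of the output error: z' = C x - r
Aa = np.block([[A, np.zeros((2, 1))], [C, np.zeros((1, 1))]])
Ba = np.vstack([B, np.zeros((1, 1))])
Q = np.diag([10.0, 1.0, 5.0])            # last entry weights the integral state
R = np.array([[1.0]])

P = solve_continuous_are(Aa, Ba, Q, R)
K = np.linalg.solve(R, Ba.T @ P)         # control law: u = -K [x; integral_error]
print(K)
```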

LSketch: A Label-Enabled Graph Stream Sketch Toward Time-Sensitive Queries

  • Authors: Yiling Zeng, Chunyao Song, Yuhan Li, Tingjian Ge
  • Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.02897
  • Pdf link: https://arxiv.org/pdf/2304.02897
  • Abstract
    Graph streams represent data interactions in real applications. The mining of graph streams plays an important role in network security, social network analysis, and traffic control, among others. However, the sheer volume and high dynamics cause great challenges for efficient storage and subsequent query analysis on them. Current studies apply sketches to summarize graph streams. We propose LSketch, which works for heterogeneous graph streams and effectively preserves the label information carried by the streams in real scenes, thereby enriching the expressive ability of sketches. In addition, as graph streams continue to evolve over time, edges too old may lose their practical significance. Therefore, we introduce the sliding window model into LSketch to eliminate the expired edges automatically. LSketch uses sub-linear storage space and can support structure based queries and time-sensitive queries with high accuracy. We perform extensive experiments over four real datasets, demonstrating the superiority of the proposed method over state-of-the-art methods, in terms of query accuracy and time efficiency.
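
A toy version of a label-enabled, sliding-window graph-stream summary is sketched below; real LSketch compresses edges into sub-linear hashed matrices, so this dictionary-backed version only illustrates the interface (label-aware updates, automatic expiry of old edges, weight queries). All names are made up for illustration.

```python
import time
from collections import defaultdict

class LabelSlidingSketch:
    def __init__(self, window_seconds, num_buckets=1024):
        self.window = window_seconds
        self.buckets = num_buckets
        self.cells = defaultdict(list)   # (h(src), h(dst), label) -> [(t, w)]

    def _h(self, node):
        return hash(node) % self.buckets   # hashed node compression

    def add_edge(self, src, dst, label, weight=1.0, t=None):
        t = time.time() if t is None else t
        self.cells[(self._h(src), self._h(dst), label)].append((t, weight))

    def edge_weight(self, src, dst, label, now=None):
        now = time.time() if now is None else now
        key = (self._h(src), self._h(dst), label)
        # drop edges that have slid out of the window, then aggregate
        self.cells[key] = [(t, w) for t, w in self.cells[key]
                           if now - t <= self.window]
        return sum(w for _, w in self.cells[key])

sk = LabelSlidingSketch(window_seconds=60)
sk.add_edge("alice", "bob", label="follows")
print(sk.edge_weight("alice", "bob", "follows"))   # 1.0 while inside the window
```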

Quantifying and Defending against Privacy Threats on Federated Knowledge Graph Embedding

  • Authors: Yuke Hu, Wei Liang, Ruofan Wu, Kai Xiao, Weiqiang Wang, Xiaochen Li, Jinfei Liu, Zhan Qin
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.02932
  • Pdf link: https://arxiv.org/pdf/2304.02932
  • Abstract
    Knowledge Graph Embedding (KGE) is a fundamental technique that extracts expressive representation from knowledge graph (KG) to facilitate diverse downstream tasks. The emerging federated KGE (FKGE) collaboratively trains from distributed KGs held among clients while avoiding exchanging clients' sensitive raw KGs, which can still suffer from privacy threats as evidenced in other federated model trainings (e.g., neural networks). However, quantifying and defending against such privacy threats remain unexplored for FKGE which possesses unique properties not shared by previously studied models. In this paper, we conduct the first holistic study of the privacy threat on FKGE from both attack and defense perspectives. For the attack, we quantify the privacy threat by proposing three new inference attacks, which reveal substantial privacy risk by successfully inferring the existence of the KG triple from victim clients. For the defense, we propose DP-Flames, a novel differentially private FKGE with private selection, which offers a better privacy-utility tradeoff by exploiting the entity-binding sparse gradient property of FKGE and comes with a tight privacy accountant by incorporating the state-of-the-art private selection technique. We further propose an adaptive privacy budget allocation policy to dynamically adjust defense magnitude across the training procedure. Comprehensive evaluations demonstrate that the proposed defense can successfully mitigate the privacy threat by effectively reducing the success rate of inference attacks from 83.1% to 59.4% on average with only a modest utility decrease.

Adaptable and Interpretable Framework for Novelty Detection in Real-Time IoT Systems

  • Authors: Marek Wadinger, Michal Kvasnica
  • Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.02947
  • Pdf link: https://arxiv.org/pdf/2304.02947
  • Abstract
    This paper presents the Real-time Adaptive and Interpretable Detection (RAID) algorithm. The novel approach addresses the limitations of state-of-the-art anomaly detection methods for multivariate dynamic processes, which are restricted to detecting anomalies within the scope of the model training conditions. The RAID algorithm adapts to non-stationary effects such as data drift and change points that may not be accounted for during model development, resulting in a prolonged service life. A dynamic model based on a joint probability distribution handles anomalous behavior detection in a system and the root cause isolation based on adaptive process limits. The RAID algorithm does not require changes to existing process automation infrastructures, making it highly deployable across different domains. Two case studies involving real dynamic system data demonstrate the benefits of the RAID algorithm, including change point adaptation, root cause isolation, and improved detection accuracy.
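
The adaptive-process-limit ingredient can be illustrated with a running mean/variance (Welford's algorithm) and mean ± k·std limits, sketched below; RAID's actual model is a joint probability distribution with root-cause isolation, which this toy omits.

```python
class AdaptiveLimits:
    """Self-adapting process limits: flag points outside mean +/- k*std,
    with the statistics updated online so drift is absorbed over time."""
    def __init__(self, k=3.0):
        self.n, self.mean, self.m2, self.k = 0, 0.0, 0.0, k

    def update(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)     # Welford's running variance

    def is_anomaly(self, x):
        if self.n < 10:                    # warm-up before trusting the limits
            return False
        std = (self.m2 / (self.n - 1)) ** 0.5
        return abs(x - self.mean) > self.k * std

det = AdaptiveLimits()
for x in [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0, 1.05, 5.0]:
    flag = det.is_anomaly(x)
    det.update(x)                          # score first, then adapt
print(flag)                                # True for the final outlier
```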

FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead

  • Authors: Kang Chen, Tao Han, Junchao Gong, Lei Bai, Fenghua Ling, Jing-Jia Luo, Xi Chen, Leiming Ma, Tianning Zhang, Rui Su, Yuanzheng Ci, Bin Li, Xiaokang Yang, Wanli Ouyang
  • Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
  • Arxiv link: https://arxiv.org/abs/2304.02948
  • Pdf link: https://arxiv.org/pdf/2304.02948
  • Abstract
    We present FengWu, an advanced data-driven global medium-range weather forecast system based on Artificial Intelligence (AI). Different from existing data-driven weather forecast methods, FengWu solves the medium-range forecast problem from a multi-modal and multi-task perspective. Specifically, a deep learning architecture equipped with model-specific encoder-decoders and cross-modal fusion Transformer is elaborately designed, which is learned under the supervision of an uncertainty loss to balance the optimization of different predictors in a region-adaptive manner. Besides this, a replay buffer mechanism is introduced to improve medium-range forecast performance. With 39-year data training based on the ERA5 reanalysis, FengWu is able to accurately reproduce the atmospheric dynamics and predict the future land and atmosphere states at 37 vertical levels on a 0.25° latitude-longitude resolution. Hindcasts of 6-hourly weather in 2018 based on ERA5 demonstrate that FengWu performs better than GraphCast in predicting 80% of the 880 reported predictands, e.g., reducing the root mean square error (RMSE) of 10-day lead global z500 prediction from 733 to 651 $m^{2}/s^2$. In addition, the inference cost of each iteration is merely 600ms on NVIDIA Tesla A100 hardware. The results suggest that FengWu can significantly improve the forecast skill and extend the skillful global medium-range weather forecast out to 10.75 days lead (with ACC of z500 > 0.6) for the first time.

Deep Long-Short Term Memory networks: Stability properties and Experimental validation

  • Authors: Fabio Bonassi, Alessio La Bella, Giulio Panzani, Marcello Farina, Riccardo Scattolini
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.02975
  • Pdf link: https://arxiv.org/pdf/2304.02975
  • Abstract
    The aim of this work is to investigate the use of Incrementally Input-to-State Stable ($\delta$ISS) deep Long Short Term Memory networks (LSTMs) for the identification of nonlinear dynamical systems. We show that suitable sufficient conditions on the weights of the network can be leveraged to set up a training procedure able to learn provenly-$\delta$ISS LSTM models from data. The proposed approach is tested on a real brake-by-wire apparatus to identify a model of the system from experimentally collected input-output data. Results show satisfactory modeling performance.

Distributed Model Predictive Control for Periodic Cooperation of Multi-Agent Systems

  • Authors: Matthias Köhler, Matthias A. Müller, Frank Allgöwer
  • Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.03002
  • Pdf link: https://arxiv.org/pdf/2304.03002
  • Abstract
    We consider multi-agent systems with heterogeneous, nonlinear agents subject to individual constraints that want to achieve a periodic, dynamic cooperative control goal which can be characterised by a set and a suitable cost. We propose a sequential distributed model predictive control (MPC) scheme in which agents sequentially solve an individual optimisation problem to track an artificial periodic output trajectory. The optimisation problems are coupled through these artificial periodic output trajectories, which are communicated and penalised using the cost that characterises the cooperative goal. The agents communicate only their artificial trajectories and only once per time step. We show that under suitable assumptions, the agents can incrementally move their artificial output trajectories towards the cooperative goal, and, hence, their closed-loop output trajectories asymptotically achieve it. We illustrate the scheme with a simulation example.

IoT Federated Blockchain Learning at the Edge

  • Authors: James Calo, Benny Lo
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.03006
  • Pdf link: https://arxiv.org/pdf/2304.03006
  • Abstract
    IoT devices are sorely underutilized in the medical field, especially within machine learning for medicine, yet they offer unrivaled benefits. IoT devices are low-cost, energy-efficient, small and intelligent devices. In this paper, we propose a distributed federated learning framework for IoT devices, more specifically for IoMT (Internet of Medical Things), using blockchain to allow for a decentralized scheme that improves privacy and efficiency over a centralized system; this allows us to move from the prevalent cloud-based architectures to the edge. The system is designed for three paradigms: 1) Training neural networks on IoT devices to allow for collaborative training of a shared model whilst decoupling the learning from the dataset to ensure privacy. Training is performed in an online manner simultaneously amongst all participants, allowing for training on actual data that may not have been present in a dataset collected in the traditional way, and for dynamically adapting the system whilst it is being trained. 2) Training of an IoMT system in a fully private manner to mitigate the issue of confidentiality of medical data and to build robust, and potentially bespoke, models where little, if any, data exists. 3) Distribution of the actual network training, something federated learning itself does not do, to allow hospitals, for example, to utilize their spare computing resources to train network models.

Data-driven HVAC Control Using Symbolic Regression: Design and Implementation

  • Authors: Yuki Ozawa, Dafang Zhao, Daichi Watari, Ittetsu Taniguchi, Toshihiro Suzuki, Yoshiyuki Shimoda, Takao Onoye
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.03078
  • Pdf link: https://arxiv.org/pdf/2304.03078
  • Abstract
    The large amount of data collected in buildings makes energy management smarter and more energy-efficient. This study proposes a design and implementation methodology for data-driven heating, ventilation, and air conditioning (HVAC) control. Building thermodynamics is modeled using a symbolic regression model (SRM) built from the collected data. Additionally, an HVAC system model is also developed with a data-driven approach. A model predictive control (MPC) based HVAC scheduling is formulated with the developed models to minimize energy consumption and peak power demand and maximize thermal comfort. The performance of the proposed framework is demonstrated in a workspace in an actual campus building. The HVAC system using the proposed framework reduces the peak power by 16.1% compared to the widely used thermostat controller.

Inductive Graph Unlearning

  • Authors: Cheng-Long Wang, Mengdi Huai, Di Wang
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.03093
  • Pdf link: https://arxiv.org/pdf/2304.03093
  • Abstract
    As a way to implement the "right to be forgotten" in machine learning, \textit{machine unlearning} aims to completely remove the contributions and information of the samples to be deleted from a trained model without affecting the contributions of other samples. Recently, many frameworks for machine unlearning have been proposed, and most of them focus on image and text data. To extend machine unlearning to graph data, \textit{GraphEraser} has been proposed. However, a critical issue is that \textit{GraphEraser} is specifically designed for the transductive graph setting, where the graph is static and the attributes and edges of test nodes are visible during training. It is unsuitable for the inductive setting, where the graph could be dynamic and the test graph information is invisible in advance. Such inductive capability is essential for production machine learning systems with evolving graphs like social media and transaction networks. To fill this gap, we propose the \underline{{\bf G}}\underline{{\bf U}}ided \underline{{\bf I}}n\underline{{\bf D}}uctiv\underline{{\bf E}} Graph Unlearning framework (GUIDE). GUIDE consists of three components: guided graph partitioning with fairness and balance, efficient subgraph repair, and similarity-based aggregation. Empirically, we evaluate our method on several inductive benchmarks and evolving transaction graphs. Generally speaking, GUIDE can be implemented efficiently for inductive graph learning tasks thanks to its low graph-partitioning cost, in terms of both computation and structural information. The code will be available here: https://github.com/Happy2Git/GUIDE.

Constrained Exploration in Reinforcement Learning with Optimality Preservation

  • Authors: Peter C. Y. Chen
  • Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.03104
  • Pdf link: https://arxiv.org/pdf/2304.03104
  • Abstract
    We consider a class of reinforcement-learning systems in which the agent follows a behavior policy to explore a discrete state-action space to find an optimal policy while adhering to some restriction on its behavior. Such restriction may prevent the agent from visiting some state-action pairs, possibly leading to the agent finding only a sub-optimal policy. To address this problem we introduce the concept of constrained exploration with optimality preservation, whereby the exploration behavior of the agent is constrained to meet a specification while the optimality of the (original) unconstrained learning process is preserved. We first establish a feedback-control structure that models the dynamics of the unconstrained learning process. We then extend this structure by adding a supervisor to ensure that the behavior of the agent meets the specification, and establish (for a class of reinforcement-learning problems with a known deterministic environment) a necessary and sufficient condition under which optimality is preserved. This work demonstrates the utility and the prospect of studying reinforcement-learning problems in the context of the theories of discrete-event systems, automata and formal languages.
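
A toy sketch of the constrained-exploration idea (not the paper's formal supervisor construction): a supervisor filters which actions an epsilon-greedy Q-learning agent may take, while learning otherwise proceeds as usual; the paper's contribution includes conditions under which this preserves the unconstrained optimum.

```python
# Toy sketch: epsilon-greedy Q-learning where a supervisor restricts exploration
# to permitted actions (the paper studies when optimality is still preserved).
import numpy as np

n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))
allowed = np.ones((n_states, n_actions), dtype=bool)
allowed[2, 0] = False                          # hypothetical behavior restriction
rng = np.random.default_rng(1)

def supervised_action(s, eps=0.2):
    """Epsilon-greedy over the supervisor-approved actions only."""
    if rng.random() < eps:
        return int(rng.choice(np.flatnonzero(allowed[s])))
    return int(np.argmax(np.where(allowed[s], Q[s], -np.inf)))

def env_step(s, a):
    """Toy deterministic chain: action 2 moves right, others move left."""
    s_next = min(s + 1, n_states - 1) if a == 2 else max(s - 1, 0)
    return s_next, float(s_next == n_states - 1)   # reward at the right end

for _ in range(5000):
    s = int(rng.integers(n_states))
    a = supervised_action(s)
    s2, r = env_step(s, a)
    Q[s, a] += 0.1 * (r + 0.9 * Q[s2].max() - Q[s, a])
```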

A self-organizing robotic aggregate using solid and liquid-like collective states

  • Authors: Baudouin Saintyves, Matthew Spenko, Heinrich M. Jaeger
  • Subjects: Robotics (cs.RO); Soft Condensed Matter (cond-mat.soft); Adaptation and Self-Organizing Systems (nlin.AO)
  • Arxiv link: https://arxiv.org/abs/2304.03125
  • Pdf link: https://arxiv.org/pdf/2304.03125
  • Abstract
    Designing robotic systems that can change their physical form factor as well as their compliance to adapt to environmental constraints remains a major conceptual and technical challenge. To address this, we introduce the Granulobot, a modular system that blurs the distinction between soft, modular, and swarm robotics. The system consists of gear-like units that each contain a single actuator such that units can self-assemble into larger, granular aggregates using magnetic coupling. These aggregates can reconfigure dynamically and also split up into subsystems that might later recombine. Aggregates can self-organize into collective states with solid- and liquid-like properties, thus displaying widely differing compliances. These states can be perturbed locally via actuators or externally via mechanical feedback from the environment to produce adaptive shape shifting in a decentralized manner. This in turn can generate locomotion strategies adapted to different conditions. Aggregates can move over obstacles without using external sensors or coordinate to maintain a steady gait over different surfaces without electronic communication among units. The modular design highlights a physical, morphological form of control that advances the development of resilient robotic systems with the ability to morph and adapt to different functions and conditions.

From Saliency to DINO: Saliency-guided Vision Transformer for Few-shot Keypoint Detection

  • Authors: Changsheng Lu, Hao Zhu, Piotr Koniusz
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03140
  • Pdf link: https://arxiv.org/pdf/2304.03140
  • Abstract
    Unlike current deep keypoint detectors that are trained to recognize a limited number of body parts, few-shot keypoint detection (FSKD) attempts to localize any keypoints, including novel or base keypoints, depending on the reference samples. FSKD requires semantically meaningful relations for keypoint similarity learning to overcome the ubiquitous noise and ambiguous local patterns. One rescue comes with the vision transformer (ViT), as it captures long-range relations well. However, ViT may model irrelevant features outside of the region of interest due to the global attention matrix, thus degrading similarity learning between support and query features. In this paper, we present a novel saliency-guided vision transformer, dubbed SalViT, for few-shot keypoint detection. Our SalViT enjoys a uniquely designed masked self-attention and a morphology learner, where the former introduces a saliency map as a soft mask to constrain the self-attention to foregrounds, while the latter leverages the so-called power normalization to adjust the morphology of the saliency map, realizing a ``dynamically changing receptive field''. Moreover, as saliency detectors add computations, we show that attentive masks of the DINO transformer can replace saliency. On top of SalViT, we also investigate i) transductive FSKD that enhances keypoint representations with unlabelled data and ii) FSKD under occlusions. We show that our model performs well on five public datasets and achieves ~10% higher PCK than the normally trained model under severe occlusions.
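
A minimal sketch of the masked self-attention idea: a per-patch saliency map enters as an additive soft mask before the softmax, discouraging attention to background keys. The shapes, the log-mask form, and the temperature below are assumptions, not the paper's exact design (requires PyTorch).

```python
# Sketch of saliency-guided masked self-attention: low-saliency (background)
# keys are softly suppressed via an additive log-mask before the softmax.
import torch
import torch.nn.functional as F

def saliency_masked_attention(q, k, v, saliency, tau=1.0):
    """q, k, v: (B, N, D); saliency: (B, N) in [0, 1], high on foreground."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, N, N)
    soft_mask = tau * torch.log(saliency.clamp_min(1e-6))   # ~0 on foreground
    scores = scores + soft_mask.unsqueeze(1)    # broadcast over query positions
    return F.softmax(scores, dim=-1) @ v

B, N, D = 2, 16, 32
q = k = v = torch.randn(B, N, D)
saliency = torch.rand(B, N)                     # stand-in for a saliency/DINO map
out = saliency_masked_attention(q, k, v, saliency)   # (B, N, D)
```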

Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream

  • Authors: Yuheng Jiang, Kaixin Yao, Zhuo Su, Zhehao Shen, Haimin Luo, Lan Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03184
  • Pdf link: https://arxiv.org/pdf/2304.03184
  • Abstract
    Convenient 4D modeling of human-object interactions is essential for numerous applications. However, monocular tracking and rendering of complex interaction scenarios remain challenging. In this paper, we propose Instant-NVR, a neural approach for instant volumetric human-object tracking and rendering using a single RGBD camera. It bridges traditional non-rigid tracking with recent instant radiance field techniques via a multi-thread tracking-rendering mechanism. In the tracking front-end, we adopt a robust human-object capture scheme to provide sufficient motion priors. We further introduce a separated instant neural representation with a novel hybrid deformation module for the interacting scene. We also provide an on-the-fly reconstruction scheme of the dynamic/static radiance fields via efficient motion-prior searching. Moreover, we introduce an online key frame selection scheme and a rendering-aware refinement strategy to significantly improve the appearance details for online novel-view synthesis. Extensive experiments demonstrate the effectiveness and efficiency of our approach for the instant generation of human-object radiance fields on the fly, notably achieving real-time photo-realistic novel view synthesis under complex human-object interactions.

LANe: Lighting-Aware Neural Fields for Compositional Scene Synthesis

  • Authors: Akshay Krishnan, Amit Raj, Xianling Zhang, Alexandra Carlson, Nathan Tseng, Sandhya Sridhar, Nikita Jaipuria, James Hays
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03280
  • Pdf link: https://arxiv.org/pdf/2304.03280
  • Abstract
    Neural fields have recently enjoyed great success in representing and rendering 3D scenes. However, most state-of-the-art implicit representations model static or dynamic scenes as a whole, with minor variations. Existing work on learning disentangled world and object neural fields does not consider the problem of composing objects into different world neural fields in a lighting-aware manner. We present Lighting-Aware Neural Field (LANe) for the compositional synthesis of driving scenes in a physically consistent manner. Specifically, we learn a scene representation that disentangles the static background and transient elements into a world-NeRF and class-specific object-NeRFs to allow compositional synthesis of multiple objects in the scene. Furthermore, we explicitly design both the world and object models to handle lighting variation, which allows us to compose objects into scenes with spatially varying lighting. This is achieved by constructing a light field of the scene and using it in conjunction with a learned shader to modulate the appearance of the object NeRFs. We demonstrate the performance of our model on a synthetic dataset of diverse lighting conditions rendered with the CARLA simulator, as well as a novel real-world dataset of cars collected at different times of the day. Our approach outperforms state-of-the-art compositional scene synthesis on this challenging dataset setup, composing object-NeRFs learned from one scene into an entirely different scene whilst still respecting the lighting variations in the novel scene. For more results, please visit our project website https://lane-composition.github.io/.

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention

  • Authors: Mingyu Ding, Yikang Shen, Lijie Fan, Zhenfang Chen, Zitian Chen, Ping Luo, Joshua B. Tenenbaum, Chuang Gan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03282
  • Pdf link: https://arxiv.org/pdf/2304.03282
  • Abstract
    Humans possess a versatile mechanism for extracting structured representations of our visual world. When looking at an image, we can decompose the scene into entities and their parts as well as obtain the dependencies between them. To mimic such capability, we propose Visual Dependency Transformers (DependencyViT) that can induce visual dependencies without any labels. We achieve this with a novel neural operator called \emph{reversed attention} that can naturally capture long-range visual dependencies between image patches. Specifically, we formulate it as a dependency graph where a child token in reversed attention is trained to attend to its parent tokens and send information following a normalized probability distribution, rather than gathering information as in conventional self-attention. With such a design, hierarchies naturally emerge from reversed attention layers, and a dependency tree is progressively induced from leaf nodes to the root node in an unsupervised manner. DependencyViT offers several appealing benefits. (i) Entities and their parts in an image are represented by different subtrees, enabling part partitioning from dependencies; (ii) Dynamic visual pooling is made possible. The leaf nodes which rarely send messages can be pruned without hindering the model performance, based on which we propose the lightweight DependencyViT-Lite to reduce the computational and memory footprints; (iii) DependencyViT works well on both self- and weakly-supervised pretraining paradigms on ImageNet, and demonstrates its effectiveness on 8 datasets and 5 tasks, such as unsupervised part and saliency segmentation, recognition, and detection.
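
A minimal reading of reversed attention as described above: each child token normalizes an outgoing distribution over parents and sends its value along it, so parents aggregate what is sent rather than gathering. The projection setup is an illustrative assumption (requires PyTorch).

```python
# Sketch of reversed attention: children normalize an outgoing ("send")
# distribution over parents; parents aggregate what is sent to them.
import torch
import torch.nn.functional as F

def reversed_attention(x, Wq, Wk, Wv):
    """x: (B, N, D); row i of `send` is child i's distribution over parents."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, child, parent)
    send = F.softmax(scores, dim=-1)            # normalize per child (sender)
    # Parent j receives sum_i send[i, j] * v[i], i.e. a transposed aggregation
    return send.transpose(-2, -1) @ v           # (B, N, D)

B, N, D = 2, 8, 16
x = torch.randn(B, N, D)
Wq, Wk, Wv = (torch.randn(D, D) * D ** -0.5 for _ in range(3))
out = reversed_attention(x, Wq, Wk, Wv)
```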

New submissions for Fri, 31 Mar 23

Keyword: efficient

Machine learning-based spin structure detection

  • Authors: Isaac Labrie-Boulay, Thomas Brian Winkler, Daniel Franzen, Alena Romanova, Hans Fangohr, Mathias Kläui
  • Subjects: Machine Learning (cs.LG); Emerging Technologies (cs.ET); Data Analysis, Statistics and Probability (physics.data-an)
  • Arxiv link: https://arxiv.org/abs/2303.16905
  • Pdf link: https://arxiv.org/pdf/2303.16905
  • Abstract
    One of the most important magnetic spin structures is the topologically stabilised skyrmion quasi-particle. Its interesting physical properties make it a candidate for memory and efficient neuromorphic computation schemes. For device operation, detection of the position, shape, and size of skyrmions is required, and magnetic imaging is typically employed. A frequently used technique is magneto-optical Kerr microscopy, where, depending on the sample's material composition, temperature, material growing procedures, etc., the measurements suffer from noise, low contrast, intensity gradients, or other optical artifacts. Conventional image analysis packages require manual treatment, and a more automatic solution is required. We report a convolutional neural network specifically designed for segmentation problems to detect the position and shape of skyrmions in our measurements. The network is tuned using selected techniques to optimize predictions, and in particular the number of detected classes is found to govern the performance. The results of this study show that a well-trained network is a viable method of automating data pre-processing in magnetic microscopy. The approach is easily extendable to other spin structures and other magnetic imaging methods.

Optimizing Reconfigurable Intelligent Surfaces for Short Transmissions: How Detailed Configurations can be Afforded?

  • Authors: Anders Enqvist, Özlem Tuğfe Demir, Cicek Cavdar, Emil Björnson
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2303.16913
  • Pdf link: https://arxiv.org/pdf/2303.16913
  • Abstract
    In this paper, we examine how to minimize the total energy consumption of a user equipment (UE) when it transmits a finite-sized data payload of a given length. The receiving base station (BS) controls a reconfigurable intelligent surface (RIS) that can be utilized to improve the channel conditions, but only if additional pilot signals are transmitted to configure the RIS. The challenge is that the pilot resources spent on configuring the RIS increase the energy consumption, especially when small payloads are transmitted, so it must be balanced against the energy savings during data transmission. We derive a formula for the energy consumption, taking both the pilot and data transmission power into account. It also includes the effects of imperfect channel state information, the use of phase-shifts with finite resolution at the RIS, and the passive circuit energy consumption. We also consider how dividing the RIS into subarrays consisting of multiple RIS elements using the same reflection coefficient can shorten the pilot length. In particular, the pilot power and subarray size are tuned to the payload length to minimize the energy consumption while maintaining parts of the aperture gain. Our analytical results show that, for a given geometry and transmission payload length, there exists a unique energy-minimizing subarray size and pilot power. For small payloads and when the channel conditions between the BS and UE are favorable compared to the path to the RIS, the energy consumption is minimized using subarrays with many elements and low pilot transmission power. On the other hand, when the channel conditions to the RIS are better and the data payloads are large, it is preferable to use fewer elements per subarray, potentially configuring each element individually and transmitting the pilot signals with additional power.

T-FFTRadNet: Object Detection with Swin Vision Transformers from Raw ADC Radar Signals

  • Authors: James Giroux, Martin Bouchard, Robert Laganiere
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16940
  • Pdf link: https://arxiv.org/pdf/2303.16940
  • Abstract
    Object detection utilizing Frequency Modulated Continuous Wave radar is becoming increasingly popular in the field of autonomous systems. Radar does not possess the same drawbacks seen in other emission-based sensors such as LiDAR, primarily the degradation or loss of return signals due to weather conditions such as rain or snow. However, radar does possess traits that make it unsuitable for standard emission-based deep learning representations such as point clouds. Radar point clouds tend to be sparse, and therefore information extraction is not efficient. To overcome this, more traditional digital signal processing pipelines were adapted to form inputs residing directly in the frequency domain via Fast Fourier Transforms. Commonly, three transformations were used to form Range-Azimuth-Doppler cubes in which deep learning algorithms could perform object detection. This too has drawbacks, namely the pre-processing costs associated with performing multiple Fourier Transforms and normalization. We explore the possibility of operating on raw radar inputs from analog-to-digital converters via the utilization of complex transformation layers. Moreover, we introduce hierarchical Swin Vision transformers to the field of radar object detection and show their capability to operate on inputs varying in pre-processing, along with different radar configurations, i.e. relatively low and high numbers of transmitters and receivers, while obtaining on-par or better results than the state-of-the-art.

Concise QBF Encodings for Games on a Grid (extended version)

  • Authors: Irfansha Shaik, Jaco van de Pol
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.16949
  • Pdf link: https://arxiv.org/pdf/2303.16949
  • Abstract
    Encoding 2-player games in QBF correctly and efficiently is challenging and error-prone. To enable concise specifications and uniform encodings of games played on grid boards, like Tic-Tac-Toe, Connect-4, Domineering, Pursuer-Evader and Breakthrough, we introduce Board-game Domain Definition Language (BDDL), inspired by the success of PDDL in the planning domain. We provide an efficient translation from BDDL into QBF, encoding the existence of a winning strategy of bounded depth. Our lifted encoding treats board positions symbolically and allows concise definitions of conditions, effects and winning configurations, relative to symbolic board positions. The size of the encoding grows linearly in the input model and the considered depth. To show the feasibility of such a generic approach, we use QBF solvers to compute the critical depths of winning strategies for instances of several known games. For several games, our work provides the first QBF encoding. Unlike plan validation in SAT-based planning, validating QBF-based winning strategies is difficult. We show how to validate winning strategies using QBF certificates and interactive game play.

Fairness-Aware Data Valuation for Supervised Learning

  • Authors: José Pombal, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro
  • Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2303.16963
  • Pdf link: https://arxiv.org/pdf/2303.16963
  • Abstract
    Data valuation is an ML field that studies the value of training instances towards a given predictive task. Although data bias is one of the main sources of downstream model unfairness, previous work in data valuation does not consider how training instances may influence both the performance and fairness of ML models. Thus, we propose Fairness-Aware Data valuatiOn (FADO), a data valuation framework that can be used to incorporate fairness concerns into a series of ML-related tasks (e.g., data pre-processing, exploratory data analysis, active learning). We propose an entropy-based data valuation metric suited to address our two-pronged goal of maximizing both performance and fairness, which is more computationally efficient than existing metrics. We then show how FADO can be applied as the basis for unfairness-mitigation pre-processing techniques. Our methods achieve promising results -- up to a 40 p.p. improvement in fairness at a less than 1 p.p. loss in performance compared to a baseline -- and promote fairness in a data-centric way, where a deeper understanding of data quality takes center stage.

Computationally efficient sampling methods for sparsity promoting hierarchical Bayesian models

  • Authors: Daniela Calvetti, Erkki Somersalo
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.16988
  • Pdf link: https://arxiv.org/pdf/2303.16988
  • Abstract
    Bayesian hierarchical models have been demonstrated to provide efficient algorithms for finding sparse solutions to ill-posed inverse problems. The models typically comprise a conditionally Gaussian prior model for the unknown, augmented by a hyperprior model for the variances. A widely used choice for the hyperprior is a member of the family of generalized gamma distributions. Most of the work in the literature has concentrated on numerical approximation of the maximum a posteriori (MAP) estimates, and less attention has been paid to sampling methods or other means for uncertainty quantification. Sampling from the hierarchical models is challenging mainly for two reasons: the hierarchical models are typically high-dimensional, thus suffering from the curse of dimensionality, and the strong correlation between the unknown of interest and its variance can make sampling rather inefficient. This work addresses mainly the first of these obstacles. By using a novel reparametrization, it is shown how the posterior distribution can be transformed into one dominated by a Gaussian white noise, allowing sampling by using the preconditioned Crank-Nicolson (pCN) scheme that has been shown to be efficient for sampling from distributions dominated by a Gaussian component. Furthermore, a novel idea for speeding up the pCN in a special case is developed, and the question of how strongly the hierarchical models are concentrated on sparse solutions is addressed in light of a computed example.
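
For reference, a minimal pCN sampler for a posterior with a standard Gaussian prior N(0, I) and negative log-likelihood Phi; the paper's reparametrization transforms the hierarchical posterior into exactly this Gaussian-dominated setting. The toy likelihood below is an illustrative assumption.

```python
# Minimal pCN sampler for a posterior with N(0, I) prior and potential Phi
# (negative log-likelihood); the acceptance ratio involves Phi only.
import numpy as np

def pcn(Phi, dim, n_samples, beta=0.25, rng=None):
    rng = rng or np.random.default_rng(0)
    u, samples = np.zeros(dim), []
    for _ in range(n_samples):
        # Prior-preserving proposal: AR(1) mix of current state and fresh noise
        v = np.sqrt(1 - beta**2) * u + beta * rng.standard_normal(dim)
        if np.log(rng.random()) < Phi(u) - Phi(v):  # accept w.p. min(1, e^{...})
            u = v
        samples.append(u.copy())
    return np.array(samples)

# Toy Gaussian likelihood: y = u + noise, Phi(u) = ||y - u||^2 / (2 sigma^2)
y, sigma = np.array([1.0, -0.5]), 0.3
chain = pcn(lambda u: np.sum((y - u) ** 2) / (2 * sigma**2), dim=2, n_samples=5000)
print("posterior mean estimate:", chain[1000:].mean(axis=0))
```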

The G-invariant graph Laplacian

  • Authors: Eitan Rosen, Yoel Shkolnisky
  • Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2303.17001
  • Pdf link: https://arxiv.org/pdf/2303.17001
  • Abstract
    Graph Laplacian based algorithms for data lying on a manifold have been proven effective for tasks such as dimensionality reduction, clustering, and denoising. In this work, we consider data sets whose data point not only lie on a manifold, but are also closed under the action of a continuous group. An example of such data set is volumes that line on a low dimensional manifold, where each volume may be rotated in three-dimensional space. We introduce the G-invariant graph Laplacian that generalizes the graph Laplacian by accounting for the action of the group on the data set. We show that like the standard graph Laplacian, the G-invariant graph Laplacian converges to the Laplace-Beltrami operator on the data manifold, but with a significantly improved convergence rate. Furthermore, we show that the eigenfunctions of the G-invariant graph Laplacian admit the form of tensor products between the group elements and eigenvectors of certain matrices, which can be computed efficiently using FFT-type algorithms. We demonstrate our construction and its advantages on the problem of filtering data on a noisy manifold closed under the action of the special unitary group SU(2).

The secret of immersion: actor driven camera movement generation for auto-cinematography

  • Authors: Xinyi Wu, Haohong Wang, Aggelos K. Katsaggelos
  • Subjects: Multimedia (cs.MM); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17041
  • Pdf link: https://arxiv.org/pdf/2303.17041
  • Abstract
    Immersion plays a vital role when designing cinematic creations, yet the difficulty of immersive shooting prevents designers from creating satisfactory outputs. In this work, we analyze the specific components that contribute to cinematographic immersion at the spatial, emotional, and aesthetic levels; these components are then combined into a high-level evaluation mechanism. Guided by such an immersion mechanism, we propose a GAN-based camera control system that is able to generate actor-driven camera movements in a 3D virtual environment to obtain immersive film sequences. The proposed encoder-decoder architecture in the generation flow transfers character motion into a camera trajectory conditioned on an emotion factor. This ensures spatial and emotional immersion by performing actor-camera synchronization physically and psychologically. The emotional immersion is further strengthened by incorporating regularization that controls camera shakiness for expressing different mental statuses. To achieve aesthetic immersion, we make an effort to improve aesthetic frame compositions by modifying the synthesized camera trajectory. Based on a self-supervised adjustor, the adjusted camera placements can project the character to the appropriate on-frame locations following aesthetic rules. The experimental results indicate that our proposed camera control system can efficiently offer immersive cinematic videos, both quantitatively and qualitatively, based on fine-grained immersive shooting. Live examples are shown in the supplementary video.

Material-agnostic Shaping of Granular Materials with Optimal Transport

  • Authors: Nikhilesh Alatur, Olov Andersson, Roland Siegwart, Lionel Ott
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17047
  • Pdf link: https://arxiv.org/pdf/2303.17047
  • Abstract
    From construction materials, such as sand or asphalt, to kitchen ingredients, like rice, sugar, or salt, the world is full of granular materials. Despite impressive progress in robotic manipulation, manipulating and interacting with granular material remains a challenge due to difficulties in perceiving, representing, modelling, and planning for these variable materials that have complex internal dynamics. While some prior work has looked into estimating or learning accurate dynamics models for granular materials, the literature is still missing a more abstract planning method that can be used for planning manipulation actions for granular materials with unknown material properties. In this work, we leverage tools from optimal transport and connect them to robot motion planning. We propose a heuristics-based sweep planner that does not require knowledge of the material's properties and directly uses a height map representation to generate promising sweeps. These sweeps transform granular material from arbitrary start shapes into arbitrary target shapes. We apply the sweep planner in a fast and reactive feedback loop and avoid the need for model-based planning over multiple time steps. We validate our approach with a large set of simulation and hardware experiments where we show that our method is capable of efficiently solving several complex tasks, including gathering, separating, and shaping of several types of granular materials into different target shapes.
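
A toy illustration of the optimal-transport viewpoint (the planner itself is heuristic and not reproduced here): treating 1D height profiles as distributions, the earth mover's distance measures how far the current pile is from the target shape, so a sweep that reduces it is promising. All profiles below are made up (requires scipy).

```python
# Toy example of the optimal-transport view: 1D height profiles of the granular
# material are treated as distributions and compared via earth mover's distance.
import numpy as np
from scipy.stats import wasserstein_distance

x = np.linspace(0, 1, 100)
current = np.exp(-((x - 0.3) ** 2) / 0.01)      # pile centered near x = 0.3
target = np.exp(-((x - 0.7) ** 2) / 0.01)       # desired pile near x = 0.7
current /= current.sum()
target /= target.sum()

# Transport distance between current and target height profiles; a candidate
# sweep can be scored by how much it decreases this value.
d = wasserstein_distance(x, x, u_weights=current, v_weights=target)
print(f"transport distance before sweeping: {d:.3f}")
```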

Transductive few-shot adapters for medical image segmentation

  • Authors: Julio Silva-Rodríguez, Jose Dolz, Ismail Ben Ayed
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17051
  • Pdf link: https://arxiv.org/pdf/2303.17051
  • Abstract
    With the recent rise of foundation models in computer vision and NLP, the pretrain-and-adapt strategy, where a large-scale model is fine-tuned on downstream tasks, is gaining popularity. However, traditional fine-tuning approaches may still require significant resources and yield sub-optimal results when the labeled data of the target task is scarce. This is especially the case in clinical settings. To address this challenge, we formalize few-shot efficient fine-tuning (FSEFT), a novel and realistic setting for medical image segmentation. Furthermore, we introduce a novel parameter-efficient fine-tuning strategy tailored to medical image segmentation, with (a) spatial adapter modules that are more appropriate for dense prediction tasks; and (b) a constrained transductive inference, which leverages task-specific prior knowledge. Our comprehensive experiments on a collection of public CT datasets for organ segmentation reveal the limitations of standard fine-tuning methods in few-shot scenarios, point to the potential of vision adapters and transductive inference, and confirm the suitability of foundation models.

A Tensor-based Convolutional Neural Network for Small Dataset Classification

  • Authors: Zhenhua Chen, David Crandall
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2303.17061
  • Pdf link: https://arxiv.org/pdf/2303.17061
  • Abstract
    Inspired by the ConvNets with structured hidden representations, we propose a Tensor-based Neural Network, TCNN. Different from ConvNets, TCNNs are composed of structured neurons rather than scalar neurons, and the basic operation is neuron tensor transformation. Unlike other structured ConvNets, where the part-whole relationships are modeled explicitly, the relationships are learned implicitly in TCNNs. Also, the structured neurons in TCNNs are high-rank tensors rather than vectors or matrices. We compare TCNNs with current popular ConvNets, including ResNets, MobileNets, EfficientNets, RegNets, etc., on CIFAR10, CIFAR100, and Tiny ImageNet. The experiment shows that TCNNs have higher efficiency in terms of parameters. TCNNs also show higher robustness against white-box adversarial attacks on MNIST compared to ConvNets.

Reading Strategies for Graph Visualizations that Wrap Around in Torus Topology

  • Authors: Kun-Ting Chen, Quynh Quang Ngo, Kuno Kurzhals, Kim Marriott, Tim Dwyer, Michael Sedlmair, Daniel Weiskopf
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2303.17066
  • Pdf link: https://arxiv.org/pdf/2303.17066
  • Abstract
    We investigate reading strategies for node-link diagrams that wrap around the boundaries in a flattened torus topology by examining eye tracking data recorded in a previous controlled study. Prior work showed that torus drawing affords greater flexibility in clutter reduction than traditional node-link representations, but impedes link-and-path exploration tasks, while repeating tiles around boundaries aids comprehension. However, it remains unclear what strategies users apply in different wrapping settings. This is important for design implications for future work on more effective wrapped visualizations for network applications, and cyclic data that could benefit from wrapping. We perform visual-exploratory data analysis of gaze data, and conduct statistical tests derived from the patterns identified. Results show distinguishable gaze behaviors, with more visual glances and transitions between areas of interest in the non-replicated layout. Full-context has more successful visual searches than partial-context, but the gaze allocation indicates that the layout could be more space-efficient.

Dependent Task Offloading in Edge Computing Using GNN and Deep Reinforcement Learning

  • Authors: Zequn Cao, Xiaoheng Deng
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.17100
  • Pdf link: https://arxiv.org/pdf/2303.17100
  • Abstract
    Task offloading is a widely used technology in Mobile Edge Computing (MEC), which reduces the completion time of user tasks with the help of resourceful edge servers. Existing works mainly focus on the case where the computation density of a user task is homogeneous, so that it can be offloaded in full or by percentage. However, various user tasks in real life consist of several inner dependent subtasks, each of which is a minimum execution unit logically. Motivated by this gap, we aim to solve the Dependent Task Offloading (DTO) problem under a multi-user multi-edge scenario in this paper. We first use a Directed Acyclic Graph (DAG) to represent a dependent task, where nodes indicate subtasks and directed edges indicate dependencies among subtasks. Then we propose a scheme based on Graph Attention Network (GAT) and Deep Reinforcement Learning (DRL) to minimize the makespan of user tasks. To utilize GAT efficiently, we put its training on the resourceful cloud in an unsupervised style due to the numerous data and computation resource requirements. In addition, we design a multi-discrete action space for the DRL algorithm to enhance the applicability of our proposed scheme. Experiments are conducted on broadly distributed synthetic data. The results demonstrate that our proposed approach can be adapted to both simple and complex MEC environments and outperforms other methods.
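
A small sketch of the DAG task model used above: subtasks are nodes, dependencies are edges, and the makespan of a candidate offloading assignment follows from earliest-finish-time scheduling. Server speeds, workloads, and the transfer delay are illustrative, and server contention is ignored for brevity (requires networkx).

```python
# Sketch of the DAG task model: makespan of one offloading assignment under
# earliest-finish-time scheduling (server contention ignored for brevity).
import networkx as nx

dag = nx.DiGraph([("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
workload = {"a": 4.0, "b": 2.0, "c": 3.0, "d": 1.0}   # subtask compute demands
speed = {"local": 1.0, "edge0": 4.0}                  # hypothetical server speeds
comm = 0.5                                            # cross-server transfer delay

def makespan(assign):
    finish = {}
    for v in nx.topological_sort(dag):                # respect dependencies
        ready = max((finish[u] + (comm if assign[u] != assign[v] else 0.0)
                     for u in dag.predecessors(v)), default=0.0)
        finish[v] = ready + workload[v] / speed[assign[v]]
    return max(finish.values())

print(makespan({"a": "local", "b": "edge0", "c": "edge0", "d": "local"}))
```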

Deep Generative Model and Its Applications in Efficient Wireless Network Management: A Tutorial and Case Study

  • Authors: Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Abbas Jamalipour
  • Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17114
  • Pdf link: https://arxiv.org/pdf/2303.17114
  • Abstract
    With the phenomenal success of diffusion models and ChatGPT, deep generative models (DGMs) have been experiencing explosive growth since 2022. Not limited to content generation, DGMs are also widely adopted in the Internet of Things, the Metaverse, and digital twins, due to their outstanding ability to represent complex patterns and generate plausible samples. In this article, we explore the applications of DGMs in a crucial task, i.e., improving the efficiency of wireless network management. Specifically, we first overview generative AI, as well as three representative DGMs. Then, a DGM-empowered framework for wireless network management is proposed, in which we elaborate on the issues of conventional network management approaches, why DGMs can address them efficiently, and the step-by-step workflow for applying DGMs in managing wireless networks. Moreover, we conduct a case study on network economics, using the state-of-the-art DGM, i.e., the diffusion model, to generate effective contracts for incentivizing mobile AI-Generated Content (AIGC) services. Last but not least, we discuss important open directions for further research.

Conservation and stability in a discontinuous Galerkin method for the vector invariant spherical shallow water equations

  • Authors: Kieran Ricardo, David Lee, Kenneth Duru
  • Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
  • Arxiv link: https://arxiv.org/abs/2303.17120
  • Pdf link: https://arxiv.org/pdf/2303.17120
  • Abstract
    We develop a novel and efficient discontinuous Galerkin spectral element method (DG-SEM) for the spherical rotating shallow water equations in vector invariant form. We prove that the DG-SEM is energy stable, and discretely conserves mass, vorticity, and linear geostrophic balance on general curvilinear meshes. These theoretical results are possible due to our novel entropy stable numerical DG fluxes for the shallow water equations in vector invariant form. We experimentally verify these results on a cubed sphere mesh. Additionally, we show that our method is robust, that is, it can be run stably without any dissipation. The entropy stable fluxes are sufficient to control the grid scale noise generated by geostrophic turbulence without the need for artificial stabilisation.

C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation

  • Authors: Nazmul Karim, Niluthpol Chowdhury Mithun, Abhinav Rajvanshi, Han-pang Chiu, Supun Samarasekera, Nazanin Rahnavard
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17132
  • Pdf link: https://arxiv.org/pdf/2303.17132
  • Abstract
    Unsupervised domain adaptation (UDA) approaches focus on adapting models trained on a labeled source domain to an unlabeled target domain. UDA methods have a strong assumption that the source data is accessible during adaptation, which may not be feasible in many real-world scenarios due to privacy concerns and resource constraints of devices. In this regard, source-free domain adaptation (SFDA) excels as access to source data is no longer required during adaptation. Recent state-of-the-art (SOTA) methods on SFDA mostly focus on pseudo-label refinement based self-training which generally suffers from two issues: i) inevitable occurrence of noisy pseudo-labels that could lead to early training time memorization, ii) refinement process requires maintaining a memory bank which creates a significant burden in resource constraint scenarios. To address these concerns, we propose C-SFDA, a curriculum learning aided self-training framework for SFDA that adapts efficiently and reliably to changes across domains based on selective pseudo-labeling. Specifically, we employ a curriculum learning scheme to promote learning from a restricted amount of pseudo labels selected based on their reliabilities. This simple yet effective step successfully prevents label noise propagation during different stages of adaptation and eliminates the need for costly memory-bank based label refinement. Our extensive experimental evaluations on both image recognition and semantic segmentation tasks confirm the effectiveness of our method. C-SFDA is readily applicable to online test-time domain adaptation and also outperforms previous SOTA methods in this task.
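
An illustrative sketch of curriculum-style selective pseudo-labeling (an assumption about the mechanism, not the paper's exact reliability criterion): only the most confident target predictions are kept, with the threshold relaxed as adaptation progresses (requires PyTorch).

```python
# Illustrative curriculum pseudo-labeling step: keep only confident predictions,
# loosening the confidence threshold as adaptation progresses.
import torch
import torch.nn.functional as F

def select_pseudo_labels(logits, epoch, max_epochs, t_start=0.95, t_end=0.7):
    """logits: (N, C) model outputs on unlabeled target data."""
    # Curriculum: threshold decays linearly from strict to permissive
    t = t_start + (t_end - t_start) * epoch / max(max_epochs - 1, 1)
    probs = F.softmax(logits, dim=1)
    conf, labels = probs.max(dim=1)
    keep = conf >= t
    return labels[keep], keep                  # reliable labels + selection mask

logits = torch.randn(100, 10)
labels, mask = select_pseudo_labels(logits, epoch=0, max_epochs=10)
print(f"kept {mask.sum().item()} / {len(mask)} samples")
```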

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving

  • Authors: Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17144
  • Pdf link: https://arxiv.org/pdf/2303.17144
  • Abstract
    Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research. To address this gap, we present DAMO-StreamNet, an optimized framework that combines recent advances from the YOLO series with a comprehensive analysis of spatial and temporal perception mechanisms, delivering a cutting-edge solution. The key innovations of DAMO-StreamNet are: (1) A robust neck structure incorporating deformable convolution, enhancing the receptive field and feature alignment capabilities. (2) A dual-branch structure that integrates short-path semantic features and long-path temporal features, improving motion state prediction accuracy. (3) Logits-level distillation for efficient optimization, aligning the logits of teacher and student networks in semantic space. (4) A real-time forecasting mechanism that updates support frame features with the current frame, ensuring seamless streaming perception during inference. Our experiments demonstrate that DAMO-StreamNet surpasses existing state-of-the-art methods, achieving 37.8% (normal size (600, 960)) and 43.3% (large size (1200, 1920)) sAP without using extra data. This work not only sets a new benchmark for real-time perception but also provides valuable insights for future research. Additionally, DAMO-StreamNet can be applied to various autonomous systems, such as drones and robots, paving the way for real-time perception.

Convergence of the CEM-GMsFEM for compressible flow in highly heterogeneous media

  • Authors: Leonardo A. Poveda, Shubin Fu, Eric T. Chung, Lina Zhao
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.17157
  • Pdf link: https://arxiv.org/pdf/2303.17157
  • Abstract
    This paper presents and analyses a Constraint Energy Minimization Generalized Multiscale Finite Element Method (CEM-GMsFEM) for solving single-phase non-linear compressible flows in highly heterogeneous media. The construction of CEM-GMsFEM hinges on two crucial steps: first, the auxiliary space is constructed by solving local spectral problems, where the basis functions corresponding to small eigenvalues are captured. Then the basis functions are obtained by solving local energy minimization problems over the oversampling domains using the auxiliary space. The basis functions have exponential decay outside the corresponding local oversampling regions. The convergence of the proposed method is provided, and we show that this convergence only depends on the coarse grid size and is independent of the heterogeneities. An online enrichment guided by \emph{a posteriori} error estimators is developed to enhance computational efficiency. Several numerical experiments on a three-dimensional case are presented to confirm the theoretical findings, illustrating the performance of the method and giving efficient and accurate numerical results.

Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models

  • Authors: Sifan Long, Zhen Zhao, Junkun Yuan, Zichang Tan, Jiangjiang Liu, Luping Zhou, Shengsheng Wang, Jingdong Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17169
  • Pdf link: https://arxiv.org/pdf/2303.17169
  • Abstract
    Prompt learning has become one of the most efficient paradigms for adapting large pre-trained vision-language models to downstream tasks. Current state-of-the-art methods, like CoOp and ProDA, tend to adopt soft prompts to learn an appropriate prompt for each specific task. Recent CoCoOp further boosts the base-to-new generalization performance via an image-conditional prompt. However, it directly fuses identical image semantics to prompts of different labels and significantly weakens the discrimination among different classes as shown in our experiments. Motivated by this observation, we first propose a class-aware text prompt (CTP) to enrich generated prompts with label-related image information. Unlike CoCoOp, CTP can effectively involve image semantics and avoid introducing extra ambiguities into different prompts. On the other hand, instead of reserving the complete image representations, we propose text-guided feature tuning (TFT) to make the image branch attend to class-related representation. A contrastive loss is employed to align such augmented text and image representations on downstream tasks. In this way, the image-to-text CTP and text-to-image TFT can be mutually promoted to enhance the adaptation of VLMs for downstream tasks. Extensive experiments demonstrate that our method outperforms the existing methods by a significant margin. Especially, compared to CoCoOp, we achieve an average improvement of 4.03% on new classes and 3.19% on harmonic-mean over eleven classification benchmarks.

High-Performance Low-Complexity Hierarchical Frequency Synchronization for Distributed Massive MIMO-OFDMA Systems

  • Authors: Xiao-Yang Wang, Shaoshi Yang, Tian-Hao Yuan, Hou-Yu Zhai, Jianhua Zhang, Lajos Hanzo
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2303.17188
  • Pdf link: https://arxiv.org/pdf/2303.17188
  • Abstract
    We propose a high-performance yet low-complexity hierarchical frequency synchronization scheme for orthogonal frequency-division multiple-access (OFDMA) aided distributed massive multi-input multi-output (MIMO) systems, where multiple carrier frequency offsets (CFOs) have to be estimated in the uplink. To solve this multi-CFO estimation problem efficiently, we classify the active antenna units (AAUs) as the master and the slaves. Then, we split the scheme into two stages. During the first stage the distributed slave AAUs are synchronized with the master AAU, while the user equipment (UE) is synchronized with the closest slave AAU during the second stage. The mean square error (MSE) performance of our scheme is better than that of the representative state-of-the-art baseline schemes, while its computational complexity is substantially lower.

Practical self-supervised continual learning with continual fine-tuning

  • Authors: Chi Ian Tang, Lorena Qendro, Dimitris Spathis, Fahim Kawsar, Cecilia Mascolo, Akhil Mathur
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17235
  • Pdf link: https://arxiv.org/pdf/2303.17235
  • Abstract
    Self-supervised learning (SSL) has shown remarkable performance in computer vision tasks when trained offline. However, in a Continual Learning (CL) scenario where new data is introduced progressively, models still suffer from catastrophic forgetting. Retraining a model from scratch to adapt to newly generated data is time-consuming and inefficient. Previous approaches suggested re-purposing self-supervised objectives with knowledge distillation to mitigate forgetting across tasks, assuming that labels from all tasks are available during fine-tuning. In this paper, we generalize self-supervised continual learning in a practical setting where available labels can be leveraged in any step of the SSL process. With an increasing number of continual tasks, this offers more flexibility in the pre-training and fine-tuning phases. With Kaizen, we introduce a training architecture that is able to mitigate catastrophic forgetting for both the feature extractor and classifier with a carefully designed loss function. By using a set of comprehensive evaluation metrics reflecting different aspects of continual learning, we demonstrated that Kaizen significantly outperforms previous SSL models in competitive vision benchmarks, with up to 16.5% accuracy improvement on split CIFAR-100. Kaizen is able to balance the trade-off between knowledge retention and learning from new data with an end-to-end model, paving the way for practical deployment of continual learning systems.
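
A hedged sketch of a combined objective in the spirit described above: distillation terms anchor the feature extractor and classifier to frozen copies from the previous task, alongside a supervised loss on whatever labels are available. The term choices and weights are assumptions, not Kaizen's exact loss (requires PyTorch).

```python
# Hedged continual-learning loss sketch: distill features and classifier outputs
# from the previous (frozen) model while learning from currently available labels.
import torch
import torch.nn.functional as F

def continual_loss(feat_new, feat_old, logits_new, logits_old, logits_sup,
                   labels, alpha=0.5, temperature=2.0):
    # Feature-level distillation: keep representations close to the old model
    l_feat = F.mse_loss(feat_new, feat_old.detach())
    # Classifier-level distillation on soft targets from the old classifier
    l_cls = F.kl_div(F.log_softmax(logits_new / temperature, dim=1),
                     F.softmax(logits_old.detach() / temperature, dim=1),
                     reduction="batchmean") * temperature**2
    # Supervised loss on whatever labels are available at this step
    l_sup = F.cross_entropy(logits_sup, labels)
    return alpha * (l_feat + l_cls) + (1 - alpha) * l_sup

feat_new, feat_old = torch.randn(8, 128), torch.randn(8, 128)
logits_new, logits_old, logits_sup = (torch.randn(8, 10) for _ in range(3))
loss = continual_loss(feat_new, feat_old, logits_new, logits_old,
                      logits_sup, torch.randint(0, 10, (8,)))
```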

Simultaneous reconstruction of sound speed and nonlinearity parameter in a paraxial model of vibro-acoustography in frequency domain

  • Authors: Barbara Kaltenbacher and Teresa Rauscher
  • Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
  • Arxiv link: https://arxiv.org/abs/2303.17236
  • Pdf link: https://arxiv.org/pdf/2303.17236
  • Abstract
    In this paper we consider the inverse problem of vibro-acoustography, a technique for enhancing ultrasound imaging by making use of nonlinear effects. It amounts to determining two spatially variable coefficients in a system of PDEs describing the propagation of two directed sound beams and the wave resulting from their nonlinear interaction. To justify the use of Newton's method for solving this inverse problem, on one hand we verify well-definedness and differentiability of the forward operator corresponding to two versions of the PDE model; on the other hand we consider an all-at-once formulation of the inverse problem and prove convergence of Newton's method for its solution.

Computationally efficient predictive control based on ANN state-space model

  • Authors: Jan H. Hoekstra, Bence Cseppentő, Gerben I. Beintema, Maarten Schoukens, Zsolt Kollár, Roland Tóth
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.17305
  • Pdf link: https://arxiv.org/pdf/2303.17305
  • Abstract
    Artificial neural networks (ANN) have been shown to be flexible and effective function estimators for the identification of nonlinear state-space models. However, if the resulting models are used directly for nonlinear model predictive control (NMPC), the resulting nonlinear optimization problem is often overly complex due to the size of the network, requires the use of high-order observers to track the states of the ANN model, and the overall control scheme exploits little of the structural properties or available autograd tools for these models. In this paper, we propose an efficient approach to auto-convert ANN state-space models to linear parameter-varying (LPV) form and solve predictive control problems by successive solutions of linear model predictive problems, corresponding to quadratic programs (QPs). Furthermore, we show how existing ANN identification methods, such as the SUBNET method that uses a state encoder, can provide efficient implementations of MPCs. The performance of the proposed approach is demonstrated via a simulation study on an unbalanced disc system.
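
A minimal sketch of the linearization step underlying this auto-conversion idea: autograd extracts the state and input Jacobians of an ANN state-space model at the current operating point, yielding the local linear model a QP-based MPC can consume. The network here is an illustrative stand-in, not the paper's SUBNET model (requires PyTorch).

```python
# Sketch of one local-linearization step: Jacobians of an ANN state-space model
# give the local linear (LPV-style) model x_next ~ f(x0,u0) + A dx + B du.
import torch

class ANNStateSpace(torch.nn.Module):
    def __init__(self, nx=3, nu=1, hidden=32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(nx + nu, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, nx))

    def forward(self, x, u):
        return self.net(torch.cat([x, u], dim=-1))   # x_next = f(x, u)

model = ANNStateSpace()
x0, u0 = torch.zeros(3), torch.zeros(1)

# State and input Jacobians at the operating point, via autograd
A, B = torch.autograd.functional.jacobian(model, (x0, u0))
print("A:", A.shape, "B:", B.shape)   # (3, 3) and (3, 1)
```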

Masked Autoencoders as Image Processors

  • Authors: Huiyu Duan, Wei Shen, Xiongkuo Min, Danyang Tu, Long Teng, Jia Wang, Guangtao Zhai
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17316
  • Pdf link: https://arxiv.org/pdf/2303.17316
  • Abstract
    Transformers have shown significant effectiveness for various vision tasks including both high-level vision and low-level vision. Recently, masked autoencoders (MAE) for feature pre-training have further unleashed the potential of Transformers, leading to state-of-the-art performances on various high-level vision tasks. However, the significance of MAE pre-training on low-level vision tasks has not been sufficiently explored. In this paper, we show that masked autoencoders are also scalable self-supervised learners for image processing tasks. We first present an efficient Transformer model considering both channel attention and shifted-window-based self-attention termed CSformer. Then we develop an effective MAE architecture for image processing (MAEIP) tasks. Extensive experimental results show that with the help of MAEIP pre-training, our proposed CSformer achieves state-of-the-art performance on various image processing tasks, including Gaussian denoising, real image denoising, single-image motion deblurring, defocus deblurring, and image deraining.

Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence

  • Authors: Anton Thielmann, Quentin Seifert, Arik Reuter, Elisabeth Bergherr, Benjamin Säfken
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2303.17324
  • Pdf link: https://arxiv.org/pdf/2303.17324
  • Abstract
    Extracting and identifying latent topics in large text corpora has gained increasing importance in Natural Language Processing (NLP). Most models, whether probabilistic models similar to Latent Dirichlet Allocation (LDA) or neural topic models, follow the same underlying approach of topic interpretability and topic extraction. We propose a method that incorporates a deeper understanding of both sentence and document themes, and goes beyond simply analyzing word frequencies in the data. This allows our model to detect latent topics that may include uncommon words or neologisms, as well as words not present in the documents themselves. Additionally, we propose several new evaluation metrics based on intruder words and similarity measures in the semantic space. We present correlation coefficients with human identification of intruder words and achieve near-human level results at the word-intrusion task. We demonstrate the competitive performance of our method with a large benchmark study, and achieve superior results compared to state-of-the-art topic modeling and document clustering models.
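
A small sketch of the intruder-word idea in a semantic space: the intruder should be the word least similar, on average, to a topic's other top words. The random vectors below stand in for real word embeddings, so the output is only meaningful with trained embeddings.

```python
# Sketch of an intruder-word check: the intruder is the word with the lowest
# mean cosine similarity to the rest of the topic's top words.
import numpy as np

def least_similar_word(words, embeddings):
    """Return the word with the lowest mean cosine similarity to the others."""
    V = np.stack([embeddings[w] for w in words])
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    sim = V @ V.T
    np.fill_diagonal(sim, 0.0)
    mean_sim = sim.sum(axis=1) / (len(words) - 1)
    return words[int(np.argmin(mean_sim))]

rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=50) for w in
              ["price", "market", "stock", "trade", "banana"]}
topic_top_words = ["price", "market", "stock", "trade"]
guess = least_similar_word(topic_top_words + ["banana"], embeddings)
print("detected intruder:", guess)   # ideally "banana" with real embeddings
```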

Linear Insertion Deletion Codes in the High-Noise and High-Rate Regimes

  • Authors: Kuan Cheng, Zhengzhong Jin, Xin Li, Zhide Wei, Yu Zheng
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2303.17370
  • Pdf link: https://arxiv.org/pdf/2303.17370
  • Abstract
    This work continues the study of linear error correcting codes against adversarial insertion deletion errors (insdel errors). Previously, the work of Cheng, Guruswami, Haeupler, and Li \cite{CGHL21} showed the existence of asymptotically good linear insdel codes that can correct arbitrarily close to $1$ fraction of errors over some constant size alphabet, or achieve rate arbitrarily close to $1/2$ even over the binary alphabet. As shown in \cite{CGHL21}, these bounds are also the best possible. However, known explicit constructions in \cite{CGHL21}, and subsequent improved constructions by Con, Shpilka, and Tamo \cite{9770830} all fall short of meeting these bounds. Over any constant size alphabet, they can only achieve rate $< 1/8$ or correct $< 1/4$ fraction of errors; over the binary alphabet, they can only achieve rate $< 1/1216$ or correct $< 1/54$ fraction of errors. Apparently, previous techniques face inherent barriers to achieve rate better than $1/4$ or correct more than $1/2$ fraction of errors. In this work we give new constructions of such codes that meet these bounds, namely, asymptotically good linear insdel codes that can correct arbitrarily close to $1$ fraction of errors over some constant size alphabet, and binary asymptotically good linear insdel codes that can achieve rate arbitrarily close to $1/2$. All our constructions are efficiently encodable and decodable. Our constructions are based on a novel approach of code concatenation, which embeds the index information implicitly into codewords. This significantly differs from previous techniques and may be of independent interest. Finally, we also prove the existence of linear concatenated insdel codes with parameters that match random linear codes, and propose a conjecture about linear insdel codes.

Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions

  • Authors: Yicheng Luo, Jackie Kay, Edward Grefenstette, Marc Peter Deisenroth
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17396
  • Pdf link: https://arxiv.org/pdf/2303.17396
  • Abstract
    Offline reinforcement learning (RL) allows for the training of competent agents from offline datasets without any interaction with the environment. Online finetuning of such offline models can further improve performance. But how should we ideally finetune agents obtained from offline RL training? While offline RL algorithms can in principle be used for finetuning, in practice, their online performance improves slowly. In contrast, we show that it is possible to use standard online off-policy algorithms for faster improvement. However, we find this approach may suffer from policy collapse, where the policy undergoes severe performance deterioration during initial online learning. We investigate the issue of policy collapse and how it relates to data diversity, algorithm choices and online replay distribution. Based on these insights, we propose a conservative policy optimization procedure that can achieve stable and sample-efficient online learning from offline pretraining.

An Efficient Mobile Gateway Selection and Discovery Based-Routing Protocol in Heterogeneous LTE-VANET Networks

  • Authors: Driss Abada, Rachid Adrdor, Omar Boutkhoum, Adil Bohouch
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2303.17439
  • Pdf link: https://arxiv.org/pdf/2303.17439
  • Abstract
    Coupling cellular communication networks with vehicular ad hoc networks (VANETs) can be a very interesting way to provide Internet access to vehicles on the road. However, due to several specific characteristics of VANETs, making efficient multi-hop routing from vehicular sources to Internet gateways through Long Term Evolution (LTE) technology is still challenging. In this paper, an Internet mobile gateway selection scheme is proposed to elect the vehicles with the most potential to act as Internet gateways in VANETs. The discovery and selection of routes to those mobile gateways is then carried out via an efficient multiple-metric relay selection mechanism. The objective is to select the most reliable route to the mobile gateways, reducing the communication overhead and performing seamless handover. The proposed protocol is compared with a recent protocol in terms of packet delivery ratio, average end-to-end delay, and overhead. The results show that the proposed protocol significantly improves network performance in contrast with the other protocol.

NN-Copula-CD: A Copula-Guided Interpretable Neural Network for Change Detection in Heterogeneous Remote Sensing Images

  • Authors: Weiming Li, Xueqian Wang, Gang Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2303.17448
  • Pdf link: https://arxiv.org/pdf/2303.17448
  • Abstract
    Change detection (CD) in heterogeneous remote sensing images is a practical and challenging problem for real-life emergencies. In the past decade, the heterogeneous CD problem has significantly benefited from the development of deep neural networks (DNNs). However, data-driven DNNs often behave like a black box, and the lack of interpretability limits their trustworthiness and controllability in most practical CD applications. As a strong knowledge-driven tool for measuring correlation between random variables, Copula theory has been introduced into CD, yet its performance is not robust without manually selecting an appropriate Copula function a priori. To address these issues, we propose a knowledge-data-driven heterogeneous CD method (NN-Copula-CD) based on a Copula-guided interpretable neural network. In NN-Copula-CD, the mathematical characteristics of Copulas are designed as losses to supervise a simple fully connected neural network to learn the correlation between bi-temporal image patches, and the changed regions are then identified via binary classification over the correlation coefficients of all image patch pairs of the bi-temporal images. We conduct in-depth experiments on three datasets with multimodal images (e.g., optical, SAR, and NIR), where the quantitative results and visualized analysis demonstrate both the effectiveness and interpretability of the proposed NN-Copula-CD.
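
As a rough illustration of the rank-based dependence that Copula-guided losses build on (this is not the paper's network or loss), the sketch below computes Kendall's τ between co-located bi-temporal patches; a genuine change destroys the dependence and drives τ toward zero. The patch size and synthetic data are made up for the example.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
t1 = rng.normal(size=(16, 16))                    # patch from the first acquisition
t2_unchanged = 0.8 * t1 + 0.2 * rng.normal(size=t1.shape)
t2_changed = rng.normal(size=t1.shape)            # no dependence left after a change

for name, t2 in [("unchanged", t2_unchanged), ("changed", t2_changed)]:
    tau, _ = kendalltau(t1.ravel(), t2.ravel())   # rank correlation between patches
    print(f"{name}: Kendall tau = {tau:.3f}")
```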

HMES: A Scalable Human Mobility and Epidemic Simulation System with Fast Intervention Modeling

  • Authors: Haoyu Geng, Guanjie Zheng, Zhengqing Han, Hua Wei, Zhenhui Li
  • Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2303.17464
  • Pdf link: https://arxiv.org/pdf/2303.17464
  • Abstract
    Recently, the world has witnessed the most severe pandemic (COVID-19) of this century. Studies on epidemic prediction and simulation have received increasing attention. However, current methods suffer from three issues. First, most current studies focus on epidemic prediction, which cannot provide adequate support for intervention policy making. Second, most current interventions are based on population groups rather than fine-grained individuals, so measures cannot be targeted at the infected people and medical resources may be wasted. Third, current simulations are not efficient and flexible enough for large-scale complex systems. In this paper, we propose a new epidemic simulation framework called HMES to address these three challenges. The proposed framework covers a full pipeline of epidemic simulation and enables comprehensive fine-grained control at a large scale. In addition, we conduct experiments on real COVID-19 data. HMES demonstrates more accurate modeling of disease transmission for populations of up to 300 million people, with up to 3x acceleration compared to state-of-the-art methods.
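
For orientation, the compartment update that population-level epidemic simulators build on fits in a few lines; the sketch below is a plain discrete-time SIR model with made-up rates, not HMES's individual-level mobility model.

```python
import numpy as np

def sir_step(S, I, R, beta=0.3, gamma=0.1):
    """One day of a discrete-time SIR model on a well-mixed population."""
    Npop = S + I + R
    new_inf = beta * S * I / Npop   # new infections this step
    new_rec = gamma * I             # new recoveries this step
    return S - new_inf, I + new_inf - new_rec, R + new_rec

S, I, R = 300e6 - 100, 100.0, 0.0   # population scale from the abstract
for day in range(120):
    S, I, R = sir_step(S, I, R)
print(f"after 120 days: infected={I:,.0f}, recovered={R:,.0f}")
```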

PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation

  • Authors: Qitao Zhao, Ce Zheng, Mengyuan Liu, Pichao Wang, Chen Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17472
  • Pdf link: https://arxiv.org/pdf/2303.17472
  • Abstract
    Recently, transformer-based methods have gained significant success in sequential 2D-to-3D lifting human pose estimation. As a pioneering work, PoseFormer captures spatial relations of human joints in each video frame and human dynamics across frames with cascaded transformer layers and has achieved impressive performance. However, in real scenarios, the performance of PoseFormer and its follow-ups is limited by two factors: (a) The length of the input joint sequence; (b) The quality of 2D joint detection. Existing methods typically apply self-attention to all frames of the input sequence, causing a huge computational burden when the frame number is increased to obtain advanced estimation accuracy, and they are not robust to noise naturally brought by the limited capability of 2D joint detectors. In this paper, we propose PoseFormerV2, which exploits a compact representation of lengthy skeleton sequences in the frequency domain to efficiently scale up the receptive field and boost robustness to noisy 2D joint detection. With minimum modifications to PoseFormer, the proposed method effectively fuses features both in the time domain and frequency domain, enjoying a better speed-accuracy trade-off than its precursor. Extensive experiments on two benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) demonstrate that the proposed approach significantly outperforms the original PoseFormer and other transformer-based variants. Code is released at \url{https://github.com/QitaoZhao/PoseFormerV2}.

Efficient distributed representations beyond negative sampling

  • Authors: Lorenzo Dall'Amico, Enrico Maria Belliardo
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2303.17475
  • Pdf link: https://arxiv.org/pdf/2303.17475
  • Abstract
    This article describes an efficient method to learn distributed representations, also known as embeddings. This is accomplished by minimizing an objective function similar to the one introduced in the Word2Vec algorithm and later adopted in several works. The computational bottleneck of the optimization is the calculation of the softmax normalization constants, which requires a number of operations scaling quadratically with the sample size. This complexity is unsuited to large datasets, and negative sampling is a popular workaround that yields distributed representations in linear time with respect to the sample size. Negative sampling, however, changes the loss function and hence solves a different optimization problem from the one originally proposed. Our contribution is to show that the softmax normalization constants can be estimated in linear time, allowing us to design an efficient optimization strategy to learn distributed representations. We test our approximation on two popular applications related to word and node embeddings. The results show accuracy competitive with negative sampling at a remarkably lower computational time.
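
To make the bottleneck concrete: the exact partition function of the softmax costs O(n) per word, O(n²) per epoch. One simple (not necessarily the authors') linear-time estimator is a uniform Monte Carlo sample of the sum; the sketch below compares it against the exact value on random embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 64                       # vocabulary size, embedding dimension
U = rng.normal(scale=0.1, size=(n, d))  # "input" embeddings
V = rng.normal(scale=0.1, size=(n, d))  # "output" embeddings

def softmax_partition_mc(i, m=256):
    """Monte Carlo estimate of Z_i = sum_j exp(U[i] @ V[j]) from m uniform samples."""
    idx = rng.integers(0, n, size=m)
    return n * np.exp(U[i] @ V[idx].T).mean()   # unbiased: n * sample mean

i = 42
z_hat = softmax_partition_mc(i)
z_true = np.exp(U[i] @ V.T).sum()               # O(n) exact reference
print(f"estimate {z_hat:.2f} vs exact {z_true:.2f}")
```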

Teaching contact-rich tasks from visual demonstrations by constraint extraction

  • Authors: Christian Hegeler, Filippo Rozzi, Loris Roveda, Kevin Haninger
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17481
  • Pdf link: https://arxiv.org/pdf/2303.17481
  • Abstract
    Contact-rich manipulation involves kinematic constraints on the task motion, typically with discrete transitions between these constraints during the task. Allowing the robot to detect and reason about these contact constraints can support robust and dynamic manipulation, but how can these contact models be efficiently learned? Purely visual observations are an attractive data source, allowing passive task demonstrations with unmodified objects. Existing approaches for vision-only learning from demonstration are effective in pick-and-place applications and planar tasks. Nevertheless, accuracy/occlusions and unobserved task dynamics can limit their robustness in contact-rich manipulation. To use visual demonstrations for contact-rich robotic tasks, we consider the demonstration of pose trajectories with transitions between holonomic kinematic constraints, first clustering the trajectories into discrete contact modes, then fitting kinematic constraints per each mode. The fit constraints are then used to (i) detect contact online with force/torque measurements and (ii) plan the robot policy with respect to the active constraint. We demonstrate the approach with real experiments, on cabling and rake tasks, showing the approach gives robust manipulation through contact transitions.
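
To make the "fitting kinematic constraints per mode" step concrete, here is a minimal sketch of one plausible fit: a planar holonomic constraint recovered from a clustered segment of demonstrated positions via SVD. The synthetic data and the plane model are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit: returns (normal, offset) with normal @ p ~= offset.
    The normal is the right singular vector of the centered points with the
    smallest singular value -- a minimal model of one holonomic contact constraint."""
    centroid = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - centroid)
    normal = Vt[-1]
    return normal, normal @ centroid

rng = np.random.default_rng(0)
# Synthetic demonstration segment: positions sliding on the plane z = 0.1.
pts = np.column_stack([rng.uniform(-1, 1, (200, 2)), np.full(200, 0.1)])
pts += rng.normal(scale=1e-3, size=pts.shape)    # sensor noise
n, d = fit_plane(pts)
print(np.round(n, 3), round(float(d), 3))        # ~[0, 0, +/-1], ~+/-0.1
```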

Edge Ranking of Graphs in Transportation Networks using a Graph Neural Network (GNN)

  • Authors: Debasish Jana, Sven Malama, Sriram Narasimhan, Ertugrul Taciroglu
  • Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17485
  • Pdf link: https://arxiv.org/pdf/2303.17485
  • Abstract
    Many networks, such as transportation, power, and water distribution networks, can be represented as graphs. A crucial challenge in graph representations is identifying the importance of graph edges and their influence on overall network efficiency and information flow performance. For example, important edges in a transportation network are those roads that, when affected, significantly alter the network's overall efficiency. A commonly used approach to finding such important edges is ``edge betweenness centrality'' (EBC), an edge ranking measure that determines the influential edges of the graph based on connectivity and information spread. Computing the EBC with the common Brandes algorithm involves calculating the shortest paths for every node pair, which can be computationally expensive and restrictive, especially for large graphs. Changes in the graph parameters, e.g., in the edge weights or through the addition and deletion of nodes or edges, require recalculating the EBC. As the main contribution, we propose an approximate method to estimate the EBC using a Graph Neural Network (GNN), a deep learning-based approach. We show that it is computationally efficient compared to the conventional method, especially for large graphs. The proposed GNN-based edge ranking is evaluated on several synthetic graphs and a real-world transportation data set. We show that this framework can estimate the approximate edge ranking much faster than the conventional method. The approach is inductive, i.e., training and testing are performed on different sets of graphs with varying numbers of nodes and edges. The proposed method is especially suitable for applications on large-scale networks where edge information is desired, for example, in urban infrastructure improvement projects, power and water network resilience analyses, and optimizing resource allocation in engineering networks.
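
The expensive baseline the GNN approximates is easy to state in code: NetworkX ships Brandes' algorithm for exact EBC. The snippet below ranks edges on a small example graph; the paper's contribution is replacing this O(nm) computation with a learned, inductive estimate.

```python
import networkx as nx

# Exact EBC via Brandes' algorithm -- the costly baseline the GNN approximates.
G = nx.karate_club_graph()
ebc = nx.edge_betweenness_centrality(G, normalized=True)

# Rank edges from most to least "important".
ranking = sorted(ebc.items(), key=lambda kv: kv[1], reverse=True)
for (u, v), score in ranking[:5]:
    print(f"edge ({u}, {v}): EBC = {score:.4f}")
```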

3D Line Mapping Revisited

  • Authors: Shaohui Liu, Yifan Yu, Rémi Pautrat, Marc Pollefeys, Viktor Larsson
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17504
  • Pdf link: https://arxiv.org/pdf/2303.17504
  • Abstract
    In contrast to sparse keypoints, a handful of line segments can concisely encode the high-level scene layout, as they often delineate the main structural elements. In addition to offering strong geometric cues, they are also omnipresent in urban landscapes and indoor scenes. Despite their apparent advantages, current line-based reconstruction methods are far behind their point-based counterparts. In this paper we aim to close the gap by introducing LIMAP, a library for 3D line mapping that robustly and efficiently creates 3D line maps from multi-view imagery. This is achieved through revisiting the degeneracy problem of line triangulation, carefully crafted scoring and track building, and exploiting structural priors such as line coincidence, parallelism, and orthogonality. Our code integrates seamlessly with existing point-based Structure-from-Motion methods and can leverage their 3D points to further improve the line reconstruction. Furthermore, as a byproduct, the method is able to recover 3D association graphs between lines and points / vanishing points (VPs). In thorough experiments, we show that LIMAP significantly outperforms existing approaches for 3D line mapping. Our robust 3D line maps also open up new research directions. We show two example applications: visual localization and bundle adjustment, where integrating lines alongside points yields the best results. Code is available at https://github.com/cvg/limap.

Sum-of-Squares Lower Bounds for Densest $k$-Subgraph

  • Authors: Chris Jones, Aaron Potechin, Goutham Rajendran, Jeff Xu
  • Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2303.17506
  • Pdf link: https://arxiv.org/pdf/2303.17506
  • Abstract
    Given a graph and an integer $k$, Densest $k$-Subgraph is the algorithmic task of finding the subgraph on $k$ vertices with the maximum number of edges. This is a fundamental problem that has been subject to intense study for decades, with applications spanning a wide variety of fields. The state-of-the-art algorithm is an $O(n^{1/4 + \epsilon})$-factor approximation (for any $\epsilon > 0$) due to Bhaskara et al. [STOC '10]. Moreover, the so-called log-density framework predicts that this is optimal, i.e. it is impossible for an efficient algorithm to achieve an $O(n^{1/4 - \epsilon})$-factor approximation. In the average case, Densest $k$-Subgraph is a prototypical noisy inference task which is conjectured to exhibit a statistical-computational gap. In this work, we provide the strongest evidence yet of hardness for Densest $k$-Subgraph by showing matching lower bounds against the powerful Sum-of-Squares (SoS) algorithm, a meta-algorithm based on convex programming that achieves state-of-the-art algorithmic guarantees for many optimization and inference problems. For $k \leq n^{\frac{1}{2}}$, we obtain a degree $n^{\delta}$ SoS lower bound for the hard regime as predicted by the log-density framework. To show this, we utilize the modern framework for proving SoS lower bounds on average-case problems pioneered by Barak et al. [FOCS '16]. A key issue is that small denser-than-average subgraphs in the input will greatly affect the value of the candidate pseudoexpectation operator around the subgraph. To handle this challenge, we devise a novel matrix factorization scheme based on the positive minimum vertex separator. We then prove an intersection tradeoff lemma to show that the error terms when using this separator are indeed small.

Learning in Factored Domains with Information-Constrained Visual Representations

  • Authors: Tyler Malloy, Miao Liu, Matthew D. Riemer, Tim Klinger, Gerald Tesauro, Chris R. Sims
  • Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Neurons and Cognition (q-bio.NC)
  • Arxiv link: https://arxiv.org/abs/2303.17508
  • Pdf link: https://arxiv.org/pdf/2303.17508
  • Abstract
    Humans learn quickly even in tasks that contain complex visual information. This is due in part to the efficient formation of compressed representations of visual information, allowing for better generalization and robustness. However, compressed representations alone are insufficient for explaining the high speed of human learning. Reinforcement learning (RL) models that seek to replicate this impressive efficiency may do so through the use of factored representations of tasks. These informationally simplistic representations of tasks are similarly motivated as the use of compressed representations of visual information. Recent studies have connected biological visual perception to disentangled and compressed representations. This raises the question of how humans learn to efficiently represent visual information in a manner useful for learning tasks. In this paper we present a model of human factored representation learning based on an altered form of a $\beta$-Variational Auto-encoder used in a visual learning task. Modelling results demonstrate a trade-off, governed by the informational complexity of the model's latent space, between the speed of learning and the accuracy of reconstructions.

Hybrid Dealiasing of Complex Convolutions

  • Authors: Noel Murasko, John C. Bowman
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.17510
  • Pdf link: https://arxiv.org/pdf/2303.17510
  • Abstract
    Efficient algorithms for computing linear convolutions based on the fast Fourier transform are developed. A hybrid approach is described that combines the conventional practice of explicit dealiasing (explicitly padding the input data with zeros) and implicit dealiasing (mathematically accounting for these zero values). The new approach generalizes implicit dealiasing to arbitrary padding ratios and includes explicit dealiasing as a special case. Unlike existing implementations of implicit dealiasing, hybrid dealiasing tailors its subtransform sizes to the convolution geometry. Multidimensional convolutions are implemented with hybrid dealiasing by decomposing them into lower-dimensional convolutions. Convolutions of complex-valued and Hermitian inputs of equal length are illustrated with pseudocode and implemented in the open-source FFTW++ library. Hybrid dealiasing is shown to outperform explicit dealiasing in one, two, and three dimensions.
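
As context, the conventional explicit approach pads the inputs with zeros so that the circular FFT convolution equals the linear one. The sketch below shows that baseline in 1-D for complex inputs; hybrid dealiasing replaces the explicit padding with implicit accounting and tailored subtransform sizes, which this snippet does not attempt.

```python
import numpy as np

def linear_convolution_fft(f, g):
    """Explicitly dealiased (zero-padded) linear convolution via the FFT."""
    n = len(f) + len(g) - 1           # full linear-convolution length
    m = 1 << (n - 1).bit_length()     # next power of two, for a fast FFT
    F = np.fft.fft(f, m)              # fft(x, m) zero-pads x to length m
    G = np.fft.fft(g, m)
    return np.fft.ifft(F * G)[:n]

f = np.array([1 + 2j, 3 - 1j, 0.5j])
g = np.array([2, -1j, 1, 4])
assert np.allclose(linear_convolution_fft(f, g), np.convolve(f, g))
```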

Power-Optimal HARQ Protocol for Reliable Free Space Optical Communication

  • Authors: Georgios D. Chondrogiannis, Nikos A. Mitsiou, Nestor D. Chatzidiamantis, Alexandros-Apostolos A. Boulogeorgos, George K. Karagiannidis
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2303.17512
  • Pdf link: https://arxiv.org/pdf/2303.17512
  • Abstract
    This paper investigates the usage of hybrid automatic repeat request (HARQ) protocols for power-efficient and reliable communications over free space optical (FSO) links. By exploiting the large coherence time of the FSO channel, the proposed transmission schemes combat turbulence-induced fading by retransmitting the failed packets in the same coherence interval. To assess the performance of the presented HARQ technique, we extract a theoretical framework for the outage performance. In more detail, a closed-form expression for the outage probability (OP) is reported and an approximation for the high signal-to-noise ratio (SNR) region is extracted. Building upon the theoretical framework, we formulate a transmission power allocation problem throughout the retransmission rounds. This optimization problem is solved numerically through the use of an iterative algorithm. In addition, the average throughput of the HARQ schemes under consideration is examined. Simulation results validate the theoretical analysis under different turbulence conditions and demonstrate the performance improvement, in terms of both OP and throughput, of the proposed HARQ schemes compared to fixed transmit power HARQ benchmarks.

Nonlinear Approximation with Subsampled Rank-1 Lattices

  • Authors: Felix Bartel, Fabian Taubert
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.17541
  • Pdf link: https://arxiv.org/pdf/2303.17541
  • Abstract
    In this paper we approximate high-dimensional functions $f\colon\mathbb T^d\to\mathbb C$ by sparse trigonometric polynomials based on function evaluations. Recently it was shown that a dimension-incremental sparse Fourier transform (SFT) approach does not require the signal to be exactly sparse and is applicable in this setting. We combine this approach with subsampling techniques for rank-1 lattices. This way our approach benefits from the underlying structure in the sampling points making fast Fourier algorithms applicable whilst achieving the good sampling complexity of random points (logarithmic oversampling). In our analysis we show detection guarantees of the frequencies corresponding to the Fourier coefficients of largest magnitude. In numerical experiments we make a comparison to full rank-1 lattices and uniformly random points to confirm our findings.
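
For a reader new to rank-1 lattices: with generating vector $z$ and size $n$, the nodes are $x_i = (i z / n) \bmod 1$. The sketch below generates such nodes and evaluates a sparse trigonometric polynomial on them; the generating vector, frequencies, and coefficients are illustrative choices, not the paper's construction.

```python
import numpy as np

def rank1_lattice(z, n):
    """Nodes x_i = (i * z / n) mod 1 of a rank-1 lattice in [0, 1)^d."""
    i = np.arange(n)[:, None]
    return (i * np.asarray(z)[None, :] / n) % 1.0

z, n = (1, 33), 101             # illustrative generating vector and lattice size
X = rank1_lattice(z, n)         # shape (n, 2)

# Evaluate a sparse trigonometric polynomial sum_k c_k exp(2*pi*i <k, x>).
freqs = np.array([[0, 0], [1, 2], [-3, 1]])
coefs = np.array([1.0, 0.5, 0.25j])
vals = np.exp(2j * np.pi * X @ freqs.T) @ coefs
print(vals[:3])
```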

Active User Identification in Fast Fading Massive Random Access Channels

  • Authors: Jyotish Robin, Elza Erkip
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2303.17543
  • Pdf link: https://arxiv.org/pdf/2303.17543
  • Abstract
    Reliable and prompt identification of active users is critical for enabling random access in massive machine-to-machine type networks which typically operate within stringent access delay and energy constraints. In this paper, an energy efficient active user identification protocol is envisioned in which the active users simultaneously transmit On-Off Keying (OOK) modulated preambles whereas the base station uses non-coherent detection to avoid the channel estimation overheads. The minimum number of channel-uses required for active user identification in the asymptotic regime of total number of users $\ell$ when the number of active devices $k$ scales as $k = \Theta(1)$ is characterized along with an achievability scheme relying on the equivalence of activity detection to a group testing problem. A practical scheme for active user identification based on a belief propagation strategy is also proposed and its performance is compared against the theoretical bounds.
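
The group testing equivalence can be simulated in a few lines. Below is a classic non-adaptive scheme with the COMP decoder, where each "test" models one non-coherent OOK channel use (energy is detected iff some active user transmits); this is an illustration of the reduction, not the paper's achievability scheme or the belief propagation decoder.

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, k, n_tests = 1000, 5, 120

active = rng.choice(n_users, size=k, replace=False)
A = rng.random((n_tests, n_users)) < 0.2       # which users transmit in each slot

# Non-coherent OR channel: energy detected iff any active user transmits.
y = A[:, active].any(axis=1)

# COMP decoder: a user is declared inactive if it transmits in a negative slot.
declared = np.ones(n_users, dtype=bool)
for t in np.flatnonzero(~y):
    declared[A[t]] = False

print("true:", sorted(active), "declared:", np.flatnonzero(declared).tolist())
```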

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

  • Authors: Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao, Jiang Bian
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2303.17550
  • Pdf link: https://arxiv.org/pdf/2303.17550
  • Abstract
    While recent research has made significant progress in speech-driven talking face generation, the quality of the generated video still lags behind that of real recordings. One reason for this is the use of handcrafted intermediate representations like facial landmarks and 3DMM coefficients, which are designed based on human knowledge and are insufficient to precisely describe facial movements. Additionally, these methods require an external pretrained model for extracting these representations, whose performance sets an upper bound on talking face generation. To address these limitations, we propose a novel method called DAE-Talker that leverages data-driven latent representations obtained from a diffusion autoencoder (DAE). DAE contains an image encoder that encodes an image into a latent vector and a DDIM image decoder that reconstructs the image from it. We train our DAE on talking face video frames and then extract their latent representations as the training target for a Conformer-based speech2latent model. This allows DAE-Talker to synthesize full video frames and produce natural head movements that align with the content of speech, rather than relying on a predetermined head pose from a template video. We also introduce pose modelling in speech2latent for pose controllability. Additionally, we propose a novel method for generating continuous video frames with the DDIM image decoder trained on individual frames, eliminating the need for modelling the joint distribution of consecutive frames directly. Our experiments show that DAE-Talker outperforms existing popular methods in lip-sync, video fidelity, and pose naturalness. We also conduct ablation studies to analyze the effectiveness of the proposed techniques and demonstrate the pose controllability of DAE-Talker.

DDP: Diffusion Model for Dense Visual Prediction

  • Authors: Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17559
  • Pdf link: https://arxiv.org/pdf/2303.17559
  • Abstract
    We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a random Gaussian distribution, guided by the image. The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline. Without task-specific design and architecture customization, DDP is easy to generalize to most dense prediction tasks, e.g., semantic segmentation and depth estimation. In addition, DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods. We show top results on three representative tasks across six diverse benchmarks; without tricks, DDP achieves state-of-the-art or competitive performance on each task compared to the specialist counterparts, for example, semantic segmentation (83.9 mIoU on Cityscapes), BEV map segmentation (70.6 mIoU on nuScenes), and depth estimation (0.05 REL on KITTI). We hope that our approach will serve as a solid baseline and facilitate future research.

Using AI to Measure Parkinson's Disease Severity at Home

  • Authors: Md Saiful Islam, Wasifur Rahman, Abdelrahman Abdelkader, Phillip T. Yang, Sangwu Lee, Jamie L. Adams, Ruth B. Schneider, E. Ray Dorsey, Ehsan Hoque
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17573
  • Pdf link: https://arxiv.org/pdf/2303.17573
  • Abstract
    We present an artificial intelligence system to remotely assess the motor performance of individuals with Parkinson's disease (PD). Participants performed a motor task (i.e., tapping fingers) in front of a webcam, and data from 250 global participants were rated by three expert neurologists following the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS). The neurologists' ratings were highly reliable, with an intra-class correlation coefficient (ICC) of 0.88. We developed computer algorithms to obtain objective measurements that align with the MDS-UPDRS guideline and are strongly correlated with the neurologists' ratings. Our machine learning model trained on these measures outperformed an MDS-UPDRS certified rater, with a mean absolute error (MAE) of 0.59 compared to the rater's MAE of 0.79. However, the model performed slightly worse than the expert neurologists (0.53 MAE). The methodology can be replicated for similar motor tasks, providing the possibility of evaluating individuals with PD and other movement disorders remotely, objectively, and in areas with limited access to neurological care.

Human-Robot Interaction using VAHR: Virtual Assistant, Human, and Robots in the Loop

  • Authors: Ahmad Amine, Mostafa Aldilati, Hadi Hasan, Noel Maalouf, Imad H. Elhajj
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17582
  • Pdf link: https://arxiv.org/pdf/2303.17582
  • Abstract
    Robots have become ubiquitous tools in various industries and households, highlighting the importance of human-robot interaction (\textbf{HRI}). This has increased the need for easy and accessible communication between humans and robots. Recent research has focused on the intersection of virtual assistant technology, such as Amazon's Alexa, with robots and its effect on HRI. This paper presents the Virtual Assistant, Human, and Robots in the loop (VAHR) system, which utilizes bidirectional communication to control multiple robots through Alexa. VAHR's performance was evaluated through a human-subjects experiment, comparing objective and subjective metrics of traditional keyboard and mouse interfaces to VAHR. The results showed that VAHR required 41% less Robot Attention Demand and ensured 91% more Fan-out time compared to the standard method. Additionally, VAHR led to a 62.5% improvement in multi-tasking, highlighting the potential for efficient human-robot interaction in physically- and mentally-demanding scenarios. However, subjective metrics revealed a need for human operators to build confidence and trust with this new method of operation.

Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models

  • Authors: Eric Zhang, Kai Wang, Xingqian Xu, Zhangyang Wang, Humphrey Shi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17591
  • Pdf link: https://arxiv.org/pdf/2303.17591
  • Abstract
    The unlearning problem of deep learning models, once primarily an academic concern, has become a prevalent issue in the industry. The significant advances in text-to-image generation techniques have prompted global discussions on privacy, copyright, and safety, as numerous unauthorized personal IDs, content, artistic creations, and potentially harmful materials have been learned by these models and later utilized to generate and distribute uncontrolled content. To address this challenge, we propose \textbf{Forget-Me-Not}, an efficient and low-cost solution designed to safely remove specified IDs, objects, or styles from a well-configured text-to-image model in as little as 30 seconds, without impairing its ability to generate other content. Alongside our method, we introduce the \textbf{Memorization Score (M-Score)} and \textbf{ConceptBench} to measure the models' capacity to generate general concepts, grouped into three primary categories: ID, object, and style. Using M-Score and ConceptBench, we demonstrate that Forget-Me-Not can effectively eliminate targeted concepts while maintaining the model's performance on other concepts. Furthermore, Forget-Me-Not offers two practical extensions: a) removal of potentially harmful or NSFW content, and b) enhancement of model accuracy, inclusion and diversity through \textbf{concept correction and disentanglement}. It can also be adapted as a lightweight model patch for Stable Diffusion, allowing for concept manipulation and convenient distribution. To encourage future research in this critical area and promote the development of safe and inclusive generative models, we will open-source our code and ConceptBench at \href{https://github.com/SHI-Labs/Forget-Me-Not}{https://github.com/SHI-Labs/Forget-Me-Not}.

MobileInst: Video Instance Segmentation on the Mobile

  • Authors: Renhong Zhang, Tianheng Cheng, Shusheng Yang, Haoyi Jiang, Shuai Zhang, Jiancheng Lyu, Xin Li, Xiaowen Ying, Dashan Gao, Wenyu Liu, Xinggang Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17594
  • Pdf link: https://arxiv.org/pdf/2303.17594
  • Abstract
    Although recent approaches aiming for video instance segmentation have achieved promising results, it is still difficult to employ those approaches for real-world applications on mobile devices, which mainly suffer from (1) heavy computation and memory cost and (2) complicated heuristics for tracking objects. To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices. Firstly, MobileInst adopts a mobile vision transformer to extract multi-level semantic features and presents an efficient query-based dual-transformer instance decoder for mask kernels and a semantic-enhanced mask decoder to generate instance segmentation per frame. Secondly, MobileInst exploits simple yet effective kernel reuse and kernel association to track objects for video instance segmentation. Further, we propose temporal query passing to enhance the tracking ability for kernels. We conduct experiments on COCO and YouTube-VIS datasets to demonstrate the superiority of MobileInst and evaluate the inference latency on a mobile CPU core of Qualcomm Snapdragon-778G, without other methods of acceleration. On the COCO dataset, MobileInst achieves 30.5 mask AP and 176 ms on the mobile CPU, which reduces the latency by 50% compared to the previous SOTA. For video instance segmentation, MobileInst achieves 35.0 AP on YouTube-VIS 2019 and 30.1 AP on YouTube-VIS 2021. Code will be available to facilitate real-world applications and future research.

Token Merging for Fast Stable Diffusion

  • Authors: Daniel Bolya, Judy Hoffman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17604
  • Pdf link: https://arxiv.org/pdf/2303.17604
  • Abstract
    The landscape of image generation has been forever changed by open vocabulary diffusion models. However, at their core these models use transformers, which makes generation slow. Better implementations to increase the throughput of these transformers have emerged, but they still evaluate the entire model. In this paper, we instead speed up diffusion models by merging redundant tokens, exploiting the natural redundancy in generated images. After making some diffusion-specific improvements to Token Merging (ToMe), our ToMe for Stable Diffusion can reduce the number of tokens in an existing Stable Diffusion model by up to 60% while still producing high quality images without any extra training. In the process, we speed up image generation by up to 2x and reduce memory consumption by up to 5.6x. Furthermore, this speed-up stacks with efficient implementations such as xFormers, minimally impacting quality while being up to 5.4x faster for large images. Code is available at https://github.com/dbolya/tomesd.
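
To show the flavor of token merging, here is a deliberately simplified sketch that averages the r most similar token pairs across an alternating split. This is not the official ToMe bipartite soft matching (see the repository above for that); it only illustrates why merging shrinks the token count without retraining.

```python
import numpy as np

def merge_tokens(x, r):
    """Average the r most similar (A, B) token pairs -- a simplified sketch,
    not the official ToMe bipartite soft matching."""
    a, b = x[0::2].copy(), x[1::2].copy()    # alternate split into two sets
    an = a / np.linalg.norm(a, axis=1, keepdims=True)
    bn = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = an @ bn.T                          # cosine similarity between the sets
    best = sim.argmax(axis=1)                # best partner in B for each A-token
    order = np.argsort(-sim[np.arange(len(a)), best])
    merged = set(order[:r].tolist())         # A-tokens absorbed into B
    for i in order[:r]:
        b[best[i]] = (a[i] + b[best[i]]) / 2
    keep = np.array([i for i in range(len(a)) if i not in merged], dtype=int)
    return np.vstack([a[keep], b]) if len(keep) else b

tokens = np.random.default_rng(0).normal(size=(64, 32))  # 64 tokens, dim 32
print(merge_tokens(tokens, r=16).shape)                  # -> (48, 32)
```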

SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

  • Authors: Xuanyao Chen, Zhijian Liu, Haotian Tang, Li Yi, Hang Zhao, Song Han
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17605
  • Pdf link: https://arxiv.org/pdf/2303.17605
  • Abstract
    High-resolution images enable neural networks to learn richer visual representations. However, this improved performance comes at the cost of growing computational complexity, hindering their usage in latency-sensitive applications. As not all pixels are equal, skipping computations for less-important regions offers a simple and effective measure to reduce the computation. This, however, is hard to be translated into actual speedup for CNNs since it breaks the regularity of the dense convolution workload. In this paper, we introduce SparseViT that revisits activation sparsity for recent window-based vision transformers (ViTs). As window attentions are naturally batched over blocks, actual speedup with window activation pruning becomes possible: i.e., ~50% latency reduction with 60% sparsity. Different layers should be assigned with different pruning ratios due to their diverse sensitivities and computational costs. We introduce sparsity-aware adaptation and apply the evolutionary search to efficiently find the optimal layerwise sparsity configuration within the vast search space. SparseViT achieves speedups of 1.5x, 1.4x, and 1.3x compared to its dense counterpart in monocular 3D object detection, 2D instance segmentation, and 2D semantic segmentation, respectively, with negligible to no loss of accuracy.
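
A minimal sketch of the window-pruning idea (my own toy scoring, not SparseViT's learned, layerwise-searched configuration): score non-overlapping windows by mean activation magnitude and keep only the top fraction for subsequent attention.

```python
import numpy as np

def select_windows(feat, window=8, keep_ratio=0.4):
    """Score non-overlapping windows by mean L2 activation magnitude and keep
    the top `keep_ratio` fraction -- a sketch of window activation pruning."""
    H, W, C = feat.shape
    h, w = H // window, W // window
    wins = feat[:h * window, :w * window].reshape(h, window, w, window, C)
    score = np.linalg.norm(wins, axis=-1).mean(axis=(1, 3))   # (h, w) scores
    k = max(1, int(keep_ratio * h * w))
    kept = np.argsort(-score.ravel())[:k]
    return np.stack(np.unravel_index(kept, (h, w)), axis=1)   # kept window coords

feat = np.random.default_rng(0).rayleigh(size=(64, 64, 96))   # dummy feature map
print(select_windows(feat).shape)                             # (k, 2)
```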

Keyword: faster

Urgency-aware Routing in Single Origin-destination Itineraries through Artificial Currencies

  • Authors: Leonardo Pedroso, W.P.M.H. Heemels, Mauro Salazar
  • Subjects: Systems and Control (eess.SY); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2303.16945
  • Pdf link: https://arxiv.org/pdf/2303.16945
  • Abstract
    Within mobility systems, the presence of self-interested users can lead to aggregate routing patterns that are far from the societal optimum which could be achieved by centrally controlling the users' choices. In this paper, we design a fair incentive mechanism to steer the selfish behavior of the users to align with the societally optimal aggregate routing. The proposed mechanism is based on an artificial currency that cannot be traded or bought, but only spent or received when traveling. Specifically, we consider a parallel-arc network with a single origin and destination node within a repeated game setting whereby each user chooses one of the available arcs to reach their destination on a daily basis. In this framework, taking faster routes comes at a cost, whereas taking slower routes is incentivized by a reward. The users are thus playing against their future selves when choosing their present actions. To capture this complex behavior, we assume the users to be rational and to minimize an urgency-weighted combination of their immediate and future discomfort. To design the optimal pricing, we first derive a closed-form expression for the best individual response strategy. Second, we formulate the pricing design problem for each arc to achieve the societally optimal aggregate flows, and reformulate it so that it can be solved with gradient-free optimization methods. Our numerical simulations show that it is possible to achieve a near-optimal routing whilst significantly reducing the users' perceived discomfort when compared to a centralized optimal but urgency-unaware policy.

PopSparse: Accelerated block sparse matrix multiplication on IPU

  • Authors: Zhiyi Li, Douglas Orr, Valeriu Ohan, Godfrey Da costa, Tom Murray, Adam Sanders, Deniz Beker, Dominic Masters
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16999
  • Pdf link: https://arxiv.org/pdf/2303.16999
  • Abstract
    Reducing the computational cost of running large scale neural networks using sparsity has attracted great attention in the deep learning community. While much success has been achieved in reducing FLOP and parameter counts while maintaining acceptable task performance, achieving actual speed improvements has typically been much more difficult, particularly on general purpose accelerators (GPAs) such as NVIDIA GPUs using low precision number formats. In this work we introduce PopSparse, a library that enables fast sparse operations on Graphcore IPUs by leveraging both the unique hardware characteristics of IPUs as well as any block structure defined in the data. We target two different types of sparsity: static, where the sparsity pattern is fixed at compile-time; and dynamic, where it can change each time the model is run. We present benchmark results for matrix multiplication for both of these modes on IPU with a range of block sizes, matrix sizes and densities. Results indicate that the PopSparse implementations are faster than dense matrix multiplications on IPU at a range of sparsity levels with large matrix size and block size. Furthermore, static sparsity in general outperforms dynamic sparsity. While previous work on GPAs has shown speedups only for very high sparsity (typically 99% and above), the present work demonstrates that our static sparse implementation outperforms equivalent dense calculations in FP16 at lower sparsity (around 90%).
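
The computation PopSparse accelerates can be stated plainly: for a static block-sparse matrix, only the blocks a mask marks nonzero do any work. The numpy sketch below shows that reference semantics (the library itself does this with IPU-specific kernels, which this does not model).

```python
import numpy as np

def block_sparse_matmul(blocks, mask, X, bs):
    """Y = A @ X where A is block-sparse: mask[i, j] marks nonzero (bs x bs)
    blocks, stored densely in `blocks`. Only nonzero blocks do work."""
    br, bc = mask.shape
    Y = np.zeros((br * bs, X.shape[1]))
    for i, j in zip(*np.nonzero(mask)):
        Y[i * bs:(i + 1) * bs] += blocks[i, j] @ X[j * bs:(j + 1) * bs]
    return Y

rng = np.random.default_rng(0)
bs, br, bc = 16, 8, 8
mask = rng.random((br, bc)) < 0.1                        # ~90% block sparsity
blocks = rng.normal(size=(br, bc, bs, bs)) * mask[..., None, None]
X = rng.normal(size=(bc * bs, 32))

A = blocks.transpose(0, 2, 1, 3).reshape(br * bs, bc * bs)   # dense reference
assert np.allclose(block_sparse_matmul(blocks, mask, X, bs), A @ X)
```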

Overcoming Challenges to Continuous Integration in HPC

  • Authors: Todd Gamblin, Daniel S. Katz
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.17034
  • Pdf link: https://arxiv.org/pdf/2303.17034
  • Abstract
    Continuous integration (CI) has become a ubiquitous practice in modern software development, with major code hosting services offering free automation on popular platforms. CI offers major benefits, as it enables detecting bugs in code prior to committing changes. While high-performance computing (HPC) research relies heavily on software, HPC machines are not considered "common" platforms. This presents several challenges that hinder the adoption of CI in HPC environments, making it difficult to maintain bug-free HPC projects, and resulting in adverse effects on the research community. In this article, we explore the challenges that impede HPC CI, such as hardware diversity, security, isolation, administrative policies, and non-standard authentication, environments, and job submission mechanisms. We propose several solutions that could enhance the quality of HPC software and the experience of developers. Implementing these solutions would require significant changes at HPC centers, but if these changes are made, it would ultimately enable faster and better science.

ACM with Overlapping Partitions: Implementation and Periodicity Analysis

  • Authors: Anthony O'Dea
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2303.17069
  • Pdf link: https://arxiv.org/pdf/2303.17069
  • Abstract
    The Arnold Cat Map (ACM) is a popular chaotic map used in image encryption. Chaotic maps are known for their sensitivity to initial conditions and their ability to mix, or rearrange, pixels. However, ACM is periodic, and the period is relatively short. This periodicity decreases the effective key space for a cryptosystem. Further, ACM can only be performed on square matrices. For non-square images, this issue can be solved by performing ACM on multiple square partitions of the image. If these partitions overlap, the periodicity will greatly increase. The resulting system will be referred to as overlapping ACM or OACM. This paper will cover the implementation and periodicity analysis for these overlapping systems, which previous papers involving similar overlapping block partitions did not. Viewing OACM as a scan as opposed to a map allows for faster implementation and period analysis.
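
For readers unfamiliar with ACM: one iteration maps pixel $(x, y)$ to $((x + y) \bmod N, (x + 2y) \bmod N)$ on an $N \times N$ image, and the short period the abstract mentions can be found by iterating until the identity returns. The sketch below implements the plain (non-overlapping) map; it does not reproduce the paper's overlapping-partition scheme.

```python
import numpy as np

def arnold_cat(img):
    """One Arnold cat map step on a square image: (x, y) -> (x + y, x + 2y) mod N."""
    N = img.shape[0]
    x, y = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    out = np.empty_like(img)
    out[(x + y) % N, (x + 2 * y) % N] = img   # the map is a bijection on the grid
    return out

def period(N):
    """Smallest p with ACM^p = identity on an N x N grid."""
    img0 = np.arange(N * N).reshape(N, N)
    img, p = arnold_cat(img0), 1
    while not np.array_equal(img, img0):
        img, p = arnold_cat(img), p + 1
    return p

print(period(124))   # small relative to the keyspace, hence the security concern
```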

TreePiece: Faster Semantic Parsing via Tree Tokenization

  • Authors: Sid Wang, Akshat Shrivastava, Sasha Livshits
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17161
  • Pdf link: https://arxiv.org/pdf/2303.17161
  • Abstract
    Autoregressive (AR) encoder-decoder neural networks have proved successful in many NLP problems, including Semantic Parsing -- a task that translates natural language to machine-readable parse trees. However, the sequential prediction process of AR models can be slow. To accelerate AR for semantic parsing, we introduce a new technique called TreePiece that tokenizes a parse tree into subtrees and generates one subtree per decoding step. On TopV2 benchmark, TreePiece shows 4.6 times faster decoding speed than standard AR, and comparable speed but significantly higher accuracy compared to Non-Autoregressive (NAR).

DPP-based Client Selection for Federated Learning with Non-IID Data

  • Authors: Yuxuan Zhang, Chao Xu, Howard H. Yang, Xijun Wang, Tony Q. S. Quek
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17358
  • Pdf link: https://arxiv.org/pdf/2303.17358
  • Abstract
    This paper proposes a client selection (CS) method to tackle the communication bottleneck of federated learning (FL) while concurrently coping with FL's data heterogeneity issue. Specifically, we first analyze the effect of CS in FL and show that FL training can be accelerated by adequately choosing participants to diversify the training dataset in each round of training. Based on this, we leverage data profiling and determinantal point process (DPP) sampling techniques to develop an algorithm termed Federated Learning with DPP-based Participant Selection (FL-DP$^3$S). This algorithm effectively diversifies the participants' datasets in each round of training while preserving their data privacy. We conduct extensive experiments to examine the efficacy of our proposed method. The results show that our scheme attains a faster convergence rate, as well as a smaller communication overhead than several baselines.
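
As a sketch of why DPPs diversify selection: subsets whose kernel submatrix has large determinant contain dissimilar items. Below is a greedy MAP-style selector over a similarity kernel built from hypothetical per-client data-profile features; it is a stand-in for DPP sampling, not the FL-DP$^3$S algorithm itself.

```python
import numpy as np

def greedy_dpp(K, k):
    """Greedily pick k items approximately maximizing det(K[S, S]) --
    determinant-favoring selection yields diverse subsets."""
    S = []
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in range(len(K)):
            if i in S:
                continue
            idx = S + [i]
            sign, logdet = np.linalg.slogdet(K[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        S.append(best)
    return S

rng = np.random.default_rng(0)
F = rng.normal(size=(30, 8))   # hypothetical per-client data-profile features
K = np.exp(-0.5 * np.square(F[:, None] - F[None]).sum(-1))   # RBF kernel
print(greedy_dpp(K, k=5))      # 5 mutually dissimilar clients for this round
```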

Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions

  • Authors: Yicheng Luo, Jackie Kay, Edward Grefenstette, Marc Peter Deisenroth
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17396
  • Pdf link: https://arxiv.org/pdf/2303.17396
  • Abstract
    Offline reinforcement learning (RL) allows for the training of competent agents from offline datasets without any interaction with the environment. Online finetuning of such offline models can further improve performance. But how should we ideally finetune agents obtained from offline RL training? While offline RL algorithms can in principle be used for finetuning, in practice, their online performance improves slowly. In contrast, we show that it is possible to use standard online off-policy algorithms for faster improvement. However, we find this approach may suffer from policy collapse, where the policy undergoes severe performance deterioration during initial online learning. We investigate the issue of policy collapse and how it relates to data diversity, algorithm choices and online replay distribution. Based on these insights, we propose a conservative policy optimization procedure that can achieve stable and sample-efficient online learning from offline pretraining.

Edge Ranking of Graphs in Transportation Networks using a Graph Neural Network (GNN)

  • Authors: Debasish Jana, Sven Malama, Sriram Narasimhan, Ertugrul Taciroglu
  • Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17485
  • Pdf link: https://arxiv.org/pdf/2303.17485
  • Abstract
    Many networks, such as transportation, power, and water distribution networks, can be represented as graphs. A crucial challenge in graph representations is identifying the importance of graph edges and their influence on overall network efficiency and information flow performance. For example, important edges in a transportation network are those roads that, when affected, significantly alter the network's overall efficiency. A commonly used approach to finding such important edges is ``edge betweenness centrality'' (EBC), an edge ranking measure that determines the influential edges of the graph based on connectivity and information spread. Computing the EBC with the common Brandes algorithm involves calculating the shortest paths for every node pair, which can be computationally expensive and restrictive, especially for large graphs. Changes in the graph parameters, e.g., in the edge weights or through the addition and deletion of nodes or edges, require recalculating the EBC. As the main contribution, we propose an approximate method to estimate the EBC using a Graph Neural Network (GNN), a deep learning-based approach. We show that it is computationally efficient compared to the conventional method, especially for large graphs. The proposed GNN-based edge ranking is evaluated on several synthetic graphs and a real-world transportation data set. We show that this framework can estimate the approximate edge ranking much faster than the conventional method. The approach is inductive, i.e., training and testing are performed on different sets of graphs with varying numbers of nodes and edges. The proposed method is especially suitable for applications on large-scale networks where edge information is desired, for example, in urban infrastructure improvement projects, power and water network resilience analyses, and optimizing resource allocation in engineering networks.

Pgx: Hardware-accelerated parallel game simulation for reinforcement learning

  • Authors: Sotetsu Koyamada, Shinri Okano, Soichiro Nishimori, Yu Murata, Keigo Habara, Haruka Kita, Shin Ishii
  • Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17503
  • Pdf link: https://arxiv.org/pdf/2303.17503
  • Abstract
    We propose Pgx, a collection of board game simulators written in JAX. Thanks to auto-vectorization and Just-In-Time compilation of JAX, Pgx scales easily to thousands of parallel execution on GPU/TPU accelerators. We found that the simulation of Pgx on a single A100 GPU is 10x faster than that of existing reinforcement learning libraries. Pgx implements games considered vital benchmarks in artificial intelligence research, such as Backgammon, Shogi, and Go. Pgx is available at https://github.com/sotetsuk/pgx.
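
The speedup comes from the JAX pattern of writing the environment step as a pure function, then vectorizing and compiling it. The toy below shows that pattern on a made-up counter "game"; it deliberately does not use Pgx's own API (see the repository for that).

```python
import jax
import jax.numpy as jnp

# Toy "game": a counter two players decrement; done when it hits zero.
# Not the Pgx API -- just the vmap + jit pattern that makes such simulators fast.
def step(state, action):
    counter, player = state
    counter = jnp.maximum(counter - action, 0)
    return (counter, 1 - player), counter == 0   # next state, done flag

batched_step = jax.jit(jax.vmap(step))           # thousands of envs in parallel

n_envs = 4096
states = (jnp.full(n_envs, 21), jnp.zeros(n_envs, dtype=jnp.int32))
actions = jnp.ones(n_envs, dtype=jnp.int32) * 3
states, done = batched_step(states, actions)
print(states[0][:4], done[:4])
```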

Token Merging for Fast Stable Diffusion

  • Authors: Daniel Bolya, Judy Hoffman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17604
  • Pdf link: https://arxiv.org/pdf/2303.17604
  • Abstract
    The landscape of image generation has been forever changed by open vocabulary diffusion models. However, at their core these models use transformers, which makes generation slow. Better implementations to increase the throughput of these transformers have emerged, but they still evaluate the entire model. In this paper, we instead speed up diffusion models by merging redundant tokens, exploiting the natural redundancy in generated images. After making some diffusion-specific improvements to Token Merging (ToMe), our ToMe for Stable Diffusion can reduce the number of tokens in an existing Stable Diffusion model by up to 60% while still producing high quality images without any extra training. In the process, we speed up image generation by up to 2x and reduce memory consumption by up to 5.6x. Furthermore, this speed-up stacks with efficient implementations such as xFormers, minimally impacting quality while being up to 5.4x faster for large images. Code is available at https://github.com/dbolya/tomesd.

Keyword: mobile

A Tensor-based Convolutional Neural Network for Small Dataset Classification

  • Authors: Zhenhua Chen, David Crandall
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2303.17061
  • Pdf link: https://arxiv.org/pdf/2303.17061
  • Abstract
    Inspired by the ConvNets with structured hidden representations, we propose a Tensor-based Neural Network, TCNN. Different from ConvNets, TCNNs are composed of structured neurons rather than scalar neurons, and the basic operation is neuron tensor transformation. Unlike other structured ConvNets, where the part-whole relationships are modeled explicitly, the relationships are learned implicitly in TCNNs. Also, the structured neurons in TCNNs are high-rank tensors rather than vectors or matrices. We compare TCNNs with current popular ConvNets, including ResNets, MobileNets, EfficientNets, RegNets, etc., on CIFAR10, CIFAR100, and Tiny ImageNet. The experiments show that TCNNs have higher efficiency in terms of parameters. TCNNs also show higher robustness against white-box adversarial attacks on MNIST compared to ConvNets.

Dependent Task Offloading in Edge Computing Using GNN and Deep Reinforcement Learning

  • Authors: Zequn Cao, Xiaoheng Deng
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.17100
  • Pdf link: https://arxiv.org/pdf/2303.17100
  • Abstract
    Task offloading is a widely used technology in Mobile Edge Computing (MEC), which reduces the completion time of user tasks with the help of resourceful edge servers. Existing works mainly focus on the case where the computation density of a user task is homogeneous, so that it can be offloaded in full or by percentage. However, many real-life user tasks consist of several inner dependent subtasks, each of which is a minimum execution unit logically. Motivated by this gap, we aim to solve the Dependent Task Offloading (DTO) problem in the multi-user multi-edge scenario in this paper. We first use a Directed Acyclic Graph (DAG) to represent a dependent task, where nodes indicate subtasks and directed edges indicate dependencies among subtasks. Then we propose a scheme based on Graph Attention Network (GAT) and Deep Reinforcement Learning (DRL) to minimize the makespan of user tasks. To utilize GAT efficiently, we train it on the resource-rich cloud in an unsupervised manner, owing to its substantial data and computation requirements. In addition, we design a multi-discrete action space for the DRL algorithm to enhance the applicability of our proposed scheme. Experiments are conducted on broadly distributed synthetic data. The results demonstrate that our proposed approach can be adapted to both simple and complex MEC environments and outperforms other methods.
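
Given a DAG of subtasks, the makespan the paper optimizes is a longest-path quantity, computable in one topological-order pass once per-subtask latencies (which depend on the chosen placement) are fixed. The sketch below assumes hypothetical latencies and ignores communication costs between placements.

```python
import networkx as nx

# Dependent task as a DAG: nodes are subtasks (with a placement-dependent
# latency), edges are dependencies. Makespan = longest latency-weighted path.
G = nx.DiGraph()
latency = {"a": 3.0, "b": 2.0, "c": 4.0, "d": 1.0}   # hypothetical latencies
G.add_edges_from([("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])

finish = {}
for v in nx.topological_sort(G):
    ready = max((finish[u] for u in G.predecessors(v)), default=0.0)
    finish[v] = ready + latency[v]       # a subtask starts when all parents end

print(f"makespan = {max(finish.values())}")   # 3 + 4 + 1 = 8
```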

Deep Generative Model and Its Applications in Efficient Wireless Network Management: A Tutorial and Case Study

  • Authors: Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Abbas Jamalipour
  • Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17114
  • Pdf link: https://arxiv.org/pdf/2303.17114
  • Abstract
    With the phenomenal success of diffusion models and ChatGPT, deep generative models (DGMs) have been experiencing explosive growth since 2022. Not limited to content generation, DGMs are also widely adopted in the Internet of Things, the Metaverse, and digital twins, due to their outstanding ability to represent complex patterns and generate plausible samples. In this article, we explore the applications of DGMs in a crucial task, i.e., improving the efficiency of wireless network management. Specifically, we first give an overview of generative AI, as well as three representative DGMs. Then, a DGM-empowered framework for wireless network management is proposed, in which we elaborate on the issues of conventional network management approaches, why DGMs can address them efficiently, and the step-by-step workflow for applying DGMs to manage wireless networks. Moreover, we conduct a case study on network economics, using the state-of-the-art DGM, i.e., the diffusion model, to generate effective contracts for incentivizing mobile AI-Generated Content (AIGC) services. Last but not least, we discuss important open directions for further research.

GAT-COBO: Cost-Sensitive Graph Neural Network for Telecom Fraud Detection

  • Authors: Xinxin Hu, Haotian Chen, Junjie Zhang, Hongchang Chen, Shuxin Liu, Xing Li, Yahui Wang, Xiangyang Xue
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17334
  • Pdf link: https://arxiv.org/pdf/2303.17334
  • Abstract
    Along with the rapid evolution of mobile communication technologies such as 5G, there has been a drastic increase in telecom fraud, which significantly depletes individual and social wealth. In recent years, graph mining techniques have gradually become a mainstream solution for detecting telecom fraud. However, the graph imbalance problem, caused by the Pareto principle, brings severe challenges to graph data mining. This is a new and challenging problem, but it has received little attention in previous work. In this paper, we propose a Graph ATtention network with COst-sensitive BOosting (GAT-COBO) for the graph imbalance problem. First, we design a GAT-based base classifier to learn the embeddings of all nodes in the graph. Then, we feed the embeddings into a well-designed cost-sensitive learner for imbalanced learning. Next, we update the weights according to the misclassification cost to make the model focus more on the minority class. Finally, we sum the node embeddings obtained by multiple cost-sensitive learners to obtain a comprehensive node representation, which is used for the downstream anomaly detection task. Extensive experiments on two real-world telecom fraud detection datasets demonstrate that our proposed method is effective for the graph imbalance problem, outperforming state-of-the-art GNNs and GNN-based fraud detectors. In addition, our model is also helpful for alleviating the widespread over-smoothing problem in GNNs. The GAT-COBO code and datasets are available at https://github.com/xxhu94/GAT-COBO.

An Efficient Mobile Gateway Selection and Discovery Based-Routing Protocol in Heterogeneous LTE-VANET Networks

  • Authors: Driss Abada, Rachid Adrdor, Omar Boutkhoum, Adil Bohouch
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2303.17439
  • Pdf link: https://arxiv.org/pdf/2303.17439
  • Abstract
    Coupling cellular communication networks with vehicular ad hoc networks (VANET) is a promising approach to providing Internet access to vehicles on the road. However, due to several specific characteristics of VANETs, building an efficient multi-hop route from vehicular sources to the Internet gateways through Long Term Evolution (LTE) technology is still challenging. In this paper, an Internet mobile gateway selection scheme is proposed to elect the most suitable vehicles to act as Internet gateways in VANETs. The discovery and selection of routes to those mobile gateways are then carried out via an efficient multi-metric relay selection mechanism. The objective is to select the most reliable route to the mobile gateways, reducing the communication overhead and performing seamless handover. The proposed protocol is compared with one recent protocol in terms of packet delivery ratio, average end-to-end delay, and overhead. The results show that the proposed protocol significantly improves network performance compared with the other protocol.

Cost Sensitive GNN-based Imbalanced Learning for Mobile Social Network Fraud Detection

  • Authors: Xinxin Hu, Haotian Chen, Hongchang Chen, Shuxin Liu, Xing Li, Shibo Zhang, Yahui Wang, Xiangyang Xue
  • Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17486
  • Pdf link: https://arxiv.org/pdf/2303.17486
  • Abstract
    With the rapid development of mobile networks, people's social contacts have been considerably facilitated. However, the rise of mobile social network fraud upon those networks has caused a great deal of distress, depleting personal and social wealth and potentially doing significant economic harm. To detect fraudulent users, call detail record (CDR) data, which portrays the social behavior of users in mobile networks, has been widely utilized. But the imbalance problem in the aforementioned data, which could severely hinder the effectiveness of fraud detectors based on graph neural networks (GNNs), has hardly been addressed in previous work. In this paper, we present a novel Cost-Sensitive Graph Neural Network (CSGNN) that creatively combines cost-sensitive learning and graph neural networks. We conduct extensive experiments on two open-source real-world mobile network fraud datasets. The results show that CSGNN can effectively solve the graph imbalance problem and thus achieve better detection performance than state-of-the-art algorithms. We believe that our research can be applied to solve graph imbalance problems in other fields. The CSGNN code and datasets are publicly available at https://github.com/xxhu94/CSGNN.

MobileInst: Video Instance Segmentation on the Mobile

  • Authors: Renhong Zhang, Tianheng Cheng, Shusheng Yang, Haoyi Jiang, Shuai Zhang, Jiancheng Lyu, Xin Li, Xiaowen Ying, Dashan Gao, Wenyu Liu, Xinggang Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17594
  • Pdf link: https://arxiv.org/pdf/2303.17594
  • Abstract
    Although recent approaches aiming for video instance segmentation have achieved promising results, it is still difficult to employ those approaches for real-world applications on mobile devices, which mainly suffer from (1) heavy computation and memory cost and (2) complicated heuristics for tracking objects. To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices. Firstly, MobileInst adopts a mobile vision transformer to extract multi-level semantic features and presents an efficient query-based dual-transformer instance decoder for mask kernels and a semantic-enhanced mask decoder to generate instance segmentation per frame. Secondly, MobileInst exploits simple yet effective kernel reuse and kernel association to track objects for video instance segmentation. Further, we propose temporal query passing to enhance the tracking ability for kernels. We conduct experiments on COCO and YouTube-VIS datasets to demonstrate the superiority of MobileInst and evaluate the inference latency on a mobile CPU core of Qualcomm Snapdragon-778G, without other methods of acceleration. On the COCO dataset, MobileInst achieves 30.5 mask AP and 176 ms on the mobile CPU, which reduces the latency by 50% compared to the previous SOTA. For video instance segmentation, MobileInst achieves 35.0 AP on YouTube-VIS 2019 and 30.1 AP on YouTube-VIS 2021. Code will be available to facilitate real-world applications and future research.
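
The kernel association step can be pictured with a small sketch that greedily matches per-instance mask kernels across consecutive frames by cosine similarity; this is a generic illustration under that assumption, not MobileInst's actual tracker.

```python
import numpy as np

def associate_kernels(prev_kernels, cur_kernels, sim_thresh=0.7):
    """Greedy cross-frame matching of instance kernels by cosine similarity.

    prev_kernels, cur_kernels: arrays of shape (num_instances, dim).
    Returns (prev_idx, cur_idx) pairs; unmatched kernels would start new tracks.
    """
    a = prev_kernels / np.linalg.norm(prev_kernels, axis=1, keepdims=True)
    b = cur_kernels / np.linalg.norm(cur_kernels, axis=1, keepdims=True)
    sim = a @ b.T  # (num_prev, num_cur) cosine similarity matrix
    matches, used = [], set()
    for i in np.argsort(-sim.max(axis=1)):  # most confident tracks match first
        j = int(np.argmax(sim[i]))
        if sim[i, j] >= sim_thresh and j not in used:
            matches.append((int(i), j))
            used.add(j)
    return matches

# Three perturbed copies of the previous kernels should match their originals.
prev = np.random.randn(3, 16)
cur = np.vstack([prev + 0.05 * np.random.randn(3, 16), np.random.randn(1, 16)])
print(associate_kernels(prev, cur))
```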

Keyword: pruning

Explainable Intrusion Detection Systems Using Competitive Learning Techniques

  • Authors: Jesse Ables, Thomas Kirby, Sudip Mittal, Ioana Banicescu, Shahram Rahimi, William Anderson, Maria Seale
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17387
  • Pdf link: https://arxiv.org/pdf/2303.17387
  • Abstract
    The current state-of-the-art systems in Artificial Intelligence (AI) enabled intrusion detection use a variety of black box methods. These black box methods are generally trained using Error Based Learning (EBL) techniques with a focus on creating accurate models. These models have high performative costs and are not easily explainable. A white box Competitive Learning (CL) based eXplainable Intrusion Detection System (X-IDS) offers a potential solution to these problems. CL models utilize an entirely different learning paradigm than EBL approaches. This different learning process makes the CL family of algorithms innately explainable and less resource intensive. In this paper, we create an X-IDS architecture that is based on DARPA's recommendation for explainable systems. In our architecture we leverage CL algorithms such as Self-Organizing Maps (SOM), Growing Self-Organizing Maps (GSOM), and Growing Hierarchical Self-Organizing Maps (GHSOM). The resulting models can be data-mined to create statistical and visual explanations. Our architecture is tested using the NSL-KDD and CIC-IDS-2017 benchmark datasets, and produces accuracies that are 1%-3% lower than EBL models. However, CL models are much more explainable than EBL models. Additionally, we use a pruning process that is able to significantly reduce the size of these CL-based models. By pruning our models, we are able to increase prediction speeds. Lastly, we analyze the statistical and visual explanations generated by our architecture, and we give a strategy that users could use to help navigate the set of explanations. These explanations will help users build trust with an Intrusion Detection System (IDS), and allow users to discover ways to increase the IDS's potency.
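
Since the architecture centers on Self-Organizing Maps, the classic SOM update rule is worth sketching; this minimal trainer (flat grid, exponential decay schedule) is a textbook version, not the paper's GSOM/GHSOM code.

```python
import numpy as np

def som_train(data, grid=(8, 8), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a tiny Self-Organizing Map with decaying learning rate/neighborhood."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.normal(size=(h, w, data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    for t in range(epochs):
        lr = lr0 * np.exp(-t / epochs)
        sigma = sigma0 * np.exp(-t / epochs)
        for x in data:
            dists = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)  # best matching unit
            grid_d2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
            nbh = np.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighborhood
            weights += lr * nbh[..., None] * (x - weights)
    return weights

codebook = som_train(np.random.randn(200, 4))
print(codebook.shape)  # (8, 8, 4): one prototype per map cell
```

Each input activates a best-matching unit on the 2D grid, which is the property that makes the trained map easy to visualize and explain.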

SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

  • Authors: Xuanyao Chen, Zhijian Liu, Haotian Tang, Li Yi, Hang Zhao, Song Han
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17605
  • Pdf link: https://arxiv.org/pdf/2303.17605
  • Abstract
    High-resolution images enable neural networks to learn richer visual representations. However, this improved performance comes at the cost of growing computational complexity, hindering their usage in latency-sensitive applications. As not all pixels are equal, skipping computations for less-important regions offers a simple and effective measure to reduce the computation. This, however, is hard to translate into actual speedups for CNNs since it breaks the regularity of the dense convolution workload. In this paper, we introduce SparseViT, which revisits activation sparsity for recent window-based vision transformers (ViTs). As window attentions are naturally batched over blocks, actual speedup with window activation pruning becomes possible: i.e., ~50% latency reduction with 60% sparsity. Different layers should be assigned different pruning ratios due to their diverse sensitivities and computational costs. We introduce sparsity-aware adaptation and apply evolutionary search to efficiently find the optimal layerwise sparsity configuration within the vast search space. SparseViT achieves speedups of 1.5x, 1.4x, and 1.3x compared to its dense counterpart in monocular 3D object detection, 2D instance segmentation, and 2D semantic segmentation, respectively, with negligible to no loss of accuracy.
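
A simplified view of window activation pruning follows: a fixed ratio and L2-norm saliency are assumed here, whereas the paper's layerwise ratios come from evolutionary search.

```python
import numpy as np

def prune_windows(windows, sparsity=0.6):
    """Keep the most salient attention windows, scored by L2 activation magnitude.

    windows: (num_windows, tokens_per_window, dim); sparsity: fraction dropped.
    Returns kept windows and their original indices (so outputs can be scattered back).
    """
    scores = np.linalg.norm(windows.reshape(windows.shape[0], -1), axis=1)
    keep = max(1, int(round(windows.shape[0] * (1.0 - sparsity))))
    idx = np.sort(np.argsort(-scores)[:keep])
    return windows[idx], idx

x = np.random.randn(16, 49, 96)  # 16 windows of 7x7 tokens, dim 96
kept, idx = prune_windows(x, sparsity=0.6)
print(kept.shape, idx)  # (6, 49, 96) plus the surviving window indices
```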

Keyword: voxel

Robo3D: Towards Robust and Reliable 3D Perception against Corruptions

  • Authors: Lingdong Kong, Youquan Liu, Xin Li, Runnan Chen, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, Ziwei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17597
  • Pdf link: https://arxiv.org/pdf/2303.17597
  • Abstract
    The robustness of 3D perception systems under natural corruptions from environments and sensors is pivotal for safety-critical applications. Existing large-scale 3D perception datasets often contain data that are meticulously cleaned. Such configurations, however, cannot reflect the reliability of perception models during the deployment stage. In this work, we present Robo3D, the first comprehensive benchmark heading toward probing the robustness of 3D detectors and segmentors under out-of-distribution scenarios against natural corruptions that occur in real-world environments. Specifically, we consider eight corruption types stemming from adversarial weather conditions, external disturbances, and internal sensor failure. We uncover that, although promising results have been progressively achieved on standard benchmarks, state-of-the-art 3D perception models are at risk of being vulnerable to corruptions. We draw key observations on the use of data representations, augmentation schemes, and training strategies, that could severely affect the model's performance. To pursue better robustness, we propose a density-insensitive training framework along with a simple flexible voxelization strategy to enhance the model resiliency. We hope our benchmark and approach could inspire future research in designing more robust and reliable 3D perception models. Our robustness benchmark suite is publicly available.
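
As background for the voxelization discussion, a bare-bones point-cloud voxelizer looks like the following; the paper's density-insensitive training and flexible voxelization strategy add more on top, so treat this only as the baseline operation.

```python
import numpy as np

def voxelize(points, voxel_size=0.2):
    """Map an (N, 3) point cloud to integer voxel coordinates and per-voxel counts."""
    idx = np.floor(points / voxel_size).astype(np.int64)
    voxels, counts = np.unique(idx, axis=0, return_counts=True)
    return voxels, counts

pts = np.random.uniform(-5.0, 5.0, size=(1000, 3))
voxels, counts = voxelize(pts)
print(f"{len(voxels)} occupied voxels; densest voxel holds {counts.max()} points")
```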

Keyword: lidar

T-FFTRadNet: Object Detection with Swin Vision Transformers from Raw ADC Radar Signals

  • Authors: James Giroux, Martin Bouchard, Robert Laganiere
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16940
  • Pdf link: https://arxiv.org/pdf/2303.16940
  • Abstract
    Object detection utilizing Frequency Modulated Continuous Wave radar is becoming increasingly popular in the field of autonomous systems. Radar does not possess the same drawbacks seen in other emission-based sensors such as LiDAR, primarily the degradation or loss of return signals due to weather conditions such as rain or snow. However, radar does possess traits that make it unsuitable for standard emission-based deep learning representations such as point clouds. Radar point clouds tend to be sparse, so information extraction is not efficient. To overcome this, more traditional digital signal processing pipelines were adapted to form inputs residing directly in the frequency domain via Fast Fourier Transforms. Commonly, three transformations were used to form Range-Azimuth-Doppler cubes in which deep learning algorithms could perform object detection. This too has drawbacks, namely the pre-processing costs associated with performing multiple Fourier Transforms and normalization. We explore the possibility of operating on raw radar inputs from analog-to-digital converters via the utilization of complex transformation layers. Moreover, we introduce hierarchical Swin Vision transformers to the field of radar object detection and show their capability to operate on inputs varying in pre-processing, along with different radar configurations, i.e., relatively low and high numbers of transmitters and receivers, while obtaining on-par or better results than the state-of-the-art.
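
The "multiple Fourier Transforms" pre-processing the paper tries to avoid is the classic range-Doppler pipeline; here is a minimal numpy version of the first two FFT stages (synthetic data, no azimuth stage), useful as a reference point.

```python
import numpy as np

def range_doppler_map(adc, window=True):
    """Two-stage FFT over raw ADC chirps: fast time -> range, slow time -> Doppler.

    adc: complex samples of shape (num_chirps, samples_per_chirp).
    Returns the magnitude range-Doppler map in dB.
    """
    if window:
        adc = adc * np.hanning(adc.shape[1])[None, :]  # taper each chirp
    rng_fft = np.fft.fft(adc, axis=1)                  # range bins
    dop_fft = np.fft.fftshift(np.fft.fft(rng_fft, axis=0), axes=0)  # Doppler bins
    return 20.0 * np.log10(np.abs(dop_fft) + 1e-12)

adc = np.random.randn(128, 256) + 1j * np.random.randn(128, 256)
print(range_doppler_map(adc).shape)  # (128, 256): Doppler x range
```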

BEVFusion4D: Learning LiDAR-Camera Fusion Under Bird's-Eye-View via Cross-Modality Guidance and Temporal Aggregation

  • Authors: Hongxiang Cai, Zeyuan Zhang, Zhenyu Zhou, Ziyin Li, Wenbo Ding, Jiuhua Zhao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17099
  • Pdf link: https://arxiv.org/pdf/2303.17099
  • Abstract
    Integrating LiDAR and Camera information into Bird's-Eye-View (BEV) has become an essential topic for 3D object detection in autonomous driving. Existing methods mostly adopt an independent dual-branch framework to generate LiDAR and camera BEV, then perform an adaptive modality fusion. Since point clouds provide more accurate localization and geometry information, they could serve as a reliable spatial prior to acquiring relevant semantic information from the images. Therefore, we design a LiDAR-Guided View Transformer (LGVT) to effectively obtain the camera representation in BEV space and thus benefit the whole dual-branch fusion system. LGVT takes camera BEV as the primitive semantic query, repeatedly leveraging the spatial cue of LiDAR BEV for extracting image features across multiple camera views. Moreover, we extend our framework into the temporal domain with our proposed Temporal Deformable Alignment (TDA) module, which aims to aggregate BEV features from multiple historical frames. Including these two modules, our framework dubbed BEVFusion4D achieves state-of-the-art results in 3D object detection, with 72.0% mAP and 73.5% NDS on the nuScenes validation set, and 73.3% mAP and 74.7% NDS on nuScenes test set, respectively.

Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving

  • Authors: Zijian Zhu, Yichi Zhang, Hai Chen, Yinpeng Dong, Shu Zhao, Wenbo Ding, Jiachen Zhong, Shibao Zheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2303.17297
  • Pdf link: https://arxiv.org/pdf/2303.17297
  • Abstract
    3D object detection is an essential perception task in autonomous driving to understand the environment. Bird's-Eye-View (BEV) representations have significantly improved the performance of 3D detectors with camera inputs on popular benchmarks. However, there is still no systematic understanding of the robustness of these vision-dependent BEV models, which is closely related to the safety of autonomous driving systems. In this paper, we evaluate the natural and adversarial robustness of various representative models under extensive settings, to fully understand their behaviors influenced by explicit BEV features compared with those without BEV. In addition to the classic settings, we propose a 3D consistent patch attack by applying adversarial patches in the 3D space to guarantee spatiotemporal consistency, which is more realistic for the scenario of autonomous driving. With substantial experiments, we draw several findings: 1) BEV models tend to be more stable than previous methods under different natural conditions and common corruptions due to the expressive spatial representations; 2) BEV models are more vulnerable to adversarial noise, mainly caused by the redundant BEV features; 3) camera-LiDAR fusion models have superior performance under different settings with multi-modal inputs, but the BEV fusion model is still vulnerable to adversarial noise on both point clouds and images. These findings alert us to the safety issues in the applications of BEV detectors and could facilitate the development of more robust models.

Event-based Agile Object Catching with a Quadrupedal Robot

  • Authors: Benedek Forrai, Takahiro Miki, Daniel Gehrig, Marco Hutter, Davide Scaramuzza
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17479
  • Pdf link: https://arxiv.org/pdf/2303.17479
  • Abstract
    Quadrupedal robots are conquering various indoor and outdoor applications due to their ability to navigate challenging uneven terrains. Exteroceptive information greatly enhances this capability since perceiving their surroundings allows them to adapt their controller and thus achieve higher levels of robustness. However, sensors such as LiDARs and RGB cameras do not provide sufficient information to quickly and precisely react in a highly dynamic environment since they suffer from a bandwidth-latency tradeoff. They require significant bandwidth at high frame rates while featuring significant perceptual latency at lower frame rates, thereby limiting their versatility on resource-constrained platforms. In this work, we tackle this problem by equipping our quadruped with an event camera, which does not suffer from this tradeoff due to its asynchronous and sparse operation. By leveraging the low latency of the events, we push the limits of quadruped agility and demonstrate high-speed ball catching for the first time. We show that our quadruped equipped with an event camera can catch objects with speeds up to 15 m/s from 4 meters, with a success rate of 83%. Using a VGA event camera, our method runs at 100 Hz on an NVIDIA Jetson Orin.

Keyword: diffusion

HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion

  • Authors: Ziya Erkoç, Fangchang Ma, Qi Shan, Matthias Nießner, Angela Dai
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17015
  • Pdf link: https://arxiv.org/pdf/2303.17015
  • Abstract
    Implicit neural fields, typically encoded by a multilayer perceptron (MLP) that maps from coordinates (e.g., xyz) to signals (e.g., signed distances), have shown remarkable promise as a high-fidelity and compact representation. However, the lack of a regular and explicit grid structure also makes it challenging to apply generative modeling directly to implicit neural fields in order to synthesize new data. To this end, we propose HyperDiffusion, a novel approach for unconditional generative modeling of implicit neural fields. HyperDiffusion operates directly on MLP weights and generates new neural implicit fields encoded by synthesized MLP parameters. Specifically, a collection of MLPs is first optimized to faithfully represent individual data samples. Subsequently, a diffusion process is trained in this MLP weight space to model the underlying distribution of neural implicit fields. HyperDiffusion enables diffusion modeling over an implicit, compact, and yet high-fidelity representation of complex signals across 3D shapes and 4D mesh animations within a single unified framework.
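
The core data-preparation step, serializing each optimized MLP into a flat vector that a weight-space diffusion model can train on, can be sketched as follows; the shapes and helper names are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def flatten_mlp(layers):
    """Serialize [(weight, bias), ...] into one vector plus shape metadata."""
    shapes = [(w.shape, b.shape) for w, b in layers]
    flat = np.concatenate([np.concatenate([w.ravel(), b.ravel()]) for w, b in layers])
    return flat, shapes

def unflatten_mlp(flat, shapes):
    """Rebuild the (weight, bias) arrays from a flat vector (inverse of flatten_mlp)."""
    layers, pos = [], 0
    for w_shape, b_shape in shapes:
        n_w, n_b = int(np.prod(w_shape)), int(np.prod(b_shape))
        layers.append((flat[pos:pos + n_w].reshape(w_shape),
                       flat[pos + n_w:pos + n_w + n_b].reshape(b_shape)))
        pos += n_w + n_b
    return layers

# Each per-sample MLP becomes one training point in weight space.
mlp = [(np.random.randn(3, 64), np.random.randn(64)),
       (np.random.randn(64, 1), np.random.randn(1))]
vec, meta = flatten_mlp(mlp)
assert np.allclose(unflatten_mlp(vec, meta)[0][0], mlp[0][0])
print(vec.shape)  # one flat weight-space sample
```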

DiffCollage: Parallel Generation of Large Content with Diffusion Models

  • Authors: Qinsheng Zhang, Jiaming Song, Xun Huang, Yongxin Chen, Ming-Yu Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17076
  • Pdf link: https://arxiv.org/pdf/2303.17076
  • Abstract
    We present DiffCollage, a compositional diffusion model that can generate large content by leveraging diffusion models trained on generating pieces of the large content. Our approach is based on a factor graph representation where each factor node represents a portion of the content and a variable node represents their overlap. This representation allows us to aggregate intermediate outputs from diffusion models defined on individual nodes to generate content of arbitrary size and shape in parallel without resorting to an autoregressive generation procedure. We apply DiffCollage to various tasks, including infinite image generation, panorama image generation, and long-duration text-guided motion generation. Extensive experimental results with a comparison to strong autoregressive baselines verify the effectiveness of our approach.
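
A heavily simplified picture of the aggregation step: each factor node denoises its own piece, and overlaps are reconciled by averaging. The 1D merge below is an intuition aid under that assumption, not the paper's factor-graph formulation.

```python
import numpy as np

def merge_overlapping(pieces, starts, total_len):
    """Average per-piece denoiser outputs wherever their supports overlap (1D case).

    pieces: list of 1D arrays (one per factor node); starts: offset of each piece.
    """
    acc = np.zeros(total_len)
    cnt = np.zeros(total_len)
    for piece, s in zip(pieces, starts):
        acc[s:s + len(piece)] += piece
        cnt[s:s + len(piece)] += 1.0
    return acc / np.maximum(cnt, 1.0)  # overlapping regions get the mean prediction

# Two 64-sample pieces sharing a 16-sample overlap cover a 112-sample signal.
a, b = np.random.randn(64), np.random.randn(64)
print(merge_overlapping([a, b], [0, 48], 112).shape)
```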

Deep Generative Model and Its Applications in Efficient Wireless Network Management: A Tutorial and Case Study

  • Authors: Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Abbas Jamalipour
  • Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17114
  • Pdf link: https://arxiv.org/pdf/2303.17114
  • Abstract
    With the phenomenal success of diffusion models and ChatGPT, deep generative models (DGMs) have been experiencing explosive growth since 2022. Not limited to content generation, DGMs are also widely adopted in the Internet of Things, the Metaverse, and digital twins, due to their outstanding ability to represent complex patterns and generate plausible samples. In this article, we explore the applications of DGMs in a crucial task, i.e., improving the efficiency of wireless network management. Specifically, we first give an overview of generative AI, as well as three representative DGMs. Then, a DGM-empowered framework for wireless network management is proposed, in which we elaborate on the issues of conventional network management approaches, why DGMs can address them efficiently, and the step-by-step workflow for applying DGMs in managing wireless networks. Moreover, we conduct a case study on network economics, using the state-of-the-art DGM, i.e., the diffusion model, to generate effective contracts for incentivizing mobile AI-Generated Content (AIGC) services. Last but not least, we discuss important open directions for further research.

Discriminative Class Tokens for Text-to-Image Diffusion Models

  • Authors: Idan Schwartz, Vésteinn Snæbjarnarson, Sagie Benaim, Hila Chefer, Ryan Cotterell, Lior Wolf, Serge Belongie
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17155
  • Pdf link: https://arxiv.org/pdf/2303.17155
  • Abstract
    Recent advances in text-to-image diffusion models have enabled the generation of diverse and high-quality images. However, generated images often fall short of depicting subtle details and are susceptible to errors due to ambiguity in the input text. One way of alleviating these issues is to train diffusion models on class-labeled datasets. This, however, limits their expressive power: (i) supervised datasets are generally small compared to the large-scale scraped text-image datasets on which text-to-image models are trained, so the quality and diversity of generated images are severely affected, or (ii) the input is a hard-coded label, as opposed to free-form text, which limits control over the generated images. In this work, we propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text while achieving high accuracy through discriminative signals from a pretrained classifier, which guides the generation. This is done by iteratively modifying the embedding of a single input token of a text-to-image diffusion model, using the classifier to steer generated images toward a given target class. Our method is fast compared to prior fine-tuning methods and does not require a collection of in-class images or retraining of a noise-tolerant classifier. We evaluate our method extensively, showing that the generated images (i) are more accurate and of higher quality than those of standard diffusion models, (ii) can be used to augment training data in a low-resource setting, and (iii) reveal information about the data used to train the guiding classifier. The code is available at \url{https://github.com/idansc/discriminative_class_tokens}

LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation

  • Authors: Guangcong Zheng, Xianpan Zhou, Xuewei Li, Zhongang Qi, Ying Shan, Xi Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17189
  • Pdf link: https://arxiv.org/pdf/2303.17189
  • Abstract
    Recently, diffusion models have achieved great success in image synthesis. However, when it comes to layout-to-image generation, where an image often has a complex scene of multiple objects, how to exert strong control over both the global layout map and each detailed object remains a challenging task. In this paper, we propose a diffusion model named LayoutDiffusion that obtains higher generation quality and greater controllability than previous works. To overcome the difficult multimodal fusion of image and layout, we propose to construct a structural image patch with region information and transform the patched image into a special layout to fuse with the normal layout in a unified form. Moreover, a Layout Fusion Module (LFM) and Object-aware Cross Attention (OaCA) are proposed to model the relationships among multiple objects; they are designed to be object-aware and position-sensitive, allowing for precise control of the spatially related information. Extensive experiments show that our LayoutDiffusion outperforms previous SOTA methods on FID and CAS by a relative 46.35% and 26.70% on COCO-Stuff, and by 44.29% and 41.82% on VG. Code is available at https://github.com/ZGCTroy/LayoutDiffusion.

PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models

  • Authors: Vidit Goel, Elia Peruzzo, Yifan Jiang, Dejia Xu, Nicu Sebe, Trevor Darrell, Zhangyang Wang, Humphrey Shi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17546
  • Pdf link: https://arxiv.org/pdf/2303.17546
  • Abstract
    Image editing using diffusion models has witnessed extremely fast-paced growth recently. There are various ways in which previous works enable controlling and editing images. Some works use high-level conditioning such as text, while others use low-level conditioning. Nevertheless, most of them lack fine-grained control over the properties of the different objects present in the image, i.e., object-level image editing. In this work, we consider an image as a composition of multiple objects, each defined by various properties. Out of these properties, we identify structure and appearance as the most intuitive to understand and useful for editing purposes. We propose the Structure-and-Appearance Paired Diffusion model (PAIR-Diffusion), which is trained using structure and appearance information explicitly extracted from the images. The proposed model enables users to inject a reference image's appearance into the input image at both the object and global levels. Additionally, PAIR-Diffusion allows editing the structure while keeping the style of individual components of the image unchanged. We extensively evaluate our method on LSUN datasets and the CelebA-HQ face dataset, and we demonstrate fine-grained control over both structure and appearance at the object level. We also applied the method to Stable Diffusion to edit any real image at the object level.

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

  • Authors: Chenpng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao, Jiang Bian
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2303.17550
  • Pdf link: https://arxiv.org/pdf/2303.17550
  • Abstract
    While recent research has made significant progress in speech-driven talking face generation, the quality of the generated video still lags behind that of real recordings. One reason for this is the use of handcrafted intermediate representations like facial landmarks and 3DMM coefficients, which are designed based on human knowledge and are insufficient to precisely describe facial movements. Additionally, these methods require an external pretrained model for extracting these representations, whose performance sets an upper bound on talking face generation. To address these limitations, we propose a novel method called DAE-Talker that leverages data-driven latent representations obtained from a diffusion autoencoder (DAE). DAE contains an image encoder that encodes an image into a latent vector and a DDIM image decoder that reconstructs the image from it. We train our DAE on talking face video frames and then extract their latent representations as the training target for a Conformer-based speech2latent model. This allows DAE-Talker to synthesize full video frames and produce natural head movements that align with the content of speech, rather than relying on a predetermined head pose from a template video. We also introduce pose modelling in speech2latent for pose controllability. Additionally, we propose a novel method for generating continuous video frames with the DDIM image decoder trained on individual frames, eliminating the need for modelling the joint distribution of consecutive frames directly. Our experiments show that DAE-Talker outperforms existing popular methods in lip-sync, video fidelity, and pose naturalness. We also conduct ablation studies to analyze the effectiveness of the proposed techniques and demonstrate the pose controllability of DAE-Talker.

DDP: Diffusion Model for Dense Visual Prediction

  • Authors: Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17559
  • Pdf link: https://arxiv.org/pdf/2303.17559
  • Abstract
    We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a random Gaussian distribution, guided by the image. The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline. Without task-specific design and architecture customization, DDP is easy to generalize to most dense prediction tasks, e.g., semantic segmentation and depth estimation. In addition, DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods. We show top results on three representative tasks across six diverse benchmarks; without tricks, DDP achieves state-of-the-art or competitive performance on each task compared to its specialist counterparts, for example, in semantic segmentation (83.9 mIoU on Cityscapes), BEV map segmentation (70.6 mIoU on nuScenes), and depth estimation (0.05 REL on KITTI). We hope that our approach will serve as a solid baseline and facilitate future research.

Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models

  • Authors: Eric Zhang, Kai Wang, Xingqian Xu, Zhangyang Wang, Humphrey Shi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17591
  • Pdf link: https://arxiv.org/pdf/2303.17591
  • Abstract
    The unlearning problem of deep learning models, once primarily an academic concern, has become a prevalent issue in the industry. The significant advances in text-to-image generation techniques have prompted global discussions on privacy, copyright, and safety, as numerous unauthorized personal IDs, content, artistic creations, and potentially harmful materials have been learned by these models and later utilized to generate and distribute uncontrolled content. To address this challenge, we propose \textbf{Forget-Me-Not}, an efficient and low-cost solution designed to safely remove specified IDs, objects, or styles from a well-configured text-to-image model in as little as 30 seconds, without impairing its ability to generate other content. Alongside our method, we introduce the \textbf{Memorization Score (M-Score)} and \textbf{ConceptBench} to measure the models' capacity to generate general concepts, grouped into three primary categories: ID, object, and style. Using M-Score and ConceptBench, we demonstrate that Forget-Me-Not can effectively eliminate targeted concepts while maintaining the model's performance on other concepts. Furthermore, Forget-Me-Not offers two practical extensions: a) removal of potentially harmful or NSFW content, and b) enhancement of model accuracy, inclusion and diversity through \textbf{concept correction and disentanglement}. It can also be adapted as a lightweight model patch for Stable Diffusion, allowing for concept manipulation and convenient distribution. To encourage future research in this critical area and promote the development of safe and inclusive generative models, we will open-source our code and ConceptBench at \href{https://github.com/SHI-Labs/Forget-Me-Not}{https://github.com/SHI-Labs/Forget-Me-Not}.

Consistent View Synthesis with Pose-Guided Diffusion Models

  • Authors: Hung-Yu Tseng, Qinbo Li, Changil Kim, Suhib Alsisan, Jia-Bin Huang, Johannes Kopf
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17598
  • Pdf link: https://arxiv.org/pdf/2303.17598
  • Abstract
    Novel view synthesis from a single image has been a cornerstone problem for many Virtual Reality applications that provide immersive experiences. However, most existing techniques can only synthesize novel views within a limited range of camera motion or fail to generate consistent and high-quality novel views under significant camera movement. In this work, we propose a pose-guided diffusion model to generate a consistent long-term video of novel views from a single image. We design an attention layer that uses epipolar lines as constraints to facilitate the association between different viewpoints. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of the proposed diffusion model against state-of-the-art transformer-based and GAN-based approaches.

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

  • Authors: Wen Wang, Kangyang Xie, Zide Liu, Hao Chen, Yue Cao, Xinlong Wang, Chunhua Shen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17599
  • Pdf link: https://arxiv.org/pdf/2303.17599
  • Abstract
    Large-scale text-to-image diffusion models achieve unprecedented success in image generation and editing. However, how to extend such success to video editing is unclear. Recent initial attempts at video editing require significant text-to-video data and computation resources for training, which is often not accessible. In this work, we propose vid2vid-zero, a simple yet effective method for zero-shot video editing. Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video. At the core of our method is a null-text inversion module for text-to-video alignment, a cross-frame modeling module for temporal consistency, and a spatial regularization module for fidelity to the original video. Without any training, we leverage the dynamic nature of the attention mechanism to enable bi-directional temporal modeling at test time. Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos. Code will be made available at \url{https://github.com/baaivision/vid2vid-zero}.

Token Merging for Fast Stable Diffusion

  • Authors: Daniel Bolya, Judy Hoffman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17604
  • Pdf link: https://arxiv.org/pdf/2303.17604
  • Abstract
    The landscape of image generation has been forever changed by open vocabulary diffusion models. However, at their core these models use transformers, which makes generation slow. Better implementations to increase the throughput of these transformers have emerged, but they still evaluate the entire model. In this paper, we instead speed up diffusion models by exploiting natural redundancy in generated images by merging redundant tokens. After making some diffusion-specific improvements to Token Merging (ToMe), our ToMe for Stable Diffusion can reduce the number of tokens in an existing Stable Diffusion model by up to 60% while still producing high quality images without any extra training. In the process, we speed up image generation by up to 2x and reduce memory consumption by up to 5.6x. Furthermore, this speed-up stacks with efficient implementations such as xFormers, minimally impacting quality while being up to 5.4x faster for large images. Code is available at https://github.com/dbolya/tomesd.
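
The token-merging primitive itself is compact enough to sketch: split tokens into two sets, match each token in one set to its most similar counterpart in the other, and average the top matches. This bipartite sketch follows the general ToMe recipe but is not the tomesd code.

```python
import numpy as np

def merge_tokens(tokens, ratio=0.5):
    """Bipartite soft matching: merge the most similar B tokens into A tokens.

    tokens: (n, dim); ratio: fraction of set B merged away. Alternating tokens
    form sets A and B; each merged pair is replaced by its average in A.
    """
    a, b = tokens[0::2].copy(), tokens[1::2]
    an = a / np.linalg.norm(a, axis=1, keepdims=True)
    bn = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = bn @ an.T                       # each B token vs. every A token
    best_a = sim.argmax(axis=1)           # preferred merge target per B token
    order = np.argsort(-sim.max(axis=1))  # most similar B tokens merge first
    r = int(len(b) * ratio)
    for j in order[:r]:
        a[best_a[j]] = 0.5 * (a[best_a[j]] + b[j])
    return np.concatenate([a, b[order[r:]]])

x = np.random.randn(64, 32)
print(merge_tokens(x).shape)  # (48, 32): 16 of the 32 B tokens merged away
```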

AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control

  • Authors: Ruixiang Jiang, Can Wang, Jingbo Zhang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17606
  • Pdf link: https://arxiv.org/pdf/2303.17606
  • Abstract
    Neural implicit fields are powerful for representing 3D scenes and generating high-quality novel views, but it remains challenging to use such implicit representations for creating a 3D human avatar with a specific identity and artistic style that can be easily animated. Our proposed method, AvatarCraft, addresses this challenge by using diffusion models to guide the learning of geometry and texture for a neural avatar based on a single text prompt. We carefully design the optimization framework of neural implicit fields, including a coarse-to-fine multi-bounding box training strategy, shape regularization, and diffusion-based constraints, to produce high-quality geometry and texture. Additionally, we make the human avatar animatable by deforming the neural implicit field with an explicit warping field that maps the target human mesh to a template human mesh, both represented using parametric human models. This simplifies animation and reshaping of the generated avatar by controlling pose and shape parameters. Extensive experiments on various text descriptions show that AvatarCraft is effective and robust in creating human avatars and rendering novel views, poses, and shapes. Our project page is: \url{https://avatar-craft.github.io/}.

Keyword: dynamic

Thrust vector control and state estimation architecture for low-cost small-scale launchers

  • Authors: Pedro dos Santos, Paulo Oliveira
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.16983
  • Pdf link: https://arxiv.org/pdf/2303.16983
  • Abstract
    This paper proposes an integrated architecture for Thrust Vector Control (TVC) and state estimation for low-cost small-scale launchers, which are naturally unstable and propelled by a solid motor. The architecture is based on a non-linear, six-degrees-of-freedom model for the generic thrust-vector-controlled launcher dynamics and kinematics, deduced and implemented in a realistic simulation environment. For estimation and control design purposes, a linearized version of the model is proposed. Single-nozzle TVC actuation is adopted, allowing for pitch and yaw control, with the control law being derived from the Linear Quadratic Regulator (LQR) with additional integral action (LQI). The control system is implemented through gain scheduling. Full state estimation is performed by resorting to complementary kinematic filters, closely related to linear Kalman filtering theory. The architecture, composed of the navigation and control systems, is tested in a simulation environment, demonstrating satisfactory attitude tracking performance and robustness to both external disturbances and model uncertainties.
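
The LQR core of such a control law reduces to solving a continuous-time algebraic Riccati equation; the sketch below (with a made-up unstable second-order plant, and without the integral augmentation or gain scheduling described above) shows the mechanics using SciPy.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gain(A, B, Q, R):
    """Continuous-time LQR: solve the Riccati equation, return K for u = -K x."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

# Hypothetical unstable plant (think: one linearized attitude channel).
A = np.array([[0.0, 1.0],
              [2.0, 0.0]])  # positive "stiffness" term makes it unstable
B = np.array([[0.0], [1.0]])
K = lqr_gain(A, B, Q=np.eye(2), R=np.array([[1.0]]))
print(K)
print(np.linalg.eigvals(A - B @ K))  # closed-loop eigenvalues: real parts < 0
```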

PopSparse: Accelerated block sparse matrix multiplication on IPU

  • Authors: Zhiyi Li, Douglas Orr, Valeriu Ohan, Godfrey Da costa, Tom Murray, Adam Sanders, Deniz Beker, Dominic Masters
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16999
  • Pdf link: https://arxiv.org/pdf/2303.16999
  • Abstract
    Reducing the computational cost of running large scale neural networks using sparsity has attracted great attention in the deep learning community. While much success has been achieved in reducing FLOP and parameter counts while maintaining acceptable task performance, achieving actual speed improvements has typically been much more difficult, particularly on general purpose accelerators (GPAs) such as NVIDIA GPUs using low precision number formats. In this work we introduce PopSparse, a library that enables fast sparse operations on Graphcore IPUs by leveraging both the unique hardware characteristics of IPUs as well as any block structure defined in the data. We target two different types of sparsity: static, where the sparsity pattern is fixed at compile-time; and dynamic, where it can change each time the model is run. We present benchmark results for matrix multiplication for both of these modes on IPU with a range of block sizes, matrix sizes and densities. Results indicate that the PopSparse implementations are faster than dense matrix multiplications on IPU at a range of sparsity levels with large matrix size and block size. Furthermore, static sparsity in general outperforms dynamic sparsity. While previous work on GPAs has shown speedups only for very high sparsity (typically 99% and above), the present work demonstrates that our static sparse implementation outperforms equivalent dense calculations in FP16 at lower sparsity (around 90%).
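
Functionally, a block-sparse matmul just skips the zero blocks; the dense numpy reference below captures the arithmetic such a library accelerates (the IPU kernels, block formats, and APIs are of course different).

```python
import numpy as np

def block_sparse_matmul(blocks, brows, bcols, shape, bs, dense):
    """Multiply a block-sparse matrix (list of nonzero bs x bs blocks) by a dense one.

    blocks: (nnz, bs, bs); brows/bcols: block coordinates of each stored block;
    shape: (M, K) of the sparse operand; dense: (K, N).
    """
    out = np.zeros((shape[0], dense.shape[1]))
    for blk, br, bc in zip(blocks, brows, bcols):
        out[br * bs:(br + 1) * bs] += blk @ dense[bc * bs:(bc + 1) * bs]
    return out

bs, M, K, N = 16, 64, 64, 32
brows, bcols = np.array([0, 1, 3]), np.array([2, 0, 3])  # 3 of 16 blocks nonzero
blocks = np.random.randn(3, bs, bs)
x = np.random.randn(K, N)
print(block_sparse_matmul(blocks, brows, bcols, (M, K), bs, x).shape)  # (64, 32)
```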

Scalable Implicit Solvers with Dynamic Mesh Adaptation for a Relativistic Drift-Kinetic Fokker-Planck-Boltzmann Model

  • Authors: Johann Rudi, Max Heldman, Emil M. Constantinescu, Qi Tang, Xian-Zhu Tang
  • Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph); Plasma Physics (physics.plasm-ph)
  • Arxiv link: https://arxiv.org/abs/2303.17019
  • Pdf link: https://arxiv.org/pdf/2303.17019
  • Abstract
    In this work we consider a relativistic drift-kinetic model for runaway electrons along with a Fokker-Planck operator for small-angle Coulomb collisions, a radiation damping operator, and a secondary knock-on (Boltzmann) collision source. We develop a new scalable fully implicit solver utilizing finite volume and conservative finite difference schemes and dynamic mesh adaptivity. A new data management framework in the PETSc library based on the p4est library is developed to enable simulations with dynamic adaptive mesh refinement (AMR), parallel computation, and load balancing. This framework is tested through the development of the runaway electron solver that is able to dynamically capture both bulk Maxwellian at the low-energy region and a runaway tail at the high-energy region. To effectively capture features via the AMR algorithm, a new AMR indicator prediction strategy is proposed that is performed alongside the implicit time evolution of the solution. This strategy is complemented by the introduction of computationally cheap feature-based AMR indicators that are analyzed theoretically. Numerical results quantify the advantages of the prediction strategy in better capturing features compared with nonpredictive strategies; and we demonstrate trade-offs regarding computational costs. The full solver is further verified through several benchmark problems including manufactured solutions and solutions of physics models. We particularly focus on demonstrating the advantages of using implicit time stepping and AMR for runaway electron simulations.

Stability bounds of droop-controlled inverters in power grid networks

  • Authors: Philipp C. Böttcher, Leonardo Rydin Gorjão, Dirk Witthaut
  • Subjects: Systems and Control (eess.SY); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2303.17032
  • Pdf link: https://arxiv.org/pdf/2303.17032
  • Abstract
    The energy mix of future power systems will include high shares of wind power and solar PV. These generation facilities are generally connected via power-electronic inverters. While conventional generation responds dynamically to the state of the electric power system, inverters are power-electronic hardware and need to be programmed to react to the state of the system. Choosing an appropriate control scheme and the corresponding parameters is necessary to guarantee that the system operates safely. A prominent control scheme for inverters is droop control, which mimics the response of conventional generation. In this work, we investigate the stability of coupled systems of droop-controlled inverters in arbitrary network topologies. Employing linear stability analysis, we derive effective local stability criteria that consider both the overall network topology and its interplay with the inverters' intrinsic parameters. First, we explore the stability of an inverter coupled to an infinite grid in an analytic fashion and uncover stability and instability regions. Second, we extend the analysis to a generic topology of inverters and provide mathematical criteria for the stability and instability of the system. Last, we showcase the usefulness of the criteria by examining two model systems using numerical simulations. The developed criteria show which parameters might lead to an unstable operating state.

Material-agnostic Shaping of Granular Materials with Optimal Transport

  • Authors: Nikhilesh Alatur, Olov Andersson, Roland Siegwart, Lionel Ott
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17047
  • Pdf link: https://arxiv.org/pdf/2303.17047
  • Abstract
    From construction materials such as sand or asphalt to kitchen ingredients like rice, sugar, or salt, the world is full of granular materials. Despite impressive progress in robotic manipulation, manipulating and interacting with granular material remains a challenge due to difficulties in perceiving, representing, modelling, and planning for these variable materials, which have complex internal dynamics. While some prior work has looked into estimating or learning accurate dynamics models for granular materials, the literature is still missing a more abstract planning method that can be used for planning manipulation actions for granular materials with unknown material properties. In this work, we leverage tools from optimal transport and connect them to robot motion planning. We propose a heuristics-based sweep planner that does not require knowledge of the material's properties and directly uses a height map representation to generate promising sweeps. These sweeps transform granular material from arbitrary start shapes into arbitrary target shapes. We apply the sweep planner in a fast and reactive feedback loop and avoid the need for model-based planning over multiple time steps. We validate our approach with a large set of simulation and hardware experiments where we show that our method is capable of efficiently solving several complex tasks, including gathering, separating, and shaping several types of granular materials into different target shapes.
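
In one dimension, the optimal transport the paper builds on has a closed form: match cumulative mass from left to right. The toy below computes the earth-mover cost and cross-boundary flows between two height profiles; it is an intuition aid, not the paper's sweep planner.

```python
import numpy as np

def transport_1d(source, target):
    """1D optimal transport between equal-mass height profiles via CDF matching.

    Returns the earth-mover cost (L1 distance between CDFs) and the net mass
    flow across each cell boundary (positive = move material rightward).
    """
    src_cdf, tgt_cdf = np.cumsum(source), np.cumsum(target)
    assert np.isclose(src_cdf[-1], tgt_cdf[-1]), "profiles must hold equal mass"
    cost = np.abs(src_cdf - tgt_cdf).sum()
    flow = (src_cdf - tgt_cdf)[:-1]
    return cost, flow

src = np.array([4.0, 3.0, 1.0, 0.0])  # current sand height per cell
tgt = np.array([2.0, 2.0, 2.0, 2.0])  # desired flat profile
print(transport_1d(src, tgt))  # cost 7.0; flows [2., 3., 2.] across boundaries
```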

Modularized Control Synthesis for Complex Signal Temporal Logic Specifications

  • Authors: Zengjie Zhang, Sofie Haesaert
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.17086
  • Pdf link: https://arxiv.org/pdf/2303.17086
  • Abstract
    The control synthesis of a dynamic system subject to signal temporal logic (STL) specifications is commonly formulated as a mixed-integer linear programming (MILP) problem. Solving a MILP problem is computationally expensive when the STL formulas are long and complex. In this paper, we propose a framework to transform a long and complex STL formula into a syntactically separate form, i.e., the logical combination of a series of short and simple subformulas with non-overlapping timing intervals. Using this framework, one can easily modularize the synthesis of a complex formula using the synthesis solutions of the subformulas, which improves the efficiency of solving the MILP problem. Specifically, we propose a group of separation principles to guarantee the syntactic equivalence between the original formula and its syntactically separate counterpart. Then, we propose novel methods to compute the largest satisfaction region and the open-loop controller of the specification in a modularized manner. The efficacy of the methods is validated with a robot monitoring case study in simulation. Our work promises to improve the efficiency of control synthesis for systems with complicated specifications.

Learning Reliable Representations for Incomplete Multi-View Partial Multi-Label Classification

  • Authors: Chengliang Liu, Jie Wen, Yong Xu, Liqiang Nie, Min Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17117
  • Pdf link: https://arxiv.org/pdf/2303.17117
  • Abstract
    As a cross-topic of multi-view learning and multi-label classification, multi-view multi-label classification has gradually gained traction in recent years. The application of multi-view contrastive learning has further facilitated this process; however, existing multi-view contrastive learning methods crudely separate the so-called negative pairs, which largely results in separating samples belonging to the same or similar categories. Moreover, many multi-view multi-label learning methods ignore the possible absence of views and labels. To address these issues, in this paper we propose an incomplete multi-view partial multi-label classification network named RANK. In this network, a label-driven multi-view contrastive learning strategy is proposed to leverage supervised information to preserve the structure within each view and perform consistent alignment across views. Furthermore, we break through the view-level weights inherent in existing methods and propose a quality-aware sub-network to dynamically assign quality scores to each view of each sample. The label correlation information is fully utilized in the final multi-label cross-entropy classification loss, effectively improving the discriminative power. Last but not least, our model is able not only to handle complete multi-view multi-label datasets, but also to work on datasets with missing instances and labels. Extensive experiments confirm that our RANK outperforms existing state-of-the-art methods.

Weighted Scheduling of Time-Sensitive Coflows

  • Authors: Olivier Brun, Rachid El-Azouzi, Quang-Trung Luu, Francesco De Pellergrini, Balakrishna J. Prabhu, Cédric Richier
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.17175
  • Pdf link: https://arxiv.org/pdf/2303.17175
  • Abstract
    Datacenter networks routinely support the data transfers of distributed computing frameworks in the form of coflows, i.e., sets of concurrent flows related to a common task. The vast majority of the literature has focused on the problem of scheduling coflows for completion time minimization, i.e., maximizing the average rate at which coflows are dispatched in the network fabric. However, many modern applications generate coflows dedicated to online services and mission-critical computing tasks which have to comply with specific completion deadlines. In this paper, we introduce $\mathtt{WDCoflow}$, a new algorithm to maximize the weighted number of coflows that complete before their deadline. By combining a dynamic programming algorithm with parallel inequalities, our heuristic solution performs coflow admission control and coflow prioritization at once, imposing a $\sigma$-order on the set of coflows. With extensive simulations, we demonstrate the effectiveness of our algorithm, which admits up to $3\times$ more coflows that meet their deadline in comparison with the best state-of-the-art solution, namely $\mathtt{CS\text{-}MHA}$. Furthermore, when weights are used to differentiate coflow classes, $\mathtt{WDCoflow}$ is able to improve per-class admission by up to $4\times$, while increasing the average weighted coflow admission rate.
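
The dynamic-programming flavor of deadline-aware admission can be shown on a single-resource toy: sort jobs by deadline and run a knapsack over elapsed time. This is the textbook recipe only; $\mathtt{WDCoflow}$'s coflow setting adds fabric constraints and the $\sigma$-order on top.

```python
def max_weighted_on_time(jobs):
    """Maximize total weight of jobs that finish by their deadlines (one resource).

    jobs: list of (processing_time, deadline, weight) tuples.
    dp[t] = best achievable weight with a schedule of makespan <= t.
    """
    jobs = sorted(jobs, key=lambda j: j[1])  # deadline order is exchange-optimal
    horizon = max(d for _, d, _ in jobs)
    dp = [0] * (horizon + 1)
    for p, d, w in jobs:
        for t in range(min(d, horizon), p - 1, -1):
            dp[t] = max(dp[t], dp[t - p] + w)  # finish this job by t <= its deadline
        for t in range(1, horizon + 1):
            dp[t] = max(dp[t], dp[t - 1])      # a shorter makespan is never worse
    return dp[horizon]

# Two jobs compete for the same early deadline; only one fits alongside the third.
print(max_weighted_on_time([(2, 3, 5), (2, 3, 4), (3, 6, 6)]))  # 11
```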

Innovative Countermeasures to Defeat Cyber Attacks Against Blockchain Wallets: A Crypto Terminal Use Case

  • Authors: Pascal Urien (LTCI)
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2303.17206
  • Pdf link: https://arxiv.org/pdf/2303.17206
  • Abstract
    Blockchain transactions are signed by private keys. Secure key storage and tamper-proof computers are essential requirements for deploying a trusted infrastructure. In this paper, we identify some threats against blockchain wallets and propose a set of physical and logical countermeasures to thwart them. We present the crypto terminal device, operating with a removable secure element, built on open software and hardware architectures, capable of detecting a cloned device or corrupted software. These technologies are based on tamper-resistant computing (Javacard), smart card anti-cloning, smart card content attestation, application firewall, bare-metal architecture, remote attestation, dynamic Physical Unclonable Functions (dPUF), and programming tokens as a root of trust. This paper is an extended version of the paper "Innovative Countermeasures to Defeat Cyber Attacks Against Blockchain Wallets," 2021 5th Cyber Security in Networking Conference (CSNet), 2021, pp. 49-54, doi: 10.1109/CSNet52717.2021.9614649.

Multifactor Sequential Disentanglement via Structured Koopman Autoencoders

  • Authors: Nimrod Berman, Ilan Naiman, Omri Azencot
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17264
  • Pdf link: https://arxiv.org/pdf/2303.17264
  • Abstract
    Disentangling complex data into its latent factors of variation is a fundamental task in representation learning. Existing work on sequential disentanglement mostly provides two-factor representations, i.e., it separates the data into time-varying and time-invariant factors. In contrast, we consider multifactor disentanglement, in which multiple (more than two) semantically disentangled components are generated. Key to our approach is a strong inductive bias where we assume that the underlying dynamics can be represented linearly in the latent space. Under this assumption, it becomes natural to exploit the recently introduced Koopman autoencoder models. However, disentangled representations are not guaranteed in Koopman approaches, and thus we propose a novel spectral loss term which leads to structured Koopman matrices and disentanglement. Overall, we propose a simple and easy-to-code new deep model that is fully unsupervised and supports multifactor disentanglement. We showcase new disentangling abilities such as swapping individual static factors between characters, and an incremental swap of disentangled factors from the source to the target. Moreover, we evaluate our method extensively on two-factor standard benchmark tasks, where we significantly improve over competing unsupervised approaches and perform competitively in comparison to weakly- and self-supervised state-of-the-art approaches. The code is available at https://github.com/azencot-group/SKD.
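
The linear-latent-dynamics assumption at the heart of Koopman autoencoders can be tested in isolation: given encoded states, fit the operator by least squares (DMD-style). The sketch below omits the autoencoder and the paper's spectral loss, and the toy data are illustrative.

```python
import numpy as np

def fit_koopman(latent):
    """Fit K so that z_{t+1} ~= K @ z_t from a latent trajectory of shape (T, d)."""
    X, Y = latent[:-1], latent[1:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # solves X @ B ~= Y row-wise
    return B.T                                  # transpose: z_next ~= K @ z

# Toy latent sequence generated by a plane rotation; the fit recovers it.
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
z = [np.array([1.0, 0.0])]
for _ in range(200):
    z.append(R @ z[-1])
print(np.round(fit_koopman(np.array(z)), 3))  # ~= R
```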

Improved a posteriori Error Bounds for Reduced port-Hamiltonian Systems

  • Authors: Johannes Rettberg, Dominik Wittwar, Patrick Buchfink, Robin Herkert, Jörg Fehr, Bernard Haasdonk
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.17329
  • Pdf link: https://arxiv.org/pdf/2303.17329
  • Abstract
    Projection-based model order reduction of dynamical systems usually introduces an error between the high-fidelity model and its counterpart of lower dimension. This unknown error can be bounded by residual-based methods, which are typically known to be highly pessimistic in the sense of largely overestimating the true error. This work applies two improved error bounding techniques, namely (a) a hierarchical error bound and (b) an error bound based on an auxiliary linear problem, to the case of port-Hamiltonian systems. The approaches rely on a second approximation of (a) the dynamical system and (b) the error system. In this paper, these methods are for the first time adapted to port-Hamiltonian systems by exploiting their structure. The mathematical relationship between the two methods is discussed both, theoretically and numerically. The effectiveness of the described methods is demonstrated using a challenging three-dimensional port-Hamiltonian model of a classical guitar with fluid-structure interaction.

Uniform Substitution for Dynamic Logic with Communicating Hybrid Programs

  • Authors: Marvin Brieger, Stefan Mitsch, André Platzer
  • Subjects: Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2303.17333
  • Pdf link: https://arxiv.org/pdf/2303.17333
  • Abstract
    This paper introduces a uniform substitution calculus for $d\mathcal{L}_\text{CHP}$, the dynamic logic of communicating hybrid programs. Uniform substitution enables parsimonious prover kernels by using axioms instead of axiom schemata. Instantiations can be recovered from a single proof rule responsible for soundness-critical instantiation checks rather than being spread across axiom schemata in side conditions. Even though communication and parallelism reasoning are notorious for necessitating subtle soundness-critical side conditions, uniform substitution when generalized to $d\mathcal{L}_\text{CHP}$ manages to limit and isolate their conceptual overhead. Since uniform substitution has proven to simplify the implementation of hybrid systems provers substantially, uniform substitution for $d\mathcal{L}_\text{CHP}$ paves the way for a parsimonious implementation of theorem provers for hybrid systems with communication and parallelism.

The Essential Algorithms for the Matrix Chain

  • Authors: Francisco López, Lars Karlsson, Paolo Bientinesi
  • Subjects: Discrete Mathematics (cs.DM)
  • Arxiv link: https://arxiv.org/abs/2303.17352
  • Pdf link: https://arxiv.org/pdf/2303.17352
  • Abstract
    For a given product of $n$ matrices, the matrix chain multiplication problem asks for a parenthesisation that minimises the number of arithmetic operations. In 1973, Godbole presented a now classical dynamic programming formulation with cubic time complexity on the length of the chain. The best known algorithms run in linearithmic time, and the best known approximation algorithms run in linear time with an approximation factor smaller than two. All solutions have in common that they select an optimal parenthesisation from a set of $C_{n-1}$ (Catalan number $n - 1$) distinct parenthesisations. We studied the set of parenthesisations and discovered (a) that all of the exponentially many parenthesisations are useful in the sense that they are optimal in an infinite subset of the input space, (b) that only $n + 1$ parenthesisations are essential in the sense that they are arbitrarily better than the second best on an infinite subset of the input space, and (c) that the best essential parenthesisation is never more than twice as costly as the best non-essential parenthesisation. Through random sampling of the input space, we further discovered that the set of essential parenthesisations includes an optimal parenthesisation in the vast majority of inputs, and that the best essential parenthesisation is on average much closer to optimal than the worst-case bound. The results have direct consequences for the development of compilers for linear algebra expressions where the matrix sizes are unknown at compile-time.
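
For reference, the classical cubic dynamic program by Godbole that the abstract mentions is short enough to sketch (a standard textbook formulation, not the paper's new results on essential parenthesisations):

```python
# Godbole's O(n^3) dynamic program for the matrix chain problem.
# dims[i] x dims[i+1] is the shape of the i-th matrix in the chain.
def matrix_chain_cost(dims):
    n = len(dims) - 1                    # number of matrices
    cost = [[0] * n for _ in range(n)]   # cost[i][j]: cheapest A_i ... A_j
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]

# A (10x30)(30x5)(5x60) chain: 4500 scalar multiplications with the best split.
print(matrix_chain_cost([10, 30, 5, 60]))
```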

Dynamic Conceptional Contrastive Learning for Generalized Category Discovery

  • Authors: Nan Pu, Zhun Zhong, Nicu Sebe
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17393
  • Pdf link: https://arxiv.org/pdf/2303.17393
  • Abstract
    Generalized category discovery (GCD) is a recently proposed open-world problem, which aims to automatically cluster partially labeled data. The main challenge is that the unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories. This renders traditional novel category discovery (NCD) methods incapacitated for GCD, due to their assumption that unlabeled data come only from novel categories. One effective way to tackle GCD is applying self-supervised learning to learn discriminative representations for unlabeled data. However, this approach largely ignores underlying relationships between instances of the same concepts (e.g., class, super-class, and sub-class), which results in inferior representation learning. In this paper, we propose a Dynamic Conceptional Contrastive Learning (DCCL) framework, which can effectively improve clustering accuracy by alternately estimating underlying visual conceptions and learning conceptional representation. In addition, we design a dynamic conception generation and update mechanism, which is able to ensure consistent conception learning and thus further facilitate the optimization of DCCL. Extensive experiments show that DCCL achieves new state-of-the-art performance on six generic and fine-grained visual recognition datasets, especially on fine-grained ones. For example, our method significantly surpasses the best competitor by 16.2% on the new classes of the CUB-200 dataset. Code is available at https://github.com/TPCD/DCCL.

Fast inference of latent space dynamics in huge relational event networks

  • Authors: Igor Artico, Ernst Wit
  • Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2303.17460
  • Pdf link: https://arxiv.org/pdf/2303.17460
  • Abstract
    Relational events are a type of social interaction, sometimes referred to as a dynamic network. Their dynamics typically depend on emerging patterns, so-called endogenous variables, or on external forces, referred to as exogenous variables. Comprehensive information on the actors in the network, especially for huge networks, is rare, however. A latent space approach in network analysis has been a popular way to account for unmeasured covariates that are driving network configurations. Bayesian and EM-type algorithms have been proposed for inferring the latent space, but both the sheer size of many social network applications and the dynamic nature of the process, and therefore of the latent space, make computations prohibitively expensive. In this work we propose a likelihood-based algorithm that can deal with huge relational event networks. We propose a hierarchical strategy for inferring network community dynamics embedded into an interpretable latent space. Node dynamics are described by smooth spline processes. To make the framework feasible for large networks we borrow from machine learning optimization methodology. Model-based clustering is carried out via a convex clustering penalization, encouraging shared trajectories for ease of interpretation. We propose a model-based approach for separating macro- and microstructures and perform a hierarchical analysis within successive hierarchies. The method can fit millions of nodes on a public Colab GPU in a few minutes. The code and a tutorial are available in a GitHub repository.

PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation

  • Authors: Qitao Zhao, Ce Zheng, Mengyuan Liu, Pichao Wang, Chen Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17472
  • Pdf link: https://arxiv.org/pdf/2303.17472
  • Abstract
    Recently, transformer-based methods have gained significant success in sequential 2D-to-3D lifting human pose estimation. As a pioneering work, PoseFormer captures spatial relations of human joints in each video frame and human dynamics across frames with cascaded transformer layers and has achieved impressive performance. However, in real scenarios, the performance of PoseFormer and its follow-ups is limited by two factors: (a) The length of the input joint sequence; (b) The quality of 2D joint detection. Existing methods typically apply self-attention to all frames of the input sequence, causing a huge computational burden when the frame number is increased to obtain advanced estimation accuracy, and they are not robust to noise naturally brought by the limited capability of 2D joint detectors. In this paper, we propose PoseFormerV2, which exploits a compact representation of lengthy skeleton sequences in the frequency domain to efficiently scale up the receptive field and boost robustness to noisy 2D joint detection. With minimum modifications to PoseFormer, the proposed method effectively fuses features both in the time domain and frequency domain, enjoying a better speed-accuracy trade-off than its precursor. Extensive experiments on two benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) demonstrate that the proposed approach significantly outperforms the original PoseFormer and other transformer-based variants. Code is released at \url{https://github.com/QitaoZhao/PoseFormerV2}.
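
A hedged sketch of the underlying frequency-domain idea: a long joint sequence can be compactly represented by its low-frequency DCT coefficients (our illustration; the function names and the choice of 8 coefficients are ours, not the paper's):

```python
# Compress a 2D-joint sequence by keeping only low-frequency DCT coefficients.
import numpy as np
from scipy.fft import dct, idct

def compress_sequence(seq, n_coeffs=8):
    """seq: (frames, joints * 2) array of 2D joint coordinates."""
    coeffs = dct(seq, axis=0, norm="ortho")    # temporal DCT per coordinate
    return coeffs[:n_coeffs]                   # low frequencies only

def reconstruct(coeffs, n_frames):
    padded = np.zeros((n_frames, coeffs.shape[1]))
    padded[: len(coeffs)] = coeffs
    return idct(padded, axis=0, norm="ortho")  # smooth approximation

seq = np.random.randn(81, 34)                  # 81 frames, 17 joints x (x, y)
approx = reconstruct(compress_sequence(seq), 81)
```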

Differentiable Environment Primitives for Contact State Estimation

  • Authors: Kevin Haninger, Kangwagye Samuel, Filippo Rozzi, Sehoon Oh, Loris Roveda
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17476
  • Pdf link: https://arxiv.org/pdf/2303.17476
  • Abstract
    In contact-rich manipulation, the robot dynamics are coupled with an environment that has application-specific dynamic properties (stiffness, inertia) and geometry (contact normal). Knowledge of these environmental parameters can improve control and monitoring, but they are often unobserved and may vary, either online or between task instances. Observers, such as the extended Kalman filter, can be used to estimate these parameters, but such model-based techniques can require too much engineering work to scale up to complex environments, such as multi-point contact. To accelerate environment modeling, we propose environment primitives: parameterized environment dynamics that can be connected in parallel and are expressed in an automatic differentiation framework. This simplifies offline gradient-based optimization to fit model parameters and linearization of the coupled dynamics for an observer. This method is implemented for stiffness contact models, allowing the fitting of contact geometry and stiffness offline or their online estimation by an extended Kalman filter. This method is applied to a collaborative robot, estimating external force, contact stiffness, and contact geometry from the motor position and current. The estimates of external force and stiffness are compared with a momentum observer and direct force measurements.

On the Analysis of Computational Delays in Reinforcement Learning-based Rate Adaptation Algorithms

  • Authors: Ricardo Trancoso, Ruben Queiros, Helder Fontes, Rui Campos
  • Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17477
  • Pdf link: https://arxiv.org/pdf/2303.17477
  • Abstract
    Several research works have applied Reinforcement Learning (RL) algorithms to solve the Rate Adaptation (RA) problem in Wi-Fi networks. The dynamic nature of the radio link requires the algorithms to be responsive to changes in link quality. Delays in the execution of the algorithm may be detrimental to its performance, which in turn may decrease network performance. This aspect has been overlooked in the state of the art. In this paper, we present an analysis of common computational delays in RL-based RA algorithms, and propose a methodology that may be applied to reduce these computational delays and increase the efficiency of this type of algorithms. We apply the proposed methodology to an existing RL-based RA algorithm. The obtained experimental results indicate a reduction of one order of magnitude in the execution time of the algorithm, improving its responsiveness to link quality changes.

Event-based Agile Object Catching with a Quadrupedal Robot

  • Authors: Benedek Forrai, Takahiro Miki, Daniel Gehrig, Marco Hutter, Davide Scaramuzza
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17479
  • Pdf link: https://arxiv.org/pdf/2303.17479
  • Abstract
    Quadrupedal robots are conquering various indoor and outdoor applications due to their ability to navigate challenging uneven terrains. Exteroceptive information greatly enhances this capability since perceiving their surroundings allows them to adapt their controller and thus achieve higher levels of robustness. However, sensors such as LiDARs and RGB cameras do not provide sufficient information to quickly and precisely react in a highly dynamic environment since they suffer from a bandwidth-latency tradeoff. They require significant bandwidth at high frame rates while featuring significant perceptual latency at lower frame rates, thereby limiting their versatility on resource-constrained platforms. In this work, we tackle this problem by equipping our quadruped with an event camera, which does not suffer from this tradeoff due to its asynchronous and sparse operation. In leveraging the low latency of the events, we push the limits of quadruped agility and demonstrate high-speed ball catching for the first time. We show that our quadruped equipped with an event camera can catch objects with speeds up to 15 m/s from 4 meters, with a success rate of 83%. Using a VGA event camera, our method runs at 100 Hz on an NVIDIA Jetson Orin.

Teaching contact-rich tasks from visual demonstrations by constraint extraction

  • Authors: Christian Hegeler, Filippo Rozzi, Loris Roveda, Kevin Haninger
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17481
  • Pdf link: https://arxiv.org/pdf/2303.17481
  • Abstract
    Contact-rich manipulation involves kinematic constraints on the task motion, typically with discrete transitions between these constraints during the task. Allowing the robot to detect and reason about these contact constraints can support robust and dynamic manipulation, but how can these contact models be efficiently learned? Purely visual observations are an attractive data source, allowing passive task demonstrations with unmodified objects. Existing approaches for vision-only learning from demonstration are effective in pick-and-place applications and planar tasks. Nevertheless, accuracy/occlusions and unobserved task dynamics can limit their robustness in contact-rich manipulation. To use visual demonstrations for contact-rich robotic tasks, we consider the demonstration of pose trajectories with transitions between holonomic kinematic constraints, first clustering the trajectories into discrete contact modes, then fitting kinematic constraints for each mode. The fitted constraints are then used to (i) detect contact online with force/torque measurements and (ii) plan the robot policy with respect to the active constraint. We demonstrate the approach with real experiments on cabling and rake tasks, showing that it gives robust manipulation through contact transitions.

DDP: Diffusion Model for Dense Visual Prediction

  • Authors: Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17559
  • Pdf link: https://arxiv.org/pdf/2303.17559
  • Abstract
    We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a random Gaussian distribution, guided by the image. The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline. Without task-specific design and architecture customization, DDP is easy to generalize to most dense prediction tasks, e.g., semantic segmentation and depth estimation. In addition, DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods. We show top results on three representative tasks with six diverse benchmarks; without bells and whistles, DDP achieves state-of-the-art or competitive performance on each task compared to specialist counterparts. For example, semantic segmentation (83.9 mIoU on Cityscapes), BEV map segmentation (70.6 mIoU on nuScenes), and depth estimation (0.05 REL on KITTI). We hope that our approach will serve as a solid baseline and facilitate future research.

TiDy-PSFs: Computational Imaging with Time-Averaged Dynamic Point-Spread-Functions

  • Authors: Sachin Shah, Sakshum Kulshrestha, Christopher A. Metzler
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17583
  • Pdf link: https://arxiv.org/pdf/2303.17583
  • Abstract
    Point-spread-function (PSF) engineering is a powerful computational imaging technique wherein a custom phase mask is integrated into an optical system to encode additional information into captured images. Used in combination with deep learning, such systems now offer state-of-the-art performance at monocular depth estimation, extended depth-of-field imaging, lensless imaging, and other tasks. Inspired by recent advances in spatial light modulator (SLM) technology, this paper answers a natural question: Can one encode additional information and achieve superior performance by changing a phase mask dynamically over time? We first prove that the set of PSFs described by static phase masks is non-convex and that, as a result, time-averaged PSFs generated by dynamic phase masks are fundamentally more expressive. We then demonstrate, in simulation, that time-averaged dynamic (TiDy) phase masks can offer substantially improved monocular depth estimation and extended depth-of-field imaging performance.
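
A toy sketch of the expressiveness argument, under a simplified scalar Fourier-optics model (unit-amplitude pupil, arbitrary random masks; ours, not the paper's simulator): the time-averaged PSF of two masks is the average of their intensity PSFs, which need not be realizable by any single static mask.

```python
import numpy as np

def psf_from_phase(phase):
    pupil = np.exp(1j * phase)                   # unit-amplitude pupil function
    field = np.fft.fftshift(np.fft.fft2(pupil))  # far-field diffraction pattern
    psf = np.abs(field) ** 2
    return psf / psf.sum()

rng = np.random.default_rng(0)
mask_a, mask_b = rng.uniform(0, 2 * np.pi, (2, 64, 64))
tidy_psf = 0.5 * (psf_from_phase(mask_a) + psf_from_phase(mask_b))  # time average
```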

Polarity is all you need to learn and transfer faster

  • Authors: Qingyang Wang, Michael A.Powell, Ali Geisa, Eric Bridgeford, Joshua T. Vogelstein
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
  • Arxiv link: https://arxiv.org/abs/2303.17589
  • Pdf link: https://arxiv.org/pdf/2303.17589
  • Abstract
    Natural intelligences (NIs) thrive in a dynamic world - they learn quickly, sometimes with only a few samples. In contrast, artificial intelligences (AIs) typically learn with prohibitive amounts of training samples and computational power. What design-principle difference between NI and AI could contribute to such a discrepancy? Here, we propose an angle from weight polarity: development processes initialize NIs with advantageous polarity configurations; as NIs grow and learn, synapse magnitudes update yet polarities are largely kept unchanged. We demonstrate with simulation and image classification tasks that if weight polarities are adequately set $\textit{a priori}$, then networks learn with less time and data. We also explicitly illustrate situations in which $\textit{a priori}$ setting the weight polarities is disadvantageous for networks. Our work illustrates the value of weight polarities from the perspective of statistical and computational efficiency during learning.
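
A minimal PyTorch sketch of what "fixed polarity, learned magnitude" could look like (our illustration of the idea; the paper studies how polarities are set a priori, not this specific parameterization):

```python
import torch
import torch.nn as nn

class PolarityFixedLinear(nn.Module):
    """Linear layer whose weight signs are frozen; only magnitudes train."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        sign = torch.randint(0, 2, (out_dim, in_dim)) * 2.0 - 1.0
        self.register_buffer("sign", sign)                   # fixed a priori
        self.log_mag = nn.Parameter(torch.zeros(out_dim, in_dim))

    def forward(self, x):
        weight = self.sign * self.log_mag.exp()  # polarity never flips under SGD
        return x @ weight.t()
```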

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

  • Authors: Wen Wang, Kangyang Xie, Zide Liu, Hao Chen, Yue Cao, Xinlong Wang, Chunhua Shen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17599
  • Pdf link: https://arxiv.org/pdf/2303.17599
  • Abstract
    Large-scale text-to-image diffusion models achieve unprecedented success in image generation and editing. However, how to extend such success to video editing is unclear. Recent initial attempts at video editing require significant text-to-video data and computation resources for training, which are often not accessible. In this work, we propose vid2vid-zero, a simple yet effective method for zero-shot video editing. Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video. At the core of our method are a null-text inversion module for text-to-video alignment, a cross-frame modeling module for temporal consistency, and a spatial regularization module for fidelity to the original video. Without any training, we leverage the dynamic nature of the attention mechanism to enable bi-directional temporal modeling at test time. Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos. Code will be made available at \url{https://github.com/baaivision/vid2vid-zero}.

New submissions for Mon, 24 Apr 23

Keyword: efficient

Using Z3 for Formal Modeling and Verification of FNN Global Robustness

  • Authors: Yihao Zhang, Zeming Wei, Xiyue Zhang, Meng Sun
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.10558
  • Pdf link: https://arxiv.org/pdf/2304.10558
  • Abstract
    While Feedforward Neural Networks (FNNs) have achieved remarkable success in various tasks, they are vulnerable to adversarial examples. Several techniques have been developed to verify the adversarial robustness of FNNs, but most of them focus on robustness verification against the local perturbation neighborhood of a single data point. There is still a large research gap in global robustness analysis. The global-robustness verifiable framework DeepGlobal has been proposed to identify \textit{all} possible Adversarial Dangerous Regions (ADRs) of FNNs, not limited to data samples in a test set. In this paper, we propose a complete specification and implementation of DeepGlobal, utilizing the SMT solver Z3 for a more explicit definition, and propose several improvements to DeepGlobal for more efficient verification. To evaluate the effectiveness of our implementation and improvements, we conduct extensive experiments on a set of benchmark datasets. Visualization of our experiment results shows the validity and effectiveness of the approach.
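
For intuition, here is a tiny hedged example of encoding a one-neuron ReLU network and a robustness query in Z3's Python API (ours; DeepGlobal's actual ADR encoding is more involved):

```python
from z3 import Real, If, Solver, And, sat

x0, x1 = Real("x0"), Real("x1")
pre = 0.5 * x0 - 1.0 * x1 + 0.25        # one hidden neuron, fixed weights
h = If(pre > 0, pre, 0)                 # exact encoding of ReLU
y = 2.0 * h - 0.5

s = Solver()
s.add(And(0 <= x0, x0 <= 1, 0 <= x1, x1 <= 1))  # input region
s.add(y > 0.4)                                   # does any input violate y <= 0.4?
if s.check() == sat:
    print("counterexample:", s.model())
```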

KOIOS: Top-k Semantic Overlap Set Search

  • Authors: Pranay Mundra, Jianhao Zhang, Fatemeh Nargesian, Nikolaus Augsten
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.10572
  • Pdf link: https://arxiv.org/pdf/2304.10572
  • Abstract
    We study the top-k set similarity search problem using semantic overlap. While vanilla overlap requires exact matches between set elements, semantic overlap allows elements that are syntactically different but semantically related to increase the overlap. The semantic overlap is the maximum matching score of a bipartite graph, where an edge weight between two set elements is defined by a user-defined similarity function, e.g., cosine similarity between embeddings. Common techniques like token indexes fail for semantic search since similar elements may be unrelated at the character level. Further, verifying candidates is expensive (cubic versus linear for syntactic overlap), calling for highly selective filters. We propose KOIOS, the first exact and efficient algorithm for semantic overlap search. KOIOS leverages sophisticated filters to minimize the number of required graph-matching calculations. Our experiments show that for medium to large sets less than 5% of the candidate sets need verification, and more than half of those sets are further pruned without requiring the expensive graph matching. We show the efficiency of our algorithm on four real datasets and demonstrate the improved result quality of semantic over vanilla set similarity search.
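
The semantic-overlap score itself can be sketched as a maximum-weight bipartite matching over embedding similarities (our illustration; KOIOS's contribution is the filters that avoid running this expensive verification, not the matching):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def semantic_overlap(A, B, sim_threshold=0.0):
    """A: (m, d), B: (n, d) row-normalised element embeddings."""
    sim = A @ B.T                              # cosine similarities
    sim[sim < sim_threshold] = 0.0             # ignore weak element pairs
    rows, cols = linear_sum_assignment(-sim)   # maximise total matched weight
    return sim[rows, cols].sum()
```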

B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding

  • Authors: Miruna Oprescu, Jacob Dorn, Marah Ghoummaid, Andrew Jesson, Nathan Kallus, Uri Shalit
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.10577
  • Pdf link: https://arxiv.org/pdf/2304.10577
  • Abstract
    Estimating heterogeneous treatment effects from observational data is a crucial task across many fields, helping policy and decision-makers take better actions. There has been recent progress on robust and efficient methods for estimating the conditional average treatment effect (CATE) function, but these methods often do not take into account the risk of hidden confounding, which could arbitrarily and unknowingly bias any causal estimate based on observational data. We propose a meta-learner called the B-Learner, which can efficiently learn sharp bounds on the CATE function under limits on the level of hidden confounding. We derive the B-Learner by adapting recent results for sharp and valid bounds of the average treatment effect (Dorn et al., 2021) into the framework given by Kallus & Oprescu (2022) for robust and model-agnostic learning of distributional treatment effects. The B-Learner can use any function estimator such as random forests and deep neural networks, and we prove its estimates are valid, sharp, efficient, and have a quasi-oracle property with respect to the constituent estimators under more general conditions than existing methods. Semi-synthetic experimental comparisons validate the theoretical findings, and we use real-world data demonstrate how the method might be used in practice.

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

  • Authors: Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10592
  • Pdf link: https://arxiv.org/pdf/2304.10592
  • Abstract
    The recent GPT-4 has demonstrated extraordinary multi-modal abilities, such as directly generating websites from handwritten text and identifying humorous elements within images. These features are rarely observed in previous vision-language models. We believe the primary reason for GPT-4's advanced multi-modal generation capabilities lies in the utilization of a more advanced large language model (LLM). To examine this phenomenon, we present MiniGPT-4, which aligns a frozen visual encoder with a frozen LLM, Vicuna, using just one projection layer. Our findings reveal that MiniGPT-4 possesses many capabilities similar to those exhibited by GPT-4, such as detailed image description generation and website creation from hand-written drafts. Furthermore, we also observe other emerging capabilities in MiniGPT-4, including writing stories and poems inspired by given images, providing solutions to problems shown in images, teaching users how to cook based on food photos, etc. In our experiment, we found that only performing the pretraining on raw image-text pairs could produce unnatural language outputs that lack coherency, including repetition and fragmented sentences. To address this problem, we curate a high-quality, well-aligned dataset in the second stage to finetune our model using a conversational template. This step proved crucial for augmenting the model's generation reliability and overall usability. Notably, our model is highly computationally efficient, as we only train a projection layer utilizing approximately 5 million aligned image-text pairs. Our code, pre-trained model, and collected dataset are available at https://minigpt-4.github.io/.
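
The alignment architecture is simple enough to sketch (the dimensions below are hypothetical; in a MiniGPT-4-style setup only this projection is trained):

```python
import torch
import torch.nn as nn

visual_dim, llm_dim = 1408, 4096       # hypothetical encoder/LLM widths
proj = nn.Linear(visual_dim, llm_dim)  # the only trainable parameters

vision_tokens = torch.randn(1, 32, visual_dim)  # frozen visual encoder output
llm_inputs = proj(vision_tokens)                # fed to the frozen LLM
```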

DeepReShape: Redesigning Neural Networks for Efficient Private Inference

  • Authors: Nandan Kumar Jha, Brandon Reagen
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.10593
  • Pdf link: https://arxiv.org/pdf/2304.10593
  • Abstract
    The increasing demand for privacy and security has driven the advancement of private inference (PI), a cryptographic method enabling inferences directly on encrypted data. However, the computational and storage burdens of non-linear operators (e.g., ReLUs) render it impractical. Despite these limitations, prior ReLU optimization methods consistently relied on classical networks that are not optimized for PI. Moreover, the selection of baseline networks in these ReLU optimization methods remains enigmatic and fails to provide insights into network attributes contributing to PI efficiency. In this paper, we investigate the desirable network architecture for efficient PI; our key finding is that wider networks are superior at higher ReLU counts, while networks with a greater proportion of least-critical ReLUs excel at lower ReLU counts. Leveraging these findings, we develop a novel network redesign technique (DeepReShape) with a complexity of $\mathcal{O}(1)$, and synthesize specialized architectures (HybReNet). Compared to the state-of-the-art (SNL on CIFAR-100), we achieve a 2.35% accuracy gain at 180K ReLUs, and for ResNet50 on TinyImageNet our method saves 4.2$\times$ ReLUs at iso-accuracy.

Enhancing Artificial intelligence Policies with Fusion and Forecasting: Insights from Indian Patents Using Network Analysis

  • Authors: Akhil Kuniyil, Avinash Kshitij, Kasturi Mandal
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10596
  • Pdf link: https://arxiv.org/pdf/2304.10596
  • Abstract
    This paper presents a study of the interconnectivity and interdependence of various Artificial intelligence (AI) technologies through the use of centrality measures, clustering coefficients, and degree of fusion measures. By analyzing the technologies through different time windows and quantifying their importance, we have revealed important insights into the crucial components shaping the AI landscape and the maturity level of the domain. The results of this study have significant implications for future development and advancements in artificial intelligence and provide a clear understanding of key technology areas of fusion. Furthermore, this paper contributes to AI public policy research by offering a data-driven perspective on the current state and future direction of the field. However, it is important to acknowledge the limitations of this research and call for further studies to build on these results. With these findings, we hope to inform and guide future research in the field of AI, contributing to its continued growth and success.

ULEEN: A Novel Architecture for Ultra Low-Energy Edge Neural Networks

  • Authors: Zachary Susskind, Aman Arora, Igor D. S. Miranda, Alan T. L. Bacellar, Luis A. Q. Villon, Rafael F. Katopodis, Leandro S. de Araujo, Diego L. C. Dutra, Priscila M. V. Lima, Felipe M. G. Franca, Mauricio Breternitz Jr., Lizy K. John
  • Subjects: Hardware Architecture (cs.AR); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.10618
  • Pdf link: https://arxiv.org/pdf/2304.10618
  • Abstract
    The deployment of AI models on low-power, real-time edge devices requires accelerators for which energy, latency, and area are all first-order concerns. There are many approaches to enabling deep neural networks (DNNs) in this domain, including pruning, quantization, compression, and binary neural networks (BNNs), but with the emergence of the "extreme edge", there is now a demand for even more efficient models. In order to meet the constraints of ultra-low-energy devices, we propose ULEEN, a model architecture based on weightless neural networks. Weightless neural networks (WNNs) are a class of neural model which use table lookups, not arithmetic, to perform computation. The elimination of energy-intensive arithmetic operations makes WNNs theoretically well suited for edge inference; however, they have historically suffered from poor accuracy and excessive memory usage. ULEEN incorporates algorithmic improvements and a novel training strategy inspired by BNNs to make significant strides in improving accuracy and reducing model size. We compare FPGA and ASIC implementations of an inference accelerator for ULEEN against edge-optimized DNN and BNN devices. On a Xilinx Zynq Z-7045 FPGA, we demonstrate classification on the MNIST dataset at 14.3 million inferences per second (13 million inferences/Joule) with 0.21 $\mu$s latency and 96.2% accuracy, while Xilinx FINN achieves 12.3 million inferences per second (1.69 million inferences/Joule) with 0.31 $\mu$s latency and 95.83% accuracy. In a 45nm ASIC, we achieve 5.1 million inferences/Joule and 38.5 million inferences/second at 98.46% accuracy, while a quantized Bit Fusion model achieves 9230 inferences/Joule and 19,100 inferences/second at 99.35% accuracy. In our search for ever more efficient edge devices, ULEEN shows that WNNs are deserving of consideration.

NFT Marketplace

  • Authors: Piyush Batra, Gagan Raj Singh, Ritik Gandhi
  • Subjects: Multimedia (cs.MM); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.10632
  • Pdf link: https://arxiv.org/pdf/2304.10632
  • Abstract
    In an increasingly digitized world, the secure management and trade of digital assets have become a pressing issue. This project aims to address this challenge by developing a decentralized application (dApp) that leverages blockchain technology and deep learning models to provide secure and efficient digital asset management, with a focus on NFTs. The dApp includes features such as secure wallet connections, NFT image generation, minting, marketplace, and profile management. The back-end of the dApp is implemented using the Goerli testnet with Solidity-based smart contracts, while IPFS and ReactJS/EtherJS are used for decentralized storage and front-end development, respectively. Additionally, the OpenAI API is integrated to generate unique NFT images based on user input. The project demonstrates the practical application of blockchain technology and deep learning models in developing dApps for secure and decentralized digital asset management. Overall, the project contributes to the ongoing research on blockchain-based solutions for secure digital asset management, while highlighting the potential of blockchain and deep learning technologies to transform the way we manage and trade digital assets.

Get Rid Of Your Trail: Remotely Erasing Backdoors in Federated Learning

  • Authors: Manaar Alam, Hithem Lamri, Michail Maniatakos
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.10638
  • Pdf link: https://arxiv.org/pdf/2304.10638
  • Abstract
    Federated Learning (FL) enables collaborative deep learning training across multiple participants without exposing sensitive personal data. However, the distributed nature of FL and the unvetted participants' data makes it vulnerable to backdoor attacks. In these attacks, adversaries inject malicious functionality into the centralized model during training, leading to intentional misclassifications for specific adversary-chosen inputs. While previous research has demonstrated successful injections of persistent backdoors in FL, the persistence also poses a challenge, as their existence in the centralized model can prompt the central aggregation server to take preventive measures to penalize the adversaries. Therefore, this paper proposes a methodology that enables adversaries to effectively remove backdoors from the centralized model upon achieving their objectives or upon suspicion of possible detection. The proposed approach extends the concept of machine unlearning and presents strategies to preserve the performance of the centralized model and simultaneously prevent over-unlearning of information unrelated to backdoor patterns, making the adversaries stealthy while removing backdoors. To the best of our knowledge, this is the first work that explores machine unlearning in FL to remove backdoors to the benefit of adversaries. Exhaustive evaluation considering image classification scenarios demonstrates the efficacy of the proposed method in efficient backdoor removal from the centralized model, injected by state-of-the-art attacks across multiple configurations.

On the Effects of Data Heterogeneity on the Convergence Rates of Distributed Linear System Solvers

  • Authors: Boris Velasevic, Rohit Parasnis, Christopher G. Brinton, Navid Azizan
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.10640
  • Pdf link: https://arxiv.org/pdf/2304.10640
  • Abstract
    We consider the fundamental problem of solving a large-scale system of linear equations. In particular, we consider the setting where a taskmaster intends to solve the system in a distributed/federated fashion with the help of a set of machines, who each have a subset of the equations. Although there exist several approaches for solving this problem, missing is a rigorous comparison between the convergence rates of the projection-based methods and those of the optimization-based ones. In this paper, we analyze and compare these two classes of algorithms with a particular focus on the most efficient method from each class, namely, the recently proposed Accelerated Projection-Based Consensus (APC) and the Distributed Heavy-Ball Method (D-HBM). To this end, we first propose a geometric notion of data heterogeneity called angular heterogeneity and discuss its generality. Using this notion, we bound and compare the convergence rates of the studied algorithms and capture the effects of both cross-machine and local data heterogeneity on these quantities. Our analysis results in a number of novel insights besides showing that APC is the most efficient method in realistic scenarios where there is a large data heterogeneity. Our numerical analyses validate our theoretical results.
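
As a point of reference, a sketch of the heavy-ball iteration on the global least-squares objective (ours; step sizes are illustrative, and in D-HBM each machine computes its own gradient term before aggregation):

```python
import numpy as np

def heavy_ball(blocks, alpha=1e-3, beta=0.9, iters=5000):
    """blocks: list of (A_i, b_i) held by each machine; solves Ax = b."""
    dim = blocks[0][0].shape[1]
    x, x_prev = np.zeros(dim), np.zeros(dim)
    for _ in range(iters):
        grad = sum(A.T @ (A @ x - b) for A, b in blocks)  # aggregated gradient
        x, x_prev = x - alpha * grad + beta * (x - x_prev), x
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 10))
b = A @ rng.standard_normal(10)                    # consistent linear system
blocks = [(A[:20], b[:20]), (A[20:], b[20:])]      # two machines
print(np.linalg.norm(A @ heavy_ball(blocks) - b))  # near-zero residual
```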

Word Sense Induction with Knowledge Distillation from BERT

  • Authors: Anik Saha, Alex Gittens, Bulent Yener
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.10642
  • Pdf link: https://arxiv.org/pdf/2304.10642
  • Abstract
    Pre-trained contextual language models are ubiquitously employed for language understanding tasks, but are unsuitable for resource-constrained systems. Noncontextual word embeddings are an efficient alternative in these settings. Such methods typically use one vector to encode multiple different meanings of a word, and incur errors due to polysemy. This paper proposes a two-stage method to distill multiple word senses from a pre-trained language model (BERT) by using attention over the senses of a word in a context and transferring this sense information to fit multi-sense embeddings in a skip-gram-like framework. We demonstrate an effective approach to training the sense disambiguation mechanism in our model with a distribution over word senses extracted from the output layer embeddings of BERT. Experiments on the contextual word similarity and sense induction tasks show that this method is superior to or competitive with state-of-the-art multi-sense embeddings on multiple benchmark data sets, and experiments with an embedding-based topic model (ETM) demonstrate the benefits of using this multi-sense embedding in a downstream application.

Modular Hardware Design with Timeline Types

  • Authors: Rachit Nigam, Pedro Henrique Azevedo De Amorim, Adrian Sampson
  • Subjects: Hardware Architecture (cs.AR); Programming Languages (cs.PL)
  • Arxiv link: https://arxiv.org/abs/2304.10646
  • Pdf link: https://arxiv.org/pdf/2304.10646
  • Abstract
    Modular design is a key challenge for enabling large-scale reuse of hardware modules. Unlike software, however, hardware designs correspond to physical circuits and inherit constraints from them. Timing constraints -- which cycle a signal arrives, when an input is read -- and structural constraints -- how often a multiplier accepts new inputs -- are fundamental to hardware interfaces. Existing hardware design languages do not provide a way to encode these constraints; a user must read documentation, build scripts, or in the worst case, a module's implementation to understand how to use it. We present Filament, a language for modular hardware design that supports the specification and enforcement of timing and structural constraints for statically scheduled pipelines. Filament uses timeline types, which describe the intervals of clock-cycle time when a given signal is available or required. Filament enables safe composition of hardware modules, ensures that the resulting designs are correctly pipelined, and predictably lowers them to efficient hardware.

Feature point detection in HDR images based on coefficient of variation

  • Authors: Artur Santos Nascimento, Welerson Augusto Lino de Jesus Melo, Daniel Oliveira Dantas, Beatriz Trinchão Andrade
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10666
  • Pdf link: https://arxiv.org/pdf/2304.10666
  • Abstract
    Feature point (FP) detection is a fundamental step of many computer vision tasks. However, FP detectors are usually designed for low dynamic range (LDR) images. In scenes with extreme light conditions, LDR images present saturated pixels, which degrade FP detection. On the other hand, high dynamic range (HDR) images usually present no saturated pixels, but FP detection algorithms do not take advantage of all the information present in such images. FP detection frequently relies on differential methods, which work well in LDR images. However, in HDR images, the differential operation response in bright areas overshadows the response in dark areas. As an alternative to standard FP detection methods, this study proposes an FP detector based on the coefficient of variation (CV) designed for HDR images. The CV operation adapts its response based on the standard deviation of pixels inside a window, working well in both dark and bright areas of HDR images. The proposed and standard detectors are evaluated by measuring their repeatability rate (RR) and uniformity. Our proposed detector shows better performance when compared to other standard state-of-the-art detectors. In the uniformity metric, our proposed detector surpasses all the other algorithms. On the other hand, under the repeatability rate metric, the proposed detector performs worse than the Harris for HDR and SURF detectors.
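
A minimal sketch of a sliding-window CV response map, our reading of the idea rather than the authors' exact detector:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def cv_response(img, window=7, eps=1e-12):
    """Coefficient of variation (local std / local mean) per pixel."""
    mean = uniform_filter(img, size=window)
    mean_sq = uniform_filter(img ** 2, size=window)
    std = np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))
    return std / (mean + eps)   # normalised response in dark and bright areas
```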

SLEPLET: Slepian Scale-Discretised Wavelets in Python

  • Authors: Patrick J. Roddy
  • Subjects: Information Theory (cs.IT); Instrumentation and Methods for Astrophysics (astro-ph.IM); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.10680
  • Pdf link: https://arxiv.org/pdf/2304.10680
  • Abstract
    Wavelets are widely used in various disciplines to analyse signals both in space and scale. Whilst many fields measure data on manifolds (i.e., the sphere), often data are only observed on a partial region of the manifold. Wavelets are a typical approach to data of this form, but the wavelet coefficients that overlap with the boundary become contaminated and must be removed for accurate analysis. Another approach is to estimate the region of missing data and to use existing whole-manifold methods for analysis. However, both approaches introduce uncertainty into any analysis. Slepian wavelets enable one to work directly with only the data present, thus avoiding the problems discussed above. Applications of Slepian wavelets to areas of research measuring data on the partial sphere include gravitational/magnetic fields in geodesy, ground-based measurements in astronomy, measurements of whole-planet properties in planetary science, geomagnetism of the Earth, and cosmic microwave background analyses.

Matching-based Data Valuation for Generative Model

  • Authors: Jiaxi Yang, Wenglong Deng, Benlin Liu, Yangsibo Huang, Xiaoxiao Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10701
  • Pdf link: https://arxiv.org/pdf/2304.10701
  • Abstract
    Data valuation is critical in machine learning, as it helps enhance model transparency and protect data properties. Existing data valuation methods have primarily focused on discriminative models, neglecting deep generative models that have recently gained considerable attention. Similar to discriminative models, there is an urgent need to assess data contributions in deep generative models as well. However, previous data valuation approaches mainly relied on discriminative model performance metrics and required model retraining. Consequently, they cannot be applied directly and efficiently to recent deep generative models, such as generative adversarial networks and diffusion models, in practice. To bridge this gap, we formulate the data valuation problem in generative models from a similarity-matching perspective. Specifically, we introduce Generative Model Valuator (GMValuator), the first model-agnostic approach for any generative model, designed to provide data valuation for generation tasks. We have conducted extensive experiments to demonstrate the effectiveness of the proposed method. To the best of our knowledge, GMValuator is the first work that offers a training-free, post-hoc data valuation strategy for deep generative models.

EulerNet: Adaptive Feature Interaction Learning via Euler's Formula for CTR Prediction

  • Authors: Zhen Tian, Ting Bai, Wayne Xin Zhao, Ji-Rong Wen, Zhao Cao
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.10711
  • Pdf link: https://arxiv.org/pdf/2304.10711
  • Abstract
    Learning effective high-order feature interactions is very crucial in the CTR prediction task. However, it is very time-consuming to calculate high-order feature interactions with massive features in online e-commerce platforms. Most existing methods manually design a maximal order and further filter out the useless interactions from them. Although they reduce the high computational costs caused by the exponential growth of high-order feature combinations, they still suffer from the degradation of model capability due to the suboptimal learning of the restricted feature orders. How to maintain model capability while keeping computation efficient is a technical challenge that has not been adequately addressed. To address this issue, we propose an adaptive feature interaction learning model, named EulerNet, in which the feature interactions are learned in a complex vector space by conducting space mapping according to Euler's formula. EulerNet converts the exponential powers of feature interactions into simple linear combinations of the modulus and phase of the complex features, making it possible to adaptively learn the high-order feature interactions in an efficient way. Furthermore, EulerNet incorporates the implicit and explicit feature interactions into a unified architecture, which achieves mutual enhancement and largely boosts the model capabilities. Such a network can be fully learned from data, with no need for a pre-designed form or order of feature interactions. Extensive experiments conducted on three public datasets have demonstrated the effectiveness and efficiency of our approach. Our code is available at: https://github.com/RUCAIBox/EulerNet.
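
The identity EulerNet builds on can be checked in a few lines: in polar form, a multiplicative feature interaction becomes linear in log-moduli and phases (a worked example of the identity, not the model itself):

```python
import numpy as np

r = np.array([2.0, 0.5, 3.0])        # moduli of three complex features
theta = np.array([0.3, 1.1, -0.7])   # phases
a = np.array([1.0, 2.0, 0.5])        # interaction orders (learnable in EulerNet)

direct = np.prod((r * np.exp(1j * theta)) ** a)            # high-order product
linear = np.exp(a @ np.log(r)) * np.exp(1j * (a @ theta))  # linear in log r, theta
assert np.allclose(direct, linear)
```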

Deep Learning-empowered Predictive Precoder Design for OTFS Transmission in URLLC

  • Authors: Chang Liu, Shuangyang Li, Weijie Yuan, Xuemeng Liu, Derrick Wing Kwan Ng
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.10723
  • Pdf link: https://arxiv.org/pdf/2304.10723
  • Abstract
    To guarantee excellent reliability performance in ultra-reliable low-latency communications (URLLC), pragmatic precoder design is an effective approach. However, an efficient precoder design highly depends on accurate instantaneous channel state information at the transmitter (ICSIT), which, however, is not always available in practice. To overcome this problem, in this paper, we focus on the orthogonal time frequency space (OTFS)-based URLLC system and adopt a deep learning (DL) approach to directly predict the precoder for the next time frame to minimize the frame error rate (FER) via implicitly exploiting the features from estimated historical channels in the delay-Doppler domain. By doing this, we can guarantee the system reliability even without the knowledge of ICSIT. To this end, a general precoder design problem is formulated where a closed-form theoretical FER expression is specifically derived to characterize the system reliability. Then, a delay-Doppler domain channels-aware convolutional long short-term memory (CLSTM) network (DDCL-Net) is proposed for predictive precoder design. In particular, both convolutional neural network and LSTM modules are adopted in the proposed neural network to exploit the spatial-temporal features of wireless channels for improving the learning performance. Finally, simulation results demonstrate that the FER performance of the proposed method approaches that of the perfect ICSI-aided scheme.

Energy management system for biological 3D printing by the refinement of manifold model morphing in flexible grasping space

  • Authors: Kang Wang
  • Subjects: Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.10729
  • Pdf link: https://arxiv.org/pdf/2304.10729
  • Abstract
    The use of 3D printing, or additive manufacturing, has gained significant attention in recent years due to its potential for revolutionizing traditional manufacturing processes. One key challenge in 3D printing is managing energy consumption, as it directly impacts the cost, efficiency, and sustainability of the process. In this paper, we propose an energy management system that leverages the refinement of manifold model morphing in a flexible grasping space to reduce costs for biological 3D printing. The manifold model is a mathematical representation of the 3D object to be printed, and the refinement process involves optimizing the morphing parameters of the manifold model to achieve desired printing outcomes. To enable flexibility in the grasping space, we incorporate data-driven approaches, such as machine learning and data augmentation techniques, to enhance the accuracy and robustness of the energy management system. Our proposed system addresses the challenges of limited sample data and complex morphologies of manifold models in layered additive manufacturing. Our method is especially applicable to soft robotics and biomechanisms. We evaluate the performance of our system through extensive experiments and demonstrate its effectiveness in predicting and managing energy consumption in 3D printing processes. The results highlight the importance of refining manifold model morphing in the flexible grasping space for achieving energy-efficient 3D printing, contributing to the advancement of green and sustainable manufacturing practices.

Linear building pattern recognition via spatial knowledge graph

  • Authors: Wei Zhiwei, Xiao Yi, Tong Ying, Xu Wenjia, Wang Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10733
  • Pdf link: https://arxiv.org/pdf/2304.10733
  • Abstract
    Building patterns are important urban structures that reflect the effect of urban material and socio-economic factors on a region. Previous research is mostly based on graph isomorphism methods and uses rules to recognize building patterns, which is not efficient. A knowledge graph uses a graph to model the relationships between entities, and specific subgraph patterns can be efficiently obtained using relevant reasoning tools. Thus, we apply the knowledge graph to recognize linear building patterns. First, we use a property graph to express the spatial relations of proximity, similarity, and linear arrangement between buildings; second, the rules of linear pattern recognition are expressed as knowledge graph reasoning rules; finally, the linear building patterns are recognized using rule-based reasoning in the built knowledge graph. The experimental results on a dataset containing 1289 buildings show that the method in this paper can achieve the same precision and recall as existing methods; meanwhile, the recognition efficiency is improved by 5.98 times.

Multi-scale Evolutionary Neural Architecture Search for Deep Spiking Neural Networks

  • Authors: Wenxuan Pan, Feifei Zhao, Guobin Shen, Bing Han, Yi Zeng
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10749
  • Pdf link: https://arxiv.org/pdf/2304.10749
  • Abstract
    Spiking Neural Networks (SNNs) have received considerable attention not only for their superior energy efficiency with discrete signal processing, but also for their natural suitability to integrate multi-scale biological plasticity. However, most SNNs directly adopt the structure of well-established DNNs, and Neural Architecture Search (NAS) is rarely used to automatically design architectures for SNNs. The neural motif topology, modular regional structure and global cross-brain-region connections of the human brain are the product of natural evolution and can serve as a perfect reference for designing brain-inspired SNN architectures. In this paper, we propose a Multi-Scale Evolutionary Neural Architecture Search (MSE-NAS) for SNNs, simultaneously considering micro-, meso- and macro-scale brain topologies as the evolutionary search space. MSE-NAS evolves individual neuron operation, self-organized integration of multiple circuit motifs, and global connectivity across motifs through a brain-inspired indirect evaluation function, Representational Dissimilarity Matrices (RDMs). This training-free fitness function greatly reduces computational consumption and search time, and its task-independent property enables the searched SNNs to exhibit excellent transferability and scalability. Extensive experiments demonstrate that the proposed algorithm achieves state-of-the-art (SOTA) performance with shorter simulation steps on static datasets (CIFAR10, CIFAR100) and neuromorphic datasets (CIFAR10-DVS and DVS128-Gesture). The thorough analysis also illustrates the significant performance improvement and consistent bio-interpretability deriving from the topological evolution at different scales and the RDM fitness function.
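
The RDM fitness signal is straightforward to sketch (our minimal version; MSE-NAS uses such matrices as a task-independent, training-free evaluation):

```python
import numpy as np

def rdm(activations):
    """activations: (n_stimuli, n_features) network responses.
    Returns the (n_stimuli, n_stimuli) matrix of 1 - Pearson correlation."""
    return 1.0 - np.corrcoef(activations)

acts = np.random.randn(10, 256)   # e.g. responses to 10 input stimuli
print(rdm(acts).shape)            # (10, 10)
```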

Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback

  • Authors: Nikhil Mehta, Milagro Teruel, Patricio Figueroa Sanz, Xin Deng, Ahmed Hassan Awadallah, Julia Kiseleva
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10750
  • Pdf link: https://arxiv.org/pdf/2304.10750
  • Abstract
    Many approaches to Natural Language Processing (NLP) tasks often treat them as single-step problems, where an agent receives an instruction, executes it, and is evaluated based on the final outcome. However, human language is inherently interactive, as evidenced by the back-and-forth nature of human conversations. In light of this, we posit that human-AI collaboration should also be interactive, with humans monitoring the work of AI agents and providing feedback that the agent can understand and utilize. Further, the AI agent should be able to detect when it needs additional information and proactively ask for help. Enabling this scenario would lead to more natural, efficient, and engaging human-AI collaborations. In this work, we explore these directions using the challenging task defined by the IGLU competition, an interactive grounded language understanding task in a MineCraft-like world. We explore multiple types of help players can give to the AI to guide it and analyze the impact of this help in AI behavior, resulting in performance improvements.

Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation

  • Authors: Harsh Maheshwari, Yen-Cheng Liu, Zsolt Kira
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10756
  • Pdf link: https://arxiv.org/pdf/2304.10756
  • Abstract
    Using multiple spatial modalities has been proven helpful in improving semantic segmentation performance. However, there are several real-world challenges that have yet to be addressed: (a) improving label efficiency and (b) enhancing robustness in realistic scenarios where modalities are missing at test time. To address these challenges, we first propose a simple yet efficient multi-modal fusion mechanism, Linear Fusion, which performs better than state-of-the-art multi-modal models even with limited supervision. Second, we propose M3L: Multi-modal Teacher for Masked Modality Learning, a semi-supervised framework that not only improves the multi-modal performance but also makes the model robust to the realistic missing-modality scenario using unlabeled data. We create the first benchmark for semi-supervised multi-modal semantic segmentation and also report the robustness to missing modalities. Our proposal shows an absolute improvement of up to 10% in robust mIoU over the most competitive baselines. Our code is available at https://github.com/harshm121/M3L
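
A minimal sketch of what a linear fusion of modality features could look like (our reading of the name; the paper's Linear Fusion module may differ in detail):

```python
import torch
import torch.nn as nn

class LinearFusion(nn.Module):
    """Project each modality to a shared width and sum."""
    def __init__(self, dims, out_dim):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, out_dim) for d in dims)

    def forward(self, feats):   # feats: list of (batch, d_i) tensors
        return sum(p(f) for p, f in zip(self.proj, feats))

fusion = LinearFusion([256, 128], 64)   # e.g. RGB + depth features
out = fusion([torch.randn(4, 256), torch.randn(4, 128)])
```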

Learning Semantic-Agnostic and Spatial-Aware Representation for Generalizable Visual-Audio Navigation

  • Authors: Hongcheng Wang, Yuxuan Wang, Fangwei Zhong, Mingdong Wu, Jianwei Zhang, Yizhou Wang, Hao Dong
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.10773
  • Pdf link: https://arxiv.org/pdf/2304.10773
  • Abstract
    Visual-audio navigation (VAN) is attracting more and more attention from the robotics community due to its broad applications, \emph{e.g.}, household robots and rescue robots. In this task, an embodied agent must search for and navigate to the sound source with egocentric visual and audio observations. However, the existing methods are limited in two aspects: 1) poor generalization to unheard sound categories; 2) sample inefficiency in training. Focusing on these two problems, we propose a brain-inspired plug-and-play method to learn a semantic-agnostic and spatial-aware representation for generalizable visual-audio navigation. We meticulously design two auxiliary tasks for respectively accelerating learning representations with the above-desired characteristics. With these two auxiliary tasks, the agent learns a spatially-correlated representation of visual and audio inputs that can be applied to work on environments with novel sounds and maps. Experiment results on realistic 3D scenes (Replica and Matterport3D) demonstrate that our method achieves better generalization performance when zero-shot transferred to scenes with unseen maps and unheard sound categories.

Learn to Cluster Faces with Better Subgraphs

  • Authors: Yuan Cao, Di Jiang, Guanqun Hou, Fan Deng, Xinjia Chen, Qiang Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10831
  • Pdf link: https://arxiv.org/pdf/2304.10831
  • Abstract
    Face clustering can provide pseudo-labels for massive unlabeled face data and improve the performance of different face recognition models. Existing clustering methods generally aggregate features within subgraphs that are often built using a uniform threshold or a learned cutoff position. This may reduce the recall of the subgraphs and hence degrade the clustering performance. This work proposes an efficient neighborhood-aware subgraph adjustment method that can significantly reduce noise and improve the recall of the subgraphs, and hence can drive distant nodes to converge towards the same centers. More specifically, the proposed method consists of two components, i.e., face embedding enhancement using the embeddings of neighbors, and enclosed subgraph construction of node pairs for structural information extraction. The embeddings are combined to predict linkage probabilities for all node pairs, replacing the cosine similarities, to produce new subgraphs that can be further used for aggregation by GCNs or other clustering methods. The proposed method is validated through extensive experiments against a range of clustering solutions using three benchmark datasets, and the numerical results confirm that it outperforms the SOTA solutions in terms of generalization capability.
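
As a rough illustration of the embedding-enhancement step (mixing each face embedding with those of its neighbors), here is a minimal sketch; the mixing rule, `k`, and `alpha` are assumptions, not the paper's exact procedure.

```python
import numpy as np

def enhance_embeddings(emb: np.ndarray, k: int = 5, alpha: float = 0.5) -> np.ndarray:
    """Sketch of neighborhood-aware embedding enhancement (assumed form):
    mix each embedding with the mean of its k nearest neighbors."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = emb @ emb.T                       # cosine similarities
    np.fill_diagonal(sims, -np.inf)          # exclude self-similarity
    knn = np.argsort(-sims, axis=1)[:, :k]   # indices of k nearest neighbors
    enhanced = (1 - alpha) * emb + alpha * emb[knn].mean(axis=1)
    return enhanced / np.linalg.norm(enhanced, axis=1, keepdims=True)

emb = np.random.randn(100, 128)
print(enhance_embeddings(emb).shape)  # (100, 128)
```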

A Deep Learning algorithm to accelerate Algebraic Multigrid methods in Finite Element solvers of 3D elliptic PDEs

  • Authors: Matteo Caldana, Paola F. Antonietti, Luca Dede'
  • Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10832
  • Pdf link: https://arxiv.org/pdf/2304.10832
  • Abstract
    Algebraic multigrid (AMG) methods are among the most efficient solvers for linear systems of equations, and they are widely used for the solution of problems stemming from the discretization of Partial Differential Equations (PDEs). The most severe limitation of AMG methods is their dependence on parameters that need to be fine-tuned. In particular, the strong threshold parameter is the most relevant, since it lies at the basis of the construction of the successively coarser grids needed by AMG methods. We introduce a novel Deep Learning algorithm that minimizes the computational cost of the AMG method when used as a finite element solver. We show that our algorithm requires minimal changes to any existing code. The proposed Artificial Neural Network (ANN) tunes the value of the strong threshold parameter by interpreting the sparse matrix of the linear system as a black-and-white image and exploiting a pooling operator to transform it into a small multi-channel image. We experimentally show that the pooling successfully reduces the computational cost of processing a large sparse matrix and preserves the features needed for the regression task at hand. We train the proposed algorithm on a large dataset containing problems with a highly heterogeneous diffusion coefficient defined on different three-dimensional geometries and discretized with unstructured grids, as well as linear elasticity problems with a highly heterogeneous Young's modulus. When tested on problems with coefficients or geometries not present in the training dataset, our approach reduces the computational time by up to 30%.
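
A minimal sketch of the matrix-as-image idea, assuming two illustrative channels (nonzero counts and mean magnitudes per pooled cell); the paper's actual pooling operator and channel choices may differ.

```python
import numpy as np
from scipy import sparse

def matrix_to_image(A: sparse.spmatrix, size: int = 32) -> np.ndarray:
    """Sketch: pool a large sparse matrix into a small multi-channel 'image'.
    Channel 0: nonzero count per cell; channel 1: mean absolute value.
    (Illustrative channels; not necessarily the paper's.)"""
    A = A.tocoo()
    n, m = A.shape
    rows = (A.row * size // n).astype(int)   # map entries to a size x size grid
    cols = (A.col * size // m).astype(int)
    img = np.zeros((2, size, size))
    np.add.at(img[0], (rows, cols), 1.0)             # accumulate nonzero counts
    np.add.at(img[1], (rows, cols), np.abs(A.data))  # accumulate sum of |values|
    img[1] = np.divide(img[1], img[0],
                       out=np.zeros_like(img[1]), where=img[0] > 0)  # mean |value|
    return img

img = matrix_to_image(sparse.random(10000, 10000, density=1e-4, format="coo"))
print(img.shape)  # (2, 32, 32) -- fed to a small ANN that regresses the threshold
```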

An Analytical Model for Performance Estimation in High-Capacity IMDD Systems

  • Authors: Giuseppe Rizzelli, Pablo Torres-Ferrera, Fabrizio Forghieri, Roberto Gaudino
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.10834
  • Pdf link: https://arxiv.org/pdf/2304.10834
  • Abstract
    In this paper, we propose an analytical model to estimate the signal-to-noise ratio (SNR) at the output of an adaptive equalizer in intensity modulation and direct detection (IMDD) optical transmission systems affected by shot noise, thermal noise, relative intensity noise (RIN), chromatic dispersion (CD) and bandwidth limitations. We develop the model as an extension of a previously presented one and then test its accuracy by sweeping the main parameters of a 4-PAM-based communication system, such as the RIN coefficient, extinction ratio, CD coefficient and equalizer memory. Our findings show a remarkable agreement between time-domain simulations and analytical results, with SNR discrepancies below 0.1 dB in most cases, for both feed-forward and decision-feedback equalization. We believe the proposed model is a powerful tool for the numerical design of strongly band-limited IMDD systems using receiver equalization, as in most modern and future M-PAM solutions for short-reach and access systems.

A numerical method for the stability analysis of linear age-structured models with nonlocal diffusion

  • Authors: Dimitri Breda, Simone De Reggi, Rossana Vermiglio
  • Subjects: Numerical Analysis (math.NA); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.10835
  • Pdf link: https://arxiv.org/pdf/2304.10835
  • Abstract
    We numerically investigate the stability of linear age-structured population models with nonlocal diffusion, which arise naturally in describing the dynamics of infectious diseases. Compared to Laplace diffusion, the analysis of models with nonlocal diffusion is more challenging, since the associated semigroups have no regularizing properties in the spatial variable. Nevertheless, the asymptotic stability of the null equilibrium is determined by the spectrum of the infinitesimal generator associated with the semigroup. We propose to approximate the leading part of this spectrum by first reformulating the problem via integration of the age state and then discretizing the generator by combining a spectral projection in space with a pseudospectral collocation in age. A rigorous convergence analysis is provided in the case of separable model coefficients. The results are confirmed experimentally, and numerical tests are also presented for the more general case.

Better Sign Language Translation with Monolingual Data

  • Authors: Ru Peng, Yawen Zeng, Junbo Zhao
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.10844
  • Pdf link: https://arxiv.org/pdf/2304.10844
  • Abstract
    Sign language translation (SLT) systems, which are often decomposed into video-to-gloss (V2G) recognition and gloss-to-text (G2T) translation through the pivot gloss, heavily rely on the availability of large-scale parallel G2T pairs. However, the manual annotation of the pivot gloss, which is a sequence of transcribed written-language words in the order in which they are signed, further exacerbates the scarcity of data for SLT. To address this issue, this paper proposes a simple and efficient rule transformation method that automatically transcribes large-scale target monolingual data into pseudo glosses to enhance SLT. Empirical results show that the proposed approach can significantly improve the performance of SLT, achieving state-of-the-art results on the two SLT benchmark datasets PHOENIX-WEATHER 2014T and ASLG-PC12. Our code has been released at: https://github.com/pengr/Mono_SLT.
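
Purely to make the idea concrete, here is a toy sketch of what a rule transformation from monolingual text to pseudo-gloss could look like (strip punctuation, drop function words, uppercase what remains); the actual rules used in the paper are not reproduced here, and the stopword list is an illustrative placeholder.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "are", "to", "of"}  # illustrative subset only

def text_to_pseudo_gloss(sentence: str) -> str:
    """Toy rule-based text-to-pseudo-gloss transform (assumed rules:
    strip punctuation, drop function words, uppercase remaining tokens
    in order). The paper's actual rules may differ."""
    tokens = re.findall(r"[a-zA-Z]+", sentence.lower())
    return " ".join(t.upper() for t in tokens if t not in STOPWORDS)

print(text_to_pseudo_gloss("The wind is strong in the north."))
# WIND STRONG IN NORTH
```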

How Well Does the Metropolis Algorithm Cope With Local Optima?

  • Authors: Benjamin Doerr, Taha El Ghazi El Houssaini, Amirhossein Rajabi, Carsten Witt
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.10848
  • Pdf link: https://arxiv.org/pdf/2304.10848
  • Abstract
    The Metropolis algorithm (MA) is a classic stochastic local search heuristic. It avoids getting stuck in local optima by occasionally accepting inferior solutions. To understand this ability better and in a rigorous manner, we conduct a mathematical runtime analysis of the MA on the CLIFF benchmark. Apart from one local optimum, cliff functions are monotonically increasing towards the global optimum. Consequently, to optimize a cliff function, the MA only needs to accept an inferior solution once. Despite seemingly being an ideal benchmark for the MA to profit from its main working principle, our mathematical runtime analysis shows that this hope does not come true. Even with the optimal temperature (the only parameter of the MA), the MA optimizes most cliff functions less efficiently than simple elitist evolutionary algorithms (EAs), which can only leave the local optimum by generating a superior solution, possibly far away. This result suggests that our understanding of why the MA is often very successful in practice is not yet complete. Our work also suggests equipping the MA with global mutation operators, an idea supported by our preliminary experiments.
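
For reference, the Metropolis acceptance rule is easy to state: always accept improvements, and accept a worsening of delta with probability exp(delta / T). A self-contained toy run on a CLIFF function (parameter values here are arbitrary, not those of the paper's analysis):

```python
import math
import random

def cliff(x, d):
    """CLIFF fitness: increases with the number of ones, except for a drop of
    depth about d just past n - d ones; the global optimum is the all-ones
    string, with value n - d + 0.5."""
    ones, n = sum(x), len(x)
    return ones if ones <= n - d else ones - d + 0.5

def metropolis(n=30, d=5, temperature=2.0, max_iters=500_000):
    """One-bit-flip Metropolis algorithm. Returns the iteration at which the
    optimum was found, or None if the budget runs out -- which happens often
    for deeper cliffs, echoing the paper's negative result."""
    x = [random.randint(0, 1) for _ in range(n)]
    fx = cliff(x, d)
    for it in range(max_iters):
        i = random.randrange(n)
        x[i] ^= 1                      # propose: flip one uniformly random bit
        delta = cliff(x, d) - fx
        if delta >= 0 or random.random() < math.exp(delta / temperature):
            fx += delta                # accept the move
        else:
            x[i] ^= 1                  # reject: undo the flip
        if sum(x) == n:
            return it                  # reached the global optimum (all ones)
    return None

print(metropolis())
```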

Viewing Allocators as Bin Packing Solvers Demystifies Fragmentation

  • Authors: Christos P. Lamprakos, Sotirios Xydis, Francky Catthoor, Dimitrios Soudris
  • Subjects: Programming Languages (cs.PL)
  • Arxiv link: https://arxiv.org/abs/2304.10862
  • Pdf link: https://arxiv.org/pdf/2304.10862
  • Abstract
    This paper presents a trace-based simulation methodology for constructing representations of workload-allocator interaction. We use two-dimensional rectangular bin packing (2DBP) as our foundation. Classical 2DBP algorithms minimize their products' makespan, but virtual memory systems employing demand paging deem such a criterion inappropriate. We view an allocator's placement decisions as a solution to a 2DBP instance, optimizing some unknown criterion particular to that allocator's policy. Our end product is a compact data structure that fits, e.g., the simulation of 80 million requests in a 350 MiB file. By design, it is concerned with events residing entirely in virtual memory; no information on memory accesses, indexing costs or any other factor is kept. We bootstrap our contribution's significance by exploring its relationship to maximum resident set size (RSS). Our baseline is the assumption that less fragmentation amounts to smaller peak RSS. We thus define a fragmentation metric in the 2DBP substrate and compute it for 28 workloads linked to 4 modern allocators. We also measure peak RSS for the 112 resulting pairs. Our metric exhibits a strong monotonic relationship (Spearman coefficient $\rho > 0.65$) in half of those cases: allocators achieving better 2DBP placements yield 9%-30% smaller peak RSS, with the trends remaining consistent across two different machines. Considering our representation's minimalism, the presented empirical evidence is a robust indicator of its potency. If workload-allocator interplay in the virtual address space suffices to evaluate a novel fragmentation definition, numerous other useful applications of our tool can be studied. Both augmenting 2DBP and exploring alternative computations on it provide ample fertile ground for future research.

Med-Tuning: Exploring Parameter-Efficient Transfer Learning for Medical Volumetric Segmentation

  • Authors: Wenxuan Wang, Jiachen Shen, Chen Chen, Jianbo Jiao, Yan Zhang, Shanshan Song, Jiangyun Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10880
  • Pdf link: https://arxiv.org/pdf/2304.10880
  • Abstract
    Deep learning based medical volumetric segmentation methods either train the model from scratch or follow the standard "pre-training then finetuning" paradigm. Although finetuning a well pre-trained model on downstream tasks can harness its representation power, standard full finetuning is costly in terms of computation and memory footprint. In this paper, we present the first study on parameter-efficient transfer learning for medical volumetric segmentation and propose a novel framework named Med-Tuning based on intra-stage feature enhancement and inter-stage feature interaction. Given a large-scale pre-trained model on 2D natural images, our method can exploit both the multi-scale spatial feature representations and temporal correlations along image slices, which are crucial for accurate medical volumetric segmentation. Extensive experiments on three benchmark datasets (covering CT and MRI) show that our method achieves better results than previous state-of-the-art parameter-efficient transfer learning methods and full finetuning for the segmentation task, at a much lower tuned-parameter cost. Compared to full finetuning, our method reduces the number of finetuned parameters by up to 4x, with even better segmentation performance.
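
Parameter-efficient tuning of this kind typically inserts small trainable modules into a frozen backbone. A generic bottleneck-adapter sketch follows for intuition; it is illustrative only, and Med-Tuning's intra-stage and inter-stage modules are more elaborate than this.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Generic residual bottleneck adapter, a common parameter-efficient
    tuning module (not Med-Tuning's exact design)."""
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, dim)    # project back up
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # residual connection

print(Adapter(64)(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
# In use, the pre-trained backbone is frozen and only adapters are trained:
# for p in backbone.parameters(): p.requires_grad = False
```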

GCNH: A Simple Method For Representation Learning On Heterophilous Graphs

  • Authors: Andrea Cavallo, Claas Grohnfeldt, Michele Russo, Giulio Lovisotto, Luca Vassio
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10896
  • Pdf link: https://arxiv.org/pdf/2304.10896
  • Abstract
    Graph Neural Networks (GNNs) are well-suited for learning on homophilous graphs, i.e., graphs in which edges tend to connect nodes of the same type. Yet, achievement of consistent GNN performance on heterophilous graphs remains an open research problem. Recent works have proposed extensions to standard GNN architectures to improve performance on heterophilous graphs, trading off model simplicity for prediction accuracy. However, these models fail to capture basic graph properties, such as neighborhood label distribution, which are fundamental for learning. In this work, we propose GCN for Heterophily (GCNH), a simple yet effective GNN architecture applicable to both heterophilous and homophilous scenarios. GCNH learns and combines separate representations for a node and its neighbors, using one learned importance coefficient per layer to balance the contributions of center nodes and neighborhoods. We conduct extensive experiments on eight real-world graphs and a set of synthetic graphs with varying degrees of heterophily to demonstrate how the design choices for GCNH lead to a sizable improvement over a vanilla GCN. Moreover, GCNH outperforms state-of-the-art models of much higher complexity on four out of eight benchmarks, while producing comparable results on the remaining datasets. Finally, we discuss and analyze the lower complexity of GCNH, which results in fewer trainable parameters and faster training times than other methods, and show how GCNH mitigates the oversmoothing problem.
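
A minimal sketch of the layer the abstract describes, assuming the learned coefficient mixes a transform of the center node with a transform of its mean-aggregated neighborhood (a dense adjacency is used for brevity; the exact formulation is the paper's, not reproduced here).

```python
import torch
import torch.nn as nn

class GCNHLayer(nn.Module):
    """Sketch of a GCNH-style layer: separate transforms for the center node
    and its neighborhood, mixed by one learned coefficient per layer
    (form assumed from the abstract)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)
        self.w_neigh = nn.Linear(in_dim, out_dim)
        self.beta = nn.Parameter(torch.tensor(0.5))  # one coefficient per layer

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = (adj @ x) / deg  # mean-aggregated neighbor features
        return torch.relu(self.beta * self.w_self(x)
                          + (1 - self.beta) * self.w_neigh(neigh))

x, adj = torch.randn(5, 8), (torch.rand(5, 5) > 0.5).float()
print(GCNHLayer(8, 16)(x, adj).shape)  # torch.Size([5, 16])
```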

Factored Neural Representation for Scene Understanding

  • Authors: Yu-Shiang Wong, Niloy J. Mitra
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10950
  • Pdf link: https://arxiv.org/pdf/2304.10950
  • Abstract
    A long-standing goal in scene understanding is to obtain interpretable and editable representations that can be directly constructed from a raw monocular RGB-D video, without requiring specialized hardware setup or priors. The problem is significantly more challenging in the presence of multiple moving and/or deforming objects. Traditional methods have approached the setup with a mix of simplifications, scene priors, pretrained templates, or known deformation models. The advent of neural representations, especially neural implicit representations and radiance fields, opens the possibility of end-to-end optimization to collectively capture geometry, appearance, and object motion. However, current approaches produce global scene encoding, assume multiview capture with limited or no motion in the scenes, and do not facilitate easy manipulation beyond novel view synthesis. In this work, we introduce a factored neural scene representation that can directly be learned from a monocular RGB-D video to produce object-level neural presentations with an explicit encoding of object movement (e.g., rigid trajectory) and/or deformations (e.g., nonrigid movement). We evaluate ours against a set of neural approaches on both synthetic and real data to demonstrate that the representation is efficient, interpretable, and editable (e.g., change object trajectory). The project webpage is available at: $\href{https://yushiangw.github.io/factorednerf/}{\text{link}}$.

A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning

  • Authors: Mizhaan Prajit Maniyar, Akash Mondal, Prashanth L.A., Shalabh Bhatnagar
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.10951
  • Pdf link: https://arxiv.org/pdf/2304.10951
  • Abstract
    We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. In this paper, we propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the likelihood ratio method to form estimates of the gradient and Hessian of the value function using sample trajectories. The first algorithm requires an exact solution of the cubic regularized problem in each iteration, while the second algorithm employs an efficient gradient descent-based approximation to the cubic regularized problem. We establish convergence of our proposed algorithms to a second-order stationary point (SOSP) of the value function, which results in the avoidance of traps in the form of saddle points. In particular, the sample complexity of our algorithms to find an $\epsilon$-SOSP is $O(\epsilon^{-3.5})$, which is an improvement over the state-of-the-art sample complexity of $O(\epsilon^{-4.5})$.
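
The cubic-regularized Newton step minimizes the local model $m(s) = g^\top s + \frac{1}{2} s^\top H s + \frac{M}{6}\|s\|^3$. A sketch of the second, gradient-descent-based approximation of this subproblem (step size and iteration count are arbitrary choices, not the paper's):

```python
import numpy as np

def cubic_subproblem(g, H, M, lr=0.01, iters=2000):
    """Approximately minimize m(s) = g^T s + 0.5 s^T H s + (M/6) ||s||^3
    by plain gradient descent. Its gradient is g + H s + (M/2) ||s|| s."""
    s = np.zeros_like(g)
    for _ in range(iters):
        grad = g + H @ s + 0.5 * M * np.linalg.norm(s) * s
        s -= lr * grad
    return s

g = np.array([1.0, -2.0])
H = np.array([[2.0, 0.0], [0.0, -1.0]])  # indefinite Hessian (saddle direction)
print(cubic_subproblem(g, H, M=1.0))     # step escapes the saddle direction
```

In the RL setting, `g` and `H` would themselves be likelihood-ratio estimates of the value function's gradient and Hessian from sampled trajectories.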

Online Time-Optimal Trajectory Planning on Three-Dimensional Race Tracks

  • Authors: Matthias Rowold, Levent Ögretmen, Ulf Kasolowsky, Boris Lohmann
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.10954
  • Pdf link: https://arxiv.org/pdf/2304.10954
  • Abstract
    We propose an online planning approach for racing that generates the time-optimal trajectory for the upcoming track section. The resulting trajectory takes into account the current vehicle state, effects caused by three-dimensional (3D) track geometries, and speed limits dictated by the race rules. In each planning step, an optimal control problem is solved under a quasi-steady-state assumption with a point-mass model constrained by gg-diagrams. For online applicability, we propose an efficient representation of the gg-diagrams and identify negligible terms to reduce the computational effort. We demonstrate that the online planning approach can reproduce the lap times of an offline-generated racing line during single-vehicle racing. Moreover, it finds a new time-optimal solution when a deviation from the original racing line is necessary, e.g., during an overtaking maneuver. Motivated by the application in a rule-based race, we also consider the scenario of a speed limit lower than the current vehicle velocity. We introduce an initializable slack variable to generate feasible trajectories despite the constraint violation while reducing the velocity to comply with the rules.

IBBT: Informed Batch Belief Trees for Motion Planning Under Uncertainty

  • Authors: Dongliang Zheng, Panagiotis Tsiotras
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10984
  • Pdf link: https://arxiv.org/pdf/2304.10984
  • Abstract
    In this work, we propose the Informed Batch Belief Trees (IBBT) algorithm for motion planning under motion and sensing uncertainties. The original stochastic motion planning problem is divided into a deterministic motion planning problem and a graph search problem. We solve the deterministic planning problem using sampling-based methods such as PRM or RRG to construct a graph of nominal trajectories. Then, an informed cost-to-go heuristic for the original problem is computed based on the nominal trajectory graph. Finally, we grow a belief tree by searching over the graph using the proposed heuristic. IBBT interleaves batch state sampling, nominal trajectory graph construction, heuristic computation, and search over the graph to find belief-space motion plans. IBBT is an anytime, incremental algorithm. With an increasing number of batches of samples added to the graph, the algorithm finds motion plans that converge to the optimal one. IBBT is efficient because it reuses results between sequential iterations. The belief tree search is an ordered search guided by an informed heuristic. We test IBBT in different planning environments. Our numerical investigation confirms that IBBT finds non-trivial motion plans and is faster than previous similar methods.

RGB-D Inertial Odometry for a Resource-Restricted Robot in Dynamic Environments

  • Authors: Jianheng Liu, Xuanfu Li, Yueqian Liu, Haoyao Chen
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.10987
  • Pdf link: https://arxiv.org/pdf/2304.10987
  • Abstract
    Current simultaneous localization and mapping (SLAM) algorithms perform well in static environments but easily fail in dynamic environments. Recent works introduce deep learning-based semantic information to SLAM systems to reduce the influence of dynamic objects. However, applying robust localization in dynamic environments on resource-restricted robots is still challenging. This paper proposes a real-time RGB-D inertial odometry system for resource-restricted robots in dynamic environments named Dynamic-VINS. Three main threads run in parallel: object detection, feature tracking, and state optimization. The proposed Dynamic-VINS combines object detection and depth information for dynamic feature recognition and achieves performance comparable to semantic segmentation. Dynamic-VINS adopts grid-based feature detection and proposes a fast and efficient method to extract high-quality FAST feature points. An IMU is applied to predict motion for feature tracking and moving-consistency checks. The proposed method is evaluated on both public datasets and real-world applications and shows competitive localization accuracy and robustness in dynamic environments. To the best of our knowledge, it is currently the best-performing real-time RGB-D inertial odometry for resource-restricted platforms in dynamic environments. The proposed system is open source at: https://github.com/HITSZ-NRSL/Dynamic-VINS.git

Minsight: A Fingertip-Sized Vision-Based Tactile Sensor for Robotic Manipulation

  • Authors: Iris Andrussow, Huanbo Sun, Katherine J. Kuchenbecker, Georg Martius
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.10990
  • Pdf link: https://arxiv.org/pdf/2304.10990
  • Abstract
    Intelligent interaction with the physical world requires perceptual abilities beyond vision and hearing; vibrant tactile sensing is essential for autonomous robots to dexterously manipulate unfamiliar objects or safely contact humans. Therefore, robotic manipulators need high-resolution touch sensors that are compact, robust, inexpensive, and efficient. The soft vision-based haptic sensor presented herein is a miniaturized and optimized version of the previously published sensor Insight. Minsight has the size and shape of a human fingertip and uses machine learning methods to output high-resolution maps of 3D contact force vectors at 60 Hz. Experiments confirm its excellent sensing performance, with a mean absolute force error of 0.07 N and contact location error of 0.6 mm across its surface area. Minsight's utility is shown in two robotic tasks on a 3-DoF manipulator. First, closed-loop force control enables the robot to track the movements of a human finger based only on tactile data. Second, the informative value of the sensor output is shown by detecting whether a hard lump is embedded within a soft elastomer with an accuracy of 98%. These findings indicate that Minsight can give robots the detailed fingertip touch sensing needed for dexterous manipulation and physical human-robot interaction.

Knowledge Distillation Under Ideal Joint Classifier Assumption

  • Authors: Huayu Li, Xiwen Chen, Gregory Ditzler, Ping Chang, Janet Roveda, Ao Li
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.11004
  • Pdf link: https://arxiv.org/pdf/2304.11004
  • Abstract
    Knowledge distillation is a powerful technique to compress large neural networks into smaller, more efficient networks. Softmax regression representation learning is a popular approach that uses a pre-trained teacher network to guide the learning of a smaller student network. While several studies explored the effectiveness of softmax regression representation learning, the underlying mechanism that provides knowledge transfer is not well understood. This paper presents Ideal Joint Classifier Knowledge Distillation (IJCKD), a unified framework that provides a clear and comprehensive understanding of the existing knowledge distillation methods and a theoretical foundation for future research. Using mathematical techniques derived from a theory of domain adaptation, we provide a detailed analysis of the student network's error bound as a function of the teacher. Our framework enables efficient knowledge transfer between teacher and student networks and can be applied to various applications.
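
For context, the softmax-regression distillation objective this line of work builds on looks as follows: a standard Hinton-style loss, shown here only as background, not as the paper's new contribution (temperature and weighting values are illustrative).

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard softmax-based knowledge distillation: KL between the
    temperature-softened teacher and student distributions, plus a
    cross-entropy term on the hard labels."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s, t = torch.randn(8, 10), torch.randn(8, 10)
print(distillation_loss(s, t, torch.randint(0, 10, (8,))))
```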

Robot-Enabled Construction Assembly with Automated Sequence Planning based on ChatGPT: RoboGPT

  • Authors: Hengxu You, Yang Ye, Tianyu Zhou, Qi Zhu, Jing Du
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.11018
  • Pdf link: https://arxiv.org/pdf/2304.11018
  • Abstract
    Robot-based assembly in construction has emerged as a promising solution to address numerous challenges, such as increasing costs, labor shortages, and the demand for safe and efficient construction processes. One of the main obstacles in realizing the full potential of these robotic systems is the need for effective and efficient sequence planning for construction tasks. Current approaches, including mathematical and heuristic techniques and machine learning methods, face limitations in their adaptability and scalability to dynamic construction environments. To expand the sequential-understanding ability of current robot systems, this paper introduces RoboGPT, a novel system that leverages the advanced reasoning capabilities of ChatGPT, a large language model, for automated sequence planning in robot-based assembly applied to construction tasks. The proposed system adapts ChatGPT for construction sequence planning and demonstrates its feasibility and effectiveness through experimental evaluation, including two case studies and 80 trials on real construction tasks. The results show that RoboGPT-driven robots can handle complex construction operations and adapt to changes on the fly. This paper contributes to the ongoing efforts to enhance the capabilities and performance of robot-based assembly systems in the construction industry, and it paves the way for further integration of large language model technologies in the field of construction robotics.

CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval

  • Authors: Shangda Wu, Dingyao Yu, Xu Tan, Maosong Sun
  • Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2304.11029
  • Pdf link: https://arxiv.org/pdf/2304.11029
  • Abstract
    We introduce CLaMP: Contrastive Language-Music Pre-training, which learns cross-modal representations between natural language and symbolic music using a music encoder and a text encoder trained jointly with a contrastive loss. To pre-train CLaMP, we collected a large dataset of 1.4 million music-text pairs. It employs text dropout as a data augmentation technique and bar patching to efficiently represent music data, reducing sequence length to less than 10% of the original. In addition, we developed a masked music model pre-training objective to enhance the music encoder's comprehension of musical context and structure. CLaMP integrates textual information to enable semantic search and zero-shot classification for symbolic music, surpassing the capabilities of previous models. To support the evaluation of semantic search and music classification, we publicly release WikiMusicText (WikiMT), a dataset of 1010 lead sheets in ABC notation, each accompanied by a title, artist, genre, and description. In comparison to state-of-the-art models that require fine-tuning, zero-shot CLaMP demonstrated comparable or superior performance on score-oriented datasets.
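
The joint contrastive objective is the standard symmetric CLIP-style loss over paired batches; a sketch for reference (the temperature value is illustrative, not CLaMP's setting):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(music_emb, text_emb, temperature=0.07):
    """Symmetric CLIP-style contrastive loss over a batch of paired
    music/text embeddings: matched pairs sit on the diagonal of the
    similarity matrix and serve as classification targets."""
    music = F.normalize(music_emb, dim=1)
    text = F.normalize(text_emb, dim=1)
    logits = music @ text.T / temperature   # pairwise cosine similarities
    targets = torch.arange(len(logits))     # matched pairs on the diagonal
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2

print(contrastive_loss(torch.randn(16, 64), torch.randn(16, 64)))
```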

Backpropagation-free Training of Deep Physical Neural Networks

  • Authors: Ali Momeni, Babak Rahmani, Matthieu Mallejac, Philipp Del Hougne, Romain Fleury
  • Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Applied Physics (physics.app-ph); Optics (physics.optics)
  • Arxiv link: https://arxiv.org/abs/2304.11042
  • Pdf link: https://arxiv.org/pdf/2304.11042
  • Abstract
    Recent years have witnessed the outstanding success of deep learning in various fields such as vision and natural language processing. This success is largely owed to the massive size of deep learning models, which is expected to keep increasing. This growth of deep learning models is accompanied by issues related to their considerable energy consumption, both during the training and inference phases, as well as their scalability. Although a number of works based on unconventional physical systems have been proposed to address the issue of energy efficiency in the inference phase, efficient training of deep learning models has remained unaddressed. So far, training of digital deep learning models has mainly relied on backpropagation, which is not suitable for physical implementation as it requires perfect knowledge of the computation performed in the so-called forward pass of the neural network. Here, we tackle this issue by proposing a simple deep neural network architecture augmented by a biologically plausible learning algorithm, referred to as "model-free forward-forward training". The proposed architecture enables training deep physical neural networks consisting of layers of physical nonlinear systems, without requiring detailed knowledge of the nonlinear physical layers' properties. We show that our method outperforms state-of-the-art hardware-aware training methods by improving training speed, decreasing digital computations, and reducing power consumption in physical systems. We demonstrate the adaptability of the proposed method, even in systems exposed to dynamic or unpredictable external perturbations. To showcase the universality of our approach, we train diverse wave-based physical neural networks that vary in the underlying wave phenomenon and the type of non-linearity they use, to perform vowel and image classification tasks experimentally.

SkillGPT: a RESTful API service for skill extraction and standardization using a Large Language Model

  • Authors: Nan Li, Bo Kang, Tijl De Bie
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.11060
  • Pdf link: https://arxiv.org/pdf/2304.11060
  • Abstract
    We present SkillGPT, a tool for skill extraction and standardization (SES) from free-style job descriptions and user profiles with an open-source Large Language Model (LLM) as its backbone. Most previous methods for similar tasks either need supervision or rely on heavy data pre-processing and feature engineering. Directly prompting the latest conversational LLM for standard skills, however, is slow, costly and inaccurate. In contrast, SkillGPT utilizes an LLM to perform its task in steps, via summarization and vector similarity search, to balance speed with precision. The backbone LLM of SkillGPT is based on Llama, free for academic use and thus useful for exploratory research and prototype development. Hence, our cost-free SkillGPT gives users the convenience of conversational SES, efficiently and reliably.
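
A sketch of the summarize-then-search pipeline the abstract describes. Everything named here is a hypothetical stand-in: `llm_summarize` and `embed` represent an LLM summarization call and a sentence-embedding model, and `skill_vecs` is assumed to hold row-normalized embeddings of a standard skill taxonomy (e.g., ESCO).

```python
import numpy as np

def extract_skills(job_text, llm_summarize, embed, skill_names, skill_vecs, k=5):
    """Summarize-then-search sketch: condense the text with an LLM, embed the
    summary, and return the k nearest standard skills by cosine similarity
    (skill_vecs is assumed row-normalized)."""
    summary = llm_summarize(job_text)       # step 1: condense the free text
    q = embed(summary)
    q = q / np.linalg.norm(q)
    sims = skill_vecs @ q                   # step 2: vector similarity search
    top = np.argsort(-sims)[:k]
    return [skill_names[i] for i in top]

# Toy stand-ins (hypothetical; a real system calls an LLM and an embedder):
rng = np.random.default_rng(0)
names = ["python", "sql", "project management"]
vecs = rng.standard_normal((3, 16))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
print(extract_skills("We need a data engineer...", lambda t: t,
                     lambda t: rng.standard_normal(16), names, vecs, k=2))
```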

HeRo: RoBERTa and Longformer Hebrew Language Models

  • Authors: Vitaly Shalumov, Harel Haskey
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.11077
  • Pdf link: https://arxiv.org/pdf/2304.11077
  • Abstract
    In this paper, we fill an existing gap in the resources available to the Hebrew NLP community by providing the largest pre-training dataset to date, HeDC4, a state-of-the-art pre-trained language model, HeRo, for standard-length inputs, and an efficient transformer, LongHeRo, for long input sequences. The HeRo model was evaluated on sentiment analysis, named entity recognition, and question answering tasks, while the LongHeRo model was evaluated on a document classification task with a dataset composed of long documents. Both HeRo and LongHeRo achieved state-of-the-art performance. The dataset and model checkpoints used in this work are publicly available.

A Convolutional Spiking Network for Gesture Recognition in Brain-Computer Interfaces

  • Authors: Yiming Ai, Bipin Rajendran
  • Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.11106
  • Pdf link: https://arxiv.org/pdf/2304.11106
  • Abstract
    Brain-computer interfaces are being explored for a wide variety of therapeutic applications. Typically, this involves measuring and analyzing continuous-time electrical brain activity via techniques such as electrocorticogram (ECoG) or electroencephalography (EEG) to drive external devices. However, due to the inherent noise and variability in the measurements, the analysis of these signals is challenging and requires offline processing with significant computational resources. In this paper, we propose a simple yet efficient machine learning-based approach for the exemplary problem of hand gesture classification based on brain signals. We use a hybrid machine learning approach that uses a convolutional spiking neural network employing a bio-inspired event-driven synaptic plasticity rule for unsupervised feature learning of the measured analog signals encoded in the spike domain. We demonstrate that this approach generalizes to different subjects with both EEG and ECoG data and achieves superior accuracy in the range of 92.74-97.07% in identifying different hand gesture classes and motor imagery tasks.

Deep-Learning-based Fast and Accurate 3D CT Deformable Image Registration in Lung Cancer

  • Authors: Yuzhen Ding, Hongying Feng, Yunze Yang, Jason Holmes, Zhengliang Liu, David Liu, William W. Wong, Nathan Y. Yu, Terence T. Sio, Steven E. Schild, Baoxin Li, Wei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
  • Arxiv link: https://arxiv.org/abs/2304.11135
  • Pdf link: https://arxiv.org/pdf/2304.11135
  • Abstract
    Purpose: In some proton therapy facilities, patient alignment relies on two 2D orthogonal kV images taken at fixed, oblique angles, as no 3D on-the-bed imaging is available. The visibility of the tumor in kV images is limited, since the patient's 3D anatomy is projected onto a 2D plane, especially when the tumor is behind high-density structures such as bones. This can lead to large patient setup errors. A solution is to reconstruct the 3D CT image from the kV images obtained at the treatment isocenter in the treatment position. Methods: An asymmetric autoencoder-like network built with vision-transformer blocks was developed. The data were collected from 1 head and neck patient: 2 orthogonal kV images (1024x1024 voxels), 1 3D CT with padding (512x512x512) acquired from the in-room CT-on-rails before the kVs were taken, and 2 digitally-reconstructed-radiograph (DRR) images (512x512) based on the CT. We resampled the kV images every 8 voxels and the DRR and CT every 4 voxels, thus forming a dataset of 262,144 samples in which the images have a dimension of 128 in each direction. In training, both kV and DRR images were utilized, and the encoder was encouraged to learn a joint feature map from both kV and DRR images. In testing, only independent kV images were used. The full-size synthetic CT (sCT) was obtained by concatenating the sCTs generated by the model according to their spatial information. The image quality of the sCT was evaluated using the mean absolute error (MAE) and a per-voxel-absolute-CT-number-difference volume histogram (CDVH). Results: The model achieved a runtime of 2.1 s and an MAE of <40 HU. The CDVH showed that <5% of the voxels had a per-voxel-absolute-CT-number-difference larger than 185 HU. Conclusion: A patient-specific vision-transformer-based network was developed and shown to be accurate and efficient in reconstructing 3D CT images from kV images.

Keyword: faster

Smart Learning to Find Dumb Contracts

  • Authors: Tamer Abdelaziz, Aquinas Hobor
  • Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.10726
  • Pdf link: https://arxiv.org/pdf/2304.10726
  • Abstract
    We introduce the Deep Learning Vulnerability Analyzer (DLVA), a vulnerability detection tool for Ethereum smart contracts based on powerful deep learning techniques for sequential data adapted for bytecode. We train DLVA to judge bytecode even though the supervising oracle, Slither, can only judge source code. DLVA's training algorithm is general: we "extend" a source code analysis to bytecode without any manual feature engineering, predefined patterns, or expert rules. DLVA's training algorithm is also robust: it overcame a 1.25% rate of mislabeled contracts, and the student surpassed the teacher, finding vulnerable contracts that Slither had mislabeled. In addition to extending a source code analyzer to bytecode, DLVA is much faster than conventional tools for smart contract vulnerability detection based on formal methods: DLVA checks contracts for 29 vulnerabilities in 0.2 seconds, a speedup of 10-500x+ compared to traditional tools. DLVA has three key components. Smart Contract to Vector (SC2V) uses neural networks to map arbitrary smart contract bytecode to a high-dimensional floating-point vector. The Sibling Detector (SD) classifies contracts when a target contract's vector is Euclidean-close to a labeled contract's vector in the training set; although only able to judge 55.7% of the contracts in our test set, it has an average accuracy of 97.4% with a false positive rate of only 0.1%. Lastly, the Core Classifier (CC) uses neural networks to infer vulnerable contracts regardless of vector distance. DLVA has an overall accuracy of 96.6% with an associated false positive rate of only 3.7%.
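
As described, the Sibling Detector's decision rule amounts to distance-thresholded nearest-neighbor classification; a minimal sketch (the radius is a tunable assumption, and the real SD operates on SC2V's learned contract vectors):

```python
import numpy as np

def sibling_detector(target_vec, train_vecs, train_labels, radius):
    """If the target contract's vector lies within a Euclidean `radius` of a
    labeled training vector, reuse that label; otherwise abstain and defer
    to the core classifier (returned as None here)."""
    dists = np.linalg.norm(train_vecs - target_vec, axis=1)
    i = int(np.argmin(dists))
    return train_labels[i] if dists[i] <= radius else None

X = np.random.randn(100, 8)
y = np.random.randint(0, 2, 100)
print(sibling_detector(X[0] + 0.01, X, y, radius=0.5))  # label of X[0]
```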

FindVehicle and VehicleFinder: A NER dataset for natural language-based vehicle retrieval and a keyword-based cross-modal vehicle retrieval system

  • Authors: Runwei Guan, Ka Lok Man, Feifan Chen, Shanliang Yao, Rongsheng Hu, Xiaohui Zhu, Jeremy Smith, Eng Gee Lim, Yutao Yue
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2304.10893
  • Pdf link: https://arxiv.org/pdf/2304.10893
  • Abstract
    Natural language (NL) based vehicle retrieval is a task aiming to retrieve the vehicle that is most consistent with a given NL query from among all candidate vehicles. Because NL queries can be easily obtained, this task has a promising prospect in building interactive intelligent traffic systems (ITS). Current solutions mainly focus on extracting both text and image features and mapping them to the same latent space to compare similarity. However, existing methods usually use dependency analysis or semantic role-labelling techniques to find keywords related to vehicle attributes. These techniques may require a lot of pre-processing and post-processing work, and also suffer from extracting the wrong keyword when the NL query is complex. To tackle these problems and simplify the pipeline, we borrow ideas from named entity recognition (NER) and construct FindVehicle, an NER dataset in the traffic domain. It has 42.3k labelled NL descriptions of vehicle tracks, containing information such as the location, orientation, type and colour of the vehicle. FindVehicle also adopts both overlapping entities and fine-grained entities to meet further requirements. To verify its effectiveness, we propose a baseline NL-based vehicle retrieval model called VehicleFinder. Our experiments show that, by using text encoders pre-trained on FindVehicle, VehicleFinder achieves 87.7% precision and 89.4% recall when retrieving a target vehicle by text command on our homemade dataset based on UA-DETRAC. The time cost of VehicleFinder is 279.35 ms on one ARM v8.2 CPU and 93.72 ms on one RTX A4000 GPU, which is much faster than a Transformer-based system. The dataset is open source via the link https://github.com/GuanRunwei/FindVehicle, and the implementation can be found via the link https://github.com/GuanRunwei/VehicleFinder-CTIM.

GCNH: A Simple Method For Representation Learning On Heterophilous Graphs

  • Authors: Andrea Cavallo, Claas Grohnfeldt, Michele Russo, Giulio Lovisotto, Luca Vassio
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10896
  • Pdf link: https://arxiv.org/pdf/2304.10896
  • Abstract
    Graph Neural Networks (GNNs) are well-suited for learning on homophilous graphs, i.e., graphs in which edges tend to connect nodes of the same type. Yet, achievement of consistent GNN performance on heterophilous graphs remains an open research problem. Recent works have proposed extensions to standard GNN architectures to improve performance on heterophilous graphs, trading off model simplicity for prediction accuracy. However, these models fail to capture basic graph properties, such as neighborhood label distribution, which are fundamental for learning. In this work, we propose GCN for Heterophily (GCNH), a simple yet effective GNN architecture applicable to both heterophilous and homophilous scenarios. GCNH learns and combines separate representations for a node and its neighbors, using one learned importance coefficient per layer to balance the contributions of center nodes and neighborhoods. We conduct extensive experiments on eight real-world graphs and a set of synthetic graphs with varying degrees of heterophily to demonstrate how the design choices for GCNH lead to a sizable improvement over a vanilla GCN. Moreover, GCNH outperforms state-of-the-art models of much higher complexity on four out of eight benchmarks, while producing comparable results on the remaining datasets. Finally, we discuss and analyze the lower complexity of GCNH, which results in fewer trainable parameters and faster training times than other methods, and show how GCNH mitigates the oversmoothing problem.

Faster Prefix-Sorting Algorithms for Deterministic Finite Automata

  • Authors: Sung-Hwan Kim, Francisco Olivares, Nicola Prezza
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.10962
  • Pdf link: https://arxiv.org/pdf/2304.10962
  • Abstract
    Sorting is a fundamental algorithmic pre-processing technique which often allows data to be represented more compactly and, at the same time, speeds up search queries on it. In this paper, we focus on the well-studied problem of sorting and indexing string sets. Since the introduction of suffix trees in 1973, dozens of suffix sorting algorithms have been described in the literature. In 2017, these techniques were extended to sets of strings described by means of finite automata: the theory of Wheeler graphs [Gagie et al., TCS'17] introduced automata whose states can be totally sorted according to the co-lexicographic (co-lex in the following) order of the prefixes of words accepted by the automaton. More recently, in [Cotumaccio, Prezza, SODA'21] it was shown how to extend these ideas to arbitrary automata by means of partial co-lex orders. This work showed that a co-lex order of minimum width (thus optimizing search query times) on deterministic finite automata (DFAs) can be computed in $O(m^2 + n^{5/2})$ time, $m$ being the number of transitions and $n$ the number of states of the input DFA. In this paper, we exhibit new combinatorial properties of the minimum-width co-lex order of DFAs and exploit them to design faster prefix sorting algorithms. In particular, we describe two algorithms sorting arbitrary DFAs in $O(mn)$ and $O(n^2\log n)$ time, respectively, and an algorithm sorting acyclic DFAs in $O(m\log n)$ time. Within these running times, all algorithms also compute a smallest chain partition of the partial order (required to index the DFA). We present experimental results showing that an optimized implementation of the $O(n^2\log n)$-time algorithm exhibits nearly-linear behaviour on large deterministic pan-genomic graphs and is thus also of practical interest.

IBBT: Informed Batch Belief Trees for Motion Planning Under Uncertainty

  • Authors: Dongliang Zheng, Panagiotis Tsiotras
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10984
  • Pdf link: https://arxiv.org/pdf/2304.10984
  • Abstract
    In this work, we propose the Informed Batch Belief Trees (IBBT) algorithm for motion planning under motion and sensing uncertainties. The original stochastic motion planning problem is divided into a deterministic motion planning problem and a graph search problem. We solve the deterministic planning problem using sampling-based methods such as PRM or RRG to construct a graph of nominal trajectories. Then, an informed cost-to-go heuristic for the original problem is computed based on the nominal trajectory graph. Finally, we grow a belief tree by searching over the graph using the proposed heuristic. IBBT interleaves batch state sampling, nominal trajectory graph construction, heuristic computation, and search over the graph to find belief-space motion plans. IBBT is an anytime, incremental algorithm. With an increasing number of batches of samples added to the graph, the algorithm finds motion plans that converge to the optimal one. IBBT is efficient because it reuses results between sequential iterations. The belief tree search is an ordered search guided by an informed heuristic. We test IBBT in different planning environments. Our numerical investigation confirms that IBBT finds non-trivial motion plans and is faster than previous similar methods.

Learned Monotone Minimal Perfect Hashing

  • Authors: Paolo Ferragina, Hans-Peter Lehmann, Peter Sanders, Giorgio Vinciguerra
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.11012
  • Pdf link: https://arxiv.org/pdf/2304.11012
  • Abstract
    A Monotone Minimal Perfect Hash Function (MMPHF) constructed on a set S of keys is a function that maps each key in S to its rank. On keys not in S, the function returns an arbitrary value. Applications range from databases, search engines, data encryption, to pattern-matching algorithms. In this paper, we describe LeMonHash, a new technique for constructing MMPHFs for integers. The core idea of LeMonHash is surprisingly simple and effective: we learn a monotone mapping from keys to their rank via an error-bounded piecewise linear model (the PGM-index), and then we solve the collisions that might arise among keys mapping to the same rank estimate by associating small integers with them in a retrieval data structure (BuRR). On synthetic random datasets, LeMonHash needs 35% less space than the next best competitor, while achieving about 16 times faster queries. On real-world datasets, the space usage is very close to or much better than the best competitors, while achieving up to 19 times faster queries than the next larger competitor. As far as the construction of LeMonHash is concerned, we get an improvement by a factor of up to 2, compared to the competitor with the next best space usage. We also investigate the case of keys being variable-length strings, introducing the so-called LeMonHash-VL: it needs space within 10% of the best competitors while achieving up to 3 times faster queries.
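
A toy version of the core idea, replacing the PGM-index with a single linear model and the BuRR retrieval structure with plain dictionaries, just to make the rank-estimate-plus-collision-resolution mechanism concrete (a real implementation stores the per-key disambiguation bits far more compactly):

```python
class TinyMonotoneHash:
    """Toy illustration of the LeMonHash idea: predict a key's rank with a
    monotone (here: single linear) model, then disambiguate keys sharing a
    rank estimate with a few extra stored values."""
    def __init__(self, keys):                 # keys: sorted list of ints
        n, lo, hi = len(keys), keys[0], keys[-1]
        self.slope = (n - 1) / max(hi - lo, 1)
        self.lo = lo
        buckets = {}
        for rank, k in enumerate(keys):
            buckets.setdefault(self._estimate(k), []).append(rank)
        # Per key: its position among colliding keys (the "retrieved" bits).
        self.offset = {k: buckets[self._estimate(k)].index(r)
                       for r, k in enumerate(keys)}
        self.first = {est: ranks[0] for est, ranks in buckets.items()}

    def _estimate(self, key):
        return round((key - self.lo) * self.slope)

    def rank(self, key):          # arbitrary output on keys not in the set
        return self.first.get(self._estimate(key), 0) + self.offset.get(key, 0)

h = TinyMonotoneHash([3, 7, 9, 20, 21, 40])
print([h.rank(k) for k in [3, 7, 9, 20, 21, 40]])  # [0, 1, 2, 3, 4, 5]
```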

Keyword: mobile

Joint Client Assignment and UAV Route Planning for Indirect-Communication Federated Learning

  • Authors: Jieming Bian, Cong Shen, Jie Xu
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10744
  • Pdf link: https://arxiv.org/pdf/2304.10744
  • Abstract
    Federated Learning (FL) is a machine learning approach that enables the creation of shared models for powerful applications while allowing data to remain on devices. This approach provides benefits such as improved data privacy, security, and reduced latency. However, in some systems, direct communication between clients and servers may not be possible, such as remote areas without proper communication infrastructure. To overcome this challenge, a new framework called FedEx (Federated Learning via Model Express Delivery) is proposed. This framework employs mobile transporters, such as UAVs, to establish indirect communication channels between the server and clients. These transporters act as intermediaries and allow for model information exchange. The use of indirect communication presents new challenges for convergence analysis and optimization, as the delay introduced by the transporters' movement creates issues for both global model dissemination and local model collection. To address this, two algorithms, FedEx-Sync and FedEx-Async, are proposed for synchronized and asynchronized learning at the transporter level. Additionally, a bi-level optimization algorithm is proposed to solve the joint client assignment and route planning problem. Experimental validation using two public datasets in a simulated network demonstrates consistent results with the theory, proving the efficacy of FedEx.

Safe Routing Approach by Identifying and Subsequently Eliminating the Attacks in MANET

  • Authors: S.M. Udhaya Sankar, D. Dhinakaran, C. Cathrin Deboral, M. Ramakrishnan
  • Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.10838
  • Pdf link: https://arxiv.org/pdf/2304.10838
  • Abstract
    Wireless networks that are decentralized and communicate without using existing infrastructure are known as mobile ad-hoc networks (MANETs). MANETs are susceptible to the most common sorts of threats and attacks. It is therefore advisable to utilize intrusion detection, which monitors the system to detect additional security issues. Monitoring is essential to avoid attacks and provide extra protection against unauthorized access. Although current solutions have been designed to defeat attack nodes, they still require additional hardware, incur considerable delivery delays, fail to offer high throughput or packet delivery ratios, or cannot do so without using more energy. The capability of a mobile node to forward packets, which depends on the platform's remaining lifetime, may be impacted by the absence of a power source at the network node. We developed the Safe Routing Approach (SRA), which uses behaviour analysis to track and monitor attackers who discard packets during the route discovery process. The attack-node recognition system is designed for irregular routing node detection, protecting the network's normal properties from nodes recognized as attack nodes. The suggested method examines the nearby attack nodes and conceals the trusted nodes in the routing pathway. The path is assigned immediately after the initial discovery of trust nodes, based on each node's strength value. This extends the network's life span and reduces packet loss. In terms of Packet Delivery Ratio (PDR), energy consumption, network performance, and detection of attack nodes, the suggested approach is compared with AIS, ZIDS, and Improved AODV. The findings demonstrate that the recommended strategy performs better in terms of PDR, residual energy, and network throughput.

HabitatDyn Dataset: Dynamic Object Detection to Kinematics Estimation

  • Authors: Zhengcheng Shen, Yi Gao, Linh Kästner, Jens Lambrecht
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10854
  • Pdf link: https://arxiv.org/pdf/2304.10854
  • Abstract
    The advancement of computer vision and machine learning has made datasets a crucial element for further research and applications. However, the creation and development of robots with advanced recognition capabilities are hindered by the lack of appropriate datasets. Existing image or video processing datasets are unable to accurately depict observations from a moving robot, and they do not contain the kinematics information necessary for robotic tasks. Synthetic data, on the other hand, are cost-effective to create and offer greater flexibility for adapting to various applications. Hence, they are widely utilized in both research and industry. In this paper, we propose the dataset HabitatDyn, which contains both synthetic RGB videos, semantic labels, and depth information, as well as kinetics information. HabitatDyn was created from the perspective of a mobile robot with a moving camera, and contains 30 scenes featuring six different types of moving objects with varying velocities. To demonstrate the usability of our dataset, two existing algorithms are used for evaluation and an approach to estimate the distance between the object and camera is implemented based on these segmentation methods and evaluated through the dataset. With the availability of this dataset, we aspire to foster further advancements in the field of mobile robotics, leading to more capable and intelligent robots that can navigate and interact with their environments more effectively. The code is publicly available at https://github.com/ignc-research/HabitatDyn.

Using Mobile Data and Deep Models to Assess Auditory Verbal Hallucinations

  • Authors: Shayan Mirjafari, Subigya Nepal, Weichen Wang, Andrew T. Campbell
  • Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2304.11049
  • Pdf link: https://arxiv.org/pdf/2304.11049
  • Abstract
    Hallucination is an apparent perception in the absence of real external sensory stimuli. An auditory hallucination is a perception of hearing sounds that are not real. A common form of auditory hallucination is hearing voices in the absence of any speaker, known as Auditory Verbal Hallucination (AVH). AVHs are fragments of the mind's creation that mostly occur in people diagnosed with mental illnesses such as bipolar disorder and schizophrenia. Assessing the valence of hallucinated voices (i.e., how negative or positive the voices are) can help measure the severity of a mental illness. We study N=435 individuals who experience hearing voices to assess auditory verbal hallucination. Participants report the valence of the voices they hear four times a day for a month through ecological momentary assessments, with questions answered on four-point scales from "not at all" to "extremely". We collect these self-reports as the valence supervision of AVH events via a mobile application. Using the application, participants also record audio diaries to verbally describe the content of the hallucinated voices. In addition, we passively collect mobile sensing data as contextual signals. We then experiment with how predictive these linguistic and contextual cues from the audio diary and mobile sensing data are of an auditory verbal hallucination event. Finally, using transfer learning and data fusion techniques, we train a neural net model that predicts the valence of AVH with a performance of 54% top-1 and 72% top-2 F1 score.

Multivariate and Multi-step Traffic Prediction for NextG Networks with SLA Violation Constraints

  • Authors: Evren Tuna, Alkan Soysal
  • Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.11156
  • Pdf link: https://arxiv.org/pdf/2304.11156
  • Abstract
    This paper focuses on predicting downlink (DL) traffic volume in mobile networks while minimizing overprovisioning and meeting a given service-level agreement (SLA) violation rate. We present a multivariate, multi-step, and SLA-driven approach that incorporates 20 different radio access network (RAN) features, a custom feature set based on peak traffic hours, and handover-based clustering to leverage the spatiotemporal effects. In addition, we propose a custom loss function that ensures the SLA violation rate constraint is satisfied while minimizing overprovisioning. We also perform multi-step prediction up to 24 steps ahead and evaluate performance under both single-step and multi-step prediction conditions. Our study makes several contributions, including the analysis of RAN features, the custom feature set design, a custom loss function, and a parametric method to satisfy SLA constraints.
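
An asymmetric loss of the kind the abstract suggests might look like the sketch below, which penalizes under-prediction (traffic exceeding the provisioned capacity, i.e., an SLA violation) more heavily than overprovisioning. The weight is an assumed placeholder, and the paper's actual loss additionally enforces a target violation-rate constraint, which is not modeled here.

```python
import torch

def sla_loss(pred, actual, under_weight=10.0):
    """Sketch of an SLA-driven asymmetric loss (assumed form): weight
    under-prediction much more heavily than over-prediction."""
    err = actual - pred
    under = torch.clamp(err, min=0)   # demand exceeded the forecast (violation)
    over = torch.clamp(-err, min=0)   # forecast exceeded demand (overprovision)
    return (under_weight * under + over).mean()

print(sla_loss(torch.tensor([10.0, 10.0]), torch.tensor([12.0, 8.0])))  # 11.0
```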

Keyword: pruning

ULEEN: A Novel Architecture for Ultra Low-Energy Edge Neural Networks

  • Authors: Zachary Susskind, Aman Arora, Igor D. S. Miranda, Alan T. L. Bacellar, Luis A. Q. Villon, Rafael F. Katopodis, Leandro S. de Araujo, Diego L. C. Dutra, Priscila M. V. Lima, Felipe M. G. Franca, Mauricio Breternitz Jr., Lizy K. John
  • Subjects: Hardware Architecture (cs.AR); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.10618
  • Pdf link: https://arxiv.org/pdf/2304.10618
  • Abstract
    The deployment of AI models on low-power, real-time edge devices requires accelerators for which energy, latency, and area are all first-order concerns. There are many approaches to enabling deep neural networks (DNNs) in this domain, including pruning, quantization, compression, and binary neural networks (BNNs), but with the emergence of the "extreme edge", there is now a demand for even more efficient models. In order to meet the constraints of ultra-low-energy devices, we propose ULEEN, a model architecture based on weightless neural networks. Weightless neural networks (WNNs) are a class of neural model which use table lookups, not arithmetic, to perform computation. The elimination of energy-intensive arithmetic operations makes WNNs theoretically well suited for edge inference; however, they have historically suffered from poor accuracy and excessive memory usage. ULEEN incorporates algorithmic improvements and a novel training strategy inspired by BNNs to make significant strides in improving accuracy and reducing model size. We compare FPGA and ASIC implementations of an inference accelerator for ULEEN against edge-optimized DNN and BNN devices. On a Xilinx Zynq Z-7045 FPGA, we demonstrate classification on the MNIST dataset at 14.3 million inferences per second (13 million inferences/Joule) with 0.21 $\mu$s latency and 96.2% accuracy, while Xilinx FINN achieves 12.3 million inferences per second (1.69 million inferences/Joule) with 0.31 $\mu$s latency and 95.83% accuracy. In a 45nm ASIC, we achieve 5.1 million inferences/Joule and 38.5 million inferences/second at 98.46% accuracy, while a quantized Bit Fusion model achieves 9230 inferences/Joule and 19,100 inferences/second at 99.35% accuracy. In our search for ever more efficient edge devices, ULEEN shows that WNNs are deserving of consideration.
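
For readers unfamiliar with weightless neural networks, the defining property is that inference is pure table lookup. Below is a classic WiSARD-style discriminator sketch, not ULEEN itself (which adds BNN-inspired training and further improvements); all names are illustrative:

```python
import numpy as np

class Discriminator:
    """WiSARD-style weightless discriminator: inference is table lookup with
    no arithmetic beyond counting hits. Illustrative sketch only."""

    def __init__(self, n_inputs, tuple_size, rng):
        self.tuple_size = tuple_size
        self.mapping = rng.permutation(n_inputs)      # fixed random input wiring
        self.rams = [set() for _ in range(n_inputs // tuple_size)]

    def _addresses(self, bits):
        wired = np.asarray(bits)[self.mapping]
        for i in range(len(self.rams)):
            chunk = wired[i * self.tuple_size:(i + 1) * self.tuple_size]
            yield i, int("".join(str(b) for b in chunk), 2)

    def train(self, bits):
        for i, addr in self._addresses(bits):
            self.rams[i].add(addr)                    # write a 1 at this address

    def score(self, bits):
        return sum(addr in self.rams[i] for i, addr in self._addresses(bits))

rng = np.random.default_rng(0)
d = Discriminator(n_inputs=16, tuple_size=4, rng=rng)
d.train(rng.integers(0, 2, 16))
```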

Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers

  • Authors: Siyuan Wei, Tianzhu Ye, Shen Zhang, Yao Tang, Jiajun Liang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10716
  • Pdf link: https://arxiv.org/pdf/2304.10716
  • Abstract
    Although vision transformers (ViTs) have shown promising results in various computer vision tasks recently, their high computational cost limits their practical applications. Previous approaches that prune redundant tokens have demonstrated a good trade-off between performance and computation costs. Nevertheless, errors caused by pruning strategies can lead to significant information loss. Our quantitative experiments reveal that the impact of pruned tokens on performance is noticeable. To address this issue, we propose a novel joint Token Pruning & Squeezing module (TPS) for compressing vision transformers with higher efficiency. Firstly, TPS adopts pruning to get the reserved and pruned subsets. Secondly, TPS squeezes the information of pruned tokens into partial reserved tokens via the unidirectional nearest-neighbor matching and similarity-based fusing steps. Compared to state-of-the-art methods, our approach outperforms them under all token pruning intensities. In particular, when shrinking the computational budgets of DeiT-tiny and DeiT-small to 35%, it improves the accuracy by 1%-6% compared with baselines on ImageNet classification. The proposed method can accelerate the throughput of DeiT-small beyond DeiT-tiny, while its accuracy surpasses DeiT-tiny by 4.78%. Experiments on various transformers demonstrate the effectiveness of our method, while analysis experiments prove our higher robustness to the errors of the token pruning policy. Code is available at https://github.com/megvii-research/TPS-CVPR2023.
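
The squeezing step is the distinctive part: pruned tokens are not discarded but folded into similar reserved tokens. A minimal sketch of that idea, assuming cosine similarity for the matching and the maximum similarity as the fusing weight (the actual TPS module's design may differ):

```python
import torch
import torch.nn.functional as F

def prune_and_squeeze(tokens, keep_idx, drop_idx):
    """Sketch of a TPS-style squeeze: fold each pruned token into its most
    similar reserved token via unidirectional nearest-neighbor matching and
    similarity-based weighting. tokens: (B, N, D); keep_idx/drop_idx: lists
    of token indices. Illustrative, not the released implementation."""
    kept, dropped = tokens[:, keep_idx], tokens[:, drop_idx]
    sim = F.cosine_similarity(dropped.unsqueeze(2), kept.unsqueeze(1), dim=-1)
    nearest = sim.argmax(dim=-1)                  # (B, M): host token per pruned token
    weight = sim.max(dim=-1).values.unsqueeze(-1)
    out = kept.clone()
    for b in range(tokens.size(0)):               # scatter-add pruned info into hosts
        out[b].index_add_(0, nearest[b], weight[b] * dropped[b])
    return out
```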

Conservative Sparse Neural Network Embedded Frequency-Constrained Unit Commitment With Distributed Energy Resources

  • Authors: Linwei Sang, Yinliang Xu, Zhongkai Yi, Lun Yang, Huan Long, Hongbin Sun
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10720
  • Pdf link: https://arxiv.org/pdf/2304.10720
  • Abstract
    The increasing penetration of distributed energy resources (DERs) will decrease the rotational inertia of the power system and further degrade the system frequency stability. To address the above issues, this paper leverages an advanced neural network (NN) to learn the frequency dynamics and incorporates the NN to facilitate reliable system operation. This paper proposes the conservative sparse neural network (CSNN) embedded frequency-constrained unit commitment (FCUC) with converter-based DERs, including the learning and optimization stages. In the learning stage, it samples the inertia parameters, calculates the corresponding frequency, and characterizes the stability region of the sampled parameters using convex hulls to ensure stability and avoid extrapolation. For conservativeness, a positive prediction error penalty is added to the loss function to prevent possible frequency requirement violations. For sparsity, NN topology pruning is employed to eliminate unnecessary connections and accelerate solving. In the optimization stage, the trained CSNN is transformed into mixed-integer linear constraints using the big-M method and then incorporated to establish the data-enhanced model. The case study verifies 1) the effectiveness of the proposed model in terms of high accuracy, fewer parameters, and significant solving acceleration; and 2) stable system operation against frequency violations under contingency.
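
For context, the big-M step mentioned above is the standard trick for embedding a trained ReLU network into a mixed-integer program. Assuming a known bound $|x| \le M$ on a unit's pre-activation, one ReLU $y = \max(0, x)$ is typically encoded with a binary indicator $z$ as follows (illustrative; the paper's exact formulation may differ):

```latex
y \ge x, \qquad y \ge 0, \qquad y \le x + M(1 - z), \qquad y \le M z, \qquad z \in \{0, 1\}
```

With $z = 1$ the constraints force $y = x$, and with $z = 0$ they force $y = 0$, so the solver can reason exactly over the trained network.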

Keyword: voxel

VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos

  • Authors: Huiyu Gao, Wei Mao, Miaomiao Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10687
  • Pdf link: https://arxiv.org/pdf/2304.10687
  • Abstract
    We propose VisFusion, a visibility-aware online 3D scene reconstruction approach from posed monocular videos. In particular, we aim to reconstruct the scene from volumetric features. Unlike previous reconstruction methods, which aggregate features for each voxel from input views without considering its visibility, we aim to improve the feature fusion by explicitly inferring its visibility from a similarity matrix, computed from its projected features in each image pair. Following previous works, our model is a coarse-to-fine pipeline including a volume sparsification process. Different from previous works, which sparsify voxels globally with a fixed occupancy threshold, we perform the sparsification on a local feature volume along each visual ray to preserve at least one voxel per ray for finer details. The sparse local volume is then fused with a global one for online reconstruction. We further propose to predict TSDF in a coarse-to-fine manner by learning its residuals across scales, leading to better TSDF predictions. Experimental results on benchmarks show that our method can achieve superior performance with more scene details. Code is available at: https://github.com/huiyu-gao/VisFusion
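
The local, per-ray sparsification is the main departure from a global occupancy threshold. A minimal sketch of the "at least one voxel per ray" rule, with hypothetical inputs (per-voxel scores and a voxel-to-ray index map); the actual method operates on local feature volumes inside a coarse-to-fine pipeline:

```python
import numpy as np

def sparsify_per_ray(scores, ray_ids):
    """Illustrative per-ray sparsification: retain the highest-scoring voxel
    on each visual ray, so that no ray is emptied out the way a single global
    occupancy threshold can. `scores` are hypothetical per-voxel occupancy
    scores; `ray_ids` maps each voxel to the ray it lies on."""
    keep = np.zeros(scores.shape, dtype=bool)
    for r in np.unique(ray_ids):
        on_ray = np.flatnonzero(ray_ids == r)
        keep[on_ray[np.argmax(scores[on_ray])]] = True
    return keep
```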

Deep-Learning-based Fast and Accurate 3D CT Deformable Image Registration in Lung Cancer

  • Authors: Yuzhen Ding, Hongying Feng, Yunze Yang, Jason Holmes, Zhengliang Liu, David Liu, William W. Wong, Nathan Y. Yu, Terence T. Sio, Steven E. Schild, Baoxin Li, Wei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
  • Arxiv link: https://arxiv.org/abs/2304.11135
  • Pdf link: https://arxiv.org/pdf/2304.11135
  • Abstract
    Purpose: In some proton therapy facilities, patient alignment relies on two 2D orthogonal kV images, taken at fixed, oblique angles, as no 3D on-the-bed imaging is available. The visibility of the tumor in kV images is limited since the patient's 3D anatomy is projected onto a 2D plane, especially when the tumor is behind high-density structures such as bones. This can lead to large patient setup errors. A solution is to reconstruct the 3D CT image from the kV images obtained at the treatment isocenter in the treatment position. Methods: An asymmetric autoencoder-like network built with vision-transformer blocks was developed. The data was collected from 1 head and neck patient: 2 orthogonal kV images (1024x1024 voxels), 1 3D CT with padding (512x512x512) acquired from the in-room CT-on-rails before kVs were taken and 2 digitally-reconstructed-radiograph (DRR) images (512x512) based on the CT. We resampled kV images every 8 voxels and DRR and CT every 4 voxels, thus formed a dataset consisting of 262,144 samples, in which the images have a dimension of 128 for each direction. In training, both kV and DRR images were utilized, and the encoder was encouraged to learn the jointed feature map from both kV and DRR images. In testing, only independent kV images were used. The full-size synthetic CT (sCT) was achieved by concatenating the sCTs generated by the model according to their spatial information. The image quality of the synthetic CT (sCT) was evaluated using mean absolute error (MAE) and per-voxel-absolute-CT-number-difference volume histogram (CDVH). Results: The model achieved a speed of 2.1s and a MAE of <40HU. The CDVH showed that <5% of the voxels had a per-voxel-absolute-CT-number-difference larger than 185 HU. Conclusion: A patient-specific vision-transformer-based network was developed and shown to be accurate and efficient to reconstruct 3D CT images from kV images.

Keyword: lidar

HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative perception with vision transformer

  • Authors: Hao Xiang, Runsheng Xu, Jiaqi Ma
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10628
  • Pdf link: https://arxiv.org/pdf/2304.10628
  • Abstract
    Vehicle-to-Vehicle technologies have enabled autonomous vehicles to share information to see through occlusions, greatly enhancing perception performance. Nevertheless, existing works have all focused on homogeneous traffic where vehicles are equipped with the same type of sensors, which significantly hampers the scale of collaboration and the benefit of cross-modality interactions. In this paper, we investigate the multi-agent hetero-modal cooperative perception problem where agents may have distinct sensor modalities. We present HM-ViT, the first unified multi-agent hetero-modal cooperative perception framework that can collaboratively predict 3D objects for highly dynamic vehicle-to-vehicle (V2V) collaborations with varying numbers and types of agents. To effectively fuse features from multi-view images and LiDAR point clouds, we design a novel heterogeneous 3D graph transformer to jointly reason inter-agent and intra-agent interactions. The extensive experiments on the V2V perception dataset OPV2V demonstrate that HM-ViT outperforms SOTA cooperative perception methods for V2V hetero-modal cooperative perception. We will release code to facilitate future research.

Keyword: diffusion

IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies

  • Authors: Philippe Hansen-Estruch, Ilya Kostrikov, Michael Janner, Jakub Grudzien Kuba, Sergey Levine
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10573
  • Pdf link: https://arxiv.org/pdf/2304.10573
  • Abstract
    Effective offline RL methods require properly handling out-of-distribution actions. Implicit Q-learning (IQL) addresses this by training a Q-function using only dataset actions through a modified Bellman backup. However, it is unclear which policy actually attains the values represented by this implicitly trained Q-function. In this paper, we reinterpret IQL as an actor-critic method by generalizing the critic objective and connecting it to a behavior-regularized implicit actor. This generalization shows how the induced actor balances reward maximization and divergence from the behavior policy, with the specific loss choice determining the nature of this tradeoff. Notably, this actor can exhibit complex and multimodal characteristics, suggesting issues with the conditional Gaussian actor fit with advantage weighted regression (AWR) used in prior methods. Instead, we propose sampling from a diffusion-parameterized behavior policy and using weights computed from the critic to importance-sample our intended policy. We introduce Implicit Diffusion Q-learning (IDQL), which combines our general IQL critic with this policy extraction method. IDQL maintains the ease of implementation of IQL while outperforming prior offline RL methods and demonstrating robustness to hyperparameters. Code is available at https://github.com/philippe-eecs/IDQL.
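
The policy-extraction recipe can be sketched concretely: sample candidate actions from the behavior model, then reweight them with the critic. Both callables below are placeholders, and the softmax weighting is only one plausible instantiation; IDQL's actual weights follow from its generalized IQL critic:

```python
import numpy as np

def extract_action(sample_behavior_actions, q_value, state,
                   num_candidates=64, temperature=1.0):
    """Critic-weighted action selection sketch: draw candidates from a
    (diffusion-parameterized) behavior policy, then importance-sample them
    with critic-derived weights. Illustrative, not the paper's exact rule."""
    actions = sample_behavior_actions(state, num_candidates)  # (N, action_dim)
    q = np.array([q_value(state, a) for a in actions])
    w = np.exp((q - q.max()) / temperature)                   # stabilized softmax
    return actions[np.random.choice(num_candidates, p=w / w.sum())]
```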

Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models

  • Authors: Jason J. Yu, Fereshteh Forghani, Konstantinos G. Derpanis, Marcus A. Brubaker
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10700
  • Pdf link: https://arxiv.org/pdf/2304.10700
  • Abstract
    Novel view synthesis from a single input image is a challenging task, where the goal is to generate a new view of a scene from a desired camera pose that may be separated by a large motion. The highly uncertain nature of this synthesis task due to unobserved elements within the scene (i.e., occlusion) and outside the field-of-view makes the use of generative models appealing to capture the variety of possible outputs. In this paper, we propose a novel generative model which is capable of producing a sequence of photorealistic images consistent with a specified camera trajectory, and a single starting image. Our approach is centred on an autoregressive conditional diffusion-based model capable of interpolating visible scene elements, and extrapolating unobserved regions in a view, in a geometrically consistent manner. Conditioning is limited to an image capturing a single camera view and the (relative) pose of the new camera view. To measure the consistency over a sequence of generated views, we introduce a new metric, the thresholded symmetric epipolar distance (TSED), to measure the number of consistent frame pairs in a sequence. While previous methods have been shown to produce high quality images and consistent semantics across pairs of views, we show empirically with our metric that they are often inconsistent with the desired camera poses. In contrast, we demonstrate that our method produces both photorealistic and view-consistent imagery.
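
The building block of the proposed TSED metric is the symmetric epipolar distance between corresponding points; TSED then thresholds it and counts consistent frame pairs. A sketch of the distance itself (the thresholding and aggregation details follow the paper and are not reproduced here):

```python
import numpy as np

def symmetric_epipolar_distance(x1, x2, F):
    """Symmetric epipolar distance between corresponding homogeneous points
    x1, x2 (shape (3,)) under fundamental matrix F."""
    l2 = F @ x1                                   # epipolar line of x1 in image 2
    l1 = F.T @ x2                                 # epipolar line of x2 in image 1
    d2 = abs(x2 @ l2) / np.hypot(l2[0], l2[1])    # point-to-line distances
    d1 = abs(x1 @ l1) / np.hypot(l1[0], l1[1])
    return 0.5 * (d1 + d2)
```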

Matching-based Data Valuation for Generative Model

  • Authors: Jiaxi Yang, Wenglong Deng, Benlin Liu, Yangsibo Huang, Xiaoxiao Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10701
  • Pdf link: https://arxiv.org/pdf/2304.10701
  • Abstract
    Data valuation is critical in machine learning, as it helps enhance model transparency and protect data properties. Existing data valuation methods have primarily focused on discriminative models, neglecting deep generative models that have recently gained considerable attention. Similar to discriminative models, there is an urgent need to assess data contributions in deep generative models as well. However, previous data valuation approaches mainly relied on discriminative model performance metrics and required model retraining. Consequently, they cannot be applied directly and efficiently to recent deep generative models, such as generative adversarial networks and diffusion models, in practice. To bridge this gap, we formulate the data valuation problem in generative models from a similarity-matching perspective. Specifically, we introduce Generative Model Valuator (GMValuator), the first model-agnostic approach for any generative model, designed to provide data valuation for generation tasks. We have conducted extensive experiments to demonstrate the effectiveness of the proposed method. To the best of our knowledge, GMValuator is the first work that offers a training-free, post-hoc data valuation strategy for deep generative models.

A Deep Learning algorithm to accelerate Algebraic Multigrid methods in Finite Element solvers of 3D elliptic PDEs

  • Authors: Matteo Caldana, Paola F. Antonietti, Luca Dede'
  • Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10832
  • Pdf link: https://arxiv.org/pdf/2304.10832
  • Abstract
    Algebraic multigrid (AMG) methods are among the most efficient solvers for linear systems of equations and they are widely used for the solution of problems stemming from the discretization of Partial Differential Equations (PDEs). The most severe limitation of AMG methods is the dependence on parameters that need to be fine-tuned. In particular, the strong threshold parameter is the most relevant since it stands at the basis of the construction of successively coarser grids needed by the AMG methods. We introduce a novel Deep Learning algorithm that minimizes the computational cost of the AMG method when used as a finite element solver. We show that our algorithm requires minimal changes to any existing code. The proposed Artificial Neural Network (ANN) tunes the value of the strong threshold parameter by interpreting the sparse matrix of the linear system as a black-and-white image and exploiting a pooling operator to transform it into a small multi-channel image. We experimentally prove that the pooling successfully reduces the computational cost of processing a large sparse matrix and preserves the features needed for the regression task at hand. We train the proposed algorithm on a large dataset containing problems with a highly heterogeneous diffusion coefficient defined in different three-dimensional geometries and discretized with unstructured grids and linear elasticity problems with a highly heterogeneous Young's modulus. When tested on problems with coefficients or geometries not present in the training dataset, our approach reduces the computational time by up to 30%.
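
The interpretation of the sparse system matrix as a small multi-channel image, via pooling, is the trick that keeps the regression cheap. A rough sketch of one such pooling, assuming tile-wise max and mean channels (hypothetical choices; the paper's operator may differ in detail):

```python
import numpy as np
from scipy.sparse import random as sparse_random

def matrix_to_image(A, size=32):
    """Illustrative pooling of a large sparse matrix into a small
    multi-channel 'image' that a CNN-style regressor can consume."""
    A = abs(A.tocoo())
    r = A.row * size // A.shape[0]                # tile coordinates per nonzero
    c = A.col * size // A.shape[1]
    img = np.zeros((2, size, size))
    np.maximum.at(img[0], (r, c), A.data)         # channel 0: tile-wise max
    np.add.at(img[1], (r, c), A.data)             # channel 1: tile-wise mean
    counts = np.zeros((size, size))
    np.add.at(counts, (r, c), 1)
    img[1] /= np.maximum(counts, 1)
    return img

img = matrix_to_image(sparse_random(10_000, 10_000, density=1e-4))
```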

A numerical method for the stability analysis of linear age-structured models with nonlocal diffusion

  • Authors: Dimitri Breda, Simone De Reggi, Rossana Vermiglio
  • Subjects: Numerical Analysis (math.NA); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.10835
  • Pdf link: https://arxiv.org/pdf/2304.10835
  • Abstract
    We numerically investigate the stability of linear age-structured population models with nonlocal diffusion, which arise naturally in describing dynamics of infectious diseases. Compared to Laplace diffusion, the analysis of models with nonlocal diffusion is more challenging since the associated semigroups have no regularizing properties in the spatial variable. Nevertheless, the asymptotic stability of the null equilibrium is determined by the spectrum of the infinitesimal generator associated to the semigroup. We propose to approximate the leading part of this spectrum by first reformulating the problem via integration of the age-state and then by discretizing the generator combining a spectral projection in space with a pseudospectral collocation in age. A rigorous convergence analysis is provided in the case of separable model coefficients. Results are confirmed experimentally, and numerical tests are also presented for the more general case.

A convection-diffusion problem with a large shift on Duran meshes

  • Authors: Mirjana Brdar, Sebastian Franz, Hans-Goerg Roos
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.10937
  • Pdf link: https://arxiv.org/pdf/2304.10937
  • Abstract
    A convection-diffusion problem with a large shift in space is considered. Numerical analysis of high order finite element methods on layer-adapted Duran type meshes, as well as on coarser Duran type meshes in places where weak layers appear, is provided. The theoretical results are confirmed by numerical experiments.

Improved Diffusion-based Image Colorization via Piggybacked Models

  • Authors: Hanyuan Liu, Jinbo Xing, Minshan Xie, Chengze Li, Tien-Tsin Wong
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.11105
  • Pdf link: https://arxiv.org/pdf/2304.11105
  • Abstract
    Image colorization has been attracting the research interests of the community for decades. However, existing methods still struggle to provide satisfactory colorized results given grayscale images due to a lack of human-like global understanding of colors. Recently, large-scale Text-to-Image (T2I) models have been exploited to transfer the semantic information from the text prompts to the image domain, where text provides a global control for semantic objects in the image. In this work, we introduce a colorization model piggybacking on the existing powerful T2I diffusion model. Our key idea is to exploit the color prior knowledge in the pre-trained T2I diffusion model for realistic and diverse colorization. A diffusion guider is designed to incorporate the pre-trained weights of the latent diffusion model to output a latent color prior that conforms to the visual semantics of the grayscale input. A lightness-aware VQVAE will then generate the colorized result with pixel-perfect alignment to the given grayscale image. Our model can also achieve conditional colorization with additional inputs (e.g. user hints and texts). Extensive experiments show that our method achieves state-of-the-art performance in terms of perceptual quality.

BoDiffusion: Diffusing Sparse Observations for Full-Body Human Motion Synthesis

  • Authors: Angela Castillo, Maria Escobar, Guillaume Jeanneret, Albert Pumarola, Pablo Arbeláez, Ali Thabet, Artsiom Sanakoyeu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.11118
  • Pdf link: https://arxiv.org/pdf/2304.11118
  • Abstract
    Mixed reality applications require tracking the user's full-body motion to enable an immersive experience. However, typical head-mounted devices can only track head and hand movements, leading to a limited reconstruction of full-body motion due to variability in lower body configurations. We propose BoDiffusion -- a generative diffusion model for motion synthesis to tackle this under-constrained reconstruction problem. We present a time and space conditioning scheme that allows BoDiffusion to leverage sparse tracking inputs while generating smooth and realistic full-body motion sequences. To the best of our knowledge, this is the first approach that uses the reverse diffusion process to model full-body tracking as a conditional sequence generation task. We conduct experiments on the large-scale motion-capture dataset AMASS and show that our approach outperforms the state-of-the-art approaches by a significant margin in terms of full-body motion realism and joint reconstruction error.

Keyword: dynamic

Optimization of a Hydrodynamic Computational Reservoir through Evolution

  • Authors: Alessandro Pierro, Kristine Heiney, Shamit Shrivastava, Giulia Marcucci, Stefano Nichele
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.10610
  • Pdf link: https://arxiv.org/pdf/2304.10610
  • Abstract
    As demand for computational resources reaches unprecedented levels, research is expanding into the use of complex material substrates for computing. In this study, we interface with a model of a hydrodynamic system, under development by a startup, as a computational reservoir and optimize its properties using an evolution in materio approach. Input data are encoded as waves applied to our shallow water reservoir, and the readout wave height is obtained at a fixed detection point. We optimized the readout times and how inputs are mapped to the wave amplitude or frequency using an evolutionary search algorithm, with the objective of maximizing the system's ability to linearly separate observations in the training data by maximizing the readout matrix determinant. Applying evolutionary methods to this reservoir system substantially improved separability on an XNOR task, in comparison to implementations with hand-selected parameters. We also applied our approach to a regression task and show that our approach improves out-of-sample accuracy. Results from this study will inform how we interface with the physical reservoir in future work, and we will use these methods to continue to optimize other aspects of the physical implementation of this system as a computational reservoir.
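
The fitness signal is worth spelling out: the determinant of the readout matrix serves as a proxy for how linearly separable the training observations are. A minimal (1+lambda) evolutionary loop around that objective might look like the sketch below, where `simulate_reservoir` is a stand-in for the hydrodynamic model:

```python
import numpy as np

def evolve(simulate_reservoir, init_params, generations=200, pop=16, sigma=0.1):
    """Minimal (1+lambda) evolutionary search maximizing |det(readout)| as a
    separability proxy. `simulate_reservoir` maps a parameter vector (readout
    times, input mappings, ...) to a square readout matrix; both the loop and
    the parameterization are illustrative, not the study's actual setup."""
    rng = np.random.default_rng(0)
    best = np.asarray(init_params, dtype=float)
    best_fit = abs(np.linalg.det(simulate_reservoir(best)))
    for _ in range(generations):
        for cand in best + sigma * rng.standard_normal((pop, best.size)):
            fit = abs(np.linalg.det(simulate_reservoir(cand)))
            if fit > best_fit:
                best, best_fit = cand, fit
    return best, best_fit
```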

HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative perception with vision transformer

  • Authors: Hao Xiang, Runsheng Xu, Jiaqi Ma
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10628
  • Pdf link: https://arxiv.org/pdf/2304.10628
  • Abstract
    Vehicle-to-Vehicle technologies have enabled autonomous vehicles to share information to see through occlusions, greatly enhancing perception performance. Nevertheless, existing works have all focused on homogeneous traffic where vehicles are equipped with the same type of sensors, which significantly hampers the scale of collaboration and the benefit of cross-modality interactions. In this paper, we investigate the multi-agent hetero-modal cooperative perception problem where agents may have distinct sensor modalities. We present HM-ViT, the first unified multi-agent hetero-modal cooperative perception framework that can collaboratively predict 3D objects for highly dynamic vehicle-to-vehicle (V2V) collaborations with varying numbers and types of agents. To effectively fuse features from multi-view images and LiDAR point clouds, we design a novel heterogeneous 3D graph transformer to jointly reason inter-agent and intra-agent interactions. The extensive experiments on the V2V perception dataset OPV2V demonstrate that HM-ViT outperforms SOTA cooperative perception methods for V2V hetero-modal cooperative perception. We will release code to facilitate future research.

Feature point detection in HDR images based on coefficient of variation

  • Authors: Artur Santos Nascimento, Welerson Augusto Lino de Jesus Melo, Daniel Oliveira Dantas, Beatriz Trinchão Andrade
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10666
  • Pdf link: https://arxiv.org/pdf/2304.10666
  • Abstract
    Feature point (FP) detection is a fundamental step of many computer vision tasks. However, FP detectors are usually designed for low dynamic range (LDR) images. In scenes with extreme light conditions, LDR images present saturated pixels, which degrade FP detection. On the other hand, high dynamic range (HDR) images usually present no saturated pixels, but FP detection algorithms do not take advantage of all the information present in such images. FP detection frequently relies on differential methods, which work well in LDR images. However, in HDR images, the differential operation response in bright areas overshadows the response in dark areas. As an alternative to standard FP detection methods, this study proposes an FP detector based on a coefficient of variation (CV) designed for HDR images. The CV operation adapts its response based on the standard deviation of pixels inside a window, working well in both dark and bright areas of HDR images. The proposed and standard detectors are evaluated by measuring their repeatability rate (RR) and uniformity. Our proposed detector shows better performance when compared to other standard state-of-the-art detectors. In the uniformity metric, our proposed detector surpasses all the other algorithms. On the other hand, under the repeatability rate metric, the proposed detector performs worse than the Harris for HDR and SURF detectors.
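
The coefficient of variation itself is simple: local standard deviation normalized by local mean, which keeps the response comparable across dark and bright HDR regions. A minimal sketch (the full detector adds keypoint-selection steps not shown here):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def coefficient_of_variation(hdr, window=7, eps=1e-12):
    """Per-pixel coefficient of variation (local sigma / local mu). Dividing
    by the local mean adapts the response to local brightness, unlike plain
    differential operators."""
    mu = uniform_filter(hdr, window)
    mu2 = uniform_filter(hdr * hdr, window)
    sigma = np.sqrt(np.maximum(mu2 - mu * mu, 0.0))
    return sigma / (mu + eps)
```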

FSNet: Redesign Self-Supervised MonoDepth for Full-Scale Depth Prediction for Autonomous Driving

  • Authors: Yuxuan Liu, Zhenhua Xu, Huaiyang Huang, Lujia Wang, Ming Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10719
  • Pdf link: https://arxiv.org/pdf/2304.10719
  • Abstract
    Predicting accurate depth with monocular images is important for low-cost robotic applications and autonomous driving. This study proposes a comprehensive self-supervised framework for accurate scale-aware depth prediction on autonomous driving scenes utilizing inter-frame poses obtained from inertial measurements. In particular, we introduce a Full-Scale depth prediction network named FSNet. FSNet contains four important improvements over existing self-supervised models: (1) a multichannel output representation for stable training of depth prediction in driving scenarios, (2) an optical-flow-based mask designed for dynamic object removal, (3) a self-distillation training strategy to augment the training process, and (4) an optimization-based post-processing algorithm in test time, fusing the results from visual odometry. With this framework, robots and vehicles with only one well-calibrated camera can collect sequences of training image frames and camera poses, and infer accurate 3D depths of the environment without extra labeling work or 3D data. Extensive experiments on the KITTI dataset, KITTI-360 dataset and the nuScenes dataset demonstrate the potential of FSNet. More visualizations are presented in \url{https://sites.google.com/view/fsnet/home}

Conservative Sparse Neural Network Embedded Frequency-Constrained Unit Commitment With Distributed Energy Resources

  • Authors: Linwei Sang, Yinliang Xu, Zhongkai Yi, Lun Yang, Huan Long, Hongbin Sun
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10720
  • Pdf link: https://arxiv.org/pdf/2304.10720
  • Abstract
    The increasing penetration of distributed energy resources (DERs) will decrease the rotational inertia of the power system and further degrade the system frequency stability. To address the above issues, this paper leverages an advanced neural network (NN) to learn the frequency dynamics and incorporates the NN to facilitate reliable system operation. This paper proposes the conservative sparse neural network (CSNN) embedded frequency-constrained unit commitment (FCUC) with converter-based DERs, including the learning and optimization stages. In the learning stage, it samples the inertia parameters, calculates the corresponding frequency, and characterizes the stability region of the sampled parameters using convex hulls to ensure stability and avoid extrapolation. For conservativeness, a positive prediction error penalty is added to the loss function to prevent possible frequency requirement violations. For sparsity, NN topology pruning is employed to eliminate unnecessary connections and accelerate solving. In the optimization stage, the trained CSNN is transformed into mixed-integer linear constraints using the big-M method and then incorporated to establish the data-enhanced model. The case study verifies 1) the effectiveness of the proposed model in terms of high accuracy, fewer parameters, and significant solving acceleration; and 2) stable system operation against frequency violations under contingency.

DeformableFormer: Classification of Endoscopic Ultrasound Guided Fine Needle Biopsy in Pancreatic Diseases

  • Authors: Taiji Kurami, Takuya Ishikawa, Kazuhiro Hotta
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10791
  • Pdf link: https://arxiv.org/pdf/2304.10791
  • Abstract
    Endoscopic Ultrasound-Fine Needle Aspiration (EUS-FNA) is used to examine pancreatic cancer. EUS-FNA is an examination using EUS to insert a thin needle into the tumor and collect pancreatic tissue fragments. The collected pancreatic tissue fragments are then stained to classify whether they are pancreatic cancer. However, staining and visual inspection are time consuming. In addition, if the pancreatic tissue fragment cannot be examined after staining, the collection must be done again on another day. Therefore, our purpose is to classify from an unstained image whether it is available for examination or not, and to exceed the accuracy of visual classification by specialist physicians. Image classification before staining can reduce the time required for staining and the burden on patients. However, the images of pancreatic tissue fragments used in this study cannot be successfully classified by processing the entire image because the pancreatic tissue fragments occupy only a part of the image. Therefore, we propose a DeformableFormer that uses Deformable Convolution in the MetaFormer framework. The architecture consists of a generalized model of the Vision Transformer, and we use Deformable Convolution in the TokenMixer part. In contrast to existing approaches, our proposed DeformableFormer can perform feature extraction more locally and dynamically via Deformable Convolution, enabling feature extraction better suited to the classification target. To evaluate our method, we classify two categories of pancreatic tissue fragments: available and unavailable for examination. We demonstrated that our method outperformed the accuracy of specialist physicians and conventional methods.
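
Deformable convolution, the TokenMixer substitution described here, samples input locations at learned per-position offsets rather than on a fixed grid. A minimal usage sketch with torchvision's operator; the channel sizes and the plain-conv offset predictor are illustrative, not necessarily the authors' configuration:

```python
import torch
from torchvision.ops import deform_conv2d

# Sampling locations are shifted by learned offsets: 2 offsets (dy, dx) per
# kernel tap, predicted per spatial position by an auxiliary convolution.
x = torch.randn(1, 64, 32, 32)
weight = torch.randn(64, 64, 3, 3)                           # (out, in, kh, kw)
offset_pred = torch.nn.Conv2d(64, 2 * 3 * 3, 3, padding=1)   # 2 offsets per tap
y = deform_conv2d(x, offset_pred(x), weight, padding=1)      # -> (1, 64, 32, 32)
```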

A numerical method for the stability analysis of linear age-structured models with nonlocal diffusion

  • Authors: Dimitri Breda, Simone De Reggi, Rossana Vermiglio
  • Subjects: Numerical Analysis (math.NA); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.10835
  • Pdf link: https://arxiv.org/pdf/2304.10835
  • Abstract
    We numerically investigate the stability of linear age-structured population models with nonlocal diffusion, which arise naturally in describing dynamics of infectious diseases. Compared to Laplace diffusion, the analysis of models with nonlocal diffusion is more challenging since the associated semigroups have no regularizing properties in the spatial variable. Nevertheless, the asymptotic stability of the null equilibrium is determined by the spectrum of the infinitesimal generator associated to the semigroup. We propose to approximate the leading part of this spectrum by first reformulating the problem via integration of the age-state and then by discretizing the generator combining a spectral projection in space with a pseudospectral collocation in age. A rigorous convergence analysis is provided in the case of separable model coefficients. Results are confirmed experimentally, and numerical tests are also presented for the more general case.

A Comprehensive Review on Ontologies for Scenario-based Testing in the Context of Autonomous Driving

  • Authors: Maximilian Zipfl, Nina Koch, J. Marius Zöllner
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.10837
  • Pdf link: https://arxiv.org/pdf/2304.10837
  • Abstract
    The verification and validation of autonomous driving vehicles remains a major challenge due to the high complexity of autonomous driving functions. Scenario-based testing is a promising method for validating such a complex system. Ontologies can be utilized to produce test scenarios that are both meaningful and relevant. One crucial aspect of this process is selecting the appropriate method for describing the entities involved. The level of detail and specific entity classes required will vary depending on the system being tested. It is important to choose an ontology that properly reflects these needs. This paper summarizes key representative ontologies for scenario-based testing and related use cases in the field of autonomous driving. The considered ontologies are classified according to their level of detail for both static facts and dynamic aspects. Furthermore, the ontologies are evaluated based on the presence of important entity classes and the relations between them.

Effective Numerical Simulations of Synchronous Generator System

  • Authors: Jiawei Zhang, Aiqing Zhu, Feng Ji, Chang Lin, Yifa Tang
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.10882
  • Pdf link: https://arxiv.org/pdf/2304.10882
  • Abstract
    Synchronous generator system is a complicated dynamical system for energy transmission, which plays an important role in modern industrial production. In this article, we propose some predictor-corrector methods and structure-preserving methods for a generator system based on the first benchmark model of subsynchronous resonance, among which the structure-preserving methods preserve a Dirac structure associated with the so-called port-Hamiltonian descriptor systems. To illustrate this, the simplified generator system in the form of index-1 differential-algebraic equations has been derived. Our analyses provide the global error estimates for a special class of structure-preserving methods called Gauss methods, which guarantee their superior performance over the PSCAD/EMTDC and the predictor-corrector methods in terms of computational stability. Numerical simulations are implemented to verify the effectiveness and advantages of our methods.

AMP in the wild: Learning robust, agile, natural legged locomotion skills

  • Authors: Yikai Wang, Zheyuan Jiang, Jianyu Chen
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10888
  • Pdf link: https://arxiv.org/pdf/2304.10888
  • Abstract
    The successful transfer of a learned controller from simulation to the real world for a legged robot requires not only the ability to identify the system, but also accurate estimation of the robot's state. In this paper, we propose a novel algorithm that can infer not only information about the parameters of the dynamic system, but also estimate important information about the robot's state from previous observations. We integrate our algorithm with Adversarial Motion Priors and achieve a robust, agile, and natural gait in both simulation and on a Unitree A1 quadruped robot in the real world. Empirical results demonstrate that our proposed algorithm enables traversing challenging terrains with lower power consumption compared to the baselines. Both qualitative and quantitative results are presented in this paper.

Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems

  • Authors: Mehran Salmani (1), Saeid Ghafouri (2 and 4), Alireza Sanaee (2), Kamran Razavi (3), Max Mühlhäuser (3), Joseph Doyle (2), Pooyan Jamshidi (4), Mohsen Sharif (1) ((1) Iran University of Science and Technology, (2) Queen Mary University of London, (3) Technical University of Darmstadt, (4) University of South Carolina)
  • Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10892
  • Pdf link: https://arxiv.org/pdf/2304.10892
  • Abstract
    The use of machine learning (ML) inference for various applications is growing drastically. ML inference services engage with users directly, requiring fast and accurate responses. Moreover, these services face dynamic workloads of requests, imposing changes in their computing resources. Failing to right-size computing resources results in either violations of latency service level objectives (SLOs) or wasted computing resources. Adapting to dynamic workloads considering all the pillars of accuracy, latency, and resource cost is challenging. In response to these challenges, we propose InfAdapter, which proactively selects a set of ML model variants with their resource allocations to meet latency SLO while maximizing an objective function composed of accuracy and cost. InfAdapter decreases SLO violation and costs up to 65% and 33%, respectively, compared to a popular industry autoscaler (Kubernetes Vertical Pod Autoscaler).

Electromechanical memcapacitive neurons for energy-efficient spiking neural networks

  • Authors: Zixi Zhang, Yuriy V. Pershin, Ivar Martin
  • Subjects: Emerging Technologies (cs.ET); Mesoscale and Nanoscale Physics (cond-mat.mes-hall)
  • Arxiv link: https://arxiv.org/abs/2304.10899
  • Pdf link: https://arxiv.org/pdf/2304.10899
  • Abstract
    In this article, we introduce a new nanoscale electromechanical device -- a leaky memcapacitor -- and show that it may be useful for the hardware implementation of spiking neurons. The leaky memcapacitor is a movable-plate capacitor that becomes quite conductive when the plates come close to each other. The equivalent circuit of the leaky memcapacitor involves a memcapacitive and memristive system connected in parallel. In the leaky memcapacitor, the resistance and capacitance depend on the same internal state variable, which is the displacement of the movable plate. We have performed a comprehensive analysis showing that several spiking types observed in biological neurons can be implemented with the leaky memcapacitor. Significant attention is paid to the dynamic properties of the model. As in leaky memcapacitors the capacitive and leaking resistive functionalities are implemented naturally within the same device structure, their use will simplify the creation of spiking neural networks.

Gradient-Based Distributed Controller Design Over Directed Networks

  • Authors: Yuto Watanabe, Kazunori Sakurama, Hyo-Sung Ahn
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10921
  • Pdf link: https://arxiv.org/pdf/2304.10921
  • Abstract
    In this study, we propose a design methodology of distributed controllers for multi-agent systems on a class of directed interaction networks by extending the gradient-flow method. Although the gradient-flow method is a common design tool for distributed controllers, it is inapplicable to directed networks. First, we demonstrate how to construct a distributed controller for systems over a class of time-invariant directed graphs. Subsequently, we achieve better convergence properties and performance enhancement than the conventional gradient-flow method. To illustrate its application in time-varying networks, we address the dynamic matching problem of two distinct groups of agents with different sensing ranges. This problem is a novel coordination task that involves pairing agents from two distinct groups to achieve a convergence of the paired agents' states to the same value. Accordingly, we apply the proposed method to this problem and provide sufficient conditions for successful matching. Lastly, numerical examples for systems on both time-invariant and time-varying networks demonstrate the effectiveness of the proposed method.

Real-Time Implementation of Dynamic State Estimation for Microgrid Load Bus Protection

  • Authors: Sarbajit Basu, Arthur K. Barnes, Adam Mate, Olga Lavrova
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10936
  • Pdf link: https://arxiv.org/pdf/2304.10936
  • Abstract
    Inverter-interfaced microgrids, owing to the lack of fault current, cannot be protected using traditional over-current protections, while admittance or differential relaying protection schemes are not practical to implement. Dynamic state estimation can track and predict power system transients and has been extensively investigated for setting-less protection. A novel real-time application of dynamic state estimation for protection is proposed in this paper, wherein parameter estimation and parallel processing are used to identify the state of the system. The implementation scheme has low process complexity and employs a data acquisition device and estimator that run on a general-purpose computer. This proposed implementation extends the state of the art, under short-circuit conditions, to a real-time implementation with a lumped-load radial microgrid and a grid-forming inverter with current-limiting behavior.

Gradient Derivation for Learnable Parameters in Graph Attention Networks

  • Authors: Marion Neumeier, Andreas Tollkühn, Sebastian Dorn, Michael Botsch, Wolfgang Utschick
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10939
  • Pdf link: https://arxiv.org/pdf/2304.10939
  • Abstract
    This work provides a comprehensive derivation of the parameter gradients for GATv2 [4], a widely used implementation of Graph Attention Networks (GATs). GATs have proven to be powerful frameworks for processing graph-structured data and, hence, have been used in a range of applications. However, the performance achieved by these attempts has been found to be inconsistent across different datasets, and the reasons for this remain an open research question. As the gradient flow provides valuable insights into the training dynamics of statistical learning models, this work obtains the gradients for the trainable model parameters of GATv2. The gradient derivations supplement the efforts of [2], where potential pitfalls of GATv2 are investigated.
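
For reference, the GATv2 attention score whose gradients are being derived is $e_{ij} = a^\top \mathrm{LeakyReLU}(W [h_i \| h_j])$. A dense NumPy sketch of the forward computation (real implementations restrict $j$ to each node's neighborhood and apply a softmax over the scores):

```python
import numpy as np

def gatv2_scores(h, W, a, alpha=0.2):
    """Dense GATv2 scoring e_ij = a^T LeakyReLU(W [h_i || h_j]) for all node
    pairs. Shapes: h (n, d_in), W (d_out, 2*d_in), a (d_out,)."""
    n = h.shape[0]
    pairs = np.concatenate([np.repeat(h, n, axis=0), np.tile(h, (n, 1))], axis=1)
    z = pairs @ W.T                        # (n*n, d_out)
    z = np.where(z > 0, z, alpha * z)      # LeakyReLU
    return (z @ a).reshape(n, n)           # e[i, j]
```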

RGB-D Inertial Odometry for a Resource-Restricted Robot in Dynamic Environments

  • Authors: Jianheng Liu, Xuanfu Li, Yueqian Liu, Haoyao Chen
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.10987
  • Pdf link: https://arxiv.org/pdf/2304.10987
  • Abstract
    Current simultaneous localization and mapping (SLAM) algorithms perform well in static environments but easily fail in dynamic environments. Recent works introduce deep learning-based semantic information to SLAM systems to reduce the influence of dynamic objects. However, it is still challenging to apply a robust localization in dynamic environments for resource-restricted robots. This paper proposes a real-time RGB-D inertial odometry system for resource-restricted robots in dynamic environments named Dynamic-VINS. Three main threads run in parallel: object detection, feature tracking, and state optimization. The proposed Dynamic-VINS combines object detection and depth information for dynamic feature recognition and achieves performance comparable to semantic segmentation. Dynamic-VINS adopts grid-based feature detection and proposes a fast and efficient method to extract high-quality FAST feature points. IMU is applied to predict motion for feature tracking and moving consistency check. The proposed method is evaluated on both public datasets and real-world applications and shows competitive localization accuracy and robustness in dynamic environments. To the best of our knowledge, it is currently the best-performing real-time RGB-D inertial odometry for resource-restricted platforms in dynamic environments. The proposed system is open source at: https://github.com/HITSZ-NRSL/Dynamic-VINS.git

Multi-level decision framework collision avoidance algorithm in emergency scenarios

  • Authors: Guoying Chen, Xinyu Wang, Min Hua, Wei Liu
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.11013
  • Pdf link: https://arxiv.org/pdf/2304.11013
  • Abstract
    With the rapid development of autonomous driving, the attention of academia has increasingly focused on the development of anti-collision systems in emergency scenarios, which have a crucial impact on driving safety. While numerous anti-collision strategies have emerged in recent years, most of them only consider steering or braking. The dynamic and complex nature of the driving environment presents a challenge to developing robust collision avoidance algorithms in emergency scenarios. To address the complex, dynamic obstacle scene and improve lateral maneuverability, this paper establishes a multi-level decision-making obstacle avoidance framework that employs the safe distance model and integrates emergency steering and emergency braking to complete the obstacle avoidance process. This approach helps avoid the high-risk situation of vehicle instability that can result from the separation of steering and braking actions. In the emergency steering algorithm, we define the collision hazard moment and propose a multi-constraint dynamic collision avoidance planning method that considers the driving area. Simulation results demonstrate that the decision-making collision avoidance logic can be applied to dynamic collision avoidance scenarios in complex traffic situations, effectively completing the obstacle avoidance task in emergency scenarios and improving the safety of autonomous driving.

Robot-Enabled Construction Assembly with Automated Sequence Planning based on ChatGPT: RoboGPT

  • Authors: Hengxu You, Yang Ye, Tianyu Zhou, Qi Zhu, Jing Du
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.11018
  • Pdf link: https://arxiv.org/pdf/2304.11018
  • Abstract
    Robot-based assembly in construction has emerged as a promising solution to address numerous challenges such as increasing costs, labor shortages, and the demand for safe and efficient construction processes. One of the main obstacles in realizing the full potential of these robotic systems is the need for effective and efficient sequence planning for construction tasks. Current approaches, including mathematical and heuristic techniques or machine learning methods, face limitations in their adaptability and scalability to dynamic construction environments. To expand the ability of the current robot system in sequential understanding, this paper introduces RoboGPT, a novel system that leverages the advanced reasoning capabilities of ChatGPT, a large language model, for automated sequence planning in robot-based assembly applied to construction tasks. The proposed system adapts ChatGPT for construction sequence planning and demonstrates its feasibility and effectiveness through experimental evaluation, including two case studies and 80 trials involving real construction tasks. The results show that RoboGPT-driven robots can handle complex construction operations and adapt to changes on the fly. This paper contributes to the ongoing efforts to enhance the capabilities and performance of robot-based assembly systems in the construction industry, and it paves the way for further integration of large language model technologies in the field of construction robotics.

A Multi-Fidelity Bayesian Approach to Safe Controller Design

  • Authors: Ethan Lau, Vaibhav Srivastava, Shaunak Bopardikar
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.11023
  • Pdf link: https://arxiv.org/pdf/2304.11023
  • Abstract
    Safely controlling unknown dynamical systems is one of the biggest challenges in the field of control. Oftentimes, an approximate model of a system's dynamics exists which provides beneficial information for the selection of controls. However, differences between the approximate and true systems present challenges as well as safety concerns. We propose an algorithm called SAFE-SLOPE to safely evaluate points from a Gaussian process model of a function when its Lipschitz constant is unknown. We establish theoretical guarantees for the performance of SAFE-SLOPE and quantify how multi-fidelity modeling improves the algorithm's performance. Finally, we demonstrate how SAFE-SLOPE achieves lower cumulative regret than a naive sampling method by applying it to find the control gains of a linear time-invariant system.

Analog Feedback-Controlled Memristor programming Circuit for analog Content Addressable Memory

  • Authors: Jiaao Yu, Paul-Philipp Manea, Sara Ameli, Mohammad Hizzani, Amro Eldebiky, John Paul Strachan
  • Subjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.11030
  • Pdf link: https://arxiv.org/pdf/2304.11030
  • Abstract
    Recent breakthroughs in associative memories suggest that silicon memories are coming closer to human memories, especially for memristive Content Addressable Memories (CAMs), which are capable of reading and writing analog values. However, the Program-Verify algorithm, the state-of-the-art memristor programming algorithm, requires frequent switching between verifying and programming memristor conductance, which brings drawbacks such as high dynamic power and long programming time. Here, we propose an analog feedback-controlled memristor programming circuit that makes use of a novel look-up table-based (LUT-based) programming algorithm. With the proposed algorithm, the programming and the verification of a memristor can be performed in a single-direction sequential process. Besides, we also integrated a single proposed programming circuit with eight analog CAM (aCAM) cells to build an aCAM array. We present SPICE simulations on TSMC 28nm process. The theoretical analysis shows that 1. A memristor conductance within an aCAM cell can be converted to an output boundary voltage in aCAM searching operations and 2. An output boundary voltage in aCAM searching operations can be converted to a programming data line voltage in aCAM programming operations. The simulation results of the proposed programming circuit prove the theoretical analysis and thus verify the feasibility of programming memristors without frequently switching between verifying and programming the conductance. Besides, the simulation results of the proposed aCAM array show that the proposed programming circuit can be integrated into a large array architecture.

Backpropagation-free Training of Deep Physical Neural Networks

  • Authors: Ali Momeni, Babak Rahmani, Matthieu Mallejac, Philipp Del Hougne, Romain Fleury
  • Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Applied Physics (physics.app-ph); Optics (physics.optics)
  • Arxiv link: https://arxiv.org/abs/2304.11042
  • Pdf link: https://arxiv.org/pdf/2304.11042
  • Abstract
    Recent years have witnessed the outstanding success of deep learning in various fields such as vision and natural language processing. This success is largely attributable to the massive size of deep learning models, which is expected to increase unceasingly. This growth of the deep learning models is accompanied by issues related to their considerable energy consumption, both during the training and inference phases, as well as their scalability. Although a number of works based on unconventional physical systems have been proposed to address the issue of energy efficiency in the inference phase, efficient training of deep learning models remains unaddressed. So far, training of digital deep learning models mainly relies on backpropagation, which is not suitable for physical implementation as it requires perfect knowledge of the computation performed in the so-called forward pass of the neural network. Here, we tackle this issue by proposing a simple deep neural network architecture augmented by a biologically plausible learning algorithm, referred to as "model-free forward-forward training". The proposed architecture enables training deep physical neural networks consisting of layers of physical nonlinear systems, without requiring detailed knowledge of the nonlinear physical layers' properties. We show that our method outperforms state-of-the-art hardware-aware training methods by improving training speed, decreasing digital computations, and reducing power consumption in physical systems. We demonstrate the adaptability of the proposed method, even in systems exposed to dynamic or unpredictable external perturbations. To showcase the universality of our approach, we train diverse wave-based physical neural networks that vary in the underlying wave phenomenon and the type of non-linearity they use, to perform vowel and image classification tasks experimentally.
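
The forward-forward family of methods replaces backpropagation with local, layer-wise objectives on a per-layer "goodness" score. The sketch below conveys only that local-objective idea, in the spirit of Hinton's forward-forward algorithm; the paper's model-free variant for physical layers necessarily differs, and the threshold `theta` is an illustrative choice:

```python
import numpy as np

def goodness(activations):
    """Layer 'goodness': sum of squared activations, as in forward-forward."""
    return np.sum(activations ** 2, axis=-1)

def ff_layer_loss(pos_act, neg_act, theta=2.0):
    """Local per-layer objective: push goodness above a threshold for
    positive (real) data and below it for negative data, with no backward
    pass through other layers."""
    pos = np.log1p(np.exp(-(goodness(pos_act) - theta)))   # -log sigmoid(g - theta)
    neg = np.log1p(np.exp(goodness(neg_act) - theta))      # -log sigmoid(theta - g)
    return np.mean(pos) + np.mean(neg)
```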

An Unbiased Transformer Source Code Learning with Semantic Vulnerability Graph

  • Authors: Nafis Tanveer Islam, Gonzalo De La Torre Parra, Dylan Manuel, Elias Bou-Harb, Peyman Najafirad
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.11072
  • Pdf link: https://arxiv.org/pdf/2304.11072
  • Abstract
    Over the years, open-source software systems have become prey to threat actors. Even as open-source communities act quickly to patch the breach, code vulnerability screening should be an integral part of agile software development from the beginning. Unfortunately, current vulnerability screening techniques are ineffective at identifying novel vulnerabilities or providing developers with code vulnerability and classification. Furthermore, the datasets used for vulnerability learning often exhibit distribution shifts from the real-world testing distribution due to novel attack strategies deployed by adversaries and as a result, the machine learning model's performance may be hindered or biased. To address these issues, we propose a joint interpolated multitasked unbiased vulnerability classifier comprising a transformer "RoBERTa" and graph convolution neural network (GCN). We present a training process utilizing a semantic vulnerability graph (SVG) representation from source code, created by integrating edges from a sequential flow, control flow, and data flow, as well as a novel flow dubbed Poacher Flow (PF). Poacher flow edges reduce the gap between dynamic and static program analysis and handle complex long-range dependencies. Moreover, our approach reduces biases of classifiers regarding unbalanced datasets by integrating the Focal Loss objective function along with SVG. Remarkably, experimental results show that our classifier outperforms state-of-the-art results on vulnerability detection with fewer false negatives and false positives. After testing our model across multiple datasets, it shows an improvement of at least 2.41% and 18.75% in the best-case scenario. Evaluations using N-day program samples demonstrate that our proposed approach achieves 93% accuracy and was able to detect 4 zero-day vulnerabilities from popular GitHub repositories.
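
The Focal Loss objective mentioned here is the standard imbalance-aware loss of Lin et al.; for reference, a common binary form (how the paper combines it with its semantic vulnerability graph and multitask setup is not reproduced):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights easy, well-classified examples so
    training focuses on the rare class. `targets` are 0/1 floats."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)             # prob. of true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```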

Generative AI-enabled Vehicular Networks: Fundamentals, Framework, and Case Study

  • Authors: Ruichen Zhang, Ke Xiong, Hongyang Du, Dusit Niyato, Jiawen Kang, Xuemin Shen, H. Vincent Poor
  • Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2304.11098
  • Pdf link: https://arxiv.org/pdf/2304.11098
  • Abstract
    Recognizing the tremendous improvements that the integration of generative AI can bring to intelligent transportation systems, this article explores the integration of generative AI technologies in vehicular networks, focusing on their potential applications and challenges. Generative AI, with its capabilities of generating realistic data and facilitating advanced decision-making processes, enhances various applications when combined with vehicular networks, such as navigation optimization, traffic prediction, data generation, and evaluation. Despite these promising applications, the integration of generative AI with vehicular networks faces several challenges, such as real-time data processing and decision-making, adapting to dynamic and unpredictable environments, as well as privacy and security concerns. To address these challenges, we propose a multi-modality semantic-aware framework to enhance the service quality of generative AI. By leveraging multi-modal and semantic communication technologies, the framework enables the use of text and image data for creating multi-modal content, providing more reliable guidance to receiving vehicles and ultimately improving system usability and efficiency. To further improve the reliability and efficiency of information transmission and reconstruction within the framework, taking generative AI-enabled vehicle-to-vehicle (V2V) as a case study, a deep reinforcement learning (DRL)-based approach is proposed for resource allocation. Finally, we discuss potential research directions and anticipated advancements in the field of generative AI-enabled vehicular networks.

Approximate Shielding of Atari Agents for Safe Exploration

  • Authors: Alexander W. Goodall, Francesco Belardinelli
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.11104
  • Pdf link: https://arxiv.org/pdf/2304.11104
  • Abstract
    Balancing exploration and conservatism in the constrained setting is an important problem if we are to use reinforcement learning for meaningful tasks in the real world. In this paper, we propose a principled algorithm for safe exploration based on the concept of shielding. Previous approaches to shielding assume access to a safety-relevant abstraction of the environment or a high-fidelity simulator. Instead, our work is based on latent shielding - another approach that leverages world models to verify policy roll-outs in the latent space of a learned dynamics model. Our novel algorithm builds on this previous work, using safety critics and other additional features to improve the stability and farsightedness of the algorithm. We demonstrate the effectiveness of our approach by running experiments on a small set of Atari games with state-dependent safety labels. We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations, and in some cases improves the speed of convergence and quality of the final agent.
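
To make the shielding idea concrete: before executing an action, candidates are rolled through the learned world model, and any action whose estimated probability of a safety violation is too high gets vetoed. The sketch below is a toy illustration of this check; world_model, safety_critic, and all thresholds are hypothetical stand-ins, not the paper's components.

```python
# Toy illustration of approximate shielding: roll candidate actions through a
# learned world model and veto those whose estimated violation probability is
# too high. world_model and safety_critic are hypothetical stubs.
import numpy as np

rng = np.random.default_rng(0)

def world_model(state, action):        # stub learned dynamics model
    return state + 0.1 * action + 0.01 * rng.normal(size=state.shape)

def safety_critic(state):              # stub estimate of P(violation | state)
    return float(np.clip(np.abs(state).mean() - 0.5, 0.0, 1.0))

def shielded_action(state, candidates, horizon=5, n_rollouts=8, delta=0.1):
    """Return the first candidate whose sampled rollouts stay likely-safe."""
    for action in candidates:
        violations = 0
        for _ in range(n_rollouts):
            s = state.copy()
            for _ in range(horizon):
                s = world_model(s, action)
            violations += safety_critic(s) > 0.5
        if violations / n_rollouts <= delta:  # approximate safety check
            return action
    return candidates[-1]                     # fall back to a default action

state = np.zeros(4)
print(shielded_action(state, [np.ones(4), np.zeros(4)]))
```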

New submissions for Tue, 21 Mar 23

Keyword: pruning

FedRight: An Effective Model Copyright Protection for Federated Learning

  • Authors: Jinyin Chen, Mingjun Li, Haibin Zheng
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.10399
  • Pdf link: https://arxiv.org/pdf/2303.10399
  • Abstract
    Federated learning (FL), an effective distributed machine learning framework, implements model training while protecting local data privacy. It has been applied to a broad variety of practical areas due to its great performance and appreciable benefits. However, who owns the model, and how to protect its copyright, has become a real problem. Intuitively, the existing property rights protection methods in centralized scenarios (e.g., watermark embedding and model fingerprints) are possible solutions for FL. But they are still challenged by the distributed nature of FL in aspects such as the absence of data sharing, parameter aggregation, and federated training settings. For the first time, we formalize the problem of copyright protection for FL, and propose FedRight to protect model copyright based on model fingerprints, i.e., extracting model features by generating adversarial examples as model fingerprints. FedRight outperforms previous works in four key aspects: (i) Validity: it extracts model features to generate transferable fingerprints to train a detector to verify the copyright of the model. (ii) Fidelity: it has an imperceptible impact on the federated training, thus promising good main-task performance. (iii) Robustness: it is empirically robust against malicious attacks on copyright protection, i.e., fine-tuning, model pruning, and adaptive attacks. (iv) Black-box: it is valid in the black-box forensic scenario where only application programming interface calls to the model are available. Extensive evaluations across 3 datasets and 9 model structures demonstrate FedRight's superior fidelity, validity, and robustness.
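
The fingerprinting mechanism described above, generating adversarial examples whose responses characterize a model, can be sketched with a basic FGSM perturbation. Everything below is a hedged illustration of that general idea, not FedRight's actual procedure.

```python
# Hedged FGSM-style illustration of adversarial-example fingerprinting: the
# model's responses on crafted probes characterize its decision boundary.
# This shows the general idea only, not FedRight's actual procedure.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fingerprint(model, x, y, eps=0.03):
    """Perturb inputs along the loss gradient; return probes + responses."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = (x + eps * x.grad.sign()).detach()
    return x_adv, model(x_adv).argmax(dim=1)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # toy classifier
x, y = torch.rand(16, 1, 28, 28), torch.randint(0, 10, (16,))
probes, responses = fingerprint(model, x, y)
# Verification idea: a suspect model is flagged if it reproduces `responses`
# on `probes` far more often than an independently trained model would.
```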

ExplainFix: Explainable Spatially Fixed Deep Networks

  • Authors: Alex Gaudio, Christos Faloutsos, Asim Smailagic, Pedro Costa, Aurelio Campilho
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.10408
  • Pdf link: https://arxiv.org/pdf/2303.10408
  • Abstract
    Is there an initialization for deep networks that requires no learning? ExplainFix adopts two design principles: the "fixed filters" principle that all spatial filter weights of convolutional neural networks can be fixed at initialization and never learned, and the "nimbleness" principle that only a few network parameters suffice. We contribute (a) visual model-based explanations, (b) speed and accuracy gains, and (c) novel tools for deep convolutional neural networks. ExplainFix gives key insights that spatially fixed networks should have a steered initialization, that spatial convolution layers tend to prioritize low frequencies, and that most network parameters are not necessary in spatially fixed models. ExplainFix models have up to 100x fewer spatial filter kernels than fully learned models, with matching or improved accuracy. Our extensive empirical analysis confirms that ExplainFix guarantees nimbler models (training up to 17% faster with channel pruning), matching or improved predictive performance (spanning 13 distinct baseline models, four architectures and two medical image datasets), improved robustness to larger learning rates, and robustness to varying model size. We are the first to demonstrate that all spatial filters in state-of-the-art convolutional deep networks can be fixed at initialization, not learned.
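
The "fixed filters" principle is straightforward to express in code: freeze every spatial convolution kernel at initialization so only the remaining parameters learn. The sketch below uses a plain random initialization where the paper prescribes a steered one.

```python
# Sketch of the "fixed filters" principle: spatial convolution kernels are
# frozen at initialization, while pointwise (1x1) and other parameters keep
# learning. A plain random init stands in for the paper's steered init.
import torch.nn as nn

def fix_spatial_filters(model):
    for m in model.modules():
        if isinstance(m, nn.Conv2d) and m.kernel_size != (1, 1):
            m.weight.requires_grad_(False)  # spatial kernels: never learned
            if m.bias is not None:
                m.bias.requires_grad_(False)

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),  # spatial: frozen
    nn.ReLU(),
    nn.Conv2d(16, 32, 1),            # pointwise: stays trainable
)
fix_spatial_filters(model)
n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(n_trainable, "trainable parameters")
```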

Induced Feature Selection by Structured Pruning

  • Authors: Nathan Hubens, Victor Delvigne, Matei Mancas, Bernard Gosselin, Marius Preda, Titus Zaharia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.10999
  • Pdf link: https://arxiv.org/pdf/2303.10999
  • Abstract
    The advent of sparsity-inducing techniques in neural networks has been of great help in the last few years. Indeed, those methods have made it possible to find lighter and faster networks, able to perform more efficiently in resource-constrained environments such as mobile devices or heavily loaded servers. Such sparsity is generally imposed on the weights of neural networks, reducing the footprint of the architecture. In this work, we go one step further by imposing sparsity jointly on the weights and on the input data. This can be achieved following a three-step process: 1) impose a certain structured sparsity on the weights of the network; 2) trace back the input features corresponding to zeroed blocks of weights; 3) remove the useless weights and input features and retrain the network. Performing pruning both on the network and on the input data not only allows for extreme reduction in terms of parameters and operations but can also serve as an interpretation process. Indeed, with the help of data pruning, we now have information about which input features are useful for the network to keep its performance. Experiments conducted on a variety of architectures and datasets, namely an MLP validated on MNIST and CIFAR10/100 and ConvNets (VGG16 and ResNet18) validated on CIFAR10/100 and CALTECH101 respectively, show that it is possible to achieve additional gains in terms of total parameters and FLOPs by performing pruning on the input data, while also increasing accuracy.
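
Step 2 of the three-step process above, tracing input features back from zeroed weight blocks, reduces to inspecting which first-layer weight columns became all-zero. A toy sketch follows, with the pruning pattern hard-coded purely for illustration.

```python
# Toy sketch of steps 2-3 of the process above: after structured pruning,
# an input feature whose entire first-layer weight column is zero can be
# dropped, and a smaller layer rebuilt over the survivors. The pruning
# pattern is hard-coded here purely for illustration.
import torch
import torch.nn as nn

mlp = nn.Linear(10, 32)
with torch.no_grad():                  # pretend structured pruning zeroed
    mlp.weight[:, [2, 5, 7]] = 0.0     # every weight touching features 2,5,7

col_norms = mlp.weight.abs().sum(dim=0)
kept = torch.nonzero(col_norms > 0).flatten()
print("features kept:", kept.tolist())  # all indices except 2, 5, 7

# Step 3: rebuild a smaller layer over the surviving features, then retrain.
small = nn.Linear(len(kept), 32)
with torch.no_grad():
    small.weight.copy_(mlp.weight[:, kept])
    small.bias.copy_(mlp.bias)
```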

Keyword: neural architecture search

There is no result

Keyword: 3d object detection

CAPE: Camera View Position Embedding for Multi-View 3D Object Detection

  • Authors: Kaixin Xiong, Shi Gong, Xiaoqing Ye, Xiao Tan, Ji Wan, Errui Ding, Jingdong Wang, Xiang Bai
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.10209
  • Pdf link: https://arxiv.org/pdf/2303.10209
  • Abstract
    In this paper, we address the problem of detecting 3D objects from multi-view images. Current query-based methods rely on global 3D position embeddings (PE) to learn the geometric correspondence between images and 3D space. We claim that directly interacting 2D image features with global 3D PE could increase the difficulty of learning view transformation due to the variation of camera extrinsics. Thus we propose a novel method based on CAmera view Position Embedding, called CAPE. We form the 3D position embeddings under the local camera-view coordinate system instead of the global coordinate system, such that the 3D position embedding is free of encoding camera extrinsic parameters. Furthermore, we extend our CAPE to temporal modeling by exploiting the object queries of previous frames and encoding the ego-motion for boosting 3D object detection. CAPE achieves state-of-the-art performance (61.0% NDS and 52.5% mAP) among all LiDAR-free methods on the nuScenes dataset. Codes and models are available at https://github.com/PaddlePaddle/Paddle3D (Paddle3D) and https://github.com/kaixinbear/CAPE (PyTorch implementation).

Vehicle-Infrastructure Cooperative 3D Object Detection via Feature Flow Prediction

  • Authors: Haibao Yu, Yingjuan Tang, Enze Xie, Jilei Mao, Jirui Yuan, Ping Luo, Zaiqing Nie
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.10552
  • Pdf link: https://arxiv.org/pdf/2303.10552
  • Abstract
    Cooperatively utilizing both ego-vehicle and infrastructure sensor data can significantly enhance autonomous driving perception abilities. However, temporal asynchrony and limited wireless communication in traffic environments can lead to fusion misalignment and impact detection performance. This paper proposes Feature Flow Net (FFNet), a novel cooperative detection framework that uses a feature flow prediction module to address these issues in vehicle-infrastructure cooperative 3D object detection. Rather than transmitting feature maps extracted from still images, FFNet transmits feature flow, which leverages the temporal coherence of sequential infrastructure frames to predict future features and compensate for asynchrony. Additionally, we introduce a self-supervised approach to enable FFNet to generate feature flow with feature prediction ability. Experimental results demonstrate that our proposed method outperforms existing cooperative detection methods while requiring no more than 1/10 of the transmission cost of raw data on the DAIR-V2X dataset when temporal asynchrony exceeds 200 ms. The code is available at https://github.com/haibao-yu/FFNet-VIC3D.
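
The core transmission idea, sending a feature map plus its predicted temporal rate of change so the receiver can extrapolate to its own timestamp, can be sketched with linear extrapolation. FFNet learns this prediction, so the code below is only a simplified illustration.

```python
# Simplified picture of the feature-flow idea: transmit a feature map plus
# its predicted per-cell rate of change, and let the receiver extrapolate to
# its own timestamp. FFNet learns this prediction; linear extrapolation here
# is only an illustration.
import torch

def predict_flow(feat_prev, feat_curr, dt):
    """Estimate per-cell feature change per second from two frames."""
    return (feat_curr - feat_prev) / dt

feat_t0 = torch.rand(64, 32, 32)   # infrastructure feature map at time t0
feat_t1 = torch.rand(64, 32, 32)   # infrastructure feature map at time t1
flow = predict_flow(feat_t0, feat_t1, dt=0.1)

async_delay = 0.2                  # e.g., 200 ms of temporal asynchrony
feat_compensated = feat_t1 + flow * async_delay   # fuse this, not stale data
```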

Long-Term Indoor Localization with Metric-Semantic Mapping using a Floor Plan Prior

  • Authors: Nicky Zimmerman, Matteo Sodano, Elias Marks, Jens Behley, Cyrill Stachniss
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.10959
  • Pdf link: https://arxiv.org/pdf/2303.10959
  • Abstract
    Object-based maps are relevant for scene understanding since they integrate geometric and semantic information of the environment, allowing autonomous robots to robustly localize and interact with objects. In this paper, we address the task of constructing a metric-semantic map for the purpose of long-term object-based localization. We exploit 3D object detections from monocular RGB frames both for the object-based map construction and for globally localizing in the constructed map. To tailor the approach to a target environment, we propose an efficient way of generating 3D annotations to finetune the 3D object detection model. We evaluate our map construction in an office building, and test our long-term localization approach on challenging sequences recorded in the same environment over nine months. The experiments suggest that our approach is suitable for constructing metric-semantic maps, and that our localization approach is robust to long-term changes. Both the mapping algorithm and the localization pipeline can run online on an onboard computer. We will release an open-source C++/ROS implementation of our approach.

VIMI: Vehicle-Infrastructure Multi-view Intermediate Fusion for Camera-based 3D Object Detection

  • Authors: Zhe Wang, Siqi Fan, Xiaoliang Huo, Tongda Xu, Yan Wang, Jingjing Liu, Yilun Chen, Ya-Qin Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.10975
  • Pdf link: https://arxiv.org/pdf/2303.10975
  • Abstract
    In autonomous driving, Vehicle-Infrastructure Cooperative 3D Object Detection (VIC3D) makes use of multi-view cameras from both vehicles and traffic infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint. Two major challenges prevail in VIC3D: 1) inherent calibration noise when fusing multi-view images, caused by time asynchrony across cameras; 2) information loss when projecting 2D features into 3D space. To address these issues, we propose a novel 3D object detection framework, Vehicles-Infrastructure Multi-view Intermediate fusion (VIMI). First, to fully exploit the holistic perspectives from both vehicles and infrastructure, we propose a Multi-scale Cross Attention (MCA) module that fuses infrastructure and vehicle features on selective multi-scales to correct the calibration noise introduced by camera asynchrony. Then, we design a Camera-aware Channel Masking (CCM) module that uses camera parameters as priors to augment the fused features. We further introduce a Feature Compression (FC) module with channel and spatial compression blocks to reduce the size of transmitted features for enhanced efficiency. Experiments show that VIMI achieves 15.61% overall AP_3D and 21.44% AP_BEV on the new VIC3D dataset, DAIR-V2X-C, significantly outperforming state-of-the-art early fusion and late fusion methods with comparable transmission cost.

Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving

  • Authors: Yinpeng Dong, Caixin Kang, Jinlai Zhang, Zijian Zhu, Yikai Wang, Xiao Yang, Hang Su, Xingxing Wei, Jun Zhu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2303.11040
  • Pdf link: https://arxiv.org/pdf/2303.11040
  • Abstract
    3D object detection is an important task in autonomous driving to perceive the surroundings. Despite the excellent performance, the existing 3D detectors lack robustness to real-world corruptions caused by adverse weather, sensor noise, etc., provoking concerns about the safety and reliability of autonomous driving systems. To comprehensively and rigorously benchmark the corruption robustness of 3D detectors, in this paper we design 27 types of common corruptions for both LiDAR and camera inputs considering real-world driving scenarios. By synthesizing these corruptions on public datasets, we establish three corruption robustness benchmarks -- KITTI-C, nuScenes-C, and Waymo-C. Then, we conduct large-scale experiments on 24 diverse 3D object detection models to evaluate their corruption robustness. Based on the evaluation results, we draw several important findings, including: 1) motion-level corruptions are the most threatening ones, leading to significant performance drops for all models; 2) LiDAR-camera fusion models demonstrate better robustness; 3) camera-only models are extremely vulnerable to image corruptions, showing the indispensability of LiDAR point clouds. We release the benchmarks and codes at https://github.com/kkkcx/3D_Corruptions_AD. We hope that our benchmarks and findings can provide insights for future research on developing robust 3D object detection models.

Augment and Criticize: Exploring Informative Samples for Semi-Supervised Monocular 3D Object Detection

  • Authors: Zhenyu Li, Zhipeng Zhang, Heng Fan, Yuan He, Ke Wang, Xianming Liu, Junjun Jiang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.11243
  • Pdf link: https://arxiv.org/pdf/2303.11243
  • Abstract
    In this paper, we improve the challenging monocular 3D object detection problem with a general semi-supervised framework. Specifically, having observed that the bottleneck of this task lies in lacking reliable and informative samples to train the detector, we introduce a novel, simple, yet effective 'Augment and Criticize' framework that explores abundant informative samples from unlabeled data for learning more robust detection models. In the 'Augment' stage, we present the Augmentation-based Prediction aGgregation (APG), which aggregates detections from various automatically learned augmented views to improve the robustness of pseudo label generation. Since not all pseudo labels from APG are beneficially informative, the subsequent 'Criticize' phase is presented. In particular, we introduce the Critical Retraining Strategy (CRS) that, unlike simply filtering pseudo labels using a fixed threshold (e.g., classification score) as in 2D semi-supervised tasks, leverages a learnable network to evaluate the contribution of unlabeled images at different training timestamps. This way, the noisy samples prohibitive to model evolution can be effectively suppressed. To validate our framework, we apply it to MonoDLE and MonoFlex. The two new detectors, dubbed 3DSeMo_DLE and 3DSeMo_FLEX, achieve state-of-the-art results with remarkable improvements of over 3.5% AP_3D/BEV (Easy) on KITTI, showing its effectiveness and generality. Code and models will be released.

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

  • Authors: Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.11301
  • Pdf link: https://arxiv.org/pdf/2303.11301
  • Abstract
    3D object detectors usually rely on hand-crafted proxies, e.g., anchors or centers, and translate well-studied 2D frameworks to 3D. Thus, sparse voxel features need to be densified and processed by dense prediction heads, which inevitably costs extra computation. In this paper, we instead propose VoxelNeXt for fully sparse 3D object detection. Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies. Our strong sparse convolutional network VoxelNeXt detects and tracks 3D objects through voxel features entirely. It is an elegant and efficient framework, with no need for sparse-to-dense conversion or NMS post-processing. Our method achieves a better speed-accuracy trade-off than other mainstream detectors on the nuScenes dataset. For the first time, we show that a fully sparse voxel-based representation works decently for LIDAR 3D object detection and tracking. Extensive experiments on nuScenes, Waymo, and Argoverse2 benchmarks validate the effectiveness of our approach. Without bells and whistles, our model outperforms all existing LIDAR methods on the nuScenes tracking test benchmark.

Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding

  • Authors: Jihao Liu, Tai Wang, Boxiao Liu, Qihang Zhang, Yu Liu, Hongsheng Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.11325
  • Pdf link: https://arxiv.org/pdf/2303.11325
  • Abstract
    Multi-view camera-based 3D detection is a challenging problem in computer vision. Recent works leverage a pretrained LiDAR detection model to transfer knowledge to a camera-based student network. However, we argue that there is a major domain gap between the LiDAR BEV features and the camera-based BEV features, as they have different characteristics and are derived from different sources. In this paper, we propose Geometry Enhanced Masked Image Modeling (GeoMIM) to transfer the knowledge of the LiDAR model in a pretrain-finetune paradigm for improving the multi-view camera-based 3D detection. GeoMIM is a multi-camera vision transformer with Cross-View Attention (CVA) blocks that uses LiDAR BEV features encoded by the pretrained BEV model as learning targets. During pretraining, GeoMIM's decoder has a semantic branch that completes dense perspective-view features and a geometry branch that reconstructs dense perspective-view depth maps. The depth branch is designed to be camera-aware by inputting the camera's parameters for better transfer capability. Extensive results demonstrate that GeoMIM outperforms existing methods on the nuScenes benchmark, achieving state-of-the-art performance for camera-based 3D object detection and 3D segmentation.

Keyword: voxel

Boundary-aware Supervoxel-level Iteratively Refined Interactive 3D Image Segmentation with Multi-agent Reinforcement Learning

  • Authors: Chaofan Ma, Qisen Xu, Xiangfeng Wang, Bo Jin, Xiaoyun Zhang, Yanfeng Wang, Ya Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.10692
  • Pdf link: https://arxiv.org/pdf/2303.10692
  • Abstract
    Interactive segmentation has recently been explored to effectively and efficiently harvest high-quality segmentation masks by iteratively incorporating user hints. While iterative in nature, most existing interactive segmentation methods tend to ignore the dynamics of successive interactions and take each interaction independently. We here propose to model iterative interactive image segmentation with a Markov decision process (MDP) and solve it with reinforcement learning (RL), where each voxel is treated as an agent. Considering the large exploration space for voxel-wise prediction and the dependence among neighboring voxels for the segmentation tasks, multi-agent reinforcement learning is adopted, where the voxel-level policy is shared among agents. Considering that boundary voxels are more important for segmentation, we further introduce a boundary-aware reward, which consists of a global reward in the form of relative cross-entropy gain, to update the policy in a constrained direction, and a boundary reward in the form of relative weight, to emphasize the correctness of boundary predictions. To combine the advantages of different types of interactions, i.e., simple and efficient for point-clicking, and stable and robust for scribbles, we propose a supervoxel-clicking based interaction design. Experimental results on four benchmark datasets have shown that the proposed method significantly outperforms the state-of-the-art methods, with the advantage of fewer interactions, higher accuracy, and enhanced robustness.

NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping

  • Authors: Junyuan Deng, Xieyuanli Chen, Songpengcheng Xia, Zhen Sun, Guoqing Liu, Wenxian Yu, Ling Pei
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.10709
  • Pdf link: https://arxiv.org/pdf/2303.10709
  • Abstract
    Simultaneous odometry and mapping using LiDAR data is an important task for mobile systems to achieve full autonomy in large-scale environments. However, most existing LiDAR-based methods prioritize tracking quality over reconstruction quality. Although the recently developed neural radiance fields (NeRF) have shown promising advances in implicit reconstruction for indoor environments, the problem of simultaneous odometry and mapping for large-scale scenarios using incremental LiDAR data remains unexplored. To bridge this gap, in this paper, we propose a novel NeRF-based LiDAR odometry and mapping approach, NeRF-LOAM, consisting of three modules: neural odometry, neural mapping, and mesh reconstruction. All these modules utilize our proposed neural signed distance function, which separates LiDAR points into ground and non-ground points to reduce Z-axis drift, optimizes odometry and voxel embeddings concurrently, and in the end generates dense, smooth mesh maps of the environment. Moreover, this joint optimization requires no pre-training and allows our NeRF-LOAM to exhibit strong generalization abilities when applied to different environments. Extensive evaluations on three publicly available datasets demonstrate that our approach achieves state-of-the-art odometry and mapping performance, as well as strong generalization in large-scale environments utilizing LiDAR data. Furthermore, we perform multiple ablation studies to validate the effectiveness of our network design. The implementation of our approach will be made available at https://github.com/JunyuanDeng/NeRF-LOAM.

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

  • Authors: Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.11301
  • Pdf link: https://arxiv.org/pdf/2303.11301
  • Abstract
    3D object detectors usually rely on hand-crafted proxies, e.g., anchors or centers, and translate well-studied 2D frameworks to 3D. Thus, sparse voxel features need to be densified and processed by dense prediction heads, which inevitably costs extra computation. In this paper, we instead propose VoxelNeXt for fully sparse 3D object detection. Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies. Our strong sparse convolutional network VoxelNeXt detects and tracks 3D objects through voxel features entirely. It is an elegant and efficient framework, with no need for sparse-to-dense conversion or NMS post-processing. Our method achieves a better speed-accuracy trade-off than other mainstream detectors on the nuScenes dataset. For the first time, we show that a fully sparse voxel-based representation works decently for LIDAR 3D object detection and tracking. Extensive experiments on nuScenes, Waymo, and Argoverse2 benchmarks validate the effectiveness of our approach. Without bells and whistles, our model outperforms all existing LIDAR methods on the nuScenes tracking test benchmark.

Keyword: lidar

CAPE: Camera View Position Embedding for Multi-View 3D Object Detection

  • Authors: Kaixin Xiong, Shi Gong, Xiaoqing Ye, Xiao Tan, Ji Wan, Errui Ding, Jingdong Wang, Xiang Bai
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.10209
  • Pdf link: https://arxiv.org/pdf/2303.10209
  • Abstract
    In this paper, we address the problem of detecting 3D objects from multi-view images. Current query-based methods rely on global 3D position embeddings (PE) to learn the geometric correspondence between images and 3D space. We claim that directly interacting 2D image features with global 3D PE could increase the difficulty of learning view transformation due to the variation of camera extrinsics. Thus we propose a novel method based on CAmera view Position Embedding, called CAPE. We form the 3D position embeddings under the local camera-view coordinate system instead of the global coordinate system, such that the 3D position embedding is free of encoding camera extrinsic parameters. Furthermore, we extend our CAPE to temporal modeling by exploiting the object queries of previous frames and encoding the ego-motion for boosting 3D object detection. CAPE achieves state-of-the-art performance (61.0% NDS and 52.5% mAP) among all LiDAR-free methods on the nuScenes dataset. Codes and models are available at https://github.com/PaddlePaddle/Paddle3D (Paddle3D) and https://github.com/kaixinbear/CAPE (PyTorch implementation).

Revisiting LiDAR Spoofing Attack Capabilities against Object Detection: Improvements, Measurement, and New Attack

  • Authors: Takami Sato, Yuki Hayakawa, Ryo Suzuki, Yohsuke Shiiki, Kentaro Yoshioka, Qi Alfred Chen
  • Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.10555
  • Pdf link: https://arxiv.org/pdf/2303.10555
  • Abstract
    LiDAR (Light Detection And Ranging) is an indispensable sensor for precise long- and wide-range 3D sensing, which directly benefited the recent rapid deployment of autonomous driving (AD). Meanwhile, such a safety-critical application strongly motivates its security research. A recent line of research demonstrates that one can manipulate the LiDAR point cloud and fool object detection by firing malicious lasers against LiDAR. However, these efforts face 3 critical research gaps: (1) evaluating only on a specific LiDAR (VLP-16); (2) assuming unvalidated attack capabilities; and (3) evaluating with models trained on limited datasets. To fill these critical research gaps, we conduct the first large-scale measurement study on LiDAR spoofing attack capabilities on object detectors with 9 popular LiDARs in total and 3 major types of object detectors. To perform this measurement, we significantly improved the LiDAR spoofing capability with more careful optics and functional electronics, which allows us to be the first to clearly demonstrate and quantify key attack capabilities assumed in prior works. However, we further find that such key assumptions actually can no longer hold for all the other (8 out of 9) LiDARs that are more recent than VLP-16 due to various recent LiDAR features. To this end, we further identify a new type of LiDAR spoofing attack that can improve on this and be applicable to a much more general and recent set of LiDARs. We find that its attack capability is enough to (1) cause end-to-end safety hazards in simulated AD scenarios, and (2) remove real vehicles in the physical world. We also discuss the defense side.

NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping

  • Authors: Junyuan Deng, Xieyuanli Chen, Songpengcheng Xia, Zhen Sun, Guoqing Liu, Wenxian Yu, Ling Pei
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.10709
  • Pdf link: https://arxiv.org/pdf/2303.10709
  • Abstract
    Simultaneous odometry and mapping using LiDAR data is an important task for mobile systems to achieve full autonomy in large-scale environments. However, most existing LiDAR-based methods prioritize tracking quality over reconstruction quality. Although the recently developed neural radiance fields (NeRF) have shown promising advances in implicit reconstruction for indoor environments, the problem of simultaneous odometry and mapping for large-scale scenarios using incremental LiDAR data remains unexplored. To bridge this gap, in this paper, we propose a novel NeRF-based LiDAR odometry and mapping approach, NeRF-LOAM, consisting of three modules: neural odometry, neural mapping, and mesh reconstruction. All these modules utilize our proposed neural signed distance function, which separates LiDAR points into ground and non-ground points to reduce Z-axis drift, optimizes odometry and voxel embeddings concurrently, and in the end generates dense, smooth mesh maps of the environment. Moreover, this joint optimization requires no pre-training and allows our NeRF-LOAM to exhibit strong generalization abilities when applied to different environments. Extensive evaluations on three publicly available datasets demonstrate that our approach achieves state-of-the-art odometry and mapping performance, as well as strong generalization in large-scale environments utilizing LiDAR data. Furthermore, we perform multiple ablation studies to validate the effectiveness of our network design. The implementation of our approach will be made available at https://github.com/JunyuanDeng/NeRF-LOAM.

A Target-Based Extrinsic Calibration Framework for Non-Overlapping Camera-Lidar Systems Using a Motion Capture System

  • Authors: Nicholas Charron, Steven L. Waslander, Sriram Narasimhan
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.10729
  • Pdf link: https://arxiv.org/pdf/2303.10729
  • Abstract
    In this work, we present a novel target-based lidar-camera extrinsic calibration methodology that can be used for non-overlapping field of view (FOV) sensors. Contrary to previous work, our methodology overcomes the non-overlapping FOV challenge using a motion capture system (MCS) instead of traditional simultaneous localization and mapping approaches. Due to the high relative precision of the MCS, our methodology can achieve both the high accuracy and repeatable calibrations of traditional target-based methods, regardless of the amount of overlap in the field of view of the sensors. We show using simulation that we can accurately recover extrinsic calibrations for a range of perturbations to the true calibration that would be expected in real circumstances. We also validate that high-accuracy calibrations can be achieved on experimental data. Furthermore, we implement the described approach in an extensible way that allows any camera model, target shape, or feature extraction methodology to be used within our framework. We validate this implementation on two target shapes: an easy-to-construct cylinder target and a diamond target with a checkerboard. The cylinder target results show that our methodology can be used for degenerate target shapes where target poses cannot be fully constrained from a single observation, and where distinct repeatable features need not be detected on the target.

Unsupervised Intrinsic Image Decomposition with LiDAR Intensity

  • Authors: Shogo Sato, Yasuhiro Yao, Taiga Yoshida, Takuhiro Kaneko, Shingo Ando, Jun Shimamura
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.10820
  • Pdf link: https://arxiv.org/pdf/2303.10820
  • Abstract
    Intrinsic image decomposition (IID) is the task of decomposing a natural image into albedo and shade. While IID is typically solved through supervised learning methods, this is not ideal due to the difficulty of observing ground truth albedo and shade in general scenes. Conversely, unsupervised learning methods currently underperform supervised learning methods since there are no criteria for solving the ill-posed problem. Recently, light detection and ranging (LiDAR) has been widely used due to its ability to make highly precise distance measurements. Thus, we have focused on the utilization of LiDAR, especially LiDAR intensity, to address this issue. In this paper, we propose unsupervised intrinsic image decomposition with LiDAR intensity (IID-LI). Since conventional unsupervised learning methods consist of image-to-image transformations, simply inputting LiDAR intensity is not an effective approach. Therefore, we design an intensity consistency loss that computes the error between LiDAR intensity and gray-scaled albedo to provide a criterion for the ill-posed problem. In addition, LiDAR intensity is difficult to handle due to its sparsity and occlusion; hence, a LiDAR intensity densification module is proposed. We verified the estimation quality using our own dataset, which includes RGB images, LiDAR intensity, and human-judged annotations. As a result, we achieved an estimation accuracy that outperforms conventional unsupervised learning methods.
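
The intensity consistency loss described above can be sketched as a masked error between gray-scaled albedo and projected LiDAR intensity. The L1 form, luminance weights, and masking below are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of an intensity consistency loss in the spirit of the abstract:
# penalize disagreement between gray-scaled albedo and LiDAR intensity, only
# where LiDAR returns exist. The L1 form and luminance weights are assumed.
import torch

def intensity_consistency_loss(albedo, lidar_intensity, valid_mask):
    # Standard luminance weights for gray-scaling an RGB albedo map.
    w = torch.tensor([0.299, 0.587, 0.114], device=albedo.device)
    gray_albedo = (albedo * w.view(1, 3, 1, 1)).sum(dim=1)
    err = (gray_albedo - lidar_intensity).abs() * valid_mask
    return err.sum() / valid_mask.sum().clamp(min=1)

albedo = torch.rand(2, 3, 64, 64)               # predicted albedo
intensity = torch.rand(2, 64, 64)               # projected LiDAR intensity
mask = (torch.rand(2, 64, 64) > 0.9).float()    # sparse LiDAR coverage
print(intensity_consistency_loss(albedo, intensity, mask))
```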

Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving

  • Authors: Yinpeng Dong, Caixin Kang, Jinlai Zhang, Zijian Zhu, Yikai Wang, Xiao Yang, Hang Su, Xingxing Wei, Jun Zhu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2303.11040
  • Pdf link: https://arxiv.org/pdf/2303.11040
  • Abstract
    3D object detection is an important task in autonomous driving to perceive the surroundings. Despite the excellent performance, the existing 3D detectors lack robustness to real-world corruptions caused by adverse weather, sensor noise, etc., provoking concerns about the safety and reliability of autonomous driving systems. To comprehensively and rigorously benchmark the corruption robustness of 3D detectors, in this paper we design 27 types of common corruptions for both LiDAR and camera inputs considering real-world driving scenarios. By synthesizing these corruptions on public datasets, we establish three corruption robustness benchmarks -- KITTI-C, nuScenes-C, and Waymo-C. Then, we conduct large-scale experiments on 24 diverse 3D object detection models to evaluate their corruption robustness. Based on the evaluation results, we draw several important findings, including: 1) motion-level corruptions are the most threatening ones, leading to significant performance drops for all models; 2) LiDAR-camera fusion models demonstrate better robustness; 3) camera-only models are extremely vulnerable to image corruptions, showing the indispensability of LiDAR point clouds. We release the benchmarks and codes at https://github.com/kkkcx/3D_Corruptions_AD. We hope that our benchmarks and findings can provide insights for future research on developing robust 3D object detection models.

Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation

  • Authors: Li Li, Hubert P. H. Shum, Toby P. Breckon
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.11203
  • Pdf link: https://arxiv.org/pdf/2303.11203
  • Abstract
    Whilst the availability of 3D LiDAR point cloud data has significantly grown in recent years, annotation remains expensive and time-consuming, leading to a demand for semi-supervised semantic segmentation methods in application domains such as autonomous driving. Existing work very often employs relatively large segmentation backbone networks to improve segmentation accuracy, at the expense of computational costs. In addition, many use uniform sampling to reduce the ground truth data requirements needed for learning, often resulting in sub-optimal performance. To address these issues, we propose a new pipeline that employs a smaller architecture, requiring fewer ground-truth annotations to achieve superior segmentation accuracy compared to contemporary approaches. This is facilitated via a novel Sparse Depthwise Separable Convolution module that significantly reduces the network parameter count while retaining overall task performance. To effectively sub-sample our training data, we propose a new Spatio-Temporal Redundant Frame Downsampling (ST-RFD) method that leverages knowledge of sensor motion within the environment to extract a more diverse subset of training data frame samples. To leverage the use of limited annotated data samples, we further propose a soft pseudo-label method informed by LiDAR reflectivity. Our method outperforms contemporary semi-supervised work in terms of mIoU, using less labeled data, on the SemanticKITTI (59.5@5%) and ScribbleKITTI (58.1@5%) benchmark datasets, based on a 2.3x reduction in model parameters and 641x fewer multiply-add operations whilst also demonstrating significant performance improvement on limited training data (i.e., Less is More).
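
The parameter savings behind depthwise separable convolution, the dense principle underlying the proposed Sparse Depthwise Separable Convolution module, can be verified directly; the sparse 3D variant itself is not reproduced here.

```python
# Parameter-count comparison behind depthwise separable convolution, the
# dense principle underlying the Sparse Depthwise Separable Convolution
# module above (the sparse 3D variant itself is not reproduced).
import torch.nn as nn

c_in, c_out, k = 64, 128, 3

standard = nn.Conv2d(c_in, c_out, k, padding=1)
separable = nn.Sequential(
    nn.Conv2d(c_in, c_in, k, padding=1, groups=c_in),  # depthwise: per-channel
    nn.Conv2d(c_in, c_out, 1),                         # pointwise: mix channels
)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print(n_params(standard), "vs", n_params(separable))   # 73856 vs 8960
```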

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

  • Authors: Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.11301
  • Pdf link: https://arxiv.org/pdf/2303.11301
  • Abstract
    3D object detectors usually rely on hand-crafted proxies, e.g., anchors or centers, and translate well-studied 2D frameworks to 3D. Thus, sparse voxel features need to be densified and processed by dense prediction heads, which inevitably costs extra computation. In this paper, we instead propose VoxelNeXt for fully sparse 3D object detection. Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies. Our strong sparse convolutional network VoxelNeXt detects and tracks 3D objects through voxel features entirely. It is an elegant and efficient framework, with no need for sparse-to-dense conversion or NMS post-processing. Our method achieves a better speed-accuracy trade-off than other mainstream detectors on the nuScenes dataset. For the first time, we show that a fully sparse voxel-based representation works decently for LIDAR 3D object detection and tracking. Extensive experiments on nuScenes, Waymo, and Argoverse2 benchmarks validate the effectiveness of our approach. Without bells and whistles, our model outperforms all existing LIDAR methods on the nuScenes tracking test benchmark.

Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding

  • Authors: Jihao Liu, Tai Wang, Boxiao Liu, Qihang Zhang, Yu Liu, Hongsheng Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.11325
  • Pdf link: https://arxiv.org/pdf/2303.11325
  • Abstract
    Multi-view camera-based 3D detection is a challenging problem in computer vision. Recent works leverage a pretrained LiDAR detection model to transfer knowledge to a camera-based student network. However, we argue that there is a major domain gap between the LiDAR BEV features and the camera-based BEV features, as they have different characteristics and are derived from different sources. In this paper, we propose Geometry Enhanced Masked Image Modeling (GeoMIM) to transfer the knowledge of the LiDAR model in a pretrain-finetune paradigm for improving the multi-view camera-based 3D detection. GeoMIM is a multi-camera vision transformer with Cross-View Attention (CVA) blocks that uses LiDAR BEV features encoded by the pretrained BEV model as learning targets. During pretraining, GeoMIM's decoder has a semantic branch that completes dense perspective-view features and a geometry branch that reconstructs dense perspective-view depth maps. The depth branch is designed to be camera-aware by inputting the camera's parameters for better transfer capability. Extensive results demonstrate that GeoMIM outperforms existing methods on the nuScenes benchmark, achieving state-of-the-art performance for camera-based 3D object detection and 3D segmentation.

New submissions for Fri, 24 Mar 23

Keyword: pruning

Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation

  • Authors: Bingyi Zhang, Viktor Prasanna
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2303.12901
  • Pdf link: https://arxiv.org/pdf/2303.12901
  • Abstract
    Graph Neural Network (GNN) inference is used in many real-world applications. Data sparsity in GNN inference, including sparsity in the input graph and the GNN model, offers opportunities to further speed up inference. Also, many pruning techniques have been proposed for model compression that increase the data sparsity of GNNs. We propose Dynasparse, a comprehensive hardware-software codesign on FPGA to accelerate GNN inference through dynamic sparsity exploitation. For this, we decouple the GNN computation kernels from the basic computation primitives, and explore hardware-software codesign as follows: 1) Hardware design: We propose a novel unified accelerator design on FPGA to efficiently execute various computation primitives. We develop a customized soft processor that is tightly coupled with the accelerator to execute a runtime system. Moreover, we develop efficient hardware mechanisms to profile the data sparsity and perform on-the-fly data format transformation to prepare the input data for various computation primitives; 2) Software design: We develop a runtime system that works synergistically with the accelerator to perform dynamic kernel-to-primitive mapping based on data sparsity. We implement Dynasparse on a state-of-the-art FPGA platform, Xilinx Alveo U250, and evaluate the design using widely used GNN models (GCN, GraphSAGE, GIN and SGC). For the above GNN models and various input graphs, the proposed accelerator and dynamic kernel-to-primitive mapping reduce the inference latency by $3.73\times$ on average compared with the static mapping strategies employed in the state-of-the-art GNN accelerators. Compared with state-of-the-art CPU (GPU) implementations, Dynasparse achieves up to $56.9\times$ ($2.37\times$) speedup in end-to-end latency.
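
A software-side sketch of the dynamic kernel-to-primitive mapping idea: measure the operand's sparsity at runtime and dispatch to a sparse or dense primitive accordingly. The threshold and primitives below are illustrative; Dynasparse performs this mapping with custom FPGA hardware.

```python
# Software-side sketch of dynamic kernel-to-primitive mapping: profile the
# operand's sparsity at runtime and dispatch to a sparse or dense primitive.
# The 0.9 threshold is illustrative; Dynasparse does this with FPGA hardware.
import numpy as np
import scipy.sparse as sp

def profile_sparsity(mat):
    return 1.0 - np.count_nonzero(mat) / mat.size

def dyn_matmul(a, b, threshold=0.9):
    """Pick the primitive matching the data sparsity observed on the fly."""
    if profile_sparsity(a) > threshold:
        return sp.csr_matrix(a) @ b   # sparse primitive
    return a @ b                      # dense primitive

adj = (np.random.rand(1000, 1000) > 0.99).astype(np.float32)  # sparse graph
feats = np.random.rand(1000, 64).astype(np.float32)
out = dyn_matmul(adj, feats)          # dispatches to the sparse path here
```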

CP$^3$: Channel Pruning Plug-in for Point-based Networks

  • Authors: Yaomin Huang, Ning Liu, Zhengping Che, Zhiyuan Xu, Chaomin Shen, Yaxin Peng, Guixu Zhang, Xinmei Liu, Feifei Feng, Jian Tang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.13097
  • Pdf link: https://arxiv.org/pdf/2303.13097
  • Abstract
    Channel pruning can effectively reduce both the computational cost and memory footprint of the original network while keeping a comparable accuracy performance. Though great success has been achieved in channel pruning for 2D image-based convolutional networks (CNNs), existing works seldom extend channel pruning methods to 3D point-based neural networks (PNNs). Directly applying 2D CNN channel pruning methods to PNNs undermines their performance because of the different representations of 2D images and 3D point clouds, as well as the network architecture disparity. In this paper, we propose CP$^3$, a Channel Pruning Plug-in for Point-based networks. CP$^3$ is elaborately designed to leverage the characteristics of point clouds and PNNs in order to enable 2D channel pruning methods for PNNs. Specifically, it presents a coordinate-enhanced channel importance metric to reflect the correlation between dimensional information and individual channel features, and it recycles the discarded points in a PNN's sampling process and reconsiders their potentially-exclusive information to enhance the robustness of channel pruning. Experiments on various PNN architectures show that CP$^3$ consistently improves state-of-the-art 2D CNN pruning approaches on different point cloud tasks. For instance, our compressed PointNeXt-S on ScanObjectNN achieves an accuracy of 88.52% with a pruning rate of 57.8%, outperforming the baseline pruning methods with an accuracy gain of 1.94%.
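
One plausible reading of a coordinate-enhanced channel importance metric is a score that combines a channel's activation magnitude with its correlation to the point coordinates. The sketch below is an assumption-laden illustration of that idea, not CP$^3$'s published metric.

```python
# Assumption-laden illustration of a coordinate-enhanced channel importance
# score: combine each channel's activation magnitude with its (approximate)
# correlation to the point coordinates. Not CP^3's published metric.
import torch

def channel_importance(features, coords):
    # features: (N, C) channel activations; coords: (N, 3) point xyz
    f = features - features.mean(dim=0)
    c = coords - coords.mean(dim=0)
    # Approximate |correlation| between each channel and each axis, (C, 3).
    corr = (f.T @ c) / (f.std(dim=0).unsqueeze(1) * c.std(dim=0) * len(f) + 1e-8)
    coord_score = corr.abs().max(dim=1).values   # per-channel coordinate link
    magnitude = features.abs().mean(dim=0)       # plain activation importance
    return magnitude * (1.0 + coord_score)

feats, xyz = torch.rand(2048, 64), torch.rand(2048, 3)
scores = channel_importance(feats, xyz)
prune_idx = scores.argsort()[: len(scores) // 2]  # lowest-scoring channels
```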

DetOFA: Efficient Training of Once-for-All Networks for Object Detection by Using Pre-trained Supernet and Path Filter

  • Authors: Yuiko Sakuma, Masato Ishii, Takuya Narihira
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13121
  • Pdf link: https://arxiv.org/pdf/2303.13121
  • Abstract
    We address the challenge of training a large supernet for the object detection task, using a relatively small amount of training data. Specifically, we propose an efficient supernet-based neural architecture search (NAS) method that uses transfer learning and search space pruning. First, the supernet is pre-trained on a classification task, for which large datasets are available. Second, the search space defined by the supernet is pruned by removing candidate models that are predicted to perform poorly. To effectively remove the candidates over a wide range of resource constraints, we particularly design a performance predictor, called path filter, which can accurately predict the relative performance of the models that satisfy similar resource constraints. Hence, supernet training is more focused on the best-performing candidates. Our path filter handles prediction for paths with different resource budgets. Compared to once-for-all, our proposed method reduces the computational cost of obtaining the optimal network architecture by 30% and 63%, while yielding a better accuracy versus floating-point operations Pareto front (0.85 and 0.45 points of improvement in average precision for Pascal VOC and COCO, respectively).

Keyword: neural architecture search

There is no result

Keyword: 3d object detection

MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer

  • Authors: Yunsong Zhou, Hongzi Zhu, Quan Liu, Shan Chang, Minyi Guo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13018
  • Pdf link: https://arxiv.org/pdf/2303.13018
  • Abstract
    Mobile monocular 3D object detection (Mono3D) (e.g., on a vehicle, a drone, or a robot) is an important yet challenging task. Existing transformer-based offline Mono3D models adopt grid-based vision tokens, which is suboptimal when using coarse tokens due to the limited available computational power. In this paper, we propose an online Mono3D framework, called MonoATT, which leverages a novel vision transformer with heterogeneous tokens of varying shapes and sizes to facilitate mobile Mono3D. The core idea of MonoATT is to adaptively assign finer tokens to areas of more significance before utilizing a transformer to enhance Mono3D. To this end, we first use prior knowledge to design a scoring network for selecting the most important areas of the image, and then propose a token clustering and merging network with an attention mechanism to gradually merge tokens around the selected areas in multiple stages. Finally, a pixel-level feature map is reconstructed from heterogeneous tokens before employing a SOTA Mono3D detector as the underlying detection core. Experiment results on the real-world KITTI dataset demonstrate that MonoATT can effectively improve the Mono3D accuracy for both near and far objects and guarantee low latency. MonoATT yields the best performance compared with the state-of-the-art methods by a large margin and is ranked number one on the KITTI 3D benchmark.

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

  • Authors: Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13510
  • Pdf link: https://arxiv.org/pdf/2303.13510
  • Abstract
    This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset. Inspired by the scene-voxel-point hierarchy in downstream 3D object detectors, we design masking and reconstruction strategies accounting for voxel distributions in the scene and local point distributions within the voxel. We employ a Reversed-Furthest-Voxel-Sampling strategy to address the uneven distribution of LiDAR points and propose MV-JAR, which combines two techniques for modeling the aforementioned distributions, resulting in superior performance. Our experiments reveal limitations in previous data-efficient experiments, which uniformly sample fine-tuning splits with varying data proportions from each LiDAR sequence, leading to similar data diversity across splits. To address this, we propose a new benchmark that samples scene sequences for diverse fine-tuning splits, ensuring adequate model convergence and providing a more accurate evaluation of pre-training methods. Experiments on our Waymo benchmark and the KITTI dataset demonstrate that MV-JAR consistently and significantly improves 3D detection performance across various data scales, achieving up to a 6.3% increase in mAPH compared to training from scratch. Codes and the benchmark will be available at https://github.com/SmartBot-PJLab/MV-JAR .
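
The Reversed-Furthest-Voxel-Sampling strategy builds on farthest point sampling over occupied voxels. A sketch of plain farthest point sampling follows; the "reversed" selection order used for masking is the paper's own twist and is not reproduced.

```python
# Plain farthest point sampling (FPS), the primitive underlying the paper's
# Reversed-Furthest-Voxel-Sampling over occupied voxel centers; the
# "reversed" selection order used for masking is the paper's own twist and
# is not reproduced here.
import numpy as np

def farthest_point_sampling(points, k):
    """Greedily pick k points, each maximizing distance to those chosen."""
    chosen = [np.random.randint(len(points))]
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

voxel_centers = np.random.rand(5000, 3) * 50.0  # e.g., occupied voxel centers
idx = farthest_point_sampling(voxel_centers, k=256)
print(idx.shape)  # (256,) well-spread voxel indices to mask or keep
```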

Keyword: voxel

Marching-Primitives: Shape Abstraction from Signed Distance Function

  • Authors: Weixiao Liu, Yuwei Wu, Sipu Ruan, Gregory S. Chirikjian
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13190
  • Pdf link: https://arxiv.org/pdf/2303.13190
  • Abstract
    Representing complex objects with basic geometric primitives has long been a topic in computer vision. Primitive-based representations have the merits of compactness and computational efficiency in higher-level tasks such as physics simulation, collision checking, and robotic manipulation. Unlike previous works which extract polygonal meshes from a signed distance function (SDF), in this paper, we present a novel method, named Marching-Primitives, to obtain a primitive-based abstraction directly from an SDF. Our method grows geometric primitives (such as superquadrics) iteratively by analyzing the connectivity of voxels while marching at different levels of signed distance. For each valid connected volume of interest, we march on the scope of voxels from which a primitive is able to be extracted in a probabilistic sense and simultaneously solve for the parameters of the primitive to capture the underlying local geometry. We evaluate the performance of our method on both synthetic and real-world datasets. The results show that the proposed method outperforms the state-of-the-art in terms of accuracy, and is directly generalizable among different categories and scales. The code is open-sourced at https://github.com/ChirikjianLab/Marching-Primitives.git.

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

  • Authors: Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13510
  • Pdf link: https://arxiv.org/pdf/2303.13510
  • Abstract
    This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset. Inspired by the scene-voxel-point hierarchy in downstream 3D object detectors, we design masking and reconstruction strategies accounting for voxel distributions in the scene and local point distributions within the voxel. We employ a Reversed-Furthest-Voxel-Sampling strategy to address the uneven distribution of LiDAR points and propose MV-JAR, which combines two techniques for modeling the aforementioned distributions, resulting in superior performance. Our experiments reveal limitations in previous data-efficient experiments, which uniformly sample fine-tuning splits with varying data proportions from each LiDAR sequence, leading to similar data diversity across splits. To address this, we propose a new benchmark that samples scene sequences for diverse fine-tuning splits, ensuring adequate model convergence and providing a more accurate evaluation of pre-training methods. Experiments on our Waymo benchmark and the KITTI dataset demonstrate that MV-JAR consistently and significantly improves 3D detection performance across various data scales, achieving up to a 6.3% increase in mAPH compared to training from scratch. Codes and the benchmark will be available at https://github.com/SmartBot-PJLab/MV-JAR .

Keyword: lidar

MMFormer: Multimodal Transformer Using Multiscale Self-Attention for Remote Sensing Image Classification

  • Authors: Bo Zhang, Zuheng Ming, Wei Feng, Yaqian Liu, Liang He, Kaixing Zhao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13101
  • Pdf link: https://arxiv.org/pdf/2303.13101
  • Abstract
    To exploit the complementary information between heterogeneous data, we introduce a new Multimodal Transformer (MMFormer) for Remote Sensing (RS) image classification using Hyperspectral Image (HSI) data accompanied by another source of data such as Light Detection and Ranging (LiDAR). Compared with the traditional Vision Transformer (ViT), which lacks the inductive biases of convolutions, we first introduce convolutional layers to our MMFormer to tokenize patches from the multimodal HSI and LiDAR data. We then propose a Multi-scale Multi-head Self-Attention (MSMHSA) module to address the compatibility problem that often limits the fusion of HSI, with its high spectral resolution, and LiDAR, with its relatively low spatial resolution. The proposed MSMHSA module incorporates HSI into LiDAR data in a coarse-to-fine manner, enabling us to learn a fine-grained representation. Extensive experiments on widely used benchmarks (e.g., Trento and MUUFL) demonstrate the effectiveness and superiority of our proposed MMFormer for RS image classification.
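
A compact sketch of the tokenize-then-attend idea from the abstract, with made-up band counts and dimensions; the paper's multi-scale, coarse-to-fine MSMHSA is reduced here to a single cross-attention step:

```python
import torch
import torch.nn as nn

hsi = torch.randn(2, 144, 16, 16)    # hyperspectral patch (144 bands, illustrative)
lidar = torch.randn(2, 1, 16, 16)    # single-band LiDAR raster

# Convolutional tokenizers, standing in for the paper's convolutional stems
tok_hsi = nn.Conv2d(144, 64, kernel_size=4, stride=4)
tok_lidar = nn.Conv2d(1, 64, kernel_size=4, stride=4)
q = tok_hsi(hsi).flatten(2).transpose(1, 2)       # (B, 16, 64) HSI tokens
kv = tok_lidar(lidar).flatten(2).transpose(1, 2)  # (B, 16, 64) LiDAR tokens

# One attention step: HSI tokens query LiDAR tokens (the paper uses multiple scales)
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
fused, _ = attn(q, kv, kv)
print(fused.shape)  # torch.Size([2, 16, 64])
```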

Position-Guided Point Cloud Panoptic Segmentation Transformer

  • Authors: Zeqi Xiao, Wenwei Zhang, Tai Wang, Chen Change Loy, Dahua Lin, Jiangmiao Pang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13509
  • Pdf link: https://arxiv.org/pdf/2303.13509
  • Abstract
    DEtection TRansformer (DETR) started a trend of using a group of learnable queries for unified visual perception. This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline. Although the naive adaptation obtains fair results, the instance segmentation performance is noticeably inferior to previous works. By diving into the details, we observe that instances in the sparse point clouds are relatively small compared to the whole scene and often have similar geometry while lacking the distinctive appearance needed for segmentation, which is rare in the image domain. Considering that instances in 3D are characterized more by their positional information, we emphasize its role during modeling and design a robust Mixed-parameterized Positional Embedding (MPE) to guide the segmentation process. It is embedded into backbone features and later guides the mask prediction and query update processes iteratively, leading to Position-Aware Segmentation (PA-Seg) and Masked Focal Attention (MFA). All these designs impel the queries to attend to specific regions and identify various instances. The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 3.4% and 1.2% PQ on the SemanticKITTI and nuScenes benchmarks, respectively. The source code and models are available at https://github.com/SmartBot-PJLab/P3Former .

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

  • Authors: Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13510
  • Pdf link: https://arxiv.org/pdf/2303.13510
  • Abstract
    This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset. Inspired by the scene-voxel-point hierarchy in downstream 3D object detectors, we design masking and reconstruction strategies accounting for voxel distributions in the scene and local point distributions within the voxel. We employ a Reversed-Furthest-Voxel-Sampling strategy to address the uneven distribution of LiDAR points and propose MV-JAR, which combines two techniques for modeling the aforementioned distributions, resulting in superior performance. Our experiments reveal limitations in previous data-efficient experiments, which uniformly sample fine-tuning splits with varying data proportions from each LiDAR sequence, leading to similar data diversity across splits. To address this, we propose a new benchmark that samples scene sequences for diverse fine-tuning splits, ensuring adequate model convergence and providing a more accurate evaluation of pre-training methods. Experiments on our Waymo benchmark and the KITTI dataset demonstrate that MV-JAR consistently and significantly improves 3D detection performance across various data scales, achieving up to a 6.3% increase in mAPH compared to training from scratch. Codes and the benchmark will be available at https://github.com/SmartBot-PJLab/MV-JAR .

New submissions for Fri, 14 Apr 23

Keyword: efficient

RELS-DQN: A Robust and Efficient Local Search Framework for Combinatorial Optimization

  • Authors: Yuanhang Shao, Tonmoy Dey, Nikola Vuckovic, Luke Van Popering, Alan Kuhnle
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06048
  • Pdf link: https://arxiv.org/pdf/2304.06048
  • Abstract
    Combinatorial optimization (CO) aims to efficiently find the best solution to NP-hard problems ranging from statistical physics to social media marketing. A wide range of CO applications can benefit from local search methods because they allow reversible action over greedy policies. Deep Q-learning (DQN) using message-passing neural networks (MPNN) has shown promise in replicating the local search behavior and obtaining results comparable to local search algorithms. However, the over-smoothing and information loss during the iterations of message passing limit its robustness across applications, and the large message vectors result in memory inefficiency. Our paper introduces RELS-DQN, a lightweight DQN framework that exhibits the local search behavior while providing practical scalability. Trained on a single application, RELS-DQN generalizes to various applications, providing solution values higher than or equal to both the local search algorithms and existing DQN models while remaining efficient in runtime and memory.

Exploiting Symmetry and Heuristic Demonstrations in Off-policy Reinforcement Learning for Robotic Manipulation

  • Authors: Amir M. Soufi Enayati, Zengjie Zhang, Kashish Gupta, Homayoun Najjaran
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06055
  • Pdf link: https://arxiv.org/pdf/2304.06055
  • Abstract
    Reinforcement learning demonstrates significant potential in automatically building control policies in numerous domains, but shows low efficiency when applied to robot manipulation tasks due to the curse of dimensionality. To facilitate the learning of such tasks, prior knowledge or heuristics that incorporate inherent simplification can effectively improve the learning performance. This paper aims to define and incorporate the natural symmetry present in physical robotic environments. Then, sample-efficient policies are trained by exploiting the expert demonstrations in symmetrical environments through an amalgamation of reinforcement and behavior cloning, which gives the off-policy learning process a diverse yet compact initiation. Furthermore, it presents a rigorous framework for a recent concept and explores its scope for robot manipulation tasks. The proposed method is validated via two point-to-point reaching tasks of an industrial arm, with and without an obstacle, in a simulation experiment study. A PID controller, which tracks the linear joint-space trajectories with hard-coded temporal logic to produce interim midpoints, is used to generate demonstrations in the study. The results of the study present the effect of the number of demonstrations and quantify the magnitude of behavior cloning to exemplify the possible improvement of model-free reinforcement learning in common manipulation tasks. A comparison study between the proposed method and a traditional off-policy reinforcement learning algorithm indicates its advantage in learning performance and potential value for applications.

Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays

  • Authors: Chen Xie, Francesco Daghero, Yukai Chen, Marco Castellano, Luca Gandolfi, Andrea Calimera, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06059
  • Pdf link: https://arxiv.org/pdf/2304.06059
  • Abstract
    Ultra-low-resolution Infrared (IR) array sensors offer a low-cost, energy-efficient, and privacy-preserving solution for people counting, with applications such as occupancy monitoring. Previous work has shown that Deep Learning (DL) can yield superior performance on this task. However, the literature has been missing an extensive comparative analysis of various efficient DL architectures for IR array-based people counting that considers not only their accuracy, but also the cost of deploying them on memory- and energy-constrained Internet of Things (IoT) edge nodes. In this work, we address this need by comparing 6 different DL architectures on a novel dataset composed of IR images collected from a commercial 8x8 array, which we have made openly available. With a wide architectural exploration of each model type, we obtain a rich set of Pareto-optimal solutions, spanning cross-validated balanced accuracy scores in the 55.70-82.70% range. When deployed on a commercial Microcontroller (MCU) by STMicroelectronics, the STM32L4A6ZG, these models occupy 0.41-9.28kB of memory and require 1.10-7.74ms per inference, while consuming 17.18-120.43 $\mu$J of energy. Our models are significantly more accurate than a previous deterministic method (up to +39.9%), while being up to 3.53x faster and more energy efficient. Further, our models' accuracy is comparable to state-of-the-art DL solutions on similar-resolution sensors, despite a much lower complexity. All our models enable continuous, real-time inference on an MCU-based IoT node, with years of autonomous operation without battery recharging.
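
For scale, a hypothetical MCU-sized counting CNN for an 8x8 single-channel frame (roughly 1.5k parameters); the six architecture families actually benchmarked in the paper are not reproduced here:

```python
import torch
import torch.nn as nn

class TinyIRCounter(nn.Module):
    """Tiny CNN for people counting on an 8x8 single-channel IR frame."""
    def __init__(self, max_people: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # 8x8 -> 8x8
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 8x8 -> 4x4
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 4x4 -> 2x2
        )
        self.classifier = nn.Linear(16 * 2 * 2, max_people + 1)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = TinyIRCounter()
logits = model(torch.randn(1, 1, 8, 8))   # one fake IR frame
print(logits.shape)                       # torch.Size([1, 4]) -> 0..3 people
```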

Energy-guided Entropic Neural Optimal Transport

  • Authors: Petr Mokrov, Alexander Korotin, Evgeny Burnaev
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.06094
  • Pdf link: https://arxiv.org/pdf/2304.06094
  • Abstract
    Energy-Based Models (EBMs) have been known in the Machine Learning community for decades. Since the seminal works on EBMs dating back to the noughties, many efficient methods have appeared that solve the generative modelling problem by means of energy potentials (unnormalized likelihood functions). In contrast, the realm of Optimal Transport (OT), and neural OT solvers in particular, is much less explored and limited to a few recent works (excluding WGAN-based approaches, which utilize OT as a loss function and do not model OT maps themselves). In our work, we bridge the gap between EBMs and entropy-regularized OT. We present a novel methodology which allows utilizing the recent developments and technical improvements of the former in order to enrich the latter. We validate the applicability of our method on toy 2D scenarios as well as standard unpaired image-to-image translation problems. For the sake of simplicity, we choose simple short- and long-run EBMs as the backbone of our energy-guided entropic OT method, leaving the application of more sophisticated EBMs for future research.

IoT trust and reputation: a survey and taxonomy

  • Authors: Muhammad Aaqib, Aftab Ali, Liming Chen, Omar Nibouche
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06119
  • Pdf link: https://arxiv.org/pdf/2304.06119
  • Abstract
    IoT is one of the fastest-growing technologies, and it is estimated that more than a billion devices will be in use across the globe by the end of 2030. To maximize the capability of these connected entities, trust and reputation among IoT entities is essential. Several trust management models have been proposed in the IoT environment; however, these schemes have not fully addressed IoT device features such as device role, device type, and dynamic behavior in a smart environment. As a result, traditional trust and reputation models are insufficient to tackle these characteristics and uncertainty risks while connecting nodes to the network. While continuous study has been carried out and various articles suggest promising solutions in constrained environments, research on trust and reputation is still in its infancy. In this paper, we carry out a comprehensive literature review of state-of-the-art research on the trust and reputation of IoT devices and systems. Specifically, we first propose a new structure, namely a new taxonomy, to organize trust and reputation models based on the ways trust is managed. The proposed taxonomy comprises traditional trust management-based systems and artificial intelligence-based systems, and combining both classes encourages existing schemes to adopt these emerging concepts. This collaboration between conventional mathematical models and advanced ML models results in design schemes that are more robust and efficient. We then drill down to compare and analyse the methods and applications of these systems based on community-accepted performance metrics, e.g. scalability, delay, cooperativeness, and efficiency. Finally, building upon the findings of the analysis, we identify and discuss open research issues and challenges, and further speculate on and point out future research directions.

Label-Free Concept Bottleneck Models

  • Authors: Tuomas Oikarinen, Subhro Das, Lam M. Nguyen, Tsui-Wei Weng
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06129
  • Pdf link: https://arxiv.org/pdf/2304.06129
  • Abstract
    Concept bottleneck models (CBM) are a popular way of creating more interpretable neural networks by having hidden layer neurons correspond to human-understandable concepts. However, existing CBMs and their variants have two crucial limitations: first, they need to collect labeled data for each of the predefined concepts, which is time-consuming and labor-intensive; second, the accuracy of a CBM is often significantly lower than that of a standard neural network, especially on more complex datasets. This poor performance creates a barrier to adopting CBMs in practical real-world applications. Motivated by these challenges, we propose Label-free CBM, a novel framework to transform any neural network into an interpretable CBM without labeled concept data, while retaining high accuracy. Our Label-free CBM has many advantages: it is scalable - we present the first CBM scaled to ImageNet; efficient - creating a CBM takes only a few hours even for very large datasets; and automated - training it for a new dataset requires minimal human effort. Our code is available at https://github.com/Trustworthy-ML-Lab/Label-free-CBM.
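
A minimal sketch of the concept-bottleneck structure itself (frozen backbone features, a learned concept projection, an interpretable linear head); the dimensions and the label-free way concepts are learned are placeholders, not the paper's procedure:

```python
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, backbone_dim=512, num_concepts=128, num_classes=10):
        super().__init__()
        self.to_concepts = nn.Linear(backbone_dim, num_concepts)  # concept scores
        self.head = nn.Linear(num_concepts, num_classes)          # interpretable head

    def forward(self, feats):
        c = self.to_concepts(feats)   # each unit is meant to track one named concept
        return self.head(c), c

feats = torch.randn(4, 512)          # features from a frozen backbone (assumed)
logits, concepts = ConceptBottleneck()(feats)
print(logits.shape, concepts.shape)  # torch.Size([4, 10]) torch.Size([4, 128])
```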

AGI for Agriculture

  • Authors: Guoyu Lu, Sheng Li, Gengchen Mai, Jin Sun, Dajiang Zhu, Lilong Chai, Haijian Sun, Xianqiao Wang, Haixing Dai, Ninghao Liu, Rui Xu, Daniel Petti, Changying Li, Tianming Liu, Changying Li
  • Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06136
  • Pdf link: https://arxiv.org/pdf/2304.06136
  • Abstract
    Artificial General Intelligence (AGI) is poised to revolutionize a variety of sectors, including healthcare, finance, transportation, and education. Within healthcare, AGI is being utilized to analyze clinical medical notes, recognize patterns in patient data, and aid in patient management. Agriculture is another critical sector that impacts the lives of individuals worldwide. It serves as a foundation for providing food, fiber, and fuel, yet faces several challenges, such as climate change, soil degradation, water scarcity, and food security. AGI has the potential to tackle these issues by enhancing crop yields, reducing waste, and promoting sustainable farming practices. It can also help farmers make informed decisions by leveraging real-time data, leading to more efficient and effective farm management. This paper delves into the potential future applications of AGI in agriculture, such as agriculture image processing, natural language processing (NLP), robotics, knowledge graphs, and infrastructure, and their impact on precision livestock and precision crops. By leveraging the power of AGI, these emerging technologies can provide farmers with actionable insights, allowing for optimized decision-making and increased productivity. The transformative potential of AGI in agriculture is vast, and this paper aims to highlight its potential to revolutionize the industry.

Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

  • Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.06178
  • Pdf link: https://arxiv.org/pdf/2304.06178
  • Abstract
    Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of the voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data, while maintaining computational efficiency, and is substantially faster than the baseline method NeuralRGBD.
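
A toy version of complexity-driven subdivision: cells whose local residual exceeds a threshold are split into eight children. The residual measure and the data structures are stand-ins, not the paper's scheme:

```python
import numpy as np

def subdivide(cells, residual, threshold=0.1):
    """cells: list of (center, size); residual(center, size) -> float."""
    refined = []
    for center, size in cells:
        if residual(center, size) > threshold:
            for dx in (-0.25, 0.25):
                for dy in (-0.25, 0.25):
                    for dz in (-0.25, 0.25):
                        child = center + size * np.array([dx, dy, dz])
                        refined.append((child, size / 2.0))  # finer voxel here
        else:
            refined.append((center, size))                   # keep coarse voxel
    return refined

cells = [(np.zeros(3), 1.0)]
# hypothetical residual: higher away from the origin, plus a constant offset
cells = subdivide(cells, residual=lambda c, s: float(np.linalg.norm(c)) + 0.2)
print(len(cells))  # 8: the root cell was split
```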

SePEnTra: A secure and privacy-preserving energy trading mechanisms in transactive energy market

  • Authors: Rumpa Dasgupta, Amin Sakzad, Carsten Rudolph, Rafael Dowsley
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.06179
  • Pdf link: https://arxiv.org/pdf/2304.06179
  • Abstract
    In this paper, we design and present a novel model called SePEnTra to ensure the security and privacy of energy data while it is shared with other entities during energy trading to determine optimal price signals. Furthermore, the market operator can use this data to detect malicious activities of users at a later stage (e.g., deviation of actual energy generation/consumption from the forecast beyond a threshold) without violating privacy. We use two cryptographic primitives, additive secret sharing and Pedersen commitments, in SePEnTra. The performance of our model is evaluated theoretically and numerically. We compare the performance of SePEnTra with the same Transactive Energy Market (TEM) framework without security mechanisms. The results show that even though it uses advanced cryptographic primitives in a large market framework, SePEnTra has very low computational complexity and communication overhead. Moreover, it is storage-efficient for all parties.

SURFSUP: Learning Fluid Simulation for Novel Surfaces

  • Authors: Arjun Mani, Ishaan Preetam Chandratreya, Elliot Creager, Carl Vondrick, Richard Zemel
  • Subjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)
  • Arxiv link: https://arxiv.org/abs/2304.06197
  • Pdf link: https://arxiv.org/pdf/2304.06197
  • Abstract
    Modeling the mechanics of fluid in complex scenes is vital to applications in design, graphics, and robotics. Learning-based methods provide fast and differentiable fluid simulators; however, most prior work is unable to accurately model how fluids interact with genuinely novel surfaces not seen during training. We introduce SURFSUP, a framework that represents objects implicitly using signed distance functions (SDFs), rather than an explicit representation of meshes or particles. This continuous representation of geometry enables more accurate simulation of fluid-object interactions over long time periods while simultaneously making computation more efficient. Moreover, SURFSUP trained on simple shape primitives generalizes considerably out-of-distribution, even to complex real-world scenes and objects. Finally, we show we can invert our model to design simple objects that manipulate fluid flow.
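
For intuition, a minimal analytic SDF of the kind SURFSUP conditions on, queried at particle positions; the learned simulator itself is out of scope here:

```python
import torch

def sphere_sdf(points, center, radius):
    """Negative inside the object, positive outside, zero on the surface."""
    return torch.linalg.norm(points - center, dim=-1) - radius

particles = torch.rand(1024, 3)   # fluid particle positions
d = sphere_sdf(particles, center=torch.tensor([0.5, 0.5, 0.5]), radius=0.25)
inside = d < 0                    # collision test against the implicit geometry
print(inside.sum().item(), "particles are inside the sphere")
```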

Space-Time Tradeoffs for Conjunctive Queries with Access Patterns

  • Authors: Hangdong Zhao, Shaleen Deep, Paraschos Koutris
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.06221
  • Pdf link: https://arxiv.org/pdf/2304.06221
  • Abstract
    In this paper, we investigate space-time tradeoffs for answering conjunctive queries with access patterns (CQAPs). The goal is to create a space-efficient data structure in an initial preprocessing phase and use it for answering (multiple) queries in an online phase. Previous work has developed data structures that trade off space usage for answering time for queries of practical interest, such as the path and triangle queries. However, these approaches lack a comprehensive framework and are not generalizable. Our main contribution is a general algorithmic framework for obtaining space-time tradeoffs for any CQAP. Our framework builds upon the PANDA algorithm and tree decomposition techniques. We demonstrate that our framework captures all state-of-the-art tradeoffs that were independently produced for various queries. Further, we show surprising improvements over the state-of-the-art tradeoffs known in the existing literature for reachability queries.

Improving Segmentation of Objects with Varying Sizes in Biomedical Images using Instance-wise and Center-of-Instance Segmentation Loss Function

  • Authors: Muhammad Febrian Rachmadi, Charissa Poon, Henrik Skibbe
  • Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06229
  • Pdf link: https://arxiv.org/pdf/2304.06229
  • Abstract
    In this paper, we propose a novel two-component loss for biomedical image segmentation tasks called the Instance-wise and Center-of-Instance (ICI) loss, a loss function that addresses the instance imbalance problem commonly encountered when using pixel-wise loss functions such as the Dice loss. The instance-wise component improves the detection of small instances or "blobs" in image datasets with both large and small instances. The center-of-instance component improves the overall detection accuracy. We compared the ICI loss with two existing losses, the Dice loss and the blob loss, in the task of stroke lesion segmentation using the ATLAS R2.0 challenge dataset from MICCAI 2022. Compared to the other losses, the ICI loss provided better-balanced segmentation, and significantly outperformed the Dice loss by 1.7-3.7% and the blob loss by 0.6-5.0% in terms of the Dice similarity coefficient on both the validation and test sets, suggesting that the ICI loss is a potential solution to the instance imbalance problem.
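
A rough NumPy sketch of the two components, treating connected components of the ground truth as instances; the paper's exact (differentiable) formulation will differ:

```python
import numpy as np
from scipy import ndimage

def ici_style_loss(pred, target, eps=1e-6):
    """pred, target: binary 2D masks (boolean numpy arrays)."""
    labels, n = ndimage.label(target)            # ground-truth instances
    if n == 0:
        return 0.0
    inst_terms, center_terms = [], []
    for i in range(1, n + 1):
        inst = labels == i
        inter = np.logical_and(pred, inst).sum()
        dice = (2.0 * inter + eps) / (pred.sum() + inst.sum() + eps)
        inst_terms.append(1.0 - dice)            # instance-wise term
        gt_c = np.array(ndimage.center_of_mass(inst))
        pr = np.logical_and(pred, inst)
        if pr.any():                             # center-of-instance term
            center_terms.append(np.linalg.norm(gt_c - ndimage.center_of_mass(pr)))
        else:                                    # crude penalty for a missed instance
            center_terms.append(np.linalg.norm(gt_c))
    return float(np.mean(inst_terms) + np.mean(center_terms))

target = np.zeros((32, 32), bool); target[2:6, 2:6] = True; target[20:30, 20:30] = True
pred = np.zeros_like(target); pred[21:29, 21:29] = True    # misses the small blob
print(ici_style_loss(pred, target))
```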

Physics-informed radial basis network (PIRBN): A local approximation neural network for solving nonlinear PDEs

  • Authors: Jinshuai Bai, Gui-Rong Liu, Ashish Gupta, Laith Alzubaidi, Xi-Qiao Feng, YuanTong Gu
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06234
  • Pdf link: https://arxiv.org/pdf/2304.06234
  • Abstract
    Our recent intensive study has found that physics-informed neural networks (PINNs) tend to be local approximators after training. This observation leads to the novel physics-informed radial basis network (PIRBN) proposed here, which can maintain the local property throughout the entire training process. Compared to deep neural networks, a PIRBN comprises only one hidden layer and a radial basis "activation" function. Under appropriate conditions, we demonstrate that the training of PIRBNs using gradient descent methods can converge to Gaussian processes. We also study the training dynamics of PIRBNs via neural tangent kernel (NTK) theory, and conduct comprehensive investigations of PIRBN initialisation strategies. Based on numerical examples, PIRBNs are demonstrated to be more effective and efficient than PINNs in solving PDEs with high-frequency features and ill-posed computational domains. Moreover, existing PINN numerical techniques, such as adaptive learning, decomposition, and different types of loss functions, are applicable to PIRBNs. The programs that can regenerate all numerical results can be found at https://github.com/JinshuaiBai/PIRBN.
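
A minimal PIRBN-style network, assuming Gaussian radial basis activations; the center/width initialisation and the PDE setup below are illustrative, not the paper's scheme:

```python
import torch
import torch.nn as nn

class PIRBN(nn.Module):
    """One hidden layer of radial basis units with local support."""
    def __init__(self, in_dim=1, hidden=100, out_dim=1):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(0, 1, hidden).repeat(in_dim, 1).T)
        self.log_width = nn.Parameter(torch.zeros(hidden))
        self.out = nn.Linear(hidden, out_dim, bias=False)

    def forward(self, x):                          # x: (batch, in_dim)
        d2 = ((x[:, None, :] - self.centers[None]) ** 2).sum(-1)
        phi = torch.exp(-d2 * torch.exp(self.log_width))   # Gaussian RBF
        return self.out(phi)

x = torch.rand(16, 1, requires_grad=True)          # collocation points for a PDE residual
u = PIRBN()(x)
du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]  # derivative for the physics loss
print(du.shape)  # torch.Size([16, 1])
```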

Cross-View Hierarchy Network for Stereo Image Super-Resolution

  • Authors: Wenbin Zou, Hongxia Gao, Liang Chen, Yunchen Zhang, Mingchao Jiang, Zhongxin Yu, Ming Tan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06236
  • Pdf link: https://arxiv.org/pdf/2304.06236
  • Abstract
    Stereo image super-resolution aims to reconstruct high-quality, high-resolution stereo image pairs by exploiting complementary information across views. To attain superior performance, many methods have prioritized designing complex modules to fuse similar information across views, while overlooking the importance of intra-view information for high-resolution reconstruction, which also leads to wrong textures in recovered images. To address this issue, we explore the interdependencies between various hierarchies within a view and propose a novel method, named Cross-View-Hierarchy Network for Stereo Image Super-Resolution (CVHSSR). Specifically, we design a cross-hierarchy information mining block (CHIMB) that leverages channel attention and large-kernel convolution attention to extract both global and local features within a view, enabling the efficient restoration of accurate texture details. Additionally, a cross-view interaction module (CVIM) is proposed to fuse similar features from different views via cross-view attention mechanisms, effectively adapting to the binocular scene. Extensive experiments demonstrate the effectiveness of our method. CVHSSR achieves better stereo image super-resolution performance than other state-of-the-art methods while using fewer parameters. The source code and pre-trained models are available at https://github.com/AlexZou14/CVHSSR.

EWT: Efficient Wavelet-Transformer for Single Image Denoising

  • Authors: Juncheng Li, Bodong Cheng, Ying Chen, Guangwei Gao, Tieyong Zeng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06274
  • Pdf link: https://arxiv.org/pdf/2304.06274
  • Abstract
    Transformer-based image denoising methods have achieved encouraging results in the past year. However, they must use linear operations to model long-range dependencies, which greatly increases model inference time and consumes GPU memory. Compared with convolutional neural network-based methods, current Transformer-based image denoising methods cannot achieve a balance between performance improvement and resource consumption. In this paper, we propose an Efficient Wavelet Transformer (EWT) for image denoising. Specifically, we use the Discrete Wavelet Transform (DWT) and Inverse Wavelet Transform (IWT) for downsampling and upsampling, respectively. This method fully preserves the image features while reducing the image resolution, thereby greatly reducing the device resource consumption of the Transformer model. Furthermore, we propose a novel Dual-stream Feature Extraction Block (DFEB) to extract image features at different levels, which further reduces model inference time and GPU memory usage. Experiments show that our method speeds up the original Transformer by more than 80%, reduces GPU memory usage by more than 60%, and achieves excellent denoising results. All code will be made public.
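
To illustrate why DWT/IWT suit Transformer down/upsampling, a self-contained Haar transform: resolution halves, channels quadruple, and the round trip is lossless. The paper's wavelet choice and placement are not specified here, so treat this as a sketch:

```python
import torch

def haar_dwt(x):                              # x: (B, C, H, W) with even H, W
    a = x[..., 0::2, 0::2]; b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]; d = x[..., 1::2, 1::2]
    return torch.cat([(a + b + c + d) / 2,    # LL: low-frequency content
                      (-a - b + c + d) / 2,   # LH
                      (-a + b - c + d) / 2,   # HL
                      (a - b - c + d) / 2],   # HH: high-frequency details
                     dim=1)                   # (B, 4C, H/2, W/2)

def haar_iwt(y):                              # exact inverse of haar_dwt
    ll, lh, hl, hh = torch.chunk(y, 4, dim=1)
    x = y.new_zeros(y.size(0), y.size(1) // 4, y.size(2) * 2, y.size(3) * 2)
    x[..., 0::2, 0::2] = (ll - lh - hl + hh) / 2
    x[..., 0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[..., 1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[..., 1::2, 1::2] = (ll + lh + hl + hh) / 2
    return x

img = torch.randn(1, 3, 64, 64)
assert torch.allclose(haar_iwt(haar_dwt(img)), img, atol=1e-6)  # lossless round trip
```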

Optimizing Multi-Domain Performance with Active Learning-based Improvement Strategies

  • Authors: Anand Gokul Mahalingam, Aayush Shah, Akshay Gulati, Royston Mascarenhas, Rakshitha Panduranga
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06277
  • Pdf link: https://arxiv.org/pdf/2304.06277
  • Abstract
    Improving performance in multiple domains is a challenging task, and often requires significant amounts of data to train and test models. Active learning techniques provide a promising solution by enabling models to select the most informative samples for labeling, thus reducing the amount of labeled data required to achieve high performance. In this paper, we present an active learning-based framework for improving performance across multiple domains. Our approach consists of two stages: first, we use an initial set of labeled data to train a base model, and then we iteratively select the most informative samples for labeling to refine the model. We evaluate our approach on several multi-domain datasets, including image classification, sentiment analysis, and object recognition. Our experiments demonstrate that our approach consistently outperforms baseline methods and achieves state-of-the-art performance on several datasets. We also show that our method is highly efficient, requiring significantly fewer labeled samples than other active learning-based methods. Overall, our approach provides a practical and effective solution for improving performance across multiple domains using active learning techniques.
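
A generic uncertainty-sampling step of the kind such frameworks build on; the model, pool tensor, and acquisition size are placeholders:

```python
import torch

def select_most_informative(model, pool, k=16):
    """Pick the k pool samples with the highest predictive entropy."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(pool), dim=1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    return entropy.topk(k).indices   # indices to send to the annotator

# usage sketch: indices = select_most_informative(base_model, unlabeled_pool)
```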

Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning

  • Authors: Wenli Xiao, Yiwei Lyu, John Dolan
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06281
  • Pdf link: https://arxiv.org/pdf/2304.06281
  • Abstract
    Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases. Although shielding with Linear Temporal Logic (LTL) is a promising formal method to ensure safety in single-agent Reinforcement Learning (RL), it results in conservative behaviors when scaling to multi-agent scenarios. Additionally, it poses computational challenges for synthesizing shields in complex multi-agent environments. This work introduces Model-based Dynamic Shielding (MBDS) to support MARL algorithm design. Our algorithm synthesizes distributive shields, which are reactive systems running in parallel with each MARL agent, to monitor and rectify unsafe behaviors. The shields can dynamically split, merge, and recompute based on agents' states. This design enables efficient synthesis of shields to monitor agents in complex environments without coordination overheads. We also propose an algorithm to synthesize shields without prior knowledge of the dynamics model. The proposed algorithm obtains an approximate world model by interacting with the environment during the early stage of exploration, making our MBDS enjoy formal safety guarantees with high probability. We demonstrate in simulations that our framework can surpass existing baselines in terms of safety guarantees and learning performance.

ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis

  • Authors: Hongchen Tan, Baocai Yin, Kun Wei, Xiuping Liu, Xin Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06297
  • Pdf link: https://arxiv.org/pdf/2304.06297
  • Abstract
    We propose a novel text-to-image generation network, the Adaptive Layout Refinement Generative Adversarial Network (ALR-GAN), to adaptively refine the layout of synthesized images without any auxiliary information. The ALR-GAN includes an Adaptive Layout Refinement (ALR) module and a Layout Visual Refinement (LVR) loss. The ALR module aligns the layout structure (which refers to the locations of objects and background) of a synthesized image with that of its corresponding real image. In the ALR module, we propose an Adaptive Layout Refinement (ALR) loss to balance the matching of hard and easy features, for more efficient layout structure matching. Based on the refined layout structure, the LVR loss further refines the visual representation within the layout area. Experimental results on two widely used datasets show that ALR-GAN performs competitively on the text-to-image generation task.

Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution

  • Authors: Zhuo Su, Jiehua Zhang, Tianpeng Liu, Zhen Liu, Shuanghui Zhang, Matti Pietikäinen, Li Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06305
  • Pdf link: https://arxiv.org/pdf/2304.06305
  • Abstract
    This paper proposes a novel module called middle spectrum grouped convolution (MSGC) for efficient deep convolutional neural networks (DCNNs) with the mechanism of grouped convolution. It explores the broad "middle spectrum" area between channel pruning and conventional grouped convolution. Compared with channel pruning, MSGC can retain most of the information from the input feature maps due to the group mechanism; compared with grouped convolution, MSGC benefits from the learnability, the core of channel pruning, for constructing its group topology, leading to better channel division. The middle spectrum area is unfolded along four dimensions: group-wise, layer-wise, sample-wise, and attention-wise, making it possible to reveal more powerful and interpretable structures. As a result, the proposed module acts as a booster that can reduce the computational cost of the host backbones for general image recognition with even improved predictive accuracy. For example, in the experiments on ImageNet dataset for image classification, MSGC can reduce the multiply-accumulates (MACs) of ResNet-18 and ResNet-50 by half but still increase the Top-1 accuracy by more than 1%. With 35% reduction of MACs, MSGC can also increase the Top-1 accuracy of the MobileNetV2 backbone. Results on MS COCO dataset for object detection show similar observations. Our code and trained models are available at https://github.com/hellozhuo/msgc.
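
For contrast, a toy comparison of dense vs. grouped convolution cost, with a fixed channel shuffle standing in for MSGC's learned group topology:

```python
import torch
import torch.nn as nn

dense = nn.Conv2d(64, 64, 3, padding=1)               # full channel connectivity
grouped = nn.Conv2d(64, 64, 3, padding=1, groups=4)   # ~4x fewer MACs and params

x = torch.randn(1, 64, 32, 32)
perm = torch.randperm(64)         # stand-in for the learned channel grouping
y = grouped(x[:, perm])           # regroup channels, then convolve per group
print(sum(p.numel() for p in dense.parameters()),
      sum(p.numel() for p in grouped.parameters()))   # 36928 vs 9280
```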

Efficient Multimodal Fusion via Interactive Prompting

  • Authors: Yaowei Li, Ruijie Quan, Linchao Zhu, Yi Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06306
  • Pdf link: https://arxiv.org/pdf/2304.06306
  • Abstract
    Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era. Following this trend, the size of multimodal learning models constantly increases, leading to an urgent need to reduce the massive computational cost of finetuning these models for downstream tasks. In this paper, we propose an efficient and flexible multimodal fusion method, namely PMF, tailored for fusing unimodally pre-trained transformers. Specifically, we first present a modular multimodal fusion framework that exhibits high flexibility and facilitates mutual interactions among different modalities. In addition, we disentangle vanilla prompts into three types in order to learn different optimization objectives for multimodal learning. It is also worth noting that we propose to add prompt vectors only on the deep layers of the unimodal transformers, thus significantly reducing the training memory usage. Experimental results show that our proposed method achieves comparable performance to several other multimodal finetuning methods with less than 3% trainable parameters and up to 66% saving of training memory usage.
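
An illustrative sketch of deep-layer prompting: learnable prompt tokens are prepended only at a (deep) transformer layer and dropped afterwards; the dimensions and layer choice are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class PromptedLayer(nn.Module):
    def __init__(self, dim=256, n_prompts=4, n_heads=8):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        self.layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)

    def forward(self, tokens):                       # tokens: (B, N, dim)
        p = self.prompts.expand(tokens.size(0), -1, -1)
        out = self.layer(torch.cat([p, tokens], dim=1))
        return out[:, self.prompts.size(0):]         # drop prompts after the layer

tokens = torch.randn(2, 10, 256)                     # unimodal transformer tokens
print(PromptedLayer()(tokens).shape)                 # torch.Size([2, 10, 256])
```

Only the prompt parameters (and whatever head sits on top) need gradients, which is where the small trainable-parameter budget comes from.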

Out-of-distribution Few-shot Learning For Edge Devices without Model Fine-tuning

  • Authors: Xinyun Zhang, Lanqing Hong
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06309
  • Pdf link: https://arxiv.org/pdf/2304.06309
  • Abstract
    Few-shot learning (FSL) via customization of a deep learning network with limited data has emerged as a promising technique for achieving personalized user experiences on edge devices. However, existing FSL methods primarily assume independent and identically distributed (IID) data and utilize either computationally expensive backpropagation updates for each task or a common model with task-specific prototypes. Unfortunately, the former solution is infeasible for edge devices that lack on-device backpropagation capabilities, while the latter often struggles with limited generalization ability, especially for out-of-distribution (OOD) data. This paper proposes a lightweight, plug-and-play FSL module called Task-aware Normalization (TANO) that enables efficient and task-aware adaptation of a deep neural network without backpropagation. TANO covers the properties of multiple user groups by coordinating the updates of several groups of normalization statistics during meta-training and automatically identifies the appropriate normalization group for a downstream few-shot task. Consequently, TANO provides stable but task-specific estimations of the normalization statistics to close the distribution gaps and achieve efficient model adaptation. Results on both intra-domain and out-of-domain generalization experiments demonstrate that TANO outperforms recent methods in terms of accuracy, inference speed, and model size. Moreover, TANO achieves promising results on widely used FSL benchmarks and on data from real applications.
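
A hypothetical sketch of the normalization-group idea: several sets of BatchNorm statistics are kept, and a group index selects one at inference, so no backpropagation is needed to adapt. Group selection and meta-training details are omitted:

```python
import torch
import torch.nn as nn

class TANONorm(nn.Module):
    """Several groups of normalization statistics; one is picked per task."""
    def __init__(self, channels: int, n_groups: int = 4):
        super().__init__()
        self.norms = nn.ModuleList(nn.BatchNorm2d(channels) for _ in range(n_groups))

    def forward(self, x, group: int):
        return self.norms[group](x)

norm = TANONorm(16).eval()       # meta-trained statistics assumed to be loaded
x = torch.randn(2, 16, 8, 8)
y = norm(x, group=1)             # pick the group matching the few-shot task
print(y.shape)                   # torch.Size([2, 16, 8, 8])
```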

Universally Optimal Deterministic Broadcasting in the HYBRID Distributed Model

  • Authors: Yi-Jun Chang, Oren Hecht, Dean Leitersdorf
  • Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.06317
  • Pdf link: https://arxiv.org/pdf/2304.06317
  • Abstract
    In theoretical computer science, it is a common practice to show existential lower bounds for problems, meaning there is a family of pathological inputs on which no algorithm can do better. However, most inputs of interest can be solved much more efficiently, giving rise to the notion of universally optimal algorithms, which run as fast as possible on every input. Questions on the existence of universally optimal algorithms were first raised by Garay, Kutten, and Peleg in FOCS '93. This research direction reemerged recently through a series of works, including the influential work of Haeupler, Wajc, and Zuzic in STOC '21, which resolves some of these decades-old questions in the supported CONGEST model. We work in the HYBRID distributed model, which analyzes networks combining both global and local communication. Much attention has recently been devoted to solving distance related problems, such as All-Pairs Shortest Paths (APSP) in HYBRID, culminating in a $\tilde \Theta(n^{1/2})$ round algorithm for exact APSP. However, by definition, every problem in HYBRID is solvable in $D$ (diameter) rounds, showing that it is far from universally optimal. We show the first universally optimal algorithms in HYBRID, by presenting a fundamental tool that solves any broadcasting problem in a universally optimal number of rounds, deterministically. Specifically, we consider the problem in a graph $G$ where a set of $k$ messages $M$ distributed arbitrarily across $G$, requires every node to learn all of $M$. We show a universal lower bound and a matching, deterministic upper bound, for any graph $G$, any value $k$, and any distribution of $M$ across $G$. This broadcasting tool opens a new exciting direction of research into showing universally optimal algorithms in HYBRID. As an example, we use it to obtain algorithms for approximate and exact APSP in general and sparse graphs.

Continual Learning of Hand Gestures for Human-Robot Interaction

  • Authors: Xavier Cucurull, Anaís Garrell
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06319
  • Pdf link: https://arxiv.org/pdf/2304.06319
  • Abstract
    In this paper, we present an efficient method to incrementally learn to classify static hand gestures. This method allows users to teach a robot to recognize new symbols in an incremental manner. Contrary to other works which use special sensors or external devices such as color or data gloves, our proposed approach makes use of a single RGB camera to perform static hand gesture recognition from 2D images. Furthermore, our system is able to incrementally learn up to 38 new symbols using only 5 samples for each old class, achieving a final average accuracy of over 90%. In addition, the incremental training time can be reduced to 10% of the time required when using all available data.

An Automotive Case Study on the Limits of Approximation for Object Detection

  • Authors: Martí Caro, Hamid Tabani, Jaume Abella, Francesc Moll, Enric Morancho, Ramon Canal, Josep Altet, Antonio Calomarde, Francisco J. Cazorla, Antonio Rubio, Pau Fontova, Jordi Fornt
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.06327
  • Pdf link: https://arxiv.org/pdf/2304.06327
  • Abstract
    The accuracy of camera-based object detection (CBOD) built upon deep learning is often evaluated against the real objects in frames only. However, such simplistic evaluation ignores the fact that many unimportant objects are small, distant, or background, and hence, their misdetections have less impact than those for closer, larger, and foreground objects in domains such as autonomous driving. Moreover, sporadic misdetections are irrelevant since confidence on detections is typically averaged across consecutive frames, and detection devices (e.g. cameras, LiDARs) are often redundant, thus providing fault tolerance. This paper exploits such intrinsic fault tolerance of the CBOD process, and assesses in an automotive case study to what extent CBOD can tolerate approximation coming from multiple sources such as lower precision arithmetic, approximate arithmetic units, and even random faults due to, for instance, low voltage operation. We show that the accuracy impact of those sources of approximation is within 1% of the baseline even when considering the three approximate domains simultaneously, and hence, multiple sources of approximation can be exploited to build highly efficient accelerators for CBOD in cars.

EF/CF: High Performance Smart Contract Fuzzing for Exploit Generation

  • Authors: Michael Rodler, David Paaßen, Wenting Li, Lukas Bernhard, Thorsten Holz, Ghassan Karame, Lucas Davi
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.06341
  • Pdf link: https://arxiv.org/pdf/2304.06341
  • Abstract
    Smart contracts are increasingly being used to manage large numbers of high-value cryptocurrency accounts. There is a strong demand for automated, efficient, and comprehensive methods to detect security vulnerabilities in a given contract. While the literature features a plethora of analysis methods for smart contracts, the existing proposals do not address the increasing complexity of contracts. Existing analysis tools suffer from false alarms and missed bugs in today's smart contracts that are increasingly defined by complexity and interdependencies. To scale accurate analysis to modern smart contracts, we introduce EF/CF, a high-performance fuzzer for Ethereum smart contracts. In contrast to previous work, EF/CF efficiently and accurately models complex smart contract interactions, such as reentrancy and cross-contract interactions, at a very high fuzzing throughput rate. To achieve this, EF/CF transpiles smart contract bytecode into native C++ code, thereby enabling the reuse of existing, optimized fuzzing toolchains. Furthermore, EF/CF increases fuzzing efficiency by employing a structure-aware mutation engine for smart contract transaction sequences and using a contract's ABI to generate valid transaction inputs. In a comprehensive evaluation, we show that EF/CF scales better -- without compromising accuracy -- to complex contracts compared to state-of-the-art approaches, including other fuzzers, symbolic/concolic execution, and hybrid approaches. Moreover, we show that EF/CF can automatically generate transaction sequences that exploit reentrancy bugs to steal Ether.

DDT: Dual-branch Deformable Transformer for Image Denoising

  • Authors: Kangliang Liu, Xiangcheng Du, Sijie Liu, Yingbin Zheng, Xingjiao Wu, Cheng Jin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06346
  • Pdf link: https://arxiv.org/pdf/2304.06346
  • Abstract
    Transformers are beneficial for image denoising tasks since they can model long-range dependencies, overcoming the limitations of convolutional inductive biases. However, directly applying the transformer structure to remove noise is challenging because its complexity grows quadratically with the spatial resolution. In this paper, we propose an efficient Dual-branch Deformable Transformer (DDT) denoising network which captures both local and global interactions in parallel. We divide features with a fixed patch size and a fixed number of patches in the local and global branches, respectively. In addition, we apply the deformable attention operation in both branches, which helps the network focus on more important regions and further reduces computational complexity. We conduct extensive experiments on real-world and synthetic denoising tasks, and the proposed DDT achieves state-of-the-art performance at significantly lower computational cost.

ODAM: Gradient-based instance-specific visual explanations for object detection

  • Authors: Chenyang Zhao, Antoni B. Chan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06354
  • Pdf link: https://arxiv.org/pdf/2304.06354
  • Abstract
    We propose gradient-weighted Object Detector Activation Maps (ODAM), a visual explanation technique for interpreting the predictions of object detectors. Utilizing the gradients of detector targets flowing into the intermediate feature maps, ODAM produces heat maps that show the influence of regions on the detector's decision for each predicted attribute. Compared to previous work on class activation maps (CAM), ODAM generates instance-specific explanations rather than class-specific ones. We show that ODAM is applicable to both one-stage and two-stage detectors with different types of detector backbones and heads, and produces higher-quality visual explanations than the state-of-the-art, both effectively and efficiently. We next propose a training scheme, Odam-Train, to improve the detector's explanation-based object discrimination by encouraging consistency between explanations for detections of the same object, and distinct explanations for detections of different objects. Based on the heat maps produced by ODAM with Odam-Train, we propose Odam-NMS, which considers the information of the model's explanation for each prediction to distinguish duplicate detected objects. We present a detailed analysis of the visualized explanations of detectors and carry out extensive experiments to validate the effectiveness of the proposed ODAM.
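
A Grad-CAM-flavoured sketch of the core computation (gradient-weighted feature maps for one detection's score); the detector, layer, and hook placement are generic placeholders, not the authors' implementation, and an untrained model may return no detections, hence the guard:

```python
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None).eval()
feats, grads = {}, {}
layer = model.backbone.body.layer4
layer.register_forward_hook(lambda m, i, o: feats.update(v=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

img = torch.randn(3, 320, 320)
out = model([img])[0]
if out["scores"].numel():
    out["scores"][0].backward()                    # one detection's score
    w = grads["v"].mean(dim=(2, 3), keepdim=True)  # per-channel gradient weights
    heatmap = torch.relu((w * feats["v"]).sum(1))  # instance-specific heat map
    print(heatmap.shape)
```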

IBIA: An Incremental Build-Infer-Approximate Framework for Approximate Inference of Partition Function

  • Authors: Shivani Bathla, Vinita Vasudevan
  • Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06366
  • Pdf link: https://arxiv.org/pdf/2304.06366
  • Abstract
    Exact computation of the partition function is known to be intractable, necessitating approximate inference techniques. Existing methods for approximate inference are slow to converge for many benchmarks. The control of accuracy-complexity trade-off is also non-trivial in many of these methods. We propose a novel incremental build-infer-approximate (IBIA) framework for approximate inference that addresses these issues. In this framework, the probabilistic graphical model is converted into a sequence of clique tree forests (SCTF) with bounded clique sizes. We show that the SCTF can be used to efficiently compute the partition function. We propose two new algorithms which are used to construct the SCTF and prove the correctness of both. The first is an algorithm for incremental construction of CTFs that is guaranteed to give a valid CTF with bounded clique sizes and the second is an approximation algorithm that takes a calibrated CTF as input and yields a valid and calibrated CTF with reduced clique sizes as the output. We have evaluated our method using several benchmark sets from recent UAI competitions and our results show good accuracies with competitive runtimes.

An attack resilient policy on the tip pool for DAG-based distributed ledgers

  • Authors: Lianna Zhao, Andrew Culleny, Sebastian Muellerz, Olivia Saay, Robert Shorten
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.06369
  • Pdf link: https://arxiv.org/pdf/2304.06369
  • Abstract
    This paper discusses congestion control and inconsistency problems in DAG-based distributed ledgers and proposes an additional filter to mitigate these issues. Unlike traditional blockchains, DAG-based DLTs use a directed acyclic graph structure to organize transactions, allowing higher scalability and efficiency. However, this also introduces challenges in controlling the rate at which blocks are added to the network and preventing the influence of spam attacks. To address these challenges, we propose a filter to limit the tip pool size and to avoid referencing old blocks. Furthermore, we present experimental results to demonstrate the effectiveness of this filter in reducing the negative impacts of various attacks. Our approach offers a lightweight and efficient solution for managing the flow of blocks in DAG-based DLTs, which can enhance the consistency and reliability of these systems.
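
A toy sketch of such a filter: cap the tip pool size and evict tips too old to be referenced. The constants and the eviction policy are invented for illustration, not taken from the paper:

```python
import time

MAX_TIPS = 32      # hypothetical tip pool size cap
MAX_AGE_S = 30.0   # hypothetical age cutoff for referencing

tip_pool = {}      # block_id -> arrival timestamp

def add_tip(block_id):
    now = time.time()
    # drop tips that are too old to be referenced safely
    for bid in [b for b, t in tip_pool.items() if now - t > MAX_AGE_S]:
        del tip_pool[bid]
    # cap the pool size by evicting the oldest remaining tip
    if len(tip_pool) >= MAX_TIPS:
        del tip_pool[min(tip_pool, key=tip_pool.get)]
    tip_pool[block_id] = now

add_tip("block-001")
print(len(tip_pool))  # 1
```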

Contact Models in Robotics: a Comparative Analysis

  • Authors: Quentin Le Lidec, Wilson Jallet, Louis Montaut, Ivan Laptev, Cordelia Schmid, Justin Carpentier
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06372
  • Pdf link: https://arxiv.org/pdf/2304.06372
  • Abstract
    Physics simulation is ubiquitous in robotics. Whether in model-based approaches (e.g., trajectory optimization), or model-free algorithms (e.g., reinforcement learning), physics simulators are a central component of modern control pipelines in robotics. Over the past decades, several robotic simulators have been developed, each with dedicated contact modeling assumptions and algorithmic solutions. In this article, we survey the main contact models and the associated numerical methods commonly used in robotics for simulating advanced robot motions involving contact interactions. In particular, we recall the physical laws underlying contacts and friction (i.e., Signorini condition, Coulomb's law, and the maximum dissipation principle), and how they are transcribed in current simulators. For each physics engine, we expose their inherent physical relaxations along with their limitations due to the numerical techniques employed. Based on our study, we propose theoretically grounded quantitative criteria on which we build benchmarks assessing both the physical and computational aspects of simulation. We support our work with an open-source and efficient C++ implementation of the existing algorithmic variations. Our results demonstrate that some approximations or algorithms commonly used in robotics can severely widen the reality gap and impact target applications. We hope this work will help motivate the development of new contact models, contact solvers, and robotic simulators in general, at the root of recent progress in motion generation in robotics.

Learning Accurate Performance Predictors for Ultrafast Automated Model Compression

  • Authors: Ziwei Wang, Jiwen Lu, Han Xiao, Shengyu Liu, Jie Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06393
  • Pdf link: https://arxiv.org/pdf/2304.06393
  • Abstract
    In this paper, we propose an ultrafast automated model compression framework called SeerNet for flexible network deployment. Conventional non-differentiable methods discretely search for the desirable compression policy based on the accuracy of exhaustively trained lightweight models, and existing differentiable methods optimize an extremely large supernet to obtain the required compressed model for deployment. Both incur heavy computational cost due to the complex compression policy search and evaluation process. On the contrary, we obtain the optimal efficient networks by directly optimizing the compression policy with an accurate performance predictor, achieving ultrafast automated model compression for various computational cost constraints without complex compression policy search and evaluation. Specifically, we first train the performance predictor on the accuracy of uncertain compression policies actively selected by efficient evolutionary search, so that informative supervision is provided to learn an accurate performance predictor at acceptable cost. Then we leverage the gradient that maximizes the predicted performance under the barrier complexity constraint for ultrafast acquisition of the desirable compression policy, where adaptive update stepsizes with momentum are employed to enhance the optimality of the acquired pruning and quantization strategy. Compared with state-of-the-art automated model compression methods, experimental results on image classification and object detection show that our method achieves competitive accuracy-complexity trade-offs with a significant reduction in search cost.

Fast And Automatic Floating Point Error Analysis With CHEF-FP

  • Authors: Garima Singh, Baidyanath Kundu, Harshitha Menon, Alexander Penev, David J. Lange, Vassil Vassilev
  • Subjects: Numerical Analysis (math.NA); Hardware Architecture (cs.AR); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.06441
  • Pdf link: https://arxiv.org/pdf/2304.06441
  • Abstract
    As we reach the limit of Moore's Law, researchers are exploring different paradigms to achieve unprecedented performance. Approximate Computing (AC), which relies on the ability of applications to tolerate some error in the results to trade-off accuracy for performance, has shown significant promise. Despite the success of AC in domains such as Machine Learning, its acceptance in High-Performance Computing (HPC) is limited due to stringent requirements for accuracy. We need tools and techniques to identify regions of code that are amenable to approximations and their impact on the application output quality to guide developers to employ selective approximation. To this end, we propose CHEF-FP, a flexible, scalable, and easy-to-use source-code transformation tool based on Automatic Differentiation (AD) for analyzing approximation errors in HPC applications. CHEF-FP uses Clad, an efficient AD tool built as a plugin to the Clang compiler and based on the LLVM compiler infrastructure, as a backend and utilizes its AD abilities to evaluate approximation errors in C++ code. CHEF-FP works at the source by injecting error estimation code into the generated adjoints. This enables the error-estimation code to undergo compiler optimizations resulting in improved analysis time and reduced memory usage. We also provide theoretical and architectural augmentations to source code transformation-based AD tools to perform FP error analysis. This paper primarily focuses on analyzing errors introduced by mixed-precision AC techniques. We also show the applicability of our tool in estimating other kinds of errors by evaluating our tool on codes that use approximate functions. Moreover, we demonstrate the speedups CHEF-FP achieved during analysis time compared to the existing state-of-the-art tool due to its ability to generate and insert approximation error estimate code directly into the derivative source.

SpectFormer: Frequency and Attention is what you need in a Vision Transformer

  • Authors: Badri N. Patro, Vinay P. Namboodiri, Vijay Srinivas Agneeswaran
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06446
  • Pdf link: https://arxiv.org/pdf/2304.06446
  • Abstract
    Vision transformers have been applied successfully for image recognition tasks. Architectures have either been multi-headed self-attention based (ViT \cite{dosovitskiy2020image}, DeiT \cite{touvron2021training}), similar to the original work in textual models, or, more recently, based on spectral layers (FNet \cite{lee2021fnet}, GFNet \cite{rao2021global}, AFNO \cite{guibas2021efficient}). We hypothesize that both spectral and multi-headed attention layers play a major role. We investigate this hypothesis through this work and observe that combining spectral and multi-headed attention layers indeed provides a better transformer architecture. We thus propose the novel SpectFormer architecture for transformers that combines spectral and multi-headed attention layers. We believe that the resulting representation allows the transformer to capture the feature representation appropriately, and it yields improved performance over other transformer representations. For instance, it improves the top-1 accuracy by 2% on ImageNet compared to both GFNet-H and LiT. SpectFormer-S reaches 84.25% top-1 accuracy on ImageNet-1K (state of the art for the small version). Further, SpectFormer-L achieves 85.7%, which is the state of the art for the comparable base version of the transformers. We further ensure that we obtain reasonable results in other scenarios such as transfer learning on standard datasets such as CIFAR-10, CIFAR-100, Oxford-IIIT-flower, and Stanford Cars. We then investigate its use in downstream tasks such as object detection and instance segmentation on the MS-COCO dataset and observe that SpectFormer shows consistent performance that is comparable to the best backbones and can be further optimized and improved. Hence, we believe that combined spectral and attention layers are what vision transformers need.
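
    A rough sketch of the two building blocks being combined (the learnable-filter design is GFNet-style and the layer sizes are assumptions; this is not the official SpectFormer implementation, which stacks spectral blocks in early stages and attention blocks later):

```python
import torch
import torch.nn as nn

class SpectralBlock(nn.Module):
    def __init__(self, h, w, dim):
        super().__init__()
        # Learnable complex filter over the 2D frequency grid.
        self.filter = nn.Parameter(torch.randn(h, w // 2 + 1, dim, 2) * 0.02)

    def forward(self, x):               # x: (batch, h, w, dim)
        freq = torch.fft.rfft2(x, dim=(1, 2))
        freq = freq * torch.view_as_complex(self.filter)
        return torch.fft.irfft2(freq, s=x.shape[1:3], dim=(1, 2))

class AttentionBlock(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):               # x: (batch, tokens, dim)
        out, _ = self.attn(x, x, x)
        return out

x = torch.randn(2, 16, 16, 64)
print(SpectralBlock(16, 16, 64)(x).shape)          # spectral mixing
print(AttentionBlock(64)(x.reshape(2, 256, 64)).shape)  # attention mixing
```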

CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input

  • Authors: Senmao Tian, Ming Lu, Jiaming Liu, Yandong Guo, Yurong Chen, Shunli Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06454
  • Pdf link: https://arxiv.org/pdf/2304.06454
  • Abstract
    With the development of high-definition display devices, the practical scenario of Super-Resolution (SR) usually needs to super-resolve large input like 2K to higher resolution (4K/8K). To reduce the computational and memory cost, current methods first split the large input into local patches and then merge the SR patches into the output. These methods adaptively allocate a subnet for each patch. Quantization is a very important technique for network acceleration and has been used to design the subnets. Current methods train an MLP bit selector to determine the proper bit for each layer. However, they uniformly sample subnets for training, making simple subnets overfitted and complicated subnets underfitted. Therefore, the trained bit selector fails to determine the optimal bit. Apart from this, the introduced bit selector brings additional cost to each layer of the SR network. In this paper, we propose a novel method named Content-Aware Bit Mapping (CABM), which can remove the bit selector without any performance loss. CABM also learns a bit selector for each layer during training. After training, we analyze the relation between the edge information of an input patch and the bit of each layer. We observe that the edge information can be an effective metric for the selected bit. Therefore, we design a strategy to build an Edge-to-Bit lookup table that maps the edge score of a patch to the bit of each layer during inference. The bit configuration of the SR network can be determined by the lookup tables of all layers. Our strategy can find better bit configurations, resulting in more efficient mixed precision networks. We conduct detailed experiments to demonstrate the generalization ability of our method. The code will be released.
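
    A small sketch of what an Edge-to-Bit lookup could look like at inference time (the edge metric, thresholds, and bitwidths are all assumptions; CABM calibrates its tables from the trained bit selector):

```python
import numpy as np

def edge_score(patch: np.ndarray) -> float:
    """Mean gradient magnitude of a grayscale patch as a simple edge metric."""
    gy, gx = np.gradient(patch.astype(np.float32))
    return float(np.hypot(gx, gy).mean())

# One hypothetical table per layer: edge-score thresholds -> bitwidths.
EDGE_TO_BIT = [
    {"thresholds": [2.0, 8.0], "bits": [4, 6, 8]},   # layer 0
    {"thresholds": [3.0, 10.0], "bits": [4, 8, 8]},  # layer 1
]

def bits_for_patch(patch: np.ndarray) -> list[int]:
    s = edge_score(patch)
    return [t["bits"][np.searchsorted(t["thresholds"], s)] for t in EDGE_TO_BIT]

patch = np.random.rand(96, 96) * 255
print(bits_for_patch(patch))   # e.g., [8, 8] for a high-detail patch
```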

Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages

  • Authors: Israel Abebe Azime, Sana Sabah Al-Azzawi, Atnafu Lambebo Tonja, Iyanuoluwa Shode, Jesujoba Alabi, Ayodele Awokoya, Mardiyyah Oduwole, Tosin Adewumi, Samuel Fanijo, Oyinkansola Awosan, Oreen Yousuf
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06459
  • Pdf link: https://arxiv.org/pdf/2304.06459
  • Abstract
    This paper describes our participation in the AfriSenti-SemEval Shared Task 12 of SemEval-2023. The task aims to perform monolingual sentiment classification (sub-task A) for 12 African languages, multilingual sentiment classification (sub-task B), and zero-shot sentiment classification (sub-task C). For sub-task A, we conducted experiments using classical machine learning classifiers, Afro-centric language models, and language-specific models. For sub-task B, we fine-tuned multilingual pre-trained language models that support many of the languages in the task. For sub-task C, we made use of a parameter-efficient Adapter approach that leverages monolingual texts in the target language for effective zero-shot transfer. Our findings suggest that using pre-trained Afro-centric language models improves performance for low-resource African languages. We also ran experiments using adapters for zero-shot tasks, and the results suggest that we can obtain promising results by using adapters with a limited amount of resources.

Repositioning Tiered HotSpot Execution Performance Relative to the Interpreter

  • Authors: Jonathan Lambert, Kevin Casey, Rosemary Monahan
  • Subjects: Programming Languages (cs.PL); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.06460
  • Pdf link: https://arxiv.org/pdf/2304.06460
  • Abstract
    Although the advantages of just-in-time compilation over traditional interpretive execution are widely recognised, little current research investigates and repositions the performance differences between these two execution models relative to contemporary workloads. Specifically, there is a need to examine the performance differences between Java Runtime Environment (JRE) Java Virtual Machine (JVM) tiered execution and JRE JVM interpretive execution relative to modern multicore architectures and modern concurrent and parallel benchmark workloads. This article aims to fill this research gap by presenting the results of a study that compares the performance of these two execution models under load from the Renaissance Benchmark Suite. This research is relevant to anyone interested in understanding the performance differences between just-in-time compiled code and interpretive execution. It provides a contemporary assessment of the interpretive JVM core, the entry and starting point for bytecode execution, relative to just-in-time tiered execution. The study considers factors such as the JRE version, the GNU GCC version used in the JRE build toolchain, and the garbage collector algorithm specified at runtime, and their impact on the performance difference envelope between interpretive and tiered execution. Our findings indicate that tiered execution is considerably more efficient than interpretive execution, and the performance gap has increased, ranging from 4 to 37 times more efficient. On average, tiered execution is approximately 15 times more efficient than interpretive execution. Additionally, the performance differences between interpretive and tiered execution are influenced by workload category, with narrower performance differences observed for web-based workloads and more significant differences for Functional and Scala-type workloads.

Towards Understanding the Benefits and Challenges of Demand Responsive Public Transit- A Case Study in the City of Charlotte, NC

  • Authors: Sanaz Sadat Hosseini, Mona Azarbayjani, Jason Lawrence, Hamed Tabkhi
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06467
  • Pdf link: https://arxiv.org/pdf/2304.06467
  • Abstract
    Access to adequate public transportation plays a critical role in inequity and socio-economic mobility, particularly in low-income communities. Low-income workers who rely heavily on public transportation face a spatial disparity between home and work, which leads to higher unemployment, longer job searches, and longer commute times. The overarching goal of this study is to gather initial data that would support creating a connected, coordinated, demand-responsive, and efficient public bus system that minimizes transit gaps for low-income, transit-dependent communities. To create equitable metropolitan public transportation, this paper evaluates existing CATS mobile applications that assist passengers in finding bus routes and arrival times. Our community survey methodology includes filling out questionnaires on Charlotte's current bus system on specific bus lines and determining user acceptance of a future novel smart technology. We have also collected data on the demand and transit gap for a real-world pilot study covering the Sprinter bus line, Bus line 7, Bus line 9, and Bus lines 97-99. These lines connect all of Charlotte City's main areas and are the most important bus lines in the system. On the studied routes, the primary survey results indicate that the current bus system has many flaws, the major one being the lack of proper timing to meet the needs of passengers. The most common problems are long commutes and long waiting times at stations. Moreover, the existing application provides inaccurate information, and on average, 80 percent of travelers and respondents are inclined to use new technology.

An Efficient Transfer Learning-based Approach for Apple Leaf Disease Classification

  • Authors: Md. Hamjajul Ashmafee, Tasnim Ahmed, Sabbir Ahmed, Md. Bakhtiar Hasan, Mst Nura Jahan, A.B.M. Ashikur Rahman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06520
  • Pdf link: https://arxiv.org/pdf/2304.06520
  • Abstract
    Correct identification and categorization of plant diseases are crucial for ensuring the safety of the global food supply and the overall financial success of stakeholders. In this regard, a wide range of solutions has been made available by introducing deep learning-based classification systems for different staple crops. Despite apples being one of the most important commercial crops in many parts of the globe, research proposing a smart solution for automatically classifying apple leaf diseases remains relatively unexplored. This study presents a technique for identifying apple leaf diseases based on transfer learning. The system extracts features using a pretrained EfficientNetV2S architecture and passes them to a classifier block for effective prediction. The class imbalance issues are tackled by utilizing runtime data augmentation. The effect of various hyperparameters, such as input resolution, learning rate, and number of epochs, has been investigated carefully. The competence of the proposed pipeline has been evaluated on the apple leaf disease subset from the publicly available 'PlantVillage' dataset, where it achieved an accuracy of 99.21%, outperforming the existing works.
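
    The transfer-learning recipe described here is standard; a minimal sketch using torchvision's EfficientNetV2-S (the head design and hyperparameters are assumptions, not the authors' exact pipeline):

```python
import torch
import torch.nn as nn
from torchvision import models

# PlantVillage apple subset: healthy, apple scab, black rot, cedar apple rust.
num_classes = 4

backbone = models.efficientnet_v2_s(
    weights=models.EfficientNet_V2_S_Weights.DEFAULT)  # ImageNet pretraining
in_features = backbone.classifier[1].in_features
backbone.classifier = nn.Sequential(      # replace the head for 4 classes
    nn.Dropout(p=0.3),
    nn.Linear(in_features, num_classes),
)

optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```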

Multi-kernel Correntropy-based Orientation Estimation of IMUs: Gradient Descent Methods

  • Authors: Shilei Li, Lijing Li, Dawei Shi, Yunjiang Lou, Ling Shi
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.06548
  • Pdf link: https://arxiv.org/pdf/2304.06548
  • Abstract
    This paper presents two computationally efficient algorithms for the orientation estimation of inertial measurement units (IMUs): the correntropy-based gradient descent (CGD) and the correntropy-based decoupled orientation estimation (CDOE). Traditional methods, such as gradient descent (GD) and decoupled orientation estimation (DOE), rely on the mean squared error (MSE) criterion, making them vulnerable to external acceleration and magnetic interference. To address this issue, we demonstrate that the multi-kernel correntropy loss (MKCL) is an optimal objective function for maximum likelihood estimation (MLE) when the noise follows a type of heavy-tailed distribution. In certain situations, the estimation error of the MKCL is bounded even in the presence of arbitrarily large outliers. By replacing the standard MSE cost function with MKCL, we develop the CGD and CDOE algorithms. We evaluate the effectiveness of our proposed methods by comparing them with existing algorithms in various situations. Experimental results indicate that our proposed methods (CGD and CDOE) outperform their conventional counterparts (GD and DOE), especially when faced with external acceleration and magnetic disturbances. Furthermore, the new algorithms demonstrate significantly lower computational complexity than Kalman filter-based approaches, making them suitable for applications with low-cost microprocessors.
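
    For intuition, a multi-kernel correntropy-style loss is a mixture of Gaussian kernels over the residual that saturates for large outliers; the bandwidths and weights below are illustrative assumptions, not the paper's tuned values:

```python
import numpy as np

def mkc_loss(residual, sigmas=(0.5, 2.0), weights=(0.5, 0.5)):
    """Sum over residuals of sum_j w_j * (1 - exp(-r^2 / (2 * sigma_j^2)))."""
    r2 = np.square(residual)
    loss = 0.0
    for w, s in zip(weights, sigmas):
        loss += w * (1.0 - np.exp(-r2 / (2.0 * s * s)))
    return loss.sum()

# A huge outlier contributes at most sum(weights), unlike the unbounded MSE.
print(mkc_loss(np.array([0.1, 0.2, 50.0])))
```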

Multiscale Finite Element Formulations for 2D/1D Problems

  • Authors: Karl Hollaus, Markus Schöbinger
  • Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
  • Arxiv link: https://arxiv.org/abs/2304.06553
  • Pdf link: https://arxiv.org/pdf/2304.06553
  • Abstract
    Multiscale finite element methods for 2D/1D problems have been studied in this work to demonstrate their excellent ability to solve real-world problems. These methods are much more efficient than conventional 3D finite element methods and just as accurate. The 2D/1D multiscale finite element methods are based on a magnetic vector potential or a current vector potential. Known currents for excitation can be replaced by the Biot-Savart field. Boundary conditions make it possible to integrate planes of symmetry. All presented approaches consider eddy currents and an insulation layer, and preserve the edge effect. A segment of a fictitious electrical machine has been studied to demonstrate all of the above options, the accuracy, and the low computational costs of the 2D/1D multiscale finite element methods.

Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation

  • Authors: Mohit Sharma, Claudio Fantacci, Yuxiang Zhou, Skanda Koppula, Nicolas Heess, Jon Scholz, Yusuf Aytar
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06600
  • Pdf link: https://arxiv.org/pdf/2304.06600
  • Abstract
    Recent works have shown that large models pretrained on common visual learning tasks can provide useful representations for a wide range of specialized perception problems, as well as a variety of robotic manipulation tasks. While prior work on robotic manipulation has predominantly used frozen pretrained features, we demonstrate that in robotics this approach can fail to reach optimal performance, and that fine-tuning of the full model can lead to significantly better results. Unfortunately, fine-tuning disrupts the pretrained visual representation, and causes representational drift towards the fine-tuned task thus leading to a loss of the versatility of the original model. We introduce "lossless adaptation" to address this shortcoming of classical fine-tuning. We demonstrate that appropriate placement of our parameter efficient adapters can significantly reduce the performance gap between frozen pretrained representations and full end-to-end fine-tuning without changes to the original representation and thus preserving original capabilities of the pretrained model. We perform a comprehensive investigation across three major model architectures (ViTs, NFNets, and ResNets), supervised (ImageNet-1K classification) and self-supervised pretrained weights (CLIP, BYOL, Visual MAE) in 3 task domains and 35 individual tasks, and demonstrate that our claims are strongly validated in various settings.
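
    The generic bottleneck-adapter pattern behind "lossless adaptation" can be sketched as follows (sizes are illustrative assumptions): the frozen backbone is untouched, the residual branch starts as an identity, and removing the adapter recovers the original model exactly.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)   # branch outputs zero at init,
        nn.init.zeros_(self.up.bias)     # so training starts from identity

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

features = torch.randn(2, 10, 256)       # frozen backbone activations
print(Adapter(256)(features).shape)      # same shape, original path preserved
```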

Robustness Measures and Monitors for Time Window Temporal Logic

  • Authors: Ahmad Ahmad, Cristian-Ioan Vasile, Roberto Tron, Calin Belta
  • Subjects: Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.06645
  • Pdf link: https://arxiv.org/pdf/2304.06645
  • Abstract
    Temporal logics (TLs) have been widely used to formalize interpretable tasks for cyber-physical systems. Time Window Temporal Logic (TWTL) has been recently proposed as a specification language for dynamical systems. In particular, it can easily express robotic tasks, and it allows for efficient, automata-based verification and synthesis of control policies for such systems. In this paper, we define two quantitative semantics for this logic, and two corresponding monitoring algorithms, which allow for real-time quantification of satisfaction of formulas by trajectories of discrete-time systems. We demonstrate the new semantics and their runtime monitors on numerical examples.
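
    To give a flavor of window-based quantitative semantics (this mirrors the spirit of the paper, not its exact TWTL definitions): the robustness of "the predicate holds throughout [a, b]" is the worst margin in the window, and of "it holds at some point in [a, b]" the best margin.

```python
import numpy as np

def rho_always(margins: np.ndarray, a: int, b: int) -> float:
    return float(margins[a:b + 1].min())

def rho_eventually(margins: np.ndarray, a: int, b: int) -> float:
    return float(margins[a:b + 1].max())

# margins[t] = g(x_t), e.g., signed distance inside a goal region
# (positive means the predicate is satisfied at step t).
margins = np.array([-0.2, 0.1, 0.4, 0.3, -0.1])
print(rho_always(margins, 1, 3), rho_eventually(margins, 0, 4))
```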

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

  • Authors: Enze Xie, Lewei Yao, Han Shi, Zhili Liu, Daquan Zhou, Zhaoqiang Liu, Jiawei Li, Zhenguo Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06648
  • Pdf link: https://arxiv.org/pdf/2304.06648
  • Abstract
    Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications. This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enables fast adaptation to new domains. DiffFit is embarrassingly simple: it only fine-tunes the bias terms and newly added scaling factors in specific layers, yet this results in significant training speed-ups and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves a 2$\times$ training speed-up and only needs to store approximately 0.12% of the total model parameters. An intuitive theoretical analysis is provided to justify the efficacy of scaling factors for fast adaptation. On 8 downstream datasets, DiffFit achieves superior or competitive performance compared to full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one at minimal cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on the ImageNet 512$\times$512 benchmark by fine-tuning only 25 epochs from a public pre-trained ImageNet 256$\times$256 checkpoint, while being 30$\times$ more training efficient than the closest competitor.
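
    The selective-freezing recipe is easy to sketch: freeze everything, then re-enable gradients only for bias terms and newly added scale factors. The gamma placement below is an assumption for illustration, not DiffFit's exact layer choice.

```python
import torch
import torch.nn as nn

class ScaledLinear(nn.Module):
    """A pre-trained linear layer with a trainable scaling factor on its output."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.linear = linear
        self.gamma = nn.Parameter(torch.ones(linear.out_features))

    def forward(self, x):
        return self.gamma * self.linear(x)

def mark_difffit_params(model: nn.Module) -> int:
    trainable = 0
    for name, p in model.named_parameters():
        p.requires_grad = name.endswith("bias") or "gamma" in name
        trainable += p.numel() * p.requires_grad
    return trainable

model = nn.Sequential(ScaledLinear(nn.Linear(128, 128)), nn.ReLU(),
                      ScaledLinear(nn.Linear(128, 128)))
print(mark_difffit_params(model), "trainable parameters")  # biases + gammas only
```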

DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer

  • Authors: Amit Kumar Rana, Sabarinath Mahadevan, Alexander Hermans, Bastian Leibe
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06668
  • Pdf link: https://arxiv.org/pdf/2304.06668
  • Abstract
    Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth annotations for training, which are expensive to create. Interactive segmentation networks help generate such annotations based on an image and the corresponding user interactions such as clicks. Existing methods for this task can only process a single instance at a time and each user interaction requires a full forward pass through the entire deep network. We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries to a Transformer decoder with a potential to segment multiple object instances in a single iteration. Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image when compared to other methods. DynaMITe achieves state-of-the-art results on multiple existing interactive segmentation benchmarks, and also on the new multi-instance benchmark that we propose in this paper.

Inertia-Aware Microgrid Investment Planning Using Tractable Decomposition Algorithms

  • Authors: Agnes Marjorie Nakiganda, Shahab Dehghan, Petros Aristidou
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06674
  • Pdf link: https://arxiv.org/pdf/2304.06674
  • Abstract
    The integration of the frequency dynamics into Micro-Grid (MG) investment and operational planning problems is vital in ensuring the security of the system in the post-contingency states. However, the task of including transient security constraints in MG planning problems is non-trivial. This is due to the highly non-linear and non-convex nature of the analytical closed form of the frequency metrics (e.g., frequency nadir) and power flow constraints. To handle this issue, this paper presents two algorithms for decomposing the MG investment planning problem into multiple levels to enhance computational tractability and optimality. Furthermore, the sensitivity of the decisions made at each level is captured by corresponding dual cutting planes to model feasible secure regions. This, in turn, ensures both the optimal determination and placement of inertia services and accelerates the convergence of the proposed decomposition algorithms. The efficient and effective performance of the proposed algorithms is tested and verified on an 18-bus Low Voltage (LV) network and a 30-bus Medium Voltage (MV) network under various operating scenarios.

OKRidge: Scalable Optimal k-Sparse Ridge Regression for Learning Dynamical Systems

  • Authors: Jiachang Liu, Sam Rosen, Chudi Zhong, Cynthia Rudin
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.06686
  • Pdf link: https://arxiv.org/pdf/2304.06686
  • Abstract
    We consider an important problem in scientific discovery, identifying sparse governing equations for nonlinear dynamical systems. This involves solving sparse ridge regression problems to provable optimality in order to determine which terms drive the underlying dynamics. We propose a fast algorithm, OKRidge, for sparse ridge regression, using a novel lower bound calculation involving, first, a saddle point formulation, and from there, either solving (i) a linear system or (ii) using an ADMM-based approach, where the proximal operators can be efficiently evaluated by solving another linear system and an isotonic regression problem. We also propose a method to warm-start our solver, which leverages a beam search. Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.
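
    For context, a naive greedy heuristic for the same k-sparse ridge problem looks as follows; OKRidge's contribution is solving this to certified optimality via its saddle-point lower bounds, which the greedy sketch below cannot do.

```python
import numpy as np

def greedy_k_sparse_ridge(X, y, k, lam=1e-2):
    """Forward selection: repeatedly add the feature that most reduces
    the ridge objective ||y - Xw||^2 + lam * ||w||^2 on the support."""
    n, d = X.shape
    support = []
    for _ in range(k):
        best, best_obj = None, np.inf
        for j in range(d):
            if j in support:
                continue
            S = support + [j]
            A = X[:, S]
            w = np.linalg.solve(A.T @ A + lam * np.eye(len(S)), A.T @ y)
            obj = np.sum((y - A @ w) ** 2) + lam * np.sum(w ** 2)
            if obj < best_obj:
                best, best_obj = j, obj
        support.append(best)
    return support

X = np.random.randn(100, 12)
y = X[:, [1, 4, 7]] @ np.array([2.0, -3.0, 1.5])
print(greedy_k_sparse_ridge(X, y, k=3))   # recovers {1, 4, 7} here
```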

Representing Volumetric Videos as Dynamic MLP Maps

  • Authors: Sida Peng, Yunzhi Yan, Qing Shuai, Hujun Bao, Xiaowei Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06717
  • Pdf link: https://arxiv.org/pdf/2304.06717
  • Abstract
    This paper introduces a novel representation of volumetric videos for real-time view synthesis of dynamic scenes. Recent advances in neural scene representations demonstrate their remarkable capability to model and render complex static scenes, but extending them to represent dynamic scenes is not straightforward due to their slow rendering speed or high storage cost. To solve this problem, our key idea is to represent the radiance field of each frame as a set of shallow MLP networks whose parameters are stored in 2D grids, called MLP maps, and dynamically predicted by a 2D CNN decoder shared by all frames. Representing 3D scenes with shallow MLPs significantly improves the rendering speed, while dynamically predicting MLP parameters with a shared 2D CNN instead of explicitly storing them leads to low storage cost. Experiments show that the proposed approach achieves state-of-the-art rendering quality on the NHR and ZJU-MoCap datasets, while being efficient for real-time rendering with a speed of 41.7 fps for $512 \times 512$ images on an RTX 3090 GPU. The code is available at https://zju3dv.github.io/mlp_maps/.
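
    A toy sketch of the "MLP map" idea (all sizes and the parameter layout are assumptions for illustration): a shared 2D CNN head predicts a grid of parameter vectors, and each query point is decoded by the tiny MLP stored in its grid cell.

```python
import torch
import torch.nn as nn

HIDDEN = 8  # width of each shallow per-cell MLP (in=3, hidden=8, out=4)
N_PARAMS = 3 * HIDDEN + HIDDEN + HIDDEN * 4 + 4   # weights + biases

decoder = nn.Conv2d(64, N_PARAMS, kernel_size=1)  # CNN head -> MLP map

def query(mlp_map, cell_xy, xyz):
    """Evaluate the tiny MLP stored at one grid cell on a 3D point."""
    p = mlp_map[:, cell_xy[1], cell_xy[0]]
    i = 0
    w1 = p[i:i + 3 * HIDDEN].reshape(HIDDEN, 3); i += 3 * HIDDEN
    b1 = p[i:i + HIDDEN]; i += HIDDEN
    w2 = p[i:i + HIDDEN * 4].reshape(4, HIDDEN); i += HIDDEN * 4
    b2 = p[i:i + 4]
    h = torch.relu(w1 @ xyz + b1)
    return w2 @ h + b2                    # e.g., RGB + density

feat = torch.randn(1, 64, 32, 32)         # per-frame feature map from the CNN
mlp_map = decoder(feat)[0]                # (N_PARAMS, 32, 32)
print(query(mlp_map, (5, 7), torch.randn(3)))
```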

Keyword: faster

Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays

  • Authors: Chen Xie, Francesco Daghero, Yukai Chen, Marco Castellano, Luca Gandolfi, Andrea Calimera, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06059
  • Pdf link: https://arxiv.org/pdf/2304.06059
  • Abstract
    Ultra-low-resolution Infrared (IR) array sensors offer a low-cost, energy-efficient, and privacy-preserving solution for people counting, with applications such as occupancy monitoring. Previous work has shown that Deep Learning (DL) can yield superior performance on this task. However, the literature was missing an extensive comparative analysis of various efficient DL architectures for IR array-based people counting, that considers not only their accuracy, but also the cost of deploying them on memory- and energy-constrained Internet of Things (IoT) edge nodes. In this work, we address this need by comparing 6 different DL architectures on a novel dataset composed of IR images collected from a commercial 8x8 array, which we made openly available. With a wide architectural exploration of each model type, we obtain a rich set of Pareto-optimal solutions, spanning cross-validated balanced accuracy scores in the 55.70-82.70% range. When deployed on a commercial Microcontroller (MCU) by STMicroelectronics, the STM32L4A6ZG, these models occupy 0.41-9.28kB of memory, and require 1.10-7.74ms per inference, while consuming 17.18-120.43 $\mu$J of energy. Our models are significantly more accurate than a previous deterministic method (up to +39.9%), while being up to 3.53x faster and more energy efficient. Further, our models' accuracy is comparable to state-of-the-art DL solutions on similar resolution sensors, despite a much lower complexity. All our models enable continuous, real-time inference on a MCU-based IoT node, with years of autonomous operation without battery recharging.

Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

  • Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.06178
  • Pdf link: https://arxiv.org/pdf/2304.06178
  • Abstract
    Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of the voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data while maintaining computational efficiency, running substantially faster than the baseline method NeuralRGBD.

Beyond the Quadratic Time Barrier for Network Unreliability

  • Authors: Ruoxu Cen, William He, Jason Li, Debmalya Panigrahi
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.06552
  • Pdf link: https://arxiv.org/pdf/2304.06552
  • Abstract
    Karger (STOC 1995) gave the first FPTAS for the network (un)reliability problem, setting in motion research over the next three decades that obtained increasingly faster running times, eventually leading to a $\tilde{O}(n^2)$-time algorithm (Karger, STOC 2020). This represented a natural culmination of this line of work because the algorithmic techniques used can enumerate $\Theta(n^2)$ (near)-minimum cuts. In this paper, we go beyond this quadratic barrier and obtain a faster algorithm for the network unreliability problem. Our algorithm runs in $m^{1+o(1)} + \tilde{O}(n^{1.5})$ time. Our main contribution is a new estimator for network unreliability in very reliable graphs. These graphs are usually the bottleneck for network unreliability since the disconnection event is elusive. Our estimator is obtained by defining an appropriate importance sampling subroutine on a dual spanning tree packing of the graph. To complement this estimator for very reliable graphs, we use recursive contraction for moderately reliable graphs. We show that an interleaving of sparsification and contraction can be used to obtain a better parametrization of the recursive contraction algorithm that yields a faster running time matching the one obtained for the very reliable case.
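
    As a baseline for intuition, the brute-force Monte Carlo estimator for unreliability is simple to write down (the paper's point is that it degrades when disconnection is extremely rare, which its importance-sampling estimator fixes):

```python
import random

def disconnects(n, edges, failed):
    """Union-find connectivity check after removing the failed edges."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    for i, (u, v) in enumerate(edges):
        if i not in failed:
            parent[find(u)] = find(v)
    return len({find(i) for i in range(n)}) > 1

def unreliability(n, edges, p, samples=100_000):
    hits = 0
    for _ in range(samples):
        failed = {i for i in range(len(edges)) if random.random() < p}
        hits += disconnects(n, edges, failed)
    return hits / samples

# 4-cycle with edge failure probability 0.1.
print(unreliability(4, [(0, 1), (1, 2), (2, 3), (3, 0)], p=0.1))
```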

Class-Incremental Learning of Plant and Disease Detection: Growing Branches with Knowledge Distillation

  • Authors: Mathieu Pagé Fortin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06619
  • Pdf link: https://arxiv.org/pdf/2304.06619
  • Abstract
    This paper investigates the problem of class-incremental object detection for agricultural applications where a model needs to learn new plant species and diseases incrementally without forgetting the previously learned ones. We adapt two public datasets to include new categories over time, simulating a more realistic and dynamic scenario. We then compare three class-incremental learning methods that leverage different forms of knowledge distillation to mitigate catastrophic forgetting. Our experiments show that all three methods suffer from catastrophic forgetting, but the recent Dynamic Y-KD approach, which additionally uses a dynamic architecture that grows new branches to learn new tasks, outperforms ILOD and Faster-ILOD in most scenarios on both new and old classes. These results highlight the challenges and opportunities of continual object detection for agricultural applications. In particular, the large intra-class and small inter-class variability that is typical of plant images exacerbates the difficulty of learning new categories without interfering with previous knowledge. We publicly release our code to encourage future work.
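
    The common ingredient of the compared methods is a distillation term; a generic logit-distillation sketch (temperature and weighting are illustrative assumptions, and the compared detectors each add their own variants):

```python
import torch
import torch.nn.functional as F

def distillation_loss(new_logits, old_logits, T=2.0):
    """KL between softened old/new distributions over the old classes only;
    the frozen old model regularizes the new one against forgetting."""
    k = old_logits.shape[-1]                        # number of old classes
    p_old = F.softmax(old_logits / T, dim=-1)
    log_p_new = F.log_softmax(new_logits[..., :k] / T, dim=-1)
    return F.kl_div(log_p_new, p_old, reduction="batchmean") * T * T

new_logits = torch.randn(8, 10)   # 10 classes after the increment
old_logits = torch.randn(8, 6)    # 6 previously learned classes
print(distillation_loss(new_logits, old_logits))
```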

OKRidge: Scalable Optimal k-Sparse Ridge Regression for Learning Dynamical Systems

  • Authors: Jiachang Liu, Sam Rosen, Chudi Zhong, Cynthia Rudin
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.06686
  • Pdf link: https://arxiv.org/pdf/2304.06686
  • Abstract
    We consider an important problem in scientific discovery, identifying sparse governing equations for nonlinear dynamical systems. This involves solving sparse ridge regression problems to provable optimality in order to determine which terms drive the underlying dynamics. We propose a fast algorithm, OKRidge, for sparse ridge regression, using a novel lower bound calculation involving, first, a saddle point formulation, and from there, either solving (i) a linear system or (ii) using an ADMM-based approach, where the proximal operators can be efficiently evaluated by solving another linear system and an isotonic regression problem. We also propose a method to warm-start our solver, which leverages a beam search. Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.

Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields

  • Authors: Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, Peter Hedman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06706
  • Pdf link: https://arxiv.org/pdf/2304.06706
  • Abstract
    Neural Radiance Field training can be accelerated through the use of grid-based representations in NeRF's learned mapping from spatial coordinates to colors and volumetric density. However, these grid-based approaches lack an explicit understanding of scale and therefore often introduce aliasing, usually in the form of jaggies or missing scene content. Anti-aliasing has previously been addressed by mip-NeRF 360, which reasons about sub-volumes along a cone rather than points along a ray, but this approach is not natively compatible with current grid-based techniques. We show how ideas from rendering and signal processing can be used to construct a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP to yield error rates that are 8% - 76% lower than either prior technique, and that trains 22x faster than mip-NeRF 360.

Keyword: mobile

Situational-Aware Multi-Graph Convolutional Recurrent Network (SA-MGCRN) for Travel Demand Forecasting During Wildfires

  • Authors: Xiaojian Zhang, Xilei Zhao, Yiming Xu, Ruggiero Lovreglio, Daniel Nilsson
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06233
  • Pdf link: https://arxiv.org/pdf/2304.06233
  • Abstract
    Real-time forecasting of travel demand during wildfire evacuations is crucial for emergency managers and transportation planners to make timely and better-informed decisions. However, few studies focus on accurate travel demand forecasting in large-scale emergency evacuations. Therefore, this study develops and tests a new methodological framework for modeling trip generation in wildfire evacuations by using (a) large-scale GPS data generated by mobile devices and (b) state-of-the-art AI technologies. The proposed methodology aims at forecasting evacuation trips and other types of trips. Based on the travel demand inferred from the GPS data, we develop a new deep learning model, i.e., Situational-Aware Multi-Graph Convolutional Recurrent Network (SA-MGCRN), along with a model updating scheme to achieve real-time forecasting of travel demand during wildfire evacuations. The proposed methodological framework is tested in this study for a real-world case study: the 2019 Kincade Fire in Sonoma County, CA. The results show that SA-MGCRN significantly outperforms all the selected state-of-the-art benchmarks in terms of prediction performance. Our finding suggests that the most important model components of SA-MGCRN are evacuation order/warning information, proximity to fire, and population change, which are consistent with behavioral theories and empirical findings.

Loosely Coupled Odometry, UWB Ranging, and Cooperative Spatial Detection for Relative Monte-Carlo Multi-Robot Localization

  • Authors: Xianjia Yu, Paola Torrico Morón, Sahar Salimpour, Jorge Peña Queralta, Tomi Westerlund
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06264
  • Pdf link: https://arxiv.org/pdf/2304.06264
  • Abstract
    As mobile robots become more ubiquitous, their deployments grow across use cases where GNSS positioning is either unavailable or unreliable. This has led to increased interest in multi-modal relative localization methods. Complementing onboard odometry, ranging allows for relative state estimation, with ultra-wideband (UWB) ranging having gained widespread recognition due to its low cost and centimeter-level out-of-box accuracy. Infrastructure-free localization methods allow for more dynamic, ad-hoc, and flexible deployments, yet they have received less attention from the research community. In this work, we propose a cooperative relative multi-robot localization approach in which we leverage inter-robot ranging and simultaneous spatial detections of objects in the environment. To achieve this, we equip robots with a single UWB transceiver and a stereo camera. We propose a novel Monte-Carlo approach to estimate relative states by either employing only UWB ranges or dynamically integrating simultaneous spatial detections from the stereo cameras. We also address the challenges of UWB ranging error mitigation, especially in non-line-of-sight, with a study on different LSTM networks to estimate the ranging error. The proposed approach has multiple benefits. First, we show that a single range is enough to estimate the accurate relative states of two robots when fusing odometry measurements. Second, our experiments also demonstrate that our approach surpasses traditional methods such as multilateration in terms of accuracy. Third, to increase accuracy even further, we allow for the integration of cooperative spatial detections. Finally, we show how ROS 2 and Zenoh can be integrated to build a scalable wireless communication solution for multi-robot systems. The experimental validation includes real-time deployment and autonomous navigation based on the relative positioning method.

Gamifying Math Education using Object Detection

  • Authors: Yueqiu Sun, Rohitkrishna Nambiar, Vivek Vidyasagaran
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06270
  • Pdf link: https://arxiv.org/pdf/2304.06270
  • Abstract
    Manipulatives used in the right way help improve mathematical concepts leading to better learning outcomes. In this paper, we present a phygital (physical + digital) curriculum inspired teaching system for kids aged 5-8 to learn geometry using shape tile manipulatives. Combining smaller shapes to form larger ones is an important skill kids learn early on which requires shape tiles to be placed close to each other in the play area. This introduces a challenge of oriented object detection for densely packed objects with arbitrary orientations. Leveraging simulated data for neural network training and light-weight mobile architectures, we enable our system to understand user interactions and provide real-time audiovisual feedback. Experimental results show that our network runs real-time with high precision/recall on consumer devices, thereby providing a consistent and enjoyable learning experience.

Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution

  • Authors: Zhuo Su, Jiehua Zhang, Tianpeng Liu, Zhen Liu, Shuanghui Zhang, Matti Pietikäinen, Li Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06305
  • Pdf link: https://arxiv.org/pdf/2304.06305
  • Abstract
    This paper proposes a novel module called middle spectrum grouped convolution (MSGC) for efficient deep convolutional neural networks (DCNNs) with the mechanism of grouped convolution. It explores the broad "middle spectrum" area between channel pruning and conventional grouped convolution. Compared with channel pruning, MSGC can retain most of the information from the input feature maps due to the group mechanism; compared with grouped convolution, MSGC benefits from the learnability, the core of channel pruning, for constructing its group topology, leading to better channel division. The middle spectrum area is unfolded along four dimensions: group-wise, layer-wise, sample-wise, and attention-wise, making it possible to reveal more powerful and interpretable structures. As a result, the proposed module acts as a booster that can reduce the computational cost of the host backbones for general image recognition with even improved predictive accuracy. For example, in the experiments on ImageNet dataset for image classification, MSGC can reduce the multiply-accumulates (MACs) of ResNet-18 and ResNet-50 by half but still increase the Top-1 accuracy by more than 1%. With 35% reduction of MACs, MSGC can also increase the Top-1 accuracy of the MobileNetV2 backbone. Results on MS COCO dataset for object detection show similar observations. Our code and trained models are available at https://github.com/hellozhuo/msgc.
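
    The cost arithmetic behind the "middle spectrum" is easy to verify: grouped convolution divides both parameters and MACs by the number of groups. MSGC's twist (not shown here) is learning which channels go to which group instead of fixing the split.

```python
import torch.nn as nn

def conv_params(conv: nn.Conv2d) -> int:
    return sum(p.numel() for p in conv.parameters())

dense = nn.Conv2d(64, 64, kernel_size=3, padding=1)
grouped = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=4)
print(conv_params(dense), conv_params(grouped))  # weights shrink ~4x with 4 groups
```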

Towards Understanding the Benefits and Challenges of Demand Responsive Public Transit- A Case Study in the City of Charlotte, NC

  • Authors: Sanaz Sadat Hosseini, Mona Azarbayjani, Jason Lawrence, Hamed Tabkhi
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06467
  • Pdf link: https://arxiv.org/pdf/2304.06467
  • Abstract
    Access to adequate public transportation plays a critical role in inequity and socio-economic mobility, particularly in low-income communities. Low-income workers who rely heavily on public transportation face a spatial disparity between home and work, which leads to higher unemployment, longer job searches, and longer commute times. The overarching goal of this study is to gather initial data that would support creating a connected, coordinated, demand-responsive, and efficient public bus system that minimizes transit gaps for low-income, transit-dependent communities. To create equitable metropolitan public transportation, this paper evaluates existing CATS mobile applications that assist passengers in finding bus routes and arrival times. Our community survey methodology includes filling out questionnaires on Charlotte's current bus system on specific bus lines and determining user acceptance of a future novel smart technology. We have also collected data on the demand and transit gap for a real-world pilot study covering the Sprinter bus line, Bus line 7, Bus line 9, and Bus lines 97-99. These lines connect all of Charlotte City's main areas and are the most important bus lines in the system. On the studied routes, the primary survey results indicate that the current bus system has many flaws, the major one being the lack of proper timing to meet the needs of passengers. The most common problems are long commutes and long waiting times at stations. Moreover, the existing application provides inaccurate information, and on average, 80 percent of travelers and respondents are inclined to use new technology.

IoT-Based Water Quality Assessment System for Industrial Waste Water: Healthcare Perspective

  • Authors: Abdur Rab Dhruba, Kazi Nabiul Alam, Md. Shakib Khan, Sananda Saha, Mohammad Monirujjaman Khan, Mohammed Baz, Mehedi Masud, Mohammed A. AlZain
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06491
  • Pdf link: https://arxiv.org/pdf/2304.06491
  • Abstract
    The environment, especially water, gets polluted due to industrialization and urbanization. Pollution due to industrialization and urbanization has harmful effects on both the environment and the lives on Earth. This polluted water can cause food poisoning, diarrhea, short-term gastrointestinal problems, respiratory diseases, skin problems, and other serious health complications. In a developing country like Bangladesh, where the ready-made garments sector is one of the major sources of the total Gross Domestic Product (GDP), most of the waste released from the garment factories is dumped into the nearest rivers or canals. Hence, the quality of the water in these bodies has become very unsuitable for living beings, and so it has become one of the major threats to the environment and human health. In addition, the amount of fish in the rivers and canals in Bangladesh is decreasing day by day as a result of water pollution. Therefore, to save fish, other aquatic animals, and the environment, we need to monitor the quality of the water and find out the reasons for the pollution. Real-time monitoring of the quality of water is vital for controlling water pollution. Most of the approaches for controlling water pollution are mainly biological and lab-based, which takes a lot of time and resources. To address this issue, we developed an Internet of Things (IoT)-based real-time water quality monitoring system, integrated with a mobile application. The proposed system in this research measures some of the most important indexes of water quality, including the potential of hydrogen (pH), total dissolved solids (TDS), turbidity, and temperature. The results of the proposed system will be very helpful in saving the environment and, thus, improving the health of living creatures on Earth.

IoT-Based Remote Health Monitoring System Employing Smart Sensors for Asthma Patients during COVID-19 Pandemic

  • Authors: Nafisa Shamim Rafa, Basma Binte Azmal, Abdur Rab Dhruba, Mohammad Monirujjaman Khan, Turki M. Alanazi, Faris A. Almalki, Othman AlOmeir
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06511
  • Pdf link: https://arxiv.org/pdf/2304.06511
  • Abstract
    COVID19 and asthma are respiratory diseases that can be life threatening in uncontrolled circumstances and require continuous monitoring. A poverty stricken South Asian country like Bangladesh has been bearing the brunt of the COVID19 pandemic since its beginning. The majority of the country's population resides in rural areas, where proper healthcare is difficult to access. This emphasizes the necessity of telemedicine, implementing the concept of the Internet of Things (IoT), which is still under development in Bangladesh. This paper demonstrates how the current challenges in the healthcare system are resolvable through the design of a remote health and environment monitoring system, specifically for asthma patients who are at an increased risk of COVID19. Since on-time treatment is essential, this system will allow doctors and medical staff to receive patient information in real time and deliver their services immediately to the patient regardless of their location. The proposed system consists of various sensors collecting heart rate, body temperature, ambient temperature, humidity, and air quality data and processing them through the Arduino Microcontroller. It is integrated with a mobile application. All this data is sent to the mobile application via a Bluetooth module and updated every few seconds so that the medical staff can instantly track patients' conditions and emergencies. The developed prototype is portable and easily usable by anyone. The system has been applied to five people of different ages and medical histories over a particular period. Upon analyzing all their data, it became clear which participants were particularly vulnerable to health deterioration and needed constant observation. Through this research, awareness about asthmatic symptoms will improve and help prevent their severity through effective treatment anytime, anywhere.

Keyword: pruning

Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution

  • Authors: Zhuo Su, Jiehua Zhang, Tianpeng Liu, Zhen Liu, Shuanghui Zhang, Matti Pietikäinen, Li Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06305
  • Pdf link: https://arxiv.org/pdf/2304.06305
  • Abstract
    This paper proposes a novel module called middle spectrum grouped convolution (MSGC) for efficient deep convolutional neural networks (DCNNs) with the mechanism of grouped convolution. It explores the broad "middle spectrum" area between channel pruning and conventional grouped convolution. Compared with channel pruning, MSGC can retain most of the information from the input feature maps due to the group mechanism; compared with grouped convolution, MSGC benefits from the learnability, the core of channel pruning, for constructing its group topology, leading to better channel division. The middle spectrum area is unfolded along four dimensions: group-wise, layer-wise, sample-wise, and attention-wise, making it possible to reveal more powerful and interpretable structures. As a result, the proposed module acts as a booster that can reduce the computational cost of the host backbones for general image recognition with even improved predictive accuracy. For example, in the experiments on ImageNet dataset for image classification, MSGC can reduce the multiply-accumulates (MACs) of ResNet-18 and ResNet-50 by half but still increase the Top-1 accuracy by more than 1%. With 35% reduction of MACs, MSGC can also increase the Top-1 accuracy of the MobileNetV2 backbone. Results on MS COCO dataset for object detection show similar observations. Our code and trained models are available at https://github.com/hellozhuo/msgc.

Learning Accurate Performance Predictors for Ultrafast Automated Model Compression

  • Authors: Ziwei Wang, Jiwen Lu, Han Xiao, Shengyu Liu, Jie Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06393
  • Pdf link: https://arxiv.org/pdf/2304.06393
  • Abstract
    In this paper, we propose an ultrafast automated model compression framework called SeerNet for flexible network deployment. Conventional non-differentiable methods discretely search the desirable compression policy based on the accuracy from exhaustively trained lightweight models, and existing differentiable methods optimize an extremely large supernet to obtain the required compressed model for deployment. Both cause heavy computational cost due to the complex compression policy search and evaluation process. In contrast, we obtain the optimal efficient networks by directly optimizing the compression policy with an accurate performance predictor, where ultrafast automated model compression for various computational cost constraints is achieved without complex compression policy search and evaluation. Specifically, we first train the performance predictor based on the accuracy from uncertain compression policies actively selected by efficient evolutionary search, so that informative supervision is provided to learn the accurate performance predictor with acceptable cost. Then we leverage the gradient that maximizes the predicted performance under the barrier complexity constraint for ultrafast acquisition of the desirable compression policy, where adaptive update stepsizes with momentum are employed to enhance optimality of the acquired pruning and quantization strategy. Compared with the state-of-the-art automated model compression methods, experimental results on image classification and object detection show that our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.

Keyword: voxel

$E(3) \times SO(3)$-Equivariant Networks for Spherical Deconvolution in Diffusion MRI

  • Authors: Axel Elaldi, Guido Gerig, Neel Dey
  • Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06103
  • Pdf link: https://arxiv.org/pdf/2304.06103
  • Abstract
    We present Roto-Translation Equivariant Spherical Deconvolution (RT-ESD), an $E(3)\times SO(3)$ equivariant framework for sparse deconvolution of volumes where each voxel contains a spherical signal. Such 6D data naturally arises in diffusion MRI (dMRI), a medical imaging modality widely used to measure microstructure and structural connectivity. As each dMRI voxel is typically a mixture of various overlapping structures, there is a need for blind deconvolution to recover crossing anatomical structures such as white matter tracts. Existing dMRI work takes either an iterative or deep learning approach to sparse spherical deconvolution, yet it typically does not account for relationships between neighboring measurements. This work constructs equivariant deep learning layers which respect the symmetries of spatial rotations, reflections, and translations, alongside the symmetries of voxelwise spherical rotations. As a result, RT-ESD improves on previous work across several tasks including fiber recovery on the DiSCo dataset, deconvolution-derived partial volume estimation on real-world *in vivo* human brain dMRI, and improved downstream reconstruction of fiber tractograms on the Tractometer dataset. Our implementation is available at https://github.com/AxelElaldi/e3so3_conv

Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

  • Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.06178
  • Pdf link: https://arxiv.org/pdf/2304.06178
  • Abstract
    Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of the voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data while maintaining computational efficiency, running substantially faster than the baseline method NeuralRGBD.

Brain Structure Ages -- A new biomarker for multi-disease classification

  • Authors: Huy-Dung Nguyen, Michaël Clément, Boris Mansencal, Pierrick Coupé
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06591
  • Pdf link: https://arxiv.org/pdf/2304.06591
  • Abstract
    Age is an important variable for describing the brain's expected anatomical status across the normal aging trajectory. The deviation from that normative aging trajectory may provide some insights into neurological diseases. In neuroimaging, predicted brain age is widely used to analyze different diseases. However, using only the brain age gap information (i.e., the difference between the chronological age and the estimated age) may not be informative enough for disease classification problems. In this paper, we propose to extend the notion of global brain age by estimating brain structure ages using structural magnetic resonance imaging. To this end, an ensemble of deep learning models is first used to estimate a 3D aging map (i.e., voxel-wise age estimation). Then, a 3D segmentation mask is used to obtain the final brain structure ages. This biomarker can be used in several situations. First, it enables accurate estimation of the brain age for the purpose of anomaly detection at the population level. In this situation, our approach outperforms several state-of-the-art methods. Second, brain structure ages can be used to compute the deviation from the normal aging process of each brain structure. This feature can be used in a multi-disease classification task for an accurate differential diagnosis at the subject level. Finally, the brain structure age deviations of individuals can be visualized, providing some insights about brain abnormalities and helping clinicians in real medical contexts.
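
    The aggregation step from a voxel-wise age map to per-structure ages is straightforward to sketch (a toy reduction assuming an integer-labeled segmentation mask, not the authors' full pipeline):

```python
import numpy as np

def structure_ages(age_map: np.ndarray, seg: np.ndarray) -> dict[int, float]:
    """Mean predicted age within each labeled structure (label 0 = background)."""
    return {
        int(label): float(age_map[seg == label].mean())
        for label in np.unique(seg) if label != 0
    }

age_map = np.random.normal(70, 5, size=(4, 4, 4))   # toy voxel-wise estimates
seg = np.random.randint(0, 3, size=(4, 4, 4))       # toy 2-structure mask
print(structure_ages(age_map, seg))
```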

Keyword: lidar

Survey on LiDAR Perception in Adverse Weather Conditions

  • Authors: Mariella Dreissig, Dominik Scheuble, Florian Piewak, Joschka Boedecker
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06312
  • Pdf link: https://arxiv.org/pdf/2304.06312
  • Abstract
    Autonomous vehicles rely on a variety of sensors to gather information about their surroundings. The vehicle's behavior is planned based on the environment perception, making its reliability crucial for safety reasons. The active LiDAR sensor is able to create an accurate 3D representation of a scene, making it a valuable addition to environment perception for autonomous vehicles. Due to light scattering and occlusion, the LiDAR's performance changes under adverse weather conditions like fog, snow, or rain. This limitation has recently fostered a large body of research on approaches to alleviate the decrease in perception performance. In this survey, we gathered, analyzed, and discussed different aspects of dealing with adverse weather conditions in LiDAR-based environment perception. We address topics such as the availability of appropriate data, raw point cloud processing and denoising, robust perception algorithms, and sensor fusion to mitigate adverse weather induced shortcomings. We furthermore identify the most pressing gaps in the current literature and pinpoint promising research directions.

An Automotive Case Study on the Limits of Approximation for Object Detection

  • Authors: Martí Caro, Hamid Tabani, Jaume Abella, Francesc Moll, Enric Morancho, Ramon Canal, Josep Altet, Antonio Calomarde, Francisco J. Cazorla, Antonio Rubio, Pau Fontova, Jordi Fornt
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.06327
  • Pdf link: https://arxiv.org/pdf/2304.06327
  • Abstract
    The accuracy of camera-based object detection (CBOD) built upon deep learning is often evaluated against the real objects in frames only. However, such simplistic evaluation ignores the fact that many unimportant objects are small, distant, or in the background, and hence their misdetections have less impact than those of closer, larger, foreground objects in domains such as autonomous driving. Moreover, sporadic misdetections are irrelevant since confidence in detections is typically averaged across consecutive frames, and detection devices (e.g. cameras, LiDARs) are often redundant, thus providing fault tolerance. This paper exploits such intrinsic fault tolerance of the CBOD process, and assesses in an automotive case study to what extent CBOD can tolerate approximation coming from multiple sources such as lower-precision arithmetic, approximate arithmetic units, and even random faults due to, for instance, low-voltage operation. We show that the accuracy impact of those sources of approximation is within 1% of the baseline even when considering the three approximate domains simultaneously, and hence multiple sources of approximation can be exploited to build highly efficient accelerators for CBOD in cars.

RadarGNN: Transformation Invariant Graph Neural Network for Radar-based Perception

  • Authors: Felix Fent, Philipp Bauerschmidt, Markus Lienkamp
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06547
  • Pdf link: https://arxiv.org/pdf/2304.06547
  • Abstract
    Reliable perception has to be robust against challenging environmental conditions. Therefore, recent efforts have focused on the use of radar sensors in addition to camera and lidar sensors for perception applications. However, the sparsity of radar point clouds and the poor data availability remain challenging for current perception methods. To address these challenges, a novel graph neural network is proposed that uses not only the information of the points themselves but also the relationships between them. The model is designed to consider both point features and point-pair features, embedded in the edges of the graph. Furthermore, a general approach for achieving transformation invariance is proposed which is robust against unseen scenarios and also counteracts the limited data availability. The transformation invariance is achieved by an invariant data representation rather than an invariant model architecture, making it applicable to other methods. The proposed RadarGNN model outperforms all previous methods on the RadarScenes dataset. In addition, the effects of different invariances on object detection and semantic segmentation quality are investigated. The code is made available as open-source software under https://github.com/TUMFTM/RadarGNN.
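
One simple way to realize the invariant-data-representation idea — a sketch, not the paper's implementation — is to give edges only relative geometry (pairwise distances), which is unchanged by rotations and translations of the point cloud:

```python
import numpy as np

def invariant_knn_graph(points, k=8):
    """Build a k-NN graph whose edge features are pairwise distances only.

    Because distances are invariant to rigid transformations, the resulting
    representation is transformation invariant without architectural tricks.
    points: (N, d) array of radar point positions.
    Returns (edges, edge_features).
    """
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)               # no self-loops
    nbrs = np.argsort(dists, axis=1)[:, :k]       # k nearest neighbors per point
    edges = [(i, int(j)) for i in range(len(points)) for j in nbrs[i]]
    edge_features = np.array([[dists[i, j]] for i, j in edges])
    return edges, edge_features
```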

Keyword: diffusion

Social Biases through the Text-to-Image Generation Lens

  • Authors: Ranjita Naik, Besmira Nushi
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06034
  • Pdf link: https://arxiv.org/pdf/2304.06034
  • Abstract
    Text-to-Image (T2I) generation is enabling new applications that support creators, designers, and general end users of productivity software by generating illustrative content with high photorealism starting from a given descriptive text as a prompt. Such models are however trained on massive amounts of web data, which surfaces the peril of potential harmful biases that may leak into the generation process itself. In this paper, we take a multi-dimensional approach to studying and quantifying common social biases as reflected in the generated images, by focusing on how occupations, personality traits, and everyday situations are depicted across representations of (perceived) gender, age, race, and geographical location. Through an extensive set of both automated and human evaluation experiments, we present findings for two popular T2I models: DALLE-v2 and Stable Diffusion. Our results reveal severe occupational biases for neutral prompts, with both models largely excluding certain groups of people from their results. Such biases can be mitigated by increasing the amount of specification in the prompt itself, although prompt-based mitigation will not address discrepancies in image quality or other usages of the model or its representations in other scenarios. Further, we observe personality traits being associated with only a limited set of people at the intersection of race, gender, and age. Finally, an analysis of geographical location representations in everyday situations (e.g., park, food, weddings) shows that for most situations, images generated through default location-neutral prompts are closer and more similar to images generated for the United States and Germany.

$E(3) \times SO(3)$-Equivariant Networks for Spherical Deconvolution in Diffusion MRI

  • Authors: Axel Elaldi, Guido Gerig, Neel Dey
  • Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06103
  • Pdf link: https://arxiv.org/pdf/2304.06103
  • Abstract
    We present Roto-Translation Equivariant Spherical Deconvolution (RT-ESD), an $E(3)\times SO(3)$ equivariant framework for sparse deconvolution of volumes where each voxel contains a spherical signal. Such 6D data naturally arises in diffusion MRI (dMRI), a medical imaging modality widely used to measure microstructure and structural connectivity. As each dMRI voxel is typically a mixture of various overlapping structures, there is a need for blind deconvolution to recover crossing anatomical structures such as white matter tracts. Existing dMRI work takes either an iterative or deep learning approach to sparse spherical deconvolution, yet typically does not account for relationships between neighboring measurements. This work constructs equivariant deep learning layers which respect the symmetries of spatial rotations, reflections, and translations, alongside the symmetries of voxelwise spherical rotations. As a result, RT-ESD improves on previous work across several tasks including fiber recovery on the DiSCo dataset, deconvolution-derived partial volume estimation on real-world in vivo human brain dMRI, and improved downstream reconstruction of fiber tractograms on the Tractometer dataset. Our implementation is available at https://github.com/AxelElaldi/e3so3_conv

PATMAT: Person Aware Tuning of Mask-Aware Transformer for Face Inpainting

  • Authors: Saman Motamed, Jianjin Xu, Chen Henry Wu, Fernando De la Torre
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06107
  • Pdf link: https://arxiv.org/pdf/2304.06107
  • Abstract
    Generative models such as StyleGAN2 and Stable Diffusion have achieved state-of-the-art performance in computer vision tasks such as image synthesis, inpainting, and de-noising. However, current generative models for face inpainting often fail to preserve fine facial details and the identity of the person, despite creating aesthetically convincing image structures and textures. In this work, we propose Person Aware Tuning (PAT) of Mask-Aware Transformer (MAT) for face inpainting, which addresses this issue. Our proposed method, PATMAT, effectively preserves identity by incorporating reference images of a subject and fine-tuning a MAT architecture trained on faces. By using ~40 reference images, PATMAT creates anchor points in MAT's style module, and tunes the model using the fixed anchors to adapt the model to a new face identity. Moreover, PATMAT's use of multiple images per anchor during training allows the model to use fewer reference images than competing methods. We demonstrate that PATMAT outperforms state-of-the-art models in terms of image quality, the preservation of person-specific details, and the identity of the subject. Our results suggest that PATMAT can be a promising approach for improving the quality of personalized face inpainting.

An Edit Friendly DDPM Noise Space: Inversion and Manipulations

  • Authors: Inbar Huberman-Spiegelglas, Vladimir Kulikov, Tomer Michaeli
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06140
  • Pdf link: https://arxiv.org/pdf/2304.06140
  • Abstract
    Denoising diffusion probabilistic models (DDPMs) employ a sequence of white Gaussian noise samples to generate an image. In analogy with GANs, those noise maps could be considered as the latent code associated with the generated image. However, this native noise space does not possess a convenient structure, and is thus challenging to work with in editing tasks. Here, we propose an alternative latent noise space for DDPM that enables a wide range of editing operations via simple means, and present an inversion method for extracting these edit-friendly noise maps for any given image (real or synthetically generated). As opposed to the native DDPM noise space, the edit-friendly noise maps do not have a standard normal distribution and are not statistically independent across timesteps. However, they allow perfect reconstruction of any desired image, and simple transformations on them translate into meaningful manipulations of the output image (e.g., shifting, color edits). Moreover, in text-conditional models, fixing those noise maps while changing the text prompt, modifies semantics while retaining structure. We illustrate how this property enables text-based editing of real images via the diverse DDPM sampling scheme (in contrast to the popular non-diverse DDIM inversion). We also show how it can be used within existing diffusion-based editing methods to improve their quality and diversity.
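
The core of the inversion can be conveyed in a few lines: given a (real or synthesized) trajectory and the model's per-step posterior means and standard deviations, solve each DDPM update for its noise map. This is a sketch of the idea under the abstract's notation, not the authors' exact code:

```python
def extract_edit_friendly_noise(xs, mus, sigmas):
    """Solve x_{t-1} = mu_t(x_t) + sigma_t * z_t for z_t along a trajectory.

    xs:     sequence [x_T, ..., x_0] of latents (arrays or tensors).
    mus:    mus[t] is the model's posterior mean computed from xs[t] at step t.
    sigmas: per-step noise standard deviations.
    Fixing the returned z_t and re-running sampling reproduces xs exactly,
    which is what makes such maps useful for editing.
    """
    return [(xs[t + 1] - mus[t]) / sigmas[t] for t in range(len(xs) - 1)]
```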

Intriguing properties of synthetic images: from generative adversarial networks to diffusion models

  • Authors: Riccardo Corvi, Davide Cozzolino, Giovanni Poggi, Koki Nagano, Luisa Verdoliva
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06408
  • Pdf link: https://arxiv.org/pdf/2304.06408
  • Abstract
    Detecting fake images is becoming a major goal of computer vision. This need is becoming more and more pressing with the continuous improvement of synthesis methods based on Generative Adversarial Networks (GAN), and even more with the appearance of powerful methods based on Diffusion Models (DM). Towards this end, it is important to gain insight into which image features better discriminate fake images from real ones. In this paper we report on our systematic study of a large number of image generators of different families, aimed at discovering the most forensically relevant characteristics of real and generated images. Our experiments provide a number of interesting observations and shed light on some intriguing properties of synthetic images: (1) not only the GAN models but also the DM and VQ-GAN (Vector Quantized Generative Adversarial Networks) models give rise to visible artifacts in the Fourier domain and exhibit anomalous regular patterns in the autocorrelation; (2) when the dataset used to train the model lacks sufficient variety, its biases can be transferred to the generated images; (3) synthetic and real images exhibit significant differences in the mid-high frequency signal content, observable in their radial and angular spectral power distributions.
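
The spectral statistics in observation (3) are easy to compute; a sketch of a radially averaged power spectrum, the kind of fingerprint in which generator artifacts show up as anomalous peaks (NumPy, grayscale input assumed):

```python
import numpy as np

def radial_power_spectrum(img):
    """Radially averaged Fourier power spectrum of a 2D grayscale image."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = img.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2).astype(int)   # radius of each bin
    totals = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return totals / np.maximum(counts, 1)                # mean power per radius
```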

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

  • Authors: Enze Xie, Lewei Yao, Han Shi, Zhili Liu, Daquan Zhou, Zhaoqiang Liu, Jiawei Li, Zhenguo Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06648
  • Pdf link: https://arxiv.org/pdf/2304.06648
  • Abstract
    Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications. This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enables fast adaptation to new domains. DiffFit is embarrassingly simple: it only fine-tunes the bias terms and newly added scaling factors in specific layers, yet this results in significant training speed-ups and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves a 2$\times$ training speed-up and only needs to store approximately 0.12% of the total model parameters. An intuitive theoretical analysis is provided to justify the efficacy of the scaling factors for fast adaptation. On 8 downstream datasets, DiffFit achieves superior or competitive performance compared to full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one at minimal cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on the ImageNet 512$\times$512 benchmark by fine-tuning for only 25 epochs from a public pre-trained ImageNet 256$\times$256 checkpoint, while being 30$\times$ more training efficient than the closest competitor.
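
The recipe is simple enough to sketch in PyTorch: freeze everything except bias terms, and wrap chosen layers with a newly added learnable scale. Where exactly the scales go is the paper's design choice; this sketch places one per wrapped block:

```python
import torch
import torch.nn as nn

class ScaledBlock(nn.Module):
    """Wrap a (frozen) block with a newly added learnable scaling factor."""
    def __init__(self, block):
        super().__init__()
        self.block = block
        self.gamma = nn.Parameter(torch.ones(1))   # the added scale, init 1

    def forward(self, x):
        return self.gamma * self.block(x)

def mark_trainable(model):
    """Freeze all parameters except biases and the added gamma scales."""
    for name, p in model.named_parameters():
        p.requires_grad = name.endswith("bias") or "gamma" in name
```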

Learning Controllable 3D Diffusion Models from Single-view Images

  • Authors: Jiatao Gu, Qingzhe Gao, Shuangfei Zhai, Baoquan Chen, Lingjie Liu, Josh Susskind
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06700
  • Pdf link: https://arxiv.org/pdf/2304.06700
  • Abstract
    Diffusion models have recently become the de-facto approach for generative modeling in the 2D domain. However, extending diffusion models to 3D is challenging due to the difficulties in acquiring 3D ground truth data for training. On the other hand, 3D GANs that integrate implicit 3D representations into GANs have shown remarkable 3D-aware generation when trained only on single-view image datasets. However, 3D GANs do not provide straightforward ways to precisely control image synthesis. To address these challenges, we present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis for single-view datasets. Control3Diff explicitly models the underlying latent distribution (optionally conditioned on external inputs), thus enabling direct control during the diffusion process. Moreover, our approach is general and applicable to any type of controlling input, allowing us to train it with the same diffusion objective without any auxiliary supervision. We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet, using various conditioning inputs such as images, sketches, and text prompts. Please see the project website (\url{https://jiataogu.me/control3diff}) for video comparisons.

DiffusionRig: Learning Personalized Priors for Facial Appearance Editing

  • Authors: Zheng Ding, Xuaner Zhang, Zhihao Xia, Lars Jebe, Zhuowen Tu, Xiuming Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.06711
  • Pdf link: https://arxiv.org/pdf/2304.06711
  • Abstract
    We address the problem of learning person-specific facial priors from a small number (e.g., 20) of portrait photos of the same person. This enables us to edit this specific person's facial appearance, such as expression and lighting, while preserving their identity and high-frequency facial details. Key to our approach, which we dub DiffusionRig, is a diffusion model conditioned on, or "rigged by," crude 3D face models estimated from single in-the-wild images by an off-the-shelf estimator. On a high level, DiffusionRig learns to map simplistic renderings of 3D face models to realistic photos of a given person. Specifically, DiffusionRig is trained in two stages: It first learns generic facial priors from a large-scale face dataset and then person-specific priors from a small portrait photo collection of the person of interest. By learning the CGI-to-photo mapping with such personalized priors, DiffusionRig can "rig" the lighting, facial expression, head pose, etc. of a portrait photo, conditioned only on coarse 3D models while preserving this person's identity and other high-frequency characteristics. Qualitative and quantitative experiments show that DiffusionRig outperforms existing approaches in both identity preservation and photorealism. Please see the project website: https://diffusionrig.github.io for the supplemental material, video, code, and data.

Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction

  • Authors: Hansheng Chen, Jiatao Gu, Anpei Chen, Wei Tian, Zhuowen Tu, Lingjie Liu, Hao Su
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06714
  • Pdf link: https://arxiv.org/pdf/2304.06714
  • Abstract
    3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images. Despite numerous task-specific methods, developing a comprehensive model remains challenging. In this paper, we present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects. Previous studies have used two-stage approaches that rely on pretrained NeRFs as real data to train diffusion models. In contrast, we propose a new single-stage training paradigm with an end-to-end objective that jointly optimizes a NeRF auto-decoder and a latent diffusion model, enabling simultaneous 3D reconstruction and prior learning, even from sparsely available views. At test time, we can directly sample the diffusion prior for unconditional generation, or combine it with arbitrary observations of unseen objects for NeRF reconstruction. SSDNeRF demonstrates robust results comparable to or better than leading task-specific methods in unconditional generation and single/sparse-view 3D reconstruction.

Expressive Text-to-Image Generation with Rich Text

  • Authors: Songwei Ge, Taesung Park, Jun-Yan Zhu, Jia-Bin Huang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06720
  • Pdf link: https://arxiv.org/pdf/2304.06720
  • Abstract
    Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnote. We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region based on cross-attention maps of a vanilla diffusion process using plain text. For each region, we enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance. We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines with quantitative evaluations.

Keyword: dynamic

Fairness: from the ethical principle to the practice of Machine Learning development as an ongoing agreement with stakeholders

  • Authors: Georgina Curto, Flavio Comim
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06031
  • Pdf link: https://arxiv.org/pdf/2304.06031
  • Abstract
    This paper clarifies why bias cannot be completely mitigated in Machine Learning (ML) and proposes an end-to-end methodology to translate the ethical principle of justice and fairness into the practice of ML development as an ongoing agreement with stakeholders. The pro-ethical iterative process presented in the paper aims to challenge asymmetric power dynamics in the fairness decision making within ML design and support ML development teams to identify, mitigate and monitor bias at each step of ML systems development. The process also provides guidance on how to explain the always imperfect trade-offs in terms of bias to users.

Web 3.0: The Future of Internet

  • Authors: Wensheng Gan, Zhenqiang Ye, Shicheng Wan, Philip S. Yu
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06032
  • Pdf link: https://arxiv.org/pdf/2304.06032
  • Abstract
    With the rapid growth of the Internet, human daily life has become deeply bound to it. To take advantage of massive amounts of data and information on the internet, the Web architecture is continuously being reinvented and upgraded. From the static informative characteristics of Web 1.0 to the dynamic interactive features of Web 2.0, scholars and engineers have worked hard to make the internet world more open, inclusive, and equal. Indeed, the next generation of Web evolution (i.e., Web 3.0) is already coming and shaping our lives. Web 3.0 is a decentralized Web architecture that is more intelligent and safer than before. The risks and ruin posed by monopolists or criminals will be greatly reduced by a complete reconstruction of the Internet and IT infrastructure. In a word, Web 3.0 is capable of addressing web data ownership via distributed technology. It will optimize the internet world from the perspectives of economy, culture, and technology, and promote novel content production methods, organizational structures, and economic forms. However, Web 3.0 is not yet mature and remains disputed. This paper presents a comprehensive survey of Web 3.0, with a focus on current technologies, challenges, opportunities, and outlook. It first gives a brief overview of the history of the World Wide Web as well as several differences among Web 1.0, Web 2.0, Web 3.0, and Web3. Then, some technical implementations of Web 3.0 are illustrated in detail. We discuss the revolution and benefits that Web 3.0 brings. Finally, we explore several challenges and issues in this promising area.

Learning solution of nonlinear constitutive material models using physics-informed neural networks: COMM-PINN

  • Authors: Shahed Rezaei, Ahmad Moeineddin, Ali Harandi
  • Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06044
  • Pdf link: https://arxiv.org/pdf/2304.06044
  • Abstract
    We applied physics-informed neural networks to solve the constitutive relations for nonlinear, path-dependent material behavior. As a result, the trained network not only satisfies all thermodynamic constraints but also instantly provides information about the current material state (i.e., free energy, stress, and the evolution of internal variables) under any given loading scenario without requiring initial data. One advantage of this work is that it bypasses the repetitive Newton iterations needed to solve the nonlinear equations in complex material models. Additionally, strategies are provided to reduce the order of differentiation required to obtain the tangent operator. The trained model can be directly used in any finite element package (or other numerical methods) as a user-defined material model. However, challenges remain in the proper definition of collocation points and in integrating several non-equality constraints that become active or inactive simultaneously. We tested this methodology on rate-independent processes such as the classical von Mises plasticity model with a nonlinear hardening law, as well as on local damage models for interface cracking behavior with a nonlinear softening law. Finally, we discuss the potential and remaining challenges for future developments of this new approach.

Primal-Dual Contextual Bayesian Optimization for Control System Online Optimization with Time-Average Constraints

  • Authors: Wenjie Xu, Yuning Jiang, Bratislav Svetozarevic, Colin N. Jones
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.06104
  • Pdf link: https://arxiv.org/pdf/2304.06104
  • Abstract
    This paper studies the problem of online performance optimization of constrained closed-loop control systems, where both the objective and the constraints are unknown black-box functions affected by exogenous time-varying contextual disturbances. A primal-dual contextual Bayesian optimization algorithm is proposed that achieves sublinear cumulative regret with respect to the dynamic optimal solution under certain regularity conditions. Furthermore, the algorithm achieves zero time-average constraint violation, ensuring that the average value of the constraint function satisfies the desired constraint. The method is applied to both sampled instances from Gaussian processes and a continuous stirred tank reactor parameter tuning problem; simulation results show that the method simultaneously provides close-to-optimal performance and maintains constraint feasibility on average. This contrasts with current state-of-the-art methods, which either suffer from large cumulative regret or severe constraint violations in the case studies presented.
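
A generic primal-dual step of the kind the abstract describes, sketched with hypothetical `acquire`/`observe` callables: the primal step picks the next input via a Lagrangian-style acquisition, and the dual step does projected ascent on the multiplier so that average constraint violation is driven to zero:

```python
def primal_dual_step(acquire, observe, lmbda, eta):
    """One round of primal-dual black-box optimization (sketch).

    acquire(lmbda): returns the next query point, e.g. by minimizing an
                    acquisition of the form LCB_f(x) + lmbda * LCB_g(x).
    observe(x):     evaluates the black-box objective and constraint at x.
    """
    x = acquire(lmbda)                      # primal: choose next input
    f_val, g_val = observe(x)               # noisy objective / constraint values
    lmbda = max(0.0, lmbda + eta * g_val)   # dual: projected gradient ascent
    return x, f_val, lmbda
```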

IoT trust and reputation: a survey and taxonomy

  • Authors: Muhammad Aaqib, Aftab Ali, Liming Chen, Omar Nibouche
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06119
  • Pdf link: https://arxiv.org/pdf/2304.06119
  • Abstract
    IoT is one of the fastest-growing technologies, and it is estimated that more than a billion devices will be in use across the globe by the end of 2030. To maximize the capability of these connected entities, trust and reputation among IoT entities is essential. Several trust management models have been proposed in the IoT environment; however, these schemes have not fully addressed the features of IoT devices, such as device role, device type, and dynamic behavior in a smart environment. As a result, traditional trust and reputation models are insufficient to tackle these characteristics and uncertainty risks while connecting nodes to the network. While studies continue to appear and various articles suggest promising solutions in constrained environments, research on trust and reputation is still in its infancy. In this paper, we carry out a comprehensive literature review of state-of-the-art research on the trust and reputation of IoT devices and systems. Specifically, we first propose a new structure, namely a new taxonomy, to organize trust and reputation models based on how trust is managed. The proposed taxonomy comprises traditional trust-management-based systems and artificial-intelligence-based systems, and combines both classes, encouraging existing schemes to adopt these emerging concepts. This collaboration between conventional mathematical models and advanced ML models results in design schemes that are more robust and efficient. We then drill down to compare and analyse the methods and applications of these systems based on community-accepted performance metrics, e.g., scalability, delay, cooperativeness, and efficiency. Finally, building upon the findings of the analysis, we identify and discuss open research issues and challenges, and further speculate on and point out future research directions.

Robust and Context-Aware Real-Time Collaborative Robot Handling via Dynamic Gesture Commands

  • Authors: Rui Chen, Alvin Shek, Changliu Liu
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06175
  • Pdf link: https://arxiv.org/pdf/2304.06175
  • Abstract
    This paper studies real-time collaborative robot (cobot) handling, where the cobot maneuvers an object under human dynamic gesture commands. Enabling dynamic gesture commands is useful when the human needs to avoid direct contact with the robot or the object handled by the robot. However, the key challenge lies in the heterogeneity in human behaviors and the stochasticity in the perception of dynamic gestures, which requires the robot handling policy to be adaptable and robust. To address these challenges, we introduce the Conditional Collaborative Handling Process (CCHP) to encode a context-aware cobot handling policy, along with a procedure to learn such a policy from human-human collaboration. We thoroughly evaluate the adaptability and robustness of CCHP and apply our approach to a real-time cobot assembly task with a Kinova Gen3 robot arm. Results show that our method leads to significantly less human effort and smoother human-robot collaboration than a state-of-the-art rule-based approach, even with first-time users.

Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

  • Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.06178
  • Pdf link: https://arxiv.org/pdf/2304.06178
  • Abstract
    Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of the voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data, while maintaining computational efficiency and running substantially faster than the baseline method NeuralRGBD.
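
A sketch of the dynamic subdivision loop, assuming a hypothetical per-voxel complexity measure `error_fn`; the paper's prior-free criterion is defined over its optimization state, so this only conveys the octree-style mechanics:

```python
def refine_voxels(voxels, error_fn, threshold):
    """Split voxels whose complexity proxy exceeds a threshold into 8 children.

    voxels: list of (center, size) tuples with center = (x, y, z).
    """
    out = []
    for (cx, cy, cz), size in voxels:
        if error_fn((cx, cy, cz), size) <= threshold:
            out.append(((cx, cy, cz), size))      # keep coarse voxel
            continue
        for dx in (-0.25, 0.25):                  # 8 children at half size
            for dy in (-0.25, 0.25):
                for dz in (-0.25, 0.25):
                    child = (cx + dx * size, cy + dy * size, cz + dz * size)
                    out.append((child, size / 2.0))
    return out
```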

Do "bad" citations have "good" effects?

  • Authors: Honglin Bao, Misha Teplitskiy
  • Subjects: Digital Libraries (cs.DL); Computers and Society (cs.CY); Multiagent Systems (cs.MA); Adaptation and Self-Organizing Systems (nlin.AO)
  • Arxiv link: https://arxiv.org/abs/2304.06190
  • Pdf link: https://arxiv.org/pdf/2304.06190
  • Abstract
    The scientific community generally discourages authors of research papers from citing papers that did not influence them, because such "rhetorical" citations are assumed to degrade the literature and the incentives for good work. Intuitively, a world where authors cite only substantively appears attractive. We argue that mandating substantive citing may have underappreciated consequences for the allocation of attention and for dynamism. We develop a novel agent-based model in which agents cite both substantively and rhetorically. Agents first select papers to read based on their expected quality, read them and observe their actual quality, become influenced by those that are sufficiently good, and substantively cite them. Next, agents fill any remaining slots in their reference lists with papers that support their claims, regardless of whether those papers were actually influential. By turning rhetorical citing on and off, we find that rhetorical citing increases the correlation between quality and citations, increases citation churn, and reduces citation inequality. This occurs because rhetorical citing redistributes some citations from a stable set of elite-quality papers to a more dynamic set with high-to-moderate quality and high rhetorical value. Increasing the size of reference lists, often seen as an undesirable trend, amplifies the effects. In sum, rhetorical citing helps deconcentrate attention and makes it easier to displace incumbent ideas, so whether it is indeed undesirable depends on the metrics used to judge desirability.
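
A toy version of the agent step, with illustrative parameters: read a few papers chosen by expected quality, substantively cite those that turn out good enough, then fill the remaining reference slots rhetorically. Turning `rhetorical` off recovers the substantive-only world:

```python
import random

def write_paper(library, n_read, n_slots, quality_bar, rhetorical=True):
    """One author-agent in a toy rendition of the model.

    library: list of dicts with keys 'expected_q', 'true_q', 'citations'.
    """
    pool = random.sample(library, min(4 * n_read, len(library)))
    read = sorted(pool, key=lambda p: p["expected_q"], reverse=True)[:n_read]
    refs = [p for p in read if p["true_q"] >= quality_bar]    # substantive cites
    if rhetorical and len(refs) < n_slots:
        cited = {id(p) for p in refs}
        rest = [p for p in library if id(p) not in cited]
        refs += random.sample(rest, min(n_slots - len(refs), len(rest)))
    for p in refs[:n_slots]:
        p["citations"] += 1                                   # allocate attention
    return refs[:n_slots]
```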

Learning Over All Contracting and Lipschitz Closed-Loops for Partially-Observed Nonlinear Systems

  • Authors: Nicholas H. Barbara, Ruigang Wang, Ian R. Manchester
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.06193
  • Pdf link: https://arxiv.org/pdf/2304.06193
  • Abstract
    This paper presents a policy parameterization for learning-based control on nonlinear, partially-observed dynamical systems. The parameterization is based on a nonlinear version of the Youla parameterization and the recently proposed Recurrent Equilibrium Network (REN) class of models. We prove that the resulting Youla-REN parameterization automatically satisfies stability (contraction) and user-tunable robustness (Lipschitz) conditions on the closed-loop system. This means it can be used for safe learning-based control with no additional constraints or projections required to enforce stability or robustness. We test the new policy class in simulation on two reinforcement learning tasks: 1) magnetic suspension, and 2) inverting a rotary-arm pendulum. We find that the Youla-REN performs similarly to existing learning-based and optimal control methods while also ensuring stability and exhibiting improved robustness to adversarial disturbances.

Sub-Optimal Moving Horizon Estimation in Feedback Control of Linear Constrained Systems

  • Authors: Yujia Yang, Chris Manzie, Ye Pu
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06216
  • Pdf link: https://arxiv.org/pdf/2304.06216
  • Abstract
    Moving horizon estimation (MHE) offers benefits relative to other estimation approaches through its ability to explicitly handle constraints, but it suffers from increased computation cost. To help enable MHE on platforms with limited computation power, we propose to solve the optimization problem underlying MHE sub-optimally for a fixed number of optimization iterations per time step. The stability of the closed-loop system is analyzed using the small-gain theorem by considering the closed-loop controlled system, the optimization algorithm dynamics, and the estimation error dynamics as three interconnected subsystems. By assuming incremental input/output-to-state stability ($\delta$-IOSS) of the system and imposing standard ISS conditions on the controller, we derive conditions on the iteration number such that the interconnected system is input-to-state stable (ISS) w.r.t. the external disturbances. A simulation using an MHE-MPC estimator-controller pair is used to validate the results.
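
The sub-optimality idea in its simplest form — run a fixed number of iterations per time step from a warm start instead of solving the MHE problem to convergence — sketched with plain gradient descent (the paper analyzes a generic optimizer as one of the interconnected subsystems):

```python
import numpy as np

def suboptimal_mhe_step(cost_grad, w_warm, n_iters, step):
    """Fixed-iteration (sub-optimal) MHE update.

    cost_grad: gradient of the horizon cost w.r.t. the stacked decision variables.
    w_warm:    warm start, e.g. the previous solution shifted by one step.
    """
    w = np.asarray(w_warm, dtype=float)
    for _ in range(n_iters):          # the iteration count is the design knob
        w = w - step * cost_grad(w)   # the analysis ties it to closed-loop ISS
    return w
```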

Physics-informed radial basis network (PIRBN): A local approximation neural network for solving nonlinear PDEs

  • Authors: Jinshuai Bai, Gui-Rong Liu, Ashish Gupta, Laith Alzubaidi, Xi-Qiao Feng, YuanTong Gu
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06234
  • Pdf link: https://arxiv.org/pdf/2304.06234
  • Abstract
    Our recent intensive study found that physics-informed neural networks (PINNs) tend to be local approximators after training. This observation led us to develop a novel physics-informed radial basis network (PIRBN), which can maintain the local property throughout the entire training process. Compared to deep neural networks, a PIRBN comprises only one hidden layer and a radial basis "activation" function. Under appropriate conditions, we demonstrate that the training of PIRBNs using gradient descent methods can converge to Gaussian processes. We also study the training dynamics of PIRBN via neural tangent kernel (NTK) theory, and conduct comprehensive investigations of PIRBN initialisation strategies. Based on numerical examples, PIRBN is shown to be more effective and efficient than PINN in solving PDEs with high-frequency features and ill-posed computational domains. Moreover, existing PINN numerical techniques, such as adaptive learning, decomposition, and different types of loss functions, are applicable to PIRBN. The programs that can regenerate all numerical results can be found at https://github.com/JinshuaiBai/PIRBN.
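
The network shape described here is tiny; a forward-pass sketch in 1D with Gaussian radial bases (training and the physics-informed loss are omitted, and the exact parameterization of the basis is an assumption):

```python
import numpy as np

class PIRBNLike:
    """One hidden layer of radial bases phi_i(x) = exp(-b_i^2 (x - c_i)^2)
    followed by a linear output layer."""
    def __init__(self, centers, b=10.0):
        self.c = np.asarray(centers, dtype=float)   # basis centers
        self.b = np.full_like(self.c, b)            # shape parameters
        self.w = np.zeros_like(self.c)              # output weights (trainable)

    def forward(self, x):
        x = np.atleast_1d(x).astype(float)[:, None]           # (N, 1)
        phi = np.exp(-(self.b ** 2) * (x - self.c) ** 2)      # (N, M), local
        return phi @ self.w
```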

Loosely Coupled Odometry, UWB Ranging, and Cooperative Spatial Detection for Relative Monte-Carlo Multi-Robot Localization

  • Authors: Xianjia Yu, Paola Torrico Morrón, Sahar Salimpour, Jorge Peña Queralta, Tomi Westerlund
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06264
  • Pdf link: https://arxiv.org/pdf/2304.06264
  • Abstract
    As mobile robots become more ubiquitous, their deployments grow across use cases where GNSS positioning is either unavailable or unreliable. This has led to increased interest in multi-modal relative localization methods. Complementing onboard odometry, ranging allows for relative state estimation, with ultra-wideband (UWB) ranging having gained widespread recognition due to its low cost and centimeter-level out-of-box accuracy. Infrastructure-free localization methods allow for more dynamic, ad-hoc, and flexible deployments, yet they have received less attention from the research community. In this work, we propose a cooperative relative multi-robot localization where we leverage inter-robot ranging and simultaneous spatial detections of objects in the environment. To achieve this, we equip robots with a single UWB transceiver and a stereo camera. We propose a novel Monte-Carlo approach to estimate relative states by either employing only UWB ranges or dynamically integrating simultaneous spatial detections from the stereo cameras. We also address the challenges for UWB ranging error mitigation, especially in non-line-of-sight, with a study on different LSTM networks to estimate the ranging error. The proposed approach has multiple benefits. First, we show that a single range is enough to estimate the accurate relative states of two robots when fusing odometry measurements. Second, our experiments also demonstrate that our approach surpasses traditional methods such as multilateration in terms of accuracy. Third, to increase accuracy even further, we allow for the integration of cooperative spatial detections. Finally, we show how ROS 2 and Zenoh can be integrated to build a scalable wireless communication solution for multi-robot systems. The experimental validation includes real-time deployment and autonomous navigation based on the relative positioning method.
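
The claim that "a single range is enough" rests on a standard Monte-Carlo update; a sketch of the range-only part, weighting relative-position particles by a Gaussian likelihood of the measured UWB range (the paper's full pipeline also fuses odometry and spatial detections):

```python
import numpy as np

def uwb_particle_update(particles, weights, measured_range, sigma):
    """Weight and resample relative-position particles against one UWB range.

    particles: (N, 2) or (N, 3) candidate relative positions of the other robot.
    """
    predicted = np.linalg.norm(particles, axis=1)
    weights = weights * np.exp(-0.5 * ((predicted - measured_range) / sigma) ** 2)
    weights /= weights.sum()
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```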

Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning

  • Authors: Wenli Xiao, Yiwei Lyu, John Dolan
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06281
  • Pdf link: https://arxiv.org/pdf/2304.06281
  • Abstract
    Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases. Although shielding with Linear Temporal Logic (LTL) is a promising formal method to ensure safety in single-agent Reinforcement Learning (RL), it results in conservative behaviors when scaling to multi-agent scenarios. Additionally, it poses computational challenges for synthesizing shields in complex multi-agent environments. This work introduces Model-based Dynamic Shielding (MBDS) to support MARL algorithm design. Our algorithm synthesizes distributive shields, which are reactive systems running in parallel with each MARL agent, to monitor and rectify unsafe behaviors. The shields can dynamically split, merge, and recompute based on agents' states. This design enables efficient synthesis of shields to monitor agents in complex environments without coordination overheads. We also propose an algorithm to synthesize shields without prior knowledge of the dynamics model. The proposed algorithm obtains an approximate world model by interacting with the environment during the early stage of exploration, making our MBDS enjoy formal safety guarantees with high probability. We demonstrate in simulations that our framework can surpass existing baselines in terms of safety guarantees and learning performance.

Neural State-Space Models: Empirical Evaluation of Uncertainty Quantification

  • Authors: Marco Forgione, Dario Piga
  • Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06349
  • Pdf link: https://arxiv.org/pdf/2304.06349
  • Abstract
    Effective quantification of uncertainty is an essential and still missing step towards a greater adoption of deep-learning approaches in different applications, including mission-critical ones. In particular, investigations on the predictive uncertainty of deep-learning models describing non-linear dynamical systems are very limited to date. This paper is aimed at filling this gap and presents preliminary results on uncertainty quantification for system identification with neural state-space models. We frame the learning problem in a Bayesian probabilistic setting and obtain posterior distributions for the neural network's weights and outputs through approximate inference techniques. Based on the posterior, we construct credible intervals on the outputs and define a surprise index which can effectively diagnose usage of the model in a potentially dangerous out-of-distribution regime, where predictions cannot be trusted.
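
One simple way to realize a "surprise index" from posterior predictive samples — an assumed definition, for illustration only — is the fraction of observations falling outside the credible band:

```python
import numpy as np

def surprise_index(y, samples, alpha=0.05):
    """Fraction of outputs outside the (1 - alpha) credible interval.

    y:       (T,) observed outputs.
    samples: (S, T) posterior predictive draws from the neural state-space model.
    """
    lo = np.quantile(samples, alpha / 2, axis=0)
    hi = np.quantile(samples, 1 - alpha / 2, axis=0)
    return float(np.mean((y < lo) | (y > hi)))   # high value -> possibly OOD
```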

Emergence of Symbols in Neural Networks for Semantic Understanding and Communication

  • Authors: Yang Chen, Liangxuan Guo, Shan Yu
  • Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Symbolic Computation (cs.SC); Neurons and Cognition (q-bio.NC)
  • Arxiv link: https://arxiv.org/abs/2304.06377
  • Pdf link: https://arxiv.org/pdf/2304.06377
  • Abstract
    Being able to create meaningful symbols and proficiently use them for higher cognitive functions such as communication, reasoning, and planning is essential and unique to human intelligence. Current deep neural networks are still far behind humans' ability to create symbols for such higher cognitive functions. Here we propose a solution, named SEA-net, to endow neural networks with the ability of symbol creation, semantic understanding, and communication. SEA-net generates symbols that dynamically configure the network to perform specific tasks. These symbols capture compositional semantic information that enables the system to acquire new functions purely by symbolic manipulation or communication. In addition, we found that these self-generated symbols exhibit an intrinsic structure resembling that of natural language, suggesting a common framework underlying the generation and understanding of symbols in both human brains and artificial neural networks. We hope this work will be instrumental in producing more capable systems in the future that synergize the strengths of connectionist and symbolic approaches to AI.

Energy-Efficient GPU Clusters Scheduling for Deep Learning

  • Authors: Diandian Gu, Xintong Xie, Gang Huang, Xin Jin, Xuanzhe Liu
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.06381
  • Pdf link: https://arxiv.org/pdf/2304.06381
  • Abstract
    Training deep neural networks (DNNs) is a major workload in datacenters today, resulting in tremendously fast growth of energy consumption. It is important to reduce energy consumption while still completing DL training jobs early in data centers. In this paper, we propose PowerFlow, a GPU cluster scheduler that reduces the average Job Completion Time (JCT) under an energy budget. We first present performance models for DL training jobs to predict throughput and energy consumption under different configurations. Based on these performance models, PowerFlow dynamically allocates GPUs and adjusts the GPU-level or job-level configurations of DL training jobs. PowerFlow applies network packing and buddy allocation to job placement, thus avoiding extra energy consumed by cluster fragmentation. Evaluation results show that, under the same energy consumption, PowerFlow improves the average JCT by up to 1.57-3.39x compared to competitive baselines.

TransHP: Image Classification with Hierarchical Prompting

  • Authors: Wenhao Wang, Yifan Sun, Wei Li, Yi Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06385
  • Pdf link: https://arxiv.org/pdf/2304.06385
  • Abstract
    This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task. Different from prior HIC methods, our hierarchical prompting is the first to explicitly inject ancestor-class information as a tokenized hint that benefits descendant-class discrimination. We believe this closely imitates human visual recognition: humans may use the ancestor class as a prompt to draw focus to the subtle differences among descendant classes. We model this prompting mechanism as a Transformer with Hierarchical Prompting (TransHP). TransHP consists of three steps: 1) learning a set of prompt tokens to represent the coarse (ancestor) classes, 2) on-the-fly predicting the coarse class of the input image at an intermediate block, and 3) injecting the prompt token of the predicted coarse class into the intermediate feature. Though the parameters of TransHP remain the same for all input images, the injected coarse-class prompt conditions (modifies) the subsequent feature extraction and encourages a dynamic focus on the relatively subtle differences among the descendant classes. Extensive experiments show that TransHP improves image classification accuracy (e.g., improving ViT-B/16 by +2.83% ImageNet classification accuracy), training data efficiency (e.g., a +12.69% improvement under 10% ImageNet training data), and model explainability. Moreover, TransHP also performs favorably against prior HIC methods, showing that TransHP effectively exploits the hierarchical information.
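
A sketch of the injection step in PyTorch, with illustrative shapes and names: predict the coarse class from intermediate tokens, then prepend that class's learned prompt token so later blocks condition on it:

```python
import torch
import torch.nn as nn

class CoarsePromptInjector(nn.Module):
    """Inject the predicted ancestor-class prompt into an intermediate feature."""
    def __init__(self, dim, n_coarse):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_coarse, dim))  # one per class
        self.coarse_head = nn.Linear(dim, n_coarse)

    def forward(self, tokens):                    # tokens: (B, L, D)
        logits = self.coarse_head(tokens.mean(dim=1))    # on-the-fly prediction
        prompt = self.prompts[logits.argmax(dim=-1)]     # (B, D)
        tokens = torch.cat([prompt[:, None, :], tokens], dim=1)
        return tokens, logits            # logits would also get a coarse-class loss
```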

Communicating Actor Automata -- Modelling Erlang Processes as Communicating Machines

  • Authors: Dominic Orchard (University of Kent, UK), Mihail Munteanu (Masabi Ltd.), Paulo Torrens (University of Kent, UK)
  • Subjects: Programming Languages (cs.PL)
  • Arxiv link: https://arxiv.org/abs/2304.06395
  • Pdf link: https://arxiv.org/pdf/2304.06395
  • Abstract
    Brand and Zafiropulo's notion of Communicating Finite-State Machines (CFSMs) provides a succinct and powerful model of message-passing concurrency, based around channels. However, a major variant of message-passing concurrency is not readily captured by CFSMs: the actor model. In this work, we define a variant of CFSMs, called Communicating Actor Automata, to capture the actor model of concurrency as provided by Erlang: with mailboxes, from which messages are received according to repeated application of pattern matching. Furthermore, this variant of CFSMs supports dynamic process topologies, capturing common programming idioms in the context of actor-based message-passing concurrency. This gives a new basis for modelling, specifying, and verifying Erlang programs. We also consider a class of CAAs that give rise to freedom from race conditions.

Event-based tracking of human hands

  • Authors: Laura Duarte, Mohammad Safeea, Pedro Neto
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06534
  • Pdf link: https://arxiv.org/pdf/2304.06534
  • Abstract
    This paper proposes a novel method for tracking human hands using data from an event camera. The event camera detects changes in brightness, measuring motion with low latency, no motion blur, low power consumption, and high dynamic range. Captured frames are analysed using lightweight algorithms reporting 3D hand position data. The chosen pick-and-place scenario serves as an example input for collaborative human-robot interactions and for obstacle avoidance in human-robot safety applications. Event data are pre-processed into intensity frames, and regions of interest (ROI) are defined through object-edge event activity, reducing noise. ROI features are then extracted for use in depth perception. Event-based tracking of human hands was demonstrated to be feasible in real time and at a low computational cost. The proposed ROI-finding method reduces noise from intensity images, achieving up to 89% data reduction relative to the original while preserving the features. The depth estimation error relative to ground truth (measured with wearables), computed using dynamic time warping on single event camera data, is 15 to 30 millimetres, depending on the plane in which it is measured. In summary, the paper demonstrates tracking of human hands in 3D space using single event camera data and lightweight algorithms to define ROI features.
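
The ROI-finding step lends itself to a few lines: accumulate events into an intensity frame, then keep only pixels with enough edge activity. A simplified sketch, with integer pixel coordinates assumed:

```python
import numpy as np

def events_to_roi(events, shape, activity_threshold):
    """Accumulate events and return a bounding-box ROI of high-activity pixels.

    events: iterable of (x, y, polarity) tuples; shape: (height, width).
    Returns ((y0, y1, x0, x1), frame), or (None, frame) if nothing is active.
    """
    frame = np.zeros(shape, dtype=np.int32)
    for x, y, _polarity in events:
        frame[y, x] += 1                          # per-pixel event count
    active = frame > activity_threshold           # suppresses sparse noise
    if not active.any():
        return None, frame
    ys, xs = np.nonzero(active)
    return (ys.min(), ys.max(), xs.min(), xs.max()), frame
```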

DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos

  • Authors: Qi Zhao, M. Salman Asif, Zhan Ma
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06544
  • Pdf link: https://arxiv.org/pdf/2304.06544
  • Abstract
    Existing implicit neural representation (INR) methods do not fully exploit spatiotemporal redundancies in videos. Index-based INRs ignore the content-specific spatial features and hybrid INRs ignore the contextual dependency on adjacent frames, leading to poor modeling capability for scenes with large motion or dynamics. We analyze this limitation from the perspective of function fitting and reveal the importance of frame difference. To use explicit motion information, we propose Difference Neural Representation for Videos (DNeRV), which consists of two streams for content and frame difference. We also introduce a collaborative content unit for effective feature fusion. We test DNeRV for video compression, inpainting, and interpolation. DNeRV achieves competitive results against the state-of-the-art neural compression approaches and outperforms existing implicit methods on downstream inpainting and interpolation for $960 \times 1920$ videos.
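
The difference stream is the simplest part to illustrate: pair each frame with its difference from the previous one (zeros for the first), giving the explicit motion signal the abstract argues for:

```python
import numpy as np

def frame_difference_stream(frames):
    """Return (content, difference) input pairs for a two-stream video INR.

    frames: list of equally shaped float arrays.
    """
    diffs = [np.zeros_like(frames[0])]                       # no predecessor
    diffs += [frames[t] - frames[t - 1] for t in range(1, len(frames))]
    return list(zip(frames, diffs))
```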

Class-Incremental Learning of Plant and Disease Detection: Growing Branches with Knowledge Distillation

  • Authors: Mathieu Pagé Fortin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06619
  • Pdf link: https://arxiv.org/pdf/2304.06619
  • Abstract
    This paper investigates the problem of class-incremental object detection for agricultural applications where a model needs to learn new plant species and diseases incrementally without forgetting the previously learned ones. We adapt two public datasets to include new categories over time, simulating a more realistic and dynamic scenario. We then compare three class-incremental learning methods that leverage different forms of knowledge distillation to mitigate catastrophic forgetting. Our experiments show that all three methods suffer from catastrophic forgetting, but the recent Dynamic Y-KD approach, which additionally uses a dynamic architecture that grows new branches to learn new tasks, outperforms ILOD and Faster-ILOD in most scenarios both on new and old classes. These results highlight the challenges and opportunities of continual object detection for agricultural applications. In particular, the large intra-class and small inter-class variability that is typical of plant images exacerbate the difficulty of learning new categories without interfering with previous knowledge. We publicly release our code to encourage future work.

Robustness Measures and Monitors for Time Window Temporal Logic

  • Authors: Ahmad Ahmad, Cristian-Ioan Vasile, Roberto Tron, Calin Belta
  • Subjects: Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.06645
  • Pdf link: https://arxiv.org/pdf/2304.06645
  • Abstract
    Temporal logics (TLs) have been widely used to formalize interpretable tasks for cyber-physical systems. Time Window Temporal Logic (TWTL) has been recently proposed as a specification language for dynamical systems. In particular, it can easily express robotic tasks, and it allows for efficient, automata-based verification and synthesis of control policies for such systems. In this paper, we define two quantitative semantics for this logic, and two corresponding monitoring algorithms, which allow for real-time quantification of satisfaction of formulas by trajectories of discrete-time systems. We demonstrate the new semantics and their runtime monitors on numerical examples.

ProtoDiv: Prototype-guided Division of Consistent Pseudo-bags for Whole-slide Image Classification

  • Authors: Rui Yang, Pei Liu, Luping Ji
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06652
  • Pdf link: https://arxiv.org/pdf/2304.06652
  • Abstract
    Due to the limitations of inadequate Whole-Slide Image (WSI) samples with weak labels, pseudo-bag-based multiple instance learning (MIL) appears to be a promising direction in WSI classification. However, the pseudo-bag dividing scheme, often crucial for classification performance, is still an open topic worth exploring. Therefore, this paper proposes a novel scheme, ProtoDiv, which uses a bag prototype to guide the division of WSI pseudo-bags. Rather than designing a complex network architecture, this scheme takes a plug-and-play approach to safely augment WSI data for effective training while preserving sample consistency. Furthermore, we devise an attention-based prototype that can be optimized dynamically during training to adapt to a classification task. We apply our ProtoDiv scheme to seven baseline models and carry out a group of comparison experiments on two public WSI datasets. The experiments confirm that ProtoDiv usually brings clear performance improvements to WSI classification.
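
A sketch of prototype-guided division under plausible assumptions (the paper's exact assignment rule may differ): build an attention-weighted bag prototype, rank patches by similarity to it, and deal them round-robin into pseudo-bags so each bag sees the full similarity spectrum:

```python
import numpy as np

def protodiv_split(instances, attention, n_bags):
    """Divide a WSI bag into pseudo-bags that are consistent w.r.t. a prototype.

    instances: (N, D) patch features; attention: (N,) attention scores.
    Returns a list of n_bags index arrays.
    """
    prototype = (attention[:, None] * instances).sum(0) / attention.sum()
    sim = instances @ prototype / (
        np.linalg.norm(instances, axis=1) * np.linalg.norm(prototype) + 1e-8)
    order = np.argsort(-sim)                           # most-to-least prototypical
    return [order[i::n_bags] for i in range(n_bags)]   # round-robin dealing
```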

D-SVM over Networked Systems with Non-Ideal Linking Conditions

  • Authors: Mohammadreza Doostmohammadian, Alireza Aghasi, Houman Zarrabi
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Dynamical Systems (math.DS); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.06667
  • Pdf link: https://arxiv.org/pdf/2304.06667
  • Abstract
    This paper considers distributed optimization algorithms, with application in binary classification via distributed support-vector-machines (D-SVM) over multi-agent networks subject to some link nonlinearities. The agents cooperatively solve a consensus-constrained distributed optimization problem via continuous-time dynamics, while the links are subject to strongly sign-preserving odd nonlinear conditions. Logarithmic quantization and clipping (saturation) are two examples of such nonlinearities. In contrast to existing literature that mostly considers ideal links and perfect information exchange over linear channels, we show how general sector-bounded models affect the convergence to the optimizer (i.e., the SVM classifier) over dynamic balanced directed networks. In general, any odd sector-bounded nonlinear mapping can be applied to our dynamics. The main challenge is to show that the proposed system dynamics always have one zero eigenvalue (associated with the consensus) and the other eigenvalues all have negative real parts. This is done by recalling arguments from matrix perturbation theory. Then, the solution is shown to converge to the agreement state under certain conditions. For example, the gradient tracking (GT) step size is tighter than in the linear case by factors related to the upper/lower sector bounds. To the best of our knowledge, no existing work in the distributed optimization and learning literature considers non-ideal link conditions.

Inertia-Aware Microgrid Investment Planning Using Tractable Decomposition Algorithms

  • Authors: Agnes Marjorie Nakiganda, Shahab Dehghan, Petros Aristidou
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06674
  • Pdf link: https://arxiv.org/pdf/2304.06674
  • Abstract
    The integration of the frequency dynamics into Micro-Grid (MG) investment and operational planning problems is vital in ensuring the security of the system in the post-contingency states. However, the task of including transient security constraints in MG planning problems is non-trivial. This is due to the highly non-linear and non-convex nature of the analytical closed form of the frequency metrics (e.g., frequency nadir) and power flow constraints. To handle this issue, this paper presents two algorithms for decomposing the MG investment planning problem into multiple levels to enhance computational tractability and optimality. Furthermore, the sensitivity of the decisions made at each level is captured by corresponding dual cutting planes to model feasible secure regions. This, in turn, ensures both the optimal determination and placement of inertia services and accelerates the convergence of the proposed decomposition algorithms. The efficient and effective performance of the proposed algorithms is tested and verified on an 18-bus Low Voltage (LV) network and a 30-bus Medium Voltage (MV) network under various operating scenarios.

OKRidge: Scalable Optimal k-Sparse Ridge Regression for Learning Dynamical Systems

  • Authors: Jiachang Liu, Sam Rosen, Chudi Zhong, Cynthia Rudin
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.06686
  • Pdf link: https://arxiv.org/pdf/2304.06686
  • Abstract
    We consider an important problem in scientific discovery, identifying sparse governing equations for nonlinear dynamical systems. This involves solving sparse ridge regression problems to provable optimality in order to determine which terms drive the underlying dynamics. We propose a fast algorithm, OKRidge, for sparse ridge regression, using a novel lower bound calculation involving, first, a saddle point formulation, and from there, either solving (i) a linear system or (ii) using an ADMM-based approach, where the proximal operators can be efficiently evaluated by solving another linear system and an isotonic regression problem. We also propose a method to warm-start our solver, which leverages a beam search. Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.
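
The certified lower bounds are the heart of OKRidge and are not reproduced here; the beam-search warm start, however, is easy to sketch: grow supports one feature at a time, scoring each candidate with a closed-form ridge solve restricted to that support, and keep only the best few. The hyperparameters below are illustrative assumptions.

```python
import numpy as np

def ridge_loss(X, y, support, lam):
    # Closed-form ridge solution restricted to a candidate support.
    Xs = X[:, list(support)]
    w = np.linalg.solve(Xs.T @ Xs + lam * np.eye(len(support)), Xs.T @ y)
    r = y - Xs @ w
    return r @ r + lam * w @ w

def beam_search_warm_start(X, y, k, lam=1e-3, beam=5):
    """Grow supports one feature at a time, keeping the `beam` best by
    ridge objective; the winner warm-starts the exact solver."""
    d = X.shape[1]
    beams = [((), np.inf)]
    for _ in range(k):
        cand = {}
        for supp, _ in beams:
            for j in range(d):
                if j not in supp:
                    new = tuple(sorted(supp + (j,)))
                    if new not in cand:
                        cand[new] = ridge_loss(X, y, new, lam)
        beams = sorted(cand.items(), key=lambda kv: kv[1])[:beam]
    return beams[0]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
w_true = np.zeros(20); w_true[[2, 7, 11]] = [1.5, -2.0, 1.0]
y = X @ w_true + 0.05 * rng.normal(size=100)
print(beam_search_warm_start(X, y, k=3))  # typically recovers (2, 7, 11)
```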

Representing Volumetric Videos as Dynamic MLP Maps

  • Authors: Sida Peng, Yunzhi Yan, Qing Shuai, Hujun Bao, Xiaowei Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06717
  • Pdf link: https://arxiv.org/pdf/2304.06717
  • Abstract
    This paper introduces a novel representation of volumetric videos for real-time view synthesis of dynamic scenes. Recent advances in neural scene representations demonstrate their remarkable capability to model and render complex static scenes, but extending them to represent dynamic scenes is not straightforward due to their slow rendering speed or high storage cost. To solve this problem, our key idea is to represent the radiance field of each frame as a set of shallow MLP networks whose parameters are stored in 2D grids, called MLP maps, and dynamically predicted by a 2D CNN decoder shared by all frames. Representing 3D scenes with shallow MLPs significantly improves the rendering speed, while dynamically predicting MLP parameters with a shared 2D CNN instead of explicitly storing them leads to low storage cost. Experiments show that the proposed approach achieves state-of-the-art rendering quality on the NHR and ZJU-MoCap datasets, while being efficient for real-time rendering with a speed of 41.7 fps for $512 \times 512$ images on an RTX 3090 GPU. The code is available at https://zju3dv.github.io/mlp_maps/.
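
A minimal torch sketch of the representation, with a nearest-cell lookup standing in for whatever interpolation the paper actually uses: a shared 2D CNN decodes a per-frame latent code into a grid of packed MLP parameters, and each query point is evaluated by the tiny MLP stored at its cell. Layer sizes and the lookup rule are assumptions.

```python
import torch
import torch.nn as nn

IN, H, OUT = 3, 8, 4                       # tiny MLP: xyz -> 8 -> RGB+density
N_PARAMS = IN * H + H + H * OUT + OUT      # = 68 parameters per grid cell

decoder = nn.Sequential(                   # shared 2D CNN predicting MLP maps
    nn.Conv2d(16, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, N_PARAMS, 1),
)

def query(param_map, pts):
    """Evaluate the shallow MLP stored at each point's grid cell.
    param_map: (N_PARAMS, G, G); pts: (M, 3) with x, y in [0, 1)."""
    G = param_map.shape[-1]
    ij = (pts[:, :2] * G).long().clamp(0, G - 1)
    theta = param_map[:, ij[:, 1], ij[:, 0]].T          # (M, N_PARAMS)
    o = IN * H
    W1 = theta[:, :o].reshape(-1, H, IN)
    b1 = theta[:, o:o + H]
    W2 = theta[:, o + H:o + H + H * OUT].reshape(-1, OUT, H)
    b2 = theta[:, o + H + H * OUT:]
    h = torch.relu(torch.einsum('mhi,mi->mh', W1, pts) + b1)
    return torch.einsum('moh,mh->mo', W2, h) + b2       # (M, 4)

z = torch.randn(1, 16, 32, 32)             # per-frame latent code
param_map = decoder(z)[0]                  # (68, 32, 32): one MLP per cell
print(query(param_map, torch.rand(1024, 3)).shape)  # torch.Size([1024, 4])
```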

New submissions for Mon, 20 Mar 23

Keyword: pruning

Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution

  • Authors: Jiamian Wang, Huan Wang, Yulun Zhang, Yun Fu, Zhiqiang Tao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09650
  • Pdf link: https://arxiv.org/pdf/2303.09650
  • Abstract
    The field of image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures. However, prevailing SR models suffer from prohibitive memory footprints and intensive computation, which limit further deployment on computationally constrained platforms. In this work, we investigate the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead. Two main challenges remain in applying pruning methods for SR. First, the widely used filter pruning technique reflects limited granularity and restricted adaptability to diverse network structures. Second, existing pruning methods generally operate upon a pre-trained network for sparse structure determination, failing to get rid of dense model training in the traditional SR paradigm. To address these challenges, we adopt unstructured pruning with sparse models directly trained from scratch. Specifically, we propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights on-the-fly by a small amount proportional to their magnitude. We observe that the proposed ISS-P can dynamically learn sparse structures adapting to the optimization process and preserve the sparse model's trainability by yielding a more regularized gradient throughput. Experiments on benchmark datasets demonstrate the effectiveness of the proposed ISS-P compared with state-of-the-art methods over diverse network architectures.
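
The departure from hard pruning is simple to sketch: at each iteration, instead of zeroing the smallest-magnitude weights, scale them down slightly so the sparse structure stays revisable and the weights stay trainable. The sparsity level, shrink factor, and schedule below are illustrative assumptions rather than the paper's exact settings.

```python
import torch

def iss_p_step(weight, sparsity=0.9, shrink=0.02):
    """One soft-shrinkage iteration (sketch): scale down the current
    smallest-magnitude fraction of weights instead of zeroing them, so the
    sparse structure can still change in later iterations."""
    k = max(1, int(sparsity * weight.numel()))
    thresh = weight.abs().flatten().kthvalue(k).values
    with torch.no_grad():
        unimportant = weight.abs() <= thresh
        weight[unimportant] *= 1.0 - shrink   # proportional to magnitude
    return weight

w = torch.randn(64, 64)
for _ in range(100):          # in practice interleaved with training steps
    iss_p_step(w)
print(w.abs().median())       # far below the ~0.67 median of a unit Gaussian
```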

Dynamic Structure Pruning for Compressing CNNs

  • Authors: Jun-Hyung Park, Yeachan Kim, Junho Kim, Joon-Young Choi, SangKeun Lee
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.09736
  • Pdf link: https://arxiv.org/pdf/2303.09736
  • Abstract
    Structure pruning is an effective method to compress and accelerate neural networks. While filter and channel pruning are preferable to other structure pruning methods in terms of realistic acceleration and hardware compatibility, pruning methods with a finer granularity, such as intra-channel pruning, are expected to be capable of yielding more compact and computationally efficient networks. Typical intra-channel pruning methods utilize a static and hand-crafted pruning granularity due to a large search space, which leaves room for improvement in their pruning performance. In this work, we introduce a novel structure pruning method, termed dynamic structure pruning, to identify optimal pruning granularities for intra-channel pruning. In contrast to existing intra-channel pruning methods, the proposed method automatically optimizes dynamic pruning granularities in each layer while training deep neural networks. To achieve this, we propose a differentiable group learning method designed to efficiently learn a pruning granularity based on gradient-based learning of filter groups. The experimental results show that dynamic structure pruning achieves state-of-the-art pruning performance and better realistic acceleration on a GPU compared with channel pruning. In particular, it reduces the FLOPs of ResNet50 by 71.85% without accuracy degradation on the ImageNet dataset. Our code is available at https://github.com/irishev/DSP.

Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration

  • Authors: Zheng Qin, Hao Yu, Changjian Wang, Yuxing Peng, Kai Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09950
  • Pdf link: https://arxiv.org/pdf/2303.09950
  • Abstract
    We study the problem of outlier correspondence pruning for non-rigid point cloud registration. In rigid registration, spatial consistency has been a commonly used criterion to discriminate outliers from inliers. It measures the compatibility of two correspondences by the discrepancy between the respective distances in two point clouds. However, spatial consistency no longer holds in non-rigid cases and outlier rejection for non-rigid registration has not been well studied. In this work, we propose Graph-based Spatial Consistency Network (GraphSCNet) to filter outliers for non-rigid registration. Our method is based on the fact that non-rigid deformations are usually locally rigid, or local shape preserving. We first design a local spatial consistency measure over the deformation graph of the point cloud, which evaluates the spatial compatibility only between the correspondences in the vicinity of a graph node. An attention-based non-rigid correspondence embedding module is then devised to learn a robust representation of non-rigid correspondences from local spatial consistency. Despite its simplicity, GraphSCNet effectively improves the quality of the putative correspondences and attains state-of-the-art performance on three challenging benchmarks. Our code and models are available at https://github.com/qinzheng93/GraphSCNet.
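
The local spatial consistency measure can be sketched directly from its description: two correspondences are compatible when their point-pair distances agree across the two clouds. The Gaussian kernel and bandwidth below are assumptions, and the paper additionally restricts the measure to neighborhoods of deformation-graph nodes, which this global version omits.

```python
import numpy as np

def spatial_consistency(p, q, sigma=0.1):
    """Compatibility of correspondences (p_i -> q_i): close to 1 when the
    pairwise distances agree across the two clouds, near 0 otherwise."""
    dp = np.linalg.norm(p[:, None] - p[None, :], axis=-1)
    dq = np.linalg.norm(q[:, None] - q[None, :], axis=-1)
    return np.exp(-((dp - dq) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
p = rng.uniform(size=(20, 3))
q = p + 0.01 * rng.normal(size=(20, 3))   # near-rigid inlier matches
q[:4] = rng.uniform(size=(4, 3))          # 4 outlier correspondences
score = spatial_consistency(p, q).mean(axis=1)
print(score[:4].round(2), score[4:].mean().round(2))  # outliers score lower
```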

Keyword: neural architecture search

There is no result

Keyword: 3d object detection

GOOD: General Optimization-based Fusion for 3D Object Detection via LiDAR-Camera Object Candidates

  • Authors: Bingqi Shen, Shuwei Dai, Yuyin Chen, Rong Xiong, Yue Wang, Yanmei Jiao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.09800
  • Pdf link: https://arxiv.org/pdf/2303.09800
  • Abstract
    3D object detection serves as the core basis of perception tasks in autonomous driving. Recent years have seen rapid progress in multi-modal fusion strategies for more robust and accurate 3D object detection. However, current research on robust fusion relies on learning-based frameworks, which demand a large amount of training data and are inconvenient to implement in new scenes. In this paper, we propose GOOD, a general optimization-based fusion framework that can achieve satisfactory detection without training additional models and is available for any combination of 2D and 3D detectors to improve the accuracy and robustness of 3D detection. First, we apply the mutual-sided nearest-neighbor probability model to achieve the 3D-2D data association. Then we design an optimization pipeline that can optimize different kinds of instances separately based on the matching result. Apart from this, a 3D MOT method is also introduced to enhance performance with the aid of previous frames. To the best of our knowledge, this is the first optimization-based late-fusion framework for multi-modal 3D object detection, which can serve as a baseline for subsequent research. Experiments on both the nuScenes and KITTI datasets show that GOOD outperforms PointPillars by 9.1% in mAP score and achieves competitive results with the learning-based late fusion CLOCs.
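
The 3D-2D association step can be approximated with a deterministic mutual-nearest-neighbor rule on the IoU between 2D detections and projected 3D detections; the paper's mutual-sided model is probabilistic, so the hard IoU threshold here is an assumption.

```python
import numpy as np

def iou(a, b):
    # a, b: [x1, y1, x2, y2] axis-aligned boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def mutual_nn_match(boxes2d, boxes3d_proj, thresh=0.3):
    """Associate 2D detections with projected 3D detections: a pair matches
    only if each is the other's best-IoU candidate (mutual nearest neighbor)."""
    M = np.array([[iou(a, b) for b in boxes3d_proj] for a in boxes2d])
    matches = []
    for i in range(M.shape[0]):
        j = int(M[i].argmax())
        if M[i, j] >= thresh and int(M[:, j].argmax()) == i:
            matches.append((i, j))
    return matches

b2d = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], float)
b3d = np.array([[1, 1, 11, 11], [40, 40, 50, 50]], float)  # projected 3D boxes
print(mutual_nn_match(b2d, b3d))   # [(0, 0)]: only the overlapping pair matches
```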

A Simple Attempt for 3D Occupancy Estimation in Autonomous Driving

  • Authors: Wanshui Gan, Ningkai Mo, Hongbin Xu, Naoto Yokoya
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.10076
  • Pdf link: https://arxiv.org/pdf/2303.10076
  • Abstract
    The task of estimating 3D occupancy from surrounding-view images is an exciting development in the field of autonomous driving, following the success of Bird's Eye View (BEV) perception. This task provides crucial 3D attributes of the driving environment, enhancing the overall understanding and perception of the surrounding space. However, there is still a lack of a baseline to define the task in terms of network design, optimization, and evaluation. In this work, we present a simple attempt at 3D occupancy estimation: a CNN-based framework designed to reveal several key factors for the task. In addition, we explore the relationship between 3D occupancy estimation and other related tasks, such as monocular depth estimation, stereo matching, and BEV perception (3D object detection and map segmentation), which could advance the study of 3D occupancy estimation. For evaluation, we propose a simple sampling strategy to define the metric for occupancy evaluation, which is flexible for current public datasets. Moreover, we establish a new benchmark in terms of the depth estimation metric, where we compare our proposed method with monocular depth estimation methods on the DDAD and nuScenes datasets. The relevant code will be available at https://github.com/GANWANSHUI/SimpleOccupancy.

Keyword: voxel

A Dynamic Multi-Scale Voxel Flow Network for Video Prediction

  • Authors: Xiaotao Hu, Zhewei Huang, Ailin Huang, Jun Xu, Shuchang Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09875
  • Pdf link: https://arxiv.org/pdf/2303.09875
  • Abstract
    The performance of video prediction has been greatly boosted by advanced deep neural networks. However, most current methods suffer from large model sizes and require extra inputs, e.g., semantic/depth maps, for promising performance. For efficiency, in this paper we propose a Dynamic Multi-scale Voxel Flow Network (DMVFN) that achieves better video prediction performance at lower computational cost than previous methods, using only RGB images. The core of our DMVFN is a differentiable routing module that can effectively perceive the motion scales of video frames. Once trained, our DMVFN selects adaptive sub-networks for different inputs at the inference stage. Experiments on several benchmarks demonstrate that our DMVFN is an order of magnitude faster than Deep Voxel Flow and surpasses the state-of-the-art iterative-based OPT in generated image quality. Our code and demo are available at https://huxiaotaostasy.github.io/DMVFN/.

Semantic Scene Completion with Cleaner Self

  • Authors: Fengyun Wang, Dong Zhang, Hanwang Zhang, Jinhui Tang, Qianru Sun
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09977
  • Pdf link: https://arxiv.org/pdf/2303.09977
  • Abstract
    Semantic Scene Completion (SSC) transforms an image of single-view depth and/or RGB 2D pixels into 3D voxels, each of which is assigned a predicted semantic label. SSC is a well-known ill-posed problem as the prediction model has to "imagine" what is behind the visible surface, which is usually represented by a Truncated Signed Distance Function (TSDF). Due to the sensory imperfection of the depth camera, most existing methods based on the noisy TSDF estimated from depth values suffer from 1) incomplete volumetric predictions and 2) confused semantic labels. To this end, we use the ground-truth 3D voxels to generate a perfect visible surface, called TSDF-CAD, and then train a "cleaner" SSC model. As the model is noise-free, it is expected to focus more on the "imagination" of unseen voxels. Then, we propose to distill the intermediate "cleaner" knowledge into another model with noisy TSDF input. In particular, we use the 3D occupancy feature and the semantic relations of the "cleaner self" to supervise the counterparts of the "noisy self" to respectively address the above two incorrect predictions. Experimental results validate that our method improves the noisy counterparts by 3.1% IoU and 2.2% mIoU for scene completion and SSC, respectively, and also achieves new state-of-the-art accuracy on the popular NYU dataset.
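
A minimal torch sketch of the two distillation terms as described in the abstract: the intermediate occupancy features are matched with an MSE loss and the semantic predictions with a temperature-softened KL divergence, both against the detached noise-free teacher. The loss weighting, temperature, and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def cleaner_self_distill(noisy_feat, clean_feat, noisy_logits, clean_logits,
                         T=2.0):
    """Match occupancy features (MSE) and semantic predictions (softened KL)
    of the 'noisy self' against the detached 'cleaner self' teacher."""
    feat_loss = F.mse_loss(noisy_feat, clean_feat.detach())
    kl = F.kl_div(F.log_softmax(noisy_logits / T, dim=1),
                  F.softmax(clean_logits.detach() / T, dim=1),
                  reduction='batchmean') * T * T
    return feat_loss + kl

nf, cf = torch.randn(2, 32, 8, 8, 8), torch.randn(2, 32, 8, 8, 8)  # features
nl, cl = torch.randn(2, 12, 8, 8, 8), torch.randn(2, 12, 8, 8, 8)  # 12 classes
print(cleaner_self_distill(nf, cf, nl, cl))
```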

Gyroid-like metamaterials: Topology optimization and Deep Learning

  • Authors: Asha Viswanath, Diab W Abueidda, Mohamad Modrek, Kamran A Khan, Seid Koric, Rashid K. Abu Al-Rub
  • Subjects: Computational Engineering, Finance, and Science (cs.CE)
  • Arxiv link: https://arxiv.org/abs/2303.10007
  • Pdf link: https://arxiv.org/pdf/2303.10007
  • Abstract
    Triply periodic minimal surface (TPMS) metamaterials characterized by mathematically controlled topologies exhibit better mechanical properties compared to uniform structures. The unit cell topology of such metamaterials can be further optimized to improve a desired mechanical property for a specific application. However, such inverse design involves multiple costly 3D finite element analyses in topology optimization and hence has not been attempted. Data-driven models have recently gained popularity as surrogate models in the geometrical design of metamaterials. Gyroid-like unit cells are designed using a novel voxel algorithm, a homogenization-based topology optimization, and a Heaviside filter to attain optimized densities in a 0-1 configuration. Only a small amount of optimization data is used as input-output pairs for supervised learning of the topology optimization process by a 3D CNN model. These models can then be used to instantaneously predict the optimized unit cell geometry for any topology parameters, thus alleviating the need to run any topology optimization for future designs. The high accuracy of the model was demonstrated by a low mean square error metric and a high dice coefficient metric. This accelerated design of 3D metamaterials opens the possibility of designing any computationally costly problems involving complex geometry of metamaterials with multi-objective properties or multi-scale applications.
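
The paper's voxel algorithm itself is not spelled out in the abstract, but the usual starting point for gyroid-like cells is voxelizing the gyroid's implicit equation, after which a Heaviside-style threshold pushes densities toward a 0-1 configuration. A minimal numpy sketch (grid size and level set are arbitrary choices):

```python
import numpy as np

def gyroid_voxels(n=64, level=0.0, cells=1.0):
    """Voxelize one gyroid-family unit cell from its implicit equation
    sin(x)cos(y) + sin(y)cos(z) + sin(z)cos(x) = level."""
    t = np.linspace(0, 2 * np.pi * cells, n)
    x, y, z = np.meshgrid(t, t, t, indexing='ij')
    f = np.sin(x) * np.cos(y) + np.sin(y) * np.cos(z) + np.sin(z) * np.cos(x)
    return (f <= level).astype(np.uint8)      # 1 = solid, 0 = void

vox = gyroid_voxels()
print(vox.shape, vox.mean())   # 64^3 grid; ~0.5 volume fraction at level 0
```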

Keyword: lidar

Exorcising "Wraith": Protecting LiDAR-based Object Detector in Automated Driving System from Appearing Attacks

  • Authors: Qifan Xiao, Xudong Pan, Yifan Lu, Mi Zhang, Jiarun Dai, Min Yang
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.09731
  • Pdf link: https://arxiv.org/pdf/2303.09731
  • Abstract
    Automated driving systems rely on 3D object detectors to recognize possible obstacles from LiDAR point clouds. However, recent works show the adversary can forge non-existent cars in the prediction results with a few fake points (i.e., appearing attack). By removing statistical outliers, existing defenses are however designed for specific attacks or biased by predefined heuristic rules. Towards more comprehensive mitigation, we first systematically inspect the mechanism of recent appearing attacks: their common weaknesses are observed in crafting fake obstacles which (i) have obvious differences in the local parts compared with real obstacles and (ii) violate the physical relation between depth and point density. In this paper, we propose a novel plug-and-play defensive module which works alongside a trained LiDAR-based object detector to eliminate forged obstacles where a major proportion of local parts have low objectness, i.e., a low degree of belonging to a real object. At the core of our module is a local objectness predictor, which explicitly incorporates the depth information to model the relation between depth and point density, and predicts each local part of an obstacle with an objectness score. Extensive experiments show that our proposed defense eliminates at least 70% of the cars forged by three known appearing attacks in most cases, while, for the best previous defense, less than 30% of the forged cars are eliminated. Meanwhile, under the same circumstances, our defense incurs less overhead for AP/precision on cars compared with existing defenses. Furthermore, we validate the effectiveness of our proposed defense on simulation-based closed-loop control driving tests in the open-source system of Baidu's Apollo.
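
The learned local objectness predictor is the module's core and is not reproduced here; what can be sketched is the physical prior it builds on, namely that genuine LiDAR returns thin out roughly quadratically with range, so an obstacle claimed at a given depth should carry a commensurate number of points. The constant k and the tolerance below are made-up, sensor-dependent assumptions.

```python
import numpy as np

def density_plausible(points_in_box, depth, k=8000.0, tol=3.0):
    """Crude depth-density check: a real obstacle at range `depth` should
    hold on the order of k / depth**2 returns; forged obstacles built from
    a handful of fake points fall far short of that."""
    expected = k / max(depth, 1.0) ** 2
    return points_in_box.shape[0] * tol >= expected   # True = plausible

real = np.zeros((200, 3))   # 200 returns inside a box at 10 m
fake = np.zeros((5, 3))     # a few spoofed points at the same range
print(density_plausible(real, 10.0), density_plausible(fake, 10.0))  # True False
```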

Identifying Occluded Agents in Dynamic Games with Noise-Corrupted Observations

  • Authors: Tianyu Qiu, David Fridovich-Keil
  • Subjects: Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2303.09744
  • Pdf link: https://arxiv.org/pdf/2303.09744
  • Abstract
    To provide safe and efficient services, robots must rely on observations from sensors (lidar, camera, etc.) to maintain a clear knowledge of the environment. In multi-agent scenarios, robots must further reason about the intrinsic motivation underlying the behavior of other agents in order to make inferences about their future behavior. Occlusions, which often occur in robot operating scenarios, make the decision-making of robots even more challenging. In scenarios without occlusions, dynamic game theory provides a solid theoretical framework for predicting the behavior of agents with different objectives interacting with each other over time. Prior work proposed an inverse dynamic game method to recover the game model that best explains observed behavior. However, an apparent shortcoming is that it does not account for agents that may be occluded. Neglecting these agents may result in risky navigation decisions. To address this problem, we propose a novel inverse dynamic game technique to infer the behavior of occluded, unobserved agents that best explains the observations of visible agents' behavior, and simultaneously to predict the agents' future behavior based on the recovered game model. We demonstrate our method in several simulated scenarios. Results reveal that our method robustly estimates agents' objectives and predicts trajectories for both visible and occluded agents from a short sequence of noise-corrupted trajectory observations of only the visible agents.

LCE-Calib: Automatic LiDAR-Frame/Event Camera Extrinsic Calibration With A Globally Optimal Solution

  • Authors: Jianhao Jiao, Feiyi Chen, Hexiang Wei, Jin Wu, Ming Liu
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09825
  • Pdf link: https://arxiv.org/pdf/2303.09825
  • Abstract
    The combination of LiDARs and cameras enables a mobile robot to perceive environments with multi-modal data, becoming a key factor in achieving robust perception. Traditional frame cameras are sensitive to changing illumination conditions, motivating us to introduce novel event cameras to make LiDAR-camera fusion more complete and robust. However, to jointly exploit these sensors, the challenging extrinsic calibration problem should be addressed. This paper proposes an automatic checkerboard-based approach to calibrate extrinsics between a LiDAR and a frame/event camera, where four contributions are presented. Firstly, we present an automatic feature extraction and checkerboard tracking method from LiDAR's point clouds. Secondly, we reconstruct realistic frame images from event streams, applying traditional corner detectors to event cameras. Thirdly, we propose an initialization-refinement procedure to estimate extrinsics using point-to-plane and point-to-line constraints in a coarse-to-fine manner. Fourthly, we introduce a unified and globally optimal solution to address two optimization problems in calibration. Our approach has been validated with extensive experiments on 19 simulated and real-world datasets and outperforms the state-of-the-art.
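
The point-to-plane constraint at the core of the refinement stage is compact enough to sketch: under extrinsics (R, t), every checkerboard point seen by the LiDAR should land on the board plane detected by the camera. The axis-angle parameterization and toy data below are a generic formulation, not the paper's full coarse-to-fine pipeline.

```python
import numpy as np

def rodrigues(r):
    """Rotation matrix from an axis-angle vector (Rodrigues' formula)."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def point_to_plane_residuals(r, t, pts_lidar, plane_n, plane_d):
    """Signed distances of LiDAR checkerboard points to the board plane in
    the camera frame: n^T (R p + t) + d, one residual per point."""
    R = rodrigues(r)
    return (pts_lidar @ R.T + t) @ plane_n + plane_d

# Toy setup: board plane z = 2 in the camera frame, known true extrinsics.
pts = np.c_[np.random.default_rng(0).uniform(-0.3, 0.3, (50, 2)), np.zeros(50)]
R_true = rodrigues(np.array([0.0, 0.1, 0.0]))
pts_lidar = (pts + [0, 0, 2.0] - [0.1, 0, 0.3]) @ R_true  # invert R p + t
res = point_to_plane_residuals(np.array([0.0, 0.1, 0.0]),
                               np.array([0.1, 0.0, 0.3]),
                               pts_lidar, np.array([0.0, 0.0, 1.0]), -2.0)
print(np.abs(res).max())  # ~0 at the true extrinsics
```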

Privacy-preserving Pedestrian Tracking using Distributed 3D LiDARs

  • Authors: Masakazu Ohno, Riki Ukyo, Tatsuya Amano, Hamada Rizk, Hirozumi Yamaguchi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2303.09915
  • Pdf link: https://arxiv.org/pdf/2303.09915
  • Abstract
    The growing demand for intelligent environments unleashes an extraordinary cycle of privacy-aware applications that make individuals' lives more comfortable and safe. Examples of these applications include pedestrian tracking systems in large areas. Despite the ubiquity of camera-based systems, they are not a preferable solution due to the risk of leaking pedestrians' privacy. In this paper, we introduce a novel privacy-preserving system for pedestrian tracking in smart environments using multiple distributed LiDARs with non-overlapping views. The system is designed to leverage LiDAR devices to track pedestrians in partially covered areas due to practical constraints, e.g., occlusion or cost. Therefore, the system uses the point clouds captured by different LiDARs to extract discriminative features that are used to train a metric learning model for pedestrian matching purposes. To boost the system's robustness, we leverage a probabilistic approach to model and adapt the dynamic mobility patterns of individuals and thus connect their sub-trajectories. We deployed the system in a large-scale testbed with 70 colorless LiDARs and conducted three different experiments. The evaluation results at the entrance hall confirm the system's ability to accurately track pedestrians with a 0.98 F-measure even with zero-covered areas. This result highlights the promise of the proposed system as the next generation of privacy-preserving tracking means in smart environments.

New submissions for Wed, 12 Apr 23

Keyword: efficient

DeepHive: A multi-agent reinforcement learning approach for automated discovery of swarm-based optimization policies

  • Authors: Eloghosa Ikponmwoba, Ope Owoyele
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.04751
  • Pdf link: https://arxiv.org/pdf/2304.04751
  • Abstract
    We present an approach for designing swarm-based optimizers for the global optimization of expensive black-box functions. In the proposed approach, the problem of finding efficient optimizers is framed as a reinforcement learning problem, where the goal is to find optimization policies that require only a few function evaluations to converge to the global optimum. The state of each agent within the swarm is defined as its current position and function value within the design space, and the agents learn to take favorable actions that maximize reward, which is based on the final value of the objective function. The proposed approach is tested on various benchmark optimization functions and compared to the performance of other global optimization strategies. Furthermore, the effect of changing the number of agents, as well as the generalization capabilities of the trained agents, are investigated. The results show superior performance compared to the other optimizers, desired scaling when the number of agents is varied, and acceptable performance even when applied to unseen functions. On a broader scale, the results show promise for the rapid development of domain-specific optimizers.

A new perspective on building efficient and expressive 3D equivariant graph neural networks

  • Authors: Weitao Du, Yuanqi Du, Limei Wang, Dieqiao Feng, Guifeng Wang, Shuiwang Ji, Carla Gomes, Zhi-Ming Ma
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.04757
  • Pdf link: https://arxiv.org/pdf/2304.04757
  • Abstract
    Geometric deep learning enables the encoding of physical symmetries in modeling 3D objects. Despite rapid progress in encoding 3D symmetries into Graph Neural Networks (GNNs), a comprehensive evaluation of the expressiveness of these networks through a local-to-global analysis is still lacking. In this paper, we propose a local hierarchy of 3D isomorphism to evaluate the expressive power of equivariant GNNs and investigate the process of representing global geometric information from local patches. Our work leads to two crucial modules for designing expressive and efficient geometric GNNs, namely local substructure encoding (LSE) and frame transition encoding (FTE). To demonstrate the applicability of our theory, we propose LEFTNet which effectively implements these modules and achieves state-of-the-art performance on both scalar-valued and vector-valued molecular property prediction tasks. We further point out the design space for future developments of equivariant graph neural networks. Our codes are available at https://github.com/yuanqidu/LeftNet.

An autoencoder compression approach for accelerating large-scale inverse problems

  • Authors: Jonathan Wittmer, Jacob Badger, Hari Sundar, Tan Bui-Thanh
  • Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.04781
  • Pdf link: https://arxiv.org/pdf/2304.04781
  • Abstract
    PDE-constrained inverse problems are some of the most challenging and computationally demanding problems in computational science today. Fine meshes that are required to accurately compute the PDE solution introduce an enormous number of parameters and require large scale computing resources such as more processors and more memory to solve such systems in a reasonable time. For inverse problems constrained by time dependent PDEs, the adjoint method that is often employed to efficiently compute gradients and higher order derivatives requires solving a time-reversed, so-called adjoint PDE that depends on the forward PDE solution at each timestep. This necessitates the storage of a high dimensional forward solution vector at every timestep. Such a procedure quickly exhausts the available memory resources. Several approaches that trade additional computation for reduced memory footprint have been proposed to mitigate the memory bottleneck, including checkpointing and compression strategies. In this work, we propose a close-to-ideal scalable compression approach using autoencoders to eliminate the need for checkpointing and substantial memory storage, thereby reducing both the time-to-solution and memory requirements. We compare our approach with checkpointing and an off-the-shelf compression approach on an earth-scale ill-posed seismic inverse problem. The results verify the expected close-to-ideal speedup for both the gradient and Hessian-vector product using the proposed autoencoder compression approach. To highlight the usefulness of the proposed approach, we combine the autoencoder compression with the data-informed active subspace (DIAS) prior to show how the DIAS method can be affordably extended to large scale problems without the need of checkpointing and large memory.

Revisiting Test Time Adaptation under Online Evaluation

  • Authors: Motasem Alfarra, Hani Itani, Alejandro Pardo, Shyma Alhuwaider, Merey Ramazanova, Juan C. Pérez, Zhipeng Cai, Matthias Müller, Bernard Ghanem
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.04795
  • Pdf link: https://arxiv.org/pdf/2304.04795
  • Abstract
    This paper proposes a novel online evaluation protocol for Test Time Adaptation (TTA) methods, which penalizes slower methods by providing them with fewer samples for adaptation. TTA methods leverage unlabeled data at test time to adapt to distribution shifts. Though many effective methods have been proposed, their impressive performance usually comes at the cost of significantly increased computation budgets. Current evaluation protocols overlook the effect of this extra computation cost, affecting their real-world applicability. To address this issue, we propose a more realistic evaluation protocol for TTA methods, where data is received in an online fashion from a constant-speed data stream, thereby accounting for the method's adaptation speed. We apply our proposed protocol to benchmark several TTA methods on multiple datasets and scenarios. Extensive experiments show that, when accounting for inference speed, simple and fast approaches can outperform more sophisticated but slower methods. For example, SHOT from 2020 outperforms the state-of-the-art method SAR from 2023 under our online setting. Our online evaluation protocol emphasizes the need for developing TTA methods that are efficient and applicable in realistic settings.
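
The protocol is easy to simulate end-to-end: while a method spends k stream ticks adapting on one sample, the next k-1 samples must be predicted without any adaptation. Everything below (the drifting one-dimensional stream and the EMA-threshold stand-in for a TTA method) is a toy construction, but it reproduces the paper's point that a slower adapter loses accuracy purely through missed adaptation steps.

```python
import random

def make_stream(n=3000, drift=0.001):
    random.seed(0)
    for i in range(n):
        u = random.random()
        # Covariates shift over time; the decision boundary in x-space
        # drifts with them (it is 0.5 + drift * i).
        yield u + drift * i, int(u > 0.5)

class ThresholdTTA:
    """Toy 'TTA method': an EMA threshold tracking the stream mean.
    `cost` models how many stream ticks one adaptation step consumes."""
    def __init__(self, cost):
        self.cost, self.thr = cost, 0.5

    def predict(self, x, adapt=True):
        if adapt:
            self.thr += 0.05 * (x - self.thr)
        return int(x > self.thr)

def online_accuracy(method):
    it = iter(make_stream())
    correct = total = 0
    for x, y in it:
        correct += method.predict(x) == y; total += 1
        for _ in range(method.cost - 1):      # samples that arrive while busy
            nxt = next(it, None)
            if nxt is None:
                break
            correct += method.predict(nxt[0], adapt=False) == nxt[1]; total += 1
    return correct / total

print(online_accuracy(ThresholdTTA(cost=1)))  # fast: adapts on every sample
print(online_accuracy(ThresholdTTA(cost=8)))  # slow: adapts on ~1/8, lags more
```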

Scallop: A Language for Neurosymbolic Programming

  • Authors: Ziyang Li, Jiani Huang, Mayur Naik
  • Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.04812
  • Pdf link: https://arxiv.org/pdf/2304.04812
  • Abstract
    We present Scallop, a language which combines the benefits of deep learning and logical reasoning. Scallop enables users to write a wide range of neurosymbolic applications and train them in a data- and compute-efficient manner. It achieves these goals through three key features: 1) a flexible symbolic representation that is based on the relational data model; 2) a declarative logic programming language that is based on Datalog and supports recursion, aggregation, and negation; and 3) a framework for automatic and efficient differentiable reasoning that is based on the theory of provenance semirings. We evaluate Scallop on a suite of eight neurosymbolic applications from the literature. Our evaluation demonstrates that Scallop is capable of expressing algorithmic reasoning in diverse and challenging AI tasks, provides a succinct interface for machine learning programmers to integrate logical domain knowledge, and yields solutions that are comparable or superior to state-of-the-art models in terms of accuracy. Furthermore, Scallop's solutions outperform these models in aspects such as runtime and data efficiency, interpretability, and generalizability.

Advances in Cybercrime Prediction: A Survey of Machine, Deep, Transfer, and Adaptive Learning Techniques

  • Authors: Lavanya Elluri, Varun Mandalapu, Piyush Vyas, Nirmalya Roy
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.04819
  • Pdf link: https://arxiv.org/pdf/2304.04819
  • Abstract
    Cybercrime is a growing threat to organizations and individuals worldwide, with criminals using increasingly sophisticated techniques to breach security systems and steal sensitive data. In recent years, machine learning, deep learning, and transfer learning techniques have emerged as promising tools for predicting cybercrime and preventing it before it occurs. This paper aims to provide a comprehensive survey of the latest advancements in cybercrime prediction using the above-mentioned techniques, highlighting the latest research related to each approach. For this purpose, we reviewed more than 150 research articles and discussed around 50 of the most recent and relevant ones. We start the review by discussing some common methods used by cybercriminals and then focus on the latest machine learning and deep learning techniques, such as recurrent and convolutional neural networks, which were effective in detecting anomalous behavior and identifying potential threats. We also discuss transfer learning, which allows models trained on one dataset to be adapted for use on another dataset, and then focus on active and reinforcement learning as part of early-stage algorithmic research in cybercrime prediction. Finally, we discuss critical innovations, research gaps, and future research opportunities in cybercrime prediction. Overall, this paper presents a holistic view of cutting-edge developments in cybercrime prediction, shedding light on the strengths and limitations of each method and equipping researchers and practitioners with essential insights, publicly available datasets, and resources necessary to develop efficient cybercrime prediction systems.

Binary Latent Diffusion

  • Authors: Ze Wang, Jiang Wang, Zicheng Liu, Qiang Qiu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.04820
  • Pdf link: https://arxiv.org/pdf/2304.04820
  • Abstract
    In this paper, we show that a binary latent space can be explored for compact yet expressive image representations. We model the bi-directional mappings between an image and the corresponding latent binary representation by training an auto-encoder with a Bernoulli encoding distribution. On the one hand, the binary latent space provides a compact discrete image representation whose distribution can be modeled more efficiently than pixels or continuous latent representations. On the other hand, we now represent each image patch as a binary vector instead of an index into a learned codebook as in discrete image representations with vector quantization. In this way, we obtain binary latent representations that allow for better image quality and high-resolution image representations without any multi-stage hierarchy in the latent space. In this binary latent space, images can now be generated effectively using a binary latent diffusion model tailored specifically for modeling the prior over the binary image representations. We present both conditional and unconditional image generation experiments with multiple datasets, and show that the proposed method performs comparably to state-of-the-art methods while dramatically improving the sampling efficiency to as few as 16 steps without using any test-time acceleration. The proposed framework can also be seamlessly scaled to $1024 \times 1024$ high-resolution image generation without resorting to latent hierarchy or multi-stage refinements.
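
Training an auto-encoder with a Bernoulli encoding distribution needs a gradient path through the binary sample; one common device for this, assumed here rather than confirmed by the abstract, is the straight-through estimator:

```python
import torch

def binary_latent(logits):
    """Bernoulli binary latent with a straight-through estimator: sample
    hard 0/1 codes in the forward pass, use the sigmoid's gradient in the
    backward pass."""
    probs = torch.sigmoid(logits)
    hard = torch.bernoulli(probs)
    return hard + probs - probs.detach()     # straight-through trick

logits = torch.randn(4, 64, requires_grad=True)
z = binary_latent(logits)
z.sum().backward()
print(z.unique(), logits.grad.shape)  # values in {0, 1}; gradients still flow
```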

Exact Set-valued Estimation using Constrained Convex Generators for uncertain Linear Systems

  • Authors: Daniel Silvestre
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.04826
  • Pdf link: https://arxiv.org/pdf/2304.04826
  • Abstract
    Set-valued state estimation in the presence of model uncertainties has been addressed in the literature essentially via three main approaches: i) interval arithmetic of the uncertain dynamics with the estimates; ii) factorizing the uncertainty into matrices with unit rank; and iii) computing the convex hull of the vertices of the uncertainty space. Approaches i) and ii) introduce considerable conservatism because both disregard the relationship between the parameters and the entries of the dynamics matrix. On the other hand, approach iii) either incurs a large growth in the number of variables required to represent the set or is approximated, losing its main advantage over i) and ii). In this paper, motivated by the application of autonomous vehicles in GPS-denied areas that resort to beacon signals for localization, we develop an exact (meaning no added conservatism) and optimal (smallest growth in the number of variables) closed-form definition for the convex hull of Constrained Convex Generators (CCGs). This results in a more efficient method to represent the minimum-volume convex set corresponding to the state estimate. Given that reduction methods for CCGs are still lacking in the literature, we employ an approximation using ray-shooting that is comparable in accuracy with methods for Constrained Zonotopes such as the ones implemented in CORA. Simulations illustrate the greater accuracy of CCGs with the proposed convex hull operation in comparison to Constrained Zonotopes.

BBChain's view of the technological context underlying the adoption of the Real Digital

  • Authors: Marcio G B de Avellar, Alexandre A S Junior, André H G Lopes, André L S Carneiro, João A Pereira, Davi C B D da Cunha
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.04833
  • Pdf link: https://arxiv.org/pdf/2304.04833
  • Abstract
    We explore confidential computing in the context of CBDCs using Microsoft's CCF framework as an example. By developing an experiment and comparing different approaches and performance and security metrics, we seek to evaluate the effectiveness of confidential computing to improve the privacy, security, and performance of CBDCs. Preliminary results suggest that confidential computing could be a promising solution to the technological challenges faced by CBDCs. Furthermore, by implementing confidential computing in DLTs such as Hyperledger Besu and utilizing frameworks such as CCF, we increase transaction confidentiality and privacy while maintaining the scalability and interoperability required for a global digital financial system. In conclusion, confidential computing can significantly bolster CBDC development, fostering a secure, private, and efficient financial future.

Human Motion Detection Based on Dual-Graph and Weighted Nuclear Norm Regularizations

  • Authors: Jing Qin, Biyun Xie
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.04879
  • Pdf link: https://arxiv.org/pdf/2304.04879
  • Abstract
    Motion detection has been widely used in many applications, such as surveillance and robotics. Due to the presence of the static background, a motion video can be decomposed into a low-rank background and a sparse foreground. Many regularization techniques that preserve the low-rankness of matrices can therefore be imposed on the background. Meanwhile, geometry-based regularizations, such as graph regularizations, can be imposed on the foreground. Recently, weighted regularization techniques, including weighted nuclear norm regularization, have been proposed in the image processing community to promote adaptive sparsity while achieving efficient performance. In this paper, we propose a robust dual-graph regularized moving object detection model based on a novel weighted nuclear norm regularization and spatiotemporal graph Laplacians. Numerical experiments on realistic human motion data sets have demonstrated the effectiveness and robustness of this approach in separating moving objects from the background, and its enormous potential in robotic applications.
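
The weighted nuclear norm enters through its proximal operator, which shrinks each singular value by its own weight; assigning larger weights to smaller singular values removes noisy components while barely touching the dominant background. The weight schedule below is an illustrative assumption.

```python
import numpy as np

def weighted_svt(M, weights):
    """Proximal operator of the weighted nuclear norm: shrink each singular
    value by its own weight, so weakly expressed (noisy) components vanish
    while strong background components survive almost untouched."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - weights, 0.0)) @ Vt

rng = np.random.default_rng(0)
L = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 60))  # rank-3 "background"
M = L + 0.1 * rng.normal(size=(40, 60))                  # observed frames
s = np.linalg.svd(M, compute_uv=False)
w = 3.0 / (1.0 + s / s.max())            # adaptive: small components cut harder
X = weighted_svt(M, w)
print(np.linalg.matrix_rank(M), np.linalg.matrix_rank(X))  # e.g. 40 -> ~3
```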

DISTO: Evaluating Textual Distractors for Multi-Choice Questions using Negative Sampling based Approach

  • Authors: Bilal Ghanem, Alona Fyshe
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.04881
  • Pdf link: https://arxiv.org/pdf/2304.04881
  • Abstract
    Multiple choice questions (MCQs) are an efficient and common way to assess reading comprehension (RC). Every MCQ needs a set of distractor answers that are incorrect, but plausible enough to test student knowledge. Distractor generation (DG) models have been proposed, and their performance is typically evaluated using machine translation (MT) metrics. However, MT metrics often misjudge the suitability of generated distractors. We propose DISTO: the first learned evaluation metric for generated distractors. We validate DISTO by showing its scores correlate highly with human ratings of distractor quality. At the same time, DISTO ranks the performance of state-of-the-art DG models very differently from MT-based metrics, showing that MT metrics should not be used for distractor evaluation.

EVKG: An Interlinked and Interoperable Electric Vehicle Knowledge Graph for Smart Transportation System

  • Authors: Yanlin Qi, Gengchen Mai, Rui Zhu, Michael Zhang
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.04893
  • Pdf link: https://arxiv.org/pdf/2304.04893
  • Abstract
    Over the past decade, the electric vehicle industry has experienced unprecedented growth and diversification, resulting in a complex ecosystem. To effectively manage this multifaceted field, we present an EV-centric knowledge graph (EVKG) as a comprehensive, cross-domain, extensible, and open geospatial knowledge management system. The EVKG encapsulates essential EV-related knowledge, including EV adoption, electric vehicle supply equipment, and electricity transmission network, to support decision-making related to EV technology development, infrastructure planning, and policy-making by providing timely and accurate information and analysis. To enrich and contextualize the EVKG, we integrate the developed EV-relevant ontology modules from existing well-known knowledge graphs and ontologies. This integration enables interoperability with other knowledge graphs in the Linked Data Open Cloud, enhancing the EVKG's value as a knowledge hub for EV decision-making. Using six competency questions, we demonstrate how the EVKG can be used to answer various types of EV-related questions, providing critical insights into the EV ecosystem. Our EVKG provides an efficient and effective approach for managing the complex and diverse EV industry. By consolidating critical EV-related knowledge into a single, easily accessible resource, the EVKG supports decision-makers in making informed choices about EV technology development, infrastructure planning, and policy-making. As a flexible and extensible platform, the EVKG is capable of accommodating a wide range of data sources, enabling it to evolve alongside the rapidly changing EV landscape.

Advancing Medical Imaging with Language Models: A Journey from N-grams to ChatGPT

  • Authors: Mingzhe Hu, Shaoyan Pan, Yuheng Li, Xiaofeng Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.04920
  • Pdf link: https://arxiv.org/pdf/2304.04920
  • Abstract
    In this paper, we aimed to provide a review and tutorial for researchers in the field of medical imaging using language models to improve their tasks at hand. We began by providing an overview of the history and concepts of language models, with a special focus on large language models. We then reviewed the current literature on how language models are being used to improve medical imaging, emphasizing different applications such as image captioning, report generation, report classification, finding extraction, visual question answering, interpretable diagnosis, and more, for various modalities and organs. ChatGPT was specifically highlighted for researchers to explore further potential applications. We covered the potential benefits of accurate and efficient language models for medical imaging analysis, including improving clinical workflow efficiency, reducing diagnostic errors, and assisting healthcare professionals in providing timely and accurate diagnoses. Overall, our goal was to bridge the gap between language models and medical imaging and inspire new ideas and innovations in this exciting area of research. We hope that this review paper will serve as a useful resource for researchers in this field and encourage further exploration of the possibilities of language models in medical imaging.

Model sparsification can simplify machine unlearning

  • Authors: Jinghan Jia, Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, Pranay Sharma, Sijia Liu
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.04934
  • Pdf link: https://arxiv.org/pdf/2304.04934
  • Abstract
    Recent data regulations necessitate machine unlearning (MU): the removal of the effect of specific examples from the model. While exact unlearning is possible by conducting a model retraining with the remaining data from scratch, its computational cost has led to the development of approximate but efficient unlearning schemes. Beyond data-centric MU solutions, we advance MU through a novel model-based viewpoint: sparsification via weight pruning. Our results in both theory and practice indicate that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner, closing the approximation gap, while continuing to be efficient. With this insight, we develop two new sparsity-aware unlearning meta-schemes, termed 'prune first, then unlearn' and 'sparsity-aware unlearning'. Extensive experiments show that our findings and proposals consistently benefit MU in various scenarios, including class-wise data scrubbing, random data scrubbing, and backdoor data forgetting. One highlight is the 77% unlearning efficacy gain of fine-tuning (one of the simplest approximate unlearning methods) in the proposed sparsity-aware unlearning paradigm. Codes are available at https://github.com/OPTML-Group/Unlearn-Sparse.
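
A minimal torch sketch of what the name 'prune first, then unlearn' suggests: one-shot global magnitude pruning followed by fine-tuning on the retain set only, with the masks re-applied after every step. The sparsity level, optimizer, and toy data are assumptions; the paper's exact recipe may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def magnitude_prune_(model, sparsity=0.8):
    """Global one-shot magnitude pruning; returns (parameter, mask) pairs."""
    scores = torch.cat([p.detach().abs().flatten()
                        for p in model.parameters() if p.dim() > 1])
    thresh = scores.kthvalue(int(sparsity * scores.numel())).values
    masks = []
    for p in model.parameters():
        if p.dim() > 1:
            m = (p.detach().abs() > thresh).float()
            p.data *= m
            masks.append((p, m))
    return masks

def unlearn(model, masks, retain_loader, epochs=2, lr=1e-2):
    """Fine-tune the sparse model on the retain set only; the forget set is
    simply never seen again, and pruned coordinates stay zero."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in retain_loader:
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad(); loss.backward(); opt.step()
            for p, m in masks:            # re-apply the sparsity pattern
                p.data *= m

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
x, y = torch.randn(256, 20), torch.randint(0, 2, (256,))
retain = DataLoader(TensorDataset(x, y), batch_size=32)
unlearn(model, magnitude_prune_(model), retain)
```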

Stress-hybrid virtual element method on quadrilateral meshes for compressible and nearly-incompressible linear elasticity

  • Authors: Alvin Chen, N. Sukumar
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.04941
  • Pdf link: https://arxiv.org/pdf/2304.04941
  • Abstract
    In this paper, we propose a robust low-order stabilization-free virtual element method on quadrilateral meshes for linear elasticity that is based on the stress-hybrid principle. We refer to this approach as the Stress-Hybrid Virtual Element Method (SH-VEM). In this method, the Hellinger-Reissner variational principle is adopted, wherein both the equilibrium equations and the strain-displacement relations are variationally enforced. We consider small-strain deformations of linear elastic solids in the compressible and near-incompressible regimes over quadrilateral (convex and nonconvex) meshes. Within an element, the displacement field is approximated as a linear combination of canonical shape functions that are virtual. The stress field, similar to the stress-hybrid finite element method of Pian and Sumihara, is represented using a linear combination of symmetric tensor polynomials. A 5-parameter expansion of the stress field is used in each element, with stress transformation equations applied on distorted quadrilaterals. In the variational statement of the strain-displacement relations, the divergence theorem is invoked to express the stress coefficients in terms of the nodal displacements. This results in a formulation with solely the nodal displacements as unknowns. Numerical results are presented for several benchmark problems from linear elasticity. We show that SH-VEM is free of volumetric and shear locking, and it converges optimally in the $L^2$ norm and energy seminorm of the displacement field, and in the $L^2$ norm of the hydrostatic stress.

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference

  • Authors: Tao Lei, Junwen Bai, Siddhartha Brahma, Joshua Ainslie, Kenton Lee, Yanqi Zhou, Nan Du, Vincent Y. Zhao, Yuexin Wu, Bo Li, Yu Zhang, Ming-Wei Chang
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.04947
  • Pdf link: https://arxiv.org/pdf/2304.04947
  • Abstract
    We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning method that also improves inference efficiency. CoDA generalizes beyond standard adapter approaches to enable a new way of balancing speed and accuracy using conditional computation. Starting with an existing dense pretrained model, CoDA adds sparse activation together with a small number of new parameters and a light-weight training phase. Our experiments demonstrate that the CoDA approach provides an unexpectedly efficient way to transfer knowledge. Across a variety of language, vision, and speech tasks, CoDA achieves a 2x to 8x inference speed-up compared to the state-of-the-art Adapter approach with moderate to no accuracy loss and the same parameter efficiency.
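
The conditional-computation idea can be sketched as token-level routing: a learned router scores tokens, only the top-k per example pass through the small adapter, and the rest keep the frozen backbone's activations. The dimensions, k, and the hard top-k rule are illustrative assumptions, since the abstract does not specify CoDA's routing mechanism.

```python
import torch
import torch.nn as nn

class ConditionalAdapter(nn.Module):
    """Sketch of conditional computation in the spirit of CoDA: a router
    scores tokens and only the top-k pass through the (small) adapter; the
    remaining tokens flow through the frozen backbone output unchanged."""
    def __init__(self, d=256, r=32, k=16):
        super().__init__()
        self.router = nn.Linear(d, 1)
        self.down, self.up, self.k = nn.Linear(d, r), nn.Linear(r, d), k

    def forward(self, h):                       # h: (batch, tokens, d)
        scores = self.router(h).squeeze(-1)     # (batch, tokens)
        idx = scores.topk(self.k, dim=1).indices
        gather_idx = idx.unsqueeze(-1).expand(-1, -1, h.size(-1))
        sel = torch.gather(h, 1, gather_idx)    # the k routed tokens
        delta = self.up(torch.relu(self.down(sel)))  # adapter on k tokens only
        out = h.clone()
        out.scatter_add_(1, gather_idx, delta)  # add corrections in place
        return out

h = torch.randn(4, 128, 256)                    # frozen-backbone activations
print(ConditionalAdapter()(h).shape)            # torch.Size([4, 128, 256])
```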

Data-Efficient Image Quality Assessment with Attention-Panel Decoder

  • Authors: Guanyi Qin, Runze Hu, Yutao Liu, Xiawu Zheng, Haotian Liu, Xiu Li, Yan Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.04952
  • Pdf link: https://arxiv.org/pdf/2304.04952
  • Abstract
    Blind Image Quality Assessment (BIQA) is a fundamental task in computer vision which, however, remains unresolved due to complex distortion conditions and diversified image contents. To confront this challenge, in this paper we propose a novel BIQA pipeline based on the Transformer architecture, which achieves an efficient quality-aware feature representation with much less data. More specifically, we consider the traditional fine-tuning in BIQA as an interpretation of the pre-trained model. In this way, we further introduce a Transformer decoder to refine the perceptual information of the CLS token from different perspectives. This enables our model to establish the quality-aware feature manifold efficiently while attaining a strong generalization capability. Meanwhile, inspired by the subjective evaluation behaviors of humans, we introduce a novel attention panel mechanism, which improves the model performance and reduces the prediction uncertainty simultaneously. The proposed BIQA method maintains a lightweight design with only one layer of the decoder, yet extensive experiments on eight standard BIQA datasets (both synthetic and authentic) demonstrate its superior performance to the state-of-the-art BIQA methods, i.e., achieving SRCC values of 0.875 (vs. 0.859 in LIVEC) and 0.980 (vs. 0.969 in LIVE).

AROW: A V2X-based Automated Right-of-Way Algorithm for Distributed Cooperative Intersection Management

  • Authors: Ghayoor Shah, Yaser P. Fallah, Danyang Tian, Ehsan Moradi-Pari
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.04958
  • Pdf link: https://arxiv.org/pdf/2304.04958
  • Abstract
    Safe and efficient intersection management is critical for an improved driving experience. As per several studies, an increasing number of crashes and fatalities occur every year at intersections. Most crashes are a consequence of a lack of situational awareness and ambiguity over intersection crossing priority. In this regard, research in Cooperative Intersection Management (CIM) is considered highly significant since it can utilize Vehicle-to-Everything (V2X) communication among Connected and Autonomous Vehicles (CAVs). CAVs can transceive basic and/or advanced safety information, thereby improving situational awareness at intersections. Although numerous studies have been performed on CIM, most of them are reliant on the presence of a Road-Side Unit (RSU) that can act as a centralized intersection manager and assign intersection crossing priorities. In the absence of RSU, there are some distributed CIM methods that only rely on communication among CAVs for situational awareness, however, none of them are specifically focused towards Stop Controlled-Intersection (SCI) with the aim of mitigating ambiguity among CAVs. Thus, we propose an Automated Right-of-Way (AROW) algorithm based on distributed CIM that is capable of reducing ambiguity and handling any level of noncompliance by CAVs. The algorithm is validated with extensive experiments for its functionality and robustness, and it outperforms the current solutions.

PlantDet: A benchmark for Plant Detection in the Three-Rivers-Source Region

  • Authors: Huanhuan Li, Xuechao Zou, Yu-an Zhang, Jiangcai Zhaba, Guomei Li, Lamao Yongga
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.04963
  • Pdf link: https://arxiv.org/pdf/2304.04963
  • Abstract
    The Three-River-Source region is a highly significant natural reserve in China that harbors a plethora of untamed botanical resources. To meet the practical requirements of botanical research and intelligent plant management, we construct a large-scale dataset for Plant detection in the Three-River-Source region (PTRS). This dataset comprises 6965 high-resolution images of 2160×3840 pixels, captured by diverse sensors and platforms, and featuring objects of varying shapes and sizes. Subsequently, a team of botanical image interpretation experts annotated these images with 21 commonly occurring object categories. The fully annotated PTRS images contain 122,300 instances of plant leaves, each labeled with a horizontal rectangle. The PTRS presents us with challenges such as dense occlusion, varying leaf resolutions, and high feature similarity among plants, prompting us to develop a novel object detection network named PlantDet. This network employs a window-based efficient self-attention module (ST block) to generate robust feature representations at multiple scales, improving the detection efficiency for small and densely occluded objects. Our experimental results validate the efficacy of our proposed plant detection benchmark, with a precision of 88.1%, a mean average precision (mAP) of 77.6%, and a higher recall compared to the baseline. Additionally, our method effectively overcomes the issue of missing small objects. We intend to share our data and code with interested parties to advance further research in this field.

Computer Vision-Aided Intelligent Monitoring of Coffee: Towards Sustainable Coffee Production

  • Authors: Francisco Eron, Muhammad Noman, Raphael Ricon de Oliveira, Deigo de Souza Marques, Rafael Serapilha Durelli, Andre Pimenta Freire, Antonio Chalfun Junior
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.04966
  • Pdf link: https://arxiv.org/pdf/2304.04966
  • Abstract
    Coffee, which is prepared from the ground, roasted seeds of harvested coffee cherries, is one of the most consumed beverages and most traded commodities globally. Manually monitoring a coffee field regularly to report on plant and soil health and to estimate yield and harvesting time is labor-intensive, time-consuming and error-prone. Some recent studies have developed sensors for estimating coffee yield at the time of harvest; however, a more inclusive and applicable technology to remotely monitor multiple parameters of the field and estimate coffee yield and quality even at the pre-harvest stage was missing. Following a precision agriculture approach, we employed the machine learning algorithm YOLO for image processing of coffee plants. In this study, the latest version of the state-of-the-art algorithm, YOLOv7, was trained with 324 annotated images and then evaluated on 82 unannotated images as test data. Next, as an innovative approach for annotating the training data, we trained K-means models which led to machine-generated color classes of coffee fruit and could thus characterize the objects of interest in the image. Finally, we attempted to develop an AI-based handy mobile application which would not only efficiently predict harvest time and estimate coffee yield and quality, but also report on plant health. As a result, the developed model efficiently analyzed the test data with a mean average precision of 0.89. Strikingly, our innovative semi-supervised method, with a mean average precision of 0.77 in multi-class mode, surpassed the supervised method, with a mean average precision of only 0.60, leading to faster and more accurate annotation. The mobile application we designed based on the developed code, named CoffeApp, possesses multiple features for analyzing fruit from images taken by a phone camera in the field and can thus track fruit ripening in real time.
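
A minimal sketch of the machine-generated color-class idea described above: cluster fruit pixel colors with K-means so that the cluster labels can pre-annotate training images. The cluster count, RGB features, and function names are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch: K-means over pixel colors to produce machine-generated color classes.
import numpy as np
from sklearn.cluster import KMeans

def color_classes(image_rgb, n_classes=4, seed=0):
    pixels = image_rgb.reshape(-1, 3).astype(np.float32)
    km = KMeans(n_clusters=n_classes, n_init=10, random_state=seed).fit(pixels)
    # Each pixel gets a machine-generated color class; a per-fruit label could
    # then be taken as the majority class inside a detected bounding box.
    return km.labels_.reshape(image_rgb.shape[:2]), km.cluster_centers_

image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in image
labels, centers = color_classes(image)
```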

GRIL: A $2$-parameter Persistence Based Vectorization for Machine Learning

  • Authors: Cheng Xin, Soham Mukherjee, Shreyas N. Samaga, Tamal K. Dey
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Geometry (cs.CG); Algebraic Topology (math.AT)
  • Arxiv link: https://arxiv.org/abs/2304.04970
  • Pdf link: https://arxiv.org/pdf/2304.04970
  • Abstract
    $1$-parameter persistent homology, a cornerstone in Topological Data Analysis (TDA), studies the evolution of topological features such as connected components and cycles hidden in data. It has been applied to enhance the representation power of deep learning models, such as Graph Neural Networks (GNNs). To enrich the representations of topological features, here we propose to study $2$-parameter persistence modules induced by bi-filtration functions. In order to incorporate these representations into machine learning models, we introduce a novel vector representation called Generalized Rank Invariant Landscape (GRIL) for $2$-parameter persistence modules. We show that this vector representation is $1$-Lipschitz stable and differentiable with respect to underlying filtration functions and can be easily integrated into machine learning models to augment the encoding of topological features. We present an algorithm to compute the vector representation efficiently. We also test our methods on synthetic and benchmark graph datasets, and compare the results with previous vector representations of $1$-parameter and $2$-parameter persistence modules.

StageInteractor: Query-based Object Detector with Cross-stage Interaction

  • Authors: Yao Teng, Haisong Liu, Sheng Guo, Limin Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.04978
  • Pdf link: https://arxiv.org/pdf/2304.04978
  • Abstract
    Previous object detectors make predictions based on dense grid points or numerous preset anchors. Most of these detectors are trained with one-to-many label assignment strategies. In contrast, recent query-based object detectors depend on a sparse set of learnable queries and a series of decoder layers. The one-to-one label assignment is independently applied on each layer for deep supervision during training. Despite the great success of query-based object detection, this one-to-one label assignment strategy demands that the detectors have strong fine-grained discrimination and modeling capacity. To solve the above problems, in this paper, we propose a new query-based object detector with cross-stage interaction, coined StageInteractor. During the forward propagation, we devise an efficient way to improve this modeling ability by reusing dynamic operators with lightweight adapters. As for the label assignment, a cross-stage label assigner is applied subsequent to the one-to-one label assignment. With this assigner, the training target class labels are gathered across stages and then reallocated to proper predictions at each decoder layer. On the MS COCO benchmark, our model improves the baseline by 2.2 AP, and achieves 44.8 AP with ResNet-50 as backbone, 100 queries and 12 training epochs. With longer training time and 300 queries, StageInteractor achieves 51.1 AP and 52.2 AP with ResNeXt-101-DCN and Swin-S, respectively.

Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition

  • Authors: Guangyong Wei, Zhikui Duan, Shiren Li, Guangguang Yang, Xinmei Yu, Junhua Li
  • Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2304.04991
  • Pdf link: https://arxiv.org/pdf/2304.04991
  • Abstract
    In recent years, a great deal of attention has been paid to the Transformer network for speech recognition tasks due to its excellent model performance. However, the Transformer network always involves heavy computation and a large number of parameters, causing serious deployment problems on devices with limited computational resources or storage memory. In this paper, a new lightweight model called Sim-T is proposed to expand the generality of the Transformer model. With the help of the newly developed multiplexing technique, Sim-T can efficiently compress the model with a negligible sacrifice in performance. To be more precise, the proposed technique includes two parts, namely module weight multiplexing and attention score multiplexing. Moreover, a novel decoder structure is proposed to facilitate the attention score multiplexing. Extensive experiments have been conducted to validate the effectiveness of Sim-T. On the Aishell-1 dataset, when the proposed Sim-T has 48% fewer parameters than the baseline Transformer, a 0.4% CER improvement is obtained. Alternatively, a 69% parameter reduction can be achieved if Sim-T matches the performance of the baseline Transformer. With regard to the HKUST and WSJ eval92 datasets, CER and WER are improved by 0.3% and 0.2%, respectively, when Sim-T has 40% fewer parameters than the baseline Transformer.
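
As a rough illustration of module weight multiplexing, the sketch below reuses a small pool of physical encoder layers across a deeper logical stack; all dimensions and layer counts are assumptions, not Sim-T's actual configuration.

```python
# Sketch: several logical encoder layers share one small pool of physical weights.
import torch
import torch.nn as nn

class MultiplexedEncoder(nn.Module):
    def __init__(self, d_model=256, nhead=4, logical_layers=12, physical_layers=3):
        super().__init__()
        # Only `physical_layers` distinct weight sets are stored...
        self.shared = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(physical_layers)
        )
        self.logical_layers = logical_layers

    def forward(self, x):
        # ...but the forward pass applies `logical_layers` layers,
        # cycling through the shared weights (the multiplexing step).
        for i in range(self.logical_layers):
            x = self.shared[i % len(self.shared)](x)
        return x

x = torch.randn(2, 100, 256)   # (batch, frames, features)
y = MultiplexedEncoder()(x)
```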

Custom Memory Design for Logic-in-Memory: Drawbacks and Improvements over Conventional Memories

  • Authors: Fabrizio Ottati, Giovanna Turvani, Marco Vacca, Guido Masera
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.04995
  • Pdf link: https://arxiv.org/pdf/2304.04995
  • Abstract
    The speed of modern digital systems is severely limited by memory latency (the "Memory Wall" problem). Data exchange between logic and memory is also responsible for a large part of the system energy consumption. Logic-in-Memory (LiM) represents an attractive solution to this problem. By performing part of the computations directly inside the memory, the system speed can be improved while reducing its energy consumption. The LiM solutions that offer the largest boost in performance are based on modifications of the memory cell. However, what is the cost of such modifications, and how do they impact the memory array performance? In this work, this question is addressed by analysing a LiM memory array implementing an algorithm for maximum/minimum value computation. The memory array is designed at the physical level using the FreePDK 45 nm CMOS process, with three memory cell variants, and its performance is compared to SRAM and CAM memories. The results highlight that read and write performance is worsened, but in-memory operations prove to be very efficient: a 55.26% reduction in the energy-delay product is measured for the AND operation relative to an SRAM read; therefore, the LiM approach represents a very promising solution for low-density, high-performance memories.

Bayes correlated equilibria and no-regret dynamics

  • Authors: Kaito Fujii
  • Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.05005
  • Pdf link: https://arxiv.org/pdf/2304.05005
  • Abstract
    This paper explores equilibrium concepts for Bayesian games, which are fundamental models of games with incomplete information. We aim at three desirable properties of equilibria. First, equilibria can be naturally realized by introducing a mediator into games. Second, an equilibrium can be computed efficiently in a distributed fashion. Third, any equilibrium in that class approximately maximizes social welfare, as measured by the price of anarchy, for a broad class of games. These three properties allow players to compute an equilibrium and realize it via a mediator, thereby settling into a stable state with approximately optimal social welfare. Our main result is the existence of an equilibrium concept that satisfies these three properties. Toward this goal, we characterize various (non-equivalent) extensions of correlated equilibria, collectively known as Bayes correlated equilibria. In particular, we focus on communication equilibria (also known as coordination mechanisms), which can be realized by a mediator who gathers each player's private information and then sends correlated recommendations to the players. We show that if each player minimizes a variant of regret called untruthful swap regret in repeated play of Bayesian games, the empirical distribution of these dynamics converges to a communication equilibrium. We present an efficient algorithm for minimizing untruthful swap regret with a sublinear upper bound, which we prove to be tight up to a multiplicative constant. As a result, by simulating the dynamics with our algorithm, we can efficiently compute an approximate communication equilibrium. Furthermore, we extend existing lower bounds on the price of anarchy based on the smoothness arguments from Bayes Nash equilibria to equilibria obtained by the proposed dynamics.

Privacy Amplification via Shuffling: Unified, Simplified, and Tightened

  • Authors: Shaowei Wang
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.05007
  • Pdf link: https://arxiv.org/pdf/2304.05007
  • Abstract
    In decentralized settings, the shuffle model of differential privacy has emerged as a promising alternative to the classical local model. Analyzing privacy amplification via shuffling is a critical component in both single-message and multi-message shuffle protocols. However, current methods used in these two areas are distinct and specific, making them less convenient for protocol designers and practitioners. In this work, we introduce variation-ratio reduction as a unified framework for privacy amplification analyses in the shuffle model. This framework utilizes total variation bounds of local messages and probability ratio bounds of other users' blanket messages, converting them to indistinguishable levels. Our results indicate that the framework yields tighter bounds for both single-message and multi-message encoders (e.g., with local DP, local metric DP, or general multi-message randomizers). Specifically, for a broad range of local randomizers having extremal probability design, our amplification bounds are precisely tight. We also demonstrate that variation-ratio reduction is well-suited for parallel composition in the shuffle model and results in stricter privacy accounting for common sampling-based local randomizers. Our experimental findings show that, compared to existing amplification bounds, our numerical amplification bounds can save up to 30% of the budget for single-message protocols, 75% of the budget for multi-message protocols, and 75%-95% of the budget for parallel composition. Additionally, our implementation for numerical amplification bounds has only $\tilde{O}(n)$ complexity and is highly efficient in practice, taking just $2$ minutes for $n=10^8$ users. The code for our implementation can be found at \url{https://github.com/wangsw/PrivacyAmplification}.

Habits and goals in synergy: a variational Bayesian framework for behavior

  • Authors: Dongqi Han, Kenji Doya, Dongsheng Li, Jun Tani
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05008
  • Pdf link: https://arxiv.org/pdf/2304.05008
  • Abstract
    How to behave efficiently and flexibly is a central problem for understanding biological agents and creating intelligent embodied AI. It is well known that behavior can be classified into two types: reward-maximizing habitual behavior, which is fast but inflexible; and goal-directed behavior, which is flexible but slow. Conventionally, habitual and goal-directed behaviors are considered to be handled by two distinct systems in the brain. Here, we propose to bridge the gap between the two behaviors, drawing on the principles of variational Bayesian theory. We incorporate both behaviors in one framework by introducing a Bayesian latent variable called "intention". The habitual behavior is generated from the prior distribution of intention, which is goal-less, and the goal-directed behavior is generated from the posterior distribution of intention, which is conditioned on the goal. Building on this idea, we present a novel Bayesian framework for modeling behaviors. Our proposed framework enables skill sharing between the two kinds of behaviors, and by leveraging the idea of predictive coding, it enables an agent to seamlessly generalize from habitual to goal-directed behavior without requiring additional training. The proposed framework suggests a fresh perspective for cognitive science and embodied AI, highlighting the potential for greater integration between habitual and goal-directed behaviors.
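
A toy sketch of the central idea, under an assumed Gaussian form: one latent intention variable is sampled from its goal-less prior for habitual behavior, or from a goal-conditioned posterior for goal-directed behavior. All distributions and values are illustrative, not the paper's model.

```python
# Sketch: one latent "intention" z; habitual = sample the prior,
# goal-directed = sample the posterior conditioned on a goal.
import numpy as np

rng = np.random.default_rng(0)
mu_prior, sigma_prior = 0.0, 1.0          # assumed learned prior over intention

def habitual_intention():
    # Fast, goal-less: just sample the prior.
    return rng.normal(mu_prior, sigma_prior)

def goal_directed_intention(goal, likelihood_sigma=0.5):
    # Slow, flexible: Gaussian posterior p(z | goal) via a conjugate update.
    var = 1.0 / (1.0 / sigma_prior**2 + 1.0 / likelihood_sigma**2)
    mu = var * (mu_prior / sigma_prior**2 + goal / likelihood_sigma**2)
    return rng.normal(mu, np.sqrt(var))

print(habitual_intention(), goal_directed_intention(goal=2.0))
```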

Towards an Understanding and Explanation for Mixed-Initiative Artificial Scientific Text Detection

  • Authors: Luoxuan Weng, Minfeng Zhu, Kam Kwai Wong, Shi Liu, Jiashun Sun, Hang Zhu, Dongming Han, Wei Chen
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.05011
  • Pdf link: https://arxiv.org/pdf/2304.05011
  • Abstract
    Large language models (LLMs) have gained popularity in various fields for their exceptional capability of generating human-like text. Their potential misuse has raised social concerns about plagiarism in academic contexts. However, effective artificial scientific text detection is a non-trivial task due to several challenges, including 1) the lack of a clear understanding of the differences between machine-generated and human-written scientific text, 2) the poor generalization performance of existing methods caused by out-of-distribution issues, and 3) the limited support for human-machine collaboration with sufficient interpretability during the detection process. In this paper, we first identify the critical distinctions between machine-generated and human-written scientific text through a quantitative experiment. Then, we propose a mixed-initiative workflow that combines human experts' prior knowledge with machine intelligence, along with a visual analytics prototype to facilitate efficient and trustworthy scientific text detection. Finally, we demonstrate the effectiveness of our approach through two case studies and a controlled user study with proficient researchers. We also provide design implications for interactive artificial text detection tools in high-stakes decision-making scenarios.

Human-machine cooperation for semantic feature listing

  • Authors: Kushin Mukherjee, Siddharth Suresh, Timothy T. Rogers
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05012
  • Pdf link: https://arxiv.org/pdf/2304.05012
  • Abstract
    Semantic feature norms, lists of features that concepts do and do not possess, have played a central role in characterizing human conceptual knowledge, but require extensive human labor. Large language models (LLMs) offer a novel avenue for the automatic generation of such feature lists, but are prone to significant error. Here, we present a new method for combining a learned model of human lexical-semantics from limited data with LLM-generated data to efficiently generate high-quality feature norms.

Scalable Real-Time Vehicle Deformation for Interactive Environments

  • Authors: Ben Kenwright
  • Subjects: Robotics (cs.RO); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.05045
  • Pdf link: https://arxiv.org/pdf/2304.05045
  • Abstract
    This paper proposes a real-time physically-based method for simulating vehicle deformation. Our system synthesizes vehicle deformation characteristics by considering a low-dimensional coupled vehicle body technique. We simulate the motion and crumpling behavior of vehicles smashing into rigid objects. We present a reduced-complexity non-linear finite element system that is scalable and computationally efficient. We use an explicit position-based integration scheme to improve simulation speed while remaining stable and preserving modeling accuracy. We demonstrate our approach on a variety of vehicle deformation test cases simulated in real time.
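
A minimal sketch of an explicit position-based integration step of the kind the abstract describes, with a single distance constraint standing in for the reduced finite element model; everything here is illustrative, not the paper's implementation.

```python
# Sketch: Verlet-style prediction plus position-based constraint projection (PBD).
import numpy as np

def pbd_step(x, x_prev, rest_len, dt=1.0 / 60.0,
             gravity=np.array([0.0, -9.81, 0.0]), iters=4):
    # 1) Explicit (Verlet) prediction from current and previous positions.
    x_pred = x + (x - x_prev) + gravity * dt * dt
    # 2) Iteratively project the distance constraint directly on positions,
    #    which keeps the scheme stable at large time steps.
    for _ in range(iters):
        d = x_pred[1] - x_pred[0]
        dist = np.linalg.norm(d)
        corr = 0.5 * (dist - rest_len) * d / max(dist, 1e-9)
        x_pred[0] += corr
        x_pred[1] -= corr
    return x_pred, x  # new (current, previous) positions

x = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])
x, x_prev = pbd_step(x, x.copy(), rest_len=1.0)
```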

Pointless Global Bundle Adjustment With Relative Motions Hessians

  • Authors: Ewelina Rupnik, Marc Pierrot-Deseilligny
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05118
  • Pdf link: https://arxiv.org/pdf/2304.05118
  • Abstract
    Bundle adjustment (BA) is the standard way to optimise camera poses and to produce sparse representations of a scene. However, as the number of camera poses and features grows, refinement through bundle adjustment becomes inefficient. Inspired by global motion averaging methods, we propose a new bundle adjustment objective which does not rely on image features' reprojection errors yet maintains precision on par with classical BA. Our method averages over relative motions while implicitly incorporating the contribution of the structure in the adjustment. To that end, we weight the objective function by local Hessian matrices, a by-product of local bundle adjustments performed on relative motions (e.g., pairs or triplets) during the pose initialisation step. Such Hessians are extremely rich, as they encapsulate both the features' random errors and the geometric configuration between the cameras. These pieces of information, propagated to the global frame, help to guide the final optimisation in a more rigorous way. We argue that this approach is an upgraded version of the motion averaging approach and demonstrate its effectiveness on both photogrammetric datasets and computer vision benchmarks.

Accelerating Globally Optimal Consensus Maximization in Geometric Vision

  • Authors: Xinyue Zhang, Liangzu Peng, Wanting Xu, Laurent Kneip
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05156
  • Pdf link: https://arxiv.org/pdf/2304.05156
  • Abstract
    Branch-and-bound-based consensus maximization stands out due to its important ability of retrieving the globally optimal solution to outlier-affected geometric problems. However, while the discovery of such solutions carries high scientific value, its application in practical scenarios is often prohibited by its computational complexity growing exponentially as a function of the dimensionality of the problem at hand. In this work, we convey a novel, general technique that allows us to branch over an $(n-1)$-dimensional space for an $n$-dimensional problem. The remaining degree of freedom can be solved globally optimally within each bound calculation by applying the efficient interval stabbing technique. While each individual bound derivation is harder to compute owing to the additional need to solve a sorting problem, the reduced number of intervals and tighter bounds in practice lead to a significant reduction in the overall number of required iterations. Besides an abstract introduction of the approach, we present applications to three fundamental geometric computer vision problems: camera resectioning, relative camera pose estimation, and point set registration. Through our exhaustive tests, we demonstrate significant speed-up factors at times exceeding two orders of magnitude, thereby increasing the viability of globally optimal consensus maximizers in online application scenarios.
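
For reference, the interval stabbing subroutine the method relies on can be implemented by endpoint sorting in O(n log n); the sketch below shows only this 1-D subproblem, not the surrounding branch-and-bound machinery.

```python
# Sketch: find the point contained in the largest number of 1-D intervals.
def interval_stabbing(intervals):
    events = []
    for lo, hi in intervals:
        events.append((lo, +1))   # interval opens
        events.append((hi, -1))   # interval closes
    # Sort by coordinate; open events before close events at ties so that
    # touching endpoints count as overlapping.
    events.sort(key=lambda e: (e[0], -e[1]))
    best, count, best_x = 0, 0, None
    for x, delta in events:
        count += delta
        if count > best:
            best, best_x = count, x
    return best_x, best

point, support = interval_stabbing([(0, 2), (1, 3), (2.5, 4), (1.5, 2.2)])
print(point, support)  # a point stabbing the maximum number of intervals
```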

From research activities to institutional piloting: the challenges of modernizing interfaces and data interoperability

  • Authors: Sabine Tostain (IRD)
  • Subjects: Digital Libraries (cs.DL)
  • Arxiv link: https://arxiv.org/abs/2304.05180
  • Pdf link: https://arxiv.org/pdf/2304.05180
  • Abstract
    Research activities are generally observed and evaluated through the prism of their production and financial elements or team composition. In addition to standardized management indicators and bibliometrics, the French National Research Institute for Sustainable Development (IRD) has been building new indicators for the last ten years, based on the annual regulatory declarations of the Institute's researchers. Different quality management tools allow the evolution of the different interfaces. This source of data, more "open" and more "useful" through its integration into the Institute's information system, is adapted to the needs of the multi-year management of research at the IRD. The aim is twofold: (1) to make progress in the evaluation of research and in the mastery of information by all actors, (2) to enlighten as many actors as possible via more efficient digital circuits and tools. The purpose of this article is to explain how the IRD is changing the entire production chain and the indicators of researchers' activities to better map scientific activities.

TinyReptile: TinyML with Federated Meta-Learning

  • Authors: Haoyu Ren, Darko Anicic, Thomas A. Runkler
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.05201
  • Pdf link: https://arxiv.org/pdf/2304.05201
  • Abstract
    Tiny machine learning (TinyML) is a rapidly growing field aiming to democratize machine learning (ML) for resource-constrained microcontrollers (MCUs). Given the pervasiveness of these tiny devices, it is natural to ask whether TinyML applications can benefit from aggregating their knowledge. Federated learning (FL) enables decentralized agents to jointly learn a global model without sharing sensitive local data. However, a common global model may not work for all devices due to the complexity of the actual deployment environment and the heterogeneity of the data available on each device. In addition, the deployment of TinyML hardware faces significant computational and communication constraints, which traditional ML fails to address. Considering these challenges, we propose TinyReptile, a simple but efficient algorithm inspired by meta-learning and online learning, to collaboratively learn a solid initialization for a neural network (NN) across tiny devices that can be quickly adapted to a new device with respect to its data. We demonstrate TinyReptile on a Raspberry Pi 4 and a Cortex-M4 MCU with only 256 KB of RAM. The evaluations on various TinyML use cases confirm a resource reduction and training time saving by at least a factor of two compared with baseline algorithms of comparable performance.
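
A hedged sketch of the Reptile-style meta-update that TinyReptile builds on: after a few local steps on one device's data, the shared initialization is nudged toward the adapted weights. The quadratic stand-in loss and all hyperparameters are assumptions, not the paper's setup.

```python
# Sketch: serial, one-device-at-a-time Reptile-style meta-learning.
import numpy as np

def local_sgd(weights, data, steps=5, lr=0.01):
    # A few steps of on-device training (assumption: a simple least-squares
    # objective stands in for the real NN loss).
    w = weights.copy()
    for _ in range(steps):
        x, y = data
        grad = 2 * x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w

def tiny_reptile_round(global_w, device_data, meta_lr=0.1):
    adapted = local_sgd(global_w, device_data)
    # Reptile meta-step: move the shared initialization toward adapted weights.
    return global_w + meta_lr * (adapted - global_w)

rng = np.random.default_rng(0)
w = np.zeros(3)
for _ in range(10):  # visit devices one by one, as in serial federated rounds
    x = rng.normal(size=(20, 3))
    y = x @ np.array([1.0, -2.0, 0.5])
    w = tiny_reptile_round(w, (x, y))
```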

Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond

  • Authors: Ensheng Shi, Yanlin Wang, Hongyu Zhang, Lun Du, Shi Han, Dongmei Zhang, Hongbin Sun
  • Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.05216
  • Pdf link: https://arxiv.org/pdf/2304.05216
  • Abstract
    Recently, fine-tuning pre-trained code models such as CodeBERT on downstream tasks has achieved great success in many software testing and analysis tasks. While effective and prevalent, fine-tuning the pre-trained parameters incurs a large computational cost. In this paper, we conduct an extensive experimental study to explore what happens to layer-wise pre-trained representations and their encoded code knowledge during fine-tuning. We then propose efficient alternatives to fine-tune large pre-trained code models based on the above findings. Our experimental study shows that (1) lexical, syntactic and structural properties of source code are encoded in the lower, intermediate, and higher layers, respectively, while the semantic property spans across the entire model. (2) The process of fine-tuning preserves most of the code properties. Specifically, the basic code properties captured by lower and intermediate layers are still preserved during fine-tuning. Furthermore, we find that only the representations of the top two layers change most during fine-tuning for various downstream tasks. (3) Based on the above findings, we propose Telly to efficiently fine-tune pre-trained code models via layer freezing. Extensive experimental results on five diverse downstream tasks demonstrate that training parameters and the corresponding time cost are greatly reduced, while performance remains similar or better. The replication package, including source code, datasets, and an online appendix, is available at: \url{https://github.com/DeepSoftwareAnalytics/Telly}.
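
A minimal sketch of the layer-freezing idea, consistent with the finding that mostly the top layers change during fine-tuning: freeze everything, then unfreeze only the top two encoder layers. The checkpoint name and RoBERTa-style module layout are assumptions about a typical setup, not the paper's exact recipe.

```python
# Sketch: fine-tune only the top two encoder layers of a pre-trained code model.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("microsoft/codebert-base")

# Freeze all parameters first...
for p in model.parameters():
    p.requires_grad = False

# ...then unfreeze only the top two encoder layers (RoBERTa-style layout).
for layer in model.encoder.layer[-2:]:
    for p in layer.parameters():
        p.requires_grad = True

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=2e-5)
```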

Inhomogeneous graph trend filtering via a l2,0 cardinality penalty

  • Authors: Xiaoqing Huang, Andersen Ang, Jie Zhang, Yijie Wang
  • Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.05223
  • Pdf link: https://arxiv.org/pdf/2304.05223
  • Abstract
    We study the estimation of piecewise smooth signals over a graph. We propose an $\ell_{2,0}$-norm penalized Graph Trend Filtering (GTF) model to estimate piecewise smooth graph signals that exhibit inhomogeneous levels of smoothness across the nodes. We prove that the proposed GTF model is simultaneously a k-means clustering of the signal over the nodes and a minimum graph cut on the edges of the graph, where the clustering and the cut share the same assignment matrix. We propose two methods to solve the proposed GTF model: a spectral decomposition method and a method based on simulated annealing. In experiments on synthetic and real-world datasets, we show that the proposed GTF model performs better than existing approaches on the tasks of denoising, support recovery and semi-supervised classification. We also show that the proposed GTF model can be solved more efficiently than existing models on datasets with large edge sets.
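
For orientation, a generic form of such a cardinality-penalized GTF objective, reconstructed from the abstract rather than taken from the paper, is

$$
\min_{\beta \in \mathbb{R}^{n}} \; \tfrac{1}{2}\,\lVert y - \beta \rVert_2^2 \;+\; \lambda\,\lVert \Delta \beta \rVert_{2,0},
$$

where $\Delta$ is a graph difference operator over the edges and $\lVert\cdot\rVert_{2,0}$ counts the edges whose endpoint estimates differ, so each discontinuity is charged equally regardless of its magnitude.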

OpenAL: Evaluation and Interpretation of Active Learning Strategies

  • Authors: W. Jonas, A. Abraham, L. Dreyfus-Schmidt
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.05246
  • Pdf link: https://arxiv.org/pdf/2304.05246
  • Abstract
    Despite the vast body of literature on Active Learning (AL), there is no comprehensive and open benchmark allowing for efficient and simple comparison of proposed samplers. Additionally, the variability in experimental settings across the literature makes it difficult to choose a sampling strategy, which is critical due to the one-off nature of AL experiments. To address those limitations, we introduce OpenAL, a flexible and open-source framework to easily run and compare sampling AL strategies on a collection of realistic tasks. The proposed benchmark is augmented with interpretability metrics and statistical analysis methods to understand when and why some samplers outperform others. Last but not least, practitioners can easily extend the benchmark by submitting their own AL samplers.

Re-Weighted Softmax Cross-Entropy to Control Forgetting in Federated Learning

  • Authors: Gwen Legate, Lucas Caccia, Eugene Belilovsky
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05260
  • Pdf link: https://arxiv.org/pdf/2304.05260
  • Abstract
    In Federated Learning, a global model is learned by aggregating model updates computed at a set of independent client nodes; to reduce communication costs, multiple gradient steps are performed at each node prior to aggregation. A key challenge in this setting is data heterogeneity across clients, which results in differing local objectives and can lead clients to overly minimize their own local objective, diverging from the global solution. We demonstrate that individual client models experience catastrophic forgetting with respect to data from other clients and propose an efficient approach that modifies the cross-entropy objective on a per-client basis by re-weighting the softmax logits prior to computing the loss. This approach shields classes outside a client's label set from abrupt representation change, and we empirically demonstrate that it can alleviate client forgetting and provide consistent improvements to standard federated learning algorithms. Our method is particularly beneficial under the most challenging federated learning settings, where data heterogeneity is high and client participation in each round is low.
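
A hedged sketch of the per-client re-weighting idea: scale the softmax logits so that classes absent from a client's label set contribute little gradient. The indicator-style weights below are an assumption; the paper's exact weighting may differ.

```python
# Sketch: re-weight softmax logits per client before cross-entropy.
import torch
import torch.nn.functional as F

def reweighted_ce(logits, targets, client_classes, num_classes, eps=1e-2):
    # Per-class weights: near-zero weight shields classes absent from this
    # client's label set from abrupt representation change.
    w = torch.full((num_classes,), eps, device=logits.device)
    w[client_classes] = 1.0
    # Adding log-weights to logits multiplies the softmax numerators by w.
    weighted_logits = torch.log(w) + logits
    return F.cross_entropy(weighted_logits, targets)

logits = torch.randn(8, 10)
targets = torch.randint(0, 3, (8,))   # this client only sees classes 0..2
loss = reweighted_ce(logits, targets,
                     client_classes=torch.tensor([0, 1, 2]), num_classes=10)
```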

Controllable Textual Inversion for Personalized Text-to-Image Generation

  • Authors: Jianan Yang, Haobo Wang, Ruixuan Xiao, Sai Wu, Gang Chen, Junbo Zhao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.05265
  • Pdf link: https://arxiv.org/pdf/2304.05265
  • Abstract
    The recent large-scale generative modeling has attained unprecedented performance especially in producing high-fidelity images driven by text prompts. Text inversion (TI), alongside the text-to-image model backbones, is proposed as an effective technique in personalizing the generation when the prompts contain user-defined, unseen or long-tail concept tokens. Despite that, we find and show that the deployment of TI remains full of "dark-magics" -- to name a few, the harsh requirement of additional datasets, arduous human efforts in the loop and lack of robustness. In this work, we propose a much-enhanced version of TI, dubbed Controllable Textual Inversion (COTI), in resolving all the aforementioned problems and in turn delivering a robust, data-efficient and easy-to-use framework. The core to COTI is a theoretically-guided loss objective instantiated with a comprehensive and novel weighted scoring mechanism, encapsulated by an active-learning paradigm. The extensive results show that COTI significantly outperforms the prior TI-related approaches with a 26.05 decrease in the FID score and a 23.00% boost in the R-precision.

Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning

  • Authors: Wenjin Wang, Yunqing Hu, Qianglong Chen, Yin Zhang
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05288
  • Pdf link: https://arxiv.org/pdf/2304.05288
  • Abstract
    Parameter regularization or allocation methods are effective in overcoming catastrophic forgetting in lifelong learning. However, they treat all tasks in a sequence uniformly and ignore the differences in the learning difficulty of different tasks. As a result, parameter regularization methods face significant forgetting when learning a new task that is very different from previously learned tasks, and parameter allocation methods incur unnecessary parameter overhead when learning simple tasks. In this paper, we propose Parameter Allocation & Regularization (PAR), which adaptively selects an appropriate strategy for each task, choosing between parameter allocation and regularization based on the task's learning difficulty. A task is easy for a model that has learned tasks related to it, and vice versa. We propose a divergence estimation method based on the Nearest-Prototype distance to measure task relatedness using only features of the new task. Moreover, we propose a time-efficient relatedness-aware sampling-based architecture search strategy to reduce the parameter overhead of allocation. Experimental results on multiple benchmarks demonstrate that, compared with SOTAs, our method is scalable and significantly reduces the model's redundancy while improving performance. Further qualitative analysis indicates that PAR obtains reasonable task relatedness.

RRHF: Rank Responses to Align Language Models with Human Feedback without tears

  • Authors: Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang, Fei Huang
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.05302
  • Pdf link: https://arxiv.org/pdf/2304.05302
  • Abstract
    Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models with human preferences, significantly enhancing the quality of interactions between humans and these models. InstructGPT implements RLHF through several stages, including Supervised Fine-Tuning (SFT), reward model training, and Proximal Policy Optimization (PPO). PPO, however, is sensitive to hyperparameters and requires a minimum of four models in its standard implementation, which makes it hard to train. In contrast, we propose a novel learning paradigm called RRHF, which scores responses generated by different sampling policies and learns to align them with human preferences through a ranking loss. RRHF can efficiently align language model output probabilities with human preferences as robustly as fine-tuning, and it only needs 1 to 2 models during tuning. In addition, RRHF can be considered an extension of SFT and reward models while being simpler than PPO in terms of coding, model counts, and hyperparameters. The entire alignment process can be accomplished within a single RRHF training session. We evaluate RRHF using LLaMA and Alpaca on Helpful and Harmless data, demonstrating performance comparable to PPO.
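
A minimal sketch of a ranking loss over scored responses in the spirit of RRHF: responses with a higher preference reward should receive a higher model score. The pairwise hinge form and variable names are assumptions.

```python
# Sketch: pairwise ranking loss aligning model scores with preference rewards.
import torch

def rank_loss(logprobs, rewards):
    """logprobs: model score of each response; rewards: preference scores."""
    loss = logprobs.new_zeros(())
    n = len(rewards)
    for i in range(n):
        for j in range(n):
            if rewards[i] > rewards[j]:
                # Penalize whenever a worse response outscores a better one.
                loss = loss + torch.relu(logprobs[j] - logprobs[i])
    return loss

logprobs = torch.tensor([-1.2, -0.5, -2.0], requires_grad=True)
rewards = torch.tensor([0.9, 0.2, 0.1])
rank_loss(logprobs, rewards).backward()
```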

OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

  • Authors: Yunpeng Zhang, Zheng Zhu, Dalong Du
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05316
  • Pdf link: https://arxiv.org/pdf/2304.05316
  • Abstract
    Vision-based perception for autonomous driving has undergone a transformation from bird's-eye-view (BEV) representations to 3D semantic occupancy. Compared with BEV planes, 3D semantic occupancy additionally provides structural information along the vertical direction. This paper presents OccFormer, a dual-path transformer network to effectively process the 3D volume for semantic occupancy prediction. OccFormer achieves a long-range, dynamic, and efficient encoding of the camera-generated 3D voxel features. It is obtained by decomposing the heavy 3D processing into local and global transformer pathways along the horizontal plane. For the occupancy decoder, we adapt the vanilla Mask2Former for 3D semantic occupancy by proposing preserve-pooling and class-guided sampling, which notably mitigate the sparsity and class imbalance. Experimental results demonstrate that OccFormer significantly outperforms existing methods for semantic scene completion on the SemanticKITTI dataset and for LiDAR semantic segmentation on the nuScenes dataset. Code is available at \url{https://github.com/zhangyp15/OccFormer}.

SciKGTeX -- A LaTeX Package to Semantically Annotate Contributions in Scientific Publications

  • Authors: Christof Bless, Ildar Baimuratov, Oliver Karras
  • Subjects: Digital Libraries (cs.DL); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.05327
  • Pdf link: https://arxiv.org/pdf/2304.05327
  • Abstract
    Scientific knowledge graphs have been proposed as a solution to structure the content of research publications in a machine-actionable way and enable more efficient, computer-assisted workflows for many research activities. Crowd-sourcing approaches are used frequently to build and maintain such scientific knowledge graphs. To contribute to scientific knowledge graphs, researchers need simple and easy-to-use solutions to generate new knowledge graph elements and establish the practice of semantic representations in scientific communication. In this paper, we present a workflow for authors of scientific documents to specify their contributions with a LaTeX package, called SciKGTeX, and upload them to a scientific knowledge graph. The SciKGTeX package allows authors of scientific publications to mark the main contributions of their work directly in LaTeX source files. The package embeds marked contributions as metadata into the generated PDF document, from where they can be extracted automatically and imported into a scientific knowledge graph, such as the ORKG. This workflow is simpler and faster than current approaches, which make use of external web interfaces for data entry. Our user evaluation shows that SciKGTeX is easy to use, with a score of 79 out of 100 on the System Usability Scale, as participants of the study needed only 7 minutes on average to annotate the main contributions on a sample abstract of a published paper. Further testing shows that the embedded contributions can be successfully uploaded to ORKG within ten seconds. SciKGTeX simplifies the process of manual semantic annotation of research contributions in scientific articles. Our workflow demonstrates how a scientific knowledge graph can automatically ingest research contributions from document metadata.

TT-SDF2PC: Registration of Point Cloud and Compressed SDF Directly in the Memory-Efficient Tensor Train Domain

  • Authors: Alexey I. Boyko, Anastasiia Kornilova, Rahim Tariverdizadeh, Mirfarid Musavian, Larisa Markeeva, Ivan Oseledets, Gonzalo Ferrer
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.05342
  • Pdf link: https://arxiv.org/pdf/2304.05342
  • Abstract
    This paper addresses the following research question: "can one compress a detailed 3D representation and use it directly for point cloud registration?". Map compression of the scene can be achieved by the tensor train (TT) decomposition of the signed distance function (SDF) representation, where the amount of data reduction is regulated by the so-called TT-ranks. Using this representation, we propose an algorithm, TT-SDF2PC, that is capable of directly registering a PC to the compressed SDF by making use of efficient calculations of its derivatives in the TT domain, saving computations and memory. We compare TT-SDF2PC with SOTA local and global registration methods on a synthetic dataset and a real dataset and show on-par performance while requiring significantly fewer resources.

Leo: Lagrange Elementary Optimization

  • Authors: Aso M. Aladdin, Tarik A. Rashid
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.05346
  • Pdf link: https://arxiv.org/pdf/2304.05346
  • Abstract
    Global optimization problems are frequently solved using practical and efficient evolutionary methods. However, as the underlying problem becomes more complex, their efficacy and scalability become harder to maintain. Thus, the purpose of this research is to introduce the Lagrange Elementary Optimization (Leo), a self-adaptive evolutionary method inspired by the remarkable accuracy of vaccinations using the albumin quotient of human blood. It develops intelligent agents using their fitness function value after gene crossing. These genes direct the search agents during both exploration and exploitation. The main objective of the Leo algorithm is presented in this paper along with the inspiration and motivation for the concept. To demonstrate its precision, the proposed algorithm is validated against a variety of test functions, including 19 traditional benchmark functions and the CECC06 2019 test functions. The results of Leo on the 19 classic benchmark test functions are evaluated against DA, PSO, and GA separately, and then two other recent algorithms, FDO and LPB, are also included in the evaluation. In addition, Leo is tested on ten CECC06 2019 functions against the DA, WOA, SSA, FDO, LPB, and FOX algorithms. The cumulative outcomes demonstrate Leo's capacity to improve on the starting population and move toward the global optimum. Different standard measurements are used to verify and prove the stability of Leo in both the exploration and exploitation phases. Moreover, statistical analysis supports the findings of the proposed research. Finally, novel real-world applications are introduced to demonstrate the practicality of Leo.

Astroformer: More Data Might Not be All You Need for Classification

  • Authors: Rishit Dagli
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05350
  • Pdf link: https://arxiv.org/pdf/2304.05350
  • Abstract
    Recent advancements in areas such as natural language processing and computer vision rely on intricate and massive models that have been trained using vast amounts of unlabelled or partly labeled data, and training or deploying these state-of-the-art methods in resource-constrained environments has been a challenge. Galaxy morphologies are crucial to understanding the processes by which galaxies form and evolve. Efficient methods to classify galaxy morphologies are required to extract physical information from modern-day astronomy surveys. In this paper, we introduce methods to learn from smaller amounts of data. We propose using a hybrid transformer-convolutional architecture drawing much inspiration from the success of CoAtNet and MaxViT. Concretely, we use the transformer-convolutional hybrid with a new stack design for the network, a different way of creating a relative self-attention layer, and pair it with a careful selection of data augmentation and regularization techniques. Our approach sets a new state of the art on predicting galaxy morphologies from images on the Galaxy10 DECals dataset, a science objective consisting of 17,736 labeled images, achieving 94.86% top-1 accuracy and beating the current state of the art for this task by 4.62%. Furthermore, this approach also sets a new state of the art on CIFAR-100 and Tiny ImageNet. We also find that models and training methods used for larger datasets often do not work very well in the low-data regime. Our code and models will be released at a later date before the conference.

Asymmetric Polynomial Loss For Multi-Label Classification

  • Authors: Yusheng Huang, Jiexing Qi, Xinbing Wang, Zhouhan Lin
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05361
  • Pdf link: https://arxiv.org/pdf/2304.05361
  • Abstract
    Various tasks are reformulated as multi-label classification problems, in which the binary cross-entropy (BCE) loss is frequently utilized for optimizing well-designed models. However, the vanilla BCE loss cannot be tailored for diverse tasks, resulting in suboptimal performance for different models. In addition, the imbalance between redundant negative samples and rare positive samples can degrade the model performance. In this paper, we propose an effective Asymmetric Polynomial Loss (APL) to mitigate the above issues. Specifically, we first perform a Taylor expansion of the BCE loss. Then we ameliorate the coefficients of the polynomial functions. We further employ an asymmetric focusing mechanism to decouple the gradient contributions of the negative and positive samples. Moreover, we validate that the polynomial coefficients can recalibrate the asymmetric focusing hyperparameters. Experiments on relation extraction, text classification, and image classification show that our APL loss can consistently improve performance without extra training burden.
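
A hedged sketch of an asymmetric polynomial loss in this spirit: since $-\log p = \sum_{k\ge1}(1-p)^k/k$, perturbing the leading polynomial coefficient, with separate settings for positive and negative labels, yields asymmetric focusing. Coefficient values and names below are assumptions, not the paper's settings.

```python
# Sketch: BCE with asymmetrically perturbed first-order polynomial terms.
import torch

def asymmetric_poly_loss(logits, targets, eps_pos=1.0, eps_neg=-0.5):
    p = torch.sigmoid(logits)
    # Standard BCE terms for positive and negative labels.
    pos = -torch.log(p.clamp_min(1e-8))
    neg = -torch.log((1 - p).clamp_min(1e-8))
    # Perturb the leading polynomial coefficient asymmetrically:
    pos = pos + eps_pos * (1 - p)   # emphasize rare positives
    neg = neg + eps_neg * p         # damp abundant easy negatives
    return (targets * pos + (1 - targets) * neg).mean()

logits = torch.randn(4, 20)                    # multi-label: 20 classes
targets = torch.randint(0, 2, (4, 20)).float()
loss = asymmetric_poly_loss(logits, targets)
```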

Design and Analysis of Index codes for 3-Group NOMA in Vehicular Adhoc Networks

  • Authors: Sai Pavan Deekshitula, B. Sundar Rajan
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2304.05379
  • Pdf link: https://arxiv.org/pdf/2304.05379
  • Abstract
    Index coding (IC) is a source coding technique employed to improve spectral utilisation, where the source node aims to satisfy users' demands with a minimum number of transmissions. Non-orthogonal multiple access (NOMA) is integral to the radio access technique used in 5G networks. The index-coded NOMA (IC-NOMA) transmission scheme in Vehicular Adhoc Networks (VANETs) applies NOMA principles to index-coded data to avoid network congestion and to improve spectral efficiency compared to conventional IC systems. In this work, a spectrally efficient transmission scheme called 3-Group IC-NOMA is proposed, and an innovative index code design that fits NOMA decoding principles to obtain improved spectral efficiency is developed. Through exhaustive analytical studies, we demonstrate that the proposed transmission scheme always supports higher rates than conventional IC systems and requires less power to achieve an information rate at least as good as that of conventional IC systems.

Keyword: faster

Similarity search in the blink of an eye with compressed indices

  • Authors: Cecilia Aguerrebere, Ishwar Bhati, Mark Hildebrand, Mariano Tepper, Ted Willke
  • Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.04759
  • Pdf link: https://arxiv.org/pdf/2304.04759
  • Abstract
    Nowadays, data is represented by vectors. Retrieving those vectors, among millions and billions, that are similar to a given query is a ubiquitous problem of relevance for a wide range of applications. In this work, we present new techniques for creating faster and smaller indices to run these searches. To this end, we introduce a novel vector compression method, Locally-adaptive Vector Quantization (LVQ), that simultaneously reduces memory footprint and improves search performance, with minimal impact on search accuracy. LVQ is designed to work optimally in conjunction with graph-based indices, reducing their effective bandwidth while enabling random-access-friendly fast similarity computations. Our experimental results show that LVQ, combined with key optimizations for graph-based indices in modern datacenter systems, establishes the new state of the art in terms of performance and memory footprint. For billions of vectors, LVQ outcompetes the second-best alternatives: (1) in the low-memory regime, by up to 20.7x in throughput with up to a 3x memory footprint reduction, and (2) in the high-throughput regime by 5.8x with 1.4x less memory.
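
A hedged sketch of per-vector, locally-adaptive scalar quantization conveying the LVQ idea: each vector is quantized on a grid fitted to its own value range. The actual LVQ design (including its two-level residual variant) is more involved; this only illustrates the principle.

```python
# Sketch: quantize each vector with its own (local) min/max range.
import numpy as np

def lvq_encode(x, bits=8):
    lo, hi = float(x.min()), float(x.max())          # locally adaptive range
    scale = max((hi - lo) / (2**bits - 1), 1e-12)
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def lvq_decode(codes, lo, scale):
    return lo + codes.astype(np.float32) * scale

x = np.random.randn(128).astype(np.float32)
codes, lo, scale = lvq_encode(x)                     # 1 byte per dimension
print(np.abs(x - lvq_decode(codes, lo, scale)).max())  # small reconstruction error
```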

RAPID: Enabling Fast Online Policy Learning in Dynamic Public Cloud Environments

  • Authors: Drew Penney, Bin Li, Lizhong Chen, Jaroslaw J. Sydir, Anna Drewek-Ossowicka, Ramesh Illikkal, Charlie Tai, Ravi Iyer, Andrew Herdrich
  • Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.04797
  • Pdf link: https://arxiv.org/pdf/2304.04797
  • Abstract
    Resource sharing between multiple workloads has become a prominent practice among cloud service providers, motivated by demand for improved resource utilization and reduced cost of ownership. Effective resource sharing, however, remains an open challenge due to the adverse effects that resource contention can have on high-priority, user-facing workloads with strict Quality of Service (QoS) requirements. Although recent approaches have demonstrated promising results, those works remain largely impractical in public cloud environments since workloads are not known in advance and may only run for a brief period, thus prohibiting offline learning and significantly hindering online learning. In this paper, we propose RAPID, a novel framework for fast, fully-online resource allocation policy learning in highly dynamic operating environments. RAPID leverages lightweight QoS predictions, enabled by domain-knowledge-inspired techniques for sample efficiency and bias reduction, to decouple control from conventional feedback sources and guide policy learning at a rate orders of magnitude faster than prior work. Evaluation on a real-world server platform with representative cloud workloads confirms that RAPID can learn stable resource allocation policies in minutes, as compared with hours in prior state-of-the-art, while improving QoS by 9.0x and increasing best-effort workload performance by 19-43%.

An Experimental Study of Two-Level Schwarz Domain Decomposition Preconditioners on GPUs

  • Authors: Ichitaro Yamazaki, Alexander Heinlein, Sivasankaran Rajamanickam
  • Subjects: Numerical Analysis (math.NA); Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS)
  • Arxiv link: https://arxiv.org/abs/2304.04876
  • Pdf link: https://arxiv.org/pdf/2304.04876
  • Abstract
    The generalized Dryja-Smith-Widlund (GDSW) preconditioner is a two-level overlapping Schwarz domain decomposition (DD) preconditioner that couples a classical one-level overlapping Schwarz preconditioner with an energy-minimizing coarse space. When used to accelerate the convergence rate of Krylov subspace iterative methods, the GDSW preconditioner provides robustness and scalability for the solution of sparse linear systems arising from the discretization of a wide range of partial differential equations. In this paper, we present FROSch (Fast and Robust Schwarz), a domain decomposition solver package which implements GDSW-type preconditioners for both CPU and GPU clusters. To improve the solver performance on GPUs, we use a novel decomposition to run multiple MPI processes on each GPU, reducing both the solver's computational and storage costs and potentially improving the convergence rate. This allowed us to obtain competitive or faster performance using GPUs compared to using CPUs alone. We demonstrate the performance of FROSch on the Summit supercomputer with NVIDIA V100 GPUs, where we used NVIDIA Multi-Process Service (MPS) to implement our decomposition strategy. The solver has a wide variety of algorithmic and implementation choices, which poses both opportunities and challenges for its GPU implementation. We conduct a thorough experimental study with different solver options, including the exact or inexact solution of the local overlapping subdomain problems on a GPU. We also discuss the effect of using the iterative variant of the incomplete LU factorization and sparse-triangular solve as the approximate local solver, and of using lower precision for computing the whole FROSch preconditioner. Overall, the solve time was reduced by factors of about $2\times$ using GPUs, while the GPU acceleration of the numerical setup time depends on the solver options and the local matrix sizes.
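
For context, the generic two-level additive Schwarz preconditioner of GDSW type can be written in the standard form (textbook notation, not FROSch-specific):

$$
M_{\mathrm{GDSW}}^{-1} \;=\; \Phi A_0^{-1} \Phi^{T} \;+\; \sum_{i=1}^{N} R_i^{T} A_i^{-1} R_i, \qquad A_0 = \Phi^{T} A \Phi, \quad A_i = R_i A R_i^{T},
$$

where $R_i$ restricts to the $i$-th overlapping subdomain and the columns of $\Phi$ span the energy-minimizing coarse space.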

Multi-Sample Consensus Driven Unsupervised Normal Estimation for 3D Point Clouds

  • Authors: Jie Zhang, Minghui Nie, Junjie Cao, Jian Liu, Ligang Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.04884
  • Pdf link: https://arxiv.org/pdf/2304.04884
  • Abstract
    Deep normal estimators have made great strides on synthetic benchmarks. Unfortunately, their performance drops dramatically on real scan data, since they are supervised only on synthetic datasets. Point-wise annotation of ground-truth normals is prone to inefficiency and inaccuracy, which makes it practically impossible to build perfect real datasets for supervised deep learning. To overcome this challenge, we propose a multi-sample consensus paradigm for unsupervised normal estimation. The paradigm consists of multi-candidate sampling, candidate rejection, and mode determination; the latter two are driven by neighbor-point consensus and candidate consensus, respectively. Two primary implementations of the paradigm, MSUNE and MSUNE-Net, are proposed. MSUNE minimizes a candidate consensus loss in mode determination. As a robust optimization method, it outperforms cutting-edge supervised deep learning methods on real data, at the cost of the longer runtime needed to sample enough candidate normals for each query point. MSUNE-Net, the first unsupervised deep normal estimator as far as we know, advances the multi-sample consensus further. It transfers the three online stages of MSUNE to offline training, making its inference 100 times faster. Beyond that, more accurate inference is achieved, since the candidates of query points from similar patches can implicitly form a sufficiently large candidate set in MSUNE-Net. Comprehensive experiments demonstrate that the two proposed unsupervised methods are noticeably superior to some supervised deep normal estimators on the most common synthetic dataset. More importantly, they show better generalization ability and outperform all SOTA conventional and deep methods on three real datasets: NYUV2, KITTI, and a dataset from PCV [1].
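
A hedged, RANSAC-like sketch of multi-sample consensus for one query point: sample candidate normals from random neighbor triplets and keep the candidate with the largest consensus. Thresholds and sampling details are assumptions, not MSUNE's exact stages.

```python
# Sketch: candidate normals from neighbor triplets, selected by consensus.
import numpy as np

def consensus_normal(query, neighbors, n_candidates=64, tol=0.02, rng=None):
    rng = rng or np.random.default_rng(0)
    best_n, best_support = None, -1
    for _ in range(n_candidates):
        a, b, c = neighbors[rng.choice(len(neighbors), 3, replace=False)]
        n = np.cross(b - a, c - a)
        if np.linalg.norm(n) < 1e-12:
            continue  # degenerate (collinear) triplet
        n = n / np.linalg.norm(n)
        # Consensus: neighbors lying near the candidate plane through the query.
        support = np.sum(np.abs((neighbors - query) @ n) < tol)
        if support > best_support:
            best_n, best_support = n, support
    return best_n

pts = np.random.rand(100, 3)
pts[:, 2] = 0.01 * np.random.rand(100)   # near-planar patch
print(consensus_normal(pts[0], pts))     # approx. (0, 0, +-1)
```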

Neural Network Predicts Ion Concentration Profiles under Nanoconfinement

  • Authors: Zhonglin Cao, Yuyang Wang, Cooper Lorsung, Amir Barati Farimani
  • Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)
  • Arxiv link: https://arxiv.org/abs/2304.04896
  • Pdf link: https://arxiv.org/pdf/2304.04896
  • Abstract
    Modeling the ion concentration profile in nanochannels plays an important role in understanding the electrical double layer and electroosmotic flow. Due to the non-negligible surface interaction and the effect of discrete solvent molecules, molecular dynamics (MD) simulation is often used as an essential tool to study the behavior of ions under nanoconfinement. Despite the accuracy of MD simulation in modeling nanoconfinement systems, it is computationally expensive. In this work, we propose a neural network to predict ion concentration profiles in nanochannels with different configurations, including channel widths, ion molarity, and ion types. By modeling the ion concentration profile as a probability distribution, our neural network can serve as a much faster surrogate model for MD simulation with high accuracy. We further demonstrate the superior prediction accuracy of the neural network over XGBoost. Lastly, we demonstrate that the neural network is flexible in predicting ion concentration profiles with different bin sizes. Overall, our deep learning model is a fast, flexible, and accurate surrogate model for predicting ion concentration profiles under nanoconfinement.

Computer Vision-Aided Intelligent Monitoring of Coffee: Towards Sustainable Coffee Production

  • Authors: Francisco Eron, Muhammad Noman, Raphael Ricon de Oliveira, Deigo de Souza Marques, Rafael Serapilha Durelli, Andre Pimenta Freire, Antonio Chalfun Junior
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.04966
  • Pdf link: https://arxiv.org/pdf/2304.04966
  • Abstract
    Coffee, which is prepared from the ground, roasted seeds of harvested coffee cherries, is one of the most consumed beverages and most traded commodities globally. Manually monitoring a coffee field regularly to report on plant and soil health and to estimate yield and harvesting time is labor-intensive, time-consuming and error-prone. Some recent studies have developed sensors for estimating coffee yield at the time of harvest; however, a more inclusive and applicable technology to remotely monitor multiple parameters of the field and estimate coffee yield and quality even at the pre-harvest stage was missing. Following a precision agriculture approach, we employed the machine learning algorithm YOLO for image processing of coffee plants. In this study, the latest version of the state-of-the-art algorithm, YOLOv7, was trained with 324 annotated images and then evaluated on 82 unannotated images as test data. Next, as an innovative approach for annotating the training data, we trained K-means models which led to machine-generated color classes of coffee fruit and could thus characterize the objects of interest in the image. Finally, we attempted to develop an AI-based handy mobile application which would not only efficiently predict harvest time and estimate coffee yield and quality, but also report on plant health. As a result, the developed model efficiently analyzed the test data with a mean average precision of 0.89. Strikingly, our innovative semi-supervised method, with a mean average precision of 0.77 in multi-class mode, surpassed the supervised method, with a mean average precision of only 0.60, leading to faster and more accurate annotation. The mobile application we designed based on the developed code, named CoffeApp, possesses multiple features for analyzing fruit from images taken by a phone camera in the field and can thus track fruit ripening in real time.

Fast IMU-based Dual Estimation of Human Motion and Kinematic Parameters via Progressive In-Network Computing

  • Authors: Xiaobing Dai, Huanzhuo Wu, Siyi Wang, Junjie Jiao, Giang T. Nguyen, Frank H. P. Fitzek, Sandra Hirche
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.05131
  • Pdf link: https://arxiv.org/pdf/2304.05131
  • Abstract
    Many applications involve humans in the loop, where continuous and accurate human motion monitoring provides valuable information for safe and intuitive human-machine interaction. Portable devices such as inertial measurement units (IMUs) are suitable for monitoring human motion, while in practice often only limited computational power is available locally. Human motion in task space coordinates requires not only the human joint motion but also a nonlinear coordinate transformation depending on parameters such as human limb length. In most applications, measuring these kinematic parameters for each individual requires undesirably high effort. Therefore, it is desirable to estimate both the human motion and the kinematic parameters from IMUs. In this work, we propose a novel computational framework for dual estimation in real time that exploits in-network computational resources. We adopt the concept of field Kalman filtering, where the dual estimation problem is decomposed into a fast state estimation process and a computationally expensive parameter estimation process. In order to further accelerate the convergence, the parameter estimation is progressively computed on multiple networked computational nodes. The superiority of our proposed method is demonstrated in a simulation of a human arm, where the estimation accuracy is shown to converge faster than with conventional approaches.
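
The fast-state / slow-parameter decomposition can be illustrated on a deliberately simplified scalar arm model (one joint angle as the fast state, one limb length as the kinematic parameter); the model, gains, and update schedule are all assumptions, not the paper's field Kalman filter.

```python
import numpy as np

rng = np.random.default_rng(0)
L_true, L_hat = 0.60, 0.50        # true vs. estimated limb length (m)
theta_hat, P = 0.0, 1.0           # angle estimate and its variance
Q, R, lr = 1e-3, 1e-2, 0.05

for k in range(2000):
    theta = 0.8 * np.sin(0.01 * k)                   # true joint trajectory
    y = L_true * np.sin(theta) + rng.normal(0.0, np.sqrt(R))

    # Fast state update: one scalar extended Kalman filter step.
    P += Q
    H = L_hat * np.cos(theta_hat)                    # d y / d theta
    K = P * H / (H * H * P + R)
    theta_hat += K * (y - L_hat * np.sin(theta_hat))
    P *= 1.0 - K * H

    # Slow parameter update (could run on a separate in-network node):
    # a gradient step on the squared measurement residual w.r.t. L.
    if k % 20 == 0:
        r = y - L_hat * np.sin(theta_hat)
        L_hat += lr * r * np.sin(theta_hat)

print(f"estimated limb length: {L_hat:.3f} m (true {L_true})")
```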

PP-MobileSeg: Explore the Fast and Accurate Semantic Segmentation Model on Mobile Devices

  • Authors: Shiyu Tang, Ting Sun, Juncai Peng, Guowei Chen, Yuying Hao, Manhui Lin, Zhihong Xiao, Jiangbin You, Yi Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05152
  • Pdf link: https://arxiv.org/pdf/2304.05152
  • Abstract
    The success of transformers in computer vision has led to several attempts to adapt them for mobile devices, but their performance remains unsatisfactory in some real-world applications. To address this issue, we propose PP-MobileSeg, a semantic segmentation model that achieves state-of-the-art performance on mobile devices. PP-MobileSeg comprises three novel parts: the StrideFormer backbone, the Aggregated Attention Module (AAM), and the Valid Interpolate Module (VIM). The four-stage StrideFormer backbone is built with MV3 blocks and strided SEA attention, and it is able to extract rich semantic and detailed features with minimal parameter overhead. The AAM first filters the detailed features through semantic feature ensemble voting and then combines them with semantic features to enhance the semantic information. Furthermore, we propose VIM to upsample the downsampled features to the resolution of the input image. It significantly reduces model latency by interpolating only the classes present in the final prediction, as interpolation is the most significant contributor to overall model latency. Extensive experiments show that PP-MobileSeg achieves a superior tradeoff between accuracy, model size, and latency compared to other methods. On the ADE20K dataset, PP-MobileSeg achieves 1.57% higher mIoU than SeaFormer-Base with 32.9% fewer parameters and 42.3% faster inference on a Qualcomm Snapdragon 855. Source codes are available at https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.8.
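
The latency argument behind VIM, interpolating only the classes that survive into the final prediction, can be sketched as follows; the module's exact internals are assumptions based on the description above.

```python
import torch
import torch.nn.functional as F

def valid_interpolate(logits, out_size):
    """Sketch of the VIM idea: upsample only the class channels that
    appear in the low-resolution prediction. Internals are assumptions."""
    present = torch.unique(logits.argmax(dim=1))      # classes that survive
    up = F.interpolate(logits[:, present], size=out_size,
                       mode="bilinear", align_corners=False)
    return present[up.argmax(dim=1)]                  # back to original ids

logits = torch.randn(1, 150, 64, 64)                  # e.g. ADE20K's 150 classes
pred = valid_interpolate(logits, (512, 512))          # (1, 512, 512) label map
# Far fewer than 150 channels get interpolated, which is where the
# latency saving comes from.
```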

flap: A Deterministic Parser with Fused Lexing

  • Authors: Neel Krishnaswami, Ningning Xie, Jeremy Yallop
  • Subjects: Programming Languages (cs.PL)
  • Arxiv link: https://arxiv.org/abs/2304.05276
  • Pdf link: https://arxiv.org/pdf/2304.05276
  • Abstract
    Lexers and parsers are typically defined separately and connected by a token stream. This separate definition is important for modularity and reduces the potential for parsing ambiguity. However, materializing tokens as data structures and case-switching on tokens comes with a cost. We show how to fuse separately-defined lexers and parsers, drastically improving performance without compromising modularity or increasing ambiguity. We propose a deterministic variant of Greibach Normal Form that ensures deterministic parsing with a single token of lookahead and makes fusion strikingly simple, and prove that normalizing context-free expressions into the deterministic normal form is semantics-preserving. Our staged parser combinator library, flap, provides a standard interface, but generates specialized token-free code that runs two to six times faster than ocamlyacc on a range of benchmarks.
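
To convey what fused, token-free parsing looks like in spirit, here is a handwritten deterministic character-level parser for a toy arithmetic grammar with a single character of lookahead; it illustrates the fusion idea only and is not the code flap generates.

```python
class FusedParser:
    """Lexing is inlined into the parser: no token objects are ever built."""
    def __init__(self, s):
        self.s, self.i = s, 0

    def peek(self):
        return self.s[self.i] if self.i < len(self.s) else ""

    def expr(self):                      # expr ::= term ('+' term)*
        v = self.term()
        while self.peek() == "+":
            self.i += 1
            v += self.term()
        return v

    def term(self):                      # term ::= factor ('*' factor)*
        v = self.factor()
        while self.peek() == "*":
            self.i += 1
            v *= self.factor()
        return v

    def factor(self):                    # factor ::= number | '(' expr ')'
        if self.peek() == "(":
            self.i += 1
            v = self.expr()
            assert self.peek() == ")", "expected ')'"
            self.i += 1
            return v
        assert self.peek().isdigit(), "expected a digit"
        j = self.i                       # inlined lexing: scan digits here
        while self.peek().isdigit():
            self.i += 1
        return int(self.s[j:self.i])

print(FusedParser("2*(3+4)").expr())     # prints 14
```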

TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Training

  • Authors: William Won, Midhilesh Elavazhagan, Sudarshan Srinivasan, Ajaya Durg, Swati Gupta, Tushar Krishna
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.05301
  • Pdf link: https://arxiv.org/pdf/2304.05301
  • Abstract
    Collective communications are an indispensable part of distributed training. Running a topology-aware collective algorithm is crucial for optimizing communication performance by minimizing congestion. Today such algorithms exist only for a small set of simple topologies, which limits the topologies employed in training clusters and the handling of irregular topologies arising from network failures. In this paper, we propose TACOS, an automated topology-aware collective synthesizer for arbitrary input network topologies. TACOS synthesized an All-Reduce algorithm 3.73x faster than baselines, and synthesized collective algorithms for a 512-NPU system in just 6.1 minutes.

SciKGTeX -- A LaTeX Package to Semantically Annotate Contributions in Scientific Publications

  • Authors: Christof Bless, Ildar Baimuratov, Oliver Karras
  • Subjects: Digital Libraries (cs.DL); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.05327
  • Pdf link: https://arxiv.org/pdf/2304.05327
  • Abstract
    Scientific knowledge graphs have been proposed as a solution to structure the content of research publications in a machine-actionable way and enable more efficient, computer-assisted workflows for many research activities. Crowd-sourcing approaches are used frequently to build and maintain such scientific knowledge graphs. To contribute to scientific knowledge graphs, researchers need simple and easy-to-use solutions to generate new knowledge graph elements and establish the practice of semantic representations in scientific communication. In this paper, we present a workflow for authors of scientific documents to specify their contributions with a LaTeX package, called SciKGTeX, and upload them to a scientific knowledge graph. The SciKGTeX package allows authors of scientific publications to mark the main contributions of their work directly in LaTeX source files. The package embeds marked contributions as metadata into the generated PDF document, from where they can be extracted automatically and imported into a scientific knowledge graph, such as the ORKG. This workflow is simpler and faster than current approaches, which make use of external web interfaces for data entry. Our user evaluation shows that SciKGTeX is easy to use, with a score of 79 out of 100 on the System Usability Scale, as participants of the study needed only 7 minutes on average to annotate the main contributions on a sample abstract of a published paper. Further testing shows that the embedded contributions can be successfully uploaded to ORKG within ten seconds. SciKGTeX simplifies the process of manual semantic annotation of research contributions in scientific articles. Our workflow demonstrates how a scientific knowledge graph can automatically ingest research contributions from document metadata.

Keyword: mobile

Robust Body Exposure (RoBE): A Graph-based Dynamics Modeling Approach to Manipulating Blankets over People

  • Authors: Kavya Puthuveetil, Sasha Wald, Atharva Pusalkar, Pratyusha Karnati, Zackory Erickson
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.04822
  • Pdf link: https://arxiv.org/pdf/2304.04822
  • Abstract
    Robotic caregivers could potentially improve the quality of life of many who require physical assistance. However, in order to assist individuals who are lying in bed, robots must be capable of dealing with a significant obstacle: the blanket or sheet that will almost always cover the person's body. We propose a method for targeted bedding manipulation over people lying supine in bed where we first learn a model of the cloth's dynamics. Then, we optimize over this model to uncover a given target limb using information about human body shape and pose that only needs to be provided at run-time. We show how this approach enables greater robustness to variation relative to geometric and reinforcement learning baselines via a number of generalization evaluations in simulation and in the real world. We further evaluate our approach in a human study with 12 participants where we demonstrate that a mobile manipulator can adapt to real variation in human body shape, size, pose, and blanket configuration to uncover target body parts without exposing the rest of the body. Source code and supplementary materials are available online.

MHfit: Mobile Health Data for Predicting Athletics Fitness Using Machine Learning

  • Authors: Jonayet Miah, Muntasir mamun, Md Minhazur Rahman, Md Ishtyaq Mahmyd, Asm Mohaimenul Islam, Sabbir Ahmed
  • Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.04839
  • Pdf link: https://arxiv.org/pdf/2304.04839
  • Abstract
    Mobile phones and other electronic gadgets have aided in collecting data without the need for manual data entry. This paper focuses specifically on mobile health data, which use mobile devices to gather clinical health data and track patient vitals in real time. Our study aims to help small or large sports teams decide whether an athlete is a good fit for a particular game, by comparing several machine learning algorithms that predict human behavior and health using data collected from mobile devices and sensors placed on patients. We obtained the dataset from a similar study done on mHealth. The dataset contains vital-sign recordings of ten volunteers from different backgrounds, who performed several physical activities with sensors placed on their bodies. Our study used five machine learning algorithms (XGBoost, Naive Bayes, Decision Tree, Random Forest, and Logistic Regression) to analyze and predict human health behavior. XGBoost outperformed the other machine learning algorithms, achieving 95.2% accuracy, 99.5% sensitivity, 99.5% specificity, and a 99.66% F1 score. Our research indicates a promising future for mHealth in predicting human behavior; further research and exploration are needed before it is ready for commercial use, specifically in the sports industry.
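
A sketch of the best-performing pipeline (XGBoost with a held-out split); the random feature matrix below is a placeholder for the real vital-sign channels, and the hyperparameters are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from xgboost import XGBClassifier

# Random stand-ins for the mHealth vital-sign channels and fitness labels.
X = np.random.randn(1000, 23)
y = np.random.randint(0, 2, 1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred), "F1:", f1_score(y_te, pred))
```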

Bounding Box Annotation with Visible Status

  • Authors: Takuya Kiyokawa, Naoki Shirakura, Hiroki Katayama, Keita Tomochika, Jun Takamatsu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2304.04901
  • Pdf link: https://arxiv.org/pdf/2304.04901
  • Abstract
    Training deep-learning-based vision systems requires the manual annotation of a significant amount of data to optimize the many parameters of deep convolutional neural networks. Such manual annotation is highly time-consuming and labor-intensive. To reduce this burden, a previous study presented a fully automated annotation approach that does not require any manual intervention: the method associates a visual marker with an object and captures both in the same image. However, because that method relied on moving the object within the capturing range of a fixed-point camera, the collected image datasets were limited in terms of capturing viewpoints. To overcome this limitation, this study presents a mobile-application-based free-viewpoint image-capturing method. With the proposed application, users can automatically collect multi-view image datasets annotated with bounding boxes by moving the camera. However, capturing images through human involvement is laborious and monotonous. Therefore, we propose gamified application features to track the progress of the collection status. Our experiments demonstrated that using the gamified mobile application for bounding box annotation, with visible collection progress status, can motivate users to collect multi-view object image datasets with less mental workload and time pressure in an enjoyable manner, leading to increased engagement.

Computer Vision-Aided Intelligent Monitoring of Coffee: Towards Sustainable Coffee Production

  • Authors: Francisco Eron, Muhammad Noman, Raphael Ricon de Oliveira, Deigo de Souza Marques, Rafael Serapilha Durelli, Andre Pimenta Freire, Antonio Chalfun Junior
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.04966
  • Pdf link: https://arxiv.org/pdf/2304.04966
  • Abstract
    Coffee, which is prepared from the ground, roasted seeds of harvested coffee cherries, is one of the most consumed beverages and most traded commodities globally. Manually monitoring the coffee field regularly to report on plant and soil health and to estimate yield and harvesting time is labor-intensive, time-consuming, and error-prone. Some recent studies have developed sensors for estimating coffee yield at the time of harvest; however, a more inclusive and applicable technology to remotely monitor multiple parameters of the field and estimate coffee yield and quality even at the pre-harvest stage was missing. Following a precision agriculture approach, we employed the machine learning algorithm YOLO for image processing of coffee plants. In this study, the latest version of the state-of-the-art algorithm, YOLOv7, was trained with 324 annotated images and then evaluated on 82 unannotated images as test data. Next, as an innovative approach for annotating the training data, we trained K-means models that produced machine-generated color classes of coffee fruit and could thus characterize the objects of interest in the image. Finally, we set out to develop an AI-based, handy mobile application that would not only efficiently predict harvest time and estimate coffee yield and quality, but also report on plant health. As a result, the developed model efficiently analyzed the test data with a mean average precision of 0.89. Strikingly, our innovative semi-supervised method, with a mean average precision of 0.77 in multi-class mode, surpassed the supervised method, which reached a mean average precision of only 0.60, leading to faster and more accurate annotation. The mobile application we designed based on the developed code, named CoffeApp, has multiple features for analyzing fruit in images taken by a phone camera in the field and can thus track fruit ripening in real time.

Measuring Teachers' Visual Expertise Using the Gaze Relational Index Based on Real-world Eye-tracking Data and Varying Velocity Thresholds

  • Authors: Christian Kosel (1), Angelina Mooseder (2), Tina Seidl (1), Juergen Pfeffer (2) ((1) Friedl Schoeller Endowed Chair for Educational Psychology, School of Social Science and Technology, Technical University Munich, Germany, (2) Computational Social Science and Big Data, School of Social Science and Technology, Technical University Munich, Germany)
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.05143
  • Pdf link: https://arxiv.org/pdf/2304.05143
  • Abstract
    This article adds to the understanding of teachers' visual expertise by measuring visual information processing in real-world classrooms (mobile eye-tracking) with the newly introduced Gaze Relational Index (GRI) metric, which is defined as the ratio of mean fixation duration to mean fixation number. In addition, the aim was to provide a methodological contribution to future research by showing to what extent the selected configurations (i.e., varying velocity thresholds and fixation merging) of the eye movement event detection algorithm for detecting fixations and saccades influence the results of eye-tracking studies. Our study leads to two important take-home messages: First, by following a novice-expert paradigm (2 novice teachers & 2 experienced teachers), we found that the GRI can serve as a sensitive measure of visual expertise. As hypothesized, experienced teachers' GRI was lower, suggesting that their more fine-grained organization of domain-specific knowledge allows them to fixate more rapidly and frequently in the classroom. Second, we found that the selected velocity threshold parameter alters and, in the worst case, biases the results of an eye-tracking study. Therefore, in the interest of further generalizability of results within visual expertise research, we emphasize that it is highly important to report the configurations that are relevant for the identification of eye movements.
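
The GRI and its sensitivity to the velocity threshold can be made concrete with a small I-VT sketch; sampling rate, units, and thresholds below are assumptions.

```python
import numpy as np

def detect_fixations(gaze, hz, vel_threshold):
    """Minimal I-VT detection: samples whose velocity falls below the
    threshold are fixation samples; contiguous runs form fixations.
    Input is assumed to be already calibrated to degrees."""
    v = np.linalg.norm(np.diff(gaze, axis=0), axis=1) * hz   # deg/s
    is_fix = v < vel_threshold
    runs, start = [], None
    for i, f in enumerate(is_fix):
        if f and start is None:
            start = i
        elif not f and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(is_fix)))
    return [(e - s) / hz for s, e in runs]                   # durations (s)

def gaze_relational_index(durations):
    # GRI = mean fixation duration / fixation count; lower values mean
    # shorter, more frequent fixations (the expert pattern reported above).
    return np.mean(durations) / len(durations)

gaze = np.cumsum(np.random.randn(3000, 2) * 0.05, axis=0)    # fake gaze trace
for thr in (30, 50, 70):                                     # deg/s thresholds
    d = detect_fixations(gaze, hz=60, vel_threshold=thr)
    print(thr, round(gaze_relational_index(d), 4) if d else "no fixations")
```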

PP-MobileSeg: Explore the Fast and Accurate Semantic Segmentation Model on Mobile Devices

  • Authors: Shiyu Tang, Ting Sun, Juncai Peng, Guowei Chen, Yuying Hao, Manhui Lin, Zhihong Xiao, Jiangbin You, Yi Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05152
  • Pdf link: https://arxiv.org/pdf/2304.05152
  • Abstract
    The success of transformers in computer vision has led to several attempts to adapt them for mobile devices, but their performance remains unsatisfactory in some real-world applications. To address this issue, we propose PP-MobileSeg, a semantic segmentation model that achieves state-of-the-art performance on mobile devices. PP-MobileSeg comprises three novel parts: the StrideFormer backbone, the Aggregated Attention Module (AAM), and the Valid Interpolate Module (VIM). The four-stage StrideFormer backbone is built with MV3 blocks and strided SEA attention, and it is able to extract rich semantic and detailed features with minimal parameter overhead. The AAM first filters the detailed features through semantic feature ensemble voting and then combines them with semantic features to enhance the semantic information. Furthermore, we propose VIM to upsample the downsampled features to the resolution of the input image. It significantly reduces model latency by interpolating only the classes present in the final prediction, as interpolation is the most significant contributor to overall model latency. Extensive experiments show that PP-MobileSeg achieves a superior tradeoff between accuracy, model size, and latency compared to other methods. On the ADE20K dataset, PP-MobileSeg achieves 1.57% higher mIoU than SeaFormer-Base with 32.9% fewer parameters and 42.3% faster inference on a Qualcomm Snapdragon 855. Source codes are available at https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.8.

A user co-designed digital INtervention for Child LangUage DisordEr: The INCLUDE Project Protocol

  • Authors: Rafiah Patel
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.05224
  • Pdf link: https://arxiv.org/pdf/2304.05224
  • Abstract
    Around ten percent of all children could have a disorder where language does not develop as expected. This often affects vocabulary skills, i.e., finding the words to express wants, needs, and ideas, which can influence behaviours linked to wellbeing and daily functioning, such as concentration, independence, social interaction, and managing emotions. Without specialist support, needs can increase in severity and continue into adulthood. The types of support, known as interventions, showing the strongest evidence for improving vocabulary, with some signs of improved behaviour and wellbeing, are those that use word-webs. These are diagrams consisting of lines that connect sound and meaning information about a word to strengthen the child's word knowledge and use. The diagrams resemble what is commonly known as mind-maps and are widely used by Speech and Language Therapists in partnership with schools to help children with language difficulties. In addition, interventions delivered through mobile devices have in some cases led to increased vocabulary gains with a positive influence on wellbeing and academic attainment. With advances in technology and the availability of user-friendly mobile devices to capture, combine, and replay multimedia content, new opportunities for designing bespoke vocabulary instruction have emerged that are free of timing and location constraints. This brings the potential to engage and motivate users and to foster independence through functional strategies that support each child's unique language needs. To achieve this, children with language disorder, their parents/carers, support professionals, and software development team members must work jointly to create an intervention that is fit for purpose. This is the first research planned to explore the collaborative development and acceptability of a digitally enhanced vocabulary intervention for child language disorder.

Keyword: pruning

FINEX: A Fast Index for Exact & Flexible Density-Based Clustering (Extended Version with Proofs)*

  • Authors: Konstantin Emil Thiel, Daniel Kocher, Nikolaus Augsten, Thomas Hütter, Willi Mann, Daniel Ulrich Schmitt
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.04817
  • Pdf link: https://arxiv.org/pdf/2304.04817
  • Abstract
    Density-based clustering aims to find groups of similar objects (i.e., clusters) in a given dataset. Applications include, e.g., process mining and anomaly detection. It comes with two user parameters ({\epsilon}, MinPts) that determine the clustering result, but are typically unknown in advance. Thus, users need to interactively test various settings until satisfying clusterings are found. However, existing solutions suffer from the following limitations: (a) Ineffective pruning of expensive neighborhood computations. (b) Approximate clustering, where objects are falsely labeled noise. (c) Restricted parameter tuning that is limited to {\epsilon} whereas MinPts is constant, which reduces the explorable clusterings. (d) Inflexibility in terms of applicable data types and distance functions. We propose FINEX, a linear-space index that overcomes these limitations. Our index provides exact clusterings and can be queried with either of the two parameters. FINEX avoids neighborhood computations where possible and reduces the complexities of the remaining computations by leveraging fundamental properties of density-based clusters. Hence, our solution is efficient and flexible regarding data types and distance functions. Moreover, FINEX respects the original and straightforward notion of density-based clustering. In our experiments on 12 large real-world datasets from various domains, FINEX frequently outperforms state-of-the-art techniques for exact clustering by orders of magnitude.

Design, Integration, and Field Evaluation of a Robotic Blossom Thinning System for Tree Fruit Crops

  • Authors: Uddhav Bhattarai, Qin Zhang, Manoj Karkee
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.04919
  • Pdf link: https://arxiv.org/pdf/2304.04919
  • Abstract
    The US apple industry relies heavily on a semi-skilled manual labor force for essential field operations such as training, pruning, blossom and green fruit thinning, and harvesting. Blossom thinning is one of the crucial crop load management practices for achieving the desired crop load, fruit quality, and return bloom. While several techniques, such as chemical and mechanical thinning, are available for large-scale blossom thinning, such approaches often yield unpredictable thinning results and may damage the canopy, spurs, and leaf tissue. Hence, growers still depend on labor-intensive and expensive manual hand blossom thinning for the desired thinning outcomes. This research presents a robotic solution for blossom thinning in apple orchards using a computer vision system with artificial intelligence, a six-degrees-of-freedom robotic manipulator, and an electrically actuated miniature end-effector. The integrated robotic system was evaluated in a commercial apple orchard and showed promising results for targeted and selective blossom thinning. Two thinning approaches, center and boundary thinning, were investigated to evaluate the system's ability to remove varying proportions of flowers from apple flower clusters. During boundary thinning the end-effector was actuated around the cluster boundary, while center thinning involved end-effector actuation only at the cluster centroid for a fixed duration of 2 seconds. The boundary thinning approach thinned 67.2% of flowers from the targeted clusters with a cycle time of 9.0 seconds per cluster, whereas the center thinning approach thinned 59.4% of flowers with a cycle time of 7.2 seconds per cluster. When commercially adopted, the proposed system could help address the problems apple growers face with current hand, chemical, and mechanical blossom thinning approaches.

Model sparsification can simplify machine unlearning

  • Authors: Jinghan Jia, Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, Pranay Sharma, Sijia Liu
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.04934
  • Pdf link: https://arxiv.org/pdf/2304.04934
  • Abstract
    Recent data regulations necessitate machine unlearning (MU): The removal of the effect of specific examples from the model. While exact unlearning is possible by conducting a model retraining with the remaining data from scratch, its computational cost has led to the development of approximate but efficient unlearning schemes. Beyond data-centric MU solutions, we advance MU through a novel model-based viewpoint: sparsification via weight pruning. Our results in both theory and practice indicate that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner, closing the approximation gap, while continuing to be efficient. With this insight, we develop two new sparsity-aware unlearning meta-schemes, termed 'prune first, then unlearn' and 'sparsity-aware unlearning'. Extensive experiments show that our findings and proposals consistently benefit MU in various scenarios, including class-wise data scrubbing, random data scrubbing, and backdoor data forgetting. One highlight is the 77% unlearning efficacy gain of fine-tuning (one of the simplest approximate unlearning methods) in the proposed sparsity-aware unlearning paradigm. Codes are available at https://github.com/OPTML-Group/Unlearn-Sparse.
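
A minimal sketch of the 'prune first, then unlearn' recipe using PyTorch's built-in magnitude pruning, with fine-tuning on retained data as the simple approximate unlearner mentioned above; the model, sparsity ratio, and data are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Placeholder model; the paper's experiments use larger networks.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Step 1: prune, here 90% of linear weights by global L1 magnitude.
params = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=0.9)

# Step 2: unlearn approximately by fine-tuning the sparse model on the
# retained data only, never touching the forget set.
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
x_retain = torch.randn(64, 784)                  # stand-in for D_retain
y_retain = torch.randint(0, 10, (64,))
for _ in range(5):
    opt.zero_grad()
    loss_fn(model(x_retain), y_retain).backward()
    opt.step()   # pruning masks keep the removed weights at exactly zero
```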

Keyword: voxel

Weakly Supervised Intracranial Hemorrhage Segmentation using Head-Wise Gradient-Infused Self-Attention Maps from a Swin Transformer in Categorical Learning

  • Authors: Amirhossein Rasoulian, Soorena Salari, Yiming Xiao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.04902
  • Pdf link: https://arxiv.org/pdf/2304.04902
  • Abstract
    Intracranial hemorrhage (ICH) is a life-threatening medical emergency caused by various factors. Timely and precise diagnosis of ICH is crucial for administering effective treatment and improving patient survival rates. While deep learning techniques have emerged as the leading approach for medical image analysis and processing, the most commonly employed supervised learning often requires large, high-quality annotated datasets that can be costly to obtain, particularly for pixel/voxel-wise image segmentation. To address this challenge and facilitate ICH treatment decisions, we proposed a novel weakly supervised ICH segmentation method that leverages a hierarchical combination of head-wise gradient-infused self-attention maps obtained from a Swin transformer. The transformer is trained using an ICH classification task with categorical labels. To build and validate the proposed technique, we used two publicly available clinical CT datasets, namely RSNA 2019 Brain CT hemorrhage and PhysioNet. Additionally, we conducted an exploratory study comparing two learning strategies - binary classification and full ICH subtyping - to assess their impact on self-attention and our weakly supervised ICH segmentation framework. The proposed algorithm was compared against the popular U-Net with full supervision, as well as a similar weakly supervised approach using Grad-CAM for ICH segmentation. With a mean Dice score of 0.47, our technique achieved similar ICH segmentation performance as the U-Net and outperformed the Grad-CAM based approach, demonstrating the excellent potential of the proposed framework in challenging medical image segmentation tasks.

EvAC3D: From Event-based Apparent Contours to 3D Models via Continuous Visual Hulls

  • Authors: Ziyun Wang, Kenneth Chaney, Kostas Daniilidis
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05296
  • Pdf link: https://arxiv.org/pdf/2304.05296
  • Abstract
    3D reconstruction from multiple views is a successful computer vision field with multiple deployments in applications. The state of the art is based on traditional RGB frames that enable optimization of photo-consistency across views. In this paper, we study the problem of 3D reconstruction from event cameras, motivated by the advantages of event-based cameras in terms of low power and latency, as well as by the biological evidence that eyes in nature capture the same data and still perceive 3D shape well. The foundation of our hypothesis that 3D reconstruction is feasible using events lies in the information contained in the occluding contours and in the continuous scene acquisition with events. We propose Apparent Contour Events (ACE), a novel event-based representation that defines the geometry of the apparent contour of an object. We represent ACE by a spatially and temporally continuous implicit function defined in the event x-y-t space. Furthermore, we design a novel continuous voxel carving algorithm enabled by the high temporal resolution of the Apparent Contour Events. To evaluate the performance of the method, we collect MOEC-3D, a 3D event dataset of a set of common real-world objects. We demonstrate the ability of EvAC3D to reconstruct high-fidelity mesh surfaces from real event sequences while allowing the refinement of the 3D reconstruction for each individual event.
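
For context, classical frame-based visual-hull (voxel) carving looks like the sketch below; EvAC3D's contribution is to replace the discrete per-frame silhouettes with a continuous stream of Apparent Contour Events, which this sketch does not capture. The camera model and grid resolution are assumptions.

```python
import numpy as np

def carve(voxels, silhouette, P, img_shape):
    """Keep only voxels whose projection lands inside the silhouette."""
    homo = np.c_[voxels, np.ones(len(voxels))]           # (N, 4)
    uvw = homo @ P.T                                     # project to (N, 3)
    uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)
    h, w = img_shape
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    keep = np.zeros(len(voxels), dtype=bool)
    keep[ok] = silhouette[uv[ok, 1], uv[ok, 0]]
    return voxels[keep]

grid = np.stack(np.meshgrid(*[np.linspace(-1, 1, 64)] * 3), -1).reshape(-1, 3)
K = np.array([[500, 0, 320], [0, 500, 240], [0, 0, 1]], dtype=float)
P = K @ np.hstack([np.eye(3), [[0.0], [0.0], [4.0]]])    # toy camera pose
silhouette = np.ones((480, 640), dtype=bool)             # stand-in object mask
grid = carve(grid, silhouette, P, (480, 640))            # repeat for each view
```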

OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

  • Authors: Yunpeng Zhang, Zheng Zhu, Dalong Du
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05316
  • Pdf link: https://arxiv.org/pdf/2304.05316
  • Abstract
    Vision-based perception for autonomous driving has undergone a transformation from bird's-eye-view (BEV) representations to 3D semantic occupancy. Compared with BEV planes, 3D semantic occupancy additionally provides structural information along the vertical direction. This paper presents OccFormer, a dual-path transformer network to effectively process the 3D volume for semantic occupancy prediction. OccFormer achieves a long-range, dynamic, and efficient encoding of the camera-generated 3D voxel features. It is obtained by decomposing the heavy 3D processing into local and global transformer pathways along the horizontal plane. For the occupancy decoder, we adapt the vanilla Mask2Former for 3D semantic occupancy by proposing preserve-pooling and class-guided sampling, which notably mitigate the sparsity and class imbalance. Experimental results demonstrate that OccFormer significantly outperforms existing methods for semantic scene completion on the SemanticKITTI dataset and for LiDAR semantic segmentation on the nuScenes dataset. Code is available at \url{https://github.com/zhangyp15/OccFormer}.

Keyword: lidar

Simultaneous localization and mapping by using Low-Cost Ultrasonic Sensor for Underwater crawler

  • Authors: Trish Velan Dcruz, Cicero Estibeiro, Anil Shankar, Mangal Das
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.05155
  • Pdf link: https://arxiv.org/pdf/2304.05155
  • Abstract
    Autonomous robots can help people explore parts of the ocean that would be hard or impossible to reach otherwise. The increased availability of low-cost components has made it possible to innovate, design, and implement new ideas for underwater robotics. Cost-effective, open solutions available today can replace expensive robot systems. This research presents the prototype of an autonomous robot system that operates in brackish waterways in settings such as fish hatcheries. The system uses low-cost ultrasonic sensors with a SLAM algorithm to map and move through the environment. This configuration was chosen to keep costs down compared to previous studies that used lidar sensors. A comparison between ultrasonic and lidar sensors is presented, highlighting their respective pros and cons.

OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

  • Authors: Yunpeng Zhang, Zheng Zhu, Dalong Du
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05316
  • Pdf link: https://arxiv.org/pdf/2304.05316
  • Abstract
    Vision-based perception for autonomous driving has undergone a transformation from bird's-eye-view (BEV) representations to 3D semantic occupancy. Compared with BEV planes, 3D semantic occupancy additionally provides structural information along the vertical direction. This paper presents OccFormer, a dual-path transformer network to effectively process the 3D volume for semantic occupancy prediction. OccFormer achieves a long-range, dynamic, and efficient encoding of the camera-generated 3D voxel features. It is obtained by decomposing the heavy 3D processing into local and global transformer pathways along the horizontal plane. For the occupancy decoder, we adapt the vanilla Mask2Former for 3D semantic occupancy by proposing preserve-pooling and class-guided sampling, which notably mitigate the sparsity and class imbalance. Experimental results demonstrate that OccFormer significantly outperforms existing methods for semantic scene completion on the SemanticKITTI dataset and for LiDAR semantic segmentation on the nuScenes dataset. Code is available at \url{https://github.com/zhangyp15/OccFormer}.

Keyword: diffusion

$\textit{e-Uber}$: A Crowdsourcing Platform for Electric Vehicle-based Ride- and Energy-sharing

  • Authors: Ashutosh Timilsina, Simone Silvestri
  • Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.04753
  • Pdf link: https://arxiv.org/pdf/2304.04753
  • Abstract
    The sharing-economy-based business model has recently seen success in the transportation and accommodation sectors with companies like Uber and Airbnb. There is growing interest in applying this model to energy systems, with modalities like peer-to-peer (P2P) Energy Trading, Electric Vehicle (EV)-based Vehicle-to-Grid (V2G), Vehicle-to-Home (V2H), Vehicle-to-Vehicle (V2V), and Battery Swapping Technology (BST). In this work, we exploit the increasing diffusion of EVs to realize a crowdsourcing platform called e-Uber that jointly enables ride-sharing and energy-sharing through V2G and BST. e-Uber exploits spatial crowdsourcing, reinforcement learning, and reverse auction theory. Specifically, the platform uses reinforcement learning to understand the drivers' preferences towards different ride-sharing and energy-sharing tasks. Based on these preferences, a personalized list is recommended to each driver through the CMAB-based Algorithm for task Recommendation System (CARS). Drivers bid on their preferred tasks in their list in a reverse auction fashion. Then e-Uber solves the task assignment optimization problem that minimizes cost and guarantees the V2G energy requirement. We prove that this problem is NP-hard and introduce a bipartite-matching-inspired heuristic, Bipartite Matching-based Winner selection (BMW), that has polynomial time complexity. Results from experiments using real data from NYC taxi trips and energy consumption show that e-Uber performs close to the optimum and finds better solutions than a state-of-the-art approach.
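
The matching core of winner selection can be illustrated with a plain assignment-problem sketch; BMW itself is a bespoke heuristic with V2G energy constraints, which this minimal version omits.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

INF = 1e9                       # marks driver/task pairs with no bid
bids = np.array([               # bids[i, j] = driver i's bid for task j
    [4.0, INF, 7.0],
    [3.0, 5.0, INF],
    [INF, 6.0, 2.0],
])
rows, cols = linear_sum_assignment(bids)    # minimum-total-cost assignment
for i, j in zip(rows, cols):
    if bids[i, j] < INF:
        print(f"driver {i} wins task {j} at bid {bids[i, j]}")
```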

DDRF: Denoising Diffusion Model for Remote Sensing Image Fusion

  • Authors: ZiHan Cao, ShiQi Cao, Xiao Wu, JunMing Hou, Ran Ran, Liang-Jian Deng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.04774
  • Pdf link: https://arxiv.org/pdf/2304.04774
  • Abstract
    The denoising diffusion model, as a generative model, has recently received a lot of attention in the field of image generation, thanks to its powerful generation capability. However, diffusion models have not yet received sufficient research in the field of image fusion. In this article, we introduce the diffusion model to the image fusion field, treating the image fusion task as image-to-image translation and designing two different conditional injection modulation modules (i.e., style transfer modulation and wavelet modulation) to inject coarse-grained style information and fine-grained high-frequency and low-frequency information into the diffusion UNet, thereby generating fused images. In addition, we discuss residual learning and the selection of training objectives for the diffusion model in the image fusion task. Extensive experimental results based on quantitative and qualitative assessments against benchmarks demonstrate state-of-the-art results and good generalization performance in image fusion tasks. Finally, we hope that our method can inspire other works and provide insight into this field to better apply the diffusion model to image fusion tasks. Code will be released for better reproducibility.

Binary Latent Diffusion

  • Authors: Ze Wang, Jiang Wang, Zicheng Liu, Qiang Qiu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.04820
  • Pdf link: https://arxiv.org/pdf/2304.04820
  • Abstract
    In this paper, we show that a binary latent space can be explored for compact yet expressive image representations. We model the bi-directional mappings between an image and the corresponding latent binary representation by training an auto-encoder with a Bernoulli encoding distribution. On the one hand, the binary latent space provides a compact discrete image representation whose distribution can be modeled more efficiently than pixels or continuous latent representations. On the other hand, we now represent each image patch as a binary vector instead of an index into a learned codebook as in discrete image representations with vector quantization. In this way, we obtain binary latent representations that allow for better image quality and high-resolution image representations without any multi-stage hierarchy in the latent space. In this binary latent space, images can be generated effectively using a binary latent diffusion model tailored specifically for modeling the prior over the binary image representations. We present both conditional and unconditional image generation experiments with multiple datasets, and show that the proposed method performs comparably to state-of-the-art methods while dramatically improving the sampling efficiency to as few as 16 steps without using any test-time acceleration. The proposed framework can also be seamlessly scaled to $1024 \times 1024$ high-resolution image generation without resorting to latent hierarchy or multi-stage refinements.
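
Sampling a binary latent from a Bernoulli encoder while keeping the auto-encoder trainable is commonly done with a straight-through estimator, sketched below; the abstract does not state the paper's exact gradient trick, so the estimator choice here is an assumption.

```python
import torch

def binary_latent(logits):
    """Sample a hard 0/1 code from a Bernoulli encoding distribution and
    pass gradients through the probabilities (straight-through)."""
    p = torch.sigmoid(logits)
    b = torch.bernoulli(p)          # non-differentiable binary sample
    return b + p - p.detach()       # forward value: b; backward: grad of p

logits = torch.randn(2, 32 * 32, requires_grad=True)   # toy binary patch codes
z = binary_latent(logits)           # entries are exactly 0.0 or 1.0
z.sum().backward()                  # gradients reach the encoder logits
print(z.detach().unique())          # tensor([0., 1.])
```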

iPINNs: Incremental learning for Physics-informed neural networks

  • Authors: Aleksandr Dekhovich, Marcel H.F. Sluiter, David M.J. Tax, Miguel A. Bessa
  • Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.04854
  • Pdf link: https://arxiv.org/pdf/2304.04854
  • Abstract
    Physics-informed neural networks (PINNs) have recently become a powerful tool for solving partial differential equations (PDEs). However, finding a set of neural network parameters that lead to fulfilling a PDE can be challenging and non-unique due to the complexity of the loss landscape that needs to be traversed. Although a variety of multi-task learning and transfer learning approaches have been proposed to overcome these issues, there is no incremental training procedure for PINNs that can effectively mitigate such training challenges. We propose incremental PINNs (iPINNs) that can learn multiple tasks (equations) sequentially without additional parameters for new tasks and improve performance for every equation in the sequence. Our approach learns multiple PDEs starting from the simplest one by creating its own subnetwork for each PDE and allowing each subnetwork to overlap with previously learned subnetworks. We demonstrate that previous subnetworks are a good initialization for a new equation if PDEs share similarities. We also show that iPINNs achieve lower prediction error than regular PINNs for two different scenarios: (1) learning a family of equations (e.g., 1-D convection PDE); and (2) learning PDEs resulting from a combination of processes (e.g., 1-D reaction-diffusion PDE). The ability to learn all problems with a single network together with learning more complex PDEs with better generalization than regular PINNs will open new avenues in this field.

Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into 3D, alleviate Janus problem and Beyond

  • Authors: Mohammadreza Armandpour, Huangjie Zheng, Ali Sadeghian, Amir Sadeghian, Mingyuan Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.04968
  • Pdf link: https://arxiv.org/pdf/2304.04968
  • Abstract
    Although text-to-image diffusion models have made significant strides in generating images from text, they are sometimes more inclined to generate images like the data on which the model was trained rather than the provided text. This limitation has hindered their usage in both 2D and 3D applications. To address this problem, we explored the use of negative prompts but found that the current implementation fails to produce desired results, particularly when there is an overlap between the main and negative prompts. To overcome this issue, we propose Perp-Neg, a new algorithm that leverages the geometrical properties of the score space to address the shortcomings of the current negative prompts algorithm. Perp-Neg does not require any training or fine-tuning of the model. Moreover, we experimentally demonstrate that Perp-Neg provides greater flexibility in generating images by enabling users to edit out unwanted concepts from the initially generated images in 2D cases. Furthermore, to extend the application of Perp-Neg to 3D, we conducted a thorough exploration of how Perp-Neg can be used in 2D to condition the diffusion model to generate desired views, rather than being biased toward the canonical views. Finally, we applied our 2D intuition to integrate Perp-Neg with the state-of-the-art text-to-3D (DreamFusion) method, effectively addressing its Janus (multi-head) problem.
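
The geometric core of Perp-Neg, keeping only the component of the negative-prompt score that is perpendicular to the main-prompt score, can be sketched in a few lines; the flattened score vectors and the weighting are assumptions.

```python
import numpy as np

def perp_neg(eps_pos, eps_neg, w_neg=1.0):
    """Subtract only the component of the negative-prompt score that is
    perpendicular to the main-prompt score, so semantics shared by the
    two prompts are not penalized."""
    proj = (eps_neg @ eps_pos) / (eps_pos @ eps_pos) * eps_pos
    eps_neg_perp = eps_neg - proj
    return eps_pos - w_neg * eps_neg_perp

# Flattened denoiser outputs for main and negative prompts (placeholders).
eps_pos = np.random.randn(4 * 64 * 64)
eps_neg = np.random.randn(4 * 64 * 64)
guided = perp_neg(eps_pos, eps_neg)
removed = eps_pos - guided                       # = w_neg * eps_neg_perp
print(np.isclose(removed @ eps_pos, 0.0))        # removed part is orthogonal
```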

Diffusion Recommender Model

  • Authors: Wenjie Wang, Yiyan Xu, Fuli Feng, Xinyu Lin, Xiangnan He, Tat-Seng Chua
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.04971
  • Pdf link: https://arxiv.org/pdf/2304.04971
  • Abstract
    Generative models such as Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs) are widely utilized to model the generative process of user interactions. However, these generative models suffer from intrinsic limitations such as the instability of GANs and the restricted representation ability of VAEs. Such limitations hinder the accurate modeling of the complex user interaction generation procedure, such as noisy interactions caused by various interference factors. In light of the impressive advantages of Diffusion Models (DMs) over traditional generative models in image synthesis, we propose a novel Diffusion Recommender Model (named DiffRec) to learn the generative process in a denoising manner. To retain personalized information in user interactions, DiffRec reduces the added noises and avoids corrupting users' interactions into pure noises like in image synthesis. In addition, we extend traditional DMs to tackle the unique challenges in practical recommender systems: high resource costs for large-scale item prediction and temporal shifts of user preference. To this end, we propose two extensions of DiffRec: L-DiffRec clusters items for dimension compression and conducts the diffusion processes in the latent space; and T-DiffRec reweights user interactions based on the interaction timestamps to encode temporal information. We conduct extensive experiments on three datasets under multiple settings (e.g. clean training, noisy training, and temporal training). The empirical results and in-depth analysis validate the superiority of DiffRec with two extensions over competitive baselines.

SPIRiT-Diffusion: Self-Consistency Driven Diffusion Model for Accelerated MRI

  • Authors: Zhuo-Xu Cui, Chentao Cao, Jing Cheng, Sen Jia, Hairong Zheng, Dong Liang, Yanjie Zhu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05060
  • Pdf link: https://arxiv.org/pdf/2304.05060
  • Abstract
    Diffusion models are a leading method for image generation and have been successfully applied in magnetic resonance imaging (MRI) reconstruction. Current diffusion-based reconstruction methods rely on coil sensitivity maps (CSM) to reconstruct multi-coil data. However, it is difficult to estimate CSMs accurately in practice, resulting in degradation of the reconstruction quality. To address this issue, we propose a self-consistency-driven diffusion model inspired by iterative self-consistent parallel imaging (SPIRiT), namely SPIRiT-Diffusion. Specifically, the iterative solver of the self-consistency term in SPIRiT is utilized to design a novel stochastic differential equation (SDE) for the diffusion process. Then $\textit{k}$-space data can be interpolated directly during the reverse diffusion process, instead of using CSM to separate and combine individual coil images. This method indicates that an optimization model can be used to design the SDE in diffusion models, driving the diffusion process to conform strongly with the physics involved in the optimization model, an approach we dub model-driven diffusion. The proposed SPIRiT-Diffusion method was evaluated on a 3D joint intracranial and carotid vessel wall imaging dataset. The results demonstrate that it outperforms CSM-based reconstruction methods and achieves high reconstruction quality at a high acceleration rate of 10.

Gradient flows of interacting Laguerre cells as discrete porous media flows

  • Authors: Andrea Natale (RAPSODI )
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.05069
  • Pdf link: https://arxiv.org/pdf/2304.05069
  • Abstract
    We study a class of discrete models in which a collection of particles evolves in time following the gradient flow of an energy depending on the cell areas of an associated Laguerre (i.e. a weighted Voronoi) tessellation. We consider the limit of such systems as the number of cells tends to infinity and, using a modulated energy argument, we prove convergence towards smooth solutions of nonlinear diffusion PDEs of porous medium type.
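
For reference, the prototypical porous-medium-type nonlinear diffusion PDE behind such convergence results reads as follows; the exponent m > 1 is the standard parameter, and the abstract does not spell out the exact PDE class covered by the result.

```latex
% Porous medium equation: degenerate nonlinear diffusion, m > 1.
\partial_t \rho \,=\, \Delta\!\left(\rho^{m}\right), \qquad \rho(t,x) \ge 0.
```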

Modeling and design of heterogeneous hierarchical bioinspired spider web structures using generative deep learning and additive manufacturing

  • Authors: Wei Lu, Nic A. Lee, Markus J. Buehler
  • Subjects: Machine Learning (cs.LG); Soft Condensed Matter (cond-mat.soft); Adaptation and Self-Organizing Systems (nlin.AO)
  • Arxiv link: https://arxiv.org/abs/2304.05137
  • Pdf link: https://arxiv.org/pdf/2304.05137
  • Abstract
    Spider webs are incredible biological structures, comprising thin but strong silk filaments arranged into complex hierarchical architectures with striking mechanical properties (e.g., lightweight but high strength, achieving diverse mechanical responses). While simple 2D orb webs can easily be mimicked, the modeling and synthesis of 3D-based web structures remain challenging, partly due to the rich set of design features. Here we provide a detailed analysis of the heterogeneous graph structures of spider webs, and use deep learning as a way to model and then synthesize artificial, bio-inspired 3D web structures. The generative AI models are conditioned on key geometric parameters (including average edge length, number of nodes, average node degree, and others). To identify graph construction principles, we use inductive representation sampling of large experimentally determined spider web graphs, to yield a dataset that is used to train three conditional generative models: 1) an analog diffusion model inspired by nonequilibrium thermodynamics, with sparse neighbor representation; 2) a discrete diffusion model with full neighbor representation; and 3) an autoregressive transformer architecture with full neighbor representation. All three models are scalable, produce complex, de novo bio-inspired spider web mimics, and successfully construct graphs that meet the design objectives. We further propose an algorithm that assembles web samples produced by the generative models into larger-scale structures based on a series of geometric design targets, including helical and parametric shapes, mimicking and extending natural design principles towards integration with diverging engineering objectives. Several webs are manufactured using 3D printing and tested to assess mechanical properties.

Multi-scale Fusion Fault Diagnosis Method Based on Two-Dimensionalization Sequence in Complex Scenarios

  • Authors: Weiyang Jin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05198
  • Pdf link: https://arxiv.org/pdf/2304.05198
  • Abstract
    Rolling bearings are critical components in rotating machinery, and their faults can cause severe damage. Early detection of abnormalities is crucial to prevent catastrophic accidents. Traditional and intelligent methods have been used to analyze time series data, but in real-life scenarios sensor data are often noisy and cannot be accurately characterized in the time domain, leading to mode collapse in trained models. Two-dimensionalization methods such as the Gram angle field (GAF) method or interval sampling have been proposed, but they lack mathematical derivation and interpretability. This paper proposes an improved GAF combined with grayscale images for convolution scenarios. The main contributions include illustrating the feasibility of the approach in complex scenarios, widening the dataset, and introducing an improved convolutional neural network method with a multi-scale feature fusion diffusion model and deep learning compression techniques for deployment in industrial scenarios.
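
The standard Gram angle field construction that the paper starts from can be written compactly; the summation (GASF) variant is shown here, which is an assumption since the abstract does not say which variant is improved.

```python
import numpy as np

def gramian_angular_field(x):
    """Gram angle field (summation form): rescale the signal to [-1, 1],
    map it to angles, and build the pairwise cos(phi_i + phi_j) image."""
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(x)
    return np.cos(phi[:, None] + phi[None, :])        # (N, N) image

signal = np.sin(np.linspace(0, 8 * np.pi, 128))       # stand-in vibration data
img = gramian_angular_field(signal)                   # fed to a CNN as an image
```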

Diffusion Models for Constrained Domains

  • Authors: Nic Fishman, Leo Klarner, Valentin De Bortoli, Emile Mathieu, Michael Hutchinson
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.05364
  • Pdf link: https://arxiv.org/pdf/2304.05364
  • Abstract
    Denoising diffusion models are a recent class of generative models which achieve state-of-the-art results in many domains such as unconditional image generation and text-to-speech tasks. They consist of a noising process destroying the data and a backward stage defined as the time-reversal of the noising diffusion. Building on their success, diffusion models have recently been extended to the Riemannian manifold setting. Yet, these Riemannian diffusion models require geodesics to be defined for all times. While this setting encompasses many important applications, it does not include manifolds defined via a set of inequality constraints, which are ubiquitous in many scientific domains such as robotics and protein design. In this work, we introduce two methods to bridge this gap. First, we design a noising process based on the logarithmic barrier metric induced by the inequality constraints. Second, we introduce a noising process based on the reflected Brownian motion. As existing diffusion model techniques cannot be applied in this setting, we derive new tools to define such models in our framework. We empirically demonstrate the applicability of our methods to a number of synthetic and real-world tasks, including the constrained conformational modelling of protein backbones and robotic arms.
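
The reflected-Brownian-motion noising process can be sketched on the simplest constrained domain, an interval; the paper's setting is general inequality-constrained manifolds, and the step sizes here are assumptions.

```python
import numpy as np

def reflect(x, lo, hi):
    """Fold a point back into [lo, hi] by mirror reflection at the walls."""
    width = hi - lo
    x = np.mod(x - lo, 2 * width)
    return lo + np.where(x < width, x, 2 * width - x)

rng = np.random.default_rng(0)
x = np.full(10_000, 0.9)                 # data concentrated near a boundary
dt = 1e-3
for _ in range(1000):
    # Brownian increments, folded back so the constraint always holds.
    x = reflect(x + np.sqrt(dt) * rng.normal(size=x.size), 0.0, 1.0)
print(x.min(), x.max())                  # all samples remain inside [0, 1]
```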

HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models

  • Authors: Eslam Mohamed Bakr, Pengzhan Sun, Xiaoqian Shen, Faizan Farooq Khan, Li Erran Li, Mohamed Elhoseiny
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.05390
  • Pdf link: https://arxiv.org/pdf/2304.05390
  • Abstract
    In recent years, Text-to-Image (T2I) models have been extensively studied, especially with the emergence of diffusion models that achieve state-of-the-art results on T2I synthesis tasks. However, existing benchmarks heavily rely on subjective human evaluation, limiting their ability to holistically assess the model's capabilities. Furthermore, there is a significant gap between efforts in developing new T2I architectures and those in evaluation. To address this, we introduce HRS-Bench, a concrete evaluation benchmark for T2I models that is Holistic, Reliable, and Scalable. Unlike existing benchmarks that focus on limited aspects, HRS-Bench measures 13 skills that can be categorized into five major categories: accuracy, robustness, generalization, fairness, and bias. In addition, HRS-Bench covers 50 scenarios, including fashion, animals, transportation, food, and clothes. We evaluate nine recent large-scale T2I models using metrics that cover a wide range of skills. A human evaluation, aligned with 95% of our evaluations on average, was conducted to probe the effectiveness of HRS-Bench. Our experiments demonstrate that existing models often struggle to generate images with the desired count of objects, visual text, or grounded emotions. We hope that our benchmark helps ease future text-to-image generation research. The code and data are available at https://eslambakr.github.io/hrsbench.github.io

Keyword: dynamic

Comparison of Radio Spectrum Occupancy Detection Methods Using Federated Learning With and Without a Central Node (original Polish title: Porównanie metod detekcji zajętości widma radiowego z wykorzystaniem uczenia federacyjnego z oraz bez węzła centralnego)

  • Authors: Łukasz Kułacz
  • Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.04754
  • Pdf link: https://arxiv.org/pdf/2304.04754
  • Abstract
    Dynamic spectrum access systems typically require information about the spectrum occupancy and thus the presence of other users in order to make a spectrum allocation decision for a new device. Simple methods of spectrum occupancy detection are often far from reliable, hence spectrum occupancy detection algorithms supported by machine learning or artificial intelligence are often and successfully used. To protect the privacy of user data and to reduce the amount of control data, an interesting approach is to use federated machine learning. This paper compares two approaches to system design using federated machine learning: with and without a central node.
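
The two aggregation schemes being compared can be sketched side by side; local training is replaced below by noisy local parameter estimates (a placeholder for the real occupancy-detector weights), and the ring topology for the decentralized case is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([0.3, -1.2, 0.7])
local = [true_w + rng.normal(0, 0.5, 3) for _ in range(8)]   # 8 sensor nodes

# With a central node: one FedAvg round averages all local models.
w_central = np.mean(local, axis=0)

# Without a central node: repeated gossip rounds on a ring topology,
# each node averaging with its two neighbors, converge to the same mean.
w = np.array(local)
for _ in range(50):
    w = (np.roll(w, 1, axis=0) + w + np.roll(w, -1, axis=0)) / 3
print(np.allclose(w, w_central, atol=1e-3))  # gossip reaches the FedAvg mean
```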

Distributed Estimation with Decentralized Control for Quadruple-Tank Process

  • Authors: Moh Kamalul Wafi, Bambang L. Widjiantoro
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.04763
  • Pdf link: https://arxiv.org/pdf/2304.04763
  • Abstract
    This paper proposes a design for the quadruple-tank process, a distinctive multivariable MIMO system, under minimum-phase and non-minimum-phase scenarios with respect to the valve ratio. A distributed estimation algorithm with decentralized control is then implemented on this model. The inputs are set with divergent pump gains while the four tanks are interconnected, so that the stability properties differ, making the use of decentralized control reasonable. The number of outputs is designed to match the number of inputs, as is the number of distributed Luenberger observers for the continuous linearized dynamical system. Each distributed observer holds local estimates of only certain outputs, which alone would be insufficient, so neighbouring links under some network topologies are required in the dynamical system. This concept works under both stability characteristics of the tank process for estimating the states, and this success motivates further research on larger-scale complex systems.
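
    For reference, a single (non-distributed) Luenberger observer step is sketched below; this is a generic textbook illustration under assumed matrices, while the paper's distributed variant additionally exchanges estimates with neighbors over a network topology.

    ```python
    import numpy as np

    def luenberger_step(x_hat, u, y, A, B, C, L, dt):
        """One forward-Euler step of the continuous-time Luenberger observer
        x_hat' = A x_hat + B u + L (y - C x_hat)."""
        x_hat_dot = A @ x_hat + B @ u + L @ (y - C @ x_hat)
        return x_hat + dt * x_hat_dot
    ```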

Non-Linear Estimation using the Weighted Average Consensus-Based Unscented Filtering for Various Vehicles Dynamics towards Autonomous Sensorless Design

  • Authors: Bambang L. Widjiantoro, Moh Kamalul Wafi, Katherin Indriawati
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.04766
  • Pdf link: https://arxiv.org/pdf/2304.04766
  • Abstract
    Interest in autonomous vehicles has been growing, driven by the need to cope with increasingly dynamic non-linear systems under constraints and disturbances. These vehicles connect not only to their own instruments but also to neighboring components, creating diverse interconnected communications that should be handled locally to ease computation and speed up decisions. To deal with these interconnected networks, a distributed estimation of the unmeasured states, pursuing a sensorless design, is approached, initiated by the construction of a modified pseudo-measurement which, owing to approximation, leads to a weighted-average consensus calculation within unscented filtering with bounded estimation errors. Moreover, the tested vehicles are associated with certain robust control scenarios subject to noise and disturbance, with stability analyses to justify the use of the proposed estimation algorithm. Numerical instances are presented along with the performance of the control and estimation methods. The results affirm the effectiveness of the method, with limited error deviation compared to other centralized and distributed filters. Beyond this, further research will address directed sensorless design and fault-tolerant learning control subject to faults to negate failures.

RAPID: Enabling Fast Online Policy Learning in Dynamic Public Cloud Environments

  • Authors: Drew Penney, Bin Li, Lizhong Chen, Jaroslaw J. Sydir, Anna Drewek-Ossowicka, Ramesh Illikkal, Charlie Tai, Ravi Iyer, Andrew Herdrich
  • Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.04797
  • Pdf link: https://arxiv.org/pdf/2304.04797
  • Abstract
    Resource sharing between multiple workloads has become a prominent practice among cloud service providers, motivated by demand for improved resource utilization and reduced cost of ownership. Effective resource sharing, however, remains an open challenge due to the adverse effects that resource contention can have on high-priority, user-facing workloads with strict Quality of Service (QoS) requirements. Although recent approaches have demonstrated promising results, those works remain largely impractical in public cloud environments since workloads are not known in advance and may only run for a brief period, thus prohibiting offline learning and significantly hindering online learning. In this paper, we propose RAPID, a novel framework for fast, fully-online resource allocation policy learning in highly dynamic operating environments. RAPID leverages lightweight QoS predictions, enabled by domain-knowledge-inspired techniques for sample efficiency and bias reduction, to decouple control from conventional feedback sources and guide policy learning at a rate orders of magnitude faster than prior work. Evaluation on a real-world server platform with representative cloud workloads confirms that RAPID can learn stable resource allocation policies in minutes, as compared with hours in prior state-of-the-art, while improving QoS by 9.0x and increasing best-effort workload performance by 19-43%.

Robust Body Exposure (RoBE): A Graph-based Dynamics Modeling Approach to Manipulating Blankets over People

  • Authors: Kavya Puthuveetil, Sasha Wald, Atharva Pusalkar, Pratyusha Karnati, Zackory Erickson
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.04822
  • Pdf link: https://arxiv.org/pdf/2304.04822
  • Abstract
    Robotic caregivers could potentially improve the quality of life of many who require physical assistance. However, in order to assist individuals who are lying in bed, robots must be capable of dealing with a significant obstacle: the blanket or sheet that will almost always cover the person's body. We propose a method for targeted bedding manipulation over people lying supine in bed where we first learn a model of the cloth's dynamics. Then, we optimize over this model to uncover a given target limb using information about human body shape and pose that only needs to be provided at run-time. We show how this approach enables greater robustness to variation relative to geometric and reinforcement learning baselines via a number of generalization evaluations in simulation and in the real world. We further evaluate our approach in a human study with 12 participants where we demonstrate that a mobile manipulator can adapt to real variation in human body shape, size, pose, and blanket configuration to uncover target body parts without exposing the rest of the body. Source code and supplementary materials are available online.

Exact Set-valued Estimation using Constrained Convex Generators for uncertain Linear Systems

  • Authors: Daniel Silvestre
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.04826
  • Pdf link: https://arxiv.org/pdf/2304.04826
  • Abstract
    Set-valued state estimation in the presence of uncertainties in the model has been addressed in the literature essentially following three main approaches: i) interval arithmetic of the uncertain dynamics with the estimates; ii) factorizing the uncertainty into matrices with unity rank; and iii) performing the convex hull of the vertices of the uncertainty space. Approaches i) and ii) introduce considerable conservatism because both disregard the relationship of the parameters with the entries of the dynamics matrix. On the other hand, approach iii) exhibits large growth in the number of variables required to represent the set, or is approximated and thereby loses its main advantage in comparison with i) and ii). In this paper, motivated by the application of autonomous vehicles in GPS-denied areas that resort to beacon signals for localization, we develop an exact (meaning no added conservatism) and optimal (smallest growth in the number of variables) closed-form definition for the convex hull of Constrained Convex Generators (CCGs). This results in a more efficient method to represent the minimum-volume convex set corresponding to the state estimate. Given that reduction methods are still lacking in the literature for CCGs, we employ an approximation using ray-shooting that is comparable in accuracy with methods for Constrained Zonotopes such as the ones implemented in CORA. Simulations illustrate the greater accuracy of CCGs with the proposed convex hull operation in comparison to Constrained Zonotopes.

A few-shot graph Laplacian-based approach for improving the accuracy of low-fidelity data

  • Authors: Orazio Pinti, Assad A. Oberai
  • Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.04862
  • Pdf link: https://arxiv.org/pdf/2304.04862
  • Abstract
    Low-fidelity data is typically inexpensive to generate but inaccurate. On the other hand, high-fidelity data is accurate but expensive to obtain. Multi-fidelity methods use a small set of high-fidelity data to enhance the accuracy of a large set of low-fidelity data. In the approach described in this paper, this is accomplished by constructing a graph Laplacian using the low-fidelity data and computing its low-lying spectrum. This spectrum is then used to cluster the data and identify points that are closest to the centroids of the clusters. High-fidelity data is then acquired for these key points. Thereafter, a transformation that maps every low-fidelity data point to its bi-fidelity counterpart is determined by minimizing the discrepancy between the bi- and high-fidelity data at the key points, while preserving the underlying structure of the low-fidelity data distribution. The latter objective is achieved by relying, once again, on the spectral properties of the graph Laplacian. This method is applied to a problem in solid mechanics and another in aerodynamics. In both cases, this method uses a small fraction of high-fidelity data to significantly improve the accuracy of a large set of low-fidelity data.
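
    A minimal sketch of the key-point selection stage, assuming a Gaussian affinity graph, the unnormalized Laplacian, and k-means on the low-lying spectral embedding; the bandwidth, cluster count, and helper name are illustrative choices, and the subsequent bi-fidelity transformation is omitted.

    ```python
    import numpy as np
    from scipy.spatial.distance import cdist
    from scipy.linalg import eigh
    from sklearn.cluster import KMeans

    def select_key_points(X_low, n_clusters=5, sigma=1.0):
        """Cluster low-fidelity data via the graph Laplacian's low-lying
        spectrum; return indices of points closest to the cluster centroids
        (the candidates for high-fidelity acquisition)."""
        D = cdist(X_low, X_low)
        W = np.exp(-(D / sigma) ** 2)            # Gaussian affinity graph
        L = np.diag(W.sum(axis=1)) - W           # unnormalized graph Laplacian
        _, evecs = eigh(L)                       # eigenpairs in ascending order
        embedding = evecs[:, 1:n_clusters + 1]   # skip the trivial constant mode
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(embedding)
        return cdist(km.cluster_centers_, embedding).argmin(axis=1)
    ```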

Neural Network Predicts Ion Concentration Profiles under Nanoconfinement

  • Authors: Zhonglin Cao, Yuyang Wang, Cooper Lorsung, Amir Barati Farimani
  • Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)
  • Arxiv link: https://arxiv.org/abs/2304.04896
  • Pdf link: https://arxiv.org/pdf/2304.04896
  • Abstract
    Modeling the ion concentration profile in a nanochannel plays an important role in understanding the electrical double layer and electroosmotic flow. Due to the non-negligible surface interaction and the effect of discrete solvent molecules, molecular dynamics (MD) simulation is often used as an essential tool to study the behavior of ions under nanoconfinement. Despite the accuracy of MD simulation in modeling nanoconfinement systems, it is computationally expensive. In this work, we propose a neural network to predict ion concentration profiles in nanochannels with different configurations, including channel widths, ion molarity, and ion types. By modeling the ion concentration profile as a probability distribution, our neural network can serve as a much faster surrogate model for MD simulation with high accuracy. We further demonstrate the superior prediction accuracy of the neural network over XGBoost. Lastly, we demonstrate that the neural network is flexible in predicting ion concentration profiles with different bin sizes. Overall, our deep learning model is a fast, flexible, and accurate surrogate model for predicting ion concentration profiles under nanoconfinement.

AffectMachine-Classical: A novel system for generating affective classical music

  • Authors: Kat R. Agres, Adyasha Dash, Phoebe Chua
  • Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2304.04915
  • Pdf link: https://arxiv.org/pdf/2304.04915
  • Abstract
    This work introduces a new music generation system, called AffectMachine-Classical, that is capable of generating affective Classical music in real-time. AffectMachine was designed to be incorporated into biofeedback systems (such as brain-computer interfaces) to help users become aware of, and ultimately mediate, their own dynamic affective states. That is, this system was developed for music-based MedTech to support real-time emotion self-regulation in users. We provide an overview of the rule-based, probabilistic system architecture, describing the main aspects of the system and how they are novel. We then present the results of a listener study that was conducted to validate the ability of the system to reliably convey target emotions to listeners. The findings indicate that AffectMachine-Classical is very effective in communicating various levels of Arousal ($R^2 = .96$) to listeners, and is also quite convincing in terms of Valence ($R^2 = .90$). Future work will embed AffectMachine-Classical into biofeedback systems, to leverage the efficacy of the affective music for emotional well-being in listeners.

A Data-Driven State Aggregation Approach for Dynamic Discrete Choice Models

  • Authors: Sinong Geng, Houssam Nassif, Carlos A. Manzanares
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.04916
  • Pdf link: https://arxiv.org/pdf/2304.04916
  • Abstract
    We study dynamic discrete choice models, where a commonly studied problem involves estimating parameters of agent reward functions (also known as "structural" parameters), using agent behavioral data. Maximum likelihood estimation for such models requires dynamic programming, which is limited by the curse of dimensionality. In this work, we present a novel algorithm that provides a data-driven method for selecting and aggregating states, which lowers the computational and sample complexity of estimation. Our method works in two stages. In the first stage, we use a flexible inverse reinforcement learning approach to estimate agent Q-functions. We use these estimated Q-functions, along with a clustering algorithm, to select a subset of states that are the most pivotal for driving changes in Q-functions. In the second stage, with these selected "aggregated" states, we conduct maximum likelihood estimation using a commonly used nested fixed-point algorithm. The proposed two-stage approach mitigates the curse of dimensionality by reducing the problem dimension. Theoretically, we derive finite-sample bounds on the associated estimation error, which also characterize the trade-off of computational complexity, estimation error, and sample complexity. We demonstrate the empirical performance of the algorithm in two classic dynamic discrete choice estimation applications.
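
    The first stage lends itself to a short sketch: treat each state's vector of estimated Q-values as its feature and cluster states, so that the second-stage likelihood runs over far fewer aggregated states. This is a loose illustration with assumed shapes, not the paper's pivotal-state selection rule.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    # Q_hat: (n_states, n_actions) first-stage Q-function estimates (assumed given).
    rng = np.random.default_rng(0)
    Q_hat = rng.normal(size=(200, 4))

    km = KMeans(n_clusters=10, n_init=10).fit(Q_hat)
    state_to_aggregate = km.labels_   # map each of 200 states to 10 aggregated states
    # Second-stage maximum likelihood (e.g., a nested fixed-point routine) would
    # then operate on the 10 aggregated states instead of the original 200.
    ```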

Point-and-Shoot All-in-Focus Photo Synthesis from Smartphone Camera Pair

  • Authors: Xianrui Luo, Juewen Peng, Weiyue Zhao, Ke Xian, Hao Lu, Zhiguo Cao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.04917
  • Pdf link: https://arxiv.org/pdf/2304.04917
  • Abstract
    All-in-Focus (AIF) photography is expected to be a commercial selling point for modern smartphones. Standard AIF synthesis requires manual, time-consuming operations such as focal stack compositing, which is unfriendly to ordinary people. To achieve point-and-shoot AIF photography with a smartphone, we expect that an AIF photo can be generated from one shot of the scene, instead of from multiple photos captured by the same camera. Benefiting from the multi-camera module in modern smartphones, we introduce a new task of AIF synthesis from main (wide) and ultra-wide cameras. The goal is to recover sharp details from defocused regions in the main-camera photo with the help of the ultra-wide-camera one. The camera setting poses new challenges such as parallax-induced occlusions and inconsistent color between cameras. To overcome the challenges, we introduce a predict-and-refine network to mitigate occlusions and propose dynamic frequency-domain alignment for color correction. To enable effective training and evaluation, we also build an AIF dataset with 2686 unique scenes. Each scene includes two photos captured by the main camera, one photo captured by the ultra-wide camera, and a synthesized AIF photo. Results show that our solution, termed EasyAIF, can produce high-quality AIF photos and outperforms strong baselines quantitatively and qualitatively. For the first time, we demonstrate point-and-shoot AIF photo synthesis successfully from main and ultra-wide cameras.

Staged Contact Optimization: Combining Contact-Implicit and Multi-Phase Hybrid Trajectory Optimization

  • Authors: Michael R. Turski, Joseph Norby, Aaron M. Johnson
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.04923
  • Pdf link: https://arxiv.org/pdf/2304.04923
  • Abstract
    Trajectory optimization problems for legged robots are commonly formulated with fixed contact schedules. These multi-phase Hybrid Trajectory Optimization (HTO) methods result in locally optimal trajectories, but the result depends heavily upon the predefined contact mode sequence. Contact-Implicit Optimization (CIO) offers a potential solution to this issue by allowing the contact mode to be determined throughout the trajectory by the optimization solver. However, CIO suffers from long solve times and convergence issues. This work combines the benefits of these two methods into one algorithm: Staged Contact Optimization (SCO). SCO tightens constraints on contact in stages, eventually fixing them to allow robust and fast convergence to a feasible solution. Results on a planar biped and spatial quadruped demonstrate speed and optimality improvements over CIO and HTO. These properties make SCO well suited for offline trajectory generation or as an effective tool for exploring the dynamic capabilities of a robot.

Universal dual-port grid-forming control: bridging the gap between grid-forming and grid-following control

  • Authors: Irina Subotić, and Dominic Groß
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.04939
  • Pdf link: https://arxiv.org/pdf/2304.04939
  • Abstract
    We study dual-port grid-forming (GFM) control for power systems containing ac and dc transmission, converter-interfaced generation and energy storage, and legacy generation. To operate such a system and provide standard services, state-of-the-art control architectures i) require assigning grid-following (GFL) and GFM controls to different converters, and ii) result in highly complex system dynamics. In contrast, dual-port GFM control i) subsumes standard functions of GFM and GFL controls in a simple controller, ii) can be applied to a wide range of emerging technologies independently of the network configuration, and iii) significantly reduces system complexity. In this work, we provide i) an end-to-end modeling framework that allows complex topologies to be modeled through composition of reduced-order device models, ii) an in-depth discussion of universal dual-port GFM control for emerging power systems, and iii) end-to-end stability conditions that cover a wide range of network topologies, emerging technologies, and legacy technologies. Finally, we validate our findings in a detailed case study.

A Family of Iteration Functions for General Linear Systems

  • Authors: Bahman Kalantari
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.04940
  • Pdf link: https://arxiv.org/pdf/2304.04940
  • Abstract
    We develop novel theory and algorithms for computing approximate solutions to $Ax=b$, or to $A^TAx=A^Tb$, where $A$ is an $m \times n$ real matrix of arbitrary rank. First, we describe the {\it Triangle Algorithm} (TA), where given an ellipsoid $E_{A,\rho}=\{Ax: \Vert x \Vert \leq \rho\}$, in each iteration it either computes a successively improving approximation $b_k=Ax_k \in E_{A,\rho}$, or proves $b \not\in E_{A, \rho}$. We then extend TA for computing an approximate solution or a minimum-norm solution. Next, we develop a dynamic version of TA, the {\it Centering Triangle Algorithm} (CTA), generating residuals $r_k=b - Ax_k$ via iterations of the simple formula $F_1(r)=r-(r^THr/r^TH^2r)Hr$, where $H=A$ when $A$ is symmetric PSD and otherwise $H=AA^T$, which need not be computed explicitly. More generally, CTA extends to a family of iteration functions, $F_t(r)$, $t=1, \dots, m$, satisfying: On the one hand, given $t \leq m$ and $r_0=b-Ax_0$, where $x_0=A^Tw_0$ with $w_0 \in \mathbb{R}^m$ arbitrary, for all $k \geq 1$, $r_k=F_t(r_{k-1})=b-Ax_k$ and $A^Tr_k$ converges to zero. Algorithmically, if $H$ is invertible with condition number $\kappa$, in $k=O((\kappa/t) \ln \varepsilon^{-1})$ iterations $\Vert r_k \Vert \leq \varepsilon$. If $H$ is singular with $\kappa^+$ the ratio of its largest to smallest positive eigenvalues, in $k=O(\kappa^+/(t\varepsilon))$ iterations either $\Vert r_k \Vert \leq \varepsilon$ or $\Vert A^T r_k\Vert = O(\sqrt{\varepsilon})$. If $N$ is the number of nonzero entries of $A$, each iteration takes $O(Nt+t^3)$ operations. On the other hand, given $r_0=b-Ax_0$, suppose its minimal polynomial with respect to $H$ has degree $s$. Then $Ax=b$ is solvable if and only if $F_{s}(r_0)=0$. Moreover, exclusively $A^TAx=A^Tb$ is solvable if and only if $F_{s}(r_0) \not= 0$ but $A^T F_s(r_0)=0$. Additionally, $\{F_t(r_0)\}_{t=1}^s$ is computable in $O(Ns+s^3)$ operations.
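
    The $t=1$ iteration is simple enough to sketch directly; note that $H = AA^T$ is applied only through products with $A$ and $A^T$, as the abstract points out it need not be formed. The function name, stopping rule, and toy check are illustrative.

    ```python
    import numpy as np

    def cta_t1(A, b, tol=1e-10, max_iter=10_000):
        """Centering Triangle Algorithm, t = 1 case: iterate
        r <- r - (r'Hr / r'H^2r) Hr with H = A A' (never formed explicitly)."""
        r = np.array(b, dtype=float)        # x0 = 0, so r0 = b - A x0 = b
        for _ in range(max_iter):
            if np.linalg.norm(r) <= tol:
                break
            Hr = A @ (A.T @ r)              # H r via two matrix-vector products
            HHr = A @ (A.T @ Hr)            # H^2 r
            denom = r @ HHr
            if denom == 0.0:
                break
            r -= ((r @ Hr) / denom) * Hr
        return r

    # Toy consistent system: the residual norm should be near zero.
    rng = np.random.default_rng(1)
    A = rng.normal(size=(8, 5))
    b = A @ rng.normal(size=5)
    print(np.linalg.norm(cta_t1(A, b)))
    ```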

StageInteractor: Query-based Object Detector with Cross-stage Interaction

  • Authors: Yao Teng, Haisong Liu, Sheng Guo, Limin Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.04978
  • Pdf link: https://arxiv.org/pdf/2304.04978
  • Abstract
    Previous object detectors make predictions based on dense grid points or numerous preset anchors. Most of these detectors are trained with one-to-many label assignment strategies. On the contrary, recent query-based object detectors depend on a sparse set of learnable queries and a series of decoder layers. The one-to-one label assignment is independently applied on each layer for deep supervision during training. Despite the great success of query-based object detection, this one-to-one label assignment strategy requires the detectors to have strong fine-grained discrimination and modeling capacity. To solve the above problems, in this paper, we propose a new query-based object detector with cross-stage interaction, coined StageInteractor. During the forward propagation, we come up with an efficient way to improve this modeling ability by reusing dynamic operators with lightweight adapters. As for the label assignment, a cross-stage label assigner is applied subsequent to the one-to-one label assignment. With this assigner, the training target class labels are gathered across stages and then reallocated to proper predictions at each decoder layer. On the MS COCO benchmark, our model improves the baseline by 2.2 AP, and achieves 44.8 AP with ResNet-50 as backbone, 100 queries and 12 training epochs. With longer training time and 300 queries, StageInteractor achieves 51.1 AP and 52.2 AP with ResNeXt-101-DCN and Swin-S, respectively.

Detecting Anomalous Microflows in IoT Volumetric Attacks via Dynamic Monitoring of MUD Activity

  • Authors: Ayyoob Hamza, Hassan Habibi Gharakheili, Theophilus A. Benson, Gustavo Batista, Vijay Sivaraman
  • Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.04987
  • Pdf link: https://arxiv.org/pdf/2304.04987
  • Abstract
    IoT networks are increasingly becoming targets of sophisticated new cyber-attacks. Anomaly-based detection methods are promising for finding new attacks, but they face practical challenges: false-positive alarms, results that are hard to explain, and difficulty scaling cost-effectively. The recent IETF standard called Manufacturer Usage Description (MUD) seems promising for limiting the attack surface on IoT devices by formally specifying their intended network behavior. In this paper, we use SDN to enforce and monitor the expected behaviors of each IoT device, and train one-class classifier models to detect volumetric attacks. Our specific contributions are fourfold. (1) We develop a multi-level inferencing model to dynamically detect anomalous patterns in network activity of MUD-compliant traffic flows via SDN telemetry, followed by packet inspection of anomalous flows. This provides enhanced fine-grained visibility into distributed and direct attacks, allowing us to precisely isolate volumetric attacks with microflow (5-tuple) resolution. (2) We collect traffic traces (benign and a variety of volumetric attacks) from the network behavior of IoT devices in our lab, generate labeled datasets, and make them available to the public. (3) We prototype a full working system (modules are released as open-source), demonstrate its efficacy in detecting volumetric attacks on several consumer IoT devices with high accuracy while maintaining low false positives, and provide insights into the cost and performance of our system. (4) We demonstrate how our models scale in environments with a large number of connected IoTs (with datasets collected from a network of IP cameras on our university campus) by considering various training strategies (per device unit versus per device type), and by balancing the accuracy of prediction against the cost of models in terms of size and training time.

Bayes correlated equilibria and no-regret dynamics

  • Authors: Kaito Fujii
  • Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.05005
  • Pdf link: https://arxiv.org/pdf/2304.05005
  • Abstract
    This paper explores equilibrium concepts for Bayesian games, which are fundamental models of games with incomplete information. We aim at three desirable properties of equilibria. First, equilibria can be naturally realized by introducing a mediator into games. Second, an equilibrium can be computed efficiently in a distributed fashion. Third, any equilibrium in that class approximately maximizes social welfare, as measured by the price of anarchy, for a broad class of games. These three properties allow players to compute an equilibrium and realize it via a mediator, thereby settling into a stable state with approximately optimal social welfare. Our main result is the existence of an equilibrium concept that satisfies these three properties. Toward this goal, we characterize various (non-equivalent) extensions of correlated equilibria, collectively known as Bayes correlated equilibria. In particular, we focus on communication equilibria (also known as coordination mechanisms), which can be realized by a mediator who gathers each player's private information and then sends correlated recommendations to the players. We show that if each player minimizes a variant of regret called untruthful swap regret in repeated play of Bayesian games, the empirical distribution of these dynamics converges to a communication equilibrium. We present an efficient algorithm for minimizing untruthful swap regret with a sublinear upper bound, which we prove to be tight up to a multiplicative constant. As a result, by simulating the dynamics with our algorithm, we can efficiently compute an approximate communication equilibrium. Furthermore, we extend existing lower bounds on the price of anarchy based on the smoothness arguments from Bayes Nash equilibria to equilibria obtained by the proposed dynamics.
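
    For intuition on how no-regret dynamics work, the sketch below implements classical regret matching for external regret; the paper's algorithm minimizes a different quantity, untruthful swap regret in Bayesian games, which this toy deliberately does not capture.

    ```python
    import numpy as np

    def regret_matching(payoff_fn, n_actions, rounds=1000, seed=0):
        """Play each action with probability proportional to its positive
        cumulative regret; payoff_fn(t) returns the payoff of every action."""
        rng = np.random.default_rng(seed)
        cum_regret = np.zeros(n_actions)
        plays = []
        for t in range(rounds):
            pos = np.maximum(cum_regret, 0.0)
            probs = pos / pos.sum() if pos.sum() > 0 else np.full(n_actions, 1 / n_actions)
            a = rng.choice(n_actions, p=probs)
            u = payoff_fn(t)
            cum_regret += u - u[a]          # regret vs. each fixed action
            plays.append(a)
        return np.array(plays)

    # Toy payoffs for 3 actions that drift over time.
    plays = regret_matching(lambda t: np.sin(t / 50.0 + np.arange(3)), n_actions=3)
    ```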

Translating Assembly Accuracy Requirements to Cut-Off Frequencies for Component Mode Synthesis

  • Authors: Lars A.L. Janssen, Bart Besselink, Rob H.B. Fey, Nathan van de Wouw
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.05021
  • Pdf link: https://arxiv.org/pdf/2304.05021
  • Abstract
    One of the most popular methods for reducing the complexity of assemblies of finite element models in the field of structural dynamics is component mode synthesis. A main challenge of component mode synthesis is balancing model complexity and model accuracy, because it is difficult to predict how component reduction influences assembly model accuracy. This work introduces an approach that allows for the translation of assembly model accuracy requirements in the frequency domain to the automatic selection of the cut-off frequencies for the model-order reduction (MOR) of components. The approach is based on a mathematical approach for MOR for coupled linear systems in the field of systems and control. We show how this approach is also applicable to structural dynamics models. We demonstrate the use of this approach in the scope of component mode synthesis (CMS) methods with the aim to reduce the complexity of component models while guaranteeing accuracy requirements of the assembly model. The proposed approach is illustrated on a mechanical, three-component structural dynamics system for which reduced-order models are computed that are reduced further compared to reduction using standard methods. This results in lower simulation cost, while maintaining the required accuracy.

Soft Dynamic Time Warping for Multi-Pitch Estimation and Beyond

  • Authors: Michael Krause, Christof Weiß, Meinard Müller
  • Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2304.05032
  • Pdf link: https://arxiv.org/pdf/2304.05032
  • Abstract
    Many tasks in music information retrieval (MIR) involve weakly aligned data, where exact temporal correspondences are unknown. The connectionist temporal classification (CTC) loss is a standard technique to learn feature representations based on weakly aligned training data. However, CTC is limited to discrete-valued target sequences and can be difficult to extend to multi-label problems. In this article, we show how soft dynamic time warping (SoftDTW), a differentiable variant of classical DTW, can be used as an alternative to CTC. Using multi-pitch estimation as an example scenario, we show that SoftDTW yields results on par with a state-of-the-art multi-label extension of CTC. In addition to being more elegant in terms of its algorithmic formulation, SoftDTW naturally extends to real-valued target sequences.
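
    To make the recursion concrete, below is a minimal SoftDTW forward pass with a numerically stabilized soft-min; the squared-Euclidean local cost and the smoothing parameter gamma are standard choices, and the gradient computation used for training is omitted.

    ```python
    import numpy as np

    def soft_dtw(X, Y, gamma=1.0):
        """SoftDTW value between sequences X (n, d) and Y (m, d): classical
        DTW with the hard min replaced by a smooth, differentiable soft-min."""
        def softmin(a, b, c):
            z = -np.array([a, b, c]) / gamma
            zmax = z.max()                              # log-sum-exp stabilization
            return -gamma * (zmax + np.log(np.exp(z - zmax).sum()))

        n, m = len(X), len(Y)
        R = np.full((n + 1, m + 1), np.inf)
        R[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.sum((X[i - 1] - Y[j - 1]) ** 2)   # squared Euclidean
                R[i, j] = cost + softmin(R[i - 1, j], R[i, j - 1], R[i - 1, j - 1])
        return R[n, m]
    ```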

Real-Time Character Rise Motions

  • Authors: Ben Kenwright
  • Subjects: Robotics (cs.RO); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.05056
  • Pdf link: https://arxiv.org/pdf/2304.05056
  • Abstract
    This paper presents an uncomplicated dynamic controller for generating physically-plausible three-dimensional full-body biped character rise motions on-the-fly at run-time. Our low-dimensional controller uses fundamental reference information (e.g., center-of-mass, hand, and foot locations) to produce balanced biped get-up poses by means of a real-time physically-based simulation. The key idea is to use a simple approximate model (i.e., similar to the inverted-pendulum stepping model) to create continuous reference trajectories that can be seamlessly tracked by an articulated biped character to create balanced rise motions. Our approach does not use any key-framed data or any computationally expensive processing (e.g., offline optimization or search algorithms). We demonstrate the effectiveness and ease of our technique through examples (e.g., a biped character picking itself up from different lying positions).

If consciousness is dynamically relevant, artificial intelligence isn't conscious

  • Authors: Johannes Kleiner, Tim Ludwig
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05077
  • Pdf link: https://arxiv.org/pdf/2304.05077
  • Abstract
    We demonstrate that if consciousness is relevant for the temporal evolution of a system's states -- that is, if it is dynamically relevant -- then AI systems cannot be conscious. That is because AI systems run on CPUs, GPUs, TPUs or other processors which have been designed and verified to adhere to computational dynamics that systematically preclude or suppress deviations. The design and verification preclude or suppress, in particular, potential consciousness-related dynamical effects, so that if consciousness is dynamically relevant, AI systems cannot be conscious.

TodyNet: Temporal Dynamic Graph Neural Network for Multivariate Time Series Classification

  • Authors: Huaiyuan Liu, Xianzhang Liu, Donghua Yang, Zhiyu Liang, Hongzhi Wang, Yong Cui, Jun Gu
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.05078
  • Pdf link: https://arxiv.org/pdf/2304.05078
  • Abstract
    Multivariate time series classification (MTSC) is an important data mining task, which can be effectively solved by popular deep learning technology. Unfortunately, existing deep learning-based methods neglect the hidden dependencies between different dimensions and rarely consider the unique dynamic features of time series, and thus lack sufficient feature extraction capability to obtain satisfactory classification accuracy. To address this problem, we propose a novel temporal dynamic graph neural network (TodyNet) that can extract hidden spatio-temporal dependencies without a predefined graph structure. It enables information flow among isolated but implicitly interdependent variables and captures the associations between different time slots by a dynamic graph mechanism, which further improves the classification performance of the model. Meanwhile, the hierarchical representations of graphs cannot be learned due to the limitations of GNNs. Thus, we also design a temporal graph pooling layer to obtain a global graph-level representation for graph learning with learnable temporal parameters. The dynamic graph, graph information propagation, and temporal convolution are jointly learned in an end-to-end framework. Experiments on 26 UEA benchmark datasets illustrate that the proposed TodyNet outperforms existing deep learning-based methods on MTSC tasks.

One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field

  • Authors: Weichuang Li, Longhao Zhang, Dong Wang, Bin Zhao, Zhigang Wang, Mulin Chen, Bang Zhang, Zhongjian Wang, Liefeng Bo, Xuelong Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05097
  • Pdf link: https://arxiv.org/pdf/2304.05097
  • Abstract
    Talking head generation aims to generate faces that maintain the identity information of the source image and imitate the motion of the driving image. Most pioneering methods rely primarily on 2D representations and thus will inevitably suffer from face distortion when large head rotations are encountered. Recent works instead employ explicit 3D structural representations or implicit neural rendering to improve performance under large pose changes. Nevertheless, the fidelity of identity and expression is not so desirable, especially for novel-view synthesis. In this paper, we propose HiDe-NeRF, which achieves high-fidelity and free-view talking-head synthesis. Drawing on the recently proposed Deformable Neural Radiance Fields, HiDe-NeRF represents the 3D dynamic scene into a canonical appearance field and an implicit deformation field, where the former comprises the canonical source face and the latter models the driving pose and expression. In particular, we improve fidelity from two aspects: (i) to enhance identity expressiveness, we design a generalized appearance module that leverages multi-scale volume features to preserve face shape and details; (ii) to improve expression preciseness, we propose a lightweight deformation module that explicitly decouples the pose and expression to enable precise expression modeling. Extensive experiments demonstrate that our proposed approach can generate better results than previous works. Project page: https://www.waytron.net/hidenerf/

Video Event Restoration Based on Keyframes for Video Anomaly Detection

  • Authors: Zhiwei Yang, Jing Liu, Zhaoyang Wu, Peng Wu, Xiaotao Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05112
  • Pdf link: https://arxiv.org/pdf/2304.05112
  • Abstract
    Video anomaly detection (VAD) is a significant computer vision problem. Existing deep neural network (DNN) based VAD methods mostly follow the route of frame reconstruction or frame prediction. However, the lack of mining and learning of higher-level visual features and temporal context relationships in videos limits the further performance of these two approaches. Inspired by video codec theory, we introduce a brand-new VAD paradigm to break through these limitations. First, we propose a new task of video event restoration based on keyframes: the DNN is encouraged to infer the multiple missing frames from video keyframes so as to restore a video event, which more effectively motivates the DNN to mine and learn potential higher-level visual features and comprehensive temporal context relationships in the video. To this end, we propose a novel U-shaped Swin Transformer Network with Dual Skip Connections (USTN-DSC) for video event restoration, where a cross-attention and a temporal upsampling residual skip connection are introduced to further assist in restoring complex static and dynamic motion object features in the video. In addition, we propose a simple and effective adjacent frame difference loss to constrain the motion consistency of the video sequence. Extensive experiments on benchmarks demonstrate that USTN-DSC outperforms most existing methods, validating the effectiveness of our method.

Modeling and design of heterogeneous hierarchical bioinspired spider web structures using generative deep learning and additive manufacturing

  • Authors: Wei Lu, Nic A. Lee, Markus J. Buehler
  • Subjects: Machine Learning (cs.LG); Soft Condensed Matter (cond-mat.soft); Adaptation and Self-Organizing Systems (nlin.AO)
  • Arxiv link: https://arxiv.org/abs/2304.05137
  • Pdf link: https://arxiv.org/pdf/2304.05137
  • Abstract
    Spider webs are incredible biological structures, comprising thin but strong silk filament and arranged into complex hierarchical architectures with striking mechanical properties (e.g., lightweight but high strength, achieving diverse mechanical responses). While simple 2D orb webs can easily be mimicked, the modeling and synthesis of 3D-based web structures remain challenging, partly due to the rich set of design features. Here we provide a detailed analysis of the heterogeneous graph structures of spider webs, and use deep learning as a way to model and then synthesize artificial, bio-inspired 3D web structures. The generative AI models are conditioned based on key geometric parameters (including average edge length, number of nodes, average node degree, and others). To identify graph construction principles, we use inductive representation sampling of large experimentally determined spider web graphs, to yield a dataset that is used to train three conditional generative models: 1) an analog diffusion model inspired by nonequilibrium thermodynamics, with sparse neighbor representation, 2) a discrete diffusion model with full neighbor representation, and 3) an autoregressive transformer architecture with full neighbor representation. All three models are scalable, produce complex, de novo bio-inspired spider web mimics, and successfully construct graphs that meet the design objectives. We further propose an algorithm that assembles web samples produced by the generative models into larger-scale structures based on a series of geometric design targets, including helical and parametric shapes, mimicking and extending natural design principles towards integration with diverging engineering objectives. Several webs are manufactured using 3D printing and tested to assess mechanical properties.

Distributed Event-Triggered Online Learning for Multi-Agent System Control using Gaussian Process Regression

  • Authors: Xiaobing Dai, Zewen Yang, Mengtian Xu, Sandra Hirche
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.05138
  • Pdf link: https://arxiv.org/pdf/2304.05138
  • Abstract
    For the cooperative control of multi-agent systems with unknown dynamics, data-driven methods are commonly employed to infer models from the collected data. Due to its flexibility in modeling nonlinear functions and the existence of theoretical prediction error bounds, Gaussian process (GP) regression is widely used in such control problems. Online learning, i.e., adding newly collected training data to the GP models, promises to improve control performance via improved predictions during operation. In this paper, we propose a distributed event-triggered online learning algorithm for multi-agent system control. The proposed algorithm employs only locally available information from the neighbors and achieves a guaranteed overall control performance with a desired tracking error bound. Moreover, the exclusion of Zeno behavior for each agent is proven. Finally, the effectiveness of the proposed event-triggered online learning is demonstrated in simulations.
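
    A loose single-agent sketch of the event-triggered idea, assuming scikit-learn's GP regressor and a plain prediction-error trigger; the paper's trigger condition comes from its tracking-error analysis and involves neighbor information, which this toy omits.

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    class EventTriggeredGP:
        """Add a training point (and refit) only when the prediction error
        exceeds a threshold, keeping the dataset small during operation."""
        def __init__(self, threshold=0.05):
            self.X, self.y, self.threshold = [], [], threshold
            self.gp = GaussianProcessRegressor(kernel=RBF())

        def maybe_update(self, x, y_true):
            pred = self.gp.predict([x])[0] if self.X else 0.0
            if abs(y_true - pred) > self.threshold:   # event triggered
                self.X.append(x)
                self.y.append(y_true)
                self.gp.fit(self.X, self.y)           # O(n^3) refit; fine for a toy
            return pred

    model = EventTriggeredGP()
    for t in np.linspace(0, 2 * np.pi, 50):
        model.maybe_update([np.sin(t)], y_true=np.cos(t))
    print(len(model.X), "of 50 samples triggered an update")
    ```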

Feed-forward Disturbance Compensation for Station Keeping in Wave-dominated Environments

  • Authors: Kyle L. Walker, Adam A. Stokes, Aristides Kiprakis, Francesco Giorgio-Serchi
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.05222
  • Pdf link: https://arxiv.org/pdf/2304.05222
  • Abstract
    When deploying robots in shallow ocean waters, wave disturbances can be significant, highly dynamic and pose problems when operating near structures; this is a key limitation of current control strategies, restricting the range of conditions in which subsea vehicles can be deployed. To improve dynamic control and offer a higher level of robustness, this work proposes a Cascaded Proportional-Derivative (C-PD) with Feed-forward (FF) control scheme for disturbance mitigation, exploring the concept of explicitly using disturbance estimations to counteract state perturbations. Results demonstrate that the proposed controller is capable of higher performance in contrast to a standard C-PD controller, with an average reduction of ~48% witnessed across various sea states. Additional analysis also investigated performance when considering coarse estimations featuring inaccuracies; average improvements of ~17% demonstrate the effectiveness of the proposed strategy to handle these uncertainties. The proposal in this work shows promise for improved control without a drastic increase in required computing power; if coupled with sufficient sensors, state estimation techniques and prediction algorithms, utilising feed-forward compensating control actions offers a potential solution to improve vehicle control under wave-induced disturbances.

Neural Delay Differential Equations: System Reconstruction and Image Classification

  • Authors: Qunxi Zhu, Yao Guo, Wei Lin
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS); Chaotic Dynamics (nlin.CD)
  • Arxiv link: https://arxiv.org/abs/2304.05310
  • Pdf link: https://arxiv.org/pdf/2304.05310
  • Abstract
    Neural Ordinary Differential Equations (NODEs), a framework of continuous-depth neural networks, have been widely applied, showing exceptional efficacy in coping with representative datasets. Recently, an augmented framework has been developed to overcome some limitations that emerged in the application of the original framework. In this paper, we propose a new class of continuous-depth neural networks with delay, named Neural Delay Differential Equations (NDDEs). To compute the corresponding gradients, we use the adjoint sensitivity method to obtain the delayed dynamics of the adjoint. Differential equations with delays are typically seen as dynamical systems of infinite dimension that possess richer dynamics. Compared to NODEs, NDDEs have a stronger capacity for nonlinear representations. We use several illustrative examples to demonstrate this outstanding capacity. First, we successfully model delayed dynamics where the trajectories in the lower-dimensional phase space can be mutually intersected and even chaotic, in a model-free or model-based manner; traditional NODEs, without any augmentation, are not directly applicable to such modeling. Second, we achieve lower loss and higher accuracy not only on data produced synthetically by complex models but also on CIFAR10, a well-known image dataset. Our results on NDDEs demonstrate that appropriately articulating the elements of dynamical systems into the network design is truly beneficial in promoting network performance.
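
    The forward dynamics of a delay differential equation can be sketched with a short history buffer; this toy forward-Euler integrator with a single constant delay (hypothetical names throughout) shows the mechanism, while the adjoint-based backward pass used to train NDDEs is omitted.

    ```python
    import numpy as np

    def integrate_dde(f, history, tau, t_end, dt=0.01):
        """Forward Euler for dz/dt = f(z(t), z(t - tau)), with `history`
        supplying z(t) for t <= 0; a buffer provides the delayed state."""
        lag = int(round(tau / dt))
        buf = [history(-k * dt) for k in range(lag, -1, -1)]  # z(-tau) ... z(0)
        t = 0.0
        while t < t_end:
            z, z_delayed = buf[-1], buf[-lag - 1]
            buf.append(z + dt * f(z, z_delayed))
            t += dt
        return np.array(buf[lag:])                            # trajectory on [0, t_end]

    # Toy delayed dynamics dz/dt = -z(t - tau) with constant history z = 1.
    traj = integrate_dde(lambda z, zd: -zd, history=lambda t: np.array([1.0]),
                         tau=1.0, t_end=5.0)
    ```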

Stability/instability study of density systems and control law design

  • Authors: Igor Furtat
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.05313
  • Pdf link: https://arxiv.org/pdf/2304.05313
  • Abstract
    This paper considers a class of dynamical systems called density systems. For such systems, the derivative of a quadratic function depends on a so-called density function. The density function is used to set the properties of the phase space and therefore influences the behaviour of the investigated systems. A particular class of such systems was previously considered for the (in)stability study of dynamical systems using the flow and divergence of a phase vector. In this paper, a more general class of such systems is considered, and it is shown that the density function can be used not only to study (in)stability, but also to set the properties of the space in order to change the behaviour of dynamical systems. The development of control laws based on the use of the density function, for systems with both known and unknown parameters, is considered. All obtained results are accompanied by simulations illustrating the theoretical conclusions.

OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

  • Authors: Yunpeng Zhang, Zheng Zhu, Dalong Du
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05316
  • Pdf link: https://arxiv.org/pdf/2304.05316
  • Abstract
    The vision-based perception for autonomous driving has undergone a transformation from bird's-eye-view (BEV) representations to 3D semantic occupancy. Compared with the BEV planes, the 3D semantic occupancy further provides structural information along the vertical direction. This paper presents OccFormer, a dual-path transformer network to effectively process the 3D volume for semantic occupancy prediction. OccFormer achieves a long-range, dynamic, and efficient encoding of the camera-generated 3D voxel features. It is obtained by decomposing the heavy 3D processing into the local and global transformer pathways along the horizontal plane. For the occupancy decoder, we adapt the vanilla Mask2Former for 3D semantic occupancy by proposing preserve-pooling and class-guided sampling, which notably mitigate the sparsity and class imbalance. Experimental results demonstrate that OccFormer significantly outperforms existing methods for semantic scene completion on the SemanticKITTI dataset and for LiDAR semantic segmentation on the nuScenes dataset. Code is available at \url{https://github.com/zhangyp15/OccFormer}.

Unified Multi-Modal Image Synthesis for Missing Modality Imputation

  • Authors: Yue Zhang, Chengtao Peng, Qiuli Wang, Dan Song, Kaiyan Li, S. Kevin Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.05340
  • Pdf link: https://arxiv.org/pdf/2304.05340
  • Abstract
    Multi-modal medical images provide complementary soft-tissue characteristics that aid in the screening and diagnosis of diseases. However, limited scanning time, image corruption and various imaging protocols often result in incomplete multi-modal images, thus limiting the usage of multi-modal data for clinical purposes. To address this issue, in this paper, we propose a novel unified multi-modal image synthesis method for missing modality imputation. Overall, our method takes a generative adversarial architecture, which aims to synthesize missing modalities from any combination of available ones with a single model. To this end, we specifically design a Commonality- and Discrepancy-Sensitive Encoder for the generator to exploit both the modality-invariant and the modality-specific information contained in the input modalities. The incorporation of both types of information facilitates the generation of images with consistent anatomy and realistic details of the desired distribution. Besides, we propose a Dynamic Feature Unification Module to integrate information from a varying number of available modalities, which enables the network to be robust to randomly missing modalities. The module performs both hard integration and soft integration, ensuring the effectiveness of feature combination while avoiding information loss. Verified on two public multi-modal magnetic resonance datasets, the proposed method is effective in handling various synthesis tasks and shows superior performance compared to previous methods.

Distributed no-regret edge resource allocation with limited communication

  • Authors: Saad Kriouile, Dimitrios Tsilimantos, Theodoros Giannakas
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.05355
  • Pdf link: https://arxiv.org/pdf/2304.05355
  • Abstract
    To accommodate low latency and computation-intensive services, such as the Internet-of-Things (IoT), 5G networks are expected to have cloud and edge computing capabilities. To this end, we consider a generic network setup where devices, performing analytics-related tasks, can partially process a task and offload its remainder to base stations, which can then reroute it to cloud and/or to edge servers. To account for the potentially unpredictable traffic demands and edge network dynamics, we formulate the resource allocation as an online convex optimization problem with service violation constraints and allow limited communication between neighboring nodes. To address the problem, we propose an online distributed (across the nodes) primal-dual algorithm and prove that it achieves sublinear regret and violation; in fact, the achieved bound is of the same order as the best known centralized alternative. Our results are further supported using the publicly available Milano dataset.
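
    As a generic illustration of the online primal-dual template (not the paper's distributed algorithm with limited neighbor communication), one round takes a Lagrangian gradient step in the allocation and updates the multiplier by the observed constraint violation.

    ```python
    import numpy as np

    def primal_dual_round(x, lam, grad_f, g_val, grad_g, eta=0.05):
        """One round for  min f_t(x)  s.t.  g_t(x) <= 0:
        gradient descent on the Lagrangian in x, ascent in the multiplier."""
        x_new = x - eta * (grad_f + lam * grad_g)   # primal step
        lam_new = max(0.0, lam + eta * g_val)       # dual step on the violation
        return x_new, lam_new

    # Toy: track f_t(x) = ||x - c_t||^2 under a budget constraint sum(x) <= 1.
    rng = np.random.default_rng(2)
    x, lam = np.zeros(3), 0.0
    for _ in range(200):
        c = rng.uniform(size=3)
        x, lam = primal_dual_round(x, lam, grad_f=2 * (x - c),
                                   g_val=x.sum() - 1.0, grad_g=np.ones(3))
    ```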

New submissions for Mon, 27 Mar 23

Keyword: pruning

Low Rank Optimization for Efficient Deep Learning: Making A Balance between Compact Architecture and Fast Training

  • Authors: Xinwei Ou, Zhangxin Chen, Ce Zhu, Yipeng Liu
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.13635
  • Pdf link: https://arxiv.org/pdf/2303.13635
  • Abstract
    Deep neural networks have achieved great success in many data processing applications. However, their high computational complexity and storage cost make deep learning hard to use on resource-constrained devices, and the associated power cost is not environmentally friendly. In this paper, we focus on low-rank optimization for efficient deep learning techniques. In the space domain, deep neural networks are compressed by low-rank approximation of the network parameters, which directly reduces the storage requirement with a smaller number of network parameters. In the time domain, the network parameters can be trained in a few subspaces, which enables efficient training with fast convergence. Model compression in the spatial domain is summarized into three categories: pre-train, pre-set, and compression-aware methods. With a series of integrable techniques discussed, such as sparse pruning, quantization, and entropy coding, these can be combined in an integrated framework with lower computational complexity and storage. Beyond a summary of recent technical advances, we offer two findings to motivate future work: first, the effective rank outperforms other sparse measures for network compression; second, there is a spatial and temporal balance for tensorized neural networks.
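
    In its simplest form, the space-domain compression described above is a truncated SVD of a layer's weight matrix; the sketch below shows that factorization (ranks, shapes, and names are illustrative).

    ```python
    import numpy as np

    def low_rank_factorize(W, rank):
        """Replace a dense weight W (out, in) by thin factors U (out, rank)
        and V (rank, in); storage drops from out*in to rank*(out + in)."""
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        return U[:, :rank] * s[:rank], Vt[:rank, :]   # singular values folded into U

    W = np.random.default_rng(3).normal(size=(256, 512))   # 131,072 parameters
    U_r, V_r = low_rank_factorize(W, rank=32)              # 24,576 parameters
    rel_err = np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W)
    ```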

How Does Attention Work in Vision Transformers? A Visual Analytics Attempt

  • Authors: Yiran Li, Junpeng Wang, Xin Dai, Liang Wang, Chin-Chia Michael Yeh, Yan Zheng, Wei Zhang, Kwan-Liu Ma
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2303.13731
  • Pdf link: https://arxiv.org/pdf/2303.13731
  • Abstract
    Vision transformer (ViT) expands the success of transformer models from sequential data to images. The model decomposes an image into many smaller patches and arranges them into a sequence. Multi-head self-attentions are then applied to the sequence to learn the attention between patches. Despite many successful interpretations of transformers on sequential data, little effort has been devoted to the interpretation of ViTs, and many questions remain unanswered. For example, among the numerous attention heads, which one is more important? How strong are individual patches attending to their spatial neighbors in different heads? What attention patterns have individual heads learned? In this work, we answer these questions through a visual analytics approach. Specifically, we first identify what heads are more important in ViTs by introducing multiple pruning-based metrics. Then, we profile the spatial distribution of attention strengths between patches inside individual heads, as well as the trend of attention strengths across attention layers. Third, using an autoencoder-based learning solution, we summarize all possible attention patterns that individual heads could learn. Examining the attention strengths and patterns of the important heads, we answer why they are important. Through concrete case studies with experienced deep learning experts on multiple ViTs, we validate the effectiveness of our solution that deepens the understanding of ViTs from head importance, head attention strength, and head attention pattern.

Efficient Execution of SPARQL Queries with OPTIONAL and UNION Expressions

  • Authors: Lei Zou, Yue Pang, M. Tamer Özsu, Jiaqi Chen
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2303.13844
  • Pdf link: https://arxiv.org/pdf/2303.13844
  • Abstract
    The proliferation of RDF datasets has resulted in studies focusing on optimizing SPARQL query processing. Most existing work focuses on basic graph patterns (BGPs) and ignores other vital operators in SPARQL, such as UNION and OPTIONAL. SPARQL queries with these operators, which we abbreviate as SPARQL-UO, pose serious query plan generation challenges. In this paper, we propose techniques for executing SPARQL-UO queries using BGP execution as a building block, based on a novel BGP-based Evaluation (BE)-Tree representation of query plans. On top of this, we propose a series of cost-driven BE-tree transformations to generate more efficient plans by reducing the search space and intermediate result sizes, and a candidate pruning technique that further enhances efficiency at query time. Experiments confirm that our method outperforms the state-of-the-art by orders of magnitude.

LINe: Out-of-Distribution Detection by Leveraging Important Neurons

  • Authors: Yong Hyun Ahn, Gyeong-Moon Park, Seong Tae Kim
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13995
  • Pdf link: https://arxiv.org/pdf/2303.13995
  • Abstract
    It is important to quantify the uncertainty of input samples, especially in mission-critical domains such as autonomous driving and healthcare, where failed predictions on out-of-distribution (OOD) data are likely to cause serious problems. The OOD detection problem fundamentally arises because a model cannot express what it is not aware of. Post-hoc OOD detection approaches are widely explored because they do not require an additional re-training process, which might degrade the model's performance and increase the training cost. In this study, from the perspective of neurons in the deep layers of the model representing high-level features, we introduce a new aspect for analyzing the difference in model outputs between in-distribution data and OOD data. We propose a novel method, Leveraging Important Neurons (LINe), for post-hoc out-of-distribution detection. Shapley-value-based pruning reduces the effects of noisy outputs by selecting only high-contribution neurons for predicting specific classes of input data and masking the rest. Activation clipping fixes all values above a certain threshold to the same value, allowing LINe to treat all class-specific features equally and consider only the difference in the number of activated features between in-distribution and OOD data. Comprehensive experiments verify the effectiveness of the proposed method, which outperforms state-of-the-art post-hoc OOD detection methods on CIFAR-10, CIFAR-100, and ImageNet datasets.
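
    Two of the ingredients above, neuron masking and activation clipping, reduce to simple tensor operations; the sketch below is a loose single-sample illustration (the mask would come from Shapley-value-based contribution scores, which are not computed here).

    ```python
    import numpy as np

    def masked_clipped_features(features, important_mask, threshold):
        """Zero out low-contribution neurons, then clip values above the
        threshold to the threshold, so the ID/OOD separation depends on which
        features activate rather than on their magnitudes."""
        return np.minimum(features * important_mask, threshold)

    feat = np.array([0.2, 3.7, 0.9, 9.5])
    mask = np.array([1.0, 1.0, 0.0, 1.0])   # hypothetical Shapley-selected neurons
    print(masked_clipped_features(feat, mask, threshold=1.0))   # [0.2 1.  0.  1. ]
    ```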

PowerPruning: Selecting Weights and Activations for Power-Efficient Neural Network Acceleration

  • Authors: Richard Petri, Grace Li Zhang, Yiran Chen, Ulf Schlichtmann, Bing Li
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.13997
  • Pdf link: https://arxiv.org/pdf/2303.13997
  • Abstract
    Deep neural networks (DNNs) have been successfully applied in various fields. A major challenge of deploying DNNs, especially on edge devices, is power consumption, due to the large number of multiply-and-accumulate (MAC) operations. To address this challenge, we propose PowerPruning, a novel method to reduce power consumption in digital neural network accelerators by selecting weights that lead to less power consumption in MAC operations. In addition, the timing characteristics of the selected weights together with all activation transitions are evaluated. The weights and activations that lead to small delays are further selected. Consequently, the maximum delay of the sensitized circuit paths in the MAC units is reduced even without modifying MAC units, which thus allows a flexible scaling of supply voltage to reduce power consumption further. Together with retraining, the proposed method can reduce power consumption of DNNs on hardware by up to 78.3% with only a slight accuracy loss.

Keyword: neural\ architecture\ search

There is no result

Keyword: 3d object detection

MoGDE: Boosting Mobile Monocular 3D Object Detection with Ground Depth Estimation

  • Authors: Yunsong Zhou, Quan Liu, Hongzi Zhu, Yunzhe Li, Shan Chang, Minyi Guo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13561
  • Pdf link: https://arxiv.org/pdf/2303.13561
  • Abstract
    Monocular 3D object detection (Mono3D) in mobile settings (e.g., on a vehicle, a drone, or a robot) is an important yet challenging task. Due to the near-far disparity phenomenon of monocular vision and the ever-changing camera pose, it is hard to acquire high detection accuracy, especially for far objects. Inspired by the insight that the depth of an object can be well determined according to the depth of the ground where it stands, in this paper, we propose a novel Mono3D framework, called MoGDE, which constantly estimates the corresponding ground depth of an image and then utilizes the estimated ground depth information to guide Mono3D. To this end, we utilize a pose detection network to estimate the pose of the camera and then construct a feature map portraying pixel-level ground depth according to the 3D-to-2D perspective geometry. Moreover, to improve Mono3D with the estimated ground depth, we design an RGB-D feature fusion network based on the transformer structure, where the long-range self-attention mechanism is utilized to effectively identify ground-contacting points and pin the corresponding ground depth to the image feature map. We conduct extensive experiments on the real-world KITTI dataset. The results demonstrate that MoGDE can effectively improve the Mono3D accuracy and robustness for both near and far objects. MoGDE yields the best performance compared with the state-of-the-art methods by a large margin and is ranked number one on the KITTI 3D benchmark.

Keyword: voxel

UrbanGIRAFFE: Representing Urban Scenes as Compositional Generative Neural Feature Fields

  • Authors: Yuanbo Yang, Yifei Yang, Hanlei Guo, Rong Xiong, Yue Wang, Yiyi Liao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.14167
  • Pdf link: https://arxiv.org/pdf/2303.14167
  • Abstract
    Generating photorealistic images with controllable camera pose and scene contents is essential for many applications including AR/VR and simulation. Despite the fact that rapid progress has been made in 3D-aware generative models, most existing methods focus on object-centric images and are not applicable to generating urban scenes for free camera viewpoint control and scene editing. To address this challenging task, we propose UrbanGIRAFFE, which uses a coarse 3D panoptic prior, including the layout distribution of uncountable stuff and countable objects, to guide a 3D-aware generative model. Our model is compositional and controllable as it breaks down the scene into stuff, objects, and sky. Using stuff prior in the form of semantic voxel grids, we build a conditioned stuff generator that effectively incorporates the coarse semantic and geometry information. The object layout prior further allows us to learn an object generator from cluttered scenes. With proper loss functions, our approach facilitates photorealistic 3D-aware image synthesis with diverse controllability, including large camera movement, stuff editing, and object manipulation. We validate the effectiveness of our model on both synthetic and real-world datasets, including the challenging KITTI-360 dataset.

Keyword: lidar

Collaboration Helps Camera Overtake LiDAR in 3D Detection

  • Authors: Yue Hu, Yifan Lu, Runsheng Xu, Weidi Xie, Siheng Chen, Yanfeng Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13560
  • Pdf link: https://arxiv.org/pdf/2303.13560
  • Abstract
    Camera-only 3D detection provides an economical solution with a simple configuration for localizing objects in 3D space compared to LiDAR-based detection systems. However, a major challenge lies in precise depth estimation due to the lack of direct 3D measurements in the input. Many previous methods attempt to improve depth estimation through network designs, e.g., deformable layers and larger receptive fields. This work proposes an orthogonal direction, improving the camera-only 3D detection by introducing multi-agent collaborations. Our proposed collaborative camera-only 3D detection (CoCa3D) enables agents to share complementary information with each other through communication. Meanwhile, we optimize communication efficiency by selecting the most informative cues. The shared messages from multiple viewpoints disambiguate the single-agent estimated depth and complement the occluded and long-range regions in the single-agent view. We evaluate CoCa3D in one real-world dataset and two new simulation datasets. Results show that CoCa3D improves previous SOTA performances by 44.21% on DAIR-V2X, 30.60% on OPV2V+, 12.59% on CoPerception-UAVs+ for AP@70. Our preliminary results show a potential that with sufficient collaboration, the camera might overtake LiDAR in some practical scenarios. We released the dataset and code at https://siheng-chen.github.io/dataset/CoPerception+ and https://github.com/MediaBrain-SJTU/CoCa3D.

ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data

  • Authors: Haojie Zhao, Junsong Chen, Lijun Wang, Huchuan Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13885
  • Pdf link: https://arxiv.org/pdf/2303.13885
  • Abstract
    Compared with traditional RGB-only visual tracking, few datasets have been constructed for RGB-D tracking. In this paper, we propose ARKitTrack, a new RGB-D tracking dataset for both static and dynamic scenes captured by consumer-grade LiDAR scanners equipped on Apple's iPhone and iPad. ARKitTrack contains 300 RGB-D sequences, 455 targets, and 229.7K video frames in total. Along with the bounding box annotations and frame-level attributes, we also annotate this dataset with 123.9K pixel-level target masks. Besides, the camera intrinsics and camera pose of each frame are provided for future developments. To demonstrate the potential usefulness of this dataset, we further present a unified baseline for both box-level and pixel-level tracking, which integrates RGB features with bird's-eye-view representations to better explore cross-modality 3D geometry. In-depth empirical analysis has verified that the ARKitTrack dataset can significantly facilitate RGB-D tracking and that the proposed baseline method compares favorably against the state of the art. The code and dataset are available at https://arkittrack.github.io.

CCL: Continual Contrastive Learning for LiDAR Place Recognition

  • Authors: Jiafeng Cui, Xieyuanli Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.13952
  • Pdf link: https://arxiv.org/pdf/2303.13952
  • Abstract
    Place recognition is an essential and challenging task in loop closing and global localization for robotics and autonomous driving applications. Benefiting from the recent advances in deep learning techniques, the performance of LiDAR place recognition (LPR) has been greatly improved. However, current deep learning-based methods suffer from two major problems: poor generalization ability and catastrophic forgetting. In this paper, we propose a continual contrastive learning method, named CCL, to tackle the catastrophic forgetting problem and generally improve the robustness of LPR approaches. Our CCL constructs a contrastive feature pool and utilizes contrastive loss to train more transferable representations of places. When transferred into new environments, our CCL continuously reviews the contrastive memory bank and applies a distribution-based knowledge distillation to maintain the retrieval ability of the past data while continually learning to recognize new places from the new data. We thoroughly evaluate our approach on Oxford, MulRan, and PNV datasets using three different LPR methods. The experimental results show that our CCL consistently improves the performance of different methods in different environments outperforming the state-of-the-art continual learning method. The implementation of our method has been released at https://github.com/cloudcjf/CCL.

StereoScene: BEV-Assisted Stereo Matching Empowers 3D Semantic Scene Completion

  • Authors: Bohan Li, Yasheng Sun, Xin Jin, Wenjun Zeng, Zheng Zhu, Xiaoefeng Wang, Yunpeng Zhang, James Okae, Hang Xiao, Dalong Du
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13959
  • Pdf link: https://arxiv.org/pdf/2303.13959
  • Abstract
    3D semantic scene completion (SSC) is an ill-posed task that requires inferring a dense 3D scene from incomplete observations. Previous methods either explicitly incorporate 3D geometric input or rely on learnt 3D prior behind monocular RGB images. However, 3D sensors such as LiDAR are expensive and intrusive while monocular cameras face challenges in modeling precise geometry due to the inherent ambiguity. In this work, we propose StereoScene for 3D semantic scene completion, which explores taking full advantage of lightweight camera inputs without resorting to any external 3D sensors. Our key insight is to leverage stereo matching to resolve geometric ambiguity. To improve its robustness in unmatched areas, we introduce bird's-eye-view (BEV) representation to inspire hallucination ability with rich context information. On top of the stereo and BEV representations, a mutual interactive aggregation (MIA) module is carefully devised to fully unleash their power. Specifically, a Bi-directional Interaction Transformer (BIT) augmented with confidence re-weighting is used to encourage reliable prediction through mutual guidance while a Dual Volume Aggregation (DVA) module is designed to facilitate complementary aggregation. Experimental results on SemanticKITTI demonstrate that the proposed StereoScene outperforms the state-of-the-art camera-based methods by a large margin with a relative improvement of 26.9% in geometry and 38.6% in semantic.

New submissions for Thu, 13 Apr 23

Keyword: efficient

PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors

  • Authors: Haley M. So, Laurie Bose, Piotr Dudek, Gordon Wetzstein
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05440
  • Pdf link: https://arxiv.org/pdf/2304.05440
  • Abstract
    Conventional image sensors digitize high-resolution images at fast frame rates, producing a large amount of data that needs to be transmitted off the sensor for further processing. This is challenging for perception systems operating on edge devices, because communication is power inefficient and induces latency. Fueled by innovations in stacked image sensor fabrication, emerging sensor-processors offer programmability and minimal processing capabilities directly on the sensor. We exploit these capabilities by developing an efficient recurrent neural network architecture, PixelRNN, that encodes spatio-temporal features on the sensor using purely binary operations. PixelRNN reduces the amount of data to be transmitted off the sensor by a factor of 64x compared to conventional systems while offering competitive accuracy for hand gesture recognition and lip reading tasks. We experimentally validate PixelRNN using a prototype implementation on the SCAMP-5 sensor-processor platform.

Probabilistic Reasoning at Scale: Trigger Graphs to the Rescue

  • Authors: Efthymia Tsamoura, Jaehun Lee, Jacopo Urbani
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.05459
  • Pdf link: https://arxiv.org/pdf/2304.05459
  • Abstract
    The role of uncertainty in data management has become more prominent than ever before, especially because of the growing importance of machine learning-driven applications that produce large uncertain databases. A well-known approach to querying such databases is to blend rule-based reasoning with uncertainty. However, techniques proposed so far struggle with large databases. In this paper, we address this problem by presenting a new technique for probabilistic reasoning that exploits Trigger Graphs (TGs) -- a notion recently introduced for the non-probabilistic setting. The intuition is that TGs can effectively store a probabilistic model by avoiding an explicit materialization of the lineage and by grouping together similar derivations of the same fact. Firstly, we show how TGs can be adapted to support the possible world semantics. Then, we describe techniques for efficiently computing a probabilistic model, and formally establish the correctness of our approach. We also present an extensive empirical evaluation using a prototype called LTGs. Our comparison against other leading engines shows that LTGs is not only faster, even against approximate reasoning techniques, but can also reason over probabilistic databases that existing engines cannot scale to.

An Adaptive Factorized Nyström Preconditioner for Regularized Kernel Matrices

  • Authors: Shifan Zhao, Tianshi Xu, Edmond Chow, Yuanzhe Xi
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.05460
  • Pdf link: https://arxiv.org/pdf/2304.05460
  • Abstract
    The spectrum of a kernel matrix significantly depends on the parameter values of the kernel function used to define the kernel matrix. This makes it challenging to design a preconditioner for a regularized kernel matrix that is robust across different parameter values. This paper proposes the Adaptive Factorized Nyström (AFN) preconditioner. The preconditioner is designed for the case where the rank k of the Nyström approximation is large, i.e., for kernel function parameters that lead to kernel matrices with eigenvalues that decay slowly. AFN deliberately chooses a well-conditioned submatrix to solve with and corrects a Nyström approximation with a factorized sparse approximate matrix inverse. This makes AFN efficient for kernel matrices with large numerical ranks. AFN also adaptively chooses the size of this submatrix to balance accuracy and cost.
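
For orientation, here is a plain Nyström preconditioner for a regularized kernel matrix, applying the approximate inverse via the Woodbury identity. This is only the generic building block, assuming the landmark submatrix `W` is well conditioned (which is precisely what AFN's adaptive selection targets); AFN's factorized sparse correction is not shown.

```python
import numpy as np

def nystrom_precond(K, idx, mu):
    """Preconditioner for (K_nys + mu*I) where K_nys = C W^{-1} C^T.

    Returns a function applying the inverse via the Woodbury identity:
    (mu*I + C W^{-1} C^T)^{-1} v = (v - C (mu*W + C^T C)^{-1} C^T v) / mu.
    Assumes the landmark block W = K[idx][:, idx] is well conditioned.
    """
    C = K[:, idx]                      # n x k
    W = K[np.ix_(idx, idx)]            # k x k
    M = mu * W + C.T @ C
    def apply(v):
        return (v - C @ np.linalg.solve(M, C.T @ v)) / mu
    return apply

# toy usage with a Gaussian kernel and every 10th point as a landmark
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
K = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))
P = nystrom_precond(K, idx=np.arange(0, 200, 10), mu=1e-2)
print(P(np.ones(200)).shape)
```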

CamDiff: Camouflage Image Augmentation via Diffusion Model

  • Authors: Xue-Jing Luo, Shuo Wang, Zongwei Wu, Christos Sakaridis, Yun Cheng, Deng-Ping Fan, Luc Van Gool
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05469
  • Pdf link: https://arxiv.org/pdf/2304.05469
  • Abstract
    The burgeoning field of camouflaged object detection (COD) seeks to identify objects that blend into their surroundings. Despite the impressive performance of recent models, we have identified a limitation in their robustness, where existing methods may misclassify salient objects as camouflaged ones, despite these two characteristics being contradictory. This limitation may stem from lacking multi-pattern training images, leading to less saliency robustness. To address this issue, we introduce CamDiff, a novel approach inspired by AI-Generated Content (AIGC) that overcomes the scarcity of multi-pattern training images. Specifically, we leverage the latent diffusion model to synthesize salient objects in camouflaged scenes, while using the zero-shot image classification ability of the Contrastive Language-Image Pre-training (CLIP) model to prevent synthesis failures and ensure the synthesized object aligns with the input prompt. Consequently, the synthesized image retains its original camouflage label while incorporating salient objects, yielding camouflage samples with richer characteristics. The results of user studies show that the salient objects in the scenes synthesized by our framework attract the user's attention more; thus, such samples pose a greater challenge to the existing COD models. Our approach enables flexible editing and efficient large-scale dataset generation at a low cost. It significantly enhances COD baselines' training and testing phases, emphasizing robustness across diverse domains. Our newly-generated datasets and source code are available at https://github.com/drlxj/CamDiff.

Contingency Games for Multi-Agent Interaction

  • Authors: Lasse Peters, Andrea Bajcsy, Chih-Yuan Chiu, David Fridovich-Keil, Forrest Laine, Laura Ferranti, Javier Alonso-Mora
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.05483
  • Pdf link: https://arxiv.org/pdf/2304.05483
  • Abstract
    Contingency planning, wherein an agent generates a set of possible plans conditioned on the outcome of an uncertain event, is an increasingly popular way for robots to act under uncertainty. In this work, we take a game-theoretic perspective on contingency planning which is tailored to multi-agent scenarios in which a robot's actions impact the decisions of other agents and vice versa. The resulting contingency game allows the robot to efficiently coordinate with other agents by generating strategic motion plans conditioned on multiple possible intents for other actors in the scene. Contingency games are parameterized via a scalar variable which represents a future time at which intent uncertainty will be resolved. Varying this parameter enables a designer to easily adjust how conservatively the robot behaves in the game. Interestingly, we also find that existing variants of game-theoretic planning under uncertainty are readily obtained as special cases of contingency games. Lastly, we offer an efficient method for solving N-player contingency games with nonlinear dynamics and non-convex costs and constraints. Through a series of simulated autonomous driving scenarios, we demonstrate that plans generated via contingency games provide quantitative performance gains over game-theoretic motion plans that do not account for future uncertainty reduction.

Communication Efficient DNN Partitioning-based Federated Learning

  • Authors: Di Wu, Rehmat Ullah, Philip Rodgers, Peter Kilpatrick, Ivor Spence, Blesson Varghese
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.05495
  • Pdf link: https://arxiv.org/pdf/2304.05495
  • Abstract
    Efficiently running federated learning (FL) on resource-constrained devices is challenging since they are required to train computationally intensive deep neural networks (DNN) independently. DNN partitioning-based FL (DPFL) has been proposed as one mechanism to accelerate training where the layers of a DNN (or computation) are offloaded from the device to an edge server. However, this creates significant communication overheads since the activation and gradient need to be transferred between the device and the edge server during training. Current techniques reduce the communication introduced by DNN partitioning using local loss-based methods. We demonstrate that these methods adversely impact accuracy and ignore the communication costs incurred when transmitting the activation from the device to the server. This paper proposes ActionFed - a communication efficient framework for DPFL to accelerate training on resource-constrained devices. ActionFed eliminates the transmission of the gradient by developing pre-trained initialization of the DNN model on the device for the first time. This reduces the accuracy degradation seen in local loss-based methods. In addition, ActionFed proposes a novel replay buffer mechanism and implements a quantization-based compression technique to reduce the transmission of the activation. It is experimentally demonstrated that ActionFed can reduce the communication cost by up to 15.77x and accelerates training by up to 3.87x when compared to vanilla DPFL.
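
The abstract's quantization-based compression of the transmitted activation can be pictured with a simple uniform 8-bit scheme. ActionFed's actual codec is not specified here, so treat this as an illustrative stand-in.

```python
import numpy as np

def quantize_u8(x):
    """Uniform 8-bit quantization of an activation tensor (illustrative
    stand-in, not ActionFed's exact scheme)."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 or 1.0        # avoid div-by-zero for constant tensors
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize_u8(q, lo, scale):
    return q.astype(np.float32) * scale + lo

x = np.random.default_rng(0).standard_normal((2, 4)).astype(np.float32)
q, lo, scale = quantize_u8(x)
print(np.abs(dequantize_u8(q, lo, scale) - x).max())  # small reconstruction error
```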

Revisiting Single-gated Mixtures of Experts

  • Authors: Amelie Royer, Ilia Karmanov, Andrii Skliar, Babak Ehteshami Bejnordi, Tijmen Blankevoort
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.05497
  • Pdf link: https://arxiv.org/pdf/2304.05497
  • Abstract
    Mixture of Experts (MoE) are rising in popularity as a means to train extremely large-scale models, yet allowing for a reasonable computational cost at inference time. Recent state-of-the-art approaches usually assume a large number of experts, and require training all experts jointly, which often leads to training instabilities such as router collapse. In contrast, in this work, we propose to revisit the simple single-gate MoE, which allows for more practical training. Key to our work are (i) a base model branch acting both as an early-exit and an ensembling regularization scheme, (ii) a simple and efficient asynchronous training pipeline without router collapse issues, and finally (iii) a per-sample clustering-based initialization. We show experimentally that the proposed model obtains efficiency-to-accuracy trade-offs comparable with other more complex MoE, and outperforms non-mixture baselines. This showcases the merits of even a simple single-gate MoE, and motivates further exploration in this area.
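
A toy forward pass for a single-gate MoE with an always-on base branch, matching ingredient (i) of the abstract. The hard top-1 routing and the additive combination of base and expert outputs are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def single_gate_moe(x, base, experts, gate):
    """Single-gate MoE forward: the base branch always runs (early-exit /
    ensembling regularizer), the gate routes x to one expert (hard top-1)."""
    y_base = base(x)
    probs = gate(x)                     # (n_experts,) routing scores
    e = int(np.argmax(probs))
    return y_base + probs[e] * experts[e](x), e

# toy usage with linear maps standing in for networks
base = lambda x: 0.1 * x
experts = [lambda x: x + 1.0, lambda x: x - 1.0]
gate = lambda x: np.array([0.8, 0.2])
print(single_gate_moe(np.ones(3), base, experts, gate))
```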

GraphGANFed: A Federated Generative Framework for Graph-Structured Molecules Towards Efficient Drug Discovery

  • Authors: Daniel Manu, Jingjing Yao, Wuji Liu, Xiang Sun
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05498
  • Pdf link: https://arxiv.org/pdf/2304.05498
  • Abstract
    Recent advances in deep learning have accelerated its use in various applications, such as cellular image analysis and molecular discovery. In molecular discovery, a generative adversarial network (GAN), which comprises a discriminator to distinguish generated molecules from existing molecules and a generator to generate new molecules, is one of the premier technologies due to its ability to learn from a large molecular data set efficiently and generate novel molecules that preserve similar properties. However, different pharmaceutical companies may be unwilling or unable to share their local data sets due to the geo-distributed and sensitive nature of molecular data sets, making it impossible to train GANs in a centralized manner. In this paper, we propose a Graph convolutional network in Generative Adversarial Networks via Federated learning (GraphGANFed) framework, which integrates graph convolutional neural network (GCN), GAN, and federated learning (FL) as a whole system to generate novel molecules without sharing local data sets. In GraphGANFed, the discriminator is implemented as a GCN to better capture features from molecules represented as molecular graphs, and FL is used to train both the discriminator and generator in a distributed manner to preserve data privacy. Extensive simulations are conducted based on three benchmark data sets to demonstrate the feasibility and effectiveness of GraphGANFed. The molecules generated by GraphGANFed can achieve high novelty (=100) and diversity (> 0.9). The simulation results also indicate that 1) a lower complexity discriminator model can better avoid mode collapse for a smaller data set, 2) there is a tradeoff among different evaluation metrics, and 3) having the right dropout ratio of the generator and discriminator can avoid mode collapse.

L3MVN: Leveraging Large Language Models for Visual Target Navigation

  • Authors: Bangguo Yu, Hamidreza Kasaei, Ming Cao
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.05501
  • Pdf link: https://arxiv.org/pdf/2304.05501
  • Abstract
    Visual target navigation in unknown environments is a crucial problem in robotics. Despite extensive investigation of classical and learning-based approaches in the past, robots lack common-sense knowledge about household objects and layouts. Prior state-of-the-art approaches to this task rely on learning the priors during training and typically require expensive resources and significant time for learning. To address this, we propose a new framework for visual target navigation that leverages Large Language Models (LLM) to impart common sense for object searching. Specifically, we introduce two paradigms: (i) zero-shot and (ii) feed-forward approaches that use language to find the relevant frontier from the semantic map as a long-term goal and explore the environment efficiently. Our analysis demonstrates the notable zero-shot generalization and transfer capabilities from the use of language. Experiments on Gibson and Habitat-Matterport 3D (HM3D) demonstrate that the proposed framework significantly outperforms existing map-based methods in terms of success rate and generalization. Ablation analysis also indicates that the common-sense knowledge from the language model leads to more efficient semantic exploration. Finally, we provide a real robot experiment to verify the applicability of our framework in real-world scenarios. The supplementary video and code can be accessed via the following link: https://sites.google.com/view/l3mvn.

Frontier Semantic Exploration for Visual Target Navigation

  • Authors: Bangguo Yu, Hamidreza Kasaei, Ming Cao
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.05506
  • Pdf link: https://arxiv.org/pdf/2304.05506
  • Abstract
    This work focuses on the problem of visual target navigation, which is very important for autonomous robots as it is closely related to high-level tasks. To find a special object in unknown environments, classical and learning-based approaches are fundamental components of navigation that have been investigated thoroughly in the past. However, due to the difficulty in the representation of complicated scenes and the learning of the navigation policy, previous methods are still not adequate, especially for large unknown scenes. Hence, we propose a novel framework for visual target navigation using the frontier semantic policy. In this proposed framework, the semantic map and the frontier map are built from the current observation of the environment. Using the features of the maps and object category, deep reinforcement learning enables learning a frontier semantic policy that can be used to select a frontier cell as a long-term goal to explore the environment efficiently. Experiments on Gibson and Habitat-Matterport 3D (HM3D) demonstrate that the proposed framework significantly outperforms existing map-based methods in terms of success rate and efficiency. Ablation analysis also indicates that the proposed approach learns a more efficient exploration policy based on the frontiers. A demonstration is provided to verify the applicability of our model to real-world transfer. The supplementary video and code can be accessed via the following link: https://sites.google.com/view/fsevn.

Training Large Language Models Efficiently with Sparsity and Dataflow

  • Authors: Venkat Srinivasan, Darshan Gandhi, Urmish Thakker, Raghu Prabhakar
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.05511
  • Pdf link: https://arxiv.org/pdf/2304.05511
  • Abstract
    Large foundation language models have shown their versatility in being able to be adapted to perform a wide variety of downstream tasks, such as text generation, sentiment analysis, semantic search etc. However, training such large foundational models is a non-trivial exercise that requires a significant amount of compute power and expertise from machine learning and systems experts. As models get larger, these demands are only increasing. Sparsity is a promising technique to relieve the compute requirements for training. However, sparsity introduces new challenges in training the sparse model to the same quality as the dense counterparts. Furthermore, sparsity drops the operation intensity and introduces irregular memory access patterns that make it challenging to efficiently utilize compute resources. This paper demonstrates an end-to-end training flow on a large language model - a 13-billion-parameter GPT - using sparsity and dataflow. The dataflow execution model and architecture enable efficient on-chip irregular memory accesses as well as native kernel fusion and pipelined parallelism that helps recover device utilization. We show that we can successfully train GPT 13B to the same quality as the dense GPT 13B model, while achieving an end-to-end speedup of 4.5x over the dense A100 baseline.

State estimation of a carbon capture process through POD model reduction and neural network approximation

  • Authors: Siyu Liu, Xunyuan Yin, Jinfeng Liu (University of Alberta)
  • Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.05514
  • Pdf link: https://arxiv.org/pdf/2304.05514
  • Abstract
    This paper presents an efficient approach for state estimation of post-combustion CO2 capture plants (PCCPs) by using reduced-order neural network models. The method involves extracting lower-dimensional feature vectors from high-dimensional operational data of the PCCP and constructing a reduced-order process model using proper orthogonal decomposition (POD). Multi-layer perceptron (MLP) neural networks capture the dominant dynamics of the process and train the network parameters with low-dimensional data obtained from open-loop simulations. The proposed POD-MLP model can be used as the basis for estimating the states of PCCPs at a significantly decreased computational cost. For state estimation, a reduced-order extended Kalman filtering (EKF) scheme based on the POD-MLP model is developed. Our simulations demonstrate that the proposed POD-MLP modeling approach reduces computational complexity compared to the POD-only model for nonlinear systems. Additionally, the POD-MLP-EKF algorithm can accurately reconstruct the full state information of PCCPs while significantly improving computational efficiency compared to the EKF based on the original PCCP model.
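
The POD step can be sketched in a few lines: build a reduced basis from snapshot data via the SVD and keep enough modes to capture a target energy fraction. The energy threshold and the synthetic low-rank snapshots below are illustrative choices, not the paper's settings.

```python
import numpy as np

def pod_basis(snapshots, energy=0.999):
    """POD basis from a snapshot matrix (one state per column)."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)
    keep = int(np.searchsorted(frac, energy)) + 1   # smallest k reaching the target energy
    return U[:, :keep]

# toy usage: nearly rank-5 snapshots -> the basis keeps about 5 modes
rng = np.random.default_rng(0)
snap = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 40))
snap += 1e-6 * rng.standard_normal((100, 40))
Phi = pod_basis(snap)
print(Phi.shape)          # reduce: z = Phi.T @ x ; lift back: x_hat = Phi @ z
```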

MoMo: A shared encoder Model for text, image and multi-Modal representations

  • Authors: Rakesh Chada, Zhaoheng Zheng, Pradeep Natarajan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.05523
  • Pdf link: https://arxiv.org/pdf/2304.05523
  • Abstract
    We propose a self-supervised shared encoder model that achieves strong results on several visual, language and multimodal benchmarks while being data, memory and run-time efficient. We make three key contributions. First, in contrast to most existing works, we use a single transformer with all the encoder layers processing both the text and the image modalities. Second, we propose a stage-wise training strategy where the model is first trained on images, then jointly with unimodal text and image datasets and finally jointly with text and text-image datasets. Third, to preserve information across both the modalities, we propose a training pipeline that learns simultaneously from gradient updates of different modalities at each training update step. The results on downstream text-only, image-only and multimodal tasks show that our model is competitive with several strong models while using fewer parameters and less pre-training data. For example, MoMo performs competitively with FLAVA on multimodal (+3.1), image-only (+1.1) and text-only (-0.1) tasks despite having 2/5th the number of parameters and using 1/3rd the image-text training pairs. Finally, we ablate various design choices and further show that increasing model size produces significant performance gains indicating potential for substantial improvements with larger models using our approach.

Understanding Causality with Large Language Models: Feasibility and Opportunities

  • Authors: Cheng Zhang, Stefan Bauer, Paul Bennett, Jiangfeng Gao, Wenbo Gong, Agrin Hilmkil, Joel Jennings, Chao Ma, Tom Minka, Nick Pawlowski, James Vaughan
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.05524
  • Pdf link: https://arxiv.org/pdf/2304.05524
  • Abstract
    We assess the ability of large language models (LLMs) to answer causal questions by analyzing their strengths and weaknesses against three types of causal question. We believe that current LLMs can answer causal questions with existing causal knowledge as combined domain experts. However, they are not yet able to provide satisfactory answers for discovering new knowledge or for high-stakes decision-making tasks with high precision. We discuss possible future directions and opportunities, such as enabling explicit and implicit causal modules as well as deep causal-aware LLMs. These will not only enable LLMs to answer many different types of causal questions for greater impact but also enable LLMs to be more trustworthy and efficient in general.

Encrypted Price-based Market Mechanism for Optimal Load Frequency Control

  • Authors: Jihoon Suh, Takashi Tanaka
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.05525
  • Pdf link: https://arxiv.org/pdf/2304.05525
  • Abstract
    The global trend of energy deregulation has led to the market mechanism replacing some functionality of load frequency control (LFC). Accordingly, information exchange among participating generators and the market operator plays a crucial role in optimizing social utility. However, privacy has been an equally pressing concern in such settings. This conflict between individuals' privacy and social utility has been a long-standing challenge in market mechanism literature as well as in Cyber-Physical Systems (CPSs). In this paper, we propose a novel encrypted market architecture that leverages a hybrid encryption method and two-party computation protocols, enabling the secure synthesis and implementation of an optimal price-based market mechanism. This work spotlights the importance of secure and efficient outsourcing of controller synthesis, which is a critical element within the proposed framework. A two-area LFC model is used to conduct a case study.

Group projected Subspace Pursuit for Identification of variable coefficient differential equations (GP-IDENT)

  • Authors: Yuchen He, Sung-Ha Kang, Wenjing Liao, Hao Liu, Yingjie Liu
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.05543
  • Pdf link: https://arxiv.org/pdf/2304.05543
  • Abstract
    We propose an effective and robust algorithm for identifying partial differential equations (PDEs) with space-time varying coefficients from a single trajectory of noisy observations. Identifying unknown differential equations from noisy observations is a difficult task, and it is even more challenging with space and time varying coefficients in the PDE. The proposed algorithm, GP-IDENT, has three ingredients: (i) we use B-spline bases to express the unknown space and time varying coefficients, (ii) we propose Group Projected Subspace Pursuit (GPSP) to find a sequence of candidate PDEs with various levels of complexity, and (iii) we propose a new criterion for model selection using the Reduction in Residual (RR) to choose an optimal one among the pool of candidates. The new GPSP considers group projected subspaces, which makes it more robust than existing methods in distinguishing correlated group features. We test GP-IDENT on a variety of PDEs and PDE systems, and compare it with the state-of-the-art parametric PDE identification algorithms under different settings to illustrate its outstanding performance. Our experiments show that GP-IDENT is effective in identifying the correct terms from a large dictionary and the model selection scheme is robust to noise.

MEMA Runtime Framework: Minimizing External Memory Accesses for TinyML on Microcontrollers

  • Authors: Andrew Sabot, Vikas Natesh, H.T. Kung, Wei-Te Ting
  • Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Performance (cs.PF); Programming Languages (cs.PL)
  • Arxiv link: https://arxiv.org/abs/2304.05544
  • Pdf link: https://arxiv.org/pdf/2304.05544
  • Abstract
    We present the MEMA framework for the easy and quick derivation of efficient inference runtimes that minimize external memory accesses for matrix multiplication on TinyML systems. The framework accounts for hardware resource constraints and problem sizes in analytically determining optimized schedules and kernels that minimize memory accesses. MEMA provides a solution to a well-known problem in current practice, namely that optimal schedules tend to be found only through a time-consuming, heuristic search of a large scheduling space. We compare the performance of runtimes derived from MEMA to existing state-of-the-art libraries on ARM-based TinyML systems. For example, for neural network benchmarks on the ARM Cortex-M4, we achieve up to a 1.8x speedup and 44% energy reduction over CMSIS-NN.
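
The core idea, scheduling matrix multiplication in tiles so each operand block is loaded from external memory as few times as possible, can be sketched as a blocked matmul. The tile size `tb` stands in for what MEMA would derive analytically from the hardware constraints.

```python
import numpy as np

def blocked_matmul(A, B, tb):
    """Tiled matmul: each (tb x tb) operand block is touched once per use,
    mimicking a schedule that keeps working tiles in on-chip memory."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, tb):
        for j in range(0, m, tb):
            acc = np.zeros((min(tb, n - i), min(tb, m - j)), dtype=A.dtype)
            for p in range(0, k, tb):
                acc += A[i:i+tb, p:p+tb] @ B[p:p+tb, j:j+tb]
            C[i:i+tb, j:j+tb] = acc
    return C

A = np.arange(12.0).reshape(3, 4)
B = np.arange(8.0).reshape(4, 2)
assert np.allclose(blocked_matmul(A, B, tb=2), A @ B)
```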

A Predictive Model using Machine Learning Algorithm in Identifying Students Probability on Passing Semestral Course

  • Authors: Anabella C. Doctor
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.05565
  • Pdf link: https://arxiv.org/pdf/2304.05565
  • Abstract
    This study aims to determine a predictive model that learns students' probability of passing the courses they take, at the earliest stage of the semester. To discover a good predictive model with high acceptability, accuracy, and precision that delivers a useful outcome for decision making in education systems, for improving the processes of conveying knowledge and uplifting students' academic performance, the proponent applies and strictly follows the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology. This study employs classification as the data mining technique and a decision tree as the algorithm. With the newly discovered predictive model, the prediction of students' probability of passing their current courses attains 0.7619 accuracy, 0.8333 precision, 0.8823 recall, and 0.8571 f1 score, which shows that the model used in the prediction is reliable, accurate, and recommendable. Considering the indicators and the results, the prediction model used in this study is highly acceptable. Data mining techniques provide effective and efficient innovative tools for analyzing and predicting student performance. The model used in this study will greatly affect the way educators understand and identify the weaknesses of their students in class, improve the effectiveness of their learning processes, bring down academic failure rates, and help institution administrators modify their learning system outcomes. Further study is needed on the inclusion of students' demographic information, larger datasets, and automated and manual processes for the predictive criteria indicators, so that students can see which criteria they must improve, as early as the midterm period, in order to pass their courses at the end of the semester.
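
A hedged sketch of the modeling step the abstract describes, a decision-tree classifier evaluated with accuracy, precision, recall, and F1. The synthetic features stand in for the study's early-semester indicators, which are not published here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# synthetic stand-in for early-semester indicators (quiz scores, attendance, ...)
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(accuracy_score(y_te, pred), precision_score(y_te, pred),
      recall_score(y_te, pred), f1_score(y_te, pred))
```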

Distributed Compressed Sparse Row Format for Spiking Neural Network Simulation, Serialization, and Interoperability

  • Authors: Felix Wang
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.05587
  • Pdf link: https://arxiv.org/pdf/2304.05587
  • Abstract
    With the increasing development of neuromorphic platforms and their related software tools, as well as the increasing scale of spiking neural network (SNN) models, there is pressure for interoperable and scalable representations of network state. In response to this, we discuss a parallel extension of a widely used format for efficiently representing sparse matrices, the compressed sparse row (CSR), in the context of supporting the simulation and serialization of large-scale SNNs. Sparse matrices for graph adjacency structure provide a natural fit for describing the connectivity of an SNN, and prior work in the area of parallel graph partitioning has developed the distributed CSR (dCSR) format for storing and ingesting large graphs. We contend that organizing additional network information, such as neuron and synapse state, in alignment with its adjacency as dCSR provides a straightforward partition-based distribution of network state. For large-scale simulations, this means each parallel process is only responsible for its own partition of state, which becomes especially useful when the size of an SNN exceeds the memory resources of a single compute node. For potentially long-running simulations, this also enables network serialization to and from disk (e.g. for checkpoint/restart fault-tolerant computing) to be performed largely independently between parallel processes. We also provide a potential implementation, and put it forward for adoption within the neural computing community.
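
A small illustration of the idea: standard CSR arrays for an SNN's adjacency, with per-neuron state aligned to rows so that a contiguous row partition hands each parallel process its own slice of both connectivity and state. The toy sizes and the even partition rule are assumptions, not the paper's implementation.

```python
import numpy as np

# CSR adjacency for a 4-neuron SNN: neuron i's outgoing synapses live at
# col_idx[row_ptr[i]:row_ptr[i+1]]; per-neuron state (v_mem) is stored in
# row order so a row partition carries its own state slice (dCSR-style).
row_ptr = np.array([0, 2, 3, 3, 5])
col_idx = np.array([1, 2, 3, 0, 2])
weight  = np.array([0.5, 0.1, 0.9, 0.3, 0.2])
v_mem   = np.zeros(4)

def partition_rows(n_rows, n_parts):
    """Contiguous row partition: each rank owns rows lo..hi and the matching
    slices of row_ptr, col_idx, weight, and v_mem."""
    bounds = np.linspace(0, n_rows, n_parts + 1).astype(int)
    return list(zip(bounds[:-1], bounds[1:]))

print(partition_rows(4, 2))   # [(0, 2), (2, 4)]
```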

Zero-Knowledge Proof-based Practical Federated Learning on Blockchain

  • Authors: Zhibo Xing, Zijian Zhang, Meng Li, Jiamou Liu, Liehuang Zhu, Giovanni Russello, Muhammad Rizwan Asghar
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.05590
  • Pdf link: https://arxiv.org/pdf/2304.05590
  • Abstract
    Since concerns about privacy leakage strongly discourage users from sharing data, federated learning has gradually become a promising technique for both academia and industry for achieving collaborative learning without leaking information about the local data. Unfortunately, most federated learning solutions cannot efficiently verify the execution of each participant's local machine learning model and protect the privacy of user data simultaneously. In this article, we first propose a Zero-Knowledge Proof-based Federated Learning (ZKP-FL) scheme on blockchain. It leverages zero-knowledge proof for both the computation of local data and the aggregation of local model parameters, aiming to verify the computation process without requiring the plaintext of the local data. We further propose a Practical ZKP-FL (PZKP-FL) scheme to support fraction and non-linear operations. Specifically, we explore a Fraction-Integer mapping function, and use Taylor expansion to efficiently handle non-linear operations while maintaining the accuracy of the federated learning model. We also analyze the security of PZKP-FL. Performance analysis demonstrates that the whole running time of the PZKP-FL scheme is approximately less than one minute in parallel execution.
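
The Taylor-expansion trick for non-linear operations can be illustrated with the sigmoid: a low-order polynomial that a proof system restricted to additions and multiplications can evaluate. The expansion point and order here are illustrative, not PZKP-FL's actual choices.

```python
from math import exp

def sigmoid_taylor(x, order=5):
    """Taylor expansion of sigmoid around 0: a polynomial that a circuit
    limited to additions and multiplications can evaluate."""
    # sigmoid(x) = 1/2 + x/4 - x^3/48 + x^5/480 - ...
    coeffs = {0: 0.5, 1: 0.25, 3: -1 / 48, 5: 1 / 480}
    return sum(c * x ** k for k, c in coeffs.items() if k <= order)

print(sigmoid_taylor(0.5), 1 / (1 + exp(-0.5)))  # ~0.62246 vs ~0.62246
```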

Vehicle Trajectory Prediction based Predictive Collision Risk Assessment for Autonomous Driving in Highway Scenarios

  • Authors: Dejian Meng, Wei Xiao, Lijun Zhang, Zhuang Zhang, Zihao Liu
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.05610
  • Pdf link: https://arxiv.org/pdf/2304.05610
  • Abstract
    For driving safely and efficiently in highway scenarios, autonomous vehicles (AVs) must be able to predict future behaviors of surrounding object vehicles (OVs), and assess collision risk accurately for reasonable decision-making. Aiming at autonomous driving in highway scenarios, a predictive collision risk assessment method based on trajectory prediction of OVs is proposed in this paper. Firstly, the vehicle trajectory prediction is formulated as a sequence generation task with long short-term memory (LSTM) encoder-decoder framework. Convolutional social pooling (CSP) and graph attention network (GAN) are adopted for extracting local spatial vehicle interactions and distant spatial vehicle interactions, respectively. Then, two basic risk metrics, time-to-collision (TTC) and minimal distance margin (MDM), are calculated between the predicted trajectory of OV and the candidate trajectory of AV. Consequently, a time-continuous risk function is constructed with temporal and spatial risk metrics. Finally, the vehicle trajectory prediction model CSP-GAN-LSTM is evaluated on two public highway datasets. The quantitative results indicate that the proposed CSP-GAN-LSTM model outperforms the existing state-of-the-art (SOTA) methods in terms of position prediction accuracy. Besides, simulation results in typical highway scenarios further validate the feasibility and effectiveness of the proposed predictive collision risk assessment method.
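
Of the two risk metrics, TTC is the simpler to state: the gap to the lead vehicle divided by the closing speed, defined only while the gap is shrinking. A one-dimensional sketch, with thresholding and the MDM metric omitted:

```python
def time_to_collision(gap, v_follow, v_lead):
    """TTC between a following AV and a lead vehicle (1-D longitudinal case).
    Returns inf when the gap is not closing."""
    closing = v_follow - v_lead
    return gap / closing if closing > 1e-6 else float("inf")

print(time_to_collision(gap=30.0, v_follow=25.0, v_lead=20.0))  # 6.0 seconds
```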

NutritionVerse-3D: A 3D Food Model Dataset for Nutritional Intake Estimation

  • Authors: Chi-en Amy Tai, Matthew Keller, Mattie Kerrigan, Yuhao Chen, Saeejith Nair, Pengcheng Xi, Alexander Wong
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05619
  • Pdf link: https://arxiv.org/pdf/2304.05619
  • Abstract
    77% of adults over 50 want to age in place today, presenting a major challenge to ensuring adequate nutritional intake. It has been reported that one in four adults aged 65 or older is malnourished, and given the direct link between malnutrition and decreased quality of life, numerous studies have been conducted on how to efficiently track the nutritional intake of food. Recent advancements in machine learning and computer vision show promise for automated nutrition tracking, but such methods require a large high-quality dataset in order to accurately identify the nutrients from the food on the plate. Unlike existing datasets, a collection of 3D models with nutritional information allows for view synthesis to create an infinite number of 2D images for any given viewpoint/camera angle along with the associated nutritional information. In this paper, we develop a methodology for collecting high-quality 3D models of food items with a particular focus on speed and consistency, and introduce NutritionVerse-3D, a large-scale high-quality high-resolution dataset of 105 3D food models, in conjunction with their associated weight, food name, and nutritional value. These models allow for large-quantity food intake scenes, diverse and customizable scene layouts, and an infinite number of camera settings and lighting conditions. NutritionVerse-3D is publicly available as part of an open initiative to accelerate machine learning for nutrition sensing.

Constructing Deep Spiking Neural Networks from Artificial Neural Networks with Knowledge Distillation

  • Authors: Qi Xu, Yaxin Li, Jiangrong Shen, Jian K Liu, Huajin Tang, Gang Pan
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05627
  • Pdf link: https://arxiv.org/pdf/2304.05627
  • Abstract
    Spiking neural networks (SNNs) are well known as brain-inspired models with high computing efficiency, owing to a key property: they use spikes as information units, much like biological neural systems. Although spiking-based models are energy efficient by taking advantage of discrete spike signals, their performance is limited by current network structures and training methods. As discrete signals, typical SNNs cannot apply gradient descent rules directly to parameter adjustment as artificial neural networks (ANNs) do. Aiming at this limitation, here we propose a novel method of constructing deep SNN models with knowledge distillation (KD) that uses an ANN as the teacher model and an SNN as the student model. Through an ANN-SNN joint training algorithm, the student SNN model can learn rich feature information from the teacher ANN model through the KD method, while avoiding training the SNN from scratch when dealing with non-differentiable spikes. Our method can not only build a more efficient deep spiking structure feasibly and reasonably, but also train the whole model with few time steps compared to direct training or ANN-to-SNN methods. More importantly, it exhibits strong noise immunity for various types of artificial noise and natural signals. The proposed method provides efficient ways to improve the performance of SNNs by constructing deeper structures in a high-throughput fashion, with potential for light and efficient brain-inspired computing in practical scenarios.
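
The KD component can be sketched with the generic soft-target distillation loss, a temperature-scaled cross-entropy between teacher and student distributions blended with the hard-label loss. The temperature and mixing weight below are conventional defaults, not the paper's values, and the SNN-specific surrogate-gradient machinery is omitted.

```python
import numpy as np

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Generic KD loss: temperature-softened teacher targets + hard labels."""
    def softmax(z, t=1.0):
        e = np.exp((z - z.max(-1, keepdims=True)) / t)
        return e / e.sum(-1, keepdims=True)
    p_t, p_s = softmax(teacher_logits, T), softmax(student_logits, T)
    soft = -(p_t * np.log(p_s + 1e-12)).sum(-1).mean() * T * T
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * soft + (1 - alpha) * hard

rng = np.random.default_rng(0)
print(kd_loss(rng.standard_normal((4, 10)), rng.standard_normal((4, 10)),
              np.array([1, 3, 5, 7])))
```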

DOSM: Demand-Prediction based Online Service Management for Vehicular Edge Computing Networks

  • Authors: Anum Talpur, Mohan Gurusamy
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.05637
  • Pdf link: https://arxiv.org/pdf/2304.05637
  • Abstract
    In this work, we investigate an online service management problem in vehicular edge computing networks. To satisfy the varying service demands of mobile vehicles, a service management framework is required to make decisions on the service lifecycle to maintain good network performance. The service lifecycle consists of creating an instance of a given service (scale-out), moving an instance to a different edge node (migration), and/or terminating an underutilized instance (scale-in). In this paper, we propose an efficient online algorithm to perform service management in each time slot, where the performance quality in the current time slot, the service demand in future time slots, the minimal delay observed by vehicles, and the minimal migration delay are considered while making decisions on the service lifecycle. Here, the future service demand is computed from a gated recurrent unit (GRU)-based prediction model, and the network performance quality is estimated using a deep reinforcement learning (DRL) model which has the ability to interact with the vehicular environment in real time. The choice of the optimal edge location to deploy a service instance at different times is based on our proposed optimization formulations. Simulation experiments using real-world vehicle trajectories are carried out to evaluate the performance of our proposed demand-prediction based online service management (DOSM) framework against different state-of-the-art solutions using several performance metrics.

An Optimal SVC Bitstream Schema for Viewport-dependent 360-degree Video Streaming

  • Authors: Gang Shen, Mingyang Ma, Guangxin Xu
  • Subjects: Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2304.05654
  • Pdf link: https://arxiv.org/pdf/2304.05654
  • Abstract
    To deliver ultra-high resolution 360-degree video (such as 8K, 12K, or even higher) across the internet, viewport-dependent streaming becomes necessary to save bandwidth. During viewport switches, clients and servers instantly exchange coordination info and contents for the given viewports. However, those viewport switches pose a serious challenge for video encoding because the temporal dependency between contents within changing viewports is unpredictable. In existing practice, it is commonly noted that the GOP (Group of Pictures) size in a bitstream intrinsically prohibits the reduction of viewport switch latency, such as motion-to-photon (MTP) latency or motion-to-high-quality (MTHQ) latency. In this paper, we present a Scalable Video Coding (SVC) based bitstream schema, which can structurally remove the impacts of GOP in viewport-dependent streaming and provide instant viewport switches within one-frame time (the best possible). In addition, combined with tiling, this new coding schema allows an efficient packing of the non-adjacent regions within a viewport of 360-degree video. Our experiments also show that the overall encoding with this SVC-based approach is faster than with multi-stream approaches. Compared with current 360-degree video streaming solutions based on MPEG-I OMAF, our approach is superior in terms of viewport switch latency, simplicity of viewport packing, and encoding performance.

RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer

  • Authors: Jiahao Wang, Songyang Zhang, Yong Liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05659
  • Pdf link: https://arxiv.org/pdf/2304.05659
  • Abstract
    This paper studies how to keep a vision backbone effective while removing token mixers in its basic building blocks. Token mixers, such as self-attention in vision transformers (ViTs), are intended to perform information communication between different spatial tokens but suffer from considerable computational cost and latency. However, directly removing them leads to an incomplete model structure prior, and thus brings a significant accuracy drop. To this end, we first develop a RepIdentityFormer based on the re-parameterizing idea, to study the token-mixer-free model architecture. We then explore an improved learning paradigm to break the limitation of the simple token-mixer-free backbone, and summarize the empirical practice into 5 guidelines. Equipped with the proposed optimization strategy, we are able to build an extremely simple vision backbone with encouraging performance, while enjoying high efficiency during inference. Extensive experiments and ablative analysis also demonstrate that the inductive bias of the network architecture can be incorporated into a simple network structure with an appropriate optimization strategy. We hope this work can serve as a starting point for the exploration of optimization-driven efficient network design. Project page: https://techmonsterwang.github.io/RIFormer/.

A parallel rank-adaptive integrator for dynamical low-rank approximation

  • Authors: Gianluca Ceruti, Jonas Kusch, Christian Lubich
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.05660
  • Pdf link: https://arxiv.org/pdf/2304.05660
  • Abstract
    This work introduces a parallel and rank-adaptive matrix integrator for dynamical low-rank approximation. The method is related to the previously proposed rank-adaptive basis update & Galerkin (BUG) integrator but differs significantly in that all arising differential equations, both for the basis and the Galerkin coefficients, are solved in parallel. Moreover, this approach eliminates the need for a potentially costly coefficient update with augmented basis matrices. The integrator also incorporates a new step rejection strategy that enhances the robustness of both the parallel integrator and the BUG integrator. By construction, the parallel integrator inherits the robust error bound of the BUG and projector-splitting integrators. Comparisons of the parallel and BUG integrators are presented by a series of numerical experiments which demonstrate the efficiency of the proposed method, for problems from radiative transfer and radiation therapy.

SuperpixelGraph: Semi-automatic generation of building footprint through semantic-sensitive superpixel and neural graph networks

  • Authors: Haojia Yu, Han Hu, Bo Xu, Qisen Shang, Zhendong Wang, Qing Zhu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05661
  • Pdf link: https://arxiv.org/pdf/2304.05661
  • Abstract
    Most urban applications necessitate building footprints in the form of concise vector graphics with sharp boundaries rather than pixel-wise raster images. This need contrasts with the majority of existing methods, which typically generate over-smoothed footprint polygons. Editing these automatically produced polygons can be inefficient, if not more time-consuming than manual digitization. This paper introduces a semi-automatic approach for building footprint extraction through semantically-sensitive superpixels and neural graph networks. Drawing inspiration from object-based classification techniques, we first learn to generate superpixels that are not only boundary-preserving but also semantically-sensitive. The superpixels respond exclusively to building boundaries rather than other natural objects, while simultaneously producing semantic segmentation of the buildings. These intermediate superpixel representations can be naturally considered as nodes within a graph. Consequently, graph neural networks are employed to model the global interactions among all superpixels and enhance the representativeness of node features for building segmentation. Classical approaches are utilized to extract and regularize boundaries for the vectorized building footprints. Utilizing minimal clicks and straightforward strokes, we efficiently accomplish accurate segmentation outcomes, eliminating the necessity for editing polygon vertices. Our proposed approach demonstrates superior precision and efficacy, as validated by experimental assessments on various public benchmark datasets. We observe a 10% enhancement in the metric for superpixel clustering and an 8% increment in vector graphics evaluation, when compared with established techniques. Additionally, we have devised an optimized and sophisticated pipeline for interactive editing, poised to further augment the overall quality of the results.

Rail Detection: An Efficient Row-based Network and A New Benchmark

  • Authors: Xinpeng Li, Xiaojiang Peng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05667
  • Pdf link: https://arxiv.org/pdf/2304.05667
  • Abstract
    Rail detection, essential for railroad anomaly detection, aims to identify the railroad region in video frames. Although various studies on rail detection exist, neither an open benchmark nor a high-speed network is available in the community, making algorithm comparison and development difficult. Inspired by the growth of lane detection, we propose a rail database and a row-based rail detection method. In detail, we make several contributions: (i) We present a real-world railway dataset, Rail-DB, with 7432 pairs of images and annotations. The images are collected from different situations in lighting, road structures, and views. The rails are labeled with polylines, and the images are categorized into nine scenes. The Rail-DB is expected to facilitate the improvement of rail detection algorithms. (ii) We present an efficient row-based rail detection method, Rail-Net, containing a lightweight convolutional backbone and an anchor classifier. Specifically, we formulate the process of rail detection as a row-based selection problem. This strategy reduces the computational cost compared to alternative segmentation methods. (iii) We evaluate the Rail-Net on Rail-DB with extensive experiments, including cross-scene settings and network backbones ranging from ResNet to Vision Transformers. Our method achieves promising performance in terms of both speed and accuracy. Notably, a lightweight version could achieve 92.77% accuracy and 312 frames per second. The Rail-Net outperforms the traditional method by 50.65% and the segmentation one by 5.86%. The database and code are available at: https://github.com/Sampson-Lee/Rail-Detection.
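
To make the row-based selection idea concrete, here is a hypothetical PyTorch sketch of such a head (dimensions and names are assumptions, not Rail-Net's actual architecture):

```python
import torch.nn as nn

class RowAnchorHead(nn.Module):
    """Each image row becomes a classification problem: choose which of
    `num_cols` horizontal cells contains the rail, with one extra class
    meaning 'no rail in this row'. This is far cheaper than per-pixel
    segmentation."""
    def __init__(self, feat_dim, num_rows, num_cols, num_rails):
        super().__init__()
        self.num_rows, self.num_cols, self.num_rails = num_rows, num_cols, num_rails
        self.fc = nn.Linear(feat_dim, num_rows * (num_cols + 1) * num_rails)

    def forward(self, feat):  # feat: (batch, feat_dim) pooled backbone features
        logits = self.fc(feat)
        # softmax over dim=1 (columns) selects the rail cell per row and rail
        return logits.view(-1, self.num_cols + 1, self.num_rows, self.num_rails)
```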

Real-time Trajectory-based Social Group Detection

  • Authors: Simindokht Jahangard, Munawar Hayat, Hamid Rezatofighi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05678
  • Pdf link: https://arxiv.org/pdf/2304.05678
  • Abstract
    Social group detection is a crucial aspect of various robotic applications, including robot navigation and human-robot interactions. To date, a range of model-based techniques have been employed to address this challenge, such as the F-formation and trajectory similarity frameworks. However, these approaches often fail to provide reliable results in crowded and dynamic scenarios. Recent advancements in this area have mainly focused on learning-based methods, such as deep neural networks that use visual content or human pose. Although visual content-based methods have demonstrated promising performance on large-scale datasets, their computational complexity poses a significant barrier to their practical use in real-time applications. To address these issues, we propose a simple and efficient framework for social group detection. Our approach explores the impact of motion trajectory on social grouping and utilizes a novel, reliable, and fast data-driven method. We formulate the individuals in a scene as a graph, where the nodes are represented by LSTM-encoded trajectories and the edges are defined by the distances between each pair of tracks. Our framework employs a modified graph transformer module and graph clustering losses to detect social groups. Our experiments on the popular JRDBAct dataset reveal noticeable improvements in performance, with relative improvements ranging from 2% to 11%. Furthermore, our framework is significantly faster, with up to 12x faster inference times compared to state-of-the-art methods under the same computation resources. These results demonstrate that our proposed method is suitable for real-time robotic applications.

Fully Conservative Difference Schemes for the Rotation-Two-Component Camassa-Holm System with Smooth/Nonsmooth Initial Data

  • Authors: Tong Yan, Jiwei Zhang, Qifeng Zhang
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.05679
  • Pdf link: https://arxiv.org/pdf/2304.05679
  • Abstract
    The rotation-two-component Camassa--Holm system, which possesses strongly nonlinear coupled terms and high-order differential terms, tends to have continuous nonsmooth solitary wave solutions, such as peakons, stumpons, composite waves and even chaotic waves. In this paper, an accurate semi-discrete conservative difference scheme for the system is derived by taking advantage of its Hamiltonian invariants. We show that the semi-discrete numerical scheme preserves at least three discrete conservation laws: mass, momentum and energy. Furthermore, a fully discrete finite difference scheme is proposed without destroying any of the conservation laws. Combining a nonlinear iteration process and an efficient threshold strategy, the accuracy of the numerical scheme can be guaranteed. Meanwhile, the difference scheme can capture the formation and propagation of solitary wave solutions with satisfactory long-time behavior under smooth/nonsmooth initial data. The numerical results reveal a new type of asymmetric wave breaking phenomenon under the nonzero rotational parameter.

Human-Robot Skill Transfer with Enhanced Compliance via Dynamic Movement Primitives

  • Authors: Jayden Hong, Zengjie Zhang, Amir M. Soufi Enayati, Homayoun Najjaran
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05703
  • Pdf link: https://arxiv.org/pdf/2304.05703
  • Abstract
    Finding an efficient way to adapt robot trajectory is a priority to improve overall performance of robots. One approach for trajectory planning is through transferring human-like skills to robots by Learning from Demonstrations (LfD). The human demonstration is considered the target motion to mimic. However, human motion is typically optimal for human embodiment but not for robots because of the differences between human biomechanics and robot dynamics. The Dynamic Movement Primitives (DMP) framework is a viable solution for this limitation of LfD, but it requires tuning the second-order dynamics in the formulation. Our contribution is introducing a systematic method to extract the dynamic features from human demonstration to auto-tune the parameters in the DMP framework. In addition to its use with LfD, another utility of the proposed method is that it can readily be used in conjunction with Reinforcement Learning (RL) for robot training. In this way, the extracted features facilitate the transfer of human skills by allowing the robot to explore the possible trajectories more efficiently and increasing robot compliance significantly. We introduced a methodology to extract the dynamic features from multiple trajectories based on the optimization of human-likeness and similarity in the parametric space. Our method was implemented into an actual human-robot setup to extract human dynamic features and used to regenerate the robot trajectories following both LfD and RL with DMP. It resulted in a stable performance of the robot, maintaining a high degree of human-likeness based on accumulated distance error as good as the best heuristic tuning.

Stochastic Domain Decomposition Based on Variable-Separation Method

  • Authors: Liang Chen, Yaru Chen, Qiuqi Li
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.05708
  • Pdf link: https://arxiv.org/pdf/2304.05708
  • Abstract
    Uncertainty propagation across different domains is of fundamental importance in stochastic simulations. In this work, we develop a novel stochastic domain decomposition method for steady-state partial differential equations (PDEs) with random inputs. The Variable-separation (VS) method is one of the most accurate and efficient approaches to solving the stochastic partial differential equation (SPDE). We extend the VS method to stochastic algebraic systems, and then integrate its essence with the deterministic domain decomposition method (DDM). This leads to the stochastic domain decomposition based on the Variable-separation method (SDD-VS) that we investigate in this paper. A significant merit of the proposed SDD-VS method is that it can alleviate the "curse of dimensionality", thanks to the explicit representation of stochastic functions deduced from physical systems. The SDD-VS method aims to obtain a separated representation of the solution to the stochastic interface problem. To this end, an offline-online computational decomposition is introduced to improve efficiency. The main challenge in the offline phase is to obtain the affine representation of stochastic algebraic systems, which is crucial to the SDD-VS method. This is accomplished through the successive and flexible applications of the VS method. In the online phase, the interface unknowns of SPDEs are estimated using the quasi-optimal separated representation, making it easier to construct efficient surrogate models of subproblems. Finally, three concrete examples are presented to illustrate the effectiveness of the proposed method.
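
The separated representation at the heart of VS-type methods can be sketched as follows (generic notation, assumed rather than quoted from the paper):

```latex
% The stochastic solution is approximated by a sum of products of
% deterministic spatial modes u_k(x) and stochastic coefficients:
\[
  u(x,\xi) \;\approx\; \sum_{k=1}^{K} \zeta_k(\xi)\, u_k(x).
\]
% Offline, the modes u_k and the form of the coefficients are computed
% once; online, evaluating a new sample \xi only requires the cheap
% stochastic factors \zeta_k(\xi), which is what makes the
% offline-online decomposition efficient.
```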

Dynamic Graph Representation Learning with Neural Networks: A Survey

  • Authors: Leshanshui Yang, Sébastien Adam, Clément Chatelain
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.05729
  • Pdf link: https://arxiv.org/pdf/2304.05729
  • Abstract
    In recent years, Dynamic Graph (DG) representations have been increasingly used for modeling dynamic systems due to their ability to integrate both topological and temporal information in a compact representation. Dynamic graphs make it possible to efficiently handle applications such as social network prediction, recommender systems, traffic forecasting or electroencephalography analysis, which cannot be addressed using standard numeric representations. As a direct consequence of the emergence of dynamic graph representations, dynamic graph learning has emerged as a new machine learning problem, combining challenges from both sequential/temporal data processing and static graph learning. In this research area, the Dynamic Graph Neural Network (DGNN) has become the state-of-the-art approach, and a plethora of models have been proposed in recent years. This paper aims at providing a review of problems and models related to dynamic graph learning. The various dynamic graph supervised learning settings are analysed and discussed. We identify the similarities and differences between existing models with respect to the way time information is modeled. Finally, general guidelines for a DGNN designer when faced with a dynamic graph learning problem are provided.

A Novel Hybrid Post-Weighting Digital Predistortion in mMIMO Under Crosstalk

  • Authors: Ganesh Prasad, Håkan Johansson
  • Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.05795
  • Pdf link: https://arxiv.org/pdf/2304.05795
  • Abstract
    In hybrid beamforming, a single digital predistortion (DPD) unit is insufficient to address all the nonlinearities over a subarray of power amplifiers (PAs) with underlying crosstalk in a massive multiple-input multiple-output (mMIMO) transmitter. In this context, the proposed work describes a novel hybrid post-weighting (PW) scheme. It extends the competence of one trained DPD to all PAs via a subsequent PW block whose optimal coefficients weight the basis functions of the DPD. Consequently, it significantly reduces the nonlinear radiation over a wide range of azimuth directions of the transmitter.

Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series

  • Authors: Matthieu Herrmann, Chang Wei Tan, Mahsa Salehi, Geoffrey I. Webb
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05800
  • Pdf link: https://arxiv.org/pdf/2304.05800
  • Abstract
    Time series classification (TSC) is a challenging task due to the diversity of types of feature that may be relevant for different classification tasks, including trends, variance, frequency, magnitude, and various patterns. To address this challenge, several alternative classes of approach have been developed, including similarity-based, features and intervals, shapelets, dictionary, kernel, neural network, and hybrid approaches. While kernel, neural network, and hybrid approaches perform well overall, some specialized approaches are better suited for specific tasks. In this paper, we propose a new similarity-based classifier, Proximity Forest version 2.0 (PF 2.0), which outperforms previous state-of-the-art similarity-based classifiers across the UCR benchmark and outperforms state-of-the-art kernel, neural network, and hybrid methods on specific datasets in the benchmark that are best addressed by similarity-based methods. PF 2.0 incorporates three recent advances in time series similarity measures -- (1) computationally efficient early abandoning and pruning to speed up elastic similarity computations; (2) a new elastic similarity measure, Amerced Dynamic Time Warping (ADTW); and (3) cost function tuning. It rationalizes the set of similarity measures employed, reducing the eight base measures of the original PF to three and using the first derivative transform with all similarity measures, rather than a limited subset. We have implemented both PF 1.0 and PF 2.0 in a single C++ framework, making the PF framework more efficient.
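
As background, the early-abandoning idea in advance (1) can be sketched in a few lines (illustrative Python only; PF 2.0's C++ implementation, warping windows, and ADTW are not shown):

```python
import numpy as np

def dtw_early_abandon(x, y, best_so_far=np.inf):
    """DTW that gives up as soon as it cannot beat `best_so_far`
    (e.g., the best distance found so far in a 1-NN search)."""
    n, m = len(x), len(y)
    prev = np.full(m + 1, np.inf)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = np.full(m + 1, np.inf)
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        if curr[1:].min() > best_so_far:  # every cell already exceeds the bound
            return np.inf                 # abandon: this candidate cannot win
        prev = curr
    return prev[m]
```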

EgoDist: Comparing networks via distributions of egonet features

  • Authors: Carlo Piccardi
  • Subjects: Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2304.05801
  • Pdf link: https://arxiv.org/pdf/2304.05801
  • Abstract
    Identifying networks with similar characteristics in a given ensemble, or detecting pattern discontinuities in a temporal sequence of networks, are two examples of tasks that require an effective metric capable of quantifying network (dis)similarity. Here we propose a method based on a global portrait of graph properties built by processing local nodes features. More precisely, a set of dissimilarity measures is defined by elaborating the distributions, over the network, of a few egonet features, namely the degree, the clustering coefficient, and the egonet persistence. The method, which does not require the alignment of the two networks being compared, exploits the statistics of the three features to define one- or multi-dimensional distribution functions, which are then compared to define a distance between the networks. The effectiveness of the method is evaluated using a standard classification test, i.e., recognizing the graphs originating from the same synthetic model. Overall, the proposed distances have performances comparable to the best state-of-the-art techniques (graphlet-based methods) with similar computational requirements. Given its simplicity and flexibility, the method is proposed as a viable approach for network comparison tasks.
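
A toy version of the egonet-feature comparison can be sketched with NetworkX (only degree and clustering on non-empty graphs are used here; the paper's EgoDist also uses egonet persistence and multi-dimensional distributions):

```python
import networkx as nx
import numpy as np

def egonet_feature_distance(g1, g2, bins=20):
    """Compare two graphs via total-variation-style distances between
    their normalized degree and clustering distributions."""
    def feats(g):
        deg = np.array([d for _, d in g.degree()], dtype=float)
        clu = np.array(list(nx.clustering(g).values()))
        return deg / max(deg.max(), 1.0), clu  # both now live in [0, 1]

    d1, c1 = feats(g1)
    d2, c2 = feats(g2)
    dist = 0.0
    for a, b in ((d1, d2), (c1, c2)):
        h1, _ = np.histogram(a, bins=bins, range=(0, 1), density=True)
        h2, _ = np.histogram(b, bins=bins, range=(0, 1), density=True)
        dist += 0.5 * np.abs(h1 - h2).sum() / bins  # total variation per feature
    return dist
```

Note that, as in the paper's setting, nothing here requires aligning the node sets of the two graphs.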

DUFormer: A Novel Architecture for Power Line Segmentation of Aerial Images

  • Authors: Deyu An, Qiang Zhang, Jianshu Chao, Ting Li, Feng Qiao, Yong Deng, Zhenpeng Bian, Jia Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05821
  • Pdf link: https://arxiv.org/pdf/2304.05821
  • Abstract
    Power lines pose a significant safety threat to unmanned aerial vehicles (UAVs) operating at low altitudes. However, detecting power lines in aerial images is challenging due to the small size of the foreground data (i.e., power lines) and the abundance of background information. To address this challenge, we propose DUFormer, a semantic segmentation algorithm designed specifically for power line detection in aerial images. We assume that performing sufficient feature extraction with a convolutional neural network (CNN) that has a strong inductive bias is beneficial for training an efficient Transformer model. To this end, we propose a heavy token encoder responsible for overlapping feature re-mining and tokenization. The encoder comprises a pyramid CNN feature extraction module and a power line feature enhancement module. Following sufficient feature extraction for power lines, the feature fusion is carried out, and then the Transformer block is used for global modeling. The final segmentation result is obtained by fusing local and global features in the decode head. Additionally, we demonstrate the significance of the joint multi-weight loss function in power line segmentation. The experimental results demonstrate that our proposed method achieves state-of-the-art performance in power line segmentation on the publicly available TTPLA dataset.

Data-Driven Response Regime Exploration and Identification for Dynamical Systems

  • Authors: Maor Farid
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.05822
  • Pdf link: https://arxiv.org/pdf/2304.05822
  • Abstract
    Data-Driven Response Regime Exploration and Identification (DR$^2$EI) is a novel and fully data-driven method for identifying and classifying response regimes of a dynamical system without requiring human intervention. This approach is a valuable tool for exploring and discovering response regimes in complex dynamical systems, especially when the governing equations and the number of response regimes are unknown, and the system is expensive to sample. Additionally, the method is useful for order reduction, as it can be used to identify the most dominant response regimes of a given dynamical system. DR$^2$EI utilizes unsupervised learning algorithms to transform the system's response into an embedding space that facilitates regime classification. An active sequential sampling approach based on Gaussian Process Regression (GPR) is used to efficiently sample the parameter space, quantify uncertainty, and provide optimal trade-offs between exploration and exploitation. The performance of the DR$^2$EI method was evaluated by analyzing three established dynamical systems: the mathematical pendulum, the Lorenz system, and the Duffing oscillator. The method was shown to effectively identify a variety of response regimes with both similar and distinct topological features and frequency content, demonstrating its versatility in capturing a wide range of behaviors. While it may not be possible to guarantee that all possible regimes will be identified, the method provides an automated and efficient means for exploring the parameter space of a dynamical system and identifying its underlying "sufficiently dominant" response regimes without prior knowledge of the system's equations or behavior.
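
The uncertainty-driven sampling loop can be sketched with scikit-learn's GP regressor (a deliberately simplified 1-D stand-in; DR$^2$EI's embedding space, regime classification, and exploration-exploitation trade-off are not shown):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_system_response(p):      # stand-in for a costly simulation
    return np.sin(3 * p) + 0.1 * p**2

candidates = np.linspace(-3, 3, 200).reshape(-1, 1)
X = np.array([[-2.0], [0.0], [2.0]])   # initial parameter samples
y = expensive_system_response(X).ravel()

gpr = GaussianProcessRegressor()
for _ in range(10):
    gpr.fit(X, y)
    _, std = gpr.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(std)]  # sample where uncertainty is largest
    X = np.vstack([X, x_next])
    y = np.append(y, expensive_system_response(x_next))
```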

FedTrip: A Resource-Efficient Federated Learning Method with Triplet Regularization

  • Authors: Xujing Li, Min Liu, Sheng Sun, Yuwei Wang, Hui Jiang, Xuefeng Jiang
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.05824
  • Pdf link: https://arxiv.org/pdf/2304.05824
  • Abstract
    In the federated learning scenario, geographically distributed clients collaboratively train a global model. Data heterogeneity among clients results in inconsistent model updates, which significantly slow down model convergence. To alleviate this issue, many methods employ regularization terms to narrow the discrepancy between client-side local models and the server-side global model. However, these methods impose limitations on the ability to explore superior local models and ignore the valuable information in historical models. Moreover, although a recent representation method simultaneously considers the global and historical local models, it suffers from prohibitive computation cost. To accelerate convergence with low resource consumption, we propose a model regularization method named FedTrip, which is designed to restrict global-local divergence and decrease current-historical correlation, alleviating the negative effects derived from data heterogeneity. FedTrip helps the current local model to stay close to the global model while keeping away from historical local models, which guarantees the consistency of local updates among clients and efficiently explores superior local models with negligible additional computation cost. Empirically, we demonstrate the superiority of FedTrip via extensive evaluations. To achieve the target accuracy, FedTrip outperforms the state-of-the-art baselines in terms of significantly reducing the total overhead of client-server communication and local computation.
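
The "close to global, away from history" idea suggests a triplet-style penalty, sketched below (a hypothetical form for illustration; the paper's exact FedTrip objective may differ):

```python
import torch

def triplet_regularizer(local_params, global_params, hist_params,
                        mu=0.1, margin=1.0):
    """Pull the current local model toward the global model and push it
    away from a historical local model, hinge-style."""
    pull = sum((w - w_g).pow(2).sum()
               for w, w_g in zip(local_params, global_params))
    push = sum((w - w_h).pow(2).sum()
               for w, w_h in zip(local_params, hist_params))
    return mu * torch.clamp(pull - push + margin, min=0.0)

# Added to the local task loss each round, e.g.:
# loss = task_loss + triplet_regularizer(model.parameters(),
#                                        global_model.parameters(),
#                                        hist_model.parameters())
```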

RESET: Revisiting Trajectory Sets for Conditional Behavior Prediction

  • Authors: Julian Schmidt, Pascal Huissel, Julian Wiederer, Julian Jordan, Vasileios Belagiannis, Klaus Dietmayer
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.05856
  • Pdf link: https://arxiv.org/pdf/2304.05856
  • Abstract
    It is desirable to predict the behavior of traffic participants conditioned on different planned trajectories of the autonomous vehicle. This allows the downstream planner to estimate the impact of its decisions. Recent approaches for conditional behavior prediction rely on a regression decoder, meaning that coordinates or polynomial coefficients are regressed. In this work we revisit set-based trajectory prediction, where the probability of each trajectory in a predefined trajectory set is determined by a classification model, and first-time employ it to the task of conditional behavior prediction. We propose RESET, which combines a new metric-driven algorithm for trajectory set generation with a graph-based encoder. For unconditional prediction, RESET achieves comparable performance to a regression-based approach. Due to the nature of set-based approaches, it has the advantageous property of being able to predict a flexible number of trajectories without influencing runtime or complexity. For conditional prediction, RESET achieves reasonable results with late fusion of the planned trajectory, which was not observed for regression-based approaches before. This means that RESET is computationally lightweight to combine with a planner that proposes multiple future plans of the autonomous vehicle, as large parts of the forward pass can be reused.

Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL

  • Authors: Zakaria Mhammedi, Dylan J. Foster, Alexander Rakhlin
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05889
  • Pdf link: https://arxiv.org/pdf/2304.05889
  • Abstract
    We study the design of sample-efficient algorithms for reinforcement learning in the presence of rich, high-dimensional observations, formalized via the Block MDP problem. Existing algorithms suffer from either 1) computational intractability, 2) strong statistical assumptions that are not necessarily satisfied in practice, or 3) suboptimal sample complexity. We address these issues by providing the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level, with minimal statistical assumptions. Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics, a learning objective in which the aim is to predict the learner's own action from the current observation and observations in the (potentially distant) future. MusIK is simple and flexible, and can efficiently take advantage of general-purpose function approximation. Our analysis leverages several new techniques tailored to non-optimistic exploration algorithms, which we anticipate will find broader use.
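
The multi-step inverse kinematics objective itself is easy to sketch (illustrative PyTorch; MusIK's coupling with systematic exploration and its theoretical guarantees are the paper's real content):

```python
import torch
import torch.nn as nn

class InverseKinematicsHead(nn.Module):
    """Predict the learner's own action a_t from encodings of the
    current observation o_t and a (possibly distant) future o_{t+k}."""
    def __init__(self, encoder, latent_dim, num_actions):
        super().__init__()
        self.encoder = encoder
        self.classifier = nn.Linear(2 * latent_dim, num_actions)

    def forward(self, obs_now, obs_future):
        z = torch.cat([self.encoder(obs_now),
                       self.encoder(obs_future)], dim=-1)
        return self.classifier(z)  # logits over actions

# Training minimizes cross-entropy between these logits and the action
# actually taken at time t, driving the encoder to recover the latent
# state underlying the rich observations.
```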

Node-Differentially Private Estimation of the Number of Connected Components

  • Authors: Iden Kalemaj, Sofya Raskhodnikova, Adam Smith, Charalampos E. Tsourakakis
  • Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.05890
  • Pdf link: https://arxiv.org/pdf/2304.05890
  • Abstract
    We design the first node-differentially private algorithm for approximating the number of connected components in a graph. Given a database representing an $n$-vertex graph $G$ and a privacy parameter $\varepsilon$, our algorithm runs in polynomial time and, with probability $1-o(1)$, has additive error $\widetilde{O}(\frac{\Delta^*\ln\ln n}{\varepsilon})$, where $\Delta^*$ is the smallest possible maximum degree of a spanning forest of $G$. Node-differentially private algorithms are known only for a small number of database analysis tasks. A major obstacle for designing such an algorithm for the number of connected components is that this graph statistic is not robust to adding one node with arbitrary connections (a change that node-differential privacy is designed to hide): *every* graph is a neighbor of a connected graph. We overcome this by designing a family of efficiently computable Lipschitz extensions of the number of connected components or, equivalently, the size of a spanning forest. The construction of the extensions, which is at the core of our algorithm, is based on the forest polytope of $G$. We prove several combinatorial facts about spanning forests, in particular, that a graph with no induced $\Delta$-stars has a spanning forest of degree at most $\Delta$. With this fact, we show that our Lipschitz extensions for the number of connected components equal the true value of the function for the largest possible monotone families of graphs. More generally, on all monotone sets of graphs, the $\ell_\infty$ error of our Lipschitz extensions is nearly optimal.
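
Once a small sensitivity is justified (which is exactly what the Lipschitz extension buys), releasing the count reduces to the generic Laplace mechanism, sketched here (illustrative; naively, one added node can merge every component, so the raw node sensitivity would be huge):

```python
import numpy as np
import networkx as nx

def noisy_component_count(g, sensitivity, epsilon, rng=None):
    """Laplace mechanism on the number of connected components.
    `sensitivity` must bound the change of the released statistic
    between node-neighboring graphs -- the hard part this paper solves."""
    rng = rng or np.random.default_rng()
    true_count = nx.number_connected_components(g)
    return true_count + rng.laplace(scale=sensitivity / epsilon)
```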

Localizing Model Behavior with Path Patching

  • Authors: Nicholas Goldowsky-Dill, Chris MacLeod, Lucas Sato, Aryaman Arora
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.05969
  • Pdf link: https://arxiv.org/pdf/2304.05969
  • Abstract
    Localizing behaviors of neural networks to a subset of the network's components or a subset of interactions between components is a natural first step towards analyzing network mechanisms and possible failure modes. Existing work is often qualitative and ad-hoc, and there is no consensus on the appropriate way to evaluate localization claims. We introduce path patching, a technique for expressing and quantitatively testing a natural class of hypotheses expressing that behaviors are localized to a set of paths. We refine an explanation of induction heads, characterize a behavior of GPT-2, and open source a framework for efficiently running similar experiments.
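
A simpler cousin of path patching, single-component activation patching, can be sketched with forward hooks (illustrative; path patching proper intervenes only on specific paths between components, not a component's entire downstream effect):

```python
import torch

def patch_component(model, layer, clean_x, corrupt_x):
    """Run `corrupt_x`, cache `layer`'s activation, then re-run
    `clean_x` with that activation swapped in (inputs must share shapes)."""
    cache = {}

    def save(_module, _inp, out):
        cache["act"] = out.detach()

    def swap(_module, _inp, _out):
        return cache["act"]  # returning a value replaces the output

    handle = layer.register_forward_hook(save)
    with torch.no_grad():
        model(corrupt_x)          # record the corrupted activation
    handle.remove()

    handle = layer.register_forward_hook(swap)
    with torch.no_grad():
        patched = model(clean_x)  # clean run, one activation patched
    handle.remove()
    return patched  # compare against model(clean_x) to localize behavior
```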

HiPrompt: Few-Shot Biomedical Knowledge Fusion via Hierarchy-Oriented Prompting

  • Authors: Jiaying Lu, Jiaming Shen, Bo Xiong, Wenjing Ma, Steffen Staab, Carl Yang
  • Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.05973
  • Pdf link: https://arxiv.org/pdf/2304.05973
  • Abstract
    Medical decision-making processes can be enhanced by comprehensive biomedical knowledge bases, which require fusing knowledge graphs constructed from different sources via a uniform index system. The index system often organizes biomedical terms in a hierarchy to provide the aligned entities with fine-grained granularity. To address the challenge of scarce supervision in the biomedical knowledge fusion (BKF) task, researchers have proposed various unsupervised methods. However, these methods heavily rely on ad-hoc lexical and structural matching algorithms, which fail to capture the rich semantics conveyed by biomedical entities and terms. Recently, neural embedding models have proved effective in semantic-rich tasks, but they rely on sufficient labeled data to be adequately trained. To bridge the gap between the scarce-labeled BKF and neural embedding models, we propose HiPrompt, a supervision-efficient knowledge fusion framework that elicits the few-shot reasoning ability of large language models through hierarchy-oriented prompts. Empirical results on the collected KG-Hi-BKF benchmark datasets demonstrate the effectiveness of HiPrompt.

GPr-Net: Geometric Prototypical Network for Point Cloud Few-Shot Learning

  • Authors: Tejas Anvekar, Dena Bazazian
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06007
  • Pdf link: https://arxiv.org/pdf/2304.06007
  • Abstract
    In the realm of 3D-computer vision applications, point cloud few-shot learning plays a critical role. However, it poses an arduous challenge due to the sparsity, irregularity, and unordered nature of the data. Current methods rely on complex local geometric extraction techniques such as convolution, graph, and attention mechanisms, along with extensive data-driven pre-training tasks. These approaches contradict the fundamental goal of few-shot learning, which is to facilitate efficient learning. To address this issue, we propose GPr-Net (Geometric Prototypical Network), a lightweight and computationally efficient geometric prototypical network that captures the intrinsic topology of point clouds and achieves superior performance. Our proposed method, IGI++ (Intrinsic Geometry Interpreter++) employs vector-based hand-crafted intrinsic geometry interpreters and Laplace vectors to extract and evaluate point cloud morphology, resulting in improved representations for FSL (Few-Shot Learning). Additionally, Laplace vectors enable the extraction of valuable features from point clouds with fewer points. To tackle the distribution drift challenge in few-shot metric learning, we leverage hyperbolic space and demonstrate that our approach handles intra and inter-class variance better than existing point cloud few-shot learning methods. Experimental results on the ModelNet40 dataset show that GPr-Net outperforms state-of-the-art methods in few-shot learning on point clouds, achieving utmost computational efficiency that is $170\times$ better than all existing works. The code is publicly available at https://github.com/TejasAnvekar/GPr-Net.

An Improved Heart Disease Prediction Using Stacked Ensemble Method

  • Authors: Md. Maidul Islam, Tanzina Nasrin Tania, Sharmin Akter, Kazi Hassan Shakib
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06015
  • Pdf link: https://arxiv.org/pdf/2304.06015
  • Abstract
    Heart disorder has just overtaken cancer as the world's biggest cause of mortality. Cardiac failures, heart disease mortality, and diagnostic costs can all be reduced with early identification and treatment. Medical data is collected in large quantities by the healthcare industry, but it is not well mined. The discovery of previously unknown patterns and connections in this information can support improved decision-making when forecasting heart disorder risk. In the proposed study, we constructed an ML-based diagnostic system for heart illness forecasting, using a heart disorder dataset. We used data preprocessing techniques like outlier detection and removal, checking and removing missing entries, feature normalization, and cross-validation; nine classification algorithms, namely RF, MLP, KNN, ETC, XGB, SVC, ADB, DT, and GBM; and eight performance metrics, namely classification accuracy, precision, F1 score, specificity, ROC, sensitivity, log-loss, and Matthews' correlation coefficient. Our method can easily differentiate between people who have cardiac disease and those who are normal. Receiver operating characteristic curves and the areas under the curves were determined for every classifier. Most of the classifiers, pretreatment strategies, validation methods, and performance assessment metrics for classification models are discussed in this study. The performance of the proposed scheme has been confirmed, utilizing all of its capabilities. In this work, the impact of clinical decision support systems was evaluated using a stacked ensemble approach that included these nine algorithms.
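
For reference, a stacked ensemble of this kind is a few lines in scikit-learn (three of the nine base learners shown; dataset loading, outlier removal, and the full metric suite omitted):

```python
from sklearn.ensemble import (StackingClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

stack = make_pipeline(
    StandardScaler(),  # feature normalization, as in the study
    StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=200)),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
            ("gbm", GradientBoostingClassifier()),
        ],
        final_estimator=LogisticRegression(),
        cv=5,  # out-of-fold predictions feed the meta-learner
    ),
)
# stack.fit(X_train, y_train); stack.score(X_test, y_test)
```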

RECLIP: Resource-efficient CLIP by Training with Small Images

  • Authors: Runze Li, Dahun Kim, Bir Bhanu, Weicheng Kuo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06028
  • Pdf link: https://arxiv.org/pdf/2304.06028
  • Abstract
    We present RECLIP (Resource-efficient CLIP), a simple method that minimizes computational resource footprint for CLIP (Contrastive Language Image Pretraining). Inspired by the notion of coarse-to-fine in computer vision, we leverage small images to learn from large-scale language supervision efficiently, and finetune the model with high-resolution data in the end. Since the complexity of the vision transformer heavily depends on input image size, our approach significantly reduces the training resource requirements both in theory and in practice. Using the same batch size and training epoch, RECLIP achieves highly competitive zero-shot classification and image text retrieval accuracy with 6 to 8$\times$ less computational resources and 7 to 9$\times$ fewer FLOPs than the baseline. Compared to the state-of-the-art contrastive learning methods, RECLIP demonstrates 5 to 59$\times$ training resource savings while maintaining highly competitive zero-shot classification and retrieval performance. We hope this work will pave the path for the broader research community to explore language supervised pretraining in more resource-friendly settings.

Keyword: faster

Efficient Automation of Neural Network Design: A Survey on Differentiable Neural Architecture Search

  • Authors: Alexandre Heuillet, Ahmad Nasser, Hichem Arioui, Hedi Tabia
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05405
  • Pdf link: https://arxiv.org/pdf/2304.05405
  • Abstract
    In the past few years, Differentiable Neural Architecture Search (DNAS) rapidly imposed itself as the trending approach to automate the discovery of deep neural network architectures. This rise is mainly due to the popularity of DARTS, one of the first major DNAS methods. In contrast with previous works based on Reinforcement Learning or Evolutionary Algorithms, DNAS is faster by several orders of magnitude and uses fewer computational resources. In this comprehensive survey, we focus specifically on DNAS and review recent approaches in this field. Furthermore, we propose a novel challenge-based taxonomy to classify DNAS methods. We also discuss the contributions brought to DNAS in the past few years and its impact on the global NAS field. Finally, we conclude by giving some insights into future research directions for the DNAS field.

Probabilistic Reasoning at Scale: Trigger Graphs to the Rescue

  • Authors: Efthymia Tsamoura, Jaehun Lee, Jacopo Urbani
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.05459
  • Pdf link: https://arxiv.org/pdf/2304.05459
  • Abstract
    The role of uncertainty in data management has become more prominent than ever before, especially because of the growing importance of machine learning-driven applications that produce large uncertain databases. A well-known approach to querying such databases is to blend rule-based reasoning with uncertainty. However, techniques proposed so far struggle with large databases. In this paper, we address this problem by presenting a new technique for probabilistic reasoning that exploits Trigger Graphs (TGs) -- a notion recently introduced for the non-probabilistic setting. The intuition is that TGs can effectively store a probabilistic model by avoiding an explicit materialization of the lineage and by grouping together similar derivations of the same fact. Firstly, we show how TGs can be adapted to support the possible world semantics. Then, we describe techniques for efficiently computing a probabilistic model, and formally establish the correctness of our approach. We also present an extensive empirical evaluation using a prototype called LTGs. Our comparison against other leading engines shows that LTGs is not only faster, even against approximate reasoning techniques, but can also reason over probabilistic databases that existing engines cannot scale to.

Black Box Variational Inference with a Deterministic Objective: Faster, More Accurate, and Even More Black Box

  • Authors: Ryan Giordano, Martin Ingram, Tamara Broderick
  • Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.05527
  • Pdf link: https://arxiv.org/pdf/2304.05527
  • Abstract
    Automatic differentiation variational inference (ADVI) offers fast and easy-to-use posterior approximation in multiple modern probabilistic programming languages. However, its stochastic optimizer lacks clear convergence criteria and requires tuning parameters. Moreover, ADVI inherits the poor posterior uncertainty estimates of mean-field variational Bayes (MFVB). We introduce "deterministic ADVI" (DADVI) to address these issues. DADVI replaces the intractable MFVB objective with a fixed Monte Carlo approximation, a technique known in the stochastic optimization literature as the "sample average approximation" (SAA). By optimizing an approximate but deterministic objective, DADVI can use off-the-shelf second-order optimization, and, unlike standard mean-field ADVI, is amenable to more accurate posterior linear response (LR) covariance estimates. In contrast to existing worst-case theory, we show that, on certain classes of common statistical problems, DADVI and the SAA can perform well with relatively few samples even in very high dimensions, though we also show that such favorable results cannot extend to variational approximations that are too expressive relative to mean-field ADVI. We show on a variety of real-world problems that DADVI reliably finds good solutions with default settings (unlike ADVI) and, together with LR covariances, is typically faster and more accurate than standard ADVI.
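
The SAA trick is compact enough to sketch: fix the Monte Carlo draws once, and the ELBO becomes an ordinary deterministic objective (a minimal illustration with assumed names; not the authors' implementation):

```python
import numpy as np
from scipy.optimize import minimize

def make_saa_objective(log_joint, dim, n_draws=30, seed=0):
    """Mean-field Gaussian q = N(mu, diag(exp(2*log_sigma))) fit by a
    Monte Carlo ELBO whose standard-normal draws are FIXED up front."""
    z = np.random.default_rng(seed).standard_normal((n_draws, dim))

    def neg_elbo(params):
        mu, log_sigma = params[:dim], params[dim:]
        theta = mu + np.exp(log_sigma) * z              # reparameterized draws
        expected_log_joint = np.mean([log_joint(t) for t in theta])
        entropy = np.sum(log_sigma)                     # + const, dropped
        return -(expected_log_joint + entropy)

    return neg_elbo

# Deterministic, so off-the-shelf (quasi-)second-order optimizers apply:
obj = make_saa_objective(lambda t: -0.5 * np.sum(t**2), dim=2)
res = minimize(obj, np.zeros(4), method="BFGS")
```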

Zoom is what you need: An empirical study of the power of zoom and spatial biases in image classification

  • Authors: Mohammad Reza Taesiri, Giang Nguyen, Sarra Habchi, Cor-Paul Bezemer, Anh Nguyen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05538
  • Pdf link: https://arxiv.org/pdf/2304.05538
  • Abstract
    Image classifiers are information-discarding machines, by design. Yet, how these models discard information remains mysterious. We hypothesize that one way for image classifiers to reach high accuracy is to first zoom to the most discriminative region in the image and then extract features from there to predict image labels. We study six popular networks ranging from AlexNet to CLIP and find that proper framing of the input image can lead to the correct classification of 98.91% of ImageNet images. Furthermore, we explore the potential and limits of zoom transforms in image classification and uncover positional biases in various datasets, especially a strong center bias in two popular datasets: ImageNet-A and ObjectNet. Finally, leveraging our insights into the potential of zoom, we propose a state-of-the-art test-time augmentation (TTA) technique that improves classification accuracy by forcing models to explicitly perform zoom-in operations before making predictions. Our method is more interpretable, accurate, and faster than MEMO, a state-of-the-art TTA method. Additionally, we propose ImageNet-Hard, a new benchmark where zooming in alone often does not help state-of-the-art models better label images.
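
The zoom-based TTA idea can be sketched as averaging predictions over center crops of increasing zoom (a simplification; the paper's method is more careful than plain averaging, and the crop fractions here are made up):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zoom_tta_predict(model, image, crop_fracs=(1.0, 0.8, 0.6), size=224):
    """`image` is a (3, H, W) tensor; returns averaged class probabilities."""
    probs = []
    _, h, w = image.shape
    for frac in crop_fracs:
        ch, cw = int(h * frac), int(w * frac)
        top, left = (h - ch) // 2, (w - cw) // 2          # center crop
        crop = image[:, top:top + ch, left:left + cw]
        crop = F.interpolate(crop.unsqueeze(0), size=(size, size),
                             mode="bilinear", align_corners=False)
        probs.append(model(crop).softmax(dim=-1))
    return torch.stack(probs).mean(dim=0)
```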

An Optimal SVC Bitstream Schema for Viewport-dependent 360-degree Video Streaming

  • Authors: Gang Shen, Mingyang Ma, Guangxin Xu
  • Subjects: Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2304.05654
  • Pdf link: https://arxiv.org/pdf/2304.05654
  • Abstract
    To deliver ultra-high resolution 360-degree video (such as 8K, 12K, or even higher) across the internet, viewport-dependent streaming becomes necessary to save bandwidth. During viewport switches, clients and servers will instantly exchange coordination info and contents for the given viewports. However, those viewport switches pose a serious challenge for video encoding because the temporal dependency between contents within changing viewports is unpredictable. In existing practices, it is commonly noted that GOP (Group of Pictures) size in a bitstream intrinsically prohibits the reduction of the viewport switch latency, such as Motion-to-photon (MTP) latency, or motion-to-high-quality (MTHQ) latency. In this paper, we presented a Scalable Video Coding (SVC) based bitstream schema, which can structurally remove the impacts of GOP in viewport-dependent streaming and provide instant viewport switches within one-frame time (the best possible). In addition, combined with tiling, this new coding schema allows an efficient packing of the non-adjacent regions within a viewport of 360-degree video. Our experiments also show that the overall encoding with this SVC-based approach is faster than with multi-stream approaches. Compared with current 360-degree video streaming solutions based on MPEG-I OMAF, our approach is superior in terms of viewport switch latency, simplicity of viewport packing, and encoding performance.

Factorized Inverse Path Tracing for Efficient and Accurate Material-Lighting Estimation

  • Authors: Liwen Wu, Rui Zhu, Mustafa B. Yaldiz, Yinhao Zhu, Hong Cai, Janarbek Matai, Fatih Porikli, Tzu-Mao Li, Manmohan Chandraker, Ravi Ramamoorthi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.05669
  • Pdf link: https://arxiv.org/pdf/2304.05669
  • Abstract
    Inverse path tracing has recently been applied to joint material and lighting estimation, given geometry and multi-view HDR observations of an indoor scene. However, it has two major limitations: path tracing is expensive to compute, and ambiguities exist between reflection and emission. We propose a novel Factorized Inverse Path Tracing (FIPT) method which utilizes a factored light transport formulation and finds emitters driven by rendering errors. Our algorithm enables accurate material and lighting optimization faster than previous work, and is more effective at resolving ambiguities. The exhaustive experiments on synthetic scenes show that our method (1) outperforms state-of-the-art indoor inverse rendering and relighting methods particularly in the presence of complex illumination effects; (2) speeds up inverse path tracing optimization to less than an hour. We further demonstrate robustness to noisy inputs through material and lighting estimates that allow plausible relighting in a real scene. The source code is available at: https://github.com/lwwu2/fipt

Real-time Trajectory-based Social Group Detection

  • Authors: Simindokht Jahangard, Munawar Hayat, Hamid Rezatofighi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05678
  • Pdf link: https://arxiv.org/pdf/2304.05678
  • Abstract
    Social group detection is a crucial aspect of various robotic applications, including robot navigation and human-robot interactions. To date, a range of model-based techniques have been employed to address this challenge, such as the F-formation and trajectory similarity frameworks. However, these approaches often fail to provide reliable results in crowded and dynamic scenarios. Recent advancements in this area have mainly focused on learning-based methods, such as deep neural networks that use visual content or human pose. Although visual content-based methods have demonstrated promising performance on large-scale datasets, their computational complexity poses a significant barrier to their practical use in real-time applications. To address these issues, we propose a simple and efficient framework for social group detection. Our approach explores the impact of motion trajectory on social grouping and utilizes a novel, reliable, and fast data-driven method. We formulate the individuals in a scene as a graph, where the nodes are represented by LSTM-encoded trajectories and the edges are defined by the distances between each pair of tracks. Our framework employs a modified graph transformer module and graph clustering losses to detect social groups. Our experiments on the popular JRDBAct dataset reveal noticeable improvements in performance, with relative improvements ranging from 2% to 11%. Furthermore, our framework is significantly faster, with up to 12x faster inference times compared to state-of-the-art methods under the same computation resources. These results demonstrate that our proposed method is suitable for real-time robotic applications.

Cost-damage analysis of attack trees

  • Authors: Milan Lopuhaä-Zwakenberg, Mariëlle Stoelinga
  • Subjects: Cryptography and Security (cs.CR); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.05812
  • Pdf link: https://arxiv.org/pdf/2304.05812
  • Abstract
    Attack trees (ATs) are a widely deployed modelling technique to categorize potential attacks on a system. An attacker of such a system aims at doing as much damage as possible, but might be limited by a cost budget. The maximum possible damage for a given cost budget is an important security metric of a system. In this paper, we find the maximum damage given a cost budget by modelling this problem with ATs, both in deterministic and probabilistic settings. We show that the general problem is NP-complete, and provide heuristics to solve it. For general ATs these are based on integer linear programming. However when the AT is tree-structured, then one can instead use a faster bottom-up approach. We also extend these methods to other problems related to the cost-damage tradeoff, such as the cost-damage Pareto front.
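
The tree-structured bottom-up idea can be sketched as a small dynamic program (deliberately simplified: deterministic setting, integer costs, additive damage; the paper's models and ILP formulation are richer):

```python
def damage_table(node, budget):
    """t[b] = max damage achievable with cost budget b.
    Nodes: ('leaf', cost, damage), ('or', [children]), ('and', [children])."""
    if node[0] == 'leaf':
        _, cost, dmg = node
        return [dmg if cost <= b else 0 for b in range(budget + 1)]
    tables = [damage_table(c, budget) for c in node[1]]
    if node[0] == 'or':   # attack the single best child per budget level
        return [max(t[b] for t in tables) for b in range(budget + 1)]
    # 'and': distribute the budget across children (max-plus convolution)
    combined = tables[0]
    for t in tables[1:]:
        combined = [max(combined[i] + t[b - i] for i in range(b + 1))
                    for b in range(budget + 1)]
    return combined

tree = ('or', [('leaf', 3, 5), ('leaf', 1, 2)])
print(damage_table(tree, 4))  # [0, 2, 2, 5, 5]
```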

Keyword: mobile

DOSM: Demand-Prediction based Online Service Management for Vehicular Edge Computing Networks

  • Authors: Anum Talpur, Mohan Gurusamy
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.05637
  • Pdf link: https://arxiv.org/pdf/2304.05637
  • Abstract
    In this work, we investigate an online service management problem in vehicular edge computing networks. To satisfy the varying service demands of mobile vehicles, a service management framework is required to make decisions on the service lifecycle to maintain good network performance. The service lifecycle consists of creating an instance of a given service (scale-out), moving an instance to a different edge node (migration), and/or terminating an underutilized instance (scale-in). In this paper, we propose an efficient online algorithm to perform service management in each time slot, where performance quality in the current time slot, the service demand in future time slots, the minimal delay observed by vehicles, and the minimal migration delay are considered while making decisions on the service lifecycle. Here, the future service demand is computed from a gated recurrent unit (GRU)-based prediction model, and the network performance quality is estimated using a deep reinforcement learning (DRL) model which has the ability to interact with the vehicular environment in real time. The choice of optimal edge location to deploy a service instance at different times is based on our proposed optimization formulations. Simulation experiments using real-world vehicle trajectories are carried out to evaluate the performance of our proposed demand-prediction based online service management (DOSM) framework against different state-of-the-art solutions using several performance metrics.

5Greplay: a 5G Network Traffic Fuzzer -- Application to Attack Injection

  • Authors: Zujany Salazar, Huu Nghia Nguyen, Wissam Mallouli, Ana R Cavalli, Edgardo Montes de Oca
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.05719
  • Pdf link: https://arxiv.org/pdf/2304.05719
  • Abstract
    The fifth generation of mobile broadband is more than just an evolution to provide more mobile bandwidth, massive machine-type communications, and ultra-reliable and low-latency communications. It relies on a complex, dynamic and heterogeneous environment that implies addressing numerous testing and security challenges. In this paper we present 5Greplay, an open-source 5G network traffic fuzzer that enables the evaluation of 5G components by replaying and modifying 5G network traffic by creating and injecting network scenarios into a target that can be a 5G core service (e.g., AMF, SMF) or a RAN network (e.g., gNodeB). The tool provides the ability to alter network packets online or offline in both control and data planes in a very flexible manner. The experimental evaluation conducted against open-source based 5G platforms, showed that the target services accept traffic being altered by the tool, and that it can reach up to 9.56 Gbps using only 1 processor core to replay 5G traffic.

Stand-Up Indulgent Gathering on Lines

  • Authors: Quentin Bramas (ICube, ICUBE-Réseaux, UNISTRA), Sayaka Kamei, Anissa Lamani (ICube, ICUBE-Réseaux, UNISTRA), Sébastien Tixeuil (SU)
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.05722
  • Pdf link: https://arxiv.org/pdf/2304.05722
  • Abstract
    We consider a variant of the crash-fault gathering problem called stand-up indulgent gathering (SUIG). In this problem, a group of mobile robots must eventually gather at a single location, which is not known in advance. If no robots crash, they must all meet at the same location. However, if one or more robots crash at a single location, all non-crashed robots must eventually gather at that location. The SUIG problem was first introduced for robots operating in a two-dimensional continuous Euclidean space, with most solutions relying on the ability of robots to move a prescribed (real) distance at each time instant. In this paper, we investigate the SUIG problem for robots operating in a discrete universe (i.e., a graph) where they can only move one unit of distance (i.e., to an adjacent node) at each time instant. Specifically, we focus on line-shaped networks and characterize the solvability of the SUIG problem for oblivious robots without multiplicity detection.

Fast vehicle detection algorithm based on lightweight YOLO7-tiny

  • Authors: Bo Li, YiHua Chen, Hao Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06002
  • Pdf link: https://arxiv.org/pdf/2304.06002
  • Abstract
    The swift and precise detection of vehicles is of significant research importance in intelligent transportation systems (ITS). However, current vehicle detection algorithms encounter challenges such as high computational complexity, low detection rate, and limited feasibility on mobile devices. To address these issues, this paper proposes a lightweight vehicle detection algorithm for YOLOv7-tiny called Ghost-YOLOv7. The model first scales the width multiple to 0.5 and replaces the standard convolution of the backbone network with Ghost convolution to achieve a lighter network and improve the detection speed; secondly, a Ghost bi-directional feature pyramid network (Ghost-BiFPN) neck network is designed to enhance the feature extraction capability of the algorithm and enrich semantic information; thirdly, a Ghost Decoupled Head (GDH) is employed for accurate prediction of vehicle location and class, enhancing model accuracy; finally, a coordinate attention mechanism is introduced in the output layer to suppress environmental interference, and the WIoU loss function is employed to further enhance the detection accuracy. Experimental results on the PASCAL VOC dataset demonstrate that Ghost-YOLOv7 outperforms the original YOLOv7-tiny model, achieving a 29.8% reduction in computation, 37.3% reduction in the number of parameters, 35.1% reduction in model weights, and 1.1% higher mean average precision (mAP), while achieving a detection speed of 428 FPS. These results validate the effectiveness of the proposed method.
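
For reference, the Ghost convolution that this work swaps in can be sketched as follows (after the GhostNet design; hyperparameters here are illustrative, not Ghost-YOLOv7's exact configuration):

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Produce half the output channels with a regular convolution and
    the other half with a cheap depthwise convolution on those features."""
    def __init__(self, in_ch, out_ch, k=1, dw_k=5):
        super().__init__()
        init_ch = out_ch // 2
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(init_ch), nn.SiLU(),
        )
        self.cheap = nn.Sequential(  # depthwise "ghost" feature generator
            nn.Conv2d(init_ch, init_ch, dw_k, padding=dw_k // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(init_ch), nn.SiLU(),
        )

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)  # out_ch channels total
```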

Keyword: pruning

Distilling Token-Pruned Pose Transformer for 2D Human Pose Estimation

  • Authors: Feixiang Ren
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05548
  • Pdf link: https://arxiv.org/pdf/2304.05548
  • Abstract
    Human pose estimation has seen widespread use of transformer models in recent years. Pose transformers benefit from the self-attention map, which captures the correlation between human joint tokens and the image. However, training such models is computationally expensive. The recent token-Pruned Pose Transformer (PPT) solves this problem by pruning the background tokens of the image, which are usually less informative. However, although it improves efficiency, PPT inevitably leads to worse performance than TokenPose due to the pruning of tokens. To overcome this problem, we present a novel method called Distilling Pruned-Token Transformer for human pose estimation (DPPT). Our method leverages the output of a pre-trained TokenPose to supervise the learning process of PPT. We also establish connections between the internal structure of pose transformers and PPT, such as attention maps and joint features. Our experimental results on the MPII datasets show that our DPPT can significantly improve PCK compared to previous PPT models while still reducing computational complexity.
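
The distillation objective can be sketched as a weighted sum of task, output, and structure terms (hypothetical weights and term choices; the paper's exact DPPT losses may differ):

```python
import torch.nn.functional as F

def distill_loss(student_heatmaps, teacher_heatmaps, gt_heatmaps,
                 student_attn=None, teacher_attn=None, alpha=0.5, beta=0.1):
    """Supervise the pruned-token student (PPT) with ground truth plus
    the pre-trained TokenPose teacher's heatmaps and attention maps."""
    loss = F.mse_loss(student_heatmaps, gt_heatmaps)                   # task
    loss = loss + alpha * F.mse_loss(student_heatmaps, teacher_heatmaps)
    if student_attn is not None and teacher_attn is not None:
        loss = loss + beta * F.mse_loss(student_attn, teacher_attn)    # structure
    return loss
```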

Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series

  • Authors: Matthieu Herrmann, Chang Wei Tan, Mahsa Salehi, Geoffrey I. Webb
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05800
  • Pdf link: https://arxiv.org/pdf/2304.05800
  • Abstract
    Time series classification (TSC) is a challenging task due to the diversity of types of feature that may be relevant for different classification tasks, including trends, variance, frequency, magnitude, and various patterns. To address this challenge, several alternative classes of approach have been developed, including similarity-based, features and intervals, shapelets, dictionary, kernel, neural network, and hybrid approaches. While kernel, neural network, and hybrid approaches perform well overall, some specialized approaches are better suited for specific tasks. In this paper, we propose a new similarity-based classifier, Proximity Forest version 2.0 (PF 2.0), which outperforms previous state-of-the-art similarity-based classifiers across the UCR benchmark and outperforms state-of-the-art kernel, neural network, and hybrid methods on specific datasets in the benchmark that are best addressed by similarity-based methods. PF 2.0 incorporates three recent advances in time series similarity measures -- (1) computationally efficient early abandoning and pruning to speed up elastic similarity computations; (2) a new elastic similarity measure, Amerced Dynamic Time Warping (ADTW); and (3) cost function tuning. It rationalizes the set of similarity measures employed, reducing the eight base measures of the original PF to three and using the first derivative transform with all similarity measures, rather than a limited subset. We have implemented both PF 1.0 and PF 2.0 in a single C++ framework, making the PF framework more efficient.

Keyword: voxel

There is no result

Keyword: lidar

SceneCalib: Automatic Targetless Calibration of Cameras and Lidars in Autonomous Driving

  • Authors: Ayon Sen, Gang Pan, Anton Mitrokhin, Ashraful Islam
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05530
  • Pdf link: https://arxiv.org/pdf/2304.05530
  • Abstract
    Accurate camera-to-lidar calibration is a requirement for sensor data fusion in many 3D perception tasks. In this paper, we present SceneCalib, a novel method for simultaneous self-calibration of extrinsic and intrinsic parameters in a system containing multiple cameras and a lidar sensor. Existing methods typically require specially designed calibration targets and human operators, or they only attempt to solve for a subset of calibration parameters. We resolve these issues with a fully automatic method that requires no explicit correspondences between camera images and lidar point clouds, allowing for robustness to many outdoor environments. Furthermore, the full system is jointly calibrated with explicit cross-camera constraints to ensure that camera-to-camera and camera-to-lidar extrinsic parameters are consistent.

WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language

  • Authors: Zhenxiang Lin, Xidong Peng, Peishan Cong, Yuenan Hou, Xinge Zhu, Sibei Yang, Yuexin Ma
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05645
  • Pdf link: https://arxiv.org/pdf/2304.05645
  • Abstract
    We introduce the task of 3D visual grounding in large-scale dynamic scenes based on natural linguistic descriptions and online captured multi-modal visual data, including 2D images and 3D LiDAR point clouds. We present a novel method, WildRefer, for this task by fully utilizing the appearance features in images, the location and geometry features in point clouds, and the dynamic features in consecutive input frames to match the semantic features in language. In particular, we propose two novel datasets, STRefer and LifeRefer, which focus on large-scale human-centric daily-life scenarios with abundant 3D object and natural language annotations. Our datasets are significant for the research of 3D visual grounding in the wild and have huge potential to boost the development of autonomous driving and service robots. Extensive comparisons and ablation studies illustrate that our method achieves state-of-the-art performance on the two proposed datasets. Code and dataset will be released when the paper is published.

Keyword: diffusion

CamDiff: Camouflage Image Augmentation via Diffusion Model

  • Authors: Xue-Jing Luo, Shuo Wang, Zongwei Wu, Christos Sakaridis, Yun Cheng, Deng-Ping Fan, Luc Van Gool
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05469
  • Pdf link: https://arxiv.org/pdf/2304.05469
  • Abstract
    The burgeoning field of camouflaged object detection (COD) seeks to identify objects that blend into their surroundings. Despite the impressive performance of recent models, we have identified a limitation in their robustness, where existing methods may misclassify salient objects as camouflaged ones, despite these two characteristics being contradictory. This limitation may stem from lacking multi-pattern training images, leading to less saliency robustness. To address this issue, we introduce CamDiff, a novel approach inspired by AI-Generated Content (AIGC) that overcomes the scarcity of multi-pattern training images. Specifically, we leverage the latent diffusion model to synthesize salient objects in camouflaged scenes, while using the zero-shot image classification ability of the Contrastive Language-Image Pre-training (CLIP) model to prevent synthesis failures and ensure the synthesized object aligns with the input prompt. Consequently, the synthesized image retains its original camouflage label while incorporating salient objects, yielding camouflage samples with richer characteristics. The results of user studies show that the salient objects in the scenes synthesized by our framework attract the user's attention more; thus, such samples pose a greater challenge to the existing COD models. Our approach enables flexible editing and efficient large-scale dataset generation at a low cost. It significantly enhances COD baselines' training and testing phases, emphasizing robustness across diverse domains. Our newly-generated datasets and source code are available at https://github.com/drlxj/CamDiff.

Improving Diffusion Models for Scene Text Editing with Dual Encoders

  • Authors: Jiabao Ji, Guanhua Zhang, Zhaowen Wang, Bairu Hou, Zhifei Zhang, Brian Price, Shiyu Chang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05568
  • Pdf link: https://arxiv.org/pdf/2304.05568
  • Abstract
    Scene text editing is a challenging task that involves modifying or inserting specified texts in an image while maintaining its natural and realistic appearance. Most previous approaches to this task rely on style-transfer models that crop out text regions and feed them into image transfer models, such as GANs. However, these methods are limited in their ability to change text style and are unable to insert texts into images. Recent advances in diffusion models have shown promise in overcoming these limitations with text-conditional image editing. However, our empirical analysis reveals that state-of-the-art diffusion models struggle with rendering correct text and controlling text style. To address these problems, we propose DIFFSTE to improve pre-trained diffusion models with a dual encoder design, which includes a character encoder for better text legibility and an instruction encoder for better style control. An instruction tuning framework is introduced to train our model to learn the mapping from the text instruction to the corresponding image with either the specified style or the style of the surrounding texts in the background. Such a training method further brings our method the zero-shot generalization ability to the following three scenarios: generating text with unseen font variation, e.g., italic and bold, mixing different fonts to construct a new font, and using more relaxed forms of natural language as the instructions to guide the generation task. We evaluate our approach on five datasets and demonstrate its superior performance in terms of text correctness, image naturalness, and style controllability. Our code is publicly available. https://github.com/UCSB-NLP-Chang/DiffSTE

InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions

  • Authors: Han Liang, Wenqian Zhang, Wenxuan Li, Jingyi Yu, Lan Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05684
  • Pdf link: https://arxiv.org/pdf/2304.05684
  • Abstract
    We have recently seen tremendous progress in diffusion-based methods for generating realistic human motions. Yet, they largely disregard the rich multi-human interactions. In this paper, we present InterGen, an effective diffusion-based approach that incorporates human-to-human interactions into the motion diffusion process, which enables non-expert users to customize high-quality two-person interaction motions, with only text guidance. We first contribute a multimodal dataset, named InterHuman. It consists of about 107M frames for diverse two-person interactions, with accurate skeletal motions and 16,756 natural language descriptions. On the algorithm side, we carefully tailor the motion diffusion model to our two-person interaction setting. To handle the symmetry of human identities during interactions, we propose two cooperative transformer-based denoisers that explicitly share weights, with a mutual attention mechanism to further connect the two denoising processes. Then, we propose a novel representation for motion input in our interaction diffusion model, which explicitly formulates the global relations between the two performers in the world frame. We further introduce two novel regularization terms to encode spatial relations, equipped with a corresponding damping scheme during the training of our interaction diffusion model. Extensive experiments validate the effectiveness and generalizability of InterGen. Notably, it can generate more diverse and compelling two-person motions than previous methods and enables various downstream applications for human interactions.

Exploring Diffusion Models for Unsupervised Video Anomaly Detection

  • Authors: Anil Osman Tur, Nicola Dall'Asen, Cigdem Beyan, Elisa Ricci
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05841
  • Pdf link: https://arxiv.org/pdf/2304.05841
  • Abstract
    This paper investigates the performance of diffusion models for video anomaly detection (VAD) within the most challenging but also the most operational scenario in which the data annotations are not used. Because abnormal events are sparse, diverse, contextual, and often ambiguous, detecting them precisely is a very ambitious task. To this end, we rely only on the information-rich spatio-temporal data, and the reconstruction power of the diffusion models such that a high reconstruction error is utilized to decide the abnormality. Experiments performed on two large-scale video anomaly detection datasets demonstrate the consistent improvement of the proposed method over the state-of-the-art generative models, while in some cases our method achieves better scores than the more complex models. This is the first study using a diffusion model and examining its parameters' influence to present guidance for VAD in surveillance scenarios.
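
A minimal sketch of the reconstruction-error recipe, assuming a pretrained noise-prediction model `denoiser` and a precomputed `alphas_cumprod` schedule (both hypothetical names); the single-step reconstruction below is one common way to realize it, not necessarily the authors' exact procedure.

```python
import torch

@torch.no_grad()
def anomaly_score(clip, denoiser, alphas_cumprod, t=250):
    """Score a clip by diffusion reconstruction error: noise the input to
    step t, denoise it back, and use the error as the abnormality score
    (high error suggests an anomalous event)."""
    a_bar = alphas_cumprod[t]
    eps = torch.randn_like(clip)
    noisy = a_bar.sqrt() * clip + (1 - a_bar).sqrt() * eps         # forward
    eps_hat = denoiser(noisy, t)                                   # predict noise
    recon = (noisy - (1 - a_bar).sqrt() * eps_hat) / a_bar.sqrt()  # invert
    return (recon - clip).pow(2).mean().item()
```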

A quadrature scheme for steady-state diffusion equations involving fractional power of regularly accretive operator

  • Authors: Beiping Duan, Zongze Yang
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.05848
  • Pdf link: https://arxiv.org/pdf/2304.05848
  • Abstract
    In this paper, we construct a quadrature scheme to numerically solve the nonlocal diffusion equation $(\mathcal{A}^\alpha+b\mathcal{I})u=f$ with $\mathcal{A}^\alpha$ the $\alpha$-th power of the regularly accretive operator $\mathcal{A}$. Rigorous error analysis is carried out and sharp error bounds (up to some negligible constants) are obtained. The error estimates include a wide range of cases in which the regularity index and spectral angle of $\mathcal{A}$, the smoothness of $f$, the size of $b$ and $\alpha$ are all involved. The quadrature scheme is exponentially convergent with respect to the step size and is root-exponentially convergent with respect to the number of solves. Some numerical tests are presented in the last section to verify the sharpness of our estimates. Furthermore, both the scheme and the error bounds can be utilized directly to solve and analyze time-dependent problems.

Cancer-Net BCa-S: Breast Cancer Grade Prediction using Volumetric Deep Radiomic Features from Synthetic Correlated Diffusion Imaging

  • Authors: Chi-en Amy Tai, Hayden Gunraj, Alexander Wong
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05899
  • Pdf link: https://arxiv.org/pdf/2304.05899
  • Abstract
    The prevalence of breast cancer continues to grow, affecting about 300,000 females in the United States in 2023. However, there are different levels of severity of breast cancer requiring different treatment strategies, and hence, grading breast cancer has become a vital component of breast cancer diagnosis and treatment planning. Specifically, the gold-standard Scarff-Bloom-Richardson (SBR) grade has been shown to consistently indicate a patient's response to chemotherapy. Unfortunately, the current method to determine the SBR grade requires removal of some cancer cells from the patient which can lead to stress and discomfort along with costly expenses. In this paper, we study the efficacy of deep learning for breast cancer grading based on synthetic correlated diffusion (CDI$^s$) imaging, a new magnetic resonance imaging (MRI) modality, and find that it achieves better performance on SBR grade prediction than models learnt using gold-standard imaging modalities. Hence, we introduce Cancer-Net BCa-S, a volumetric deep radiomics approach for predicting SBR grade based on volumetric CDI$^s$ data. Given the promising results, this proposed method to identify the severity of the cancer would allow for better treatment decisions without the need for a biopsy. Cancer-Net BCa-S has been made publicly available as part of a global open-source initiative for advancing machine learning for cancer care.

Diffusion models with location-scale noise

  • Authors: Alexia Jolicoeur-Martineau, Kilian Fatras, Ke Li, Tal Kachman
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.05907
  • Pdf link: https://arxiv.org/pdf/2304.05907
  • Abstract
    Diffusion Models (DMs) are powerful generative models that add Gaussian noise to the data and learn to remove it. We wanted to determine which noise distribution (Gaussian or non-Gaussian) led to better generated data in DMs. Since DMs do not work by design with non-Gaussian noise, we built a framework that allows reversing a diffusion process with non-Gaussian location-scale noise. We use that framework to show that the Gaussian distribution performs the best over a wide range of other distributions (Laplace, Uniform, t, Generalized-Gaussian).
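
A sketch of a forward diffusion step with interchangeable location-scale noise, standardizing each family to zero mean and unit variance so the distributions are compared on equal footing; this illustrates the setup, not the authors' framework.

```python
import numpy as np

def forward_diffuse(x0, t, alphas_cumprod, dist="gaussian", rng=None):
    """x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps, with eps drawn from a
    location-scale family rescaled to unit variance."""
    rng = rng or np.random.default_rng()
    if dist == "gaussian":
        eps = rng.standard_normal(x0.shape)
    elif dist == "laplace":
        eps = rng.laplace(0.0, 1.0 / np.sqrt(2.0), x0.shape)       # Var = 2b^2 = 1
    elif dist == "uniform":
        eps = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), x0.shape)   # Var = 1
    else:
        raise ValueError(dist)
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
```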

SpectralDiff: Hyperspectral Image Classification with Spectral-Spatial Diffusion Models

  • Authors: Ning Chen, Jun Yue, Leyuan Fang, Shaobo Xia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05961
  • Pdf link: https://arxiv.org/pdf/2304.05961
  • Abstract
    Hyperspectral image (HSI) classification is an important topic in the field of remote sensing, and has a wide range of applications in Earth science. HSIs contain hundreds of continuous bands, which are characterized by high dimension and high correlation between adjacent bands. The high dimension and redundancy of HSI data bring great difficulties to HSI classification. In recent years, a large number of HSI feature extraction and classification methods based on deep learning have been proposed. However, their ability to model the global relationships among samples in both spatial and spectral domains is still limited. In order to solve this problem, an HSI classification method with spectral-spatial diffusion models is proposed. The proposed method realizes the reconstruction of spectral-spatial distribution of the training samples with the forward and reverse spectral-spatial diffusion process, thus modeling the global spatial-spectral relationship between samples. Then, we use the spectral-spatial denoising network of the reverse process to extract the unsupervised diffusion features. Features extracted by the spectral-spatial diffusion models can achieve cross-sample perception from the reconstructed distribution of the training samples, thus obtaining better classification performance. Experiments on three public HSI datasets show that the proposed method can achieve better performance than the state-of-the-art methods. The source code and the pre-trained spectral-spatial diffusion model will be publicly available at https://github.com/chenning0115/SpectralDiff.

Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric Views

  • Authors: Siwei Zhang, Qianli Ma, Yan Zhang, Sadegh Aliakbarian, Darren Cosker, Siyu Tang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06024
  • Pdf link: https://arxiv.org/pdf/2304.06024
  • Abstract
    Automatic perception of human behaviors during social interactions is crucial for AR/VR applications, and an essential component is estimation of plausible 3D human pose and shape of our social partners from the egocentric view. One of the biggest challenges of this task is severe body truncation due to close social distances in egocentric scenarios, which brings large pose ambiguities for unseen body parts. To tackle this challenge, we propose a novel scene-conditioned diffusion method to model the body pose distribution. Conditioned on the 3D scene geometry, the diffusion model generates bodies in plausible human-scene interactions, with the sampling guided by a physics-based collision score to further resolve human-scene inter-penetrations. The classifier-free training enables flexible sampling with different conditions and enhanced diversity. A visibility-aware graph convolution model guided by per-joint visibility serves as the diffusion denoiser to incorporate inter-joint dependencies and per-body-part control. Extensive evaluations show that our method generates bodies in plausible interactions with 3D scenes, achieving both superior accuracy for visible joints and diversity for invisible body parts. The code will be available at https://sanweiliti.github.io/egohmr/egohmr.html.

DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion

  • Authors: Johanna Karras, Aleksander Holynski, Ting-Chun Wang, Ira Kemelmacher-Shlizerman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06025
  • Pdf link: https://arxiv.org/pdf/2304.06025
  • Abstract
    We present DreamPose, a diffusion-based method for generating animated fashion videos from still images. Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion. To achieve this, we transform a pretrained text-to-image model (Stable Diffusion) into a pose-and-image guided video synthesis model, using a novel finetuning strategy, a set of architectural changes to support the added conditioning signals, and techniques to encourage temporal consistency. We fine-tune on a collection of fashion videos from the UBC Fashion dataset. We evaluate our method on a variety of clothing styles and poses, and demonstrate that our method produces state-of-the-art results on fashion video animation. Video results are available on our project page.

Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA

  • Authors: James Seale Smith, Yen-Chang Hsu, Lingyu Zhang, Ting Hua, Zsolt Kira, Yilin Shen, Hongxia Jin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06027
  • Pdf link: https://arxiv.org/pdf/2304.06027
  • Abstract
    Recent works demonstrate a remarkable ability to customize text-to-image diffusion models while only providing a few example images. What happens if you try to customize such models using multiple, fine-grained concepts in a sequential (i.e., continual) manner? In our work, we show that recent state-of-the-art customization methods for text-to-image models suffer from catastrophic forgetting when new concepts arrive sequentially. Specifically, when adding a new concept, the ability to generate high quality images of past, similar concepts degrades. To circumvent this forgetting, we propose a new method, C-LoRA, composed of a continually self-regularized low-rank adaptation in cross attention layers of the popular Stable Diffusion model. Furthermore, we use customization prompts which do not include the word of the customized object (i.e., "person" for a human face dataset) and are initialized as completely random embeddings. Importantly, our method induces only marginal additional parameter costs and requires no storage of user data for replay. We show that C-LoRA not only outperforms several baselines for our proposed setting of text-to-image continual customization, which we refer to as Continual Diffusion, but that we achieve a new state-of-the-art in the well-established rehearsal-free continual learning setting for image classification. The strong performance of C-LoRA in two separate domains positions it as a compelling solution for a wide range of applications, and we believe it has significant potential for practical impact.
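
A minimal sketch of the ingredients as named in the abstract: a frozen linear projection plus a learnable low-rank update (standard LoRA), and a penalty discouraging the new update from writing where past concepts' accumulated updates live. The exact form of C-LoRA's regularizer is an assumption here.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a learnable low-rank update W + B @ A,
    as would sit inside a cross-attention projection."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + x @ (self.B @ self.A).T

def forgetting_penalty(lora: LoRALinear, past_delta: torch.Tensor):
    # Penalize overlap between the new low-rank update and the
    # accumulated updates of previously learned concepts (assumed form).
    new_delta = lora.B @ lora.A
    return (past_delta.abs() * new_delta).pow(2).sum()
```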

Keyword: dynamic

Global QoS Policy Optimization in SD-WAN

  • Authors: Pham Tran Anh Quang, Jérémie Leguay, Xu Gong, Xu Huiying
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.05473
  • Pdf link: https://arxiv.org/pdf/2304.05473
  • Abstract
    In modern SD-WAN networks, a global controller is able to steer traffic on different paths based on application requirements and global intents. However, existing solutions cannot dynamically tune the way bandwidth is shared between flows inside each overlay link, in particular when the available capacity is uncertain due to cross traffic. In this context, we propose a global QoS (Quality of Service) policy optimization model that dynamically adjusts rate limits of applications based on their requirements to follow the evolution of network conditions. It relies on a novel cross-traffic estimator for the available bandwidth of overlay links that only exploits already available measurements. We propose two local search algorithms, one centralized and one distributed, that leverage cross-traffic estimation. We show in packet-level simulations a significant performance improvement in terms of SLA (Service Level Agreement) satisfaction. For instance, the adaptive tuning of load balancing and QoS policies based on cross-traffic estimation can improve SLA satisfaction by $40\%$ compared to static policies.

Contingency Games for Multi-Agent Interaction

  • Authors: Lasse Peters, Andrea Bajcsy, Chih-Yuan Chiu, David Fridovich-Keil, Forrest Laine, Laura Ferranti, Javier Alonso-Mora
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.05483
  • Pdf link: https://arxiv.org/pdf/2304.05483
  • Abstract
    Contingency planning, wherein an agent generates a set of possible plans conditioned on the outcome of an uncertain event, is an increasingly popular way for robots to act under uncertainty. In this work, we take a game-theoretic perspective on contingency planning which is tailored to multi-agent scenarios in which a robot's actions impact the decisions of other agents and vice versa. The resulting contingency game allows the robot to efficiently coordinate with other agents by generating strategic motion plans conditioned on multiple possible intents for other actors in the scene. Contingency games are parameterized via a scalar variable which represents a future time at which intent uncertainty will be resolved. Varying this parameter enables a designer to easily adjust how conservatively the robot behaves in the game. Interestingly, we also find that existing variants of game-theoretic planning under uncertainty are readily obtained as special cases of contingency games. Lastly, we offer an efficient method for solving N-player contingency games with nonlinear dynamics and non-convex costs and constraints. Through a series of simulated autonomous driving scenarios, we demonstrate that plans generated via contingency games provide quantitative performance gains over game-theoretic motion plans that do not account for future uncertainty reduction.

DistHD: A Learner-Aware Dynamic Encoding Method for Hyperdimensional Classification

  • Authors: Junyao Wang, Sitao Huang, Mohsen Imani
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05503
  • Pdf link: https://arxiv.org/pdf/2304.05503
  • Abstract
    Brain-inspired hyperdimensional computing (HDC) has been recently considered a promising learning approach for resource-constrained devices. However, existing approaches use static encoders that are never updated during the learning process. Consequently, it requires a very high dimensionality to achieve adequate accuracy, severely lowering the encoding and training efficiency. In this paper, we propose DistHD, a novel dynamic encoding technique for HDC adaptive learning that effectively identifies and regenerates dimensions that mislead the classification and compromise the learning quality. Our proposed algorithm DistHD successfully accelerates the learning process and achieves the desired accuracy with considerably lower dimensionality.

State estimation of a carbon capture process through POD model reduction and neural network approximation

  • Authors: Siyu Liu, Xunyuan Yin, Jinfeng Liu (University of Alberta)
  • Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.05514
  • Pdf link: https://arxiv.org/pdf/2304.05514
  • Abstract
    This paper presents an efficient approach for state estimation of post-combustion CO2 capture plants (PCCPs) by using reduced-order neural network models. The method involves extracting lower-dimensional feature vectors from high-dimensional operational data of the PCCP and constructing a reduced-order process model using proper orthogonal decomposition (POD). Multi-layer perceptron (MLP) neural networks are then used to capture the dominant dynamics of the process, with the network parameters trained on low-dimensional data obtained from open-loop simulations. The proposed POD-MLP model can be used as the basis for estimating the states of PCCPs at a significantly decreased computational cost. For state estimation, a reduced-order extended Kalman filtering (EKF) scheme based on the POD-MLP model is developed. Our simulations demonstrate that the proposed POD-MLP modeling approach reduces computational complexity compared to the POD-only model for nonlinear systems. Additionally, the POD-MLP-EKF algorithm can accurately reconstruct the full state information of PCCPs while significantly improving computational efficiency compared to the EKF based on the original PCCP model.
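
For the POD step, a generic sketch (NumPy) of extracting a reduced basis from snapshot data via the thin SVD; the energy criterion and names are illustrative, not the paper's exact pipeline.

```python
import numpy as np

def pod_basis(snapshots, energy=0.999):
    """POD via thin SVD: columns of `snapshots` are state snapshots;
    keep the fewest modes capturing the requested energy fraction."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    r = int(np.searchsorted(cum, energy)) + 1
    return U[:, :r]   # reduced basis Phi

# Reduced coordinates of a full state x: z = Phi.T @ x; lift back: x ~ Phi @ z.
```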

Necessary and Sufficient Conditions for Simultaneous State and Input Recovery of Linear Systems with Sparse Inputs by $\ell_1$-Minimization

  • Authors: Kyle Poe, Enrique Mallada, René Vidal
  • Subjects: Systems and Control (eess.SY); Information Theory (cs.IT); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.05526
  • Pdf link: https://arxiv.org/pdf/2304.05526
  • Abstract
    The study of theoretical conditions for recovering sparse signals from compressive measurements has received a lot of attention in the research community. In parallel, there has been a great amount of work characterizing conditions for the recovery of both the state and the input to a linear dynamical system (LDS), including a handful of results on recovering sparse inputs. However, existing sufficient conditions for recovering sparse inputs to an LDS are conservative and hard to interpret, while necessary and sufficient conditions have not yet appeared in the literature. In this work, we provide (1) the first characterization of necessary and sufficient conditions for the existence and uniqueness of sparse inputs to an LDS, (2) the first necessary and sufficient conditions for a linear program to recover both an unknown initial state and a sparse input, and (3) simple, interpretable recovery conditions in terms of the LDS parameters. We conclude with a numerical validation of these claims and discuss implications and future directions.
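
The program in result (2) can be sketched directly with cvxpy, assuming an observability matrix `Obs` and an input-to-output Toeplitz matrix `Toep` built from the LDS parameters (names illustrative):

```python
import cvxpy as cp

def recover_state_and_input(y, Obs, Toep):
    """Recover an unknown initial state x0 and a sparse input sequence u
    from measurements y = Obs @ x0 + Toep @ u via l1-minimization."""
    x0 = cp.Variable(Obs.shape[1])
    u = cp.Variable(Toep.shape[1])
    prob = cp.Problem(cp.Minimize(cp.norm1(u)),
                      [Obs @ x0 + Toep @ u == y])
    prob.solve()
    return x0.value, u.value
```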

CLCLSA: Cross-omics Linked embedding with Contrastive Learning and Self Attention for multi-omics integration with incomplete multi-omics data

  • Authors: Chen Zhao, Anqi Liu, Xiao Zhang, Xuewei Cao, Zhengming Ding, Qiuying Sha, Hui Shen, Hong-Wen Deng, Weihua Zhou
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Genomics (q-bio.GN)
  • Arxiv link: https://arxiv.org/abs/2304.05542
  • Pdf link: https://arxiv.org/pdf/2304.05542
  • Abstract
    Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding genetic data. Each omics technique only provides a limited view of the underlying biological process and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Utilizing complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn the feature representation across different types of biological data. The multi-omics contrastive learning, which is used to maximize the mutual information between different types of omics, is employed before latent feature concatenation. In addition, the feature-level self-attention and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicated that the proposed CLCLSA outperformed the state-of-the-art approaches for multi-omics data classification using incomplete multi-omics data.

DynamicDet: A Unified Dynamic Architecture for Object Detection

  • Authors: Zhihao Lin, Yongtao Wang, Jinhe Zhang, Xiaojie Chu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05552
  • Pdf link: https://arxiv.org/pdf/2304.05552
  • Abstract
    Dynamic neural network is an emerging research topic in deep learning. With adaptive inference, dynamic models can achieve remarkable accuracy and computational efficiency. However, it is challenging to design a powerful dynamic detector, because of no suitable dynamic architecture and exiting criterion for object detection. To tackle these difficulties, we propose a dynamic framework for object detection, named DynamicDet. Firstly, we carefully design a dynamic architecture based on the nature of the object detection task. Then, we propose an adaptive router to analyze the multi-scale information and to decide the inference route automatically. We also present a novel optimization strategy with an exiting criterion based on the detection losses for our dynamic detectors. Last, we present a variable-speed inference strategy, which helps to realize a wide range of accuracy-speed trade-offs with only one dynamic detector. Extensive experiments conducted on the COCO benchmark demonstrate that the proposed DynamicDet achieves new state-of-the-art accuracy-speed trade-offs. For instance, with comparable accuracy, the inference speed of our dynamic detector Dy-YOLOv7-W6 surpasses YOLOv7-E6 by 12%, YOLOv7-D6 by 17%, and YOLOv7-E6E by 39%. The code is available at https://github.com/VDIGPKU/DynamicDet.

Towards Large-Scale Simulations of Open-Ended Evolution in Continuous Cellular Automata

  • Authors: Bert Wang-Chak Chan
  • Subjects: Neural and Evolutionary Computing (cs.NE); Cellular Automata and Lattice Gases (nlin.CG)
  • Arxiv link: https://arxiv.org/abs/2304.05639
  • Pdf link: https://arxiv.org/pdf/2304.05639
  • Abstract
    Inspired by biological and cultural evolution, there have been many attempts to explore and elucidate the necessary conditions for open-endedness in artificial intelligence and artificial life. Using a continuous cellular automaton called Lenia as the base system, we built large-scale evolutionary simulations using the parallel computing framework JAX, in order to achieve the goal of never-ending evolution of self-organizing patterns. We report a number of system design choices, including (1) implicit implementation of genetic operators, such as reproduction by pattern self-replication, and selection by differential existential success; (2) localization of genetic information; and (3) algorithms for dynamic maintenance of the localized genotypes and their translation to phenotypes. Simulation results tend to go through a phase of diversity and creativity, then gradually converge to domination by fast-expanding patterns, presumably an optimal solution under the current design. Based on our experimentation, we propose several factors that may further facilitate open-ended evolution, such as virtual environment design, mass conservation, and energy constraints.
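
For readers unfamiliar with Lenia, a minimal single-channel update step looks roughly like the sketch below (NumPy for brevity; the paper's simulations use JAX at much larger scale and add genotype localization on top).

```python
import numpy as np

def lenia_step(A, K_fft, dt=0.1, mu=0.15, sigma=0.015):
    """One Lenia-like update: convolve the state with a kernel (via FFT),
    map the neighbourhood field through a smooth growth function, and
    clip the result back to [0, 1]."""
    U = np.real(np.fft.ifft2(np.fft.fft2(A) * K_fft))            # neighbourhood field
    G = 2.0 * np.exp(-((U - mu) ** 2) / (2 * sigma ** 2)) - 1.0  # growth function
    return np.clip(A + dt * G, 0.0, 1.0)
```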

Instance-Aware Domain Generalization for Face Anti-Spoofing

  • Authors: Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Xuequan Lu, Ran Yi, Shouhong Ding, Lizhuang Ma
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05640
  • Pdf link: https://arxiv.org/pdf/2304.05640
  • Abstract
    Face anti-spoofing (FAS) based on domain generalization (DG) has been recently studied to improve the generalization on unseen scenarios. Previous methods typically rely on domain labels to align the distribution of each domain for learning domain-invariant representations. However, artificial domain labels are coarse-grained and subjective, which cannot reflect real domain distributions accurately. Besides, such domain-aware methods focus on domain-level alignment, which is not fine-grained enough to ensure that learned representations are insensitive to domain styles. To address these issues, we propose a novel perspective for DG FAS that aligns features on the instance level without the need for domain labels. Specifically, Instance-Aware Domain Generalization framework is proposed to learn the generalizable feature by weakening the features' sensitivity to instance-specific styles. Concretely, we propose Asymmetric Instance Adaptive Whitening to adaptively eliminate the style-sensitive feature correlation, boosting the generalization. Moreover, Dynamic Kernel Generator and Categorical Style Assembly are proposed to first extract the instance-specific features and then generate the style-diversified features with large style shifts, respectively, further facilitating the learning of style-insensitive features. Extensive experiments and analysis demonstrate the superiority of our method over state-of-the-art competitors. Code will be publicly available at https://github.com/qianyuzqy/IADG.

WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language

  • Authors: Zhenxiang Lin, Xidong Peng, Peishan Cong, Yuenan Hou, Xinge Zhu, Sibei Yang, Yuexin Ma
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05645
  • Pdf link: https://arxiv.org/pdf/2304.05645
  • Abstract
    We introduce the task of 3D visual grounding in large-scale dynamic scenes based on natural linguistic descriptions and online captured multi-modal visual data, including 2D images and 3D LiDAR point clouds. We present a novel method, WildRefer, for this task by fully utilizing the appearance features in images, the location and geometry features in point clouds, and the dynamic features in consecutive input frames to match the semantic features in language. In particular, we propose two novel datasets, STRefer and LifeRefer, which focus on large-scale human-centric daily-life scenarios with abundant 3D object and natural language annotations. Our datasets are significant for the research of 3D visual grounding in the wild and has huge potential to boost the development of autonomous driving and service robots. Extensive comparisons and ablation studies illustrate that our method achieves state-of-the-art performance on two proposed datasets. Code and dataset will be released when the paper is published.

A parallel rank-adaptive integrator for dynamical low-rank approximation

  • Authors: Gianluca Ceruti, Jonas Kusch, Christian Lubich
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.05660
  • Pdf link: https://arxiv.org/pdf/2304.05660
  • Abstract
    This work introduces a parallel and rank-adaptive matrix integrator for dynamical low-rank approximation. The method is related to the previously proposed rank-adaptive basis update & Galerkin (BUG) integrator but differs significantly in that all arising differential equations, both for the basis and the Galerkin coefficients, are solved in parallel. Moreover, this approach eliminates the need for a potentially costly coefficient update with augmented basis matrices. The integrator also incorporates a new step rejection strategy that enhances the robustness of both the parallel integrator and the BUG integrator. By construction, the parallel integrator inherits the robust error bound of the BUG and projector-splitting integrators. Comparisons of the parallel and BUG integrators are presented by a series of numerical experiments which demonstrate the efficiency of the proposed method, for problems from radiative transfer and radiation therapy.

Real-time Trajectory-based Social Group Detection

  • Authors: Simindokht Jahangard, Munawar Hayat, Hamid Rezatofighi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05678
  • Pdf link: https://arxiv.org/pdf/2304.05678
  • Abstract
    Social group detection is a crucial aspect of various robotic applications, including robot navigation and human-robot interactions. To date, a range of model-based techniques have been employed to address this challenge, such as the F-formation and trajectory similarity frameworks. However, these approaches often fail to provide reliable results in crowded and dynamic scenarios. Recent advancements in this area have mainly focused on learning-based methods, such as deep neural networks that use visual content or human pose. Although visual content-based methods have demonstrated promising performance on large-scale datasets, their computational complexity poses a significant barrier to their practical use in real-time applications. To address these issues, we propose a simple and efficient framework for social group detection. Our approach explores the impact of motion trajectory on social grouping and utilizes a novel, reliable, and fast data-driven method. We formulate the individuals in a scene as a graph, where the nodes are represented by LSTM-encoded trajectories and the edges are defined by the distances between each pair of tracks. Our framework employs a modified graph transformer module and graph clustering losses to detect social groups. Our experiments on the popular JRDBAct dataset reveal noticeable improvements in performance, with relative improvements ranging from 2% to 11%. Furthermore, our framework is significantly faster, with up to 12x faster inference times compared to state-of-the-art methods under the same computation resources. These results demonstrate that our proposed method is suitable for real-time robotic applications.

A Persistent-Excitation-Free Method for System Disturbance Estimation Using Concurrent Learning

  • Authors: Zengjie Zhang, Fangzhou Liu, Tong Liu, Jianbin Qiu, Martin Buss
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.05693
  • Pdf link: https://arxiv.org/pdf/2304.05693
  • Abstract
    Observer-based methods are widely used to estimate the disturbances of different dynamic systems. However, a drawback of the conventional disturbance observers is that they all assume persistent excitation (PE) of the systems. As a result, they may lead to poor estimation precision when PE is not ensured, for instance, when the disturbance gain of the system is close to the singularity. In this paper, we propose a novel disturbance observer based on concurrent learning (CL) with time-variant history stacks, which ensures high estimation precision even in PE-free cases. The disturbance observer is designed in both continuous and discrete time. The estimation errors of the proposed method are proved to converge to a bounded set using the Lyapunov method. A history-sample-selection procedure is proposed to reduce the estimation error caused by the accumulation of old history samples. A simulation study on epidemic control shows that the proposed method produces higher estimation precision than the conventional disturbance observer when PE is not satisfied. This justifies the correctness of the proposed CL-based disturbance observer and verifies its applicability to solving practical problems.
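
A schematic of the concurrent-learning ingredient: the estimate update mixes the instantaneous error with errors replayed from a recorded history stack, which is what keeps the update informative without persistent excitation. Names and the gradient form are illustrative, not the paper's observer.

```python
import numpy as np

def cl_update(d_hat, Phi, e, history, gamma=0.05):
    """One concurrent-learning update of a disturbance estimate d_hat,
    combining the current (regressor, error) pair with a stack of
    recorded pairs replayed at every step."""
    grad = Phi.T @ e                       # current-sample correction
    for Phi_k, e_k in history:             # replayed history stack
        grad = grad + Phi_k.T @ e_k
    return d_hat + gamma * grad
```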

Multi-scale Geometry-aware Transformer for 3D Point Cloud Classification

  • Authors: Xian Wei, Muyu Wang, Shing-Ho Jonathan Lin, Zhengyu Li, Jian Yang, Arafat Al-Jawari, Xuan Tang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.05694
  • Pdf link: https://arxiv.org/pdf/2304.05694
  • Abstract
    Self-attention modules have demonstrated remarkable capabilities in capturing long-range relationships and improving the performance of point cloud tasks. However, point cloud objects are typically characterized by complex, disordered, and non-Euclidean spatial structures with multiple scales, and their behavior is often dynamic and unpredictable. The current self-attention modules mostly rely on dot product multiplication and dimension alignment among query-key-value features, which cannot adequately capture the multi-scale non-Euclidean structures of point cloud objects. To address these problems, this paper proposes a self-attention plug-in module with its variants, Multi-scale Geometry-aware Transformer (MGT). MGT processes point cloud data with multi-scale local and global geometric information in the following three aspects. At first, the MGT divides point cloud data into patches with multiple scales. Secondly, a local feature extractor based on sphere mapping is proposed to explore the geometry inner each patch and generate a fixed-length representation for each patch. Thirdly, the fixed-length representations are fed into a novel geodesic-based self-attention to capture the global non-Euclidean geometry between patches. Finally, all the modules are integrated into the framework of MGT with an end-to-end training scheme. Experimental results demonstrate that the MGT vastly increases the capability of capturing multi-scale geometry using the self-attention mechanism and achieves strong competitive performance on mainstream point cloud benchmarks.

Human-Robot Skill Transfer with Enhanced Compliance via Dynamic Movement Primitives

  • Authors: Jayden Hong, Zengjie Zhang, Amir M. Soufi Enayati, Homayoun Najjaran
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05703
  • Pdf link: https://arxiv.org/pdf/2304.05703
  • Abstract
    Finding an efficient way to adapt robot trajectory is a priority to improve the overall performance of robots. One approach for trajectory planning is through transferring human-like skills to robots by Learning from Demonstrations (LfD). The human demonstration is considered the target motion to mimic. However, human motion is typically optimal for human embodiment but not for robots because of the differences between human biomechanics and robot dynamics. The Dynamic Movement Primitives (DMP) framework is a viable solution for this limitation of LfD, but it requires tuning the second-order dynamics in the formulation. Our contribution is introducing a systematic method to extract the dynamic features from human demonstration to auto-tune the parameters in the DMP framework. In addition to its use with LfD, another utility of the proposed method is that it can readily be used in conjunction with Reinforcement Learning (RL) for robot training. In this way, the extracted features facilitate the transfer of human skills by allowing the robot to explore the possible trajectories more efficiently and increasing robot compliance significantly. We introduce a methodology to extract the dynamic features from multiple trajectories based on the optimization of human-likeness and similarity in the parametric space. Our method was implemented into an actual human-robot setup to extract human dynamic features and used to regenerate the robot trajectories following both LfD and RL with DMP. It resulted in stable robot performance, maintaining a degree of human-likeness, measured by accumulated distance error, as high as that of the best heuristic tuning.
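
For reference, the standard one-dimensional DMP transformation system the abstract refers to can be rolled out as below; the paper's contribution is extracting the dynamic features that auto-tune such parameters, which this sketch does not implement.

```python
import numpy as np

def dmp_rollout(x0, g, f, K=150.0, D=25.0, tau=1.0, dt=0.001, T=1.0):
    """Roll out a 1-D Dynamic Movement Primitive:
    tau*v' = K*(g - x) - D*v + (g - x0)*f(s), tau*x' = v,
    with the canonical phase s decaying from 1 toward 0."""
    x, v, s = x0, 0.0, 1.0
    alpha_s = 4.0                       # phase decay rate
    traj = [x]
    for _ in range(int(T / dt)):
        v += dt / tau * (K * (g - x) - D * v + (g - x0) * f(s))
        x += dt / tau * v
        s += dt / tau * (-alpha_s * s)
        traj.append(x)
    return np.array(traj)

# With zero forcing the DMP converges to the goal:
# traj = dmp_rollout(0.0, 1.0, f=lambda s: 0.0)
```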

5Greplay: a 5G Network Traffic Fuzzer -- Application to Attack Injection

  • Authors: Zujany Salazar, Huu Nghia Nguyen, Wissam Mallouli, Ana R Cavalli, Edgardo Montes de Oca
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.05719
  • Pdf link: https://arxiv.org/pdf/2304.05719
  • Abstract
    The fifth generation of mobile broadband is more than just an evolution to provide more mobile bandwidth, massive machine-type communications, and ultra-reliable and low-latency communications. It relies on a complex, dynamic and heterogeneous environment that implies addressing numerous testing and security challenges. In this paper we present 5Greplay, an open-source 5G network traffic fuzzer that enables the evaluation of 5G components by replaying and modifying 5G network traffic, creating and injecting network scenarios into a target that can be a 5G core service (e.g., AMF, SMF) or a RAN network (e.g., gNodeB). The tool provides the ability to alter network packets online or offline in both control and data planes in a very flexible manner. The experimental evaluation, conducted against open-source 5G platforms, showed that the target services accept traffic being altered by the tool, and that it can reach up to 9.56 Gbps using only 1 processor core to replay 5G traffic.

Towards a more comprehensive open-source model for interdisciplinary smart integrated energy systems

  • Authors: Béla Wiegel, Tom Steffen, Davood Babazadeh, Christian Becker
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.05720
  • Pdf link: https://arxiv.org/pdf/2304.05720
  • Abstract
    The energy transition has recently experienced a further acceleration. In order to make the integration of renewable energies as cost-effective, secure and sustainable as possible and to develop new paradigms for the energy system, many energy system models have been developed in past research to evaluate possible solutions. While model identification and dissemination of results are widely discussed in the literature, a detailed view of the methodology is often missing. This paper addresses this topic and proposes a methodology to build a comprehensive, publicly accessible database for modeling a multi-modal integrated energy system. The focus hereby is dynamic modeling of low- and medium-voltage grids consisting of prosumers, battery storages, heat pumps and electric cars. In addition, a district heating network is parameterized to match the electricity grid. Modelica and the TransiEnt-Library serve as the modeling tools. The methodology for creating the grid models is available via GitLab. A study case that uses the methodology to analyze the congestion situation within a medium-voltage distribution grid is presented.

Distributed Coverage Control of Constrained Constant-Speed Unicycle Multi-Agent Systems

  • Authors: Qingchen Liu, Zengjie Zhang, Nhan Khanh Le, Jiahu Qin, Fangzhou Liu, Sandra Hirche
  • Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.05723
  • Pdf link: https://arxiv.org/pdf/2304.05723
  • Abstract
    This paper proposes a novel distributed coverage controller for a multi-agent system with constant-speed unicycle robots (CSUR). The work is motivated by the limitation of the conventional method that does not ensure the satisfaction of hard state- and input-dependent constraints and leads to feasibility issues for multi-CSUR systems. In this paper, we solve these problems by designing a novel coverage cost function and a saturated gradient-search-based control law. Invariant set theory and Lyapunov-based techniques are used to prove the state-dependent confinement and the convergence of the system state to the optimal coverage configuration, respectively. The controller is implemented in a distributed manner based on a novel communication standard among the agents. A series of simulation case studies are conducted to validate the effectiveness of the proposed coverage controller under different initial conditions and control parameters. A comparison study in simulation reveals the advantage of the proposed method in terms of avoiding infeasibility. The experiment study verifies the applicability of the method to real robots with uncertainties. The development procedure of the method from theoretical analysis to experimental validation provides a novel framework for multi-agent system coordination control with complex agent dynamics.

Dynamic Graph Representation Learning with Neural Networks: A Survey

  • Authors: Leshanshui Yang, Sébastien Adam, Clément Chatelain
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.05729
  • Pdf link: https://arxiv.org/pdf/2304.05729
  • Abstract
    In recent years, Dynamic Graph (DG) representations have been increasingly used for modeling dynamic systems due to their ability to integrate both topological and temporal information in a compact representation. Dynamic graphs make it possible to efficiently handle applications such as social network prediction, recommender systems, traffic forecasting or electroencephalography analysis, which cannot be addressed using standard numeric representations. As a direct consequence of the emergence of dynamic graph representations, dynamic graph learning has emerged as a new machine learning problem, combining challenges from both sequential/temporal data processing and static graph learning. In this research area, Dynamic Graph Neural Networks (DGNNs) have become the state-of-the-art approach, and a plethora of models have been proposed in recent years. This paper aims at providing a review of problems and models related to dynamic graph learning. The various dynamic graph supervised learning settings are analysed and discussed. We identify the similarities and differences between existing models with respect to the way time information is modeled. Finally, general guidelines for a DGNN designer when faced with a dynamic graph learning problem are provided.

RO-MAP: Real-Time Multi-Object Mapping with Neural Radiance Fields

  • Authors: Xiao Han, Houxuan Liu, Yunchao Ding, Lu Yang
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.05735
  • Pdf link: https://arxiv.org/pdf/2304.05735
  • Abstract
    Accurate perception of objects in the environment is important for improving the scene understanding capability of SLAM systems. In robotic and augmented reality applications, object maps with semantic and metric information show attractive advantages. In this paper, we present RO-MAP, a novel multi-object mapping pipeline that does not rely on 3D priors. Given only monocular input, we use neural radiance fields to represent objects and couple them with a lightweight object SLAM based on multi-view geometry, to simultaneously localize objects and implicitly learn their dense geometry. We create separate implicit models for each detected object and train them dynamically and in parallel as new observations are added. Experiments on synthetic and real-world datasets demonstrate that our method can generate semantic object maps with shape reconstruction, and be competitive with offline methods while achieving real-time performance (25Hz). The code and dataset will be available at: https://github.com/XiaoHan-Git/RO-MAP

Boosting long-term forecasting performance for continuous-time dynamic graph networks via data augmentation

  • Authors: Yuxing Tian, Mingjie Zhu, Jiachi Luo, Song Li
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05749
  • Pdf link: https://arxiv.org/pdf/2304.05749
  • Abstract
    This study focuses on long-term forecasting (LTF) on continuous-time dynamic graph networks (CTDGNs), which is important for real-world modeling. Existing CTDGNs are effective for modeling temporal graph data due to their ability to capture complex temporal dependencies but perform poorly on LTF due to the substantial requirement for historical data, which is not practical in most cases. To relieve this problem, the most intuitive way is data augmentation. In this study, we propose Uncertainty Masked MixUp (UmmU): a plug-and-play module that performs uncertainty estimation to introduce uncertainty into the embeddings of intermediate layers of CTDGNs, and performs masked mixup to further enhance the uncertainty of the embeddings to make them generalize to more situations. UmmU can be easily inserted into arbitrary CTDGNs without increasing the number of parameters. We conduct comprehensive experiments on three real-world dynamic graph datasets; the results demonstrate that UmmU can effectively improve the long-term forecasting performance for CTDGNs.
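
A rough sketch of what a plug-and-play UmmU-style module could look like on a batch of intermediate embeddings, assuming a hypothetical uncertainty head that outputs a log-variance per embedding (the exact formulation in the paper may differ).

```python
import torch

def ummu(h, log_var, mask_ratio=0.3):
    """Uncertainty-masked mixup on (batch, dim) embeddings: add noise
    scaled by the estimated uncertainty, then mix a random subset of
    feature dimensions with a shuffled batch."""
    noisy = h + torch.randn_like(h) * (0.5 * log_var).exp()
    perm = torch.randperm(h.size(0))                 # pair each sample with another
    lam = torch.rand(())                             # mixup coefficient
    mask = (torch.rand_like(h) < mask_ratio).float() # dimensions to mix
    return (1 - mask) * noisy + mask * (lam * noisy + (1 - lam) * noisy[perm])
```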

Self-Supervised Learning with Cluster-Aware-DINO for High-Performance Robust Speaker Verification

  • Authors: Bing Han, Zhengyang Chen, Yanmin Qian
  • Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2304.05754
  • Pdf link: https://arxiv.org/pdf/2304.05754
  • Abstract
    The automatic speaker verification task has made great progress using deep learning approaches with large-scale manually annotated datasets. However, it is very difficult and expensive to collect a large amount of well-labeled data for system building. In this paper, we propose a novel and advanced self-supervised learning framework which can construct a high performance speaker verification system without using any labeled data. To avoid the impact of false negative pairs, we adopt the self-distillation with no labels (DINO) framework as the initial model, which can be trained without exploiting negative pairs. Then, we introduce a cluster-aware training strategy for DINO to improve the diversity of data. In the iteration learning stage, due to a mass of unreliable labels from clustering, the quality of pseudo labels is important for the system training. This motivates us to propose dynamic loss-gate and label correction (DLG-LC) methods to alleviate the performance degradation caused by unreliable labels. More specifically, we model the loss distribution with GMM and obtain the loss-gate threshold dynamically to distinguish the reliable and unreliable labels. Besides, we adopt the model predictions to correct the unreliable labels, for better utilizing the unreliable data rather than dropping them directly. Moreover, we extend the DLG-LC to multi-modality to further improve the performance. The experiments are performed on the commonly used Voxceleb dataset. Compared to the best-known self-supervised speaker verification system, our proposed method obtains 22.17%, 27.94% and 25.56% relative EER improvement on Vox-O, Vox-E and Vox-H test sets, even with fewer iterations, smaller models, and simpler clustering methods. More importantly, the newly proposed system even achieves comparable results with the fully supervised system, but without using any human labeled data.
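
The dynamic loss-gate can be illustrated with a two-component GMM fitted to per-sample losses (scikit-learn); the midpoint gate below is a simplification of however the paper derives its threshold.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def loss_gate_threshold(losses):
    """Fit a 2-component GMM to per-sample losses and place the gate
    between the low-loss (reliable) and high-loss (unreliable) modes."""
    gmm = GaussianMixture(n_components=2).fit(losses.reshape(-1, 1))
    means = np.sort(gmm.means_.ravel())
    return means.mean()   # simple midpoint gate between the two modes

# reliable = losses < loss_gate_threshold(losses)
```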

Learning coordination through new actions

  • Authors: Sofia B.S.D. Castro
  • Subjects: Computer Science and Game Theory (cs.GT); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.05763
  • Pdf link: https://arxiv.org/pdf/2304.05763
  • Abstract
    We provide a novel approach to achieving a desired outcome in a coordination game: the original 2x2 game is embedded in a 2x3 game where one of the players may use a third action. For a large set of payoff values only one of the Nash equilibria of the original 2x2 game is stable under replicator dynamics. We show that this Nash equilibrium is the $\omega$-limit of all initial conditions in the interior of the state space for the modified 2x3 game. Thus, the existence of a third action for one of the players, although not used, allows both players to coordinate on one Nash equilibrium. This Nash equilibrium is the one preferred by, at least, the player with access to the new action. This approach deals with both coordination failure (players choose the payoff-dominant Nash equilibrium, if such a Nash equilibrium exists) and miscoordination (players do not use mixed strategies).
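
The replicator dynamics used in the analysis can be simulated directly; the sketch below (NumPy, generic payoff matrices rather than the paper's specific game) takes one Euler step of two-population replicator dynamics for the embedded 2x3 bimatrix game.

```python
import numpy as np

def replicator_step(x, y, A, B, dt=0.01):
    """One Euler step of two-population replicator dynamics: the row
    player keeps 2 actions (payoff matrix A, 2x3), the column player
    gains the third action (payoff matrix B, 3x2)."""
    fx = A @ y                        # row player's per-action payoffs
    fy = B @ x                        # column player's per-action payoffs
    x = x + dt * x * (fx - x @ fx)    # grow actions beating the average
    y = y + dt * y * (fy - y @ fy)
    return x / x.sum(), y / y.sum()   # renormalize against numerical drift
```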

Model Reduction of Linear Stochastic Systems with Preservation of sc-LTL Specifications

  • Authors: Maico Hendrikus Wilhelmus Engelaar, Licio Romao, Yulong Gao, Mircea Lazar, Alessandro Abate, Sofie Haesaert
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.05770
  • Pdf link: https://arxiv.org/pdf/2304.05770
  • Abstract
    We propose a correct-by-design controller synthesis framework for discrete-time linear stochastic systems that provides more flexibility to the overall abstraction framework of stochastic systems. Rather than directly abstracting the original dynamics, which can be large-scale and complex, we propose an intermediate step that leverages weak Gaussian realization theory and Kalman filtering techniques to obtain a related, discrete-time stochastic dynamical system that is simpler, and more prone to abstraction methods. We also propose a controller refinement algorithm and show correctness of the overall approach in enforcing synthetically co-safe Linear Temporal Logic properties. In general, the generated simplified stochastic dynamical systems are time-varying, but, under some technical conditions, will become time-invariant. We illustrate our theoretical findings with an example that supports the proposed correct-by-design framework and that illustrates how model reduction of stochastic models can be achieved.

A Security Evaluation Framework for Software-Defined Network Architectures in Data Center Environments

  • Authors: Igor Ivkić, Dominik Thiede, Nicholas Race, Matthew Broadbent, Antonios Gouglidis
  • Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.05776
  • Pdf link: https://arxiv.org/pdf/2304.05776
  • Abstract
    The importance of cloud computing has grown over the last years, which has resulted in a significant increase in Data Center (DC) network requirements. Virtualisation is one of the key drivers of that transformation and enables a massive deployment of computing resources, which exhausts server capacity limits. Furthermore, the increased number of network endpoints needs to be handled dynamically and centrally to facilitate cloud computing functionalities. Traditional DCs barely satisfy those demands because of their inherent limitations based on the network topology. Software-Defined Networks (SDN) promise to meet the increasing network requirements for cloud applications by decoupling control functionalities from data forwarding. Although SDN solutions add more flexibility to DC networks, they also pose new vulnerabilities with a high impact due to the centralised architecture. In this paper, we propose an evaluation framework for assessing the security level of SDN architectures in four different stages. Furthermore, we show in an experimental study how the framework can be used for mapping SDN threats to associated vulnerabilities and necessary mitigations in conjunction with risk and impact classification. The proposed framework helps administrators to evaluate the network security level, to apply countermeasures for identified SDN threats, and to meet the network's security requirements.

Micromagnetics simulations and phase transitions of ferromagnetics with Dzyaloshinskii-Moriya interaction

  • Authors: Panchi Li, Shuting Gu, Jin Lan, Jingrun Chen, Weiqing Ren, Rui Du
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.05789
  • Pdf link: https://arxiv.org/pdf/2304.05789
  • Abstract
    Magnetic skyrmions widely exist in a diverse range of magnetic systems, including chiral magnets with a non-centrosymmetric structure characterized by the Dzyaloshinskii-Moriya interaction (DMI). In this study, we propose a generalized semi-implicit backward differentiation formula projection method, enabling simulations of the Landau-Lifshitz (LL) equation in chiral magnets with a typical time step size of $1$ ps, markedly exceeding the limit of typically $0.1$ ps imposed by existing numerical methods. Using micromagnetics simulations, we show that the LL equation with DMI reveals an intriguing dynamic instability in magnetization configurations as the damping varies. Both the isolated skyrmionium and skyrmionium clusters can consequently be produced using a simple initialization strategy and a specific damping parameter. Assisted by the string method, the transition path between skyrmion and skyrmionium, and the escape of a skyrmion from the skyrmion clusters, are then thoroughly examined. The numerical methods developed in this work not only provide a reliable paradigm to investigate skyrmion-based textures and their transition paths, but also facilitate the understanding of magnetization dynamics in complex magnetic systems.

Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series

  • Authors: Matthieu Herrmann, Chang Wei Tan, Mahsa Salehi, Geoffrey I. Webb
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05800
  • Pdf link: https://arxiv.org/pdf/2304.05800
  • Abstract
    Time series classification (TSC) is a challenging task due to the diversity of types of feature that may be relevant for different classification tasks, including trends, variance, frequency, magnitude, and various patterns. To address this challenge, several alternative classes of approach have been developed, including similarity-based, features and intervals, shapelets, dictionary, kernel, neural network, and hybrid approaches. While kernel, neural network, and hybrid approaches perform well overall, some specialized approaches are better suited for specific tasks. In this paper, we propose a new similarity-based classifier, Proximity Forest version 2.0 (PF 2.0), which outperforms previous state-of-the-art similarity-based classifiers across the UCR benchmark and outperforms state-of-the-art kernel, neural network, and hybrid methods on the specific datasets in the benchmark that are best addressed by similarity-based methods. PF 2.0 incorporates three recent advances in time series similarity measures -- (1) computationally efficient early abandoning and pruning to speed up elastic similarity computations; (2) a new elastic similarity measure, Amerced Dynamic Time Warping (ADTW); and (3) cost function tuning. It rationalizes the set of similarity measures employed, reducing the eight base measures of the original PF to three and using the first derivative transform with all similarity measures, rather than a limited subset. We have implemented both PF 1.0 and PF 2.0 in a single C++ framework, making the PF framework more efficient.
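
For intuition on advance (1), here is a sketch of DTW with an early-abandoning cutoff; it illustrates the general pruning idea, not PF 2.0's C++ implementation. Since cumulative DTW cost never decreases along rows, a row whose minimum already exceeds the cutoff can never produce a better final distance:

```python
import numpy as np

def dtw_early_abandon(a, b, cutoff):
    """Squared-distance DTW that abandons once no alignment can beat cutoff."""
    n, m = len(a), len(b)
    prev = np.full(m + 1, np.inf)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = np.full(m + 1, np.inf)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        if curr[1:].min() > cutoff:   # entire row above cutoff: abandon early
            return np.inf
        prev = curr
    return prev[m]
```

In a nearest-neighbour search, `cutoff` shrinks to the distance of the best neighbour found so far, so most candidate computations abandon after only a few rows.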

Data-Driven Response Regime Exploration and Identification for Dynamical Systems

  • Authors: Maor Farid
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.05822
  • Pdf link: https://arxiv.org/pdf/2304.05822
  • Abstract
    Data-Driven Response Regime Exploration and Identification (DR$^2$EI) is a novel and fully data-driven method for identifying and classifying response regimes of a dynamical system without requiring human intervention. This approach is a valuable tool for exploring and discovering response regimes in complex dynamical systems, especially when the governing equations and the number of response regimes are unknown, and the system is expensive to sample. Additionally, the method is useful for order reduction, as it can be used to identify the most dominant response regimes of a given dynamical system. DR$^2$EI utilizes unsupervised learning algorithms to transform the system's response into an embedding space that facilitates regime classification. An active sequential sampling approach based on Gaussian Process Regression (GPR) is used to efficiently sample the parameter space, quantify uncertainty, and provide optimal trade-offs between exploration and exploitation. The performance of the DR$^2$EI method was evaluated by analyzing three established dynamical systems: the mathematical pendulum, the Lorenz system, and the Duffing oscillator. The method was shown to effectively identify a variety of response regimes with both similar and distinct topological features and frequency content, demonstrating its versatility in capturing a wide range of behaviors. While it may not be possible to guarantee that all possible regimes will be identified, the method provides an automated and efficient means for exploring the parameter space of a dynamical system and identifying its underlying "sufficiently dominant" response regimes without prior knowledge of the system's equations or behavior.
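
A minimal sketch of the GPR-driven sequential sampling loop, using pure exploration (highest posterior standard deviation) as the acquisition rule and a 1-D toy objective standing in for an expensive simulation; the actual method balances exploration and exploitation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def response(p):                         # stand-in for an expensive simulation
    return np.sin(3 * p) + 0.1 * p ** 2

X = np.array([[0.0], [2.0]])             # initial parameter samples
y = response(X).ravel()
candidates = np.linspace(-3, 3, 200).reshape(-1, 1)

for _ in range(15):
    gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                                   normalize_y=True).fit(X, y)
    _, sigma = gpr.predict(candidates, return_std=True)
    nxt = candidates[np.argmax(sigma)]   # sample where the GP is most uncertain
    X = np.vstack([X, [nxt]])
    y = np.append(y, response(nxt))
```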

When Should You Wait Before Updating? Toward a Robustness Refinement

  • Authors: Swan Dubois, Laurent Feuilloley, Franck Petit, Mikaël Rabie
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.05831
  • Pdf link: https://arxiv.org/pdf/2304.05831
  • Abstract
    Consider a dynamic network and a given distributed problem. At any point in time, there might exist several solutions that are equally good with respect to the problem specification, but that are different from an algorithmic perspective, because some could be easier to update than others when the network changes. In other words, one would prefer to have a solution that is more robust to topological changes in the network; and in this direction the best scenario would be that the solution remains correct despite the dynamics of the network. In [CasteigtsDPR20], the authors introduced a very strong robustness criterion: they required that for any removal of edges that maintains the network connected, the solution remains valid. They focus on the maximal independent set problem, and their approach consists of characterizing the graphs in which there exists a robust solution (the existential problem), or even stronger, where any solution is robust (the universal problem). As the robustness criterion is very demanding, few graphs have a robust solution, and even fewer are such that all of their solutions are robust. In this paper, we ask the following question: Can we have robustness for a larger class of networks if we bound the number $k$ of edge removals allowed? (See the full paper for the full abstract.)

Dynamic Mixed Membership Stochastic Block Model for Weighted Labeled Networks

  • Authors: Gaël Poux-Médard, Julien Velcin, Sabine Loudcher
  • Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.05894
  • Pdf link: https://arxiv.org/pdf/2304.05894
  • Abstract
    Most real-world networks evolve over time. Existing literature proposes models for dynamic networks that are either unlabeled or assumed to have a single membership structure. On the other hand, a new family of Mixed Membership Stochastic Block Models (MMSBM) allows modeling static labeled networks under the assumption of mixed-membership clustering. In this work, we propose to extend this latter class of models to infer dynamic labeled networks under a mixed membership assumption. Our approach takes the form of a temporal prior on the model's parameters. It relies on the single assumption that dynamics are not abrupt. We show that our method significantly differs from existing approaches and allows modeling more complex systems, namely dynamic labeled networks. We demonstrate the robustness of our method with several experiments on both synthetic and real-world datasets. A key interest of our approach is that it needs very little training data to yield good results. The performance gain under challenging conditions broadens the variety of possible applications of automated learning tools, as in the social sciences, where small datasets are a major obstacle to the introduction of machine learning methods.

A Phoneme-Informed Neural Network Model for Note-Level Singing Transcription

  • Authors: Sangeon Yong, Li Su, Juhan Nam
  • Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2304.05917
  • Pdf link: https://arxiv.org/pdf/2304.05917
  • Abstract
    Note-level automatic music transcription is one of the most representative music information retrieval (MIR) tasks and has been studied for various instruments to understand music. However, due to the lack of high-quality labeled data, transcription of many instruments is still a challenging task. In particular, in the case of singing, it is difficult to find accurate notes due to its expressiveness in pitch, timbre, and dynamics. In this paper, we propose a method of finding note onsets of the singing voice more accurately by leveraging the linguistic characteristics of singing, which are not seen in other instruments. The proposed model uses a mel-scaled spectrogram and a phonetic posteriorgram (PPG), a frame-wise phoneme likelihood, as inputs to the onset detection network, where the PPG is generated by a network pre-trained on singing and speech data. To verify how linguistic features affect onset detection, we compare evaluation results across datasets in different languages and break down onset types for detailed analysis. Our approach substantially improves the performance of singing transcription and therefore emphasizes the importance of linguistic features in singing analysis.

Unified Numerical Stability and Accuracy Analysis of the Partitioned-Solution Approach

  • Authors: Georgios Tzounas, Gabriela Hug
  • Subjects: Numerical Analysis (math.NA); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.05955
  • Pdf link: https://arxiv.org/pdf/2304.05955
  • Abstract
    This paper focuses on the Partitioned-Solution Approach (PSA) employed for the Time-Domain Simulation (TDS) of dynamic power system models. In PSA, differential equations are solved at each step of the TDS for state variables, whereas algebraic equations are solved separately. The goal of this paper is to propose a novel, matrix-pencil based technique to study numerical stability and accuracy of PSA in a unified way. The proposed technique quantifies the numerical deformation that PSA-based methods introduce to the dynamics of the power system model, and allows estimating useful upper time step bounds that achieve prescribed simulation accuracy criteria. The family of Predictor-Corrector (PC) methods, which is commonly applied in practical implementations of PSA, is utilized to illustrate the proposed technique. Simulations are carried out on the IEEE 39-bus system, as well as on a 1479-bus model of the All-Island Irish Transmission System (AIITS).

UAV Obstacle Avoidance by Human-in-the-Loop Reinforcement in Arbitrary 3D Environment

  • Authors: Xuyang Li, Jianwu Fang, Kai Du, Kuizhi Mei, Jianru Xue
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.05959
  • Pdf link: https://arxiv.org/pdf/2304.05959
  • Abstract
    This paper focuses on the continuous control of an unmanned aerial vehicle (UAV) based on a deep reinforcement learning method for a large-scale 3D complex environment. The purpose is to make the UAV reach any target point from a certain starting point, with the flying height and speed variable during navigation. In this work, we propose a deep reinforcement learning (DRL)-based method combined with human-in-the-loop, which allows the UAV to avoid obstacles automatically during flight. We design multiple reward functions based on the relevant domain knowledge to guide UAV navigation. The role of the human-in-the-loop is to dynamically change the reward function of the UAV in different situations to better suit the UAV's obstacle avoidance. We verify the success rate and average step size in urban, rural, and forest scenarios, and the experimental results show that the proposed method can reduce the training convergence time and improve the efficiency and accuracy of navigation tasks. The code is available at https://github.com/Monnalo/UAV_navigation.

An information-theoretic evolutionary algorithm

  • Authors: Arnaud Berny
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.05963
  • Pdf link: https://arxiv.org/pdf/2304.05963
  • Abstract
    We propose a novel evolutionary algorithm on bit vectors which derives from the principles of information theory. The information-theoretic evolutionary algorithm (it-EA) iteratively updates a search distribution with two parameters: the center, that is, the bit vector at which standard bit mutation is applied, and the mutation rate. The mutation rate is updated by means of information-geometric optimization and the center is updated by means of a maximum likelihood principle. Standard elitist and non-elitist updates of the center are also considered. Experiments illustrate the dynamics of the mutation rate and the influence of hyperparameters. In an empirical runtime analysis on OneMax and LeadingOnes, the elitist and non-elitist it-EAs obtain promising results.
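
For intuition, here is a simplified self-adaptive (1+1)-EA on OneMax that adapts the same two quantities, the center and the mutation rate; the multiplicative rate update is a common heuristic stand-in, not the paper's information-geometric or maximum-likelihood updates:

```python
import random

def onemax(x):                 # fitness: number of ones
    return sum(x)

n = 100
x = [random.randint(0, 1) for _ in range(n)]   # center
rate, A = 1.0 / n, 1.2                         # mutation rate and its step factor
for _ in range(5000):
    y = [b ^ (random.random() < rate) for b in x]   # standard bit mutation
    if onemax(y) >= onemax(x):                 # elitist center update
        x, rate = y, min(rate * A, 0.5)        # success: increase the rate
    else:
        rate = max(rate / A, 1.0 / n)          # failure: decrease the rate
print(onemax(x), rate)
```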

Traffic Modeling with SUMO: a Tutorial

  • Authors: Davide Andrea Guastella, Gianluca Bontempi
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.05982
  • Pdf link: https://arxiv.org/pdf/2304.05982
  • Abstract
    This paper presents a step-by-step guide to generating and simulating a traffic scenario using the open-source simulation tool SUMO. It introduces the common pipeline used to generate a synthetic traffic model for SUMO and shows how to import existing traffic data into a model to achieve accurate traffic simulation (that is, producing a traffic model whose dynamics are similar to the real ones). It also describes how SUMO outputs simulation information that can be used for data analysis purposes.
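
A minimal TraCI control loop of the kind such a tutorial builds toward; it assumes SUMO is installed, the `sumo` binary is on the PATH, and a configuration file named `scenario.sumocfg` (a hypothetical name) exists:

```python
import traci

traci.start(["sumo", "-c", "scenario.sumocfg"])     # or "sumo-gui" for a GUI run
speeds = []
while traci.simulation.getMinExpectedNumber() > 0:  # vehicles still active/pending
    traci.simulationStep()                          # advance the simulation one step
    for veh_id in traci.vehicle.getIDList():
        speeds.append(traci.vehicle.getSpeed(veh_id))  # m/s, kept for analysis
traci.close()
print(len(speeds), "speed samples collected")
```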

Astrocytic gliotransmission as a pathway for stable stimulation of post-synaptic spiking: Implications for working memory

  • Authors: Valentin Würzbauer, Kerstin Lenk, Matin Jafarian
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06004
  • Pdf link: https://arxiv.org/pdf/2304.06004
  • Abstract
    The brain consists not only of neurons but also of non-neuronal cells, including astrocytes. Recent discoveries in neuroscience suggest that astrocytes directly regulate neuronal activity by releasing gliotransmitters such as glutamate. In this paper, we consider a biologically plausible mathematical model of a tripartite neuron-astrocyte network. We study the stability of the nonlinear astrocyte dynamics, as well as its role in regulating the firing rate of the post-synaptic neuron. We show that astrocytes enable storing neuronal information temporarily. Motivated by recent findings on the role of astrocytes in explaining mechanisms of working memory, we numerically verify the utility of our analysis by showing that both competing theories of working memory, persistent and sparse neuronal activity, are possible within the model.

Adaptive Human Matting for Dynamic Videos

  • Authors: Chung-Ching Lin, Jiang Wang, Kun Luo, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06018
  • Pdf link: https://arxiv.org/pdf/2304.06018
  • Abstract
    The most recent efforts in video matting have focused on eliminating trimap dependency since trimap annotations are expensive and trimap-based methods are less adaptable for real-time applications. Despite the latest trimap-free methods showing promising results, their performance often degrades when dealing with highly diverse and unstructured videos. We address this limitation by introducing Adaptive Matting for Dynamic Videos, termed AdaM, which is a framework designed for simultaneously differentiating foregrounds from backgrounds and capturing alpha matte details of human subjects in the foreground. Two interconnected network designs are employed to achieve this goal: (1) an encoder-decoder network that produces alpha mattes and intermediate masks which are used to guide the transformer in adaptively decoding foregrounds and backgrounds, and (2) a transformer network in which long- and short-term attention combine to retain spatial and temporal contexts, facilitating the decoding of foreground details. We benchmark and study our methods on recently introduced datasets, showing that our model notably improves matting realism and temporal coherence in complex real-world videos and achieves new best-in-class generalizability. Further details and examples are available at https://github.com/microsoft/AdaM.

VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs

  • Authors: Moayed Haji Ali, Andrew Bond, Tolga Birdal, Duygu Ceylan, Levent Karacan, Erkut Erdem, Aykut Erdem
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06020
  • Pdf link: https://arxiv.org/pdf/2304.06020
  • Abstract
    We propose $\textbf{VidStyleODE}$, a spatiotemporally continuous disentangled $\textbf{Vid}$eo representation based upon $\textbf{Style}$GAN and Neural-$\textbf{ODE}$s. Effective traversal of the latent space learned by Generative Adversarial Networks (GANs) has been the basis for recent breakthroughs in image editing. However, the applicability of such advancements to the video domain has been hindered by the difficulty of representing and controlling videos in the latent space of GANs. In particular, videos are composed of content (i.e., appearance) and complex motion components that require a special mechanism to disentangle and control. To achieve this, VidStyleODE encodes the video content in a pre-trained StyleGAN $\mathcal{W}_+$ space and benefits from a latent ODE component to summarize the spatiotemporal dynamics of the input video. Our novel continuous video generation process then combines the two to generate high-quality and temporally consistent videos with varying frame rates. We show that our proposed method enables a variety of applications on real videos: text-guided appearance manipulation, motion manipulation, image animation, and video interpolation and extrapolation. Project website: https://cyberiada.github.io/VidStyleODE

New submissions for Fri, 17 Mar 23

Keyword: pruning

There is no result

Keyword: neural architecture search

There is no result

Keyword: 3d object detection

Among Us: Adversarially Robust Collaborative Perception by Consensus

  • Authors: Yiming Li, Qi Fang, Jiamu Bai, Siheng Chen, Felix Juefei-Xu, Chen Feng
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2303.09495
  • Pdf link: https://arxiv.org/pdf/2303.09495
  • Abstract
    Multiple robots could perceive a scene (e.g., detect objects) collaboratively better than individuals, although they easily suffer from adversarial attacks when using deep learning. This could be addressed by adversarial defense, but its training requires the often-unknown attacking mechanism. In contrast, we propose ROBOSAC, a novel sampling-based defense strategy generalizable to unseen attackers. Our key idea is that collaborative perception should lead to consensus rather than dissensus in results compared to individual perception. This leads to our hypothesize-and-verify framework: perception results with and without collaboration from a random subset of teammates are compared until reaching a consensus. In such a framework, more teammates in the sampled subset often entail better perception performance but require longer sampling time to reject potential attackers. Thus, we derive how many sampling trials are needed to ensure the desired size of an attacker-free subset, or equivalently, the maximum size of such a subset that we can successfully sample within a given number of trials. We validate our method on the task of collaborative 3D object detection in autonomous driving scenarios.
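
The trial-count question is analogous to the classic RANSAC bound; below is a back-of-the-envelope sketch with hypothetical numbers (10 teammates, 2 attackers, subsets of size 5), used for illustration rather than the paper's exact derivation:

```python
from math import comb, log, ceil

def trials_needed(n_teammates, n_attackers, subset_size, confidence=0.99):
    # Probability that a uniformly sampled subset contains no attacker;
    # requires subset_size <= n_teammates - n_attackers.
    p_free = comb(n_teammates - n_attackers, subset_size) / comb(n_teammates, subset_size)
    # Standard RANSAC-style bound: trials until one clean subset w.p. >= confidence.
    return ceil(log(1 - confidence) / log(1 - p_free))

print(trials_needed(10, 2, 5))   # larger subsets need more trials to stay clean
```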

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

  • Authors: Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09551
  • Pdf link: https://arxiv.org/pdf/2303.09551
  • Abstract
    3D scene understanding plays a vital role in vision-based autonomous driving. While most existing methods focus on 3D object detection, they have difficulty describing real-world objects of arbitrary shapes and infinite classes. Towards a more comprehensive perception of a 3D scene, in this paper, we propose a SurroundOcc method to predict the 3D occupancy with multi-camera images. We first extract multi-scale features for each image and adopt spatial 2D-3D attention to lift them to the 3D volume space. Then we apply 3D convolutions to progressively upsample the volume features and impose supervision on multiple levels. To obtain dense occupancy prediction, we design a pipeline to generate dense occupancy ground truth without expensive occupancy annotations. Specifically, we fuse multi-frame LiDAR scans of dynamic objects and static scenes separately. Then we adopt Poisson Reconstruction to fill the holes and voxelize the mesh to get dense occupancy labels. Extensive experiments on nuScenes and SemanticKITTI datasets demonstrate the superiority of our method. Code and dataset are available at https://github.com/weiyithu/SurroundOcc
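
For a flavor of the final voxelization step in such a label-generation pipeline, here is a sketch that bins a fused point cloud into a boolean occupancy grid; the bounds and voxel size are illustrative, and the paper's pipeline additionally densifies the surface via Poisson Reconstruction first:

```python
import numpy as np

def voxelize(points, lo=-50.0, hi=50.0, voxel_size=0.5):
    """Bin an (N, 3) point cloud into a boolean occupancy grid."""
    dim = int((hi - lo) / voxel_size)
    grid = np.zeros((dim, dim, dim), dtype=bool)
    idx = ((points - lo) / voxel_size).astype(int)
    valid = ((idx >= 0) & (idx < dim)).all(axis=1)   # drop out-of-range points
    i, j, k = idx[valid].T
    grid[i, j, k] = True
    return grid

points = np.random.uniform(-50, 50, size=(10000, 3))  # stand-in fused scan
print(voxelize(points).sum(), "occupied voxels")
```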

Keyword: voxel

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

  • Authors: Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09551
  • Pdf link: https://arxiv.org/pdf/2303.09551
  • Abstract
    3D scene understanding plays a vital role in vision-based autonomous driving. While most existing methods focus on 3D object detection, they have difficulty describing real-world objects of arbitrary shapes and infinite classes. Towards a more comprehensive perception of a 3D scene, in this paper, we propose a SurroundOcc method to predict the 3D occupancy with multi-camera images. We first extract multi-scale features for each image and adopt spatial 2D-3D attention to lift them to the 3D volume space. Then we apply 3D convolutions to progressively upsample the volume features and impose supervision on multiple levels. To obtain dense occupancy prediction, we design a pipeline to generate dense occupancy ground truth without expensive occupancy annotations. Specifically, we fuse multi-frame LiDAR scans of dynamic objects and static scenes separately. Then we adopt Poisson Reconstruction to fill the holes and voxelize the mesh to get dense occupancy labels. Extensive experiments on nuScenes and SemanticKITTI datasets demonstrate the superiority of our method. Code and dataset are available at https://github.com/weiyithu/SurroundOcc

Keyword: lidar

SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments

  • Authors: Yudi Dai (1), Yitai Lin (1), Xiping Lin (2), Chenglu Wen (1), Lan Xu (2), Hongwei Yi (3), Siqi Shen (1), Yuexin Ma (2), Cheng Wang (1) ((1) Xiamen University, China, (2) ShanghaiTech University, China, (3) Max Planck Institute for Intelligent Systems, Germany)
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09095
  • Pdf link: https://arxiv.org/pdf/2303.09095
  • Abstract
    We present SLOPER4D, a novel scene-aware dataset collected in large urban environments to facilitate the research of global human pose estimation (GHPE) with human-scene interaction in the wild. Employing a head-mounted device integrated with a LiDAR and camera, we record 12 human subjects' activities over 10 diverse urban scenes from an egocentric view. Frame-wise annotations for 2D key points, 3D pose parameters, and global translations are provided, together with reconstructed scene point clouds. To obtain accurate 3D ground truth in such large dynamic scenes, we propose a joint optimization method to fit local SMPL meshes to the scene and fine-tune the camera calibration during dynamic motions frame by frame, resulting in plausible and scene-natural 3D human poses. Eventually, SLOPER4D consists of 15 sequences of human motions, each of which has a trajectory length of more than 200 meters (up to 1,300 meters) and covers an area of more than 2,000 $m^2$ (up to 13,000 $m^2$), including more than 100K LiDAR frames, 300k video frames, and 500K IMU-based motion frames. With SLOPER4D, we provide a detailed and thorough analysis of two critical tasks, including camera-based 3D HPE and LiDAR-based 3D HPE in urban environments, and benchmark a new task, GHPE. The in-depth analysis demonstrates that SLOPER4D poses significant challenges to existing methods and opens up great research opportunities. The dataset and code are released at \url{this http URL}

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

  • Authors: Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09551
  • Pdf link: https://arxiv.org/pdf/2303.09551
  • Abstract
    3D scene understanding plays a vital role in vision-based autonomous driving. While most existing methods focus on 3D object detection, they have difficulty describing real-world objects of arbitrary shapes and infinite classes. Towards a more comprehensive perception of a 3D scene, in this paper, we propose a SurroundOcc method to predict the 3D occupancy with multi-camera images. We first extract multi-scale features for each image and adopt spatial 2D-3D attention to lift them to the 3D volume space. Then we apply 3D convolutions to progressively upsample the volume features and impose supervision on multiple levels. To obtain dense occupancy prediction, we design a pipeline to generate dense occupancy ground truth without expensive occupancy annotations. Specifically, we fuse multi-frame LiDAR scans of dynamic objects and static scenes separately. Then we adopt Poisson Reconstruction to fill the holes and voxelize the mesh to get dense occupancy labels. Extensive experiments on nuScenes and SemanticKITTI datasets demonstrate the superiority of our method. Code and dataset are available at https://github.com/weiyithu/SurroundOcc

New submissions for Fri, 7 Apr 23

Keyword: efficient

Adopting Two Supervisors for Efficient Use of Large-Scale Remote Deep Neural Networks

  • Authors: Michael Weiss, Paolo Tonella
  • Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.02654
  • Pdf link: https://arxiv.org/pdf/2304.02654
  • Abstract
    Recent decades have seen the rise of large-scale Deep Neural Networks (DNNs) to achieve human-competitive performance in a variety of artificial intelligence tasks. Often consisting of hundreds of millions, if not hundreds of billions, of parameters, these DNNs are too large to be deployed to, or efficiently run on, resource-constrained devices such as mobile phones or IoT microcontrollers. Systems relying on large-scale DNNs thus have to call the corresponding model over the network, leading to substantial costs for hosting and running the large-scale remote model, costs which are often charged on a per-use basis. In this paper, we propose BiSupervised, a novel architecture where, before relying on a large remote DNN, a system attempts to make a prediction on a small-scale local model. A DNN supervisor monitors said prediction process and identifies easy inputs for which the local prediction can be trusted. For these inputs, the remote model does not have to be invoked, thus saving costs, while only marginally impacting the overall system accuracy. Our architecture furthermore foresees a second supervisor to monitor the remote predictions and identify inputs for which not even these can be trusted, allowing the system to raise an exception or run a fallback strategy instead. We evaluate the cost savings, and the ability to detect incorrectly predicted inputs, on four diverse case studies: IMDB movie review sentiment classification, Github issue triaging, Imagenet image classification, and SQuADv2 free-text question answering.
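
A skeletal version of the two-supervisor routing logic, using maximum softmax probability as a stand-in supervisor; the thresholds, model interfaces, and fallback behavior are hypothetical, not the paper's configuration:

```python
import numpy as np

def predict(x, local_model, remote_model, local_thresh=0.9, remote_thresh=0.6):
    p_local = local_model(x)                 # cheap on-device softmax vector
    if p_local.max() >= local_thresh:        # supervisor 1: easy input, trust it
        return int(np.argmax(p_local)), "local"
    p_remote = remote_model(x)               # costly remote network call
    if p_remote.max() >= remote_thresh:      # supervisor 2: accept remote answer
        return int(np.argmax(p_remote)), "remote"
    raise ValueError("rejected by both supervisors; run fallback strategy")
```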

nD-PDPA: nDimensional Probability Density Profile Analysis

  • Authors: Arjang Fahim, Stephanie Irausquin, Homayoun Valafar
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
  • Arxiv link: https://arxiv.org/abs/2304.02682
  • Pdf link: https://arxiv.org/pdf/2304.02682
  • Abstract
    Despite the recent advances in various Structural Genomics Projects, a large gap remains between the number of sequenced and structurally characterized proteins. Some reasons for this discrepancy include technical difficulties, labor, and the cost related to determining a structure by experimental methods such as NMR spectroscopy. Several computational methods have been developed to expand the applicability of NMR spectroscopy by addressing temporal and economic problems more efficiently. While these methods demonstrate successful outcomes in solving more challenging and structurally novel proteins, the cost has not been reduced significantly. Probability Density Profile Analysis (PDPA) has been previously introduced by our lab to directly address the economics of structure determination of routine proteins and the identification of novel structures from a minimal set of unassigned NMR data. 2D-PDPA (in which 2D denotes incorporation of data from two alignment media) has been successful in identifying the structural homolog of an unknown protein within a library of ~1000 decoy structures. In order to further expand the selectivity and sensitivity of PDPA, the incorporation of additional data was necessary. However, the expansion of the original PDPA approach was limited by its computational requirements, where the inclusion of additional data would render it computationally intractable. Here we present the most recent developments of the PDPA method (nD-PDPA: n-Dimensional Probability Density Profile Analysis) that eliminate 2D-PDPA's computational limitations and allow the inclusion of RDC data from multiple vector types in multiple alignment media.

A Certified Radius-Guided Attack Framework to Image Segmentation Models

  • Authors: Wenjie Qu, Youqi Li, Binghui Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02693
  • Pdf link: https://arxiv.org/pdf/2304.02693
  • Abstract
    Image segmentation is an important problem in many safety-critical applications. Recent studies show that modern image segmentation models are vulnerable to adversarial perturbations, while existing attack methods mainly follow the idea of attacking image classification models. We argue that image segmentation and classification have inherent differences, and design an attack framework specially for image segmentation models. Our attack framework is inspired by certified radius, which was originally used by defenders to defend against adversarial perturbations to classification models. We are the first, from the attacker perspective, to leverage the properties of certified radius and propose a certified radius guided attack framework against image segmentation models. Specifically, we first adapt randomized smoothing, the state-of-the-art certification method for classification models, to derive the pixel's certified radius. We then focus more on disrupting pixels with relatively smaller certified radii and design a pixel-wise certified radius guided loss, when plugged into any existing white-box attack, yields our certified radius-guided white-box attack. Next, we propose the first black-box attack to image segmentation models via bandit. We design a novel gradient estimator, based on bandit feedback, which is query-efficient and provably unbiased and stable. We use this gradient estimator to design a projected bandit gradient descent (PBGD) attack, as well as a certified radius-guided PBGD (CR-PBGD) attack. We prove our PBGD and CR-PBGD attacks can achieve asymptotically optimal attack performance with an optimal rate. We evaluate our certified-radius guided white-box and black-box attacks on multiple modern image segmentation models and datasets. Our results validate the effectiveness of our certified radius-guided attack framework.

Recovering Continuous Scene Dynamics from A Single Blurry Image with Events

  • Authors: Zhangyi Cheng, Xiang Zhang, Lei Yu, Jianzhuang Liu, Wen Yang, Gui-Song Xia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02695
  • Pdf link: https://arxiv.org/pdf/2304.02695
  • Abstract
    This paper aims at demystifying a single motion-blurred image with events and revealing the temporally continuous scene dynamics encrypted behind motion blurs. To this end, an Implicit Video Function (IVF) is learned to represent a single motion-blurred image with concurrent events, enabling the latent sharp image restoration of arbitrary timestamps in the range of imaging exposures. Specifically, a dual attention transformer is proposed to efficiently leverage merits from both modalities, i.e., the high temporal resolution of event features and the smoothness of image features, alleviating temporal ambiguities while suppressing the event noise. The proposed network is trained only with the supervision of ground-truth images of limited referenced timestamps. Motion- and texture-guided supervisions are employed simultaneously to enhance restorations of the non-referenced timestamps and improve the overall sharpness. Experiments on synthetic, semi-synthetic, and real-world datasets demonstrate that our proposed method outperforms state-of-the-art methods by a large margin in terms of both objective PSNR and SSIM measurements and subjective evaluations.

Agnostic proper learning of monotone functions: beyond the black-box correction barrier

  • Authors: Jane Lange, Arsen Vasilyan
  • Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02700
  • Pdf link: https://arxiv.org/pdf/2304.02700
  • Abstract
    We give the first agnostic, efficient, proper learning algorithm for monotone Boolean functions. Given $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$ uniformly random examples of an unknown function $f:\{\pm 1\}^n \rightarrow \{\pm 1\}$, our algorithm outputs a hypothesis $g:\{\pm 1\}^n \rightarrow \{\pm 1\}$ that is monotone and $(\mathrm{opt} + \varepsilon)$-close to $f$, where $\mathrm{opt}$ is the distance from $f$ to the closest monotone function. The running time of the algorithm (and consequently the size and evaluation time of the hypothesis) is also $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$, nearly matching the lower bound of Blais et al (RANDOM '15). We also give an algorithm for estimating up to additive error $\varepsilon$ the distance of an unknown function $f$ to monotone using a run-time of $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$. Previously, for both of these problems, sample-efficient algorithms were known, but these algorithms were not run-time efficient. Our work thus closes this gap in our knowledge between the run-time and sample complexity. This work builds upon the improper learning algorithm of Bshouty and Tamon (JACM '96) and the proper semiagnostic learning algorithm of Lange, Rubinfeld, and Vasilyan (FOCS '22), which obtains a non-monotone Boolean-valued hypothesis, then "corrects" it to monotone using query-efficient local computation algorithms on graphs. This black-box correction approach can achieve no error better than $2\mathrm{opt} + \varepsilon$ information-theoretically; we bypass this barrier by a) augmenting the improper learner with a convex optimization step, and b) learning and correcting a real-valued function before rounding its values to Boolean. Our real-valued correction algorithm solves the "poset sorting" problem of [LRV22] for functions over general posets with non-Boolean labels.

A Unified Taxonomy for Automated Vehicles: Individual, Cooperative, Collaborative, On-Road, and Off-Road

  • Authors: Fredrik Warg, Anders Thorsén, Victoria Vu, Carl Bergenhem
  • Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02705
  • Pdf link: https://arxiv.org/pdf/2304.02705
  • Abstract
    Various types of vehicle automation are increasingly used in a variety of environments, including road vehicles such as cars or automated shuttles, confined areas such as mines or harbours, and agriculture and forestry. In many use cases, the benefits are greater if several automated vehicles (AVs) cooperate to help each other reach their goals more efficiently, or collaborate to complete a common task. Taxonomies and definitions create a common framework that helps researchers and practitioners advance the field. However, most existing work focuses on road vehicles. In this paper, we review and extend taxonomies and definitions to encompass individually acting as well as cooperative and collaborative AVs for both on-road and off-road use cases. In particular, we introduce classes of collaborative vehicles not defined in existing literature, and define levels of automation suitable for vehicles where automation covers functions beyond the driving task.

Efficient OCR for Building a Diverse Digital History

  • Authors: Jacob Carlson, Tom Bryan, Melissa Dell
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Digital Libraries (cs.DL); General Economics (econ.GN)
  • Arxiv link: https://arxiv.org/abs/2304.02737
  • Pdf link: https://arxiv.org/pdf/2304.02737
  • Abstract
    Thousands of users consult digital archives daily, but the information they can access is unrepresentative of the diversity of documentary history. The sequence-to-sequence architecture typically used for optical character recognition (OCR) - which jointly learns a vision and language model - is poorly extensible to low-resource document collections, as learning a language-vision model requires extensive labeled sequences and compute. This study models OCR as a character-level image retrieval problem, using a contrastively trained vision encoder. Because the model only learns characters' visual features, it is more sample-efficient and extensible than existing architectures, enabling accurate OCR in settings where existing solutions fail. Crucially, the model opens new avenues for community engagement in making digital history more representative of documentary history.

Sejarah dan Perkembangan Teknik Natural Language Processing (NLP) Bahasa Indonesia: Tinjauan tentang sejarah, perkembangan teknologi, dan aplikasi NLP dalam bahasa Indonesia (in English: History and Development of Indonesian Natural Language Processing (NLP) Techniques: a review of the history, technological development, and applications of NLP in Indonesian)

  • Authors: Mukhlis Amien
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.02746
  • Pdf link: https://arxiv.org/pdf/2304.02746
  • Abstract
    This study provides an overview of the history of the development of Natural Language Processing (NLP) in the context of the Indonesian language, with a focus on the basic technologies, methods, and practical applications that have been developed. This review covers developments in basic NLP technologies such as stemming, part-of-speech tagging, and related methods; practical applications in cross-language information retrieval systems, information extraction, and sentiment analysis; and methods and techniques used in Indonesian language NLP research, such as machine learning, statistics-based machine translation, and conflict-based approaches. This study also explores the application of NLP in Indonesian language industry and research and identifies challenges and opportunities in Indonesian language NLP research and development. Recommendations for future Indonesian language NLP research and development include developing more efficient methods and technologies, expanding NLP applications, increasing sustainability, further research into the potential of NLP, and promoting interdisciplinary collaboration. It is hoped that this review will help researchers, practitioners, and the government to understand the development of Indonesian language NLP and identify opportunities for further research and development.

Robust, privacy-preserving, transparent, and auditable on-device blocklisting

  • Authors: Kurt Thomas, Sarah Meiklejohn, Michael A. Specter, Xiang Wang, Xavier Llorà, Stephan Somogyi, David Kleidermacher
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.02810
  • Pdf link: https://arxiv.org/pdf/2304.02810
  • Abstract
    With the accelerated adoption of end-to-end encryption, there is an opportunity to re-architect security and anti-abuse primitives in a manner that preserves new privacy expectations. In this paper, we consider two novel protocols for on-device blocklisting that allow a client to determine whether an object (e.g., URL, document, image, etc.) is harmful based on threat information possessed by a so-called remote enforcer in a way that is both privacy-preserving and trustworthy. Our protocols leverage a unique combination of private set intersection to promote privacy, cryptographic hashes to ensure resilience to false positives, cryptographic signatures to improve transparency, and Merkle inclusion proofs to ensure consistency and auditability. We benchmark our protocols -- one that is time-efficient, and the other space-efficient -- to demonstrate their practical use for applications such as email, messaging, storage, and other applications. We also highlight remaining challenges, such as privacy and censorship tensions that exist with logging or reporting. We consider our work to be a critical first step towards enabling complex, multi-stakeholder discussions on how best to provide on-device protections.
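
Of the listed primitives, the Merkle inclusion proof is the easiest to sketch. The checker below assumes a SHA-256 tree and a proof given as (sibling hash, sibling-is-left) pairs from leaf to root, which is a common convention rather than the paper's exact encoding:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_inclusion(leaf: bytes, proof, root: bytes) -> bool:
    """proof: list of (sibling_hash, sibling_is_left) pairs, leaf to root."""
    node = h(leaf)
    for sibling, is_left in proof:
        node = h(sibling + node) if is_left else h(node + sibling)
    return node == root

# Tiny two-leaf tree: root = H(H(a) || H(b)); prove inclusion of b.
a, b = b"blocked-url-hash-a", b"blocked-url-hash-b"
root = h(h(a) + h(b))
print(verify_inclusion(b, [(h(a), True)], root))   # True
```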

GIF: A General Graph Unlearning Strategy via Influence Function

  • Authors: Jiancan Wu, Yi Yang, Yuchun Qian, Yongduo Sui, Xiang Wang, Xiangnan He
  • Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.02835
  • Pdf link: https://arxiv.org/pdf/2304.02835
  • Abstract
    With the greater emphasis on privacy and security in our society, the problem of graph unlearning -- revoking the influence of specific data on the trained GNN model -- is drawing increasing attention. However, ranging from machine unlearning to recently emerged graph unlearning methods, existing efforts either resort to the retraining paradigm, or perform approximate erasure that fails to consider the inter-dependency between connected neighbors or imposes constraints on GNN structure, and therefore struggle to achieve satisfying performance-complexity trade-offs. In this work, we explore the influence function tailored for graph unlearning, so as to improve unlearning efficacy and efficiency. We first present a unified problem formulation of diverse graph unlearning tasks with respect to node, edge, and feature. Then, we recognize the crux of the traditional influence function's inability to handle graph unlearning, and devise Graph Influence Function (GIF), a model-agnostic unlearning method that can efficiently and accurately estimate parameter changes in response to an $\epsilon$-mass perturbation in deleted data. The idea is to supplement the objective of the traditional influence function with an additional loss term for the influenced neighbors due to the structural dependency. Further deductions on the closed-form solution of parameter changes provide a better understanding of the unlearning mechanism. We conduct extensive experiments on four representative GNN models and three benchmark datasets to justify the superiority of GIF for diverse graph unlearning tasks in terms of unlearning efficacy, model utility, and unlearning efficiency. Our implementations are available at \url{https://github.com/wujcan/GIF-torch/}.
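
A toy numpy version of the influence-function estimate that GIF extends, for L2-regularized logistic regression: the parameter change from deleting samples is approximated by a Newton-style step $H^{-1}g$, where $g$ is the deleted samples' gradient contribution; GIF adds a further loss term for structurally influenced neighbors, which is omitted here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def influence_unlearn(theta, X, y, X_del, y_del, lam=1e-2):
    """Estimate post-deletion parameters for L2-regularized logistic
    regression: theta' ~= theta + H^{-1} g_del."""
    p = sigmoid(X @ theta)
    H = (X.T * (p * (1 - p))) @ X / len(X) + lam * np.eye(len(theta))
    g_del = X_del.T @ (sigmoid(X_del @ theta) - y_del) / len(X)  # deleted grads
    return theta + np.linalg.solve(H, g_del)
```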

Robustmix: Improving Robustness by Regularizing the Frequency Bias of Deep Nets

  • Authors: Jonas Ngnawe, Marianne ABEMGNIGNI NJIFON, Jonathan Heek, Yann Dauphin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02847
  • Pdf link: https://arxiv.org/pdf/2304.02847
  • Abstract
    Deep networks have achieved impressive results on a range of well-curated benchmark datasets. Surprisingly, their performance remains sensitive to perturbations that have little effect on human performance. In this work, we propose a novel extension of Mixup called Robustmix that regularizes networks to classify based on lower-frequency spatial features. We show that this type of regularization improves robustness on a range of benchmarks such as Imagenet-C and Stylized Imagenet. It adds little computational overhead and, furthermore, does not require a priori knowledge of a large set of image transformations. We find that this approach further complements recent advances in model architecture and data augmentation, attaining a state-of-the-art mCE of 44.8 with an EfficientNet-B8 model and RandAugment, which is a reduction of 16 mCE compared to the baseline.
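
A sketch of the kind of frequency-band mixing such regularization relies on: take the low-frequency content of one image and the high-frequency content of another via an FFT mask. The cutoff and the label-mixing policy (omitted here) are illustrative, not the paper's recipe:

```python
import numpy as np

def lowfreq_mix(img_a, img_b, cutoff=0.1):
    """Combine |f| < cutoff content of img_a with the rest of img_b
    (2-D grayscale arrays of equal shape; cutoff in cycles/sample)."""
    fa = np.fft.fftshift(np.fft.fft2(img_a))
    fb = np.fft.fftshift(np.fft.fft2(img_b))
    fy = np.fft.fftshift(np.fft.fftfreq(img_a.shape[0]))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(img_a.shape[1]))[None, :]
    low = (np.abs(fy) < cutoff) & (np.abs(fx) < cutoff)   # square low-pass mask
    mixed = np.where(low, fa, fb)        # low band from a, high band from b
    return np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))
```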

Towards an Effective and Efficient Transformer for Rain-by-snow Weather Removal

  • Authors: Tao Gao, Yuanbo Wen, Kaihao Zhang, Peng Cheng, Ting Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02860
  • Pdf link: https://arxiv.org/pdf/2304.02860
  • Abstract
    Rain-by-snow weather removal is a specialized task in weather-degraded image restoration aiming to eliminate coexisting rain streaks and snow particles. In this paper, we propose RSFormer, an efficient and effective Transformer that addresses this challenge. Initially, we explore the proximity of convolution networks (ConvNets) and vision Transformers (ViTs) in hierarchical architectures and experimentally find that they perform approximately equally well at intra-stage feature learning. On this basis, we utilize a Transformer-like convolution block (TCB) that replaces the computationally expensive self-attention while preserving attention characteristics for adapting to input content. We also demonstrate that cross-stage progression is critical for performance improvement, and propose a global-local self-attention sampling mechanism (GLASM) that down-/up-samples features while capturing both global and local dependencies. Finally, we synthesize two novel rain-by-snow datasets, RSCityScape and RS100K, to evaluate our proposed RSFormer. Extensive experiments verify that RSFormer achieves the best trade-off between performance and time consumption compared to other restoration methods. For instance, it outperforms Restormer with a 1.53% reduction in the number of parameters and a 15.6% reduction in inference time. Datasets, source code and pre-trained models are available at \url{https://github.com/chdwyb/RSFormer}.

VPFusion: Towards Robust Vertical Representation Learning for 3D Object Detection

  • Authors: Yuhao Huang, Sanping Zhou, Junjie Zhang, Jinpeng Dong, Nanning Zheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02867
  • Pdf link: https://arxiv.org/pdf/2304.02867
  • Abstract
    Efficient point cloud representation is a fundamental element of LiDAR-based 3D object detection. Recent grid-based detectors usually divide point clouds into voxels or pillars and construct single-stream networks in Bird's Eye View. However, these point cloud encoding paradigms underestimate the point representation in the vertical direction, which causes the loss of semantic or fine-grained information, especially for vertically sensitive objects like pedestrians and cyclists. In this paper, we propose an explicit vertical multi-scale representation learning framework, VPFusion, to combine the complementary information from both voxel and pillar streams. Specifically, VPFusion first builds upon a sparse voxel-pillar-based backbone. The backbone divides point clouds into voxels and pillars, then encodes features with 3D and 2D sparse convolution simultaneously. Next, we introduce the Sparse Fusion Layer (SFL), which establishes a bidirectional pathway for sparse voxel and pillar features to enable interaction between them. Additionally, we present the Dense Fusion Neck (DFN) to effectively combine the dense feature maps from voxel and pillar branches at multiple scales. Extensive experiments on the large-scale Waymo Open Dataset and nuScenes Dataset demonstrate that VPFusion surpasses the single-stream baselines by a large margin and achieves state-of-the-art performance with real-time inference speed.

Object-centric Inference for Language Conditioned Placement: A Foundation Model based Approach

  • Authors: Zhixuan Xu, Kechun Xu, Yue Wang, Rong Xiong
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02893
  • Pdf link: https://arxiv.org/pdf/2304.02893
  • Abstract
    We focus on the task of language-conditioned object placement, in which a robot should generate placements that satisfy all the spatial relational constraints in language instructions. Previous works based on rule-based language parsing or scene-centric visual representation have restrictions on the form of instructions and reference objects or require large amounts of training data. We propose an object-centric framework that leverages foundation models to ground the reference objects and spatial relations for placement, which is more sample efficient and generalizable. Experiments indicate that our model can achieve a 97.75% success rate of placement with only ~0.26M trainable parameters. Besides, our method generalizes better to both unseen objects and instructions. Moreover, with only 25% training data, we still outperform the top competing approach.

Affect as a proxy for literary mood

  • Authors: Emily Öhman, Riikka Rossi
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.02894
  • Pdf link: https://arxiv.org/pdf/2304.02894
  • Abstract
    We propose to use affect as a proxy for mood in literary texts. In this study, we explore the differences in computationally detecting tone versus detecting mood. Methodologically, we utilize affective word embeddings to look at the affective distribution in different text segments. We also present a simple yet efficient and effective method of enhancing emotion lexicons to take both semantic shift and the domain of the text into account, producing real-world congruent results closely matching both contemporary and modern qualitative analyses.

LSketch: A Label-Enabled Graph Stream Sketch Toward Time-Sensitive Queries

  • Authors: Yiling Zeng, Chunyao Song, Yuhan Li, Tingjian Ge
  • Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.02897
  • Pdf link: https://arxiv.org/pdf/2304.02897
  • Abstract
    Graph streams represent data interactions in real applications. The mining of graph streams plays an important role in network security, social network analysis, and traffic control, among others. However, their sheer volume and high dynamics cause great challenges for efficient storage and subsequent query analysis. Current studies apply sketches to summarize graph streams. We propose LSketch, which works for heterogeneous graph streams and effectively preserves the label information carried by the streams in real scenes, thereby enriching the expressive ability of sketches. In addition, as graph streams continue to evolve over time, edges that are too old may lose their practical significance. Therefore, we introduce the sliding window model into LSketch to eliminate expired edges automatically. LSketch uses sub-linear storage space and can support structure-based queries and time-sensitive queries with high accuracy. We perform extensive experiments over four real datasets, demonstrating the superiority of the proposed method over state-of-the-art methods in terms of query accuracy and time efficiency.
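
A toy labeled sliding-window sketch in the same spirit: edge counts live in a hashed matrix keyed by (endpoint, label), and a cell is reset wholesale once its last update falls outside the window. The sizes, hashing, and crude expiry are illustrative simplifications, not LSketch's data structure:

```python
import time
import numpy as np

class LabeledSketch:
    """Hashed (source, destination, label) edge-frequency sketch with a
    crude sliding window: a cell resets wholesale once it goes stale."""
    def __init__(self, width=1024, window=3600.0):
        self.counts = np.zeros((width, width), dtype=np.int64)
        self.stamps = np.zeros((width, width))
        self.width, self.window = width, window

    def _pos(self, src, dst, label):
        return (hash((src, label)) % self.width,
                hash((dst, label)) % self.width)

    def add(self, src, dst, label, t=None):
        t = time.time() if t is None else t
        i, j = self._pos(src, dst, label)
        if t - self.stamps[i, j] > self.window:   # expired cell: restart count
            self.counts[i, j] = 0
        self.counts[i, j] += 1
        self.stamps[i, j] = t

    def query(self, src, dst, label, t=None):
        t = time.time() if t is None else t
        i, j = self._pos(src, dst, label)
        if t - self.stamps[i, j] > self.window:
            return 0
        return int(self.counts[i, j])             # overestimate under collisions
```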

InterFormer: Real-time Interactive Image Segmentation

  • Authors: You Huang, Hao Yang, Ke Sun, Shengchuan Zhang, Guannan Jiang, Rongrong Ji, Liujuan Cao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.02942
  • Pdf link: https://arxiv.org/pdf/2304.02942
  • Abstract
    Interactive image segmentation enables annotators to efficiently perform pixel-level annotation for segmentation tasks. However, the existing interactive segmentation pipeline suffers from inefficient computations of interactive models because of the following two issues. First, an annotator's later click is based on the model's feedback to the annotator's former click. This serial interaction is unable to utilize the model's parallelism capabilities. Second, the model has to repeatedly process the image, the annotator's current click, and the model's feedback to the annotator's former clicks at each step of interaction, resulting in redundant computations. For efficient computation, we propose a method named InterFormer that follows a new pipeline to address these issues. InterFormer extracts and preprocesses the computationally time-consuming part, i.e., image processing, from the existing process. Specifically, InterFormer employs a large vision transformer (ViT) on high-performance devices to preprocess images in parallel, and then uses a lightweight module called interactive multi-head self attention (I-MSA) for interactive segmentation. Furthermore, the I-MSA module's deployment on low-power devices extends the practical application of interactive segmentation. The I-MSA module utilizes the preprocessed features to efficiently respond to annotator inputs in real time. The experiments on several datasets demonstrate the effectiveness of InterFormer, which outperforms previous interactive segmentation models in terms of computational efficiency and segmentation quality, achieving real-time high-quality interactive segmentation on CPU-only devices.

When approximate design for fast homomorphic computation provides differential privacy guarantees

  • Authors: Arnaud Grivet Sébert, Martin Zuber, Oana Stan, Renaud Sirdey, Cédric Gouy-Pailler
  • Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02959
  • Pdf link: https://arxiv.org/pdf/2304.02959
  • Abstract
    While machine learning has become pervasive in fields as diverse as industry, healthcare, and social networks, privacy concerns regarding the training data have gained critical importance. In settings where several parties wish to collaboratively train a common model without jeopardizing their sensitive data, the need for a private training protocol is particularly stringent and requires protecting the data against both the model's end-users and the actors of the training phase. Differential privacy (DP) and cryptographic primitives are complementary and popular countermeasures against privacy attacks. Among these cryptographic primitives, fully homomorphic encryption (FHE) offers ciphertext malleability at the cost of time-consuming operations in the homomorphic domain. In this paper, we design SHIELD, a probabilistic approximation algorithm for the argmax operator which is both fast when homomorphically executed and whose inaccuracy is used as a feature to ensure DP guarantees. Even though SHIELD could have other applications, we here focus on one setting and seamlessly integrate it into the SPEED collaborative training framework from "SPEED: Secure, PrivatE, and Efficient Deep learning" (Grivet Sébert et al., 2021) to improve its computational efficiency. After thoroughly describing the FHE implementation of our algorithm and its DP analysis, we present experimental results. To the best of our knowledge, this is the first work in which relaxing the accuracy of a homomorphic calculation is constructively usable as a degree of freedom to achieve better FHE performance.
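
SHIELD's exact construction is specific to FHE, but the idea of a noisy argmax whose randomness buys privacy has a classical cleartext analogue in the standard report-noisy-max mechanism, sketched below; this is the textbook mechanism, not the paper's algorithm.

```python
# Report noisy max: add Laplace noise to each score and release only the
# index of the noisy winner. With a conservative noise scale of
# 2 * sensitivity / epsilon this satisfies epsilon-DP.
import numpy as np

def report_noisy_max(scores, epsilon, sensitivity=1.0):
    noise = np.random.laplace(scale=2.0 * sensitivity / epsilon, size=len(scores))
    return int(np.argmax(np.asarray(scores, dtype=float) + noise))

votes = [130, 125, 40]   # e.g. per-class teacher votes in SPEED-style training
picks = [report_noisy_max(votes, epsilon=1.0) for _ in range(1000)]
print("class 0 chosen in", picks.count(0), "of 1000 trials")  # usually, not always
```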

A Fast and Lightweight Network for Low-Light Image Enhancement

  • Authors: Yu Zhang, Xiaoguang Di, Junde Wu, RAO FU, Yong Li, Yue Wang, Yanwu Xu, Guohui YANG, Chunhui Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.02978
  • Pdf link: https://arxiv.org/pdf/2304.02978
  • Abstract
    Low-light images often suffer from severe noise, low brightness, low contrast, and color deviation. While several low-light image enhancement methods have been proposed, there remains a lack of efficient methods that can simultaneously solve all of these problems. In this paper, we introduce FLW-Net, a Fast and LightWeight Network for low-light image enhancement that significantly improves both processing speed and enhancement quality. To achieve efficient low-light image enhancement, we recognize two challenges: the lack of an absolute reference and the need for a large receptive field to obtain global contrast. Therefore, we propose an efficient global feature information extraction component and design loss functions based on relative information to overcome these challenges. Finally, we conduct comparative experiments to demonstrate the effectiveness of the proposed method, and the results confirm that FLW-Net can significantly reduce the complexity of supervised low-light image enhancement networks while improving enhancement quality. Code is available at https://github.com/hitzhangyu/FLW-Net

IoT Federated Blockchain Learning at the Edge

  • Authors: James Calo, Benny Lo
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.03006
  • Pdf link: https://arxiv.org/pdf/2304.03006
  • Abstract
    IoT devices are sorely underutilized in the medical field, especially within machine learning for medicine, yet they offer unrivaled benefits: they are low-cost, energy-efficient, small, and intelligent. In this paper, we propose a distributed federated learning framework for IoT devices, more specifically for IoMT (Internet of Medical Things), using blockchain to allow for a decentralized scheme that improves privacy and efficiency over a centralized system; this allows us to move from the prevalent cloud-based architectures to the edge. The system is designed for three paradigms: 1) Training neural networks on IoT devices to allow for collaborative training of a shared model while decoupling the learning from the dataset to ensure privacy. Training is performed online and simultaneously amongst all participants, allowing the model to learn from live data that may never have appeared in a traditionally collected dataset and to adapt dynamically while it is being trained. 2) Training an IoMT system in a fully private manner to mitigate the confidentiality issues of medical data and to build robust, and potentially bespoke, models where little, if any, data exists. 3) Distributing the actual network training, something federated learning itself does not do, to allow hospitals, for example, to utilize their spare computing resources to train network models.

PointCAT: Cross-Attention Transformer for point cloud

  • Authors: Xincheng Yang, Mingze Jin, Weiji He, Qian Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03012
  • Pdf link: https://arxiv.org/pdf/2304.03012
  • Abstract
    Transformer-based models have significantly advanced natural language processing and computer vision in recent years. However, due to the irregular and disordered structure of point cloud data, transformer-based models for 3D deep learning are still in their infancy compared to other methods. In this paper we present Point Cross-Attention Transformer (PointCAT), a novel end-to-end network architecture using a cross-attention mechanism for point cloud representation. Our approach combines multi-scale features via two separate cross-attention transformer branches. To reduce the computational overhead of the multi-branch structure, we further introduce an efficient model for shape classification, which processes only the class token of one branch as a query to calculate the attention map with the other. Extensive experiments demonstrate that our method outperforms or achieves comparable performance to several approaches in shape classification, part segmentation and semantic segmentation tasks.
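
The single-query trick is the interesting part: using one branch's class token as the sole query makes the attention map linear in the number of tokens. A minimal single-head version, with illustrative shapes and random weights (the paper's multi-scale branches are more involved):

```python
# Single-head cross-attention where one branch's class token is the only
# query against the other branch's tokens, so attention costs O(n), not O(n^2).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def class_token_cross_attention(cls_tok, other_tokens, Wq, Wk, Wv):
    q = cls_tok @ Wq                          # (d,)   query: branch A's class token
    K = other_tokens @ Wk                     # (n, d) keys:   branch B's tokens
    V = other_tokens @ Wv                     # (n, d) values
    attn = softmax(K @ q / np.sqrt(q.size))   # (n,)   attention map, O(n)
    return attn @ V                           # fused class-token representation

d, n = 32, 128
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
cls_a = rng.normal(size=d)                    # class token from branch A
tokens_b = rng.normal(size=(n, d))            # point tokens from branch B
print(class_token_cross_attention(cls_a, tokens_b, Wq, Wk, Wv).shape)  # (32,)
```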

Tensor Slicing and Optimization for Multicore NPUs

  • Authors: Rafael Sousa, Marcio Pereira, Yongin Kwon, Taeho Kim, Namsoon Jung, Chang Soo Kim, Michael Frank, Guido Araujo
  • Subjects: Performance (cs.PF); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03013
  • Pdf link: https://arxiv.org/pdf/2304.03013
  • Abstract
    Although code generation for Convolution Neural Network (CNN) models has been extensively studied, performing efficient data slicing and parallelization for highly-constrained Multicore Neural Processor Units (NPUs) is still a challenging problem. Given the size of convolutions' input/output tensors and the small footprint of NPU on-chip memories, minimizing memory transactions while maximizing parallelism and MAC utilization are central to any effective solution. This paper proposes a TensorFlow XLA/LLVM compiler optimization pass for Multicore NPUs, called Tensor Slicing Optimization (TSO), which: (a) maximizes convolution parallelism and memory usage across NPU cores; and (b) reduces data transfers between host and NPU on-chip memories by using DRAM memory burst time estimates to guide tensor slicing. To evaluate the proposed approach, a set of experiments was performed using the NeuroMorphic Processor (NMP), a multicore NPU containing 32 RISC-V cores extended with novel CNN instructions. Experimental results show that TSO is capable of identifying the best tensor slicing that minimizes execution time for a set of CNN models. Speed-ups of up to 21.7% result when comparing the TSO burst-based technique to a no-burst data slicing approach. To validate the generality of the TSO approach, the algorithm was also ported to the Glow Machine Learning framework. The performance of the models was measured on both Glow and TensorFlow XLA/LLVM compilers, revealing similar results.

A computation of D(9) using FPGA Supercomputing

  • Authors: Lennart Van Hirtum, Patrick De Causmaecker, Jens Goemaere, Tobias Kenter, Heinrich Riebler, Michael Lass, Christian Plessl
  • Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)
  • Arxiv link: https://arxiv.org/abs/2304.03039
  • Pdf link: https://arxiv.org/pdf/2304.03039
  • Abstract
    This preprint claims the computation of the $9^{th}$ Dedekind number. This was done by building an efficient FPGA accelerator for the core operation of the process and parallelizing it on the Noctua 2 supercluster at Paderborn University. The resulting value is 286386577668298411128469151667598498812366. This value can be verified in two steps: we have made the data file containing the 490M results available, each of which can be verified separately on a CPU, and the whole file sums to our proposed value.
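
For context, the Dedekind number D(n) counts monotone Boolean functions of n variables. The brute-force counter below is feasible only for tiny n (n <= 4 or so), but it makes the object being computed concrete.

```python
# Count monotone Boolean functions on n variables by exhaustive enumeration.
# Each candidate f is a truth table encoded as a bitmask over the 2^n inputs;
# monotonicity only needs checking on covering pairs x -> x | (1 << b).
def dedekind(n):
    m = 1 << n                          # number of inputs
    count = 0
    for f in range(1 << m):             # every possible truth table
        monotone = True
        for x in range(m):
            fx = (f >> x) & 1
            for b in range(n):          # setting a 0-bit must not lower f
                y = x | (1 << b)
                if y != x and fx > ((f >> y) & 1):
                    monotone = False
                    break
            if not monotone:
                break
        if monotone:
            count += 1
    return count

print([dedekind(n) for n in range(4)])  # [2, 3, 6, 20]
```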

Data-driven HVAC Control Using Symbolic Regression: Design and Implementation

  • Authors: Yuki Ozawa, Dafang Zhao, Daichi Watari, Ittetsu Taniguchi, Toshihiro Suzuki, Yoshiyuki Shimoda, Takao Onoye
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.03078
  • Pdf link: https://arxiv.org/pdf/2304.03078
  • Abstract
    The large amount of data collected in buildings makes energy management smarter and more energy-efficient. This study proposes a design and implementation methodology for data-driven heating, ventilation, and air conditioning (HVAC) control. Building thermodynamics is modeled using a symbolic regression model (SRM) built from the collected data. Additionally, an HVAC system model is also developed with a data-driven approach. A model predictive control (MPC) based HVAC scheduling is formulated with the developed models to minimize energy consumption and peak power demand and to maximize thermal comfort. The performance of the proposed framework is demonstrated in a workspace in an actual campus building. The HVAC system using the proposed framework reduces peak power by 16.1% compared to a widely used thermostat controller.
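
A toy version of the control loop may help. The stand-in thermal model below plays the role of the learned symbolic-regression model, and plan enumeration replaces a real MPC solver; the weights and setpoints are illustrative assumptions.

```python
# Toy MPC loop: roll a data-driven thermal model over a short horizon and
# apply the first action of the cheapest plan (energy + discomfort + peak).
import itertools

def srm_model(T, u, T_out):                 # stand-in for the learned SRM
    return T + 0.1 * (T_out - T) - 0.5 * u  # cooling power u in [0, 1]

def mpc_step(T, T_out_forecast, T_set=24.0, horizon=3, levels=(0.0, 0.5, 1.0)):
    best_cost, best_plan = float("inf"), None
    for plan in itertools.product(levels, repeat=horizon):
        cost, Tk = 0.0, T
        for u, T_out in zip(plan, T_out_forecast):
            Tk = srm_model(Tk, u, T_out)
            cost += 1.0 * u + 2.0 * (Tk - T_set) ** 2  # energy + discomfort
        cost += 5.0 * max(plan)                        # crude peak-power term
        if cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_plan[0]                                # receding horizon

T = 27.0
for step in range(5):
    u = mpc_step(T, T_out_forecast=[30.0, 30.0, 29.0])
    T = srm_model(T, u, 30.0)
    print(f"step {step}: u={u:.1f}, T={T:.2f}")
```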

Offline Uncertainty Sampling in Data-driven Stochastic MPC

  • Authors: Johannes Teutsch, Sebastian Kerz, Tim Brüdigam, Dirk Wollherr, Marion Leibold
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.03088
  • Pdf link: https://arxiv.org/pdf/2304.03088
  • Abstract
    In this work, we exploit an offline-sampling based strategy for the constrained data-driven predictive control of an unknown linear system subject to random measurement noise. The strategy uses only past measured, potentially noisy data in a non-parametric system representation and does not require any prior model identification. The approximation of chance constraints using uncertainty sampling leads to efficient constraint tightening. Under mild assumptions, robust recursive feasibility and closed-loop constraint satisfaction are shown. In a simulation example, we provide evidence for the improved control performance of the proposed control scheme in comparison to a purely robust data-driven predictive control approach.

Inductive Graph Unlearning

  • Authors: Cheng-Long Wang, Mengdi Huai, Di Wang
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.03093
  • Pdf link: https://arxiv.org/pdf/2304.03093
  • Abstract
    As a way to implement the "right to be forgotten" in machine learning, machine unlearning aims to completely remove the contributions and information of the samples to be deleted from a trained model without affecting the contributions of other samples. Recently, many frameworks for machine unlearning have been proposed, and most of them focus on image and text data. To extend machine unlearning to graph data, GraphEraser has been proposed. However, a critical issue is that GraphEraser is specifically designed for the transductive graph setting, where the graph is static and the attributes and edges of test nodes are visible during training. It is unsuitable for the inductive setting, where the graph can be dynamic and the test graph information is invisible in advance. Such inductive capability is essential for production machine learning systems with evolving graphs like social media and transaction networks. To fill this gap, we propose the GUided InDuctivE Graph Unlearning framework (GUIDE). GUIDE consists of three components: guided graph partitioning with fairness and balance, efficient subgraph repair, and similarity-based aggregation. Empirically, we evaluate our method on several inductive benchmarks and evolving transaction graphs. GUIDE can be efficiently applied to inductive graph learning tasks thanks to its low graph partitioning cost, in terms of both computation and structural information. The code will be available here: https://github.com/Happy2Git/GUIDE.

FABRID: Flexible Attestation-Based Routing for Inter-Domain Networks

  • Authors: Cyrill Krähenbühl (ETH Zürich), Marc Wyss (ETH Zürich), David Basin (ETH Zürich), Vincent Lenders (armasuisse), Adrian Perrig (ETH Zürich), Martin Strohmeier (armasuisse)
  • Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.03108
  • Pdf link: https://arxiv.org/pdf/2304.03108
  • Abstract
    In its current state, the Internet does not provide end users with transparency and control regarding on-path forwarding devices. In particular, the lack of network device information reduces the trustworthiness of the forwarding path and prevents end-user applications requiring specific router capabilities from reaching their full potential. Moreover, the inability to influence the traffic's forwarding path results in applications communicating over undesired routes, while alternative paths with more desirable properties remain unusable. In this work, we present FABRID, a system that enables applications to forward traffic flexibly, potentially on multiple paths selected to comply with user-defined preferences, where information about forwarding devices is exposed and transparently attested by autonomous systems (ASes). The granularity of this information is chosen by each AS individually, protecting them from leaking sensitive network details, while the secrecy and authenticity of preferences embedded within the users' packets are protected through efficient cryptographic operations. We show the viability of FABRID by deploying it on a global SCION network test bed, and we demonstrate high throughput on commodity hardware.

Simplifying Content-Based Neural News Recommendation: On User Modeling and Training Objectives

  • Authors: Andreea Iana, Goran Glavaš, Heiko Paulheim
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.03112
  • Pdf link: https://arxiv.org/pdf/2304.03112
  • Abstract
    The advent of personalized news recommendation has given rise to increasingly complex recommender architectures. Most neural news recommenders rely on user click behavior and typically introduce dedicated user encoders that aggregate the content of clicked news into user embeddings (early fusion). These models are predominantly trained with standard point-wise classification objectives. The existing body of work exhibits two main shortcomings: (1) despite general design homogeneity, direct comparisons between models are hindered by varying evaluation datasets and protocols; (2) it leaves alternative model designs and training objectives vastly unexplored. In this work, we present a unified framework for news recommendation, allowing for a systematic and fair comparison of news recommenders across several crucial design dimensions: (i) candidate-awareness in user modeling, (ii) click behavior fusion, and (iii) training objectives. Our findings challenge the status quo in neural news recommendation. We show that replacing sizable user encoders with parameter-efficient dot products between candidate and clicked news embeddings (late fusion) often yields substantial performance gains. Moreover, our results render contrastive training a viable alternative to point-wise classification objectives.
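
The late-fusion design the authors favor reduces to a strikingly simple scoring rule, sketched here with random placeholder embeddings: average the clicked-news embeddings and rank candidates by dot product, with no dedicated user encoder.

```python
# Late-fusion news scoring: the "user" is just the mean of clicked-news
# embeddings, and candidates are ranked by a dot product. The random corpus
# embeddings stand in for encoded news content.
import numpy as np

rng = np.random.default_rng(0)
news_emb = rng.normal(size=(1000, 64))      # content embeddings of the corpus

def late_fusion_scores(clicked_ids, candidate_ids):
    user_vec = news_emb[clicked_ids].mean(axis=0)   # parameter-free user model
    return news_emb[candidate_ids] @ user_vec       # dot-product scoring

scores = late_fusion_scores(clicked_ids=[3, 17, 256], candidate_ids=[5, 42, 900])
print(np.argsort(-scores))                  # candidate ranking for this user
```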

Zero-shot Generative Model Adaptation via Image-specific Prompt Learning

  • Authors: Jiayi Guo, Chaofei Wang, You Wu, Eric Zhang, Kai Wang, Xingqian Xu, Shiji Song, Humphrey Shi, Gao Huang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03119
  • Pdf link: https://arxiv.org/pdf/2304.03119
  • Abstract
    Recently, CLIP-guided image synthesis has shown appealing performance on adapting a pre-trained source-domain generator to an unseen target domain. It does not require any target-domain samples but only the textual domain labels. The training is highly efficient, e.g., a few minutes. However, existing methods still have some limitations in the quality of generated images and may suffer from the mode collapse issue. A key reason is that a fixed adaptation direction is applied for all cross-domain image pairs, which leads to identical supervision signals. To address this issue, we propose an Image-specific Prompt Learning (IPL) method, which learns specific prompt vectors for each source-domain image. This produces a more precise adaptation direction for every cross-domain image pair, endowing the target-domain generator with greatly enhanced flexibility. Qualitative and quantitative evaluations on various domains demonstrate that IPL effectively improves the quality and diversity of synthesized images and alleviates the mode collapse. Moreover, IPL is independent of the structure of the generative model, such as generative adversarial networks or diffusion models. Code is available at https://github.com/Picsart-AI-Research/IPL-Zero-Shot-Generative-Model-Adaptation.

BotTriNet: A Unified and Efficient Embedding for Social Bots Detection via Metric Learning

  • Authors: Jun Wu, Xuesong Ye, Man Yan Yuet
  • Subjects: Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.03144
  • Pdf link: https://arxiv.org/pdf/2304.03144
  • Abstract
    A persistently popular topic in online social networks is the rapid and accurate discovery of bot accounts to prevent their invasion and harassment of genuine users. We propose a unified embedding framework called BOTTRINET, which utilizes textual content posted by accounts for bot detection, based on the assumption that contexts naturally reveal account personalities and habits. Content is abundant and valuable if the system efficiently extracts bot-related information using embedding techniques. Beyond the general embedding framework that generates word, sentence, and account embeddings, we design a triplet network to tune the raw embeddings (produced by traditional natural language processing techniques) for better classification performance. We evaluate detection accuracy and F1-score on the real-world dataset CRESCI2017, comprising three bot account categories and five bot sample sets. Our system achieves the highest average accuracy of 98.34% and F1-score of 97.99% on two content-intensive bot sets, outperforming previous work and setting a new state of the art. It also makes a breakthrough on four content-less bot sets, with an average accuracy improvement of 11.52% and an average F1-score increase of 16.70%.
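
The triplet tuning step is standard metric learning; here is a minimal version of the loss on toy account embeddings (the margin and data are illustrative, not the paper's settings).

```python
# Triplet loss: pull an account embedding toward a same-class example and
# push it away from a different-class example by at least a margin.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
anchor = rng.normal(size=16)                     # raw embedding of a bot account
positive = anchor + 0.1 * rng.normal(size=16)    # another bot account
negative = rng.normal(size=16)                   # a genuine account
print(triplet_loss(anchor, positive, negative))  # near zero once classes separate
```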

Parameterized Approximation Schemes for Clustering with General Norm Objectives

  • Authors: Fateme Abbasi, Sandip Banerjee, Jarosław Byrka, Parinya Chalermsook, Ameet Gadekar, Kamyar Khodamoradi, Dániel Marx, Roohani Sharma, Joachim Spoerhase
  • Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03146
  • Pdf link: https://arxiv.org/pdf/2304.03146
  • Abstract
    This paper considers the well-studied algorithmic regime of designing a $(1+\epsilon)$-approximation algorithm for a $k$-clustering problem that runs in time $f(k,\epsilon)\cdot\mathrm{poly}(n)$ (sometimes called an efficient parameterized approximation scheme or EPAS for short). Notable results of this kind include EPASes in the high-dimensional Euclidean setting for $k$-center [Bădoiu, Har-Peled, Indyk; STOC'02] as well as $k$-median and $k$-means [Kumar, Sabharwal, Sen; J. ACM 2010]. However, existing EPASes handle only basic objectives (such as $k$-center, $k$-median, and $k$-means) and are tailored to the specific objective and metric space. Our main contribution is a clean and simple EPAS that settles more than ten clustering problems (across multiple well-studied objectives as well as metric spaces) and unifies well-known EPASes. Our algorithm gives EPASes for a large variety of clustering objectives (for example, $k$-means, $k$-center, $k$-median, priority $k$-center, $\ell$-centrum, ordered $k$-median, socially fair $k$-median aka robust $k$-median, or more generally monotone norm $k$-clustering) and metric spaces (for example, continuous high-dimensional Euclidean spaces, metrics of bounded doubling dimension, bounded treewidth metrics, and planar metrics). Key to our approach is a new concept that we call bounded $\epsilon$-scatter dimension, an intrinsic complexity measure of a metric space that is a relaxation of the standard notion of bounded doubling dimension. Our main technical result shows that two conditions are essentially sufficient for our algorithm to yield an EPAS on the input metric $M$ for any clustering objective: (i) the objective is described by a monotone (not necessarily symmetric!) norm, and (ii) the $\epsilon$-scatter dimension of $M$ is upper bounded by a function of $\epsilon$.

Spectral Toolkit of Algorithms for Graphs: Technical Report (1)

  • Authors: Peter Macgregor, He Sun
  • Subjects: Social and Information Networks (cs.SI); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Mathematical Software (cs.MS)
  • Arxiv link: https://arxiv.org/abs/2304.03170
  • Pdf link: https://arxiv.org/pdf/2304.03170
  • Abstract
    Spectral Toolkit of Algorithms for Graphs (STAG) is an open-source library for efficient spectral graph algorithms; its development started in September 2022. We have so far finished the component on local graph clustering, and this technical report presents a user's guide to STAG, showcase studies, and several technical considerations behind our development.

Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream

  • Authors: Yuheng Jiang, Kaixin Yao, Zhuo Su, Zhehao Shen, Haimin Luo, Lan Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03184
  • Pdf link: https://arxiv.org/pdf/2304.03184
  • Abstract
    Convenient 4D modeling of human-object interactions is essential for numerous applications. However, monocular tracking and rendering of complex interaction scenarios remain challenging. In this paper, we propose Instant-NVR, a neural approach for instant volumetric human-object tracking and rendering using a single RGBD camera. It bridges traditional non-rigid tracking with recent instant radiance field techniques via a multi-thread tracking-rendering mechanism. In the tracking front-end, we adopt a robust human-object capture scheme to provide sufficient motion priors. We further introduce a separated instant neural representation with a novel hybrid deformation module for the interacting scene. We also provide an on-the-fly reconstruction scheme of the dynamic/static radiance fields via efficient motion-prior searching. Moreover, we introduce an online key frame selection scheme and a rendering-aware refinement strategy to significantly improve the appearance details for online novel-view synthesis. Extensive experiments demonstrate the effectiveness and efficiency of our approach for the instant generation of human-object radiance fields on the fly, notably achieving real-time photo-realistic novel view synthesis under complex human-object interactions.

Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster

  • Authors: Nolan Dey, Gurpreet Gosal, Zhiming (Charles) Chen, Hemant Khachane, William Marshall, Ribhu Pathria, Marvin Tom, Joel Hestness
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.03208
  • Pdf link: https://arxiv.org/pdf/2304.03208
  • Abstract
    We study recent research advances that improve large language models through efficient pre-training and scaling, and open datasets and tools. We combine these advances to introduce Cerebras-GPT, a family of open compute-optimal language models scaled from 111M to 13B parameters. We train Cerebras-GPT models on the Eleuther Pile dataset following the DeepMind Chinchilla scaling rules for efficient pre-training (highest accuracy for a given compute budget). We characterize the predictable power-law scaling and compare Cerebras-GPT with other publicly-available models to show that all Cerebras-GPT models have state-of-the-art training efficiency on both pre-training and downstream objectives. We describe our learnings, including how Maximal Update Parameterization ($\mu$P) can further improve large model scaling, improving accuracy and hyperparameter predictability at scale. We release our pre-trained models and code, making this paper the first open and reproducible work comparing compute-optimal model scaling to models trained on fixed dataset sizes. Cerebras-GPT models are available on HuggingFace: https://huggingface.co/cerebras.
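
For a rough sense of what compute-optimal means here: the commonly cited Chinchilla rule of thumb is about 20 training tokens per parameter, with training compute approximated as 6ND FLOPs. A back-of-envelope sizing sketch under those assumptions (the 20x ratio is DeepMind's rule of thumb, not a Cerebras-specific number):

```python
# Chinchilla-style compute-optimal sizing: ~20 tokens per parameter and
# training FLOPs ~ 6 * params * tokens (standard approximations).
def chinchilla_plan(params):
    tokens = 20 * params          # compute-optimal dataset size (rule of thumb)
    flops = 6 * params * tokens   # standard training-FLOPs estimate
    return tokens, flops

for n in (111e6, 1.3e9, 13e9):    # a few Cerebras-GPT sizes
    t, c = chinchilla_plan(n)
    print(f"{n:.3g} params -> {t:.3g} tokens, {c:.3g} FLOPs")
```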

Hierarchical Graph Neural Network with Cross-Attention for Cross-Device User Matching

  • Authors: Ali Taghibakhshi, Mingyuan Ma, Ashwath Aithal, Onur Yilmaz, Haggai Maron, Matthew West
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.03215
  • Pdf link: https://arxiv.org/pdf/2304.03215
  • Abstract
    Cross-device user matching is a critical problem in numerous domains, including advertising, recommender systems, and cybersecurity. It involves identifying and linking different devices belonging to the same person, utilizing sequence logs. Previous data mining techniques have struggled to address the long-range dependencies and higher-order connections between the logs. Recently, researchers have modeled this problem as a graph problem and proposed a two-tier graph contextual embedding (TGCE) neural network architecture, which outperforms previous methods. In this paper, we propose a novel hierarchical graph neural network architecture (HGNN), which has a more computationally efficient second-level design than TGCE. Furthermore, we introduce a cross-attention (Cross-Att) mechanism in our model, which improves performance by 5% compared to the state-of-the-art TGCE method.

FedBot: Enhancing Privacy in Chatbots with Federated Learning

  • Authors: Addi Ait-Mlouk, Sadi Alawadi, Salman Toor, Andreas Hellander
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03228
  • Pdf link: https://arxiv.org/pdf/2304.03228
  • Abstract
    Chatbots are mainly data-driven and usually based on utterances that might be sensitive; however, training deep learning models on shared data can violate user privacy. Such issues have existed in chatbots since their inception. In the literature, there have been many approaches to dealing with privacy, such as differential privacy and secure multi-party computation, but most of them need access to users' data. In this context, Federated Learning (FL) aims to protect data privacy through distributed learning methods that keep the data in its original location. This paper presents FedBot, a proof-of-concept (POC) privacy-preserving chatbot that leverages large-scale customer support data. The POC combines Deep Bidirectional Transformer models and federated learning algorithms to protect customer data privacy during collaborative model training. The results of the proof-of-concept showcase the potential for privacy-preserving chatbots to transform the customer support industry by delivering personalized and efficient customer service that meets data privacy regulations and legal requirements. Furthermore, the system is specifically designed to improve its performance and accuracy over time by leveraging its ability to learn from previous interactions.
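
The federated core of such a system is a FedAvg-style round: clients update a shared model on private data and only the parameters travel. A minimal sketch with linear models standing in for the paper's transformer:

```python
# One FedAvg training loop: local SGD on each client's private data, then
# server-side parameter averaging; raw data never leaves the clients.
import numpy as np

def local_update(w, X, y, lr=0.1, steps=10):
    for _ in range(steps):                     # local SGD on private data
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(4):                             # four private datasets
    X = rng.normal(size=(20, 3))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=20)))

w_global = np.zeros(3)
for rnd in range(5):                           # federated rounds
    updates = [local_update(w_global.copy(), X, y) for X, y in clients]
    w_global = np.mean(updates, axis=0)        # server-side aggregation
print(w_global)                                # approaches [1, -2, 0.5]
```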

DiffMimic: Efficient Motion Mimicking with Differentiable Physics

  • Authors: Jiawei Ren, Cunjun Yu, Siwei Chen, Xiao Ma, Liang Pan, Ziwei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03274
  • Pdf link: https://arxiv.org/pdf/2304.03274
  • Abstract
    Motion mimicking is a foundational task in physics-based character animation. However, most existing motion mimicking methods are built upon reinforcement learning (RL) and suffer from heavy reward engineering, high variance, and slow convergence with hard exploration. Specifically, they usually take tens of hours or even days of training to mimic a simple motion sequence, resulting in poor scalability. In this work, we leverage differentiable physics simulators (DPS) and propose an efficient motion mimicking method dubbed DiffMimic. Our key insight is that DPS casts a complex policy learning task as a much simpler state matching problem. In particular, DPS learns a stable policy by analytical gradients with ground-truth physical priors, leading to significantly faster and more stable convergence than RL-based methods. Moreover, to escape from local optima, we utilize a Demonstration Replay mechanism to enable stable gradient backpropagation over a long horizon. Extensive experiments on standard benchmarks show that DiffMimic has better sample efficiency and time efficiency than existing methods (e.g., DeepMimic). Notably, DiffMimic allows a physically simulated character to learn Backflip after 10 minutes of training and to cycle it after 3 hours of training, while the existing approach may require about a day of training to cycle Backflip. More importantly, we hope DiffMimic can benefit more differentiable animation systems with techniques like differentiable cloth simulation in future research.
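
The state-matching idea can be shown on a toy differentiable simulator: exact gradients of a trajectory-matching loss are backpropagated through the dynamics, with no reward design. The point-mass dynamics and hyperparameters below are illustrative assumptions, not the paper's setup.

```python
# State matching through differentiable dynamics: a point mass
# x_{k+1} = x_k + u_k * dt tracks a reference, and dL/du is computed
# analytically by a reverse (adjoint) pass through time.
import numpy as np

dt, T = 0.1, 50
ref = np.sin(np.linspace(0, 2 * np.pi, T))       # reference motion to mimic

def rollout(u):
    x = np.zeros(T)
    for k in range(T - 1):
        x[k + 1] = x[k] + u[k] * dt              # differentiable dynamics
    return x

def loss_and_grad(u):
    x = rollout(u)
    err = x - ref
    g_x = 2 * err                                # dL/dx_k (direct term)
    g_u = np.zeros(T - 1)
    adj = 0.0
    for k in range(T - 2, -1, -1):               # reverse pass through time
        adj += g_x[k + 1]                        # accumulated dL/dx_{k+1}
        g_u[k] = adj * dt                        # dx_{k+1}/du_k = dt
    return np.sum(err ** 2), g_u

u = np.zeros(T - 1)
for it in range(200):                            # plain gradient descent
    loss, g = loss_and_grad(u)
    u -= 0.05 * g
print(f"final matching loss: {loss:.4f}")
```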

Keyword: faster

DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model

  • Authors: Hoigi Seo, Hayeon Kim, Gwanghyun Kim, Se Young Chun
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.02827
  • Pdf link: https://arxiv.org/pdf/2304.02827
  • Abstract
    The increasing demand for high-quality 3D content creation has motivated the development of automated methods for creating 3D object models from a single image and/or a text prompt. However, 3D objects reconstructed using state-of-the-art image-to-3D methods still exhibit low correspondence to the given image and low multi-view consistency. Recent state-of-the-art text-to-3D methods are also limited, yielding 3D samples with low diversity per prompt and long synthesis times. To address these challenges, we propose DITTO-NeRF, a novel pipeline to generate a high-quality 3D NeRF model from a text prompt or a single image. DITTO-NeRF consists of constructing a high-quality partial 3D object for limited in-boundary (IB) angles using the given or text-generated 2D image from the frontal view, and then iteratively reconstructing the remaining 3D NeRF using an inpainting latent diffusion model. We propose progressive 3D object reconstruction schemes in terms of scales (low to high resolution), angles (IB angles first, outer-boundary (OB) angles later), and masks (object to background boundary) so that high-quality information on the IB region can be propagated to the OB region. DITTO-NeRF outperforms state-of-the-art image/text-to-3D methods such as DreamFusion and NeuralLift-360 in terms of fidelity and diversity, both qualitatively and quantitatively, with much faster training times.

Convolutional neural networks for crack detection on flexible road pavements

  • Authors: Hermann Tapamo, Anna Bosman, James Maina, Emile Horak
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02933
  • Pdf link: https://arxiv.org/pdf/2304.02933
  • Abstract
    Flexible road pavements deteriorate primarily due to traffic and adverse environmental conditions. Cracking is the most common deterioration mechanism; the surveying thereof is typically conducted manually using internationally defined classification standards. In South Africa, the use of high-definition video images has been introduced, which allows for safer road surveying. However, surveying is still a tedious manual process. Automation of the detection of defects such as cracks would allow for faster analysis of road networks and potentially reduce human bias and error. This study performs a comparison of six state-of-the-art convolutional neural network models for the purpose of crack detection. The models are pretrained on the ImageNet dataset, and fine-tuned using a new real-world binary crack dataset consisting of 14000 samples. The effects of dataset augmentation are also investigated. Of the six models trained, five achieved accuracy above 97%. The highest recorded accuracy was 98%, achieved by the ResNet and VGG16 models. The dataset is available at the following URL: https://zenodo.org/record/7795975

Boundary-Denoising for Video Activity Localization

  • Authors: Mengmeng Xu, Mattia Soldan, Jialin Gao, Shuming Liu, Juan-Manuel Pérez-Rúa, Bernard Ghanem
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02934
  • Pdf link: https://arxiv.org/pdf/2304.02934
  • Abstract
    Video activity localization aims at understanding the semantic content in long untrimmed videos and retrieving actions of interest. The retrieved action, with its start and end locations, can be used for highlight generation, temporal action detection, etc. Unfortunately, learning the exact boundary location of activities is highly challenging because temporal activities are continuous in time, and there are often no clear-cut transitions between actions. Moreover, the definition of the start and end of events is subjective, which may confuse the model. To alleviate the boundary ambiguity, we propose to study the video activity localization problem from a denoising perspective. Specifically, we propose an encoder-decoder model named DenoiseLoc. During training, a set of action spans is randomly generated from the ground truth with a controlled noise scale. We then attempt to reverse this process by boundary denoising, allowing the localizer to predict activities with precise boundaries and resulting in faster convergence. Experiments show that DenoiseLoc advances several video activity understanding tasks. For example, we observe a gain of +12.36% average mAP on the QV-Highlights dataset and +1.64% mAP@0.5 on the THUMOS'14 dataset over the baseline. Moreover, DenoiseLoc achieves state-of-the-art performance on the TACoS and MAD datasets, with much fewer predictions compared to other current methods.

Training a Two Layer ReLU Network Analytically

  • Authors: Adrian Barbu
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.02972
  • Pdf link: https://arxiv.org/pdf/2304.02972
  • Abstract
    Neural networks are usually trained with different variants of gradient descent based optimization algorithms, such as stochastic gradient descent or the Adam optimizer. Recent theoretical work states that the critical points (where the gradient of the loss is zero) of two-layer ReLU networks with the square loss are not all local minima. In this work we explore an algorithm for training two-layer neural networks with ReLU-like activation and the square loss that alternately finds the critical points of the loss function analytically for one layer while keeping the other layer and the neuron activation pattern fixed. Experiments indicate that this simple algorithm can find deeper optima than Stochastic Gradient Descent or the Adam optimizer, obtaining significantly smaller training loss values on four out of the five real datasets evaluated. Moreover, the method is faster than the gradient descent methods and has virtually no tuning parameters.

Patch-wise Features for Blur Image Classification

  • Authors: Sri Charan Kattamuru, Kshitij Agrawal, Shyam Prasad Adhikari, Abhishek Bose, Hemant Misra
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03156
  • Pdf link: https://arxiv.org/pdf/2304.03156
  • Abstract
    Images captured through smartphone cameras often suffer from degradation, blur being one of the major ones, posing a challenge in processing these images for downstream tasks. In this paper we propose low-compute, lightweight patch-wise features for image quality assessment that allow us to discriminate between blurred and sharp images. To this end, we train a decision-tree based XGBoost model on various intuitive image features such as gray-level variance, first- and second-order gradients, and texture features like local binary patterns. Experiments conducted on an open dataset show that the proposed low-compute method achieves 90.1% mean accuracy on the validation set, which is comparable to the 94% mean accuracy of a compute-intensive VGG16 network fine-tuned to this task. To demonstrate the generalizability of the proposed features and model, we test the model on the BHBID dataset and an internal dataset, where we attain accuracies of 98% and 91%, respectively. The proposed method is 10x faster than the VGG16-based model on CPU and scales linearly with the input image size, making it suitable for deployment on low-compute edge devices.
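
A minimal reconstruction of the feature-plus-tree recipe, with synthetic patches and sklearn's gradient boosting standing in for the real dataset and XGBoost:

```python
# Patch-wise blur features (variance, gradient magnitudes, Laplacian variance)
# fed to a tree ensemble; sharp vs blurred patches are synthesized here.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def blur(img, iters=8):
    out = img.copy()
    for _ in range(iters):                    # crude diffusion blur (toy)
        out = (out + np.roll(out, 1, 0) + np.roll(out, -1, 0)
                   + np.roll(out, 1, 1) + np.roll(out, -1, 1)) / 5.0
    return out

def patch_features(p):
    gy, gx = np.gradient(p)
    lap = (np.roll(p, 1, 0) + np.roll(p, -1, 0)
           + np.roll(p, 1, 1) + np.roll(p, -1, 1) - 4 * p)
    return [p.var(), np.abs(gx).mean(), np.abs(gy).mean(), lap.var()]

rng = np.random.default_rng(0)
X, y = [], []
for _ in range(200):
    sharp = rng.random((32, 32))
    X.append(patch_features(sharp)); y.append(0)          # sharp patch
    X.append(patch_features(blur(sharp))); y.append(1)    # blurred patch

clf = GradientBoostingClassifier().fit(X[:300], y[:300])
print("held-out accuracy:", clf.score(X[300:], y[300:]))
```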

DiffMimic: Efficient Motion Mimicking with Differentiable Physics

  • Authors: Jiawei Ren, Cunjun Yu, Siwei Chen, Xiao Ma, Liang Pan, Ziwei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03274
  • Pdf link: https://arxiv.org/pdf/2304.03274
  • Abstract
    Motion mimicking is a foundational task in physics-based character animation. However, most existing motion mimicking methods are built upon reinforcement learning (RL) and suffer from heavy reward engineering, high variance, and slow convergence with hard exploration. Specifically, they usually take tens of hours or even days of training to mimic a simple motion sequence, resulting in poor scalability. In this work, we leverage differentiable physics simulators (DPS) and propose an efficient motion mimicking method dubbed DiffMimic. Our key insight is that DPS casts a complex policy learning task as a much simpler state matching problem. In particular, DPS learns a stable policy by analytical gradients with ground-truth physical priors, leading to significantly faster and more stable convergence than RL-based methods. Moreover, to escape from local optima, we utilize a Demonstration Replay mechanism to enable stable gradient backpropagation over a long horizon. Extensive experiments on standard benchmarks show that DiffMimic has better sample efficiency and time efficiency than existing methods (e.g., DeepMimic). Notably, DiffMimic allows a physically simulated character to learn Backflip after 10 minutes of training and to cycle it after 3 hours of training, while the existing approach may require about a day of training to cycle Backflip. More importantly, we hope DiffMimic can benefit more differentiable animation systems with techniques like differentiable cloth simulation in future research.

Keyword: mobile

Adopting Two Supervisors for Efficient Use of Large-Scale Remote Deep Neural Networks

  • Authors: Michael Weiss, Paolo Tonella
  • Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.02654
  • Pdf link: https://arxiv.org/pdf/2304.02654
  • Abstract
    Recent decades have seen the rise of large-scale Deep Neural Networks (DNNs) that achieve human-competitive performance in a variety of artificial intelligence tasks. Often consisting of hundreds of millions, if not hundreds of billions, of parameters, these DNNs are too large to be deployed to, or efficiently run on, resource-constrained devices such as mobile phones or IoT microcontrollers. Systems relying on large-scale DNNs thus have to call the corresponding model over the network, leading to substantial costs for hosting and running the large-scale remote model, costs which are often charged on a per-use basis. In this paper, we propose BiSupervised, a novel architecture where, before relying on a large remote DNN, a system attempts to make a prediction on a small-scale local model. A DNN supervisor monitors this prediction process and identifies easy inputs for which the local prediction can be trusted. For these inputs, the remote model does not have to be invoked, saving costs while only marginally impacting the overall system accuracy. Our architecture furthermore foresees a second supervisor that monitors the remote predictions and identifies inputs for which not even these can be trusted, allowing the system to raise an exception or run a fallback strategy instead. We evaluate the cost savings and the ability to detect incorrectly predicted inputs on four diverse case studies: IMDB movie review sentiment classification, GitHub issue triaging, ImageNet image classification, and SQuADv2 free-text question answering.
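
The two-supervisor cascade reduces to a simple decision rule, sketched here with placeholder models and supervisor confidence scores (the thresholds and models are illustrative assumptions, not the paper's components):

```python
# Two-supervisor cascade: trust the cheap local model when its supervisor is
# confident; otherwise pay for the remote model, whose own supervisor may
# still reject the input (exception / fallback path).
import numpy as np

def local_model(x):   return x + np.random.normal(0, 0.3)    # cheap, noisy
def remote_model(x):  return x + np.random.normal(0, 0.05)   # costly, accurate

def predict(x, local_conf, remote_conf, t_local=0.8, t_remote=0.5):
    """local_conf / remote_conf are the supervisors' scores for this input."""
    if local_conf >= t_local:
        return local_model(x), "local"       # free path, no network call
    if remote_conf >= t_remote:
        return remote_model(x), "remote"     # paid path
    return None, "rejected"                  # raise or run a fallback instead

for x, cl, cr in [(1.0, 0.9, 0.9), (1.0, 0.3, 0.9), (1.0, 0.2, 0.2)]:
    print(predict(x, cl, cr))
```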

Adaptive Headway Motion Control and Motion Prediction for Safe Unicycle Motion Design

  • Authors: Aykut İşleyen, Nathan van de Wouw, Ömür Arslan
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02760
  • Pdf link: https://arxiv.org/pdf/2304.02760
  • Abstract
    Differential drive robots that can be modeled as a kinematic unicycle are a standard mobile base platform for many service and logistics robots. Safe and smooth autonomous motion around obstacles is a crucial skill for unicycle robots performing diverse tasks in complex environments. A classical approach to unicycle control is feedback linearization using a headway point at a fixed headway distance in front of the unicycle. Headway control brings the headway point to a desired goal location by embedding linear headway reference dynamics, which often results in an undesired offset for the actual unicycle position. In this paper, we introduce a new unicycle headway control approach with an adaptive headway distance that overcomes this limitation, i.e., when the headway point reaches the goal, the unicycle position is also at the goal. By systematically analyzing the closed-loop unicycle motion under the adaptive headway controller, we design analytical feedback motion prediction methods that bound the closed-loop unicycle position trajectory and so can be effectively used for safety assessment and safe unicycle motion design around obstacles. We present an application of adaptive headway motion control and motion prediction for safe unicycle path following around obstacles in numerical simulations.
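
For reference, the classical fixed-headway feedback linearization that the paper improves on looks like the sketch below; the adaptive-distance law is the paper's contribution and is not reproduced here. With a constant headway distance d, the unicycle settles with the known d-offset from the goal.

```python
# Classical fixed-headway control: the headway point p = (x + d cos(th),
# y + d sin(th)) is given linear dynamics p_dot = -k (p - goal), and (v, w)
# are recovered by inverting the (nonsingular for d > 0) input map.
import numpy as np

def headway_control(state, goal, d=0.3, k=1.0):
    x, y, th = state
    p = np.array([x + d * np.cos(th), y + d * np.sin(th)])  # headway point
    u = -k * (p - goal)                                     # desired p_dot
    v = np.cos(th) * u[0] + np.sin(th) * u[1]
    w = (-np.sin(th) * u[0] + np.cos(th) * u[1]) / d
    return v, w

state, goal, dt = np.array([0.0, 0.0, 0.0]), np.array([2.0, 1.0]), 0.05
for _ in range(200):
    v, w = headway_control(state, goal)
    x, y, th = state
    state = np.array([x + dt * v * np.cos(th),
                      y + dt * v * np.sin(th),
                      th + dt * w])
print(state[:2], "vs goal", goal)   # offset by roughly d along the heading
```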

Evaluating Customization of Remote Tele-operation Interfaces for Assistive Robots

  • Authors: Vinitha Ranganeni, Noah Ponto, Maya Cakmak
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02771
  • Pdf link: https://arxiv.org/pdf/2304.02771
  • Abstract
    Mobile manipulator platforms, like the Stretch RE1 robot, make the promise of in-home robotic assistance feasible. For people with severe physical limitations, like those with quadriplegia, the ability to tele-operate these robots themselves means that they can perform physical tasks they could not otherwise do, thereby increasing their level of independence. In order for users with physical limitations to operate these robots, the interfaces must be accessible and cater to the specific needs of all users. As physical limitations vary amongst users, it is difficult to make a single interface that will accommodate all users. Instead, such interfaces should be customizable to each individual user. In this paper we explore the value of customization of a browser-based interface for tele-operating the Stretch RE1 robot. More specifically, we evaluate the usability and effectiveness of a customized interface in comparison to the default interface configurations from prior work. We present a user study involving participants with motor impairments (N=10) and participants without motor impairments (N=13), who could serve as caregivers, using the robot to perform mobile manipulation tasks in a real kitchen environment. Our study demonstrates that no single interface configuration satisfies all users' needs and preferences. Users perform better when using the customized interface for navigation, but not for manipulation, due to the higher complexity of learning to manipulate through the robot. All participants were able to use the robot to complete all tasks, and participants with motor impairments believe that having the robot in their home would make them more independent.

Gotta Assess 'Em All: A Risk Analysis of Criminal Offenses Facilitated through PokemonGO

  • Authors: Ashly Fuller, Martin Lo, Angelica Holmes, Lu Lemanski, Marie Vasek, Enrico Mariconti
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.02952
  • Pdf link: https://arxiv.org/pdf/2304.02952
  • Abstract
    Location-based games have come to the forefront of popularity in casual and mobile gaming over the past six years. However, there is no hard data on crimes that these games enable, ranging from assault to cyberstalking to grooming. Given these potential harms, we conduct a risk assessment and quasi-experiment on the features of location-based games. Using PokemonGO as a case study, we identify cyber-enabled stalking as the main risk event, where in-game features such as an innocent function to share in-game postcards can be exploited by malicious users. Users obtain postcards that are unique to each Pokestop and represent gifts that can be shared with in-game friends. The number of postcards that each user can retain is limited, so they send the excess to their friends with items that boost their friends' game activities. The postcards, however, often unintentionally leak the users' commonly visited locations to their in-game friends. We evaluate the feasibility of this crime through a quasi-experiment, and our results show that participants' routine locations, such as home and work, can be reliably re-identified within days of the first gift exchange. This exploitation of a previously unconsidered in-game feature enables physical stalking of previously unknown persons, which can escalate into more serious crimes. Given current data protection legislation in Europe, further preventive measures are required by Niantic to protect pseudonymized users from being re-identified by in-game features and (potentially) stalked.

SwarmGear: Heterogeneous Swarm of Drones with Reconfigurable Leader Drone and Virtual Impedance Links for Multi-Robot Inspection

  • Authors: Zhanibek Darush, Mikhail Martynov, Aleksey Fedoseev, Aleksei Shcherbak, Dzmitry Tsetserukou
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02956
  • Pdf link: https://arxiv.org/pdf/2304.02956
  • Abstract
    Continuous monitoring by drone swarms remains a challenging problem due to the lack of power supply and the inability of drones to land on uneven surfaces. Heterogeneous swarms, including ground and aerial vehicles, can support longer inspections and carry a higher number of sensors on board. However, their capabilities are limited by the mobility of wheeled and legged robots in a cluttered environment. In this paper, we propose a novel concept for autonomous inspection that we call SwarmGear. SwarmGear utilizes a heterogeneous swarm that investigates the environment in a leader-follower formation. The leader drone is able to land on rough terrain and traverse it on four compliant robotic legs, possessing the functionalities of both an aerial and a mobile robot. To preserve the formation of the swarm during its motion, virtual impedance links were developed between the leader and the follower drones. We experimentally evaluated the accuracy of the hybrid leader drone's ground locomotion. By changing the step parameters, the optimal step configuration was found, and two types of gaits were evaluated. The experiments revealed low cross-track error (mean of 2 cm and max of 4.8 cm) and the ability of the leader drone to move with a 190 mm step length and a 3-degree standard yaw deviation. Four types of drone formations were considered; the best formation was used for the SwarmGear experiments and showed low overall cross-track error for the swarm (mean of 7.9 cm for the type 1 gait and 5.1 cm for the type 2 gait). The proposed system can potentially improve the performance of autonomous swarms in cluttered and unstructured environments by allowing all agents of the swarm to switch between aerial and ground formations to overcome various obstacles and perform missions over a large area.

Spritz-PS: Validation of Synthetic Face Images Using a Large Dataset of Printed Documents

  • Authors: Ehsan Nowroozi, Yoosef Habibi, Mauro Conti
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computers and Society (cs.CY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02982
  • Pdf link: https://arxiv.org/pdf/2304.02982
  • Abstract
    The capability of doing effective forensic analysis on printed and scanned (PS) images is essential in many applications. PS documents may be used to conceal the artifacts of synthetic images, since these artifacts are typically present in manipulated images and the main artifacts of synthetic images can be removed after printing and scanning. Due to the appeal of Generative Adversarial Networks (GANs), synthetic face images generated with GAN models are difficult to differentiate from genuine human faces and may be used to create counterfeit identities. Additionally, since GAN models do not account for the physiological constraints of human faces and their impact on human irises, distinguishing genuine from synthetic irises in the PS scenario becomes extremely difficult. As a result of the lack of large-scale reference iris datasets in the PS scenario, we aim at developing a novel dataset to become a standard for Multimedia Forensics (MFs) investigation, which is available at [45]. In this paper, we provide a novel dataset made up of a large number of synthetic and natural printed irises taken from VIPPrint printed and scanned face images. We extracted irises from the face images; due to eyelid occlusion, some of the captured irises are incomplete. To fill the missing pixels of the extracted irises, we applied techniques to discover the complex links between the iris images. To highlight the problems involved with the evaluation of the dataset's iris images, we conducted a large number of analyses employing Siamese Neural Networks, such as ResNet50, Xception, VGG16, and MobileNet-v2, to assess the similarities between genuine and synthetic human irises. For instance, using the Xception network, we achieved 56.76% similarity of irises for synthetic images and 92.77% similarity of irises for real images.

Keyword: pruning

To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency

  • Authors: Daniel Campos, ChengXiang Zhai
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.02721
  • Pdf link: https://arxiv.org/pdf/2304.02721
  • Abstract
    Sequence-to-sequence language models can be used to produce abstractive summaries which are coherent, relevant, and concise. Still, model sizes can make deployment in latency-sensitive or web-scale implementations difficult. This paper studies the relationship between model size, structured pruning, inference efficiency, and summarization accuracy on widely used summarization datasets. We show that model accuracy is tied to the encoder size while inference efficiency is connected to the decoder. Using asymmetric pruning can lead to nearly 3x improvement in inference latency with ~1 point loss in Rouge-2. Moreover, we find both the average degradation and the role of asymmetry to be consistent across model sizes and variations in datasets.

NTK-SAP: Improving neural network pruning by aligning training dynamics

  • Authors: Yite Wang, Dawei Li, Ruoyu Sun
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.02840
  • Pdf link: https://arxiv.org/pdf/2304.02840
  • Abstract
    Pruning neural networks before training has received increasing interest due to its potential to reduce training time and memory. One popular method is to prune the connections based on a certain metric, but it is not entirely clear what metric is the best choice. Recent advances in neural tangent kernel (NTK) theory suggest that the training dynamics of large enough neural networks is closely related to the spectrum of the NTK. Motivated by this finding, we propose to prune the connections that have the least influence on the spectrum of the NTK. This method can help maintain the NTK spectrum, which may help align the training dynamics to that of its dense counterpart. However, one possible issue is that the fixed-weight-NTK corresponding to a given initial point can be very different from the NTK corresponding to later iterates during the training phase. We further propose to sample multiple realizations of random weights to estimate the NTK spectrum. Note that our approach is weight-agnostic, which is different from most existing methods that are weight-dependent. In addition, we use random inputs to compute the fixed-weight-NTK, making our method data-agnostic as well. We name our foresight pruning algorithm Neural Tangent Kernel Spectrum-Aware Pruning (NTK-SAP). Empirically, our method achieves better performance than all baselines on multiple datasets.

Learning to Learn with Indispensable Connections

  • Authors: Sambhavi Tiwari, Manas Gogoi, Shekhar Verma, Krishna Pratap Singh
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02862
  • Pdf link: https://arxiv.org/pdf/2304.02862
  • Abstract
    Meta-learning aims to solve unseen tasks with few labelled instances. Nevertheless, despite its effectiveness for quick learning in existing optimization-based methods, it has several flaws. Inconsequential connections are frequently seen during meta-training, which results in an over-parameterized neural network. Because of this, meta-testing incurs unnecessary computations and extra memory overhead. To overcome these flaws, we propose a novel meta-learning method called Meta-LTH that retains only indispensable (necessary) connections. We apply the lottery ticket hypothesis technique known as magnitude pruning to generate these crucial connections that can effectively solve the few-shot learning problem. We aim to do two things: (a) find a sub-network capable of more adaptive meta-learning and (b) learn new low-level features of unseen tasks and recombine those features with the already learned features during the meta-test phase. Experimental results show that our proposed Meta-LTH method outperforms the existing first-order MAML algorithm on three different classification datasets, improving classification accuracy by approximately 2% (20-way 1-shot task setting) on the Omniglot dataset.
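
The pruning step Meta-LTH borrows is plain magnitude pruning from the lottery ticket literature; here is a one-shot global version (the sparsity level and toy layers are illustrative):

```python
# One-shot global magnitude pruning: keep the largest-magnitude fraction of
# weights across all layers and zero out the rest with a fixed binary mask.
import numpy as np

def magnitude_prune(weights, sparsity=0.8):
    """Return (pruned_weights, masks) keeping the top (1 - sparsity) weights."""
    flat = np.abs(np.concatenate([w.ravel() for w in weights]))
    threshold = np.quantile(flat, sparsity)        # global magnitude cutoff
    masks = [np.abs(w) >= threshold for w in weights]
    return [w * m for w, m in zip(weights, masks)], masks

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 64)), rng.normal(size=(64, 10))]
pruned, masks = magnitude_prune(layers, sparsity=0.8)
kept = sum(int(m.sum()) for m in masks) / sum(w.size for w in layers)
print(f"fraction of weights kept: {kept:.2f}")     # ~0.20
```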

Keyword: voxel

VPFusion: Towards Robust Vertical Representation Learning for 3D Object Detection

  • Authors: Yuhao Huang, Sanping Zhou, Junjie Zhang, Jinpeng Dong, Nanning Zheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02867
  • Pdf link: https://arxiv.org/pdf/2304.02867
  • Abstract
    Efficient point cloud representation is a fundamental element of Lidar-based 3D object detection. Recent grid-based detectors usually divide point clouds into voxels or pillars and construct single-stream networks in Bird's Eye View. However, these point cloud encoding paradigms underestimate the point representation in the vertical direction, which causes the loss of semantic or fine-grained information, especially for vertically sensitive objects like pedestrians and cyclists. In this paper, we propose an explicit vertical multi-scale representation learning framework, VPFusion, to combine the complementary information from both voxel and pillar streams. Specifically, VPFusion first builds upon a sparse voxel-pillar-based backbone. The backbone divides point clouds into voxels and pillars, then encodes features with 3D and 2D sparse convolution simultaneously. Next, we introduce the Sparse Fusion Layer (SFL), which establishes a bidirectional pathway for sparse voxel and pillar features to enable the interaction between them. Additionally, we present the Dense Fusion Neck (DFN) to effectively combine the dense feature maps from the voxel and pillar branches at multiple scales. Extensive experiments on the large-scale Waymo Open Dataset and nuScenes Dataset demonstrate that VPFusion surpasses the single-stream baselines by a large margin and achieves state-of-the-art performance with real-time inference speed.
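
To make the two grid views concrete, here is a minimal sketch, assuming synthetic points and illustrative grid resolutions, of how the same cloud is quantized into 3D voxels (which keep vertical structure) and 2D pillars (which collapse it); a real backbone would feed these indices to sparse 3D/2D convolutions:

```python
import numpy as np

points = np.random.rand(10000, 3) * [70.0, 80.0, 4.0]  # fake lidar (x, y, z), metres

voxel_size = np.array([0.1, 0.1, 0.2])    # voxel branch: x, y, z bins
pillar_size = np.array([0.32, 0.32])      # pillar branch: x, y bins only

voxel_idx = np.floor(points / voxel_size).astype(np.int64)           # (N, 3)
pillar_idx = np.floor(points[:, :2] / pillar_size).astype(np.int64)  # (N, 2)

voxels = np.unique(voxel_idx, axis=0)     # occupied 3D cells
pillars = np.unique(pillar_idx, axis=0)   # occupied 2D cells
print(f"{len(voxels)} occupied voxels, {len(pillars)} occupied pillars")
```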

Keyword: lidar

VPFusion: Towards Robust Vertical Representation Learning for 3D Object Detection

  • Authors: Yuhao Huang, Sanping Zhou, Junjie Zhang, Jinpeng Dong, Nanning Zheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02867
  • Pdf link: https://arxiv.org/pdf/2304.02867
  • Abstract
    Efficient point cloud representation is a fundamental element of Lidar-based 3D object detection. Recent grid-based detectors usually divide point clouds into voxels or pillars and construct single-stream networks in Bird's Eye View. However, these point cloud encoding paradigms underestimate the point representation in the vertical direction, which causes the loss of semantic or fine-grained information, especially for vertically sensitive objects like pedestrians and cyclists. In this paper, we propose an explicit vertical multi-scale representation learning framework, VPFusion, to combine the complementary information from both voxel and pillar streams. Specifically, VPFusion first builds upon a sparse voxel-pillar-based backbone. The backbone divides point clouds into voxels and pillars, then encodes features with 3D and 2D sparse convolution simultaneously. Next, we introduce the Sparse Fusion Layer (SFL), which establishes a bidirectional pathway for sparse voxel and pillar features to enable the interaction between them. Additionally, we present the Dense Fusion Neck (DFN) to effectively combine the dense feature maps from the voxel and pillar branches at multiple scales. Extensive experiments on the large-scale Waymo Open Dataset and nuScenes Dataset demonstrate that VPFusion surpasses the single-stream baselines by a large margin and achieves state-of-the-art performance with real-time inference speed.

Geometric-aware Pretraining for Vision-centric 3D Object Detection

  • Authors: Linyan Huang, Huijie Wang, Jia Zeng, Shengchuan Zhang, Liujuan Cao, Rongrong Ji, Junchi Yan, Hongyang Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03105
  • Pdf link: https://arxiv.org/pdf/2304.03105
  • Abstract
    Multi-camera 3D object detection for autonomous driving is a challenging problem that has garnered notable attention from both academia and industry. An obstacle encountered in vision-based techniques involves the precise extraction of geometry-conscious features from RGB images. Recent approaches have utilized geometric-aware image backbones pretrained on depth-relevant tasks to acquire spatial information. However, these approaches overlook the critical aspect of view transformation, resulting in inadequate performance due to the misalignment of spatial knowledge between the image backbone and view transformation. To address this issue, we propose a novel geometric-aware pretraining framework called GAPretrain. Our approach incorporates spatial and structural cues to camera networks by employing the geometric-rich modality as guidance during the pretraining phase. The transference of modal-specific attributes across different modalities is non-trivial, but we bridge this gap by using a unified bird's-eye-view (BEV) representation and structural hints derived from LiDAR point clouds to facilitate the pretraining process. GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors. Our experiments demonstrate the effectiveness and generalization ability of the proposed method. We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively. We also conduct experiments on various image backbones and view transformations to validate the efficacy of our approach. Code will be released at https://github.com/OpenDriveLab/BEVPerception-Survey-Recipe.

SALUDA: Surface-based Automotive Lidar Unsupervised Domain Adaptation

  • Authors: Bjoern Michele, Alexandre Boulch, Gilles Puy, Tuan-Hung Vu, Renaud Marlet, Nicolas Courty
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03251
  • Pdf link: https://arxiv.org/pdf/2304.03251
  • Abstract
    Learning models on one labeled dataset that generalize well on another domain is a difficult task, as several shifts might happen between the data domains. This is notably the case for lidar data, for which models can exhibit large performance discrepancies due for instance to different lidar patterns or changes in acquisition conditions. This paper addresses the corresponding Unsupervised Domain Adaptation (UDA) task for semantic segmentation. To mitigate this problem, we introduce an unsupervised auxiliary task of learning an implicit underlying surface representation simultaneously on source and target data. As both domains share the same latent representation, the model is forced to accommodate discrepancies between the two sources of data. This novel strategy differs from classical minimization of statistical divergences or lidar-specific state-of-the-art domain adaptation techniques. Our experiments demonstrate that our method achieves a better performance than the current state of the art in synthetic-to-real and real-to-real scenarios.

Keyword: diffusion

DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model

  • Authors: Hoigi Seo, Hayeon Kim, Gwanghyun Kim, Se Young Chun
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.02827
  • Pdf link: https://arxiv.org/pdf/2304.02827
  • Abstract
    The increasing demand for high-quality 3D content creation has motivated the development of automated methods for creating 3D object models from a single image and/or from a text prompt. However, the reconstructed 3D objects using state-of-the-art image-to-3D methods still exhibit low correspondence to the given image and low multi-view consistency. Recent state-of-the-art text-to-3D methods are also limited, yielding 3D samples with low diversity per prompt and long synthesis times. To address these challenges, we propose DITTO-NeRF, a novel pipeline to generate a high-quality 3D NeRF model from a text prompt or a single image. Our DITTO-NeRF consists of constructing a high-quality partial 3D object for limited in-boundary (IB) angles using the given or text-generated 2D image from the frontal view and then iteratively reconstructing the remaining 3D NeRF using an inpainting latent diffusion model. We propose progressive 3D object reconstruction schemes in terms of scales (low to high resolution), angles (IB angles initially to outer-boundary (OB) later), and masks (object to background boundary) in our DITTO-NeRF so that high-quality information on IB can be propagated into OB. Our DITTO-NeRF outperforms state-of-the-art methods in terms of fidelity and diversity, qualitatively and quantitatively, with much faster training times than prior art on image/text-to-3D such as DreamFusion and NeuralLift-360.

Benchmarking Robustness to Text-Guided Corruptions

  • Authors: Mohammadreza Mofayezi, Yasamin Medghalchi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02963
  • Pdf link: https://arxiv.org/pdf/2304.02963
  • Abstract
    This study investigates the robustness of image classifiers to text-guided corruptions. We utilize diffusion models to edit images to different domains. Unlike other works that use synthetic or hand-picked data for benchmarking, we use diffusion models as they are generative models capable of learning to edit images while preserving their semantic content. Thus, the corruptions will be more realistic and the comparison will be more informative. Also, there is no need for manual labeling, and we can create large-scale benchmarks with less effort. We define a prompt hierarchy based on the original ImageNet hierarchy to apply edits in different domains. In addition to introducing a new benchmark, we investigate the robustness of different vision models. The results of this study demonstrate that the performance of image classifiers decreases significantly under different language-based corruptions and edit domains. We also observe that convolutional models are more robust than transformer architectures. Additionally, we see that common data augmentation techniques can improve the performance on both the original data and the edited images. The findings of this research can help improve the design of image classifiers and contribute to the development of more robust machine learning systems. The code for generating the benchmark will be made available online upon publication.

DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance

  • Authors: Longwen Zhang, Qiwei Qiu, Hongyang Lin, Qixuan Zhang, Cheng Shi, Wei Yang, Ye Shi, Sibei Yang, Lan Xu, Jingyi Yu
  • Subjects: Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.03117
  • Pdf link: https://arxiv.org/pdf/2304.03117
  • Abstract
    Emerging Metaverse applications demand accessible, accurate, and easy-to-use tools for 3D digital human creation in order to depict different cultures and societies as if in the physical world. Recent large-scale vision-language advances pave the way for novices to conveniently customize 3D content. However, the generated CG-friendly assets still cannot represent the desired facial traits for human characteristics. In this paper, we present DreamFace, a progressive scheme to generate personalized 3D faces under text guidance. It enables layman users to naturally customize 3D facial assets that are compatible with CG pipelines, with desired shapes, textures, and fine-grained animation capabilities. From a text input describing the facial traits, we first introduce a coarse-to-fine scheme to generate the neutral facial geometry with a unified topology. We employ a selection strategy in the CLIP embedding space, and subsequently optimize both the detailed displacements and normals using Score Distillation Sampling from a generic Latent Diffusion Model. Then, for neutral appearance generation, we introduce a dual-path mechanism, which combines the generic LDM with a novel texture LDM to ensure both diversity and textural specification in the UV space. We also employ a two-stage optimization to perform SDS in both the latent and image spaces to provide compact priors for fine-grained synthesis. Our generated neutral assets naturally support blendshape-based facial animations. We further improve the animation ability with personalized deformation characteristics by learning the universal expression prior using a cross-identity hypernetwork. Notably, DreamFace can generate realistic 3D facial assets with physically-based rendering quality and rich animation ability from video footage, even for fashion icons or exotic characters in cartoons and fiction movies.

Zero-shot Generative Model Adaptation via Image-specific Prompt Learning

  • Authors: Jiayi Guo, Chaofei Wang, You Wu, Eric Zhang, Kai Wang, Xingqian Xu, Shiji Song, Humphrey Shi, Gao Huang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03119
  • Pdf link: https://arxiv.org/pdf/2304.03119
  • Abstract
    Recently, CLIP-guided image synthesis has shown appealing performance on adapting a pre-trained source-domain generator to an unseen target domain. It does not require any target-domain samples but only the textual domain labels. The training is highly efficient, e.g., a few minutes. However, existing methods still have some limitations in the quality of generated images and may suffer from the mode collapse issue. A key reason is that a fixed adaptation direction is applied for all cross-domain image pairs, which leads to identical supervision signals. To address this issue, we propose an Image-specific Prompt Learning (IPL) method, which learns specific prompt vectors for each source-domain image. This produces a more precise adaptation direction for every cross-domain image pair, endowing the target-domain generator with greatly enhanced flexibility. Qualitative and quantitative evaluations on various domains demonstrate that IPL effectively improves the quality and diversity of synthesized images and alleviates the mode collapse. Moreover, IPL is independent of the structure of the generative model, such as generative adversarial networks or diffusion models. Code is available at https://github.com/Picsart-AI-Research/IPL-Zero-Shot-Generative-Model-Adaptation.

SketchFFusion: Sketch-guided image editing with diffusion model

  • Authors: Weihang Mao, Bo Han, Zihao Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03174
  • Pdf link: https://arxiv.org/pdf/2304.03174
  • Abstract
    Sketch-guided image editing aims to achieve local fine-tuning of an image based on the sketch information provided by the user, while maintaining the original status of the unedited areas. Due to the high cost of acquiring human sketches, previous works mostly relied on edge maps as a substitute for sketches, but sketches possess richer structural information. In this paper, we propose a sketch generation scheme that can preserve the main contours of an image and closely adhere to the actual sketch style drawn by the user. Simultaneously, current image editing methods often face challenges such as image distortion, training cost, and loss of fine details in the sketch. To address these limitations, we propose a conditional diffusion model (SketchFFusion) based on the sketch structure vector. We evaluate the generative performance of our model and demonstrate that it outperforms existing methods.

Face Animation with an Attribute-Guided Diffusion Model

  • Authors: Bohan Zeng, Xuhui Liu, Sicheng Gao, Boyu Liu, Hong Li, Jianzhuang Liu, Baochang Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03199
  • Pdf link: https://arxiv.org/pdf/2304.03199
  • Abstract
    Face animation has achieved much progress in computer vision. However, prevailing GAN-based methods suffer from unnatural distortions and artifacts due to sophisticated motion deformation. In this paper, we propose a Face Animation framework with an attribute-guided Diffusion Model (FADM), which is the first work to exploit the superior modeling capacity of diffusion models for photo-realistic talking-head generation. To mitigate the uncontrollable synthesis effect of the diffusion model, we design an Attribute-Guided Conditioning Network (AGCN) to adaptively combine the coarse animation features and 3D face reconstruction results, which can incorporate appearance and motion conditions into the diffusion process. These specific designs help FADM rectify unnatural artifacts and distortions, and also enrich high-fidelity facial details through iterative diffusion refinements with accurate animation attributes. FADM can flexibly and effectively improve existing animation videos. Extensive experiments on widely used talking-head benchmarks validate the effectiveness of FADM over prior art.

Inst-Inpaint: Instructing to Remove Objects with Diffusion Models

  • Authors: Ahmet Burak Yildirim, Vedat Baday, Erkut Erdem, Aykut Erdem, Aysegul Dundar
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03246
  • Pdf link: https://arxiv.org/pdf/2304.03246
  • Abstract
    The image inpainting task refers to erasing unwanted pixels from images and filling them in a semantically consistent and realistic way. Traditionally, the pixels to be erased are defined with binary masks. From an application point of view, a user needs to generate the masks for the objects they would like to remove, which can be time-consuming and prone to errors. In this work, we are interested in an image inpainting algorithm that estimates which object is to be removed based on natural language input and also removes it, simultaneously. For this purpose, we first construct a dataset named GQA-Inpaint for this task, which will be released soon. Second, we present a novel inpainting framework, Inst-Inpaint, that can remove objects from images based on the instructions given as text prompts. We set various GAN and diffusion-based baselines and run experiments on synthetic and real image datasets. We compare methods with different evaluation metrics that measure the quality and accuracy of the models and show significant quantitative and qualitative improvements.

Diffusion Models as Masked Autoencoders

  • Authors: Chen Wei, Karttikeya Mangalam, Po-Yao Huang, Yanghao Li, Haoqi Fan, Hu Xu, Huiyu Wang, Cihang Xie, Alan Yuille, Christoph Feichtenhofer
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03283
  • Pdf link: https://arxiv.org/pdf/2304.03283
  • Abstract
    There has been a longstanding belief that generation can facilitate a true understanding of visual data. In line with this, we revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models. While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE). Our approach is capable of (i) serving as a strong initialization for downstream recognition tasks, (ii) conducting high-quality image inpainting, and (iii) being effortlessly extended to video where it produces state-of-the-art classification accuracy. We further perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
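
A hedged sketch of the conditioning idea, with toy stand-ins: images appear as pre-patchified tensors, a small MLP replaces the ViT, and a single-parameter noise level replaces the full diffusion schedule. Noise is injected only into masked patches and the reconstruction loss is taken on those patches, which is the DiffMAE-style coupling of masking and denoising.

```python
import torch

def diffmae_style_loss(model, imgs, mask_ratio=0.75):
    """imgs: (batch, patches, dim). Corrupt masked patches with diffusion-style
    noise, keep visible patches clean, reconstruct the masked ones."""
    B, N, _ = imgs.shape
    mask = torch.rand(B, N) < mask_ratio          # True = masked patch
    t = torch.rand(B, 1, 1)                       # per-sample noise level
    noise = torch.randn_like(imgs)
    noisy = (1 - t).sqrt() * imgs + t.sqrt() * noise
    x = torch.where(mask.unsqueeze(-1), noisy, imgs)
    pred = model(x)                               # predict clean patches
    return (pred - imgs)[mask].pow(2).mean()      # loss on masked patches only

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.GELU(),
                            torch.nn.Linear(64, 64))
loss = diffmae_style_loss(model, torch.randn(2, 16, 64))
loss.backward()
```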

Keyword: dynamic

Abstraction-based Probabilistic Stability Analysis of Polyhedral Probabilistic Hybrid Systems

  • Authors: Spandan Das, Pavithra Prabhakar
  • Subjects: Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02647
  • Pdf link: https://arxiv.org/pdf/2304.02647
  • Abstract
    In this paper, we consider the problem of probabilistic stability analysis of a subclass of Stochastic Hybrid Systems, namely, Polyhedral Probabilistic Hybrid Systems (PPHS), where the flow dynamics is given by a polyhedral inclusion, the discrete switching between modes happens probabilistically at the boundaries of their invariant regions and the continuous state is not reset during switching. We present an abstraction-based analysis framework that consists of constructing a finite Markov Decision Processes (MDP) such that verification of certain property on the finite MDP ensures the satisfaction of probabilistic stability on the PPHS. Further, we present a polynomial-time algorithm for verifying the corresponding property on the MDP. Our experimental analysis demonstrates the feasibility of the approach in successfully verifying probabilistic stability on PPHS of various dimensions and sizes.

Emergent Coordination through Game-Induced Nonlinear Opinion Dynamics

  • Authors: Haimin Hu, Kensuke Nakamura, Kai-Chieh Hsu, Naomi Ehrich Leonard, Jaime Fernández Fisac
  • Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02687
  • Pdf link: https://arxiv.org/pdf/2304.02687
  • Abstract
    We present a multi-agent decision-making framework for the emergent coordination of autonomous agents whose intents are initially undecided. Dynamic non-cooperative games have been used to encode multi-agent interaction, but ambiguity arising from factors such as goal preference or the presence of multiple equilibria may lead to coordination issues, ranging from the "freezing robot" problem to unsafe behavior in safety-critical events. The recently developed nonlinear opinion dynamics (NOD) provide guarantees for breaking deadlocks. However, choosing the appropriate model parameters automatically in general multi-agent settings remains a challenge. In this paper, we first propose a novel and principled procedure for synthesizing NOD based on the value functions of dynamic games conditioned on agents' intents. In particular, we provide for the two-player two-option case precise stability conditions for equilibria of the game-induced NOD based on the mismatch between agents' opinions and their game values. We then propose an optimization-based trajectory optimization algorithm that computes agents' policies guided by the evolution of opinions. The efficacy of our method is illustrated with a simulated toll station coordination example.

Going Further: Flatness at the Rescue of Early Stopping for Adversarial Example Transferability

  • Authors: Martin Gubri, Maxime Cordy, Yves Le Traon
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.02688
  • Pdf link: https://arxiv.org/pdf/2304.02688
  • Abstract
    Transferability is the property of adversarial examples to be misclassified by other models than the surrogate model for which they were crafted. Previous research has shown that transferability is substantially increased when the training of the surrogate model has been early stopped. A common hypothesis to explain this is that the later training epochs are when models learn the non-robust features that adversarial attacks exploit. Hence, an early stopped model is more robust (hence, a better surrogate) than fully trained models. We demonstrate that the reasons why early stopping improves transferability lie in the side effects it has on the learning dynamics of the model. We first show that early stopping benefits transferability even on models learning from data with non-robust features. We then establish links between transferability and the exploration of the loss landscape in the parameter space, on which early stopping has an inherent effect. More precisely, we observe that transferability peaks when the learning rate decays, which is also the time at which the sharpness of the loss significantly drops. This leads us to propose RFN, a new approach for transferability that minimizes loss sharpness during training in order to maximize transferability. We show that by searching for large flat neighborhoods, RFN always improves over early stopping (by up to 47 points of transferability rate) and is competitive with (if not better than) strong state-of-the-art baselines.
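
The abstract does not spell out RFN's exact update; for orientation, the sketch below shows the closely related sharpness-aware minimization (SAM) step, which likewise prefers flat neighborhoods by ascending to the locally worst nearby weights before descending. Treat it as background on flatness-seeking training, not as RFN itself.

```python
import torch

def sam_step(model, loss_fn, x, y, optimizer, rho=0.05):
    """One SAM update: perturb weights toward the locally worst direction in a
    rho-ball, take the gradient there, apply it at the original weights."""
    loss_fn(model(x), y).backward()               # gradient at current weights
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (norm + 1e-12)
            p.add_(e)                             # ascent step
            eps.append((p, e))
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()               # gradient at perturbed point
    with torch.no_grad():
        for p, e in eps:
            p.sub_(e)                             # restore original weights
    optimizer.step()                              # descend with that gradient
    optimizer.zero_grad()
```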

ACTION++: Improving Semi-supervised Medical Image Segmentation with Adaptive Anatomical Contrast

  • Authors: Chenyu You, Weicheng Dai, Yifei Min, Lawrence Staib, Jas Sekhon, James S. Duncan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.02689
  • Pdf link: https://arxiv.org/pdf/2304.02689
  • Abstract
    Medical data often exhibits long-tail distributions with heavy class imbalance, which naturally leads to difficulty in classifying the minority classes (i.e., boundary regions or rare objects). Recent work has significantly improved semi-supervised medical image segmentation in long-tailed scenarios by equipping them with unsupervised contrastive criteria. However, it remains unclear how well they will perform in the labeled portion of data where class distribution is also highly imbalanced. In this work, we present ACTION++, an improved contrastive learning framework with adaptive anatomical contrast for semi-supervised medical segmentation. Specifically, we propose an adaptive supervised contrastive loss, where we first compute the optimal locations of class centers uniformly distributed on the embedding space (i.e., off-line), and then perform online contrastive matching training by encouraging different class features to adaptively match these distinct and uniformly distributed class centers. Moreover, we argue that blindly adopting a constant temperature $\tau$ in the contrastive loss on long-tailed medical data is not optimal, and propose to use a dynamic $\tau$ via a simple cosine schedule to yield better separation between majority and minority classes. Empirically, we evaluate ACTION++ on ACDC and LA benchmarks and show that it achieves state-of-the-art across two semi-supervised settings. Theoretically, we analyze the performance of adaptive anatomical contrast and confirm its superiority in label efficiency.
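
The dynamic temperature itself is a one-liner; a minimal sketch, assuming illustrative endpoint values rather than the paper's tuned ones:

```python
import math

def cosine_tau(step: int, total_steps: int,
               tau_min: float = 0.07, tau_max: float = 0.3) -> float:
    """Cosine-decayed temperature: starts at tau_max, ends at tau_min."""
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return tau_min + (tau_max - tau_min) * cos

# contrastive logits are then divided by the current temperature, e.g.:
# logits = similarity_matrix / cosine_tau(step, total_steps)
```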

Recovering Continuous Scene Dynamics from A Single Blurry Image with Events

  • Authors: Zhangyi Cheng, Xiang Zhang, Lei Yu, Jianzhuang Liu, Wen Yang, Gui-Song Xia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02695
  • Pdf link: https://arxiv.org/pdf/2304.02695
  • Abstract
    This paper aims at demystifying a single motion-blurred image with events and revealing temporally continuous scene dynamics encrypted behind motion blurs. To achieve this end, an Implicit Video Function (IVF) is learned to represent a single motion blurred image with concurrent events, enabling the latent sharp image restoration of arbitrary timestamps in the range of imaging exposures. Specifically, a dual attention transformer is proposed to efficiently leverage merits from both modalities, i.e., the high temporal resolution of event features and the smoothness of image features, alleviating temporal ambiguities while suppressing the event noise. The proposed network is trained only with the supervision of ground-truth images of limited referenced timestamps. Motion- and texture-guided supervisions are employed simultaneously to enhance restorations of the non-referenced timestamps and improve the overall sharpness. Experiments on synthetic, semi-synthetic, and real-world datasets demonstrate that our proposed method outperforms state-of-the-art methods by a large margin in terms of both objective PSNR and SSIM measurements and subjective evaluations.

Efficient and Accurate Automatic Python Bindings with cppyy & Cling

  • Authors: Baidyanath Kundu (1 and 2), Vassil Vassilev (1 and 2), Wim Lavrijsen (3) ((1) European Council for Nuclear Research, (2) Princeton University (US), (3) LBNL (US))
  • Subjects: Programming Languages (cs.PL)
  • Arxiv link: https://arxiv.org/abs/2304.02712
  • Pdf link: https://arxiv.org/pdf/2304.02712
  • Abstract
    The simplicity of Python and the power of C++ force stark choices on a scientific software stack. There have been multiple developments to mitigate language boundaries by implementing language bindings, but the impedance mismatch between the static nature of C++ and the dynamic one of Python hinders their implementation; examples include the use of user-defined Python types with templated C++ and advanced memory management. The development of the C++ interpreter Cling has changed the way we can think of language bindings as it provides an incremental compilation infrastructure available at runtime. That is, Python can interrogate C++ on demand, and bindings can be lazily constructed at runtime. This automatic binding provision requires no direct support from library authors and offers better performance than alternative solutions, such as PyBind11. ROOT pioneered this approach with PyROOT, which was later enhanced with its successor, cppyy. However, until now, cppyy relied on the reflection layer of ROOT, which is limited in terms of provided features and performance. This paper presents the next step for language interoperability with cppyy, enabling research into uniform cross-language execution environments and boosting optimization opportunities across language boundaries. We illustrate the use of advanced C++ in Numba-accelerated Python through cppyy. We outline a path forward for re-engineering parts of cppyy to use upstream LLVM components to improve performance and sustainability. We demonstrate cppyy purely based on a C++ reflection library, InterOp, which offers interoperability primitives based on Cling and Clang-Repl.
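
For readers new to cppyy, a minimal runnable taste of the runtime-binding model described above: C++ is JIT-compiled through Cling and surfaces in Python on demand, with templates instantiated lazily from the Python call site.

```python
import cppyy

cppyy.cppdef("""
template <typename T>
T clamp_add(T a, T b, T hi) {     // templated C++, instantiated on first use
    T s = a + b;
    return s > hi ? hi : s;
}
""")

print(cppyy.gbl.clamp_add(3, 4, 5))         # deduced as int    -> 5
print(cppyy.gbl.clamp_add(1.5, 2.0, 10.0))  # deduced as double -> 3.5
```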

Software and Analysis for Dynamic Voronoi Diagrams in the Hilbert Metric

  • Authors: Madeline Bumpus, Caesar Dai, Auguste H. Gezalyan, Sam Munoz, Renita Santhoshkumar, Songyu Ye, David M. Mount
  • Subjects: Computational Geometry (cs.CG)
  • Arxiv link: https://arxiv.org/abs/2304.02745
  • Pdf link: https://arxiv.org/pdf/2304.02745
  • Abstract
    The Hilbert metric is a projective metric defined on a convex body which generalizes the Cayley-Klein model of hyperbolic geometry to any convex set. In this paper we analyze Hilbert Voronoi diagrams in the dynamic setting. In addition, we introduce dynamic visualization software for Voronoi diagrams in the Hilbert metric on user-specified convex polygons.
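
For concreteness, a small self-contained sketch of the underlying distance (the polygon and points are illustrative): the Hilbert distance between interior points p and q is the log cross-ratio of p, q and the two points A, B where their line meets the polygon boundary, d(p, q) = ½ ln(|Aq||Bp| / (|Ap||Bq|)).

```python
import numpy as np

def cross2(u, v):
    return u[0] * v[1] - u[1] * v[0]

def hilbert_distance(p, q, poly):
    """Hilbert distance between interior points p, q of a convex polygon."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    d = q - p
    ts = []
    for i in range(len(poly)):
        a = np.asarray(poly[i], float)
        e = np.asarray(poly[(i + 1) % len(poly)], float) - a
        denom = cross2(d, e)
        if abs(denom) < 1e-12:            # chord parallel to this edge
            continue
        t = cross2(a - p, e) / denom      # p + t*d lies on the edge's line...
        s = cross2(a - p, d) / denom      # ...at edge parameter s in [0, 1]
        if -1e-9 <= s <= 1 + 1e-9:
            ts.append(t)
    t0, t1 = min(ts), max(ts)             # boundary hits: t0 < 0 < 1 < t1
    A, B = p + t0 * d, p + t1 * d
    n = np.linalg.norm
    return 0.5 * np.log((n(A - q) * n(B - p)) / (n(A - p) * n(B - q)))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hilbert_distance((0.3, 0.5), (0.7, 0.5), square))  # ~0.847
```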

Adaptive Headway Motion Control and Motion Prediction for Safe Unicycle Motion Design

  • Authors: Aykut İşleyen, Nathan van de Wouw, Ömür Arslan
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02760
  • Pdf link: https://arxiv.org/pdf/2304.02760
  • Abstract
    Differential drive robots that can be modeled as a kinematic unicycle are a standard mobile base platform for many service and logistics robots. Safe and smooth autonomous motion around obstacles is a crucial skill for unicycle robots to perform diverse tasks in complex environments. A classical control approach for unicycle control is feedback linearization using a headway point at a fixed headway distance in front of the unicycle. The unicycle headway control brings the headway point to a desired goal location by embedding a linear headway reference dynamics, which often results in an undesired offset for the actual unicycle position. In this paper, we introduce a new unicycle headway control approach with an adaptive headway distance that overcomes this limitation, i.e., when the headway point reaches the goal the unicycle position is also at the goal. By systematically analyzing the closed-loop unicycle motion under the adaptive headway controller, we design analytical feedback motion prediction methods that bound the closed-loop unicycle position trajectory and so can be effectively used for safety assessment and safe unicycle motion design around obstacles. We present an application of adaptive headway motion control and motion prediction for safe unicycle path following around obstacles in numerical simulations.
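
A hedged sketch of headway-point feedback linearization for the kinematic unicycle, with an adaptive headway distance that shrinks near the goal so the position itself (not just the headway point) converges. The specific adaptation law d = eps * ||p - g|| is an illustrative assumption, not the paper's design.

```python
import numpy as np

def headway_control(p, theta, goal, k=1.0, eps=0.5):
    d = max(eps * np.linalg.norm(p - goal), 1e-6)        # adaptive headway distance
    h = p + d * np.array([np.cos(theta), np.sin(theta)]) # headway point
    M = np.array([[np.cos(theta), -d * np.sin(theta)],
                  [np.sin(theta),  d * np.cos(theta)]])  # h_dot = M @ [v, omega]
    v, omega = np.linalg.solve(M, -k * (h - goal))       # impose linear h-dynamics
    return v, omega

# simple Euler rollout
p, theta, goal, dt = np.array([0.0, 0.0]), 0.0, np.array([4.0, 3.0]), 0.01
for _ in range(2000):
    v, omega = headway_control(p, theta, goal)
    p = p + dt * v * np.array([np.cos(theta), np.sin(theta)])
    theta += dt * omega
print(p)  # approaches the goal as the headway distance shrinks
```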

A Robust Observer with Gyroscopic Bias Correction for Rotational Dynamics

  • Authors: Erjen Lefeber, Marcus Greiff, Anders Robertsson
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02763
  • Pdf link: https://arxiv.org/pdf/2304.02763
  • Abstract
    We propose an observer for rotational dynamics subject to directional and gyroscopic measurements, which simultaneously estimates the gyroscopic biases and attitude rates. We show uniform almost global asymptotic and local exponential stability of the resulting error dynamics, implying robustness against bounded disturbances. This robustness is quantified with respect to a popular nonlinear complementary filter in quantitative simulation studies, and we explore how the measurement noise propagates to the asymptotic errors as a function of tuning. This is an extended version of a paper with the same title (to appear at IFAC WC 2023). Additional mathematical details are provided in this extended version.

MoStGAN-V: Video Generation with Temporal Motion Styles

  • Authors: Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02777
  • Pdf link: https://arxiv.org/pdf/2304.02777
  • Abstract
    Video generation remains a challenging task due to spatiotemporal complexity and the requirement of synthesizing diverse motions with temporal consistency. Previous works attempt to generate videos in arbitrary lengths either in an autoregressive manner or regarding time as a continuous signal. However, they struggle to synthesize detailed and diverse motions with temporal coherence and tend to generate repetitive scenes after a few time steps. In this work, we argue that a single time-agnostic latent vector of a style-based generator is insufficient to model various and temporally-consistent motions. Hence, we introduce additional time-dependent motion styles to model diverse motion patterns. In addition, a Motion Style Attention modulation mechanism, dubbed MoStAtt, is proposed to augment frames with vivid dynamics for each specific scale (i.e., layer), which assigns an attention score for each motion style w.r.t. deconvolution filter weights in the target synthesis layer and softly attends different motion styles for weight modulation. Experimental results show our model achieves state-of-the-art performance on four unconditional $256^2$ video synthesis benchmarks trained with only 3 frames per clip and produces better qualitative results with respect to dynamic motions. Code and videos have been made available at https://github.com/xiaoqian-shen/MoStGAN-V.

Enhanced Grid Following Inverter: A Uniform Control Design Framework

  • Authors: Alireza Askarian, Jaesang Park, Srinivasa Salapaka
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.02792
  • Pdf link: https://arxiv.org/pdf/2304.02792
  • Abstract
    This article presents a novel grid following (GFL) inverter control design framework that exploits the line dynamics structure in the $dq$ frame and treats the inverter as an actuator. The proposed framework imposes a structure on the line's coupled dynamics and captures the effect of coupling on the GFL inverter's closed-loop stability and performance. One of the main features of our work is using the Bode sensitivity integral to characterize the fundamental limitations of control design. These constraints translate into fundamental trade-offs between performance objectives such as reference tracking, closed-loop bandwidth, robust synchronization, and resilience to grid anomalies. The article develops design considerations to ensure specific trade-offs. We assess the performance of our proposed framework through simulation and experimental results.

Unveiling the Dynamics of Censorship, COVID-19 Regulations, and Protest: An Empirical Study of Chinese Subreddit r/china_irl

  • Authors: Siyi Zhou, Luca Luceri, Emilio Ferrara
  • Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2304.02800
  • Pdf link: https://arxiv.org/pdf/2304.02800
  • Abstract
    The COVID-19 pandemic has intensified numerous social issues that warrant academic investigation. Although information dissemination has been extensively studied, the silenced voices and censored content also merit attention due to their role in mobilizing social movements. In this paper, we provide empirical evidence to explore the relationships among COVID-19 regulations, censorship, and protest through a series of social incidents that occurred in China during 2022. We analyze the similarities and differences between censored articles and discussions on r/china_irl, the most popular Chinese-speaking subreddit, and scrutinize the temporal dynamics of government censorship activities and their impact on user engagement within the subreddit. Furthermore, we examine users' linguistic patterns under the influence of a censorship-driven environment. Our findings reveal patterns in topic recurrence, the complex interplay between censorship activities, user subscription, and collective commenting behavior, as well as potential linguistic adaptation strategies to circumvent censorship. These insights hold significant implications for researchers interested in understanding the survival mechanisms of marginalized groups within censored information ecosystems.

Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit Diversity Modeling

  • Authors: Haotao Wang, Ziyu Jiang, Yan Han, Zhangyang Wang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.02806
  • Pdf link: https://arxiv.org/pdf/2304.02806
  • Abstract
    Graph neural networks (GNNs) have been widely applied to learning over graph data. Yet, real-world graphs commonly exhibit diverse graph structures and contain heterogeneous nodes and edges. Moreover, to enhance the generalization ability of GNNs, it has become common practice to further increase the diversity of training graph structures by incorporating graph augmentations and/or performing large-scale pre-training on more graphs. Therefore, it becomes essential for a GNN to simultaneously model diverse graph structures. Yet, naively increasing the GNN model capacity will suffer from both higher inference costs and the notorious trainability issue of GNNs. This paper introduces the Mixture-of-Expert (MoE) idea to GNNs, aiming to enhance their ability to accommodate the diversity of training graph structures, without incurring computational overheads. Our new Graph Mixture of Expert (GMoE) model enables each node in the graph to dynamically select its own optimal information aggregation experts. These experts are trained to model different subgroups of graph structures in the training set. Additionally, GMoE includes information aggregation experts with varying aggregation hop sizes, where the experts with larger hop sizes are specialized in capturing information over longer ranges. The effectiveness of GMoE is verified through experimental results on a large variety of graph, node, and link prediction tasks in the OGB benchmark. For instance, it enhances ROC-AUC by $1.81\%$ in ogbg-molhiv and by $1.40\%$ in ogbg-molbbbp, as compared to the non-MoE baselines. Our code is available at https://github.com/VITA-Group/Graph-Mixture-of-Experts.
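
A hedged sketch of the per-node routing idea: each node softly weights aggregation "experts" that differ in hop size. The real GMoE uses sparse top-k routing, load balancing, and trained GNN experts; the dense softmax gate, dense adjacency, and sizes below are illustrative simplifications.

```python
import torch
import torch.nn as nn

class GraphMoELayer(nn.Module):
    def __init__(self, dim, hops=(1, 2, 3)):
        super().__init__()
        self.hops = hops
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in hops)
        self.gate = nn.Linear(dim, len(hops))

    def forward(self, x, adj):                  # x: (N, dim), adj: (N, N) normalized
        weights = self.gate(x).softmax(dim=-1)  # per-node expert weights (N, E)
        msgs, agg = [], x
        for _ in range(max(self.hops)):
            agg = adj @ agg                     # one more hop of aggregation
            msgs.append(agg)
        out = torch.zeros_like(x)
        for e, (hop, expert) in enumerate(zip(self.hops, self.experts)):
            out = out + weights[:, e:e + 1] * expert(msgs[hop - 1])
        return out

layer = GraphMoELayer(16)
x, adj = torch.randn(5, 16), torch.eye(5)
print(layer(x, adj).shape)  # torch.Size([5, 16])
```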

Causal Repair of Learning-enabled Cyber-physical Systems

  • Authors: Pengyuan Lu, Ivan Ruchkin, Matthew Cleaveland, Oleg Sokolsky, Insup Lee
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.02813
  • Pdf link: https://arxiv.org/pdf/2304.02813
  • Abstract
    Models of actual causality leverage domain knowledge to generate convincing diagnoses of events that caused an outcome. It is promising to apply these models to diagnose and repair run-time property violations in cyber-physical systems (CPS) with learning-enabled components (LEC). However, given the high diversity and complexity of LECs, it is challenging to encode domain knowledge (e.g., the CPS dynamics) in a scalable actual causality model that could generate useful repair suggestions. In this paper, we focus causal diagnosis on the input/output behaviors of LECs. Specifically, we aim to identify which subset of I/O behaviors of the LEC is an actual cause for a property violation. An important by-product is a counterfactual version of the LEC that repairs the run-time property by fixing the identified problematic behaviors. Based on these insights, we design a two-step diagnostic pipeline: (1) construct a Halpern-Pearl causality model that reflects the dependency of the property outcome on the component's I/O behaviors, and (2) perform a search for an actual cause and corresponding repair on the model. We prove that our pipeline has the following guarantee: if an actual cause is found, the system is guaranteed to be repaired; otherwise, we have high probabilistic confidence that the LEC under analysis did not cause the property violation. We demonstrate that our approach successfully repairs learned controllers on a standard OpenAI Gym benchmark.

NTK-SAP: Improving neural network pruning by aligning training dynamics

  • Authors: Yite Wang, Dawei Li, Ruoyu Sun
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.02840
  • Pdf link: https://arxiv.org/pdf/2304.02840
  • Abstract
    Pruning neural networks before training has received increasing interest due to its potential to reduce training time and memory. One popular method is to prune the connections based on a certain metric, but it is not entirely clear what metric is the best choice. Recent advances in neural tangent kernel (NTK) theory suggest that the training dynamics of large enough neural networks are closely related to the spectrum of the NTK. Motivated by this finding, we propose to prune the connections that have the least influence on the spectrum of the NTK. This method can help maintain the NTK spectrum, which may help align the training dynamics to that of its dense counterpart. However, one possible issue is that the fixed-weight-NTK corresponding to a given initial point can be very different from the NTK corresponding to later iterates during the training phase. We further propose to sample multiple realizations of random weights to estimate the NTK spectrum. Note that our approach is weight-agnostic, which is different from most existing methods that are weight-dependent. In addition, we use random inputs to compute the fixed-weight-NTK, making our method data-agnostic as well. We name our foresight pruning algorithm Neural Tangent Kernel Spectrum-Aware Pruning (NTK-SAP). Empirically, our method achieves better performance than all baselines on multiple datasets.

Design and Control of a Ballbot Drivetrain with High Agility, Minimal Footprint, and High Payload

  • Authors: Chenzhang Xiao, Mahshid Mansouri, David Lam, Joao Ramos, Elizabeth T. Hsiao-Wecksler
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.02887
  • Pdf link: https://arxiv.org/pdf/2304.02887
  • Abstract
    This paper presents the design and control of a ballbot drivetrain that aims to achieve high agility, minimal footprint, and high payload capacity while maintaining dynamic stability. Two hardware platforms and analytical models were developed to test design and control methodologies. The full-scale ballbot prototype (MiaPURE) was constructed using off-the-shelf components and designed to have agility, footprint, and balance similar to that of a walking human. The planar inverted pendulum testbed (PIPTB) was developed as a reduced-order testbed for quick validation of system performance. We then proposed a simple yet robust LQR-PI controller to balance and maneuver the ballbot drivetrain with a heavy payload. This is crucial because the drivetrain is often subject to high stiction due to elastomeric components in the torque transmission system. This controller was first tested in the PIPTB to compare with traditional LQR and cascaded PI-PD controllers, and then implemented in the ballbot drivetrain. The MiaPURE drivetrain was able to carry a payload of 60 kg, achieve a maximum speed of 2.3 m/s, and come to a stop from a speed of 1.4 m/s in 2 seconds in a selected translation direction. Finally, we demonstrated the omnidirectional movement of the ballbot drivetrain in an indoor environment as a payload-carrying robot and a human-riding mobility device. Our experiments demonstrated the feasibility of using the ballbot drivetrain as a universal mobility platform with agile movements, minimal footprint, and high payload capacity using our proposed design and control methodologies.
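
A hedged sketch of the LQR core of such an LQR-PI balance controller, for a generic linearized planar inverted-pendulum model standing in for the PIPTB. The system matrices and weights are illustrative textbook values, not the paper's identified model; the integral term on the velocity error is the PI part that fights drivetrain stiction.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# state x = [position, velocity, lean angle, lean rate]; illustrative numbers
A = np.array([[0, 1,  0,   0],
              [0, 0, -1.2, 0],
              [0, 0,  0,   1],
              [0, 0,  8.5, 0]])
B = np.array([[0], [0.9], [0], [-1.6]])
Q = np.diag([10.0, 1.0, 100.0, 1.0])
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)          # u = -K x is the LQR feedback

def lqr_pi(x, v_ref, integ, dt, ki=0.5):
    integ += (v_ref - x[1]) * dt         # integral of velocity error
    u = float(-K @ x) + ki * integ       # LQR + integral action vs. stiction
    return u, integ
```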

LSketch: A Label-Enabled Graph Stream Sketch Toward Time-Sensitive Queries

  • Authors: Yiling Zeng, Chunyao Song, Yuhan Li, Tingjian Ge
  • Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.02897
  • Pdf link: https://arxiv.org/pdf/2304.02897
  • Abstract
    Graph streams represent data interactions in real applications. The mining of graph streams plays an important role in network security, social network analysis, and traffic control, among others. However, the sheer volume and high dynamics cause great challenges for efficient storage and subsequent query analysis on them. Current studies apply sketches to summarize graph streams. We propose LSketch, which works for heterogeneous graph streams and effectively preserves the label information carried by the streams in real scenes, thereby enriching the expressive ability of sketches. In addition, as graph streams continue to evolve over time, edges too old may lose their practical significance. Therefore, we introduce the sliding window model into LSketch to eliminate the expired edges automatically. LSketch uses sub-linear storage space and can support structure based queries and time-sensitive queries with high accuracy. We perform extensive experiments over four real datasets, demonstrating the superiority of the proposed method over state-of-the-art methods in terms of query accuracy and time efficiency.
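
A hedged sketch of the data structure's spirit: edges hashed by (src, dst, label) into a fixed matrix, with per-cell timestamps implementing the sliding window by lazy expiry. The single hash, square shape, and eviction rule are illustrative simplifications of the actual LSketch design.

```python
import time

class LabeledWindowSketch:
    def __init__(self, rows=512, cols=512, window_s=3600.0):
        self.rows, self.cols, self.window = rows, cols, window_s
        self.count = [[0.0] * cols for _ in range(rows)]
        self.stamp = [[0.0] * cols for _ in range(rows)]

    def _cell(self, src, dst, label):
        return hash((src, label)) % self.rows, hash((dst, label)) % self.cols

    def add(self, src, dst, label, w=1.0, t=None):
        t = time.time() if t is None else t
        r, c = self._cell(src, dst, label)
        if t - self.stamp[r][c] > self.window:
            self.count[r][c] = 0.0        # expired cell: reset before reuse
        self.count[r][c] += w
        self.stamp[r][c] = t

    def query(self, src, dst, label, t=None):
        t = time.time() if t is None else t
        r, c = self._cell(src, dst, label)
        return 0.0 if t - self.stamp[r][c] > self.window else self.count[r][c]

sk = LabeledWindowSketch()
sk.add("10.0.0.1", "10.0.0.2", "ssh")
print(sk.query("10.0.0.1", "10.0.0.2", "ssh"))  # ~1.0 within the window
```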

Quantifying and Defending against Privacy Threats on Federated Knowledge Graph Embedding

  • Authors: Yuke Hu, Wei Liang, Ruofan Wu, Kai Xiao, Weiqiang Wang, Xiaochen Li, Jinfei Liu, Zhan Qin
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.02932
  • Pdf link: https://arxiv.org/pdf/2304.02932
  • Abstract
    Knowledge Graph Embedding (KGE) is a fundamental technique that extracts expressive representation from knowledge graph (KG) to facilitate diverse downstream tasks. The emerging federated KGE (FKGE) collaboratively trains from distributed KGs held among clients while avoiding exchanging clients' sensitive raw KGs, which can still suffer from privacy threats as evidenced in other federated model trainings (e.g., neural networks). However, quantifying and defending against such privacy threats remain unexplored for FKGE which possesses unique properties not shared by previously studied models. In this paper, we conduct the first holistic study of the privacy threat on FKGE from both attack and defense perspectives. For the attack, we quantify the privacy threat by proposing three new inference attacks, which reveal substantial privacy risk by successfully inferring the existence of the KG triple from victim clients. For the defense, we propose DP-Flames, a novel differentially private FKGE with private selection, which offers a better privacy-utility tradeoff by exploiting the entity-binding sparse gradient property of FKGE and comes with a tight privacy accountant by incorporating the state-of-the-art private selection technique. We further propose an adaptive privacy budget allocation policy to dynamically adjust defense magnitude across the training procedure. Comprehensive evaluations demonstrate that the proposed defense can successfully mitigate the privacy threat by effectively reducing the success rate of inference attacks from $83.1\%$ to $59.4\%$ on average with only a modest utility decrease.

Adaptable and Interpretable Framework for Novelty Detection in Real-Time IoT Systems

  • Authors: Marek Wadinger, Michal Kvasnica
  • Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.02947
  • Pdf link: https://arxiv.org/pdf/2304.02947
  • Abstract
    This paper presents the Real-time Adaptive and Interpretable Detection (RAID) algorithm. The novel approach addresses the limitations of state-of-the-art anomaly detection methods for multivariate dynamic processes, which are restricted to detecting anomalies within the scope of the model training conditions. The RAID algorithm adapts to non-stationary effects such as data drift and change points that may not be accounted for during model development, resulting in prolonged service life. A dynamic model based on a joint probability distribution handles anomalous behavior detection in a system and root cause isolation based on adaptive process limits. The RAID algorithm does not require changes to existing process automation infrastructures, making it highly deployable across different domains. Two case studies involving real dynamic system data demonstrate the benefits of the RAID algorithm, including change point adaptation, root cause isolation, and improved detection accuracy.
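
A minimal sketch of adaptive process limits in this spirit, assuming a single signal, exponentially weighted moving statistics, and a 3-sigma rule (all illustrative choices rather than RAID's joint-distribution model):

```python
class AdaptiveLimit:
    """Online mean/variance with forgetting; limits drift with the signal."""
    def __init__(self, alpha=0.01):          # alpha: forgetting factor
        self.alpha, self.mean, self.var = alpha, 0.0, 1.0

    def update(self, x: float) -> bool:
        band = 3.0 * self.var ** 0.5
        anomalous = abs(x - self.mean) > band
        delta = x - self.mean                # keep adapting even after flagging,
        self.mean += self.alpha * delta      # so a change point re-centres limits
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        return anomalous

det = AdaptiveLimit()
print([det.update(x) for x in [0.1, 0.0, -0.2, 0.1, 5.0, 0.2]])
# -> the jump to 5.0 is flagged; the limits then begin adapting to it
```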

FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead

  • Authors: Kang Chen, Tao Han, Junchao Gong, Lei Bai, Fenghua Ling, Jing-Jia Luo, Xi Chen, Leiming Ma, Tianning Zhang, Rui Su, Yuanzheng Ci, Bin Li, Xiaokang Yang, Wanli Ouyang
  • Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
  • Arxiv link: https://arxiv.org/abs/2304.02948
  • Pdf link: https://arxiv.org/pdf/2304.02948
  • Abstract
    We present FengWu, an advanced data-driven global medium-range weather forecast system based on Artificial Intelligence (AI). Different from existing data-driven weather forecast methods, FengWu solves the medium-range forecast problem from a multi-modal and multi-task perspective. Specifically, a deep learning architecture equipped with model-specific encoder-decoders and cross-modal fusion Transformer is elaborately designed, which is learned under the supervision of an uncertainty loss to balance the optimization of different predictors in a region-adaptive manner. Besides this, a replay buffer mechanism is introduced to improve medium-range forecast performance. With 39-year data training based on the ERA5 reanalysis, FengWu is able to accurately reproduce the atmospheric dynamics and predict the future land and atmosphere states at 37 vertical levels on a 0.25° latitude-longitude resolution. Hindcasts of 6-hourly weather in 2018 based on ERA5 demonstrate that FengWu performs better than GraphCast in predicting 80% of the 880 reported predictands, e.g., reducing the root mean square error (RMSE) of 10-day lead global z500 prediction from 733 to 651 $m^{2}/s^2$. In addition, the inference cost of each iteration is merely 600 ms on NVIDIA Tesla A100 hardware. The results suggest that FengWu can significantly improve the forecast skill and extend the skillful global medium-range weather forecast out to 10.75 days lead (with ACC of z500 > 0.6) for the first time.

Deep Long-Short Term Memory networks: Stability properties and Experimental validation

  • Authors: Fabio Bonassi, Alessio La Bella, Giulio Panzani, Marcello Farina, Riccardo Scattolini
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.02975
  • Pdf link: https://arxiv.org/pdf/2304.02975
  • Abstract
    The aim of this work is to investigate the use of Incrementally Input-to-State Stable ($\delta$ISS) deep Long Short Term Memory networks (LSTMs) for the identification of nonlinear dynamical systems. We show that suitable sufficient conditions on the weights of the network can be leveraged to setup a training procedure able to learn provenly-$\delta$ISS LSTM models from data. The proposed approach is tested on a real brake-by-wire apparatus to identify a model of the system from input-output experimentally collected data. Results show satisfactory modeling performance.

Distributed Model Predictive Control for Periodic Cooperation of Multi-Agent Systems

  • Authors: Matthias Köhler, Matthias A. Müller, Frank Allgöwer
  • Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.03002
  • Pdf link: https://arxiv.org/pdf/2304.03002
  • Abstract
    We consider multi-agent systems with heterogeneous, nonlinear agents, subject to individual constraints, that want to achieve a periodic, dynamic cooperative control goal which can be characterised by a set and a suitable cost. We propose a sequential distributed model predictive control (MPC) scheme in which agents sequentially solve an individual optimisation problem to track an artificial periodic output trajectory. The optimisation problems are coupled through these artificial periodic output trajectories, which are communicated and penalised using the cost that characterises the cooperative goal. The agents communicate only their artificial trajectories and only once per time step. We show that under suitable assumptions, the agents can incrementally move their artificial output trajectories towards the cooperative goal, and, hence, their closed-loop output trajectories asymptotically achieve it. We illustrate the scheme with a simulation example.

IoT Federated Blockchain Learning at the Edge

  • Authors: James Calo, Benny Lo
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.03006
  • Pdf link: https://arxiv.org/pdf/2304.03006
  • Abstract
    IoT devices are sorely underutilized in the medical field, especially within machine learning for medicine, yet they offer unrivaled benefits. IoT devices are low-cost, energy-efficient, small and intelligent devices. In this paper, we propose a distributed federated learning framework for IoT devices, more specifically for IoMT (Internet of Medical Things), using blockchain to allow for a decentralized scheme improving privacy and efficiency over a centralized system; this allows us to move from the cloud-based architectures, that are prevalent, to the edge. The system is designed for three paradigms: 1) Training neural networks on IoT devices to allow for collaborative training of a shared model whilst decoupling the learning from the dataset to ensure privacy. Training is performed in an online manner simultaneously amongst all participants, allowing for training on actual data that may not have been present in a dataset collected in the traditional way, and for dynamically adapting the system whilst it is being trained. 2) Training of an IoMT system in a fully private manner so as to mitigate the confidentiality issues of medical data and to build robust, and potentially bespoke, models where not much, if any, data exists. 3) Distribution of the actual network training, something federated learning itself does not do, to allow hospitals, for example, to utilize their spare computing resources to train network models.

Data-driven HVAC Control Using Symbolic Regression: Design and Implementation

  • Authors: Yuki Ozawa, Dafang Zhao, Daichi Watari, Ittetsu Taniguchi, Toshihiro Suzuki, Yoshiyuki Shimoda, Takao Onoye
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.03078
  • Pdf link: https://arxiv.org/pdf/2304.03078
  • Abstract
    The large amount of data collected in buildings makes energy management smarter and more energy efficient. This study proposes a design and implementation methodology for data-driven heating, ventilation, and air conditioning (HVAC) control. Building thermodynamics is modeled using a symbolic regression model (SRM) built from the collected data. Additionally, an HVAC system model is also developed with a data-driven approach. A model predictive control (MPC) based HVAC scheduling is formulated with the developed models to minimize energy consumption and peak power demand and maximize thermal comfort. The performance of the proposed framework is demonstrated in a workspace of an actual campus building. The HVAC system using the proposed framework reduces the peak power by 16.1% compared to the widely used thermostat controller.
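
A hedged sketch of the symbolic-regression modeling step, assuming synthetic data, a one-step-ahead temperature target, and gplearn as the SR engine (the paper's feature set, lags, and hyperparameters are not reproduced here):

```python
import numpy as np
from gplearn.genetic import SymbolicRegressor

rng = np.random.default_rng(0)
n = 500
T_in = 22 + np.cumsum(rng.normal(0, 0.05, n))   # fake indoor temperature
T_out = 10 + 5 * np.sin(np.linspace(0, 8, n))   # fake outdoor temperature
u = rng.uniform(0, 1, n)                        # fake HVAC input

# predict the next indoor temperature from the current state and inputs
X = np.column_stack([T_in[:-1], T_out[:-1], u[:-1]])
y = T_in[1:]

srm = SymbolicRegressor(population_size=500, generations=10,
                        function_set=("add", "sub", "mul"), random_state=0)
srm.fit(X, y)
print(srm._program)   # human-readable surrogate of the thermodynamics
```

An MPC layer would then optimize the HVAC input over a horizon against this learned expression.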

Inductive Graph Unlearning

  • Authors: Cheng-Long Wang, Mengdi Huai, Di Wang
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.03093
  • Pdf link: https://arxiv.org/pdf/2304.03093
  • Abstract
    As a way to implement the "right to be forgotten" in machine learning, \textit{machine unlearning} aims to completely remove the contributions and information of the samples to be deleted from a trained model without affecting the contributions of other samples. Recently, many frameworks for machine unlearning have been proposed, and most of them focus on image and text data. To extend machine unlearning to graph data, \textit{GraphEraser} has been proposed. However, a critical issue is that \textit{GraphEraser} is specifically designed for the transductive graph setting, where the graph is static and the attributes and edges of test nodes are visible during training. It is unsuitable for the inductive setting, where the graph can be dynamic and the test graph information is invisible in advance. Such inductive capability is essential for production machine learning systems with evolving graphs, such as social media and transaction networks. To fill this gap, we propose the \underline{{\bf G}}\underline{{\bf U}}ided \underline{{\bf I}}n\underline{{\bf D}}uctiv\underline{{\bf E}} Graph Unlearning framework (GUIDE). GUIDE consists of three components: guided graph partitioning with fairness and balance, efficient subgraph repair, and similarity-based aggregation. Empirically, we evaluate our method on several inductive benchmarks and evolving transaction graphs. GUIDE can be efficiently applied to inductive graph learning tasks owing to its low graph partitioning cost, in terms of both computation and structural information. The code will be available here: https://github.com/Happy2Git/GUIDE.

Constrained Exploration in Reinforcement Learning with Optimality Preservation

  • Authors: Peter C. Y. Chen
  • Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.03104
  • Pdf link: https://arxiv.org/pdf/2304.03104
  • Abstract
    We consider a class of reinforcement-learning systems in which the agent follows a behavior policy to explore a discrete state-action space to find an optimal policy while adhering to some restriction on its behavior. Such restriction may prevent the agent from visiting some state-action pairs, possibly leading to the agent finding only a sub-optimal policy. To address this problem we introduce the concept of constrained exploration with optimality preservation, whereby the exploration behavior of the agent is constrained to meet a specification while the optimality of the (original) unconstrained learning process is preserved. We first establish a feedback-control structure that models the dynamics of the unconstrained learning process. We then extend this structure by adding a supervisor to ensure that the behavior of the agent meets the specification, and establish (for a class of reinforcement-learning problems with a known deterministic environment) a necessary and sufficient condition under which optimality is preserved. This work demonstrates the utility and the prospect of studying reinforcement-learning problems in the context of the theories of discrete-event systems, automata and formal languages.
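
To make the supervisor idea concrete, here is a minimal tabular sketch in which exploration is restricted to a hypothetical specification encoded as an `allowed` mask over state-action pairs; the toy environment, Q-learning update, and masking rule are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Sketch: a supervisor masks state-action pairs so that every selected
# action (exploratory or greedy) satisfies the behavior specification.
rng = np.random.default_rng(1)
n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))
allowed = np.ones((n_states, n_actions), dtype=bool)  # hypothetical spec
allowed[2, 0] = False                                 # forbid one pair

def supervised_action(s, eps=0.2):
    if rng.random() < eps:
        return int(rng.choice(np.flatnonzero(allowed[s])))  # constrained explore
    return int(np.argmax(np.where(allowed[s], Q[s], -np.inf)))

def step(s, a):  # toy deterministic environment with a goal state
    return (s + a + 1) % n_states, float(s == n_states - 1)

s = 0
for _ in range(1000):
    a = supervised_action(s)
    s_next, r = step(s, a)
    Q[s, a] += 0.1 * (r + 0.9 * Q[s_next].max() - Q[s, a])
    s = s_next
print(Q.round(2))
```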

A self-organizing robotic aggregate using solid and liquid-like collective states

  • Authors: Baudouin Saintyves, Matthew Spenko, Heinrich M. Jaeger
  • Subjects: Robotics (cs.RO); Soft Condensed Matter (cond-mat.soft); Adaptation and Self-Organizing Systems (nlin.AO)
  • Arxiv link: https://arxiv.org/abs/2304.03125
  • Pdf link: https://arxiv.org/pdf/2304.03125
  • Abstract
    Designing robotic systems that can change their physical form factor as well as their compliance to adapt to environmental constraints remains a major conceptual and technical challenge. To address this, we introduce the Granulobot, a modular system that blurs the distinction between soft, modular, and swarm robotics. The system consists of gear-like units that each contain a single actuator such that units can self-assemble into larger, granular aggregates using magnetic coupling. These aggregates can reconfigure dynamically and also split up into subsystems that might later recombine. Aggregates can self-organize into collective states with solid- and liquid-like properties, thus displaying widely differing compliances. These states can be perturbed locally via actuators or externally via mechanical feedback from the environment to produce adaptive shape shifting in a decentralized manner. This in turn can generate locomotion strategies adapted to different conditions. Aggregates can move over obstacles without using external sensors or coordinate to maintain a steady gait over different surfaces without electronic communication among units. The modular design highlights a physical, morphological form of control that advances the development of resilient robotic systems with the ability to morph and adapt to different functions and conditions.

From Saliency to DINO: Saliency-guided Vision Transformer for Few-shot Keypoint Detection

  • Authors: Changsheng Lu, Hao Zhu, Piotr Koniusz
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03140
  • Pdf link: https://arxiv.org/pdf/2304.03140
  • Abstract
    Unlike current deep keypoint detectors that are trained to recognize a limited number of body parts, few-shot keypoint detection (FSKD) attempts to localize any keypoints, including novel or base keypoints, depending on the reference samples. FSKD requires semantically meaningful relations for keypoint similarity learning to overcome the ubiquitous noise and ambiguous local patterns. One rescue comes with the vision transformer (ViT), as it captures long-range relations well. However, ViT may model irrelevant features outside of the region of interest due to the global attention matrix, thus degrading similarity learning between support and query features. In this paper, we present a novel saliency-guided vision transformer, dubbed SalViT, for few-shot keypoint detection. Our SalViT enjoys a uniquely designed masked self-attention and a morphology learner, where the former introduces a saliency map as a soft mask to constrain the self-attention to foregrounds, while the latter leverages the so-called power normalization to adjust the morphology of the saliency map, realizing a ``dynamically changing receptive field''. Moreover, as saliency detectors add computation, we show that the attentive masks of the DINO transformer can replace saliency. On top of SalViT, we also investigate i) transductive FSKD that enhances keypoint representations with unlabelled data and ii) FSKD under occlusions. We show that our model performs well on five public datasets and achieves ~10% higher PCK than the normally trained model under severe occlusions.
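
The soft-masking mechanism can be pictured in a few lines of code. The sketch below applies a per-token saliency score as an additive log-mask on single-head attention logits; the paper's morphology learner, power normalization, and exact mask formulation are not reproduced, so treat this as a simplified assumption.

```python
import torch

def saliency_masked_attention(q, k, v, saliency):
    # saliency in (0, 1] per token down-weights attention toward
    # background tokens while keeping the operation fully differentiable.
    d = q.shape[-1]
    logits = q @ k.transpose(-2, -1) / d ** 0.5
    logits = logits + torch.log(saliency.clamp_min(1e-6))  # soft foreground mask
    return torch.softmax(logits, dim=-1) @ v

n_tokens, dim = 8, 16
q = k = v = torch.randn(n_tokens, dim)
saliency = torch.rand(n_tokens)  # stand-in for a saliency detector's output
print(saliency_masked_attention(q, k, v, saliency).shape)  # torch.Size([8, 16])
```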

Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream

  • Authors: Yuheng Jiang, Kaixin Yao, Zhuo Su, Zhehao Shen, Haimin Luo, Lan Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03184
  • Pdf link: https://arxiv.org/pdf/2304.03184
  • Abstract
    Convenient 4D modeling of human-object interactions is essential for numerous applications. However, monocular tracking and rendering of complex interaction scenarios remain challenging. In this paper, we propose Instant-NVR, a neural approach for instant volumetric human-object tracking and rendering using a single RGBD camera. It bridges traditional non-rigid tracking with recent instant radiance field techniques via a multi-thread tracking-rendering mechanism. In the tracking front-end, we adopt a robust human-object capture scheme to provide sufficient motion priors. We further introduce a separated instant neural representation with a novel hybrid deformation module for the interacting scene. We also provide an on-the-fly reconstruction scheme of the dynamic/static radiance fields via efficient motion-prior searching. Moreover, we introduce an online key frame selection scheme and a rendering-aware refinement strategy to significantly improve the appearance details for online novel-view synthesis. Extensive experiments demonstrate the effectiveness and efficiency of our approach for the instant generation of human-object radiance fields on the fly, notably achieving real-time photo-realistic novel view synthesis under complex human-object interactions.

LANe: Lighting-Aware Neural Fields for Compositional Scene Synthesis

  • Authors: Akshay Krishnan, Amit Raj, Xianling Zhang, Alexandra Carlson, Nathan Tseng, Sandhya Sridhar, Nikita Jaipuria, James Hays
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03280
  • Pdf link: https://arxiv.org/pdf/2304.03280
  • Abstract
    Neural fields have recently enjoyed great success in representing and rendering 3D scenes. However, most state-of-the-art implicit representations model static or dynamic scenes as a whole, with minor variations. Existing work on learning disentangled world and object neural fields does not consider the problem of composing objects into different world neural fields in a lighting-aware manner. We present Lighting-Aware Neural Field (LANe) for the compositional synthesis of driving scenes in a physically consistent manner. Specifically, we learn a scene representation that disentangles the static background and transient elements into a world-NeRF and class-specific object-NeRFs to allow compositional synthesis of multiple objects in the scene. Furthermore, we explicitly design both the world and object models to handle lighting variation, which allows us to compose objects into scenes with spatially varying lighting. This is achieved by constructing a light field of the scene and using it in conjunction with a learned shader to modulate the appearance of the object NeRFs. We demonstrate the performance of our model on a synthetic dataset of diverse lighting conditions rendered with the CARLA simulator, as well as a novel real-world dataset of cars collected at different times of the day. Our approach outperforms state-of-the-art compositional scene synthesis on the challenging dataset setup, via composing object-NeRFs learned from one scene into an entirely different scene whilst still respecting the lighting variations in the novel scene. For more results, please visit our project website https://lane-composition.github.io/.

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention

  • Authors: Mingyu Ding, Yikang Shen, Lijie Fan, Zhenfang Chen, Zitian Chen, Ping Luo, Joshua B. Tenenbaum, Chuang Gan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03282
  • Pdf link: https://arxiv.org/pdf/2304.03282
  • Abstract
    Humans possess a versatile mechanism for extracting structured representations of our visual world. When looking at an image, we can decompose the scene into entities and their parts as well as obtain the dependencies between them. To mimic such capability, we propose Visual Dependency Transformers (DependencyViT) that can induce visual dependencies without any labels. We achieve that with a novel neural operator called \emph{reversed attention} that can naturally capture long-range visual dependencies between image patches. Specifically, we formulate it as a dependency graph where a child token in reversed attention is trained to attend to its parent tokens and send information following a normalized probability distribution, rather than gathering information as in conventional self-attention. With such a design, hierarchies naturally emerge from reversed attention layers, and a dependency tree is progressively induced from leaf nodes to the root node in an unsupervised manner. DependencyViT offers several appealing benefits. (i) Entities and their parts in an image are represented by different subtrees, enabling part partitioning from dependencies; (ii) Dynamic visual pooling is made possible: the leaf nodes which rarely send messages can be pruned without hindering the model performance, based on which we propose the lightweight DependencyViT-Lite to reduce the computational and memory footprints; (iii) DependencyViT works well on both self- and weakly-supervised pretraining paradigms on ImageNet, and demonstrates its effectiveness on 8 datasets and 5 tasks, such as unsupervised part and saliency segmentation, recognition, and detection.
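
A rough way to see how "sending" differs from "gathering" is the transpose trick below: each child token computes a normalized distribution over candidate parents, and messages are then accumulated at the parents. This is only a minimal sketch of the reversed-attention idea; the paper's head structure, normalization, and pruning mechanics are omitted.

```python
import torch

def reversed_attention(x, w_q, w_k):
    # Each child picks parents via a softmax distribution, then *sends*
    # its features to them (note the transpose), instead of gathering
    # information as in standard self-attention.
    q, k = x @ w_q, x @ w_k
    parent_probs = torch.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)
    return parent_probs.T @ x  # parents accumulate messages from children

n_tokens, dim = 6, 8
x = torch.randn(n_tokens, dim)
w_q, w_k = torch.randn(dim, dim), torch.randn(dim, dim)
print(reversed_attention(x, w_q, w_k).shape)  # torch.Size([6, 8])
```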

New submissions for Fri, 14 Apr 23

Keyword: efficient

RELS-DQN: A Robust and Efficient Local Search Framework for Combinatorial Optimization

  • Authors: Yuanhang Shao, Tonmoy Dey, Nikola Vuckovic, Luke Van Popering, Alan Kuhnle
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06048
  • Pdf link: https://arxiv.org/pdf/2304.06048
  • Abstract
    Combinatorial optimization (CO) aims to efficiently find the best solution to NP-hard problems ranging from statistical physics to social media marketing. A wide range of CO applications can benefit from local search methods because they allow reversible action over greedy policies. Deep Q-learning (DQN) using message-passing neural networks (MPNN) has shown promise in replicating local search behavior and obtaining results comparable to those of local search algorithms. However, over-smoothing and information loss during the iterations of message passing limit its robustness across applications, and the large message vectors result in memory inefficiency. Our paper introduces RELS-DQN, a lightweight DQN framework that exhibits local search behavior while providing practical scalability. Trained on a single application, RELS-DQN generalizes to various applications, providing solution values higher than or equal to those of both the local search algorithms and the existing DQN models, while remaining efficient in runtime and memory.

Exploiting Symmetry and Heuristic Demonstrations in Off-policy Reinforcement Learning for Robotic Manipulation

  • Authors: Amir M. Soufi Enayati, Zengjie Zhang, Kashish Gupta, Homayoun Najjaran
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06055
  • Pdf link: https://arxiv.org/pdf/2304.06055
  • Abstract
    Reinforcement learning demonstrates significant potential in automatically building control policies in numerous domains, but shows low efficiency when applied to robot manipulation tasks due to the curse of dimensionality. To facilitate the learning of such tasks, prior knowledge or heuristics that incorporate inherent simplification can effectively improve the learning performance. This paper aims to define and incorporate the natural symmetry present in physical robotic environments. Then, sample-efficient policies are trained by exploiting expert demonstrations in symmetrical environments through an amalgamation of reinforcement learning and behavior cloning, which gives the off-policy learning process a diverse yet compact initiation. Furthermore, it presents a rigorous framework for a recent concept and explores its scope for robot manipulation tasks. The proposed method is validated via two point-to-point reaching tasks of an industrial arm, with and without an obstacle, in a simulation experiment study. A PID controller, which tracks the linear joint-space trajectories with hard-coded temporal logic to produce interim midpoints, is used to generate demonstrations in the study. The results of the study present the effect of the number of demonstrations and quantify the magnitude of behavior cloning to exemplify the possible improvement of model-free reinforcement learning in common manipulation tasks. A comparison study between the proposed method and a traditional off-policy reinforcement learning algorithm indicates its advantage in learning performance and potential value for applications.

Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays

  • Authors: Chen Xie, Francesco Daghero, Yukai Chen, Marco Castellano, Luca Gandolfi, Andrea Calimera, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06059
  • Pdf link: https://arxiv.org/pdf/2304.06059
  • Abstract
    Ultra-low-resolution Infrared (IR) array sensors offer a low-cost, energy-efficient, and privacy-preserving solution for people counting, with applications such as occupancy monitoring. Previous work has shown that Deep Learning (DL) can yield superior performance on this task. However, the literature was missing an extensive comparative analysis of various efficient DL architectures for IR array-based people counting, that considers not only their accuracy, but also the cost of deploying them on memory- and energy-constrained Internet of Things (IoT) edge nodes. In this work, we address this need by comparing 6 different DL architectures on a novel dataset composed of IR images collected from a commercial 8x8 array, which we made openly available. With a wide architectural exploration of each model type, we obtain a rich set of Pareto-optimal solutions, spanning cross-validated balanced accuracy scores in the 55.70-82.70% range. When deployed on a commercial Microcontroller (MCU) by STMicroelectronics, the STM32L4A6ZG, these models occupy 0.41-9.28kB of memory, and require 1.10-7.74ms per inference, while consuming 17.18-120.43 $\mu$J of energy. Our models are significantly more accurate than a previous deterministic method (up to +39.9%), while being up to 3.53x faster and more energy efficient. Further, our models' accuracy is comparable to state-of-the-art DL solutions on similar resolution sensors, despite a much lower complexity. All our models enable continuous, real-time inference on a MCU-based IoT node, with years of autonomous operation without battery recharging.

Energy-guided Entropic Neural Optimal Transport

  • Authors: Petr Mokrov, Alexander Korotin, Evgeny Burnaev
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.06094
  • Pdf link: https://arxiv.org/pdf/2304.06094
  • Abstract
    Energy-Based Models (EBMs) have been known in the Machine Learning community for decades. Since the seminal works on EBMs dating back to the noughties, many efficient methods have appeared that solve the generative modelling problem by means of energy potentials (unnormalized likelihood functions). In contrast, the realm of Optimal Transport (OT) and, in particular, neural OT solvers is much less explored and limited to a few recent works (excluding WGAN-based approaches, which utilize OT as a loss function and do not model OT maps themselves). In our work, we bridge the gap between EBMs and entropy-regularized OT. We present a novel methodology that utilizes the recent developments and technical improvements of the former in order to enrich the latter. We validate the applicability of our method on toy 2D scenarios as well as standard unpaired image-to-image translation problems. For the sake of simplicity, we choose simple short- and long-run EBMs as the backbone of our energy-guided entropic OT method, leaving the application of more sophisticated EBMs for future research.
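
For readers unfamiliar with entropy-regularized OT, the discrete baseline that the paper's neural solver generalizes is the Sinkhorn algorithm, sketched below for two uniform measures; this is the classical method, not the paper's energy-guided one.

```python
import numpy as np

def sinkhorn(a, b, C, eps=1.0, n_iter=200):
    # Entropic OT between discrete measures a, b with cost matrix C:
    # alternate scaling of the Gibbs kernel until the marginals match.
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # transport plan

n = 5
a = b = np.full(n, 1 / n)
C = (np.arange(n)[:, None] - np.arange(n)[None, :]) ** 2.0
P = sinkhorn(a, b, C)
print(P.sum().round(6), (P.sum(axis=1) - a).max())  # total mass 1, marginals match
```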

IoT trust and reputation: a survey and taxonomy

  • Authors: Muhammad Aaqib, Aftab Ali, Liming Chen, Omar Nibouche
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06119
  • Pdf link: https://arxiv.org/pdf/2304.06119
  • Abstract
    IoT is one of the fastest-growing technologies, and it is estimated that more than a billion devices will be utilized across the globe by the end of 2030. To maximize the capability of these connected entities, trust and reputation among IoT entities is essential. Several trust management models have been proposed in the IoT environment; however, these schemes have not fully addressed IoT device features, such as device role, device type, and dynamic behavior in a smart environment. As a result, traditional trust and reputation models are insufficient to tackle these characteristics and uncertainty risks while connecting nodes to the network. Whilst continuous study has been carried out and various articles suggest promising solutions in constrained environments, research on trust and reputation is still in its infancy. In this paper, we carry out a comprehensive literature review of state-of-the-art research on the trust and reputation of IoT devices and systems. Specifically, we first propose a new structure, namely a new taxonomy, to organize trust and reputation models based on the ways trust is managed. The proposed taxonomy comprises traditional trust-management-based systems and artificial-intelligence-based systems, and combines both classes, encouraging existing schemes to adopt these emerging concepts. This collaboration between conventional mathematical models and advanced ML models results in design schemes that are more robust and efficient. Then we drill down to compare and analyse the methods and applications of these systems based on community-accepted performance metrics, e.g. scalability, delay, cooperativeness, and efficiency. Finally, built upon the findings of the analysis, we identify and discuss open research issues and challenges, and further speculate on and point out future research directions.

Label-Free Concept Bottleneck Models

  • Authors: Tuomas Oikarinen, Subhro Das, Lam M. Nguyen, Tsui-Wei Weng
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06129
  • Pdf link: https://arxiv.org/pdf/2304.06129
  • Abstract
    Concept bottleneck models (CBM) are a popular way of creating more interpretable neural networks by having hidden layer neurons correspond to human-understandable concepts. However, existing CBMs and their variants have two crucial limitations: first, they need to collect labeled data for each of the predefined concepts, which is time consuming and labor intensive; second, the accuracy of a CBM is often significantly lower than that of a standard neural network, especially on more complex datasets. This poor performance creates a barrier for adopting CBMs in practical real world applications. Motivated by these challenges, we propose Label-free CBM which is a novel framework to transform any neural network into an interpretable CBM without labeled concept data, while retaining a high accuracy. Our Label-free CBM has many advantages, it is: scalable - we present the first CBM scaled to ImageNet, efficient - creating a CBM takes only a few hours even for very large datasets, and automated - training it for a new dataset requires minimal human effort. Our code is available at https://github.com/Trustworthy-ML-Lab/Label-free-CBM.

AGI for Agriculture

  • Authors: Guoyu Lu, Sheng Li, Gengchen Mai, Jin Sun, Dajiang Zhu, Lilong Chai, Haijian Sun, Xianqiao Wang, Haixing Dai, Ninghao Liu, Rui Xu, Daniel Petti, Changying Li, Tianming Liu, Changying Li
  • Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06136
  • Pdf link: https://arxiv.org/pdf/2304.06136
  • Abstract
    Artificial General Intelligence (AGI) is poised to revolutionize a variety of sectors, including healthcare, finance, transportation, and education. Within healthcare, AGI is being utilized to analyze clinical medical notes, recognize patterns in patient data, and aid in patient management. Agriculture is another critical sector that impacts the lives of individuals worldwide. It serves as a foundation for providing food, fiber, and fuel, yet faces several challenges, such as climate change, soil degradation, water scarcity, and food security. AGI has the potential to tackle these issues by enhancing crop yields, reducing waste, and promoting sustainable farming practices. It can also help farmers make informed decisions by leveraging real-time data, leading to more efficient and effective farm management. This paper delves into the potential future applications of AGI in agriculture, such as agriculture image processing, natural language processing (NLP), robotics, knowledge graphs, and infrastructure, and their impact on precision livestock and precision crops. By leveraging the power of AGI, these emerging technologies can provide farmers with actionable insights, allowing for optimized decision-making and increased productivity. The transformative potential of AGI in agriculture is vast, and this paper aims to highlight its potential to revolutionize the industry.

Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

  • Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.06178
  • Pdf link: https://arxiv.org/pdf/2304.06178
  • Abstract
    Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of the voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data, while maintaining computational efficiency, being substantially faster than the baseline method NeuralRGBD.

SePEnTra: A secure and privacy-preserving energy trading mechanisms in transactive energy market

  • Authors: Rumpa Dasgupta, Amin Sakzad, Carsten Rudolph, Rafael Dowsley
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.06179
  • Pdf link: https://arxiv.org/pdf/2304.06179
  • Abstract
    In this paper, we design and present a novel model called SePEnTra to ensure the security and privacy of energy data while it is shared with other entities during energy trading to determine optimal price signals. Furthermore, the market operator can use this data to detect malicious activities of users at a later stage without violating privacy (e.g., deviation of actual energy generation/consumption from the forecast beyond a threshold). We use two cryptographic primitives, additive secret sharing and the Pedersen commitment, in SePEnTra. The performance of our model is evaluated theoretically and numerically. We compare the performance of SePEnTra with the same transactive energy market (TEM) framework without security mechanisms. The results show that, even though it uses advanced cryptographic primitives in a large market framework, SePEnTra has very low computational complexity and communication overhead. Moreover, it is storage efficient for all parties.
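
As a rough illustration of the additive secret sharing primitive named above, the sketch below lets three parties compute the sum of private meter readings without revealing any individual value. The modulus, the three-party setup, and the readings are illustrative assumptions; the Pedersen commitments and the market protocol itself are not modeled.

```python
import secrets

P = 2**61 - 1  # public prime modulus (illustrative choice)

def share(value, n):
    # Split an integer into n additive shares that sum to it mod P.
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

readings = [120, 75, 240]  # hypothetical per-user energy values
all_shares = [share(r, 3) for r in readings]
# Party i sums the i-th share of every user; only these partial sums are combined.
partial_sums = [sum(s[i] for s in all_shares) % P for i in range(3)]
print(reconstruct(partial_sums))  # 435, with no individual reading exposed
```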

SURFSUP: Learning Fluid Simulation for Novel Surfaces

  • Authors: Arjun Mani, Ishaan Preetam Chandratreya, Elliot Creager, Carl Vondrick, Richard Zemel
  • Subjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)
  • Arxiv link: https://arxiv.org/abs/2304.06197
  • Pdf link: https://arxiv.org/pdf/2304.06197
  • Abstract
    Modeling the mechanics of fluid in complex scenes is vital to applications in design, graphics, and robotics. Learning-based methods provide fast and differentiable fluid simulators; however, most prior work is unable to accurately model how fluids interact with genuinely novel surfaces not seen during training. We introduce SURFSUP, a framework that represents objects implicitly using signed distance functions (SDFs), rather than an explicit representation of meshes or particles. This continuous representation of geometry enables more accurate simulation of fluid-object interactions over long time periods while simultaneously making computation more efficient. Moreover, SURFSUP trained on simple shape primitives generalizes considerably out-of-distribution, even to complex real-world scenes and objects. Finally, we show we can invert our model to design simple objects to manipulate fluid flow.

Space-Time Tradeoffs for Conjunctive Queries with Access Patterns

  • Authors: Hangdong Zhao, Shaleen Deep, Paraschos Koutris
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.06221
  • Pdf link: https://arxiv.org/pdf/2304.06221
  • Abstract
    In this paper, we investigate space-time tradeoffs for answering conjunctive queries with access patterns (CQAPs). The goal is to create a space-efficient data structure in an initial preprocessing phase and use it for answering (multiple) queries in an online phase. Previous work has developed data structures that trade off space usage for answering time for queries of practical interest, such as the path and triangle queries. However, these approaches lack a comprehensive framework and are not generalizable. Our main contribution is a general algorithmic framework for obtaining space-time tradeoffs for any CQAP. Our framework builds upon the PANDA algorithm and tree decomposition techniques. We demonstrate that our framework captures all state-of-the-art tradeoffs that were independently produced for various queries. Further, we show surprising improvements over the state-of-the-art tradeoffs known in the existing literature for reachability queries.

Improving Segmentation of Objects with Varying Sizes in Biomedical Images using Instance-wise and Center-of-Instance Segmentation Loss Function

  • Authors: Muhammad Febrian Rachmadi, Charissa Poon, Henrik Skibbe
  • Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06229
  • Pdf link: https://arxiv.org/pdf/2304.06229
  • Abstract
    In this paper, we propose a novel two-component loss for biomedical image segmentation tasks called the Instance-wise and Center-of-Instance (ICI) loss, a loss function that addresses the instance imbalance problem commonly encountered when using pixel-wise loss functions such as the Dice loss. The Instance-wise component improves the detection of small instances, or "blobs", in image datasets with both large and small instances. The Center-of-Instance component improves the overall detection accuracy. We compared the ICI loss with two existing losses, the Dice loss and the blob loss, in the task of stroke lesion segmentation using the ATLAS R2.0 challenge dataset from MICCAI 2022. Compared to the other losses, the ICI loss provided a better-balanced segmentation, and significantly outperformed the Dice loss by 1.7-3.7% and the blob loss by 0.6-5.0% in terms of the Dice similarity coefficient on both the validation and test sets, suggesting that the ICI loss is a potential solution to the instance imbalance problem.
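
The instance imbalance problem the loss targets is easy to reproduce: with a plain Dice score, one large lesion dominates and a missed small lesion barely registers. The sketch below contrasts global Dice with a per-instance average; it only illustrates the motivation, and the paper's actual ICI loss (including the Center-of-Instance term) is not reproduced.

```python
import numpy as np

def dice(pred, target, eps=1e-6):
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def instance_wise_dice(pred, target, labels):
    # Average Dice per labeled instance region, so a missed small blob
    # costs as much as a missed large one.
    scores = [dice(pred[labels == i], target[labels == i])
              for i in np.unique(labels) if i > 0]
    return float(np.mean(scores))

target = np.zeros((8, 8)); target[0, 0] = 1; target[4:8, 4:8] = 1
labels = np.zeros((8, 8), int); labels[0, 0] = 1; labels[4:8, 4:8] = 2
pred = target.copy(); pred[0, 0] = 0             # miss only the tiny instance
print(round(dice(pred, target), 3))              # ~0.97: the big blob dominates
print(instance_wise_dice(pred, target, labels))  # ~0.5: the miss is penalized
```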

Physics-informed radial basis network (PIRBN): A local approximation neural network for solving nonlinear PDEs

  • Authors: Jinshuai Bai, Gui-Rong Liu, Ashish Gupta, Laith Alzubaidi, Xi-Qiao Feng, YuanTong Gu
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06234
  • Pdf link: https://arxiv.org/pdf/2304.06234
  • Abstract
    Our recent intensive study has found that physics-informed neural networks (PINNs) tend to be local approximators after training. This observation leads to the novel physics-informed radial basis network (PIRBN), which can maintain the local property throughout the entire training process. Compared to deep neural networks, a PIRBN comprises only one hidden layer and a radial basis "activation" function. Under appropriate conditions, we demonstrate that the training of PIRBNs using gradient descent methods can converge to Gaussian processes. Besides, we study the training dynamics of PIRBN via neural tangent kernel (NTK) theory. In addition, comprehensive investigations regarding the initialisation strategies of PIRBN were conducted. Based on numerical examples, PIRBN has been demonstrated to be more effective and efficient than PINN in solving PDEs with high-frequency features and ill-posed computational domains. Moreover, existing PINN numerical techniques, such as adaptive learning, decomposition, and different types of loss functions, are applicable to PIRBN. The programs that can regenerate all numerical results can be found at https://github.com/JinshuaiBai/PIRBN.
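
The network itself is small enough to write out. Below is a minimal one-hidden-layer radial basis network with Gaussian activations, fit here by plain least squares on a high-frequency target as a stand-in for the physics-informed loss; the centers, widths, and training objective are illustrative assumptions.

```python
import numpy as np

def rbf_forward(x, centers, widths, weights):
    # phi_j(x) = exp(-b_j^2 (x - c_j)^2): a local basis, in contrast to
    # the globally supported activations of deep PINNs.
    phi = np.exp(-(widths ** 2) * (x[:, None] - centers[None, :]) ** 2)
    return phi @ weights

x = np.linspace(0, 1, 200)
y = np.sin(20 * np.pi * x)           # high-frequency target
centers = np.linspace(0, 1, 50)
widths = np.full(50, 20.0)
phi = np.exp(-(widths ** 2) * (x[:, None] - centers[None, :]) ** 2)
weights, *_ = np.linalg.lstsq(phi, y, rcond=None)
print(np.abs(rbf_forward(x, centers, widths, weights) - y).max())
```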

Cross-View Hierarchy Network for Stereo Image Super-Resolution

  • Authors: Wenbin Zou, Hongxia Gao, Liang Chen, Yunchen Zhang, Mingchao Jiang, Zhongxin Yu, Ming Tan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06236
  • Pdf link: https://arxiv.org/pdf/2304.06236
  • Abstract
    Stereo image super-resolution aims to improve the quality of high-resolution stereo image pairs by exploiting complementary information across views. To attain superior performance, many methods have prioritized designing complex modules to fuse similar information across views, yet overlooked the importance of intra-view information for high-resolution reconstruction. This also leads to incorrect textures in recovered images. To address this issue, we explore the interdependencies between various hierarchies within the intra-view and propose a novel method, named Cross-View-Hierarchy Network for Stereo Image Super-Resolution (CVHSSR). Specifically, we design a cross-hierarchy information mining block (CHIMB) that leverages channel attention and large-kernel convolution attention to extract both global and local features from the intra-view, enabling the efficient restoration of accurate texture details. Additionally, a cross-view interaction module (CVIM) is proposed to fuse similar features from different views by utilizing cross-view attention mechanisms, effectively adapting to the binocular scene. Extensive experiments demonstrate the effectiveness of our method. CVHSSR achieves better stereo image super-resolution performance than other state-of-the-art methods while using fewer parameters. The source code and pre-trained models are available at https://github.com/AlexZou14/CVHSSR.

EWT: Efficient Wavelet-Transformer for Single Image Denoising

  • Authors: Juncheng Li, Bodong Cheng, Ying Chen, Guangwei Gao, Tieyong Zeng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06274
  • Pdf link: https://arxiv.org/pdf/2304.06274
  • Abstract
    Transformer-based image denoising methods have achieved encouraging results in the past year. However, the Transformer must use linear operations to model long-range dependencies, which greatly increases model inference time and consumes GPU storage space. Compared with convolutional neural network-based methods, current Transformer-based image denoising methods cannot achieve a balance between performance improvement and resource consumption. In this paper, we propose an Efficient Wavelet Transformer (EWT) for image denoising. Specifically, we use the Discrete Wavelet Transform (DWT) and Inverse Wavelet Transform (IWT) for downsampling and upsampling, respectively. This method can fully preserve the image features while reducing the image resolution, thereby greatly reducing the device resource consumption of the Transformer model. Furthermore, we propose a novel Dual-stream Feature Extraction Block (DFEB) to extract image features at different levels, which can further reduce model inference time and GPU memory usage. Experiments show that our method speeds up the original Transformer by more than 80%, reduces GPU memory usage by more than 60%, and achieves excellent denoising results. All code will be public.
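
The lossless down/upsampling idea is easy to demonstrate with one level of an (unnormalized) 2D Haar transform: resolution halves, but the four sub-bands keep all information, so the inverse reconstructs the input exactly. This sketch only shows the DWT/IWT pair, not the EWT architecture built around it.

```python
import numpy as np

def haar_dwt2(img):
    # One level of a 2D Haar DWT: four half-resolution sub-bands.
    a = (img[0::2, :] + img[1::2, :]) / 2   # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2   # row details
    ll, hl = (a[:, 0::2] + a[:, 1::2]) / 2, (a[:, 0::2] - a[:, 1::2]) / 2
    lh, hh = (d[:, 0::2] + d[:, 1::2]) / 2, (d[:, 0::2] - d[:, 1::2]) / 2
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    # Exact inverse: undo the column transform, then the row transform.
    h, w = ll.shape
    a = np.empty((h, 2 * w)); d = np.empty((h, 2 * w))
    a[:, 0::2], a[:, 1::2] = ll + hl, ll - hl
    d[:, 0::2], d[:, 1::2] = lh + hh, lh - hh
    out = np.empty((2 * h, 2 * w))
    out[0::2, :], out[1::2, :] = a + d, a - d
    return out

img = np.random.rand(8, 8)
print(np.allclose(haar_idwt2(*haar_dwt2(img)), img))  # True: lossless
```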

Optimizing Multi-Domain Performance with Active Learning-based Improvement Strategies

  • Authors: Anand Gokul Mahalingam, Aayush Shah, Akshay Gulati, Royston Mascarenhas, Rakshitha Panduranga
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06277
  • Pdf link: https://arxiv.org/pdf/2304.06277
  • Abstract
    Improving performance in multiple domains is a challenging task, and often requires significant amounts of data to train and test models. Active learning techniques provide a promising solution by enabling models to select the most informative samples for labeling, thus reducing the amount of labeled data required to achieve high performance. In this paper, we present an active learning-based framework for improving performance across multiple domains. Our approach consists of two stages: first, we use an initial set of labeled data to train a base model, and then we iteratively select the most informative samples for labeling to refine the model. We evaluate our approach on several multi-domain datasets, including image classification, sentiment analysis, and object recognition. Our experiments demonstrate that our approach consistently outperforms baseline methods and achieves state-of-the-art performance on several datasets. We also show that our method is highly efficient, requiring significantly fewer labeled samples than other active learning-based methods. Overall, our approach provides a practical and effective solution for improving performance across multiple domains using active learning techniques.
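
The select-label-retrain loop described above can be sketched in a few lines. Below, the acquisition rule is plain least-confidence sampling on a toy classification pool; the model, data, and batch sizes are illustrative assumptions, and the paper's multi-domain refinements are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # toy labels
labeled = list(rng.choice(1000, 20, replace=False))
pool = [i for i in range(1000) if i not in set(labeled)]

for round_ in range(5):
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    uncertainty = 1 - proba.max(axis=1)          # least-confidence score
    picks = set(np.argsort(uncertainty)[-10:])   # most informative samples
    labeled += [pool[j] for j in picks]
    pool = [i for j, i in enumerate(pool) if j not in picks]
    print(round_, round(model.score(X, y), 3))
```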

Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning

  • Authors: Wenli Xiao, Yiwei Lyu, John Dolan
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06281
  • Pdf link: https://arxiv.org/pdf/2304.06281
  • Abstract
    Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases. Although shielding with Linear Temporal Logic (LTL) is a promising formal method to ensure safety in single-agent Reinforcement Learning (RL), it results in conservative behaviors when scaling to multi-agent scenarios. Additionally, it poses computational challenges for synthesizing shields in complex multi-agent environments. This work introduces Model-based Dynamic Shielding (MBDS) to support MARL algorithm design. Our algorithm synthesizes distributive shields, which are reactive systems running in parallel with each MARL agent, to monitor and rectify unsafe behaviors. The shields can dynamically split, merge, and recompute based on agents' states. This design enables efficient synthesis of shields to monitor agents in complex environments without coordination overheads. We also propose an algorithm to synthesize shields without prior knowledge of the dynamics model. The proposed algorithm obtains an approximate world model by interacting with the environment during the early stage of exploration, making our MBDS enjoy formal safety guarantees with high probability. We demonstrate in simulations that our framework can surpass existing baselines in terms of safety guarantees and learning performance.

ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis

  • Authors: Hongchen Tan, Baocai Yin, Kun Wei, Xiuping Liu, Xin Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06297
  • Pdf link: https://arxiv.org/pdf/2304.06297
  • Abstract
    We propose a novel Text-to-Image Generation Network, the Adaptive Layout Refinement Generative Adversarial Network (ALR-GAN), to adaptively refine the layout of synthesized images without any auxiliary information. The ALR-GAN includes an Adaptive Layout Refinement (ALR) module and a Layout Visual Refinement (LVR) loss. The ALR module aligns the layout structure (which refers to the locations of objects and background) of a synthesized image with that of its corresponding real image. In the ALR module, we propose an ALR loss to balance the matching of hard and easy features, for more efficient layout structure matching. Based on the refined layout structure, the LVR loss further refines the visual representation within the layout area. Experimental results on two widely-used datasets show that ALR-GAN performs competitively on the Text-to-Image generation task.

Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution

  • Authors: Zhuo Su, Jiehua Zhang, Tianpeng Liu, Zhen Liu, Shuanghui Zhang, Matti Pietikäinen, Li Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06305
  • Pdf link: https://arxiv.org/pdf/2304.06305
  • Abstract
    This paper proposes a novel module called middle spectrum grouped convolution (MSGC) for efficient deep convolutional neural networks (DCNNs) with the mechanism of grouped convolution. It explores the broad "middle spectrum" area between channel pruning and conventional grouped convolution. Compared with channel pruning, MSGC can retain most of the information from the input feature maps due to the group mechanism; compared with grouped convolution, MSGC benefits from the learnability, the core of channel pruning, for constructing its group topology, leading to better channel division. The middle spectrum area is unfolded along four dimensions: group-wise, layer-wise, sample-wise, and attention-wise, making it possible to reveal more powerful and interpretable structures. As a result, the proposed module acts as a booster that can reduce the computational cost of the host backbones for general image recognition with even improved predictive accuracy. For example, in the experiments on ImageNet dataset for image classification, MSGC can reduce the multiply-accumulates (MACs) of ResNet-18 and ResNet-50 by half but still increase the Top-1 accuracy by more than 1%. With 35% reduction of MACs, MSGC can also increase the Top-1 accuracy of the MobileNetV2 backbone. Results on MS COCO dataset for object detection show similar observations. Our code and trained models are available at https://github.com/hellozhuo/msgc.
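
The efficiency lever MSGC explores sits between two extremes that are easy to quantify: a dense convolution and a conventional grouped convolution with g groups, which cuts weights and MACs by roughly a factor of g. The comparison below uses the standard PyTorch `groups` argument; MSGC's learned, attention-guided channel-to-group assignment is not reproduced here.

```python
import torch
import torch.nn as nn

c_in, c_out, k = 64, 64, 3
dense = nn.Conv2d(c_in, c_out, k, padding=1)
grouped = nn.Conv2d(c_in, c_out, k, padding=1, groups=4)

# Grouped convolution holds ~4x fewer weights (and MACs) than dense.
print(sum(p.numel() for p in dense.parameters()))    # 36928
print(sum(p.numel() for p in grouped.parameters()))  # 9280

x = torch.randn(1, c_in, 32, 32)
print(grouped(x).shape)  # torch.Size([1, 64, 32, 32])
```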

Efficient Multimodal Fusion via Interactive Prompting

  • Authors: Yaowei Li, Ruijie Quan, Linchao Zhu, Yi Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06306
  • Pdf link: https://arxiv.org/pdf/2304.06306
  • Abstract
    Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era. Following this trend, the size of multi-modal learning models constantly increases, leading to an urgent need to reduce the massive computational cost of finetuning these models for downstream tasks. In this paper, we propose an efficient and flexible multimodal fusion method, namely PMF, tailored for fusing unimodally pre-trained transformers. Specifically, we first present a modular multimodal fusion framework that exhibits high flexibility and facilitates mutual interactions among different modalities. In addition, we disentangle vanilla prompts into three types in order to learn different optimizing objectives for multimodal learning. It is also worth noting that we propose to add prompt vectors only on the deep layers of the unimodal transformers, thus significantly reducing the training memory usage. Experiment results show that our proposed method achieves comparable performance to several other multimodal finetuning methods with less than 3% trainable parameters and up to 66% saving of training memory usage.

Out-of-distribution Few-shot Learning For Edge Devices without Model Fine-tuning

  • Authors: Xinyun Zhang, Lanqing Hong
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06309
  • Pdf link: https://arxiv.org/pdf/2304.06309
  • Abstract
    Few-shot learning (FSL) via customization of a deep learning network with limited data has emerged as a promising technique to achieve personalized user experiences on edge devices. However, existing FSL methods primarily assume independent and identically distributed (IID) data and utilize either computationally expensive backpropagation updates for each task or a common model with task-specific prototypes. Unfortunately, the former solution is infeasible for edge devices that lack on-device backpropagation capabilities, while the latter often struggles with limited generalization ability, especially for out-of-distribution (OOD) data. This paper proposes a lightweight, plug-and-play FSL module called Task-aware Normalization (TANO) that enables efficient and task-aware adaptation of a deep neural network without backpropagation. TANO covers the properties of multiple user groups by coordinating the updates of several groups of normalization statistics during meta-training and automatically identifies the appropriate normalization group for a downstream few-shot task. Consequently, TANO provides stable but task-specific estimations of the normalization statistics to close the distribution gaps and achieve efficient model adaptation. Results on both intra-domain and out-of-domain generalization experiments demonstrate that TANO outperforms recent methods in terms of accuracy, inference speed, and model size. Moreover, TANO achieves promising results on widely-used FSL benchmarks and on data from real applications.
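
A backpropagation-free adaptation step of this kind can be pictured as follows: keep several groups of normalization statistics from meta-training and, at test time, pick the group closest to the few-shot task's support features. The grouping and selection rules below are simplified assumptions, not TANO's exact procedure.

```python
import torch

class GroupedNorm:
    def __init__(self, n_groups, dim):
        self.means = torch.zeros(n_groups, dim)  # per-group statistics,
        self.vars = torch.ones(n_groups, dim)    # assumed meta-trained

    def select(self, support_feats):
        # Pick the group whose mean best matches the support set: no gradients.
        mu = support_feats.mean(0)
        return int(torch.argmin(((self.means - mu) ** 2).sum(1)))

    def normalize(self, x, g):
        return (x - self.means[g]) / torch.sqrt(self.vars[g] + 1e-5)

gn = GroupedNorm(n_groups=3, dim=16)
gn.means[1] += 2.0                    # pretend group 1 fits one user group
support = torch.randn(5, 16) + 2.0    # a new task resembling group 1
g = gn.select(support)
print(g, gn.normalize(support, g).mean().item())  # selects group 1
```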

Universally Optimal Deterministic Broadcasting in the HYBRID Distributed Model

  • Authors: Yi-Jun Chang, Oren Hecht, Dean Leitersdorf
  • Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.06317
  • Pdf link: https://arxiv.org/pdf/2304.06317
  • Abstract
    In theoretical computer science, it is a common practice to show existential lower bounds for problems, meaning there is a family of pathological inputs on which no algorithm can do better. However, most inputs of interest can be solved much more efficiently, giving rise to the notion of universally optimal algorithms, which run as fast as possible on every input. Questions on the existence of universally optimal algorithms were first raised by Garay, Kutten, and Peleg in FOCS '93. This research direction reemerged recently through a series of works, including the influential work of Haeupler, Wajc, and Zuzic in STOC '21, which resolves some of these decades-old questions in the supported CONGEST model. We work in the HYBRID distributed model, which analyzes networks combining both global and local communication. Much attention has recently been devoted to solving distance related problems, such as All-Pairs Shortest Paths (APSP) in HYBRID, culminating in a $\tilde \Theta(n^{1/2})$ round algorithm for exact APSP. However, by definition, every problem in HYBRID is solvable in $D$ (diameter) rounds, showing that it is far from universally optimal. We show the first universally optimal algorithms in HYBRID, by presenting a fundamental tool that solves any broadcasting problem in a universally optimal number of rounds, deterministically. Specifically, we consider the problem in a graph $G$ where a set of $k$ messages $M$ distributed arbitrarily across $G$, requires every node to learn all of $M$. We show a universal lower bound and a matching, deterministic upper bound, for any graph $G$, any value $k$, and any distribution of $M$ across $G$. This broadcasting tool opens a new exciting direction of research into showing universally optimal algorithms in HYBRID. As an example, we use it to obtain algorithms for approximate and exact APSP in general and sparse graphs.

Continual Learning of Hand Gestures for Human-Robot Interaction

  • Authors: Xavier Cucurull, Anaís Garrell
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06319
  • Pdf link: https://arxiv.org/pdf/2304.06319
  • Abstract
    In this paper, we present an efficient method to incrementally learn to classify static hand gestures. This method allows users to teach a robot to recognize new symbols in an incremental manner. Contrary to other works, which use special sensors or external devices such as color or data gloves, our proposed approach makes use of a single RGB camera to perform static hand gesture recognition from 2D images. Furthermore, our system is able to incrementally learn up to 38 new symbols using only 5 samples for each old class, achieving a final average accuracy of over 90%. In addition, the incremental training time can be reduced to 10% of the time required when using all available data.

An Automotive Case Study on the Limits of Approximation for Object Detection

  • Authors: Martí Caro, Hamid Tabani, Jaume Abella, Francesc Moll, Enric Morancho, Ramon Canal, Josep Altet, Antonio Calomarde, Francisco J. Cazorla, Antonio Rubio, Pau Fontova, Jordi Fornt
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.06327
  • Pdf link: https://arxiv.org/pdf/2304.06327
  • Abstract
    The accuracy of camera-based object detection (CBOD) built upon deep learning is often evaluated against the real objects in frames only. However, such simplistic evaluation ignores the fact that many unimportant objects are small, distant, or background, and hence, their misdetections have less impact than those for closer, larger, and foreground objects in domains such as autonomous driving. Moreover, sporadic misdetections are irrelevant since confidence on detections is typically averaged across consecutive frames, and detection devices (e.g. cameras, LiDARs) are often redundant, thus providing fault tolerance. This paper exploits such intrinsic fault tolerance of the CBOD process, and assesses in an automotive case study to what extent CBOD can tolerate approximation coming from multiple sources such as lower precision arithmetic, approximate arithmetic units, and even random faults due to, for instance, low voltage operation. We show that the accuracy impact of those sources of approximation is within 1% of the baseline even when considering the three approximate domains simultaneously, and hence, multiple sources of approximation can be exploited to build highly efficient accelerators for CBOD in cars.

EF/CF: High Performance Smart Contract Fuzzing for Exploit Generation

  • Authors: Michael Rodler, David Paaßen, Wenting Li, Lukas Bernhard, Thorsten Holz, Ghassan Karame, Lucas Davi
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.06341
  • Pdf link: https://arxiv.org/pdf/2304.06341
  • Abstract
    Smart contracts are increasingly being used to manage large numbers of high-value cryptocurrency accounts. There is a strong demand for automated, efficient, and comprehensive methods to detect security vulnerabilities in a given contract. While the literature features a plethora of analysis methods for smart contracts, the existing proposals do not address the increasing complexity of contracts. Existing analysis tools suffer from false alarms and missed bugs in today's smart contracts that are increasingly defined by complexity and interdependencies. To scale accurate analysis to modern smart contracts, we introduce EF/CF, a high-performance fuzzer for Ethereum smart contracts. In contrast to previous work, EF/CF efficiently and accurately models complex smart contract interactions, such as reentrancy and cross-contract interactions, at a very high fuzzing throughput rate. To achieve this, EF/CF transpiles smart contract bytecode into native C++ code, thereby enabling the reuse of existing, optimized fuzzing toolchains. Furthermore, EF/CF increases fuzzing efficiency by employing a structure-aware mutation engine for smart contract transaction sequences and using a contract's ABI to generate valid transaction inputs. In a comprehensive evaluation, we show that EF/CF scales better -- without compromising accuracy -- to complex contracts compared to state-of-the-art approaches, including other fuzzers, symbolic/concolic execution, and hybrid approaches. Moreover, we show that EF/CF can automatically generate transaction sequences that exploit reentrancy bugs to steal Ether.

DDT: Dual-branch Deformable Transformer for Image Denoising

  • Authors: Kangliang Liu, Xiangcheng Du, Sijie Liu, Yingbin Zheng, Xingjiao Wu, Cheng Jin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06346
  • Pdf link: https://arxiv.org/pdf/2304.06346
  • Abstract
    The Transformer is beneficial for image denoising tasks since it can model long-range dependencies, overcoming the limitations of convolutional inductive biases. However, directly applying the Transformer structure to remove noise is challenging because its complexity grows quadratically with the spatial resolution. In this paper, we propose an efficient Dual-branch Deformable Transformer (DDT) denoising network which captures both local and global interactions in parallel. We divide features with a fixed patch size and a fixed number of patches in the local and global branches, respectively. In addition, we apply a deformable attention operation in both branches, which helps the network focus on more important regions and further reduces computational complexity. We conduct extensive experiments on real-world and synthetic denoising tasks, and the proposed DDT achieves state-of-the-art performance with significantly lower computational costs.

ODAM: Gradient-based instance-specific visual explanations for object detection

  • Authors: Chenyang Zhao, Antoni B. Chan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06354
  • Pdf link: https://arxiv.org/pdf/2304.06354
  • Abstract
    We propose gradient-weighted Object Detector Activation Maps (ODAM), a visual explanation technique for interpreting the predictions of object detectors. Utilizing the gradients of detector targets flowing into the intermediate feature maps, ODAM produces heat maps that show the influence of regions on the detector's decision for each predicted attribute. Compared to previous work on class activation maps (CAM), ODAM generates instance-specific explanations rather than class-specific ones. We show that ODAM is applicable to both one-stage and two-stage detectors with different types of detector backbones and heads, and produces higher-quality visual explanations than the state of the art, both effectively and efficiently. We next propose a training scheme, Odam-Train, to improve the detector's explanation ability for object discrimination by encouraging consistency between explanations for detections on the same object, and distinct explanations for detections on different objects. Based on the heat maps produced by ODAM with Odam-Train, we propose Odam-NMS, which considers the information of the model's explanation for each prediction to distinguish duplicate detected objects. We present a detailed analysis of the visualized explanations of detectors and carry out extensive experiments to validate the effectiveness of the proposed ODAM.
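
The gradient-weighting idea can be condensed to a few lines: back-propagate one prediction's score to an intermediate feature map and weight the activations element-wise by their gradients. The tiny model below is a detector-free stand-in for illustration; ODAM itself operates on per-instance outputs of real detector heads.

```python
import torch
import torch.nn as nn

feat_layer = nn.Conv2d(3, 8, 3, padding=1)  # stand-in backbone layer
head = nn.Linear(8, 1)                      # stand-in prediction head

x = torch.randn(1, 3, 16, 16)
feats = feat_layer(x)
feats.retain_grad()                          # keep gradients of the feature map
score = head(feats.mean(dim=(2, 3)))         # scalar "detection score"
score.sum().backward()

# Element-wise gradient weighting, then channel sum and ReLU -> heat map.
heat = (feats.grad * feats).sum(dim=1).relu()
print(heat.shape)  # torch.Size([1, 16, 16])
```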

IBIA: An Incremental Build-Infer-Approximate Framework for Approximate Inference of Partition Function

  • Authors: Shivani Bathla, Vinita Vasudevan
  • Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06366
  • Pdf link: https://arxiv.org/pdf/2304.06366
  • Abstract
    Exact computation of the partition function is known to be intractable, necessitating approximate inference techniques. Existing methods for approximate inference are slow to converge for many benchmarks. The control of accuracy-complexity trade-off is also non-trivial in many of these methods. We propose a novel incremental build-infer-approximate (IBIA) framework for approximate inference that addresses these issues. In this framework, the probabilistic graphical model is converted into a sequence of clique tree forests (SCTF) with bounded clique sizes. We show that the SCTF can be used to efficiently compute the partition function. We propose two new algorithms which are used to construct the SCTF and prove the correctness of both. The first is an algorithm for incremental construction of CTFs that is guaranteed to give a valid CTF with bounded clique sizes and the second is an approximation algorithm that takes a calibrated CTF as input and yields a valid and calibrated CTF with reduced clique sizes as the output. We have evaluated our method using several benchmark sets from recent UAI competitions and our results show good accuracies with competitive runtimes.

An attack resilient policy on the tip pool for DAG-based distributed ledgers

  • Authors: Lianna Zhao, Andrew Cullen, Sebastian Mueller, Olivia Saa, Robert Shorten
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.06369
  • Pdf link: https://arxiv.org/pdf/2304.06369
  • Abstract
    This paper discusses congestion control and inconsistency problems in DAG-based distributed ledgers and proposes an additional filter to mitigate these issues. Unlike traditional blockchains, DAG-based DLTs use a directed acyclic graph structure to organize transactions, allowing higher scalability and efficiency. However, this also introduces challenges in controlling the rate at which blocks are added to the network and preventing the influence of spam attacks. To address these challenges, we propose a filter to limit the tip pool size and to avoid referencing old blocks. Furthermore, we present experimental results to demonstrate the effectiveness of this filter in reducing the negative impacts of various attacks. Our approach offers a lightweight and efficient solution for managing the flow of blocks in DAG-based DLTs, which can enhance the consistency and reliability of these systems.
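
A minimal sketch of what such a filter could look like, assuming a simple pool-size cap plus an age cutoff on referenceable tips; the actual policy, thresholds, and data structures used in the paper may differ.

```python
import time

class TipPoolFilter:
    """Illustrative tip-pool filter: cap the pool size and refuse to
    reference blocks older than `max_age` seconds. Parameter names and
    the eviction policy here are assumptions, not the paper's spec."""

    def __init__(self, max_tips=32, max_age=10.0):
        self.max_tips, self.max_age = max_tips, max_age
        self.tips = {}  # block_id -> arrival timestamp

    def add_tip(self, block_id, now=None):
        now = time.time() if now is None else now
        self.tips[block_id] = now
        if len(self.tips) > self.max_tips:            # pool-size limit
            oldest = min(self.tips, key=self.tips.get)
            del self.tips[oldest]

    def selectable_tips(self, now=None):
        """Only fresh tips may be referenced, which bounds inconsistency."""
        now = time.time() if now is None else now
        return [b for b, t in self.tips.items() if now - t <= self.max_age]
```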

Contact Models in Robotics: a Comparative Analysis

  • Authors: Quentin Le Lidec, Wilson Jallet, Louis Montaut, Ivan Laptev, Cordelia Schmid, Justin Carpentier
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06372
  • Pdf link: https://arxiv.org/pdf/2304.06372
  • Abstract
    Physics simulation is ubiquitous in robotics. Whether in model-based approaches (e.g., trajectory optimization), or model-free algorithms (e.g., reinforcement learning), physics simulators are a central component of modern control pipelines in robotics. Over the past decades, several robotic simulators have been developed, each with dedicated contact modeling assumptions and algorithmic solutions. In this article, we survey the main contact models and the associated numerical methods commonly used in robotics for simulating advanced robot motions involving contact interactions. In particular, we recall the physical laws underlying contacts and friction (i.e., Signorini condition, Coulomb's law, and the maximum dissipation principle), and how they are transcribed in current simulators. For each physics engine, we expose their inherent physical relaxations along with their limitations due to the numerical techniques employed. Based on our study, we propose theoretically grounded quantitative criteria on which we build benchmarks assessing both the physical and computational aspects of simulation. We support our work with an open-source and efficient C++ implementation of the existing algorithmic variations. Our results demonstrate that some approximations or algorithms commonly used in robotics can severely widen the reality gap and impact target applications. We hope this work will help motivate the development of new contact models, contact solvers, and robotic simulators in general, at the root of recent progress in motion generation in robotics.

Learning Accurate Performance Predictors for Ultrafast Automated Model Compression

  • Authors: Ziwei Wang, Jiwen Lu, Han Xiao, Shengyu Liu, Jie Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06393
  • Pdf link: https://arxiv.org/pdf/2304.06393
  • Abstract
    In this paper, we propose an ultrafast automated model compression framework called SeerNet for flexible network deployment. Conventional non-differentiable methods discretely search the desirable compression policy based on the accuracy from exhaustively trained lightweight models, and existing differentiable methods optimize an extremely large supernet to obtain the required compressed model for deployment. Both cause heavy computational cost due to the complex compression policy search and evaluation process. In contrast, we obtain the optimal efficient networks by directly optimizing the compression policy with an accurate performance predictor, where ultrafast automated model compression for various computational cost constraints is achieved without complex compression policy search and evaluation. Specifically, we first train the performance predictor based on the accuracy from uncertain compression policies actively selected by efficient evolutionary search, so that informative supervision is provided to learn the accurate performance predictor at acceptable cost. Then we leverage the gradient that maximizes the predicted performance under the barrier complexity constraint for ultrafast acquisition of the desirable compression policy, where adaptive update stepsizes with momentum are employed to enhance the optimality of the acquired pruning and quantization strategy. Compared with state-of-the-art automated model compression methods, experimental results on image classification and object detection show that our method achieves competitive accuracy-complexity trade-offs with a significant reduction of the search cost.
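
The key step, maximizing predicted performance under a barrier complexity constraint, can be sketched with a toy differentiable predictor and per-layer bit-widths as the policy. Every function form, name, and hyperparameter below is an illustrative assumption, not SeerNet's.

```python
import numpy as np

# Toy stand-in for the learned accuracy predictor (concave in the
# per-layer bit-widths p); SeerNet's predictor is a trained network.
def grad_predicted_acc(p):
    return 1.0 / (1.0 + p)  # gradient of sum(log(1 + p))

def search_policy(n_layers=8, budget=40.0, lr=0.05, beta=0.9, steps=200):
    """Gradient ascent on predicted accuracy with a log-barrier keeping
    the total bit cost under `budget`, plus a momentum term -- mirroring
    'maximize predicted performance under a barrier complexity constraint
    with momentum' from the abstract."""
    p = np.full(n_layers, 4.0)           # start from a uniform 4-bit policy
    v = np.zeros(n_layers)
    for _ in range(steps):
        slack = budget - p.sum()         # barrier keeps this positive
        g = grad_predicted_acc(p) - 1.0 / max(slack, 1e-6)
        v = beta * v + lr * g            # momentum update
        p = np.clip(p + v, 2.0, 8.0)     # valid bit-width range
    return np.round(p)

print(search_policy())  # per-layer bit-widths under the global budget
```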

Fast And Automatic Floating Point Error Analysis With CHEF-FP

  • Authors: Garima Singh, Baidyanath Kundu, Harshitha Menon, Alexander Penev, David J. Lange, Vassil Vassilev
  • Subjects: Numerical Analysis (math.NA); Hardware Architecture (cs.AR); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.06441
  • Pdf link: https://arxiv.org/pdf/2304.06441
  • Abstract
    As we reach the limit of Moore's Law, researchers are exploring different paradigms to achieve unprecedented performance. Approximate Computing (AC), which relies on the ability of applications to tolerate some error in the results to trade-off accuracy for performance, has shown significant promise. Despite the success of AC in domains such as Machine Learning, its acceptance in High-Performance Computing (HPC) is limited due to stringent requirements for accuracy. We need tools and techniques to identify regions of code that are amenable to approximations and their impact on the application output quality to guide developers to employ selective approximation. To this end, we propose CHEF-FP, a flexible, scalable, and easy-to-use source-code transformation tool based on Automatic Differentiation (AD) for analyzing approximation errors in HPC applications. CHEF-FP uses Clad, an efficient AD tool built as a plugin to the Clang compiler and based on the LLVM compiler infrastructure, as a backend and utilizes its AD abilities to evaluate approximation errors in C++ code. CHEF-FP works at the source by injecting error estimation code into the generated adjoints. This enables the error-estimation code to undergo compiler optimizations resulting in improved analysis time and reduced memory usage. We also provide theoretical and architectural augmentations to source code transformation-based AD tools to perform FP error analysis. This paper primarily focuses on analyzing errors introduced by mixed-precision AC techniques. We also show the applicability of our tool in estimating other kinds of errors by evaluating our tool on codes that use approximate functions. Moreover, we demonstrate the speedups CHEF-FP achieved during analysis time compared to the existing state-of-the-art tool due to its ability to generate and insert approximation error estimate code directly into the derivative source.
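
The underlying principle, using derivatives to bound the error introduced by lowering precision, can be sketched in a few lines: a first-order estimate sums |∂f/∂x_i| · ε · |x_i| over the inputs stored at reduced precision. CHEF-FP does this by injecting estimation code into Clad-generated adjoints in C++; the standalone Python version below only illustrates the formula, with a hand-written gradient standing in for AD.

```python
import numpy as np

def fp_error_estimate(grad_f, x, eps_low=2**-10):
    """First-order estimate of the output error of f when its inputs are
    stored in a lower precision with unit roundoff eps_low:
        sum_i |df/dx_i| * eps_low * |x_i|."""
    g = grad_f(x)
    return float(np.sum(np.abs(g) * eps_low * np.abs(x)))

# Example: f(x) = x0*x1 + sin(x2), with its gradient written by hand
# (an AD tool like Clad would generate this adjoint automatically).
f = lambda x: float(x[0] * x[1] + np.sin(x[2]))
grad_f = lambda x: np.array([x[1], x[0], np.cos(x[2])])

x = np.array([3.0, 5.0, 0.7])
print(f(x), fp_error_estimate(grad_f, x))  # value and error bound
```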

SpectFormer: Frequency and Attention is what you need in a Vision Transformer

  • Authors: Badri N. Patro, Vinay P. Namboodiri, Vijay Srinivas Agneeswaran
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06446
  • Pdf link: https://arxiv.org/pdf/2304.06446
  • Abstract
    Vision transformers have been applied successfully to image recognition tasks. Prior architectures are either based on multi-headed self-attention (ViT \cite{dosovitskiy2020image}, DeiT \cite{touvron2021training}), similar to the original work on textual models, or, more recently, on spectral layers (FNet \cite{lee2021fnet}, GFNet \cite{rao2021global}, AFNO \cite{guibas2021efficient}). We hypothesize that both spectral and multi-headed attention layers play a major role. We investigate this hypothesis in this work and observe that combining spectral and multi-headed attention layers indeed provides a better transformer architecture. We thus propose the novel SpectFormer architecture for transformers that combines spectral and multi-headed attention layers. We believe that the resulting representation allows the transformer to capture the feature representation appropriately, and it yields improved performance over other transformer representations. For instance, it improves the top-1 accuracy by 2% on ImageNet compared to both GFNet-H and LiT. SpectFormer-S reaches 84.25% top-1 accuracy on ImageNet-1K (state of the art for the small version). Further, SpectFormer-L achieves 85.7%, the state of the art for the comparable base version of the transformers. We further ensure that we obtain reasonable results in other scenarios such as transfer learning on standard datasets such as CIFAR-10, CIFAR-100, Oxford-IIIT Flowers, and Stanford Cars. We then investigate its use in downstream tasks such as object detection and instance segmentation on the MS-COCO dataset and observe that SpectFormer shows consistent performance that is comparable to the best backbones and can be further optimized and improved. Hence, we believe that combined spectral and attention layers are what are needed for vision transformers.
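
For readers unfamiliar with spectral layers, the sketch below shows the FNet/GFNet-style token mixing that SpectFormer interleaves with attention: transform tokens to the frequency domain, apply a learnable filter, and transform back. The filter initialization and shapes are illustrative assumptions, not SpectFormer's configuration.

```python
import numpy as np

def spectral_mixing(x, filt):
    """Token mixing in the frequency domain.
    x: (tokens, dim) sequence of patch embeddings.
    filt: (tokens//2 + 1, dim) learnable complex filter over frequencies."""
    X = np.fft.rfft(x, axis=0)                     # to frequency domain
    X = X * filt                                   # element-wise learnable filter
    return np.fft.irfft(X, n=x.shape[0], axis=0)   # back to token domain

tokens, dim = 16, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(tokens, dim))
filt = (rng.normal(size=(tokens // 2 + 1, dim))
        + 1j * rng.normal(size=(tokens // 2 + 1, dim)))
print(spectral_mixing(x, filt).shape)  # (16, 8), same shape as the input
```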

CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input

  • Authors: Senmao Tian, Ming Lu, Jiaming Liu, Yandong Guo, Yurong Chen, Shunli Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06454
  • Pdf link: https://arxiv.org/pdf/2304.06454
  • Abstract
    With the development of high-definition display devices, the practical scenario of Super-Resolution (SR) usually needs to super-resolve large input like 2K to higher resolution (4K/8K). To reduce the computational and memory cost, current methods first split the large input into local patches and then merge the SR patches into the output. These methods adaptively allocate a subnet for each patch. Quantization is a very important technique for network acceleration and has been used to design the subnets. Current methods train an MLP bit selector to determine the proper bit for each layer. However, they uniformly sample subnets for training, making simple subnets overfitted and complicated subnets underfitted. Therefore, the trained bit selector fails to determine the optimal bit. Apart from this, the introduced bit selector brings additional cost to each layer of the SR network. In this paper, we propose a novel method named Content-Aware Bit Mapping (CABM), which can remove the bit selector without any performance loss. CABM also learns a bit selector for each layer during training. After training, we analyze the relation between the edge information of an input patch and the bit of each layer. We observe that the edge information can be an effective metric for the selected bit. Therefore, we design a strategy to build an Edge-to-Bit lookup table that maps the edge score of a patch to the bit of each layer during inference. The bit configuration of the SR network can be determined by the lookup tables of all layers. Our strategy can find a better bit configuration, resulting in more efficient mixed-precision networks. We conduct detailed experiments to demonstrate the generalization ability of our method. The code will be released.
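
A minimal sketch of the Edge-to-Bit idea, assuming mean gradient magnitude as the edge score and per-bin medians of the training-time selector's choices as the table entries; CABM's actual edge metric and table construction may differ.

```python
import numpy as np

def edge_score(patch):
    """Mean gradient magnitude as a cheap edge metric (assumed)."""
    gy, gx = np.gradient(patch.astype(np.float64))
    return float(np.hypot(gx, gy).mean())

def build_edge_to_bit_lut(scores, bits, n_bins=16):
    """Map edge-score bins to the bit chosen by the training-time
    selector; at inference only this lookup remains, so the selector
    (and its per-layer cost) can be dropped."""
    edges = np.quantile(scores, np.linspace(0, 1, n_bins + 1))
    fallback = int(np.median(bits))
    lut = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = [b for s, b in zip(scores, bits) if lo <= s <= hi]
        lut.append(int(np.median(sel)) if sel else fallback)
    return edges, np.array(lut)

def lookup_bit(edges, lut, patch):
    i = np.clip(np.searchsorted(edges, edge_score(patch)) - 1, 0, len(lut) - 1)
    return int(lut[i])

# Demo with random patches and a fake training-time bit selector.
rng = np.random.default_rng(0)
patches = [rng.random((8, 8)) * s for s in rng.uniform(0.1, 2.0, 200)]
scores = [edge_score(p) for p in patches]
bits = [4 if s < float(np.median(scores)) else 8 for s in scores]
edges, lut = build_edge_to_bit_lut(scores, bits)
print(lookup_bit(edges, lut, patches[0]))
```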

Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages

  • Authors: Israel Abebe Azime, Sana Sabah Al-Azzawi, Atnafu Lambebo Tonja, Iyanuoluwa Shode, Jesujoba Alabi, Ayodele Awokoya, Mardiyyah Oduwole, Tosin Adewumi, Samuel Fanijo, Oyinkansola Awosan, Oreen Yousuf
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06459
  • Pdf link: https://arxiv.org/pdf/2304.06459
  • Abstract
    This paper describes our participation in the AfriSenti-SemEval Shared Task 12 of SemEval-2023. The task aims to perform monolingual sentiment classification (sub-task A) for 12 African languages, multilingual sentiment classification (sub-task B), and zero-shot sentiment classification (sub-task C). For sub-task A, we conducted experiments using classical machine learning classifiers, Afro-centric language models, and language-specific models. For sub-task B, we fine-tuned multilingual pre-trained language models that support many of the languages in the task. For sub-task C, we make use of a parameter-efficient Adapter approach that leverages monolingual texts in the target language for effective zero-shot transfer. Our findings suggest that using pre-trained Afro-centric language models improves performance for low-resource African languages. We also ran experiments using adapters for the zero-shot tasks, and the results suggest that we can obtain promising results by using adapters with a limited amount of resources.

Repositioning Tiered HotSpot Execution Performance Relative to the Interpreter

  • Authors: Jonathan Lambert, Kevin Casey, Rosemary Monahan
  • Subjects: Programming Languages (cs.PL); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.06460
  • Pdf link: https://arxiv.org/pdf/2304.06460
  • Abstract
    Although the advantages of just-in-time compilation over traditional interpretive execution are widely recognised, there is a lack of current research investigating and repositioning the performance differences between these two execution models relative to contemporary workloads. Specifically, there is a need to examine the performance differences between Java Runtime Environment (JRE) Java Virtual Machine (JVM) tiered execution and JRE JVM interpretive execution relative to modern multicore architectures and modern concurrent and parallel benchmark workloads. This article aims to fill this research gap by presenting the results of a study that compares the performance of these two execution models under load from the Renaissance Benchmark Suite. This research is relevant to anyone interested in understanding the performance differences between just-in-time compiled code and interpretive execution. It provides a contemporary assessment of the interpretive JVM core, the entry and starting point for bytecode execution, relative to just-in-time tiered execution. The study considers factors such as the JRE version, the GNU GCC version used in the JRE build toolchain, and the garbage collector algorithm specified at runtime, and their impact on the performance difference envelope between interpretive and tiered execution. Our findings indicate that tiered execution is considerably more efficient than interpretive execution, and the performance gap has increased, ranging from 4 to 37 times more efficient. On average, tiered execution is approximately 15 times more efficient than interpretive execution. Additionally, the performance differences between interpretive and tiered execution are influenced by workload category, with narrower performance differences observed for web-based workloads and more significant differences for Functional and Scala-type workloads.

Towards Understanding the Benefits and Challenges of Demand Responsive Public Transit- A Case Study in the City of Charlotte, NC

  • Authors: Sanaz Sadat Hosseini, Mona Azarbayjani, Jason Lawrence, Hamed Tabkhi
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06467
  • Pdf link: https://arxiv.org/pdf/2304.06467
  • Abstract
    Access to adequate public transportation plays a critical role in inequity and socio-economic mobility, particularly in low-income communities. Low-income workers who rely heavily on public transportation face a spatial disparity between home and work, which leads to higher unemployment, longer job searches, and longer commute times. The overarching goal of this study is to gather initial data toward creating a connected, coordinated, demand-responsive, and efficient public bus system that minimizes transit gaps for low-income, transit-dependent communities. To create equitable metropolitan public transportation, this paper evaluates existing CATS mobile applications that assist passengers in finding bus routes and arrival times. Our community survey methodology includes filling out questionnaires on Charlotte's current bus system on specific bus lines and determining user acceptance of a future novel smart technology. We have also collected data on the demand and transit gap for a real-world pilot study covering the Sprinter bus line, Bus line 7, Bus line 9, and Bus lines 97-99. These lines connect all of Charlotte City's main areas and are the most important bus lines in the system. On the studied routes, the primary survey results indicate that the current bus system has many flaws, the major one being the lack of proper timing to meet the needs of passengers. The most common problems are long commutes and long waiting times at stations. Moreover, the existing application provides inaccurate information, and on average, 80 percent of travelers and respondents are inclined to use new technology.

An Efficient Transfer Learning-based Approach for Apple Leaf Disease Classification

  • Authors: Md. Hamjajul Ashmafee, Tasnim Ahmed, Sabbir Ahmed, Md. Bakhtiar Hasan, Mst Nura Jahan, A.B.M. Ashikur Rahman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06520
  • Pdf link: https://arxiv.org/pdf/2304.06520
  • Abstract
    Correct identification and categorization of plant diseases are crucial for ensuring the safety of the global food supply and the overall financial success of stakeholders. In this regard, a wide range of solutions has been made available by introducing deep learning-based classification systems for different staple crops. Despite being one of the most important commercial crops in many parts of the globe, research proposing a smart solution for automatically classifying apple leaf diseases remains relatively unexplored. This study presents a technique for identifying apple leaf diseases based on transfer learning. The system extracts features using a pretrained EfficientNetV2S architecture and passes them to a classifier block for effective prediction. Class imbalance issues are tackled by utilizing runtime data augmentation. The effect of various hyperparameters, such as input resolution, learning rate, and number of epochs, has been investigated carefully. The competence of the proposed pipeline has been evaluated on the apple leaf disease subset from the publicly available `PlantVillage' dataset, where it achieved an accuracy of 99.21%, outperforming the existing works.
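
A minimal Keras sketch of the described pipeline: a frozen pretrained EfficientNetV2S feature extractor plus a small classifier head. The head layout, input resolution, class count, and hyperparameters below are guesses for illustration, not the paper's settings.

```python
import tensorflow as tf

NUM_CLASSES = 4  # e.g. scab, black rot, cedar rust, healthy (assumed)

# Pretrained backbone used as a feature extractor.
backbone = tf.keras.applications.EfficientNetV2S(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
backbone.trainable = False  # freeze; optionally unfreeze later to fine-tune

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),                             # assumed head
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Runtime data augmentation against class imbalance would typically be added as preprocessing layers (e.g. random flips and rotations) ahead of the backbone.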

Multi-kernel Correntropy-based Orientation Estimation of IMUs: Gradient Descent Methods

  • Authors: Shilei Li, Lijing Li, Dawei Shi, Yunjiang Lou, Ling Shi
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.06548
  • Pdf link: https://arxiv.org/pdf/2304.06548
  • Abstract
    This paper presents two computationally efficient algorithms for the orientation estimation of inertial measurement units (IMUs): the correntropy-based gradient descent (CGD) and the correntropy-based decoupled orientation estimation (CDOE). Traditional methods, such as gradient descent (GD) and decoupled orientation estimation (DOE), rely on the mean squared error (MSE) criterion, making them vulnerable to external acceleration and magnetic interference. To address this issue, we demonstrate that the multi-kernel correntropy loss (MKCL) is an optimal objective function for maximum likelihood estimation (MLE) when the noise follows a type of heavy-tailed distribution. In certain situations, the estimation error of the MKCL is bounded even in the presence of arbitrarily large outliers. By replacing the standard MSE cost function with MKCL, we develop the CGD and CDOE algorithms. We evaluate the effectiveness of our proposed methods by comparing them with existing algorithms in various situations. Experimental results indicate that our proposed methods (CGD and CDOE) outperform their conventional counterparts (GD and DOE), especially when faced with external acceleration and magnetic disturbances. Furthermore, the new algorithms demonstrate significantly lower computational complexity than Kalman filter-based approaches, making them suitable for applications with low-cost microprocessors.
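
The robustness argument is easy to see from a sketch of a multi-kernel correntropy-style loss: a weighted sum of Gaussian kernels behaves like MSE near zero but saturates for large residuals, so outliers from external acceleration or magnetic disturbance contribute a bounded penalty. The bandwidths and weights below are illustrative choices, not the paper's.

```python
import numpy as np

def mkcl_loss(err, sigmas=(0.5, 2.0, 8.0), weights=(1.0, 1.0, 1.0)):
    """Multi-kernel correntropy-style loss:
        sum_k w_k * mean(1 - exp(-e^2 / (2 * sigma_k^2)))."""
    err = np.asarray(err, dtype=float)
    loss = 0.0
    for w, s in zip(weights, sigmas):
        loss += w * np.mean(1.0 - np.exp(-err**2 / (2.0 * s**2)))
    return loss

print(mkcl_loss([0.1, 0.2, -0.1]))   # small residuals: near-quadratic regime
print(mkcl_loss([0.1, 0.2, 50.0]))   # huge outlier: its penalty saturates
```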

Multiscale Finite Element Formulations for 2D/1D Problems

  • Authors: Karl Hollaus, Markus Schöbinger
  • Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
  • Arxiv link: https://arxiv.org/abs/2304.06553
  • Pdf link: https://arxiv.org/pdf/2304.06553
  • Abstract
    Multiscale finite element methods for 2D/1D problems have been studied in this work to demonstrate their excellent ability to solve real-world problems. These methods are much more efficient than conventional 3D finite element methods and just as accurate. The 2D/1D multiscale finite element methods are based on a magnetic vector potential or a current vector potential. Known currents for excitation can be replaced by the Biot-Savart field. Boundary conditions make it possible to exploit planes of symmetry. All presented approaches consider eddy currents and an insulation layer, and preserve the edge effect. A segment of a fictitious electrical machine has been studied to demonstrate all of the above options, the accuracy, and the low computational cost of the 2D/1D multiscale finite element methods.

Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation

  • Authors: Mohit Sharma, Claudio Fantacci, Yuxiang Zhou, Skanda Koppula, Nicolas Heess, Jon Scholz, Yusuf Aytar
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06600
  • Pdf link: https://arxiv.org/pdf/2304.06600
  • Abstract
    Recent works have shown that large models pretrained on common visual learning tasks can provide useful representations for a wide range of specialized perception problems, as well as a variety of robotic manipulation tasks. While prior work on robotic manipulation has predominantly used frozen pretrained features, we demonstrate that in robotics this approach can fail to reach optimal performance, and that fine-tuning of the full model can lead to significantly better results. Unfortunately, fine-tuning disrupts the pretrained visual representation and causes representational drift towards the fine-tuned task, thus leading to a loss of the versatility of the original model. We introduce "lossless adaptation" to address this shortcoming of classical fine-tuning. We demonstrate that appropriate placement of our parameter-efficient adapters can significantly reduce the performance gap between frozen pretrained representations and full end-to-end fine-tuning without changes to the original representation, thus preserving the original capabilities of the pretrained model. We perform a comprehensive investigation across three major model architectures (ViTs, NFNets, and ResNets), supervised (ImageNet-1K classification) and self-supervised pretrained weights (CLIP, BYOL, Visual MAE) in 3 task domains and 35 individual tasks, and demonstrate that our claims are strongly validated in various settings.

Robustness Measures and Monitors for Time Window Temporal Logic

  • Authors: Ahmad Ahmad, Cristian-Ioan Vasile, Roberto Tron, Calin Belta
  • Subjects: Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.06645
  • Pdf link: https://arxiv.org/pdf/2304.06645
  • Abstract
    Temporal logics (TLs) have been widely used to formalize interpretable tasks for cyber-physical systems. Time Window Temporal Logic (TWTL) has been recently proposed as a specification language for dynamical systems. In particular, it can easily express robotic tasks, and it allows for efficient, automata-based verification and synthesis of control policies for such systems. In this paper, we define two quantitative semantics for this logic, and two corresponding monitoring algorithms, which allow for real-time quantification of satisfaction of formulas by trajectories of discrete-time systems. We demonstrate the new semantics and their runtime monitors on numerical examples.

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

  • Authors: Enze Xie, Lewei Yao, Han Shi, Zhili Liu, Daquan Zhou, Zhaoqiang Liu, Jiawei Li, Zhenguo Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06648
  • Pdf link: https://arxiv.org/pdf/2304.06648
  • Abstract
    Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications. This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enables fast adaptation to new domains. DiffFit is embarrassingly simple: it only fine-tunes the bias terms and newly-added scaling factors in specific layers, yet results in significant training speed-ups and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves 2$\times$ training speed-up and only needs to store approximately 0.12% of the total model parameters. Intuitive theoretical analysis is provided to justify the efficacy of the scaling factors for fast adaptation. On 8 downstream datasets, DiffFit achieves superior or competitive performance compared to full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one at minimal cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on the ImageNet 512$\times$512 benchmark by fine-tuning only 25 epochs from a public pre-trained ImageNet 256$\times$256 checkpoint, while being 30$\times$ more training efficient than the closest competitor.
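
A rough PyTorch sketch of the bias-plus-scale recipe: freeze all weights, keep biases trainable, and attach a learnable scaling factor to selected layers' outputs. Which layers actually receive scaling factors is a detail of the paper; this version scales every `nn.Linear` for illustration.

```python
import torch
import torch.nn as nn

def make_difffit_trainable(model: nn.Module) -> nn.ParameterDict:
    """Freeze everything except bias terms, then attach a learnable scale
    gamma to each Linear layer's output (y -> gamma * y). Returns the new
    scale parameters; optimize these together with the biases."""
    for p in model.parameters():
        p.requires_grad = False                      # freeze all weights
    for name, p in model.named_parameters():
        if name.endswith("bias"):
            p.requires_grad = True                   # biases stay trainable
    scales = nn.ParameterDict()
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            gamma = nn.Parameter(torch.ones(1))      # init at identity
            scales[name.replace(".", "_")] = gamma
            module.register_forward_hook(
                lambda m, inp, out, g=gamma: g * out)
    return scales

# Usage sketch: only a tiny fraction of parameters ends up trainable.
model = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 16))
scales = make_difffit_trainable(model)
trainable = [p for p in model.parameters() if p.requires_grad]
optim = torch.optim.AdamW(trainable + list(scales.parameters()), lr=1e-3)
```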

DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer

  • Authors: Amit Kumar Rana, Sabarinath Mahadevan, Alexander Hermans, Bastian Leibe
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06668
  • Pdf link: https://arxiv.org/pdf/2304.06668
  • Abstract
    Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth annotations for training, which are expensive to create. Interactive segmentation networks help generate such annotations based on an image and the corresponding user interactions such as clicks. Existing methods for this task can only process a single instance at a time and each user interaction requires a full forward pass through the entire deep network. We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries to a Transformer decoder with a potential to segment multiple object instances in a single iteration. Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image when compared to other methods. DynaMITe achieves state-of-the-art results on multiple existing interactive segmentation benchmarks, and also on the new multi-instance benchmark that we propose in this paper.

Inertia-Aware Microgrid Investment Planning Using Tractable Decomposition Algorithms

  • Authors: Agnes Marjorie Nakiganda, Shahab Dehghan, Petros Aristidou
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06674
  • Pdf link: https://arxiv.org/pdf/2304.06674
  • Abstract
    The integration of the frequency dynamics into Micro-Grid (MG) investment and operational planning problems is vital in ensuring the security of the system in the post-contingency states. However, the task of including transient security constraints in MG planning problems is non-trivial. This is due to the highly non-linear and non-convex nature of the analytical closed form of the frequency metrics (e.g., frequency nadir) and power flow constraints. To handle this issue, this paper presents two algorithms for decomposing the MG investment planning problem into multiple levels to enhance computational tractability and optimality. Furthermore, the sensitivity of the decisions made at each level is captured by corresponding dual cutting planes to model feasible secure regions. This, in turn, ensures both the optimal determination and placement of inertia services and accelerates the convergence of the proposed decomposition algorithms. The efficient and effective performance of the proposed algorithms is tested and verified on an 18-bus Low Voltage (LV) network and a 30-bus Medium Voltage (MV) network under various operating scenarios.

OKRidge: Scalable Optimal k-Sparse Ridge Regression for Learning Dynamical Systems

  • Authors: Jiachang Liu, Sam Rosen, Chudi Zhong, Cynthia Rudin
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.06686
  • Pdf link: https://arxiv.org/pdf/2304.06686
  • Abstract
    We consider an important problem in scientific discovery, identifying sparse governing equations for nonlinear dynamical systems. This involves solving sparse ridge regression problems to provable optimality in order to determine which terms drive the underlying dynamics. We propose a fast algorithm, OKRidge, for sparse ridge regression, using a novel lower bound calculation involving, first, a saddle point formulation, and from there, either solving (i) a linear system or (ii) using an ADMM-based approach, where the proximal operators can be efficiently evaluated by solving another linear system and an isotonic regression problem. We also propose a method to warm-start our solver, which leverages a beam search. Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.
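
In symbols, the k-sparse ridge regression problem that OKRidge certifies to optimality can be written as (standard formulation, our notation):

```latex
\min_{w \in \mathbb{R}^{p}} \;\; \|y - Xw\|_2^2 + \lambda \|w\|_2^2
\quad \text{subject to} \quad \|w\|_0 \le k
```

Here $\|w\|_0$ counts the nonzero coefficients, limiting the number of active terms in the recovered governing equation; the paper's saddle-point lower bound is what makes the branch-and-bound search over supports tractable compared to off-the-shelf MIP formulations.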

Representing Volumetric Videos as Dynamic MLP Maps

  • Authors: Sida Peng, Yunzhi Yan, Qing Shuai, Hujun Bao, Xiaowei Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06717
  • Pdf link: https://arxiv.org/pdf/2304.06717
  • Abstract
    This paper introduces a novel representation of volumetric videos for real-time view synthesis of dynamic scenes. Recent advances in neural scene representations demonstrate their remarkable capability to model and render complex static scenes, but extending them to represent dynamic scenes is not straightforward due to their slow rendering speed or high storage cost. To solve this problem, our key idea is to represent the radiance field of each frame as a set of shallow MLP networks whose parameters are stored in 2D grids, called MLP maps, and dynamically predicted by a 2D CNN decoder shared by all frames. Representing 3D scenes with shallow MLPs significantly improves the rendering speed, while dynamically predicting MLP parameters with a shared 2D CNN instead of explicitly storing them leads to low storage cost. Experiments show that the proposed approach achieves state-of-the-art rendering quality on the NHR and ZJU-MoCap datasets, while being efficient for real-time rendering with a speed of 41.7 fps for $512 \times 512$ images on an RTX 3090 GPU. The code is available at https://zju3dv.github.io/mlp_maps/.

Keyword: faster

Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays

  • Authors: Chen Xie, Francesco Daghero, Yukai Chen, Marco Castellano, Luca Gandolfi, Andrea Calimera, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06059
  • Pdf link: https://arxiv.org/pdf/2304.06059
  • Abstract
    Ultra-low-resolution Infrared (IR) array sensors offer a low-cost, energy-efficient, and privacy-preserving solution for people counting, with applications such as occupancy monitoring. Previous work has shown that Deep Learning (DL) can yield superior performance on this task. However, the literature was missing an extensive comparative analysis of various efficient DL architectures for IR array-based people counting, that considers not only their accuracy, but also the cost of deploying them on memory- and energy-constrained Internet of Things (IoT) edge nodes. In this work, we address this need by comparing 6 different DL architectures on a novel dataset composed of IR images collected from a commercial 8x8 array, which we made openly available. With a wide architectural exploration of each model type, we obtain a rich set of Pareto-optimal solutions, spanning cross-validated balanced accuracy scores in the 55.70-82.70% range. When deployed on a commercial Microcontroller (MCU) by STMicroelectronics, the STM32L4A6ZG, these models occupy 0.41-9.28kB of memory, and require 1.10-7.74ms per inference, while consuming 17.18-120.43 $\mu$J of energy. Our models are significantly more accurate than a previous deterministic method (up to +39.9%), while being up to 3.53x faster and more energy efficient. Further, our models' accuracy is comparable to state-of-the-art DL solutions on similar resolution sensors, despite a much lower complexity. All our models enable continuous, real-time inference on a MCU-based IoT node, with years of autonomous operation without battery recharging.

Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

  • Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.06178
  • Pdf link: https://arxiv.org/pdf/2304.06178
  • Abstract
    Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of the voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data, while maintaining computational efficiency, and is substantially faster than the baseline method NeuralRGBD.

Beyond the Quadratic Time Barrier for Network Unreliability

  • Authors: Ruoxu Cen, William He, Jason Li, Debmalya Panigrahi
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.06552
  • Pdf link: https://arxiv.org/pdf/2304.06552
  • Abstract
    Karger (STOC 1995) gave the first FPTAS for the network (un)reliability problem, setting in motion research over the next three decades that obtained increasingly faster running times, eventually leading to a $\tilde{O}(n^2)$-time algorithm (Karger, STOC 2020). This represented a natural culmination of this line of work because the algorithmic techniques used can enumerate $\Theta(n^2)$ (near)-minimum cuts. In this paper, we go beyond this quadratic barrier and obtain a faster algorithm for the network unreliability problem. Our algorithm runs in $m^{1+o(1)} + \tilde{O}(n^{1.5})$ time. Our main contribution is a new estimator for network unreliability in very reliable graphs. These graphs are usually the bottleneck for network unreliability since the disconnection event is elusive. Our estimator is obtained by defining an appropriate importance sampling subroutine on a dual spanning tree packing of the graph. To complement this estimator for very reliable graphs, we use recursive contraction for moderately reliable graphs. We show that an interleaving of sparsification and contraction can be used to obtain a better parametrization of the recursive contraction algorithm that yields a faster running time matching the one obtained for the very reliable case.

Class-Incremental Learning of Plant and Disease Detection: Growing Branches with Knowledge Distillation

  • Authors: Mathieu Pagé Fortin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06619
  • Pdf link: https://arxiv.org/pdf/2304.06619
  • Abstract
    This paper investigates the problem of class-incremental object detection for agricultural applications, where a model needs to learn new plant species and diseases incrementally without forgetting the previously learned ones. We adapt two public datasets to include new categories over time, simulating a more realistic and dynamic scenario. We then compare three class-incremental learning methods that leverage different forms of knowledge distillation to mitigate catastrophic forgetting. Our experiments show that all three methods suffer from catastrophic forgetting, but the recent Dynamic Y-KD approach, which additionally uses a dynamic architecture that grows new branches to learn new tasks, outperforms ILOD and Faster-ILOD in most scenarios on both new and old classes. These results highlight the challenges and opportunities of continual object detection for agricultural applications. In particular, the large intra-class and small inter-class variability that is typical of plant images exacerbates the difficulty of learning new categories without interfering with previous knowledge. We publicly release our code to encourage future work.

OKRidge: Scalable Optimal k-Sparse Ridge Regression for Learning Dynamical Systems

  • Authors: Jiachang Liu, Sam Rosen, Chudi Zhong, Cynthia Rudin
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.06686
  • Pdf link: https://arxiv.org/pdf/2304.06686
  • Abstract
    We consider an important problem in scientific discovery, identifying sparse governing equations for nonlinear dynamical systems. This involves solving sparse ridge regression problems to provable optimality in order to determine which terms drive the underlying dynamics. We propose a fast algorithm, OKRidge, for sparse ridge regression, using a novel lower bound calculation involving, first, a saddle point formulation, and from there, either solving (i) a linear system or (ii) using an ADMM-based approach, where the proximal operators can be efficiently evaluated by solving another linear system and an isotonic regression problem. We also propose a method to warm-start our solver, which leverages a beam search. Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.

Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields

  • Authors: Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, Peter Hedman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06706
  • Pdf link: https://arxiv.org/pdf/2304.06706
  • Abstract
    Neural Radiance Field training can be accelerated through the use of grid-based representations in NeRF's learned mapping from spatial coordinates to colors and volumetric density. However, these grid-based approaches lack an explicit understanding of scale and therefore often introduce aliasing, usually in the form of jaggies or missing scene content. Anti-aliasing has previously been addressed by mip-NeRF 360, which reasons about sub-volumes along a cone rather than points along a ray, but this approach is not natively compatible with current grid-based techniques. We show how ideas from rendering and signal processing can be used to construct a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP to yield error rates that are 8% - 76% lower than either prior technique, and that trains 22x faster than mip-NeRF 360.

Keyword: mobile

Situational-Aware Multi-Graph Convolutional Recurrent Network (SA-MGCRN) for Travel Demand Forecasting During Wildfires

  • Authors: Xiaojian Zhang, Xilei Zhao, Yiming Xu, Ruggiero Lovreglio, Daniel Nilsson
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06233
  • Pdf link: https://arxiv.org/pdf/2304.06233
  • Abstract
    Real-time forecasting of travel demand during wildfire evacuations is crucial for emergency managers and transportation planners to make timely and better-informed decisions. However, few studies focus on accurate travel demand forecasting in large-scale emergency evacuations. Therefore, this study develops and tests a new methodological framework for modeling trip generation in wildfire evacuations by using (a) large-scale GPS data generated by mobile devices and (b) state-of-the-art AI technologies. The proposed methodology aims at forecasting evacuation trips and other types of trips. Based on the travel demand inferred from the GPS data, we develop a new deep learning model, i.e., Situational-Aware Multi-Graph Convolutional Recurrent Network (SA-MGCRN), along with a model updating scheme to achieve real-time forecasting of travel demand during wildfire evacuations. The proposed methodological framework is tested in this study for a real-world case study: the 2019 Kincade Fire in Sonoma County, CA. The results show that SA-MGCRN significantly outperforms all the selected state-of-the-art benchmarks in terms of prediction performance. Our finding suggests that the most important model components of SA-MGCRN are evacuation order/warning information, proximity to fire, and population change, which are consistent with behavioral theories and empirical findings.

Loosely Coupled Odometry, UWB Ranging, and Cooperative Spatial Detection for Relative Monte-Carlo Multi-Robot Localization

  • Authors: Xianjia Yu, Paola Torrico Morón, Sahar Salimpour, Jorge Peña Queralta, Tomi Westerlund
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06264
  • Pdf link: https://arxiv.org/pdf/2304.06264
  • Abstract
    As mobile robots become more ubiquitous, their deployments grow across use cases where GNSS positioning is either unavailable or unreliable. This has led to increased interest in multi-modal relative localization methods. Complementing onboard odometry, ranging allows for relative state estimation, with ultra-wideband (UWB) ranging having gained widespread recognition due to its low cost and centimeter-level out-of-box accuracy. Infrastructure-free localization methods allow for more dynamic, ad-hoc, and flexible deployments, yet they have received less attention from the research community. In this work, we propose a cooperative relative multi-robot localization where we leverage inter-robot ranging and simultaneous spatial detections of objects in the environment. To achieve this, we equip robots with a single UWB transceiver and a stereo camera. We propose a novel Monte-Carlo approach to estimate relative states by either employing only UWB ranges or dynamically integrating simultaneous spatial detections from the stereo cameras. We also address the challenges for UWB ranging error mitigation, especially in non-line-of-sight, with a study on different LSTM networks to estimate the ranging error. The proposed approach has multiple benefits. First, we show that a single range is enough to estimate the accurate relative states of two robots when fusing odometry measurements. Second, our experiments also demonstrate that our approach surpasses traditional methods such as multilateration in terms of accuracy. Third, to increase accuracy even further, we allow for the integration of cooperative spatial detections. Finally, we show how ROS 2 and Zenoh can be integrated to build a scalable wireless communication solution for multi-robot systems. The experimental validation includes real-time deployment and autonomous navigation based on the relative positioning method.
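
A minimal numpy sketch of the Monte-Carlo weighting step for a single UWB range, with particles as hypothesized relative positions of the other robot. The full method also fuses odometry and cooperative spatial detections, and the Gaussian range-noise model below is an assumption for illustration.

```python
import numpy as np

def mc_relative_update(particles, weights, uwb_range, sigma=0.1):
    """Re-weight relative-position particles by the likelihood of one
    observed UWB range (Gaussian noise model assumed)."""
    dists = np.linalg.norm(particles, axis=1)          # hypothesized ranges
    lik = np.exp(-0.5 * ((dists - uwb_range) / sigma) ** 2)
    weights = weights * lik
    return weights / weights.sum()                     # normalize

rng = np.random.default_rng(1)
parts = rng.uniform(-5, 5, size=(1000, 2))   # relative (x, y) hypotheses
w = np.full(len(parts), 1.0 / len(parts))
w = mc_relative_update(parts, w, uwb_range=3.0)
print(parts[np.argmax(w)])  # most likely relative position so far
```

With a single range the posterior is a ring around the observing robot, which is why fusing odometry or a second modality (here, spatial detections) is needed to make the estimate unambiguous.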

Gamifying Math Education using Object Detection

  • Authors: Yueqiu Sun, Rohitkrishna Nambiar, Vivek Vidyasagaran
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06270
  • Pdf link: https://arxiv.org/pdf/2304.06270
  • Abstract
    Manipulatives used in the right way help improve mathematical concepts leading to better learning outcomes. In this paper, we present a phygital (physical + digital) curriculum inspired teaching system for kids aged 5-8 to learn geometry using shape tile manipulatives. Combining smaller shapes to form larger ones is an important skill kids learn early on which requires shape tiles to be placed close to each other in the play area. This introduces a challenge of oriented object detection for densely packed objects with arbitrary orientations. Leveraging simulated data for neural network training and light-weight mobile architectures, we enable our system to understand user interactions and provide real-time audiovisual feedback. Experimental results show that our network runs real-time with high precision/recall on consumer devices, thereby providing a consistent and enjoyable learning experience.

Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution

  • Authors: Zhuo Su, Jiehua Zhang, Tianpeng Liu, Zhen Liu, Shuanghui Zhang, Matti Pietikäinen, Li Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06305
  • Pdf link: https://arxiv.org/pdf/2304.06305
  • Abstract
    This paper proposes a novel module called middle spectrum grouped convolution (MSGC) for efficient deep convolutional neural networks (DCNNs) with the mechanism of grouped convolution. It explores the broad "middle spectrum" area between channel pruning and conventional grouped convolution. Compared with channel pruning, MSGC can retain most of the information from the input feature maps due to the group mechanism; compared with grouped convolution, MSGC benefits from the learnability, the core of channel pruning, for constructing its group topology, leading to better channel division. The middle spectrum area is unfolded along four dimensions: group-wise, layer-wise, sample-wise, and attention-wise, making it possible to reveal more powerful and interpretable structures. As a result, the proposed module acts as a booster that can reduce the computational cost of the host backbones for general image recognition with even improved predictive accuracy. For example, in the experiments on ImageNet dataset for image classification, MSGC can reduce the multiply-accumulates (MACs) of ResNet-18 and ResNet-50 by half but still increase the Top-1 accuracy by more than 1%. With 35% reduction of MACs, MSGC can also increase the Top-1 accuracy of the MobileNetV2 backbone. Results on MS COCO dataset for object detection show similar observations. Our code and trained models are available at https://github.com/hellozhuo/msgc.

Towards Understanding the Benefits and Challenges of Demand Responsive Public Transit- A Case Study in the City of Charlotte, NC

  • Authors: Sanaz Sadat Hosseini, Mona Azarbayjani, Jason Lawrence, Hamed Tabkhi
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06467
  • Pdf link: https://arxiv.org/pdf/2304.06467
  • Abstract
    Access to adequate public transportation plays a critical role in inequity and socio-economic mobility, particularly in low-income communities. Low-income workers who rely heavily on public transportation face a spatial disparity between home and work, which leads to higher unemployment, longer job searches, and longer commute times. The overarching goal of this study is to gather initial data toward creating a connected, coordinated, demand-responsive, and efficient public bus system that minimizes transit gaps for low-income, transit-dependent communities. To create equitable metropolitan public transportation, this paper evaluates existing CATS mobile applications that assist passengers in finding bus routes and arrival times. Our community survey methodology includes filling out questionnaires on Charlotte's current bus system on specific bus lines and determining user acceptance of a future novel smart technology. We have also collected data on the demand and transit gap for a real-world pilot study covering the Sprinter bus line, Bus line 7, Bus line 9, and Bus lines 97-99. These lines connect all of Charlotte City's main areas and are the most important bus lines in the system. On the studied routes, the primary survey results indicate that the current bus system has many flaws, the major one being the lack of proper timing to meet the needs of passengers. The most common problems are long commutes and long waiting times at stations. Moreover, the existing application provides inaccurate information, and on average, 80 percent of travelers and respondents are inclined to use new technology.

IoT-Based Water Quality Assessment System for Industrial Waste Water: Healthcare Perspective

  • Authors: Abdur Rab Dhruba, Kazi Nabiul Alam, Md. Shakib Khan, Sananda Saha, Mohammad Monirujjaman Khan, Mohammed Baz, Mehedi Masud, Mohammed A. AlZain
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06491
  • Pdf link: https://arxiv.org/pdf/2304.06491
  • Abstract
    The environment, especially water, gets polluted due to industrialization and urbanization. Pollution due to industrialization and urbanization has harmful effects on both the environment and the lives on Earth. This polluted water can cause food poisoning, diarrhea, short-term gastrointestinal problems, respiratory diseases, skin problems, and other serious health complications. In a developing country like Bangladesh, where the ready-made garments sector is one of the major sources of the total Gross Domestic Product (GDP), most of the waste released from the garment factories is dumped into the nearest rivers or canals. Hence, the quality of the water in these bodies becomes unsuitable for living beings, and so it has become one of the major threats to the environment and human health. In addition, the amount of fish in the rivers and canals in Bangladesh is decreasing day by day as a result of water pollution. Therefore, to save fish and other aquatic animals and the environment, we need to monitor the quality of the water and find out the reasons for the pollution. Real-time monitoring of the quality of water is vital for controlling water pollution. Most of the approaches for controlling water pollution are mainly biological and lab-based, which takes a lot of time and resources. To address this issue, we developed an Internet of Things (IoT)-based real-time water quality monitoring system, integrated with a mobile application. The proposed system in this research measures some of the most important indicators of water quality, including the potential of hydrogen (pH), total dissolved solids (TDS), turbidity, and temperature. The results of the proposed system will be very helpful in saving the environment, and thus, improving the health of living creatures on Earth.

IoT-Based Remote Health Monitoring System Employing Smart Sensors for Asthma Patients during COVID-19 Pandemic

  • Authors: Nafisa Shamim Rafa, Basma Binte Azmal, Abdur Rab Dhruba, Mohammad Monirujjaman Khan, Turki M. Alanazi, Faris A. Almalki, Othman AlOmeir
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06511
  • Pdf link: https://arxiv.org/pdf/2304.06511
  • Abstract
    COVID19 and asthma are respiratory diseases that can be life threatening in uncontrolled circumstances and require continuous monitoring. A poverty stricken South Asian country like Bangladesh has been bearing the brunt of the COVID19 pandemic since its beginning. The majority of the country's population resides in rural areas, where proper healthcare is difficult to access. This emphasizes the necessity of telemedicine, implementing the concept of the Internet of Things (IoT), which is still under development in Bangladesh. This paper demonstrates how the current challenges in the healthcare system are resolvable through the design of a remote health and environment monitoring system, specifically for asthma patients who are at an increased risk of COVID19. Since on-time treatment is essential, this system will allow doctors and medical staff to receive patient information in real time and deliver their services immediately to the patient regardless of their location. The proposed system consists of various sensors collecting heart rate, body temperature, ambient temperature, humidity, and air quality data and processing them through the Arduino Microcontroller. It is integrated with a mobile application. All this data is sent to the mobile application via a Bluetooth module and updated every few seconds so that the medical staff can instantly track patients' conditions and emergencies. The developed prototype is portable and easily usable by anyone. The system has been applied to five people of different ages and medical histories over a particular period. Upon analyzing all their data, it became clear which participants were particularly vulnerable to health deterioration and needed constant observation. Through this research, awareness about asthmatic symptoms will improve and help prevent their severity through effective treatment anytime, anywhere.

Keyword: pruning

Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution

  • Authors: Zhuo Su, Jiehua Zhang, Tianpeng Liu, Zhen Liu, Shuanghui Zhang, Matti Pietikäinen, Li Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06305
  • Pdf link: https://arxiv.org/pdf/2304.06305
  • Abstract
    This paper proposes a novel module called middle spectrum grouped convolution (MSGC) for efficient deep convolutional neural networks (DCNNs) with the mechanism of grouped convolution. It explores the broad "middle spectrum" area between channel pruning and conventional grouped convolution. Compared with channel pruning, MSGC can retain most of the information from the input feature maps due to the group mechanism; compared with grouped convolution, MSGC benefits from the learnability, the core of channel pruning, for constructing its group topology, leading to better channel division. The middle spectrum area is unfolded along four dimensions: group-wise, layer-wise, sample-wise, and attention-wise, making it possible to reveal more powerful and interpretable structures. As a result, the proposed module acts as a booster that can reduce the computational cost of the host backbones for general image recognition with even improved predictive accuracy. For example, in the experiments on ImageNet dataset for image classification, MSGC can reduce the multiply-accumulates (MACs) of ResNet-18 and ResNet-50 by half but still increase the Top-1 accuracy by more than 1%. With 35% reduction of MACs, MSGC can also increase the Top-1 accuracy of the MobileNetV2 backbone. Results on MS COCO dataset for object detection show similar observations. Our code and trained models are available at https://github.com/hellozhuo/msgc.

Learning Accurate Performance Predictors for Ultrafast Automated Model Compression

  • Authors: Ziwei Wang, Jiwen Lu, Han Xiao, Shengyu Liu, Jie Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06393
  • Pdf link: https://arxiv.org/pdf/2304.06393
  • Abstract
    In this paper, we propose an ultrafast automated model compression framework called SeerNet for flexible network deployment. Conventional non-differentiable methods discretely search for the desirable compression policy based on the accuracy of exhaustively trained lightweight models, and existing differentiable methods optimize an extremely large supernet to obtain the required compressed model for deployment. Both cause heavy computational cost due to the complex compression policy search and evaluation process. On the contrary, we obtain the optimal efficient networks by directly optimizing the compression policy with an accurate performance predictor, so that ultrafast automated model compression for various computational cost constraints is achieved without complex compression policy search and evaluation. Specifically, we first train the performance predictor based on the accuracy of uncertain compression policies actively selected by efficient evolutionary search, so that informative supervision is provided to learn an accurate performance predictor at acceptable cost. Then we leverage the gradient that maximizes the predicted performance under the barrier complexity constraint for ultrafast acquisition of the desirable compression policy, where adaptive update stepsizes with momentum are employed to enhance the optimality of the acquired pruning and quantization strategy. Compared with state-of-the-art automated model compression methods, experimental results on image classification and object detection show that our method achieves competitive accuracy-complexity trade-offs with a significant reduction in search cost.
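
The predictor-guided search can be pictured with a small sketch: ascend the predicted accuracy of a compression policy under a barrier-style complexity penalty, using momentum. The toy predictor and cost model below are illustrative stand-ins, not the trained SeerNet components.

```python
import torch
import torch.nn as nn

# Stand-in performance predictor (untrained here; SeerNet trains it on
# actively selected compression policies).
predictor = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))

policy = torch.rand(8, requires_grad=True)   # e.g. per-layer compression ratios
opt = torch.optim.SGD([policy], lr=0.1, momentum=0.9)  # stepsizes with momentum

for _ in range(50):
    opt.zero_grad()
    acc = predictor(policy).squeeze()        # predicted accuracy
    complexity = policy.sum()                # stand-in cost model
    loss = -acc + 10.0 * torch.relu(complexity - 4.0)  # barrier-style penalty
    loss.backward()
    opt.step()

print(policy.detach())                       # candidate compression policy
```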

Keyword: voxel

$E(3) \times SO(3)$-Equivariant Networks for Spherical Deconvolution in Diffusion MRI

  • Authors: Axel Elaldi, Guido Gerig, Neel Dey
  • Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06103
  • Pdf link: https://arxiv.org/pdf/2304.06103
  • Abstract
    We present Roto-Translation Equivariant Spherical Deconvolution (RT-ESD), an $E(3)\times SO(3)$ equivariant framework for sparse deconvolution of volumes where each voxel contains a spherical signal. Such 6D data naturally arises in diffusion MRI (dMRI), a medical imaging modality widely used to measure microstructure and structural connectivity. As each dMRI voxel is typically a mixture of various overlapping structures, there is a need for blind deconvolution to recover crossing anatomical structures such as white matter tracts. Existing dMRI work takes either an iterative or deep learning approach to sparse spherical deconvolution, yet it typically does not account for relationships between neighboring measurements. This work constructs equivariant deep learning layers that respect the symmetries of spatial rotations, reflections, and translations, alongside the symmetries of voxelwise spherical rotations. As a result, RT-ESD improves on previous work across several tasks, including fiber recovery on the DiSCo dataset, deconvolution-derived partial volume estimation on real-world in vivo human brain dMRI, and downstream reconstruction of fiber tractograms on the Tractometer dataset. Our implementation is available at https://github.com/AxelElaldi/e3so3_conv.

Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

  • Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.06178
  • Pdf link: https://arxiv.org/pdf/2304.06178
  • Abstract
    Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of the voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data while maintaining computational efficiency; it is substantially faster than the baseline method NeuralRGBD.
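
The complexity-driven subdivision idea can be illustrated with a toy recursive sketch; the 1D setting, stand-in complexity proxy, and thresholds below are assumptions for illustration, not the paper's actual criterion.

```python
def subdivide(bounds, depth, complexity, max_depth=4, thresh=0.5):
    """Recursively split a 1D cell while its (stand-in) complexity is high."""
    lo, hi = bounds
    if depth == max_depth or complexity(lo, hi) < thresh:
        return [bounds]                      # keep this voxel as-is
    mid = 0.5 * (lo + hi)
    return (subdivide((lo, mid), depth + 1, complexity, max_depth, thresh) +
            subdivide((mid, hi), depth + 1, complexity, max_depth, thresh))

# Assumed complexity proxy: detail concentrated near x = 0.7.
complexity = lambda lo, hi: float(lo <= 0.7 <= hi)
cells = subdivide((0.0, 1.0), 0, complexity)
print(len(cells), "cells; finest around 0.7:", cells)
```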

Brain Structure Ages -- A new biomarker for multi-disease classification

  • Authors: Huy-Dung Nguyen, Michaël Clément, Boris Mansencal, Pierrick Coupé
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06591
  • Pdf link: https://arxiv.org/pdf/2304.06591
  • Abstract
    Age is an important variable for describing the expected anatomical status of the brain across the normal aging trajectory. The deviation from that normative aging trajectory may provide some insights into neurological diseases. In neuroimaging, predicted brain age is widely used to analyze different diseases. However, the brain age gap alone (i.e., the difference between the chronological age and the estimated age) may not be informative enough for disease classification problems. In this paper, we propose to extend the notion of global brain age by estimating brain structure ages using structural magnetic resonance imaging. To this end, an ensemble of deep learning models is first used to estimate a 3D aging map (i.e., voxel-wise age estimation). Then, a 3D segmentation mask is used to obtain the final brain structure ages. This biomarker can be used in several situations. First, it enables accurate estimation of brain age for anomaly detection at the population level; in this setting, our approach outperforms several state-of-the-art methods. Second, brain structure ages can be used to compute the deviation from the normal aging process of each brain structure. This feature can be used in a multi-disease classification task for an accurate differential diagnosis at the subject level. Finally, the brain structure age deviations of individuals can be visualized, providing some insights into brain abnormalities and helping clinicians in real medical contexts.
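
Turning a voxel-wise age map into per-structure ages, as the abstract describes, amounts to masked averaging; the random arrays below are placeholders for the ensemble's output and the segmentation, not real data.

```python
import numpy as np

age_map = np.random.normal(70, 5, size=(8, 8, 8))  # stand-in voxel-wise age map
seg = np.random.randint(0, 3, size=(8, 8, 8))      # stand-in mask, 3 structures

structure_ages = {int(label): float(age_map[seg == label].mean())
                  for label in np.unique(seg)}
print(structure_ages)                              # one age per brain structure
```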

Keyword: lidar

Survey on LiDAR Perception in Adverse Weather Conditions

  • Authors: Mariella Dreissig, Dominik Scheuble, Florian Piewak, Joschka Boedecker
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06312
  • Pdf link: https://arxiv.org/pdf/2304.06312
  • Abstract
    Autonomous vehicles rely on a variety of sensors to gather information about their surroundings. The vehicle's behavior is planned based on this environment perception, making its reliability crucial for safety reasons. The active LiDAR sensor is able to create an accurate 3D representation of a scene, making it a valuable addition to environment perception for autonomous vehicles. Due to light scattering and occlusion, the LiDAR's performance changes under adverse weather conditions like fog, snow, or rain. This limitation has recently fostered a large body of research on approaches to alleviate the decrease in perception performance. In this survey, we gathered, analyzed, and discussed different aspects of dealing with adverse weather conditions in LiDAR-based environment perception. We address topics such as the availability of appropriate data, raw point cloud processing and denoising, robust perception algorithms, and sensor fusion to mitigate adverse-weather-induced shortcomings. We furthermore identify the most pressing gaps in the current literature and pinpoint promising research directions.

An Automotive Case Study on the Limits of Approximation for Object Detection

  • Authors: Martí Caro, Hamid Tabani, Jaume Abella, Francesc Moll, Enric Morancho, Ramon Canal, Josep Altet, Antonio Calomarde, Francisco J. Cazorla, Antonio Rubio, Pau Fontova, Jordi Fornt
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.06327
  • Pdf link: https://arxiv.org/pdf/2304.06327
  • Abstract
    The accuracy of camera-based object detection (CBOD) built upon deep learning is often evaluated against the real objects in frames only. However, such simplistic evaluation ignores the fact that many unimportant objects are small, distant, or background, and hence, their misdetections have less impact than those for closer, larger, and foreground objects in domains such as autonomous driving. Moreover, sporadic misdetections are irrelevant since confidence on detections is typically averaged across consecutive frames, and detection devices (e.g. cameras, LiDARs) are often redundant, thus providing fault tolerance. This paper exploits such intrinsic fault tolerance of the CBOD process, and assesses in an automotive case study to what extent CBOD can tolerate approximation coming from multiple sources such as lower precision arithmetic, approximate arithmetic units, and even random faults due to, for instance, low voltage operation. We show that the accuracy impact of those sources of approximation is within 1% of the baseline even when considering the three approximate domains simultaneously, and hence, multiple sources of approximation can be exploited to build highly efficient accelerators for CBOD in cars.

RadarGNN: Transformation Invariant Graph Neural Network for Radar-based Perception

  • Authors: Felix Fent, Philipp Bauerschmidt, Markus Lienkamp
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06547
  • Pdf link: https://arxiv.org/pdf/2304.06547
  • Abstract
    A reliable perception has to be robust against challenging environmental conditions. Therefore, recent efforts have focused on the use of radar sensors in addition to camera and lidar sensors for perception applications. However, the sparsity of radar point clouds and poor data availability remain challenging for current perception methods. To address these challenges, a novel graph neural network is proposed that uses not only the information of the points themselves but also the relationships between the points. The model is designed to consider both point features and point-pair features, embedded in the edges of the graph. Furthermore, a general approach for achieving transformation invariance is proposed which is robust against unseen scenarios and also counteracts the limited data availability. The transformation invariance is achieved by an invariant data representation rather than an invariant model architecture, making it applicable to other methods. The proposed RadarGNN model outperforms all previous methods on the RadarScenes dataset. In addition, the effects of different invariances on object detection and semantic segmentation quality are investigated. The code is made available as open-source software under https://github.com/TUMFTM/RadarGNN.
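
The invariant-data-representation idea can be checked in a few lines: pairwise distances of the kind embedded in graph edges are unchanged by rotating and translating the point cloud, unlike raw coordinates. This NumPy sketch is illustrative only, not the paper's feature set.

```python
import numpy as np

pts = np.random.rand(5, 2)                        # toy radar point cloud
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
moved = pts @ R.T + np.array([2.0, -1.0])         # same scene, rotated + shifted

pairwise = lambda p: np.linalg.norm(p[:, None] - p[None, :], axis=-1)
print(np.allclose(pairwise(pts), pairwise(moved)))  # True: edge features invariant
```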

Keyword: diffusion

Social Biases through the Text-to-Image Generation Lens

  • Authors: Ranjita Naik, Besmira Nushi
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06034
  • Pdf link: https://arxiv.org/pdf/2304.06034
  • Abstract
    Text-to-Image (T2I) generation is enabling new applications that support creators, designers, and general end users of productivity software by generating illustrative content with high photorealism starting from a given descriptive text as a prompt. Such models are, however, trained on massive amounts of web data, which surfaces the peril of potential harmful biases that may leak into the generation process itself. In this paper, we take a multi-dimensional approach to studying and quantifying common social biases as reflected in the generated images, by focusing on how occupations, personality traits, and everyday situations are depicted across representations of (perceived) gender, age, race, and geographical location. Through an extensive set of both automated and human evaluation experiments, we present findings for two popular T2I models: DALLE-v2 and Stable Diffusion. Our results reveal that both models exhibit severe occupational biases under neutral prompts, largely excluding certain groups of people from the results. Such biases can be mitigated by increasing the amount of specification in the prompt itself, although prompt-based mitigation does not address discrepancies in image quality or other uses of the model or its representations in other scenarios. Further, we observe personality traits being associated with only a limited set of people at the intersection of race, gender, and age. Finally, an analysis of geographical location representations in everyday situations (e.g., park, food, weddings) shows that, for most situations, images generated through default location-neutral prompts are closest and most similar to images generated for the United States and Germany.

$E(3) \times SO(3)$-Equivariant Networks for Spherical Deconvolution in Diffusion MRI

  • Authors: Axel Elaldi, Guido Gerig, Neel Dey
  • Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06103
  • Pdf link: https://arxiv.org/pdf/2304.06103
  • Abstract
    We present Roto-Translation Equivariant Spherical Deconvolution (RT-ESD), an $E(3)\times SO(3)$ equivariant framework for sparse deconvolution of volumes where each voxel contains a spherical signal. Such 6D data naturally arises in diffusion MRI (dMRI), a medical imaging modality widely used to measure microstructure and structural connectivity. As each dMRI voxel is typically a mixture of various overlapping structures, there is a need for blind deconvolution to recover crossing anatomical structures such as white matter tracts. Existing dMRI work takes either an iterative or deep learning approach to sparse spherical deconvolution, yet it typically does not account for relationships between neighboring measurements. This work constructs equivariant deep learning layers that respect the symmetries of spatial rotations, reflections, and translations, alongside the symmetries of voxelwise spherical rotations. As a result, RT-ESD improves on previous work across several tasks, including fiber recovery on the DiSCo dataset, deconvolution-derived partial volume estimation on real-world in vivo human brain dMRI, and downstream reconstruction of fiber tractograms on the Tractometer dataset. Our implementation is available at https://github.com/AxelElaldi/e3so3_conv.

PATMAT: Person Aware Tuning of Mask-Aware Transformer for Face Inpainting

  • Authors: Saman Motamed, Jianjin Xu, Chen Henry Wu, Fernando De la Torre
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06107
  • Pdf link: https://arxiv.org/pdf/2304.06107
  • Abstract
    Generative models such as StyleGAN2 and Stable Diffusion have achieved state-of-the-art performance in computer vision tasks such as image synthesis, inpainting, and de-noising. However, current generative models for face inpainting often fail to preserve fine facial details and the identity of the person, despite creating aesthetically convincing image structures and textures. In this work, we propose Person Aware Tuning (PAT) of Mask-Aware Transformer (MAT) for face inpainting, which addresses this issue. Our proposed method, PATMAT, effectively preserves identity by incorporating reference images of a subject and fine-tuning a MAT architecture trained on faces. By using ~40 reference images, PATMAT creates anchor points in MAT's style module, and tunes the model using the fixed anchors to adapt the model to a new face identity. Moreover, PATMAT's use of multiple images per anchor during training allows the model to use fewer reference images than competing methods. We demonstrate that PATMAT outperforms state-of-the-art models in terms of image quality, the preservation of person-specific details, and the identity of the subject. Our results suggest that PATMAT can be a promising approach for improving the quality of personalized face inpainting.

An Edit Friendly DDPM Noise Space: Inversion and Manipulations

  • Authors: Inbar Huberman-Spiegelglas, Vladimir Kulikov, Tomer Michaeli
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06140
  • Pdf link: https://arxiv.org/pdf/2304.06140
  • Abstract
    Denoising diffusion probabilistic models (DDPMs) employ a sequence of white Gaussian noise samples to generate an image. In analogy with GANs, those noise maps could be considered as the latent code associated with the generated image. However, this native noise space does not possess a convenient structure, and is thus challenging to work with in editing tasks. Here, we propose an alternative latent noise space for DDPM that enables a wide range of editing operations via simple means, and present an inversion method for extracting these edit-friendly noise maps for any given image (real or synthetically generated). As opposed to the native DDPM noise space, the edit-friendly noise maps do not have a standard normal distribution and are not statistically independent across timesteps. However, they allow perfect reconstruction of any desired image, and simple transformations on them translate into meaningful manipulations of the output image (e.g., shifting, color edits). Moreover, in text-conditional models, fixing those noise maps while changing the text prompt, modifies semantics while retaining structure. We illustrate how this property enables text-based editing of real images via the diverse DDPM sampling scheme (in contrast to the popular non-diverse DDIM inversion). We also show how it can be used within existing diffusion-based editing methods to improve their quality and diversity.
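
Schematically, DDPM sampling consumes one noise map per step, and the paper's edit-friendly space stores such per-step maps extracted for a given image. The toy recursion below, with placeholder coefficients rather than a trained model, shows only why fixing the noise maps fixes the output, the reconstruction property the paper exploits.

```python
import torch

T, sigma = 10, 0.1
x_T = torch.randn(4)                             # initial noise

def sample(x, noise_maps):
    """Toy stand-in for the DDPM recursion x_{t-1} = mu_t(x_t) + sigma_t * z_t."""
    for z in noise_maps:
        x = 0.99 * x + sigma * z                 # placeholder dynamics
    return x

noise_maps = [torch.randn(4) for _ in range(T)]  # one noise map per step
x0_a = sample(x_T, noise_maps)
x0_b = sample(x_T, noise_maps)                   # replay the same maps
print(torch.equal(x0_a, x0_b))                   # True: maps determine the output
```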

Intriguing properties of synthetic images: from generative adversarial networks to diffusion models

  • Authors: Riccardo Corvi, Davide Cozzolino, Giovanni Poggi, Koki Nagano, Luisa Verdoliva
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06408
  • Pdf link: https://arxiv.org/pdf/2304.06408
  • Abstract
    Detecting fake images is becoming a major goal of computer vision. This need is becoming more and more pressing with the continuous improvement of synthesis methods based on Generative Adversarial Networks (GAN), and even more with the appearance of powerful methods based on Diffusion Models (DM). Towards this end, it is important to gain insight into which image features better discriminate fake images from real ones. In this paper we report on our systematic study of a large number of image generators of different families, aimed at discovering the most forensically relevant characteristics of real and generated images. Our experiments provide a number of interesting observations and shed light on some intriguing properties of synthetic images: (1) not only the GAN models but also the DM and VQ-GAN (Vector Quantized Generative Adversarial Networks) models give rise to visible artifacts in the Fourier domain and exhibit anomalous regular patterns in the autocorrelation; (2) when the dataset used to train the model lacks sufficient variety, its biases can be transferred to the generated images; (3) synthetic and real images exhibit significant differences in the mid-high frequency signal content, observable in their radial and angular spectral power distributions.

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

  • Authors: Enze Xie, Lewei Yao, Han Shi, Zhili Liu, Daquan Zhou, Zhaoqiang Liu, Jiawei Li, Zhenguo Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06648
  • Pdf link: https://arxiv.org/pdf/2304.06648
  • Abstract
    Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications. This paper proposes DiffFit, a parameter-efficient strategy for fine-tuning large pre-trained diffusion models that enables fast adaptation to new domains. DiffFit is embarrassingly simple: it fine-tunes only the bias terms and newly added scaling factors in specific layers, yet results in significant training speed-ups and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves a 2$\times$ training speed-up and only needs to store approximately 0.12% of the total model parameters. An intuitive theoretical analysis is provided to justify the efficacy of the scaling factors for fast adaptation. On 8 downstream datasets, DiffFit achieves superior or competitive performance compared to full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one at minimal cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on the ImageNet 512$\times$512 benchmark by fine-tuning for only 25 epochs from a public pre-trained ImageNet 256$\times$256 checkpoint, while being 30$\times$ more training-efficient than the closest competitor.
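
The bias-plus-scale recipe can be sketched in a few lines of PyTorch; the ScaledLinear wrapper and its placement are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ScaledLinear(nn.Module):
    """Wraps a linear layer with a newly added learnable scale factor."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.linear = linear
        self.gamma = nn.Parameter(torch.ones(1))  # newly added scaling factor

    def forward(self, x):
        return self.gamma * self.linear(x)

model = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 16))
model[0] = ScaledLinear(model[0])

for p in model.parameters():          # freeze everything...
    p.requires_grad = False
for name, p in model.named_parameters():
    if name.endswith("bias") or name.endswith("gamma"):
        p.requires_grad = True        # ...except biases and scale factors

print([n for n, p in model.named_parameters() if p.requires_grad])
```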

Learning Controllable 3D Diffusion Models from Single-view Images

  • Authors: Jiatao Gu, Qingzhe Gao, Shuangfei Zhai, Baoquan Chen, Lingjie Liu, Josh Susskind
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06700
  • Pdf link: https://arxiv.org/pdf/2304.06700
  • Abstract
    Diffusion models have recently become the de-facto approach for generative modeling in the 2D domain. However, extending diffusion models to 3D is challenging due to the difficulties in acquiring 3D ground truth data for training. On the other hand, 3D GANs that integrate implicit 3D representations into GANs have shown remarkable 3D-aware generation when trained only on single-view image datasets. However, 3D GANs do not provide straightforward ways to precisely control image synthesis. To address these challenges, we present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis for single-view datasets. Control3Diff explicitly models the underlying latent distribution (optionally conditioned on external inputs), thus enabling direct control during the diffusion process. Moreover, our approach is general and applicable to any type of controlling input, allowing us to train it with the same diffusion objective without any auxiliary supervision. We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet, using various conditioning inputs such as images, sketches, and text prompts. Please see the project website (\url{https://jiataogu.me/control3diff}) for video comparisons.

DiffusionRig: Learning Personalized Priors for Facial Appearance Editing

  • Authors: Zheng Ding, Xuaner Zhang, Zhihao Xia, Lars Jebe, Zhuowen Tu, Xiuming Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.06711
  • Pdf link: https://arxiv.org/pdf/2304.06711
  • Abstract
    We address the problem of learning person-specific facial priors from a small number (e.g., 20) of portrait photos of the same person. This enables us to edit this specific person's facial appearance, such as expression and lighting, while preserving their identity and high-frequency facial details. Key to our approach, which we dub DiffusionRig, is a diffusion model conditioned on, or "rigged by," crude 3D face models estimated from single in-the-wild images by an off-the-shelf estimator. On a high level, DiffusionRig learns to map simplistic renderings of 3D face models to realistic photos of a given person. Specifically, DiffusionRig is trained in two stages: It first learns generic facial priors from a large-scale face dataset and then person-specific priors from a small portrait photo collection of the person of interest. By learning the CGI-to-photo mapping with such personalized priors, DiffusionRig can "rig" the lighting, facial expression, head pose, etc. of a portrait photo, conditioned only on coarse 3D models while preserving this person's identity and other high-frequency characteristics. Qualitative and quantitative experiments show that DiffusionRig outperforms existing approaches in both identity preservation and photorealism. Please see the project website: https://diffusionrig.github.io for the supplemental material, video, code, and data.

Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction

  • Authors: Hansheng Chen, Jiatao Gu, Anpei Chen, Wei Tian, Zhuowen Tu, Lingjie Liu, Hao Su
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06714
  • Pdf link: https://arxiv.org/pdf/2304.06714
  • Abstract
    3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images. Despite numerous task-specific methods, developing a comprehensive model remains challenging. In this paper, we present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects. Previous studies have used two-stage approaches that rely on pretrained NeRFs as real data to train diffusion models. In contrast, we propose a new single-stage training paradigm with an end-to-end objective that jointly optimizes a NeRF auto-decoder and a latent diffusion model, enabling simultaneous 3D reconstruction and prior learning, even from sparsely available views. At test time, we can directly sample the diffusion prior for unconditional generation, or combine it with arbitrary observations of unseen objects for NeRF reconstruction. SSDNeRF demonstrates robust results comparable to or better than leading task-specific methods in unconditional generation and single/sparse-view 3D reconstruction.

Expressive Text-to-Image Generation with Rich Text

  • Authors: Songwei Ge, Taesung Park, Jun-Yan Zhu, Jia-Bin Huang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06720
  • Pdf link: https://arxiv.org/pdf/2304.06720
  • Abstract
    Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnote. We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region based on cross-attention maps of a vanilla diffusion process using plain text. For each region, we enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance. We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines with quantitative evaluations.

Keyword: dynamic

Fairness: from the ethical principle to the practice of Machine Learning development as an ongoing agreement with stakeholders

  • Authors: Georgina Curto, Flavio Comim
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06031
  • Pdf link: https://arxiv.org/pdf/2304.06031
  • Abstract
    This paper clarifies why bias cannot be completely mitigated in Machine Learning (ML) and proposes an end-to-end methodology to translate the ethical principle of justice and fairness into the practice of ML development as an ongoing agreement with stakeholders. The pro-ethical iterative process presented in the paper aims to challenge asymmetric power dynamics in the fairness decision making within ML design and support ML development teams to identify, mitigate and monitor bias at each step of ML systems development. The process also provides guidance on how to explain the always imperfect trade-offs in terms of bias to users.

Web 3.0: The Future of Internet

  • Authors: Wensheng Gan, Zhenqiang Ye, Shicheng Wan, Philip S. Yu
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.06032
  • Pdf link: https://arxiv.org/pdf/2304.06032
  • Abstract
    With the rapid growth of the Internet, human daily life has become deeply bound to it. To take advantage of the massive amounts of data and information on the Internet, the Web architecture is continuously being reinvented and upgraded. From the static informative characteristics of Web 1.0 to the dynamic interactive features of Web 2.0, scholars and engineers have worked hard to make the Internet world more open, inclusive, and equal. Indeed, the next generation of Web evolution (i.e., Web 3.0) is already coming and shaping our lives. Web 3.0 is a decentralized Web architecture that is more intelligent and safer than before. The risks and ruin posed by monopolists or criminals will be greatly reduced by a complete reconstruction of the Internet and IT infrastructure. In short, Web 3.0 is capable of addressing web data ownership through distributed technology. It will optimize the Internet world from the perspectives of economy, culture, and technology, and it promotes novel content production methods, organizational structures, and economic forms. However, Web 3.0 is not yet mature and is still being disputed. Herein, this paper presents a comprehensive survey of Web 3.0, with a focus on current technologies, challenges, opportunities, and outlook. This article first gives a brief overview of the history of the World Wide Web as well as several differences among Web 1.0, Web 2.0, Web 3.0, and Web3. Then, some technical implementations of Web 3.0 are illustrated in detail. We discuss the revolution and benefits that Web 3.0 brings. Finally, we explore several challenges and issues in this promising area.

Learning solution of nonlinear constitutive material models using physics-informed neural networks: COMM-PINN

  • Authors: Shahed Rezaei, Ahmad Moeineddin, Ali Harandi
  • Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06044
  • Pdf link: https://arxiv.org/pdf/2304.06044
  • Abstract
    We applied physics-informed neural networks to solve the constitutive relations for nonlinear, path-dependent material behavior. As a result, the trained network not only satisfies all thermodynamic constraints but also instantly provides information about the current material state (i.e., free energy, stress, and the evolution of internal variables) under any given loading scenario, without requiring initial data. One advantage of this work is that it bypasses the repetitive Newton iterations needed to solve the nonlinear equations in complex material models. Additionally, strategies are provided to reduce the required order of differentiation for obtaining the tangent operator. The trained model can be directly used in any finite element package (or other numerical methods) as a user-defined material model. However, challenges remain in the proper definition of collocation points and in integrating several non-equality constraints that become active or inactive simultaneously. We tested this methodology on rate-independent processes such as the classical von Mises plasticity model with a nonlinear hardening law, as well as local damage models for interface cracking behavior with a nonlinear softening law. Finally, we discuss the potential and the remaining challenges for future developments of this new approach.

Primal-Dual Contextual Bayesian Optimization for Control System Online Optimization with Time-Average Constraints

  • Authors: Wenjie Xu, Yuning Jiang, Bratislav Svetozarevic, Colin N. Jones
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.06104
  • Pdf link: https://arxiv.org/pdf/2304.06104
  • Abstract
    This paper studies the problem of online performance optimization of constrained closed-loop control systems, where both the objective and the constraints are unknown black-box functions affected by exogenous time-varying contextual disturbances. A primal-dual contextual Bayesian optimization algorithm is proposed that achieves sublinear cumulative regret with respect to the dynamic optimal solution under certain regularity conditions. Furthermore, the algorithm achieves zero time-average constraint violation, ensuring that the average value of the constraint function satisfies the desired constraint. The method is applied to both sampled instances from Gaussian processes and a continuous stirred tank reactor parameter tuning problem; simulation results show that the method simultaneously provides close-to-optimal performance and maintains constraint feasibility on average. This contrasts current state-of-the-art methods, which either suffer from large cumulative regret or severe constraint violations for the case studies presented.

IoT trust and reputation: a survey and taxonomy

  • Authors: Muhammad Aaqib, Aftab Ali, Liming Chen, Omar Nibouche
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.06119
  • Pdf link: https://arxiv.org/pdf/2304.06119
  • Abstract
    IoT is one of the fastest-growing technologies, and it is estimated that more than a billion devices will be in use across the globe by the end of 2030. To maximize the capability of these connected entities, trust and reputation among IoT entities is essential. Several trust management models have been proposed for the IoT environment; however, these schemes do not fully address IoT device features, such as device role, device type, and dynamic behavior in a smart environment. As a result, traditional trust and reputation models are insufficient to tackle these characteristics and the uncertainty risks of connecting nodes to the network. While studies continue and various articles suggest promising solutions in constrained environments, research on trust and reputation is still in its infancy. In this paper, we carry out a comprehensive literature review of state-of-the-art research on the trust and reputation of IoT devices and systems. Specifically, we first propose a new structure, namely a new taxonomy, to organize trust and reputation models based on the ways trust is managed. The proposed taxonomy comprises traditional trust-management-based systems and artificial-intelligence-based systems, and it combines both classes, encouraging existing schemes to adopt these emerging concepts. This collaboration between conventional mathematical models and advanced ML models results in designs that are more robust and efficient. We then drill down to compare and analyse the methods and applications of these systems based on community-accepted performance metrics, e.g. scalability, delay, cooperativeness, and efficiency. Finally, building upon the findings of the analysis, we identify and discuss open research issues and challenges, and we further speculate on and point out future research directions.

Robust and Context-Aware Real-Time Collaborative Robot Handling via Dynamic Gesture Commands

  • Authors: Rui Chen, Alvin Shek, Changliu Liu
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06175
  • Pdf link: https://arxiv.org/pdf/2304.06175
  • Abstract
    This paper studies real-time collaborative robot (cobot) handling, where the cobot maneuvers an object under human dynamic gesture commands. Enabling dynamic gesture commands is useful when the human needs to avoid direct contact with the robot or the object handled by the robot. However, the key challenge lies in the heterogeneity of human behaviors and the stochasticity in the perception of dynamic gestures, which requires the robot handling policy to be adaptable and robust. To address these challenges, we introduce the Conditional Collaborative Handling Process (CCHP) to encode a context-aware cobot handling policy, along with a procedure to learn such a policy from human-human collaboration. We thoroughly evaluate the adaptability and robustness of CCHP and apply our approach to a real-time cobot assembly task with a Kinova Gen3 robot arm. Results show that our method leads to significantly less human effort and smoother human-robot collaboration than a state-of-the-art rule-based approach, even with first-time users.

Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

  • Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.06178
  • Pdf link: https://arxiv.org/pdf/2304.06178
  • Abstract
    Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning more finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data, while maintaining computational efficiency, which is substantially faster than the baseline method NeuralRGBD.

Do "bad" citations have "good" effects?

  • Authors: Honglin Bao, Misha Teplitskiy
  • Subjects: Digital Libraries (cs.DL); Computers and Society (cs.CY); Multiagent Systems (cs.MA); Adaptation and Self-Organizing Systems (nlin.AO)
  • Arxiv link: https://arxiv.org/abs/2304.06190
  • Pdf link: https://arxiv.org/pdf/2304.06190
  • Abstract
    The scientific community generally discourages authors of research papers from citing papers that did not influence them, because such "rhetorical" citations are assumed to degrade the literature and the incentives for good work. Intuitively, a world where authors cite only substantively appears attractive. We argue that mandating substantive citing may have underappreciated consequences for the allocation of attention and for dynamism. We develop a novel agent-based model in which agents cite both substantively and rhetorically. Agents first select papers to read based on their expected quality, read them and observe their actual quality, become influenced by those that are sufficiently good, and substantively cite them. Next, agents fill any remaining slots in their reference lists with papers that support their claims, regardless of whether those papers were actually influential. By turning rhetorical citing on and off, we find that rhetorical citing increases the correlation between quality and citations, increases citation churn, and reduces citation inequality. This occurs because rhetorical citing redistributes some citations from a stable set of elite-quality papers to a more dynamic set with high-to-moderate quality and high rhetorical value. Increasing the size of reference lists, often seen as an undesirable trend, amplifies these effects. In sum, rhetorical citing helps deconcentrate attention and makes it easier to displace incumbent ideas, so whether it is indeed undesirable depends on the metrics used to judge desirability.

Learning Over All Contracting and Lipschitz Closed-Loops for Partially-Observed Nonlinear Systems

  • Authors: Nicholas H. Barbara, Ruigang Wang, Ian R. Manchester
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.06193
  • Pdf link: https://arxiv.org/pdf/2304.06193
  • Abstract
    This paper presents a policy parameterization for learning-based control on nonlinear, partially-observed dynamical systems. The parameterization is based on a nonlinear version of the Youla parameterization and the recently proposed Recurrent Equilibrium Network (REN) class of models. We prove that the resulting Youla-REN parameterization automatically satisfies stability (contraction) and user-tunable robustness (Lipschitz) conditions on the closed-loop system. This means it can be used for safe learning-based control with no additional constraints or projections required to enforce stability or robustness. We test the new policy class in simulation on two reinforcement learning tasks: 1) magnetic suspension, and 2) inverting a rotary-arm pendulum. We find that the Youla-REN performs similarly to existing learning-based and optimal control methods while also ensuring stability and exhibiting improved robustness to adversarial disturbances.

Sub-Optimal Moving Horizon Estimation in Feedback Control of Linear Constrained Systems

  • Authors: Yujia Yang, Chris Manzie, Ye Pu
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06216
  • Pdf link: https://arxiv.org/pdf/2304.06216
  • Abstract
    Moving horizon estimation (MHE) offers benefits relative to other estimation approaches through its ability to explicitly handle constraints, but it suffers from increased computation cost. To help enable MHE on platforms with limited computation power, we propose to solve the optimization problem underlying MHE sub-optimally, with a fixed number of optimization iterations per time step. The stability of the closed-loop system is analyzed using the small-gain theorem by considering the closed-loop controlled system, the optimization algorithm dynamics, and the estimation error dynamics as three interconnected subsystems. By assuming incremental input/output-to-state stability (δ-IOSS) of the system and imposing standard ISS conditions on the controller, we derive conditions on the iteration number such that the interconnected system is input-to-state stable (ISS) w.r.t. the external disturbances. A simulation using an MHE-MPC estimator-controller pair is used to validate the results.
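
The fixed-iteration idea can be illustrated on a toy scalar estimation problem: at each time step, run only a few optimizer iterations on the horizon cost instead of solving to convergence. The model, cost, step size, and iteration budget below are made up for illustration.

```python
def mhe_cost(x0, ys, a=0.9):
    """Horizon cost: fit the scalar model x_{k+1} = a * x_k to measurements."""
    xs = [x0]
    for _ in ys[1:]:
        xs.append(a * xs[-1])
    return sum((x - y) ** 2 for x, y in zip(xs, ys))

ys = [1.0, 0.85, 0.80, 0.72]      # toy measurements over the horizon
x0, lr, iters = 0.0, 0.05, 3      # fixed, small per-step iteration budget

for _ in range(iters):            # sub-optimal: stop after `iters` iterations
    eps = 1e-5                    # finite-difference gradient of the cost
    grad = (mhe_cost(x0 + eps, ys) - mhe_cost(x0 - eps, ys)) / (2 * eps)
    x0 -= lr * grad

print("initial-state estimate after a fixed iteration budget:", x0)
```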

Physics-informed radial basis network (PIRBN): A local approximation neural network for solving nonlinear PDEs

  • Authors: Jinshuai Bai, Gui-Rong Liu, Ashish Gupta, Laith Alzubaidi, Xi-Qiao Feng, YuanTong Gu
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.06234
  • Pdf link: https://arxiv.org/pdf/2304.06234
  • Abstract
    Our recent intensive study has found that physics-informed neural networks (PINNs) tend to be local approximators after training. This observation leads to the novel physics-informed radial basis network (PIRBN), which can maintain the local property throughout the entire training process. Compared to deep neural networks, a PIRBN comprises only one hidden layer and a radial basis "activation" function. Under appropriate conditions, we demonstrate that the training of PIRBNs using gradient descent methods can converge to Gaussian processes. Besides, we study the training dynamics of PIRBN via neural tangent kernel (NTK) theory. In addition, comprehensive investigations of the initialisation strategies of PIRBN were conducted. Based on numerical examples, PIRBN has been demonstrated to be more effective and efficient than PINN in solving PDEs with high-frequency features and ill-posed computational domains. Moreover, existing PINN numerical techniques, such as adaptive learning, decomposition, and different types of loss functions, are applicable to PIRBN. The programs that can regenerate all numerical results can be found at https://github.com/JinshuaiBai/PIRBN.
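
A single-hidden-layer radial basis network of the kind PIRBN builds on can be written compactly; the physics-informed loss and the paper's initialisation strategies are not reproduced in this sketch.

```python
import torch
import torch.nn as nn

class RBN(nn.Module):
    """One hidden layer of Gaussian radial basis units, a local approximator."""
    def __init__(self, n_centers=50):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(0.0, 1.0, n_centers).unsqueeze(1))
        self.log_width = nn.Parameter(torch.zeros(n_centers))
        self.weights = nn.Parameter(0.1 * torch.randn(n_centers))

    def forward(self, x):                        # x: (batch, 1)
        d2 = ((x.unsqueeze(1) - self.centers.unsqueeze(0)) ** 2).sum(-1)
        phi = torch.exp(-torch.exp(self.log_width) * d2)  # local RBF "activation"
        return phi @ self.weights

net = RBN()
print(net(torch.rand(8, 1)).shape)               # torch.Size([8])
```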

Loosely Coupled Odometry, UWB Ranging, and Cooperative Spatial Detection for Relative Monte-Carlo Multi-Robot Localization

  • Authors: Xianjia Yu, Paola Torrico Morón, Sahar Salimpour, Jorge Peña Queralta, Tomi Westerlund
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06264
  • Pdf link: https://arxiv.org/pdf/2304.06264
  • Abstract
    As mobile robots become more ubiquitous, their deployments grow across use cases where GNSS positioning is either unavailable or unreliable. This has led to increased interest in multi-modal relative localization methods. Complementing onboard odometry, ranging allows for relative state estimation, with ultra-wideband (UWB) ranging having gained widespread recognition due to its low cost and centimeter-level out-of-box accuracy. Infrastructure-free localization methods allow for more dynamic, ad-hoc, and flexible deployments, yet they have received less attention from the research community. In this work, we propose a cooperative relative multi-robot localization where we leverage inter-robot ranging and simultaneous spatial detections of objects in the environment. To achieve this, we equip robots with a single UWB transceiver and a stereo camera. We propose a novel Monte-Carlo approach to estimate relative states by either employing only UWB ranges or dynamically integrating simultaneous spatial detections from the stereo cameras. We also address the challenges for UWB ranging error mitigation, especially in non-line-of-sight, with a study on different LSTM networks to estimate the ranging error. The proposed approach has multiple benefits. First, we show that a single range is enough to estimate the accurate relative states of two robots when fusing odometry measurements. Second, our experiments also demonstrate that our approach surpasses traditional methods such as multilateration in terms of accuracy. Third, to increase accuracy even further, we allow for the integration of cooperative spatial detections. Finally, we show how ROS 2 and Zenoh can be integrated to build a scalable wireless communication solution for multi-robot systems. The experimental validation includes real-time deployment and autonomous navigation based on the relative positioning method.

Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning

  • Authors: Wenli Xiao, Yiwei Lyu, John Dolan
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.06281
  • Pdf link: https://arxiv.org/pdf/2304.06281
  • Abstract
    Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases. Although shielding with Linear Temporal Logic (LTL) is a promising formal method to ensure safety in single-agent Reinforcement Learning (RL), it results in conservative behaviors when scaling to multi-agent scenarios. Additionally, it poses computational challenges for synthesizing shields in complex multi-agent environments. This work introduces Model-based Dynamic Shielding (MBDS) to support MARL algorithm design. Our algorithm synthesizes distributive shields, which are reactive systems running in parallel with each MARL agent, to monitor and rectify unsafe behaviors. The shields can dynamically split, merge, and recompute based on agents' states. This design enables efficient synthesis of shields to monitor agents in complex environments without coordination overheads. We also propose an algorithm to synthesize shields without prior knowledge of the dynamics model. The proposed algorithm obtains an approximate world model by interacting with the environment during the early stage of exploration, making our MBDS enjoy formal safety guarantees with high probability. We demonstrate in simulations that our framework can surpass existing baselines in terms of safety guarantees and learning performance.

Neural State-Space Models: Empirical Evaluation of Uncertainty Quantification

  • Authors: Marco Forgione, Dario Piga
  • Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06349
  • Pdf link: https://arxiv.org/pdf/2304.06349
  • Abstract
    Effective quantification of uncertainty is an essential and still missing step towards a greater adoption of deep-learning approaches in different applications, including mission-critical ones. In particular, investigations on the predictive uncertainty of deep-learning models describing non-linear dynamical systems are very limited to date. This paper is aimed at filling this gap and presents preliminary results on uncertainty quantification for system identification with neural state-space models. We frame the learning problem in a Bayesian probabilistic setting and obtain posterior distributions for the neural network's weights and outputs through approximate inference techniques. Based on the posterior, we construct credible intervals on the outputs and define a surprise index which can effectively diagnose usage of the model in a potentially dangerous out-of-distribution regime, where predictions cannot be trusted.

Emergence of Symbols in Neural Networks for Semantic Understanding and Communication

  • Authors: Yang Chen, Liangxuan Guo, Shan Yu
  • Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Symbolic Computation (cs.SC); Neurons and Cognition (q-bio.NC)
  • Arxiv link: https://arxiv.org/abs/2304.06377
  • Pdf link: https://arxiv.org/pdf/2304.06377
  • Abstract
    Being able to create meaningful symbols and proficiently use them for higher cognitive functions such as communication, reasoning, and planning is essential to and unique for human intelligence. Current deep neural networks are still far behind humans' ability to create symbols for such higher cognitive functions. Here we propose a solution, named SEA-net, to endow neural networks with the ability of symbol creation, semantic understanding, and communication. SEA-net generates symbols that dynamically configure the network to perform specific tasks. These symbols capture compositional semantic information that enables the system to acquire new functions purely by symbolic manipulation or communication. In addition, we found that these self-generated symbols exhibit an intrinsic structure resembling that of natural language, suggesting a common framework underlying the generation and understanding of symbols in both human brains and artificial neural networks. We hope this will be instrumental in producing more capable systems in the future that synergize the strengths of connectionist and symbolic approaches for AI.

Energy-Efficient GPU Clusters Scheduling for Deep Learning

  • Authors: Diandian Gu, Xintong Xie, Gang Huang, Xin Jin, Xuanzhe Liu
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.06381
  • Pdf link: https://arxiv.org/pdf/2304.06381
  • Abstract
    Training deep neural networks (DNNs) is a major workload in datacenters today, resulting in tremendously fast growth of energy consumption. It is important to reduce energy consumption while still completing DL training jobs early. In this paper, we propose PowerFlow, a GPU cluster scheduler that reduces the average Job Completion Time (JCT) under an energy budget. We first present performance models for DL training jobs to predict throughput and energy consumption under different configurations. Based on these performance models, PowerFlow dynamically allocates GPUs and adjusts the GPU-level or job-level configurations of DL training jobs. PowerFlow applies network packing and buddy allocation to job placement, thus avoiding the extra energy consumed by cluster fragmentation. Evaluation results show that, under the same energy consumption, PowerFlow improves the average JCT by 1.57 - 3.39x compared to competitive baselines.

TransHP: Image Classification with Hierarchical Prompting

  • Authors: Wenhao Wang, Yifan Sun, Wei Li, Yi Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06385
  • Pdf link: https://arxiv.org/pdf/2304.06385
  • Abstract
    This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task. Different from prior HIC methods, our hierarchical prompting is the first to explicitly inject ancestor-class information as a tokenized hint that benefits descendant-class discrimination. We think it well imitates human visual recognition, i.e., humans may use the ancestor class as a prompt to draw focus to the subtle differences among descendant classes. We model this prompting mechanism as a Transformer with Hierarchical Prompting (TransHP). TransHP consists of three steps: 1) learning a set of prompt tokens to represent the coarse (ancestor) classes, 2) on-the-fly predicting the coarse class of the input image at an intermediate block, and 3) injecting the prompt token of the predicted coarse class into the intermediate feature. Though the parameters of TransHP remain the same for all input images, the injected coarse-class prompt conditions (modifies) the subsequent feature extraction and encourages a dynamic focus on the relatively subtle differences among descendant classes. Extensive experiments show that TransHP improves image classification in accuracy (e.g., improving ViT-B/16 by +2.83% in ImageNet classification accuracy), training data efficiency (e.g., a +12.69% improvement with 10% of the ImageNet training data), and model explainability. Moreover, TransHP also performs favorably against prior HIC methods, showing that TransHP well exploits the hierarchical information.
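
The flavour of step 3 can be sketched as adding a learned ancestor-class embedding to intermediate token features; the real method injects learned prompt tokens inside a Transformer block, so the snippet below is a loose, assumed simplification.

```python
import torch
import torch.nn as nn

n_coarse, dim = 10, 64
prompts = nn.Embedding(n_coarse, dim)    # one learnable prompt per ancestor class

tokens = torch.randn(2, 197, dim)        # intermediate ViT features (B, N, D)
coarse_pred = torch.tensor([3, 7])       # on-the-fly coarse-class predictions

tokens = tokens + prompts(coarse_pred)[:, None, :]  # condition later blocks
print(tokens.shape)                      # torch.Size([2, 197, 64])
```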

Communicating Actor Automata -- Modelling Erlang Processes as Communicating Machines

  • Authors: Dominic Orchard (University of Kent, UK), Mihail Munteanu (Masabi Ltd.), Paulo Torrens (University of Kent, UK)
  • Subjects: Programming Languages (cs.PL)
  • Arxiv link: https://arxiv.org/abs/2304.06395
  • Pdf link: https://arxiv.org/pdf/2304.06395
  • Abstract
    Brand and Zafiropulo's notion of Communicating Finite-State Machines (CFSMs) provides a succinct and powerful model of message-passing concurrency, based around channels. However, a major variant of message-passing concurrency is not readily captured by CFSMs: the actor model. In this work, we define a variant of CFSMs, called Communicating Actor Automata, to capture the actor model of concurrency as provided by Erlang: with mailboxes, from which messages are received according to repeated application of pattern matching. Furthermore, this variant of CFSMs supports dynamic process topologies, capturing common programming idioms in the context of actor-based message-passing concurrency. This gives a new basis for modelling, specifying, and verifying Erlang programs. We also consider a class of CAAs that give rise to freedom from race conditions.

Event-based tracking of human hands

  • Authors: Laura Duarte, Mohammad Safeea, Pedro Neto
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06534
  • Pdf link: https://arxiv.org/pdf/2304.06534
  • Abstract
    This paper proposes a novel method for tracking human hands using data from an event camera. The event camera detects changes in brightness, measuring motion with low latency, no motion blur, low power consumption, and high dynamic range. Captured frames are analysed using lightweight algorithms that report 3D hand position data. The chosen pick-and-place scenario serves as an example input for collaborative human-robot interactions and for obstacle avoidance in human-robot safety applications. Event data are pre-processed into intensity frames. The regions of interest (ROI) are defined through object-edge event activity, reducing noise. ROI features are extracted for use in depth perception. Event-based tracking of human hands is demonstrated to be feasible in real time and at a low computational cost. The proposed ROI-finding method reduces noise from intensity images, achieving up to 89% data reduction relative to the original while preserving the features. The depth estimation error relative to ground truth (measured with wearables), computed using dynamic time warping and a single event camera, is from 15 to 30 millimetres, depending on the plane in which it is measured. In summary, human hands are tracked in 3D space using data from a single event camera and lightweight algorithms that define ROI features.
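
ROI selection from event activity can be approximated by thresholding per-pixel event counts; the resolution, synthetic events, and threshold in this NumPy sketch are assumptions for illustration, not the paper's parameters.

```python
import numpy as np

H, W = 260, 346                        # assumed event-camera resolution
xs = np.random.randint(0, W, 5000)     # synthetic event x coordinates
ys = np.random.randint(0, H, 5000)     # synthetic event y coordinates

activity = np.zeros((H, W))
np.add.at(activity, (ys, xs), 1)       # per-pixel event counts

mask = activity > activity.mean() + 2 * activity.std()  # assumed threshold
rows, cols = np.nonzero(mask)
if rows.size:
    print("ROI (y0, y1, x0, x1):", rows.min(), rows.max(), cols.min(), cols.max())
```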

DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos

  • Authors: Qi Zhao, M. Salman Asif, Zhan Ma
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06544
  • Pdf link: https://arxiv.org/pdf/2304.06544
  • Abstract
    Existing implicit neural representation (INR) methods do not fully exploit spatiotemporal redundancies in videos. Index-based INRs ignore the content-specific spatial features and hybrid INRs ignore the contextual dependency on adjacent frames, leading to poor modeling capability for scenes with large motion or dynamics. We analyze this limitation from the perspective of function fitting and reveal the importance of frame difference. To use explicit motion information, we propose Difference Neural Representation for Videos (DNeRV), which consists of two streams for content and frame difference. We also introduce a collaborative content unit for effective feature fusion. We test DNeRV for video compression, inpainting, and interpolation. DNeRV achieves competitive results against the state-of-the-art neural compression approaches and outperforms existing implicit methods on downstream inpainting and interpolation for $960 \times 1920$ videos.
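
A minimal PyTorch sketch of the two-stream input described above; the encoders and the collaborative content unit are stubbed (flatten and concatenate), since their architecture is the paper's contribution:

```python
# Sketch of the content + frame-difference input streams (stand-in encoders;
# the paper's collaborative content unit is stubbed here as a concatenation).
import torch

frames = torch.randn(8, 3, 64, 64)                       # (time, C, H, W) toy video
diff = frames[1:] - frames[:-1]                          # explicit motion information
diff = torch.cat([torch.zeros_like(frames[:1]), diff])   # align lengths

content_feat = frames.flatten(1)                         # stand-in content encoder
diff_feat = diff.flatten(1)                              # stand-in difference encoder
fused = torch.cat([content_feat, diff_feat], dim=1)      # stub for the fusion unit
print(fused.shape)                                       # torch.Size([8, 24576])
```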

Class-Incremental Learning of Plant and Disease Detection: Growing Branches with Knowledge Distillation

  • Authors: Mathieu Pagé Fortin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06619
  • Pdf link: https://arxiv.org/pdf/2304.06619
  • Abstract
    This paper investigates the problem of class-incremental object detection for agricultural applications, where a model needs to learn new plant species and diseases incrementally without forgetting the previously learned ones. We adapt two public datasets to include new categories over time, simulating a more realistic and dynamic scenario. We then compare three class-incremental learning methods that leverage different forms of knowledge distillation to mitigate catastrophic forgetting. Our experiments show that all three methods suffer from catastrophic forgetting, but the recent Dynamic Y-KD approach, which additionally uses a dynamic architecture that grows new branches to learn new tasks, outperforms ILOD and Faster-ILOD in most scenarios on both new and old classes. These results highlight the challenges and opportunities of continual object detection for agricultural applications. In particular, the large intra-class and small inter-class variability that is typical of plant images exacerbates the difficulty of learning new categories without interfering with previous knowledge. We publicly release our code to encourage future work.

Robustness Measures and Monitors for Time Window Temporal Logic

  • Authors: Ahmad Ahmad, Cristian-Ioan Vasile, Roberto Tron, Calin Belta
  • Subjects: Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.06645
  • Pdf link: https://arxiv.org/pdf/2304.06645
  • Abstract
    Temporal logics (TLs) have been widely used to formalize interpretable tasks for cyber-physical systems. Time Window Temporal Logic (TWTL) has been recently proposed as a specification language for dynamical systems. In particular, it can easily express robotic tasks, and it allows for efficient, automata-based verification and synthesis of control policies for such systems. In this paper, we define two quantitative semantics for this logic, and two corresponding monitoring algorithms, which allow for real-time quantification of satisfaction of formulas by trajectories of discrete-time systems. We demonstrate the new semantics and their runtime monitors on numerical examples.
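
The paper defines the precise quantitative semantics for TWTL; as a flavour of such measures, here is a hedged sketch of window-based robustness for a simple threshold predicate, with min/max aggregation standing in for the logic's temporal operators:

```python
# Hedged sketch of window-based robustness monitoring on a discrete-time
# trajectory for the predicate x > c; illustrative only, not TWTL's semantics.
def rho_always(x, a, b, c):
    # Positive iff x stays above c throughout the window [a, b].
    return min(x[t] - c for t in range(a, b + 1))

def rho_eventually(x, a, b, c):
    # Positive iff x exceeds c at some time in the window [a, b].
    return max(x[t] - c for t in range(a, b + 1))

traj = [0.2, 0.8, 1.5, 1.1, 0.4, 0.9]
print(rho_always(traj, 1, 3, 0.5))      # 0.3  -> satisfied with margin 0.3
print(rho_eventually(traj, 4, 5, 1.0))  # -0.1 -> violated by margin 0.1
```

The sign of the measure indicates satisfaction or violation, and its magnitude quantifies the margin, which is what makes such semantics useful for runtime monitoring.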

ProtoDiv: Prototype-guided Division of Consistent Pseudo-bags for Whole-slide Image Classification

  • Authors: Rui Yang, Pei Liu, Luping Ji
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06652
  • Pdf link: https://arxiv.org/pdf/2304.06652
  • Abstract
    Due to the scarcity of Whole-Slide Image (WSI) samples with only weak labels, pseudo-bag-based multiple instance learning (MIL) appears as a vibrant prospect in WSI classification. However, the pseudo-bag dividing scheme, often crucial for classification performance, is still an open topic worth exploring. This paper therefore proposes a novel scheme, ProtoDiv, that uses a bag prototype to guide the division of WSI pseudo-bags. Rather than designing a complex network architecture, this scheme takes a plug-and-play approach to safely augment WSI data for effective training while preserving sample consistency. Furthermore, we devise an attention-based prototype that is optimized dynamically during training to adapt to the classification task. We apply our ProtoDiv scheme to seven baseline models and carry out a group of comparison experiments on two public WSI datasets. The experiments confirm that ProtoDiv usually brings clear performance improvements to WSI classification.

D-SVM over Networked Systems with Non-Ideal Linking Conditions

  • Authors: Mohammadreza Doostmohammadian, Alireza Aghasi, Houman Zarrabi
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Dynamical Systems (math.DS); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.06667
  • Pdf link: https://arxiv.org/pdf/2304.06667
  • Abstract
    This paper considers distributed optimization algorithms, with application to binary classification via distributed support vector machines (D-SVM) over multi-agent networks subject to link nonlinearities. The agents cooperatively solve a consensus-constrained distributed optimization problem via continuous-time dynamics, while the links are subject to strongly sign-preserving odd nonlinear conditions. Logarithmic quantization and clipping (saturation) are two examples of such nonlinearities. In contrast to existing literature, which mostly considers ideal links and perfect information exchange over linear channels, we show how general sector-bounded models affect the convergence to the optimizer (i.e., the SVM classifier) over dynamic balanced directed networks. In general, any odd sector-bounded nonlinear mapping can be applied to our dynamics. The main challenge is to show that the proposed system dynamics always have one zero eigenvalue (associated with the consensus) while the other eigenvalues all have negative real parts. This is done by recalling arguments from matrix perturbation theory. The solution is then shown to converge to the agreement state under certain conditions; for example, the gradient-tracking (GT) step-size bound is tighter than in the linear case by factors related to the upper/lower sector bounds. To the best of our knowledge, no existing work in the distributed optimization and learning literature considers non-ideal link conditions.
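
For concreteness, minimal implementations of the two example link nonlinearities named above, both odd and sign-preserving; the parameter values are illustrative:

```python
# Illustrative examples of odd, sector-bounded link nonlinearities
# (clipping and logarithmic quantization); parameter values are arbitrary.
import numpy as np

def clip_link(z, sat=1.0):
    # Clipping (saturation): odd and sign-preserving.
    return np.clip(z, -sat, sat)

def log_quantize(z, rho=0.5):
    # Logarithmic quantization: maps |z| to the nearest power of rho,
    # keeping the sign; odd and sector-bounded away from zero.
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    nz = z != 0
    exponent = np.round(np.log(np.abs(z[nz])) / np.log(rho))
    out[nz] = np.sign(z[nz]) * rho ** exponent
    return out

z = np.array([-2.3, -0.4, 0.0, 0.7, 3.1])
print(clip_link(z))      # [-1.  -0.4  0.   0.7  1. ]
print(log_quantize(z))   # sign-preserving powers of 0.5 nearest to |z|
```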

Inertia-Aware Microgrid Investment Planning Using Tractable Decomposition Algorithms

  • Authors: Agnes Marjorie Nakiganda, Shahab Dehghan, Petros Aristidou
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.06674
  • Pdf link: https://arxiv.org/pdf/2304.06674
  • Abstract
    The integration of the frequency dynamics into Micro-Grid (MG) investment and operational planning problems is vital in ensuring the security of the system in the post-contingency states. However, the task of including transient security constraints in MG planning problems is non-trivial. This is due to the highly non-linear and non-convex nature of the analytical closed form of the frequency metrics (e.g., frequency nadir) and power flow constraints. To handle this issue, this paper presents two algorithms for decomposing the MG investment planning problem into multiple levels to enhance computational tractability and optimality. Furthermore, the sensitivity of the decisions made at each level is captured by corresponding dual cutting planes to model feasible secure regions. This, in turn, ensures both the optimal determination and placement of inertia services and accelerates the convergence of the proposed decomposition algorithms. The efficient and effective performance of the proposed algorithms is tested and verified on an 18-bus Low Voltage (LV) network and a 30-bus Medium Voltage (MV) network under various operating scenarios.

OKRidge: Scalable Optimal k-Sparse Ridge Regression for Learning Dynamical Systems

  • Authors: Jiachang Liu, Sam Rosen, Chudi Zhong, Cynthia Rudin
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.06686
  • Pdf link: https://arxiv.org/pdf/2304.06686
  • Abstract
    We consider an important problem in scientific discovery, identifying sparse governing equations for nonlinear dynamical systems. This involves solving sparse ridge regression problems to provable optimality in order to determine which terms drive the underlying dynamics. We propose a fast algorithm, OKRidge, for sparse ridge regression, using a novel lower bound calculation involving, first, a saddle point formulation, and from there, either solving (i) a linear system or (ii) using an ADMM-based approach, where the proximal operators can be efficiently evaluated by solving another linear system and an isotonic regression problem. We also propose a method to warm-start our solver, which leverages a beam search. Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.
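
A hedged sketch of the beam-search warm start, scoring each candidate support with a closed-form restricted ridge solve; the paper's saddle-point lower bounds and ADMM machinery are not reproduced here:

```python
# Hedged sketch of a beam-search warm start for k-sparse ridge regression:
# grow supports one feature at a time, keeping the `beam` best supports
# by ridge loss. Optimality certificates are the paper's contribution and
# are not reproduced here.
import numpy as np

def ridge_loss(X, y, support, lam):
    Xs = X[:, support]
    # Closed-form ridge solution restricted to the support.
    w = np.linalg.solve(Xs.T @ Xs + lam * np.eye(len(support)), Xs.T @ y)
    r = y - Xs @ w
    return r @ r + lam * w @ w

def beam_search_ksparse(X, y, k, lam=1e-2, beam=5):
    d = X.shape[1]
    frontier = [((), 0.0)]
    for _ in range(k):
        candidates = {}
        for support, _ in frontier:
            for j in range(d):
                if j in support:
                    continue
                s = tuple(sorted(support + (j,)))
                if s not in candidates:
                    candidates[s] = ridge_loss(X, y, list(s), lam)
        frontier = sorted(candidates.items(), key=lambda kv: kv[1])[:beam]
    return frontier[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 3 * X[:, 2] - 2 * X[:, 7] + 0.01 * rng.normal(size=100)
support, loss = beam_search_ksparse(X, y, k=2)
print(support, round(loss, 3))   # expected support: (2, 7)
```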

Representing Volumetric Videos as Dynamic MLP Maps

  • Authors: Sida Peng, Yunzhi Yan, Qing Shuai, Hujun Bao, Xiaowei Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.06717
  • Pdf link: https://arxiv.org/pdf/2304.06717
  • Abstract
    This paper introduces a novel representation of volumetric videos for real-time view synthesis of dynamic scenes. Recent advances in neural scene representations demonstrate their remarkable capability to model and render complex static scenes, but extending them to represent dynamic scenes is not straightforward due to their slow rendering speed or high storage cost. To solve this problem, our key idea is to represent the radiance field of each frame as a set of shallow MLP networks whose parameters are stored in 2D grids, called MLP maps, and dynamically predicted by a 2D CNN decoder shared by all frames. Representing 3D scenes with shallow MLPs significantly improves the rendering speed, while dynamically predicting MLP parameters with a shared 2D CNN instead of explicitly storing them leads to low storage cost. Experiments show that the proposed approach achieves state-of-the-art rendering quality on the NHR and ZJU-MoCap datasets, while being efficient for real-time rendering with a speed of 41.7 fps for $512 \times 512$ images on an RTX 3090 GPU. The code is available at https://zju3dv.github.io/mlp_maps/.

New submissions for Wed, 26 Apr 23

Keyword: efficient

Proposal for a distributed, community-driven academic publishing system

  • Authors: Matteo Barbone, Mustafa Gündoğan, Dhiren M. Kara, Benjamin Pingault, Alejandro Rodriguez-Pardo Montblanch, Lucio Stefan, Anthony K. C. Tan
  • Subjects: Digital Libraries (cs.DL); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2304.12326
  • Pdf link: https://arxiv.org/pdf/2304.12326
  • Abstract
    We propose an academic publishing system where research papers are stored in a network of data centres owned by university libraries and research institutions, and are interfaced with the academic community through a website. In our system, the editor is replaced by an initial adjusted community-wide evaluation; the standard peer review is accompanied by a post-publication, open-ended, community-wide review process, aiming at a more objective and longer-term evaluation; the publishing costs are reduced to the running costs of the servers; and access is fully open. Our proposal addresses the fundamental problems of the current system: it reduces publishing costs, allowing easier access by less well-funded institutions (especially from developing countries); it makes the editorial evaluation distributed and more transparent; it speeds up the peer review process by eliminating the need for multiple resubmissions; and it introduces a long-term, community-wide evaluation of papers, ensuring their continued relevance and accuracy, while maximising its main goals, i.e., ensuring the highest quality of peer review and giving the best referees and the best papers the most visibility and credit. Our scheme is time-efficient, financially sustainable, ethically fair, and represents a significant improvement over the current system.

Parallel bootstrap-based on-policy deep reinforcement learning for continuous flow control applications

  • Authors: J. Viquerat, E. Hachem
  • Subjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
  • Arxiv link: https://arxiv.org/abs/2304.12330
  • Pdf link: https://arxiv.org/pdf/2304.12330
  • Abstract
    The coupling of deep reinforcement learning to numerical flow control problems has recently received considerable attention, leading to groundbreaking results and opening new perspectives for the domain. Due to the usually high computational cost of fluid dynamics solvers, the use of parallel environments during the learning process is an essential ingredient for attaining efficient control in a reasonable time. Yet, most of the deep reinforcement learning literature for flow control relies on on-policy algorithms, for which massively parallel transition collection may break theoretical assumptions and lead to suboptimal control models. To overcome this issue, we propose a parallelism pattern relying on partial-trajectory buffers terminated by a return-bootstrapping step, allowing a flexible use of parallel environments while preserving the on-policiness of the updates. This approach is illustrated on a CPU-intensive continuous flow control problem from the literature.
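
A minimal sketch of the return-bootstrapping step that terminates a partial-trajectory buffer, assuming a critic value estimate is available at the truncation point:

```python
# Sketch of return bootstrapping for a truncated (partial) trajectory: the
# unseen tail is summarized by the critic's value estimate V(s_T) rather
# than discarded.
import numpy as np

def bootstrapped_returns(rewards, gamma, tail_value):
    # tail_value: critic estimate V(s_T) at the truncation point.
    returns = np.zeros(len(rewards))
    running = tail_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

r = np.array([1.0, 0.0, 0.5])          # rewards from a truncated rollout
print(bootstrapped_returns(r, 0.99, tail_value=2.0))
# [3.4306  2.4552  2.48  ] (approximately)
```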

Beyond the Pixel: a Photometrically Calibrated HDR Dataset for Luminance and Color Temperature Prediction

  • Authors: Christophe Bolduc, Justine Giroux, Marc Hébert, Claude Demers, Jean-François Lalonde
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12372
  • Pdf link: https://arxiv.org/pdf/2304.12372
  • Abstract
    Light plays an important role in human well-being. However, most computer vision tasks treat pixels without considering their relationship to physical luminance. To address this shortcoming, we present the first large-scale photometrically calibrated dataset of high dynamic range 360° panoramas. Our key contribution is the calibration of an existing, uncalibrated HDR dataset. We do so by accurately capturing RAW bracketed exposures simultaneously with a professional photometric measurement device (chroma meter) for multiple scenes across a variety of lighting conditions. Using the resulting measurements, we establish the calibration coefficients to be applied to the HDR images. The resulting dataset is a rich representation of indoor scenes that displays a wide range of illuminance and color temperature, as well as varied types of light sources. We exploit the dataset to introduce three novel tasks: predicting per-pixel luminance, per-pixel color temperature, and planar illuminance from a single input image. Finally, we also capture another, smaller calibrated dataset with a commercial 360° camera, to experiment on generalization across cameras. We are optimistic that the release of our datasets and associated code will spark interest in physically accurate light estimation within the community.

Efficient and Scalable Path-Planning Algorithms for Curvature Constrained Motion in the Hamilton-Jacobi Formulation

  • Authors: Christian Parkinson, Isabelle Boyle
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.12377
  • Pdf link: https://arxiv.org/pdf/2304.12377
  • Abstract
    We present a partial-differential-equation-based optimal path-planning framework for curvature-constrained motion, with application to vehicles in two and three spatial dimensions. This formulation relies on optimal control theory, dynamic programming, and a Hamilton-Jacobi-Bellman equation. Many authors have developed similar models and employed grid-based numerical methods to solve the partial differential equation required to generate optimal trajectories. However, these methods can be inefficient and do not scale well to high dimensions. We describe how efficient and scalable algorithms for solving high-dimensional Hamilton-Jacobi equations can be developed for such problems while maintaining the Hamilton-Jacobi formulation. We demonstrate our method with several examples.
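
For orientation, a generic time-optimal Hamilton-Jacobi-Bellman equation for a curvature-constrained (Dubins-type) vehicle; the notation is ours, and the paper's exact formulation may differ:

```latex
% Generic time-optimal HJB equation for a Dubins-type vehicle with state
% (x, y, \theta), speed v, and turning-rate bound |u| \le \omega (notation ours):
1 + \min_{|u|\le\omega}\Big\{ v\cos\theta\,\partial_x\phi
    + v\sin\theta\,\partial_y\phi + u\,\partial_\theta\phi \Big\} = 0,
\qquad u^{*} = -\omega\,\operatorname{sign}(\partial_\theta\phi).
```

The minimization over the bounded turning rate is what encodes the curvature constraint, and the bang-bang form of the optimal control follows directly from the linearity of the Hamiltonian in $u$.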

Recognizing and generating unswitchable graphs

  • Authors: Asish Mukhopadhyay, Daniel John, Srivatsan Vasudevan
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.12381
  • Pdf link: https://arxiv.org/pdf/2304.12381
  • Abstract
    In this paper, we show that unswitchable graphs are a proper subclass of split graphs, and exploit this fact to propose efficient algorithms for their recognition and generation.

Green Video Complexity Analysis for Efficient Encoding in Adaptive Video Streaming

  • Authors: Vignesh V Menon, Christian Feldmann, Klaus Schoeffmann, Mohammad Ghanbari, Christian Timmerer
  • Subjects: Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2304.12384
  • Pdf link: https://arxiv.org/pdf/2304.12384
  • Abstract
    For adaptive streaming applications, low-complexity and accurate video complexity features are necessary to analyze video content in real time, which ensures fast and compression-efficient video streaming without disruptions. The state-of-the-art video complexity features are the Spatial Information (SI) and Temporal Information (TI) features, which do not correlate well with the encoding parameters in adaptive streaming applications. In this light, the Video Complexity Analyzer (VCA) was introduced, determining features based on Discrete Cosine Transform (DCT) energy. This paper presents optimizations to VCA for faster and more energy-efficient video complexity analysis. Experimental results show that VCA v2.0, using eight CPU threads, Single Instruction Multiple Data (SIMD) instructions, and a low-pass DCT optimization, determines seven complexity features of Ultra High Definition 8-bit videos with better accuracy, at speeds of up to 292.68 fps, and with 97.06% lower energy consumption than the reference SITI implementation.
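
A hedged sketch of a block-DCT energy feature in the spirit of VCA's spatial complexity measure; the block size and weighting are illustrative, not VCA's exact definition:

```python
# Hedged sketch of a block-DCT "energy" texture feature; block size and
# weighting are illustrative choices, not VCA's exact feature definition.
import numpy as np
from scipy.fft import dctn

def block_dct_energy(frame, block=32):
    h, w = frame.shape
    energies = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            coeffs = dctn(frame[i:i + block, j:j + block], norm="ortho")
            coeffs[0, 0] = 0.0            # drop the DC term: texture only
            energies.append(np.abs(coeffs).sum())
    return float(np.mean(energies))

frame = np.random.default_rng(0).random((64, 64)).astype(np.float32)
print(block_dct_energy(frame))
```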

Matrix-free GPU-accelerated saddle-point solvers for high-order problems in $H(\mathrm{div})$

  • Authors: Will Pazner, Tzanio Kolev, Panayot Vassilevski
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.12387
  • Pdf link: https://arxiv.org/pdf/2304.12387
  • Abstract
    This work describes the development of matrix-free GPU-accelerated solvers for high-order finite element problems in $H(\mathrm{div})$. The solvers are applicable to grad-div and Darcy problems in saddle-point formulation, and have applications in radiation diffusion and porous media flow problems, among others. Using the interpolation-histopolation basis (cf. SIAM J. Sci. Comput., 45 (2023), A675-A702, arXiv:2203.02465), efficient matrix-free preconditioners can be constructed for the $(1,1)$-block and Schur complement of the block system. With these approximations, block-preconditioned MINRES converges in a number of iterations that is independent of the mesh size and polynomial degree. The approximate Schur complement takes the form of an M-matrix graph Laplacian, and therefore can be well-preconditioned by highly scalable algebraic multigrid methods. High-performance GPU-accelerated algorithms for all components of the solution algorithm are developed, discussed, and benchmarked. Numerical results are presented on a number of challenging test cases, including the "crooked pipe" grad-div problem, the SPE10 reservoir modeling benchmark problem, and a nonlinear radiation diffusion test case.
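
For reference, the generic saddle-point block system for a Darcy-type problem and its Schur complement, in our own notation ($M$ is the $H(\mathrm{div})$ velocity block, $B$ the discrete divergence):

```latex
% Generic saddle-point system for a Darcy-type problem (notation ours):
\begin{pmatrix} M & B^{T} \\ B & 0 \end{pmatrix}
\begin{pmatrix} u \\ p \end{pmatrix}
=
\begin{pmatrix} f \\ g \end{pmatrix},
\qquad
S = -\,B\,M^{-1}B^{T} \quad \text{(Schur complement)}.
```

Per the abstract, the approximate Schur complement used in this work takes the form of an M-matrix graph Laplacian, which is exactly the class of matrices that scalable algebraic multigrid methods handle well.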

HDCC: A Hyperdimensional Computing compiler for classification on embedded systems and high-performance computing

  • Authors: Pere Vergés, Mike Heddes, Igor Nunes, Tony Givargis, Alexandru Nicolau
  • Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.12398
  • Pdf link: https://arxiv.org/pdf/2304.12398
  • Abstract
    Hyperdimensional Computing (HDC) is a bio-inspired computing framework that has gained increasing attention, especially as a more efficient approach to machine learning (ML). This work introduces the HDCC compiler, the first open-source compiler that translates high-level descriptions of HDC classification methods into optimized C code. The code generated by the proposed compiler has three main features for embedded systems and High-Performance Computing: (1) it is self-contained and has no library or platform dependencies; (2) it supports multithreading and single instruction multiple data (SIMD) instructions using C intrinsics; (3) it is optimized for maximum performance and minimal memory usage. HDCC is designed like a modern compiler, featuring an intuitive and descriptive input language, an intermediate representation (IR), and a retargetable backend. This makes HDCC a valuable tool for research and applications exploring HDC for classification tasks on embedded systems and High-Performance Computing. To substantiate these claims, we conducted experiments with HDCC on several of the most popular datasets in the HDC literature. The experiments were run on four different machines, with different hyperparameter configurations, and the results were compared to a popular prototyping library built on PyTorch. The results show a training and inference speedup of up to 132x, averaging 25x across all datasets and machines. Regarding memory usage, using 10240-dimensional hypervectors, the average reduction was 5x, reaching up to 14x. When considering vectors of 64 dimensions, the average reduction was 85x, with a maximum of 158x less memory utilization.
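
A minimal numpy sketch of the bind/bundle/similarity style of HDC classification that such a compiler targets; bipolar vectors and this exact encoding are illustrative choices, not HDCC's input language:

```python
# Minimal sketch of HDC classification: bind feature/value hypervectors,
# bundle into class prototypes, classify by similarity. Bipolar vectors and
# this encoding are illustrative choices.
import numpy as np

D = 10_000
rng = np.random.default_rng(0)
random_hv = lambda: rng.choice([-1, 1], size=D)

bind = lambda a, b: a * b                           # elementwise multiply
bundle = lambda hvs: np.sign(np.sum(hvs, axis=0))   # majority vote
similarity = lambda a, b: a @ b / D                 # normalized dot product

feature_hv = [random_hv() for _ in range(4)]        # one hv per feature position
value_hvs = {0: random_hv(), 1: random_hv()}        # one hv per feature value

def encode(values):
    return bundle([bind(feature_hv[i], value_hvs[v]) for i, v in enumerate(values)])

class_a = encode([0, 0, 1, 1])                      # toy one-sample "prototypes"
class_b = encode([1, 1, 0, 0])
query = encode([0, 0, 1, 0])                        # closer to class A
print("class A" if similarity(query, class_a) > similarity(query, class_b) else "class B")
```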

Codes Correcting a Single Long Duplication Error

  • Authors: Daniil Goshkoder, Nikita Polyanskii, Ilya Vorobyev
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2304.12399
  • Pdf link: https://arxiv.org/pdf/2304.12399
  • Abstract
    We consider the problem of constructing a code capable of correcting a single long tandem duplication error of variable length. As the main contribution of this paper, we present a $q$-ary efficiently encodable code of length $n+1$ and redundancy $1$ that can correct a single duplication of length at least $K=4\cdot\lceil \log_q n\rceil +1$. The complexity of encoding is $O(\frac{n^2}{\log n})$ and the complexity of decoding is $O(n)$. We also present a $q$-ary code of length $n+1$, without an efficient encoder, that corrects a single long duplication of length at least $K = \lceil \log_q n\rceil +\phi(n)$, where $\phi(n)\rightarrow{\infty}$ as $n\rightarrow{\infty}$. This code has redundancy less than $1$ for sufficiently large $n$. Moreover, we show that in the class of codes correcting a single long duplication with redundancy $1$, the value $K$ in our constructions is order-optimal.

PEFT-Ref: A Modular Reference Architecture and Typology for Parameter-Efficient Finetuning Techniques

  • Authors: Mohammed Sabry, Anya Belz
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.12410
  • Pdf link: https://arxiv.org/pdf/2304.12410
  • Abstract
    Recent parameter-efficient finetuning (PEFT) techniques aim to reduce the considerable cost of fully finetuning large pretrained language models (PLMs). As different PEFT techniques proliferate, it is becoming difficult to compare them, in particular in terms of (i) the structure and functionality they add to the PLM, (ii) the different types and degrees of efficiency improvements achieved, (iii) performance on different downstream tasks, and (iv) how differences in structure and functionality relate to efficiency and task performance. To facilitate such comparisons, this paper presents a reference framework which standardises aspects shared by different PEFT techniques, while isolating differences to specific locations and interactions with the standard components. Through this process of standardising and isolating differences, a modular view of PEFT techniques emerges, supporting not only direct comparison of different techniques and their efficiency and task performance, but also systematic exploration of the reusability and composability of the different types of finetuned modules. We demonstrate how the reference framework can be applied to understand properties and relative advantages of PEFT techniques, and hence to inform the selection of techniques for specific tasks and design choices for new PEFT techniques.
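
A minimal sketch of one PEFT structure such a typology covers, a bottleneck adapter with a residual connection; the dimensions, initialization, and placement are illustrative:

```python
# Minimal sketch of a bottleneck adapter, one common PEFT structure: a small
# residual module trained while the PLM stays frozen. Dimensions and
# placement are illustrative.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)   # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(torch.relu(self.down(h)))  # residual bottleneck

adapter = Adapter(dim=768)
trainable = sum(p.numel() for p in adapter.parameters())
print(f"{trainable:,} trainable parameters")  # ~100k vs ~110M for BERT-base
```

Zero-initializing the up-projection makes the adapter start as an identity map, so tuning begins exactly from the pretrained model's behavior.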

Sample-Efficient and Surrogate-Based Design Optimization of Underwater Vehicle Hulls

  • Authors: Harsh Vardhan, David Hyde, Umesh Timalsina, Peter Volgyesi, Janos Sztipanovits
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Applied Physics (physics.app-ph); Fluid Dynamics (physics.flu-dyn); Applications (stat.AP); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.12420
  • Pdf link: https://arxiv.org/pdf/2304.12420
  • Abstract
    Physics simulations are a computational bottleneck in computer-aided design (CAD) optimization processes. Hence, in order to make accurate (computationally expensive) simulations feasible for use in design optimization, one requires either an optimization framework that is highly sample-efficient or fast data-driven proxies (surrogate models) for long-running simulations. In this work, we leverage recent advances in optimization and artificial intelligence (AI) to explore both of these potential solutions, in the context of designing an optimal unmanned underwater vehicle (UUV). We first investigate and compare the sample efficiency and convergence behavior of different optimization techniques with a standard computational fluid dynamics (CFD) solver in the optimization loop. We then develop a deep neural network (DNN) based surrogate model to approximate drag forces that would otherwise be computed via direct numerical simulation with the CFD solver. The surrogate model is in turn used in the optimization loop of the hull design. Our study finds that the Bayesian Optimization Lower Confidence Bound (BO LCB) algorithm is the most sample-efficient optimization framework and has the best convergence behavior of those considered. Subsequently, we show that our DNN-based surrogate model predicts drag force on test data in tight agreement with CFD simulations, with a mean absolute percentage error (MAPE) of 1.85%. Combining these results, we demonstrate a two-orders-of-magnitude speedup (with comparable accuracy) for the design optimization process when the surrogate model is used. To our knowledge, this is the first study applying Bayesian optimization and DNN-based surrogate modeling to the problem of UUV design optimization, and we share our developments as open-source software.
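
A hedged sketch of lower-confidence-bound Bayesian optimization for minimization, with a tiny numpy GP surrogate standing in for the expensive CFD evaluation; the kernel, noise level, and kappa are illustrative choices:

```python
# Hedged sketch of LCB Bayesian optimization (minimization). A toy GP
# surrogate and objective stand in for the CFD drag evaluation; kernel,
# noise, and kappa are illustrative.
import numpy as np

def rbf(A, B, ls=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_posterior(X, y, Xq, noise=1e-6):
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks, Kss = rbf(X, Xq), rbf(Xq, Xq)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = np.diag(Kss - Ks.T @ np.linalg.solve(K, Ks))
    return mu, np.maximum(var, 1e-12)

def expensive_objective(x):          # stand-in for a CFD drag evaluation
    return float(np.sin(3 * x[0]) + 0.5 * x[0] ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(3, 1))  # initial designs
y = np.array([expensive_objective(x) for x in X])
grid = np.linspace(-2, 2, 200)[:, None]

for _ in range(10):
    mu, var = gp_posterior(X, y, grid)
    lcb = mu - 2.0 * np.sqrt(var)    # kappa = 2.0: explore vs. exploit
    x_next = grid[np.argmin(lcb)]    # most promising candidate
    X = np.vstack([X, x_next])
    y = np.append(y, expensive_objective(x_next))

print("best design:", X[np.argmin(y)], "drag proxy:", y.min())
```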

TIGTEC : Token Importance Guided TExt Counterfactuals

  • Authors: Milan Bhan, Jean-Noel Vittaut, Nicolas Chesneau, Marie-Jeanne Lesot
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Methodology (stat.ME)
  • Arxiv link: https://arxiv.org/abs/2304.12425
  • Pdf link: https://arxiv.org/pdf/2304.12425
  • Abstract
    Counterfactual examples explain a prediction by highlighting changes to an instance that flip the outcome of a classifier. This paper proposes TIGTEC, an efficient and modular method for generating sparse, plausible and diverse counterfactual explanations for textual data. TIGTEC is a text-editing heuristic that targets and modifies words with high contribution using local feature importance. A new attention-based local feature importance measure is proposed. Counterfactual candidates are generated and assessed with a cost function integrating semantic distance, while the solution space is efficiently explored in a beam-search fashion. The conducted experiments show the relevance of TIGTEC in terms of success rate, sparsity, diversity and plausibility. The method can be used in either a model-specific or a model-agnostic way, which makes it very convenient for generating counterfactual explanations.

Sparse Private LASSO Logistic Regression

  • Authors: Amol Khanna, Fred Lu, Edward Raff
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.12429
  • Pdf link: https://arxiv.org/pdf/2304.12429
  • Abstract
    LASSO regularized logistic regression is particularly useful for its built-in feature selection, allowing coefficients to be removed from deployment and producing sparse solutions. Differentially private versions of LASSO logistic regression have been developed, but generally produce dense solutions, reducing the intrinsic utility of the LASSO penalty. In this paper, we present a differentially private method for sparse logistic regression that maintains hard zeros. Our key insight is to first train a non-private LASSO logistic regression model to determine an appropriate privatized number of non-zero coefficients to use in final model selection. To demonstrate our method's performance, we run experiments on synthetic and real-world datasets.

VpROM: A novel Variational AutoEncoder-boosted Reduced Order Model for the treatment of parametric dependencies in nonlinear systems

  • Authors: Thomas Simpson, Konstantinos Vlachas, Anthony Garland, Nikolaos Dervilis, Eleni Chatzi
  • Subjects: Numerical Analysis (math.NA); Computational Engineering, Finance, and Science (cs.CE)
  • Arxiv link: https://arxiv.org/abs/2304.12437
  • Pdf link: https://arxiv.org/pdf/2304.12437
  • Abstract
    Reduced Order Models (ROMs) are of considerable importance in many areas of engineering in which computational time presents difficulties. Established approaches employ projection-based reduction, such as Proper Orthogonal Decomposition; however, such methods can become inefficient or fail in the case of parametric or strongly nonlinear models. Such limitations are usually tackled via a library of local reduction bases, each of which is valid for a given parameter vector. The success of such methods, however, is strongly reliant upon the method used to relate the parameter vectors to the local bases; this is typically achieved using clustering or interpolation methods. We propose replacing these methods with a Variational Autoencoder (VAE) used as a generative model which can infer the local basis corresponding to a given parameter vector in a probabilistic manner. The resulting VAE-boosted parametric ROM, VpROM, still retains the physical insights of a projection-based method but also allows for better treatment of problems where model dependencies or excitation traits cause the dynamic behavior to span multiple response regimes. Moreover, the probabilistic treatment of the VAE representation allows for uncertainty quantification on the reduction bases, which may then be propagated to the ROM response. The performance of the proposed approach is validated on an open-source simulation benchmark featuring hysteresis and multi-parametric dependencies, and on a large-scale wind turbine tower characterised by nonlinear material behavior and model uncertainty.

Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory

  • Authors: Andrew Wagenmaker, Dylan J. Foster
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.12466
  • Pdf link: https://arxiv.org/pdf/2304.12466
  • Abstract
    We consider the development of adaptive, instance-dependent algorithms for interactive decision making (bandits, reinforcement learning, and beyond) that, rather than only performing well in the worst case, adapt to favorable properties of real-world instances for improved performance. We aim for instance-optimality, a strong notion of adaptivity which asserts that, on any particular problem instance, the algorithm under consideration outperforms all consistent algorithms. Instance-optimality enjoys a rich asymptotic theory originating from the work of Lai and Robbins (1985) and Graves and Lai (1997), but non-asymptotic guarantees have remained elusive outside of certain special cases. Even for problems as simple as tabular reinforcement learning, existing algorithms do not attain instance-optimal performance until the number of rounds of interaction is doubly exponential in the number of states. In this paper, we take the first step toward developing a non-asymptotic theory of instance-optimal decision making with general function approximation. We introduce a new complexity measure, the Allocation-Estimation Coefficient (AEC), and provide a new algorithm, $\mathsf{AE}^2$, which attains non-asymptotic instance-optimal performance at a rate controlled by the AEC. Our results recover the best known guarantees for well-studied problems such as finite-armed and linear bandits and, when specialized to tabular reinforcement learning, attain the first instance-optimal regret bounds with polynomial dependence on all problem parameters, improving over prior work exponentially. We complement these results with lower bounds that show that i) existing notions of statistical complexity are insufficient to derive non-asymptotic guarantees, and ii) under certain technical conditions, boundedness of the AEC is necessary to learn an instance-optimal allocation of decisions in finite time.

Evaluating Adversarial Robustness on Document Image Classification

  • Authors: Timothée Fronteau, Arnaud Paran, Aymen Shabou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.12486
  • Pdf link: https://arxiv.org/pdf/2304.12486
  • Abstract
    Adversarial attacks and defenses have gained increasing interest in computer vision systems in recent years, but as of today, most investigations are limited to natural images. However, many artificial intelligence models actually handle document data, which is very different from real-world images. Hence, in this work, we apply the adversarial attack philosophy to document and natural data and study how to protect models against such attacks. We focus our work on untargeted gradient-based, transfer-based and score-based attacks, and evaluate the impact of adversarial training, JPEG input compression and grey-scale input transformation on the robustness of the ResNet50 and EfficientNetB0 model architectures. To the best of our knowledge, no such work has been conducted by the community to study the impact of these attacks on the document image classification task.

Queue Routing Strategies to Improve Equitable Housing Coordination in New York City

  • Authors: Yaren Bilge Kaya, Kayse Lee Maass
  • Subjects: Numerical Analysis (math.NA); Probability (math.PR)
  • Arxiv link: https://arxiv.org/abs/2304.12487
  • Pdf link: https://arxiv.org/pdf/2304.12487
  • Abstract
    Runaway and homeless youth (RHY) are a group of youth and young adults who are at high risk of being exploited through human trafficking. Although access to housing and support services is an effective way to decrease their vulnerability to exploitation, research reveals that the coordination of these services, provided to RHY by non-profit and government organizations, is neither standardized nor efficient. This situation often causes decreased, delayed, and inequitable access to these scarce housing resources. In this study, we aim to increase housing system efficiency and reduce the barriers contributing to inequitable access to housing through simulation modeling and analyses. Specifically, we simulate a set of crisis and emergency shelters in New York City, funded by a single governmental organization, as a queuing network with pools of multiple parallel servers, servers with demographic eligibility criteria, stochastic RHY arrivals, impatient youth behaviour (the possibility of abandonment), and a decision-maker (coordinator) that determines the server pool to which RHY are routed. This simulation allows us to evaluate the impact of different queue routing strategies. Our simulation results show that by changing the way RHY are routed to shelters, we can reduce the average wait time by approximately a day and decrease the proportion of RHY abandoning the shelters by 13%.

Graph Convolutional Networks based on Manifold Learning for Semi-Supervised Image Classification

  • Authors: Lucas Pascotti Valem, Daniel Carlos Guimarães Pedronette, Longin Jan Latecki
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.12492
  • Pdf link: https://arxiv.org/pdf/2304.12492
  • Abstract
    Due to the huge volume of information in many domains, the need for classification methods is imperative. In spite of many advances, most approaches require a large amount of labeled data, which is often not available due to the cost and difficulty of manual labeling. In this scenario, unsupervised and semi-supervised approaches have been gaining increasing attention. GCNs (Graph Convolutional Neural Networks) represent a promising solution, since they encode neighborhood information and have achieved state-of-the-art results in scenarios with limited labeled data. However, since GCNs require graph-structured data, their use for semi-supervised image classification is still scarce in the literature. In this work, we propose a novel approach, Manifold-GCN, based on GCNs for semi-supervised image classification. The main hypothesis of this paper is that the use of manifold learning to model the graph structure can further improve GCN classification. To the best of our knowledge, this is the first framework that allows the combination of GCNs with different types of manifold learning approaches for image classification. All manifold learning algorithms employed are completely unsupervised, which is especially useful for scenarios where the availability of labeled data is a concern. A broad experimental evaluation was conducted considering 5 GCN models, 3 manifold learning approaches, 3 image datasets, and 5 deep features. The results reveal that our approach achieves better accuracy than traditional and recent state-of-the-art methods, with very efficient run times for both training and testing.

DualSlide: Global-to-Local Sketching Interface for Slide Content and Layout Design

  • Authors: Jiahao Weng, Xusheng Du, Haoran Xie
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.12506
  • Pdf link: https://arxiv.org/pdf/2304.12506
  • Abstract
    Online learning and academic conferences have become pervasive and essential for education and professional development, especially since the onset of the pandemic. Academic presentations usually require well-designed slides that are easily understood. Sketches visually represent design intentions and are readily accessible to average users. To assist non-expert users in creating visually appealing academic slides, we propose DualSlide, a global-to-local two-stage sketching interface system that provides image retrieval and user guidance. At the global stage, DualSlide provides a heat-map canvas to display the distribution of all slide layouts in a dataset, allowing users to explore reference slides efficiently. At the local stage, the system provides detailed references and guidance for designing slide content, such as diagrams and fonts. We further propose a sketch-matching algorithm to compare the user's input sketch with similar diagrams. All user guidance can be adapted during real-time editing, and users can design slides with a high degree of freedom. We conducted a user study to verify the effectiveness and usability of the proposed DualSlide system, confirming that DualSlide provides high retrieval accuracy and satisfactory design results with a good user experience. Video: https://youtu.be/lUI1zjxCdM0

Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning

  • Authors: Zhongzhi Yu, Shang Wu, Yonggan Fu, Shunyao Zhang, Yingyan (Celine) Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.12520
  • Pdf link: https://arxiv.org/pdf/2304.12520
  • Abstract
    Despite the growing demand for tuning foundation vision transformers (FViTs) on downstream tasks, fully unleashing FViTs' potential under data-limited scenarios (e.g., few-shot tuning) remains a challenge due to FViTs' data-hungry nature. Common data augmentation techniques fall short in this context due to the limited features contained in the few-shot tuning data. To tackle this challenge, we first identify an opportunity for FViTs in few-shot tuning: pretrained FViTs themselves have already learned highly representative features from large-scale pretraining data, which are fully preserved during widely used parameter-efficient tuning. We thus hypothesize that leveraging those learned features to augment the tuning data can boost the effectiveness of few-shot FViT tuning. To this end, we propose a framework called Hint-based Data Augmentation (Hint-Aug), which aims to boost FViT in few-shot tuning by augmenting the over-fitted parts of tuning samples with the learned features of pretrained FViTs. Specifically, Hint-Aug integrates two key enablers: (1) an Attentive Over-fitting Detector (AOD) to detect over-confident patches of foundation ViTs for potentially alleviating their over-fitting on the few-shot tuning data and (2) a Confusion-based Feature Infusion (CFI) module to infuse easy-to-confuse features from the pretrained FViTs with the over-confident patches detected by the above AOD in order to enhance the feature diversity during tuning. Extensive experiments and ablation studies on five datasets and three parameter-efficient tuning techniques consistently validate Hint-Aug's effectiveness: 0.04% ~ 32.91% higher accuracy over the state-of-the-art (SOTA) data augmentation method under various low-shot settings. For example, on the Pet dataset, Hint-Aug achieves a 2.22% higher accuracy with 50% less training data over SOTA data augmentation methods.

Foley Sound Synthesis at the DCASE 2023 Challenge

  • Authors: Keunwoo Choi, Jaekwon Im, Laurie Heller, Brian McFee, Keisuke Imoto, Yuki Okamoto, Mathieu Lagrange, Shinosuke Takamichi
  • Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2304.12521
  • Pdf link: https://arxiv.org/pdf/2304.12521
  • Abstract
    The addition of Foley sound effects during post-production is a common technique used to enhance the perceived acoustic properties of multimedia content. Traditionally, Foley sound has been produced by human Foley artists, which involves manual recording and mixing of sound. However, recent advances in sound synthesis and generative models have generated interest in machine-assisted or automatic Foley synthesis techniques. To promote further research in this area, we have organized a challenge in DCASE 2023: Task 7 - Foley Sound Synthesis. Our challenge aims to provide a standardized evaluation framework that is both rigorous and efficient, allowing for the evaluation of different Foley synthesis systems. Through this challenge, we hope to encourage active participation from the research community and advance the state-of-the-art in automatic Foley synthesis. In this technical report, we provide a detailed overview of the Foley sound synthesis challenge, including task definition, dataset, baseline, evaluation scheme and criteria, and discussion.

Text-guided Eyeglasses Manipulation with Spatial Constraints

  • Authors: Jiacheng Wang, Ping Liu, Jingen Liu, Wei Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12539
  • Pdf link: https://arxiv.org/pdf/2304.12539
  • Abstract
    Virtual try-on of eyeglasses involves placing eyeglasses of different shapes and styles onto a face image without physically trying them on. While existing methods have shown impressive results, the variety of eyeglasses styles is limited and the interactions are not always intuitive or efficient. To address these limitations, we propose a Text-guided Eyeglasses Manipulation method that allows for control of the eyeglasses shape and style based on a binary mask and text, respectively. Specifically, we introduce a mask encoder to extract mask conditions and a modulation module that enables simultaneous injection of text and mask conditions. This design allows for fine-grained control of the eyeglasses' appearance based on both textual descriptions and spatial constraints. Our approach includes a disentangled mapper and a decoupling strategy that preserves irrelevant areas, resulting in better local editing. We employ a two-stage training scheme to handle the different convergence speeds of the various modality conditions, successfully controlling both the shape and style of eyeglasses. Extensive comparison experiments and ablation analyses demonstrate the effectiveness of our approach in achieving diverse eyeglasses styles while preserving irrelevant areas.

Efficient Bayesian inference using physics-informed invertible neural networks for inverse problems

  • Authors: Xiaofei Guan, Xintong Wang, Hao Wu
  • Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.12541
  • Pdf link: https://arxiv.org/pdf/2304.12541
  • Abstract
    In this paper, we propose a novel approach for solving Bayesian inverse problems with physics-informed invertible neural networks (PI-INN). The architecture of PI-INN consists of two sub-networks: an invertible neural network (INN) and a neural basis network (NB-Net). With the aid of the NB-Net, an invertible map between the parametric input and the INN output is constructed to provide a tractable estimation of the posterior distribution, which enables efficient sampling and accurate density evaluation. Furthermore, the loss function of PI-INN includes two components: a residual-based physics-informed loss term and a new independence loss term. The presented independence loss term can Gaussianize the random latent variables and ensure statistical independence between the two parts of the INN output by effectively utilizing the estimated density function. Several numerical experiments are presented to demonstrate the efficiency and accuracy of the proposed PI-INN, including inverse kinematics, inverse problems for the 1-d and 2-d diffusion equations, and seismic traveltime tomography.

SwinFSR: Stereo Image Super-Resolution using SwinIR and Frequency Domain Knowledge

  • Authors: Ke Chen, Liangyan Li, Huan Liu, Yunzhe Li, Congling Tang, Jun Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12556
  • Pdf link: https://arxiv.org/pdf/2304.12556
  • Abstract
    Stereo Image Super-Resolution (stereoSR) has attracted significant attention in recent years due to the extensive deployment of dual cameras in mobile phones, autonomous vehicles and robots. In this work, we propose a new StereoSR method, named SwinFSR, based on an extension of SwinIR, originally designed for single image restoration, and the frequency domain knowledge obtained by the Fast Fourier Convolution (FFC). Specifically, to effectively gather global information, we modify the Residual Swin Transformer blocks (RSTBs) in SwinIR by explicitly incorporating the frequency domain knowledge using the FFC and employing the resulting residual Swin Fourier Transformer blocks (RSFTBs) for feature extraction. Besides, for the efficient and accurate fusion of stereo views, we propose a new cross-attention module referred to as RCAM, which achieves highly competitive performance while requiring less computational cost than the state-of-the-art cross-attention modules. Extensive experimental results and ablation studies demonstrate the effectiveness and efficiency of our proposed SwinFSR.

Performance Optimization using Multimodal Modeling and Heterogeneous GNN

  • Authors: Akash Dutta, Jordi Alcaraz, Ali TehraniJamsaz, Anna Sikora, Eduardo Cesar, Ali Jannesari
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.12568
  • Pdf link: https://arxiv.org/pdf/2304.12568
  • Abstract
    Growing heterogeneity and configurability in HPC architectures has made auto-tuning applications and runtime parameters on these systems very complex. Users are presented with a multitude of options to configure parameters. In addition to application-specific solutions, a common approach is to use general-purpose search strategies, which often might not identify the best configurations, or whose time to convergence is a significant barrier. There is, thus, a need for a general-purpose and efficient tuning approach that can be easily scaled and adapted to various tuning tasks. We propose a technique for tuning parallel code regions that is general enough to be adapted to multiple tasks. In this paper, we analyze IR-based programming models to make task-specific performance optimizations. To this end, we propose the Multimodal Graph Neural Network and Autoencoder (MGA) tuner, a multimodal deep learning based approach that adapts Heterogeneous Graph Neural Networks and Denoising Autoencoders for modeling IR-based code representations that serve as separate modalities. This approach is used as part of our pipeline to model a syntax-, semantics-, and structure-aware IR-based code representation for tuning parallel code regions/kernels. We experiment extensively on OpenMP and OpenCL code regions/kernels obtained from the PolyBench, Rodinia, STREAM, DataRaceBench, AMD SDK, NPB, NVIDIA SDK, Parboil, SHOC, and LULESH benchmarks. We apply our multimodal learning techniques to the tasks of i) optimizing the number of threads, scheduling policy and chunk size in OpenMP loops and ii) identifying the best device for heterogeneous device mapping of OpenCL kernels. Our experiments show that this multimodal learning based approach outperforms the state-of-the-art in all experiments.

Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures

  • Authors: Evangelos Georganas, Dhiraj Kalamkar, Kirill Voronin, Antonio Noack, Hans Pabst, Alexander Breuer, Alexander Heinecke
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.12576
  • Pdf link: https://arxiv.org/pdf/2304.12576
  • Abstract
    During the past decade, Deep Learning (DL) algorithms, programming systems and hardware have converged with their High Performance Computing (HPC) counterparts. Nevertheless, the programming methodology of DL and HPC systems is stagnant, relying on highly-optimized, yet platform-specific and inflexible vendor-optimized libraries. Such libraries provide close-to-peak performance on the specific platforms, kernels and shapes to which vendors have dedicated optimization efforts, while they underperform in the remaining use cases, yielding non-portable codes with performance glass-jaws. This work introduces a framework to develop efficient, portable DL and HPC kernels for modern CPU architectures. We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs), a compact, versatile set of 2D-tensor operators, and 2) expressing the logical loops around TPPs in a high-level, declarative fashion, whereas the exact instantiation (ordering, tiling, parallelization) is determined via simple knobs. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.

AdaLIO: Robust Adaptive LiDAR-Inertial Odometry in Degenerate Indoor Environments

  • Authors: Hyungtae Lim, Daebeom Kim, Beomsoo Kim, Hyun Myung
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.12577
  • Pdf link: https://arxiv.org/pdf/2304.12577
  • Abstract
    In recent years, the demand for mapping construction sites or buildings using light detection and ranging (LiDAR) sensors has increased, in order to model environments for efficient site management. However, LiDAR-based approaches sometimes diverge in narrow and confined environments, such as spiral stairs and corridors, owing to parameters that remain fixed regardless of changes in the environment. That is, the parameters of LiDAR(-inertial) odometry are mostly set for open spaces; thus, if the same parameters suitable for open spaces are applied in a corridor-like scene, the odometry diverges, a failure referred to as degeneracy. To tackle this degeneracy problem, we propose a robust LiDAR-inertial odometry called AdaLIO, which employs an adaptive parameter-setting strategy. To this end, we first check for degeneracy by checking whether the surroundings are corridor-like environments. If so, the parameters relevant to voxelization and normal vector estimation are adaptively changed to increase the number of correspondences. As verified on a public dataset, our proposed method shows promising performance in narrow and cramped environments, avoiding the degeneracy problem.
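
A hedged sketch of the adaptive idea: detect a corridor-like (degenerate) scene and switch to parameters that yield more correspondences; the detection proxy and parameter values are illustrative, not the paper's:

```python
# Hedged sketch of adaptive parameter switching for LiDAR-inertial odometry;
# the degeneracy proxy (mean scan range) and values are illustrative only.
def select_lio_params(mean_range_m: float) -> dict:
    corridor_like = mean_range_m < 5.0   # crude stand-in for a degeneracy check
    if corridor_like:
        # Finer voxels and a smaller normal-estimation radius yield more
        # correspondences in narrow, confined scenes.
        return {"voxel_size": 0.1, "normal_est_radius": 0.25}
    return {"voxel_size": 0.4, "normal_est_radius": 1.0}

print(select_lio_params(3.2))   # narrow scene -> finer voxelization
```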

MixNeRF: Memory Efficient NeRF with Feature Mixed-up Hash Table

  • Authors: Yongjae Lee, Li Yang, Deliang Fan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12587
  • Pdf link: https://arxiv.org/pdf/2304.12587
  • Abstract
    Neural radiance fields (NeRF) have shown remarkable performance in generating photo-realistic novel views. Since the emergence of NeRF, many studies have been conducted, among which managing features with explicit structures such as grids has achieved exceptionally fast training by reducing the complexity of the multilayer perceptron (MLP) networks. However, storing features in dense grids requires significantly large memory, which leads to memory bottlenecks in computer systems and thus long training times. To address this issue, we propose MixNeRF, a memory-efficient NeRF framework that employs a mixed-up hash table to improve memory efficiency and reduce training time while maintaining reconstruction quality. We first design a mixed-up hash table that adaptively mixes part of the multi-level feature grids into one and maps them to a single hash table. Following that, in order to obtain the correct index of a grid point, we further design an index transformation method that transforms the indices of an arbitrary-level grid to those of a canonical grid. Extensive experiments benchmarking against the state-of-the-art Instant-NGP, TensoRF, and DVGO indicate that MixNeRF achieves the fastest training time on the same GPU hardware with similar or even higher reconstruction quality. Source code is available at https://github.com/nfyfamr/MixNeRF.
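
A hedged sketch of the multi-level spatial-hash lookup used by the grid-based baselines MixNeRF builds on (Instant-NGP style); the mixed-up table and index transformation themselves are not reproduced here:

```python
# Hedged sketch of an Instant-NGP-style spatial hash: map an integer grid
# corner to a feature row by XOR-ing coordinate products with large primes.
# MixNeRF's mixed-up table and index transformation are not reproduced.
import numpy as np

PRIMES = np.array([1, 2_654_435_761, 805_459_861], dtype=np.uint64)

def hash_index(corner_xyz, table_size):
    h = np.uint64(0)
    for c, p in zip(corner_xyz, PRIMES):
        h ^= np.uint64(c) * p            # wraps modulo 2**64 by design
    return int(h % np.uint64(table_size))

table_size, feat_dim = 2**14, 2
table = np.random.default_rng(0).normal(size=(table_size, feat_dim)).astype(np.float32)

corner = (17, 42, 3)                     # integer grid corner at some level
print(table[hash_index(corner, table_size)])   # the feature stored for it
```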

Analog Iterative Machine (AIM): using light to solve quadratic optimization problems with mixed variables

  • Authors: Kirill Kalinin, George Mourgias-Alexandris, Hitesh Ballani, Natalia G. Berloff, James H. Clegg, Daniel Cletheroe, Christos Gkantsidis, Istvan Haller, Vassily Lyutsarev, Francesca Parmigiani, Lucinda Pickup, Antony Rowstron
  • Subjects: Emerging Technologies (cs.ET); Optimization and Control (math.OC); Applied Physics (physics.app-ph)
  • Arxiv link: https://arxiv.org/abs/2304.12594
  • Pdf link: https://arxiv.org/pdf/2304.12594
  • Abstract
    Solving optimization problems is challenging for existing digital computers and even for future quantum hardware. The practical importance of diverse problems, from healthcare to financial optimization, has driven the emergence of specialised hardware over the past decade. However, such hardware's support for problems with only binary variables severely restricts the scope of practical problems that can be efficiently embedded. We build the analog iterative machine (AIM), the first instance of an opto-electronic solver that natively implements a wider class of quadratic unconstrained mixed optimization (QUMO) problems and supports all-to-all connectivity of both continuous and binary variables. Beyond synthetic 7-bit problems at small scale, AIM solves the financial transaction settlement problem entirely in the analog domain, with higher accuracy than quantum hardware and at room temperature. With compute-in-memory operation and spatial-division-multiplexed representation of variables, the design of AIM paves the path to a chip-scale architecture with a 100-times speed-up per unit power over the latest GPUs for solving problems with 10,000 variables. The robustness of the AIM algorithm at such scale is further demonstrated by comparing it with commercial production solvers across multiple benchmarks, where for several problems we report new best solutions. By combining the superior QUMO abstraction, sophisticated gradient-descent methods inspired by machine learning, and commodity hardware, AIM introduces a novel platform with a step change in expressiveness, performance, and scalability for optimization in the post-Moore's-law era.

Fast Continuous Subgraph Matching over Streaming Graphs via Backtracking Reduction

  • Authors: Rongjian Yang, Zhijie Zhang, Weiguo Zheng, Jeffery Xu Yu
  • Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.12610
  • Pdf link: https://arxiv.org/pdf/2304.12610
  • Abstract
    Streaming graphs are drawing increasing attention in both academic and industrial communities as many graphs in real applications evolve over time. Continuous subgraph matching (CSM for short) aims to report the incremental matches of a query graph in such streaming graphs. It involves two major steps, i.e., candidate maintenance and incremental match generation, to answer CSM. Throughout the course of continuous subgraph matching, incremental match generation, which backtracks over the search space, dominates the total cost. However, most previous approaches focus on developing techniques for efficient candidate maintenance, while incremental match generation receives less attention despite its importance in CSM. Aiming to minimize the overall cost, we propose two techniques to reduce backtrackings in this paper. We present a cost-effective index CaLiG that yields tighter candidate maintenance, shrinking the search space of backtracking. In addition, we develop a novel incremental matching paradigm KSS that decomposes the query vertices into conditional kernel vertices and shell vertices. With the matches of kernel vertices, the incremental matches can be produced immediately by joining the candidates of shell vertices without any backtracking. Benefiting from reduced backtrackings, the elapsed time of CSM decreases significantly. Extensive experiments over real graphs show that our method runs orders of magnitude faster than the state-of-the-art algorithms.
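
To see why the kernel-shell split removes backtracking, consider the join step in isolation: once a kernel match is fixed, every combination of shell candidates yields an incremental match directly, subject only to an injectivity check. The sketch below is a hedged illustration of that idea (names and the data layout are assumptions), not the paper's KSS implementation.

```python
from itertools import product

def kss_join(kernel_match, shell_candidates):
    """Extend one kernel match by directly joining shell-vertex candidates.

    `shell_candidates` maps each shell query vertex to its candidate data
    vertices under this kernel match; a Cartesian product then yields every
    incremental match without backtracking (illustrative layout).
    """
    used = set(kernel_match.values())
    shell_vars = list(shell_candidates)
    for combo in product(*(shell_candidates[v] for v in shell_vars)):
        # Simple-graph matches must be injective across all mapped vertices.
        if len(set(combo)) == len(combo) and used.isdisjoint(combo):
            yield {**kernel_match, **dict(zip(shell_vars, combo))}

kernel = {"u1": "a", "u2": "b"}
shells = {"u3": ["c", "d"], "u4": ["d", "e"]}
print(list(kss_join(kernel, shells)))  # three matches, no search needed
```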

Weakly-Supervised Temporal Action Localization with Bidirectional Semantic Consistency Constraint

  • Authors: Guozhang Li, De Cheng, Xinpeng Ding, Nannan Wang, Jie Li, Xinbo Gao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12616
  • Pdf link: https://arxiv.org/pdf/2304.12616
  • Abstract
    Weakly Supervised Temporal Action Localization (WTAL) aims to classify and localize temporal boundaries of actions in a video, given only video-level category labels in the training datasets. Due to the lack of boundary information during training, existing approaches formulate WTAL as a classification problem, i.e., generating the temporal class activation map (T-CAM) for localization. However, with only the classification loss, the model would be sub-optimized, i.e., the action-related scenes are enough to distinguish different class labels. Regarding other actions in the action-related scene (i.e., the same scene as the positive actions) as co-scene actions, this sub-optimized model would misclassify the co-scene actions as positive actions. To address this misclassification, we propose a simple yet efficient method, named bidirectional semantic consistency constraint (Bi-SCC), to discriminate the positive actions from co-scene actions. The proposed Bi-SCC first adopts a temporal context augmentation to generate an augmented video that breaks the correlation between positive actions and their co-scene actions across videos. Then, a semantic consistency constraint (SCC) is used to enforce the predictions of the original video and the augmented video to be consistent, hence suppressing the co-scene actions. However, we find that this augmented video would destroy the original temporal context. Simply applying the consistency constraint would affect the completeness of localized positive actions. Hence, we boost the SCC in a bidirectional way to suppress co-scene actions while ensuring the integrity of positive actions, by cross-supervising the original and augmented videos. Finally, our proposed Bi-SCC can be applied to current WTAL approaches and improve their performance. Experimental results show that our approach outperforms the state-of-the-art methods on THUMOS14 and ActivityNet.

Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation

  • Authors: Junde Wu, Rao Fu, Huihui Fang, Yuanpei Liu, Zhaowei Wang, Yanwu Xu, Yueming Jin, Tal Arbel
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12620
  • Pdf link: https://arxiv.org/pdf/2304.12620
  • Abstract
    The Segment Anything Model (SAM) has recently gained popularity in the field of image segmentation. Thanks to its impressive capabilities in all-round segmentation tasks and its prompt-based interface, SAM has sparked intensive discussion within the community. It is even said by many prestigious experts that the image segmentation task has been "finished" by SAM. However, medical image segmentation, although an important branch of the image segmentation family, seems not to be included in the scope of Segmenting "Anything". Many individual experiments and recent studies have shown that SAM performs subpar on medical image segmentation. A natural question is how to find the missing piece of the puzzle to extend the strong segmentation capability of SAM to medical image segmentation. In this paper, we present a possible solution by fine-tuning the pretrained SAM model following the parameter-efficient fine-tuning paradigm with Adapters. Although this work is still one of only a few to transfer the popular NLP technique of Adapters to computer vision, this simple implementation shows surprisingly good performance on medical image segmentation. A medical-image-adapted SAM, which we have dubbed the Medical SAM Adapter (MSA), shows superior performance on 19 medical image segmentation tasks with various image modalities including CT, MRI, ultrasound, fundus, and dermoscopic images. MSA outperforms a wide range of state-of-the-art (SOTA) medical image segmentation methods, such as nnUNet, TransUNet, UNetr, and MedSegDiff. Code will be released at: https://github.com/WuJunde/Medical-SAM-Adapter.

Spatiotemporal Graph Convolutional Recurrent Neural Network Model for Citywide Air Pollution Forecasting

  • Authors: Van-Duc Le
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.12630
  • Pdf link: https://arxiv.org/pdf/2304.12630
  • Abstract
    Citywide air pollution forecasting tries to precisely predict air quality multiple hours ahead for an entire city. This task is challenging since air pollution varies in a spatiotemporal manner and depends on many complicated factors. Our previous research solved the problem by considering the whole city as an image and leveraging a Convolutional Long Short-Term Memory (ConvLSTM) model to learn the spatiotemporal features. However, an image-based representation may not be ideal, as air pollution and other impact factors have natural graph structures. In this research, we argue that a Graph Convolutional Network (GCN) can efficiently represent the spatial features of air quality readings across the whole city. Specifically, we extend the ConvLSTM model to a Spatiotemporal Graph Convolutional Recurrent Neural Network (Spatiotemporal GCRNN) model by tightly integrating a GCN architecture into an RNN structure to efficiently learn the spatiotemporal characteristics of air quality values and their influential factors. Our extensive experiments show that the proposed model performs better than the state-of-the-art ConvLSTM model for air pollution prediction while having a much smaller number of parameters. Moreover, our approach is also superior to a hybrid GCN-based method on a real-world air pollution dataset.

LMSFC: A Novel Multidimensional Index based on Learned Monotonic Space Filling Curves

  • Authors: Jian Gao, Xin Cao, Xin Yao, Gong Zhang, Wei Wang
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.12635
  • Pdf link: https://arxiv.org/pdf/2304.12635
  • Abstract
    The recently proposed learned indexes have attracted much attention, as they can adapt to the actual data and query distributions to attain better search efficiency. Based on this technique, several existing works build indexes for multi-dimensional data and achieve improved query performance. A common paradigm of these works is to (i) map multi-dimensional data points to a one-dimensional space using a fixed space-filling curve (SFC) or its variant and (ii) then apply the learned indexing techniques. We notice that the first step typically uses a fixed SFC method, such as row-major order or z-order. This limits the potential of learned multi-dimensional indexes to adapt to varying data distributions and query workloads. In this paper, we propose the novel idea of learning a space-filling curve that is carefully designed and actively optimized for efficient query processing. We also identify innovative offline and online optimization opportunities common to SFC-based learned indexes and offer optimal and/or heuristic solutions. Experimental results demonstrate that our proposed method, LMSFC, outperforms state-of-the-art non-learned or learned methods across three commonly used real-world datasets and diverse experimental settings.

A Practical Algorithm for Max-Norm Optimal Binary Labeling of Graphs

  • Authors: Filip Malmberg, Alexandre X. Falcão
  • Subjects: Discrete Mathematics (cs.DM)
  • Arxiv link: https://arxiv.org/abs/2304.12642
  • Pdf link: https://arxiv.org/pdf/2304.12642
  • Abstract
    This paper concerns the efficient implementation of a method for optimal binary labeling of graph vertices, originally proposed by Malmberg and Ciesielski (2020). This method finds, in quadratic time with respect to graph size, a labeling that globally minimizes an objective function based on the $L_\infty$-norm. The method enables global optimization for a novel class of optimization problems, with high relevance in application areas such as image processing and computer vision. In the original formulation, the Malmberg-Ciesielski algorithm is unfortunately very computationally expensive, limiting its utility in practical applications. Here, we present a modified version of the algorithm that exploits redundancies in the original method to reduce computation time. While our proposed method has the same theoretical asymptotic time complexity, we demonstrate that it is substantially more efficient in practice. Even for small problems, we observe a speedup of 4-5 orders of magnitude. This reduction in computation time makes the Malmberg-Ciesielski method a viable option for many practical applications.

Evaluating the Energy Measurements of the IBM POWER9 On-Chip Controller

  • Authors: Hannes Tröpgen, Mario Bielert, Thomas Ilsche
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.12646
  • Pdf link: https://arxiv.org/pdf/2304.12646
  • Abstract
    Dependable power measurements are the backbone of energy-efficient computing systems. The IBM PowerNV platform offers such power measurements through an embedded PowerPC 405 processor: The On-Chip Controller (OCC). Among other system-control tasks, the OCC provides power measurements for several domains, such as system, CPU, and GPU. This paper provides a detailed description and an in-depth evaluation of these OCC-provided power measurements. For that, we describe the provided interfaces themselves and experimentally verify their overhead (3.6 us to 10.8 us per access) and readout rate (24.95 Sa/s). We also study the consistency of the reported sensor readouts across the measurement domains and compare it to externally measured data. Furthermore, we estimate the internal sampling rate (1996 Sa/s) by provoking aliasing errors with artificial workloads, and quantify the errors that such aliasing could introduce in practice (up to 12% of processor power consumption in our experimental worst-case scenario). Given these insights, practitioners using the IBM PowerNV platform can assess the quality of the embedded measurements, permitting sought-after energy efficiency improvements.

Towards Generating Hop-constrained s-t Simple Path Graphs

  • Authors: Yuzheng Cai, Siyuan Liu, Weiguo Zheng, Xuemin Lin
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.12656
  • Pdf link: https://arxiv.org/pdf/2304.12656
  • Abstract
    Graphs have been widely used in real-world applications, in which investigating relations between vertices is an important task. In this paper, we study the problem of generating the k-hop-constrained s-t simple path graph, i.e., the subgraph consisting of all simple paths from vertex s to vertex t of length no larger than k. To the best of our knowledge, we are the first to formalize this problem and prove its NP-hardness on directed graphs. To tackle this challenging problem, we propose an efficient algorithm named EVE, which exploits the paradigm of edge-wise examination rather than exhaustively enumerating all paths. Powered by essential vertices appearing in all simple paths between vertex pairs, EVE distinguishes the edges that are definitely (or definitely not) contained in the desired simple path graph, producing a tight upper-bound graph at a time cost of $\mathcal{O}(k^2|E|)$. Each remaining undetermined edge is further verified to deliver the exact answer. Extensive experiments are conducted on 15 real networks. The results show that EVE significantly outperforms all baselines by several orders of magnitude. Moreover, by taking EVE as a built-in block, state-of-the-art algorithms for hop-constrained simple path enumeration can be accelerated by up to an order of magnitude.
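
A much weaker cousin of EVE's edge test is easy to state: an edge (u, v) can only belong to the k-hop s-t simple path graph if dist(s, u) + 1 + dist(v, t) <= k. The sketch below implements just this BFS-based necessary condition to produce an upper-bound graph; EVE's essential-vertex reasoning prunes strictly more edges, so treat this as an illustrative baseline, not the paper's algorithm.

```python
from collections import deque

def bfs_dist(adj, src):
    """Hop distances from src over an adjacency dict."""
    dist, q = {src: 0}, deque([src])
    while q:
        u = q.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def upper_bound_graph(edges, s, t, k):
    """Keep edges satisfying dist(s,u) + 1 + dist(v,t) <= k (necessary cond.)."""
    fwd, bwd = {}, {}
    for u, v in edges:
        fwd.setdefault(u, []).append(v)
        bwd.setdefault(v, []).append(u)   # reverse graph for distances to t
    ds, dt = bfs_dist(fwd, s), bfs_dist(bwd, t)
    inf = float("inf")
    return [(u, v) for u, v in edges
            if ds.get(u, inf) + 1 + dt.get(v, inf) <= k]

edges = [("s", "a"), ("a", "t"), ("a", "b"), ("b", "c"), ("c", "t")]
print(upper_bound_graph(edges, "s", "t", 2))  # [('s', 'a'), ('a', 't')]
```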

Patch-based 3D Natural Scene Generation from a Single Example

  • Authors: Weiyu Li, Xuelin Chen, Jue Wang, Baoquan Chen
  • Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12670
  • Pdf link: https://arxiv.org/pdf/2304.12670
  • Abstract
    We target a 3D generative model for general natural scenes that are typically unique and intricate. Lacking the necessary volumes of training data, along with the difficulties of having ad hoc designs in the presence of varying scene characteristics, renders existing setups intractable. Inspired by classical patch-based image models, we advocate for synthesizing 3D scenes at the patch level, given a single example. At the core of this work lie important algorithmic designs w.r.t. the scene representation and the generative patch nearest-neighbor module, which address unique challenges arising from lifting the classical 2D patch-based framework to 3D generation. These design choices, on a collective level, contribute to a robust, effective, and efficient model that can generate high-quality general natural scenes with both realistic geometric structure and visual appearance, in large quantities and varieties, as demonstrated upon a variety of exemplar scenes.

A Static Pruning Study on Sparse Neural Retrievers

  • Authors: Carlos Lassance, Simon Lupart, Hervé Dejean, Stéphane Clinchant, Nicola Tonellotto
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.12702
  • Pdf link: https://arxiv.org/pdf/2304.12702
  • Abstract
    Sparse neural retrievers, such as DeepImpact, uniCOIL and SPLADE, have been introduced recently as an efficient and effective way to perform retrieval with inverted indexes. They aim to learn term importance and, in some cases, document expansions, to provide a more effective document ranking compared to traditional bag-of-words retrieval models such as BM25. However, these sparse neural retrievers have been shown to increase the computational costs and latency of query processing compared to their classical counterparts. To mitigate this, we apply a well-known family of techniques for boosting the efficiency of query processing over inverted indexes: static pruning. We experiment with three static pruning strategies, namely document-centric, term-centric and agnostic pruning, and we show, over diverse datasets, that these techniques still work with sparse neural retrievers. In particular, static pruning achieves $2\times$ speedup with negligible effectiveness loss ($\leq 2\%$ drop) and, depending on the use case, even $4\times$ speedup with minimal impact on the effectiveness ($\leq 8\%$ drop). Moreover, we show that neural rerankers are robust to candidates from statically pruned indexes.
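
As a rough illustration of one of the three strategies, the sketch below applies term-centric static pruning to a toy impact-sorted index: each posting list keeps only its highest-impact fraction. The index layout and the cut-off are assumptions for demonstration, not the experimental setup of the paper.

```python
def prune_term_centric(index, keep_frac=0.1):
    """Keep the top `keep_frac` of postings per term, ranked by impact.

    `index` maps term -> list of (doc_id, impact) pairs, as a learned sparse
    retriever such as SPLADE produces; layout and cut-off are illustrative.
    """
    pruned = {}
    for term, postings in index.items():
        ranked = sorted(postings, key=lambda p: p[1], reverse=True)
        keep = max(1, int(len(ranked) * keep_frac))
        pruned[term] = ranked[:keep]
    return pruned

index = {"neural": [(1, 3.2), (2, 0.4), (3, 1.8), (4, 0.1)],
         "retrieval": [(2, 2.7), (5, 0.2)]}
print(prune_term_centric(index, keep_frac=0.5))
# {'neural': [(1, 3.2), (3, 1.8)], 'retrieval': [(2, 2.7)]}
```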

Focusing on Information Context for ITS using a Spatial Age of Information Model

  • Authors: Julian Heinovski, Jorge Torres Gómez, Falko Dressler
  • Subjects: Multiagent Systems (cs.MA); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.12761
  • Pdf link: https://arxiv.org/pdf/2304.12761
  • Abstract
    New technologies for sensing and communication act as enablers for cooperative driving applications. Sensors are able to detect objects in the surrounding environment, and information such as their current location is exchanged among vehicles. In order to cope with the vehicles' mobility, such information is required to be as fresh as possible for proper operation of cooperative driving applications. The age of information (AoI) has been proposed as a metric for evaluating freshness of information, recently also within the context of intelligent transportation systems (ITS). We investigate mechanisms to reduce the AoI of data transported in the form of beacon messages while controlling their emission rate. We aim to balance packet collision probability and beacon frequency using the average peak age of information (PAoI) as a metric. This metric, however, only accounts for the generation time of the data but not for application-specific aspects, such as the location of the transmitting vehicle. We thus propose a new way of interpreting the AoI by considering information context, thereby incorporating vehicles' locations. As an example, we characterize such importance using the orientation and the distance of the involved vehicles. In particular, we introduce a weighting coefficient used in combination with the PAoI to evaluate the information freshness, emphasizing information from more important neighbors. We further design the beaconing approach in a way to meet a given AoI requirement, thus saving resources on the wireless channel while keeping the AoI minimal. We illustrate the effectiveness of our approach in Manhattan-like urban scenarios, reaching pre-specified targets for the AoI of beacon messages.
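
To make the weighted metric concrete, the sketch below computes a proximity-weighted average peak AoI from one neighbor's beacon reception times. The inverse-distance weight is only one plausible form of the coefficient described above, and generation delays are ignored for brevity.

```python
def weighted_peak_aoi(reception_times, distance, alpha=1.0):
    """Average peak AoI over a neighbor's beacons, weighted by proximity.

    The peak AoI of each update interval is approximated by the gap between
    consecutive receptions; the 1/(1 + alpha*distance) weight is an assumed
    stand-in for the orientation/distance coefficient in the paper.
    """
    gaps = [b - a for a, b in zip(reception_times, reception_times[1:])]
    peak_aoi = sum(gaps) / len(gaps)
    return peak_aoi / (1.0 + alpha * distance)

# A nearby neighbor weighs more than a distant one with the same beacon rate.
print(weighted_peak_aoi([0.0, 0.1, 0.2, 0.35], distance=10.0))   # ~0.0106
print(weighted_peak_aoi([0.0, 0.1, 0.2, 0.35], distance=100.0))  # ~0.0012
```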

Towards Characterizing the First-order Query Complexity of Learning (Approximate) Nash Equilibria in Zero-sum Matrix Games

  • Authors: Hédi Hadiji, Sarah Sachs (UvA), Tim van Erven (UvA), Wouter M. Koolen (CWI)
  • Subjects: Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.12768
  • Pdf link: https://arxiv.org/pdf/2304.12768
  • Abstract
    In the first-order query model for zero-sum $K\times K$ matrix games, players observe the expected pay-offs for all their possible actions under the randomized action played by their opponent. This is a classical model, which has received renewed interest after the discovery by Rakhlin and Sridharan that $\epsilon$-approximate Nash equilibria can be computed efficiently from $O(\ln K / \epsilon)$ instead of $O(\ln K / \epsilon^2)$ queries. Surprisingly, the optimal number of such queries, as a function of both $\epsilon$ and $K$, is not known. We make progress on this question on two fronts. First, we fully characterise the query complexity of learning exact equilibria ($\epsilon=0$), by showing that they require a number of queries that is linear in $K$, which means that it is essentially as hard as querying the whole matrix, which can also be done with $K$ queries. Second, for $\epsilon > 0$, the current query complexity upper bound stands at $O(\min(\ln(K) / \epsilon, K))$. We argue that, unfortunately, obtaining a matching lower bound is not possible with existing techniques: we prove that no lower bound can be derived by constructing hard matrices whose entries take values in a known countable set, because such matrices can be fully identified by a single query. This rules out, for instance, reducing to a submodular optimization problem over the hypercube by encoding it as a binary matrix. We then introduce a new technique for lower bounds, which allows us to obtain lower bounds of order $\tilde\Omega(\log(1 / (K\epsilon)))$ for any $\epsilon \leq 1 / (cK^4)$, where $c$ is a constant independent of $K$. We further discuss possible future directions to improve on our techniques in order to close the gap with the upper bounds.

Binary stochasticity enabled highly efficient neuromorphic deep learning achieves better-than-software accuracy

  • Authors: Yang Li, Wei Wang, Ming Wang, Chunmeng Dou, Zhengyu Ma, Huihui Zhou, Peng Zhang, Nicola Lepri, Xumeng Zhang, Qing Luo, Xiaoxin Xu, Guanhua Yang, Feng Zhang, Ling Li, Daniele Ielmini, Ming Liu
  • Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Signal Processing (eess.SP); Data Analysis, Statistics and Probability (physics.data-an)
  • Arxiv link: https://arxiv.org/abs/2304.12866
  • Pdf link: https://arxiv.org/pdf/2304.12866
  • Abstract
    Deep learning needs high-precision handling of forwarding signals, backpropagating errors, and updating weights. This is inherently required by the learning algorithm, since the gradient descent learning rule relies on the chain product of partial derivatives. However, it is challenging to implement deep learning in hardware systems that use noisy analog memristors as artificial synapses; it is also not biologically plausible. Memristor-based implementations generally result in an excessive cost of neuronal circuits and stringent demands for idealized synaptic devices. Here, we demonstrate that the requirement for high precision is not necessary and that more efficient deep learning can be achieved when this requirement is lifted. We propose a binary stochastic learning algorithm that modifies all elementary neural network operations by introducing (i) stochastic binarization of both the forwarding signals and the activation function derivatives, (ii) signed binarization of the backpropagating errors, and (iii) step-wise weight updates. Through an extensive hybrid approach of software simulation and hardware experiments, we find that binary stochastic deep learning systems can provide better performance than software-based benchmarks using the high-precision learning algorithm. Also, the binary stochastic algorithm strongly simplifies the neural network operations in hardware, resulting in an improvement of the energy efficiency of the multiply-and-accumulate operations by more than three orders of magnitude.
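
The three binarization operations are simple to state in code. The sketch below gives plausible forms (the clipping range and the fixed step size are assumptions), not the paper's exact kernels.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_binarize(x):
    """(i) Forward signals / activation derivatives -> {0, 1} stochastically."""
    p = np.clip(x, 0.0, 1.0)              # treat the value as a probability
    return (rng.random(x.shape) < p).astype(np.float32)

def signed_binarize(err):
    """(ii) Backpropagating errors -> {-1, 0, +1} via their sign."""
    return np.sign(err).astype(np.float32)

def stepwise_update(w, grad, step=0.01):
    """(iii) Weights move by a fixed step against the gradient sign."""
    return w - step * np.sign(grad)

x = np.array([0.2, 0.9, -0.1])
print(stochastic_binarize(x), signed_binarize(x), stepwise_update(x, x))
```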

SPDH-Sign: towards Efficient, Post-quantum Group-based Signatures

  • Authors: Christopher Battarbee, Delaram Kahrobaei, Ludovic Perret, Siamak F. Shahandashti
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.12900
  • Pdf link: https://arxiv.org/pdf/2304.12900
  • Abstract
    In this paper, we present a new diverse class of post-quantum group-based Digital Signature Schemes (DSS). The approach is significantly different from previous examples of group-based digital signatures and adopts the framework of group action-based cryptography: we show that each finite group defines a group action relative to the semidirect product of the group by its automorphism group, and give security bounds on the resulting signature scheme in terms of the group-theoretic computational problem known as the Semidirect Discrete Logarithm Problem (SDLP). Crucially, we make progress towards being able to efficiently compute the novel group action, and give an example of a parameterised family of groups for which the group action can be computed for any parameters, thereby negating the need for expensive offline computation or inclusion of redundancy required in other schemes of this type.

User-Centric Federated Learning: Trading off Wireless Resources for Personalization

  • Authors: Mohamad Mestoukirdi, Matteo Zecchin, David Gesbert, Qianrui Li
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.12930
  • Pdf link: https://arxiv.org/pdf/2304.12930
  • Abstract
    Statistical heterogeneity across clients in a Federated Learning (FL) system increases the algorithm convergence time and reduces the generalization performance, resulting in a large communication overhead in return for a poor model. To tackle the above problems without violating the privacy constraints that FL imposes, personalized FL methods have to couple statistically similar clients without directly accessing their data in order to guarantee a privacy-preserving transfer. In this work, we design user-centric aggregation rules at the parameter server (PS) that are based on readily available gradient information and are capable of producing personalized models for each FL client. The proposed aggregation rules are inspired by an upper bound of the weighted aggregate empirical risk minimizer. We then derive a communication-efficient variant based on user clustering which greatly enhances its applicability to communication-constrained systems. Our algorithm outperforms popular personalized FL baselines in terms of average accuracy, worst node performance, and training communication overhead.
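
A minimal sketch of gradient-based user-centric aggregation follows: each client's personalized update is a similarity-weighted average of all clients' gradients. The cosine/softmax weighting is an illustrative stand-in for the bound-derived rule in the paper.

```python
import numpy as np

def user_centric_aggregate(grads, temperature=1.0):
    """One personalized update per client from pairwise gradient similarity.

    `grads` is (num_clients, dim); client i's row of weights is a softmax
    over cosine similarities, biasing aggregation toward similar clients.
    This weighting is an assumption, not the paper's derived coefficients.
    """
    unit = grads / (np.linalg.norm(grads, axis=1, keepdims=True) + 1e-12)
    sims = unit @ unit.T                                 # cosine similarities
    weights = np.exp(sims / temperature)
    weights /= weights.sum(axis=1, keepdims=True)        # rows sum to one
    return weights @ grads                               # personalized updates

grads = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]])
print(user_centric_aggregate(grads).round(2))
```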

SALSA: Simulated Annealing based Loop-Ordering Scheduler for DNN Accelerators

  • Authors: Victor J.B. Jung, Arne Symons, Linyan Mei, Marian Verhelst, Luca Benini
  • Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.12931
  • Pdf link: https://arxiv.org/pdf/2304.12931
  • Abstract
    To meet the growing need for computational power for DNNs, multiple specialized hardware architectures have been proposed. Each DNN layer should be mapped onto the hardware with the most efficient schedule; however, state-of-the-art (SotA) schedulers struggle to consistently provide optimum schedules in a reasonable time across all DNN-HW combinations. This paper proposes SALSA, a fast dual-engine scheduler that generates optimal execution schedules for both even and uneven mapping. We introduce a new strategy, combining exhaustive search with simulated annealing, to address the dynamic nature of the loop-ordering design space size across layers. SALSA is extensively benchmarked against two SotA schedulers, LOMA and Timeloop, on 5 different DNNs; on average, SALSA finds schedules with 11.9% and 7.6% lower energy while speeding up the search by 1.7x and 24x compared to LOMA and Timeloop, respectively.
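
The annealing engine itself fits in a few lines: propose a swap of two loops in the current ordering and accept cost increases with probability exp(-delta/T). The toy cost function below is a placeholder for SALSA's hardware cost model, so this is a sketch of the search strategy only.

```python
import math
import random

def anneal_loop_order(order, cost, temp=1.0, cooling=0.99, iters=2000):
    """Simulated annealing over loop orderings; `cost` stands in for a
    hardware cost model that scores a schedule (lower is better)."""
    cur = list(order)
    best = cur[:]
    for _ in range(iters):
        cand = cur[:]
        i, j = random.sample(range(len(cand)), 2)
        cand[i], cand[j] = cand[j], cand[i]              # swap two loops
        delta = cost(cand) - cost(cur)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            cur = cand                                   # accept the move
            if cost(cur) < cost(best):
                best = cur[:]
        temp *= cooling                                  # cool down
    return best

# Toy cost: pretend 'K' innermost and 'B' outermost is cheapest.
toy_cost = lambda o: -o.index("K") + o.index("B")
print(anneal_loop_order(["K", "C", "B", "OX", "OY"], toy_cost))
```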

Nondeterministic Stacks in Neural Networks

  • Authors: Brian DuSell
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.12955
  • Pdf link: https://arxiv.org/pdf/2304.12955
  • Abstract
    Human language is full of compositional syntactic structures, and although neural networks have contributed to groundbreaking improvements in computer systems that process language, widely-used neural network architectures still exhibit limitations in their ability to process syntax. To address this issue, prior work has proposed adding stack data structures to neural networks, drawing inspiration from theoretical connections between syntax and stacks. However, these methods employ deterministic stacks that are designed to track one parse at a time, whereas syntactic ambiguity, which requires a nondeterministic stack to parse, is extremely common in language. In this dissertation, we remedy this discrepancy by proposing a method of incorporating nondeterministic stacks into neural networks. We develop a differentiable data structure that efficiently simulates a nondeterministic pushdown automaton, representing an exponential number of computations with a dynamic programming algorithm. We incorporate this module into two predominant architectures: recurrent neural networks (RNNs) and transformers. We show that this raises their formal recognition power to arbitrary context-free languages, and also aids training, even on deterministic context-free languages. Empirically, neural networks with nondeterministic stacks learn context-free languages much more effectively than prior stack-augmented models, including a language with theoretically maximal parsing difficulty. We also show that an RNN augmented with a nondeterministic stack is capable of surprisingly powerful behavior, such as learning cross-serial dependencies, a well-known non-context-free pattern. We demonstrate improvements on natural language modeling and provide analysis on a syntactic generalization benchmark. This work represents an important step toward building systems that learn to use syntax in a more human-like fashion.

Faster High Accuracy Multi-Commodity Flow from Single-Commodity Techniques

  • Authors: Jan van den Brand, Daniel Zhang
  • Subjects: Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.12992
  • Pdf link: https://arxiv.org/pdf/2304.12992
  • Abstract
    Since the development of efficient linear program solvers in the 80s, all major improvements for solving multi-commodity flows to high accuracy have come from improvements to general linear program solvers. This differs from the single-commodity problem (e.g. maximum flow), where all recent improvements also rely on graph-specific techniques such as graph decompositions or the Laplacian paradigm (see e.g. [CMSV17, KLS20, BLL+21, CKL+22]). This phenomenon sparked research to understand why these graph techniques are unlikely to help for multi-commodity flow. [Kyng, Zhang'20] reduced solving multi-commodity Laplacians to general linear systems and [Ding, Kyng, Zhang'22] showed that general linear programs can be reduced to 2-commodity flow. However, the reductions create sparse graph instances, so improvements to multi-commodity flows on denser graphs might exist. We show that one can indeed speed up multi-commodity flow algorithms on non-sparse graphs using graph techniques from single-commodity flow algorithms. This is the first improvement to high accuracy multi-commodity flow algorithms that does not just stem from improvements to general linear program solvers. In particular, using graph data structures from the recent min-cost flow algorithm by [BLL+21], based on the celebrated expander decomposition framework, we show that 2-commodity flow on an $n$-vertex $m$-edge graph can be solved in $\tilde{O}(\sqrt{m}n^{\omega-1/2})$ time for current bounds on fast matrix multiplication $\omega \approx 2.373$, improving upon the previous fastest algorithms with $\tilde{O}(m^\omega)$ [CLS19] and $\tilde{O}(\sqrt{m}n^2)$ [KV96] time complexity. For general $k$ commodities, our algorithm runs in $\tilde{O}(k^{2.5}\sqrt{m}n^{\omega-1/2})$ time.

Room dimensions and absorption inference from room transfer function via machine learning

  • Authors: Yuanxin Xia, Cheol-Ho Jeong
  • Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2304.12993
  • Pdf link: https://arxiv.org/pdf/2304.12993
  • Abstract
    The inference of the absorption configuration of an existing room solely using acoustic signals can be challenging. This research presents two methods for estimating the room dimensions and frequency-dependent absorption coefficients using room transfer functions. The first method, a knowledge-based approach, calculates the room dimensions through damped resonant frequencies of the room. The second method, a machine learning approach, employs multi-task convolutional neural networks for inferring the room dimensions and frequency-dependent absorption coefficients of each surface. The study shows that accurate wave-based simulation data can be used to train neural networks for real-world measurements and demonstrates the potential of this algorithm for estimating the boundary input data for room acoustic simulations. The proposed methods can be a valuable tool for room acoustic simulations during acoustic renovation or intervention projects, as they enable inferring the room geometry and absorption conditions with reasonably small data requirements.
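
The knowledge-based step rests on the standard rectangular-room mode formula, $f_{(n_x,n_y,n_z)} = \frac{c}{2}\sqrt{(n_x/L_x)^2 + (n_y/L_y)^2 + (n_z/L_z)^2}$, so an identified axial mode directly reveals one dimension via $L = c/(2f)$. The sketch below shows that inversion; matching measured resonances to the right modes is the hard part the paper addresses.

```python
import math

C = 343.0  # speed of sound in air at ~20 C, m/s

def mode_frequency(dims, n=(1, 0, 0)):
    """Resonant frequency of mode (nx, ny, nz) in a box with dims (m)."""
    return (C / 2.0) * math.sqrt(sum((ni / Li) ** 2 for ni, Li in zip(n, dims)))

def dimension_from_axial_mode(f_axial, order=1):
    """Invert an axial mode: f = order * c / (2L)  =>  L = order * c / (2f)."""
    return order * C / (2.0 * f_axial)

dims = (5.0, 4.0, 3.0)
f = mode_frequency(dims, (1, 0, 0))        # first axial mode along x: 34.3 Hz
print(f, dimension_from_axial_mode(f))     # recovers the 5.0 m dimension
```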

On the Generalization of Learned Structured Representations

  • Authors: Andrea Dittadi
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13001
  • Pdf link: https://arxiv.org/pdf/2304.13001
  • Abstract
    Despite tremendous progress over the past decade, deep learning methods generally fall short of human-level systematic generalization. It has been argued that explicitly capturing the underlying structure of data should allow connectionist systems to generalize in a more predictable and systematic manner. Indeed, evidence in humans suggests that interpreting the world in terms of symbol-like compositional entities may be crucial for intelligent behavior and high-level reasoning. Another common limitation of deep learning systems is that they require large amounts of training data, which can be expensive to obtain. In representation learning, large datasets are leveraged to learn generic data representations that may be useful for efficient learning of arbitrary downstream tasks. This thesis is about structured representation learning. We study methods that learn, with little or no supervision, representations of unstructured data that capture its hidden structure. In the first part of the thesis, we focus on representations that disentangle the explanatory factors of variation of the data. We scale up disentangled representation learning to a novel robotic dataset, and perform a systematic large-scale study on the role of pretrained representations for out-of-distribution generalization in downstream robotic tasks. The second part of this thesis focuses on object-centric representations, which capture the compositional structure of the input in terms of symbol-like entities, such as objects in visual scenes. Object-centric learning methods learn to form meaningful entities from unstructured input, enabling symbolic information processing on a connectionist substrate. In this study, we train a selection of methods on several common datasets, and investigate their usefulness for downstream tasks and their ability to generalize out of distribution.

Flickr-PAD: New Face High-Resolution Presentation Attack Detection Database

  • Authors: Diego Pasmino, Carlos Aravena, Juan Tapia, Christoph Busch
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13015
  • Pdf link: https://arxiv.org/pdf/2304.13015
  • Abstract
    Nowadays, Presentation Attack Detection (PAD) is a very active research area. Several databases in the state-of-the-art are constituted of images extracted from videos. One of the main problems identified is that many databases present low-quality, small-sized images and do not represent an operational scenario of a real remote biometric system. Currently, such images are captured by smartphones with higher quality and bigger resolutions. In order to increase the diversity of image quality, this work presents a new PAD database based on open-access Flickr images, called "Flickr-PAD". Our new hand-made database shows high-quality printed and screen scenarios. This will help researchers to compare new approaches to existing algorithms on a wider database. The database will be made available to other researchers. A leave-one-out protocol was used to train and evaluate three PAD models based on MobileNet-V3 (small and large) and EfficientNet-B0. The best result was reached with MobileNet-V3 large, with a BPCER10 of 7.08% and a BPCER20 of 11.15%.

Keyword: faster

Green Video Complexity Analysis for Efficient Encoding in Adaptive Video Streaming

  • Authors: Vignesh V Menon, Christian Feldmann, Klaus Schoeffmann, Mohammad Ghanbari, Christian Timmerer
  • Subjects: Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2304.12384
  • Pdf link: https://arxiv.org/pdf/2304.12384
  • Abstract
    For adaptive streaming applications, low-complexity and accurate video complexity features are necessary to analyze the video content in real time, which ensures fast and compression-efficient video streaming without disruptions. State-of-the-art video complexity features are the Spatial Information (SI) and Temporal Information (TI) features, which do not correlate well with the encoding parameters in adaptive streaming applications. In this light, the Video Complexity Analyzer (VCA) was introduced, determining the features based on Discrete Cosine Transform (DCT) energy. This paper presents optimizations on VCA for faster and energy-efficient video complexity analysis. Experimental results show that VCA v2.0, using eight CPU threads, Single Instruction Multiple Data (SIMD), and low-pass DCT optimization, determines seven complexity features of Ultra High Definition 8-bit videos with better accuracy at a speed of up to 292.68 fps and an energy consumption 97.06% lower than the reference SITI implementation.

DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents

  • Authors: Mohamed Dhouib, Ghassen Bettaieb, Aymen Shabou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.12484
  • Pdf link: https://arxiv.org/pdf/2304.12484
  • Abstract
    Information Extraction from visually rich documents is a challenging task that has gained a lot of attention in recent years due to its importance in several document-control based applications and its widespread commercial value. The majority of the research work conducted on this topic to date follow a two-step pipeline. First, they read the text using an off-the-shelf Optical Character Recognition (OCR) engine, then, they extract the fields of interest from the obtained text. The main drawback of these approaches is their dependence on an external OCR system, which can negatively impact both performance and computational speed. Recent OCR-free methods were proposed to address the previous issues. Inspired by their promising results, we propose in this paper an OCR-free end-to-end information extraction model named DocParser. It differs from prior end-to-end approaches by its ability to better extract discriminative character features. DocParser achieves state-of-the-art results on various datasets, while still being faster than previous works.

Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models

  • Authors: Zhendong Wang, Yifan Jiang, Huangjie Zheng, Peihao Wang, Pengcheng He, Zhangyang Wang, Weizhu Chen, Mingyuan Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.12526
  • Pdf link: https://arxiv.org/pdf/2304.12526
  • Abstract
    Diffusion models are powerful, but they require a lot of time and data to train. We propose Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training time costs while improving data efficiency, which thus helps democratize diffusion model training to broader users. At the core of our innovations is a new conditional score function at the patch level, where the patch location in the original image is included as additional coordinate channels, while the patch size is randomized and diversified throughout training to encode the cross-region dependency at multiple scales. Sampling with our method is as easy as in the original diffusion model. Through Patch Diffusion, we could achieve $\mathbf{\ge 2\times}$ faster training, while maintaining comparable or better generation quality. Patch Diffusion meanwhile improves the performance of diffusion models trained on relatively small datasets, e.g., as few as 5,000 images to train from scratch. We achieve state-of-the-art FID scores of 1.77 on CelebA-64$\times$64 and 1.93 on AFHQv2-Wild-64$\times$64. We will share our code and pre-trained models soon.
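
The coordinate-channel idea is easy to prototype: crop a random patch and append two channels holding each pixel's normalized location in the full image, so the score network always knows where the patch came from. The sketch below shows just that conditioning step (the patch-size schedule and the diffusion model itself are omitted).

```python
import numpy as np

def random_patch_with_coords(img, patch):
    """Crop a random patch and append normalized (y, x) image coordinates.

    `img` is (H, W, C); returns (patch, patch, C + 2). Illustrative only.
    """
    H, W, _ = img.shape
    top = np.random.randint(0, H - patch + 1)
    left = np.random.randint(0, W - patch + 1)
    crop = img[top:top + patch, left:left + patch]
    ys = (np.arange(top, top + patch) / (H - 1))[:, None].repeat(patch, 1)
    xs = (np.arange(left, left + patch) / (W - 1))[None, :].repeat(patch, 0)
    return np.concatenate([crop, ys[..., None], xs[..., None]], axis=-1)

img = np.random.rand(64, 64, 3)
print(random_patch_with_coords(img, 16).shape)  # (16, 16, 5)
```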

Fast Continuous Subgraph Matching over Streaming Graphs via Backtracking Reduction

  • Authors: Rongjian Yang, Zhijie Zhang, Weiguo Zheng, Jeffery Xu Yu
  • Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.12610
  • Pdf link: https://arxiv.org/pdf/2304.12610
  • Abstract
    Streaming graphs are drawing increasing attention in both academic and industrial communities as many graphs in real applications evolve over time. Continuous subgraph matching (CSM for short) aims to report the incremental matches of a query graph in such streaming graphs. It involves two major steps, i.e., candidate maintenance and incremental match generation, to answer CSM. Throughout the course of continuous subgraph matching, incremental match generation, which backtracks over the search space, dominates the total cost. However, most previous approaches focus on developing techniques for efficient candidate maintenance, while incremental match generation receives less attention despite its importance in CSM. Aiming to minimize the overall cost, we propose two techniques to reduce backtrackings in this paper. We present a cost-effective index CaLiG that yields tighter candidate maintenance, shrinking the search space of backtracking. In addition, we develop a novel incremental matching paradigm KSS that decomposes the query vertices into conditional kernel vertices and shell vertices. With the matches of kernel vertices, the incremental matches can be produced immediately by joining the candidates of shell vertices without any backtracking. Benefiting from reduced backtrackings, the elapsed time of CSM decreases significantly. Extensive experiments over real graphs show that our method runs orders of magnitude faster than the state-of-the-art algorithms.

Demystifying Random Number in Ethereum Smart Contract: Taxonomy, Vulnerability Identification, and Attack Detection

  • Authors: Peng Qian, Jianting He, Lingling Lu, Siwei Wu, Zhipeng Lu, Lei Wu, Yajin Zhou, Qinming He
  • Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.12645
  • Pdf link: https://arxiv.org/pdf/2304.12645
  • Abstract
    Recent years have witnessed explosive growth in blockchain smart contract applications. As smart contracts become increasingly popular and carry trillion dollars worth of digital assets, they become more of an appealing target for attackers, who have exploited vulnerabilities in smart contracts to cause catastrophic economic losses. Notwithstanding a proliferation of work that has been developed to detect an impressive list of vulnerabilities, the bad randomness vulnerability is overlooked by many existing tools. In this paper, we make the first attempt to provide a systematic analysis of random numbers in Ethereum smart contracts, by investigating the principles behind pseudo-random number generation and organizing them into a taxonomy. We also lucubrate various attacks against bad random numbers and group them into four categories. Furthermore, we present RNVulDet - a tool that incorporates taint analysis techniques to automatically identify bad randomness vulnerabilities and detect corresponding attack transactions. To extensively verify the effectiveness of RNVulDet, we construct three new datasets: i) 34 well-known contracts that are reported to possess bad randomness vulnerabilities, ii) 214 popular contracts that have been rigorously audited before launch and are regarded as free of bad randomness vulnerabilities, and iii) a dataset consisting of 47,668 smart contracts and 49,951 suspicious transactions. We compare RNVulDet with three state-of-the-art smart contract vulnerability detectors, and our tool significantly outperforms them. Meanwhile, RNVulDet spends 2.98s per contract on average, in most cases orders-of-magnitude faster than other tools. RNVulDet successfully reveals 44,264 attack transactions. Our implementation and datasets are released, hoping to inspire others.

Channel Estimation and Signal Detection for NLOS Ultraviolet Scattering Communication with Space Division Multiple Access

  • Authors: Yubo Zhang, Yuchen Pan, Chen Gong, Beiyuan Liu, Zhengyuan Xu
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.12804
  • Pdf link: https://arxiv.org/pdf/2304.12804
  • Abstract
    We design a receiver assembling several photomultipliers (PMTs) as an array to increase the field of view (FOV) of the receiver and adapt to multiuser scenarios over non-line-of-sight (NLOS) ultraviolet (UV) channels. Channel estimation and signal detection have been investigated according to the space division characteristics of the structure. First, we adopt a balanced structure on the pilot matrix, analyze the channel estimation mean square error (MSE), and optimize the structure parameters. Then, with the estimated parameters, an analytical threshold detection rule is proposed as a preliminary step toward multiuser detection. The detection rule can be optimized by analyzing the separability of two users based on the Gaussian approximation of a Poisson weighted sum. To assess the effect of imperfect estimation, a sensitivity analysis of the channel estimation error on two-user signal detection is performed. Moreover, we propose a successive elimination method for on-off keying (OOK) modulated multiuser symbol detection based on the previous threshold detection rule. A closed-form upper bound on the detection error rate is calculated, which turns out to be a good approximation of that of multiuser maximum-likelihood (ML) detection. The proposed successive elimination method is twenty times faster than the ML detection with negligible detection error rate degradation.
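
The Gaussian approximation mentioned above yields a closed-form decision threshold between the OOK 'off' and 'on' photon-count levels. The equal-normalized-distance threshold below is a textbook form used only to illustrate the kind of analytical rule derived in the paper, not its exact expression.

```python
import math

def ook_threshold(mu0, mu1):
    """Photon-count threshold between Poisson means mu0 < mu1.

    Uses the Gaussian approximation N(mu, mu) for each hypothesis and picks
    the count where the two normalized distances are equal (illustrative).
    """
    s0, s1 = math.sqrt(mu0), math.sqrt(mu1)
    return (mu0 * s1 + mu1 * s0) / (s0 + s1)

print(ook_threshold(mu0=2.0, mu1=20.0))  # decide '1' above roughly this count
```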

Keyword: mobile

IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds

  • Authors: Vimal Mollyn, Riku Arakawa, Mayank Goel, Chris Harrison, Karan Ahuja
  • Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12518
  • Pdf link: https://arxiv.org/pdf/2304.12518
  • Abstract
    Tracking body pose on-the-go could have powerful uses in fitness, mobile gaming, context-aware virtual assistants, and rehabilitation. However, users are unlikely to buy and wear special suits or sensor arrays to achieve this end. Instead, in this work, we explore the feasibility of estimating body pose using IMUs already in devices that many users own -- namely smartphones, smartwatches, and earbuds. This approach has several challenges, including noisy data from low-cost commodity IMUs, and the fact that the number of instrumentation points on a user's body is both sparse and in flux. Our pipeline receives whatever subset of IMU data is available, potentially from just a single device, and produces a best-guess pose. To evaluate our model, we created the IMUPoser Dataset, collected from 10 participants wearing or holding off-the-shelf consumer devices and across a variety of activity contexts. We provide a comprehensive evaluation of our system, benchmarking it on both our own and existing IMU datasets.

SwinFSR: Stereo Image Super-Resolution using SwinIR and Frequency Domain Knowledge

  • Authors: Ke Chen, Liangyan Li, Huan Liu, Yunzhe Li, Congling Tang, Jun Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12556
  • Pdf link: https://arxiv.org/pdf/2304.12556
  • Abstract
    Stereo Image Super-Resolution (stereoSR) has attracted significant attention in recent years due to the extensive deployment of dual cameras in mobile phones, autonomous vehicles and robots. In this work, we propose a new StereoSR method, named SwinFSR, based on an extension of SwinIR, originally designed for single image restoration, and the frequency domain knowledge obtained by the Fast Fourier Convolution (FFC). Specifically, to effectively gather global information, we modify the Residual Swin Transformer blocks (RSTBs) in SwinIR by explicitly incorporating the frequency domain knowledge using the FFC and employing the resulting residual Swin Fourier Transformer blocks (RSFTBs) for feature extraction. Besides, for the efficient and accurate fusion of stereo views, we propose a new cross-attention module referred to as RCAM, which achieves highly competitive performance while requiring less computational cost than the state-of-the-art cross-attention modules. Extensive experimental results and ablation studies demonstrate the effectiveness and efficiency of our proposed SwinFSR.

Social media in the Global South: A Network Dataset of the Malian Twittersphere

  • Authors: Daniel Thilo Schroeder, Mirjam de Bruijn, Luca Bruls, Mulatu Alemayehu Moges, Samba Dialimpa Badji, Noémie Fritz, Modibo Galy Cisse, Johannes Langguth, Bruce Mutsvairo, Kristin Skare Orgeret
  • Subjects: Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.12668
  • Pdf link: https://arxiv.org/pdf/2304.12668
  • Abstract
    With the expansion of mobile communications infrastructure and the resulting proliferation of smartphones, social media usage in the Global South is surging, with Twitter fast becoming an important platform. In this paper, we present what to our knowledge is the first data set of a Twitter landscape in an African country that is beset by conflict. In particular, we provide a comprehensive database to explore Twitter usage in Mali, a west African country that until recently has had a relatively precarious media ecology. Mali has since 2012 been affected by an intersection of armed conflicts, often between different ethnic and religious groups. We collected the database in 2022, in a period when the Malian conflict became more violent, both internally and towards external, international actors. We assume that this context influences the ways in which people access social media, and therefore the shape of the Twittersphere and its characteristics. Hence our aim is to primarily invite researchers from various disciplines, including complex networks and social sciences scholars, to further explore these characteristics. The given snapshot of the Malian Twitter follower network contains 7M accounts, with 56K accounts clearly identifiable as Malian, a figure that coincides with official numbers. In addition, we present the tweets. Both are attached to the data set. The dataset is available at https://osf.io/XXX (available after review). The corresponding hydrate scripts are available at https://github.com/XXX (available after review).

Linguistic Dead-Ends and Alphabet Soup: Finding Dark Patterns in Japanese Apps

  • Authors: Shun Hidaka, Sota Kobuki, Mizuki Watanabe, Katie Seaborn
  • Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.12811
  • Pdf link: https://arxiv.org/pdf/2304.12811
  • Abstract
    Dark patterns are deceptive and malicious properties of user interfaces that lead the end-user to do something different from intended or expected. While now a key topic in critical computing, most work has been conducted in Western contexts. Japan, with its booming app market, is a relatively uncharted context that offers culturally- and linguistically-sensitive differences in design standards, contexts of use, values, and language, all of which could influence the presence and expression of dark patterns. In this work, we analyzed 200 popular mobile apps in the Japanese market. We found that most apps had dark patterns, with an average of 3.9 per app. We also identified a new class of dark pattern: "Linguistic Dead-Ends" in the forms of "Untranslation" and "Alphabet Soup." We outline the implications for design and research practice, especially for future cross-cultural research on dark patterns.

Automated Solubility Analysis System and Method Using Computer Vision and Machine Learning

  • Authors: Gahee Kim, Minwoo Jeon, Hyun Do Choi, Jun Ki Cho, Youn-Suk Choi, Hyoseok Hwang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Chemical Physics (physics.chem-ph)
  • Arxiv link: https://arxiv.org/abs/2304.12972
  • Pdf link: https://arxiv.org/pdf/2304.12972
  • Abstract
    In this study, a novel active solubility sensing device using computer vision is proposed to improve separation purification performance and prevent malfunctions of separation equipment such as preparative liquid chromatographers and evaporators. The proposed device actively measures the solubility by transmitting a solution using a background image. The proposed system is a combination of a device that uses a background image and a method for estimating the dissolution and particle presence by changing the background image. The proposed device consists of four parts: camera, display, adjustment, and server units. The camera unit is made up of a rear image sensor on a mobile phone. The display unit is comprised of a tablet screen. The adjustment unit is composed of rotating and height-adjustment jigs. Finally, the server unit consists of a socket server for communication between the units and a PC, including an automated solubility analysis system implemented in Python. The dissolution status of the solution was divided into four categories and a case study was conducted. The algorithms were trained based on these results. Six organic materials and four organic solvents were combined across 202 tests to train the developed algorithm. As a result, the evaluation of the dissolution state exhibited an accuracy of 95%. In addition, the device and method should be extended with a feedback function that can add a solvent or solute after dissolution detection, using the solubility results, for use in autonomous systems such as a synthetic automation system. Finally, the diversification of the sensing method is expected to extend not only to the solution but also to the solubility and homogeneity analysis of the film.

Flickr-PAD: New Face High-Resolution Presentation Attack Detection Database

  • Authors: Diego Pasmino, Carlos Aravena, Juan Tapia, Christoph Busch
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13015
  • Pdf link: https://arxiv.org/pdf/2304.13015
  • Abstract
    Nowadays, Presentation Attack Detection (PAD) is a very active research area. Several databases in the state-of-the-art are constituted of images extracted from videos. One of the main problems identified is that many databases present low-quality, small-sized images and do not represent an operational scenario of a real remote biometric system. Currently, such images are captured by smartphones with higher quality and bigger resolutions. In order to increase the diversity of image quality, this work presents a new PAD database based on open-access Flickr images, called "Flickr-PAD". Our new hand-made database shows high-quality printed and screen scenarios. This will help researchers to compare new approaches to existing algorithms on a wider database. The database will be made available to other researchers. A leave-one-out protocol was used to train and evaluate three PAD models based on MobileNet-V3 (small and large) and EfficientNet-B0. The best result was reached with MobileNet-V3 large, with a BPCER10 of 7.08% and a BPCER20 of 11.15%.

Keyword: pruning

Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures

  • Authors: Eugenia Iofinova, Alexandra Peste, Dan Alistarh
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.12622
  • Pdf link: https://arxiv.org/pdf/2304.12622
  • Abstract
    Pruning - that is, setting a significant subset of the parameters of a neural network to zero - is one of the most popular methods of model compression. Yet, several recent works have raised the issue that pruning may induce or exacerbate bias in the output of the compressed model. Despite existing evidence for this phenomenon, the relationship between neural network pruning and induced bias is not well-understood. In this work, we systematically investigate and characterize this phenomenon in Convolutional Neural Networks for computer vision. First, we show that it is in fact possible to obtain highly-sparse models, e.g. with less than 10% remaining weights, which neither decrease in accuracy nor substantially increase bias when compared to dense models. At the same time, we also find that, at higher sparsities, pruned models exhibit higher uncertainty in their outputs, as well as increased correlations, which we directly link to increased bias. We propose easy-to-use criteria which, based only on the uncompressed model, establish whether bias will increase with pruning, and identify the samples most susceptible to biased predictions post-compression.
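
For context, a minimal sketch of the kind of probe such an analysis relies on: prune a weight tensor to a target sparsity by global magnitude, then compare accuracy per subgroup. This is a generic illustration of magnitude pruning and per-group evaluation, not the paper's pipeline.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude fraction of entries (global pruning)."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

def per_group_accuracy(preds, labels, groups):
    """Accuracy broken down by subgroup, to expose pruning-induced bias."""
    return {int(g): float((preds[groups == g] == labels[groups == g]).mean())
            for g in np.unique(groups)}

w = np.random.randn(4, 4)
print(magnitude_prune(w, sparsity=0.75))        # 75% of entries zeroed
preds, labels = np.array([1, 0, 1, 1]), np.array([1, 1, 1, 0])
print(per_group_accuracy(preds, labels, groups=np.array([0, 0, 1, 1])))
```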

A Static Pruning Study on Sparse Neural Retrievers

  • Authors: Carlos Lassance, Simon Lupart, Hervé Dejean, Stéphane Clinchant, Nicola Tonellotto
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.12702
  • Pdf link: https://arxiv.org/pdf/2304.12702
  • Abstract
    Sparse neural retrievers, such as DeepImpact, uniCOIL and SPLADE, have been introduced recently as an efficient and effective way to perform retrieval with inverted indexes. They aim to learn term importance and, in some cases, document expansions, to provide a more effective document ranking compared to traditional bag-of-words retrieval models such as BM25. However, these sparse neural retrievers have been shown to increase the computational costs and latency of query processing compared to their classical counterparts. To mitigate this, we apply a well-known family of techniques for boosting the efficiency of query processing over inverted indexes: static pruning. We experiment with three static pruning strategies, namely document-centric, term-centric and agnostic pruning, and we assess, over diverse datasets, that these techniques still work with sparse neural retrievers. In particular, static pruning achieves $2\times$ speedup with negligible effectiveness loss ($\leq 2\%$ drop) and, depending on the use case, even $4\times$ speedup with minimal impact on the effectiveness ($\leq 8\%$ drop). Moreover, we show that neural rerankers are robust to candidates from statically pruned indexes.
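
Of the three strategies, term-centric pruning is the simplest to picture: each term's posting list keeps only its highest-impact entries. A toy sketch, where the index layout and keep_fraction are illustrative assumptions rather than the paper's implementation:

```python
from typing import Dict, List, Tuple

# An inverted index maps each term to a posting list of (doc_id, impact)
# pairs, as produced by a learned sparse retriever such as SPLADE.
Index = Dict[str, List[Tuple[int, float]]]

def term_centric_prune(index: Index, keep_fraction: float = 0.5) -> Index:
    """Keep only the highest-impact fraction of each term's posting list."""
    pruned: Index = {}
    for term, postings in index.items():
        k = max(1, int(len(postings) * keep_fraction))
        pruned[term] = sorted(postings, key=lambda p: p[1], reverse=True)[:k]
    return pruned
```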

Expand-and-Cluster: Exact Parameter Recovery of Neural Networks

  • Authors: Flavio Martinelli, Berfin Simsek, Johanni Brea, Wulfram Gerstner
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.12794
  • Pdf link: https://arxiv.org/pdf/2304.12794
  • Abstract
    Can we recover the hidden parameters of an Artificial Neural Network (ANN) by probing its input-output mapping? We propose a systematic method, called 'Expand-and-Cluster', that needs only the number of hidden layers and the activation function of the probed ANN to identify all network parameters. In the expansion phase, we train a series of student networks of increasing size using the probed data of the ANN as a teacher. Expansion stops when a minimal loss is consistently reached in student networks of a given size. In the clustering phase, weight vectors of the expanded students are clustered, which allows structured pruning of superfluous neurons in a principled way. We find that an overparameterization of a factor four is sufficient to reliably identify the minimal number of neurons and to retrieve the original network parameters in $80\%$ of tasks across a family of 150 toy problems of variable difficulty. Furthermore, a teacher network trained on MNIST data can be identified with less than $5\%$ overhead in the neuron number. Thus, while direct training of a student network with a size identical to that of the teacher is practically impossible because of the non-convex loss function, training with mild overparameterization followed by clustering and structured pruning correctly identifies the target network.
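
The clustering phase can be pictured as grouping hidden neurons whose incoming weight vectors point in nearly the same direction, since such duplicates are redundant. A rough SciPy sketch; the cosine metric, average linkage, and distance threshold are assumptions, not necessarily the authors' choices:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_neurons(W: np.ndarray, distance_threshold: float = 0.1) -> np.ndarray:
    """Group rows of W (one incoming weight vector per hidden neuron) that
    point in nearly the same direction; near-duplicates betray redundancy.

    Returns a cluster label per neuron; pruning then keeps one
    representative per cluster."""
    # Normalize so clustering compares directions, not magnitudes.
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    Z = linkage(Wn, method="average", metric="cosine")
    return fcluster(Z, t=distance_threshold, criterion="distance")
```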

Keyword: voxel

AdaLIO: Robust Adaptive LiDAR-Inertial Odometry in Degenerate Indoor Environments

  • Authors: Hyungtae Lim, Daebeom Kim, Beomsoo Kim, Hyun Myung
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.12577
  • Pdf link: https://arxiv.org/pdf/2304.12577
  • Abstract
    In recent years, the demand for mapping construction sites or buildings using light detection and ranging (LiDAR) sensors has increased in order to model environments for efficient site management. However, LiDAR-based approaches are sometimes observed to diverge in narrow and confined environments, such as spiral stairs and corridors, because their parameters are fixed regardless of changes in the environment. That is, the parameters of LiDAR(-inertial) odometry are mostly set for open space; thus, if the same parameters suitable for open space are applied in a corridor-like scene, the odometry diverges, which is referred to as degeneracy. To tackle this degeneracy problem, we propose a robust LiDAR-inertial odometry called AdaLIO, which employs an adaptive parameter setting strategy. To this end, we first check for degeneracy by testing whether the surroundings are corridor-like environments. If so, the parameters relevant to voxelization and normal vector estimation are adaptively changed to increase the number of correspondences. As verified in a public dataset, our proposed method showed promising performance in narrow and cramped environments, avoiding the degeneracy problem.
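
In spirit, the adaptive strategy reduces to switching parameters once a degeneracy check fires; a deliberately simplified sketch in which the 10 m mean-range test and the voxel sizes are made-up placeholders, not AdaLIO's actual criteria:

```python
def adaptive_voxel_size(mean_range_m: float) -> float:
    """Degeneracy-aware parameter switch: confined, corridor-like scenes
    (short average LiDAR return distance) get a finer voxel grid so that
    enough correspondences survive downsampling."""
    corridor_like = mean_range_m < 10.0   # crude degeneracy check
    return 0.1 if corridor_like else 0.4  # voxel edge length in meters
```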

DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection

  • Authors: Huan-ang Gao, Beiwen Tian, Pengfei Li, Hao Zhao, Guyue Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13031
  • Pdf link: https://arxiv.org/pdf/2304.13031
  • Abstract
    In this paper, we study the problem of semi-supervised 3D object detection, which is of great importance considering the high annotation cost for cluttered 3D indoor scenes. We resort to the robust and principled framework of self-teaching, which has triggered notable progress for semi-supervised learning recently. While this paradigm is natural for image-level or pixel-level prediction, adapting it to the detection problem is challenged by the issue of proposal matching. Prior methods are based upon two-stage pipelines, matching heuristically selected proposals generated in the first stage and resulting in spatially sparse training signals. In contrast, we propose the first semi-supervised 3D detection algorithm that works in the single-stage manner and allows spatially dense training signals. A fundamental issue of this new design is the quantization error caused by point-to-voxel discretization, which inevitably leads to misalignment between two transformed views in the voxel domain. To this end, we derive and implement closed-form rules that compensate this misalignment on-the-fly. Our results are significant, e.g., promoting ScanNet mAP@0.25 from 35.2% to 48.5% using 20% annotation. Codes and data will be publicly available.
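
The quantization error in question arises because points are snapped to a voxel grid: a translation between two augmented views is generally not an integer number of cells, leaving a sub-voxel residual that the paper's closed-form rules compensate. A small NumPy sketch of that residual (the 0.04 m voxel size is illustrative):

```python
import numpy as np

VOXEL = 0.04  # voxel edge length in meters (illustrative)

def voxelize(points: np.ndarray) -> np.ndarray:
    """Snap (N, 3) points to integer voxel indices; this rounding is the
    source of the point-to-voxel quantization error."""
    return np.floor(points / VOXEL).astype(np.int64)

def residual_shift(translation: np.ndarray) -> np.ndarray:
    """Sub-voxel part of a view translation: the grid only realizes whole
    cells, so this leftover offset misaligns the two transformed views."""
    return translation - np.round(translation / VOXEL) * VOXEL

print(residual_shift(np.array([0.10, 0.02, -0.05])))  # generally nonzero
```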

Keyword: lidar

Pointersect: Neural Rendering with Cloud-Ray Intersection

  • Authors: Jen-Hao Rick Chang, Wei-Yu Chen, Anurag Ranjan, Kwang Moo Yi, Oncel Tuzel
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.12390
  • Pdf link: https://arxiv.org/pdf/2304.12390
  • Abstract
    We propose a novel method that renders point clouds as if they are surfaces. The proposed method is differentiable and requires no scene-specific optimization. This unique capability enables, out-of-the-box, surface normal estimation, rendering room-scale point clouds, inverse rendering, and ray tracing with global illumination. Unlike existing work that focuses on converting point clouds to other representations--e.g., surfaces or implicit functions--our key idea is to directly infer the intersection of a light ray with the underlying surface represented by the given point cloud. Specifically, we train a set transformer that, given a small number of local neighbor points along a light ray, provides the intersection point, the surface normal, and the material blending weights, which are used to render the outcome of this light ray. Localizing the problem into small neighborhoods enables us to train a model with only 48 meshes and apply it to unseen point clouds. Our model achieves higher estimation accuracy than state-of-the-art surface reconstruction and point-cloud rendering methods on three test sets. When applied to room-scale point clouds, without any scene-specific optimization, the model achieves competitive quality with state-of-the-art novel-view rendering methods. Moreover, we demonstrate the ability to render and manipulate Lidar-scanned point clouds, enabling effects such as lighting control and object insertion.
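
The localization step at the heart of this approach — handing the learned model only the cloud points near a query ray — is a point-to-ray distance sort. A minimal NumPy sketch of that gathering step; k and the clipping of points behind the origin are illustrative choices:

```python
import numpy as np

def ray_neighbors(points: np.ndarray, origin: np.ndarray,
                  direction: np.ndarray, k: int = 16) -> np.ndarray:
    """Return the k cloud points closest to the ray, i.e. the local
    context a learned intersection model would consume."""
    d = direction / np.linalg.norm(direction)
    rel = points - origin                 # vectors from the ray origin
    t = np.clip(rel @ d, 0.0, None)       # projection; ignore points behind
    closest = origin + np.outer(t, d)     # nearest point on the ray
    dist = np.linalg.norm(points - closest, axis=1)
    return points[np.argsort(dist)[:k]]
```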

End-to-End Lidar-Camera Self-Calibration for Autonomous Vehicles

  • Authors: Arya Rachman, Jürgen Seiler, André Kaup
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.12412
  • Pdf link: https://arxiv.org/pdf/2304.12412
  • Abstract
    Autonomous vehicles are equipped with a multi-modal sensor setup to enable the car to drive safely. The initial calibration of such perception sensors is a mature topic and is routinely done in an automated factory environment. However, an intriguing question arises on how to maintain the calibration quality throughout the vehicle's operating duration. Another challenge is to calibrate multiple sensors jointly to ensure no propagation of systemic errors. In this paper, we propose CaLiCa, an end-to-end deep self-calibration network which addresses the automatic calibration problem for pinhole camera and Lidar. We jointly predict the camera intrinsic parameters (focal length and distortion) as well as Lidar-Camera extrinsic parameters (rotation and translation), by regressing feature correlation between the camera image and the Lidar point cloud. The network is arranged in a Siamese-twin structure to constrain the network features learning to a mutually shared feature in both point cloud and camera (Lidar-camera constraint). Evaluation using KITTI datasets shows that we achieve 0.154° and 0.059 m accuracy with a reprojection error of 0.028 pixel with a single-pass inference. We also provide an ablative study of how our end-to-end learning architecture offers lower terminal loss (21% decrease in rotation loss) compared to isolated calibration.

Object Semantics Give Us the Depth We Need: Multi-task Approach to Aerial Depth Completion

  • Authors: Sara Hatami Gazani, Fardad Dadboud, Miodrag Bolic, Iraj Mantegh, Homayoun Najjaran
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.12542
  • Pdf link: https://arxiv.org/pdf/2304.12542
  • Abstract
    Depth completion and object detection are two crucial tasks often used for aerial 3D mapping, path planning, and collision avoidance of Uncrewed Aerial Vehicles (UAVs). Common solutions include using measurements from a LiDAR sensor; however, the generated point cloud is often sparse and irregular and limits the system's capabilities in 3D rendering and safety-critical decision-making. To mitigate this challenge, information from other sensors on the UAV (viz., a camera used for object detection) is utilized to help the depth completion process generate denser 3D models. Performing both aerial depth completion and object detection tasks while fusing the data from the two sensors poses a challenge to resource efficiency. We address this challenge by proposing a novel approach to jointly execute the two tasks in a single pass. The proposed method is based on an encoder-focused multi-task learning model that exposes the two tasks to jointly learned features. We demonstrate how semantic expectations of the objects in the scene learned by the object detection pathway can boost the performance of the depth completion pathway while placing the missing depth values. Experimental results show that the proposed multi-task network outperforms its single-task counterpart, particularly when exposed to defective inputs.

AdaLIO: Robust Adaptive LiDAR-Inertial Odometry in Degenerate Indoor Environments

  • Authors: Hyungtae Lim, Daebeom Kim, Beomsoo Kim, Hyun Myung
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.12577
  • Pdf link: https://arxiv.org/pdf/2304.12577
  • Abstract
    In recent years, the demand for mapping construction sites or buildings using light detection and ranging (LiDAR) sensors has increased in order to model environments for efficient site management. However, LiDAR-based approaches are sometimes observed to diverge in narrow and confined environments, such as spiral stairs and corridors, because their parameters are fixed regardless of changes in the environment. That is, the parameters of LiDAR(-inertial) odometry are mostly set for open space; thus, if the same parameters suitable for open space are applied in a corridor-like scene, the odometry diverges, which is referred to as degeneracy. To tackle this degeneracy problem, we propose a robust LiDAR-inertial odometry called AdaLIO, which employs an adaptive parameter setting strategy. To this end, we first check for degeneracy by testing whether the surroundings are corridor-like environments. If so, the parameters relevant to voxelization and normal vector estimation are adaptively changed to increase the number of correspondences. As verified in a public dataset, our proposed method showed promising performance in narrow and cramped environments, avoiding the degeneracy problem.

ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds

  • Authors: Xiangze Jia, Hui Zhou, Xinge Zhu, Yandong Guo, Ji Zhang, Yuexin Ma
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12589
  • Pdf link: https://arxiv.org/pdf/2304.12589
  • Abstract
    In this paper, we propose a novel self-supervised motion estimator for LiDAR-based autonomous driving via BEV representation. Different from usually adopted self-supervised strategies for data-level structure consistency, we predict scene motion via feature-level consistency between pillars in consecutive frames, which can eliminate the effect caused by noise points and view-changing point clouds in dynamic scenes. Specifically, we propose a Soft Discriminative Loss that provides the network with more pseudo-supervised signals to learn discriminative and robust features in a contrastive learning manner. We also propose a Gated Multi-frame Fusion block that learns valid compensation between point cloud frames automatically to enhance feature extraction. Finally, pillar association is proposed to predict pillar correspondence probabilities based on feature distance, and thereby further predict scene motion. Extensive experiments show the effectiveness and superiority of our ContrastMotion on both scene flow and motion prediction tasks. The code will be released soon.

Keyword: diffusion

Matrix-free GPU-accelerated saddle-point solvers for high-order problems in $H(\mathrm{div})$

  • Authors: Will Pazner, Tzanio Kolev, Panayot Vassilevski
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.12387
  • Pdf link: https://arxiv.org/pdf/2304.12387
  • Abstract
    This work describes the development of matrix-free GPU-accelerated solvers for high-order finite element problems in $H(\mathrm{div})$. The solvers are applicable to grad-div and Darcy problems in saddle-point formulation, and have applications in radiation diffusion and porous media flow problems, among others. Using the interpolation-histopolation basis (cf. SIAM J. Sci. Comput., 45 (2023), A675-A702, arXiv:2203.02465), efficient matrix-free preconditioners can be constructed for the $(1,1)$-block and Schur complement of the block system. With these approximations, block-preconditioned MINRES converges in a number of iterations that is independent of the mesh size and polynomial degree. The approximate Schur complement takes the form of an M-matrix graph Laplacian, and therefore can be well-preconditioned by highly scalable algebraic multigrid methods. High-performance GPU-accelerated algorithms for all components of the solution algorithm are developed, discussed, and benchmarked. Numerical results are presented on a number of challenging test cases, including the "crooked pipe" grad-div problem, the SPE10 reservoir modeling benchmark problem, and a nonlinear radiation diffusion test case.
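
The overall solver pattern — MINRES on the symmetric saddle-point system with a block-diagonal preconditioner built from an approximate (1,1)-block and an approximate Schur complement — can be sketched with SciPy. This toy version is neither matrix-free nor GPU-accelerated like the paper's solvers; M_hat and S_hat are assumed SPD sparse approximations supplied by the caller:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, minres, spsolve

def solve_saddle_point(A, B, f, g, M_hat, S_hat):
    """MINRES on [[A, B^T], [B, 0]] [u; p] = [f; g] with the block-diagonal
    preconditioner diag(M_hat^-1, S_hat^-1): M_hat approximates A and
    S_hat the Schur complement, both assumed SPD and sparse."""
    n, m = A.shape[0], B.shape[0]
    K = sp.bmat([[A, B.T], [B, None]], format="csr")
    def apply_prec(v):
        return np.concatenate([spsolve(M_hat, v[:n]), spsolve(S_hat, v[n:])])
    P = LinearOperator((n + m, n + m), matvec=apply_prec)
    x, info = minres(K, np.concatenate([f, g]), M=P)
    return x[:n], x[n:], info  # u-like block, p-like block, solver status
```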

TextMesh: Generation of Realistic 3D Meshes From Text Prompts

  • Authors: Christina Tsalicoglou, Fabian Manhardt, Alessio Tonioni, Michael Niemeyer, Federico Tombari
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12439
  • Pdf link: https://arxiv.org/pdf/2304.12439
  • Abstract
    The ability to generate highly realistic 2D images from mere text prompts has recently made huge progress in terms of speed and quality, thanks to the advent of image diffusion models. Naturally, the question arises whether this can also be achieved in the generation of 3D content from such text prompts. To this end, a new line of methods recently emerged trying to harness diffusion models, trained on 2D images, for supervision of 3D model generation using view-dependent prompts. While achieving impressive results, these methods, however, have two major drawbacks. First, rather than commonly used 3D meshes, they instead generate neural radiance fields (NeRFs), making them impractical for most real applications. Second, these approaches tend to produce over-saturated models, giving the output a cartoonish-looking effect. Therefore, in this work we propose a novel method for the generation of highly realistic-looking 3D meshes. To this end, we extend NeRF to employ an SDF backbone, leading to improved 3D mesh extraction. In addition, we propose a novel way to finetune the mesh texture, removing the effect of high saturation and improving the details of the output 3D mesh.

RenderDiffusion: Text Generation as Image Generation

  • Authors: Junyi Li, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen
  • Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.12519
  • Pdf link: https://arxiv.org/pdf/2304.12519
  • Abstract
    Diffusion models have become a new generative paradigm for text generation. Considering the discrete categorical nature of text, in this paper, we propose RenderDiffusion, a novel diffusion approach for text generation via text-guided image generation. Our key idea is to render the target text as a glyph image containing visual language content. In this way, conditional text generation can be cast as a glyph image generation task, and it is then natural to apply continuous diffusion models to discrete texts. Specifically, we utilize a cascaded architecture (i.e., a base and a super-resolution diffusion model) to generate high-fidelity glyph images, conditioned on the input text. Furthermore, we design a text grounding module to transform and refine the visual language content from generated glyph images into the final texts. In experiments over four conditional text generation tasks and two classes of metrics (i.e., quality and diversity), RenderDiffusion can achieve comparable or even better results than several baselines, including pretrained language models. Our model also makes significant improvements compared to the recent diffusion model.

Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models

  • Authors: Zhendong Wang, Yifan Jiang, Huangjie Zheng, Peihao Wang, Pengcheng He, Zhangyang Wang, Weizhu Chen, Mingyuan Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.12526
  • Pdf link: https://arxiv.org/pdf/2304.12526
  • Abstract
    Diffusion models are powerful, but they require a lot of time and data to train. We propose Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training time costs while improving data efficiency, which thus helps democratize diffusion model training to broader users. At the core of our innovations is a new conditional score function at the patch level, where the patch location in the original image is included as additional coordinate channels, while the patch size is randomized and diversified throughout training to encode the cross-region dependency at multiple scales. Sampling with our method is as easy as in the original diffusion model. Through Patch Diffusion, we could achieve $\mathbf{\ge 2\times}$ faster training, while maintaining comparable or better generation quality. Patch Diffusion meanwhile improves the performance of diffusion models trained on relatively small datasets, e.g., as few as 5,000 images to train from scratch. We achieve state-of-the-art FID scores 1.77 on CelebA-64$\times$64 and 1.93 on AFHQv2-Wild-64$\times$64. We will share our code and pre-trained models soon.
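
The coordinate-channel idea is easy to state: every random crop carries two extra channels recording where it sits in the full image, so the score network can condition on patch location. A minimal PyTorch sketch, with the [-1, 1] normalization and channel ordering as assumptions:

```python
import torch

def random_patch_with_coords(img: torch.Tensor, patch: int) -> torch.Tensor:
    """Crop a random patch from img (C, H, W) and append two channels
    holding the patch pixels' normalized (x, y) locations in the full
    image, so a score network sees where the patch came from."""
    _, H, W = img.shape
    top = torch.randint(0, H - patch + 1, (1,)).item()
    left = torch.randint(0, W - patch + 1, (1,)).item()
    crop = img[:, top:top + patch, left:left + patch]
    ys = torch.linspace(top, top + patch - 1, patch) / (H - 1) * 2 - 1
    xs = torch.linspace(left, left + patch - 1, patch) / (W - 1) * 2 - 1
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    return torch.cat([crop, xx[None], yy[None]], dim=0)  # (C + 2, p, p)
```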

Exploring Compositional Visual Generation with Latent Classifier Guidance

  • Authors: Changhao Shi, Haomiao Ni, Kai Li, Shaobo Han, Mingfu Liang, Martin Renqiang Min
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12536
  • Pdf link: https://arxiv.org/pdf/2304.12536
  • Abstract
    Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space for compositional visual tasks. Specifically, we train latent diffusion models and auxiliary latent classifiers to facilitate non-linear navigation of latent representation generation for any pre-trained generative model with a semantic latent space. We demonstrate that such conditional generation achieved by latent classifier guidance provably maximizes a lower bound of the conditional log probability during training. To maintain the original semantics during manipulation, we introduce a new guidance term, which we show is crucial for achieving compositionality. With additional assumptions, we show that the non-linear manipulation reduces to a simple latent arithmetic approach. We show that this paradigm based on latent classifier guidance is agnostic to pre-trained generative models, and present competitive results for both image generation and sequential manipulation of real and synthetic images. Our findings suggest that latent classifier guidance is a promising approach that merits further exploration, even in the presence of other strong competing methods.
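
At sampling time, classifier guidance in a latent space adds the gradient of the classifier's log-probability to the diffusion score. A generic sketch of that mechanism; the score_model(z, t) and classifier(z, t) signatures and the guidance scale are assumptions, and this shows the standard guidance step rather than the paper's exact formulation:

```python
import torch

def guided_score(z, t, score_model, classifier, target_class, scale=1.0):
    """Steer the diffusion score by the gradient of the latent
    classifier's log-probability of the desired attribute class."""
    z = z.detach().requires_grad_(True)
    logits = classifier(z, t)
    log_prob = torch.log_softmax(logits, dim=-1)[:, target_class].sum()
    grad = torch.autograd.grad(log_prob, z)[0]
    return score_model(z, t) + scale * grad
```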

Efficient Bayesian inference using physics-informed invertible neural networks for inverse problems

  • Authors: Xiaofei Guan, Xintong Wang, Hao Wu
  • Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.12541
  • Pdf link: https://arxiv.org/pdf/2304.12541
  • Abstract
    In the paper, we propose a novel approach for solving Bayesian inverse problems with physics-informed invertible neural networks (PI-INN). The architecture of PI-INN consists of two sub-networks: an invertible neural network (INN) and a neural basis network (NB-Net). The invertible map between the parametric input and the INN output with the aid of NB-Net is constructed to provide a tractable estimation of the posterior distribution, which enables efficient sampling and accurate density evaluation. Furthermore, the loss function of PI-INN includes two components: a residual-based physics-informed loss term and a new independence loss term. The presented independence loss term can Gaussianize the random latent variables and ensure statistical independence between two parts of INN output by effectively utilizing the estimated density function. Several numerical experiments are presented to demonstrate the efficiency and accuracy of the proposed PI-INN, including inverse kinematics, inverse problems of the 1-d and 2-d diffusion equations, and seismic traveltime tomography.

CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis

  • Authors: Chaejeong Lee, Jayoung Kim, Noseong Park
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.12654
  • Pdf link: https://arxiv.org/pdf/2304.12654
  • Abstract
    With growing attention to tabular data these days, attempts to apply synthetic tables have expanded toward various tasks and scenarios. Owing to the recent advances in generative modeling, fake data generated by tabular data synthesis models has become sophisticated and realistic. However, there still exists a difficulty in modeling the discrete variables (columns) of tabular data. In this work, we propose to process continuous and discrete variables separately (while conditioning them on each other) by two diffusion models. The two diffusion models are co-evolved during training by reading conditions from each other. In order to further bind the diffusion models, moreover, we introduce a contrastive learning method with a negative sampling method. In our experiments with 11 real-world tabular datasets and 8 baseline methods, we prove the efficacy of the proposed method, called CoDi.

Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning

  • Authors: Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongxuan Li, Jun Zhu
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.12824
  • Pdf link: https://arxiv.org/pdf/2304.12824
  • Abstract
    Guided sampling is a vital approach for applying diffusion models in real-world tasks that embeds human-defined guidance during the sampling procedure. This paper considers a general setting where the guidance is defined by an (unnormalized) energy function. The main challenge for this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and is hard to estimate. To address this challenge, we propose an exact formulation of the intermediate guidance as well as a novel training objective named contrastive energy prediction (CEP) to learn the exact guidance. Our method is guaranteed to converge to the exact guidance under unlimited model capacity and data samples, while previous methods can not. We demonstrate the effectiveness of our method by applying it to offline reinforcement learning (RL). Extensive experiments on D4RL benchmarks demonstrate that our method outperforms existing state-of-the-art algorithms. We also provide some examples of applying CEP for image synthesis to demonstrate the scalability of CEP on high-dimensional data.

The Score-Difference Flow for Implicit Generative Modeling

  • Authors: Romann M. Weber
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.12906
  • Pdf link: https://arxiv.org/pdf/2304.12906
  • Abstract
    Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. We introduce the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them while also solving the Schrödinger bridge problem. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. However, unlike diffusion models, SD flow places no restrictions on the prior distribution. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that, taken together, address all three challenges of the "generative modeling trilemma": high sample quality, mode coverage, and fast sampling.
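
Operationally, the SD flow moves samples along the difference of two score fields. A one-step Euler sketch, where score_target and score_source are assumed to be callables returning scores with the same shape as x:

```python
import torch

def sd_flow_step(x: torch.Tensor, score_target, score_source,
                 step_size: float = 1e-2) -> torch.Tensor:
    """One Euler step of the score-difference flow: nudge samples along
    the difference between the target and source score fields, which
    (per the paper) optimally shrinks the KL divergence between them."""
    with torch.no_grad():
        return x + step_size * (score_target(x) - score_source(x))
```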

Keyword: dynamic

Parallel bootstrap-based on-policy deep reinforcement learning for continuous flow control applications

  • Authors: J. Viquerat, E. Hachem
  • Subjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
  • Arxiv link: https://arxiv.org/abs/2304.12330
  • Pdf link: https://arxiv.org/pdf/2304.12330
  • Abstract
    The coupling of deep reinforcement learning to numerical flow control problems has recently received considerable attention, leading to groundbreaking results and opening new perspectives for the domain. Due to the usually high computational cost of fluid dynamics solvers, the use of parallel environments during the learning process represents an essential ingredient to attain efficient control in a reasonable time. Yet, most of the deep reinforcement learning literature for flow control relies on on-policy algorithms, for which the massively parallel transition collection may break theoretical assumptions and lead to suboptimal control models. To overcome this issue, we propose a parallelism pattern relying on partial-trajectory buffers terminated by a return bootstrapping step, allowing a flexible use of parallel environments while preserving the on-policiness of the updates. This approach is illustrated on a CPU-intensive continuous flow control problem from the literature.
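
The bootstrapping step that terminates each partial trajectory is standard: the critic's value of the last observed state stands in for the missing tail of the return. A minimal NumPy sketch (plain discounted returns; advantage estimation such as GAE is omitted):

```python
import numpy as np

def bootstrap_returns(rewards: np.ndarray, last_value: float,
                      gamma: float = 0.99) -> np.ndarray:
    """Discounted returns for a partial trajectory, closed by
    bootstrapping with the critic's value of the state that follows
    the final stored transition."""
    returns = np.zeros(len(rewards))
    running = last_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

print(bootstrap_returns(np.array([1.0, 0.0, 1.0]), last_value=0.5))
```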

Beyond the Pixel: a Photometrically Calibrated HDR Dataset for Luminance and Color Temperature Prediction

  • Authors: Christophe Bolduc, Justine Giroux, Marc Hébert, Claude Demers, Jean-François Lalonde
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12372
  • Pdf link: https://arxiv.org/pdf/2304.12372
  • Abstract
    Light plays an important role in human well-being. However, most computer vision tasks treat pixels without considering their relationship to physical luminance. To address this shortcoming, we present the first large-scale photometrically calibrated dataset of high dynamic range 360° panoramas. Our key contribution is the calibration of an existing, uncalibrated HDR dataset. We do so by accurately capturing RAW bracketed exposures simultaneously with a professional photometric measurement device (chroma meter) for multiple scenes across a variety of lighting conditions. Using the resulting measurements, we establish the calibration coefficients to be applied to the HDR images. The resulting dataset is a rich representation of indoor scenes which displays a wide range of illuminance and color temperature, and varied types of light sources. We exploit the dataset to introduce three novel tasks: predicting per-pixel luminance, per-pixel color temperature, and planar illuminance from a single input image. Finally, we also capture another smaller calibrated dataset with a commercial 360° camera, to experiment on generalization across cameras. We are optimistic that the release of our datasets and associated code will spark interest in physically accurate light estimation within the community.

Efficient and Scalable Path-Planning Algorithms for Curvature Constrained Motion in the Hamilton-Jacobi Formulation

  • Authors: Christian Parkinson, Isabelle Boyle
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.12377
  • Pdf link: https://arxiv.org/pdf/2304.12377
  • Abstract
    We present a partial-differential-equation-based optimal path-planning framework for curvature constrained motion, with application to vehicles in two and three spatial dimensions. This formulation relies on optimal control theory, dynamic programming, and a Hamilton-Jacobi-Bellman equation. Many authors have developed similar models and employed grid-based numerical methods to solve the partial differential equation required to generate optimal trajectories. However, these methods can be inefficient and do not scale well to high dimensions. We describe how efficient and scalable algorithms for solving high-dimensional Hamilton-Jacobi equations can be adapted to these problems while maintaining the Hamilton-Jacobi formulation. We demonstrate our method with several examples.

PID-inspired modifications in response threshold models in swarm intelligent systems

  • Authors: Maryam Kebari, Annie S. Wu, H. David Mathias
  • Subjects: Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2304.12385
  • Pdf link: https://arxiv.org/pdf/2304.12385
  • Abstract
    In this study, we investigate the effectiveness of using the PID (Proportional - Integral - Derivative) control loop factors for modifying response thresholds in a decentralized, non-communicating, threshold-based swarm. Each agent in our swarm has a set of four thresholds, each corresponding to a task the agent is capable of performing. The agent will act on a particular task if the stimulus is higher than its corresponding threshold. The ability to modify their thresholds allows the agents to specialize dynamically in response to task demands. Current approaches to dynamic thresholds typically use a learning and forgetting process to adjust thresholds. These methods are able to effectively specialize once, but can have difficulty re-specializing if the task demands change. Our approach, inspired by the PID control loop, alters the threshold values based on the current task demand value, the change in task demand, and the cumulative sum of previous task demands. We show that our PID-inspired method is scalable and outperforms fixed and current learning and forgetting response thresholds with non-changing, constant, and abrupt changes in task demand. This superior performance is due to the ability of our method to re-specialize repeatedly in response to changing task demands.
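
A schematic of the threshold update described above, with made-up gains; the sign convention — high, rising, or accumulated demand lowers the threshold, making the agent more likely to act — is an illustrative reading rather than the authors' exact rule:

```python
class PIDThreshold:
    """Per-task response threshold adjusted by PID-like terms on the task
    demand signal: proportional (current demand), integral (accumulated
    demand), and derivative (change in demand). Gains are illustrative."""

    def __init__(self, theta=0.5, kp=0.1, ki=0.01, kd=0.05):
        self.theta, self.kp, self.ki, self.kd = theta, kp, ki, kd
        self.integral, self.prev_demand = 0.0, 0.0

    def update(self, demand: float) -> float:
        self.integral += demand
        derivative = demand - self.prev_demand
        self.prev_demand = demand
        # High or rising demand lowers the threshold -> agent specializes.
        self.theta -= (self.kp * demand + self.ki * self.integral
                       + self.kd * derivative)
        self.theta = min(max(self.theta, 0.0), 1.0)
        return self.theta

    def acts(self, stimulus: float) -> bool:
        return stimulus > self.theta
```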

CEDR-API: Productive, Performant Programming of Domain-Specific Embedded Systems

  • Authors: Joshua Mack, Serhan Gener, Sahil Hassan, H. Umut Suluhan, Ali Akoglu
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.12396
  • Pdf link: https://arxiv.org/pdf/2304.12396
  • Abstract
    As the computing landscape evolves, system designers continue to explore design methodologies that leverage increased levels of heterogeneity to push performance within limited size, weight, power, and cost budgets. One such methodology is to build Domain-Specific System on Chips (DSSoCs) that promise increased productivity through narrowed scope of their target application domain. In previous works, we have proposed CEDR, an open source, unified compilation and runtime framework for DSSoC architectures that allows applications, scheduling heuristics, and accelerators to be co-designed in a cohesive manner that maximizes system performance. In this work, we present changes to the application development workflow that enable a more productive and expressive API-based programming methodology. These changes allow for more rapid integration of new applications without sacrificing application performance. Towards the design of heterogeneous SoCs with rich set of accelerators, in this study we experimentally study the impact of increase in workload complexity and growth in the pool of compute resources on execution time of dynamically arriving workloads composed of real-life applications executed over architectures emulated on Xilinx ZCU102 MPSoC and Nvidia Jetson AGX Xavier. We expand CEDR into the application domain of autonomous vehicles, and we find that API-based CEDR achieves a runtime overhead reduction of 19.5% with respect to the original CEDR.

Synthesizing Stable Reduced-Order Visuomotor Policies for Nonlinear Systems via Sums-of-Squares Optimization

  • Authors: Glen Chou, Russ Tedrake
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.12405
  • Pdf link: https://arxiv.org/pdf/2304.12405
  • Abstract
    We present a method for synthesizing dynamic, reduced-order output-feedback polynomial control policies for control-affine nonlinear systems which guarantees runtime stability to a goal state, when using visual observations and a learned perception module in the feedback control loop. We leverage Lyapunov analysis to formulate the problem of synthesizing such policies. This problem is nonconvex in the policy parameters and the Lyapunov function that is used to prove the stability of the policy. To solve this problem approximately, we propose two approaches: the first solves a sequence of sum-of-squares optimization problems to iteratively improve a policy which is provably-stable by construction, while the second directly performs gradient-based optimization on the parameters of the polynomial policy, and its closed-loop stability is verified a posteriori. We extend our approach to provide stability guarantees in the presence of observation noise, which realistically arises due to errors in the learned perception module. We evaluate our approach on several underactuated nonlinear systems, including pendula and quadrotors, showing that our guarantees translate to empirical stability when controlling these systems from images, while baseline approaches can fail to reliably stabilize the system.

Sample-Efficient and Surrogate-Based Design Optimization of Underwater Vehicle Hulls

  • Authors: Harsh Vardhan, David Hyde, Umesh Timalsina, Peter Volgyesi, Janos Sztipanovits
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Applied Physics (physics.app-ph); Fluid Dynamics (physics.flu-dyn); Applications (stat.AP); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.12420
  • Pdf link: https://arxiv.org/pdf/2304.12420
  • Abstract
    Physics simulations are a computational bottleneck in computer-aided design (CAD) optimization processes. Hence, in order to make accurate (computationally expensive) simulations feasible for use in design optimization, one requires either an optimization framework that is highly sample-efficient or fast data-driven proxies (surrogate models) for long-running simulations. In this work, we leverage recent advances in optimization and artificial intelligence (AI) to address both of these potential solutions, in the context of designing an optimal unmanned underwater vehicle (UUV). We first investigate and compare the sample efficiency and convergence behavior of different optimization techniques with a standard computational fluid dynamics (CFD) solver in the optimization loop. We then develop a deep neural network (DNN) based surrogate model to approximate drag forces that would otherwise be computed via direct numerical simulation with the CFD solver. The surrogate model is in turn used in the optimization loop of the hull design. Our study finds that the Bayesian Optimization Lower Confidence Bound (BO LCB) algorithm is the most sample-efficient optimization framework and has the best convergence behavior of those considered. Subsequently, we show that our DNN-based surrogate model predicts drag force on test data in tight agreement with CFD simulations, with a mean absolute percentage error (MAPE) of 1.85%. Combining these results, we demonstrate a two-orders-of-magnitude speedup (with comparable accuracy) for the design optimization process when the surrogate model is used. To our knowledge, this is the first study applying Bayesian optimization and DNN-based surrogate modeling to the problem of UUV design optimization, and we share our developments as open-source software.
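
The LCB acquisition driving the reported sample efficiency is a one-liner: for minimization, score each candidate by its predicted mean minus kappa times its predicted standard deviation, then evaluate the most optimistic one. A sketch with kappa=2.0 as a common default, not necessarily the paper's setting:

```python
import numpy as np

def lcb_pick(mu: np.ndarray, sigma: np.ndarray, kappa: float = 2.0) -> int:
    """Index of the next design to simulate: lowest optimistic estimate
    of the objective (e.g. drag force) under the surrogate's uncertainty."""
    return int(np.argmin(mu - kappa * sigma))

# Candidate 2 has a worse mean but large uncertainty, so it is picked.
print(lcb_pick(np.array([1.0, 1.2, 1.5]), np.array([0.05, 0.10, 0.40])))
```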

Neuroevolution of Recurrent Architectures on Control Tasks

  • Authors: Maximilien Le Clei, Pierre Bellec
  • Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.12431
  • Pdf link: https://arxiv.org/pdf/2304.12431
  • Abstract
    Modern artificial intelligence works typically train the parameters of fixed-sized deep neural networks using gradient-based optimization techniques. Simple evolutionary algorithms have recently been shown to also be capable of optimizing deep neural network parameters, at times matching the performance of gradient-based techniques, e.g. in reinforcement learning settings. In addition to optimizing network parameters, many evolutionary computation techniques are also capable of progressively constructing network architectures. However, constructing network architectures from elementary evolution rules has not yet been shown to scale to modern reinforcement learning benchmarks. In this paper we therefore propose a new approach in which the architectures of recurrent neural networks dynamically evolve according to a small set of mutation rules. We implement a massively parallel evolutionary algorithm and run experiments on all 19 OpenAI Gym state-based reinforcement learning control tasks. We find that in most cases, dynamic agents match or exceed the performance of gradient-based agents while utilizing orders of magnitude fewer parameters. We believe our work to open avenues for real-life applications where network compactness and autonomous design are of critical importance. We provide our source code, final model checkpoints and full results at github.com/MaximilienLC/nra.

VpROM: A novel Variational AutoEncoder-boosted Reduced Order Model for the treatment of parametric dependencies in nonlinear systems

  • Authors: Thomas Simpson, Konstantinos Vlachas, Anthony Garland, Nikolaos Dervilis, Eleni Chatzi
  • Subjects: Numerical Analysis (math.NA); Computational Engineering, Finance, and Science (cs.CE)
  • Arxiv link: https://arxiv.org/abs/2304.12437
  • Pdf link: https://arxiv.org/pdf/2304.12437
  • Abstract
    Reduced Order Models (ROMs) are of considerable importance in many areas of engineering in which computational time presents difficulties. Established approaches employ projection-based reduction such as Proper Orthogonal Decomposition; however, such methods can become inefficient or fail in the case of parametric or strongly nonlinear models. Such limitations are usually tackled via a library of local reduction bases, each of which is valid for a given parameter vector. The success of such methods, however, is strongly reliant upon the method used to relate the parameter vectors to the local bases; this is typically achieved using clustering or interpolation methods. We propose the replacement of these methods with a Variational Autoencoder (VAE) to be used as a generative model which can infer the local basis corresponding to a given parameter vector in a probabilistic manner. The resulting VAE-boosted parametric ROM, VpROM, still retains the physical insights of a projection-based method but also allows for better treatment of problems where model dependencies or excitation traits cause the dynamic behavior to span multiple response regimes. Moreover, the probabilistic treatment of the VAE representation allows for uncertainty quantification on the reduction bases, which may then be propagated to the ROM response. The performance of the proposed approach is validated on an open-source simulation benchmark featuring hysteresis and multi-parametric dependencies, and on a large-scale wind turbine tower characterised by nonlinear material behavior and model uncertainty.

Real-Time Ground Fault Detection for Inverter-Based Microgrid Systems

  • Authors: Jingwei Dong, Yucheng Liao, Peyman Mohajerin Esfahani
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.12445
  • Pdf link: https://arxiv.org/pdf/2304.12445
  • Abstract
    Ground fault detection in inverter-based microgrid systems is challenging, particularly in a real-time setting, as the fault current deviates slightly from the nominal value. This difficulty is reinforced when natural disturbances exhibit similar output patterns as a faulty setting does. The conventional solution of installing more relays to obtain additional measurements is costly and also increases the complexity of the system. In this paper, we propose diagnosis schemes based on optimization-based fault detection filters with the output current as the only measurement. Modeling the microgrid dynamics and the diagnosis filter, we formulate the filter design as a linear programming (LP) problem that accounts for decoupling a class of disturbances and ensuring fault sensitivity simultaneously. Next, we robustify the filter to disturbances that cannot be fully decoupled. To this end, we leverage tools from the existing literature and extend the optimization program to a quadratic programming (QP) problem in which the filter is trained for this class of disturbances. To ease the computational effort, we also provide an approximate but analytical solution to this QP. Additionally, we use classical statistical results to provide a thresholding mechanism that enjoys probabilistic false-alarm guarantees. Finally, we verify the effectiveness of the proposed methods through several numerical simulations.

Large Intelligent Surface Measurements for Joint Communication and Sensing

  • Authors: Christian Nelson, Xuhong Li, Thomas Wilding, Benjamin Deutschmann, Klaus Witrisal, Fredrik Tufvesson
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.12457
  • Pdf link: https://arxiv.org/pdf/2304.12457
  • Abstract
    Multiple concepts for future generations of wireless communication standards utilize coherent processing of signals from many distributed antennas. Names for these concepts include distributed MIMO, cell-free massive MIMO, XL-MIMO, and large intelligent surfaces. They aim to improve communication reliability, capacity, as well as energy efficiency and provide possibilities for new applications through joint communication and sensing. One such recently proposed solution is the concept of RadioWeaves. It proposes a new radio infrastructure for distributed MIMO with distributed internal processing, storage, and compute resources integrated into the infrastructure. The large bandwidths available in the higher bands have inspired much work regarding sensing in the mmWave- and sub-THz-bands, however, sub-6 GHz cellular bands will still be the main provider of broad cellular coverage due to the more favorable propagation conditions. In this paper, we present results from a sub-6 GHz measurement campaign targeting the non-stationary spatial channel statistics for a large RadioWeave and the temporal non-stationarity in a dynamic scenario with RadioWeaves. From the results, we also predict the possibility of multi-static sensing and positioning of users in the environment.

Model-Free Learning and Optimal Policy Design in Multi-Agent MDPs Under Probabilistic Agent Dropout

  • Authors: Carmel Fiscko, Soummya Kar, Bruno Sinopoli
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.12458
  • Pdf link: https://arxiv.org/pdf/2304.12458
  • Abstract
    This work studies a multi-agent Markov decision process (MDP) that can undergo agent dropout and the computation of policies for the post-dropout system based on control and sampling of the pre-dropout system. The controller's objective is to find an optimal policy that maximizes the value of the expected system given a priori knowledge of the agents' dropout probabilities. Finding an optimal policy for any specific dropout realization is a special case of this problem. For MDPs with a certain transition independence and reward separability structure, we assume that removing agents from the system forms a new MDP comprised of the remaining agents with new state and action spaces, transition dynamics that marginalize the removed agents, and rewards that are independent of the removed agents. We first show that under these assumptions, the value of the expected post-dropout system can be represented by a single MDP; this "robust MDP" eliminates the need to evaluate all $2^N$ realizations of the system, where $N$ denotes the number of agents. More significantly, in a model-free context, it is shown that the robust MDP value can be estimated with samples generated by the pre-dropout system, meaning that robust policies can be found before dropout occurs. This fact is used to propose a policy importance sampling (IS) routine that performs policy evaluation for dropout scenarios while controlling the existing system with good pre-dropout policies. The policy IS routine produces value estimates for both the robust MDP and specific post-dropout system realizations and is justified with exponential confidence bounds. Finally, the utility of this approach is verified in simulation, showing how structural properties of agent dropout can help a controller find good post-dropout policies before dropout occurs.

Recurrent Transformer Encoders for Vision-based Estimation of Fatigue and Engagement in Cognitive Training Sessions

  • Authors: Yanchen Wang, Yunlong Xu, Feng Vankee Lin, Ehsan Adeli
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12470
  • Pdf link: https://arxiv.org/pdf/2304.12470
  • Abstract
    The effectiveness of computerized cognitive training in slowing cognitive decline and brain aging in dementia is often limited by the engagement of participants in the training. Monitoring older users' real-time engagement in domains of attention, motivation, and affect is crucial to understanding the overall effectiveness of such training. In this paper, we propose to predict engagement, quantified via an established mental fatigue measure assessing users' perceived attention, motivation, and affect throughout computerized cognitive training sessions, in older adults with mild cognitive impairment (MCI), by monitoring their real-time video-recorded facial gestures in training sessions. To achieve the goal, we used computer vision, analyzing video frames every 5 seconds to optimize the balance between information retention and data size, and developed a novel Recurrent Video Transformer (RVT). Our RVT model, which combines a clip-wise transformer encoder module and a session-wise Recurrent Neural Network (RNN) classifier, achieved the highest balanced accuracy, F1 score, and precision compared to other state-of-the-art models for both detecting mental fatigue/disengagement cases (binary classification) and rating the level of mental fatigue (multi-class classification). By leveraging dynamic temporal information, the RVT model demonstrates the potential to accurately predict engagement among computerized cognitive training users, which lays the foundation for future work to modulate the level of engagement in computerized cognitive training interventions. The code will be released.

Artificial General Intelligence (AGI) for Education

  • Authors: Ehsan Latif, Gengchen Mai, Matthew Nyaaba, Xuansheng Wu, Ninghao Liu, Guoyu Lu, Sheng Li, Tianming Liu, Xiaoming Zhai
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.12479
  • Pdf link: https://arxiv.org/pdf/2304.12479
  • Abstract
    Artificial general intelligence (AGI) has gained global recognition as a future technology due to the emergence of breakthrough large language models and chatbots such as GPT-4 and ChatGPT, respectively. AGI aims to replicate human intelligence through computer systems, and it is one of the critical technologies with the potential to revolutionize the field of education. Conventional AI models, in contrast, are typically designed for a limited range of tasks, demand significant amounts of domain-specific data for training, and may not always consider the intricate interpersonal dynamics of education. AGI, driven by the recent large pre-trained models, represents a significant leap in the capability of machines to perform tasks that require human-level intelligence, such as reasoning, problem-solving, decision-making, and even understanding human emotions and social interactions. This work reviews AGI's key concepts, capabilities, scope, and potential within future education, including setting educational goals, designing pedagogy and curriculum, and performing assessments. We also provide rich discussions over various ethical issues in education faced by AGI and how AGI will affect human educators. The development of AGI necessitates interdisciplinary collaborations between educators and AI engineers to advance research and application efforts.

Information Theory for Complex Systems Scientists

  • Authors: Thomas F. Varley
  • Subjects: Information Theory (cs.IT); Data Analysis, Statistics and Probability (physics.data-an); Quantitative Methods (q-bio.QM); Other Statistics (stat.OT)
  • Arxiv link: https://arxiv.org/abs/2304.12482
  • Pdf link: https://arxiv.org/pdf/2304.12482
  • Abstract
    In the 21st century, many of the crucial scientific and technical issues facing humanity can be understood as problems associated with understanding, modelling, and ultimately controlling complex systems: systems comprised of a large number of non-trivially interacting components whose collective behaviour can be difficult to predict. Information theory, a branch of mathematics historically associated with questions about encoding and decoding messages, has emerged as something of a lingua franca for those studying complex systems, far exceeding its original narrow domain of communication systems engineering. In the context of complexity science, information theory provides a set of tools which allow researchers to uncover the statistical and effective dependencies between interacting components, relationships between systems and their environment, and mereological whole-part relationships; it is also sensitive to non-linearities missed by common parametric statistical models. In this review, we aim to provide an accessible introduction to the core of modern information theory, aimed specifically at aspiring (and established) complex systems scientists. This includes standard measures, such as Shannon entropy, relative entropy, and mutual information, before building to more advanced topics, including: information dynamics, measures of statistical complexity, information decomposition, and effective network inference. In addition to detailing the formal definitions, in this review we make an effort to discuss how information theory can be interpreted and to develop the intuition behind abstract concepts like "entropy," in the hope that this will enable interested readers to understand what information is, and how it is used, at a more fundamental level.
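
Two of the standard measures named above, computed directly from discrete distribution tables; a self-contained NumPy sketch:

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy H(X) = -sum_x p(x) log2 p(x), in bits."""
    p = p[p > 0]  # 0 log 0 is taken as 0
    return float(-np.sum(p * np.log2(p)))

def mutual_information(pxy: np.ndarray) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a joint distribution table."""
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(pxy.flatten())

# Perfectly correlated bits share one full bit of information:
joint = np.array([[0.5, 0.0], [0.0, 0.5]])
print(mutual_information(joint))  # -> 1.0
```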

What is the Expected Transient Behavior of Opinion Evolution for Two Communities?

  • Authors: Yu Xing, Karl H. Johansson
  • Subjects: Systems and Control (eess.SY); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2304.12495
  • Pdf link: https://arxiv.org/pdf/2304.12495
  • Abstract
    We study the transient behavior of a gossip model, in which agents randomly interact pairwise over a weighted graph with two communities. Edges within each community have identical weights, different from the weights between communities. It is shown that, at the early stage of the opinion evolution, the expected agent states in the same community have identical sign, despite influence of stubborn agents. Moreover, it is shown that the expected states of the agents in the same community concentrate around the initial average opinion of that community, if the weights within communities are larger than between. In contrast, if the edge weights between communities are larger, then the expected states of all agents concentrate around everyone's initial average opinion. Different from the traditional asymptotic analysis in the opinion dynamics literature, these results focus on the initial phase of opinion evolution and establish a correspondence between community structure and transient behavior of the gossip model. The results are illustrated by numerical examples.
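
A bare-bones simulation of the pairwise gossip update studied here; W is assumed symmetric, non-negative, with zero diagonal, and stubborn agents (which the paper includes) are omitted for brevity:

```python
import numpy as np

def gossip_step(x: np.ndarray, W: np.ndarray,
                rng: np.random.Generator) -> np.ndarray:
    """Pick a pair (i, j) with probability proportional to the edge
    weight W[i, j]; both agents then adopt the average of their opinions."""
    n = len(x)
    flat = (W / W.sum()).flatten()
    i, j = np.unravel_index(rng.choice(n * n, p=flat), (n, n))
    x[i] = x[j] = (x[i] + x[j]) / 2
    return x

rng = np.random.default_rng(0)
# Two communities of two agents; intra-community ties stronger than inter.
W = np.array([[0, 5, 1, 1], [5, 0, 1, 1], [1, 1, 0, 5], [1, 1, 5, 0.0]])
x = np.array([1.0, 0.8, -0.9, -1.0])
for _ in range(100):
    x = gossip_step(x, W, rng)
print(x)
```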

Causal Semantic Communication for Digital Twins: A Generalizable Imitation Learning Approach

  • Authors: Christo Kurisummoottil Thomas, Walid Saad, Yong Xiao
  • Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Signal Processing (eess.SP); Methodology (stat.ME)
  • Arxiv link: https://arxiv.org/abs/2304.12502
  • Pdf link: https://arxiv.org/pdf/2304.12502
  • Abstract
    A digital twin (DT) leverages a virtual representation of the physical world, along with communication (e.g., 6G), computing (e.g., edge computing), and artificial intelligence (AI) technologies to enable many connected intelligence services. In order to handle the large amounts of network data based on digital twins (DTs), wireless systems can exploit the paradigm of semantic communication (SC) for facilitating informed decision-making under strict communication constraints by utilizing AI techniques such as causal reasoning. In this paper, a novel framework called causal semantic communication (CSC) is proposed for DT-based wireless systems. The CSC system is posed as an imitation learning (IL) problem, where the transmitter, with access to optimal network control policies using a DT, teaches the receiver using SC over a bandwidth-limited wireless channel how to improve its knowledge to perform optimal control actions. The causal structure in the source data is extracted using novel approaches from the framework of deep end-to-end causal inference, thereby enabling the creation of a semantic representation that is causally invariant, which in turn helps generalize the learned knowledge of the system to unseen scenarios. The CSC decoder at the receiver is designed to extract and estimate semantic information while ensuring high semantic reliability. The receiver control policies, semantic decoder, and causal inference are formulated as a bi-level optimization problem within a variational inference framework. This problem is solved using a novel concept called network state models, inspired by world models in generative AI, which faithfully represents the environment dynamics leading to data generation. Simulation results demonstrate that the proposed CSC system outperforms state-of-the-art SC systems by achieving better semantic reliability and reduced semantic representation.

Mobilizing Personalized Federated Learning via Random Walk Stochastic ADMM

  • Authors: Ziba Parsons, Fei Dou, Houyi Du, Jin Lu
  • Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.12534
  • Pdf link: https://arxiv.org/pdf/2304.12534
  • Abstract
    In this research, we investigate the barriers associated with implementing Federated Learning (FL) in real-world scenarios, where a consistent connection between the central server and all clients cannot be maintained, and data distribution is heterogeneous. To address these challenges, we focus on mobilizing the federated setting, where the server moves between groups of adjacent clients to learn local models. Specifically, we propose a new algorithm, Random Walk Stochastic Alternating Direction Method of Multipliers (RWSADMM), capable of adapting to dynamic and ad-hoc network conditions as long as a sufficient number of connected clients are available for model training. In RWSADMM, the server walks randomly toward a group of clients. It formulates local proximity among adjacent clients based on hard inequality constraints instead of consensus updates to address data heterogeneity. Our proposed method is convergent, reduces communication costs, and enhances scalability by reducing the number of clients the central server needs to communicate with.
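
A loose sketch of the random-walk server pattern only: RWSADMM's hard inequality constraints and ADMM dual updates are replaced here by a simple proximal step toward a reference point computed at each stop, and the client graph, local losses, and step size are all assumptions.

```python
# Random-walk server: at each stop, the server forms a reference point from the
# reachable clients, and each such client takes a proximal step toward it while
# fitting its own (quadratic) local objective.
import numpy as np

rng = np.random.default_rng(1)
num_clients, dim, rho = 8, 5, 0.5
A = np.ones((num_clients, num_clients)) - np.eye(num_clients)  # client graph
targets = rng.normal(size=(num_clients, dim))  # client i minimizes ||x - t_i||^2
x = np.zeros((num_clients, dim))               # personalized local models
server = rng.integers(num_clients)             # server's current position

for _ in range(100):
    neighbors = np.flatnonzero(A[server])
    group = np.append(neighbors, server)       # clients reachable at this stop
    z = x[group].mean(axis=0)                  # server-side reference point
    # closed-form argmin of ||x - t||^2 + rho * ||x - z||^2
    x[group] = (targets[group] + rho * z) / (1.0 + rho)
    server = rng.choice(neighbors)             # random walk to an adjacent client

print("model spread:", np.linalg.norm(x - x.mean(axis=0)))
```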

Opinion Control under Adversarial Network Perturbation: A Stackelberg Game Approach

  • Authors: Yuejiang Li, Zhanjiang Chen, H. Vicky Zhao
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.12540
  • Pdf link: https://arxiv.org/pdf/2304.12540
  • Abstract
    Emerging social network platforms enable users to share their own opinions, as well as to exchange opinions with others. However, adversarial network perturbation, where malicious users intentionally spread their extreme opinions, rumors, and misinformation to others, is ubiquitous in social networks. Such adversarial network perturbation greatly influences the opinion formation of the public and threatens our societies. Thus, it is critical to study and control the influence of adversarial network perturbation. Although tremendous efforts have been made in both academia and industry to guide and control public opinion dynamics, most of these works assume that the network is static and ignore such adversarial network perturbation. In this work, based on the well-accepted Friedkin-Johnsen opinion dynamics model, we model the adversarial network perturbation and analyze its impact on the network's opinion. Then, from the adversary's perspective, we analyze its optimal network perturbation, which maximally changes the network's opinion. Next, from the network defender's perspective, we formulate a Stackelberg game and aim to control the network's opinion even under such adversarial network perturbation. We devise a projected subgradient algorithm to solve the formulated Stackelberg game. Extensive simulations on real social networks validate our analysis of the adversarial network perturbation's influence and the effectiveness of the proposed opinion control algorithm.
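
A compact sketch of the Friedkin-Johnsen dynamics the paper builds on, iterated to equilibrium; the influence network, susceptibilities, and innate opinions below are toy assumptions.

```python
# Friedkin-Johnsen opinion dynamics: x <- Lam W x + (I - Lam) u, where W is a
# row-stochastic influence matrix, Lam the susceptibility, u the innate opinions.
import numpy as np

rng = np.random.default_rng(2)
n = 6
W = rng.random((n, n))
W /= W.sum(axis=1, keepdims=True)   # row-stochastic influence weights
lam = np.full(n, 0.8)               # susceptibility; 1 - lam measures stubbornness
u = rng.uniform(-1, 1, n)           # innate (initial) opinions
x = u.copy()

for _ in range(500):                # converges since spectral radius of Lam*W < 1
    x = lam * (W @ x) + (1 - lam) * u

print("equilibrium opinions:", np.round(x, 3))
```

An adversary perturbing entries of W shifts this equilibrium, which is the effect the paper's Stackelberg formulation defends against.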

Real-time Safety Assessment of Dynamic Systems in Non-stationary Environments: A Review of Methods and Techniques

  • Authors: Zeyi Liu, Songqiao Hu, Xiao He
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.12583
  • Pdf link: https://arxiv.org/pdf/2304.12583
  • Abstract
    Real-time safety assessment (RTSA) of dynamic systems is a critical task that has significant implications for various fields such as industrial and transportation applications, especially in non-stationary environments. However, the absence of a comprehensive review of real-time safety assessment methods in non-stationary environments impedes the progress and refinement of related methods. In this paper, a review of methods and techniques for RTSA tasks in non-stationary environments is provided. Specifically, the background and significance of RTSA approaches in non-stationary environments are first highlighted. We then present a problem description that covers the definition, classification, and main challenges. We review recent developments in related technologies such as online active learning, online semi-supervised learning, online transfer learning, and online anomaly detection. Finally, we discuss future outlooks and potential directions for further research. Our review aims to provide a comprehensive and up-to-date overview of real-time safety assessment methods in non-stationary environments, which can serve as a valuable resource for researchers and practitioners in this field.

ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds

  • Authors: Xiangze Jia, Hui Zhou, Xinge Zhu, Yandong Guo, Ji Zhang, Yuexin Ma
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12589
  • Pdf link: https://arxiv.org/pdf/2304.12589
  • Abstract
    In this paper, we propose a novel self-supervised motion estimator for LiDAR-based autonomous driving via BEV representation. Different from usually adopted self-supervised strategies for data-level structure consistency, we predict scene motion via feature-level consistency between pillars in consecutive frames, which can eliminate the effect caused by noise points and view-changing point clouds in dynamic scenes. Specifically, we propose a Soft Discriminative Loss that provides the network with more pseudo-supervised signals to learn discriminative and robust features in a contrastive learning manner. We also propose a Gated Multi-frame Fusion block that automatically learns valid compensation between point cloud frames to enhance feature extraction. Finally, pillar association is proposed to predict pillar correspondence probabilities based on feature distance, and thereby further predict scene motion. Extensive experiments show the effectiveness and superiority of our ContrastMotion on both scene flow and motion prediction tasks. The code will be available soon.

Partially Observable Mean Field Multi-Agent Reinforcement Learning Based on Graph-Attention

  • Authors: Min Yang, Guanjun Liu, Ziyuan Zhou
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.12653
  • Pdf link: https://arxiv.org/pdf/2304.12653
  • Abstract
    Traditional multi-agent reinforcement learning algorithms are difficult to apply in large-scale multi-agent environments. The introduction of mean field theory has enhanced the scalability of multi-agent reinforcement learning in recent years. This paper considers partially observable multi-agent reinforcement learning (MARL), where each agent can only observe other agents within a fixed range. This partial observability affects the agent's ability to assess the quality of the actions of surrounding agents. This paper focuses on developing a method to capture more effective information from local observations in order to select more effective actions. Previous work in this field employs probability distributions or weighted mean fields to update the average actions of neighborhood agents, but it does not fully consider the feature information of surrounding neighbors and leads to a local optimum. In this paper, we propose a novel multi-agent reinforcement learning algorithm, Partially Observable Mean Field Multi-Agent Reinforcement Learning based on Graph-Attention (GAMFQ), to remedy this flaw. GAMFQ uses a graph attention module and a mean field module to describe how an agent is influenced by the actions of other agents at each time step. The graph attention module consists of a graph attention encoder and a differentiable attention mechanism, which outputs a dynamic graph to represent the effectiveness of neighborhood agents against central agents. The mean-field module approximates the effect of a neighborhood agent on a central agent as the average effect of effective neighborhood agents. We evaluate GAMFQ on three challenging tasks in the MAgents framework. Experiments show that GAMFQ outperforms baselines, including state-of-the-art partially observable mean-field reinforcement learning algorithms.

Dynamic Video Frame Interpolation with integrated Difficulty Pre-Assessment

  • Authors: Ban Chen, Xin Jin, Youxin Chen, Longhai Wu, Jie Chen, Jayoon Koo, Cheul-hee Hahm
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12664
  • Pdf link: https://arxiv.org/pdf/2304.12664
  • Abstract
    Video frame interpolation (VFI) has witnessed great progress in recent years. However, existing VFI models still struggle to achieve a good trade-off between accuracy and efficiency: fast models often have inferior accuracy, while accurate models typically run slowly. Easy samples with small motion or clear texture can nevertheless achieve competitive results with simple models that do not require heavy computation. In this paper, we present an integrated pipeline which combines difficulty assessment with video frame interpolation. Specifically, it first leverages a pre-assessment model to measure the interpolation difficulty level of input frames, and then dynamically selects an appropriate VFI model to generate interpolation results. Furthermore, a large-scale VFI difficulty assessment dataset is collected and annotated to train our pre-assessment model. Extensive experiments show that easy samples pass through fast models while difficult samples are handled by heavy models, and that our proposed pipeline improves the accuracy-efficiency trade-off for VFI.

Low-Power Data Streaming in Systolic Arrays with Bus-Invert Coding and Zero-Value Clock Gating

  • Authors: Christodoulos Peltekis, Dionysios Filippas, Giorgos Dimitrakopoulos, Chrysostomos Nicopoulos
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.12691
  • Pdf link: https://arxiv.org/pdf/2304.12691
  • Abstract
    Systolic Array (SA) architectures are well suited for accelerating matrix multiplications through the use of a pipelined array of Processing Elements (PEs) communicating with local connections and pre-orchestrated data movements. Even though most of the dynamic power consumption in SAs is due to multiplications and additions, pipelined data movement within the SA constitutes an additional important contributor. The goal of this work is to reduce the dynamic power consumption associated with the feeding of data to the SA, by synergistically applying bus-invert coding and zero-value clock gating. By exploiting salient attributes of state-of-the-art CNNs, such as the value distribution of the weights, the proposed SA applies appropriate encoding only to the data that exhibits high switching activity. Similarly, when one of the inputs is zero, unnecessary operations are entirely skipped. This selectively targeted, application-aware encoding approach is demonstrated to reduce the dynamic power consumption of data streaming in CNN applications using Bfloat16 arithmetic by 1%-19%. This translates to an overall dynamic power reduction of 6.2%-9.4%.
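
Bus-invert coding itself is a textbook scheme; the sketch below mimics the hardware behavior in software (the bus width and example word sequence are assumptions).

```python
# Bus-invert coding: if more than half of the bus lines would toggle relative to
# the previous bus value, transmit the bitwise-inverted word and raise an invert
# flag; the receiver uses the flag to undo the inversion.
def bus_invert_encode(prev_bus: int, word: int, width: int = 16):
    mask = (1 << width) - 1
    toggles = bin((prev_bus ^ word) & mask).count("1")
    if toggles > width // 2:
        return word ^ mask, 1       # (bus value, invert flag)
    return word, 0

prev = 0x0000
for w in (0xFFFF, 0xFFFE, 0x0001):
    bus, flag = bus_invert_encode(prev, w)
    print(f"word={w:#06x} -> bus={bus:#06x} invert={flag}")
    prev = bus
```

The paper applies this encoding selectively, only to the input streams whose value statistics make high switching activity likely.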

Learning Robust Deep Equilibrium Models

  • Authors: Haoyu Chu, Shikui Wei, Ting Liu
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12707
  • Pdf link: https://arxiv.org/pdf/2304.12707
  • Abstract
    Deep equilibrium (DEQ) models have emerged as a promising class of implicit layer models in deep learning, which abandon traditional depth by solving for the fixed points of a single nonlinear layer. Despite their success, the stability of the fixed points for these models remains poorly understood. Recently, Lyapunov theory has been applied to Neural ODEs, another type of implicit layer model, to confer adversarial robustness. By considering DEQ models as nonlinear dynamic systems, we propose a robust DEQ model named LyaDEQ with provable stability guarantees via Lyapunov theory. The crux of our method is ensuring that the fixed points of the DEQ models are Lyapunov stable, which enables the LyaDEQ models to resist minor initial perturbations. To avoid poor adversarial defense due to Lyapunov-stable fixed points being located near each other, we add an orthogonal fully connected layer after the Lyapunov stability module to separate different fixed points. We evaluate LyaDEQ models on several widely used datasets under well-known adversarial attacks, and experimental results demonstrate significant improvement in robustness. Furthermore, we show that the LyaDEQ model can be combined with other defense methods, such as adversarial training, to achieve even better adversarial robustness.
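
For context, a minimal sketch of the plain DEQ forward pass that LyaDEQ builds on: the output is a fixed point of a single layer, found here by naive iteration. Real DEQs use root solvers and implicit differentiation, and the layer and sizes below are illustrative.

```python
# A toy DEQ: the "infinitely deep" output is z* satisfying z* = f(z*, x),
# approximated here by running the fixed-point iteration a fixed number of steps.
import torch

class TinyDEQ(torch.nn.Module):
    def __init__(self, dim: int = 8):
        super().__init__()
        self.lin_z = torch.nn.Linear(dim, dim)
        self.lin_x = torch.nn.Linear(dim, dim)

    def f(self, z, x):
        return torch.tanh(self.lin_z(z) + self.lin_x(x))

    def forward(self, x, iters: int = 50):
        z = torch.zeros_like(x)
        for _ in range(iters):          # fixed-point iteration toward z*
            z = self.f(z, x)
        return z

model = TinyDEQ()
x = torch.randn(4, 8)
z_star = model(x)
print("fixed-point residual:", (model.f(z_star, x) - z_star).norm().item())
```

LyaDEQ's contribution is to make such fixed points Lyapunov stable, so that small input perturbations do not push the iteration toward a different equilibrium.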

Inverting the Imaging Process by Learning an Implicit Camera Model

  • Authors: Xin Huang, Qi Zhang, Ying Feng, Hongdong Li, Qing Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12748
  • Pdf link: https://arxiv.org/pdf/2304.12748
  • Abstract
    Representing visual signals with implicit coordinate-based neural networks, as an effective replacement of the traditional discrete signal representation, has gained considerable popularity in computer vision and graphics. In contrast to existing implicit neural representations which focus on modelling the scene only, this paper proposes a novel implicit camera model which represents the physical imaging process of a camera as a deep neural network. We demonstrate the power of this new implicit camera model on two inverse imaging tasks: i) generating all-in-focus photos, and ii) HDR imaging. Specifically, we devise an implicit blur generator and an implicit tone mapper to model the aperture and exposure of the camera's imaging process, respectively. Our implicit camera model is jointly learned together with implicit scene models under multi-focus stack and multi-exposure bracket supervision. We have demonstrated the effectiveness of our new model on a large number of test images and videos, producing accurate and visually appealing all-in-focus and high dynamic range images. In principle, our new implicit neural camera model has the potential to benefit a wide array of other inverse imaging tasks.

Blockchain Large Language Models

  • Authors: Yu Gai, Liyi Zhou, Kaihua Qin, Dawn Song, Arthur Gervais
  • Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.12749
  • Pdf link: https://arxiv.org/pdf/2304.12749
  • Abstract
    This paper presents a dynamic, real-time approach to detecting anomalous blockchain transactions. The proposed tool, TXRANK, generates tracing representations of blockchain activity and trains from scratch a large language model to act as a real-time Intrusion Detection System. Unlike traditional methods, TXRANK is designed to offer an unrestricted search space and does not rely on predefined rules or patterns, enabling it to detect a broader range of anomalies. We demonstrate the effectiveness of TXRANK through its use as an anomaly detection tool for Ethereum transactions. In our experiments, it effectively identifies abnormal transactions among a dataset of 68M transactions and has a batched throughput of 2284 transactions per second on average. Our results show that TXRANK identifies abnormal transactions by ranking 49 out of 124 attacks among the top-3 most abnormal transactions interacting with their victim contracts. This work makes contributions to the field of blockchain transaction analysis by introducing a custom data encoding compatible with the transformer architecture, a domain-specific tokenization technique, and a tree encoding method specifically crafted for the Ethereum Virtual Machine (EVM) trace representation.

Parallel Spiking Neurons with High Efficiency and Long-term Dependencies Learning Ability

  • Authors: Wei Fang, Zhaofei Yu, Zhaokun Zhou, Yanqi Chen, Zhengyu Ma, Timothée Masquelier, Yonghong Tian
  • Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12760
  • Pdf link: https://arxiv.org/pdf/2304.12760
  • Abstract
    Vanilla spiking neurons in Spiking Neural Networks (SNNs) use charge-fire-reset neuronal dynamics, which can only be simulated serially and can hardly learn long-term dependencies. We find that when the reset is removed, the neuronal dynamics can be reformulated in a non-iterative form and parallelized. By rewriting neuronal dynamics without reset in a general formulation, we propose the Parallel Spiking Neuron (PSN), which uses dense connections between time-steps to maximize the utilization of temporal information. To avoid the use of future inputs for low-latency inference, we add masks on the weights and obtain the masked PSN. By sharing weights across time-steps, the sliding PSN is proposed with the ability to handle sequences of varying lengths. We evaluate the PSN family on simulation speed and temporal/static data classification, and the results show the overwhelming advantage of the PSN family in efficiency and accuracy. To the best of our knowledge, this is the first work on parallelizing spiking neurons, and it can serve as a cornerstone for the spiking deep learning community. Our codes are available at \url{https://github.com/fangwei123456/Parallel-Spiking-Neuron}.
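
The key observation can be checked in a few lines: without reset, leaky integration over time is a lower-triangular matrix product, so all time-steps can be computed at once. The decay value and sizes here are illustrative; the PSN replaces the fixed matrix with learnable weights.

```python
# Reset-free leaky integration: h[t] = lam * h[t-1] + x[t] unrolls to
# h[t] = sum_{i<=t} lam^(t-i) x[i], i.e., a lower-triangular matmul over time.
import torch

T, lam = 6, 0.9
x = torch.randn(T)

h, serial = 0.0, []                      # serial (step-by-step) simulation
for t in range(T):
    h = lam * h + x[t].item()
    serial.append(h)

idx = torch.arange(T)
W = torch.tril(lam ** (idx[:, None] - idx[None, :]).float())  # W[t, i] = lam^(t-i)
parallel = W @ x                         # all time-steps in one product

print(torch.allclose(torch.tensor(serial), parallel, atol=1e-5))
```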

Dynamic Ineffectuality-based Clustered Architectures

  • Authors: Rajshekar Kalayappan, Sandeep Chandran
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.12762
  • Pdf link: https://arxiv.org/pdf/2304.12762
  • Abstract
    The direction of conditional branches is predicted correctly in modern processors with great accuracy. We find several instructions in the dynamic instruction stream that contribute only towards computing the condition of these instructions. Hence, when the predicted direction of conditional branches is indeed correct, these instructions become Ineffectual - the functional state of the program would not be different had these instructions been dropped. However, the execution of ineffectual instructions cannot be avoided altogether because it is possible that the prediction of the branch direction is wrong. In this work, we determine all sources of ineffectuality in an instruction stream such as conditional branches, predicated instructions, indirect jumps and dynamically dead instructions. Then, we propose a technique to steer the ineffectual instructions away from the primary execution cluster so that effectual instructions can execute uncontended. We find that such ineffectuality-based clustering of instructions naturally simplifies the design and avoids several caveats of a clustered architecture. Finally, we propose a technique to detect instances when instructions were incorrectly marked as ineffectual, say due to a branch misprediction, and recover the pipeline. The empirical evaluation of the proposed changes on the SPEC CPU2017 and GAPBS benchmarks shows performance uplifts of up to 4.9% and 10.3% on average, respectively.

Adaptive Collective Responses to Local Stimuli in Anonymous Dynamic Networks

  • Authors: Shunhao Oh, Dana Randall, Andréa W. Richa
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET)
  • Arxiv link: https://arxiv.org/abs/2304.12771
  • Pdf link: https://arxiv.org/pdf/2304.12771
  • Abstract
    We develop a framework for self-induced phase changes in programmable matter in which a collection of agents with limited computational and communication capabilities can collectively perform appropriate global tasks in response to local stimuli that dynamically appear and disappear. Agents reside on graph vertices, where each stimulus is only recognized locally, and agents communicate via token passing along edges to alert other agents to transition to an "aware" state when stimuli are present and an "unaware" state when the stimuli disappear. We present an Adaptive Stimuli Algorithm that is robust to competing waves of messages as multiple stimuli change, possibly adversarially. Moreover, in addition to handling arbitrary stimulus dynamics, the algorithm can handle agents reconfiguring the connections (edges) of the graph over time in a controlled way. As an application, we show how this Adaptive Stimuli Algorithm on reconfigurable graphs can be used to solve the foraging problem, where food sources may be discovered, removed, or shifted at arbitrary times. We would like the agents to consistently self-organize using only local interactions, such that if the food remains in position long enough, the agents transition to a gather phase, collectively forming a single large component with small perimeter around the food. Alternatively, if no food source has existed recently, the agents should self-induce a switch to a search phase in which they distribute themselves randomly throughout the lattice region to search for food. Unlike previous approaches to foraging, this process is indefinitely repeatable. Like a physical phase change, microscopic changes such as the deletion or addition of a single food source triggers these macroscopic, system-wide transitions as agents share information about the environment and respond locally to get the desired collective response.

Modeling Adaptive Self-healing Systems

  • Authors: Habtom Kahsay Gidey, Diego Marmsoler, Dominik Ascher
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.12773
  • Pdf link: https://arxiv.org/pdf/2304.12773
  • Abstract
    Motivation: Smart grid design requires energy distribution operations to be adaptable to abnormality. This requirement entails distribution system operators (DSOs) to optimize restoration to normal operational states dynamically. However, these design challenges demand collaborative research efforts on sophisticated modeling and simulation approaches. Approach: In the ESOSEG research project, analyzing the smart grid domain as a software-intensive system, we employed a dynamic architecture approach, particularly the FOCUS theory, to model and assure the domain's self-healing requirements. Although some works specify various self-healing systems, to the best of our knowledge, ours is the first to enable formal specification and verification of self-healing properties in smart grids. Results: As a result, to support the modeling and verification process, we developed tool support with Eclipse Modeling Framework (EMF), Xtext, and other languages in the EMF ecosystem. The tool includes a grammar or a meta-model of the DSL, an interface to enable textual and graphical modeling of architectural patterns, and a code transformer engine for verification. Furthermore, we evaluated the modeling and verification features of the tool support with an e-Car charging scenario for modeling adaptive self-healing properties. Future work: As an outlook, future work could include the investigation of comprehensive case studies. These could, for instance, address further adaptability scenarios and challenges faced by DSOs. Another interesting aspect could be the evaluation of the modeling approach by investigating its use with engineers involved in smart grid design. Next, the evaluation could be followed with abstractions of the verification process to make it usable by system architects with no knowledge of the proof language, Isabelle/HOL.

Towards a generalizable simulation framework to study collisions between spacecraft and debris

  • Authors: Simone Asci, Angadh Nanjangud
  • Subjects: Robotics (cs.RO); Symbolic Computation (cs.SC)
  • Arxiv link: https://arxiv.org/abs/2304.12799
  • Pdf link: https://arxiv.org/pdf/2304.12799
  • Abstract
    In recent years, computer simulators of rigid-body systems have been successfully used to improve and expand the field of developing new space robots, becoming a leading tool for the preliminary investigation and evaluation of space robotic missions. However, the impressive progress in performance has not yet been matched by an improvement in modelling capabilities, which remain limited to very basic representations of real systems. We present a new approach to modelling and simulation of collision-inclusive multibody dynamics by leveraging symbolic models generated by a computer algebra system (CAS). While similar investigations into contact dynamics in other domains exploit pre-existing models of common multibody systems (e.g., industrial robot arms, humanoids, and wheeled robots), our focus is on allowing researchers to develop models of novel designs of systems that are not as common or yet to be fabricated: e.g., small spacecraft manipulators. In this paper, we demonstrate the usefulness of our approach to investigate spacecraft-debris collision dynamics.

Adaptive Services Function Chain Orchestration For Digital Health Twin Use Cases: Heuristic-boosted Q-Learning Approach

  • Authors: Jamila Alsayed Kassem, Li Zhong, Arie Taal, Paola Grosso
  • Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.12853
  • Pdf link: https://arxiv.org/pdf/2304.12853
  • Abstract
    Digital Twin (DT) is a prominent technology to utilise and deploy within the healthcare sector. Yet, the main challenges facing such applications are: strict health data-sharing policies, high-performance network requirements, and possible infrastructure resource limitations. In this paper, we address all the challenges by provisioning adaptive Virtual Network Functions (VNFs) to enforce security policies associated with different data-sharing scenarios. We define a Cloud-Native Network orchestrator on top of a multi-node cluster mesh infrastructure for flexible and dynamic container scheduling. The proposed framework considers the intended data-sharing use case, the associated policies, and infrastructure configurations, then provisions Service Function Chaining (SFC) and provides routing configurations accordingly with little to no human intervention. Moreover, what is optimal when deploying SFC is dependent on the use case itself, and we tune the hyperparameters to prioritise resource utilisation or latency in an effort to comply with the performance requirements. As a result, we provide an adaptive network orchestration for digital health twin use cases that is policy-aware, requirements-aware, and resource-aware.

Constraining Chaos: Enforcing dynamical invariants in the training of recurrent neural networks

  • Authors: Jason A. Platt, Stephen G. Penny, Timothy A. Smith, Tse-Chun Chen, Henry D. I. Abarbanel
  • Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Geophysics (physics.geo-ph)
  • Arxiv link: https://arxiv.org/abs/2304.12865
  • Pdf link: https://arxiv.org/pdf/2304.12865
  • Abstract
    Drawing on ergodic theory, we introduce a novel training method for machine learning based forecasting methods for chaotic dynamical systems. The training enforces dynamical invariants--such as the Lyapunov exponent spectrum and fractal dimension--in the systems of interest, enabling longer and more stable forecasts when operating with limited data. The technique is demonstrated in detail using the recurrent neural network architecture of reservoir computing. Results are given for the Lorenz 1996 chaotic dynamical system and a spectral quasi-geostrophic model, both typical test cases for numerical weather prediction.

Data-Driven Robust Optimization for Energy-Aware and Safe Navigation of Electric Vehicles

  • Authors: Simran Kumari, Ashish R. Hota, Siddhartha Mukhopadhyay
  • Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.12887
  • Pdf link: https://arxiv.org/pdf/2304.12887
  • Abstract
    In this paper, we simultaneously tackle the problem of energy optimal and safe navigation of electric vehicles in a data-driven robust optimization framework. We consider a dynamic model of the electric vehicle which includes both longitudinal and lateral motion as well as dynamics of stored energy level. We leverage past data of obstacle motion to construct a future occupancy set with probabilistic guarantees, and formulate robust collision avoidance constraints with respect to such an occupancy set using convex programming duality. Consequently, we present the finite horizon optimal control problem subject to robust collision avoidance constraints while penalizing resulting energy consumption. Finally, we show the effectiveness of the proposed techniques in reducing energy consumption and ensuring safe navigation via extensive simulations.

The Score-Difference Flow for Implicit Generative Modeling

  • Authors: Romann M. Weber
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.12906
  • Pdf link: https://arxiv.org/pdf/2304.12906
  • Abstract
    Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. We introduce the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them while also solving the Schrödinger bridge problem. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. However, unlike diffusion models, SD flow places no restrictions on the prior distribution. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that, taken together, address all three challenges of the "generative modeling trilemma": high sample quality, mode coverage, and fast sampling.
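
For intuition, a one-dimensional sketch of the SD flow with Gaussian endpoints, where both scores have closed forms; a Gaussian fit to the particles stands in for the score of the evolving source distribution, and all parameters are toy assumptions.

```python
# Score-difference flow: particles move along the difference between the target
# score and the (estimated) score of their own current distribution.
import numpy as np

rng = np.random.default_rng(3)
mu_t, sig_t = 3.0, 0.5                 # target: N(3, 0.25)

def score(x, mu, sig):                 # grad_x log N(x; mu, sig^2)
    return -(x - mu) / sig**2

x = rng.normal(0.0, 1.0, size=5000)    # particles drawn from the source N(0, 1)
dt = 0.01
for _ in range(2000):
    mu_c, sig_c = x.mean(), x.std()    # Gaussian fit to the current particles
    x += dt * (score(x, mu_t, sig_t) - score(x, mu_c, sig_c))

print("mean/std after flow:", x.mean(), x.std())   # approach 3.0 and 0.5
```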

Direct Collocation Methods for Trajectory Optimization in Constrained Robotic Systems

  • Authors: Ricard Bordalba, Tobias Schoels, Lluís Ros, Josep M. Porta, Moritz Diehl
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.12908
  • Pdf link: https://arxiv.org/pdf/2304.12908
  • Abstract
    Direct collocation methods are powerful tools to solve trajectory optimization problems in robotics. While their resulting trajectories tend to be dynamically accurate, they may also present large kinematic errors in the case of constrained mechanical systems, i.e., those whose state coordinates are subject to holonomic or nonholonomic constraints, like loop-closure or rolling-contact constraints. These constraints confine the robot trajectories to an implicitly-defined manifold, which complicates the computation of accurate solutions. Discretization errors inherent to the transcription of the problem easily make the trajectories drift away from this manifold, which results in physically inconsistent motions that are difficult to track with a controller. This paper reviews existing methods to deal with this problem and proposes new ones to overcome their limitations. Current approaches either disregard the kinematic constraints (which leads to drift accumulation) or modify the system dynamics to keep the trajectory close to the manifold (which adds artificial forces or energy dissipation to the system). The methods we propose, in contrast, achieve full drift elimination on the discrete trajectory, or even along the continuous one, without artificial modifications of the system dynamics. We illustrate and compare the methods using various examples of different complexity.

System Identification with Copula Entropy

  • Authors: Jian Ma
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Methodology (stat.ME)
  • Arxiv link: https://arxiv.org/abs/2304.12922
  • Pdf link: https://arxiv.org/pdf/2304.12922
  • Abstract
    Identifying the differential equations governing a dynamical system is an important problem with wide applications. Copula Entropy (CE) is a mathematical concept for measuring statistical independence in information theory. In this paper we propose a method for identifying the differential equations of dynamical systems with CE. The problem is cast as a variable selection problem and solved with the previously proposed CE-based method for variable selection. The proposed method is composed of two components: the difference operator and the CE estimator. Since both components can be computed non-parametrically, the proposed method is model-free and hyperparameter-free. A simulation experiment with the 3D Lorenz system verified the effectiveness of the proposed method.
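
A hedged sketch of the pipeline on the Lorenz system: a difference operator estimates derivatives from the trajectory, and candidate terms are ranked by their dependence with the derivative. Absolute Spearman correlation is used below purely as a rank-based stand-in for the paper's copula entropy estimator.

```python
# Identify which candidate terms drive dx/dt = sigma * (y - x) in the Lorenz
# system: estimate the derivative by differencing, then rank candidate terms
# by a (rank-based) dependence score against it.
import numpy as np
from scipy.stats import spearmanr
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

sol = solve_ivp(lorenz, (0, 20), [1.0, 1.0, 1.0],
                t_eval=np.linspace(0, 20, 8000))
x, y, z = sol.y
dt = sol.t[1] - sol.t[0]
dxdt = np.gradient(x, dt)              # component 1: the difference operator

candidates = {"x": x, "y": y, "z": z, "xy": x * y, "xz": x * z, "x^2": x * x}
scores = {k: abs(spearmanr(v, dxdt).correlation) for k, v in candidates.items()}
print(sorted(scores.items(), key=lambda kv: -kv[1]))  # terms in dx/dt rank high
```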

SALSA: Simulated Annealing based Loop-Ordering Scheduler for DNN Accelerators

  • Authors: Victor J.B. Jung, Arne Symons, Linyan Mei, Marian Verhelst, Luca Benini
  • Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.12931
  • Pdf link: https://arxiv.org/pdf/2304.12931
  • Abstract
    To meet the growing need for computational power for DNNs, multiple specialized hardware architectures have been proposed. Each DNN layer should be mapped onto the hardware with the most efficient schedule; however, SotA schedulers struggle to consistently provide optimum schedules in a reasonable time across all DNN-HW combinations. This paper proposes SALSA, a fast dual-engine scheduler that generates optimal execution schedules for both even and uneven mapping. We introduce a new strategy, combining exhaustive search with simulated annealing, to address the dynamic nature of the loop-ordering design space size across layers. SALSA is extensively benchmarked against two SotA schedulers, LOMA and Timeloop, on 5 different DNNs; on average, SALSA finds schedules with 11.9% and 7.6% lower energy while speeding up the search by 1.7x and 24x compared to LOMA and Timeloop, respectively.
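
The simulated-annealing half of the dual engine is a standard permutation-space annealing loop. The sketch below swaps loop dimensions under a geometric cooling schedule; the cost function is a toy stand-in for the accelerator cost model a real scheduler would query, and the loop names are hypothetical.

```python
# Simulated annealing over loop orderings: propose a swap of two loops, accept
# improvements always and regressions with probability exp(-delta / temperature).
import math
import random

random.seed(0)
loops = ["K", "C", "OX", "OY", "FX", "FY"]          # hypothetical loop dimensions

def cost(order):
    # Toy cost: distance from one fixed "good" ordering (placeholder for an
    # energy/latency estimate from a hardware cost model).
    target = {name: i for i, name in enumerate(["K", "C", "OY", "OX", "FY", "FX"])}
    return sum(abs(i - target[name]) for i, name in enumerate(order))

state, best, temp = loops[:], loops[:], 10.0
for _ in range(5000):
    i, j = random.sample(range(len(state)), 2)
    neighbor = state[:]
    neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
    delta = cost(neighbor) - cost(state)
    if delta < 0 or random.random() < math.exp(-delta / temp):
        state = neighbor
    if cost(state) < cost(best):
        best = state[:]
    temp *= 0.999                                   # geometric cooling

print(best, cost(best))
```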

The ACCompanion: Combining Reactivity, Robustness, and Musical Expressivity in an Automatic Piano Accompanist

  • Authors: Carlos Cancino-Chacón, Silvan Peter, Patricia Hu, Emmanouil Karystinaios, Florian Henkel, Francesco Foscarin, Nimrod Varga, Gerhard Widmer
  • Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2304.12939
  • Pdf link: https://arxiv.org/pdf/2304.12939
  • Abstract
    This paper introduces the ACCompanion, an expressive accompaniment system. Similarly to a musician who accompanies a soloist playing a given musical piece, our system can produce a human-like rendition of the accompaniment part that follows the soloist's choices in terms of tempo, dynamics, and articulation. The ACCompanion works in the symbolic domain, i.e., it needs a musical instrument capable of producing and playing MIDI data, with explicitly encoded onset, offset, and pitch for each played note. We describe the components that go into such a system, from real-time score following and prediction to expressive performance generation and online adaptation to the expressive choices of the human player. Based on our experience with repeated live demonstrations in front of various audiences, we offer an analysis of the challenges of combining these components into a system that is highly reactive and precise, while still a reliable musical partner, robust to possible performance errors and responsive to expressive variations.

Latent Traversals in Generative Models as Potential Flows

  • Authors: Yue Song, Andy Keller, Nicu Sebe, Max Welling
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.12944
  • Pdf link: https://arxiv.org/pdf/2304.12944
  • Abstract
    Despite the significant recent progress in deep generative models, the underlying structure of their latent spaces is still poorly understood, thereby making the task of performing semantically meaningful latent traversals an open research challenge. Most prior work has aimed to solve this challenge by modeling latent structures linearly, and finding corresponding linear directions which result in "disentangled" generations. In this work, we instead propose to model latent structures with a learned dynamic potential landscape, thereby performing latent traversals as the flow of samples down the landscape's gradient. Inspired by physics, optimal transport, and neuroscience, these potential landscapes are learned as physically realistic partial differential equations, thereby allowing them to flexibly vary over both space and time. To achieve disentanglement, multiple potentials are learned simultaneously, and are constrained by a classifier to be distinct and semantically self-consistent. Experimentally, we demonstrate that our method achieves both more qualitatively and quantitatively disentangled trajectories than state-of-the-art baselines. Further, we demonstrate that our method can be integrated as a regularization term during training, thereby acting as an inductive bias towards the learning of structured representations, ultimately improving model likelihood on similarly structured data.

Nondeterministic Stacks in Neural Networks

  • Authors: Brian DuSell
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.12955
  • Pdf link: https://arxiv.org/pdf/2304.12955
  • Abstract
    Human language is full of compositional syntactic structures, and although neural networks have contributed to groundbreaking improvements in computer systems that process language, widely-used neural network architectures still exhibit limitations in their ability to process syntax. To address this issue, prior work has proposed adding stack data structures to neural networks, drawing inspiration from theoretical connections between syntax and stacks. However, these methods employ deterministic stacks that are designed to track one parse at a time, whereas syntactic ambiguity, which requires a nondeterministic stack to parse, is extremely common in language. In this dissertation, we remedy this discrepancy by proposing a method of incorporating nondeterministic stacks into neural networks. We develop a differentiable data structure that efficiently simulates a nondeterministic pushdown automaton, representing an exponential number of computations with a dynamic programming algorithm. We incorporate this module into two predominant architectures: recurrent neural networks (RNNs) and transformers. We show that this raises their formal recognition power to arbitrary context-free languages, and also aids training, even on deterministic context-free languages. Empirically, neural networks with nondeterministic stacks learn context-free languages much more effectively than prior stack-augmented models, including a language with theoretically maximal parsing difficulty. We also show that an RNN augmented with a nondeterministic stack is capable of surprisingly powerful behavior, such as learning cross-serial dependencies, a well-known non-context-free pattern. We demonstrate improvements on natural language modeling and provide analysis on a syntactic generalization benchmark. This work represents an important step toward building systems that learn to use syntax in a more human-like fashion.

Centralized control for multi-agent RL in a complex Real-Time-Strategy game

  • Authors: Roger Creus Castanyer
  • Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2304.13004
  • Pdf link: https://arxiv.org/pdf/2304.13004
  • Abstract
    Multi-agent Reinforcement Learning (MARL) studies the behaviour of multiple learning agents that coexist in a shared environment. MARL is more challenging than single-agent RL because it involves more complex learning dynamics: the observations and rewards of each agent are functions of all other agents. In the context of MARL, Real-Time Strategy (RTS) games represent very challenging environments where multiple players interact simultaneously and control many units of different natures all at once. In fact, RTS games are so challenging for current RL methods that just being able to tackle them with RL is interesting. This project provides the end-to-end experience of applying RL in the Lux AI v2 Kaggle competition, where competitors design agents to control variable-sized fleets of units and tackle a multi-variable optimization, resource gathering, and allocation problem in a 1v1 scenario against other competitors. We use a centralized approach for training the RL agents, and report multiple design decisions along the process. We provide the source code of the project: https://github.com/roger-creus/centralized-control-lux.

PoseVocab: Learning Joint-structured Pose Embeddings for Human Avatar Modeling

  • Authors: Zhe Li, Zerong Zheng, Yuxiao Liu, Boyao Zhou, Yebin Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13006
  • Pdf link: https://arxiv.org/pdf/2304.13006
  • Abstract
    Creating pose-driven human avatars is about modeling the mapping from the low-frequency driving pose to high-frequency dynamic human appearances, so an effective pose encoding method that can encode high-fidelity human details is essential to human avatar modeling. To this end, we present PoseVocab, a novel pose encoding method that encourages the network to discover the optimal pose embeddings for learning the dynamic human appearance. Given multi-view RGB videos of a character, PoseVocab constructs key poses and latent embeddings based on the training poses. To achieve pose generalization and temporal consistency, we sample key rotations in $so(3)$ of each joint rather than the global pose vectors, and assign a pose embedding to each sampled key rotation. These joint-structured pose embeddings not only encode the dynamic appearances under different key poses, but also factorize the global pose embedding into joint-structured ones to better learn the appearance variation related to the motion of each joint. To improve the representation ability of the pose embedding while maintaining memory efficiency, we introduce feature lines, a compact yet effective 3D representation, to model more fine-grained details of human appearances. Furthermore, given a query pose and a spatial position, a hierarchical query strategy is introduced to interpolate pose embeddings and acquire the conditional pose feature for dynamic human synthesis. Overall, PoseVocab effectively encodes the dynamic details of human appearance and enables realistic and generalized animation under novel poses. Experiments show that our method outperforms other state-of-the-art baselines both qualitatively and quantitatively in terms of synthesis quality. Code is available at https://github.com/lizhe00/PoseVocab.

Bake off redux: a review and experimental evaluation of recent time series classification algorithms

  • Authors: Matthew Middlehurst, Patrick Schäfer, Anthony Bagnall
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13029
  • Pdf link: https://arxiv.org/pdf/2304.13029
  • Abstract
    In 2017, a research paper compared 18 Time Series Classification (TSC) algorithms on 85 datasets from the University of California, Riverside (UCR) archive. This study, commonly referred to as a 'bake off', identified that only nine algorithms performed significantly better than the Dynamic Time Warping (DTW) and Rotation Forest benchmarks that were used. The study categorised each algorithm by the type of feature they extract from time series data, forming a taxonomy of five main algorithm types. This categorisation of algorithms alongside the provision of code and accessible results for reproducibility has helped fuel an increase in popularity of the TSC field. Over six years have passed since this bake off, the UCR archive has expanded to 112 datasets and there have been a large number of new algorithms proposed. We revisit the bake off, seeing how each of the proposed categories have advanced since the original publication, and evaluate the performance of newer algorithms against the previous best-of-category using an expanded UCR archive. We extend the taxonomy to include three new categories to reflect recent developments. Alongside the originally proposed distance, interval, shapelet, dictionary and hybrid based algorithms, we compare newer convolution and feature based algorithms as well as deep learning approaches. We introduce 30 classification datasets either recently donated to the archive or reformatted to the TSC format, and use these to further evaluate the best performing algorithm from each category. Overall, we find that two recently proposed algorithms, Hydra+MultiROCKET and HIVE-COTEv2, perform significantly better than other approaches on both the current and new TSC problems.
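
The Dynamic Time Warping benchmark both bake offs measure against is easy to state exactly; below is a standard O(nm) dynamic-programming implementation (full warping window, squared point-wise costs).

```python
# DTW distance between two 1-D series via the classic dynamic program.
import numpy as np

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))

print(dtw(np.array([0.0, 1.0, 2.0, 1.0]), np.array([0.0, 0.0, 1.0, 2.0, 1.0])))
```

A 1-nearest-neighbour classifier under this distance, usually with a tuned warping window, is the baseline that newer algorithms must beat.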

New submissions for Mon, 3 Apr 23

Keyword: efficient

Machine learning for discovering laws of nature

  • Authors: Lizhi Xin, Kevin Xin, Houwen Xin
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantum Physics (quant-ph)
  • Arxiv link: https://arxiv.org/abs/2303.17607
  • Pdf link: https://arxiv.org/pdf/2303.17607
  • Abstract
    A microscopic particle obeys the principles of quantum mechanics -- so where is the sharp boundary between the macroscopic and microscopic worlds? It was this "interpretation problem" that prompted Schrödinger to propose his famous thought experiment (a cat that is simultaneously both dead and alive) and sparked a great debate about the quantum measurement problem, and there is still no satisfactory answer. This is precisely the inadequacy of rigorous mathematical models in describing the laws of nature. We propose a computational model to describe and understand the laws of nature based on Darwin's natural selection. In fact, whether it's a macro particle, a micro electron or a security, they can all be considered as an entity; the change of this entity over time can be described by a data series composed of states and values. An observer can learn from this data series to construct theories (usually consisting of functions and differential equations). We don't model with the usual functions or differential equations, but with a state Decision Tree (which determines the state of an entity) and a value Function Tree (which determines the distance between two points of an entity). A state Decision Tree and a value Function Tree together can reconstruct an entity's trajectory and make predictions about its future trajectory. Our proposed algorithmic model discovers laws of nature by learning only the observed historical data (sequential measurements of observables) based on maximizing the observer's expected value. There is no differential equation in our model; our model has an emphasis on machine learning, where the observer builds up his/her experience by being rewarded or punished for each decision he/she makes, eventually leading to rediscovering Newton's law, the Born rule (quantum mechanics) and the efficient market hypothesis (financial market).

Transformer-based Self-supervised Multimodal Representation Learning for Wearable Emotion Recognition

  • Authors: Yujin Wu, Mohamed Daoudi, Ali Amad
  • Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2303.17611
  • Pdf link: https://arxiv.org/pdf/2303.17611
  • Abstract
    Recently, wearable emotion recognition based on peripheral physiological signals has drawn massive attention due to its less invasive nature and its applicability in real-life scenarios. However, how to effectively fuse multimodal data remains a challenging problem. Moreover, traditional fully-supervised based approaches suffer from overfitting given limited labeled data. To address the above issues, we propose a novel self-supervised learning (SSL) framework for wearable emotion recognition, where efficient multimodal fusion is realized with temporal convolution-based modality-specific encoders and a transformer-based shared encoder, capturing both intra-modal and inter-modal correlations. Extensive unlabeled data is automatically assigned labels by five signal transforms, and the proposed SSL model is pre-trained with signal transformation recognition as a pretext task, allowing the extraction of generalized multimodal representations for emotion-related downstream tasks. For evaluation, the proposed SSL model was first pre-trained on a large-scale self-collected physiological dataset and the resulting encoder was subsequently frozen or fine-tuned on three public supervised emotion recognition datasets. Ultimately, our SSL-based method achieved state-of-the-art results in various emotion classification tasks. Meanwhile, the proposed model proved to be more accurate and robust compared to fully-supervised methods on low data regimes.

Scalable High-Quality Hypergraph Partitioning

  • Authors: Lars Gottesbüren, Tobias Heuer, Nikolai Maas, Peter Sanders, Sebastian Schlag
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.17679
  • Pdf link: https://arxiv.org/pdf/2303.17679
  • Abstract
    Balanced hypergraph partitioning is an NP-hard problem with many applications, e.g., optimizing communication in distributed data placement problems. The goal is to place all nodes across $k$ different blocks of bounded size, such that hyperedges span as few parts as possible. This problem is well-studied in sequential and distributed settings, but not in shared-memory. We close this gap by devising efficient and scalable shared-memory algorithms for all components employed in the best sequential solvers without compromises with regards to solution quality. This work presents the scalable and high-quality hypergraph partitioning framework Mt-KaHyPar. Its most important components are parallel improvement algorithms based on the FM algorithm and maximum flows, as well as a parallel clustering algorithm for coarsening - which are used in a multilevel scheme with $\log(n)$ levels. As additional components, we parallelize the $n$-level partitioning scheme, devise a deterministic version of our algorithm, and present optimizations for plain graphs. We evaluate our solver on more than 800 graphs and hypergraphs, and compare it with 25 different algorithms from the literature. Our fastest configuration outperforms almost all existing hypergraph partitioners with regards to both solution quality and running time. Our highest-quality configuration achieves the same solution quality as the best sequential partitioner KaHyPar, while being an order of magnitude faster with ten threads. Thus, two of our configurations occupy all fronts of the Pareto curve for hypergraph partitioning. Furthermore, our solvers exhibit good speedups, e.g., 29.6x in the geometric mean on 64 cores (deterministic), 22.3x ($\log(n)$-level), and 25.9x ($n$-level).

Mitigating Source Bias for Fairer Weak Supervision

  • Authors: Changho Shin, Sonia Cromp, Dyah Adila, Frederic Sala
  • Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2303.17713
  • Pdf link: https://arxiv.org/pdf/2303.17713
  • Abstract
    Weak supervision overcomes the label bottleneck, enabling efficient development of training sets. Millions of models trained on such datasets have been deployed in the real world and interact with users on a daily basis. However, the techniques that make weak supervision attractive -- such as integrating any source of signal to estimate unknown labels -- also ensure that the pseudolabels it produces are highly biased. Surprisingly, given everyday use and the potential for increased bias, weak supervision has not been studied from the point of view of fairness. This work begins such a study. Our departure point is the observation that even when a fair model can be built from a dataset with access to ground-truth labels, the corresponding dataset labeled via weak supervision can be arbitrarily unfair. Fortunately, not all is lost: we propose and empirically validate a model for source unfairness in weak supervision, then introduce a simple counterfactual fairness-based technique that can mitigate these biases. Theoretically, we show that it is possible for our approach to simultaneously improve both accuracy and fairness metrics -- in contrast to standard fairness approaches that suffer from tradeoffs. Empirically, we show that our technique improves accuracy on weak supervision baselines by as much as 32% while reducing demographic parity gap by 82.5%.

BOLT: An Automated Deep Learning Framework for Training and Deploying Large-Scale Neural Networks on Commodity CPU Hardware

  • Authors: Nicholas Meisburger, Vihan Lakshman, Benito Geordie, Joshua Engels, David Torres Ramos, Pratik Pranav, Benjamin Coleman, Benjamin Meisburger, Shubh Gupta, Yashwanth Adunukota, Tharun Medini, Anshumali Shrivastava
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17727
  • Pdf link: https://arxiv.org/pdf/2303.17727
  • Abstract
    Efficient large-scale neural network training and inference on commodity CPU hardware is of immense practical significance in democratizing deep learning (DL) capabilities. Presently, the process of training massive models consisting of hundreds of millions to billions of parameters requires the extensive use of specialized hardware accelerators, such as GPUs, which are only accessible to a limited number of institutions with considerable financial resources. Moreover, there is often an alarming carbon footprint associated with training and deploying these models. In this paper, we address these challenges by introducing BOLT, a sparse deep learning library for training massive neural network models on standard CPU hardware. BOLT provides a flexible, high-level API for constructing models that will be familiar to users of existing popular DL frameworks. By automatically tuning specialized hyperparameters, BOLT also abstracts away the algorithmic details of sparse network training. We evaluate BOLT on a number of machine learning tasks drawn from recommendations, search, natural language processing, and personalization. We find that our proposed system achieves competitive performance with state-of-the-art techniques at a fraction of the cost and energy consumption and an order-of-magnitude faster inference time. BOLT has also been successfully deployed by multiple businesses to address critical problems, and we highlight one customer deployment case study in the field of e-commerce.

MemPool: A Scalable Manycore Architecture with a Low-Latency Shared L1 Memory

  • Authors: Samuel Riedel, Matheus Cavalcante, Renzo Andri, Luca Benini
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2303.17742
  • Pdf link: https://arxiv.org/pdf/2303.17742
  • Abstract
    Shared L1 memory clusters are a common architectural pattern (e.g., in GPGPUs) for building efficient and flexible multi-processing-element (PE) engines. However, it is a common belief that these tightly-coupled clusters would not scale beyond a few tens of PEs. In this work, we tackle scaling shared L1 clusters to hundreds of PEs while supporting a flexible and productive programming model and maintaining high efficiency. We present MemPool, a manycore system with 256 RV32IMAXpulpimg "Snitch" cores featuring application-tunable functional units. We designed and implemented an efficient low-latency PE to L1-memory interconnect, an optimized instruction path to ensure each PE's independent execution, and a powerful DMA engine and system interconnect to stream data in and out. MemPool is easy to program, with all the cores sharing a global view of a large, multi-banked, L1 scratchpad memory, accessible within at most five cycles in the absence of conflicts. We provide multiple runtimes to program MemPool at different abstraction levels and illustrate its versatility with a wide set of applications. MemPool runs at 600 MHz (60 gate delays) in typical conditions (TT/0.80V/25°C) in 22 nm FDX technology and achieves a performance of up to 229 GOPS or 192 GOPS/W with less than 2% of execution stalls.

Solution of Real Cubic Equations without Cardano's Formula

  • Authors: Bahman Kalantari
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.17747
  • Pdf link: https://arxiv.org/pdf/2303.17747
  • Abstract
    Building on a classification of zeros of cubic equations due to the $12$-th century Persian mathematician Sharaf al-Din Tusi, together with Smale's theory of *point estimation*, we derive an efficient recipe for computing a high-precision approximation to a real root of an arbitrary real cubic equation. First, via reversible transformations we reduce any real cubic equation into one of four canonical forms with $0$, $\pm 1$ coefficients, except for the constant term as $\pm q$, $q \geq 0$. Next, given any form, if $\rho_q$ is an approximation to $\sqrt[3]{q}$ to within a relative error of five percent, we prove a *seed* $x_0$ in $\{\rho_q, \pm .95 \rho_q, -\frac{1}{3}, 1\}$ can be selected such that in $t$ Newton iterations $|x_t - \theta_q| \leq \sqrt[3]{q}\cdot 2^{-2^{t}}$ for some real root $\theta_q$. While computing a good seed, even for the approximation of $\sqrt[3]{q}$, is considered to be "somewhat of a black art" (see Wikipedia), as we justify, $\rho_q$ is readily computable from the *mantissa* and *exponent* of $q$. It follows that the above approach gives a simple recipe for the numerical approximation of solutions of real cubic equations independent of Cardano's formula.
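
The recipe above is concrete enough to sketch in code: estimate the cube root, pick a seed from the small candidate set, then run plain Newton iterations. In the minimal Python sketch below, the canonical form $x^3 + bx^2 + cx - q$ with $b, c \in \{0, \pm 1\}$ and the seed set follow the abstract, while the convergence test and the use of `q ** (1/3)` in place of the mantissa/exponent construction are our own simplifications.

```python
def solve_canonical_cubic(b, c, q, tol=1e-14, max_iter=64):
    """Seed-then-Newton recipe sketched from the abstract: try each
    candidate seed and keep the first Newton run that converges."""
    f = lambda x: x**3 + b * x**2 + c * x - q
    df = lambda x: 3 * x**2 + 2 * b * x + c
    rho_q = q ** (1.0 / 3.0)  # the paper derives this from mantissa/exponent
    for x in (rho_q, 0.95 * rho_q, -0.95 * rho_q, -1.0 / 3.0, 1.0):
        for _ in range(max_iter):
            d = df(x)
            if d == 0.0:
                break  # degenerate seed; move on to the next candidate
            step = f(x) / d
            x -= step
            if abs(step) <= tol * max(1.0, abs(x)):
                return x  # error shrinks doubly exponentially near a root
    raise ArithmeticError("no seed converged")

print(solve_canonical_cubic(0, 0, 2))  # ~2 ** (1/3) = 1.2599...
```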

MLGCN: An Ultra Efficient Graph Convolution Neural Model For 3D Point Cloud Analysis

  • Authors: Mohammad Khodadad, Morteza Rezanejad, Ali Shiraee Kasmaee, Kaleem Siddiqi, Dirk Walther, Hamidreza Mahyar
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17748
  • Pdf link: https://arxiv.org/pdf/2303.17748
  • Abstract
    The analysis of 3D point clouds has diverse applications in robotics, vision and graphics. Processing them presents specific challenges since they are naturally sparse, can vary in spatial resolution and are typically unordered. Graph-based networks to abstract features have emerged as a promising alternative to convolutional neural networks for their analysis, but these can be computationally heavy as well as memory inefficient. To address these limitations we introduce a novel Multi-level Graph Convolution Neural (MLGCN) model, which uses Graph Neural Networks (GNN) blocks to extract features from 3D point clouds at specific locality levels. Our approach employs precomputed graph KNNs, where each KNN graph is shared between GCN blocks inside a GNN block, making it both efficient and effective compared to existing models. We demonstrate the efficacy of our approach on point cloud based object classification and part segmentation tasks on benchmark datasets, showing that it produces comparable results to those of state-of-the-art models while requiring up to a thousand times fewer floating-point operations (FLOPs) and having significantly reduced storage requirements. Thus, our MLGCN model could be particularly relevant to point cloud based 3D shape analysis in industrial applications when computing resources are scarce.
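
The efficiency claim rests on a simple mechanism that is easy to illustrate: the KNN graph is computed once per locality level and then shared by every GCN block inside a GNN block, instead of being rebuilt per layer. A minimal sketch of that sharing, with a toy mean-aggregation convolution standing in for the paper's actual operator (block structure, feature sizes, and the ReLU are our assumptions):

```python
import numpy as np
from scipy.spatial import cKDTree

def precompute_knn(points, k):
    """Build the k-nearest-neighbour index set once; all GCN blocks reuse it."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k + 1)  # neighbour 0 is the point itself
    return idx[:, 1:]                     # (N, k) neighbour indices

def gcn_block(feats, nbr_idx, weight):
    """Toy graph convolution: mean-aggregate neighbours, project, ReLU."""
    agg = feats[nbr_idx].mean(axis=1)     # (N, C) neighbourhood mean
    return np.maximum(agg @ weight, 0.0)

rng = np.random.default_rng(0)
pts = rng.normal(size=(1024, 3)).astype(np.float32)
nbrs = precompute_knn(pts, k=16)          # computed once...
x = pts
for w in (rng.normal(size=(3, 32)), rng.normal(size=(32, 32))):
    x = gcn_block(x, nbrs, w.astype(np.float32))  # ...shared by both blocks
```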

Pacti: Scaling Assume-Guarantee Reasoning for System Analysis and Design

  • Authors: Inigo Incer, Apurva Badithela, Josefine Graebener, Piergiuseppe Mallozzi, Ayush Pandey, Sheng-Jung Yu, Albert Benveniste, Benoit Caillaud, Richard M. Murray, Alberto Sangiovanni-Vincentelli, Sanjit A. Seshia
  • Subjects: Logic in Computer Science (cs.LO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.17751
  • Pdf link: https://arxiv.org/pdf/2303.17751
  • Abstract
    Contract-based design is a method to facilitate modular system design. While there has been substantial progress on the theory of contracts, there has been less progress on scalable algorithms for the algebraic operations in this theory. In this paper, we present: 1) principles to implement a contract-based design tool at scale and 2) Pacti, a tool that can efficiently compute these operations. We then illustrate the use of Pacti in a variety of case studies.

Lattice-based kernel approximation and serendipitous weights for parametric PDEs in very high dimensions

  • Authors: Vesa Kaarnioja, Frances Y. Kuo, Ian H. Sloan
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.17755
  • Pdf link: https://arxiv.org/pdf/2303.17755
  • Abstract
    We describe a fast method for solving elliptic partial differential equations (PDEs) with uncertain coefficients using kernel interpolation at a lattice point set. By representing the input random field of the system using the model proposed by Kaarnioja, Kuo, and Sloan (SIAM J. Numer. Anal. 2020), in which a countable number of independent random variables enter the random field as periodic functions, it was shown by Kaarnioja, Kazashi, Kuo, Nobile, and Sloan (Numer. Math. 2022) that the lattice-based kernel interpolant can be constructed for the PDE solution as a function of the stochastic variables in a highly efficient manner using fast Fourier transform (FFT). In this work, we discuss the connection between our model and the popular "affine and uniform model" studied widely in the literature of uncertainty quantification for PDEs with uncertain coefficients. We also propose a new class of weights entering the construction of the kernel interpolant -- *serendipitous weights* -- which dramatically improve the computational performance of the kernel interpolant for PDE problems with uncertain coefficients, and allow us to tackle function approximation problems up to very high dimensionalities. Numerical experiments are presented to showcase the performance of the serendipitous weights.
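
The FFT speed-up mentioned above stems from the structure of the linear algebra: for a periodic kernel evaluated at lattice points, the interpolation matrix is circulant, so it is diagonalized by the DFT and each solve costs O(n log n). A self-contained toy illustration of that circulant-solve trick (our illustration, not the paper's code):

```python
import numpy as np

def solve_circulant(first_col, rhs):
    """Solve K x = rhs for a circulant K given its first column: the DFT
    diagonalises K, so the solve reduces to three FFTs."""
    eig = np.fft.fft(first_col)                      # eigenvalues of K
    return np.real(np.fft.ifft(np.fft.fft(rhs) / eig))

# Toy check against a dense solve with a symmetric circulant "kernel matrix".
n = 8
c = np.array([4.0, 1.0, 0.5, 0.25, 0.1, 0.25, 0.5, 1.0])
K = np.array([np.roll(c, i) for i in range(n)]).T    # K[i, j] = c[(i-j) % n]
b = np.arange(n, dtype=float)
assert np.allclose(K @ solve_circulant(c, b), b)
```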

SOSR: Source-Free Image Super-Resolution with Wavelet Augmentation Transformer

  • Authors: Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Lei Zhang, Ran He
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17783
  • Pdf link: https://arxiv.org/pdf/2303.17783
  • Abstract
    Real-world images taken by different cameras with different degradation kernels often result in a cross-device domain gap in image super-resolution. A prevalent approach to this issue is unsupervised domain adaptation (UDA), which needs access to the source data. Considering privacy policies or transmission restrictions of data in many practical applications, we propose a SOurce-free image Super-Resolution framework (SOSR) to address this issue, i.e., adapt a model pre-trained on labeled source data to a target domain with only unlabeled target data. SOSR leverages the source model to generate refined pseudo-labels for teacher-student learning. To better utilize the pseudo-labels, this paper proposes a novel wavelet-based augmentation method, named Wavelet Augmentation Transformer (WAT), which can be flexibly incorporated with existing networks to implicitly produce useful augmented data. WAT learns low-frequency information of varying levels across diverse samples, which is aggregated efficiently via deformable attention. Furthermore, an uncertainty-aware self-training mechanism is proposed to improve the accuracy of the pseudo-labels, with inaccurate predictions being rectified by uncertainty estimation. To acquire better SR results and avoid overfitting to the pseudo-labels, several regularization losses are proposed to constrain the frequency information between target LR and SR images. Experiments show that, without accessing source data, SOSR achieves superior results to state-of-the-art UDA methods.

Attention is Not Always What You Need: Towards Efficient Classification of Domain-Specific Text

  • Authors: Yasmen Wahba, Nazim Madhavji, John Steinbacher
  • Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17786
  • Pdf link: https://arxiv.org/pdf/2303.17786
  • Abstract
    For large-scale IT corpora with hundreds of classes organized in a hierarchy, accurate classification at the higher levels of the hierarchy is crucial to avoid errors propagating to the lower levels. In the business world, an efficient and explainable ML model is preferred over an expensive black-box model, especially if the performance increase is marginal. A current trend in the Natural Language Processing (NLP) community is towards employing huge pre-trained language models (PLMs), also known as self-attention models (e.g., BERT), for almost any kind of NLP task (e.g., question-answering, sentiment analysis, text classification). Despite the widespread use of PLMs and their impressive performance in a broad range of NLP tasks, there is no clear, well-justified rationale for employing these models in domain-specific text classification (TC) tasks, given that the monosemic nature of specialized words (i.e., jargon) found in domain-specific text renders contextualized embeddings (e.g., PLMs) largely redundant. In this paper, we compare the accuracies of some state-of-the-art (SOTA) models reported in the literature against a Linear SVM classifier with a TF-IDF vectorization model on three TC datasets. Results show comparable performance for the Linear SVM. The findings of this study show that for domain-specific TC tasks, a linear model can provide a comparable, cheap, reproducible, and interpretable alternative to attention-based models.
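
The advocated baseline is a few lines of scikit-learn; a minimal sketch with placeholder documents and labels (the paper's datasets and hyperparameters are not reproduced here):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy jargon-heavy tickets standing in for a domain-specific corpus.
docs = ["disk quota exceeded on mail server",
        "vpn tunnel drops every hour",
        "mailbox full cannot receive messages",
        "cannot connect to corporate vpn"]
labels = ["email", "network", "email", "network"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(docs, labels)
print(clf.predict(["vpn connection keeps timing out"]))  # -> ['network']
```

Because jargon terms are monosemic, the bag-of-n-grams features already separate the classes, which is the paper's argument for skipping contextualized embeddings.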

Neural Microfacet Fields for Inverse Rendering

  • Authors: Alexander Mai, Dor Verbin, Falko Kuester, Sara Fridovich-Keil
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2303.17806
  • Pdf link: https://arxiv.org/pdf/2303.17806
  • Abstract
    We present Neural Microfacet Fields, a method for recovering materials, geometry, and environment illumination from images of a scene. Our method uses a microfacet reflectance model within a volumetric setting by treating each sample along the ray as a (potentially non-opaque) surface. Using surface-based Monte Carlo rendering in a volumetric setting enables our method to perform inverse rendering efficiently by combining decades of research in surface-based light transport with recent advances in volume rendering for view synthesis. Our approach outperforms prior work in inverse rendering, capturing high fidelity geometry and high frequency illumination details; its novel view synthesis results are on par with state-of-the-art methods that do not recover illumination or materials.

LabelVizier: Interactive Validation and Relabeling for Technical Text Annotations

  • Authors: Xiaoyu Zhang, Xiwei Xuan, Alden Dima, Thurston Sexton, Kwan-Liu Ma
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2303.17820
  • Pdf link: https://arxiv.org/pdf/2303.17820
  • Abstract
    With the rapid accumulation of text data produced by data-driven techniques, the task of extracting "data annotations"--concise, high-quality data summaries from unstructured raw text--has become increasingly important. Recent advances in weak supervision and crowd-sourcing techniques provide promising solutions for efficiently creating annotations (labels) for large-scale technical text data. However, such annotations may fail in practice because of changes in annotation requirements, application scenarios, and modeling goals, where label validation and relabeling by domain experts are required. To address this issue, we present LabelVizier, a human-in-the-loop workflow that incorporates domain knowledge and user-specific requirements to reveal actionable insights into annotation flaws, then produce better-quality labels for large-scale multi-label datasets. We implement our workflow as an interactive notebook to facilitate flexible error profiling, in-depth annotation validation for three error types, and efficient annotation relabeling on different data scales. We evaluated the efficiency and generalizability of our workflow with two use cases and four expert reviews. The results indicate that LabelVizier is applicable in various application scenarios and assists domain experts with different knowledge backgrounds in efficiently improving technical text annotation quality.

AI-Oriented Two-Phase Multi-Factor Authentication in SAGINs: Prospects and Challenges

  • Authors: Bin Yang, Shanyun Liu, Tao Xu, Chuyu Li, Yongdong Zhu, Zipeng Li, Zhifeng Zhao
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2303.17833
  • Pdf link: https://arxiv.org/pdf/2303.17833
  • Abstract
    Space-air-ground integrated networks (SAGINs), which have emerged as an expansion of terrestrial networks, provide flexible access, ubiquitous coverage, high-capacity backhaul, and emergency/disaster recovery for mobile users (MUs). While the massive benefits brought by SAGIN may improve the quality of service, unauthorized access to SAGIN entities is potentially dangerous. At present, conventional crypto-based authentication is facing challenges, such as the inability to provide continuous and transparent protection for MUs. In this article, we propose an AI-oriented two-phase multi-factor authentication scheme (ATMAS) by introducing intelligence to authentication. The satellite and network control center collaborate on continuous authentication, while unique spatial-temporal features, including service features and geographic features, are utilized to enhance the system security. Our further security analysis and performance evaluations show that ATMAS has proper security characteristics which can meet various security requirements. Moreover, we shed light on lightweight and efficient authentication mechanism design through a proper combination of spatial-temporal factors.

Vision-Assisted mmWave Beam Management for Next-Generation Wireless Systems: Concepts, Solutions and Open Challenges

  • Authors: Kan Zheng, Haojun Yang, Ziqiang Ying, Pengshuo Wang, Lajos Hanzo
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2303.17857
  • Pdf link: https://arxiv.org/pdf/2303.17857
  • Abstract
    Beamforming techniques have been widely used in the millimeter wave (mmWave) bands to mitigate the path loss of mmWave radio links by directionally concentrating the signal energy into narrow, straight beams. However, traditional mmWave beam management algorithms usually require excessive channel state information overhead, leading to extremely high computational and communication costs. This hinders the widespread deployment of mmWave communications. By contrast, the revolutionary vision-assisted beam management system concept employed at base stations (BSs) can select the optimal beam for the target user equipment (UE) based on its location information determined by machine learning (ML) algorithms applied to visual data, without requiring channel information. In this paper, we present a comprehensive framework for a vision-assisted mmWave beam management system, its typical deployment scenarios, as well as the specifics of the framework. Then, some of the challenges faced by this system and their efficient solutions are discussed from the perspective of ML. Next, a new simulation platform is conceived to provide both visual and wireless data for model validation and performance evaluation. Our simulation results indicate that vision-assisted beam management is indeed attractive for next-generation wireless systems.

Exploring the Limits of Deep Image Clustering using Pretrained Models

  • Authors: Nikolas Adaloglou, Felix Michels, Hamza Kalisch, Markus Kollmann
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17896
  • Pdf link: https://arxiv.org/pdf/2303.17896
  • Abstract
    We present a general methodology that learns to classify images without labels by leveraging pretrained feature extractors. Our approach involves self-distillation training of clustering heads, based on the fact that nearest neighbors in the pretrained feature space are likely to share the same label. We propose a novel objective to learn associations between images by introducing a variant of pointwise mutual information together with instance weighting. We demonstrate that the proposed objective is able to attenuate the effect of false positive pairs while efficiently exploiting the structure in the pretrained feature space. As a result, we improve the clustering accuracy over $k$-means on $17$ different pretrained models by $6.1$% and $12.2$% on ImageNet and CIFAR100, respectively. Finally, using self-supervised pretrained vision transformers we push the clustering accuracy on ImageNet to $61.6$%. The code will be open-sourced.

FP8 versus INT8 for efficient deep learning inference

  • Authors: Mart van Baalen, Andrey Kuzmin, Suparna S Nair, Yuwei Ren, Eric Mahurin, Chirag Patel, Sundar Subramanian, Sanghyuk Lee, Markus Nagel, Joseph Soriaga, Tijmen Blankevoort
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17951
  • Pdf link: https://arxiv.org/pdf/2303.17951
  • Abstract
    Recently, the idea of using FP8 as a number format for neural network training has been floating around the deep learning world. Given that most training is currently conducted with entire networks in FP32, or sometimes FP16 with mixed precision, the step to having some parts of a network run in FP8 with 8-bit weights is an appealing potential speed-up for the generally costly and time-intensive training procedures in deep learning. A natural question arises regarding what this development means for efficient inference on edge devices. In the efficient inference device world, workloads are frequently executed in INT8, sometimes going even as low as INT4 when efficiency calls for it. In this whitepaper, we compare the performance of both the FP8 and INT formats for efficient on-device inference. We theoretically show the difference between the INT and FP formats for neural networks and present a plethora of post-training quantization and quantization-aware training results to show how this theory translates to practice. We also provide a hardware analysis showing that the FP formats are somewhere between 50% and 180% less efficient in terms of compute in dedicated hardware than the INT format. Based on our research and a read of the research field, we conclude that although the proposed FP8 format could be good for training, the results for inference do not warrant a dedicated implementation of FP8 in favor of INT8 for efficient inference. We show that our results are mostly consistent with previous findings but that important comparisons between the formats have thus far been lacking. Finally, we discuss what happens when FP8-trained networks are converted to INT8 and conclude with a brief discussion on the most efficient way for on-device deployment and an extensive suite of INT8 results for many models.
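
For reference, the INT8 side of the comparison typically means symmetric, per-tensor post-training quantization of the following form; this is a generic sketch of the standard scheme, not the paper's exact procedure:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: scale by max |x| / 127."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
print(f"max rounding error: {err:.5f}  (half a step is ~{s / 2:.5f})")
```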

How to measure research performance of single scientists? A proposal for an index based on scientific prizes: The Prize Winner Index (PWI)

  • Authors: Lutz Bornmann, Robin Haunschild
  • Subjects: Digital Libraries (cs.DL); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2303.18007
  • Pdf link: https://arxiv.org/pdf/2303.18007
  • Abstract
    In this study, we propose a new index for measuring excellence in science which is based on collaborations (co-authorship distances) in science. The index is based on the Erdős number - a number that was introduced several years ago. We propose to focus with the new index on laureates of prestigious prizes in a certain field and to measure co-authorship distances between the laureates and other scientists. To exemplify and explain our proposal, we computed the proposed index in the field of quantitative science studies (PWIPM). The Derek de Solla Price Memorial Award (Price Medal, PM) is awarded to outstanding scientists in the field. We tested the convergent validity of the PWIPM. We were interested in whether the indicator is related to an established bibliometric indicator: P(top 10%). The results show that the coefficients for the correlation between PWIPM and P(top 10%) are high (in cases where a sufficient number of papers have been considered for a reliable assessment of performance). Therefore, measured by an established indicator for research excellence, the new PWI indicator seems to be convergently valid and, therefore, might be a possible alternative to established (bibliometric) indicators - with a focus on prizes.

The Many Qualities of a New Directly Accessible Compression Scheme

  • Authors: Domenico Cantone, Simone Faro
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2303.18063
  • Pdf link: https://arxiv.org/pdf/2303.18063
  • Abstract
    We present a new variable-length computation-friendly encoding scheme, named SFDC (Succinct Format with Direct aCcessibility), that supports direct and fast access to any element of the compressed sequence and achieves compression ratios often higher than those offered by other solutions in the literature. The SFDC scheme provides a flexible and simple representation geared towards either practical efficiency or compression ratios, as required. For a text of length $n$ over an alphabet of size $\sigma$ and a fixed parameter $\lambda$, the access time of the proposed encoding is proportional to the length of the character's code-word, plus an expected $\mathcal{O}((F_{\sigma - \lambda + 3} - 3)/F_{\sigma+1})$ overhead, where $F_j$ is the $j$-th number of the Fibonacci sequence. Overall, it uses $N+\mathcal{O}\big(n \big(\lambda - (F_{\sigma+3}-3)/F_{\sigma+1}\big)\big) = N + \mathcal{O}(n)$ bits, where $N$ is the length of the encoded string. Experimental results show that the performance of our scheme is, in some respects, comparable with the performance of DACs and Wavelet Trees, which are among the most efficient schemes. In addition, our scheme is configured as a *computation-friendly compression* scheme, as it offers several features that make it very effective in text processing tasks. In the string matching problem, which we take as a case study, we experimentally show that the new scheme enables results that are up to 29 times faster than standard string-matching techniques on plain texts.
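
The expected-overhead term is easy to evaluate; the sketch below tabulates $(F_{\sigma-\lambda+3} - 3)/F_{\sigma+1}$ from the bound quoted above to show how quickly the access overhead decays as $\lambda$ grows (illustrative arithmetic only, with hypothetical parameter values):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(j):
    return j if j < 2 else fib(j - 1) + fib(j - 2)

def expected_overhead(sigma, lam):
    """(F_{sigma-lam+3} - 3) / F_{sigma+1}, the expected access overhead."""
    return (fib(sigma - lam + 3) - 3) / fib(sigma + 1)

for lam in (1, 2, 4, 8):
    print(lam, expected_overhead(64, lam))  # shrinks by ~1/phi per unit lambda
```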

A data-driven method for parametric PDE Eigenvalue Problems using Gaussian Process with different covariance functions

  • Authors: Moataz Alghamdi, Fleurianne Bertrand, Daniele Boffi, Abdul Halim
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.18064
  • Pdf link: https://arxiv.org/pdf/2303.18064
  • Abstract
    We propose a non-intrusive, reduced-basis, and data-driven method for approximating both eigenvalues and eigenvectors in parametric eigenvalue problems. We generate the basis of the reduced space by applying the proper orthogonal decomposition (POD) approach on a collection of pre-computed, full-order snapshots at a chosen set of parameters. Then, we use Bayesian linear regression (a.k.a. Gaussian Process Regression) in the online phase to predict both eigenvalues and eigenvectors at new parameters. A split of the data generated in the offline phase into training and test data sets is utilized in the numerical experiments following standard practices in the field of supervised machine learning. Furthermore, we discuss the connection between Gaussian Process Regression and spline methods, and compare the performance of the GPR method against linear and cubic spline methods. We show that GPR outperforms the other methods for functions with a certain regularity. To this end, we discuss various covariance functions which influence the performance of GPR. The proposed method is shown to be accurate and efficient for the approximation of multiple 1D and 2D affine and non-affine parameter-dependent eigenvalue problems that exhibit crossing of eigenvalues.
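
The offline/online split described above is compact enough to sketch on a toy problem: snapshots of a parameter-dependent vector are compressed by POD (an SVD), and GPR then maps the parameter to the POD coefficients. All data below is synthetic and the kernel choice is an assumption; the paper works with PDE eigenpairs and compares several covariance functions:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Offline phase: snapshots at training parameters, POD basis via SVD.
mu_train = np.linspace(0.0, 1.0, 20)[:, None]
snaps = np.stack([np.sin(np.pi * (1 + m) * np.linspace(0, 1, 50))
                  for m in mu_train.ravel()], axis=1)       # (50, 20)
V = np.linalg.svd(snaps, full_matrices=False)[0][:, :5]     # 5 POD modes
coeffs = V.T @ snaps                                        # training targets

# Online phase: GPR predicts POD coefficients at a new parameter.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(
    mu_train, coeffs.T)
mu_new = np.array([[0.37]])
reconstruction = V @ gpr.predict(mu_new).ravel()            # reduced-basis guess
```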

Dictionary-based Online-adaptive Structure-preserving Model Order Reduction for Parametric Hamiltonian Systems

  • Authors: Robin Herkert, Patrick Buchfink, Bernard Haasdonk
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.18072
  • Pdf link: https://arxiv.org/pdf/2303.18072
  • Abstract
    Classical model order reduction (MOR) for parametric problems may become computationally inefficient due to large sizes of the required projection bases, especially for problems with slowly decaying Kolmogorov n-widths. Additionally, Hamiltonian structure of dynamical systems may be available and should be preserved during the reduction. In the current presentation, we address these two aspects by proposing a corresponding dictionary-based, online-adaptive MOR approach. The method requires dictionaries for the state-variable, non-linearities and discrete empirical interpolation (DEIM) points. During the online simulation, local basis extensions/simplifications are performed in an online-efficient way, i.e. the runtime complexity of basis modifications and online simulation of the reduced models do not depend on the full state dimension. Experiments on a linear wave equation and a non-linear Sine-Gordon example demonstrate the efficiency of the approach.

Analysis and Comparison of Two-Level KFAC Methods for Training Deep Neural Networks

  • Authors: Abdoulaye Koroko, Ani Anciaux-Sedrakian, Ibtihel Ben Gharbia, Valérie Garès, Mounir Haddou, Quang Huy Tran
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2303.18083
  • Pdf link: https://arxiv.org/pdf/2303.18083
  • Abstract
    As a second-order method, Natural Gradient Descent (NGD) has the ability to accelerate the training of neural networks. However, due to the prohibitive computational and memory costs of computing and inverting the Fisher Information Matrix (FIM), efficient approximations are necessary to make NGD scalable to Deep Neural Networks (DNNs). Many such approximations have been attempted. The most sophisticated of these is KFAC, which approximates the FIM as a block-diagonal matrix, where each block corresponds to a layer of the neural network. By doing so, KFAC ignores the interactions between different layers. In this work, we investigate the benefit of restoring some low-frequency interactions between the layers by means of two-level methods. Inspired by domain decomposition, several two-level corrections to KFAC using different coarse spaces are proposed and assessed. The obtained results show that incorporating the layer interactions in this fashion does not really improve the performance of KFAC. This suggests that it is safe to discard the off-diagonal blocks of the FIM, since the block-diagonal approach is sufficiently robust, accurate, and economical in computation time.
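
For context, the algebra that makes KFAC cheap, and that the block-diagonal conclusion above leans on, is the Kronecker inverse identity $(A \otimes G)^{-1} = A^{-1} \otimes G^{-1}$: the preconditioned gradient of a layer can be formed from two small inverses without materializing the Fisher block. A toy numerical check with random SPD factors (the paper's two-level corrections are not shown):

```python
import numpy as np

rng = np.random.default_rng(0)
def spd(n):
    m = rng.normal(size=(n, n))
    return m @ m.T + n * np.eye(n)

A, G = spd(4), spd(3)           # input-activation / output-gradient factors
grad = rng.normal(size=(3, 4))  # layer gradient, shape (out_dim, in_dim)

# KFAC update: (A (x) G)^{-1} vec(grad) == vec(G^{-1} grad A^{-1}).
precond = np.linalg.inv(G) @ grad @ np.linalg.inv(A)

# Dense check via the explicit Kronecker product (column-major vec).
dense = np.linalg.solve(np.kron(A, G), grad.T.ravel()).reshape(4, 3).T
assert np.allclose(precond, dense)
```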

RDMNet: Reliable Dense Matching Based Point Cloud Registration for Autonomous Driving

  • Authors: Chenghao Shi, Xieyuanli Chen, Huimin Lu, Wenbang Deng, Junhao Xiao, Bin Dai
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.18084
  • Pdf link: https://arxiv.org/pdf/2303.18084
  • Abstract
    Point cloud registration is an important task in robotics and autonomous driving to estimate the ego-motion of the vehicle. Recent advances following the coarse-to-fine manner show promising potential in point cloud registration. However, existing methods rely on good superpoint correspondences, which are hard to obtain reliably and efficiently, thus resulting in less robust and accurate point cloud registration. In this paper, we propose a novel network, named RDMNet, to find dense point correspondences in a coarse-to-fine manner and improve final pose estimation based on such reliable correspondences. Our RDMNet uses a devised 3D-RoFormer mechanism to first extract distinctive superpoints and generate reliable superpoint matches between two point clouds. The proposed 3D-RoFormer fuses 3D position information into the transformer network, efficiently exploiting point clouds' contextual and geometric information to generate robust superpoint correspondences. RDMNet then propagates the sparse superpoint matches to dense point matches using the neighborhood information for accurate point cloud registration. We extensively evaluate our method on multiple datasets from different environments. The experimental results demonstrate that our method outperforms existing state-of-the-art approaches in all tested datasets with a strong generalization ability.

Direct Data-Driven Computation of Polytopic Robust Control Invariant Sets and State-Feedback Controllers

  • Authors: Manas Mejari, Ankit Gupta
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.18154
  • Pdf link: https://arxiv.org/pdf/2303.18154
  • Abstract
    This paper presents a direct data-driven approach for computing robust control invariant (RCI) sets and their associated state-feedback control laws. The proposed method utilizes a single state-input trajectory generated from the system, to compute a polytopic RCI set with a desired complexity and an invariance-inducing feedback controller, without the need to identify a model of the system. The problem is formulated in terms of a set of sufficient LMI conditions that are then combined in a semi-definite program to maximize the volume of the RCI set while respecting the state and input constraints. We demonstrate through a numerical case study that the proposed data-driven approach can generate RCI sets that are of comparable size to those obtained by a model-based method in which exact knowledge of the system matrices is assumed. Under the assumption of persistency of excitation of the data, the proposed algorithm guarantees robust invariance even with a small number of data samples. Overall, the direct data-driven approach presented in this paper offers a reliable and efficient counterpart to the model-based methods for RCI set computation and state-feedback controller design.

Single Image Depth Prediction Made Better: A Multivariate Gaussian Take

  • Authors: Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Luc Van Gool
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.18164
  • Pdf link: https://arxiv.org/pdf/2303.18164
  • Abstract
    Neural-network-based single image depth prediction (SIDP) is a challenging task where the goal is to predict the scene's per-pixel depth at test time. Since the problem, by definition, is ill-posed, the fundamental goal is to come up with an approach that can reliably model the scene depth from a set of training examples. In the pursuit of perfect depth estimation, most existing state-of-the-art learning techniques predict a single scalar depth value per pixel. Yet, it is well known that the trained model has accuracy limits and can predict imprecise depth. Therefore, an SIDP approach must be mindful of the expected depth variations in the model's prediction at test time. Accordingly, we introduce an approach that performs continuous modeling of per-pixel depth, where we can predict and reason about the per-pixel depth and its distribution. To this end, we model per-pixel scene depth using a multivariate Gaussian distribution. Moreover, contrary to existing uncertainty modeling methods -- in the same spirit, where per-pixel depth is assumed to be independent -- we introduce per-pixel covariance modeling that encodes its depth dependency w.r.t. all the scene points. Unfortunately, per-pixel depth covariance modeling leads to a computationally expensive continuous loss function, which we solve efficiently using a learned low-rank approximation of the overall covariance matrix. Notably, when tested on benchmark datasets such as KITTI, NYU, and SUN-RGB-D, the SIDP model obtained by optimizing our loss function shows state-of-the-art results. Our method (named MG) ranks among the top on the KITTI depth-prediction benchmark leaderboard.
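
The low-rank device that keeps such a covariance loss tractable is standard: with $\Sigma = \mathrm{diag}(d) + UU^\top$, the Woodbury identity and the matrix determinant lemma reduce the Gaussian negative log-likelihood to operations linear in the number of pixels. A generic numerical sketch in our own notation, not the paper's code:

```python
import numpy as np

def lowrank_gaussian_nll(x, U, d):
    """NLL of x ~ N(0, Sigma), Sigma = diag(d) + U U^T (rank r), in O(n r^2)."""
    n, r = U.shape
    Dinv_U = U / d[:, None]                        # D^{-1} U
    proj = Dinv_U.T @ x                            # U^T D^{-1} x
    cap = np.eye(r) + U.T @ Dinv_U                 # capacitance matrix
    quad = x @ (x / d) - proj @ np.linalg.solve(cap, proj)   # Woodbury
    logdet = np.log(d).sum() + np.linalg.slogdet(cap)[1]     # det. lemma
    return 0.5 * (quad + logdet + n * np.log(2 * np.pi))

rng = np.random.default_rng(0)
n, r = 200, 4
U, d, x = rng.normal(size=(n, r)), 0.5 + rng.random(n), rng.normal(size=n)
Sigma = np.diag(d) + U @ U.T
direct = 0.5 * (x @ np.linalg.solve(Sigma, x)
                + np.linalg.slogdet(Sigma)[1] + n * np.log(2 * np.pi))
assert np.isclose(lowrank_gaussian_nll(x, U, d), direct)
```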

How Efficient Are Today's Continual Learning Algorithms?

  • Authors: Md Yousuf Harun, Jhair Gallardo, Tyler L. Hayes, Christopher Kanan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.18171
  • Pdf link: https://arxiv.org/pdf/2303.18171
  • Abstract
    Supervised continual learning involves updating a deep neural network (DNN) from an ever-growing stream of labeled data. While most work has focused on overcoming catastrophic forgetting, one of the major motivations behind continual learning is being able to efficiently update a network with new information, rather than retraining from scratch on the training dataset as it grows over time. Despite recent continual learning methods largely solving the catastrophic forgetting problem, little attention has been paid to the efficiency of these algorithms. Here, we study recent methods for incremental class learning and illustrate that many are highly inefficient in terms of compute, memory, and storage. Some methods even require more compute than training from scratch! We argue that for continual learning to have real-world applicability, the research community cannot ignore the resources used by these algorithms. There is more to continual learning than mitigating catastrophic forgetting.

A Closer Look at Parameter-Efficient Tuning in Diffusion Models

  • Authors: Chendong Xiang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.18181
  • Pdf link: https://arxiv.org/pdf/2303.18181
  • Abstract
    Large-scale diffusion models like Stable Diffusion are powerful and find various real-world applications, but customizing such models by fine-tuning is both memory- and time-inefficient. Motivated by recent progress in natural language processing, we investigate parameter-efficient tuning in large diffusion models by inserting small learnable modules (termed adapters). In particular, we decompose the design space of adapters into orthogonal factors -- the input position, the output position, and the function form -- and perform Analysis of Variance (ANOVA), a classical statistical approach for analyzing the correlation between discrete variables (design options) and continuous variables (evaluation metrics). Our analysis suggests that the input position of adapters is the critical factor influencing the performance of downstream tasks. Then, we carefully study the choice of the input position, and we find that putting the input position after the cross-attention block leads to the best performance, validated by additional visualization analyses. Finally, we provide a recipe for parameter-efficient tuning in diffusion models, which is comparable if not superior to the fully fine-tuned baseline (e.g., DreamBooth) with only 0.75% extra parameters, across various customized tasks.
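
A bottleneck adapter of the kind analyzed above is a few lines of PyTorch. In this sketch the adapter reads the cross-attention output, matching the input position the analysis identifies as best; the bottleneck width, the zero-initialization of the up-projection, and the GELU are our assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.nn.functional.gelu(self.down(x)))

# Stand-in for a cross-attention output inside a diffusion U-Net block;
# during tuning, only adapter.parameters() would receive gradients.
adapter = Adapter(dim=320)
h = torch.randn(2, 77, 320)   # hypothetical (batch, tokens, channels)
h = adapter(h)
```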

GVP: Generative Volumetric Primitives

  • Authors: Mallikarjun B R, Xingang Pan, Mohamed Elgharib, Christian Theobalt
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.18193
  • Pdf link: https://arxiv.org/pdf/2303.18193
  • Abstract
    Advances in 3D-aware generative models have pushed the boundary of image synthesis with explicit camera control. To achieve high-resolution image synthesis, several attempts have been made to design efficient generators, such as hybrid architectures with both 3D and 2D components. However, such a design compromises multiview consistency, and the design of a pure 3D generator with high resolution is still an open problem. In this work, we present Generative Volumetric Primitives (GVP), the first pure 3D generative model that can sample and render 512-resolution images in real-time. GVP jointly models a number of volumetric primitives and their spatial information, both of which can be efficiently generated via a 2D convolutional network. The mixture of these primitives naturally captures the sparsity and correspondence in the 3D volume. The training of such a generator with a high degree of freedom is made possible through a knowledge distillation technique. Experiments on several datasets demonstrate superior efficiency and 3D consistency of GVP over the state-of-the-art.

Aerostack2: A Software Framework for Developing Multi-robot Aerial Systems

  • Authors: Miguel Fernandez-Cortizas, Martin Molina, Pedro Arias-Perez, Rafael Perez-Segui, David Perez-Saura, Pascual Campoy
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.18237
  • Pdf link: https://arxiv.org/pdf/2303.18237
  • Abstract
    In recent years, the robotics community has witnessed the development of several software stacks for ground and articulated robots, such as Navigation2 and MoveIt. However, the same level of collaboration and standardization is yet to be achieved in the field of aerial robotics, where each research group has developed their own frameworks. This work presents Aerostack2, a framework for the development of autonomous aerial robotics systems that aims to address the lack of standardization and fragmentation of efforts in the field. Built on ROS 2 middleware and featuring an efficient modular software architecture and multi-robot orientation, Aerostack2 is a versatile and platform-independent environment that covers a wide range of robot capabilities for autonomous operation. Its major contributions include providing a logical level for specifying missions, reusing components and sub-systems for aerial robotics, and enabling the development of complete control architectures. All major contributions have been tested in simulation and real flights with multiple heterogeneous swarms. Aerostack2 is open source and community oriented, democratizing access to its technology for autonomous drone system developers.

Keyword: faster

oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes

  • Authors: Daniel Campos, Alexandre Marques, Mark Kurtz, ChengXiang Zhai
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17612
  • Pdf link: https://arxiv.org/pdf/2303.17612
  • Abstract
    In this paper, we introduce the range of oBERTa language models, an easy-to-use set of language models, which allows Natural Language Processing (NLP) practitioners to obtain between 3.8 and 24.3 times faster models without expertise in model compression. Specifically, oBERTa extends existing work on pruning, knowledge distillation, and quantization, and leverages frozen embeddings to improve knowledge distillation and improved model initialization to deliver higher accuracy on a broad range of transfer tasks. In generating oBERTa, we explore how the highly optimized RoBERTa differs from BERT with respect to pruning during pre-training and fine-tuning, and find it less amenable to compression during fine-tuning. We explore the use of oBERTa on seven representative NLP tasks and find that the improved compression techniques allow a pruned oBERTa model to match the performance of BERTBASE and exceed the performance of Prune OFA Large on the SQUAD V1.1 Question Answering dataset, despite being 8x and 2x faster in inference, respectively. We release our code, training regimes, and associated models to encourage broad usage and experimentation.

Scalable High-Quality Hypergraph Partitioning

  • Authors: Lars Gottesbüren, Tobias Heuer, Nikolai Maas, Peter Sanders, Sebastian Schlag
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.17679
  • Pdf link: https://arxiv.org/pdf/2303.17679
  • Abstract
    Balanced hypergraph partitioning is an NP-hard problem with many applications, e.g., optimizing communication in distributed data placement problems. The goal is to place all nodes across $k$ different blocks of bounded size, such that hyperedges span as few parts as possible. This problem is well-studied in sequential and distributed settings, but not in shared-memory. We close this gap by devising efficient and scalable shared-memory algorithms for all components employed in the best sequential solvers without compromises with regards to solution quality. This work presents the scalable and high-quality hypergraph partitioning framework Mt-KaHyPar. Its most important components are parallel improvement algorithms based on the FM algorithm and maximum flows, as well as a parallel clustering algorithm for coarsening - which are used in a multilevel scheme with $\log(n)$ levels. As additional components, we parallelize the $n$-level partitioning scheme, devise a deterministic version of our algorithm, and present optimizations for plain graphs. We evaluate our solver on more than 800 graphs and hypergraphs, and compare it with 25 different algorithms from the literature. Our fastest configuration outperforms almost all existing hypergraph partitioners with regards to both solution quality and running time. Our highest-quality configuration achieves the same solution quality as the best sequential partitioner KaHyPar, while being an order of magnitude faster with ten threads. Thus, two of our configurations occupy all fronts of the Pareto curve for hypergraph partitioning. Furthermore, our solvers exhibit good speedups, e.g., 29.6x in the geometric mean on 64 cores (deterministic), 22.3x ($\log(n)$-level), and 25.9x ($n$-level).

Task Oriented Conversational Modelling With Subjective Knowledge

  • Authors: Raja Kumar
  • Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17695
  • Pdf link: https://arxiv.org/pdf/2303.17695
  • Abstract
    Existing conversational models are handled by database (DB)- and API-based systems. However, very often users' questions require information that cannot be handled by such systems. Nonetheless, answers to these questions are available in the form of customer reviews and FAQs. DSTC-11 proposes a three-stage pipeline consisting of knowledge-seeking turn detection, knowledge selection, and response generation to create a conversational model grounded on this subjective knowledge. In this paper, we focus on improving the knowledge selection module to enhance the overall system performance. In particular, we propose entity retrieval methods which result in an accurate and faster knowledge search. Our proposed Named Entity Recognition (NER) based entity retrieval method results in a 7x faster search compared to the baseline model. Additionally, we also explore a potential keyword extraction method which can improve the accuracy of knowledge selection. Preliminary results show a 4% improvement in exact match score on the knowledge selection task. The code is available at https://github.com/raja-kumar/knowledge-grounded-TODS
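
A minimal sketch of the NER-based retrieval idea using spaCy, with a toy knowledge base; the model choice and the data format are illustrative assumptions, not the paper's DSTC-11 pipeline:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # any NER-capable pipeline works

# Hypothetical knowledge base: entity name -> FAQ snippets.
knowledge = {
    "Hilton Bath City": ["Is parking available?", "Are pets allowed?"],
    "A Taste of Indonesia": ["Do you deliver?"],
}

def retrieve(user_turn):
    """Keep only knowledge entries whose name contains a detected entity."""
    ents = {e.text.lower() for e in nlp(user_turn).ents}
    return {name: faqs for name, faqs in knowledge.items()
            if any(ent in name.lower() for ent in ents)}

print(retrieve("Does the Hilton Bath City have parking?"))
```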

BOLT: An Automated Deep Learning Framework for Training and Deploying Large-Scale Neural Networks on Commodity CPU Hardware

  • Authors: Nicholas Meisburger, Vihan Lakshman, Benito Geordie, Joshua Engels, David Torres Ramos, Pratik Pranav, Benjamin Coleman, Benjamin Meisburger, Shubh Gupta, Yashwanth Adunukota, Tharun Medini, Anshumali Shrivastava
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17727
  • Pdf link: https://arxiv.org/pdf/2303.17727
  • Abstract
    Efficient large-scale neural network training and inference on commodity CPU hardware is of immense practical significance in democratizing deep learning (DL) capabilities. Presently, the process of training massive models consisting of hundreds of millions to billions of parameters requires the extensive use of specialized hardware accelerators, such as GPUs, which are only accessible to a limited number of institutions with considerable financial resources. Moreover, there is often an alarming carbon footprint associated with training and deploying these models. In this paper, we address these challenges by introducing BOLT, a sparse deep learning library for training massive neural network models on standard CPU hardware. BOLT provides a flexible, high-level API for constructing models that will be familiar to users of existing popular DL frameworks. By automatically tuning specialized hyperparameters, BOLT also abstracts away the algorithmic details of sparse network training. We evaluate BOLT on a number of machine learning tasks drawn from recommendations, search, natural language processing, and personalization. We find that our proposed system achieves competitive performance with state-of-the-art techniques at a fraction of the cost and energy consumption and an order-of-magnitude faster inference time. BOLT has also been successfully deployed by multiple businesses to address critical problems, and we highlight one customer deployment case study in the field of e-commerce.

Accelerating Wireless Federated Learning via Nesterov's Momentum and Distributed Principle Component Analysis

  • Authors: Yanjie Dong, Luya Wang, Yuanfang Chi, Jia Wang, Haijun Zhang, Fei Richard Yu, Victor C. M. Leung, Xiping Hu
  • Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2303.17885
  • Pdf link: https://arxiv.org/pdf/2303.17885
  • Abstract
    A wireless federated learning system is investigated by allowing a server and workers to exchange uncoded information via orthogonal wireless channels. Since the workers frequently upload local gradients to the server via bandwidth-limited channels, the uplink transmission from the workers to the server becomes a communication bottleneck. Therefore, a one-shot distributed principal component analysis (PCA) is leveraged to reduce the dimension of the uploaded gradients such that the communication bottleneck is relieved. A PCA-based wireless federated learning (PCA-WFL) algorithm and its accelerated version (i.e., PCA-AWFL) are proposed based on the low-dimensional gradients and Nesterov's momentum. For non-convex loss functions, a finite-time analysis is performed to quantify the impacts of system hyper-parameters on the convergence of the PCA-WFL and PCA-AWFL algorithms. The PCA-AWFL algorithm is theoretically certified to converge faster than the PCA-WFL algorithm. Besides, the convergence rates of the PCA-WFL and PCA-AWFL algorithms quantitatively reveal a linear speedup with respect to the number of workers over the vanilla gradient descent algorithm. Numerical results demonstrate the improved convergence rates of the proposed PCA-WFL and PCA-AWFL algorithms over the benchmarks.
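
The uplink compression idea can be sketched in a few lines: each worker projects its gradient onto a shared low-dimensional PCA basis, and the server averages the coefficients and reconstructs. Here the basis comes from a plain SVD of synthetic sample gradients; the paper's one-shot distributed PCA and the Nesterov acceleration are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_workers = 1000, 20, 8

# Gradients (synthetically) concentrate near a k-dimensional subspace.
subspace = rng.normal(size=(d, k))
sample = subspace @ rng.normal(size=(k, 64))                 # past gradients
V = np.linalg.svd(sample, full_matrices=False)[0][:, :k]     # shared basis

grads = [subspace @ rng.normal(size=k) + 0.01 * rng.normal(size=d)
         for _ in range(n_workers)]
uplink = [V.T @ g for g in grads]          # each worker sends k, not d, numbers
recon = V @ np.mean(uplink, axis=0)        # server-side averaged gradient
print("uplink compression ratio:", d / k)  # 50x here
```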

IC-FPS: Instance-Centroid Faster Point Sampling Module for 3D Point-base Object Detection

  • Authors: Hu Haotian, Wang Fanyi, Su Jingwen, Gao Shiyu, Zhang Zhiwang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17921
  • Pdf link: https://arxiv.org/pdf/2303.17921
  • Abstract
    3D object detection is one of the most important tasks in autonomous driving and robotics. Our research focuses on tackling the low-efficiency issue of point-based methods on large-scale point clouds. Existing point-based methods adopt a farthest point sampling (FPS) strategy for downsampling, which is computationally expensive in terms of inference time and memory consumption as the size of the point cloud increases. In order to improve efficiency, we propose a novel Instance-Centroid Faster Point Sampling Module (IC-FPS), which effectively replaces the first Set Abstraction (SA) layer, a layer that is extremely time-consuming. The IC-FPS module comprises two methods: a local feature diffusion based background point filter (LFDBF) and a Centroid-Instance Sampling Strategy (CISS). LFDBF excludes most invalid background points, while CISS substitutes the FPS strategy by quickly sampling centroids and instance points. The IC-FPS module can be inserted into almost every point-based model. Extensive experiments on multiple public benchmarks demonstrate the superiority of IC-FPS. On the Waymo dataset, the proposed module significantly improves the performance of the baseline model and accelerates inference speed by 3.8 times. For the first time, real-time detection by point-based models in large-scale point cloud scenarios is realized.
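
For context, the classic farthest point sampling that IC-FPS is designed to replace looks as follows; its O(N·m) sweep over the whole cloud is exactly what becomes prohibitive on large outdoor scenes (a standard implementation, not the paper's code):

```python
import numpy as np

def farthest_point_sampling(points, m):
    """Greedy FPS: repeatedly pick the point farthest from those chosen."""
    n = points.shape[0]
    chosen = np.zeros(m, dtype=np.int64)      # chosen[0] = 0: arbitrary start
    dist = np.full(n, np.inf)
    for i in range(1, m):
        diff = points - points[chosen[i - 1]]
        dist = np.minimum(dist, (diff * diff).sum(axis=1))
        chosen[i] = int(dist.argmax())
    return chosen

pts = np.random.default_rng(0).normal(size=(100_000, 3))
idx = farthest_point_sampling(pts, 2048)      # 2048 well-spread samples
```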

The Many Qualities of a New Directly Accessible Compression Scheme

  • Authors: Domenico Cantone, Simone Faro
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2303.18063
  • Pdf link: https://arxiv.org/pdf/2303.18063
  • Abstract
    We present a new variable-length computation-friendly encoding scheme, named SFDC (Succinct Format with Direct aCcessibility), that supports direct and fast access to any element of the compressed sequence and achieves compression ratios often higher than those offered by other solutions in the literature. The SFDC scheme provides a flexible and simple representation geared towards either practical efficiency or compression ratios, as required. For a text of length $n$ over an alphabet of size $\sigma$ and a fixed parameter $\lambda$, the access time of the proposed encoding is proportional to the length of the character's code-word, plus an expected $\mathcal{O}((F_{\sigma - \lambda + 3} - 3)/F_{\sigma+1})$ overhead, where $F_j$ is the $j$-th number of the Fibonacci sequence. Overall, it uses $N+\mathcal{O}\big(n \big(\lambda - (F_{\sigma+3}-3)/F_{\sigma+1}\big)\big) = N + \mathcal{O}(n)$ bits, where $N$ is the length of the encoded string. Experimental results show that the performance of our scheme is, in some respects, comparable with the performance of DACs and Wavelet Trees, which are among the most efficient schemes. In addition, our scheme is configured as a *computation-friendly compression* scheme, as it offers several features that make it very effective in text processing tasks. In the string matching problem, which we take as a case study, we experimentally show that the new scheme enables results that are up to 29 times faster than standard string-matching techniques on plain texts.

TPMCF: Temporal QoS Prediction using Multi-Source Collaborative Features

  • Authors: Suraj Kumar, Soumi Chattopadhyay, Chandranath Adak
  • Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.18201
  • Pdf link: https://arxiv.org/pdf/2303.18201
  • Abstract
    Recently, with the rapid deployment of service APIs, personalized service recommendations have played a paramount role in the growth of the e-commerce industry. Quality-of-Service (QoS) parameters determining the service performance, often used for recommendation, fluctuate over time. Thus, QoS prediction is essential to identify a suitable service among functionally equivalent services over time. Contemporary temporal QoS prediction methods hardly achieve the desired accuracy due to various limitations, such as the inability to handle data sparsity and outliers and to capture higher-order temporal relationships among user-service interactions. Even though some recent recurrent neural-network-based architectures can model temporal relationships among QoS data, prediction accuracy degrades due to the absence of other features (e.g., collaborative features) to comprehend the relationship among the user-service interactions. This paper addresses these challenges and proposes a scalable strategy for Temporal QoS Prediction using Multi-source Collaborative-Features (TPMCF), achieving high prediction accuracy and fast responsiveness. TPMCF combines the collaborative features of users/services by exploiting the user-service relationship with spatio-temporal auto-extracted features obtained by employing graph convolution and a transformer encoder with multi-head self-attention. We validated our proposed method on WS-DREAM-2 datasets. Extensive experiments showed that TPMCF outperforms major state-of-the-art approaches regarding prediction accuracy while ensuring high scalability and reasonably fast responsiveness.

Shipper collaboration matching: fast enumeration of triangular transports with high cooperation effects

  • Authors: Akifumi Kira, Nobuo Terajima, Yasuhiko Watanabe, Hirotaka Yamamoto
  • Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2303.18222
  • Pdf link: https://arxiv.org/pdf/2303.18222
  • Abstract
    The logistics industry in Japan is facing a severe shortage of labor. Therefore, there is an increasing need for joint transportation, which allows large amounts of cargo to be transported using fewer trucks. In recent years, the use of artificial intelligence and other new technologies has gained wide attention for improving matching efficiency. However, it is difficult to develop a system that can instantly respond to requests, because browsing through the enormous number of combinations of two transport lanes is time-consuming. In this study, we focus on a form of joint transportation called triangular transportation and enumerate the combinations with high cooperation effects. The proposed algorithm makes good use of hidden inequalities, such as the distance axiom, to narrow down the search range without sacrificing accuracy. Numerical experiments show that the proposed algorithm is thousands of times faster than simple brute force. With this technology as the core engine, we developed a joint transportation matching system. The system has already been in use by over 150 companies as of October 2022, and was featured in a collection of logistics digital transformation cases published by Japan's Ministry of Land, Infrastructure, Transport and Tourism.
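
The enumeration idea admits a compact sketch: chain three one-way lanes into a triangle, score the empty-run distance saved against three separate round trips, and use a cheap distance-axiom bound to discard hopeless triples before full scoring. The lane format, the metric, the scoring, and the pruning rule below are all our assumptions; the paper's inequalities are more elaborate:

```python
import itertools

def dist(a, b):
    # placeholder metric; any metric obeying the triangle inequality works
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

# Lanes as (origin, destination) pairs -- illustrative data only.
lanes = [((0, 0), (5, 0)), ((5, 1), (1, 4)), ((0, 3), (0, 0)), ((9, 9), (2, 2))]

def saving(triple):
    """Empty-run distance saved vs. three separate round trips."""
    loaded = sum(dist(o, d) for o, d in triple)
    empty = sum(dist(triple[i][1], triple[(i + 1) % 3][0]) for i in range(3))
    return loaded - empty

matches = []
for triple in itertools.permutations(lanes, 3):
    loaded = sum(dist(o, d) for o, d in triple)
    # prune: if one empty leg already cancels all loaded distance, no
    # completion of this triple can have a positive cooperation effect
    if dist(triple[0][1], triple[1][0]) >= loaded:
        continue
    s = saving(triple)
    if s > 0:
        matches.append((triple, s))
matches.sort(key=lambda t: -t[1])
```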

Keyword: mobile

A CI-based Auditing Framework for Data Collection Practices

  • Authors: Athina Markopoulou, Rahmadi Trimananda, Hao Cui
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2303.17740
  • Pdf link: https://arxiv.org/pdf/2303.17740
  • Abstract
    Apps and devices (mobile devices, web browsers, IoT, VR, voice assistants, etc.) routinely collect user data, and send them to first- and third-party servers through the network. Recently, there is a lot of interest in (1) auditing the actual data collection practices of those systems; and also in (2) checking the consistency of those practices against the statements made in the corresponding privacy policies. In this paper, we argue that the contextual integrity (CI) tuple can be the basic building block for defining and implementing such an auditing framework. We elaborate on the special case where the tuple is partially extracted from the network traffic generated by the end-device of interest, and partially from the corresponding privacy policies using natural language processing (NLP) techniques. Along the way, we discuss related bodies of work and representative examples that fit into that framework. More generally, we believe that CI can be the building block not only for auditing at the edge, but also for specifying privacy policies and system APIs. We also discuss limitations and directions for future work.

Semi-Weakly Supervised Object Kinematic Motion Prediction

  • Authors: Gengxin Liu, Qian Sun, Haibin Huang, Chongyang Ma, Yulan Guo, Li Yi, Hui Huang, Ruizhen Hu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2303.17774
  • Pdf link: https://arxiv.org/pdf/2303.17774
  • Abstract
    Given a 3D object, kinematic motion prediction aims to identify the mobile parts as well as the corresponding motion parameters. Due to the large variations in both the topological structure and geometric details of 3D objects, this remains a challenging task, and the lack of large-scale labeled data also constrains the performance of deep learning based approaches. In this paper, we tackle the object kinematic motion prediction problem in a semi-weakly supervised manner. Our key observations are two-fold. First, although 3D datasets with fully annotated motion labels are limited, there are existing datasets and methods for object part semantic segmentation at large scale. Second, semantic part segmentation and mobile part segmentation are not always consistent, but it is possible to detect the mobile parts from the underlying 3D structure. Towards this end, we propose a graph neural network to learn the map between hierarchical part-level segmentation and mobile part parameters, which are further refined based on geometric alignment. This network can be first trained on the PartNet-Mobility dataset with fully labeled mobility information and then applied to the PartNet dataset with fine-grained and hierarchical part-level segmentation. The network predictions yield a large scale of 3D objects with pseudo-labeled mobility information, which can further be used for weakly-supervised learning with pre-existing segmentation. Our experiments show significant performance boosts with the augmented data for a previous method designed for kinematic motion prediction on 3D partial scans.

Rethinking Local Perception in Lightweight Vision Transformer

  • Authors: Qihang Fan, Huaibo Huang, Jiyang Guan, Ran He
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17803
  • Pdf link: https://arxiv.org/pdf/2303.17803
  • Abstract
    Vision Transformers (ViTs) have been shown to be effective in various vision tasks. However, scaling them down to a mobile-friendly size leads to significant performance degradation. Therefore, developing lightweight vision transformers has become a crucial area of research. This paper introduces CloFormer, a lightweight vision transformer that leverages context-aware local enhancement. CloFormer explores the relationship between the globally shared weights often used in vanilla convolutional operators and the token-specific context-aware weights appearing in attention, then proposes an effective and straightforward module to capture high-frequency local information. In CloFormer, we introduce AttnConv, a convolution operator in the style of attention. The proposed AttnConv uses shared weights to aggregate local information and deploys carefully designed context-aware weights to enhance local features. The combination of AttnConv and vanilla attention, which uses pooling to reduce FLOPs, enables CloFormer to perceive both high-frequency and low-frequency information. Extensive experiments were conducted in image classification, object detection, and semantic segmentation, demonstrating the superiority of CloFormer.
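
To make the idea of mixing globally shared convolution weights with token-specific, context-aware weights concrete, here is a small sketch inspired by (but not identical to) the described AttnConv: a shared-weight depthwise convolution whose output is modulated by a per-pixel gate generated from the input itself. The module name and sizes are our own.

```python
# Illustrative mixer: shared-weight local aggregation (depthwise conv)
# modulated by input-dependent, context-aware weights. Inspired by, but
# not a reimplementation of, the paper's AttnConv.
import torch
import torch.nn as nn

class ContextAwareLocalMixer(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        # Globally shared weights: a plain depthwise convolution.
        self.dw = nn.Conv2d(dim, dim, kernel_size,
                            padding=kernel_size // 2, groups=dim)
        # Context-aware weights: a per-pixel gate computed from the input.
        self.gate = nn.Sequential(nn.Conv2d(dim, dim, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local = self.dw(x)           # shared-weight aggregation
        return local * self.gate(x)  # token-specific modulation

x = torch.randn(1, 64, 56, 56)
print(ContextAwareLocalMixer(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```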

AI-Oriented Two-Phase Multi-Factor Authentication in SAGINs: Prospects and Challenges

  • Authors: Bin Yang, Shanyun Liu, Tao Xu, Chuyu Li, Yongdong Zhu, Zipeng Li, Zhifeng Zhao
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2303.17833
  • Pdf link: https://arxiv.org/pdf/2303.17833
  • Abstract
    Space-air-ground integrated networks (SAGINs), which have emerged as an expansion of terrestrial networks, provide flexible access, ubiquitous coverage, high-capacity backhaul, and emergency/disaster recovery for mobile users (MUs). While the massive benefits brought by SAGINs may improve the quality of service, unauthorized access to SAGIN entities is potentially dangerous. At present, conventional crypto-based authentication is facing challenges, such as the inability to provide continuous and transparent protection for MUs. In this article, we propose an AI-oriented two-phase multi-factor authentication scheme (ATMAS) that introduces intelligence to authentication. The satellite and the network control center collaborate on continuous authentication, while unique spatial-temporal features, including service features and geographic features, are utilized to enhance system security. Our security analysis and performance evaluations show that ATMAS provides the security characteristics needed to meet various security requirements. Moreover, we shed light on lightweight and efficient authentication mechanism design through a proper combination of spatial-temporal factors.

Adaptive Model Prediction Control-Based Multi-Terrain Trajectory Tracking Framework for Mobile Spherical Robots

  • Authors: Yifan Liu, Tao Hu, Xiaoqing Guan, Yixu Wang, Bixuan Zhang, You Wang, Guang Li
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.18186
  • Pdf link: https://arxiv.org/pdf/2303.18186
  • Abstract
    Owing to uncertainties in both kinematics and dynamics, current trajectory tracking frameworks for mobile robots such as spherical robots cannot function effectively on multiple terrains, especially uneven and unknown ones. Since this is a prerequisite for robots to execute tasks in the wild, we enhance our previous hierarchical trajectory tracking framework to handle this issue. First, a modified adaptive RBF neural network (RBFNN) is proposed to represent all uncertainties in kinodynamics. A Lyapunov function is then used to design its adaptive law, and a variable step-size algorithm is employed in the weight-update procedure to accelerate convergence and improve stability. Hence, a new adaptive model prediction control-based instruction planner (VAN-MPC) is proposed. Without modifying the bottom-level controllers, we finally develop the multi-terrain trajectory tracking framework by employing the new instruction planner VAN-MPC. Practical experiments demonstrate its effectiveness and robustness.

Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

  • Authors: Arjun Majumdar, Karmesh Yadav, Sergio Arnaud, Yecheng Jason Ma, Claire Chen, Sneha Silwal, Aryan Jain, Vincent-Pierre Berges, Pieter Abbeel, Jitendra Malik, Dhruv Batra, Yixin Lin, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.18240
  • Pdf link: https://arxiv.org/pdf/2303.18240
  • Abstract
    We present the largest and most comprehensive empirical study of pre-trained visual representations (PVRs) or visual 'foundation models' for Embodied AI. First, we curate CortexBench, consisting of 17 different tasks spanning locomotion, navigation, dexterous manipulation, and mobile manipulation. Next, we systematically evaluate existing PVRs and find that none are universally dominant. To study the effect of pre-training data scale and diversity, we combine over 4,000 hours of egocentric videos from 7 different sources (over 5.6M images) and ImageNet to train different-sized vision transformers using Masked Auto-Encoding (MAE) on slices of this data. Contrary to inferences from prior work, we find that scaling dataset size and diversity does not improve performance universally (but does so on average). Our largest model, named VC-1, outperforms all prior PVRs on average but does not universally dominate either. Finally, we show that task- or domain-specific adaptation of VC-1 leads to substantial gains, with VC-1 (adapted) achieving performance competitive with or superior to the best known results on all of the benchmarks in CortexBench. These models required over 10,000 GPU-hours to train and can be found on our website for the benefit of the research community.

Keyword: pruning

oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes

  • Authors: Daniel Campos, Alexandre Marques, Mark Kurtz, ChengXiang Zhai
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17612
  • Pdf link: https://arxiv.org/pdf/2303.17612
  • Abstract
    In this paper, we introduce the range of oBERTa language models, an easy-to-use set of language models which allows Natural Language Processing (NLP) practitioners to obtain between 3.8 and 24.3 times faster models without expertise in model compression. Specifically, oBERTa extends existing work on pruning, knowledge distillation, and quantization, and leverages frozen embeddings to improve knowledge distillation and improved model initialization to deliver higher accuracy on a broad range of transfer tasks. In generating oBERTa, we explore how the highly optimized RoBERTa differs from BERT with respect to pruning during pre-training and fine-tuning, and find it less amenable to compression during fine-tuning. We explore the use of oBERTa on seven representative NLP tasks and find that the improved compression techniques allow a pruned oBERTa model to match the performance of BERTBASE and exceed the performance of Prune OFA Large on the SQuAD v1.1 question answering dataset, despite being 8x and 2x faster in inference, respectively. We release our code, training regimes, and associated models to encourage broad usage and experimentation.
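
The two main ingredients named here, pruning and knowledge distillation, compose roughly as in the sketch below: a magnitude-based mask zeroes low-importance weights while a KL term distills soft targets from a teacher. This is a generic sketch of those techniques under our own parameter choices, not the oBERTa training regime itself.

```python
# Generic sketch of magnitude pruning plus logit distillation, the two
# ingredients oBERTa combines. Thresholds and temperature are illustrative.
import torch
import torch.nn.functional as F

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a 0/1 mask that zeroes the `sparsity` fraction of smallest weights."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Standard KL divergence between temperature-softened distributions.
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

w = torch.randn(768, 768)
mask = magnitude_mask(w, sparsity=0.9)  # 90% of entries zeroed
print(round(float(mask.mean()), 2))     # ~0.1 of the weights kept
```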

Keyword: voxel

There is no result

Keyword: lidar

CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition

  • Authors: Tianrui Guan, Aswath Muthuselvam, Montana Hoover, Xijun Wang, Jing Liang, Adarsh Jagan Sathyamoorthy, Damon Conover, Dinesh Manocha
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17778
  • Pdf link: https://arxiv.org/pdf/2303.17778
  • Abstract
    We present CrossLoc3D, a novel 3D place recognition method that solves a large-scale point matching problem in a cross-source setting. Cross-source point cloud data corresponds to point sets captured by depth sensors with different accuracies or from different distances and perspectives. We address the challenges of developing 3D place recognition methods that account for the representation gap between points captured by different sources. Our method handles cross-source data by utilizing multi-grained features and selecting convolution kernel sizes that correspond to the most prominent features. Inspired by diffusion models, our method uses a novel iterative refinement process that gradually shifts the embedding spaces from different sources to a single canonical space for better metric learning. In addition, we present CS-Campus3D, the first 3D aerial-ground cross-source dataset consisting of point cloud data from both aerial and ground LiDAR scans. The point clouds in CS-Campus3D have representation gaps and other features like different views, point densities, and noise patterns. We show that our CrossLoc3D algorithm can achieve an improvement of 4.74% - 15.37% in terms of top-1 average recall on our CS-Campus3D benchmark and achieves performance comparable to state-of-the-art 3D place recognition methods on the Oxford RobotCar dataset. We will release the code and the CS-Campus3D benchmark.

EA-BEV: Edge-aware Bird's-Eye-View Projector for 3D Object Detection

  • Authors: Haotian Hu, Fanyi Wang, Jingwen Su, Laifeng Hu, Tianpeng Feng, Zhaokai Zhang, Wangzhi Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17895
  • Pdf link: https://arxiv.org/pdf/2303.17895
  • Abstract
    In recent years, great progress has been made in Lift-Splat-Shoot-based (LSS-based) 3D object detection methods, which convert features from the 2D camera view and the 3D lidar view to Bird's-Eye-View (BEV) for feature fusion. However, inaccurate depth estimation (e.g. the 'depth jump' problem) remains an obstacle to developing LSS-based methods. To alleviate the 'depth jump' problem, we propose the Edge-Aware Bird's-Eye-View (EA-BEV) projector. By coupling the proposed edge-aware depth fusion module and depth estimation module, the EA-BEV projector alleviates the problem and enforces refined supervision on depth. Besides, we propose sparse depth supervision and gradient-edge depth supervision to constrain learning of global depth and local marginal depth information. Our EA-BEV projector is a plug-and-play module for any LSS-based 3D object detection model, and it effectively improves baseline performance. We demonstrate its effectiveness on nuScenes: on the validation set, the proposed EA-BEV projector boosts several state-of-the-art LSS-based baselines on the nuScenes 3D object detection and BEV map segmentation benchmarks with a negligible increase in inference time.

CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-scene Interactions

  • Authors: Ming Yan, Xin Wang, Yudi Dai, Siqi Shen, Chenglu Wen, Lan Xu, Yuexin Ma, Cheng Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17948
  • Pdf link: https://arxiv.org/pdf/2303.17948
  • Abstract
    Motion capture is a long-standing research problem. Although it has been studied for decades, the majority of research focuses on ground-based movements such as walking, sitting, and dancing. Off-ground actions such as climbing are largely overlooked. As an important type of action in sports and firefighting, climbing movements are challenging to capture because of their complex back poses, intricate human-scene interactions, and difficult global localization. The research community lacks an in-depth understanding of the climbing action due to the lack of specific datasets. To address this limitation, we collect CIMI4D, a large rock ClImbing MotIon dataset from 12 persons climbing 13 different climbing walls. The dataset consists of around 180,000 frames of pose inertial measurements, LiDAR point clouds, RGB videos, high-precision static point cloud scenes, and reconstructed scene meshes. Moreover, we annotate frame-wise which rock holds are touched, to facilitate a detailed exploration of human-scene interaction. The core of this dataset is a blending optimization process, which corrects the pose as it drifts and is affected by magnetic conditions. To evaluate the merit of CIMI4D, we perform four tasks, including human pose estimation (with/without scene constraints), pose prediction, and pose generation. The experimental results demonstrate that CIMI4D presents great challenges to existing methods and enables extensive research opportunities. We share the dataset with the research community at this http URL

Keyword: diffusion

CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition

  • Authors: Tianrui Guan, Aswath Muthuselvam, Montana Hoover, Xijun Wang, Jing Liang, Adarsh Jagan Sathyamoorthy, Damon Conover, Dinesh Manocha
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17778
  • Pdf link: https://arxiv.org/pdf/2303.17778
  • Abstract
    We present CrossLoc3D, a novel 3D place recognition method that solves a large-scale point matching problem in a cross-source setting. Cross-source point cloud data corresponds to point sets captured by depth sensors with different accuracies or from different distances and perspectives. We address the challenges of developing 3D place recognition methods that account for the representation gap between points captured by different sources. Our method handles cross-source data by utilizing multi-grained features and selecting convolution kernel sizes that correspond to the most prominent features. Inspired by diffusion models, our method uses a novel iterative refinement process that gradually shifts the embedding spaces from different sources to a single canonical space for better metric learning. In addition, we present CS-Campus3D, the first 3D aerial-ground cross-source dataset consisting of point cloud data from both aerial and ground LiDAR scans. The point clouds in CS-Campus3D have representation gaps and other features like different views, point densities, and noise patterns. We show that our CrossLoc3D algorithm can achieve an improvement of 4.74% - 15.37% in terms of top-1 average recall on our CS-Campus3D benchmark and achieves performance comparable to state-of-the-art 3D place recognition methods on the Oxford RobotCar dataset. We will release the code and the CS-Campus3D benchmark.

GlyphDraw: Learning to Draw Chinese Characters in Image Synthesis Models Coherently

  • Authors: Jian Ma, Mingjun Zhao, Chen Chen, Ruichen Wang, Di Niu, Haonan Lu, Xiaodong Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17870
  • Pdf link: https://arxiv.org/pdf/2303.17870
  • Abstract
    Recent breakthroughs in the field of language-guided image generation have yielded impressive achievements, enabling the creation of high-quality and diverse images based on user instructions. Although the synthesis performance is fascinating, one significant limitation of current image generation models is their insufficient ability to generate coherent text within images, particularly for complex glyph structures like Chinese characters. To address this problem, we introduce GlyphDraw, a general learning framework aiming at endowing image generation models with the capacity to generate images embedded with coherent text. To the best of our knowledge, this is the first work in the field of image synthesis to address the generation of Chinese characters. We first adopt OCR techniques to collect images containing Chinese characters as training samples and extract the text and its locations as auxiliary information, carefully designing the construction strategy for the image-text dataset. We then build our model specifically on a diffusion-based image generator and carefully modify the network structure to allow the model to learn drawing Chinese characters with the help of glyph and position information. Furthermore, we maintain the model's open-domain image synthesis capability by preventing catastrophic forgetting through a variety of training techniques. Extensive qualitative and quantitative experiments demonstrate that our method not only produces accurate Chinese characters as specified in prompts, but also naturally blends the generated text into the background. Please refer to https://1073521013.github.io/glyph-draw.github.io

3D-aware Image Generation using 2D Diffusion Models

  • Authors: Jianfeng Xiang, Jiaolong Yang, Binbin Huang, Xin Tong
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17905
  • Pdf link: https://arxiv.org/pdf/2303.17905
  • Abstract
    In this paper, we introduce a novel 3D-aware image generation method that leverages 2D diffusion models. We formulate the 3D-aware image generation task as multiview 2D image set generation, and further as a sequential unconditional-conditional multiview image generation process. This allows us to utilize 2D diffusion models to boost the generative modeling power of the method. Additionally, we incorporate depth information from monocular depth estimators to construct the training data for the conditional diffusion model using only still images. We train our method on a large-scale dataset, i.e., ImageNet, which is not addressed by previous methods. It produces high-quality images that significantly outperform prior methods. Furthermore, our approach showcases its capability to generate instances with large view angles, even though the training images are diverse and unaligned, gathered from "in-the-wild" real-world environments.

Pay Attention: Accuracy Versus Interpretability Trade-off in Fine-tuned Diffusion Models

  • Authors: Mischa Dombrowski, Hadrien Reynaud, Johanna P. Müller, Matthew Baugh, Bernhard Kainz
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17908
  • Pdf link: https://arxiv.org/pdf/2303.17908
  • Abstract
    The recent progress of diffusion models in terms of image quality has led to a major shift in research related to generative models. Current approaches often fine-tune pre-trained foundation models using domain-specific text-to-image pairs. This approach is straightforward for X-ray image generation due to the high availability of radiology reports linked to specific images. However, current approaches hardly ever look at attention layers to verify whether the models understand what they are generating. In this paper, we discover an important trade-off between image fidelity and interpretability in generative diffusion models. In particular, we show that fine-tuning text-to-image models with a learnable text encoder leads to a lack of interpretability of diffusion models. Finally, we demonstrate the interpretability of diffusion models by showing that keeping the language encoder frozen enables diffusion models to achieve state-of-the-art phrase grounding performance on certain diseases for a challenging multi-label segmentation task, without any additional training. Code and models will be available at https://github.com/MischaD/chest-distillation

IC-FPS: Instance-Centroid Faster Point Sampling Module for 3D Point-base Object Detection

  • Authors: Hu Haotian, Wang Fanyi, Su Jingwen, Gao Shiyu, Zhang Zhiwang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17921
  • Pdf link: https://arxiv.org/pdf/2303.17921
  • Abstract
    3D object detection is one of the most important tasks in autonomous driving and robotics. Our research focuses on tackling the low efficiency of point-based methods on large-scale point clouds. Existing point-based methods adopt a farthest point sampling (FPS) strategy for downsampling, which is computationally expensive in terms of inference time and memory consumption as the number of points increases. In order to improve efficiency, we propose a novel Instance-Centroid Faster Point Sampling Module (IC-FPS), which effectively replaces the first Set Abstraction (SA) layer, which is extremely time-consuming. The IC-FPS module comprises two methods: a local feature diffusion based background point filter (LFDBF) and a Centroid-Instance Sampling Strategy (CISS). LFDBF is constructed to exclude most invalid background points, while CISS replaces the FPS strategy by quickly sampling centroids and instance points. The IC-FPS module can be inserted into almost every point-based model. Extensive experiments on multiple public benchmarks demonstrate the superiority of IC-FPS. On the Waymo dataset, the proposed module significantly improves the performance of the baseline model and accelerates inference by 3.8 times. For the first time, real-time detection with point-based models in large-scale point cloud scenarios is realized.
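
For context, the farthest point sampling baseline that IC-FPS replaces looks like the sketch below; its cost grows with both the number of input points N and the number of samples M, which is exactly the bottleneck described above. This is the textbook FPS algorithm, not code from the paper.

```python
# Vanilla farthest point sampling (FPS): the O(N * M) baseline whose cost
# motivates faster samplers such as IC-FPS. NumPy sketch on toy data.
import numpy as np

def farthest_point_sampling(points: np.ndarray, m: int) -> np.ndarray:
    n = points.shape[0]
    chosen = np.zeros(m, dtype=int)   # first sample is point 0
    min_d = np.full(n, np.inf)
    for i in range(1, m):
        # Refresh each point's distance to the nearest chosen point: O(N) per pick.
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        min_d = np.minimum(min_d, d)
        chosen[i] = int(np.argmax(min_d))
    return chosen

pts = np.random.rand(20_000, 3)
idx = farthest_point_sampling(pts, 1024)  # cost grows with both N and M
print(idx[:5])
```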

Diffusion Action Segmentation

  • Authors: Daochang Liu, Qiyue Li, AnhDung Dinh, Tingting Jiang, Mubarak Shah, Chang Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2303.17959
  • Pdf link: https://arxiv.org/pdf/2303.17959
  • Abstract
    Temporal action segmentation is crucial for understanding long-form videos. Previous works on this task commonly adopt an iterative refinement paradigm by using multi-stage models. Our paper proposes an essentially different framework via denoising diffusion models, which nonetheless shares the inherent spirit of such iterative refinement. In this framework, action predictions are progressively generated from random noise with input video features as conditions. To enhance the modeling of three striking characteristics of human actions, namely the position prior, the boundary ambiguity, and the relational dependency, we devise a unified masking strategy for the conditioning inputs in our framework. Extensive experiments on three benchmark datasets, i.e., GTEA, 50Salads, and Breakfast, are performed, and the proposed method achieves superior or comparable results to state-of-the-art methods, showing the effectiveness of a generative approach for action segmentation. Our code will be made available.

HD-GCN:A Hybrid Diffusion Graph Convolutional Network

  • Authors: Zhi Yang, Kang Li, Haitao Gan, Zhongwei Huang, Ming Shi
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17966
  • Pdf link: https://arxiv.org/pdf/2303.17966
  • Abstract
    The information diffusion capability of GCN and its variant models is limited by the adjacency matrix, which can lower their performance. Therefore, we introduce a new framework for graph convolutional networks called Hybrid Diffusion-based Graph Convolutional Network (HD-GCN) to address the limitations on information diffusion caused by the adjacency matrix. In the HD-GCN framework, we first utilize diffusion maps to facilitate the diffusion of information among nodes that are adjacent to each other in the feature space. This allows information to spread between similar points that may not have an adjacency relationship. Next, we utilize graph convolution to further propagate information among adjacent nodes after the diffusion maps, thereby enabling the spread of information among similar nodes that are adjacent in the graph. Finally, we employ the diffusion distances obtained through diffusion maps to regularize and constrain the predicted labels of training nodes. This regularization is applied to HD-GCN training, resulting in a smoother classification surface. The proposed model effectively overcomes the limitation of information diffusion being imposed only by the adjacency matrix. HD-GCN utilizes hybrid diffusion, combining information diffusion between neighboring nodes in the feature space and adjacent nodes in the adjacency matrix. This allows for more comprehensive information propagation among nodes, resulting in improved model performance. We evaluated the performance of HD-GCN on three well-known citation network datasets, and the results show that the proposed framework is more effective than several graph-based semi-supervised learning methods.
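
The "hybrid diffusion" idea, propagating information both through a feature-space diffusion operator and through the graph adjacency, can be sketched in a few lines. The kernel choice, normalization, and composition order below are our illustrative assumptions, not the exact HD-GCN layer.

```python
# Toy hybrid diffusion: a row-normalized Gaussian kernel diffuses features
# between similar nodes, then the adjacency propagates between graph
# neighbors. Illustrative assumptions, not the exact HD-GCN layer.
import numpy as np

def feature_diffusion_operator(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    return K / K.sum(axis=1, keepdims=True)  # row-stochastic diffusion matrix

def hybrid_propagate(X: np.ndarray, A: np.ndarray) -> np.ndarray:
    P = feature_diffusion_operator(X)                     # feature-space step
    A_hat = A / np.maximum(A.sum(1, keepdims=True), 1.0)  # graph step
    return A_hat @ (P @ X)

X = np.random.rand(6, 4)                       # 6 nodes, 4 features
A = (np.random.rand(6, 6) > 0.5).astype(float)
print(hybrid_propagate(X, A).shape)            # (6, 4)
```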

One-shot Unsupervised Domain Adaptation with Personalized Diffusion Models

  • Authors: Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.18080
  • Pdf link: https://arxiv.org/pdf/2303.18080
  • Abstract
    Adapting a segmentation model from a labeled source domain to a target domain, where a single unlabeled datum is available, is one of the most challenging problems in domain adaptation and is otherwise known as one-shot unsupervised domain adaptation (OSUDA). Most of the prior works have addressed the problem by relying on style transfer techniques, where the source images are stylized to have the appearance of the target domain. Departing from the common notion of transferring only the target "texture" information, we leverage text-to-image diffusion models (e.g., Stable Diffusion) to generate a synthetic target dataset with photo-realistic images that not only faithfully depict the style of the target domain, but are also characterized by novel scenes in diverse contexts. The text interface of our method, Data AugmenTation with diffUsion Models (DATUM), endows us with the possibility of guiding the generation of images towards desired semantic concepts while respecting the original spatial context of a single training image, which is not possible in existing OSUDA methods. Extensive experiments on standard benchmarks show that our DATUM surpasses the state-of-the-art OSUDA methods by up to +7.1%. The implementation is available at https://github.com/yasserben/DATUM

A Closer Look at Parameter-Efficient Tuning in Diffusion Models

  • Authors: Chendong Xiang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.18181
  • Pdf link: https://arxiv.org/pdf/2303.18181
  • Abstract
    Large-scale diffusion models like Stable Diffusion are powerful and find various real-world applications, while customizing such models by fine-tuning is both memory and time inefficient. Motivated by recent progress in natural language processing, we investigate parameter-efficient tuning in large diffusion models by inserting small learnable modules (termed adapters). In particular, we decompose the design space of adapters into orthogonal factors -- the input position, the output position, and the function form -- and perform Analysis of Variance (ANOVA), a classical statistical approach for analyzing the correlation between discrete variables (design options) and continuous variables (evaluation metrics). Our analysis suggests that the input position of adapters is the critical factor influencing the performance of downstream tasks. We then carefully study the choice of the input position and find that placing the input position after the cross-attention block leads to the best performance, validated by additional visualization analyses. Finally, we provide a recipe for parameter-efficient tuning in diffusion models, which is comparable, if not superior, to the fully fine-tuned baseline (e.g., DreamBooth) with only 0.75% extra parameters, across various customized tasks.
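
A minimal bottleneck adapter of the kind studied here is sketched below; the zero-initialized up-projection makes the module start as an identity, and feeding it the hidden states after the cross-attention block follows the paper's reported best input position. The bottleneck width and initialization are our assumptions, not the paper's exact design.

```python
# Minimal bottleneck adapter; input taken after cross-attention per the
# paper's finding. Width and init are illustrative assumptions.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)  # start as identity (zero residual)
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(self.act(self.down(h)))

h = torch.randn(2, 77, 768)   # hidden states after a cross-attention block
print(Adapter(768)(h).shape)  # torch.Size([2, 77, 768])
```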

$\infty$-Diff: Infinite Resolution Diffusion with Subsampled Mollified States

  • Authors: Sam Bond-Taylor, Chris G. Willcocks
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.18242
  • Pdf link: https://arxiv.org/pdf/2303.18242
  • Abstract
    We introduce $\infty$-Diff, a generative diffusion model which directly operates on infinite resolution data. By randomly sampling subsets of coordinates during training and learning to denoise the content at those coordinates, a continuous function is learned that allows sampling at arbitrary resolutions. In contrast to other recent infinite resolution generative models, our approach operates directly on the raw data, requiring neither latent vector compression for context, nor hypernetworks, nor discrete components. As such, our approach achieves significantly higher sample quality, as evidenced by lower FID scores, as well as being able to effectively scale to higher resolutions than the training data while retaining detail.

Keyword: dynamic

Data-driven abstractions via adaptive refinements and a Kantorovich metric [extended version]

  • Authors: Adrien Banse, Licio Romao, Alessandro Abate, Raphaël M. Jungers
  • Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.17618
  • Pdf link: https://arxiv.org/pdf/2303.17618
  • Abstract
    We introduce an adaptive refinement procedure for smart and scalable abstraction of dynamical systems. Our technique relies on partitioning the state space depending on the observation of future outputs. However, this knowledge is dynamically constructed in an adaptive, asymmetric way. In order to learn the optimal structure, we define a Kantorovich-inspired metric between Markov chains and use it as a loss function. Our technique is well suited to data-driven frameworks, but not restricted to them. We also study properties of the above-mentioned metric between Markov chains, which we believe could be of wider application. We propose an algorithm to approximate it, and we show that our method yields a much better computational complexity than using classical linear programming techniques.
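
The classical linear-programming route the authors benchmark against can be sketched directly: the Kantorovich (optimal transport) distance between two discrete distributions is a small LP over transport plans. The sketch below is the generic LP formulation, not the paper's metric between Markov chains or its approximation algorithm.

```python
# Generic Kantorovich / optimal-transport distance between two discrete
# distributions via linear programming (the costly baseline the paper's
# approximation avoids). Illustrative, not the paper's algorithm.
import numpy as np
from scipy.optimize import linprog

def kantorovich_lp(mu, nu, cost):
    n, m = len(mu), len(nu)
    c = cost.flatten()                     # transport plan flattened row-major
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):                     # rows of the plan sum to mu
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):                     # columns of the plan sum to nu
        A_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([mu, nu])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

mu = np.array([0.5, 0.5])
nu = np.array([0.25, 0.75])
cost = np.array([[0.0, 1.0], [1.0, 0.0]])
print(kantorovich_lp(mu, nu, cost))  # 0.25
```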

Data-Driven Covariance Steering Control Design

  • Authors: Joshua Pilipovsky, Panagiotis Tsiotras
  • Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2303.17675
  • Pdf link: https://arxiv.org/pdf/2303.17675
  • Abstract
    This paper studies the problem of steering the distribution of a linear time-invariant system from an initial normal distribution to a terminal normal distribution under no knowledge of the system dynamics. This data-driven control framework uses data collected from the input and the state and utilizes the seminal work by Willems et al. to construct a data-based parametrization of the mean and the covariance control problems. These problems are then solved to optimality as convex programs using standard techniques from the covariance control literature. We also discuss the equivalence of indirect and direct data-driven covariance steering designs, as well as a regularized version of the problem that provides a balance between the two. We illustrate the proposed framework through a set of randomized trials on a double integrator system and show that the results match up almost exactly with the corresponding model-based method in the noiseless case. We then analyze the robustness properties of the data-free and data-driven covariance steering methods and demonstrate the trade-offs between performance and optimality among these methods in the presence of data corrupted with exogenous noise.
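
The data-based parametrization referenced here (Willems et al.'s fundamental lemma) rests on stacking recorded input/state data into Hankel matrices whose columns span all system trajectories of a given length. The sketch below illustrates that rank structure on a toy linear system; the system matrices and window length are our own choices, and this is not the paper's steering design.

```python
# Sketch of the Willems-style data parametrization: Hankel matrices built
# from one input/state experiment; every length-L trajectory lies in the
# span of their columns. Toy system; not the paper's design.
import numpy as np

def hankel(signal: np.ndarray, L: int) -> np.ndarray:
    T, dim = signal.shape
    cols = [signal[i:i + L].reshape(-1) for i in range(T - L + 1)]
    return np.stack(cols, axis=1)  # shape (L*dim, T-L+1)

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
u = rng.standard_normal((50, 1))   # persistently exciting input
x = np.zeros((50, 2))
for t in range(49):
    x[t + 1] = A @ x[t] + B @ u[t]

L = 5
H = np.vstack([hankel(u, L), hankel(x, L)])
# Generic rank is L*m + n = 5*1 + 2 = 7: trajectories fill a 7-dim subspace.
print(H.shape, np.linalg.matrix_rank(H))
```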

Implementation and (Inverse Modified) Error Analysis for implicitly-templated ODE-nets

  • Authors: Aiqing Zhu, Tom Bertalan, Beibei Zhu, Yifa Tang, Ioannis G. Kevrekidis
  • Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17824
  • Pdf link: https://arxiv.org/pdf/2303.17824
  • Abstract
    We focus on learning hidden dynamics from data using ODE-nets templated on implicit numerical initial value problem solvers. First, we perform Inverse Modified error analysis of the ODE-nets using unrolled implicit schemes for ease of interpretation. It is shown that training an ODE-net using an unrolled implicit scheme returns a close approximation of an Inverse Modified Differential Equation (IMDE). In addition, we establish a theoretical basis for hyper-parameter selection when training such ODE-nets, whereas current strategies usually treat numerical integration of ODE-nets as a black box. We thus formulate an adaptive algorithm which monitors the level of error and adapts the number of (unrolled) implicit solution iterations during the training process, so that the error of the unrolled approximation is less than the current learning loss. This helps accelerate training, while maintaining accuracy. Several numerical experiments are performed to demonstrate the advantages of the proposed algorithm compared to nonadaptive unrollings, and validate the theoretical analysis. We also note that this approach naturally allows for incorporating partially known physical terms in the equations, giving rise to what is termed "gray box" identification.
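
The "unrolled implicit scheme" can be pictured as below: an implicit (backward Euler) step solved by a fixed number K of fixed-point iterations, with K being exactly the quantity the paper's adaptive algorithm tunes against the current learning loss. The network, step size, and K in this sketch are illustrative placeholders.

```python
# ODE-net step templated on backward Euler, unrolled with K fixed-point
# iterations (the quantity adapted during training in the paper). The
# vector field, step size, and K here are illustrative.
import torch
import torch.nn as nn

class UnrolledImplicitEuler(nn.Module):
    def __init__(self, f: nn.Module, h: float, k: int):
        super().__init__()
        self.f, self.h, self.k = f, h, k

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # Approximate the implicit equation y_next = y + h * f(y_next)
        # by K unrolled fixed-point iterations (fully differentiable).
        y_next = y
        for _ in range(self.k):
            y_next = y + self.h * self.f(y_next)
        return y_next

f = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))
step = UnrolledImplicitEuler(f, h=0.05, k=5)
print(step(torch.randn(8, 2)).shape)  # one implicit step for a batch of states
```

In the adaptive scheme described above, K would then be raised or lowered during training so that the unrolling error stays below the current learning loss.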

MapFormer: Boosting Change Detection by Using Pre-change Information

  • Authors: Maximilian Bernhard, Niklas Strauß, Matthias Schubert
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17859
  • Pdf link: https://arxiv.org/pdf/2303.17859
  • Abstract
    Change detection in remote sensing imagery is essential for a variety of applications such as urban planning, disaster management, and climate research. However, existing methods for identifying semantically changed areas overlook the availability of semantic information in the form of existing maps describing features of the Earth's surface. In this paper, we leverage this information for change detection in bi-temporal images. We show that the simple integration of the additional information via concatenation of latent representations suffices to significantly outperform state-of-the-art change detection methods. Motivated by this observation, we propose the new task of Conditional Change Detection, where pre-change semantic information is used as input next to bi-temporal images. To fully exploit the extra information, we propose MapFormer, a novel architecture based on a multi-modal feature fusion module that allows for feature processing conditioned on the available semantic information. We further employ a supervised, cross-modal contrastive loss to guide the learning of visual representations. Our approach outperforms existing change detection methods by an absolute 11.7% and 18.4% in terms of binary change IoU on DynamicEarthNet and HRSCD, respectively. Furthermore, we demonstrate the robustness of our approach to the quality of the pre-change semantic information and to the absence of pre-change imagery. The code will be made publicly available.

The Blockchain Imitation Game

  • Authors: Kaihua Qin, Stefanos Chaliasos, Liyi Zhou, Benjamin Livshits, Dawn Song, Arthur Gervais
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2303.17877
  • Pdf link: https://arxiv.org/pdf/2303.17877
  • Abstract
    The use of blockchains for automated and adversarial trading has become commonplace. However, due to the transparent nature of blockchains, an adversary is able to observe any pending, not-yet-mined transactions, along with their execution logic. This transparency further enables a new type of adversary, which copies and front-runs profitable pending transactions in real-time, yielding significant financial gains. Shedding light on such "copy-paste" malpractice, this paper introduces the Blockchain Imitation Game and proposes a generalized imitation attack methodology called Ape. Leveraging dynamic program analysis techniques, Ape supports the automatic synthesis of adversarial smart contracts. Over a timeframe of one year (1st of August, 2021 to 31st of July, 2022), Ape could have yielded 148.96M USD in profit on Ethereum, and 42.70M USD on BNB Smart Chain (BSC). Not only as a malicious attack, we further show the potential of transaction and contract imitation as a defensive strategy. Within one year, we find that Ape could have successfully imitated 13 and 22 known Decentralized Finance (DeFi) attacks on Ethereum and BSC, respectively. Our findings suggest that blockchain validators can imitate attacks in real-time to prevent intrusions in DeFi.

Information-Theoretic Study of Time-Domain Energy-Saving Techniques in Radio Access

  • Authors: François Rottenberg
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2303.17898
  • Pdf link: https://arxiv.org/pdf/2303.17898
  • Abstract
    Reduction of wireless network energy consumption is becoming increasingly important to reduce environmental footprint and operational costs. A key concept for achieving this is the use of lean transmission techniques that dynamically (de)activate hardware resources as a function of the load. In this paper, we propose a pioneering information-theoretic study of time-domain energy-saving techniques, relying on a practical hardware power consumption model of sleep and active modes. By minimizing the power consumption under a quality of service constraint (rate, latency), we propose simple yet powerful techniques to allocate power and choose which resources to activate or to put in sleep mode. Power consumption scaling regimes are identified. We show that a "rush-to-sleep" approach (maximal power in fewest symbols followed by sleep) is only optimal in a high-noise regime. We then show how consumption can be made linear with the load, achieving massive energy reduction (a factor of 10) at low-to-medium load. The trade-off between energy efficiency (EE) and spectral efficiency (SE) is also characterized, followed by a multi-user study based on time division multiple access (TDMA).
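
The time-domain trade-off described here, transmit power that grows exponentially with per-symbol rate versus fixed active-circuit power, can be reproduced with a toy energy model. The Shannon-style power term and all constants below are illustrative assumptions, not the paper's consumption model.

```python
# Toy time-domain model: deliver B bits over S symbol slots using n active
# slots. Active slots pay Shannon-style transmit power plus circuit power;
# the remaining slots sleep. All constants are illustrative assumptions.
def energy(n, B=20, S=20, N0=0.05, P_on=1.0, P_sleep=0.05):
    transmit = n * N0 * (2 ** (B / n) - 1)   # exponential in per-slot rate B/n
    return transmit + n * P_on + (S - n) * P_sleep

for n in (2, 5, 10, 20):
    print(n, round(energy(n), 3))
# Few slots -> transmit energy explodes; all slots -> circuit energy dominates.
# The optimum sits in between and shifts with the noise level N0.
```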

How accurate does Newton have to be?

  • Authors: Carl Christian Kjelgaard Mikkelsen, Lorién López-Villellas, Pablo García-Risueño
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.17911
  • Pdf link: https://arxiv.org/pdf/2303.17911
  • Abstract
    We analyze the convergence of quasi-Newton methods in exact and finite precision arithmetic. In particular, we derive an upper bound for the stagnation level and we show that any sufficiently exact quasi-Newton method will converge quadratically until stagnation. In the absence of sufficient accuracy, we are likely to retain rapid linear convergence. We confirm our analysis by computing square roots and solving bond constraint equations in the context of molecular dynamics. We briefly discuss implications for parallel solvers.
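
The paper's headline behavior, quadratic convergence until stagnation at the level of the residual error, is easy to reproduce on the square-root example with a deliberately perturbed residual. This toy sketch is our own illustration of the phenomenon, not the paper's analysis or code.

```python
# Toy illustration: Newton's method for sqrt(a) with a noisy residual
# converges quadratically, then stagnates near the injected error level.
import random

def inexact_newton_sqrt(a: float, eps: float, iters: int = 10):
    x, errors = a, []
    for _ in range(iters):
        g = x * x - a + eps * random.uniform(-1, 1)  # inexact residual
        x = x - g / (2.0 * x)                        # Newton update
        errors.append(abs(x - a ** 0.5))
    return errors

random.seed(0)
for e in inexact_newton_sqrt(2.0, eps=1e-10):
    print(f"{e:.2e}")  # rapid decrease, then a floor near eps / (2*sqrt(2))
```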

Learning-Based Optimal Control with Performance Guarantees for Unknown Systems with Latent States

  • Authors: Robert Lefringhausen, Supitsana Srithasan, Armin Lederer, Sandra Hirche
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2303.17963
  • Pdf link: https://arxiv.org/pdf/2303.17963
  • Abstract
    As control engineering methods are applied to increasingly complex systems, data-driven approaches for system identification appear as a promising alternative to physics-based modeling. While many of these approaches rely on the availability of state measurements, the states of a complex system are often not directly measurable. It may then be necessary to jointly estimate the dynamics and a latent state, making it considerably more challenging to design controllers with performance guarantees. This paper proposes a novel method for the computation of an optimal input trajectory for unknown nonlinear systems with latent states. Probabilistic performance guarantees are derived for the resulting input trajectory, and an approach to validate the performance of arbitrary control laws is presented. The effectiveness of the proposed method is demonstrated in a numerical simulation.

VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization

  • Authors: Bingfan Zhu, Yanchao Yang, Xulong Wang, Youyi Zheng, Leonidas Guibas
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17968
  • Pdf link: https://arxiv.org/pdf/2303.17968
  • Abstract
    We propose VDN-NeRF, a method to train neural radiance fields (NeRFs) for better geometry under non-Lambertian surfaces and dynamic lighting conditions that cause significant variation in the radiance of a point when viewed from different angles. Instead of explicitly modeling the underlying factors that result in the view-dependent phenomenon, which could be complex yet still not exhaustive, we develop a simple and effective technique that normalizes the view-dependence by distilling invariant information already encoded in the learned NeRFs. We then jointly train NeRFs for view synthesis with view-dependence normalization to attain quality geometry. Our experiments show that even though shape-radiance ambiguity is inevitable, the proposed normalization can minimize its effect on geometry, which essentially aligns the optimal capacity needed for explaining view-dependent variations. Our method applies to various baselines and significantly improves geometry without changing the volume rendering pipeline, even if the data is captured under a moving light source. Code is available at: https://github.com/BoifZ/VDN-NeRF

Upside down: affordable high-performance motion platform

  • Authors: Nayan Man Singh Pradhan, Patrick Frank, An Mo, Alexander Badri-Spröwitz
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17974
  • Pdf link: https://arxiv.org/pdf/2303.17974
  • Abstract
    Parallel robots are capable of high-speed manipulation and have become essential tools in industry. The proximal placement of their motors and the low weight of their end effectors make them ideal for generating highly dynamic motion. Therefore, parallel robots can be adopted for motion platform designs, as long as end-effector loads are low. Traditional motion platforms can be large and powerful enough to generate multi-g acceleration, but such designs tend to be expensive and bulky. Smaller motion platforms, in contrast, feature a small work range with reduced degrees of freedom (DoFs) and a limited payload. Here we seek a medium-sized, affordable parallel robot capable of powerful and high-speed 6-DoF motion in a comparably large workspace. This work explores the concept of a quadruped robot flipped upside-down, with the motion platform fixed between its feet. In particular, we exploit the high-power dynamic brushless actuation and the four-leg redundancy when moving the motion platform. We characterize the resulting motion platform by tracking sinusoidal and circular trajectories with varying loads. Dynamic motions in 6 DoFs up to 10 Hz and ~10 mm amplitude are possible when moving a mass of 300 grams. We demonstrate single-axis end-effector translations up to ~20 mm at 10 Hz for higher loads of 1.2 kg. The motion platform can be replicated easily with 3D printing and off-the-shelf components. All motion-platform-related hardware and the custom-written software required to replicate it are open-source.

Models as Agents: Optimizing Multi-Step Predictions of Interactive Local Models in Model-Based Multi-Agent Reinforcement Learning

  • Authors: Zifan Wu, Chao Yu, Chen Chen, Jianye Hao, Hankz Hankui Zhuo
  • Subjects: Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2303.17984
  • Pdf link: https://arxiv.org/pdf/2303.17984
  • Abstract
    Research in model-based reinforcement learning has made significant progress in recent years. Compared to single-agent settings, the exponential dimension growth of the joint state-action space in multi-agent systems dramatically increases the complexity of the environment dynamics, which makes it infeasible to learn an accurate global model and thus necessitates the use of agent-wise local models. However, during multi-step model rollouts, the prediction of one local model can affect the predictions of other local models in the next step. As a result, local prediction errors can propagate to other localities and eventually give rise to considerable global errors. Furthermore, since the models are generally used to predict multiple steps ahead, simply minimizing one-step prediction errors regardless of their long-term effect on other models may further aggravate the propagation of local errors. To this end, we propose Models as AGents (MAG), a multi-agent model optimization framework that, in reverse, treats the local models as multi-step decision-making agents and the current policies as the dynamics during the model rollout process. In this way, the local models are able to consider their multi-step mutual influence before making predictions. Theoretically, we show that the objective of MAG is approximately equivalent to maximizing a lower bound of the true environment return. Experiments on the challenging StarCraft II benchmark demonstrate the effectiveness of MAG.

Neural Network Entropy (NNetEn): EEG Signals and Chaotic Time Series Separation by Entropy Features, Python Package for NNetEn Calculation

  • Authors: Andrei Velichko, Maksim Belyaev, Yuriy Izotov, Murugappan Murugappan, Hanif Heidari
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Chaotic Dynamics (nlin.CD)
  • Arxiv link: https://arxiv.org/abs/2303.17995
  • Pdf link: https://arxiv.org/pdf/2303.17995
  • Abstract
    Entropy measures are effective features for time series classification problems. Traditional entropy measures, such as Shannon entropy, rely on the probability distribution function. However, for the effective separation of time series, new entropy estimation methods are required to characterize the chaotic dynamics of the system. Our concept of Neural Network Entropy (NNetEn) is based on the classification of special datasets (MNIST-10 and SARS-CoV-2-RBV1) in relation to the entropy of the time series recorded in the reservoir of the LogNNet neural network. NNetEn estimates the chaotic dynamics of time series in an original way. Based on the NNetEn algorithm, we propose two new classification metrics: R2 Efficiency and Pearson Efficiency. The efficiency of NNetEn is verified on the separation of two chaotic time series of the sine map using analysis of variance (ANOVA). For two close dynamic time series (r = 1.1918 and r = 1.2243), the F-ratio reaches a value of 124, reflecting the high efficiency of the introduced method in classification problems. The classification of EEG signals from healthy persons and patients with Alzheimer's disease illustrates the practical application of the NNetEn features. Our computations demonstrate a synergistic effect: classification accuracy increases when traditional entropy measures and the NNetEn concept are applied conjointly. An implementation of the algorithms in Python is presented.

A flatness-based saturated controller design for a quadcopter with experimental validation

  • Authors: Huu-Thinh Do, Franco Blanchini, Ionela Prodan
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.18021
  • Pdf link: https://arxiv.org/pdf/2303.18021
  • Abstract
    Using the properties of differential flatness, a controllable system, such as a quadcopter model, may be transformed into an equivalent linear system via a coordinate change and an input mapping. This is a straightforward advantage for the quadcopter's controller design and its real-time implementation. However, one significant hindrance is that, while the dynamics become linear in the new coordinates (the flat output space), the input constraints become convoluted. This paper presents an explicit pre-stabilization-based control scheme which handles the input constraints for the quadcopter in the flat output space with a saturation component. The system's stability is shown to hold by Lyapunov-stability arguments. Moreover, the practical viability of the proposed method is validated both in simulation and in experiments on a nano-drone platform. Hence, the flatness-based saturated controller not only ensures stability and constraint satisfaction, but also requires very low computational effort, allowing for embedded implementations.

Deep neural operator for learning transient response of interpenetrating phase composites subject to dynamic loading

  • Authors: Minglei Lu, Ali Mohammadi, Zhaoxu Meng, Xuhui Meng, Gang Li, Zhen Li
  • Subjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci)
  • Arxiv link: https://arxiv.org/abs/2303.18055
  • Pdf link: https://arxiv.org/pdf/2303.18055
  • Abstract
    Additive manufacturing has been recognized as an industrial technological revolution for manufacturing, which allows fabrication of materials with complex three-dimensional (3D) structures directly from computer-aided design models. The mechanical properties of interpenetrating phase composites (IPCs), especially their response to dynamic loading, highly depend on their 3D structures. In general, for each specified structural design, it could take hours or days to perform either finite element analysis (FEA) or experiments to test the mechanical response of IPCs to a given dynamic load. To accelerate the physics-based prediction of mechanical properties of IPCs for various structural designs, we employ a deep neural operator (DNO) to learn the transient response of IPCs under dynamic loading as a surrogate for physics-based FEA models. We consider a 3D IPC beam formed by two metals with a Young's modulus ratio of 2.7, wherein random blocks of constituent materials are used to demonstrate the generality and robustness of the DNO model. To obtain FEA results of IPC properties, 5,000 random time-dependent strain loads generated by a Gaussian process kernel are applied to the 3D IPC beam, and the reaction forces and stress fields inside the IPC beam under various loadings are collected. Subsequently, the DNO model is trained using an incremental learning method with sequence-to-sequence training implemented in JAX, leading to a 100X speedup compared to widely used vanilla deep operator network models. After offline training, the DNO model can act as a surrogate for physics-based FEA to predict the transient mechanical response, in terms of reaction force and stress distribution, of the IPCs to various strain loads in one second at an accuracy of 98%. The learned operator is also able to provide extended predictions for the IPC beam subject to longer random strain loads with reasonably good accuracy.
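
The deep neural operator family referenced here follows the DeepONet pattern: a branch network encodes the sampled load history, a trunk network encodes the query coordinate, and their inner product gives the predicted response. The sketch below is a generic miniature of that pattern with made-up sizes, not the paper's trained DNO.

```python
# Miniature DeepONet-style operator: branch net encodes the sampled strain
# load, trunk net encodes the query time, dot product gives the response.
# Sizes and architecture are illustrative, not the paper's model.
import torch
import torch.nn as nn

class TinyDeepONet(nn.Module):
    def __init__(self, n_sensors: int = 100, width: int = 64):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(n_sensors, width), nn.Tanh(),
                                    nn.Linear(width, width))
        self.trunk = nn.Sequential(nn.Linear(1, width), nn.Tanh(),
                                   nn.Linear(width, width))

    def forward(self, load: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # load: (batch, n_sensors) sampled load; t: (batch, 1) query time
        return (self.branch(load) * self.trunk(t)).sum(-1, keepdim=True)

model = TinyDeepONet()
print(model(torch.randn(8, 100), torch.rand(8, 1)).shape)  # (8, 1) response
```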

Inferring networks from time series: a neural approach

  • Authors: Thomas Gaskin, Grigorios A. Pavliotis, Mark Girolami
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2303.18059
  • Pdf link: https://arxiv.org/pdf/2303.18059
  • Abstract
    Network structures underlie the dynamics of many complex phenomena, from gene regulation and food webs to power grids and social media. Yet, as they often cannot be observed directly, their connectivities must be inferred from observations of their emergent dynamics. In this work we present a powerful and fast computational method to infer large network adjacency matrices from time series data using a neural network. Using a neural network provides uncertainty quantification on the prediction in a manner that reflects both the non-convexity of the inference problem as well as the noise on the data. This is useful since network inference problems are typically underdetermined, and it is a feature that has hitherto been lacking from network inference methods. We demonstrate our method's capabilities by inferring line failure locations in the British power grid from observations of its response to a power cut. Since the problem is underdetermined, many classical statistical tools (e.g. regression) will not be straightforwardly applicable. Our method, in contrast, provides probability densities on each edge, allowing the use of hypothesis testing to make meaningful probabilistic statements about the location of the power cut. We also demonstrate our method's ability to learn an entire cost matrix for a non-linear model from a dataset of economic activity in Greater London. Our method outperforms OLS regression on noisy data in terms of both speed and prediction accuracy, and it scales as $N^2$, whereas OLS is cubic. Since our technique is not specifically engineered for network inference, it represents a general parameter estimation scheme that is applicable to any parameter dimension.

Dictionary-based Online-adaptive Structure-preserving Model Order Reduction for Parametric Hamiltonian Systems

  • Authors: Robin Herkert, Patrick Buchfink, Bernard Haasdonk
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.18072
  • Pdf link: https://arxiv.org/pdf/2303.18072
  • Abstract
    Classical model order reduction (MOR) for parametric problems may become computationally inefficient due to large sizes of the required projection bases, especially for problems with slowly decaying Kolmogorov n-widths. Additionally, Hamiltonian structure of dynamical systems may be available and should be preserved during the reduction. In the current presentation, we address these two aspects by proposing a corresponding dictionary-based, online-adaptive MOR approach. The method requires dictionaries for the state-variable, non-linearities and discrete empirical interpolation (DEIM) points. During the online simulation, local basis extensions/simplifications are performed in an online-efficient way, i.e. the runtime complexity of basis modifications and online simulation of the reduced models do not depend on the full state dimension. Experiments on a linear wave equation and a non-linear Sine-Gordon example demonstrate the efficiency of the approach.

Robust LSTM-based Vehicle Velocity Observer for Regular and Near-limits Applications

  • Authors: Agapius Bou Ghosn, Marcus Nolte, Philip Polack, Arnaud de La Fortelle, Markus Maurer
  • Subjects: Robotics (cs.RO); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2303.18094
  • Pdf link: https://arxiv.org/pdf/2303.18094
  • Abstract
    Accurate velocity estimation is key to vehicle control. While the literature describes how model-based and learning-based observers are able to estimate a vehicle's velocity in normal driving conditions, the challenge remains to estimate the velocity in near-limits maneuvers while using only conventional in-car sensors. In this paper, we introduce a novel neural network architecture based on Long Short-Term Memory (LSTM) networks to accurately estimate the vehicle's velocity in different driving conditions, including maneuvers at the limits of handling. The approach has been tested on real vehicle data and it provides more accurate estimations than state-of-the-art model-based and learning-based methods, for both regular and near-limits driving scenarios. Our approach is robust since the performance of the state-of-the-art observers deteriorates with higher dynamics, while our method adapts to different maneuvers, providing accurate estimations even at the vehicle's limits of handling.
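
A minimal version of an LSTM-based velocity observer, a window of standard in-car signals in, a velocity estimate out, is sketched below. The signal count, window length, and output head are our assumptions for illustration; the paper's architecture and training setup differ.

```python
# Minimal LSTM velocity observer: a sequence of conventional in-car signals
# (e.g. wheel speeds, IMU) maps to (v_x, v_y). Dimensions are illustrative
# assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class LSTMVelocityObserver(nn.Module):
    def __init__(self, n_signals: int = 7, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_signals, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # longitudinal and lateral velocity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # estimate at the last time step

window = torch.randn(16, 50, 7)              # batch of 50-step sensor windows
print(LSTMVelocityObserver()(window).shape)  # torch.Size([16, 2])
```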

Status Updating under Partial Battery Knowledge in Energy Harvesting IoT Networks

  • Authors: Mohammad Hatami, Markus Leinonen, Marian Codreanu
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.18104
  • Pdf link: https://arxiv.org/pdf/2303.18104
  • Abstract
    We study status updating under inexact knowledge of the battery levels of the energy harvesting sensors in an IoT network, where users make on-demand requests to a cache-enabled edge node to send updates about various random processes monitored by the sensors. To serve a request, the edge node either commands the corresponding sensor to send an update or uses the aged data from the cache. We find a control policy that minimizes the average on-demand AoI subject to per-slot energy harvesting constraints under partial battery knowledge at the edge node. Namely, the edge node is informed about sensors' battery levels only via received status updates, leading to uncertainty about the battery levels when making decisions. We model the problem as a POMDP, which is then reformulated as an equivalent belief-MDP. The belief-MDP in its original form is difficult to solve due to the infinite belief space. However, by exploiting a specific pattern in the evolution of beliefs, we truncate the belief space and develop a dynamic programming algorithm to obtain an optimal policy. Moreover, we address a multi-sensor setup under a transmission limitation, for which we develop an asymptotically optimal algorithm. Simulation results assess the performance of the proposed methods.

Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction

  • Authors: Delin Qu, Yizhen Lao, Zhigang Wang, Dong Wang, Bin Zhao, Xuelong Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.18125
  • Pdf link: https://arxiv.org/pdf/2303.18125
  • Abstract
    This paper addresses the problem of rolling shutter correction in complex nonlinear and dynamic scenes with extreme occlusion. Existing methods suffer from two main drawbacks. Firstly, they face challenges in estimating the accurate correction field due to the uniform velocity assumption, leading to significant image correction errors under complex motion. Secondly, the drastic occlusion in dynamic scenes prevents current solutions from achieving better image quality because of the inherent difficulties in aligning and aggregating multiple frames. To tackle these challenges, we model the curvilinear trajectory of pixels analytically and propose a geometry-based Quadratic Rolling Shutter (QRS) motion solver, which precisely estimates the high-order correction field of individual pixels. Besides, to reconstruct high-quality occlusion frames in dynamic scenes, we present a 3D video architecture that effectively Aligns and Aggregates multi-frame context, namely, RSA^2-Net. We evaluate our method across a broad range of cameras and video sequences, demonstrating its significant superiority. Specifically, our method surpasses the state of the art by +4.98, +0.77, and +4.33 dB in PSNR on the Carla-RS, Fastec-RS, and BS-RSC datasets, respectively.

Age of Incorrect Information With Hybrid ARQ Under a Resource Constraint for N-ary Symmetric Markov Sources

  • Authors: Konstantinos Bountrogiannis, Anthony Ephremides, Panagiotis Tsakalides, George Tzagkarakis
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2303.18128
  • Pdf link: https://arxiv.org/pdf/2303.18128
  • Abstract
    The Age of Incorrect Information (AoII) is a recently proposed metric for real-time remote monitoring systems. In particular, AoII measures the time the information at the monitor is incorrect, weighted by the magnitude of this incorrectness, thereby combining the notions of freshness and distortion. This paper addresses the definition of an AoII-optimal transmission policy in a discrete-time communication scheme with a resource constraint and a hybrid automatic repeat request (HARQ) protocol. Considering an $N$-ary symmetric Markov source, the problem is formulated as an infinite-horizon average-cost constrained Markov decision process (CMDP). The source model is characterized by the cardinality of the state space and the probability of staying at the same state. Interestingly, it is proved that under some conditions, the optimal transmission policy is to never transmit. This reveals that there exists a region of the source dynamics where communication is inadequate in reducing the AoII. Elsewhere, there exists an optimal transmission policy, which is a randomized mixture of two discrete threshold-based policies that randomize at one state. The optimal threshold and the randomization component are derived analytically. Numerical results illustrate the impact of source dynamics, channel conditions, and the resource constraint on the average AoII.
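
To make the structure of the resulting policy concrete, here is a minimal sketch of the N-ary symmetric source and the randomized threshold rule described in the abstract; the threshold and randomization probability are the quantities the paper derives analytically, so the values below are placeholders:

```python
import random

def step_symmetric_source(state, N, p_stay):
    """One step of an N-ary symmetric Markov source: stay put with
    probability p_stay, otherwise jump uniformly to another state."""
    if random.random() < p_stay:
        return state
    return random.choice([s for s in range(N) if s != state])

def transmit_decision(aoii, threshold, q):
    """Randomized mixture of two threshold policies: idle below the
    threshold, transmit above it, randomize exactly at the threshold."""
    if aoii < threshold:
        return False
    if aoii == threshold:
        return random.random() < q
    return True

print(transmit_decision(aoii=3, threshold=3, q=0.4))
```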

BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection

  • Authors: Sihao Hu, Zhen Zhang, Bingqiao Luo, Shengliang Lu, Bingsheng He, Ling Liu
  • Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.18138
  • Pdf link: https://arxiv.org/pdf/2303.18138
  • Abstract
    As various forms of fraud proliferate on Ethereum, it is imperative to safeguard against these malicious activities to protect susceptible users from being victimized. While current studies solely rely on graph-based fraud detection approaches, it is argued that they may not be well-suited for dealing with highly repetitive, skew-distributed and heterogeneous Ethereum transactions. To address these challenges, we propose BERT4ETH, a universal pre-trained Transformer encoder that serves as an account representation extractor for detecting various fraud behaviors on Ethereum. BERT4ETH features the superior modeling capability of Transformer to capture the dynamic sequential patterns inherent in Ethereum transactions, and addresses the challenges of pre-training a BERT model for Ethereum with three practical and effective strategies, namely repetitiveness reduction, skew alleviation and heterogeneity modeling. Our empirical evaluation demonstrates that BERT4ETH outperforms state-of-the-art methods with significant enhancements in terms of the phishing account detection and de-anonymization tasks. The code for BERT4ETH is available at: https://github.com/git-disl/BERT4ETH.

Adaptive Model Prediction Control-Based Multi-Terrain Trajectory Tracking Framework for Mobile Spherical Robots

  • Authors: Yifan Liu, Tao Hu, Xiaoqing Guan, Yixu Wang, Bixuan Zhang, You Wang, Guang Li
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.18186
  • Pdf link: https://arxiv.org/pdf/2303.18186
  • Abstract
    Owing to uncertainties in both kinematics and dynamics, the current trajectory tracking framework for mobile robots like spherical robots cannot function effectively on multiple terrains, especially uneven and unknown ones. Since this is a prerequisite for robots to execute tasks in the wild, we enhance our previous hierarchical trajectory tracking framework to handle this issue. First, a modified adaptive RBF neural network (RBFNN) is proposed to represent all uncertainties in kinodynamics. Then the Lyapunov function is utilized to design its adaptive law, and a variable step-size algorithm is employed in the weights update procedure to accelerate convergence and improve stability. Hence, a new adaptive model prediction control-based instruction planner (VAN-MPC) is proposed. Without modifying the bottom controllers, we finally develop the multi-terrain trajectory tracking framework by employing the new instruction planner VAN-MPC. The practical experiments demonstrate its effectiveness and robustness.
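
As a hedged sketch of the uncertainty representation (not the paper's code), a Gaussian RBF network with online-adapted output weights can be written as follows; the fixed centers/width and the simplified update rule are assumptions, whereas the paper derives a Lyapunov-based adaptive law with a variable step size:

```python
import numpy as np

class GaussianRBFNN:
    """Minimal Gaussian RBF network for representing kinodynamic
    uncertainties; only the output weights W are adapted online."""

    def __init__(self, centers, width, n_outputs):
        self.centers = centers                              # (n_neurons, n_inputs)
        self.width = width
        self.W = np.zeros((centers.shape[0], n_outputs))    # adapted online

    def phi(self, x):
        d2 = np.sum((self.centers - x) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def predict(self, x):
        return self.phi(x) @ self.W

    def adapt(self, x, error, gamma):
        # Simplified gradient-style update; the paper instead uses a
        # Lyapunov-derived law with a variable step size.
        self.W += gamma * np.outer(self.phi(x), error)

rbf = GaussianRBFNN(centers=np.random.randn(25, 3), width=1.0, n_outputs=2)
rbf.adapt(x=np.zeros(3), error=np.array([0.1, -0.2]), gamma=0.5)
print(rbf.predict(np.zeros(3)))
```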

Learning Spiking Neural Systems with the Event-Driven Forward-Forward Process

  • Authors: Alexander Ororbia
  • Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.18187
  • Pdf link: https://arxiv.org/pdf/2303.18187
  • Abstract
    We develop a novel credit assignment algorithm for information processing with spiking neurons without requiring feedback synapses. Specifically, we propose an event-driven generalization of the forward-forward and the predictive forward-forward learning processes for a spiking neural system that iteratively processes sensory input over a stimulus window. As a result, the recurrent circuit computes the membrane potential of each neuron in each layer as a function of local bottom-up, top-down, and lateral signals, facilitating a dynamic, layer-wise parallel form of neural computation. Unlike spiking neural coding, which relies on feedback synapses to adjust neural electrical activity, our model operates purely online and forward in time, offering a promising way to learn distributed representations of sensory data patterns with temporal spike signals. Notably, our experimental results on several pattern datasets demonstrate that the event-driven forward-forward (ED-FF) framework works well for training a dynamic recurrent spiking system capable of both classification and reconstruction.
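
For readers unfamiliar with the underlying objective, here is a minimal non-spiking sketch of the layer-local forward-forward loss that the ED-FF process generalizes (Hinton-style "goodness" with an assumed threshold theta; the event-driven spiking version differs in how activity is accumulated):

```python
import torch
import torch.nn.functional as F

def ff_layer_loss(h_pos, h_neg, theta=2.0):
    """Layer-local forward-forward objective: 'goodness' is the summed
    squared activity, pushed above theta for positive samples and below
    it for negative samples; no backward pass across layers is needed."""
    g_pos = (h_pos ** 2).sum(dim=1)
    g_neg = (h_neg ** 2).sum(dim=1)
    return (F.softplus(theta - g_pos) + F.softplus(g_neg - theta)).mean()

h_pos = torch.relu(torch.randn(32, 128, requires_grad=True))
h_neg = torch.relu(torch.randn(32, 128, requires_grad=True))
print(ff_layer_loss(h_pos, h_neg))
```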

Assessing Language Model Deployment with Risk Cards

  • Authors: Leon Derczynski, Hannah Rose Kirk, Vidhisha Balachandran, Sachin Kumar, Yulia Tsvetkov, M. R. Leiser, Saif Mohammad
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2303.18190
  • Pdf link: https://arxiv.org/pdf/2303.18190
  • Abstract
    This paper introduces RiskCards, a framework for structured assessment and documentation of risks associated with an application of language models. As with all language, text generated by language models can be harmful, or used to bring about harm. Automating language generation adds both an element of scale and also more subtle or emergent undesirable tendencies to the generated text. Prior work establishes a wide variety of language model harms to many different actors: existing taxonomies identify categories of harms posed by language models; benchmarks establish automated tests of these harms; and documentation standards for models, tasks and datasets encourage transparent reporting. However, there is no risk-centric framework for documenting the complexity of a landscape in which some risks are shared across models and contexts, while others are specific, and where certain conditions may be required for risks to manifest as harms. RiskCards address this methodological gap by providing a generic framework for assessing the use of a given language model in a given scenario. Each RiskCard makes clear the routes for the risk to manifest harm, their placement in harm taxonomies, and example prompt-output pairs. While RiskCards are designed to be open-source, dynamic and participatory, we present a "starter set" of RiskCards taken from a broad literature survey, each of which details a concrete risk presentation. Language model RiskCards initiate a community knowledge base which permits the mapping of risks and harms to a specific model or its application scenario, ultimately contributing to a better, safer and shared understanding of the risk landscape.

Attributed Stream Hypergraphs: temporal modeling of node-attributed high-order interactions

  • Authors: Andrea Failla, Salvatore Citraro, Giulio Rossetti
  • Subjects: Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2303.18226
  • Pdf link: https://arxiv.org/pdf/2303.18226
  • Abstract
    Recent advances in network science have resulted in two distinct research directions aimed at augmenting and enhancing representations for complex networks. The first direction, that of high-order modeling, aims to focus on connectivity between sets of nodes rather than pairs, whereas the second one, that of feature-rich augmentation, incorporates into a network all those elements that are driven by information which is external to the structure, like node properties or the flow of time. This paper proposes a novel toolbox, that of Attributed Stream Hypergraphs (ASHs), unifying both high-order and feature-rich elements for representing, mining, and analyzing complex networks. Applied to social network analysis, ASHs can characterize complex social phenomena along topological, dynamic and attributive elements. Experiments on real-world face-to-face and online social media interactions highlight that ASHs readily support analyses of, among others, high-order group homophily, nodes' homophily with respect to the hyperedges in which they participate, and time-respecting paths between hyperedges.

3D Human Pose Estimation via Intuitive Physics

  • Authors: Shashank Tripathi, Lea Müller, Chun-Hao P. Huang, Omid Taheri, Michael J. Black, Dimitrios Tzionas
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2303.18246
  • Pdf link: https://arxiv.org/pdf/2303.18246
  • Abstract
    Estimating 3D humans from images often produces implausible bodies that lean, float, or penetrate the floor. Such methods ignore the fact that bodies are typically supported by the scene. A physics engine can be used to enforce physical plausibility, but these are not differentiable, rely on unrealistic proxy bodies, and are difficult to integrate into existing optimization and learning frameworks. In contrast, we exploit novel intuitive-physics (IP) terms that can be inferred from a 3D SMPL body interacting with the scene. Inspired by biomechanics, we infer the pressure heatmap on the body, the Center of Pressure (CoP) from the heatmap, and the SMPL body's Center of Mass (CoM). With these, we develop IPMAN, to estimate a 3D body from a color image in a "stable" configuration by encouraging plausible floor contact and overlapping CoP and CoM. Our IP terms are intuitive, easy to implement, fast to compute, differentiable, and can be integrated into existing optimization and regression methods. We evaluate IPMAN on standard datasets and MoYo, a new dataset with synchronized multi-view images, ground-truth 3D bodies with complex poses, body-floor contact, CoM and pressure. IPMAN produces more plausible results than the state of the art, improving accuracy for static poses, while not hurting dynamic ones. Code and data are available for research at https://ipman.is.tue.mpg.de.
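
A hedged sketch of the kind of intuitive-physics stability term described above; the tensor shapes, axis conventions, and bare-distance form are assumptions, and the paper combines such a term with a plausible floor-contact encouragement:

```python
import torch

def stability_loss(com, cop):
    """Illustrative intuitive-physics stability term: penalize the
    horizontal (x, z) distance between the body's Center of Mass and
    the Center of Pressure inferred from the pressure heatmap."""
    return torch.linalg.norm(com[..., [0, 2]] - cop[..., [0, 2]], dim=-1).mean()

com = torch.randn(4, 3)   # per-sample 3D Center of Mass
cop = torch.randn(4, 3)   # Center of Pressure on the floor plane
print(stability_loss(com, cop))
```

Because the term is a plain differentiable distance, it can be added to existing optimization or regression objectives, which is the integration property the abstract emphasizes.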

Adaptive Sparse Pairwise Loss for Object Re-Identification

  • Authors: Xiao Zhou, Yujie Zhong, Zhen Cheng, Fan Liang, Lin Ma
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.18247
  • Pdf link: https://arxiv.org/pdf/2303.18247
  • Abstract
    Object re-identification (ReID) aims to find instances with the same identity as the given probe from a large gallery. Pairwise losses play an important role in training a strong ReID network. Existing pairwise losses densely exploit each instance as an anchor and sample its triplets in a mini-batch. This dense sampling mechanism inevitably introduces positive pairs that share few visual similarities, which can be harmful to the training. To address this problem, we propose a novel loss paradigm termed Sparse Pairwise (SP) loss that only leverages a few appropriate pairs for each class in a mini-batch, and empirically demonstrate that it is sufficient for the ReID tasks. Based on the proposed loss framework, we propose an adaptive positive mining strategy that can dynamically adapt to diverse intra-class variations. Extensive experiments show that SP loss and its adaptive variant AdaSP loss outperform other pairwise losses, and achieve state-of-the-art performance across several ReID benchmarks. Code is available at https://github.com/Astaxanthin/AdaSP.
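
The exact SP/AdaSP mining rules are not spelled out in the abstract (the released code above is authoritative); purely as an illustration of the sparse-pairwise idea, selecting one pair per class plus its hardest negative could look like this:

```python
import torch
import torch.nn.functional as F

def sparse_pairwise_loss(features, labels, margin=0.3):
    """Illustration only: keep a single positive pair per class in the
    batch (here the least-similar one) and its hardest negative, rather
    than densely mining triplets for every anchor."""
    f = F.normalize(features, dim=1)
    sim = f @ f.t()
    loss, n_classes = features.new_zeros(()), 0
    for c in labels.unique():
        pos_idx = (labels == c).nonzero(as_tuple=True)[0]
        neg_mask = labels != c
        if pos_idx.numel() < 2 or not neg_mask.any():
            continue
        pos = sim[pos_idx][:, pos_idx]
        hardest_pos = pos[~torch.eye(pos_idx.numel(), dtype=torch.bool)].min()
        hardest_neg = sim[pos_idx][:, neg_mask].max()
        loss = loss + F.relu(hardest_neg - hardest_pos + margin)
        n_classes += 1
    return loss / max(n_classes, 1)

print(sparse_pairwise_loss(torch.randn(16, 128), torch.randint(0, 4, (16,))))
```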

New submissions for Fri, 17 Mar 23

Keyword: pruning

There is no result

Keyword: neural\ architecture\ search

There is no result

Keyword: 3d object detection

Among Us: Adversarially Robust Collaborative Perception by Consensus

  • Authors: Yiming Li, Qi Fang, Jiamu Bai, Siheng Chen, Felix Juefei-Xu, Chen Feng
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2303.09495
  • Pdf link: https://arxiv.org/pdf/2303.09495
  • Abstract
    Multiple robots could perceive a scene (e.g., detect objects) collaboratively better than individuals, although they easily suffer from adversarial attacks when using deep learning. This could be addressed by the adversarial defense, but its training requires the often-unknown attacking mechanism. Differently, we propose ROBOSAC, a novel sampling-based defense strategy generalizable to unseen attackers. Our key idea is that collaborative perception should lead to consensus rather than dissensus in results compared to individual perception. This leads to our hypothesize-and-verify framework: perception results with and without collaboration from a random subset of teammates are compared until reaching a consensus. In such a framework, more teammates in the sampled subset often entail better perception performance but require longer sampling time to reject potential attackers. Thus, we derive how many sampling trials are needed to ensure the desired size of an attacker-free subset, or equivalently, the maximum size of such a subset that we can successfully sample within a given number of trials. We validate our method on the task of collaborative 3D object detection in autonomous driving scenarios.
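
The trial-count derivation mentioned above follows RANSAC-style reasoning; a hedged sketch (the paper's exact expression may differ in details):

```python
import math

def trials_needed(n_teammates, n_attackers, subset_size, success_prob=0.99):
    """RANSAC-style count of sampling trials needed so that, with
    probability success_prob, at least one sampled teammate subset is
    attacker-free. p_clean is the chance a single random subset avoids
    all attackers."""
    p_clean = (math.comb(n_teammates - n_attackers, subset_size)
               / math.comb(n_teammates, subset_size))
    if p_clean == 0.0:
        return math.inf     # subset too large to ever be attacker-free
    if p_clean == 1.0:
        return 1
    return math.ceil(math.log(1.0 - success_prob) / math.log(1.0 - p_clean))

print(trials_needed(n_teammates=8, n_attackers=2, subset_size=4))
```

Larger subsets improve perception but shrink p_clean, which is exactly the trade-off between performance and sampling time the abstract describes.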

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

  • Authors: Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09551
  • Pdf link: https://arxiv.org/pdf/2303.09551
  • Abstract
    3D scene understanding plays a vital role in vision-based autonomous driving. While most existing methods focus on 3D object detection, they have difficulty describing real-world objects of arbitrary shapes and infinite classes. Towards a more comprehensive perception of a 3D scene, in this paper, we propose a SurroundOcc method to predict the 3D occupancy with multi-camera images. We first extract multi-scale features for each image and adopt spatial 2D-3D attention to lift them to the 3D volume space. Then we apply 3D convolutions to progressively upsample the volume features and impose supervision on multiple levels. To obtain dense occupancy prediction, we design a pipeline to generate dense occupancy ground truth without expensive occupancy annotations. Specifically, we fuse multi-frame LiDAR scans of dynamic objects and static scenes separately. Then we adopt Poisson Reconstruction to fill the holes and voxelize the mesh to get dense occupancy labels. Extensive experiments on nuScenes and SemanticKITTI datasets demonstrate the superiority of our method. Code and dataset are available at https://github.com/weiyithu/SurroundOcc

Keyword: voxel

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

  • Authors: Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09551
  • Pdf link: https://arxiv.org/pdf/2303.09551
  • Abstract
    3D scene understanding plays a vital role in vision-based autonomous driving. While most existing methods focus on 3D object detection, they have difficulty describing real-world objects of arbitrary shapes and infinite classes. Towards a more comprehensive perception of a 3D scene, in this paper, we propose a SurroundOcc method to predict the 3D occupancy with multi-camera images. We first extract multi-scale features for each image and adopt spatial 2D-3D attention to lift them to the 3D volume space. Then we apply 3D convolutions to progressively upsample the volume features and impose supervision on multiple levels. To obtain dense occupancy prediction, we design a pipeline to generate dense occupancy ground truth without expensive occupancy annotations. Specifically, we fuse multi-frame LiDAR scans of dynamic objects and static scenes separately. Then we adopt Poisson Reconstruction to fill the holes and voxelize the mesh to get dense occupancy labels. Extensive experiments on nuScenes and SemanticKITTI datasets demonstrate the superiority of our method. Code and dataset are available at https://github.com/weiyithu/SurroundOcc

Keyword: lidar

SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments

  • Authors: Yudi Dai (1), Yitai Lin (1), Xiping Lin (2), Chenglu Wen (1), Lan Xu (2), Hongwei Yi (3), Siqi Shen (1), Yuexin Ma (2), Cheng Wang (1) ((1) Xiamen University, China, (2) ShanghaiTech University, China, (3) Max Planck Institute for Intelligent Systems, Germany)
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09095
  • Pdf link: https://arxiv.org/pdf/2303.09095
  • Abstract
    We present SLOPER4D, a novel scene-aware dataset collected in large urban environments to facilitate the research of global human pose estimation (GHPE) with human-scene interaction in the wild. Employing a head-mounted device integrated with a LiDAR and camera, we record 12 human subjects' activities over 10 diverse urban scenes from an egocentric view. Frame-wise annotations for 2D key points, 3D pose parameters, and global translations are provided, together with reconstructed scene point clouds. To obtain accurate 3D ground truth in such large dynamic scenes, we propose a joint optimization method to fit local SMPL meshes to the scene and fine-tune the camera calibration during dynamic motions frame by frame, resulting in plausible and scene-natural 3D human poses. Eventually, SLOPER4D consists of 15 sequences of human motions, each of which has a trajectory length of more than 200 meters (up to 1,300 meters) and covers an area of more than 2,000 $m^2$ (up to 13,000 $m^2$), including more than 100K LiDAR frames, 300k video frames, and 500K IMU-based motion frames. With SLOPER4D, we provide a detailed and thorough analysis of two critical tasks, including camera-based 3D HPE and LiDAR-based 3D HPE in urban environments, and benchmark a new task, GHPE. The in-depth analysis demonstrates SLOPER4D poses significant challenges to existing methods and produces great research opportunities. The dataset and code are released at \url{this http URL}

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

  • Authors: Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.09551
  • Pdf link: https://arxiv.org/pdf/2303.09551
  • Abstract
    3D scene understanding plays a vital role in vision-based autonomous driving. While most existing methods focus on 3D object detection, they have difficulty describing real-world objects of arbitrary shapes and infinite classes. Towards a more comprehensive perception of a 3D scene, in this paper, we propose a SurroundOcc method to predict the 3D occupancy with multi-camera images. We first extract multi-scale features for each image and adopt spatial 2D-3D attention to lift them to the 3D volume space. Then we apply 3D convolutions to progressively upsample the volume features and impose supervision on multiple levels. To obtain dense occupancy prediction, we design a pipeline to generate dense occupancy ground truth without expensive occupancy annotations. Specifically, we fuse multi-frame LiDAR scans of dynamic objects and static scenes separately. Then we adopt Poisson Reconstruction to fill the holes and voxelize the mesh to get dense occupancy labels. Extensive experiments on nuScenes and SemanticKITTI datasets demonstrate the superiority of our method. Code and dataset are available at https://github.com/weiyithu/SurroundOcc

New submissions for Fri, 31 Mar 23

Keyword: efficient

Machine learning-based spin structure detection

  • Authors: Isaac Labrie-Boulay, Thomas Brian Winkler, Daniel Franzen, Alena Romanova, Hans Fangohr, Mathias Kläui
  • Subjects: Machine Learning (cs.LG); Emerging Technologies (cs.ET); Data Analysis, Statistics and Probability (physics.data-an)
  • Arxiv link: https://arxiv.org/abs/2303.16905
  • Pdf link: https://arxiv.org/pdf/2303.16905
  • Abstract
    One of the most important magnetic spin structures is the topologically stabilised skyrmion quasi-particle. Its interesting physical properties make it a candidate for memory and efficient neuromorphic computation schemes. For device operation, detection of the position, shape, and size of skyrmions is required, and magnetic imaging is typically employed. A frequently used technique is magneto-optical Kerr microscopy, where, depending on the sample's material composition, temperature, material growth procedure, etc., the measurements suffer from noise, low contrast, intensity gradients, or other optical artifacts. Conventional image analysis packages require manual treatment, and a more automatic solution is required. We report a convolutional neural network specifically designed for segmentation problems to detect the position and shape of skyrmions in our measurements. The network is tuned using selected techniques to optimize predictions, and in particular the number of detected classes is found to govern the performance. The results of this study show that a well-trained network is a viable method of automating data pre-processing in magnetic microscopy. The approach is easily extendable to other spin structures and other magnetic imaging methods.

Optimizing Reconfigurable Intelligent Surfaces for Short Transmissions: How Detailed Configurations can be Afforded?

  • Authors: Anders Enqvist, Özlem Tuğfe Demir, Cicek Cavdar, Emil Björnson
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2303.16913
  • Pdf link: https://arxiv.org/pdf/2303.16913
  • Abstract
    In this paper, we examine how to minimize the total energy consumption of a user equipment (UE) when it transmits a finite-sized data payload of a given length. The receiving base station (BS) controls a reconfigurable intelligent surface (RIS) that can be utilized to improve the channel conditions, but only if additional pilot signals are transmitted to configure the RIS. The challenge is that the pilot resources spent on configuring the RIS increase the energy consumption, especially when small payloads are transmitted, so it must be balanced against the energy savings during data transmission. We derive a formula for the energy consumption, taking both the pilot and data transmission power into account. It also includes the effects of imperfect channel state information, the use of phase-shifts with finite resolution at the RIS, and the passive circuit energy consumption. We also consider how dividing the RIS into subarrays consisting of multiple RIS elements using the same reflection coefficient can shorten the pilot length. In particular, the pilot power and subarray size are tuned to the payload length to minimize the energy consumption while maintaining parts of the aperture gain. Our analytical results show that, for a given geometry and transmission payload length, there exists a unique energy-minimizing subarray size and pilot power. For small payloads and when the channel conditions between the BS and UE are favorable compared to the path to the RIS, the energy consumption is minimized using subarrays with many elements and low pilot transmission power. On the other hand, when the channel conditions to the RIS are better and the data payloads are large, it is preferable to use fewer elements per subarray, potentially configuring each element individually and transmitting the pilot signals with additional power.

T-FFTRadNet: Object Detection with Swin Vision Transformers from Raw ADC Radar Signals

  • Authors: James Giroux, Martin Bouchard, Robert Laganiere
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16940
  • Pdf link: https://arxiv.org/pdf/2303.16940
  • Abstract
    Object detection utilizing Frequency Modulated Continuous Wave radar is becoming increasingly popular in the field of autonomous systems. Radar does not possess the same drawbacks seen by other emission-based sensors such as LiDAR, primarily the degradation or loss of return signals due to weather conditions such as rain or snow. However, radar does possess traits that make it unsuitable for standard emission-based deep learning representations such as point clouds. Radar point clouds tend to be sparse and therefore information extraction is not efficient. To overcome this, more traditional digital signal processing pipelines were adapted to form inputs residing directly in the frequency domain via Fast Fourier Transforms. Commonly, three transformations were used to form Range-Azimuth-Doppler cubes in which deep learning algorithms could perform object detection. This too has drawbacks, namely the pre-processing costs associated with performing multiple Fourier Transforms and normalization. We explore the possibility of operating on raw radar inputs from analog to digital converters via the utilization of complex transformation layers. Moreover, we introduce hierarchical Swin Vision transformers to the field of radar object detection and show their capability to operate on inputs varying in pre-processing, along with different radar configurations, i.e. relatively low and high numbers of transmitters and receivers, while obtaining on par or better results than the state-of-the-art.

Concise QBF Encodings for Games on a Grid (extended version)

  • Authors: Irfansha Shaik, Jaco van de Pol
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.16949
  • Pdf link: https://arxiv.org/pdf/2303.16949
  • Abstract
    Encoding 2-player games in QBF correctly and efficiently is challenging and error-prone. To enable concise specifications and uniform encodings of games played on grid boards, like Tic-Tac-Toe, Connect-4, Domineering, Pursuer-Evader and Breakthrough, we introduce Board-game Domain Definition Language (BDDL), inspired by the success of PDDL in the planning domain. We provide an efficient translation from BDDL into QBF, encoding the existence of a winning strategy of bounded depth. Our lifted encoding treats board positions symbolically and allows concise definitions of conditions, effects and winning configurations, relative to symbolic board positions. The size of the encoding grows linearly in the input model and the considered depth. To show the feasibility of such a generic approach, we use QBF solvers to compute the critical depths of winning strategies for instances of several known games. For several games, our work provides the first QBF encoding. Unlike plan validation in SAT-based planning, validating QBF-based winning strategies is difficult. We show how to validate winning strategies using QBF certificates and interactive game play.

Fairness-Aware Data Valuation for Supervised Learning

  • Authors: José Pombal, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro
  • Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2303.16963
  • Pdf link: https://arxiv.org/pdf/2303.16963
  • Abstract
    Data valuation is an ML field that studies the value of training instances towards a given predictive task. Although data bias is one of the main sources of downstream model unfairness, previous work in data valuation does not consider how training instances may influence both performance and fairness of ML models. Thus, we propose Fairness-Aware Data valuatiOn (FADO), a data valuation framework that can be used to incorporate fairness concerns into a series of ML-related tasks (e.g., data pre-processing, exploratory data analysis, active learning). We propose an entropy-based data valuation metric suited to address our two-pronged goal of maximizing both performance and fairness, which is more computationally efficient than existing metrics. We then show how FADO can be applied as the basis for unfairness mitigation pre-processing techniques. Our methods achieve promising results -- up to a 40 p.p. improvement in fairness at a less than 1 p.p. loss in performance compared to a baseline -- and promote fairness in a data-centric way, where a deeper understanding of data quality takes center stage.

Computationally efficient sampling methods for sparsity promoting hierarchical Bayesian models

  • Authors: Daniela Calvetti, Erkki Somersalo
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.16988
  • Pdf link: https://arxiv.org/pdf/2303.16988
  • Abstract
    Bayesian hierarchical models have been demonstrated to provide efficient algorithms for finding sparse solutions to ill-posed inverse problems. The models comprise typically a conditionally Gaussian prior model for the unknown, augmented by a hyperprior model for the variances. A widely used choice for the hyperprior is a member of the family of generalized gamma distributions. Most of the work in the literature has concentrated on numerical approximation of the maximum a posteriori (MAP) estimates, and less attention has been paid to sampling methods or other means for uncertainty quantification. Sampling from the hierarchical models is challenging mainly for two reasons: The hierarchical models are typically high-dimensional, thus suffering from the curse of dimensionality, and the strong correlation between the unknown of interest and its variance can make sampling rather inefficient. This work addresses mainly the first one of these obstacles. By using a novel reparametrization, it is shown how the posterior distribution can be transformed into one dominated by a Gaussian white noise, allowing sampling by using the preconditioned Crank-Nicolson (pCN) scheme that has been shown to be efficient for sampling from distributions dominated by a Gaussian component. Furthermore, a novel idea for speeding up the pCN in a special case is developed, and the question of how strongly the hierarchical models are concentrated on sparse solutions is addressed in light of a computed example.
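
The pCN scheme itself is standard; a minimal NumPy sketch of the sampler applied after such a whitening reparametrization, for a target proportional to exp(-phi(x)) under a standard Gaussian prior (the likelihood potential phi here is a placeholder):

```python
import numpy as np

def pcn_sampler(phi, dim, n_samples, beta=0.2, rng=None):
    """Preconditioned Crank-Nicolson MCMC: the proposal preserves the
    Gaussian prior, so the acceptance ratio involves only the potential
    phi and the move stays well defined in high dimension."""
    rng = rng or np.random.default_rng()
    x = rng.standard_normal(dim)
    chain = np.empty((n_samples, dim))
    for i in range(n_samples):
        proposal = np.sqrt(1.0 - beta**2) * x + beta * rng.standard_normal(dim)
        if np.log(rng.uniform()) < phi(x) - phi(proposal):
            x = proposal
        chain[i] = x
    return chain

# Example with a Gaussian likelihood potential centered at 1
samples = pcn_sampler(lambda x: 0.5 * np.sum((x - 1.0) ** 2), dim=50, n_samples=2000)
print(samples.mean())
```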

The G-invariant graph Laplacian

  • Authors: Eitan Rosen, Yoel Shkolnisky
  • Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2303.17001
  • Pdf link: https://arxiv.org/pdf/2303.17001
  • Abstract
    Graph Laplacian based algorithms for data lying on a manifold have been proven effective for tasks such as dimensionality reduction, clustering, and denoising. In this work, we consider data sets whose data points not only lie on a manifold, but are also closed under the action of a continuous group. An example of such a data set is volumes that lie on a low-dimensional manifold, where each volume may be rotated in three-dimensional space. We introduce the G-invariant graph Laplacian that generalizes the graph Laplacian by accounting for the action of the group on the data set. We show that like the standard graph Laplacian, the G-invariant graph Laplacian converges to the Laplace-Beltrami operator on the data manifold, but with a significantly improved convergence rate. Furthermore, we show that the eigenfunctions of the G-invariant graph Laplacian admit the form of tensor products between the group elements and eigenvectors of certain matrices, which can be computed efficiently using FFT-type algorithms. We demonstrate our construction and its advantages on the problem of filtering data on a noisy manifold closed under the action of the special unitary group SU(2).

The secret of immersion: actor driven camera movement generation for auto-cinematography

  • Authors: Xinyi Wu, Haohong Wang, Aggelos K. Katsaggelos
  • Subjects: Multimedia (cs.MM); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17041
  • Pdf link: https://arxiv.org/pdf/2303.17041
  • Abstract
    Immersion plays a vital role when designing cinematic creations, yet the difficulty of immersive shooting prevents designers from creating satisfactory outputs. In this work, we analyze the specific components that contribute to cinematographic immersion at the spatial, emotional, and aesthetic levels, and these components are then combined into a high-level evaluation mechanism. Guided by such an immersion mechanism, we propose a GAN-based camera control system that is able to generate actor-driven camera movements in the 3D virtual environment to obtain immersive film sequences. The proposed encoder-decoder architecture in the generation flow transfers character motion into camera trajectory conditioned on an emotion factor. This ensures spatial and emotional immersion by performing actor-camera synchronization physically and psychologically. The emotional immersion is further strengthened by incorporating regularization that controls camera shakiness for expressing different mental statuses. To achieve aesthetic immersion, we make an effort to improve aesthetic frame compositions by modifying the synthesized camera trajectory. Based on a self-supervised adjustor, the adjusted camera placements can project the character to the appropriate on-frame locations following aesthetic rules. The experimental results indicate that our proposed camera control system can efficiently offer immersive cinematic videos, both quantitatively and qualitatively, based on fine-grained immersive shooting. Live examples are shown in the supplementary video.

Material-agnostic Shaping of Granular Materials with Optimal Transport

  • Authors: Nikhilesh Alatur, Olov Andersson, Roland Siegwart, Lionel Ott
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17047
  • Pdf link: https://arxiv.org/pdf/2303.17047
  • Abstract
    From construction materials, such as sand or asphalt, to kitchen ingredients, like rice, sugar, or salt; the world is full of granular materials. Despite impressive progress in robotic manipulation, manipulating and interacting with granular material remains a challenge due to difficulties in perceiving, representing, modelling, and planning for these variable materials that have complex internal dynamics. While some prior work has looked into estimating or learning accurate dynamics models for granular materials, the literature is still missing a more abstract planning method that can be used for planning manipulation actions for granular materials with unknown material properties. In this work, we leverage tools from optimal transport and connect them to robot motion planning. We propose a heuristics-based sweep planner that does not require knowledge of the material's properties and directly uses a height map representation to generate promising sweeps. These sweeps transform granular material from arbitrary start shapes into arbitrary target shapes. We apply the sweep planner in a fast and reactive feedback loop and avoid the need for model-based planning over multiple time steps. We validate our approach with a large set of simulation and hardware experiments where we show that our method is capable of efficiently solving several complex tasks, including gathering, separating, and shaping of several types of granular materials into different target shapes.

Transductive few-shot adapters for medical image segmentation

  • Authors: Julio Silva-Rodríguez, Jose Dolz, Ismail Ben Ayed
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17051
  • Pdf link: https://arxiv.org/pdf/2303.17051
  • Abstract
    With the recent rise of foundation models in computer vision and NLP, the pretrain-and-adapt strategy, where a large-scale model is fine-tuned on downstream tasks, is gaining popularity. However, traditional fine-tuning approaches may still require significant resources and yield sub-optimal results when the labeled data of the target task is scarce. This is especially the case in clinical settings. To address this challenge, we formalize few-shot efficient fine-tuning (FSEFT), a novel and realistic setting for medical image segmentation. Furthermore, we introduce a novel parameter-efficient fine-tuning strategy tailored to medical image segmentation, with (a) spatial adapter modules that are more appropriate for dense prediction tasks; and (b) a constrained transductive inference, which leverages task-specific prior knowledge. Our comprehensive experiments on a collection of public CT datasets for organ segmentation reveal the limitations of standard fine-tuning methods in few-shot scenarios, point to the potential of vision adapters and transductive inference, and confirm the suitability of foundation models.

A Tensor-based Convolutional Neural Network for Small Dataset Classification

  • Authors: Zhenhua Chen, David Crandall
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2303.17061
  • Pdf link: https://arxiv.org/pdf/2303.17061
  • Abstract
    Inspired by the ConvNets with structured hidden representations, we propose a Tensor-based Neural Network, TCNN. Different from ConvNets, TCNNs are composed of structured neurons rather than scalar neurons, and the basic operation is neuron tensor transformation. Unlike other structured ConvNets, where the part-whole relationships are modeled explicitly, the relationships are learned implicitly in TCNNs. Also, the structured neurons in TCNNs are high-rank tensors rather than vectors or matrices. We compare TCNNs with current popular ConvNets, including ResNets, MobileNets, EfficientNets, RegNets, etc., on CIFAR10, CIFAR100, and Tiny ImageNet. The experiments show that TCNNs have higher efficiency in terms of parameters. TCNNs also show higher robustness against white-box adversarial attacks on MNIST compared to ConvNets.

Reading Strategies for Graph Visualizations that Wrap Around in Torus Topology

  • Authors: Kun-Ting Chen, Quynh Quang Ngo, Kuno Kurzhals, Kim Marriott, Tim Dwyer, Michael Sedlmair, Daniel Weiskopf
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2303.17066
  • Pdf link: https://arxiv.org/pdf/2303.17066
  • Abstract
    We investigate reading strategies for node-link diagrams that wrap around the boundaries in a flattened torus topology by examining eye tracking data recorded in a previous controlled study. Prior work showed that torus drawing affords greater flexibility in clutter reduction than traditional node-link representations, but impedes link-and-path exploration tasks, while repeating tiles around boundaries aids comprehension. However, it remains unclear what strategies users apply in different wrapping settings. This is important for design implications for future work on more effective wrapped visualizations for network applications, and cyclic data that could benefit from wrapping. We perform visual-exploratory data analysis of gaze data, and conduct statistical tests derived from the patterns identified. Results show distinguishable gaze behaviors, with more visual glances and transitions between areas of interest in the non-replicated layout. Full-context has more successful visual searches than partial-context, but the gaze allocation indicates that the layout could be more space-efficient.

Dependent Task Offloading in Edge Computing Using GNN and Deep Reinforcement Learning

  • Authors: Zequn Cao, Xiaoheng Deng
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.17100
  • Pdf link: https://arxiv.org/pdf/2303.17100
  • Abstract
    Task offloading is a widely used technology in Mobile Edge Computing (MEC), which reduces the completion time of user tasks with the help of resourceful edge servers. Existing works mainly focus on the case where the computation density of a user task is homogeneous, so that it can be offloaded in full or by percentage. However, various real-life user tasks consist of several inner-dependent subtasks, each of which is a minimum execution unit logically. Motivated by this gap, we aim to solve the Dependent Task Offloading (DTO) problem under a multi-user multi-edge scenario in this paper. We first use a Directed Acyclic Graph (DAG) to represent a dependent task, where nodes indicate subtasks and directed edges indicate dependencies among subtasks. Then we propose a scheme based on Graph Attention Network (GAT) and Deep Reinforcement Learning (DRL) to minimize the makespan of user tasks. To utilize GAT efficiently, we train it on the resourceful cloud in an unsupervised style due to the large data and computation resource requirements. In addition, we design a multi-discrete action space for the DRL algorithm to enhance the applicability of our proposed scheme. Experiments are conducted on broadly distributed synthetic data. The results demonstrate that our proposed approach can be adapted to both simple and complex MEC environments and outperforms other methods.
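
The makespan objective on a DAG task has a simple recursive form; a minimal sketch that ignores the offloading and queueing delays the GAT+DRL scheduler must actually account for:

```python
from collections import defaultdict

def dag_makespan(durations, edges):
    """Earliest-finish-time makespan of a dependent task given as a DAG
    (nodes: subtasks, edges: dependencies): each subtask starts as soon
    as all of its predecessors have finished."""
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    finish = {}

    def ft(v):
        if v not in finish:
            finish[v] = durations[v] + max((ft(u) for u in preds[v]), default=0.0)
        return finish[v]

    return max(ft(v) for v in durations)

# Diamond-shaped dependent task: 0 -> {1, 2} -> 3
print(dag_makespan({0: 1.0, 1: 2.0, 2: 3.0, 3: 1.0},
                   [(0, 1), (0, 2), (1, 3), (2, 3)]))   # 5.0
```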

Deep Generative Model and Its Applications in Efficient Wireless Network Management: A Tutorial and Case Study

  • Authors: Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Abbas Jamalipour
  • Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17114
  • Pdf link: https://arxiv.org/pdf/2303.17114
  • Abstract
    With the phenomenal success of diffusion models and ChatGPT, deep generative models (DGMs) have been experiencing explosive growth since 2022. Not limited to content generation, DGMs are also widely adopted in the Internet of Things, the Metaverse, and digital twins, due to their outstanding ability to represent complex patterns and generate plausible samples. In this article, we explore the applications of DGMs in a crucial task, i.e., improving the efficiency of wireless network management. Specifically, we first overview generative AI, as well as three representative DGMs. Then, a DGM-empowered framework for wireless network management is proposed, in which we elaborate on the issues of the conventional network management approaches, why DGMs can address them efficiently, and the step-by-step workflow for applying DGMs in managing wireless networks. Moreover, we conduct a case study on network economics, using the state-of-the-art DGM, i.e., the diffusion model, to generate effective contracts for incentivizing mobile AI-Generated Content (AIGC) services. Last but not least, we discuss important open directions for further research.

Conservation and stability in a discontinuous Galerkin method for the vector invariant spherical shallow water equations

  • Authors: Kieran Ricardo, David Lee, Kenneth Duru
  • Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
  • Arxiv link: https://arxiv.org/abs/2303.17120
  • Pdf link: https://arxiv.org/pdf/2303.17120
  • Abstract
    We develop a novel and efficient discontinuous Galerkin spectral element method (DG-SEM) for the spherical rotating shallow water equations in vector invariant form. We prove that the DG-SEM is energy stable, and discretely conserves mass, vorticity, and linear geostrophic balance on general curvilinear meshes. These theoretical results are possible due to our novel entropy stable numerical DG fluxes for the shallow water equations in vector invariant form. We experimentally verify these results on a cubed sphere mesh. Additionally, we show that our method is robust; that is, it can be run stably without any dissipation. The entropy stable fluxes are sufficient to control the grid scale noise generated by geostrophic turbulence without the need for artificial stabilisation.

C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation

  • Authors: Nazmul Karim, Niluthpol Chowdhury Mithun, Abhinav Rajvanshi, Han-pang Chiu, Supun Samarasekera, Nazanin Rahnavard
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17132
  • Pdf link: https://arxiv.org/pdf/2303.17132
  • Abstract
    Unsupervised domain adaptation (UDA) approaches focus on adapting models trained on a labeled source domain to an unlabeled target domain. UDA methods have a strong assumption that the source data is accessible during adaptation, which may not be feasible in many real-world scenarios due to privacy concerns and resource constraints of devices. In this regard, source-free domain adaptation (SFDA) excels as access to source data is no longer required during adaptation. Recent state-of-the-art (SOTA) methods on SFDA mostly focus on pseudo-label refinement based self-training, which generally suffers from two issues: i) inevitable occurrence of noisy pseudo-labels that could lead to early training-time memorization, ii) the refinement process requires maintaining a memory bank, which creates a significant burden in resource-constrained scenarios. To address these concerns, we propose C-SFDA, a curriculum learning aided self-training framework for SFDA that adapts efficiently and reliably to changes across domains based on selective pseudo-labeling. Specifically, we employ a curriculum learning scheme to promote learning from a restricted amount of pseudo labels selected based on their reliabilities. This simple yet effective step successfully prevents label noise propagation during different stages of adaptation and eliminates the need for costly memory-bank based label refinement. Our extensive experimental evaluations on both image recognition and semantic segmentation tasks confirm the effectiveness of our method. C-SFDA is readily applicable to online test-time domain adaptation and also outperforms previous SOTA methods in this task.

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving

  • Authors: Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17144
  • Pdf link: https://arxiv.org/pdf/2303.17144
  • Abstract
    Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research. To address this gap, we present DAMO-StreamNet, an optimized framework that combines recent advances from the YOLO series with a comprehensive analysis of spatial and temporal perception mechanisms, delivering a cutting-edge solution. The key innovations of DAMO-StreamNet are: (1) A robust neck structure incorporating deformable convolution, enhancing the receptive field and feature alignment capabilities. (2) A dual-branch structure that integrates short-path semantic features and long-path temporal features, improving motion state prediction accuracy. (3) Logits-level distillation for efficient optimization, aligning the logits of teacher and student networks in semantic space. (4) A real-time forecasting mechanism that updates support frame features with the current frame, ensuring seamless streaming perception during inference. Our experiments demonstrate that DAMO-StreamNet surpasses existing state-of-the-art methods, achieving 37.8% (normal size (600, 960)) and 43.3% (large size (1200, 1920)) sAP without using extra data. This work not only sets a new benchmark for real-time perception but also provides valuable insights for future research. Additionally, DAMO-StreamNet can be applied to various autonomous systems, such as drones and robots, paving the way for real-time perception.

Convergence of the CEM-GMsFEM for compressible flow in highly heterogeneous media

  • Authors: Leonardo A. Poveda, Shubin Fu, Eric T. Chung, Lina Zhao
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.17157
  • Pdf link: https://arxiv.org/pdf/2303.17157
  • Abstract
    This paper presents and analyses a Constraint Energy Minimization Generalized Multiscale Finite Element Method (CEM-GMsFEM) for solving single-phase non-linear compressible flows in highly heterogeneous media. The construction of CEM-GMsFEM hinges on two crucial steps: First, the auxiliary space is constructed by solving local spectral problems, where the basis functions corresponding to small eigenvalues are captured. Then the basis functions are obtained by solving local energy minimization problems over the oversampling domains using the auxiliary space. The basis functions have exponential decay outside the corresponding local oversampling regions. The convergence of the proposed method is provided, and we show that this convergence only depends on the coarse grid size and is independent of the heterogeneities. An online enrichment guided by an a posteriori error estimator is developed to enhance computational efficiency. Several numerical experiments on a three-dimensional case are presented to confirm the theoretical findings, illustrating the performance of the method and giving efficient and accurate numerical results.

Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models

  • Authors: Sifan Long, Zhen Zhao, Junkun Yuan, Zichang Tan, Jiangjiang Liu, Luping Zhou, Shengsheng Wang, Jingdong Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17169
  • Pdf link: https://arxiv.org/pdf/2303.17169
  • Abstract
    Prompt learning has become one of the most efficient paradigms for adapting large pre-trained vision-language models to downstream tasks. Current state-of-the-art methods, like CoOp and ProDA, tend to adopt soft prompts to learn an appropriate prompt for each specific task. Recent CoCoOp further boosts the base-to-new generalization performance via an image-conditional prompt. However, it directly fuses identical image semantics to prompts of different labels and significantly weakens the discrimination among different classes as shown in our experiments. Motivated by this observation, we first propose a class-aware text prompt (CTP) to enrich generated prompts with label-related image information. Unlike CoCoOp, CTP can effectively involve image semantics and avoid introducing extra ambiguities into different prompts. On the other hand, instead of reserving the complete image representations, we propose text-guided feature tuning (TFT) to make the image branch attend to class-related representation. A contrastive loss is employed to align such augmented text and image representations on downstream tasks. In this way, the image-to-text CTP and text-to-image TFT can be mutually promoted to enhance the adaptation of VLMs for downstream tasks. Extensive experiments demonstrate that our method outperforms the existing methods by a significant margin. Especially, compared to CoCoOp, we achieve an average improvement of 4.03% on new classes and 3.19% on harmonic-mean over eleven classification benchmarks.
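
The contrastive alignment step can be illustrated with a standard symmetric InfoNCE loss; the paper's exact loss and weighting may differ, so treat this as a generic sketch where matched image/text pairs sit on the diagonal:

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(img_feats, txt_feats, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss: pull each image
    representation toward its paired (prompt-derived) text representation
    and push it away from the other texts in the batch, and vice versa."""
    img = F.normalize(img_feats, dim=1)
    txt = F.normalize(txt_feats, dim=1)
    logits = img @ txt.t() / temperature
    targets = torch.arange(logits.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

print(contrastive_alignment_loss(torch.randn(8, 512), torch.randn(8, 512)))
```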

High-Performance Low-Complexity Hierarchical Frequency Synchronization for Distributed Massive MIMO-OFDMA Systems

  • Authors: Xiao-Yang Wang, Shaoshi Yang, Tian-Hao Yuan, Hou-Yu Zhai, Jianhua Zhang, Lajos Hanzo
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2303.17188
  • Pdf link: https://arxiv.org/pdf/2303.17188
  • Abstract
    We propose a high-performance yet low-complexity hierarchical frequency synchronization scheme for orthogonal frequency-division multiple-access (OFDMA) aided distributed massive multi-input multi-output (MIMO) systems, where multiple carrier frequency offsets (CFOs) have to be estimated in the uplink. To solve this multi-CFO estimation problem efficiently, we classify the active antenna units (AAUs) as the master and the slaves. Then, we split the scheme into two stages. During the first stage the distributed slave AAUs are synchronized with the master AAU, while the user equipment (UE) is synchronized with the closest slave AAU during the second stage. The mean square error (MSE) performance of our scheme is better than that of the representative state-of-the-art baseline schemes, while its computational complexity is substantially lower.

Practical self-supervised continual learning with continual fine-tuning

  • Authors: Chi Ian Tang, Lorena Qendro, Dimitris Spathis, Fahim Kawsar, Cecilia Mascolo, Akhil Mathur
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17235
  • Pdf link: https://arxiv.org/pdf/2303.17235
  • Abstract
    Self-supervised learning (SSL) has shown remarkable performance in computer vision tasks when trained offline. However, in a Continual Learning (CL) scenario where new data is introduced progressively, models still suffer from catastrophic forgetting. Retraining a model from scratch to adapt to newly generated data is time-consuming and inefficient. Previous approaches suggested re-purposing self-supervised objectives with knowledge distillation to mitigate forgetting across tasks, assuming that labels from all tasks are available during fine-tuning. In this paper, we generalize self-supervised continual learning in a practical setting where available labels can be leveraged in any step of the SSL process. With an increasing number of continual tasks, this offers more flexibility in the pre-training and fine-tuning phases. With Kaizen, we introduce a training architecture that is able to mitigate catastrophic forgetting for both the feature extractor and classifier with a carefully designed loss function. By using a set of comprehensive evaluation metrics reflecting different aspects of continual learning, we demonstrated that Kaizen significantly outperforms previous SSL models in competitive vision benchmarks, with up to 16.5% accuracy improvement on split CIFAR-100. Kaizen is able to balance the trade-off between knowledge retention and learning from new data with an end-to-end model, paving the way for practical deployment of continual learning systems.
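
The abstract does not give Kaizen's loss in detail; purely as a generic illustration of the distillation component such continual-learning frameworks use to retain knowledge (classifier part only, with an assumed temperature):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Standard soft-label distillation term: pull the current model's
    predictions toward those of a frozen snapshot from the previous task
    to mitigate catastrophic forgetting. Kaizen's full loss additionally
    covers the feature extractor."""
    log_p = F.log_softmax(student_logits / T, dim=1)
    q = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p, q, reduction="batchmean") * (T * T)

print(distillation_loss(torch.randn(16, 100), torch.randn(16, 100)))
```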

Simultaneous reconstruction of sound speed and nonlinearity parameter in a paraxial model of vibro-acoustography in frequency domain

  • Authors: Barbara Kaltenbacher and Teresa Rauscher
  • Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
  • Arxiv link: https://arxiv.org/abs/2303.17236
  • Pdf link: https://arxiv.org/pdf/2303.17236
  • Abstract
    In this paper we consider the inverse problem of vibro-acoustography, a technique for enhancing ultrasound imaging by making use of nonlinear effects. It amounts to determining two spatially variable coefficients in a system of PDEs describing propagation of two directed sound beams and the wave resulting from their nonlinear interaction. To justify the use of Newton's method for solving this inverse problem, on one hand we verify well-definedness and differentiability of the forward operator corresponding to two versions of the PDE model; on the other hand we consider an all-at-once formulation of the inverse problem and prove convergence of Newton's method for its solution.

Computationally efficient predictive control based on ANN state-space model

  • Authors: Jan H. Hoekstra, Bence Cseppentő, Gerben I. Beintema, Maarten Schoukens, Zsolt Kollár, Roland Tóth
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.17305
  • Pdf link: https://arxiv.org/pdf/2303.17305
  • Abstract
    Artificial neural networks (ANN) have been shown to be flexible and effective function estimators for identification of nonlinear state-space models. However, if the resulting models are used directly for nonlinear model predictive control (NMPC), the resulting nonlinear optimization problem is often overly complex due to the size of the network, requires the use of high-order observers to track the states of the ANN model, and the overall control scheme exploits little of the structural properties or available autograd tools for these models. In this paper, we propose an efficient approach to auto-convert ANN state-space models to linear parameter-varying (LPV) form and solve predictive control problems by successive solutions of linear model predictive problems, corresponding to quadratic programs (QPs). Furthermore, we show how existing ANN identification methods, such as the SUBNET method that uses a state encoder, can provide efficient implementation of MPCs. The performance of the proposed approach is demonstrated via a simulation study on an unbalanced disc system.
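
The core conversion step, turning a trained ANN state-space model into local (A, B) matrices that a QP-based MPC can consume, can be sketched with autograd. This only illustrates the linearization idea; the paper's LPV embedding and SUBNET encoder details are not reproduced, and the toy network below is an assumption.

```python
import torch
from torch.autograd.functional import jacobian

nx, nu = 3, 1
net = torch.nn.Sequential(torch.nn.Linear(nx + nu, 16),
                          torch.nn.Tanh(),
                          torch.nn.Linear(16, nx))

def f(x, u):
    """Hypothetical ANN state-space model: x_next = f(x, u)."""
    return net(torch.cat([x, u]))

# Local model x+ ~ A x + B u + c around the current operating point;
# these matrices would parameterize one linear MPC (QP) subproblem.
x0, u0 = torch.zeros(nx), torch.zeros(nu)
A = jacobian(lambda x: f(x, u0), x0)   # (nx, nx)
B = jacobian(lambda u: f(x0, u), u0)   # (nx, nu)
c = f(x0, u0).detach() - A @ x0 - B @ u0
```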

Masked Autoencoders as Image Processors

  • Authors: Huiyu Duan, Wei Shen, Xiongkuo Min, Danyang Tu, Long Teng, Jia Wang, Guangtao Zhai
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17316
  • Pdf link: https://arxiv.org/pdf/2303.17316
  • Abstract
    Transformers have shown significant effectiveness for various vision tasks including both high-level vision and low-level vision. Recently, masked autoencoders (MAE) for feature pre-training have further unleashed the potential of Transformers, leading to state-of-the-art performances on various high-level vision tasks. However, the significance of MAE pre-training on low-level vision tasks has not been sufficiently explored. In this paper, we show that masked autoencoders are also scalable self-supervised learners for image processing tasks. We first present an efficient Transformer model considering both channel attention and shifted-window-based self-attention termed CSformer. Then we develop an effective MAE architecture for image processing (MAEIP) tasks. Extensive experimental results show that with the help of MAEIP pre-training, our proposed CSformer achieves state-of-the-art performance on various image processing tasks, including Gaussian denoising, real image denoising, single-image motion deblurring, defocus deblurring, and image deraining.

Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence

  • Authors: Anton Thielmann, Quentin Seifert, Arik Reuter, Elisabeth Bergherr, Benjamin Säfken
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2303.17324
  • Pdf link: https://arxiv.org/pdf/2303.17324
  • Abstract
    Extracting and identifying latent topics in large text corpora has gained increasing importance in Natural Language Processing (NLP). Most models, whether probabilistic models similar to Latent Dirichlet Allocation (LDA) or neural topic models, follow the same underlying approach of topic interpretability and topic extraction. We propose a method that incorporates a deeper understanding of both sentence and document themes, and goes beyond simply analyzing word frequencies in the data. This allows our model to detect latent topics that may include uncommon words or neologisms, as well as words not present in the documents themselves. Additionally, we propose several new evaluation metrics based on intruder words and similarity measures in the semantic space. We present correlation coefficients with human identification of intruder words and achieve near-human level results at the word-intrusion task. We demonstrate the competitive performance of our method with a large benchmark study, and achieve superior results compared to state-of-the-art topic modeling and document clustering models.
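
As a flavor of intruder-based evaluation, one simple variant checks whether an injected intruder word is the least similar item, in embedding space, to the centroid of a topic's top words. The function below is a generic sketch under that assumption, not the paper's exact metrics, and the toy vocabulary is invented.

```python
import numpy as np

def intruder_detected(topic_words, intruder, embed):
    """Return True if the intruder is the word least similar to the
    centroid of the topic's top words (unit-norm embeddings assumed)."""
    vecs = np.stack([embed(w) for w in topic_words])
    centroid = vecs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    sims = {w: float(embed(w) @ centroid) for w in topic_words + [intruder]}
    return min(sims, key=sims.get) == intruder

vocab = {"dog": [1.0, 0.0], "cat": [0.9, 0.1], "wolf": [0.8, 0.2], "tax": [0.0, 1.0]}
embed = lambda w: np.asarray(vocab[w]) / np.linalg.norm(vocab[w])
print(intruder_detected(["dog", "cat", "wolf"], "tax", embed))  # True
```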

Linear Insertion Deletion Codes in the High-Noise and High-Rate Regimes

  • Authors: Kuan Cheng, Zhengzhong Jin, Xin Li, Zhide Wei, Yu Zheng
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2303.17370
  • Pdf link: https://arxiv.org/pdf/2303.17370
  • Abstract
    This work continues the study of linear error correcting codes against adversarial insertion deletion errors (insdel errors). Previously, the work of Cheng, Guruswami, Haeupler, and Li \cite{CGHL21} showed the existence of asymptotically good linear insdel codes that can correct arbitrarily close to $1$ fraction of errors over some constant size alphabet, or achieve rate arbitrarily close to $1/2$ even over the binary alphabet. As shown in \cite{CGHL21}, these bounds are also the best possible. However, known explicit constructions in \cite{CGHL21}, and subsequent improved constructions by Con, Shpilka, and Tamo \cite{9770830} all fall short of meeting these bounds. Over any constant size alphabet, they can only achieve rate $< 1/8$ or correct $< 1/4$ fraction of errors; over the binary alphabet, they can only achieve rate $< 1/1216$ or correct $< 1/54$ fraction of errors. Apparently, previous techniques face inherent barriers to achieve rate better than $1/4$ or correct more than $1/2$ fraction of errors. In this work we give new constructions of such codes that meet these bounds, namely, asymptotically good linear insdel codes that can correct arbitrarily close to $1$ fraction of errors over some constant size alphabet, and binary asymptotically good linear insdel codes that can achieve rate arbitrarily close to $1/2$. All our constructions are efficiently encodable and decodable. Our constructions are based on a novel approach of code concatenation, which embeds the index information implicitly into codewords. This significantly differs from previous techniques and may be of independent interest. Finally, we also prove the existence of linear concatenated insdel codes with parameters that match random linear codes, and propose a conjecture about linear insdel codes.

Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions

  • Authors: Yicheng Luo, Jackie Kay, Edward Grefenstette, Marc Peter Deisenroth
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17396
  • Pdf link: https://arxiv.org/pdf/2303.17396
  • Abstract
    Offline reinforcement learning (RL) allows for the training of competent agents from offline datasets without any interaction with the environment. Online finetuning of such offline models can further improve performance. But how should we ideally finetune agents obtained from offline RL training? While offline RL algorithms can in principle be used for finetuning, in practice, their online performance improves slowly. In contrast, we show that it is possible to use standard online off-policy algorithms for faster improvement. However, we find this approach may suffer from policy collapse, where the policy undergoes severe performance deterioration during initial online learning. We investigate the issue of policy collapse and how it relates to data diversity, algorithm choices and online replay distribution. Based on these insights, we propose a conservative policy optimization procedure that can achieve stable and sample-efficient online learning from offline pretraining.

An Efficient Mobile Gateway Selection and Discovery Based-Routing Protocol in Heterogeneous LTE-VANET Networks

  • Authors: Driss Abada, Rachid Adrdor, Omar Boutkhoum, Adil Bohouch
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2303.17439
  • Pdf link: https://arxiv.org/pdf/2303.17439
  • Abstract
    Coupling cellular communication networks with vehicular ad hoc networks (VANET) can be a very promising way to provide Internet access to vehicles on the road. However, due to several specific characteristics of VANETs, making an efficient multi-hop routing from vehicular sources to the Internet gateways through Long Term Evolution (LTE) technology is still challenging. In this paper, an Internet mobile gateway selection scheme is proposed to elect more potential vehicles to behave as gateways to the Internet in VANETs. The discovery and selection of routes to those mobile gateways are then carried out via an efficient multiple-metrics-based relay selection mechanism. The objective is to select the most reliable route to the mobile gateways, by reducing the communication overhead and performing seamless handover. The proposed protocol is compared with a recent protocol in terms of packet delivery ratio, average end-to-end delay, and overhead. The results show that the proposed protocol significantly improves network performance compared to the other protocol.

NN-Copula-CD: A Copula-Guided Interpretable Neural Network for Change Detection in Heterogeneous Remote Sensing Images

  • Authors: Weiming Li, Xueqian Wang, Gang Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2303.17448
  • Pdf link: https://arxiv.org/pdf/2303.17448
  • Abstract
    Change detection (CD) in heterogeneous remote sensing images is a practical and challenging issue for real-life emergencies. In the past decade, the heterogeneous CD problem has significantly benefited from the development of deep neural networks (DNN). However, the data-driven DNNs always perform like a black box where the lack of interpretability limits the trustworthiness and controllability of DNNs in most practical CD applications. As a strong knowledge-driven tool to measure correlation between random variables, Copula theory has been introduced into CD, yet it suffers from non-robust CD performance without manual prior selection for Copula functions. To address the above issues, we propose a knowledge-data-driven heterogeneous CD method (NN-Copula-CD) based on the Copula-guided interpretable neural network. In our NN-Copula-CD, the mathematical characteristics of Copula are designed as the losses to supervise a simple fully connected neural network to learn the correlation between bi-temporal image patches, and then the changed regions are identified via binary classification for the correlation coefficients of all image patch pairs of the bi-temporal images. We conduct in-depth experiments on three datasets with multimodal images (e.g., Optical, SAR, and NIR), where the quantitative results and visualized analysis demonstrate both the effectiveness and interpretability of the proposed NN-Copula-CD.

HMES: A Scalable Human Mobility and Epidemic Simulation System with Fast Intervention Modeling

  • Authors: Haoyu Geng, Guanjie Zheng, Zhengqing Han, Hua Wei, Zhenhui Li
  • Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2303.17464
  • Pdf link: https://arxiv.org/pdf/2303.17464
  • Abstract
    Recently, the world has witnessed the most severe pandemic (COVID-19) in this century. Studies on epidemic prediction and simulation have received increasing attention. However, current methods suffer from three issues. First, most current studies focus on epidemic prediction, which cannot provide adequate support for intervention policy making. Second, most current interventions are based on population groups rather than fine-grained individuals, which cannot target measures toward the infected individuals and may waste medical resources. Third, current simulations are not efficient and flexible enough for large-scale complex systems. In this paper, we propose a new epidemic simulation framework called HMES to address the above three challenges. The proposed framework covers a full pipeline of epidemic simulation and enables comprehensive fine-grained control at a large scale. In addition, we conduct experiments on real COVID-19 data. HMES demonstrates more accurate modeling of disease transmission for populations of up to 300 million people, with up to 3 times acceleration compared to state-of-the-art methods.

PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation

  • Authors: Qitao Zhao, Ce Zheng, Mengyuan Liu, Pichao Wang, Chen Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17472
  • Pdf link: https://arxiv.org/pdf/2303.17472
  • Abstract
    Recently, transformer-based methods have gained significant success in sequential 2D-to-3D lifting human pose estimation. As a pioneering work, PoseFormer captures spatial relations of human joints in each video frame and human dynamics across frames with cascaded transformer layers and has achieved impressive performance. However, in real scenarios, the performance of PoseFormer and its follow-ups is limited by two factors: (a) The length of the input joint sequence; (b) The quality of 2D joint detection. Existing methods typically apply self-attention to all frames of the input sequence, causing a huge computational burden when the frame number is increased to obtain advanced estimation accuracy, and they are not robust to noise naturally brought by the limited capability of 2D joint detectors. In this paper, we propose PoseFormerV2, which exploits a compact representation of lengthy skeleton sequences in the frequency domain to efficiently scale up the receptive field and boost robustness to noisy 2D joint detection. With minimum modifications to PoseFormer, the proposed method effectively fuses features both in the time domain and frequency domain, enjoying a better speed-accuracy trade-off than its precursor. Extensive experiments on two benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) demonstrate that the proposed approach significantly outperforms the original PoseFormer and other transformer-based variants. Code is released at https://github.com/QitaoZhao/PoseFormerV2.

Efficient distributed representations beyond negative sampling

  • Authors: Lorenzo Dall'Amico, Enrico Maria Belliardo
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2303.17475
  • Pdf link: https://arxiv.org/pdf/2303.17475
  • Abstract
    This article describes an efficient method to learn distributed representations, also known as embeddings. This is accomplished by minimizing an objective function similar to the one introduced in the Word2Vec algorithm and later adopted in several works. The optimization computational bottleneck is the calculation of the softmax normalization constants, for which a number of operations scaling quadratically with the sample size is required. This complexity is unsuited for large datasets and negative sampling is a popular workaround, allowing one to obtain distributed representations in linear time with respect to the sample size. Negative sampling, however, changes the loss function and hence solves a different optimization problem from the one originally proposed. Our contribution is to show that the softmax normalization constants can be estimated in linear time, allowing us to design an efficient optimization strategy to learn distributed representations. We test our approximation on two popular applications related to word and node embeddings. The results show competitive accuracy with respect to negative sampling, with a remarkably lower computational time.
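
The abstract does not detail the estimator, so the sketch below only conveys the general idea of replacing the quadratic-cost exact softmax normalization with a linear-time estimate; here a plain uniform-subsampling Monte Carlo surrogate is used, which is an assumption rather than the paper's method.

```python
import numpy as np

def partition_estimate(W, v, num_samples=64, rng=None):
    """Estimate Z(v) = sum_j exp(w_j . v) from a uniform subsample of the
    rows of W, scaled back up to the full vocabulary size n."""
    rng = rng or np.random.default_rng()
    n = W.shape[0]
    m = min(num_samples, n)
    idx = rng.choice(n, size=m, replace=False)
    return n * np.mean(np.exp(W[idx] @ v))
```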

Teaching contact-rich tasks from visual demonstrations by constraint extraction

  • Authors: Christian Hegeler, Filippo Rozzi, Loris Roveda, Kevin Haninger
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17481
  • Pdf link: https://arxiv.org/pdf/2303.17481
  • Abstract
    Contact-rich manipulation involves kinematic constraints on the task motion, typically with discrete transitions between these constraints during the task. Allowing the robot to detect and reason about these contact constraints can support robust and dynamic manipulation, but how can these contact models be efficiently learned? Purely visual observations are an attractive data source, allowing passive task demonstrations with unmodified objects. Existing approaches for vision-only learning from demonstration are effective in pick-and-place applications and planar tasks. Nevertheless, limited accuracy, occlusions, and unobserved task dynamics can limit their robustness in contact-rich manipulation. To use visual demonstrations for contact-rich robotic tasks, we consider the demonstration of pose trajectories with transitions between holonomic kinematic constraints, first clustering the trajectories into discrete contact modes, then fitting kinematic constraints for each mode. The fit constraints are then used to (i) detect contact online with force/torque measurements and (ii) plan the robot policy with respect to the active constraint. We demonstrate the approach with real experiments on cabling and rake tasks, showing that it gives robust manipulation through contact transitions.

Edge Ranking of Graphs in Transportation Networks using a Graph Neural Network (GNN)

  • Authors: Debasish Jana, Sven Malama, Sriram Narasimhan, Ertugrul Taciroglu
  • Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17485
  • Pdf link: https://arxiv.org/pdf/2303.17485
  • Abstract
    Many networks, such as transportation, power, and water distribution, can be represented as graphs. A crucial challenge in graph representations is identifying the importance of graph edges and their influence on overall network efficiency and information flow performance. For example, important edges in a transportation network are those roads that, when affected, will significantly alter the network's overall efficiency. A commonly used approach to finding such important edges is "edge betweenness centrality" (EBC), an edge ranking measure to determine the influential edges of the graph based on connectivity and information spread. Computing the EBC utilizing the common Brandes algorithm involves calculating the shortest paths for every node pair, which can be computationally expensive and restrictive, especially for large graphs. Changes in the graph parameters, e.g., in the edge weight or the addition and deletion of nodes or edges, require the recalculation of the EBC. As the main contribution, we propose an approximate method to estimate the EBC using a Graph Neural Network (GNN), a deep learning-based approach. We show that it is computationally efficient compared to the conventional method, especially for large graphs. The proposed method of GNN-based edge ranking is evaluated on several synthetic graphs and a real-world transportation data set. We show that this framework can estimate the approximate edge ranking much faster compared to the conventional method. This approach is inductive, i.e., training and testing are performed on different sets of graphs with varying numbers of nodes and edges. The proposed method is especially suitable for applications on large-scale networks when edge information is desired, for example, in urban infrastructure improvement projects, power and water network resilience analyses, and optimizing resource allocations in engineering networks.
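
To make the supervision setup concrete: exact EBC labels for training such a GNN can be generated with the (expensive) Brandes algorithm as implemented in NetworkX, and the network then learns to regress them on unseen graphs. The snippet shows only this hypothetical label-generation step, not the authors' GNN architecture.

```python
import networkx as nx

# Exact-but-slow edge betweenness centrality as regression targets.
G = nx.erdos_renyi_graph(n=200, p=0.05, seed=0)
ebc = nx.edge_betweenness_centrality(G)          # Brandes' algorithm
labels = [((u, v), score) for (u, v), score in ebc.items()]
# A GNN trained on many such (graph, labels) pairs can then approximate
# the ranking inductively, far faster than recomputing EBC per change.
```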

3D Line Mapping Revisited

  • Authors: Shaohui Liu, Yifan Yu, Rémi Pautrat, Marc Pollefeys, Viktor Larsson
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17504
  • Pdf link: https://arxiv.org/pdf/2303.17504
  • Abstract
    In contrast to sparse keypoints, a handful of line segments can concisely encode the high-level scene layout, as they often delineate the main structural elements. In addition to offering strong geometric cues, they are also omnipresent in urban landscapes and indoor scenes. Despite their apparent advantages, current line-based reconstruction methods are far behind their point-based counterparts. In this paper we aim to close the gap by introducing LIMAP, a library for 3D line mapping that robustly and efficiently creates 3D line maps from multi-view imagery. This is achieved through revisiting the degeneracy problem of line triangulation, carefully crafted scoring and track building, and exploiting structural priors such as line coincidence, parallelism, and orthogonality. Our code integrates seamlessly with existing point-based Structure-from-Motion methods and can leverage their 3D points to further improve the line reconstruction. Furthermore, as a byproduct, the method is able to recover 3D association graphs between lines and points / vanishing points (VPs). In thorough experiments, we show that LIMAP significantly outperforms existing approaches for 3D line mapping. Our robust 3D line maps also open up new research directions. We show two example applications: visual localization and bundle adjustment, where integrating lines alongside points yields the best results. Code is available at https://github.com/cvg/limap.

Sum-of-Squares Lower Bounds for Densest $k$-Subgraph

  • Authors: Chris Jones, Aaron Potechin, Goutham Rajendran, Jeff Xu
  • Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2303.17506
  • Pdf link: https://arxiv.org/pdf/2303.17506
  • Abstract
    Given a graph and an integer $k$, Densest $k$-Subgraph is the algorithmic task of finding the subgraph on $k$ vertices with the maximum number of edges. This is a fundamental problem that has been subject to intense study for decades, with applications spanning a wide variety of fields. The state-of-the-art algorithm is an $O(n^{1/4 + \epsilon})$-factor approximation (for any $\epsilon > 0$) due to Bhaskara et al. [STOC '10]. Moreover, the so-called log-density framework predicts that this is optimal, i.e. it is impossible for an efficient algorithm to achieve an $O(n^{1/4 - \epsilon})$-factor approximation. In the average case, Densest $k$-Subgraph is a prototypical noisy inference task which is conjectured to exhibit a statistical-computational gap. In this work, we provide the strongest evidence yet of hardness for Densest $k$-Subgraph by showing matching lower bounds against the powerful Sum-of-Squares (SoS) algorithm, a meta-algorithm based on convex programming that achieves state-of-the-art algorithmic guarantees for many optimization and inference problems. For $k \leq n^{\frac{1}{2}}$, we obtain a degree $n^{\delta}$ SoS lower bound for the hard regime as predicted by the log-density framework. To show this, we utilize the modern framework for proving SoS lower bounds on average-case problems pioneered by Barak et al. [FOCS '16]. A key issue is that small denser-than-average subgraphs in the input will greatly affect the value of the candidate pseudoexpectation operator around the subgraph. To handle this challenge, we devise a novel matrix factorization scheme based on the positive minimum vertex separator. We then prove an intersection tradeoff lemma to show that the error terms when using this separator are indeed small.

Learning in Factored Domains with Information-Constrained Visual Representations

  • Authors: Tyler Malloy, Miao Liu, Matthew D. Riemer, Tim Klinger, Gerald Tesauro, Chris R. Sims
  • Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Neurons and Cognition (q-bio.NC)
  • Arxiv link: https://arxiv.org/abs/2303.17508
  • Pdf link: https://arxiv.org/pdf/2303.17508
  • Abstract
    Humans learn quickly even in tasks that contain complex visual information. This is due in part to the efficient formation of compressed representations of visual information, allowing for better generalization and robustness. However, compressed representations alone are insufficient for explaining the high speed of human learning. Reinforcement learning (RL) models that seek to replicate this impressive efficiency may do so through the use of factored representations of tasks. These informationally simplistic task representations are motivated similarly to the use of compressed representations of visual information. Recent studies have connected biological visual perception to disentangled and compressed representations. This raises the question of how humans learn to efficiently represent visual information in a manner useful for learning tasks. In this paper we present a model of human factored representation learning based on an altered form of a $\beta$-Variational Auto-encoder used in a visual learning task. Modelling results demonstrate a trade-off in the informational complexity of the model's latent dimension space between the speed of learning and the accuracy of reconstructions.

Hybrid Dealiasing of Complex Convolutions

  • Authors: Noel Murasko, John C. Bowman
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.17510
  • Pdf link: https://arxiv.org/pdf/2303.17510
  • Abstract
    Efficient algorithms for computing linear convolutions based on the fast Fourier transform are developed. A hybrid approach is described that combines the conventional practice of explicit dealiasing (explicitly padding the input data with zeros) and implicit dealiasing (mathematically accounting for these zero values). The new approach generalizes implicit dealiasing to arbitrary padding ratios and includes explicit dealiasing as a special case. Unlike existing implementations of implicit dealiasing, hybrid dealiasing tailors its subtransform sizes to the convolution geometry. Multidimensional convolutions are implemented with hybrid dealiasing by decomposing them into lower-dimensional convolutions. Convolutions of complex-valued and Hermitian inputs of equal length are illustrated with pseudocode and implemented in the open-source FFTW++ library. Hybrid dealiasing is shown to outperform explicit dealiasing in one, two, and three dimensions.
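
For reference, the conventional explicit-dealiasing baseline that the hybrid approach generalizes pads both inputs with zeros so that the FFT's circular convolution coincides with the linear one; implicit and hybrid dealiasing account for those zeros mathematically instead of storing and transforming them. A minimal NumPy sketch of the explicit variant:

```python
import numpy as np

def linear_convolution_fft(a, b):
    """Explicit dealiasing: zero-pad to at least len(a)+len(b)-1 so the
    circular convolution computed via the FFT equals the linear one."""
    m = len(a) + len(b) - 1
    n = 1 << (m - 1).bit_length()      # next power of two, for FFT speed
    fa, fb = np.fft.fft(a, n), np.fft.fft(b, n)
    return np.fft.ifft(fa * fb)[:m]

a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])
print(np.real(linear_convolution_fft(a, b)))   # [ 4. 13. 22. 15.]
```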

Power-Optimal HARQ Protocol for Reliable Free Space Optical Communication

  • Authors: Georgios D. Chondrogiannis, Nikos A. Mitsiou, Nestor D. Chatzidiamantis, Alexandros-Apostolos A. Boulogeorgos, George K. Karagiannidis
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2303.17512
  • Pdf link: https://arxiv.org/pdf/2303.17512
  • Abstract
    This paper investigates the usage of hybrid automatic repeat request (HARQ) protocols for power-efficient and reliable communications over free space optical (FSO) links. By exploiting the large coherence time of the FSO channel, the proposed transmission schemes combat turbulence-induced fading by retransmitting the failed packets in the same coherence interval. To assess the performance of the presented HARQ technique, we extract a theoretical framework for the outage performance. In more detail, a closed-form expression for the outage probability (OP) is reported and an approximation for the high signal-to-noise ratio (SNR) region is extracted. Building upon the theoretical framework, we formulate a transmission power allocation problem throughout the retransmission rounds. This optimization problem is solved numerically through the use of an iterative algorithm. In addition, the average throughput of the HARQ schemes under consideration is examined. Simulation results validate the theoretical analysis under different turbulence conditions and demonstrate the performance improvement, in terms of both OP and throughput, of the proposed HARQ schemes compared to fixed transmit power HARQ benchmarks.

Nonlinear Approximation with Subsampled Rank-1 Lattices

  • Authors: Felix Bartel, Fabian Taubert
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.17541
  • Pdf link: https://arxiv.org/pdf/2303.17541
  • Abstract
    In this paper we approximate high-dimensional functions $f\colon\mathbb T^d\to\mathbb C$ by sparse trigonometric polynomials based on function evaluations. Recently it was shown that a dimension-incremental sparse Fourier transform (SFT) approach does not require the signal to be exactly sparse and is applicable in this setting. We combine this approach with subsampling techniques for rank-1 lattices. This way our approach benefits from the underlying structure in the sampling points making fast Fourier algorithms applicable whilst achieving the good sampling complexity of random points (logarithmic oversampling). In our analysis we show detection guarantees of the frequencies corresponding to the Fourier coefficients of largest magnitude. In numerical experiments we make a comparison to full rank-1 lattices and uniformly random points to confirm our findings.
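
The structural trick that makes rank-1 lattices attractive is easy to show: every d-dimensional frequency k collapses to the one-dimensional bin <k, z> mod M, so a trigonometric polynomial can be evaluated at all M lattice nodes with a single length-M FFT. The generating vector and frequencies below are toy assumptions, and the subsampling analyzed in the paper is not included.

```python
import numpy as np

def eval_on_rank1_lattice(freqs, coeffs, z, M):
    """Evaluate f(x) = sum_k c_k exp(2*pi*i <k, x>) at the lattice nodes
    x_j = (j * z / M) mod 1 via one FFT of length M."""
    ghat = np.zeros(M, dtype=complex)
    for k, c in zip(freqs, coeffs):
        ghat[int(np.dot(k, z)) % M] += c
    return np.fft.ifft(ghat) * M       # np.fft.ifft includes a 1/M factor

# Toy example: d = 2 with three active frequencies.
freqs = [np.array([1, 0]), np.array([0, 2]), np.array([3, 1])]
coeffs = [1.0, 0.5, 0.25]
vals = eval_on_rank1_lattice(freqs, coeffs, z=np.array([1, 5]), M=16)
```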

Active User Identification in Fast Fading Massive Random Access Channels

  • Authors: Jyotish Robin, Elza Erkip
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2303.17543
  • Pdf link: https://arxiv.org/pdf/2303.17543
  • Abstract
    Reliable and prompt identification of active users is critical for enabling random access in massive machine-to-machine type networks which typically operate within stringent access delay and energy constraints. In this paper, an energy efficient active user identification protocol is envisioned in which the active users simultaneously transmit On-Off Keying (OOK) modulated preambles whereas the base station uses non-coherent detection to avoid the channel estimation overheads. The minimum number of channel-uses required for active user identification in the asymptotic regime of the total number of users $\ell$, when the number of active devices $k$ scales as $k = \Theta(1)$, is characterized along with an achievability scheme relying on the equivalence of activity detection to a group testing problem. A practical scheme for active user identification based on a belief propagation strategy is also proposed and its performance is compared against the theoretical bounds.
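
Since the achievability argument leans on the equivalence with group testing, a minimal non-coherent decoder in that spirit is the COMP rule: any user that participates in a preamble slot with (near-)zero received energy must be inactive, and everyone else is declared active. COMP can over-declare, returning a superset of the truly active set; the noiseless toy example is an assumption for illustration only.

```python
import numpy as np

def comp_decode(tests, energy, threshold):
    """COMP-style group-testing decoder for OOK activity detection."""
    negative = energy < threshold                 # slots with ~no energy
    inactive = tests[negative].sum(axis=0) > 0    # users in a negative slot
    return ~inactive                              # declared-active mask

rng = np.random.default_rng(1)
tests = rng.integers(0, 2, size=(5, 8))   # slot-by-user participation matrix
x = np.zeros(8); x[[2, 6]] = 1.0          # users 2 and 6 are active
energy = tests @ x                        # ideal per-slot received energy
print(np.flatnonzero(comp_decode(tests, energy, 0.5)))  # superset of {2, 6}
```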

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

  • Authors: Chenpng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao, Jiang Bian
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2303.17550
  • Pdf link: https://arxiv.org/pdf/2303.17550
  • Abstract
    While recent research has made significant progress in speech-driven talking face generation, the quality of the generated video still lags behind that of real recordings. One reason for this is the use of handcrafted intermediate representations like facial landmarks and 3DMM coefficients, which are designed based on human knowledge and are insufficient to precisely describe facial movements. Additionally, these methods require an external pretrained model for extracting these representations, whose performance sets an upper bound on talking face generation. To address these limitations, we propose a novel method called DAE-Talker that leverages data-driven latent representations obtained from a diffusion autoencoder (DAE). DAE contains an image encoder that encodes an image into a latent vector and a DDIM image decoder that reconstructs the image from it. We train our DAE on talking face video frames and then extract their latent representations as the training target for a Conformer-based speech2latent model. This allows DAE-Talker to synthesize full video frames and produce natural head movements that align with the content of speech, rather than relying on a predetermined head pose from a template video. We also introduce pose modelling in speech2latent for pose controllability. Additionally, we propose a novel method for generating continuous video frames with the DDIM image decoder trained on individual frames, eliminating the need for modelling the joint distribution of consecutive frames directly. Our experiments show that DAE-Talker outperforms existing popular methods in lip-sync, video fidelity, and pose naturalness. We also conduct ablation studies to analyze the effectiveness of the proposed techniques and demonstrate the pose controllability of DAE-Talker.

DDP: Diffusion Model for Dense Visual Prediction

  • Authors: Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17559
  • Pdf link: https://arxiv.org/pdf/2303.17559
  • Abstract
    We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a random Gaussian distribution, guided by the image. The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline. Without task-specific design and architecture customization, DDP is easy to generalize to most dense prediction tasks, e.g., semantic segmentation and depth estimation. In addition, DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods. We show top results on three representative tasks with six diverse benchmarks; without tricks, DDP achieves state-of-the-art or competitive performance on each task compared to the specialist counterparts, for example, semantic segmentation (83.9 mIoU on Cityscapes), BEV map segmentation (70.6 mIoU on nuScenes), and depth estimation (0.05 REL on KITTI). We hope that our approach will serve as a solid baseline and facilitate future research.

Using AI to Measure Parkinson's Disease Severity at Home

  • Authors: Md Saiful Islam, Wasifur Rahman, Abdelrahman Abdelkader, Phillip T. Yang, Sangwu Lee, Jamie L. Adams, Ruth B. Schneider, E. Ray Dorsey, Ehsan Hoque
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17573
  • Pdf link: https://arxiv.org/pdf/2303.17573
  • Abstract
    We present an artificial intelligence system to remotely assess the motor performance of individuals with Parkinson's disease (PD). Participants performed a motor task (i.e., tapping fingers) in front of a webcam, and data from 250 global participants were rated by three expert neurologists following the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS). The neurologists' ratings were highly reliable, with an intra-class correlation coefficient (ICC) of 0.88. We developed computer algorithms to obtain objective measurements that align with the MDS-UPDRS guideline and are strongly correlated with the neurologists' ratings. Our machine learning model trained on these measures outperformed an MDS-UPDRS certified rater, with a mean absolute error (MAE) of 0.59 compared to the rater's MAE of 0.79. However, the model performed slightly worse than the expert neurologists (0.53 MAE). The methodology can be replicated for similar motor tasks, providing the possibility of evaluating individuals with PD and other movement disorders remotely, objectively, and in areas with limited access to neurological care.

Human-Robot Interaction using VAHR: Virtual Assistant, Human, and Robots in the Loop

  • Authors: Ahmad Amine, Mostafa Aldilati, Hadi Hasan, Noel Maalouf, Imad H. Elhajj
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17582
  • Pdf link: https://arxiv.org/pdf/2303.17582
  • Abstract
    Robots have become ubiquitous tools in various industries and households, highlighting the importance of human-robot interaction (HRI). This has increased the need for easy and accessible communication between humans and robots. Recent research has focused on the intersection of virtual assistant technology, such as Amazon's Alexa, with robots and its effect on HRI. This paper presents the Virtual Assistant, Human, and Robots in the loop (VAHR) system, which utilizes bidirectional communication to control multiple robots through Alexa. VAHR's performance was evaluated through a human-subjects experiment, comparing objective and subjective metrics of traditional keyboard and mouse interfaces to VAHR. The results showed that VAHR required 41% less Robot Attention Demand and ensured 91% more Fan-out time compared to the standard method. Additionally, VAHR led to a 62.5% improvement in multi-tasking, highlighting the potential for efficient human-robot interaction in physically- and mentally-demanding scenarios. However, subjective metrics revealed a need for human operators to build confidence and trust with this new method of operation.

Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models

  • Authors: Eric Zhang, Kai Wang, Xingqian Xu, Zhangyang Wang, Humphrey Shi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17591
  • Pdf link: https://arxiv.org/pdf/2303.17591
  • Abstract
    The unlearning problem of deep learning models, once primarily an academic concern, has become a prevalent issue in the industry. The significant advances in text-to-image generation techniques have prompted global discussions on privacy, copyright, and safety, as numerous unauthorized personal IDs, content, artistic creations, and potentially harmful materials have been learned by these models and later utilized to generate and distribute uncontrolled content. To address this challenge, we propose Forget-Me-Not, an efficient and low-cost solution designed to safely remove specified IDs, objects, or styles from a well-configured text-to-image model in as little as 30 seconds, without impairing its ability to generate other content. Alongside our method, we introduce the Memorization Score (M-Score) and ConceptBench to measure the models' capacity to generate general concepts, grouped into three primary categories: ID, object, and style. Using M-Score and ConceptBench, we demonstrate that Forget-Me-Not can effectively eliminate targeted concepts while maintaining the model's performance on other concepts. Furthermore, Forget-Me-Not offers two practical extensions: a) removal of potentially harmful or NSFW content, and b) enhancement of model accuracy, inclusion and diversity through concept correction and disentanglement. It can also be adapted as a lightweight model patch for Stable Diffusion, allowing for concept manipulation and convenient distribution. To encourage future research in this critical area and promote the development of safe and inclusive generative models, we will open-source our code and ConceptBench at https://github.com/SHI-Labs/Forget-Me-Not.

MobileInst: Video Instance Segmentation on the Mobile

  • Authors: Renhong Zhang, Tianheng Cheng, Shusheng Yang, Haoyi Jiang, Shuai Zhang, Jiancheng Lyu, Xin Li, Xiaowen Ying, Dashan Gao, Wenyu Liu, Xinggang Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17594
  • Pdf link: https://arxiv.org/pdf/2303.17594
  • Abstract
    Although recent approaches aiming for video instance segmentation have achieved promising results, it is still difficult to employ those approaches for real-world applications on mobile devices, which mainly suffer from (1) heavy computation and memory cost and (2) complicated heuristics for tracking objects. To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices. Firstly, MobileInst adopts a mobile vision transformer to extract multi-level semantic features and presents an efficient query-based dual-transformer instance decoder for mask kernels and a semantic-enhanced mask decoder to generate instance segmentation per frame. Secondly, MobileInst exploits simple yet effective kernel reuse and kernel association to track objects for video instance segmentation. Further, we propose temporal query passing to enhance the tracking ability for kernels. We conduct experiments on COCO and YouTube-VIS datasets to demonstrate the superiority of MobileInst and evaluate the inference latency on a mobile CPU core of Qualcomm Snapdragon-778G, without other methods of acceleration. On the COCO dataset, MobileInst achieves 30.5 mask AP and 176 ms on the mobile CPU, which reduces the latency by 50% compared to the previous SOTA. For video instance segmentation, MobileInst achieves 35.0 AP on YouTube-VIS 2019 and 30.1 AP on YouTube-VIS 2021. Code will be available to facilitate real-world applications and future research.

Token Merging for Fast Stable Diffusion

  • Authors: Daniel Bolya, Judy Hoffman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17604
  • Pdf link: https://arxiv.org/pdf/2303.17604
  • Abstract
    The landscape of image generation has been forever changed by open vocabulary diffusion models. However, at their core these models use transformers, which makes generation slow. Better implementations to increase the throughput of these transformers have emerged, but they still evaluate the entire model. In this paper, we instead speed up diffusion models by exploiting natural redundancy in generated images by merging redundant tokens. After making some diffusion-specific improvements to Token Merging (ToMe), our ToMe for Stable Diffusion can reduce the number of tokens in an existing Stable Diffusion model by up to 60% while still producing high quality images without any extra training. In the process, we speed up image generation by up to 2x and reduce memory consumption by up to 5.6x. Furthermore, this speed-up stacks with efficient implementations such as xFormers, minimally impacting quality while being up to 5.4x faster for large images. Code is available at https://github.com/dbolya/tomesd.
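
At its core, ToMe's bipartite soft matching splits the tokens into two alternating sets, scores (src, dst) pairs by cosine similarity, and averages the r most mergeable src tokens into their best dst partner. The single-sequence sketch below follows that outline but omits the real method's details (e.g., token-size tracking for proportional attention and the diffusion-specific improvements), and it does not preserve token order.

```python
import torch
import torch.nn.functional as F

def tome_merge(x, r):
    """Merge r of the most redundant tokens in x of shape (tokens, channels)."""
    src, dst = x[::2], x[1::2]
    sim = F.normalize(src, dim=-1) @ F.normalize(dst, dim=-1).T
    best_sim, best_dst = sim.max(dim=-1)          # best partner per src token
    merge_ids = best_sim.topk(r).indices          # r most mergeable src tokens
    keep = torch.ones(len(src), dtype=torch.bool)
    keep[merge_ids] = False
    dst = dst.clone()
    for i in merge_ids.tolist():                  # fold src into its partner
        dst[best_dst[i]] = (dst[best_dst[i]] + src[i]) / 2
    return torch.cat([src[keep], dst], dim=0)

x = torch.randn(16, 64)
print(tome_merge(x, r=4).shape)                   # torch.Size([12, 64])
```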

SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

  • Authors: Xuanyao Chen, Zhijian Liu, Haotian Tang, Li Yi, Hang Zhao, Song Han
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17605
  • Pdf link: https://arxiv.org/pdf/2303.17605
  • Abstract
    High-resolution images enable neural networks to learn richer visual representations. However, this improved performance comes at the cost of growing computational complexity, hindering their usage in latency-sensitive applications. As not all pixels are equal, skipping computations for less-important regions offers a simple and effective measure to reduce the computation. This, however, is hard to translate into actual speedup for CNNs since it breaks the regularity of the dense convolution workload. In this paper, we introduce SparseViT, which revisits activation sparsity for recent window-based vision transformers (ViTs). As window attentions are naturally batched over blocks, actual speedup with window activation pruning becomes possible: i.e., ~50% latency reduction with 60% sparsity. Different layers should be assigned with different pruning ratios due to their diverse sensitivities and computational costs. We introduce sparsity-aware adaptation and apply evolutionary search to efficiently find the optimal layerwise sparsity configuration within the vast search space. SparseViT achieves speedups of 1.5x, 1.4x, and 1.3x compared to its dense counterpart in monocular 3D object detection, 2D instance segmentation, and 2D semantic segmentation, respectively, with negligible to no loss of accuracy.

Keyword: faster

Urgency-aware Routing in Single Origin-destination Itineraries through Artificial Currencies

  • Authors: Leonardo Pedroso, W.P.M.H. Heemels, Mauro Salazar
  • Subjects: Systems and Control (eess.SY); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2303.16945
  • Pdf link: https://arxiv.org/pdf/2303.16945
  • Abstract
    Within mobility systems, the presence of self-interested users can lead to aggregate routing patterns that are far from the societal optimum which could be achieved by centrally controlling the users' choices. In this paper, we design a fair incentive mechanism to steer the selfish behavior of the users to align with the societally optimal aggregate routing. The proposed mechanism is based on an artificial currency that cannot be traded or bought, but only spent or received when traveling. Specifically, we consider a parallel-arc network with a single origin and destination node within a repeated game setting whereby each user chooses from one of the available arcs to reach their destination on a daily basis. In this framework, taking faster routes comes at a cost, whereas taking slower routes is incentivized by a reward. The users are thus playing against their future selves when choosing their present actions. To capture this complex behavior, we assume the users to be rational and to minimize an urgency-weighted combination of their immediate and future discomfort. To design the optimal pricing, we first derive a closed-form expression for the best individual response strategy. Second, we formulate the pricing design problem for each arc to achieve the societally optimal aggregate flows, and reformulate it so that it can be solved with gradient-free optimization methods. Our numerical simulations show that it is possible to achieve a near-optimal routing whilst significantly reducing the users' perceived discomfort when compared to a centralized optimal but urgency-unaware policy.

PopSparse: Accelerated block sparse matrix multiplication on IPU

  • Authors: Zhiyi Li, Douglas Orr, Valeriu Ohan, Godfrey Da costa, Tom Murray, Adam Sanders, Deniz Beker, Dominic Masters
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16999
  • Pdf link: https://arxiv.org/pdf/2303.16999
  • Abstract
    Reducing the computational cost of running large scale neural networks using sparsity has attracted great attention in the deep learning community. While much success has been achieved in reducing FLOP and parameter counts while maintaining acceptable task performance, achieving actual speed improvements has typically been much more difficult, particularly on general purpose accelerators (GPAs) such as NVIDIA GPUs using low precision number formats. In this work we introduce PopSparse, a library that enables fast sparse operations on Graphcore IPUs by leveraging both the unique hardware characteristics of IPUs as well as any block structure defined in the data. We target two different types of sparsity: static, where the sparsity pattern is fixed at compile-time; and dynamic, where it can change each time the model is run. We present benchmark results for matrix multiplication for both of these modes on IPU with a range of block sizes, matrix sizes and densities. Results indicate that the PopSparse implementations are faster than dense matrix multiplications on IPU at a range of sparsity levels with large matrix size and block size. Furthermore, static sparsity in general outperforms dynamic sparsity. While previous work on GPAs has shown speedups only for very high sparsity (typically 99% and above), the present work demonstrates that our static sparse implementation outperforms equivalent dense calculations in FP16 at lower sparsity (around 90%).

Overcoming Challenges to Continuous Integration in HPC

  • Authors: Todd Gamblin, Daniel S. Katz
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.17034
  • Pdf link: https://arxiv.org/pdf/2303.17034
  • Abstract
    Continuous integration (CI) has become a ubiquitous practice in modern software development, with major code hosting services offering free automation on popular platforms. CI offers major benefits, as it enables detecting bugs in code prior to committing changes. While high-performance computing (HPC) research relies heavily on software, HPC machines are not considered "common" platforms. This presents several challenges that hinder the adoption of CI in HPC environments, making it difficult to maintain bug-free HPC projects, and resulting in adverse effects on the research community. In this article, we explore the challenges that impede HPC CI, such as hardware diversity, security, isolation, administrative policies, and non-standard authentication, environments, and job submission mechanisms. We propose several solutions that could enhance the quality of HPC software and the experience of developers. Implementing these solutions would require significant changes at HPC centers, but if these changes are made, it would ultimately enable faster and better science.

ACM with Overlapping Partitions: Implementation and Periodicity Analysis

  • Authors: Anthony O'Dea
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2303.17069
  • Pdf link: https://arxiv.org/pdf/2303.17069
  • Abstract
    The Arnold Cat Map (ACM) is a popular chaotic map used in image encryption. Chaotic maps are known for their sensitivity to initial conditions and their ability to mix, or rearrange, pixels. However, ACM is periodic, and the period is relatively short. This periodicity decreases the effective key space for a cryptosystem. Further, ACM can only be performed on square matrices. For non-square images, this issue can be solved by performing ACM on multiple square partitions of the image. If these partitions overlap, the periodicity will greatly increase. The resulting system will be referred to as overlapping ACM or OACM. This paper will cover the implementation and periodicity analysis for these overlapping systems, which previous papers involving similar overlapping block partitions did not. Viewing OACM as a scan as opposed to a map allows for faster implementation and period analysis.
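
The periodicity being analysed is easy to reproduce for the square (non-overlapping) case: with one common parameterization of the cat map, (x, y) -> (x + y, x + 2y) mod n, the period is the smallest p for which the map matrix raised to p is the identity mod n. The brute-force sketch below computes just that; the paper's overlapping-partition analysis is not reproduced.

```python
import numpy as np

def acm_period(n):
    """Smallest p with [[1, 1], [1, 2]]^p = I (mod n), i.e. the number of
    ACM iterations after which every pixel of an n x n image returns home."""
    M = np.array([[1, 1], [1, 2]])
    P, p = M % n, 1
    I = np.eye(2, dtype=int)
    while not np.array_equal(P, I):
        P = (P @ M) % n
        p += 1
    return p

print([acm_period(n) for n in (2, 3, 5, 101, 512)])
```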

TreePiece: Faster Semantic Parsing via Tree Tokenization

  • Authors: Sid Wang, Akshat Shrivastava, Sasha Livshits
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17161
  • Pdf link: https://arxiv.org/pdf/2303.17161
  • Abstract
    Autoregressive (AR) encoder-decoder neural networks have proved successful in many NLP problems, including Semantic Parsing -- a task that translates natural language to machine-readable parse trees. However, the sequential prediction process of AR models can be slow. To accelerate AR for semantic parsing, we introduce a new technique called TreePiece that tokenizes a parse tree into subtrees and generates one subtree per decoding step. On TopV2 benchmark, TreePiece shows 4.6 times faster decoding speed than standard AR, and comparable speed but significantly higher accuracy compared to Non-Autoregressive (NAR).

DPP-based Client Selection for Federated Learning with Non-IID Data

  • Authors: Yuxuan Zhang, Chao Xu, Howard H. Yang, Xijun Wang, Tony Q. S. Quek
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17358
  • Pdf link: https://arxiv.org/pdf/2303.17358
  • Abstract
    This paper proposes a client selection (CS) method to tackle the communication bottleneck of federated learning (FL) while concurrently coping with FL's data heterogeneity issue. Specifically, we first analyze the effect of CS in FL and show that FL training can be accelerated by adequately choosing participants to diversify the training dataset in each round of training. Based on this, we leverage data profiling and determinantal point process (DPP) sampling techniques to develop an algorithm termed Federated Learning with DPP-based Participant Selection (FL-DP$^3$S). This algorithm effectively diversifies the participants' datasets in each round of training while preserving their data privacy. We conduct extensive experiments to examine the efficacy of our proposed method. The results show that our scheme attains a faster convergence rate, as well as a smaller communication overhead than several baselines.
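
The selection step of a DPP can be approximated greedily: repeatedly add the client whose inclusion most increases the determinant of the selected kernel submatrix, which favors diverse data profiles. This generic greedy MAP sketch is an assumption about one plausible realization of DPP sampling, not the FL-DP$^3$S algorithm itself, and the random "profiles" are invented.

```python
import numpy as np

def greedy_dpp_selection(K, k):
    """Pick k indices greedily, maximizing det(K[S, S]) at each step."""
    selected = []
    for _ in range(k):
        best, best_det = None, -np.inf
        for i in range(len(K)):
            if i in selected:
                continue
            idx = selected + [i]
            det = np.linalg.det(K[np.ix_(idx, idx)])
            if det > best_det:
                best, best_det = i, det
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
profiles = rng.standard_normal((20, 8))        # per-client data profiles
K = profiles @ profiles.T + 1e-3 * np.eye(20)  # similarity kernel (PSD)
print(greedy_dpp_selection(K, k=5))
```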

Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions

  • Authors: Yicheng Luo, Jackie Kay, Edward Grefenstette, Marc Peter Deisenroth
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17396
  • Pdf link: https://arxiv.org/pdf/2303.17396
  • Abstract
    Offline reinforcement learning (RL) allows for the training of competent agents from offline datasets without any interaction with the environment. Online finetuning of such offline models can further improve performance. But how should we ideally finetune agents obtained from offline RL training? While offline RL algorithms can in principle be used for finetuning, in practice, their online performance improves slowly. In contrast, we show that it is possible to use standard online off-policy algorithms for faster improvement. However, we find this approach may suffer from policy collapse, where the policy undergoes severe performance deterioration during initial online learning. We investigate the issue of policy collapse and how it relates to data diversity, algorithm choices and online replay distribution. Based on these insights, we propose a conservative policy optimization procedure that can achieve stable and sample-efficient online learning from offline pretraining.

Edge Ranking of Graphs in Transportation Networks using a Graph Neural Network (GNN)

  • Authors: Debasish Jana, Sven Malama, Sriram Narasimhan, Ertugrul Taciroglu
  • Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17485
  • Pdf link: https://arxiv.org/pdf/2303.17485
  • Abstract
    Many networks, such as transportation, power, and water distribution, can be represented as graphs. A crucial challenge in graph representations is identifying the importance of graph edges and their influence on overall network efficiency and information flow performance. For example, important edges in a transportation network are those roads that, when affected, will significantly alter the network's overall efficiency. A commonly used approach to finding such important edges is "edge betweenness centrality" (EBC), an edge ranking measure to determine the influential edges of the graph based on connectivity and information spread. Computing the EBC utilizing the common Brandes algorithm involves calculating the shortest paths for every node pair, which can be computationally expensive and restrictive, especially for large graphs. Changes in the graph parameters, e.g., in the edge weight or the addition and deletion of nodes or edges, require the recalculation of the EBC. As the main contribution, we propose an approximate method to estimate the EBC using a Graph Neural Network (GNN), a deep learning-based approach. We show that it is computationally efficient compared to the conventional method, especially for large graphs. The proposed method of GNN-based edge ranking is evaluated on several synthetic graphs and a real-world transportation data set. We show that this framework can estimate the approximate edge ranking much faster compared to the conventional method. This approach is inductive, i.e., training and testing are performed on different sets of graphs with varying numbers of nodes and edges. The proposed method is especially suitable for applications on large-scale networks when edge information is desired, for example, in urban infrastructure improvement projects, power and water network resilience analyses, and optimizing resource allocations in engineering networks.

Pgx: Hardware-accelerated parallel game simulation for reinforcement learning

  • Authors: Sotetsu Koyamada, Shinri Okano, Soichiro Nishimori, Yu Murata, Keigo Habara, Haruka Kita, Shin Ishii
  • Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17503
  • Pdf link: https://arxiv.org/pdf/2303.17503
  • Abstract
    We propose Pgx, a collection of board game simulators written in JAX. Thanks to JAX's auto-vectorization and just-in-time compilation, Pgx scales easily to thousands of parallel executions on GPU/TPU accelerators. We found that Pgx simulation on a single A100 GPU is 10x faster than that of existing reinforcement learning libraries. Pgx implements games considered vital benchmarks in artificial intelligence research, such as Backgammon, Shogi, and Go. Pgx is available at https://github.com/sotetsuk/pgx.
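
A hedged usage sketch following the batching pattern shown in the project README (attribute names such as `terminated` and `legal_action_mask` may differ across versions): vectorize `init`/`step` with `jax.vmap` and roll out many games in parallel under a random legal policy.

```python
# Hedged sketch following the batching pattern in the Pgx README; attribute
# names (terminated, legal_action_mask) may differ across versions.
import jax
import jax.numpy as jnp
import pgx

env = pgx.make("go_9x9")
init = jax.jit(jax.vmap(env.init))
step = jax.jit(jax.vmap(env.step))

batch_size = 1024
key = jax.random.PRNGKey(0)
state = init(jax.random.split(key, batch_size))   # batch of fresh games
while not state.terminated.all():
    key, sub = jax.random.split(key)
    noise = jax.random.uniform(sub, state.legal_action_mask.shape)
    action = jnp.where(state.legal_action_mask, noise, -1.0).argmax(axis=-1)
    state = step(state, action)                   # one synchronous step for all games
```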

Token Merging for Fast Stable Diffusion

  • Authors: Daniel Bolya, Judy Hoffman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17604
  • Pdf link: https://arxiv.org/pdf/2303.17604
  • Abstract
    The landscape of image generation has been forever changed by open vocabulary diffusion models. However, at their core these models use transformers, which makes generation slow. Better implementations to increase the throughput of these transformers have emerged, but they still evaluate the entire model. In this paper, we instead speed up diffusion models by exploiting natural redundancy in generated images by merging redundant tokens. After making some diffusion-specific improvements to Token Merging (ToMe), our ToMe for Stable Diffusion can reduce the number of tokens in an existing Stable Diffusion model by up to 60% while still producing high quality images without any extra training. In the process, we speed up image generation by up to 2x and reduce memory consumption by up to 5.6x. Furthermore, this speed-up stacks with efficient implementations such as xFormers, minimally impacting quality while being up to 5.4x faster for large images. Code is available at https://github.com/dbolya/tomesd.
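
Applying the patch is a one-liner with the authors' tomesd package; the snippet below follows the usage shown in the linked repo (check its README for the current signature) and assumes a diffusers pipeline.

```python
# Usage pattern from the tomesd repo (verify against its README): patch an
# existing Stable Diffusion pipeline to merge ~50% of tokens, no retraining.
import tomesd
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
tomesd.apply_patch(pipe, ratio=0.5)               # higher ratio => faster, lossier
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("out.png")
```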

Keyword: mobile

A Tensor-based Convolutional Neural Network for Small Dataset Classification

  • Authors: Zhenhua Chen, David Crandall
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2303.17061
  • Pdf link: https://arxiv.org/pdf/2303.17061
  • Abstract
    Inspired by ConvNets with structured hidden representations, we propose a Tensor-based Convolutional Neural Network, TCNN. Different from ConvNets, TCNNs are composed of structured neurons rather than scalar neurons, and the basic operation is neuron tensor transformation. Unlike other structured ConvNets, where part-whole relationships are modeled explicitly, these relationships are learned implicitly in TCNNs. Also, the structured neurons in TCNNs are high-rank tensors rather than vectors or matrices. We compare TCNNs with currently popular ConvNets, including ResNets, MobileNets, EfficientNets, RegNets, etc., on CIFAR10, CIFAR100, and Tiny ImageNet. The experiments show that TCNNs have higher parameter efficiency. TCNNs also show higher robustness against white-box adversarial attacks on MNIST compared to ConvNets.

Dependent Task Offloading in Edge Computing Using GNN and Deep Reinforcement Learning

  • Authors: Zequn Cao, Xiaoheng Deng
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.17100
  • Pdf link: https://arxiv.org/pdf/2303.17100
  • Abstract
    Task offloading is a widely used technology in Mobile Edge Computing (MEC), which reduces the completion time of user tasks with the help of resourceful edge servers. Existing works mainly focus on the case where the computation density of a user task is homogeneous, so that it can be offloaded in full or by percentage. However, various user tasks in real life consist of several inner-dependent subtasks, each of which is logically a minimum execution unit. Motivated by this gap, we aim to solve the Dependent Task Offloading (DTO) problem under a multi-user multi-edge scenario in this paper. We first use a Directed Acyclic Graph (DAG) to represent a dependent task, where nodes indicate subtasks and directed edges indicate dependencies among subtasks (see the sketch below). Then we propose a scheme based on Graph Attention Network (GAT) and Deep Reinforcement Learning (DRL) to minimize the makespan of user tasks. To utilize GAT efficiently, we train it on the resourceful cloud in an unsupervised style due to the large data and computation resource requirements. In addition, we design a multi-discrete action space for the DRL algorithm to enhance the applicability of our proposed scheme. Experiments are conducted on broadly distributed synthetic data. The results demonstrate that our proposed approach can be adapted to both simple and complex MEC environments and outperforms other methods.
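
The DAG representation itself is straightforward to set up; a toy example with networkx (the compute demands are made up for illustration):

```python
# Toy DAG of a dependent task as described above: nodes are subtasks with
# (made-up) compute demands, directed edges are data dependencies.
import networkx as nx

task = nx.DiGraph()
task.add_nodes_from([(0, {"cycles": 4e8}), (1, {"cycles": 2e8}),
                     (2, {"cycles": 3e8}), (3, {"cycles": 1e8})])
task.add_edges_from([(0, 1), (0, 2), (1, 3), (2, 3)])  # diamond dependency
assert nx.is_directed_acyclic_graph(task)
print(list(nx.topological_sort(task)))                 # a valid execution order
```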

Deep Generative Model and Its Applications in Efficient Wireless Network Management: A Tutorial and Case Study

  • Authors: Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Abbas Jamalipour
  • Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17114
  • Pdf link: https://arxiv.org/pdf/2303.17114
  • Abstract
    With the phenomenal success of diffusion models and ChatGPT, deep generative models (DGMs) have been experiencing explosive growth since 2022. Not limited to content generation, DGMs are also widely adopted in the Internet of Things, the Metaverse, and digital twins, due to their outstanding ability to represent complex patterns and generate plausible samples. In this article, we explore the applications of DGMs in a crucial task, i.e., improving the efficiency of wireless network management. Specifically, we first overview generative AI, as well as three representative DGMs. Then, a DGM-empowered framework for wireless network management is proposed, in which we elaborate on the issues of conventional network management approaches, why DGMs can address them efficiently, and the step-by-step workflow for applying DGMs to managing wireless networks. Moreover, we conduct a case study on network economics, using a state-of-the-art DGM, i.e., the diffusion model, to generate effective contracts for incentivizing mobile AI-Generated Content (AIGC) services. Last but not least, we discuss important open directions for further research.

GAT-COBO: Cost-Sensitive Graph Neural Network for Telecom Fraud Detection

  • Authors: Xinxin Hu, Haotian Chen, Junjie Zhang, Hongchang Chen, Shuxin Liu, Xing Li, Yahui Wang, Xiangyang Xue
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17334
  • Pdf link: https://arxiv.org/pdf/2303.17334
  • Abstract
    Along with the rapid evolution of mobile communication technologies, such as 5G, there has been a drastic increase in telecom fraud, which significantly dissipates individual fortunes and social wealth. In recent years, graph mining techniques have gradually become a mainstream solution for detecting telecom fraud. However, the graph imbalance problem, caused by the Pareto principle, brings severe challenges to graph data mining. This is a new and challenging problem that little previous work has addressed. In this paper, we propose a Graph ATtention network with COst-sensitive BOosting (GAT-COBO) for the graph imbalance problem. First, we design a GAT-based base classifier to learn the embeddings of all nodes in the graph. Then, we feed the embeddings into a well-designed cost-sensitive learner for imbalanced learning. Next, we update the weights according to the misclassification cost to make the model focus more on the minority class. Finally, we sum the node embeddings obtained by multiple cost-sensitive learners to obtain a comprehensive node representation, which is used for the downstream anomaly detection task. Extensive experiments on two real-world telecom fraud detection datasets demonstrate that our proposed method is effective for the graph imbalance problem, outperforming state-of-the-art GNNs and GNN-based fraud detectors. In addition, our model is also helpful for mitigating the widespread over-smoothing problem in GNNs. The GAT-COBO code and datasets are available at https://github.com/xxhu94/GAT-COBO.
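
The cost-sensitive ingredient can be illustrated with a class-weighted loss on top of GNN embeddings; this is a generic sketch of the idea, not GAT-COBO's actual boosting scheme.

```python
# Generic sketch of cost-sensitive learning for imbalanced fraud detection:
# weight the minority (fraud) class more heavily in the loss. Idea only;
# GAT-COBO's boosting scheme is more involved.
import torch
import torch.nn.functional as F

def cost_sensitive_loss(logits, labels, minority_cost=10.0):
    weights = torch.tensor([1.0, minority_cost])   # class 1 = fraud (minority)
    return F.cross_entropy(logits, labels, weight=weights)

logits = torch.randn(8, 2)           # e.g. a classifier head over GAT embeddings
labels = torch.randint(0, 2, (8,))
print(cost_sensitive_loss(logits, labels))
```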

An Efficient Mobile Gateway Selection and Discovery Based-Routing Protocol in Heterogeneous LTE-VANET Networks

  • Authors: Driss Abada, Rachid Adrdor, Omar Boutkhoum, Adil Bohouch
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2303.17439
  • Pdf link: https://arxiv.org/pdf/2303.17439
  • Abstract
    Coupling cellular communication networks with vehicular ad hoc networks (VANET) can be a very attractive solution for providing Internet access to vehicles on the road. However, due to several specific characteristics of VANETs, efficient multi-hop routing from vehicular sources to Internet gateways through Long Term Evolution (LTE) technology is still challenging. In this paper, an Internet mobile gateway selection scheme is proposed to elect the vehicles with the most potential to act as gateways to the Internet in VANETs. The discovery and selection of routes to those mobile gateways are then carried out via an efficient multi-metric relay selection mechanism. The objective is to select the most reliable route to the mobile gateways, reducing the communication overhead and performing seamless handover. The proposed protocol is compared with one recent protocol based on packet delivery ratio, average end-to-end delay, and overhead. The results show that the proposed protocol significantly improves network performance in contrast to the other protocol.

Cost Sensitive GNN-based Imbalanced Learning for Mobile Social Network Fraud Detection

  • Authors: Xinxin Hu, Haotian Chen, Hongchang Chen, Shuxin Liu, Xing Li, Shibo Zhang, Yahui Wang, Xiangyang Xue
  • Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17486
  • Pdf link: https://arxiv.org/pdf/2303.17486
  • Abstract
    With the rapid development of mobile networks, people's social contacts have been considerably facilitated. However, the rise of mobile social network fraud upon those networks has caused a great deal of distress, depleting personal and social wealth and potentially doing significant economic harm. To detect fraudulent users, call detail record (CDR) data, which portrays the social behavior of users in mobile networks, has been widely utilized. But the imbalance problem in the aforementioned data, which could severely hinder the effectiveness of fraud detectors based on graph neural networks (GNNs), has hardly been addressed in previous work. In this paper, we present a novel Cost-Sensitive Graph Neural Network (CSGNN) that creatively combines cost-sensitive learning and graph neural networks. We conduct extensive experiments on two open-source real-world mobile network fraud datasets. The results show that CSGNN can effectively solve the graph imbalance problem and achieve better detection performance than state-of-the-art algorithms. We believe that our research can be applied to graph imbalance problems in other fields. The CSGNN code and datasets are publicly available at https://github.com/xxhu94/CSGNN.

MobileInst: Video Instance Segmentation on the Mobile

  • Authors: Renhong Zhang, Tianheng Cheng, Shusheng Yang, Haoyi Jiang, Shuai Zhang, Jiancheng Lyu, Xin Li, Xiaowen Ying, Dashan Gao, Wenyu Liu, Xinggang Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17594
  • Pdf link: https://arxiv.org/pdf/2303.17594
  • Abstract
    Although recent approaches aiming for video instance segmentation have achieved promising results, it is still difficult to employ those approaches for real-world applications on mobile devices, which mainly suffer from (1) heavy computation and memory cost and (2) complicated heuristics for tracking objects. To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices. Firstly, MobileInst adopts a mobile vision transformer to extract multi-level semantic features and presents an efficient query-based dual-transformer instance decoder for mask kernels and a semantic-enhanced mask decoder to generate instance segmentation per frame. Secondly, MobileInst exploits simple yet effective kernel reuse and kernel association to track objects for video instance segmentation. Further, we propose temporal query passing to enhance the tracking ability for kernels. We conduct experiments on COCO and YouTube-VIS datasets to demonstrate the superiority of MobileInst and evaluate the inference latency on a mobile CPU core of Qualcomm Snapdragon-778G, without other methods of acceleration. On the COCO dataset, MobileInst achieves 30.5 mask AP and 176 ms on the mobile CPU, which reduces the latency by 50% compared to the previous SOTA. For video instance segmentation, MobileInst achieves 35.0 AP on YouTube-VIS 2019 and 30.1 AP on YouTube-VIS 2021. Code will be available to facilitate real-world applications and future research.

Keyword: pruning

Explainable Intrusion Detection Systems Using Competitive Learning Techniques

  • Authors: Jesse Ables, Thomas Kirby, Sudip Mittal, Ioana Banicescu, Shahram Rahimi, William Anderson, Maria Seale
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17387
  • Pdf link: https://arxiv.org/pdf/2303.17387
  • Abstract
    The current state-of-the-art systems in Artificial Intelligence (AI) enabled intrusion detection use a variety of black box methods. These black box methods are generally trained using Error Based Learning (EBL) techniques with a focus on creating accurate models. These models have high performative costs and are not easily explainable. A white box Competitive Learning (CL) based eXplainable Intrusion Detection System (X-IDS) offers a potential solution to these problems. CL models utilize an entirely different learning paradigm than EBL approaches. This different learning process makes the CL family of algorithms innately explainable and less resource intensive. In this paper, we create an X-IDS architecture that is based on DARPA's recommendation for explainable systems. In our architecture we leverage CL algorithms such as Self-Organizing Maps (SOM), Growing Self-Organizing Maps (GSOM), and the Growing Hierarchical Self-Organizing Map (GHSOM). The resulting models can be data-mined to create statistical and visual explanations. Our architecture is tested using the NSL-KDD and CIC-IDS-2017 benchmark datasets, and produces accuracies that are 1% - 3% less than EBL models. However, CL models are much more explainable than EBL models. Additionally, we use a pruning process that is able to significantly reduce the size of these CL-based models. By pruning our models, we are able to increase prediction speeds. Lastly, we analyze the statistical and visual explanations generated by our architecture, and we give a strategy that users could use to help navigate the set of explanations. These explanations will help users build trust with an Intrusion Detection System (IDS), and allow users to discover ways to increase the IDS's potency.
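
As a flavor of the SOM building block, the minisom package can train a map on benign traffic features and flag samples far from their best-matching unit. This is a hedged sketch only: feature extraction and the GSOM/GHSOM variants used in the paper are out of scope here, and the feature vectors below are placeholders.

```python
# Hedged sketch of the SOM building block with the minisom package: train on
# "normal" traffic features, then score new samples by distance to their
# best-matching unit (larger = more suspicious). Features are placeholders.
import numpy as np
from minisom import MiniSom

X_normal = np.random.rand(500, 16)             # stand-in NSL-KDD-style features
som = MiniSom(10, 10, 16, sigma=1.0, learning_rate=0.5, random_seed=1)
som.train_random(X_normal, 5000)

def anomaly_score(x):
    bmu = som.winner(x)                        # coordinates of best-matching unit
    return np.linalg.norm(x - som.get_weights()[bmu])

print(anomaly_score(np.random.rand(16)))
```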

SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

  • Authors: Xuanyao Chen, Zhijian Liu, Haotian Tang, Li Yi, Hang Zhao, Song Han
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17605
  • Pdf link: https://arxiv.org/pdf/2303.17605
  • Abstract
    High-resolution images enable neural networks to learn richer visual representations. However, this improved performance comes at the cost of growing computational complexity, hindering their usage in latency-sensitive applications. As not all pixels are equal, skipping computations for less-important regions offers a simple and effective measure to reduce the computation. This, however, is hard to translate into actual speedup for CNNs since it breaks the regularity of the dense convolution workload. In this paper, we introduce SparseViT, which revisits activation sparsity for recent window-based vision transformers (ViTs). As window attentions are naturally batched over blocks, actual speedup with window activation pruning becomes possible: i.e., ~50% latency reduction with 60% sparsity. Different layers should be assigned different pruning ratios due to their diverse sensitivities and computational costs. We introduce sparsity-aware adaptation and apply evolutionary search to efficiently find the optimal layerwise sparsity configuration within the vast search space. SparseViT achieves speedups of 1.5x, 1.4x, and 1.3x compared to its dense counterpart in monocular 3D object detection, 2D instance segmentation, and 2D semantic segmentation, respectively, with negligible to no loss of accuracy.
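
The core mechanism (score windows, keep only the important ones) can be sketched in a few lines; this illustrates the idea, not the paper's sensitivity-aware layerwise search.

```python
# Sketch of window activation pruning: rank attention windows by activation
# norm and keep the top fraction. Illustrative only; SparseViT additionally
# searches per-layer sparsity ratios.
import torch

def keep_top_windows(windows, keep_ratio=0.4):
    # windows: (num_windows, tokens_per_window, dim)
    scores = windows.flatten(1).norm(dim=1)       # one importance score per window
    k = max(1, int(keep_ratio * windows.shape[0]))
    idx = scores.topk(k).indices
    return windows[idx], idx                      # kept windows + their positions

x = torch.randn(64, 49, 96)                       # e.g. 64 windows of 7x7 tokens
kept, idx = keep_top_windows(x)                   # 60% of windows skipped
print(kept.shape)
```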

Keyword: voxel

Robo3D: Towards Robust and Reliable 3D Perception against Corruptions

  • Authors: Lingdong Kong, Youquan Liu, Xin Li, Runnan Chen, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, Ziwei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17597
  • Pdf link: https://arxiv.org/pdf/2303.17597
  • Abstract
    The robustness of 3D perception systems under natural corruptions from environments and sensors is pivotal for safety-critical applications. Existing large-scale 3D perception datasets often contain data that are meticulously cleaned. Such configurations, however, cannot reflect the reliability of perception models during the deployment stage. In this work, we present Robo3D, the first comprehensive benchmark heading toward probing the robustness of 3D detectors and segmentors under out-of-distribution scenarios against natural corruptions that occur in real-world environments. Specifically, we consider eight corruption types stemming from adversarial weather conditions, external disturbances, and internal sensor failure. We uncover that, although promising results have been progressively achieved on standard benchmarks, state-of-the-art 3D perception models are at risk of being vulnerable to corruptions. We draw key observations on the use of data representations, augmentation schemes, and training strategies, that could severely affect the model's performance. To pursue better robustness, we propose a density-insensitive training framework along with a simple flexible voxelization strategy to enhance the model resiliency. We hope our benchmark and approach could inspire future research in designing more robust and reliable 3D perception models. Our robustness benchmark suite is publicly available.

Keyword: lidar

T-FFTRadNet: Object Detection with Swin Vision Transformers from Raw ADC Radar Signals

  • Authors: James Giroux, Martin Bouchard, Robert Laganiere
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16940
  • Pdf link: https://arxiv.org/pdf/2303.16940
  • Abstract
    Object detection utilizing Frequency Modulated Continuous Wave (FMCW) radar is becoming increasingly popular in the field of autonomous systems. Radar does not possess the same drawbacks seen by other emission-based sensors such as LiDAR, primarily the degradation or loss of return signals due to weather conditions such as rain or snow. However, radar does possess traits that make it unsuitable for standard emission-based deep learning representations such as point clouds. Radar point clouds tend to be sparse, and therefore information extraction is not efficient. To overcome this, more traditional digital signal processing pipelines were adapted to form inputs residing directly in the frequency domain via Fast Fourier Transforms. Commonly, three transformations were used to form Range-Azimuth-Doppler cubes in which deep learning algorithms could perform object detection. This too has drawbacks, namely the pre-processing costs associated with performing multiple Fourier Transforms and normalization. We explore the possibility of operating on raw radar inputs from analog-to-digital converters via the utilization of complex transformation layers. Moreover, we introduce hierarchical Swin Vision transformers to the field of radar object detection and show their capability to operate on inputs varying in pre-processing, along with different radar configurations, i.e., relatively low and high numbers of transmitters and receivers, while obtaining on-par or better results than the state-of-the-art.
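
For context, the conventional FFT pre-processing the paper tries to fold into the network looks roughly like this (shapes are illustrative):

```python
# The conventional FFT pre-processing this work aims to absorb into the
# network: raw ADC samples -> range FFT -> Doppler FFT -> dB magnitude map.
import numpy as np

adc = np.random.randn(128, 256) + 1j * np.random.randn(128, 256)
# axis 1: fast time (samples within a chirp) -> range bins
# axis 0: slow time (chirps)                 -> Doppler bins
range_fft = np.fft.fft(adc, axis=1)
range_doppler = np.fft.fftshift(np.fft.fft(range_fft, axis=0), axes=0)
magnitude_db = 20 * np.log10(np.abs(range_doppler) + 1e-12)
print(magnitude_db.shape)                         # (Doppler bins, range bins)
```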

BEVFusion4D: Learning LiDAR-Camera Fusion Under Bird's-Eye-View via Cross-Modality Guidance and Temporal Aggregation

  • Authors: Hongxiang Cai, Zeyuan Zhang, Zhenyu Zhou, Ziyin Li, Wenbo Ding, Jiuhua Zhao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17099
  • Pdf link: https://arxiv.org/pdf/2303.17099
  • Abstract
    Integrating LiDAR and Camera information into Bird's-Eye-View (BEV) has become an essential topic for 3D object detection in autonomous driving. Existing methods mostly adopt an independent dual-branch framework to generate LiDAR and camera BEV, then perform an adaptive modality fusion. Since point clouds provide more accurate localization and geometry information, they could serve as a reliable spatial prior to acquiring relevant semantic information from the images. Therefore, we design a LiDAR-Guided View Transformer (LGVT) to effectively obtain the camera representation in BEV space and thus benefit the whole dual-branch fusion system. LGVT takes camera BEV as the primitive semantic query, repeatedly leveraging the spatial cue of LiDAR BEV for extracting image features across multiple camera views. Moreover, we extend our framework into the temporal domain with our proposed Temporal Deformable Alignment (TDA) module, which aims to aggregate BEV features from multiple historical frames. Including these two modules, our framework dubbed BEVFusion4D achieves state-of-the-art results in 3D object detection, with 72.0% mAP and 73.5% NDS on the nuScenes validation set, and 73.3% mAP and 74.7% NDS on nuScenes test set, respectively.

Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving

  • Authors: Zijian Zhu, Yichi Zhang, Hai Chen, Yinpeng Dong, Shu Zhao, Wenbo Ding, Jiachen Zhong, Shibao Zheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2303.17297
  • Pdf link: https://arxiv.org/pdf/2303.17297
  • Abstract
    3D object detection is an essential perception task in autonomous driving to understand the environments. The Bird's-Eye-View (BEV) representations have significantly improved the performance of 3D detectors with camera inputs on popular benchmarks. However, there still lacks a systematic understanding of the robustness of these vision-dependent BEV models, which is closely related to the safety of autonomous driving systems. In this paper, we evaluate the natural and adversarial robustness of various representative models under extensive settings, to fully understand their behaviors influenced by explicit BEV features compared with those without BEV. In addition to the classic settings, we propose a 3D consistent patch attack by applying adversarial patches in the 3D space to guarantee the spatiotemporal consistency, which is more realistic for the scenario of autonomous driving. With substantial experiments, we draw several findings: 1) BEV models tend to be more stable than previous methods under different natural conditions and common corruptions due to the expressive spatial representations; 2) BEV models are more vulnerable to adversarial noises, mainly caused by the redundant BEV features; 3) Camera-LiDAR fusion models have superior performance under different settings with multi-modal inputs, but BEV fusion model is still vulnerable to adversarial noises of both point cloud and image. These findings alert the safety issue in the applications of BEV detectors and could facilitate the development of more robust models.

Event-based Agile Object Catching with a Quadrupedal Robot

  • Authors: Benedek Forrai, Takahiro Miki, Daniel Gehrig, Marco Hutter, Davide Scaramuzza
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17479
  • Pdf link: https://arxiv.org/pdf/2303.17479
  • Abstract
    Quadrupedal robots are conquering various indoor and outdoor applications due to their ability to navigate challenging uneven terrains. Exteroceptive information greatly enhances this capability since perceiving their surroundings allows them to adapt their controller and thus achieve higher levels of robustness. However, sensors such as LiDARs and RGB cameras do not provide sufficient information to quickly and precisely react in a highly dynamic environment since they suffer from a bandwidth-latency tradeoff. They require significant bandwidth at high frame rates while featuring significant perceptual latency at lower frame rates, thereby limiting their versatility on resource-constrained platforms. In this work, we tackle this problem by equipping our quadruped with an event camera, which does not suffer from this tradeoff due to its asynchronous and sparse operation. In leveraging the low latency of the events, we push the limits of quadruped agility and demonstrate high-speed ball catching for the first time. We show that our quadruped equipped with an event camera can catch objects with speeds up to 15 m/s from 4 meters, with a success rate of 83%. Using a VGA event camera, our method runs at 100 Hz on an NVIDIA Jetson Orin.

Keyword: diffusion

HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion

  • Authors: Ziya Erkoç, Fangchang Ma, Qi Shan, Matthias Nießner, Angela Dai
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17015
  • Pdf link: https://arxiv.org/pdf/2303.17015
  • Abstract
    Implicit neural fields, typically encoded by a multilayer perceptron (MLP) that maps from coordinates (e.g., xyz) to signals (e.g., signed distances), have shown remarkable promise as a high-fidelity and compact representation. However, the lack of a regular and explicit grid structure also makes it challenging to apply generative modeling directly on implicit neural fields in order to synthesize new data. To this end, we propose HyperDiffusion, a novel approach for unconditional generative modeling of implicit neural fields. HyperDiffusion operates directly on MLP weights and generates new neural implicit fields encoded by synthesized MLP parameters. Specifically, a collection of MLPs is first optimized to faithfully represent individual data samples. Subsequently, a diffusion process is trained in this MLP weight space to model the underlying distribution of neural implicit fields. HyperDiffusion enables diffusion modeling over an implicit, compact, and yet high-fidelity representation of complex signals across 3D shapes and 4D mesh animations within one single unified framework.
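
The data being diffused over is simply an MLP's parameters flattened into one vector; PyTorch has utilities for exactly this round trip. A sketch with an illustrative architecture, not the paper's code:

```python
# Sketch of the weight-space representation: flatten an MLP's parameters into
# a single vector (one diffusion "sample"), and reshape a generated vector
# back into a usable MLP. The architecture here is illustrative.
import torch.nn as nn
from torch.nn.utils import parameters_to_vector, vector_to_parameters

mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
flat = parameters_to_vector(mlp.parameters())   # 1-D tensor to diffuse over
print(flat.shape)
# ... train/sample a diffusion model in this vector space, then decode:
vector_to_parameters(flat, mlp.parameters())    # weights back into the MLP
```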

DiffCollage: Parallel Generation of Large Content with Diffusion Models

  • Authors: Qinsheng Zhang, Jiaming Song, Xun Huang, Yongxin Chen, Ming-Yu Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17076
  • Pdf link: https://arxiv.org/pdf/2303.17076
  • Abstract
    We present DiffCollage, a compositional diffusion model that can generate large content by leveraging diffusion models trained on generating pieces of the large content. Our approach is based on a factor graph representation where each factor node represents a portion of the content and a variable node represents their overlap. This representation allows us to aggregate intermediate outputs from diffusion models defined on individual nodes to generate content of arbitrary size and shape in parallel without resorting to an autoregressive generation procedure. We apply DiffCollage to various tasks, including infinite image generation, panorama image generation, and long-duration text-guided motion generation. Extensive experimental results with a comparison to strong autoregressive baselines verify the effectiveness of our approach.
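
A toy version of the aggregation step: combine predictions from overlapping pieces so they agree on the overlap. Real DiffCollage derives the combination rule from its factor graph; the plain averaging below only shows the mechanics of stitching overlapping outputs.

```python
# Toy overlap aggregation in 1-D: two tile predictions are summed and the
# shared region averaged. DiffCollage's factor-graph rule is more principled;
# this only illustrates stitching overlapping outputs.
import numpy as np

def merge_tiles(left, right, overlap):
    out = np.zeros(len(left) + len(right) - overlap)
    out[:len(left)] += left
    out[len(left) - overlap:] += right
    out[len(left) - overlap:len(left)] /= 2.0     # average where tiles overlap
    return out

print(merge_tiles(np.ones(8), 2 * np.ones(8), overlap=3))
```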

Deep Generative Model and Its Applications in Efficient Wireless Network Management: A Tutorial and Case Study

  • Authors: Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Abbas Jamalipour
  • Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17114
  • Pdf link: https://arxiv.org/pdf/2303.17114
  • Abstract
    With the phenomenal success of diffusion models and ChatGPT, deep generative models (DGMs) have been experiencing explosive growth since 2022. Not limited to content generation, DGMs are also widely adopted in the Internet of Things, the Metaverse, and digital twins, due to their outstanding ability to represent complex patterns and generate plausible samples. In this article, we explore the applications of DGMs in a crucial task, i.e., improving the efficiency of wireless network management. Specifically, we first overview generative AI, as well as three representative DGMs. Then, a DGM-empowered framework for wireless network management is proposed, in which we elaborate on the issues of conventional network management approaches, why DGMs can address them efficiently, and the step-by-step workflow for applying DGMs to managing wireless networks. Moreover, we conduct a case study on network economics, using a state-of-the-art DGM, i.e., the diffusion model, to generate effective contracts for incentivizing mobile AI-Generated Content (AIGC) services. Last but not least, we discuss important open directions for further research.

Discriminative Class Tokens for Text-to-Image Diffusion Models

  • Authors: Idan Schwartz, Vésteinn Snæbjarnarson, Sagie Benaim, Hila Chefer, Ryan Cotterell, Lior Wolf, Serge Belongie
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.17155
  • Pdf link: https://arxiv.org/pdf/2303.17155
  • Abstract
    Recent advances in text-to-image diffusion models have enabled the generation of diverse and high-quality images. However, generated images often fall short of depicting subtle details and are susceptible to errors due to ambiguity in the input text. One way of alleviating these issues is to train diffusion models on class-labeled datasets. This comes with a downside, doing so limits their expressive power: (i) supervised datasets are generally small compared to large-scale scraped text-image datasets on which text-to-image models are trained, and so the quality and diversity of generated images are severely affected, or (ii) the input is a hard-coded label, as opposed to free-form text, which limits the control over the generated images. In this work, we propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text while achieving high accuracy through discriminative signals from a pretrained classifier, which guides the generation. This is done by iteratively modifying the embedding of a single input token of a text-to-image diffusion model, using the classifier, by steering generated images toward a given target class. Our method is fast compared to prior fine-tuning methods and does not require a collection of in-class images or retraining of a noise-tolerant classifier. We evaluate our method extensively, showing that the generated images are: (i) more accurate and of higher quality than standard diffusion models, (ii) can be used to augment training data in a low-resource setting, and (iii) reveal information about the data used to train the guiding classifier. The code is available at \url{https://github.com/idansc/discriminative_class_tokens}

LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation

  • Authors: Guangcong Zheng, Xianpan Zhou, Xuewei Li, Zhongang Qi, Ying Shan, Xi Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17189
  • Pdf link: https://arxiv.org/pdf/2303.17189
  • Abstract
    Recently, diffusion models have achieved great success in image synthesis. However, when it comes to the layout-to-image generation where an image often has a complex scene of multiple objects, how to make strong control over both the global layout map and each detailed object remains a challenging task. In this paper, we propose a diffusion model named LayoutDiffusion that can obtain higher generation quality and greater controllability than the previous works. To overcome the difficult multimodal fusion of image and layout, we propose to construct a structural image patch with region information and transform the patched image into a special layout to fuse with the normal layout in a unified form. Moreover, Layout Fusion Module (LFM) and Object-aware Cross Attention (OaCA) are proposed to model the relationship among multiple objects and designed to be object-aware and position-sensitive, allowing for precisely controlling the spatial related information. Extensive experiments show that our LayoutDiffusion outperforms the previous SOTA methods on FID, CAS by relatively 46.35%, 26.70% on COCO-stuff and 44.29%, 41.82% on VG. Code is available at https://github.com/ZGCTroy/LayoutDiffusion.

PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models

  • Authors: Vidit Goel, Elia Peruzzo, Yifan Jiang, Dejia Xu, Nicu Sebe, Trevor Darrell, Zhangyang Wang, Humphrey Shi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17546
  • Pdf link: https://arxiv.org/pdf/2303.17546
  • Abstract
    Image editing using diffusion models has witnessed extremely fast-paced growth recently. There are various ways in which previous works enable controlling and editing images. Some works use high-level conditioning such as text, while others use low-level conditioning. Nevertheless, most of them lack fine-grained control over the properties of the different objects present in the image, i.e. object-level image editing. In this work, we consider an image as a composition of multiple objects, each defined by various properties. Out of these properties, we identify structure and appearance as the most intuitive to understand and useful for editing purposes. We propose Structure-and-Appearance Paired Diffusion model (PAIR-Diffusion), which is trained using structure and appearance information explicitly extracted from the images. The proposed model enables users to inject a reference image's appearance into the input image at both the object and global levels. Additionally, PAIR-Diffusion allows editing the structure while maintaining the style of individual components of the image unchanged. We extensively evaluate our method on LSUN datasets and the CelebA-HQ face dataset, and we demonstrate fine-grained control over both structure and appearance at the object level. We also applied the method to Stable Diffusion to edit any real image at the object level.

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

  • Authors: Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao, Jiang Bian
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2303.17550
  • Pdf link: https://arxiv.org/pdf/2303.17550
  • Abstract
    While recent research has made significant progress in speech-driven talking face generation, the quality of the generated video still lags behind that of real recordings. One reason for this is the use of handcrafted intermediate representations like facial landmarks and 3DMM coefficients, which are designed based on human knowledge and are insufficient to precisely describe facial movements. Additionally, these methods require an external pretrained model for extracting these representations, whose performance sets an upper bound on talking face generation. To address these limitations, we propose a novel method called DAE-Talker that leverages data-driven latent representations obtained from a diffusion autoencoder (DAE). DAE contains an image encoder that encodes an image into a latent vector and a DDIM image decoder that reconstructs the image from it. We train our DAE on talking face video frames and then extract their latent representations as the training target for a Conformer-based speech2latent model. This allows DAE-Talker to synthesize full video frames and produce natural head movements that align with the content of speech, rather than relying on a predetermined head pose from a template video. We also introduce pose modelling in speech2latent for pose controllability. Additionally, we propose a novel method for generating continuous video frames with the DDIM image decoder trained on individual frames, eliminating the need for modelling the joint distribution of consecutive frames directly. Our experiments show that DAE-Talker outperforms existing popular methods in lip-sync, video fidelity, and pose naturalness. We also conduct ablation studies to analyze the effectiveness of the proposed techniques and demonstrate the pose controllability of DAE-Talker.

DDP: Diffusion Model for Dense Visual Prediction

  • Authors: Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17559
  • Pdf link: https://arxiv.org/pdf/2303.17559
  • Abstract
    We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a random Gaussian distribution, guided by the image. The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline. Without task-specific design and architecture customization, DDP is easy to generalize to most dense prediction tasks, e.g., semantic segmentation and depth estimation. In addition, DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods. We show top results on three representative tasks with six diverse benchmarks; without tricks, DDP achieves state-of-the-art or competitive performance on each task compared to the specialist counterparts. For example, semantic segmentation (83.9 mIoU on Cityscapes), BEV map segmentation (70.6 mIoU on nuScenes), and depth estimation (0.05 REL on KITTI). We hope that our approach will serve as a solid baseline and facilitate future research.

Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models

  • Authors: Eric Zhang, Kai Wang, Xingqian Xu, Zhangyang Wang, Humphrey Shi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17591
  • Pdf link: https://arxiv.org/pdf/2303.17591
  • Abstract
    The unlearning problem of deep learning models, once primarily an academic concern, has become a prevalent issue in the industry. The significant advances in text-to-image generation techniques have prompted global discussions on privacy, copyright, and safety, as numerous unauthorized personal IDs, content, artistic creations, and potentially harmful materials have been learned by these models and later utilized to generate and distribute uncontrolled content. To address this challenge, we propose \textbf{Forget-Me-Not}, an efficient and low-cost solution designed to safely remove specified IDs, objects, or styles from a well-configured text-to-image model in as little as 30 seconds, without impairing its ability to generate other content. Alongside our method, we introduce the \textbf{Memorization Score (M-Score)} and \textbf{ConceptBench} to measure the models' capacity to generate general concepts, grouped into three primary categories: ID, object, and style. Using M-Score and ConceptBench, we demonstrate that Forget-Me-Not can effectively eliminate targeted concepts while maintaining the model's performance on other concepts. Furthermore, Forget-Me-Not offers two practical extensions: a) removal of potentially harmful or NSFW content, and b) enhancement of model accuracy, inclusion and diversity through \textbf{concept correction and disentanglement}. It can also be adapted as a lightweight model patch for Stable Diffusion, allowing for concept manipulation and convenient distribution. To encourage future research in this critical area and promote the development of safe and inclusive generative models, we will open-source our code and ConceptBench at \href{https://github.com/SHI-Labs/Forget-Me-Not}{https://github.com/SHI-Labs/Forget-Me-Not}.

Consistent View Synthesis with Pose-Guided Diffusion Models

  • Authors: Hung-Yu Tseng, Qinbo Li, Changil Kim, Suhib Alsisan, Jia-Bin Huang, Johannes Kopf
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17598
  • Pdf link: https://arxiv.org/pdf/2303.17598
  • Abstract
    Novel view synthesis from a single image has been a cornerstone problem for many Virtual Reality applications that provide immersive experiences. However, most existing techniques can only synthesize novel views within a limited range of camera motion or fail to generate consistent and high-quality novel views under significant camera movement. In this work, we propose a pose-guided diffusion model to generate a consistent long-term video of novel views from a single image. We design an attention layer that uses epipolar lines as constraints to facilitate the association between different viewpoints. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of the proposed diffusion model against state-of-the-art transformer-based and GAN-based approaches.

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

  • Authors: Wen Wang, Kangyang Xie, Zide Liu, Hao Chen, Yue Cao, Xinlong Wang, Chunhua Shen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17599
  • Pdf link: https://arxiv.org/pdf/2303.17599
  • Abstract
    Large-scale text-to-image diffusion models achieve unprecedented success in image generation and editing. However, how to extend such success to video editing is unclear. Recent initial attempts at video editing require significant text-to-video data and computation resources for training, which is often not accessible. In this work, we propose vid2vid-zero, a simple yet effective method for zero-shot video editing. Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video. At the core of our method is a null-text inversion module for text-to-video alignment, a cross-frame modeling module for temporal consistency, and a spatial regularization module for fidelity to the original video. Without any training, we leverage the dynamic nature of the attention mechanism to enable bi-directional temporal modeling at test time. Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos. Code will be made available at \url{https://github.com/baaivision/vid2vid-zero}.

Token Merging for Fast Stable Diffusion

  • Authors: Daniel Bolya, Judy Hoffman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17604
  • Pdf link: https://arxiv.org/pdf/2303.17604
  • Abstract
    The landscape of image generation has been forever changed by open vocabulary diffusion models. However, at their core these models use transformers, which makes generation slow. Better implementations to increase the throughput of these transformers have emerged, but they still evaluate the entire model. In this paper, we instead speed up diffusion models by exploiting natural redundancy in generated images by merging redundant tokens. After making some diffusion-specific improvements to Token Merging (ToMe), our ToMe for Stable Diffusion can reduce the number of tokens in an existing Stable Diffusion model by up to 60% while still producing high quality images without any extra training. In the process, we speed up image generation by up to 2x and reduce memory consumption by up to 5.6x. Furthermore, this speed-up stacks with efficient implementations such as xFormers, minimally impacting quality while being up to 5.4x faster for large images. Code is available at https://github.com/dbolya/tomesd.

AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control

  • Authors: Ruixiang Jiang, Can Wang, Jingbo Zhang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17606
  • Pdf link: https://arxiv.org/pdf/2303.17606
  • Abstract
    Neural implicit fields are powerful for representing 3D scenes and generating high-quality novel views, but it remains challenging to use such implicit representations for creating a 3D human avatar with a specific identity and artistic style that can be easily animated. Our proposed method, AvatarCraft, addresses this challenge by using diffusion models to guide the learning of geometry and texture for a neural avatar based on a single text prompt. We carefully design the optimization framework of neural implicit fields, including a coarse-to-fine multi-bounding box training strategy, shape regularization, and diffusion-based constraints, to produce high-quality geometry and texture. Additionally, we make the human avatar animatable by deforming the neural implicit field with an explicit warping field that maps the target human mesh to a template human mesh, both represented using parametric human models. This simplifies animation and reshaping of the generated avatar by controlling pose and shape parameters. Extensive experiments on various text descriptions show that AvatarCraft is effective and robust in creating human avatars and rendering novel views, poses, and shapes. Our project page is: \url{https://avatar-craft.github.io/}.

Keyword: dynamic

Thrust vector control and state estimation architecture for low-cost small-scale launchers

  • Authors: Pedro dos Santos, Paulo Oliveira
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.16983
  • Pdf link: https://arxiv.org/pdf/2303.16983
  • Abstract
    This paper proposes an integrated architecture for Thrust Vector Control (TVC) and state estimation for low-cost small-scale launchers, naturally unstable, and propelled by a solid motor. The architecture is based on a non-linear, six-degrees-of-freedom model for the generic thrust-vector-controlled launcher dynamics and kinematics, deduced and implemented in a realistic simulation environment. For estimation and control design purposes, a linearized version of the model is proposed. Single-nozzle TVC actuation is adopted, allowing for pitch and yaw control, with the control law being derived from the Linear Quadratic Regulator (LQR) with additional integral action (LQI). The control system is implemented through gain scheduling. Full state estimation is performed resorting to complementary kinematic filters, closely related to linear Kalman filtering theory. The architecture, composed of the navigation and control systems, is tested in a simulation environment, demonstrating satisfactory attitude tracking performance and robustness to both external disturbances and model uncertainties.
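
The LQR backbone of such a control law reduces to a Riccati solve; a minimal sketch with SciPy, using a placeholder linearized model rather than the paper's launcher dynamics (the integral augmentation of LQI is omitted):

```python
# Minimal LQR gain computation with SciPy's continuous-time Riccati solver.
# A, B are an illustrative unstable second-order model, not the paper's
# launcher; LQI would augment the state with an integral of the error.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0],
              [2.0, 0.0]])                # open-loop unstable attitude dynamics
B = np.array([[0.0], [1.0]])              # nozzle deflection input
Q = np.diag([10.0, 1.0])                  # state penalty
R = np.array([[0.1]])                     # actuation penalty

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)           # optimal feedback: u = -K x
print("closed-loop poles:", np.linalg.eigvals(A - B @ K))
```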

PopSparse: Accelerated block sparse matrix multiplication on IPU

  • Authors: Zhiyi Li, Douglas Orr, Valeriu Ohan, Godfrey Da costa, Tom Murray, Adam Sanders, Deniz Beker, Dominic Masters
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16999
  • Pdf link: https://arxiv.org/pdf/2303.16999
  • Abstract
    Reducing the computational cost of running large scale neural networks using sparsity has attracted great attention in the deep learning community. While much success has been achieved in reducing FLOP and parameter counts while maintaining acceptable task performance, achieving actual speed improvements has typically been much more difficult, particularly on general purpose accelerators (GPAs) such as NVIDIA GPUs using low precision number formats. In this work we introduce PopSparse, a library that enables fast sparse operations on Graphcore IPUs by leveraging both the unique hardware characteristics of IPUs as well as any block structure defined in the data. We target two different types of sparsity: static, where the sparsity pattern is fixed at compile-time; and dynamic, where it can change each time the model is run. We present benchmark results for matrix multiplication for both of these modes on IPU with a range of block sizes, matrix sizes and densities. Results indicate that the PopSparse implementations are faster than dense matrix multiplications on IPU at a range of sparsity levels with large matrix size and block size. Furthermore, static sparsity in general outperforms dynamic sparsity. While previous work on GPAs has shown speedups only for very high sparsity (typically 99% and above), the present work demonstrates that our static sparse implementation outperforms equivalent dense calculations in FP16 at lower sparsity (around 90%).

Scalable Implicit Solvers with Dynamic Mesh Adaptation for a Relativistic Drift-Kinetic Fokker-Planck-Boltzmann Model

  • Authors: Johann Rudi, Max Heldman, Emil M. Constantinescu, Qi Tang, Xian-Zhu Tang
  • Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph); Plasma Physics (physics.plasm-ph)
  • Arxiv link: https://arxiv.org/abs/2303.17019
  • Pdf link: https://arxiv.org/pdf/2303.17019
  • Abstract
    In this work we consider a relativistic drift-kinetic model for runaway electrons along with a Fokker-Planck operator for small-angle Coulomb collisions, a radiation damping operator, and a secondary knock-on (Boltzmann) collision source. We develop a new scalable fully implicit solver utilizing finite volume and conservative finite difference schemes and dynamic mesh adaptivity. A new data management framework in the PETSc library based on the p4est library is developed to enable simulations with dynamic adaptive mesh refinement (AMR), parallel computation, and load balancing. This framework is tested through the development of the runaway electron solver that is able to dynamically capture both bulk Maxwellian at the low-energy region and a runaway tail at the high-energy region. To effectively capture features via the AMR algorithm, a new AMR indicator prediction strategy is proposed that is performed alongside the implicit time evolution of the solution. This strategy is complemented by the introduction of computationally cheap feature-based AMR indicators that are analyzed theoretically. Numerical results quantify the advantages of the prediction strategy in better capturing features compared with nonpredictive strategies; and we demonstrate trade-offs regarding computational costs. The full solver is further verified through several benchmark problems including manufactured solutions and solutions of physics models. We particularly focus on demonstrating the advantages of using implicit time stepping and AMR for runaway electron simulations.

Stability bounds of droop-controlled inverters in power grid networks

  • Authors: Philipp C. Böttcher, Leonardo Rydin Gorjão, Dirk Witthaut
  • Subjects: Systems and Control (eess.SY); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2303.17032
  • Pdf link: https://arxiv.org/pdf/2303.17032
  • Abstract
    The energy mix of future power systems will include high shares of wind power and solar PV. These generation facilities are generally connected via power-electronic inverters. While conventional generation responds dynamically to the state of the electric power system, inverters are power electronic hardware and need to be programmed to react to the state of the system. Choosing an appropriate control scheme and the corresponding parameters is necessary to guarantee that the system operates safely. A prominent control scheme for inverters is droop control, which mimics the response of conventional generation. In this work, we investigate the stability of coupled systems of droop-controlled inverters in arbitrary network topologies. Employing linear stability analysis, we derive effective local stability criteria that consider both the overall network topology as well as its interplay with the inverters' intrinsic parameters. First, we explore the stability of an inverter coupled to an infinite grid in an analytic fashion and uncover stability and instability regions. Secondly, we extend the analysis to a generic topology of inverters and provide mathematical criteria for stability and instability of the system. Last, we showcase the usefulness of the criteria by examining two model systems using numerical simulations. The developed criteria show which parameters might lead to an unstable operating state.
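
Numerically, the linear stability criterion boils down to an eigenvalue check on the linearized system; a minimal sketch with a placeholder Jacobian:

```python
# Numerical form of the linear stability test: the operating state is stable
# if every eigenvalue of the system Jacobian has negative real part. J here
# is a random placeholder, not a droop-controlled network model.
import numpy as np

def is_linearly_stable(J, tol=1e-9):
    return bool(np.all(np.linalg.eigvals(J).real < -tol))

J = -np.eye(4) + 0.1 * np.random.randn(4, 4)   # stand-in Jacobian
print(is_linearly_stable(J))
```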

Material-agnostic Shaping of Granular Materials with Optimal Transport

  • Authors: Nikhilesh Alatur, Olov Andersson, Roland Siegwart, Lionel Ott
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17047
  • Pdf link: https://arxiv.org/pdf/2303.17047
  • Abstract
    From construction materials, such as sand or asphalt, to kitchen ingredients, like rice, sugar, or salt; the world is full of granular materials. Despite impressive progress in robotic manipulation, manipulating and interacting with granular material remains a challenge due to difficulties in perceiving, representing, modelling, and planning for these variable materials that have complex internal dynamics. While some prior work has looked into estimating or learning accurate dynamics models for granular materials, the literature is still missing a more abstract planning method that can be used for planning manipulation actions for granular materials with unknown material properties. In this work, we leverage tools from optimal transport and connect them to robot motion planning. We propose a heuristics-based sweep planner that does not require knowledge of the material's properties and directly uses a height map representation to generate promising sweeps. These sweeps transform granular material from arbitrary start shapes into arbitrary target shapes. We apply the sweep planner in a fast and reactive feedback loop and avoid the need for model-based planning over multiple time steps. We validate our approach with a large set of simulation and hardware experiments where we show that our method is capable of efficiently solving several complex tasks, including gathering, separating, and shaping of several types of granular materials into different target shapes.
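
The optimal-transport quantity behind such a planner is easiest to see in 1-D: the Wasserstein distance between the current and target height profiles, which good sweeps should reduce (a sketch with SciPy; the profiles are made up):

```python
# 1-D illustration of the optimal-transport objective: Wasserstein distance
# between current and target height profiles of the granular material.
import numpy as np
from scipy.stats import wasserstein_distance

x = np.linspace(0.0, 1.0, 50)                    # positions along the workspace
current = np.exp(-((x - 0.3) ** 2) / 0.01)       # pile on the left
target = np.exp(-((x - 0.7) ** 2) / 0.01)        # desired pile on the right
cost = wasserstein_distance(x, x, u_weights=current, v_weights=target)
print(f"transport cost: {cost:.3f}")             # a good sweep should lower this
```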

Modularized Control Synthesis for Complex Signal Temporal Logic Specifications

  • Authors: Zengjie Zhang, Sofie Haesaert
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.17086
  • Pdf link: https://arxiv.org/pdf/2303.17086
  • Abstract
    The control synthesis of a dynamic system subject to signal temporal logic (STL) specifications is commonly formulated as a mixed-integer linear programming (MILP) problem. Solving a MILP problem is computationally expensive when the STL formulas are long and complex. In this paper, we propose a framework to transform a long and complex STL formula into a syntactically separate form, i.e., the logical combination of a series of short and simple subformulas with non-overlapping timing intervals. Using this framework, one can easily modularize the synthesis of a complex formula using the synthesis solutions of the subformulas, which improves the efficiency of solving a MILP problem. Specifically, we propose a group of separation principles to guarantee the syntactic equivalence between the original formula and its syntactically separate counterpart. Then, we propose novel methods to solve for the largest satisfaction region and the open-loop controller of the specification in a modularized manner. The efficacy of the methods is validated with a robot monitoring case study in simulation. Our work promises to improve the efficiency of control synthesis for systems with complicated specifications.

Learning Reliable Representations for Incomplete Multi-View Partial Multi-Label Classification

  • Authors: Chengliang Liu, Jie Wen, Yong Xu, Liqiang Nie, Min Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17117
  • Pdf link: https://arxiv.org/pdf/2303.17117
  • Abstract
    As a cross-topic of multi-view learning and multi-label classification, multi-view multi-label classification has gradually gained traction in recent years. The application of multi-view contrastive learning has further facilitated this process. However, existing multi-view contrastive learning methods crudely separate so-called negative pairs, which often pushes apart samples belonging to the same or similar categories. Besides, plenty of multi-view multi-label learning methods ignore the possible absence of views and labels. To address these issues, in this paper, we propose an incomplete multi-view partial multi-label classification network named RANK. In this network, a label-driven multi-view contrastive learning strategy is proposed to leverage supervised information to preserve the structure within each view and perform consistent alignment across views. Furthermore, we break through the view-level weights inherent in existing methods and propose a quality-aware sub-network to dynamically assign quality scores to each view of each sample. The label correlation information is fully utilized in the final multi-label cross-entropy classification loss, effectively improving the discriminative power. Last but not least, our model is not only able to handle complete multi-view multi-label datasets, but also works on datasets with missing instances and labels. Extensive experiments confirm that our RANK outperforms existing state-of-the-art methods.

Weighted Scheduling of Time-Sensitive Coflows

  • Authors: Olivier Brun, Rachid El-Azouzi, Quang-Trung Luu, Francesco De Pellergrini, Balakrishna J. Prabhu, Cédric Richier
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.17175
  • Pdf link: https://arxiv.org/pdf/2303.17175
  • Abstract
    Datacenter networks routinely support the data transfers of distributed computing frameworks in the form of coflows, i.e., sets of concurrent flows related to a common task. The vast majority of the literature has focused on the problem of scheduling coflows for completion time minimization, i.e., to maximize the average rate at which coflows are dispatched in the network fabric. However, many modern applications generate coflows dedicated to online services and mission-critical computing tasks which have to comply with specific completion deadlines. In this paper, we introduce $\mathtt{WDCoflow}$, a new algorithm to maximize the weighted number of coflows that complete before their deadline. By combining a dynamic programming algorithm along with parallel inequalities, our heuristic solution performs at once coflow admission control and coflow prioritization, imposing a $\sigma$-order on the set of coflows. With extensive simulation, we demonstrate the effectiveness of our algorithm, which admits up to $3\times$ more coflows that meet their deadline in comparison to the best state-of-the-art solution, namely $\mathtt{CS\text{-}MHA}$. Furthermore, when weights are used to differentiate coflow classes, $\mathtt{WDCoflow}$ is able to improve the admission per class up to $4\times$, while increasing the average weighted coflow admission rate.
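
The mix of dynamic programming, weights, and deadlines in the abstract echoes the classical Lawler-Moore recursion for maximizing the weight of on-time jobs. The sketch below is a single-machine analogue of the admission-control idea under assumed inputs, not $\mathtt{WDCoflow}$ itself:

```python
# Each "coflow" j has processing volume p, deadline d, and weight w; pick a
# deadline-ordered subset maximizing the total weight finishing on time
# (Lawler-Moore dynamic program, pseudo-polynomial in the time horizon).

def max_weighted_on_time(jobs):
    """jobs: list of (p, d, w) tuples; returns the best total weight."""
    jobs = sorted(jobs, key=lambda j: j[1])     # earliest deadline first
    horizon = max(d for _, d, _ in jobs)
    dp = [0] * (horizon + 1)                    # dp[t]: best weight at load t
    for p, d, w in jobs:
        for t in range(d, p - 1, -1):           # knapsack-style update
            dp[t] = max(dp[t], dp[t - p] + w)
    return max(dp)

# Jobs 1 and 2 fit on time (weight 9); adding job 3 would miss its deadline.
print(max_weighted_on_time([(2, 3, 5), (2, 4, 4), (3, 4, 6)]))
```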

Innovative Countermeasures to Defeat Cyber Attacks Against Blockchain Wallets: A Crypto Terminal Use Case

  • Authors: Pascal Urien (LTCI)
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2303.17206
  • Pdf link: https://arxiv.org/pdf/2303.17206
  • Abstract
    Blockchain transactions are signed by private keys. Secure key storage and tamper-proof computers are essential requirements for deploying a trusted infrastructure. In this paper, we identify some threats against blockchain wallets and propose a set of physical and logical countermeasures to thwart them. We present the crypto terminal device, operating with a removable secure element, built on open software and hardware architectures, capable of detecting a cloned device or corrupted software. These technologies are based on tamper-resistant computing (javacard), smart card anti-cloning, smart card content attestation, application firewall, bare-metal architecture, remote attestation, dynamic Physical Unclonable Function (dPUF), and programming tokens as a root of trust. This paper is an extended version of the paper "Innovative Countermeasures to Defeat Cyber Attacks Against Blockchain Wallets," 2021 5th Cyber Security in Networking Conference (CSNet), 2021, pp. 49-54, doi: 10.1109/CSNet52717.2021.9614649.

Multifactor Sequential Disentanglement via Structured Koopman Autoencoders

  • Authors: Nimrod Berman, Ilan Naiman, Omri Azencot
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17264
  • Pdf link: https://arxiv.org/pdf/2303.17264
  • Abstract
    Disentangling complex data to its latent factors of variation is a fundamental task in representation learning. Existing work on sequential disentanglement mostly provides two-factor representations, i.e., it separates the data into time-varying and time-invariant factors. In contrast, we consider multifactor disentanglement in which multiple (more than two) semantic disentangled components are generated. Key to our approach is a strong inductive bias where we assume that the underlying dynamics can be represented linearly in the latent space. Under this assumption, it becomes natural to exploit the recently introduced Koopman autoencoder models. However, disentangled representations are not guaranteed in Koopman approaches, and thus we propose a novel spectral loss term which leads to structured Koopman matrices and disentanglement. Overall, we propose a simple, easy-to-code deep model that is fully unsupervised and supports multifactor disentanglement. We showcase new disentangling abilities such as swapping of individual static factors between characters, and an incremental swap of disentangled factors from the source to the target. Moreover, we evaluate our method extensively on two-factor standard benchmark tasks where we significantly improve over competing unsupervised approaches, and we perform competitively in comparison to weakly- and self-supervised state-of-the-art approaches. The code is available at https://github.com/azencot-group/SKD.
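
The linear-latent-dynamics assumption at the heart of the approach can be illustrated with a plain least-squares (DMD-style) fit of a Koopman matrix. This sketch captures only the inductive bias, not the paper's autoencoder or spectral loss, and the synthetic trajectory below is an invented example:

```python
import numpy as np

def fit_koopman(Z):
    """Z: (T, d) latent trajectory; least-squares K with z_{t+1} ~= K z_t."""
    K_T, *_ = np.linalg.lstsq(Z[:-1], Z[1:], rcond=None)
    return K_T.T

# Synthetic latents: one static factor and three decaying (time-varying) ones.
T, d = 50, 4
true_K = np.diag([1.0, 0.99, 0.95, 0.9])
Z = np.zeros((T, d))
Z[0] = np.random.randn(d)
for t in range(T - 1):
    Z[t + 1] = true_K @ Z[t]

# Eigenvalues near 1 act on time-invariant factors, the rest on time-varying
# ones; shaping K's spectrum is what enables multifactor disentanglement.
print(np.round(np.linalg.eigvals(fit_koopman(Z)).real, 3))
```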

Improved a posteriori Error Bounds for Reduced port-Hamiltonian Systems

  • Authors: Johannes Rettberg, Dominik Wittwar, Patrick Buchfink, Robin Herkert, Jörg Fehr, Bernard Haasdonk
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.17329
  • Pdf link: https://arxiv.org/pdf/2303.17329
  • Abstract
    Projection-based model order reduction of dynamical systems usually introduces an error between the high-fidelity model and its counterpart of lower dimension. This unknown error can be bounded by residual-based methods, which are typically known to be highly pessimistic in the sense of largely overestimating the true error. This work applies two improved error bounding techniques, namely (a) a hierarchical error bound and (b) an error bound based on an auxiliary linear problem, to the case of port-Hamiltonian systems. The approaches rely on a second approximation of (a) the dynamical system and (b) the error system. In this paper, these methods are for the first time adapted to port-Hamiltonian systems by exploiting their structure. The mathematical relationship between the two methods is discussed both theoretically and numerically. The effectiveness of the described methods is demonstrated using a challenging three-dimensional port-Hamiltonian model of a classical guitar with fluid-structure interaction.

Uniform Substitution for Dynamic Logic with Communicating Hybrid Programs

  • Authors: Marvin Brieger, Stefan Mitsch, André Platzer
  • Subjects: Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2303.17333
  • Pdf link: https://arxiv.org/pdf/2303.17333
  • Abstract
    This paper introduces a uniform substitution calculus for $d\mathcal{L}_\text{CHP}$, the dynamic logic of communicating hybrid programs. Uniform substitution enables parsimonious prover kernels by using axioms instead of axiom schemata. Instantiations can be recovered from a single proof rule responsible for soundness-critical instantiation checks rather than being spread across axiom schemata in side conditions. Even though communication and parallelism reasoning are notorious for necessitating subtle soundness-critical side conditions, uniform substitution when generalized to $d\mathcal{L}_\text{CHP}$ manages to limit and isolate their conceptual overhead. Since uniform substitution has proven to simplify the implementation of hybrid systems provers substantially, uniform substitution for $d\mathcal{L}_\text{CHP}$ paves the way for a parsimonious implementation of theorem provers for hybrid systems with communication and parallelism.

The Essential Algorithms for the Matrix Chain

  • Authors: Francisco López, Lars Karlsson, Paolo Bientinesi
  • Subjects: Discrete Mathematics (cs.DM)
  • Arxiv link: https://arxiv.org/abs/2303.17352
  • Pdf link: https://arxiv.org/pdf/2303.17352
  • Abstract
    For a given product of $n$ matrices, the matrix chain multiplication problem asks for a parenthesisation that minimises the number of arithmetic operations. In 1973, Godbole presented a now classical dynamic programming formulation with cubic time complexity on the length of the chain. The best known algorithms run in linearithmic time, and the best known approximation algorithms run in linear time with an approximation factor smaller than two. All solutions have in common that they select an optimal parenthesisation from a set of $C_{n-1}$ (Catalan number $n - 1$) distinct parenthesisations. We studied the set of parenthesisations and discovered (a) that all of the exponentially many parenthesisations are useful in the sense that they are optimal in an infinite subset of the input space, (b) that only $n + 1$ parenthesisations are essential in the sense that they are arbitrarily better than the second best on an infinite subset of the input space, and (c) that the best essential parenthesisation is never more than twice as costly as the best non-essential parenthesisation. Through random sampling of the input space, we further discovered that the set of essential parenthesisations includes an optimal parenthesisation in the vast majority of inputs, and that the best essential parenthesisation is on average much closer to optimal than the worst-case bound. The results have direct consequences for the development of compilers for linear algebra expressions where the matrix sizes are unknown at compile-time.
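
For reference, Godbole's cubic-time dynamic program mentioned in the abstract fits in a few lines; `dims` holds the $n+1$ boundary dimensions of an $n$-matrix chain:

```python
def matrix_chain_order(dims):
    """Godbole's O(n^3) DP: minimal scalar multiplications for the chain."""
    n = len(dims) - 1                       # number of matrices
    m = [[0] * n for _ in range(n)]         # m[i][j]: best cost for chain i..j
    for length in range(2, n + 1):          # sub-chain length
        for i in range(n - length + 1):
            j = i + length - 1
            m[i][j] = min(
                m[i][k] + m[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return m[0][n - 1]

# Shapes 10x30, 30x5, 5x60: ((AB)C) costs 4500, (A(BC)) costs 27000.
print(matrix_chain_order([10, 30, 5, 60]))  # -> 4500
```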

Dynamic Conceptional Contrastive Learning for Generalized Category Discovery

  • Authors: Nan Pu, Zhun Zhong, Nicu Sebe
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17393
  • Pdf link: https://arxiv.org/pdf/2303.17393
  • Abstract
    Generalized category discovery (GCD) is a recently proposed open-world problem, which aims to automatically cluster partially labeled data. The main challenge is that the unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories. This renders traditional novel category discovery (NCD) methods incapable of handling GCD, due to their assumption that unlabeled data come only from novel categories. One effective approach to GCD is applying self-supervised learning to learn discriminative representations for unlabeled data. However, this approach largely ignores underlying relationships between instances of the same concepts (e.g., class, super-class, and sub-class), which results in inferior representation learning. In this paper, we propose a Dynamic Conceptional Contrastive Learning (DCCL) framework, which can effectively improve clustering accuracy by alternately estimating underlying visual conceptions and learning conceptional representation. In addition, we design a dynamic conception generation and update mechanism, which is able to ensure consistent conception learning and thus further facilitate the optimization of DCCL. Extensive experiments show that DCCL achieves new state-of-the-art performances on six generic and fine-grained visual recognition datasets, especially on fine-grained ones. For example, our method significantly surpasses the best competitor by 16.2% on the new classes for the CUB-200 dataset. Code is available at https://github.com/TPCD/DCCL.

Fast inference of latent space dynamics in huge relational event networks

  • Authors: Igor Artico, Ernst Wit
  • Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2303.17460
  • Pdf link: https://arxiv.org/pdf/2303.17460
  • Abstract
    Relational events are a type of social interaction, sometimes referred to as a dynamic network. Their dynamics typically depend on emerging patterns, so-called endogenous variables, or on external forces, referred to as exogenous variables. Comprehensive information on the actors in the network, especially for huge networks, is rare, however. A latent space approach in network analysis has been a popular way to account for unmeasured covariates that are driving network configurations. Bayesian and EM-type algorithms have been proposed for inferring the latent space, but both the sheer size of many social network applications and the dynamic nature of the process, and therefore of the latent space, make computations prohibitively expensive. In this work we propose a likelihood-based algorithm that can deal with huge relational event networks. We propose a hierarchical strategy for inferring network community dynamics embedded into an interpretable latent space. Node dynamics are described by smooth spline processes. To make the framework feasible for large networks we borrow from machine learning optimization methodology. Model-based clustering is carried out via a convex clustering penalization, encouraging shared trajectories for ease of interpretation. We propose a model-based approach for separating macro- and microstructures and perform a hierarchical analysis within successive hierarchies. The method can fit millions of nodes on a public Colab GPU in a few minutes. The code and a tutorial are available in a Github repository.

PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation

  • Authors: Qitao Zhao, Ce Zheng, Mengyuan Liu, Pichao Wang, Chen Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17472
  • Pdf link: https://arxiv.org/pdf/2303.17472
  • Abstract
    Recently, transformer-based methods have gained significant success in sequential 2D-to-3D lifting human pose estimation. As a pioneering work, PoseFormer captures spatial relations of human joints in each video frame and human dynamics across frames with cascaded transformer layers and has achieved impressive performance. However, in real scenarios, the performance of PoseFormer and its follow-ups is limited by two factors: (a) The length of the input joint sequence; (b) The quality of 2D joint detection. Existing methods typically apply self-attention to all frames of the input sequence, causing a huge computational burden when the frame number is increased to obtain advanced estimation accuracy, and they are not robust to noise naturally brought by the limited capability of 2D joint detectors. In this paper, we propose PoseFormerV2, which exploits a compact representation of lengthy skeleton sequences in the frequency domain to efficiently scale up the receptive field and boost robustness to noisy 2D joint detection. With minimum modifications to PoseFormer, the proposed method effectively fuses features both in the time domain and frequency domain, enjoying a better speed-accuracy trade-off than its precursor. Extensive experiments on two benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) demonstrate that the proposed approach significantly outperforms the original PoseFormer and other transformer-based variants. Code is released at \url{https://github.com/QitaoZhao/PoseFormerV2}.
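
The frequency-domain idea can be previewed with a discrete cosine transform: a long per-joint coordinate sequence is summarized by its first few DCT coefficients. This is an assumed 1-D simplification (the keep-count and the synthetic signal are arbitrary); the actual model fuses such low-frequency features with a transformer:

```python
import numpy as np
from scipy.fft import dct, idct

def compress_sequence(x, keep):
    """Keep the `keep` lowest-frequency DCT coefficients of a 1-D sequence."""
    coeffs = dct(x, norm="ortho")
    coeffs[keep:] = 0.0                    # discard high-frequency content
    return coeffs[:keep], idct(coeffs, norm="ortho")

frames = np.linspace(0.0, 2.0 * np.pi, 81)
signal = np.sin(frames) + 0.05 * np.random.randn(81)  # noisy joint track
low_freq, recon = compress_sequence(signal, keep=9)
# 9 numbers stand in for 81 frames, and truncation also suppresses noise.
print(low_freq.shape, float(np.abs(signal - recon).mean()))
```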

Differentiable Environment Primitives for Contact State Estimation

  • Authors: Kevin Haninger, Kangwagye Samuel, Filippo Rozzi, Sehoon Oh, Loris Roveda
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17476
  • Pdf link: https://arxiv.org/pdf/2303.17476
  • Abstract
    In contact-rich manipulation, the robot dynamics are coupled with an environment that has application-specific dynamic properties (stiffness, inertia) and geometry (contact normal). Knowledge of these environmental parameters can improve control and monitoring, but they are often unobserved and may vary, either online or between task instances. Observers, such as the extended Kalman filter, can be used to estimate these parameters, but such model-based techniques can require too much engineering work to scale up to complex environments, such as multi-point contact. To accelerate environment modeling, we propose environment primitives: parameterized environment dynamics that can be connected in parallel and are expressed in an automatic differentiation framework. This simplifies offline gradient-based optimization to fit model parameters and linearization of the coupled dynamics for an observer. This method is implemented for stiffness contact models, allowing the fitting of contact geometry and stiffness offline or their online estimation by an extended Kalman filter. This method is applied to a collaborative robot, estimating external force, contact stiffness, and contact geometry from the motor position and current. The estimates of external force and stiffness are compared with a momentum observer and direct force measurements.

On the Analysis of Computational Delays in Reinforcement Learning-based Rate Adaptation Algorithms

  • Authors: Ricardo Trancoso, Ruben Queiros, Helder Fontes, Rui Campos
  • Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.17477
  • Pdf link: https://arxiv.org/pdf/2303.17477
  • Abstract
    Several research works have applied Reinforcement Learning (RL) algorithms to solve the Rate Adaptation (RA) problem in Wi-Fi networks. The dynamic nature of the radio link requires the algorithms to be responsive to changes in link quality. Delays in the execution of the algorithm may be detrimental to its performance, which in turn may decrease network performance. This aspect has been overlooked in the state of the art. In this paper, we present an analysis of common computational delays in RL-based RA algorithms, and propose a methodology that may be applied to reduce these computational delays and increase the efficiency of this type of algorithms. We apply the proposed methodology to an existing RL-based RA algorithm. The obtained experimental results indicate a reduction of one order of magnitude in the execution time of the algorithm, improving its responsiveness to link quality changes.

Event-based Agile Object Catching with a Quadrupedal Robot

  • Authors: Benedek Forrai, Takahiro Miki, Daniel Gehrig, Marco Hutter, Davide Scaramuzza
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17479
  • Pdf link: https://arxiv.org/pdf/2303.17479
  • Abstract
    Quadrupedal robots are conquering various indoor and outdoor applications due to their ability to navigate challenging uneven terrains. Exteroceptive information greatly enhances this capability since perceiving their surroundings allows them to adapt their controller and thus achieve higher levels of robustness. However, sensors such as LiDARs and RGB cameras do not provide sufficient information to quickly and precisely react in a highly dynamic environment since they suffer from a bandwidth-latency tradeoff. They require significant bandwidth at high frame rates while featuring significant perceptual latency at lower frame rates, thereby limiting their versatility on resource-constrained platforms. In this work, we tackle this problem by equipping our quadruped with an event camera, which does not suffer from this tradeoff due to its asynchronous and sparse operation. By leveraging the low latency of the events, we push the limits of quadruped agility and demonstrate high-speed ball catching for the first time. We show that our quadruped equipped with an event camera can catch objects with speeds up to 15 m/s from 4 meters, with a success rate of 83%. Using a VGA event camera, our method runs at 100 Hz on an NVIDIA Jetson Orin.

Teaching contact-rich tasks from visual demonstrations by constraint extraction

  • Authors: Christian Hegeler, Filippo Rozzi, Loris Roveda, Kevin Haninger
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.17481
  • Pdf link: https://arxiv.org/pdf/2303.17481
  • Abstract
    Contact-rich manipulation involves kinematic constraints on the task motion, typically with discrete transitions between these constraints during the task. Allowing the robot to detect and reason about these contact constraints can support robust and dynamic manipulation, but how can these contact models be efficiently learned? Purely visual observations are an attractive data source, allowing passive task demonstrations with unmodified objects. Existing approaches for vision-only learning from demonstration are effective in pick-and-place applications and planar tasks. Nevertheless, limited accuracy, occlusions, and unobserved task dynamics can limit their robustness in contact-rich manipulation. To use visual demonstrations for contact-rich robotic tasks, we consider the demonstration of pose trajectories with transitions between holonomic kinematic constraints, first clustering the trajectories into discrete contact modes, then fitting kinematic constraints per mode. The fitted constraints are then used to (i) detect contact online with force/torque measurements and (ii) plan the robot policy with respect to the active constraint. We demonstrate the approach with real experiments on cabling and rake tasks, showing that it gives robust manipulation through contact transitions.

DDP: Diffusion Model for Dense Visual Prediction

  • Authors: Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17559
  • Pdf link: https://arxiv.org/pdf/2303.17559
  • Abstract
    We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a random Gaussian distribution, guided by the image. The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline. Without task-specific design and architecture customization, DDP is easy to generalize to most dense prediction tasks, e.g., semantic segmentation and depth estimation. In addition, DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods. We show top results on three representative tasks across six diverse benchmarks; without tricks, DDP achieves state-of-the-art or competitive performance on each task compared to the specialist counterparts, e.g., semantic segmentation (83.9 mIoU on Cityscapes), BEV map segmentation (70.6 mIoU on nuScenes), and depth estimation (0.05 REL on KITTI). We hope that our approach will serve as a solid baseline and facilitate future research.

TiDy-PSFs: Computational Imaging with Time-Averaged Dynamic Point-Spread-Functions

  • Authors: Sachin Shah, Sakshum Kulshrestha, Christopher A. Metzler
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17583
  • Pdf link: https://arxiv.org/pdf/2303.17583
  • Abstract
    Point-spread-function (PSF) engineering is a powerful computational imaging technique wherein a custom phase mask is integrated into an optical system to encode additional information into captured images. Used in combination with deep learning, such systems now offer state-of-the-art performance at monocular depth estimation, extended depth-of-field imaging, lensless imaging, and other tasks. Inspired by recent advances in spatial light modulator (SLM) technology, this paper answers a natural question: Can one encode additional information and achieve superior performance by changing a phase mask dynamically over time? We first prove that the set of PSFs described by static phase masks is non-convex and that, as a result, time-averaged PSFs generated by dynamic phase masks are fundamentally more expressive. We then demonstrate, in simulation, that time-averaged dynamic (TiDy) phase masks can offer substantially improved monocular depth estimation and extended depth-of-field imaging performance.
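
A toy scalar Fourier-optics sketch of the central object: the incoherent PSF of a phase mask is $|\mathcal{F}\{\text{pupil}\cdot e^{i\phi}\}|^2$, and an SLM cycling through several masks within one exposure produces the time-averaged PSF, a convex combination that a single static mask cannot in general realize (the non-convexity the paper proves). The unit wavelength, circular aperture, and random masks here are illustrative assumptions:

```python
import numpy as np

def psf(phase, pupil):
    """Incoherent PSF of a phase mask under a scalar Fourier-optics model."""
    field = pupil * np.exp(1j * phase)
    return np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2

n = 64
yy, xx = np.mgrid[-1:1:n * 1j, -1:1:n * 1j]
pupil = (xx**2 + yy**2 <= 1.0).astype(float)        # circular aperture

# Eight SLM frames shown within one exposure -> time-averaged (TiDy) PSF.
masks = [2.0 * np.pi * np.random.rand(n, n) for _ in range(8)]
tidy_psf = np.mean([psf(m, pupil) for m in masks], axis=0)
print(tidy_psf.shape)
```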

Polarity is all you need to learn and transfer faster

  • Authors: Qingyang Wang, Michael A.Powell, Ali Geisa, Eric Bridgeford, Joshua T. Vogelstein
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
  • Arxiv link: https://arxiv.org/abs/2303.17589
  • Pdf link: https://arxiv.org/pdf/2303.17589
  • Abstract
    Natural intelligences (NIs) thrive in a dynamic world - they learn quickly, sometimes with only a few samples. In contrast, artificial intelligences (AIs) typically learn with prohibitive amounts of training samples and computational power. What design-principle difference between NI and AI could contribute to such a discrepancy? Here, we propose an angle from weight polarity: development processes initialize NIs with advantageous polarity configurations; as NIs grow and learn, synapse magnitudes update yet polarities are largely kept unchanged. We demonstrate with simulation and image classification tasks that if weight polarities are adequately set $\textit{a priori}$, then networks learn with less time and data. We also explicitly illustrate situations in which $\textit{a priori}$ setting the weight polarities is disadvantageous for networks. Our work illustrates the value of weight polarities from the perspective of statistical and computational efficiency during learning.

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

  • Authors: Wen Wang, Kangyang Xie, Zide Liu, Hao Chen, Yue Cao, Xinlong Wang, Chunhua Shen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.17599
  • Pdf link: https://arxiv.org/pdf/2303.17599
  • Abstract
    Large-scale text-to-image diffusion models achieve unprecedented success in image generation and editing. However, how to extend such success to video editing is unclear. Recent initial attempts at video editing require significant text-to-video data and computation resources for training, which is often not accessible. In this work, we propose vid2vid-zero, a simple yet effective method for zero-shot video editing. Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video. At the core of our method is a null-text inversion module for text-to-video alignment, a cross-frame modeling module for temporal consistency, and a spatial regularization module for fidelity to the original video. Without any training, we leverage the dynamic nature of the attention mechanism to enable bi-directional temporal modeling at test time. Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos. Code will be made available at \url{https://github.com/baaivision/vid2vid-zero}.

New submissions for Wed, 3 May 23

Keyword: efficient

Two-phase Dual COPOD Method for Anomaly Detection in Industrial Control System

  • Authors: Emmanuel Aboah Boateng, Jerry Bruce
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.00982
  • Pdf link: https://arxiv.org/pdf/2305.00982
  • Abstract
    Critical infrastructures like water treatment facilities and power plants depend on industrial control systems (ICS) for monitoring and control, making them vulnerable to cyber attacks and system malfunctions. Traditional ICS anomaly detection methods lack transparency and interpretability, which make it difficult for practitioners to understand and trust the results. This paper proposes a two-phase dual Copula-based Outlier Detection (COPOD) method that addresses these challenges. The first phase removes unwanted outliers using an empirical cumulative distribution algorithm, and the second phase develops two parallel COPOD models based on the output data of phase 1. The method is based on empirical distribution functions, parameter-free, and provides interpretability by quantifying each feature's contribution to an anomaly. The method is also computationally and memory-efficient, suitable for low- and high-dimensional datasets. Experimental results demonstrate superior performance in terms of F1-score and recall on three open-source ICS datasets, enabling real-time ICS anomaly detection.
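
A minimal sketch of the ECDF-based scoring idea that COPOD builds on (not the proposed two-phase dual method; tie handling and the toy data are simplifications). It shows the two properties the abstract stresses: no parameters to tune, and a per-feature contribution to each anomaly score:

```python
import numpy as np

def ecdf_outlier_scores(X):
    """Per-sample score = sum over features of -log(empirical tail prob)."""
    n = X.shape[0]
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1   # 1..n per feature
    p_left, p_right = ranks / n, (n - ranks + 1) / n        # ECDF and survival
    contrib = -np.log(np.minimum(p_left, p_right))          # extremer tail
    return contrib.sum(axis=1), contrib

X = np.vstack([np.random.randn(200, 3), [[8.0, 8.0, 8.0]]])  # one outlier
scores, per_feature = ecdf_outlier_scores(X)
print(scores.argmax())                 # -> 200, the injected outlier
print(per_feature[scores.argmax()])    # which features drove the score
```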

Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs

  • Authors: Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2305.01024
  • Pdf link: https://arxiv.org/pdf/2305.01024
  • Abstract
    General Matrix Multiplication (GEMM) is a crucial algorithm for various applications such as machine learning and scientific computing, and an efficient GEMM implementation is essential for the performance of these systems. While researchers often strive for faster performance by using large compute platforms, the increased scale of these systems can raise concerns about hardware and software reliability. In this paper, we present a design for a high-performance GEMM with algorithm-based fault tolerance for use on GPUs. We describe fault-tolerant designs for GEMM at the thread, warp, and threadblock levels, and also provide a baseline GEMM implementation that is competitive with or faster than the state-of-the-art, proprietary cuBLAS GEMM. We present a kernel fusion strategy to overlap and mitigate the memory latency due to fault tolerance with the original GEMM computation. To support a wide range of input matrix shapes and reduce development costs, we present a template-based approach for automatic code generation for both fault-tolerant and non-fault-tolerant GEMM implementations. We evaluate our work on NVIDIA Tesla T4 and A100 server GPUs. Experimental results demonstrate that our baseline GEMM presents comparable or superior performance compared to the closed-source cuBLAS. The fault-tolerant GEMM incurs only a minimal overhead (8.89% on average) compared to cuBLAS even with hundreds of errors injected per minute. For irregularly shaped inputs, the code generator-generated kernels show remarkable speedups of $160\% \sim 183.5\%$ and $148.55\% \sim 165.12\%$ for fault-tolerant and non-fault-tolerant GEMMs, outperforming cuBLAS by up to $41.40\%$.
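
The algorithm-based fault tolerance (ABFT) behind the design can be sketched with classical checksum encoding; this numpy toy is not the paper's fused GPU kernels, but it shows how a single corrupted entry of $C = AB$ is located by the mismatching checksum row and column:

```python
import numpy as np

def abft_matmul(A, B):
    """Multiply checksum-extended operands: extra row/col carry checksums."""
    Ac = np.vstack([A, A.sum(axis=0)])                  # column-checksum row
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])   # row-checksum column
    return Ac @ Br

A, B = np.random.rand(4, 4), np.random.rand(4, 4)
Cf = abft_matmul(A, B)
Cf[1, 2] += 1.0                                         # inject a fault

C = Cf[:-1, :-1]
bad_col = np.abs(C.sum(axis=0) - Cf[-1, :-1]).argmax()  # violated column sum
bad_row = np.abs(C.sum(axis=1) - Cf[:-1, -1]).argmax()  # violated row sum
print("corrupted entry at", (int(bad_row), int(bad_col)))   # -> (1, 2)
```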

Hardware implementation of digital memcomputing on small-size FPGAs

  • Authors: Dyk Chung Nguyen, Yuan-Hang Zhang, Massimiliano Di Ventra, Yuriy V. Pershin
  • Subjects: Emerging Technologies (cs.ET)
  • Arxiv link: https://arxiv.org/abs/2305.01061
  • Pdf link: https://arxiv.org/pdf/2305.01061
  • Abstract
    Memcomputing is a novel computing paradigm beyond the von-Neumann one. Its digital version is designed for the efficient solution of combinatorial optimization problems, which emerge in various fields of science and technology. Previously, the performance of digital memcomputing machines (DMMs) was demonstrated using software simulations of their ordinary differential equations. Here, we present the first hardware realization of a DMM algorithm on a low-cost FPGA board. In this demonstration, we have implemented a Boolean satisfiability problem solver. To optimize the use of hardware resources, the algorithm was partially parallelized. The scalability of the present implementation is explored and our FPGA-based results are compared to those obtained using a python code running on a traditional (von-Neumann) computer, showing one to two orders of magnitude speed-up in time to solution. This initial small-scale implementation is projected to state-of-the-art FPGA boards anticipating further advantages of the hardware realization of DMMs over their software emulation.

Robust Communication Complexity of Matching: EDCS Achieves 5/6 Approximation

  • Authors: Amir Azarmehr, Soheil Behnezhad
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2305.01070
  • Pdf link: https://arxiv.org/pdf/2305.01070
  • Abstract
    We study the robust communication complexity of maximum matching. Edges of an arbitrary $n$-vertex graph $G$ are randomly partitioned between Alice and Bob independently and uniformly. Alice has to send a single message to Bob such that Bob can find an (approximate) maximum matching of the whole graph $G$. We specifically study the best approximation ratio achievable via protocols where Alice communicates only $\widetilde{O}(n)$ bits to Bob. There has been a growing interest in the robust communication model due to its connections to the random-order streaming model. An algorithm of Assadi and Behnezhad [ICALP'21] implies a $(2/3+\epsilon_0 \sim .667)$-approximation for a small constant $0 < \epsilon_0 < 10^{-18}$, which remains the best-known approximation for general graphs. For bipartite graphs, Assadi and Behnezhad [Random'21] improved the approximation to .716 albeit with a computationally inefficient (i.e., exponential time) protocol. In this paper, we study a natural and efficient protocol implied by a random-order streaming algorithm of Bernstein [ICALP'20] which is based on edge-degree constrained subgraphs (EDCS) [Bernstein and Stein; ICALP'15]. The result of Bernstein immediately implies that this protocol achieves an (almost) $(2/3 \sim .666)$-approximation in the robust communication model. We present a new analysis, proving that it achieves a much better (almost) $(5/6 \sim .833)$-approximation. This significantly improves previous approximations both for general and bipartite graphs. We also prove that our analysis of Bernstein's protocol is tight.

Fast Path Planning Through Large Collections of Safe Boxes

  • Authors: Tobia Marcucci, Parth Nobel, Russ Tedrake, Stephen Boyd
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.01072
  • Pdf link: https://arxiv.org/pdf/2305.01072
  • Abstract
    We present a fast algorithm for the design of smooth paths (or trajectories) that are constrained to lie in a collection of axis-aligned boxes. We consider the case where the number of these safe boxes is large, and basic preprocessing of them (such as finding their intersections) can be done offline. At runtime we quickly generate a smooth path between given initial and terminal positions. Our algorithm designs trajectories that are guaranteed to be safe at all times, and it detects infeasibility whenever such a trajectory does not exist. Our algorithm is based on two subproblems that we can solve very efficiently: finding a shortest path in a weighted graph, and solving (multiple) convex optimal control problems. We demonstrate the proposed path planner on large-scale numerical examples, and we provide an efficient open-source software implementation, fastpathplanning.
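
Of the two subproblems named in the abstract, the graph-search half might look like the sketch below: one node per safe box, an edge where boxes intersect, and Dijkstra between the boxes containing the start and goal. The box encoding, center-to-center edge weights, and example boxes are illustrative assumptions (the smooth trajectory is then optimized through the returned corridor):

```python
import heapq

def intersects(a, b):
    """Axis-aligned boxes given as (lo, hi) coordinate tuples."""
    return all(a[0][k] <= b[1][k] and b[0][k] <= a[1][k] for k in range(len(a[0])))

def center(box):
    return [(l + h) / 2.0 for l, h in zip(box[0], box[1])]

def dist(p, q):
    return sum((u - v) ** 2 for u, v in zip(p, q)) ** 0.5

def box_corridor(boxes, start, goal):
    """Dijkstra over the intersection graph; assumes goal is reachable."""
    n = len(boxes)
    adj = [[j for j in range(n) if j != i and intersects(boxes[i], boxes[j])]
           for i in range(n)]
    pq, best, prev = [(0.0, start)], {start: 0.0}, {}
    while pq:
        d, i = heapq.heappop(pq)
        if i == goal:
            break
        for j in adj[i]:
            nd = d + dist(center(boxes[i]), center(boxes[j]))
            if nd < best.get(j, float("inf")):
                best[j], prev[j] = nd, i
                heapq.heappush(pq, (nd, j))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

boxes = [((0, 0), (2, 2)), ((1, 1), (4, 3)), ((3, 2), (6, 5))]
print(box_corridor(boxes, 0, 2))   # -> [0, 1, 2]
```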

Design and Evaluation of a Bioinspired Tendon-Driven 3D-Printed Robotic Eye with Active Vision Capabilities

  • Authors: Hamid Osooli, Mohsen Irani Rahaghi, S. Reza Ahmadzadeh
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.01076
  • Pdf link: https://arxiv.org/pdf/2305.01076
  • Abstract
    The field of robotics has seen significant advancements in recent years, particularly in the development of humanoid robots. One area of research that has yet to be fully explored is the design of robotic eyes. In this paper, we propose a computer-aided 3D design scheme for a robotic eye that incorporates realistic appearance, natural movements, and efficient actuation. The proposed design utilizes a tendon-driven actuation mechanism, which offers a broad range of motion capabilities. The use of the minimum number of servos for actuation, one for each agonist-antagonist pair of muscles, makes the proposed design highly efficient. Compared to existing designs in the same class, our robotic eye features a more aesthetic and realistic appearance. We evaluate the robot's performance using a vision-based controller, which demonstrates the effectiveness of the proposed design in achieving natural movement and efficient actuation. The experiment code, toolbox, and printable 3D sketches of our design have been open-sourced.

An Update-intensive LSM-based R-tree Index

  • Authors: Jaewoo Shin, Jianguo Wang, Walid G. Aref
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2305.01087
  • Pdf link: https://arxiv.org/pdf/2305.01087
  • Abstract
    Many applications require update-intensive workloads on spatial objects, e.g., social-network services and shared-riding services that track moving objects. By buffering insert and delete operations in memory, the Log Structured Merge Tree (LSM) has been used widely in various systems because of its ability to handle write-heavy workloads. While the focus on LSM has been on key-value stores and their optimizations, there is a need to study how to efficiently support LSM-based {\em secondary} indexes (e.g., location-based indexes) as modern, heterogeneous data necessitates the use of secondary indexes. In this paper, we investigate the augmentation of a main-memory-based memo structure into an LSM secondary index structure to handle update-intensive workloads efficiently. We conduct this study in the context of an R-tree-based secondary index. In particular, we introduce the LSM RUM-tree that demonstrates the use of an Update Memo in an LSM-based R-tree to enhance the performance of the R-tree's insert, delete, update, and search operations. The LSM RUM-tree introduces new strategies to control the size of the Update Memo to make sure it always fits in memory for high performance. The Update Memo is a light-weight in-memory structure that is suitable for handling update-intensive workloads without introducing significant overhead. Experimental results using real spatial data demonstrate that the LSM RUM-tree achieves up to 9.6x speedup on update operations and up to 2400x speedup on query processing over existing LSM R-tree implementations.

RadAdapt: Radiology Report Summarization via Lightweight Domain Adaptation of Large Language Models

  • Authors: Dave Van Veen, Cara Van Uden, Maayane Attias, Anuj Pareek, Christian Bluethgen, Malgorzata Polacin, Wah Chiu, Jean-Benoit Delbrouck, Juan Manuel Zambrano Chaves, Curtis P. Langlotz, Akshay S. Chaudhari, John Pauly
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2305.01146
  • Pdf link: https://arxiv.org/pdf/2305.01146
  • Abstract
    We systematically investigate lightweight strategies to adapt large language models (LLMs) for the task of radiology report summarization (RRS). Specifically, we focus on domain adaptation via pretraining (on natural language, biomedical text, and clinical text) and via prompting (zero-shot, in-context learning) or parameter-efficient fine-tuning (prefix tuning, LoRA). Our results on the MIMIC-III dataset consistently demonstrate best performance by maximally adapting to the task via pretraining on clinical text and parameter-efficient fine-tuning on RRS examples. Importantly, this method fine-tunes a mere 0.32% of parameters throughout the model, in contrast to end-to-end fine-tuning (100% of parameters). Additionally, we study the effect of in-context examples and out-of-distribution (OOD) training before concluding with a radiologist reader study and qualitative analysis. Our findings highlight the importance of domain adaptation in RRS and provide valuable insights toward developing effective natural language processing solutions for clinical tasks.

Unbounded Differentially Private Quantile and Maximum Estimation

  • Authors: David Durfee
  • Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2305.01177
  • Pdf link: https://arxiv.org/pdf/2305.01177
  • Abstract
    In this work we consider the problem of differentially private computation of quantiles for the data, especially the highest quantiles such as maximum, but with an unbounded range for the dataset. We show that this can be done efficiently through a simple invocation of $\texttt{AboveThreshold}$, a subroutine that is iteratively called in the fundamental Sparse Vector Technique, even when there is no upper bound on the data. In particular, we show that this procedure can give more accurate and robust estimates on the highest quantiles with applications towards clipping that is essential for differentially private sum and mean estimation. In addition, we show how two invocations can handle the fully unbounded data setting. Within our study, we show that an improved analysis of $\texttt{AboveThreshold}$ can improve the privacy guarantees for the widely used Sparse Vector Technique that is of independent interest. We give a more general characterization of privacy loss for $\texttt{AboveThreshold}$ which we immediately apply to our method for improved privacy guarantees. Our algorithm only requires one $O(n)$ pass through the data, which can be unsorted, and each subsequent query takes $O(1)$ time. We empirically compare our unbounded algorithm with the state-of-the-art algorithms in the bounded setting. For inner quantiles, we find that our method often performs better on non-synthetic datasets. For the maximal quantiles, which we apply to differentially private sum computation, we find that our method performs significantly better.
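
For context, $\texttt{AboveThreshold}$ is the classic Sparse Vector Technique primitive: it privately reports the first sensitivity-1 query whose noisy value clears a noisy threshold, at total privacy cost $\epsilon$ regardless of how many queries are streamed. Below is a standard textbook version with a toy quantile-flavored use; the doubling grid of candidate bounds is an illustrative assumption, not the paper's algorithm:

```python
import numpy as np

def above_threshold(queries, data, threshold, eps):
    """Standard SVT: index of first query noisily above the noisy threshold."""
    noisy_t = threshold + np.random.laplace(scale=2.0 / eps)
    for i, q in enumerate(queries):
        if q(data) + np.random.laplace(scale=4.0 / eps) >= noisy_t:
            return i
    return None                         # nothing cleared the threshold

# Toy use in the spirit of the abstract: sweep candidate bounds b upward and
# ask "how many points lie below b?" (sensitivity 1); the first query to
# clear 0.95 * n gives a rough private estimate of the 95th percentile.
data = np.random.exponential(scale=10.0, size=1000)
grid = 2.0 ** np.arange(20)             # covers an a-priori unbounded range
queries = [lambda d, b=b: float(np.sum(d <= b)) for b in grid]
hit = above_threshold(queries, data, threshold=0.95 * len(data), eps=1.0)
print("~95th percentile below", grid[hit] if hit is not None else "n/a")
```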

LatentAvatar: Learning Latent Expression Code for Expressive Neural Head Avatar

  • Authors: Yuelang Xu, Hongwen Zhang, Lizhen Wang, Xiaochen Zhao, Han Huang, Guojun Qi, Yebin Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.01190
  • Pdf link: https://arxiv.org/pdf/2305.01190
  • Abstract
    Existing approaches to animatable NeRF-based head avatars are either built upon face templates or use the expression coefficients of templates as the driving signal. Despite the promising progress, their performances are heavily bound by the expression power and the tracking accuracy of the templates. In this work, we present LatentAvatar, an expressive neural head avatar driven by latent expression codes. Such latent expression codes are learned in an end-to-end and self-supervised manner without templates, enabling our method to get rid of expression and tracking issues. To achieve this, we leverage a latent head NeRF to learn the person-specific latent expression codes from a monocular portrait video, and further design a Y-shaped network to learn the shared latent expression codes of different subjects for cross-identity reenactment. By optimizing the photometric reconstruction objectives in NeRF, the latent expression codes are learned to be 3D-aware while faithfully capturing the high-frequency detailed expressions. Moreover, by learning a mapping between the latent expression code learned in shared and person-specific settings, LatentAvatar is able to perform expressive reenactment between different subjects. Experimental results show that our LatentAvatar is able to capture challenging expressions and the subtle movement of teeth and even eyeballs, which outperforms previous state-of-the-art solutions in both quantitative and qualitative comparisons. Project page: https://www.liuyebin.com/latentavatar.

Exploration of Unranked Items in Safe Online Learning to Re-Rank

  • Authors: Hiroaki Shiino, Kaito Ariu, Kenshi Abe, Togashi Riku
  • Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.01202
  • Pdf link: https://arxiv.org/pdf/2305.01202
  • Abstract
    Bandit algorithms for online learning to rank (OLTR) problems often aim to maximize long-term revenue by utilizing user feedback. From a practical point of view, however, such algorithms have a high risk of hurting user experience due to their aggressive exploration. Thus, there has been a rising demand for safe exploration in recent years. One approach to safe exploration is to gradually enhance the quality of an original ranking that is already guaranteed acceptable quality. In this paper, we propose a safe OLTR algorithm that efficiently exchanges one of the items in the current ranking with an item outside the ranking (i.e., an unranked item) to perform exploration. We select an unranked item optimistically to explore based on Kullback-Leibler upper confidence bounds (KL-UCB) and safely re-rank the items including the selected one. Through experiments, we demonstrate that the proposed algorithm improves long-term regret from baselines without any safety violation.
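
The KL-UCB index used to pick which unranked item to explore can be computed by bisection, since the Bernoulli KL divergence $d(\hat\mu, q)$ is increasing in $q$ above the empirical mean $\hat\mu$. A standard sketch with the usual textbook exploration budget (not necessarily the paper's exact constants):

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p, q = min(max(p, eps), 1 - eps), min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb(mean, pulls, t):
    """Largest q with pulls * d(mean, q) <= log(t), found by bisection."""
    budget = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# An item clicked 6 times in 20 impressions, at round t = 1000:
print(kl_ucb(mean=0.3, pulls=20, t=1000))   # optimistic index above 0.3
```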

Chronosymbolic Learning: Efficient CHC Solving with Symbolic Reasoning and Inductive Learning

  • Authors: Ziyan Luo, Xujie Si
  • Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
  • Arxiv link: https://arxiv.org/abs/2305.01206
  • Pdf link: https://arxiv.org/pdf/2305.01206
  • Abstract
    Solving Constrained Horn Clauses (CHCs) is a fundamental challenge behind a wide range of verification and analysis tasks. Data-driven approaches show great promise in improving CHC solving without the painstaking manual effort of creating and tuning various heuristics. However, a large performance gap exists between data-driven CHC solvers and symbolic reasoning-based solvers. In this work, we develop a simple but effective framework, "Chronosymbolic Learning", which unifies symbolic information and numerical data points to solve a CHC system efficiently. We also present a simple instance of Chronosymbolic Learning with a data-driven learner and a BMC-styled reasoner. Despite its great simplicity, experimental results show the efficacy and robustness of our tool. It outperforms state-of-the-art CHC solvers on a dataset consisting of 288 benchmarks, including many instances with non-linear integer arithmetic.

Rate-Compatible Polar Codes for Automorphism Ensemble Decoding

  • Authors: Marvin Geiselhart, Jannis Clausius, Stephan ten Brink
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2305.01214
  • Pdf link: https://arxiv.org/pdf/2305.01214
  • Abstract
    Recently, automorphism ensemble decoding (AED) has drawn research interest as a more computationally efficient alternative to successive cancellation list (SCL) decoding of polar codes. Although AED has demonstrated superior performance for specific code parameters, a flexible code design that can accommodate varying code rates does not yet exist. This work proposes a theoretical framework for constructing rate-compatible polar codes with a prescribed automorphism group, which is a key requirement for AED. We first prove that a one-bit granular sequence with useful automorphisms cannot exist. However, by allowing larger steps in the code dimension, flexible code sequences can be constructed. An explicit synthetic channel ranking based on the $\beta$-expansion is then proposed to ensure that all constructed codes possess the desired symmetries. Simulation results, covering a broad range of code dimensions and blocklengths, show a performance comparable to that of 5G polar codes under cyclic redundancy check (CRC)-aided SCL decoding, however, with lower complexity.
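
A minimal sketch of a $\beta$-expansion synthetic-channel ranking of the kind the abstract proposes to constrain: bit index $i$ with binary digits $b_j$ gets reliability $\sum_j b_j \beta^j$, and the $K$ most reliable indices carry information bits. The value $\beta = 2^{1/4}$ is the common choice from the $\beta$-expansion literature; enforcing the automorphism-preserving constraints is the paper's contribution and is not done here:

```python
def beta_rank(n_bits, beta=2.0 ** 0.25):
    """Channel indices of blocklength 2**n_bits, least to most reliable."""
    N = 1 << n_bits
    def score(i):
        return sum(((i >> j) & 1) * beta**j for j in range(n_bits))
    return sorted(range(N), key=score)

order = beta_rank(4)               # blocklength N = 16
K = 8
info_set = sorted(order[-K:])      # information positions of a (16, 8) code
print(info_set)
```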

Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models

  • Authors: Shuai Zhao, Jinming Wen, Luu Anh Tuan, Junbo Zhao, Jie Fu
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.01219
  • Pdf link: https://arxiv.org/pdf/2305.01219
  • Abstract
    The prompt-based learning paradigm, which bridges the gap between pre-training and fine-tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings. Despite being widely applied, prompt-based learning is vulnerable to backdoor attacks. Textual backdoor attacks are designed to introduce targeted vulnerabilities into models by poisoning a subset of training samples through trigger injection and label modification. However, they suffer from flaws such as abnormal natural language expressions resulting from the trigger and incorrect labeling of poisoned samples. In this study, we propose ProAttack, a novel and efficient method for performing clean-label backdoor attacks based on the prompt, which uses the prompt itself as a trigger. Our method does not require external triggers and ensures correct labeling of poisoned samples, improving the stealthy nature of the backdoor attack. With extensive experiments on rich-resource and few-shot text classification tasks, we empirically validate ProAttack's competitive performance in textual backdoor attacks. Notably, in the rich-resource setting, ProAttack achieves state-of-the-art attack success rates in the clean-label backdoor attack benchmark without external triggers. All data and code used in our models are publicly available at \url{https://github.com/shuaizhao95/Prompt_attack}.

Updatable Learned Indexes Meet Disk-Resident DBMS -- From Evaluations to Design Choices

  • Authors: Hai Lan, Zhifeng Bao, J. Shane Culpepper, Renata Borovica-Gajic
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2305.01237
  • Pdf link: https://arxiv.org/pdf/2305.01237
  • Abstract
    Although many updatable learned indexes have been proposed in recent years, whether they can outperform traditional approaches on disk remains unknown. In this study, we revisit and implement four state-of-the-art updatable learned indexes on disk, and compare them against the B+-tree under a wide range of settings. Through our evaluation, we make some key observations: 1) Overall, the B+-tree performs well across a range of workload types and datasets. 2) A learned index could outperform B+-tree or other learned indexes on disk for a specific workload. For example, PGM achieves the best performance in write-only workloads while LIPP significantly outperforms others in lookup-only workloads. We further conduct a detailed performance analysis to reveal the strengths and weaknesses of these learned indexes on disk. Moreover, we summarize the observed common shortcomings in five categories and propose four design principles to guide future design of on-disk, updatable learned indexes: (1) reducing the index's tree height, (2) better data structures to lower operation overheads, (3) improving the efficiency of scan operations, and (4) more efficient storage layout.

Arax: A Runtime Framework for Decoupling Applications from Heterogeneous Accelerators

  • Authors: Manos Pavlidakis, Stelios Mavridis, Antony Chazapis, Giorgos Vasiliadis, Angelos Bilas
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.01291
  • Pdf link: https://arxiv.org/pdf/2305.01291
  • Abstract
    Today, using multiple heterogeneous accelerators efficiently from applications and high-level frameworks, such as TensorFlow and Caffe, poses significant challenges in three respects: (a) sharing accelerators, (b) allocating available resources elastically during application execution, and (c) reducing the required programming effort. In this paper, we present Arax, a runtime system that decouples applications from heterogeneous accelerators within a server. First, Arax maps application tasks dynamically to available resources, managing all required task state, memory allocations, and task dependencies. As a result, Arax can share accelerators across applications in a server and adjust the resources used by each application as load fluctuates over time. Additionally, Arax offers a simple API and includes Autotalk, a stub generator that automatically generates stub libraries for applications already written for specific accelerator types, such as NVIDIA GPUs. Consequently, Arax applications are written once without considering physical details, including the number and type of accelerators. Our results show that applications, such as Caffe, TensorFlow, and Rodinia, can run using Arax with minimum effort and low overhead compared to native execution, about 12% (geometric mean). Arax supports efficient accelerator sharing by offering up to 20% improved execution times compared to NVIDIA MPS, which supports NVIDIA GPUs only. Arax can transparently provide elasticity, decreasing total application turn-around time by up to 2x compared to native execution without elasticity support.

Higher-Order GFDM for Linear Elliptic Operators

  • Authors: Heinrich Kraus, Jörg Kuhnert, Pratik Suchde
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2305.01320
  • Pdf link: https://arxiv.org/pdf/2305.01320
  • Abstract
    We present a novel approach of discretizing diffusion operators of the form $\nabla\cdot(\lambda\nabla u)$ in the context of meshfree generalized finite difference methods. Our ansatz uses properties of derived operators and combines the discrete Laplace operator with reconstruction functions approximating the diffusion coefficient $\lambda$. Provided that the reconstructions are of a sufficiently high order, we prove that the order of accuracy of the discrete Laplace operator transfers to the derived diffusion operator. We show that the new discrete diffusion operator inherits the diagonal dominance property of the discrete Laplace operator and fulfills enrichment properties. Our numerical results for elliptic and parabolic partial differential equations show that even low-order reconstructions preserve the order of the underlying discrete Laplace operator for sufficiently smooth diffusion coefficients. In experiments, we demonstrate the applicability of the new discrete diffusion operator to interface problems with point clouds not aligning to the interface and numerically prove first-order convergence.

Guaranteeing Envy-Freeness under Generalized Assignment Constraints

  • Authors: Siddharth Barman, Arindam Khan, Sudarshan Shyam, K. V. N. Sreenivas
  • Subjects: Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2305.01339
  • Pdf link: https://arxiv.org/pdf/2305.01339
  • Abstract
    We study fair division of goods under the broad class of generalized assignment constraints. In this constraint framework, the sizes and values of the goods are agent-specific, and one needs to allocate the goods among the agents fairly while further ensuring that each agent receives a bundle of total size at most the corresponding budget of the agent. Since, in such a constraint setting, it may not always be feasible to partition all the goods among the agents, we conform -- as in recent works -- to the construct of charity to designate the set of unassigned goods. For this allocation framework, we obtain existential and computational guarantees for envy-free (appropriately defined) allocation of divisible and indivisible goods, respectively, among agents with individual, additive valuations for the goods. We deem allocations to be fair by evaluating envy only with respect to feasible subsets. In particular, an allocation is said to be feasibly envy-free (FEF) iff each agent prefers its bundle over every (budget) feasible subset within any other agent's bundle (and within the charity). The current work establishes that, for divisible goods, FEF allocations are guaranteed to exist and can be computed efficiently under generalized assignment constraints. In the context of indivisible goods, FEF allocations do not necessarily exist, and hence, we consider the fairness notion of feasible envy-freeness up to any good (FEFx). We show that, under generalized assignment constraints, an FEFx allocation of indivisible goods always exists. In fact, our FEFx result resolves open problems posed in prior works. Further, for indivisible goods and under generalized assignment constraints, we provide a pseudo-polynomial time algorithm for computing FEFx allocations, and a fully polynomial-time approximation scheme (FPTAS) for computing approximate FEFx allocations.

Next-Generation Full Duplex Networking System Empowered by Reconfigurable Intelligent Surfaces

  • Authors: Yingyang Chen, Yuncong Li, Miaowen Wen, Duoying Zhang, Bingli Jiao, Zhiguo Ding, Theodoros A. Tsiftsis, H. Vincent Poor
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.01341
  • Pdf link: https://arxiv.org/pdf/2305.01341
  • Abstract
    Full duplex (FD) radio has attracted extensive attention due to its co-time and co-frequency transceiving capability. However, the potential gain brought by FD radios is closely related to the management of self-interference (SI), which imposes high or even stringent requirements on SI cancellation (SIC) techniques. When the FD deployment evolves into next-generation mobile networking, the SI problem becomes more complicated, significantly limiting its potential gains. In this paper, we conceive a multi-cell FD networking scheme by deploying a reconfigurable intelligent surface (RIS) at the cell boundary to configure the radio environment proactively. To achieve the full potential of the system, we aim to maximize the sum rate (SR) of multiple cells by jointly optimizing the transmit precoding (TPC) matrices at FD base stations (BSs) and users and the phase shift matrix at RIS. Since the original problem is non-convex, we reformulate and decouple it into a pair of subproblems by utilizing the relationship between the SR and minimum mean square error (MMSE). The optimal solutions of TPC matrices are obtained in closed form, while both complex circle manifold (CCM) and successive convex approximation (SCA) based algorithms are developed to resolve the phase shift matrix suboptimally. Our simulation results show that introducing an RIS into an FD networking system not only improves the overall SR significantly but also enhances the cell edge performance prominently. More importantly, we validate that the RIS deployment with optimized phase shifts can reduce the requirement for SIC and the number of BS antennas, which further reduces the hardware cost and power consumption, especially with a sufficient number of reflecting elements. As a result, the utilization of an RIS enables the originally cumbersome FD networking system to become efficient and practical.

Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees

  • Authors: Daqian Shao, Marta Kwiatkowska
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL)
  • Arxiv link: https://arxiv.org/abs/2305.01381
  • Pdf link: https://arxiv.org/pdf/2305.01381
  • Abstract
    Linear Temporal Logic (LTL) is widely used to specify high-level objectives for system policies, and it is highly desirable for autonomous systems to learn the optimal policy with respect to such specifications. However, learning the optimal policy from LTL specifications is not trivial. We present a model-free Reinforcement Learning (RL) approach that efficiently learns an optimal policy for an unknown stochastic system, modelled using Markov Decision Processes (MDPs). We propose a novel and more general product MDP, reward structure and discounting mechanism that, when applied in conjunction with off-the-shelf model-free RL algorithms, efficiently learn the optimal policy that maximizes the probability of satisfying a given LTL specification with optimality guarantees. We also provide improved theoretical results on choosing the key parameters in RL to ensure optimality. To directly evaluate the learned policy, we adopt probabilistic model checker PRISM to compute the probability of the policy satisfying such specifications. Several experiments on various tabular MDP environments across different LTL tasks demonstrate the improved sample efficiency and optimal policy convergence.
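
    The product-MDP idea the method builds on can be sketched generically: the MDP transitions first, then the automaton advances on the successor's label. The snippet shows only this standard step with assumed `P`, `delta`, and `label` dictionaries; the paper's novel reward structure and discounting mechanism are not reproduced here.

```python
import random

def product_step(P, delta, label, s, q, a, rng=random):
    # One step of the standard MDP x automaton product: sample the MDP
    # successor s' ~ P(. | s, a), then advance the automaton on L(s').
    succ, probs = zip(*P[(s, a)].items())
    s_next = rng.choices(succ, weights=probs)[0]
    q_next = delta[(q, label[s_next])]
    return s_next, q_next
```

    A reward can then be granted whenever `q_next` enters an accepting component of the automaton, which is the usual hook for reward structures of this kind.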

Efficient Federated Learning with Enhanced Privacy via Lottery Ticket Pruning in Edge Computing

  • Authors: Yifan Shi, Kang Wei, Li Shen, Jun Li, Xueqian Wang, Bo Yuan, Song Guo
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.01387
  • Pdf link: https://arxiv.org/pdf/2305.01387
  • Abstract
    Federated learning (FL) is a collaborative learning paradigm for decentralized private data from mobile terminals (MTs). However, it suffers from issues in terms of communication, resource of MTs, and privacy. Existing privacy-preserving FL methods usually adopt the instance-level differential privacy (DP), which provides a rigorous privacy guarantee but with several bottlenecks: severe performance degradation, transmission overhead, and resource constraints of edge devices such as MTs. To overcome these drawbacks, we propose Fed-LTP, an efficient and privacy-enhanced FL framework with Lottery Ticket Hypothesis (LTH) and zero-concentrated DP (zCDP). It generates a pruned global model on the server side and conducts sparse-to-sparse training from scratch with zCDP on the client side. On the server side, two pruning schemes are proposed: (i) the weight-based pruning (LTH) determines the pruned global model structure; (ii) the iterative pruning further shrinks the size of the pruned model's parameters. Meanwhile, the performance of Fed-LTP is also boosted via model validation based on the Laplace mechanism. On the client side, we use sparse-to-sparse training to solve the resource-constraints issue and provide tighter privacy analysis to reduce the privacy budget. We evaluate the effectiveness of Fed-LTP on several real-world datasets in both independent and identically distributed (IID) and non-IID settings. The results clearly confirm the superiority of Fed-LTP over state-of-the-art (SOTA) methods in communication, computation, and memory efficiencies while realizing a better utility-privacy trade-off.
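
    The weight-based pruning step can be illustrated with plain magnitude pruning in the lottery-ticket spirit: rank weights by magnitude and zero out the smallest fraction. A generic sketch, not the exact Fed-LTP server-side procedure.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    # Keep the largest-magnitude (1 - sparsity) fraction of weights,
    # zeroing the rest via a boolean mask.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)              # number of weights to drop
    if k == 0:
        return np.ones(weights.shape, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.abs(weights) > threshold

w = np.random.randn(4, 4)
mask = magnitude_prune(w, sparsity=0.5)        # fixed sparse structure
w_pruned = w * mask                            # train only surviving weights
```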

Infrastructural Requirements and Regulatory Challenges of a Sustainable Urban Air Mobility Ecosystem

  • Authors: Árpád Takács, Tamás Haidegger
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.01398
  • Pdf link: https://arxiv.org/pdf/2305.01398
  • Abstract
    The United Nations has long put on the discussion agenda the sustainability challenges of urbanization, which have both direct and indirect effects on future regulation strategies. Undoubtedly, most initiatives target better quality of life, improved access to services & goods and environment protection. As commercial aerial urban transportation may become a feasible research goal in the near future, the connection possibilities between cities and regions scale up. It is expected that the growing number of vertical takeoff & landing vehicles used for passenger and goods transportation will change the infrastructure of the cities, and will have a significant effect on the cityscapes as well. In addition to the widely discussed regulatory and safety issues, the introduction of elevated traffic also raises environmental concerns, which influences the existing and required service and control infrastructure, and thus significantly affects sustainability. This paper provides a narrated overview of the most common aspects of safety, licensing and regulations for passenger vertical takeoff & landing vehicles, and highlights the most important aspects of infrastructure planning, design and operation, which should be taken into account to maintain and efficiently operate this new way of transportation, leading to a sustainable urban air mobility ecosystem.

Get Back Here: Robust Imitation by Return-to-Distribution Planning

  • Authors: Geoffrey Cideron, Baruch Tabanpour, Sebastian Curi, Sertan Girgin, Leonard Hussenot, Gabriel Dulac-Arnold, Matthieu Geist, Olivier Pietquin, Robert Dadashi
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.01400
  • Pdf link: https://arxiv.org/pdf/2305.01400
  • Abstract
    We consider the Imitation Learning (IL) setup where expert data are not collected on the actual deployment environment but on a different version. To address the resulting distribution shift, we combine behavior cloning (BC) with a planner that is tasked to bring the agent back to states visited by the expert whenever the agent deviates from the demonstration distribution. The resulting algorithm, POIR, can be trained offline, and leverages online interactions to efficiently fine-tune its planner to improve performance over time. We test POIR on a variety of human-generated manipulation demonstrations in a realistic robotic manipulation simulator and show robustness of the learned policy to different initial state distributions and noisy dynamics.

Trade-off Between Optimal Efficiency and Envelope Correlation Coefficient for Antenna Clusters

  • Authors: Vojtech Neuman, Miloslav Capek, Lukas Jelinek, Anu Lehtovuori, Ville Viikari
  • Subjects: Information Theory (cs.IT); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2305.01416
  • Pdf link: https://arxiv.org/pdf/2305.01416
  • Abstract
    This paper introduces a theory for assessing and optimizing the multiple-input-multiple-output performance of multi-port cluster antennas in terms of efficiency, channel correlation, and power distribution. A method based on a convex optimization of feeding coefficients is extended with additional constraints allowing the user to control a ratio between the power radiated by the clusters. The formulation of the problem makes it possible to simultaneously optimize total efficiency and channel correlation with a fixed ratio between power radiated by the clusters, thus examining a trade-off between these parameters. It is shown that channel correlation, total efficiency, and allocation of radiated power are mutually conflicting parameters. The trade-offs are shown and discussed. The theory is demonstrated on a four-element antenna array and on a mobile terminal antenna.

An Efficient Multi-solution Solver for the Inverse Kinematics of 3-Section Constant-Curvature Robots

  • Authors: Ke Qiu, Jingyu Zhang, Danying Sun, Rong Xiong, Haojian Lu, Yue Wang
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.01458
  • Pdf link: https://arxiv.org/pdf/2305.01458
  • Abstract
    Piecewise constant curvature is a popular kinematics framework for continuum robots. Computing the model parameters from the desired end pose, known as the inverse kinematics problem, is fundamental in manipulation, tracking and planning tasks. In this paper, we propose an efficient multi-solution solver to address the inverse kinematics problem of 3-section constant-curvature robots by bridging both the theoretical reduction and numerical correction. We derive analytical conditions to simplify the original problem into a one-dimensional problem. Further, the equivalence of the two problems is formalised. In addition, we introduce an approximation with bounded error so that the one dimension becomes traversable while the remaining parameters are analytically solvable. With the theoretical results, the global search and numerical correction are employed to implement the solver. The experiments validate the better efficiency and higher success rate of our solver than the numerical methods when one solution is required, and demonstrate the ability to obtain multiple solutions with optimal path planning in a space with obstacles.
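
    For background, the forward kinematics of a single constant-curvature section is closed-form. The planar special case below (arc starting at the origin, heading along +x) shows the geometry that the paper's 3-section, 3D inverse problem chains together; it is textbook material, not the paper's solver.

```python
import numpy as np

def planar_cc_section(kappa, length):
    # End pose (x, y, heading) of one planar constant-curvature arc that
    # starts at the origin heading along +x.
    if abs(kappa) < 1e-12:
        return length, 0.0, 0.0                # straight-line limit
    theta = kappa * length                     # total bending angle
    return np.sin(theta) / kappa, (1.0 - np.cos(theta)) / kappa, theta
```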

An Efficient Quadratic Interpolation Scheme for a Third-Order Cell-Centered Finite-Volume Method on Tetrahedral Grids

  • Authors: Hiroaki Nishikawa, Jeffery A. White
  • Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
  • Arxiv link: https://arxiv.org/abs/2305.01466
  • Pdf link: https://arxiv.org/pdf/2305.01466
  • Abstract
    In this paper, we propose an efficient quadratic interpolation formula utilizing solution gradients computed and stored at nodes and demonstrate its application to a third-order cell-centered finite-volume discretization on tetrahedral grids. The proposed quadratic formula is constructed based on an efficient formula of computing a projected derivative. It is efficient in that it completely eliminates the need to compute and store second derivatives of solution variables or any other quantities, which are typically required in upgrading a second-order cell-centered unstructured-grid finite-volume discretization to third-order accuracy. Moreover, a high-order flux quadrature formula, as required for third-order accuracy, can also be simplified by utilizing the efficient projected-derivative formula, resulting in a numerical flux at a face centroid plus a curvature correction not involving second derivatives of the flux. Similarly, a source term can be integrated over a cell to high-order in the form of a source term evaluated at the cell centroid plus a curvature correction, again, not requiring second derivatives of the source term. The discretization is defined as an approximation to an integral form of a conservation law but the numerical solution is defined as a point value at a cell center, leading to another feature that there is no need to compute and store geometric moments for a quadratic polynomial to preserve a cell average. Third-order accuracy and improved second-order accuracy are demonstrated and investigated for simple but illustrative test cases in three dimensions.
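
    The general idea of buying interpolation order with stored nodal gradients, without second derivatives, is already visible in 1D: the cubic Hermite midpoint formula below uses only values and first derivatives yet is fourth-order accurate. A classical illustration, not the paper's 3D quadratic formula.

```python
import numpy as np

def hermite_midpoint(ua, ub, dua, dub, h):
    # Midpoint of the cubic Hermite interpolant on an interval of width h,
    # built only from nodal values and first derivatives.
    return 0.5 * (ua + ub) + (h / 8.0) * (dua - dub)

h = 0.1
a, b, m = 0.3, 0.4, 0.35
approx = hermite_midpoint(np.sin(a), np.sin(b), np.cos(a), np.cos(b), h)
error = abs(approx - np.sin(m))                # O(h^4), no second derivatives
```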

Stochastic Contextual Bandits with Graph-based Contexts

  • Authors: Jittat Fakcharoenphol, Chayutpong Prompak
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2305.01470
  • Pdf link: https://arxiv.org/pdf/2305.01470
  • Abstract
    We naturally generalize the on-line graph prediction problem to a version of stochastic contextual bandit problems where contexts are vertices in a graph and the structure of the graph provides information on the similarity of contexts. More specifically, we are given a graph $G=(V,E)$, whose vertex set $V$ represents contexts with unknown vertex label $y$. In our stochastic contextual bandit setting, vertices with the same label share the same reward distribution. The standard notion of instance difficulty in graph label prediction is the cutsize $f$, defined to be the number of edges whose endpoints have different labels. For line graphs and trees we present an algorithm with regret bound of $\tilde{O}(T^{2/3}K^{1/3}f^{1/3})$ where $K$ is the number of arms. Our algorithm relies on the optimal stochastic bandit algorithm by Zimmert and Seldin [AISTAT'19, JMLR'21]. When the best arm outperforms the other arms, the regret improves to $\tilde{O}(\sqrt{KT\cdot f})$. The regret bound in the latter case is comparable to other optimal contextual bandit results in more general cases, but our algorithm is easy to analyze, runs very efficiently, and does not require an i.i.d. assumption on the input context sequence. The algorithm also works with general graphs using a standard random spanning tree reduction.

Efficient Sensitivity Analysis for Parametric Robust Markov Chains

  • Authors: Thom Badings, Sebastian Junges, Ahmadreza Marandi, Ufuk Topcu, Nils Jansen
  • Subjects: Machine Learning (cs.LG); Logic in Computer Science (cs.LO); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2305.01473
  • Pdf link: https://arxiv.org/pdf/2305.01473
  • Abstract
    We provide a novel method for sensitivity analysis of parametric robust Markov chains. These models incorporate parameters and sets of probability distributions to alleviate the often unrealistic assumption that precise probabilities are available. We measure sensitivity in terms of partial derivatives with respect to the uncertain transition probabilities regarding measures such as the expected reward. As our main contribution, we present an efficient method to compute these partial derivatives. To scale our approach to models with thousands of parameters, we present an extension of this method that selects the subset of $k$ parameters with the highest partial derivative. Our methods are based on linear programming and differentiating these programs around a given value for the parameters. The experiments show the applicability of our approach on models with over a million states and thousands of parameters. Moreover, we embed the results within an iterative learning scheme that profits from having access to a dedicated sensitivity analysis.

Building Reliable Budget-Based Binary-State Networks

  • Authors: Wei-Chang Yeh
  • Subjects: Networking and Internet Architecture (cs.NI); Probability (math.PR); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2305.01488
  • Pdf link: https://arxiv.org/pdf/2305.01488
  • Abstract
    Everyday life is driven by various networks, such as supply chains for distributing raw materials, semi-finished goods, and final products; the Internet of Things (IoT) for connecting and exchanging data; utility networks for transmitting fuel, power, water, electricity, and 4G/5G; and social networks for sharing information and connections. The binary-state network is a basic network, where the state of each component is either success or failure, i.e., the binary-state. Network reliability plays an important role in evaluating the performance of network planning, design, and management. As more networks are deployed in the real world, there is a growing need to ensure their reliability. It is necessary to build a reliable network within a limited budget. However, existing studies are focused on the budget limit for each minimal path (MP) in networks without considering the total budget of the entire network. We propose a novel concept to consider how to build a more reliable binary-state network under the budget limit. In addition, we propose an algorithm based on the binary-addition-tree algorithm (BAT) and stepwise vectors to solve the problem efficiently.

BCEdge: SLO-Aware DNN Inference Services with Adaptive Batching on Edge Platforms

  • Authors: Ziyang Zhang, Huan Li, Yang Zhao, Changyao Lin, Jie Liu
  • Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS)
  • Arxiv link: https://arxiv.org/abs/2305.01519
  • Pdf link: https://arxiv.org/pdf/2305.01519
  • Abstract
    As deep neural networks (DNNs) are being applied to a wide range of edge intelligent applications, it is critical for edge inference platforms to have both high-throughput and low-latency at the same time. Such edge platforms with multiple DNN models pose new challenges for scheduler designs. First, each request may have different service level objectives (SLOs) to improve quality of service (QoS). Second, the edge platforms should be able to efficiently schedule multiple heterogeneous DNN models so that system utilization can be improved. To meet these two goals, this paper proposes BCEdge, a novel learning-based scheduling framework that performs adaptive batching and concurrent execution of DNN inference services on edge platforms. We define a utility function to evaluate the trade-off between throughput and latency. The scheduler in BCEdge leverages maximum entropy-based deep reinforcement learning (DRL) to maximize utility by automatically co-optimizing 1) the batch size and 2) the number of concurrent models. Our prototype implemented on different edge platforms shows that the proposed BCEdge enhances utility by up to 37.6% on average, compared to state-of-the-art solutions, while satisfying SLOs.
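
    The throughput/latency utility can be sketched with a hypothetical functional form: reward throughput, penalize SLO violations, and sweep candidate batch sizes over assumed profiled (throughput, latency) pairs. The paper's exact utility and DRL scheduler are not reproduced here.

```python
def utility(throughput, latency, slo, alpha=1.0, beta=5000.0):
    # Hypothetical trade-off: reward throughput, penalize SLO violation.
    return alpha * throughput - beta * max(0.0, latency - slo)

# Assumed profiled (throughput req/s, latency s) per candidate batch size:
profiles = {1: (100, 0.010), 4: (320, 0.022), 8: (500, 0.045), 16: (640, 0.095)}
best_batch = max(profiles, key=lambda b: utility(*profiles[b], slo=0.05))
```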

Unlocking the Power of Representations in Long-term Novelty-based Exploration

  • Authors: Alaa Saade, Steven Kapturowski, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra, Oliver Groth, Michal Valko, Bilal Piot
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2305.01521
  • Pdf link: https://arxiv.org/pdf/2305.01521
  • Abstract
    We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space. By adapting classical clustering to the nonstationary setting of Deep RL, RECODE can efficiently track state visitation counts over thousands of episodes. We further propose a novel generalization of the inverse dynamics loss, which leverages masked transformer architectures for multi-step prediction; which in conjunction with RECODE achieves a new state-of-the-art in a suite of challenging 3D-exploration tasks in DM-Hard-8. RECODE also sets new state-of-the-art in hard exploration Atari games, and is the first agent to reach the end screen in "Pitfall!".
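
    A greatly simplified version of clustering-based visitation counting is sketched below: each embedding is assigned to its nearest stored center (a new center is created beyond a `radius`), and the bonus is the usual inverse square root of the count. RECODE's actual online density estimator is more sophisticated; the radius rule and bonus form here are assumptions.

```python
import numpy as np

class ClusterCounts:
    # Nearest-center visitation counting: embeddings within `radius` of an
    # existing center increment its count; otherwise a new center is born.
    def __init__(self, radius):
        self.radius, self.centers, self.counts = radius, [], []

    def bonus(self, z):
        z = np.asarray(z, dtype=float)
        if self.centers:
            d = [np.linalg.norm(z - c) for c in self.centers]
            i = int(np.argmin(d))
            if d[i] <= self.radius:
                self.counts[i] += 1
                return 1.0 / np.sqrt(self.counts[i])
        self.centers.append(z)
        self.counts.append(1)
        return 1.0                              # first visit: maximal novelty
```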

Faster 0-1-Knapsack via Near-Convex Min-Plus-Convolution

  • Authors: Karl Bringmann, Alejandro Cassis
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2305.01593
  • Pdf link: https://arxiv.org/pdf/2305.01593
  • Abstract
    We revisit the classic 0-1-Knapsack problem, in which we are given $n$ items with their weights and profits as well as a weight budget $W$, and the goal is to find a subset of items of total weight at most $W$ that maximizes the total profit. We study pseudopolynomial-time algorithms parameterized by the largest profit of any item $p_{\max}$, and the largest weight of any item $w_{\max}$. Our main results are algorithms for 0-1-Knapsack running in time $\tilde{O}(n \cdot w_\max \cdot p_\max^{2/3})$ and $\tilde{O}(n \cdot p_\max \cdot w_\max^{2/3})$, improving upon an algorithm in time $O(n \cdot p_\max \cdot w_\max)$ by Pisinger [J. Algorithms '99]. In the regime $p_\max \approx w_\max \approx n$ (and $W \approx \mathrm{OPT} \approx n^2$) our algorithms are the first to break the cubic barrier $n^3$. To obtain our result, we give an efficient algorithm to compute the min-plus convolution of near-convex functions. More precisely, we say that a function $f \colon [n] \mapsto \mathbf{Z}$ is $\Delta$-near convex with $\Delta \geq 1$, if there is a convex function $\breve{f}$ such that $\breve{f}(i) \leq f(i) \leq \breve{f}(i) + \Delta$ for every $i$. We design an algorithm computing the min-plus convolution of two $\Delta$-near convex functions in time $\tilde{O}(n\Delta)$. This tool can replace the usage of the prediction technique of Bateni, Hajiaghayi, Seddighin and Stein [STOC '18] in all applications we are aware of, and we believe it has wider applicability.
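
    To fix the notion, the quadratic-time baseline for min-plus convolution is sketched below; the paper's contribution is the $\tilde{O}(n\Delta)$ algorithm for $\Delta$-near convex inputs, which this naive version does not implement.

```python
def min_plus_convolution(f, g):
    # Naive O(n*m) baseline: h[k] = min over i+j=k of f[i] + g[j].
    h = [float("inf")] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] = min(h[i + j], fi + gj)
    return h

# Two convex sequences are 0-near convex, e.g. f = g = [0, 1, 4, 9]:
print(min_plus_convolution([0, 1, 4, 9], [0, 1, 4, 9]))
```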

Augmented Electronic Ising Machine as an Effective SAT Solver

  • Authors: Anshujit Sharma, Matthew Burns, Andrew Hahn, Michael Huang
  • Subjects: Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Emerging Technologies (cs.ET)
  • Arxiv link: https://arxiv.org/abs/2305.01623
  • Pdf link: https://arxiv.org/pdf/2305.01623
  • Abstract
    With the slowdown of improvement in conventional von Neumann systems, increasing attention is paid to novel paradigms such as Ising machines. They take a very different approach to NP-complete optimization problems. Ising machines have shown great potential in solving binary optimization problems like MaxCut. In this paper, we present an analysis of these systems in satisfiability (SAT) problems. We demonstrate that, in the case of 3-SAT, a basic architecture fails to produce meaningful acceleration, thanks in no small part to the relentless progress made in conventional SAT solvers. Nevertheless, careful analysis attributes part of the failure to the lack of two important components: cubic interactions and efficient randomization heuristics. To overcome these limitations, we add proper architectural support for cubic interaction on a state-of-the-art Ising machine. More importantly, we propose a novel semantic-aware annealing schedule that makes the search-space navigation much more efficient than existing annealing heuristics. With experimental analyses, we show that such an Augmented Ising Machine for SAT (AIMS) outperforms state-of-the-art software-based, GPU-based and conventional hardware SAT solvers by orders of magnitude. We also demonstrate AIMS to be relatively robust against device variation and noise.

Sequence Modeling with Multiresolution Convolutional Memory

  • Authors: Jiaxin Shi, Ke Alexander Wang, Emily B. Fox
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2305.01638
  • Pdf link: https://arxiv.org/pdf/2305.01638
  • Abstract
    Efficiently capturing the long-range patterns in sequential data sources salient to a given task -- such as classification and generative modeling -- poses a fundamental challenge. Popular approaches in the space trade off between the memory burden of brute-force enumeration and comparison, as in transformers; the computational burden of complicated sequential dependencies, as in recurrent neural networks; and the parameter burden of convolutional networks with many or large filters. We instead take inspiration from wavelet-based multiresolution analysis to define a new building block for sequence modeling, which we call a MultiresLayer. The key component of our model is the multiresolution convolution, capturing multiscale trends in the input sequence. Our MultiresConv can be implemented with shared filters across a dilated causal convolution tree. Thus it garners the computational advantages of convolutional networks and the principled theoretical motivation of wavelet decompositions. Our MultiresLayer is straightforward to implement, requires significantly fewer parameters, and maintains at most a $\mathcal{O}(N\log N)$ memory footprint for a length $N$ sequence. Yet, by stacking such layers, our model yields state-of-the-art performance on a number of sequence classification and autoregressive density estimation tasks using CIFAR-10, ListOps, and PTB-XL datasets.
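
    The core building block can be sketched in plain NumPy: a causal 1D convolution whose taps reach back in strides of the dilation, applied with a single shared filter at dilations 1, 2, 4, and so on. Only the basic operation is shown, not the paper's full MultiresLayer.

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    # y[t] = sum_j w[j] * x[t - (k-1-j)*dilation]; w[-1] weights the current
    # sample, earlier taps reach back in strides of `dilation` (zero-padded).
    k = len(w)
    pad = np.concatenate([np.zeros((k - 1) * dilation), x])
    return np.array([np.dot(w, pad[t:t + (k - 1) * dilation + 1:dilation])
                     for t in range(len(x))])

x, w = np.random.randn(64), np.random.randn(3)
levels = [dilated_causal_conv(x, w, d) for d in (1, 2, 4, 8)]  # shared filter
```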

Key-Locked Rank One Editing for Text-to-Image Personalization

  • Authors: Yoad Tewel, Rinon Gal, Gal Chechik, Yuval Atzmon
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2305.01644
  • Pdf link: https://arxiv.org/pdf/2305.01644
  • Abstract
    Text-to-image models (T2I) offer a new level of flexibility by allowing users to guide the creative process through natural language. However, personalizing these models to align with user-provided visual concepts remains a challenging problem. The task of T2I personalization poses multiple hard challenges, such as maintaining high visual fidelity while allowing creative control, combining multiple personalized concepts in a single image, and keeping a small model size. We present Perfusion, a T2I personalization method that addresses these challenges using dynamic rank-1 updates to the underlying T2I model. Perfusion avoids overfitting by introducing a new mechanism that "locks" new concepts' cross-attention Keys to their superordinate category. Additionally, we develop a gated rank-1 approach that enables us to control the influence of a learned concept during inference time and to combine multiple concepts. This allows runtime-efficient balancing of visual-fidelity and textual-alignment with a single 100KB trained model, which is five orders of magnitude smaller than the current state of the art. Moreover, it can span different operating points across the Pareto front without additional training. Finally, we show that Perfusion outperforms strong baselines in both qualitative and quantitative terms. Importantly, key-locking leads to novel results compared to traditional approaches, allowing to portray personalized object interactions in unprecedented ways, even in one-shot settings.
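
    The underlying update can be written as a gated rank-1 edit of a weight matrix, $W' = W + g\,uv^\top$. The NumPy sketch below shows this schematic form, with the gate controlling a concept's influence at inference time; Perfusion's key-locking and gating mechanics are richer than this.

```python
import numpy as np

def gated_rank1_update(W, u, v, gate):
    # W' = W + gate * u v^T: a single rank-1 edit whose strength is
    # controlled by `gate` at inference time.
    return W + gate * np.outer(u, v)

d_out, d_in = 8, 4
W = np.random.randn(d_out, d_in)
u, v = np.random.randn(d_out), np.random.randn(d_in)
W_edited = gated_rank1_update(W, u, v, gate=0.7)
```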

Distill or Annotate? Cost-Efficient Fine-Tuning of Compact Models

  • Authors: Junmo Kang, Wei Xu, Alan Ritter
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2305.01645
  • Pdf link: https://arxiv.org/pdf/2305.01645
  • Abstract
    Fine-tuning large models is highly effective; however, inference using these models can be expensive and produces carbon emissions. Knowledge distillation has been shown to be a practical solution to reduce inference costs, but the distillation process itself requires significant computational resources. Rather than buying or renting GPUs to fine-tune, then distill a large model, an NLP practitioner who needs a compact model might also choose to simply allocate an available budget to hire annotators and manually label additional fine-tuning data. In this paper, we investigate how to most efficiently use a fixed budget to build a compact model. Through our extensive experiments on six diverse NLP tasks, we find that distilling from T5-XXL (11B) to T5-Small (60M) is almost always a more cost-efficient option than annotating more data to directly train a compact model (T5-Small (60M)). We further demonstrate that the optimal amount of distillation that maximizes utility varies across different budgetary scenarios.

Keyword: faster

Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs

  • Authors: Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2305.01024
  • Pdf link: https://arxiv.org/pdf/2305.01024
  • Abstract
    General Matrix Multiplication (GEMM) is a crucial algorithm for various applications such as machine learning and scientific computing, and an efficient GEMM implementation is essential for the performance of these systems. While researchers often strive for faster performance by using large compute platforms, the increased scale of these systems can raise concerns about hardware and software reliability. In this paper, we present a design for a high-performance GEMM with algorithm-based fault tolerance for use on GPUs. We describe fault-tolerant designs for GEMM at the thread, warp, and threadblock levels, and also provide a baseline GEMM implementation that is competitive with or faster than the state-of-the-art, proprietary cuBLAS GEMM. We present a kernel fusion strategy to overlap and mitigate the memory latency due to fault tolerance with the original GEMM computation. To support a wide range of input matrix shapes and reduce development costs, we present a template-based approach for automatic code generation for both fault-tolerant and non-fault-tolerant GEMM implementations. We evaluate our work on NVIDIA Tesla T4 and A100 server GPUs. Experimental results demonstrate that our baseline GEMM presents comparable or superior performance compared to the closed-source cuBLAS. The fault-tolerant GEMM incurs only a minimal overhead (8.89% on average) compared to cuBLAS even with hundreds of errors injected per minute. For irregularly shaped inputs, the code generator-generated kernels show remarkable speedups of 160% to 183.5% and 148.55% to 165.12% for fault-tolerant and non-fault-tolerant GEMMs, outperforming cuBLAS by up to 41.40%.
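
    The algorithm-based fault tolerance behind the design rests on the classical checksum construction: extend $A$ with a column-checksum row and $B$ with a row-checksum column, multiply once, and verify the product's checksum row and column. A minimal NumPy sketch of the principle (no fused GPU kernels) follows.

```python
import numpy as np

def abft_gemm(A, B):
    # Extend A with a column-checksum row and B with a row-checksum column,
    # multiply once, then verify the product's checksum row/column.
    Ac = np.vstack([A, A.sum(axis=0)])
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])
    Cf = Ac @ Br
    C = Cf[:-1, :-1]
    ok = (np.allclose(Cf[-1, :-1], C.sum(axis=0)) and
          np.allclose(Cf[:-1, -1], C.sum(axis=1)))
    return C, ok                                # ok=False signals a fault

A, B = np.random.randn(5, 7), np.random.randn(7, 3)
C, ok = abft_gemm(A, B)
```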

Autoencoders for discovering manifold dimension and coordinates in data from complex dynamical systems

  • Authors: Kevin Zeng, Michael D. Graham
  • Subjects: Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD)
  • Arxiv link: https://arxiv.org/abs/2305.01090
  • Pdf link: https://arxiv.org/pdf/2305.01090
  • Abstract
    While many phenomena in physics and engineering are formally high-dimensional, their long-time dynamics often live on a lower-dimensional manifold. The present work introduces an autoencoder framework that combines implicit regularization with internal linear layers and $L_2$ regularization (weight decay) to automatically estimate the underlying dimensionality of a data set, produce an orthogonal manifold coordinate system, and provide the mapping functions between the ambient space and manifold space, allowing for out-of-sample projections. We validate our framework's ability to estimate the manifold dimension for a series of datasets from dynamical systems of varying complexities and compare to other state-of-the-art estimators. We analyze the training dynamics of the network to glean insight into the mechanism of low-rank learning and find that collectively each of the implicit regularizing layers compound the low-rank representation and even self-correct during training. Analysis of gradient descent dynamics for this architecture in the linear case reveals the role of the internal linear layers in leading to faster decay of a "collective weight variable" incorporating all layers, and the role of weight decay in breaking degeneracies and thus driving convergence along directions in which no decay would occur in its absence. We show that this framework can be naturally extended for applications of state-space modeling and forecasting by generating a data-driven dynamic model of a spatiotemporally chaotic partial differential equation using only the manifold coordinates. Finally, we demonstrate that our framework is robust to hyperparameter choices.

Faster OreFSDet : A Lightweight and Effective Few-shot Object Detector for Ore Images

  • Authors: Yang Zhang, Le Cheng, Yuting Peng, Chengming Xu, Yanwei Fu, Bo Wu, Guodong Sun
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2305.01183
  • Pdf link: https://arxiv.org/pdf/2305.01183
  • Abstract
    For ore particle size detection, obtaining a sizable amount of high-quality labeled ore data is time-consuming and expensive. General object detection methods often suffer from severe over-fitting with scarce labeled data. Despite their ability to eliminate over-fitting, existing few-shot object detectors encounter drawbacks such as slow detection speed and high memory requirements, making them difficult to implement in a real-world deployment scenario. To this end, we propose a lightweight and effective few-shot detector to achieve competitive performance with general object detection with only a few samples for ore images. First, the proposed support feature mining block characterizes the importance of location information in support features. Next, the relationship guidance block makes full use of support features to guide the generation of accurate candidate proposals. Finally, the dual-scale semantic aggregation module retrieves detailed features at different resolutions to contribute to the prediction process. Experimental results show that our method consistently exceeds the few-shot detectors with an excellent performance gap on all metrics. Moreover, our method achieves the smallest model size of 19MB as well as being competitive at 50 FPS detection speed compared with general object detectors. The source code is available at https://github.com/MVME-HBUT/Faster-OreFSDet.

Optimizing Guided Traversal for Fast Learned Sparse Retrieval

  • Authors: Yifan Qiao, Yingrui Yang, Haixin Lin, Tao Yang
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2305.01203
  • Pdf link: https://arxiv.org/pdf/2305.01203
  • Abstract
    Recent studies show that BM25-driven dynamic index skipping can greatly accelerate MaxScore-based document retrieval based on the learned sparse representation derived by DeepImpact. This paper investigates the effectiveness of such a traversal guidance strategy during top k retrieval when using other models such as SPLADE and uniCOIL, and finds that unconstrained BM25-driven skipping could have a visible relevance degradation when the BM25 model is not well aligned with a learned weight model or when retrieval depth k is small. This paper generalizes the previous work and optimizes the BM25 guided index traversal with a two-level pruning control scheme and model alignment for fast retrieval using a sparse representation. Although there can be a cost of increased latency, the proposed scheme is much faster than the original MaxScore method without BM25 guidance while retaining the relevance effectiveness. This paper analyzes the competitiveness of this two-level pruning scheme, and evaluates its tradeoff in ranking relevance and time efficiency when searching several test datasets.
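
    The upper-bound skipping idea behind MaxScore-style traversal can be sketched as: fully score a document only when the sum of its terms' maximum scores can beat the current top-k threshold. The single-bound sketch below uses assumed `doc_terms`, `term_scores`, and `max_term_score` containers and omits the paper's two-level BM25/learned-weight control.

```python
import heapq

def top_k_with_skipping(doc_terms, term_scores, max_term_score, k):
    # Score a document fully only if its upper bound (sum of per-term
    # maxima) can beat the current k-th best score; otherwise skip it.
    heap = []                                   # min-heap of the top-k scores
    for doc, terms in doc_terms.items():
        bound = sum(max_term_score[t] for t in terms)
        if len(heap) == k and bound <= heap[0]:
            continue                            # pruned without full scoring
        score = sum(term_scores[(t, doc)] for t in terms)
        if len(heap) < k:
            heapq.heappush(heap, score)
        elif score > heap[0]:
            heapq.heapreplace(heap, score)
    return sorted(heap, reverse=True)
```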

The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold

  • Authors: Jialin Mao, Itay Griniasty, Han Kheng Teoh, Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, Pratik Chaudhari
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.01604
  • Pdf link: https://arxiv.org/pdf/2305.01604
  • Abstract
    We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.

Keyword: mobile

Development of IoT Smart Greenhouse System for Hydroponic Gardens

  • Authors: Arcel Christian H. Austria, John Simon Fabros, Kurt Russel G. Sumilang, Jocelyn Bernardino, Anabella C. Doctor
  • Subjects: Systems and Control (eess.SY); Computers and Society (cs.CY); Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2305.01189
  • Pdf link: https://arxiv.org/pdf/2305.01189
  • Abstract
    This study focused on the development of a smart greenhouse system for hydroponic gardens with the adaptation of the Internet of Things and monitored through mobile as one of the solutions to the negative effects of the world's booming population, the never-ending shrinking of arable lands, and the drastic effects of climate change on our environment. To achieve the goal of the study, the researchers created an actual hydroponic greenhouse system with completely developing plants, and automation in examining and monitoring the water pH level, light, water, and greenhouse temperature, as well as humidity, which is linked to ThingSpeak. The developed SMART Greenhouse monitoring system was tested and evaluated to confirm its reliability, functions, and usability under the ISO 9126 evaluation criteria. The respondents, who include casual plant owners and experts in hydroponic gardening, were able to test and evaluate the prototype and the mobile application to monitor the parameters, with results of 7.77 for pH level, 83 for light, 27.94 deg C for water temperature, 27 deg C for greenhouse temperature, and 75% for humidity, and a descriptive result for both software and hardware of Very Good with a mean average of 4.06, which means that the developed technology is useful and recommended. The SMART Greenhouse System for Hydroponic Garden is used as an alternative tool, solution, and innovation technique against food shortages due to climate change, land shortages, and low farming environments. The proponents highly suggest using solar energy to power the pump, improving the prototype wiring, using a higher-end Arduino model to support more sensors and devices for a larger arsenal of collected data, enclosing the device to ensure safety, and updating the mobile application with bug fixes and an e-manual of the whole system.

HuNavSim: A ROS 2 Human Navigation Simulator for Benchmarking Human-Aware Robot Navigation

  • Authors: Noé Pérez-Higueras, Roberto Otero, Fernando Caballero, Luis Merino
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.01303
  • Pdf link: https://arxiv.org/pdf/2305.01303
  • Abstract
    This work presents the Human Navigation Simulator (HuNavSim), a novel open-source tool for the simulation of different human-agent navigation behaviors in scenarios with mobile robots. The tool, the first programmed under the ROS 2 framework, can be employed along with different well-known robotics simulators like Gazebo. The main goal is to ease the development and evaluation of human-aware robot navigation systems in simulation. Besides a general human-navigation model, HuNavSim includes, as a novelty, a rich set of individual and realistic human navigation behaviors and a complete set of metrics for social navigation benchmarking.

Next-Generation Full Duplex Networking System Empowered by Reconfigurable Intelligent Surfaces

  • Authors: Yingyang Chen, Yuncong Li, Miaowen Wen, Duoying Zhang, Bingli Jiao, Zhiguo Ding, Theodoros A. Tsiftsis, H. Vincent Poor
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.01341
  • Pdf link: https://arxiv.org/pdf/2305.01341
  • Abstract
    Full duplex (FD) radio has attracted extensive attention due to its co-time and co-frequency transceiving capability. However, the potential gain brought by FD radios is closely related to the management of self-interference (SI), which imposes high or even stringent requirements on SI cancellation (SIC) techniques. When the FD deployment evolves into next-generation mobile networking, the SI problem becomes more complicated, significantly limiting its potential gains. In this paper, we conceive a multi-cell FD networking scheme by deploying a reconfigurable intelligent surface (RIS) at the cell boundary to configure the radio environment proactively. To achieve the full potential of the system, we aim to maximize the sum rate (SR) of multiple cells by jointly optimizing the transmit precoding (TPC) matrices at FD base stations (BSs) and users and the phase shift matrix at RIS. Since the original problem is non-convex, we reformulate and decouple it into a pair of subproblems by utilizing the relationship between the SR and minimum mean square error (MMSE). The optimal solutions of TPC matrices are obtained in closed form, while both complex circle manifold (CCM) and successive convex approximation (SCA) based algorithms are developed to resolve the phase shift matrix suboptimally. Our simulation results show that introducing an RIS into an FD networking system not only improves the overall SR significantly but also enhances the cell edge performance prominently. More importantly, we validate that the RIS deployment with optimized phase shifts can reduce the requirement for SIC and the number of BS antennas, which further reduces the hardware cost and power consumption, especially with a sufficient number of reflecting elements. As a result, the utilization of an RIS enables the originally cumbersome FD networking system to become efficient and practical.

Efficient Federated Learning with Enhanced Privacy via Lottery Ticket Pruning in Edge Computing

  • Authors: Yifan Shi, Kang Wei, Li Shen, Jun Li, Xueqian Wang, Bo Yuan, Song Guo
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.01387
  • Pdf link: https://arxiv.org/pdf/2305.01387
  • Abstract
    Federated learning (FL) is a collaborative learning paradigm for decentralized private data from mobile terminals (MTs). However, it suffers from issues in terms of communication, resource of MTs, and privacy. Existing privacy-preserving FL methods usually adopt the instance-level differential privacy (DP), which provides a rigorous privacy guarantee but with several bottlenecks: severe performance degradation, transmission overhead, and resource constraints of edge devices such as MTs. To overcome these drawbacks, we propose Fed-LTP, an efficient and privacy-enhanced FL framework with Lottery Ticket Hypothesis (LTH) and zero-concentrated DP (zCDP). It generates a pruned global model on the server side and conducts sparse-to-sparse training from scratch with zCDP on the client side. On the server side, two pruning schemes are proposed: (i) the weight-based pruning (LTH) determines the pruned global model structure; (ii) the iterative pruning further shrinks the size of the pruned model's parameters. Meanwhile, the performance of Fed-LTP is also boosted via model validation based on the Laplace mechanism. On the client side, we use sparse-to-sparse training to solve the resource-constraints issue and provide tighter privacy analysis to reduce the privacy budget. We evaluate the effectiveness of Fed-LTP on several real-world datasets in both independent and identically distributed (IID) and non-IID settings. The results clearly confirm the superiority of Fed-LTP over state-of-the-art (SOTA) methods in communication, computation, and memory efficiencies while realizing a better utility-privacy trade-off.

A Mobile Quad-Arm Robot ARMS: Wheel-Legged Tripedal Mobility and Quad-Arm Manipulation

  • Authors: Hisayoshi Muramatsu, Keigo Kitagawa, Jun Watanabe, Ryohei Hisashiki
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.01406
  • Pdf link: https://arxiv.org/pdf/2305.01406
  • Abstract
    This letter proposes a mobile quad-arm robot: ARMS that unifies wheel-legged tripedal mobility, wheeled mobility, and quad-arm manipulation. The four arms have different mechanics and are designed to be general-purpose arms to enable the wheel-legged hybrid mobilities and manipulation. The three-degree-of-freedom (DOF) front arm has an active wheel, which is used for wheel-legged tripedal walking and wheel driving with passive wheels attached to the torso. The three-DOF rear arms are series elastic arms, which are used for wheel-legged tripedal walking, object grasping, and manipulation. The two-DOF upper arm is used for manipulation only; its position and orientation are determined by coordinating all arms. Each motor is controlled by an angle controller and trajectory modification with angle, angular velocity, angular acceleration, and torque constraints. ARMS was experimentally validated on the basis of the following four tasks: wheel-legged walking, wheel-driving, wheel-driving with grasping, and carrying a bag.

Trade-off Between Optimal Efficiency and Envelope Correlation Coefficient for Antenna Clusters

  • Authors: Vojtech Neuman, Miloslav Capek, Lukas Jelinek, Anu Lehtovuori, Ville Viikari
  • Subjects: Information Theory (cs.IT); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2305.01416
  • Pdf link: https://arxiv.org/pdf/2305.01416
  • Abstract
    This paper introduces a theory for assessing and optimizing the multiple-input-multiple-output performance of multi-port cluster antennas in terms of efficiency, channel correlation, and power distribution. A method based on a convex optimization of feeding coefficients is extended with additional constraints allowing the user to control a ratio between the power radiated by the clusters. The formulation of the problem makes it possible to simultaneously optimize total efficiency and channel correlation with a fixed ratio between power radiated by the clusters, thus examining a trade-off between these parameters. It is shown that channel correlation, total efficiency, and allocation of radiated power are mutually conflicting parameters. The trade-offs are shown and discussed. The theory is demonstrated on a four-element antenna array and on a mobile terminal antenna.

On the Collaborative Object Transportation Using Leader Follower Approach

  • Authors: Sumanta Ghosh, Subhajit Nath, Sarvesh Sortee, Lokesh Kumar, Titas Bera
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.01614
  • Pdf link: https://arxiv.org/pdf/2305.01614
  • Abstract
    In this paper we address the multi-agent collaborative object transportation problem in a partially known environment with obstacles under a specified goal condition. We propose a leader follower approach for two mobile manipulators collaboratively transporting an object along specified desired trajectories. The proposed approach treats the mobile manipulation system as two independent subsystems: a mobile platform and a manipulator arm and uses their kinematics model for trajectory tracking. In this work we considered that the mobile platform is subject to non-holonomic constraints, with a manipulator carrying a rigid load. The desired trajectories of the end points of the load are obtained from Probabilistic RoadMap-based planning approach. Our method combines Proportional Navigation Guidance-based approach with a proposed Stop-and-Sync algorithm to reach sufficiently close to the desired trajectory, the deviation due to the non-holonomic constraints is compensated by the manipulator arm. A leader follower approach for computing inverse kinematics solution for the position of the end-effector of the manipulator arm is proposed to maintain the load rigidity. Further, we compare the proposed approach with other approaches to analyse the efficacy of our algorithm.

Keyword: pruning

Optimizing Guided Traversal for Fast Learned Sparse Retrieval

  • Authors: Yifan Qiao, Yingrui Yang, Haixin Lin, Tao Yang
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2305.01203
  • Pdf link: https://arxiv.org/pdf/2305.01203
  • Abstract
    Recent studies show that BM25-driven dynamic index skipping can greatly accelerate MaxScore-based document retrieval based on the learned sparse representation derived by DeepImpact. This paper investigates the effectiveness of such a traversal guidance strategy during top k retrieval when using other models such as SPLADE and uniCOIL, and finds that unconstrained BM25-driven skipping could have a visible relevance degradation when the BM25 model is not well aligned with a learned weight model or when retrieval depth k is small. This paper generalizes the previous work and optimizes the BM25 guided index traversal with a two-level pruning control scheme and model alignment for fast retrieval using a sparse representation. Although there can be a cost of increased latency, the proposed scheme is much faster than the original MaxScore method without BM25 guidance while retaining the relevance effectiveness. This paper analyzes the competitiveness of this two-level pruning scheme, and evaluates its tradeoff in ranking relevance and time efficiency when searching several test datasets.

Efficient Federated Learning with Enhanced Privacy via Lottery Ticket Pruning in Edge Computing

  • Authors: Yifan Shi, Kang Wei, Li Shen, Jun Li, Xueqian Wang, Bo Yuan, Song Guo
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.01387
  • Pdf link: https://arxiv.org/pdf/2305.01387
  • Abstract
    Federated learning (FL) is a collaborative learning paradigm for decentralized private data from mobile terminals (MTs). However, it suffers from issues in terms of communication, resource of MTs, and privacy. Existing privacy-preserving FL methods usually adopt the instance-level differential privacy (DP), which provides a rigorous privacy guarantee but with several bottlenecks: severe performance degradation, transmission overhead, and resource constraints of edge devices such as MTs. To overcome these drawbacks, we propose Fed-LTP, an efficient and privacy-enhanced FL framework with Lottery Ticket Hypothesis (LTH) and zero-concentrated DP (zCDP). It generates a pruned global model on the server side and conducts sparse-to-sparse training from scratch with zCDP on the client side. On the server side, two pruning schemes are proposed: (i) the weight-based pruning (LTH) determines the pruned global model structure; (ii) the iterative pruning further shrinks the size of the pruned model's parameters. Meanwhile, the performance of Fed-LTP is also boosted via model validation based on the Laplace mechanism. On the client side, we use sparse-to-sparse training to solve the resource-constraints issue and provide tighter privacy analysis to reduce the privacy budget. We evaluate the effectiveness of Fed-LTP on several real-world datasets in both independent and identically distributed (IID) and non-IID settings. The results clearly confirm the superiority of Fed-LTP over state-of-the-art (SOTA) methods in communication, computation, and memory efficiencies while realizing a better utility-privacy trade-off.

Keyword: voxel

There is no result

Keyword: lidar

A New Wave in Robotics: Survey on Recent mmWave Radar Applications in Robotics

  • Authors: Kyle Harlow, Hyesu Jang, Timothy D. Barfoot, Ayoung Kim, Christoffer Heckman
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.01135
  • Pdf link: https://arxiv.org/pdf/2305.01135
  • Abstract
    We survey the current state of millimeterwave (mmWave) radar applications in robotics with a focus on unique capabilities, and discuss future opportunities based on the state of the art. Frequency Modulated Continuous Wave (FMCW) mmWave radars operating in the 76--81GHz range are an appealing alternative to lidars, cameras and other sensors operating in the near visual spectrum. Radar has been made more widely available in new packaging classes, more convenient for robotics and its longer wavelengths have the ability to bypass visual clutter such as fog, dust, and smoke. We begin by covering radar principles as they relate to robotics. We then review the relevant new research across a broad spectrum of robotics applications beginning with motion estimation, localization, and mapping. We then cover object detection and classification, and then close with an analysis of current datasets and calibration techniques that provide entry points into radar research.

Safe Autonomous Driving in Adverse Weather: Sensor Evaluation and Performance Monitoring

  • Authors: Fatih Sezgin, Daniel Vriesman, Dagmar Steinhauser, Robert Lugner, Thomas Brandmeier
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.01336
  • Pdf link: https://arxiv.org/pdf/2305.01336
  • Abstract
    The vehicle's perception sensors (radar, lidar and camera), which must work continuously and without restriction, especially with regard to automated/autonomous driving, can lose performance due to unfavourable weather conditions. This paper analyzes the sensor signals of these three sensor technologies under rain and fog as well as day and night. A data set of a driving test vehicle as an object target under different weather conditions was recorded in a controlled environment with adjustable, defined, and reproducible weather conditions. Based on the sensor performance evaluation, a method has been developed to detect sensor degradation, including determining the affected data areas and estimating how severe they are. Through this sensor monitoring, measures can be taken in subsequent algorithms to reduce the influences or to take them into account in safety and assistance systems to avoid malfunctions.

FlowMap: Path Generation for Automated Vehicles in Open Space Using Traffic Flow

  • Authors: Wenchao Ding, Jieru Zhao, Yubin Chu, Haihui Huang, Tong Qin, Chunjing Xu, Yuxiang Guan, Zhongxue Gan
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.01622
  • Pdf link: https://arxiv.org/pdf/2305.01622
  • Abstract
    There is extensive literature on perceiving road structures by fusing various sensor inputs such as lidar point clouds and camera images using deep neural nets. Leveraging the latest advances in neural architectures (such as transformers) and bird-eye-view (BEV) representation, road cognition accuracy keeps improving. However, how to cognize the "road" for automated vehicles where there are no well-defined "roads" remains an open problem. For example, how to find paths inside intersections without HD maps is hard, since there is neither an explicit definition of "roads" nor explicit features such as lane markings. The idea of this paper comes from a proverb: it becomes a way when people walk on it. Although there are no "roads" from sensor readings, there are "roads" from the tracks of other vehicles. In this paper, we propose FlowMap, a path generation framework for automated vehicles based on traffic flows. FlowMap is built by extending our previous work RoadMap, a light-weight semantic map, with an additional traffic flow layer. A path generation algorithm on traffic flow fields (TFFs) is proposed to generate human-like paths. The proposed framework is validated using real-world driving data and is amenable to generating paths for highly complex intersections without using HD maps.

Neural LiDAR Fields for Novel View Synthesis

  • Authors: Shengyu Huang, Zan Gojcic, Zian Wang, Francis Williams, Yoni Kasten, Sanja Fidler, Konrad Schindler, Or Litany
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.01643
  • Pdf link: https://arxiv.org/pdf/2305.01643
  • Abstract
    We present Neural Fields for LiDAR (NFL), a method to optimise a neural field scene representation from LiDAR measurements, with the goal of synthesizing realistic LiDAR scans from novel viewpoints. NFL combines the rendering power of neural fields with a detailed, physically motivated model of the LiDAR sensing process, thus enabling it to accurately reproduce key sensor behaviors like beam divergence, secondary returns, and ray dropping. We evaluate NFL on synthetic and real LiDAR scans and show that it outperforms explicit reconstruct-then-simulate methods as well as other NeRF-style methods on the LiDAR novel view synthesis task. Moreover, we show that the improved realism of the synthesized views narrows the domain gap to real scans and translates to better registration and semantic segmentation performance.

Keyword: diffusion

In-Context Learning Unlocked for Diffusion Models

  • Authors: Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.01115
  • Pdf link: https://arxiv.org/pdf/2305.01115
  • Abstract
    We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models. Given a pair of task-specific example images, such as depth from/to image and scribble from/to image, and text guidance, our model automatically understands the underlying task and performs the same task on a new query image following the text guidance. To achieve this, we propose a vision-language prompt that can model a wide range of vision-language tasks and a diffusion model that takes it as input. The diffusion model is trained jointly over six different tasks using these prompts. The resulting Prompt Diffusion model is the first diffusion-based vision-language foundation model capable of in-context learning. It demonstrates high-quality in-context generation on the trained tasks and generalizes effectively to new, unseen vision tasks with their respective prompts. Our model also shows compelling text-guided image editing results. Our framework, with code publicly available at https://github.com/Zhendong-Wang/Prompt-Diffusion, aims to facilitate research into in-context learning for computer vision.

Geometric Latent Diffusion Models for 3D Molecule Generation

  • Authors: Minkai Xu, Alexander Powers, Ron Dror, Stefano Ermon, Jure Leskovec
  • Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
  • Arxiv link: https://arxiv.org/abs/2305.01140
  • Pdf link: https://arxiv.org/pdf/2305.01140
  • Abstract
    Generative models, especially diffusion models (DMs), have achieved promising results for generating feature-rich geometries and advancing foundational science problems such as molecule design. Inspired by the recent huge success of Stable (latent) Diffusion models, we propose a novel and principled method for 3D molecule generation named Geometric Latent Diffusion Models (GeoLDM). GeoLDM is the first latent DM for the molecular geometry domain, composed of autoencoders encoding structures into continuous latent codes and DMs operating in the latent space. Our key innovation is that, for modeling 3D molecular geometries, we capture their critical roto-translational equivariance constraints by building a point-structured latent space with both invariant scalars and equivariant tensors. Extensive experiments demonstrate that GeoLDM can consistently achieve better performance on multiple molecule generation benchmarks, with up to 7% improvement in the valid percentage of large biomolecules. Results also demonstrate GeoLDM's higher capacity for controllable generation thanks to the latent modeling. Code is provided at \url{https://github.com/MinkaiXu/GeoLDM}.

Solving Inverse Problems with Score-Based Generative Priors learned from Noisy Data

  • Authors: Asad Aali, Marius Arvinte, Sidharth Kumar, Jonathan I. Tamir
  • Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.01166
  • Pdf link: https://arxiv.org/pdf/2305.01166
  • Abstract
    We present SURE-Score: an approach for learning score-based generative models using training samples corrupted by additive Gaussian noise. When a large training set of clean samples is available, solving inverse problems via score-based (diffusion) generative models trained on the underlying fully-sampled data distribution has recently been shown to outperform end-to-end supervised deep learning. In practice, such a large collection of training data may be prohibitively expensive to acquire in the first place. In this work, we present an approach for approximately learning a score-based generative model of the clean distribution, from noisy training data. We formulate and justify a novel loss function that leverages Stein's unbiased risk estimate to jointly denoise the data and learn the score function via denoising score matching, while using only the noisy samples. We demonstrate the generality of SURE-Score by learning priors and applying posterior sampling to ill-posed inverse problems in two practical applications from different domains: compressive wireless multiple-input multiple-output channel estimation and accelerated 2D multi-coil magnetic resonance imaging reconstruction, where we demonstrate competitive reconstruction performance when learning at signal-to-noise ratio values of 0 and 10 dB, respectively.
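
As a rough illustration of the loss described above, here is a hedged sketch of a SURE-style objective for a denoiser trained only on noisy samples $y = x + n$, $n \sim \mathcal{N}(0, \sigma^2 I)$, with the divergence term estimated by a Hutchinson-style finite difference. Everything here (names, shapes, the omission of the denoising score matching term that SURE-Score couples in) is an illustrative assumption, not the paper's code.

```python
# Hedged sketch: a SURE loss for a denoiser trained only on noisy samples
# y = x + n, n ~ N(0, sigma^2 I). The actual SURE-Score method jointly uses
# denoising score matching, which is omitted here.
import torch

def sure_loss(denoiser, y, sigma, eps=1e-3):
    """Monte Carlo estimate of Stein's unbiased risk estimate (SURE)."""
    n = y.numel() / y.shape[0]                       # dimensions per sample
    f_y = denoiser(y)
    # Hutchinson-style finite-difference estimate of the denoiser divergence.
    b = torch.randn_like(y)
    f_y_pert = denoiser(y + eps * b)
    div = (b * (f_y_pert - f_y)).flatten(1).sum(dim=1) / eps
    residual = (f_y - y).flatten(1).pow(2).sum(dim=1)
    # Unbiased estimate of the clean-signal MSE, using only noisy data.
    return (residual - n * sigma**2 + 2 * sigma**2 * div).mean()
```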

DreamPaint: Few-Shot Inpainting of E-Commerce Items for Virtual Try-On without 3D Modeling

  • Authors: Mehmet Saygin Seyfioglu, Karim Bouyarmane, Suren Kumar, Amir Tavanaei, Ismail B. Tutar
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.01257
  • Pdf link: https://arxiv.org/pdf/2305.01257
  • Abstract
    We introduce DreamPaint, a framework to intelligently inpaint any e-commerce product on any user-provided context image. The context image can be, for example, the user's own image for virtual try-on of clothes from the e-commerce catalog, or the user's room image for virtual try-on of a piece of furniture from the e-commerce catalog in their room. As opposed to previous augmented-reality (AR)-based virtual try-on methods, DreamPaint neither uses nor requires 3D modeling of either the e-commerce product or the user context. Instead, it directly uses 2D images of the product as available in the product catalog database, and a 2D picture of the context, for example taken from the user's phone camera. The method relies on few-shot fine-tuning of a pre-trained diffusion model with the masked latents (e.g., Masked DreamBooth) of the catalog images per item, whose weights are then loaded into a pre-trained inpainting module that is capable of preserving the characteristics of the context image. DreamPaint preserves both the product image and the context (environment/user) image without requiring text guidance to describe the missing part (product/context). DreamPaint can also intelligently infer the best 3D angle of the product to place at the desired location in the user context, even if that angle was previously unseen in the product's reference 2D images. We compare our results against both text-guided and image-guided inpainting modules and show that DreamPaint yields superior performance in both a subjective human study and quantitative metrics.

Long-Term Rhythmic Video Soundtracker

  • Authors: Jiashuo Yu, Yaohui Wang, Xinyuan Chen, Xiao Sun, Yu Qiao
  • Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2305.01319
  • Pdf link: https://arxiv.org/pdf/2305.01319
  • Abstract
    We consider the problem of generating musical soundtracks in sync with rhythmic visual cues. Most existing works rely on pre-defined music representations, limiting generative flexibility and complexity. Other methods that directly generate video-conditioned waveforms suffer from limited scenarios, short lengths, and unstable generation quality. To this end, we present Long-Term Rhythmic Video Soundtracker (LORIS), a novel framework to synthesize long-term conditional waveforms. Specifically, our framework consists of a latent conditional diffusion probabilistic model that performs waveform synthesis. Furthermore, a series of context-aware conditioning encoders is proposed to take temporal information into consideration for long-term generation. Notably, we extend our model's applicability from dances to multiple sports scenarios such as floor exercise and figure skating. To perform comprehensive evaluations, we establish a benchmark for rhythmic video soundtracks including a pre-processed dataset, improved evaluation metrics, and robust generative baselines. Extensive experiments show that our model generates long-term soundtracks with state-of-the-art musical quality and rhythmic correspondence. Codes are available at \url{https://github.com/OpenGVLab/LORIS}.

Higher-Order GFDM for Linear Elliptic Operators

  • Authors: Heinrich Kraus, Jörg Kuhnert, Pratik Suchde
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2305.01320
  • Pdf link: https://arxiv.org/pdf/2305.01320
  • Abstract
    We present a novel approach to discretizing diffusion operators of the form $\nabla\cdot(\lambda\nabla u)$ in the context of meshfree generalized finite difference methods. Our ansatz uses properties of derived operators and combines the discrete Laplace operator with reconstruction functions approximating the diffusion coefficient $\lambda$. Provided that the reconstructions are of a sufficiently high order, we prove that the order of accuracy of the discrete Laplace operator transfers to the derived diffusion operator. We show that the new discrete diffusion operator inherits the diagonal dominance property of the discrete Laplace operator and fulfills enrichment properties. Our numerical results for elliptic and parabolic partial differential equations show that even low-order reconstructions preserve the order of the underlying discrete Laplace operator for sufficiently smooth diffusion coefficients. In experiments, we demonstrate the applicability of the new discrete diffusion operator to interface problems with point clouds not aligned to the interface and numerically demonstrate first-order convergence.
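
To make the construction concrete, here is a hedged 1D uniform-grid analogue of the idea: discretize $\nabla\cdot(\lambda\nabla u) = \lambda\Delta u + \nabla\lambda\cdot\nabla u$ by combining a discrete Laplacian with a reconstruction of $\lambda$. The paper works in a meshfree GFDM setting with general point clouds; this finite-difference sketch only mirrors the operator structure.

```python
# Hedged 1D analogue: combine a discrete Laplacian with a reconstructed
# diffusion coefficient via the product rule (lambda u')' = lambda u'' + lambda' u'.
import numpy as np

def diffusion_operator(u, lam, h):
    """Apply the derived diffusion operator on interior grid points."""
    lap_u  = (u[2:] - 2 * u[1:-1] + u[:-2]) / h**2   # discrete Laplacian
    grad_u = (u[2:] - u[:-2]) / (2 * h)               # central gradient of u
    grad_l = (lam[2:] - lam[:-2]) / (2 * h)           # reconstructed d(lambda)/dx
    return lam[1:-1] * lap_u + grad_l * grad_u

# Example: u(x) = sin(x), lambda(x) = 1 + x^2 on [0, 1]
x = np.linspace(0.0, 1.0, 101)
u, lam = np.sin(x), 1.0 + x**2
print(diffusion_operator(u, lam, x[1] - x[0])[:3])
```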

Adopting AI: How Familiarity Breeds Both Trust and Contempt

  • Authors: Michael C. Horowitz, Lauren Kahn, Julia Macdonald, Jacquelyn Schneider
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2305.01405
  • Pdf link: https://arxiv.org/pdf/2305.01405
  • Abstract
    Despite pronouncements about the inevitable diffusion of artificial intelligence and autonomous technologies, in practice it is human behavior, not technology in a vacuum, that dictates how technology seeps into -- and changes -- societies. In order to better understand how human preferences shape technological adoption and the spread of AI-enabled autonomous technologies, we look at representative adult samples of US public opinion in 2018 and 2020 on the use of four types of autonomous technologies: vehicles, surgery, weapons, and cyber defense. By focusing on these four diverse uses of AI-enabled autonomy that span transportation, medicine, and national security, we exploit the inherent variation between these AI-enabled autonomous use cases. We find that those with familiarity and expertise with AI and similar technologies were more likely to support all of the autonomous applications we tested (except weapons) than those with a limited understanding of the technology. Individuals that had already delegated the act of driving by using ride-share apps were also more positive about autonomous vehicles. However, familiarity cut both ways; individuals are also less likely to support AI-enabled technologies when applied directly to their life, especially if technology automates tasks they are already familiar with operating. Finally, opposition to AI-enabled military applications has slightly increased over time.

ContactArt: Learning 3D Interaction Priors for Category-level Articulated Object and Hand Poses Estimation

  • Authors: Zehao Zhu, Jiashun Wang, Yuzhe Qin, Deqing Sun, Varun Jampani, Xiaolong Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.01618
  • Pdf link: https://arxiv.org/pdf/2305.01618
  • Abstract
    We propose a new dataset and a novel approach to learning hand-object interaction priors for hand and articulated object pose estimation. We first collect a dataset using visual teleoperation, where the human operator can directly play within a physical simulator to manipulate the articulated objects. We record the data and obtain free and accurate annotations on object poses and contact information from the simulator. Our system only requires an iPhone to record human hand motion, which can be easily scaled up and greatly lowers the costs of data and annotation collection. With this data, we learn 3D interaction priors, including a discriminator (in a GAN) capturing the distribution of how object parts are arranged, and a diffusion model which generates the contact regions on articulated objects, guiding the hand pose estimation. Such structural and contact priors can easily transfer to real-world data with barely any domain gap. By using our data and learned priors, our method significantly improves the performance of joint hand and articulated object pose estimation over the existing state-of-the-art methods. The project is available at https://zehaozhu.github.io/ContactArt/ .

Keyword: dynamic

Attention-based Spatial-Temporal Graph Neural ODE for Traffic Prediction

  • Authors: Weiheng Zhong, Hadi Meidani, Jane Macfarlane
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00985
  • Pdf link: https://arxiv.org/pdf/2305.00985
  • Abstract
    Traffic forecasting is an important issue in intelligent traffic systems (ITS). Graph neural networks (GNNs) are effective deep learning models for capturing the complex spatio-temporal dependency of traffic data, achieving strong prediction performance. In this paper, we propose an attention-based graph neural ODE (ASTGODE) that explicitly learns the dynamics of the traffic system, which makes the predictions of our machine learning model more explainable. Our model aggregates traffic patterns of different periods and achieves satisfactory performance on two real-world traffic data sets. The results show that our model achieves the lowest root mean square error among all the existing GNN models in our experiments.
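
As a minimal sketch of the graph-neural-ODE ingredient (not the paper's ASTGODE architecture, which additionally uses attention), the following evolves node states under a learned graph dynamics with the torchdiffeq solver; all shapes and the random row-normalized adjacency are illustrative assumptions.

```python
# Hedged sketch of a graph neural ODE: node states evolve under a learned
# dynamics dH/dt = tanh(A_hat @ Linear(H)). torchdiffeq (pip install
# torchdiffeq) and the toy adjacency below are assumptions for illustration.
import torch
import torch.nn as nn
from torchdiffeq import odeint

class GraphODEFunc(nn.Module):
    def __init__(self, a_hat, dim):
        super().__init__()
        self.a_hat = a_hat                 # normalized adjacency, shape (N, N)
        self.lin = nn.Linear(dim, dim)

    def forward(self, t, h):               # h: node states, shape (N, dim)
        return torch.tanh(self.a_hat @ self.lin(h))

n_nodes, dim = 10, 16
a_hat = torch.softmax(torch.randn(n_nodes, n_nodes), dim=-1)  # toy adjacency
func = GraphODEFunc(a_hat, dim)
h0 = torch.randn(n_nodes, dim)             # initial node states
t = torch.linspace(0.0, 1.0, 12)           # 12 prediction horizons
h_t = odeint(func, h0, t)                  # trajectory of shape (12, N, dim)
```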

Software Runtime Monitoring with Adaptive Sampling Rate to Collect Representative Samples of Execution Traces

  • Authors: Jhonny Mertz, Ingrid Nunes
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2305.01039
  • Pdf link: https://arxiv.org/pdf/2305.01039
  • Abstract
    Monitoring software systems at runtime is key for understanding workloads, debugging, and self-adaptation. It typically involves collecting and storing observable software data, which can be analyzed online or offline. Despite the usefulness of collecting system data, it may significantly impact the system execution by delaying response times and competing for system resources. The typical approach to cope with this is to filter the portions of the system to be monitored and to sample the data. Although these approaches are a step towards achieving a desired trade-off between the amount of collected information and the impact on the system performance, they focus on collecting data of a particular type or may capture a sample that does not correspond to the actual system behavior. In response, we propose an adaptive runtime monitoring process that dynamically adapts the sampling rate while monitoring software systems. It includes algorithms with statistical foundations to improve the representativeness of collected samples without compromising the system performance. Our evaluation targets five applications of a widely used benchmark. It shows that the error (RMSE) of the samples collected with our approach is 9-54% lower than that of the main alternative strategy (sampling rate inversely proportional to the throughput), with a 1-6% higher performance impact.
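
One plausible instantiation of such a statistically grounded rate adaptation, sketched under the assumption that the goal is to collect the sample size suggested by a normal-approximation confidence interval; the thresholds and names are illustrative, and the paper's concrete algorithms differ.

```python
# Hedged sketch: adapt the monitoring sampling rate so that the number of
# traces collected per window tracks n = (z * s / e)^2, the sample size from
# a normal-approximation confidence interval on the mean latency.
import statistics

def next_sampling_rate(latencies, throughput, z=1.96, rel_error=0.05):
    """Return the fraction of requests to sample in the next window."""
    mean = statistics.fmean(latencies)
    std = statistics.stdev(latencies)
    margin = rel_error * mean                  # absolute error target
    needed = (z * std / margin) ** 2           # required sample size
    rate = needed / max(throughput, 1.0)       # fraction of all requests
    return min(max(rate, 0.001), 1.0)          # clamp to (0.1%, 100%]

print(next_sampling_rate([12.0, 15.5, 11.2, 14.8, 13.9], throughput=5000))
```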

Right HTML, Wrong JSON: Challenges in Replaying Archived Webpages Built with Client-Side Rendering

  • Authors: Michele C. Weigle, Michael L. Nelson, Sawood Alam, Mark Graham
  • Subjects: Digital Libraries (cs.DL)
  • Arxiv link: https://arxiv.org/abs/2305.01071
  • Pdf link: https://arxiv.org/pdf/2305.01071
  • Abstract
    Many web sites are transitioning how they construct their pages. The conventional model is where the content is embedded server-side in the HTML and returned to the client in an HTTP response. Increasingly, sites are moving to a model where the initial HTTP response contains only an HTML skeleton plus JavaScript that makes API calls to a variety of servers for the content (typically in JSON format), and then builds out the DOM client-side, more easily allowing for periodically refreshing the content in a page and allowing dynamic modification of the content. This client-side rendering, now predominant in social media platforms such as Twitter and Instagram, is also being adopted by news outlets, such as CNN.com. When conventional web archiving techniques, such as crawling with Heritrix, are applied to pages that render their content client-side, the JSON responses can become out of sync with the HTML page in which it is to be embedded, resulting in temporal violations on replay. Because the violative JSON is not directly observable in the page (i.e., in the same manner a violative embedded image is), the temporal violations can be difficult to detect. We describe how the top level CNN.com page has used client-side rendering since April 2015 and the impact this has had on web archives. Between April 24, 2015 and July 21, 2016, we found almost 15,000 mementos with a temporal violation of more than 2 days between the base CNN.com HTML and the JSON responses used to deliver the content under the main story. One way to mitigate this problem is to use browser-based crawling instead of conventional crawlers like Heritrix, but browser-based crawling is currently much slower than non-browser-based tools such as Heritrix.
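
A hedged sketch of the detection idea: compare the standard Memento-Datetime response headers of the base HTML memento and an embedded JSON memento, and flag a violation when they differ by more than the two-day threshold used in the article. The helper names are placeholders; only the Memento-Datetime header itself is standard.

```python
# Hedged sketch: flag a temporal violation when an embedded JSON response's
# Memento-Datetime drifts more than two days from the base HTML memento's.
from datetime import timedelta
from email.utils import parsedate_to_datetime
import requests

def memento_datetime(url):
    """Read the archival datetime from the Memento-Datetime header."""
    head = requests.head(url, allow_redirects=True)
    return parsedate_to_datetime(head.headers["Memento-Datetime"])

def violates(html_memento_url, json_memento_url, max_skew=timedelta(days=2)):
    skew = abs(memento_datetime(html_memento_url)
               - memento_datetime(json_memento_url))
    return skew > max_skew
```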

Autoencoders for discovering manifold dimension and coordinates in data from complex dynamical systems

  • Authors: Kevin Zeng, Michael D. Graham
  • Subjects: Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD)
  • Arxiv link: https://arxiv.org/abs/2305.01090
  • Pdf link: https://arxiv.org/pdf/2305.01090
  • Abstract
    While many phenomena in physics and engineering are formally high-dimensional, their long-time dynamics often live on a lower-dimensional manifold. The present work introduces an autoencoder framework that combines implicit regularization with internal linear layers and $L_2$ regularization (weight decay) to automatically estimate the underlying dimensionality of a data set, produce an orthogonal manifold coordinate system, and provide the mapping functions between the ambient space and manifold space, allowing for out-of-sample projections. We validate our framework's ability to estimate the manifold dimension for a series of datasets from dynamical systems of varying complexities and compare to other state-of-the-art estimators. We analyze the training dynamics of the network to glean insight into the mechanism of low-rank learning and find that the implicit regularizing layers collectively compound the low-rank representation and even self-correct during training. Analysis of gradient descent dynamics for this architecture in the linear case reveals the role of the internal linear layers in leading to faster decay of a "collective weight variable" incorporating all layers, and the role of weight decay in breaking degeneracies and thus driving convergence along directions in which no decay would occur in its absence. We show that this framework can be naturally extended for applications of state-space modeling and forecasting by generating a data-driven dynamic model of a spatiotemporally chaotic partial differential equation using only the manifold coordinates. Finally, we demonstrate that our framework is robust to hyperparameter choices.
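
A hedged sketch of the architectural recipe: insert extra activation-free linear layers at the bottleneck and train with weight decay, so that gradient descent implicitly biases the latent code toward low rank. Layer sizes, names, and the optimizer settings are placeholders, not the paper's configuration.

```python
# Hedged sketch: autoencoder with internal linear layers at the bottleneck,
# trained with L2 weight decay; the product of linear maps plus weight decay
# implicitly pushes the latent representation toward low rank.
import torch
import torch.nn as nn

class ManifoldAE(nn.Module):
    def __init__(self, ambient=128, latent=16, n_linear=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(ambient, 64), nn.GELU(), nn.Linear(64, latent),
            # internal linear layers: deliberately no activations between them
            *[nn.Linear(latent, latent) for _ in range(n_linear)],
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent, 64), nn.GELU(), nn.Linear(64, ambient))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ManifoldAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```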

Learning Controllable Adaptive Simulation for Multi-resolution Physics

  • Authors: Tailin Wu, Takashi Maruyama, Qingqing Zhao, Gordon Wetzstein, Jure Leskovec
  • Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
  • Arxiv link: https://arxiv.org/abs/2305.01122
  • Pdf link: https://arxiv.org/pdf/2305.01122
  • Abstract
    Simulating the time evolution of physical systems is pivotal in many scientific and engineering problems. An open challenge in simulating such systems is their multi-resolution dynamics: a small fraction of the system is extremely dynamic and requires very fine-grained resolution, while a majority of the system is changing slowly and can be modeled by coarser spatial scales. Typical learning-based surrogate models use a uniform spatial scale, which needs to resolve to the finest required scale and can waste huge amounts of compute to achieve the required accuracy. In this work, we introduce Learning controllable Adaptive simulation for Multi-resolution Physics (LAMP) as the first fully deep learning-based surrogate model that jointly learns the evolution model and optimizes appropriate spatial resolutions that devote more compute to the highly dynamic regions. LAMP consists of a Graph Neural Network (GNN) for learning the forward evolution, and a GNN-based actor-critic for learning the policy of spatial refinement and coarsening. We introduce learning techniques that optimize LAMP with a weighted sum of error and computational cost as the objective, allowing LAMP to adapt to the varying relative importance of the error vs. computation tradeoff at inference time. We evaluate our method on a 1D benchmark of nonlinear PDEs and a challenging 2D mesh-based simulation. We demonstrate that LAMP outperforms state-of-the-art deep learning surrogate models, and can adaptively trade off computation to improve long-term prediction error: it achieves an average of 33.7% error reduction for 1D nonlinear PDEs, and outperforms MeshGraphNets + classical Adaptive Mesh Refinement (AMR) in 2D mesh-based simulations. The project website with data and code can be found at: this http URL

Analysis of different temporal graph neural network configurations on dynamic graphs

  • Authors: Rishu Verma, Ashmita Bhattacharya, Sai Naveen Katla
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2305.01128
  • Pdf link: https://arxiv.org/pdf/2305.01128
  • Abstract
    In recent years, there has been increasing interest in the use of graph neural networks (GNNs) for analyzing dynamic graphs, which are graphs that evolve over time. However, there is still a lack of understanding of how different temporal graph neural network (TGN) configurations can impact the accuracy of predictions on dynamic graphs. Moreover, the hunt for benchmark datasets for these TGN models is still ongoing. Recently, PyTorch Geometric Temporal introduced a few benchmark datasets, but most of these have not been analyzed with different TGN models to establish the state of the art. Therefore, this project aims to address this gap in the literature by performing a qualitative analysis of spatial-temporal dependence structure learning on dynamic graphs, as well as a comparative study of the effectiveness of selected TGNs on node and edge prediction tasks. Additionally, an extensive ablation study will be conducted on different variants of the best-performing TGN to identify the key factors contributing to its performance. By achieving these objectives, this project will provide valuable insights into the design and optimization of TGNs for dynamic graph analysis, with potential applications in areas such as disease spread prediction, social network analysis, traffic prediction, and more. Moreover, an attempt is made to convert snapshot-based data into an event-based dataset and make it compatible with the SOTA model, namely TGN, to perform the node regression task.

PGrad: Learning Principal Gradients For Domain Generalization

  • Authors: Zhe Wang, Jake Grigsby, Yanjun Qi
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.01134
  • Pdf link: https://arxiv.org/pdf/2305.01134
  • Abstract
    Machine learning models fail to perform when facing out-of-distribution (OOD) domains, a challenging task known as domain generalization (DG). In this work, we develop a novel DG training strategy, called PGrad, to learn a robust gradient direction, improving models' generalization ability on unseen domains. The proposed gradient aggregates the principal directions of a sampled roll-out optimization trajectory that measures the training dynamics across all training domains. PGrad's gradient design forces the DG training to ignore domain-dependent noise signals and updates all training domains with a robust direction covering the main components of parameter dynamics. We further improve PGrad via bijection-based computational refinement and directional plus length-based calibrations. Our theoretical proof connects PGrad to the spectral analysis of the Hessian in training neural networks. Experiments on DomainBed and WILDS benchmarks demonstrate that our approach effectively enables robust DG optimization and leads to smoothly decreasing loss curves. Empirically, PGrad achieves competitive results across seven datasets, demonstrating its efficacy across both synthetic and real-world distributional shifts. Code is available at https://github.com/QData/PGrad.
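
A hedged sketch of the core computation as described: stack the updates observed along a roll-out trajectory across training domains and take the top singular direction as the robust update. The sign fix and rescaling are simplifications; the paper's bijection-based refinement and calibrations are omitted.

```python
# Hedged sketch: extract a principal (top right-singular) direction from a
# matrix of roll-out updates and use it as a noise-robust gradient.
import torch

def principal_gradient(update_matrix, reference):
    """update_matrix: (n_steps, n_params) roll-out updates;
    reference: (n_params,) e.g. the mean update, to fix the sign."""
    _, _, vh = torch.linalg.svd(update_matrix, full_matrices=False)
    direction = vh[0]                                    # top singular vector
    if torch.dot(direction, reference) < 0:              # resolve sign ambiguity
        direction = -direction
    return direction * update_matrix.norm(dim=1).mean()  # restore a step length
```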

Ripple Knowledge Graph Convolutional Networks For Recommendation Systems

  • Authors: Chen Li, Yang Cao, Ye Zhu, Debo Cheng, Chengyuan Li, Yasuhiko Morimoto
  • Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.01147
  • Pdf link: https://arxiv.org/pdf/2305.01147
  • Abstract
    Using knowledge graphs to assist deep learning models in making recommendation decisions has recently been proven to effectively improve the model's interpretability and accuracy. This paper introduces an end-to-end deep learning model, named RKGCN, which dynamically analyses each user's preferences and recommends suitable items. It combines knowledge graphs on both the item side and the user side to enrich their representations and maximize the utilization of the abundant information in knowledge graphs. RKGCN is able to offer more personalized and relevant recommendations in three different scenarios. The experimental results show the superior effectiveness of our model over five baseline models on three real-world datasets including movies, books, and music.

Early Classifying Multimodal Sequences

  • Authors: Alexander Cao, Jean Utke, Diego Klabjan
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.01151
  • Pdf link: https://arxiv.org/pdf/2305.01151
  • Abstract
    Often, pieces of information are received sequentially over time. When has one collected enough such pieces to classify? Trading wait time for decision certainty leads to early classification problems, which have recently gained attention as a means of adapting classification to more dynamic environments. However, so far results have been limited to unimodal sequences. In this pilot study, we expand into early classifying multimodal sequences by combining existing methods. We show our new method yields experimental AUC advantages of up to 8.7%.

Optimizing Guided Traversal for Fast Learned Sparse Retrieval

  • Authors: Yifan Qiao, Yingrui Yang, Haixin Lin, Tao Yang
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2305.01203
  • Pdf link: https://arxiv.org/pdf/2305.01203
  • Abstract
    Recent studies show that BM25-driven dynamic index skipping can greatly accelerate MaxScore-based document retrieval based on the learned sparse representation derived by DeepImpact. This paper investigates the effectiveness of such a traversal guidance strategy during top-k retrieval when using other models such as SPLADE and uniCOIL, and finds that unconstrained BM25-driven skipping can cause visible relevance degradation when the BM25 model is not well aligned with a learned weight model or when the retrieval depth k is small. This paper generalizes the previous work and optimizes the BM25-guided index traversal with a two-level pruning control scheme and model alignment for fast retrieval using a sparse representation. Although there can be a cost of increased latency, the proposed scheme is much faster than the original MaxScore method without BM25 guidance while retaining the relevance effectiveness. This paper analyzes the competitiveness of this two-level pruning scheme, and evaluates its tradeoff between ranking relevance and time efficiency when searching several test datasets.

Structure Aware Incremental Learning with Personalized Imitation Weights for Recommender Systems

  • Authors: Yuening Wang, Yingxue Zhang, Antonios Valkanas, Ruiming Tang, Chen Ma, Jianye Hao, Mark Coates
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2305.01204
  • Pdf link: https://arxiv.org/pdf/2305.01204
  • Abstract
    Recommender systems now consume large-scale data and play a significant role in improving user experience. Graph Neural Networks (GNNs) have emerged as one of the most effective recommender system models because they model the rich relational information. The ever-growing volume of data can make training GNNs prohibitively expensive. To address this, previous attempts propose to train the GNN models incrementally as new data blocks arrive. Feature and structure knowledge distillation techniques have been explored to allow the GNN model to train in a fast incremental fashion while alleviating the catastrophic forgetting problem. However, preserving the same amount of historical information for all users is sub-optimal since it fails to take into account the dynamics of each user's change of preferences. For users whose interests shift substantially, retaining too much of the old knowledge can overly constrain the model, preventing it from quickly adapting to the users' novel interests. In contrast, for users who have static preferences, model performance can benefit greatly from preserving as much of the user's long-term preferences as possible. In this work, we propose a novel training strategy that adaptively learns personalized imitation weights for each user to balance the contribution from the recent data and the amount of knowledge to be distilled from previous time periods. We demonstrate the effectiveness of learning imitation weights via a comparison on five diverse datasets for three state-of-the-art structure-distillation-based recommender systems. The performance shows consistent improvement over competitive incremental learning techniques.

Dynamic Scheduling for Federated Edge Learning with Streaming Data

  • Authors: Chung-Hsuan Hu, Zheng Chen, Erik G. Larsson
  • Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2305.01238
  • Pdf link: https://arxiv.org/pdf/2305.01238
  • Abstract
    In this work, we consider a Federated Edge Learning (FEEL) system where training data are randomly generated over time at a set of distributed edge devices with long-term energy constraints. Due to limited communication resources and latency requirements, only a subset of devices is scheduled for participating in the local training process in every iteration. We formulate a stochastic network optimization problem for designing a dynamic scheduling policy that maximizes the time-average data importance from scheduled user sets subject to energy consumption and latency constraints. Our proposed algorithm based on the Lyapunov optimization framework outperforms alternative methods without considering time-varying data importance, especially when the generation of training data shows strong temporal correlation.
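
A hedged sketch of a Lyapunov drift-plus-penalty scheduler of the kind described: each device keeps a virtual energy queue, and per slot the top-m devices by importance minus queue-weighted energy cost are scheduled. The constants V and m and the importance measure are placeholders, not the paper's exact formulation.

```python
# Hedged sketch: drift-plus-penalty scheduling with virtual energy queues.
import numpy as np

def schedule(importance, energy_cost, queues, energy_budget, m=3, V=10.0):
    """importance, energy_cost, queues: per-device arrays; returns the
    indices of scheduled devices and the updated virtual queues."""
    score = V * importance - queues * energy_cost   # drift-plus-penalty score
    chosen = np.argsort(score)[-m:]                 # schedule the top-m devices
    served = np.zeros_like(queues, dtype=bool)
    served[chosen] = True
    # Virtual queue: grows with spent energy, drains by the per-slot budget.
    queues = np.maximum(queues + served * energy_cost - energy_budget, 0.0)
    return chosen, queues
```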

Sim2real and Digital Twins in Autonomous Driving: A Survey

  • Authors: Xuemin Hu, Shen Li, Tingyu Huang, Bo Tang, Long Chen
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.01263
  • Pdf link: https://arxiv.org/pdf/2305.01263
  • Abstract
    Safety and cost are two important concerns for the development of autonomous driving technologies. From academic research to commercial applications of autonomous driving vehicles, sufficient simulation and real-world testing are required. In general, large-scale testing is conducted in simulation environments and the learned driving knowledge is then transferred to the real world, so how to adapt driving knowledge learned in simulation to reality becomes a critical issue. However, the virtual simulation world differs from the real world in many aspects, such as lighting, textures, vehicle dynamics, and agents' behaviors, which makes it difficult to bridge the gap between the virtual and real worlds. This gap is commonly referred to as the reality gap (RG). In recent years, researchers have explored various approaches to address the reality gap issue, which can be broadly classified into two categories: transferring knowledge from simulation to reality (sim2real) and learning in digital twins (DTs). In this paper, we consider the solutions through the sim2real and DT technologies, and review important applications and innovations in the field of autonomous driving. Meanwhile, we present the state of the art from the views of algorithms, models, and simulators, and elaborate the development process from sim2real to DTs. The presentation also illustrates the far-reaching effects of the development of sim2real and DTs in autonomous driving.

Exploring vision transformer layer choosing for semantic segmentation

  • Authors: Fangjian Lin, Yizhe Ma, Shengwei Tian
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.01279
  • Pdf link: https://arxiv.org/pdf/2305.01279
  • Abstract
    Extensive work has demonstrated the effectiveness of Vision Transformers. The plain Vision Transformer tends to obtain multi-scale features by selecting fixed layers, or the last layer, aiming to achieve higher performance in dense prediction tasks. However, this selection is often based on manual operation, and different samples often exhibit different features at different layers (e.g., edge, structure, texture, detail, etc.). This calls for a dynamic, adaptive fusion method to filter features from different layers. In this paper, unlike previous encoder and decoder work, we design a neck network for adaptive fusion and feature selection, called ViTController. We validate the effectiveness of our method on different datasets and models and surpass previous state-of-the-art methods. Finally, our method can also be used as a plug-in module and inserted into different networks.
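
A hedged sketch of what adaptive layer fusion can look like: a small controller produces per-sample softmax weights over the features from every transformer layer, replacing hand-picked fixed layers. The actual ViTController neck is more elaborate; the shapes and the gating network here are illustrative assumptions.

```python
# Hedged sketch: learned per-sample gating over per-layer ViT features.
import torch
import torch.nn as nn

class LayerFusion(nn.Module):
    def __init__(self, n_layers, dim):
        super().__init__()
        self.gate = nn.Linear(dim, n_layers)     # one fusion weight per layer

    def forward(self, layer_feats):              # (n_layers, B, tokens, dim)
        pooled = layer_feats.mean(dim=2).mean(dim=0)      # (B, dim) summary
        w = torch.softmax(self.gate(pooled), dim=-1)      # (B, n_layers)
        w = w.t()[..., None, None]                        # (n_layers, B, 1, 1)
        return (w * layer_feats).sum(dim=0)               # fused (B, tokens, dim)

feats = torch.randn(12, 2, 196, 768)   # 12 layers, batch 2, 196 tokens, dim 768
fused = LayerFusion(n_layers=12, dim=768)(feats)
```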

Arax: A Runtime Framework for Decoupling Applications from Heterogeneous Accelerators

  • Authors: Manos Pavlidakis, Stelios Mavridis, Antony Chazapis, Giorgos Vasiliadis, Angelos Bilas
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.01291
  • Pdf link: https://arxiv.org/pdf/2305.01291
  • Abstract
    Today, using multiple heterogeneous accelerators efficiently from applications and high-level frameworks, such as TensorFlow and Caffe, poses significant challenges in three respects: (a) sharing accelerators, (b) allocating available resources elastically during application execution, and (c) reducing the required programming effort. In this paper, we present Arax, a runtime system that decouples applications from heterogeneous accelerators within a server. First, Arax maps application tasks dynamically to available resources, managing all required task state, memory allocations, and task dependencies. As a result, Arax can share accelerators across applications in a server and adjust the resources used by each application as load fluctuates over time. Additionally, Arax offers a simple API and includes Autotalk, a stub generator that automatically generates stub libraries for applications already written for specific accelerator types, such as NVIDIA GPUs. Consequently, Arax applications are written once without considering physical details, including the number and type of accelerators. Our results show that applications, such as Caffe, TensorFlow, and Rodinia, can run using Arax with minimum effort and low overhead compared to native execution, about 12% (geometric mean). Arax supports efficient accelerator sharing by offering up to 20% improved execution times compared to NVIDIA MPS, which supports NVIDIA GPUs only. Arax can transparently provide elasticity, decreasing total application turn-around time by up to 2x compared to native execution without elasticity support.

Validation of massively-parallel adaptive testing using dynamic control matching

  • Authors: Schaun Wheeler
  • Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
  • Arxiv link: https://arxiv.org/abs/2305.01334
  • Pdf link: https://arxiv.org/pdf/2305.01334
  • Abstract
    A/B testing is a widely-used paradigm within marketing optimization because it promises identification of causal effects and because it is implemented out of the box in most messaging delivery software platforms. Modern businesses, however, often run many A/B/n tests at the same time and in parallel, and package many content variations into the same messages, not all of which are part of an explicit test. Whether as the result of many teams testing at the same time, or as part of a more sophisticated reinforcement learning (RL) approach that continuously adapts tests and test condition assignment based on previous results, dynamic parallel testing cannot be evaluated the same way traditional A/B tests are evaluated. This paper presents a method for disentangling the causal effects of the various tests under conditions of continuous test adaptation, using a matched-synthetic control group that adapts alongside the tests.

Physics-Informed Learning Using Hamiltonian Neural Networks with Output Error Noise Models

  • Authors: Sarvin Moradi, Nick Jaensson, Roland Tóth, Maarten Schoukens
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.01338
  • Pdf link: https://arxiv.org/pdf/2305.01338
  • Abstract
    In order to make data-driven models of physical systems interpretable and reliable, it is essential to include prior physical knowledge in the modeling framework. Hamiltonian Neural Networks (HNNs) implement Hamiltonian theory in deep learning and form a comprehensive framework for modeling autonomous energy-conservative systems. Despite being suitable for estimating a wide range of physical system behavior from data, classical HNNs are restricted to systems without inputs and require noiseless state measurements and information on the derivative of the state to be available. To address these challenges, this paper introduces an Output Error Hamiltonian Neural Network (OE-HNN) modeling approach for physical systems with inputs and noisy state measurements. Furthermore, it does not require the state derivatives to be known. Instead, the OE-HNN utilizes an ODE-solver embedded in the training process, which enables it to learn the dynamics from noisy state measurements. In addition, extending HNNs based on the generalized Hamiltonian theory enables the inclusion of external inputs into the framework, which are important for engineering applications. We demonstrate via simulation examples that the proposed OE-HNN results in superior modeling performance compared to classical HNNs.
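
A hedged sketch of the HNN-with-ODE-solver idea: learn a Hamiltonian $H(q,p)$, simulate the induced dynamics with a solver inside the loss, and fit the simulated trajectory to (noisy) state measurements, so no state derivatives are needed. External inputs from the generalized Hamiltonian framework are omitted; torchdiffeq and all sizes are assumptions, not the paper's implementation.

```python
# Hedged sketch: a Hamiltonian neural network integrated by an ODE solver.
import torch
import torch.nn as nn
from torchdiffeq import odeint  # pip install torchdiffeq

class HNN(nn.Module):
    def __init__(self, dim=1):
        super().__init__()
        self.H = nn.Sequential(nn.Linear(2 * dim, 64), nn.Tanh(), nn.Linear(64, 1))
        self.dim = dim

    def forward(self, t, x):                  # x = (q, p), shape (batch, 2*dim)
        dH = torch.autograd.grad(self.H(x).sum(), x, create_graph=True)[0]
        dq = dH[..., self.dim:]               #  dH/dp
        dp = -dH[..., :self.dim]              # -dH/dq  (Hamilton's equations)
        return torch.cat([dq, dp], dim=-1)

model = HNN()
x0 = torch.tensor([[1.0, 0.0]], requires_grad=True)  # grad needed inside solver
t = torch.linspace(0.0, 2.0, 20)
sim = odeint(model, x0, t)                    # simulated (q, p) trajectory
# loss = ((sim - noisy_measurements) ** 2).mean()   # output-error criterion
```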

A Quadtree for Hyperbolic Space

  • Authors: Sándor Kisfaludi-Bak, Geert van Wordragen
  • Subjects: Computational Geometry (cs.CG)
  • Arxiv link: https://arxiv.org/abs/2305.01356
  • Pdf link: https://arxiv.org/pdf/2305.01356
  • Abstract
    We propose a data structure in d-dimensional hyperbolic space that can be considered a natural counterpart to quadtrees in Euclidean spaces. Based on this data structure we propose a so-called L-order for hyperbolic point sets, which is an extension of the Z-order defined in Euclidean spaces. We demonstrate the usefulness of our hyperbolic quadtree data structure by giving an algorithm for constant-approximate closest pair and dynamic constant-approximate nearest neighbours in hyperbolic space of constant dimension d.

Diddy: a Python toolbox for infinite discrete dynamical systems

  • Authors: Ville Salo, Ilkka Törmä
  • Subjects: Mathematical Software (cs.MS); Discrete Mathematics (cs.DM); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2305.01375
  • Pdf link: https://arxiv.org/pdf/2305.01375
  • Abstract
    We introduce Diddy, a collection of Python scripts for analyzing infinite discrete dynamical systems. The main focus is on generalized multidimensional shifts of finite type (SFTs). We show how Diddy can be used to easily define SFTs and cellular automata, and analyze their basic properties. We also showcase how to verify or rediscover some results from coding theory and cellular automata theory.

Get Back Here: Robust Imitation by Return-to-Distribution Planning

  • Authors: Geoffrey Cideron, Baruch Tabanpour, Sebastian Curi, Sertan Girgin, Leonard Hussenot, Gabriel Dulac-Arnold, Matthieu Geist, Olivier Pietquin, Robert Dadashi
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.01400
  • Pdf link: https://arxiv.org/pdf/2305.01400
  • Abstract
    We consider the Imitation Learning (IL) setup where expert data are collected not in the actual deployment environment but in a different version of it. To address the resulting distribution shift, we combine behavior cloning (BC) with a planner that is tasked to bring the agent back to states visited by the expert whenever the agent deviates from the demonstration distribution. The resulting algorithm, POIR, can be trained offline, and leverages online interactions to efficiently fine-tune its planner to improve performance over time. We test POIR on a variety of human-generated manipulation demonstrations in a realistic robotic manipulation simulator and show robustness of the learned policy to different initial state distributions and noisy dynamics.

Absolute integrability of Mercer kernels is only sufficient for RKHS stability

  • Authors: Mauro Bisiacco, Gianluigi Pillonetto
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.01411
  • Pdf link: https://arxiv.org/pdf/2305.01411
  • Abstract
    Reproducing kernel Hilbert spaces (RKHSs) are special Hilbert spaces in one-to-one correspondence with positive definite maps called kernels. They are widely employed in machine learning to reconstruct unknown functions from sparse and noisy data. In the last two decades, a subclass known as stable RKHSs has been also introduced in the setting of linear system identification. Stable RKHSs contain only absolutely integrable impulse responses over the positive real line. Hence, they can be adopted as hypothesis spaces to estimate linear, time-invariant and BIBO stable dynamic systems from input-output data. Necessary and sufficient conditions for RKHS stability are available in the literature and it is known that kernel absolute integrability implies stability. Working in discrete-time, in a recent work we have proved that this latter condition is only sufficient. Working in continuous-time, it is the purpose of this note to prove that the same result holds also for Mercer kernels.
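
For reference, the sufficient condition discussed above can be stated compactly; this is a hedged restatement of the abstract's claim, not a new result. If a Mercer kernel $k$ over the positive real line is absolutely integrable,

$$\int_0^{\infty}\!\int_0^{\infty} |k(t,s)|\, dt\, ds < \infty,$$

then the induced RKHS is stable, i.e., it contains only absolutely integrable impulse responses. The note shows that the converse implication fails in continuous time, as previously established in discrete time.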

Borinot: an agile torque-controlled robot for hybrid flying and contact loco-manipulation (workshop version)

  • Authors: Josep Marti-Saumell, Joan Sola, Angel Santamaria-Navarro, Hugo Duarte
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.01423
  • Pdf link: https://arxiv.org/pdf/2305.01423
  • Abstract
    This paper introduces Borinot, an open-source flying robotic platform designed to perform hybrid agile locomotion and manipulation. This platform features a compact and powerful hexarotor that can be outfitted with torque-actuated extremities of diverse architecture, allowing for whole-body dynamic control. As a result, Borinot can perform agile tasks such as aggressive or acrobatic maneuvers with the participation of the whole-body dynamics. The extremities attached to Borinot can be utilized in various ways; during contact, they can be used as legs to create contact-based locomotion, or as arms to manipulate objects. In free flight, they can be used as tails to contribute to the dynamics, mimicking the movements of many animals. This allows for any hybridization of these dynamic modes, like the jump-flight of chickens and locusts, making Borinot an ideal open-source platform for research on hybrid aerial-contact agile motion. To demonstrate the key capabilities of Borinot, we have fitted a planar 2-DoF arm and implemented whole-body torque-level model predictive control. The result is a capable and adaptable platform that, we believe, opens up new avenues of research in the field of agile robotics.

Mixed-Integer Optimal Control via Reinforcement Learning: A Case Study on Hybrid Vehicle Energy Management

  • Authors: Jinming Xu, Yuan Lin
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.01461
  • Pdf link: https://arxiv.org/pdf/2305.01461
  • Abstract
    Many optimal control problems require the simultaneous output of continuous and discrete control variables. Such problems are usually formulated as mixed-integer optimal control (MIOC) problems, which are challenging to solve due to the complexity of the solution space. Numerical methods such as branch-and-bound are computationally expensive and unsuitable for real-time control. This paper proposes a novel continuous-discrete reinforcement learning (CDRL) algorithm, twin delayed deep deterministic actor-Q (TD3AQ), for MIOC problems. TD3AQ combines the advantages of both actor-critic and Q-learning methods, and can handle the continuous and discrete action spaces simultaneously. The proposed algorithm is evaluated on a hybrid electric vehicle (HEV) energy management problem, where real-time control of the continuous variable engine torque and discrete variable gear ratio is essential to maximize fuel economy while satisfying driving constraints. Simulation results on different drive cycles show that TD3AQ can achieve near-optimal solutions compared to dynamic programming (DP) and outperforms the state-of-the-art discrete RL algorithm Rainbow, which is adopted for MIOC by discretizing continuous actions into a finite set of discrete values.
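
A hedged sketch of one way to couple a deterministic actor with Q-learning for hybrid actions, in the spirit of (but not identical to) TD3AQ: the actor proposes the continuous action (e.g., engine torque) and a Q-head scores every discrete option (e.g., gear) given the state and that continuous action. Network sizes and the coupling are illustrative assumptions.

```python
# Hedged sketch: hybrid continuous-discrete action selection.
import torch
import torch.nn as nn

class HybridPolicy(nn.Module):
    def __init__(self, state_dim, n_discrete):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                   nn.Linear(64, 1), nn.Tanh())    # torque in [-1, 1]
        self.q = nn.Sequential(nn.Linear(state_dim + 1, 64), nn.ReLU(),
                               nn.Linear(64, n_discrete))          # one Q per gear

    def act(self, state):
        torque = self.actor(state)                        # continuous action
        q_values = self.q(torch.cat([state, torque], dim=-1))
        gear = q_values.argmax(dim=-1)                    # greedy discrete action
        return torque, gear

policy = HybridPolicy(state_dim=8, n_discrete=6)
torque, gear = policy.act(torch.randn(1, 8))
```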

H2 optimal model reduction on general domains

  • Authors: Alessandro Borghi, Tobias Breiten
  • Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2305.01511
  • Pdf link: https://arxiv.org/pdf/2305.01511
  • Abstract
    Optimal model reduction for large-scale linear dynamical systems is studied. In contrast to most existing works, the systems under consideration are not required to be stable, neither in discrete nor in continuous time. As a consequence, the underlying rational transfer functions are allowed to have poles in general domains in the complex plane. In particular, this covers the case of specific conservative partial differential equations such as the linear Schrödinger and the undamped linear wave equation with spectra on the imaginary axis. By an appropriate modification of the classical continuous-time Hardy space $\mathcal{H}_2$, a new $\mathcal{H}_2$-like optimal model reduction problem is introduced and first-order optimality conditions are derived. As in the classical $\mathcal{H}_2$ case, these conditions exhibit a rational Hermite interpolation structure for which an iterative model reduction algorithm is proposed. Numerical examples demonstrate the effectiveness of the new method.

Curriculum Modeling the Dependence among Targets with Multi-task Learning for Financial Marketing

  • Authors: Yunpeng Weng, Xing Tang, Liang Chen, Xiuqiang He
  • Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.01514
  • Pdf link: https://arxiv.org/pdf/2305.01514
  • Abstract
    Multi-task learning for various real-world applications usually involves tasks with logical sequential dependence. For example, in online marketing, the cascade behavior pattern of $impression \rightarrow click \rightarrow conversion$ is usually modeled as multiple tasks in a multi-task manner, where the sequential dependence between tasks is simply connected with an explicitly defined function or implicitly transferred information in current works. These methods alleviate the data sparsity problem for long-path sequential tasks, as the positive feedback becomes sparser along the task sequence. However, error accumulation and negative transfer can be a severe problem for downstream tasks. Especially at the beginning stage of training, the optimization of the parameters of former tasks has not yet converged, and thus the information transferred to downstream tasks is negative. In this paper, we propose a prior information merged model (\textbf{PIMM}), which explicitly models the logical dependence among tasks with a novel prior information merged (\textbf{PIM}) module for multiple sequential dependence task learning in a curriculum manner. Specifically, the PIM randomly selects the true label information or the prior task prediction with a soft sampling strategy to transfer to the downstream task during training. Following an easy-to-difficult curriculum paradigm, we dynamically adjust the sampling probability to ensure that the downstream task receives effective information throughout training. The offline experimental results on both public and product datasets verify that PIMM outperforms state-of-the-art baselines. Moreover, we deploy PIMM on a large-scale FinTech platform, and the online experiments also demonstrate its effectiveness.
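
A hedged sketch of the soft sampling strategy as described: pass the true label of the prior task with probability $p$ and the prior task's prediction otherwise, annealing $p$ from 1 toward 0 in an easy-to-difficult curriculum. The linear schedule is a placeholder, not the paper's exact one.

```python
# Hedged sketch: curriculum soft sampling between label and prediction.
import torch

def merged_prior(label, prediction, step, total_steps):
    """label, prediction: float tensors of the same shape."""
    p = max(0.0, 1.0 - step / total_steps)        # annealed teacher-forcing prob.
    use_label = (torch.rand_like(label) < p).float()
    return use_label * label + (1.0 - use_label) * prediction
```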

Unlocking the Power of Representations in Long-term Novelty-based Exploration

  • Authors: Alaa Saade, Steven Kapturowski, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra, Oliver Groth, Michal Valko, Bilal Piot
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2305.01521
  • Pdf link: https://arxiv.org/pdf/2305.01521
  • Abstract
    We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space. By adapting classical clustering to the nonstationary setting of Deep RL, RECODE can efficiently track state visitation counts over thousands of episodes. We further propose a novel generalization of the inverse dynamics loss, which leverages masked transformer architectures for multi-step prediction; in conjunction with RECODE, this achieves a new state-of-the-art in a suite of challenging 3D-exploration tasks in DM-Hard-8. RECODE also sets a new state-of-the-art in hard-exploration Atari games, and is the first agent to reach the end screen in "Pitfall!".

FlexEdge: Digital Twin-Enabled Task Offloading for UAV-Aided Vehicular Edge Computing

  • Authors: Bin Li, Wancheng Xie, Yinghui Ye, Lei Liu, Zesong Fei
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2305.01536
  • Pdf link: https://arxiv.org/pdf/2305.01536
  • Abstract
    Integrating unmanned aerial vehicles (UAVs) into vehicular networks has shown high potential in affording intensive computing tasks. In this paper, we study digital twin driven vehicular edge computing networks with adaptive computing resource management, where a UAV named FlexEdge acts as a flying server. In particular, we first formulate an energy consumption minimization problem by jointly optimizing the UAV trajectory and computation resources under practical constraints. To address this challenging problem, we then model the computation offloading process as a Markov decision process and propose a deep reinforcement learning-based proximal policy optimization algorithm to dynamically learn the computation offloading strategy and trajectory design policy. Numerical results indicate that our proposed algorithm achieves a fast convergence rate and significantly reduces the system energy consumption.

Teaching data-driven control: from linear design to adaptive control with throttle valves

  • Authors: Emmanuel Witrant, Ioan Doré Landau, Marie-Pierre Vaillant
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.01567
  • Pdf link: https://arxiv.org/pdf/2305.01567
  • Abstract
    Electric throttle valves represent a challenge for control design, as their dynamics involve strong nonlinearities, characterized by an asymmetric hysteresis. Carrying out experiments on multiple valves, a large variability in the characteristics of each valve and erratic steady-state behaviors can also be noticed, impairing classical model-based control strategies. Nevertheless, local data-driven linear models can be obtained, and simple proportional-integral (PI) controllers, tuned individually for each valve with the appropriate data set, provide good tracking performance. As these controllers cannot be transposed from one valve to another, a robust strategy and an adaptive controller (using identification in closed loop and controller re-design) may be necessary to propose a general method. This work aims at promoting control education on a simple yet challenging process, going from frequency analysis and linear design to an adaptive control method implemented with an online recursive algorithm.
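
For readers following the pedagogical thread, here is a minimal discrete-time PI controller with simple anti-windup of the kind that would be tuned from the local data-driven linear models; the gains and saturation limits are placeholders to be identified per valve, not values from the paper.

```python
# Hedged sketch: discrete-time PI controller with clamping anti-windup.
class PI:
    def __init__(self, kp, ki, dt, u_min=0.0, u_max=1.0):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.u_min, self.u_max = u_min, u_max   # actuator saturation limits
        self.integral = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        u = self.kp * error + self.ki * self.integral
        # Anti-windup: clamp the output and undo integration when saturated.
        if u > self.u_max or u < self.u_min:
            self.integral -= error * self.dt
            u = min(max(u, self.u_min), self.u_max)
        return u
```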

H2CGL: Modeling Dynamics of Citation Network for Impact Prediction

  • Authors: Guoxiu He, Zhikai Xue, Zhuoren Jiang, Yangyang Kang, Star Zhao, Wei Lu
  • Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.01572
  • Pdf link: https://arxiv.org/pdf/2305.01572
  • Abstract
    The potential impact of a paper is often quantified by how many citations it will receive. However, most commonly used models may underestimate the influence of newly published papers over time, and fail to encapsulate the dynamics of the citation network in the graph. In this study, we construct hierarchical and heterogeneous graphs for target papers from an annual perspective. The constructed graphs can record the annual dynamics of target papers' scientific context information. Then, a novel graph neural network, the Hierarchical and Heterogeneous Contrastive Graph Learning Model (H2CGL), is proposed to incorporate the heterogeneity and dynamics of the citation network. H2CGL separately aggregates the heterogeneous information for each year and prioritizes the highly-cited papers and relationships among references, citations, and the target paper. It then employs a weighted GIN to capture dynamics between heterogeneous subgraphs over years. Moreover, it leverages contrastive learning to make the graph representations more sensitive to potential citations. In particular, co-cited or co-citing papers of the target paper with a large citation gap are taken as hard negative samples, while randomly dropping low-cited papers generates positive samples. Extensive experimental results on two scholarly datasets demonstrate that the proposed H2CGL significantly outperforms a series of baseline approaches for both previously and freshly published papers. Additional analyses highlight the significance of the proposed modules. Our code and settings have been released on GitHub (https://github.com/ECNU-Text-Computing/H2CGL)

Finding Neurons in a Haystack: Case Studies with Sparse Probing

  • Authors: Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.01610
  • Pdf link: https://arxiv.org/pdf/2305.01610
  • Abstract
    Despite rapid adoption and deployment of large language models (LLMs), the internal computations of these models remain opaque and poorly understood. In this work, we seek to understand how high-level human-interpretable features are represented within the internal neuron activations of LLMs. We train $k$-sparse linear classifiers (probes) on these internal activations to predict the presence of features in the input; by varying the value of $k$ we study the sparsity of learned representations and how this varies with model scale. With $k=1$, we localize individual neurons which are highly relevant for a particular feature, and perform a number of case studies to illustrate general properties of LLMs. In particular, we show that early layers make use of sparse combinations of neurons to represent many features in superposition, that middle layers have seemingly dedicated neurons to represent higher-level contextual features, and that increasing scale causes representational sparsity to increase on average, but there are multiple types of scaling dynamics. In all, we probe for over 100 unique features comprising 10 different categories in 7 different models spanning 70 million to 6.9 billion parameters.
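
As a rough illustration of what a $k$-sparse probe looks like, the sketch below ranks neurons with a simple univariate score and fits a linear classifier on the top $k$; the authors' actual selection procedure may differ, and all names here are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# A hedged sketch of k-sparse probing: score each neuron, keep the top k,
# and fit a linear probe restricted to those neurons.
def k_sparse_probe(acts, labels, k=1):
    # acts: (n_samples, n_neurons) array; labels: (n_samples,) array in {0, 1}
    score = np.abs(acts[labels == 1].mean(0) - acts[labels == 0].mean(0))
    top_k = np.argsort(score)[-k:]                 # k most relevant neurons
    probe = LogisticRegression(max_iter=1000).fit(acts[:, top_k], labels)
    return top_k, probe
```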

AutoColor: Learned Light Power Control for Multi-Color Holograms

  • Authors: Yicheng Zhan, Koray Kavaklı, Hakan Urey, Qi Sun, Kaan Akşit
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2305.01611
  • Pdf link: https://arxiv.org/pdf/2305.01611
  • Abstract
    Multi-color holograms rely on simultaneous illumination from multiple light sources. These multi-color holograms could utilize light sources better than conventional single-color holograms and can improve the dynamic range of holographic displays. In this letter, we introduce AutoColor, the first learned method for estimating the optimal light source powers required for illuminating multi-color holograms. For this purpose, we establish the first multi-color hologram dataset using synthetic images and their depth information. We generate these synthetic images using a trending pipeline combining generative, large language, and monocular depth estimation models. Finally, we train our learned model using our dataset and experimentally demonstrate that AutoColor significantly decreases the number of steps required to optimize multi-color holograms from $>1000$ to $70$ iteration steps without compromising image quality.

Key-Locked Rank One Editing for Text-to-Image Personalization

  • Authors: Yoad Tewel, Rinon Gal, Gal Chechik, Yuval Atzmon
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2305.01644
  • Pdf link: https://arxiv.org/pdf/2305.01644
  • Abstract
    Text-to-image models (T2I) offer a new level of flexibility by allowing users to guide the creative process through natural language. However, personalizing these models to align with user-provided visual concepts remains a challenging problem. The task of T2I personalization poses multiple hard challenges, such as maintaining high visual fidelity while allowing creative control, combining multiple personalized concepts in a single image, and keeping a small model size. We present Perfusion, a T2I personalization method that addresses these challenges using dynamic rank-1 updates to the underlying T2I model. Perfusion avoids overfitting by introducing a new mechanism that "locks" new concepts' cross-attention Keys to their superordinate category. Additionally, we develop a gated rank-1 approach that enables us to control the influence of a learned concept during inference time and to combine multiple concepts. This allows runtime-efficient balancing of visual fidelity and textual alignment with a single 100KB trained model, which is five orders of magnitude smaller than the current state of the art. Moreover, it can span different operating points across the Pareto front without additional training. Finally, we show that Perfusion outperforms strong baselines in both qualitative and quantitative terms. Importantly, key-locking leads to novel results compared to traditional approaches, allowing it to portray personalized object interactions in unprecedented ways, even in one-shot settings.
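
The gist of a gated rank-1 edit can be sketched as below: the weight update writes a target output along a single (concept) input direction, scaled by a gate at inference time. This is only our reading of the mechanism for illustration; the variable names and gating form are assumptions.

```python
import torch

# A hedged sketch of a gated rank-1 weight edit: W is an (out, in) projection,
# k is the concept's input direction, v_star the desired output for k.
def rank1_edit(W, k, v_star, gate=1.0):
    k = k / k.norm()
    delta = torch.outer(v_star - W @ k, k)   # rank-1 residual along k
    return W + gate * delta                  # gate in [0, 1] scales influence
```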

New submissions for Fri, 21 Apr 23

Keyword: efficient

Evolving Constrained Reinforcement Learning Policy

  • Authors: Chengpeng Hu, Jiyuan Pei, Jialin Liu, Xin Yao
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09869
  • Pdf link: https://arxiv.org/pdf/2304.09869
  • Abstract
    Evolutionary algorithms have been used to evolve a population of actors to generate diverse experiences for training reinforcement learning agents, which helps to tackle the temporal credit assignment problem and improves the exploration efficiency. However, when adapting this approach to address constrained problems, balancing the trade-off between the reward and constraint violation is hard. In this paper, we propose a novel evolutionary constrained reinforcement learning (ECRL) algorithm, which adaptively balances the reward and constraint violation with stochastic ranking, and at the same time, restricts the policy's behaviour by maintaining a set of Lagrange relaxation coefficients with a constraint buffer. Extensive experiments on robotic control benchmarks show that our ECRL achieves outstanding performance compared to state-of-the-art algorithms. Ablation analysis shows the benefits of introducing stochastic ranking and constraint buffer.
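
For context, stochastic ranking (in the style of Runarsson and Yao) orders a population by reward or by constraint violation, with a coin flip governed by a probability `p_f`; the sketch below is a generic version, not the exact variant used in ECRL.

```python
import random

# A sketch of stochastic ranking: a bubble-sort where infeasible neighbours
# are compared by reward with probability p_f, otherwise by violation.
def stochastic_rank(reward, violation, p_f=0.45):
    idx = list(range(len(reward)))
    for _ in range(len(idx)):
        swapped = False
        for i in range(len(idx) - 1):
            a, b = idx[i], idx[i + 1]
            by_reward = (violation[a] == 0 and violation[b] == 0) \
                        or random.random() < p_f
            worse = reward[a] < reward[b] if by_reward \
                    else violation[a] > violation[b]
            if worse:
                idx[i], idx[i + 1] = b, a
                swapped = True
        if not swapped:
            break
    return idx  # indices from best to worst
```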

GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models

  • Authors: Li Zaitang, Pin-Yu Chen, Tsung-Yi Ho
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09875
  • Pdf link: https://arxiv.org/pdf/2304.09875
  • Abstract
    Current studies on adversarial robustness mainly focus on aggregating local robustness results from a set of data samples to evaluate and rank different models. However, the local statistics may not well represent the true global robustness of the underlying unknown data distribution. To address this challenge, this paper makes the first attempt to present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models. Formally, GREAT Score carries the physical meaning of a global statistic capturing a mean certified attack-proof perturbation level over all samples drawn from a generative model. For finite-sample evaluation, we also derive a probabilistic guarantee on the sample complexity and the difference between the sample mean and the true mean. GREAT Score has several advantages: (1) Robustness evaluations using GREAT Score are efficient and scalable to large models, by sparing the need to run adversarial attacks. In particular, we show high correlation and significantly reduced computation cost of GREAT Score when compared to the attack-based model ranking on RobustBench (Croce et al., 2021). (2) The use of generative models facilitates the approximation of the unknown data distribution. In our ablation study with different generative adversarial networks (GANs), we observe consistency between global robustness evaluation and the quality of GANs. (3) GREAT Score can be used for remote auditing of privacy-sensitive black-box models, as demonstrated by our robustness evaluation on several online facial recognition services.

The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages

  • Authors: Vesa Akerman, David Baines, Damien Daspit, Ulf Hermjakob, Taeho Jang, Colin Leong, Michael Martin, Joel Mathew, Jonathan Robie, Marcus Schwarting
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09919
  • Pdf link: https://arxiv.org/pdf/2304.09919
  • Abstract
    Efficiently and accurately translating a corpus into a low-resource language remains a challenge, regardless of the strategies employed, whether manual, automated, or a combination of the two. Many Christian organizations are dedicated to the task of translating the Holy Bible into languages that lack a modern translation. Bible translation (BT) work is currently underway for over 3000 extremely low-resource languages. We introduce the eBible corpus: a dataset containing 1009 translations of portions of the Bible with data in 833 different languages across 75 language families. In addition to a BT benchmarking dataset, we introduce model performance benchmarks built on the No Language Left Behind (NLLB) neural machine translation (NMT) models. Finally, we describe several problems specific to the domain of BT and consider how the established data and model benchmarks might be used for future translation efforts. For a BT task trained with NLLB, the Austronesian and Trans-New Guinea language families achieve 35.1 and 31.6 BLEU scores respectively, which spurs future innovation in NMT for low-resource languages in Papua New Guinea.

A robust and interpretable deep learning framework for multi-modal registration via keypoints

  • Authors: Alan Q. Wang, Evan M. Yu, Adrian V. Dalca, Mert R. Sabuncu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09941
  • Pdf link: https://arxiv.org/pdf/2304.09941
  • Abstract
    We present KeyMorph, a deep learning-based image registration framework that relies on automatically detecting corresponding keypoints. State-of-the-art deep learning methods for registration often are not robust to large misalignments, are not interpretable, and do not incorporate the symmetries of the problem. In addition, most models produce only a single prediction at test-time. Our core insight which addresses these shortcomings is that corresponding keypoints between images can be used to obtain the optimal transformation via a differentiable closed-form expression. We use this observation to drive the end-to-end learning of keypoints tailored for the registration task, and without knowledge of ground-truth keypoints. This framework not only leads to substantially more robust registration but also yields better interpretability, since the keypoints reveal which parts of the image are driving the final alignment. Moreover, KeyMorph can be designed to be equivariant under image translations and/or symmetric with respect to the input image ordering. Finally, we show how multiple deformation fields can be computed efficiently and in closed-form at test time corresponding to different transformation variants. We demonstrate the proposed framework in solving 3D affine and spline-based registration of multi-modal brain MRI scans. In particular, we show registration accuracy that surpasses current state-of-the-art methods, especially in the context of large displacements. Our code is available at https://github.com/evanmy/keymorph.
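
The closed-form step the abstract refers to is, for the affine case, an ordinary least-squares fit between matched keypoints; a sketch follows, with the keypoints assumed given rather than learned.

```python
import numpy as np

# A hedged sketch: solve for the affine transform aligning moving keypoints P
# to fixed keypoints Q in least squares; KeyMorph learns the keypoints
# end-to-end, which this sketch does not cover.
def affine_from_keypoints(P, Q):
    # P, Q: (n, 3) corresponding keypoints
    P_h = np.hstack([P, np.ones((len(P), 1))])     # homogeneous coordinates
    A, *_ = np.linalg.lstsq(P_h, Q, rcond=None)    # (4, 3) affine parameters
    return A.T                                     # x' = A.T @ [x, 1]
```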

Baugh-Wooley Multiplication for the RISCV Processor

  • Authors: Franc Grootjen, Nikolai Schauer
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.09952
  • Pdf link: https://arxiv.org/pdf/2304.09952
  • Abstract
    This article describes an efficient way to implement the multiplication instructions for a RISCV processor. Instead of using three predefined IP blocks for signed, unsigned, and mixed multiplication, this article presents a novel extension to the Baugh-Wooley multiplication algorithm that reduces area and power consumption by roughly a factor of three.
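
As background, Baugh-Wooley multiplication handles two's-complement operands by inverting the sign-bit partial products and adding fixed correction constants, so a single array serves signed inputs. The bit-level reference model below is our own illustration of the classic scheme, not the authors' hardware design.

```python
# A bit-level sketch of n-bit Baugh-Wooley two's-complement multiplication,
# returning the exact 2n-bit signed product.
def baugh_wooley(a, b, n=8):
    abits = [(a >> i) & 1 for i in range(n)]
    bbits = [(b >> i) & 1 for i in range(n)]
    acc = (abits[n - 1] & bbits[n - 1]) << (2 * n - 2)
    for i in range(n - 1):
        for j in range(n - 1):
            acc += (abits[i] & bbits[j]) << (i + j)            # unsigned core
    for j in range(n - 1):
        acc += (1 - (abits[n - 1] & bbits[j])) << (n - 1 + j)  # inverted row
        acc += (1 - (abits[j] & bbits[n - 1])) << (n - 1 + j)  # inverted col
    acc += (1 << n) + (1 << (2 * n - 1))                # correction constants
    acc &= (1 << (2 * n)) - 1                           # keep 2n bits
    return acc - (1 << (2 * n)) if acc >> (2 * n - 1) else acc

assert baugh_wooley(-3, 5) == -15 and baugh_wooley(-8, -8) == 64
```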

MasakhaNEWS: News Topic Classification for African languages

  • Authors: David Ifeoluwa Adelani, Marek Masiak, Israel Abebe Azime, Jesujoba Oluwadara Alabi, Atnafu Lambebo Tonja, Christine Mwase, Odunayo Ogundepo, Bonaventure F. P. Dossou, Akintunde Oladipo, Doreen Nixdorf, Chris Chinenye Emezue, Sana Sabah al-azzawi, Blessing K. Sibanda, Davis David, Lolwethu Ndolela, Jonathan Mukiibi, Tunde Oluwaseyi Ajayi, Tatiana Moteu Ngoli, Brian Odhiambo, Abraham Toluwase Owodunni, Nnaemeka C. Obiefuna, Shamsuddeen Hassan Muhammad, Saheed Salahudeen Abdullahi, Mesay Gemeda Yigezu, Tajuddeen Gwadabe, Idris Abdulmumin, Mahlet Taye Bame, Oluwabusayo Olufunke Awoyomi, Iyanuoluwa Shode, Tolulope Anu Adelani, Habiba Abdulganiy Kailani, Abdul-Hakeem Omotayo, Adetola Adeeko, Afolabi Abeeb, Anuoluwapo Aremu, Olanrewaju Samuel, Clemencia Siro, Wangari Kimotho, Onyekachi Raphael Ogbu, et al. (23 additional authors not shown)
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.09972
  • Pdf link: https://arxiv.org/pdf/2304.09972
  • Abstract
    African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS -- a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited for zero-shot and few-shot learning such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as few as 10 examples per label, we achieve more than 90% (i.e. 86.0 F1 points) of the performance of full supervised training (92.6 F1 points) leveraging the PET approach.

Equilibrium-Invariant Embedding, Metric Space, and Fundamental Set of $2\times2$ Normal-Form Games

  • Authors: Luke Marris, Ian Gemp, Georgios Piliouras
  • Subjects: Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Theoretical Economics (econ.TH); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.09978
  • Pdf link: https://arxiv.org/pdf/2304.09978
  • Abstract
    Equilibrium solution concepts of normal-form games, such as Nash equilibria, correlated equilibria, and coarse correlated equilibria, describe the joint strategy profiles from which no player has incentive to unilaterally deviate. They are widely studied in game theory, economics, and multiagent systems. Equilibrium concepts are invariant under certain transforms of the payoffs. We define an equilibrium-inspired distance metric for the space of all normal-form games and uncover a distance-preserving equilibrium-invariant embedding. Furthermore, we propose an additional transform which defines a better-response-invariant distance metric and embedding. To demonstrate these metric spaces we study $2\times2$ games. The equilibrium-invariant embedding of $2\times2$ games has an efficient two variable parameterization (a reduction from eight), where each variable geometrically describes an angle on a unit circle. Interesting properties can be spatially inferred from the embedding, including: equilibrium support, cycles, competition, coordination, distances, best-responses, and symmetries. The best-response-invariant embedding of $2\times2$ games, after considering symmetries, rediscovers a set of 15 games, and their respective equivalence classes. We propose that this set of game classes is fundamental and captures all possible interesting strategic interactions in $2\times2$ games. We introduce a directed graph representation and name for each class. Finally, we leverage the tools developed for $2\times2$ games to develop game theoretic visualizations of large normal-form and extensive-form games that aim to fingerprint the strategic interactions that occur within.

Tetra-NeRF: Representing Neural Radiance Fields Using Tetrahedra

  • Authors: Jonas Kulhanek, Torsten Sattler
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09987
  • Pdf link: https://arxiv.org/pdf/2304.09987
  • Abstract
    Neural Radiance Fields (NeRFs) are a very recent and very popular approach for the problems of novel view synthesis and 3D reconstruction. A popular scene representation used by NeRFs is to combine a uniform, voxel-based subdivision of the scene with an MLP. Based on the observation that a (sparse) point cloud of the scene is often available, this paper proposes to use an adaptive representation based on tetrahedra and a Delaunay representation instead of the uniform subdivision or point-based representations. We show that such a representation enables efficient training and leads to state-of-the-art results. Our approach elegantly combines concepts from 3D geometry processing, triangle-based rendering, and modern neural radiance fields. Compared to voxel-based representations, ours provides more detail around parts of the scene likely to be close to the surface. Compared to point-based representations, our approach achieves better performance.
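
The core lookup in a tetrahedral field representation is barycentric interpolation of per-vertex features; a minimal sketch is given below, with the tetrahedron assumed to already contain the query point (Tetra-NeRF obtains the cells from a Delaunay triangulation of the point cloud).

```python
import numpy as np

# A hedged sketch: interpolate per-vertex features at point p inside a
# tetrahedron via barycentric coordinates.
def barycentric_features(p, verts, feats):
    # verts: (4, 3) tetrahedron corners; feats: (4, d) per-vertex features
    T = np.column_stack([verts[i] - verts[3] for i in range(3)])
    w = np.linalg.solve(T, p - verts[3])      # first three barycentric coords
    bary = np.append(w, 1.0 - w.sum())        # weights sum to one
    return bary @ feats                       # (d,) interpolated feature
```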

AI-coherent data-driven forecasting model for a combined cycle power plant

  • Authors: Mir Sayed Shah Danish, Zahra Nazari, Tomonobu Senjyu
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10009
  • Pdf link: https://arxiv.org/pdf/2304.10009
  • Abstract
    This study investigates the transformation of energy models to align with machine learning requirements as a promising tool for optimizing the operation of combined cycle power plants (CCPPs). By modeling energy production as a function of environmental and control variables, this methodology offers an innovative way to achieve energy-efficient power generation in the context of data-driven applications. This study focuses on developing a thorough AI-coherent modeling approach for CCPP optimization, adopting an interdisciplinary perspective to arrive at a comprehensive, insightful analysis. The proposed numerical model, using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, enhances efficiency by simulating various operating scenarios and adjusting optimal parameters, increasing power generation by 2.23%, from 452 MW to 462.1 MW, through optimization of the environmental factors. This study deals with data-driven modeling based on historical data to make predictions without prior knowledge of the system's parameters, demonstrating several merits: identifying patterns that can be difficult for human analysts to detect, high accuracy when trained on large datasets, and the potential to improve over time with new data. The proposed modeling approach and methodology can be expanded as a valuable tool for forecasting and decision-making in complex energy systems.
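
The optimization step described above can be reproduced in miniature with SciPy's BFGS routine; the quadratic surrogate below is a stand-in for the paper's fitted data-driven model, and all coefficients are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# A hedged sketch: maximize a surrogate of plant power output over inputs
# x = [ambient_temp, pressure, humidity] using BFGS (via negation).
def neg_power(x):
    return -(460.0 - 0.5 * (x[0] - 15.0) ** 2
             - 0.1 * (x[2] - 60.0) ** 2 + 0.02 * x[1])

res = minimize(neg_power, x0=np.array([25.0, 1010.0, 70.0]), method="BFGS")
print("inputs:", res.x, "predicted power (MW):", -res.fun)
```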

Dynablox: Real-time Detection of Diverse Dynamic Objects in Complex Environments

  • Authors: Lukas Schmid, Olov Andersson, Aurelio Sulser, Patrick Pfreundschuh, Roland Siegwart
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.10049
  • Pdf link: https://arxiv.org/pdf/2304.10049
  • Abstract
    Real-time detection of moving objects is an essential capability for robots acting autonomously in dynamic environments. We thus propose Dynablox, a novel online mapping-based approach for robust moving object detection in complex environments. The central idea of our approach is to incrementally estimate high confidence free-space areas by modeling and accounting for sensing, state estimation, and mapping limitations during online robot operation. The spatio-temporally conservative free space estimate enables robust detection of moving objects without making any assumptions on the appearance of objects or environments. This allows deployment in complex scenes such as multi-storied buildings or staircases, and for diverse moving objects such as people carrying various items, doors swinging or even balls rolling around. We thoroughly evaluate our approach on real-world data sets, achieving 86% IoU at 17 FPS in typical robotic settings. The method outperforms a recent appearance-based classifier and approaches the performance of offline methods. We demonstrate its generality on a novel data set with rare moving objects in complex environments. We make our efficient implementation and the novel data set available as open-source.

Maximize the Long-term Average Revenue of Network Slice Provider via Admission Control Among Heterogeneous Slices

  • Authors: Miao Dai, Gang Sun, Hongfang Yu, Dusit Niyato
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.10057
  • Pdf link: https://arxiv.org/pdf/2304.10057
  • Abstract
    Network slicing endows 5G/B5G with differentiated and customized capabilities to cope with the proliferation of diversified services, whereas limited physical network resources may not be able to support all service requests. Slice admission control is regarded as an essential means to ensure service quality and service isolation when the network is under burden. Herein, a scenario where rational tenants coexist with partially competitive network slice providers is adopted. We aim to maximize the long-term average revenue of the network operators through slice admission control, with the feasibility of multidimensional resource requirements, the priority differences among heterogeneous slices, and the admission fairness within each slice taken into account concurrently. We prove the intractability of our problem by a reduction from the Multidimensional Knapsack Problem (MKP), and propose a two-stage algorithm called MPSAC to compute a sub-optimal solution efficiently. The principle of MPSAC is to split the original problem into two sub-problems, inter-slice decision-making and intra-slice quota allocation, which are solved using a heuristic method and a tailored auction mechanism respectively. Extensive simulations are carried out to demonstrate the efficacy of our algorithm; the results show that our long-term average revenue is at least 9.6% higher than that of the comparison schemes, while maintaining better priority relations and achieving improved fairness.

High-Performance and Flexible Parallel Algorithms for Semisort and Related Problems

  • Authors: Xiaojun Dong, Yunshu Wu, Zhongqi Wang, Laxman Dhulipala, Yan Gu, Yihan Sun
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.10078
  • Pdf link: https://arxiv.org/pdf/2304.10078
  • Abstract
    Semisort is a fundamental algorithmic primitive widely used in the design and analysis of efficient parallel algorithms. It takes as input an array of records and a function extracting a \emph{key} per record, and reorders the records so that those with equal keys are contiguous. Since many applications only require collecting equal values, but not fully sorting the input, semisort is broadly applicable, e.g., in string algorithms, graph analytics, and geometry processing, among many other domains. However, despite dozens of recent papers that use semisort in their theoretical analysis and the existence of an asymptotically optimal parallel semisort algorithm, most implementations of these parallel algorithms choose to implement semisort by using comparison or integer sorting in practice, due to potential performance issues in existing semisort implementations. In this paper, we revisit the semisort problem, with the goal of achieving a high-performance parallel semisort implementation with a flexible interface. Our approach easily extends to two related problems, \emph{histogram} and \emph{collect-reduce}. Our algorithms achieve strong speedups in practice, and importantly, outperform state-of-the-art parallel sorting and semisorting methods for almost all settings we tested, with varying input sizes, distributions, and key types. We also test two important applications with real-world data, and show that our algorithms improve the performance over existing approaches. We believe that many other parallel algorithm implementations can be accelerated using our results.
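
As a reminder of the primitive itself, a sequential semisort only has to make equal keys adjacent, with no order imposed across key groups; the few lines below illustrate that contract (the paper's contribution is a fast parallel algorithm, which this is not).

```python
from collections import defaultdict

# A minimal sequential semisort: bucket records by key, then concatenate.
def semisort(records, key):
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r)
    return [r for bucket in groups.values() for r in bucket]

print(semisort([("b", 1), ("a", 2), ("b", 3)], key=lambda r: r[0]))
```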

Transmit Power Minimization for STAR-RIS Empowered Symbiotic Radio Communications

  • Authors: Chao Zhou, Bin Lyu, Youhong Feng, Dinh Thai Hoang
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.10095
  • Pdf link: https://arxiv.org/pdf/2304.10095
  • Abstract
    In this paper, we propose a simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) empowered transmission scheme for symbiotic radio (SR) systems to provide more flexibility for network deployment and enhance system performance. The STAR-RIS is utilized not only to beam the primary signals from the base station (BS) towards multiple primary users on the same side of the STAR-RIS, but also to achieve the secondary transmission to the secondary users on the other side. We consider both the broadcasting signal model and the unicasting signal model at the BS. For each model, we aim to minimize the transmit power of the BS by designing the active beamforming and the simultaneous reflection and transmission coefficients under the practical phase correlation constraint. To address the challenge of solving the formulated problem, we propose a block coordinate descent based algorithm with the semidefinite relaxation, penalty dual decomposition and successive convex approximation methods, which decomposes the original problem into one sub-problem about active beamforming and another sub-problem about simultaneous reflection and transmission coefficients, and iteratively solves them until convergence is achieved. Numerical results indicate that the proposed scheme can reduce transmit power by up to 150.6% compared to the backscattering-device-enabled scheme.

Two-Memory Reinforcement Learning

  • Authors: Zhao Yang, Thomas. M. Moerland, Mike Preuss, Aske Plaat
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10098
  • Pdf link: https://arxiv.org/pdf/2304.10098
  • Abstract
    While deep reinforcement learning has shown important empirical success, it tends to learn relatively slowly due to the slow propagation of reward information and slow updates of parametric neural networks. Non-parametric episodic memory, on the other hand, provides a faster-learning alternative that does not require representation learning and uses the maximum episodic return as state-action values for action selection. Episodic memory and reinforcement learning both have their own strengths and weaknesses. Notably, humans can leverage multiple memory systems concurrently during learning and benefit from all of them. In this work, we propose a method called the Two-Memory reinforcement learning agent (2M) that combines episodic memory and reinforcement learning to distill both of their strengths. The 2M agent exploits the speed of the episodic memory part and the optimality and generalization capacity of the reinforcement learning part so that the two complement each other. Our experiments demonstrate that the 2M agent is more data efficient and outperforms both pure episodic memory and pure reinforcement learning, as well as a state-of-the-art memory-augmented RL agent. Moreover, the proposed approach provides a general framework that can be used to combine any episodic memory agent with other off-policy reinforcement learning algorithms.
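
One simple way to combine the two value estimates, sketched below, is to act greedily on their elementwise maximum; this is a plausible reading for illustration only, and the 2M agent's actual combination rule may differ.

```python
import numpy as np

# A hedged sketch: combine parametric Q-values with an episodic value table
# and pick the greedy action over the combined estimate.
def select_action(state_key, q_values, episodic_table):
    mem = episodic_table.get(state_key, np.full_like(q_values, -np.inf))
    return int(np.argmax(np.maximum(q_values, mem)))
```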

Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One

  • Authors: Hongyuan Zhang, Yanan Zhu, Xuelong Li
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10126
  • Pdf link: https://arxiv.org/pdf/2304.10126
  • Abstract
    Graph neural networks (GNNs) suffer from severe inefficiency, mainly caused by the exponential growth of node dependency with the number of layers. This severely limits the application of stochastic optimization algorithms, so that training a GNN is usually time-consuming. To address this problem, we propose to decouple a multi-layer GNN into multiple simple modules for more efficient training, comprising classical forward training (FT) and a designed backward training (BT). Under the proposed framework, each module can be trained efficiently in FT by stochastic algorithms without distortion of graph information owing to its simplicity. To avoid the purely unidirectional information delivery of FT and to sufficiently train shallow modules together with the deeper ones, we develop a backward training mechanism that makes the former modules perceive the latter modules. The backward training introduces reversed information delivery into the decoupled modules in addition to the forward information delivery. To investigate how the decoupling and greedy training affect the representational capacity, we theoretically prove that the error produced by linear modules will not accumulate on unsupervised tasks in most cases. The theoretical and experimental results show that the proposed framework is highly efficient with reasonable performance.

Securing Semantic Communications with Physical-layer Semantic Encryption and Obfuscation

  • Authors: Qi Qin, Yankai Rong, Guoshun Nan, Shaokang Wu, Xuefei Zhang, Qimei Cui, Xiaofeng Tao
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.10147
  • Pdf link: https://arxiv.org/pdf/2304.10147
  • Abstract
    Deep learning based semantic communication (DLSC) systems have shown great potential for making wireless networks significantly more efficient by transmitting only the semantics of the data. However, the open nature of the wireless channel and the fragility of neural models leave DLSC systems extremely vulnerable to various attacks. Traditional wireless physical layer keys (PLK), which rely on reciprocal channel and randomness characteristics between two legitimate users, hold the promise of securing DLSC. The main challenge lies in generating secret keys in static environments, where the key rate is ultra-low or zero. Different from prior efforts that use relays or reconfigurable intelligent surfaces (RIS) to manipulate wireless channels, this paper proposes a novel physical layer semantic encryption scheme by exploring the randomness of bilingual evaluation understudy (BLEU) scores in the field of machine translation, and additionally presents a novel semantic obfuscation mechanism to provide further physical layer protection. Specifically, 1) we calculate the BLEU scores and corresponding weights of the DLSC system, then generate semantic keys (SKey) by feeding the weighted sum of the scores into a hash function. 2) Equipped with the SKey, our proposed subcarrier obfuscation further secures semantic communications with a dynamic dummy data insertion mechanism. Experiments show the effectiveness of our method, especially in static wireless environments.
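
Step 1) above amounts to hashing a weighted sum of BLEU scores into key material; a toy sketch follows, where the scores, weights, and the rounding used to make both ends agree are all placeholders.

```python
import hashlib
import struct

# A hedged sketch of SKey generation: hash the weighted sum of BLEU scores.
def semantic_key(bleu_scores, weights):
    s = round(sum(w * b for w, b in zip(weights, bleu_scores)), 6)
    return hashlib.sha256(struct.pack("<d", s)).digest()  # 256-bit key

print(semantic_key([0.41, 0.37, 0.52], [0.2, 0.3, 0.5]).hex())
```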

Is ChatGPT a Good Recommender? A Preliminary Study

  • Authors: Junling Liu, Chao Liu, Renjie Lv, Kang Zhou, Yan Zhang
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.10149
  • Pdf link: https://arxiv.org/pdf/2304.10149
  • Abstract
    Recommendation systems have witnessed significant advancements and have been widely used over the past decades. However, most traditional recommendation methods are task-specific and therefore lack efficient generalization ability. Recently, the emergence of ChatGPT has significantly advanced NLP tasks by enhancing the capabilities of conversational models. Nonetheless, the application of ChatGPT in the recommendation domain has not been thoroughly investigated. In this paper, we employ ChatGPT as a general-purpose recommendation model to explore its potential for transferring extensive linguistic and world knowledge acquired from large-scale corpora to recommendation scenarios. Specifically, we design a set of prompts and evaluate ChatGPT's performance on five recommendation scenarios. Unlike traditional recommendation methods, we do not fine-tune ChatGPT during the entire evaluation process, relying only on the prompts themselves to convert recommendation tasks into natural language tasks. Further, we explore the use of few-shot prompting to inject interaction information containing users' potential interests, to help ChatGPT better understand user needs. Comprehensive experimental results on the Amazon Beauty dataset show that ChatGPT achieves promising results in certain tasks and is capable of reaching the baseline level in others. We conduct human evaluations on two explainability-oriented tasks to more accurately evaluate the quality of content generated by different models. The human evaluations show that ChatGPT can truly understand the provided information and generate clearer and more reasonable results. We hope that our study can inspire researchers to further explore the potential of language models like ChatGPT to improve recommendation performance and contribute to the advancement of the recommendation systems field.
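
The evaluation setup boils down to recasting recommendation as a natural-language prompt; a toy template is sketched below, with the wording entirely ours rather than the paper's actual prompts.

```python
# A hedged sketch of a few-shot recommendation prompt; the template and item
# names are illustrative.
def build_prompt(history, candidates, examples=()):
    lines = list(examples)
    lines.append("A user recently interacted with: " + ", ".join(history) + ".")
    lines.append("Rank these candidate items for the user: "
                 + ", ".join(candidates) + ".")
    return "\n".join(lines)

print(build_prompt(["lipstick A", "mascara B"], ["serum C", "lipstick D"]))
```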

Automated Dynamic Bayesian Networks for Predicting Acute Kidney Injury Before Onset

  • Authors: David Gordon, Panayiotis Petousis, Anders O. Garlid, Keith Norris, Katherine Tuttle, Susanne B. Nicholas, Alex A.T. Bui (on behalf of CURE-CKD)
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10175
  • Pdf link: https://arxiv.org/pdf/2304.10175
  • Abstract
    Several algorithms for learning the structure of dynamic Bayesian networks (DBNs) require an a priori ordering of variables, which influences the determined graph topology. However, it is often unclear how to determine this order if feature importance is unknown, especially as an exhaustive search is usually impractical. In this paper, we introduce Ranking Approaches for Unknown Structures (RAUS), an automated framework to systematically inform variable ordering and learn networks end-to-end. RAUS leverages existing statistical methods (Cramer's V, the chi-squared test, and information gain) to compare variable orderings, the resulting generated network topologies, and DBN performance. RAUS enables end-users with limited DBN expertise to implement models via a command line interface. We evaluate RAUS on the task of predicting impending acute kidney injury (AKI) from inpatient clinical laboratory data. Longitudinal observations from 67,460 patients were collected from our electronic health record (EHR) and Kidney Disease Improving Global Outcomes (KDIGO) criteria were then applied to define AKI events. RAUS learns multiple DBNs simultaneously to predict a future AKI event at different time points (i.e., 24, 48, and 72 hours in advance of AKI). We also compared the results of the learned AKI prediction models and variable orderings to baseline techniques (logistic regression, random forests, and extreme gradient boosting). The DBNs generated by RAUS achieved 73-83% area under the receiver operating characteristic curve (AUCROC) within 24 hours before AKI, and 71-79% AUCROC within 48 hours before AKI of any stage in a 7-day observation window. Insights from this automated framework can help efficiently implement and interpret DBNs for clinical decision support. The source code for RAUS is available in GitHub at https://github.com/dgrdn08/RAUS .
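
Of the three ranking statistics, Cramer's V is the least standard; a small generic reference implementation is sketched below (not RAUS's exact code path).

```python
import numpy as np
from scipy.stats import chi2_contingency

# A hedged sketch of Cramer's V between two categorical variables x and y.
def cramers_v(x, y):
    table = np.array([[np.sum((x == a) & (y == b)) for b in np.unique(y)]
                      for a in np.unique(x)])
    chi2 = chi2_contingency(table, correction=False)[0]
    return np.sqrt(chi2 / (table.sum() * (min(table.shape) - 1)))
```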

Robust Deep Reinforcement Learning Scheduling via Weight Anchoring

  • Authors: Steffen Gracla, Edgar Beck, Carsten Bockelmann, Armin Dekorsy
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10176
  • Pdf link: https://arxiv.org/pdf/2304.10176
  • Abstract
    Questions remain on the robustness of data-driven learning methods when crossing the gap from simulation to reality. We utilize weight anchoring, a method known from continual learning, to cultivate and fixate desired behavior in Neural Networks. Weight anchoring may be used to find a solution to a learning problem that is nearby the solution of another learning problem. Thereby, learning can be carried out in optimal environments without neglecting or unlearning desired behavior. We demonstrate this approach on the example of learning mixed QoS-efficient discrete resource scheduling with infrequent priority messages. Results show that this method provides performance comparable to the state of the art of augmenting a simulation environment, alongside significantly increased robustness and steerability.
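
In its simplest form, weight anchoring is an L2 penalty pulling the adapted network toward an anchor solution; the sketch below shows that form, with the penalty weight `lam` as a placeholder.

```python
import torch

# A hedged sketch of weight anchoring: task loss plus an L2 pull toward
# anchor parameters from a previously learned solution.
def anchored_loss(task_loss, model, anchor_params, lam=1.0):
    penalty = sum(((p - a) ** 2).sum()
                  for p, a in zip(model.parameters(), anchor_params))
    return task_loss + lam * penalty
```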

Regularizing Second-Order Influences for Continual Learning

  • Authors: Zhicheng Sun, Yadong Mu, Gang Hua
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10177
  • Pdf link: https://arxiv.org/pdf/2304.10177
  • Abstract
    Continual learning aims to learn on non-stationary data streams without catastrophically forgetting previous knowledge. Prevalent replay-based methods address this challenge by rehearsing on a small buffer holding the seen data, for which a delicate sample selection strategy is required. However, existing selection schemes typically seek only to maximize the utility of the ongoing selection, overlooking the interference between successive rounds of selection. Motivated by this, we dissect the interaction of sequential selection steps within a framework built on influence functions. We manage to identify a new class of second-order influences that will gradually amplify incidental bias in the replay buffer and compromise the selection process. To regularize the second-order effects, a novel selection objective is proposed, which also has clear connections to two widely adopted criteria. Furthermore, we present an efficient implementation for optimizing the proposed criterion. Experiments on multiple continual learning benchmarks demonstrate the advantage of our approach over state-of-the-art methods. Code is available at https://github.com/feifeiobama/InfluenceCL.

Efficient Uncertainty Estimation in Spiking Neural Networks via MC-dropout

  • Authors: Tao Sun, Bojian Yin, Sander Bohte
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10191
  • Pdf link: https://arxiv.org/pdf/2304.10191
  • Abstract
    Spiking neural networks (SNNs) have gained attention as models of the sparse and event-driven communication of biological neurons, and as such have shown increasing promise for energy-efficient applications in neuromorphic hardware. As with classical artificial neural networks (ANNs), predictive uncertainties are important for decision making in high-stakes applications, such as autonomous vehicles, medical diagnosis, and high frequency trading. Yet, discussion of uncertainty estimation in SNNs is limited, and approaches for uncertainty estimation in ANNs are not directly applicable to SNNs. Here, we propose an efficient Monte Carlo (MC) dropout-based approach for uncertainty estimation in SNNs. Our approach exploits the time-step mechanism of SNNs to enable MC-dropout in a computationally efficient manner, without introducing significant overheads during training and inference, while demonstrating high accuracy and uncertainty quality.
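
For reference, generic MC-dropout keeps dropout stochastic at test time and averages several forward passes; the sketch below is that ANN baseline, not the paper's SNN-specific, time-step-based variant.

```python
import torch

# A hedged sketch of MC-dropout prediction: mean and variance over
# stochastic forward passes (dropout left active).
def mc_dropout_predict(model, x, n_samples=30):
    model.train()                      # keep dropout layers stochastic
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=-1)
                             for _ in range(n_samples)])
    return probs.mean(dim=0), probs.var(dim=0)   # prediction, uncertainty
```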

Selective and Collaborative Influence Function for Efficient Recommendation Unlearning

  • Authors: Yuyuan Li, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Biao Gong, Jun Wang
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.10199
  • Pdf link: https://arxiv.org/pdf/2304.10199
  • Abstract
    Recent regulations on the Right to be Forgotten have greatly influenced the way of running a recommender system, because users now have the right to withdraw their private data. Besides simply deleting the target data from the database, unlearning the associated data lineage, e.g., the learned personal features and preferences in the model, is also necessary for data withdrawal. Existing unlearning methods are mainly devised for generalized machine learning models in classification tasks. In this paper, we first identify two main disadvantages of directly applying existing unlearning methods in the context of recommendation, i.e., (i) unsatisfactory efficiency for large-scale recommendation models and (ii) destruction of collaboration across users and items. To tackle the above issues, we propose an extra-efficient recommendation unlearning method based on the Selective and Collaborative Influence Function (SCIF). Our proposed method can (i) avoid any kind of retraining, which is computationally prohibitive for large-scale systems, (ii) further enhance efficiency by selectively updating user embeddings and (iii) preserve the collaboration across the remaining users and items. Furthermore, in order to evaluate unlearning completeness, we define a Membership Inference Oracle (MIO), which can justify whether the unlearned data points were in the training set of the model, i.e., whether a data point was completely unlearned. Extensive experiments on two benchmark datasets demonstrate that our proposed method can not only greatly enhance unlearning efficiency, but also achieve adequate unlearning completeness. More importantly, our proposed method outperforms the state-of-the-art unlearning method on comprehensive recommendation metrics.

Spiking-Fer: Spiking Neural Network for Facial Expression Recognition With Event Cameras

  • Authors: Sami Barchid, Benjamin Allaert, Amel Aissaoui, José Mennesson, Chaabane Djéraba
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10211
  • Pdf link: https://arxiv.org/pdf/2304.10211
  • Abstract
    Facial Expression Recognition (FER) is an active research domain that has shown great progress recently, notably thanks to the use of large deep learning models. However, such approaches are particularly energy intensive, which makes their deployment difficult for edge devices. To address this issue, Spiking Neural Networks (SNNs) coupled with event cameras are a promising alternative, capable of processing sparse and asynchronous events with lower energy consumption. In this paper, we establish the first use of event cameras for FER, named "Event-based FER", and propose the first related benchmarks by converting popular video FER datasets to event streams. To deal with this new task, we propose "Spiking-FER", a deep convolutional SNN model, and compare it against a similar Artificial Neural Network (ANN). Experiments show that the proposed approach achieves comparable performance to the ANN architecture, while consuming less energy by orders of magnitude (up to 65.39x). In addition, an experimental study of various event-based data augmentation techniques is performed to provide insights into the efficient transformations specific to event-based FER.

Dynamic Security Region of Natural Gas Systems in Integrated Electricity-Gas Systems

  • Authors: Han Gao, Peiyao Zhao, Zhengshuo Li
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10215
  • Pdf link: https://arxiv.org/pdf/2304.10215
  • Abstract
    In an integrated electricity-gas system (IEGS), the tight coupling of power and natural gas systems is embodied by frequent changes in gas withdrawal from gas-fired units to provide regulation services for the power system to handle uncertainty, which may in turn endanger the secure operation of the natural gas system and ultimately affect the safety of the whole IEGS. Hence, it is necessary to accurately and efficiently evaluate the dynamic security region (DSR) of the natural gas system in the IEGS by considering the real-time dynamic characteristics of natural gas systems, which are not satisfactorily handled in state-of-the-art works. To bridge this gap, this paper first conceptionally verifies the necessity of the DSR and establishes its mathematical model. Then, a dimensionality reduction method is proposed for the efficient solution and visualization of the high-dimensional DSR evaluation model. A fast evaluation (FE) algorithm is developed to address the difficulties of the nonconvex dynamic constraints in the reduced DSR model. Finally, the necessity and notable advantages of the proposed DSR model and FE are verified based on small and relatively large test systems in comparison with common security region models and algorithms. To the best of our knowledge, this is the first paper that comprehensively presents models and efficient algorithms regarding the DSR of natural gas systems in an IEGS.

An Analysis of the Completion Time of the BB84 Protocol

  • Authors: Sounak Kar, Jean-Yves Le Boudec
  • Subjects: Performance (cs.PF); Quantum Physics (quant-ph)
  • Arxiv link: https://arxiv.org/abs/2304.10218
  • Pdf link: https://arxiv.org/pdf/2304.10218
  • Abstract
    The BB84 QKD protocol is based on the idea that the sender and the receiver can reconcile a certain fraction of the teleported qubits to detect eavesdropping or noise and decode the rest to use as a private key. Under the present hardware infrastructure, decoherence of quantum states poses a significant challenge to performing perfect or efficient teleportation, meaning that a teleportation-based protocol must be run multiple times to observe success. Thus, performance analyses of such protocols usually consider the completion time, i.e., the time until success, rather than the duration of a single attempt. Moreover, due to decoherence, the success of an attempt is in general dependent on the duration of individual phases of that attempt, as quantum states must wait in memory while the success or failure of a generation phase is communicated to the relevant parties. In this work, we do a performance analysis of the completion time of the BB84 protocol in a setting where the sender and the receiver are connected via a single quantum repeater and the only quantum channel between them does not see any adversarial attack. Assuming certain distributional forms for the generation and communication phases of teleportation, we provide a method to compute the MGF of the completion time and subsequently derive an estimate of the CDF and a bound on the tail probability. This result helps us gauge the (tail) behaviour of the completion time in terms of the parameters characterising the elementary phases of teleportation, without having to run the protocol multiple times. We also provide an efficient simulation scheme to generate the completion time, which relies on expressing the completion time in terms of aggregated teleportation times. We numerically compare our approach with a full-scale simulation and observe good agreement between them.
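
The flavor of the analysis can be conveyed by a Monte Carlo stand-in: attempts with random phase durations repeat until one succeeds, and the completion time aggregates them. The success probability and the exponential phase durations below are placeholders, not the paper's distributional assumptions.

```python
import random

# A hedged simulation sketch of completion time: repeat attempts until success.
def completion_time(p=0.1):
    t = 0.0
    while True:
        t += random.expovariate(2.0)      # generation phase duration
        t += random.expovariate(5.0)      # classical communication phase
        if random.random() < p:           # attempt succeeded
            return t

samples = [completion_time() for _ in range(10_000)]
print("mean completion time:", sum(samples) / len(samples))
```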

PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces

  • Authors: Shuhei Watanabe, Archit Bansal, Frank Hutter
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.10255
  • Pdf link: https://arxiv.org/pdf/2304.10255
  • Abstract
    The recent rise in popularity of Hyperparameter Optimization (HPO) for deep learning has highlighted the role that good hyperparameter (HP) space design can play in training strong models. In turn, designing a good HP space is critically dependent on understanding the role of different HPs. This motivates research on HP Importance (HPI), e.g., with the popular method of functional ANOVA (f-ANOVA). However, the original f-ANOVA formulation is inapplicable to the subspaces most relevant to algorithm designers, such as those defined by top performance. To overcome this problem, we derive a novel formulation of f-ANOVA for arbitrary subspaces and propose an algorithm that uses Pearson divergence (PED) to enable a closed-form computation of HPI. We demonstrate that this new algorithm, dubbed PED-ANOVA, is able to successfully identify important HPs in different subspaces while also being extremely computationally efficient.

PREIM3D: 3D Consistent Precise Image Attribute Editing from a Single Image

  • Authors: Jianhui Li, Jianmin Li, Haoji Zhang, Shilong Liu, Zhengyi Wang, Zihao Xiao, Kaiwen Zheng, Jun Zhu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10263
  • Pdf link: https://arxiv.org/pdf/2304.10263
  • Abstract
    We study the 3D-aware image attribute editing problem in this paper, which has wide applications in practice. Recent methods solved the problem by training a shared encoder to map images into a 3D generator's latent space or by per-image latent code optimization and then edited images in the latent space. Despite their promising results near the input view, they still suffer from the 3D inconsistency of produced images at large camera poses and imprecise image attribute editing, like affecting unspecified attributes during editing. For more efficient image inversion, we train a shared encoder for all images. To alleviate 3D inconsistency at large camera poses, we propose two novel methods, an alternating training scheme and a multi-view identity loss, to maintain 3D consistency and subject identity. As for imprecise image editing, we attribute the problem to the gap between the latent space of real images and that of generated images. We compare the latent space and inversion manifold of GAN models and demonstrate that editing in the inversion manifold can achieve better results in both quantitative and qualitative evaluations. Extensive experiments show that our method produces more 3D consistent images and achieves more precise image editing than previous work. Source code and pretrained models can be found on our project page: https://mybabyyh.github.io/Preim3D/

Robust nonlinear set-point control with reinforcement learning

  • Authors: Ruoqi Zhang, Per Mattsson, Torbjörn Wigren
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10277
  • Pdf link: https://arxiv.org/pdf/2304.10277
  • Abstract
    There has recently been an increased interest in reinforcement learning for nonlinear control problems. However, standard reinforcement learning algorithms can often struggle even on seemingly simple set-point control problems. This paper argues that three ideas can improve reinforcement learning methods even for highly nonlinear set-point control problems: 1) make use of a prior feedback controller to aid amplitude exploration; 2) use integrated errors; 3) train on model ensembles. Together these ideas lead to more efficient training, and a trained set-point controller that is more robust to modelling errors and can thus be directly deployed to real-world nonlinear systems. The claim is supported by experiments with a real-world nonlinear cascaded tank process and a simulated strongly nonlinear pH-control system.
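
Idea 2) can be realized by augmenting the observation with the accumulated set-point error, so the learned policy gains integral action; the wrapper below is a sketch against a gym-style API, with the measured output assumed to be the first observation component.

```python
import numpy as np

# A hedged sketch: append the integrated set-point error to the observation.
class IntegratedErrorWrapper:
    def __init__(self, env, setpoint, dt=0.05):
        self.env, self.setpoint, self.dt, self.ie = env, setpoint, dt, 0.0

    def reset(self):
        self.ie = 0.0
        return np.append(self.env.reset(), self.ie)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.ie += (self.setpoint - obs[0]) * self.dt   # accumulate error
        return np.append(obs, self.ie), reward, done, info
```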

A baseline on continual learning methods for video action recognition

  • Authors: Giulia Castagnolo, Concetto Spampinato, Francesco Rundo, Daniela Giordano, Simone Palazzo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10335
  • Pdf link: https://arxiv.org/pdf/2304.10335
  • Abstract
    Continual learning has recently attracted attention from the research community, as it aims to solve long-standing limitations of classic supervisedly-trained models. However, most research on this subject has tackled continual learning in simple image classification scenarios. In this paper, we present a benchmark of state-of-the-art continual learning methods on video action recognition. Besides the increased complexity due to the temporal dimension, the video setting imposes stronger requirements on computing resources for top-performing rehearsal methods. To counteract the increased memory requirements, we present two method-agnostic variants for rehearsal methods, exploiting measures of either model confidence or data information to select memorable samples. Our experiments show that, as expected from the literature, rehearsal methods outperform other approaches; moreover, the proposed memory-efficient variants are shown to be effective at retaining a certain level of performance with a smaller buffer size.

Engel's theorem in Mathlib

  • Authors: Oliver Nash
  • Subjects: Logic in Computer Science (cs.LO); Representation Theory (math.RT)
  • Arxiv link: https://arxiv.org/abs/2304.10424
  • Pdf link: https://arxiv.org/pdf/2304.10424
  • Abstract
    We discuss the theory of Lie algebras in Lean's Mathlib library. Using nilpotency as the theme, we outline a computer formalisation of Engel's theorem and an application to root space theory. We emphasise that all arguments work with coefficients in any commutative ring.
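
For reference, the classical statement being formalised is given below in standard notation; Mathlib's version is more general (it works over any commutative ring), so this is a paraphrase rather than the library's exact phrasing.

```latex
% Engel's theorem, classical finite-dimensional form (a paraphrase):
L \text{ is nilpotent} \iff \operatorname{ad}(x) \text{ is nilpotent for all } x \in L .
```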

GPT-NER: Named Entity Recognition via Large Language Models

  • Authors: Shuhe Wang, Xiaofei Sun, Xiaoya Li, Rongbin Ouyang, Fei Wu, Tianwei Zhang, Jiwei Li, Guoyin Wang
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.10428
  • Pdf link: https://arxiv.org/pdf/2304.10428
  • Abstract
    Despite the fact that large-scale language models (LLMs) have achieved SOTA performance on a variety of NLP tasks, their performance on NER is still significantly below supervised baselines. This is due to the gap between the two tasks: NER is a sequence labeling task in nature, while an LLM is a text-generation model. In this paper, we propose GPT-NER to resolve this issue. GPT-NER bridges the gap by transforming the sequence labeling task into a generation task that can be easily adapted by LLMs, e.g., the task of finding location entities in the input text "Columbus is a city" is transformed into generating the text sequence "@@Columbus## is a city", where the special tokens @@ and ## mark the entity to extract. To efficiently address the "hallucination" issue of LLMs, which have a strong inclination to over-confidently label NULL inputs as entities, we propose a self-verification strategy that prompts the LLM to ask itself whether an extracted entity belongs to a labeled entity tag. We conduct experiments on five widely adopted NER datasets, and GPT-NER achieves performance comparable to fully supervised baselines, which is the first time as far as we are aware. More importantly, we find that GPT-NER exhibits a greater ability in low-resource and few-shot setups: when the amount of training data is extremely scarce, GPT-NER performs significantly better than supervised models. This demonstrates the capability of GPT-NER in real-world NER applications where the number of labeled examples is limited.
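
Decoding the paper's output format is a one-line regular expression over the @@ ... ## markers; a sketch:

```python
import re

# A minimal decoder for the @@entity## marker format described above.
def extract_entities(generated_text):
    return re.findall(r"@@(.+?)##", generated_text)

print(extract_entities("@@Columbus## is a city"))  # ['Columbus']
```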

Securing Neural Networks with Knapsack Optimization

  • Authors: Yakir Gorski, Shai Avidan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10442
  • Pdf link: https://arxiv.org/pdf/2304.10442
  • Abstract
    Deep learning inference brings together the data and the Convolutional Neural Network (CNN). This is problematic when the user wants to preserve the privacy of the data and the service provider does not want to reveal the weights of the CNN. Secure Inference allows the two parties to engage in a protocol that preserves their respective privacy concerns, while revealing only the inference result to the user. This is known as Multi-Party Computation (MPC). A major bottleneck of MPC algorithms is communication, as the parties must send data back and forth. The linear component of a CNN (i.e. convolutions) can be computed efficiently with minimal communication, but the non-linear part (i.e., ReLU) requires the bulk of the communication bandwidth. We propose two ways to accelerate Secure Inference. The first is based on the observation that the ReLU outcomes of many convolutions are highly correlated. Therefore, we replace the per-pixel ReLU operation by a ReLU operation per patch. Each layer in the network benefits from a patch of a different size, and we devise an algorithm to choose the optimal set of patch sizes through a novel reduction of the problem to a knapsack problem. The second way to accelerate Secure Inference is based on cutting the number of bit comparisons required for a secure ReLU operation. We demonstrate the cumulative effect of these tools in the semi-honest secure 3-party setting for four problems: classifying ImageNet using a ResNet50 backbone, classifying CIFAR100 using a ResNet18 backbone, semantic segmentation of ADE20K using a MobileNetV2 backbone and semantic segmentation of Pascal VOC 2012 using a ResNet50 backbone. Our source code is publicly available: $\href{https://github.com/yg320/secure_inference}{\text{https://github.com/yg320/secure_inference}}$
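
The reduction targets the textbook 0/1 knapsack: each candidate patch size carries an accuracy "value" and a communication "cost", and a budget caps the total cost. The DP below is that textbook form, with costs and values as placeholders for profiled measurements.

```python
# A hedged sketch of 0/1 knapsack DP over integer costs.
def knapsack(items, budget):
    # items: list of (cost, value) pairs
    dp = [0.0] * (budget + 1)
    for cost, value in items:
        for c in range(budget, cost - 1, -1):   # reverse: use each item once
            dp[c] = max(dp[c], dp[c - cost] + value)
    return dp[budget]

print(knapsack([(3, 4.0), (2, 3.0), (4, 5.0)], budget=5))  # 7.0
```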

Angle based dynamic learning rate for gradient descent

  • Authors: Neel Mishra, Pawan Kumar
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10457
  • Pdf link: https://arxiv.org/pdf/2304.10457
  • Abstract
    In our work, we propose a novel yet simple approach to obtain an adaptive learning rate for gradient-based descent methods on classification tasks. Instead of the traditional approach of selecting adaptive learning rates via the decayed expectation of gradient-based terms, we use the angle between the current gradient and a new gradient: this new gradient is computed from the direction orthogonal to the current gradient, which further helps us determine a better adaptive learning rate based on angle history, thereby leading to better accuracy than existing state-of-the-art optimizers. On a wide variety of benchmark datasets with prominent image classification architectures such as ResNet, DenseNet, EfficientNet, and VGG, we find that our method leads to the highest accuracy on most of the datasets. Moreover, we prove that our method is convergent.
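
The abstract does not give the exact update rule, so the sketch below is only a plausible reading of the idea: scale the step size by the angle between successive gradients, growing it while they stay aligned and shrinking it when they turn. The thresholds, factors, and the stabilizing cap are assumptions, not the paper's rule.

```python
import numpy as np

def angle_lr_gd(grad, x0, lr=0.05, steps=100, lo=0.5, hi=1.1, lr_max=0.15):
    """Toy gradient descent with an angle-driven step size (a sketch only).
    lr_max is a stabilizer assumption for this toy problem, not from the paper."""
    x, g_prev = x0.astype(float), None
    for _ in range(steps):
        g = grad(x)
        if g_prev is not None:
            cos = g @ g_prev / (np.linalg.norm(g) * np.linalg.norm(g_prev) + 1e-12)
            theta = np.arccos(np.clip(cos, -1.0, 1.0))   # angle between gradients
            lr = min(lr * hi, lr_max) if theta < np.pi / 4 else lr * lo
        x -= lr * g
        g_prev = g
    return x

# Quadratic toy problem: minimize 0.5 * x^T A x (the minimum is at the origin).
A = np.diag([1.0, 10.0])
x = angle_lr_gd(lambda x: A @ x, np.array([5.0, 3.0]))
print(x)   # should approach [0, 0]
```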

Reducing Aggregate Electric Vehicle Battery Capacity through Sharing

  • Authors: Polina Alexeenko, Vasileios Charisopoulos
  • Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.10461
  • Pdf link: https://arxiv.org/pdf/2304.10461
  • Abstract
    Meeting growing demand for automotive battery resources is predicted to be costly from both economic and environmental perspectives. To minimize these costs, battery resources should be deployed as efficiently as possible. A potential source of inefficiency in battery deployment is the fact that the batteries of personal vehicles are typically much larger than needed to meet most daily mobility needs. In this paper, we consider whether battery resources can be used more efficiently in a setting where drivers, in addition to having personal vehicle batteries, have access to a shared battery resource. More precisely, we consider the problem of minimizing aggregate battery capacity in settings with and without a shared resource, subject to the requirement that driver commuting needs are met with high reliability. To assess the potential for reductions in deployed battery capacity with the addition of a shared resource, we quantify the difference in deployed battery capacity with and without a shared resource in a case study using real-world longitudinal mobility data from Puget Sound, Washington. We find that giving drivers access to a shared battery resource can substantially reduce deployed battery capacity. Furthermore, the relative reduction in battery capacity increases with the number of drivers and the level of reliability desired.
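
To make the pooling intuition concrete, here is a hedged numpy sketch, not the paper's optimization: the needs distribution, the median/quantile split, and all numbers are assumptions. It compares aggregate capacity when each battery is sized individually against a split between smaller personal batteries and a shared pool covering the aggregate shortfall.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical daily energy needs (kWh) for 50 drivers over 365 days.
needs = rng.gamma(shape=2.0, scale=4.0, size=(365, 50))
reliability = 0.99

# Personal-only: each battery must cover that driver's needs on 99% of days.
personal_only = np.quantile(needs, reliability, axis=0).sum()

# With sharing (illustrative split, not the paper's optimum): size personal
# batteries to the median day, and a shared pool to cover the remaining
# aggregate shortfall with the target reliability.
personal = np.quantile(needs, 0.5, axis=0)
shortfall = np.clip(needs - personal, 0.0, None).sum(axis=1)
shared = np.quantile(shortfall, reliability)
with_sharing = personal.sum() + shared

print(f"personal-only capacity: {personal_only:.1f} kWh")
print(f"with shared resource:   {with_sharing:.1f} kWh")
```

The pooling benefit comes from the fact that drivers' tail days rarely coincide, so the quantile of the summed shortfall is much smaller than the sum of individual tail allowances.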

Efficient Deep Reinforcement Learning Requires Regulating Overfitting

  • Authors: Qiyang Li, Aviral Kumar, Ilya Kostrikov, Sergey Levine
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.10466
  • Pdf link: https://arxiv.org/pdf/2304.10466
  • Abstract
    Deep reinforcement learning algorithms that learn policies by trial-and-error must learn from limited amounts of data collected by actively interacting with the environment. While many prior works have shown that proper regularization techniques are crucial for enabling data-efficient RL, a general understanding of the bottlenecks in data-efficient RL has remained elusive. Consequently, it has been difficult to devise a universal technique that works well across all domains. In this paper, we attempt to understand the primary bottleneck in sample-efficient deep RL by examining several potential hypotheses such as non-stationarity, excessive action distribution shift, and overfitting. We perform a thorough empirical analysis on state-based DeepMind control suite (DMC) tasks in a controlled and systematic way to show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms, and that prior methods leading to good performance do, in fact, control the validation TD error to be low. This observation gives us a robust principle for making deep RL efficient: we can hill-climb on the validation TD error by utilizing any form of regularization techniques from supervised learning. We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
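
The selection principle itself is simple to sketch: compute the TD error of each candidate Q-function on held-out transitions and keep the candidate with the lowest value. The toy linear Q-functions and candidate set below are placeholders, not the paper's setup.

```python
import numpy as np

def validation_td_error(q, transitions, gamma=0.99):
    """Mean squared TD error of Q-function `q` on held-out transitions
    (s, a, r, s_next, done): the quantity one would hill-climb on."""
    err = 0.0
    for s, a, r, s_next, done in transitions:
        target = r + (0.0 if done else gamma * max(q(s_next, b) for b in (0, 1)))
        err += (q(s, a) - target) ** 2
    return err / len(transitions)

# Hypothetical online model selection: pick the candidate (e.g., different
# regularization strengths) with the lowest validation TD error.
rng = np.random.default_rng(1)
W = {reg: rng.normal(size=(2, 3)) for reg in (0.0, 1e-4, 1e-2)}   # toy Q-weights
q_of = lambda w: (lambda s, a: float(w[a] @ s))
transitions = [(rng.normal(size=3), int(rng.integers(2)), float(rng.normal()),
                rng.normal(size=3), False) for _ in range(64)]

best = min(W, key=lambda reg: validation_td_error(q_of(W[reg]), transitions))
print("selected regularization:", best)
```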

A primal dual mixed finite element method for inverse identification of the diffusion coefficient and its relation to the Kohn-Vogelius penalty method

  • Authors: Erik Burman
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.10467
  • Pdf link: https://arxiv.org/pdf/2304.10467
  • Abstract
    We revisit the celebrated Kohn-Vogelius penalty method and discuss how to use it for the unique continuation problem where data is given in the bulk of the domain. We then show that the primal-dual mixed finite element methods for the elliptic Cauchy problem introduced in \cite{BLO18} (\emph{E. Burman, M. Larson, L. Oksanen, Primal-dual mixed finite element methods for the elliptic Cauchy problem, SIAM J. Num. Anal., 56 (6), 2018}) can be interpreted as a Kohn-Vogelius penalty method, and we modify it to allow for unique continuation using data in the bulk. We prove that the resulting linear system is invertible for all data. Then we show that by introducing a singularly perturbed Robin condition on the discrete level, sufficient regularization is obtained so that error estimates can be shown using conditional stability. Finally we show how the method can be used for the identification of the diffusivity coefficient in a second order elliptic operator with partial data. Some numerical examples are presented showing the performance of the method for unique continuation and for impedance computed tomography with partial data.

New Closed-Form ASER Expressions for Dual-Hop Mixed THz-RF Cooperative Relay Networks

  • Authors: Soumendu Das, Nagendra Kumar, Dharmendra Dixit
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.10504
  • Pdf link: https://arxiv.org/pdf/2304.10504
  • Abstract
    In this paper, we consider a dual-hop mixed THz-RF system model for backhaul-fronthaul applications where the link between source and destination is established only through the relay node, in which a decode-and-forward relaying protocol is used. The THz link suffers from the joint impact of antenna misalignment and the stochastic characteristics of wireless channels, including the effect of environmental conditions such as pressure, humidity, and temperature. The envelope of the THz link in the first hop follows a generalized $\alpha-\mu$ distribution, and for the RF end, the Nakagami-$m$ distribution is considered. In this context, we obtain new closed-form expressions of the cumulative distribution function and the moment-generating function of the end-to-end signal-to-noise ratio. Further, we derive the average symbol error rate expressions for coherent rectangular quadrature amplitude modulation (RQAM) and coherent hexagonal QAM (HQAM), as well as a non-coherent modulation scheme. The asymptotic behavior is also discussed to examine the system's diversity. Furthermore, the impact of several parameters, such as the fading coefficients of individual links and antenna misalignment, as well as the distance between nodes, is also highlighted in the system's performance. Moreover, Monte Carlo simulations are used to validate the presented analytical framework. Finally, the presented numerical insights aid in the extraction of practical design principles.
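
Since the analysis is validated by Monte Carlo, a simplified end-to-end SNR simulation is easy to sketch. The version below assumes an alpha-mu envelope for the THz hop and Nakagami-m (Gamma-distributed power) fading for the RF hop, ignores antenna misalignment and molecular absorption, and uses illustrative parameters throughout.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 200_000

# Hop 1 (THz): alpha-mu fading envelope R, sampled via R = r_hat*(G/mu)^(1/alpha)
# with G ~ Gamma(mu, 1); parameters are illustrative only.
alpha, mu, r_hat, snr1_avg = 2.0, 3.0, 1.0, 10.0
G = rng.gamma(shape=mu, scale=1.0, size=N)
R = r_hat * (G / mu) ** (1.0 / alpha)
snr1 = snr1_avg * R**2

# Hop 2 (RF): Nakagami-m fading, i.e., Gamma-distributed channel power.
m, omega, snr2_avg = 2.0, 1.0, 10.0
snr2 = snr2_avg * rng.gamma(shape=m, scale=omega / m, size=N)

# Decode-and-forward: the end-to-end SNR is limited by the weaker hop.
snr_e2e = np.minimum(snr1, snr2)
threshold = 3.0   # linear SNR threshold for outage
print("outage probability ~", (snr_e2e < threshold).mean())
```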

Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget

  • Authors: Johannes Lehner, Benedikt Alkin, Andreas Fürst, Elisabeth Rumetshofer, Lukas Miklautz, Sepp Hochreiter
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10520
  • Pdf link: https://arxiv.org/pdf/2304.10520
  • Abstract
    Masked Image Modeling (MIM) methods, like Masked Autoencoders (MAE), efficiently learn a rich representation of the input. However, for adapting to downstream tasks, they require a sufficient amount of labeled data since their rich features capture not only objects but also less relevant image background. In contrast, Instance Discrimination (ID) methods focus on objects. In this work, we study how to combine the efficiency and scalability of MIM with the ability of ID to perform downstream classification in the absence of large amounts of labeled data. To this end, we introduce Masked Autoencoder Contrastive Tuning (MAE-CT), a sequential approach that applies Nearest Neighbor Contrastive Learning (NNCLR) to a pre-trained MAE. MAE-CT tunes the rich features such that they form semantic clusters of objects without using any labels. Applied to large and huge Vision Transformer (ViT) models, MAE-CT matches or exceeds previous self-supervised methods trained on ImageNet in linear probing, k-NN and low-shot classification accuracy, as well as in unsupervised clustering accuracy. Notably, similar results can be achieved without additional image augmentations. While ID methods generally rely on hand-crafted augmentations to avoid shortcut learning, we find that nearest neighbor lookup is sufficient and that this data-driven augmentation effect improves with model size. MAE-CT is compute efficient. For instance, starting from a MAE pre-trained ViT-L/16, MAE-CT increases the ImageNet 1% low-shot accuracy from 67.7% to 72.6%, linear probing accuracy from 76.0% to 80.2% and k-NN accuracy from 60.6% to 79.1% in just five hours using eight A100 GPUs.

Learning Narrow One-Hidden-Layer ReLU Networks

  • Authors: Sitan Chen, Zehao Dou, Surbhi Goel, Adam R Klivans, Raghu Meka
  • Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.10524
  • Pdf link: https://arxiv.org/pdf/2304.10524
  • Abstract
    We consider the well-studied problem of learning a linear combination of $k$ ReLU activations with respect to a Gaussian distribution on inputs in $d$ dimensions. We give the first polynomial-time algorithm that succeeds whenever $k$ is a constant. All prior polynomial-time learners require additional assumptions on the network, such as positive combining coefficients or the matrix of hidden weight vectors being well-conditioned. Our approach is based on analyzing random contractions of higher-order moment tensors. We use a multi-scale analysis to argue that sufficiently close neurons can be collapsed together, sidestepping the conditioning issues present in prior work. This allows us to design an iterative procedure to discover individual neurons.

Learning Sparse and Low-Rank Priors for Image Recovery via Iterative Reweighted Least Squares Minimization

  • Authors: Stamatios Lefkimmiatis, Iaroslav Koshelev
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.10536
  • Pdf link: https://arxiv.org/pdf/2304.10536
  • Abstract
    We introduce a novel optimization algorithm for image recovery under learned sparse and low-rank constraints, which we parameterize as weighted extensions of the $\ell_p^p$-vector and $\mathcal S_p^p$ Schatten-matrix quasi-norms for $0 < p \le 1$, respectively. Our proposed algorithm generalizes the Iteratively Reweighted Least Squares (IRLS) method, used for signal recovery under $\ell_1$ and nuclear-norm constrained minimization. Further, we interpret our overall minimization approach as a recurrent network that we then employ to deal with inverse low-level computer vision problems. Thanks to the convergence guarantees that our IRLS strategy offers, we are able to train the derived reconstruction networks using a memory-efficient implicit back-propagation scheme, which does not pose any restrictions on their effective depth. To assess our networks' performance, we compare them against other existing reconstruction methods on several inverse problems, namely image deblurring, super-resolution, demosaicking and sparse recovery. Our reconstruction results are shown to be very competitive and in many cases outperform those of existing unrolled networks, whose number of parameters is orders of magnitude higher than that of our learned models.
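
For orientation, here is a toy numpy sketch of the classical (unweighted) IRLS scheme that the paper's learned, weighted variant builds on; the annealing schedule and constants are assumptions.

```python
import numpy as np

# Classical IRLS for min_x ||Ax - b||^2 + lam * sum_i |x_i|^p with 0 < p <= 1,
# via the standard quadratic majorization with weights (x_i^2 + eps)^(p/2 - 1).
def irls_lp(A, b, lam=0.01, p=0.8, iters=100):
    x = np.linalg.lstsq(A, b, rcond=None)[0]       # minimum-norm LS initialization
    eps = 1.0                                      # smoothing, annealed over iterations
    for _ in range(iters):
        w = (x**2 + eps) ** (p / 2 - 1)            # reweighting step
        H = A.T @ A + (lam * p / 2) * np.diag(w)   # weighted ridge normal equations
        x = np.linalg.solve(H, A.T @ b)
        eps = max(eps * 0.9, 1e-8)
    return x

# Sparse recovery toy: 20 noisy measurements of a 50-dim, 3-sparse vector.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 50)) / np.sqrt(20)
x_true = np.zeros(50)
x_true[[5, 17, 33]] = [1.0, -2.0, 1.5]
b = A @ x_true + 0.01 * rng.normal(size=20)
print(np.round(irls_lp(A, b)[[5, 17, 33]], 2))     # should be near [1.0, -2.0, 1.5]
```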

Learning Neural Duplex Radiance Fields for Real-Time View Synthesis

  • Authors: Ziyu Wan, Christian Richardt, Aljaž Božič, Chao Li, Vijay Rengarajan, Seonghyeon Nam, Xiaoyu Xiang, Tuotuo Li, Bo Zhu, Rakesh Ranjan, Jing Liao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.10537
  • Pdf link: https://arxiv.org/pdf/2304.10537
  • Abstract
    Neural radiance fields (NeRFs) enable novel view synthesis with unprecedented visual quality. However, to render photorealistic images, NeRFs require hundreds of deep multilayer perceptron (MLP) evaluations for each pixel. This is prohibitively expensive and makes real-time rendering infeasible, even on powerful modern GPUs. In this paper, we propose a novel approach to distill and bake NeRFs into highly efficient mesh-based neural representations that are fully compatible with the massively parallel graphics rendering pipeline. We represent scenes as neural radiance features encoded on a two-layer duplex mesh, which effectively overcomes the inherent inaccuracies in 3D surface reconstruction by learning the aggregated radiance information from a reliable interval of ray-surface intersections. To exploit local geometric relationships of nearby pixels, we leverage screen-space convolutions instead of the MLPs used in NeRFs to achieve high-quality appearance. Finally, the performance of the whole framework is further boosted by a novel multi-view distillation optimization strategy. We demonstrate the effectiveness and superiority of our approach via extensive experiments on a range of standard datasets.

Keyword: faster

An Intent-based Framework for Vehicular Edge Computing

  • Authors: TianZhang He, Adel N. Toosi, Negin Akbari, Muhammed Tawfiqul Islam, Muhammad Aamir Cheema
  • Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.09916
  • Pdf link: https://arxiv.org/pdf/2304.09916
  • Abstract
    The rapid development of emerging vehicular edge computing (VEC) brings new opportunities and challenges for dynamic resource management. The increasing number of edge data centers, roadside units (RSUs), and network devices, however, makes resource management a complex task in VEC. On the other hand, the exponential growth of service applications and end-users makes corresponding QoS hard to maintain. Intent-Based Networking (IBN), based on Software-Defined Networking, was introduced to provide the ability to automatically handle and manage the networking requirements of different applications. Motivated by the IBN concept, in this paper, we propose a novel approach to jointly orchestrate networking and computing resources based on user requirements. The proposed solution constantly monitors user requirements and dynamically re-configures the system to satisfy desired states of the application. We compared our proposed solution with the state-of-the-art networking embedding algorithms using real-world taxi GPS traces. Results show that our proposed method is significantly faster (up to 95%) and can improve resource utilization (up to 76%) and the acceptance ratio of computing and networking requests with various priorities (up to 71%). We also present a small-scale prototype of the proposed intent management framework to validate our solution.

Speed Me up if You Can: Conditional Lower Bounds on Opacity Verification

  • Authors: Jiří Balun, Tomáš Masopust, Petr Osička
  • Subjects: Formal Languages and Automata Theory (cs.FL); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.09920
  • Pdf link: https://arxiv.org/pdf/2304.09920
  • Abstract
    Opacity is a property of privacy and security applications asking whether, given a system model, a passive intruder that makes online observations of the system's behaviour can ascertain some "secret" information of the system. Deciding opacity is a PSpace-complete problem, and hence there are no polynomial-time algorithms to verify opacity under the assumption that PSpace differs from PTime. This assumption, however, gives rise to the question of whether the existing exponential-time algorithms are the best possible or whether there are faster, sub-exponential-time algorithms. We show that under the (Strong) Exponential Time Hypothesis, there are no algorithms that would be significantly faster than the existing algorithms. As a by-product, we obtain a new conditional lower bound on the time complexity of deciding universality (and therefore also inclusion and equivalence) for nondeterministic finite automata.

Two-Memory Reinforcement Learning

  • Authors: Zhao Yang, Thomas. M. Moerland, Mike Preuss, Aske Plaat
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10098
  • Pdf link: https://arxiv.org/pdf/2304.10098
  • Abstract
    While deep reinforcement learning has shown important empirical success, it tends to learn relatively slowly due to the slow propagation of reward information and the slow update of parametric neural networks. Non-parametric episodic memory, on the other hand, provides a faster learning alternative that does not require representation learning and uses the maximum episodic return as state-action values for action selection. Episodic memory and reinforcement learning both have their own strengths and weaknesses. Notably, humans can leverage multiple memory systems concurrently during learning and benefit from all of them. In this work, we propose a method called Two-Memory reinforcement learning agent (2M) that combines episodic memory and reinforcement learning to distill both of their strengths. The 2M agent exploits the speed of the episodic memory part and the optimality and generalization capacity of the reinforcement learning part to complement each other. Our experiments demonstrate that the 2M agent is more data efficient and outperforms both pure episodic memory and pure reinforcement learning, as well as a state-of-the-art memory-augmented RL agent. Moreover, the proposed approach provides a general framework that can be used to combine any episodic memory agent with other off-policy reinforcement learning algorithms.
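
A hedged sketch of how such a combination could look: a nearest-neighbour episodic store next to a parametric Q-estimate, mixed at action-selection time. The mixing rule and the weight `beta` are assumptions, since the abstract does not specify them.

```python
import numpy as np

class EpisodicMemory:
    """Toy non-parametric store: best return ever observed per (state, action)."""
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, s, a, ret, n_actions=2):
        self.keys.append(np.asarray(s, dtype=float))
        v = np.full(n_actions, -np.inf)
        v[a] = ret
        self.values.append(v)

    def read(self, s):
        if not self.keys:
            return None
        d = [np.linalg.norm(k - s) for k in self.keys]
        return self.values[int(np.argmin(d))]      # nearest-neighbour lookup

def select_action(q_values, em_values, beta=0.5):
    """Mix parametric and episodic estimates; beta is a hypothetical weight."""
    if em_values is None or not np.isfinite(em_values).any():
        return int(np.argmax(q_values))
    em = np.where(np.isfinite(em_values), em_values, q_values)
    return int(np.argmax(beta * q_values + (1 - beta) * em))

mem = EpisodicMemory()
mem.write(s=[0.1, 0.2], a=1, ret=5.0)
print(select_action(np.array([1.0, 0.5]), mem.read(np.array([0.1, 0.25]))))
```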

ZEBRA: Z-order Curve-based Event Retrieval Approach to Efficiently Explore Automotive Data

  • Authors: Christian Berger, Lukas Birkemeyer
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.10232
  • Pdf link: https://arxiv.org/pdf/2304.10232
  • Abstract
    Evaluating the performance of software for automated vehicles is predominantly driven by data collected from the real world. While professional test drivers are supported with technical means to semi-automatically annotate driving maneuvers to allow better event identification, simple data loggers in large vehicle fleets typically lack automatic and detailed event classification, and hence extra effort is needed when post-processing such data. Yet, the data quality from professional test drivers is apparently higher than that from large fleets where labels are missing, but the non-annotated data set from large vehicle fleets is much more representative for typical, realistic driving scenarios to be handled by automated vehicles. However, while growing the data from large fleets is relatively simple, adding valuable annotations during post-processing has become increasingly expensive. In this paper, we leverage Z-order space-filling curves to systematically reduce data dimensionality while preserving domain-specific data properties, which allows us to explore even large-scale field data sets to spot interesting events orders of magnitude faster than processing time-series data directly. Furthermore, the proposed concept is based on an analytical approach, which preserves explainability for the identified events.
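
The core trick, mapping multi-dimensional samples to a 1D Morton (Z-order) key so that range scans approximate region queries, can be sketched in a few lines; the (speed, acceleration) example is an assumption about the kind of signals involved.

```python
def morton2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of two coordinates into one Z-order index, so that
    points close in 2D tend to stay close on the 1D curve."""
    z = 0
    for i in range(bits):
        z |= (x >> i & 1) << (2 * i) | (y >> i & 1) << (2 * i + 1)
    return z

# E.g., reduce a (speed, acceleration) sample stream to sortable 1D keys;
# range scans over the keys then approximate 2D region queries.
samples = [(3, 5), (3, 6), (40, 2)]
keys = sorted(morton2d(x, y) for x, y in samples)
print(keys)   # neighbours in 2D map to nearby Z-order keys
```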

Observer-Feedback-Feedforward Controller Structures in Reinforcement Learning

  • Authors: Ruoqi Zhang, Per Mattson, Torbjörn Wigren
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10276
  • Pdf link: https://arxiv.org/pdf/2304.10276
  • Abstract
    The paper proposes the use of structured neural networks for reinforcement learning based nonlinear adaptive control. The focus is on partially observable systems, with separate neural networks for the state and feedforward observer and for the state feedback and feedforward controller. The observer dynamics are modelled by recurrent neural networks, while a standard network is used for the controller. As discussed in the paper, this leads to a separation of the observer dynamics to the recurrent neural network part, and of the state feedback to the feedback and feedforward network. The structured approach reduces the computational complexity and gives the reinforcement learning based controller an understandable structure as compared to when one single neural network is used. As shown by simulation, the proposed structure has the additional and main advantage that the training becomes significantly faster. Two ways to include feedforward structure are presented, one related to state feedback control and one related to classical feedforward control. The latter method introduces further structure with a separate recurrent neural network that processes only the measured disturbance. When evaluated with simulation on a nonlinear cascaded double tank process, the method with the most structure performs best, with excellent feedforward disturbance rejection gains.

Regret-Minimizing Double Oracle for Extensive-Form Games

  • Authors: Xiaohang Tang, Le Cong Dinh, Stephen Marcus McAleer, Yaodong Yang
  • Subjects: Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2304.10498
  • Pdf link: https://arxiv.org/pdf/2304.10498
  • Abstract
    By incorporating regret minimization, double oracle methods have demonstrated rapid convergence to Nash Equilibrium (NE) in normal-form games and extensive-form games, through algorithms such as online double oracle (ODO) and extensive-form double oracle (XDO), respectively. In this study, we further examine the theoretical convergence rate and sample complexity of such regret minimization-based double oracle methods, utilizing a unified framework called Regret-Minimizing Double Oracle. Based on this framework, we extend ODO to extensive-form games and determine its sample complexity. Moreover, we demonstrate that the sample complexity of XDO can be exponential in the number of information sets $|S|$, owing to the exponentially decaying stopping threshold of restricted games. To solve this problem, we propose the Periodic Double Oracle (PDO) method, which has the lowest sample complexity among all existing double oracle methods, being only polynomial in $|S|$. Empirical evaluations on multiple poker and board games show that PDO achieves significantly faster convergence than previous double oracle algorithms and reaches a competitive level with state-of-the-art regret minimization methods.

Transformer Models for Type Inference in the Simply Typed Lambda Calculus: A Case Study in Deep Learning for Code

  • Authors: Brando Miranda, Avi Shinnar, Vasily Pestun, Barry Trager
  • Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO); Symbolic Computation (cs.SC)
  • Arxiv link: https://arxiv.org/abs/2304.10500
  • Pdf link: https://arxiv.org/pdf/2304.10500
  • Abstract
    Despite a growing body of work at the intersection of deep learning and formal languages, there has been relatively little systematic exploration of transformer models for reasoning about typed lambda calculi. This is an interesting area of inquiry for two reasons. First, typed lambda calculi are the lingua franca of programming languages. A set of heuristics that relate various typed lambda calculi to effective neural architectures would provide a systematic method for mapping language features (e.g., polymorphism, subtyping, inheritance, etc.) to architecture choices. Second, transformer models are widely used in deep learning architectures applied to code, but the design and hyperparameter space for them is large and relatively unexplored in programming language applications. Therefore, we suggest a benchmark that allows us to explore exactly this through perhaps the simplest and most fundamental property of a programming language: the relationship between terms and types. Consequently, we begin this inquiry of transformer architectures for typed lambda calculi by exploring the effect of transformer warm-up and optimizer selection in the task of type inference: i.e., predicting the types of lambda calculus terms using only transformers. We find that the optimization landscape is difficult even in this simple setting. One particular experimental finding is that optimization by Adafactor converges much faster compared to optimization by Adam and RAdam. We conjecture that such different performance of optimizers might be related to the difficulties of generalization over formally generated datasets.

Autonomic Architecture for Big Data Performance Optimization

  • Authors: Mikhail Genkin, Frank Dehne, Anousheh Shahmirza, Pablo Navarro, Siyu Zhou
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10503
  • Pdf link: https://arxiv.org/pdf/2304.10503
  • Abstract
    The big data software stack based on Apache Spark and Hadoop has become mission critical in many enterprises. Performance of Spark and Hadoop jobs depends on a large number of configuration settings. Manual tuning is expensive and brittle. There have been prior efforts to develop on-line and off-line automatic tuning approaches to make the big data stack less dependent on manual tuning. These, however, demonstrated only modest performance improvements with very simple, single-user workloads on small data sets. This paper presents KERMIT - the autonomic architecture for big data capable of automatically tuning Apache Spark and Hadoop on-line, and achieving performance results 30% faster than rule-of-thumb tuning by a human administrator and up to 92% as fast as the fastest possible tuning established by performing an exhaustive search of the tuning parameter space. KERMIT can detect important workload changes with up to 99% accuracy, and predict future workload types with up to 96% accuracy. It is capable of identifying and classifying complex multi-user workloads without being explicitly trained on examples of these workloads. It does not rely on the past workload history to predict the future workload classes and their associated performance. KERMIT can identify and learn new workload classes, and adapt to workload drift, without human intervention.

Generalizing Neural Human Fitting to Unseen Poses With Articulated SE(3) Equivariance

  • Authors: Haiwen Feng, Peter Kulits, Shichen Liu, Michael J. Black, Victoria Abrevaya
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10528
  • Pdf link: https://arxiv.org/pdf/2304.10528
  • Abstract
    We address the problem of fitting a parametric human body model (SMPL) to point cloud data. Optimization-based methods require careful initialization and are prone to becoming trapped in local optima. Learning-based methods address this but do not generalize well when the input pose is far from those seen during training. For rigid point clouds, remarkable generalization has been achieved by leveraging SE(3)-equivariant networks, but these methods do not work on articulated objects. In this work we extend this idea to human bodies and propose ArtEq, a novel part-based SE(3)-equivariant neural architecture for SMPL model estimation from point clouds. Specifically, we learn a part detection network by leveraging local SO(3) invariance, and regress shape and pose using articulated SE(3) shape-invariant and pose-equivariant networks, all trained end-to-end. Our novel equivariant pose regression module leverages the permutation-equivariant property of self-attention layers to preserve rotational equivariance. Experimental results show that ArtEq can generalize to poses not seen during training, outperforming state-of-the-art methods by 74.5%, without requiring an optimization refinement step. Further, compared with competing works, our method is more than three orders of magnitude faster during inference and has 97.3% fewer parameters. The code and model will be available for research purposes at https://arteq.is.tue.mpg.de.

Keyword: mobile

NRTS: A Client-Server architecture for supporting data recording, transmission and evaluation of multidisciplinary teams during the neonatal resuscitation simulation scenario

  • Authors: Manuel Striani
  • Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.09860
  • Pdf link: https://arxiv.org/pdf/2304.09860
  • Abstract
    In this technical report, we describe the Neonatal Resuscitation Training Simulator (NRTS), an Android mobile app designed to support medical experts in inputting, transmitting and recording data during a High-Fidelity Simulation course for neonatal resuscitation. This mobile app allows one to automatically send all the recorded data from the Neonatal Intensive Care Unit (NICU) of Casale Monferrato Children's Hospital (Italy) to a server located at the Department of Science and Technological Innovation (DiSIT), University of Piemonte Orientale (Italy). Finally, the medical instructor can view statistics on a simulation exercise that may be used during the de-briefing phase for the evaluation of the multidisciplinary teams involved in the simulation scenarios.

Scheduling DNNs on Edge Servers

  • Authors: Jian He, Chenxi Yang, Zhaoyuan He, Ghufran Baig, Lili Qiu
  • Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09961
  • Pdf link: https://arxiv.org/pdf/2304.09961
  • Abstract
    Deep neural networks (DNNs) have been widely used in various video analytic tasks. These tasks demand real-time responses. Due to the limited processing power on mobile devices, a common way to support such real-time analytics is to offload the processing to an edge server. This paper examines how to speed up the edge server DNN processing for multiple clients. In particular, we observe that batching multiple DNN requests significantly speeds up the processing time. Based on this observation, we first design a novel scheduling algorithm to exploit the batching benefits of all requests that run the same DNN. This is compelling since there are only a handful of DNNs and many requests tend to use the same DNN. Our algorithms are general and can support different objectives, such as minimizing the completion time or maximizing the on-time ratio. We then extend our algorithm to handle requests that use different DNNs with or without shared layers. Finally, we develop a collaborative approach to further improve performance by adaptively processing some of the requests, or portions of the requests, locally at the clients. This is especially useful when the network and/or server is congested. Our implementation shows the effectiveness of our approach under different request distributions (e.g., Poisson, Pareto, and Constant inter-arrivals).

Availability Model of a 5G-MEC System

  • Authors: Thilina Pathirana, Gianfranco Nencioni
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.09992
  • Pdf link: https://arxiv.org/pdf/2304.09992
  • Abstract
    Multi-access Edge Computing (MEC) is one of the enabling technologies of the fifth generation (5G) of mobile networks. MEC enables services with strict latency requirements by bringing computing capabilities close to the users. As with any new technology, the dependability of MEC is one of the aspects that need to be carefully studied. In this paper, we propose a two-level model to compute the availability of a 5G-MEC system. We then use the model to evaluate the availability of a 5G-MEC system under various configurations. The results show that having a single redundancy of the 5G-MEC elements leads to acceptable availability. To reach high availability, the software failure intensity of the management elements of 5G and MEC should be reduced.
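
As a toy illustration of how redundancy enters such a model, the sketch below computes system availability for independently failing replicated elements; the element names and availability numbers are assumptions, not the paper's model.

```python
# The system is up when every element type has at least one working replica;
# replicas of an element are assumed to fail independently.
def system_availability(elements: dict[str, tuple[float, int]]) -> float:
    """elements maps name -> (single-instance availability, replica count)."""
    avail = 1.0
    for name, (a, n) in elements.items():
        avail *= 1.0 - (1.0 - a) ** n      # at least one of n replicas is up
    return avail

# Illustrative 5G-MEC elements with a single redundancy on most of them.
config = {"gNB": (0.999, 1), "UPF": (0.995, 2), "MEC-host": (0.99, 2),
          "MEC-orchestrator": (0.995, 2)}
print(f"{system_availability(config):.6f}")
```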

Robust Route Planning with Distributional Reinforcement Learning in a Stochastic Road Network Environment

  • Authors: Xi Lin, Paul Szenher, John D. Martin, Brendan Englot
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.09996
  • Pdf link: https://arxiv.org/pdf/2304.09996
  • Abstract
    Route planning is essential to mobile robot navigation problems. In recent years, deep reinforcement learning (DRL) has been applied to learning optimal planning policies in stochastic environments without prior knowledge. However, existing works focus on learning policies that maximize the expected return, the performance of which can vary greatly when the level of stochasticity in the environment is high. In this work, we propose a distributional reinforcement learning based framework that learns return distributions which explicitly reflect environmental stochasticity. Policies based on the second-order stochastic dominance (SSD) relation can be used to make adjustable route decisions according to user preference on performance robustness. Our proposed method is evaluated in a simulated road network environment, and experimental results show that our method is able to plan the shortest routes that minimize stochasticity in travel time when robustness is preferred, while other state-of-the-art DRL methods are agnostic to environmental stochasticity.

FTMRate: Collision-Immune Distance-based Data Rate Selection for IEEE 802.11 Networks

  • Authors: Wojciech Ciezobka, Maksymilian Wojnar, Katarzyna Kosek-Szott, Szymon Szott, Krzysztof Rusek
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.10140
  • Pdf link: https://arxiv.org/pdf/2304.10140
  • Abstract
    Data rate selection algorithms for Wi-Fi devices are an important area of research because they directly impact performance. Most of the proposals are based on measuring the transmission success probability for a given data rate. In dense scenarios, however, this probing approach will fail because frame collisions are misinterpreted as erroneous data rate selection. We propose FTMRate which uses the fine timing measurement (FTM) feature, recently introduced in IEEE 802.11. FTM allows stations to measure their distance from the AP. We argue that knowledge of the distance from the receiver can be useful in determining which data rate to use. We apply statistical learning (a form of machine learning) to estimate the distance based on measurements, estimate channel quality from the distance, and select data rates based on channel quality. We evaluate three distinct estimation approaches: exponential smoothing, Kalman filter, and particle filter. We present a performance evaluation of the three variants of FTMRate and show, in several dense and mobile (though line-of-sight only) scenarios, that it can outperform two benchmarks and provide close to optimal results in IEEE 802.11ax networks.
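
A hedged sketch of the overall pipeline (Kalman-filtered FTM distance, then a path-loss based rate pick); the MCS table, path-loss constants and noise values are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

MCS_TABLE = [(0, 2.0), (2, 8.0), (4, 15.0), (7, 25.0)]   # (index, min SNR in dB)

def kalman_distance(measurements, r=4.0, q=0.25):
    """1D Kalman filter over noisy FTM distance readings; r is measurement
    noise variance, q is process noise (the station may move between pings)."""
    d, p = measurements[0], 1.0          # state estimate and its variance
    for z in measurements[1:]:
        p += q                           # predict
        k = p / (p + r)                  # Kalman gain
        d += k * (z - d)                 # update with the new FTM reading
        p *= 1 - k
    return d

def select_mcs(distance_m, tx_power_dbm=20.0, pl0=40.0, n=3.0, noise_dbm=-90.0):
    """Map distance to SNR via a log-distance path-loss model, then choose the
    fastest MCS whose SNR requirement is met (all constants illustrative)."""
    path_loss = pl0 + 10 * n * np.log10(max(distance_m, 1.0))
    snr = tx_power_dbm - path_loss - noise_dbm
    usable = [m for m, req in MCS_TABLE if snr >= req]
    return max(usable) if usable else 0

ftm_readings = [12.3, 11.7, 12.9, 12.1, 11.8]   # metres, with measurement noise
print(select_mcs(kalman_distance(ftm_readings)))
```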

A Large-scale Examination of "Socioeconomic" Fairness in Mobile Networks

  • Authors: Souneil Park, Pavol Mulinka, Diego Perino
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.10190
  • Pdf link: https://arxiv.org/pdf/2304.10190
  • Abstract
    Internet access is a special resource: the need for it has become universal across the public, whereas the service is operated by the private sector. Mobile Network Operators (MNOs) put effort into management, planning, and optimization; however, they do not link such activities to socioeconomic fairness. In this paper, we make a first step towards understanding the relation between the socioeconomic status of customers and network performance, and investigate potential discrimination in network deployment and management. The scope of our study spans various aspects, including urban geography, network resource deployment, data consumption, and device distribution. A novel methodology that enables a geo-socioeconomic perspective on mobile networks is developed for the study. The results are based on an actual infrastructure in multiple cities, covering millions of users densely covering the socioeconomic scale. We report a thorough examination of the fairness status, its relationship with various structural factors, and potential class-specific solutions.

Breast cancer detection using deep learning

  • Authors: Gayathri Girish, Ponnathota Spandana, Badrish Vasu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10386
  • Pdf link: https://arxiv.org/pdf/2304.10386
  • Abstract
    Objective: This paper proposes a deep learning model for breast cancer detection from reconstructed images of microwave imaging scan data and aims to improve the accuracy and efficiency of breast tumor detection, which could have a significant impact on breast cancer diagnosis and treatment. Methods: Our framework consists of different convolutional neural network (CNN) architectures for feature extraction and a region-based CNN for tumor detection. We use 7 different architectures: DenseNet201, ResNet50, InceptionV3, InceptionResNetV3, MobileNetV2, NASNetMobile and NASNetLarge, and compare their performance to find the best architecture of the seven. An experimental dataset of MRI-derived breast phantoms was used. Results: NASNetLarge is the best architecture for the CNN model, with an accuracy of 88.41% and a loss of 27.82%. Given that the model's AUC is 0.786, it can be concluded that it is suitable for use in its present form, while it could be improved upon and trained on other comparable datasets. Impact: One of the main causes of death in women is breast cancer, and early identification is essential for enhancing outcomes for patients. Due to its non-invasiveness and capacity to produce high-resolution images, microwave imaging is a potential tool for breast cancer screening. The complexity of tumors makes it difficult to adequately detect them in microwave images. The results of this research show that deep learning has a lot of potential for breast cancer detection in microwave images.

Securing Neural Networks with Knapsack Optimization

  • Authors: Yakir Gorski, Shai Avidan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10442
  • Pdf link: https://arxiv.org/pdf/2304.10442
  • Abstract
    Deep learning inference brings together the data and the Convolutional Neural Network (CNN). This is problematic when the user wants to preserve the privacy of the data and the service provider does not want to reveal the weights of its CNN. Secure Inference allows the two parties to engage in a protocol that preserves their respective privacy concerns, while revealing only the inference result to the user. This is known as Multi-Party Computation (MPC). A major bottleneck of MPC algorithms is communication, as the parties must send data back and forth. The linear component of a CNN (i.e., convolutions) can be computed efficiently with minimal communication, but the non-linear part (i.e., ReLU) requires the bulk of the communication bandwidth. We propose two ways to accelerate Secure Inference. The first is based on the observation that the ReLU outcomes of many convolutions are highly correlated. Therefore, we replace the per-pixel ReLU operation by a per-patch ReLU operation. Each layer in the network will benefit from a patch of a different size, and we devise an algorithm to choose the optimal set of patch sizes through a novel reduction of the problem to a knapsack problem. The second way to accelerate Secure Inference is based on cutting the number of bit comparisons required for a secure ReLU operation. We demonstrate the cumulative effect of these tools in the semi-honest secure 3-party setting for four problems: Classifying ImageNet using ResNet50 backbone, classifying CIFAR100 using ResNet18 backbone, semantic segmentation of ADE20K using MobileNetV2 backbone and semantic segmentation of Pascal VOC 2012 using ResNet50 backbone. Our source code is publicly available: $\href{https://github.com/yg320/secure_inference}{\text{https://github.com/yg320/secure_inference}}$

Keyword: pruning

Model Pruning Enables Localized and Efficient Federated Learning for Yield Forecasting and Data Sharing

  • Authors: Andy Li, Milan Markovic, Peter Edwards, Georgios Leontidis
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09876
  • Pdf link: https://arxiv.org/pdf/2304.09876
  • Abstract
    Federated Learning (FL) presents a decentralized approach to model training in the agri-food sector and offers the potential for improved machine learning performance, while ensuring the safety and privacy of individual farms or data silos. However, the conventional FL approach has two major limitations. First, the heterogeneous data on individual silos can cause the global model to perform well for some clients but not all, as the update direction on some clients may hinder others after they are aggregated. Second, the conventional approach is inefficient with respect to communication costs during FL and large model sizes. This paper proposes a new technical solution that utilizes network pruning on client models and aggregates the pruned models. This method enables local models to be tailored to their respective data distributions and mitigates the data heterogeneity present in agri-food data. Moreover, it allows for more compact models that consume less data during transmission. We experiment with a soybean yield forecasting dataset and find that this approach can improve inference performance by 15.5% to 20% compared to FedAvg, while reducing local model sizes by up to 84% and the data volume communicated between the clients and the server by 57.1% to 64.7%.
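
A small numpy sketch of the core mechanics described above: magnitude-prune each client model and aggregate only the surviving entries on the server. The pruning rule and averaging scheme here are simplifications, not the paper's exact algorithm.

```python
import numpy as np

def prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Magnitude pruning: zero out the smallest-|w| fraction of the weights."""
    thresh = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= thresh, weights, 0.0)

def aggregate(client_weights: list[np.ndarray]) -> np.ndarray:
    """Average each entry only over the clients that actually kept it,
    so sparse client models do not drag shared entries toward zero."""
    stacked = np.stack(client_weights)
    mask = stacked != 0
    counts = np.maximum(mask.sum(axis=0), 1)     # avoid dividing by zero
    return stacked.sum(axis=0) / counts

rng = np.random.default_rng(0)
clients = [prune(rng.normal(size=8), sparsity=0.5) for _ in range(3)]
print(aggregate(clients))
```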

Keyword: voxel

Tetra-NeRF: Representing Neural Radiance Fields Using Tetrahedra

  • Authors: Jonas Kulhanek, Torsten Sattler
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09987
  • Pdf link: https://arxiv.org/pdf/2304.09987
  • Abstract
    Neural Radiance Fields (NeRFs) are a very recent and very popular approach for the problems of novel view synthesis and 3D reconstruction. A popular scene representation used by NeRFs is to combine a uniform, voxel-based subdivision of the scene with an MLP. Based on the observation that a (sparse) point cloud of the scene is often available, this paper proposes to use an adaptive representation based on tetrahedra and a Delaunay representation instead of the uniform subdivision or point-based representations. We show that such a representation enables efficient training and leads to state-of-the-art results. Our approach elegantly combines concepts from 3D geometry processing, triangle-based rendering, and modern neural radiance fields. Compared to voxel-based representations, ours provides more detail around parts of the scene likely to be close to the surface. Compared to point-based representations, our approach achieves better performance.

Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering

  • Authors: Dongting Hu, Zhenkai Zhang, Tingbo Hou, Tongliang Liu, Huan Fu, Mingming Gong
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10075
  • Pdf link: https://arxiv.org/pdf/2304.10075
  • Abstract
    The rendering scheme in neural radiance field (NeRF) is effective in rendering a pixel by casting a ray into the scene. However, NeRF yields blurred rendering results when the training images are captured at non-uniform scales, and produces aliasing artifacts if the test images are taken in distant views. To address this issue, Mip-NeRF proposes a multiscale representation as a conical frustum to encode scale information. Nevertheless, this approach is only suitable for offline rendering since it relies on integrated positional encoding (IPE) to query a multilayer perceptron (MLP). To overcome this limitation, we propose mip voxel grids (Mip-VoG), an explicit multiscale representation with a deferred architecture for real-time anti-aliasing rendering. Our approach includes a density Mip-VoG for scene geometry and a feature Mip-VoG with a small MLP for view-dependent color. Mip-VoG encodes scene scale using the level of detail (LOD) derived from ray differentials and uses quadrilinear interpolation to map a queried 3D location to its features and density from two neighboring downsampled voxel grids. To our knowledge, our approach is the first to offer multiscale training and real-time anti-aliasing rendering simultaneously. We conducted experiments on multiscale datasets, and the results show that our approach outperforms state-of-the-art real-time rendering baselines.

Keyword: lidar

LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields

  • Authors: Tang Tao, Longfei Gao, Guangrun Wang, Peng Chen, Dayang Hao, Xiaodan Liang, Mathieu Salzmann, Kaicheng Yu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10406
  • Pdf link: https://arxiv.org/pdf/2304.10406
  • Abstract
    We introduce a new task, novel view synthesis for LiDAR sensors. While traditional model-based LiDAR simulators with style-transfer neural networks can be applied to render novel views, they fall short in producing accurate and realistic LiDAR patterns, because the renderers they rely on exploit game engines, which are not differentiable. We address this by formulating, to the best of our knowledge, the first differentiable LiDAR renderer, and propose an end-to-end framework, LiDAR-NeRF, leveraging a neural radiance field (NeRF) to enable jointly learning the geometry and the attributes of 3D points. To evaluate the effectiveness of our approach, we establish an object-centric multi-view LiDAR dataset, dubbed NeRF-MVL. It contains observations of objects from 9 categories seen from 360-degree viewpoints captured with multiple LiDAR sensors. Our extensive experiments on the scene-level KITTI-360 dataset, and on our object-level NeRF-MVL show that our LiDAR-NeRF surpasses the model-based algorithms significantly.

Keyword: diffusion

Using Text-to-Image Generation for Architectural Design Ideation

  • Authors: Ville Paananen, Jonas Oppenlaender, Aku Visuri
  • Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10182
  • Pdf link: https://arxiv.org/pdf/2304.10182
  • Abstract
    The recent progress of text-to-image generation has been recognized in architectural design. Our study is the first to investigate the potential of text-to-image generators in supporting creativity during the early stages of the architectural design process. We conducted a laboratory study with 17 architecture students, who developed a concept for a culture center using three popular text-to-image generators: Midjourney, Stable Diffusion, and DALL-E. Through standardized questionnaires and group interviews, we found that image generation could be a meaningful part of the design process when design constraints are carefully considered. Generative tools support serendipitous discovery of ideas and an imaginative mindset, enriching the design process. We identified several challenges of image generators and provided considerations for software development and educators to support creativity and emphasize designers' imaginative mindset. By understanding the limitations and potential of text-to-image generators, architects and designers can leverage this technology in their design process and education, facilitating innovation and effective communication of concepts.

A data augmentation perspective on diffusion models and retrieval

  • Authors: Max F. Burg, Florian Wenzel, Dominik Zietlow, Max Horn, Osama Makansi, Francesco Locatello, Chris Russell
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10253
  • Pdf link: https://arxiv.org/pdf/2304.10253
  • Abstract
    Diffusion models excel at generating photorealistic images from text-queries. Naturally, many approaches have been proposed to use these generative abilities to augment training datasets for downstream tasks, such as classification. However, diffusion models are themselves trained on large noisily supervised, but nonetheless, annotated datasets. It is an open question whether the generalization capabilities of diffusion models beyond using the additional data of the pre-training process for augmentation lead to improved downstream performance. We perform a systematic evaluation of existing methods to generate images from diffusion models and study new extensions to assess their benefit for data augmentation. While we find that personalizing diffusion models towards the target data outperforms simpler prompting strategies, we also show that using the training data of the diffusion model alone, via a simple nearest neighbor retrieval procedure, leads to even stronger downstream performance. Overall, our study probes the limitations of diffusion models for data augmentation but also highlights its potential in generating new training data to improve performance on simple downstream vision tasks.

Anything-3D: Towards Single-view Anything Reconstruction in the Wild

  • Authors: Qiuhong Shen, Xingyi Yang, Xinchao Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10261
  • Pdf link: https://arxiv.org/pdf/2304.10261
  • Abstract
    3D reconstruction from a single-RGB image in unconstrained real-world scenarios presents numerous challenges due to the inherent diversity and complexity of objects and environments. In this paper, we introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model to elevate objects to 3D, yielding a reliable and versatile system for single-view conditioned 3D reconstruction task. Our approach employs a BLIP model to generate textural descriptions, utilizes the Segment-Anything model for the effective extraction of objects of interest, and leverages a text-to-image diffusion model to lift object into a neural radiance field. Demonstrating its ability to produce accurate and detailed 3D reconstructions for a wide array of objects, Anything-3D shows promise in addressing the limitations of existing methodologies. Through comprehensive experiments and evaluations on various datasets, we showcase the merits of our approach, underscoring its potential to contribute meaningfully to the field of 3D reconstruction. Demos and code will be available at \href{https://github.com/Anything-of-anything/Anything-3D}{https://github.com/Anything-of-anything/Anything-3D}.

Prediction of the evolution of the nuclear reactor core parameters using artificial neural network

  • Authors: Krzysztof Palmi, Wojciech Kubinski, Piotr Darnowski
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10337
  • Pdf link: https://arxiv.org/pdf/2304.10337
  • Abstract
    A nuclear reactor based on the MIT BEAVRS benchmark was used as a typical power-generating Pressurized Water Reactor (PWR). The PARCS v3.2 nodal-diffusion core simulator was used as a full-core reactor physics solver to emulate the operation of a reactor and to generate training and validation data for the ANN. The ANN was implemented with dedicated Python 3.8 code with Google's TensorFlow 2.0 library. The effort was based to a large extent on the appropriate automatic transformation of the data generated by the PARCS simulator, which was later used in the development of the ANN. Various methods that allow obtaining better accuracy of the ANN-predicted results were studied, such as trying different ANN architectures to find the optimal number of neurons in the hidden layers of the network. Results were later compared with the architectures proposed in the literature. For the selected best architecture, predictions were made for different core parameters and their dependence on core loading patterns. In this study, a special focus was put on the prediction of the fuel cycle length for a given core loading pattern, as it can be considered one of the targets for economic plant operation. For instance, the length of a single fuel cycle depending on the initial core loading pattern was predicted with very good accuracy (>99%). This work contributes to the exploration of the usefulness of neural networks in solving nuclear reactor design problems. Thanks to the application of ANNs, designers can avoid using an excessive number of core simulator runs and more rapidly explore the space of possible solutions before performing more detailed design considerations.
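
In the abstract's own stack (Python with TensorFlow/Keras), a minimal regression network of this kind might look as follows; the input size, synthetic data, and architecture are placeholders, since the real features would come from PARCS outputs.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for PARCS-derived features: a flattened core loading
# pattern (e.g., assembly enrichments) mapped to a fuel cycle length in days.
rng = np.random.default_rng(0)
X = rng.uniform(2.0, 5.0, size=(1000, 56))
y = 300 + 40 * X.mean(axis=1) + rng.normal(0, 2, 1000)   # fake cycle lengths

model = tf.keras.Sequential([
    tf.keras.Input(shape=(56,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),                 # predicted cycle length (days)
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, batch_size=32, validation_split=0.2, verbose=0)
print(model.evaluate(X[:100], y[:100], verbose=0))   # MSE, illustrative only
```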

Collaborative Diffusion for Multi-Modal Face Generation and Editing

  • Authors: Ziqi Huang, Kelvin C.K. Chan, Yuming Jiang, Ziwei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10530
  • Pdf link: https://arxiv.org/pdf/2304.10530
  • Abstract
    Diffusion models have recently arisen as a powerful generative tool. Despite the great progress, existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is driven by only one modality of condition. To further unleash the users' creativity, it is desirable for the model to be controllable by multiple modalities simultaneously, e.g., generating and editing faces by describing the age (text-driven) while drawing the face shape (mask-driven). In this work, we present Collaborative Diffusion, where pre-trained uni-modal diffusion models collaborate to achieve multi-modal face generation and editing without re-training. Our key insight is that diffusion models driven by different modalities are inherently complementary regarding the latent denoising steps, where bilateral connections can be established upon. Specifically, we propose the dynamic diffuser, a meta-network that adaptively hallucinates multi-modal denoising steps by predicting the spatial-temporal influence functions for each pre-trained uni-modal model. Collaborative Diffusion not only collaborates generation capabilities from uni-modal diffusion models, but also integrates multiple uni-modal manipulations to perform multi-modal editing. Extensive qualitative and quantitative experiments demonstrate the superiority of our framework in both image quality and condition consistency.

Nerfbusters: Removing Ghostly Artifacts from Casually Captured NeRFs

  • Authors: Frederik Warburg, Ethan Weber, Matthew Tancik, Aleksander Holynski, Angjoo Kanazawa
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.10532
  • Pdf link: https://arxiv.org/pdf/2304.10532
  • Abstract
    Casually captured Neural Radiance Fields (NeRFs) suffer from artifacts such as floaters or flawed geometry when rendered outside the camera trajectory. Existing evaluation protocols often do not capture these effects, since they usually only assess image quality at every 8th frame of the training capture. To push forward progress in novel-view synthesis, we propose a new dataset and evaluation procedure, where two camera trajectories are recorded of the scene: one used for training, and the other for evaluation. In this more challenging in-the-wild setting, we find that existing hand-crafted regularizers do not remove floaters nor improve scene geometry. Thus, we propose a 3D diffusion-based method that leverages local 3D priors and a novel density-based score distillation sampling loss to discourage artifacts during NeRF optimization. We show that this data-driven prior removes floaters and improves scene geometry for casual captures.

Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion

  • Authors: Tomas Jakab, Ruining Li, Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10535
  • Pdf link: https://arxiv.org/pdf/2304.10535
  • Abstract
    We present Farm3D, a method to learn category-specific 3D reconstructors for articulated objects entirely from "free" virtual supervision provided by a pre-trained 2D diffusion-based image generator. Recent approaches can learn, given a collection of single-view images of an object category, a monocular network to predict the 3D shape, albedo, illumination and viewpoint of any object occurrence. We propose a framework that uses an image generator like Stable Diffusion to generate virtual training data for learning such a reconstruction network from scratch. Furthermore, we include the diffusion model as a score to further improve learning. The idea is to randomise some aspects of the reconstruction, such as viewpoint and illumination, generate synthetic views of the reconstructed 3D object, and have the 2D network assess the quality of the resulting image, providing feedback to the reconstructor. Unlike distillation-based work, which produces a single 3D asset for each textual prompt in hours, our approach produces a monocular reconstruction network that can output a controllable 3D asset from a given image, real or generated, in only seconds. Our network can be used for analysis, including monocular reconstruction, or for synthesis, generating articulated assets for real-time applications such as video games.

Keyword: dynamic

GeoGraphViz: Geographically Constrained 3D Force-Directed Graph for Knowledge Graph Visualization

  • Authors: Sizhe Wang, Wenwen Li, Zhining Gu
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.09864
  • Pdf link: https://arxiv.org/pdf/2304.09864
  • Abstract
    Knowledge graphs are a key technique for linking and integrating cross-domain data, concepts, tools, and knowledge to enable data-driven analytics. As much of the world's data has become massive in size, visualizing graph entities and their interrelationships intuitively and interactively has become a crucial task for ingesting and better utilizing graph content to support semantic reasoning, discovering hidden knowledge, and better scientific understanding of geophysical and social phenomena. Despite the fact that many such phenomena (e.g., disasters) have clear spatial footprints and geographical properties, their location information is treated only as a textual label in existing graph visualization tools, limiting their capability to reveal the geospatial distribution patterns of the graph nodes. In addition, most graph visualization techniques rely on 2D graph visualization, which constrains the dimensions of information that can be presented and lacks support for graph structure examination from multiple angles. To tackle the above challenges, we developed a novel 3D map-based graph visualization algorithm to enable interactive exploration of graph content and patterns in a spatially explicit manner. The algorithm extends a 3D force-directed graph by integrating a web map, an additional geolocational force, and a force balancing variable that allows for the dynamic adjustment of the 3D graph structure and layout. This mechanism helps create a balanced graph view between the semantic forces among the graph nodes and the attractive force from a geolocation to a graph node. Our solution offers a new perspective on visualizing and understanding spatial entities and events in a knowledge graph.
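
A hedged sketch of the layout mechanism described: each node feels spring-like semantic forces along its edges plus a pull toward its geographic anchor, with a balance weight trading the two off. The force model and constants are illustrative assumptions.

```python
# Hedged sketch: one gradient step of a geo-constrained force-directed layout.
import numpy as np

def layout_step(pos, edges, geo_anchor, balance=0.5, k=0.1, lr=0.01):
    """pos: (n,3) node positions; edges: list of (i,j); geo_anchor: (n,3)."""
    force = np.zeros_like(pos)
    for i, j in edges:                      # semantic attraction along edges
        d = pos[j] - pos[i]
        force[i] += k * d
        force[j] -= k * d
    force += balance * (geo_anchor - pos)   # pull nodes toward their geolocation
    return pos + lr * force

pos = np.random.rand(4, 3)
edges = [(0, 1), (1, 2), (2, 3)]
geo_anchor = np.random.rand(4, 3)           # map coordinates lifted into 3D
for _ in range(100):                        # iterate until the layout settles
    pos = layout_step(pos, edges, geo_anchor)
```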

Robust trajectory tracking for underactuated mechanical systems without velocity measurements

  • Authors: N. Javanmardi, P. Borja, M. J. Yazdanpanah, J. M. A. Scherpen
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.09910
  • Pdf link: https://arxiv.org/pdf/2304.09910
  • Abstract
    In this paper, the notion of contraction is used to solve the trajectory-tracking problem for a class of mechanical systems. Additionally, we propose a dynamic extension to remove velocity measurements from the controller while rejecting matched disturbances. In particular, we propose three control designs stemming from the Interconnection and Damping Assignment Passivity-Based Control approach. The first controller is a tracker that does not require velocity measurements. The second control design solves the trajectory-tracking problem while guaranteeing robustness with respect to matched disturbances. Then, the third approach is a combination of both mentioned controllers. It is shown that all proposed design methods guarantee exponential convergence of the mechanical system to the desired (feasible) trajectory due to the contraction property of the closed-loop system. The applicability of this method is illustrated via the design of a controller for an underactuated mechanical system.

An Intent-based Framework for Vehicular Edge Computing

  • Authors: TianZhang He, Adel N. Toosi, Negin Akbari, Muhammed Tawfiqul Islam, Muhammad Aamir Cheema
  • Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.09916
  • Pdf link: https://arxiv.org/pdf/2304.09916
  • Abstract
    The rapid development of emerging vehicular edge computing (VEC) brings new opportunities and challenges for dynamic resource management. The increasing number of edge data centers, roadside units (RSUs), and network devices, however, makes resource management a complex task in VEC. On the other hand, the exponential growth of service applications and end-users makes corresponding QoS hard to maintain. Intent-Based Networking (IBN), based on Software-Defined Networking, was introduced to provide the ability to automatically handle and manage the networking requirements of different applications. Motivated by the IBN concept, in this paper, we propose a novel approach to jointly orchestrate networking and computing resources based on user requirements. The proposed solution constantly monitors user requirements and dynamically re-configures the system to satisfy desired states of the application. We compared our proposed solution with the state-of-the-art networking embedding algorithms using real-world taxi GPS traces. Results show that our proposed method is significantly faster (up to 95%) and can improve resource utilization (up to 76%) and the acceptance ratio of computing and networking requests with various priorities (up to 71%). We also present a small-scale prototype of the proposed intent management framework to validate our solution.

Improving Urban Flood Prediction using LSTM-DeepLabv3+ and Bayesian Optimization with Spatiotemporal feature fusion

  • Authors: Zuxiang Situ, Qi Wang, Shuai Teng, Wanen Feng, Gongfa Chen, Qianqian Zhou, Guangtao Fu
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09994
  • Pdf link: https://arxiv.org/pdf/2304.09994
  • Abstract
    Deep learning models have become increasingly popular for flood prediction due to their superior accuracy and efficiency compared to traditional methods. However, current machine learning methods often rely on separate spatial or temporal feature analysis and have limitations on the types, number, and dimensions of input data. This study presented a CNN-RNN hybrid feature fusion modelling approach for urban flood prediction, which integrated the strengths of CNNs in processing spatial features and RNNs in analyzing different dimensions of time sequences. This approach allowed for both static and dynamic flood predictions. Bayesian optimization was applied to identify the seven most influential flood-driven factors and determine the best combination strategy. By combining four CNNs (FCN, UNet, SegNet, DeepLabv3+) and three RNNs (LSTM, BiLSTM, GRU), the optimal hybrid model was identified as LSTM-DeepLabv3+. This model achieved the highest prediction accuracy (MAE, RMSE, NSE, and KGE were 0.007, 0.025, 0.973 and 0.755, respectively) under various rainfall input conditions. Additionally, the processing speed was significantly improved, with an inference time of 1.158s (approximately 1/125 of the traditional computation time) compared to the physically-based models.

HTNet: Dynamic WLAN Performance Prediction using Heterogenous Temporal GNN

  • Authors: Hongkuan Zhou, Rajgopal Kannan, Ananthram Swami, Viktor Prasanna
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.10013
  • Pdf link: https://arxiv.org/pdf/2304.10013
  • Abstract
    Predicting the throughput of WLAN deployments is a classic problem that occurs in the design of robust and high-performance WLAN systems. However, due to increasingly complex communication protocols and increasing interference between devices in denser and denser WLAN deployments, traditional methods either have substantial runtime or enormous prediction error and hence cannot be applied in downstream tasks. Recently, Graph Neural Networks have been proven to be powerful graph analytic models and have been broadly applied to various networking problems such as link scheduling and power allocation. In this work, we propose HTNet, a specialized Heterogeneous Temporal Graph Neural Network that extracts features from dynamic WLAN deployments. Analyzing the unique graph structure of WLAN deployment graphs, we show that HTNet achieves the maximum expressive power on each snapshot. Based on a powerful message passing scheme, HTNet requires fewer layers than other GNN-based methods, which entails less supporting data and runtime. To evaluate the performance of HTNet, we prepare six different setups with more than five thousand dense dynamic WLAN deployments that cover a wide range of real-world scenarios. HTNet achieves the lowest prediction error on all six setups, with an average improvement of 25.3% over the state-of-the-art methods.

Topological Guided Actor-Critic Modular Learning of Continuous Systems with Temporal Objectives

  • Authors: Lening Li, Zhentian Qian
  • Subjects: Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.10041
  • Pdf link: https://arxiv.org/pdf/2304.10041
  • Abstract
    This work investigates the formal policy synthesis of continuous-state stochastic dynamic systems given high-level specifications in linear temporal logic. To learn an optimal policy that maximizes the satisfaction probability, we take a product between a dynamic system and the translated automaton to construct a product system on which we solve an optimal planning problem. Since this product system has a hybrid product state space that results in reward sparsity, we introduce a generalized optimal backup order, in reverse topological order, to guide the value backups and accelerate the learning process. We provide the optimality proof for using the generalized optimal backup order in this optimal planning problem. Further, this paper presents an actor-critic reinforcement learning algorithm for when the topological order applies. This algorithm leverages advanced mathematical techniques and enjoys the property of hyperparameter self-tuning. We provide proofs of the optimality and convergence of our proposed reinforcement learning algorithm. We use neural networks to approximate the value function and policy function for the hybrid product state space. Furthermore, we observe that assigning integer numbers to automaton states can impose an unintended ordinal relationship on the value or policy function approximated by neural networks. To break this ordinal relationship, we use an individual neural network for each automaton state's value (policy) function, termed modular learning. We conduct two experiments. First, to show the efficacy of our reinforcement learning algorithm, we compare it with baselines on a classic control task, CartPole. Second, we demonstrate the empirical performance of our formal policy synthesis framework on motion planning of a Dubins car with a temporal specification.

Dynablox: Real-time Detection of Diverse Dynamic Objects in Complex Environments

  • Authors: Lukas Schmid, Olov Andersson, Aurelio Sulser, Patrick Pfreundschuh, Roland Siegwart
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.10049
  • Pdf link: https://arxiv.org/pdf/2304.10049
  • Abstract
    Real-time detection of moving objects is an essential capability for robots acting autonomously in dynamic environments. We thus propose Dynablox, a novel online mapping-based approach for robust moving object detection in complex environments. The central idea of our approach is to incrementally estimate high confidence free-space areas by modeling and accounting for sensing, state estimation, and mapping limitations during online robot operation. The spatio-temporally conservative free space estimate enables robust detection of moving objects without making any assumptions on the appearance of objects or environments. This allows deployment in complex scenes such as multi-storied buildings or staircases, and for diverse moving objects such as people carrying various items, doors swinging or even balls rolling around. We thoroughly evaluate our approach on real-world data sets, achieving 86% IoU at 17 FPS in typical robotic settings. The method outperforms a recent appearance-based classifier and approaches the performance of offline methods. We demonstrate its generality on a novel data set with rare moving objects in complex environments. We make our efficient implementation and the novel data set available as open-source.

Recurrent Transformer for Dynamic Graph Representation Learning with Edge Temporal States

  • Authors: Shengxiang Hu, Guobing Zou, Shiyi Lin, Liangrui Wu, Chenyang Zhou, Bofeng Zhang, Yixin Chen
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10079
  • Pdf link: https://arxiv.org/pdf/2304.10079
  • Abstract
    Dynamic graph representation learning is a growing yet challenging research task, owing to the widespread demand for graph data analysis in real-world applications. Despite the encouraging performance of many recent works that build upon recurrent neural networks (RNNs) and graph neural networks (GNNs), they fail to explicitly model the impact of edge temporal states on node features over time slices. Additionally, they struggle to extract global structural features because of the inherent over-smoothing disadvantage of GNNs, which further restricts performance. In this paper, we propose a recurrent difference graph transformer (RDGT) framework, which first assigns the edges in each snapshot various types and weights to explicitly describe their temporal states, and then employs a structure-reinforced graph transformer to capture temporal node representations through a recurrent learning paradigm. Experimental results on four real-world datasets demonstrate the superiority of RDGT for discrete dynamic graph representation learning, as it consistently outperforms competing methods in dynamic link prediction tasks.

Securing Semantic Communications with Physical-layer Semantic Encryption and Obfuscation

  • Authors: Qi Qin, Yankai Rong, Guoshun Nan, Shaokang Wu, Xuefei Zhang, Qimei Cui, Xiaofeng Tao
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.10147
  • Pdf link: https://arxiv.org/pdf/2304.10147
  • Abstract
    Deep learning based semantic communication (DLSC) systems have shown great potential to make wireless networks significantly more efficient by transmitting only the semantics of the data. However, the open nature of the wireless channel and the fragility of neural models leave DLSC systems extremely vulnerable to various attacks. Traditional wireless physical layer key (PLK) approaches, which rely on the reciprocal channel and randomness characteristics between two legitimate users, hold the promise of securing DLSC. The main challenge lies in generating secret keys in static environments with ultra-low/zero rate. Different from prior efforts that use relays or reconfigurable intelligent surfaces (RIS) to manipulate wireless channels, this paper proposes a novel physical layer semantic encryption scheme by exploring the randomness of bilingual evaluation understudy (BLEU) scores in the field of machine translation, and additionally presents a novel semantic obfuscation mechanism to provide further physical layer protection. Specifically, 1) we calculate the BLEU scores and corresponding weights of the DLSC system. Then, we generate semantic keys (SKey) by feeding the weighted sum of the scores into a hash function. 2) Equipped with the SKey, our proposed subcarrier obfuscation is able to further secure semantic communications with a dynamic dummy data insertion mechanism. Experiments show the effectiveness of our method, especially in the static wireless environment.
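
The key-generation step 1) is concrete enough to sketch: hash a weighted sum of BLEU scores into a semantic key. The scores, weights, and the choice of SHA-256 below are illustrative assumptions.

```python
# Hedged sketch of SKey generation: weighted BLEU scores hashed into a key.
import hashlib

bleu_scores = [0.71, 0.64, 0.58]   # stand-in BLEU scores of the DLSC system
weights = [0.5, 0.3, 0.2]          # stand-in weights

weighted_sum = sum(w * s for w, s in zip(weights, bleu_scores))
skey = hashlib.sha256(f"{weighted_sum:.10f}".encode()).digest()
print(skey.hex())                  # 256-bit semantic key
```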

Automated Dynamic Bayesian Networks for Predicting Acute Kidney Injury Before Onset

  • Authors: David Gordon, Panayiotis Petousis, Anders O. Garlid, Keith Norris, Katherine Tuttle, Susanne B. Nicholas, Alex A.T. Bui (on behalf of CURE-CKD)
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10175
  • Pdf link: https://arxiv.org/pdf/2304.10175
  • Abstract
    Several algorithms for learning the structure of dynamic Bayesian networks (DBNs) require an a priori ordering of variables, which influences the determined graph topology. However, it is often unclear how to determine this order if feature importance is unknown, especially as an exhaustive search is usually impractical. In this paper, we introduce Ranking Approaches for Unknown Structures (RAUS), an automated framework to systematically inform variable ordering and learn networks end-to-end. RAUS leverages existing statistical methods (Cramer's V, the chi-squared test, and information gain) to compare variable orderings, resultant generated network topologies, and DBN performance. RAUS enables end-users with limited DBN expertise to implement models via a command line interface. We evaluate RAUS on the task of predicting impending acute kidney injury (AKI) from inpatient clinical laboratory data. Longitudinal observations from 67,460 patients were collected from our electronic health record (EHR), and Kidney Disease Improving Global Outcomes (KDIGO) criteria were then applied to define AKI events. RAUS learns multiple DBNs simultaneously to predict a future AKI event at different time points (i.e., 24, 48, and 72 hours in advance of AKI). We also compared the results of the learned AKI prediction models and variable orderings to baseline techniques (logistic regression, random forests, and extreme gradient boosting). The DBNs generated by RAUS achieved 73-83% area under the receiver operating characteristic curve (AUCROC) within 24 hours before AKI, and 71-79% AUCROC within 48 hours before AKI of any stage in a 7-day observation window. Insights from this automated framework can help efficiently implement and interpret DBNs for clinical decision support. The source code for RAUS is available on GitHub at https://github.com/dgrdn08/RAUS .
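
A hedged sketch of one of the three ranking statistics RAUS is described as using: Cramer's V between each (discretized) variable and the AKI label, used to order variables. The data, binning, and column names are synthetic stand-ins.

```python
# Hedged sketch: rank candidate variables by Cramer's V against the label.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    table = pd.crosstab(x, y)
    chi2 = chi2_contingency(table)[0]
    n = table.to_numpy().sum()
    r, k = table.shape
    return np.sqrt(chi2 / (n * (min(r, k) - 1)))

df = pd.DataFrame({
    "creatinine_bin": np.random.randint(0, 3, 500),  # hypothetical discretized labs
    "bun_bin": np.random.randint(0, 3, 500),
    "aki": np.random.randint(0, 2, 500),
})
ranking = sorted(
    (c for c in df.columns if c != "aki"),
    key=lambda c: cramers_v(df[c], df["aki"]),
    reverse=True,
)
print(ranking)   # variable ordering fed to DBN structure learning
```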

UAV-based Receding Horizon Control for 3D Inspection Planning

  • Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10201
  • Pdf link: https://arxiv.org/pdf/2304.10201
  • Abstract
    Nowadays, unmanned aerial vehicles (UAVs) are being used for a wide range of tasks, including infrastructure inspection, automated monitoring, and coverage. This paper investigates the problem of 3D inspection planning with an autonomous UAV agent which is subject to dynamical and sensing constraints. We propose a receding horizon 3D inspection planning control approach for generating optimal trajectories which enable an autonomous UAV agent to inspect a finite number of feature-points scattered on the surface of a cuboid-like structure of interest. The inspection planning problem is formulated as a constrained open-loop optimal control problem and is solved using mixed integer programming (MIP) optimization. Quantitative and qualitative evaluation demonstrates the effectiveness of the proposed approach.

Dynamic Security Region of Natural Gas Systems in Integrated Electricity-Gas Systems

  • Authors: Han Gao, Peiyao Zhao, Zhengshuo Li
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10215
  • Pdf link: https://arxiv.org/pdf/2304.10215
  • Abstract
    In an integrated electricity-gas system (IEGS), the tight coupling of power and natural gas systems is embodied by frequent changes in gas withdrawal from gas-fired units to provide regulation services for the power system to handle uncertainty, which may in turn endanger the secure operation of the natural gas system and ultimately affect the safety of the whole IEGS. Hence, it is necessary to accurately and efficiently evaluate the dynamic security region (DSR) of the natural gas system in the IEGS by considering the real-time dynamic characteristics of natural gas systems, which are not satisfactorily handled in state-of-the-art works. To bridge this gap, this paper first conceptually verifies the necessity of the DSR and establishes its mathematical model. Then, a dimensionality reduction method is proposed for the efficient solution and visualization of the high-dimensional DSR evaluation model. A fast evaluation (FE) algorithm is developed to address the difficulties of the nonconvex dynamic constraints in the reduced DSR model. Finally, the necessity and notable advantages of the proposed DSR model and FE are verified on small and relatively large test systems in comparison with common security region models and algorithms. To the best of our knowledge, this is the first paper that comprehensively presents models and efficient algorithms regarding the DSR of natural gas systems in an IEGS.

Filter-Aware Model-Predictive Control

  • Authors: Baris Kayalibay, Atanas Mirchev, Ahmed Agha, Patrick van der Smagt, Justin Bayer
  • Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10246
  • Pdf link: https://arxiv.org/pdf/2304.10246
  • Abstract
    Partially-observable problems pose a trade-off between reducing costs and gathering information. They can be solved optimally by planning in belief space, but that is often prohibitively expensive. Model-predictive control (MPC) takes the alternative approach of using a state estimator to form a belief over the state, and then planning in state space. This ignores potential future observations during planning and, as a result, cannot actively increase or preserve the certainty of its own state estimate. We find a middle ground between planning in belief space and completely ignoring the belief dynamics by reasoning only about the future accuracy of the state estimator. Our approach, filter-aware MPC, penalises the loss of information via what we call "trackability", the expected error of the state estimator. We show that model-based simulation allows condensing trackability into a neural network, which enables fast planning. In experiments involving visual navigation, realistic everyday environments, and a two-link robot arm, we show that filter-aware MPC vastly improves regular MPC.

Learning Representative Trajectories of Dynamical Systems via Domain-Adaptive Imitation

  • Authors: Edgardo Solano-Carrillo, Jannis Stoppe
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.10260
  • Pdf link: https://arxiv.org/pdf/2304.10260
  • Abstract
    Domain-adaptive trajectory imitation is a skill that some predators learn for survival, by mapping dynamic information from one domain (their speed and steering direction) to a different domain (current position of the moving prey). An intelligent agent with this skill could be exploited for a diversity of tasks, including the recognition of abnormal motion in traffic once it has learned to imitate representative trajectories. Towards this direction, we propose DATI, a deep reinforcement learning agent designed for domain-adaptive trajectory imitation using a cycle-consistent generative adversarial method. Our experiments on a variety of synthetic families of reference trajectories show that DATI outperforms baseline methods for imitation learning and optimal control in this setting, keeping the same per-task hyperparameters. Its generalization to a real-world scenario is shown through the discovery of abnormal motion patterns in maritime traffic, opening the door for the use of deep reinforcement learning methods for spatially-unconstrained trajectory data mining.

BackCache: Mitigating Contention-Based Cache Timing Attacks by Hiding Cache Line Evictions

  • Authors: Quancheng Wang, Ming Tang, Han Wang, Yuzhe Gu
  • Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.10268
  • Pdf link: https://arxiv.org/pdf/2304.10268
  • Abstract
    Caches are used to reduce the speed differential between the CPU and memory to improve the performance of modern processors. However, attackers can use contention-based cache timing attacks to steal sensitive information from victim processes through carefully designed cache eviction sets, and L1 data cache attacks are widely exploited, posing a significant privacy and confidentiality threat. Existing hardware-based countermeasures mainly focus on cache partitioning, randomization, and cache line flushing, which unfortunately either incur high overhead or can be circumvented by sophisticated attacks. In this paper, we propose a novel hardware-software co-design called BackCache with the idea of always achieving cache hits instead of cache misses to mitigate contention-based cache timing attacks on the L1 data cache. BackCache places the evicted cache lines from the L1 data cache into a fully-associative backup cache to hide the evictions. To improve the security of BackCache, we introduce a randomly used replacement policy (RURP) and a dynamic backup cache resizing mechanism. We also present a theoretical security analysis to demonstrate the effectiveness of BackCache. Our evaluation on the gem5 simulator shows that BackCache degrades performance by 1.33%, 7.34%, and 7.59% for OS kernel, single-thread, and multi-thread benchmarks, respectively.
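
A hedged, software-level sketch of the BackCache idea: lines evicted from the L1 data cache drop into a small fully-associative backup cache, so a subsequent access is still serviced as a hit and the eviction stays hidden. The sizes and the random replacement (RURP-like) policy are illustrative; the real design is hardware.

```python
# Hedged sketch: simulate an L1 cache whose evictions land in a backup cache.
import random
from collections import OrderedDict

class BackCache:
    def __init__(self, l1_size=4, backup_size=4):
        self.l1 = OrderedDict()   # LRU-ordered L1 data cache
        self.backup = {}          # fully-associative backup cache
        self.l1_size, self.backup_size = l1_size, backup_size

    def access(self, addr):
        if addr in self.l1:
            self.l1.move_to_end(addr)
            return "L1 hit"
        if addr in self.backup:   # hidden eviction: still serviced as a hit
            self.backup.pop(addr)
            self._insert(addr)
            return "backup hit"
        self._insert(addr)
        return "miss"

    def _insert(self, addr):
        if len(self.l1) >= self.l1_size:
            victim, _ = self.l1.popitem(last=False)   # evict LRU line
            if len(self.backup) >= self.backup_size:
                self.backup.pop(random.choice(list(self.backup)))  # random replacement
            self.backup[victim] = True
        self.l1[addr] = True

cache = BackCache()
print([cache.access(a) for a in [1, 2, 3, 4, 5, 1]])  # 1 was evicted but hits backup
```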

Observer-Feedback-Feedforward Controller Structures in Reinforcement Learning

  • Authors: Ruoqi Zhang, Per Mattsson, Torbjörn Wigren
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10276
  • Pdf link: https://arxiv.org/pdf/2304.10276
  • Abstract
    The paper proposes the use of structured neural networks for reinforcement learning based nonlinear adaptive control. The focus is on partially observable systems, with separate neural networks for the state and feedforward observer and for the state feedback and feedforward controller. The observer dynamics are modelled by recurrent neural networks, while a standard network is used for the controller. As discussed in the paper, this separates the observer dynamics into the recurrent neural network part and the state feedback into the feedback and feedforward network. The structured approach reduces computational complexity and gives the reinforcement learning based controller an understandable structure, as compared to when one single neural network is used. As shown by simulation, the proposed structure has the additional and main advantage that training becomes significantly faster. Two ways to include feedforward structure are presented, one related to state feedback control and one related to classical feedforward control. The latter method introduces further structure with a separate recurrent neural network that processes only the measured disturbance. When evaluated with simulation on a nonlinear cascaded double-tank process, the method with the most structure performs best, with excellent feedforward disturbance rejection gains.

Aiding reinforcement learning for set point control

  • Authors: Ruoqi Zhang, Per Mattsson, Torbjörn Wigren
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10289
  • Pdf link: https://arxiv.org/pdf/2304.10289
  • Abstract
    While reinforcement learning has made great improvements, state-of-the-art algorithms can still struggle with seemingly simple set-point feedback control problems. One reason for this is that the learned controller may not be able to excite the system dynamics well enough initially, and therefore it can take a long time to obtain data that is informative enough to learn good control. The paper contributes by augmenting reinforcement learning with a simple guiding feedback controller, for example, a proportional controller. The key advantage in set-point control is a much improved excitation that significantly improves the convergence properties of the reinforcement learning controller. This can be very important in real-world control, where quick and accurate convergence is needed. The proposed method is evaluated with simulation and on a real-world double-tank process, with promising results.
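
A hedged sketch of the proposed augmentation: blend the learned policy's action with a proportional controller on the set-point error, so early exploration already excites the system toward the set point. The gains and blending scheme are illustrative assumptions.

```python
# Hedged sketch: P-controller-guided action for set-point RL.

def guided_action(policy_action, y, y_ref, kp=1.0, beta=0.5):
    """Blend the RL action with a P-controller on the set-point error."""
    u_p = kp * (y_ref - y)                  # guiding proportional action
    return beta * u_p + (1 - beta) * policy_action

u = guided_action(policy_action=0.1, y=0.4, y_ref=1.0)
print(u)  # 0.35: mostly the P-term early on; beta can be annealed toward 0
```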

FIANCEE: Faster Inference of Adversarial Networks via Conditional Early Exits

  • Authors: Polina Karpikova (1 and 2), Radionova Ekaterina (1), Anastasia Yaschenko (1 and 2), Andrei Spiridonov (1), Leonid Kostyushko (3), Riccardo Fabbricatore (1), Aleksei Ivakhnenko (1) ((1) Samsung AI Center, (2) Higher School of Economics, (3) Lomonosov Moscow State University)
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10306
  • Pdf link: https://arxiv.org/pdf/2304.10306
  • Abstract
    Generative DNNs are a powerful tool for image synthesis, but they are limited by their computational load. On the other hand, given a trained model and a task, e.g., face generation within a range of characteristics, the output image quality will be unevenly distributed among images with different characteristics. It follows that we might restrain the model's complexity on some instances while maintaining high quality. We propose a method for diminishing computations by adding so-called early exit branches to the original architecture, and dynamically switching the computational path depending on how difficult it will be to render the output. We apply our method to two different SOTA models performing generative tasks: generation from a semantic map, and cross-reenactment of face expressions, showing it is able to output images with custom lower-quality thresholds. For a threshold of LPIPS <= 0.1, we diminish their computations by up to a half. This is especially relevant for real-time applications such as face synthesis, where quality loss needs to be contained, but most of the inputs need fewer computations than the complex instances.
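
A hedged sketch of conditional early exits: a difficulty estimate routes easy inputs through a cheap exit head and hard inputs through the full path. The modules and the difficulty proxy are illustrative assumptions, not the FIANCEE architecture.

```python
# Hedged sketch: a two-stage generator with an early-exit branch.
import torch
import torch.nn as nn

class EarlyExitGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Conv2d(3, 8, 3, padding=1)
        self.exit1 = nn.Conv2d(8, 3, 1)   # cheap early-exit head
        self.stage2 = nn.Conv2d(8, 8, 3, padding=1)
        self.exit2 = nn.Conv2d(8, 3, 1)   # full-path head

    def forward(self, x, difficulty):
        h = torch.relu(self.stage1(x))
        if difficulty < 0.5:              # easy input: exit early, save compute
            return self.exit1(h)
        h = torch.relu(self.stage2(h))    # hard input: run the full path
        return self.exit2(h)

x = torch.randn(1, 3, 64, 64)
out = EarlyExitGenerator()(x, difficulty=0.3)   # takes the cheap branch
```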

ORIGAMI: A flexible state channels design for public blockchain systems

  • Authors: Lydia Negka, Angeliki Katsika, Georgios Spathoulas, Vassilis Plagianakos
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.10313
  • Pdf link: https://arxiv.org/pdf/2304.10313
  • Abstract
    Public blockchain systems offer security guarantees that cannot be matched by any centralised system. This offering has attracted a lot of interest and has exposed a significant limitation of most blockchain designs with regard to scalability. One of the proposed scaling solutions is state channels, which enable serving given applications with a minimal number of transactions. Existing state channel designs set multiple compatibility requirements for applications to be deployed. Origami is a novel state channel design which removes most of the requirements of existing approaches, while also offering a number of new features. Origami enables dynamic groups of users to interact in an unordered way completely off-chain after an initial on-boarding on-chain transaction. The proposed design is analysed in detail and compared to existing schemes, while a formal security analysis validates the security properties it offers.

Polylog-Competitive Algorithms for Dynamic Balanced Graph Partitioning for Ring Demands

  • Authors: Harald Räcke, Stefan Schmid, Ruslan Zabrodin
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.10350
  • Pdf link: https://arxiv.org/pdf/2304.10350
  • Abstract
    The performance of many large-scale and data-intensive distributed systems critically depends on the capacity of the interconnecting network. This paper is motivated by the vision of self-adjusting infrastructures whose resources can be adjusted according to the workload they currently serve, in a demand-aware manner. Such dynamic adjustments can be exploited to improve network utilization and hence performance, by dynamically moving frequently interacting communication partners closer, e.g., collocating them in the same server or datacenter rack. In particular, we revisit the online balanced graph partitioning problem which captures the fundamental tradeoff between the benefits and costs of dynamically collocating communication partners. The demand is modelled as a sequence $\sigma$ (revealed in an online manner) of communication requests between $n$ processes, each of which is running on one of the $\ell$ servers. Each server has capacity $k=n/\ell$, hence, the processes have to be scheduled in a balanced manner across the servers. A request incurs cost $1$, if the requested processes are located on different servers, otherwise the cost is 0. A process can be migrated to a different server at cost $1$. This paper presents the first online algorithm for online balanced graph partitioning achieving a polylogarithmic competitive ratio for the fundamental case of ring communication patterns. Specifically, our main contribution is a $O(\log^3 n)$-competitive randomized online algorithm for this problem. We further present a randomized online algorithm which is $O(\log^2 n)$-competitive when compared to a static optimal solution. Our two results rely on different algorithms and techniques and hence are of independent interest.

PDL on Steroids: on Expressive Extensions of PDL with Intersection and Converse

  • Authors: Diego Figueira, Santiago Figueira, Edwin Pin
  • Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.10381
  • Pdf link: https://arxiv.org/pdf/2304.10381
  • Abstract
    We introduce CPDL+, a family of expressive logics rooted in Propositional Dynamic Logic (PDL). In terms of expressive power, CPDL+ strictly contains PDL extended with intersection and converse (a.k.a. ICPDL) as well as Conjunctive Queries (CQ), Conjunctive Regular Path Queries (CRPQ), or some known extensions thereof (Regular Queries and CQPDL). We investigate the expressive power, characterization of bisimulation, satisfiability, and model checking for CPDL+. We argue that natural subclasses of CPDL+ can be defined in terms of the tree-width of the underlying graphs of the formulas. We show that the class of CPDL+ formulas of tree-width 2 is equivalent to ICPDL, and that it also coincides with CPDL+ formulas of tree-width 1. However, beyond tree-width 2, incrementing the tree-width strictly increases the expressive power. We characterize the expressive power for every class of fixed tree-width formulas in terms of a bisimulation game with pebbles. Based on this characterization, we show that CPDL+ has a tree-like model property. We prove that the satisfiability problem is decidable in 2ExpTime on fixed tree-width formulas, coinciding with the complexity of ICPDL. We also exhibit classes for which satisfiability is reduced to ExpTime. Finally, we establish that the model checking problem for fixed tree-width formulas is in PTime, contrary to the full class CPDL+.

Multi-label Node Classification On Graph-Structured Data

  • Authors: Tianqi Zhao, Ngan Thi Dong, Alan Hanjalic, Megha Khosla
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10398
  • Pdf link: https://arxiv.org/pdf/2304.10398
  • Abstract
    Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While high label similarity (high homophily) is usually attributed to the success of GNNs, we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, besides defining homophily for the multi-label scenario, we develop a new approach that dynamically fuses the feature and label correlation information to learn label-informed representations. Finally, we perform a large-scale comparative study with $10$ methods and $9$ datasets which also showcase the effectiveness of our approach. We release our benchmark at \url{https://anonymous.4open.science/r/LFLF-5D8C/}.
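
The abstract defines homophily for the multi-label scenario without giving the formula; the sketch below uses one natural candidate, the mean Jaccard similarity of label sets across edges, purely as an illustrative stand-in for the paper's definition.

```python
# Hedged sketch: a multi-label homophily measure via label-set Jaccard similarity.

def multilabel_homophily(edges, labels):
    """edges: list of (u, v); labels: dict node -> set of labels."""
    sims = []
    for u, v in edges:
        a, b = labels[u], labels[v]
        sims.append(len(a & b) / len(a | b) if a | b else 0.0)
    return sum(sims) / len(sims)

labels = {0: {"A", "B"}, 1: {"B"}, 2: {"C"}}
print(multilabel_homophily([(0, 1), (1, 2)], labels))  # (0.5 + 0.0) / 2 = 0.25
```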

Distributed Neural Representation for Reactive in situ Visualization

  • Authors: Qi Wu, Joseph A. Insley, Victor A. Mateevitsi, Silvio Rizzi, Michael E. Papka, Kwan-Liu Ma
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10516
  • Pdf link: https://arxiv.org/pdf/2304.10516
  • Abstract
    In situ visualization and steering of computational modeling can be effectively achieved using reactive programming, which leverages temporal abstraction and data caching mechanisms to create dynamic workflows. However, implementing a temporal cache for large-scale simulations can be challenging. Implicit neural networks have proven effective in compressing large volume data. However, their application to distributed data has yet to be fully explored. In this work, we develop an implicit neural representation for distributed volume data and incorporate it into the DIVA reactive programming system. This implementation enables us to build an in situ temporal caching system with a capacity 100 times larger than previously achieved. We integrate our implementation into the Ascent infrastructure and evaluate its performance using real-world simulations.

A class of mesh-free algorithms for some problems arising in finance and machine learning

  • Authors: Philippe G. LeFloch, Jean-Marc Mercier
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.10521
  • Pdf link: https://arxiv.org/pdf/2304.10521
  • Abstract
    We introduce a numerical methodology, referred to as the transport-based mesh-free method, which allows us to deal with continuous, discrete, or statistical models in the same unified framework, and leads us to a broad class of numerical algorithms recently implemented in a Python library (namely, CodPy). Specifically, we propose a mesh-free discretization technique based on the theory of reproducing kernels and the theory of transport mappings, in a way that is reminiscent of Lagrangian methods in computational fluid dynamics. We introduce kernel-based discretizations of a variety of differential and discrete operators (gradient, divergence, Laplacian, Leray projection, extrapolation, interpolation, polar factorization). The proposed algorithms are nonlinear in nature and enjoy quantitative error estimates based on the notion of discrepancy error, which allows one to evaluate the relevance and accuracy of, both, the given data and the numerical solutions. Our strategy is relevant when a large number of degrees of freedom are present as is the case in mathematical finance and machine learning. We consider the Fokker-Planck-Kolmogorov system (relevant for problems arising in finance and material dynamics) and a class of neural networks based on support vector machines.
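
A hedged sketch of a reproducing-kernel interpolation step of the kind the transport-based mesh-free method builds on; the CodPy library's actual API is not used here, and the Gaussian kernel and regularization are illustrative choices.

```python
# Hedged sketch: kernel interpolation over scattered (mesh-free) nodes.
import numpy as np

def gauss_kernel(X, Y, scale=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * scale**2))

X = np.random.rand(50, 2)               # scattered nodes
f = np.sin(X[:, 0]) + np.cos(X[:, 1])   # sampled function values
K = gauss_kernel(X, X)
alpha = np.linalg.solve(K + 1e-8 * np.eye(len(X)), f)  # kernel coefficients

X_new = np.random.rand(5, 2)
f_new = gauss_kernel(X_new, X) @ alpha  # mesh-free interpolation at new points
```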

Collaborative Diffusion for Multi-Modal Face Generation and Editing

  • Authors: Ziqi Huang, Kelvin C.K. Chan, Yuming Jiang, Ziwei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10530
  • Pdf link: https://arxiv.org/pdf/2304.10530
  • Abstract
    Diffusion models have recently emerged as a powerful generative tool. Despite this great progress, existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is driven by only one modality of condition. To further unleash users' creativity, it is desirable for the model to be controllable by multiple modalities simultaneously, e.g., generating and editing faces by describing the age (text-driven) while drawing the face shape (mask-driven). In this work, we present Collaborative Diffusion, where pre-trained uni-modal diffusion models collaborate to achieve multi-modal face generation and editing without re-training. Our key insight is that diffusion models driven by different modalities are inherently complementary regarding the latent denoising steps, upon which bilateral connections can be established. Specifically, we propose the dynamic diffuser, a meta-network that adaptively hallucinates multi-modal denoising steps by predicting the spatial-temporal influence functions for each pre-trained uni-modal model. Collaborative Diffusion not only combines the generation capabilities of uni-modal diffusion models, but also integrates multiple uni-modal manipulations to perform multi-modal editing. Extensive qualitative and quantitative experiments demonstrate the superiority of our framework in both image quality and condition consistency.

New submissions for Tue, 2 May 23

Keyword: efficient

Fair Distribution of Delivery Orders

  • Authors: Hadi Hosseini, Shivika Narang, Tomasz Wąs
  • Subjects: Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2305.00040
  • Pdf link: https://arxiv.org/pdf/2305.00040
  • Abstract
    We initiate the study of fair distribution of delivery tasks among a set of agents wherein delivery jobs are placed along the vertices of a graph. Our goal is to fairly distribute delivery costs (modeled as a submodular function) among a fixed set of agents while satisfying some desirable notions of economic efficiency. We adapt well-established fairness concepts, such as envy-freeness up to one item (EF1) and minimax share (MMS), to our setting and show that fairness is often incompatible with the efficiency notion of social optimality. Yet, we characterize instances that admit fair and socially optimal solutions by exploiting graph structures. We further show that achieving fairness along with Pareto optimality is computationally intractable. Nonetheless, we design an XP algorithm (parameterized by the number of agents) for finding MMS and Pareto optimal solutions on every instance, and show that the same algorithm can be modified to find efficient solutions along with EF1, when such solutions exist. We complement our theoretical results by experimentally analyzing the price of fairness on randomly generated graph structures.
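
A hedged sketch of checking envy-freeness up to one item (EF1) in a cost setting. The paper models delivery costs as submodular; additive costs are used below purely to keep the illustration short.

```python
# Hedged sketch: EF1 check for an additive-cost allocation of delivery jobs.
from itertools import combinations

def is_ef1(bundles, cost):
    """bundles: list of item lists per agent; cost: dict item -> cost."""
    totals = [sum(cost[i] for i in b) for b in bundles]
    for a, b in combinations(range(len(bundles)), 2):
        for i, j in ((a, b), (b, a)):
            # agent i envies agent j's smaller cost share unless removing one
            # item from i's own bundle removes the envy
            if totals[i] > totals[j]:
                if not bundles[i] or totals[i] - max(cost[k] for k in bundles[i]) > totals[j]:
                    return False
    return True

cost = {"o1": 3, "o2": 2, "o3": 2}
print(is_ef1([["o1"], ["o2", "o3"]], cost))  # True: envy vanishes up to one item
```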

Click-Feedback Retrieval

  • Authors: Zeyu Wang, Yu Wu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00052
  • Pdf link: https://arxiv.org/pdf/2305.00052
  • Abstract
    Retrieving target information based on an input query is of fundamental importance in many real-world applications. In practice, it is not uncommon for the initial search to fail, in which case additional feedback information is needed to guide the search process. In this work, we study a setting where the feedback is provided through users clicking to like or dislike search results. We believe this form of feedback is of great practical interest for its convenience and efficiency. To facilitate future work in this direction, we construct a new benchmark termed click-feedback retrieval based on a large-scale dataset in the fashion domain. We demonstrate that incorporating click-feedback can drastically improve retrieval performance, which validates the value of the proposed setting. We also introduce several methods to utilize click-feedback during training, and show that click-feedback-guided training can significantly enhance retrieval quality. We hope further exploration in this direction can bring new insights into building more efficient and user-friendly search engines.
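
A hedged sketch of one way click feedback could steer retrieval: a Rocchio-style query update that moves the query embedding toward liked results and away from disliked ones. The weights and embedding setup are illustrative assumptions, not the benchmark's training-time methods.

```python
# Hedged sketch: re-rank an embedding-based retrieval result from clicks.
import numpy as np

def refine_query(q, liked, disliked, alpha=1.0, beta=0.75, gamma=0.25):
    q = alpha * q
    if len(liked):
        q = q + beta * np.mean(liked, axis=0)      # move toward liked results
    if len(disliked):
        q = q - gamma * np.mean(disliked, axis=0)  # move away from disliked ones
    return q / np.linalg.norm(q)

gallery = np.random.randn(100, 32)
q = np.random.randn(32)
liked, disliked = gallery[:2], gallery[5:7]        # user clicks from round one
q2 = refine_query(q, liked, disliked)
ranking = np.argsort(-gallery @ q2)                # re-ranked retrieval list
```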

The Kolmogorov N-width for linear transport: Exact representation and the influence of the data

  • Authors: Florian Arbes, Constantin Greif, Karsten Urban
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2305.00066
  • Pdf link: https://arxiv.org/pdf/2305.00066
  • Abstract
    The Kolmogorov $N$-width describes the best possible error one can achieve by elements of an $N$-dimensional linear space. Its decay has extensively been studied in Approximation Theory and for the solution of Partial Differential Equations (PDEs). Particular interest has occurred within Model Order Reduction (MOR) of parameterized PDEs, e.g., by the Reduced Basis Method (RBM). While it is known that the $N$-width decays exponentially fast (and thus admits efficient MOR) for certain problems, there are examples of the linear transport and the wave equation, where the decay rate deteriorates to $N^{-1/2}$. On the other hand, it is widely accepted that a smooth parameter dependence admits a fast decay of the $N$-width. However, a detailed analysis of the influence of properties of the data (such as regularity or slope) on the rate of the $N$-width seems to be lacking. In this paper, we use techniques from Fourier Analysis to derive exact representations of the $N$-width in terms of initial and boundary conditions of the linear transport equation modeled by some function $g$ for half-wave symmetric data. For arbitrary functions $g$, we derive bounds and prove that these bounds are sharp. In particular, we prove that the $N$-width decays as $c_r N^{-(r+1/2)}$ for functions $g \in H^r$ in the Sobolev space. Our theoretical investigations are complemented by numerical experiments which confirm the sharpness of our bounds and give additional quantitative insight.

CarGameAR: An Integrated AR Car Game Authoring Interface for Custom-Built Car Programed on Arduino Board

  • Authors: Dang Bui, Wanwan Li, Hong Huang
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2305.00084
  • Pdf link: https://arxiv.org/pdf/2305.00084
  • Abstract
    In this paper, we present CarGameAR: An Integrated AR Car Game Authoring Interface for Custom-Built Car Programed on Arduino Board. The car consists of an Arduino board, an H-bridge, and motors. The objective of the project is to create a system that can move a car in different directions using a computer application. The system uses Unity software to create a virtual environment where the user can control the car using keyboard commands. The car's motion is achieved by sending signals from the computer to the Arduino board, which then drives the motors through the H-bridge. The project provides a cost-effective and efficient way to build a car, which can be used for educational purposes, such as teaching programming. Moreover, this project is not limited to the control of the car through keyboard commands in a virtual environment. The system can be adapted to support augmented reality (AR) technology, providing an even more immersive and engaging user experience. By integrating the car with AR, the user can control the car's motion using physical gestures and movements, adding an extra layer of interactivity to the system. This makes the car an ideal platform for game development in AR, allowing the user to create driving games that blend the physical and virtual worlds seamlessly. Additionally, the car's affordability and ease of construction make it an accessible and valuable tool for teaching programming and principles in a fun and interactive way. Overall, this project demonstrates the versatility and potential of the car system, highlighting the various applications and possibilities it offers for both education and entertainment.

Space reduction techniques for the $3$-wise Kemeny problem

  • Authors: Xuan Kien Phung, Sylvie Hamel
  • Subjects: Discrete Mathematics (cs.DM); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.00140
  • Pdf link: https://arxiv.org/pdf/2305.00140
  • Abstract
    Kemeny's rule is one of the most studied and well-known voting schemes, with various important applications in computational social choice and biology. Recently, Kemeny's rule was generalized via a set-wise approach by Gilbert et al. Following this paradigm, we have shown in (Phung and Hamel, 2023) that the $3$-wise Kemeny voting scheme induced by the $3$-wise Kendall-tau distance presents interesting advantages in comparison with the classical Kemeny rule. While the $3$-wise Kemeny problem, which consists of computing the set of $3$-wise consensus rankings of a voting profile, is NP-hard, we establish in this paper several generalizations of the Major Order Theorems, as obtained in (Milosz and Hamel, 2020) for the classical Kemeny rule, for the $3$-wise Kemeny voting scheme, to achieve a substantial search space reduction by efficiently determining in polynomial time the relative orders of pairs of alternatives. Essentially, our theorems quantify precisely the non-trivial property that if the preference for an alternative over another one in an election is strong enough, not only in the head-to-head competition but even when taking into consideration one or two more alternatives, then the relative order of these two alternatives in every $3$-wise consensus ranking must be as expected. Moreover, we show that the well-known $3/4$-majority rule of Betzler et al. for the classical Kemeny rule is only valid for elections with no more than $5$ alternatives with respect to the $3$-wise Kemeny scheme. Examples are also provided to show that the $3$-wise Kemeny rule is more resistant to manipulation than the classical one.

Learning to Seek: Multi-Agent Online Source Seeking Against Non-Stochastic Disturbances

  • Authors: Bin Du, Kun Qian, Christian Claudel, Dengfeng Sun
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2305.00154
  • Pdf link: https://arxiv.org/pdf/2305.00154
  • Abstract
    This paper proposes to leverage the emerging learning techniques and devise a multi-agent online source seeking algorithm under unknown environment. Of particular significance in our problem setups are: i) the underlying environment is not only unknown, but dynamically changing and also perturbed by two types of non-stochastic disturbances; and ii) a group of agents is deployed and expected to cooperatively seek as many sources as possible. Correspondingly, a new technique of discounted Kalman filter is developed to tackle the non-stochastic disturbances, and a notion of confidence bound in polytope nature is utilized to aid the computation-efficient cooperation among multiple agents. With standard assumptions on the unknown environment as well as the disturbances, our algorithm is shown to achieve sub-linear regrets under the two types of non-stochastic disturbances; both results are comparable to the state-of-the-art. Numerical examples on a real-world pollution monitoring application are provided to demonstrate the effectiveness of our algorithm.
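
A hedged sketch of a "discounted" Kalman filter in the spirit described: old information is forgotten by inflating the prior covariance with a discount factor, which helps track a drifting field under non-stochastic disturbances. The scalar model and the exact discount mechanism are illustrative assumptions.

```python
# Hedged sketch: scalar Kalman update with covariance discounting.
import numpy as np

def discounted_kf_step(x, P, z, R=0.5, discount=0.9):
    P = P / discount        # discount old information (inflate covariance)
    K = P / (P + R)         # Kalman gain for a scalar measurement
    x = x + K * (z - x)     # measurement update
    P = (1 - K) * P
    return x, P

x, P = 0.0, 1.0
for z in np.random.normal(loc=2.0, scale=0.7, size=20):  # noisy source readings
    x, P = discounted_kf_step(x, P, z)
print(x)  # estimate of the (possibly drifting) source intensity
```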

Beyond Prediction: On-street Parking Recommendation using Heterogeneous Graph-based List-wise Ranking

  • Authors: Hanyu Sun, Xiao Huang, Wei Ma
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.00162
  • Pdf link: https://arxiv.org/pdf/2305.00162
  • Abstract
    To provide real-time parking information, existing studies focus on predicting parking availability, which seems an indirect approach to saving drivers' cruising time. In this paper, we propose, for the first time, an on-street parking recommendation (OPR) task to directly recommend a parking space to a driver. To this end, a learn-to-rank (LTR) based OPR model called OPR-LTR is built. Specifically, parking recommendation is closely related to the "turnover events" (state switching between occupied and vacant) of each parking space, and hence we design a highly efficient heterogeneous graph called ESGraph to represent historical and real-time meters' turnover events as well as geographical relations; afterward, a convolution-based event-then-graph network is used to aggregate and update representations of the heterogeneous graph. A ranking model is further utilized to learn a score function that helps recommend a list of ranked parking spots for a specific on-street parking query. The method is verified using on-street parking meter data from Hong Kong and San Francisco. By comparing with two other types of methods, prediction-only and prediction-then-recommendation, the proposed direct-recommendation method achieves satisfactory performance on different metrics. Extensive experiments also demonstrate that the proposed ESGraph and the recommendation model are more efficient in terms of computational efficiency as well as saving drivers' on-street parking time.

Asynchronous Distributed Protocol for Service Provisioning in the Edge-Cloud Continuum

  • Authors: Itamar Cohen, Paolo Giaccone, Carla Fabiana Chiasserini
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2305.00184
  • Pdf link: https://arxiv.org/pdf/2305.00184
  • Abstract
    In the edge-cloud continuum, datacenters provide microservices (MSs) to mobile users, with each MS having specific latency constraints and computational requirements. Deploying such a variety of MSs matching their requirements with the available computing resources is challenging. In addition, time-critical MSs may have to be migrated as the users move, to keep meeting their latency constraints. Unlike previous work relying on a central orchestrator with an always-updated global view of the available resources and of the users' locations, this work envisions a distributed solution to the above issues. In particular, we propose a distributed asynchronous protocol for MS deployment in the cloud-edge continuum that (i) dramatically reduces the system overhead compared to a centralized approach, and (ii) increases the system stability by avoiding having a single point of failure as in the case of a central orchestrator. Our solution ensures cost-efficient feasible placement of MSs, while using negligible bandwidth.

Distributed State Estimation for Linear Time-Varying Systems with Sensor Network Delays

  • Authors: Sanjay Chandrasekaran, Vishnu Varadan, Siva Vignesh Krishnan, Florian Dörfler, Mohammad H. Mamduhi
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.00190
  • Pdf link: https://arxiv.org/pdf/2305.00190
  • Abstract
    Distributed sensor networks often include a multitude of sensors, each measuring parts of a process state space or observing the operations of a system. Communication of measurements between the sensor nodes and estimator(s) cannot realistically be considered delay-free due to communication errors and transmission latency in the channels. We propose a novel stability-based method that mitigates the influence of sensor network delays in distributed state estimation for linear time-varying systems. Our proposed algorithm efficiently selects a subset of sensors from the entire set of sensor nodes in the network based on the desired stability margins of the distributed Kalman filter estimates, after which the state estimates are computed using only the measurements of the selected sensors. We provide comparisons between the estimation performance of our proposed algorithm and a greedy algorithm that exhaustively selects an optimal subset of nodes. We then apply our method to a simulated scenario, estimating the states of a linear time-varying system using a sensor network of 2000 sensor nodes. Simulation results demonstrate the efficiency of our algorithm and show that it closely follows the performance achieved by the optimal greedy search algorithm.

Data-Driven Subgroup Identification for Linear Regression

  • Authors: Zachary Izzo, Ruishan Liu, James Zou
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2305.00195
  • Pdf link: https://arxiv.org/pdf/2305.00195
  • Abstract
    Medical studies frequently require extracting the relationship between each covariate and the outcome with statistical confidence measures. To do this, simple parametric models are frequently used (e.g. coefficients of linear regression) but usually fitted on the whole dataset. However, it is common that the covariates may not have a uniform effect over the whole population, and thus a unified simple model can miss the heterogeneous signal. For example, a linear model may be able to explain a subset of the data but fail on the rest due to the nonlinearity and heterogeneity in the data. In this paper, we propose DDGroup (data-driven group discovery), a data-driven method to effectively identify subgroups in the data with a uniform linear relationship between the features and the label. DDGroup outputs an interpretable region in which the linear model is expected to hold. It is simple to implement and computationally tractable. We show theoretically that, given a large enough sample, DDGroup recovers a region where a single linear model with low variance is well-specified (if one exists), and experiments on real-world medical datasets confirm that it can discover regions where a local linear model has improved performance. Our experiments also show that DDGroup can uncover subgroups with qualitatively different relationships which are missed by simply applying parametric approaches to the whole dataset.
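
    As a rough illustration of the subgroup idea (a toy sketch under our own assumptions, not the DDGroup algorithm itself), one can grow a region from the points best explained by a linear fit and stop when membership stabilizes:

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    def grow_linear_region(X, y, core_frac=0.1, tol=1.5, rounds=10):
        """Grow a subgroup from the points best explained by a linear fit;
        stop when membership stabilizes (toy illustration only)."""
        model = LinearRegression().fit(X, y)
        resid = np.abs(y - model.predict(X))
        core = resid <= np.quantile(resid, core_frac)   # seed: lowest-residual points
        for _ in range(rounds):
            model.fit(X[core], y[core])                 # refit on the current region
            resid = np.abs(y - model.predict(X))
            new_core = resid <= tol * (resid[core].std() + 1e-12)
            if np.array_equal(new_core, core):
                break
            core = new_core
        return core, model
    ```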

Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming

  • Authors: Vignesh V Menon, Jingwen Zhu, Prajit T Rajendran, Hadi Amirpour, Patrick Le Callet, Christian Timmerer
  • Subjects: Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2305.00225
  • Pdf link: https://arxiv.org/pdf/2305.00225
  • Abstract
    In video streaming applications, a fixed set of bitrate-resolution pairs (known as a bitrate ladder) is typically used during the entire streaming session. However, an optimized bitrate ladder per scene may result in (i) decreased storage or delivery costs and/or (ii) increased Quality of Experience. This paper introduces a Just Noticeable Difference (JND)-aware per-scene bitrate ladder prediction scheme (JASLA) for adaptive video-on-demand streaming applications. JASLA predicts jointly optimized resolutions and corresponding constant rate factors (CRFs) using spatial and temporal complexity features for a given set of target bitrates for every scene, which yields an efficient constrained Variable Bitrate encoding. Moreover, bitrate-resolution pairs that yield distortion lower than one JND are eliminated. Experimental results show that, on average, JASLA yields bitrate savings of 34.42% and 42.67% to maintain the same PSNR and VMAF, respectively, compared to the reference HTTP Live Streaming (HLS) bitrate ladder Constant Bitrate encoding using x265 HEVC encoder, where the maximum resolution of streaming is Full HD (1080p). Moreover, a 54.34% average cumulative decrease in storage space is observed.
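
    The JND-based elimination step can be pictured with a small sketch (the rung format and threshold are hypothetical, not the JASLA implementation): a rung is kept only if its quality gain over the previously kept rung is at least one JND.

    ```python
    def prune_ladder(rungs, jnd=1.0):
        """Keep a rung only if its quality exceeds the last kept rung by >= one JND.
        rungs: (bitrate_kbps, resolution, quality) tuples sorted by bitrate."""
        kept = [rungs[0]]
        for rung in rungs[1:]:
            if rung[2] - kept[-1][2] >= jnd:
                kept.append(rung)
        return kept

    ladder = [(1000, "540p", 70.0), (1800, "720p", 70.6), (3000, "1080p", 74.2)]
    print(prune_ladder(ladder))   # the 720p rung adds < 1 JND and is dropped
    ```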

ZIRCON: Zero-watermarking-based approach for data integrity and secure provenance in IoT networks

  • Authors: Omair Faraj, David Megías, Joaquin Garcia-Alfaro
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2305.00266
  • Pdf link: https://arxiv.org/pdf/2305.00266
  • Abstract
    The Internet of Things (IoT) is integrating the Internet and smart devices in almost every domain such as home automation, e-healthcare systems, vehicular networks, industrial control and military applications. In these sectors, sensory data, which is collected from multiple sources and managed through intermediate processing by multiple nodes, is used for decision-making processes. Ensuring data integrity and keeping track of data provenance is a core requirement in such a highly dynamic context, since data provenance is an important tool for the assurance of data trustworthiness. Dealing with such requirements is challenging due to the limited computational and energy resources in IoT networks. This requires addressing several challenges such as processing overhead, secure provenance, bandwidth consumption and storage efficiency. In this paper, we propose ZIRCON, a novel zero-watermarking approach to establish end-to-end data trustworthiness in an IoT network. In ZIRCON, provenance information is stored in a tamper-proof centralized network database through watermarks, generated at source node before transmission. We provide an extensive security analysis showing the resilience of our scheme against passive and active attacks. We also compare our scheme with existing works based on performance metrics such as computational time, energy utilization and cost analysis. The results show that ZIRCON is robust against several attacks, lightweight, storage efficient, and better in energy utilization and bandwidth consumption, compared to prior art.

Path Planning for Multiple Tethered Robots Using Topological Braids

  • Authors: Muqing Cao, Kun Cao, Shenghai Yuan, Kangcheng Liu, Yan Loi Wong, Lihua Xie
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.00271
  • Pdf link: https://arxiv.org/pdf/2305.00271
  • Abstract
    Path planning for multiple tethered robots is a challenging problem due to the complex interactions among the cables and the possibility of severe entanglements. Previous works on this problem either consider idealistic cable models or provide no guarantee for entanglement-free paths. In this work, we present a new approach to address this problem using the theory of braids. By establishing a topological equivalence between the physical cables and the space-time trajectories of the robots, and identifying particular braid patterns that emerge from the entangled trajectories, we obtain the key finding that all complex entanglements stem from a finite number of interaction patterns between 2 or 3 robots. Hence, non-entanglement can be guaranteed by avoiding these interaction patterns in the trajectories of the robots. Based on this finding, we present a graph search algorithm using the permutation grid to efficiently search for a feasible topology of paths and reject braid patterns that result in an entanglement. We demonstrate that the proposed algorithm can achieve 100% goal-reaching capability without entanglement for up to 10 drones with a slack cable model in a high-fidelity simulation platform. The practicality of the proposed approach is verified using three small tethered UAVs in indoor flight experiments.

A spectral method for a Fokker-Planck equation in neuroscience with applications in neural networks with learning rules

  • Authors: Pei Zhang, Yanli Wang, Zhennan Zhou
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2305.00275
  • Pdf link: https://arxiv.org/pdf/2305.00275
  • Abstract
    In this work, we consider the Fokker-Planck equation of the Nonlinear Noisy Leaky Integrate-and-Fire (NNLIF) model for neuron networks. Due to the firing events of neurons at the microscopic level, this Fokker-Planck equation contains dynamic boundary conditions involving specific internal points. To efficiently solve this problem and explore the properties of the unknown solution, we construct a flexible numerical scheme for the Fokker-Planck equation in the framework of spectral methods that can accurately handle the dynamic boundary condition. The scheme is stable for suitable choices of test function spaces, is asymptotic preserving, and is easily extendable to variant models with multiple time scales. We also present extensive numerical examples to verify the scheme properties, including order of convergence and time efficiency, and explore unique properties of the model, including blow-up phenomena for the NNLIF model and learning and discriminative properties for the NNLIF model with learning rules.

NSLF-OL: Online Learning of Neural Surface Light Fields alongside Real-time Incremental 3D Reconstruction

  • Authors: Yijun Yuan, Andreas Nuchter
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.00282
  • Pdf link: https://arxiv.org/pdf/2305.00282
  • Abstract
    Immersive novel view generation is an important technology in the field of graphics and has recently also received attention for operator-based human-robot interaction. However, the training involved is time-consuming, and current evaluations therefore focus mainly on object capture. This limits the usage of related models in the robotics community for 3D reconstruction, since robots (1) usually capture only a very small range of view directions to surfaces, which causes arbitrary predictions in unseen, novel directions, (2) require real-time algorithms, and (3) work with growing scenes, e.g., in robotic exploration. The paper proposes a novel Neural Surface Light Fields model that copes with the small range of view directions while producing good results in unseen directions. Exploiting recent encoding techniques, the training of our model is highly efficient. In addition, we design Multiple Asynchronous Neural Agents (MANA), a universal framework to learn each small region in parallel for large-scale growing scenes. Our model learns Neural Surface Light Fields (NSLF) online, alongside real-time 3D reconstruction, with a sequential data stream as the shared input. In addition to online training, our model also provides real-time rendering after completing the data stream for visualization. We implement experiments using well-known RGBD indoor datasets, showing the high flexibility to embed our model into real-time 3D reconstruction and demonstrating high-fidelity view synthesis for these scenes. The code is available on GitHub.

Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning

  • Authors: Mingyang Wang, Zhenshan Bing, Xiangtong Yao, Shuai Wang, Hang Su, Chenguang Yang, Kai Huang, Alois Knoll
  • Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.00286
  • Pdf link: https://arxiv.org/pdf/2305.00286
  • Abstract
    Meta-reinforcement learning enables artificial agents to learn from related training tasks and adapt to new tasks efficiently with minimal interaction data. However, most existing research is still limited to narrow task distributions that are parametric and stationary, and does not consider out-of-distribution tasks during the evaluation, thus restricting its application. In this paper, we propose MoSS, a context-based Meta-reinforcement learning algorithm based on Self-Supervised task representation learning to address this challenge. We extend meta-RL to broad non-parametric task distributions which have never been explored before, and also achieve state-of-the-art results in non-stationary and out-of-distribution tasks. Specifically, MoSS consists of a task inference module and a policy module. We utilize the Gaussian mixture model for task representation to model the parametric and non-parametric task variations. Additionally, our online adaptation strategy enables the agent to react at the first sign of a task change, thus being applicable in non-stationary tasks. MoSS also exhibits strong generalization robustness in out-of-distribution tasks, which benefits from the reliable and robust task representation. The policy is built on top of an off-policy RL algorithm and the entire network is trained completely off-policy to ensure high sample efficiency. On MuJoCo and Meta-World benchmarks, MoSS outperforms prior works in terms of asymptotic performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization robustness on broad and diverse task distributions.

An Efficient Plane Extraction Approach for Bundle Adjustment on LiDAR Point clouds

  • Authors: Zheng Liu, Fu Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.00287
  • Pdf link: https://arxiv.org/pdf/2305.00287
  • Abstract
    Bundle adjustment (BA) on LiDAR point clouds has been extensively investigated in recent years due to its ability to optimize multiple poses together, resulting in high accuracy and global consistency for the point cloud. However, the accuracy and speed of LiDAR bundle adjustment depend on the quality of plane extraction, which provides point association for LiDAR BA. In this study, we propose a novel and efficient voxel-based approach for plane extraction that is specially designed to provide point association for LiDAR bundle adjustment. To begin, we partition the space into multiple voxels of a fixed size and then split these root voxels based on whether the points are on the same plane, using an octree structure. We also design a novel plane determination method based on principal component analysis (PCA), which segments the points into four even quarters and compares their minimum eigenvalues with that of the initial point cloud. Finally, we adopt a plane merging method to prevent too many small planes from being in a single voxel, which can increase the optimization time required for BA. Our experimental results on the HILTI dataset demonstrate that our approach achieves the best precision and lowest time cost compared to other plane extraction methods.
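
    A minimal numpy sketch of the PCA-based planarity idea follows (our reading of the abstract; the split direction, threshold, and comparison rule are assumptions, not the authors' code):

    ```python
    import numpy as np

    def min_eigenvalue(pts):
        """Smallest covariance eigenvalue of an (N, 3) point set ('thickness')."""
        c = pts - pts.mean(axis=0)
        return float(np.linalg.eigvalsh(c.T @ c / len(pts))[0])

    def is_planar(pts, ratio=2.0):
        """Assumed criterion: each spatial quarter must be about as thin as the
        whole voxel; points are split along the first principal direction."""
        lam_all = min_eigenvalue(pts)
        c = pts - pts.mean(axis=0)
        _, _, vt = np.linalg.svd(c, full_matrices=False)
        order = np.argsort(c @ vt[0])                  # sort along principal axis
        quarters = np.array_split(pts[order], 4)
        return all(min_eigenvalue(q) <= ratio * max(lam_all, 1e-12)
                   for q in quarters if len(q) >= 3)
    ```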

Patent Mining by Extracting Functional Analysis Information Modelled As Graph Structure: A Patent Knowledge-base Collaborative Building Approach

  • Authors: Manal E. Helal
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2305.00309
  • Pdf link: https://arxiv.org/pdf/2305.00309
  • Abstract
    Patents provide a rich source of information about design innovations. Patent mining techniques employ various technologies, such as text mining, machine learning, natural language processing, and ontology-building techniques. An automated graph data modelling method is proposed for extracting functional representations for building a semantic database of patents of mechanical designs. The method has several benefits: the schema-free characteristic of the proposed graph modelling enables the ontology it is based on to evolve and generalise to upper ontologies across technology domains and to specify lower ontologies to more specific domains. Graph modelling benefits from enhanced performance of deep queries across many levels of relationships and interactions and provides efficient storage. Graph modelling also enables visualisation libraries to use the graph data structure immediately, avoiding the need for graph extraction programs from relational databases. Patent/Design comparisons are computed via search queries that count overlaps at different levels and with different weights. This work has produced the PatMine SolidWorks Add-in ©, which compares annotated CAD designs with patents and highlights overlapping design concepts. The patent annotation extracts its functional analysis, representing its structure as geometric feature interactions. Additional features such as full-text search and semantic search of the PatMine patents database are available, and graph analytic methods and machine learning algorithms are enabled and can be implemented as plug-ins in future work. Keywords: Patent Mining; Semantic Analysis; Functional Analysis Diagrams; Graph Data Modelling; Visualisation; Similarity Scoring; Big Data Analytics; Machine Learning; Artificial Intelligence; Natural Language Processing

Optimizing Privacy, Utility and Efficiency in Constrained Multi-Objective Federated Learning

  • Authors: Yan Kang, Hanlin Gu, Xingxing Tang, Yuanqin He, Yuzhu Zhang, Jinnan He, Yuxing Han, Lixin Fan, Qiang Yang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00312
  • Pdf link: https://arxiv.org/pdf/2305.00312
  • Abstract
    Conventionally, federated learning aims to optimize a single objective, typically the utility. However, for a federated learning system to be trustworthy, it needs to simultaneously satisfy multiple/many objectives, such as maximizing model performance, minimizing privacy leakage and training cost, and being robust to malicious attacks. Multi-Objective Optimization (MOO) aiming to optimize multiple conflicting objectives at the same time is quite suitable for solving the optimization problem of Trustworthy Federated Learning (TFL). In this paper, we unify MOO and TFL by formulating the problem of constrained multi-objective federated learning (CMOFL). Under this formulation, existing MOO algorithms can be adapted to TFL straightforwardly. Different from existing CMOFL works focusing on utility, efficiency, fairness, and robustness, we consider optimizing privacy leakage along with utility loss and training cost, the three primary objectives of a TFL system. We develop two improved CMOFL algorithms based on NSGA-II and PSL, respectively, for effectively and efficiently finding Pareto optimal solutions, and we provide theoretical analysis on their convergence. We design specific measurements of privacy leakage, utility loss, and training cost for three privacy protection mechanisms: Randomization, BatchCrypt (An efficient version of homomorphic encryption), and Sparsification. Empirical experiments conducted under each of the three protection mechanisms demonstrate the effectiveness of our proposed algorithms.
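
    At the core of any such constrained multi-objective solver is Pareto dominance over the objective vectors. Below is a small self-contained sketch of non-dominated filtering (a generic building block, not the paper's NSGA-II/PSL variants), using the three objectives named in the abstract as example columns:

    ```python
    import numpy as np

    def pareto_mask(costs):
        """Boolean mask of non-dominated rows; every objective is minimized.
        costs: (n, m), e.g. columns = (privacy leakage, utility loss, training cost)."""
        n = len(costs)
        mask = np.ones(n, dtype=bool)
        for i in range(n):
            # row j dominates row i if j <= i everywhere and j < i somewhere
            dominates_i = (np.all(costs <= costs[i], axis=1)
                           & np.any(costs < costs[i], axis=1))
            if dominates_i.any():
                mask[i] = False
        return mask

    # Example: the middle point is dominated (worse in all three objectives)
    pts = np.array([[0.1, 0.5, 3.0], [0.2, 0.6, 4.0], [0.4, 0.2, 2.0]])
    print(pareto_mask(pts))  # [ True False  True]
    ```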

Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using Corrupted Multimodal Data

  • Authors: Arthur Josi, Mahdi Alehdaghi, Rafael M. O. Cruz, Eric Granger
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00320
  • Pdf link: https://arxiv.org/pdf/2305.00320
  • Abstract
    Visible-infrared person re-identification (V-I ReID) seeks to match images of individuals captured over a distributed network of RGB and IR cameras. The task is challenging due to the significant differences between V and I modalities, especially under real-world conditions, where images are corrupted by, e.g., blur, noise, and weather. Indeed, state-of-the-art V-I ReID models cannot leverage corrupted modality information to sustain a high level of accuracy. In this paper, we propose an efficient model for multimodal V-I ReID -- named Multimodal Middle Stream Fusion (MMSF) -- that preserves modality-specific knowledge for improved robustness to corrupted multimodal images. In addition, three state-of-the-art attention-based multimodal fusion models are adapted to address corrupted multimodal data in V-I ReID, allowing the importance of each modality to be balanced dynamically. Recently, evaluation protocols have been proposed to assess the robustness of ReID models under challenging real-world scenarios. However, these protocols are limited to unimodal V settings. For realistic evaluation of multimodal (and cross-modal) V-I person ReID models, we propose new challenging corrupted datasets for scenarios where V and I cameras are co-located (CL) and not co-located (NCL). Finally, the benefits of our Masking and Local Multimodal Data Augmentation (ML-MDA) strategy are explored to improve the robustness of ReID models to multimodal corruption. Our experiments on clean and corrupted versions of the SYSU-MM01, RegDB, and ThermalWORLD datasets indicate which multimodal V-I ReID models are most likely to perform well in real-world operational conditions. In particular, our ML-MDA is an important strategy for a V-I person ReID system to sustain high accuracy and robustness when processing corrupted multimodal images. Also, our multimodal ReID model MMSF outperforms every method under CL and NCL camera scenarios.

Leveraging Data Mining Algorithms to Recommend Source Code Changes

  • Authors: AmirHossein Naghshzan, Saeed Khalilazar, Pierre Poilane, Olga Baysal, Latifa Guerrouj, Foutse Khomh
  • Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00323
  • Pdf link: https://arxiv.org/pdf/2305.00323
  • Abstract
    Context: Recent research has used data mining to develop techniques that can guide developers through source code changes. To the best of our knowledge, very few studies have investigated data mining techniques and/or compared their results with other algorithms or a baseline. Objectives: This paper proposes an automatic method for recommending source code changes using four data mining algorithms. We not only use these algorithms to recommend source code changes, but we also conduct an empirical evaluation. Methods: Our investigation includes seven open-source projects from which we extracted source change history at the file level. We used four widely used data mining algorithms, i.e., Apriori, FP-Growth, Eclat, and Relim, to compare the algorithms in terms of performance (Precision, Recall and F-measure) and execution time. Results: Our findings provide empirical evidence that while some Frequent Pattern Mining algorithms, such as Apriori, may outperform other algorithms in some cases, the results are not consistent throughout all the software projects, which is more likely due to the nature and characteristics of the studied projects, in particular their change history. Conclusion: Apriori seems appropriate for large-scale projects, whereas Eclat appears to be suitable for small-scale projects. Moreover, FP-Growth seems to be an efficient approach in terms of execution time.

MinMaxLTTB: Leveraging MinMax-Preselection to Scale LTTB

  • Authors: Jeroen Van Der Donckt, Jonas Van Der Donckt, Michael Rademaker, Sofie Van Hoecke
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2305.00332
  • Pdf link: https://arxiv.org/pdf/2305.00332
  • Abstract
    Visualization plays an important role in analyzing and exploring time series data. To facilitate efficient visualization of large datasets, downsampling has emerged as a well-established approach. This work concentrates on LTTB (Largest-Triangle-Three-Buckets), a widely adopted downsampling algorithm for time series data point selection. Specifically, we propose MinMaxLTTB, a two-step algorithm that marks a significant enhancement in the scalability of LTTB. MinMaxLTTB entails the following two steps: (i) the MinMax algorithm preselects a certain ratio of minimum and maximum data points, followed by (ii) applying the LTTB algorithm on only these preselected data points, effectively reducing LTTB's time complexity. The low computational cost of the MinMax algorithm, along with its parallelization capabilities, facilitates efficient preselection of data points. Additionally, the competitive performance of MinMax in terms of visual representativeness also makes it an effective reduction method. Experiments show that MinMaxLTTB outperforms LTTB by more than an order of magnitude in terms of computation time. Furthermore, preselecting a small multiple of the desired output size already provides similar visual representativeness compared to LTTB. In summary, MinMaxLTTB leverages the computational efficiency of MinMax to scale LTTB, without compromising on LTTB's favored visualization properties. The accompanying code and experiments of this paper can be found at https://github.com/predict-idlab/MinMaxLTTB.
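
    A compact numpy sketch of the two-step idea described in the abstract follows (bucket boundaries and the preselection ratio are simplified assumptions; the authors' reference implementation is in the linked repository):

    ```python
    import numpy as np

    def minmax_preselect(x, y, n_pre):
        """Step (i): split the series into n_pre // 2 buckets and keep each
        bucket's vertical min and max samples."""
        keep = set()
        for idx in np.array_split(np.arange(len(x)), n_pre // 2):
            keep.add(int(idx[np.argmin(y[idx])]))
            keep.add(int(idx[np.argmax(y[idx])]))
        return np.array(sorted(keep))

    def lttb(x, y, n_out):
        """Step (ii): classic Largest-Triangle-Three-Buckets (assumes len(x) >> n_out)."""
        n = len(x)
        if n_out >= n or n_out < 3:
            return np.arange(n)
        edges = np.linspace(1, n - 1, n_out - 1).astype(int)  # n_out - 2 inner buckets
        sel = [0]
        for i in range(n_out - 2):
            lo, hi = edges[i], edges[i + 1]
            if i + 2 < len(edges):  # third vertex: mean of the *next* bucket
                cx, cy = x[edges[i+1]:edges[i+2]].mean(), y[edges[i+1]:edges[i+2]].mean()
            else:                   # ...or the final point for the last bucket
                cx, cy = x[-1], y[-1]
            ax, ay = x[sel[-1]], y[sel[-1]]
            # keep the bucket point forming the largest triangle with a and c
            area = np.abs((ax - cx) * (y[lo:hi] - ay) - (ax - x[lo:hi]) * (cy - ay))
            sel.append(lo + int(np.argmax(area)))
        sel.append(n - 1)
        return np.array(sel)

    def minmaxlttb(x, y, n_out, ratio=4):
        pre = minmax_preselect(x, y, ratio * n_out)  # cheap, parallelizable preselection
        return pre[lttb(x[pre], y[pre], n_out)]      # LTTB on the reduced set
    ```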

MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer

  • Authors: Yifang Xu, Yunzhuo Sun, Yang Li, Yilei Shi, Xiaoxiang Zhu, Sidan Du
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.00355
  • Pdf link: https://arxiv.org/pdf/2305.00355
  • Abstract
    With the increasing demand for video understanding, video moment and highlight detection (MHD) has emerged as a critical research topic. MHD aims to localize all moments and predict clip-wise saliency scores simultaneously. Despite progress made by existing DETR-based methods, we observe that these methods coarsely fuse features from different modalities, which weakens the temporal intra-modal context and results in insufficient cross-modal interaction. To address this issue, we propose MH-DETR (Moment and Highlight Detection Transformer) tailored for MHD. Specifically, we introduce a simple yet efficient pooling operator within the uni-modal encoder to capture global intra-modal context. Moreover, to obtain temporally aligned cross-modal features, we design a plug-and-play cross-modal interaction module between the encoder and decoder, seamlessly integrating visual and textual features. Comprehensive experiments on QVHighlights, Charades-STA, Activity-Net, and TVSum datasets show that MH-DETR outperforms existing state-of-the-art methods, demonstrating its effectiveness and superiority. Our code is available at https://github.com/YoucanBaby/MH-DETR.
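
    The plug-and-play cross-modal interaction can be sketched as a standard cross-attention block (a hedged approximation of the module the abstract describes; all dimensions are illustrative):

    ```python
    import torch
    import torch.nn as nn

    class CrossModalBlock(nn.Module):
        """Video clip features attend to text token features."""
        def __init__(self, d=256, heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
            self.norm = nn.LayerNorm(d)

        def forward(self, video, text):            # (B, Nv, d), (B, Nt, d)
            fused, _ = self.attn(query=video, key=text, value=text)
            return self.norm(video + fused)        # residual + layer norm

    clips, words = torch.randn(2, 75, 256), torch.randn(2, 20, 256)
    out = CrossModalBlock()(clips, words)          # -> (2, 75, 256)
    ```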

Electricity Price Prediction for Energy Storage System Arbitrage: A Decision-focused Approach

  • Authors: Linwei Sang, Yinliang Xu, Huan Long, Qinran Hu, Hongbin Sun
  • Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.00362
  • Pdf link: https://arxiv.org/pdf/2305.00362
  • Abstract
    Electricity price prediction plays a vital role in energy storage system (ESS) management. Current prediction models focus on reducing prediction errors but overlook their impact on downstream decision-making. Therefore, this paper proposes a decision-focused electricity price prediction approach for ESS arbitrage to bridge the gap from the downstream optimization model to the prediction model. The decision-focused approach aims at utilizing the downstream arbitrage model for training prediction models. It measures the difference between actual decisions under the predicted price and oracle decisions under the true price, i.e., the decision error, by regret, transforms it into the tractable surrogate regret, and then derives the gradients with respect to the predicted price for training prediction models. Based on the prediction and decision errors, this paper proposes the hybrid loss and corresponding stochastic gradient descent learning method to learn prediction models for prediction and decision accuracy. The case study verifies that the proposed approach can bring greater economic benefits and reduce decision errors by flattening the time distribution of prediction errors, compared to prediction models that only minimize prediction errors.
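
    The structure of such a hybrid loss can be sketched in a few lines of PyTorch. This is a toy under our own assumptions: the smooth "arbitrage" policy below stands in for the paper's actual surrogate-regret construction, which is derived from the ESS arbitrage optimization and is not reproduced here.

    ```python
    import torch

    def toy_policy(prices, k=5.0):
        """Toy smooth 'arbitrage' decision (an assumption, not the paper's model):
        discharge (+) when the price is above its mean, charge (-) otherwise."""
        return torch.tanh(k * (prices - prices.mean()))

    def hybrid_loss(pred, true, alpha=0.5):
        profit_oracle = (toy_policy(true) * true).sum()   # decide and settle on true prices
        profit_actual = (toy_policy(pred) * true).sum()   # decide on pred, settle on true
        surrogate_regret = profit_oracle - profit_actual  # differentiable w.r.t. pred
        return alpha * torch.mean((pred - true) ** 2) + (1 - alpha) * surrogate_regret
    ```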

Edge Learning for Large-Scale Internet of Things With Task-Oriented Efficient Communication

  • Authors: Haihui Xie, Minghua Xia, Peiran Wu, Shuai Wang, H. Vincent Poor
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.00383
  • Pdf link: https://arxiv.org/pdf/2305.00383
  • Abstract
    In the Internet of Things (IoT) networks, edge learning for data-driven tasks provides intelligent applications and services. As the network size becomes large, different users may generate distinct datasets. Thus, to suit multiple edge learning tasks for large-scale IoT networks, this paper performs efficient communication under the task-oriented principle by using the collaborative design of wireless resource allocation and edge learning error prediction. In particular, we start with multi-user scheduling to alleviate co-channel interference in dense networks. Then, we perform optimal power allocation in parallel for different learning tasks. Thanks to the high parallelization of the designed algorithm, extensive experimental results corroborate that the multi-user scheduling and task-oriented power allocation improve the performance of distinct edge learning tasks efficiently compared with the state-of-the-art benchmark algorithms.

Alternately denoising and reconstructing unoriented point sets

  • Authors: Dong Xiao, Zuoqiang Shi, Bin Wang
  • Subjects: Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2305.00391
  • Pdf link: https://arxiv.org/pdf/2305.00391
  • Abstract
    We propose a new strategy to bridge point cloud denoising and surface reconstruction by alternately updating the denoised point clouds and the reconstructed surfaces. In Poisson surface reconstruction, the implicit function is generated by a set of smooth basis functions centered at the octree nodes. When the octree depth is properly selected, the reconstructed surface is a good smooth approximation of the noisy point set. Our method projects the noisy points onto the surface and alternately reconstructs and projects the point set. We use the iterative Poisson surface reconstruction (iPSR) to support unoriented surface reconstruction. Our method iteratively performs iPSR and acts as an outer loop of iPSR. Considering that the octree depth significantly affects the reconstruction results, we propose an adaptive depth selection strategy to ensure an appropriate depth choice. To mitigate oversmoothing near sharp features, we propose a $\lambda$-projection method, which projects the noisy points onto the surface with an individual control coefficient $\lambda_{i}$ for each point. The coefficients are determined through a Voronoi-based feature detection method. Experimental results show that our method achieves high performance in point cloud denoising and unoriented surface reconstruction within different noise scales, and exhibits well-rounded performance on various types of inputs.
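
    The $\lambda$-projection step admits a short vectorized form (a sketch; `project` stands for a hypothetical closest-point query against the current reconstructed surface, and `lam` holds the per-point coefficients from the feature detector):

    ```python
    import numpy as np

    def lambda_project(points, project, lam):
        """p_i <- p_i + lam_i * (proj(p_i) - p_i), with lam_i in [0, 1]."""
        foot = np.array([project(p) for p in points])   # (N, 3) footpoints
        return points + lam[:, None] * (foot - points)
    ```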

Transformer-based Sequence Labeling for Audio Classification based on MFCCs

  • Authors: C. S. Sonali, Chinmayi B S, Ahana Balasubramanian
  • Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2305.00417
  • Pdf link: https://arxiv.org/pdf/2305.00417
  • Abstract
    Audio classification is vital in areas such as speech and music recognition. Feature extraction from the audio signal, such as Mel-Spectrograms and MFCCs, is a critical step in audio classification. These features are transformed into spectrograms for classification. Researchers have explored various techniques, including traditional machine and deep learning methods, to classify spectrograms, but these can be computationally expensive. To simplify this process, a more straightforward approach inspired by sequence classification in NLP can be used. This paper proposes a Transformer-encoder-based model for audio classification using MFCCs. The model was benchmarked against the ESC-50, Speech Commands v0.02 and UrbanSound8k datasets and has shown strong performance, with the highest accuracy of 95.2% obtained upon training the model on the UrbanSound8k dataset. The model consisted of a mere 127,544 total parameters, making it lightweight yet highly efficient at the audio classification task.
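
    A minimal PyTorch sketch of the architecture family described (a Transformer encoder over MFCC frames; all hyperparameters here are illustrative, not the paper's exact configuration):

    ```python
    import torch
    import torch.nn as nn

    class MFCCTransformer(nn.Module):
        """Transformer-encoder classifier over a sequence of MFCC frames."""
        def __init__(self, n_mfcc=40, n_classes=10, d_model=64, n_heads=4, n_layers=2):
            super().__init__()
            self.proj = nn.Linear(n_mfcc, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               dim_feedforward=128, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, n_classes)

        def forward(self, x):                    # x: (batch, frames, n_mfcc)
            h = self.encoder(self.proj(x))
            return self.head(h.mean(dim=1))      # mean-pool over time, then classify

    # Example: a batch of 8 clips, 100 MFCC frames each
    logits = MFCCTransformer()(torch.randn(8, 100, 40))
    ```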

Ortho-Radial Drawing in Near-Linear Time

  • Authors: Yi-Jun Chang
  • Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2305.00425
  • Pdf link: https://arxiv.org/pdf/2305.00425
  • Abstract
    An orthogonal drawing is an embedding of a plane graph into a grid. In a seminal work of Tamassia (SIAM Journal on Computing 1987), a simple combinatorial characterization of angle assignments that can be realized as bend-free orthogonal drawings was established, thereby allowing an orthogonal drawing to be described combinatorially by listing the angles of all corners. The characterization reduces the need to consider certain geometric aspects, such as edge lengths and vertex coordinates, and simplifies the task of graph drawing algorithm design. Barth, Niedermann, Rutter, and Wolf (SoCG 2017) established an analogous combinatorial characterization for ortho-radial drawings, which are a generalization of orthogonal drawings to cylindrical grids. The proof of the characterization is existential and does not result in an efficient algorithm. Niedermann, Rutter, and Wolf (SoCG 2019) later addressed this issue by developing quadratic-time algorithms for both testing the realizability of a given angle assignment as an ortho-radial drawing without bends and constructing such a drawing. In this paper, we further improve the time complexity of these tasks to near-linear time. We establish a new characterization for ortho-radial drawings based on the concept of a good sequence. Using the new characterization, we design a simple greedy algorithm for constructing ortho-radial drawings.

STAR-RIS-Aided Mobile Edge Computing: Computation Rate Maximization with Binary Amplitude Coefficients

  • Authors: Zhenrong Liu, Zongze Li, Miaowen Wen, Yi Gong, Yik-Chung Wu
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.00428
  • Pdf link: https://arxiv.org/pdf/2305.00428
  • Abstract
    In this paper, simultaneously transmitting and reflecting (STAR) reconfigurable intelligent surface (RIS) is investigated in the multi-user mobile edge computing (MEC) system to improve the computation rate. Compared with traditional RIS-aided MEC, STAR-RIS extends the service coverage from half-space to full-space and provides new flexibility for improving the computation rate for end users. However, the STAR-RIS-aided MEC system design is a challenging problem due to the non-smooth and non-convex binary amplitude coefficients with coupled phase shifters. To fill this gap, this paper formulates a computation rate maximization problem via the joint design of the STAR-RIS phase shifts, reflection and transmission amplitude coefficients, the receive beamforming vectors, and energy partition strategies for local computing and offloading. To tackle the discontinuity caused by binary variables, we propose an efficient smoothing-based method to decrease convergence error, in contrast to the conventional penalty-based method, which brings many undesired stationary points and local optima. Furthermore, a fast iterative algorithm is proposed to obtain a stationary point for the joint optimization problem, with each subproblem solved by a low-complexity algorithm, making the proposed design scalable to a massive number of users and STAR-RIS elements. Simulation results validate the strength of the proposed smoothing-based method and show that the proposed fast iterative algorithm achieves a higher computation rate than the conventional method while saving the computation time by at least an order of magnitude. Moreover, the resultant STAR-RIS-aided MEC system significantly improves the computation rate compared to other baseline schemes with conventional reflect-only/transmit-only RIS.

TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation

  • Authors: Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, Xiangnan He
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2305.00447
  • Pdf link: https://arxiv.org/pdf/2305.00447
  • Abstract
    Large Language Models (LLMs) have demonstrated remarkable performance across diverse domains, thereby prompting researchers to explore their potential for use in recommendation systems. Initial attempts have leveraged the exceptional capabilities of LLMs, such as rich knowledge and strong generalization through In-context Learning, which involves phrasing the recommendation task as prompts. Nevertheless, the performance of LLMs in recommendation tasks remains suboptimal due to a substantial disparity between the training tasks for LLMs and recommendation tasks, as well as inadequate recommendation data during pre-training. To bridge the gap, we consider building a Large Recommendation Language Model by tuning LLMs with recommendation data. To this end, we propose an efficient and effective Tuning framework for Aligning LLMs with Recommendation, namely TALLRec. We have demonstrated that the proposed TALLRec framework can significantly enhance the recommendation capabilities of LLMs in the movie and book domains, even with a limited dataset of fewer than 100 samples. Additionally, the proposed framework is highly efficient and can be executed on a single RTX 3090 with LLaMA-7B. Furthermore, the fine-tuned LLM exhibits robust cross-domain generalization. Our code and data are available at https://github.com/SAI990323/TALLRec.
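
    Lightweight tuning of this kind is commonly done with LoRA adapters via the `peft` library; the sketch below shows the general pattern in the spirit of TALLRec (the authors' actual pipeline is in the linked repository; the checkpoint name and hyperparameters are illustrative assumptions):

    ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "huggyllama/llama-7b"                      # assumed LLaMA-7B checkpoint
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)               # freeze base, add small adapters
    model.print_trainable_parameters()                # a fraction of a percent trainable
    # ...then fine-tune on instruction-style prompts such as "Given the user's
    # liked items ..., will the user enjoy <item>? Answer Yes or No."
    ```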

Hypergraphs with Edge-Dependent Vertex Weights: Spectral Clustering based on the 1-Laplacian

  • Authors: Yu Zhu, Boning Li, Santiago Segarra
  • Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2305.00462
  • Pdf link: https://arxiv.org/pdf/2305.00462
  • Abstract
    We propose a flexible framework for defining the 1-Laplacian of a hypergraph that incorporates edge-dependent vertex weights. These weights are able to reflect varying importance of vertices within a hyperedge, thus giving the hypergraph model higher expressivity than homogeneous hypergraphs. We then utilize the eigenvector associated with the second smallest eigenvalue of the hypergraph 1-Laplacian to cluster the vertices. From a theoretical standpoint based on an adequately defined normalized Cheeger cut, this procedure is expected to achieve higher clustering accuracy than that based on the traditional Laplacian. Indeed, we confirm that this is the case using real-world datasets to demonstrate the effectiveness of the proposed spectral clustering approach. Moreover, we show that for a special case within our framework, the corresponding hypergraph 1-Laplacian is equivalent to the 1-Laplacian of a related graph, whose eigenvectors can be computed more efficiently, facilitating adoption on larger datasets.
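
    For context, the classic 2-Laplacian analogue of this procedure is a one-screen sketch: cluster by the sign of the eigenvector for the second smallest eigenvalue (the paper's 1-Laplacian eigenvector requires a specialized solver and is not shown here).

    ```python
    import numpy as np

    def spectral_bipartition(W):
        """Split vertices by the sign of the Fiedler vector of the
        normalized Laplacian; W is a symmetric affinity matrix."""
        d = W.sum(axis=1)
        d_isqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
        L = np.eye(len(W)) - d_isqrt @ W @ d_isqrt
        _, vecs = np.linalg.eigh(L)
        return (vecs[:, 1] >= 0).astype(int)   # eigenvector of 2nd smallest eigenvalue
    ```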

Unified high-order multi-scale method for mechanical behavior simulation and strength prediction of composite plate and shell structures

  • Authors: Ge Bu-Feng, Gao Ming-Yuan, Dong Hao
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2305.00464
  • Pdf link: https://arxiv.org/pdf/2305.00464
  • Abstract
    The complicated mesoscopic configurations of composite plate and shell structures require a huge amount of computational overhead for directly simulating their mechanical problems. In this paper, a unified high-order multi-scale method, which can effectively simulate the mechanical behavior and predict the yield strength of composite plates and shells, is developed. Firstly, through the multiscale asymptotic analysis of multi-scale elastic equations in the orthogonal curvilinear coordinate system, a high-order multi-scale model is established, which can uniformly and effectively analyze the mechanical behavior of composite plate and shell structures. Moreover, the error estimation of the high-order multi-scale solutions is derived. Then, by combining the model with material strength theory, a high-order multi-scale model for the strength prediction of composite plate and shell structures is established. Next, based on the established high-order multi-scale model, a multi-scale algorithm is developed which can not only efficiently and accurately simulate the mechanical behaviors of composite plate and shell structures, but also predict their yield strength. Finally, the effectiveness of the established high-order multi-scale method is verified by extensive numerical experiments. The numerical experimental results indicate that the high-order multi-scale method can more accurately capture the meso-scale oscillatory behaviors of composite plate and shell structures. The unified high-order multi-scale method established in this paper is not only suitable for the prediction of mechanical properties of composite plate and shell structures, but can also be further extended to the prediction of multi-field coupling properties of composite plate and shell structures.

Efficient and accurate nonlinear model reduction via first-order empirical interpolation

  • Authors: Ngoc Cuong Nguyen, Jaime Peraire
  • Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
  • Arxiv link: https://arxiv.org/abs/2305.00466
  • Pdf link: https://arxiv.org/pdf/2305.00466
  • Abstract
    We present a model reduction approach that extends the original empirical interpolation method to enable accurate and efficient reduced basis approximation of parametrized nonlinear partial differential equations (PDEs). In the presence of nonlinearity, the Galerkin reduced basis approximation remains computationally expensive due to the high complexity of evaluating the nonlinear terms, which depends on the dimension of the truth approximation. The empirical interpolation method (EIM) was proposed as a nonlinear model reduction technique to render the complexity of evaluating the nonlinear terms independent of the dimension of the truth approximation. The main idea is to replace any nonlinear term with a reduced basis expansion expressed as a linear combination of pre-computed basis functions and parameter-dependent coefficients. The coefficients are determined efficiently by an inexpensive and stable interpolation procedure. In order to improve the approximation accuracy, we propose a first-order empirical interpolation method (FOEIM) that employs both the nonlinear function and its partial derivatives at selected parameter points to construct the reduced basis expansion of the nonlinear term. Our approach is applied to nonlinear elliptic PDEs and compared to the Galerkin reduced basis approximation and the EIM. Numerical results are presented to demonstrate the performance of the three reduced basis approaches.
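
    For context, the classic EIM greedy selection that FOEIM builds on can be sketched in a few lines (a textbook sketch under our own assumptions; FOEIM additionally uses partial-derivative information at the selected parameter points):

    ```python
    import numpy as np

    def eim(snapshots, m):
        """Greedy classic EIM over snapshot columns: returns m basis vectors
        and m interpolation points (row indices)."""
        j = int(np.argmax(np.max(np.abs(snapshots), axis=0)))  # largest-entry snapshot
        i = int(np.argmax(np.abs(snapshots[:, j])))            # its interpolation point
        Q, pts = [snapshots[:, j] / snapshots[i, j]], [i]
        for _ in range(1, m):
            B = np.stack(Q, axis=1)                            # (n, k) current basis
            coef = np.linalg.solve(B[pts], snapshots[pts])     # interp. coefficients
            resid = snapshots - B @ coef                       # interpolation residuals
            i, j = np.unravel_index(int(np.argmax(np.abs(resid))), resid.shape)
            Q.append(resid[:, j] / resid[i, j])                # normalize at new point
            pts.append(int(i))
        return np.stack(Q, axis=1), pts
    ```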

Posterior Sampling for Deep Reinforcement Learning

  • Authors: Remo Sasso, Michelangelo Conserva, Paulo Rauber
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.00477
  • Pdf link: https://arxiv.org/pdf/2305.00477
  • Abstract
    Despite remarkable successes, deep reinforcement learning algorithms remain sample inefficient: they require an enormous amount of trial and error to find good policies. Model-based algorithms promise sample efficiency by building an environment model that can be used for planning. Posterior Sampling for Reinforcement Learning is such a model-based algorithm that has attracted significant interest due to its performance in the tabular setting. This paper introduces Posterior Sampling for Deep Reinforcement Learning (PSDRL), the first truly scalable approximation of Posterior Sampling for Reinforcement Learning that retains its model-based essence. PSDRL combines efficient uncertainty quantification over latent state space models with a specially tailored continual planning algorithm based on value-function approximation. Extensive experiments on the Atari benchmark show that PSDRL significantly outperforms previous state-of-the-art attempts at scaling up posterior sampling while being competitive with a state-of-the-art (model-based) reinforcement learning method, both in sample efficiency and computational efficiency.

Learned Focused Plenoptic Image Compression with Microimage Preprocessing and Global Attention

  • Authors: Kedeng Tong, Xin Jin, Yuqing Yang, Chen Wang, Jinshi Kang, Fan Jiang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2305.00489
  • Pdf link: https://arxiv.org/pdf/2305.00489
  • Abstract
    Focused plenoptic cameras can record spatial and angular information of the light field (LF) simultaneously with higher spatial resolution relative to traditional plenoptic cameras, which facilitates various applications in computer vision. However, existing plenoptic image compression methods are ineffective on the captured images due to the complex micro-textures generated by the microlens relay imaging and long-distance correlations among the microimages. In this paper, a lossy end-to-end learning architecture is proposed to compress the focused plenoptic images efficiently. First, a data preprocessing scheme is designed according to the imaging principle to remove the sub-aperture image ineffective pixels in the recorded light field and align the microimages to the rectangular grid. Then, the global attention module with large receptive field is proposed to capture the global correlation among the feature maps using pixel-wise vector attention computed in the resampling process. Also, a new image dataset consisting of 1910 focused plenoptic images with content and depth diversity is built to benefit training and testing. Extensive experimental evaluations demonstrate the effectiveness of the proposed approach. It outperforms intra coding of HEVC and VVC by an average of 62.57% and 51.67% bitrate reduction on the 20 preprocessed focused plenoptic images, respectively. Also, it achieves 18.73% bitrate saving and generates perceptually pleasant reconstructions compared to the state-of-the-art end-to-end image compression methods, which benefits the applications of focused plenoptic cameras greatly. The dataset and code are publicly available at https://github.com/VincentChandelier/GACN.

Deep Learning-based Spatio Temporal Facial Feature Visual Speech Recognition

  • Authors: Pangoth Santhosh Kumar, Garika Akshay
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.00552
  • Pdf link: https://arxiv.org/pdf/2305.00552
  • Abstract
    In low-resource computing contexts, such as smartphones and other small devices, both deep learning and machine learning are being used in many identification systems as authentication techniques. The transparent, contactless, and non-invasive nature of these AI-driven face recognition technologies has led to their meteoric rise in popularity in recent years. While they are mostly successful, there are still ways to gain access without permission, for example by using pictures, masks, or glasses. In this research, we present an alternative authentication process that makes use of both facial recognition and an individual's distinctive temporal facial feature motions while they speak a password. Because the suggested methodology allows a password to be specified in any language, it is not limited by language. The suggested model attained an accuracy of 96.1% when tested on the industry-standard MIRACL-VC1 dataset, demonstrating its efficacy as a reliable and powerful solution. In addition to being data-efficient, the suggested technique shows promising outcomes with as few as 10 positive video examples for training the model. The effectiveness of the network's training is further proved via comparisons with other combined facial recognition and lip reading models.

Collective Relational Inference for learning physics-consistent heterogeneous particle interactions

  • Authors: Zhichao Han, Olga Fink, David S. Kammer
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00557
  • Pdf link: https://arxiv.org/pdf/2305.00557
  • Abstract
    Interacting particle systems are ubiquitous in nature and engineering. Revealing particle interaction laws is of fundamental importance but also particularly challenging due to underlying configurational complexities. Recently developed machine learning methods show great potential in discovering pairwise interactions from particle trajectories in homogeneous systems. However, they fail to reveal interactions in heterogeneous systems that are prevalent in reality, where multiple interaction types coexist simultaneously and relational inference is required. Here, we propose a novel probabilistic method for relational inference, which possesses two distinctive characteristics compared to existing methods. First, it infers the interaction types of different edges collectively, and second, it uses a physics-induced graph neural network to learn physics-consistent pairwise interactions. We evaluate the proposed methodology across several benchmark datasets and demonstrate that it is consistent with the underlying physics. Furthermore, we showcase its ability to outperform existing methods in accurately inferring interaction types. In addition, the proposed model is data-efficient and generalizable to large systems when trained on smaller ones, which contrasts with previously proposed solutions. The developed methodology constitutes a key element for the discovery of the fundamental laws that determine macroscopic mechanical properties of particle systems.

Scaling Pareto-Efficient Decision Making Via Offline Multi-Objective RL

  • Authors: Baiting Zhu, Meihua Dang, Aditya Grover
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.00567
  • Pdf link: https://arxiv.org/pdf/2305.00567
  • Abstract
    The goal of multi-objective reinforcement learning (MORL) is to learn policies that simultaneously optimize multiple competing objectives. In practice, an agent's preferences over the objectives may not be known a priori, and hence, we require policies that can generalize to arbitrary preferences at test time. In this work, we propose a new data-driven setup for offline MORL, where we wish to learn a preference-agnostic policy agent using only a finite dataset of offline demonstrations of other agents and their preferences. The key contributions of this work are two-fold. First, we introduce D4MORL, (D)atasets for MORL that are specifically designed for offline settings. It contains 1.8 million annotated demonstrations obtained by rolling out reference policies that optimize for randomly sampled preferences on 6 MuJoCo environments with 2-3 objectives each. Second, we propose Pareto-Efficient Decision Agents (PEDA), a family of offline MORL algorithms that builds and extends Decision Transformers via a novel preference-and-return-conditioned policy. Empirically, we show that PEDA closely approximates the behavioral policy on the D4MORL benchmark and provides an excellent approximation of the Pareto-front with appropriate conditioning, as measured by the hypervolume and sparsity metrics.

RAPID: Autonomous Multi-Agent Racing using Constrained Potential Dynamic Games

  • Authors: Yixuan Jia, Maulik Bhatt, Negar Mehr
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.00579
  • Pdf link: https://arxiv.org/pdf/2305.00579
  • Abstract
    In this work, we consider the problem of autonomous racing with multiple agents where agents must interact closely and influence each other to compete. We model interactions among agents through a game-theoretical framework and propose an efficient algorithm for tractably solving the resulting game in real time. More specifically, we capture interactions among multiple agents through a constrained dynamic game. We show that the resulting dynamic game is an instance of a simple-to-analyze class of games. Namely, we show that our racing game is an instance of a constrained dynamic potential game. An important and appealing property of dynamic potential games is that a generalized Nash equilibrium of the underlying game can be computed by solving a single constrained optimal control problem instead of multiple coupled constrained optimal control problems. Leveraging this property, we show that the problem of autonomous racing is greatly simplified and develop RAPID (autonomous multi-agent RAcing using constrained PotentIal Dynamic games), a racing algorithm that can be solved tractably in real-time. Through simulation studies, we demonstrate that our algorithm outperforms the state-of-the-art approach. We further show the real-time capabilities of our algorithm in hardware experiments.

The MCC approaches the geometric mean of precision and recall as true negatives approach infinity

  • Authors: Jon Crall
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00594
  • Pdf link: https://arxiv.org/pdf/2305.00594
  • Abstract
    The performance of a binary classifier is described by a confusion matrix with four entries: the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The Matthews Correlation Coefficient (MCC), F1, and Fowlkes--Mallows (FM) scores are scalars that summarize a confusion matrix. Both the F1 and FM scores are based on only three of the four entries in the confusion matrix (they ignore TN). In contrast, the MCC takes into account all four entries of the confusion matrix and thus can be seen as providing a more representative picture. However, in object detection problems, the number of true negatives is so large that measuring it is often intractable. Thus we ask, what happens to the MCC as the number of true negatives approaches infinity? This paper provides insight into the relationship between the MCC and FM score by proving that the FM-measure is equal to the limit of the MCC as the number of true negatives approaches infinity.
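
    The limit can be checked directly from the standard definitions: dividing numerator and denominator by TN, the FP·FN term and the finite additive offsets vanish as TN grows, leaving the geometric mean of precision and recall (a derivation sketch consistent with the abstract's claim):

    ```latex
    \mathrm{MCC}
     = \frac{TP \cdot TN - FP \cdot FN}
            {\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
     \;\xrightarrow{\;TN \to \infty\;}\;
     \frac{TP}{\sqrt{(TP+FP)(TP+FN)}}
     = \sqrt{\frac{TP}{TP+FP} \cdot \frac{TP}{TP+FN}}
     = \sqrt{\mathrm{precision} \cdot \mathrm{recall}}
     = \mathrm{FM}.
    ```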

Containerization of a polyglot microservice application using Docker and Kubernetes

  • Authors: Vamsi Krishna Yepuri, Venkata Kalyan Polamarasetty, Shivani Donthi, Ajay Kumar Reddy Gondi
  • Subjects: Software Engineering (cs.SE); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2305.00600
  • Pdf link: https://arxiv.org/pdf/2305.00600
  • Abstract
    This project investigates the benefits of containerization technology in modern software development and deployment. The study emphasizes the advantages of using Kubernetes and Docker in the development process, including the easy packaging and deployment of microservices, efficient resource utilization, faster startup times, and greater scalability and flexibility. The project concludes by proposing a study that involves creating a polyglot microservice application using Java, Python, and JavaScript, containerizing it with Docker, and deploying it in Kubernetes. The study aims to evaluate service discovery and auto-scaling in distributed mode and compare the performance metrics with virtual machines and containers. The results of this study can inform software development teams about the benefits of containerization in modern software development and deployment.

Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation

  • Authors: Tianxiang Hao, Hui Chen, Yuchen Guo, Guiguang Ding
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00603
  • Pdf link: https://arxiv.org/pdf/2305.00603
  • Abstract
    Recently, transformers have shown strong ability as visual feature extractors, surpassing traditional convolution-based models in various scenarios. However, the success of vision transformers largely owes to their capacity to accommodate numerous parameters. As a result, new challenges for adapting large models to downstream tasks arise. On the one hand, classic fine-tuning tunes all parameters in a huge model for every task and thus easily falls into overfitting, leading to inferior performance. On the other hand, on resource-limited devices, fine-tuning stores a full copy of parameters and thus is usually impracticable given the shortage of storage space. However, few works have focused on how to efficiently and effectively transfer knowledge in a vision transformer. Existing methods have not delved into the properties of visual features, leading to inferior performance. Moreover, some of them incur heavy inference costs despite reducing storage. To tackle these problems, we propose consolidator to modify the pre-trained model with the addition of a small set of tunable parameters to temporarily store the task-specific knowledge while freezing the backbone model. Motivated by the success of group-wise convolution, we adopt grouped connections across the features extracted by fully connected layers to construct tunable parts in a consolidator. To further enhance the model's capacity to transfer knowledge under a constrained storage budget and keep inference efficient, we consolidate the parameters in two stages: 1. between adaptation and storage, and 2. between loading and inference. On a series of downstream visual tasks, our consolidator can reach up to 7.56 better accuracy than full fine-tuning with merely 0.35% parameters, and outperform state-of-the-art parameter-efficient tuning methods by a clear margin. Code is available at https://github.com/beyondhtx/Consolidator.

Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding

  • Authors: Yuxi Xie, Kenji Kawaguchi, Yiran Zhao, Xu Zhao, Min-Yen Kan, Junxian He, Qizhe Xie
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00633
  • Pdf link: https://arxiv.org/pdf/2305.00633
  • Abstract
    We propose an effective prompting approach that integrates self-evaluation guidance through stochastic beam search. Our approach explores the reasoning search space using a well-calibrated automatic criterion. This enables an efficient search to produce higher-quality final predictions. With the self-evaluation guided stochastic beam search, we also balance the quality--diversity trade-off in the generation of reasoning chains. This allows our approach to adapt well with majority voting and surpass the corresponding Codex-backboned baselines by $6.34\%$, $9.56\%$, and $5.46\%$ on the GSM8K, AQUA, and StrategyQA benchmarks, respectively, in few-shot accuracy. Analysis of our decompositional reasoning finds it pinpoints logic failures and leads to higher consistency and robustness.
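
A toy sketch of self-evaluation guided stochastic beam search is given below; the candidate generator and evaluator are random stand-ins (assumptions) for the paper's Codex-backed models, and Gumbel-perturbed top-k is one standard way to make the beam selection stochastic.

```python
# Toy sketch: beam scores blend generation log-probs with a self-evaluation
# signal; beams are selected via a Gumbel-perturbed top-k for diversity.
import math
import random

def candidate_steps(chain):
    # Stand-in generator: propose next reasoning steps with LM log-probs.
    return [(f"step{len(chain)}.{i}", math.log(random.uniform(0.1, 1.0)))
            for i in range(4)]

def self_eval(chain, step):
    # Stand-in evaluator: log-confidence that the step is correct.
    return math.log(random.uniform(0.5, 1.0))

def gumbel():
    u = max(random.random(), 1e-12)  # avoid log(0)
    return -math.log(-math.log(u))

def stochastic_beam_search(beam_size=3, depth=4, alpha=0.5, temperature=1.0):
    beams = [([], 0.0)]  # (reasoning chain, accumulated score)
    for _ in range(depth):
        pool = []
        for chain, score in beams:
            for step, lm_lp in candidate_steps(chain):
                # Blend generation log-prob with the self-evaluation signal.
                s = score + (1 - alpha) * lm_lp + alpha * self_eval(chain, step)
                pool.append((chain + [step], s))
        # Gumbel-perturbed top-k keeps the search stochastic and diverse.
        pool.sort(key=lambda cs: -(cs[1] / temperature + gumbel()))
        beams = pool[:beam_size]
    return beams

for chain, score in stochastic_beam_search():
    print(round(score, 3), chain)
```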

GTree: GPU-Friendly Privacy-preserving Decision Tree Training and Inference

  • Authors: Qifan Wang, Shujie Cui, Lei Zhou, Ye Dong, Jianli Bai, Yun Sing Koh, Giovanni Russello
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2305.00645
  • Pdf link: https://arxiv.org/pdf/2305.00645
  • Abstract
    Decision tree (DT) is a widely used machine learning model due to its versatility, speed, and interpretability. However, for privacy-sensitive applications, outsourcing DT training and inference to cloud platforms raises concerns about data privacy. Researchers have developed privacy-preserving approaches for DT training and inference using cryptographic primitives, such as Secure Multi-Party Computation (MPC). While these approaches have shown progress, they still suffer from heavy computation and communication overheads. A few recent works employ Graphics Processing Units (GPUs) to improve the performance of MPC-protected deep learning. This raises a natural question: \textit{can MPC-protected DT training and inference be accelerated by GPU?} We present GTree, the first scheme that uses the GPU to accelerate MPC-protected secure DT training and inference. GTree is built across 3 parties who securely and jointly perform each step of DT training and inference with the GPU. Each MPC protocol in GTree is designed in a GPU-friendly version. The performance evaluation shows that GTree achieves ${\sim}11{\times}$ and ${\sim}21{\times}$ improvements when training on the SPECT and Adult datasets, compared to the prior most efficient CPU-based work. For inference, GTree shows its superior efficiency when the DT has fewer than 10 levels; it is $126\times$ faster than the prior most efficient work when inferring $10^4$ instances with a tree of 7 levels. GTree also achieves a stronger security guarantee than prior solutions, leaking only the tree depth and the size of the data samples, while prior solutions also leak the tree structure. With \textit{oblivious array access}, the access pattern on the GPU is also protected.
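
GTree's GPU protocols are far beyond a snippet, but the MPC primitive such 3-party systems build on is easy to illustrate. Below is a minimal sketch of 3-party additive secret sharing over a prime field; the modulus and API are assumptions for illustration only, not GTree's protocols.

```python
# Minimal additive secret sharing: each party holds one random-looking
# share; no single party learns the secret, yet addition is local.
import secrets

P = 2**61 - 1  # a Mersenne prime as the field modulus (illustrative)

def share(x, parties=3):
    """Split integer x into additive shares that sum to x mod P."""
    shares = [secrets.randbelow(P) for _ in range(parties - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

def add_shared(a_shares, b_shares):
    # Addition of secrets is local: each party adds its own two shares.
    return [(a + b) % P for a, b in zip(a_shares, b_shares)]

a, b = 42, 100
s = add_shared(share(a), share(b))
assert reconstruct(s) == a + b
print(reconstruct(s))
```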

Dynamic Transfer Learning across Graphs

  • Authors: Haohui Wang, Yuzhen Mao, Jianhui Sun, Si Zhang, Dawei Zhou
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00664
  • Pdf link: https://arxiv.org/pdf/2305.00664
  • Abstract
    Transferring knowledge across graphs plays a pivotal role in many high-stake domains, ranging from transportation networks to e-commerce networks, from neuroscience to finance. To date, the vast majority of existing works assume both source and target domains are sampled from a universal and stationary distribution. However, many real-world systems are intrinsically dynamic, where the underlying domains evolve over time. To bridge the gap, we propose to shift the problem to the dynamic setting and ask: given the label-rich source graphs and the label-scarce target graphs observed over the previous T timestamps, how can we effectively characterize the evolving domain discrepancy and optimize the generalization performance of the target domain at the incoming timestamp T+1? To answer this question, for the first time, we propose a generalization bound under the setting of dynamic transfer learning across graphs, which implies that the generalization performance is dominated by domain evolution and by the domain discrepancy between source and target domains. Inspired by the theoretical results, we propose a novel generic framework DyTrans to improve knowledge transferability across dynamic graphs. In particular, we start with a transformer-based temporal encoding module to model temporal information of the evolving domains; then, we further design a dynamic domain unification module to efficiently learn domain-invariant representations across the source and target domains. Finally, extensive experiments on various real-world datasets demonstrate the effectiveness of DyTrans in transferring knowledge from dynamic source domains to dynamic target domains.

On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring

  • Authors: Dylan J. Foster, Dean P. Foster, Noah Golowich, Alexander Rakhlin
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2305.00684
  • Pdf link: https://arxiv.org/pdf/2305.00684
  • Abstract
    A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees, and how these considerations change as we move from few to many agents. We study this question in a general framework for interactive decision making with multiple agents, encompassing Markov games with function approximation and normal-form games with bandit feedback. We focus on equilibrium computation, in which a centralized learning algorithm aims to compute an equilibrium by controlling multiple agents that interact with an unknown environment. Our main contributions are: - We provide upper and lower bounds on the optimal sample complexity for multi-agent decision making based on a multi-agent generalization of the Decision-Estimation Coefficient, a complexity measure introduced by Foster et al. (2021) in the single-agent counterpart to our setting. Compared to the best results for the single-agent setting, our bounds have additional gaps. We show that no "reasonable" complexity measure can close these gaps, highlighting a striking separation between single and multiple agents. - We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making, but with hidden (unobserved) rewards, a framework that subsumes variants of the partial monitoring problem. As a consequence, we characterize the statistical complexity for hidden-reward interactive decision making to the best extent possible. Building on this development, we provide several new structural results, including 1) conditions under which the statistical complexity of multi-agent decision making can be reduced to that of single-agent, and 2) conditions under which the so-called curse of multiple agents can be avoided.

Efficient dynamic model based testing using greedy test case selection

  • Authors: P.H.M. van Spaendonck
  • Subjects: Software Engineering (cs.SE); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2305.00705
  • Pdf link: https://arxiv.org/pdf/2305.00705
  • Abstract
    Model-based testing (MBT) provides an automated approach for finding discrepancies between software models and their implementation. If we want to incorporate MBT into the fast and iterative software development process that is Continuous Integration/Continuous Deployment (CI/CD), then MBT must be able to test the entire model in as little time as possible. However, current academic MBT tools either traverse models at random, which we show to be ineffective for this purpose, or use precalculated optimal paths, which cannot be efficiently calculated for large industrial models. We provide a new traversal strategy that yields an improvement in error-detection rate comparable to using precalculated paths, and we show that the new strategy can be applied efficiently to large models. The benchmarks are performed on a mix of real-world and pseudo-randomly generated models. We observe no significant difference between these two types of models.
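
As a rough illustration of a greedy traversal strategy for model-based testing, the sketch below repeatedly drives the model toward the nearest state with an untested transition; the transition-system encoding (an adjacency dict) is an assumed toy format, not the paper's tooling.

```python
# Greedy test selection over a labelled transition system: always walk
# to the closest state that still has an untested outgoing transition.
from collections import deque

def greedy_traversal(model, start):
    """model: {state: [(label, next_state), ...]}; returns a test trace."""
    untested = {(s, lbl) for s, edges in model.items() for lbl, _ in edges}
    state, trace = start, []
    while untested:
        # BFS for the closest state with an untested outgoing transition.
        parent, frontier, seen, goal = {}, deque([state]), {state}, None
        while frontier:
            s = frontier.popleft()
            if any((s, lbl) in untested for lbl, _ in model.get(s, [])):
                goal = s
                break
            for lbl, t in model.get(s, []):
                if t not in seen:
                    seen.add(t)
                    parent[t] = (s, lbl)
                    frontier.append(t)
        if goal is None:
            break  # everything still untested is unreachable
        # Walk back from the goal to build the connecting path.
        path, s = [], goal
        while s != state:
            s, lbl = parent[s]
            path.append((s, lbl))
        for src, lbl in reversed(path):
            untested.discard((src, lbl))
            trace.append(lbl)
        # Fire one untested transition from the goal state.
        lbl, nxt = next((l, t) for l, t in model[goal] if (goal, l) in untested)
        untested.discard((goal, lbl))
        trace.append(lbl)
        state = nxt
    return trace

model = {
    "idle": [("start", "busy")],
    "busy": [("finish", "idle"), ("abort", "idle")],
}
print(greedy_traversal(model, "idle"))  # e.g. ['start', 'finish', 'start', 'abort']
```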

ZeroSearch: Local Image Search from Text with Zero Shot Learning

  • Authors: Jatin Nainani, Abhishek Mazumdar, Viraj Sheth
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00715
  • Pdf link: https://arxiv.org/pdf/2305.00715
  • Abstract
    The problem of organizing and finding images in a user's directory has become increasingly challenging due to the rapid growth in the number of images captured on personal devices. This paper presents a solution that utilizes zero shot learning to create image queries with only user-provided text descriptions. The paper's primary contribution is the development of an algorithm that utilizes pre-trained models to extract features from images. The algorithm uses OWL to check for the presence of bounding boxes and sorts images based on cosine similarity scores. The algorithm's output is a list of images sorted in descending order of similarity, helping users to locate specific images more efficiently. The paper's experiments were conducted using a custom dataset to simulate a user's image directory and evaluated the accuracy, inference time, and size of the models. The results showed that the VGG model achieved the highest accuracy, while the Resnet50 and InceptionV3 models had the lowest inference time and size. The paper's proposed algorithm provides an effective and efficient solution for organizing and finding images in a user's local directory. The algorithm's performance and flexibility make it suitable for various applications, including personal image organization and search engines. Code and dataset for zero-search are available at: https://github.com/NainaniJatinZ/zero-search
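
The ranking step of such a pipeline is straightforward to sketch: embed the query, normalise, and sort by cosine similarity. In the sketch below, the text encoder and image features are random stand-ins (assumptions) for the pre-trained models used in the paper.

```python
# Cosine-similarity ranking: sort images by the dot product of their
# L2-normalised feature vectors with the normalised query embedding.
import numpy as np

rng = np.random.default_rng(0)

def embed_text(query: str) -> np.ndarray:
    # Stand-in for a pre-trained text encoder (assumption).
    return rng.normal(size=512)

image_paths = ["cat.jpg", "dog.jpg", "car.jpg"]        # toy directory
image_feats = rng.normal(size=(len(image_paths), 512))  # stand-in features

def search(query: str, top_k: int = 3):
    q = embed_text(query)
    q = q / np.linalg.norm(q)
    feats = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    sims = feats @ q                      # cosine similarity per image
    order = np.argsort(-sims)[:top_k]     # descending similarity
    return [(image_paths[i], float(sims[i])) for i in order]

print(search("a photo of a cat"))
```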

Adaptively Topological Tensor Network for Multi-view Subspace Clustering

  • Authors: Yipeng Liu, Yingcong Lu, Weiting Ou, Zhen Long, Ce Zhu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00716
  • Pdf link: https://arxiv.org/pdf/2305.00716
  • Abstract
    Multi-view subspace clustering methods have employed learned self-representation tensors from different tensor decompositions to exploit low-rank information. However, the data structures embedded in self-representation tensors may vary across multi-view datasets. Therefore, a pre-defined tensor decomposition may not fully exploit the low-rank information of a certain dataset, resulting in sub-optimal multi-view clustering performance. To alleviate these limitations, we propose the adaptively topological tensor network (ATTN), which determines the edge ranks from the structural information of the self-representation tensor and thus gives a better tensor representation through a data-driven strategy. Specifically, in multi-view tensor clustering, we analyze the higher-order correlations among different modes of a self-representation tensor, and prune the links of the weakly correlated ones from a fully connected tensor network. Therefore, the newly obtained tensor networks can efficiently explore the essential clustering information of the self-representation, with different tensor structures for various datasets. A greedy adaptive rank-increasing strategy is further applied to improve the capacity to capture low-rank structure. We apply ATTN to multi-view subspace clustering and utilize the alternating direction method of multipliers to solve it. Experimental results show that multi-view subspace clustering based on ATTN outperforms the counterparts on six multi-view datasets.

Emotions Beyond Words: Non-Speech Audio Emotion Recognition With Edge Computing

  • Authors: Ibrahim Malik, Siddique Latif, Sanaullah Manzoor, Muhammad Usama, Junaid Qadir, Raja Jurdak
  • Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2305.00725
  • Pdf link: https://arxiv.org/pdf/2305.00725
  • Abstract
    Non-speech emotion recognition has a wide range of applications including healthcare, crime control and rescue, and entertainment, to name a few. Providing these applications using edge computing has great potential; however, recent studies have focused on speech-emotion recognition using complex architectures. In this paper, a non-speech-based emotion recognition system is proposed, which can rely on edge computing to analyse emotions conveyed through non-speech expressions like screaming and crying. In particular, we explore knowledge distillation to design a computationally efficient system that can be deployed on edge devices with limited resources without degrading the performance significantly. We comprehensively evaluate our proposed framework using two publicly available datasets and highlight its effectiveness by comparing the results with the well-known MobileNet model. Our results demonstrate the feasibility and effectiveness of using edge computing for non-speech emotion detection, which can potentially improve applications that rely on emotion detection in communication networks. To the best of our knowledge, this is the first work on an edge-computing-based framework for detecting emotions in non-speech audio, offering promising directions for future research.
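
The knowledge-distillation objective behind this kind of teacher-to-edge-student compression is compact enough to sketch. Below is a standard Hinton-style distillation loss in PyTorch; the temperature, weighting, and toy logits are illustrative assumptions, not the paper's exact setup.

```python
# Knowledge distillation: cross-entropy on hard labels plus KL divergence
# to the teacher's temperature-softened output distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets: match the teacher's softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradient magnitude, as in Hinton et al.
    # Hard targets: usual cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random logits for a 5-class emotion problem.
s = torch.randn(8, 5, requires_grad=True)  # student outputs
t = torch.randn(8, 5)                      # frozen teacher outputs
y = torch.randint(0, 5, (8,))              # ground-truth labels
loss = distillation_loss(s, t, y)
loss.backward()
print(loss.item())
```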

Breaks and Code Quality: Investigating the Impact of Forgetting on Software Development. A Registered Report

  • Authors: Dario Amoroso d'Aragona, Luca Pascarella, Andrea Janes, Valentina Lenarduzzi, Rafael Penaloza, Davide Taibi
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2305.00760
  • Pdf link: https://arxiv.org/pdf/2305.00760
  • Abstract
    Developers interrupting their participation in a project might slowly forget critical information about the code, such as its intended purpose, structure, the impact of external dependencies, and the approach used for implementation. Forgetting the implementation details can have detrimental effects on software maintenance, comprehension, knowledge sharing, and developer productivity, resulting in bugs and other issues that can negatively influence the software development process. Therefore, it is crucial to ensure that developers have a clear understanding of the codebase and can work efficiently and effectively even after long interruptions. This registered report seeks to investigate the relationship between breaks in a developer's commit activity and different code quality properties, so as to understand whether the amount of activity in a project impacts code quality, and whether developers with different activity profiles show different impacts on code quality. The results might be useful to understand whether it is beneficial to promote the practice of developing multiple projects in parallel, or whether it is more beneficial to reduce the number of projects each developer contributes to.

SGX Switchless Calls Made Configless

  • Authors: Peterson Yuhala, Michael Paper, Timothée Zerbib, Pascal Felber, Valerio Schiavoni, Alain Tchana
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2305.00763
  • Pdf link: https://arxiv.org/pdf/2305.00763
  • Abstract
    Intel's software guard extensions (SGX) provide hardware enclaves to guarantee confidentiality and integrity for sensitive code and data. However, systems leveraging such security mechanisms must often pay high performance overheads. A major source of this overhead is SGX enclave transitions, which induce expensive cross-enclave context switches. The Intel SGX SDK mitigates this with a switchless call mechanism for transitionless cross-enclave calls using worker threads. Intel's SGX switchless call implementation improves performance but provides limited flexibility: developers need to statically fix the system configuration at build time, which is error-prone, and misconfigurations lead to performance degradations and wasted CPU resources. ZC-SWITCHLESS is a configless and efficient technique to drive the execution of SGX switchless calls. Its dynamic approach optimises the number of switchless worker threads at runtime to minimise CPU waste. The experimental evaluation shows that ZC-SWITCHLESS obviates the performance penalty of misconfigured switchless systems while minimising CPU waste.

Montsalvat: Intel SGX Shielding for GraalVM Native Images

  • Authors: Peterson Yuhala, Jämes Ménétrey, Pascal Felber, Valerio Schiavoni, Alain Tchana, Gaël Thomas, Hugo Guiroux, Jean-Pierre Lozi
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2305.00766
  • Pdf link: https://arxiv.org/pdf/2305.00766
  • Abstract
    The popularity of the Java programming language has led to its wide adoption in cloud computing infrastructures. However, Java applications running in untrusted clouds are vulnerable to various forms of privileged attacks. The emergence of trusted execution environments (TEEs) such as Intel SGX mitigates this problem. TEEs protect code and data in secure enclaves inaccessible to untrusted software, including the kernel and hypervisors. To efficiently use TEEs, developers must manually partition their applications into trusted and untrusted parts, in order to reduce the size of the trusted computing base (TCB) and minimise the risks of security vulnerabilities. However, partitioning applications poses two important challenges: (i) ensuring efficient object communication between the partitioned components, and (ii) ensuring the consistency of garbage collection between the parts, especially with memory-managed languages such as Java. We present Montsalvat, a tool which provides a practical and intuitive annotation-based partitioning approach for Java applications destined for secure enclaves. Montsalvat provides an RMI-like mechanism to ensure inter-object communication, as well as consistent garbage collection across the partitioned components. We implement Montsalvat with GraalVM native-image, a tool for compiling Java applications ahead-of-time into standalone native executables that do not require a JVM at runtime. Our extensive evaluation with micro- and macro-benchmarks shows that our partitioning approach boosts performance in real-world applications.

RViDeformer: Efficient Raw Video Denoising Transformer with a Larger Benchmark Dataset

  • Authors: Huanjing Yue, Cong Cao, Lei Liao, Jingyu Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00767
  • Pdf link: https://arxiv.org/pdf/2305.00767
  • Abstract
    In recent years, raw video denoising has garnered increased attention due to the consistency with the imaging process and well-studied noise modeling in the raw domain. However, two problems still hinder the denoising performance. Firstly, there is no large dataset with realistic motions for supervised raw video denoising, as capturing noisy and clean frames for real dynamic scenes is difficult. To address this, we propose recapturing existing high-resolution videos displayed on a 4K screen with high-low ISO settings to construct noisy-clean paired frames. In this way, we construct a video denoising dataset (named ReCRVD) with 120 groups of noisy-clean videos, with ISO values ranging from 1600 to 25600. Secondly, while non-local temporal-spatial attention is beneficial for denoising, it often leads to heavy computation costs. We propose an efficient raw video denoising transformer network (RViDeformer) that explores both short- and long-distance correlations. Specifically, we propose multi-branch spatial and temporal attention modules, which explore the patch correlations from the local window, local low-resolution window, global downsampled window, and neighbor-involved window, and then fuse them together. We employ reparameterization to reduce computation costs. Our network is trained in both supervised and unsupervised manners, achieving the best performance compared with state-of-the-art methods. Additionally, the model trained with our proposed dataset (ReCRVD) outperforms the model trained with the previous benchmark dataset (CRVD) when evaluated on real-world outdoor noisy videos. Our code and dataset will be released after the acceptance of this work.

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

  • Authors: Zhenhui Ye, Jinzheng He, Ziyue Jiang, Rongjie Huang, Jiawei Huang, Jinglin Liu, Yi Ren, Xiang Yin, Zejun Ma, Zhou Zhao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00787
  • Pdf link: https://arxiv.org/pdf/2305.00787
  • Abstract
    Generating talking person portraits with arbitrary speech audio is a crucial problem in the field of digital humans and the metaverse. A modern talking face generation method is expected to achieve the goals of generalized audio-lip synchronization, good video quality, and high system efficiency. Recently, neural radiance field (NeRF) has become a popular rendering technique in this field since it can achieve high-fidelity and 3D-consistent talking face generation with a few-minute-long training video. However, there still exist several challenges for NeRF-based methods: 1) as for lip synchronization, it is hard to generate a long facial motion sequence of high temporal consistency and audio-lip accuracy; 2) as for video quality, due to the limited data used to train the renderer, it is vulnerable to out-of-domain input conditions and occasionally produces bad rendering results; 3) as for system efficiency, the slow training and inference speed of the vanilla NeRF severely obstructs its usage in real-world applications. In this paper, we propose GeneFace++ to handle these challenges by 1) utilizing the pitch contour as an auxiliary feature and introducing a temporal loss in the facial motion prediction process; 2) proposing a landmark locally linear embedding method to regulate the outliers in the predicted motion sequence to avoid robustness issues; 3) designing a computationally efficient NeRF-based motion-to-video renderer that achieves fast training and real-time inference. With these settings, GeneFace++ becomes the first NeRF-based method that achieves stable and real-time talking face generation with generalized audio-lip synchronization. Extensive experiments show that our method outperforms state-of-the-art baselines in terms of subjective and objective evaluation. Video samples are available at https://genefaceplusplus.github.io .

Automated Paper Screening for Clinical Reviews Using Large Language Models

  • Authors: Eddie Guo, Mehul Gupta, Jiawen Deng, Ye-Jean Park, Mike Paget, Christopher Naugler
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.00844
  • Pdf link: https://arxiv.org/pdf/2305.00844
  • Abstract
    Objective: To assess the performance of the OpenAI GPT API in accurately and efficiently identifying relevant titles and abstracts from real-world clinical review datasets and compare its performance against ground truth labelling by two independent human reviewers. Methods: We introduce a novel workflow using the OpenAI GPT API for screening titles and abstracts in clinical reviews. A Python script was created to make calls to the GPT API with the screening criteria in natural language and a corpus of title and abstract datasets that have been filtered by a minimum of two human reviewers. We compared the performance of our model against human-reviewed papers across six review papers, screening over 24,000 titles and abstracts. Results: Our results show an accuracy of 0.91, a sensitivity of excluded papers of 0.91, and a sensitivity of included papers of 0.76. On a randomly selected subset of papers, the GPT API demonstrated the ability to provide reasoning for its decisions and corrected its initial decision upon being asked to explain its reasoning for a subset of incorrect classifications. Conclusion: The GPT API has the potential to streamline the clinical review process, save valuable time and effort for researchers, and contribute to the overall quality of clinical reviews. By prioritizing the workflow and acting as an aid rather than a replacement for researchers and reviewers, the GPT API can enhance efficiency and lead to more accurate and reliable conclusions in medical research.
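
A minimal sketch of such a screening call, assuming the OpenAI Python client, is shown below; the prompt wording, model name, and output convention are illustrative assumptions rather than the paper's exact workflow.

```python
# One screening call per title/abstract pair: the criteria are stated in
# natural language and the model is asked for an INCLUDE/EXCLUDE verdict.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CRITERIA = "Include randomized controlled trials of statin therapy in adults."

def screen(title: str, abstract: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name (assumption)
        messages=[
            {"role": "system",
             "content": "You screen papers for a clinical review. "
                        "Answer with exactly INCLUDE or EXCLUDE, "
                        "then a one-line reason."},
            {"role": "user",
             "content": f"Criteria: {CRITERIA}\n\n"
                        f"Title: {title}\n\nAbstract: {abstract}"},
        ],
        temperature=0,  # deterministic verdicts for reproducibility
    )
    return resp.choices[0].message.content

print(screen("Statins in elderly patients", "We conducted an RCT ..."))
```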

(1+1)-CMA-ES with Margin for Discrete and Mixed-Integer Problems

  • Authors: Yohei Watanabe, Kento Uchida, Ryoki Hamano, Shota Saito, Masahiro Nomura, Shinichi Shirakawa
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2305.00849
  • Pdf link: https://arxiv.org/pdf/2305.00849
  • Abstract
    The covariance matrix adaptation evolution strategy (CMA-ES) is an efficient continuous black-box optimization method. The CMA-ES possesses many attractive features, including invariance properties and a well-tuned default hyperparameter setting. Moreover, several components to specialize the CMA-ES have been proposed, such as noise handling and constraint handling. To utilize these advantages in mixed-integer optimization problems, the CMA-ES with margin has been proposed. The CMA-ES with margin prevents the premature convergence of discrete variables by the margin correction, in which the distribution parameters are modified to leave the generation probability for changing the discrete variable. The margin correction has been applied to ($\mu/\mu_\mathrm{w}$,$\lambda$)-CMA-ES, while this paper introduces the margin correction into (1+1)-CMA-ES, an elitist version of CMA-ES. The (1+1)-CMA-ES is often advantageous for unimodal functions and can be computationally less expensive. To tackle the performance deterioration on mixed-integer optimization, we use the discretized elitist solution as the mean of the sampling distribution and modify the margin correction not to move the elitist solution. The numerical simulation using benchmark functions on mixed-integer, integer, and binary domains shows that (1+1)-CMA-ES with margin outperforms the CMA-ES with margin and is better than or comparable with several methods specialized to particular search domains.
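
For a feel of the problem setting, here is a heavily simplified sketch of an elitist (1+1) evolution strategy on a toy mixed-integer objective, with a crude floor on the integer mutation strength standing in for the margin idea; none of the paper's covariance-matrix machinery is reproduced.

```python
# Simplified (1+1)-ES on a mixed-integer objective. The floor on the
# integer mutation keeps the discrete coordinate from freezing, which
# is the failure mode the margin correction addresses.
import random

def f(x_cont, x_int):
    # Toy mixed-integer objective (minimise); optimum at (1.3, 4).
    return (x_cont - 1.3) ** 2 + (x_int - 4) ** 2

x, z = 0.0, 0          # elitist parent: continuous and integer part
sigma = 1.0            # global step size
for _ in range(200):
    int_step = max(sigma, 0.6)            # floor keeps integer moves alive
    xc = x + random.gauss(0, sigma)       # continuous mutation
    zc = round(z + random.gauss(0, int_step))  # integer mutation via rounding
    if f(xc, zc) <= f(x, z):              # elitist (1+1) acceptance
        x, z = xc, zc
        sigma *= 1.1                      # 1/5-success-style adaptation
    else:
        sigma *= 0.97

print(round(x, 3), z, round(f(x, z), 5))
```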

Multi-Agent Systems with Quantitative Satisficing Goals

  • Authors: Senthil Rajasekaran, Suguman Bansal, Moshe Y. Vardi
  • Subjects: Computer Science and Game Theory (cs.GT); Formal Languages and Automata Theory (cs.FL)
  • Arxiv link: https://arxiv.org/abs/2305.00953
  • Pdf link: https://arxiv.org/pdf/2305.00953
  • Abstract
    In the study of reactive systems, qualitative properties are usually easier to model and analyze than quantitative properties. This is especially true in systems where mutually beneficial cooperation between agents is possible, such as multi-agent systems. The large number of possible payoffs available to agents in reactive systems with quantitative properties means that there are many scenarios in which agents deviate from mutually beneficial outcomes in order to gain negligible payoff improvements. This behavior often leads to less desirable outcomes for all agents involved. For this reason we study satisficing goals, derived from a decision-making approach aimed at meeting a good-enough outcome instead of pure optimization. By considering satisficing goals, we are able to employ efficient automata-based algorithms to find pure-strategy Nash equilibria. We then show that these algorithms extend to scenarios in which agents have multiple thresholds, providing an approximation of optimization while still retaining the possibility of mutually beneficial cooperation and efficient automata-based algorithms. Finally, we demonstrate a one-way correspondence between the existence of $\epsilon$-equilibria and the existence of equilibria in games where agents have multiple thresholds.

A Spectral Algorithm for List-Decodable Covariance Estimation in Relative Frobenius Norm

  • Authors: Ilias Diakonikolas, Daniel M. Kane, Jasper C. H. Lee, Ankit Pensia, Thanasis Pittas
  • Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2305.00966
  • Pdf link: https://arxiv.org/pdf/2305.00966
  • Abstract
    We study the problem of list-decodable Gaussian covariance estimation. Given a multiset $T$ of $n$ points in $\mathbb R^d$ such that an unknown $\alpha<1/2$ fraction of points in $T$ are i.i.d. samples from an unknown Gaussian $\mathcal{N}(\mu, \Sigma)$, the goal is to output a list of $O(1/\alpha)$ hypotheses at least one of which is close to $\Sigma$ in relative Frobenius norm. Our main result is a $\mathrm{poly}(d,1/\alpha)$ sample and time algorithm for this task that guarantees relative Frobenius norm error of $\mathrm{poly}(1/\alpha)$. Importantly, our algorithm relies purely on spectral techniques. As a corollary, we obtain an efficient spectral algorithm for robust partial clustering of Gaussian mixture models (GMMs) -- a key ingredient in the recent work of [BDJ+22] on robustly learning arbitrary GMMs. Combined with the other components of [BDJ+22], our new method yields the first Sum-of-Squares-free algorithm for robustly learning GMMs. At the technical level, we develop a novel multi-filtering method for list-decodable covariance estimation that may be useful in other settings.

Keyword: faster

Neural Network Accelerated Process Design of Polycrystalline Microstructures

  • Authors: Junrong Lin, Mahmudul Hasan, Pinar Acar, Jose Blanchet, Vahid Tarokh
  • Subjects: Computational Engineering, Finance, and Science (cs.CE); Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00003
  • Pdf link: https://arxiv.org/pdf/2305.00003
  • Abstract
    Computational experiments are exploited in finding a well-designed processing path to optimize material structures for desired properties. This requires understanding the interplay between the processing-(micro)structure-property linkages using a multi-scale approach that connects the macro-scale (process parameters) to meso (homogenized properties) and micro (crystallographic texture) scales. Due to the nature of the problem's multi-scale modeling setup, possible processing path choices could grow exponentially as the decision tree becomes deeper, and the traditional simulators' speed reaches a critical computational threshold. To lessen the computational burden for predicting microstructural evolution under given loading conditions, we develop a neural network (NN)-based method with physics-infused constraints. The NN aims to learn the evolution of microstructures under each elementary process. Our method is effective and robust in finding optimal processing paths. In this study, our NN-based method is applied to maximize the homogenized stiffness of a Copper microstructure, and it is found to be 686 times faster while achieving 0.053% error in the resulting homogenized stiffness compared to the traditional finite element simulator on a 10-process experiment.

LAVA: Data Valuation without Pre-Specified Learning Algorithms

  • Authors: Hoang Anh Just, Feiyang Kang, Jiachen T. Wang, Yi Zeng, Myeongseob Ko, Ming Jin, Ruoxi Jia
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2305.00054
  • Pdf link: https://arxiv.org/pdf/2305.00054
  • Abstract
    Traditionally, data valuation is posed as a problem of equitably splitting the validation performance of a learning algorithm among the training data. As a result, the calculated data values depend on many design choices of the underlying learning algorithm. However, this dependence is undesirable for many use cases of data valuation, such as setting priorities over different data sources in a data acquisition process and informing pricing mechanisms in a data marketplace. In these scenarios, data needs to be valued before the actual analysis, when the choice of the learning algorithm is still undetermined. Another side-effect of the dependence is that to assess the value of individual points, one needs to re-run the learning algorithm with and without a point, which incurs a large computation burden. This work leapfrogs over the current limits of data valuation methods by introducing a new framework that can value training data in a way that is oblivious to the downstream learning algorithm. (1) We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between the training and the validation set. We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions. (2) We develop a novel method to value individual data based on the sensitivity analysis of the class-wise Wasserstein distance. Importantly, these values can be directly obtained for free from the output of off-the-shelf optimization solvers when computing the distance. (3) We evaluate our new data valuation framework over various use cases related to detecting low-quality data and show that, surprisingly, the learning-agnostic feature of our framework enables a significant improvement over the state-of-the-art performance while being orders of magnitude faster.
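
The flavour of the signal LAVA builds on can be sketched with a class-wise distributional distance between training and validation sets. Below is a 1-D toy version using SciPy's Wasserstein distance per class; the paper's class-wise Wasserstein distance over full feature spaces is considerably more involved.

```python
# Toy class-wise distributional distance: average the 1-D Wasserstein
# distance between training and validation features within each class.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def classwise_distance(train_x, train_y, val_x, val_y):
    classes = np.unique(val_y)
    dists = [wasserstein_distance(train_x[train_y == c], val_x[val_y == c])
             for c in classes]
    return float(np.mean(dists))

# Toy data: class 1 in training is shifted away from the validation set,
# so the class-wise distance flags a distribution mismatch.
train_x = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
train_y = np.repeat([0, 1], 100)
val_x = np.concatenate([rng.normal(0, 1, 50), rng.normal(0, 1, 50)])
val_y = np.repeat([0, 1], 50)
print(classwise_distance(train_x, train_y, val_x, val_y))
```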

Instruction-ViT: Multi-Modal Prompts for Instruction Learning in ViT

  • Authors: Zhenxiang Xiao, Yuzhong Chen, Lu Zhang, Junjie Yao, Zihao Wu, Xiaowei Yu, Yi Pan, Lin Zhao, Chong Ma, Xinyu Liu, Wei Liu, Xiang Li, Yixuan Yuan, Dinggang Shen, Dajiang Zhu, Tianming Liu, Xi Jiang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00201
  • Pdf link: https://arxiv.org/pdf/2305.00201
  • Abstract
    Prompts have been proven to play a crucial role in large language models, and in recent years, vision models have also been using prompts to improve scalability for multiple downstream tasks. In this paper, we focus on adapting prompt design based on instruction tuning into a visual transformer model for image classification, which we call Instruction-ViT. The key idea is to implement multi-modal prompts (text or image prompts) related to category information to guide the fine-tuning of the model. Based on experiments on several image captioning tasks, the performance and domain adaptability were improved. Our work provides an innovative strategy to fuse multi-modal prompts with better performance and faster adaptability for visual classification models.
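
The general mechanism of prepending learnable and multi-modal prompt tokens to a transformer's input sequence can be sketched briefly; the dimensions, fusion point, and encoder below are illustrative assumptions, not the paper's architecture.

```python
# Prompt tokens (learnable, plus optional text-prompt embeddings) are
# concatenated with the patch tokens before the transformer encoder.
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    def __init__(self, dim=768, n_prompts=8, depth=2, heads=8):
        super().__init__()
        # Learnable prompt tokens carrying category/instruction information.
        self.prompts = nn.Parameter(torch.randn(1, n_prompts, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, patch_tokens, text_prompt=None):
        b = patch_tokens.size(0)
        tokens = [self.prompts.expand(b, -1, -1), patch_tokens]
        if text_prompt is not None:          # optional multi-modal prompt
            tokens.insert(0, text_prompt)
        return self.encoder(torch.cat(tokens, dim=1))

x = torch.randn(2, 196, 768)    # 14x14 patch embeddings from a ViT stem
txt = torch.randn(2, 4, 768)    # stand-in text-prompt embeddings
out = PromptedEncoder()(x, txt)
print(out.shape)                 # (2, 4 + 8 + 196, 768)
```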

Physics-Guided Graph Neural Networks for Real-time AC/DC Power Flow Analysis

  • Authors: Mei Yang, Gao Qiu, Yong Wu, Junyong Liu, Nina Dai, Yue Shui, Kai Liu, Lijie Ding
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00216
  • Pdf link: https://arxiv.org/pdf/2305.00216
  • Abstract
    The increasing scale of alternating current and direct current (AC/DC) hybrid systems necessitates a faster power flow analysis tool than ever. This letter thus proposes a specific physics-guided graph neural network (PG-GNN). The tailored graph modelling of AC and DC grids is first advanced to enhance the topology adaptability of the PG-GNN. To eschew unreliable experience emulation from data, AC/DC physics are embedded in the PG-GNN using duality. An augmented Lagrangian method-based learning scheme is then presented to help the PG-GNN better learn nonconvex patterns in an unsupervised, label-free manner. Multi-PG-GNN is finally conducted to master varied DC control modes. A case study shows that, relative to the 7 other data-driven rivals, only the proposed method matches the performance of the model-based benchmark, and it also beats the benchmark in computational efficiency by more than 10 times.

Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning

  • Authors: Mingyang Wang, Zhenshan Bing, Xiangtong Yao, Shuai Wang, Hang Su, Chenguang Yang, Kai Huang, Alois Knoll
  • Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.00286
  • Pdf link: https://arxiv.org/pdf/2305.00286
  • Abstract
    Meta-reinforcement learning enables artificial agents to learn from related training tasks and adapt to new tasks efficiently with minimal interaction data. However, most existing research is still limited to narrow task distributions that are parametric and stationary, and does not consider out-of-distribution tasks during the evaluation, thus restricting its application. In this paper, we propose MoSS, a context-based Meta-reinforcement learning algorithm based on Self-Supervised task representation learning to address this challenge. We extend meta-RL to broad non-parametric task distributions which have never been explored before, and also achieve state-of-the-art results in non-stationary and out-of-distribution tasks. Specifically, MoSS consists of a task inference module and a policy module. We utilize the Gaussian mixture model for task representation to imitate the parametric and non-parametric task variations. Additionally, our online adaptation strategy enables the agent to react at the first sight of a task change, thus being applicable in non-stationary tasks. MoSS also exhibits strong generalization robustness in out-of-distribution tasks, which benefits from the reliable and robust task representation. The policy is built on top of an off-policy RL algorithm and the entire network is trained completely off-policy to ensure high sample efficiency. On MuJoCo and Meta-World benchmarks, MoSS outperforms prior works in terms of asymptotic performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization robustness on broad and diverse task distributions.

A Simulation-Augmented Benchmarking Framework for Automatic RSO Streak Detection in Single-Frame Space Images

  • Authors: Zhe Chen, Yang Yang, Anne Bettens, Youngho Eun, Xiaofeng Wu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00412
  • Pdf link: https://arxiv.org/pdf/2305.00412
  • Abstract
    Detecting Resident Space Objects (RSOs) and preventing collisions with other satellites is crucial. Recently, deep convolutional neural networks (DCNNs) have shown superior performance in object detection when large-scale datasets are available. However, collecting rich data of RSOs is difficult due to very few occurrences in the space images. Without sufficient data, it is challenging to comprehensively train DCNN detectors and make them effective for detecting RSOs in space images, let alone to estimate whether a detector is sufficiently robust. The lack of meaningful evaluation of different detectors could further affect the design and application of detection methods. To tackle this issue, we propose that the space images containing RSOs can be simulated to complement the shortage of raw data for better benchmarking. Accordingly, we introduce a novel simulation-augmented benchmarking framework for RSO detection (SAB-RSOD). In our framework, by making the best use of the hardware parameters of the sensor that captures real-world space images, we first develop a high-fidelity RSO simulator that can generate various realistic space images. Then, we use this simulator to generate images that contain diversified RSOs in space and annotate them automatically. Later, we mix the synthetic images with the real-world images, obtaining around 500 images for training, with only the real-world images used for evaluation. Under SAB-RSOD, we can train different popular object detectors like Yolo and Faster RCNN effectively, enabling us to evaluate their performance thoroughly. The evaluation results have shown that the amount of available data and the image resolution are two key factors for robust RSO detection. Moreover, when using a lower resolution for higher efficiency, we demonstrated that a simple UNet-based detection method can already achieve high detection accuracy.

Guaranteed Evader Detection in Multi-Agent Search Tasks using Pincer Trajectories

  • Authors: Roee M. Francos, Alfred M. Bruckstein
  • Subjects: Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2305.00533
  • Pdf link: https://arxiv.org/pdf/2305.00533
  • Abstract
    Assume that inside an initial planar area there are smart mobile evaders attempting to avoid detection by a team of sweeping searching agents. All sweepers detect evaders with fan-shaped sensors, modeling the field of view of real cameras. Detection of all evaders is guaranteed with cooperative sweeping strategies, by setting requirements on sweepers' speed, and by carefully designing their trajectories. Assume the smart evaders have an upper limit on their speed which is a-priori known to the sweeping team. An easier task for the team of sweepers is to confine evaders to the domain in which they are initially located. The sweepers accomplish the confinement task if they move sufficiently fast and detect evaders by applying an appropriate search strategy. Any given search strategy results in a minimal sweeper's speed in order to be able to detect all evaders. The minimal speed guarantees the ability of the sweeping team to confine evaders to their original domain, and if the sweepers move faster they are able to detect all evaders that are present in the region. We present results on the total search time for a novel pincer-movement based search protocol that utilizes complementary trajectories along with adaptive sensor geometries for any even number of pursuers.

Containerization of a polyglot microservice application using Docker and Kubernetes

  • Authors: Vamsi Krishna Yepuri, Venkata Kalyan Polamarasetty, Shivani Donthi, Ajay Kumar Reddy Gondi
  • Subjects: Software Engineering (cs.SE); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2305.00600
  • Pdf link: https://arxiv.org/pdf/2305.00600
  • Abstract
    This project investigates the benefits of containerization technology in modern software development and deployment. The study emphasizes the advantages of using Kubernetes and Docker in the development process, including the easy packaging and deployment of microservices, efficient resource utilization, faster startup times, and greater scalability and flexibility. The project concludes by proposing a study that involves creating a polyglot microservice application using Java, Python, and JavaScript, containerizing it with Docker, and deploying it in Kubernetes. The study aims to evaluate service discovery and auto-scaling in distributed mode and compare the performance metrics with virtual machines and containers. The results of this study can inform software development teams about the benefits of containerization in modern software development and deployment.

GTree: GPU-Friendly Privacy-preserving Decision Tree Training and Inference

  • Authors: Qifan Wang, Shujie Cui, Lei Zhou, Ye Dong, Jianli Bai, Yun Sing Koh, Giovanni Russello
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2305.00645
  • Pdf link: https://arxiv.org/pdf/2305.00645
  • Abstract
    Decision tree (DT) is a widely used machine learning model due to its versatility, speed, and interpretability. However, for privacy-sensitive applications, outsourcing DT training and inference to cloud platforms raises concerns about data privacy. Researchers have developed privacy-preserving approaches for DT training and inference using cryptographic primitives, such as Secure Multi-Party Computation (MPC). While these approaches have shown progress, they still suffer from heavy computation and communication overheads. A few recent works employ Graphics Processing Units (GPUs) to improve the performance of MPC-protected deep learning. This raises a natural question: \textit{can MPC-protected DT training and inference be accelerated by GPU?} We present GTree, the first scheme that uses the GPU to accelerate MPC-protected secure DT training and inference. GTree is built across 3 parties who securely and jointly perform each step of DT training and inference with the GPU. Each MPC protocol in GTree is designed in a GPU-friendly version. The performance evaluation shows that GTree achieves ${\sim}11{\times}$ and ${\sim}21{\times}$ improvements when training on the SPECT and Adult datasets, compared to the prior most efficient CPU-based work. For inference, GTree shows its superior efficiency when the DT has fewer than 10 levels; it is $126\times$ faster than the prior most efficient work when inferring $10^4$ instances with a tree of 7 levels. GTree also achieves a stronger security guarantee than prior solutions, leaking only the tree depth and the size of the data samples, while prior solutions also leak the tree structure. With \textit{oblivious array access}, the access pattern on the GPU is also protected.

File Fragment Classification using Light-Weight Convolutional Neural Networks

  • Authors: Mustafa Ghaleb, Kunwar Saaim, Muhamad Felemban, Saleh Al-Saleh, Ahmad Al-Mulhem
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2305.00656
  • Pdf link: https://arxiv.org/pdf/2305.00656
  • Abstract
    In digital forensics, file fragment classification is an important step toward completing the file carving process. There exist several techniques to identify the type of file fragments without relying on meta-data, such as using features like header/footer and N-grams to identify the fragment type. Recently, convolutional neural network (CNN) models have been used to build classification models to achieve this task. However, the number of parameters in CNNs tends to grow exponentially as the number of layers increases. This results in a dramatic increase in training and inference time. In this paper, we propose light-weight file fragment classification models based on depthwise separable CNNs. The evaluation results show that our proposed models provide faster inference time with comparable accuracy compared to the state-of-the-art CNN-based models. In particular, our models were able to achieve an accuracy of 79% on the FFT-75 dataset with nearly 100K parameters and 164M FLOPs, which is 4x smaller and 6x faster than the state-of-the-art classifier in the literature.
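
The depthwise separable convolution block underlying such light-weight classifiers is easy to sketch: a per-channel (depthwise) convolution followed by a 1x1 (pointwise) convolution. Treating fragments as 1-D byte sequences and the layer sizes below are illustrative assumptions.

```python
# Depthwise separable convolution: groups=in_ch makes each depthwise
# filter see a single input channel; the pointwise conv then mixes
# channels, at a fraction of a standard convolution's parameter count.
import torch
import torch.nn as nn

class DepthwiseSeparableConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel=3):
        super().__init__()
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel, padding=kernel // 2,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv1d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm1d(out_ch)

    def forward(self, x):
        return torch.relu(self.bn(self.pointwise(self.depthwise(x))))

# File fragments are byte sequences, so a 1-D convolution over a
# 512-byte fragment (embedded into 32 channels) is a natural fit.
x = torch.randn(4, 32, 512)
block = DepthwiseSeparableConv1d(32, 64)
standard = nn.Conv1d(32, 64, 3, padding=1)
print(sum(p.numel() for p in block.parameters()),
      sum(p.numel() for p in standard.parameters()))  # far fewer parameters
```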

Event Camera as Region Proposal Network

  • Authors: Shrutarv Awasthi, Anas Gouda, Richard Julian Lodenkaemper, Moritz Roidl
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00718
  • Pdf link: https://arxiv.org/pdf/2305.00718
  • Abstract
    The human eye consists of two types of photoreceptors, rods and cones. Rods are responsible for monochrome vision, and cones for color vision. The number of rods is much higher than the number of cones, which means that most human vision processing is done in monochrome. An event camera reports changes in pixel intensity and is analogous to rods. Event and color cameras in computer vision are like rods and cones in human vision. Humans can notice objects moving in the peripheral vision (far right and left), but we cannot classify them (think of someone passing by on your far left or far right: this can trigger your attention without you knowing who they are). Thus, rods act as a region proposal network (RPN) in human vision, and an event camera can therefore act as a region proposal network in deep learning. Two-stage object detectors in deep learning, such as Mask R-CNN, consist of a backbone for feature extraction and an RPN. Currently, the RPN uses a brute-force method, trying out all possible bounding boxes to detect an object. This requires much computation time to generate region proposals, making two-stage detectors inconvenient for fast applications. This work replaces the RPN in Mask R-CNN of detectron2 with an event camera for generating proposals for moving objects, thus saving time and being computationally less expensive. The proposed approach is faster than the two-stage detectors with comparable accuracy.
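
One simple way to turn an accumulated event frame into region proposals is connected-component analysis, sketched below with SciPy; the thresholds, frame format, and proposal filter are illustrative assumptions, not the paper's detectron2 integration.

```python
# Region proposals from an event frame: pixels with events mark motion,
# and each connected blob becomes one candidate bounding box.
import numpy as np
from scipy import ndimage

def event_proposals(event_frame, min_area=20):
    """event_frame: 2-D array of per-pixel event counts in a time window."""
    mask = event_frame > 0                   # pixels where motion occurred
    labels, _ = ndimage.label(mask)          # connected components
    boxes = []
    for sl in ndimage.find_objects(labels):
        y, x = sl
        h, w = y.stop - y.start, x.stop - x.start
        if h * w >= min_area:                # drop tiny noise blobs
            boxes.append((x.start, y.start, x.stop, y.stop))
    return boxes  # (x1, y1, x2, y2) proposals for the second-stage head

# Toy frame with one synthetic moving blob.
frame = np.zeros((64, 64), dtype=int)
frame[10:20, 30:45] = 1
print(event_proposals(frame))
```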

DNS Privacy with Speed? Evaluating DNS over QUIC and its Impact on Web Performance

  • Authors: Mike Kosek, Luca Schumann, Robin Marx, Trinh Viet Doan, Vaibhav Bajpai
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2305.00790
  • Pdf link: https://arxiv.org/pdf/2305.00790
  • Abstract
    Over the last decade, Web traffic has significantly shifted towards HTTPS due to an increased awareness for privacy. However, DNS traffic is still largely unencrypted, which allows user profiles to be derived from plaintext DNS queries. While DNS over TLS (DoT) and DNS over HTTPS (DoH) address this problem by leveraging transport encryption for DNS, both protocols are constrained by the underlying transport (TCP) and encryption (TLS) protocols, requiring multiple round-trips to establish a secure connection. In contrast, QUIC combines the transport and cryptographic handshake into a single round-trip, which allows the recently standardized DNS over QUIC (DoQ) to provide DNS privacy with minimal latency. In the first study of its kind, we perform distributed DoQ measurements across multiple vantage points to evaluate the impact of DoQ on Web performance. We find that DoQ excels over DoH, leading to significant improvements with up to 10% faster loads for simple webpages. With increasing complexity of webpages, DoQ even catches up to DNS over UDP (DoUDP) as the cost of encryption amortizes: With DoQ being only ~2% slower than DoUDP, encrypted DNS becomes much more appealing for the Web.

A comparison of methods to eliminate regularization weight tuning from data-enabled predictive control

  • Authors: Manuel Koch, Colin N. Jones
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.00807
  • Pdf link: https://arxiv.org/pdf/2305.00807
  • Abstract
    Data-enabled predictive control (DeePC) is a recently established form of Model Predictive Control (MPC), based on behavioral systems theory. While eliminating the need to explicitly identify a model, it requires an additional regularization with a corresponding weight to function well with noisy data. The tuning of this weight is non-trivial and has a significant impact on performance. In this paper, we compare three reformulations of DeePC that either eliminate the regularization, or simplify the tuning to a trivial point. A building simulation study shows a comparable performance for all three reformulations of DeePC. However, a conventional MPC with a black-box model slightly outperforms them, while solving much faster, and yielding smoother optimal trajectories. Two of the DeePC variants also show sensitivity to an unobserved biased input noise, which is not present in the conventional MPC.
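
DeePC's data-driven predictor rests on block-Hankel matrices built from one recorded trajectory, a construction short enough to sketch. The signal lengths and the stand-in system below are illustrative assumptions, and the final comment only gestures at where the regularized optimization discussed in the paper would enter.

```python
# Hankel-matrix construction for DeePC: recorded I/O data replaces an
# explicit model; past blocks pin down the initial condition, future
# blocks span the predicted trajectories.
import numpy as np

def hankel(signal, depth):
    """Block-Hankel matrix with `depth` block rows from a 1-D signal."""
    n = len(signal) - depth + 1
    return np.stack([signal[i:i + n] for i in range(depth)])

T, T_ini, N = 50, 4, 10                  # data length, past window, horizon
u = np.random.randn(T)                   # persistently exciting input
y = np.convolve(u, [0.5, 0.3, 0.1])[:T]  # stand-in "measured" output

H_u = hankel(u, T_ini + N)               # split into past/future blocks
H_y = hankel(y, T_ini + N)
U_p, U_f = H_u[:T_ini], H_u[T_ini:]
Y_p, Y_f = H_y[:T_ini], H_y[T_ini:]
# DeePC then solves for g with U_p g = u_ini and Y_p g = y_ini, and
# optimises over the predicted trajectory U_f g, Y_f g, subject to the
# regularization on g whose tuning the paper's reformulations address.
print(U_p.shape, Y_f.shape)
```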

StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video

  • Authors: Lizhen Wang, Xiaochen Zhao, Jingxiang Sun, Yuxiang Zhang, Hongwen Zhang, Tao Yu, Yebin Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00942
  • Pdf link: https://arxiv.org/pdf/2305.00942
  • Abstract
    Face reenactment methods attempt to restore and re-animate portrait videos as realistically as possible. Existing methods face a dilemma in quality versus controllability: 2D GAN-based methods achieve higher image quality but suffer in fine-grained control of facial attributes compared with 3D counterparts. In this work, we propose StyleAvatar, a real-time photo-realistic portrait avatar reconstruction method using StyleGAN-based networks, which can generate high-fidelity portrait avatars with faithful expression control. We expand the capabilities of StyleGAN by introducing a compositional representation and a sliding window augmentation method, which enable faster convergence and improve translation generalization. Specifically, we divide the portrait scenes into three parts for adaptive adjustments: facial region, non-facial foreground region, and the background. Besides, our network leverages the best of UNet, StyleGAN and time coding for video learning, which enables high-quality video generation. Furthermore, a sliding window augmentation method together with a pre-training strategy are proposed to improve translation generalization and training performance, respectively. The proposed network can converge within two hours while ensuring high image quality and a forward rendering time of only 20 milliseconds. Furthermore, we propose a real-time live system, which further pushes research into applications. Results and experiments demonstrate the superiority of our method in terms of image quality, full portrait video generation, and real-time re-animation compared to existing facial reenactment methods. Training and inference code for this paper are at https://github.com/LizhenWangT/StyleAvatar.

Keyword: mobile

Wearing face mask detection using deep learning through COVID-19 pandemic

  • Authors: Javad Khoramdel, Soheila Hatami, Majid Sadedel
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00068
  • Pdf link: https://arxiv.org/pdf/2305.00068
  • Abstract
    During the COVID-19 pandemic, wearing a face mask has been known to be an effective way to prevent the spread of COVID-19. In many monitoring tasks, humans have been replaced with computers thanks to the outstanding performance of deep learning models. Monitoring the wearing of a face mask is another task that can be done by deep learning models with acceptable accuracy. The main challenge of this task is the limited amount of data because of the quarantine. In this paper, we investigated the capability of three state-of-the-art object detection neural networks for face mask detection in real-time applications: the Single Shot Detector (SSD) and two versions of You Only Look Once (YOLO), namely YOLOv4-tiny and YOLOv4-tiny-3l, from which the best was selected. In the proposed method, according to the performance of the different models, the model best suited for real-world and mobile-device applications, in comparison to other recent studies, was YOLOv4-tiny, with 85.31% mean Average Precision (mAP) and 50.66 Frames Per Second (FPS). These acceptable values were achieved using two datasets with only 1531 images in three separate classes.

Asynchronous Distributed Protocol for Service Provisioning in the Edge-Cloud Continuum

  • Authors: Itamar Cohen, Paolo Giaccone, Carla Fabiana Chiasserini
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2305.00184
  • Pdf link: https://arxiv.org/pdf/2305.00184
  • Abstract
    In the edge-cloud continuum, datacenters provide microservices (MSs) to mobile users, with each MS having specific latency constraints and computational requirements. Deploying such a variety of MSs matching their requirements with the available computing resources is challenging. In addition, time-critical MSs may have to be migrated as the users move, to keep meeting their latency constraints. Unlike previous work relying on a central orchestrator with an always-updated global view of the available resources and of the users' locations, this work envisions a distributed solution to the above issues. In particular, we propose a distributed asynchronous protocol for MS deployment in the cloud-edge continuum that (i) dramatically reduces the system overhead compared to a centralized approach, and (ii) increases the system stability by avoiding having a single point of failure as in the case of a central orchestrator. Our solution ensures cost-efficient feasible placement of MSs, while using negligible bandwidth.

STAR-RIS-Aided Mobile Edge Computing: Computation Rate Maximization with Binary Amplitude Coefficients

  • Authors: Zhenrong Liu, Zongze Li, Miaowen Wen, Yi Gong, Yik-Chung Wu
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.00428
  • Pdf link: https://arxiv.org/pdf/2305.00428
  • Abstract
    In this paper, simultaneously transmitting and reflecting (STAR) reconfigurable intelligent surface (RIS) is investigated in the multi-user mobile edge computing (MEC) system to improve the computation rate. Compared with traditional RIS-aided MEC, STAR-RIS extends the service coverage from half-space to full-space and provides new flexibility for improving the computation rate for end users. However, the STAR-RIS-aided MEC system design is a challenging problem due to the non-smooth and non-convex binary amplitude coefficients with coupled phase shifters. To fill this gap, this paper formulates a computation rate maximization problem via the joint design of the STAR-RIS phase shifts, reflection and transmission amplitude coefficients, the receive beamforming vectors, and energy partition strategies for local computing and offloading. To tackle the discontinuity caused by binary variables, we propose an efficient smoothing-based method to decrease convergence error, in contrast to the conventional penalty-based method, which brings many undesired stationary points and local optima. Furthermore, a fast iterative algorithm is proposed to obtain a stationary point for the joint optimization problem, with each subproblem solved by a low-complexity algorithm, making the proposed design scalable to a massive number of users and STAR-RIS elements. Simulation results validate the strength of the proposed smoothing-based method and show that the proposed fast iterative algorithm achieves a higher computation rate than the conventional method while saving the computation time by at least an order of magnitude. Moreover, the resultant STAR-RIS-aided MEC system significantly improves the computation rate compared to other baseline schemes with conventional reflect-only/transmit-only RIS.
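
The paper's smoothing-based treatment of the binary amplitude coefficients is not spelled out in the abstract; one common way to realize such an idea is a temperature-annealed sigmoid relaxation. The sketch below illustrates that generic technique only and is not the authors' exact formulation; all variable names and the annealing schedule are hypothetical.

```python
import numpy as np

def smooth_binary(x, tau):
    """Sigmoid relaxation of a binary variable; hardens toward {0, 1} as tau -> 0."""
    return 1.0 / (1.0 + np.exp(-x / tau))

x = np.random.randn(8)             # unconstrained optimization variables
for tau in [1.0, 0.3, 0.1, 0.03]:  # anneal the temperature across outer iterations
    a = smooth_binary(x, tau)      # relaxed amplitude coefficients in (0, 1)
    print(f"tau={tau}: {np.round(a, 2)}")
```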

Guaranteed Evader Detection in Multi-Agent Search Tasks using Pincer Trajectories

  • Authors: Roee M. Francos, Alfred M. Bruckstein
  • Subjects: Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2305.00533
  • Pdf link: https://arxiv.org/pdf/2305.00533
  • Abstract
    Assume that inside an initial planar area there are smart mobile evaders attempting to avoid detection by a team of sweeping searching agents. All sweepers detect evaders with fan-shaped sensors, modeling the field of view of real cameras. Detection of all evaders is guaranteed with cooperative sweeping strategies, by setting requirements on sweepers' speed, and by carefully designing their trajectories. Assume the smart evaders have an upper limit on their speed which is known a priori to the sweeping team. An easier task for the team of sweepers is to confine evaders to the domain in which they are initially located. The sweepers accomplish the confinement task if they move sufficiently fast and detect evaders by applying an appropriate search strategy. Any given search strategy results in a minimal sweeper speed required to detect all evaders. The minimal speed guarantees the ability of the sweeping team to confine evaders to their original domain, and if the sweepers move faster they are able to detect all evaders that are present in the region. We present results on the total search time for a novel pincer-movement based search protocol that utilizes complementary trajectories along with adaptive sensor geometries for any even number of pursuers.

Self-supervised Activity Representation Learning with Incremental Data: An Empirical Study

  • Authors: Jason Liu, Shohreh Deldari, Hao Xue, Van Nguyen, Flora D. Salim
  • Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.00619
  • Pdf link: https://arxiv.org/pdf/2305.00619
  • Abstract
    In the context of mobile sensing environments, various sensors on mobile devices continually generate a vast amount of data. Analyzing this ever-increasing data presents several challenges, including limited access to annotated data and a constantly changing environment. Recent advancements in self-supervised learning have been utilized as a pre-training step to enhance the performance of conventional supervised models to address the absence of labelled datasets. This research examines the impact of using a self-supervised representation learning model for time series classification tasks in which data is incrementally available. We proposed and evaluated a workflow in which a model learns to extract informative features using a corpus of unlabeled time series data and then conducts classification on labelled data using features extracted by the model. We analyzed the effect of varying the size, distribution, and source of the unlabeled data on the final classification performance across four public datasets, including various types of sensors in diverse applications.
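
The evaluated workflow is a standard two-stage pipeline: learn an encoder from unlabeled time series, then train a supervised classifier on the extracted features. A minimal sketch of that pipeline follows, with a toy stand-in for the self-supervised encoder; the encoder, data shapes, and class count are placeholders, not the paper's models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pretrain_encoder(unlabeled):
    """Toy stand-in for a self-supervised encoder learned from unlabeled windows."""
    mean = unlabeled.mean(axis=0)
    def encode(x):                 # map each window to simple summary features
        d = x - mean
        return np.stack([d.mean(axis=1), d.std(axis=1)], axis=1)
    return encode

unlabeled = np.random.randn(1000, 128)   # incrementally collected sensor windows
X, y = np.random.randn(200, 128), np.random.randint(0, 4, 200)

encode = pretrain_encoder(unlabeled)     # stage 1: no labels needed
clf = LogisticRegression(max_iter=1000).fit(encode(X), y)  # stage 2: labelled data
```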

Emotions Beyond Words: Non-Speech Audio Emotion Recognition With Edge Computing

  • Authors: Ibrahim Malik, Siddique Latif, Sanaullah Manzoor, Muhammad Usama, Junaid Qadir, Raja Jurdak
  • Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2305.00725
  • Pdf link: https://arxiv.org/pdf/2305.00725
  • Abstract
    Non-speech emotion recognition has a wide range of applications including healthcare, crime control and rescue, and entertainment, to name a few. Providing these applications using edge computing has great potential, however, recent studies are focused on speech-emotion recognition using complex architectures. In this paper, a non-speech-based emotion recognition system is proposed, which can rely on edge computing to analyse emotions conveyed through non-speech expressions like screaming and crying. In particular, we explore knowledge distillation to design a computationally efficient system that can be deployed on edge devices with limited resources without degrading the performance significantly. We comprehensively evaluate our proposed framework using two publicly available datasets and highlight its effectiveness by comparing the results with the well-known MobileNet model. Our results demonstrate the feasibility and effectiveness of using edge computing for non-speech emotion detection, which can potentially improve applications that rely on emotion detection in communication networks. To the best of our knowledge, this is the first work on an edge-computing-based framework for detecting emotions in non-speech audio, offering promising directions for future research.
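
Knowledge distillation is the key ingredient here: a compact student is trained against both ground-truth labels and a larger teacher's softened outputs. The loss below is the standard Hinton-style formulation, shown as a generic sketch rather than the paper's exact recipe; the temperature and mixing weight are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft teacher targets (KL at temperature T) mixed with the hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                    # rescale gradients after temperature softening
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```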

AI-based Radio and Computing Resource Allocation and Path Planning in NOMA NTNs: AoI Minimization under CSI Uncertainty

  • Authors: Maryam Ansarifard, Nader Mokari, Mohammadreza Javan, Hamid Saeedi, Eduard A. Jorswieck
  • Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.00780
  • Pdf link: https://arxiv.org/pdf/2305.00780
  • Abstract
    In this paper, we develop a hierarchical aerial computing framework composed of high altitude platform (HAP) and unmanned aerial vehicles (UAVs) to compute the fully offloaded tasks of terrestrial mobile users who are connected through an uplink non-orthogonal multiple access (UL-NOMA). In particular, the problem is formulated to minimize the age of information (AoI) of all users with elastic tasks, by adjusting the UAVs' trajectories and resource allocation on both UAVs and HAP, which is restricted by the channel state information (CSI) uncertainty and multiple resource constraints of UAVs and HAP. In order to solve this non-convex optimization problem, two methods, multi-agent deep deterministic policy gradient (MADDPG) and federated reinforcement learning (FRL), are proposed to design the UAVs' trajectories and obtain channel, power, and CPU allocations. It is shown that task scheduling significantly reduces the average AoI, and this improvement is more pronounced for larger task sizes. Power allocation is shown to have only a marginal effect on the average AoI compared to using full transmission power for all users, while simulation results show that our scheduling scheme achieves a lower average AoI than traditional fixed transmission schemes.

Performance and Energy Consumption of Parallel Machine Learning Algorithms

  • Authors: Xidong Wu, Preston Brazzle, Stephen Cahoon
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00798
  • Pdf link: https://arxiv.org/pdf/2305.00798
  • Abstract
    Machine learning models have achieved remarkable success in various real-world applications such as data science, computer vision, and natural language processing. However, model training in machine learning requires large-scale data sets and multiple iterations before it can work properly. Parallelization of training algorithms is a common strategy to speed up the process of training. However, many studies on model training and inference focus only on aspects of performance. Power consumption is also an important metric for any type of computation, especially high-performance applications. Machine learning algorithms that can be used on low-power platforms such as sensors and mobile devices have been researched, but less power optimization is done for algorithms designed for high-performance computing. In this paper, we present a C++ implementation of logistic regression and the genetic algorithm, and a Python implementation of neural networks with the stochastic gradient descent (SGD) algorithm on classification tasks. We show the impact that the complexity of the model and the size of the training data have on the parallel efficiency of the algorithm in terms of both power and performance. We also tested these implementations using shared-memory parallelism, distributed-memory parallelism, and GPU acceleration to speed up machine learning model training.

Population Protocols with Unordered Data

  • Authors: Michael Blondin, François Ladouceur
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2305.00872
  • Pdf link: https://arxiv.org/pdf/2305.00872
  • Abstract
    Population protocols form a well-established model of computation of passively mobile anonymous agents with constant-size memory. It is well known that population protocols compute Presburger-definable predicates, such as absolute majority and counting predicates. In this work, we initiate the study of population protocols operating over arbitrarily large data domains. More precisely, we introduce population protocols with unordered data as a formalism to reason about anonymous crowd computing over unordered sequences of data. We first show that it is possible to determine whether an unordered sequence from an infinite data domain has a datum with absolute majority. We then establish the expressive power of the immediate observation restriction of our model, namely where, in each interaction, an agent observes another agent who is unaware of the interaction.
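
For readers new to the model, the classic four-state exact-majority protocol (finite states, no data) conveys the flavor of the computation: active opinions cancel in pairs, and the surviving opinion converts the passive agents. The simulation below shows that textbook baseline only; the paper's protocols additionally carry data from an unordered infinite domain.

```python
import random

def interact(u, v):
    pair = {u, v}
    if pair == {"A", "B"}: return "a", "b"   # opposing active opinions cancel
    if pair == {"A", "b"}: return "A", "a"   # an active converts a passive agent
    if pair == {"B", "a"}: return "B", "b"
    return u, v

pop = ["A"] * 60 + ["B"] * 40                # "A" holds the absolute majority
for _ in range(200_000):
    i, j = random.sample(range(len(pop)), 2) # uniformly random pairwise scheduler
    pop[i], pop[j] = interact(pop[i], pop[j])
counts = {s: pop.count(s) + pop.count(s.lower()) for s in "AB"}
print("majority opinion:", max(counts, key=counts.get))
```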

Analysis of reward mechanism for quizmarket

  • Authors: Noorul Ali
  • Subjects: Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2305.00915
  • Pdf link: https://arxiv.org/pdf/2305.00915
  • Abstract
    A reward algorithm is needed for games that rewards risk, i.e., early play, and extends the longevity of a reward pool, allowing a higher number of players and greater engagement. I created a reward mechanism that rewards risk, lasts longer, and is more profitable than existing mechanisms. I also implemented an algorithm within the mechanism to self-correct for outlier performance. This reward mechanism was used in TURBLAZE, a mobile game designed for high school students. The game features quizzes: gamers pay a fixed fee to participate in a quiz and win a reward if their score is above a certain threshold.

Keyword: pruning

There is no result

Keyword: voxel

An Efficient Plane Extraction Approach for Bundle Adjustment on LiDAR Point clouds

  • Authors: Zheng Liu, Fu Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.00287
  • Pdf link: https://arxiv.org/pdf/2305.00287
  • Abstract
    Bundle adjustment (BA) on LiDAR point clouds has been extensively investigated in recent years due to its ability to optimize multiple poses together, resulting in high accuracy and global consistency for the point cloud. However, the accuracy and speed of LiDAR bundle adjustment depend on the quality of plane extraction, which provides point association for LiDAR BA. In this study, we propose a novel and efficient voxel-based approach for plane extraction that is specially designed to provide point association for LiDAR bundle adjustment. To begin, we partition the space into multiple voxels of a fixed size and then split these root voxels based on whether the points are on the same plane, using an octree structure. We also design a novel plane determination method based on principal component analysis (PCA), which segments the points into four even quarters and compares their minimum eigenvalues with that of the initial point cloud. Finally, we adopt a plane merging method to prevent too many small planes from being in a single voxel, which would increase the optimization time required for BA. Our experimental results on the HILTI dataset demonstrate that our approach achieves the best precision and lowest time cost compared to other plane extraction methods.
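
The PCA-based plane test boils down to checking the smallest eigenvalue of the point covariance, which is near zero for near-planar point sets. A rough sketch of that idea follows; the four-quarter comparison mirrors the abstract's description, but the splitting rule and the acceptance ratio here are hypothetical choices, not the authors' exact criterion.

```python
import numpy as np

def min_eigenvalue(points):
    """Smallest covariance eigenvalue: close to zero for near-planar points."""
    return np.linalg.eigvalsh(np.cov(points.T))[0]

def is_plane(points, ratio=3.0, eps=1e-9):
    lam = min_eigenvalue(points)               # baseline from the whole voxel
    quarters = np.array_split(np.random.permutation(points), 4)
    return all(min_eigenvalue(q) <= ratio * lam + eps for q in quarters)
```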

Object-Centric Voxelization of Dynamic Scenes via Inverse Neural Rendering

  • Authors: Siyu Gao, Yanpeng Zhao, Yunbo Wang, Xiaokang Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00393
  • Pdf link: https://arxiv.org/pdf/2305.00393
  • Abstract
    Understanding the compositional dynamics of the world in unsupervised 3D scenarios is challenging. Existing approaches either fail to make effective use of time cues or ignore the multi-view consistency of scene decomposition. In this paper, we propose DynaVol, an inverse neural rendering framework that provides a pilot study for learning time-varying volumetric representations for dynamic scenes with multiple entities (like objects). It has two main contributions. First, it maintains a time-dependent 3D grid, which dynamically and flexibly binds the spatial locations to different entities, thus encouraging the separation of information at a representational level. Second, our approach jointly learns grid-level local dynamics, object-level global dynamics, and the compositional neural radiance fields in an end-to-end architecture, thereby enhancing the spatiotemporal consistency of object-centric scene voxelization. We present a two-stage training scheme for DynaVol and validate its effectiveness on various benchmarks with multiple objects, diverse dynamics, and real-world shapes and textures. We present visualization at https://sites.google.com/view/dynavol-visual.

Towards Computational Architecture of Liberty: A Comprehensive Survey on Deep Learning for Generating Virtual Architecture in the Metaverse

  • Authors: Anqi Wang, Jiahua Dong, Jiachuan Shen, Lik-Hang Lee, Pan Hui
  • Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00510
  • Pdf link: https://arxiv.org/pdf/2305.00510
  • Abstract
    3D shape generation techniques utilizing deep learning are attracting increasing attention from both computer vision and architectural design. This survey focuses on investigating and comparing the latest approaches to 3D object generation with deep generative models (DGMs), including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), 3D-aware images, and diffusion models. We discuss 187 articles (80.7% of which were published between 2018 and 2022) to review the generative possibilities for architecture in virtual environments, limited to architectural form. We provide an overview of architectural research, virtual environments, and related technical approaches, followed by a review of recent trends in discrete voxel generation, 3D models generated from 2D images, and conditional parameters. We highlight under-explored issues in 3D generation and parameterized control that are worth further investigation. Moreover, we speculate that four research agendas, including data limitation, editability, evaluation metrics, and human-computer interaction, are important enablers of ubiquitous interaction with immersive systems in architecture for computer-aided design. Our work contributes to researchers' understanding of the current potential and future needs of deep learning in generating virtual architecture.

Learning Self-Prior for Mesh Inpainting Using Self-Supervised Graph Convolutional Networks

  • Authors: Shota Hattori, Tatsuya Yatagawa, Yutaka Ohtake, Hiromasa Suzuki
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00635
  • Pdf link: https://arxiv.org/pdf/2305.00635
  • Abstract
    This study presents a self-prior-based mesh inpainting framework that requires only an incomplete mesh as input, without the need for any training datasets. Additionally, our method maintains the polygonal mesh format throughout the inpainting process without converting the shape format to an intermediate, such as a voxel grid, a point cloud, or an implicit function, which are typically considered easier for deep neural networks to process. To achieve this goal, we introduce two graph convolutional networks (GCNs): single-resolution GCN (SGCN) and multi-resolution GCN (MGCN), both trained in a self-supervised manner. Our approach refines a watertight mesh obtained from the initial hole filling to generate a completed output mesh. Specifically, we train the GCNs to deform an oversmoothed version of the input mesh into the expected completed shape. To supervise the GCNs for accurate vertex displacements, despite the unknown correct displacements at real holes, we utilize multiple sets of meshes with several connected regions marked as fake holes. The correct displacements are known for vertices in these fake holes, enabling network training with loss functions that assess the accuracy of displacement vectors estimated by the GCNs. We demonstrate that our method outperforms traditional dataset-independent approaches and exhibits greater robustness compared to other deep-learning-based methods for shapes that less frequently appear in shape datasets.

Keyword: lidar

DSEC-MOS: Segment Any Moving Object with Moving Ego Vehicle

  • Authors: Zhuyun Zhou, Zongwei Wu, Rémi Boutteau, Fan Yang, Dominique Ginhac
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.00126
  • Pdf link: https://arxiv.org/pdf/2305.00126
  • Abstract
    Moving Object Segmentation (MOS), a crucial task in computer vision, has numerous applications such as surveillance, autonomous driving, and video analytics. Existing datasets for moving object segmentation mainly focus on RGB or Lidar videos, but lack additional event information that can enhance the understanding of dynamic scenes. To address this limitation, we propose a novel dataset, called DSEC-MOS. Our dataset includes frames captured by RGB cameras embedded on moving vehicles and incorporates event data, which provides high temporal resolution and low-latency information about changes in the scenes. To generate accurate segmentation mask annotations for moving objects, we apply the recently emerged large model SAM - Segment Anything Model - with moving object bounding boxes from DSEC-MOD serving as prompts and calibrated RGB frames, then further revise the results. Our DSEC-MOS dataset contains in total 16 sequences (13314 images). To the best of our knowledge, DSEC-MOS is also the first moving object segmentation dataset for autonomous driving that includes an event camera. Project Page: https://github.com/ZZY-Zhou/DSEC-MOS.
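
Prompting SAM with a bounding box takes only a few lines with the public `segment_anything` package; the sketch below shows that usage pattern in isolation. The checkpoint path, frame, and box coordinates are placeholders, and the paper's subsequent manual revision step is omitted.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # placeholder path
predictor = SamPredictor(sam)

rgb_frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in calibrated RGB frame
predictor.set_image(rgb_frame)
masks, scores, _ = predictor.predict(
    box=np.array([100, 150, 300, 400]),               # moving-object box as prompt
    multimask_output=False,
)
```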

Sensor Equivariance by LiDAR Projection Images

  • Authors: Hannes Reichert, Manuel Hetzel, Steven Schreck, Konrad Doll, Bernhard Sick
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2305.00221
  • Pdf link: https://arxiv.org/pdf/2305.00221
  • Abstract
    In this work, we propose an extension of conventional image data by an additional channel in which the associated projection properties are encoded. This addresses the issue of sensor-dependent object representation in projection-based sensors, such as LiDAR, which can lead to distorted physical and geometric properties due to variations in sensor resolution and field of view. To that end, we propose an architecture for processing this data in an instance segmentation framework. We focus specifically on LiDAR as a key sensor modality for machine vision tasks and highly automated driving (HAD). Through an experimental setup in a controlled synthetic environment, we identify a bias on sensor resolution and field of view and demonstrate that our proposed method can reduce said bias for the task of LiDAR instance segmentation. Furthermore, we define our method such that it can be applied to other projection-based sensors, such as cameras. To promote transparency, we make our code and dataset publicly available. This method shows the potential to improve performance and robustness in various machine vision tasks that utilize projection-based sensors.

An Efficient Plane Extraction Approach for Bundle Adjustment on LiDAR Point clouds

  • Authors: Zheng Liu, Fu Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.00287
  • Pdf link: https://arxiv.org/pdf/2305.00287
  • Abstract
    Bundle adjustment (BA) on LiDAR point clouds has been extensively investigated in recent years due to its ability to optimize multiple poses together, resulting in high accuracy and global consistency for the point cloud. However, the accuracy and speed of LiDAR bundle adjustment depend on the quality of plane extraction, which provides point association for LiDAR BA. In this study, we propose a novel and efficient voxel-based approach for plane extraction that is specially designed to provide point association for LiDAR bundle adjustment. To begin, we partition the space into multiple voxels of a fixed size and then split these root voxels based on whether the points are on the same plane, using an octree structure. We also design a novel plane determination method based on principal component analysis (PCA), which segments the points into four even quarters and compares their minimum eigenvalues with that of the initial point cloud. Finally, we adopt a plane merging method to prevent too many small planes from being in a single voxel, which would increase the optimization time required for BA. Our experimental results on the HILTI dataset demonstrate that our approach achieves the best precision and lowest time cost compared to other plane extraction methods.

InfraDet3D: Multi-Modal 3D Object Detection based on Roadside Infrastructure Camera and LiDAR Sensors

  • Authors: Walter Zimmer, Joseph Birkner, Marcel Brucker, Huu Tung Nguyen, Stefan Petrovski, Bohan Wang, Alois C. Knoll
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00314
  • Pdf link: https://arxiv.org/pdf/2305.00314
  • Abstract
    Current multi-modal object detection approaches focus on the vehicle domain and are limited in the perception range and the processing capabilities. Roadside sensor units (RSUs) introduce a new domain for perception systems and leverage altitude to observe traffic. Cameras and LiDARs mounted on gantry bridges increase the perception range and produce a full digital twin of the traffic. In this work, we introduce InfraDet3D, a multi-modal 3D object detector for roadside infrastructure sensors. We fuse two LiDARs using early fusion and further incorporate detections from monocular cameras to increase the robustness and to detect small objects. Our monocular 3D detection module uses HD maps to ground object yaw hypotheses, improving the final perception results. The perception framework is deployed on a real-world intersection that is part of the A9 Test Stretch in Munich, Germany. We perform several ablation studies and experiments and show that fusing two LiDARs with two cameras leads to an improvement of +1.90 mAP compared to a camera-only solution. We evaluate our results on the A9 infrastructure dataset and achieve 68.48 mAP on the test set. The dataset and code will be available at https://a9-dataset.com to allow the research community to further improve the perception results and make autonomous driving safer.

TransCAR: Transformer-based Camera-And-Radar Fusion for 3D Object Detection

  • Authors: Su Pang, Daniel Morris, Hayder Radha
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00397
  • Pdf link: https://arxiv.org/pdf/2305.00397
  • Abstract
    Despite radar's popularity in the automotive industry, for fusion-based 3D object detection, most existing works focus on LiDAR and camera fusion. In this paper, we propose TransCAR, a Transformer-based Camera-And-Radar fusion solution for 3D object detection. Our TransCAR consists of two modules. The first module learns 2D features from surround-view camera images and then uses a sparse set of 3D object queries to index into these 2D features. The vision-updated queries then interact with each other via a transformer self-attention layer. The second module learns radar features from multiple radar scans and then applies a transformer decoder to learn the interactions between radar features and vision-updated queries. The cross-attention layer within the transformer decoder can adaptively learn the soft association between the radar features and vision-updated queries, instead of a hard association based on sensor calibration only. Finally, our model estimates a bounding box per query using a set-to-set Hungarian loss, which enables the method to avoid non-maximum suppression. TransCAR improves the velocity estimation using the radar scans without temporal information. The superior experimental results of our TransCAR on the challenging nuScenes dataset illustrate that our TransCAR outperforms state-of-the-art Camera-Radar fusion-based 3D object detection approaches.

LIMOT: A Tightly-Coupled System for LiDAR-Inertial Odometry and Multi-Object Tracking

  • Authors: Zhongyang Zhu, Junqiao Zhao, Xuebo Tian, Kai Huang, Chen Ye
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.00406
  • Pdf link: https://arxiv.org/pdf/2305.00406
  • Abstract
    Simultaneous localization and mapping (SLAM) is critical to the implementation of autonomous driving. Most LiDAR-inertial SLAM algorithms assume a static environment, leading to unreliable localization in dynamic environments. Furthermore, accurate tracking of moving objects is of great significance for the control and planning of autonomous vehicle operation. This study proposes LIMOT, a tightly-coupled multi-object tracking and LiDAR-inertial SLAM system capable of accurately estimating the poses of both ego-vehicle and objects. First, we use 3D bounding boxes generated by an object detector to represent all movable objects and perform LiDAR odometry using inertial measurement unit (IMU) pre-integration result. Based on the historical trajectories of tracked objects in a sliding window, we perform robust object association. We propose a trajectory-based dynamic feature filtering method, which filters out features belonging to moving objects by leveraging tracking results. Factor graph-based optimization is then conducted to optimize the bias of the IMU and the poses of both the ego-vehicle and surrounding objects in a sliding window. Experiments conducted on KITTI datasets show that our method achieves better pose and tracking accuracy than our previous work DL-SLOT and other SLAM and multi-object tracking baseline methods.

Keyword: diffusion

Unsupervised Discovery of 3D Hierarchical Structure with Generative Diffusion Features

  • Authors: Nurislam Tursynbek, Marc Niethammer
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00067
  • Pdf link: https://arxiv.org/pdf/2305.00067
  • Abstract
    Inspired by recent findings that generative diffusion models learn semantically meaningful representations, we use them to discover the intrinsic hierarchical structure in biomedical 3D images using unsupervised segmentation. We show that features of diffusion models from different stages of a U-Net-based ladder-like architecture capture different hierarchy levels in 3D biomedical images. We design three losses to train a predictive unsupervised segmentation network that encourages the decomposition of 3D volumes into meaningful nested subvolumes that represent a hierarchy. First, we pretrain 3D diffusion models and use the consistency of their features across subvolumes. Second, we use the visual consistency between subvolumes. Third, we use the invariance to photometric augmentations as a regularizer. Our models achieve better performance than prior unsupervised structure discovery approaches on challenging biologically-inspired synthetic datasets and on a real-world brain tumor MRI dataset.

Temporal Subsampling Diminishes Small Spatial Scales in Recurrent Neural Network Emulators of Geophysical Turbulence

  • Authors: Timothy A. Smith, Stephen G. Penny, Jason A. Platt, Tse-Chun Chen
  • Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph); Fluid Dynamics (physics.flu-dyn)
  • Arxiv link: https://arxiv.org/abs/2305.00100
  • Pdf link: https://arxiv.org/pdf/2305.00100
  • Abstract
    The immense computational cost of traditional numerical weather and climate models has sparked the development of machine learning (ML) based emulators. Because ML methods benefit from long records of training data, it is common to use datasets that are temporally subsampled relative to the time steps required for the numerical integration of differential equations. Here, we investigate how this often overlooked processing step affects the quality of an emulator's predictions. We implement two ML architectures from a class of methods called reservoir computing: (1) a form of Nonlinear Vector Autoregression (NVAR), and (2) an Echo State Network (ESN). Despite their simplicity, it is well documented that these architectures excel at predicting low dimensional chaotic dynamics. We are therefore motivated to test these architectures in an idealized setting of predicting high dimensional geophysical turbulence as represented by Surface Quasi-Geostrophic dynamics. In all cases, subsampling the training data consistently leads to an increased bias at small spatial scales that resembles numerical diffusion. Interestingly, the NVAR architecture becomes unstable when the temporal resolution is increased, indicating that the polynomial based interactions are insufficient at capturing the detailed nonlinearities of the turbulent flow. The ESN architecture is found to be more robust, suggesting a benefit to the more expensive but more general structure. Spectral errors are reduced by including a penalty on the kinetic energy density spectrum during training, although the subsampling related errors persist. Future work is warranted to understand how the temporal resolution of training data affects other ML architectures.
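
An Echo State Network is only a few lines of linear algebra, which helps explain why these emulators are attractive. The sketch below shows the generic reservoir update and ridge-regression readout with illustrative hyperparameters, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 3, 300
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.standard_normal((n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1 (echo state)

def run_reservoir(inputs):
    h, states = np.zeros(n_res), []
    for x in inputs:
        h = np.tanh(W @ h + W_in @ x)             # fixed random recurrent dynamics
        states.append(h.copy())
    return np.array(states)

U = rng.standard_normal((500, n_in))              # stand-in for training snapshots
H = run_reservoir(U[:-1])
# Only the linear readout is trained, here by ridge regression (one-step forecast).
W_out = np.linalg.solve(H.T @ H + 1e-6 * np.eye(n_res), H.T @ U[1:])
```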

Towards Computational Architecture of Liberty: A Comprehensive Survey on Deep Learning for Generating Virtual Architecture in the Metaverse

  • Authors: Anqi Wang, Jiahua Dong, Jiachuan Shen, Lik-Hang Lee, Pan Hui
  • Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00510
  • Pdf link: https://arxiv.org/pdf/2305.00510
  • Abstract
    3D shape generation techniques utilizing deep learning are increasing attention from both computer vision and architectural design. This survey focuses on investigating and comparing the current latest approaches to 3D object generation with deep generative models (DGMs), including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), 3D-aware images, and diffusion models. We discuss 187 articles (80.7% of articles published between 2018-2022) to review the field of generated possibilities of architecture in virtual environments, limited to the architecture form. We provide an overview of architectural research, virtual environment, and related technical approaches, followed by a review of recent trends in discrete voxel generation, 3D models generated from 2D images, and conditional parameters. We highlight under-explored issues in 3D generation and parameterized control that is worth further investigation. Moreover, we speculate that four research agendas including data limitation, editability, evaluation metrics, and human-computer interaction are important enablers of ubiquitous interaction with immersive systems in architecture for computer-aided design Our work contributes to researchers' understanding of the current potential and future needs of deep learnings in generating virtual architecture.

Class-Balancing Diffusion Models

  • Authors: Yiming Qin, Huangjie Zheng, Jiangchao Yao, Mingyuan Zhou, Ya Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00562
  • Pdf link: https://arxiv.org/pdf/2305.00562
  • Abstract
    Diffusion-based models have shown the merits of generating high-quality visual data while preserving better diversity in recent studies. However, such observation is only justified with curated data distribution, where the data samples are nicely pre-processed to be uniformly distributed in terms of their labels. In practice, a long-tailed data distribution appears more common and how diffusion models perform on such class-imbalanced data remains unknown. In this work, we first investigate this problem and observe significant degradation in both diversity and fidelity when the diffusion model is trained on datasets with class-imbalanced distributions. Especially in tail classes, the generations largely lose diversity and we observe severe mode-collapse issues. To tackle this problem, we start from the hypothesis that the data distribution is not class-balanced, and propose Class-Balancing Diffusion Models (CBDM) that are trained with a distribution adjustment regularizer as a solution. Experiments show that images generated by CBDM exhibit higher diversity and quality, both quantitatively and qualitatively. We benchmark the generation results on the CIFAR100/CIFAR100LT dataset and show outstanding performance on the downstream recognition task.

Diffusion Models for Time Series Applications: A Survey

  • Authors: Lequan Lin, Zhengkun Li, Ruikun Li, Xuliang Li, Junbin Gao
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00624
  • Pdf link: https://arxiv.org/pdf/2305.00624
  • Abstract
    Diffusion models, a family of generative models based on deep learning, have become increasingly prominent in cutting-edge machine learning research. With a distinguished performance in generating samples that resemble the observed data, diffusion models are widely used in image, video, and text synthesis nowadays. In recent years, the concept of diffusion has been extended to time series applications, and many powerful models have been developed. Given the lack of a methodical summary and discussion of these models, we provide this survey as an elementary resource for new researchers in this area and as an inspiration to motivate future research. For better understanding, we include an introduction to the basics of diffusion models. Beyond this, we primarily focus on diffusion-based methods for time series forecasting, imputation, and generation, and present them respectively in three individual sections. We also compare different methods for the same application and highlight their connections where applicable. Lastly, we conclude with the common limitations of diffusion-based methods and highlight potential future research directions.
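
For orientation, the mechanism shared by most of the surveyed models is the closed-form DDPM forward process, which noises a clean series x0 into x_t in a single step; a denoiser is then trained to recover the injected noise. The sketch below shows that forward step on a toy univariate series, assuming the standard linear schedule.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # standard linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)      # cumulative signal-retention factors

def forward_noise(x0, t):
    """Sample x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) in closed form."""
    eps = np.random.randn(*x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps                       # eps is the denoiser's training target

x0 = np.sin(np.linspace(0.0, 6.28, 64))  # toy univariate time series
xt, eps = forward_noise(x0, t=500)
```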

Quality of approximating a mass-emitting object by a point source in a diffusion model

  • Authors: Qiyao Peng, Sander C. Hille
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2305.00717
  • Pdf link: https://arxiv.org/pdf/2305.00717
  • Abstract
    For the sake of computational efficiency and for theoretical purposes, in mathematical modelling, Dirac delta distributions are often utilized as a replacement for cells or vesicles, since the size of cells or vesicles is much smaller than the size of the surrounding tissues. Here, we consider the scenario in which the cell or the vesicle releases diffusive compounds to the immediate environment, which is modelled by the diffusion equation. Typically, one separates the intracellular and extracellular environments and uses a homogeneous Neumann boundary condition for the cell boundary (the so-called spatial exclusion approach), while the point source approach neglects the intracellular environment. We show that extra conditions are needed such that the solutions to the two approaches are consistent, and we prove a necessary and sufficient condition for this consistency. Based on the numerical results, we conclude that an initial condition in the form of a Gaussian kernel in the point source approach compensates for a time-delay discrepancy between the numerical solutions to the two approaches. Various approaches to determining the optimal amplitude and variance of the Gaussian kernel are discussed.

Keyword: dynamic

HermesBDD: A Multi-Core and Multi-Platform Binary Decision Diagram Package

  • Authors: Luigi Capogrosso, Luca Geretti, Marco Cristani, Franco Fummi, Tiziano Villa
  • Subjects: Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2305.00039
  • Pdf link: https://arxiv.org/pdf/2305.00039
  • Abstract
    A binary decision diagram (BDD) represents a Boolean expression in the form of a directed acyclic graph. BDDs are widely used in several fields, particularly in model checking and hardware verification. There are several implementations for BDD manipulation, where each package differs depending on the application. This paper presents HermesBDD: a novel multi-core and multi-platform binary decision diagram package focused on high performance and usability. HermesBDD supports a static and dynamic memory management mechanism, the possibility to exploit lock-free hash tables, and a simple parallel implementation of the If-Then-Else procedure based on a higher-level wrapper for threads and futures. HermesBDD is completely written in C++ with no need to rely on external libraries and is developed according to software engineering principles for reliability and easy maintenance over time. We provide experimental results on the n-Queens problem, the de-facto SAT solver benchmark for BDDs, demonstrating a significant speedup of 18.73x over our non-parallel baselines, and a remarkable performance boost w.r.t. other state-of-the-art BDD packages.
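
The If-Then-Else procedure mentioned above is the workhorse of every BDD package: ite(f, g, h) computes "if f then g else h" by Shannon expansion on the top variable, with a computed table for memoization. A sequential Python sketch of that classic recursion follows; HermesBDD parallelizes it and uses hash-consed nodes, whereas this toy tuple representation does not.

```python
from functools import lru_cache

# A node is a terminal (True/False) or a (var, high, low) triple, with variables
# tested in increasing order. Real packages hash-cons nodes and share subgraphs.
def top_var(*nodes):
    return min(n[0] for n in nodes if isinstance(n, tuple))

def cofactor(n, var, value):
    if isinstance(n, tuple) and n[0] == var:
        return n[1] if value else n[2]
    return n

@lru_cache(maxsize=None)   # memoization plays the role of the computed table
def ite(f, g, h):
    if f is True:  return g
    if f is False: return h
    if g == h:     return g
    v = top_var(f, g, h)
    hi = ite(cofactor(f, v, 1), cofactor(g, v, 1), cofactor(h, v, 1))
    lo = ite(cofactor(f, v, 0), cofactor(g, v, 0), cofactor(h, v, 0))
    return hi if hi == lo else (v, hi, lo)

x1, x2 = (1, True, False), (2, True, False)
print(ite(x1, x2, False))  # x1 AND x2 -> (1, (2, True, False), False)
```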

An Integrated System Dynamics and Discrete Event Supply Chain Simulation Framework for Supply Chain Resilience with Non-Stationary Pandemic Demand

  • Authors: Mustafa Can Camur, Chin-Yuan Tseng, Aristotelis E. Thanos, Chelsea C. White, Walter Yund, Eleftherios Iakovou
  • Subjects: Multiagent Systems (cs.MA); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.00086
  • Pdf link: https://arxiv.org/pdf/2305.00086
  • Abstract
    COVID-19 resulted in some of the largest supply chain disruptions in recent history. To mitigate the impact of future disruptions, we propose an integrated hybrid simulation framework to couple nonstationary demand signals from an event like COVID-19 with a model of an end-to-end supply chain. We first create a system dynamics susceptible-infected-recovered (SIR) model, augmenting a classic epidemiological model to create a realistic portrayal of demand patterns for oxygen concentrators (OC). Informed by this granular demand signal, we then create a supply chain discrete event simulation model of OC sourcing, manufacturing, and distribution to test production augmentation policies to satisfy this increased demand. This model utilizes publicly available data, engineering teardowns of OCs, and a supply chain illumination to identify suppliers. Our findings indicate that this coupled approach can use realistic demand during a disruptive event to enable rapid recommendations of policies for increased supply chain resilience with controlled cost.
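
The augmented SIR component can be appreciated from the classic discrete-time SIR recursion, whose infection curve supplies the nonstationary demand signal fed into the discrete event model. The sketch below shows only that textbook baseline, with hypothetical parameters and a hypothetical demand mapping, not the authors' calibrated model.

```python
import numpy as np

def sir(beta=0.3, gamma=0.1, days=180, n=1_000_000, i0=100):
    """Discrete-time SIR; returns the daily count of active infections."""
    s, i, r, series = n - i0, i0, 0, []
    for _ in range(days):
        new_inf = beta * s * i / n      # new infections this day
        new_rec = gamma * i             # recoveries this day
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        series.append(i)
    return np.array(series)

infected = sir()
oc_demand = 0.05 * infected             # hypothetical: 5% of cases need an OC
```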

Improving Gradient Computation for Differentiable Physics Simulation with Contacts

  • Authors: Yaofeng Desmond Zhong, Jiequn Han, Biswadip Dey, Georgia Olympia Brikis
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2305.00092
  • Pdf link: https://arxiv.org/pdf/2305.00092
  • Abstract
    Differentiable simulation enables gradients to be back-propagated through physics simulations. In this way, one can learn the dynamics and properties of a physics system by gradient-based optimization or embed the whole differentiable simulation as a layer in a deep learning model for downstream tasks, such as planning and control. However, differentiable simulation at its current stage is not perfect and might provide wrong gradients that deteriorate its performance in learning tasks. In this paper, we study differentiable rigid-body simulation with contacts. We find that existing differentiable simulation methods provide inaccurate gradients when the contact normal direction is not fixed - a general situation when the contacts are between two moving objects. We propose to improve gradient computation by continuous collision detection and leverage the time-of-impact (TOI) to calculate the post-collision velocities. We demonstrate our proposed method, referred to as TOI-Velocity, on two optimal control problems. We show that with TOI-Velocity, we are able to learn an optimal control sequence that matches the analytical solution, while without TOI-Velocity, existing differentiable simulation methods fail to do so.

Latent Dynamics Networks (LDNets): learning the intrinsic dynamics of spatio-temporal processes

  • Authors: Francesco Regazzoni, Stefano Pagani, Matteo Salvador, Luca Dede', Alfio Quarteroni
  • Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2305.00094
  • Pdf link: https://arxiv.org/pdf/2305.00094
  • Abstract
    Predicting the evolution of systems that exhibit spatio-temporal dynamics in response to external stimuli is a key enabling technology fostering scientific innovation. Traditional equations-based approaches leverage first principles to yield predictions through the numerical approximation of high-dimensional systems of differential equations, thus calling for large-scale parallel computing platforms and requiring large computational costs. Data-driven approaches, instead, enable the description of systems evolution in low-dimensional latent spaces, by leveraging dimensionality reduction and deep learning algorithms. We propose a novel architecture, named Latent Dynamics Network (LDNet), which is able to discover low-dimensional intrinsic dynamics of possibly non-Markovian dynamical systems, thus predicting the time evolution of space-dependent fields in response to external inputs. Unlike popular approaches, in which the latent representation of the solution manifold is learned by means of auto-encoders that map a high-dimensional discretization of the system state into itself, LDNets automatically discover a low-dimensional manifold while learning the latent dynamics, without ever operating in the high-dimensional space. Furthermore, LDNets are meshless algorithms that do not reconstruct the output on a predetermined grid of points, but rather at any point of the domain, thus enabling weight-sharing across query-points. These features make LDNets lightweight and easy-to-train, with excellent accuracy and generalization properties, even in time-extrapolation regimes. We validate our method on several test cases and we show that, for a challenging highly-nonlinear problem, LDNets outperform state-of-the-art methods in terms of accuracy (normalized error 5 times smaller), by employing a dramatically smaller number of trainable parameters (more than 10 times fewer).

Temporal Subsampling Diminishes Small Spatial Scales in Recurrent Neural Network Emulators of Geophysical Turbulence

  • Authors: Timothy A. Smith, Stephen G. Penny, Jason A. Platt, Tse-Chun Chen
  • Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph); Fluid Dynamics (physics.flu-dyn)
  • Arxiv link: https://arxiv.org/abs/2305.00100
  • Pdf link: https://arxiv.org/pdf/2305.00100
  • Abstract
    The immense computational cost of traditional numerical weather and climate models has sparked the development of machine learning (ML) based emulators. Because ML methods benefit from long records of training data, it is common to use datasets that are temporally subsampled relative to the time steps required for the numerical integration of differential equations. Here, we investigate how this often overlooked processing step affects the quality of an emulator's predictions. We implement two ML architectures from a class of methods called reservoir computing: (1) a form of Nonlinear Vector Autoregression (NVAR), and (2) an Echo State Network (ESN). Despite their simplicity, it is well documented that these architectures excel at predicting low dimensional chaotic dynamics. We are therefore motivated to test these architectures in an idealized setting of predicting high dimensional geophysical turbulence as represented by Surface Quasi-Geostrophic dynamics. In all cases, subsampling the training data consistently leads to an increased bias at small spatial scales that resembles numerical diffusion. Interestingly, the NVAR architecture becomes unstable when the temporal resolution is increased, indicating that the polynomial based interactions are insufficient at capturing the detailed nonlinearities of the turbulent flow. The ESN architecture is found to be more robust, suggesting a benefit to the more expensive but more general structure. Spectral errors are reduced by including a penalty on the kinetic energy density spectrum during training, although the subsampling related errors persist. Future work is warranted to understand how the temporal resolution of training data affects other ML architectures.

Faster Submodular Maximization for Several Classes of Matroids

  • Authors: Monika Henzinger, Paul Liu, Jan Vondrak, Da Wei Zheng
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2305.00122
  • Pdf link: https://arxiv.org/pdf/2305.00122
  • Abstract
    The maximization of submodular functions has found widespread application in areas such as machine learning, combinatorial optimization, and economics, where practitioners often wish to enforce various constraints; the matroid constraint has been investigated extensively due to its algorithmic properties and expressive power. Recent progress has focused on fast algorithms for important classes of matroids given in explicit form. Currently, nearly-linear time algorithms only exist for graphic and partition matroids [ICALP '19]. In this work, we develop algorithms for monotone submodular maximization constrained by graphic, transversal, or laminar matroids in time near-linear in the size of their representation. Our algorithms achieve an optimal approximation of $1-1/e-\epsilon$ and both generalize and accelerate the results of Ene and Nguyen [ICALP '19]. In fact, the running time of our algorithm cannot be improved within the fast continuous greedy framework of Badanidiyuru and Vondrák [SODA '14]. To achieve near-linear running time, we make use of dynamic data structures that maintain bases with approximate maximum cardinality and weight under certain element updates. These data structures need to support a weight-decrease operation and a novel FREEZE operation that allows the algorithm to freeze elements (i.e., force them to be contained) in its basis regardless of future data structure operations. For the laminar matroid, we present a new dynamic data structure using the top tree interface of Alstrup, Holm, de Lichtenberg, and Thorup [TALG '05] that maintains the maximum weight basis under insertions and deletions of elements in $O(\log n)$ time. For the transversal matroid, the FREEZE operation corresponds to requiring the data structure to keep a certain set $S$ of vertices matched, a property that we call $S$-stability.

DSEC-MOS: Segment Any Moving Object with Moving Ego Vehicle

  • Authors: Zhuyun Zhou, Zongwei Wu, Rémi Boutteau, Fan Yang, Dominique Ginhac
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.00126
  • Pdf link: https://arxiv.org/pdf/2305.00126
  • Abstract
    Moving Object Segmentation (MOS), a crucial task in computer vision, has numerous applications such as surveillance, autonomous driving, and video analytics. Existing datasets for moving object segmentation mainly focus on RGB or Lidar videos, but lack additional event information that can enhance the understanding of dynamic scenes. To address this limitation, we propose a novel dataset, called DSEC-MOS. Our dataset includes frames captured by RGB cameras embedded on moving vehicles and incorporates event data, which provides high temporal resolution and low-latency information about changes in the scenes. To generate accurate segmentation mask annotations for moving objects, we apply the recently emerged large model SAM - Segment Anything Model - with moving object bounding boxes from DSEC-MOD serving as prompts and calibrated RGB frames, then further revise the results. Our DSEC-MOS dataset contains in total 16 sequences (13314 images). To the best of our knowledge, DSEC-MOS is also the first moving object segmentation dataset for autonomous driving that includes an event camera. Project Page: https://github.com/ZZY-Zhou/DSEC-MOS.

Optimal Scheduling in IoT-Driven Smart Isolated Microgrids Based on Deep Reinforcement Learning

  • Authors: Jiaju Qi, Lei Lei, Kan Zheng, Simon X. Yang, Xuemin (Sherman)Shen
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.00127
  • Pdf link: https://arxiv.org/pdf/2305.00127
  • Abstract
    In this paper, we investigate the scheduling issue of diesel generators (DGs) in an Internet of Things (IoT)-driven isolated microgrid (MG) by deep reinforcement learning (DRL). The renewable energy is fully exploited under the uncertainty of renewable generation and load demand. The DRL agent learns an optimal policy from historical renewable and load data of previous days, where the policy can generate real-time decisions based on observations of renewable and load data of past hours collected by connected sensors. The goal is to reduce the operating cost on the premise of ensuring supply-demand balance. Specifically, a novel finite-horizon partially observable Markov decision process (POMDP) model is conceived considering the spinning reserve. In order to overcome the challenge of the discrete-continuous hybrid action space due to the binary DG switching decision and continuous energy dispatch (ED) decision, a DRL algorithm, namely the hybrid action finite-horizon RDPG (HAFH-RDPG), is proposed. HAFH-RDPG seamlessly integrates two classical DRL algorithms, i.e., deep Q-network (DQN) and recurrent deterministic policy gradient (RDPG), based on a finite-horizon dynamic programming (DP) framework. Extensive experiments are performed with real-world data in an IoT-driven MG to evaluate the capability of the proposed algorithm in handling the uncertainty due to inter-hour and inter-day power fluctuation and to compare its performance with those of the benchmark algorithms.

Learning to Seek: Multi-Agent Online Source Seeking Against Non-Stochastic Disturbances

  • Authors: Bin Du, Kun Qian, Christian Claudel, Dengfeng Sun
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2305.00154
  • Pdf link: https://arxiv.org/pdf/2305.00154
  • Abstract
    This paper proposes to leverage emerging learning techniques and devise a multi-agent online source-seeking algorithm for unknown environments. Of particular significance in our problem setup are: i) the underlying environment is not only unknown, but also dynamically changing and perturbed by two types of non-stochastic disturbances; and ii) a group of agents is deployed and expected to cooperatively seek as many sources as possible. Correspondingly, a new technique of discounted Kalman filtering is developed to tackle the non-stochastic disturbances, and a notion of confidence bound of polytope nature is utilized to aid the computation-efficient cooperation among multiple agents. With standard assumptions on the unknown environment as well as the disturbances, our algorithm is shown to achieve sub-linear regrets under the two types of non-stochastic disturbances; both results are comparable to the state-of-the-art. Numerical examples on a real-world pollution monitoring application are provided to demonstrate the effectiveness of our algorithm.

Uniqueness and Rapid Mixing in the Bipartite Hardcore Model

  • Authors: Xiaoyu Chen, Jingcheng Liu, Yitong Yin
  • Subjects: Data Structures and Algorithms (cs.DS); Probability (math.PR)
  • Arxiv link: https://arxiv.org/abs/2305.00186
  • Pdf link: https://arxiv.org/pdf/2305.00186
  • Abstract
    We characterize the uniqueness condition in the hardcore model for bipartite graphs with degree bounds only on one side, and provide a nearly linear time sampling algorithm that works up to the uniqueness threshold. We show that the uniqueness threshold for bipartite graphs has almost the same form as the tree uniqueness threshold for general graphs, except with degree bounds only on one side of the bipartition. The hardcore model from statistical physics can be seen as a weighted enumeration of independent sets. Its bipartite version (#BIS) is a central open problem in approximate counting. Compared to the same problem in a general graph, surprisingly tractable regimes have been identified that are believed to be hard in general. This is made possible by two lines of algorithmic approaches: the high-temperature algorithms starting from Liu and Lu (STOC 2015), and the low-temperature algorithms starting from Helmuth, Perkins, and Regts (STOC 2019). In this work, we study the limit of these algorithms in the high-temperature case. Our characterization of the uniqueness condition is obtained by proving decay of correlations for arguably the best possible regime, which involves locating fixpoints of multivariate iterative rational maps and showing their contraction. We also give a nearly linear time sampling algorithm based on simulating field dynamics only on one side of the bipartite graph that works up to the uniqueness threshold. Our algorithm is very different from the original high-temperature algorithm of Liu and Lu, and it makes use of a connection between correlation decay and spectral independence of Markov chains. Last but not least, we are able to show that the standard Glauber dynamics on both sides of the bipartite graph mixes in polynomial time up to the uniqueness threshold.
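
For concreteness, standard single-site Glauber dynamics for the hardcore model resamples one vertex at a time: a vertex with an occupied neighbour must stay out, and otherwise is occupied with probability λ/(1+λ). The sketch below shows that textbook chain on all vertices; the paper's sampler instead simulates field dynamics on only one side of the bipartition.

```python
import random

def glauber_hardcore(adj, lam, steps, rng=random.Random(0)):
    """Single-site Glauber dynamics for the hardcore model with fugacity lam."""
    occupied = set()
    vertices = list(adj)
    for _ in range(steps):
        v = rng.choice(vertices)
        occupied.discard(v)              # heat-bath resample of vertex v
        if all(u not in occupied for u in adj[v]) and rng.random() < lam / (1 + lam):
            occupied.add(v)
    return occupied                      # an independent set ~ hardcore measure

cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}   # a tiny bipartite graph
print(glauber_hardcore(cycle4, lam=1.0, steps=10_000))
```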

Large-Scale Assessment of Labour Market Dynamics in China during the COVID-19 Pandemic

  • Authors: Ying Sun, Hengshu Zhu, Hui Xiong
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2305.00199
  • Pdf link: https://arxiv.org/pdf/2305.00199
  • Abstract
    The outbreak of the COVID-19 pandemic has had an unprecedented impact on China's labour market, and has largely changed the structure of labour supply and demand in different regions. It becomes critical for policy makers to understand the emerging dynamics of the post-pandemic labour market and provide the right policies for supporting the sustainable development of regional economies. To this end, in this paper, we provide a data-driven approach to assess and understand the evolving dynamics in regions' labour markets with large-scale online job search queries and job postings. In particular, we model the spatial-temporal patterns of labour flow and labour demand which reflect the attractiveness of regional labour markets. Our analysis shows that regional labour markets suffered from dramatic changes and demonstrated unusual signs of recovery during the pandemic. Specifically, the intention of labour flow quickly recovered with a trend of migrating from large to small cities and from northern to southern regions, respectively. Meanwhile, due to the pandemic, the demand of blue-collar workers has been substantially reduced compared to that of white-collar workers. In addition, the demand structure of blue-collar jobs also changed from manufacturing to service industries. Our findings reveal that the pandemic can cause varied impacts on regions with different structures of labour demand and control policies. This analysis provides timely information for both individuals and organizations in confronting the dynamic change in job markets during the extreme events, such as pandemics. Also, the governments can be better assisted for providing the right policies on job markets in facilitating the sustainable development of regions' economies.

Deep Learning Based Channel Estimation in High Mobility Communications Using Bi-RNN Networks

  • Authors: Abdul Karim Gizzini, Marwa Chafii
  • Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.00208
  • Pdf link: https://arxiv.org/pdf/2305.00208
  • Abstract
    Doubly-selective channel estimation represents a key element in ensuring communication reliability in wireless systems. Due to the impact of multi-path propagation and Doppler interference in dynamic environments, doubly-selective channel estimation becomes challenging. Conventional channel estimation schemes encounter performance degradation in high mobility scenarios due to the limited number of training pilots. Recently, deep learning (DL) has been utilized for doubly-selective channel estimation, where convolutional neural network (CNN) architectures are employed in frame-by-frame (FBF) channel estimation. However, CNN-based estimators incur high complexity, making them impractical in real-world scenarios. We overcome this issue by proposing an optimized and robust bi-directional recurrent neural network (Bi-RNN) based channel estimator to accurately estimate the doubly-selective channel, especially in high mobility scenarios. The proposed estimator performs end-to-end interpolation using a gated recurrent unit (GRU). Extensive numerical experiments demonstrate that the developed Bi-GRU estimator significantly outperforms the recently proposed CNN-based estimators in different mobility scenarios, while substantially reducing the overall computational complexity.
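
As a rough illustration of the architecture described above, here is a hedged PyTorch sketch of a Bi-GRU that interpolates per-symbol channel estimates across an OFDM frame; the input/output layout (real and imaginary parts stacked as features) and all dimensions are assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class BiGRUChannelEstimator(nn.Module):
    """Bi-directional GRU that refines rough pilot-based channel estimates
    into a full frame-by-frame estimate of a doubly-selective channel."""
    def __init__(self, n_subcarriers=64, hidden=128):
        super().__init__()
        self.gru = nn.GRU(input_size=2 * n_subcarriers, hidden_size=hidden,
                          bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, 2 * n_subcarriers)

    def forward(self, x):      # x: (batch, n_symbols, 2 * n_subcarriers)
        h, _ = self.gru(x)     # forward/backward pass over the time axis
        return self.head(h)    # refined real/imag channel per OFDM symbol

est = BiGRUChannelEstimator()
frame = torch.randn(8, 14, 128)    # 8 frames, 14 symbols, 64 subcarriers
print(est(frame).shape)            # torch.Size([8, 14, 128])
```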

Hierarchical Dialogue Understanding with Special Tokens and Turn-level Attention

  • Authors: Xiao Liu, Jian Zhang, Heng Zhang, Fuzhao Xue, Yang You
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.00262
  • Pdf link: https://arxiv.org/pdf/2305.00262
  • Abstract
    Compared with standard text, understanding dialogue is more challenging for machines due to the dynamic and unexpected semantic changes in each turn. To model such inconsistent semantics, we propose a simple but effective Hierarchical Dialogue Understanding model, HiDialog. Specifically, we first insert multiple special tokens into a dialogue and propose turn-level attention to learn turn embeddings hierarchically. Then, a heterogeneous graph module is leveraged to polish the learned embeddings. We evaluate our model on various dialogue understanding tasks including dialogue relation extraction, dialogue emotion recognition, and dialogue act classification. Results show that our simple approach achieves state-of-the-art performance on all three tasks above. All our source code is publicly available at https://github.com/ShawX825/HiDialog.
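
To make the token-insertion idea concrete, here is a small sketch of how one might prepend a special token to each turn and build a turn-level attention mask so that each special token aggregates only its own turn; the token name and masking rule are our assumptions, not necessarily HiDialog's exact scheme.

```python
def insert_turn_tokens(turns, turn_token="[TURN]"):
    """Flatten a dialogue into one token sequence, prefixing each utterance
    with a special token whose hidden state will act as the turn embedding."""
    tokens, turn_ids = [], []
    for i, utterance in enumerate(turns):
        for tok in [turn_token] + utterance.split():
            tokens.append(tok)
            turn_ids.append(i)
    return tokens, turn_ids

def turn_level_mask(tokens, turn_ids, turn_token="[TURN]"):
    """Boolean attention mask: special tokens attend only within their own
    turn, while ordinary tokens keep global attention."""
    n = len(tokens)
    mask = [[True] * n for _ in range(n)]
    for i, tok in enumerate(tokens):
        if tok == turn_token:
            mask[i] = [turn_ids[j] == turn_ids[i] for j in range(n)]
    return mask

tokens, ids = insert_turn_tokens(["hello there", "hi , how are you"])
print(tokens[:4])  # ['[TURN]', 'hello', 'there', '[TURN]']
```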

ZIRCON: Zero-watermarking-based approach for data integrity and secure provenance in IoT networks

  • Authors: Omair Faraj, David Megías, Joaquin Garcia-Alfaro
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2305.00266
  • Pdf link: https://arxiv.org/pdf/2305.00266
  • Abstract
    The Internet of Things (IoT) is integrating the Internet and smart devices in almost every domain such as home automation, e-healthcare systems, vehicular networks, industrial control and military applications. In these sectors, sensory data, which is collected from multiple sources and managed through intermediate processing by multiple nodes, is used for decision-making processes. Ensuring data integrity and keeping track of data provenance is a core requirement in such a highly dynamic context, since data provenance is an important tool for the assurance of data trustworthiness. Dealing with such requirements is challenging due to the limited computational and energy resources in IoT networks. This requires addressing several challenges such as processing overhead, secure provenance, bandwidth consumption and storage efficiency. In this paper, we propose ZIRCON, a novel zero-watermarking approach to establish end-to-end data trustworthiness in an IoT network. In ZIRCON, provenance information is stored in a tamper-proof centralized network database through watermarks, generated at the source node before transmission. We provide an extensive security analysis showing the resilience of our scheme against passive and active attacks. We also compare our scheme with existing works based on performance metrics such as computational time, energy utilization and cost analysis. The results show that ZIRCON is robust against several attacks, lightweight, storage efficient, and better in energy utilization and bandwidth consumption, compared to prior art.
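
The core idea of zero-watermarking, deriving a watermark from the data rather than embedding anything in it, can be sketched in a few lines; the HMAC construction below is our illustrative stand-in, not ZIRCON's exact scheme.

```python
import hmac
import hashlib

def generate_zero_watermark(node_id: str, payload: bytes, key: bytes) -> str:
    """Derive a watermark from the sensed data and a per-node secret.
    The payload itself is transmitted unmodified (zero-watermarking); the
    watermark is stored in a trusted provenance database instead."""
    return hmac.new(key, node_id.encode() + payload, hashlib.sha256).hexdigest()

def verify_integrity(stored_wm: str, node_id: str, payload: bytes,
                     key: bytes) -> bool:
    """Database-side check: recompute and compare; a mismatch indicates
    tampering with the data or its claimed provenance."""
    return hmac.compare_digest(
        stored_wm, generate_zero_watermark(node_id, payload, key))

wm = generate_zero_watermark("sensor-42", b"23.5C", b"shared-secret")
print(verify_integrity(wm, "sensor-42", b"23.5C", b"shared-secret"))  # True
```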

A spectral method for a Fokker-Planck equation in neuroscience with applications in neural networks with learning rules

  • Authors: Pei Zhang, Yanli Wang, Zhennan Zhou
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2305.00275
  • Pdf link: https://arxiv.org/pdf/2305.00275
  • Abstract
    In this work, we consider the Fokker-Planck equation of the Nonlinear Noisy Leaky Integrate-and-Fire (NNLIF) model for neuron networks. Due to the firing events of neurons at the microscopic level, this Fokker-Planck equation contains dynamic boundary conditions involving specific internal points. To efficiently solve this problem and explore the properties of the unknown, we construct a flexible numerical scheme for the Fokker-Planck equation in the framework of spectral methods that can accurately handle the dynamic boundary condition. This numerical scheme is stable for suitable choices of test function spaces, asymptotic preserving, and easily extendable to variant models with multiple time scales. We also present extensive numerical examples to verify the scheme's properties, including order of convergence and time efficiency, and to explore unique properties of the model, including blow-up phenomena for the NNLIF model and learning and discriminative properties for the NNLIF model with learning rules.

Improving Classification of Retinal Fundus Image Using Flow Dynamics Optimized Deep Learning Methods

  • Authors: V. Banupriya, S. Anusuya
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2305.00294
  • Pdf link: https://arxiv.org/pdf/2305.00294
  • Abstract
    Diabetic Retinopathy (DR) is a complication of diabetes mellitus that damages the blood vessel network of the retina and may endanger the vision of diabetic subjects. Performing a DR diagnosis from color fundus pictures can take some time, because experienced clinicians are required to identify the tumors in the imagery that indicate the illness. Automated detection of DR is therefore an extremely challenging task. Convolutional Neural Networks (CNNs) are highly effective at classifying such images, particularly compared to the hand-crafted feature-based methods employed previously. To guarantee high-quality results, a cutting-edge CNN model is proposed to extract the characteristics of the fundus images. The features of the CNN output were employed in various machine learning classifiers for the proposed system. The model was then evaluated against different deep learning methods and Visual Geometry Group (VGG) networks, using images from a generic Kaggle dataset. Here, the River Formation Dynamics (RFD) algorithm is proposed along with FUNDNET to detect retinal fundus images. The investigation's findings demonstrated that the approach performed better than alternative approaches.

Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using Corrupted Multimodal Data

  • Authors: Arthur Josi, Mahdi Alehdaghi, Rafael M. O. Cruz, Eric Granger
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00320
  • Pdf link: https://arxiv.org/pdf/2305.00320
  • Abstract
    Visible-infrared person re-identification (V-I ReID) seeks to match images of individuals captured over a distributed network of RGB and IR cameras. The task is challenging due to the significant differences between V and I modalities, especially under real-world conditions, where images are corrupted by, e.g., blur, noise, and weather. Indeed, state-of-the-art V-I ReID models cannot leverage corrupted modality information to sustain a high level of accuracy. In this paper, we propose an efficient model for multimodal V-I ReID -- named Multimodal Middle Stream Fusion (MMSF) -- that preserves modality-specific knowledge for improved robustness to corrupted multimodal images. In addition, three state-of-the-art attention-based multimodal fusion models are adapted to address corrupted multimodal data in V-I ReID, allowing the importance of each modality to be balanced dynamically. Recently, evaluation protocols have been proposed to assess the robustness of ReID models under challenging real-world scenarios. However, these protocols are limited to unimodal V settings. For realistic evaluation of multimodal (and cross-modal) V-I person ReID models, we propose new challenging corrupted datasets for scenarios where V and I cameras are co-located (CL) and not co-located (NCL). Finally, the benefits of our Masking and Local Multimodal Data Augmentation (ML-MDA) strategy are explored to improve the robustness of ReID models to multimodal corruption. Our experiments on clean and corrupted versions of the SYSU-MM01, RegDB, and ThermalWORLD datasets indicate which multimodal V-I ReID models are more likely to perform well in real-world operational conditions. In particular, our ML-MDA is an important strategy for a V-I person ReID system to sustain high accuracy and robustness when processing corrupted multimodal images. Also, our multimodal ReID model MMSF outperforms every method under CL and NCL camera scenarios.

Maximum Match Subsequence Alignment Algorithm Finely Grained (MMSAA FG)

  • Authors: Bharath Reddy, Richard Fields
  • Subjects: Information Theory (cs.IT); Genomics (q-bio.GN)
  • Arxiv link: https://arxiv.org/abs/2305.00329
  • Pdf link: https://arxiv.org/pdf/2305.00329
  • Abstract
    Sequence alignment is common nowadays, as it is used in many fields to determine how closely two sequences are related and, at times, to see how little they differ. In computational biology / bioinformatics, many algorithms have been developed over the course of time, not only to align two sequences quickly but also to obtain good laboratory results from these alignments. The first algorithms developed were based on a technique called dynamic programming; they were very slow but optimal when it comes to sensitivity. To improve speed, most algorithms today take a heuristic approach, sacrificing sensitivity. In this paper, we improve on a heuristic algorithm called MASAA (Multiple Anchor Staged Local Sequence Alignment Algorithm) and MASAA Sensitive, which we published previously. This new algorithm is appropriately called Maximum Match Subsequence Alignment Algorithm Finely Grained. The algorithm is based on a suffix tree data structure, like our previous algorithms, but to improve sensitivity we employ adaptive seeds and finely grained perfect-match seeds between the already identified anchors. We tested this algorithm on randomly generated sequences and the Rosetta dataset, where sequence lengths ranged up to 500 thousand.

Critical Scenario Generation for Developing Trustworthy Autonomy

  • Authors: Wenhao Ding
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.00339
  • Pdf link: https://arxiv.org/pdf/2305.00339
  • Abstract
    Autonomous systems, such as self-driving vehicles, quadrupeds, and robot manipulators, are largely enabled by the rapid development of artificial intelligence. However, such systems involve several trustworthiness challenges such as safety, robustness, and generalization, due to their deployment in open-ended and real-time environments. To evaluate and improve trustworthiness, simulations or so-called digital twins are largely utilized for system development with low cost and high efficiency. One important element of virtual simulations is the scenario, which consists of static and dynamic objects, specific tasks, and evaluation metrics. However, designing diverse, realistic, and effective scenarios is still a challenging problem. One straightforward way is creating scenarios through human design, which is time-consuming and limited by the experience of experts. Another method commonly used in self-driving is log replay. This method collects scenario data in the real world and then replays it in simulations or adds random perturbations. Although the replayed scenarios are realistic, most of the collected scenarios are redundant, since they are ordinary scenarios that cover only a small portion of critical cases. The desired scenarios should cover all cases in the real world, especially rare but critical events with extremely low probability. Critical scenarios are rare but important to test autonomous systems under risky conditions and unpredictable perturbations, which reveal their trustworthiness.

Neural Radiance Fields (NeRFs): A Review and Some Recent Developments

  • Authors: Mohamed Debbagh
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2305.00375
  • Pdf link: https://arxiv.org/pdf/2305.00375
  • Abstract
    Neural Radiance Field (NeRF) is a framework that represents a 3D scene in the weights of a fully connected neural network, known as the Multi-Layer Perceptron (MLP). The method was introduced for the task of novel view synthesis and is able to achieve state-of-the-art photorealistic image renderings from a given continuous viewpoint. NeRFs have become a popular field of research as recent developments have expanded the performance and capabilities of the base framework. Recent developments include methods that require fewer images to train the model for view synthesis as well as methods that are able to generate views from unconstrained and dynamic scene representations.
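
For readers new to the framework, here is a minimal NumPy sketch of two NeRF building blocks, the positional encoding applied to input coordinates and the volume rendering step that composites MLP outputs along a ray; the frequency count and shapes are illustrative.

```python
import numpy as np

def positional_encoding(x, n_freqs=10):
    """Map coordinates to sin/cos features at octave frequencies so the MLP
    can represent high-frequency scene detail."""
    feats = [x]
    for k in range(n_freqs):
        feats += [np.sin(2.0**k * np.pi * x), np.cos(2.0**k * np.pi * x)]
    return np.concatenate(feats, axis=-1)

def render_ray(densities, colors, deltas):
    """Alpha-composite per-sample (density, rgb) predictions along one ray.
    densities, deltas: (N,); colors: (N, 3)."""
    alphas = 1.0 - np.exp(-densities * deltas)             # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = alphas * trans                               # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)         # final pixel colour

rgb = render_ray(np.ones(16) * 0.3, np.random.rand(16, 3), np.full(16, 0.1))
print(rgb.shape)  # (3,)
```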

Image Completion via Dual-path Cooperative Filtering

  • Authors: Pourya Shamsolmoali, Masoumeh Zareapoor, Eric Granger
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00379
  • Pdf link: https://arxiv.org/pdf/2305.00379
  • Abstract
    Given the recent advances in image-generating algorithms, deep image completion methods have made significant progress. However, state-of-the-art methods typically provide poor cross-scene generalization, and generated masked areas often contain blurry artifacts. Predictive filtering is a method for restoring images, which predicts the most effective kernels based on the input scene. Motivated by this approach, we address image completion as a filtering problem. Deep feature-level semantic filtering is introduced to fill in missing information, while preserving local structure and generating visually realistic content. In particular, a Dual-path Cooperative Filtering (DCF) model is proposed, where one path predicts dynamic kernels, and the other path extracts multi-level features by using Fast Fourier Convolution to yield semantically coherent reconstructions. Experiments on three challenging image completion datasets show that our proposed DCF outperforms state-of-the-art methods.
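
To clarify the filtering formulation, here is a hedged PyTorch sketch of the generic per-pixel predictive filtering operation that DCF builds on: each output pixel is a weighted sum of its neighbourhood, with the weights predicted by a network. Shapes and normalisation are our assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def apply_predicted_kernels(image, kernels):
    """Pixel-wise filtering with predicted kernels.
    image:   (B, C, H, W) input features or pixels.
    kernels: (B, k*k, H, W) per-pixel kernel logits from a prediction path."""
    b, c, h, w = image.shape
    k = int(kernels.shape[1] ** 0.5)
    patches = F.unfold(image, kernel_size=k, padding=k // 2)  # (B, C*k*k, H*W)
    patches = patches.view(b, c, k * k, h, w)
    weights = kernels.softmax(dim=1).unsqueeze(1)             # (B, 1, k*k, H, W)
    return (patches * weights).sum(dim=2)                     # (B, C, H, W)

out = apply_predicted_kernels(torch.randn(1, 3, 32, 32),
                              torch.randn(1, 9, 32, 32))  # 3x3 kernels
print(out.shape)  # torch.Size([1, 3, 32, 32])
```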

Object-Centric Voxelization of Dynamic Scenes via Inverse Neural Rendering

  • Authors: Siyu Gao, Yanpeng Zhao, Yunbo Wang, Xiaokang Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00393
  • Pdf link: https://arxiv.org/pdf/2305.00393
  • Abstract
    Understanding the compositional dynamics of the world in unsupervised 3D scenarios is challenging. Existing approaches either fail to make effective use of time cues or ignore the multi-view consistency of scene decomposition. In this paper, we propose DynaVol, an inverse neural rendering framework that provides a pilot study for learning time-varying volumetric representations for dynamic scenes with multiple entities (like objects). It has two main contributions. First, it maintains a time-dependent 3D grid, which dynamically and flexibly binds the spatial locations to different entities, thus encouraging the separation of information at a representational level. Second, our approach jointly learns grid-level local dynamics, object-level global dynamics, and the compositional neural radiance fields in an end-to-end architecture, thereby enhancing the spatiotemporal consistency of object-centric scene voxelization. We present a two-stage training scheme for DynaVol and validate its effectiveness on various benchmarks with multiple objects, diverse dynamics, and real-world shapes and textures. We present visualization at https://sites.google.com/view/dynavol-visual.

LIMOT: A Tightly-Coupled System for LiDAR-Inertial Odometry and Multi-Object Tracking

  • Authors: Zhongyang Zhu, Junqiao Zhao, Xuebo Tian, Kai Huang, Chen Ye
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.00406
  • Pdf link: https://arxiv.org/pdf/2305.00406
  • Abstract
    Simultaneous localization and mapping (SLAM) is critical to the implementation of autonomous driving. Most LiDAR-inertial SLAM algorithms assume a static environment, leading to unreliable localization in dynamic environments. Furthermore, accurate tracking of moving objects is of great significance for the control and planning of autonomous vehicle operation. This study proposes LIMOT, a tightly-coupled multi-object tracking and LiDAR-inertial SLAM system capable of accurately estimating the poses of both the ego-vehicle and surrounding objects. First, we use 3D bounding boxes generated by an object detector to represent all movable objects and perform LiDAR odometry using inertial measurement unit (IMU) pre-integration results. Based on the historical trajectories of tracked objects in a sliding window, we perform robust object association. We propose a trajectory-based dynamic feature filtering method, which filters out features belonging to moving objects by leveraging tracking results. Factor graph-based optimization is then conducted to optimize the IMU bias and the poses of both the ego-vehicle and surrounding objects in a sliding window. Experiments conducted on the KITTI dataset show that our method achieves better pose and tracking accuracy than our previous work DL-SLOT and other SLAM and multi-object tracking baseline methods.

Dynamic Obstacles Tracking in mmWave Networks

  • Authors: Rathindra Nath Dutta, Subhojit Sarkar, Sasthi C. Ghosh
  • Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2305.00429
  • Pdf link: https://arxiv.org/pdf/2305.00429
  • Abstract
    The advent of fifth generation communication networks has led to novel opportunities and problems that were absent in legacy networks. Stringent line-of-sight demands, necessitated by the fast-attenuating nature of millimeter waves (mmWave) through obstacles, pose one of the central problems of the field. mmWave links are easily disrupted by obstacles, both static and dynamic. Handling static obstacles is easy, while dynamic obstacles are usually tracked by expensive additional hardware like cameras and radars, which undoubtedly leads to increased deployment costs. In this manuscript, we propose a novel approach to estimate the trajectories of multiple dynamic obstacles in an ultra dense mmWave network, based solely on link failure information, without resorting to any specialized tracking hardware. We keep track of link failures over a short window of time and use that knowledge to extrapolate the trajectories of dynamic obstacles. After proving the problem's NP-completeness, we employ a greedy set-cover-based approach. We then use the obtained trajectories to tag upcoming links according to their blockage possibility. We run simulations on real-world data to validate our approach based on its accuracy, sensitivity, and precision. Our approach is also shown to outperform an existing one.
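
The greedy set cover heuristic the paper mentions is the classic one; a minimal sketch follows. Treating observed link failures as the universe and each candidate obstacle trajectory as the set of failures it would explain is our reading of the setup, not the paper's exact formulation.

```python
def greedy_set_cover(universe, subsets):
    """Classic greedy set cover (ln-n approximation): repeatedly pick the
    subset covering the most still-uncovered elements.
    universe: iterable of elements; subsets: dict name -> set of elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & subsets[s]))
        if not uncovered & subsets[best]:
            break  # remaining elements cannot be covered
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen

failures = {1, 2, 3, 4, 5}
trajectories = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {4, 5}}
print(greedy_set_cover(failures, trajectories))  # e.g. ['t1', 't3']
```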

EVREAL: Towards a Comprehensive Benchmark and Analysis Suite for Event-based Video Reconstruction

  • Authors: Burak Ercan, Onur Eker, Aykut Erdem, Erkut Erdem
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00434
  • Pdf link: https://arxiv.org/pdf/2305.00434
  • Abstract
    Event cameras are a new type of vision sensor that incorporates asynchronous and independent pixels, offering advantages over traditional frame-based cameras such as high dynamic range and minimal motion blur. However, their output is not easily understandable by humans, making the reconstruction of intensity images from event streams a fundamental task in event-based vision. While recent deep learning-based methods have shown promise in video reconstruction from events, this problem is not completely solved yet. To facilitate comparison between different approaches, standardized evaluation protocols and diverse test datasets are essential. This paper proposes a unified evaluation methodology and introduces an open-source framework called EVREAL to comprehensively benchmark and analyze various event-based video reconstruction methods from the literature. Using EVREAL, we give a detailed analysis of the state-of-the-art methods for event-based video reconstruction, and provide valuable insights into the performance of these methods under varying settings, challenging scenarios, and downstream tasks.

Learning, Diversity and Adaptation in Changing Environments: The Role of Weak Links

  • Authors: Daron Acemoglu, Asuman Ozdaglar, Sarath Pattathil
  • Subjects: Social and Information Networks (cs.SI); Theoretical Economics (econ.TH)
  • Arxiv link: https://arxiv.org/abs/2305.00474
  • Pdf link: https://arxiv.org/pdf/2305.00474
  • Abstract
    Adaptation to dynamic conditions requires a certain degree of diversity. If all agents take the best current action, learning that the underlying state has changed and behavior should adapt will be slower. Diversity is harder to maintain when there is fast communication between agents, because they tend to find out and pursue the best action rapidly. We explore these issues using a model of (Bayesian) learning over a social network. Agents learn rapidly from and may also have incentives to coordinate with others to whom they are connected via strong links. We show, however, that when the underlying environment changes sufficiently rapidly, any network consisting of just strong links will do only a little better than random choice in the long run. In contrast, networks combining strong and weak links, whereby the latter type of links transmit information only slowly, can achieve much higher long-run average payoffs. The best social networks are those that combine a large fraction of agents into a strongly-connected component, while still maintaining a sufficient number of smaller communities that make diverse choices and communicate with this component via weak links.

Fixed-time safe tracking control of uncertain high-order nonlinear pure-feedback systems via unified transformation functions

  • Authors: Chaoqun Guo, Jiangping Hu, Jiasheng Hao, Sergej Celikovsky, Xiaoming Hu
  • Subjects: Systems and Control (eess.SY); Mathematical Physics (math-ph)
  • Arxiv link: https://arxiv.org/abs/2305.00505
  • Pdf link: https://arxiv.org/pdf/2305.00505
  • Abstract
    In this paper, a fixed-time safe control problem is investigated for an uncertain high-order nonlinear pure-feedback system with state constraints. A new nonlinear transformation function is first proposed to handle both the constrained and unconstrained cases in a unified way. Further, a radial basis function neural network is constructed to approximate the unknown dynamics in the system, and a fixed-time dynamic surface control (FDSC) technique is developed to facilitate the fixed-time control design for the uncertain high-order pure-feedback system. Combining the proposed unified transformation function and the FDSC technique, an adaptive fixed-time control strategy is proposed to guarantee fixed-time tracking. The proposed fixed-time control strategy guarantees a uniform control structure when addressing both constrained and unconstrained situations. Numerical examples are presented to demonstrate the proposed fixed-time tracking control strategy.

StyleLipSync: Style-based Personalized Lip-sync Video Generation

  • Authors: Taekyung Ki, Dongchan Min
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00521
  • Pdf link: https://arxiv.org/pdf/2305.00521
  • Abstract
    In this paper, we present StyleLipSync, a style-based personalized lip-sync video generative model that can generate identity-agnostic lip-synchronizing videos from arbitrary audio. To generate videos of arbitrary identities, we leverage an expressive lip prior from the semantically rich latent space of a pre-trained StyleGAN, where we can also enforce video consistency with a linear transformation. In contrast to previous lip-sync methods, we introduce pose-aware masking that dynamically locates the mask to improve naturalness over frames, utilizing a 3D parametric mesh predictor frame by frame. Moreover, we propose a few-shot lip-sync adaptation method for an arbitrary person by introducing a sync regularizer that preserves lip-sync generalization while enhancing person-specific visual information. Extensive experiments demonstrate that our model can generate accurate lip-sync videos even in the zero-shot setting and enhance the characteristics of an unseen face using a few seconds of target video through the proposed adaptation method. Please refer to our project page.

MD-Manifold: A Medical-Distance-Based Representation Learning Approach for Medical Concept and Patient Representation

  • Authors: Shaodong Wang, Qing Li, Wenli Zhang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00553
  • Pdf link: https://arxiv.org/pdf/2305.00553
  • Abstract
    Effectively representing medical concepts and patients is important for healthcare analytical applications. Representing medical concepts for healthcare analytical tasks requires incorporating medical domain knowledge and prior information from patient description data. Current methods, such as feature engineering and mapping medical concepts to standardized terminologies, have limitations in capturing the dynamic patterns from patient description data. Other embedding-based methods have difficulties in incorporating important medical domain knowledge and often require a large amount of training data, which may not be feasible for most healthcare systems. Our proposed framework, MD-Manifold, introduces a novel approach to medical concept and patient representation. It includes a new data augmentation approach, concept distance metric, and patient-patient network to incorporate crucial medical domain knowledge and prior data information. It then adapts manifold learning methods to generate medical concept-level representations that accurately reflect medical knowledge and patient-level representations that clearly identify heterogeneous patient cohorts. MD-Manifold also outperforms other state-of-the-art techniques in various downstream healthcare analytical tasks. Our work has significant implications in information systems research in representation learning, knowledge-driven machine learning, and using design science as middle-ground frameworks for downstream explorative and predictive analyses. Practically, MD-Manifold has the potential to create effective and generalizable representations of medical concepts and patients by incorporating medical domain knowledge and prior data information. It enables deeper insights into medical data and facilitates the development of new analytical applications for better healthcare outcomes.

RAPID: Autonomous Multi-Agent Racing using Constrained Potential Dynamic Games

  • Authors: Yixuan Jia, Maulik Bhatt, Negar Mehr
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.00579
  • Pdf link: https://arxiv.org/pdf/2305.00579
  • Abstract
    In this work, we consider the problem of autonomous racing with multiple agents, where agents must interact closely and influence each other to compete. We model interactions among agents through a game-theoretical framework and propose an efficient algorithm for tractably solving the resulting game in real time. More specifically, we capture interactions among multiple agents through a constrained dynamic game. We show that the resulting dynamic game is an instance of a simple-to-analyze class of games. Namely, we show that our racing game is an instance of a constrained dynamic potential game. An important and appealing property of dynamic potential games is that a generalized Nash equilibrium of the underlying game can be computed by solving a single constrained optimal control problem instead of multiple coupled constrained optimal control problems. Leveraging this property, we show that the problem of autonomous racing is greatly simplified and develop RAPID (autonomous multi-agent RAcing using constrained PotentIal Dynamic games), a racing algorithm whose underlying game can be solved tractably in real time. Through simulation studies, we demonstrate that our algorithm outperforms the state-of-the-art approach. We further show the real-time capabilities of our algorithm in hardware experiments.

MAMBO-V: Dynamic Side-Channel Leakage Analysis on RISC-V

  • Authors: Jan Wichelmann, Christopher Peredy, Florian Sieck, Anna Pätschke, Thomas Eisenbarth
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2305.00584
  • Pdf link: https://arxiv.org/pdf/2305.00584
  • Abstract
    RISC-V is an emerging technology, with applications ranging from embedded devices to high-performance servers. Therefore, more and more security-critical workloads will be conducted with code that is compiled for RISC-V. Well-known microarchitectural side-channel attacks against established platforms like x86 apply to RISC-V CPUs as well. As RISC-V does not mandate any hardware-based side-channel countermeasures, a piece of code compiled for a generic RISC-V CPU in a cloud server cannot make safe assumptions about the microarchitecture on which it is running. Existing tools for aiding software-level precautions by checking side-channel vulnerabilities on source code or x86 binaries are not compatible with RISC-V machine code. In this work, we study the requirements and goals of architecture-specific leakage analysis for RISC-V and illustrate how to achieve these goals with the help of fast and precise dynamic binary analysis. We implement all necessary building blocks for finding side-channel leakages on RISC-V, while relying on existing mature solutions when possible. Our leakage analysis builds upon the modular side-channel analysis framework Microwalk, that examines execution traces for leakage through secret-dependent memory accesses or branches. To provide suitable traces, we port the ARM dynamic binary instrumentation tool MAMBO to RISC-V. Our port named MAMBO-V can instrument arbitrary binaries which use the 64-bit general purpose instruction set. We evaluate our toolchain on several cryptographic libraries with RISC-V support and identify multiple exploitable leakages.

Modeling and Analysis of Analog Non-Volatile Devices for Compute-In-Memory Applications

  • Authors: Carl Brando, Minseong Park, Sayma Nowshin Chowdhury, Matthew Chen, Kyusang Lee, Sahil Shah
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2305.00618
  • Pdf link: https://arxiv.org/pdf/2305.00618
  • Abstract
    This paper introduces a novel simulation tool for analyzing and training neural network models tailored for compute-in-memory hardware. The tool leverages physics-based device models to enable the design of neural network models and their parameters that are more hardware-accurate. The initial study focused on modeling a CMOS-based floating-gate transistor and memristor device using measurement data from a fabricated device. Additionally, the tool incorporates hardware constraints, such as the dynamic range of data converters, and allows users to specify circuit-level constraints. A case study using the MNIST dataset and LeNet-5 architecture demonstrates the tool's capability to estimate area, power, and accuracy. The results showcase the potential of the proposed tool to optimize neural network models for compute-in-memory hardware.

Dynamic Transfer Learning across Graphs

  • Authors: Haohui Wang, Yuzhen Mao, Jianhui Sun, Si Zhang, Dawei Zhou
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00664
  • Pdf link: https://arxiv.org/pdf/2305.00664
  • Abstract
    Transferring knowledge across graphs plays a pivotal role in many high-stake domains, ranging from transportation networks to e-commerce networks, from neuroscience to finance. To date, the vast majority of existing works assume both source and target domains are sampled from a universal and stationary distribution. However, many real-world systems are intrinsically dynamic, where the underlying domains are evolving over time. To bridge the gap, we propose to shift the problem to the dynamic setting and ask: given the label-rich source graphs and the label-scarce target graphs observed in previous T timestamps, how can we effectively characterize the evolving domain discrepancy and optimize the generalization performance of the target domain at the incoming T+1 timestamp? To answer the question, for the first time, we propose a generalization bound under the setting of dynamic transfer learning across graphs, which implies the generalization performance is dominated by domain evolution and domain discrepancy between source and target domains. Inspired by the theoretical results, we propose a novel generic framework DyTrans to improve knowledge transferability across dynamic graphs. In particular, we start with a transformer-based temporal encoding module to model temporal information of the evolving domains; then, we further design a dynamic domain unification module to efficiently learn domain-invariant representations across the source and target domains. Finally, extensive experiments on various real-world datasets demonstrate the effectiveness of DyTrans in transferring knowledge from dynamic source domains to dynamic target domains.

PRSeg: A Lightweight Patch Rotate MLP Decoder for Semantic Segmentation

  • Authors: Yizhe Ma, Fangjian Lin, Sitong Wu, Shengwei Tian, Long Yu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00671
  • Pdf link: https://arxiv.org/pdf/2305.00671
  • Abstract
    The lightweight MLP-based decoder has become increasingly promising for semantic segmentation. However, the channel-wise MLP cannot expand the receptive field, lacking the context modeling capacity that is critical to semantic segmentation. In this paper, we propose a parameter-free patch rotate operation to reorganize the pixels spatially. It first divides the feature map into multiple groups and then rotates the patches within each group. Based on the proposed patch rotate operation, we design a novel segmentation network, named PRSeg, which includes an off-the-shelf backbone and a lightweight Patch Rotate MLP decoder containing multiple Dynamic Patch Rotate Blocks (DPR-Blocks). In each DPR-Block, a fully connected layer is applied after a Patch Rotate Module (PRM) to exchange spatial information between pixels. Specifically, in the PRM, the feature map is first split into a reserved part and a rotated part along the channel dimension according to the predicted probability of the Dynamic Channel Selection Module (DCSM), and our proposed patch rotate operation is only performed on the rotated part. Extensive experiments on the ADE20K, Cityscapes and COCO-Stuff 10K datasets prove the effectiveness of our approach. We expect that our PRSeg can promote the development of MLP-based decoders in semantic segmentation.
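
As a rough sketch of the parameter-free operation described above, the snippet below splits channels into groups and cyclically shifts the pixels inside each patch cell by a group-dependent offset; the exact rotation scheme in PRSeg may differ, so treat this as illustrative.

```python
import torch

def patch_rotate(x, n_groups=4, patch=2):
    """Parameter-free spatial reorganisation: divide channels into groups
    and roll pixels within each patch-by-patch cell by a group-dependent
    offset, so a following channel-wise MLP can mix spatial information.
    Requires H and W divisible by `patch`."""
    b, c, h, w = x.shape
    out = []
    for g, xg in enumerate(x.chunk(n_groups, dim=1)):
        # view as (b, c', h/p, p, w/p, p) cells, then roll inside each cell
        cells = xg.view(b, -1, h // patch, patch, w // patch, patch)
        cells = torch.roll(cells, shifts=g % patch, dims=3)
        cells = torch.roll(cells, shifts=g % patch, dims=5)
        out.append(cells.reshape(b, -1, h, w))
    return torch.cat(out, dim=1)

y = patch_rotate(torch.randn(1, 8, 4, 4))
print(y.shape)  # torch.Size([1, 8, 4, 4])
```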

End to End Lane detection with One-to-Several Transformer

  • Authors: Kunyang Zhou, Rui Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00675
  • Pdf link: https://arxiv.org/pdf/2305.00675
  • Abstract
    Although lane detection methods have shown impressive performance in real-world scenarios, most methods require post-processing that is not robust enough. Therefore, end-to-end detectors like the DEtection TRansformer (DETR) have been introduced in lane detection. However, one-to-one label assignment in DETR can degrade training efficiency due to label semantic conflicts. Besides, the positional query in DETR is unable to provide an explicit positional prior, making it difficult to optimize. In this paper, we present the One-to-Several Transformer (O2SFormer). We first propose one-to-several label assignment, which combines one-to-one and one-to-many label assignments to improve training efficiency while keeping end-to-end detection. To overcome the difficulty of optimizing the one-to-one assignment, we further propose the layer-wise soft label, which adjusts the positive weight of positive lane anchors across different decoder layers. Finally, we design a dynamic anchor-based positional query to explore the positional prior by incorporating lane anchors into the positional query. Experimental results show that O2SFormer significantly speeds up the convergence of DETR and outperforms Transformer-based and CNN-based detectors on the CULane dataset. Code will be available at https://github.com/zkyseu/O2SFormer.

Learning Terrain-Aware Kinodynamic Model for Autonomous Off-Road Rally Driving With Model Predictive Path Integral Control

  • Authors: Hojin Lee, Taekyung Kim, Jungwi Mun, Wonsuk Lee
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.00676
  • Pdf link: https://arxiv.org/pdf/2305.00676
  • Abstract
    High-speed autonomous driving in off-road environments has immense potential for various applications, but it also presents challenges due to the complexity of vehicle-terrain interactions. In such environments, it is crucial for the vehicle to predict its motion and adjust its controls proactively in response to environmental changes, such as variations in terrain elevation. To this end, we propose a method for learning terrain-aware kinodynamic model which is conditioned on both proprioceptive and exteroceptive information. The proposed model generates reliable predictions of 6-degree-of-freedom motion and can even estimate contact interactions without requiring ground truth force data during training. This enables the design of a safe and robust model predictive controller through appropriate cost function design which penalizes sampled trajectories with unstable motion, unsafe interactions, and high levels of uncertainty derived from the model. We demonstrate the effectiveness of our approach through experiments on a simulated off-road track, showing that our proposed model-controller pair outperforms the baseline and ensures robust high-speed driving performance without control failure.
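
Model Predictive Path Integral (MPPI) control, on which the controller above is built, admits a compact sampling-based sketch; `dynamics` and `cost` below are placeholders for the learned terrain-aware kinodynamic model and the paper's uncertainty-penalising trajectory cost, and all hyperparameters are illustrative.

```python
import numpy as np

def mppi_step(dynamics, cost, x0, u_plan, n_samples=256, sigma=0.5,
              lam=1.0, rng=np.random.default_rng(0)):
    """One MPPI update: sample noisy control sequences, roll them out
    through the model, and reweight by exponentiated trajectory cost.
    u_plan: (horizon, u_dim) nominal control sequence."""
    horizon = len(u_plan)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + u_plan.shape)
    costs = np.zeros(n_samples)
    for i in range(n_samples):
        x = x0
        for t in range(horizon):
            u = u_plan[t] + noise[i, t]
            costs[i] += cost(x, u)
            x = dynamics(x, u)
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    # noise-weighted correction of the nominal plan
    return u_plan + np.einsum("i,itj->tj", w, noise)

# Toy usage: drive a 1-D point toward the origin.
dyn = lambda x, u: x + 0.1 * u[0]
cst = lambda x, u: x ** 2 + 0.01 * u[0] ** 2
plan = mppi_step(dyn, cst, x0=1.0, u_plan=np.zeros((20, 1)))
print(plan[0])
```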

Joint tone mapping and denoising of thermal infrared images via multi-scale Retinex and multi-task learning

  • Authors: Axel Gödrich, Daniel König, Gabriel Eilertsen, Michael Teutsch
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2305.00691
  • Pdf link: https://arxiv.org/pdf/2305.00691
  • Abstract
    Cameras digitize real-world scenes as pixel intensity values with a limited value range given by the available bits per pixel (bpp). High Dynamic Range (HDR) cameras capture those luminance values in higher resolution through an increase in the number of bpp. Most displays, however, are limited to 8 bpp. Naive HDR compression methods lead to a loss of the rich information contained in those HDR images. In this paper, tone mapping algorithms for thermal infrared images with 16 bpp are investigated that can preserve this information. An optimized multi-scale Retinex algorithm sets the baseline. This algorithm is then approximated with a deep learning approach based on the popular U-Net architecture. The remaining noise in the images after tone mapping is reduced implicitly by utilizing a self-supervised deep learning approach that can be jointly trained with the tone mapping approach in a multi-task learning scheme. Further discussions are provided on denoising and deflickering for thermal infrared video enhancement in the context of tone mapping. Extensive experiments on the public FLIR ADAS Dataset prove the effectiveness of our proposed method in comparison with the state-of-the-art.
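
The multi-scale Retinex baseline mentioned above has a compact classical form: an average over scales of log(I) - log(Gaussian-blurred I). The sketch below applies it to a 16-bit image and rescales to 8 bpp; the scale values are illustrative, not the paper's optimized parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multi_scale_retinex(img16, sigmas=(15, 80, 250), eps=1e-6):
    """Average of single-scale Retinex outputs log(I) - log(G_sigma * I):
    compresses dynamic range while preserving local contrast."""
    img = img16.astype(np.float64) + eps
    msr = np.zeros_like(img)
    for s in sigmas:
        msr += np.log(img) - np.log(gaussian_filter(img, sigma=s) + eps)
    msr /= len(sigmas)
    msr = (msr - msr.min()) / (msr.max() - msr.min() + eps)  # normalise
    return (255.0 * msr).astype(np.uint8)                    # 8-bpp output

frame = np.random.randint(0, 2**16, size=(240, 320)).astype(np.uint16)
print(multi_scale_retinex(frame).dtype)  # uint8
```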

Full Scaling Automation for Sustainable Development of Green Data Centers

  • Authors: Shiyu Wang, Yinbo Sun, Xiaoming Shi, Shiyi Zhu, Lin-Tao Ma, James Zhang, Yifei Zheng, Jian Liu
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00706
  • Pdf link: https://arxiv.org/pdf/2305.00706
  • Abstract
    The rapid rise in cloud computing has resulted in an alarming increase in data centers' carbon emissions, which now account for >3% of global greenhouse gas emissions, necessitating immediate steps to combat their mounting strain on the global climate. An important focus of this effort is to improve resource utilization in order to save electricity usage. Our proposed Full Scaling Automation (FSA) mechanism is an effective method of dynamically adapting resources to accommodate changing workloads in large-scale cloud computing clusters, enabling the clusters in data centers to maintain their desired CPU utilization target and thus improve energy efficiency. FSA harnesses the power of deep representation learning to accurately predict the future workload of each service and automatically stabilize the corresponding target CPU usage level, unlike previous autoscaling methods, such as Autopilot or FIRM, that need to adjust computing resources with statistical models and expert knowledge. Our approach achieves significant performance improvement compared to the existing work on real-world datasets. We also deployed FSA on large-scale cloud computing clusters in industrial data centers, and according to the certification of the China Environmental United Certification Center (CEC), a reduction of 947 tons of carbon dioxide, equivalent to a saving of 1,538,000 kWh of electricity, was achieved during the Double 11 shopping festival of 2022, marking a critical step for our company's strategic goal towards carbon neutrality by 2030.

SGX Switchless Calls Made Configless

  • Authors: Peterson Yuhala, Michael Paper, Timothée Zerbib, Pascal Felber, Valerio Schiavoni, Alain Tchana
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2305.00763
  • Pdf link: https://arxiv.org/pdf/2305.00763
  • Abstract
    Intel's software guard extensions (SGX) provide hardware enclaves to guarantee confidentiality and integrity for sensitive code and data. However, systems leveraging such security mechanisms must often pay high performance overheads. A major source of this overhead is SGX enclave transitions which induce expensive cross-enclave context switches. The Intel SGX SDK mitigates this with a switchless call mechanism for transitionless cross-enclave calls using worker threads. Intel's SGX switchless call implementation improves performance but provides limited flexibility: developers need to statically fix the system configuration at build time, which is error-prone and misconfigurations lead to performance degradations and waste of CPU resources. ZC-SWITCHLESS is a configless and efficient technique to drive the execution of SGX switchless calls. Its dynamic approach optimises the total switchless worker threads at runtime to minimise CPU waste. The experimental evaluation shows that ZC-SWITCHLESS obviates the performance penalty of misconfigured switchless systems while minimising CPU waste.

RViDeformer: Efficient Raw Video Denoising Transformer with a Larger Benchmark Dataset

  • Authors: Huanjing Yue, Cong Cao, Lei Liao, Jingyu Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2305.00767
  • Pdf link: https://arxiv.org/pdf/2305.00767
  • Abstract
    In recent years, raw video denoising has garnered increased attention due to its consistency with the imaging process and the well-studied noise modeling in the raw domain. However, two problems still hinder denoising performance. Firstly, there is no large dataset with realistic motions for supervised raw video denoising, as capturing noisy and clean frames of real dynamic scenes is difficult. To address this, we propose recapturing existing high-resolution videos displayed on a 4K screen with high-low ISO settings to construct noisy-clean paired frames. In this way, we construct a video denoising dataset (named ReCRVD) with 120 groups of noisy-clean videos, with ISO values ranging from 1600 to 25600. Secondly, while non-local temporal-spatial attention is beneficial for denoising, it often leads to heavy computation costs. We propose an efficient raw video denoising transformer network (RViDeformer) that explores both short- and long-distance correlations. Specifically, we propose multi-branch spatial and temporal attention modules, which explore patch correlations from the local window, local low-resolution window, global downsampled window, and neighbor-involved window, and then fuse them together. We employ reparameterization to reduce computation costs. Our network is trained in both supervised and unsupervised manners, achieving the best performance compared with state-of-the-art methods. Additionally, the model trained with our proposed dataset (ReCRVD) outperforms the model trained with the previous benchmark dataset (CRVD) when evaluated on real-world outdoor noisy videos. Our code and dataset will be released after the acceptance of this work.

Higher-order time domain boundary elements for elastodynamics - graded meshes and hp versions

  • Authors: Alessandra Aimi, Giulia Di Credico, Heiko Gimperlein, Ernst P. Stephan
  • Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
  • Arxiv link: https://arxiv.org/abs/2305.00772
  • Pdf link: https://arxiv.org/pdf/2305.00772
  • Abstract
    The solution to the elastodynamic equation in the exterior of a polyhedral domain or a screen exhibits singular behavior from the corners and edges. The detailed expansion of the singularities implies quasi-optimal estimates for piecewise polynomial approximations of the Dirichlet trace of the solution and the traction. The results are applied to hp and graded versions of the time domain boundary element method for the weakly singular and the hypersingular integral equations. Numerical examples confirm the theoretical results for the Dirichlet and Neumann problems for screens and for polygonal domains in 2d. They exhibit the expected quasi-optimal convergence rates and the singular behavior of the solutions.

Explicit Knowledge Graph Reasoning for Conversational Recommendation

  • Authors: Xuhui Ren, Tong Chen, Quoc Viet Hung Nguyen, Lizhen Cui, Zi Huang, Hongzhi Yin
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2305.00783
  • Pdf link: https://arxiv.org/pdf/2305.00783
  • Abstract
    Traditional recommender systems estimate user preferences on items purely based on historical interaction records, thus failing to capture fine-grained yet dynamic user interests and letting users receive recommendations only passively. Recent conversational recommender systems (CRSs) tackle those limitations by enabling recommender systems to interact with the user to obtain her/his current preference through a sequence of clarifying questions. Despite the progress achieved in CRSs, existing solutions are far from satisfactory in the following two aspects: 1) current CRSs usually require each user to answer a quantity of clarifying questions before reaching the final recommendation, which harms the user experience; 2) there is a semantic gap between the learned representations of explicitly mentioned attributes and items. To address these drawbacks, we introduce the knowledge graph (KG) as auxiliary information for comprehending and reasoning about a user's preference, and propose a new CRS framework, namely the Knowledge Enhanced Conversational Reasoning (KECR) system. As a user can reflect her/his preferences via both attribute- and item-level expressions, KECR closes the semantic gap between the two levels by embedding the structured knowledge in the KG. Meanwhile, KECR utilizes the connectivity within the KG to conduct explicit reasoning about the user demand, making the model less dependent on the user's feedback to clarifying questions. KECR can find a prominent reasoning chain to make the recommendation explainable and more rational, as well as smooth the conversation process, leading to better user experience and conversational recommendation accuracy. Extensive experiments on two real-world datasets demonstrate our approach's superiority over state-of-the-art baselines in both automatic evaluations and human judgments.

Empowering Learner-Centered Instruction: Integrating ChatGPT Python API and Tinker Learning for Enhanced Creativity and Problem-Solving Skills

  • Authors: Yun-Cheng Tsai
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2305.00821
  • Pdf link: https://arxiv.org/pdf/2305.00821
  • Abstract
    The ChatGPT Python API plays a crucial role in promoting Learner-Centered Instruction (LCI) and aligns with the principles of Tinker Learning, allowing students to discover their learning strategies. LCI emphasizes the importance of active, hands-on learning experiences and encourages students to take responsibility for their learning journey. By integrating the ChatGPT Python API into the educational process, students can explore various resources, generate new ideas, and create content in a more personalized manner. This innovative approach enables students to engage with the learning material more deeply, fostering a sense of ownership and motivation. As they work through the Creative Learning Spiral, students develop essential skills such as critical thinking, problem-solving, and creativity. The ChatGPT Python API is a valuable tool for students to explore different solutions, evaluate alternatives, and make informed decisions, all while encouraging self-directed learning. In Tinker Learning environments, the integration of the ChatGPT Python API empowers students to experiment and iterate, allowing them to find the most effective learning strategies that cater to their individual needs and preferences. This personalized approach helps students become more confident in their abilities, leading to greater academic success and long-term skill development. By leveraging the capabilities of the ChatGPT Python API, educational institutions can create a more engaging, supportive, and dynamic learning environment. This approach aligns with the principles of Learner-Centered Instruction and Tinker Learning, promoting a culture of curiosity, exploration, and creativity among students while preparing them for the challenges of a fast-paced, ever-changing world.
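
For concreteness, a minimal example of calling the ChatGPT Python API as a classroom tutor might look like the following (openai-python v0.x interface; the model choice and system prompt are illustrative):

```python
import openai  # pip install openai

openai.api_key = "YOUR_API_KEY"  # placeholder; keep real keys out of code

def ask_tutor(question, history=None):
    """Send a student question to ChatGPT with a tutoring system prompt,
    returning the assistant's reply."""
    messages = [{"role": "system",
                 "content": "You are a patient tutor. Guide the student "
                            "with hints instead of giving full solutions."}]
    messages += history or []
    messages.append({"role": "user", "content": question})
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo",
                                            messages=messages)
    return response["choices"][0]["message"]["content"]

# print(ask_tutor("Why does my Python loop never terminate?"))
```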

Jointly Managing Electrical and Thermal Energy in Solar- and Battery-powered Computer Systems

  • Authors: Noman Bashir, Yasra Chandio, David Irwin, Fatima M. Anwar, Jeremy Gummeson, Prashant Shenoy
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computers and Society (cs.CY); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2305.00855
  • Pdf link: https://arxiv.org/pdf/2305.00855
  • Abstract
    Environmentally-powered computer systems operate on renewable energy harvested from their environment, such as solar or wind, and stored in batteries. While harvesting environmental energy has long been necessary for small-scale embedded systems without access to external power sources, it is also increasingly important in designing sustainable larger-scale systems for edge applications. For sustained operations, such systems must consider not only the electrical energy but also the thermal energy available in the environment in their design and operation. Unfortunately, prior work generally ignores the impact of thermal effects, and instead implicitly assumes ideal temperatures. To address the problem, we develop a thermodynamic model that captures the interplay of electrical and thermal energy in environmentally-powered computer systems. The model captures the effect of environmental conditions, the system's physical properties, and workload scheduling on performance. In evaluating our model, we distill the thermal effects that impact these systems using a small-scale prototype and a programmable incubator. We then leverage our model to show how considering these thermal effects in designing and operating environmentally-powered computer systems of varying scales can improve their energy-efficiency, performance, and availability.

Supporting Contextual Conversational Agent-Based Software Development

  • Authors: Glaucia Melo, Luis Fernando Lins, Paulo Alencar, Donald Cowan
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2305.00885
  • Pdf link: https://arxiv.org/pdf/2305.00885
  • Abstract
    Software Development (SD) is remarkably dynamic and is critically dependent on the knowledge acquired by the project's software developers as the project progresses. Software developers need to understand large amounts of information related to the tasks at hand. This information (context) is often not explicit, as it can be lost in large documentation repositories, a team member's brain, or beyond their cognitive memory capacity. These contexts include tool features, integration strategies, data structures, code syntax, approaches to tasks, project definitions, and even implicit or tacit contexts, which add significant complexity to the SD process. Current software development practices still lack sufficient techniques using the existing SD execution information and context to provide developers with relevant process guidance, augmenting their capacity to do their job using available applicable information. This paper presents ongoing and future research on an approach to support conversational agent-based knowledge-augmented software development. Developers benefit by receiving recommendations about task-related information and workflows they need to execute. This work advances human-computer interaction patterns in workflow engines, from graphical user interfaces to conversational patterns in software engineering.

Learning Flight Control Systems from Human Demonstrations and Real-Time Uncertainty-Informed Interventions

  • Authors: Prashant Ganesh, J. Humberto Ramos, Vinicius G. Goecks, Jared Paquet, Matthew Longmire, Nicholas R. Waytowich, Kevin Brink
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.00929
  • Pdf link: https://arxiv.org/pdf/2305.00929
  • Abstract
    This paper describes a methodology for learning flight control systems from human demonstrations and interventions while considering the estimated uncertainty in the learned models. The proposed approach uses human demonstrations to train an initial model via imitation learning and then iteratively improves its performance by using real-time human interventions. The aim of the interventions is to correct undesired behaviors and adapt the model to changes in the task dynamics. The learned model uncertainty is estimated in real time via Monte Carlo Dropout, and the human supervisor is cued for intervention via an audiovisual signal when this uncertainty exceeds a predefined threshold. The proposed approach is validated in an autonomous quadrotor landing task on both fixed and moving platforms. It is shown that with this algorithm, a human can rapidly teach a flight task to an unmanned aerial vehicle by demonstrating expert trajectories and then adapt the learned model by intervening when the learned controller performs an undesired maneuver, the task changes, and/or the model uncertainty exceeds a threshold.
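
Monte Carlo Dropout, used above for real-time uncertainty estimation, has a simple generic form: keep dropout active at inference and treat the spread of repeated stochastic forward passes as uncertainty. Below is a hedged PyTorch sketch, not the authors' exact estimator; the toy policy network and threshold check are illustrative.

```python
import torch
import torch.nn as nn

def mc_dropout_uncertainty(model: nn.Module, x, n_passes=20):
    """Run repeated stochastic forward passes with dropout enabled and
    return the predictive mean and standard deviation. Note: model.train()
    also unfreezes layers like BatchNorm, so dropout-only models are the
    clean use case."""
    model.train()  # keep nn.Dropout stochastic at inference time
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_passes)])
    return preds.mean(dim=0), preds.std(dim=0)

policy = nn.Sequential(nn.Linear(10, 64), nn.ReLU(),
                       nn.Dropout(0.2), nn.Linear(64, 4))
mean, std = mc_dropout_uncertainty(policy, torch.randn(1, 10))
# if std.max() > THRESHOLD: cue the human supervisor to intervene
print(std.max())
```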

A Comparison of Pneumatic Actuators for Soft Growing Vine Robots

  • Authors: Alexander M. Kübler, Cosima du Pasquier, Andrew Low, Betim Djambazi, Nicolas Aymon, Julian Förster, Nathaniel Agharese, Roland Siegwart, Allison M. Okamura
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2305.00967
  • Pdf link: https://arxiv.org/pdf/2305.00967
  • Abstract
    Soft pneumatic actuators are used to steer soft growing "vine" robots while being flexible enough to undergo the tip eversion required for growth. They also meet the requirements to steer soft growing vine robots through challenging terrain. In this study, we compared the performance of three types of pneumatic actuators in terms of their ability to perform eversion, bending, dynamic motion, and force: the pouch motor, the cylindrical pneumatic artificial muscle (cPAM), and the fabric pneumatic artificial muscle (fPAM). The pouch motor is advantageous for prototyping due to its simple manufacturing process. The cPAM exhibits superior bending behavior and produces the highest forces, while the fPAM actuates fastest and everts at the lowest pressure. We evaluated a similar range of dimensions for each actuator type. Larger actuators can produce more significant deformations and forces, but smaller actuators inflate more quickly and require a lower eversion pressure. Since vine robots are lightweight, the effect of gravity on the functionality of different actuators is minimal. We developed a new analytical model that predicts the pressure-to-bending behavior of vine robot actuators. Using the actuator results, we designed and demonstrated a 4.8 m long vine robot equipped with highly maneuverable 60x60 mm cPAMs in a three-dimensional obstacle course. The vine robot was able to move around sharp turns, travel through a passage smaller than its diameter, and lift itself against gravity.

New submissions for Mon, 10 Apr 23

Keyword: efficient

Automatic Detection of Reactions to Music via Earable Sensing

  • Authors: Euihyoek Lee, Chulhong Min, Jeaseung Lee, Jin Yu, Seungwoo Kang
  • Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2304.03295
  • Pdf link: https://arxiv.org/pdf/2304.03295
  • Abstract
We present GrooveMeter, a novel system that automatically detects vocal and motion reactions to music via earable sensing and supports music engagement-aware applications. To this end, we use smart earbuds as sensing devices, which are already widely used for music listening, and devise reaction detection techniques by leveraging an inertial measurement unit (IMU) and a microphone on earbuds. To explore reactions in daily music-listening situations, we collect the first dataset of its kind, MusicReactionSet, containing 926-minute-long IMU and audio data from 30 participants. With the dataset, we discover a set of unique challenges in detecting music listening reactions accurately and robustly using audio and motion sensing. We devise sophisticated processing pipelines to make reaction detection accurate and efficient. We present a comprehensive evaluation to examine the performance of reaction detection and system cost. It shows that GrooveMeter achieves macro F1 scores of 0.89 for vocal reaction and 0.81 for motion reaction with leave-one-subject-out cross-validation. More importantly, GrooveMeter shows higher accuracy and robustness compared to alternative methods. We also show that our filtering approach reduces 50% or more of the energy overhead. Finally, we demonstrate the potential use cases through a case study.

Identifying Lebesgue-sampled Continuous-time Impulse Response Models: A Kernel-based Approach

  • Authors: Rodrigo A. González, Koen Tiels, Tom Oomen
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.03312
  • Pdf link: https://arxiv.org/pdf/2304.03312
  • Abstract
    Control applications are increasingly sampled non-equidistantly in time, including in motion control, networked control, resource-aware control, and event-triggered control. Some of these applications use measurement devices that sample equidistantly in the amplitude domain. The aim of this paper is to develop a non-parametric estimator of the impulse response of continuous-time systems based on such sampling strategy, known as Lebesgue-sampling. To this end, kernel methods are developed to formulate an algorithm that adequately takes into account the output intersample behavior, which ultimately leads to more accurate models and more efficient output sampling compared to the standard approach. The efficacy of this method is demonstrated through a mass-spring damper case study.
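
As background, a toy illustration of Lebesgue (amplitude-triggered) sampling as opposed to equidistant-in-time sampling; the threshold and test signal here are arbitrary choices, not the paper's setup:

```python
import numpy as np

def lebesgue_sample(t, y, h=0.1):
    """Record a sample only when the amplitude moves by at least one level h."""
    samples = [(t[0], y[0])]
    last = y[0]
    for ti, yi in zip(t[1:], y[1:]):
        if abs(yi - last) >= h:  # quantization level crossed in amplitude
            samples.append((ti, yi))
            last = yi
    return samples

t = np.linspace(0, 10, 1000)
events = lebesgue_sample(t, np.sin(t), h=0.2)  # few samples where the signal is flat
```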

Adaptive Decision-Making with Constraints and Dependent Losses: Performance Guarantees and Applications to Online and Nonlinear Identification

  • Authors: Michael Muehlebach
  • Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.03321
  • Pdf link: https://arxiv.org/pdf/2304.03321
  • Abstract
We consider adaptive decision-making problems where an agent optimizes a cumulative performance objective by repeatedly choosing among a finite set of options. Compared to the classical prediction-with-expert-advice set-up, we consider situations where losses are constrained and derive algorithms that exploit the additional structure in optimal and computationally efficient ways. Our algorithm and analysis are instance-dependent; that is, suboptimal choices of the environment are exploited and reflected in our regret bounds. The constraints handle general dependencies between losses (even across time) and are flexible enough to also account for a loss budget, which the environment is not allowed to exceed. The performance of the resulting algorithms is highlighted in two numerical examples, which include a nonlinear and online system identification task.

Hardware-Aware Static Optimization of Hyperdimensional Computations

  • Authors: Pu Yi, Sara Achour
  • Subjects: Programming Languages (cs.PL); Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2304.03335
  • Pdf link: https://arxiv.org/pdf/2304.03335
  • Abstract
Hyperdimensional (HD) computing is a highly error-resilient computational paradigm that can be used to efficiently perform language classification, data retrieval, and analogical reasoning tasks on error-prone emerging hardware technologies. HD computation is storage-inefficient and often requires computing over 10,000-dimensional bit vectors. Prior work either leaves hypervectors unoptimized or dynamically tunes HD computation parameters (e.g., hypervector dimension) to deliver the desired accuracy. These approaches are time-consuming, lack accuracy guarantees, and do not generalize well. We present Heim, a framework for statically optimizing HD computation parameters to minimize resource usage in the presence of hardware error. Heim guarantees the optimized computation satisfies a user-provided target accuracy. Heim deploys a novel analysis procedure that unifies theoretical results in HD computing to systematically optimize HD computation. We develop four analysis-amenable data structures that leverage Heim to perform aggressive space-saving optimizations, and optimize these data structures to attain 99% query accuracy on both binary memory and multiple-bit-per-cell resistive memory. Heim-optimized data structures deliver 1.31x-14.51x reductions in hypervector size and 2.191x-27.27x reductions in memory usage while attaining 98.96-99.75% accuracy. Heim-optimized data structures deliver up to 41.40% accuracy improvements over dynamically tuned parameters. Heim computes parameters significantly faster than dynamic approaches.
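
As background, a toy sketch of the hypervector primitives that frameworks like Heim optimize over (XOR binding, majority-vote bundling, Hamming similarity); the dimension and operations follow common HD-computing convention and are not Heim's specific encoding:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 10_000  # typical hypervector dimension, per the abstract

def rand_hv():
    return rng.integers(0, 2, D, dtype=np.int8)

def bind(a, b):
    return a ^ b  # XOR binding: associates two hypervectors

def bundle(*hvs):
    return (np.sum(hvs, axis=0) > len(hvs) / 2).astype(np.int8)  # majority vote

def hamming_sim(a, b):
    return 1.0 - np.mean(a != b)

a, b = rand_hv(), rand_hv()
assert hamming_sim(bind(a, b) ^ b, a) == 1.0  # XOR unbinding recovers a exactly
```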

Spintronic Physical Reservoir for Autonomous Prediction and Long-Term Household Energy Load Forecasting

  • Authors: Walid Al Misba, Harindra S. Mavikumbure, Md Mahadi Rajib, Daniel L. Marino, Victor Cobilean, Milos Manic, Jayasimha Atulasimha
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03343
  • Pdf link: https://arxiv.org/pdf/2304.03343
  • Abstract
In this study, we have shown autonomous long-term prediction with a spintronic physical reservoir. Due to the short-term memory property of the magnetization dynamics, non-linearity arises in the reservoir states, which can be used for long-term prediction tasks using simple linear regression for online training. During the prediction stage, the output is directly fed back to the input of the reservoir for autonomous prediction. We employ our proposed reservoir for the modeling of chaotic time series, such as Mackey-Glass, and dynamic time-series data, such as household building energy loads. Since only the last layer of an RC needs to be trained with linear regression, it is well suited for learning in real time on edge devices. Here we show that a skyrmion-based magnetic tunnel junction can potentially be used as a prototypical RC, but any nanomagnetic tunnel junction with nonlinear magnetization behavior can implement such an RC. By comparing our spintronic physical RC approach with state-of-the-art energy load forecasting algorithms, such as LSTMs and RNNs, we conclude that the proposed framework achieves high prediction accuracy while requiring low memory and energy, both of which are at a premium in hardware-resource- and power-constrained edge applications. Further, the proposed approach is shown to require very small training datasets while being at least 16X more energy-efficient than the state-of-the-art sequence-to-sequence LSTM for accurate household load predictions.
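
The workflow in the abstract (fixed nonlinear reservoir, linear readout trained by regression, closed-loop autonomous prediction) can be sketched with a conventional software reservoir standing in for the magnetization dynamics; the random tanh network, toy signal, and ridge regularizer below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, steps = 200, 1000
W_in = rng.normal(scale=0.5, size=N)
W = rng.normal(scale=0.1, size=(N, N))

def run_reservoir(u_seq):
    """Drive the fixed reservoir and record its states (short-term memory)."""
    x, states = np.zeros(N), []
    for u in u_seq:
        x = np.tanh(W @ x + W_in * u)
        states.append(x.copy())
    return np.array(states)

u = np.sin(0.1 * np.arange(steps))   # toy stand-in for Mackey-Glass data
X, y = run_reservoir(u[:-1]), u[1:]  # reservoir states vs. one-step-ahead targets
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ y)  # ridge readout

# Autonomous prediction: feed the readout's output back as the next input.
x, u_t, pred = X[-1], y[-1], []
for _ in range(100):
    x = np.tanh(W @ x + W_in * u_t)
    u_t = x @ W_out
    pred.append(u_t)
```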

ImaGen: A General Framework for Generating Memory- and Power-Efficient Image Processing Accelerators

  • Authors: Nisarg Ujjainkar, Jingwen Leng, Yuhao Zhu
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.03352
  • Pdf link: https://arxiv.org/pdf/2304.03352
  • Abstract
    Image processing algorithms are prime targets for hardware acceleration as they are commonly used in resource- and power-limited applications. Today's image processing accelerator designs make rigid assumptions about the algorithm structures and/or on-chip memory resources. As a result, they either have narrow applicability or result in inefficient designs. This paper presents a compiler framework that automatically generates memory- and power-efficient image processing accelerators. We allow programmers to describe generic image processing algorithms (in a domain specific language) and specify on-chip memory structures available. Our framework then formulates a constrained optimization problem that minimizes on-chip memory usage while maintaining theoretical maximum throughput. The key challenge we address is to analytically express the throughput bottleneck, on-chip memory contention, to enable a lightweight compilation. FPGA prototyping and ASIC synthesis show that, compared to existing approaches, accelerators generated by our framework reduce the on-chip memory usage and/or power consumption by double digits.

Wide neural networks: From non-gaussian random fields at initialization to the NTK geometry of training

  • Authors: Luís Carvalho, João Lopes Costa, José Mourão, Gonçalo Oliveira
  • Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Probability (math.PR)
  • Arxiv link: https://arxiv.org/abs/2304.03385
  • Pdf link: https://arxiv.org/pdf/2304.03385
  • Abstract
    Recent developments in applications of artificial neural networks with over $n=10^{14}$ parameters make it extremely important to study the large $n$ behaviour of such networks. Most works studying wide neural networks have focused on the infinite width $n \to +\infty$ limit of such networks and have shown that, at initialization, they correspond to Gaussian processes. In this work we will study their behavior for large, but finite $n$. Our main contributions are the following: (1) The computation of the corrections to Gaussianity in terms of an asymptotic series in $n^{-\frac{1}{2}}$. The coefficients in this expansion are determined by the statistics of parameter initialization and by the activation function. (2) Controlling the evolution of the outputs of finite width $n$ networks, during training, by computing deviations from the limiting infinite width case (in which the network evolves through a linear flow). This improves previous estimates and yields sharper decay rates for the (finite width) NTK in terms of $n$, valid during the entire training procedure. As a corollary, we also prove that, with arbitrarily high probability, the training of sufficiently wide neural networks converges to a global minimum of the corresponding quadratic loss function. (3) Estimating how the deviations from Gaussianity evolve with training in terms of $n$. In particular, using a certain metric in the space of measures we find that, along training, the resulting measure is within $n^{-\frac{1}{2}}(\log n)^{1+}$ of the time dependent Gaussian process corresponding to the infinite width network (which is explicitly given by precomposing the initial Gaussian process with the linear flow corresponding to training in the infinite width limit).

An Online Adaptation Strategy for Direct Data-driven Control

  • Authors: Johannes Teutsch, Sebastian Ellmaier, Sebastian Kerz, Dirk Wollherr, Marion Leibold
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.03386
  • Pdf link: https://arxiv.org/pdf/2304.03386
  • Abstract
The fundamental lemma from behavioral systems theory yields a data-driven non-parametric system representation that has shown great potential for the data-efficient control of unknown linear and weakly nonlinear systems, even in the presence of measurement noise. In this work, we strive to extend the applicability of this paradigm to more strongly nonlinear systems by updating the system representation during control. Unlike existing approaches, our method does not impose suitable excitation on the control inputs, but runs as an observer parallel to the controller. Whenever a rank condition is deemed to be fulfilled, the system representation is updated using newly available data points. In a reference tracking simulation of a two-link robotic arm, we showcase the performance of the proposed strategy in a predictive control framework.
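
For readers unfamiliar with the fundamental lemma, the non-parametric representation it yields can be sketched with Hankel matrices of recorded data; a minimal illustration under the usual persistency-of-excitation assumption, with toy input/output data:

```python
import numpy as np

def hankel(signal: np.ndarray, depth: int) -> np.ndarray:
    """Stack length-`depth` windows of a recorded signal as columns."""
    n = len(signal) - depth + 1
    return np.column_stack([signal[i:i + depth] for i in range(n)])

# If u is persistently exciting, every length-L input/output trajectory of the
# underlying LTI system lies in the column span of the stacked Hankel matrices.
u = np.random.randn(60)                     # toy recorded input
y = np.convolve(u, [0.5, 0.3, 0.1])[:60]    # toy recorded output (FIR stand-in)
H = np.vstack([hankel(u, 8), hankel(y, 8)]) # data-driven system representation
```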

CAPOT: Creating Robust Dense Query Encoders using Post Training Contrastive Alignment

  • Authors: Daniel Campos, ChengXiang Zhai, Alessandro Magnani
  • Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.03401
  • Pdf link: https://arxiv.org/pdf/2304.03401
  • Abstract
The success of contextual word representations and advances in neural information retrieval have made dense vector-based retrieval a standard approach for passage and document ranking. While effective and efficient, dual-encoders are brittle to variations in query distributions and noisy queries. Data augmentation can make models more robust but introduces overhead to training set generation and requires retraining and index regeneration. We present Contrastive Alignment POst Training (CAPOT), a highly efficient finetuning method that improves model robustness without requiring index regeneration, training set optimization, or alteration. CAPOT enables robust retrieval by freezing the document encoder while the query encoder learns to align noisy queries with their unaltered roots. We evaluate CAPOT on noisy variants of MSMARCO, Natural Questions, and Trivia QA passage retrieval, finding that CAPOT has a similar impact as data augmentation with none of its overhead.

TinyDet: Accurate Small Object Detection in Lightweight Generic Detectors

  • Authors: Shaoyu Chen, Tianheng Cheng, Jiemin Fang, Qian Zhang, Yuan Li, Wenyu Liu, Xinggang Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03428
  • Pdf link: https://arxiv.org/pdf/2304.03428
  • Abstract
Small object detection requires the detection head to scan a large number of positions on image feature maps, which is extremely hard for computation- and energy-efficient lightweight generic detectors. To accurately detect small objects with limited computation, we propose a two-stage lightweight detection framework with extremely low computation complexity, termed TinyDet. It enables high-resolution feature maps for dense anchoring to better cover small objects, proposes a sparsely-connected convolution for computation reduction, enhances the early-stage features in the backbone, and addresses the feature misalignment problem for accurate small object detection. On the COCO benchmark, our TinyDet-M achieves 30.3 AP and 13.5 AP^s with only 991 MFLOPs, which is the first detector that has an AP over 30 with less than 1 GFLOPs; besides, TinyDet-S and TinyDet-L achieve promising performance under different computation limitations.

Exploring Collaborative Distributed Diffusion-Based AI-Generated Content (AIGC) in Wireless Networks

  • Authors: Hongyang Du, Ruichen Zhang, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Xuemin (Sherman)Shen, H. Vincent Poor
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.03446
  • Pdf link: https://arxiv.org/pdf/2304.03446
  • Abstract
Driven by advances in generative artificial intelligence (AI) techniques and algorithms, the widespread adoption of AI-generated content (AIGC) has emerged, allowing for the generation of diverse and high-quality content. In particular, the diffusion model-based AIGC technique has been widely used to generate content in a variety of modalities. However, the real-world implementation of AIGC models, particularly on resource-constrained devices such as mobile phones, introduces significant challenges related to energy consumption and privacy concerns. To further promote the realization of ubiquitous AIGC services, we propose a novel collaborative distributed diffusion-based AIGC framework. By capitalizing on collaboration among devices in wireless networks, the proposed framework facilitates the efficient execution of AIGC tasks, optimizing edge computation resource utilization. Furthermore, we examine the practical implementation of the denoising steps on mobile phones, the impact of the proposed approach on the wireless network-aided AIGC landscape, and the future opportunities associated with its real-world integration. The contributions of this paper not only offer a promising solution to the existing limitations of AIGC services but also pave the way for future research in device collaboration, resource optimization, and the seamless delivery of AIGC services across various devices. Our code is available at https://github.com/HongyangDu/DistributedDiffusion.

Does Prompt-Tuning Language Model Ensure Privacy?

  • Authors: Shangyu Xie, Wei Dai, Esha Ghosh, Sambuddha Roy, Dan Schwartz, Kim Laine
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.03472
  • Pdf link: https://arxiv.org/pdf/2304.03472
  • Abstract
    Prompt-tuning has received attention as an efficient tuning method in the language domain, i.e., tuning a prompt that is a few tokens long, while keeping the large language model frozen, yet achieving comparable performance with conventional fine-tuning. Considering the emerging privacy concerns with language models, we initiate the study of privacy leakage in the setting of prompt-tuning. We first describe a real-world email service pipeline to provide customized output for various users via prompt-tuning. Then we propose a novel privacy attack framework to infer users' private information by exploiting the prompt module with user-specific signals. We conduct a comprehensive privacy evaluation on the target pipeline to demonstrate the potential leakage from prompt-tuning. The results also demonstrate the effectiveness of the proposed attack.

Can we learn better with hard samples?

  • Authors: Subin Sahayam, John Zakkam, Umarani Jayaraman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.03486
  • Pdf link: https://arxiv.org/pdf/2304.03486
  • Abstract
In deep learning, mini-batch training is commonly used to optimize network parameters. However, the traditional mini-batch method may not learn the under-represented samples and complex patterns in the data, leading to a longer time for generalization. To address this problem, a variant of the traditional algorithm has been proposed, which trains the network focusing on mini-batches with high loss. The study evaluates the effectiveness of the proposed training using various deep neural networks trained on three benchmark datasets (CIFAR-10, CIFAR-100, and STL-10). The deep neural networks used in the study are ResNet-18, ResNet-50, EfficientNet-B4, EfficientNetV2-S, and MobileNetV3-S. The experimental results show that the proposed method can significantly improve the test accuracy and speed up convergence compared to the traditional mini-batch training method. Furthermore, we introduce a hyper-parameter delta ({\delta}) that decides how many mini-batches are considered for training. Experiments on various values of {\delta} found that the proposed method with smaller {\delta} values generally yields similar test accuracy and faster generalization. We show that the proposed method generalizes in 26.47% fewer epochs than the traditional mini-batch method with EfficientNet-B4 on STL-10. The proposed method also improves the test top-1 accuracy by 7.26% with ResNet-18 on CIFAR-100.
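
A hedged sketch of the loss-ranked selection the abstract describes: per epoch, score every mini-batch by its loss and back-propagate only on the highest-loss fraction delta. Function names and the two-pass structure are illustrative, not the authors' code:

```python
import torch

def train_epoch_hard_batches(model, loader, optimizer, loss_fn, delta=0.5):
    # 1) score every mini-batch without gradients
    scored = []
    model.eval()
    with torch.no_grad():
        for xb, yb in loader:
            scored.append((loss_fn(model(xb), yb).item(), (xb, yb)))
    # 2) keep the delta fraction with the highest loss
    scored.sort(key=lambda pair: pair[0], reverse=True)
    keep = scored[: max(1, int(delta * len(scored)))]
    # 3) train only on the retained hard mini-batches
    model.train()
    for _, (xb, yb) in keep:
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
```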

Continuous Input Embedding Size Search For Recommender Systems

  • Authors: Yunke Qu, Tong Chen, Xiangyu Zhao, Lizhen Cui, Kai Zheng, Hongzhi Yin
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.03501
  • Pdf link: https://arxiv.org/pdf/2304.03501
  • Abstract
    Latent factor models are the most popular backbones for today's recommender systems owing to their prominent performance. Latent factor models represent users and items as real-valued embedding vectors for pairwise similarity computation, and all embeddings are traditionally restricted to a uniform size that is relatively large (e.g., 256-dimensional). With the exponentially expanding user base and item catalog in contemporary e-commerce, this design is admittedly becoming memory-inefficient. To facilitate lightweight recommendation, reinforcement learning (RL) has recently opened up opportunities for identifying varying embedding sizes for different users/items. However, challenged by search efficiency and learning an optimal RL policy, existing RL-based methods are restricted to highly discrete, predefined embedding size choices. This leads to a largely overlooked potential of introducing finer granularity into embedding sizes to obtain better recommendation effectiveness under a given memory budget. In this paper, we propose continuous input embedding size search (CIESS), a novel RL-based method that operates on a continuous search space with arbitrary embedding sizes to choose from. In CIESS, we further present an innovative random walk-based exploration strategy to allow the RL policy to efficiently explore more candidate embedding sizes and converge to a better decision. CIESS is also model-agnostic and hence generalizable to a variety of latent factor RSs, whilst experiments on two real-world datasets have shown state-of-the-art performance of CIESS under different memory budgets when paired with three popular recommendation models.
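
A toy sketch of random-walk exploration over a continuous embedding-size space, the mechanism CIESS uses to escape a fixed grid of size choices; the step size and bounds below are illustrative assumptions:

```python
import random

def random_walk_candidates(current_size: float, n: int = 5, step: float = 8.0,
                           lo: float = 1.0, hi: float = 256.0):
    """Propose nearby embedding sizes by bounded random steps from the current one."""
    return [min(hi, max(lo, current_size + random.uniform(-step, step)))
            for _ in range(n)]

print(random_walk_candidates(64.0))  # candidate sizes for the RL policy to evaluate
```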

Generative Recommendation: Towards Next-generation Recommender Paradigm

  • Authors: Wenjie Wang, Xinyu Lin, Fuli Feng, Xiangnan He, Tat-Seng Chua
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.03516
  • Pdf link: https://arxiv.org/pdf/2304.03516
  • Abstract
    Recommender systems typically retrieve items from an item corpus for personalized recommendations. However, such a retrieval-based recommender paradigm faces two limitations: 1) the human-generated items in the corpus might fail to satisfy the users' diverse information needs, and 2) users usually adjust the recommendations via passive and inefficient feedback such as clicks. Nowadays, AI-Generated Content (AIGC) has revealed significant success across various domains, offering the potential to overcome these limitations: 1) generative AI can produce personalized items to meet users' specific information needs, and 2) the newly emerged ChatGPT significantly facilitates users to express information needs more precisely via natural language instructions. In this light, the boom of AIGC points the way towards the next-generation recommender paradigm with two new objectives: 1) generating personalized content through generative AI, and 2) integrating user instructions to guide content generation. To this end, we propose a novel Generative Recommender paradigm named GeneRec, which adopts an AI generator to personalize content generation and leverages user instructions to acquire users' information needs. Specifically, we pre-process users' instructions and traditional feedback (e.g., clicks) via an instructor to output the generation guidance. Given the guidance, we instantiate the AI generator through an AI editor and an AI creator to repurpose existing items and create new items, respectively. Eventually, GeneRec can perform content retrieval, repurposing, and creation to meet users' information needs. Besides, to ensure the trustworthiness of the generated items, we emphasize various fidelity checks such as authenticity and legality checks. Lastly, we study the feasibility of implementing the AI editor and AI creator on micro-video generation, showing promising results.

From Retrieval to Generation: Efficient and Effective Entity Set Expansion

  • Authors: Shulin Huang, Shirong Ma, Yangning Li, Yinghui Li, Hai-Tao Zheng, Yong Jiang
  • Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.03531
  • Pdf link: https://arxiv.org/pdf/2304.03531
  • Abstract
Entity Set Expansion (ESE) is a critical task aiming to expand entities of the target semantic class described by a small seed entity set. Most existing ESE methods are retrieval-based frameworks that need to extract the contextual features of entities and calculate the similarity between seed entities and candidate entities. To achieve these two purposes, they must iteratively traverse the corpus and the entity vocabulary provided in the datasets, resulting in poor efficiency and scalability. The experimental results indicate that the time consumed by retrieval-based ESE methods increases linearly with entity vocabulary and corpus size. In this paper, we first propose a generative ESE framework, Generative Entity Set Expansion (GenExpan), which utilizes a generative pre-trained language model to accomplish the ESE task. Specifically, a prefix tree is employed to guarantee the validity of entity generation, and automatically generated class names are adopted to guide the model to generate target entities. Moreover, we propose Knowledge Calibration and Generative Ranking to further bridge the gap between the generic knowledge of the language model and the goal of the ESE task. Experiments on publicly available datasets show that GenExpan is efficient and effective. For efficiency, the expansion time consumed by GenExpan is independent of entity vocabulary and corpus size, and GenExpan achieves an average 600% speedup compared to strong baselines. For expansion performance, our framework outperforms previous state-of-the-art ESE methods.
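
The prefix-tree constraint is the load-bearing trick here, so a small illustration may help: at each decoding step, the next-token distribution is masked so only continuations of valid entity names survive. The trie contents and tokenization below are toy assumptions, not GenExpan's vocabulary:

```python
def build_trie(entities):
    """Build a nested-dict trie over tokenized entity names."""
    root = {}
    for ent in entities:
        node = root
        for tok in ent:
            node = node.setdefault(tok, {})
        node["<end>"] = {}  # marks a complete entity
    return root

def allowed_next_tokens(trie, prefix):
    """Tokens that keep the generated prefix inside some valid entity."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return set()  # prefix left the trie: no valid continuation
        node = node[tok]
    return set(node.keys())

trie = build_trie([["new", "york"], ["new", "jersey"], ["texas"]])
print(allowed_next_tokens(trie, ["new"]))  # {'york', 'jersey'}
```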

A Mixer Layer is Worth One Graph Convolution: Unifying MLP-Mixers and GCNs for Human Motion Prediction

  • Authors: Xinshun Wang, Shen Zhao, Chen Chen, Mengyuan Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03532
  • Pdf link: https://arxiv.org/pdf/2304.03532
  • Abstract
The past few years have witnessed the dominance of Graph Convolutional Networks (GCNs) in human motion prediction, while their performance is still far from satisfactory. Recently, MLP-Mixers have shown competitive results while being more efficient and simpler. To extract features, GCNs typically follow an aggregate-and-update paradigm, while Mixers rely on token-mixing and channel-mixing operations. The two research paths have been independently established in the community. In this paper, we develop a novel perspective by unifying Mixers and GCNs. We show that a mixer layer can be seen as a graph convolutional layer applied to a fully-connected graph with parameterized adjacency. Extending this theoretical finding to the practical side, we propose Meta-Mixing Network (M$^2$-Net). Assisted with a novel zero-aggregation operation, our network is capable of capturing both the structure-agnostic and the structure-sensitive dependencies in a collaborative manner. Not only is it computationally efficient, but most importantly, it also achieves state-of-the-art performance. An extensive evaluation on the Human3.6M, AMASS, and 3DPW datasets shows that M$^2$-Net consistently outperforms all other approaches. We hope our work brings the community one step further towards truly predictable human motion. Our code will be publicly available.
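
The unification admits a compact statement; a hedged paraphrase with $X \in \mathbb{R}^{T \times C}$ holding $T$ tokens (joints):

```latex
\underbrace{Y = M X W}_{\text{token-mixing MLP}}
\qquad\text{vs.}\qquad
\underbrace{Y = \hat{A} X W}_{\text{graph convolution}},
\qquad M \in \mathbb{R}^{T \times T},\quad \hat{A}\ \text{a normalized adjacency}.
```

Identifying $\hat{A}$ with the learned $M$, i.e. a fully-connected graph with parameterized adjacency, makes the two layers coincide, which is the equivalence the paper builds on.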

Applicable Methodologies for the Mass Transfer Phenomenon in Tumble Dryers: A Review

  • Authors: Sajad Salavatidezfouli, Arash Hajisharifi, Michele Girfoglio, Giovanni Stabile, Gianluigi Rozza
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.03533
  • Pdf link: https://arxiv.org/pdf/2304.03533
  • Abstract
Tumble dryers offer a fast and convenient way of drying textiles independent of weather conditions and are therefore frequently used in ordinary households. However, artificial drying of textiles consumes considerable amounts of energy; approximately 8.2 percent of residential electricity consumption in northern European countries goes to drying textiles (Cranston et al., 2019). Several authors have investigated aspects of the clothes drying cycle with experimental and numerical methods to understand and improve the process. The first turning-point study on understanding the physics of evaporation in tumble dryers was presented by Lambert et al. (1991) in the early 90s. With the aid of the Chilton-Colburn analogy, they introduced the concept of an area-mass transfer coefficient to address the evaporation rate. Afterwards, several experimental and numerical studies were published based on this concept, and the model was subsequently developed into 0-dimensional (Deans, 2001) and 1-dimensional (Wei et al., 2017) variants to gain more accuracy. The evaporation rate is considered to be the main system parameter for dryers, from which other performance parameters, including drying time, effectiveness, moisture content, and efficiency, can be estimated. More recent literature has focused on utilizing dimensional analysis or image processing techniques to correlate drying indices with system parameters. However, the validity of these regressed models is machine-specific and hence cannot be generalized yet. All the previous models for estimating the evaporation rate in tumble dryers are discussed. The review of the related literature showed that all of the previous models for the prediction of the evaporation rate in clothes dryers have some limitations in terms of accuracy and applicability.

CRISP: Curriculum inducing Primitive Informed Subgoal Prediction for Hierarchical Reinforcement Learning

  • Authors: Utsav Singh, Vinay P Namboodiri
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03535
  • Pdf link: https://arxiv.org/pdf/2304.03535
  • Abstract
Hierarchical reinforcement learning is a promising approach that uses temporal abstraction to solve complex long-horizon problems. However, simultaneously learning a hierarchy of policies is unstable, as it is challenging to train a higher-level policy when the lower-level primitive is non-stationary. In this paper, we propose a novel hierarchical algorithm that generates a curriculum of achievable subgoals for evolving lower-level primitives using reinforcement learning and imitation learning. The lower-level primitive periodically performs data relabeling on a handful of expert demonstrations using our primitive-informed parsing approach. We provide expressions to bound the sub-optimality of our method and develop a practical algorithm for hierarchical reinforcement learning. Since our approach uses a handful of expert demonstrations, it is suitable for most robotic control tasks. Experimental evaluation on complex maze navigation and robotic manipulation environments shows that inducing hierarchical curriculum learning significantly improves sample efficiency and results in efficient goal-conditioned policies for solving temporally extended tasks.

ChatPipe: Orchestrating Data Preparation Program by Optimizing Human-ChatGPT Interactions

  • Authors: Sibei Chen, Hanbing Liu, Weiting Jin, Xiangyu Sun, Xiaoyao Feng, Ju Fan, Xiaoyong Du, Nan Tang
  • Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03540
  • Pdf link: https://arxiv.org/pdf/2304.03540
  • Abstract
Orchestrating a high-quality data preparation program is essential for successful machine learning (ML), but it is known to be time- and effort-consuming. Despite the impressive capabilities of large language models like ChatGPT in generating programs by interacting with users through natural language prompts, there are still limitations. Specifically, a user must provide specific prompts to iteratively guide ChatGPT in improving data preparation programs, which requires a certain level of expertise in programming, the dataset used, and the ML task. Moreover, once a program has been generated, it is non-trivial to revisit a previous version or make changes to the program without starting the process over again. In this paper, we present ChatPipe, a novel system designed to facilitate seamless interaction between users and ChatGPT. ChatPipe provides users with effective recommendations on the next data preparation operations and guides ChatGPT to generate programs for those operations. Also, ChatPipe enables users to easily roll back to previous versions of the program, which facilitates more efficient experimentation and testing. We have developed a web application for ChatPipe and prepared several real-world ML tasks from Kaggle. These tasks showcase the capabilities of ChatPipe and enable VLDB attendees to easily experiment with our novel features to rapidly orchestrate a high-quality data preparation program.

Towards Automated 3D Search Planning for Emergency Response Missions

  • Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.03570
  • Pdf link: https://arxiv.org/pdf/2304.03570
  • Abstract
The ability to efficiently plan and execute automated and precise search missions using unmanned aerial vehicles (UAVs) during emergency response situations is imperative. Precise navigation between obstacles and time-efficient searching of 3D structures and buildings are essential for locating survivors and people in need in emergency response missions. In this work we address this challenging problem by proposing a unified search planning framework that automates the process of UAV-based search planning in 3D environments. Specifically, we propose a novel search planning framework which enables automated planning and execution of collision-free search trajectories in 3D by taking into account low-level mission constraints (e.g., the UAV dynamical and sensing model), mission objectives (e.g., the mission execution time and the UAV energy efficiency) and user-defined mission specifications (e.g., the 3D structures to be searched and minimum detection probability constraints). The capabilities and performance of the proposed approach are demonstrated through extensive simulated 3D search scenarios.

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

  • Authors: Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.03589
  • Pdf link: https://arxiv.org/pdf/2304.03589
  • Abstract
The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech. The use of large-scale models trained on vast amounts of data holds immense promise for practical applications, enhancing industrial productivity and facilitating social development. With the increasing demands on computational capacity, though numerous studies have explored efficient training, a comprehensive summary of acceleration techniques for training deep learning models is still much anticipated. In this survey, we present a detailed review of training acceleration. We consider the fundamental update formulation and split its basic components into five main perspectives: (1) data-centric: including dataset regularization, data sampling, and data-centric curriculum learning techniques, which can significantly reduce the computational complexity of the data samples; (2) model-centric: including acceleration of basic modules, compression training, model initialization, and model-centric curriculum learning techniques, which focus on accelerating training by reducing the calculations on parameters; (3) optimization-centric: including the selection of learning rate, the employment of large batch size, the design of efficient objectives, and model averaging techniques, which pay attention to the training policy and improve the generality of large-scale models; (4) budgeted training: including some distinctive acceleration methods for resource-constrained situations; (5) system-centric: including some efficient open-source distributed libraries/systems which provide adequate hardware support for the implementation of the acceleration algorithms. By presenting this comprehensive taxonomy, our survey offers a review that clarifies the general mechanisms within each component and their joint interaction.

ALIKED: A Lighter Keypoint and Descriptor Extraction Network via Deformable Transformation

  • Authors: Xiaoming Zhao, Xingming Wu, Weihai Chen, Peter C. Y. Chen, Qingsong Xu, Zhengguo Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03608
  • Pdf link: https://arxiv.org/pdf/2304.03608
  • Abstract
    Image keypoints and descriptors play a crucial role in many visual measurement tasks. In recent years, deep neural networks have been widely used to improve the performance of keypoint and descriptor extraction. However, the conventional convolution operations do not provide the geometric invariance required for the descriptor. To address this issue, we propose the Sparse Deformable Descriptor Head (SDDH), which learns the deformable positions of supporting features for each keypoint and constructs deformable descriptors. Furthermore, SDDH extracts descriptors at sparse keypoints instead of a dense descriptor map, which enables efficient extraction of descriptors with strong expressiveness. In addition, we relax the neural reprojection error (NRE) loss from dense to sparse to train the extracted sparse descriptors. Experimental results show that the proposed network is both efficient and powerful in various visual measurement tasks, including image matching, 3D reconstruction, and visual relocalization.

QUBO Model for the Closest Vector Problem

  • Authors: Eduardo Canale, Claudio Qureshi, Alfredo Viola
  • Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2304.03616
  • Pdf link: https://arxiv.org/pdf/2304.03616
  • Abstract
In this paper we consider the closest vector problem (CVP) for lattices $\Lambda \subseteq \mathbb{Z}^n$ given by a generator matrix $A\in \mathcal{M}_{n\times n}(\mathbb{Z})$. Let $b>0$ be the maximum of the absolute values of the entries of the matrix $A$. We prove that the CVP can be reduced in polynomial time to a quadratic unconstrained binary optimization (QUBO) problem in $O(n^2(\log(n)+\log(b)))$ binary variables, where the length of the coefficients in the corresponding quadratic form is $O(n(\log(n)+\log(b)))$.
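
To make the reduction concrete, a standard way to binarize the integer variables (a sketch; the paper's exact encoding and bit bounds may differ):

```latex
x_i \;=\; -2^{k} + \sum_{j=0}^{k} 2^{j}\, y_{i,j}, \qquad y_{i,j} \in \{0,1\},
```

so that the CVP objective $\min_{x \in \mathbb{Z}^n} \|Ax - t\|^2$ expands into a quadratic form over the bits, i.e. a QUBO instance $\min_{y \in \{0,1\}^m} y^{\top} Q\, y + c^{\top} y + \text{const}$, with the variable count $m$ matching the bound stated in the abstract.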

FedDiSC: A Computation-efficient Federated Learning Framework for Power Systems Disturbance and Cyber Attack Discrimination

  • Authors: Muhammad Akbar Husnoo, Adnan Anwar, Haftu Tasew Reda, Nasser Hosseinzadeh, Shama Naz Islam, Abdun Naser Mahmood, Robin Doss
  • Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03640
  • Pdf link: https://arxiv.org/pdf/2304.03640
  • Abstract
With the growing concern about the security and privacy of smart grid systems, cyberattacks on critical power grid components, such as state estimation, have proven to be one of the top-priority cyber-related issues and have received significant attention in recent years. However, cyberattack detection in smart grids now faces new challenges, including privacy preservation and decentralized power zones with strategic data owners. To address these technical bottlenecks, this paper proposes a novel Federated Learning-based privacy-preserving and communication-efficient attack detection framework, known as FedDiSC, that enables Discrimination between power System disturbances and Cyberattacks. Specifically, we first propose a Federated Learning approach to enable Supervisory Control and Data Acquisition subsystems of decentralized power grid zones to collaboratively train an attack detection model without sharing sensitive power-related data. Second, we put forward a representation learning-based Deep Auto-Encoder network to accurately detect power system and cybersecurity anomalies. Finally, to adapt our proposed framework to the timeliness of real-world cyberattack detection in smart grids, we leverage a privacy-preserving gradient quantization scheme known as DP-SIGNSGD to improve its communication efficiency. Extensive simulations of the proposed framework on publicly available Industrial Control Systems datasets demonstrate that the proposed framework can achieve superior detection accuracy while preserving the privacy of sensitive power-grid-related information. Furthermore, we find that the gradient quantization scheme improves communication efficiency by 40% compared to a traditional federated learning approach without gradient quantization, which suggests its suitability for real-world scenarios.
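
A hedged sketch of the sign-based gradient quantization underlying DP-SIGNSGD: each client perturbs its gradient for privacy and transmits only one sign bit per coordinate, and the server aggregates by majority vote. The noise scale is an illustrative assumption:

```python
import torch

def dp_sign_compress(grad: torch.Tensor, noise_std: float = 0.01) -> torch.Tensor:
    """Perturb the gradient, then keep only its sign (1 bit per coordinate)."""
    noisy = grad + noise_std * torch.randn_like(grad)  # privacy-preserving perturbation
    return torch.sign(noisy)

def server_aggregate(client_signs: list) -> torch.Tensor:
    """signSGD-style aggregation: coordinate-wise majority vote across clients."""
    return torch.sign(torch.stack(client_signs).sum(dim=0))
```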

SCART: Simulation of Cyber Attacks for Real-Time

  • Authors: Kfir Girstein, Eliron Rahimi, Prof. Avi Mendelson
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.03657
  • Pdf link: https://arxiv.org/pdf/2304.03657
  • Abstract
Real-time systems are often implemented as reactive systems that respond to stimuli and complete tasks in a known bounded time. The development process of such systems usually involves using a cycle-accurate simulation environment, or even a digital twin system, that can accurately simulate the system and the environment it operates in. In addition, many real-time systems require high reliability and strive to be immune against security attacks. Thus, the development environment must support reliability-related events such as the failure of a sensor, the malfunction of a subsystem, and foreseen cybersecurity attack events. This paper presents the SCART framework - an innovative solution that aims to allow extending simulation environments of real-time systems with the capability to incorporate reliability-related events and advanced cybersecurity attacks, e.g., an attack on a single sensor as well as "complex security attacks" that aim to change the behavior of a group of sensors. We validate our system by applying the proposed environment to a drone's flight control system, including its navigation system, which uses machine learning algorithms. Such a system is very challenging, since it requires many experiments that can hardly be achieved with live systems. We show that using SCART is very efficient, can increase the model's accuracy, and significantly reduces false-positive rates. Some of these experiments were also validated using a set of "real drones".

DATE: Domain Adaptive Product Seeker for E-commerce

  • Authors: Haoyuan Li, Hao Jiang, Tao Jin, Mengyan Li, Yan Chen, Zhijie Lin, Yang Zhao, Zhou Zhao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03669
  • Pdf link: https://arxiv.org/pdf/2304.03669
  • Abstract
    Product Retrieval (PR) and Grounding (PG), aiming to seek image and object-level products respectively according to a textual query, have attracted great interest recently for better shopping experience. Owing to the lack of relevant datasets, we collect two large-scale benchmark datasets from Taobao Mall and Live domains with about 474k and 101k image-query pairs for PR, and manually annotate the object bounding boxes in each image for PG. As annotating boxes is expensive and time-consuming, we attempt to transfer knowledge from annotated domain to unannotated for PG to achieve un-supervised Domain Adaptation (PG-DA). We propose a {\bf D}omain {\bf A}daptive Produc{\bf t} S{\bf e}eker ({\bf DATE}) framework, regarding PR and PG as Product Seeking problem at different levels, to assist the query {\bf date} the product. Concretely, we first design a semantics-aggregated feature extractor for each modality to obtain concentrated and comprehensive features for following efficient retrieval and fine-grained grounding tasks. Then, we present two cooperative seekers to simultaneously search the image for PR and localize the product for PG. Besides, we devise a domain aligner for PG-DA to alleviate uni-modal marginal and multi-modal conditional distribution shift between source and target domains, and design a pseudo box generator to dynamically select reliable instances and generate bounding boxes for further knowledge transfer. Extensive experiments show that our DATE achieves satisfactory performance in fully-supervised PR, PG and un-supervised PG-DA. Our desensitized datasets will be publicly available here\footnote{\url{https://github.com/Taobao-live/Product-Seeking}}.

Sound Dynamic Deadlock Prediction in Linear Time

  • Authors: Umang Mathur, Andreas Pavlogiannis, Hünkar Can Tunç, Mahesh Viswanathan
  • Subjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.03692
  • Pdf link: https://arxiv.org/pdf/2304.03692
  • Abstract
    Deadlocks are one of the most notorious concurrency bugs, and significant research has focused on detecting them efficiently. Dynamic predictive analyses work by observing concurrent executions, and reason about alternative interleavings that can witness concurrency bugs. Such techniques offer scalability and sound bug reports, and have emerged as an effective approach for concurrency bug detection, such as data races. Effective dynamic deadlock prediction, however, has proven a challenging task, as no deadlock predictor currently meets the requirements of soundness, high-precision, and efficiency. In this paper, we first formally establish that this tradeoff is unavoidable, by showing that (a) sound and complete deadlock prediction is intractable, in general, and (b) even the seemingly simpler task of determining the presence of potential deadlocks, which often serve as unsound witnesses for actual predictable deadlocks, is intractable. The main contribution of this work is a new class of predictable deadlocks, called sync(hronization)-preserving deadlocks. Informally, these are deadlocks that can be predicted by reordering the observed execution while preserving the relative order of conflicting critical sections. We present two algorithms for sound deadlock prediction based on this notion. Our first algorithm SyncPDOffline detects all sync-preserving deadlocks, with running time that is linear per abstract deadlock pattern, a novel notion also introduced in this work. Our second algorithm SyncPDOnline predicts all sync-preserving deadlocks that involve two threads in a strictly online fashion, runs in overall linear time, and is better suited for a runtime monitoring setting. We implemented both our algorithms and evaluated their ability to perform offline and online deadlock-prediction on a large dataset of standard benchmarks.

On the Importance of Contrastive Loss in Multimodal Learning

  • Authors: Yunwei Ren, Yuanzhi Li
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03717
  • Pdf link: https://arxiv.org/pdf/2304.03717
  • Abstract
Recently, contrastive learning approaches (e.g., CLIP (Radford et al., 2021)) have achieved huge success in multimodal learning, where the model tries to minimize the distance between the representations of different views (e.g., an image and its caption) of the same data point while keeping the representations of different data points away from each other. However, from a theoretical perspective, it is unclear how contrastive learning can learn the representations from different views efficiently, especially when the data is not isotropic. In this work, we analyze the training dynamics of a simple multimodal contrastive learning model and show that contrastive pairs are important for the model to efficiently balance the learned representations. In particular, we show that the positive pairs will drive the model to align the representations at the cost of increasing the condition number, while the negative pairs will reduce the condition number, keeping the learned representations balanced.
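
For reference, the standard CLIP-style symmetric contrastive (InfoNCE) loss, showing the positive/negative pair mechanics the abstract analyzes; shapes and the temperature value are illustrative:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature  # B x B similarity matrix
    labels = torch.arange(img.size(0), device=img.device)  # diagonal = positive pairs
    # positives (diagonal) are pulled together; off-diagonal negatives are pushed
    # apart, which the abstract argues keeps the learned representations balanced
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))
```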

Keyword: faster

Hardware-Aware Static Optimization of Hyperdimensional Computations

  • Authors: Pu Yi, Sara Achour
  • Subjects: Programming Languages (cs.PL); Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2304.03335
  • Pdf link: https://arxiv.org/pdf/2304.03335
  • Abstract
Hyperdimensional (HD) computing is a highly error-resilient computational paradigm that can be used to efficiently perform language classification, data retrieval, and analogical reasoning tasks on error-prone emerging hardware technologies. HD computation is storage-inefficient and often requires computing over 10,000-dimensional bit vectors. Prior work either leaves hypervectors unoptimized or dynamically tunes HD computation parameters (e.g., hypervector dimension) to deliver the desired accuracy. These approaches are time-consuming, lack accuracy guarantees, and do not generalize well. We present Heim, a framework for statically optimizing HD computation parameters to minimize resource usage in the presence of hardware error. Heim guarantees the optimized computation satisfies a user-provided target accuracy. Heim deploys a novel analysis procedure that unifies theoretical results in HD computing to systematically optimize HD computation. We develop four analysis-amenable data structures that leverage Heim to perform aggressive space-saving optimizations, and optimize these data structures to attain 99% query accuracy on both binary memory and multiple-bit-per-cell resistive memory. Heim-optimized data structures deliver 1.31x-14.51x reductions in hypervector size and 2.191x-27.27x reductions in memory usage while attaining 98.96-99.75% accuracy. Heim-optimized data structures deliver up to 41.40% accuracy improvements over dynamically tuned parameters. Heim computes parameters significantly faster than dynamic approaches.

TopNet: Transformer-based Object Placement Network for Image Compositing

  • Authors: Sijie Zhu, Zhe Lin, Scott Cohen, Jason Kuen, Zhifei Zhang, Chen Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03372
  • Pdf link: https://arxiv.org/pdf/2304.03372
  • Abstract
    We investigate the problem of automatically placing an object into a background image for image compositing. Given a background image and a segmented object, the goal is to train a model to predict plausible placements (location and scale) of the object for compositing. The quality of the composite image highly depends on the predicted location/scale. Existing works either generate candidate bounding boxes or apply sliding-window search using global representations from background and object images, which fail to model local information in background images. However, local clues in background images are important to determine the compatibility of placing the objects with certain locations/scales. In this paper, we propose to learn the correlation between object features and all local background features with a transformer module so that detailed information can be provided on all possible location/scale configurations. A sparse contrastive loss is further proposed to train our model with sparse supervision. Our new formulation generates a 3D heatmap indicating the plausibility of all location/scale combinations in one network forward pass, which is over 10 times faster than the previous sliding-window method. It also supports interactive search when users provide a pre-defined location or scale. The proposed method can be trained with explicit annotation or in a self-supervised manner using an off-the-shelf inpainting model, and it outperforms state-of-the-art methods significantly. The user study shows that the trained model generalizes well to real-world images with diverse challenging scenes and object categories.

Scalable Causal Discovery with Score Matching

  • Authors: Francesco Montagna, Nicoletta Noceti, Lorenzo Rosasco, Kun Zhang, Francesco Locatello
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.03382
  • Pdf link: https://arxiv.org/pdf/2304.03382
  • Abstract
    This paper demonstrates how to discover the whole causal graph from the second derivative of the log-likelihood in observational non-linear additive Gaussian noise models. Leveraging scalable machine learning approaches to approximate the score function $\nabla \log p(\mathbf{X})$, we extend the work of Rolland et al. (2022) that only recovers the topological order from the score and requires an expensive pruning step removing spurious edges among those admitted by the ordering. Our analysis leads to DAS (acronym for Discovery At Scale), a practical algorithm that reduces the complexity of the pruning by a factor proportional to the graph size. In practice, DAS achieves competitive accuracy with current state-of-the-art while being over an order of magnitude faster. Overall, our approach enables principled and scalable causal discovery, significantly lowering the compute bar.

InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

  • Authors: Jing Shi, Wei Xiong, Zhe Lin, Hyun Joon Jung
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03411
  • Pdf link: https://arxiv.org/pdf/2304.03411
  • Abstract
    Recent advances in personalized image generation allow a pre-trained text-to-image model to learn a new concept from a set of images. However, existing personalization approaches usually require heavy test-time finetuning for each concept, which is time-consuming and difficult to scale. We propose InstantBooth, a novel approach built upon pre-trained text-to-image models that enables instant text-guided image personalization without any test-time finetuning. We achieve this with several major components. First, we learn the general concept of the input images by converting them to a textual token with a learnable image encoder. Second, to keep the fine details of the identity, we learn rich visual feature representation by introducing a few adapter layers to the pre-trained model. We train our components only on text-image pairs without using paired images of the same concept. Compared to test-time finetuning-based methods like DreamBooth and Textual-Inversion, our model can generate competitive results on unseen concepts concerning language-image alignment, image fidelity, and identity preservation while being 100 times faster.

Convex Minimization with Integer Minima in $\widetilde O(n^4)$ Time

  • Authors: Haotian Jiang, Yin Tat Lee, Zhao Song, Lichen Zhang
  • Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.03426
  • Pdf link: https://arxiv.org/pdf/2304.03426
  • Abstract
    Given a convex function $f$ on $\mathbb{R}^n$ with an integer minimizer, we show how to find an exact minimizer of $f$ using $O(n^2 \log n)$ calls to a separation oracle and $O(n^4 \log n)$ time. The previous best polynomial time algorithm for this problem given in [Jiang, SODA 2021, JACM 2022] achieves $\widetilde{O}(n^2)$ oracle complexity. However, the overall runtime of Jiang's algorithm is at least $\widetilde{\Omega}(n^8)$, due to expensive sub-routines such as the Lenstra-Lenstra-Lovász (LLL) algorithm [Lenstra, Lenstra, Lovász, Math. Ann. 1982] and the random walk based cutting plane method [Bertsimas, Vempala, JACM 2004]. Our significant speedup is obtained by a nontrivial combination of a faster version of the LLL algorithm due to [Neumaier, Stehlé, ISSAC 2016] that gives similar guarantees, the volumetric center cutting plane method (CPM) by [Vaidya, FOCS 1989] and its fast implementation given in [Jiang, Lee, Song, Wong, STOC 2020]. For the special case of submodular function minimization (SFM), our result implies a strongly polynomial time algorithm for this problem using $O(n^3 \log n)$ calls to an evaluation oracle and $O(n^4 \log n)$ additional arithmetic operations. Both the oracle complexity and the number of arithmetic operations of our more general algorithm are better than the previous best-known algorithms for this specific problem given in [Lee, Sidford, Wong, FOCS 2015] and [Dadush, Végh, Zambelli, SODA 2018, MOR 2021].

Can we learn better with hard samples?

  • Authors: Subin Sahayam, John Zakkam, Umarani Jayaraman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.03486
  • Pdf link: https://arxiv.org/pdf/2304.03486
  • Abstract
    In deep learning, mini-batch training is commonly used to optimize network parameters. However, the traditional mini-batch method may not learn the under-represented samples and complex patterns in the data, leading to a longer time for generalization. To address this problem, a variant of the traditional algorithm has been proposed, which trains the network focusing on mini-batches with high loss. The study evaluates the effectiveness of the proposed training using various deep neural networks trained on three benchmark datasets (CIFAR-10, CIFAR-100, and STL-10). The deep neural networks used in the study are ResNet-18, ResNet-50, EfficientNet-B4, EfficientNetV2-S, and MobileNetV3-S. The experimental results showed that the proposed method can significantly improve the test accuracy and speed up the convergence compared to the traditional mini-batch training method. Furthermore, we introduce a hyper-parameter delta ($\delta$) that decides how many mini-batches are considered for training. Experiments on various values of $\delta$ found that smaller $\delta$ values generally yield similar test accuracy and faster generalization. We show that the proposed method generalizes in 26.47% fewer epochs than the traditional mini-batch method in EfficientNet-B4 on STL-10. The proposed method also improves the test top-1 accuracy by 7.26% in ResNet-18 on CIFAR-100.
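
A minimal sketch of the idea as described in the abstract: score all mini-batches by loss, then update only on the highest-loss fraction $\delta$ of them. The scoring schedule and toy data are our simplifications, not the authors' code.

```python
# Loss-ranked mini-batch training: update only on the hardest batches.
import torch
import torch.nn as nn

model = nn.Linear(20, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
batches = [(torch.randn(32, 20), torch.randint(0, 2, (32,))) for _ in range(50)]
delta = 0.25   # fraction of mini-batches kept for training (the hyper-parameter)

for epoch in range(5):
    with torch.no_grad():                       # cheap scoring pass
        losses = [loss_fn(model(x), y).item() for x, y in batches]
    k = max(1, int(delta * len(batches)))
    hard = sorted(range(len(batches)), key=lambda i: losses[i], reverse=True)[:k]
    for i in hard:                              # train only on high-loss batches
        x, y = batches[i]
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
```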

Pallet Detection from Synthetic Data Using Game Engines

  • Authors: Jouveer Naidoo, Nicholas Bates, Trevor Gee, Mahla Nejati
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.03602
  • Pdf link: https://arxiv.org/pdf/2304.03602
  • Abstract
    This research sets out to assess the viability of using game engines to generate synthetic training data for machine learning in the context of pallet segmentation. Using synthetic data has been proven in prior research to be a viable means of training neural networks and saves hours of manual labour due to the reduced need for manual image annotation. Machine vision for pallet detection can benefit from synthetic data as the industry increases the development of autonomous warehousing technologies. As per our methodology, we developed a tool capable of automatically generating large amounts of annotated training data from 3D models at pixel-perfect accuracy and a much faster rate than manual approaches. Regarding image segmentation, a Mask R-CNN pipeline was used, which achieved an AP50 of 86% for individual pallets.

Keyword: mobile

Exploring Collaborative Distributed Diffusion-Based AI-Generated Content (AIGC) in Wireless Networks

  • Authors: Hongyang Du, Ruichen Zhang, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Xuemin (Sherman)Shen, H. Vincent Poor
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.03446
  • Pdf link: https://arxiv.org/pdf/2304.03446
  • Abstract
    Driven by advances in generative artificial intelligence (AI) techniques and algorithms, AI-generated content (AIGC) has seen widespread adoption, allowing for the generation of diverse and high-quality content. In particular, diffusion model-based AIGC techniques have been widely used to generate content in a variety of modalities. However, the real-world implementation of AIGC models, particularly on resource-constrained devices such as mobile phones, introduces significant challenges related to energy consumption and privacy concerns. To further promote the realization of ubiquitous AIGC services, we propose a novel collaborative distributed diffusion-based AIGC framework. By capitalizing on collaboration among devices in wireless networks, the proposed framework facilitates the efficient execution of AIGC tasks, optimizing edge computation resource utilization. Furthermore, we examine the practical implementation of the denoising steps on mobile phones, the impact of the proposed approach on the wireless network-aided AIGC landscape, and the future opportunities associated with its real-world integration. The contributions of this paper not only offer a promising solution to the existing limitations of AIGC services but also pave the way for future research in device collaboration, resource optimization, and the seamless delivery of AIGC services across various devices. Our code is available at https://github.com/HongyangDu/DistributedDiffusion.
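
A toy illustration of splitting a reverse-diffusion loop between an edge server and a device, in the spirit of the framework; the denoiser, step rule, and split point below are placeholders (see the linked repository for the actual implementation).

```python
# Toy split of reverse diffusion: server runs the heavy early steps,
# the device finishes locally after receiving the intermediate latent.
import torch

def denoise_step(x, t):
    # Stand-in for one reverse-diffusion step of a trained model.
    return x - 0.05 * x / (t + 1)

def run_steps(x, steps):
    for t in steps:
        x = denoise_step(x, t)
    return x

T, split = 50, 40
x = torch.randn(1, 3, 64, 64)                   # initial noise
x = run_steps(x, range(T - 1, split - 1, -1))   # steps T-1..split on the server
# ... transmit intermediate latent x over the wireless link ...
x = run_steps(x, range(split - 1, -1, -1))      # remaining steps on the device
```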

Can we learn better with hard samples?

  • Authors: Subin Sahayam, John Zakkam, Umarani Jayaraman
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.03486
  • Pdf link: https://arxiv.org/pdf/2304.03486
  • Abstract
    In deep learning, mini-batch training is commonly used to optimize network parameters. However, the traditional mini-batch method may not learn the under-represented samples and complex patterns in the data, leading to a longer time for generalization. To address this problem, a variant of the traditional algorithm has been proposed, which trains the network focusing on mini-batches with high loss. The study evaluates the effectiveness of the proposed training using various deep neural networks trained on three benchmark datasets (CIFAR-10, CIFAR-100, and STL-10). The deep neural networks used in the study are ResNet-18, ResNet-50, EfficientNet-B4, EfficientNetV2-S, and MobileNetV3-S. The experimental results showed that the proposed method can significantly improve the test accuracy and speed up the convergence compared to the traditional mini-batch training method. Furthermore, we introduce a hyper-parameter delta ($\delta$) that decides how many mini-batches are considered for training. Experiments on various values of $\delta$ found that smaller $\delta$ values generally yield similar test accuracy and faster generalization. We show that the proposed method generalizes in 26.47% fewer epochs than the traditional mini-batch method in EfficientNet-B4 on STL-10. The proposed method also improves the test top-1 accuracy by 7.26% in ResNet-18 on CIFAR-100.

Cell-Edge Performance Booster in 6G: Cell-Free Massive MIMO vs. Reconfigurable Intelligent Surface

  • Authors: Wei Jiang, Hans D. Schotten
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.03594
  • Pdf link: https://arxiv.org/pdf/2304.03594
  • Abstract
    User experience in mobile communications is vulnerable to degraded quality at the cell edge, which, according to the principle of risk aversion in behavioral economics, cannot be compensated for by excellent service at the cell center. Constrained by weak signal strength and substantial inter-cell interference, the cell edge is always a major bottleneck of any mobile network. Due to their potential for empowering the next-generation mobile system, reconfigurable intelligent surface (RIS) and cell-free massive MIMO (CFmMIMO) have recently attracted considerable attention from academia and industry. In addition to a variety of technological advantages, both have great potential to boost cell-edge performance. To the authors' best knowledge, a performance comparison of RIS and CFmMIMO, especially on the cell edge, is still missing in the literature. To fill this gap, this paper establishes a fair scenario and demonstrates extensive numerical results to clarify their behaviors at the cell edge.

RSPT: Reconstruct Surroundings and Predict Trajectories for Generalizable Active Object Tracking

  • Authors: Fangwei Zhong, Xiao Bi, Yudi Zhang, Wei Zhang, Yizhou Wang
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03623
  • Pdf link: https://arxiv.org/pdf/2304.03623
  • Abstract
    Active Object Tracking (AOT) aims to maintain a specific relation between the tracker and object(s) by autonomously controlling the motion system of a tracker given observations. AOT has wide-ranging applications, such as in mobile robots and autonomous driving. However, building a generalizable active tracker that works robustly across different scenarios remains a challenge, especially in unstructured environments with cluttered obstacles and diverse layouts. We argue that constructing a state representation capable of modeling the geometry structure of the surroundings and the dynamics of the target is crucial for achieving this goal. To address this challenge, we present RSPT, a framework that forms a structure-aware motion representation by Reconstructing the Surroundings and Predicting the target Trajectory. Additionally, we enhance the generalization of the policy network by training in an asymmetric dueling mechanism. We evaluate RSPT on various simulated scenarios and show that it outperforms existing methods in unseen environments, particularly those with complex obstacles and layouts. We also demonstrate the successful transfer of RSPT to real-world settings. Project Website: https://sites.google.com/view/aot-rspt.

Keyword: pruning

Scalable Causal Discovery with Score Matching

  • Authors: Francesco Montagna, Nicoletta Noceti, Lorenzo Rosasco, Kun Zhang, Francesco Locatello
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.03382
  • Pdf link: https://arxiv.org/pdf/2304.03382
  • Abstract
    This paper demonstrates how to discover the whole causal graph from the second derivative of the log-likelihood in observational non-linear additive Gaussian noise models. Leveraging scalable machine learning approaches to approximate the score function $\nabla \log p(\mathbf{X})$, we extend the work of Rolland et al. (2022) that only recovers the topological order from the score and requires an expensive pruning step removing spurious edges among those admitted by the ordering. Our analysis leads to DAS (acronym for Discovery At Scale), a practical algorithm that reduces the complexity of the pruning by a factor proportional to the graph size. In practice, DAS achieves competitive accuracy with current state-of-the-art while being over an order of magnitude faster. Overall, our approach enables principled and scalable causal discovery, significantly lowering the compute bar.

Clutter Detection and Removal in 3D Scenes with View-Consistent Inpainting

  • Authors: Fangyin Wei, Thomas Funkhouser, Szymon Rusinkiewicz
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03763
  • Pdf link: https://arxiv.org/pdf/2304.03763
  • Abstract
    Removing clutter from scenes is essential in many applications, ranging from privacy-concerned content filtering to data augmentation. In this work, we present an automatic system that removes clutter from 3D scenes and inpaints with coherent geometry and texture. We propose techniques for its two key components: 3D segmentation from shared properties and 3D inpainting, both of which are important problems. The definition of 3D scene clutter (frequently-moving objects) is not well captured by commonly-studied object categories in computer vision. To tackle the lack of well-defined clutter annotations, we group noisy fine-grained labels, leverage virtual rendering, and impose an instance-level area-sensitive loss. Once clutter is removed, we inpaint geometry and texture in the resulting holes by merging inpainted RGB-D images. This requires novel voting and pruning strategies that guarantee multi-view consistency across individually inpainted images for mesh reconstruction. Experiments on the ScanNet and Matterport datasets show that our method outperforms baselines for clutter segmentation and 3D inpainting, both visually and quantitatively.

Keyword: voxel

On the Suitability of Representations for Quality Diversity Optimization of Shapes

  • Authors: Ludovico Scarton, Alexander Hagg
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.03520
  • Pdf link: https://arxiv.org/pdf/2304.03520
  • Abstract
    The representation, or encoding, utilized in evolutionary algorithms has a substantial effect on their performance. Examination of the suitability of widely used representations for quality diversity optimization (QD) in robotic domains has yielded inconsistent results regarding the most appropriate encoding method. Given the domain-dependent nature of QD, additional evidence from other domains is necessary. This study compares the impact of several representations, including direct encoding, a dictionary-based representation, parametric encoding, compositional pattern producing networks, and cellular automata, on the generation of voxelized meshes in an architecture setting. The results reveal that some indirect encodings outperform direct encodings and can generate more diverse solution sets, especially when considering full phenotypic diversity. The paper introduces a multi-encoding QD approach that incorporates all evaluated representations in the same archive. Species of encodings compete on the basis of phenotypic features, leading to an approach that demonstrates similar performance to the best single-encoding QD approach. This is noteworthy, as it does not always require the contribution of the best-performing single encoding.
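
For readers unfamiliar with QD mechanics, the sketch below shows a minimal MAP-Elites-style archive, the canonical QD structure in which solutions compete per phenotypic-feature cell (the multi-encoding approach lets different encodings produce the candidates). The genome, fitness, and feature descriptors here are made up for illustration.

```python
# Minimal MAP-Elites-style loop: keep the fittest solution per feature cell.
import random

random.seed(0)
archive = {}   # feature cell -> (fitness, genome)

def evaluate(genome):
    fitness = -sum((g - 0.5) ** 2 for g in genome)          # toy objective
    features = (round(genome[0], 1), round(genome[1], 1))   # binned 2D descriptor
    return fitness, features

for _ in range(5000):
    if archive and random.random() < 0.9:
        parent = random.choice(list(archive.values()))[1]   # mutate an elite
        genome = [min(1, max(0, g + random.gauss(0, 0.05))) for g in parent]
    else:
        genome = [random.random() for _ in range(4)]        # random restart
    fit, cell = evaluate(genome)
    if cell not in archive or fit > archive[cell][0]:
        archive[cell] = (fit, genome)                       # elitism per cell

print(f"{len(archive)} cells filled")
```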

Keyword: lidar

There is no result

Keyword: diffusion

Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models

  • Authors: Guanhua Zhang, Jiabao Ji, Yang Zhang, Mo Yu, Tommi Jaakkola, Shiyu Chang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03322
  • Pdf link: https://arxiv.org/pdf/2304.03322
  • Abstract
    Image inpainting refers to the task of generating a complete, natural image based on a partially revealed reference image. Recently, much research interest has focused on addressing this problem using fixed diffusion models. These approaches typically directly replace the revealed region of the intermediate or final generated images with that of the reference image or its variants. However, since the unrevealed regions are not directly modified to match the context, this results in incoherence between the revealed and unrevealed regions. To address the incoherence problem, a small number of methods introduce a rigorous Bayesian framework, but they tend to introduce mismatches between the generated and the reference images due to the approximation errors in computing the posterior distributions. In this paper, we propose COPAINT, which can coherently inpaint the whole image without introducing mismatches. COPAINT also uses the Bayesian framework to jointly modify both revealed and unrevealed regions, but approximates the posterior distribution in a way that allows the errors to gradually drop to zero throughout the denoising steps, thus strongly penalizing any mismatches with the reference image. Our experiments verify that COPAINT can outperform the existing diffusion-based methods under both objective and subjective metrics. The codes are available at https://github.com/UCSB-NLP-Chang/CoPaint/.

Training-Free Layout Control with Cross-Attention Guidance

  • Authors: Minghao Chen, Iro Laina, Andrea Vedaldi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03373
  • Pdf link: https://arxiv.org/pdf/2304.03373
  • Abstract
    Recent diffusion-based generators can produce high-quality images based only on textual prompts. However, they do not correctly interpret instructions that specify the spatial layout of the composition. We propose a simple approach that can achieve robust layout control without requiring training or fine-tuning the image generator. Our technique, which we call layout guidance, manipulates the cross-attention layers that the model uses to interface textual and visual information and steers the reconstruction in the desired direction given, e.g., a user-specified layout. In order to determine how to best guide attention, we study the role of different attention maps when generating images and experiment with two alternative strategies, forward and backward guidance. We evaluate our method quantitatively and qualitatively with several experiments, validating its effectiveness. We further demonstrate its versatility by extending layout guidance to the task of editing the layout and context of a given real image.
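
A toy backward-guidance step in the spirit of the abstract: compute a loss that measures how much cross-attention mass for a guided token falls outside a user-given box, then take a gradient step on the latent. The attention stand-in below is fabricated purely for illustration; it is not a diffusion model's real cross-attention.

```python
# One backward-guidance step: nudge the latent so attention for a target
# token concentrates inside the desired layout region.
import torch

latent = torch.randn(1, 16, 16, requires_grad=True)
token_emb = torch.randn(8)            # embedding of the guided text token

mask = torch.zeros(16, 16)
mask[4:12, 4:12] = 1.0                # desired bounding-box region

def cross_attention_map(latent, token_emb):
    # Stand-in for cross-attention: token/feature similarity, normalized
    # over spatial positions.
    feat = latent.squeeze(0).unsqueeze(-1).expand(16, 16, 8)
    logits = (feat * token_emb).sum(-1)
    return torch.softmax(logits.flatten(), 0).view(16, 16)

attn = cross_attention_map(latent, token_emb)
loss = 1.0 - (attn * mask).sum()      # attention mass outside the box
loss.backward()
with torch.no_grad():
    latent -= 5.0 * latent.grad       # one guidance step on the latent
```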

RoSteALS: Robust Steganography using Autoencoder Latent Space

  • Authors: Tu Bui, Shruti Agarwal, Ning Yu, John Collomosse
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03400
  • Pdf link: https://arxiv.org/pdf/2304.03400
  • Abstract
    Data hiding such as steganography and invisible watermarking has important applications in copyright protection, privacy-preserved communication and content provenance. Existing works often fall short in either preserving image quality, or robustness against perturbations or are too complex to train. We propose RoSteALS, a practical steganography technique leveraging frozen pretrained autoencoders to free the payload embedding from learning the distribution of cover images. RoSteALS has a light-weight secret encoder of just 300k parameters, is easy to train, has perfect secret recovery performance and comparable image quality on three benchmarks. Additionally, RoSteALS can be adapted for novel cover-less steganography applications in which the cover image can be sampled from noise or conditioned on text prompts via a denoising diffusion process. Our model and code are available at https://github.com/TuBui/RoSteALS.
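
A conceptual sketch of the latent-offset idea the abstract describes: a tiny secret encoder maps payload bits to an additive offset in a frozen autoencoder's latent space, and a secret decoder recovers the bits from the stego image. Shapes and modules below are illustrative stand-ins, not the released model.

```python
# Steganography via an additive offset in a frozen autoencoder latent.
import torch
import torch.nn as nn

latent_dim, n_bits = 64, 100

frozen_encoder = nn.Linear(3 * 32 * 32, latent_dim).requires_grad_(False)
frozen_decoder = nn.Linear(latent_dim, 3 * 32 * 32).requires_grad_(False)

secret_encoder = nn.Linear(n_bits, latent_dim)    # tiny, trainable
secret_decoder = nn.Linear(3 * 32 * 32, n_bits)   # recovers bits from stego image

cover = torch.rand(4, 3 * 32 * 32)
bits = torch.randint(0, 2, (4, n_bits)).float()

z = frozen_encoder(cover) + secret_encoder(bits)  # embed payload as latent offset
stego = frozen_decoder(z)
logits = secret_decoder(stego)
loss = nn.functional.binary_cross_entropy_with_logits(logits, bits)
loss.backward()   # gradients reach only the secret encoder/decoder
```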

Exploring Collaborative Distributed Diffusion-Based AI-Generated Content (AIGC) in Wireless Networks

  • Authors: Hongyang Du, Ruichen Zhang, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Xuemin (Sherman)Shen, H. Vincent Poor
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.03446
  • Pdf link: https://arxiv.org/pdf/2304.03446
  • Abstract
    Driven by advances in generative artificial intelligence (AI) techniques and algorithms, AI-generated content (AIGC) has seen widespread adoption, allowing for the generation of diverse and high-quality content. In particular, diffusion model-based AIGC techniques have been widely used to generate content in a variety of modalities. However, the real-world implementation of AIGC models, particularly on resource-constrained devices such as mobile phones, introduces significant challenges related to energy consumption and privacy concerns. To further promote the realization of ubiquitous AIGC services, we propose a novel collaborative distributed diffusion-based AIGC framework. By capitalizing on collaboration among devices in wireless networks, the proposed framework facilitates the efficient execution of AIGC tasks, optimizing edge computation resource utilization. Furthermore, we examine the practical implementation of the denoising steps on mobile phones, the impact of the proposed approach on the wireless network-aided AIGC landscape, and the future opportunities associated with its real-world integration. The contributions of this paper not only offer a promising solution to the existing limitations of AIGC services but also pave the way for future research in device collaboration, resource optimization, and the seamless delivery of AIGC services across various devices. Our code is available at https://github.com/HongyangDu/DistributedDiffusion.

Compressed Regression over Adaptive Networks

  • Authors: Marco Carpentiero, Vincenzo Matta, Ali H. Sayed
  • Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Signal Processing (eess.SP); Optimization and Control (math.OC); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.03638
  • Pdf link: https://arxiv.org/pdf/2304.03638
  • Abstract
    In this work we derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem. Agents employ the recently proposed ACTC (adapt-compress-then-combine) diffusion strategy, where the signals exchanged locally by neighboring agents are encoded with randomized differential compression operators. We provide a detailed characterization of the mean-square estimation error, which is shown to comprise a term related to the error that agents would achieve without communication constraints, plus a term arising from compression. The analysis reveals quantitative relationships between the compression loss and fundamental attributes of the distributed regression problem, in particular, the stochastic approximation error caused by the gradient noise and the network topology (through the Perron eigenvector). We show that knowledge of such relationships is critical to allocate optimally the communication resources across the agents, taking into account their individual attributes, such as the quality of their data or their degree of centrality in the network topology. We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents. Illustrative examples show that a significant performance improvement, as compared to a blind (i.e., uniform) resource allocation, can be achieved by optimizing the allocation by means of the provided mean-square-error formulas.
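
A schematic of one ACTC (adapt-compress-then-combine) agent written from the abstract's description: an LMS adapt step, a randomized differential quantizer as the compression operator, and a convex combination with neighbor states. The quantizer, combination weights, and frozen neighbor states are simplifications for brevity, not the paper's setup.

```python
# One agent's adapt-compress-then-combine update on a streaming regression.
import numpy as np

rng = np.random.default_rng(0)

def rand_quantize(v, step=0.1):
    # Unbiased randomized quantizer of the *difference* signal:
    # E[output] = input, a common compression-operator model.
    low = np.floor(v / step) * step
    p = (v - low) / step
    return low + step * (rng.random(v.shape) < p)

d, mu = 5, 0.05
w_true = np.ones(d)
w = rng.normal(size=d)                 # agent's current estimate
q = np.zeros(d)                        # state last shared with neighbors
neighbor_q = [rng.normal(size=d) for _ in range(2)]   # frozen here, for brevity
weights = [0.5, 0.25, 0.25]            # combination weights: [self, n1, n2]

for _ in range(200):
    x = rng.normal(size=d)
    y = x @ w_true + 0.1 * rng.normal()
    psi = w - mu * (w @ x - y) * x     # adapt: LMS step on local data
    q = q + rand_quantize(psi - q)     # compress: differential update to peers
    w = weights[0] * q + sum(a * qk for a, qk in zip(weights[1:], neighbor_q))
```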

Keyword: dynamic

Adaptive Feature Fusion: Enhancing Generalization in Deep Learning Models

  • Authors: Neelesh Mungoli
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.03290
  • Pdf link: https://arxiv.org/pdf/2304.03290
  • Abstract
    In recent years, deep learning models have demonstrated remarkable success in various domains, such as computer vision, natural language processing, and speech recognition. However, the generalization capabilities of these models can be negatively impacted by the limitations of their feature fusion techniques. This paper introduces an innovative approach, Adaptive Feature Fusion (AFF), to enhance the generalization of deep learning models by dynamically adapting the fusion process of feature representations. The proposed AFF framework is designed to incorporate fusion layers into existing deep learning architectures, enabling seamless integration and improved performance. By leveraging a combination of data-driven and model-based fusion strategies, AFF is able to adaptively fuse features based on the underlying data characteristics and model requirements. This paper presents a detailed description of the AFF framework, including the design and implementation of fusion layers for various architectures. Extensive experiments are conducted on multiple benchmark datasets, with the results demonstrating the superiority of the AFF approach in comparison to traditional feature fusion techniques. The analysis showcases the effectiveness of AFF in enhancing generalization capabilities, leading to improved performance across different tasks and applications. Finally, the paper discusses various real-world use cases where AFF can be employed, providing insights into its practical applicability. The conclusion highlights the potential for future research directions, including the exploration of advanced fusion strategies and the extension of AFF to other machine learning paradigms.
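
A minimal fusion layer in the spirit of AFF, written as our own sketch (the exact fusion design is not specified by the abstract): per-sample gating weights computed from the concatenated features decide how much each representation contributes to the fused output.

```python
# Data-driven gated fusion: learned per-sample weights over feature branches.
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, dim, n_branches):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim * n_branches, n_branches),
            nn.Softmax(dim=-1),
        )

    def forward(self, feats):                            # list of (B, dim) tensors
        w = self.gate(torch.cat(feats, dim=-1))          # (B, n_branches)
        stacked = torch.stack(feats, dim=1)              # (B, n_branches, dim)
        return (w.unsqueeze(-1) * stacked).sum(dim=1)    # (B, dim)

f1, f2 = torch.randn(8, 128), torch.randn(8, 128)
fused = AdaptiveFusion(128, 2)([f1, f2])
print(fused.shape)   # torch.Size([8, 128])
```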

Hardware-Aware Static Optimization of Hyperdimensional Computations

  • Authors: Pu Yi, Sara Achour
  • Subjects: Programming Languages (cs.PL); Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2304.03335
  • Pdf link: https://arxiv.org/pdf/2304.03335
  • Abstract
    Hyperdimensional (HD) computing is a highly error-resilient computational paradigm that can be used to efficiently perform language classification, data retrieval, and analogical reasoning tasks on error-prone emerging hardware technologies. HD computation is storage-inefficient and often requires computing over 10,000-dimensional bit vectors. Prior work either leaves hypervectors unoptimized or dynamically tunes HD computation parameters (e.g., hypervector dimension) to deliver the desired accuracy. These approaches are time-consuming, lack accuracy guarantees, and do not generalize well. We present Heim, a framework for statically optimizing HD computation parameters to minimize resource usage in the presence of hardware error. Heim guarantees the optimized computation satisfies a user-provided target accuracy. Heim deploys a novel analysis procedure that unifies theoretical results in HD computing to systematically optimize HD computation. We develop four analysis-amenable data structures that leverage Heim to perform aggressive space-saving optimizations, and optimize these data structures to attain 99% query accuracy on both binary memory and multiple-bit-per-cell resistive memory. Heim-optimized data structures deliver 1.31x-14.51x reductions in hypervector size and 2.191x-27.27x reductions in memory usage while attaining 98.96-99.75% accuracy. Heim-optimized data structures deliver up to 41.40% accuracy improvements over dynamically tuned parameters. Heim computes parameters significantly faster than dynamic approaches.

Spintronic Physical Reservoir for Autonomous Prediction and Long-Term Household Energy Load Forecasting

  • Authors: Walid Al Misba, Harindra S. Mavikumbure, Md Mahadi Rajib, Daniel L. Marino, Victor Cobilean, Milos Manic, Jayasimha Atulasimha
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03343
  • Pdf link: https://arxiv.org/pdf/2304.03343
  • Abstract
    In this study, we have shown autonomous long-term prediction with a spintronic physical reservoir. Due to the short-term memory property of the magnetization dynamics, non-linearity arises in the reservoir states, which can be used for long-term prediction tasks using simple linear regression for online training. During the prediction stage, the output is directly fed back to the input of the reservoir for autonomous prediction. We employ our proposed reservoir for the modeling of chaotic time series such as Mackey-Glass, and of dynamic time-series data such as household building energy loads. Since only the last layer of an RC needs to be trained with linear regression, it is well suited for learning in real time on edge devices. Here we show that a skyrmion-based magnetic tunnel junction can potentially be used as a prototypical RC, but any nanomagnetic magnetic tunnel junction with nonlinear magnetization behavior can implement such an RC. By comparing our spintronic physical RC approach with state-of-the-art energy load forecasting algorithms, such as LSTMs and RNNs, we conclude that the proposed framework achieves high prediction accuracy while requiring low memory and energy, both of which are at a premium in hardware-resource- and power-constrained edge applications. Further, the proposed approach is shown to require very small training datasets while being at least 16X more energy-efficient than the state-of-the-art sequence-to-sequence LSTM for accurate household load predictions.
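
The reservoir-computing recipe in the abstract (fixed nonlinear dynamics, a ridge-regression readout, then closed-loop feedback for autonomous prediction) can be demonstrated with a software echo state network standing in for the spintronic device; everything below (series, sizes, constants) is a toy substitute, not the paper's physical system.

```python
# Echo-state-network stand-in: train a linear readout, then run closed-loop.
import numpy as np

rng = np.random.default_rng(1)
N, T = 200, 1000
u = np.sin(np.arange(T + 1) * 0.1) * np.sin(np.arange(T + 1) * 0.031)  # toy series

W_in = rng.uniform(-0.5, 0.5, N)
W = rng.normal(size=(N, N))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()   # spectral radius < 1

states = np.zeros((T, N))
x = np.zeros(N)
for t in range(T):
    x = np.tanh(W @ x + W_in * u[t])            # fixed reservoir dynamics
    states[t] = x

# Ridge-regression readout: predict the next value from the state.
ridge = 1e-6
A = states.T @ states + ridge * np.eye(N)
w_out = np.linalg.solve(A, states.T @ u[1:T + 1])

# Autonomous mode: feed the prediction back as the next input.
preds, y = [], u[T]
for _ in range(100):
    x = np.tanh(W @ x + W_in * y)
    y = x @ w_out
    preds.append(y)
```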

Robust Decision-Focused Learning for Reward Transfer

  • Authors: Abhishek Sharma, Sonali Parbhoo, Omer Gottesman, Finale Doshi-Velez
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.03365
  • Pdf link: https://arxiv.org/pdf/2304.03365
  • Abstract
    Decision-focused (DF) model-based reinforcement learning has recently been introduced as a powerful algorithm that can focus on learning the MDP dynamics most relevant for obtaining high rewards. While this approach increases the performance of agents by focusing the learning towards optimizing for the reward directly, it does so by learning less accurate dynamics (from an MLE standpoint), and may thus be brittle to changes in the reward function. In this work, we develop the robust decision-focused (RDF) algorithm, which leverages the non-identifiability of DF solutions to learn models that maximize expected returns while remaining robust to changes in the reward function. We demonstrate on a variety of toy examples and healthcare simulators that RDF significantly increases the robustness of DF to changes in the reward function, without decreasing the overall return the agent obtains.

Interpretable statistical representations of neural population dynamics and geometry

  • Authors: Adam Gosztolai, Robert L. Peach, Alexis Arnaudon, Mauricio Barahona, Pierre Vandergheynst
  • Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM)
  • Arxiv link: https://arxiv.org/abs/2304.03376
  • Pdf link: https://arxiv.org/pdf/2304.03376
  • Abstract
    The dynamics of neuron populations during diverse tasks often evolve on low-dimensional manifolds. However, it remains challenging to discern the contributions of geometry and dynamics for encoding relevant behavioural variables. Here, we introduce an unsupervised geometric deep learning framework for representing non-linear dynamical systems based on statistical distributions of local phase portrait features. Our method provides robust geometry-aware or geometry-agnostic representations for the unbiased comparison of dynamics based on measured trajectories. We demonstrate that our statistical representation can generalise across neural network instances to discriminate computational mechanisms, obtain interpretable embeddings of neural dynamics in a primate reaching task with geometric correspondence to hand kinematics, and develop a decoding algorithm with state-of-the-art accuracy. Our results highlight the importance of using the intrinsic manifold structure over temporal information to develop better decoding algorithms and assimilate data across experiments.

EZClone: Improving DNN Model Extraction Attack via Shape Distillation from GPU Execution Profiles

  • Authors: Jonah O'Brien Weiss, Tiago Alves, Sandip Kundu
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.03388
  • Pdf link: https://arxiv.org/pdf/2304.03388
  • Abstract
    Deep Neural Networks (DNNs) have become ubiquitous due to their performance on prediction and classification problems. However, they face a variety of threats as their usage spreads. Model extraction attacks, which steal DNNs, endanger intellectual property, data privacy, and security. Previous research has shown that system-level side-channels can be used to leak the architecture of a victim DNN, exacerbating these risks. We propose two DNN architecture extraction techniques catering to various threat models. The first technique uses a malicious, dynamically linked version of PyTorch to expose a victim DNN architecture through the PyTorch profiler. The second, called EZClone, exploits aggregate (rather than time-series) GPU profiles as a side-channel to predict DNN architecture, employing a simple approach and assuming little adversary capability as compared to previous work. We investigate the effectiveness of EZClone when minimizing the complexity of the attack, when applied to pruned models, and when applied across GPUs. We find that EZClone correctly predicts DNN architectures for the entire set of PyTorch vision architectures with 100% accuracy. No other work has shown this degree of architecture prediction accuracy with the same adversarial constraints or using aggregate side-channel information. Prior work has shown that, once a DNN has been successfully cloned, further attacks such as model evasion or model inversion can be accelerated significantly.

Runtime Variation in Big Data Analytics

  • Authors: Yiwen Zhu, Rathijit Sen, Robert Horton, John Mark Agosta
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.03424
  • Pdf link: https://arxiv.org/pdf/2304.03424
  • Abstract
    The dynamic nature of resource allocation and runtime conditions on Cloud can result in high variability in a job's runtime across multiple iterations, leading to a poor experience. Identifying the sources of such variation and being able to predict and adjust for them is crucial to cloud service providers to design reliable data processing pipelines, provision and allocate resources, adjust pricing services, meet SLOs and debug performance hazards. In this paper, we analyze the runtime variation of millions of production SCOPE jobs on Cosmos, an exabyte-scale internal analytics platform at Microsoft. We propose an innovative 2-step approach to predict job runtime distribution by characterizing typical distribution shapes combined with a classification model with an average accuracy of >96%, outperforming traditional regression models and better capturing long tails. We examine factors such as job plan characteristics and inputs, resource allocation, physical cluster heterogeneity and utilization, and scheduling policies. To the best of our knowledge, this is the first study on predicting categories of runtime distributions for enterprise analytics workloads at scale. Furthermore, we examine how our methods can be used to analyze what-if scenarios, focusing on the impact of resource allocation, scheduling, and physical cluster provisioning decisions on a job's runtime consistency and predictability.

Large-Scale Analysis of New Employee Network Dynamics

  • Authors: Yulin Yu, Longqi Yang, Siân Lindley, Mengting Wan
  • Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2304.03441
  • Pdf link: https://arxiv.org/pdf/2304.03441
  • Abstract
    The COVID-19 pandemic has accelerated digital transformations across industries, but also introduced new challenges into workplaces, including the difficulty of effectively socializing with colleagues when working remotely. This challenge is exacerbated for new employees, who need to develop workplace networks from the outset. In this paper, by analyzing a large-scale telemetry dataset of more than 10,000 Microsoft employees who joined the company in the first three months of 2022, we describe how new employees interact and telecommute with their colleagues during their "onboarding" period. Our results reveal that although new hires gradually expand their networks over time, there still exist significant gaps between their network statistics and those of tenured employees, even after the six-month onboarding phase. We also observe heterogeneity among new employees in how their networks change over time, where employees whose job tasks do not necessarily require extensive and diverse connections could be at a disadvantage in this onboarding process. By investigating how web-based people recommendations in the organizational knowledge base help new employees naturally expand their networks, we also demonstrate the potential of web-based applications for addressing the aforementioned socialization challenges. Altogether, our findings provide insights into new employee network dynamics in remote and hybrid work environments, which may help guide organizational leaders and web application developers in quantifying and improving the socialization experiences of new employees in digital workplaces.

Generative Agents: Interactive Simulacra of Human Behavior

  • Authors: Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein
  • Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.03442
  • Pdf link: https://arxiv.org/pdf/2304.03442
  • Abstract
    Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.
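
A toy memory-retrieval scorer in the spirit of the described architecture (store experiences, retrieve dynamically to plan). The recency/importance/relevance weighting below is our illustrative reading, not a quote of the paper's exact formula, and all constants are arbitrary.

```python
# Score memories by recency, importance, and relevance; return the top-k.
import math
import numpy as np

rng = np.random.default_rng(0)

class Memory:
    def __init__(self, text, importance, t, emb):
        self.text, self.importance, self.t, self.emb = text, importance, t, emb

def retrieve(memories, query_emb, now, k=3):
    def score(m):
        recency = math.exp(-0.005 * (now - m.t))            # decays with age
        relevance = float(query_emb @ m.emb /
                          (np.linalg.norm(query_emb) * np.linalg.norm(m.emb)))
        return recency + m.importance / 10 + relevance      # equal-weight sum
    return sorted(memories, key=score, reverse=True)[:k]

mems = [Memory(f"event {i}", int(rng.integers(1, 10)), i * 10, rng.normal(size=16))
        for i in range(50)]
top = retrieve(mems, rng.normal(size=16), now=500)
print([m.text for m in top])
```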

Detecting Chinese Fake News on Twitter during the COVID-19 Pandemic

  • Authors: Yongjun Zhang, Sijia Liu, Yi Wang, Xinguang Fan
  • Subjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.03454
  • Pdf link: https://arxiv.org/pdf/2304.03454
  • Abstract
    The outbreak of COVID-19 has led to a global surge of Sinophobia partly because of the spread of misinformation, disinformation, and fake news on China. In this paper, we report on the creation of a novel classifier that detects whether Chinese-language social media posts from Twitter are related to fake news about China. The classifier achieves an F1 score of 0.64 and an accuracy rate of 93%. We provide the final model and a new training dataset with 18,425 tweets for researchers to study fake news in the Chinese language during the COVID-19 pandemic. We also introduce a new dataset generated by our classifier that tracks the dynamics of fake news in the Chinese language during the early pandemic.

UniSeg: A Prompt-driven Universal Segmentation Model as well as A Strong Representation Learner

  • Authors: Yiwen Ye, Yutong Xie, Jianpeng Zhang, Ziyang Chen, Yong Xia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03493
  • Pdf link: https://arxiv.org/pdf/2304.03493
  • Abstract
    The universal model emerges as a promising trend for medical image segmentation, paving the way to build a medical imaging large model (MILM). One popular strategy for building universal models is to encode each task as a one-hot vector and generate dynamic convolutional layers at the end of the decoder to extract the target of interest. Although successful, this ignores the correlations among tasks and makes the model 'aware' of the ongoing task too late. To address both issues, we propose a prompt-driven Universal Segmentation model (UniSeg) for multi-task medical image segmentation using diverse modalities and domains. We first devise a learnable universal prompt to describe the correlations among all tasks and then convert this prompt and image features into a task-specific prompt, which is fed to the decoder as a part of its input. Thus, we make the model 'aware' of the ongoing task early and boost the task-specific training of the whole decoder. Our results indicate that the proposed UniSeg outperforms other universal models and single-task models on 11 upstream tasks. Moreover, UniSeg also beats other pre-trained models on two downstream datasets, providing the community with a high-quality pre-trained model for 3D medical image segmentation. Code and model are available at https://github.com/yeerwen/UniSeg.

Robust data-driven control for nonlinear systems using the Koopman operator

  • Authors: Robin Strässer, Julian Berberich, Frank Allgöwer
  • Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.03519
  • Pdf link: https://arxiv.org/pdf/2304.03519
  • Abstract
    Data-driven analysis and control of dynamical systems have gained a lot of interest in recent years. While the class of linear systems is well studied, theoretical results for nonlinear systems are still rare. In this paper, we present a data-driven controller design method for discrete-time control-affine nonlinear systems. Our approach relies on the Koopman operator, which is a linear but infinite-dimensional operator lifting the nonlinear system to a higher-dimensional space. Particularly, we derive a linear fractional representation of a lifted bilinear system representation based on measured data. Further, we restrict the lifting to finite dimensions, but account for the truncation error using a finite-gain argument. We derive a linear matrix inequality based design procedure to guarantee robust local stability for the resulting bilinear system for all error terms satisfying the finite-gain bound and, thus, also for the underlying nonlinear system. Finally, we apply the developed design method to the nonlinear Van der Pol oscillator.
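
The lifting idea in the abstract can be illustrated with extended dynamic mode decomposition (EDMD), the standard finite-dimensional Koopman approximation; this is background to the paper's approach, not its robust LMI-based design procedure. The dictionary and discretization below are our choices, using the Van der Pol oscillator the paper also studies.

```python
# EDMD: lift Van der Pol states with monomials, fit a linear lifted operator.
import numpy as np

def vdp_step(s, dt=0.01, mu=1.0):
    x, y = s
    return np.array([x + dt * y, y + dt * (mu * (1 - x ** 2) * y - x)])

def lift(s):                      # dictionary: monomials up to degree 2
    x, y = s
    return np.array([1, x, y, x * x, x * y, y * y])

rng = np.random.default_rng(0)
X, Y = [], []
for _ in range(200):              # short trajectories from random initial states
    s = rng.uniform(-2, 2, 2)
    for _ in range(50):
        s_next = vdp_step(s)
        X.append(lift(s))
        Y.append(lift(s_next))
        s = s_next

X, Y = np.array(X), np.array(Y)
K = np.linalg.lstsq(X, Y, rcond=None)[0].T    # lifted linear dynamics: z' ≈ K z

# One-step prediction in lifted space; (x, y) sit in coordinates 1 and 2.
z = lift(np.array([1.0, 0.0]))
print("predicted:", (K @ z)[1:3], "true:", vdp_step(np.array([1.0, 0.0])))
```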

Automated Tuning of Nonlinear Kalman Filters for Optimal Trajectory Tracking Performance of AUVs

  • Authors: Maximilian Nitsch, David Stenger, Dirk Abel
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.03565
  • Pdf link: https://arxiv.org/pdf/2304.03565
  • Abstract
    The performance of navigation algorithms significantly determines the trajectory tracking accuracy of the guidance, navigation, and control (GNC) system of an autonomous underwater vehicle (AUV). In closed-loop operation, the interaction among path planning, control, and navigation plays a crucial role in the tracking accuracy of the overall GNC system. A Doppler velocity log (DVL) is often used for AUVs to measure velocity over the ground, positively affecting the closed-loop tracking error. However, a DVL may not be installed in miniaturized AUVs due to limited space and energy. In this paper, a navigation filter for an underactuated miniature AUV (nanoAUV) is considered that is mainly based on acoustic localization using a novel highly-miniaturized ultra-short baseline (USBL) system and a depth pressure sensor. The nanoAUV is being developed for subglacial lake exploration. We compare two unscented Kalman filters (UKF) with different prediction models - the classical strapdown inertial navigation systems (SINS) model and a hydrodynamic motion model (HMM). To enable a fair comparison, filter parameters are auto-tuned with Bayesian optimization (BO) for open and closed-loop performance, which is novel in AUV navigation. The results indicate that BO performs similarly to particle swarm optimization (PSO) regarding sample efficiency for the proposed problem. To quantify the GNC tracking performance, we use extensive Monte Carlo simulations. Results suggest that with BO-tuned navigation filter parameters, the median tracking error is reduced by up to 50% compared to default parametrization.

Towards Automated 3D Search Planning for Emergency Response Missions

  • Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.03570
  • Pdf link: https://arxiv.org/pdf/2304.03570
  • Abstract
    The ability to efficiently plan and execute automated and precise search missions using unmanned aerial vehicles (UAVs) during emergency response situations is imperative. Precise navigation between obstacles and time-efficient searching of 3D structures and buildings are essential for locating survivors and people in need in emergency response missions. In this work we address this challenging problem by proposing a unified search planning framework that automates the process of UAV-based search planning in 3D environments. Specifically, we propose a novel search planning framework which enables automated planning and execution of collision-free search trajectories in 3D by taking into account low-level mission constraints (e.g., the UAV dynamical and sensing model), mission objectives (e.g., the mission execution time and the UAV energy efficiency) and user-defined mission specifications (e.g., the 3D structures to be searched and minimum detection probability constraints). The capabilities and performance of the proposed approach are demonstrated through extensive simulated 3D search scenarios.

RSPT: Reconstruct Surroundings and Predict Trajectories for Generalizable Active Object Tracking

  • Authors: Fangwei Zhong, Xiao Bi, Yudi Zhang, Wei Zhang, Yizhou Wang
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03623
  • Pdf link: https://arxiv.org/pdf/2304.03623
  • Abstract
    Active Object Tracking (AOT) aims to maintain a specific relation between the tracker and object(s) by autonomously controlling the motion system of a tracker given observations. AOT has wide-ranging applications, such as in mobile robots and autonomous driving. However, building a generalizable active tracker that works robustly across different scenarios remains a challenge, especially in unstructured environments with cluttered obstacles and diverse layouts. We argue that constructing a state representation capable of modeling the geometry structure of the surroundings and the dynamics of the target is crucial for achieving this goal. To address this challenge, we present RSPT, a framework that forms a structure-aware motion representation by Reconstructing the Surroundings and Predicting the target Trajectory. Additionally, we enhance the generalization of the policy network by training in an asymmetric dueling mechanism. We evaluate RSPT on various simulated scenarios and show that it outperforms existing methods in unseen environments, particularly those with complex obstacles and layouts. We also demonstrate the successful transfer of RSPT to real-world settings. Project Website: https://sites.google.com/view/aot-rspt.

DATE: Domain Adaptive Product Seeker for E-commerce

  • Authors: Haoyuan Li, Hao Jiang, Tao Jin, Mengyan Li, Yan Chen, Zhijie Lin, Yang Zhao, Zhou Zhao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03669
  • Pdf link: https://arxiv.org/pdf/2304.03669
  • Abstract
    Product Retrieval (PR) and Grounding (PG), aiming to seek image and object-level products respectively according to a textual query, have attracted great interest recently for better shopping experience. Owing to the lack of relevant datasets, we collect two large-scale benchmark datasets from Taobao Mall and Live domains with about 474k and 101k image-query pairs for PR, and manually annotate the object bounding boxes in each image for PG. As annotating boxes is expensive and time-consuming, we attempt to transfer knowledge from the annotated domain to the unannotated one for PG, to achieve un-supervised Domain Adaptation (PG-DA). We propose a Domain Adaptive Product Seeker (DATE) framework, regarding PR and PG as Product Seeking problems at different levels, to assist the query "date" the product. Concretely, we first design a semantics-aggregated feature extractor for each modality to obtain concentrated and comprehensive features for the following efficient retrieval and fine-grained grounding tasks. Then, we present two cooperative seekers to simultaneously search the image for PR and localize the product for PG. Besides, we devise a domain aligner for PG-DA to alleviate uni-modal marginal and multi-modal conditional distribution shift between source and target domains, and design a pseudo box generator to dynamically select reliable instances and generate bounding boxes for further knowledge transfer. Extensive experiments show that our DATE achieves satisfactory performance in fully-supervised PR, PG and un-supervised PG-DA. Our desensitized datasets will be publicly available at https://github.com/Taobao-live/Product-Seeking.

Sound Dynamic Deadlock Prediction in Linear Time

  • Authors: Umang Mathur, Andreas Pavlogiannis, Hünkar Can Tunç, Mahesh Viswanathan
  • Subjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.03692
  • Pdf link: https://arxiv.org/pdf/2304.03692
  • Abstract
    Deadlocks are one of the most notorious concurrency bugs, and significant research has focused on detecting them efficiently. Dynamic predictive analyses work by observing concurrent executions, and reason about alternative interleavings that can witness concurrency bugs. Such techniques offer scalability and sound bug reports, and have emerged as an effective approach for concurrency bug detection, such as data races. Effective dynamic deadlock prediction, however, has proven a challenging task, as no deadlock predictor currently meets the requirements of soundness, high-precision, and efficiency. In this paper, we first formally establish that this tradeoff is unavoidable, by showing that (a) sound and complete deadlock prediction is intractable, in general, and (b) even the seemingly simpler task of determining the presence of potential deadlocks, which often serve as unsound witnesses for actual predictable deadlocks, is intractable. The main contribution of this work is a new class of predictable deadlocks, called sync(hronization)-preserving deadlocks. Informally, these are deadlocks that can be predicted by reordering the observed execution while preserving the relative order of conflicting critical sections. We present two algorithms for sound deadlock prediction based on this notion. Our first algorithm SyncPDOffline detects all sync-preserving deadlocks, with running time that is linear per abstract deadlock pattern, a novel notion also introduced in this work. Our second algorithm SyncPDOnline predicts all sync-preserving deadlocks that involve two threads in a strictly online fashion, runs in overall linear time, and is better suited for a runtime monitoring setting. We implemented both our algorithms and evaluated their ability to perform offline and online deadlock-prediction on a large dataset of standard benchmarks.
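
For intuition about the "abstract deadlock pattern" notion the abstract mentions, the sketch below implements the classic lock-graph cycle check: a cycle among "lock acquired while holding another lock" edges is the usual necessary (but, as the abstract notes, unsound) witness for a potential deadlock. This is not the SyncPDOffline/SyncPDOnline algorithm, only the baseline pattern check.

```python
# Lock-graph cycle detection over "acquire while holding" trace events.
from collections import defaultdict

# (thread, locks_held_before, lock_acquired) events from an observed trace
events = [
    ("T1", {"a"}, "b"),
    ("T2", {"b"}, "a"),
    ("T3", {"a"}, "c"),
]

graph = defaultdict(set)
for _, held, acq in events:
    for h in held:
        graph[h].add(acq)        # edge h -> acq: acq requested while holding h

def has_cycle(graph):
    state = {}                   # node -> "visiting" | "done"
    def dfs(u):
        state[u] = "visiting"
        for v in graph[u]:
            if state.get(v) == "visiting" or (v not in state and dfs(v)):
                return True
        state[u] = "done"
        return False
    return any(dfs(n) for n in list(graph) if n not in state)

print("potential deadlock pattern:", has_cycle(graph))   # True: a <-> b cycle
```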

Sorta Solving the OPF by Not Solving the OPF: DAE Control Theory and the Price of Realtime Regulation

  • Authors: Muhammad Nadeem, Ahmad F. Taha
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.03699
  • Pdf link: https://arxiv.org/pdf/2304.03699
  • Abstract
    This paper presents a new approach to solve or approximate the AC optimal power flow (ACOPF). By eliminating the need to solve the ACOPF every few minutes, the paper showcases how a realtime feedback controller can be utilized in lieu of ACOPF and its variants. By (i) forming the grid dynamics as a system of differential algebraic equations (DAE) that naturally encode the non-convex power flow constraints, (ii) utilizing advanced DAE-Lyapunov theory, and (iii) designing a feedback controller that captures realtime uncertainty while being uncertainty-unaware, the presented approach demonstrates promise in obtaining solutions that are close to the OPF ones without needing to solve the OPF. The proposed controller responds in realtime to deviations in renewables generation and loads, guaranteeing transient stability, while always yielding feasible solutions of the ACOPF with no constraint violations. As the studied approach herein indeed yields slightly more expensive realtime generator setpoints, the corresponding price of realtime control and regulation is examined. Cost-comparisons with the traditional ACOPF are also showcased -- all via case studies on standard power networks.

Optimal Reads-From Consistency Checking for C11-Style Memory Models

  • Authors: Parosh Aziz Abdulla, Soham Chakraborty, Shankaranarayanan Krishna, Umang Mathur, Andreas Pavlogiannis, Hünkar Can Tunç
  • Subjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.03714
  • Pdf link: https://arxiv.org/pdf/2304.03714
  • Abstract
    Over the years, several memory models have been proposed to capture the subtle concurrency semantics of C/C++. One of the most fundamental problems associated with a memory model M is consistency checking: given an execution X, is X consistent with M? This problem lies at the heart of numerous applications, including specification testing and litmus tests, stateless model checking, and dynamic analyses. As such, it has been explored extensively and its complexity is well-understood for traditional models like SC and TSO. However, less is known for the numerous model variants of C/C++, for which the problem becomes challenging due to the intricacies of their concurrency primitives. In this work we study the problem of consistency checking for popular variants of the C11 memory model, in particular, the RC20 model, its release-acquire (RA) fragment, the strong and weak variants of RA (SRA and WRA), as well as the Relaxed fragment of RC20. Motivated by applications in testing and model checking, we focus on reads-from consistency checking. The input is an execution X specifying a set of events, their program order and their reads-from relation, and the task is to decide the existence of a modification order on the writes of X that makes X consistent in a memory model. We draw a rich complexity landscape for this problem; our results include (i) nearly-linear-time algorithms for certain variants, which improve over prior results, (ii) fine-grained optimality results, as well as (iii) matching upper and lower bounds (NP-hardness) for other variants. To our knowledge, this is the first work to characterize the complexity of consistency checking for C11 memory models. We have implemented our algorithms inside the TruSt model checker and the C11Tester testing tool. Experiments on standard benchmarks show that our new algorithms improve consistency checking, often by a significant margin.

On the Importance of Contrastive Loss in Multimodal Learning

  • Authors: Yunwei Ren, Yuanzhi Li
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.03717
  • Pdf link: https://arxiv.org/pdf/2304.03717
  • Abstract
    Recently, contrastive learning approaches (e.g., CLIP (Radford et al., 2021)) have achieved huge success in multimodal learning, where the model tries to minimize the distance between the representations of different views (e.g., an image and its caption) of the same data point while keeping the representations of different data points away from each other. However, from a theoretical perspective, it is unclear how contrastive learning can learn the representations from different views efficiently, especially when the data is not isotropic. In this work, we analyze the training dynamics of a simple multimodal contrastive learning model and show that contrastive pairs are important for the model to efficiently balance the learned representations. In particular, we show that the positive pairs will drive the model to align the representations at the cost of increasing the condition number, while the negative pairs will reduce the condition number, keeping the learned representations balanced.
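
For readers who want the objective in code, here is a minimal CLIP-style symmetric contrastive loss in NumPy. It is a didactic sketch of the standard loss, not the simplified model the authors analyze.

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Row i of img_emb and txt_emb are two views of the same data point
    (the positive pair); all other rows in the batch act as negatives.
    """
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature   # (B, B) similarity matrix
    labels = np.arange(len(logits))      # positives sit on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Cross-entropy in both directions (image->text and text->image)
    return 0.5 * (xent(logits) + xent(logits.T))
```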

Responsive Parallelism with Synchronization

  • Authors: Stefan K. Muller, Kyle Singer, Devyn Terra Keeney, Andrew Neth, Kunal Agrawal, I-Ting Angelina Lee, Umut A. Acar
  • Subjects: Programming Languages (cs.PL)
  • Arxiv link: https://arxiv.org/abs/2304.03753
  • Pdf link: https://arxiv.org/pdf/2304.03753
  • Abstract
    Many concurrent programs assign priorities to threads to improve responsiveness. When used in conjunction with synchronization mechanisms such as mutexes and condition variables, however, priorities can lead to priority inversions, in which high-priority threads are delayed by low-priority ones. Priority inversions in the use of mutexes are easily handled using dynamic techniques such as priority inheritance, but priority inversions in the use of condition variables are not well-studied and dynamic techniques are not suitable. In this work, we use a combination of static and dynamic techniques to prevent priority inversion in code that uses mutexes and condition variables. A type system ensures that condition variables are used safely, even while dynamic techniques change thread priorities at runtime to eliminate priority inversions in the use of mutexes. We prove the soundness of our system, using a model of priority inversions based on cost models for parallel programs. To show that the type system is practical to implement, we encode it within the type systems of Rust and C++, and show that the restrictions are not overly burdensome by writing sizeable case studies using these encodings, including porting the Memcached object server to use our C++ implementation.

The Effect of Robot Skill Level and Communication in Rapid, Proximate Human-Robot Collaboration

  • Authors: Kin Man Lee, Arjun Krishna, Zulfiqar Zaidi, Rohan Paleja, Letian Chen, Erin Hedlund-Botti, Mariah Schrum, Matthew Gombolay
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.03756
  • Pdf link: https://arxiv.org/pdf/2304.03756
  • Abstract
    As high-speed, agile robots become more commonplace, these robots will have the potential to better aid and collaborate with humans. However, due to the increased agility and functionality of these robots, close collaboration with humans can create safety concerns that alter team dynamics and degrade task performance. In this work, we aim to enable the deployment of safe and trustworthy agile robots that operate in proximity with humans. We do so by 1) Proposing a novel human-robot doubles table tennis scenario to serve as a testbed for studying agile, proximate human-robot collaboration and 2) Conducting a user-study to understand how attributes of the robot (e.g., robot competency or capacity to communicate) impact team dynamics, perceived safety, and perceived trust, and how these latent factors affect human-robot collaboration (HRC) performance. We find that robot competency significantly increases perceived trust ($p<.001$), extending skill-to-trust assessments in prior studies to agile, proximate HRC. Furthermore, interestingly, we find that when the robot vocalizes its intention to perform a task, it results in a significant decrease in team performance ($p=.037$) and perceived safety of the system ($p=.009$).

New submissions for Wed, 29 Mar 23

Keyword: efficient

Analytical Study and Efficient Evaluation of the Josephus Function

  • Authors: Yunier Bello-Cruz, Roy Quintero-Contreras
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.15457
  • Pdf link: https://arxiv.org/pdf/2303.15457
  • Abstract
    A new approach to analyzing intrinsic properties of the Josephus function, $J_{k}$, is presented in this paper. The linear structure between extreme points of $J_{k}$ is fully revealed, leading to the design of an efficient algorithm for evaluating $J_{k}(n)$. Algebraic expressions that describe how to recursively compute extreme points, including fixed points, are derived. The existence of consecutive extreme and also fixed points for all $k\geq 2$ is proven as a consequence, which generalizes Knuth's result for $k=2$. Moreover, an extensive comparative numerical experiment is conducted to illustrate the performance of the proposed algorithm for evaluating the Josephus function compared to established algorithms. The results show that the proposed scheme is highly effective in computing $J_{k}(n)$ for large inputs.
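
As background for the evaluation problem above, the classical recurrence gives a simple O(n) baseline; the paper's extreme-point algorithm evaluates the function much faster for large n. A minimal sketch of that textbook baseline:

```python
def josephus(n, k):
    """Position (0-indexed) of the survivor among n people, eliminating every k-th.

    Classical O(n) recurrence: J(1) = 0, J(m) = (J(m-1) + k) mod m.
    This is only the textbook baseline; the paper's extreme-point method
    evaluates the function far more efficiently for large n.
    """
    pos = 0
    for m in range(2, n + 1):
        pos = (pos + k) % m
    return pos

assert josephus(7, 3) == 3  # classic example: survivor is person 4 (1-indexed)
```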

A Stochastic Method for Solving Time-Fractional Differential Equations

  • Authors: Nicolas L. Guidotti, Juan Acebrón, José Monteiro
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.15458
  • Pdf link: https://arxiv.org/pdf/2303.15458
  • Abstract
    We present a stochastic method for efficiently computing the solution of time-fractional partial differential equations (fPDEs) that model anomalous diffusion problems of the subdiffusive type. After discretizing the fPDE in space, the ensuing system of fractional linear equations is solved resorting to a Monte Carlo evaluation of the corresponding Mittag-Leffler matrix function. This is accomplished through the approximation of the expected value of a suitable multiplicative functional of a stochastic process, which consists of a Markov chain whose sojourn times in every state are Mittag-Leffler distributed. The resulting algorithm is able to calculate the solution at conveniently chosen points in the domain with high efficiency. In addition, we present how to generalize this algorithm in order to compute the complete solution. For several large-scale numerical problems, our method showed remarkable performance in both shared-memory and distributed-memory systems, achieving nearly perfect scalability up to 16,384 CPU cores.
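
To unpack the Mittag-Leffler matrix function mentioned above: after spatial discretization, the subdiffusion problem becomes a linear fractional system whose exact solution is expressed through that function. A standard formulation, shown schematically with a Caputo derivative of order 0 < α < 1:

```latex
% Caputo time-fractional system after spatial discretization:
%   D_t^\alpha \mathbf{u}(t) = A\,\mathbf{u}(t), \qquad \mathbf{u}(0) = \mathbf{u}_0
% Its solution uses the Mittag-Leffler matrix function:
\mathbf{u}(t) = E_\alpha(t^\alpha A)\,\mathbf{u}_0,
\qquad
E_\alpha(z) = \sum_{j=0}^{\infty} \frac{z^j}{\Gamma(\alpha j + 1)}
% The paper's method estimates the action of E_alpha(t^alpha A) on u_0
% via Monte Carlo, rather than forming the matrix function explicitly.
```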

Uniform in time convergence of numerical schemes for stochastic differential equations via Strong Exponential stability: Euler methods, Split-Step and Tamed Schemes

  • Authors: Letizia Angeli, Dan Crisan, Michela Ottobre
  • Subjects: Numerical Analysis (math.NA); Probability (math.PR)
  • Arxiv link: https://arxiv.org/abs/2303.15463
  • Pdf link: https://arxiv.org/pdf/2303.15463
  • Abstract
    We prove a general criterion providing sufficient conditions under which a time-discretization of a given Stochastic Differential Equation (SDE) is a uniform in time approximation of the SDE. As discussed in the paper, the criterion is also, to a certain extent, necessary. Using such a criterion we then analyse the convergence properties of numerical methods for solutions of SDEs; we consider Explicit and Implicit Euler, split-step and (truncated) tamed Euler methods. In particular, we show that, under mild conditions on the coefficients of the SDE (locally Lipschitz and strictly monotonic), these methods produce approximations of the law of the solution of the SDE that converge uniformly in time. The theoretical results are verified by numerical examples.

Embedding Contextual Information through Reward Shaping in Multi-Agent Learning: A Case Study from Google Football

  • Authors: Chaoyi Gu, Varuna De Silva, Corentin Artaud, Rafael Pina
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.15471
  • Pdf link: https://arxiv.org/pdf/2303.15471
  • Abstract
    Artificial Intelligence has been used to help humans complete difficult tasks in complicated environments by providing optimized strategies for decision-making or replacing manual labour. In environments that include multiple agents, such as football, the most common methods to train agents are Imitation Learning and Multi-Agent Reinforcement Learning (MARL). However, agents trained by Imitation Learning cannot outperform the expert demonstrator, which makes it hard for humans to gain new insights from the learnt policy. Besides, MARL is prone to the credit assignment problem. In environments with a sparse reward signal, this method can be inefficient. The objective of our research is to create a novel reward shaping method by embedding contextual information in the reward function to solve the aforementioned challenges. We demonstrate this in the Google Research Football (GRF) environment. We quantify the contextual information extracted from the game state observation and combine this quantification with the original sparse reward to create the shaped reward. The experiment results in the GRF environment show that our reward shaping method is a useful addition to state-of-the-art MARL algorithms for training agents in environments with a sparse reward signal.
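
The shaping idea can be sketched generically as a dense bonus added to the sparse environment reward. The feature name below is hypothetical, not one of GRF's actual observation keys:

```python
def shaped_reward(sparse_reward, obs, weight=0.1):
    """Augment a sparse environment reward with a dense contextual bonus.

    `ball_progress` is a hypothetical scalar in [0, 1] quantifying how far
    the ball has advanced toward the opponent goal, extracted from the
    game-state observation; the features used in the paper differ.
    """
    context_bonus = obs["ball_progress"]
    return sparse_reward + weight * context_bonus
```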

Exploring the Performance of Pruning Methods in Neural Networks: An Empirical Study of the Lottery Ticket Hypothesis

  • Authors: Eirik Fladmark, Muhammad Hamza Sajjad, Laura Brinkholm Justesen
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.15479
  • Pdf link: https://arxiv.org/pdf/2303.15479
  • Abstract
    In this paper, we explore the performance of different pruning methods in the context of the lottery ticket hypothesis. We compare the performance of L1 unstructured pruning, Fisher pruning, and random pruning on different network architectures and pruning scenarios. The experiments include an evaluation of one-shot and iterative pruning, an examination of weight movement in the network during pruning, a comparison of the pruning methods on networks of varying widths, and an analysis of the performance of the methods when the network becomes very sparse. Additionally, we propose and evaluate a new method for efficient computation of Fisher pruning, known as batched Fisher pruning.
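
For concreteness, the L1 unstructured (magnitude) pruning compared in the study can be sketched in a few lines of NumPy; iterative pruning repeats this mask-then-retrain cycle at increasing sparsity. A didactic sketch, not the paper's code:

```python
import numpy as np

def l1_unstructured_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest |w|."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.abs(weights) > threshold            # boolean keep-mask

w = np.random.randn(64, 64)
mask = l1_unstructured_prune(w, sparsity=0.8)
w_pruned = w * mask
print(f"kept {mask.mean():.0%} of weights")  # ~20%
```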

Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis

  • Authors: Xiuwei Xu, Ziwei Wang, Jie Zhou, Jiwen Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15493
  • Pdf link: https://arxiv.org/pdf/2303.15493
  • Abstract
    In this paper, we propose binary sparse convolutional networks called BSC-Net for efficient point cloud analysis. We empirically observe that the sparse convolution operation causes larger quantization errors than standard convolution. However, conventional network quantization methods directly binarize the weights and activations in sparse convolution, resulting in a performance drop due to the significant quantization loss. On the contrary, we search the optimal subset of convolution operations that activates the sparse convolution at various locations for quantization error alleviation, and the performance gap between real-valued and binary sparse convolutional networks is closed without complexity overhead. Specifically, we first present the shifted sparse convolution that fuses the information in the receptive field for the active sites that match the pre-defined positions. Then we employ differentiable search strategies to discover the optimal positions for active site matching in the shifted sparse convolution, and the quantization errors are significantly alleviated for efficient point cloud analysis. For a fair evaluation of the proposed method, we empirically select the recent advances that are beneficial for sparse convolution network binarization to construct a strong baseline. The experimental results on ScanNet and NYU Depth v2 show that our BSC-Net achieves significant improvement upon our strong baseline and outperforms the state-of-the-art network binarization methods by a remarkable margin without additional computation overhead for binarizing sparse convolutional networks.

A Novel Neural Network Approach for Predicting the Arrival Time of Buses for Smart On-Demand Public Transit

  • Authors: Narges Rashvand, Sanaz Sadat Hosseini, Mona Azarbayjani, Hamed Tabkhi
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.15495
  • Pdf link: https://arxiv.org/pdf/2303.15495
  • Abstract
    Among the major public transportation systems in cities, bus transit has its problems, including the limited accuracy and reliability of estimated bus arrival times for riders. This can lead to delays and decreased ridership, especially in cities where public transportation is heavily relied upon. A common issue is that the arrival times of buses do not match the schedules, resulting in latency for fixed schedules. According to the study in this paper on New York City bus data, there is an average mismatch of around eight minutes (491 seconds) between actual bus arrivals and the scheduled times. This research paper presents a novel AI-based data-driven approach for estimating the arrival times of buses at each transit point (station). Our approach is based on a fully connected neural network and can predict the arrival time collectively across all bus lines in large metropolitan areas. Our neural-net data-driven approach provides a new way to estimate the arrival time of buses, which can lead to a more efficient and smarter way to bring bus transit to the general public. Our evaluation of the network bus system, with more than 200 bus lines and 2 million data points, demonstrates less than 40 seconds of estimation error for arrival times. The inference time per validation-set data point is less than 0.006 ms.
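
A fully connected regressor of the kind described might look like the following minimal PyTorch sketch; the layer sizes and feature count are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class ArrivalTimeMLP(nn.Module):
    """Fully connected regressor mapping trip features to an arrival-time offset."""

    def __init__(self, n_features=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),  # predicted arrival-time offset in seconds
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

model = ArrivalTimeMLP()
features = torch.randn(32, 16)  # a batch of 32 hypothetical trip records
pred_seconds = model(features)
loss = nn.functional.mse_loss(pred_seconds, torch.randn(32))
```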

Learning Harmonic Molecular Representations on Riemannian Manifold

  • Authors: Yiqun Wang, Yuning Shen, Shi Chen, Lihao Wang, Fei Ye, Hao Zhou
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)
  • Arxiv link: https://arxiv.org/abs/2303.15520
  • Pdf link: https://arxiv.org/pdf/2303.15520
  • Abstract
    Molecular representation learning plays a crucial role in AI-assisted drug discovery research. Encoding 3D molecular structures through Euclidean neural networks has become the prevailing method in the geometric deep learning community. However, the equivariance constraints and message passing in Euclidean space may limit the network expressive power. In this work, we propose a Harmonic Molecular Representation learning (HMR) framework, which represents a molecule using the Laplace-Beltrami eigenfunctions of its molecular surface. HMR offers a multi-resolution representation of molecular geometric and chemical features on a 2D Riemannian manifold. We also introduce a harmonic message passing method to realize efficient spectral message passing over the surface manifold for better molecular encoding. Our proposed method shows comparable predictive power to current models in small molecule property prediction, and outperforms the state-of-the-art deep learning models for ligand-binding protein pocket classification and the rigid protein docking challenge, demonstrating its versatility in molecular representation learning.

A New Index based on Power Splitting Indices for Predicting Proper Time of Controlled Islanding

  • Authors: Hamzeh Davarikia, Faycal Znidi, Masoud Barati, Heena Rathore
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.15530
  • Pdf link: https://arxiv.org/pdf/2303.15530
  • Abstract
    In the event of large disturbances, the practice of controlled islanding is used as a last resort to prevent cascading outages. The application of the strategy at the right time is crucial to maintaining system security. A controlled islanding strategy may be deployed efficiently at the right time by predicting the time of uncontrolled system splitting. The purpose of this study is to predict the appropriate islanding time to prevent catastrophic blackout and uncontrolled islanding based on existing relationships between coherent generator groups. A new instability index is derived from the proximity of inter-area oscillations to power splitting indices. Power splitting indices are derived using synchronization coefficients, which recognize the conditions in the system that warrant controlled islanding. The critical values of indices are calculated in offline mode using simulation data from IEEE 39-Buses, and their online performance is evaluated following a controlled islanding strategy. Through the introduction of these indices, system degradation can be effectively evaluated, and blackouts can be predicted early and prevented by controlled islanding at the right time.

Randomized rounding algorithms for large scale unsplittable flow problems

  • Authors: François Lamothe, Emmanuel Rachelson, Alain Haït, Cedric Baudoin, Jean-Baptiste Dupe
  • Subjects: Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2303.15550
  • Pdf link: https://arxiv.org/pdf/2303.15550
  • Abstract
    Unsplittable flow problems cover a wide range of telecommunication and transportation problems and their efficient resolution is key to a number of applications. In this work, we study algorithms that can scale up to large graphs and important numbers of commodities. We present and analyze in detail a heuristic based on the linear relaxation of the problem and randomized rounding. We provide empirical evidence that this approach is competitive with state-of-the-art resolution methods either by its scaling performance or by the quality of its solutions. We provide a variation of the heuristic which has the same approximation factor as the state-of-the-art approximation algorithm. We also derive a tighter analysis for the approximation factor of both the variation and the state-of-the-art algorithm. We introduce a new objective function for the unsplittable flow problem and discuss its differences with the classical congestion objective function. Finally, we discuss the gap in practical performance and theoretical guarantees between all the aforementioned algorithms.
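
The core randomized-rounding step is simple to state: solve the linear relaxation to get, for each commodity, a fractional flow over candidate paths, then sample one path per commodity in proportion to those fractions. A sketch with the LP solve abstracted away and hypothetical commodity/path names:

```python
import random

def randomized_rounding(path_weights):
    """Pick one path per commodity, sampling proportionally to LP weights.

    `path_weights` maps commodity -> {path: fractional flow from the linear
    relaxation}; the fractions for each commodity are assumed to sum to 1.
    """
    routing = {}
    for commodity, weights in path_weights.items():
        paths, probs = zip(*weights.items())
        routing[commodity] = random.choices(paths, weights=probs, k=1)[0]
    return routing

# Hypothetical LP output for two commodities:
lp = {"c1": {("a", "b", "d"): 0.7, ("a", "c", "d"): 0.3},
      "c2": {("b", "d"): 1.0}}
print(randomized_rounding(lp))
```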

Privacy-preserving machine learning for healthcare: open challenges and future perspectives

  • Authors: Alejandro Guerra-Manzanares, L. Julian Lechuga Lopez, Michail Maniatakos, Farah E. Shamout
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2303.15563
  • Pdf link: https://arxiv.org/pdf/2303.15563
  • Abstract
    Machine Learning (ML) has recently shown tremendous success in modeling various healthcare prediction tasks, ranging from disease diagnosis and prognosis to patient treatment. Due to the sensitive nature of medical data, privacy must be considered along the entire ML pipeline, from model training to inference. In this paper, we conduct a review of recent literature concerning Privacy-Preserving Machine Learning (PPML) for healthcare. We primarily focus on privacy-preserving training and inference-as-a-service, and perform a comprehensive review of existing trends, identify challenges, and discuss opportunities for future research directions. The aim of this review is to guide the development of private and efficient ML models in healthcare, with the prospects of translating research efforts into real-world settings.

Core-Periphery Principle Guided Redesign of Self-Attention in Transformers

  • Authors: Xiaowei Yu, Lu Zhang, Haixing Dai, Yanjun Lyu, Lin Zhao, Zihao Wu, David Liu, Tianming Liu, Dajiang Zhu
  • Subjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
  • Arxiv link: https://arxiv.org/abs/2303.15569
  • Pdf link: https://arxiv.org/pdf/2303.15569
  • Abstract
    Designing more efficient, reliable, and explainable neural network architectures is critical to studies that are based on artificial intelligence (AI) techniques. Previous studies, by post-hoc analysis, have found that the best-performing ANNs surprisingly resemble biological neural networks (BNNs), which indicates that ANNs and BNNs may share some common principles to achieve optimal performance in either machine learning or cognitive/behavior tasks. Inspired by this phenomenon, we proactively instill organizational principles of BNNs to guide the redesign of ANNs. We leverage the Core-Periphery (CP) organization, which is widely found in human brain networks, to guide the information communication mechanism in the self-attention of the vision transformer (ViT) and name this novel framework CP-ViT. In CP-ViT, the attention operation between nodes is defined by a sparse graph with a Core-Periphery structure (CP graph), where the core nodes are redesigned and reorganized to play an integrative role and serve as a center for other periphery nodes to exchange information. We evaluated the proposed CP-ViT on multiple public datasets, including medical image datasets (INbreast) and natural image datasets. Interestingly, by incorporating the BNN-derived principle (CP structure) into the redesign of ViT, our CP-ViT outperforms other state-of-the-art ANNs. In general, our work advances the state of the art in three aspects: 1) This work provides novel insights for brain-inspired AI: we can utilize the principles found in BNNs to guide and improve our ANN architecture design; 2) We show that there exist sweet spots of CP graphs that lead to CP-ViTs with significantly improved performance; and 3) The core nodes in CP-ViT correspond to task-related meaningful and important image patches, which can significantly enhance the interpretability of the trained deep model.

Learning Expressive Prompting With Residuals for Vision Transformers

  • Authors: Rajshekhar Das, Yonatan Dukler, Avinash Ravichandran, Ashwin Swaminathan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15591
  • Pdf link: https://arxiv.org/pdf/2303.15591
  • Abstract
    Prompt learning is an efficient approach to adapt transformers by inserting a learnable set of parameters into the input and intermediate representations of a pre-trained model. In this work, we present Expressive Prompts with Residuals (EXPRES), which modifies the prompt learning paradigm specifically for effective adaptation of vision transformers (ViT). Our method constructs downstream representations via learnable "output" tokens, which are akin to the learned class tokens of the ViT. Further, for better steering of the downstream representation processed by the frozen transformer, we introduce residual learnable tokens that are added to the output of various computations. We apply EXPRES to image classification, few-shot learning, and semantic segmentation, and show our method is capable of achieving state-of-the-art prompt tuning on 3/3 categories of the VTAB benchmark. In addition to strong performance, we observe that our approach is an order of magnitude more prompt-efficient than existing visual prompting baselines. We analytically show the computational benefits of our approach over weight-space adaptation techniques like finetuning. Lastly, we systematically corroborate the architectural design of our method via a series of ablation experiments.

Multiphysics discovery with moving boundaries using Ensemble SINDy and Peridynamic Differential Operator

  • Authors: A. C. Bekar, E. Haghighat, E. Madenci
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.15631
  • Pdf link: https://arxiv.org/pdf/2303.15631
  • Abstract
    This study proposes a novel framework for learning the underlying physics of phenomena with moving boundaries. The proposed approach combines Ensemble SINDy and the Peridynamic Differential Operator (PDDO) and imposes an inductive bias assuming the moving boundary physics evolve in their own corotational coordinate system. The robustness of the approach is demonstrated by considering various levels of noise in the measured data using the 2D Fisher-Stefan model. The confidence intervals of recovered coefficients are listed, and the uncertainties of the moving boundary positions are depicted by obtaining the solutions with the recovered coefficients. Although the main focus of this study is the Fisher-Stefan model, the proposed approach is applicable to any type of moving boundary problem with a smooth moving boundary front without a mushy region. The code and data for this framework are available at: https://github.com/alicanbekar/MB_PDDO-SINDy.

Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning

  • Authors: Vladislav Lialin, Vijeta Deshpande, Anna Rumshisky
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2303.15647
  • Pdf link: https://arxiv.org/pdf/2303.15647
  • Abstract
    This paper presents a systematic overview and comparison of parameter-efficient fine-tuning methods covering over 40 papers published between February 2019 and February 2023. These methods aim to resolve the infeasibility and impracticality of fine-tuning large language models by only training a small set of parameters. We provide a taxonomy that covers a broad range of methods and present a detailed method comparison with a specific focus on real-life efficiency and fine-tuning multibillion-scale language models.
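
As one representative of the methods such a survey covers, low-rank adaptation (LoRA) freezes the pre-trained weight matrix and trains only a small low-rank update. A minimal sketch under that assumption, not tied to any particular library's API:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W + (alpha/r) * B @ A."""

    def __init__(self, linear: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = linear
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pre-trained weights
        d_out, d_in = linear.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero-init: update starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        # base output plus the scaled low-rank correction
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```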

Predicting Thermoelectric Power Factor of Bismuth Telluride During Laser Powder Bed Fusion Additive Manufacturing

  • Authors: Ankita Agarwal (1), Tanvi Banerjee (1), Joy Gockel (2), Saniya LeBlanc (3), Joe Walker (4), John Middendorf (4) ((1) Wright State University, (2) Colorado School of Mines, (3) The George Washington University, (4) Open Additive, LLC)
  • Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
  • Arxiv link: https://arxiv.org/abs/2303.15663
  • Pdf link: https://arxiv.org/pdf/2303.15663
  • Abstract
    An additive manufacturing (AM) process, like laser powder bed fusion, allows for the fabrication of objects by spreading and melting powder in layers until a freeform part shape is created. In order to improve the properties of the material involved in the AM process, it is important to predict the material characterization property as a function of the processing conditions. In thermoelectric materials, the power factor is a measure of how efficiently the material can convert heat to electricity. While earlier works have predicted the material characterization properties of different thermoelectric materials using various techniques, implementation of machine learning models to predict the power factor of bismuth telluride (Bi2Te3) during the AM process has not been explored. This is important as Bi2Te3 is a standard material for low temperature applications. Thus, we used data about manufacturing processing parameters involved and in-situ sensor monitoring data collected during AM of Bi2Te3, to train different machine learning models in order to predict its thermoelectric power factor. We implemented supervised machine learning techniques using 80% training and 20% test data and further used the permutation feature importance method to identify important processing parameters and in-situ sensor features which were best at predicting power factor of the material. Ensemble-based methods like random forest, AdaBoost classifier, and bagging classifier performed the best in predicting power factor with the highest accuracy of 90% achieved by the bagging classifier model. Additionally, we found the top 15 processing parameters and in-situ sensor features to characterize the material manufacturing property like power factor. These features could further be optimized to maximize power factor of the thermoelectric material and improve the quality of the products built using this material.

DisWOT: Student Architecture Search for Distillation WithOut Training

  • Authors: Peijie Dong, Lujun Li, Zimian Wei
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.15678
  • Pdf link: https://arxiv.org/pdf/2303.15678
  • Abstract
    Knowledge distillation (KD) is an effective training strategy to improve lightweight student models under the guidance of cumbersome teachers. However, the large architecture difference across the teacher-student pairs limits the distillation gains. In contrast to previous adaptive distillation methods that reduce the teacher-student gap, we explore a novel training-free framework to search for the best student architectures for a given teacher. Our work first empirically shows that the optimal model under vanilla training cannot be the winner in distillation. Secondly, we find that the similarity of feature semantics and sample relations between random-initialized teacher-student networks has good correlations with final distillation performance. Thus, we efficiently measure similarity matrices conditioned on the semantic activation maps to select the optimal student via an evolutionary algorithm without any training. In this way, our student architecture search for Distillation WithOut Training (DisWOT) significantly improves the performance of the model in the distillation stage with at least 180$\times$ training acceleration. Additionally, we extend the similarity metrics in DisWOT as new distillers and KD-based zero-proxies. Our experiments on CIFAR, ImageNet and NAS-Bench-201 demonstrate that our technique achieves state-of-the-art results on different search spaces. Our project and code are available at https://lilujunai.github.io/DisWOT-CVPR2023/.

Efficient Deep Learning of Robust, Adaptive Policies using Tube MPC-Guided Data Augmentation

  • Authors: Tong Zhao, Andrea Tagliabue, Jonathan P. How
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.15688
  • Pdf link: https://arxiv.org/pdf/2303.15688
  • Abstract
    The deployment of agile autonomous systems in challenging, unstructured environments requires adaptation capabilities and robustness to uncertainties. Existing robust and adaptive controllers, such as the ones based on MPC, can achieve impressive performance at the cost of heavy online onboard computations. Strategies that efficiently learn robust and onboard-deployable policies from MPC have emerged, but they still lack fundamental adaptation capabilities. In this work, we extend an existing efficient IL algorithm for robust policy learning from MPC with the ability to learn policies that adapt to challenging model/environment uncertainties. The key idea of our approach consists in modifying the IL procedure by conditioning the policy on a learned lower-dimensional model/environment representation that can be efficiently estimated online. We tailor our approach to the task of learning an adaptive position and attitude control policy to track trajectories under challenging disturbances on a multirotor. Our evaluation is performed in a high-fidelity simulation environment and shows that a high-quality adaptive policy can be obtained in about 1.3 hours. We additionally empirically demonstrate rapid adaptation to in- and out-of-training-distribution uncertainties, achieving a 6.1 cm average position error under a wind disturbance that corresponds to about 50% of the weight of the robot and that is 36% larger than the maximum wind seen during training.

Distributed Graph Embedding with Information-Oriented Random Walks

  • Authors: Peng Fang, Arijit Khan, Siqiang Luo, Fang Wang, Dan Feng, Zhenli Li, Wei Yin, Yuchao Cao
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.15702
  • Pdf link: https://arxiv.org/pdf/2303.15702
  • Abstract
    Graph embedding maps graph nodes to low-dimensional vectors, and is widely adopted in machine learning tasks. The increasing availability of billion-edge graphs underscores the importance of learning efficient and effective embeddings on large graphs, such as link prediction on Twitter with over one billion edges. Most existing graph embedding methods fall short of reaching high data scalability. In this paper, we present a general-purpose, distributed, information-centric random walk-based graph embedding framework, DistGER, which can scale to embed billion-edge graphs. DistGER incrementally computes information-centric random walks. It further leverages a multi-proximity-aware, streaming, parallel graph partitioning strategy, simultaneously achieving high local partition quality and excellent workload balancing across machines. DistGER also improves the distributed Skip-Gram learning model to generate node embeddings by optimizing the access locality, CPU throughput, and synchronization efficiency. Experiments on real-world graphs demonstrate that compared to state-of-the-art distributed graph embedding frameworks, including KnightKing, DistDGL, and Pytorch-BigGraph, DistGER exhibits 2.33x-129x acceleration, 45% reduction in cross-machines communication, and > 10% effectiveness improvement in downstream tasks.

Design Space Exploration for PCM-based Photonic Memory

  • Authors: Amin Shafiee, Benoit Charbonnier, Sudeep Pasricha, Mahdi Nikdast
  • Subjects: Emerging Technologies (cs.ET); Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2303.15721
  • Pdf link: https://arxiv.org/pdf/2303.15721
  • Abstract
    The integration of silicon photonics (SiPh) and phase change materials (PCMs) has created a unique opportunity to realize adaptable and reconfigurable photonic systems. In particular, the nonvolatile programmability in PCMs has made them a promising candidate for implementing optical memory systems. In this paper, we describe the design of an optical memory cell based on PCMs while exploring the design space of the cell in terms of PCM material choice (e.g., GST, GSST, Sb2Se3), cell bit capacity, latency, and power consumption. Leveraging this design-space exploration for the design of efficient optical memory cells, we present the design and implementation of an optical memory array and explore its scalability and power consumption when using different optical memory cells. We also identify performance bottlenecks that need to be alleviated to further scale optical memory arrays with competitive latency and energy consumption, compared to their electronic counterparts.

HISSbot: Sidewinding with a Soft Snake Robot

  • Authors: Farhan Rozaidi, Emma Waters, Olivia Dawes, Jennifer Yang, Joseph R. Davidson, Ross L. Hatton
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.15732
  • Pdf link: https://arxiv.org/pdf/2303.15732
  • Abstract
    Snake robots are characterized by their ability to navigate through small spaces and loose terrain by utilizing efficient cyclic forms of locomotion. Soft snake robots are a subset of these robots which utilize soft, compliant actuators to produce movement. Prior work on soft snake robots has primarily focused on planar gaits, such as undulation. More efficient spatial gaits, such as sidewinding, are unexplored gaits for soft snake robots. We propose a novel means of constructing a soft snake robot capable of sidewinding, and introduce the Helical Inflating Soft Snake Robot (HISSbot). We validate this actuation through the physical HISSbot, and demonstrate its ability to sidewind across various surfaces. Our tests show robustness in locomotion through low-friction and granular media.

Deformable Kernel Expansion Model for Efficient Arbitrary-shaped Scene Text Detection

  • Authors: Tao He, Sheng Huang, Wenhao Tang, Bo Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15737
  • Pdf link: https://arxiv.org/pdf/2303.15737
  • Abstract
    Scene text detection is a challenging computer vision task due to the high variation in text shapes and ratios. In this work, we propose a scene text detector named Deformable Kernel Expansion (DKE), which incorporates the merits of both segmentation and contour-based detectors. DKE employs a segmentation module to segment the shrunken text region as the text kernel, then expands the text kernel contour to obtain text boundary by regressing the vertex-wise offsets. Generating the text kernel by segmentation enables DKE to inherit the arbitrary-shaped text region modeling capability of segmentation-based detectors. Regressing the kernel contour with some sampled vertices enables DKE to avoid the complicated pixel-level post-processing and better learn contour deformation as the contour-based detectors. Moreover, we propose an Optimal Bipartite Graph Matching Loss (OBGML) that measures the matching error between the predicted contour and the ground truth, which efficiently minimizes the global contour matching distance. Extensive experiments on CTW1500, Total-Text, MSRA-TD500, and ICDAR2015 demonstrate that DKE achieves a good tradeoff between accuracy and efficiency in scene text detection.

Learning Second-Order Attentive Context for Efficient Correspondence Pruning

  • Authors: Xinyi Ye, Weiyue Zhao, Hao Lu, Zhiguo Cao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15761
  • Pdf link: https://arxiv.org/pdf/2303.15761
  • Abstract
    Correspondence pruning aims to search for consistent correspondences (inliers) from a set of putative correspondences. It is challenging because of the disorganized spatial distribution of numerous outliers, especially when putative correspondences are largely dominated by outliers. It is even more challenging to ensure effectiveness while maintaining efficiency. In this paper, we propose an effective and efficient method for correspondence pruning. Inspired by the success of attentive context in correspondence problems, we first extend the attentive context to the first-order attentive context and then introduce the idea of attention in attention (ANA) to model second-order attentive context for correspondence pruning. Compared with first-order attention, which focuses on feature-consistent context, second-order attention attends to the attention weights themselves and provides an additional source to encode consistent context from the attention map. For efficiency, we derive two approximate formulations for the naive implementation of second-order attention to optimize the cubic complexity to linear complexity, such that second-order attention can be used with negligible computational overheads. We further implement our formulations in a second-order context layer and then incorporate the layer in an ANA block. Extensive experiments demonstrate that our method is effective and efficient in pruning outliers, especially in high-outlier-ratio cases. Compared with the state-of-the-art correspondence pruning approach LMCNet, our method runs 14 times faster while maintaining a competitive accuracy.

A Generalized Ray Formulation For Wave-Optics Rendering

  • Authors: Shlomi Steinberg, Ravi Ramamoorthi, Benedikt Bitterli, Eugene d'Eon, Ling-Qi Yan, Matt Pharr
  • Subjects: Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2303.15762
  • Pdf link: https://arxiv.org/pdf/2303.15762
  • Abstract
    Under ray-optical light transport, the classical ray serves as a local and linear "point query" of light's behaviour. Such point queries are useful, and sophisticated path tracing and sampling techniques enable efficiently computing solutions to light transport problems in complex, real-world settings and environments. However, such formulations are firmly confined to the realm of ray optics, while many applications of interest, in computer graphics and computational optics, demand a more precise understanding of light. We rigorously formulate the generalized ray, which enables local and linear point queries of the wave-optical phase space. Furthermore, we present sample-solve: a simple method that serves as a novel link between path tracing and computational optics. We will show that this link enables the application of modern path tracing techniques for wave-optical rendering, improving upon the state-of-the-art in terms of the generality and accuracy of the formalism, ease of application, as well as performance. Sampling using generalized rays enables interactive rendering under rigorous wave optics, with orders-of-magnitude faster performance compared to existing techniques.

Characterizing the Performance of Emerging Deep Learning, Graph, and High Performance Computing Workloads Under Interference

  • Authors: Hao Xu, Shuang Song, Ze Mao
  • Subjects: Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2303.15763
  • Pdf link: https://arxiv.org/pdf/2303.15763
  • Abstract
    Throughput-oriented computing via co-running multiple applications in the same machine has been widely adopted to achieve high hardware utilization and energy saving on modern supercomputers and data centers. However, efficiently co-running applications raises new design challenges, mainly because applications with diverse requirements can stress shared hardware resources (IO, network, and cache) at various levels. The disparities in resource usage can result in interference, which in turn can lead to unpredictable co-running behaviors. To better understand application interference, prior work provided detailed execution characterization. However, these characterization approaches either emphasize traditional benchmarks or fall within a single application domain. To address this issue, we study 25 up-to-date applications and benchmarks from various application domains and form 625 consolidation pairs to thoroughly analyze the execution interference caused by application co-running. Moreover, we leverage mini-benchmarks and real applications to pinpoint the provenance of co-running interference in both hardware and software aspects.

TerrainNet: Visual Modeling of Complex Terrain for High-speed, Off-road Navigation

  • Authors: Xiangyun Meng, Nathan Hatch, Alexander Lambert, Anqi Li, Nolan Wagener, Matthew Schmittle, JoonHo Lee, Wentao Yuan, Zoey Chen, Samuel Deng, Greg Okopal, Dieter Fox, Byron Boots, Amirreza Shaban
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.15771
  • Pdf link: https://arxiv.org/pdf/2303.15771
  • Abstract
    Effective use of camera-based vision systems is essential for robust performance in autonomous off-road driving, particularly in the high-speed regime. Despite success in structured, on-road settings, current end-to-end approaches for scene prediction have yet to be successfully adapted for complex outdoor terrain. To this end, we present TerrainNet, a vision-based terrain perception system for semantic and geometric terrain prediction for aggressive, off-road navigation. The approach relies on several key insights and practical considerations for achieving reliable terrain modeling. The network includes a multi-headed output representation to capture fine- and coarse-grained terrain features necessary for estimating traversability. Accurate depth estimation is achieved using self-supervised depth completion with multi-view RGB and stereo inputs. Requirements for real-time performance and fast inference speeds are met using efficient, learned image feature projections. Furthermore, the model is trained on a large-scale, real-world off-road dataset collected across a variety of diverse outdoor environments. We show how TerrainNet can also be used for costmap prediction and provide a detailed framework for integration into a planning module. We demonstrate the performance of TerrainNet through extensive comparison to current state-of-the-art baselines for camera-only scene prediction. Finally, we showcase the effectiveness of integrating TerrainNet within a complete autonomous-driving stack by conducting a real-world vehicle test in a challenging off-road scenario.

HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models

  • Authors: Shan Ning, Longtian Qiu, Yongfei Liu, Xuming He
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15786
  • Pdf link: https://arxiv.org/pdf/2303.15786
  • Abstract
    Human-Object Interaction (HOI) detection aims to localize human-object pairs and recognize their interactions. Recently, Contrastive Language-Image Pre-training (CLIP) has shown great potential in providing interaction prior for HOI detectors via knowledge distillation. However, such approaches often rely on large-scale training data and suffer from inferior performance under few/zero-shot scenarios. In this paper, we propose a novel HOI detection framework that efficiently extracts prior knowledge from CLIP and achieves better generalization. In detail, we first introduce a novel interaction decoder to extract informative regions in the visual feature map of CLIP via a cross-attention mechanism, which is then fused with the detection backbone by a knowledge integration block for more accurate human-object pair detection. In addition, prior knowledge in CLIP text encoder is leveraged to generate a classifier by embedding HOI descriptions. To distinguish fine-grained interactions, we build a verb classifier from training data via visual semantic arithmetic and a lightweight verb representation adapter. Furthermore, we propose a training-free enhancement to exploit global HOI predictions from CLIP. Extensive experiments demonstrate that our method outperforms the state of the art by a large margin on various settings, e.g. +4.04 mAP on HICO-Det. The source code is available in https://github.com/Artanic30/HOICLIP.

KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation

  • Authors: Xiangyang Li, Zihan Wang, Jiahao Yang, Yaowei Wang, Shuqiang Jiang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15796
  • Pdf link: https://arxiv.org/pdf/2303.15796
  • Abstract
    Vision-and-language navigation (VLN) is the task of enabling an embodied agent to navigate to a remote location following a natural language instruction in real scenes. Most of the previous approaches utilize the entire features or object-centric features to represent navigable candidates. However, these representations are not efficient enough for an agent to perform actions to arrive at the target location. As knowledge provides crucial information which is complementary to visible content, in this paper, we propose a Knowledge Enhanced Reasoning Model (KERM) to leverage knowledge to improve agent navigation ability. Specifically, we first retrieve facts (i.e., knowledge described by language descriptions) for the navigation views based on local regions from the constructed knowledge base. The retrieved facts range from properties of a single object (e.g., color, shape) to relationships between objects (e.g., action, spatial position), providing crucial information for VLN. We further present the KERM which contains the purification, fact-aware interaction, and instruction-guided aggregation modules to integrate visual, history, instruction, and fact features. The proposed KERM can automatically select and gather crucial and relevant cues, obtaining more accurate action prediction. Experimental results on the REVERIE, R2R, and SOON datasets demonstrate the effectiveness of the proposed method.

Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition

  • Authors: Xiao Yang, Chang Liu, Longlong Xu, Yikai Wang, Yinpeng Dong, Ning Chen, Hang Su, Jun Zhu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.15818
  • Pdf link: https://arxiv.org/pdf/2303.15818
  • Abstract
    Face recognition is a prevailing authentication solution in numerous biometric applications. Physical adversarial attacks, as an important surrogate, can identify the weaknesses of face recognition systems and evaluate their robustness before they are deployed. However, most existing physical attacks are either readily detectable or ineffective against commercial recognition systems. The goal of this work is to develop a more reliable technique that can carry out an end-to-end evaluation of adversarial robustness for commercial systems. It requires that this technique can simultaneously deceive black-box recognition models and evade defensive mechanisms. To fulfill this, we design adversarial textured 3D meshes (AT3D) with an elaborate topology on a human face, which can be 3D-printed and pasted on the attacker's face to evade the defenses. However, the mesh-based optimization regime calculates gradients in high-dimensional mesh space, and can be trapped into local optima with unsatisfactory transferability. To deviate from the mesh-based space, we propose to perturb the low-dimensional coefficient space based on a 3D Morphable Model, which significantly improves black-box transferability while enjoying faster search efficiency and better visual quality. Extensive experiments in digital and physical scenarios show that our method effectively explores the security vulnerabilities of multiple popular commercial services, including three recognition APIs, four anti-spoofing APIs, two prevailing mobile phones and two automated access control systems.

One Adapter for All Programming Languages? Adapter Tuning for Code Search and Summarization

  • Authors: Deze Wang, Boxing Chen, Shanshan Li, Wei Luo, Shaoliang Peng, Wei Dong, Xiangke Liao
  • Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.15822
  • Pdf link: https://arxiv.org/pdf/2303.15822
  • Abstract
    As pre-trained models automate many code intelligence tasks, a widely used paradigm is to fine-tune a model on the task dataset for each programming language. A recent study reported that multilingual fine-tuning benefits a range of tasks and models. However, we find that multilingual fine-tuning leads to performance degradation on recent models UniXcoder and CodeT5. To alleviate the potentially catastrophic forgetting issue in multilingual models, we fix all pre-trained model parameters, insert the parameter-efficient structure adapter, and fine-tune it. Updating only 0.6% of the overall parameters compared to full-model fine-tuning for each programming language, adapter tuning yields consistent improvements on code search and summarization tasks, achieving state-of-the-art results. In addition, we experimentally show its effectiveness in cross-lingual and low-resource scenarios. Multilingual fine-tuning with 200 samples per programming language approaches the results fine-tuned with the entire dataset on code summarization. Our experiments on three probing tasks show that adapter tuning significantly outperforms full-model fine-tuning and effectively overcomes catastrophic forgetting.
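
The parameter-efficient adapter structure referenced above is typically a small bottleneck inserted after a transformer sub-layer. A generic sketch with illustrative dimensions; the paper's exact adapter placement and sizes may differ:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, nonlinearity, up-project, residual connection.

    With d_model=768 and bottleneck=48, the adapter adds roughly
    2 * 768 * 48 weights per layer -- a small fraction of the full model,
    consistent with tuning well under 1% of all parameters.
    """

    def __init__(self, d_model=768, bottleneck=48):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        nn.init.zeros_(self.up.weight)  # start as a near-identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden):
        return hidden + self.up(torch.relu(self.down(hidden)))
```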

Automated wildlife image classification: An active learning tool for ecological applications

  • Authors: Ludwig Bothmann, Lisa Wimmer, Omid Charrakh, Tobias Weber, Hendrik Edelhoff, Wibke Peters, Hien Nguyen, Caryl Benjamin, Annette Menzel
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)
  • Arxiv link: https://arxiv.org/abs/2303.15823
  • Pdf link: https://arxiv.org/pdf/2303.15823
  • Abstract
    Wildlife camera trap images are being used extensively to investigate animal abundance, habitat associations, and behavior, which is complicated by the fact that experts must first classify the images manually. Artificial intelligence systems can take over this task but usually need a large number of already-labeled training images to achieve sufficient performance. This requirement necessitates human expert labor and poses a particular challenge for projects with few cameras or short durations. We propose a label-efficient learning strategy that enables researchers with small or medium-sized image databases to leverage the potential of modern machine learning, thus freeing crucial resources for subsequent analyses. Our methodological proposal is two-fold: (1) We improve current strategies of combining object detection and image classification by tuning the hyperparameters of both models. (2) We provide an active learning (AL) system that allows training deep learning models very efficiently in terms of required human-labeled training images. We supply a software package that enables researchers to use these methods directly and thereby ensure the broad applicability of the proposed framework in ecological practice. We show that our tuning strategy improves predictive performance. We demonstrate how the AL pipeline reduces the amount of pre-labeled data needed to achieve a specific predictive performance and that it is especially valuable for improving out-of-sample predictive performance. We conclude that the combination of tuning and AL increases predictive performance substantially. Furthermore, we argue that our work can broadly impact the community through the ready-to-use software package provided. Finally, the publication of our models tailored to European wildlife data enriches existing model bases mostly trained on data from Africa and North America.
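
An active-learning loop of this kind typically asks experts to label the images the current model is least certain about. A generic uncertainty-sampling step is sketched below; it illustrates the general AL pattern, not the authors' software package:

```python
import numpy as np

def select_for_labeling(probs, budget):
    """Pick the `budget` unlabeled images with the highest predictive entropy.

    `probs` is an (N, C) array of class probabilities from the current
    classifier over the unlabeled pool; returns indices to send to experts.
    """
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-budget:]

pool_probs = np.random.dirichlet(np.ones(5), size=1000)  # hypothetical pool
to_label = select_for_labeling(pool_probs, budget=50)
```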

Soft-prompt tuning to predict lung cancer using primary care free-text Dutch medical notes

  • Authors: Auke Elfrink, Iacopo Vagliano, Ameen Abu-Hanna, Iacer Calixto
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.15846
  • Pdf link: https://arxiv.org/pdf/2303.15846
  • Abstract
    We investigate different natural language processing (NLP) approaches based on contextualised word representations for the problem of early prediction of lung cancer using free-text patient medical notes of Dutch primary care physicians. Because lung cancer has a low prevalence in primary care, we also address the problem of classification under highly imbalanced classes. Specifically, we use large Transformer-based pretrained language models (PLMs) and investigate: 1) how *soft prompt-tuning* -- an NLP technique used to adapt PLMs using small amounts of training data -- compares to standard model fine-tuning; 2) whether simpler static word embedding models (WEMs) can be more robust compared to PLMs in highly imbalanced settings; and 3) how models fare when trained on notes from a small number of patients. We find that 1) soft prompt-tuning is an efficient alternative to standard model fine-tuning; 2) PLMs show better discrimination but worse calibration compared to simpler static word embedding models as the classification problem becomes more imbalanced; and 3) results when training models on a small number of patients are mixed and show no clear differences between PLMs and WEMs. All our code is available open source at https://bitbucket.org/aumc-kik/prompt_tuning_cancer_prediction/.
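
Soft prompt-tuning freezes the PLM and trains only a short sequence of continuous embeddings prepended to the input. A schematic sketch; the prompt length and embedding dimension are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Trainable prompt embeddings prepended to a frozen PLM's input embeddings."""

    def __init__(self, prompt_len=20, d_model=768):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)

    def forward(self, input_embeds):
        # input_embeds: (batch, seq_len, d_model) from the frozen embedding layer
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)
```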

GAS: A Gaussian Mixture Distribution-Based Adaptive Sampling Method for PINNs

  • Authors: Yuling Jiao, Di Li, Xiliang Lu, Jerry Zhijian Yang, Cheng Yuan
  • Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
  • Arxiv link: https://arxiv.org/abs/2303.15849
  • Pdf link: https://arxiv.org/pdf/2303.15849
  • Abstract
    With the recent study of deep learning in scientific computation, the PINNs method has drawn widespread attention for solving PDEs. Compared with traditional methods, PINNs can efficiently handle high-dimensional problems, but the accuracy is relatively low, especially for highly irregular problems. Inspired by the idea of adaptive finite element methods and incremental learning, we propose GAS, a Gaussian mixture distribution-based adaptive sampling method for PINNs. During the training procedure, GAS uses the current residual information to generate a Gaussian mixture distribution for sampling additional points, which are then trained together with historical data to speed up the convergence of the loss and achieve higher accuracy. Several numerical simulations on 2D to 10D problems show that GAS is a promising method that achieves state-of-the-art accuracy among deep solvers, while being comparable with traditional numerical solvers.
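
The adaptive-sampling idea can be sketched generically: weight current collocation points by their PDE residual, fit a Gaussian mixture to the high-residual region, and draw new training points from it. A schematic sketch; `pde_residual` is a hypothetical placeholder for the user's residual evaluator, and the thresholding rule is illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gas_resample(points, residuals, n_new=512, n_components=4):
    """Draw new collocation points concentrated where |residual| is large."""
    # Keep the points with the largest residual magnitude as mixture "seeds"
    top = np.argsort(np.abs(residuals))[-max(10 * n_components, 100):]
    gmm = GaussianMixture(n_components=n_components).fit(points[top])
    new_points, _ = gmm.sample(n_new)
    return new_points

# Hypothetical usage inside a PINN training loop:
# residuals = pde_residual(model, points)   # user-supplied evaluator
# extra = gas_resample(points, residuals)
# points = np.concatenate([points, extra])  # train on history + new points
```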

The Wyner Variational Autoencoder for Unsupervised Multi-Layer Wireless Fingerprinting

  • Authors: Teng-Hui Huang, Thilini Dahanayaka, Kanchana Thilakarathna, Philip H.W. Leong, Hesham El Gamal
  • Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.15860
  • Pdf link: https://arxiv.org/pdf/2303.15860
  • Abstract
    Wireless fingerprinting refers to a device identification method leveraging hardware imperfections and wireless channel variations as signatures. Beyond physical layer characteristics, recent studies have demonstrated that user behaviours can be identified through network traffic, e.g., packet length, without decryption of the payload. Inspired by these results, we propose a multi-layer fingerprinting framework that jointly considers the multi-layer signatures for improved identification performance. In contrast to previous works, by leveraging the recent multi-view machine learning paradigm, i.e., data with multiple forms, our method can cluster the device information shared among the multi-layer features without supervision. Our information-theoretic approach can be extended to supervised and semi-supervised settings with straightforward derivations. In solving the formulated problem, we obtain a tight surrogate bound using variational inference for efficient optimization. In extracting the shared device information, we develop an algorithm based on the Wyner common information method, enjoying reduced computational complexity compared to existing approaches. The algorithm can be applied to data distributions belonging to the exponential family class. Empirically, we evaluate the algorithm on a synthetic dataset with real-world video traffic and simulated physical layer characteristics. Our empirical results show that the proposed method outperforms the state-of-the-art baselines in both supervised and unsupervised settings.

Accelerating exponential integrators to efficiently solve advection-diffusion-reaction equations

  • Authors: Marco Caliari, Fabio Cassini, Lukas Einkemmer, Alexander Ostermann
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.15861
  • Pdf link: https://arxiv.org/pdf/2303.15861
  • Abstract
    In this paper we consider an approach to improve the performance of exponential integrators/Lawson schemes in cases where the solution of a related, but usually much simpler, problem can be computed efficiently. While for implicit methods such an approach is common (e.g. by using preconditioners), for exponential integrators this has proven more challenging. Here we propose to extract a constant coefficient differential operator from advection-diffusion-reaction equations for which we are then able to compute the required matrix functions efficiently. Both a linear stability analysis and numerical experiments show that the resulting schemes can be unconditionally stable. In fact, we find that exponential integrators and Lawson schemes can have better stability properties than similarly constructed implicit-explicit schemes. We also propose new Lawson type integrators that further improve on these stability properties. The effectiveness of the approach is highlighted by a number of numerical examples in two and three space dimensions.
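
As a worked instance of the splitting idea (our notation, assuming a semi-discretized problem), extracting a constant-coefficient operator A and treating the remainder g explicitly gives, e.g., the standard exponential Euler step:

```latex
\[
  \partial_t u = A u + g(u), \qquad
  u_{n+1} = e^{\tau A} u_n + \tau\,\varphi_1(\tau A)\, g(u_n),
  \qquad \varphi_1(z) = \frac{e^{z}-1}{z},
\]
```

so that only actions of matrix functions of the constant-coefficient operator A are needed, which is exactly the part the paper shows can be computed efficiently.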

Efficient Alternating Minimization Solvers for Wyner Multi-View Unsupervised Learning

  • Authors: Teng-Hui Huang, Hesham El Gamal
  • Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.15866
  • Pdf link: https://arxiv.org/pdf/2303.15866
  • Abstract
    In this work, we adopt the Wyner common information framework for unsupervised multi-view representation learning. Within this framework, we propose two novel formulations that enable the development of computationally efficient solvers based on the alternating minimization principle. The first formulation, referred to as the variational form, enjoys a linearly growing complexity with the number of views and is based on a variational-inference tight surrogate bound coupled with a Lagrangian optimization objective function. The second formulation, i.e., the representational form, is shown to include known results as special cases. Here, we develop a tailored version of the alternating direction method of multipliers (ADMM) algorithm for solving the resulting non-convex optimization problem. In both cases, the convergence of the proposed solvers is established in certain relevant regimes. Furthermore, our empirical results demonstrate the effectiveness of the proposed methods as compared with the state-of-the-art solvers. In a nutshell, the proposed solvers offer computational efficiency, theoretical convergence guarantees, scalable complexity with the number of views, and exceptional accuracy as compared with state-of-the-art techniques. Our focus here is on the discrete case; our results for continuous distributions are reported elsewhere.

STMixer: A One-Stage Sparse Action Detector

  • Authors: Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, Limin Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15879
  • Pdf link: https://arxiv.org/pdf/2303.15879
  • Abstract
    Traditional video action detectors typically adopt the two-stage pipeline, where a person detector is first employed to generate actor boxes and then 3D RoIAlign is used to extract actor-specific features for classification. This detection paradigm requires multi-stage training and inference, and cannot capture context information outside the bounding box. Recently, a few query-based action detectors are proposed to predict action instances in an end-to-end manner. However, they still lack adaptability in feature sampling and decoding, thus suffering from the issues of inferior performance or slower convergence. In this paper, we propose a new one-stage sparse action detector, termed STMixer. STMixer is based on two core designs. First, we present a query-based adaptive feature sampling module, which endows our STMixer with the flexibility of mining a set of discriminative features from the entire spatiotemporal domain. Second, we devise a dual-branch feature mixing module, which allows our STMixer to dynamically attend to and mix video features along the spatial and the temporal dimension respectively for better feature decoding. Coupling these two designs with a video backbone yields an efficient end-to-end action detector. Without bells and whistles, our STMixer obtains the state-of-the-art results on the datasets of AVA, UCF101-24, and JHMDB.

Head3D: Complete 3D Head Generation via Tri-plane Feature Distillation

  • Authors: Yuhao Cheng, Yichao Yan, Wenhan Zhu, Ye Pan, Bowen Pan, Xiaokang Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15892
  • Pdf link: https://arxiv.org/pdf/2303.15892
  • Abstract
    Head generation with diverse identities is an important task in computer vision and computer graphics, widely used in multimedia applications. However, current full head generation methods require a large number of 3D scans or multi-view images to train the model, resulting in expensive data acquisition cost. To address this issue, we propose Head3D, a method to generate full 3D heads with limited multi-view images. Specifically, our approach first extracts facial priors represented by tri-planes learned in EG3D, a 3D-aware generative model, and then proposes feature distillation to deliver the 3D frontal faces into complete heads without compromising head integrity. To mitigate the domain gap between the face and head models, we present dual-discriminators to guide the frontal and back head generation, respectively. Our model achieves cost-efficient and diverse complete head generation with photo-realistic renderings and high-quality geometry representations. Extensive experiments demonstrate the effectiveness of our proposed Head3D, both qualitatively and quantitatively.

Efficient Quality Diversity Optimization of 3D Buildings through 2D Pre-optimization

  • Authors: Alexander Hagg, Martin L. Kliemank, Alexander Asteroth, Dominik Wilde, Mario C. Bedrunka, Holger Foysi, Dirk Reith
  • Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.15896
  • Pdf link: https://arxiv.org/pdf/2303.15896
  • Abstract
    Quality diversity algorithms can be used to efficiently create a diverse set of solutions to inform engineers' intuition. But quality diversity is not efficient for very expensive problems, needing 100,000s of evaluations. Even with the assistance of surrogate models, quality diversity needs 100s or even 1000s of evaluations, which can make its use infeasible. In this study we tackle this problem by using a pre-optimization strategy on a lower-dimensional optimization problem and then mapping the solutions to the higher-dimensional case. For a use case of designing buildings that minimize wind nuisance, we show that we can predict flow features around 3D buildings from 2D flow features around building footprints. For a diverse set of building designs, by sampling the space of 2D footprints with a quality diversity algorithm, a predictive model can be trained that is more accurate than when trained on a set of footprints selected with a space-filling algorithm like the Sobol sequence. Simulating only 16 buildings in 3D, a set of 1024 building designs with low predicted wind nuisance is created. We show that we can produce better machine learning models by producing training data with quality diversity instead of using common sampling techniques. The method can bootstrap generative design in a computationally expensive 3D domain and allows engineers to sweep the design space, understanding wind nuisance in early design phases.

Mask-Free Video Instance Segmentation

  • Authors: Lei Ke, Martin Danelljan, Henghui Ding, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.15904
  • Pdf link: https://arxiv.org/pdf/2303.15904
  • Abstract
    The recent advancement in Video Instance Segmentation (VIS) has largely been driven by the use of deeper and increasingly data-hungry transformer-based models. However, video masks are tedious and expensive to annotate, limiting the scale and diversity of existing VIS datasets. In this work, we aim to remove the mask-annotation requirement. We propose MaskFreeVIS, achieving highly competitive VIS performance, while only using bounding box annotations for the object state. We leverage the rich temporal mask consistency constraints in videos by introducing the Temporal KNN-patch Loss (TK-Loss), providing strong mask supervision without any labels. Our TK-Loss finds one-to-many matches across frames, through an efficient patch-matching step followed by a K-nearest neighbor selection. A consistency loss is then enforced on the found matches. Our mask-free objective is simple to implement, has no trainable parameters, is computationally efficient, yet outperforms baselines employing, e.g., state-of-the-art optical flow to enforce temporal mask consistency. We validate MaskFreeVIS on the YouTube-VIS 2019/2021, OVIS and BDD100K MOTS benchmarks. The results clearly demonstrate the efficacy of our method by drastically narrowing the gap between fully and weakly-supervised VIS performance. Our code and trained models are available at https://github.com/SysCV/MaskFreeVis.
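
As a toy illustration of the one-to-many matching plus consistency idea (a simplification based on the abstract; the actual TK-Loss operates on image patches with an explicit patch-matching step), consider patch features and predicted mask probabilities from two adjacent frames:

```python
# Toy temporal KNN consistency: each patch in frame t is matched to its K
# most similar patches in frame t+1, and the predicted mask probabilities
# of matched patches are pulled together. Shapes and names are illustrative.
import torch
import torch.nn.functional as F

def tk_consistency_loss(feat_t, feat_t1, mask_t, mask_t1, k=5):
    # feat_*: (N, C) L2-normalized patch features; mask_*: (N,) probabilities
    sim = feat_t @ feat_t1.t()                   # (N, N) patch similarity
    topk = sim.topk(k, dim=1).indices            # K nearest matches per patch
    matched = mask_t1[topk]                      # (N, K) matched mask probs
    return F.l1_loss(matched, mask_t.unsqueeze(1).expand_as(matched))
```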

When Brain-inspired AI Meets AGI

  • Authors: Lin Zhao, Lu Zhang, Zihao Wu, Yuzhong Chen, Haixing Dai, Xiaowei Yu, Zhengliang Liu, Tuo Zhang, Xintao Hu, Xi Jiang, Xiang Li, Dajiang Zhu, Dinggang Shen, Tianming Liu
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.15935
  • Pdf link: https://arxiv.org/pdf/2303.15935
  • Abstract
    Artificial General Intelligence (AGI) has been a long-standing goal of humanity, with the aim of creating machines capable of performing any intellectual task that humans can do. To achieve this, AGI researchers draw inspiration from the human brain and seek to replicate its principles in intelligent machines. Brain-inspired artificial intelligence is a field that has emerged from this endeavor, combining insights from neuroscience, psychology, and computer science to develop more efficient and powerful AI systems. In this article, we provide a comprehensive overview of brain-inspired AI from the perspective of AGI. We begin with the current progress in brain-inspired AI and its extensive connection with AGI. We then cover the important characteristics for both human intelligence and AGI (e.g., scaling, multimodality, and reasoning). We discuss important technologies toward achieving AGI in current AI systems, such as in-context learning and prompt tuning. We also investigate the evolution of AGI systems from both algorithmic and infrastructural perspectives. Finally, we explore the limitations and future of AGI.

A source separation approach to temporal graph modelling for computer networks

  • Authors: Corentin Larroche
  • Subjects: Cryptography and Security (cs.CR); Applications (stat.AP); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2303.15950
  • Pdf link: https://arxiv.org/pdf/2303.15950
  • Abstract
    Detecting malicious activity within an enterprise computer network can be framed as a temporal link prediction task: given a sequence of graphs representing communications between hosts over time, the goal is to predict which edges should--or should not--occur in the future. However, standard temporal link prediction algorithms are ill-suited for computer network monitoring as they do not take account of the peculiar short-term dynamics of computer network activity, which exhibits sharp seasonal variations. In order to build a better model, we propose a source separation-inspired description of computer network activity: at each time step, the observed graph is a mixture of subgraphs representing various sources of activity, and short-term dynamics result from changes in the mixing coefficients. Both qualitative and quantitative experiments demonstrate the validity of our approach.
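
In our notation, the mixture model described above reads:

```latex
\[
  A_t \;\approx\; \sum_{k=1}^{K} \theta_k(t)\, B_k, \qquad \theta_k(t) \ge 0,
\]
```

where each B_k is a subgraph representing one source of activity and the sharp short-term (e.g., seasonal) dynamics are carried entirely by the mixing coefficients \theta_k(t).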

Efficient Parallel Split Learning over Resource-constrained Wireless Edge Networks

  • Authors: Zheng Lin, Guangyu Zhu, Yiqin Deng, Xianhao Chen, Yue Gao, Kaibin Huang, Yuguang Fang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.15991
  • Pdf link: https://arxiv.org/pdf/2303.15991
  • Abstract
    Increasingly deep neural networks hinder the democratization of privacy-enhancing distributed learning, such as federated learning (FL), to resource-constrained devices. To overcome this challenge, in this paper, we advocate the integration of the edge computing paradigm and parallel split learning (PSL), allowing multiple client devices to offload substantial training workloads to an edge server via layer-wise model split. By observing that existing PSL schemes incur excessive training latency and a large volume of data transmissions, we propose an innovative PSL framework, namely, efficient parallel split learning (EPSL), to accelerate model training. To be specific, EPSL parallelizes client-side model training and reduces the dimension of local gradients for back propagation (BP) via last-layer gradient aggregation, leading to a significant reduction in server-side training and communication latency. Moreover, by considering the heterogeneous channel conditions and computing capabilities at client devices, we jointly optimize subchannel allocation, power control, and cut layer selection to minimize the per-round latency. Simulation results show that the proposed EPSL framework significantly decreases the training latency needed to achieve a target accuracy compared with the state-of-the-art benchmarks, and the tailored resource management and layer split strategy can reduce latency considerably compared with the counterpart without optimization.

A Survey on Malware Detection with Graph Representation Learning

  • Authors: Tristan Bilot, Nour El Madhoun, Khaldoun Al Agha, Anis Zouaoui
  • Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16004
  • Pdf link: https://arxiv.org/pdf/2303.16004
  • Abstract
    Malware detection has become a major concern due to the increasing number and complexity of malware. Traditional detection methods based on signatures and heuristics are used for malware detection, but unfortunately, they suffer from poor generalization to unknown attacks and can be easily circumvented using obfuscation techniques. In recent years, Machine Learning (ML) and notably Deep Learning (DL) achieved impressive results in malware detection by learning useful representations from data and have become a solution preferred over traditional methods. More recently, the application of such techniques on graph-structured data has achieved state-of-the-art performance in various domains and demonstrates promising results in learning more robust representations from malware. Yet, no literature review focusing on graph-based deep learning for malware detection exists. In this survey, we provide an in-depth literature review to summarize and unify existing works under the common approaches and architectures. We notably demonstrate that Graph Neural Networks (GNNs) reach competitive results in learning robust embeddings from malware represented as expressive graph structures, leading to an efficient detection by downstream classifiers. This paper also reviews adversarial attacks that are utilized to fool graph-based detection methods. Challenges and future research directions are discussed at the end of the paper.

Understanding and Exploring the Whole Set of Good Sparse Generalized Additive Models

  • Authors: Zhi Chen, Chudi Zhong, Margo Seltzer, Cynthia Rudin
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2303.16047
  • Pdf link: https://arxiv.org/pdf/2303.16047
  • Abstract
    In real applications, interaction between machine learning models and domain experts is critical; however, the classical machine learning paradigm that usually produces only a single model does not facilitate such interaction. Approximating and exploring the Rashomon set, i.e., the set of all near-optimal models, addresses this practical challenge by providing the user with a searchable space containing a diverse set of models from which domain experts can choose. We present a technique to efficiently and accurately approximate the Rashomon set of sparse, generalized additive models (GAMs). We present algorithms to approximate the Rashomon set of GAMs with ellipsoids for fixed support sets and use these ellipsoids to approximate Rashomon sets for many different support sets. The approximated Rashomon set serves as a cornerstone for solving practical challenges such as (1) studying variable importance for the model class; (2) finding models under user-specified constraints (monotonicity, direct editing); and (3) investigating sudden changes in the shape functions. Experiments demonstrate the fidelity of the approximated Rashomon set and its effectiveness in solving practical challenges.

Simulation-based Inference for Model Parameterization on Analog Neuromorphic Hardware

  • Authors: Jakob Kaiser, Raphael Stock, Eric Müller, Johannes Schemmel, Sebastian Schmitt
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2303.16056
  • Pdf link: https://arxiv.org/pdf/2303.16056
  • Abstract
    The BrainScaleS-2 (BSS-2) system implements physical models of neurons as well as synapses and aims for an energy-efficient and fast emulation of biological neurons. When replicating neuroscientific experiment results, a major challenge is finding suitable model parameters. This study investigates the suitability of the sequential neural posterior estimation (SNPE) algorithm for parameterizing a multi-compartmental neuron model emulated on the BSS-2 analog neuromorphic hardware system. In contrast to other optimization methods such as genetic algorithms or stochastic searches, the SNPE algorithm belongs to the class of approximate Bayesian computation (ABC) methods and estimates the posterior distribution of the model parameters; access to the posterior allows classifying the confidence in parameter estimates and unveiling correlations between model parameters. In previous applications, the SNPE algorithm showed a higher computational efficiency than traditional ABC methods. For our multi-compartmental model, we show that the approximated posterior is in agreement with experimental observations and that the identified correlation between parameters is in agreement with theoretical expectations. Furthermore, we show that the algorithm can deal with high-dimensional observations and parameter spaces. These results suggest that the SNPE algorithm is a promising approach for automating the parameterization of complex models, especially when dealing with characteristic properties of analog neuromorphic substrates, such as trial-to-trial variations or limited parameter ranges.

Unmasked Teacher: Towards Training-Efficient Video Foundation Models

  • Authors: Kunchang Li, Yali Wang, Yizhuo Li, Yi Wang, Yinan He, Limin Wang, Yu Qiao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16058
  • Pdf link: https://arxiv.org/pdf/2303.16058
  • Abstract
    Video Foundation Models (VFMs) have received limited exploration due to high computational costs and data scarcity. Previous VFMs rely on Image Foundation Models (IFMs), which face challenges in transferring to the video domain. Although VideoMAE has trained a robust ViT from limited data, its low-level reconstruction poses convergence difficulties and conflicts with high-level cross-modal alignment. This paper proposes a training-efficient method for temporal-sensitive VFMs that integrates the benefits of existing methods. To increase data efficiency, we mask out most of the low-semantics video tokens, but selectively align the unmasked tokens with IFM, which serves as the UnMasked Teacher (UMT). By providing semantic guidance, our method enables faster convergence and multimodal friendliness. With a progressive pre-training framework, our model can handle various tasks including scene-related, temporal-related, and complex video-language understanding. Using only public sources for pre-training in 6 days on 32 A100 GPUs, our scratch-built ViT-L/16 achieves state-of-the-art performances on various video tasks. The code and models will be released at https://github.com/OpenGVLab/unmasked_teacher.

Efficient solutions to the relative pose of three calibrated cameras from four points using virtual correspondences

  • Authors: Charalambos Tzamos, Daniel Barath, Torsten Sattler, Zuzana Kukelova
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16078
  • Pdf link: https://arxiv.org/pdf/2303.16078
  • Abstract
    We study the challenging problem of estimating the relative pose of three calibrated cameras. We propose two novel solutions to the notoriously difficult configuration of four points in three views, known as the 4p3v problem. Our solutions are based on the simple idea of generating one additional virtual point correspondence in two views by using the information from the locations of the four input correspondences in the three views. For the first solver, we train a network to predict this point correspondence. The second solver uses a much simpler and more efficient strategy based on the mean points of three corresponding input points. The new solvers are efficient and easy to implement since they are based on the existing efficient minimal solvers, i.e., the well-known 5-point relative pose and the P3P solvers. The solvers achieve state-of-the-art results on real data. The idea of solving minimal problems using virtual correspondences is general and can be applied to other problems, e.g., the 5-point relative pose problem. In this way, minimal problems can be solved using simpler non-minimal solvers or even using sub-minimal samples inside RANSAC. In addition, we compare different variants of 4p3v solvers with the baseline solver for the minimal configuration consisting of three triplets of points and two points visible in two views. We discuss which configuration of points is potentially the most practical in real applications.
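
The second, mean-point strategy is simple enough to sketch directly (our reading of the abstract; the downstream 5-point solver call is a placeholder):

```python
# Synthesize a fifth "virtual" correspondence for a standard two-view
# 5-point solver as the mean of three of the four input points in each view.
import numpy as np

def make_virtual_correspondence(pts_view1, pts_view2):
    # pts_view*: (4, 2) arrays of matched image points in two of the views
    v1 = pts_view1[:3].mean(axis=0)          # mean point in the first view
    v2 = pts_view2[:3].mean(axis=0)          # mean of the corresponding points
    five_1 = np.vstack([pts_view1, v1])      # 5 correspondences: feed these
    five_2 = np.vstack([pts_view2, v2])      # to an existing 5-point solver
    return five_1, five_2
```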

Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures

  • Authors: Zirui Fu, Aleksandre Avaliani, Marco Donato
  • Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2303.16100
  • Pdf link: https://arxiv.org/pdf/2303.16100
  • Abstract
    Executing machine learning inference tasks on resource-constrained edge devices requires careful hardware-software co-design optimizations. Recent examples have shown how transformer-based deep neural network models such as ALBERT can be used to enable the execution of natural language processing (NLP) inference on mobile systems-on-chip housing custom hardware accelerators. However, while these existing solutions are effective in alleviating the latency, energy, and area costs of running single NLP tasks, achieving multi-task inference requires running computations over multiple variants of the model parameters, which are tailored to each of the targeted tasks. This approach leads either to prohibitive on-chip memory requirements or to the cost of off-chip memory accesses. This paper proposes adapter-ALBERT, an efficient model optimization for maximal data reuse across different tasks. The proposed model's performance and robustness to data compression methods are evaluated across several language tasks from the GLUE benchmark. Additionally, we demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator to extrapolate performance, power, and area improvements over the execution of a traditional ALBERT model on the same hardware platform.

Variational Distribution Learning for Unsupervised Text-to-Image Generation

  • Authors: Minsoo Kang, Doyup Lee, Jiseob Kim, Saehoon Kim, Bohyung Han
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16105
  • Pdf link: https://arxiv.org/pdf/2303.16105
  • Abstract
    We propose a text-to-image generation algorithm based on deep neural networks when text captions for images are unavailable during training. In this work, instead of simply generating pseudo-ground-truth sentences of training images using existing image captioning methods, we employ a pretrained CLIP model, which is capable of properly aligning embeddings of images and corresponding texts in a joint space and, consequently, works well on zero-shot recognition tasks. We optimize a text-to-image generation model by maximizing the data log-likelihood conditioned on pairs of image-text CLIP embeddings. To better align data in the two domains, we employ a principled way based on a variational inference, which efficiently estimates an approximate posterior of the hidden text embedding given an image and its CLIP feature. Experimental results validate that the proposed framework outperforms existing approaches by large margins under unsupervised and semi-supervised text-to-image generation settings.

Multimodal Manoeuvre and Trajectory Prediction for Autonomous Vehicles Using Transformer Networks

  • Authors: Sajjad Mozaffari, Konstantinos Koufos, Mehrdad Dianati
  • Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.16109
  • Pdf link: https://arxiv.org/pdf/2303.16109
  • Abstract
    Predicting the behaviour (i.e. manoeuvre/trajectory) of other road users, including vehicles, is critical for the safe and efficient operation of autonomous vehicles (AVs), a.k.a. automated driving systems (ADSs). Due to the uncertain future behaviour of vehicles, multiple future behaviour modes are often plausible for a vehicle in a given driving scene. Therefore, multimodal prediction can provide richer information than single-mode prediction enabling AVs to perform a better risk assessment. To this end, we propose a novel multimodal prediction framework that can predict multiple plausible behaviour modes and their likelihoods. The proposed framework includes a bespoke problem formulation for manoeuvre prediction, a novel transformer-based prediction model, and a tailored training method for multimodal manoeuvre and trajectory prediction. The performance of the framework is evaluated using two public benchmark highway driving datasets, namely NGSIM and highD. The results show that the proposed framework outperforms the state-of-the-art multimodal methods in the literature in terms of prediction error and is capable of predicting plausible manoeuvre and trajectory modes.

DefGraspNets: Grasp Planning on 3D Fields with Graph Neural Nets

  • Authors: Isabella Huang, Yashraj Narang, Ruzena Bajcsy, Fabio Ramos, Tucker Hermans, Dieter Fox
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.16138
  • Pdf link: https://arxiv.org/pdf/2303.16138
  • Abstract
    Robotic grasping of 3D deformable objects is critical for real-world applications such as food handling and robotic surgery. Unlike rigid and articulated objects, 3D deformable objects have infinite degrees of freedom. Fully defining their state requires 3D deformation and stress fields, which are exceptionally difficult to analytically compute or experimentally measure. Thus, evaluating grasp candidates for grasp planning typically requires accurate, but slow 3D finite element method (FEM) simulation. Sampling-based grasp planning is often impractical, as it requires evaluation of a large number of grasp candidates. Gradient-based grasp planning can be more efficient, but requires a differentiable model to synthesize optimal grasps from initial candidates. Differentiable FEM simulators may fill this role, but are typically no faster than standard FEM. In this work, we propose learning a predictive graph neural network (GNN), DefGraspNets, to act as our differentiable model. We train DefGraspNets to predict 3D stress and deformation fields based on FEM-based grasp simulations. DefGraspNets not only runs up to 1500 times faster than the FEM simulator, but also enables fast gradient-based grasp optimization over 3D stress and deformation metrics. We design DefGraspNets to align with real-world grasp planning practices and demonstrate generalization across multiple test sets, including real-world experiments.

Dias: Dynamic Rewriting of Pandas Code

  • Authors: Stefanos Baziotis, Daniel Kang, Charith Mendis
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2303.16146
  • Pdf link: https://arxiv.org/pdf/2303.16146
  • Abstract
    In recent years, dataframe libraries such as pandas have exploded in popularity. Due to their flexibility, they are increasingly used in ad-hoc exploratory data analysis (EDA) workloads. These workloads are diverse, including custom functions which can span libraries or be written in pure Python. The majority of systems available to accelerate EDA workloads focus on bulk-parallel workloads, which contain vastly different computational patterns, typically within a single library. As a result, they can introduce excessive overheads for ad-hoc EDA workloads due to their expensive optimization techniques. Instead, we identify program rewriting as a lightweight technique which can offer substantial speedups while also avoiding slowdowns. We implemented our techniques in Dias, which rewrites notebook cells to be more efficient for ad-hoc EDA workloads. We develop techniques for efficient rewrites in Dias, including dynamic checking of preconditions under which rewrites are correct and just-in-time rewrites for notebook environments. We show that Dias can rewrite individual cells to be 57× faster compared to pandas and 1909× faster compared to optimized systems such as modin. Furthermore, Dias can accelerate whole notebooks by up to 3.6× compared to pandas and 26.4× compared to modin.
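
To illustrate the kind of cell-level rewrite such a system targets (our own example, not one of Dias's documented rules): an `apply` over rows can be replaced by an equivalent vectorized expression whenever a runtime precondition, here a numeric column, holds.

```python
import pandas as pd

df = pd.DataFrame({"x": range(100_000)})

# Original, slow cell: one Python function call per row.
slow = df["x"].apply(lambda v: v * 2 + 1)

# Rewritten, fast cell: a single vectorized expression with the same result.
fast = df["x"] * 2 + 1

assert slow.equals(fast)
```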

What Writing Assistants Can Learn from Programming IDEs

  • Authors: Sergey Titov, Agnia Sergeyuk, Timofey Bryksin
  • Subjects: Software Engineering (cs.SE); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2303.16175
  • Pdf link: https://arxiv.org/pdf/2303.16175
  • Abstract
    With the development of artificial intelligence, writing assistants (WAs) are changing the way people interact with text, creating lengthy outputs that can be overwhelming for users. The programming field has long addressed this issue, and Integrated Development Environments (IDEs) have been created for efficient software development, helping programmers reduce the cognitive load. This experience could be employed in the development of WAs. IDEs can also be used to test assumptions about interventions that help people interact with WAs efficiently. Previous works have successfully used self-written IDE plugins to test hypotheses in the field of human-computer interaction. The lessons learned can be applied to the building of WAs.

Learning Federated Visual Prompt in Null Space for MRI Reconstruction

  • Authors: Chun-Mei Feng, Bangjun Li, Xinxing Xu, Yong Liu, Huazhu Fu, Wangmeng Zuo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16181
  • Pdf link: https://arxiv.org/pdf/2303.16181
  • Abstract
    Federated Magnetic Resonance Imaging (MRI) reconstruction enables multiple hospitals to collaborate distributedly without aggregating local data, thereby protecting patient privacy. However, the data heterogeneity caused by different MRI protocols, insufficient local training data, and limited communication bandwidth inevitably impair global model convergence and updating. In this paper, we propose a new algorithm, FedPR, to learn federated visual prompts in the null space of global prompt for MRI reconstruction. FedPR is a new federated paradigm that adopts a powerful pre-trained model while only learning and communicating the prompts with few learnable parameters, thereby significantly reducing communication costs and achieving competitive performance on limited local data. Moreover, to deal with catastrophic forgetting caused by data heterogeneity, FedPR also updates efficient federated visual prompts that project the local prompts into an approximate null space of the global prompt, thereby suppressing the interference of gradients on the server performance. Extensive experiments on federated MRI show that FedPR significantly outperforms state-of-the-art FL algorithms with <6% of communication costs when given the limited amount of local training data.

VMesh: Hybrid Volume-Mesh Representation for Efficient View Synthesis

  • Authors: Yuan-Chen Guo, Yan-Pei Cao, Chen Wang, Yu He, Ying Shan, Xiaohu Qie, Song-Hai Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2303.16184
  • Pdf link: https://arxiv.org/pdf/2303.16184
  • Abstract
    With the emergence of neural radiance fields (NeRFs), view synthesis quality has reached an unprecedented level. Compared to traditional mesh-based assets, this volumetric representation is more powerful in expressing scene geometry but inevitably suffers from high rendering costs and can hardly be involved in further processes like editing, posing significant difficulties in combination with the existing graphics pipeline. In this paper, we present a hybrid volume-mesh representation, VMesh, which depicts an object with a textured mesh along with an auxiliary sparse volume. VMesh retains the advantages of mesh-based assets, such as efficient rendering, compact storage, and easy editing, while also incorporating the ability to represent subtle geometric structures provided by the volumetric counterpart. VMesh can be obtained from multi-view images of an object and renders at 2K 60FPS on common consumer devices with high fidelity, unleashing new opportunities for real-time immersive applications.

Large-scale Training Data Search for Object Re-identification

  • Authors: Yue Yao, Huan Lei, Tom Gedeon, Liang Zheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.16186
  • Pdf link: https://arxiv.org/pdf/2303.16186
  • Abstract
    We consider a scenario where we have access to the target domain, but cannot afford on-the-fly training data annotation, and instead would like to construct an alternative training set from a large-scale data pool such that a competitive model can be obtained. We propose a search and pruning (SnP) solution to this training data search problem, tailored to object re-identification (re-ID), an application aiming to match the same object captured by different cameras. Specifically, the search stage identifies and merges clusters of source identities which exhibit similar distributions with the target domain. The second stage, subject to a budget, then selects identities and their images from the Stage I output, to control the size of the resulting training set for efficient training. The two steps provide us with training sets 80% smaller than the source pool while achieving a similar or even higher re-ID accuracy. These training sets are also shown to be superior to a few existing search methods such as random sampling and greedy sampling under the same budget on training data size. If we release the budget, training sets resulting from the first stage alone allow even higher re-ID accuracy. We provide interesting discussions on the specificity of our method to the re-ID problem and particularly its role in bridging the re-ID domain gap. The code is available at https://github.com/yorkeyao/SnP.

Hard Nominal Example-aware Template Mutual Matching for Industrial Anomaly Detection

  • Authors: Zixuan Chen, Jianhuang Lai, Lingxiao Yang, Xiaohua Xie
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16191
  • Pdf link: https://arxiv.org/pdf/2303.16191
  • Abstract
    Anomaly detectors are widely used in industrial production to detect and localize unknown defects in query images. These detectors are trained on nominal images and have shown success in distinguishing anomalies from most normal samples. However, hard-nominal examples are scattered and far apart from most normalities, and they are often mistaken for anomalies by existing anomaly detectors. To address this problem, we propose a simple yet efficient method: Hard Nominal Example-aware Template Mutual Matching (HETMM). Specifically, HETMM aims to construct a robust prototype-based decision boundary, which can precisely distinguish between hard-nominal examples and anomalies, yielding lower false-positive and missed-detection rates. Moreover, HETMM mutually explores the anomalies in two directions between queries and the template set, and is thus capable of capturing logical anomalies. This is a significant advantage over most anomaly detectors, which frequently fail to detect logical anomalies. Additionally, to meet speed-accuracy demands, we further propose Pixel-level Template Selection (PTS) to streamline the original template set. PTS selects cluster centres and hard-nominal examples to form a tiny set while maintaining the original decision boundaries. Comprehensive experiments on five real-world datasets demonstrate that our methods outperform existing advances at real-time inference speed. Furthermore, HETMM can be hot-updated by inserting novel samples, which may promptly address some incremental learning issues.
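
A rough sketch of the template-selection idea as we read it from the abstract (cluster centres plus the nominal samples farthest from any centre); all names and parameters are illustrative, not the authors' implementation.

```python
# Compact template set = cluster centres + "hard" nominals far from centres.
import numpy as np
from sklearn.cluster import KMeans

def select_templates(features, n_clusters=10, n_hard=20):
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(features)
    centres = km.cluster_centers_
    dist = np.linalg.norm(features - centres[km.labels_], axis=1)
    hard = features[np.argsort(dist)[-n_hard:]]   # far-from-centre nominals
    return np.vstack([centres, hard])             # tiny template set
```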

When to be critical? Performance and evolvability in different regimes of neural Ising agents

  • Authors: Sina Khajehabdollahi, Jan Prosi, Georg Martius, Anna Levina
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2303.16195
  • Pdf link: https://arxiv.org/pdf/2303.16195
  • Abstract
    It has long been hypothesized that operating close to the critical state is beneficial for natural and artificial systems and their evolution. We put this hypothesis to the test in a system of evolving foraging agents controlled by neural networks that can adapt the agents' dynamical regime throughout evolution. Surprisingly, we find that all populations that discover solutions evolve to be subcritical. By a resilience analysis, we find that there are still benefits to starting the evolution in the critical regime. Namely, initially critical agents maintain their fitness level under environmental changes (for example, in the lifespan) and degrade gracefully when their genome is perturbed. At the same time, initially subcritical agents, even when evolved to the same fitness, are often inadequate to withstand changes in the lifespan and degrade catastrophically under genetic perturbations. Furthermore, we find that the optimal distance to criticality depends on the task complexity. To test this, we introduce a hard and a simple task: for the hard task, agents evolve closer to criticality, whereas more subcritical solutions are found for the simple task. We verify that our results are independent of the selected evolutionary mechanisms by testing them on two principally different approaches: a genetic algorithm and an evolutionary strategy. In summary, our study suggests that although optimal behaviour in the simple task is obtained in a subcritical regime, initializing near criticality is important for efficiently finding optimal solutions on new tasks of unknown complexity.

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

  • Authors: Renrui Zhang, Jiaming Han, Aojun Zhou, Xiangfei Hu, Shilin Yan, Pan Lu, Hongsheng Li, Peng Gao, Yu Qiao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2303.16199
  • Pdf link: https://arxiv.org/pdf/2303.16199
  • Abstract
    We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter introduces only 1.2M learnable parameters on top of the frozen LLaMA 7B model, and costs less than one hour of fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts and prepend them to the input text tokens at higher transformer layers. Then, a zero-init attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA while effectively preserving its pre-trained knowledge. With efficient training, LLaMA-Adapter generates high-quality responses, comparable to Alpaca with its fully fine-tuned 7B parameters. Furthermore, our approach can be simply extended to multi-modal input, e.g., images, for image-conditioned LLaMA, which achieves superior reasoning capacity on ScienceQA. We release our code at https://github.com/ZrrSkywalker/LLaMA-Adapter.
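
A simplified sketch of the zero-gating idea (not the paper's exact mechanism, which inserts the prompts into the frozen model's own attention at higher layers): the adaption path is scaled by a gate initialized at zero, so training starts exactly from the frozen model's behaviour.

```python
import torch
import torch.nn as nn

class ZeroInitPromptAttention(nn.Module):
    def __init__(self, dim, n_prompts=10, n_heads=8):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        self.gate = nn.Parameter(torch.zeros(1))     # zero-init gating factor
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x):                            # x: (B, T, dim) features
        p = self.prompts.unsqueeze(0).expand(x.size(0), -1, -1)
        adapt, _ = self.attn(x, p, p)                # tokens attend to prompts
        return x + torch.tanh(self.gate) * adapt     # identity at initialization
```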

Keyword: faster

A Heterogeneous Parallel Non-von Neumann Architecture System for Accurate and Efficient Machine Learning Molecular Dynamics

  • Authors: Zhuoying Zhao, Ziling Tan, Pinghui Mo, Xiaonan Wang, Dan Zhao, Xin Zhang, Ming Tao, Jie Liu
  • Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.15474
  • Pdf link: https://arxiv.org/pdf/2303.15474
  • Abstract
    This paper proposes a special-purpose system to achieve high-accuracy and high-efficiency machine learning (ML) molecular dynamics (MD) calculations. The system consists of a field programmable gate array (FPGA) and an application specific integrated circuit (ASIC) working in heterogeneous parallelization. To be specific, a multiplication-less neural network (NN) is deployed on the non-von Neumann (NvN)-based ASIC (SilTerra 180 nm process) to evaluate atomic forces, which is the most computationally expensive part of MD. All other calculations of MD are done using the FPGA (Xilinx XC7Z100). It is shown that, to achieve similar-level accuracy, the proposed NvN-based system based on low-end fabrication technologies (180 nm) is 1.6x faster and 10^2-10^3x more energy efficient than state-of-the-art vN-based MLMD using graphics processing units (GPUs) based on much more advanced technologies (12 nm), indicating the superiority of the proposed NvN-based heterogeneous parallel architecture.

Switched Moving Boundary Modeling of Phase Change Thermal Energy Storage Systems

  • Authors: Trent J. Sakakini, Justin P. Koeln
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.15687
  • Pdf link: https://arxiv.org/pdf/2303.15687
  • Abstract
    Thermal Energy Storage (TES) devices, which leverage the constant-temperature thermal capacity of the latent heat of a Phase Change Material (PCM), provide benefits to a variety of thermal management systems by decoupling the absorption and rejection of thermal energy. While a TES performs a role similar to a battery in an electrical system, it is critical to know when to charge (freeze) and discharge (melt) the TES to maximize the capabilities and efficiency of the overall system. Therefore, control-oriented models of TES are needed to predict the behavior of the TES and make informed control decisions. While existing modeling approaches divide the TES into multiple sections using a Fixed Grid (FG) approach, this paper proposes a switched Moving Boundary (MB) model that captures the key dynamics of the TES with significantly fewer dynamic states. Specifically, a graph-based modeling approach is used to model the heat flow through the TES, and a MB approach is used to model the time-varying liquid and solid regions of the TES. Additionally, a Finite State Machine (FSM) is used to switch between four different modes of operation based on the State-of-Charge (SOC) of the TES. Numerical simulations comparing the proposed approach with a more traditional FG approach show that the MB model is capable of accurately modeling the behavior of the FG model while using far fewer states, leading to five times faster simulations.

Concentration of Contractive Stochastic Approximation: Additive and Multiplicative Noise

  • Authors: Zaiwei Chen, Siva Theja Maguluri, Martin Zubeldia
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2303.15740
  • Pdf link: https://arxiv.org/pdf/2303.15740
  • Abstract
    In this work, we study the concentration behavior of a stochastic approximation (SA) algorithm under a contractive operator with respect to an arbitrary norm. We consider two settings where the iterates are potentially unbounded: (1) bounded multiplicative noise, and (2) additive sub-Gaussian noise. We obtain maximal concentration inequalities on the convergence errors, and show that these errors have sub-Gaussian tails in the additive noise setting, and super-polynomial tails (faster than polynomial decay) in the multiplicative noise setting. In addition, we provide an impossibility result showing that it is in general not possible to achieve sub-exponential tails for SA with multiplicative noise. To establish these results, we develop a novel bootstrapping argument that involves bounding the moment generating function of the generalized Moreau envelope of the error and the construction of an exponential supermartingale to enable using Ville's maximal inequality. To demonstrate the applicability of our theoretical results, we use them to provide maximal concentration bounds for a large class of reinforcement learning algorithms, including but not limited to on-policy TD-learning with linear function approximation, off-policy TD-learning with generalized importance sampling factors, and $Q$-learning. To the best of our knowledge, super-polynomial concentration bounds for off-policy TD-learning have not been established in the literature due to the challenge of handling the combination of unbounded iterates and multiplicative noise.
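
For concreteness, the contractive SA recursion in question can be written (our notation, based on the abstract) as:

```latex
\[
  x_{k+1} = x_k + \alpha_k \big( H(x_k) - x_k + w_k \big),
\]
```

where H is a contraction with respect to an arbitrary norm with fixed point x*, and w_k is either additive sub-Gaussian noise or bounded multiplicative noise (||w_k|| <= c(1 + ||x_k||)); the paper bounds the tails of ||x_k - x*|| in each of the two regimes.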

Learning Second-Order Attentive Context for Efficient Correspondence Pruning

  • Authors: Xinyi Ye, Weiyue Zhao, Hao Lu, Zhiguo Cao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15761
  • Pdf link: https://arxiv.org/pdf/2303.15761
  • Abstract
    Correspondence pruning aims to search for consistent correspondences (inliers) in a set of putative correspondences. It is challenging because of the disorganized spatial distribution of numerous outliers, especially when putative correspondences are largely dominated by outliers; it is more challenging still to ensure effectiveness while maintaining efficiency. In this paper, we propose an effective and efficient method for correspondence pruning. Inspired by the success of attentive context in correspondence problems, we first extend the attentive context to the first-order attentive context and then introduce the idea of attention in attention (ANA) to model second-order attentive context for correspondence pruning. Compared with first-order attention, which focuses on feature-consistent context, second-order attention attends to the attention weights themselves and provides an additional source for encoding consistent context from the attention map. For efficiency, we derive two approximate formulations for the naive implementation of second-order attention to optimize the cubic complexity to linear complexity, such that second-order attention can be used with negligible computational overheads. We further implement our formulations in a second-order context layer and then incorporate the layer in an ANA block. Extensive experiments demonstrate that our method is effective and efficient in pruning outliers, especially in high-outlier-ratio cases. Compared with the state-of-the-art correspondence pruning approach LMCNet, our method runs 14 times faster while maintaining a competitive accuracy.

A Generalized Ray Formulation For Wave-Optics Rendering

  • Authors: Shlomi Steinberg, Ravi Ramamoorthi, Benedikt Bitterli, Eugene d'Eon, Ling-Qi Yan, Matt Pharr
  • Subjects: Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2303.15762
  • Pdf link: https://arxiv.org/pdf/2303.15762
  • Abstract
    Under ray-optical light transport, the classical ray serves as a local and linear "point query" of light's behaviour. Such point queries are useful, and sophisticated path tracing and sampling techniques enable efficiently computing solutions to light transport problems in complex, real-world settings and environments. However, such formulations are firmly confined to the realm of ray optics, while many applications of interest, in computer graphics and computational optics, demand a more precise understanding of light. We rigorously formulate the generalized ray, which enables local and linear point queries of the wave-optical phase space. Furthermore, we present sample-solve: a simple method that serves as a novel link between path tracing and computational optics. We will show that this link enables the application of modern path tracing techniques for wave-optical rendering, improving upon the state-of-the-art in terms of the generality and accuracy of the formalism, ease of application, as well as performance. Sampling using generalized rays enables interactive rendering under rigorous wave optics, with orders-of-magnitude faster performance compared to existing techniques.

X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance

  • Authors: Yiwei Ma, Xiaioqing Zhang, Xiaoshuai Sun, Jiayi Ji, Haowei Wang, Guannan Jiang, Weilin Zhuang, Rongrong Ji
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15764
  • Pdf link: https://arxiv.org/pdf/2303.15764
  • Abstract
    Text-driven 3D stylization is a complex and crucial task in the fields of computer vision (CV) and computer graphics (CG), aimed at transforming a bare mesh to fit a target text. Prior methods adopt text-independent multilayer perceptrons (MLPs) to predict the attributes of the target mesh with the supervision of CLIP loss. However, such text-independent architecture lacks textual guidance during predicting attributes, thus leading to unsatisfactory stylization and slow convergence. To address these limitations, we present X-Mesh, an innovative text-driven 3D stylization framework that incorporates a novel Text-guided Dynamic Attention Module (TDAM). The TDAM dynamically integrates the guidance of the target text by utilizing text-relevant spatial and channel-wise attentions during vertex feature extraction, resulting in more accurate attribute prediction and faster convergence speed. Furthermore, existing works lack standard benchmarks and automated metrics for evaluation, often relying on subjective and non-reproducible user studies to assess the quality of stylized 3D assets. To overcome this limitation, we introduce a new standard text-mesh benchmark, namely MIT-30, and two automated metrics, which will enable future research to achieve fair and objective comparisons. Our extensive qualitative and quantitative experiments demonstrate that X-Mesh outperforms previous state-of-the-art methods.

Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition

  • Authors: Xiao Yang, Chang Liu, Longlong Xu, Yikai Wang, Yinpeng Dong, Ning Chen, Hang Su, Jun Zhu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.15818
  • Pdf link: https://arxiv.org/pdf/2303.15818
  • Abstract
    Face recognition is a prevailing authentication solution in numerous biometric applications. Physical adversarial attacks, as an important surrogate, can identify the weaknesses of face recognition systems and evaluate their robustness before deployed. However, most existing physical attacks are either detectable readily or ineffective against commercial recognition systems. The goal of this work is to develop a more reliable technique that can carry out an end-to-end evaluation of adversarial robustness for commercial systems. It requires that this technique can simultaneously deceive black-box recognition models and evade defensive mechanisms. To fulfill this, we design adversarial textured 3D meshes (AT3D) with an elaborate topology on a human face, which can be 3D-printed and pasted on the attacker's face to evade the defenses. However, the mesh-based optimization regime calculates gradients in high-dimensional mesh space, and can be trapped into local optima with unsatisfactory transferability. To deviate from the mesh-based space, we propose to perturb the low-dimensional coefficient space based on 3D Morphable Model, which significantly improves black-box transferability meanwhile enjoying faster search efficiency and better visual quality. Extensive experiments in digital and physical scenarios show that our method effectively explores the security vulnerabilities of multiple popular commercial services, including three recognition APIs, four anti-spoofing APIs, two prevailing mobile phones and two automated access control systems.

Clustered Federated Learning Architecture for Network Anomaly Detection in Large Scale Heterogeneous IoT Networks

  • Authors: Xabier Sáez-de-Cámara, Jose Luis Flores, Cristóbal Arellano, Aitor Urbieta, Urko Zurutuza
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2303.15986
  • Pdf link: https://arxiv.org/pdf/2303.15986
  • Abstract
    There is a growing trend of cyberattacks against Internet of Things (IoT) devices; moreover, the sophistication and motivation of those attacks is increasing. The vast scale of IoT, diverse hardware and software, and being typically placed in uncontrolled environments make traditional IT security mechanisms such as signature-based intrusion detection and prevention systems challenging to integrate. They also struggle to cope with the rapidly evolving IoT threat landscape due to long delays between the analysis and publication of the detection rules. Machine learning methods have shown faster response to emerging threats; however, model training architectures like cloud or edge computing face multiple drawbacks in IoT settings, including network overhead and data isolation arising from the large scale and heterogeneity that characterizes these networks. This work presents an architecture for training unsupervised models for network intrusion detection in large, distributed IoT and Industrial IoT (IIoT) deployments. We leverage Federated Learning (FL) to collaboratively train between peers and reduce isolation and network overhead problems. We build upon it to include an unsupervised device clustering algorithm fully integrated into the FL pipeline to address the heterogeneity issues that arise in FL settings. The architecture is implemented and evaluated using a testbed that includes various emulated IoT/IIoT devices and attackers interacting in a complex network topology comprising 100 emulated devices, 30 switches and 10 routers. The anomaly detection models are evaluated on real attacks performed by the testbed's threat actors, including the entire Mirai malware lifecycle, an additional botnet based on the Merlin command and control server and other red-teaming tools performing scanning activities and multiple attacks targeting the emulated devices.

Faster Deterministic Distributed MIS and Approximate Matching

  • Authors: Mohsen Ghaffari, Christoph Grunau
  • Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.16043
  • Pdf link: https://arxiv.org/pdf/2303.16043
  • Abstract
    We present an $\widetilde{O}(\log^2 n)$ round deterministic distributed algorithm for the maximal independent set problem. By known reductions, this round complexity extends also to maximal matching, $\Delta+1$ vertex coloring, and $2\Delta-1$ edge coloring. These four problems are among the most central problems in distributed graph algorithms and have been studied extensively for the past four decades. This improved round complexity comes closer to the $\widetilde{\Omega}(\log n)$ lower bound of maximal independent set and maximal matching [Balliu et al. FOCS '19]. The previous best known deterministic complexity for all of these problems was $\Theta(\log^3 n)$. Via the shattering technique, the improvement permeates also to the corresponding randomized complexities, e.g., the new randomized complexity of $\Delta+1$ vertex coloring is now $\widetilde{O}(\log^2\log n)$ rounds. Our approach is a novel combination of the two previously known methods for developing deterministic algorithms for these problems, namely global derandomization via network decomposition (see e.g., [Rozhon, Ghaffari STOC'20; Ghaffari, Grunau, Rozhon SODA'21; Ghaffari et al. SODA'23]) and local rounding of fractional solutions (see e.g., [Fischer DISC'17; Harris FOCS'19; Fischer, Ghaffari, Kuhn FOCS'17; Ghaffari, Kuhn FOCS'21; Faour et al. SODA'23]). We consider a relaxation of the classic network decomposition concept, where instead of requiring the clusters in the same block to be non-adjacent, we allow each node to have a small number of neighboring clusters. We also show a deterministic algorithm that computes this relaxed decomposition faster than standard decompositions. We then use this relaxed decomposition to significantly improve the integrality of certain fractional solutions, before handing them to the local rounding procedure that now has to do fewer rounding steps.

Unmasked Teacher: Towards Training-Efficient Video Foundation Models

  • Authors: Kunchang Li, Yali Wang, Yizhuo Li, Yi Wang, Yinan He, Limin Wang, Yu Qiao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16058
  • Pdf link: https://arxiv.org/pdf/2303.16058
  • Abstract
    Video Foundation Models (VFMs) have received limited exploration due to high computational costs and data scarcity. Previous VFMs rely on Image Foundation Models (IFMs), which face challenges in transferring to the video domain. Although VideoMAE has trained a robust ViT from limited data, its low-level reconstruction poses convergence difficulties and conflicts with high-level cross-modal alignment. This paper proposes a training-efficient method for temporal-sensitive VFMs that integrates the benefits of existing methods. To increase data efficiency, we mask out most of the low-semantics video tokens, but selectively align the unmasked tokens with IFM, which serves as the UnMasked Teacher (UMT). By providing semantic guidance, our method enables faster convergence and multimodal friendliness. With a progressive pre-training framework, our model can handle various tasks including scene-related, temporal-related, and complex video-language understanding. Using only public sources for pre-training in 6 days on 32 A100 GPUs, our scratch-built ViT-L/16 achieves state-of-the-art performances on various video tasks. The code and models will be released at https://github.com/OpenGVLab/unmasked_teacher.

Neural Collapse Inspired Federated Learning with Non-iid Data

  • Authors: Chenxi Huang, Liang Xie, Yibo Yang, Wenxiao Wang, Binbin Lin, Deng Cai
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16066
  • Pdf link: https://arxiv.org/pdf/2303.16066
  • Abstract
    One of the challenges in federated learning is the non-independent and identically distributed (non-iid) characteristics between heterogeneous devices, which cause significant differences in local updates and affect the performance of the central server. Although many studies have been proposed to address this challenge, they only focus on local training and aggregation processes to smooth the changes and fail to achieve high performance with deep learning models. Inspired by the phenomenon of neural collapse, we force each client to be optimized toward an optimal global structure for classification. Specifically, we initialize it as a random simplex Equiangular Tight Frame (ETF) and fix it as the unit optimization target of all clients during the local updating. After guaranteeing all clients are learning to converge to the global optimum, we propose to add a global memory vector for each category to remedy the parameter fluctuation caused by the bias of the intra-class conditional distribution among clients. Our experimental results show that our method can improve performance with faster convergence on datasets of different sizes.
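
For reference, the simplex Equiangular Tight Frame used as the fixed classifier target has a standard closed form in the neural-collapse literature; here $C$ is the number of classes and $U$ any matrix with orthonormal columns (notation assumed, possibly differing from the paper's):

```latex
M = \sqrt{\tfrac{C}{C-1}}\; U \left( I_C - \tfrac{1}{C}\, \mathbf{1}_C \mathbf{1}_C^{\top} \right),
\qquad U^{\top} U = I_C
```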

Lazy learning: a biologically-inspired plasticity rule for fast and energy efficient synaptic plasticity

  • Authors: Aaron Pache, Mark CW van Rossum
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
  • Arxiv link: https://arxiv.org/abs/2303.16067
  • Pdf link: https://arxiv.org/pdf/2303.16067
  • Abstract
    When training neural networks for classification tasks with backpropagation, parameters are updated on every trial, even if the sample is classified correctly. In contrast, humans concentrate their learning effort on errors. Inspired by human learning, we introduce lazy learning, which only learns on incorrect samples. Lazy learning can be implemented in a few lines of code and requires no hyperparameter tuning. Lazy learning achieves state-of-the-art performance and is particularly suited when datasets are large. For instance, it reaches 99.2% test accuracy on Extended MNIST using a single-layer MLP, and does so 7.6x faster than a matched backprop network.
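
A minimal sketch of the rule as stated (update only on misclassified samples); the training-loop names below are placeholders, not the authors' code:

```python
import torch

def lazy_learning_step(model, optimizer, loss_fn, x, y) -> float:
    """One training step that backpropagates only through errors."""
    logits = model(x)
    wrong = logits.argmax(dim=1) != y        # mask of misclassified samples
    if not wrong.any():                      # whole batch correct: skip update
        return 0.0
    loss = loss_fn(logits[wrong], y[wrong])  # loss restricted to the errors
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.item())
```

Note that the forward pass still covers the full batch; in this sketch the savings come from the smaller backward pass and from batches that are skipped entirely.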

DefGraspNets: Grasp Planning on 3D Fields with Graph Neural Nets

  • Authors: Isabella Huang, Yashraj Narang, Ruzena Bajcsy, Fabio Ramos, Tucker Hermans, Dieter Fox
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.16138
  • Pdf link: https://arxiv.org/pdf/2303.16138
  • Abstract
    Robotic grasping of 3D deformable objects is critical for real-world applications such as food handling and robotic surgery. Unlike rigid and articulated objects, 3D deformable objects have infinite degrees of freedom. Fully defining their state requires 3D deformation and stress fields, which are exceptionally difficult to analytically compute or experimentally measure. Thus, evaluating grasp candidates for grasp planning typically requires accurate, but slow 3D finite element method (FEM) simulation. Sampling-based grasp planning is often impractical, as it requires evaluation of a large number of grasp candidates. Gradient-based grasp planning can be more efficient, but requires a differentiable model to synthesize optimal grasps from initial candidates. Differentiable FEM simulators may fill this role, but are typically no faster than standard FEM. In this work, we propose learning a predictive graph neural network (GNN), DefGraspNets, to act as our differentiable model. We train DefGraspNets to predict 3D stress and deformation fields based on FEM-based grasp simulations. DefGraspNets not only runs up to 1500 times faster than the FEM simulator, but also enables fast gradient-based grasp optimization over 3D stress and deformation metrics. We design DefGraspNets to align with real-world grasp planning practices and demonstrate generalization across multiple test sets, including real-world experiments.

Dias: Dynamic Rewriting of Pandas Code

  • Authors: Stefanos Baziotis, Daniel Kang, Charith Mendis
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2303.16146
  • Pdf link: https://arxiv.org/pdf/2303.16146
  • Abstract
    In recent years, dataframe libraries, such as pandas, have exploded in popularity. Due to their flexibility, they are increasingly used in ad-hoc exploratory data analysis (EDA) workloads. These workloads are diverse, including custom functions which can span libraries or be written in pure Python. The majority of systems available to accelerate EDA workloads focus on bulk-parallel workloads, which contain vastly different computational patterns, typically within a single library. As a result, they can introduce excessive overheads for ad-hoc EDA workloads due to their expensive optimization techniques. Instead, we identify program rewriting as a lightweight technique which can offer substantial speedups while also avoiding slowdowns. We implemented our techniques in Dias, which rewrites notebook cells to be more efficient for ad-hoc EDA workloads. We develop techniques for efficient rewrites in Dias, including dynamic checking of preconditions under which rewrites are correct and just-in-time rewrites for notebook environments. We show that Dias can rewrite individual cells to be 57$\times$ faster compared to pandas and 1909$\times$ faster compared to optimized systems such as modin. Furthermore, Dias can accelerate whole notebooks by up to 3.6$\times$ compared to pandas and 26.4$\times$ compared to modin.
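
To make "program rewriting" concrete, below is the flavor of source-level rewrite such a system applies to a notebook cell; the specific rule is an illustrative example, not necessarily one of Dias's actual rewrite rules:

```python
import pandas as pd

df = pd.DataFrame({"first": ["Ada", "Alan"], "last": ["Lovelace", "Turing"]})

# Original cell: a row-wise apply, convenient to write but slow at scale.
full = df.apply(lambda r: r["first"] + " " + r["last"], axis=1)

# Rewritten cell: the same computation as a vectorized column expression,
# the kind of semantics-preserving transformation a rewriter can emit after
# dynamically checking its preconditions (e.g., column dtypes).
full_fast = df["first"] + " " + df["last"]

assert full.equals(full_fast)
```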

Keyword: mobile

Beyond Accuracy: A Critical Review of Fairness in Machine Learning for Mobile and Wearable Computing

  • Authors: Sofia Yfantidou, Marios Constantinides, Dimitris Spathis, Athena Vakali, Daniele Quercia, Fahim Kawsar
  • Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.15585
  • Pdf link: https://arxiv.org/pdf/2303.15585
  • Abstract
    The field of mobile, wearable, and ubiquitous computing (UbiComp) is undergoing a revolutionary integration of machine learning. Devices can now diagnose diseases, predict heart irregularities, and unlock the full potential of human cognition. However, the underlying algorithms are not immune to biases with respect to sensitive attributes (e.g., gender, race), leading to discriminatory outcomes. The research communities of HCI and AI-Ethics have recently started to explore ways of reporting information about datasets to surface and, eventually, counter those biases. The goal of this work is to explore the extent to which the UbiComp community has adopted such ways of reporting and highlight potential shortcomings. Through a systematic review of papers published in the Proceedings of the ACM Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) journal over the past 5 years (2018-2022), we found that progress on algorithmic fairness within the UbiComp community lags behind. Our findings show that only a small portion (5%) of published papers adheres to modern fairness reporting, while the overwhelming majority thereof focuses on accuracy or error metrics. In light of these findings, our work provides practical guidelines for the design and development of ubiquitous technologies that not only strive for accuracy but also for fairness.

Overcoming Probabilistic Faults in Disoriented Linear Search

  • Authors: Konstantinos Georgiou, Nikos Giachoudis, Evangelos Kranakis
  • Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
  • Arxiv link: https://arxiv.org/abs/2303.15608
  • Pdf link: https://arxiv.org/pdf/2303.15608
  • Abstract
    We consider search by mobile agents for a hidden, idle target, placed on the infinite line. Feasible solutions are agent trajectories in which all agents reach the target sooner or later. A special feature of our problem is that the agents are $p$-faulty, meaning that every attempt to change direction is an independent Bernoulli trial with known probability $p$, where $p$ is the probability that a turn fails. We are looking for agent trajectories that minimize the worst-case expected termination time, relative to competitive analysis. First, we study linear search with one deterministic $p$-faulty agent, i.e., with no access to random oracles, $p\in (0,1/2)$. For this problem, we provide trajectories that leverage the probabilistic faults into an algorithmic advantage. Our strongest result pertains to a search algorithm (deterministic, aside from the adversarial probabilistic faults) which, as $p\to 0$, has optimal performance $4.59112+\epsilon$, up to the additive term $\epsilon$ that can be arbitrarily small. Additionally, it has performance less than $9$ for $p\leq 0.390388$. When $p\to 1/2$, our algorithm has performance $\Theta(1/(1-2p))$, which we also show is optimal up to a constant factor. Second, we consider linear search with two $p$-faulty agents, $p\in (0,1/2)$, for which we provide three algorithms of different advantages, all with a bounded competitive ratio even as $p\rightarrow 1/2$. Indeed, for this problem, we show how the agents can simulate the trajectory of any $0$-faulty agent (deterministic or randomized), independently of the underlying communication model. As a result, searching with two agents allows for a solution with a competitive ratio of $9+\epsilon$, or a competitive ratio of $4.59112+\epsilon$. Our final contribution is a novel algorithm for searching with two $p$-faulty agents that achieves a competitive ratio $3+4\sqrt{p(1-p)}$.
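
To make the fault model concrete, here is a toy simulation of one $p$-faulty agent running the classic doubling zig-zag strategy; the strategy and milestone rule are illustrative only, since the paper's algorithms are more refined:

```python
import random

def faulty_zigzag_search(target: float, p: float, seed: int = 0) -> float:
    """Distance walked before a p-faulty agent finds a target on the line.

    Each attempt to reverse direction fails independently with probability p,
    in which case the agent keeps its direction and retries at the next
    (doubled) milestone.
    """
    rng = random.Random(seed)
    pos, direction, walked, leg = 0.0, 1, 0.0, 1.0
    while True:
        end = direction * leg
        if min(pos, end) <= target <= max(pos, end):  # target on this leg
            return walked + abs(target - pos)
        walked += abs(end - pos)
        pos = end
        if rng.random() >= p:        # Bernoulli turn attempt succeeds
            direction = -direction
        leg *= 2

# Empirical competitive ratio for one target, averaged over fault draws.
d = 7.3
ratios = [faulty_zigzag_search(d, p=0.1, seed=s) / d for s in range(1000)]
print(sum(ratios) / len(ratios))
```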

Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition

  • Authors: Xiao Yang, Chang Liu, Longlong Xu, Yikai Wang, Yinpeng Dong, Ning Chen, Hang Su, Jun Zhu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.15818
  • Pdf link: https://arxiv.org/pdf/2303.15818
  • Abstract
    Face recognition is a prevailing authentication solution in numerous biometric applications. Physical adversarial attacks, as an important surrogate, can identify the weaknesses of face recognition systems and evaluate their robustness before deployment. However, most existing physical attacks are either readily detectable or ineffective against commercial recognition systems. The goal of this work is to develop a more reliable technique that can carry out an end-to-end evaluation of adversarial robustness for commercial systems. It requires that this technique can simultaneously deceive black-box recognition models and evade defensive mechanisms. To fulfill this, we design adversarial textured 3D meshes (AT3D) with an elaborate topology on a human face, which can be 3D-printed and pasted on the attacker's face to evade the defenses. However, the mesh-based optimization regime calculates gradients in high-dimensional mesh space, and can be trapped in local optima with unsatisfactory transferability. To deviate from the mesh-based space, we propose to perturb the low-dimensional coefficient space based on a 3D Morphable Model, which significantly improves black-box transferability while enjoying faster search efficiency and better visual quality. Extensive experiments in digital and physical scenarios show that our method effectively explores the security vulnerabilities of multiple popular commercial services, including three recognition APIs, four anti-spoofing APIs, two prevailing mobile phones and two automated access control systems.

A Novel Design for Advanced 5G Deployment Environments with Virtualized Resources at Vehicular and MEC Nodes

  • Authors: Angelo Feraudo, Alessando Calvio, Armir Bujari, Paolo Bellavista
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2303.15836
  • Pdf link: https://arxiv.org/pdf/2303.15836
  • Abstract
    IoT and edge computing are profoundly changing the information era, bringing a hyper-connected and context-aware computing environment to reality. Connected vehicles are a critical outcome of this synergy, allowing for the seamless interconnection of autonomous mobile/fixed objects, giving rise to a decentralized vehicle-to-everything (V2X) paradigm. On this front, the European Telecommunications Standards Institute (ETSI) proposed the Multi-Access Edge Computing (MEC) standard, addressing the execution of cloud-like services at the very edge of the infrastructure, thus facilitating the support of low-latency services at the far-edge. In this article, we go a step further and propose a novel ETSI MEC-compliant architecture that fully exploits the synergies between the edge and far-edge, extending the pool of virtualized resources available at MEC nodes with vehicular ones found in the vicinity. In particular, our approach allows vehicle entities to access and partake in a negotiation process embodying a rewarding scheme, while addressing resource volatility as vehicles join and leave the resource pool. To demonstrate the viability and flexibility of our proposed approach, we have built an ETSI MEC-compliant simulation model, which could be tailored to distribute application requests based on the availability of both local and remote resources, managing their transparent migration and execution. In addition, the paper reports on the experimental validation of our proposal in a 5G network setting, contrasting different service delivery modes, by highlighting the potential of the dynamic exploitation of far-edge vehicular resources.

4K-HAZE: A Dehazing Benchmark with 4K Resolution Hazy and Haze-Free Images

  • Authors: Zhuoran Zheng, Xiuyi Jia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.15848
  • Pdf link: https://arxiv.org/pdf/2303.15848
  • Abstract
    Currently, mobile and IoT devices are in dire need of a series of methods to enhance 4K images with limited resource expenditure. The absence of large-scale 4K benchmark datasets hampers progress in this area, especially for dehazing. The challenges in building ultra-high-definition (UHD) dehazing datasets are the absence of estimation methods for UHD depth maps, high-quality 4K depth estimation datasets, and migration strategies for UHD haze images from synthetic to real domains. To address these problems, we develop a novel synthetic method to simulate 4K hazy images (including nighttime and daytime scenes) from clear images, which first estimates the scene depth, simulates the light rays and object reflectance, then migrates the synthetic images to real domains by using a GAN, and finally yields the hazy effects on 4K resolution images. We wrap these synthesized images into a benchmark called the 4K-HAZE dataset. Specifically, we design the CS-Mixer (an MLP-based model that integrates the **C**hannel domain and **S**patial domain) to estimate the depth map of 4K clear images, and the GU-Net to migrate a 4K synthetic image to the real hazy domain. The most appealing aspect of our approach (depth estimation and domain migration) is the capability to process a 4K image on a single GPU with 24G RAM in real time (33fps). Additionally, this work presents an objective assessment of several state-of-the-art single-image dehazing methods that are evaluated using the 4K-HAZE dataset. At the end of the paper, we discuss the limitations of the 4K-HAZE dataset and its social implications.

Around-Body Interaction: Leveraging Limb Movements for Interacting in a Digitally Augmented Physical World

  • Authors: Florian Müller
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2303.15913
  • Pdf link: https://arxiv.org/pdf/2303.15913
  • Abstract
    Recent technological advances have made head-mounted displays (HMDs) smaller and untethered, fostering the vision of ubiquitous interaction with information in a digitally augmented physical world. For interacting with such devices, three main types of input - besides not very intuitive finger gestures - have emerged so far: 1) Touch input on the frame of the devices or 2) on accessories (controller) as well as 3) voice input. While these techniques have both advantages and disadvantages depending on the current situation of the user, they largely ignore the skills and dexterity that we show when interacting with the real world: Throughout our lives, we have trained extensively to use our limbs to interact with and manipulate the physical world around us. This thesis explores how the skills and dexterity of our upper and lower limbs, acquired and trained in interacting with the real world, can be transferred to the interaction with HMDs. Thus, this thesis develops the vision of around-body interaction, in which we use the space around our body, defined by the reach of our limbs, for fast, accurate, and enjoyable interaction with such devices. This work contributes four interaction techniques, two for the upper limbs and two for the lower limbs: The first contribution shows how the proximity between our head and hand can be used to interact with HMDs. The second contribution extends the interaction with the upper limbs to multiple users and illustrates how the registration of augmented information in the real world can support cooperative use cases. The third contribution shifts the focus to the lower limbs and discusses how foot taps can be leveraged as an input modality for HMDs. The fourth contribution presents how lateral shifts of the walking path can be exploited for mobile and hands-free interaction with HMDs while walking.

Ranking mobility and impact inequality in early academic careers

  • Authors: Ye Sun, Fabio Caccioli, Giacomo Livan
  • Subjects: Digital Libraries (cs.DL); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2303.15988
  • Pdf link: https://arxiv.org/pdf/2303.15988
  • Abstract
    How difficult is it for an early career academic to climb the ranks of their discipline? We tackle this question with a comprehensive bibliometric analysis of 57 disciplines, examining the publications of more than 5 million authors whose careers started between 1986 and 2008. We calibrate a simple random walk model over historical data of ranking mobility, which we use to (1) identify which strata of academic impact rankings are the most/least mobile and (2) study the temporal evolution of mobility. By focusing our analysis on cohorts of authors starting their careers in the same year, we find that ranking mobility is remarkably low for the top and bottom-ranked authors, and that this excess of stability persists throughout the entire period of our analysis. We further observe that mobility of impact rankings has increased over time, and that such rise has been accompanied by a decline of impact inequality, which is consistent with the negative correlation that we observe between such two quantities. These findings provide clarity on the opportunities of new scholars entering the academic community, with implications for academic policymaking.

Inside-out Infrared Marker Tracking via Head Mounted Displays for Smart Robot Programming

  • Authors: David Puljiz, Alexandru-George Vasilache, Michael Mende, Björn Hein
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.16017
  • Pdf link: https://arxiv.org/pdf/2303.16017
  • Abstract
    Intuitive robot programming through use of tracked smart input devices relies on fixed, external tracking systems, most often employing infra-red markers. Such an approach is frequently combined with projector-based augmented reality for better visualisation and interface. The combined system, although providing an intuitive programming platform with short cycle times even for inexperienced users, is immobile, expensive and requires extensive calibration. When faced with a changing environment and large number of robots it becomes sorely impractical. Here we present our work on infra-red marker tracking using the Microsoft HoloLens head-mounted display. The HoloLens can map the environment, register the robot on-line, and track smart devices equipped with infra-red markers in the robot coordinate system. We envision our work to provide the basis to transfer many of the paradigms developed over the years for systems requiring a projector and a tracked input device into a highly-portable system that does not require any calibration or special set-up. We test the quality of the marker-tracking in an industrial robot cell and compare our tracking with a ground truth obtained via an ART-3 tracking system.

Evolutionary Design of the Memory Subsystem

  • Authors: Josefa Díaz Álvarez, José L. Risco-Martín, J. Manuel Colmenar
  • Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2303.16074
  • Pdf link: https://arxiv.org/pdf/2303.16074
  • Abstract
    The memory hierarchy has a high impact on the performance and power consumption of the system. Moreover, current embedded systems, included in mobile devices, are specifically designed to run multimedia applications, which are memory intensive. This increases the pressure on the memory subsystem and affects the performance and energy consumption. In this regard, thermal problems, performance degradation and high energy consumption can cause irreversible damage to the devices. We address the optimization of the whole memory subsystem with three approaches integrated as a single methodology. Firstly, the thermal impact of the register file is analyzed and optimized. Secondly, the cache memory is addressed by optimizing the cache configuration according to running applications, improving both performance and power consumption. Finally, we simplify the design and evaluation process of general-purpose and customized dynamic memory managers in the main memory. To this aim, we apply different evolutionary algorithms in combination with memory simulators and profiling tools. This way, we are able to evaluate the quality of each candidate solution and take advantage of the exploration of solutions given by the optimization algorithm. We also provide an experimental evaluation in which our proposal is assessed using well-known benchmark applications.

Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures

  • Authors: Zirui Fu, Aleksandre Avaliani, Marco Donato
  • Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2303.16100
  • Pdf link: https://arxiv.org/pdf/2303.16100
  • Abstract
    Executing machine learning inference tasks on resource-constrained edge devices requires careful hardware-software co-design optimizations. Recent examples have shown how transformer-based deep neural network models such as ALBERT can be used to enable the execution of natural language processing (NLP) inference on mobile systems-on-chip housing custom hardware accelerators. However, while these existing solutions are effective in alleviating the latency, energy, and area costs of running single NLP tasks, achieving multi-task inference requires running computations over multiple variants of the model parameters, which are tailored to each of the targeted tasks. This approach leads to either prohibitive on-chip memory requirements or paying the cost of off-chip memory access. This paper proposes adapter-ALBERT, an efficient model optimization for maximal data reuse across different tasks. The proposed model's performance and robustness to data compression methods are evaluated across several language tasks from the GLUE benchmark. Additionally, we demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator to extrapolate performance, power, and area improvements over the execution of a traditional ALBERT model on the same hardware platform.
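
As background, adapter-based models of this kind typically insert small residual bottlenecks into a frozen backbone so that only a few task-specific parameters must be stored per task; the sketch below is a generic Houlsby-style adapter with illustrative dimensions, not the paper's adapter-ALBERT architecture:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic residual bottleneck adapter (sketch).

    Only these few parameters are task-specific; the large backbone stays
    frozen and is reused across tasks, which is what enables on-chip reuse.
    """
    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))         # residual add

x = torch.randn(2, 16, 768)
print(BottleneckAdapter()(x).shape)  # torch.Size([2, 16, 768])
```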

Unleashing the Power of Edge-Cloud Generative AI in Mobile Networks: A Survey of AIGC Services

  • Authors: Minrui Xu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Shiwen Mao, Zhu Han, Abbas Jamalipour, Dong In Kim, Xuemin (Sherman) Shen, Victor C. M. Leung, H. Vincent Poor
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2303.16129
  • Pdf link: https://arxiv.org/pdf/2303.16129
  • Abstract
    Artificial Intelligence-Generated Content (AIGC) is an automated method for generating, manipulating, and modifying valuable and diverse data using AI algorithms creatively. This survey paper focuses on the deployment of AIGC applications, e.g., ChatGPT and Dall-E, at mobile edge networks, namely mobile AIGC networks, that provide personalized and customized AIGC services in real time while maintaining user privacy. We begin by introducing the background and fundamentals of generative models and the lifecycle of AIGC services at mobile AIGC networks, which includes data collection, training, finetuning, inference, and product management. We then discuss the collaborative cloud-edge-mobile infrastructure and technologies required to support AIGC services and enable users to access AIGC at mobile edge networks. Furthermore, we explore AIGC-driven creative applications and use cases for mobile AIGC networks. Additionally, we discuss the implementation, security, and privacy challenges of deploying mobile AIGC networks. Finally, we highlight some future research directions and open issues for the full realization of mobile AIGC networks.

Keyword: pruning

Exploring the Performance of Pruning Methods in Neural Networks: An Empirical Study of the Lottery Ticket Hypothesis

  • Authors: Eirik Fladmark, Muhammad Hamza Sajjad, Laura Brinkholm Justesen
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.15479
  • Pdf link: https://arxiv.org/pdf/2303.15479
  • Abstract
    In this paper, we explore the performance of different pruning methods in the context of the lottery ticket hypothesis. We compare the performance of L1 unstructured pruning, Fisher pruning, and random pruning on different network architectures and pruning scenarios. The experiments include an evaluation of one-shot and iterative pruning, an examination of weight movement in the network during pruning, a comparison of the pruning methods on networks of varying widths, and an analysis of the performance of the methods when the network becomes very sparse. Additionally, we propose and evaluate a new method for efficient computation of Fisher pruning, known as batched Fisher pruning.
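
For context, the L1 unstructured baseline compared here is available directly in PyTorch; the layer, sparsity levels, and schedule below are illustrative placeholders, not the paper's setup:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(256, 128)

# One-shot pruning: zero out the 80% of weights with smallest L1 magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.8)
prune.remove(layer, "weight")  # make the mask permanent

# Iterative pruning: repeat smaller steps, fine-tuning in between; each call
# prunes 20% of the weights that are still unpruned.
iterative = nn.Linear(256, 128)
for _ in range(3):
    prune.l1_unstructured(iterative, name="weight", amount=0.2)
    # ... fine-tune the network here before the next round ...

sparsity = (iterative.weight == 0).float().mean().item()
print(f"iterative weight sparsity: {sparsity:.1%}")
```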

Learning Second-Order Attentive Context for Efficient Correspondence Pruning

  • Authors: Xinyi Ye, Weiyue Zhao, Hao Lu, Zhiguo Cao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15761
  • Pdf link: https://arxiv.org/pdf/2303.15761
  • Abstract
    Correspondence pruning aims to search consistent correspondences (inliers) from a set of putative correspondences. It is challenging because of the disorganized spatial distribution of numerous outliers, especially when putative correspondences are largely dominated by outliers. It is even more challenging to ensure effectiveness while maintaining efficiency. In this paper, we propose an effective and efficient method for correspondence pruning. Inspired by the success of attentive context in correspondence problems, we first extend the attentive context to the first-order attentive context and then introduce the idea of attention in attention (ANA) to model second-order attentive context for correspondence pruning. Compared with first-order attention that focuses on feature-consistent context, second-order attention attends to the attention weights themselves and provides an additional source to encode consistent context from the attention map. For efficiency, we derive two approximate formulations for the naive implementation of second-order attention to optimize the cubic complexity to linear complexity, such that second-order attention can be used with negligible computational overhead. We further implement our formulations in a second-order context layer and then incorporate the layer in an ANA block. Extensive experiments demonstrate that our method is effective and efficient in pruning outliers, especially in high-outlier-ratio cases. Compared with the state-of-the-art correspondence pruning approach LMCNet, our method runs 14 times faster while maintaining a competitive accuracy.

Randomly Initialized Subnetworks with Iterative Weight Recycling

  • Authors: Matt Gorbett, Darrell Whitley
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.15953
  • Pdf link: https://arxiv.org/pdf/2303.15953
  • Abstract
    The Multi-Prize Lottery Ticket Hypothesis posits that randomly initialized neural networks contain several subnetworks that achieve comparable accuracy to fully trained models of the same architecture. However, current methods require that the network is sufficiently overparameterized. In this work, we propose a modification to two state-of-the-art algorithms (Edge-Popup and Biprop) that finds high-accuracy subnetworks with no additional storage cost or scaling. The algorithm, Iterative Weight Recycling, identifies subsets of important weights within a randomly initialized network for intra-layer reuse. Empirically, we show improvements on smaller network architectures and higher prune rates, finding that model sparsity can be increased through the "recycling" of existing weights. In addition to Iterative Weight Recycling, we complement the Multi-Prize Lottery Ticket Hypothesis with a reciprocal finding: high-accuracy, randomly initialized subnetworks produce diverse masks, despite being generated with the same hyperparameters and pruning strategy. We explore the landscapes of these masks, which show high variability.

Large-scale Training Data Search for Object Re-identification

  • Authors: Yue Yao, Huan Lei, Tom Gedeon, Liang Zheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.16186
  • Pdf link: https://arxiv.org/pdf/2303.16186
  • Abstract
    We consider a scenario where we have access to the target domain, but cannot afford on-the-fly training data annotation, and instead would like to construct an alternative training set from a large-scale data pool such that a competitive model can be obtained. We propose a search and pruning (SnP) solution to this training data search problem, tailored to object re-identification (re-ID), an application aiming to match the same object captured by different cameras. Specifically, the search stage identifies and merges clusters of source identities which exhibit similar distributions with the target domain. The second stage, subject to a budget, then selects identities and their images from the Stage I output, to control the size of the resulting training set for efficient training. The two steps provide us with training sets 80% smaller than the source pool while achieving a similar or even higher re-ID accuracy. These training sets are also shown to be superior to a few existing search methods such as random sampling and greedy sampling under the same budget on training data size. If we release the budget, training sets resulting from the first stage alone allow even higher re-ID accuracy. We provide interesting discussions on the specificity of our method to the re-ID problem and particularly its role in bridging the re-ID domain gap. The code is available at https://github.com/yorkeyao/SnP.

Keyword: voxel

Multimodal and multicontrast image fusion via deep generative models

  • Authors: Giovanna Maria Dimitri, Simeon Spasov, Andrea Duggento, Luca Passamonti, Pietro Liò, Nicola Toschi
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.15963
  • Pdf link: https://arxiv.org/pdf/2303.15963
  • Abstract
    Recently, it has become progressively more evident that classic diagnostic labels are unable to reliably describe the complexity and variability of several clinical phenotypes. This is particularly true for a broad range of neuropsychiatric illnesses (e.g., depression, anxiety disorders, behavioral phenotypes). Patient heterogeneity can be better described by grouping individuals into novel categories based on empirically derived sections of intersecting continua that span across and beyond traditional categorical borders. In this context, neuroimaging data carry a wealth of spatiotemporally resolved information about each patient's brain. However, they are usually heavily collapsed a priori through procedures which are not learned as part of model training, and consequently not optimized for the downstream prediction task. This is because every individual participant usually comes with multiple whole-brain 3D imaging modalities often accompanied by a deep genotypic and phenotypic characterization, hence posing formidable computational challenges. In this paper we design a deep learning architecture based on generative models rooted in a modular approach and separable convolutional blocks to a) fuse multiple 3D neuroimaging modalities on a voxel-wise level, b) convert them into informative latent embeddings through heavy dimensionality reduction, c) maintain good generalizability and minimal information loss. As proof of concept, we test our architecture on the well characterized Human Connectome Project database demonstrating that our latent embeddings can be clustered into easily separable subject strata which, in turn, map to different phenotypical information which was not included in the embedding creation process. This may be of aid in predicting disease evolution as well as drug response, hence supporting mechanistic disease understanding and empowering clinical trials.

LinK: Linear Kernel for LiDAR-based 3D Perception

  • Authors: Tao Lu, Xiang Ding, Haisong Liu, Gangshan Wu, Limin Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.16094
  • Pdf link: https://arxiv.org/pdf/2303.16094
  • Abstract
    Extending the success of 2D Large Kernel to 3D perception is challenging due to: 1. the cubically-increasing overhead in processing 3D data; 2. the optimization difficulties from data scarcity and sparsity. Previous work has taken the first step to scale up the kernel size from 3x3x3 to 7x7x7 by introducing block-shared weights. However, to reduce the feature variations within a block, it only employs a modest block size and fails to achieve larger kernels like 21x21x21. To address this issue, we propose a new method, called LinK, to achieve a wider-range perception receptive field in a convolution-like manner with two core designs. The first is to replace the static kernel matrix with a linear kernel generator, which adaptively provides weights only for non-empty voxels. The second is to reuse the pre-computed aggregation results in the overlapped blocks to reduce computation complexity. The proposed method successfully enables each voxel to perceive context within a range of 21x21x21. Extensive experiments on two basic perception tasks, 3D object detection and 3D semantic segmentation, demonstrate the effectiveness of our method. Notably, we rank 1st on the public leaderboard of the 3D detection benchmark of nuScenes (LiDAR track), by simply incorporating a LinK-based backbone into the basic detector, CenterPoint. We also boost the strong segmentation baseline's mIoU by 2.7% on the SemanticKITTI test set. Code is available at https://github.com/MCG-NJU/LinK.

Keyword: lidar

4D Panoptic Segmentation as Invariant and Equivariant Field Prediction

  • Authors: Minghan Zhu, Shizong Han, Hong Cai, Shubhankar Borse, Maani Ghaffari Jadidi, Fatih Porikli
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15651
  • Pdf link: https://arxiv.org/pdf/2303.15651
  • Abstract
    In this paper, we develop rotation-equivariant neural networks for 4D panoptic segmentation. 4D panoptic segmentation is a recently established benchmark task for autonomous driving, which requires recognizing semantic classes and object instances on the road based on LiDAR scans, as well as assigning temporally consistent IDs to instances across time. We observe that the driving scenario is symmetric to rotations on the ground plane. Therefore, rotation-equivariance could provide better generalization and more robust feature learning. Specifically, we review the object instance clustering strategies, and restate the centerness-based approach and the offset-based approach as the prediction of invariant scalar fields and equivariant vector fields. Other sub-tasks are also unified from this perspective, and different invariant and equivariant layers are designed to facilitate their predictions. Through evaluation on the standard 4D panoptic segmentation benchmark of SemanticKITTI, we show that our equivariant models achieve higher accuracy with lower computational costs compared to their non-equivariant counterparts. Moreover, our method sets the new state-of-the-art performance and achieves 1st place on the SemanticKITTI 4D Panoptic Segmentation leaderboard.

LinK: Linear Kernel for LiDAR-based 3D Perception

  • Authors: Tao Lu, Xiang Ding, Haisong Liu, Gangshan Wu, Limin Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.16094
  • Pdf link: https://arxiv.org/pdf/2303.16094
  • Abstract
    Extending the success of 2D Large Kernel to 3D perception is challenging due to: 1. the cubically-increasing overhead in processing 3D data; 2. the optimization difficulties from data scarcity and sparsity. Previous work has taken the first step to scale up the kernel size from 3x3x3 to 7x7x7 by introducing block-shared weights. However, to reduce the feature variations within a block, it only employs a modest block size and fails to achieve larger kernels like 21x21x21. To address this issue, we propose a new method, called LinK, to achieve a wider-range perception receptive field in a convolution-like manner with two core designs. The first is to replace the static kernel matrix with a linear kernel generator, which adaptively provides weights only for non-empty voxels. The second is to reuse the pre-computed aggregation results in the overlapped blocks to reduce computation complexity. The proposed method successfully enables each voxel to perceive context within a range of 21x21x21. Extensive experiments on two basic perception tasks, 3D object detection and 3D semantic segmentation, demonstrate the effectiveness of our method. Notably, we rank 1st on the public leaderboard of the 3D detection benchmark of nuScenes (LiDAR track), by simply incorporating a LinK-based backbone into the basic detector, CenterPoint. We also boost the strong segmentation baseline's mIoU by 2.7% on the SemanticKITTI test set. Code is available at https://github.com/MCG-NJU/LinK.

Keyword: diffusion

An efficient method for the anisotropic diffusion equation in magnetic fields

  • Authors: Dean Muir, Kenneth Duru, Matthew Hole, Stuart Hudson
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.15447
  • Pdf link: https://arxiv.org/pdf/2303.15447
  • Abstract
    We solve the anisotropic diffusion equation in 2D, where the dominant direction of diffusion is defined by a vector field which does not conform to a Cartesian grid. Our method uses operator splitting to separate the diffusion perpendicular and parallel to the vector field. The slow time scale is solved using a provably stable finite difference formulation in the direction perpendicular to the vector field, and an integral operator for the diffusion parallel to it. Energy estimates are shown for the continuous and semi-discrete cases. Numerical experiments are performed showing convergence of the method, and examples are given to demonstrate the capabilities of the method.
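
For orientation, the equation in question is usually written with a unit vector field $\mathbf{b}$ (e.g., along the magnetic field) and strongly separated diffusivities; this is the standard form under our assumptions, not necessarily the paper's exact notation:

```latex
\partial_t u = \nabla \cdot \Big( \kappa_{\parallel}\, \mathbf{b}\mathbf{b}^{\top} \nabla u
             + \kappa_{\perp}\, (I - \mathbf{b}\mathbf{b}^{\top}) \nabla u \Big),
\qquad \kappa_{\parallel} \gg \kappa_{\perp}
```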

A Stochastic Method for Solving Time-Fractional Differential Equations

  • Authors: Nicolas L. Guidotti, Juan Acebrón, José Monteiro
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.15458
  • Pdf link: https://arxiv.org/pdf/2303.15458
  • Abstract
    We present a stochastic method for efficiently computing the solution of time-fractional partial differential equations (fPDEs) that model anomalous diffusion problems of the subdiffusive type. After discretizing the fPDE in space, the ensuing system of fractional linear equations is solved resorting to a Monte Carlo evaluation of the corresponding Mittag-Leffler matrix function. This is accomplished through the approximation of the expected value of a suitable multiplicative functional of a stochastic process, which consists of a Markov chain whose sojourn times in every state are Mittag-Leffler distributed. The resulting algorithm is able to calculate the solution at conveniently chosen points in the domain with high efficiency. In addition, we present how to generalize this algorithm in order to compute the complete solution. For several large-scale numerical problems, our method showed remarkable performance in both shared-memory and distributed-memory systems, achieving nearly perfect scalability up to 16,384 CPU cores.
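
For reference, the Mittag-Leffler function that appears here generalizes the matrix exponential; assuming a Caputo time derivative of order $\alpha \in (0,1)$ and a semi-discrete system with matrix $A$ (our notation, not necessarily the paper's), the solution takes the familiar form:

```latex
% Mittag-Leffler function; alpha = 1 recovers the exponential, E_1(z) = e^z.
E_{\alpha}(z) = \sum_{k=0}^{\infty} \frac{z^{k}}{\Gamma(\alpha k + 1)},
\qquad
u(t) = E_{\alpha}\!\left(t^{\alpha} A\right) u_0
```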

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

  • Authors: Senmao Li, Joost van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, Jian Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15649
  • Pdf link: https://arxiv.org/pdf/2303.15649
  • Abstract
    A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images. They either finetune the model, or invert the image in the latent space of the pretrained model. However, they suffer from two problems: (1) Unsatisfactory results for selected regions, and unexpected changes in nonselected regions. (2) They require careful text prompt editing where the prompt should include all visual objects in the input image. To address this, we propose two improvements: (1) Only optimizing the input of the value linear network in the cross-attention layers is sufficiently powerful to reconstruct a real image. (2) We propose attention regularization to preserve the object-like attention maps after editing, enabling us to obtain accurate style editing without invoking significant structural changes. We further improve the editing technique which is used for the unconditional branch of classifier-free guidance, as well as the conditional one as used by P2P. Extensive experimental prompt-editing results on a variety of images demonstrate qualitatively and quantitatively that our method has superior editing capabilities compared to existing and concurrent works.

Ecosystem Graphs: The Social Footprint of Foundation Models

  • Authors: Rishi Bommasani, Dilara Soylu, Thomas I. Liao, Kathleen A. Creel, Percy Liang
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2303.15772
  • Pdf link: https://arxiv.org/pdf/2303.15772
  • Abstract
    Foundation models (e.g. ChatGPT, StableDiffusion) pervasively influence society, warranting immediate social attention. While the models themselves garner much attention, to accurately characterize their impact, we must consider the broader sociotechnical ecosystem. We propose Ecosystem Graphs as a documentation framework to transparently centralize knowledge of this ecosystem. Ecosystem Graphs is composed of assets (datasets, models, applications) linked together by dependencies that indicate technical (e.g. how Bing relies on GPT-4) and social (e.g. how Microsoft relies on OpenAI) relationships. To supplement the graph structure, each asset is further enriched with fine-grained metadata (e.g. the license or training emissions). We document the ecosystem extensively at https://crfm.stanford.edu/ecosystem-graphs/. As of March 16, 2023, we annotate 262 assets (64 datasets, 128 models, 70 applications) from 63 organizations linked by 356 dependencies. We show Ecosystem Graphs functions as a powerful abstraction and interface for achieving the minimum transparency required to address myriad use cases. Therefore, we envision Ecosystem Graphs will be a community-maintained resource that provides value to stakeholders spanning AI researchers, industry professionals, social scientists, auditors and policymakers.

Instruct 3D-to-3D: Text Instruction Guided 3D-to-3D conversion

  • Authors: Hiromichi Kamata, Yuiko Sakuma, Akio Hayakawa, Masato Ishii, Takuya Narihira
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15780
  • Pdf link: https://arxiv.org/pdf/2303.15780
  • Abstract
    We propose a high-quality 3D-to-3D conversion method, Instruct 3D-to-3D. Our method is designed for a novel task, which is to convert a given 3D scene to another scene according to text instructions. Instruct 3D-to-3D applies pretrained Image-to-Image diffusion models for 3D-to-3D conversion. This enables the likelihood maximization of each viewpoint image and high-quality 3D generation. In addition, our proposed method explicitly inputs the source 3D scene as a condition, which enhances 3D consistency and controllability of how much of the source 3D scene structure is reflected. We also propose dynamic scaling, which allows the intensity of the geometry transformation to be adjusted. We performed quantitative and qualitative evaluations and showed that our proposed method achieves higher quality 3D-to-3D conversions than baseline methods.

Structure Preserving Finite Volume Approximation of Cross-Diffusion Systems Coupled by a Free Interface

  • Authors: Clément Cancès, Jean Cauvin-Vila, Claire Chainais-Hillairet, Virginie Ehrlacher
  • Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
  • Arxiv link: https://arxiv.org/abs/2303.15817
  • Pdf link: https://arxiv.org/pdf/2303.15817
  • Abstract
    We propose a two-point flux approximation finite-volume scheme for the approximation of two cross-diffusion systems coupled by a free interface to account for vapor deposition. The moving interface is addressed with a cut-cell approach, where the mesh is locally deformed around the interface. The scheme preserves the structure of the continuous system, namely: mass conservation, nonnegativity, volume-filling constraints and decay of the free energy. Numerical results illustrate the properties of the scheme.

Accelerating exponential integrators to efficiently solve advection-diffusion-reaction equations

  • Authors: Marco Caliari, Fabio Cassini, Lukas Einkemmer, Alexander Ostermann
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.15861
  • Pdf link: https://arxiv.org/pdf/2303.15861
  • Abstract
    In this paper we consider an approach to improve the performance of exponential integrators/Lawson schemes in cases where the solution of a related, but usually much simpler, problem can be computed efficiently. While for implicit methods such an approach is common (e.g. by using preconditioners), for exponential integrators this has proven more challenging. Here we propose to extract a constant coefficient differential operator from advection-diffusion-reaction equations for which we are then able to compute the required matrix functions efficiently. Both a linear stability analysis and numerical experiments show that the resulting schemes can be unconditionally stable. In fact, we find that exponential integrators and Lawson schemes can have better stability properties than similarly constructed implicit-explicit schemes. We also propose new Lawson type integrators that further improve on these stability properties. The effectiveness of the approach is highlighted by a number of numerical examples in two and three space dimensions.
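
To fix ideas, for a semilinear split $u' = Lu + N(u)$, where $L$ would be the extracted constant-coefficient operator, the simplest exponential integrator (exponential Euler) reads as follows; the notation is the standard one, not necessarily the paper's:

```latex
u_{n+1} = e^{hL} u_n + h\,\varphi_1(hL)\, N(u_n),
\qquad
\varphi_1(z) = \frac{e^{z} - 1}{z}
```

The approach in the abstract amounts to choosing $L$ so that $e^{hL}$ and $\varphi_1(hL)$ can be applied cheaply.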

Visual Chain-of-Thought Diffusion Models

  • Authors: William Harvey, Frank Wood
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16187
  • Pdf link: https://arxiv.org/pdf/2303.16187
  • Abstract
    Recent progress with conditional image diffusion models has been stunning, and this holds true whether we are speaking about models conditioned on a text description, a scene layout, or a sketch. Unconditional image diffusion models are also improving but lag behind, as do diffusion models which are conditioned on lower-dimensional features like class labels. We propose to close the gap between conditional and unconditional models using a two-stage sampling procedure. In the first stage we sample an embedding describing the semantic content of the image. In the second stage we sample the image conditioned on this embedding and then discard the embedding. Doing so lets us leverage the power of conditional diffusion models on the unconditional generation task, which we show improves FID by 25-50% compared to standard unconditional generation.
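
A minimal sketch of the two-stage sampler the abstract describes; `embedding_prior`, `cond_diffusion`, and their `.sample` interfaces are hypothetical placeholders, not the authors' API:

```python
import torch

def two_stage_unconditional_sample(embedding_prior, cond_diffusion, n: int) -> torch.Tensor:
    """Unconditional generation via a conditional model (sketch).

    Stage 1 samples a semantic embedding z; stage 2 samples an image
    conditioned on z; z is then discarded.
    """
    z = embedding_prior.sample(n)            # stage 1: semantic embedding
    images = cond_diffusion.sample(cond=z)   # stage 2: image given embedding
    return images                            # the embedding is discarded
```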

Your Diffusion Model is Secretly a Zero-Shot Classifier

  • Authors: Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, Deepak Pathak
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.16203
  • Pdf link: https://arxiv.org/pdf/2303.16203
  • Abstract
    The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive compositional generalization abilities. Almost all use cases thus far have solely focused on sampling; however, diffusion models can also provide conditional density estimates, which are useful for tasks beyond image generation. In this paper, we show that the density estimates from large-scale text-to-image diffusion models like Stable Diffusion can be leveraged to perform zero-shot classification without any additional training. Our generative approach to classification attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models. We also find that our diffusion-based approach has stronger multimodal relational reasoning abilities than competing contrastive approaches. Finally, we evaluate diffusion models trained on ImageNet and find that they approach the performance of SOTA discriminative classifiers trained on the same dataset, even with weak augmentations and no regularization. Results and visualizations at https://diffusion-classifier.github.io/
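
Concretely, the zero-shot rule follows from Bayes' rule with class-conditional densities approximated via the diffusion ELBO; with a uniform class prior this reduces, as in the diffusion-classifier literature, to picking the class whose conditioning best predicts the injected noise (notation assumed, not quoted from the paper):

```latex
p(c \mid x) \propto p(x \mid c)\, p(c),
\qquad
\hat{c} = \arg\min_{c}\; \mathbb{E}_{t,\,\epsilon}\!\left[ \big\| \epsilon - \epsilon_{\theta}(x_t, c) \big\|^2 \right]
```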

Keyword: dynamic

A Heterogeneous Parallel Non-von Neumann Architecture System for Accurate and Efficient Machine Learning Molecular Dynamics

  • Authors: Zhuoying Zhao, Ziling Tan, Pinghui Mo, Xiaonan Wang, Dan Zhao, Xin Zhang, Ming Tao, Jie Liu
  • Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.15474
  • Pdf link: https://arxiv.org/pdf/2303.15474
  • Abstract
    This paper proposes a special-purpose system to achieve high-accuracy and high-efficiency machine learning (ML) molecular dynamics (MD) calculations. The system consists of a field programmable gate array (FPGA) and an application specific integrated circuit (ASIC) working in heterogeneous parallelization. To be specific, a multiplication-less neural network (NN) is deployed on the non-von Neumann (NvN)-based ASIC (SilTerra 180 nm process) to evaluate atomic forces, which is the most computationally expensive part of MD. All other calculations of MD are done using the FPGA (Xilinx XC7Z100). It is shown that, to achieve similar-level accuracy, the proposed NvN-based system based on low-end fabrication technologies (180 nm) is 1.6x faster and 10^2-10^3x more energy efficient than state-of-the-art vN-based MLMD using graphics processing units (GPUs) based on much more advanced technologies (12 nm), indicating the superiority of the proposed NvN-based heterogeneous parallel architecture.

Sequential training of GANs against GAN-classifiers reveals correlated "knowledge gaps" present among independently trained GAN instances

  • Authors: Arkanath Pathak, Nicholas Dufour
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15533
  • Pdf link: https://arxiv.org/pdf/2303.15533
  • Abstract
    Modern Generative Adversarial Networks (GANs) generate realistic images remarkably well. Previous work has demonstrated the feasibility of "GAN-classifiers" that are distinct from the co-trained discriminator, and operate on images generated from a frozen GAN. That such classifiers work at all affirms the existence of "knowledge gaps" (out-of-distribution artifacts across samples) present in GAN training. We iteratively train GAN-classifiers and train GANs that "fool" the classifiers (in an attempt to fill the knowledge gaps), and examine the effect on GAN training dynamics, output quality, and GAN-classifier generalization. We investigate two settings, a small DCGAN architecture trained on low dimensional images (MNIST), and StyleGAN2, a SOTA GAN architecture trained on high dimensional images (FFHQ). We find that the DCGAN is unable to effectively fool a held-out GAN-classifier without compromising the output quality. However, StyleGAN2 can fool held-out classifiers with no change in output quality, and this effect persists over multiple rounds of GAN/classifier training which appears to reveal an ordering over optima in the generator parameter space. Finally, we study different classifier architectures and show that the architecture of the GAN-classifier has a strong influence on the set of its learned artifacts.

OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis

  • Authors: Hongyi Xu, Guoxian Song, Zihang Jiang, Jianfeng Zhang, Yichun Shi, Jing Liu, Wanchun Ma, Jiashi Feng, Linjie Luo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15539
  • Pdf link: https://arxiv.org/pdf/2303.15539
  • Abstract
    We present OmniAvatar, a novel geometry-guided 3D head synthesis model trained from in-the-wild unstructured images that is capable of synthesizing diverse identity-preserved 3D heads with compelling dynamic details under fully disentangled control over camera poses, facial expressions, head shapes, articulated neck and jaw poses. To achieve such a high level of disentangled control, we first explicitly define a novel semantic signed distance function (SDF) around a head geometry (FLAME) conditioned on the control parameters. This semantic SDF allows us to build a differentiable volumetric correspondence map from the observation space to a disentangled canonical space from all the control parameters. We then leverage the 3D-aware GAN framework (EG3D) to synthesize detailed shape and appearance of 3D full heads in the canonical space, followed by a volume rendering step guided by the volumetric correspondence map to map the output into the observation space. To ensure control accuracy on the synthesized head shapes and expressions, we introduce a geometry prior loss to conform to the head SDF and a control loss to conform to the expression code. Further, we enhance the temporal realism with dynamic details conditioned upon varying expressions and joint poses. Our model can synthesize more preferable identity-preserved 3D heads with compelling dynamic details compared to the state-of-the-art methods, both qualitatively and quantitatively. We also provide an ablation study to justify many of our system design choices.

Multiphysics discovery with moving boundaries using Ensemble SINDy and Peridynamic Differential Operator

  • Authors: A. C. Bekar, E. Haghighat, E. Madenci
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.15631
  • Pdf link: https://arxiv.org/pdf/2303.15631
  • Abstract
    This study proposes a novel framework for learning the underlying physics of phenomena with moving boundaries. The proposed approach combines Ensemble SINDy and the Peridynamic Differential Operator (PDDO) and imposes an inductive bias assuming the moving boundary physics evolves in its own corotational coordinate system. The robustness of the approach is demonstrated by considering various levels of noise in the measured data using the 2D Fisher-Stefan model. The confidence intervals of recovered coefficients are listed, and the uncertainties of the moving boundary positions are depicted by obtaining the solutions with the recovered coefficients. Although the main focus of this study is the Fisher-Stefan model, the proposed approach is applicable to any type of moving boundary problem with a smooth moving boundary front without a mushy region. The code and data for this framework are available at: https://github.com/alicanbekar/MB_PDDO-SINDy.

Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model

  • Authors: Rashmi Ranjan Bhuyan, Adel Javanmard, Sungchul Kim, Gourab Mukherjee, Ryan A. Rossi, Tong Yu, Handong Zhao
  • Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2303.15652
  • Pdf link: https://arxiv.org/pdf/2303.15652
  • Abstract
    We consider dynamic pricing strategies in a streamed longitudinal data set-up where the objective is to maximize, over time, the cumulative profit across a large number of customer segments. We consider a dynamic probit model with the consumers' preferences as well as price sensitivity varying over time. Building on the well-known finding that consumers sharing similar characteristics act in similar ways, we consider a global shrinkage structure, which assumes that the consumers' preferences across the different segments can be well approximated by a spatial autoregressive (SAR) model. In such a streamed longitudinal set-up, we measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance. We propose a pricing policy based on penalized stochastic gradient descent (PSGD) and explicitly characterize its regret as a function of time, the temporal variability in the model parameters, and the strength of the auto-correlation network structure spanning the varied customer segments. Our regret analysis results not only demonstrate asymptotic optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information, as policies based on unshrunken models are highly sub-optimal in the aforementioned set-up.
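
A hedged sketch of a penalized stochastic-gradient update of this flavor, with a graph-Laplacian penalty standing in for the spatial autoregressive shrinkage and a squared loss standing in for the probit likelihood (all names and scales are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n_seg, d = 5, 3

# Toy line-graph Laplacian over customer segments (stand-in for the SAR structure).
A = np.diag(np.ones(n_seg - 1), 1) + np.diag(np.ones(n_seg - 1), -1)
L = np.diag(A.sum(1)) - A

theta_true = np.cumsum(0.1 * rng.standard_normal((n_seg, d)), axis=0)  # smooth across the graph
theta = np.zeros((n_seg, d))
eta, lam = 0.05, 0.5

for t in range(2000):
    i = rng.integers(n_seg)                     # a sale opportunity in segment i
    x = rng.standard_normal(d)                  # price / covariate vector
    y = x @ theta_true[i] + 0.1 * rng.standard_normal()
    theta[i] -= eta * (x @ theta[i] - y) * x    # stochastic loss gradient
    theta -= eta * lam * (L @ theta)            # graph shrinkage: pull neighboring segments together

print(np.linalg.norm(theta - theta_true))
```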

GNN-based physics solver for time-independent PDEs

  • Authors: Rini Jasmine Gladstone, Helia Rahmani, Vishvas Suryakumar, Hadi Meidani, Marta D'Elia, Ahmad Zareei
  • Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
  • Arxiv link: https://arxiv.org/abs/2303.15681
  • Pdf link: https://arxiv.org/pdf/2303.15681
  • Abstract
    Physics-based deep learning frameworks have been shown to be effective in accurately modeling the dynamics of complex physical systems with generalization capability across problem inputs. However, time-independent problems pose the challenge of requiring long-range exchange of information across the computational domain for obtaining accurate predictions. In the context of graph neural networks (GNNs), this calls for deeper networks, which, in turn, may compromise or slow down the training process. In this work, we present two GNN architectures to overcome this challenge - the Edge Augmented GNN and the Multi-GNN. We show that both these networks perform significantly better (by a factor of 1.5 to 2) than baseline methods when applied to time-independent solid mechanics problems. Furthermore, the proposed architectures generalize well to unseen domains, boundary conditions, and materials. Here, the treatment of variable domains is facilitated by a novel coordinate transformation that enables rotation and translation invariance. By broadening the range of problems that neural operators based on graph neural networks can tackle, this paper provides the groundwork for their application to complex scientific and industrial settings.

Switched Moving Boundary Modeling of Phase Change Thermal Energy Storage Systems

  • Authors: Trent J. Sakakini, Justin P. Koeln
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.15687
  • Pdf link: https://arxiv.org/pdf/2303.15687
  • Abstract
    Thermal Energy Storage (TES) devices, which leverage the constant-temperature thermal capacity of the latent heat of a Phase Change Material (PCM), provide benefits to a variety of thermal management systems by decoupling the absorption and rejection of thermal energy. Since a TES performs a role similar to that of a battery in an electrical system, it is critical to know when to charge (freeze) and discharge (melt) the TES to maximize the capabilities and efficiency of the overall system. Therefore, control-oriented models of TES are needed to predict the behavior of the TES and make informed control decisions. While existing modeling approaches divide the TES into multiple sections using a Fixed Grid (FG) approach, this paper proposes a switched Moving Boundary (MB) model that captures the key dynamics of the TES with significantly fewer dynamic states. Specifically, a graph-based modeling approach is used to model the heat flow through the TES and a MB approach is used to model the time-varying liquid and solid regions of the TES. Additionally, a Finite State Machine (FSM) is used to switch between four different modes of operation based on the State-of-Charge (SOC) of the TES. Numerical simulations comparing the proposed approach with a more traditional FG approach show that the MB model is capable of accurately modeling the behavior of the FG model while using far fewer states, leading to five times faster simulations.
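
A hypothetical sketch of the switching logic (the mode names, sign conventions, and thresholds are ours, not the paper's four modes):

```python
def next_mode(melt_frac: float, q_net: float) -> str:
    # q_net > 0: net heat absorbed by the PCM (melting, i.e. discharging);
    # q_net < 0: net heat rejected (freezing, i.e. charging). Under the
    # paper's convention the SOC would be the frozen fraction, 1 - melt_frac.
    if melt_frac >= 1.0 and q_net >= 0.0:
        return "fully_melted"   # latent capacity exhausted
    if melt_frac <= 0.0 and q_net <= 0.0:
        return "fully_frozen"   # fully charged
    return "melting" if q_net > 0.0 else "freezing"

# Toy loop: integrate the melted fraction and switch modes as it saturates.
melt_frac, latent_heat = 0.5, 1000.0   # made-up scale
for q_net in [300.0, 400.0, 500.0, -200.0]:
    melt_frac = min(max(melt_frac + q_net / latent_heat, 0.0), 1.0)
    print(next_mode(melt_frac, q_net), round(melt_frac, 2))
```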

Minimization of Sensor Activation in Discrete-Event Systems with Control Delays and Observation Delays

  • Authors: Yunfeng Hou, Ching-Yen Weng, Qingdu Li
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.15706
  • Pdf link: https://arxiv.org/pdf/2303.15706
  • Abstract
    In discrete-event systems, to save sensor resources, the agent continuously adjusts sensor activation decisions according to a sensor activation policy based on the changing observations. However, new challenges arise for sensor activations in networked discrete-event systems, where observation delays and control delays exist between the sensor systems and the agent. In this paper, a new framework for activating sensors in networked discrete-event systems is established. In this framework, we construct a communication automaton that explicitly expresses the interaction process between the agent and the sensor systems over the observation channel and the control channel. Based on the communication automaton, we can define dynamic observations of a communicated string. To guarantee that a sensor activation policy is physically implementable and insensitive to random control delays and observation delays, we further introduce the definition of delay feasibility. We show that a delay feasible sensor activation policy can be used to dynamically activate sensors even if control delays and observation delays exist. A set of algorithms are developed to minimize sensor activations in a transition-based domain while ensuring a given specification condition is satisfied. A practical example is provided to show the application of the developed sensor activation methods. Finally, we briefly discuss how to extend the proposed framework to a decentralized sensing architecture.

Cesno: Possibility of Creating a New Programming Language

  • Authors: Ozelot Vanilla, Jingxiang Yu, Hemn Barzan Abdalla, Haozhe Cui
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2303.15750
  • Pdf link: https://arxiv.org/pdf/2303.15750
  • Abstract
    Programming languages are incredibly versatile, enabling developers to create applications and programs that suit their individual requirements. This article introduces a new language called Cesno, designed from the ground up to offer an advanced, user-friendly, and easy-to-use programming environment. Cesno's syntax is similar to that of other popular languages, making it simple to learn and work with. It incorporates features from other languages, such as syntactic sugar, a built-in library, support for functional programming, object-oriented programming, dynamic typing, a type system, and a variety of function parameters and restrictions. This article will explore the design of Cesno's grammar, give a brief overview of how Cesno processes and compiles code, and provide examples of what Cesno's code looks like and how it can aid in development.

X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance

  • Authors: Yiwei Ma, Xiaioqing Zhang, Xiaoshuai Sun, Jiayi Ji, Haowei Wang, Guannan Jiang, Weilin Zhuang, Rongrong Ji
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15764
  • Pdf link: https://arxiv.org/pdf/2303.15764
  • Abstract
    Text-driven 3D stylization is a complex and crucial task in the fields of computer vision (CV) and computer graphics (CG), aimed at transforming a bare mesh to fit a target text. Prior methods adopt text-independent multilayer perceptrons (MLPs) to predict the attributes of the target mesh with the supervision of CLIP loss. However, such a text-independent architecture lacks textual guidance when predicting attributes, leading to unsatisfactory stylization and slow convergence. To address these limitations, we present X-Mesh, an innovative text-driven 3D stylization framework that incorporates a novel Text-guided Dynamic Attention Module (TDAM). The TDAM dynamically integrates the guidance of the target text by utilizing text-relevant spatial and channel-wise attentions during vertex feature extraction, resulting in more accurate attribute prediction and faster convergence. Furthermore, existing works lack standard benchmarks and automated metrics for evaluation, often relying on subjective and non-reproducible user studies to assess the quality of stylized 3D assets. To overcome this limitation, we introduce a new standard text-mesh benchmark, namely MIT-30, and two automated metrics, which will enable future research to achieve fair and objective comparisons. Our extensive qualitative and quantitative experiments demonstrate that X-Mesh outperforms previous state-of-the-art methods.
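
A generic sketch of text-conditioned channel attention, the gating pattern that modules like TDAM build on (this is not X-Mesh's exact module; all shapes and weights below are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d_text, d_feat, n_vert = 16, 32, 100
W1 = 0.1 * rng.standard_normal((d_feat, d_text))   # text -> hidden
W2 = 0.1 * rng.standard_normal((d_feat, d_feat))   # hidden -> channel gates

text_emb = rng.standard_normal(d_text)             # e.g. a CLIP text embedding
vert_feat = rng.standard_normal((n_vert, d_feat))  # per-vertex mesh features

gate = sigmoid(W2 @ np.maximum(W1 @ text_emb, 0.0))  # channel-wise attention in (0, 1)
vert_feat = vert_feat * gate                         # modulate features by the text
print(vert_feat.shape)
```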

Instruct 3D-to-3D: Text Instruction Guided 3D-to-3D conversion

  • Authors: Hiromichi Kamata, Yuiko Sakuma, Akio Hayakawa, Masato Ishii, Takuya Narihira
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15780
  • Pdf link: https://arxiv.org/pdf/2303.15780
  • Abstract
    We propose a high-quality 3D-to-3D conversion method, Instruct 3D-to-3D. Our method is designed for a novel task, which is to convert a given 3D scene to another scene according to text instructions. Instruct 3D-to-3D applies pretrained Image-to-Image diffusion models for 3D-to-3D conversion. This enables the likelihood maximization of each viewpoint image and high-quality 3D generation. In addition, our proposed method explicitly inputs the source 3D scene as a condition, which enhances 3D consistency and controllability of how much of the source 3D scene structure is reflected. We also propose dynamic scaling, which allows the intensity of the geometry transformation to be adjusted. We performed quantitative and qualitative evaluations and showed that our proposed method achieves higher quality 3D-to-3D conversions than baseline methods.

A Novel Design for Advanced 5G Deployment Environments with Virtualized Resources at Vehicular and MEC Nodes

  • Authors: Angelo Feraudo, Alessando Calvio, Armir Bujari, Paolo Bellavista
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2303.15836
  • Pdf link: https://arxiv.org/pdf/2303.15836
  • Abstract
    IoT and edge computing are profoundly changing the information era, bringing a hyper-connected and context-aware computing environment to reality. Connected vehicles are a critical outcome of this synergy, allowing for the seamless interconnection of autonomous mobile/fixed objects, giving rise to a decentralized vehicle-to-everything (V2X) paradigm. On this front, the European Telecommunications Standards Institute (ETSI) proposed the Multi-Access Edge Computing (MEC) standard, addressing the execution of cloud-like services at the very edge of the infrastructure, thus facilitating the support of low-latency services at the far-edge. In this article, we go a step further and propose a novel ETSI MEC-compliant architecture that fully exploits the synergies between the edge and far-edge, extending the pool of virtualized resources available at MEC nodes with vehicular ones found in the vicinity. In particular, our approach allows vehicle entities to access and partake in a negotiation process embodying a rewarding scheme, while addressing resource volatility as vehicles join and leave the resource pool. To demonstrate the viability and flexibility of our proposed approach, we have built an ETSI MEC-compliant simulation model, which could be tailored to distribute application requests based on the availability of both local and remote resources, managing their transparent migration and execution. In addition, the paper reports on the experimental validation of our proposal in a 5G network setting, contrasting different service delivery modes, by highlighting the potential of the dynamic exploitation of far-edge vehicular resources.

Obstacle Avoidance in Dynamic Environments via Tunnel-following MPC with Adaptive Guiding Vector Fields

  • Authors: Albin Dahlin, Yiannis Karayiannidis
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.15869
  • Pdf link: https://arxiv.org/pdf/2303.15869
  • Abstract
    This paper proposes a motion control scheme for robots operating in a dynamic environment with concave obstacles. A Model Predictive Controller (MPC) is constructed to drive the robot towards a goal position while ensuring collision avoidance without direct use of obstacle information in the optimization problem. This is achieved by guaranteeing tracking performance of an appropriately designed receding horizon path. The path is computed using a guiding vector field defined in a subspace of the free workspace where each point in the subspace satisfies a criteria for minimum distance to all obstacles. The effectiveness of the control scheme is illustrated by means of simulation.

Control Barrier Functions in Dynamic UAVs for Kinematic Obstacle Avoidance: A Collision Cone Approach

  • Authors: Manan Tayal, Shishir Kolathaya
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.15871
  • Pdf link: https://arxiv.org/pdf/2303.15871
  • Abstract
    Unmanned aerial vehicles (UAVs), specifically quadrotors, have revolutionized various industries with their maneuverability and versatility, but their safe operation in dynamic environments heavily relies on effective collision avoidance techniques. This paper introduces a novel technique for safely navigating a quadrotor along a desired route while avoiding kinematic obstacles. The proposed approach employs control barrier functions and utilizes collision cones to ensure that the quadrotor's velocity and the obstacle's velocity always point away from each other. In particular, we propose a new constraint formulation that ensures that the relative velocity between the quadrotor and the obstacle always avoids a cone of vectors that may lead to a collision. By showing that the proposed constraint is a valid control barrier function (CBF) for quadrotors, we are able to leverage its real-time implementation via Quadratic Programs (QPs), called CBF-QPs. We validate the effectiveness of the proposed CBF-QPs by demonstrating collision avoidance with moving obstacles under multiple scenarios in the PyBullet simulator. Furthermore, we compare the proposed approach with CBF-QPs from the literature, especially the well-known higher-order CBF-QPs (HO-CBF-QPs), and show that the latter are more conservative than the proposed approach. This comparison is also shown in detail in simulation.
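
For a single affine constraint, the CBF-QP safety filter the abstract refers to reduces to a closed-form projection. A generic sketch, not the paper's exact collision-cone constraint:

```python
import numpy as np

def cbf_qp_filter(u_des: np.ndarray, a: np.ndarray, b: float) -> np.ndarray:
    # Solve  min_u ||u - u_des||^2  s.t.  a @ u >= b  (one affine CBF
    # constraint, e.g. coming from dh/dt + alpha(h) >= 0). With a single
    # constraint the QP has a closed form: project the nominal input onto
    # the safe half-space whenever it violates the constraint.
    if a @ u_des >= b:
        return u_des
    return u_des + ((b - a @ u_des) / (a @ a)) * a

u_des = np.array([1.0, 0.0, 0.0])        # nominal (possibly unsafe) input
a, b = np.array([1.0, 1.0, 0.0]), 2.0    # toy constraint coefficients
print(cbf_qp_filter(u_des, a, b))
```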

Satellite Dynamics Toolbox Library: a tool to model multi-body space systems for robust control synthesis and analysis

  • Authors: Francesco Sanfedino, Daniel Alazard, Ervan Kassarian, Franca Somers
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.15872
  • Pdf link: https://arxiv.org/pdf/2303.15872
  • Abstract
    The level of maturity reached by robust control techniques nowadays contributes to a considerable reduction in the development time of an end-to-end control design for a spacecraft system. The advantage offered by this framework is twofold: all system uncertainties can be included from the very beginning of the design process, and the validation and verification (V&V) process is improved by fast detection of worst-case configurations that could escape a classical sample-based Monte Carlo simulation campaign. Before proceeding to control synthesis and analysis, a proper uncertain plant model has to be available in order to push these techniques to their performance limits. In this spirit, the Satellite Dynamics Toolbox Library (SDTlib) offers many features to model a spacecraft system in a multi-body fashion in SIMULINK. Parametric models can be easily built in Linear Fractional Transformation (LFT) form by including uncertainties and varying parameters with a minimal number of repetitions. Uncertain Linear Time Invariant (LTI) and uncertain Linear Parameter-Varying (LPV) controllers can then be synthesized and analyzed in a straightforward way. In this article, the authors present a tutorial, which can be downloaded at https://nextcloud.isae.fr/index.php/s/XDfRfHntejHTmmp, showing how to deal with an end-to-end robust design of a spacecraft mission and providing researchers with a benchmark to test their own algorithms.

STMixer: A One-Stage Sparse Action Detector

  • Authors: Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, Limin Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.15879
  • Pdf link: https://arxiv.org/pdf/2303.15879
  • Abstract
    Traditional video action detectors typically adopt a two-stage pipeline, where a person detector is first employed to generate actor boxes and then 3D RoIAlign is used to extract actor-specific features for classification. This detection paradigm requires multi-stage training and inference, and cannot capture context information outside the bounding box. Recently, a few query-based action detectors have been proposed to predict action instances in an end-to-end manner. However, they still lack adaptability in feature sampling and decoding, thus suffering from inferior performance or slower convergence. In this paper, we propose a new one-stage sparse action detector, termed STMixer. STMixer is based on two core designs. First, we present a query-based adaptive feature sampling module, which endows our STMixer with the flexibility of mining a set of discriminative features from the entire spatiotemporal domain. Second, we devise a dual-branch feature mixing module, which allows our STMixer to dynamically attend to and mix video features along the spatial and the temporal dimension respectively for better feature decoding. Coupling these two designs with a video backbone yields an efficient end-to-end action detector. Without bells and whistles, our STMixer obtains state-of-the-art results on the AVA, UCF101-24, and JHMDB datasets.

On Optimal Synchronization of Diffusively Coupled Heterogeneous Van der Pol Oscillators

  • Authors: Tabea Trummel, Zonglin Liu, Olaf Stursberg
  • Subjects: Systems and Control (eess.SY); Chaotic Dynamics (nlin.CD)
  • Arxiv link: https://arxiv.org/abs/2303.15890
  • Pdf link: https://arxiv.org/pdf/2303.15890
  • Abstract
    This paper proposes a novel method to achieve and preserve synchronization for a set of connected heterogeneous Van der Pol oscillators. Unlike state-of-the-art synchronization methods, in which a large coupling gain is applied to couple any pair of connected oscillators, the proposed method first casts the whole synchronization process into two phases. The first one covers the period from the beginning to the first instant of synchronization, while the second phase covers the following time, in which synchronization must be preserved. A large coupling gain is adopted for the first phase, while the averaged coupling gain needed to preserve synchronization in the second phase can be reduced significantly by using an offline optimized coupling law. The efficiency and performance of this method are confirmed by a set of numerical tests with different graphs and system dynamics.

ARMP: Autoregressive Motion Planning for Quadruped Locomotion and Navigation in Complex Indoor Environments

  • Authors: Jeonghwan Kim, Tianyu Li, Sehoon Ha
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.15900
  • Pdf link: https://arxiv.org/pdf/2303.15900
  • Abstract
    Generating natural and physically feasible motions for legged robots has been a challenging problem due to their complex dynamics. In this work, we introduce a novel learning-based framework, the autoregressive motion planner (ARMP), for quadruped locomotion and navigation. Our method can generate motion plans of arbitrary length in an autoregressive fashion, unlike most offline trajectory optimization algorithms, which assume a fixed trajectory length. To this end, we first construct a motion library by solving a dense set of trajectory optimization problems for diverse scenarios and parameter settings. Then we learn the motion manifold from the dataset in a supervised learning fashion. We show that the proposed ARMP can generate physically plausible motions for various tasks and situations. We also showcase that our method can be successfully integrated with recent robot navigation frameworks as a low-level controller and unleash the full capability of legged robots for complex indoor navigation.

In Sync: Exploring Synchronization to Increase Trust Between Humans and Non-humanoid Robots

  • Authors: Wieslaw Bartkowski (University of Warsaw), Andrzej Nowak (University of Warsaw), Filip Ignacy Czajkowski (University of Warsaw), Albrecht Schmidt (LMU Munich), Florian Müller (LMU Munich)
  • Subjects: Human-Computer Interaction (cs.HC); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.15917
  • Pdf link: https://arxiv.org/pdf/2303.15917
  • Abstract
    When we go for a walk with friends, we can observe an interesting effect: From step lengths to arm movements - our movements unconsciously align; they synchronize. Prior research found that this synchronization is a crucial aspect of human relations that strengthens social cohesion and trust. Generalizing from these findings in synchronization theory, we propose a dynamical approach that can be applied in the design of non-humanoid robots to increase trust. We contribute the results of a controlled experiment with 51 participants exploring our concept in a between-subjects design. For this, we built a prototype of a simple non-humanoid robot that can bend to follow human movements and vary the movement synchronization patterns. We found that synchronized movements lead to significantly higher ratings in an established questionnaire on trust between people and automation but did not influence the willingness to spend money in a trust game.

Unbiasing Hamiltonian Monte Carlo algorithms for a general Hamiltonian function

  • Authors: Tony Lelièvre, Régis Santet, Gabriel Stoltz
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2303.15918
  • Pdf link: https://arxiv.org/pdf/2303.15918
  • Abstract
    Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo method that makes it possible to sample high-dimensional probability measures. It relies on the integration of the Hamiltonian dynamics to propose a move, which is then accepted or rejected via a Metropolis procedure. Unbiased sampling is guaranteed by the preservation, by the numerical integrators, of two key properties of the Hamiltonian dynamics: volume preservation and reversibility up to momentum reversal. For separable Hamiltonian functions, some standard explicit numerical schemes, such as the Störmer-Verlet integrator, satisfy these properties. However, for numerical or physical reasons, one may consider a Hamiltonian function which is nonseparable, in which case the standard numerical schemes which preserve the volume and satisfy reversibility up to momentum reversal are implicit. In practice, such implicit schemes may admit many solutions or none, especially when the timestep is too large. We show here how to enforce the numerical reversibility, and thus unbiasedness, of HMC schemes in this context. Numerical results illustrate the relevance of this correction on simple problems.
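
One way to read the proposed correction: after each (possibly implicit) integration step, verify numerically that integrating back with reversed momentum returns the initial state, and reject the proposal otherwise. A sketch with an explicit integrator as placeholder (for a nonseparable Hamiltonian the step would be implicit and the check could genuinely fail):

```python
import numpy as np

def leapfrog(q, p, grad_U, dt=0.1):
    # Placeholder integrator; for a nonseparable Hamiltonian this would be an
    # implicit scheme solved by fixed-point iteration, which may fail to be
    # exactly reversible when the timestep is too large.
    p = p - 0.5 * dt * grad_U(q)
    q = q + dt * p
    p = p - 0.5 * dt * grad_U(q)
    return q, p

def is_numerically_reversible(step, q, p, tol=1e-12):
    # Run forward, flip the momentum, run again: we should land back at (q, -p).
    q1, p1 = step(q, p)
    q2, p2 = step(q1, -p1)
    return np.linalg.norm(q2 - q) < tol and np.linalg.norm(p2 + p) < tol

grad_U = lambda q: q                              # quadratic potential
step = lambda q, p: leapfrog(q, p, grad_U)
q, p = np.array([1.0]), np.array([0.5])
print(is_numerically_reversible(step, q, p))      # a failed check triggers rejection
```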

A source separation approach to temporal graph modelling for computer networks

  • Authors: Corentin Larroche
  • Subjects: Cryptography and Security (cs.CR); Applications (stat.AP); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2303.15950
  • Pdf link: https://arxiv.org/pdf/2303.15950
  • Abstract
    Detecting malicious activity within an enterprise computer network can be framed as a temporal link prediction task: given a sequence of graphs representing communications between hosts over time, the goal is to predict which edges should--or should not--occur in the future. However, standard temporal link prediction algorithms are ill-suited for computer network monitoring as they do not take account of the peculiar short-term dynamics of computer network activity, which exhibits sharp seasonal variations. In order to build a better model, we propose a source separation-inspired description of computer network activity: at each time step, the observed graph is a mixture of subgraphs representing various sources of activity, and short-term dynamics result from changes in the mixing coefficients. Both qualitative and quantitative experiments demonstrate the validity of our approach.
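
A toy rendering of the generative picture the abstract describes: fixed source subgraphs mixed with time-varying coefficients (the Poisson observation model below is our own assumption, added only to make the sketch generate graphs):

```python
import numpy as np

rng = np.random.default_rng(0)
K, n, T = 3, 20, 12                               # sources, hosts, time steps

S = (rng.random((K, n, n)) < 0.1).astype(float)   # fixed source subgraphs
w = np.abs(rng.standard_normal((T, K)))           # time-varying mixing coefficients
rates = np.einsum('tk,kij->tij', w, S)            # expected edge activity per step
A = rng.poisson(rates)                            # observed communication graphs
print(A.shape)                                    # (T, n, n)
```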

TraffNet: Learning Causality of Traffic Generation for Road Network Digital Twins

  • Authors: Ming Xu, Yunyi Ma, Ruimin Li, Geqi Qi, Xiangfu Meng, Haibo Jin
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.15954
  • Pdf link: https://arxiv.org/pdf/2303.15954
  • Abstract
    Road network digital twins (RNDTs) play a critical role in the development of next-generation intelligent transportation systems, enabling more precise traffic planning and control. To support just-in-time (JIT) decision making, RNDTs require a model that dynamically learns the traffic patterns from online sensor data and generates high-fidelity simulation results. Although current traffic prediction techniques based on graph neural networks have achieved state-of-the-art performance, these techniques only predict future traffic by mining correlations in historical traffic data, disregarding the causes of traffic generation, such as traffic demands and route selection. Therefore, their performance is unreliable for JIT decision making. To fill this gap, we introduce a novel deep learning framework called TraffNet that learns the causality of traffic volume from vehicle trajectory data. First, we use a heterogeneous graph to represent the road network, allowing the model to incorporate causal features of traffic volumes. Next, motivated by the traffic domain knowledge, we propose a traffic causality learning method to learn an embedding vector that encodes travel demands and path-level dependencies for each road segment. Then, we model temporal dependencies to match the underlying process of traffic generation. Finally, the experiments verify the utility of TraffNet. The code of TraffNet is available at https://github.com/mayunyi-1999/TraffNet_code.git.

Evolutionary Design of the Memory Subsystem

  • Authors: Josefa Díaz Álvarez, José L. Risco-Martín, J. Manuel Colmenar
  • Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2303.16074
  • Pdf link: https://arxiv.org/pdf/2303.16074
  • Abstract
    The memory hierarchy has a high impact on the performance and power consumption of the system. Moreover, current embedded systems, included in mobile devices, are specifically designed to run multimedia applications, which are memory intensive. This increases the pressure on the memory subsystem and affects the performance and energy consumption. In this regard, thermal problems, performance degradation and high energy consumption can cause irreversible damage to the devices. We address the optimization of the whole memory subsystem with three approaches integrated as a single methodology. Firstly, the thermal impact of the register file is analyzed and optimized. Secondly, the cache memory is addressed by optimizing the cache configuration according to running applications, improving both performance and power consumption. Finally, we simplify the design and evaluation process of general-purpose and customized dynamic memory managers in the main memory. To this aim, we apply different evolutionary algorithms in combination with memory simulators and profiling tools. This way, we are able to evaluate the quality of each candidate solution and take advantage of the exploration of solutions given by the optimization algorithm. We also provide an experimental evaluation in which our proposal is assessed using well-known benchmark applications.

Invariant preservation in machine learned PDE solvers via error correction

  • Authors: Nick McGreivy, Ammar Hakim
  • Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
  • Arxiv link: https://arxiv.org/abs/2303.16110
  • Pdf link: https://arxiv.org/pdf/2303.16110
  • Abstract
    Machine learned partial differential equation (PDE) solvers trade the reliability of standard numerical methods for potential gains in accuracy and/or speed. The only way for a solver to guarantee that it outputs the exact solution is to use a convergent method in the limit that the grid spacing $\Delta x$ and timestep $\Delta t$ approach zero. Machine learned solvers, which learn to update the solution at large $\Delta x$ and/or $\Delta t$, can never guarantee perfect accuracy. Some amount of error is inevitable, so the question becomes: how do we constrain machine learned solvers to give us the sorts of errors that we are willing to tolerate? In this paper, we design more reliable machine learned PDE solvers by preserving discrete analogues of the continuous invariants of the underlying PDE. Examples of such invariants include conservation of mass, conservation of energy, the second law of thermodynamics, and/or non-negative density. Our key insight is simple: to preserve invariants, at each timestep apply an error-correcting algorithm to the update rule. Though this strategy is different from how standard solvers preserve invariants, it is necessary to retain the flexibility that allows machine learned solvers to be accurate at large $\Delta x$ and/or $\Delta t$. This strategy can be applied to any autoregressive solver for any time-dependent PDE in arbitrary geometries with arbitrary boundary conditions. Although this strategy is very general, the specific error-correcting algorithms need to be tailored to the invariants of the underlying equations as well as to the solution representation and time-stepping scheme of the solver. The error-correcting algorithms we introduce have two key properties. First, by preserving the right invariants they guarantee numerical stability. Second, in closed or periodic systems they do so without degrading the accuracy of an already-accurate solver.
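
The simplest instance of the error-correction idea is enforcing mass conservation after each learned update; the paper's actual algorithms are tailored to each invariant, solution representation, and time-stepping scheme, but the pattern looks like this:

```python
import numpy as np

def conserve_mass(u_new: np.ndarray, u_old: np.ndarray) -> np.ndarray:
    # Additive correction spreading the mass defect uniformly so that
    # sum(u_new) == sum(u_old), for a closed or periodic system on a
    # uniform grid. Applied after every learned update step.
    return u_new + (u_old.sum() - u_new.sum()) / u_new.size

rng = np.random.default_rng(0)
u_old = np.ones(100)
u_new = u_old + 0.01 * rng.standard_normal(100)   # stand-in for a learned update
u_new = conserve_mass(u_new, u_old)
print(np.isclose(u_new.sum(), u_old.sum()))       # True: mass is preserved
```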

CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection

  • Authors: Lei Chen, Zhan Tong, Yibing Song, Gangshan Wu, Limin Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16118
  • Pdf link: https://arxiv.org/pdf/2303.16118
  • Abstract
    The relation modeling between actors and scene context advances video action detection where the correlation of multiple actors makes their action recognition challenging. Existing studies model each actor and scene relation to improve action recognition. However, the scene variations and background interference limit the effectiveness of this relation modeling. In this paper, we propose to select actor-related scene context, rather than directly leverage raw video scenario, to improve relation modeling. We develop a Cycle Actor-Context Relation network (CycleACR) where there is a symmetric graph that models the actor and context relations in a bidirectional form. Our CycleACR consists of the Actor-to-Context Reorganization (A2C-R) that collects actor features for context feature reorganizations, and the Context-to-Actor Enhancement (C2A-E) that dynamically utilizes reorganized context features for actor feature enhancement. Compared to existing designs that focus on C2A-E, our CycleACR introduces A2C-R for a more effective relation modeling. This modeling advances our CycleACR to achieve state-of-the-art performance on two popular action detection datasets (i.e., AVA and UCF101-24). We also provide ablation studies and visualizations as well to show how our cycle actor-context relation modeling improves video action detection. Code is available at https://github.com/MCG-NJU/CycleACR.

Dias: Dynamic Rewriting of Pandas Code

  • Authors: Stefanos Baziotis, Daniel Kang, Charith Mendis
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2303.16146
  • Pdf link: https://arxiv.org/pdf/2303.16146
  • Abstract
    In recent years, dataframe libraries, such as pandas, have exploded in popularity. Due to their flexibility, they are increasingly used in ad-hoc exploratory data analysis (EDA) workloads. These workloads are diverse, including custom functions which can span libraries or be written in pure Python. The majority of systems available to accelerate EDA workloads focus on bulk-parallel workloads, which contain vastly different computational patterns, typically within a single library. As a result, they can introduce excessive overheads for ad-hoc EDA workloads due to their expensive optimization techniques. Instead, we identify program rewriting as a lightweight technique which can offer substantial speedups while also avoiding slowdowns. We implemented our techniques in Dias, which rewrites notebook cells to be more efficient for ad-hoc EDA workloads. We develop techniques for efficient rewrites in Dias, including dynamic checking of preconditions under which rewrites are correct and just-in-time rewrites for notebook environments. We show that Dias can rewrite individual cells to be 57$\times$ faster compared to pandas and 1909$\times$ faster compared to optimized systems such as modin. Furthermore, Dias can accelerate whole notebooks by up to 3.6$\times$ compared to pandas and 26.4$\times$ compared to modin.
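
To illustrate the flavor of a dynamically checked rewrite (the rule below is our own toy example, not one of Dias's actual rewrite rules):

```python
import pandas as pd

def maybe_rewrite_apply(series: pd.Series, func) -> pd.Series:
    # Dynamically checked rewrite in the spirit of Dias: take the vectorized
    # fast path only when a runtime precondition guarantees it is equivalent
    # to the original apply(); otherwise fall back to the original code.
    if func is str.lower and series.map(type).eq(str).all():
        return series.str.lower()      # vectorized rewrite
    return series.apply(func)          # original, always-correct path

s = pd.Series(["Foo", "BAR", "baz"])
print(maybe_rewrite_apply(s, str.lower).tolist())
```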

Reactive Gait Composition with Stability: Dynamic Walking amidst Static and Moving Obstacles

  • Authors: Kunal Sanjay Narkhede, Mohamad Shafiee Motahar, Sushant Veer, Ioannis Poulakakis
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.16165
  • Pdf link: https://arxiv.org/pdf/2303.16165
  • Abstract
    This paper presents a modular approach to motion planning with provable stability guarantees for robots that move through changing environments via periodic locomotion behaviors. We focus on dynamic walkers as a paradigm for such systems, although the tools developed in this paper can be used to support general compositional approaches to robot motion planning with Dynamic Movement Primitives (DMPs). Our approach ensures a priori that the suggested plan can be stably executed. This is achieved by formulating the planning process as a Switching System with Multiple Equilibria (SSME) and proving that the system's evolution remains within explicitly characterized trapping regions in the state space under suitable constraints on the frequency of switching among the DMPs. These conditions effectively encapsulate the low-level stability limitations in a form that can be easily communicated to the planner to guarantee that the suggested plan is compatible with the robot's dynamics. Furthermore, we show how the available primitives can be safely composed online in a receding horizon manner to enable the robot to react to moving obstacles. The proposed framework is applied on 3D bipedal walking models under common modeling assumptions, and offers a modular approach towards stably integrating readily available low-level locomotion control and high-level planning methods.

Control Barrier Function-based Predictive Control for Close Proximity operation of UAVs inside a Tunnel

  • Authors: Vedant Mundheda, Damodar Datta K, Harikumar Kandath
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.16177
  • Pdf link: https://arxiv.org/pdf/2303.16177
  • Abstract
    This paper introduces a method for effectively controlling the movement of an Unmanned Aerial Vehicle (UAV) within a tunnel. The primary challenge of this problem lies in the UAV's exposure to nonlinear distance-dependent torques and forces generated by the tunnel walls, along with the need to operate safely within a defined region while in close proximity to these walls. To address this problem, the paper proposes a Model Predictive Control (MPC) framework with constraints based on Control Barrier Functions (CBFs). The paper approaches the issue in two distinct ways: first, by maintaining a safe distance from the tunnel walls to avoid the effects of both the walls and the ceiling, and second, by minimizing the distance from the walls to effectively manage the nonlinear forces associated with close-proximity tasks. Finally, the paper demonstrates the effectiveness of its approach through simulation tests on various close-proximity trajectories with a realistic model of the aerodynamic disturbances due to the proximity of the ceiling and boundary walls.

When to be critical? Performance and evolvability in different regimes of neural Ising agents

  • Authors: Sina Khajehabdollahi, Jan Prosi, Georg Martius, Anna Levina
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2303.16195
  • Pdf link: https://arxiv.org/pdf/2303.16195
  • Abstract
    It has long been hypothesized that operating close to the critical state is beneficial for natural and artificial systems and their evolution. We put this hypothesis to the test in a system of evolving foraging agents controlled by neural networks that can adapt the agents' dynamical regime throughout evolution. Surprisingly, we find that all populations that discover solutions evolve to be subcritical. Through a resilience analysis, we find that there are still benefits to starting the evolution in the critical regime. Namely, initially critical agents maintain their fitness level under environmental changes (for example, in the lifespan) and degrade gracefully when their genome is perturbed. At the same time, initially subcritical agents, even when evolved to the same fitness, are often inadequate to withstand changes in the lifespan and degrade catastrophically under genetic perturbations. Furthermore, we find that the optimal distance to criticality depends on task complexity. To test this, we introduce a hard and a simple task: for the hard task, agents evolve closer to criticality, whereas more subcritical solutions are found for the simple task. We verify that our results are independent of the selected evolutionary mechanisms by testing them on two principally different approaches: a genetic algorithm and an evolutionary strategy. In summary, our study suggests that although optimal behaviour in the simple task is obtained in a subcritical regime, initializing near criticality is important for efficiently finding optimal solutions to new tasks of unknown complexity.

Forecasting localized weather impacts on vegetation as seen from space with meteo-guided video prediction

  • Authors: Vitus Benson, Christian Requena-Mesa, Claire Robin, Lazaro Alonso, José Cortés, Zhihan Gao, Nora Linscheid, Mélanie Weynants, Markus Reichstein
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16198
  • Pdf link: https://arxiv.org/pdf/2303.16198
  • Abstract
    We present a novel approach for modeling vegetation response to weather in Europe as measured by the Sentinel 2 satellite. Existing satellite imagery forecasting approaches focus on photorealistic quality of the multispectral images, while derived vegetation dynamics have not yet received as much attention. We leverage both spatial and temporal context by extending state-of-the-art video prediction methods with weather guidance. We extend the EarthNet2021 dataset to be suitable for vegetation modeling by introducing a learned cloud mask and an appropriate evaluation scheme. Qualitative and quantitative experiments demonstrate superior performance of our approach over a wide variety of baseline methods, including leading approaches to satellite imagery forecasting. Additionally, we show how our modeled vegetation dynamics can be leveraged in a downstream task: inferring gross primary productivity for carbon monitoring. To the best of our knowledge, this work presents the first models for continental-scale vegetation modeling at fine resolution able to capture anomalies beyond the seasonal cycle, thereby paving the way for predictive assessments of vegetation status.

New submissions for Thu, 30 Mar 23

Keyword: efficient

The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers

  • Authors: Valentin Macé, Raphaël Boige, Felix Chalumeau, Thomas Pierrot, Guillaume Richard, Nicolas Perrin-Gilbert
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.16207
  • Pdf link: https://arxiv.org/pdf/2303.16207
  • Abstract
    In the context of neuroevolution, Quality-Diversity algorithms have proven effective in generating repertoires of diverse and efficient policies by relying on the definition of a behavior space. A natural goal induced by the creation of such a repertoire is trying to achieve behaviors on demand, which can be done by running the corresponding policy from the repertoire. However, in uncertain environments, two problems arise. First, policies can lack robustness and repeatability, meaning that multiple episodes under slightly different conditions often result in very different behaviors. Second, due to the discrete nature of the repertoire, solutions vary discontinuously. Here we present a new approach to achieve behavior-conditioned trajectory generation based on two mechanisms: First, MAP-Elites Low-Spread (ME-LS), which constrains the selection of solutions to those that are the most consistent in the behavior space. Second, the Quality-Diversity Transformer (QDT), a Transformer-based model conditioned on continuous behavior descriptors, which trains on a dataset generated by policies from a ME-LS repertoire and learns to autoregressively generate sequences of actions that achieve target behaviors. Results show that ME-LS produces consistent and robust policies, and that its combination with the QDT yields a single policy capable of achieving diverse behaviors on demand with high accuracy.

ytopt: Autotuning Scientific Applications for Energy Efficiency at Large Scales

  • Authors: Xingfu Wu, Prasanna Balaprakash, Michael Kruse, Jaehoon Koo, Brice Videau, Paul Hovland, Valerie Taylor, Brad Geltz, Siddhartha Jana, Mary Hall
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2303.16245
  • Pdf link: https://arxiv.org/pdf/2303.16245
  • Abstract
    As we enter the exascale computing era, efficiently utilizing power and optimizing the performance of scientific applications under power and energy constraints has become critical and challenging. We propose a low-overhead autotuning framework to autotune performance and energy for various hybrid MPI/OpenMP scientific applications at large scales and to explore the tradeoffs between application runtime and power/energy for energy efficient application execution, then use this framework to autotune four ECP proxy applications -- XSBench, AMG, SWFFT, and SW4lite. Our approach uses Bayesian optimization with a Random Forest surrogate model to effectively search parameter spaces with up to 6 million different configurations on two large-scale production systems, Theta at Argonne National Laboratory and Summit at Oak Ridge National Laboratory. The experimental results show that our autotuning framework at large scales has low overhead and achieves good scalability. Using the proposed autotuning framework to identify the best configurations, we achieve up to 91.59% performance improvement, up to 21.2% energy savings, and up to 37.84% EDP improvement on up to 4,096 nodes.
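
The core loop of Bayesian optimization with a Random Forest surrogate can be sketched as follows; the greedy acquisition rule and toy objective are our simplifications, not what ytopt actually ships:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def run_app(x: np.ndarray) -> float:
    # Stand-in for running the application with configuration x and
    # measuring the objective (runtime, energy, or EDP).
    return float((x ** 2).sum() + 0.05 * rng.standard_normal())

X = rng.uniform(-1.0, 1.0, size=(8, 3))              # initial random configs
y = np.array([run_app(x) for x in X])
for _ in range(20):
    surrogate = RandomForestRegressor(n_estimators=64, random_state=0).fit(X, y)
    cand = rng.uniform(-1.0, 1.0, size=(256, 3))     # candidate configurations
    x_next = cand[surrogate.predict(cand).argmin()]  # greedy acquisition (simplified)
    X, y = np.vstack([X, x_next]), np.append(y, run_app(x_next))

print("best config:", X[y.argmin()], "objective:", y.min())
```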

Communication-Efficient Vertical Federated Learning with Limited Overlapping Samples

  • Authors: Jingwei Sun, Ziyue Xu, Dong Yang, Vishwesh Nath, Wenqi Li, Can Zhao, Daguang Xu, Yiran Chen, Holger R. Roth
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16270
  • Pdf link: https://arxiv.org/pdf/2303.16270
  • Abstract
    Federated learning is a popular collaborative learning approach that enables clients to train a global model without sharing their local data. Vertical federated learning (VFL) deals with scenarios in which the data on clients have different feature spaces but share some overlapping samples. Existing VFL approaches suffer from high communication costs and cannot deal efficiently with limited overlapping samples commonly seen in the real world. We propose a practical vertical federated learning (VFL) framework called \textbf{one-shot VFL} that can solve the communication bottleneck and the problem of limited overlapping samples simultaneously based on semi-supervised learning. We also propose \textbf{few-shot VFL} to improve the accuracy further with just one more communication round between the server and the clients. In our proposed framework, the clients only need to communicate with the server once or only a few times. We evaluate the proposed VFL framework on both image and tabular datasets. Our methods can improve the accuracy by more than 46.5% and reduce the communication cost by more than 330$\times$ compared with state-of-the-art VFL methods when evaluated on CIFAR-10. Our code will be made publicly available at \url{https://nvidia.github.io/NVFlare/research/one-shot-vfl}.

A Machine Learning Outlook: Post-processing of Global Medium-range Forecasts

  • Authors: Shreya Agrawal, Rob Carver, Cenk Gazen, Eric Maddy, Vladimir Krasnopolsky, Carla Bromberg, Zack Ontiveros, Tyler Russell, Jason Hickey, Sid Boukabara
  • Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
  • Arxiv link: https://arxiv.org/abs/2303.16301
  • Pdf link: https://arxiv.org/pdf/2303.16301
  • Abstract
    Post-processing typically takes the outputs of a Numerical Weather Prediction (NWP) model and applies linear statistical techniques to produce improved localized forecasts, by including additional observations or determining systematic errors at a finer scale. In this pilot study, we investigate the benefits and challenges of using non-linear neural network (NN) based methods to post-process multiple weather features -- temperature, moisture, wind, geopotential height, precipitable water -- at 30 vertical levels, globally and at lead times up to 7 days. We show that we can achieve accuracy improvements of up to 12% (RMSE) in a field such as temperature at 850hPa for a 7 day forecast. However, we recognize the need to strengthen foundational work on objectively measuring a sharp and correct forecast. We discuss the challenges of using standard metrics such as root mean squared error (RMSE) or anomaly correlation coefficient (ACC) as we move from linear statistical models to more complex non-linear machine learning approaches for post-processing global weather forecasts.

Operator learning with PCA-Net: upper and lower complexity bounds

  • Authors: Samuel Lanthaler
  • Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2303.16317
  • Pdf link: https://arxiv.org/pdf/2303.16317
  • Abstract
    Neural operators are gaining attention in computational science and engineering. PCA-Net is a recently proposed neural operator architecture which combines principal component analysis (PCA) with neural networks to approximate an underlying operator. The present work develops approximation theory for this approach, improving and significantly extending previous work in this direction. In terms of qualitative bounds, this paper derives a novel universal approximation result, under minimal assumptions on the underlying operator and the data-generating distribution. In terms of quantitative bounds, two potential obstacles to efficient operator learning with PCA-Net are identified, and made rigorous through the derivation of lower complexity bounds; the first relates to the complexity of the output distribution, measured by a slow decay of the PCA eigenvalues. The second relates to the inherent complexity of the space of operators between infinite-dimensional input and output spaces, resulting in a rigorous and quantifiable statement of the curse of dimensionality. In addition to these lower bounds, upper complexity bounds are derived; first, a suitable smoothness criterion is shown to ensure an algebraic decay of the PCA eigenvalues. Then, it is shown that PCA-Net can overcome the general curse of dimensionality for specific operators of interest, arising from the Darcy flow and the Navier-Stokes equations.
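
A minimal numerical sketch of the PCA-Net idea: project inputs and outputs onto PCA bases and learn a map between the coefficients. Here a linear least-squares map stands in for the neural network, and the toy low-rank data are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy operator-learning data: inputs u and outputs S(u) on a 64-point grid.
r = 6
U = rng.standard_normal((400, r)) @ rng.standard_normal((r, 64))  # low-rank inputs
G = 0.1 * rng.standard_normal((64, 64))                           # "unknown" operator
Y = U @ G

def top_components(X: np.ndarray, k: int) -> np.ndarray:
    # Principal directions of the (uncentered) data matrix via SVD.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]

k = 8
Vin, Vout = top_components(U, k), top_components(Y, k)
A, B = U @ Vin.T, Y @ Vout.T                 # PCA coefficients of inputs/outputs
W, *_ = np.linalg.lstsq(A, B, rcond=None)    # linear stand-in for the neural map
Y_pred = (A @ W) @ Vout                      # decode coefficients back to functions
print("relative error:", np.linalg.norm(Y_pred - Y) / np.linalg.norm(Y))
```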

A Unified Single-stage Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI

  • Authors: Tianyuan Yao, Nancy Newlin, Praitayini Kanakaraj, Vishwesh nath, Leon Y Cai, Karthik Ramadass, Kurt Schilling, Bennett A. Landman, Yuankai Huo
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16376
  • Pdf link: https://arxiv.org/pdf/2303.16376
  • Abstract
    Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in microstructure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation therefore require a signal representation that extends over the radial as well as the angular domain. Multiple approaches have been proposed that can model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed that offer faster inference and higher inter-scan consistency than traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required, since the learning process relies on various intermediate representations, such as the simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation from heterogeneous multi-shell diffusion MRI sequences. We study Human Connectome Project (HCP) young adults with test-retest scans. The experimental results show that the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences.

ProductAE: Toward Deep Learning Driven Error-Correction Codes of Large Dimensions

  • Authors: Mohammad Vahid Jamali, Hamid Saber, Homayoon Hatami, Jung Hyun Bae
  • Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16424
  • Pdf link: https://arxiv.org/pdf/2303.16424
  • Abstract
    While decades of theoretical research have led to the invention of several classes of error-correction codes, the design of such codes is an extremely challenging task, mostly driven by human ingenuity. Recent studies demonstrate that such designs can be effectively automated and accelerated via tools from machine learning (ML), thus enabling ML-driven classes of error-correction codes with promising performance gains compared to classical designs. A fundamental challenge, however, is that it is prohibitively complex, if not impossible, to design and train fully ML-driven encoder and decoder pairs for large code dimensions. In this paper, we propose Product Autoencoder (ProductAE) -- a computationally-efficient family of deep learning driven (encoder, decoder) pairs -- aimed at enabling the training of relatively large codes (both encoder and decoder) with a manageable training complexity. We build upon ideas from classical product codes and propose constructing large neural codes using smaller code components. ProductAE boils down the complex problem of training the encoder and decoder for a large code dimension $k$ and blocklength $n$ to less-complex sub-problems of training encoders and decoders for smaller dimensions and blocklengths. Our training results show successful training of ProductAEs of dimensions as large as $k = 300$ bits with meaningful performance gains compared to state-of-the-art classical and neural designs. Moreover, we demonstrate excellent robustness and adaptivity of ProductAEs to channel models different than the ones used for training.
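
    The core product-code idea, composing a large code from two small component encoders applied row-wise and column-wise, can be sketched in a few lines. Below, two torch.nn.Linear layers stand in for the learned component encoders; all dimensions are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

# A (k1 x k2) message is encoded row-wise by one small component encoder and
# column-wise by another, composing a k = k1*k2 code from small pieces.
k1, k2, n1, n2 = 10, 30, 20, 60

enc_row = nn.Linear(k2, n2)                    # first component: k2 -> n2
enc_col = nn.Linear(k1, n1)                    # second component: k1 -> n1

msg = torch.randint(0, 2, (k1, k2)).float()    # k = 300 message bits
step1 = enc_row(msg)                           # encode rows:    (k1, n2)
codeword = enc_col(step1.T).T                  # encode columns: (n1, n2)
print("message:", tuple(msg.shape), "-> codeword:", tuple(codeword.shape))
```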

Learning Excavation of Rigid Objects with Offline Reinforcement Learning

  • Authors: Shiyu Jin, Zhixian Ye, Liangjun Zhang
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.16427
  • Pdf link: https://arxiv.org/pdf/2303.16427
  • Abstract
    Autonomous excavation is a challenging task. The unknown contact dynamics between the excavator bucket and the terrain could easily result in large contact forces and jamming problems during excavation. Traditional model-based methods struggle to handle such problems due to complex dynamic modeling. In this paper, we formulate the excavation skills with three novel manipulation primitives. We propose to learn the manipulation primitives with offline reinforcement learning (RL) to avoid large amounts of online robot interactions. The proposed method can learn efficient penetration skills from sub-optimal demonstrations, which contain sub-trajectories that can be "stitched" together to formulate an optimal trajectory without causing jamming. We evaluate the proposed method with extensive experiments on excavating a variety of rigid objects and demonstrate that the learned policy outperforms the demonstrations. We also show that the learned policy can quickly adapt to unseen and challenging fragmented rocks with online fine-tuning.

Learning Complicated Manipulation Skills via Deterministic Policy with Limited Demonstrations

  • Authors: Liu Haofeng, Chen Yiwen, Tan Jiayi, Marcelo H Ang
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.16469
  • Pdf link: https://arxiv.org/pdf/2303.16469
  • Abstract
    Combined with demonstrations, deep reinforcement learning can efficiently develop policies for manipulators. However, it takes time to collect sufficient high-quality demonstrations in practice, and human demonstrations may be unsuitable for robots. The non-Markovian process and over-reliance on demonstrations are further challenges. For example, we found that RL agents are sensitive to demonstration quality in manipulation tasks and struggle to adapt to demonstrations directly from humans. It is therefore challenging to leverage low-quality and insufficient demonstrations to assist reinforcement learning in training better policies; sometimes, limited demonstrations even lead to worse performance. We propose a new algorithm named TD3fG (TD3 learning from a generator) to solve these problems. It forms a smooth transition from learning from experts to learning from experience. This innovation helps agents extract prior knowledge while reducing the detrimental effects of the demonstrations. Our algorithm performs well in Adroit manipulator and MuJoCo tasks with limited demonstrations.

Point2Pix: Photo-Realistic Point Cloud Rendering via Neural Radiance Fields

  • Authors: Tao Hu, Xiaogang Xu, Shu Liu, Jiaya Jia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16482
  • Pdf link: https://arxiv.org/pdf/2303.16482
  • Abstract
    Synthesizing photo-realistic images from a point cloud is challenging because of the sparsity of point cloud representation. Recent Neural Radiance Fields (NeRF) and their extensions are proposed to synthesize realistic images from 2D input. In this paper, we present Point2Pix as a novel point renderer to link the 3D sparse point clouds with 2D dense image pixels. Taking advantage of the point cloud 3D prior and NeRF rendering pipeline, our method can synthesize high-quality images from colored point clouds, generally for novel indoor scenes. To improve the efficiency of ray sampling, we propose point-guided sampling, which focuses on valid samples. Also, we present Point Encoding to build Multi-scale Radiance Fields that provide discriminative 3D point features. Finally, we propose Fusion Encoding to efficiently synthesize high-quality images. Extensive experiments on the ScanNet and ArkitScenes datasets demonstrate the effectiveness and generalization of our method.

TriVol: Point Cloud Rendering via Triple Volumes

  • Authors: Tao Hu, Xiaogang Xu, Ruihang Chu, Jiaya Jia
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16485
  • Pdf link: https://arxiv.org/pdf/2303.16485
  • Abstract
    Existing learning-based methods for point cloud rendering adopt various 3D representations and feature querying mechanisms to alleviate the sparsity problem of point clouds. However, artifacts still appear in rendered images, due to the challenges in extracting continuous and discriminative 3D features from point clouds. In this paper, we present a dense yet lightweight 3D representation, named TriVol, that can be combined with NeRF to render photo-realistic images from point clouds. Our TriVol consists of triple slim volumes, each of which is encoded from the point cloud. TriVol has two advantages. First, it fuses respective fields at different scales and thus extracts local and non-local features for discriminative representation. Second, since the volume size is greatly reduced, our 3D decoder can be efficiently inferred, allowing us to increase the resolution of the 3D space to render more point details. Extensive experiments on different benchmarks with varying kinds of scenes/objects demonstrate our framework's effectiveness compared with current approaches. Moreover, our framework has excellent generalization ability to render a category of scenes/objects without fine-tuning.

Development of a deep learning-based tool to assist wound classification

  • Authors: Po-Hsuan Huang, Yi-Hsiang Pan, Ying-Sheng Luo, Yi-Fan Chen, Yu-Cheng Lo, Trista Pei-Chun Chen, Cherng-Kang Perng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.16522
  • Pdf link: https://arxiv.org/pdf/2303.16522
  • Abstract
    This paper presents a deep learning-based wound classification tool that can assist medical personnel without wound-care specialization to classify five key wound conditions, namely deep wound, infected wound, arterial wound, venous wound, and pressure wound, given color images captured using readily available cameras. The accuracy of the classification is vital for appropriate wound management. The proposed wound classification method adopts a multi-task deep learning framework that leverages the relationships among the five key wound conditions for a unified wound classification architecture. Using differences in Cohen's kappa coefficients as the metric to compare our proposed model with humans, the performance of our model was better than or non-inferior to that of all human medical personnel. Our convolutional neural network-based model is the first to classify five tasks of deep, infected, arterial, venous, and pressure wounds simultaneously with good accuracy. The proposed model is compact and matches or exceeds the performance of human doctors and nurses. Medical personnel who do not specialize in wound care can potentially benefit from an app equipped with the proposed deep learning model.

RusTitW: Russian Language Text Dataset for Visual Text in-the-Wild Recognition

  • Authors: Igor Markov, Sergey Nesteruk, Andrey Kuznetsov, Denis Dimitrov
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16531
  • Pdf link: https://arxiv.org/pdf/2303.16531
  • Abstract
    Information surrounds people in modern life. Text is a very efficient type of information that people have used for communication for centuries. However, automated text-in-the-wild recognition remains a challenging problem. The major limitation for a DL system is the lack of training data. For competitive performance, the training set must contain many samples that replicate real-world cases. While there are many high-quality datasets for English text recognition, there are no available datasets for the Russian language. In this paper, we present a large-scale human-labeled dataset for Russian text recognition in-the-wild. We also publish a synthetic dataset and code to reproduce the generation process.

Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks

  • Authors: Haoqi Yuan, Chi Zhang, Hongcheng Wang, Feiyang Xie, Penglin Cai, Hao Dong, Zongqing Lu
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.16563
  • Pdf link: https://arxiv.org/pdf/2303.16563
  • Abstract
    We study building a multi-task agent in Minecraft. Without human demonstrations, solving long-horizon tasks in this open-ended environment with reinforcement learning (RL) is extremely sample inefficient. To tackle the challenge, we decompose solving Minecraft tasks into learning basic skills and planning over the skills. We propose three types of fine-grained basic skills in Minecraft, and use RL with intrinsic rewards to accomplish basic skills with high success rates. For skill planning, we use Large Language Models to find the relationships between skills and build a skill graph in advance. When the agent is solving a task, our skill search algorithm walks on the skill graph and generates the proper skill plans for the agent. In experiments, our method accomplishes 24 diverse Minecraft tasks, where many tasks require sequentially executing more than 10 skills. Our method outperforms baselines in most tasks by a large margin. The project's website and code can be found at https://sites.google.com/view/plan4mc.
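
    The skill-graph planning step can be illustrated with a toy sketch: given prerequisite edges (of the kind an LLM might propose), a depth-first walk over the graph emits an executable skill sequence for a target. The skill names below are hypothetical examples, not the paper's skill set.

```python
# Depth-first planning over a skill graph: prerequisites are visited before
# the skill that needs them, yielding an executable ordering.
skill_graph = {
    "craft_wooden_pickaxe": ["collect_wood", "craft_planks"],
    "craft_planks": ["collect_wood"],
    "mine_stone": ["craft_wooden_pickaxe"],
    "collect_wood": [],
}

def plan(target, graph, done=None):
    done = set() if done is None else done
    order = []
    for pre in graph.get(target, []):
        if pre not in done:
            order += plan(pre, graph, done)
    if target not in done:
        done.add(target)
        order.append(target)
    return order

print(plan("mine_stone", skill_graph))
# ['collect_wood', 'craft_planks', 'craft_wooden_pickaxe', 'mine_stone']
```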

Efficient and Reconfigurable Optimal Planning in Large-Scale Systems Using Hierarchical Finite State Machines

  • Authors: Elis Stefansson, Karl H. Johansson
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.16567
  • Pdf link: https://arxiv.org/pdf/2303.16567
  • Abstract
    In this paper, we consider a planning problem for a large-scale system modelled as a hierarchical finite state machine (HFSM) and develop a control algorithm for computing optimal plans between any two states. The control algorithm consists of two steps: a preprocessing step computing optimal exit costs for each machine in the HFSM, with time complexity scaling linearly with the number of machines, and a query step that rapidly computes optimal plans, truncating irrelevant parts of the HFSM using the optimal exit costs, with time complexity scaling near-linearly with the depth of the HFSM. The control algorithm is reconfigurable in the sense that a change in the HFSM is efficiently handled, updating only needed parts in the preprocessing step to account for the change, with time complexity linear in the depth of the HFSM. We validate our algorithm on a robotic application, comparing with Dijkstra's algorithm and Contraction Hierarchies. Our algorithm outperforms both.
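
    The two-step structure (preprocess exit costs, then answer queries fast) can be illustrated on a flat graph stand-in: one Dijkstra pass per machine caches the cost-to-exit of every internal state, and queries then reuse the cache instead of re-exploring the machine. The sketch below assumes a toy three-state machine; the actual algorithm applies this idea hierarchically over nested machines.

```python
import heapq

def dijkstra(adj, src):
    # Standard Dijkstra over an adjacency dict {node: [(neighbor, weight)]}.
    dist, pq = {src: 0.0}, [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

machine = {"a": [("b", 1.0)], "b": [("exit", 2.0)], "c": [("exit", 0.5)]}
reverse = {}
for u, edges in machine.items():          # reverse edges to get cost-to-exit
    for v, w in edges:
        reverse.setdefault(v, []).append((u, w))

exit_cost = dijkstra(reverse, "exit")     # preprocessing: one pass per machine
print(exit_cost)                          # queries read these cached costs
```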

Satisfiability of Non-Linear Transcendental Arithmetic as a Certificate Search Problem

  • Authors: Enrico Lipparini, Stefan Ratschan
  • Subjects: Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2303.16582
  • Pdf link: https://arxiv.org/pdf/2303.16582
  • Abstract
    For typical first-order logical theories, satisfying assignments have a straightforward finite representation that can directly serve as a certificate that a given assignment satisfies the given formula. For non-linear real arithmetic with transcendental functions, however, no general finite representation of satisfying assignments is available. Hence, in this paper, we introduce a different form of satisfiability certificate for this theory, formulate the satisfiability verification problem as the problem of searching for such a certificate, and show how to perform this search in a systematic fashion. This not only eases the independent verification of results, but also allows the systematic design of new, efficient search techniques. Computational experiments document that the resulting method is able to prove satisfiability of a substantially higher number of benchmark problems than existing methods.

4D Facial Expression Diffusion Model

  • Authors: Kaifeng Zou, Sylvain Faisan, Boyang Yu, Sébastien Valette, Hyewon Seo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16611
  • Pdf link: https://arxiv.org/pdf/2303.16611
  • Abstract
    Facial expression generation is one of the most challenging and long-sought aspects of character animation, with many interesting applications. This challenging task has traditionally relied heavily on digital craftspersons and remains largely unexplored. In this paper, we introduce a generative framework for generating 3D facial expression sequences (i.e. 4D faces) that can be conditioned on different inputs to animate an arbitrary 3D face mesh. It is composed of two tasks: (1) learning the generative model that is trained over a set of 3D landmark sequences, and (2) generating 3D mesh sequences of an input facial mesh driven by the generated landmark sequences. The generative model is based on a Denoising Diffusion Probabilistic Model (DDPM), which has achieved remarkable success in generative tasks of other domains. While it can be trained unconditionally, its reverse process can still be conditioned by various condition signals. This allows us to efficiently develop several downstream tasks involving various conditional generation, by using expression labels, text, partial sequences, or simply a facial geometry. To obtain the full mesh deformation, we then develop a landmark-guided encoder-decoder to apply the geometrical deformation embedded in landmarks on a given facial mesh. Experiments show that our model has learned to generate realistic, high-quality expressions solely from a dataset of relatively small size, improving over the state-of-the-art methods. Videos and qualitative comparisons with other methods can be found at https://github.com/ZOUKaifeng/4DFM. Code and models will be made available upon acceptance.

AraSpot: Arabic Spoken Command Spotting

  • Authors: Mahmoud Salhab, Haidar Harmanani
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.16621
  • Pdf link: https://arxiv.org/pdf/2303.16621
  • Abstract
    Spoken keyword spotting (KWS) is the task of identifying a keyword in an audio stream and is widely used in smart devices at the edge in order to activate voice assistants and perform hands-free tasks. The task is daunting as there is a need, on the one hand, to achieve high accuracy while at the same time ensuring that such systems continue to run efficiently on low-power devices with possibly limited computational capabilities. This work presents AraSpot for Arabic keyword spotting, trained on 40 Arabic keywords, using different online data augmentations and introducing the ConformerGRU model architecture. Finally, we further improve the performance of the model by training a text-to-speech model for synthetic data generation. AraSpot achieved a state-of-the-art (SOTA) result of 99.59%, outperforming previous approaches.

Optimizing Reconfigurable Intelligent Surfaces for Small Data Packets: A Subarray Approach

  • Authors: Anders Enqvist, Özlem Tuğfe Demir, Cicek Cavdar, Emil Björnson
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2303.16625
  • Pdf link: https://arxiv.org/pdf/2303.16625
  • Abstract
    In this paper, we examine the energy consumption of a user equipment (UE) when it transmits a finite-sized data packet. The receiving base station (BS) controls a reconfigurable intelligent surface (RIS) that can be utilized to improve the channel conditions, if additional pilot signals are transmitted to configure the RIS. We derive a formula for the energy consumption taking both the pilot and data transmission powers into account. By dividing the RIS into subarrays consisting of multiple RIS elements using the same reflection coefficient, the pilot overhead can be tuned to minimize the energy consumption while maintaining parts of the aperture gain. Our analytical results show that there exists an energy-minimizing subarray size. For small data blocks and when the channel conditions between the BS and UE are favorable compared to the path to the RIS, the energy consumption is minimized using large subarrays. When the channel conditions to the RIS are better and the data blocks are large, it is preferable to use fewer elements per subarray and potentially configure the elements individually.
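
    The trade-off behind the energy-minimizing subarray size can be reproduced with a deliberately crude toy model: larger subarrays need fewer configuration pilots but sacrifice aperture gain, so total energy has an interior minimizer. The gain model, link-budget constants, and packet size below are all illustrative assumptions, not the paper's derivation.

```python
import numpy as np

# Toy energy model E(L) = E_pilot(L) + E_data(L) for subarray size L.
# Pilot cost falls with L (fewer subarrays to configure) while data cost
# rises with L (lost aperture gain); gain ~ N^2 / L is a crude stand-in.
N = 256                              # total RIS elements
bits = 4096                          # finite packet size (bits)
P_tx, T_sym = 0.1, 1e-3              # transmit power (W), symbol time (s)

def energy(L):
    pilots = N // L + 1              # one pilot per subarray (+1 direct path)
    e_pilot = pilots * P_tx * T_sym
    snr = (N**2 / L) * 1e-2          # toy gain times arbitrary link budget
    rate = np.log2(1 + snr)          # bits per symbol
    e_data = (bits / rate) * T_sym * P_tx
    return e_pilot + e_data

sizes = [1, 2, 4, 8, 16, 32, 64, 128]
for L in sizes:
    print(f"L={L:4d}: E={energy(L)*1e3:7.3f} mJ")
print("energy-minimizing subarray size:", min(sizes, key=energy))
```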

Modeling online adaptive navigation in virtual environments based on PID control

  • Authors: Yuyang Wang, Jean-Rémy Chardonnet, Frédéric Merienne
  • Subjects: Human-Computer Interaction (cs.HC); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.16635
  • Pdf link: https://arxiv.org/pdf/2303.16635
  • Abstract
    It is well known that locomotion-dominated navigation tasks can strongly provoke cybersickness effects. Past research has proposed numerous approaches to tackle this issue based on offline considerations. In this work, a novel approach to mitigate cybersickness is presented based on online adaptive navigation. Considering the Proportional-Integral-Derivative (PID) control method, we propose a mathematical model for online adaptive navigation parameterized with several parameters, taking as input the users' electro-dermal activity (EDA), an efficient indicator to measure the cybersickness level, and providing as output adapted navigation accelerations. Minimizing the cybersickness level is therefore regarded as an argument optimization problem: find the PID model parameters which can reduce the severity of cybersickness. User studies were organized to collect non-adapted navigation accelerations and the corresponding EDA signals. A deep neural network was then formulated to learn the correlation between EDA and navigation accelerations. The hyperparameters of the network were obtained through the Optuna open-source framework. To validate the performance of the optimized online adaptive navigation developed through the PID control, we performed an analysis in a simulated user study based on the pre-trained deep neural network. Results indicate a significant reduction of cybersickness in terms of EDA signal analysis and motion sickness dose value. This is a pioneering work which presents a systematic strategy for adaptive navigation settings from a theoretical point of view.
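
    The PID component reduces to the textbook control law $u(t) = K_p e(t) + K_i \int e \, dt + K_d \, de/dt$, here with the error taken between a target comfort level and the measured EDA. The sketch below uses placeholder gains and signals; only the control structure mirrors the idea described above.

```python
# Minimal PID controller: the output (an adapted navigation acceleration)
# is driven by the error between a target EDA level and the measured EDA.
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_err = 0.0, 0.0

    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_err) / self.dt
        self.prev_err = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=0.8, ki=0.1, kd=0.05, dt=0.1)   # placeholder gains
eda_target = 0.2                             # desired (low) arousal level
for eda_measured in [0.9, 0.7, 0.5, 0.35, 0.25]:
    accel = pid.step(eda_target - eda_measured)
    print(f"adapted acceleration command: {accel:+.3f}")
```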

Learning Augmented, Multi-Robot Long-Horizon Navigation in Partially Mapped Environments

  • Authors: Abhish Khanal, Gregory J. Stein
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.16654
  • Pdf link: https://arxiv.org/pdf/2303.16654
  • Abstract
    We present a novel approach for efficient and reliable goal-directed long-horizon navigation for a multi-robot team in a structured, unknown environment by predicting statistics of unknown space. Building on recent work in learning-augmented model based planning under uncertainty, we introduce a high-level state and action abstraction that lets us approximate the challenging Dec-POMDP into a tractable stochastic MDP. Our Multi-Robot Learning over Subgoals Planner (MR-LSP) guides agents towards coordinated exploration of regions more likely to reach the unseen goal. We demonstrate improvement in cost against other multi-robot strategies; in simulated office-like environments, we show that our approach saves 13.29% (2 robots) and 4.6% (3 robots) average cost versus standard non-learned optimistic planning and a learning-informed baseline.

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

  • Authors: Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, Yu Qiao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16727
  • Pdf link: https://arxiv.org/pdf/2303.16727
  • Abstract
    Scale is the primary factor for building a powerful foundation model that could well generalize to a variety of downstream tasks. However, it is still challenging to train video foundation models with billions of parameters. This paper shows that video masked autoencoder (VideoMAE) is a scalable and general self-supervised pre-trainer for building video foundation models. We scale the VideoMAE in both model and data with a core design. Specifically, we present a dual masking strategy for efficient pre-training, with an encoder operating on a subset of video tokens and a decoder processing another subset of video tokens. Although VideoMAE is very efficient due to the high masking ratio in the encoder, masking the decoder can still further reduce the overall computational cost. This enables the efficient pre-training of billion-level models in video. We also use a progressive training paradigm that involves an initial pre-training on a diverse multi-sourced unlabeled dataset, followed by a post-pre-training on a mixed labeled dataset. Finally, we successfully train a video ViT model with a billion parameters, which achieves a new state-of-the-art performance on the datasets of Kinetics (90.0% on K400 and 89.9% on K600) and Something-Something (68.7% on V1 and 77.0% on V2). In addition, we extensively verify the pre-trained video ViT models on a variety of downstream tasks, demonstrating its effectiveness as a general video representation learner.
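
    The dual-masking idea is simple to sketch: the encoder processes only a small visible subset of video tokens, and the decoder reconstructs only another sampled subset of the masked tokens rather than all of them. The token count and keep ratios below are illustrative.

```python
import torch

# Dual masking: sample an encoder-visible subset of tokens, then sample the
# decoder's reconstruction targets from the remaining masked tokens.
num_tokens, enc_keep, dec_keep = 1568, 0.10, 0.50

perm = torch.randperm(num_tokens)
n_vis = int(num_tokens * enc_keep)
visible = perm[:n_vis]                 # tokens the encoder processes
masked = perm[n_vis:]                  # candidates for reconstruction
dec_targets = masked[torch.randperm(len(masked))[: int(len(masked) * dec_keep)]]

print(f"encoder tokens: {len(visible)}, decoder targets: {len(dec_targets)} "
      f"of {num_tokens} total")
```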

Predictive Resource Allocation in mmWave Systems with Rotation Detection

  • Authors: Yifei Sun, Bojie Lv, Rui Wang, Haisheng Tan, Francis C. M. Lau
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.16734
  • Pdf link: https://arxiv.org/pdf/2303.16734
  • Abstract
    Millimeter wave (mmWave) has been regarded as a promising technology to support high-capacity communications in the 5G era. However, its high-layer performance, such as latency and packet drop rate in the long term, highly depends on resource allocation, because the mmWave channel suffers significant fluctuation with rotating users due to the sparse mmWave channel property and the limited field-of-view (FoV) of antenna arrays. In this paper, downlink transmission scheduling considering rotation of user equipments (UEs) and limited antenna FoV in an mmWave system is optimized via a novel approximate Markov decision process (MDP) method. Specifically, we consider the joint downlink UE selection and power allocation in a number of frames where future orientations of rotating UEs can be predicted via embedded motion sensors. The problem is formulated as a finite-horizon MDP with non-stationary state transition probabilities. A novel low-complexity solution framework is proposed via one iteration step over a base policy whose average future cost can be predicted with analytical expressions. It is demonstrated by simulations that compared with existing benchmarks, the proposed scheme can schedule the downlink transmission and suppress the packet drop rate efficiently in non-stationary mmWave links.

Improving Code Generation by Training with Natural Language Feedback

  • Authors: Angelica Chen, Jérémy Scheurer, Tomasz Korbak, Jon Ander Campos, Jun Shern Chan, Samuel R. Bowman, Kyunghyun Cho, Ethan Perez
  • Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16749
  • Pdf link: https://arxiv.org/pdf/2303.16749
  • Abstract
    The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development. We build upon this observation by formalizing an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF). ILF requires only a small amount of human-written feedback during training and does not require the same feedback at test time, making it both user-friendly and sample-efficient. We further show that ILF can be seen as a form of minimizing the KL divergence to the ground truth distribution and demonstrate a proof-of-concept on a neural program synthesis task. We use ILF to improve a Codegen-Mono 6.1B model's pass@1 rate by 38% relative (and 10% absolute) on the Mostly Basic Python Problems (MBPP) benchmark, outperforming both fine-tuning on MBPP and fine-tuning on repaired programs written by humans. Overall, our results suggest that learning from human-written natural language feedback is both more effective and sample-efficient than training exclusively on demonstrations for improving an LLM's performance on code generation tasks.
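
    The ILF control flow can be sketched as a loop over tasks: sample a program, collect language feedback on failures, have the model refine the program, and keep only refinements that pass the tests as fine-tuning data. The helper functions below are toy stand-ins (simple lambdas) so the sketch runs end to end; none of this is the authors' implementation.

```python
# One ILF round with toy stand-ins: "programs" are strings, the test harness
# checks for a return statement, and lambdas play the roles of the LLM
# sampler, the human annotator, and the refinement model.
def ilf_round(tasks, sample, collect_feedback, refine, passes_tests):
    refinements = []
    for task in tasks:
        program = sample(task)                      # initial model attempt
        if passes_tests(program, task):
            continue                                # already correct
        feedback = collect_feedback(task, program)  # natural language critique
        fixed = refine(task, program, feedback)     # model incorporates feedback
        if passes_tests(fixed, task):               # keep only verified repairs
            refinements.append((task, fixed))
    return refinements                              # data for fine-tuning

tasks = ["add", "mul"]
sample = lambda t: f"def {t}(a, b): pass"
passes_tests = lambda p, t: "return" in p
collect_feedback = lambda t, p: f"{t} should return a result instead of pass"
refine = lambda t, p, f: f"def {t}(a, b): return a {'+' if t == 'add' else '*'} b"

print(ilf_round(tasks, sample, collect_feedback, refine, passes_tests))
```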

Judicial Intelligent Assistant System: Extracting Events from Divorce Cases to Detect Disputes for the Judge

  • Authors: Yuan Zhang, Chuanyi Li, Yu Sheng, Jidong Ge, Bin Luo
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2303.16751
  • Pdf link: https://arxiv.org/pdf/2303.16751
  • Abstract
    In the formal procedure of civil cases, the textual materials provided by different parties describe the development process of the cases. It is a difficult but necessary task to extract the key information for the cases from these textual materials and to clarify the dispute focus of the related parties. Currently, officers read the materials manually and use methods such as keyword searching and regular matching to get the target information. These approaches are time-consuming and heavily dependent on the prior knowledge and carefulness of the officers. To assist the officers in enhancing working efficiency and accuracy, we propose an approach to detect disputes from divorce cases based on a two-round-labeling event extracting technique in this paper. We implement the Judicial Intelligent Assistant (JIA) system according to the proposed approach to 1) automatically extract focus events from divorce case materials, 2) align events by identifying co-reference among them, and 3) detect conflicts among events brought by the plaintiff and the defendant. With the JIA system, it is convenient for judges to determine the disputed issues. Experimental results demonstrate that the proposed approach and system can obtain the focus of cases and detect conflicts more effectively and efficiently compared with existing methods.

Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture

  • Authors: Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Ji-Rong Wen
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2303.16753
  • Pdf link: https://arxiv.org/pdf/2303.16753
  • Abstract
    In this paper, we propose a highly parameter-efficient approach to scaling pre-trained language models (PLMs) to a deeper model depth. Unlike prior work that shares all parameters or uses extra blocks, we design a more capable parameter-sharing architecture based on matrix product operator (MPO). MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts: the major part that contains the major information (central tensor) and the supplementary part that only has a small proportion of parameters (auxiliary tensors). Based on such a decomposition, our architecture shares the central tensor across all layers for reducing the model size and meanwhile keeps layer-specific auxiliary tensors (also using adapters) for enhancing the adaptation flexibility. To improve the model training, we further propose a stable initialization algorithm tailored for the MPO-based architecture. Extensive experiments have demonstrated the effectiveness of our proposed model in reducing the model size and achieving highly competitive performance.
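
    The sharing pattern, one large shared component plus small layer-specific corrections, can be sketched as follows. A low-rank per-layer correction stands in for the MPO auxiliary tensors here, since a faithful MPO factorization would obscure the point; only the central/auxiliary split is what this illustrates.

```python
import torch
import torch.nn as nn

# One shared "central" weight is reused by every layer; each layer adds only
# a tiny layer-specific correction (a low-rank stand-in for MPO auxiliary
# tensors). Parameter counts show the saving from sharing.
class SharedLayer(nn.Module):
    def __init__(self, central, rank=4):
        super().__init__()
        d = central.shape[0]
        self.central = central                        # same object in all layers
        self.a = nn.Parameter(torch.zeros(d, rank))   # layer-specific, tiny
        self.b = nn.Parameter(torch.randn(rank, d) * 0.01)

    def forward(self, x):
        return torch.relu(x @ (self.central + self.a @ self.b).T)

d, depth = 256, 12
central = nn.Parameter(torch.randn(d, d) / d**0.5)   # one copy for all layers
layers = nn.ModuleList(SharedLayer(central) for _ in range(depth))

h = torch.randn(3, d)
for layer in layers:
    h = layer(h)

extras = sum(l.a.numel() + l.b.numel() for l in layers)
print(f"output {tuple(h.shape)}; shared: {central.numel()} params, "
      f"layer-specific total: {extras} params")
```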

ACO-tagger: A Novel Method for Part-of-Speech Tagging using Ant Colony Optimization

  • Authors: Amirhossein Mohammadi, Sara Hajiaghajani, Mohammad Bahrani
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2303.16760
  • Pdf link: https://arxiv.org/pdf/2303.16760
  • Abstract
    Swarm Intelligence algorithms have gained significant attention in recent years as a means of solving complex and non-deterministic problems. These algorithms are inspired by the collective behavior of natural creatures, and they simulate this behavior to develop intelligent agents for computational tasks. One such algorithm is Ant Colony Optimization (ACO), which is inspired by the foraging behavior of ants and their pheromone-laying mechanism. ACO is used for solving difficult problems that are discrete and combinatorial in nature. Part-of-Speech (POS) tagging is a fundamental task in natural language processing that aims to assign a part-of-speech role to each word in a sentence. In this paper, we propose a high-performance POS-tagging method based on ACO called ACO-tagger. This method achieved a high accuracy rate of 96.867%, outperforming several state-of-the-art methods. The proposed method is fast and efficient, making it a viable option for practical applications.
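
    A minimal ACO-style tagger can be sketched with a pheromone table over (word, tag) pairs: ants sample tag sequences in proportion to pheromone, and the best-scoring sequence is reinforced after evaporation. The lexicon, sentence, and supervised scoring below are toy stand-ins, not the paper's design.

```python
import random

# Toy ACO tagging: pheromone guides tag choices; evaporation plus
# reinforcement of the best ant's sequence drives convergence.
lexicon = {"the": ["DET"], "dog": ["NOUN", "VERB"], "barks": ["VERB", "NOUN"]}
sentence = ["the", "dog", "barks"]
gold = ["DET", "NOUN", "VERB"]                 # used only to score ants here
pheromone = {(w, t): 1.0 for w in lexicon for t in lexicon[w]}

def build(ant_rng):
    tags = []
    for w in sentence:
        opts = lexicon[w]
        weights = [pheromone[(w, t)] for t in opts]
        tags.append(ant_rng.choices(opts, weights=weights)[0])
    return tags

rng = random.Random(0)
for _ in range(50):                            # colony iterations
    ants = [build(rng) for _ in range(10)]
    best = max(ants, key=lambda ts: sum(a == b for a, b in zip(ts, gold)))
    for key in pheromone:                      # evaporation
        pheromone[key] *= 0.95
    for w, t in zip(sentence, best):           # reinforcement
        pheromone[(w, t)] += 1.0

print([max(lexicon[w], key=lambda t: pheromone[(w, t)]) for w in sentence])
```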

Computationally Efficient Labeling of Cancer Related Forum Posts by Non-Clinical Text Information Retrieval

  • Authors: Jimmi Agerskov, Kristian Nielsen, Christian Marius Lillelund, Christian Fischer Pedersen
  • Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16766
  • Pdf link: https://arxiv.org/pdf/2303.16766
  • Abstract
    An abundance of information about cancer exists online, but categorizing and extracting useful information from it is difficult. Almost all research within healthcare data processing is concerned with formal clinical data, but there is valuable information in non-clinical data too. The present study combines methods within distributed computing, text retrieval, clustering, and classification into a coherent and computationally efficient system, that can clarify cancer patient trajectories based on non-clinical and freely available information. We produce a fully-functional prototype that can retrieve, cluster and present information about cancer trajectories from non-clinical forum posts. We evaluate three clustering algorithms (MR-DBSCAN, DBSCAN, and HDBSCAN) and compare them in terms of Adjusted Rand Index and total run time as a function of the number of posts retrieved and the neighborhood radius. Clustering results show that the neighborhood radius has the most significant impact on clustering performance. For small values, the data set is split accordingly, but high values produce a large number of possible partitions, and searching for the best partition thereby becomes time-consuming. With a properly estimated radius, MR-DBSCAN can cluster 50000 forum posts in 46.1 seconds, compared to DBSCAN (143.4) and HDBSCAN (282.3). We conduct an interview with the Danish Cancer Society and present our software prototype. The organization sees potential in software that can democratize online information about cancer and foresees that such systems will be required in the future.
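
    The radius-sensitivity finding is easy to reproduce in miniature with scikit-learn: cluster synthetic "post" vectors with DBSCAN at several neighborhood radii and report Adjusted Rand Index and run time. The data and radii below are illustrative.

```python
import time
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import adjusted_rand_score

# Three synthetic Gaussian blobs stand in for embedded forum posts.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(500, 16)) for c in (0.0, 3.0, 6.0)])
labels_true = np.repeat([0, 1, 2], 500)

for eps in (0.5, 1.0, 2.0):           # the neighborhood radius under study
    t0 = time.perf_counter()
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    dt = time.perf_counter() - t0
    ari = adjusted_rand_score(labels_true, labels)
    print(f"eps={eps}: ARI={ari:.3f}, time={dt:.3f}s")
```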

Maximin Headway Control of Automated Vehicles for System Optimal Dynamic Traffic Assignment in General Networks

  • Authors: Jinxiao Du, Wei Ma
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.16772
  • Pdf link: https://arxiv.org/pdf/2303.16772
  • Abstract
    This study develops a headway control framework in a fully automated road network, as we believe the headway of Automated Vehicles (AVs) is another factor influencing traffic dynamics in addition to conventional vehicle behaviors (e.g., route and departure time choices). Specifically, we aim to search for the optimal time headway between AVs on each link that achieves the network-wide system optimal dynamic traffic assignment (SO-DTA). To this end, the headway-dependent fundamental diagram (HFD) and headway-dependent double queue model (HDQ) are developed to model the effect of dynamic headway on roads, and a dynamic network model is built. It is rigorously proved that the minimum headway always achieves SO-DTA, yet the optimal headway is non-unique. Motivated by these two findings, this study defines a novel concept of maximin headway, which is the largest headway that still achieves SO-DTA in the network. Mathematical properties regarding maximin headway are analyzed and an efficient solution algorithm is developed. Numerical experiments on both a small and a large network verify the effectiveness of the maximin headway control framework as well as the properties of maximin headway. This study sheds light on deriving the desired solution among the non-unique solutions in SO-DTA and provides implications regarding the safety margin of AVs under SO-DTA.

Dispersion relation reconstruction for 2D Photonic Crystals based on polynomial interpolation

  • Authors: Yueqi Wang, Guanglian Li, Richard Craster
  • Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
  • Arxiv link: https://arxiv.org/abs/2303.16787
  • Pdf link: https://arxiv.org/pdf/2303.16787
  • Abstract
    The dispersion relation reflects the dependence of wave frequency on its wave vector when the wave passes through a certain material. It demonstrates the properties of this material and is thus critical. However, dispersion relation reconstruction is very time-consuming and expensive. To address this bottleneck, we propose in this paper an efficient dispersion relation reconstruction scheme based on global polynomial interpolation for the approximation of 2D photonic band functions. Our method relies on the fact that the band functions are piecewise analytic with respect to the wave vector in the first Brillouin zone. We utilize suitable sampling points in the first Brillouin zone at which we solve the eigenvalue problem involved in the band function calculation, and then employ Lagrange interpolation to approximate the band functions on the whole first Brillouin zone. Numerical results show that our proposed method can significantly improve the computational efficiency.
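
    The reconstruction recipe, solve the eigenvalue problem at a few well-chosen sample points and interpolate globally, can be sketched with SciPy. A smooth cosine stands in for the true band function (which would come from an eigen-solve at each node), and Chebyshev sample points keep the global polynomial stable.

```python
import numpy as np
from scipy.interpolate import lagrange

# Stand-in band function along one line through the Brillouin zone; the
# real method would solve an eigenvalue problem at each sample point.
band = lambda k: 2.0 - np.cos(np.pi * k)

n = 9                                                      # eigen-solves needed
nodes = np.cos((2 * np.arange(n) + 1) * np.pi / (2 * n))   # Chebyshev points in [-1, 1]
poly = lagrange(nodes, band(nodes))                        # global interpolant

k_fine = np.linspace(-1.0, 1.0, 1001)
err = np.max(np.abs(poly(k_fine) - band(k_fine)))
print(f"max interpolation error with {n} samples: {err:.2e}")
```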

On real and observable realizations of input-output equations

  • Authors: Sebastian Falkensteiner, Dmitrii Pavlov, Rafael Sendra
  • Subjects: Symbolic Computation (cs.SC); Algebraic Geometry (math.AG); Dynamical Systems (math.DS); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2303.16799
  • Pdf link: https://arxiv.org/pdf/2303.16799
  • Abstract
    Given a single algebraic input-output equation, we present a method for finding different representations of the associated system in the form of rational realizations; these are dynamical systems with rational right-hand sides. It has been shown that in the case where the input-output equation is of order one, rational realizations can be computed, if they exist. In this work, we focus first on the existence and actual computation of the so-called observable rational realizations, and secondly on rational realizations with real coefficients. The study of observable realizations allows to find every rational realization of a given first order input-output equation, and the necessary field extensions in this process. We show that for first order input-output equations the existence of a rational realization is equivalent to the existence of an observable rational realization. Moreover, we give a criterion to decide the existence of real rational realizations. The computation of observable and real realizations of first order input-output equations is fully algorithmic. We also present partial results for the case of higher order input-output equations.

Adaptive Superpixel for Active Learning in Semantic Segmentation

  • Authors: Hoyoung Kim, Minhyeon Oh, Sehyun Hwang, Suha Kwak, Jungseul Ok
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16817
  • Pdf link: https://arxiv.org/pdf/2303.16817
  • Abstract
    Learning semantic segmentation requires pixel-wise annotations, which can be time-consuming and expensive. To reduce the annotation cost, we propose a superpixel-based active learning (AL) framework, which collects a dominant label per superpixel instead. To be specific, it consists of adaptive superpixel and sieving mechanisms, fully dedicated to AL. At each round of AL, we adaptively merge neighboring pixels of similar learned features into superpixels. We then query a selected subset of these superpixels using an acquisition function assuming no uniform superpixel size. This approach is more efficient than existing methods, which rely only on innate features such as RGB color and assume uniform superpixel sizes. Obtaining a dominant label per superpixel drastically reduces annotators' burden as it requires fewer clicks. However, it inevitably introduces noisy annotations due to mismatches between superpixel and ground truth segmentation. To address this issue, we further devise a sieving mechanism that identifies and excludes potentially noisy annotations from learning. Our experiments on both Cityscapes and PASCAL VOC datasets demonstrate the efficacy of adaptive superpixel and sieving mechanisms.

Multi-View Keypoints for Reliable 6D Object Pose Estimation

  • Authors: Alan Li, Angela P. Schoellig
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16833
  • Pdf link: https://arxiv.org/pdf/2303.16833
  • Abstract
    6D object pose estimation is a fundamental component in robotics enabling efficient interaction with the environment. It is particularly challenging in bin-picking applications, where many objects are low-feature and reflective, and self-occlusion between objects of the same type is common. We propose a novel multi-view approach leveraging known camera transformations from an eye-in-hand setup to combine heatmap and keypoint estimates into a probability density map over 3D space. The result is a robust approach that is scalable in the number of views. It relies on a confidence score composed of keypoint probabilities and point-cloud alignment error, which allows reliable rejection of false positives. We demonstrate an average pose estimation error of approximately 0.5mm and 2 degrees across a variety of difficult low-feature and reflective objects in the ROBI dataset, while also surpassing the state-of-the-art correct detection rate, measured using the 10% object diameter threshold on ADD error.

Full-Range Approximation for the Theis Well Function Using Ramanujan's Series and Bounds for the Exponential Integral

  • Authors: Manotosh Kumbhakar, Vijay P. Singh
  • Subjects: Computational Engineering, Finance, and Science (cs.CE)
  • Arxiv link: https://arxiv.org/abs/2303.16871
  • Pdf link: https://arxiv.org/pdf/2303.16871
  • Abstract
    The solution of the governing equation representing the drawdown in a horizontal confined aquifer, where groundwater flow is unsteady, is given in terms of the exponential integral, famously known as the Well function. For the computation of this function in practical applications, it is important to develop an approximation that is not only accurate but also simple, requiring evaluation of the fewest possible terms. To that end, this work proposes a full-range approximation to the exponential integral, using Ramanujan's series for small arguments ($u \leq 1$) and an approximation based on the bound of the integral for the remaining range ($u \in (1,100]$). The proposed approximation yields the most accurate formulae compared to existing studies, with a maximum percentage error of 0.05%. Further, the proposed formula is much simpler to apply, as it contains just the product of exponential and logarithm functions. To further check the efficiency of the proposed approximation, we consider a practical example of evaluating the discrete pumping kernel, which shows the superiority of this approximation over the others. Finally, the authors hope that the proposed efficient approximation can be useful for groundwater and hydrogeological applications.
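
    The two-regime strategy can be sketched numerically. Below, the classical convergent series of $E_1$ stands in for Ramanujan's faster-converging series on $u \leq 1$, and the known bound $e^{-u}\ln(1 + 1/u)$ stands in for the paper's fitted exponential-logarithm formula on $u > 1$; neither is the paper's exact expression, so the printed errors are larger than the 0.05% reported above.

```python
import math
import numpy as np
from scipy.special import exp1   # reference for W(u) = E1(u)

EULER_GAMMA = 0.5772156649015329

def well_function(u, terms=30):
    u = np.asarray(u, dtype=float)
    # Small-u: classical series E1(u) = -gamma - ln(u) + sum (-1)^(n+1) u^n/(n n!)
    small = -EULER_GAMMA - np.log(u) + sum(
        (-1) ** (n + 1) * u**n / (n * math.factorial(n))
        for n in range(1, terms + 1)
    )
    # Large-u: exponential-times-logarithm upper bound on E1.
    large = np.exp(-u) * np.log(1.0 + 1.0 / u)
    return np.where(u <= 1.0, small, large)

u = np.array([0.01, 0.1, 1.0, 5.0, 20.0])
rel_err = np.abs(well_function(u) - exp1(u)) / exp1(u)
for ui, ei in zip(u, rel_err):
    print(f"u = {ui:6.2f}: relative error = {ei:.2e}")
```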

CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network

  • Authors: Ruyi Lian, Haibin Ling
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16874
  • Pdf link: https://arxiv.org/pdf/2303.16874
  • Abstract
    Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task. Recent studies have shown the great potential of dense correspondence-based solutions, yet improvements are still needed to reach practical deployment. In this paper, we propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects. Firstly, CheckerPose densely samples 3D keypoints from the surface of the 3D object and finds their 2D correspondences progressively in the 2D image. Compared to previous solutions that conduct dense sampling in the image space, our strategy enables the correspondence searching in a 2D grid (i.e., pixel coordinate). Secondly, for our 3D-to-2D correspondence, we design a compact binary code representation for 2D image locations. This representation not only allows for progressive correspondence refinement but also converts the correspondence regression to a more efficient classification problem. Thirdly, we adopt a graph neural network to explicitly model the interactions among the sampled 3D keypoints, further boosting the reliability and accuracy of the correspondences. Together, these novel components make our CheckerPose a strong pose estimation algorithm. When evaluated on the popular Linemod, Linemod-O, and YCB-V object pose estimation benchmarks, CheckerPose clearly boosts the accuracy of correspondence-based methods and achieves state-of-the-art performances.

Keyword: faster

An EMO Joint Pruning with Multiple Sub-networks: Fast and Effect

  • Authors: Ronghua Shang, Songling Zhu, Licheng Jiao, Songhua Xu
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.16212
  • Pdf link: https://arxiv.org/pdf/2303.16212
  • Abstract
    The network pruning algorithm based on evolutionary multi-objective (EMO) optimization can balance the pruning rate and performance of the network. However, its population-based nature often suffers from the complex pruning optimization space and the highly resource-consuming pruning structure verification process, which limits its application. To this end, this paper proposes an EMO joint pruning with multiple sub-networks (EMO-PMS) to reduce space complexity and resource consumption. First, a divide-and-conquer EMO network pruning framework is proposed, which decomposes the complex EMO pruning task on the whole network into easier sub-tasks on multiple sub-networks. On the one hand, this decomposition reduces the pruning optimization space and decreases the optimization difficulty; on the other hand, the smaller network structure converges faster, so the computational resource consumption of the proposed algorithm is lower. Second, a sub-network training method based on cross-network constraints is designed so that the sub-network can process the features generated by the previous one through feature constraints. This method allows sub-networks optimized independently to collaborate better and improves the overall performance of the pruned network. Finally, a multiple sub-networks joint pruning method based on EMO is proposed. For one thing, it can accurately measure the feature processing capability of the sub-networks with the pre-trained feature selector. For another, it can combine multi-objective pruning results on multiple sub-networks through global performance impairment ranking to design a joint pruning scheme. The proposed algorithm is validated on three datasets of varying difficulty. Compared with fifteen advanced pruning algorithms, the experimental results demonstrate the effectiveness and efficiency of the proposed algorithm.

Accelerated wind farm yaw and layout optimisation with multi-fidelity deep transfer learning wake models

  • Authors: Sokratis Anagnostopoulos, Jens Bauer, Mariana C. A. Clare, Matthew D. Piggott
  • Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Atmospheric and Oceanic Physics (physics.ao-ph)
  • Arxiv link: https://arxiv.org/abs/2303.16274
  • Pdf link: https://arxiv.org/pdf/2303.16274
  • Abstract
    Wind farm modelling has been an area of rapidly increasing interest with numerous analytical as well as computational-based approaches developed to extend the margins of wind farm efficiency and maximise power production. In this work, we present the novel ML framework WakeNet, which can reproduce generalised 2D turbine wake velocity fields at hub-height over a wide range of yaw angles, wind speeds and turbulence intensities (TIs), with a mean accuracy of 99.8% compared to the solution calculated using the state-of-the-art wind farm modelling software FLORIS. As the generation of sufficient high-fidelity data for network training purposes can be cost-prohibitive, the utility of multi-fidelity transfer learning has also been investigated. Specifically, a network pre-trained on the low-fidelity Gaussian wake model is fine-tuned in order to obtain accurate wake results for the mid-fidelity Curl wake model. The robustness and overall performance of WakeNet on various wake steering control and layout optimisation scenarios has been validated through power-gain heatmaps, obtaining at least 90% of the power gained through optimisation performed with FLORIS directly. We also demonstrate that when utilising the Curl model, WakeNet is able to provide similar power gains to FLORIS, two orders of magnitude faster (e.g. 10 minutes vs 36 hours per optimisation case). The wake evaluation time of WakeNet when trained on a high-fidelity CFD dataset is expected to be similar, thus further increasing computational time gains. These promising results show that generalised wake modelling with ML tools can be accurate enough to contribute towards active yaw and layout optimisation, while producing realistic optimised configurations at a fraction of the computational cost, hence making it feasible to perform real-time active yaw control as well as robust optimisation under uncertainty.

FMAS: Fast Multi-Objective SuperNet Architecture Search for Semantic Segmentation

  • Authors: Zhuoran Xiong, Marihan Amein, Olivier Therrien, Warren J. Gross, Brett H. Meyer
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.16322
  • Pdf link: https://arxiv.org/pdf/2303.16322
  • Abstract
    We present FMAS, a fast multi-objective neural architecture search framework for semantic segmentation. FMAS subsamples the structure and pre-trained parameters of DeepLabV3+, without fine-tuning, dramatically reducing training time during search. To further reduce candidate evaluation time, we use a subset of the validation dataset during the search. Only the final, Pareto non-dominated, candidates are ultimately fine-tuned using the complete training set. We evaluate FMAS by searching for models that effectively trade accuracy and computational cost on the PASCAL VOC 2012 dataset. FMAS finds competitive designs quickly, e.g., taking just 0.5 GPU days to discover a DeepLabV3+ variant that reduces FLOPs and parameters by 10% and 20% respectively, for less than 3% increased error. We also search on an edge device called GAP8 and use its latency as the metric. FMAS is capable of finding a 2.2× faster network with a 7.61% MIoU loss.

A Unified Single-stage Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI

  • Authors: Tianyuan Yao, Nancy Newlin, Praitayini Kanakaraj, Vishwesh nath, Leon Y Cai, Karthik Ramadass, Kurt Schilling, Bennett A. Landman, Yuankai Huo
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16376
  • Pdf link: https://arxiv.org/pdf/2303.16376
  • Abstract
    Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in micro-structure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation, therefore, require a signal representation that extends over the radial as well as angular domain. Multiple approaches have been proposed that can model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed towards faster inference speed and higher inter-scan consistency compared with traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required since the learning process relies on various intermediate representations, such as the simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation through heterogeneous multi-shell diffusion MRI sequences. We study the Human Connectome Project (HCP) young adults with test-retest scans. From the experimental results, the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences.

Hard Regularization to Prevent Collapse in Online Deep Clustering without Data Augmentation

  • Authors: Louis Mahon, Thomas Lukasiewicz
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16521
  • Pdf link: https://arxiv.org/pdf/2303.16521
  • Abstract
    Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed. While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster. Successful existing models have employed various techniques to avoid this problem, most of which require data augmentation or which aim to make the average soft assignment across the dataset the same for each cluster. We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments. Using a Bayesian framework, we derive an intuitive optimization objective that can be straightforwardly included in the training of the encoder network. Tested on four image datasets, we show that it consistently avoids collapse more robustly than other methods and that it leads to more accurate clustering. We also conduct further experiments and analyses justifying our choice to regularize the hard cluster assignments.

Runtime Verification of Self-Adaptive Systems with Changing Requirements

  • Authors: Marc Carwehl, Thomas Vogel, Genaína Nunes Rodrigues, Lars Grunske
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2303.16530
  • Pdf link: https://arxiv.org/pdf/2303.16530
  • Abstract
    To accurately make adaptation decisions, a self-adaptive system needs precise means to analyze itself at runtime. To this end, runtime verification can be used in the feedback loop to check that the managed system satisfies its requirements formalized as temporal-logic properties. These requirements, however, may change due to system evolution or uncertainty in the environment, managed system, and requirements themselves. Thus, the properties under investigation by the runtime verification have to be dynamically adapted to represent the changing requirements while preserving the knowledge about requirements satisfaction gathered thus far, all with minimal latency. To address this need, we present a runtime verification approach for self-adaptive systems with changing requirements. Our approach uses property specification patterns to automatically obtain automata with precise semantics that are the basis for runtime verification. The automata can be safely adapted during runtime verification while preserving intermediate verification results to seamlessly reflect requirement changes and enable continuous verification. We evaluate our approach on an Arduino prototype of the Body Sensor Network and the Timescales benchmark. Results show that our approach is over five times faster than the typical approach of redeploying and restarting runtime monitors to reflect requirements changes, while improving the system's trustworthiness by avoiding interruptions of verification.

Cyber Security aboard Micro Aerial Vehicles: An OpenTitan-based Visual Communication Use Case

  • Authors: Maicol Ciani, Stefano Bonato, Rafail Psiakis, Angelo Garofalo, Luca Valente, Suresh Sugumar, Alessandro Giusti, Davide Rossi, Daniele Palossi
  • Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.16554
  • Pdf link: https://arxiv.org/pdf/2303.16554
  • Abstract
    Autonomous Micro Aerial Vehicles (MAVs), with a form factor of 10 cm in diameter, are an emerging technology thanks to the broad applicability enabled by their onboard intelligence. However, these platforms are strongly limited in the onboard power envelope for processing, i.e., less than a few hundred mW, which confines the onboard processors to the class of simple microcontroller units (MCUs). These MCUs lack advanced security features, opening the way to a wide range of cyber security vulnerabilities, from the communication between agents of the same fleet to the onboard execution of malicious code. This work presents an open-source System on Chip (SoC) design that integrates a 64-bit Linux-capable host processor accelerated by an 8-core 32-bit parallel programmable accelerator. The heterogeneous system architecture is coupled with a security enclave based on an open-source OpenTitan root of trust. To demonstrate our design, we propose a use case where OpenTitan detects a security breach on the SoC aboard the MAV and drives its exclusive GPIOs to start an LED blinking routine. This procedure embodies an unconventional visual communication between two palm-sized MAVs: the receiver MAV classifies the LED state of the sender (on or off) with an onboard convolutional neural network running on the parallel accelerator. Then, it reconstructs a high-level message in 1.3 s, 2.3 times faster than current commercial solutions.

An Efficient Online Prediction of Host Workloads Using Pruned GRU Neural Nets

  • Authors: Amin Setayesh, Hamid Hadian, Radu Prodan
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.16601
  • Pdf link: https://arxiv.org/pdf/2303.16601
  • Abstract
    Host load prediction is essential for dynamic resource scaling and job scheduling in a cloud computing environment. In this context, workload prediction is challenging for several reasons. First, it must be accurate to enable precise scheduling decisions. Second, it must be fast to schedule at the right time. Third, a model must be able to account for new workload patterns so it can perform well on both recent and old patterns. Failing to make an accurate and fast prediction, or being unable to predict new usage patterns, can result in severe outcomes such as service level agreement (SLA) misses. Our research trains a fast model, based on the gated recurrent unit (GRU), that is capable of online adaptation to mitigate these issues. We take a multivariate approach using several features, such as memory usage, CPU usage, disk I/O usage, and disk space, to perform the predictions accurately. Moreover, we predict multiple steps ahead, which is essential for making scheduling decisions in advance. Furthermore, we use two pruning methods, L1 norm and random, to produce a sparse model for faster forecasts. Finally, online learning is used to create a model that can adapt over time to new workload patterns.
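
The pruning step lends itself to a short sketch. The snippet below applies the two schemes named in the abstract (L1-norm and random unstructured pruning) to a PyTorch GRU; the layer sizes, 50% sparsity, and feature count are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# 4 input features stand in for CPU, memory, disk I/O, and disk space.
gru = nn.GRU(input_size=4, hidden_size=32, batch_first=True)

# L1 pruning: zero the 50% of weights with the smallest magnitude.
prune.l1_unstructured(gru, name="weight_ih_l0", amount=0.5)
prune.l1_unstructured(gru, name="weight_hh_l0", amount=0.5)

# The random baseline on a fresh copy would instead use:
#   prune.random_unstructured(module, name="weight_ih_l0", amount=0.5)

# Make the sparsity permanent by removing the re-parametrization.
prune.remove(gru, "weight_ih_l0")
prune.remove(gru, "weight_hh_l0")

x = torch.randn(8, 24, 4)   # batch of 24-step multivariate histories
out, h = gru(x)
print("sparsity:", (gru.weight_ih_l0 == 0).float().mean().item())
```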

Keyword: mobile

Assessing the Impact of Mobile Attackers on RPL-based Internet of Things

  • Authors: Cansu Dogan, Selim Yilmaz, Sevil Sen
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2303.16499
  • Pdf link: https://arxiv.org/pdf/2303.16499
  • Abstract
    The Internet of Things (IoT) is becoming ubiquitous in our daily life. IoT networks, which are made up of devices with low power, low memory, and low computing capability, appear in many applications such as healthcare, home, and agriculture. The IPv6 Routing Protocol for Low-Power and Lossy Networks (RPL) has become a standardized routing protocol for such low-power and lossy networks in IoT. RPL establishes the best routes between devices according to the requirements of the application, which is achieved by the Objective Function (OF). Even though some security mechanisms against external attackers are defined in its RFC, RPL is vulnerable to attacks coming from inside the network. Moreover, the same attacks can have different impacts on networks with different OFs. Therefore, an analysis of such attacks becomes important in order to develop suitable security solutions for RPL. This study analyzes RPL-specific attacks on networks using RPL's default OFs, namely Objective Function Zero (OF0) and the Minimum Rank with Hysteresis Objective Function (MRHOF). Moreover, mobile attackers can affect more nodes in a network due to their mobility. While the security solutions proposed in the literature assume that the network is static, this study takes mobile attackers into account.

Multi-Agent Reinforcement Learning with Action Masking for UAV-enabled Mobile Communications

  • Authors: Danish Rizvi, David Boyle
  • Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16737
  • Pdf link: https://arxiv.org/pdf/2303.16737
  • Abstract
    Unmanned Aerial Vehicles (UAVs) are increasingly used as aerial base stations to provide ad hoc communications infrastructure. Building upon prior research efforts which consider either static nodes, 2D trajectories or single UAV systems, this paper focuses on the use of multiple UAVs for providing wireless communication to mobile users in the absence of terrestrial communications infrastructure. In particular, we jointly optimize UAV 3D trajectory and NOMA power allocation to maximize system throughput. Firstly, a weighted K-means-based clustering algorithm establishes UAV-user associations at regular intervals. The efficacy of training a novel Shared Deep Q-Network (SDQN) with action masking is then explored. Unlike training each UAV separately using DQN, the SDQN reduces training time by using the experiences of multiple UAVs instead of a single agent. We also show that SDQN can be used to train a multi-agent system with differing action spaces. Simulation results confirm that: 1) training a shared DQN outperforms a conventional DQN in terms of maximum system throughput (+20%) and training time (-10%); 2) it can converge for agents with different action spaces, yielding a 9% increase in throughput compared to mutual learning algorithms; and 3) combining NOMA with an SDQN architecture enables the network to achieve a better sum rate compared with existing baseline schemes.
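
Action masking itself is simple to illustrate: invalid actions have their Q-values replaced by -inf before the greedy argmax, which lets agents with different action spaces share one network head. The sizes below are assumptions:

```python
import torch

def masked_greedy_action(q_values: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """q_values: (batch, n_actions); mask: (batch, n_actions) bool, True = valid."""
    masked_q = q_values.masked_fill(~mask, float("-inf"))
    return masked_q.argmax(dim=-1)

q = torch.randn(2, 6)                       # shared Q-head over 6 actions
mask = torch.tensor([[1, 1, 1, 1, 1, 1],    # UAV with the full action set
                     [1, 1, 1, 0, 0, 0]],   # UAV with a reduced action set
                    dtype=torch.bool)
print(masked_greedy_action(q, mask))        # invalid actions never selected
```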

Active Implicit Object Reconstruction using Uncertainty-guided Next-Best-View Optimization

  • Authors: Dongyu Yan, Jianheng Liu, Fengyu Quan, Haoyao Chen, Mengmeng Fu
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16739
  • Pdf link: https://arxiv.org/pdf/2303.16739
  • Abstract
    Actively planning sensor views during object reconstruction is essential for autonomous mobile robots. This task is usually performed by evaluating information gain from an explicit uncertainty map. Existing algorithms compare options among a set of preset candidate views and select the next-best-view from them. In contrast, we take the emerging implicit representation as the object model and seamlessly combine it with the active reconstruction task. To fully integrate observation information into the model, we propose a supervision method specifically for object-level reconstruction that considers both valid and free space. Additionally, to directly evaluate view information from the implicit object model, we introduce a sample-based uncertainty evaluation method. It samples points on rays directly from the object model and uses the variation of implicit function inferences as the uncertainty metric, with no need for voxel traversal or an additional information map. Leveraging the differentiability of our metrics, the next-best-view can be optimized directly by maximizing the uncertainty continuously. This does away with the traditionally used candidate-view setting, which may yield sub-optimal results. Experiments in simulations and real-world scenes show that our method effectively improves the reconstruction accuracy and view-planning efficiency of active reconstruction tasks. The proposed system will be open-sourced at https://github.com/HITSZ-NRSL/ActiveImplicitRecon.git.
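
A rough sketch of sample-based uncertainty under stated assumptions: here the prediction variation comes from MC-dropout forward passes of a toy occupancy MLP (the paper's implicit model and exact metric may differ), and a candidate view is scored by the variance at points sampled along its rays:

```python
import torch
import torch.nn as nn

occupancy = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(64, 1), nn.Sigmoid(),
)
occupancy.train()  # keep dropout active so repeated inferences vary

def view_uncertainty(ray_points: torch.Tensor, n_passes: int = 8) -> torch.Tensor:
    """ray_points: (n_rays, n_samples, 3) points sampled along the view's rays."""
    preds = torch.stack([occupancy(ray_points) for _ in range(n_passes)])
    return preds.var(dim=0).mean()  # scalar: higher = more informative view

points = torch.randn(32, 16, 3)    # 32 rays, 16 samples each
u = view_uncertainty(points)
u.backward()                       # differentiable, so a view pose upstream
print(float(u))                    # could be optimized by gradient ascent
```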

Keyword: pruning

An EMO Joint Pruning with Multiple Sub-networks: Fast and Effective

  • Authors: Ronghua Shang, Songling Zhu, Licheng Jiao, Songhua Xu
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.16212
  • Pdf link: https://arxiv.org/pdf/2303.16212
  • Abstract
    The network pruning algorithm based on evolutionary multi-objective (EMO) optimization can balance the pruning rate and performance of the network. However, its population-based nature often suffers from the complex pruning optimization space and the highly resource-consuming pruning structure verification process, which limits its application. To this end, this paper proposes an EMO joint pruning with multiple sub-networks (EMO-PMS) to reduce space complexity and resource consumption. First, a divide-and-conquer EMO network pruning framework is proposed, which decomposes the complex EMO pruning task on the whole network into easier sub-tasks on multiple sub-networks. On the one hand, this decomposition reduces the pruning optimization space and decreases the optimization difficulty; on the other hand, the smaller network structures converge faster, so the computational resource consumption of the proposed algorithm is lower. Second, a sub-network training method based on cross-network constraints is designed so that each sub-network can process the features generated by the previous one through feature constraints. This method allows sub-networks optimized independently to collaborate better and improves the overall performance of the pruned network. Finally, a multiple sub-networks joint pruning method based on EMO is proposed. On one hand, it can accurately measure the feature processing capability of the sub-networks with the pre-trained feature selector. On the other, it can combine multi-objective pruning results on multiple sub-networks through global performance impairment ranking to design a joint pruning scheme. The proposed algorithm is validated on three datasets of varying difficulty. Compared with fifteen advanced pruning algorithms, the experimental results exhibit the effectiveness and efficiency of the proposed algorithm.
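
At the heart of any EMO pruning scheme is the selection of non-dominated trade-offs between pruning rate and accuracy loss. A minimal Pareto-front filter over synthetic candidates (not the paper's algorithm, just the selection principle):

```python
def pareto_front(candidates):
    """candidates: list of (pruning_rate, accuracy_drop) tuples.
    Keep points not dominated by any other: a dominator has a pruning
    rate at least as high AND an accuracy drop at least as low, with at
    least one strict improvement."""
    front = []
    for i, (r_i, d_i) in enumerate(candidates):
        dominated = any(
            r_j >= r_i and d_j <= d_i and (r_j > r_i or d_j < d_i)
            for j, (r_j, d_j) in enumerate(candidates) if j != i
        )
        if not dominated:
            front.append((r_i, d_i))
    return sorted(front)

pool = [(0.3, 0.4), (0.5, 0.9), (0.5, 0.6), (0.7, 1.8), (0.9, 5.0), (0.7, 2.5)]
print(pareto_front(pool))  # -> [(0.3, 0.4), (0.5, 0.6), (0.7, 1.8), (0.9, 5.0)]
```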

Tetra-AML: Automatic Machine Learning via Tensor Networks

  • Authors: A. Naumov, Ar. Melnikov, V. Abronin, F. Oxanichenko, K. Izmailov, M. Pflitsch, A. Melnikov, M. Perelshtein
  • Subjects: Machine Learning (cs.LG); Quantum Physics (quant-ph)
  • Arxiv link: https://arxiv.org/abs/2303.16214
  • Pdf link: https://arxiv.org/pdf/2303.16214
  • Abstract
    Neural networks have revolutionized many aspects of society, but in the era of huge models with billions of parameters, optimizing and deploying them for commercial applications can require significant computational and financial resources. To address these challenges, we introduce the Tetra-AML toolbox, which automates neural architecture search and hyperparameter optimization via a custom-developed black-box Tensor train Optimization algorithm, TetraOpt. The toolbox also provides model compression through quantization and pruning, augmented by compression using tensor networks. Here, we analyze a unified benchmark for optimizing neural networks in computer vision tasks and show the superior performance of our approach compared to Bayesian optimization on the CIFAR-10 dataset. We also demonstrate the compression of ResNet-18 neural networks, where we use 14.5 times less memory while losing just 3.2% of accuracy. The presented framework is generic, not limited to computer vision problems, supports hardware acceleration (such as GPUs and TPUs), and can be further extended to quantum hardware and hybrid quantum machine learning models.

An Efficient Online Prediction of Host Workloads Using Pruned GRU Neural Nets

  • Authors: Amin Setayesh, Hamid Hadian, Radu Prodan
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.16601
  • Pdf link: https://arxiv.org/pdf/2303.16601
  • Abstract
    Host load prediction is essential for dynamic resource scaling and job scheduling in a cloud computing environment. In this context, workload prediction is challenging for several reasons. First, it must be accurate to enable precise scheduling decisions. Second, it must be fast to schedule at the right time. Third, a model must be able to account for new workload patterns so it can perform well on both recent and old patterns. Failing to make an accurate and fast prediction, or being unable to predict new usage patterns, can result in severe outcomes such as service level agreement (SLA) misses. Our research trains a fast model, based on the gated recurrent unit (GRU), that is capable of online adaptation to mitigate these issues. We take a multivariate approach using several features, such as memory usage, CPU usage, disk I/O usage, and disk space, to perform the predictions accurately. Moreover, we predict multiple steps ahead, which is essential for making scheduling decisions in advance. Furthermore, we use two pruning methods, L1 norm and random, to produce a sparse model for faster forecasts. Finally, online learning is used to create a model that can adapt over time to new workload patterns.

Keyword: voxel

SnakeVoxFormer: Transformer-based Single Image-to-Voxel Reconstruction with Run Length Encoding

  • Authors: Jae Joong Lee, Bedrich Benes
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16293
  • Pdf link: https://arxiv.org/pdf/2303.16293
  • Abstract
    Deep learning-based 3D object reconstruction has achieved unprecedented results. Among deep models, the transformer has shown outstanding performance in many applications of computer vision. We introduce SnakeVoxFormer, a novel 3D object reconstruction method in voxel space from a single image using the transformer. The input to SnakeVoxFormer is a 2D image, and the result is a 3D voxel model. The key novelty of our approach is the use of run-length encoding that traverses (like a snake) the voxel space and encodes wide spatial differences into a 1D structure that is suitable for transformer encoding. We then use dictionary encoding to convert the discovered RLE blocks into tokens that are used by the transformer. The 1D representation is a lossless 3D shape data compression method that uses only about 1% of the original data size. We show how different voxel traversal strategies affect the encoding and reconstruction quality. We compare our method with the state of the art for 3D voxel reconstruction from images; our method improves on the state-of-the-art methods by at least 2.8% and up to 19.8%.
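
The traversal-plus-RLE idea is easy to sketch: flatten a binary voxel grid in snake (boustrophedon) order so consecutive entries stay spatially adjacent, then emit (value, run-length) pairs. The grid contents and traversal axis below are illustrative assumptions:

```python
import numpy as np

def snake_flatten(vox: np.ndarray) -> np.ndarray:
    """Flatten a (D, H, W) grid, reversing every other row so consecutive
    entries stay spatially adjacent (the 'snake' traversal)."""
    rows = vox.reshape(-1, vox.shape[-1]).copy()
    rows[1::2] = rows[1::2, ::-1]
    return rows.ravel()

def run_length_encode(bits: np.ndarray):
    """Return (value, run_length) pairs for a 1D binary array."""
    change = np.flatnonzero(bits[1:] != bits[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [bits.size]))
    return [(int(bits[s]), int(e - s)) for s, e in zip(starts, ends)]

vox = np.zeros((4, 4, 4), dtype=np.uint8)
vox[1:3, 1:3, 1:3] = 1                      # a small solid cube
runs = run_length_encode(snake_flatten(vox))
print(runs[:6], "... total runs:", len(runs))
```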

A Unified Single-stage Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI

  • Authors: Tianyuan Yao, Nancy Newlin, Praitayini Kanakaraj, Vishwesh nath, Leon Y Cai, Karthik Ramadass, Kurt Schilling, Bennett A. Landman, Yuankai Huo
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16376
  • Pdf link: https://arxiv.org/pdf/2303.16376
  • Abstract
    Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in microstructure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation therefore require a signal representation that extends over the radial as well as the angular domain. Multiple approaches have been proposed that can model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed towards faster inference speed and higher inter-scan consistency compared with traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required, since the learning process relies on various intermediate representations, such as the simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation through heterogeneous multi-shell diffusion MRI sequences. We study Human Connectome Project (HCP) young adults with test-retest scans. The experimental results show that the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences.

Active Implicit Object Reconstruction using Uncertainty-guided Next-Best-View Optimization

  • Authors: Dongyu Yan, Jianheng Liu, Fengyu Quan, Haoyao Chen, Mengmeng Fu
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16739
  • Pdf link: https://arxiv.org/pdf/2303.16739
  • Abstract
    Actively planning sensor views during object reconstruction is essential for autonomous mobile robots. This task is usually performed by evaluating information gain from an explicit uncertainty map. Existing algorithms compare options among a set of preset candidate views and select the next-best-view from them. In contrast, we take the emerging implicit representation as the object model and seamlessly combine it with the active reconstruction task. To fully integrate observation information into the model, we propose a supervision method specifically for object-level reconstruction that considers both valid and free space. Additionally, to directly evaluate view information from the implicit object model, we introduce a sample-based uncertainty evaluation method. It samples points on rays directly from the object model and uses the variation of implicit function inferences as the uncertainty metric, with no need for voxel traversal or an additional information map. Leveraging the differentiability of our metrics, the next-best-view can be optimized directly by maximizing the uncertainty continuously. This does away with the traditionally used candidate-view setting, which may yield sub-optimal results. Experiments in simulations and real-world scenes show that our method effectively improves the reconstruction accuracy and view-planning efficiency of active reconstruction tasks. The proposed system will be open-sourced at https://github.com/HITSZ-NRSL/ActiveImplicitRecon.git.

Instant Neural Radiance Fields Stylization

  • Authors: Shaoxu Li, Ye Pan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16884
  • Pdf link: https://arxiv.org/pdf/2303.16884
  • Abstract
    We present Instant Neural Radiance Fields Stylization, a novel approach to multi-view image stylization for 3D scenes. Our approach models a neural radiance field based on neural graphics primitives, which use a hash-table-based position encoder for position embedding. We split the position encoder into two parts, the content and style sub-branches, and train the network for normal novel-view image synthesis with the content and style targets. In the inference stage, we execute AdaIN on the output features of the position encoder, with content and style voxel grid features as reference. With the adjusted features, the stylization of novel-view images can be obtained. Our method extends the style target from style images to image sets of scenes and does not require additional network training for stylization. Given a set of images of a 3D scene and a style target (a style image or another set of 3D scenes), our method can generate stylized novel views with a consistent appearance at various view angles in less than 10 minutes on modern GPU hardware. Extensive experimental results demonstrate the validity and superiority of our method.
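
AdaIN, the operation executed on the position-encoder features at inference time, re-normalizes the content features to the channel-wise statistics of the style features. A minimal sketch, with tensor shapes assumed rather than taken from the paper:

```python
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """content, style: (N, C, ...) feature tensors; normalize content to
    the style's per-channel mean and std."""
    dims = tuple(range(2, content.dim()))
    c_mean = content.mean(dim=dims, keepdim=True)
    c_std = content.std(dim=dims, keepdim=True) + eps
    s_mean = style.mean(dim=dims, keepdim=True)
    s_std = style.std(dim=dims, keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean

content = torch.randn(1, 32, 128)   # e.g. hash-grid features (assumed shape)
style = torch.randn(1, 32, 128)
out = adain(content, style)
print(out.mean().item(), out.std().item())  # close to the style statistics
```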

Keyword: lidar

Spatiotemporal Self-supervised Learning for Point Clouds in the Wild

  • Authors: Yanhao Wu, Tong Zhang, Wei Ke, Sabine Süsstrunk, Mathieu Salzmann
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16235
  • Pdf link: https://arxiv.org/pdf/2303.16235
  • Abstract
    Self-supervised learning (SSL) has the potential to benefit many applications, particularly those where manually annotating data is cumbersome. One such situation is the semantic segmentation of point clouds. In this context, existing methods employ contrastive learning strategies and define positive pairs by performing various augmentation of point clusters in a single frame. As such, these methods do not exploit the temporal nature of LiDAR data. In this paper, we introduce an SSL strategy that leverages positive pairs in both the spatial and temporal domain. To this end, we design (i) a point-to-cluster learning strategy that aggregates spatial information to distinguish objects; and (ii) a cluster-to-cluster learning strategy based on unsupervised object tracking that exploits temporal correspondences. We demonstrate the benefits of our approach via extensive experiments performed by self-supervised training on two large-scale LiDAR datasets and transferring the resulting models to other point cloud segmentation benchmarks. Our results evidence that our method outperforms the state-of-the-art point cloud SSL methods.

BEVSimDet: Simulated Multi-modal Distillation in Bird's-Eye View for Multi-view 3D Object Detection

  • Authors: Haimei Zhao, Qiming Zhang, Shanshan Zhao, Jing Zhang, Dacheng Tao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16818
  • Pdf link: https://arxiv.org/pdf/2303.16818
  • Abstract
    Multi-view camera-based 3D object detection has gained popularity due to its low cost. But accurately inferring 3D geometry solely from camera data remains challenging, which impacts model performance. One promising approach to address this issue is to distill precise 3D geometry knowledge from LiDAR data. However, transferring knowledge between different sensor modalities is hindered by the significant modality gap. In this paper, we approach this challenge from the perspective of both architecture design and knowledge distillation and present a new simulated multi-modal 3D object detection method named BEVSimDet. We first introduce a novel framework that includes a LiDAR and camera fusion-based teacher and a simulated multi-modal student, where the student simulates multi-modal features with image-only input. To facilitate effective distillation, we propose a simulated multi-modal distillation scheme that supports intra-modal, cross-modal, and multi-modal distillation simultaneously. By combining them together, BEVSimDet can learn better feature representations for 3D object detection while enjoying cost-effective camera-only deployment. Experimental results on the challenging nuScenes benchmark demonstrate the effectiveness and superiority of BEVSimDet over recent representative methods. The source code will be released.

Photometric LiDAR and RGB-D Bundle Adjustment

  • Authors: Luca Di Giammarino, Emanuele Giacomini, Leonardo Brizi, Omar Salem, Giorgio Grisetti
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.16878
  • Pdf link: https://arxiv.org/pdf/2303.16878
  • Abstract
    The joint optimization of the sensor trajectory and 3D map is a crucial characteristic of Simultaneous Localization and Mapping (SLAM) systems. To achieve this, the gold standard is Bundle Adjustment (BA). Modern 3D LiDARs now offer higher resolutions that enable the creation of point cloud images resembling those taken by conventional cameras. Nevertheless, the typically effective global refinement techniques employed for RGB-D sensors are not widely applied to LiDARs. This paper presents a novel photometric BA strategy that accounts for both RGB-D and LiDAR in the same way. Our work can be used on top of any SLAM/GNSS estimate to improve and refine the initial trajectory. We conducted different experiments using these two depth sensors on public benchmarks. Our results show that our system performs on par with or better than other state-of-the-art ad-hoc SLAM/BA strategies, free from data association and without making assumptions about the environment. In addition, we present the benefit of jointly using RGB-D and LiDAR within our unified method. We finally release an open-source CUDA/C++ implementation.

Keyword: diffusion

Rethinking CycleGAN: Improving Quality of GANs for Unpaired Image-to-Image Translation

  • Authors: Dmitrii Torbunov, Yi Huang, Huan-Hsin Tseng, Haiwang Yu, Jin Huang, Shinjae Yoo, Meifeng Lin, Brett Viren, Yihui Ren
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16280
  • Pdf link: https://arxiv.org/pdf/2303.16280
  • Abstract
    An unpaired image-to-image (I2I) translation technique seeks to find a mapping between two domains of data in a fully unsupervised manner. While the initial solutions to the I2I problem were provided by generative adversarial networks (GANs), diffusion models (DMs) currently hold the state-of-the-art status on the I2I translation benchmarks in terms of FID. Yet, they suffer from some limitations, such as not using data from the source domain during training, or maintaining consistency of the source and translated images only via simple pixel-wise errors. This work revisits the classic CycleGAN model and equips it with recent advancements in model architectures and model training procedures. The revised model is shown to significantly outperform other advanced GAN- and DM-based competitors on a variety of benchmarks. In the case of Male2Female translation on CelebA, the model achieves over 40% improvement in FID score compared to the state-of-the-art results. This work also demonstrates the ineffectiveness of the pixel-wise I2I translation faithfulness metrics and suggests their revision. The code and trained models are available at https://github.com/LS4GAN/uvcgan2

A Unified Single-stage Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI

  • Authors: Tianyuan Yao, Nancy Newlin, Praitayini Kanakaraj, Vishwesh nath, Leon Y Cai, Karthik Ramadass, Kurt Schilling, Bennett A. Landman, Yuankai Huo
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16376
  • Pdf link: https://arxiv.org/pdf/2303.16376
  • Abstract
    Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in microstructure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation therefore require a signal representation that extends over the radial as well as the angular domain. Multiple approaches have been proposed that can model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed towards faster inference speed and higher inter-scan consistency compared with traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required, since the learning process relies on various intermediate representations, such as the simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation through heterogeneous multi-shell diffusion MRI sequences. We study Human Connectome Project (HCP) young adults with test-retest scans. The experimental results show that the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences.

A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion

  • Authors: Haomin Zhuang, Yihua Zhang, Sijia Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16378
  • Pdf link: https://arxiv.org/pdf/2303.16378
  • Abstract
    Despite the record-breaking performance in Text-to-Image (T2I) generation by Stable Diffusion, less research attention has been paid to its adversarial robustness. In this work, we study the problem of adversarial attack generation for Stable Diffusion and ask whether an adversarial text prompt can be obtained even in the absence of end-to-end model queries. We call the resulting problem 'query-free attack generation'. To resolve this problem, we show that the vulnerability of T2I models is rooted in the lack of robustness of text encoders, e.g., the CLIP text encoder used by Stable Diffusion, which we attack. Based on this insight, we propose both untargeted and targeted query-free attacks, where the former is built on the most influential dimensions in the text embedding space, which we call steerable key dimensions. By leveraging the proposed attacks, we empirically show that a perturbation of only five characters in the text prompt is able to cause a significant content shift in images synthesized with Stable Diffusion. Moreover, we show that the proposed targeted attack can precisely steer the diffusion model to scrub the targeted image content without causing much change in untargeted image content.
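
A heavily hedged sketch of the outer loop of such a query-free attack: greedily flip up to five characters to push the prompt's embedding away from the original. The `embed` function here is a toy stand-in for a real text encoder such as CLIP's, and the greedy coordinate search is an assumption, not the paper's exact procedure:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder text encoder: a deterministic random vector per string.
    A real attack would call the actual text encoder instead."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(16)

def perturb(prompt: str, budget: int = 5,
            alphabet: str = "abcdefghijklmnopqrstuvwxyz ") -> str:
    base = embed(prompt)
    current = prompt
    for _ in range(budget):                      # one character flip per round
        best, best_dist = current, -1.0
        for i in range(len(current)):
            for ch in alphabet:
                cand = current[:i] + ch + current[i + 1:]
                dist = float(np.linalg.norm(embed(cand) - base))
                if dist > best_dist:
                    best, best_dist = cand, dist
        current = best
    return current

print(perturb("a photo of a dog"))
```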

Implicit Diffusion Models for Continuous Super-Resolution

  • Authors: Sicheng Gao, Xuhui Liu, Bohan Zeng, Sheng Xu, Yanjing Li, Xiaoyan Luo, Jianzhuang Liu, Xiantong Zhen, Baochang Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16491
  • Pdf link: https://arxiv.org/pdf/2303.16491
  • Abstract
    Image super-resolution (SR) has attracted increasing attention due to its wide applications. However, current SR methods generally suffer from over-smoothing and artifacts, and most work only with fixed magnifications. This paper introduces an Implicit Diffusion Model (IDM) for high-fidelity continuous image super-resolution. IDM integrates an implicit neural representation and a denoising diffusion model in a unified end-to-end framework, where the implicit neural representation is adopted in the decoding process to learn continuous-resolution representation. Furthermore, we design a scale-controllable conditioning mechanism that consists of a low-resolution (LR) conditioning network and a scaling factor. The scaling factor regulates the resolution and accordingly modulates the proportion of the LR information and generated features in the final output, which enables the model to accommodate the continuous-resolution requirement. Extensive experiments validate the effectiveness of our IDM and demonstrate its superior performance over prior arts.

HOLODIFFUSION: Training a 3D Diffusion Model using 2D Images

  • Authors: Animesh Karnewar, Andrea Vedaldi, David Novotny, Niloy Mitra
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2303.16509
  • Pdf link: https://arxiv.org/pdf/2303.16509
  • Abstract
    Diffusion models have emerged as the best approach for generative modeling of 2D images. Part of their success is due to the possibility of training them on millions if not billions of images with a stable learning objective. However, extending these models to 3D remains difficult for two reasons. First, finding a large quantity of 3D training data is much more complex than for 2D images. Second, while it is conceptually trivial to extend the models to operate on 3D rather than 2D grids, the associated cubic growth in memory and compute complexity makes this infeasible. We address the first challenge by introducing a new diffusion setup that can be trained, end-to-end, with only posed 2D images for supervision; and the second challenge by proposing an image formation model that decouples model memory from spatial memory. We evaluate our method on real-world data, using the CO3D dataset which has not been used to train 3D generative models before. We show that our diffusion models are scalable, train robustly, and are competitive in terms of sample quality and fidelity to existing approaches for 3D generative modeling.

WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models

  • Authors: Konstantina Nikolaidou, George Retsinas, Vincent Christlein, Mathias Seuret, Giorgos Sfikas, Elisa Barney Smith, Hamam Mokayed, Marcus Liwicki
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16576
  • Pdf link: https://arxiv.org/pdf/2303.16576
  • Abstract
    Text-to-Image synthesis is the task of generating an image according to a specific text description. Generative Adversarial Networks have been considered the standard method for image synthesis virtually since their introduction; today, Denoising Diffusion Probabilistic Models are setting a new baseline, with remarkable results in Text-to-Image synthesis, among other fields. Aside from its usefulness per se, text-to-image generation can also be particularly relevant as a tool for data augmentation to aid training models for other document image processing tasks. In this work, we present a latent diffusion-based method for styled text-to-text-content-image generation on the word level. Our proposed method manages to generate realistic word image samples from different writer styles by using class index styles and text content prompts, without the need for adversarial training, writer recognition, or text recognition. We gauge system performance with the Frechet Inception Distance, writer recognition accuracy, and writer retrieval. We show that the proposed model produces samples that are aesthetically pleasing, helps boost text recognition performance, and achieves a writer retrieval score similar to that of real data.

4D Facial Expression Diffusion Model

  • Authors: Kaifeng Zou, Sylvain Faisan, Boyang Yu, Sébastien Valette, Hyewon Seo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16611
  • Pdf link: https://arxiv.org/pdf/2303.16611
  • Abstract
    Facial expression generation is one of the most challenging and long-sought aspects of character animation, with many interesting applications. This challenging task has traditionally relied heavily on digital craftspersons and remains under-explored. In this paper, we introduce a generative framework for generating 3D facial expression sequences (i.e., 4D faces) that can be conditioned on different inputs to animate an arbitrary 3D face mesh. It is composed of two tasks: (1) learning the generative model that is trained over a set of 3D landmark sequences, and (2) generating 3D mesh sequences of an input facial mesh driven by the generated landmark sequences. The generative model is based on a Denoising Diffusion Probabilistic Model (DDPM), which has achieved remarkable success in generative tasks in other domains. While it can be trained unconditionally, its reverse process can still be conditioned on various condition signals. This allows us to efficiently develop several downstream tasks involving various conditional generation, by using expression labels, text, partial sequences, or simply a facial geometry. To obtain the full mesh deformation, we then develop a landmark-guided encoder-decoder to apply the geometric deformation embedded in the landmarks to a given facial mesh. Experiments show that our model has learned to generate realistic, high-quality expressions solely from a dataset of relatively small size, improving over the state-of-the-art methods. Videos and qualitative comparisons with other methods can be found at https://github.com/ZOUKaifeng/4DFM. Code and models will be made available upon acceptance.

MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path

  • Authors: Qian Wang, Biao Zhang, Michael Birsak, Peter Wonka
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16765
  • Pdf link: https://arxiv.org/pdf/2303.16765
  • Abstract
    Image generation using diffusion can be controlled in multiple ways. In this paper, we systematically analyze the equations of modern generative diffusion networks to propose a framework, called MDP, that explains the design space of suitable manipulations. We identify 5 different manipulations, covering the intermediate latent, conditional embedding, cross-attention maps, guidance, and predicted noise. We analyze the corresponding parameters of these manipulations and the manipulation schedule. We show that some previous editing methods fit nicely into our framework. In particular, we identify one specific configuration as a new type of control obtained by manipulating the predicted noise, which can perform higher-quality edits than previous work for a variety of local and global edits.
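
One of the five manipulations, editing the predicted noise, can be sketched as a single DDPM ancestral-sampling step in which the network's noise estimate is blended with an edit direction. The toy denoiser, blend weight, and schedule below are placeholder assumptions, not the paper's configuration:

```python
import torch

def edited_ddpm_step(x_t, t, model, alphas_cumprod, betas, edit_noise, w=0.3):
    """One reverse step where the predicted noise is manipulated."""
    eps = model(x_t, t)
    eps = (1 - w) * eps + w * edit_noise          # blend in the edit direction
    beta_t = betas[t]
    abar_t = alphas_cumprod[t]
    mean = (x_t - beta_t / torch.sqrt(1 - abar_t) * eps) / torch.sqrt(1 - beta_t)
    noise = torch.sqrt(beta_t) * torch.randn_like(x_t) if t > 0 else 0.0
    return mean + noise

T = 100
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
model = lambda x, t: torch.zeros_like(x)          # toy denoiser (predicts zero noise)
x = torch.randn(1, 3, 8, 8)
edit = torch.randn_like(x)                        # a fixed "edit direction"
for t in reversed(range(T)):
    x = edited_ddpm_step(x, t, model, alphas_cumprod, betas, edit)
print(x.shape)
```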

Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos

  • Authors: Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2303.16897
  • Pdf link: https://arxiv.org/pdf/2303.16897
  • Abstract
    Modeling sounds emitted from physical object interactions is critical for immersive perceptual experiences in real and virtual worlds. Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that can represent and synthesize the sound. However, they require fine details of both the object geometries and impact locations, which are rarely available in the real world and cannot be applied to synthesize impact sounds from common videos. On the other hand, existing video-driven deep learning-based approaches can only capture the weak correspondence between visual content and impact sounds, since they lack physics knowledge. In this work, we propose a physics-driven diffusion model that can synthesize high-fidelity impact sound for a silent video clip. In addition to the video content, we propose to use additional physics priors to guide the impact sound synthesis procedure. The physics priors include both physics parameters that are directly estimated from noisy real-world impact sound examples without sophisticated setup and learned residual parameters that interpret the sound environment via neural networks. We further implement a novel diffusion model with specific training and inference strategies to combine physics priors and visual information for impact sound synthesis. Experimental results show that our model outperforms several existing systems in generating realistic impact sounds. More importantly, the physics-based representations are fully interpretable and transparent, thus enabling us to perform sound editing flexibly.

Keyword: dynamic

Towards Quantifying Calibrated Uncertainty via Deep Ensembles in Multi-output Regression Task

  • Authors: Sunwoong Yang, Kwanjung Yee
  • Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
  • Arxiv link: https://arxiv.org/abs/2303.16210
  • Pdf link: https://arxiv.org/pdf/2303.16210
  • Abstract
    Deep ensembles are a simple and straightforward approach for approximating Bayesian inference and have been successfully applied to many classification tasks. This study aims to comprehensively investigate this approach in the multi-output regression task of predicting the aerodynamic performance of a missile configuration. Scrutinizing the effect of the number of neural networks used in the ensemble reveals an obvious trend toward underconfidence in the estimated uncertainty. In this context, we propose a deep ensemble framework that applies a post-hoc calibration method, and demonstrate its improved uncertainty quantification performance. It is compared with Gaussian process regression, the most prevalent model for uncertainty quantification in engineering, and is shown to have superior performance in terms of regression accuracy, reliability of estimated uncertainty, and training efficiency. Finally, the impact of the suggested framework on the results of Bayesian optimization is examined, showing that whether or not the deep ensemble is calibrated can result in completely different exploration characteristics. This framework can be seamlessly applied and extended to any regression task, as no special assumptions have been made for the specific problem used in this study.
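
The ensemble's predictive mean and variance, plus one possible post-hoc calibrator (variance scaling fitted on validation data; the paper's exact calibration method is not reproduced here), can be sketched on synthetic predictions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Each "network" is a noisy linear fit standing in for a trained model.
preds = np.stack([2.0 * np.linspace(0, 1, 50) + rng.normal(0, 0.1, 50)
                  for _ in range(5)])            # (n_models, n_points)

mu = preds.mean(axis=0)                          # ensemble mean
var = preds.var(axis=0)                          # ensemble (epistemic) variance

# Post-hoc variance scaling: pick s so the average squared z-score on a
# validation set equals 1 (underconfident ensembles get s > 1).
y_val = 2.0 * np.linspace(0, 1, 50)              # validation ground truth
s = np.mean((y_val - mu) ** 2 / var)
calibrated_var = s * var
print("scale:", s, "mean calibrated std:", np.sqrt(calibrated_var).mean())
```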

TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition

  • Authors: Ishan Rajendrakumar Dave, Mamshad Nayeem Rizve, Chen Chen, Mubarak Shah
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16268
  • Pdf link: https://arxiv.org/pdf/2303.16268
  • Abstract
    Semi-supervised learning can be more beneficial for the video domain than for images because of video's higher annotation cost and dimensionality. Besides, any video understanding task requires reasoning over both spatial and temporal dimensions. In order to learn both the static and motion-related features for the semi-supervised action recognition task, existing methods rely on hard input inductive biases such as using two modalities (RGB and optical flow) or two streams of different playback rates. Instead of utilizing unlabeled videos through diverse input streams, we rely on self-supervised video representations; in particular, we utilize temporally-invariant and temporally-distinctive representations. We observe that these representations complement each other depending on the nature of the action. Based on this observation, we propose a student-teacher semi-supervised learning framework, TimeBalance, where we distill the knowledge from a temporally-invariant and a temporally-distinctive teacher. Depending on the nature of the unlabeled video, we dynamically combine the knowledge of these two teachers based on a novel temporal similarity-based reweighting scheme. Our method achieves state-of-the-art performance on three action recognition benchmarks: UCF101, HMDB51, and Kinetics400. Code: https://github.com/DAVEISHAN/TimeBalance
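
The dynamic teacher combination can be sketched as a per-clip convex blend of the two teachers' predictions, gated by a temporal-similarity score; the sigmoid gate below is an illustrative stand-in for the paper's reweighting scheme:

```python
import torch

def combine_teachers(p_invariant, p_distinctive, similarity, tau=1.0):
    """p_*: (batch, n_classes) teacher logits; similarity: (batch,) score,
    high for clips whose frames look alike (static actions)."""
    w = torch.sigmoid(similarity / tau).unsqueeze(-1)   # per-clip weight
    return w * p_invariant + (1 - w) * p_distinctive

p_inv = torch.randn(4, 10)
p_dis = torch.randn(4, 10)
sim = torch.tensor([2.0, -1.0, 0.5, 0.0])
print(combine_teachers(p_inv, p_dis, sim).shape)        # (4, 10)
```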

On the Local Cache Update Rules in Streaming Federated Learning

  • Authors: Heqiang Wang, Jieming Bian, Jie Xu
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.16340
  • Pdf link: https://arxiv.org/pdf/2303.16340
  • Abstract
    In this study, we address the emerging field of Streaming Federated Learning (SFL) and propose local cache update rules to manage dynamic data distributions and limited cache capacity. Traditional federated learning relies on fixed data sets, whereas in SFL, data is streamed, and its distribution changes over time, leading to discrepancies between the local training dataset and long-term distribution. To mitigate this problem, we propose three local cache update rules - First-In-First-Out (FIFO), Static Ratio Selective Replacement (SRSR), and Dynamic Ratio Selective Replacement (DRSR) - that update the local cache of each client while considering the limited cache capacity. Furthermore, we derive a convergence bound for our proposed SFL algorithm as a function of the distribution discrepancy between the long-term data distribution and the client's local training dataset. We then evaluate our proposed algorithm on two datasets: a network traffic classification dataset and an image classification dataset. Our experimental results demonstrate that our proposed local cache update rules significantly reduce the distribution discrepancy and outperform the baseline methods. Our study advances the field of SFL and provides practical cache management solutions in federated learning.
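
The FIFO rule is standard, and the two selective-replacement rules can be approximated from their names (a fixed versus a dynamically adapted replacement ratio); the details below are assumptions rather than the paper's definitions:

```python
from collections import deque
import random

def fifo_update(cache: deque, batch, capacity: int):
    """First-In-First-Out: evict the oldest samples to fit new ones."""
    for sample in batch:
        if len(cache) >= capacity:
            cache.popleft()                 # evict the oldest sample
        cache.append(sample)

def ratio_selective_update(cache: list, batch, capacity: int, ratio: float):
    """Replace a fraction of the cache with new samples. With a constant
    ratio this mimics SRSR; adapting the ratio over time would mimic DRSR."""
    n_replace = min(len(batch), int(ratio * capacity))
    for idx, sample in zip(random.sample(range(len(cache)), n_replace), batch):
        cache[idx] = sample

cache = deque(range(8))
fifo_update(cache, ["a", "b"], capacity=8)
print(list(cache))                          # [2, 3, 4, 5, 6, 7, 'a', 'b']
```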

EJ-FAT Joint ESnet JLab FPGA Accelerated Transport Load Balancer

  • Authors: Stacey Sheldon, Yatish Kumar, Michael Goodrich, Graham Heyes
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2303.16351
  • Pdf link: https://arxiv.org/pdf/2303.16351
  • Abstract
    To increase the science rate for high data rates/volumes, Thomas Jefferson National Accelerator Facility (JLab) has partnered with Energy Sciences Network (ESnet) to define an edge to data center traffic shaping / steering transport capability featuring data event-aware network shaping and forwarding. The keystone of this ESnet JLab FPGA Accelerated Transport (EJFAT) is the joint development of a dynamic compute work Load Balancer (LB) of UDP streamed data. The LB is a suite consisting of a Field Programmable Gate Array (FPGA) executing the dynamically configurable, low fixed latency LB data plane featuring real-time packet redirection at high throughput, and a control plane running on the FPGA host computer that monitors network and compute farm telemetry in order to make dynamic decisions for destination compute host redirection / load balancing.

A Unified Single-stage Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI

  • Authors: Tianyuan Yao, Nancy Newlin, Praitayini Kanakaraj, Vishwesh nath, Leon Y Cai, Karthik Ramadass, Kurt Schilling, Bennett A. Landman, Yuankai Huo
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16376
  • Pdf link: https://arxiv.org/pdf/2303.16376
  • Abstract
    Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in microstructure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation therefore require a signal representation that extends over the radial as well as the angular domain. Multiple approaches have been proposed that can model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed towards faster inference speed and higher inter-scan consistency compared with traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required, since the learning process relies on various intermediate representations, such as the simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation through heterogeneous multi-shell diffusion MRI sequences. We study Human Connectome Project (HCP) young adults with test-retest scans. The experimental results show that the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences.

ARMBench: An Object-centric Benchmark Dataset for Robotic Manipulation

  • Authors: Chaitanya Mitash, Fan Wang, Shiyang Lu, Vikedo Terhuja, Tyler Garaas, Felipe Polido, Manikantan Nambi
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16382
  • Pdf link: https://arxiv.org/pdf/2303.16382
  • Abstract
    This paper introduces the Amazon Robotic Manipulation Benchmark (ARMBench), a large-scale, object-centric benchmark dataset for robotic manipulation in the context of a warehouse. Automation of operations in modern warehouses requires a robotic manipulator to deal with a wide variety of objects, unstructured storage, and dynamically changing inventory. Such settings pose challenges in perceiving the identity, physical characteristics, and state of objects during manipulation. Existing datasets for robotic manipulation consider a limited set of objects or utilize 3D models to generate synthetic scenes, with limitations in capturing the variety of object properties, clutter, and interactions. We present a large-scale dataset collected in an Amazon warehouse using a robotic manipulator performing object singulation from containers with heterogeneous contents. ARMBench contains images, videos, and metadata that correspond to 235K+ pick-and-place activities on 190K+ unique objects. The data is captured at different stages of manipulation, i.e., pre-pick, during transfer, and after placement. Benchmark tasks are proposed by virtue of high-quality annotations, and baseline performance evaluations are presented on three visual perception challenges, namely 1) object segmentation in clutter, 2) object identification, and 3) defect detection. ARMBench can be accessed at this http URL

Learning Excavation of Rigid Objects with Offline Reinforcement Learning

  • Authors: Shiyu Jin, Zhixian Ye, Liangjun Zhang
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.16427
  • Pdf link: https://arxiv.org/pdf/2303.16427
  • Abstract
    Autonomous excavation is a challenging task. The unknown contact dynamics between the excavator bucket and the terrain could easily result in large contact forces and jamming problems during excavation. Traditional model-based methods struggle to handle such problems due to complex dynamic modeling. In this paper, we formulate the excavation skills with three novel manipulation primitives. We propose to learn the manipulation primitives with offline reinforcement learning (RL) to avoid large amounts of online robot interactions. The proposed method can learn efficient penetration skills from sub-optimal demonstrations, which contain sub-trajectories that can be "stitched" together to formulate an optimal trajectory without causing jamming. We evaluate the proposed method with extensive experiments on excavating a variety of rigid objects and demonstrate that the learned policy outperforms the demonstrations. We also show that the learned policy can quickly adapt to unseen and challenging fragmented rocks with online fine-tuning.

Ordinary Differential Equation-based Sparse Signal Recovery

  • Authors: Tadashi Wadayama, Ayano Nakai-Kasai
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2303.16431
  • Pdf link: https://arxiv.org/pdf/2303.16431
  • Abstract
    This study investigates the use of continuous-time dynamical systems for sparse signal recovery. The proposed dynamical system takes the form of a nonlinear ordinary differential equation (ODE) derived from the gradient flow of the Lasso objective function. The sparse signal recovery process of this ODE-based approach is demonstrated by numerical simulations using the Euler method. The state of the continuous-time dynamical system eventually converges to the equilibrium point corresponding to the minimum of the objective function. To gain insight into the local convergence properties of the system, a linear approximation around the equilibrium point is applied, yielding a closed-form error-evolution ODE. This analysis shows the behavior of convergence to the equilibrium point. In addition, a variational optimization problem is proposed to optimize a time-dependent regularization parameter in order to improve both convergence speed and solution quality. A deep-unfolded variational optimization method is introduced as a means of solving this optimization problem, and its effectiveness is validated through numerical experiments.
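
The setup is compact enough to reproduce as a sketch: Euler integration of the gradient flow of the Lasso objective f(x) = 0.5*||Ax - b||^2 + lam*||x||_1, using the subgradient sign(x) for the l1 term. Problem sizes, step size, and regularization weight below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 64, 128, 5                  # measurements, signal length, sparsity
A = rng.standard_normal((n, m)) / np.sqrt(n)
x_true = np.zeros(m)
x_true[rng.choice(m, k, replace=False)] = rng.standard_normal(k)
b = A @ x_true

lam, h = 0.01, 0.1                    # regularization weight, Euler step size
x = np.zeros(m)
for _ in range(2000):
    grad = A.T @ (A @ x - b) + lam * np.sign(x)   # subgradient of f at x
    x -= h * grad                     # Euler step: x(t+h) = x(t) - h * grad f(x)

print("recovery error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```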

Towards Understanding the Endemic Behavior of a Competitive Tri-Virus SIS Networked Model

  • Authors: Sebin Gracy, Mengbin Ye, Brian D.O. Anderson, Cesar A. Uribe
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.16457
  • Pdf link: https://arxiv.org/pdf/2303.16457
  • Abstract
    This paper studies the endemic behavior of a multi-competitive networked susceptible-infected-susceptible (SIS) model. Specifically, the paper deals with three competing virus systems (i.e., tri-virus systems). First, we show that a tri-virus system, unlike a bi-virus system, is not a monotone dynamical system. Using the Parametric Transversality Theorem, we show that, generically, a tri-virus system has a finite number of equilibria and that the Jacobian matrices associated with each equilibrium are nonsingular. The endemic equilibria of this system can be classified as follows: a) single-virus endemic equilibria (also referred to as the boundary equilibria), where precisely one of the three viruses is alive; b) 2-coexistence equilibria, where exactly two of the three viruses are alive; and c) 3-coexistence equilibria, where all three viruses survive in the network. We provide a necessary and sufficient condition that guarantees local exponential convergence to a boundary equilibrium. Further, we secure conditions for the nonexistence of 3-coexistence equilibria (resp. for various forms of 2-coexistence equilibria). We also identify sufficient conditions for the existence of a 2-coexistence (resp. 3-coexistence) equilibrium. We identify conditions on the model parameters that give rise to a continuum of coexistence equilibria. More specifically, we establish i) a scenario that admits the existence and local exponential attractivity of a line of coexistence equilibria; and ii) scenarios that admit the existence of, and, in the case of one such scenario, global convergence to, a plane of 3-coexistence equilibria.
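
A networked tri-virus SIS model of this kind can be simulated directly: each virus spreads over the contact graph while competing for the shared healthy fraction of every node. The graph, rates, and Euler integrator below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
A = np.triu((rng.random((n, n)) < 0.2).astype(float), 1)
A = A + A.T                                       # undirected contact graph

beta = np.array([0.08, 0.06, 0.05])               # per-virus infection rates
delta = np.array([0.30, 0.20, 0.15])              # per-virus recovery rates
x = 0.01 * rng.random((3, n))                     # x[k, i]: node i's virus-k level

h = 0.05
for _ in range(4000):
    healthy = 1.0 - x.sum(axis=0)                 # shared susceptible mass
    # dx/dt = -delta * x + healthy * beta * (neighbor infection pressure)
    dx = -delta[:, None] * x + healthy * beta[:, None] * (x @ A.T)
    x = np.clip(x + h * dx, 0.0, 1.0)             # Euler step, kept in [0, 1]

print("surviving viruses:", [k for k in range(3) if x[k].mean() > 1e-3])
```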

Runtime Verification of Self-Adaptive Systems with Changing Requirements

  • Authors: Marc Carwehl, Thomas Vogel, Genaína Nunes Rodrigues, Lars Grunske
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2303.16530
  • Pdf link: https://arxiv.org/pdf/2303.16530
  • Abstract
    To accurately make adaptation decisions, a self-adaptive system needs precise means to analyze itself at runtime. To this end, runtime verification can be used in the feedback loop to check that the managed system satisfies its requirements formalized as temporal-logic properties. These requirements, however, may change due to system evolution or uncertainty in the environment, managed system, and requirements themselves. Thus, the properties under investigation by the runtime verification have to be dynamically adapted to represent the changing requirements while preserving the knowledge about requirements satisfaction gathered thus far, all with minimal latency. To address this need, we present a runtime verification approach for self-adaptive systems with changing requirements. Our approach uses property specification patterns to automatically obtain automata with precise semantics that are the basis for runtime verification. The automata can be safely adapted during runtime verification while preserving intermediate verification results to seamlessly reflect requirement changes and enable continuous verification. We evaluate our approach on an Arduino prototype of the Body Sensor Network and the Timescales benchmark. Results show that our approach is over five times faster than the typical approach of redeploying and restarting runtime monitors to reflect requirements changes, while improving the system's trustworthiness by avoiding interruptions of verification.

Futures Quantitative Investment with Heterogeneous Continual Graph Neural Network

  • Authors: Zhizhong Tan, Min Hu, Yixuan Wang, Lu Wei, Bin Liu
  • Subjects: Machine Learning (cs.LG); Statistical Finance (q-fin.ST); Applications (stat.AP)
  • Arxiv link: https://arxiv.org/abs/2303.16532
  • Pdf link: https://arxiv.org/pdf/2303.16532
  • Abstract
    It is a challenging problem to predict trends of futures prices with traditional econometric models, as one needs to consider not only futures' historical data but also correlations among different futures. Spatial-temporal graph neural networks (STGNNs) have great advantages in dealing with such spatial-temporal data. However, we cannot directly apply STGNNs to high-frequency futures data because futures investors have to consider both long-term and short-term characteristics when making decisions. To capture both long-term and short-term features, we exploit more label information by designing four heterogeneous tasks: price regression, price moving-average regression, price gap regression (within a short interval), and change-point detection, which involve both long-term and short-term scenes. To make full use of these labels, we train our model in a continual manner. Traditional continual GNNs define the gradient of prices as the parameter importance used to overcome catastrophic forgetting (CF). Unfortunately, the losses of the four heterogeneous tasks lie in different spaces, hence it is improper to calculate the parameter importance from their losses. We propose to calculate parameter importance with the mutual information between the original observations and the extracted features. The empirical results based on 49 commodity futures demonstrate that our model has higher prediction performance in capturing long-term or short-term dynamic changes.

On the use of chaotic dynamics for mobile network design and analysis: towards a trace data generator

  • Authors: Martin Rosalie, Serge Chaumette
  • Subjects: Multiagent Systems (cs.MA); Dynamical Systems (math.DS); Chaotic Dynamics (nlin.CD)
  • Arxiv link: https://arxiv.org/abs/2303.16583
  • Pdf link: https://arxiv.org/pdf/2303.16583
  • Abstract
    With the constant increase in the number of autonomous vehicles and connected objects, tools to understand and reproduce their mobility models are required. We focus on chaotic dynamics and review their applications in the design of mobility models. We also review the nonlinear tools used in the literature to characterize mobility models. Finally, we propose a method to generate traces for a given scenario involving moving people, using tools from the nonlinear analysis domain usually dedicated to the topological analysis of chaotic attractors.

An Efficient Online Prediction of Host Workloads Using Pruned GRU Neural Nets

  • Authors: Amin Setayesh, Hamid Hadian, Radu Prodan
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2303.16601
  • Pdf link: https://arxiv.org/pdf/2303.16601
  • Abstract
    Host load prediction is essential for dynamic resource scaling and job scheduling in a cloud computing environment. In this context, workload prediction is challenging for several reasons. First, it must be accurate to enable precise scheduling decisions. Second, it must be fast so that scheduling happens at the right time. Third, a model must be able to account for new workload patterns so that it performs well on both the latest and older patterns. Failing to make an accurate and fast prediction, or being unable to predict new usage patterns, can result in severe outcomes such as service level agreement (SLA) misses. Our research trains a fast model, based on the gated recurrent unit (GRU), that is capable of online adaptation to mitigate these issues. We take a multivariate approach using several features, such as memory usage, CPU usage, disk I/O usage, and disk space, to make accurate predictions. Moreover, we predict multiple steps ahead, which is essential for making scheduling decisions in advance. Furthermore, we use two pruning methods, L1-norm and random pruning, to produce a sparse model for faster forecasts. Finally, online learning is used to create a model that can adapt over time to new workload patterns.
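
A minimal PyTorch sketch of the two pruning styles the abstract names, L1-norm and random pruning, applied to a small GRU; the layer choices, pruned parameter names, and 50% sparsity are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn.utils.prune as prune

gru = torch.nn.GRU(input_size=4, hidden_size=32, num_layers=1, batch_first=True)

# L1-norm pruning: zero the 50% of recurrent weights with smallest magnitude.
prune.l1_unstructured(gru, name="weight_hh_l0", amount=0.5)
# Random pruning: zero a random 50% of the input-to-hidden weights.
prune.random_unstructured(gru, name="weight_ih_l0", amount=0.5)

# Bake the masks into the parameters so the sparse weights persist.
prune.remove(gru, "weight_hh_l0")
prune.remove(gru, "weight_ih_l0")

x = torch.randn(8, 16, 4)  # (batch, time steps, features such as CPU/memory usage)
out, h = gru(x)            # a forecasting head for multi-step prediction would follow
```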

DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking

  • Authors: Qing Lian, Tai Wang, Dahua Lin, Jiangmiao Pang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.16628
  • Pdf link: https://arxiv.org/pdf/2303.16628
  • Abstract
    Recent multi-camera 3D object detectors usually leverage temporal information to construct multi-view stereo that alleviates the ill-posed depth estimation. However, they typically assume all the objects are static and directly aggregate features across frames. This work begins with a theoretical and empirical analysis to reveal that ignoring the motion of moving objects can result in serious localization bias. Therefore, we propose to model Dynamic Objects in RecurrenT (DORT) to tackle this problem. In contrast to previous global Bird-Eye-View (BEV) methods, DORT extracts object-wise local volumes for motion estimation, which also alleviates the heavy computational burden. By iteratively refining the estimated object motion and location, the preceding features can be precisely aggregated to the current frame to mitigate the aforementioned adverse effects. This simple framework has two appealing properties. It is flexible and practical, and can be plugged into most camera-based 3D object detectors. As there are predictions of object motion in the loop, it can easily track objects across frames according to their nearest center distances. Without bells and whistles, DORT outperforms all the previous methods on the nuScenes detection and tracking benchmarks with 62.5% NDS and 57.6% AMOTA, respectively. The source code will be released.

Learning Flow Functions from Data with Applications to Nonlinear Oscillators

  • Authors: Miguel Aguiar, Amritam Das, Karl H. Johansson
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2303.16656
  • Pdf link: https://arxiv.org/pdf/2303.16656
  • Abstract
    We describe a recurrent neural network (RNN) based architecture to learn the flow function of a causal, time-invariant and continuous-time control system from trajectory data. By restricting the class of control inputs to piecewise constant functions, we show that learning the flow function is equivalent to learning the input-to-state map of a discrete-time dynamical system. This motivates the use of an RNN together with encoder and decoder networks which map the state of the system to the hidden state of the RNN and back. We show that the proposed architecture is able to approximate the flow function by exploiting the system's causality and time-invariance. The output of the learned flow function model can be queried at any time instant. We experimentally validate the proposed method using models of the Van der Pol and FitzHugh Nagumo oscillators. In both cases, the results demonstrate that the architecture is able to closely reproduce the trajectories of these two systems. For the Van der Pol oscillator, we further show that the trained model generalises to the system's response with a prolonged prediction time horizon as well as control inputs outside the training distribution. For the FitzHugh-Nagumo oscillator, we show that the model accurately captures the input-dependent phenomena of excitability.
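
The encoder/RNN/decoder composition described in the abstract is concrete enough to sketch; the dimensions, the GRU cell, and the tanh on the encoded state below are illustrative assumptions rather than the authors' exact architecture:

```python
import torch
import torch.nn as nn

class FlowRNN(nn.Module):
    """Sketch of the encoder-RNN-decoder idea: map the initial state to the
    RNN hidden state, advance one step per piecewise-constant input segment,
    and decode the hidden state back to the system state."""
    def __init__(self, state_dim=2, input_dim=1, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Linear(state_dim, hidden_dim)
        self.cell = nn.GRUCell(input_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, state_dim)

    def forward(self, x0, u_seq):
        # x0: (batch, state_dim); u_seq: (batch, T, input_dim)
        h = torch.tanh(self.encoder(x0))
        states = []
        for t in range(u_seq.shape[1]):
            h = self.cell(u_seq[:, t], h)       # one piecewise-constant segment
            states.append(self.decoder(h))
        return torch.stack(states, dim=1)       # (batch, T, state_dim)

model = FlowRNN()
x0 = torch.randn(16, 2)        # e.g. Van der Pol initial states
u = torch.zeros(16, 50, 1)     # piecewise-constant control inputs
traj = model(x0, u)            # predicted trajectory
```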

Who You Play Affects How You Play: Predicting Sports Performance Using Graph Attention Networks With Temporal Convolution

  • Authors: Rui Luo, Vikram Krishnamurthy
  • Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2303.16741
  • Pdf link: https://arxiv.org/pdf/2303.16741
  • Abstract
    This study presents a novel deep learning method, called GATv2-GCN, for predicting player performance in sports. To construct a dynamic player interaction graph, we leverage player statistics and their interactions during gameplay. We use a graph attention network to capture the attention that each player pays to the others, allowing for more accurate modeling of the dynamic player interactions. To handle the multivariate player statistics time series, we incorporate a temporal convolution layer, which provides the model with temporal predictive power. We evaluate the performance of our model using real-world sports data, demonstrating its effectiveness in predicting player performance. Furthermore, we explore the potential use of our model in a sports betting context, providing insights into profitable strategies that leverage our predictive power. The proposed method has the potential to advance the state of the art in player performance prediction and to provide valuable insights for the sports analytics and betting industries.

Maximin Headway Control of Automated Vehicles for System Optimal Dynamic Traffic Assignment in General Networks

  • Authors: Jinxiao Du, Wei Ma
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2303.16772
  • Pdf link: https://arxiv.org/pdf/2303.16772
  • Abstract
    This study develops a headway control framework for a fully automated road network, as we believe the headway of Automated Vehicles (AVs) is another factor influencing traffic dynamics, in addition to conventional vehicle behaviors (e.g., route and departure time choices). Specifically, we aim to search for the optimal time headway between AVs on each link that achieves the network-wide system optimal dynamic traffic assignment (SO-DTA). To this end, the headway-dependent fundamental diagram (HFD) and the headway-dependent double queue model (HDQ) are developed to model the effect of dynamic headway on roads, and a dynamic network model is built. It is rigorously proved that the minimum headway always achieves SO-DTA, yet the optimal headway is non-unique. Motivated by these two findings, this study defines a novel concept of maximin headway, which is the largest headway that still achieves SO-DTA in the network. Mathematical properties of the maximin headway are analyzed and an efficient solution algorithm is developed. Numerical experiments on both a small and a large network verify the effectiveness of the maximin headway control framework as well as the properties of the maximin headway. This study sheds light on deriving the desired solution among the non-unique solutions of SO-DTA and provides implications regarding the safety margin of AVs under SO-DTA.
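
As a purely didactic illustration of the maximin headway concept (not the paper's solution algorithm), one can bisect on the headway given a hypothetical oracle achieves_so_dta(h) that is monotone in the sense suggested by the abstract's findings:

```python
def maximin_headway(achieves_so_dta, h_min, h_max, tol=1e-3):
    """Bisection sketch for the maximin headway on one link, assuming a
    (hypothetical) oracle achieves_so_dta(h) -> bool that is True at h_min,
    eventually False as h grows, and switches only once."""
    lo, hi = h_min, h_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if achieves_so_dta(mid):
            lo = mid          # mid still attains SO-DTA; push the headway upward
        else:
            hi = mid
    return lo                 # largest headway (within tol) still achieving SO-DTA
```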

On real and observable realizations of input-output equations

  • Authors: Sebastian Falkensteiner, Dmitrii Pavlov, Rafael Sendra
  • Subjects: Symbolic Computation (cs.SC); Algebraic Geometry (math.AG); Dynamical Systems (math.DS); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2303.16799
  • Pdf link: https://arxiv.org/pdf/2303.16799
  • Abstract
    Given a single algebraic input-output equation, we present a method for finding different representations of the associated system in the form of rational realizations; these are dynamical systems with rational right-hand sides. It has been shown that in the case where the input-output equation is of order one, rational realizations can be computed, if they exist. In this work, we focus first on the existence and actual computation of so-called observable rational realizations, and secondly on rational realizations with real coefficients. The study of observable realizations allows one to find every rational realization of a given first order input-output equation, together with the necessary field extensions in this process. We show that for first order input-output equations the existence of a rational realization is equivalent to the existence of an observable rational realization. Moreover, we give a criterion to decide the existence of real rational realizations. The computation of observable and real realizations of first order input-output equations is fully algorithmic. We also present partial results for the case of higher order input-output equations.

Legged Robots for Object Manipulation: A Review

  • Authors: Yifeng Gong, Ge Sun, Aditya Nair, Aditya Bidwai, Raghuram CS, John Grezmak, Guillaume Sartoretti, Kathryn A. Daltorio
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2303.16865
  • Pdf link: https://arxiv.org/pdf/2303.16865
  • Abstract
    Legged robots can have a unique role in manipulating objects in dynamic, human-centric, or otherwise inaccessible environments. Although most legged robotics research to date typically focuses on traversing these challenging environments, many legged platform demonstrations have also included "moving an object" as a way of doing tangible work. Legged robots can be designed to manipulate a particular type of object (e.g., a cardboard box, a soccer ball, or a larger piece of furniture), by themselves or collaboratively. The objective of this review is to collect and learn from these examples, to both organize the work done so far in the community and highlight interesting open avenues for future work. This review categorizes existing works into four main manipulation methods: object interactions without grasping, manipulation with walking legs, dedicated non-locomotive arms, and legged teams. Each method has different design and autonomy features, which are illustrated by available examples in the literature. Based on a few simplifying assumptions, we further provide quantitative comparisons for the range of possible relative sizes of the manipulated object with respect to the robot. Taken together, these examples suggest new directions for research in legged robot manipulation, such as multifunctional limbs, terrain modeling, or learning-based control, to support a number of new deployments in challenging indoor/outdoor scenarios in warehouses/construction sites, preserved natural areas, and especially for home robotics.

End-to-End $n$-ary Relation Extraction for Combination Drug Therapies

  • Authors: Yuhang Jiang, Ramakanth Kavuluru
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2303.16886
  • Pdf link: https://arxiv.org/pdf/2303.16886
  • Abstract
    Combination drug therapies are treatment regimens that involve two or more drugs, administered more commonly for patients with cancer, HIV, malaria, or tuberculosis. Currently there are over 350K articles in PubMed that use the "combination drug therapy" MeSH heading, with at least 10K articles published per year over the past two decades. Extracting combination therapies from scientific literature inherently constitutes an $n$-ary relation extraction problem. Unlike in the general $n$-ary setting where $n$ is fixed (e.g., drug-gene-mutation relations where $n=3$), extracting combination therapies is a special setting where $n \geq 2$ is dynamic, depending on each instance. Recently, Tiktinsky et al. (NAACL 2022) introduced a first-of-its-kind dataset, CombDrugExt, for extracting such therapies from literature. Here, we use a sequence-to-sequence style end-to-end extraction method to achieve an F1-score of $66.7\%$ on the CombDrugExt test set for positive (or effective) combinations. This is an absolute $\approx 5\%$ F1-score improvement even over the prior best relation classification score with spotted drug entities (hence, not end-to-end). Our effort thus introduces the first end-to-end extraction model for this task, which is already superior to the best prior non-end-to-end model. Our model seamlessly extracts all drug entities and relations in a single pass and is highly suitable for dynamic $n$-ary extraction scenarios.

New submissions for Fri, 21 Apr 23

Keyword: efficient

Evolving Constrained Reinforcement Learning Policy

  • Authors: Chengpeng Hu, Jiyuan Pei, Jialin Liu, Xin Yao
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09869
  • Pdf link: https://arxiv.org/pdf/2304.09869
  • Abstract
    Evolutionary algorithms have been used to evolve a population of actors to generate diverse experiences for training reinforcement learning agents, which helps to tackle the temporal credit assignment problem and improves the exploration efficiency. However, when adapting this approach to address constrained problems, balancing the trade-off between the reward and constraint violation is hard. In this paper, we propose a novel evolutionary constrained reinforcement learning (ECRL) algorithm, which adaptively balances the reward and constraint violation with stochastic ranking, and at the same time, restricts the policy's behaviour by maintaining a set of Lagrange relaxation coefficients with a constraint buffer. Extensive experiments on robotic control benchmarks show that our ECRL achieves outstanding performance compared to state-of-the-art algorithms. Ablation analysis shows the benefits of introducing stochastic ranking and constraint buffer.
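
Stochastic ranking is a classic device from constrained evolutionary optimization (Runarsson and Yao, 2000); a generic sketch of the ranking sweep is shown below, with reward recast as a cost f to minimize and g measuring constraint violation. The comparison probability p_f and the stopping rule are conventional choices, not taken from the paper:

```python
import random

def stochastic_ranking(population, f, g, p_f=0.45, sweeps=None):
    """Rank candidates balancing objective and constraints: adjacent pairs
    are compared by the objective f with probability p_f (or always, if both
    are feasible), and by the violation g otherwise. Lower f is better;
    g(x) == 0 means feasible."""
    idx = list(range(len(population)))
    sweeps = sweeps or len(population)
    for _ in range(sweeps):
        swapped = False
        for i in range(len(idx) - 1):
            a, b = population[idx[i]], population[idx[i + 1]]
            both_feasible = g(a) == 0 and g(b) == 0
            key = f if (both_feasible or random.random() < p_f) else g
            if key(a) > key(b):
                idx[i], idx[i + 1] = idx[i + 1], idx[i]
                swapped = True
        if not swapped:
            break
    return [population[i] for i in idx]   # best-ranked candidates first
```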

GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models

  • Authors: Li Zaitang, Pin-Yu Chen, Tsung-Yi Ho
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09875
  • Pdf link: https://arxiv.org/pdf/2304.09875
  • Abstract
    Current studies on adversarial robustness mainly focus on aggregating local robustness results from a set of data samples to evaluate and rank different models. However, the local statistics may not well represent the true global robustness of the underlying unknown data distribution. To address this challenge, this paper makes the first attempt to present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models. Formally, GREAT Score carries the physical meaning of a global statistic capturing a mean certified attack-proof perturbation level over all samples drawn from a generative model. For finite-sample evaluation, we also derive a probabilistic guarantee on the sample complexity and the difference between the sample mean and the true mean. GREAT Score has several advantages: (1) Robustness evaluations using GREAT Score are efficient and scalable to large models, by sparing the need to run adversarial attacks. In particular, we show high correlation and significantly reduced computation cost of GREAT Score when compared to the attack-based model ranking on RobustBench (Croce et al., 2021). (2) The use of generative models facilitates the approximation of the unknown data distribution. In our ablation study with different generative adversarial networks (GANs), we observe consistency between global robustness evaluation and the quality of GANs. (3) GREAT Score can be used for remote auditing of privacy-sensitive black-box models, as demonstrated by our robustness evaluation on several online facial recognition services.
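
A back-of-the-envelope sketch of the global statistic described above: average a per-sample certified attack-proof radius over draws from a generative model. Here generator and local_radius are hypothetical callables standing in for components the paper makes precise:

```python
import torch

def great_score(generator, local_radius, n_samples=1000, z_dim=128):
    """Monte Carlo estimate of a mean certified attack-proof perturbation
    level over samples drawn from a generative model (illustrative only).
    generator: latent code -> synthetic sample; local_radius: sample ->
    certified robustness radius of the classifier under evaluation."""
    total = 0.0
    for _ in range(n_samples):
        z = torch.randn(1, z_dim)
        x = generator(z)           # synthetic sample standing in for the data distribution
        total += local_radius(x)   # per-sample certified radius
    return total / n_samples       # no adversarial attack is ever run
```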

The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages

  • Authors: Vesa Akerman, David Baines, Damien Daspit, Ulf Hermjakob, Taeho Jang, Colin Leong, Michael Martin, Joel Mathew, Jonathan Robie, Marcus Schwarting
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09919
  • Pdf link: https://arxiv.org/pdf/2304.09919
  • Abstract
    Efficiently and accurately translating a corpus into a low-resource language remains a challenge, regardless of the strategies employed, whether manual, automated, or a combination of the two. Many Christian organizations are dedicated to the task of translating the Holy Bible into languages that lack a modern translation. Bible translation (BT) work is currently underway for over 3000 extremely low resource languages. We introduce the eBible corpus: a dataset containing 1009 translations of portions of the Bible with data in 833 different languages across 75 language families. In addition to a BT benchmarking dataset, we introduce model performance benchmarks built on the No Language Left Behind (NLLB) neural machine translation (NMT) models. Finally, we describe several problems specific to the domain of BT and consider how the established data and model benchmarks might be used for future translation efforts. For a BT task trained with NLLB, Austronesian and Trans-New Guinea language families achieve 35.1 and 31.6 BLEU scores respectively, which spurs future innovations for NMT for low-resource languages in Papua New Guinea.

A robust and interpretable deep learning framework for multi-modal registration via keypoints

  • Authors: Alan Q. Wang, Evan M. Yu, Adrian V. Dalca, Mert R. Sabuncu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09941
  • Pdf link: https://arxiv.org/pdf/2304.09941
  • Abstract
    We present KeyMorph, a deep learning-based image registration framework that relies on automatically detecting corresponding keypoints. State-of-the-art deep learning methods for registration often are not robust to large misalignments, are not interpretable, and do not incorporate the symmetries of the problem. In addition, most models produce only a single prediction at test-time. Our core insight which addresses these shortcomings is that corresponding keypoints between images can be used to obtain the optimal transformation via a differentiable closed-form expression. We use this observation to drive the end-to-end learning of keypoints tailored for the registration task, and without knowledge of ground-truth keypoints. This framework not only leads to substantially more robust registration but also yields better interpretability, since the keypoints reveal which parts of the image are driving the final alignment. Moreover, KeyMorph can be designed to be equivariant under image translations and/or symmetric with respect to the input image ordering. Finally, we show how multiple deformation fields can be computed efficiently and in closed-form at test time corresponding to different transformation variants. We demonstrate the proposed framework in solving 3D affine and spline-based registration of multi-modal brain MRI scans. In particular, we show registration accuracy that surpasses current state-of-the-art methods, especially in the context of large displacements. Our code is available at https://github.com/evanmy/keymorph.
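
The core insight, that corresponding keypoints yield the optimal transformation via a differentiable closed-form expression, can be illustrated for the affine case with ordinary least squares; the NumPy sketch below is self-contained and is not the KeyMorph code:

```python
import numpy as np

def affine_from_keypoints(src, dst):
    """Closed-form least-squares affine transform mapping src keypoints
    onto dst keypoints. src, dst: (N, 3) arrays of corresponding 3D
    keypoints, N >= 4. The solve is differentiable, which is what lets a
    keypoint detector be trained end-to-end through it."""
    n = src.shape[0]
    src_h = np.hstack([src, np.ones((n, 1))])     # homogeneous coordinates
    # Solve src_h @ A.T ~= dst for the 3x4 affine matrix A.
    A, *_ = np.linalg.lstsq(src_h, dst, rcond=None)
    return A.T                                    # (3, 4)

src = np.random.rand(10, 3)
T = np.array([[1.1, 0.0, 0.0, 0.5],
              [0.0, 0.9, 0.1, -0.2],
              [0.0, 0.0, 1.0, 0.3]])
dst = np.hstack([src, np.ones((10, 1))]) @ T.T
A = affine_from_keypoints(src, dst)               # recovers T (exactly here; least squares in general)
```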

Baugh-Wooley Multiplication for the RISCV Processor

  • Authors: Franc Grootjen, Nikolai Schauer
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.09952
  • Pdf link: https://arxiv.org/pdf/2304.09952
  • Abstract
    This article describes an efficient way to implement the multiplication instructions for a RISCV processor. Instead of using three predefined IP blocks for signed, unsigned and mixed multiplication, this article presents a novel extension to the Baugh-Wooley multiplication algorithm which reduces area and power consumption by roughly a factor of three.
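
The abstract does not spell out the extension, but the classical Baugh-Wooley scheme it builds on forms a signed product from an unsigned partial-product array, complemented sign-bit rows, and two correction bits. A behavioural Python sketch of that classical scheme (no claim about the paper's novel extension), with an exhaustive check:

```python
def baugh_wooley_mul(a: int, b: int, n: int = 8) -> int:
    """Multiply two n-bit two's complement integers with the Baugh-Wooley
    partial-product scheme (a behavioural sketch, not a gate-level model)."""
    abits = [(a >> i) & 1 for i in range(n)]
    bbits = [(b >> j) & 1 for j in range(n)]
    acc = 0
    # unsigned partial products for the low (n-1) x (n-1) block
    for i in range(n - 1):
        for j in range(n - 1):
            acc += (abits[i] & bbits[j]) << (i + j)
    # complemented partial products for the rows involving the sign bits
    for i in range(n - 1):
        acc += (1 - (abits[i] & bbits[n - 1])) << (i + n - 1)
        acc += (1 - (abits[n - 1] & bbits[i])) << (i + n - 1)
    # sign-bit product plus the two correction bits at positions n and 2n-1
    acc += (abits[n - 1] & bbits[n - 1]) << (2 * n - 2)
    acc += (1 << n) + (1 << (2 * n - 1))
    acc &= (1 << (2 * n)) - 1           # keep 2n bits
    if acc >= 1 << (2 * n - 1):         # reinterpret as signed
        acc -= 1 << (2 * n)
    return acc

# exhaustive check for n = 8 (all 65536 signed operand pairs)
n = 8
for a in range(-(1 << (n - 1)), 1 << (n - 1)):
    for b in range(-(1 << (n - 1)), 1 << (n - 1)):
        assert baugh_wooley_mul(a, b, n) == a * b
```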

MasakhaNEWS: News Topic Classification for African languages

  • Authors: David Ifeoluwa Adelani, Marek Masiak, Israel Abebe Azime, Jesujoba Oluwadara Alabi, Atnafu Lambebo Tonja, Christine Mwase, Odunayo Ogundepo, Bonaventure F. P. Dossou, Akintunde Oladipo, Doreen Nixdorf, Chris Chinenye Emezue, Sana Sabah al-azzawi, Blessing K. Sibanda, Davis David, Lolwethu Ndolela, Jonathan Mukiibi, Tunde Oluwaseyi Ajayi, Tatiana Moteu Ngoli, Brian Odhiambo, Abraham Toluwase Owodunni, Nnaemeka C. Obiefuna, Shamsuddeen Hassan Muhammad, Saheed Salahudeen Abdullahi, Mesay Gemeda Yigezu, Tajuddeen Gwadabe, Idris Abdulmumin, Mahlet Taye Bame, Oluwabusayo Olufunke Awoyomi, Iyanuoluwa Shode, Tolulope Anu Adelani, Habiba Abdulganiy Kailani, Abdul-Hakeem Omotayo, Adetola Adeeko, Afolabi Abeeb, Anuoluwapo Aremu, Olanrewaju Samuel, Clemencia Siro, Wangari Kimotho, Onyekachi Raphael Ogbu, et al. (23 additional authors not shown)
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.09972
  • Pdf link: https://arxiv.org/pdf/2304.09972
  • Abstract
    African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS -- a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited for zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as little as 10 examples per label, we achieve more than 90% (i.e. 86.0 F1 points) of the performance of full supervised training (92.6 F1 points) leveraging the PET approach.

Equilibrium-Invariant Embedding, Metric Space, and Fundamental Set of $2\times2$ Normal-Form Games

  • Authors: Luke Marris, Ian Gemp, Georgios Piliouras
  • Subjects: Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Theoretical Economics (econ.TH); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.09978
  • Pdf link: https://arxiv.org/pdf/2304.09978
  • Abstract
    Equilibrium solution concepts of normal-form games, such as Nash equilibria, correlated equilibria, and coarse correlated equilibria, describe the joint strategy profiles from which no player has incentive to unilaterally deviate. They are widely studied in game theory, economics, and multiagent systems. Equilibrium concepts are invariant under certain transforms of the payoffs. We define an equilibrium-inspired distance metric for the space of all normal-form games and uncover a distance-preserving equilibrium-invariant embedding. Furthermore, we propose an additional transform which defines a better-response-invariant distance metric and embedding. To demonstrate these metric spaces we study $2\times2$ games. The equilibrium-invariant embedding of $2\times2$ games has an efficient two variable parameterization (a reduction from eight), where each variable geometrically describes an angle on a unit circle. Interesting properties can be spatially inferred from the embedding, including: equilibrium support, cycles, competition, coordination, distances, best-responses, and symmetries. The best-response-invariant embedding of $2\times2$ games, after considering symmetries, rediscovers a set of 15 games, and their respective equivalence classes. We propose that this set of game classes is fundamental and captures all possible interesting strategic interactions in $2\times2$ games. We introduce a directed graph representation and name for each class. Finally, we leverage the tools developed for $2\times2$ games to develop game theoretic visualizations of large normal-form and extensive-form games that aim to fingerprint the strategic interactions that occur within.

Tetra-NeRF: Representing Neural Radiance Fields Using Tetrahedra

  • Authors: Jonas Kulhanek, Torsten Sattler
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09987
  • Pdf link: https://arxiv.org/pdf/2304.09987
  • Abstract
    Neural Radiance Fields (NeRFs) are a very recent and very popular approach for the problems of novel view synthesis and 3D reconstruction. A popular scene representation used by NeRFs is to combine a uniform, voxel-based subdivision of the scene with an MLP. Based on the observation that a (sparse) point cloud of the scene is often available, this paper proposes to use an adaptive representation based on tetrahedra and a Delaunay representation instead of the uniform subdivision or point-based representations. We show that such a representation enables efficient training and leads to state-of-the-art results. Our approach elegantly combines concepts from 3D geometry processing, triangle-based rendering, and modern neural radiance fields. Compared to voxel-based representations, ours provides more detail around parts of the scene likely to be close to the surface. Compared to point-based representations, our approach achieves better performance.

AI-coherent data-driven forecasting model for a combined cycle power plant

  • Authors: Mir Sayed Shah Danish, Zahra Nazari, Tomonobu Senjyu
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10009
  • Pdf link: https://arxiv.org/pdf/2304.10009
  • Abstract
    This study investigates the transformation of energy models to align with machine learning requirements as a promising tool for optimizing the operation of combined cycle power plants (CCPPs). By modeling energy production as a function of environmental and control variables, this methodology offers an innovative way to achieve energy-efficient power generation in the context of data-driven applications. This study focuses on developing a thorough AI-coherent modeling approach for CCPP optimization, preferring an interdisciplinary perspective and arriving at a comprehensive, insightful analysis. The proposed numerical model using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm enhances efficiency by simulating various operating scenarios and adjusting optimal parameters, leading to a 2.23% increase in power generation, from 452 MW to 462.1 MW, by optimizing the environmental factors. This study deals with data-driven modeling based on historical data to make predictions without prior knowledge of the system's parameters, demonstrating several merits: identifying patterns that can be difficult for human analysts to detect, high accuracy when trained on large datasets, and the potential to improve over time with new data. The proposed modeling approach and methodology can be extended as a valuable tool for forecasting and decision-making in complex energy systems.

Dynablox: Real-time Detection of Diverse Dynamic Objects in Complex Environments

  • Authors: Lukas Schmid, Olov Andersson, Aurelio Sulser, Patrick Pfreundschuh, Roland Siegwart
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.10049
  • Pdf link: https://arxiv.org/pdf/2304.10049
  • Abstract
    Real-time detection of moving objects is an essential capability for robots acting autonomously in dynamic environments. We thus propose Dynablox, a novel online mapping-based approach for robust moving object detection in complex environments. The central idea of our approach is to incrementally estimate high confidence free-space areas by modeling and accounting for sensing, state estimation, and mapping limitations during online robot operation. The spatio-temporally conservative free space estimate enables robust detection of moving objects without making any assumptions on the appearance of objects or environments. This allows deployment in complex scenes such as multi-storied buildings or staircases, and for diverse moving objects such as people carrying various items, doors swinging or even balls rolling around. We thoroughly evaluate our approach on real-world data sets, achieving 86% IoU at 17 FPS in typical robotic settings. The method outperforms a recent appearance-based classifier and approaches the performance of offline methods. We demonstrate its generality on a novel data set with rare moving objects in complex environments. We make our efficient implementation and the novel data set available as open-source.

Maximize the Long-term Average Revenue of Network Slice Provider via Admission Control Among Heterogeneous Slices

  • Authors: Miao Dai, Gang Sun, Hongfang Yu, Dusit Niyato
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.10057
  • Pdf link: https://arxiv.org/pdf/2304.10057
  • Abstract
    Network slicing endows 5G/B5G with differentiated and customized capabilities to cope with the proliferation of diversified services, whereas limited physical network resources may not be able to support all service requests. Slice admission control is regarded as an essential means to ensure service quality and service isolation when the network is under burden. Herein, a scenario is adopted where rational tenants coexist with partially competitive network slice providers. We aim to maximize the long-term average revenue of the network operators through slice admission control, with the feasibility of multidimensional resource requirements, the priority differences among heterogeneous slices, and the admission fairness within each slice taken into account concurrently. We prove the intractability of our problem by a reduction from the Multidimensional Knapsack Problem (MKP), and propose a two-stage algorithm called MPSAC to obtain a sub-optimal solution efficiently. The principle of MPSAC is to split the original problem into two sub-problems, inter-slice decision-making and intra-slice quota allocation, which are solved using a heuristic method and a tailored auction mechanism respectively. Extensive simulations are carried out to demonstrate the efficacy of our algorithm; the results show that our long-term average revenue is at least 9.6% higher than that of the comparison schemes, while maintaining better priority relations and achieving improved fairness performance.

High-Performance and Flexible Parallel Algorithms for Semisort and Related Problems

  • Authors: Xiaojun Dong, Yunshu Wu, Zhongqi Wang, Laxman Dhulipala, Yan Gu, Yihan Sun
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.10078
  • Pdf link: https://arxiv.org/pdf/2304.10078
  • Abstract
    Semisort is a fundamental algorithmic primitive widely used in the design and analysis of efficient parallel algorithms. It takes as input an array of records and a function extracting a \emph{key} per record, and reorders them so that records with equal keys are contiguous. Since many applications only require collecting equal values, but not fully sorting the input, semisort is broadly applicable, e.g., in string algorithms, graph analytics, and geometry processing, among many other domains. However, despite dozens of recent papers that use semisort in their theoretical analysis and the existence of an asymptotically optimal parallel semisort algorithm, most implementations of these parallel algorithms choose to implement semisort by using comparison or integer sorting in practice, due to potential performance issues in existing semisort implementations. In this paper, we revisit the semisort problem, with the goal of achieving a high-performance parallel semisort implementation with a flexible interface. Our approach easily extends to two related problems, \emph{histogram} and \emph{collect-reduce}. Our algorithms achieve strong speedups in practice and, importantly, outperform state-of-the-art parallel sorting and semisorting methods for almost all settings we tested, with varying input sizes, distributions, and key types. We also test two important applications with real-world data, and show that our algorithms improve the performance over existing approaches. We believe that many other parallel algorithm implementations can be accelerated using our results.
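
For readers unfamiliar with the primitive, the sequential semantics of semisort are easy to pin down: records with equal keys become contiguous, with no ordering guaranteed across groups. The sketch below only fixes that specification; the paper's contribution is a fast parallel algorithm, which this is not:

```python
from collections import defaultdict

def semisort(records, key):
    """Reference semantics of semisort: group records by key (hashing, no
    comparison sort), then emit each group contiguously in some order."""
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r)
    return [r for group in groups.values() for r in group]

pairs = [(3, "a"), (1, "b"), (3, "c"), (2, "d"), (1, "e")]
print(semisort(pairs, key=lambda r: r[0]))
# e.g. [(3, 'a'), (3, 'c'), (1, 'b'), (1, 'e'), (2, 'd')]
```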

Transmit Power Minimization for STAR-RIS Empowered Symbiotic Radio Communications

  • Authors: Chao Zhou, Bin Lyu, Youhong Feng, Dinh Thai Hoang
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.10095
  • Pdf link: https://arxiv.org/pdf/2304.10095
  • Abstract
    In this paper, we propose a simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) empowered transmission scheme for symbiotic radio (SR) systems to provide more flexibility for network deployment and enhance system performance. The STAR-RIS is utilized not only to beam the primary signals from the base station (BS) towards multiple primary users on the same side of the STAR-RIS, but also to achieve the secondary transmission to the secondary users on the other side. We consider both a broadcasting signal model and a unicasting signal model at the BS. For each model, we aim to minimize the transmit power of the BS by designing the active beamforming and the simultaneous reflection and transmission coefficients under the practical phase correlation constraint. To address the challenge of solving the formulated problem, we propose a block coordinate descent based algorithm with semidefinite relaxation, penalty dual decomposition and successive convex approximation methods, which decomposes the original problem into one sub-problem about active beamforming and another sub-problem about simultaneous reflection and transmission coefficients, and iteratively solves them until convergence is achieved. Numerical results indicate that the proposed scheme can reduce transmit power by up to 150.6% compared to the backscattering-device-enabled scheme.

Two-Memory Reinforcement Learning

  • Authors: Zhao Yang, Thomas. M. Moerland, Mike Preuss, Aske Plaat
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10098
  • Pdf link: https://arxiv.org/pdf/2304.10098
  • Abstract
    While deep reinforcement learning has shown important empirical success, it tends to learn relatively slowly due to the slow propagation of reward information and slow updates of parametric neural networks. Non-parametric episodic memory, on the other hand, provides a faster-learning alternative that does not require representation learning and uses maximum episodic return as state-action values for action selection. Episodic memory and reinforcement learning both have their own strengths and weaknesses. Notably, humans can leverage multiple memory systems concurrently during learning and benefit from all of them. In this work, we propose a method called the Two-Memory reinforcement learning agent (2M) that combines episodic memory and reinforcement learning to distill both of their strengths. The 2M agent exploits the speed of the episodic memory part and the optimality and generalization capacity of the reinforcement learning part, so that the two complement each other. Our experiments demonstrate that the 2M agent is more data-efficient and outperforms both pure episodic memory and pure reinforcement learning, as well as a state-of-the-art memory-augmented RL agent. Moreover, the proposed approach provides a general framework that can be used to combine any episodic memory agent with other off-policy reinforcement learning algorithms.

Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One

  • Authors: Hongyuan Zhang, Yanan Zhu, Xuelong Li
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10126
  • Pdf link: https://arxiv.org/pdf/2304.10126
  • Abstract
    Graph neural networks (GNNs) suffer from severe inefficiency, mainly caused by the exponential growth of node dependency with the increase of layers. This severely limits the application of stochastic optimization algorithms, so that the training of GNNs is usually time-consuming. To address this problem, we propose to decouple a multi-layer GNN into multiple simple modules for more efficient training, comprising classical forward training (FT) and designed backward training (BT). Under the proposed framework, each module can be trained efficiently in FT by stochastic algorithms without distortion of graph information, owing to its simplicity. To avoid the purely unidirectional information delivery of FT and to sufficiently train shallow modules together with the deeper ones, we develop a backward training mechanism that makes the former modules perceive the latter modules. The backward training introduces reversed information delivery into the decoupled modules in addition to the forward information delivery. To investigate how the decoupling and greedy training affect the representational capacity, we theoretically prove that the error produced by linear modules will not accumulate on unsupervised tasks in most cases. The theoretical and experimental results show that the proposed framework is highly efficient with reasonable performance.

Securing Semantic Communications with Physical-layer Semantic Encryption and Obfuscation

  • Authors: Qi Qin, Yankai Rong, Guoshun Nan, Shaokang Wu, Xuefei Zhang, Qimei Cui, Xiaofeng Tao
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.10147
  • Pdf link: https://arxiv.org/pdf/2304.10147
  • Abstract
    Deep learning based semantic communication (DLSC) systems have shown great potential to make wireless networks significantly more efficient by transmitting only the semantics of the data. However, the open nature of wireless channels and the fragility of neural models leave DLSC systems extremely vulnerable to various attacks. Traditional wireless physical layer key (PLK) generation, which relies on reciprocal channel and randomness characteristics between two legitimate users, holds the promise of securing DLSC. The main challenge lies in generating secret keys in static environments with ultra-low/zero rate. Different from prior efforts that use relays or reconfigurable intelligent surfaces (RIS) to manipulate wireless channels, this paper proposes a novel physical layer semantic encryption scheme by exploring the randomness of bilingual evaluation understudy (BLEU) scores in the field of machine translation, and additionally presents a novel semantic obfuscation mechanism to provide further physical layer protection. Specifically, 1) we calculate the BLEU scores and corresponding weights of the DLSC system, then generate semantic keys (SKey) by feeding the weighted sum of the scores into a hash function. 2) Equipped with the SKey, our proposed subcarrier obfuscation is able to further secure semantic communications with a dynamic dummy data insertion mechanism. Experiments show the effectiveness of our method, especially in static wireless environments.
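
A toy rendering of the SKey construction sketched in point 1): hash a weighted sum of BLEU scores into a 256-bit key. The tokenization, the use of NLTK's sentence_bleu, and SHA-256 are assumptions for illustration, not the paper's specification:

```python
import hashlib
from nltk.translate.bleu_score import sentence_bleu

def semantic_key(pairs, weights):
    """Derive a key from the randomness of BLEU scores: compute BLEU between
    each (reference, hypothesis) sentence pair, take the weighted sum, and
    hash it. pairs: list of (ref_tokens, hyp_tokens); weights: one per pair."""
    score = sum(w * sentence_bleu([ref], hyp)
                for w, (ref, hyp) in zip(weights, pairs))
    return hashlib.sha256(f"{score:.10f}".encode()).digest()  # 256-bit key

ref = "the cat sat on the mat".split()
hyp = "the cat sat on a mat".split()   # the recovered sentence at the receiver
key = semantic_key([(ref, hyp)], weights=[1.0])
```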

Is ChatGPT a Good Recommender? A Preliminary Study

  • Authors: Junling Liu, Chao Liu, Renjie Lv, Kang Zhou, Yan Zhang
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.10149
  • Pdf link: https://arxiv.org/pdf/2304.10149
  • Abstract
    Recommendation systems have witnessed significant advancements and have been widely used over the past decades. However, most traditional recommendation methods are task-specific and therefore lack efficient generalization ability. Recently, the emergence of ChatGPT has significantly advanced NLP tasks by enhancing the capabilities of conversational models. Nonetheless, the application of ChatGPT in the recommendation domain has not been thoroughly investigated. In this paper, we employ ChatGPT as a general-purpose recommendation model to explore its potential for transferring extensive linguistic and world knowledge acquired from large-scale corpora to recommendation scenarios. Specifically, we design a set of prompts and evaluate ChatGPT's performance on five recommendation scenarios. Unlike traditional recommendation methods, we do not fine-tune ChatGPT during the entire evaluation process, relying only on the prompts themselves to convert recommendation tasks into natural language tasks. Further, we explore the use of few-shot prompting to inject interaction information containing users' potential interests, to help ChatGPT better understand user needs and interests. Comprehensive experimental results on the Amazon Beauty dataset show that ChatGPT has achieved promising results in certain tasks and is capable of reaching the baseline level in others. We conduct human evaluations on two explainability-oriented tasks to more accurately evaluate the quality of contents generated by different models. The human evaluations show that ChatGPT can truly understand the provided information and generate clearer and more reasonable results. We hope that our study can inspire researchers to further explore the potential of language models like ChatGPT to improve recommendation performance and contribute to the advancement of the recommendation systems field.

Automated Dynamic Bayesian Networks for Predicting Acute Kidney Injury Before Onset

  • Authors: David Gordon, Panayiotis Petousis, Anders O. Garlid, Keith Norris, Katherine Tuttle, Susanne B. Nicholas, Alex A.T. Bui (on behalf of CURE-CKD)
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10175
  • Pdf link: https://arxiv.org/pdf/2304.10175
  • Abstract
    Several algorithms for learning the structure of dynamic Bayesian networks (DBNs) require an a priori ordering of variables, which influences the determined graph topology. However, it is often unclear how to determine this order if feature importance is unknown, especially as an exhaustive search is usually impractical. In this paper, we introduce Ranking Approaches for Unknown Structures (RAUS), an automated framework to systematically inform variable ordering and learn networks end-to-end. RAUS leverages existing statistical methods (Cramér's V, the chi-squared test, and information gain) to compare variable orderings, the resultant generated network topologies, and DBN performance. RAUS enables end-users with limited DBN expertise to implement models via a command line interface. We evaluate RAUS on the task of predicting impending acute kidney injury (AKI) from inpatient clinical laboratory data. Longitudinal observations from 67,460 patients were collected from our electronic health record (EHR), and Kidney Disease Improving Global Outcomes (KDIGO) criteria were then applied to define AKI events. RAUS learns multiple DBNs simultaneously to predict a future AKI event at different time points (i.e., 24-, 48-, 72-hours in advance of AKI). We also compared the results of the learned AKI prediction models and variable orderings to baseline techniques (logistic regression, random forests, and extreme gradient boosting). The DBNs generated by RAUS achieved 73-83% area under the receiver operating characteristic curve (AUCROC) within 24-hours before AKI; and 71-79% AUCROC within 48-hours before AKI of any stage in a 7-day observation window. Insights from this automated framework can help efficiently implement and interpret DBNs for clinical decision support. The source code for RAUS is available in GitHub at https://github.com/dgrdn08/RAUS .
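
Of the three statistics RAUS leverages for variable ordering, Cramér's V is the least common in ML toolkits; a small sketch of how it can be computed from two categorical columns (the exact variant used by RAUS may differ):

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    """Cramér's V between two categorical variables: the chi-squared
    statistic of their contingency table, normalized to [0, 1]."""
    table = np.array([[np.sum((x == a) & (y == b))
                       for b in np.unique(y)] for a in np.unique(x)])
    chi2 = chi2_contingency(table)[0]
    n = table.sum()
    r, k = table.shape
    return np.sqrt(chi2 / (n * (min(r, k) - 1)))

x = np.array(["a", "b", "a", "c", "b", "a"])   # e.g. a discretized lab value
y = np.array([0, 1, 0, 1, 1, 0])               # e.g. an AKI event indicator
print(cramers_v(x, y))
```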

Robust Deep Reinforcement Learning Scheduling via Weight Anchoring

  • Authors: Steffen Gracla, Edgar Beck, Carsten Bockelmann, Armin Dekorsy
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10176
  • Pdf link: https://arxiv.org/pdf/2304.10176
  • Abstract
    Questions remain on the robustness of data-driven learning methods when crossing the gap from simulation to reality. We utilize weight anchoring, a method known from continual learning, to cultivate and fixate desired behavior in Neural Networks. Weight anchoring may be used to find a solution to a learning problem that is nearby the solution of another learning problem. Thereby, learning can be carried out in optimal environments without neglecting or unlearning desired behavior. We demonstrate this approach on the example of learning mixed QoS-efficient discrete resource scheduling with infrequent priority messages. Results show that this method provides performance comparable to the state of the art of augmenting a simulation environment, alongside significantly increased robustness and steerability.
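
Weight anchoring, as used here, amounts to penalizing the distance of the current weights from a snapshot encoding the behavior to retain; a minimal sketch with a plain quadratic penalty (importance-weighted variants in the spirit of EWC are also common, and the paper's exact form may differ):

```python
import torch

def anchored_loss(task_loss, model, anchor_params, lam=1.0):
    """Add a quadratic penalty that keeps the current weights near a
    snapshot taken from an earlier, desired solution. anchor_params is a
    list of detached tensors, one per model parameter."""
    penalty = sum(((p - a) ** 2).sum()
                  for p, a in zip(model.parameters(), anchor_params))
    return task_loss + lam * penalty

# Usage: snapshot once after learning the behavior to keep, then continue
# training (e.g., in a new simulation environment) with the anchored loss.
# anchor = [p.detach().clone() for p in model.parameters()]
# loss = anchored_loss(criterion(model(x), y), model, anchor, lam=0.1)
```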

Regularizing Second-Order Influences for Continual Learning

  • Authors: Zhicheng Sun, Yadong Mu, Gang Hua
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10177
  • Pdf link: https://arxiv.org/pdf/2304.10177
  • Abstract
    Continual learning aims to learn on non-stationary data streams without catastrophically forgetting previous knowledge. Prevalent replay-based methods address this challenge by rehearsing on a small buffer holding the seen data, for which a delicate sample selection strategy is required. However, existing selection schemes typically seek only to maximize the utility of the ongoing selection, overlooking the interference between successive rounds of selection. Motivated by this, we dissect the interaction of sequential selection steps within a framework built on influence functions. We manage to identify a new class of second-order influences that will gradually amplify incidental bias in the replay buffer and compromise the selection process. To regularize the second-order effects, a novel selection objective is proposed, which also has clear connections to two widely adopted criteria. Furthermore, we present an efficient implementation for optimizing the proposed criterion. Experiments on multiple continual learning benchmarks demonstrate the advantage of our approach over state-of-the-art methods. Code is available at https://github.com/feifeiobama/InfluenceCL.

Efficient Uncertainty Estimation in Spiking Neural Networks via MC-dropout

  • Authors: Tao Sun, Bojian Yin, Sander Bohte
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10191
  • Pdf link: https://arxiv.org/pdf/2304.10191
  • Abstract
    Spiking neural networks (SNNs) have gained attention as models of sparse and event-driven communication of biological neurons, and as such have shown increasing promise for energy-efficient applications in neuromorphic hardware. As with classical artificial neural networks (ANNs), predictive uncertainties are important for decision making in high-stakes applications, such as autonomous vehicles, medical diagnosis, and high frequency trading. Yet, discussion of uncertainty estimation in SNNs is limited, and approaches for uncertainty estimation in ANNs are not directly applicable to SNNs. Here, we propose an efficient Monte Carlo (MC) dropout based approach for uncertainty estimation in SNNs. Our approach exploits the time-step mechanism of SNNs to enable MC-dropout in a computationally efficient manner, without introducing significant overheads during training and inference, while demonstrating high accuracy and uncertainty quality.
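
For orientation, generic MC-dropout, which the paper adapts to the time-step mechanism of SNNs, looks roughly as follows in PyTorch; this sketch is for a conventional ANN classifier, not the authors' SNN:

```python
import torch

def mc_dropout_predict(model, x, n_passes=30):
    """Keep dropout active at test time and use the spread of repeated
    stochastic forward passes as an uncertainty estimate."""
    model.train()   # keeps dropout stochastic (caution: also affects batch norm)
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1)
                             for _ in range(n_passes)])
    return probs.mean(0), probs.std(0)   # predictive mean and per-class uncertainty
```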

Selective and Collaborative Influence Function for Efficient Recommendation Unlearning

  • Authors: Yuyuan Li, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Biao Gong, Jun Wang
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.10199
  • Pdf link: https://arxiv.org/pdf/2304.10199
  • Abstract
    Recent regulations on the Right to be Forgotten have greatly influenced the way of running a recommender system, because users now have the right to withdraw their private data. Besides simply deleting the target data in the database, unlearning the associated data lineage, e.g., the learned personal features and preferences in the model, is also necessary for data withdrawal. Existing unlearning methods are mainly devised for generalized machine learning models in classification tasks. In this paper, we first identify two main disadvantages of directly applying existing unlearning methods in the context of recommendation, i.e., (i) unsatisfactory efficiency for large-scale recommendation models and (ii) destruction of collaboration across users and items. To tackle the above issues, we propose an extra-efficient recommendation unlearning method based on the Selective and Collaborative Influence Function (SCIF). Our proposed method can (i) avoid any kind of retraining, which is computationally prohibitive for large-scale systems, (ii) further enhance efficiency by selectively updating user embeddings, and (iii) preserve the collaboration across the remaining users and items. Furthermore, in order to evaluate unlearning completeness, we define a Membership Inference Oracle (MIO), which can justify whether the unlearned data points were in the training set of the model, i.e., whether a data point was completely unlearned. Extensive experiments on two benchmark datasets demonstrate that our proposed method can not only greatly enhance unlearning efficiency, but also achieve adequate unlearning completeness. More importantly, our proposed method outperforms the state-of-the-art unlearning method on comprehensive recommendation metrics.

Spiking-Fer: Spiking Neural Network for Facial Expression Recognition With Event Cameras

  • Authors: Sami Barchid, Benjamin Allaert, Amel Aissaoui, José Mennesson, Chaabane Djéraba
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10211
  • Pdf link: https://arxiv.org/pdf/2304.10211
  • Abstract
    Facial Expression Recognition (FER) is an active research domain that has shown great progress recently, notably thanks to the use of large deep learning models. However, such approaches are particularly energy intensive, which makes their deployment difficult for edge devices. To address this issue, Spiking Neural Networks (SNNs) coupled with event cameras are a promising alternative, capable of processing sparse and asynchronous events with lower energy consumption. In this paper, we establish the first use of event cameras for FER, named "Event-based FER", and propose the first related benchmarks by converting popular video FER datasets to event streams. To deal with this new task, we propose "Spiking-FER", a deep convolutional SNN model, and compare it against a similar Artificial Neural Network (ANN). Experiments show that the proposed approach achieves comparable performance to the ANN architecture, while consuming less energy by orders of magnitude (up to 65.39x). In addition, an experimental study of various event-based data augmentation techniques is performed to provide insights into the efficient transformations specific to event-based FER.

Dynamic Security Region of Natural Gas Systems in Integrated Electricity-Gas Systems

  • Authors: Han Gao, Peiyao Zhao, Zhengshuo Li
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10215
  • Pdf link: https://arxiv.org/pdf/2304.10215
  • Abstract
    In an integrated electricity-gas system (IEGS), the tight coupling of power and natural gas systems is embodied by frequent changes in gas withdrawal from gas-fired units to provide regulation services for the power system to handle uncertainty, which may in turn endanger the secure operation of the natural gas system and ultimately affect the safety of the whole IEGS. Hence, it is necessary to accurately and efficiently evaluate the dynamic security region (DSR) of the natural gas system in the IEGS by considering the real-time dynamic characteristics of natural gas systems, which are not satisfactorily handled in state-of-the-art works. To bridge this gap, this paper first conceptionally verifies the necessity of the DSR and establishes its mathematical model. Then, a dimensionality reduction method is proposed for the efficient solution and visualization of the high-dimensional DSR evaluation model. A fast evaluation (FE) algorithm is developed to address the difficulties of the nonconvex dynamic constraints in the reduced DSR model. Finally, the necessity and notable advantages of the proposed DSR model and FE are verified based on small and relatively large test systems in comparison with common security region models and algorithms. To the best of our knowledge, this is the first paper that comprehensively presents models and efficient algorithms regarding the DSR of natural gas systems in an IEGS.

An Analysis of the Completion Time of the BB84 Protocol

  • Authors: Sounak Kar, Jean-Yves Le Boudec
  • Subjects: Performance (cs.PF); Quantum Physics (quant-ph)
  • Arxiv link: https://arxiv.org/abs/2304.10218
  • Pdf link: https://arxiv.org/pdf/2304.10218
  • Abstract
    The BB84 QKD protocol is based on the idea that the sender and the receiver can reconcile a certain fraction of the teleported qubits to detect eavesdropping or noise and decode the rest to use as a private key. Under the present hardware infrastructure, decoherence of quantum states poses a significant challenge to performing perfect or efficient teleportation, meaning that a teleportation-based protocol must be run multiple times to observe success. Thus, performance analyses of such protocols usually consider the completion time, i.e., the time until success, rather than the duration of a single attempt. Moreover, due to decoherence, the success of an attempt is in general dependent on the duration of individual phases of that attempt, as quantum states must wait in memory while the success or failure of a generation phase is communicated to the relevant parties. In this work, we do a performance analysis of the completion time of the BB84 protocol in a setting where the sender and the receiver are connected via a single quantum repeater and the only quantum channel between them does not see any adversarial attack. Assuming certain distributional forms for the generation and communication phases of teleportation, we provide a method to compute the MGF of the completion time and subsequently derive an estimate of the CDF and a bound on the tail probability. This result helps us gauge the (tail) behaviour of the completion time in terms of the parameters characterising the elementary phases of teleportation, without having to run the protocol multiple times. We also provide an efficient simulation scheme to generate the completion time, which relies on expressing the completion time in terms of aggregated teleportation times. We numerically compare our approach with a full-scale simulation and observe good agreement between them.
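
A crude simulation of the aggregated-attempt view of the completion time, far simpler than the paper's MGF-based analysis; the exponential generation time, fixed classical delay, and per-attempt success probability are illustrative assumptions:

```python
import random

def completion_time(p_success, t_generation, t_classical):
    """Toy model: repeat (generation attempt + classical signalling) until
    an attempt succeeds, and return the total elapsed time."""
    t = 0.0
    while True:
        t += random.expovariate(1.0 / t_generation)  # random generation phase
        t += t_classical                             # heralding round trip
        if random.random() < p_success:
            return t

samples = [completion_time(0.1, 1e-3, 5e-4) for _ in range(10_000)]
mean = sum(samples) / len(samples)   # empirical counterpart of the MGF-based estimate
```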

PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces

  • Authors: Shuhei Watanabe, Archit Bansal, Frank Hutter
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.10255
  • Pdf link: https://arxiv.org/pdf/2304.10255
  • Abstract
    The recent rise in popularity of Hyperparameter Optimization (HPO) for deep learning has highlighted the role that good hyperparameter (HP) space design can play in training strong models. In turn, designing a good HP space is critically dependent on understanding the role of different HPs. This motivates research on HP Importance (HPI), e.g., with the popular method of functional ANOVA (f-ANOVA). However, the original f-ANOVA formulation is inapplicable to the subspaces most relevant to algorithm designers, such as those defined by top performance. To overcome this problem, we derive a novel formulation of f-ANOVA for arbitrary subspaces and propose an algorithm that uses Pearson divergence (PED) to enable a closed-form computation of HPI. We demonstrate that this new algorithm, dubbed PED-ANOVA, is able to successfully identify important HPs in different subspaces while also being extremely computationally efficient.

PREIM3D: 3D Consistent Precise Image Attribute Editing from a Single Image

  • Authors: Jianhui Li, Jianmin Li, Haoji Zhang, Shilong Liu, Zhengyi Wang, Zihao Xiao, Kaiwen Zheng, Jun Zhu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10263
  • Pdf link: https://arxiv.org/pdf/2304.10263
  • Abstract
    We study the 3D-aware image attribute editing problem in this paper, which has wide applications in practice. Recent methods solved the problem by training a shared encoder to map images into a 3D generator's latent space or by per-image latent code optimization and then edited images in the latent space. Despite their promising results near the input view, they still suffer from the 3D inconsistency of produced images at large camera poses and imprecise image attribute editing, like affecting unspecified attributes during editing. For more efficient image inversion, we train a shared encoder for all images. To alleviate 3D inconsistency at large camera poses, we propose two novel methods, an alternating training scheme and a multi-view identity loss, to maintain 3D consistency and subject identity. As for imprecise image editing, we attribute the problem to the gap between the latent space of real images and that of generated images. We compare the latent space and inversion manifold of GAN models and demonstrate that editing in the inversion manifold can achieve better results in both quantitative and qualitative evaluations. Extensive experiments show that our method produces more 3D consistent images and achieves more precise image editing than previous work. Source code and pretrained models can be found on our project page: https://mybabyyh.github.io/Preim3D/

Robust nonlinear set-point control with reinforcement learning

  • Authors: Ruoqi Zhang, Per Mattsson, Torbjörn Wigren
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10277
  • Pdf link: https://arxiv.org/pdf/2304.10277
  • Abstract
    There has recently been an increased interest in reinforcement learning for nonlinear control problems. However, standard reinforcement learning algorithms can often struggle even on seemingly simple set-point control problems. This paper argues that three ideas can improve reinforcement learning methods even for highly nonlinear set-point control problems: 1) Make use of a prior feedback controller to aid amplitude exploration. 2) Use integrated errors. 3) Train on model ensembles. Together these ideas lead to more efficient training, and a trained set-point controller that is more robust to modelling errors and thus can be directly deployed to real-world nonlinear systems. The claim is supported by experiments with a real-world nonlinear cascaded tank process and a simulated strongly nonlinear pH-control system.
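
Idea (2), integrated errors, amounts to augmenting the controller's observation with an integral of the set-point error. Below is a minimal Gym-style wrapper sketch; the environment API, the index of the controlled output, and the anti-windup clipping are all assumptions, not details from the paper.

```python
import numpy as np

class IntegratedErrorWrapper:
    """Append the clipped integral of the set-point error to the observation."""

    def __init__(self, env, setpoint, y_index=0, dt=1.0, clip=10.0):
        self.env, self.setpoint, self.y_index = env, setpoint, y_index
        self.dt, self.clip, self.ierr = dt, clip, 0.0

    def reset(self):
        self.ierr = 0.0
        obs, info = self.env.reset()
        return np.append(obs, self.ierr), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        err = self.setpoint - obs[self.y_index]                 # set-point error
        self.ierr = float(np.clip(self.ierr + self.dt * err,   # anti-windup clip
                                  -self.clip, self.clip))
        return np.append(obs, self.ierr), reward, terminated, truncated, info
```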

A baseline on continual learning methods for video action recognition

  • Authors: Giulia Castagnolo, Concetto Spampinato, Francesco Rundo, Daniela Giordano, Simone Palazzo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10335
  • Pdf link: https://arxiv.org/pdf/2304.10335
  • Abstract
    Continual learning has recently attracted attention from the research community, as it aims to solve long-standing limitations of classic supervised models. However, most research on this subject has tackled continual learning in simple image classification scenarios. In this paper, we present a benchmark of state-of-the-art continual learning methods on video action recognition. Besides the increased complexity due to the temporal dimension, the video setting imposes stronger requirements on computing resources for top-performing rehearsal methods. To counteract the increased memory requirements, we present two method-agnostic variants for rehearsal methods, exploiting measures of either model confidence or data information to select memorable samples. Our experiments show that, as expected from the literature, rehearsal methods outperform other approaches; moreover, the proposed memory-efficient variants are shown to be effective at retaining a certain level of performance with a smaller buffer size.

Engel's theorem in Mathlib

  • Authors: Oliver Nash
  • Subjects: Logic in Computer Science (cs.LO); Representation Theory (math.RT)
  • Arxiv link: https://arxiv.org/abs/2304.10424
  • Pdf link: https://arxiv.org/pdf/2304.10424
  • Abstract
    We discuss the theory of Lie algebras in Lean's Mathlib library. Using nilpotency as the theme, we outline a computer formalisation of Engel's theorem and an application to root space theory. We emphasise that all arguments work with coefficients in any commutative ring.

GPT-NER: Named Entity Recognition via Large Language Models

  • Authors: Shuhe Wang, Xiaofei Sun, Xiaoya Li, Rongbin Ouyang, Fei Wu, Tianwei Zhang, Jiwei Li, Guoyin Wang
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.10428
  • Pdf link: https://arxiv.org/pdf/2304.10428
  • Abstract
    Despite the fact that large-scale Language Models (LLMs) have achieved SOTA performance on a variety of NLP tasks, their performance on NER is still significantly below supervised baselines. This is due to the gap between the two tasks: NER is a sequence labeling task in nature, while LLMs are text-generation models. In this paper, we propose GPT-NER to resolve this issue. GPT-NER bridges the gap by transforming the sequence labeling task into a generation task that can be easily adapted by LLMs, e.g., the task of finding location entities in the input text "Columbus is a city" is transformed to generate the text sequence "@@columbus## is a city", where the special tokens @@## mark the entity to extract. To efficiently address the "hallucination" issue of LLMs, where LLMs have a strong inclination to over-confidently label NULL inputs as entities, we propose a self-verification strategy that prompts LLMs to ask themselves whether the extracted entities belong to a labeled entity tag. We conduct experiments on five widely adopted NER datasets, and GPT-NER achieves performance comparable to fully supervised baselines, which, to our knowledge, is the first such result. More importantly, we find that GPT-NER exhibits a greater ability in low-resource and few-shot setups: when the amount of training data is extremely scarce, GPT-NER performs significantly better than supervised models. This demonstrates the capabilities of GPT-NER in real-world NER applications where the number of labeled examples is limited.
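
The @@...## marking scheme quoted in the abstract is simple to reproduce. Below is a minimal sketch of the two directions of the transformation (building marked demonstrations and parsing model output back into entities); the helper names are ours, and the actual prompting and self-verification steps are omitted.

```python
# Minimal sketch of the GPT-NER-style label transformation: wrap gold entity
# spans in @@...## for demonstrations, and recover marked spans from output.
import re

def mark_entities(text: str, entities: list[tuple[int, int]]) -> str:
    """Wrap each (start, end) character span in @@...## markers."""
    out, prev = [], 0
    for start, end in sorted(entities):
        out.append(text[prev:start])
        out.append("@@" + text[start:end] + "##")
        prev = end
    out.append(text[prev:])
    return "".join(out)

def extract_entities(marked: str) -> list[str]:
    """Recover the surface forms the model marked with @@...##."""
    return re.findall(r"@@(.+?)##", marked)

demo = mark_entities("Columbus is a city", [(0, 8)])
print(demo)                    # @@Columbus## is a city
print(extract_entities(demo))  # ['Columbus']
```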

Securing Neural Networks with Knapsack Optimization

  • Authors: Yakir Gorski, Shai Avidan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10442
  • Pdf link: https://arxiv.org/pdf/2304.10442
  • Abstract
    Deep learning inference brings together the data and the Convolutional Neural Network (CNN). This is problematic in case the user wants to preserve the privacy of the data and the service provider does not want to reveal the weights of his CNN. Secure Inference allows the two parties to engage in a protocol that preserves their respective privacy concerns, while revealing only the inference result to the user. This is known as Multi-Party Computation (MPC). A major bottleneck of MPC algorithms is communication, as the parties must send data back and forth. The linear component of a CNN (i.e. convolutions) can be done efficiently with minimal communication, but the non-linear part (i.e., ReLU) requires the bulk of communication bandwidth. We propose two ways to accelerate Secure Inference. The first is based on the observation that the ReLU outcome of many convolutions is highly correlated. Therefore, we replace the per pixel ReLU operation by a ReLU operation per patch. Each layer in the network will benefit from a patch of a different size and we devise an algorithm to choose the optimal set of patch sizes through a novel reduction of the problem to a knapsack problem. The second way to accelerate Secure Inference is based on cutting the number of bit comparisons required for a secure ReLU operation. We demonstrate the cumulative effect of these tools in the semi-honest secure 3-party setting for four problems: Classifying ImageNet using ResNet50 backbone, classifying CIFAR100 using ResNet18 backbone, semantic segmentation of ADE20K using MobileNetV2 backbone and semantic segmentation of Pascal VOC 2012 using ResNet50 backbone. Our source code is publicly available: https://github.com/yg320/secure_inference
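
The patch-size selection can be phrased as a multiple-choice knapsack: pick exactly one patch size per layer so as to maximize an estimated accuracy value subject to a total ReLU-communication budget. A minimal dynamic-programming sketch, with illustrative (made-up) per-layer costs and values:

```python
# Minimal multiple-choice knapsack sketch: options[i] lists the candidate
# (patch_size, comm_cost, est_accuracy) triples for layer i; costs are ints.
def choose_patch_sizes(options, budget):
    best = {0: 0.0}                 # total cost -> best total value so far
    back = {}                       # (layer, cost) -> (patch_size, prev_cost)
    for i, layer_opts in enumerate(options):
        new = {}
        for c, v in best.items():
            for size, cost, value in layer_opts:
                nc = c + cost
                if nc <= budget and v + value > new.get(nc, float("-inf")):
                    new[nc] = v + value
                    back[(i, nc)] = (size, c)
        best = new
    c = max(best, key=best.get)     # cost level with the best total value
    total, sizes = best[c], []
    for i in reversed(range(len(options))):
        size, c = back[(i, c)]
        sizes.append(size)
    return sizes[::-1], total

# Illustrative costs/values (made up): smaller patches cost more communication.
opts = [[(1, 100, 0.99), (2, 25, 0.97), (4, 7, 0.90)],   # layer 0
        [(1, 100, 0.99), (2, 25, 0.96)]]                 # layer 1
print(choose_patch_sizes(opts, budget=50))               # -> ([2, 2], ~1.93)
```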

Angle based dynamic learning rate for gradient descent

  • Authors: Neel Mishra, Pawan Kumar
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10457
  • Pdf link: https://arxiv.org/pdf/2304.10457
  • Abstract
    In our work, we propose a novel yet simple approach to obtain an adaptive learning rate for gradient-based descent methods on classification tasks. Instead of the traditional approach of selecting adaptive learning rates via the decayed expectation of gradient-based terms, we use the angle between the current gradient and the new gradient: this new gradient is computed from the direction orthogonal to the current gradient, which further helps us in determining a better adaptive learning rate based on angle history, thereby leading to relatively better accuracy compared to the existing state-of-the-art optimizers. On a wide variety of benchmark datasets with prominent image classification architectures such as ResNet, DenseNet, EfficientNet, and VGG, we find that our method leads to the highest accuracy in most of the datasets. Moreover, we prove that our method is convergent.

Reducing Aggregate Electric Vehicle Battery Capacity through Sharing

  • Authors: Polina Alexeenko, Vasileios Charisopoulos
  • Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.10461
  • Pdf link: https://arxiv.org/pdf/2304.10461
  • Abstract
    Meeting growing demand for automotive battery resources is predicted to be costly from both economic and environmental perspectives. To minimize these costs, battery resources should be deployed as efficiently as possible. A potential source of inefficiency in battery deployment is the fact that the batteries of personal vehicles are typically much larger than needed to meet most daily mobility needs. In this paper, we consider whether battery resources can be used more efficiently in a setting where drivers, in addition to having personal vehicle batteries, have access to a shared battery resource. More precisely, we consider the problem of minimizing aggregate battery capacity in settings with and without a shared resource subject to the requirement that driver commuting needs are met with high reliability. To assess the potential for reductions in deployed battery capacity with the addition of a shared resource, we quantify the difference in deployed battery capacity with and without a shared resource in a case study using real-world longitudinal mobility data from Puget Sound, Washington. We find that giving drivers access to a shared battery resource can substantially reduce deployed battery capacity. Furthermore, relative reductions in battery capacity increase with the number of drivers and the desired level of reliability.
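
A toy numerical illustration of the mechanism (synthetic lognormal daily energy needs, not the Puget Sound data, and a simplified reliability criterion): sizing each battery at its driver's own high quantile costs far more aggregate capacity than small personal batteries plus one shared pool sized on the aggregate overflow.

```python
# Toy illustration only: the distributions, quantiles, and reliability
# criterion below are assumptions, not the paper's model.
import numpy as np

rng = np.random.default_rng(1)
days, drivers = 10_000, 50
need = rng.lognormal(mean=2.0, sigma=0.5, size=(days, drivers))  # kWh/day

# No sharing: every battery covers its driver's 99th-percentile day.
solo = np.quantile(need, 0.99, axis=0)

# Sharing: small personal batteries + one pool covering 99% of overflow days.
personal = np.quantile(need, 0.80, axis=0)
overflow = np.clip(need - personal, 0.0, None).sum(axis=1)
shared = np.quantile(overflow, 0.99)

print(round(solo.sum(), 1), round(personal.sum() + shared, 1))
# the shared total should come out noticeably smaller
```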

Efficient Deep Reinforcement Learning Requires Regulating Overfitting

  • Authors: Qiyang Li, Aviral Kumar, Ilya Kostrikov, Sergey Levine
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.10466
  • Pdf link: https://arxiv.org/pdf/2304.10466
  • Abstract
    Deep reinforcement learning algorithms that learn policies by trial-and-error must learn from limited amounts of data collected by actively interacting with the environment. While many prior works have shown that proper regularization techniques are crucial for enabling data-efficient RL, a general understanding of the bottlenecks in data-efficient RL has remained unclear. Consequently, it has been difficult to devise a universal technique that works well across all domains. In this paper, we attempt to understand the primary bottleneck in sample-efficient deep RL by examining several potential hypotheses such as non-stationarity, excessive action distribution shift, and overfitting. We perform thorough empirical analysis on state-based DeepMind control suite (DMC) tasks in a controlled and systematic way to show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms, and prior methods that lead to good performance do, in fact, control the validation TD error to be low. This observation gives us a robust principle for making deep RL efficient: we can hill-climb on the validation TD error by utilizing any form of regularization techniques from supervised learning. We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
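
The quantity the paper proposes to hill-climb on is straightforward to monitor. A minimal sketch of a validation TD error for a discrete-action Q-function; the callable interface and the toy data are our assumptions.

```python
# Minimal sketch: mean squared one-step Bellman residual on held-out
# transitions, usable as a model-selection signal.
import numpy as np

def validation_td_error(q, actions, transitions, gamma=0.99):
    """q(s, a) -> scalar; transitions: iterable of (s, a, r, s_next, done)."""
    errs = []
    for s, a, r, s_next, done in transitions:
        target = r if done else r + gamma * max(q(s_next, b) for b in actions)
        errs.append((q(s, a) - target) ** 2)
    return float(np.mean(errs))

# Toy usage with a dict-backed Q-function (hypothetical).
q_table = {}
q = lambda s, a: q_table.get((s, a), 0.0)
held_out = [(0, 0, 1.0, 1, False), (1, 1, 0.0, 1, True)]
print(validation_td_error(q, actions=[0, 1], transitions=held_out))  # 0.5
```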

A primal dual mixed finite element method for inverse identification of the diffusion coefficient and its relation to the Kohn-Vogelius penalty method

  • Authors: Erik Burman
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.10467
  • Pdf link: https://arxiv.org/pdf/2304.10467
  • Abstract
    We revisit the celebrated Kohn-Vogelius penalty method and discuss how to use it for the unique continuation problem where data is given in the bulk of the domain. We then show that the primal-dual mixed finite element methods for the elliptic Cauchy problem introduced in \cite{BLO18} (\emph{E. Burman, M. Larson, L. Oksanen, Primal-dual mixed finite element methods for the elliptic Cauchy problem, SIAM J. Num. Anal., 56 (6), 2018}) can be interpreted as a Kohn-Vogelius penalty method and modify it to allow for unique continuation using data in the bulk. We prove that the resulting linear system is invertible for all data. Then we show that by introducing a singularly perturbed Robin condition on the discrete level sufficient regularization is obtained so that error estimates can be shown using conditional stability. Finally we show how the method can be used for the identification of the diffusivity coefficient in a second order elliptic operator with partial data. Some numerical examples are presented showing the performance of the method for unique continuation and for impedance computed tomography with partial data.

New Closed-Form ASER Expressions for Dual-Hop Mixed THz-RF Cooperative Relay Networks

  • Authors: Soumendu Das, Nagendra Kumar, Dharmendra Dixit
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.10504
  • Pdf link: https://arxiv.org/pdf/2304.10504
  • Abstract
    In this paper, we consider a dual-hop mixed THz-RF system model for backhaul-fronthaul applications where the link between source and destination is established only through the relay node, in which the decode-and-forward relaying protocol is used. The THz link suffers from the joint impact of antenna misalignment and stochastic characteristics of wireless channels, including the effect of environmental conditions such as pressure, humidity, and temperature. The envelope of the THz link in the first hop follows a generalized $\alpha-\mu$ distribution, and for the RF end, the Nakagami-$m$ distribution is considered. In this context, we obtain new closed-form expressions of the cumulative density function and the moment-generating function of the end-to-end signal-to-noise ratio. Further, we derive the average symbol error rate expressions for coherent rectangular quadrature amplitude modulation (RQAM) and coherent hexagonal QAM (HQAM), as well as the non-coherent modulation scheme. The asymptotic behavior is also discussed to examine the system's diversity. Furthermore, the impact of several parameters, such as fading coefficients of individual links and antenna misalignment, as well as the distance between nodes, are also highlighted in the system's performance. Moreover, Monte Carlo simulations are used to validate the presented analytical framework. Finally, the presented numerical insights aid in the extraction of practical design principles.

Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget

  • Authors: Johannes Lehner, Benedikt Alkin, Andreas Fürst, Elisabeth Rumetshofer, Lukas Miklautz, Sepp Hochreiter
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10520
  • Pdf link: https://arxiv.org/pdf/2304.10520
  • Abstract
    Masked Image Modeling (MIM) methods, like Masked Autoencoders (MAE), efficiently learn a rich representation of the input. However, for adapting to downstream tasks, they require a sufficient amount of labeled data since their rich features capture not only objects but also less relevant image background. In contrast, Instance Discrimination (ID) methods focus on objects. In this work, we study how to combine the efficiency and scalability of MIM with the ability of ID to perform downstream classification in the absence of large amounts of labeled data. To this end, we introduce Masked Autoencoder Contrastive Tuning (MAE-CT), a sequential approach that applies Nearest Neighbor Contrastive Learning (NNCLR) to a pre-trained MAE. MAE-CT tunes the rich features such that they form semantic clusters of objects without using any labels. Applied to large and huge Vision Transformer (ViT) models, MAE-CT matches or exceeds previous self-supervised methods trained on ImageNet in linear probing, k-NN and low-shot classification accuracy as well as in unsupervised clustering accuracy. Notably, similar results can be achieved without additional image augmentations. While ID methods generally rely on hand-crafted augmentations to avoid shortcut learning, we find that nearest neighbor lookup is sufficient and that this data-driven augmentation effect improves with model size. MAE-CT is compute-efficient. For instance, starting from a MAE pre-trained ViT-L/16, MAE-CT increases the ImageNet 1% low-shot accuracy from 67.7% to 72.6%, linear probing accuracy from 76.0% to 80.2% and k-NN accuracy from 60.6% to 79.1% in just five hours using eight A100 GPUs.

Learning Narrow One-Hidden-Layer ReLU Networks

  • Authors: Sitan Chen, Zehao Dou, Surbhi Goel, Adam R Klivans, Raghu Meka
  • Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.10524
  • Pdf link: https://arxiv.org/pdf/2304.10524
  • Abstract
    We consider the well-studied problem of learning a linear combination of $k$ ReLU activations with respect to a Gaussian distribution on inputs in $d$ dimensions. We give the first polynomial-time algorithm that succeeds whenever $k$ is a constant. All prior polynomial-time learners require additional assumptions on the network, such as positive combining coefficients or the matrix of hidden weight vectors being well-conditioned. Our approach is based on analyzing random contractions of higher-order moment tensors. We use a multi-scale analysis to argue that sufficiently close neurons can be collapsed together, sidestepping the conditioning issues present in prior work. This allows us to design an iterative procedure to discover individual neurons.

Learning Sparse and Low-Rank Priors for Image Recovery via Iterative Reweighted Least Squares Minimization

  • Authors: Stamatios Lefkimmiatis, Iaroslav Koshelev
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.10536
  • Pdf link: https://arxiv.org/pdf/2304.10536
  • Abstract
    We introduce a novel optimization algorithm for image recovery under learned sparse and low-rank constraints, which we parameterize as weighted extensions of the $\ell_p^p$-vector and $\mathcal S_p^p$ Schatten-matrix quasi-norms for $0 < p \le 1$, respectively. Our proposed algorithm generalizes the Iteratively Reweighted Least Squares (IRLS) method, used for signal recovery under $\ell_1$ and nuclear-norm constrained minimization. Further, we interpret our overall minimization approach as a recurrent network that we then employ to deal with inverse low-level computer vision problems. Thanks to the convergence guarantees that our IRLS strategy offers, we are able to train the derived reconstruction networks using a memory-efficient implicit back-propagation scheme, which does not pose any restrictions on their effective depth. To assess our networks' performance, we compare them against other existing reconstruction methods on several inverse problems, namely image deblurring, super-resolution, demosaicking and sparse recovery. Our reconstruction results are shown to be very competitive and in many cases outperform those of existing unrolled networks, whose number of parameters is orders of magnitude higher than that of our learned models.
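
For reference, the classical (unweighted) IRLS scheme that the paper generalizes: the $\ell_p^p$ penalty is repeatedly majorized by a weighted quadratic, so each iteration reduces to a linear solve. A minimal NumPy sketch under standard smoothing assumptions (the `eps` floor and the parameter values are ours):

```python
# Minimal IRLS sketch for min_x ||y - A x||^2 + lam * sum_i |x_i|^p, 0 < p <= 1.
import numpy as np

def irls_lp(A, y, p=0.8, lam=1e-2, iters=50, eps=1e-8):
    x = np.linalg.lstsq(A, y, rcond=None)[0]
    for _ in range(iters):
        # Quadratic majorizer of |x_i|^p around the current iterate.
        w = (p / 2.0) * (x**2 + eps) ** (p / 2.0 - 1.0)
        x = np.linalg.solve(A.T @ A + lam * np.diag(w), A.T @ y)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[3, 30, 77]] = [1.5, -2.0, 0.8]
x_hat = irls_lp(A, A @ x_true)
print(np.round(x_hat[[3, 30, 77]], 2))   # should land near 1.5, -2.0, 0.8
```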

Learning Neural Duplex Radiance Fields for Real-Time View Synthesis

  • Authors: Ziyu Wan, Christian Richardt, Aljaž Božič, Chao Li, Vijay Rengarajan, Seonghyeon Nam, Xiaoyu Xiang, Tuotuo Li, Bo Zhu, Rakesh Ranjan, Jing Liao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.10537
  • Pdf link: https://arxiv.org/pdf/2304.10537
  • Abstract
    Neural radiance fields (NeRFs) enable novel view synthesis with unprecedented visual quality. However, to render photorealistic images, NeRFs require hundreds of deep multilayer perceptron (MLP) evaluations - for each pixel. This is prohibitively expensive and makes real-time rendering infeasible, even on powerful modern GPUs. In this paper, we propose a novel approach to distill and bake NeRFs into highly efficient mesh-based neural representations that are fully compatible with the massively parallel graphics rendering pipeline. We represent scenes as neural radiance features encoded on a two-layer duplex mesh, which effectively overcomes the inherent inaccuracies in 3D surface reconstruction by learning the aggregated radiance information from a reliable interval of ray-surface intersections. To exploit local geometric relationships of nearby pixels, we leverage screen-space convolutions instead of the MLPs used in NeRFs to achieve high-quality appearance. Finally, the performance of the whole framework is further boosted by a novel multi-view distillation optimization strategy. We demonstrate the effectiveness and superiority of our approach via extensive experiments on a range of standard datasets.

Keyword: faster

An Intent-based Framework for Vehicular Edge Computing

  • Authors: TianZhang He, Adel N. Toosi, Negin Akbari, Muhammed Tawfiqul Islam, Muhammad Aamir Cheema
  • Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.09916
  • Pdf link: https://arxiv.org/pdf/2304.09916
  • Abstract
    The rapid development of emerging vehicular edge computing (VEC) brings new opportunities and challenges for dynamic resource management. The increasing number of edge data centers, roadside units (RSUs), and network devices, however, makes resource management a complex task in VEC. On the other hand, the exponential growth of service applications and end-users makes corresponding QoS hard to maintain. Intent-Based Networking (IBN), based on Software-Defined Networking, was introduced to provide the ability to automatically handle and manage the networking requirements of different applications. Motivated by the IBN concept, in this paper, we propose a novel approach to jointly orchestrate networking and computing resources based on user requirements. The proposed solution constantly monitors user requirements and dynamically re-configures the system to satisfy desired states of the application. We compared our proposed solution with the state-of-the-art networking embedding algorithms using real-world taxi GPS traces. Results show that our proposed method is significantly faster (up to 95%) and can improve resource utilization (up to 76%) and the acceptance ratio of computing and networking requests with various priorities (up to 71%). We also present a small-scale prototype of the proposed intent management framework to validate our solution.

Speed Me up if You Can: Conditional Lower Bounds on Opacity Verification

  • Authors: Jiří Balun, Tomáš Masopust, Petr Osička
  • Subjects: Formal Languages and Automata Theory (cs.FL); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.09920
  • Pdf link: https://arxiv.org/pdf/2304.09920
  • Abstract
    Opacity is a property of privacy and security applications asking whether, given a system model, a passive intruder that makes online observations of system's behaviour can ascertain some "secret" information of the system. Deciding opacity is a PSpace-complete problem, and hence there are no polynomial-time algorithms to verify opacity under the assumption that PSpace differs from PTime. This assumption, however, gives rise to a question whether the existing exponential-time algorithms are the best possible or whether there are faster, sub-exponential-time algorithms. We show that under the (Strong) Exponential Time Hypothesis, there are no algorithms that would be significantly faster than the existing algorithms. As a by-product, we obtained a new conditional lower bound on the time complexity of deciding universality (and therefore also inclusion and equivalence) for nondeterministic finite automata.

Two-Memory Reinforcement Learning

  • Authors: Zhao Yang, Thomas. M. Moerland, Mike Preuss, Aske Plaat
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10098
  • Pdf link: https://arxiv.org/pdf/2304.10098
  • Abstract
    While deep reinforcement learning has shown important empirical success, it tends to learn relatively slowly due to the slow propagation of reward information and the slow update of parametric neural networks. Non-parametric episodic memory, on the other hand, provides a faster learning alternative that does not require representation learning and uses maximum episodic return as state-action values for action selection. Episodic memory and reinforcement learning both have their own strengths and weaknesses. Notably, humans can leverage multiple memory systems concurrently during learning and benefit from all of them. In this work, we propose a method called Two-Memory reinforcement learning agent (2M) that combines episodic memory and reinforcement learning to draw on the strengths of both. The 2M agent exploits the speed of the episodic memory part and the optimality and the generalization capacity of the reinforcement learning part to complement each other. Our experiments demonstrate that the 2M agent is more data efficient and outperforms both pure episodic memory and pure reinforcement learning, as well as a state-of-the-art memory-augmented RL agent. Moreover, the proposed approach provides a general framework that can be used to combine any episodic memory agent with other off-policy reinforcement learning algorithms.

ZEBRA: Z-order Curve-based Event Retrieval Approach to Efficiently Explore Automotive Data

  • Authors: Christian Berger, Lukas Birkemeyer
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.10232
  • Pdf link: https://arxiv.org/pdf/2304.10232
  • Abstract
    Evaluating the performance of software for automated vehicles is predominantly driven by data collected from the real world. While professional test drivers are supported with technical means to semi-automatically annotate driving maneuvers to allow better event identification, simple data loggers in large vehicle fleets typically lack automatic and detailed event classification and hence, extra effort is needed when post-processing such data. Yet, the data quality from professional test drivers is apparently higher than the one from large fleets where labels are missing, but the non-annotated data set from large vehicle fleets is much more representative for typical, realistic driving scenarios to be handled by automated vehicles. However, while growing the data from large fleets is relatively simple, adding valuable annotations during post-processing has become increasingly expensive. In this paper, we leverage Z-order space-filling curves to systematically reduce data dimensionality while preserving domain-specific data properties, which allows us to explore even large-scale field data sets to spot interesting events orders of magnitude faster than processing time-series data directly. Furthermore, the proposed concept is based on an analytical approach, which preserves explainability for the identified events.
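
The core primitive here is Morton (Z-order) encoding: interleaving the bits of quantized signal dimensions so that nearby multi-dimensional samples land near each other on a single curve that can be range-scanned quickly. A minimal sketch; the 16-bit quantization and the choice of signals are assumptions, not details from the paper.

```python
# Minimal Morton / Z-order encoding sketch over quantized non-negative ints.
def z_order(values, bits=16):
    """Interleave the bits of several non-negative ints into one Morton code."""
    code = 0
    d = len(values)
    for bit in range(bits):
        for i, v in enumerate(values):
            code |= ((v >> bit) & 1) << (bit * d + i)
    return code

# Two nearby 3-D samples (e.g., speed, acceleration, steering) get nearby codes.
print(z_order([5, 9, 3]), z_order([5, 9, 4]))
```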

Observer-Feedback-Feedforward Controller Structures in Reinforcement Learning

  • Authors: Ruoqi Zhang, Per Mattson, Torbjörn Wigren
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10276
  • Pdf link: https://arxiv.org/pdf/2304.10276
  • Abstract
    The paper proposes the use of structured neural networks for reinforcement learning based nonlinear adaptive control. The focus is on partially observable systems, with separate neural networks for the state and feedforward observer and the state feedback and feedforward controller. The observer dynamics are modelled by recurrent neural networks while a standard network is used for the controller. As discussed in the paper, this leads to a separation of the observer dynamics to the recurrent neural network part, and the state feedback to the feedback and feedforward network. The structured approach reduces the computational complexity and gives the reinforcement learning based controller an *understandable* structure as compared to when one single neural network is used. As shown by simulation, the proposed structure has the additional and main advantage that the training becomes significantly faster. Two ways to include feedforward structure are presented, one related to state feedback control and one related to classical feedforward control. The latter method introduces further structure with a separate recurrent neural network that processes only the measured disturbance. When evaluated with simulation on a nonlinear cascaded double tank process, the method with most structure performs the best, with excellent feedforward disturbance rejection gains.

Regret-Minimizing Double Oracle for Extensive-Form Games

  • Authors: Xiaohang Tang, Le Cong Dinh, Stephen Marcus McAleer, Yaodong Yang
  • Subjects: Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2304.10498
  • Pdf link: https://arxiv.org/pdf/2304.10498
  • Abstract
    By incorporating regret minimization, double oracle methods have demonstrated rapid convergence to Nash Equilibrium (NE) in normal-form games and extensive-form games, through algorithms such as online double oracle (ODO) and extensive-form double oracle (XDO), respectively. In this study, we further examine the theoretical convergence rate and sample complexity of such regret minimization-based double oracle methods, utilizing a unified framework called Regret-Minimizing Double Oracle. Based on this framework, we extend ODO to extensive-form games and determine its sample complexity. Moreover, we demonstrate that the sample complexity of XDO can be exponential in the number of information sets $|S|$, owing to the exponentially decaying stopping threshold of restricted games. To solve this problem, we propose the Periodic Double Oracle (PDO) method, which has the lowest sample complexity among all existing double oracle methods, being only polynomial in $|S|$. Empirical evaluations on multiple poker and board games show that PDO achieves significantly faster convergence than previous double oracle algorithms and reaches a competitive level with state-of-the-art regret minimization methods.

Transformer Models for Type Inference in the Simply Typed Lambda Calculus: A Case Study in Deep Learning for Code

  • Authors: Brando Miranda, Avi Shinnar, Vasily Pestun, Barry Trager
  • Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO); Symbolic Computation (cs.SC)
  • Arxiv link: https://arxiv.org/abs/2304.10500
  • Pdf link: https://arxiv.org/pdf/2304.10500
  • Abstract
    Despite a growing body of work at the intersection of deep learning and formal languages, there has been relatively little systematic exploration of transformer models for reasoning about typed lambda calculi. This is an interesting area of inquiry for two reasons. First, typed lambda calculi are the lingua franca of programming languages. A set of heuristics that relate various typed lambda calculi to effective neural architectures would provide a systematic method for mapping language features (e.g., polymorphism, subtyping, inheritance, etc.) to architecture choices. Second, transformer models are widely used in deep learning architectures applied to code, but the design and hyperparameter space for them is large and relatively unexplored in programming language applications. Therefore, we suggest a benchmark that allows us to explore exactly this through perhaps the simplest and most fundamental property of a programming language: the relationship between terms and types. Consequently, we begin this inquiry of transformer architectures for typed lambda calculi by exploring the effect of transformer warm-up and optimizer selection in the task of type inference: i.e., predicting the types of lambda calculus terms using only transformers. We find that the optimization landscape is difficult even in this simple setting. One particular experimental finding is that optimization by Adafactor converges much faster compared to the optimization by Adam and RAdam. We conjecture that such different performance of optimizers might be related to the difficulties of generalization over a formally generated dataset.

Autonomic Architecture for Big Data Performance Optimization

  • Authors: Mikhail Genkin, Frank Dehne, Anousheh Shahmirza, Pablo Navarro, Siyu Zhou
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10503
  • Pdf link: https://arxiv.org/pdf/2304.10503
  • Abstract
    The big data software stack based on Apache Spark and Hadoop has become mission critical in many enterprises. Performance of Spark and Hadoop jobs depends on a large number of configuration settings. Manual tuning is expensive and brittle. There have been prior efforts to develop on-line and off-line automatic tuning approaches to make the big data stack less dependent on manual tuning. These, however, demonstrated only modest performance improvements with very simple, single-user workloads on small data sets. This paper presents KERMIT - the autonomic architecture for big data capable of automatically tuning Apache Spark and Hadoop on-line, and achieving performance results 30% faster than rule-of-thumb tuning by a human administrator and up to 92% as fast as the fastest possible tuning established by performing an exhaustive search of the tuning parameter space. KERMIT can detect important workload changes with up to 99% accuracy, and predict future workload types with up to 96% accuracy. It is capable of identifying and classifying complex multi-user workloads without being explicitly trained on examples of these workloads. It does not rely on the past workload history to predict the future workload classes and their associated performance. KERMIT can identify and learn new workload classes, and adapt to workload drift, without human intervention.

Generalizing Neural Human Fitting to Unseen Poses With Articulated SE(3) Equivariance

  • Authors: Haiwen Feng, Peter Kulits, Shichen Liu, Michael J. Black, Victoria Abrevaya
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10528
  • Pdf link: https://arxiv.org/pdf/2304.10528
  • Abstract
    We address the problem of fitting a parametric human body model (SMPL) to point cloud data. Optimization-based methods require careful initialization and are prone to becoming trapped in local optima. Learning-based methods address this but do not generalize well when the input pose is far from those seen during training. For rigid point clouds, remarkable generalization has been achieved by leveraging SE(3)-equivariant networks, but these methods do not work on articulated objects. In this work we extend this idea to human bodies and propose ArtEq, a novel part-based SE(3)-equivariant neural architecture for SMPL model estimation from point clouds. Specifically, we learn a part detection network by leveraging local SO(3) invariance, and regress shape and pose using articulated SE(3) shape-invariant and pose-equivariant networks, all trained end-to-end. Our novel equivariant pose regression module leverages the permutation-equivariant property of self-attention layers to preserve rotational equivariance. Experimental results show that ArtEq can generalize to poses not seen during training, outperforming state-of-the-art methods by 74.5%, without requiring an optimization refinement step. Further, compared with competing works, our method is more than three orders of magnitude faster during inference and has 97.3% fewer parameters. The code and model will be available for research purposes at https://arteq.is.tue.mpg.de.

Keyword: mobile

NRTS: A Client-Server architecture for supporting data recording, transmission and evaluation of multidisciplinary teams during the neonatal resuscitation simulation scenario

  • Authors: Manuel Striani
  • Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.09860
  • Pdf link: https://arxiv.org/pdf/2304.09860
  • Abstract
    In this technical report, we describe Neonatal Resuscitation Training Simulator (NRTS), an Android mobile app designed to support medical experts to input, transmit and record data during a High-Fidelity Simulation course for neonatal resuscitation. This mobile app allows one to automatically send all the recorded data from "Neonatal Intensive Care Unit" (NICU) of Casale Monferrato Children's Hospital, (Italy) to a server located at the Department of Science and Technological Innovation (DiSIT), University of Piemonte Orientale (Italy). Finally, the medical instructor can view statistics on a simulation exercise that may be used during the de-briefing phase for the evaluation of multidisciplinary teams involved in the simulation scenarios.

Scheduling DNNs on Edge Servers

  • Authors: Jian He, Chenxi Yang, Zhaoyuan He, Ghufran Baig, Lili Qiu
  • Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09961
  • Pdf link: https://arxiv.org/pdf/2304.09961
  • Abstract
    Deep neural networks (DNNs) have been widely used in various video analytic tasks. These tasks demand real-time responses. Due to the limited processing power on mobile devices, a common way to support such real-time analytics is to offload the processing to an edge server. This paper examines how to speed up the edge server DNN processing for multiple clients. In particular, we observe batching multiple DNN requests significantly speeds up the processing time. Based on this observation, we first design a novel scheduling algorithm to exploit the batching benefits of all requests that run the same DNN. This is compelling since there are only a handful of DNNs and many requests tend to use the same DNN. Our algorithms are general and can support different objectives, such as minimizing the completion time or maximizing the on-time ratio. We then extend our algorithm to handle requests that use different DNNs with or without shared layers. Finally, we develop a collaborative approach to further improve performance by adaptively processing some of the requests or portions of the requests locally at the clients. This is especially useful when the network and/or server is congested. Our implementation shows the effectiveness of our approach under different request distributions (e.g., Poisson, Pareto, and Constant inter-arrivals).

Availability Model of a 5G-MEC System

  • Authors: Thilina Pathirana, Gianfranco Nencioni
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.09992
  • Pdf link: https://arxiv.org/pdf/2304.09992
  • Abstract
    Multi-access Edge Computing (MEC) is one of the enabling technologies of the fifth generation (5G) of mobile networks. MEC enables services with strict latency requirements by bringing computing capabilities close to the users. As with any new technology, the dependability of MEC is one of the aspects that need to be carefully studied. In this paper, we propose a two-level model to compute the availability of a 5G-MEC system. We then use the model to evaluate the availability of a 5G-MEC system under various configurations. The results show that having a single redundancy of the 5G-MEC elements leads to an acceptable availability. To reach a high availability, the software failure intensity of the management elements of 5G and MEC should be reduced.
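
The availability algebra a two-level model of this kind builds on is standard: per-element steady-state availability from failure and repair intensities, combined in series across required elements and in parallel across redundant replicas. A minimal sketch with illustrative numbers, not the paper's parameters:

```python
# Minimal availability-algebra sketch (series/parallel of repairable elements).
def avail(lam, mu):
    """Steady-state availability of one element: MTTF / (MTTF + MTTR)."""
    return mu / (lam + mu)

def series(*a):        # all elements must be up
    out = 1.0
    for x in a:
        out *= x
    return out

def parallel(a, n):    # at least one of n identical replicas must be up
    return 1.0 - (1.0 - a) ** n

gnb = avail(lam=1 / 2000, mu=1 / 4)   # radio node; rates per hour (illustrative)
mec = avail(lam=1 / 1500, mu=1 / 6)   # MEC host (illustrative)
print(series(gnb, mec))                              # no redundancy
print(series(parallel(gnb, 2), parallel(mec, 2)))    # single redundancy
```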

Robust Route Planning with Distributional Reinforcement Learning in a Stochastic Road Network Environment

  • Authors: Xi Lin, Paul Szenher, John D. Martin, Brendan Englot
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.09996
  • Pdf link: https://arxiv.org/pdf/2304.09996
  • Abstract
    Route planning is essential to mobile robot navigation problems. In recent years, deep reinforcement learning (DRL) has been applied to learning optimal planning policies in stochastic environments without prior knowledge. However, existing works focus on learning policies that maximize the expected return, the performance of which can vary greatly when the level of stochasticity in the environment is high. In this work, we propose a distributional reinforcement learning based framework that learns return distributions which explicitly reflect environmental stochasticity. Policies based on the second-order stochastic dominance (SSD) relation can be used to make adjustable route decisions according to user preference on performance robustness. Our proposed method is evaluated in a simulated road network environment, and experimental results show that our method is able to plan the shortest routes that minimize stochasticity in travel time when robustness is preferred, while other state-of-the-art DRL methods are agnostic to environmental stochasticity.
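
The SSD relation itself is easy to test on samples of a learned return distribution: route A second-order stochastically dominates route B when A's expected shortfall below every threshold is no larger than B's. A minimal sketch with illustrative return samples (not the paper's environment):

```python
# Minimal SSD test on empirical return samples.
import numpy as np

def ssd_dominates(a, b, grid=None):
    """True if a SSD-dominates b: E[(t - a)_+] <= E[(t - b)_+] for all t."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    if grid is None:
        grid = np.union1d(a, b)
    short_a = [np.clip(t - a, 0, None).mean() for t in grid]
    short_b = [np.clip(t - b, 0, None).mean() for t in grid]
    return all(sa <= sb + 1e-12 for sa, sb in zip(short_a, short_b))

fast_risky = [-20, 8, 9, 10, 11]   # occasionally terrible travel reward
slow_steady = [4, 5, 5, 6, 6]
print(ssd_dominates(slow_steady, fast_risky))   # True: steady route preferred
```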

FTMRate: Collision-Immune Distance-based Data Rate Selection for IEEE 802.11 Networks

  • Authors: Wojciech Ciezobka, Maksymilian Wojnar, Katarzyna Kosek-Szott, Szymon Szott, Krzysztof Rusek
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.10140
  • Pdf link: https://arxiv.org/pdf/2304.10140
  • Abstract
    Data rate selection algorithms for Wi-Fi devices are an important area of research because they directly impact performance. Most of the proposals are based on measuring the transmission success probability for a given data rate. In dense scenarios, however, this probing approach will fail because frame collisions are misinterpreted as erroneous data rate selection. We propose FTMRate, which uses the fine timing measurement (FTM) feature, recently introduced in IEEE 802.11. FTM allows stations to measure their distance from the AP. We argue that knowledge of the distance from the receiver can be useful in determining which data rate to use. We apply statistical learning (a form of machine learning) to estimate the distance based on measurements, estimate channel quality from the distance, and select data rates based on channel quality. We evaluate three distinct estimation approaches: exponential smoothing, Kalman filter, and particle filter. We present a performance evaluation of the three variants of FTMRate and show, in several dense and mobile (though line-of-sight only) scenarios, that it can outperform two benchmarks and provide close to optimal results in IEEE 802.11ax networks.
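
Of the three estimators evaluated, the Kalman filter is the easiest to sketch: FTM range measurements are smoothed by a scalar filter before being mapped to channel quality. A minimal 1-D constant-position sketch; the noise variances and measurement values are assumptions.

```python
# Minimal scalar Kalman filter over noisy FTM distance measurements.
class DistanceKalman:
    def __init__(self, q=0.1, r=2.0):
        self.x, self.p = None, 1.0   # state (distance, m) and its variance
        self.q, self.r = q, r        # process and measurement noise variances

    def update(self, z):
        if self.x is None:           # initialize on the first measurement
            self.x = z
            return self.x
        self.p += self.q                     # predict
        k = self.p / (self.p + self.r)       # Kalman gain
        self.x += k * (z - self.x)           # correct with FTM measurement z
        self.p *= (1.0 - k)
        return self.x

kf = DistanceKalman()
for z in [10.3, 9.1, 11.8, 10.6]:            # noisy FTM ranges in metres
    print(round(kf.update(z), 2))
```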

A Large-scale Examination of "Socioeconomic" Fairness in Mobile Networks

  • Authors: Souneil Park, Pavol Mulinka, Diego Perino
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.10190
  • Pdf link: https://arxiv.org/pdf/2304.10190
  • Abstract
    Internet access is a special resource: the need for it has become universal across the public, while the service is operated by the private sector. Mobile Network Operators (MNOs) put effort into management, planning, and optimization; however, they do not link such activities to socioeconomic fairness. In this paper, we make a first step towards understanding the relation between the socioeconomic status of customers and network performance, and investigate potential discrimination in network deployment and management. The scope of our study spans various aspects, including urban geography, network resource deployment, data consumption, and device distribution. A novel methodology that enables a geo-socioeconomic perspective on mobile networks is developed for the study. The results are based on actual infrastructure in multiple cities, covering millions of users who densely span the socioeconomic scale. We report a thorough examination of the fairness status, its relationship with various structural factors, and potential class-specific solutions.

Breast cancer detection using deep learning

  • Authors: Gayathri Girish, Ponnathota Spandana, Badrish Vasu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10386
  • Pdf link: https://arxiv.org/pdf/2304.10386
  • Abstract
    Objective: This paper proposes a deep learning model for breast cancer detection from reconstructed images of microwave imaging scan data and aims to improve the accuracy and efficiency of breast tumor detection, which could have a significant impact on breast cancer diagnosis and treatment. Methods: Our framework consists of different convolutional neural network (CNN) architectures for feature extraction and a region-based CNN for tumor detection. We use 7 different architectures: DenseNet201, ResNet50, InceptionV3, InceptionResNetV3, MobileNetV2, NASNetMobile and NASNetLarge and compare their performance to find the best architecture of the seven. An experimental dataset of MRI-derived breast phantoms was used. Results: NASNetLarge is the best-performing architecture, with an accuracy of 88.41% and a loss of 27.82%. Given that the model's AUC is 0.786, it can be concluded that it is suitable for use in its present form, though it could be improved by training on comparable datasets. Impact: One of the main causes of death in women is breast cancer, and early identification is essential for improving patient outcomes. Due to its non-invasiveness and capacity to produce high-resolution images, microwave imaging is a potential tool for breast cancer screening. The complexity of tumors makes it difficult to adequately detect them in microwave images. The results of this research show that deep learning has significant potential for breast cancer detection in microwave images.

Securing Neural Networks with Knapsack Optimization

  • Authors: Yakir Gorski, Shai Avidan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10442
  • Pdf link: https://arxiv.org/pdf/2304.10442
  • Abstract
    Deep learning inference brings together the data and the Convolutional Neural Network (CNN). This is problematic in case the user wants to preserve the privacy of the data and the service provider does not want to reveal the weights of his CNN. Secure Inference allows the two parties to engage in a protocol that preserves their respective privacy concerns, while revealing only the inference result to the user. This is known as Multi-Party Computation (MPC). A major bottleneck of MPC algorithms is communication, as the parties must send data back and forth. The linear component of a CNN (i.e. convolutions) can be done efficiently with minimal communication, but the non-linear part (i.e., ReLU) requires the bulk of communication bandwidth. We propose two ways to accelerate Secure Inference. The first is based on the observation that the ReLU outcome of many convolutions is highly correlated. Therefore, we replace the per pixel ReLU operation by a ReLU operation per patch. Each layer in the network will benefit from a patch of a different size and we devise an algorithm to choose the optimal set of patch sizes through a novel reduction of the problem to a knapsack problem. The second way to accelerate Secure Inference is based on cutting the number of bit comparisons required for a secure ReLU operation. We demonstrate the cumulative effect of these tools in the semi-honest secure 3-party setting for four problems: Classifying ImageNet using ResNet50 backbone, classifying CIFAR100 using ResNet18 backbone, semantic segmentation of ADE20K using MobileNetV2 backbone and semantic segmentation of Pascal VOC 2012 using ResNet50 backbone. Our source code is publicly available: https://github.com/yg320/secure_inference

Keyword: pruning

Model Pruning Enables Localized and Efficient Federated Learning for Yield Forecasting and Data Sharing

  • Authors: Andy Li, Milan Markovic, Peter Edwards, Georgios Leontidis
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09876
  • Pdf link: https://arxiv.org/pdf/2304.09876
  • Abstract
    Federated Learning (FL) presents a decentralized approach to model training in the agri-food sector and offers the potential for improved machine learning performance, while ensuring the safety and privacy of individual farms or data silos. However, the conventional FL approach has two major limitations. First, the heterogeneous data on individual silos can cause the global model to perform well for some clients but not all, as the update direction on some clients may hinder others after they are aggregated. Second, it is inefficient with respect to communication costs during FL and suffers from large model sizes. This paper proposes a new technical solution that utilizes network pruning on client models and aggregates the pruned models. This method enables local models to be tailored to their respective data distribution and mitigate the data heterogeneity present in agri-food data. Moreover, it allows for more compact models that consume less data during transmission. We experiment with a soybean yield forecasting dataset and find that this approach can improve inference performance by 15.5% to 20% compared to FedAvg, while reducing local model sizes by up to 84% and the data volume communicated between the clients and the server by 57.1% to 64.7%.
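
A minimal sketch of the client-side step (magnitude pruning before upload) and a simple mask-aware averaging rule; the sparsity target mirrors the reported 84% size reduction, and the aggregation rule is our assumption, not necessarily the paper's.

```python
# Minimal magnitude-pruning + mask-aware aggregation sketch (NumPy).
import numpy as np

def prune_by_magnitude(w, sparsity=0.84):
    """Zero out the smallest-magnitude entries of w; return pruned copy + mask."""
    k = int(sparsity * w.size)
    thresh = np.partition(np.abs(w).ravel(), k)[k]
    mask = (np.abs(w) >= thresh).astype(w.dtype)
    return w * mask, mask

def aggregate(updates, masks):
    """Average each weight over the clients that actually kept it."""
    kept = np.maximum(sum(masks), 1.0)
    return sum(updates) / kept

rng = np.random.default_rng(0)
w1, m1 = prune_by_magnitude(rng.standard_normal((4, 4)))
w2, m2 = prune_by_magnitude(rng.standard_normal((4, 4)))
print(aggregate([w1, w2], [m1, m2]))
```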

Keyword: voxel

Tetra-NeRF: Representing Neural Radiance Fields Using Tetrahedra

  • Authors: Jonas Kulhanek, Torsten Sattler
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09987
  • Pdf link: https://arxiv.org/pdf/2304.09987
  • Abstract
    Neural Radiance Fields (NeRFs) are a very recent and very popular approach for the problems of novel view synthesis and 3D reconstruction. A popular scene representation used by NeRFs is to combine a uniform, voxel-based subdivision of the scene with an MLP. Based on the observation that a (sparse) point cloud of the scene is often available, this paper proposes to use an adaptive representation based on tetrahedra and a Delaunay representation instead of the uniform subdivision or point-based representations. We show that such a representation enables efficient training and leads to state-of-the-art results. Our approach elegantly combines concepts from 3D geometry processing, triangle-based rendering, and modern neural radiance fields. Compared to voxel-based representations, ours provides more detail around parts of the scene likely to be close to the surface. Compared to point-based representations, our approach achieves better performance.

Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering

  • Authors: Dongting Hu, Zhenkai Zhang, Tingbo Hou, Tongliang Liu, Huan Fu, Mingming Gong
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10075
  • Pdf link: https://arxiv.org/pdf/2304.10075
  • Abstract
    The rendering scheme in neural radiance field (NeRF) is effective in rendering a pixel by casting a ray into the scene. However, NeRF yields blurred rendering results when the training images are captured at non-uniform scales, and produces aliasing artifacts if the test images are taken in distant views. To address this issue, Mip-NeRF proposes a multiscale representation as a conical frustum to encode scale information. Nevertheless, this approach is only suitable for offline rendering since it relies on integrated positional encoding (IPE) to query a multilayer perceptron (MLP). To overcome this limitation, we propose mip voxel grids (Mip-VoG), an explicit multiscale representation with a deferred architecture for real-time anti-aliasing rendering. Our approach includes a density Mip-VoG for scene geometry and a feature Mip-VoG with a small MLP for view-dependent color. Mip-VoG encodes scene scale using the level of detail (LOD) derived from ray differentials and uses quadrilinear interpolation to map a queried 3D location to its features and density from two neighboring downsampled voxel grids. To our knowledge, our approach is the first to offer multiscale training and real-time anti-aliasing rendering simultaneously. We conducted experiments on multiscale datasets, and the results show that our approach outperforms state-of-the-art real-time rendering baselines.

Keyword: lidar

LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields

  • Authors: Tang Tao, Longfei Gao, Guangrun Wang, Peng Chen, Dayang Hao, Xiaodan Liang, Mathieu Salzmann, Kaicheng Yu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10406
  • Pdf link: https://arxiv.org/pdf/2304.10406
  • Abstract
    We introduce a new task, novel view synthesis for LiDAR sensors. While traditional model-based LiDAR simulators with style-transfer neural networks can be applied to render novel views, they fall short in producing accurate and realistic LiDAR patterns, because the renderers they rely on exploit game engines, which are not differentiable. We address this by formulating, to the best of our knowledge, the first differentiable LiDAR renderer, and propose an end-to-end framework, LiDAR-NeRF, leveraging a neural radiance field (NeRF) to enable jointly learning the geometry and the attributes of 3D points. To evaluate the effectiveness of our approach, we establish an object-centric multi-view LiDAR dataset, dubbed NeRF-MVL. It contains observations of objects from 9 categories seen from 360-degree viewpoints captured with multiple LiDAR sensors. Our extensive experiments on the scene-level KITTI-360 dataset, and on our object-level NeRF-MVL show that our LiDAR-NeRF surpasses the model-based algorithms significantly.

Keyword: diffusion

Using Text-to-Image Generation for Architectural Design Ideation

  • Authors: Ville Paananen, Jonas Oppenlaender, Aku Visuri
  • Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10182
  • Pdf link: https://arxiv.org/pdf/2304.10182
  • Abstract
    The recent progress of text-to-image generation has been recognized in architectural design. Our study is the first to investigate the potential of text-to-image generators in supporting creativity during the early stages of the architectural design process. We conducted a laboratory study with 17 architecture students, who developed a concept for a culture center using three popular text-to-image generators: Midjourney, Stable Diffusion, and DALL-E. Through standardized questionnaires and group interviews, we found that image generation could be a meaningful part of the design process when design constraints are carefully considered. Generative tools support serendipitous discovery of ideas and an imaginative mindset, enriching the design process. We identified several challenges of image generators and provided considerations for software developers and educators to support creativity and emphasize designers' imaginative mindset. By understanding the limitations and potential of text-to-image generators, architects and designers can leverage this technology in their design process and education, facilitating innovation and effective communication of concepts.

A data augmentation perspective on diffusion models and retrieval

  • Authors: Max F. Burg, Florian Wenzel, Dominik Zietlow, Max Horn, Osama Makansi, Francesco Locatello, Chris Russell
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10253
  • Pdf link: https://arxiv.org/pdf/2304.10253
  • Abstract
    Diffusion models excel at generating photorealistic images from text queries. Naturally, many approaches have been proposed to use these generative abilities to augment training datasets for downstream tasks, such as classification. However, diffusion models are themselves trained on large, noisily supervised, but nonetheless annotated, datasets. It is an open question whether the generalization capabilities of diffusion models, beyond using the additional data of the pre-training process for augmentation, lead to improved downstream performance. We perform a systematic evaluation of existing methods to generate images from diffusion models and study new extensions to assess their benefit for data augmentation. While we find that personalizing diffusion models towards the target data outperforms simpler prompting strategies, we also show that using the training data of the diffusion model alone, via a simple nearest neighbor retrieval procedure, leads to even stronger downstream performance. Overall, our study probes the limitations of diffusion models for data augmentation but also highlights their potential in generating new training data to improve performance on simple downstream vision tasks.
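
The nearest-neighbor retrieval baseline the abstract highlights can be sketched in a few lines; the embeddings and image handles below are random stand-ins for a real pre-training pool.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def retrieve_augmentations(target_emb, pool_emb, pool_images, k=5):
    """For each target embedding, fetch its k nearest pool images (cosine)."""
    nn = NearestNeighbors(n_neighbors=k, metric="cosine").fit(pool_emb)
    _, idx = nn.kneighbors(target_emb)
    return [pool_images[i] for i in idx.ravel()]

# Random stand-ins for real CLIP-style embeddings and image handles:
pool_emb = np.random.randn(10_000, 512)
target_emb = np.random.randn(100, 512)
pool_images = np.arange(10_000)
extra = retrieve_augmentations(target_emb, pool_emb, pool_images)
print(len(extra))   # 500 retrieved training samples to add to the target set
```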

Anything-3D: Towards Single-view Anything Reconstruction in the Wild

  • Authors: Qiuhong Shen, Xingyi Yang, Xinchao Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10261
  • Pdf link: https://arxiv.org/pdf/2304.10261
  • Abstract
    3D reconstruction from a single RGB image in unconstrained real-world scenarios presents numerous challenges due to the inherent diversity and complexity of objects and environments. In this paper, we introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model to elevate objects to 3D, yielding a reliable and versatile system for the single-view conditioned 3D reconstruction task. Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model for the effective extraction of objects of interest, and leverages a text-to-image diffusion model to lift the object into a neural radiance field. Demonstrating its ability to produce accurate and detailed 3D reconstructions for a wide array of objects, Anything-3D shows promise in addressing the limitations of existing methodologies. Through comprehensive experiments and evaluations on various datasets, we showcase the merits of our approach, underscoring its potential to contribute meaningfully to the field of 3D reconstruction. Demos and code will be available at https://github.com/Anything-of-anything/Anything-3D.

Prediction of the evolution of the nuclear reactor core parameters using artificial neural network

  • Authors: Krzysztof Palmi, Wojciech Kubinski, Piotr Darnowski
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10337
  • Pdf link: https://arxiv.org/pdf/2304.10337
  • Abstract
    A nuclear reactor based on the MIT BEAVRS benchmark was used as a typical power generating Pressurized Water Reactor (PWR). The PARCS v3.2 nodal-diffusion core simulator was used as a full-core reactor physics solver to emulate the operation of a reactor and to generate training and validation data for the ANN. The ANN was implemented with dedicated Python 3.8 code using Google's TensorFlow 2.0 library. Much of the effort went into the appropriate automatic transformation of the data generated by the PARCS simulator, which was later used to develop the ANN. Various methods for improving the accuracy of the ANN predictions were studied, such as trying different ANN architectures to find the optimal number of neurons in the hidden layers of the network. Results were later compared with the architectures proposed in the literature. For the selected best architecture, predictions were made for different core parameters and their dependence on core loading patterns. In this study, a special focus was put on the prediction of the fuel cycle length for a given core loading pattern, as it can be considered one of the targets for plant economic operation. For instance, the length of a single fuel cycle depending on the initial core loading pattern was predicted with very good accuracy (>99%). This work contributes to the exploration of the usefulness of neural networks in solving nuclear reactor design problems. Thanks to the application of ANNs, designers can avoid an excessive number of core simulator runs and more rapidly explore the space of possible solutions before performing more detailed design considerations.
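
A minimal TensorFlow 2 sketch in the spirit of the described setup: an MLP mapping an encoded core loading pattern to a predicted fuel cycle length. Layer sizes and the feature dimension are illustrative assumptions, not the paper's tuned architecture.

```python
import numpy as np
import tensorflow as tf

n_features = 193   # assumed size of an encoded core loading pattern
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),                      # predicted cycle length
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# X, y would come from the automatically transformed PARCS simulator output;
# random data stands in here.
X = np.random.rand(1000, n_features)
y = np.random.rand(1000, 1) * 500
model.fit(X, y, validation_split=0.2, epochs=10, batch_size=32, verbose=0)
```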

Collaborative Diffusion for Multi-Modal Face Generation and Editing

  • Authors: Ziqi Huang, Kelvin C.K. Chan, Yuming Jiang, Ziwei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10530
  • Pdf link: https://arxiv.org/pdf/2304.10530
  • Abstract
    Diffusion models have recently emerged as a powerful generative tool. Despite the great progress, existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is driven by only one modality of condition. To further unleash users' creativity, it is desirable for the model to be controllable by multiple modalities simultaneously, e.g., generating and editing faces by describing the age (text-driven) while drawing the face shape (mask-driven). In this work, we present Collaborative Diffusion, where pre-trained uni-modal diffusion models collaborate to achieve multi-modal face generation and editing without re-training. Our key insight is that diffusion models driven by different modalities are inherently complementary regarding the latent denoising steps, upon which bilateral connections can be established. Specifically, we propose the dynamic diffuser, a meta-network that adaptively hallucinates multi-modal denoising steps by predicting the spatial-temporal influence functions for each pre-trained uni-modal model. Collaborative Diffusion not only combines the generation capabilities of uni-modal diffusion models, but also integrates multiple uni-modal manipulations to perform multi-modal editing. Extensive qualitative and quantitative experiments demonstrate the superiority of our framework in both image quality and condition consistency.
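
A heavily simplified sketch of the collaboration mechanism: a small meta-network predicts per-pixel influence maps that weight the noise predictions of two frozen uni-modal denoisers at each step. All module names and shapes here are hypothetical; the actual dynamic diffuser is more elaborate.

```python
import torch
import torch.nn as nn

class DynamicDiffuser(nn.Module):
    """Predicts per-pixel influence maps for two pre-trained denoisers."""
    def __init__(self, ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch + 1, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 2, 3, padding=1),      # one logit map per model
        )

    def forward(self, x_t, t):
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[2:]).float()
        logits = self.net(torch.cat([x_t, t_map], dim=1))
        return logits.softmax(dim=1)             # influence maps sum to 1

def collaborative_step(x_t, t, text_model, mask_model, diffuser):
    w = diffuser(x_t, t)                         # (B, 2, H, W)
    eps_text = text_model(x_t, t)                # frozen uni-modal denoisers
    eps_mask = mask_model(x_t, t)
    return w[:, :1] * eps_text + w[:, 1:] * eps_mask
```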

Nerfbusters: Removing Ghostly Artifacts from Casually Captured NeRFs

  • Authors: Frederik Warburg, Ethan Weber, Matthew Tancik, Aleksander Holynski, Angjoo Kanazawa
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.10532
  • Pdf link: https://arxiv.org/pdf/2304.10532
  • Abstract
    Casually captured Neural Radiance Fields (NeRFs) suffer from artifacts such as floaters or flawed geometry when rendered outside the camera trajectory. Existing evaluation protocols often do not capture these effects, since they usually only assess image quality at every 8th frame of the training capture. To push forward progress in novel-view synthesis, we propose a new dataset and evaluation procedure in which two camera trajectories are recorded of the scene: one used for training, and the other for evaluation. In this more challenging in-the-wild setting, we find that existing hand-crafted regularizers neither remove floaters nor improve scene geometry. Thus, we propose a 3D diffusion-based method that leverages local 3D priors and a novel density-based score distillation sampling loss to discourage artifacts during NeRF optimization. We show that this data-driven prior removes floaters and improves scene geometry for casual captures.

Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion

  • Authors: Tomas Jakab, Ruining Li, Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10535
  • Pdf link: https://arxiv.org/pdf/2304.10535
  • Abstract
    We present Farm3D, a method to learn category-specific 3D reconstructors for articulated objects entirely from "free" virtual supervision from a pre-trained 2D diffusion-based image generator. Recent approaches can learn, given a collection of single-view images of an object category, a monocular network to predict the 3D shape, albedo, illumination and viewpoint of any object occurrence. We propose a framework using an image generator like Stable Diffusion to generate virtual training data for learning such a reconstruction network from scratch. Furthermore, we include the diffusion model as a score to further improve learning. The idea is to randomise some aspects of the reconstruction, such as viewpoint and illumination, generating synthetic views of the reconstructed 3D object, and have the 2D network assess the quality of the resulting image, providing feedback to the reconstructor. Different from work based on distillation which produces a single 3D asset for each textual prompt in hours, our approach produces a monocular reconstruction network that can output a controllable 3D asset from a given image, real or generated, in only seconds. Our network can be used for analysis, including monocular reconstruction, or for synthesis, generating articulated assets for real-time applications such as video games.

Keyword: dynamic

GeoGraphViz: Geographically Constrained 3D Force-Directed Graph for Knowledge Graph Visualization

  • Authors: Sizhe Wang, Wenwen Li, Zhining Gu
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.09864
  • Pdf link: https://arxiv.org/pdf/2304.09864
  • Abstract
    Knowledge graphs are a key technique for linking and integrating cross-domain data, concepts, tools, and knowledge to enable data-driven analytics. As much of the world's data has become massive in size, visualizing graph entities and their interrelationships intuitively and interactively has become a crucial task for ingesting and better utilizing graph content to support semantic reasoning, discovering hidden knowledge, and better scientific understanding of geophysical and social phenomena. Despite the fact that many such phenomena (e.g., disasters) have clear spatial footprints and geographical properties, their location information is considered only as a textual label in existing graph visualization tools, limiting their capability to reveal the geospatial distribution patterns of the graph nodes. In addition, most graph visualization techniques rely on 2D graph visualization, which constrains the dimensions of information that can be presented and lacks support for graph structure examination from multiple angles. To tackle the above challenges, we developed a novel 3D map-based graph visualization algorithm to enable interactive exploration of graph content and patterns in a spatially explicit manner. The algorithm extends a 3D force-directed graph by integrating a web map, an additional geolocational force, and a force balancing variable that allows for the dynamic adjustment of the 3D graph structure and layout. This mechanism helps create a balanced graph view between the semantic forces among the graph nodes and the attractive force from a geolocation to a graph node. Our solution offers a new perspective in visualizing and understanding spatial entities and events in a knowledge graph.
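
A toy version of the balancing mechanism might look as follows: each node feels the usual repulsion and spring forces plus a pull toward its geolocation, traded off by a balancing coefficient. The force models and names are illustrative assumptions.

```python
import numpy as np

def step(pos, edges, geo_anchor, lam=0.5, dt=0.02):
    """One force-directed update; lam balances semantic vs. geolocational force."""
    n = len(pos)
    force = np.zeros_like(pos)
    # Pairwise repulsion (part of the semantic layout force).
    for i in range(n):
        d = pos[i] - pos                          # (n, 3) offsets to all nodes
        dist2 = (d ** 2).sum(1, keepdims=True) + 1e-6
        force[i] += (d / dist2).sum(0)            # self-term contributes zero
    # Spring attraction along graph edges.
    for i, j in edges:
        d = pos[j] - pos[i]
        force[i] += d; force[j] -= d
    # Geolocational force pulling each node toward its map anchor.
    force = (1 - lam) * force + lam * (geo_anchor - pos)
    return pos + dt * force

pos = np.random.rand(10, 3)
geo = np.random.rand(10, 3)                       # projected map coordinates
edges = [(0, 1), (1, 2), (2, 3)]
for _ in range(100):
    pos = step(pos, edges, geo)
```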

Robust trajectory tracking for underactuated mechanical systems without velocity measurements

  • Authors: N. Javanmardi, P. Borja, M. J. Yazdanpanah, J. M. A. Scherpen
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.09910
  • Pdf link: https://arxiv.org/pdf/2304.09910
  • Abstract
    In this paper, the notion of contraction is used to solve the trajectory-tracking problem for a class of mechanical systems. Additionally, we propose a dynamic extension to remove velocity measurements from the controller while rejecting matched disturbances. In particular, we propose three control designs stemming from the Interconnection and Damping Assignment Passivity-Based Control approach. The first controller is a tracker that does not require velocity measurements. The second control design solves the trajectory-tracking problem while guaranteeing robustness with respect to matched disturbances. Then, the third approach is a combination of both mentioned controllers. It is shown that all proposed design methods guarantee exponential convergence of the mechanical system to the desired (feasible) trajectory due to the contraction property of the closed-loop system. The applicability of this method is illustrated via the design of a controller for an underactuated mechanical system.

An Intent-based Framework for Vehicular Edge Computing

  • Authors: TianZhang He, Adel N. Toosi, Negin Akbari, Muhammed Tawfiqul Islam, Muhammad Aamir Cheema
  • Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.09916
  • Pdf link: https://arxiv.org/pdf/2304.09916
  • Abstract
    The rapid development of emerging vehicular edge computing (VEC) brings new opportunities and challenges for dynamic resource management. The increasing number of edge data centers, roadside units (RSUs), and network devices, however, makes resource management a complex task in VEC. On the other hand, the exponential growth of service applications and end-users makes corresponding QoS hard to maintain. Intent-Based Networking (IBN), based on Software-Defined Networking, was introduced to provide the ability to automatically handle and manage the networking requirements of different applications. Motivated by the IBN concept, in this paper, we propose a novel approach to jointly orchestrate networking and computing resources based on user requirements. The proposed solution constantly monitors user requirements and dynamically re-configures the system to satisfy desired states of the application. We compared our proposed solution with the state-of-the-art networking embedding algorithms using real-world taxi GPS traces. Results show that our proposed method is significantly faster (up to 95%) and can improve resource utilization (up to 76%) and the acceptance ratio of computing and networking requests with various priorities (up to 71%). We also present a small-scale prototype of the proposed intent management framework to validate our solution.

Improving Urban Flood Prediction using LSTM-DeepLabv3+ and Bayesian Optimization with Spatiotemporal feature fusion

  • Authors: Zuxiang Situ, Qi Wang, Shuai Teng, Wanen Feng, Gongfa Chen, Qianqian Zhou, Guangtao Fu
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09994
  • Pdf link: https://arxiv.org/pdf/2304.09994
  • Abstract
    Deep learning models have become increasingly popular for flood prediction due to their superior accuracy and efficiency compared to traditional methods. However, current machine learning methods often rely on separate spatial or temporal feature analysis and have limitations on the types, number, and dimensions of input data. This study presented a CNN-RNN hybrid feature fusion modelling approach for urban flood prediction, which integrated the strengths of CNNs in processing spatial features and RNNs in analyzing different dimensions of time sequences. This approach allowed for both static and dynamic flood predictions. Bayesian optimization was applied to identify the seven most influential flood-driven factors and determine the best combination strategy. By combining four CNNs (FCN, UNet, SegNet, DeepLabv3+) and three RNNs (LSTM, BiLSTM, GRU), the optimal hybrid model was identified as LSTM-DeepLabv3+. This model achieved the highest prediction accuracy (MAE, RMSE, NSE, and KGE were 0.007, 0.025, 0.973 and 0.755, respectively) under various rainfall input conditions. Additionally, the processing speed was significantly improved, with an inference time of 1.158s (approximately 1/125 of the traditional computation time) compared to the physically-based models.
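
A bare-bones Keras sketch of this kind of spatiotemporal fusion: a convolutional branch encodes static spatial rasters, an LSTM encodes the rainfall sequence, and the branches are merged before the prediction head. Input shapes and layer choices are assumptions, not the tuned LSTM-DeepLabv3+ model.

```python
import tensorflow as tf
from tensorflow.keras import layers

spatial_in = tf.keras.Input(shape=(64, 64, 7))   # 7 assumed flood-driving rasters
temporal_in = tf.keras.Input(shape=(24, 1))      # 24-step rainfall sequence

x = layers.Conv2D(32, 3, activation="relu", padding="same")(spatial_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = layers.GlobalAveragePooling2D()(x)           # spatial feature vector

h = layers.LSTM(64)(temporal_in)                 # temporal feature vector

fused = layers.Concatenate()([x, h])             # spatiotemporal fusion
out = layers.Dense(64 * 64, activation="relu")(fused)
out = layers.Reshape((64, 64, 1))(out)           # predicted flood depth map

model = tf.keras.Model([spatial_in, temporal_in], out)
model.compile(optimizer="adam", loss="mse")
```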

HTNet: Dynamic WLAN Performance Prediction using Heterogenous Temporal GNN

  • Authors: Hongkuan Zhou, Rajgopal Kannan, Ananthram Swami, Viktor Prasanna
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.10013
  • Pdf link: https://arxiv.org/pdf/2304.10013
  • Abstract
    Predicting the throughput of WLAN deployments is a classic problem that occurs in the design of robust and high performance WLAN systems. However, due to increasingly complex communication protocols and growing interference between devices in ever denser WLAN deployments, traditional methods either incur substantial runtime or suffer from enormous prediction error, and hence cannot be applied in downstream tasks. Recently, Graph Neural Networks have been proven to be powerful graph analytic models and have been broadly applied to various networking problems such as link scheduling and power allocation. In this work, we propose HTNet, a specialized Heterogeneous Temporal Graph Neural Network that extracts features from dynamic WLAN deployments. Analyzing the unique graph structure of WLAN deployment graphs, we show that HTNet achieves the maximum expressive power on each snapshot. Based on a powerful message passing scheme, HTNet requires fewer layers than other GNN-based methods, which entails less supporting data and a shorter runtime. To evaluate the performance of HTNet, we prepare six different setups with more than five thousand dense dynamic WLAN deployments that cover a wide range of real-world scenarios. HTNet achieves the lowest prediction error on all six setups with an average improvement of 25.3% over the state-of-the-art methods.

Topological Guided Actor-Critic Modular Learning of Continuous Systems with Temporal Objectives

  • Authors: Lening Li, Zhentian Qian
  • Subjects: Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.10041
  • Pdf link: https://arxiv.org/pdf/2304.10041
  • Abstract
    This work investigates the formal policy synthesis of continuous-state stochastic dynamic systems given high-level specifications in linear temporal logic. To learn an optimal policy that maximizes the satisfaction probability, we take a product between the dynamic system and the translated automaton to construct a product system, on which we solve an optimal planning problem. Since this product system has a hybrid product state space that results in reward sparsity, we introduce a generalized optimal backup order, in reverse topological order, to guide the value backups and accelerate the learning process. We provide the optimality proof for using the generalized optimal backup order in this optimal planning problem. Further, this paper presents an actor-critic reinforcement learning algorithm for when the topological order applies. This algorithm leverages advanced mathematical techniques and enjoys the property of hyperparameter self-tuning. We provide proofs of the optimality and convergence of our proposed reinforcement learning algorithm. We use neural networks to approximate the value function and policy function for the hybrid product state space. Furthermore, we observe that assigning integer numbers to automaton states can rank the value or policy function approximated by neural networks. To break this ordinal relationship, we use an individual neural network for each automaton state's value (policy) function, termed modular learning. We conduct two experiments. First, to show the efficacy of our reinforcement learning algorithm, we compare it with baselines on a classic control task, CartPole. Second, we demonstrate the empirical performance of our formal policy synthesis framework on motion planning of a Dubins car with a temporal specification.

Dynablox: Real-time Detection of Diverse Dynamic Objects in Complex Environments

  • Authors: Lukas Schmid, Olov Andersson, Aurelio Sulser, Patrick Pfreundschuh, Roland Siegwart
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.10049
  • Pdf link: https://arxiv.org/pdf/2304.10049
  • Abstract
    Real-time detection of moving objects is an essential capability for robots acting autonomously in dynamic environments. We thus propose Dynablox, a novel online mapping-based approach for robust moving object detection in complex environments. The central idea of our approach is to incrementally estimate high confidence free-space areas by modeling and accounting for sensing, state estimation, and mapping limitations during online robot operation. The spatio-temporally conservative free space estimate enables robust detection of moving objects without making any assumptions on the appearance of objects or environments. This allows deployment in complex scenes such as multi-storied buildings or staircases, and for diverse moving objects such as people carrying various items, doors swinging or even balls rolling around. We thoroughly evaluate our approach on real-world data sets, achieving 86% IoU at 17 FPS in typical robotic settings. The method outperforms a recent appearance-based classifier and approaches the performance of offline methods. We demonstrate its generality on a novel data set with rare moving objects in complex environments. We make our efficient implementation and the novel data set available as open-source.

Recurrent Transformer for Dynamic Graph Representation Learning with Edge Temporal States

  • Authors: Shengxiang Hu, Guobing Zou, Shiyi Lin, Liangrui Wu, Chenyang Zhou, Bofeng Zhang, Yixin Chen
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10079
  • Pdf link: https://arxiv.org/pdf/2304.10079
  • Abstract
    Dynamic graph representation learning is an increasingly popular yet challenging research task, owing to the widespread demand for graph data analysis in real-world applications. Despite the encouraging performance of many recent works that build upon recurrent neural networks (RNNs) and graph neural networks (GNNs), they fail to explicitly model the impact of edge temporal states on node features over time slices. Additionally, they struggle to extract global structural features because of the inherent over-smoothing disadvantage of GNNs, which further restricts the performance. In this paper, we propose a recurrent difference graph transformer (RDGT) framework, which first assigns the edges in each snapshot various types and weights to illustrate their specific temporal states explicitly, then employs a structure-reinforced graph transformer to capture the temporal node representations via a recurrent learning paradigm. Experimental results on four real-world datasets demonstrate the superiority of RDGT for discrete dynamic graph representation learning, as it consistently outperforms competing methods in dynamic link prediction tasks.

Securing Semantic Communications with Physical-layer Semantic Encryption and Obfuscation

  • Authors: Qi Qin, Yankai Rong, Guoshun Nan, Shaokang Wu, Xuefei Zhang, Qimei Cui, Xiaofeng Tao
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.10147
  • Pdf link: https://arxiv.org/pdf/2304.10147
  • Abstract
    Deep learning based semantic communication (DLSC) systems have shown great potential to make wireless networks significantly more efficient by transmitting only the semantics of the data. However, the open nature of the wireless channel and the fragility of neural models leave DLSC systems extremely vulnerable to various attacks. Traditional wireless physical layer key (PLK) generation, which relies on the reciprocal channel and randomness characteristics between two legitimate users, holds the promise of securing DLSC. The main challenge lies in generating secret keys in static environments with ultra-low/zero rate. Different from prior efforts that use relays or reconfigurable intelligent surfaces (RIS) to manipulate wireless channels, this paper proposes a novel physical layer semantic encryption scheme by exploiting the randomness of bilingual evaluation understudy (BLEU) scores from the field of machine translation, and additionally presents a novel semantic obfuscation mechanism to provide further physical layer protection. Specifically, 1) we calculate the BLEU scores and corresponding weights of the DLSC system. Then, we generate semantic keys (SKey) by feeding the weighted sum of the scores into a hash function. 2) Equipped with the SKey, our proposed subcarrier obfuscation is able to further secure semantic communications with a dynamic dummy data insertion mechanism. Experiments show the effectiveness of our method, especially in static wireless environments.
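
The SKey generation step, as described, reduces to hashing a quantized weighted sum of BLEU scores. A stripped-down sketch, where the quantization granularity and the example weights are our assumptions:

```python
import hashlib

def generate_skey(bleu_scores, weights, precision=4):
    """Hash the (quantized) weighted BLEU sum into a 256-bit semantic key."""
    assert len(bleu_scores) == len(weights)
    s = sum(b * w for b, w in zip(bleu_scores, weights))
    # Quantize so both legitimate parties agree on the exact hash input.
    token = f"{s:.{precision}f}".encode()
    return hashlib.sha256(token).hexdigest()

key = generate_skey([0.71, 0.64, 0.58], [0.5, 0.3, 0.2])
print(key[:16], "...")   # first bytes of the shared SKey
```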

Automated Dynamic Bayesian Networks for Predicting Acute Kidney Injury Before Onset

  • Authors: David Gordon, Panayiotis Petousis, Anders O. Garlid, Keith Norris, Katherine Tuttle, Susanne B. Nicholas, Alex A.T. Bui (on behalf of CURE-CKD)
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10175
  • Pdf link: https://arxiv.org/pdf/2304.10175
  • Abstract
    Several algorithms for learning the structure of dynamic Bayesian networks (DBNs) require an a priori ordering of variables, which influences the determined graph topology. However, it is often unclear how to determine this order if feature importance is unknown, especially as an exhaustive search is usually impractical. In this paper, we introduce Ranking Approaches for Unknown Structures (RAUS), an automated framework to systematically inform variable ordering and learn networks end-to-end. RAUS leverages existing statistical methods (Cramér's V, the chi-squared test, and information gain) to compare variable ordering, resultant generated network topologies, and DBN performance. RAUS enables end-users with limited DBN expertise to implement models via a command line interface. We evaluate RAUS on the task of predicting impending acute kidney injury (AKI) from inpatient clinical laboratory data. Longitudinal observations from 67,460 patients were collected from our electronic health record (EHR) and Kidney Disease Improving Global Outcomes (KDIGO) criteria were then applied to define AKI events. RAUS learns multiple DBNs simultaneously to predict a future AKI event at different time points (i.e., 24-, 48-, 72-hours in advance of AKI). We also compared the results of the learned AKI prediction models and variable orderings to baseline techniques (logistic regression, random forests, and extreme gradient boosting). The DBNs generated by RAUS achieved 73-83% area under the receiver operating characteristic curve (AUCROC) within 24-hours before AKI; and 71-79% AUCROC within 48-hours before AKI of any stage in a 7-day observation window. Insights from this automated framework can help efficiently implement and interpret DBNs for clinical decision support. The source code for RAUS is available in GitHub at https://github.com/dgrdn08/RAUS .
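
A small sketch of the ranking idea, using two of the statistics the abstract names (Cramér's V can be derived from the chi-squared statistic); scikit-learn's feature-selection utilities stand in for RAUS's own implementation, and the toy features are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import chi2, mutual_info_classif

def rank_variables(X, y):
    """Order candidate variables by chi-squared score and information gain."""
    chi_scores, _ = chi2(X, y)                   # requires non-negative features
    ig_scores = mutual_info_classif(X, y, discrete_features=True)
    return pd.DataFrame(
        {"chi2": chi_scores, "info_gain": ig_scores},
        index=X.columns,
    ).sort_values("info_gain", ascending=False)

# Toy binned lab features against a binary AKI label:
X = pd.DataFrame(np.random.randint(0, 4, (500, 3)),
                 columns=["creatinine_bin", "bun_bin", "egfr_bin"])
y = np.random.randint(0, 2, 500)
print(rank_variables(X, y))
```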

UAV-based Receding Horizon Control for 3D Inspection Planning

  • Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10201
  • Pdf link: https://arxiv.org/pdf/2304.10201
  • Abstract
    Nowadays, unmanned aerial vehicles or UAVs are being used for a wide range of tasks, including infrastructure inspection, automated monitoring and coverage. This paper investigates the problem of 3D inspection planning with an autonomous UAV agent which is subject to dynamical and sensing constraints. We propose a receding horizon 3D inspection planning control approach for generating optimal trajectories which enable an autonomous UAV agent to inspect a finite number of feature-points scattered on the surface of a cuboid-like structure of interest. The inspection planning problem is formulated as a constrained open-loop optimal control problem and is solved using mixed integer programming (MIP) optimization. Quantitative and qualitative evaluation demonstrates the effectiveness of the proposed approach.

Dynamic Security Region of Natural Gas Systems in Integrated Electricity-Gas Systems

  • Authors: Han Gao, Peiyao Zhao, Zhengshuo Li
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10215
  • Pdf link: https://arxiv.org/pdf/2304.10215
  • Abstract
    In an integrated electricity-gas system (IEGS), the tight coupling of power and natural gas systems is embodied by frequent changes in gas withdrawal from gas-fired units to provide regulation services for the power system to handle uncertainty, which may in turn endanger the secure operation of the natural gas system and ultimately affect the safety of the whole IEGS. Hence, it is necessary to accurately and efficiently evaluate the dynamic security region (DSR) of the natural gas system in the IEGS by considering the real-time dynamic characteristics of natural gas systems, which are not satisfactorily handled in state-of-the-art works. To bridge this gap, this paper first conceptionally verifies the necessity of the DSR and establishes its mathematical model. Then, a dimensionality reduction method is proposed for the efficient solution and visualization of the high-dimensional DSR evaluation model. A fast evaluation (FE) algorithm is developed to address the difficulties of the nonconvex dynamic constraints in the reduced DSR model. Finally, the necessity and notable advantages of the proposed DSR model and FE are verified based on small and relatively large test systems in comparison with common security region models and algorithms. To the best of our knowledge, this is the first paper that comprehensively presents models and efficient algorithms regarding the DSR of natural gas systems in an IEGS.

Filter-Aware Model-Predictive Control

  • Authors: Baris Kayalibay, Atanas Mirchev, Ahmed Agha, Patrick van der Smagt, Justin Bayer
  • Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10246
  • Pdf link: https://arxiv.org/pdf/2304.10246
  • Abstract
    Partially-observable problems pose a trade-off between reducing costs and gathering information. They can be solved optimally by planning in belief space, but that is often prohibitively expensive. Model-predictive control (MPC) takes the alternative approach of using a state estimator to form a belief over the state, and then planning in state space. This ignores potential future observations during planning and, as a result, cannot actively increase or preserve the certainty of its own state estimate. We find a middle ground between planning in belief space and completely ignoring belief dynamics by reasoning only about the future accuracy of the state estimate. Our approach, filter-aware MPC, penalises the loss of information via what we call "trackability", the expected error of the state estimator. We show that model-based simulation allows condensing trackability into a neural network, which enables fast planning. In experiments involving visual navigation, realistic everyday environments, and a two-link robot arm, we show that filter-aware MPC vastly improves regular MPC.

Learning Representative Trajectories of Dynamical Systems via Domain-Adaptive Imitation

  • Authors: Edgardo Solano-Carrillo, Jannis Stoppe
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.10260
  • Pdf link: https://arxiv.org/pdf/2304.10260
  • Abstract
    Domain-adaptive trajectory imitation is a skill that some predators learn for survival, by mapping dynamic information from one domain (their speed and steering direction) to a different domain (current position of the moving prey). An intelligent agent with this skill could be exploited for a diversity of tasks, including the recognition of abnormal motion in traffic once it has learned to imitate representative trajectories. Towards this direction, we propose DATI, a deep reinforcement learning agent designed for domain-adaptive trajectory imitation using a cycle-consistent generative adversarial method. Our experiments on a variety of synthetic families of reference trajectories show that DATI outperforms baseline methods for imitation learning and optimal control in this setting, keeping the same per-task hyperparameters. Its generalization to a real-world scenario is shown through the discovery of abnormal motion patterns in maritime traffic, opening the door for the use of deep reinforcement learning methods for spatially-unconstrained trajectory data mining.

BackCache: Mitigating Contention-Based Cache Timing Attacks by Hiding Cache Line Evictions

  • Authors: Quancheng Wang, Ming Tang, Han Wang, Yuzhe Gu
  • Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.10268
  • Pdf link: https://arxiv.org/pdf/2304.10268
  • Abstract
    Caches are used to reduce the speed differential between the CPU and memory to improve the performance of modern processors. However, attackers can use contention-based cache timing attacks to steal sensitive information from victim processes through carefully designed cache eviction sets, and L1 data cache attacks in particular are widely exploited and pose a significant privacy and confidentiality threat. Existing hardware-based countermeasures mainly focus on cache partitioning, randomization, and cache line flushing, which unfortunately either incur high overhead or can be circumvented by sophisticated attacks. In this paper, we propose a novel hardware-software co-design called BackCache with the idea of always achieving cache hits instead of cache misses to mitigate contention-based cache timing attacks on the L1 data cache. BackCache places the evicted cache lines from the L1 data cache into a fully-associative backup cache to hide the evictions. To improve the security of BackCache, we introduce a randomly used replacement policy (RURP) and a dynamic backup cache resizing mechanism. We also present a theoretical security analysis to demonstrate the effectiveness of BackCache. Our evaluation on the gem5 simulator shows that BackCache degrades performance by 1.33%, 7.34%, and 7.59% for OS kernel, single-thread, and multi-thread benchmarks, respectively.
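
A toy functional model of the idea (not the hardware design): lines evicted from a direct-mapped L1 land in a small fully-associative backup cache, so a subsequent access still hits. Sizes are placeholders, and random eviction stands in for the paper's RURP policy.

```python
import random
from collections import OrderedDict

class BackCacheModel:
    """Evictions from a toy direct-mapped L1 are hidden in a backup cache."""
    def __init__(self, l1_sets=64, backup_size=32):
        self.l1 = {}                       # set index -> cached address
        self.l1_sets = l1_sets
        self.backup_size = backup_size
        self.backup = OrderedDict()        # fully-associative backup

    def access(self, addr):
        s = addr % self.l1_sets
        if self.l1.get(s) == addr or addr in self.backup:
            return "hit"                   # backup hides contention evictions
        evicted = self.l1.get(s)
        self.l1[s] = addr
        if evicted is not None:
            if len(self.backup) >= self.backup_size:
                # Random choice stands in for the paper's RURP policy.
                self.backup.pop(random.choice(list(self.backup)))
            self.backup[evicted] = True
        return "miss"

c = BackCacheModel()
c.access(0); c.access(64)      # 64 maps to set 0 and evicts address 0
print(c.access(0))             # "hit": the eviction is invisible to timing
```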

Observer-Feedback-Feedforward Controller Structures in Reinforcement Learning

  • Authors: Ruoqi Zhang, Per Mattson, Torbjörn Wigren
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10276
  • Pdf link: https://arxiv.org/pdf/2304.10276
  • Abstract
    The paper proposes the use of structured neural networks for reinforcement learning based nonlinear adaptive control. The focus is on partially observable systems, with separate neural networks for the state and feedforward observer and the state feedback and feedforward controller. The observer dynamics are modelled by recurrent neural networks while a standard network is used for the controller. As discussed in the paper, this leads to a separation of the observer dynamics to the recurrent neural network part, and the state feedback to the feedback and feedforward network. The structured approach reduces the computational complexity and gives the reinforcement learning based controller an understandable structure as compared to when one single neural network is used. As shown by simulation the proposed structure has the additional and main advantage that the training becomes significantly faster. Two ways to include feedforward structure are presented, one related to state feedback control and one related to classical feedforward control. The latter method introduces further structure with a separate recurrent neural network that processes only the measured disturbance. When evaluated with simulation on a nonlinear cascaded double tank process, the method with most structure performs the best, with excellent feedforward disturbance rejection gains.

Aiding reinforcement learning for set point control

  • Authors: Ruoqi Zhang, Per Mattsson, Torbjörn Wigren
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10289
  • Pdf link: https://arxiv.org/pdf/2304.10289
  • Abstract
    While reinforcement learning has made great improvements, state-of-the-art algorithms can still struggle with seemingly simple set-point feedback control problems. One reason for this is that the learned controller may not be able to excite the system dynamics well enough initially, and therefore it can take a long time to get data that is informative enough to learn a good controller. The paper contributes by augmenting reinforcement learning with a simple guiding feedback controller, for example, a proportional controller. The key advantage in set point control is a much improved excitation that improves the convergence properties of the reinforcement learning controller significantly. This can be very important in real-world control, where quick and accurate convergence is needed. The proposed method is evaluated with simulation and on a real-world double tank process with promising results.

FIANCEE: Faster Inference of Adversarial Networks via Conditional Early Exits

  • Authors: Polina Karpikova (1 and 2), Radionova Ekaterina (1), Anastasia Yaschenko (1 and 2), Andrei Spiridonov (1), Leonid Kostyushko (3), Riccardo Fabbricatore (1), Aleksei Ivakhnenko (1) ((1) Samsung AI Center, (2) Higher School of Economics, (3) Lomonosov Moscow State University)
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10306
  • Pdf link: https://arxiv.org/pdf/2304.10306
  • Abstract
    Generative DNNs are a powerful tool for image synthesis, but they are limited by their computational load. On the other hand, given a trained model and a task, e.g. face generation within a range of characteristics, the output image quality will be unevenly distributed among images with different characteristics. It follows that we might restrain the model's complexity on some instances, maintaining a high quality. We propose a method for diminishing computations by adding so-called early exit branches to the original architecture, and dynamically switching the computational path depending on how difficult it will be to render the output. We apply our method on two different SOTA models performing generative tasks: generation from a semantic map, and cross-reenactment of face expressions; showing it is able to output images with custom lower-quality thresholds. For a threshold of LPIPS <= 0.1, we diminish their computations by up to a half. This is especially relevant for real-time applications such as synthesis of faces, where quality loss needs to be contained, but most of the inputs need fewer computations than the complex instances.
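
Schematically, the conditional early exit can be sketched as below: after each stage, a lightweight head renders a candidate output, and computation stops once a predicted quality score clears the caller's threshold. The module names and the quality predictor are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EarlyExitGenerator(nn.Module):
    def __init__(self, stages, exit_heads, quality_heads):
        super().__init__()
        self.stages = nn.ModuleList(stages)           # main generator blocks
        self.exits = nn.ModuleList(exit_heads)        # cheap render heads
        self.quality = nn.ModuleList(quality_heads)   # predict LPIPS-like score

    def forward(self, x, max_lpips=0.1):
        img = None
        for stage, head, q in zip(self.stages, self.exits, self.quality):
            x = stage(x)
            img = head(x)                  # candidate output at this depth
            if q(x).mean() <= max_lpips:   # easy input: stop computing here
                return img
        return img                         # hardest inputs use the full net
```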

ORIGAMI: A flexible state channels design for public blockchain systems

  • Authors: Lydia Negka, Angeliki Katsika, Georgios Spathoulas, Vassilis Plagianakos
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.10313
  • Pdf link: https://arxiv.org/pdf/2304.10313
  • Abstract
    Public blockchain systems offer security guarantees that cannot be matched by any centralised system. This offering has attracted a lot of interest and has exposed a significant limitation of most blockchain designs with regard to scalability. One of the scaling solutions proposed is state channels, which enable serving given applications with a minimum number of transactions. Existing state channel designs set multiple compatibility requirements for applications to be deployed. Origami is a novel state channel design that removes most of the requirements of existing approaches, while also offering a number of new features. Origami enables dynamic groups of users to interact in an unordered way completely off-chain after an initial on-boarding on-chain transaction. The proposed design is analysed in detail and compared to existing schemes, while a formal security analysis validates the security properties it offers.

Polylog-Competitive Algorithms for Dynamic Balanced Graph Partitioning for Ring Demands

  • Authors: Harald Räcke, Stefan Schmid, Ruslan Zabrodin
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.10350
  • Pdf link: https://arxiv.org/pdf/2304.10350
  • Abstract
    The performance of many large-scale and data-intensive distributed systems critically depends on the capacity of the interconnecting network. This paper is motivated by the vision of self-adjusting infrastructures whose resources can be adjusted according to the workload they currently serve, in a demand-aware manner. Such dynamic adjustments can be exploited to improve network utilization and hence performance, by dynamically moving frequently interacting communication partners closer, e.g., collocating them in the same server or datacenter rack. In particular, we revisit the online balanced graph partitioning problem which captures the fundamental tradeoff between the benefits and costs of dynamically collocating communication partners. The demand is modelled as a sequence $\sigma$ (revealed in an online manner) of communication requests between $n$ processes, each of which is running on one of the $\ell$ servers. Each server has capacity $k=n/\ell$, hence, the processes have to be scheduled in a balanced manner across the servers. A request incurs cost $1$ if the requested processes are located on different servers; otherwise the cost is $0$. A process can be migrated to a different server at cost $1$. This paper presents the first online algorithm for online balanced graph partitioning achieving a polylogarithmic competitive ratio for the fundamental case of ring communication patterns. Specifically, our main contribution is a $O(\log^3 n)$-competitive randomized online algorithm for this problem. We further present a randomized online algorithm which is $O(\log^2 n)$-competitive when compared to a static optimal solution. Our two results rely on different algorithms and techniques and hence are of independent interest.
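
The cost model is easy to operationalize. The harness below, which an online algorithm would drive, uses our own naming; the hand-run swap at the end is purely illustrative and not the paper's algorithm.

```python
class Instance:
    """Cost accounting for the online balanced graph partitioning model."""
    def __init__(self, n, ell):
        self.k = n // ell                          # server capacity k = n / l
        self.place = {p: p // self.k for p in range(n)}
        self.cost = 0

    def request(self, u, v):
        """A communication request costs 1 iff u and v sit on different servers."""
        if self.place[u] != self.place[v]:
            self.cost += 1

    def migrate(self, u, server):
        """Moving a process costs 1; the caller must preserve balance."""
        self.place[u] = server
        self.cost += 1

inst = Instance(n=8, ell=2)                        # k = 4 processes per server
for _ in range(3):
    inst.request(0, 7)                             # repeated cross-server traffic
inst.migrate(7, 0); inst.migrate(3, 1)             # swap 7 and 3 to stay balanced
inst.request(0, 7)                                 # now collocated, cost 0
print(inst.cost)                                   # 3 remote requests + 2 moves = 5
```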

PDL on Steroids: on Expressive Extensions of PDL with Intersection and Converse

  • Authors: Diego Figueira, Santiago Figueira, Edwin Pin
  • Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.10381
  • Pdf link: https://arxiv.org/pdf/2304.10381
  • Abstract
    We introduce CPDL+, a family of expressive logics rooted in Propositional Dynamic Logic (PDL). In terms of expressive power, CPDL+ strictly contains PDL extended with intersection and converse (a.k.a. ICPDL) as well as Conjunctive Queries (CQ), Conjunctive Regular Path Queries (CRPQ), or some known extensions thereof (Regular Queries and CQPDL). We investigate the expressive power, characterization of bisimulation, satisfiability, and model checking for CPDL+. We argue that natural subclasses of CPDL+ can be defined in terms of the tree-width of the underlying graphs of the formulas. We show that the class of CPDL+ formulas of tree-width 2 is equivalent to ICPDL, and that it also coincides with CPDL+ formulas of tree-width 1. However, beyond tree-width 2, incrementing the tree-width strictly increases the expressive power. We characterize the expressive power for every class of fixed tree-width formulas in terms of a bisimulation game with pebbles. Based on this characterization, we show that CPDL+ has a tree-like model property. We prove that the satisfiability problem is decidable in 2ExpTime on fixed tree-width formulas, coinciding with the complexity of ICPDL. We also exhibit classes for which satisfiability is reduced to ExpTime. Finally, we establish that the model checking problem for fixed tree-width formulas is in PTime, contrary to the full class CPDL+.

Multi-label Node Classification On Graph-Structured Data

  • Authors: Tianqi Zhao, Ngan Thi Dong, Alan Hanjalic, Megha Khosla
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10398
  • Pdf link: https://arxiv.org/pdf/2304.10398
  • Abstract
    Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While high label similarity (high homophily) is usually attributed to the success of GNNs, we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, besides defining homophily for the multi-label scenario, we develop a new approach that dynamically fuses the feature and label correlation information to learn label-informed representations. Finally, we perform a large-scale comparative study with 10 methods and 9 datasets, which also showcases the effectiveness of our approach. We release our benchmark at https://anonymous.4open.science/r/LFLF-5D8C/.
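
One plausible way to quantify homophily in the multi-label setting is the average Jaccard overlap of label sets across edges; this is our illustrative formalization, not necessarily the paper's definition.

```python
def multilabel_homophily(edges, labels):
    """Mean Jaccard overlap of endpoint label sets over all edges."""
    sims = []
    for u, v in edges:
        a, b = set(labels[u]), set(labels[v])
        if a | b:
            sims.append(len(a & b) / len(a | b))
    return sum(sims) / len(sims) if sims else 0.0

labels = {0: [1, 2], 1: [2, 3], 2: [9], 3: [1, 2]}
edges = [(0, 1), (0, 3), (1, 2)]
print(round(multilabel_homophily(edges, labels), 3))   # 0.444
```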

Distributed Neural Representation for Reactive in situ Visualization

  • Authors: Qi Wu, Joseph A. Insley, Victor A. Mateevitsi, Silvio Rizzi, Michael E. Papka, Kwan-Liu Ma
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10516
  • Pdf link: https://arxiv.org/pdf/2304.10516
  • Abstract
    In situ visualization and steering of computational modeling can be effectively achieved using reactive programming, which leverages temporal abstraction and data caching mechanisms to create dynamic workflows. However, implementing a temporal cache for large-scale simulations can be challenging. Implicit neural networks have proven effective in compressing large volume data. However, their application to distributed data has yet to be fully explored. In this work, we develop an implicit neural representation for distributed volume data and incorporate it into the DIVA reactive programming system. This implementation enables us to build an in situ temporal caching system with a capacity 100 times larger than previously achieved. We integrate our implementation into the Ascent infrastructure and evaluate its performance using real-world simulations.

A class of mesh-free algorithms for some problems arising in finance and machine learning

  • Authors: Philippe G. LeFloch, Jean-Marc Mercier
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.10521
  • Pdf link: https://arxiv.org/pdf/2304.10521
  • Abstract
    We introduce a numerical methodology, referred to as the transport-based mesh-free method, which allows us to deal with continuous, discrete, or statistical models in the same unified framework, and leads us to a broad class of numerical algorithms recently implemented in a Python library (namely, CodPy). Specifically, we propose a mesh-free discretization technique based on the theory of reproducing kernels and the theory of transport mappings, in a way that is reminiscent of Lagrangian methods in computational fluid dynamics. We introduce kernel-based discretizations of a variety of differential and discrete operators (gradient, divergence, Laplacian, Leray projection, extrapolation, interpolation, polar factorization). The proposed algorithms are nonlinear in nature and enjoy quantitative error estimates based on the notion of discrepancy error, which allows one to evaluate the relevance and accuracy of, both, the given data and the numerical solutions. Our strategy is relevant when a large number of degrees of freedom are present as is the case in mathematical finance and machine learning. We consider the Fokker-Planck-Kolmogorov system (relevant for problems arising in finance and material dynamics) and a class of neural networks based on support vector machines.
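
The basic kernel-based interpolation step underlying such mesh-free methods fits in a few lines: solve for coefficients on scattered nodes with a Gaussian reproducing kernel, then evaluate anywhere. This illustrates the mechanism only and is not the CodPy library's API.

```python
import numpy as np

def gauss_kernel(X, Y, h=0.5):
    """Gaussian reproducing kernel matrix between point sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h ** 2))

def kernel_interpolate(X_nodes, f_nodes, X_eval, reg=1e-8):
    """Solve (K + reg I) a = f on the nodes, then evaluate K_eval @ a."""
    K = gauss_kernel(X_nodes, X_nodes)
    a = np.linalg.solve(K + reg * np.eye(len(X_nodes)), f_nodes)
    return gauss_kernel(X_eval, X_nodes) @ a

X = np.random.rand(200, 2)                    # scattered nodes (mesh-free)
f = np.sin(4 * X[:, 0]) * np.cos(4 * X[:, 1])
Xq = np.random.rand(5, 2)
print(kernel_interpolate(X, f, Xq))           # interpolated values at queries
```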

Collaborative Diffusion for Multi-Modal Face Generation and Editing

  • Authors: Ziqi Huang, Kelvin C.K. Chan, Yuming Jiang, Ziwei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10530
  • Pdf link: https://arxiv.org/pdf/2304.10530
  • Abstract
    Diffusion models have recently emerged as a powerful generative tool. Despite the great progress, existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is driven by only one modality of condition. To further unleash users' creativity, it is desirable for the model to be controllable by multiple modalities simultaneously, e.g., generating and editing faces by describing the age (text-driven) while drawing the face shape (mask-driven). In this work, we present Collaborative Diffusion, where pre-trained uni-modal diffusion models collaborate to achieve multi-modal face generation and editing without re-training. Our key insight is that diffusion models driven by different modalities are inherently complementary regarding the latent denoising steps, upon which bilateral connections can be established. Specifically, we propose the dynamic diffuser, a meta-network that adaptively hallucinates multi-modal denoising steps by predicting the spatial-temporal influence functions for each pre-trained uni-modal model. Collaborative Diffusion not only combines the generation capabilities of uni-modal diffusion models, but also integrates multiple uni-modal manipulations to perform multi-modal editing. Extensive qualitative and quantitative experiments demonstrate the superiority of our framework in both image quality and condition consistency.

New submissions for Thu, 20 Apr 23

Keyword: efficient

Memento: Facilitating Effortless, Efficient, and Reliable ML Experiments

  • Authors: Zac Pullar-Strecker, Xinglong Chang, Liam Brydon, Ioannis Ziogas, Katharina Dost, Jörg Wicker
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09175
  • Pdf link: https://arxiv.org/pdf/2304.09175
  • Abstract
    Running complex sets of machine learning experiments is challenging and time-consuming due to the lack of a unified framework. This leaves researchers forced to spend time implementing necessary features such as parallelization, caching, and checkpointing themselves instead of focusing on their project. To simplify the process, in this paper, we introduce Memento, a Python package that is designed to aid researchers and data scientists in the efficient management and execution of computationally intensive experiments. Memento has the capacity to streamline any experimental pipeline by providing a straightforward configuration matrix and the ability to concurrently run experiments across multiple threads. A demonstration of Memento is available at: https://wickerlab.org/publication/memento.
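
For flavor, a generic standard-library sketch of the configuration-matrix pattern Memento automates; this is explicitly not Memento's API (see the demonstration link above for that), and `run_experiment` is a placeholder.

```python
import itertools
from concurrent.futures import ProcessPoolExecutor

matrix = {
    "model": ["svm", "random_forest"],
    "dataset": ["iris", "wine"],
    "seed": [0, 1, 2],
}

def run_experiment(cfg):
    # Placeholder for training/evaluation; returns a metric per config.
    return {**cfg, "accuracy": hash(frozenset(cfg.items())) % 100 / 100}

# Expand the matrix into its 2 * 2 * 3 = 12 concrete configurations.
configs = [dict(zip(matrix, vals))
           for vals in itertools.product(*matrix.values())]

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:       # run the configs concurrently
        for result in pool.map(run_experiment, configs):
            print(result)
```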

Generative models improve fairness of medical classifiers under distribution shifts

  • Authors: Ira Ktena, Olivia Wiles, Isabela Albuquerque, Sylvestre-Alvise Rebuffi, Ryutaro Tanno, Abhijit Guha Roy, Shekoofeh Azizi, Danielle Belgrave, Pushmeet Kohli, Alan Karthikesalingam, Taylan Cemgil, Sven Gowal
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09218
  • Pdf link: https://arxiv.org/pdf/2304.09218
  • Abstract
    A ubiquitous challenge in machine learning is the problem of domain generalisation. This can exacerbate bias against groups or labels that are underrepresented in the datasets used for model development. Model bias can lead to unintended harms, especially in safety-critical applications like healthcare. Furthermore, the challenge is compounded by the difficulty of obtaining labelled data due to high cost or lack of readily available domain expertise. In our work, we show that learning realistic augmentations automatically from data is possible in a label-efficient manner using generative models. In particular, we leverage the higher abundance of unlabelled data to capture the underlying data distribution of different conditions and subgroups for an imaging modality. By conditioning generative models on appropriate labels, we can steer the distribution of synthetic examples according to specific requirements. We demonstrate that these learned augmentations can surpass heuristic ones by making models more robust and statistically fair in- and out-of-distribution. To evaluate the generality of our approach, we study 3 distinct medical imaging contexts of varying difficulty: (i) histopathology images from a publicly available generalisation benchmark, (ii) chest X-rays from publicly available clinical datasets, and (iii) dermatology images characterised by complex shifts and imaging conditions. Complementing real training samples with synthetic ones improves the robustness of models in all three medical tasks and increases fairness by improving the accuracy of diagnosis within underrepresented groups. This approach leads to stark out-of-distribution (OOD) improvements across modalities: 7.7% prediction accuracy improvement in histopathology, 5.2% in chest radiology with a 44.6% lower fairness gap, and a striking 63.5% improvement in high-risk sensitivity for dermatology with a 7.5x reduction in fairness gap.

A Data Driven Sequential Learning Framework to Accelerate and Optimize Multi-Objective Manufacturing Decisions

  • Authors: Hamed Khosravi, Taofeeq Olajire, Ahmed Shoyeb Raihan, Imtiaz Ahmed
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.09278
  • Pdf link: https://arxiv.org/pdf/2304.09278
  • Abstract
    Manufacturing advanced materials and products with a specific property or combination of properties is often warranted. To achieve that, it is crucial to find the optimum recipe or processing conditions that can generate the ideal combination of these properties. Most of the time, a sufficient number of experiments are needed to generate a Pareto front. However, manufacturing experiments are usually costly, and even conducting a single experiment can be a time-consuming process. It is therefore critical to determine the optimal location for data collection to gain the most comprehensive understanding of the process. Sequential learning is a promising approach to actively learn from ongoing experiments, iteratively update the underlying optimization routine, and adapt the data collection process on the go. This paper presents a novel data-driven Bayesian optimization framework that utilizes sequential learning to efficiently optimize complex systems with multiple conflicting objectives. Additionally, this paper proposes a novel metric for evaluating multi-objective data-driven optimization approaches. This metric considers both the quality of the Pareto front and the amount of data used to generate it. The proposed framework is particularly beneficial in practical applications where acquiring data can be expensive and resource-intensive. To demonstrate the effectiveness of the proposed algorithm and metric, the algorithm is evaluated on a manufacturing dataset. The results indicate that the proposed algorithm can achieve the actual Pareto front while processing significantly less data. This implies that the proposed data-driven framework can lead to similar manufacturing decisions with reduced costs and time.
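
A condensed sketch of the sequential learning loop such frameworks build on: fit a Gaussian process to the experiments so far, score candidates by expected improvement, and run the most promising recipe next. Single-objective for brevity; the paper's framework is multi-objective and uses its own metric, and the toy objective below is a stand-in for a real experiment.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(gp, X_cand, y_best):
    """EI acquisition: expected gain over the best observation so far."""
    mu, sd = gp.predict(X_cand, return_std=True)
    sd = np.maximum(sd, 1e-9)
    z = (mu - y_best) / sd
    return (mu - y_best) * norm.cdf(z) + sd * norm.pdf(z)

def run_process(x):                  # stand-in for a costly experiment
    return -np.sum((x - 0.3) ** 2) + 0.01 * np.random.randn()

rng = np.random.default_rng(0)
X = rng.random((5, 3))               # 5 initial recipes, 3 process parameters
y = np.array([run_process(x) for x in X])
for _ in range(20):                  # sequential learning loop
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    cand = rng.random((256, 3))
    x_next = cand[np.argmax(expected_improvement(gp, cand, y.max()))]
    X = np.vstack([X, x_next]); y = np.append(y, run_process(x_next))
print("best recipe:", X[np.argmax(y)], "value:", y.max())
```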

Leveraging Deep Learning Techniques on Collaborative Filtering Recommender Systems

  • Authors: Ali Fallahi RahmatAbadi, Javad Mohammadzadeh
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.09282
  • Pdf link: https://arxiv.org/pdf/2304.09282
  • Abstract
    With the exponentially increasing volume of online data, searching for and finding required information has become an extensive and time-consuming task. Recommender Systems, as a subclass of information retrieval and decision support systems, provide personalized suggestions that help users access what they need more efficiently. Among the different techniques for building a recommender system, Collaborative Filtering (CF) is the most popular and widespread approach. However, cold start and data sparsity are the fundamental challenges ahead of implementing an effective CF-based recommender. Recent successful developments in enhancing and implementing deep learning architectures have motivated many studies to propose deep learning-based solutions for solving the recommenders' weak points. In this research, unlike past similar works that covered the use of deep learning architectures in recommender systems only in general terms, we specifically provide a comprehensive review of deep learning-based collaborative filtering recommender systems. This in-depth filtering gives a clear overview of the level of popularity, gaps, and ignored areas in leveraging deep learning techniques to build CF-based systems as the most influential recommenders.
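
For context on what the surveyed deep models extend, here is a minimal matrix-factorization collaborative filter in NumPy: user and item embeddings trained by SGD on observed ratings. The tiny rating matrix and hyperparameters are illustrative assumptions, not from the survey.

```python
# Classic embedding-based CF: predict a rating as a user-item dot product.
import numpy as np

R = np.array([[5, 3, 0, 1],       # 0 marks an unobserved rating
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
n_users, n_items, k = R.shape[0], R.shape[1], 2
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, k))   # user embeddings
Q = rng.normal(scale=0.1, size=(n_items, k))   # item embeddings

lr, reg = 0.02, 0.02
for _ in range(2000):
    for u, i in zip(*np.nonzero(R)):           # observed entries only
        err = R[u, i] - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

print(np.round(P @ Q.T, 1))                    # dense score predictions
```

Deep CF models replace the dot product with learned non-linear interaction layers, which is the design space the review maps out.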

Integrity and Junkiness Failure Handling for Embedding-based Retrieval: A Case Study in Social Network Search

  • Authors: Wenping Wang, Yunxi Guo, Chiyao Shen, Shuai Ding, Guangdeng Liao, Hao Fu, Pramodh Karanth Prabhakar
  • Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09287
  • Pdf link: https://arxiv.org/pdf/2304.09287
  • Abstract
    Embedding-based retrieval has seen usage in a variety of search applications, such as e-commerce and social network search. While the approach has demonstrated its efficacy in tasks like semantic matching and contextual search, it is plagued by the problem of uncontrollable relevance. In this paper, we conduct an analysis of embedding-based retrieval launched in early 2021 on our social network search engine, and define two main categories of failures introduced by it: integrity and junkiness. The former refers to issues such as hate speech and offensive content that can severely harm user experience, while the latter includes irrelevant results like fuzzy text matching or language mismatches. Efficient inference-time methods are further proposed to resolve these issues, including indexing treatments and targeted user-cohort treatments. Though simple, these methods show good offline NDCG and online A/B test metric gains in practice. We analyze the reasons for the improvements, pointing out that our methods are only preliminary attempts at this important but challenging problem. We put forward potential future directions to explore.

From RSSE to BotSE: Potentials and Challenges Revisited after 15 Years

  • Authors: Walid Maalej
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.09308
  • Pdf link: https://arxiv.org/pdf/2304.09308
  • Abstract
    Both recommender systems and bots should proactively and smartly answer the questions of software developers or other project stakeholders to assist them in performing their tasks more efficiently. This paper reflects on the achievements of the more mature area of Recommendation Systems in Software Engineering (RSSE) as well as the rising area of Bots in Software Engineering (BotSE). We discuss the similarities and differences, briefly review the current state of the art, and highlight three particular areas in which the full potential is yet to be tapped: a more socio-technical context awareness, assisting knowledge sharing in addition to knowledge access, and covering repetitive or stimulative scenarios related to requirements and user-developer interaction.

Application of genetic algorithm to load balancing in networks with a homogeneous traffic flow

  • Authors: Marek Bolanowski (1), Alicja Gerka, Andrzej Paszkiewicz (1), Maria Ganzha (2), Marcin Paprzycki (2) ((1) Rzeszow University of Technology, (2) Systems Research Institute Polish Academy of Sciences)
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.09313
  • Pdf link: https://arxiv.org/pdf/2304.09313
  • Abstract
    The concept of the extended cloud requires efficient network infrastructure to support ecosystems reaching from the edge to the cloud(s). Standard approaches to network load balancing deliver static solutions that are insufficient for extended clouds, where network loads change often. To address this issue, a genetic-algorithm-based load optimizer is proposed and implemented. Next, its performance is experimentally evaluated, and it is shown that it outperforms other existing solutions.
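
A hedged sketch of the GA-based load-balancing idea: chromosomes assign traffic flows to links, and fitness penalizes the most loaded link. The flow sizes, link count, and GA hyperparameters below are illustrative assumptions; the paper's encoding and operators may differ.

```python
# Genetic algorithm assigning flows to links to minimize the max link load.
import random

random.seed(0)
FLOWS = [7, 3, 9, 4, 6, 2, 8, 5]   # homogeneous-ish traffic demands
LINKS = 3

def max_load(chrom):               # fitness: smaller is better
    load = [0] * LINKS
    for flow, link in zip(FLOWS, chrom):
        load[link] += flow
    return max(load)

pop = [[random.randrange(LINKS) for _ in FLOWS] for _ in range(30)]
for _ in range(100):
    pop.sort(key=max_load)
    survivors = pop[:10]                        # elitist selection
    children = []
    while len(children) < 20:
        a, b = random.sample(survivors, 2)
        cut = random.randrange(1, len(FLOWS))   # one-point crossover
        child = a[:cut] + b[cut:]
        if random.random() < 0.2:               # mutation
            child[random.randrange(len(FLOWS))] = random.randrange(LINKS)
        children.append(child)
    pop = survivors + children

best = min(pop, key=max_load)
print("best max link load:", max_load(best))
```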

Provably-Efficient and Internally-Deterministic Parallel Union-Find

  • Authors: Alexander Fedorov, Diba Hashemi, Giorgi Nadiradze, Dan Alistarh
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.09331
  • Pdf link: https://arxiv.org/pdf/2304.09331
  • Abstract
    Determining the degree of inherent parallelism in classical sequential algorithms and leveraging it for fast parallel execution is a key topic in parallel computing, and detailed analyses are known for a wide range of classical algorithms. In this paper, we perform the first such analysis for the fundamental Union-Find problem, in which we are given a graph as a sequence of edges, and must maintain its connectivity structure under edge additions. We prove that classic sequential algorithms for this problem are well-parallelizable under reasonable assumptions, addressing a conjecture by [Blelloch, 2017]. More precisely, we show via a new potential argument that, under uniform random edge ordering, parallel union-find operations are unlikely to interfere: $T$ concurrent threads processing the graph in parallel will encounter memory contention $O(T^2 \cdot \log |V| \cdot \log |E|)$ times in expectation, where $|E|$ and $|V|$ are the number of edges and nodes in the graph, respectively. We leverage this result to design a new parallel Union-Find algorithm that is internally deterministic, i.e., its results are guaranteed to match those of a sequential execution, while also being work-efficient and scalable, as long as the number of threads $T$ is $O(|E|^{\frac{1}{3} - \varepsilon})$, for an arbitrarily small constant $\varepsilon > 0$, which holds for most large real-world graphs. We present lower bounds which show that our analysis is close to optimal, and experimental results suggesting that the performance cost of internal determinism is limited.
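
For reference, this is the classic sequential union-find with path compression and union by rank, i.e., the baseline the paper parallelizes. The sketch is strictly sequential and does not reproduce the paper's concurrent algorithm.

```python
# Sequential union-find: maintain connectivity under edge additions.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False                  # already connected
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra              # attach shorter tree under taller
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True

uf = UnionFind(5)
for edge in [(0, 1), (1, 2), (3, 4)]:     # process the edge sequence
    uf.union(*edge)
print(uf.find(0) == uf.find(2), uf.find(0) == uf.find(3))  # True False
```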

BIM-GPT: a Prompt-Based Virtual Assistant Framework for BIM Information Retrieval

  • Authors: Junwen Zheng, Martin Fischer
  • Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.09333
  • Pdf link: https://arxiv.org/pdf/2304.09333
  • Abstract
    Efficient information retrieval (IR) from building information models (BIMs) poses significant challenges due to the necessity for deep BIM knowledge or extensive engineering efforts for automation. We introduce BIM-GPT, a prompt-based virtual assistant (VA) framework integrating BIM and generative pre-trained transformer (GPT) technologies to support NL-based IR. A prompt manager and dynamic template generate prompts for GPT models, enabling interpretation of NL queries, summarization of retrieved information, and answering BIM-related questions. In tests on a BIM IR dataset, our approach achieved 83.5% and 99.5% accuracy rates for classifying NL queries with no data and 2% data incorporated in prompts, respectively. Additionally, we validated the functionality of BIM-GPT through a VA prototype for a hospital building. This research contributes to the development of effective and versatile VAs for BIM IR in the construction industry, significantly enhancing BIM accessibility and reducing engineering efforts and training data requirements for processing NL queries.

Perception Imitation: Towards Synthesis-free Simulator for Autonomous Vehicles

  • Authors: Xiaoliang Ju, Yiyang Sun, Yiming Hao, Yikang Li, Yu Qiao, Hongsheng Li
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09365
  • Pdf link: https://arxiv.org/pdf/2304.09365
  • Abstract
    We propose a perception imitation method to simulate the results of a given perception model, and discuss a new heuristic route toward an autonomous driving simulator that requires no data synthesis. The motivation is that original sensor data is not always necessary for tasks such as planning and control once semantic perception results are available, so simulating perception directly is more economical and efficient. In this work, a series of evaluation methods, such as a matching metric and the performance of downstream tasks, are exploited to examine the simulation quality. Experiments show that our method effectively models the behavior of a learning-based perception model and can be applied smoothly in the proposed simulation route.

SP-BatikGAN: An Efficient Generative Adversarial Network for Symmetric Pattern Generation

  • Authors: Chrystian, Wahyono
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.09384
  • Pdf link: https://arxiv.org/pdf/2304.09384
  • Abstract
    Following the contention around AI art, our research focuses on bringing AI to all, particularly artists, to create AI art with limited data and settings. We are interested in geometrically symmetric pattern generation, which appears in many artworks such as Portuguese and Moroccan tiles, and Batik, a cultural heritage in Southeast Asia. Symmetric pattern generation is a complex problem, with prior research creating models that are too specific to certain patterns only. We publicly provide the first-ever dataset of 1,216 high-quality symmetric patterns taken straight from design files for this task. We then formulate a symmetric pattern enforcement (SPE) loss to leverage underlying symmetry-based structures that exist in current image distributions. Our SPE improves and accelerates training on any GAN configuration, and, with efficient attention, SP-BatikGAN, compared to FastGAN, the state-of-the-art GAN for limited settings, improves the FID score from 110.11 to 90.76, an 18% decrease, and the model diversity recall score from 0.047 to 0.204, a 334% increase.
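
One plausible form of a symmetry-enforcement penalty is to compare a generated pattern with its reflections. The paper defines its SPE loss over the image distribution during GAN training; this NumPy sketch only illustrates a symmetry residual on a single image and is an assumption, not the paper's exact loss.

```python
# Penalty that is zero for a perfectly mirror-symmetric image.
import numpy as np

def symmetry_penalty(img):
    """Mean absolute deviation of an HxW image from its mirror images."""
    h_flip = img[:, ::-1]
    v_flip = img[::-1, :]
    return 0.5 * (np.abs(img - h_flip).mean() + np.abs(img - v_flip).mean())

rng = np.random.default_rng(0)
noisy = rng.random((64, 64))
symmetric = (noisy + noisy[:, ::-1]) / 2          # horizontally symmetrized
print(symmetry_penalty(noisy) > symmetry_penalty(symmetric))  # True
```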

Information Geometrically Generalized Covariate Shift Adaptation

  • Authors: Masanari Kimura, Hideitsu Hino
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09387
  • Pdf link: https://arxiv.org/pdf/2304.09387
  • Abstract
    Many machine learning methods assume that the training and test data follow the same distribution. However, in the real world, this assumption is very often violated. In particular, the phenomenon in which the marginal distribution of the data changes is called covariate shift, one of the most important research topics in machine learning. We show that the well-known family of covariate shift adaptation methods is unified in the framework of information geometry. Furthermore, we show that the parameter search for the geometrically generalized covariate shift adaptation method can be performed efficiently. Numerical experiments show that our generalization can achieve better performance than the existing methods it encompasses.
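
A standard member of the covariate-shift-adaptation family the paper unifies is importance-weighted training, with the density ratio p_test(x)/p_train(x) estimated by a probabilistic classifier that separates training from test inputs. The synthetic data below is an illustrative assumption.

```python
# Classifier-based density-ratio estimation for importance weighting.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 1))     # training marginal
X_test = rng.normal(1.0, 1.0, size=(500, 1))      # shifted test marginal

# Label train=0, test=1, then read the ratio off the class probabilities.
domain = LogisticRegression().fit(
    np.vstack([X_train, X_test]),
    np.r_[np.zeros(500), np.ones(500)],
)
p_test = domain.predict_proba(X_train)[:, 1]
weights = p_test / (1.0 - p_test)                 # approx. p_test/p_train

# The weights then plug into any weighted learner:
y_train = (X_train.ravel() > 0.5).astype(int)
model = LogisticRegression().fit(X_train, y_train, sample_weight=weights)
print("mean importance weight:", weights.mean().round(2))
```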

Inferring High-level Geographical Concepts via Knowledge Graph and Multi-scale Data Integration: A Case Study of C-shaped Building Pattern Recognition

  • Authors: Zhiwei Wei, Yi Xiao, Wenjia Xu, Mi Shu, Lu Cheng, Yang Wang, Chunbo Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09391
  • Pdf link: https://arxiv.org/pdf/2304.09391
  • Abstract
    Effective building pattern recognition is critical for understanding urban form, automating map generalization, and visualizing 3D city models. Most existing studies use object-independent methods based on visual perception rules and proximity graph models to extract patterns. However, because human vision is a part-based system, pattern recognition may require decomposing shapes into parts or grouping them into clusters. Existing methods may not recognize all visually aware patterns, and the proximity graph model can be inefficient. To improve efficiency and effectiveness, we integrate multi-scale data using a knowledge graph, focusing on the recognition of C-shaped building patterns. First, we use a property graph to represent the relationships between buildings within and across the different scales involved in C-shaped building pattern recognition. Next, we store this knowledge graph in a graph database and convert the rules for C-shaped pattern recognition and enrichment into query conditions. Finally, we recognize and enrich C-shaped building patterns using rule-based reasoning in the built knowledge graph. We verify the effectiveness of our method using multi-scale data with three levels of detail (LODs) collected from the Gaode Map. Our results show that our method achieves recall-rate improvements of 26.4% for LOD1, 20.0% for LOD2, and 9.1% for LOD3 over existing approaches, along with recognition efficiency improvements of 0.91, 1.37, and 9.35 times, respectively.

On the Capacity Region of Reconfigurable Intelligent Surface Assisted Symbiotic Radios

  • Authors: Qianqian Zhang, Hu Zhou, Ying-Chang Liang, Sumei Sun, Wei Zhang, H. Vincent Poor
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2304.09400
  • Pdf link: https://arxiv.org/pdf/2304.09400
  • Abstract
    In this paper, we are interested in reconfigurable intelligent surface (RIS)-assisted symbiotic radio (SR) systems, where an RIS assists a primary transmission by passive beamforming and simultaneously acts as an information transmitter by periodically adjusting its reflecting coefficients. The above modulation scheme innately enables a new multiplicative multiple access channel (M-MAC), where the primary and secondary signals are superposed in a multiplicative and additive manner. To pursue the fundamental performance limits of the M-MAC, we focus on the characterization of the capacity region of such systems. Due to the passive nature of RISs, the transmitted signal of the RIS should satisfy the peak power constraint. Under this constraint at the RIS as well as the average power constraint at the primary transmitter (PTx), we analyze the capacity-achieving distributions of the transmitted signals and characterize the capacity region of the M-MAC. Then, theoretical analysis is performed to reveal insights into the RIS-assisted SR. It is observed that: 1) the capacity region of the M-MAC is strictly convex and larger than that of the conventional TDMA scheme; 2) the secondary transmission can achieve the maximum rate when the PTx transmits the constant envelope signals; 3) and the sum rate can achieve the maximum when the PTx transmits Gaussian signals and the RIS transmits the constant envelope signals. Finally, extensive numerical results are provided to evaluate the performance of the RIS-assisted SR and verify the accuracy of our theoretical analysis.

Torque-based Deep Reinforcement Learning for Task-and-Robot Agnostic Learning on Bipedal Robots Using Sim-to-Real Transfer

  • Authors: Donghyeon Kim, Glen Berseth, Mathew Schwartz, Jaeheung Park
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.09434
  • Pdf link: https://arxiv.org/pdf/2304.09434
  • Abstract
    In this paper, we review the question of which action space is best suited for controlling a real biped robot in combination with Sim2Real training. Position control has been popular as it has been shown to be more sample efficient and intuitive to combine with other planning algorithms. However, for position control gain tuning is required to achieve the best possible policy performance. We show that instead, using a torque-based action space enables task-and-robot agnostic learning with less parameter tuning and mitigates the sim-to-reality gap by taking advantage of torque control's inherent compliance. Also, we accelerate the torque-based-policy training process by pre-training the policy to remain upright by compensating for gravity. The paper showcases the first successful sim-to-real transfer of a torque-based deep reinforcement learning policy on a real human-sized biped robot. The video is available at https://youtu.be/CR6pTS39VRE.

Local object crop collision network for efficient simulation of non-convex objects in GPU-based simulators

  • Authors: Dongwon Son, Beomjoon Kim
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09439
  • Pdf link: https://arxiv.org/pdf/2304.09439
  • Abstract
    Our goal is to develop an efficient contact detection algorithm for large-scale GPU-based simulation of non-convex objects. Current GPU-based simulators such as IsaacGym and Brax must trade off speed with fidelity, generality, or both when simulating non-convex objects. Their main issue lies in contact detection (CD): existing CD algorithms, such as Gilbert-Johnson-Keerthi (GJK), must trade off their computational speed with accuracy, which becomes expensive as the number of collisions among non-convex objects increases. We propose a data-driven approach for CD whose accuracy depends only on the quality and quantity of the offline dataset rather than on online computation time. Unlike GJK, our method inherently has a uniform computational flow, which facilitates efficient GPU usage based on advanced compilers such as XLA (Accelerated Linear Algebra). Further, we offer a data-efficient solution by learning the patterns of colliding local crop object shapes, rather than global object shapes, which are harder to learn. We demonstrate that our approach improves the efficiency of existing CD methods by a factor of 5-10 for non-convex objects with comparable accuracy. Using previous work on contact resolution for a neural-network-based contact detector, we integrate our CD algorithm into the open-source GPU-based simulator Brax, and show that we can improve efficiency over IsaacGym and generality over standard Brax. We highly recommend the videos of our simulator included in the supplementary materials.

Enhancing Multi-Camera People Tracking with Anchor-Guided Clustering and Spatio-Temporal Consistency ID Re-Assignment

  • Authors: Hsiang-Wei Huang, Cheng-Yen Yang, Zhongyu Jiang, Pyong-Kun Kim, Kyoungoh Lee, Kwangju Kim, Samartha Ramkumar, Chaitanya Mullapudi, In-Su Jang, Chung-I Huang, Jenq-Neng Hwang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09471
  • Pdf link: https://arxiv.org/pdf/2304.09471
  • Abstract
    Multi-camera multiple people tracking has become an increasingly important area of research due to the growing demand for accurate and efficient indoor people tracking systems, particularly in settings such as retail, healthcare centers, and transit hubs. We propose a novel multi-camera multiple people tracking method that uses anchor-guided clustering for cross-camera re-identification and spatio-temporal consistency for geometry-based cross-camera ID reassignment. Our approach aims to improve tracking accuracy by identifying key features that are unique to every individual and utilizing the overlap of views between cameras to predict accurate trajectories without needing the actual camera parameters. The method has demonstrated robustness and effectiveness in handling both synthetic and real-world data. The proposed method is evaluated on the CVPR AI City Challenge 2023 dataset, achieving an IDF1 of 95.36% with the first-place ranking in the challenge. The code is available at: https://github.com/ipl-uw/AIC23_Track1_UWIPL_ETRI.

Learning Resource Scheduling with High Priority Users using Deep Deterministic Policy Gradients

  • Authors: Steffen Gracla, Edgar Beck, Carsten Bockelmann, Armin Dekorsy
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09488
  • Pdf link: https://arxiv.org/pdf/2304.09488
  • Abstract
    Advances in mobile communication capabilities open the door for closer integration of pre-hospital and in-hospital care processes. For example, medical specialists can be enabled to guide on-site paramedics and can, in turn, be supplied with live vitals or visuals. Consolidating such performance-critical applications with the highly complex workings of mobile communications requires solutions that are both reliable and efficient, yet easy to integrate with existing systems. This paper explores the application of Deep Deterministic Policy Gradient (DDPG) methods for learning a communications resource scheduling algorithm with special regard to priority users. Unlike the popular Deep Q-Network methods, DDPG is able to produce continuous-valued output. With light post-processing, the resulting scheduler is able to achieve high performance on a flexible sum-utility goal.
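
A hedged sketch of the "light post-processing" idea: a DDPG actor emits continuous per-user scores, which a softmax turns into resource-block shares, with priority users up-weighted. The actor output and priority weights below are illustrative stand-ins, not the paper's trained policy.

```python
# Map continuous actor output to a discrete resource-block split.
import numpy as np

def allocate(actor_scores, total_rbs):
    e = np.exp(actor_scores - actor_scores.max())     # stable softmax
    shares = e / e.sum()
    rbs = np.floor(shares * total_rbs).astype(int)
    rbs[np.argmax(shares)] += total_rbs - rbs.sum()   # hand out remainder
    return rbs

scores = np.array([0.3, 1.7, -0.4, 0.9])   # hypothetical actor output
priority = np.array([1.0, 3.0, 1.0, 1.0])  # user 1 is high priority
print(allocate(scores + np.log(priority), total_rbs=25))
```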

Neural Network Quantisation for Faster Homomorphic Encryption

  • Authors: Wouter Legiest, Jan-Pieter D'Anvers, Furkan Turan, Michiel Van Beirendonck, Ingrid Verbauwhede
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.09490
  • Pdf link: https://arxiv.org/pdf/2304.09490
  • Abstract
    Homomorphic encryption (HE) enables calculating on encrypted data, which makes it possible to perform privacy-preserving neural network inference. One disadvantage of this technique is that it is several orders of magnitude slower than calculation on unencrypted data. Neural networks are commonly trained using floating-point arithmetic, while most homomorphic encryption libraries calculate on integers, thus requiring a quantisation of the neural network. A straightforward approach would be to quantise to large integer sizes (e.g. 32 bit) to avoid large quantisation errors. In this work, we reduce the integer sizes of the networks, using quantisation-aware training, to allow more efficient computations. For the targeted MNIST architecture proposed by Badawi et al., we reduce the integer sizes by 33% without significant loss of accuracy, while for the CIFAR architecture, we can reduce the integer sizes by 43%. Implementing the resulting networks under the BFV homomorphic encryption scheme using SEAL, we could reduce the execution time of an MNIST neural network by 80% and that of a CIFAR neural network by 40%.
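
Below is a minimal uniform quantiser of the kind quantisation-aware training simulates: float weights are mapped to b-bit signed integers (the representation HE libraries compute on) plus a scale. The bit-widths and tensor are illustrative; the paper's training procedure and HE-specific details are not reproduced here.

```python
# Symmetric uniform quantisation to b-bit signed integers.
import numpy as np

def quantise(w, bits):
    qmax = 2 ** (bits - 1) - 1                 # symmetric signed range
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale                            # integers for HE, plus scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
for bits in (8, 6, 4):
    q, s = quantise(w, bits)
    err = np.abs(w - q * s).mean()
    print(f"{bits}-bit mean quantisation error: {err:.4f}")
```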

Sampling is Matter: Point-guided 3D Human Mesh Reconstruction

  • Authors: Jeonghwan Kim (1), Mi-Gyeong Gwon (1), Hyunwoo Park (1), Hyukmin Kwon (2), Gi-Mun Um (2), Wonjun Kim (1) ((1) Konkuk University, (2) Electronics and Telecommunications Research Institute)
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09502
  • Pdf link: https://arxiv.org/pdf/2304.09502
  • Abstract
    This paper presents a simple yet powerful method for 3D human mesh reconstruction from a single RGB image. Recently, non-local interactions among all mesh vertices have been effectively estimated with transformers, while the relationships between body parts have also begun to be handled via graph models. Even though these approaches have shown remarkable progress in 3D human mesh reconstruction, it is still difficult to directly infer the relationship between features, which are encoded from the 2D input image, and the 3D coordinates of each vertex. To resolve this problem, we propose a simple feature sampling scheme. The key idea is to sample features in the embedded space by following the guide of points, which are estimated as the projections of 3D mesh vertices (i.e., ground truth). This helps the model concentrate more on vertex-relevant features in the 2D space, thus leading to the reconstruction of a natural human pose. Furthermore, we apply progressive attention masking to precisely estimate local interactions between vertices even under severe occlusions. Experimental results on benchmark datasets show that the proposed method efficiently improves the performance of 3D human mesh reconstruction. The code and model are publicly available at: https://github.com/DCVL-3D/PointHMR_release.

Progressive Transfer Learning for Dexterous In-Hand Manipulation with Multi-Fingered Anthropomorphic Hand

  • Authors: Yongkang Luo, Wanyi Li, Peng Wang, Haonan Duan, Wei Wei, Jia Sun
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.09526
  • Pdf link: https://arxiv.org/pdf/2304.09526
  • Abstract
    Dexterous in-hand manipulation with a multi-fingered anthropomorphic hand is extremely difficult because of the high-dimensional state and action spaces and the rich contact patterns between the fingers and objects. Even though deep reinforcement learning has made moderate progress and demonstrated its strong potential for manipulation, it still faces certain challenges, such as large-scale data collection and high sample complexity. In particular, for scenes with even slight changes, it often needs to re-collect vast amounts of data and carry out numerous iterations of fine-tuning. Remarkably, humans can quickly transfer learned manipulation skills to different scenarios with little supervision. Inspired by this flexible transfer learning capability, we propose a novel progressive transfer learning framework (PTL) for dexterous in-hand manipulation, based on efficiently utilizing the collected trajectories and the source-trained dynamics model. This framework adopts progressive neural networks for dynamics model transfer learning on samples selected by a new sample selection method based on the dynamics properties, rewards, and scores of the trajectories. Experimental results on contact-rich anthropomorphic hand manipulation tasks show that our method can efficiently and effectively learn in-hand manipulation skills with a few online attempts and adjustment learning in a new scene. Compared to learning from scratch, our method can reduce training time costs by 95%.

SelfAct: Personalized Activity Recognition based on Self-Supervised and Active Learning

  • Authors: Luca Arrotta, Gabriele Civitarese, Samuele Valente, Claudio Bettini
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.09530
  • Pdf link: https://arxiv.org/pdf/2304.09530
  • Abstract
    Supervised Deep Learning (DL) models are currently the leading approach for sensor-based Human Activity Recognition (HAR) on wearable and mobile devices. However, training them requires large amounts of labeled data whose collection is often time-consuming, expensive, and error-prone. At the same time, due to the intra- and inter-variability of activity execution, activity models should be personalized for each user. In this work, we propose SelfAct: a novel framework for HAR combining self-supervised and active learning to mitigate these problems. SelfAct leverages a large pool of unlabeled data collected from many users to pre-train through self-supervision a DL model, with the goal of learning a meaningful and efficient latent representation of sensor data. The resulting pre-trained model can be locally used by new users, which will fine-tune it thanks to a novel unsupervised active learning strategy. Our experiments on two publicly available HAR datasets demonstrate that SelfAct achieves results that are close to or even better than the ones of fully supervised approaches with a small number of active learning queries.

Graph Exploration for Effective Multi-agent Q-Learning

  • Authors: Ainur Zhaikhan, Ali H. Sayed
  • Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2304.09547
  • Pdf link: https://arxiv.org/pdf/2304.09547
  • Abstract
    This paper proposes an exploration technique for multi-agent reinforcement learning (MARL) with graph-based communication among agents. We assume the individual rewards received by the agents are independent of the actions by the other agents, while their policies are coupled. In the proposed framework, neighbouring agents collaborate to estimate the uncertainty about the state-action space in order to execute more efficient explorative behaviour. Different from existing works, the proposed algorithm does not require counting mechanisms and can be applied to continuous-state environments without requiring complex conversion techniques. Moreover, the proposed scheme allows agents to communicate in a fully decentralized manner with minimal information exchange. And for continuous-state scenarios, each agent needs to exchange only a single parameter vector. The performance of the algorithm is verified with theoretical results for discrete-state scenarios and with experiments for continuous ones.

The State-of-the-Art in Air Pollution Monitoring and Forecasting Systems using IoT, Big Data, and Machine Learning

  • Authors: Amisha Gangwar, Sudhakar Singh, Richa Mishra, Shiv Prakash
  • Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.09574
  • Pdf link: https://arxiv.org/pdf/2304.09574
  • Abstract
    The quality of air is closely linked with the quality of life of humans, plants, and wildlife, and it needs to be monitored and preserved continuously. Transportation, industry, construction sites, generators, fireworks, and waste burning account for a major share of air-quality degradation, so these sources need to be used in a safe and controlled manner. Using traditional laboratory analysis or installing bulky and expensive monitoring equipment every few miles is no longer efficient; smart devices are needed for collecting and analyzing air data. The quality of air depends on various factors, including location, traffic, and time. Recent research uses machine learning algorithms, big data technologies, and the Internet of Things to propose stable and efficient models for this purpose. This review paper focuses on studying and compiling recent research in this field, with an emphasis on data sources and on monitoring and forecasting models. The main objective of this paper is to provide insight into the research being carried out to improve the various aspects of air-pollution models. Further, it casts light on open research issues and challenges.

DADFNet: Dual Attention and Dual Frequency-Guided Dehazing Network for Video-Empowered Intelligent Transportation

  • Authors: Yu Guo, Ryan Wen Liu, Jiangtian Nie, Lingjuan Lyu, Zehui Xiong, Jiawen Kang, Han Yu, Dusit Niyato
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09588
  • Pdf link: https://arxiv.org/pdf/2304.09588
  • Abstract
    Visual surveillance technology is an indispensable functional component of advanced traffic management systems. It has been applied to perform traffic supervision tasks, such as object detection, tracking and recognition. However, adverse weather conditions, e.g., fog, haze and mist, pose severe challenges for video-based transportation surveillance. To eliminate the influence of adverse weather conditions, we propose a dual attention and dual frequency-guided dehazing network (termed DADFNet) for real-time visibility enhancement. It consists of a dual attention module (DAM) and a high-low frequency-guided sub-net (HLFN) to jointly consider the attention and frequency mapping to guide haze-free scene reconstruction. Extensive experiments on both synthetic and real-world images demonstrate the superiority of DADFNet over state-of-the-art methods in terms of visibility enhancement and improvement in detection accuracy. Furthermore, DADFNet takes only $6.3$ ms to process a 1,920 × 1,080 image on a 2080 Ti GPU, making it highly efficient for deployment in intelligent transportation systems.

Efficient High-Order Space-Angle-Energy Polytopic Discontinuous Galerkin Finite Element Methods for Linear Boltzmann Transport

  • Authors: Paul Houston, Matthew E. Hubbard, Thomas J. Radley, Oliver J. Sutton, Richard S.J. Widdowson
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.09592
  • Pdf link: https://arxiv.org/pdf/2304.09592
  • Abstract
    We introduce an $hp$-version discontinuous Galerkin finite element method (DGFEM) for the linear Boltzmann transport problem. A key feature of this new method is that, while offering arbitrary order convergence rates, it may be implemented in an almost identical form to standard multigroup discrete ordinates methods, meaning that solutions can be computed efficiently with high accuracy and in parallel within existing software. This method provides a unified discretisation of the space, angle, and energy domains of the underlying integro-differential equation and naturally incorporates both local mesh and local polynomial degree variation within each of these computational domains. Moreover, general polytopic elements can be handled by the method, enabling efficient discretisations of problems posed on complicated spatial geometries. We study the stability and $hp$-version a priori error analysis of the proposed method, by deriving suitable $hp$-approximation estimates together with a novel inf-sup bound. Numerical experiments highlighting the performance of the method for both polyenergetic and monoenergetic problems are presented.

AdapterGNN: Efficient Delta Tuning Improves Generalization Ability in Graph Neural Networks

  • Authors: Shengrui Li, Xueting Han, Jing Bai
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09595
  • Pdf link: https://arxiv.org/pdf/2304.09595
  • Abstract
    Fine-tuning pre-trained models has recently yielded remarkable performance gains in graph neural networks (GNNs). In addition to pre-training techniques, inspired by the latest work in the natural language fields, more recent work has shifted towards applying effective fine-tuning approaches, such as parameter-efficient tuning (delta tuning). However, given the substantial differences between GNNs and transformer-based models, applying such approaches directly to GNNs has proved to be less effective. In this paper, we present a comprehensive comparison of delta tuning techniques for GNNs and propose a novel delta tuning method specifically designed for GNNs, called AdapterGNN. AdapterGNN preserves the knowledge of the large pre-trained model and leverages highly expressive adapters for GNNs, which can adapt to downstream tasks effectively with only a few parameters, while also improving the model's generalization ability on the downstream tasks. Extensive experiments show that AdapterGNN achieves higher evaluation performance (outperforming full fine-tuning by 1.4% and 5.5% in the chemistry and biology domains respectively, with only 5% of its parameters tuned) and lower generalization gaps compared to full fine-tuning. Moreover, we empirically show that a larger GNN model can have a worse generalization ability, which differs from the trend observed in large language models. We also provide a theoretical justification, based on generalization bounds, for how delta tuning can improve the generalization ability of GNNs.
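
A hedged sketch of the kind of bottleneck adapter delta tuning inserts into a frozen backbone: down-project, non-linearity, up-project, residual connection. AdapterGNN's exact placement inside GNN layers follows the paper; the dimensions here are illustrative assumptions.

```python
# Bottleneck adapter: only a few parameters train, the backbone stays frozen.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim, bottleneck):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)         # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))  # residual update

h = torch.randn(32, 128)                   # node embeddings from a frozen layer
adapter = Adapter(dim=128, bottleneck=8)   # only a tiny fraction of params train
print(adapter(h).shape)                    # torch.Size([32, 128])
```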

LEA: Beyond Evolutionary Algorithms via Learned Optimization Strategy

  • Authors: Kai Wu, Penghui Liu, Jing Liu
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09599
  • Pdf link: https://arxiv.org/pdf/2304.09599
  • Abstract
    Evolutionary algorithms (EAs) have emerged as a powerful framework for expensive black-box optimization. Obtaining better solutions with less computational cost is essential and challenging for black-box optimization. The most critical obstacle is figuring out how to effectively use the target task information to form an efficient optimization strategy. However, current methods are weak due to the poor representation of the optimization strategy and the inefficient interaction between the optimization strategy and the target task. To overcome the above limitations, we design a learned EA (LEA) to realize the move from hand-designed optimization strategies to learned optimization strategies, including not only hyperparameters but also update rules. Unlike traditional EAs, LEA has high adaptability to the target task and can obtain better solutions with less computational cost. LEA is also able to effectively utilize the low-fidelity information of the target task to form an efficient optimization strategy. The experimental results on one synthetic case, CEC 2013, and two real-world cases show the advantages of learned optimization strategies over human-designed baselines. In addition, LEA is friendly to the acceleration provided by Graphics Processing Units and runs 102 times faster than unaccelerated EA when evolving 32 populations, each containing 6400 individuals.

StyleDEM: a Versatile Model for Authoring Terrains

  • Authors: Simon Perche, Adrien Peytavie, Bedrich Benes, Eric Galin, Eric Guérin
  • Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09626
  • Pdf link: https://arxiv.org/pdf/2304.09626
  • Abstract
    Many terrain modelling methods have been proposed for the past decades, providing efficient and often interactive authoring tools. However, they generally do not include any notion of style, which is a critical aspect for designers in the entertainment industry. We introduce StyleDEM, a new generative adversarial network method for terrain synthesis and authoring, with a versatile toolbox of authoring methods with style. This method starts from an input sketch or an existing terrain. It outputs a terrain with features that can be authored using interactive brushes and enhanced with additional tools such as style manipulation or super-resolution. The strength of our approach resides in the versatility and interoperability of the toolbox.

Integrated Ray-Tracing and Coverage Planning Control using Reinforcement Learning

  • Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.09631
  • Pdf link: https://arxiv.org/pdf/2304.09631
  • Abstract
    In this work we propose a coverage planning control approach which allows a mobile agent, equipped with a controllable sensor (i.e., a camera) with limited sensing domain (i.e., finite sensing range and angle of view), to cover the surface area of an object of interest. The proposed approach integrates ray-tracing into the coverage planning process, thus allowing the agent to identify which parts of the scene are visible at any point in time. The problem of integrated ray-tracing and coverage planning control is first formulated as a constrained optimal control problem (OCP), which aims at determining the agent's optimal control inputs over a finite planning horizon, that minimize the coverage time. Efficiently solving the resulting OCP is however very challenging due to non-convex and non-linear visibility constraints. To overcome this limitation, the problem is converted into a Markov decision process (MDP) which is then solved using reinforcement learning. In particular, we show that a controller which follows an optimal control law can be learned using off-policy temporal-difference control (i.e., Q-learning). Extensive numerical experiments demonstrate the effectiveness of the proposed approach for various configurations of the agent and the object of interest.
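
The off-policy temporal-difference control rule (Q-learning) the paper applies after converting the coverage OCP into an MDP is the standard tabular update. The toy chain MDP below is an illustrative stand-in for the agent/object geometry, not the paper's environment.

```python
# Tabular Q-learning: bootstrap from the greedy next-state value.
import numpy as np

n_states, n_actions = 5, 2                  # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

for _ in range(2000):
    s = 0
    for _ in range(20):
        a = rng.integers(n_actions) if rng.random() < eps else Q[s].argmax()
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward at the goal
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))   # learned greedy policy (move right everywhere)
```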

Resource Allocation in the RIS Assisted SCMA Cellular Network Coexisting with D2D Communications

  • Authors: Yukai Liu, Wen Chen, Kunlun Wang
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.09646
  • Pdf link: https://arxiv.org/pdf/2304.09646
  • Abstract
    The cellular network coexisting with device-to-device (D2D) communications has been studied extensively. Reconfigurable intelligent surface (RIS) and non-orthogonal multiple access (NOMA) are promising technologies for the evolution of 5G, 6G and beyond. Besides, sparse code multiple access (SCMA) is considered suitable for next-generation wireless network in code-domain NOMA. In this paper, we consider the RIS-aided uplink SCMA cellular network simultaneously with D2D users. We formulate the optimization problem which aims to maximize the cellular sum-rate by jointly designing D2D users resource block (RB) association, the transmitted power for both cellular users and D2D users, and the phase shifts at the RIS. The power limitation and users communication requirements are considered. The problem is non-convex, and it is challenging to solve it directly. To handle this optimization problem, we propose an efficient iterative algorithm based on block coordinate descent (BCD) method. The original problem is decoupled into three subproblems to solve separately. Simulation results demonstrate that the proposed scheme can significantly improve the sum-rate performance over various schemes.

List Defective Colorings: Distributed Algorithms and Applications

  • Authors: Marc Fuchs, Fabian Kuhn
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.09666
  • Pdf link: https://arxiv.org/pdf/2304.09666
  • Abstract
    The distributed coloring problem is at the core of the area of distributed graph algorithms and it is a problem that has seen tremendous progress over the last few years. Much of the remarkable recent progress on deterministic distributed coloring algorithms is based on two main tools: a) defective colorings, in which every node of a given color can have a limited number of neighbors of the same color, and b) list coloring, a natural generalization of the standard coloring problem that naturally appears when colorings are computed in different stages and one has to extend a previously computed partial coloring to a full coloring. In this paper, we introduce \emph{list defective colorings}, which can be seen as a generalization of these two coloring variants. Essentially, in a list defective coloring instance, each node $v$ is given a list of colors $x_{v,1},\dots,x_{v,p}$ together with a list of defects $d_{v,1},\dots,d_{v,p}$ such that if $v$ is colored with color $x_{v, i}$, it is allowed to have at most $d_{v, i}$ neighbors with color $x_{v, i}$. We highlight the important role of list defective colorings by showing that faster list defective coloring algorithms would directly lead to faster deterministic $(\Delta+1)$-coloring algorithms in the LOCAL model. Further, we extend a recent distributed list coloring algorithm by Maus and Tonoyan [DISC '20]. Slightly simplified, we show that if for each node $v$ it holds that $\sum_{i=1}^p (d_{v,i}+1)^2 > \mathrm{deg}_G^2(v)\cdot \mathrm{polylog}\,\Delta$, then this list defective coloring instance can be solved in a communication-efficient way in only $O(\log\Delta)$ communication rounds. This leads to the first deterministic $(\Delta+1)$-coloring algorithm in the standard CONGEST model with a time complexity of $O(\sqrt{\Delta}\cdot \mathrm{polylog}\,\Delta+\log^* n)$, matching the best time complexity in the LOCAL model up to a $\mathrm{polylog}\,\Delta$ factor.

Operations for D-algebraic Functions

  • Authors: Bertrand Teguia Tabuguia
  • Subjects: Symbolic Computation (cs.SC)
  • Arxiv link: https://arxiv.org/abs/2304.09675
  • Pdf link: https://arxiv.org/pdf/2304.09675
  • Abstract
    A function is differentially algebraic (or simply D-algebraic) if there is a polynomial relationship between some of its derivatives and the indeterminate variable. Many functions in the sciences, such as Mathieu functions, the Weierstrass elliptic functions, and holonomic or D-finite functions, are D-algebraic. These functions form a field and are closed under composition, taking functional inverses, and derivation. We present implementations of each underlying operation. We also give a systematic way of computing an algebraic differential equation from a linear differential equation with D-finite function coefficients. Each command is a feature of our Maple package $NLDE$, available at https://mathrepo.mis.mpg.de/OperationsForDAlgebraicFunctions.

GeoGauss: Strongly Consistent and Light-Coordinated OLTP for Geo-Replicated SQL Database

  • Authors: Weixing Zhou, Qi Peng, Zijie Zhang, Yanfeng Zhang, Yang Ren, Sihao Li, Guo Fu, Yulong Cui, Qiang Li, Caiyi Wu, Shangjun Han, Shengyi Wang, Guoliang Li, Ge Yu
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.09692
  • Pdf link: https://arxiv.org/pdf/2304.09692
  • Abstract
    Multinational enterprises conduct global business that has a demand for geo-distributed transactional databases. Existing state-of-the-art databases adopt a sharded master-follower replication architecture. However, the single-master serving mode incurs massive cross-region writes from clients, and the sharded architecture requires multiple round-trip acknowledgments (e.g., 2PC) to ensure atomicity for cross-shard transactions. These limitations drive us to seek yet another design choice. In this paper, we propose a strongly consistent OLTP database GeoGauss with full replica multi-master architecture. To efficiently merge the updates from different master nodes, we propose a multi-master OCC that unifies data replication and concurrent transaction processing. By leveraging an epoch-based delta state merge rule and the optimistic asynchronous execution, GeoGauss ensures strong consistency with light-coordinated protocol and allows more concurrency with weak isolation, which are sufficient to meet our needs. Our geo-distributed experimental results show that GeoGauss achieves 7.06X higher throughput and 17.41X lower latency than the state-of-the-art geo-distributed database CockroachDB on the TPC-C benchmark.

Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans

  • Authors: Romain Loiseau, Elliot Vincent, Mathieu Aubry, Loic Landrieu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09704
  • Pdf link: https://arxiv.org/pdf/2304.09704
  • Abstract
    We propose an unsupervised method for parsing large 3D scans of real-world scenes into interpretable parts. Our goal is to provide a practical tool for analyzing 3D scenes with unique characteristics in the context of aerial surveying and mapping, without relying on application-specific user annotations. Our approach is based on a probabilistic reconstruction model that decomposes an input 3D point cloud into a small set of learned prototypical shapes. Our model provides an interpretable reconstruction of complex scenes and leads to relevant instance and semantic segmentations. To demonstrate the usefulness of our results, we introduce a novel dataset of seven diverse aerial LiDAR scans. We show that our method outperforms state-of-the-art unsupervised methods in terms of decomposition accuracy while remaining visually interpretable. Our method offers a significant advantage over existing approaches, as it does not require any manual annotations, making it a practical and efficient tool for 3D scene analysis. Our code and dataset are available at https://imagine.enpc.fr/~loiseaur/learnable-earth-parser

Grooming Connectivity Intents in IP-Optical Networks Using Directed Acyclic Graphs

  • Authors: Filippos Christou, Andreas Kirstädter
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.09711
  • Pdf link: https://arxiv.org/pdf/2304.09711
  • Abstract
    During the last few years, there have been concentrated efforts toward intent-driven networking. While relying upon Software-Defined Networking (SDN), Intent-Based Networking (IBN) pushes the frontiers of efficient networking by decoupling the intentions of a network operator (i.e., what is desired to be done) from the implementation (i.e., how is it achieved). The advantages of such a paradigm have long been argued and include, but are not limited to, the reduction of human errors, reduced expertise requirements among operator personnel, and faster business plan adaptation. In previous work, we have shown how incorporating IBN in multi-domain networks can have a significantly positive impact as it can enable decentralized operation, accountability, and confidentiality. The pillar of our previous contribution is the compilation of intents using system-generated intent trees. In this work, we extend the architecture to enable grooming among the user intents. Therefore, separate intents can now end up using the same network resources. While this makes the intent system reasonably more complex, it indisputably improves resource allocation. To represent the intent relationships of the newly enhanced architecture, we use Directed Acyclic Graphs (DAGs). Furthermore, we appropriately adapt an advanced established technique from the literature to solve the Routing, Modulation, and Spectrum Assignment (RMSA) problem for the intent compilation. We demonstrate a realistic scenario in which we evaluate our architecture and the intent compilation strategy. Our current approach successfully consolidates the advantages of having an intent-driven architecture and, at the same time, flexibly choosing among advanced resource allocation techniques.

A compact simple HWENO scheme with ADER time discretization for hyperbolic conservation laws I: structured meshes

  • Authors: Dongmi Luo, Shiyi Li, Jianxian Qiu, Jun Zhu, Yibing Chen
  • Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
  • Arxiv link: https://arxiv.org/abs/2304.09724
  • Pdf link: https://arxiv.org/pdf/2304.09724
  • Abstract
    In this paper, a compact and high-order ADER (Arbitrary high order using DERivatives) scheme using the simple HWENO method (ADER-SHWENO) is proposed for hyperbolic conservation laws. The newly-developed method employs the Lax-Wendroff procedure to convert time derivatives to spatial derivatives, which provides the time evolution of the variables at the cell interfaces. This information is required for the simple HWENO reconstructions, which take advantage of the simple WENO and the classic HWENO. Compared with the original Runge-Kutta HWENO method (RK-HWENO), the new method has two advantages. Firstly, the RK-HWENO method must solve additional equations for its reconstructions, which the new method avoids. Secondly, the SHWENO reconstruction is performed once with one stencil, unlike the classic HWENO methods, in which the function and its derivative values are reconstructed with two different stencils, respectively. Thus the new method is more efficient than the RK-HWENO method. Moreover, the new method is more compact than the existing ADER-WENO method. Besides, the new method makes the best use of the information in the ADER method. Thus, the time evolution of the cell averages of the derivatives is simpler than that developed in the work [Li et al., 447 (2021), 110661]. Numerical tests indicate that the new method can achieve high order for smooth solutions both in space and time and remains non-oscillatory at discontinuities.

A Multi-robot Coverage Path Planning Algorithm Based on Improved DARP Algorithm

  • Authors: Yufan Huang, Man Li, Tao Zhao
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.09741
  • Pdf link: https://arxiv.org/pdf/2304.09741
  • Abstract
    Research on multi-robot coverage path planning (CPP) has been attracting more and more attention. In order to achieve efficient coverage, this paper proposes an improved DARP coverage algorithm. The improved DARP algorithm, based on the A* algorithm, is used to assign tasks to robots; it is then combined with an STC algorithm based on the Up-First algorithm to achieve full coverage of the task area. Compared with the initial DARP algorithm, this algorithm achieves higher efficiency and a higher coverage rate.

Amplifying Sine Unit: An Oscillatory Activation Function for Deep Neural Networks to Recover Nonlinear Oscillations Efficiently

  • Authors: Jamshaid Ul Rahman, Faiza Makhdoom, Dianchen Lu
  • Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.09759
  • Pdf link: https://arxiv.org/pdf/2304.09759
  • Abstract
    Many industrial and real-life problems exhibit highly nonlinear periodic behaviors, and conventional methods may fall short of finding their analytical or closed-form solutions. Such problems demand cutting-edge computational tools with increased functionality and reduced cost. Recently, deep neural networks have gained massive research interest due to their ability to handle large data and their universality in learning complex functions. In this work, we put forward a methodology based on deep neural networks with a responsive layer structure to deal with nonlinear oscillations in microelectromechanical systems. We incorporated oscillatory and non-oscillatory activation functions such as the growing cosine unit (GCU), Sine, Mish, and Tanh in our designed network to comprehensively analyze their performance on highly nonlinear and vibrational problems. Integrating oscillatory activation functions into deep neural networks clearly outperforms the alternatives in predicting the periodic patterns of the underlying systems. To support oscillatory actuation for nonlinear systems, we propose a novel oscillatory activation function called the Amplifying Sine Unit (ASU), which is more efficient than GCU for complex vibratory systems such as microelectromechanical systems. Experimental results show that the designed network with our proposed activation function ASU is more reliable and robust in handling the challenges posed by nonlinearity and oscillations. To validate the proposed methodology, the outputs of our networks are compared with results from the Livermore Solver for Ordinary Differential Equations (LSODA). Further, graphical illustrations of the incurred errors are also presented in the work.
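
The GCU mentioned above is commonly defined as x·cos(x); the paper's proposed ASU is a different oscillatory unit whose exact form is given in the paper and is not reproduced here. The sketch below only shows how such an oscillatory activation plugs into a small dense network; the weights are random, untrained stand-ins.

```python
# An oscillatory activation (GCU) inside a one-hidden-layer network.
import numpy as np

def gcu(x):
    return x * np.cos(x)                     # growing cosine unit

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(1, 16))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1))

def forward(x):
    return gcu(x @ W1 + b1) @ W2             # one hidden oscillatory layer

x = np.linspace(-3, 3, 7).reshape(-1, 1)
print(forward(x).ravel().round(3))           # untrained network outputs
```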

Nearly Work-Efficient Parallel DFS in Undirected Graphs

  • Authors: Mohsen Ghaffari, Christoph Grunau, Jiahao Qu
  • Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.09774
  • Pdf link: https://arxiv.org/pdf/2304.09774
  • Abstract
    We present the first parallel depth-first search algorithm for undirected graphs that has near-linear work and sublinear depth. Concretely, in any $n$-node $m$-edge undirected graph, our algorithm computes a DFS in $\tilde{O}(\sqrt{n})$ depth and using $\tilde{O}(m+n)$ work. All prior work either required $\Omega(n)$ depth, and thus was essentially sequential, or needed high $poly(n)$ work and thus was far from being work-efficient.

Post-Training Quantization for Object Detection

  • Authors: Lin Niu, Jiawei Liu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09785
  • Pdf link: https://arxiv.org/pdf/2304.09785
  • Abstract
    Efficient inference for object detection networks is a major challenge on edge devices. Post-Training Quantization (PTQ), which transforms a full-precision model into a low bit-width one directly, is an effective and convenient approach to reduce model inference complexity. But it suffers a severe accuracy drop when applied to complex tasks such as object detection. PTQ optimizes the quantization parameters using different metrics to minimize the perturbation caused by quantization. The p-norm distance between feature maps before and after quantization, Lp, is widely used as the metric to evaluate this perturbation. Given the specific characteristics of object detection networks, we observe that the parameter p in the Lp metric significantly influences quantization performance, and we show that using a fixed hyper-parameter p does not achieve optimal quantization performance. To mitigate this problem, we propose a framework, DetPTQ, that assigns different p values for quantizing different layers using an Object Detection Output Loss (ODOL), which represents the task loss of object detection. DetPTQ employs the ODOL-based adaptive Lp metric to select the optimal quantization parameters. Experiments show that our DetPTQ outperforms state-of-the-art PTQ methods by a significant margin on both 2D and 3D object detectors. For example, we achieve 31.1/31.7 (quantization/full-precision) mAP on RetinaNet-ResNet18 with 4-bit weights and 4-bit activations.
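
The Lp perturbation metric that PTQ methods minimize is simply the p-norm distance between a layer's feature map before and after quantization. DetPTQ's contribution is choosing p per layer via its ODOL signal; the fixed-p evaluation below is the baseline behaviour it improves on, with a synthetic feature map standing in for a real detector layer.

```python
# Lp perturbation between full-precision and quantised feature maps.
import numpy as np

def lp_distance(fp_feat, q_feat, p):
    return (np.abs(fp_feat - q_feat) ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 16, 16))          # full-precision feature map
quant_feat = np.round(feat * 8) / 8          # crudely 'quantised' version
for p in (1, 2, 4):
    print(f"L{p} perturbation: {lp_distance(feat, quant_feat, p):.3f}")
```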

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

  • Authors: Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09787
  • Pdf link: https://arxiv.org/pdf/2304.09787
  • Abstract
    Automatically generating high-quality real world 3D scenes is of enormous interest for applications such as virtual reality and robotics simulation. Towards this goal, we introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments. We leverage Latent Diffusion Models that have been successfully utilized for efficient high-quality 2D content creation. We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene. To further compress this representation, we train a latent-autoencoder that maps the voxel grids to a set of latent representations. A hierarchical diffusion model is then fit to the latents to complete the scene generation pipeline. We achieve a substantial improvement over existing state-of-the-art scene generation models. Additionally, we show how NeuralField-LDM can be used for a variety of 3D content creation applications, including conditional scene generation, scene inpainting and scene style manipulation.

Progressive-Hint Prompting Improves Reasoning in Large Language Models

  • Authors: Chuanyang Zheng, Zhengying Liu, Enze Xie, Zhenguo Li, Yu Li
  • Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09797
  • Pdf link: https://arxiv.org/pdf/2304.09797
  • Abstract
    The performance of Large Language Models (LLMs) in reasoning tasks depends heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency being critical methods that enhance this ability. However, these methods do not fully exploit the answers generated by the LLM to guide subsequent responses. This paper proposes a new prompting method, named Progressive-Hint Prompting (PHP), that enables automatic multiple interactions between users and LLMs by using previously generated answers as hints to progressively guide toward the correct answers. PHP is orthogonal to CoT and self-consistency, making it easy to combine with state-of-the-art techniques to further improve performance. We conducted an extensive and comprehensive evaluation to demonstrate the effectiveness of the proposed method. Our experimental results on six benchmarks show that combining CoT and self-consistency with PHP significantly improves accuracy while remaining highly efficient. For instance, with text-davinci-003, we observed a 4.2% improvement on GSM8K with greedy decoding compared to Complex CoT, and a 46.17% reduction in sample paths with self-consistency. With GPT-4 and PHP, we achieve state-of-the-art performances on SVAMP (91.9%), GSM8K (95.5%) and AQuA (79.9%).
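
    A minimal sketch of the PHP interaction loop, assuming a hypothetical `ask_llm(prompt) -> str` helper (not an API from the paper) and assuming the loop stops once two consecutive answers agree.

```python
def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for an actual LLM API call."""
    raise NotImplementedError("plug in your LLM client here")

def progressive_hint_prompting(question: str, max_rounds: int = 8) -> str:
    """Feed previously generated answers back as hints until the answer
    stabilizes (assumed stopping rule)."""
    answer = ask_llm(question)  # base answer, e.g. obtained with CoT prompting
    hints = []
    for _ in range(max_rounds):
        hints.append(answer)
        hinted_prompt = f"{question}\n(Hint: the answer is near to {', '.join(hints)}.)"
        new_answer = ask_llm(hinted_prompt)
        if new_answer == answer:  # two consecutive answers agree -> stop
            return new_answer
        answer = new_answer
    return answer
```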

VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene

  • Authors: Shaoyu Chen, Yunchi Zhang, Bencheng Liao, Jiafeng Xie, Tianheng Cheng, Wei Sui, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09807
  • Pdf link: https://arxiv.org/pdf/2304.09807
  • Abstract
    High-definition (HD) maps serve as the essential infrastructure of autonomous driving. In this work, we build a systematic vectorized map annotation framework (termed VMA) for efficiently generating HD maps of large-scale driving scenes. We design a divide-and-conquer annotation scheme to solve the spatial extensibility problem of HD map generation, and abstract map elements with a variety of geometric patterns into a unified point-sequence representation, which can be extended to most map elements in the driving scene. VMA is highly efficient and extensible, requiring negligible human effort, and flexible in terms of spatial scale and element type. We quantitatively and qualitatively validate the annotation performance on real-world urban and highway scenes, as well as the NYC Planimetric Database. VMA significantly improves map generation efficiency while requiring little human effort: on average, it takes 160 minutes to annotate a scene spanning hundreds of meters and reduces human cost by 52.3%, showing great application value.

FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing

  • Authors: Kyle Stachowicz, Dhruv Shah, Arjun Bhorkar, Ilya Kostrikov, Sergey Levine
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09831
  • Pdf link: https://arxiv.org/pdf/2304.09831
  • Abstract
    We present a system that enables an autonomous small-scale RC car to drive aggressively from visual observations using reinforcement learning (RL). Our system, FastRLAP (faster lap), trains autonomously in the real world, without human interventions, and without requiring any simulation or expert demonstrations. Our system integrates a number of important components to make this possible: we initialize the representations for the RL policy and value function from a large prior dataset of other robots navigating in other environments (at low speed), which provides a navigation-relevant representation. From here, a sample-efficient online RL method uses a single low-speed user-provided demonstration to determine the desired driving course, extracts a set of navigational checkpoints, and autonomously practices driving through these checkpoints, resetting automatically on collision or failure. Perhaps surprisingly, we find that with appropriate initialization and choice of algorithm, our system can learn to drive over a variety of racing courses with less than 20 minutes of online training. The resulting policies exhibit emergent aggressive driving skills, such as timing braking and acceleration around turns and avoiding areas which impede the robot's motion, approaching the performance of a human driver using a similar first-person interface over the course of training.

Optimal Codes Detecting Deletions in Concatenated Binary Strings Applied to Trace Reconstruction

  • Authors: Serge Kas Hanna
  • Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM)
  • Arxiv link: https://arxiv.org/abs/2304.09839
  • Pdf link: https://arxiv.org/pdf/2304.09839
  • Abstract
    Consider two or more strings $\mathbf{x}^1,\mathbf{x}^2,\ldots,$ that are concatenated to form $\mathbf{x}=\langle \mathbf{x}^1,\mathbf{x}^2,\ldots \rangle$. Suppose that up to $\delta$ deletions occur in each of the concatenated strings. Since deletions alter the lengths of the strings, a fundamental question to ask is: how much redundancy do we need to introduce in $\mathbf{x}$ in order to recover the boundaries of $\mathbf{x}^1,\mathbf{x}^2,\ldots$? This boundary problem is equivalent to the problem of designing codes that can detect the exact number of deletions in each concatenated string. In this work, we answer the question above by first deriving converse results that give lower bounds on the redundancy of deletion-detecting codes. Then, we present a marker-based code construction whose redundancy is asymptotically optimal in $\delta$ among all families of deletion-detecting codes, and exactly optimal among all block-by-block decodable codes. To exemplify the usefulness of such deletion-detecting codes, we apply our code to trace reconstruction and design an efficient coded reconstruction scheme that requires a constant number of traces.

Transformer-Based Visual Segmentation: A Survey

  • Authors: Xiangtai Li, Henghui Ding, Wenwei Zhang, Haobo Yuan, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09854
  • Pdf link: https://arxiv.org/pdf/2304.09854
  • Abstract
    Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks. Specifically, vision transformers offer robust, unified, and even simpler solutions for various segmentation tasks. This survey provides a thorough overview of transformer-based visual segmentation, summarizing recent advancements. We first review the background, encompassing problem definitions, datasets, and prior convolutional methods. Next, we summarize a meta-architecture that unifies all recent transformer-based approaches. Based on this meta-architecture, we examine various method designs, including modifications to the meta-architecture and associated applications. We also present several closely related settings, including 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation. Additionally, we compile and re-evaluate the reviewed methods on several well-established datasets. Finally, we identify open challenges in this field and propose directions for future research. The project page can be found at https://github.com/lxtGH/Awesome-Segmenation-With-Transformer. We will also continually monitor developments in this rapidly evolving field.

Keyword: faster

LEA: Beyond Evolutionary Algorithms via Learned Optimization Strategy

  • Authors: Kai Wu, Penghui Liu, Jing Liu
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09599
  • Pdf link: https://arxiv.org/pdf/2304.09599
  • Abstract
    Evolutionary algorithms (EAs) have emerged as a powerful framework for expensive black-box optimization. Obtaining better solutions with less computational cost is essential and challenging for black-box optimization. The most critical obstacle is figuring out how to effectively use the target task information to form an efficient optimization strategy. However, current methods are weak due to the poor representation of the optimization strategy and the inefficient interaction between the optimization strategy and the target task. To overcome the above limitations, we design a learned EA (LEA) to realize the move from hand-designed optimization strategies to learned optimization strategies, including not only hyperparameters but also update rules. Unlike traditional EAs, LEA has high adaptability to the target task and can obtain better solutions with less computational cost. LEA is also able to effectively utilize the low-fidelity information of the target task to form an efficient optimization strategy. The experimental results on one synthetic case, CEC 2013, and two real-world cases show the advantages of learned optimization strategies over human-designed baselines. In addition, LEA is friendly to the acceleration provided by Graphics Processing Units and runs 102 times faster than the unaccelerated EA when evolving 32 populations, each containing 6400 individuals.

List Defective Colorings: Distributed Algorithms and Applications

  • Authors: Marc Fuchs, Fabian Kuhn
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.09666
  • Pdf link: https://arxiv.org/pdf/2304.09666
  • Abstract
    The distributed coloring problem is at the core of the area of distributed graph algorithms and it is a problem that has seen tremendous progress over the last few years. Much of the remarkable recent progress on deterministic distributed coloring algorithms is based on two main tools: a) defective colorings in which every node of a given color can have a limited number of neighbors of the same color and b) list coloring, a natural generalization of the standard coloring problem that naturally appears when colorings are computed in different stages and one has to extend a previously computed partial coloring to a full coloring. In this paper, we introduce \emph{list defective colorings}, which can be seen as a generalization of these two coloring variants. Essentially, in a list defective coloring instance, each node $v$ is given a list of colors $x_{v,1},\dots,x_{v,p}$ together with a list of defects $d_{v,1},\dots,d_{v,p}$ such that if $v$ is colored with color $x_{v, i}$, it is allowed to have at most $d_{v, i}$ neighbors with color $x_{v, i}$. We highlight the important role of list defective colorings by showing that faster list defective coloring algorithms would directly lead to faster deterministic $(\Delta+1)$-coloring algorithms in the LOCAL model. Further, we extend a recent distributed list coloring algorithm by Maus and Tonoyan [DISC '20]. Slightly simplified, we show that if for each node $v$ it holds that $\sum_{i=1}^p \big(d_{v,i}+1\big)^2 > \mathrm{deg}_G^2(v)\cdot \mathrm{polylog}\,\Delta$ then this list defective coloring instance can be solved in a communication-efficient way in only $O(\log\Delta)$ communication rounds. This leads to the first deterministic $(\Delta+1)$-coloring algorithm in the standard CONGEST model with a time complexity of $O(\sqrt{\Delta}\cdot \mathrm{polylog}\,\Delta+\log^* n)$, matching the best time complexity in the LOCAL model up to a $\mathrm{polylog}\,\Delta$ factor.
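
    A toy check of the (slightly simplified) solvability condition quoted above for a single node; the concrete polylog factor is not specified in the abstract, so the default below is an arbitrary stand-in.

```python
import math

def may_be_solvable(defects, degree, delta,
                    polylog=lambda d: math.log2(max(d, 2)) ** 3):
    """Check sum_i (d_{v,i} + 1)^2 > deg(v)^2 * polylog(Delta) for one node.
    `defects` is the list d_{v,1..p}; `polylog` is an arbitrary stand-in."""
    return sum((d + 1) ** 2 for d in defects) > degree ** 2 * polylog(delta)

# Example: a node of degree 4 in a graph with Delta = 32, given 8 colors
# each with allowed defect 20.
print(may_be_solvable([20] * 8, degree=4, delta=32))  # True for these values
```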

Grooming Connectivity Intents in IP-Optical Networks Using Directed Acyclic Graphs

  • Authors: Filippos Christou, Andreas Kirstädter
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.09711
  • Pdf link: https://arxiv.org/pdf/2304.09711
  • Abstract
    During the last few years, there have been concentrated efforts toward intent-driven networking. While relying upon Software-Defined Networking (SDN), Intent-Based Networking (IBN) pushes the frontiers of efficient networking by decoupling the intentions of a network operator (i.e., what is desired to be done) from the implementation (i.e., how is it achieved). The advantages of such a paradigm have long been argued and include, but are not limited to, the reduction of human errors, reduced expertise requirements among operator personnel, and faster business plan adaptation. In previous work, we have shown how incorporating IBN in multi-domain networks can have a significantly positive impact as it can enable decentralized operation, accountability, and confidentiality. The pillar of our previous contribution is the compilation of intents using system-generated intent trees. In this work, we extend the architecture to enable grooming among the user intents. Therefore, separate intents can now end up using the same network resources. While this makes the intent system reasonably more complex, it indisputably improves resource allocation. To represent the intent relationships of the newly enhanced architecture, we use Directed Acyclic Graphs (DAGs). Furthermore, we appropriately adapt an advanced established technique from the literature to solve the Routing, Modulation, and Spectrum Assignment (RMSA) problem for the intent compilation. We demonstrate a realistic scenario in which we evaluate our architecture and the intent compilation strategy. Our current approach successfully consolidates the advantages of having an intent-driven architecture and, at the same time, flexibly choosing among advanced resource allocation techniques.

Comma Selection Outperforms Plus Selection on OneMax with Randomly Planted Optima

  • Authors: Joost Jorritsma, Johannes Lengler, Dirk Sudholt
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.09712
  • Pdf link: https://arxiv.org/pdf/2304.09712
  • Abstract
    It is an ongoing debate whether and how comma selection in evolutionary algorithms helps to escape local optima. We propose a new benchmark function to investigate the benefits of comma selection: OneMax with randomly planted local optima, generated by frozen noise. We show that comma selection (the $(1,\lambda)$ EA) is faster than plus selection (the $(1+\lambda)$ EA) on this benchmark, in a fixed-target scenario, and for offspring population sizes $\lambda$ for which both algorithms behave differently. For certain parameters, the $(1,\lambda)$ EA finds the target in $\Theta(n \ln n)$ evaluations, with high probability (w.h.p.), while the $(1+\lambda)$ EA w.h.p. requires almost $\Theta((n\ln n)^2)$ evaluations. We further show that the advantage of comma selection is not arbitrarily large: w.h.p. comma selection outperforms plus selection at most by a factor of $O(n \ln n)$ for most reasonable parameter choices. We develop novel methods for analysing frozen noise and give powerful and general fixed-target results with tail bounds that are of independent interest.
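
    A minimal sketch contrasting the two selection schemes on plain OneMax (the randomly planted local optima of the paper's benchmark are not reproduced here).

```python
import random

def onemax(x):
    return sum(x)

def evolve(n=100, lam=8, comma=True, max_evals=1_000_000):
    """(1,lambda) EA vs (1+lambda) EA on OneMax with standard bit mutation
    at rate 1/n; returns the number of evaluations used."""
    parent = [random.randint(0, 1) for _ in range(n)]
    evals = 0
    while onemax(parent) < n and evals < max_evals:
        offspring = [[b ^ (random.random() < 1.0 / n) for b in parent]
                     for _ in range(lam)]
        evals += lam
        best = max(offspring, key=onemax)
        # comma: parent is always replaced by the best offspring;
        # plus: parent survives unless an offspring is at least as good.
        if comma or onemax(best) >= onemax(parent):
            parent = best
    return evals

print("comma:", evolve(comma=True), "plus:", evolve(comma=False))
```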

FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing

  • Authors: Kyle Stachowicz, Dhruv Shah, Arjun Bhorkar, Ilya Kostrikov, Sergey Levine
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09831
  • Pdf link: https://arxiv.org/pdf/2304.09831
  • Abstract
    We present a system that enables an autonomous small-scale RC car to drive aggressively from visual observations using reinforcement learning (RL). Our system, FastRLAP (faster lap), trains autonomously in the real world, without human interventions, and without requiring any simulation or expert demonstrations. Our system integrates a number of important components to make this possible: we initialize the representations for the RL policy and value function from a large prior dataset of other robots navigating in other environments (at low speed), which provides a navigation-relevant representation. From here, a sample-efficient online RL method uses a single low-speed user-provided demonstration to determine the desired driving course, extracts a set of navigational checkpoints, and autonomously practices driving through these checkpoints, resetting automatically on collision or failure. Perhaps surprisingly, we find that with appropriate initialization and choice of algorithm, our system can learn to drive over a variety of racing courses with less than 20 minutes of online training. The resulting policies exhibit emergent aggressive driving skills, such as timing braking and acceleration around turns and avoiding areas which impede the robot's motion, approaching the performance of a human driver using a similar first-person interface over the course of training.

LipsFormer: Introducing Lipschitz Continuity to Vision Transformers

  • Authors: Xianbiao Qi, Jianan Wang, Yihao Chen, Yukai Shi, Lei Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09856
  • Pdf link: https://arxiv.org/pdf/2304.09856
  • Abstract
    We present a Lipschitz continuous Transformer, called LipsFormer, to pursue training stability both theoretically and empirically for Transformer-based models. In contrast to previous practical tricks that address training instability by learning rate warmup, layer normalization, attention formulation, and weight initialization, we show that Lipschitz continuity is a more essential property to ensure training stability. In LipsFormer, we replace unstable Transformer component modules with Lipschitz continuous counterparts: CenterNorm instead of LayerNorm, spectral initialization instead of Xavier initialization, scaled cosine similarity attention instead of dot-product attention, and weighted residual shortcut. We prove that these introduced modules are Lipschitz continuous and derive an upper bound on the Lipschitz constant of LipsFormer. Our experiments show that LipsFormer allows stable training of deep Transformer architectures without the need of careful learning rate tuning such as warmup, yielding a faster convergence and better generalization. As a result, on the ImageNet 1K dataset, LipsFormer-Swin-Tiny based on Swin Transformer training for 300 epochs can obtain 82.7% without any learning rate warmup. Moreover, LipsFormer-CSwin-Tiny, based on CSwin, training for 300 epochs achieves a top-1 accuracy of 83.5% with 4.7G FLOPs and 24M parameters. The code will be released at \url{https://github.com/IDEA-Research/LipsFormer}.
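
    Two of the replaced components can be sketched as follows; these are assumed forms based on the abstract, not the paper's exact parameterization (e.g., the scale tau is fixed here rather than learnable).

```python
import torch
import torch.nn.functional as F

def scaled_cosine_attention(q, k, v, tau=10.0):
    """Scaled cosine-similarity attention: logits are cosine similarities of
    L2-normalized queries/keys times a scale tau, so their magnitude stays
    bounded regardless of feature norms."""
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    attn = torch.softmax(tau * q @ k.transpose(-2, -1), dim=-1)
    return attn @ v

class CenterNorm(torch.nn.Module):
    """Assumed form of CenterNorm: mean-centering with an affine transform
    but no division by the standard deviation (the division is what breaks
    the Lipschitz property of LayerNorm)."""
    def __init__(self, dim):
        super().__init__()
        self.gamma = torch.nn.Parameter(torch.ones(dim))
        self.beta = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.gamma * (x - x.mean(dim=-1, keepdim=True)) + self.beta
```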

Keyword: mobile

Heterogeneous Integration of In-Memory Analog Computing Architectures with Tensor Processing Units

  • Authors: Mohammed E. Elbtity, Brendan Reidy, Md Hasibul Amin, Ramtin Zand
  • Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09258
  • Pdf link: https://arxiv.org/pdf/2304.09258
  • Abstract
    Tensor processing units (TPUs), specialized hardware accelerators for machine learning tasks, have shown significant performance improvements when executing convolutional layers in convolutional neural networks (CNNs). However, they struggle to maintain the same efficiency in fully connected (FC) layers, leading to suboptimal hardware utilization. In-memory analog computing (IMAC) architectures, on the other hand, have demonstrated notable speedup in executing FC layers. This paper introduces a novel, heterogeneous, mixed-signal, and mixed-precision architecture that integrates an IMAC unit with an edge TPU to enhance mobile CNN performance. To leverage the strengths of TPUs for convolutional layers and IMAC circuits for dense layers, we propose a unified learning algorithm that incorporates mixed-precision training techniques to mitigate potential accuracy drops when deploying models on the TPU-IMAC architecture. The simulations demonstrate that the TPU-IMAC configuration achieves up to $2.59\times$ performance improvements and $88\%$ memory reductions compared to conventional TPU architectures for various CNN models while maintaining comparable accuracy. The TPU-IMAC architecture shows potential for various applications where energy efficiency and high performance are essential, such as edge computing and real-time processing in mobile devices. The unified training algorithm and the integration of IMAC and TPU architectures contribute to the potential impact of this research on the broader machine learning landscape.

Secure Mobile Payment Architecture Enabling Multi-factor Authentication

  • Authors: Hosam Alamleh, Ali Abdullah S. AlQahtani, Baker Al Smadi
  • Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.09468
  • Pdf link: https://arxiv.org/pdf/2304.09468
  • Abstract
    The rise of smartphones has led to a significant increase in the usage of mobile payments. Mobile payments allow individuals to access financial resources and make transactions through their mobile devices while on the go. However, the current mobile payment systems were designed to align with traditional payment structures, which limits the full potential of smartphones, including their security features. This has become a major concern in the rapidly growing mobile payment market. To address these security concerns, in this paper we propose a new mobile payment architecture. This architecture leverages the advanced capabilities of modern smartphones to verify various aspects of a payment, such as funds, biometrics, location, and others. The proposed system aims to guarantee the legitimacy of transactions and protect against identity theft by verifying multiple elements of a payment. The security of mobile payment systems is crucial, given the rapid growth of the market. Evaluating mobile payment systems based on their authentication, encryption, and fraud detection capabilities is of utmost importance. The proposed architecture provides a secure mobile payment solution that enhances the overall payment experience by taking advantage of the advanced capabilities of modern smartphones. This will not only improve the security of mobile payments but also offer a more user-friendly payment experience for consumers.

Learning Resource Scheduling with High Priority Users using Deep Deterministic Policy Gradients

  • Authors: Steffen Gracla, Edgar Beck, Carsten Bockelmann, Armin Dekorsy
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09488
  • Pdf link: https://arxiv.org/pdf/2304.09488
  • Abstract
    Advances in mobile communication capabilities open the door for closer integration of pre-hospital and in-hospital care processes. For example, medical specialists can be enabled to guide on-site paramedics and can, in turn, be supplied with live vitals or visuals. Consolidating such performance-critical applications with the highly complex workings of mobile communications requires solutions both reliable and efficient, yet easy to integrate with existing systems. This paper explores the application of Deep Deterministic Policy Gradient (DDPG) methods for learning a communications resource scheduling algorithm with special regard to priority users. Unlike the popular Deep-Q-Network methods, DDPG is able to produce continuous-valued output. With light post-processing, the resulting scheduler is able to achieve high performance on a flexible sum-utility goal.
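
    A minimal sketch of a DDPG-style actor for this setting, with the continuous outputs post-processed into resource shares; the softmax post-processing and the `priority_bias` input are assumptions for illustration, not the paper's specification.

```python
import torch

class SchedulingActor(torch.nn.Module):
    """DDPG-style actor: continuous raw outputs per user are mapped to
    resource-block shares by a light post-processing step (softmax here)."""
    def __init__(self, state_dim, n_users, hidden=128):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(state_dim, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, n_users),
        )

    def forward(self, state, priority_bias=None):
        raw = self.net(state)              # continuous-valued action
        if priority_bias is not None:
            raw = raw + priority_bias      # nudge allocation toward priority users
        return torch.softmax(raw, dim=-1)  # per-user shares summing to one

actor = SchedulingActor(state_dim=32, n_users=8)
shares = actor(torch.randn(1, 32), priority_bias=torch.tensor([2.0] + [0.0] * 7))
```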

SelfAct: Personalized Activity Recognition based on Self-Supervised and Active Learning

  • Authors: Luca Arrotta, Gabriele Civitarese, Samuele Valente, Claudio Bettini
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.09530
  • Pdf link: https://arxiv.org/pdf/2304.09530
  • Abstract
    Supervised Deep Learning (DL) models are currently the leading approach for sensor-based Human Activity Recognition (HAR) on wearable and mobile devices. However, training them requires large amounts of labeled data whose collection is often time-consuming, expensive, and error-prone. At the same time, due to the intra- and inter-variability of activity execution, activity models should be personalized for each user. In this work, we propose SelfAct: a novel framework for HAR combining self-supervised and active learning to mitigate these problems. SelfAct leverages a large pool of unlabeled data collected from many users to pre-train through self-supervision a DL model, with the goal of learning a meaningful and efficient latent representation of sensor data. The resulting pre-trained model can be locally used by new users, which will fine-tune it thanks to a novel unsupervised active learning strategy. Our experiments on two publicly available HAR datasets demonstrate that SelfAct achieves results that are close to or even better than the ones of fully supervised approaches with a small number of active learning queries.

DynamicRead: Exploring Robust Gaze Interaction Methods for Reading on Handheld Mobile Devices under Dynamic Conditions

  • Authors: Yaxiong Lei, Yuheng Wang, Tyler Caslin, Alexander Wisowaty, Xu Zhu, Mohamed Khamis, Juan Ye
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.09584
  • Pdf link: https://arxiv.org/pdf/2304.09584
  • Abstract
    Enabling gaze interaction in real-time on handheld mobile devices has attracted significant attention in recent years. An increasing number of research projects have focused on sophisticated appearance-based deep learning models to enhance the precision of gaze estimation on smartphones. This inspires important research questions, including how gaze can be used in a real-time application, and what types of gaze interaction methods are preferable under dynamic conditions in terms of both user acceptance and delivering reliable performance. To address these questions, we design four types of gaze scrolling techniques: three explicit techniques based on Gaze Gesture, Dwell time, and Pursuit, and one implicit technique based on reading speed to support touch-free page scrolling in a reading application. We conduct a 20-participant user study under both sitting and walking settings. Our results reveal that the Gaze Gesture and Dwell time-based interfaces are more robust while walking, and that Gaze Gesture achieves consistently good usability scores without causing a high cognitive workload.

Integrated Ray-Tracing and Coverage Planning Control using Reinforcement Learning

  • Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.09631
  • Pdf link: https://arxiv.org/pdf/2304.09631
  • Abstract
    In this work we propose a coverage planning control approach which allows a mobile agent, equipped with a controllable sensor (i.e., a camera) with limited sensing domain (i.e., finite sensing range and angle of view), to cover the surface area of an object of interest. The proposed approach integrates ray-tracing into the coverage planning process, thus allowing the agent to identify which parts of the scene are visible at any point in time. The problem of integrated ray-tracing and coverage planning control is first formulated as a constrained optimal control problem (OCP), which aims at determining the agent's optimal control inputs over a finite planning horizon, that minimize the coverage time. Efficiently solving the resulting OCP is however very challenging due to non-convex and non-linear visibility constraints. To overcome this limitation, the problem is converted into a Markov decision process (MDP) which is then solved using reinforcement learning. In particular, we show that a controller which follows an optimal control law can be learned using off-policy temporal-difference control (i.e., Q-learning). Extensive numerical experiments demonstrate the effectiveness of the proposed approach for various configurations of the agent and the object of interest.
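
    A minimal sketch of off-policy temporal-difference control (Q-learning) on the resulting MDP, assuming a hypothetical Gym-style `env` with `reset()`, `actions(s)`, and `step(a)`; the paper's state/action encoding is not reproduced.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.
    States must be hashable for use as dictionary keys."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            acts = env.actions(s)
            # epsilon-greedy action selection
            a = (random.choice(acts) if random.random() < eps
                 else max(acts, key=lambda a_: Q[(s, a_)]))
            s2, r, done = env.step(a)
            # off-policy TD target uses the greedy value of the next state
            td_target = r if done else r + gamma * max(Q[(s2, a2)] for a2 in env.actions(s2))
            Q[(s, a)] += alpha * (td_target - Q[(s, a)])
            s = s2
    return Q
```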

Keyword: pruning

Network Pruning Spaces

  • Authors: Xuanyu He, Yu-I Yang, Ran Song, Jiachen Pu, Conggang Hu, Feijun Jiang, Wei Zhang, Huanghao Ding
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09453
  • Pdf link: https://arxiv.org/pdf/2304.09453
  • Abstract
    Network pruning techniques, including weight pruning and filter pruning, reveal that most state-of-the-art neural networks can be accelerated without a significant performance drop. This work focuses on filter pruning, which enables accelerated inference with any off-the-shelf deep learning library and hardware. We propose the concept of \emph{network pruning spaces} that parametrize populations of subnetwork architectures. Based on this concept, we explore the structural aspects of subnetworks that result in minimal loss of accuracy in different pruning regimes and arrive at a series of observations by comparing subnetwork distributions. We conjecture through empirical studies that there exists an optimal FLOPs-to-parameter-bucket ratio related to the design of the original network in a pruning regime. Statistically, the structure of a winning subnetwork guarantees an approximately optimal ratio in this regime. Building upon these conjectures, we further refine the initial pruning space to reduce the cost of searching for a good subnetwork architecture. Our experimental results on ImageNet show that the subnetwork we found is superior to those from the state-of-the-art pruning methods under comparable FLOPs.

Biologically inspired structure learning with reverse knowledge distillation for spiking neural networks

  • Authors: Qi Xu, Yaxin Li, Xuanye Fang, Jiangrong Shen, Jian K. Liu, Huajin Tang, Gang Pan
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09500
  • Pdf link: https://arxiv.org/pdf/2304.09500
  • Abstract
    Spiking neural networks (SNNs) have superb characteristics in sensory information recognition tasks due to their biological plausibility. However, the performance of some current spiking-based models is limited by their structure: either fully connected or overly deep structures bring too much redundancy. This redundancy from both connections and neurons is one of the key factors hindering the practical application of SNNs. Although some pruning methods have been proposed to tackle this problem, they normally ignore the fact that the neural topology in the human brain can be adjusted dynamically. Inspired by this, this paper proposes an evolutionary-based structure construction method for building more reasonable SNNs. By integrating knowledge distillation and connection pruning, the synaptic connections in SNNs can be optimized dynamically to reach an optimal state. As a result, the structure of SNNs can not only absorb knowledge from the teacher model but also search for a deep but sparse network topology. Experimental results on CIFAR100 and DVS-Gesture show that the proposed structure learning method achieves good performance while reducing connection redundancy. The proposed method explores a novel dynamic way of learning structure from scratch in SNNs, which could build a bridge to close the gap between deep learning and bio-inspired neural dynamics.

Single-View View Synthesis with Self-Rectified Pseudo-Stereo

  • Authors: Zhou Yang, Wu Hanjie, Liu Wenxi, Xiong Zheng, Qin Jing, He Shengfeng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09527
  • Pdf link: https://arxiv.org/pdf/2304.09527
  • Abstract
    Synthesizing novel views from a single view image is a highly ill-posed problem. We discover an effective solution to reduce the learning ambiguity by expanding the single-view view synthesis problem to a multi-view setting. Specifically, we leverage the reliable and explicit stereo prior to generate a pseudo-stereo viewpoint, which serves as an auxiliary input to construct the 3D space. In this way, the challenging novel view synthesis process is decoupled into two simpler problems of stereo synthesis and 3D reconstruction. In order to synthesize a structurally correct and detail-preserved stereo image, we propose a self-rectified stereo synthesis to amend erroneous regions in an identify-rectify manner. Hard-to-train and incorrect warping samples are first discovered by two strategies, 1) pruning the network to reveal low-confident predictions; and 2) bidirectionally matching between stereo images to allow the discovery of improper mapping. These regions are then inpainted to form the final pseudo-stereo. With the aid of this extra input, a preferable 3D reconstruction can be easily obtained, and our method can work with arbitrary 3D representations. Extensive experiments show that our method outperforms state-of-the-art single-view view synthesis methods and stereo synthesis methods.

Keyword: voxel

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

  • Authors: Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09787
  • Pdf link: https://arxiv.org/pdf/2304.09787
  • Abstract
    Automatically generating high-quality real world 3D scenes is of enormous interest for applications such as virtual reality and robotics simulation. Towards this goal, we introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments. We leverage Latent Diffusion Models that have been successfully utilized for efficient high-quality 2D content creation. We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene. To further compress this representation, we train a latent-autoencoder that maps the voxel grids to a set of latent representations. A hierarchical diffusion model is then fit to the latents to complete the scene generation pipeline. We achieve a substantial improvement over existing state-of-the-art scene generation models. Additionally, we show how NeuralField-LDM can be used for a variety of 3D content creation applications, including conditional scene generation, scene inpainting and scene style manipulation.

Keyword: lidar

Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection

  • Authors: Qianjiang Hu, Daizong Liu, Wei Hu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09446
  • Pdf link: https://arxiv.org/pdf/2304.09446
  • Abstract
    3D object detection from point clouds is crucial in safety-critical autonomous driving. Although many works have made great efforts and achieved significant progress on this task, most of them suffer from expensive annotation cost and poor transferability to unknown data due to the domain gap. Recently, a few works have attempted to tackle the domain gap in objects, but they still fail to adapt to the gap of varying beam densities between two domains, which is critical to mitigating the characteristic differences of LiDAR collectors. To this end, we propose a density-insensitive domain adaption framework to address the density-induced domain gap. In particular, we first introduce Random Beam Re-Sampling (RBRS) to enhance the robustness of 3D detectors trained on the source domain to varying beam densities. Then, we take this pre-trained detector as the backbone model and feed the unlabeled target domain data into our newly designed task-specific teacher-student framework for predicting its high-quality pseudo labels. To further adapt the property of density-insensitivity into the target domain, we feed the teacher and student branches with the same sample at different densities, and propose an Object Graph Alignment (OGA) module to construct two object graphs between the two branches for enforcing consistency in both the attributes and relations of cross-density objects. Experimental results on three widely adopted 3D object detection datasets demonstrate that our proposed domain adaption method outperforms the state-of-the-art methods, especially on varying-density data. Code is available at https://github.com/WoodwindHu/DTS.

CrossFusion: Interleaving Cross-modal Complementation for Noise-resistant 3D Object Detection

  • Authors: Yang Yang, Weijie Ma, Hao Chen, Linlin Ou, Xinyi Yu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09694
  • Pdf link: https://arxiv.org/pdf/2304.09694
  • Abstract
    The combination of LiDAR and camera modalities has proven necessary and typical for 3D object detection according to recent studies. Existing fusion strategies tend to overly rely on the LiDAR modality in essence, which exploits the abundant semantics from the camera sensor insufficiently. Yet existing methods cannot simply fall back on information from the other modality, because the corruption of LiDAR features results in a large domain gap. Following this, we propose CrossFusion, a more robust and noise-resistant scheme that makes full use of camera and LiDAR features with a designed cross-modal complementation strategy. Extensive experiments show that our method not only outperforms state-of-the-art methods without introducing an extra depth estimation network, but also demonstrates the model's noise resistance without re-training for specific malfunction scenarios, improving mAP by 5.2% and NDS by 2.4%.

Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans

  • Authors: Romain Loiseau, Elliot Vincent, Mathieu Aubry, Loic Landrieu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09704
  • Pdf link: https://arxiv.org/pdf/2304.09704
  • Abstract
    We propose an unsupervised method for parsing large 3D scans of real-world scenes into interpretable parts. Our goal is to provide a practical tool for analyzing 3D scenes with unique characteristics in the context of aerial surveying and mapping, without relying on application-specific user annotations. Our approach is based on a probabilistic reconstruction model that decomposes an input 3D point cloud into a small set of learned prototypical shapes. Our model provides an interpretable reconstruction of complex scenes and leads to relevant instance and semantic segmentations. To demonstrate the usefulness of our results, we introduce a novel dataset of seven diverse aerial LiDAR scans. We show that our method outperforms state-of-the-art unsupervised methods in terms of decomposition accuracy while remaining visually interpretable. Our method offers a significant advantage over existing approaches, as it does not require any manual annotations, making it a practical and efficient tool for 3D scene analysis. Our code and dataset are available at https://imagine.enpc.fr/~loiseaur/learnable-earth-parser

UniCal: a Single-Branch Transformer-Based Model for Camera-to-LiDAR Calibration and Validation

  • Authors: Mathieu Cocheteux, Aaron Low, Marius Bruehlmeier
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.09715
  • Pdf link: https://arxiv.org/pdf/2304.09715
  • Abstract
    We introduce a novel architecture, UniCal, for Camera-to-LiDAR (C2L) extrinsic calibration which leverages self-attention mechanisms through a Transformer-based backbone network to infer the 6-degree of freedom (DoF) relative transformation between the sensors. Unlike previous methods, UniCal performs an early fusion of the input camera and LiDAR data by aggregating camera image channels and LiDAR mappings into a multi-channel unified representation before extracting their features jointly with a single-branch architecture. This single-branch architecture makes UniCal lightweight, which is desirable in applications with restrained resources such as autonomous driving. Through experiments, we show that UniCal achieves state-of-the-art results compared to existing methods. We also show that through transfer learning, weights learned on the calibration task can be applied to a calibration validation task without re-training the backbone.

MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation

  • Authors: Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, Ping Luo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09801
  • Pdf link: https://arxiv.org/pdf/2304.09801
  • Abstract
    Perception systems in modern autonomous driving vehicles typically take inputs from complementary multi-modal sensors, e.g., LiDAR and cameras. However, in real-world applications, sensor corruptions and failures lead to inferior performances, thus compromising autonomous safety. In this paper, we propose a robust framework, called MetaBEV, to address extreme real-world environments involving overall six sensor corruptions and two extreme sensor-missing situations. In MetaBEV, signals from multiple sensors are first processed by modal-specific encoders. Subsequently, a set of dense BEV queries are initialized, termed meta-BEV. These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities. The updated BEV representations are further leveraged for multiple 3D prediction tasks. Additionally, we introduce a new M2oE structure to alleviate the performance drop on distinct tasks in multi-task joint learning. Finally, MetaBEV is evaluated on the nuScenes dataset with 3D object detection and BEV map segmentation tasks. Experiments show MetaBEV outperforms prior arts by a large margin on both full and corrupted modalities. For instance, when the LiDAR signal is missing, MetaBEV improves 35.5% detection NDS and 17.7% segmentation mIoU upon the vanilla BEVFusion model; and when the camera signal is absent, MetaBEV still achieves 69.2% NDS and 53.7% mIoU, which is even higher than previous works that perform on full-modalities. Moreover, MetaBEV performs fairly against previous methods in both canonical perception and multi-task learning settings, refreshing state-of-the-art nuScenes BEV map segmentation with 70.4% mIoU.

Keyword: diffusion

A structure-preserving upwind DG scheme for a degenerate phase-field tumor model

  • Authors: Daniel Acosta-Soba, Francisco Guillén-González, J. Rafael Rodríguez Galván
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.09257
  • Pdf link: https://arxiv.org/pdf/2304.09257
  • Abstract
    In this work, we present a modification of the phase-field tumor growth model given in [26] that leads to bounded, more physically meaningful, volume fraction variables. In addition, we develop an upwind discontinuous Galerkin (DG) scheme preserving the mass conservation, pointwise bounds and energy stability of the continuous model. Finally, some computational tests in accordance with the theoretical results are presented. In the first test, we compare our DG scheme with the finite element (FE) scheme related to the same time approximation. The DG scheme behaves well even for strong cross-diffusion effects, in contrast with FE, where spurious numerical oscillations appear. The second test exhibits the behavior of the tumor-growth model under different choices of parameters as well as of the mobility and proliferation functions.

DiFaReli: Diffusion Face Relighting

  • Authors: Puntawat Ponglertnapakorn, Nontawat Tritrong, Supasorn Suwajanakorn
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09479
  • Pdf link: https://arxiv.org/pdf/2304.09479
  • Abstract
    We present a novel approach to single-view face relighting in the wild. Handling non-diffuse effects, such as global illumination or cast shadows, has long been a challenge in face relighting. Prior work often assumes Lambertian surfaces, simplified lighting models or involves estimating 3D shape, albedo, or a shadow map. This estimation, however, is error-prone and requires many training examples with lighting ground truth to generalize well. Our work bypasses the need for accurate estimation of intrinsic components and can be trained solely on 2D images without any light stage data, multi-view images, or lighting ground truth. Our key idea is to leverage a conditional diffusion implicit model (DDIM) for decoding a disentangled light encoding along with other encodings related to 3D shape and facial identity inferred from off-the-shelf estimators. We also propose a novel conditioning technique that eases the modeling of the complex interaction between light and geometry by using a rendered shading reference to spatially modulate the DDIM. We achieve state-of-the-art performance on standard benchmark Multi-PIE and can photorealistically relight in-the-wild images. Please visit our page: https://diffusion-face-relighting.github.io

Realistic Data Enrichment for Robust Image Segmentation in Histopathology

  • Authors: Sarah Cechnicka, James Ball, Callum Arthurs, Candice Roufosse, Bernhard Kainz
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09534
  • Pdf link: https://arxiv.org/pdf/2304.09534
  • Abstract
    Poor performance of quantitative analysis in histopathological Whole Slide Images (WSI) has been a significant obstacle in clinical practice. Annotating large-scale WSIs manually is a demanding and time-consuming task, unlikely to yield the expected results when used for fully supervised learning systems. Rarely observed disease patterns and large differences in object scales are difficult to model through conventional patient intake. Prior methods either fall back to direct disease classification, which only requires learning a few factors per image, or report on average image segmentation performance, which is highly biased towards majority observations. Geometric image augmentation is commonly used to improve robustness for average case predictions and to enrich limited datasets. So far no method provided sampling of a realistic posterior distribution to improve stability, e.g. for the segmentation of imbalanced objects within images. Therefore, we propose a new approach, based on diffusion models, which can enrich an imbalanced dataset with plausible examples from underrepresented groups by conditioning on segmentation maps. Our method can simply expand limited clinical datasets making them suitable to train machine learning pipelines, and provides an interpretable and human-controllable way of generating histopathology images that are indistinguishable from real ones to human experts. We validate our findings on two datasets, one from the public domain and one from a Kidney Transplant study.

Reference-based Image Composition with Sketch via Structure-aware Diffusion Model

  • Authors: Kangyeol Kim, Sunghyun Park, Junsoo Lee, Jaegul Choo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09748
  • Pdf link: https://arxiv.org/pdf/2304.09748
  • Abstract
    Recent remarkable improvements in large-scale text-to-image generative models have shown promising results in generating high-fidelity images. To further enhance editability and enable fine-grained generation, we introduce a multi-input-conditioned image composition model that incorporates a sketch as a novel modal, alongside a reference image. Thanks to the edge-level controllability using sketches, our method enables a user to edit or complete an image sub-part with a desired structure (i.e., sketch) and content (i.e., reference image). Our framework fine-tunes a pre-trained diffusion model to complete missing regions using the reference image while maintaining sketch guidance. Albeit simple, this leads to wide opportunities to fulfill user needs for obtaining the in-demand images. Through extensive experiments, we demonstrate that our proposed method offers unique use cases for image manipulation, enabling user-driven modifications of arbitrary scenes.

Attributing Image Generative Models using Latent Fingerprints

  • Authors: Guangyu Nie, Changhoon Kim, Yezhou Yang, Yi Ren
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09752
  • Pdf link: https://arxiv.org/pdf/2304.09752
  • Abstract
    Generative models have enabled the creation of contents that are indistinguishable from those taken from nature. Open-source development of such models has raised concerns about the risks of their misuse for malicious purposes. One potential risk mitigation strategy is to attribute generative models via fingerprinting. Current fingerprinting methods exhibit a significant tradeoff between robust attribution accuracy and generation quality, and also lack design principles to improve this tradeoff. This paper investigates the use of latent semantic dimensions as fingerprints, from which we can analyze the effects of design variables, including the choice of fingerprinting dimensions, strength, and capacity, on the accuracy-quality tradeoff. Compared with previous SOTA, our method requires minimal computation and is more applicable to large-scale models. We use StyleGAN2 and the latent diffusion model to demonstrate the efficacy of our method.

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

  • Authors: Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09787
  • Pdf link: https://arxiv.org/pdf/2304.09787
  • Abstract
    Automatically generating high-quality real world 3D scenes is of enormous interest for applications such as virtual reality and robotics simulation. Towards this goal, we introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments. We leverage Latent Diffusion Models that have been successfully utilized for efficient high-quality 2D content creation. We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene. To further compress this representation, we train a latent-autoencoder that maps the voxel grids to a set of latent representations. A hierarchical diffusion model is then fit to the latents to complete the scene generation pipeline. We achieve a substantial improvement over existing state-of-the-art scene generation models. Additionally, we show how NeuralField-LDM can be used for a variety of 3D content creation applications, including conditional scene generation, scene inpainting and scene style manipulation.

Keyword: dynamic

A Deep Learning Framework for Traffic Data Imputation Considering Spatiotemporal Dependencies

  • Authors: Li Jiang, Ting Zhang, Qiruyi Zuo, Chenyu Tian, George P. Chan, Wai Kin (Victor)Chan
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09182
  • Pdf link: https://arxiv.org/pdf/2304.09182
  • Abstract
    Spatiotemporal (ST) data collected by sensors can be represented as multivariate time series, i.e., sequences of data points listed in time order. Despite the vast amount of useful information, ST data usually suffer from missing or incomplete entries, which limits their applications. Imputation is one viable solution and is often used to preprocess the data for further applications. However, in practice, spatiotemporal data imputation is quite difficult due to the complexity of spatiotemporal dependencies with dynamic changes in the traffic network, and it is a crucial preprocessing task for further applications. Existing approaches mostly capture only the temporal dependencies in time series or static spatial dependencies. They fail to directly model the spatiotemporal dependencies, and the representation ability of the models is relatively limited.

Token Imbalance Adaptation for Radiology Report Generation

  • Authors: Yuexin Wu, I-Chan Huang, Xiaolei Huang
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09185
  • Pdf link: https://arxiv.org/pdf/2304.09185
  • Abstract
    Imbalanced token distributions naturally exist in text documents, leading neural language models to overfit on frequent tokens. The token imbalance may dampen the robustness of radiology report generators, as complex medical terms appear less frequently but reflect more medical information. In this study, we demonstrate how current state-of-the-art models fail to generate infrequent tokens on two standard benchmark datasets (IU X-RAY and MIMIC-CXR) of radiology report generation. However, no prior study has proposed methods to adapt infrequent tokens for text generators fed with medical images. To solve the challenge, we propose the Token Imbalance adapter (TIMER), aiming to improve generation robustness on infrequent tokens. The model automatically leverages token imbalance by an unlikelihood loss and dynamically optimizes generation processes to augment infrequent tokens. We compare our approach with multiple state-of-the-art methods on the two benchmarks. Experiments demonstrate the effectiveness of our approach in enhancing model robustness overall and on infrequent tokens. Our ablation analysis shows that our reinforcement learning method has a major effect in adapting token imbalance for radiology report generation.
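
    A minimal sketch of an unlikelihood-augmented loss in the spirit described above; the construction of negative candidates (here, simply all frequent tokens) and the fixed weight are assumptions, as TIMER's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def nll_with_unlikelihood(logits, targets, frequent_mask, weight=0.5):
    """Standard NLL on the target tokens plus an unlikelihood term that
    penalizes probability mass placed on frequent tokens, pushing the
    generator toward infrequent (medically informative) tokens.
    logits: (T, V); targets: (T,); frequent_mask: (V,) boolean."""
    log_probs = F.log_softmax(logits, dim=-1)
    nll = F.nll_loss(log_probs, targets)
    probs = log_probs.exp()
    # unlikelihood term: -log(1 - p) over the frequent-token candidates
    freq_mass = probs[:, frequent_mask].clamp(max=1.0 - 1e-6)
    unlikelihood = -torch.log1p(-freq_mass).mean()
    return nll + weight * unlikelihood

logits = torch.randn(10, 100, requires_grad=True)
targets = torch.randint(0, 100, (10,))
frequent_mask = torch.zeros(100, dtype=torch.bool)
frequent_mask[:20] = True  # pretend the 20 lowest ids are frequent tokens
loss = nll_with_unlikelihood(logits, targets, frequent_mask)
loss.backward()
```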

Towards Spatio-temporal Sea Surface Temperature Forecasting via Static and Dynamic Learnable Personalized Graph Convolution Network

  • Authors: Xiaohan Li, Gaowei Zhang, Kai Huang, Zhaofeng He
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Atmospheric and Oceanic Physics (physics.ao-ph)
  • Arxiv link: https://arxiv.org/abs/2304.09290
  • Pdf link: https://arxiv.org/pdf/2304.09290
  • Abstract
    Sea surface temperature (SST) is uniquely important to the Earth's atmosphere since its dynamics are a major force in shaping local and global climate and profoundly affect our ecosystems. Accurate forecasting of SST brings significant economic and social implications, for example, better preparation for extreme weather such as severe droughts or tropical cyclones months ahead. However, such a task faces unique challenges due to the intrinsic complexity and uncertainty of ocean systems. Recently, deep learning techniques, such as graph neural networks (GNNs), have been applied to address this task. Even though these methods have had some success, they frequently have serious drawbacks when it comes to investigating dynamic spatiotemporal dependencies between signals. To solve this problem, this paper proposes a novel static and dynamic learnable personalized graph convolution network (SD-LPGC). Specifically, two graph learning layers are first constructed to respectively model the stable long-term and short-term evolutionary patterns hidden in the multivariate SST signals. Then, a learnable personalized convolution layer is designed to fuse this information. Our experiments on real SST datasets demonstrate the state-of-the-art performance of the proposed approach on the forecasting task.
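
The static/dynamic split and the personalized fusion of SD-LPGC are not detailed in the abstract; the sketch below shows only the generic learnable-adjacency graph-convolution building block such layers are typically built from. All names and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LearnableGraphConv(nn.Module):
    """Graph convolution with a learnable adjacency over n_nodes (illustrative)."""

    def __init__(self, n_nodes: int, in_dim: int, out_dim: int):
        super().__init__()
        self.adj_logits = nn.Parameter(torch.randn(n_nodes, n_nodes))
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                                # x: (batch, n_nodes, in_dim)
        adj = torch.softmax(self.adj_logits, dim=-1)     # row-normalized learned graph
        return torch.relu(self.proj(adj @ x))            # aggregate neighbors, project

out = LearnableGraphConv(20, 8, 16)(torch.randn(4, 20, 8))  # -> (4, 20, 16)
```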

Deep Dynamic Cloud Lighting

  • Authors: Pinar Satilmis, Thomas Bashford-Rogers
  • Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09317
  • Pdf link: https://arxiv.org/pdf/2304.09317
  • Abstract
    Sky illumination is a core source of lighting in rendering, and a substantial amount of work has been developed to simulate lighting from clear skies. However, in reality, clouds substantially alter the appearance of the sky and subsequently change the scene's illumination. While there have been recent advances in developing sky models which include clouds, these all neglect cloud movement which is a crucial component of cloudy sky appearance. In any sort of video or interactive environment, it can be expected that clouds will move, sometimes quite substantially in a short period of time. Our work proposes a solution to this which enables whole-sky dynamic cloud synthesis for the first time. We achieve this by proposing a multi-timescale sky appearance model which learns to predict the sky illumination over various timescales, and can be used to add dynamism to previous static, cloudy sky lighting approaches.

A New Deterministic Algorithm for Fully Dynamic All-Pairs Shortest Paths

  • Authors: Julia Chuzhoy, Ruimin Zhang
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.09321
  • Pdf link: https://arxiv.org/pdf/2304.09321
  • Abstract
    We study the fully dynamic All-Pairs Shortest Paths (APSP) problem in undirected edge-weighted graphs. Given an $n$-vertex graph $G$ with non-negative edge lengths, that undergoes an online sequence of edge insertions and deletions, the goal is to support approximate distance queries and shortest-path queries. We provide a deterministic algorithm for this problem, that, for a given precision parameter $\epsilon$, achieves approximation factor $(\log\log n)^{2^{O(1/\epsilon^3)}}$, and has amortized update time $O(n^{\epsilon}\log L)$ per operation, where $L$ is the ratio of longest to shortest edge length. Query time for distance-query is $O(2^{O(1/\epsilon)}\cdot \log n\cdot \log\log L)$, and query time for shortest-path query is $O(|E(P)|+2^{O(1/\epsilon)}\cdot \log n\cdot \log\log L)$, where $P$ is the path that the algorithm returns. To the best of our knowledge, even allowing any $o(n)$-approximation factor, no adaptive-update algorithms with better than $\Theta(m)$ amortized update time and better than $\Theta(n)$ query time were known prior to this work. We also note that our guarantees are stronger than the best current guarantees for APSP in decremental graphs in the adaptive-adversary setting.

BIM-GPT: a Prompt-Based Virtual Assistant Framework for BIM Information Retrieval

  • Authors: Junwen Zheng, Martin Fischer
  • Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.09333
  • Pdf link: https://arxiv.org/pdf/2304.09333
  • Abstract
    Efficient information retrieval (IR) from building information models (BIMs) poses significant challenges due to the necessity for deep BIM knowledge or extensive engineering efforts for automation. We introduce BIM-GPT, a prompt-based virtual assistant (VA) framework integrating BIM and generative pre-trained transformer (GPT) technologies to support NL-based IR. A prompt manager and dynamic template generate prompts for GPT models, enabling interpretation of NL queries, summarization of retrieved information, and answering BIM-related questions. In tests on a BIM IR dataset, our approach achieved 83.5% and 99.5% accuracy rates for classifying NL queries with no data and 2% data incorporated in prompts, respectively. Additionally, we validated the functionality of BIM-GPT through a VA prototype for a hospital building. This research contributes to the development of effective and versatile VAs for BIM IR in the construction industry, significantly enhancing BIM accessibility and reducing engineering efforts and training data requirements for processing NL queries.
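
As a rough illustration of what a prompt manager with a dynamic template might look like, here is a hedged sketch; the field names, intent labels, and wording are hypothetical and not taken from the paper.

```python
# Hypothetical prompt template in the spirit of a prompt manager with a
# dynamic template; all field names and intent labels are illustrative.
PROMPT_TEMPLATE = """You are a virtual assistant for BIM information retrieval.
Classify the user query into one of these intents: {intents}.
Building context (summarized from the BIM): {bim_summary}
User query: {query}
Respond with the intent label and the information retrieved from the BIM."""

def build_prompt(query: str, bim_summary: str,
                 intents=("quantity", "location", "property")) -> str:
    return PROMPT_TEMPLATE.format(
        intents=", ".join(intents), bim_summary=bim_summary, query=query)

print(build_prompt("How many doors are on level 2?", "Hospital building, 4 levels"))
```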

BioThings Explorer: a query engine for a federated knowledge graph of biomedical APIs

  • Authors: Jackson Callaghan, Colleen H. Xu, Jiwen Xin, Marco Alvarado Cano, Anders Riutta, Eric Zhou, Rohan Juneja, Yao Yao, Madhumita Narayan, Kristina Hanspers, Ayushi Agrawal, Alexander R. Pico, Chunlei Wu, Andrew I. Su
  • Subjects: Databases (cs.DB); Quantitative Methods (q-bio.QM)
  • Arxiv link: https://arxiv.org/abs/2304.09344
  • Pdf link: https://arxiv.org/pdf/2304.09344
  • Abstract
    Knowledge graphs are an increasingly common data structure for representing biomedical information. These knowledge graphs can easily represent heterogeneous types of information, and many algorithms and tools exist for querying and analyzing graphs. Biomedical knowledge graphs have been used in a variety of applications, including drug repurposing, identification of drug targets, prediction of drug side effects, and clinical decision support. Typically, knowledge graphs are constructed by centralization and integration of data from multiple disparate sources. Here, we describe BioThings Explorer, an application that can query a virtual, federated knowledge graph derived from the aggregated information in a network of biomedical web services. BioThings Explorer leverages semantically precise annotations of the inputs and outputs for each resource, and automates the chaining of web service calls to execute multi-step graph queries. Because there is no large, centralized knowledge graph to maintain, BioThings Explorer is distributed as a lightweight application that dynamically retrieves information at query time. More information can be found at https://explorer.biothings.io, and code is available at https://github.com/biothings/biothings_explorer.

LLM as A Robotic Brain: Unifying Egocentric Memory and Control

  • Authors: Jinjie Mai, Jun Chen, Bing Li, Guocheng Qian, Mohamed Elhoseiny, Bernard Ghanem
  • Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.09349
  • Pdf link: https://arxiv.org/pdf/2304.09349
  • Abstract
    Embodied AI focuses on the study and development of intelligent systems that possess a physical or virtual embodiment (i.e. robots) and are able to dynamically interact with their environment. Memory and control are the two essential parts of an embodied system and usually require separate frameworks to model each of them. In this paper, we propose a novel and generalizable framework called LLM-Brain: using Large-scale Language Model as a robotic brain to unify egocentric memory and control. The LLM-Brain framework integrates multiple multimodal language models for robotic tasks, utilizing a zero-shot learning approach. All components within LLM-Brain communicate using natural language in closed-loop multi-round dialogues that encompass perception, planning, control, and memory. The core of the system is an embodied LLM to maintain egocentric memory and control the robot. We demonstrate LLM-Brain by examining two downstream tasks: active exploration and embodied question answering. The active exploration tasks require the robot to extensively explore an unknown environment within a limited number of actions. Meanwhile, the embodied question answering tasks necessitate that the robot answers questions based on observations acquired during prior explorations.

Optimizing Carbon Storage Operations for Long-Term Safety

  • Authors: Yizheng Wang, Markus Zechner, Gege Wen, Anthony Louis Corso, John Michael Mern, Mykel J. Kochenderfer, Jef Karel Caers
  • Subjects: Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Fluid Dynamics (physics.flu-dyn)
  • Arxiv link: https://arxiv.org/abs/2304.09352
  • Pdf link: https://arxiv.org/pdf/2304.09352
  • Abstract
    To combat global warming and mitigate the risks associated with climate change, carbon capture and storage (CCS) has emerged as a crucial technology. However, safely sequestering CO2 in geological formations for long-term storage presents several challenges. In this study, we address these issues by modeling the decision-making process for carbon storage operations as a partially observable Markov decision process (POMDP). We solve the POMDP using belief state planning to optimize injector and monitoring well locations, with the goal of maximizing stored CO2 while maintaining safety. Empirical results in simulation demonstrate that our approach is effective in ensuring safe long-term carbon storage operations. We showcase the flexibility of our approach by introducing three different monitoring strategies and examining their impact on decision quality. Additionally, we introduce a neural network surrogate model for the POMDP decision-making process to handle the complex dynamics of the multi-phase flow. We also investigate the effects of different fidelity levels of the surrogate model on decision quality.

Long-Term Fairness with Unknown Dynamics

  • Authors: Tongxin Yin, Reilly Raab, Mingyan Liu, Yang Liu
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09362
  • Pdf link: https://arxiv.org/pdf/2304.09362
  • Abstract
    While machine learning can myopically reinforce social inequalities, it may also be used to dynamically seek equitable outcomes. In this paper, we formalize long-term fairness in the context of online reinforcement learning. This formulation can accommodate dynamical control objectives, such as driving equity inherent in the state of a population, that cannot be incorporated into static formulations of fairness. We demonstrate that this framing allows an algorithm to adapt to unknown dynamics by sacrificing short-term incentives to drive a classifier-population system towards more desirable equilibria. For the proposed setting, we develop an algorithm that adapts recent work in online learning. We prove that this algorithm achieves simultaneous probabilistic bounds on cumulative loss and cumulative violations of fairness (as statistical regularities between demographic groups). We compare our proposed algorithm to the repeated retraining of myopic classifiers, as a baseline, and to a deep reinforcement learning algorithm that lacks safety guarantees. Our experiments model human populations according to evolutionary game theory and integrate real-world datasets.

Physical Knowledge Enhanced Deep Neural Network for Sea Surface Temperature Prediction

  • Authors: Yuxin Meng, Feng Gao, Eric Rigall, Ran Dong, Junyu Dong, Qian Du
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.09376
  • Pdf link: https://arxiv.org/pdf/2304.09376
  • Abstract
    Traditionally, numerical models have been deployed in oceanography studies to simulate ocean dynamics by representing physical equations. However, many factors pertaining to ocean dynamics seem to be ill-defined. We argue that transferring physical knowledge from observed data could further improve the accuracy of numerical models when predicting Sea Surface Temperature (SST). Recently, the advances in earth observation technologies have yielded a monumental growth of data. Consequently, it is imperative to explore ways in which to improve and supplement numerical models utilizing the ever-increasing amounts of historical observational data. To this end, we introduce a method for SST prediction that transfers physical knowledge from historical observations to numerical models. Specifically, we use a combination of an encoder and a generative adversarial network (GAN) to capture physical knowledge from the observed data. The numerical model data is then fed into the pre-trained model to generate physics-enhanced data, which can then be used for SST prediction. Experimental results demonstrate that the proposed method considerably enhances SST prediction performance when compared to several state-of-the-art baselines.

Analytical Large-Signal Modeling of Inverter-based Microgrids with Koopman Operator Theory for Autonomous Control

  • Authors: Zixiao Ma, Zhaoyu Wang
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.09378
  • Pdf link: https://arxiv.org/pdf/2304.09378
  • Abstract
    The microgrid (MG) plays a crucial role in the energy transition, but its nonlinearity presents a significant challenge for large-signal power systems studies in the electromagnetic transient (EMT) time scale. In this paper, we develop a large-signal linear MG model that considers the detailed dynamics of the primary and zero-control levels based on the Koopman operator (KO) theory. Firstly, a set of observable functions is carefully designed to capture the nonlinear dynamics of the MG. The corresponding linear KO is then analytically derived based on these observables, resulting in the linear representation of the original nonlinear MG with observables as the new coordinate. The influence of external input on the system dynamics is also considered during the derivation, enabling control of the MG. We solve the voltage control problem using the traditional linear quadratic integrator (LQI) method to demonstrate that textbook linear control techniques can accurately control the original nonlinear MG via the developed KO linearized MG model. Our proposed KO linearization method is generic and can be easily extended for different control objectives and MG structures using our analytical derivation procedure. We validate the effectiveness of our methodology through various case studies.
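
The paper derives the Koopman matrix analytically from carefully designed observables, including the control input. The snippet below instead shows the generic data-driven EDMD regression that the same lifting idea reduces to; it is a simplified, related construction, not the paper's procedure, and the observable set is an illustrative assumption.

```python
import numpy as np

def edmd_koopman(X, Y, psi):
    """Least-squares Koopman matrix K with psi(x_next) ~= psi(x) @ K.

    X, Y: (n_samples, n_states) snapshot pairs with Y[i] the successor of X[i].
    psi:  observable map lifting one state to a feature vector.
    """
    PsiX = np.stack([psi(x) for x in X])
    PsiY = np.stack([psi(y) for y in Y])
    K, *_ = np.linalg.lstsq(PsiX, PsiY, rcond=None)
    return K

# illustrative quadratic observables for a 2-state system (an assumption)
psi = lambda x: np.array([x[0], x[1], x[0] ** 2, x[0] * x[1], x[1] ** 2])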

Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification

  • Authors: Suncheng Xiang, Jingsheng Gao, Mengyuan Guan, Jiacheng Ruan, Chengfeng Zhou, Ting Liu, Dahong Qian, Yuzhuo Fu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09498
  • Pdf link: https://arxiv.org/pdf/2304.09498
  • Abstract
    Generalizable person re-identification (Re-ID) is a very active research topic in machine learning and computer vision, which plays a significant role in realistic scenarios due to its various applications in public security and video surveillance. However, previous methods mainly focus on visual representation learning, while neglecting to explore the potential of semantic features during training, which easily leads to poor generalization capability when adapted to a new domain. In this paper, we propose a Multi-Modal Equivalent Transformer called MMET for more robust visual-semantic embedding learning on visual, textual and visual-textual tasks respectively. To further enhance the robust feature learning in the context of transformers, a dynamic masking mechanism called the Masked Multimodal Modeling strategy (MMM) is introduced to mask both the image patches and the text tokens, which can jointly work on multimodal or unimodal data and significantly boost the performance of generalizable person Re-ID. Extensive experiments on benchmark datasets demonstrate the competitive performance of our method over previous approaches. We hope this method could advance the research towards visual-semantic representation learning. Our source code is also publicly available at https://github.com/JeremyXSC/MMET.

Biologically inspired structure learning with reverse knowledge distillation for spiking neural networks

  • Authors: Qi Xu, Yaxin Li, Xuanye Fang, Jiangrong Shen, Jian K. Liu, Huajin Tang, Gang Pan
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09500
  • Pdf link: https://arxiv.org/pdf/2304.09500
  • Abstract
    Spiking neural networks (SNNs) have superb characteristics in sensory information recognition tasks due to their biological plausibility. However, the performance of some current spiking-based models is limited by their structures: either fully connected or overly deep structures introduce too much redundancy. This redundancy in both connections and neurons is one of the key factors hindering the practical application of SNNs. Although some pruning methods have been proposed to tackle this problem, they normally ignore the fact that the neural topology in the human brain can be adjusted dynamically. Inspired by this, this paper proposes an evolutionary-based structure construction method for constructing more reasonable SNNs. By integrating knowledge distillation and a connection pruning method, the synaptic connections in SNNs can be optimized dynamically to reach an optimal state. As a result, the structure of SNNs can not only absorb knowledge from the teacher model but also search for a deep but sparse network topology. Experimental results on CIFAR100 and DVS-Gesture show that the proposed structure learning method achieves good performance while reducing connection redundancy. The proposed method explores a novel dynamic way of structure learning from scratch in SNNs, which could build a bridge to close the gap between deep learning and bio-inspired neural dynamics.

Progressive Transfer Learning for Dexterous In-Hand Manipulation with Multi-Fingered Anthropomorphic Hand

  • Authors: Yongkang Luo, Wanyi Li, Peng Wang, Haonan Duan, Wei Wei, Jia Sun
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.09526
  • Pdf link: https://arxiv.org/pdf/2304.09526
  • Abstract
    Dexterous in-hand manipulation with a multi-fingered anthropomorphic hand is extremely difficult because of the high-dimensional state and action spaces and the rich contact patterns between the fingers and objects. Even though deep reinforcement learning has made moderate progress and demonstrated its strong potential for manipulation, it still faces certain challenges, such as large-scale data collection and high sample complexity. In particular, for scenes with even slight changes, it often needs to re-collect vast amounts of data and carry out numerous iterations of fine-tuning. Remarkably, humans can quickly transfer learned manipulation skills to different scenarios with little supervision. Inspired by this flexible transfer learning capability, we propose a novel dexterous in-hand manipulation progressive transfer learning framework (PTL) based on efficiently utilizing the collected trajectories and the source-trained dynamics model. This framework adopts progressive neural networks for dynamics model transfer learning on samples selected by a new sample selection method based on dynamics properties, rewards and scores of the trajectories. Experimental results on contact-rich anthropomorphic hand manipulation tasks show that our method can efficiently and effectively learn in-hand manipulation skills with a few online attempts and adjustment learning in the new scene. Compared to learning from scratch, our method can reduce training time costs by 95%.

Network Algebraization and Port Relationship for Power-Electronic-Dominated Power Systems

  • Authors: Rui Ma, Xiaowen Yang, Meng Zhan
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.09528
  • Pdf link: https://arxiv.org/pdf/2304.09528
  • Abstract
    Different from the quasi-static network in the traditional power system, the dynamic network in the power-electronic-dominated power system should be considered due to the rapid response of converters' controls. In this paper, a nonlinear differential-algebraic model framework is established, with algebraic equations for dynamic electrical networks and differential equations for the (source) nodes, by generalizing the Kron reduction. The internal and terminal voltages of source nodes, including converters, are chosen as the ports of nodes and networks. Correspondingly, the impact of the dynamic network becomes clear: it serves as a voltage divider and generates the terminal voltage based on the internal voltage of the sources instantaneously, even when the dynamics of inductance are included. With this minimal model, the roles of both the nodes and the network become apparent. Simulations verify the proposed model framework on the modified 9-bus system.

Decadal Temperature Prediction via Chaotic Behavior Tracking

  • Authors: Jinfu Ren, Yang Liu, Jiming Liu
  • Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.09536
  • Pdf link: https://arxiv.org/pdf/2304.09536
  • Abstract
    Decadal temperature prediction provides crucial information for quantifying the expected effects of future climate changes and thus informs strategic planning and decision-making in various domains. However, such long-term predictions are extremely challenging, due to the chaotic nature of temperature variations. Moreover, the usefulness of existing simulation-based and machine learning-based methods for this task is limited because initial simulation or prediction errors increase exponentially over time. To address this challenging task, we devise a novel prediction method involving an information tracking mechanism that aims to track and adapt to changes in temperature dynamics during the prediction phase by providing probabilistic feedback on the prediction error of the next step based on the current prediction. We integrate this information tracking mechanism, which can be considered as a model calibrator, into the objective function of our method to obtain the corrections needed to avoid error accumulation. Our results show the ability of our method to accurately predict global land-surface temperatures over a decadal range. Furthermore, we demonstrate that our results are meaningful in a real-world context: the temperatures predicted using our method are consistent with and can be used to explain the well-known teleconnections within and between different continents.

SLIC: Self-Conditioned Adaptive Transform with Large-Scale Receptive Fields for Learned Image Compression

  • Authors: Wei Jiang, Peirong Ning, Ronggang Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.09571
  • Pdf link: https://arxiv.org/pdf/2304.09571
  • Abstract
    Learned image compression has achieved remarkable performance. The transform plays an important role in boosting the rate-distortion (RD) performance. The analysis transform converts the input image into a compact latent representation; the more compact the latent representation is, the fewer bits are needed to compress it. When designing better transforms, some previous works adopt the Swin-Transformer. The success of the Swin-Transformer in image compression can be attributed to its dynamic weights and large receptive field. However, the LayerNorm adopted in transformers is not suitable for image compression. We find that CNN-based modules can also be dynamic and have large receptive fields, and they can work with GDN/IGDN. To make the CNN-based modules dynamic, we generate the weights of kernels conditioned on the input feature. We scale up the size of each kernel for larger receptive fields. To reduce complexity, we make the CNN module channel-wise connected. We call this module dynamic depth-wise convolution. We replace the self-attention module with the proposed dynamic depth-wise convolution, replace the embedding layer with a depth-wise residual bottleneck for non-linearity, and replace the FFN layer with an inverted residual bottleneck for more interactions in the spatial domain. Since the interactions among channels of dynamic depth-wise convolution are limited, we design another block that replaces the dynamic depth-wise convolution with channel attention. We equip the proposed modules in the analysis and synthesis transforms, obtain a more compact latent representation, and propose the learned image compression model SLIC (Self-Conditioned Adaptive Transform with Large-Scale Receptive Fields for Learned Image Compression). Thanks to the proposed transform modules, SLIC achieves a 6.35% BD-rate reduction over VVC when measured in PSNR on the Kodak dataset.
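
A hedged sketch of the dynamic depth-wise convolution idea: generate one kernel per channel from the input feature and apply it as a grouped convolution. The paper's exact conditioning network, kernel sizes, and normalization are not given in the abstract, so this is one plausible instantiation, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicDepthwiseConv(nn.Module):
    """Depth-wise conv whose kernels are generated from the input feature."""

    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        self.k = kernel_size
        # predict one k*k kernel per channel from globally pooled features
        self.kernel_gen = nn.Linear(channels, channels * kernel_size ** 2)

    def forward(self, x):                       # x: (B, C, H, W)
        B, C, H, W = x.shape
        ctx = x.mean(dim=(2, 3))                # global average pool -> (B, C)
        kernels = self.kernel_gen(ctx).view(B * C, 1, self.k, self.k)
        # grouped conv: every sample/channel pair gets its own generated kernel
        out = F.conv2d(x.reshape(1, B * C, H, W), kernels,
                       padding=self.k // 2, groups=B * C)
        return out.view(B, C, H, W)

y = DynamicDepthwiseConv(32)(torch.randn(2, 32, 16, 16))  # -> (2, 32, 16, 16)
```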

Learning controllers from data via kernel-based interpolation

  • Authors: Zhongjie Hu, Claudio De Persis, Pietro Tesi
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.09577
  • Pdf link: https://arxiv.org/pdf/2304.09577
  • Abstract
    We propose a data-driven control design method for nonlinear systems that builds on kernel-based interpolation. Under some assumptions on the system dynamics, kernel-based functions are built from data and a model of the system, along with deterministic model error bounds, is determined. Then, we derive a controller design method that aims at stabilizing the closed-loop system by cancelling out the system nonlinearities. The proposed method can be implemented using semidefinite programming and returns positively invariant sets for the closed-loop system.
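
A minimal sketch of the kernel step only, assuming a Gaussian kernel and adding a small ridge term for numerical stability; the paper's construction, with its deterministic error bounds and semidefinite-programming-based controller design, is more involved than this.

```python
import numpy as np

def fit_kernel_interpolant(X, Y, sigma=1.0, lam=1e-8):
    """Fit f(x) ~= sum_i alpha_i * k(x, X[i]) from data pairs (X, Y)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))      # Gaussian Gram matrix
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), Y)
    return alpha

def predict(x, X, alpha, sigma=1.0):
    k = np.exp(-((X - x) ** 2).sum(-1) / (2.0 * sigma ** 2))
    return k @ alpha

X = np.random.randn(50, 2)                # sampled states (toy data)
Y = np.sin(X[:, 0]) + X[:, 1] ** 2        # toy dynamics values
alpha = fit_kernel_interpolant(X, Y)
```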

DynamicRead: Exploring Robust Gaze Interaction Methods for Reading on Handheld Mobile Devices under Dynamic Conditions

  • Authors: Yaxiong Lei, Yuheng Wang, Tyler Caslin, Alexander Wisowaty, Xu Zhu, Mohamed Khamis, Juan Ye
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.09584
  • Pdf link: https://arxiv.org/pdf/2304.09584
  • Abstract
    Enabling gaze interaction in real-time on handheld mobile devices has attracted significant attention in recent years. An increasing number of research projects have focused on sophisticated appearance-based deep learning models to enhance the precision of gaze estimation on smartphones. This inspires important research questions, including how gaze can be used in a real-time application, and what type of gaze interaction methods are preferable under dynamic conditions in terms of both user acceptance and delivering reliable performance. To address these questions, we design four types of gaze scrolling techniques: three explicit techniques based on Gaze Gesture, Dwell time, and Pursuit, and one implicit technique based on reading speed, to support touch-free page scrolling in a reading application. We conduct a 20-participant user study under both sitting and walking settings. Our results reveal that the Gaze Gesture and Dwell time-based interfaces are more robust while walking, and that Gaze Gesture achieved consistently good usability scores without causing a high cognitive workload.

On countings and enumerations of block-parallel automata networks

  • Authors: Kévin Perrot, Sylvain Sené, Léah Tapin
  • Subjects: Discrete Mathematics (cs.DM); Formal Languages and Automata Theory (cs.FL)
  • Arxiv link: https://arxiv.org/abs/2304.09664
  • Pdf link: https://arxiv.org/pdf/2304.09664
  • Abstract
    When we focus on finite dynamical systems from both the computability/complexity and the modelling standpoints, automata networks seem to be a particularly appropriate mathematical model on which theory shall be developed. In this paper, automata networks are finite collections of entities (the automata), each automaton having its own set of possible states, which interact with each other over discrete time, interactions being defined as local functions allowing the automata to change their state according to the states of their neighbourhoods. The studies on this model of computation have underlined the importance of the way (i.e. the schedule) according to which the automata update their states, namely the update modes, which can be deterministic, periodic, fair, or not. Indeed, a given network may admit numerous underlying dynamics, the latter depending heavily on the update modes under which we let the former evolve. In this paper, we pay attention to a new kind of deterministic, periodic and fair update mode family introduced recently in a modelling framework, called the block-parallel update modes by duality with the well-known and studied block-sequential update modes. More precisely, in the general context of automata networks, this work aims at presenting what distinguishes block-parallel update modes from block-sequential ones, and at counting and enumerating them: in absolute terms, by keeping only representatives leading to distinct dynamics, and by keeping only representatives giving rise to distinct isomorphic limit dynamics. Put together, this paper constitutes a first theoretical analysis of these update modes and their impact on automata network dynamics.

State estimation of an electrochemical lithium-ion battery model: improved observer performance by hybrid redesign

  • Authors: E. Petri, T. Reynaudo, R. Postoyan, D. Astolfi, D. Nesic, S. Rael
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.09680
  • Pdf link: https://arxiv.org/pdf/2304.09680
  • Abstract
    Effective management and just-in-time maintenance of lithium-ion batteries require knowledge of unmeasured (internal) variables that need to be estimated. Observers are thus designed for this purpose using a mathematical model of the battery's internal dynamics. It is often difficult to tune observers to obtain good estimation performance in terms of both convergence speed and accuracy, while these are essential in practice. In this context, we demonstrate how a recently developed hybrid multi-observer can be used to improve the performance of a given observer designed for an electrochemical model of a lithium-ion battery. Simulation results, obtained with standard parameter values, show the estimation performance improvement achieved by the proposed method.

Analysing Equilibrium States for Population Diversity

  • Authors: Johannes Lengler, Andre Opris, Dirk Sudholt
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.09690
  • Pdf link: https://arxiv.org/pdf/2304.09690
  • Abstract
    Population diversity is crucial in evolutionary algorithms as it helps with global exploration and facilitates the use of crossover. Despite many runtime analyses showing advantages of population diversity, we have no clear picture of how diversity evolves over time. We study how population diversity of $(\mu+1)$ algorithms, measured by the sum of pairwise Hamming distances, evolves in a fitness-neutral environment. We give an exact formula for the drift of population diversity and show that it is driven towards an equilibrium state. Moreover, we bound the expected time for getting close to the equilibrium state. We find that these dynamics, including the location of the equilibrium, are unaffected by surprisingly many algorithmic choices. All unbiased mutation operators with the same expected number of bit flips have the same effect on the expected diversity. Many crossover operators have no effect at all, including all binary unbiased, respectful operators. We review crossover operators from the literature and identify crossovers that are neutral towards the evolution of diversity and crossovers that are not.

Guidance of the resonance energy flow in the mechanism of coupled magnetic pendulums

  • Authors: Valery N. Pilipchuk, Krystian Polczyński, Maksymilian Bednarek, Jan Awrejcewicz
  • Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS); Adaptation and Self-Organizing Systems (nlin.AO)
  • Arxiv link: https://arxiv.org/abs/2304.09755
  • Pdf link: https://arxiv.org/pdf/2304.09755
  • Abstract
    This paper presents a methodology for controlling the resonance energy exchange in a mechanical system consisting of two weakly coupled magnetic pendulums interacting with the magnetic field generated by coils placed underneath. It is shown that properly guided magnetic fields can effectively change the mechanical potentials in such a way that the energy flow between the oscillators takes the desired direction. The analysis uses a specific set of descriptive functions characterizing the total excitation level, its distribution between the pendulums, and the phase shift. The developed control strategies are based on the observation that, in the case of anti-phase oscillation, the energy moves from the pendulum subjected to the repelling magnetic field to the oscillator under the attracting field. In contrast, during in-phase oscillations, the energy flow is reversed. Therefore, the closed-loop controller requires only the information about the phase shift, which is easily estimated from dynamic state signals through the coherency index. An advantage of the suggested control strategy is that the temporal rate of inputs is dictated by the speed of beating, which is relatively slow compared to the carrying oscillations.

Contactless Human Activity Recognition using Deep Learning with Flexible and Scalable Software Define Radio

  • Authors: Muhammad Zakir Khan, Jawad Ahmad, Wadii Boulila, Matthew Broadbent, Syed Aziz Shah, Anis Koubaa, Qammer H. Abbasi
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.09756
  • Pdf link: https://arxiv.org/pdf/2304.09756
  • Abstract
    Ambient computing is gaining popularity as a major technological advancement for the future. The modern era has witnessed a surge in the advancement in healthcare systems, with viable radio frequency solutions proposed for remote and unobtrusive human activity recognition (HAR). Specifically, this study investigates the use of Wi-Fi channel state information (CSI) as a novel method of ambient sensing that can be employed as a contactless means of recognizing human activity in indoor environments. These methods avoid additional costly hardware required for vision-based systems, which are privacy-intrusive, by (re)using Wi-Fi CSI for various safety and security applications. During an experiment utilizing universal software-defined radio (USRP) to collect CSI samples, it was observed that a subject engaged in six distinct activities, which included no activity, standing, sitting, and leaning forward, across different areas of the room. Additionally, more CSI samples were collected when the subject walked in two different directions. This study presents a Wi-Fi CSI-based HAR system that assesses and contrasts deep learning approaches, namely convolutional neural network (CNN), long short-term memory (LSTM), and hybrid (LSTM+CNN), employed for accurate activity recognition. The experimental results indicate that LSTM surpasses current models and achieves an average accuracy of 95.3% in multi-activity classification when compared to CNN and hybrid techniques. In the future, research needs to study the significance of resilience in diverse and dynamic environments to identify the activity of multiple users.

K-means Clustering Based Feature Consistency Alignment for Label-free Model Evaluation

  • Authors: Shuyu Miao, Lin Zheng, Jingjing Liu, and Hong Jin
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09758
  • Pdf link: https://arxiv.org/pdf/2304.09758
  • Abstract
    The label-free model evaluation aims to predict the model performance on various test sets without relying on ground truths. The main challenge of this task is the absence of labels in the test data, unlike in classical supervised model evaluation. This paper presents our solutions for the 1st DataCV Challenge of the Visual Dataset Understanding workshop at CVPR 2023. Firstly, we propose a novel method called K-means Clustering Based Feature Consistency Alignment (KCFCA), which is tailored to handle the distribution shifts of various datasets. KCFCA utilizes the K-means algorithm to cluster labeled training sets and unlabeled test sets, and then aligns the cluster centers with feature consistency. Secondly, we develop a dynamic regression model to capture the relationship between the shifts in distribution and model accuracy. Thirdly, we design an algorithm to discover the outlier model factors, eliminate the outlier models, and combine the strengths of multiple autoeval models. On the DataCV Challenge leaderboard, our approach secured 2nd place with an RMSE of 6.8526. Our method significantly improved over the best baseline method by 36% (6.8526 vs. 10.7378). Furthermore, our method achieves a relatively more robust and optimal single model performance on the validation dataset.
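
A hedged sketch of the cluster-center-alignment idea only; the feature-consistency alignment details and the downstream accuracy-regression model of KCFCA are not specified in the abstract, so the nearest-center matching rule below is an illustrative assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def center_shift_score(train_feats, test_feats, k=10):
    """Cluster labeled train and unlabeled test features; use the mean
    nearest-center distance as a simple distribution-shift signal
    (illustrative only, not the KCFCA algorithm itself)."""
    c_train = KMeans(n_clusters=k, n_init=10, random_state=0) \
        .fit(train_feats).cluster_centers_
    c_test = KMeans(n_clusters=k, n_init=10, random_state=0) \
        .fit(test_feats).cluster_centers_
    # match each test center to its nearest train center
    dists = np.linalg.norm(c_test[:, None, :] - c_train[None, :, :], axis=-1)
    return dists.min(axis=1).mean()
```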

Advances on Concept Drift Detection in Regression Tasks using Social Networks Theory

  • Authors: Jean Paul Barddal, Heitor Murilo Gomes, Fabrício Enembreck
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09788
  • Pdf link: https://arxiv.org/pdf/2304.09788
  • Abstract
    Mining data streams is one of the main topics in machine learning due to its application in many knowledge areas. One of the major challenges in mining data streams is concept drift, which requires the learner to discard the current concept and adapt to a new one. Ensemble-based drift detection algorithms have been used successfully for the classification task but usually maintain a fixed-size ensemble of learners, running the risk of needlessly spending processing time and memory. In this paper we present improvements to the Scale-free Network Regressor (SFNR), a dynamic ensemble-based method for regression that employs social networks theory. In order to detect concept drifts, SFNR uses the Adaptive Window (ADWIN) algorithm. Results show improvements in accuracy, especially in concept drift situations, and better performance compared to other state-of-the-art algorithms on both real and synthetic data.
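
ADWIN is a standard, published drift detector; a minimal usage sketch with the `river` streaming-ML package follows (API names per recent `river` releases and may differ across versions; SFNR's ensemble-adaptation logic is only indicated as a comment).

```python
from river import drift  # assuming the `river` package is installed

detector = drift.ADWIN()
stream = [0.1] * 500 + [0.9] * 500      # synthetic regression-error stream

for i, err in enumerate(stream):
    detector.update(err)                # feed one error value at a time
    if detector.drift_detected:
        print(f"concept drift flagged at sample {i}")
        # an ensemble method like SFNR would adapt its learners here
```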

Event-based Simultaneous Localization and Mapping: A Comprehensive Survey

  • Authors: Kunping Huang, Sen Zhang, Jing Zhang, Dacheng Tao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.09793
  • Pdf link: https://arxiv.org/pdf/2304.09793
  • Abstract
    In recent decades, visual simultaneous localization and mapping (vSLAM) has gained significant interest in both academia and industry. It estimates camera motion and reconstructs the environment concurrently using visual sensors on a moving robot. However, conventional cameras are limited by hardware constraints, including motion blur and low dynamic range, which can negatively impact performance in challenging scenarios like high-speed motion and high dynamic range illumination. Recent studies have demonstrated that event cameras, a new type of bio-inspired visual sensor, offer advantages such as high temporal resolution, high dynamic range, low power consumption, and low latency. This paper presents a timely and comprehensive review of event-based vSLAM algorithms that exploit the benefits of asynchronous and irregular event streams for localization and mapping tasks. The review covers the working principle of event cameras and various event representations for preprocessing event data. It also categorizes event-based vSLAM methods into four main categories: feature-based, direct, motion-compensation, and deep learning methods, with detailed discussions and practical guidance for each approach. Furthermore, the paper evaluates the state-of-the-art methods on various benchmarks, highlighting current challenges and future opportunities in this emerging research area. A public repository will be maintained to keep track of the rapid developments in this field at {\url{https://github.com/kun150kun/ESLAM-survey}}.

Leveraging Deep Reinforcement Learning for Metacognitive Interventions across Intelligent Tutoring Systems

  • Authors: Mark Abdelshiheed, John Wesley Hostetter, Tiffany Barnes, Min Chi
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.09821
  • Pdf link: https://arxiv.org/pdf/2304.09821
  • Abstract
    This work compares two approaches to provide metacognitive interventions and their impact on preparing students for future learning across Intelligent Tutoring Systems (ITSs). In two consecutive semesters, we conducted two classroom experiments: Exp. 1 used a classic artificial intelligence approach to classify students into different metacognitive groups and provide static interventions based on their classified groups. In Exp. 2, we leveraged Deep Reinforcement Learning (DRL) to provide adaptive interventions that consider the dynamic changes in the student's metacognitive levels. In both experiments, students received these interventions that taught how and when to use a backward-chaining (BC) strategy on a logic tutor that supports a default forward-chaining strategy. Six weeks later, we trained students on a probability tutor that only supports BC without interventions. Our results show that adaptive DRL-based interventions closed the metacognitive skills gap between students. In contrast, static classifier-based interventions only benefited a subset of students who knew how to use BC in advance. Additionally, our DRL agent prepared the experimental students for future learning by significantly surpassing their control peers on both ITSs.

Learning and Adapting Agile Locomotion Skills by Transferring Experience

  • Authors: Laura Smith, J. Chase Kew, Tianyu Li, Linda Luu, Xue Bin Peng, Sehoon Ha, Jie Tan, Sergey Levine
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09834
  • Pdf link: https://arxiv.org/pdf/2304.09834
  • Abstract
    Legged robots have enormous potential in their range of capabilities, from navigating unstructured terrains to high-speed running. However, designing robust controllers for highly agile dynamic motions remains a substantial challenge for roboticists. Reinforcement learning (RL) offers a promising data-driven approach for automatically training such controllers. However, exploration in these high-dimensional, underactuated systems remains a significant hurdle for enabling legged robots to learn performant, naturalistic, and versatile agility skills. We propose a framework for training complex robotic skills by transferring experience from existing controllers to jumpstart learning new tasks. To leverage controllers we can acquire in practice, we design this framework to be flexible in terms of their source -- that is, the controllers may have been optimized for a different objective under different dynamics, or may require different knowledge of the surroundings -- and thus may be highly suboptimal for the target task. We show that our method enables learning complex agile jumping behaviors, navigating to goal locations while walking on hind legs, and adapting to new environments. We also demonstrate that the agile behaviors learned in this way are graceful and safe enough to deploy in the real world.

Evaluating Verifiability in Generative Search Engines

  • Authors: Nelson F. Liu, Tianyi Zhang, Percy Liang
  • Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.09848
  • Pdf link: https://arxiv.org/pdf/2304.09848
  • Abstract
    Generative search engines directly generate responses to user queries, along with in-line citations. A prerequisite trait of a trustworthy generative search engine is verifiability, i.e., systems should cite comprehensively (high citation recall; all statements are fully supported by citations) and accurately (high citation precision; every citation supports its associated statement). We conduct human evaluation to audit four popular generative search engines -- Bing Chat, NeevaAI, perplexity.ai, and YouChat -- across a diverse set of queries from a variety of sources (e.g., historical Google user queries, dynamically collected open-ended questions on Reddit, etc.). We find that responses from existing generative search engines are fluent and appear informative, but frequently contain unsupported statements and inaccurate citations: on average, a mere 51.5% of generated sentences are fully supported by citations and only 74.5% of citations support their associated sentence. We believe that these results are concerningly low for systems that may serve as a primary tool for information-seeking users, especially given their facade of trustworthiness. We hope that our results further motivate the development of trustworthy generative search engines and help researchers and users better understand the shortcomings of existing commercial systems.
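
The two metrics reduce to simple ratios; the sketch below computes them over a hypothetical annotation schema (the field names are made up for illustration and are not the paper's data format).

```python
def citation_metrics(sentences):
    """sentences: list of dicts like
    {"fully_supported": bool, "citations": [{"supports": bool}, ...]}
    (an illustrative schema, not the paper's annotation format)."""
    recall = sum(s["fully_supported"] for s in sentences) / len(sentences)
    cites = [c for s in sentences for c in s["citations"]]
    precision = sum(c["supports"] for c in cites) / len(cites)
    return recall, precision

# the paper reports roughly recall = 0.515 and precision = 0.745 on average
```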

Patching Neural Barrier Functions Using Hamilton-Jacobi Reachability

  • Authors: Sander Tonkens, Alex Toofanian, Zhizhen Qin, Sicun Gao, Sylvia Herbert
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.09850
  • Pdf link: https://arxiv.org/pdf/2304.09850
  • Abstract
    Learning-based control algorithms have led to major advances in robotics at the cost of decreased safety guarantees. Recently, neural networks have also been used to characterize safety through the use of barrier functions for complex nonlinear systems. Learned barrier functions approximately encode and enforce a desired safety constraint through a value function, but do not provide any formal guarantees. In this paper, we propose a local dynamic programming (DP) based approach to "patch" an almost-safe learned barrier at potentially unsafe points in the state space. This algorithm, HJ-Patch, obtains a novel barrier that provides formal safety guarantees, yet retains the global structure of the learned barrier. Our local DP based reachability algorithm, HJ-Patch, updates the barrier function "minimally" at points that both (a) neighbor the barrier safety boundary and (b) do not satisfy the safety condition. We view this as a key step to bridging the gap between learning-based barrier functions and Hamilton-Jacobi reachability analysis, providing a framework for further integration of these approaches. We demonstrate that for well-trained barriers we reduce the computational load by 2 orders of magnitude with respect to standard DP-based reachability, and demonstrate scalability to a 6-dimensional system, which is at the limit of standard DP-based reachability.

New submissions for Fri, 28 Apr 23

Keyword: efficient

SamurAI: A Versatile IoT Node With Event-Driven Wake-Up and Embedded ML Acceleration

  • Authors: Ivan Miro-Panades (LSTA), Benoit Tain (LECA), Jean-Frederic Christmann (LFIM), David Coriat (LIIM), Romain Lemaire (LIIM), Clement Jany, Baudouin Martineau (DSYS), Fabrice Chaix (DSYS), Guillaume Waltener (DSYS), Emmanuel Pluchart (LSTA), Jean-Philippe Noel (LFIM), Adam Makosiej, Maxime Montoya, Simone Bacles-Min (LIIM), David Briand (LIAE), Jean-Marc Philippe, Yvain Thonnart (LFIM), Alexandre Valentian (LSTA), Frederic Heitzmann (DSYS), Fabien Clermidy (DSCIN)
  • Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13726
  • Pdf link: https://arxiv.org/pdf/2304.13726
  • Abstract
    Increased capabilities such as recognition and self-adaptability are now required from IoT applications. While IoT node power consumption is a major concern for these applications, cloud-based processing is becoming unsustainable due to continuous sensor or image data transmission over the wireless network. Thus, optimized ML capabilities and data transfers should be integrated into the IoT node. Moreover, IoT applications are torn between sporadic data-logging and energy-hungry data processing (e.g. image classification). Thus, the versatility of the node is key in addressing this wide diversity of energy and processing needs. This paper presents SamurAI, a versatile IoT node bridging this gap in processing and in energy by leveraging two on-chip sub-systems: a low power, clock-less, event-driven Always-Responsive (AR) part and an energy-efficient On-Demand (OD) part. AR contains a 1.7MOPS event-driven, asynchronous Wake-up Controller (WuC) with a 207ns wake-up time optimized for sporadic computing, while OD combines a deep-sleep RISC-V CPU and 1.3TOPS/W Machine Learning (ML) for more complex tasks up to 36GOPS. This architecture partitioning achieves best-in-class versatility metrics such as the peak-performance-to-idle-power ratio. On an applicative classification scenario, it demonstrates system power gains of up to 3.5x compared to cloud-based processing, and thus extended battery lifetime.

A Unified Approach to Lane Change Intention Recognition and Driving Status Prediction through TCN-LSTM and Multi-Task Learning Models

  • Authors: Renteng Yuan, Mohamed Abdel-Aty, Xin Gu, Ou Zheng, Qiaojun Xiang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13732
  • Pdf link: https://arxiv.org/pdf/2304.13732
  • Abstract
    Lane change (LC) is a continuous and complex operation process. Accurately detecting and predicting LC processes can help traffic participants better understand their surrounding environment, recognize potential LC safety hazards, and improve traffic safety. This paper focuses on LC processes, developing an LC intention recognition (LC-IR) model and an LC status prediction (LC-SP) model. A novel ensemble temporal convolutional network with Long Short-Term Memory units (TCN-LSTM) is first proposed to capture long-range dependencies in sequential data. Then, three multi-task models (MTL-LSTM, MTL-TCN, MTL-TCN-LSTM) are developed to capture the intrinsic relationships among output indicators. Furthermore, a unified modeling framework for LC intention recognition and driving status prediction (LC-IR-SP) is developed. To validate the performance of the proposed models, a total of 1023 vehicle trajectories are extracted from the CitySim dataset. The Pearson coefficient is employed to determine the related indicators. The results indicate that, using 150 frames as the input length, the TCN-LSTM model with 96.67% accuracy outperforms the TCN and LSTM models in LC intention classification and provides more balanced results for each class. The three proposed multi-task learning models provide markedly increased performance compared to the corresponding single-task models, with average reductions of 24.24% and 22.86% in Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), respectively. The developed LC-IR-SP model has promising applications for autonomous vehicles to identify lane change behaviors, calculate a real-time traffic conflict index, and improve vehicle control strategies.
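
A minimal PyTorch sketch of a TCN front-end feeding an LSTM for sequence classification. The abstract does not give layer counts or hyperparameters, so the depths, widths, and the non-causal "same" padding here are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class TCNLSTM(nn.Module):
    """Two dilated temporal conv blocks feeding an LSTM classifier head."""

    def __init__(self, n_features: int, n_classes: int, hidden: int = 64):
        super().__init__()
        self.tcn = nn.Sequential(
            nn.Conv1d(n_features, hidden, 3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, 3, padding=2, dilation=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                     # x: (batch, time, features)
        h = self.tcn(x.transpose(1, 2))       # -> (batch, hidden, time)
        h, _ = self.lstm(h.transpose(1, 2))   # -> (batch, time, hidden)
        return self.head(h[:, -1])            # logits from the last time step

logits = TCNLSTM(n_features=12, n_classes=3)(torch.randn(8, 150, 12))
```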

Surrogate Assisted Generation of Human-Robot Interaction Scenarios

  • Authors: Varun Bhatt, Heramb Nemlekar, Matthew Fontaine, Bryon Tjanaka, Hejia Zhang, Ya-Chuan Hsu, Stefanos Nikolaidis
  • Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13787
  • Pdf link: https://arxiv.org/pdf/2304.13787
  • Abstract
    As human-robot interaction (HRI) systems advance, so does the difficulty of evaluating and understanding the strengths and limitations of these systems in different environments and with different users. To this end, previous methods have algorithmically generated diverse scenarios that reveal system failures in a shared control teleoperation task. However, these methods require directly evaluating generated scenarios by simulating robot policies and human actions. The computational cost of these evaluations limits their applicability in more complex domains. Thus, we propose augmenting scenario generation systems with surrogate models that predict both human and robot behaviors. In the shared control teleoperation domain and a more complex shared workspace collaboration task, we show that surrogate assisted scenario generation efficiently synthesizes diverse datasets of challenging scenarios. We demonstrate that these failures are reproducible in real-world interactions.

A Data-Driven Hybrid Automaton Framework to Modeling Complex Dynamical Systems

  • Authors: Yejiang Yang, Zihao Mo, Weiming Xiang
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.13811
  • Pdf link: https://arxiv.org/pdf/2304.13811
  • Abstract
    In this paper, a computationally efficient data-driven hybrid automaton model is proposed to capture unknown complex dynamical system behaviors using multiple neural networks. The sampled data of the system are divided by valid partitions into groups corresponding to their topologies, based on which transition guards are defined. Then, a collection of computationally efficient small-scale neural networks is trained as the local dynamical description of the corresponding topologies. After modeling the system with a neural-network-based hybrid automaton, set-valued reachability analysis with low computational cost is provided based on interval analysis and a split-and-combine process. Finally, a numerical example of a limit cycle is presented to illustrate that the developed models can significantly reduce the computational cost of reachable set computation without sacrificing any modeling precision.

Programmatically Grounded, Compositionally Generalizable Robotic Manipulation

  • Authors: Renhao Wang, Jiayuan Mao, Joy Hsu, Hang Zhao, Jiajun Wu, Yang Gao
  • Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.13826
  • Pdf link: https://arxiv.org/pdf/2304.13826
  • Abstract
    Robots operating in the real world require both rich manipulation skills as well as the ability to semantically reason about when to apply those skills. Towards this goal, recent works have integrated semantic representations from large-scale pretrained vision-language (VL) models into manipulation models, imparting them with more general reasoning capabilities. However, we show that the conventional pretraining-finetuning pipeline for integrating such representations entangles the learning of domain-specific action information and domain-general visual information, leading to less data-efficient training and poor generalization to unseen objects and tasks. To this end, we propose ProgramPort, a modular approach to better leverage pretrained VL models by exploiting the syntactic and semantic structures of language instructions. Our framework uses a semantic parser to recover an executable program, composed of functional modules grounded on vision and action across different modalities. Each functional module is realized as a combination of deterministic computation and learnable neural networks. Program execution produces parameters to general manipulation primitives for a robotic end-effector. The entire modular network can be trained with end-to-end imitation learning objectives. Experiments show that our model successfully disentangles action and perception, translating to improved zero-shot and compositional generalization in a variety of manipulation behaviors. Project webpage at: \url{https://progport.github.io}.

Physics-informed Data-driven Discovery of Constitutive Models with Application to Strain-Rate-sensitive Soft Materials

  • Authors: Kshitiz Upadhyay, Jan N. Fuhg, Nikolaos Bouklas, K.T. Ramesh
  • Subjects: Computational Engineering, Finance, and Science (cs.CE); Materials Science (cond-mat.mtrl-sci); Soft Condensed Matter (cond-mat.soft)
  • Arxiv link: https://arxiv.org/abs/2304.13897
  • Pdf link: https://arxiv.org/pdf/2304.13897
  • Abstract
    A novel data-driven constitutive modeling approach is proposed, which combines the physics-informed nature of modeling based on continuum thermodynamics with the benefits of machine learning. This approach is demonstrated on strain-rate-sensitive soft materials. This model is based on the viscous dissipation-based visco-hyperelasticity framework where the total stress is decomposed into volumetric, isochoric hyperelastic, and isochoric viscous overstress contributions. It is shown that each of these stress components can be written as a linear combination of the components of an irreducible integrity basis. Three Gaussian process regression-based surrogate models are trained (one per stress component), mapping the principal invariants of the strain and strain rate tensors to the corresponding coefficients of the integrity basis components. It is demonstrated that this type of model construction enforces key physics-based constraints on the predicted responses: the second law of thermodynamics, the principles of local action and determinism, objectivity, the balance of angular momentum, an assumed reference state, isotropy, and limited memory. The three surrogate models that constitute our constitutive model are evaluated by training them on small-size numerically generated data sets corresponding to a single deformation mode and then analyzing their predictions over a much wider testing regime comprising multiple deformation modes. Our physics-informed data-driven constitutive model predictions are compared with the corresponding predictions of classical continuum thermodynamics-based and purely data-driven models. It is shown that our surrogate models can reasonably capture the stress-strain-strain rate responses in both training and testing regimes, and provide improvements in terms of prediction accuracy, generalizability to multiple deformation modes, and compatibility with limited data.
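
The surrogate construction can be pictured with a small hedged sketch: one Gaussian process per stress contribution, mapping invariants to integrity-basis coefficients. Data, kernel, and shapes below are illustrative assumptions, not the paper's setup.

```python
# One GP surrogate per stress contribution: invariants -> basis coefficients.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(1)
invariants = rng.uniform(0.5, 2.0, size=(200, 2))      # toy (strain, strain-rate) invariants
coeffs = np.sin(invariants[:, 0]) * invariants[:, 1]   # toy target coefficient

surrogates = {}
for name in ("volumetric", "isochoric_hyperelastic", "viscous_overstress"):
    gp = GaussianProcessRegressor(kernel=ConstantKernel(1.0) * RBF(1.0), normalize_y=True)
    gp.fit(invariants, coeffs)                         # in practice: one dataset per component
    surrogates[name] = gp

# Predicted coefficients would then scale integrity-basis tensors to assemble stress.
mean, std = surrogates["volumetric"].predict(invariants[:3], return_std=True)
print(mean, std)
```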

MIPI 2023 Challenge on RGB+ToF Depth Completion: Methods and Results

  • Authors: Qingpeng Zhu, Wenxiu Sun, Yuekun Dai, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Qianhui Sun, Chen Change Loy, Jinwei Gu, Yi Yu, Yangke Huang, Kang Zhang, Meiya Chen, Yu Wang, Yongchao Li, Hao Jiang, Amrit Kumar Muduli, Vikash Kumar, Kunal Swami, Pankaj Kumar Bajpai, Yunchao Ma, Jiajun Xiao, Zhi Ling
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13916
  • Pdf link: https://arxiv.org/pdf/2304.13916
  • Abstract
    Depth completion from RGB images and sparse Time-of-Flight (ToF) measurements is an important problem in computer vision and robotics. While traditional methods for depth completion have relied on stereo vision or structured light techniques, recent advances in deep learning have enabled more accurate and efficient completion of depth maps from RGB images and sparse ToF measurements. To evaluate the performance of different depth completion methods, we organized an RGB+sparse ToF depth completion competition. The competition aimed to encourage research in this area by providing a standardized dataset and evaluation metrics to compare the accuracy of different approaches. In this report, we present the results of the competition and analyze the strengths and weaknesses of the top-performing methods. We also discuss the implications of our findings for future research in RGB+sparse ToF depth completion. We hope that this competition and report will help to advance the state-of-the-art in this important area of research. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2023.

Proportionally Representative Clustering

  • Authors: Haris Aziz, Barton E. Lee, Sean Morota Chu
  • Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2304.13917
  • Pdf link: https://arxiv.org/pdf/2304.13917
  • Abstract
    In recent years, there has been a surge of effort to formalize notions of fairness in machine learning. We focus on clustering -- one of the fundamental tasks in unsupervised machine learning. We propose a new axiom that captures proportional representation fairness (PRF). We make a case that the concept achieves the raison d'être of several existing concepts in the literature in an arguably more convincing manner. Our fairness concept is not satisfied by existing fair clustering algorithms. We design efficient algorithms to achieve PRF both for unconstrained and discrete clustering problems.

SkinSAM: Empowering Skin Cancer Segmentation with Segment Anything Model

  • Authors: Mingzhe Hu, Yuheng Li, Xiaofeng Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13973
  • Pdf link: https://arxiv.org/pdf/2304.13973
  • Abstract
    Skin cancer is a prevalent and potentially fatal disease that requires accurate and efficient diagnosis and treatment. Although manual tracing is the current standard in clinics, automated tools are desired to reduce human labor and improve accuracy. However, developing such tools is challenging due to the highly variable appearance of skin cancers and complex objects in the background. In this paper, we present SkinSAM, a fine-tuned model based on the Segment Anything Model that showed outstanding segmentation performance. The models are validated on the HAM10000 dataset, which includes 10,015 dermatoscopic images. While the larger models (ViT_L, ViT_H) performed better than the smaller one (ViT_b), the fine-tuned model (ViT_b_finetuned) exhibited the greatest improvement, with a mean pixel accuracy of 0.945, a mean Dice score of 0.8879, and a mean IoU score of 0.7843. Among the lesion types, vascular lesions showed the best segmentation results. Our research demonstrates the great potential of adapting SAM to medical image segmentation tasks.

An FPTAS for Budgeted Laminar Matroid Independent Set

  • Authors: Ilan Doron-Arad, Ariel Kulik, Hadas Shachnai
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.13984
  • Pdf link: https://arxiv.org/pdf/2304.13984
  • Abstract
    We study the budgeted laminar matroid independent set problem. The input is a ground set, where each element has a cost and a non-negative profit, along with a laminar matroid over the elements and a budget. The goal is to select a maximum profit independent set of the matroid whose total cost is bounded by the budget. Several well known special cases, where we have, e.g., no matroid constraint (the classic knapsack problem) or a uniform matroid constraint (knapsack with a cardinality constraint), admit a fully polynomial-time approximation scheme (FPTAS). In contrast, the budgeted matroid independent set (BMI) problem with a general matroid has an efficient polynomial-time approximation scheme (EPTAS) but does not admit an FPTAS. This implies an EPTAS for our problem, which is the best known result prior to this work. We present an FPTAS for budgeted laminar matroid independent set, improving the previous EPTAS for this matroid family and generalizing the FPTAS known for knapsack with a cardinality constraint and multiple-choice knapsack. Our scheme is based on a simple dynamic program which utilizes the tree-like structure of laminar matroids.
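
For orientation, the uniform-matroid special case (knapsack with a cardinality bound) admits a simple exact dynamic program over (count, profit) states, which is the kind of profit-indexed DP that FPTAS designs then round. The sketch below is that textbook DP, not the paper's scheme.

```python
# Exact DP for knapsack with a cardinality bound: best[k][p] = min cost to
# reach profit p using exactly k elements (profit-indexed, as in FPTAS designs).
def knapsack_cardinality(items, budget, k_max):
    """items: list of (cost, profit); max profit within budget using at most k_max items."""
    P = sum(p for _, p in items)
    INF = float("inf")
    best = [[INF] * (P + 1) for _ in range(k_max + 1)]
    best[0][0] = 0
    for cost, profit in items:
        for k in range(k_max - 1, -1, -1):        # backwards: each item used at most once
            for p in range(P - profit, -1, -1):
                if best[k][p] + cost < best[k + 1][p + profit]:
                    best[k + 1][p + profit] = best[k][p] + cost
    return max(p for row in best for p, c in enumerate(row) if c <= budget)

print(knapsack_cardinality([(3, 5), (2, 4), (4, 7), (1, 1)], budget=6, k_max=2))  # 11
```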

Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification

  • Authors: Thanh-Tung Nguyen, Viktor Schlegel, Abhinav Kashyap, Stefan Winkler, Shao-Syuan Huang, Jie-Jyun Liu, Chih-Jen Lin
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13998
  • Pdf link: https://arxiv.org/pdf/2304.13998
  • Abstract
    Clinical notes are assigned ICD codes - sets of codes for diagnoses and procedures. In recent years, predictive machine learning models have been built for automatic ICD coding. However, there is a lack of widely accepted benchmarks for automated ICD coding models based on large-scale public EHR data. This paper proposes a public benchmark suite for ICD-10 coding using a large EHR dataset derived from MIMIC-IV, the most recent public EHR dataset. We implement and compare several popular methods for ICD coding prediction tasks to standardize data preprocessing and establish a comprehensive ICD coding benchmark dataset. This approach fosters reproducibility and model comparison, accelerating progress toward employing automated ICD coding in future studies. Furthermore, we create a new ICD-9 benchmark using MIMIC-IV data, providing more data points and a higher number of ICD codes than MIMIC-III. Our open-source code offers easy access to data processing steps, benchmark creation, and experiment replication for those with MIMIC-IV access, providing insights, guidance, and protocols to efficiently develop ICD coding models.

A Supervised Machine Learning Approach to Operator Intent Recognition for Teleoperated Mobile Robot Navigation

  • Authors: Evangelos Tsagkournis, Dimitris Panagopoulos, Giannis Petousakis, Grigoris Nikolaou, Rustam Stolkin, Manolis Chiou
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14003
  • Pdf link: https://arxiv.org/pdf/2304.14003
  • Abstract
    In applications that involve human-robot interaction (HRI), human-robot teaming (HRT), and cooperative human-machine systems, the inference of the human partner's intent is of critical importance. This paper presents a method for the inference of the human operator's navigational intent, in the context of mobile robots that provide full or partial (e.g., shared control) teleoperation. We propose the Machine Learning Operator Intent Inference (MLOII) method, which a) processes spatial data collected by the robot's sensors; b) utilizes a supervised machine learning algorithm to estimate the operator's most probable navigational goal online. The proposed method's ability to reliably and efficiently infer the intent of the human operator is experimentally evaluated in realistically simulated exploration and remote inspection scenarios. The results in terms of accuracy and uncertainty indicate that the proposed method is comparable to another state-of-the-art method found in the literature.

Diagonalization Based Parallel-in-Time Method for a Class of Fourth Order Time Dependent PDEs

  • Authors: Gobinda Garai, Bankim C. Mandal
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14021
  • Pdf link: https://arxiv.org/pdf/2304.14021
  • Abstract
    In this paper, we design, analyze, and implement efficient time-parallel methods for a class of fourth-order time-dependent partial differential equations (PDEs), namely the biharmonic heat equation, the linearized Cahn-Hilliard (CH) equation, and the nonlinear CH equation. We use a diagonalization technique on the all-at-once system to develop efficient iterative time-parallel methods for investigating the solution behaviour of these equations. We present the convergence analysis of the Parallel-in-Time (PinT) algorithms. We verify our findings by presenting numerical results.

Attacks on Robust Distributed Learning Schemes via Sensitivity Curve Maximization

  • Authors: Christian A. Schroth, Stefan Vlaski, Abdelhak M. Zoubir
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.14024
  • Pdf link: https://arxiv.org/pdf/2304.14024
  • Abstract
    Distributed learning paradigms, such as federated or decentralized learning, allow a collection of agents to solve global learning and optimization problems through limited local interactions. Most such strategies rely on a mixture of local adaptation and aggregation steps, either among peers or at a central fusion center. Classically, aggregation in distributed learning is based on averaging, which is statistically efficient, but susceptible to attacks by even a small number of malicious agents. This observation has motivated a number of recent works, which develop robust aggregation schemes by employing robust variations of the mean. We present a new attack based on sensitivity curve maximization (SCM), and demonstrate that it is able to disrupt existing robust aggregation schemes by injecting small, but effective perturbations.
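
The object being maximized is easy to compute numerically. Below is a hedged illustration of a sensitivity curve for two aggregators (generic code, not the paper's attack): the mean's curve grows without bound, while the median's is bounded, and an SCM-style attack searches for the perturbation where the curve peaks.

```python
# Sensitivity curve SC_n(x) = (n+1) * (T(z_1..z_n, x) - T(z_1..z_n)).
import numpy as np

def sensitivity_curve(estimator, sample, x_grid):
    base, n = estimator(sample), len(sample)
    return np.array([(n + 1) * (estimator(np.append(sample, x)) - base) for x in x_grid])

rng = np.random.default_rng(0)
sample = rng.normal(size=50)
xs = np.linspace(-10, 10, 201)
sc_mean = sensitivity_curve(np.mean, sample, xs)      # unbounded: grows linearly in x
sc_median = sensitivity_curve(np.median, sample, xs)  # bounded: its peak guides an SCM attack
print(sc_mean.max(), sc_median.max())
```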

COSST: Multi-organ Segmentation with Partially Labeled Datasets Using Comprehensive Supervisions and Self-training

  • Authors: Han Liu, Zhoubing Xu, Riqiang Gao, Hao Li, Jianing Wang, Guillaume Chabin, Ipek Oguz, Sasa Grbic
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14030
  • Pdf link: https://arxiv.org/pdf/2304.14030
  • Abstract
    Deep learning models have demonstrated remarkable success in multi-organ segmentation but typically require large-scale datasets with all organs of interest annotated. However, medical image datasets are often low in sample size and only partially labeled, i.e., only a subset of organs are annotated. Therefore, it is crucial to investigate how to learn a unified model on the available partially labeled datasets to leverage their synergistic potential. In this paper, we empirically and systematically study partial-label segmentation with in-depth analyses of the existing approaches and identify three distinct types of supervision signals, including two signals derived from ground truth and one from pseudo labels. We propose a novel training framework termed COSST, which effectively and efficiently integrates comprehensive supervision signals with self-training. Concretely, we first train an initial unified model using two ground truth-based signals and then iteratively incorporate the pseudo label signal into the initial model using self-training. To mitigate performance degradation caused by unreliable pseudo labels, we assess the reliability of pseudo labels via outlier detection in latent space and exclude the most unreliable pseudo labels from each self-training iteration. Extensive experiments are conducted on six CT datasets for three partial-label segmentation tasks. Experimental results show that our proposed COSST achieves significant improvement over the baseline method, i.e., individual networks trained on each partially labeled dataset. Compared to the state-of-the-art partial-label segmentation methods, COSST demonstrates consistently superior performance on various segmentation tasks and with different training data sizes.

A Parameterized Theory of PAC Learning

  • Authors: Cornelius Brand, Robert Ganian, Kirill Simonov
  • Subjects: Computational Complexity (cs.CC); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14058
  • Pdf link: https://arxiv.org/pdf/2304.14058
  • Abstract
    Probably Approximately Correct (i.e., PAC) learning is a core concept of sample complexity theory, and efficient PAC learnability is often seen as a natural counterpart to the class P in classical computational complexity. But while the nascent theory of parameterized complexity has allowed us to push beyond the P-NP "dichotomy" in classical computational complexity and identify the exact boundaries of tractability for numerous problems, there is no analogue in the domain of sample complexity that could push beyond efficient PAC learnability. As our core contribution, we fill this gap by developing a theory of parameterized PAC learning which allows us to shed new light on several recent PAC learning results that incorporated elements of parameterized complexity. Within the theory, we identify not one but two notions of fixed-parameter learnability that both form distinct counterparts to the class FPT -- the core concept at the center of the parameterized complexity paradigm -- and develop the machinery required to exclude fixed-parameter learnability. We then showcase the applications of this theory to identify refined boundaries of tractability for CNF and DNF learning as well as for a range of learning problems on graphs.

Fourier-Gegenbauer Pseudospectral Method for Solving Time-Dependent One-Dimensional Fractional Partial Differential Equations with Variable Coefficients and Periodic Solutions

  • Authors: Kareem T. Elgindy
  • Subjects: Numerical Analysis (math.NA); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.14061
  • Pdf link: https://arxiv.org/pdf/2304.14061
  • Abstract
    In this paper, we present a novel pseudospectral (PS) method for solving a new class of initial-value problems (IVPs) of time-dependent one-dimensional fractional partial differential equations (FPDEs) with variable coefficients and periodic solutions. A main ingredient of our work is the use of the recently developed periodic RL/Caputo fractional derivative (FD) operators with sliding positive fixed memory length of Bourafa et al. [1] or their reduced forms obtained by Elgindy [2] as the natural FD operators to accurately model FPDEs with periodic solutions. The proposed method converts the IVP into a well-conditioned linear system of equations using the PS method based on Fourier collocations and Gegenbauer quadratures. The reduced linear system has a simple special structure and can be solved accurately and rapidly by using standard linear system solvers. A rigorous study of the error and convergence of the proposed method is presented. The idea and results presented in this paper are expected to be useful in the future to address more general problems involving FPDEs with periodic solutions.

Lightweight, Pre-trained Transformers for Remote Sensing Timeseries

  • Authors: Gabriel Tseng, Ivan Zvonkov, Mirali Purohit, David Rolnick, Hannah Kerner
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14065
  • Pdf link: https://arxiv.org/pdf/2304.14065
  • Abstract
    Machine learning algorithms for parsing remote sensing data have a wide range of societally relevant applications, but labels used to train these algorithms can be difficult or impossible to acquire. This challenge has spurred research into self-supervised learning for remote sensing data aiming to unlock the use of machine learning in geographies or application domains where labelled datasets are small. Current self-supervised learning approaches for remote sensing data draw significant inspiration from techniques applied to natural images. However, remote sensing data has important differences from natural images -- for example, the temporal dimension is critical for many tasks and data is collected from many complementary sensors. We show that designing models and self-supervised training techniques specifically for remote sensing data results in both smaller and more performant models. We introduce the Pretrained Remote Sensing Transformer (Presto), a transformer-based model pre-trained on remote sensing pixel-timeseries data. Presto excels at a wide variety of globally distributed remote sensing tasks and outperforms much larger models. Presto can be used for transfer learning or as a feature extractor for simple models, enabling efficient deployment at scale.

Linear and Nonlinear Parareal Methods for the Cahn-Hilliard Equation

  • Authors: Gobinda Garai, Bankim C. Mandal
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14074
  • Pdf link: https://arxiv.org/pdf/2304.14074
  • Abstract
    In this paper, we propose, analyze, and implement efficient time-parallel methods for the Cahn-Hilliard (CH) equation. Given the wide range of applications the CH equation has, it is of great importance to develop efficient numerical methods for it. The CH equation generally needs to be simulated for a very long time to resolve the phase-coarsening stage, so it is desirable to accelerate the computation using parallel-in-time methods. We present linear and nonlinear Parareal methods for the CH equation, depending on the choice of the fine approximation. We illustrate our results by numerical experiments.
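
The skeleton of any Parareal method is the same predictor-corrector sweep. A hedged sketch for a scalar ODE u' = f(u) is below; the coarse and fine propagators are simple Euler steps here, not the paper's CH discretizations.

```python
# Parareal: coarse propagator G, fine propagator F, predictor-corrector update
#   U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k).
import numpy as np

def F(u, t0, t1, f, substeps=100):   # fine propagator: many small Euler steps
    dt = (t1 - t0) / substeps
    for _ in range(substeps):
        u = u + dt * f(u)
    return u

def G(u, t0, t1, f):                 # coarse propagator: one Euler step
    return u + (t1 - t0) * f(u)

def parareal(u0, T, N, f, iters=5):
    t = np.linspace(0, T, N + 1)
    U = [u0]
    for n in range(N):               # initial coarse sweep
        U.append(G(U[-1], t[n], t[n + 1], f))
    for _ in range(iters):
        Fk = [F(U[n], t[n], t[n + 1], f) for n in range(N)]  # fine solves (parallelizable)
        Gk = [G(U[n], t[n], t[n + 1], f) for n in range(N)]
        Unew = [u0]
        for n in range(N):           # sequential correction sweep
            Unew.append(G(Unew[n], t[n], t[n + 1], f) + Fk[n] - Gk[n])
        U = Unew
    return np.array(U)

print(parareal(1.0, T=2.0, N=10, f=lambda u: -u)[-1], np.exp(-2.0))  # should agree closely
```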

Lowering the Entry Bar to HPC-Scale Uncertainty Quantification

  • Authors: Linus Seelinger, Anne Reinarz, Jean Benezech, Mikkel Bue Lykkegaard, Lorenzo Tamellini, Robert Scheichl
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14087
  • Pdf link: https://arxiv.org/pdf/2304.14087
  • Abstract
    Treating uncertainties in models is essential in many fields of science and engineering. Uncertainty quantification (UQ) on complex and computationally costly numerical models necessitates a combination of efficient model solvers, advanced UQ methods and HPC-scale resources. These technical complexities, as well as the lack of separation of concerns between UQ and model experts, are holding back many interesting UQ applications. The aim of this paper is to close the gap between advanced UQ methods and advanced models by removing the hurdle of complex software stack integration, which in turn will offer a straightforward way to scale even prototype-grade UQ applications to high-performance resources. We achieve this goal by introducing a parallel software architecture based on UM-Bridge, a universal interface for linking UQ and models. We present three realistic applications from different areas of science and engineering, scaling from single machines to large clusters on the Google Cloud Platform.

Securing Autonomous Air Traffic Management: Blockchain Networks Driven by Explainable AI

  • Authors: Louise Axon, Dimitrios Panagiotakopoulos, Samuel Ayo, Carolina Sanchez-Hernandez, Yan Zong, Simon Brown, Lei Zhang, Michael Goldsmith, Sadie Creese, Weisi Guo
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.14095
  • Pdf link: https://arxiv.org/pdf/2304.14095
  • Abstract
    Air Traffic Management data systems today are inefficient and not scalable to enable future unmanned systems. Current data is fragmented, siloed, and not easily accessible. There is data conflict, misuse, and eroding levels of trust in provenance and accuracy. With increased autonomy in aviation, Artificially Intelligent (AI) enabled unmanned traffic management (UTM) will be more reliant on secure data from diverse stakeholders. There is an urgent need to develop a secure network that has trustworthy data chains and works with the requirements generated by UTM. Here, we review existing research in 3 key interconnected areas: (1) blockchain development for secure data transfer between competing aviation stakeholders, (2) self-learning networking architectures that distribute consensus to achieve secure air traffic control, (3) explainable AI to build trust with human stakeholders and backpropagate requirements for blockchain and network optimisation. When connected together, this new digital ecosystem blueprint is tailored for safety critical UTM sectors. We motivate the reader with a case study in which a federated learning UTM that uses real air traffic and weather data is secured and explained to human operators. This emerging area still requires significant research and development by the community to ensure it can enable future autonomous air mobility.

Learning Neural PDE Solvers with Parameter-Guided Channel Attention

  • Authors: Makoto Takamoto, Francesco Alesiani, Mathias Niepert
  • Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn); Geophysics (physics.geo-ph)
  • Arxiv link: https://arxiv.org/abs/2304.14118
  • Pdf link: https://arxiv.org/pdf/2304.14118
  • Abstract
    Scientific Machine Learning (SciML) is concerned with the development of learned emulators of physical systems governed by partial differential equations (PDE). In application domains such as weather forecasting, molecular dynamics, and inverse design, ML-based surrogate models are increasingly used to augment or replace inefficient and often non-differentiable numerical simulation algorithms. While a number of ML-based methods for approximating the solutions of PDEs have been proposed in recent years, they typically do not adapt to the parameters of the PDEs, making it difficult to generalize to PDE parameters not seen during training. We propose a Channel Attention mechanism guided by PDE Parameter Embeddings (CAPE) component for neural surrogate models and a simple yet effective curriculum learning strategy. The CAPE module can be combined with neural PDE solvers allowing them to adapt to unseen PDE parameters. The curriculum learning strategy provides a seamless transition between teacher-forcing and fully auto-regressive training. We compare CAPE in conjunction with the curriculum learning strategy using a popular PDE benchmark and obtain consistent and significant improvements over the baseline models. The experiments also show several advantages of CAPE, such as its increased ability to generalize to unseen PDE parameters without large increases in inference time and parameter count.
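
A parameter-guided channel attention of this kind can be sketched in a few lines of PyTorch. The module below is our guess at the spirit of CAPE; the names, shapes, and the sigmoid gating are assumptions, not the released code.

```python
# PDE-parameter embedding -> per-channel gates that modulate solver features.
import torch
import torch.nn as nn

class ParamChannelAttention(nn.Module):
    def __init__(self, num_params, channels, hidden=64):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(num_params, hidden), nn.GELU(),
            nn.Linear(hidden, channels),
        )

    def forward(self, feats, pde_params):
        # feats: (B, C, H, W) solver features; pde_params: (B, num_params)
        gates = torch.sigmoid(self.embed(pde_params))   # (B, C), in (0, 1)
        return feats * gates[:, :, None, None]          # channel-wise modulation

attn = ParamChannelAttention(num_params=2, channels=32)
feats = torch.randn(4, 32, 16, 16)
params = torch.tensor([[0.1, 1.0]] * 4)                 # e.g., viscosity, advection speed
print(attn(feats, params).shape)                        # torch.Size([4, 32, 16, 16])
```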

Exploiting Inductive Bias in Transformer for Point Cloud Classification and Segmentation

  • Authors: Zihao Li, Pan Gao, Hui Yuan, Ran Wei, Manoranjan Paul
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14124
  • Pdf link: https://arxiv.org/pdf/2304.14124
  • Abstract
    Discovering inter-point connections for efficient high-dimensional feature extraction from point coordinates is a key challenge in processing point clouds. Most existing methods focus on designing efficient local feature extractors while ignoring global connections, or vice versa. In this paper, we design a new Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point relations, which considers both local and global attention. Specifically, considering local spatial coherence, local feature learning is performed through Relative Position Encoding and Attentive Feature Pooling. We incorporate the learned locality into the Transformer module. The local features affect the value component in the Transformer to modulate the relationship between the channels of each point, which enhances the self-attention mechanism with locality-based channel interaction. We demonstrate its superiority experimentally on classification and segmentation tasks. The code is available at: https://github.com/jiamang/IBT

Human Semantic Segmentation using Millimeter-Wave Radar Sparse Point Clouds

  • Authors: Pengfei Song, Luoyu MEI, Han Cheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); General Topology (math.GN)
  • Arxiv link: https://arxiv.org/abs/2304.14132
  • Pdf link: https://arxiv.org/pdf/2304.14132
  • Abstract
    This paper presents a framework for semantic segmentation on sparse sequential point clouds from millimeter-wave radar. Compared with cameras and lidars, millimeter-wave radars have the advantages of not revealing privacy, strong anti-interference ability, and long detection distance. The sparsity of mmWave data and the difficulty of capturing its temporal-topological features remain open problems; in particular, the challenge of capturing temporal-topological coupling features in the human semantic segmentation task prevents previous advanced segmentation methods (e.g., PointNet, PointCNN, Point Transformer) from being well utilized in practical scenarios. To address the challenges caused by the sparsity and temporal-topological features of the data, we (i) introduce graph structure and topological features to the point cloud and (ii) propose a semantic segmentation framework comprising a global feature-extracting module and a sequential feature-extracting module. In addition, we design an efficient and better-fitting loss function, based on graph clustering, for an improved training process and segmentation results. Experimentally, we deploy representative semantic segmentation algorithms (Transformer, GCNN, etc.) on a custom dataset. Experimental results indicate that our model achieves a mean accuracy of $\mathbf{82.31}\%$ on the custom dataset and outperforms the state-of-the-art algorithms. Moreover, to validate the model's robustness, we deploy our model on the well-known S3DIS dataset. On the S3DIS dataset, our model achieves a mean accuracy of $\mathbf{92.6}\%$, outperforming baseline algorithms.

Multiplicity Problems on Algebraic Series and Context-Free Grammars

  • Authors: Nikhil Balaji, Lorenzo Clemente, Klara Nosan, Mahsa Shirmohammadi, James Worrell
  • Subjects: Formal Languages and Automata Theory (cs.FL); Computational Complexity (cs.CC)
  • Arxiv link: https://arxiv.org/abs/2304.14145
  • Pdf link: https://arxiv.org/pdf/2304.14145
  • Abstract
    In this paper we obtain complexity bounds for computational problems on algebraic power series over several commuting variables. The power series are specified by systems of polynomial equations: a formalism closely related to weighted context-free grammars. We focus on three problems -- decide whether a given algebraic series is identically zero, determine whether all but finitely many coefficients are zero, and compute the coefficient of a specific monomial. We relate these questions to well-known computational problems on arithmetic circuits and thereby show that all three problems lie in the counting hierarchy. Our main result improves the best known complexity bound on deciding zeroness of an algebraic series. This problem is known to lie in PSPACE by reduction to the decision problem for the existential fragment of the theory of real closed fields. Here we show that the problem lies in the counting hierarchy by reduction to the problem of computing the degree of a polynomial given by an arithmetic circuit. As a corollary we obtain new complexity bounds on multiplicity equivalence of context-free grammars restricted to a bounded language, language inclusion of a nondeterministic finite automaton in an unambiguous context-free grammar, and language inclusion of a non-deterministic context-free grammar in an unambiguous finite automaton.

Tractability of sampling recovery on unweighted function classes

  • Authors: David Krieg
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14169
  • Pdf link: https://arxiv.org/pdf/2304.14169
  • Abstract
    It is well-known that the problem of sampling recovery in the $L_2$-norm on unweighted Korobov spaces (Sobolev spaces with mixed smoothness) as well as classical smoothness classes such as Hölder classes suffers from the curse of dimensionality. We show that the problem is tractable for those classes if they are intersected with the Wiener algebra of functions with summable Fourier coefficients. In fact, this is a relatively simple implication of powerful results by Rauhut and Ward [Appl. Comput. Harmon. Anal. 40 (2016), pp. 321--351]. Tractability is achieved by the use of non-linear algorithms, while linear algorithms cannot do the job.

The Mutual Information In The Vicinity of Capacity-Achieving Input Distributions

  • Authors: Hao-Chung Cheng, Barış Nakiboğlu
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2304.14219
  • Pdf link: https://arxiv.org/pdf/2304.14219
  • Abstract
    The mutual information is analyzed as a function of the input distribution using an identity due to Topsøe for channels with (possibly multiple) linear cost constraints and finite input and output sets. The mutual information is bounded above by a function decreasing quadratically with the distance to the set of all capacity-achieving input distributions for the case when the distance is less than a certain threshold. The closed-form expressions for the threshold and the coefficient of the quadratic decrease are derived. A counter-example demonstrating the non-existence of such a quadratic bound in the case of infinitely many linear cost constraints is provided. Implications of these observations for the channel coding problem and applications of the proof technique to related problems are discussed.
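
Schematically, the bound has the following shape (the symbols below are ours; the paper derives the exact threshold and coefficient in closed form):

```latex
% With capacity C, capacity-achieving set \Pi^{*}, input distribution p, and a
% distance d(\cdot,\cdot) to \Pi^{*}:
I(p) \;\le\; C \;-\; \gamma\, d(p, \Pi^{*})^{2}
\qquad \text{whenever } d(p, \Pi^{*}) < \delta,
% for channel-dependent constants \gamma, \delta > 0.
```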

Developing Distributed High-performance Computing Capabilities of an Open Science Platform for Robust Epidemic Analysis

  • Authors: Nicholson Collier, Justin M. Wozniak, Abby Stevens, Yadu Babuji, Mickaël Binois, Ardindam Fadikar, Alexandra Würth, Kyle Chard, Jonathan Ozik
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.14244
  • Pdf link: https://arxiv.org/pdf/2304.14244
  • Abstract
    COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among domain experts, mathematical modelers, and scientific computing specialists. Computationally, however, it also revealed critical gaps in the ability of researchers to exploit advanced computing systems. These challenging areas include gaining access to scalable computing systems, porting models and workflows to new systems, sharing data of varying sizes, and producing results that can be reproduced and validated by others. Informed by our team's work in supporting public health decision makers during the COVID-19 pandemic and by the identified capability gaps in applying high-performance computing (HPC) to the modeling of complex social systems, we present the goals, requirements, and initial implementation of OSPREY, an open science platform for robust epidemic analysis. The prototype implementation demonstrates an integrated, algorithm-driven HPC workflow architecture, coordinating tasks across federated HPC resources, with robust, secure and automated access to each of the resources. We demonstrate scalable and fault-tolerant task execution, an asynchronous API to support fast time-to-solution algorithms, an inclusive, multi-language approach, and efficient wide-area data management. The example OSPREY code is made available on a public repository.

Evaluating the Impact of Pair Documentation on Requirements Quality and Team Productivity

  • Authors: Nosheen Qamar, Nosheen Sabahat, Amir Mashmool, Amir Mosavi
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.14255
  • Pdf link: https://arxiv.org/pdf/2304.14255
  • Abstract
    The most important deliverable of the requirements engineering process is the software requirements specification (SRS) document. Requirements documentation is important throughout the software development lifecycle for sharing the vision and enabling effective communication between major stakeholders. The Standish Group reported that the top factors behind project failures are related to requirements. By giving the right level of attention to key requirements, good quality software can be produced. Therefore, more research is needed in this area, and this study tries to fill that gap. This empirical study examines the importance of pair documentation: an unconventional approach in which two people work collaboratively on the same requirements document, just like pair programming, and its effect on requirements quality and team productivity. Twenty pairs of documentation writers worked in two groups: one group using pair documentation, i.e., the experimental group, and the other using conventional documentation, i.e., the control group. The resulting requirements documents for the same project, produced by both groups, were then compared. We observed a significant improvement in the quality and productivity of the experimental group using pair documentation. The findings of this study may assist requirements engineers in forming efficient teams that can create high-quality SRS documents.

A Survey on Approximate Edge AI for Energy Efficient Autonomous Driving Services

  • Authors: Dewant Katare, Diego Perino, Jari Nurmi, Martijn Warnier, Marijn Janssen, Aaron Yi Ding
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14271
  • Pdf link: https://arxiv.org/pdf/2304.14271
  • Abstract
    Autonomous driving services rely heavily on sensors such as cameras, LiDAR, radar, and communication modules. A common practice of processing the sensed data is using a high-performance computing unit placed inside the vehicle, which deploys AI models and algorithms to act as the brain or administrator of the vehicle. The vehicular data generated from average hours of driving can be up to 20 Terabytes depending on the data rate and specification of the sensors. Given the scale and fast growth of services for autonomous driving, it is essential to improve the overall energy and environmental efficiency, especially in the trend towards vehicular electrification (e.g., battery-powered). Although these areas have seen significant advancements in sensor technologies, wireless communications, computing and AI/ML algorithms, the challenge remains of how to apply and integrate these technology innovations to achieve energy efficiency. This survey reviews and compares the connected vehicular applications, vehicular communications, approximation and Edge AI techniques. The focus is on energy efficiency by covering newly proposed approximation and enabling frameworks. To the best of our knowledge, this survey is the first to review the latest approximate Edge AI frameworks and publicly available datasets in energy-efficient autonomous driving. The insights and vision from this survey can be beneficial for the collaborative driving service development on low-power and memory-constrained systems and also for the energy optimization of autonomous vehicles.

On Solution Discovery via Reconfiguration

  • Authors: Michael R. Fellows, Mario Grobler, Nicole Megow, Amer E. Mouawad, Vijayaragunathan Ramamoorthi, Frances A. Rosamond, Daniel Schmand, Sebastian Siebertz
  • Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.14295
  • Pdf link: https://arxiv.org/pdf/2304.14295
  • Abstract
    The dynamics of real-world applications and systems require efficient methods for improving infeasible solutions or restoring corrupted ones by making modifications to the current state of a system in a restricted way. We propose a new framework of solution discovery via reconfiguration for constructing a feasible solution for a given problem by executing a sequence of small modifications starting from a given state. Our framework integrates and formalizes different aspects of classical local search, reoptimization, and combinatorial reconfiguration. We exemplify our framework on a multitude of fundamental combinatorial problems, namely Vertex Cover, Independent Set, Dominating Set, and Coloring. We study the classical as well as the parameterized complexity of the solution discovery variants of those problems and explore the boundary between tractable and intractable instances.

Incremental Generalized Category Discovery

  • Authors: Bingchen Zhao, Oisin Mac Aodha
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14310
  • Pdf link: https://arxiv.org/pdf/2304.14310
  • Abstract
    We explore the problem of Incremental Generalized Category Discovery (IGCD). This is a challenging category incremental learning setting where the goal is to develop models that can correctly categorize images from previously seen categories, in addition to discovering novel ones. Learning is performed over a series of time steps where the model obtains new labeled and unlabeled data, and discards old data, at each iteration. The difficulty of the problem is compounded in our generalized setting as the unlabeled data can contain images from categories that may or may not have been observed before. We present a new method for IGCD which combines non-parametric categorization with efficient image sampling to mitigate catastrophic forgetting. To quantify performance, we propose a new benchmark dataset named iNatIGCD that is motivated by a real-world fine-grained visual categorization task. In our experiments we outperform existing related methods.

Empirical Individual State Observability

  • Authors: Benjamin Cellini, Burak Boyacıoğlu, Floris van Breugel
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.14313
  • Pdf link: https://arxiv.org/pdf/2304.14313
  • Abstract
    A dynamical system is observable if there is a one-to-one mapping from the system's measured outputs and inputs to all of the system's states. Analytical and empirical tools exist for quantifying the (full state) observability of linear and nonlinear systems; however, empirical tools for evaluating the observability of individual state variables are lacking. Here, a new empirical approach termed Empirical Individual State Observability (E-ISO) is developed to quantify the level of observability of individual state variables. E-ISO first builds an empirical observability matrix via simulation, then applies convex optimization to efficiently determine the subset of its rows required to estimate each state variable individually. Finally, (un)observability measures for these subsets are calculated to provide independent estimates of the observability of each state variable. Multiple example applications of E-ISO on linear and nonlinear systems are shown to be consistent with analytical results. Broadly, E-ISO will be an invaluable tool both for designing active sensing control laws or optimizing sensor placement to increase the observability of individual state variables for engineered systems, and analyzing the trajectory decisions made by organisms.
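
The first step of E-ISO, building an empirical observability matrix by simulation, can be sketched directly. Below is a hedged toy version; the convex row-selection step and the (un)observability measures are omitted.

```python
# Perturb each initial state, simulate, and stack output differences as columns.
import numpy as np

def empirical_observability(sim, x0, n_steps, eps=1e-4):
    """sim(x0, n_steps) -> outputs (n_steps, n_y); returns (n_steps*n_y, n_x)."""
    cols = []
    for i in range(len(x0)):
        d = np.zeros(len(x0)); d[i] = eps
        cols.append(((sim(x0 + d, n_steps) - sim(x0 - d, n_steps)) / (2 * eps)).ravel())
    return np.column_stack(cols)

def sim(x0, n_steps, dt=0.01):       # toy oscillator x' = [x2, -x1], output y = x1
    x, ys = np.array(x0, float), []
    for _ in range(n_steps):
        x = x + dt * np.array([x[1], -x[0]])
        ys.append([x[0]])
    return np.array(ys)

O = empirical_observability(sim, np.array([1.0, 0.0]), n_steps=200)
print(np.linalg.matrix_rank(O))      # 2: both states recoverable from y = x1
```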

SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection

  • Authors: Yichen Xie, Chenfeng Xu, Marie-Julie Rakotosaona, Patrick Rim, Federico Tombari, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14340
  • Pdf link: https://arxiv.org/pdf/2304.14340
  • Abstract
    By identifying four important components of existing LiDAR-camera 3D object detection methods (LiDAR and camera candidates, transformation, and fusion outputs), we observe that all existing methods either find dense candidates or yield dense representations of scenes. However, given that objects occupy only a small part of a scene, finding dense candidates and generating dense representations is noisy and inefficient. We propose SparseFusion, a novel multi-sensor 3D detection method that exclusively uses sparse candidates and sparse representations. Specifically, SparseFusion utilizes the outputs of parallel detectors in the LiDAR and camera modalities as sparse candidates for fusion. We transform the camera candidates into the LiDAR coordinate space by disentangling the object representations. Then, we can fuse the multi-modality candidates in a unified 3D space by a lightweight self-attention module. To mitigate negative transfer between modalities, we propose novel semantic and geometric cross-modality transfer modules that are applied prior to the modality-specific detectors. SparseFusion achieves state-of-the-art performance on the nuScenes benchmark while also running at the fastest speed, even outperforming methods with stronger backbones. We perform extensive experiments to demonstrate the effectiveness and efficiency of our modules and overall method pipeline. Our code will be made publicly available at https://github.com/yichen928/SparseFusion.

$π$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation

  • Authors: Chengyue Wu, Teng Wang, Yixiao Ge, Zeyu Lu, Ruisong Zhou, Ping Luo, Ying Shan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14381
  • Pdf link: https://arxiv.org/pdf/2304.14381
  • Abstract
    Foundation models have achieved great advances in multi-task learning with a unified interface of unimodal and multimodal tasks. However, the potential of such multi-task learners has not been exploited during transfer learning. In this work, we present a universal parameter-efficient transfer learning method, termed Predict-Interpolate Tuning ($\pi$-Tuning), for vision, language, and vision-language tasks. It aggregates the parameters of lightweight task-specific experts learned from similar tasks to aid the target downstream task. The task similarities are predicted in a unified modality-independent space, yielding a scalable graph to demonstrate task relationships. $\pi$-Tuning has several appealing benefits. First, it flexibly explores both intra- and inter-modal transferability between similar tasks to improve the accuracy and robustness of transfer learning, especially in data-scarce scenarios. Second, it offers a systematical solution for transfer learning with multi-task prediction-and-then-interpolation, compatible with diverse types of parameter-efficient experts, such as prompt and adapter. Third, an extensive study of task-level mutual benefits on 14 unimodal and 6 multimodal datasets shows that $\pi$-Tuning surpasses fine-tuning and other parameter-efficient transfer learning methods both in full-shot and low-shot regimes. The task graph also enables an in-depth interpretable analysis of task transferability across modalities.
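
The heart of the method, interpolating expert parameters by predicted task similarity, can be sketched plainly. This is our simplification, not the released code; the softmax weighting is an assumption.

```python
# Merge task experts: similarity-weighted average of their parameters.
import torch

def interpolate_experts(expert_state_dicts, similarities):
    """expert_state_dicts: list of state_dicts; similarities: per-expert scores."""
    w = torch.softmax(torch.as_tensor(similarities, dtype=torch.float32), dim=0)
    return {k: sum(w[i] * sd[k] for i, sd in enumerate(expert_state_dicts))
            for k in expert_state_dicts[0]}

e1 = {"weight": torch.ones(4, 4), "bias": torch.zeros(4)}     # toy adapter experts
e2 = {"weight": 3 * torch.ones(4, 4), "bias": torch.ones(4)}
merged = interpolate_experts([e1, e2], similarities=[0.8, 0.2])
print(merged["weight"][0, 0])   # closer to 1 than to 3, reflecting the similarity weights
```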

Dynamic Pricing and Learning with Bayesian Persuasion

  • Authors: Shipra Agrawal, Yiding Feng, Wei Tang
  • Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14385
  • Pdf link: https://arxiv.org/pdf/2304.14385
  • Abstract
    We consider a novel dynamic pricing and learning setting where in addition to setting prices of products in sequential rounds, the seller also ex-ante commits to 'advertising schemes'. That is, in the beginning of each round the seller can decide what kind of signal they will provide to the buyer about the product's quality upon realization. Using the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses, we formulate the problem of finding an optimal design of the advertising scheme along with a pricing scheme that maximizes the seller's expected revenue. Without any apriori knowledge of the buyers' demand function, our goal is to design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy. We study the regret of the algorithm when compared to the optimal clairvoyant price and advertising scheme. Our main result is a computationally efficient online algorithm that achieves an $O(T^{2/3}(m\log T)^{1/3})$ regret bound when the valuation function is linear in the product quality. Here $m$ is the cardinality of the discrete product quality domain and $T$ is the time horizon. This result requires some natural monotonicity and Lipschitz assumptions on the valuation function, but no Lipschitz or smoothness assumption on the buyers' demand function. For constant $m$, our result matches the regret lower bound for dynamic pricing within logarithmic factors, which is a special case of our problem. We also obtain several improved results for the widely considered special case of additive valuations, including an $\tilde{O}(T^{2/3})$ regret bound independent of $m$ when $m\le T^{1/3}$.

string2string: A Modern Python Library for String-to-String Algorithms

  • Authors: Mirac Suzgun, Stuart M. Shieber, Dan Jurafsky
  • Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL)
  • Arxiv link: https://arxiv.org/abs/2304.14395
  • Pdf link: https://arxiv.org/pdf/2304.14395
  • Abstract
    We introduce string2string, an open-source library that offers a comprehensive suite of efficient algorithms for a broad range of string-to-string problems. It includes traditional algorithmic solutions as well as recent advanced neural approaches to tackle various problems in string alignment, distance measurement, lexical and semantic search, and similarity analysis -- along with several helpful visualization tools and metrics to facilitate the interpretation and analysis of these methods. Notable algorithms featured in the library include the Smith-Waterman algorithm for pairwise local alignment, the Hirschberg algorithm for global alignment, the Wagner-Fischer algorithm for edit distance, BARTScore and BERTScore for similarity analysis, the Knuth-Morris-Pratt algorithm for lexical search, and Faiss for semantic search. In addition, it wraps existing efficient and widely used implementations of certain frameworks and metrics, such as sacreBLEU and ROUGE, wherever appropriate. Overall, the library aims to provide extensive coverage and increased flexibility in comparison to existing libraries for strings. It can be used for many downstream applications, tasks, and problems in natural-language processing, bioinformatics, and computational social sciences. It is implemented in Python, easily installable via pip, and accessible through a simple API. Source code, documentation, and tutorials are all available on our GitHub page: https://github.com/stanfordnlp/string2string.
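
To illustrate one of the classic algorithms the library covers, here is a standalone Wagner-Fischer edit-distance sketch; this is generic code, not string2string's API (see the project's GitHub page for the library's actual interface).

```python
# Wagner-Fischer edit distance with a rolling row to save memory.
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            curr[j] = min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # substitution (0 if match)
            )
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```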

Maximizing Model Generalization for Manufacturing with Self-Supervised Learning and Federated Learning

  • Authors: Matthew Russell, Peng Wang
  • Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.14398
  • Pdf link: https://arxiv.org/pdf/2304.14398
  • Abstract
    Deep Learning (DL) can diagnose faults and assess machine health from raw condition monitoring data without manually designed statistical features. However, practical manufacturing applications remain extremely difficult for existing DL methods. Machine data is often unlabeled and from very few health conditions (e.g., only normal operating data). Furthermore, models often encounter shifts in domain as process parameters change and new categories of faults emerge. Traditional supervised learning may struggle to learn compact, discriminative representations that generalize to these unseen target domains since it depends on having plentiful classes to partition the feature space with decision boundaries. Transfer Learning (TL) with domain adaptation attempts to adapt these models to unlabeled target domains but assumes similar underlying structure that may not be present if new faults emerge. This study proposes focusing on maximizing the feature generality on the source domain and applying TL via weight transfer to copy the model to the target domain. Specifically, Self-Supervised Learning (SSL) with Barlow Twins may produce more discriminative features for monitoring health condition than supervised learning by focusing on semantic properties of the data. Furthermore, Federated Learning (FL) for distributed training may also improve generalization by efficiently expanding the effective size and diversity of training data by sharing information across multiple client machines. Results show that Barlow Twins outperforms supervised learning in an unlabeled target domain with emerging motor faults when the source training data contains very few distinct categories. Incorporating FL may also provide a slight advantage by diffusing knowledge of health conditions between machines.
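
For reference, here is a generic PyTorch rendering of the published Barlow Twins objective the study builds on; this is not the authors' training code, and the lambda value is the commonly used default.

```python
# Barlow Twins: drive the cross-correlation of two views toward the identity.
import torch

def barlow_twins_loss(z1, z2, lam=5e-3):
    # z1, z2: (N, D) embeddings of two augmentations of the same batch
    z1 = (z1 - z1.mean(0)) / z1.std(0)
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    c = (z1.T @ z2) / z1.shape[0]                                # (D, D) cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()               # invariance term
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # redundancy reduction
    return on_diag + lam * off_diag

z1, z2 = torch.randn(128, 64), torch.randn(128, 64)
print(float(barlow_twins_loss(z1, z2)))
```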

Keyword: faster

Physics-informed neural networks for predicting gas flow dynamics and unknown parameters in diesel engines

  • Authors: Kamaljyoti Nath, Xuhui Meng, Daniel J Smith, George Em Karniadakis
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13799
  • Pdf link: https://arxiv.org/pdf/2304.13799
  • Abstract
    This paper presents a physics-informed neural network (PINN) approach for monitoring the health of diesel engines. The aim is to evaluate the engine dynamics, identify unknown parameters in a "mean value" model, and anticipate maintenance requirements. The PINN model is applied to diesel engines with a variable-geometry turbocharger and exhaust gas recirculation, using measurement data of selected state variables. The results demonstrate the ability of the PINN model to predict simultaneously both unknown parameters and dynamics accurately with both clean and noisy data, and the importance of the self-adaptive weight in the loss function for faster convergence. The input data for these simulations are derived from actual engine running conditions, while the outputs are simulated data, making this a practical case study of PINN's ability to predict real-world dynamical systems. The mean value model of the diesel engine incorporates empirical formulae to represent certain states, but these formulae may not be generalizable to other engines. To address this, the study considers the use of deep neural networks (DNNs) in addition to the PINN model. The DNNs are trained using laboratory test data and are used to model the engine-specific empirical formulae in the mean value model, allowing for a more flexible and adaptive representation of the engine's states. In other words, the mean value model uses both the PINN model and the DNNs to represent the engine's states, with the PINN providing a physics-based understanding of the engine's overall dynamics and the DNNs offering a more engine-specific and adaptive representation of the empirical formulae. By combining these two approaches, the study aims to offer a comprehensive and versatile approach to monitoring the health and performance of diesel engines.
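
The self-adaptive loss weighting credited with faster convergence can be sketched generically (a hedged PyTorch toy; the paper's exact weighting scheme may differ): each loss term carries a learnable log-weight trained jointly with the network.

```python
# Uncertainty-style adaptive weighting: sum_i exp(-s_i) * L_i + s_i.
import torch
import torch.nn as nn

class AdaptiveWeightedLoss(nn.Module):
    def __init__(self, n_terms):
        super().__init__()
        self.log_w = nn.Parameter(torch.zeros(n_terms))  # one learnable weight per term

    def forward(self, losses):
        return sum(torch.exp(-s) * L + s for s, L in zip(self.log_w, losses))

net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
crit = AdaptiveWeightedLoss(n_terms=2)
opt = torch.optim.Adam(list(net.parameters()) + list(crit.parameters()), lr=1e-3)

x = torch.rand(64, 2)
data_loss = (net(x) - torch.sin(x[:, :1])).pow(2).mean()  # toy data-fit term
phys_loss = net(x).pow(2).mean()                          # stand-in physics residual
loss = crit([data_loss, phys_loss])
loss.backward(); opt.step()
print(float(loss))
```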

A Survey on Solving and Discovering Differential Equations Using Deep Neural Networks

  • Authors: Hyeonjung (Tari) Jung, Jayant Gupta, Bharat Jayaprakash, Matthew Eagon, Harish Panneer Selvam, Carl Molnar, William Northrop, Shashi Shekhar
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2304.13807
  • Pdf link: https://arxiv.org/pdf/2304.13807
  • Abstract
    Ordinary and partial differential equations (DE) are used extensively in scientific and mathematical domains to model physical systems. Current literature has focused primarily on deep neural network (DNN) based methods for solving a specific DE or a family of DEs. Research communities with a history of using DE models may view DNN-based differential equation solvers (DNN-DEs) as a faster and transferable alternative to current numerical methods. However, there is a lack of systematic surveys detailing the use of DNN-DE methods across physical application domains and a generalized taxonomy to guide future research. This paper surveys and classifies previous works and provides an educational tutorial for senior practitioners, professionals, and graduate students in engineering and computer science. First, we propose a taxonomy to navigate domains of DE systems studied under the umbrella of DNN-DE. Second, we examine the theory and performance of the Physics Informed Neural Network (PINN) to demonstrate how the influential DNN-DE architecture mathematically solves a system of equations. Third, to reinforce the key ideas of solving and discovery of DEs using DNN, we provide a tutorial using DeepXDE, a Python package for developing PINNs, to develop DNN-DEs for solving and discovering a classic DE, the linear transport equation.

Variational Bayes Made Easy

  • Authors: Mohammad Emtiyaz Khan
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.14251
  • Pdf link: https://arxiv.org/pdf/2304.14251
  • Abstract
    Variational Bayes is a popular method for approximate inference but its derivation can be cumbersome. To simplify the process, we give a 3-step recipe to identify the posterior form by explicitly looking for linearity with respect to expectations of well-known distributions. We can then directly write the update by simply "reading off" the terms in front of those expectations. The recipe makes the derivation easier, faster, shorter, and more general.
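
As a textbook illustration of the reading-off step (our example, not the paper's): with a Gaussian likelihood of known variance and a conjugate Gaussian prior on the mean, the log-joint is linear in the sufficient statistics of the mean, so the posterior form can be read off directly.

```latex
% Gaussian likelihood N(y_n | \mu, \sigma^2), prior N(\mu | \mu_0, \sigma_0^2):
\log p(\mathcal{D}, \mu)
  = \Big( \tfrac{\sum_n y_n}{\sigma^2} + \tfrac{\mu_0}{\sigma_0^2} \Big)\,\mu
  \;-\; \tfrac{1}{2} \Big( \tfrac{N}{\sigma^2} + \tfrac{1}{\sigma_0^2} \Big)\,\mu^2
  \;+\; \text{const},
% so q(\mu) is Gaussian, with natural parameters equal to the two coefficients
% in front of \mu and \mu^2.
```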

Keyword: mobile

AI-based Predictive Analytic Approaches for safeguarding the Future of Electric/Hybrid Vehicles

  • Authors: Ishan Shivansh Bangroo
  • Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.13841
  • Pdf link: https://arxiv.org/pdf/2304.13841
  • Abstract
    In response to the global need for sustainable energy, green technology may help fight climate change, but before green infrastructure can be easily integrated into the world's energy system, it needs upgrading. By improving energy infrastructure and decision-making, artificial intelligence (AI) may help solve this challenge. Electric and hybrid vehicles (EHVs) have grown in popularity due to concerns about global warming and the need for more ecologically friendly transportation, and they may work better with cutting-edge technologies like AI. Electric vehicles (EVs) reduce greenhouse gas emissions and promote sustainable mobility, and they are growing in popularity due to their benefits for climate change mitigation. Unfortunately, EV production consumes a lot of energy and materials, which may harm the environment; it is therefore being improved using green technologies such as artificial intelligence and predictive analysis. The Battery Management System (BMS) controls EHV performance and longevity, and AI may improve EHV energy efficiency, emissions reduction, and sustainability. Remote hijacking, security breaches, and unauthorized access are EHV cybersecurity vulnerabilities addressed in the article. AI research and development may help make transportation more sustainable, as may optimizing EHVs and charging infrastructure.

Detecting inner-LAN anomalies using hierarchical forecasting

  • Authors: Sevvandi Kandanaarachchi, Mahdi Abolghasemi, Hideya Ochiai, Asha Rao
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.13941
  • Pdf link: https://arxiv.org/pdf/2304.13941
  • Abstract
    Increasing online activity and numbers of connected devices are leading to more frequent and more diverse cyber attacks. This continuously evolving attack activity makes signature-based detection methods ineffective. Once malware has infiltrated a LAN, bypassing an external gateway or entering via an unsecured mobile device, it can potentially infect all nodes in the LAN as well as carry out nefarious activities such as stealing valuable data, leading to financial damage and loss of reputation. Such infiltration could be viewed as an insider attack, increasing the need for LAN monitoring and security. In this paper we aim to detect such inner-LAN activity by studying the variations in Address Resolution Protocol (ARP) calls within the LAN. We find anomalous nodes by modelling inner-LAN traffic using hierarchical forecasting methods. We substantially reduce the false positives ever present in anomaly detection by using a method based on extreme value theory. We use a dataset from a real inner-LAN monitoring project, containing over 10M ARP calls from 362 nodes. Furthermore, the small number of false positives generated using our methods is a potential solution to the "alert fatigue" commonly reported by security experts.
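
The extreme-value-theory step can be pictured as a peaks-over-threshold fit on forecast residuals; a hedged sketch (the threshold choice, the generalized Pareto fit via SciPy, and the alerting rule are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np
from scipy import stats

# Hypothetical post-processing stage: fit a generalized Pareto tail to forecast
# residuals (peaks over threshold) and alert only on very small tail probabilities.
rng = np.random.default_rng(1)
residuals = np.abs(rng.standard_normal(10_000))   # stand-in for |observed - forecast|

u = np.quantile(residuals, 0.98)                  # high threshold
excesses = residuals[residuals > u] - u
c, _, scale = stats.genpareto.fit(excesses, floc=0.0)

def tail_probability(x):
    """P(residual > x) for x > u under the fitted tail."""
    p_exceed_u = np.mean(residuals > u)
    return p_exceed_u * stats.genpareto.sf(x - u, c, scale=scale)

alert = tail_probability(4.0) < 1e-4              # flag only extremely unlikely residuals
print(alert)
```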

A Review of Panoptic Segmentation for Mobile Mapping Point Clouds

  • Authors: Binbin Xiang, Yuanwen Yue, Torben Peters, Konrad Schindler
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.13980
  • Pdf link: https://arxiv.org/pdf/2304.13980
  • Abstract
    3D point cloud panoptic segmentation is the combined task to (i) assign each point to a semantic class and (ii) separate the points in each class into object instances. Recently there has been an increased interest in such comprehensive 3D scene understanding, building on the rapid advances of semantic segmentation due to the advent of deep 3D neural networks. Yet, to date there is very little work about panoptic segmentation of outdoor mobile-mapping data, and no systematic comparisons. The present paper tries to close that gap. It reviews the building blocks needed to assemble a panoptic segmentation pipeline and the related literature. Moreover, a modular pipeline is set up to perform comprehensive, systematic experiments to assess the state of panoptic segmentation in the context of street mapping. As a byproduct, we also provide the first public dataset for that task, by extending the NPM3D dataset to include instance labels.

A Supervised Machine Learning Approach to Operator Intent Recognition for Teleoperated Mobile Robot Navigation

  • Authors: Evangelos Tsagkournis, Dimitris Panagopoulos, Giannis Petousakis, Grigoris Nikolaou, Rustam Stolkin, Manolis Chiou
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14003
  • Pdf link: https://arxiv.org/pdf/2304.14003
  • Abstract
    In applications that involve human-robot interaction (HRI), human-robot teaming (HRT), and cooperative human-machine systems, the inference of the human partner's intent is of critical importance. This paper presents a method for the inference of the human operator's navigational intent, in the context of mobile robots that provide full or partial (e.g., shared control) teleoperation. We propose the Machine Learning Operator Intent Inference (MLOII) method, which a) processes spatial data collected by the robot's sensors; b) utilizes a supervised machine learning algorithm to estimate the operator's most probable navigational goal online. The proposed method's ability to reliably and efficiently infer the intent of the human operator is experimentally evaluated in realistically simulated exploration and remote inspection scenarios. The results in terms of accuracy and uncertainty indicate that the proposed method is comparable to another state-of-the-art method found in the literature.

MCLFIQ: Mobile Contactless Fingerprint Image Quality

  • Authors: Jannis Priesnitz, Axel Weißenfeld, Christian Rathgeb, Bernhard Strobl, Ralph Lessmann, Christoph Busch
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14123
  • Pdf link: https://arxiv.org/pdf/2304.14123
  • Abstract
    We propose MCLFIQ: Mobile Contactless Fingerprint Image Quality, the first quality assessment algorithm for mobile contactless fingerprint samples. To this end, we retrained the NIST Fingerprint Image Quality (NFIQ) 2 method, which was originally designed for contact-based fingerprints, with a synthetic contactless fingerprint database. We evaluate the predictive performance of the resulting MCLFIQ model in terms of Error-vs.-Discard Characteristic (EDC) curves on three real-world contactless fingerprint databases using two recognition algorithms. In experiments, the MCLFIQ method is compared against the original NFIQ 2 method and a sharpness-based quality assessment algorithm developed for contactless fingerprint images. The obtained results show that re-training NFIQ 2 on synthetic data is a viable alternative to training on real databases. Moreover, the evaluation shows that our MCLFIQ method is more accurate and robust than NFIQ 2 and the sharpness-based quality assessment. We suggest considering the proposed MCLFIQ method as a candidate for a new standard algorithm for contactless fingerprint quality assessment.

Combining HoloLens with Instant-NeRFs: Advanced Real-Time 3D Mobile Mapping

  • Authors: Dennis Haitz, Boris Jutzi, Markus Ulrich, Miriam Jaeger, Patrick Huebner
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14301
  • Pdf link: https://arxiv.org/pdf/2304.14301
  • Abstract
    This work represents a large step into modern ways of fast 3D reconstruction based on RGB camera images. Utilizing a Microsoft HoloLens 2 as a multisensor platform that includes an RGB camera and an inertial measurement unit for SLAM-based camera-pose determination, we train a Neural Radiance Field (NeRF) as a neural scene representation in real-time with the acquired data from the HoloLens. The HoloLens is connected via Wi-Fi to a high-performance PC that is responsible for the training and 3D reconstruction. After the data stream ends, the training is stopped and the 3D reconstruction is initiated, which extracts a point cloud of the scene. With our specialized inference algorithm, five million scene points can be extracted within 1 second. In addition, the point cloud also includes radiometry per point. Our method of 3D reconstruction outperforms grid point sampling with NeRFs by multiple orders of magnitude and can be regarded as a complete real-time 3D reconstruction method in a mobile mapping setup.

A Versatile Low-Complexity Feedback Scheme for FDD Systems via Generative Modeling

  • Authors: Nurettin Turan, Benedikt Fesl, Michael Koller, Michael Joham, Wolfgang Utschick
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.14373
  • Pdf link: https://arxiv.org/pdf/2304.14373
  • Abstract
    In this work, we propose a versatile feedback scheme which can be deployed for both single- and multi-user multiple-input multiple-output (MIMO) frequency division duplex (FDD) systems. Particularly, we propose to use a Gaussian mixture model (GMM) with a reduced number of parameters for codebook construction, feedback encoding, and precoder design. The GMM is fitted offline at the base station (BS) to uplink (UL) training samples to approximate the channel distribution of all possible mobile terminals (MTs) located inside the BS cell. Afterwards, a codebook is constructed, where each codebook entry is based on one GMM component. By extracting directional information of the constructed codebook, the proposed GMM-based feedback approach allows to jointly design the precoders of a multi-user MIMO (MU-MIMO) system using common precoding algorithms. Alternatively, the GMM's sample generation ability can be utilized to design the precoders using a state-of-the-art stochastic iterative algorithm. After offloading the GMM to the MTs, they determine their feedback simply as the index of the GMM component with the highest responsibility for their received pilot signal. This strategy exhibits low complexity and allows for parallelization. Simulation results show that the proposed approach outperforms conventional methods, especially for a reduced number of pilots.
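
The feedback rule in the abstract reduces to an argmax over component responsibilities; a hedged sketch using scikit-learn's GaussianMixture as a stand-in (the toy real-valued features and dimensions are illustrative assumptions; actual CSI is complex-valued and would be stacked as real and imaginary parts):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Offline at the BS: fit a GMM to uplink training samples (toy stand-in data).
rng = np.random.default_rng(0)
uplink_samples = rng.standard_normal((5000, 16))
gmm = GaussianMixture(n_components=16, covariance_type="full", random_state=0)
gmm.fit(uplink_samples)

# At the MT: feedback is simply the index of the component with the highest
# responsibility for the received pilot observation.
pilot_observation = rng.standard_normal((1, 16))
responsibilities = gmm.predict_proba(pilot_observation)   # shape (1, n_components)
feedback_index = int(np.argmax(responsibilities))
print(feedback_index)
```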

Keyword: pruning

Fine Tuning with Abnormal Examples

  • Authors: Will Rieger
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.13783
  • Pdf link: https://arxiv.org/pdf/2304.13783
  • Abstract
    Given the prevalence of crowd-sourced labor in creating Natural Language Processing datasets, these datasets have become increasingly large. For instance, the SQUAD dataset currently sits at over 80,000 records. However, because the English language is rather repetitive in structure, the distribution of word frequencies in the SQUAD dataset's contexts is relatively unchanged. By measuring each sentence's distance from the covariate distribution of word frequencies over all sentences in the dataset, we identify 10,500 examples that create a more uniform distribution for training. Fine-tuning ELECTRA [4] on this subset of examples reaches better performance than a model trained on all 87,000 examples. Herein we introduce a methodology for systematically pruning datasets for fine-tuning, reaching better out-of-sample performance.
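
The selection idea can be pictured, under our own reading (the paper's exact covariate distance is not spelled out above), as scoring sentences by how far their unigram-frequency profile sits from the corpus-level profile and keeping the most distant examples; cosine distance and the toy corpus below are illustrative assumptions:

```python
from collections import Counter
import numpy as np

# Hedged reconstruction: keep the sentences whose word-frequency profiles are
# farthest from the corpus average, flattening the training distribution.
sentences = ["the cat sat on the mat", "quantum chromodynamics is hard",
             "the dog sat on the rug", "rare tokens change everything"]
tokenized = [s.split() for s in sentences]
vocab = sorted({w for toks in tokenized for w in toks})
index = {w: i for i, w in enumerate(vocab)}

def freq_vector(tokens):
    v = np.zeros(len(vocab))
    for w, n in Counter(tokens).items():
        v[index[w]] = n
    return v / v.sum()

corpus_profile = np.mean([freq_vector(t) for t in tokenized], axis=0)

def cosine_distance(a, b):
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = [cosine_distance(freq_vector(t), corpus_profile) for t in tokenized]
keep = np.argsort(scores)[-2:]          # keep the most "abnormal" examples
print([sentences[i] for i in keep])
```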

JaxPruner: A concise library for sparsity research

  • Authors: Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Karolina Dziugaite, Pablo Samuel Castro, Utku Evci
  • Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.14082
  • Pdf link: https://arxiv.org/pdf/2304.14082
  • Abstract
    This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which, in turn, enables easy integration with existing JAX based libraries. We demonstrate this ease of integration by providing examples in four different codebases: Scenic, t5x, Dopamine and FedJAX and provide baseline experiments on popular benchmarks.
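
JaxPruner's own API is not reproduced here; as a generic, self-contained illustration of the one-shot magnitude-pruning primitive that such libraries wrap, a hedged JAX sketch:

```python
import jax
import jax.numpy as jnp

def magnitude_prune(params, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of each weight array."""
    def prune_leaf(w):
        k = int(sparsity * w.size)
        if k == 0:
            return w
        threshold = jnp.sort(jnp.abs(w).ravel())[k - 1]
        return jnp.where(jnp.abs(w) > threshold, w, 0.0)
    return jax.tree_util.tree_map(prune_leaf, params)

params = {"w": jnp.array([0.1, -2.0, 0.03, 1.5]), "b": jnp.array([0.5, -0.01])}
print(magnitude_prune(params, 0.5))   # small-magnitude entries become exact zeros
```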

Keyword: voxel

There is no result

Keyword: lidar

Human Semantic Segmentation using Millimeter-Wave Radar Sparse Point Clouds

  • Authors: Pengfei Song, Luoyu MEI, Han Cheng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); General Topology (math.GN)
  • Arxiv link: https://arxiv.org/abs/2304.14132
  • Pdf link: https://arxiv.org/pdf/2304.14132
  • Abstract
    This paper presents a framework for semantic segmentation on sparse sequential point clouds of millimeter-wave radar. Compared with cameras and lidars, millimeter-wave radars have the advantage of not revealing privacy, having a strong anti-interference ability, and having a long detection distance. However, the sparsity of mmWave data and the difficulty of capturing its temporal-topological features remain open problems; in particular, the issue of capturing temporal-topological coupling features in the human semantic segmentation task prevents previous advanced segmentation methods (e.g., PointNet, PointCNN, Point Transformer) from being well utilized in practical scenarios. To address the challenges caused by the sparsity and temporal-topological features of the data, we (i) introduce graph structure and topological features to the point cloud and (ii) propose a semantic segmentation framework including a global feature-extracting module and a sequential feature-extracting module. In addition, we design an efficient and better-fitting loss function, based on graph clustering, for an improved training process and segmentation results. Experimentally, we deploy representative semantic segmentation algorithms (Transformer, GCNN, etc.) on a custom dataset. Experimental results indicate that our model achieves a mean accuracy of 82.31% on the custom dataset and outperforms the state-of-the-art algorithms. Moreover, to validate the model's robustness, we deploy our model on the well-known S3DIS dataset, where it achieves a mean accuracy of 92.6%, outperforming baseline algorithms.

Quadric Representations for LiDAR Odometry, Mapping and Localization

  • Authors: Chao Xia, Chenfeng Xu, Patrick Rim, Mingyu Ding, Nanning Zheng, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14190
  • Pdf link: https://arxiv.org/pdf/2304.14190
  • Abstract
    Current LiDAR odometry, mapping and localization methods leverage point-wise representations of 3D scenes and achieve high accuracy in autonomous driving tasks. However, the space-inefficiency of methods that use point-wise representations limits their development and usage in practical applications. In particular, scan-submap matching and global map representation methods are restricted by the inefficiency of nearest neighbor searching (NNS) for large-volume point clouds. To improve space-time efficiency, we propose a novel method of describing scenes using quadric surfaces, which are far more compact representations of 3D objects than conventional point clouds. In contrast to point cloud-based methods, our quadric representation-based method decomposes a 3D scene into a collection of sparse quadric patches, which improves storage efficiency and avoids the slow point-wise NNS process. Our method first segments a given point cloud into patches and fits each of them to a quadric implicit function. Each function is then coupled with other geometric descriptors of the patch, such as its center position and covariance matrix. Collectively, these patch representations fully describe a 3D scene, which can be used in place of the original point cloud and employed in LiDAR odometry, mapping and localization algorithms. We further design a novel incremental growing method for quadric representations, which eliminates the need to repeatedly re-fit quadric surfaces from the original point cloud. Extensive odometry, mapping and localization experiments on large-volume point clouds in the KITTI and UrbanLoco datasets demonstrate that our method maintains low latency and memory utility while achieving competitive, and even superior, accuracy.
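
The patch-fitting step ("fits each of them to a quadric implicit function") can be realized with an algebraic least-squares fit; a hedged sketch of one standard way to do it, not necessarily the paper's exact procedure:

```python
import numpy as np

def fit_quadric(points):
    """Least-squares fit of a general quadric
    a x^2 + b y^2 + c z^2 + d xy + e yz + f zx + g x + h y + i z + j = 0
    to an (n, 3) patch, via the smallest right singular vector."""
    x, y, z = points.T
    D = np.column_stack([x*x, y*y, z*z, x*y, y*z, z*x, x, y, z, np.ones_like(x)])
    _, _, Vh = np.linalg.svd(D, full_matrices=False)
    return Vh[-1]          # coefficients, determined up to scale

# Example: points on the unit sphere should recover x^2 + y^2 + z^2 - 1 = 0.
rng = np.random.default_rng(0)
p = rng.standard_normal((200, 3))
p /= np.linalg.norm(p, axis=1, keepdims=True)
q = fit_quadric(p)
print(q / q[0])            # approx [1, 1, 1, 0, 0, 0, 0, 0, 0, -1]
```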

A Survey on Approximate Edge AI for Energy Efficient Autonomous Driving Services

  • Authors: Dewant Katare, Diego Perino, Jari Nurmi, Martijn Warnier, Marijn Janssen, Aaron Yi Ding
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14271
  • Pdf link: https://arxiv.org/pdf/2304.14271
  • Abstract
    Autonomous driving services rely heavily on sensors such as cameras, LiDAR, radar, and communication modules. A common practice of processing the sensed data is using a high-performance computing unit placed inside the vehicle, which deploys AI models and algorithms to act as the brain or administrator of the vehicle. The vehicular data generated from average hours of driving can be up to 20 Terabytes depending on the data rate and specification of the sensors. Given the scale and fast growth of services for autonomous driving, it is essential to improve the overall energy and environmental efficiency, especially in the trend towards vehicular electrification (e.g., battery-powered). Although the areas have seen significant advancements in sensor technologies, wireless communications, computing and AI/ML algorithms, the challenge still exists in how to apply and integrate those technology innovations to achieve energy efficiency. This survey reviews and compares the connected vehicular applications, vehicular communications, approximation and Edge AI techniques. The focus is on energy efficiency by covering newly proposed approximation and enabling frameworks. To the best of our knowledge, this survey is the first to review the latest approximate Edge AI frameworks and publicly available datasets in energy-efficient autonomous driving. The insights and vision from this survey can be beneficial for the collaborative driving service development on low-power and memory-constrained systems and also for the energy optimization of autonomous vehicles.

SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection

  • Authors: Yichen Xie, Chenfeng Xu, Marie-Julie Rakotosaona, Patrick Rim, Federico Tombari, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14340
  • Pdf link: https://arxiv.org/pdf/2304.14340
  • Abstract
    By identifying four important components of existing LiDAR-camera 3D object detection methods (LiDAR and camera candidates, transformation, and fusion outputs), we observe that all existing methods either find dense candidates or yield dense representations of scenes. However, given that objects occupy only a small part of a scene, finding dense candidates and generating dense representations is noisy and inefficient. We propose SparseFusion, a novel multi-sensor 3D detection method that exclusively uses sparse candidates and sparse representations. Specifically, SparseFusion utilizes the outputs of parallel detectors in the LiDAR and camera modalities as sparse candidates for fusion. We transform the camera candidates into the LiDAR coordinate space by disentangling the object representations. Then, we can fuse the multi-modality candidates in a unified 3D space by a lightweight self-attention module. To mitigate negative transfer between modalities, we propose novel semantic and geometric cross-modality transfer modules that are applied prior to the modality-specific detectors. SparseFusion achieves state-of-the-art performance on the nuScenes benchmark while also running at the fastest speed, even outperforming methods with stronger backbones. We perform extensive experiments to demonstrate the effectiveness and efficiency of our modules and overall method pipeline. Our code will be made publicly available at https://github.com/yichen928/SparseFusion.

SMAT: A Self-Reinforcing Framework for Simultaneous Mapping and Tracking in Unbounded Urban Environments

  • Authors: Tingxiang Fan, Bowen Shen, Yinqiang Zhang, Chuye Zhang, Lei Yang, Hua Chen, Wei Zhang, Jia Pan
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14356
  • Pdf link: https://arxiv.org/pdf/2304.14356
  • Abstract
    With the increasing prevalence of robots in daily life, it is crucial to enable robots to construct a reliable map online to navigate in unbounded and changing environments. Although existing methods can individually achieve the goals of spatial mapping and dynamic object detection and tracking, limited research has been conducted on an effective combination of these two important abilities. The proposed framework, SMAT (Simultaneous Mapping and Tracking), integrates the front-end dynamic object detection and tracking module with the back-end static mapping module using a self-reinforcing mechanism, which promotes mutual improvement of mapping and tracking performance. The conducted experiments demonstrate the framework's effectiveness in real-world applications, achieving successful long-range navigation and mapping in multiple urban environments using only one LiDAR, a CPU-only onboard computer, and a consumer-level GPS receiver.

Keyword: diffusion

Towards ethical multimodal systems

  • Authors: Alexis Roger, Esma Aïmeur, Irina Rish
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13765
  • Pdf link: https://arxiv.org/pdf/2304.13765
  • Abstract
    The impact of artificial intelligence systems on our society is increasing at an unprecedented speed. For instance, ChatGPT is being tested in mental health treatment applications such as Koko, Stable Diffusion generates pieces of art competitive with (or outperforming) human artists, and so on. Ethical concerns regarding the behavior and applications of generative AI systems have been increasing over the past years, and the field of AI alignment - steering the behavior of AI systems towards being aligned with human values - is a rapidly growing subfield of modern AI. In this paper, we address the challenges involved in the ethical evaluation of a multimodal artificial intelligence system. The multimodal systems we focus on take both text and an image as input and output text, completing the sentence or answering the question asked as input. We perform the evaluation of these models in two steps: we first discuss the creation of a multimodal ethical database and then use this database to construct morality-evaluating algorithms. The creation of the multimodal ethical database is done interactively through human feedback. Users are presented with multiple examples and vote on whether they are ethical or not. Once these answers have been aggregated into a dataset, we built and tested different algorithms to automatically evaluate the morality of multimodal systems. These algorithms aim to classify the answers as ethical or not. The models we tested are a RoBERTa-large classifier and a multilayer perceptron classifier.

Preserving Superconvergence of Spectral Elements for Curved Domains via $h$ and $p$-Geometric Refinement

  • Authors: Jacob Jones, Rebecca Conley, Xiangmin Jiao
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13766
  • Pdf link: https://arxiv.org/pdf/2304.13766
  • Abstract
    Spectral element methods (SEM), which are extensions of finite element methods (FEM), are important emerging techniques for solving partial differential equations in physics and engineering. SEM can potentially deliver better accuracy due to the potential superconvergence for well-shaped tensor-product elements. However, for complex geometries, the accuracy of SEM often degrades due to a combination of geometric inaccuracies near curved boundaries and the loss of superconvergence with simplicial or non-tensor-product elements. We propose to overcome the first issue by using $h$- and $p$-geometric refinement, to refine the mesh near high-curvature regions and increase the degree of geometric basis functions, respectively. We show that when using mixed-meshes with tensor-product elements in the interior of the domain, curvature-based geometric refinement near boundaries can improve the accuracy of the interior elements by reducing pollution errors and preserving the superconvergence. To overcome the second issue, we apply a post-processing technique to recover the accuracy near the curved boundaries by using the adaptive extended stencil finite element method (AES-FEM). The combination of curvature-based geometric refinement and accurate post-processing delivers an effective and easier-to-implement alternative to other methods based on exact geometries. We demonstrate our techniques by solving the convection-diffusion equation in 2D and show one to two orders of magnitude of improvement in the solution accuracy, even when the elements are poorly shaped near boundaries.

Multimodal Composite Association Score: Measuring Gender Bias in Generative Multimodal Models

  • Authors: Abhishek Mandal, Susan Leavy, Suzanne Little
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13855
  • Pdf link: https://arxiv.org/pdf/2304.13855
  • Abstract
    Generative multimodal models based on diffusion models have seen tremendous growth and advances in recent years. Models such as DALL-E and Stable Diffusion have become increasingly popular and successful at creating images from texts, often combining abstract ideas. However, like other deep learning models, they also reflect social biases they inherit from their training data, which is often crawled from the internet. Manually auditing models for biases can be very time- and resource-consuming and is further complicated by the unbounded and unconstrained nature of inputs these models can take. Research into bias measurement and quantification has generally focused on small single-stage models working on a single modality. Thus the emergence of multistage multimodal models requires a different approach. In this paper, we propose the Multimodal Composite Association Score (MCAS) as a new method of measuring gender bias in multimodal generative models. Evaluating both DALL-E 2 and Stable Diffusion using this approach uncovered the presence of gendered associations of concepts embedded within the models. We propose MCAS as an accessible and scalable method of quantifying potential bias for models with different modalities and a range of potential biases.

Two kinds of numerical algorithms for ultra-slow diffusion equations

  • Authors: Min Cai, Changpin Li, Yu Wang
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13966
  • Pdf link: https://arxiv.org/pdf/2304.13966
  • Abstract
    In this article, two kinds of numerical algorithms are derived for the ultra-slow (or superslow) diffusion equation in one and two space dimensions, where the ultra-slow diffusion is characterized by the Caputo-Hadamard fractional derivative of order $\alpha \in (0,1)$. To describe the spatial interaction, the Riesz fractional derivative and the fractional Laplacian are used in one and two space dimensions, respectively. The Caputo-Hadamard derivative is discretized by two typical approximate formulae, i.e., the L2-1$_{\sigma}$ and L1-2 methods. The spatial fractional derivatives are discretized by 2nd-order finite difference methods. When the L2-1$_{\sigma}$ discretization is used, the derived numerical scheme is unconditionally stable with error estimate $\mathcal{O}(\tau^{2}+h^{2})$ for all $\alpha \in (0, 1)$, in which $\tau$ and $h$ are the temporal and spatial stepsizes, respectively. When the L1-2 discretization is used, the derived numerical scheme is stable with error estimate $\mathcal{O}(\tau^{3-\alpha}+h^{2})$ for $\alpha \in (0, 0.3738)$. The illustrative examples displayed are in line with the theoretical analysis.
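
For reference, the standard definition (our addition, not specific to this paper) of the Caputo-Hadamard derivative of order $\alpha \in (0,1)$ with base point $a > 0$ is

$${}^{CH}D^{\alpha}_{a}\,u(t) = \frac{1}{\Gamma(1-\alpha)} \int_{a}^{t} \left(\log\frac{t}{s}\right)^{-\alpha} \delta u(s)\,\frac{\mathrm{d}s}{s}, \qquad \delta u(s) := s\,u'(s);$$

its logarithmic kernel is what produces the characteristic ultra-slow, logarithmic-in-time relaxation that power-law (Caputo) kernels do not capture.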

Edit Everything: A Text-Guided Generative System for Images Editing

  • Authors: Defeng Xie, Ruichen Wang, Jian Ma, Chen Chen, Haonan Lu, Dong Yang, Fobo Shi, Xiaodong Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14006
  • Pdf link: https://arxiv.org/pdf/2304.14006
  • Abstract
    We introduce a new generative system called Edit Everything, which can take image and text inputs and produce image outputs. Edit Everything allows users to edit images using simple text instructions. Our system designs prompts to guide the visual module in generating requested images. Experiments demonstrate that Edit Everything facilitates the implementation of the visual aspects of Stable Diffusion with the use of the Segment Anything model and CLIP. Our system is publicly available at https://github.com/DefengXie/Edit_Everything.

Localized orthogonal decomposition for a multiscale parabolic stochastic partial differential equation

  • Authors: Annika Lang, Per Ljung, Axel Målqvist
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14049
  • Pdf link: https://arxiv.org/pdf/2304.14049
  • Abstract
    A multiscale method is proposed for a parabolic stochastic partial differential equation with additive noise and highly oscillatory diffusion. The framework is based on the localized orthogonal decomposition (LOD) method and computes a coarse-scale representation of the elliptic operator, enriched by fine-scale information on the diffusion. Optimal order strong convergence is derived. The LOD technique is combined with a (multilevel) Monte-Carlo estimator and the weak error is analyzed. Numerical examples that confirm the theoretical findings are provided, and the computational efficiency of the method is highlighted.

DataComp: In search of the next generation of multimodal datasets

  • Authors: Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14108
  • Pdf link: https://arxiv.org/pdf/2304.14108
  • Abstract
    Large multimodal datasets have been instrumental in recent breakthroughs such as CLIP, Stable Diffusion, and GPT-4. At the same time, datasets rarely receive the same research attention as model architectures or training algorithms. To address this shortcoming in the machine learning ecosystem, we introduce DataComp, a benchmark where the training code is fixed and researchers innovate by proposing new training sets. We provide a testbed for dataset experiments centered around a new candidate pool of 12.8B image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing on 38 downstream test sets. Our benchmark consists of multiple scales, with four candidate pool sizes and associated compute budgets ranging from 12.8M to 12.8B samples seen during training. This multi-scale design facilitates the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow is a promising way of improving multimodal datasets. We introduce DataComp-1B, a dataset created by applying a simple filtering algorithm to the 12.8B candidate pool. The resulting 1.4B subset enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet. Our new ViT-L/14 model outperforms a larger ViT-g/14 trained on LAION-2B by 0.7 percentage points while requiring 9x less training compute. We also outperform OpenAI's CLIP ViT-L/14 by 3.7 percentage points, which is trained with the same compute budget as our model. These gains highlight the potential for improving model performance by carefully curating training sets. We view DataComp-1B as only the first step and hope that DataComp paves the way toward the next generation of multimodal datasets.

Functional Diffusion Maps

  • Authors: María Barroso, Carlos María Alaíz, Ángela Fernández, Jose Luis Torrecilla
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.14378
  • Pdf link: https://arxiv.org/pdf/2304.14378
  • Abstract
    Nowadays many real-world datasets can be considered functional, in the sense that the processes which generate them are continuous. A fundamental property of this type of data is that in theory they belong to an infinite-dimensional space. Although in practice we usually receive finite observations, they are still high-dimensional and hence dimensionality reduction methods are crucial. In this vein, the main state-of-the-art method for functional data analysis is Functional PCA. Nevertheless, this classic technique assumes that the data lie in a linear manifold, and hence it could have problems when this hypothesis is not fulfilled. In this research, attention has been placed on a non-linear manifold learning method: Diffusion Maps. The article explains how to extend this multivariate method to functional data and compares its behavior against Functional PCA on different simulated and real examples.
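
For readers new to the method, a compact sketch of vanilla diffusion maps applied to discretized curves, as a stand-in for the functional extension studied in the paper (the kernel bandwidth and toy sinusoid dataset are illustrative assumptions):

```python
import numpy as np

def diffusion_maps(X, eps, n_components=2):
    """Vanilla diffusion maps on the rows of X. For functional data, each row
    is a curve sampled on a common grid, so squared Euclidean distances stand
    in for L2 distances between functions."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / eps)                      # Gaussian kernel
    P = K / K.sum(axis=1, keepdims=True)       # row-stochastic Markov matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Drop the trivial constant eigenvector; scale coordinates by eigenvalues.
    return vecs[:, 1:n_components + 1] * vals[1:n_components + 1]

# Toy functional dataset: sinusoids with varying phase.
t = np.linspace(0, 1, 100)
phases = np.random.default_rng(0).uniform(0, 2 * np.pi, 150)
X = np.sin(2 * np.pi * t[None, :] + phases[:, None])
embedding = diffusion_maps(X, eps=10.0)
print(embedding.shape)    # (150, 2); phases trace out a circle in the embedding
```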

Motion-Conditioned Diffusion Model for Controllable Video Synthesis

  • Authors: Tsai-Shien Chen, Chieh Hubert Lin, Hung-Yu Tseng, Tsung-Yi Lin, Ming-Hsuan Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14404
  • Pdf link: https://arxiv.org/pdf/2304.14404
  • Abstract
    Recent advancements in diffusion models have greatly improved the quality and diversity of synthesized content. To harness the expressive power of diffusion models, researchers have explored various controllable mechanisms that allow users to intuitively guide the content synthesis process. Although the latest efforts have primarily focused on video synthesis, there has been a lack of effective methods for controlling and describing desired content and motion. In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis. To tackle the ambiguity of sparse motion inputs and achieve better synthesis quality, MCDiff first utilizes a flow completion model to predict the dense video motion based on the semantic understanding of the video frame and the sparse motion control. Then, the diffusion model synthesizes high-quality future frames to form the output video. We qualitatively and quantitatively show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis. Additional experiments on MPII Human Pose further exhibit the capability of our model on diverse content and motion synthesis.

Putting People in Their Place: Affordance-Aware Human Insertion into Scenes

  • Authors: Sumith Kulal, Tim Brooks, Alex Aiken, Jiajun Wu, Jimei Yang, Jingwan Lu, Alexei A. Efros, Krishna Kumar Singh
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14406
  • Pdf link: https://arxiv.org/pdf/2304.14406
  • Abstract
    We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes. Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances. Our model can infer the set of realistic poses given the scene context, re-pose the reference person, and harmonize the composition. We set up the task in a self-supervised fashion by learning to re-pose humans in video clips. We train a large-scale diffusion model on a dataset of 2.4M video clips that produces diverse plausible poses while respecting the scene context. Given the learned human-scene composition, our model can also hallucinate realistic people and scenes when prompted without conditioning, and it also enables interactive editing. A quantitative evaluation shows that our method synthesizes more realistic human appearance and more natural human-scene interactions than prior work.

Keyword: dynamic

TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation

  • Authors: Zhaoyan Liu, Noel Vouitsis, Satya Krishna Gorti, Jimmy Ba, Gabriel Loaiza-Ganem
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.13742
  • Pdf link: https://arxiv.org/pdf/2304.13742
  • Abstract
    We propose TR0N, a highly general framework to turn pre-trained unconditional generative models, such as GANs and VAEs, into conditional models. The conditioning can be highly arbitrary, and requires only a pre-trained auxiliary model. For example, we show how to turn unconditional models into class-conditional ones with the help of a classifier, and also into text-to-image models by leveraging CLIP. TR0N learns a lightweight stochastic mapping which "translates" between the space of conditions and the latent space of the generative model, in such a way that the generated latent corresponds to a data sample satisfying the desired condition. The translated latent samples are then further improved upon through Langevin dynamics, enabling us to obtain higher-quality data samples. TR0N requires no training data nor fine-tuning, yet can achieve a zero-shot FID of 10.9 on MS-COCO, outperforming competing alternatives not only on this metric, but also in sampling speed -- all while retaining a much higher level of generality. Our code is available at https://github.com/layer6ai-labs/tr0n.
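
The Langevin refinement step can be pictured as unadjusted Langevin dynamics on an energy over the latent; a hedged PyTorch sketch (the quadratic toy energy, step size, and step count are illustrative assumptions, not TR0N's actual objective):

```python
import torch

def langevin_refine(z, energy, steps=20, step_size=1e-2):
    """Unadjusted Langevin dynamics on an energy (a negative log-density):
    gradient descent on the energy plus Gaussian noise at each step."""
    z = z.detach().clone()
    for _ in range(steps):
        z.requires_grad_(True)
        grad, = torch.autograd.grad(energy(z).sum(), z)
        with torch.no_grad():
            z = z - 0.5 * step_size * grad \
                + (step_size ** 0.5) * torch.randn_like(z)
    return z.detach()

# Toy usage: samples drift toward the minimum of a quadratic energy at 3.
z0 = torch.zeros(8, 2)
z_refined = langevin_refine(z0, lambda z: ((z - 3.0) ** 2).sum(dim=-1))
print(z_refined.mean())
```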

Physics-informed neural networks for predicting gas flow dynamics and unknown parameters in diesel engines

  • Authors: Kamaljyoti Nath, Xuhui Meng, Daniel J Smith, George Em Karniadakis
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.13799
  • Pdf link: https://arxiv.org/pdf/2304.13799
  • Abstract
    This paper presents a physics-informed neural network (PINN) approach for monitoring the health of diesel engines. The aim is to evaluate the engine dynamics, identify unknown parameters in a "mean value" model, and anticipate maintenance requirements. The PINN model is applied to diesel engines with a variable-geometry turbocharger and exhaust gas recirculation, using measurement data of selected state variables. The results demonstrate the ability of the PINN model to simultaneously and accurately predict both unknown parameters and dynamics, with both clean and noisy data, and the importance of the self-adaptive weight in the loss function for faster convergence. The input data for these simulations are derived from actual engine running conditions, while the outputs are simulated data, making this a practical case study of PINN's ability to predict real-world dynamical systems. The mean value model of the diesel engine incorporates empirical formulae to represent certain states, but these formulae may not be generalizable to other engines. To address this, the study considers the use of deep neural networks (DNNs) in addition to the PINN model. The DNNs are trained using laboratory test data and are used to model the engine-specific empirical formulae in the mean value model, allowing for a more flexible and adaptive representation of the engine's states. In other words, the mean value model uses both the PINN model and the DNNs to represent the engine's states, with the PINN providing a physics-based understanding of the engine's overall dynamics and the DNNs offering a more engine-specific and adaptive representation of the empirical formulae. By combining these two approaches, the study aims to offer a comprehensive and versatile approach to monitoring the health and performance of diesel engines.
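
The self-adaptive weighting highlighted above is commonly formulated in the self-adaptive PINN literature (our gloss under that assumption, not the paper's exact equations) as a saddle-point objective,

$$\min_{\theta}\,\max_{\lambda_r,\,\lambda_d \ge 0}\; \lambda_{r}\,\big\|\mathcal{N}[u_{\theta}]\big\|^{2} + \lambda_{d}\,\big\|u_{\theta} - u_{\text{data}}\big\|^{2},$$

where the network parameters $\theta$ are trained by gradient descent and the weights $\lambda$ by gradient ascent, so loss terms with stubbornly large residuals are automatically up-weighted, which is what accelerates convergence.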

A Data-Driven Hybrid Automaton Framework to Modeling Complex Dynamical Systems

  • Authors: Yejiang Yang, Zihao Mo, Weiming Xiang
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Dynamical Systems (math.DS)
  • Arxiv link: https://arxiv.org/abs/2304.13811
  • Pdf link: https://arxiv.org/pdf/2304.13811
  • Abstract
    In this paper, a computationally efficient data-driven hybrid automaton model is proposed to capture unknown complex dynamical system behaviors using multiple neural networks. The sampled data of the system is divided by valid partitions into groups corresponding to their topologies, based on which transition guards are defined. Then, a collection of computationally efficient small-scale neural networks is trained as the local dynamical descriptions for their corresponding topologies. After modeling the system with a neural-network-based hybrid automaton, a set-valued reachability analysis with low computation cost is provided based on interval analysis and a split-and-combine process. Finally, a numerical example of the limit cycle is presented to illustrate that the developed models can significantly reduce the computational cost in reachable set computation without sacrificing any modeling precision.

Controlled density transport using Perron Frobenius generators

  • Authors: Jake Buzhardt, Phanindra Tallapragada
  • Subjects: Systems and Control (eess.SY); Robotics (cs.RO); Fluid Dynamics (physics.flu-dyn)
  • Arxiv link: https://arxiv.org/abs/2304.13829
  • Pdf link: https://arxiv.org/pdf/2304.13829
  • Abstract
    We consider the problem of the transport of a density of states from an initial state distribution to a desired final state distribution through a dynamical system with actuation. In particular, we consider the case where the control signal is a function of time, but not space; that is, the same actuation is applied at every point in the state space. This is motivated by several problems in fluid mechanics, such as mixing and manipulation of a collection of particles by a global control input such as a uniform magnetic field, as well as by more general control problems where a density function describes an uncertainty distribution or a distribution of agents in a multi-agent system. We formulate this problem using the generators of the Perron-Frobenius operator associated with the drift and control vector fields of the system. By considering finite-dimensional approximations of these operators, the density transport problem can be expressed as a control problem for a bilinear system in a high-dimensional, lifted state. With this system, we frame the density control problem as a problem of driving moments of the density function to the moments of a desired density function, where the moments of the density can be expressed as an output which is linear in the lifted state. This output tracking problem for the lifted bilinear system is then solved using differential dynamic programming, an iterative trajectory optimization scheme.
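
The bilinear structure can be made explicit: assuming control-affine dynamics $\dot{x} = f(x) + u(t)\,g(x)$ (our gloss of the setup), the density evolves under the Liouville equation

$$\frac{\partial \rho}{\partial t} = \mathcal{L}_{f}\,\rho + u(t)\,\mathcal{L}_{g}\,\rho, \qquad \mathcal{L}_{h}\,\rho := -\nabla \cdot (\rho\, h),$$

so any finite-dimensional approximation of the generators $\mathcal{L}_f$ and $\mathcal{L}_g$ yields $\dot{\mathbf{c}} = (A + u\,B)\,\mathbf{c}$ for the coefficient vector $\mathbf{c}$ of $\rho$, i.e. a bilinear control system in the lifted state.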

Understand the Dynamic World: An End-to-End Knowledge Informed Framework for Open Domain Entity State Tracking

  • Authors: Mingchen Li, Lifu Huang
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13854
  • Pdf link: https://arxiv.org/pdf/2304.13854
  • Abstract
    Open domain entity state tracking aims to predict reasonable state changes of entities (i.e., [attribute] of [entity] was [before_state] and [after_state] afterwards) given the action descriptions. This is important for many reasoning tasks that support everyday human activities. However, it is challenging, as the model needs to predict an arbitrary number of entity state changes caused by the action, while most of the entities are only implicitly relevant to the actions, and their attributes as well as states come from open vocabularies. To tackle these challenges, we propose a novel end-to-end Knowledge Informed framework for open domain Entity State Tracking, namely KIEST, which explicitly retrieves the relevant entities and attributes from an external knowledge graph (i.e., ConceptNet) and incorporates them to autoregressively generate all the entity state changes with a novel dynamic knowledge grained encoder-decoder framework. To enforce the logical coherence among the predicted entities, attributes, and states, we design a new constraint decoding strategy and employ a coherence reward to improve the decoding process. Experimental results show that our proposed KIEST framework significantly outperforms the strong baselines on the public benchmark dataset OpenPI.

Ensoul: A framework for the creation of self organizing intelligent ultra low power systems (SOULS) through evolutionary enerstatic networks

  • Authors: Ty Roachford
  • Subjects: Artificial Intelligence (cs.AI); Adaptation and Self-Organizing Systems (nlin.AO)
  • Arxiv link: https://arxiv.org/abs/2304.13863
  • Pdf link: https://arxiv.org/pdf/2304.13863
  • Abstract
    Ensoul is a framework proposed for the purpose of creating technologies that create more technologies through the combined use of networks, and nests, of energy homeostatic (enerstatic) loops and open-ended evolutionary techniques. Generative technologies developed by such an approach serve as both simple, yet insightful models of thermodynamically driven complex systems and as powerful sources of novel technologies. "Self Organizing intelligent Ultra Low power Systems" (SOULS) is a term that well describes the technologies produced by such a generative technology, as well as the generative technology itself. The term is meant to capture the abstract nature of such technologies as being independent of the substrate in which they are embedded. In other words, SOULS can be biological, artificial or hybrid in form.

Physics-informed Data-driven Discovery of Constitutive Models with Application to Strain-Rate-sensitive Soft Materials

  • Authors: Kshitiz Upadhyay, Jan N. Fuhg, Nikolaos Bouklas, K.T. Ramesh
  • Subjects: Computational Engineering, Finance, and Science (cs.CE); Materials Science (cond-mat.mtrl-sci); Soft Condensed Matter (cond-mat.soft)
  • Arxiv link: https://arxiv.org/abs/2304.13897
  • Pdf link: https://arxiv.org/pdf/2304.13897
  • Abstract
    A novel data-driven constitutive modeling approach is proposed, which combines the physics-informed nature of modeling based on continuum thermodynamics with the benefits of machine learning. This approach is demonstrated on strain-rate-sensitive soft materials. This model is based on the viscous dissipation-based visco-hyperelasticity framework where the total stress is decomposed into volumetric, isochoric hyperelastic, and isochoric viscous overstress contributions. It is shown that each of these stress components can be written as linear combinations of the components of an irreducible integrity basis. Three Gaussian process regression-based surrogate models are trained (one per stress component) between principal invariants of strain and strain rate tensors and the corresponding coefficients of the integrity basis components. It is demonstrated that this type of model construction enforces key physics-based constraints on the predicted responses: the second law of thermodynamics, the principles of local action and determinism, objectivity, the balance of angular momentum, an assumed reference state, isotropy, and limited memory. The three surrogate models that constitute our constitutive model are evaluated by training them on small-size numerically generated data sets corresponding to a single deformation mode and then analyzing their predictions over a much wider testing regime comprising multiple deformation modes. Our physics-informed data-driven constitutive model predictions are compared with the corresponding predictions of classical continuum thermodynamics-based and purely data-driven models. It is shown that our surrogate models can reasonably capture the stress-strain-strain rate responses in both training and testing regimes, and provide improvements in terms of prediction accuracy, generalizability to multiple deformation modes, and compatibility with limited data.

Conditional dominance in games with unawareness

  • Authors: Martin Meier, Burkhard C. Schipper
  • Subjects: Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2304.13901
  • Pdf link: https://arxiv.org/pdf/2304.13901
  • Abstract
    Heifetz, Meier, and Schipper (2013) introduced dynamic games with unawareness, consisting of a partially ordered set of games in extensive form. Here, we study the normal form of dynamic games with unawareness. The generalized normal form associated with a dynamic game with unawareness consists of a partially ordered set of games in normal form. We use the generalized normal form to characterize extensive-form rationalizability (resp., prudent rationalizability) in dynamic games with unawareness by iterated conditional strict (resp., weak) dominance in the associated generalized normal form. We also show that the analogue to iterated admissibility for dynamic games with unawareness depends on the extensive-form structure. This is because under unawareness, a player's information set not only determines which nodes she considers possible but also which game tree(s) she is aware of.

Level Assembly as a Markov Decision Process

  • Authors: Colan F. Biemer, Seth Cooper
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.13922
  • Pdf link: https://arxiv.org/pdf/2304.13922
  • Abstract
    Many games feature a progression of levels that doesn't adapt to the player. This can be problematic because some players may get stuck if the progression is too difficult, while others may find it boring if the progression is too slow to get to more challenging levels. This can be addressed by building levels based on the player's performance and preferences. In this work, we formulate the problem of generating levels for a player as a Markov Decision Process (MDP) and use adaptive dynamic programming (ADP) to solve the MDP before assembling a level. We tested with two case studies and found that using an ADP outperforms two baselines. Furthermore, we experimented with player proxies and switched them in the middle of play, and we show that a simple modification prior to running ADP results in quick adaptation. By using ADP, which searches the entire MDP, we produce a dynamic progression of levels that adapts to the player.
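
As a compact stand-in for the ADP family used here, plain value iteration on a hypothetical toy MDP (the transition and reward tables are random placeholders, and the states and actions only loosely mirror the level-assembly setting):

```python
import numpy as np

# Toy MDP: states = player situations, actions = candidate next level segments.
n_states, n_actions, gamma = 5, 3, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(size=(n_states, n_actions))                       # expected reward

V = np.zeros(n_states)
for _ in range(500):
    Q = R + gamma * P @ V          # Q[s, a] = R[s, a] + gamma * E[V(s')]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
policy = Q.argmax(axis=1)          # greedy segment choice per state
print(policy)
```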

A One-Dimensional Symmetric Force-Based Blending Method for Atomistic-to-Continuum Coupling

  • Authors: Elaine Gorom-Alexander, Xingjie Helen Li
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13939
  • Pdf link: https://arxiv.org/pdf/2304.13939
  • Abstract
    Inspired by the blending method developed by [P. Seleson, S. Beneddine, and S. Prudhomme, \emph{A Force-Based Coupling Scheme for Peridynamics and Classical Elasticity}, (2013)] for the nonlocal-to-local coupling, we create a symmetric and consistent blended force-based Atomistic-to-Continuum (a/c) scheme for the atomistic chain in one-dimensional space. The conditions for the well-posedness of the underlying model are established by analyzing an optimal blending size and blending type to ensure the $H^1$ semi-norm stability for the blended force-based operator. We present several numerical experiments to test and confirm the theoretical findings.

Provably Stabilizing Global-Position Tracking Control for Hybrid Models of Multi-Domain Bipedal Walking via Multiple Lyapunov Analysis

  • Authors: Yuan Gao, Kentaro Barhydt, Christopher Niezrecki, Yan Gu
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.13943
  • Pdf link: https://arxiv.org/pdf/2304.13943
  • Abstract
    Accurate control of a humanoid robot's global position (i.e., its three-dimensional position in the world) is critical to the reliable execution of high-risk tasks such as avoiding collision with pedestrians in a crowded environment. This paper introduces a time-based nonlinear control method that achieves accurate global-position tracking (GPT) for multi-domain bipedal walking. Deriving a tracking controller for bipedal robots is challenging due to the highly complex robot dynamics that are time-varying and hybrid, especially for multi-domain walking that involves multiple phases/domains of full actuation, over-actuation, and underactuation. To tackle this challenge, we introduce a continuous-phase GPT control law for multi-domain walking, which provably ensures the exponential convergence of the entire error state within the full and over-actuation domains and that of the directly regulated error state within the underactuation domain. We then construct sufficient multiple-Lyapunov stability conditions for the hybrid multi-domain tracking error system under the proposed GPT control law. We illustrate the proposed controller design through both three-domain walking with all motors activated and two-domain gait with inactive ankle motors. Simulations of a ROBOTIS OP3 bipedal humanoid robot demonstrate the satisfactory accuracy and convergence rate of the proposed control approach under two different cases of multi-domain walking as well as various walking speeds and desired paths.

A central scheme for coupled hyperbolic systems

  • Authors: Michael Herty, Niklas Kolbe, Siegfried Müller
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13946
  • Pdf link: https://arxiv.org/pdf/2304.13946
  • Abstract
    A novel numerical scheme to solve coupled systems of conservation laws is introduced. The scheme is derived based on a relaxation approach and does not require information on the Lax curves of the coupled systems, which simplifies the computation of suitable coupling data. The coupling condition for the underlying relaxation system plays a crucial role as it determines the behavior of the scheme in the zero relaxation limit. The role of this condition is discussed, a consistency concept with respect to the original problem is introduced, well-posedness is analyzed and explicit, nodal Riemann solvers are provided. Based on a case study considering the p-system of gas dynamics a strategy for the design of the relaxation coupling condition within the new scheme is provided.

Data-driven time-scale separation of ODE right-hand sides using dynamic mode decomposition and time delay embedding

  • Authors: Cody J. Balos
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.13971
  • Pdf link: https://arxiv.org/pdf/2304.13971
  • Abstract
    Multi-physics simulations often involve multiple different scales. The ARKODE ODE solver package in the SUNDIALS library addresses multi-scale problems with a multi-rate time integrator that can work with a right-hand side that has fast-scale and slow-scale components. In this report, we use dynamic mode decomposition and time delay embedding to extract the fast and slow components of the right-hand sides of a simple ODE from data. We then use the extracted components to solve the ODE with ARKODE. Finally, to move towards a real-world use case, we attempt to extract fast and slow scale dynamics from synthetic seismic modeling data.
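
The core extraction step can be sketched as exact DMD on a time-delay (Hankel) embedding; a hedged example on a toy two-timescale signal (the delay depth, rank, and synthetic signal are illustrative assumptions):

```python
import numpy as np

def delay_embed(x, d):
    """Hankel/time-delay embedding: stack d shifted copies of a 1-D signal."""
    n = len(x) - d + 1
    return np.stack([x[i:i + n] for i in range(d)])

def dmd_eigenvalues(H, r):
    """Exact DMD on the snapshot matrix H: fit X' ~ A X at reduced rank r and
    return the eigenvalues of the reduced operator."""
    X, Xp = H[:, :-1], H[:, 1:]
    U, S, Vh = np.linalg.svd(X, full_matrices=False)
    Ur, Sr, Vr = U[:, :r], S[:r], Vh[:r].conj().T
    Atilde = Ur.conj().T @ Xp @ Vr @ np.diag(1.0 / Sr)
    return np.linalg.eigvals(Atilde)

# Two-timescale toy signal: a slow and a fast oscillation.
dt = 0.01
t = np.arange(0, 20, dt)
x = np.sin(0.5 * t) + 0.3 * np.sin(10.0 * t)
lam = dmd_eigenvalues(delay_embed(x, 64), r=4)
omega = np.log(lam) / dt              # continuous-time rates
print(np.sort(np.abs(omega.imag)))    # frequencies cluster near 0.5 and 10
```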

An FPTAS for Budgeted Laminar Matroid Independent Set

  • Authors: Ilan Doron-Arad, Ariel Kulik, Hadas Shachnai
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.13984
  • Pdf link: https://arxiv.org/pdf/2304.13984
  • Abstract
    We study the budgeted laminar matroid independent set problem. The input is a ground set, where each element has a cost and a non-negative profit, along with a laminar matroid over the elements and a budget. The goal is to select a maximum profit independent set of the matroid whose total cost is bounded by the budget. Several well known special cases, where we have, e.g., no matroid constraint (the classic knapsack problem) or a uniform matroid constraint (knapsack with a cardinality constraint), admit a fully polynomial-time approximation scheme (FPTAS). In contrast, the budgeted matroid independent set (BMI) problem with a general matroid has an efficient polynomial-time approximation scheme (EPTAS) but does not admit an FPTAS. This implies an EPTAS for our problem, which is the best known result prior to this work. We present an FPTAS for budgeted laminar matroid independent set, improving the previous EPTAS for this matroid family and generalizing the FPTAS known for knapsack with a cardinality constraint and multiple-choice knapsack. Our scheme is based on a simple dynamic program which utilizes the tree-like structure of laminar matroids.
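
For intuition, the simplest laminar special case mentioned above, knapsack with a cardinality constraint, is solved by the textbook dynamic program sketched below (integer costs assumed; the actual FPTAS additionally rounds profits and exploits the laminar tree structure, which is omitted here).

```python
def knapsack_cardinality(items, budget, k):
    """items: list of (cost, profit). dp[j][c] = max profit using exactly
    j items of total cost c; costs are assumed to be integers."""
    NEG = float("-inf")
    dp = [[NEG] * (budget + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for cost, profit in items:
        # Iterate j downward so each item is used at most once.
        for j in range(k - 1, -1, -1):
            for c in range(budget - cost, -1, -1):
                if dp[j][c] > NEG:
                    dp[j + 1][c + cost] = max(dp[j + 1][c + cost],
                                              dp[j][c] + profit)
    return max(max(row) for row in dp)

best = knapsack_cardinality([(3, 10.0), (2, 7.0), (2, 6.5), (4, 12.0)],
                            budget=6, k=2)        # -> 19.0
```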

Communication of information in systems of heterogeneous agents and systems' dynamics

  • Authors: Inga Ivanova
  • Subjects: Computers and Society (cs.CY); Information Theory (cs.IT); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.14013
  • Pdf link: https://arxiv.org/pdf/2304.14013
  • Abstract
    Communication of information in complex systems can be considered a major driver of systems' evolution. What matters is not the communicated information by itself but rather the meaning that is supplied to the information. However, informational exchange in a system of heterogeneous agents, which code and decode information with different meaning-processing structures, is more complex than a simple input-output model. The structural difference of coding and decoding algorithms in a system of three or more groups of agents, entertaining different sets of communication codes, provides a source of additional options which has an impact on the system's dynamics. The mechanisms of meaning and information processing can be evaluated analytically in a model framework. The results show that model predictions accurately fit empirically observed data in systems of different origins.

Unification of Lagrangian staggered-grid hydrodynamics and cell-centered hydrodynamics in one dimension

  • Authors: Xihua Xu
  • Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
  • Arxiv link: https://arxiv.org/abs/2304.14054
  • Pdf link: https://arxiv.org/pdf/2304.14054
  • Abstract
    This paper presents a novel scheme that unifies the Lagrangian staggered-grid and cell-centered hydrodynamic methods in one dimension. The scheme neither contains empirical parameters nor solves the Riemann problem. It includes two key points: one is the relationship between pressure and velocity, and the other is Newton's second law. The two methods that make use of this scheme satisfy the entropy condition and are conservative in total mass, momentum, and energy. Numerical results show the robustness and accuracy of both methods.

Comparison of Optimization-Based Methods for Energy-Optimal Quadrotor Motion Planning

  • Authors: Welf Rehberg, Joaquim Ortiz-Haro, Marc Toussaint, Wolfgang Hönig
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14062
  • Pdf link: https://arxiv.org/pdf/2304.14062
  • Abstract
    Quadrotors are agile flying robots that are challenging to control. Considering the full dynamics of quadrotors during motion planning is crucial to achieving good solution quality and small tracking errors during flight. Optimization-based methods scale well with high-dimensional state spaces and can handle dynamic constraints directly, therefore they are often used in these scenarios. The resulting optimization problem is notoriously difficult to solve due to its nonconvex constraints. In this work, we present an analysis of four solvers for nonlinear trajectory optimization (KOMO, direct collocation with SCvx, direct collocation with CasADi, Crocoddyl) and evaluate their performance in scenarios where the solvers are tasked to find minimum-effort solutions to geometrically complex problems and problems requiring highly dynamic solutions. Benchmarking these methods helps to determine the best algorithm structures for these kinds of problems.

Compositional 3D Human-Object Neural Animation

  • Authors: Zhi Hou, Baosheng Yu, Dacheng Tao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14070
  • Pdf link: https://arxiv.org/pdf/2304.14070
  • Abstract
    Human-object interactions (HOIs) are crucial for human-centric scene understanding applications such as human-centric visual generation, AR/VR, and robotics. Since existing methods mainly explore capturing HOIs, rendering HOIs remains less investigated. In this paper, we address this challenge in HOI animation from a compositional perspective, i.e., animating novel HOIs including novel interaction, novel human and/or novel object driven by a novel pose sequence. Specifically, we adopt neural human-object deformation to model and render HOI dynamics based on implicit neural representations. To enable interaction pose transfer among different persons and objects, we then devise a new compositional conditional neural radiance field (or CC-NeRF), which decomposes the interdependence between human and object using latent codes to enable compositional animation control of novel HOIs. Experiments show that the proposed method can generalize well to various novel HOI animation settings. Our project page is https://zhihou7.github.io/CHONA/

Inferring Preferences from Demonstrations in Multi-objective Reinforcement Learning: A Dynamic Weight-based Approach

  • Authors: Junlin Lu, Patrick Mannion, Karl Mason
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14115
  • Pdf link: https://arxiv.org/pdf/2304.14115
  • Abstract
    Many decision-making problems feature multiple objectives. In such problems, it is not always possible to know the preferences of a decision-maker for different objectives. However, it is often possible to observe the behavior of decision-makers. In multi-objective decision-making, preference inference is the process of inferring the preferences of a decision-maker for different objectives. This research proposes a Dynamic Weight-based Preference Inference (DWPI) algorithm that can infer the preferences of agents acting in multi-objective decision-making problems, based on observed behavior trajectories in the environment. The proposed method is evaluated on three multi-objective Markov decision processes: Deep Sea Treasure, Traffic, and Item Gathering. The performance of the proposed DWPI approach is compared to two existing preference inference methods from the literature, and empirical results demonstrate significant improvements compared to the baseline algorithms, in terms of both time requirements and accuracy of the inferred preferences. The Dynamic Weight-based Preference Inference algorithm also maintains its performance when inferring preferences for sub-optimal behavior demonstrations. In addition to its impressive performance, the Dynamic Weight-based Preference Inference algorithm does not require any interactions during training with the agent whose preferences are inferred; all that is required is a trajectory of observed behavior.

Learning Neural PDE Solvers with Parameter-Guided Channel Attention

  • Authors: Makoto Takamoto, Francesco Alesiani, Mathias Niepert
  • Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn); Geophysics (physics.geo-ph)
  • Arxiv link: https://arxiv.org/abs/2304.14118
  • Pdf link: https://arxiv.org/pdf/2304.14118
  • Abstract
    Scientific Machine Learning (SciML) is concerned with the development of learned emulators of physical systems governed by partial differential equations (PDE). In application domains such as weather forecasting, molecular dynamics, and inverse design, ML-based surrogate models are increasingly used to augment or replace inefficient and often non-differentiable numerical simulation algorithms. While a number of ML-based methods for approximating the solutions of PDEs have been proposed in recent years, they typically do not adapt to the parameters of the PDEs, making it difficult to generalize to PDE parameters not seen during training. We propose a Channel Attention mechanism guided by PDE Parameter Embeddings (CAPE) component for neural surrogate models and a simple yet effective curriculum learning strategy. The CAPE module can be combined with neural PDE solvers allowing them to adapt to unseen PDE parameters. The curriculum learning strategy provides a seamless transition between teacher-forcing and fully auto-regressive training. We compare CAPE in conjunction with the curriculum learning strategy using a popular PDE benchmark and obtain consistent and significant improvements over the baseline models. The experiments also show several advantages of CAPE, such as its increased ability to generalize to unseen PDE parameters without large increases in inference time and parameter count.
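
The following is a hedged sketch of what parameter-guided channel attention could look like in PyTorch; the module name, sizes, and gating form are assumptions for illustration and may differ from the paper's CAPE component.

```python
import torch
import torch.nn as nn

class ParamChannelAttention(nn.Module):
    """Hypothetical parameter-conditioned channel attention: an MLP embeds
    the PDE parameter vector and emits per-channel gates in (0, 1) that
    rescale the solver's feature maps."""
    def __init__(self, n_params, channels, hidden=64):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(n_params, hidden), nn.GELU(),
            nn.Linear(hidden, channels), nn.Sigmoid(),
        )

    def forward(self, feats, params):
        # feats: (batch, channels, x); params: (batch, n_params)
        gates = self.embed(params)           # (batch, channels)
        return feats * gates.unsqueeze(-1)   # broadcast gates over space

block = ParamChannelAttention(n_params=2, channels=32)
out = block(torch.randn(4, 32, 128), torch.randn(4, 2))  # (4, 32, 128)
```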

A particle method for non-local advection-selection-mutation equations

  • Authors: Frank Ernesto Alvarez, Jules Guilberteau
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14210
  • Pdf link: https://arxiv.org/pdf/2304.14210
  • Abstract
    The well-posedness of a non-local advection-selection-mutation problem deriving from adaptive dynamics models is shown for a wide family of initial data. A particle method is then developed, in order to approximate the solution of such problem by a regularised sum of weighted Dirac masses whose characteristics solve a suitably defined ODE system. The convergence of the particle method over any finite interval is shown and an explicit rate of convergence is given. Furthermore, we investigate the asymptotic-preserving properties of the method in large times, providing sufficient conditions for it to hold true as well as examples and counter-examples. Finally, we illustrate the method in two cases taken from the literature.

Some of the variables, some of the parameters, some of the times, with some physics known: Identification with partial information

  • Authors: Saurabh Malani, Tom S. Bertalan, Tianqi Cui, Jose L. Avalos, Michael Betenbaugh, Ioannis G. Kevrekidis
  • Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.14214
  • Pdf link: https://arxiv.org/pdf/2304.14214
  • Abstract
    Experimental data often comprise variables measured independently, at different sampling rates (non-uniform $\Delta t$ between successive measurements); and at a specific time point only a subset of all variables may be sampled. Approaches to identifying dynamical systems from such data typically use interpolation, imputation or subsampling to reorganize or modify the training data $\textit{prior}$ to learning. Partial physical knowledge may also be available $\textit{a priori}$ (accurately or approximately), and data-driven techniques can complement this knowledge. Here we exploit neural network architectures based on numerical integration methods and $\textit{a priori}$ physical knowledge to identify the right-hand side of the underlying governing differential equations. Iterates of such neural-network models allow for learning from data sampled at arbitrary time points $\textit{without}$ data modification. Importantly, we integrate the network with available partial physical knowledge in "physics informed gray-boxes"; this enables learning unknown kinetic rates or microbial growth functions while simultaneously estimating experimental parameters.
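
A minimal sketch of the idea, assuming an RK4 integrator, a toy known-physics term, and a learned residual; the architecture below is illustrative and not the paper's exact gray-box.

```python
import torch
import torch.nn as nn

class GrayBoxRHS(nn.Module):
    """Hedged sketch of a 'physics-informed gray box': the right-hand side
    is a known physics term plus a learned neural residual."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim))

    def forward(self, y):
        known = -0.5 * y               # stand-in for the a-priori physics
        return known + self.net(y)     # learned correction

def rk4_step(f, y, h):
    k1 = f(y); k2 = f(y + 0.5 * h * k1)
    k3 = f(y + 0.5 * h * k2); k4 = f(y + h * k3)
    return y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def predict(f, y0, dt_total, n_sub=8):
    """Advance y0 by an arbitrary, per-sample dt_total in n_sub RK4 steps,
    so irregularly sampled data can be fit without interpolation."""
    y, h = y0, dt_total / n_sub
    for _ in range(n_sub):
        y = rk4_step(f, y, h)
    return y
```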

Fast Sampling of $b$-Matchings and $b$-Edge Covers

  • Authors: Zongchen Chen, Yuzhou Gu
  • Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Combinatorics (math.CO); Probability (math.PR)
  • Arxiv link: https://arxiv.org/abs/2304.14289
  • Pdf link: https://arxiv.org/pdf/2304.14289
  • Abstract
    For integer $b \ge 1$, a $b$-matching (resp. $b$-edge cover) of a graph $G=(V,E)$ is a subset $S\subseteq E$ of edges such that every vertex is incident with at most (resp. at least) $b$ edges from $S$. We prove that for any $b \ge 1$ the simple Glauber dynamics for sampling (weighted) $b$-matchings and $b$-edge covers mixes in $O(n\log n)$ time on all $n$-vertex bounded-degree graphs. This significantly improves upon previous results which have worse running time and only work for $b$-matchings with $b \le 7$ and for $b$-edge covers with $b \le 2$. More generally, we prove spectral independence for a broad class of binary symmetric Holant problems with log-concave signatures, including $b$-matchings, $b$-edge covers, and antiferromagnetic $2$-spin edge models. We hence deduce the optimal mixing time of Glauber dynamics from spectral independence.
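
As an illustration of the Markov chain being analyzed, here is a minimal single-edge Glauber (heat-bath) sampler for weighted $b$-matchings; the edge weight `lam` and the toy example are assumptions for demonstration.

```python
import random

def glauber_b_matching(edges, n_vertices, b, lam, steps, seed=0):
    """Heat-bath Glauber dynamics for pi(S) proportional to lam^|S| over
    b-matchings: pick an edge uniformly and resample its in/out state
    conditioned on all other edges."""
    rng = random.Random(seed)
    in_set = [False] * len(edges)
    deg = [0] * n_vertices                 # degree of each vertex in S
    for _ in range(steps):
        i = rng.randrange(len(edges))
        u, v = edges[i]
        if in_set[i]:                      # remove edge i, then resample it
            in_set[i] = False
            deg[u] -= 1; deg[v] -= 1
        # Include with prob lam/(1+lam) when feasible, else force it out.
        if deg[u] < b and deg[v] < b and rng.random() < lam / (1 + lam):
            in_set[i] = True
            deg[u] += 1; deg[v] += 1
    return [e for e, s in zip(edges, in_set) if s]

# Example: sample a 2-matching of a 4-cycle after 10^4 steps.
S = glauber_b_matching([(0, 1), (1, 2), (2, 3), (3, 0)], 4, b=2,
                       lam=1.0, steps=10_000)
```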

Structured interpolation for multivariate transfer functions of quadratic-bilinear systems

  • Authors: Peter Benner, Serkan Gugercin, Steffen W. R. Werner
  • Subjects: Numerical Analysis (math.NA); Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.14292
  • Pdf link: https://arxiv.org/pdf/2304.14292
  • Abstract
    High-dimensional/high-fidelity nonlinear dynamical systems appear naturally when the goal is to accurately model real-world phenomena. Many physical properties are thereby encoded in the internal differential structure of these resulting large-scale nonlinear systems. The high-dimensionality of the dynamics causes computational bottlenecks, especially when these large-scale systems need to be simulated for a variety of situations such as different forcing terms. This motivates model reduction where the goal is to replace the full-order dynamics with accurate reduced-order surrogates. Interpolation-based model reduction has been proven to be an effective tool for the construction of cheap-to-evaluate surrogate models that preserve the internal structure in the case of weak nonlinearities. In this paper, we consider the construction of multivariate interpolants in frequency domain for structured quadratic-bilinear systems. We propose definitions for structured variants of the symmetric subsystem and generalized transfer functions of quadratic-bilinear systems and provide conditions for structure-preserving interpolation by projection. The theoretical results are illustrated using two numerical examples including the simulation of molecular dynamics in crystal structures.

On Solution Discovery via Reconfiguration

  • Authors: Michael R. Fellows, Mario Grobler, Nicole Megow, Amer E. Mouawad, Vijayaragunathan Ramamoorthi, Frances A. Rosamond, Daniel Schmand, Sebastian Siebertz
  • Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.14295
  • Pdf link: https://arxiv.org/pdf/2304.14295
  • Abstract
    The dynamics of real-world applications and systems require efficient methods for improving infeasible solutions or restoring corrupted ones by making modifications to the current state of a system in a restricted way. We propose a new framework of solution discovery via reconfiguration for constructing a feasible solution for a given problem by executing a sequence of small modifications starting from a given state. Our framework integrates and formalizes different aspects of classical local search, reoptimization, and combinatorial reconfiguration. We exemplify our framework on a multitude of fundamental combinatorial problems, namely Vertex Cover, Independent Set, Dominating Set, and Coloring. We study the classical as well as the parameterized complexity of the solution discovery variants of those problems and explore the boundary between tractable and intractable instances.

Learning Absorption Rates in Glucose-Insulin Dynamics from Meal Covariates

  • Authors: Ke Alexander Wang, Matthew E. Levine, Jiaxin Shi, Emily B. Fox
  • Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Quantitative Methods (q-bio.QM)
  • Arxiv link: https://arxiv.org/abs/2304.14300
  • Pdf link: https://arxiv.org/pdf/2304.14300
  • Abstract
    Traditional models of glucose-insulin dynamics rely on heuristic parameterizations chosen to fit observations within a laboratory setting. However, these models cannot describe glucose dynamics in daily life. One source of failure is in their descriptions of glucose absorption rates after meal events. A meal's macronutritional content has nuanced effects on the absorption profile, which is difficult to model mechanistically. In this paper, we propose to learn the effects of macronutrition content from glucose-insulin data and meal covariates. Given macronutrition information and meal times, we use a neural network to predict an individual's glucose absorption rate. We use this neural rate function as the control function in a differential equation of glucose dynamics, enabling end-to-end training. On simulated data, our approach is able to closely approximate true absorption rates, resulting in better forecasts than heuristic parameterizations, despite only observing glucose, insulin, and macronutritional information. Our work readily generalizes to meal events with higher-dimensional covariates, such as images, setting the stage for glucose dynamics models that are personalized to each individual's daily life.

Empirical Individual State Observability

  • Authors: Benjamin Cellini, Burak Boyacıoğlu, Floris van Breugel
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.14313
  • Pdf link: https://arxiv.org/pdf/2304.14313
  • Abstract
    A dynamical system is observable if there is a one-to-one mapping from the system's measured outputs and inputs to all of the system's states. Analytical and empirical tools exist for quantifying the (full state) observability of linear and nonlinear systems; however, empirical tools for evaluating the observability of individual state variables are lacking. Here, a new empirical approach termed Empirical Individual State Observability (E-ISO) is developed to quantify the level of observability of individual state variables. E-ISO first builds an empirical observability matrix via simulation, then applies convex optimization to efficiently determine the subset of its rows required to estimate each state variable individually. Finally, (un)observability measures for these subsets are calculated to provide independent estimates of the observability of each state variable. Multiple example applications of E-ISO on linear and nonlinear systems are shown to be consistent with analytical results. Broadly, E-ISO will be an invaluable tool both for designing active sensing control laws or optimizing sensor placement to increase the observability of individual state variables for engineered systems, and analyzing the trajectory decisions made by organisms.
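
A small sketch of the empirical observability matrix construction the abstract describes, assuming a central-difference perturbation scheme; the convex-optimization subset selection that makes E-ISO per-state is not reproduced here.

```python
import numpy as np

def empirical_observability(simulate, x0, eps=1e-4):
    """Perturb each initial state by +/- eps, simulate the outputs, and
    stack the central-difference output sensitivities as columns.
    simulate(x0) must return an output array; it is flattened here."""
    n = len(x0)
    cols = []
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        cols.append((simulate(x0 + dx).ravel()
                     - simulate(x0 - dx).ravel()) / (2 * eps))
    return np.stack(cols, axis=1)          # (n_samples, n_states)

# Toy system: only the sum x0 + x1 is measured, so the difference
# direction is unobservable and the matrix is rank deficient.
sim = lambda x: np.array([x[0] + x[1]] * 5)
O = empirical_observability(sim, np.array([1.0, 2.0]))
assert np.linalg.matrix_rank(O) == 1
```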

An Audit Framework for Adopting AI-Nudging on Children

  • Authors: Marianna Ganapini, Enrico Panai
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.14338
  • Pdf link: https://arxiv.org/pdf/2304.14338
  • Abstract
    This is an audit framework for AI-nudging. Unlike the static form of nudging usually discussed in the literature, we focus here on a type of nudging that uses large amounts of data to provide personalized, dynamic feedback and interfaces. We call this AI-nudging (Lanzing, 2019, p. 549; Yeung, 2017). The ultimate goal of the audit outlined here is to ensure that an AI system that uses nudges will maintain a level of moral inertia and neutrality by complying with the recommendations, requirements, or suggestions of the audit (in other words, the criteria of the audit). In the case of unintended negative consequences, the audit suggests risk mitigation mechanisms that can be put in place. In the case of unintended positive consequences, it suggests some reinforcement mechanisms. Sponsored by the IBM-Notre Dame Tech Ethics Lab

SMAT: A Self-Reinforcing Framework for Simultaneous Mapping and Tracking in Unbounded Urban Environments

  • Authors: Tingxiang Fan, Bowen Shen, Yinqiang Zhang, Chuye Zhang, Lei Yang, Hua Chen, Wei Zhang, Jia Pan
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14356
  • Pdf link: https://arxiv.org/pdf/2304.14356
  • Abstract
    With the increasing prevalence of robots in daily life, it is crucial to enable robots to construct a reliable map online to navigate in unbounded and changing environments. Although existing methods can individually achieve the goals of spatial mapping and dynamic object detection and tracking, limited research has been conducted on an effective combination of these two important abilities. The proposed framework, SMAT (Simultaneous Mapping and Tracking), integrates the front-end dynamic object detection and tracking module with the back-end static mapping module using a self-reinforcing mechanism, which promotes mutual improvement of mapping and tracking performance. The conducted experiments demonstrate the framework's effectiveness in real-world applications, achieving successful long-range navigation and mapping in multiple urban environments using only one LiDAR, a CPU-only onboard computer, and a consumer-level GPS receiver.

Measuring and Modeling the Free Content Web

  • Authors: Abdulrahman Alabduljabbar, Runyu Ma, Ahmed Abusnaina, Rhongho Jang, Songqing Chen, DaeHun Nyang, and David Mohaisen
  • Subjects: Computers and Society (cs.CY); Cryptography and Security (cs.CR); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.14359
  • Pdf link: https://arxiv.org/pdf/2304.14359
  • Abstract
    Free content websites that provide free books, music, games, movies, etc., have existed on the Internet for many years. While it is a common belief that such websites might be different from premium websites providing the same content types, an analysis that supports this belief is lacking in the literature. In particular, it is unclear if those websites are as safe as their premium counterparts. In this paper, we set out to investigate, by analysis and quantification, the similarities and differences between free content and premium websites, including their risk profiles. To conduct this analysis, we assembled a list of 834 free content websites offering books, games, movies, music, and software, and 728 premium websites offering content of the same type. We then contribute domain-, content-, and risk-level analysis, examining and contrasting the websites' domain names, creation times, SSL certificates, HTTP requests, page size, average load time, and content type. For risk analysis, we consider and examine the maliciousness of these websites at the website- and component-level. Among other interesting findings, we show that free content websites tend to be vastly distributed across the TLDs and exhibit more dynamics with an upward trend for newly registered domains. Moreover, the free content websites are 4.5 times more likely to utilize an expired certificate, 19 times more likely to be malicious at the website level, and 2.64 times more likely to be malicious at the component level. Encouraged by the clear differences between the two types of websites, we explore the automation and generalization of the risk modeling of the free content risky websites, showing that a simple machine learning-based technique can produce 86.81% accuracy in identifying them.

Learning Neural Constitutive Laws From Motion Observations for Generalizable PDE Dynamics

  • Authors: Pingchuan Ma, Peter Yichen Chen, Bolei Deng, Joshua B. Tenenbaum, Tao Du, Chuang Gan, Wojciech Matusik
  • Subjects: Machine Learning (cs.LG); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.14369
  • Pdf link: https://arxiv.org/pdf/2304.14369
  • Abstract
    We propose a hybrid neural network (NN) and PDE approach for learning generalizable PDE dynamics from motion observations. Many NN approaches learn an end-to-end model that implicitly models both the governing PDE and constitutive models (or material models). Without explicit PDE knowledge, these approaches cannot guarantee physical correctness and have limited generalizability. We argue that the governing PDEs are often well-known and should be explicitly enforced rather than learned. Instead, constitutive models are particularly suitable for learning due to their data-fitting nature. To this end, we introduce a new framework termed "Neural Constitutive Laws" (NCLaw), which utilizes a network architecture that strictly guarantees standard constitutive priors, including rotation equivariance and undeformed state equilibrium. We embed this network inside a differentiable simulation and train the model by minimizing a loss function based on the difference between the simulation and the motion observation. We validate NCLaw on various large-deformation dynamical systems, ranging from solids to fluids. After training on a single motion trajectory, our method generalizes to new geometries, initial/boundary conditions, temporal ranges, and even multi-physics systems. On these extremely out-of-distribution generalization tasks, NCLaw is orders-of-magnitude more accurate than previous NN approaches. Real-world experiments demonstrate our method's ability to learn constitutive laws from videos.

Pseudo-Hamiltonian neural networks for learning partial differential equations

  • Authors: Sølve Eidnes, Kjetil Olsen Lye
  • Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.14374
  • Pdf link: https://arxiv.org/pdf/2304.14374
  • Abstract
    Pseudo-Hamiltonian neural networks (PHNN) were recently introduced for learning dynamical systems that can be modelled by ordinary differential equations. In this paper, we extend the method to partial differential equations. The resulting model comprises up to three neural networks, modelling terms representing conservation, dissipation, and external forces, and discrete convolution operators that can either be learned or be prior knowledge. We demonstrate numerically the superior performance of PHNN compared to a baseline model that models the full dynamics by a single neural network. Moreover, since the PHNN model consists of three parts with different physical interpretations, these can be studied separately to gain insight into the system, and the learned model also remains applicable if external forces are removed or changed.

Dynamic Pricing and Learning with Bayesian Persuasion

  • Authors: Shipra Agrawal, Yiding Feng, Wei Tang
  • Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.14385
  • Pdf link: https://arxiv.org/pdf/2304.14385
  • Abstract
    We consider a novel dynamic pricing and learning setting where in addition to setting prices of products in sequential rounds, the seller also ex-ante commits to 'advertising schemes'. That is, in the beginning of each round the seller can decide what kind of signal they will provide to the buyer about the product's quality upon realization. Using the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses, we formulate the problem of finding an optimal design of the advertising scheme along with a pricing scheme that maximizes the seller's expected revenue. Without any apriori knowledge of the buyers' demand function, our goal is to design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy. We study the regret of the algorithm when compared to the optimal clairvoyant price and advertising scheme. Our main result is a computationally efficient online algorithm that achieves an $O(T^{2/3}(m\log T)^{1/3})$ regret bound when the valuation function is linear in the product quality. Here $m$ is the cardinality of the discrete product quality domain and $T$ is the time horizon. This result requires some natural monotonicity and Lipschitz assumptions on the valuation function, but no Lipschitz or smoothness assumption on the buyers' demand function. For constant $m$, our result matches the regret lower bound for dynamic pricing within logarithmic factors, which is a special case of our problem. We also obtain several improved results for the widely considered special case of additive valuations, including an $\tilde{O}(T^{2/3})$ regret bound independent of $m$ when $m\le T^{1/3}$.

SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos

  • Authors: John Z. Zhang, Shuo Yang, Gengshan Yang, Arun L. Bishop, Deva Ramanan, Zachary Manchester
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.14389
  • Pdf link: https://arxiv.org/pdf/2304.14389
  • Abstract
    We present SLoMo: a first-of-its-kind framework for transferring skilled motions from casually captured "in the wild" video footage of humans and animals to legged robots. SLoMo works in three stages: 1) synthesize a physically plausible reconstructed key-point trajectory from monocular videos; 2) optimize a dynamically feasible reference trajectory for the robot offline that includes body and foot motion, as well as contact sequences that closely tracks the key points; 3) track the reference trajectory online using a general-purpose model-predictive controller on robot hardware. Traditional motion imitation for legged motor skills often requires expert animators, collaborative demonstrations, and/or expensive motion capture equipment, all of which limits scalability. Instead, SLoMo only relies on easy-to-obtain monocular video footage, readily available in online repositories such as YouTube. It converts videos into motion primitives that can be executed reliably by real-world robots. We demonstrate our approach by transferring the motions of cats, dogs, and humans to example robots including a quadruped (on hardware) and a humanoid (in simulation). To the best knowledge of the authors, this is the first attempt at a general-purpose motion transfer framework that imitates animal and human motions on legged robots directly from casual videos without artificial markers or labels.

Motion-Conditioned Diffusion Model for Controllable Video Synthesis

  • Authors: Tsai-Shien Chen, Chieh Hubert Lin, Hung-Yu Tseng, Tsung-Yi Lin, Ming-Hsuan Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.14404
  • Pdf link: https://arxiv.org/pdf/2304.14404
  • Abstract
    Recent advancements in diffusion models have greatly improved the quality and diversity of synthesized content. To harness the expressive power of diffusion models, researchers have explored various controllable mechanisms that allow users to intuitively guide the content synthesis process. Although the latest efforts have primarily focused on video synthesis, there has been a lack of effective methods for controlling and describing desired content and motion. In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis. To tackle the ambiguity of sparse motion inputs and achieve better synthesis quality, MCDiff first utilizes a flow completion model to predict the dense video motion based on the semantic understanding of the video frame and the sparse motion control. Then, the diffusion model synthesizes high-quality future frames to form the output video. We qualitatively and quantitatively show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis. Additional experiments on MPII Human Pose further exhibit the capability of our model on diverse content and motion synthesis.

New submissions for Wed, 5 Apr 23

Keyword: efficient

POLAR-Express: Efficient and Precise Formal Reachability Analysis of Neural-Network Controlled Systems

  • Authors: Yixuan Wan, Weichao Zhou, Jiameng Fan, Zhilu Wang, Jiajun Li, Xin Chen, Chao Huang, Wenchao Li, Qi Zhu
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.01218
  • Pdf link: https://arxiv.org/pdf/2304.01218
  • Abstract
    Neural networks (NNs) playing the role of controllers have demonstrated impressive empirical performances on challenging control problems. However, the potential adoption of NN controllers in real-life applications also gives rise to a growing concern over the safety of these neural-network controlled systems (NNCSs), especially when used in safety-critical applications. In this work, we present POLAR-Express, an efficient and precise formal reachability analysis tool for verifying the safety of NNCSs. POLAR-Express uses Taylor model arithmetic to propagate Taylor models (TMs) across a neural network layer-by-layer to compute an overapproximation of the neural-network function. It can be applied to analyze any feed-forward neural network with continuous activation functions. We also present a novel approach to propagate TMs more efficiently and precisely across ReLU activation functions. In addition, POLAR-Express provides parallel computation support for the layer-by-layer propagation of TMs, thus significantly improving the efficiency and scalability over its earlier prototype POLAR. Across the comparison with six other state-of-the-art tools on a diverse set of benchmarks, POLAR-Express achieves the best verification efficiency and tightness in the reachable set analysis.

Optimizing Data Shapley Interaction Calculation from O(2^n) to O(t n^2) for KNN models

  • Authors: Mohamed Karim Belaid, Dorra El Mekki, Maximilian Rabus, Eyke Hüllermeier
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2304.01224
  • Pdf link: https://arxiv.org/pdf/2304.01224
  • Abstract
    With the rapid growth of data availability and usage, quantifying the added value of each training data point has become a crucial process in the field of artificial intelligence. Shapley values have been recognized as an effective method for data valuation, enabling efficient training set summarization, acquisition, and outlier removal. In this paper, we introduce "STI-KNN", an innovative algorithm that calculates the exact pair-interaction Shapley values for KNN models in O(t n^2) time, which is a significant improvement over the O(2^n) time complexity of baseline methods. By using STI-KNN, we can efficiently and accurately evaluate the value of individual data points, leading to improved training outcomes and ultimately enhancing the effectiveness of artificial intelligence applications.
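
For context, the closed-form recursion below is the classic exact Shapley computation for unweighted KNN classifiers (Jia et al., 2019), shown to illustrate the style of algorithm involved; it is not the paper's STI-KNN pair-interaction method.

```python
import numpy as np

def knn_shapley(X_train, y_train, x_test, y_test, K):
    """Exact Shapley value of each training point for a single test point
    under an unweighted KNN classifier (Jia et al., 2019)."""
    N = len(X_train)
    order = np.argsort(np.linalg.norm(X_train - x_test, axis=1))
    match = (y_train[order] == y_test).astype(float)
    s = np.zeros(N)
    s[N - 1] = match[N - 1] / N
    for j in range(N - 2, -1, -1):       # j: 0-indexed distance rank
        i = j + 1                        # 1-indexed rank of this point
        s[j] = s[j + 1] + (match[j] - match[j + 1]) / K * min(K, i) / i
    values = np.zeros(N)
    values[order] = s                    # map back to original indices
    return values
```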

A greedy approach for increased vehicle utilization in ridesharing networks

  • Authors: Aqsa Ashraf Makhdomi, Iqra Altaf Gillani
  • Subjects: Data Structures and Algorithms (cs.DS); Computers and Society (cs.CY); Information Retrieval (cs.IR); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.01225
  • Pdf link: https://arxiv.org/pdf/2304.01225
  • Abstract
    In recent years, ridesharing platforms have become a prominent mode of transportation for the residents of urban areas. As a fundamental problem, route recommendation for these platforms is vital for their sustenance. Prior work in this direction has recommended routes with higher passenger demand. Despite these efforts, statistics suggest that such services cause increased greenhouse emissions compared to private vehicles, as drivers roam around in search of riders. This analysis provides finer details regarding the functionality of ridesharing systems and reveals that, despite their boom, they have not utilized vehicle capacity efficiently. We propose to overcome the above limitations and recommend routes that will fetch multiple passengers simultaneously, which will result in increased vehicle utilization and thereby decrease the effect of these systems on the environment. As route recommendation is NP-hard, we propose a k-hop-based sliding-window approximation algorithm that reduces the search space from the entire road network to a window. We further demonstrate that maximizing expected demand is submodular, so greedy algorithms can be used to optimize our objective function within a window. We evaluate our proposed model on real-world datasets, and experimental results demonstrate its superior performance.
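
The greedy step the abstract appeals to can be sketched as follows; `expected_demand` is a hypothetical stand-in for the paper's objective, and the classical (1 - 1/e) guarantee applies when it is monotone submodular.

```python
def greedy_select(candidates, expected_demand, k):
    """Classical greedy for monotone submodular maximization under a
    cardinality constraint: repeatedly add the candidate with the largest
    marginal gain (assumes len(candidates) >= k)."""
    chosen = []
    for _ in range(k):
        gain = lambda c: (expected_demand(chosen + [c])
                          - expected_demand(chosen))
        best = max((c for c in candidates if c not in chosen), key=gain)
        chosen.append(best)
    return chosen

# expected_demand would map a set of routes to its expected passenger
# demand within the current window (a placeholder here).
```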

SEENN: Towards Temporal Spiking Early-Exit Neural Networks

  • Authors: Yuhang Li, Tamar Geller, Youngeun Kim, Priyadarshini Panda
  • Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.01230
  • Pdf link: https://arxiv.org/pdf/2304.01230
  • Abstract
    Spiking Neural Networks (SNNs) have recently become more popular as a biologically plausible substitute for traditional Artificial Neural Networks (ANNs). SNNs are cost-efficient and deployment-friendly because they process input in both spatial and temporal manners using binary spikes. However, we observe that the information capacity in SNNs is affected by the number of timesteps, leading to an accuracy-efficiency tradeoff. In this work, we study a fine-grained adjustment of the number of timesteps in SNNs. Specifically, we treat the number of timesteps as a variable conditioned on different input samples to reduce redundant timesteps for certain data. We call our method Spiking Early-Exit Neural Networks (SEENNs). To determine the appropriate number of timesteps, we propose SEENN-I which uses a confidence score thresholding to filter out the uncertain predictions, and SEENN-II which determines the number of timesteps by reinforcement learning. Moreover, we demonstrate that SEENN is compatible with both the directly trained SNN and the ANN-SNN conversion. By dynamically adjusting the number of timesteps, our SEENN achieves a remarkable reduction in the average number of timesteps during inference. For example, our SEENN-II ResNet-19 can achieve 96.1% accuracy with an average of 1.08 timesteps on the CIFAR-10 test dataset.
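
A hedged sketch of confidence-thresholded temporal early exit in the spirit of SEENN-I; `snn_step` is a hypothetical callable returning one timestep's logits for a single sample, and the threshold value is an assumption.

```python
import torch

@torch.no_grad()
def early_exit_predict(snn_step, x, max_T=6, threshold=0.9):
    """Run the SNN one timestep at a time; stop once the softmax
    confidence of the time-averaged logits clears the threshold."""
    acc = None
    for t in range(1, max_T + 1):
        logits = snn_step(x, t)                  # logits of timestep t
        acc = logits if acc is None else acc + logits
        probs = torch.softmax(acc / t, dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:
            break                                # early exit at timestep t
    return pred.item(), t
```

Confident inputs thus consume only one or two timesteps, which is how the abstract's average of 1.08 timesteps becomes possible.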

X-TIME: An in-memory engine for accelerating machine learning on tabular data with CAMs

  • Authors: Giacomo Pedretti, John Moon, Pedro Bruel, Sergey Serebryakov, Ron M. Roth, Luca Buonanno, Tobias Ziegler, Cong Xu, Martin Foltin, Jim Ignowski, Catherine E. Graves
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.01285
  • Pdf link: https://arxiv.org/pdf/2304.01285
  • Abstract
    Structured, or tabular, data is the most common format in data science. While deep learning models have proven formidable in learning from unstructured data such as images or speech, they are less accurate than simpler approaches when learning from tabular data. In contrast, modern tree-based Machine Learning (ML) models shine in extracting relevant information from structured data. An essential requirement in data science is to reduce model inference latency in cases where, for example, models are used in a closed loop with simulation to accelerate scientific discovery. However, the hardware acceleration community has mostly focused on deep neural networks and largely ignored other forms of machine learning. Previous work has described the use of an analog content addressable memory (CAM) component for efficiently mapping random forests. In this work, we focus on an overall analog-digital architecture implementing a novel increased precision analog CAM and a programmable network on chip allowing the inference of state-of-the-art tree-based ML models, such as XGBoost and CatBoost. Results evaluated in a single chip at 16nm technology show 119x lower latency at 9740x higher throughput compared with a state-of-the-art GPU, with a 19W peak power consumption.

Sparse Cholesky Factorization for Solving Nonlinear PDEs via Gaussian Processes

  • Authors: Yifan Chen, Houman Owhadi, Florian Schäfer
  • Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.01294
  • Pdf link: https://arxiv.org/pdf/2304.01294
  • Abstract
    We study the computational scalability of a Gaussian process (GP) framework for solving general nonlinear partial differential equations (PDEs). This framework transforms solving PDEs into solving a quadratic optimization problem with nonlinear constraints. Its complexity bottleneck lies in computing with dense kernel matrices obtained from pointwise evaluations of the covariance kernel of the GP and its partial derivatives at collocation points. We present a sparse Cholesky factorization algorithm for such kernel matrices based on the near-sparsity of the Cholesky factor under a new ordering of Diracs and derivative measurements. We rigorously identify the sparsity pattern and quantify the exponentially convergent accuracy of the corresponding Vecchia approximation of the GP, which is optimal in the Kullback-Leibler divergence. This enables us to compute $\epsilon$-approximate inverse Cholesky factors of the kernel matrices with complexity $O(N\log^d(N/\epsilon))$ in space and $O(N\log^{2d}(N/\epsilon))$ in time. With the sparse factors, gradient-based optimization methods become scalable. Furthermore, we can use the oftentimes more efficient Gauss-Newton method, for which we apply the conjugate gradient algorithm with the sparse factor of a reduced kernel matrix as a preconditioner to solve the linear system. We numerically illustrate our algorithm's near-linear space/time complexity for a broad class of nonlinear PDEs such as the nonlinear elliptic, Burgers, and Monge-Ampère equations. In summary, we provide a fast, scalable, and accurate method for solving general PDEs with GPs.
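
To illustrate the kind of factor being computed, here is a toy KL-optimal sparse inverse Cholesky (Vecchia-type) construction; the ordering, the conditioning sets, and the dense storage are simplifying assumptions, and the paper's near-linear-cost machinery is not reproduced.

```python
import numpy as np

def sparse_inverse_cholesky(K, sets):
    """Columnwise KL-optimal factor for a given sparsity pattern:
    sets[i] is an ordered index list starting with i (i plus its
    later-ordered neighbors); returns L with L @ L.T approximating inv(K)."""
    n = K.shape[0]
    L = np.zeros((n, n))                  # sparse in practice; dense toy here
    for i, s in enumerate(sets):
        e1 = np.zeros(len(s)); e1[0] = 1.0
        col = np.linalg.solve(K[np.ix_(s, s)], e1)
        L[s, i] = col / np.sqrt(col[0])
    return L

# With the dense lower-triangular pattern the factor is exact:
pts = np.linspace(0, 1, 6)
K = np.exp(-np.abs(pts[:, None] - pts[None, :]))   # exponential kernel
sets = [list(range(i, 6)) for i in range(6)]
L = sparse_inverse_cholesky(K, sets)
assert np.allclose(L @ L.T, np.linalg.inv(K))
```

Shrinking each `sets[i]` to a few near neighbors is what makes the factor sparse and the cost near-linear, at the price of the (exponentially small) Vecchia approximation error quantified in the paper.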

Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning

  • Authors: Lifu Tu, Jin Qu, Semih Yavuz, Shafiq Joty, Wenhao Liu, Caiming Xiong, Yingbo Zhou
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.01295
  • Pdf link: https://arxiv.org/pdf/2304.01295
  • Abstract
    Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks, but focus on conversational tasks has been rather limited. This is partly due to the high cost of obtaining non-English conversational data, which results in limited coverage. In this work, we introduce XSGD, a parallel and large-scale multilingual conversation dataset that we created by translating the English-only Schema-Guided Dialogue (SGD) dataset (Rastogi et al., 2020) into 105 other languages. XSGD contains approximately 330k utterances per language. To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts. We also investigate two different classifiers: NLI-based and vanilla classifiers, and test cross-lingual capability enabled by the aligned prompts. We evaluate our model's cross-lingual generalization capabilities on two conversation tasks: slot-filling and intent classification. Our results demonstrate the strong and efficient modeling ability of NLI-based classifiers and the large cross-lingual transfer improvements achieved by our aligned prompts, particularly in few-shot settings.

Towards Deterministic Communications in 6G Networks: State of the Art, Open Challenges and the Way Forward

  • Authors: Gourav Prateek Sharma, Dhruvin Patel, Joachim Sachs, Marilet De Andrade, Janos Farkas, Janos Harmatos, Balazs Varga, Hans-Peter Bernhard, Raheeb Muzaffar, Mahin K. Atiq, Frank Duerr, Dietmar Bruckner, Edgardo Montesdeoca, Drissa Houatra, Hongwei Zhang, James Gross
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.01299
  • Pdf link: https://arxiv.org/pdf/2304.01299
  • Abstract
    Over the last decade, society and industries are undergoing rapid digitization that is expected to lead to the evolution of the cyber-physical continuum. End-to-end deterministic communications infrastructure is the essential glue that will bridge the digital and physical worlds of the continuum. We describe the state of the art and open challenges with respect to contemporary deterministic communications and compute technologies: 3GPP 5G, IEEE Time-Sensitive Networking, IETF DetNet, OPC UA as well as edge computing. While these technologies represent significant technological advancements towards networking Cyber-Physical Systems (CPS), we argue in this paper that they rather represent a first generation of systems which are still limited in different dimensions. In contrast, realizing future deterministic communication systems requires, firstly, seamless convergence between these technologies and, secondly, scalability to support heterogeneous (time-varying requirements) arising from diverse CPS applications. In addition, future deterministic communication networks will have to provide such characteristics end-to-end, which for CPS refers to the entire communication and computation loop, from sensors to actuators. In this paper, we discuss the state of the art regarding the main challenges towards these goals: predictability, end-to-end technology integration, end-to-end security, and scalable vertical application interfacing. We then present our vision regarding viable approaches and technological enablers to overcome these four central challenges. Key approaches to leverage in that regard are 6G system evolutions, wireless friendly integration of 6G into TSN and DetNet, novel end-to-end security approaches, efficient edge-cloud integrations, data-driven approaches for stochastic characterization and prediction, as well as leveraging digital twins towards system awareness.

Integrated Access and Backhaul via Satellites

  • Authors: Zaid Abdullah, Steven Kisseleff, Eva Lagunas, Vu Nguyen Ha, Frank Zeppenfeldt, Symeon Chatzinotas
  • Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.01304
  • Pdf link: https://arxiv.org/pdf/2304.01304
  • Abstract
    To allow flexible and cost-efficient network densification and deployment, the integrated access and backhaul (IAB) was recently standardized by the third generation partnership project (3GPP) as part of the fifth-generation new radio (5G-NR) networks. However, the current standardization only defines the IAB for the terrestrial domain, while non-terrestrial networks (NTNs) are yet to be considered for such standardization efforts. In this work, we motivate the use of IAB in NTNs, and we discuss the compatibility issues between the 3GPP specifications on IAB in 5G-NR and the satellite radio regulations. In addition, we identify the required adaptation from the 3GPP and/or satellite operators for realizing an NTN-enabled IAB operation. A case study is provided for a low earth orbit (LEO) satellite-enabled in-band IAB operation with orthogonal and non-orthogonal bandwidth allocation between access and backhauling, and under both time- and frequency-division duplex (TDD/FDD) transmission modes. Numerical results demonstrate the feasibility of IAB through satellites, and illustrate the superiority of FDD over TDD transmission. It is also shown that in the absence of precoding, non-orthogonal bandwidth allocation between the access and the backhaul can largely degrade the network throughput.

PyFlyt -- UAV Simulation Environments for Reinforcement Learning Research

  • Authors: Jun Jet Tai, Jim Wong, Mauro Innocente, Nadjim Horri, James Brusey, Swee King Phang
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.01305
  • Pdf link: https://arxiv.org/pdf/2304.01305
  • Abstract
    Unmanned aerial vehicles (UAVs) have numerous applications, but their efficient and optimal flight can be a challenge. Reinforcement Learning (RL) has emerged as a promising approach to address this challenge, yet there is no standardized library for testing and benchmarking RL algorithms on UAVs. In this paper, we introduce PyFlyt, a platform built on the Bullet physics engine with native Gymnasium API support. PyFlyt provides modular implementations of simple components, such as motors and lifting surfaces, allowing for the implementation of UAVs of arbitrary configurations. Additionally, PyFlyt includes various task definitions and multiple reward function settings for each vehicle type. We demonstrate the effectiveness of PyFlyt by training various RL agents for two UAV models: quadrotor and fixed-wing. Our findings highlight the effectiveness of RL in UAV control and planning, and further show that it is possible to train agents in sparse reward settings for UAVs. PyFlyt fills a gap in existing literature by providing a flexible and standardised platform for testing RL algorithms on UAVs. We believe that this will inspire more standardised research in this direction.

Universal Framework for Parametric Constrained Coding

  • Authors: Daniella Bar-Lev, Adir Kobovich, Orian Leitersdorf, Eitan Yaakobi
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2304.01317
  • Pdf link: https://arxiv.org/pdf/2304.01317
  • Abstract
    Constrained coding is a fundamental field in coding theory that tackles efficient communication through constrained channels. While channels with fixed constraints have a general optimal solution, there is increasing demand for parametric constraints that are dependent on the message length. Several works have tackled such parametric constraints through iterative algorithms, yet they require complex constructions specific to each constraint to guarantee convergence through monotonic progression. In this paper, we propose a universal framework for tackling any parametric constrained-channel problem through a novel simple iterative algorithm. By reducing an execution of this iterative algorithm to an acyclic graph traversal, we prove a surprising result that guarantees convergence with efficient average time complexity even without requiring any monotonic progression. We demonstrate the effectiveness of this universal framework by applying it to a variety of both local and global channel constraints. We begin by exploring the local constraints involving illegal substrings of variable length, where the universal construction essentially iteratively replaces forbidden windows. We apply this local algorithm to the minimal periodicity, minimal Hamming weight, local almost-balanced Hamming weight and the previously-unsolved minimal palindrome constraints. We then continue by exploring global constraints, and demonstrate the effectiveness of the proposed construction on the repeat-free encoding, reverse-complement encoding, and the open problem of global almost-balanced encoding. For reverse-complement, we also tackle a previously-unsolved version of the constraint that addresses overlapping windows. Overall, the proposed framework generates state-of-the-art constructions with significant ease while also enabling the simultaneous integration of multiple constraints for the first time.
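
The iterative skeleton can be sketched as below; the find/replace rules are toy placeholders (the flip rule shown is neither rate-efficient nor decodable), while the paper's framework is what supplies replacement rules with guaranteed, efficient convergence.

```python
def iterative_encode(word, find_window, replace_window):
    """Repeatedly locate a forbidden window and apply a local replacement
    rule until no forbidden window remains."""
    while True:
        hit = find_window(word)        # index of a forbidden window, or None
        if hit is None:
            return word
        word = replace_window(word, hit)

# Toy rule: forbid the substring "00" by flipping the window's second bit
# (this converges since every replacement increases the Hamming weight).
encoded = iterative_encode(
    "10001",
    lambda w: w.index("00") if "00" in w else None,
    lambda w, i: w[:i + 1] + "1" + w[i + 2:],
)
assert encoded == "10101" and "00" not in encoded
```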

Creating Custom Event Data Without Dictionaries: A Bag-of-Tricks

  • Authors: Andrew Halterman, Philip A. Schrodt, Andreas Beger, Benjamin E. Bagozzi, Grace I. Scarborough
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.01331
  • Pdf link: https://arxiv.org/pdf/2304.01331
  • Abstract
    Event data, or structured records of "who did what to whom" that are automatically extracted from text, is an important source of data for scholars of international politics. The high cost of developing new event datasets, especially using automated systems that rely on hand-built dictionaries, means that most researchers draw on large, pre-existing datasets such as ICEWS rather than developing tailor-made event datasets optimized for their specific research question. This paper describes a "bag of tricks" for efficient, custom event data production, drawing on recent advances in natural language processing (NLP) that allow researchers to rapidly produce customized event datasets. The paper introduces techniques for training an event category classifier with active learning, identifying actors and the recipients of actions in text using large language models and standard machine learning classifiers and pretrained "question-answering" models from NLP, and resolving mentions of actors to their Wikipedia article to categorize them. We describe how these techniques produced the new POLECAT global event dataset that is intended to replace ICEWS, along with examples of how scholars can quickly produce smaller, custom event datasets. We publish example code and models to implement our new techniques.
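
A generic sketch of the uncertainty-sampling loop behind "training an event category classifier with active learning"; the classifier choice and the precomputed feature vectors are assumptions, not the paper's models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def most_uncertain(model, X_pool, batch=10):
    """Indices of the pool items with the smallest top-2 probability
    margin, i.e. the ones the current classifier is least sure about."""
    probs = model.predict_proba(X_pool)
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    return np.argsort(margin)[:batch]

# One active-learning round (X_seed/y_seed and X_pool are precomputed
# feature vectors, e.g. text embeddings of candidate sentences):
# model = LogisticRegression(max_iter=1000).fit(X_seed, y_seed)
# to_label = most_uncertain(model, X_pool)   # send these to annotators
```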

A Scale-Invariant Trajectory Simplification Method for Efficient Data Collection in Videos

  • Authors: Yang Liu, Luiz Gustavo Hafemann
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01340
  • Pdf link: https://arxiv.org/pdf/2304.01340
  • Abstract
    Training data is a critical requirement for machine learning tasks, and labeled training data can be expensive to acquire, often requiring manual or semi-automated data collection pipelines. For tracking applications, the data collection involves drawing bounding boxes around the classes of interest on each frame, and associating detections of the same "instance" over frames. In a semi-automated data collection pipeline, this can be achieved by running a baseline detection and tracking algorithm, and relying on manual correction to add/remove/change bounding boxes on each frame, as well as resolving errors in the associations over frames (track switches). In this paper, we propose a data correction pipeline to generate ground-truth data more efficiently in this semi-automated scenario. Our method simplifies the trajectories from the tracking systems and lets the annotator verify and correct the objects in the sampled keyframes. Once the objects in the keyframes are corrected, the bounding boxes in the other frames are obtained by interpolation. Our method achieves a substantial reduction in the number of frames requiring manual correction. In the MOT dataset, it reduces the number of frames by 30x while maintaining a HOTA score of 89.61%. Moreover, it reduces the number of frames by a factor of 10x while achieving a HOTA score of 79.24% in the SoccerNet dataset, and 85.79% in the DanceTrack dataset. The project code and data are publicly released at https://github.com/foreverYoungGitHub/trajectory-simplify-benchmark.
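
The interpolation step can be sketched as follows, assuming linear interpolation of box corners between two corrected keyframes (the paper's scheme may differ).

```python
import numpy as np

def interpolate_boxes(frame_a, box_a, frame_b, box_b):
    """box_* are (x1, y1, x2, y2); returns {frame: box} for every frame
    between the two corrected keyframes, inclusive."""
    boxes = {}
    span = frame_b - frame_a
    for f in range(frame_a, frame_b + 1):
        w = (f - frame_a) / span
        boxes[f] = tuple((1 - w) * a + w * b
                         for a, b in zip(box_a, box_b))
    return boxes

# Keyframes 10 and 14 are hand-corrected; frames 11-13 are filled in.
track = interpolate_boxes(10, (100, 50, 180, 220),
                          14, (120, 60, 200, 240))
```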

Accelerated parallel MRI using memory efficient and robust monotone operator learning (MOL)

  • Authors: Aniket Pramanik, Mathews Jacob
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.01351
  • Pdf link: https://arxiv.org/pdf/2304.01351
  • Abstract
    Model-based deep learning methods that combine imaging physics with learned regularization priors have been emerging as powerful tools for parallel MRI acceleration. The main focus of this paper is to determine the utility of the monotone operator learning (MOL) framework in the parallel MRI setting. The MOL algorithm alternates between a gradient descent step using a monotone convolutional neural network (CNN) and a conjugate gradient algorithm to encourage data consistency. The benefits of this approach include similar guarantees as compressive sensing algorithms including uniqueness, convergence, and stability, while being significantly more memory efficient than unrolled methods. We validate the proposed scheme by comparing it with different unrolled algorithms in the context of accelerated parallel MRI for static and dynamic settings.

PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching

  • Authors: Pedro Castro, Tae-Kyun Kim
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01382
  • Pdf link: https://arxiv.org/pdf/2304.01382
  • Abstract
    Estimating the pose of an unseen object is the goal of the challenging one-shot pose estimation task. Previous methods have relied heavily on feature matching with great success. However, these methods are often inefficient and limited by their reliance on pre-trained models that have not been designed specifically for pose estimation. In this paper we propose PoseMatcher, an accurate, model-free, one-shot object pose estimator that overcomes these limitations. We create a new training pipeline for object-to-image matching based on a three-view system: a query with positive and negative templates. This simple yet effective approach emulates test-time scenarios by cheaply constructing an approximation of the full object point cloud during training. To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer, a new attention layer that efficiently accommodates self- and cross-attention between the inputs. Moreover, we propose a pruning strategy where we iteratively remove redundant regions of the target object to further reduce the complexity and noise of the network while maintaining accuracy. Finally, we redesign commonly used pose refinement strategies, zoom and 2D offset refinement, and adapt them to the one-shot paradigm. We outperform all prior one-shot pose estimation methods on the Linemod and YCB-V datasets, as well as achieving results rivaling recent instance-level methods. The source code and models are available at https://github.com/PedroCastro/PoseMatcher.

LTM: Scalable and Black-box Similarity-based Test Suite Minimization based on Language Models

  • Authors: Rongqi Pan, Taher A. Ghaleb, Lionel Briand
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.01397
  • Pdf link: https://arxiv.org/pdf/2304.01397
  • Abstract
    Test suite minimization (TSM) is typically used to improve the efficiency of software testing by removing redundant test cases, thus reducing testing time and resources, while maintaining the fault detection capability of the test suite. Though many TSM approaches exist, most of them rely on code coverage (white-box) or model-based features, which are not always available for test engineers. Recent TSM approaches that rely only on test code (black-box) have been proposed, such as ATM and FAST-R. Though ATM achieves a better trade-off between effectiveness and efficiency than FAST-R, it suffers from scalability issues for large software systems as its execution time increases rapidly with test suite size. To address scalability, we propose LTM, a scalable and black-box similarity-based TSM approach based on language models. To support similarity measurement, we investigated three different pre-trained language models: CodeBERT, GraphCodeBERT, and UniXcoder, to extract embeddings of test code (Java test methods), on which we computed two similarity measures: Cosine Similarity and Euclidean Distance. Our goal is to find similarity measures that are not only computationally more efficient but can also better guide a Genetic Algorithm (GA), which is used for minimizing test suites, thus reducing minimization time. Experimental results showed that the best configuration of LTM (using UniXcoder with Cosine similarity) outperformed the best two configurations of ATM by achieving significantly higher fault detection rates (0.84 versus 0.81, on average) and, more importantly, running much faster (26.73 minutes versus 72.75 minutes, on average) than ATM, in terms of both preparation time (up to two orders of magnitude faster) and minimization time (one order of magnitude faster).
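
A sketch of the similarity computation described above: embed two Java test methods with a pre-trained code model and compare them with cosine similarity. The `microsoft/unixcoder-base` checkpoint matches one of the models named in the abstract, but the mean-pooling step is an assumption, not the paper's exact recipe.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("microsoft/unixcoder-base")
model = AutoModel.from_pretrained("microsoft/unixcoder-base")

def embed(code: str) -> torch.Tensor:
    inputs = tok(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)             # mean-pooled embedding

a = embed("void testAdd() { assertEquals(4, add(2, 2)); }")
b = embed("void testSub() { assertEquals(0, sub(2, 2)); }")
similarity = torch.nn.functional.cosine_similarity(a, b, dim=0)
```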

Adaptive Defective Area Identification in Material Surface Using Active Transfer Learning-based Level Set Estimation

  • Authors: Shota Hozumi, Kentaro Kutsukake, Kota Matsui, Syunya Kusakawa, Toru Ujihara, Ichiro Takeuchi
  • Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
  • Arxiv link: https://arxiv.org/abs/2304.01404
  • Pdf link: https://arxiv.org/pdf/2304.01404
  • Abstract
    In material characterization, identifying defective areas on a material surface is fundamental. The conventional approach involves measuring the relevant physical properties point-by-point at predetermined mesh grid points on the surface and determining the area in which the property does not reach the desired level. To identify defective areas more efficiently, we propose adaptive mapping methods in which measurement resources are used preferentially to detect the boundaries of defective areas. We interpret this problem as active learning (AL) of the level set estimation (LSE) problem. The goal of AL-based LSE is to determine the level set of the physical property function defined on the surface with as few measurements as possible. Furthermore, to handle situations in which materials with similar specifications are repeatedly produced, we introduce a transfer learning approach so that the information from previously produced materials can be effectively utilized. As a proof of concept, we applied the proposed methods to the red-zone estimation problem of silicon wafers and demonstrated that we could identify the defective areas with significantly lower measurement costs than those of conventional methods.

An Efficient Learning-Based Solver for Two-Stage DC Optimal Power Flow with Feasibility Guarantees

  • Authors: Ling Zhang, Daniel Tabas, Baosen Zhang
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.01409
  • Pdf link: https://arxiv.org/pdf/2304.01409
  • Abstract
    In this paper, we consider the scenario-based two-stage stochastic DC optimal power flow (OPF) problem for optimal and reliable dispatch when the load is facing uncertainty. Although this problem is a linear program, it remains computationally challenging to solve due to the large number of scenarios needed to accurately represent the uncertainties. To mitigate the computational issues, many techniques have been proposed to approximate the second-stage decisions so they can be dealt with more efficiently. The challenge of finding good policies to approximate the second-stage decisions is that these solutions need to be feasible, which has been difficult to achieve with existing policies. To address these challenges, this paper proposes a learning method to solve the two-stage problem in a more efficient and optimal way. A technique called the gauge map is incorporated into the learning architecture design to guarantee the learned solutions' feasibility with respect to the network constraints. Namely, we can design policies that are feedforward functions that only output feasible solutions. Simulation results on standard IEEE systems show that, compared to iterative solvers and the widely used affine policy, our proposed method not only learns solutions of good quality but also accelerates the computation by orders of magnitude.

Thematic context vector association based on event uncertainty for Twitter

  • Authors: Vaibhav Khatavkar, Swapnil Mane, Parag Kulkarni
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.01423
  • Pdf link: https://arxiv.org/pdf/2304.01423
  • Abstract
    Keyword extraction is a crucial process in text mining. The extraction of keywords with their respective contextual events in Twitter data is a big challenge. The main challenges stem from the informality of the language used: misspelled words, acronyms, and ambiguous terms. The extraction of keywords from informal language in current systems is pattern-based or event-based. In this paper, contextual keywords are extracted using thematic events with the help of data association. The thematic context for events is identified using the uncertainty principle in the proposed system. The thematic contexts are weighted with the help of vectors, called thematic context vectors, which signify whether the event is certain or uncertain. The system is tested on the Twitter COVID-19 dataset and proves to be effective. The system extracts event-specific thematic context vectors from the test dataset and ranks them. The extracted thematic context vectors are used for the clustering of contextual thematic vectors, which improves the silhouette coefficient by 0.5% over state-of-the-art methods, namely TF and TF-IDF. The thematic context vector can be used in other applications such as cyberbullying detection, sarcasm detection, figurative language detection, etc.

Optimizing Irrigation Efficiency using Deep Reinforcement Learning in the Field

  • Authors: Xianzhong Ding, Wan Du
  • Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.01435
  • Pdf link: https://arxiv.org/pdf/2304.01435
  • Abstract
    Agricultural irrigation is a significant contributor to freshwater consumption. However, the current irrigation systems used in the field are not efficient. They rely mainly on soil moisture sensors and the experience of growers, but do not account for future soil moisture loss. Predicting soil moisture loss is challenging because it is influenced by numerous factors, including soil texture, weather conditions, and plant characteristics. This paper proposes a solution to improve irrigation efficiency, which is called DRLIC. DRLIC is a sophisticated irrigation system that uses deep reinforcement learning (DRL) to optimize its performance. The system employs a neural network, known as the DRL control agent, which learns an optimal control policy that considers both the current soil moisture measurement and the future soil moisture loss. We introduce an irrigation reward function that enables our control agent to learn from previous experiences. However, there may be instances where the output of our DRL control agent is unsafe, such as irrigating too much or too little water. To avoid damaging the health of the plants, we implement a safety mechanism that employs a soil moisture predictor to estimate the performance of each action. If the predicted outcome is deemed unsafe, we perform a relatively-conservative action instead. To demonstrate the real-world application of our approach, we developed an irrigation system that comprises sprinklers, sensing and control nodes, and a wireless network. We evaluate the performance of DRLIC by deploying it in a testbed consisting of six almond trees. During a 15-day in-field experiment, we compared the water consumption of DRLIC with a widely-used irrigation scheme. Our results indicate that DRLIC outperformed the traditional irrigation method by achieving a water savings of up to 9.52%.

On the coordination efficiency of strategic multi-agent robotic teams

  • Authors: Marcos M. Vasconcelos, Behrouz Touri
  • Subjects: Systems and Control (eess.SY); Information Theory (cs.IT); Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2304.01445
  • Pdf link: https://arxiv.org/pdf/2304.01445
  • Abstract
    We study the problem of achieving decentralized coordination by a group of strategic decision makers choosing to engage or not in a task in a stochastic setting. First, we define a class of symmetric utility games that encompasses a broad class of coordination games, including the popular framework known as \textit{global games}. With the goal of studying the extent to which agents engaging in a stochastic coordination game indeed coordinate, we propose a new probabilistic measure of coordination efficiency. Then, we provide a universal information-theoretic upper bound on the coordination efficiency as a function of the amount of noise in the observation channels. Finally, we revisit a large class of global games, and we illustrate that their Nash equilibrium policies may be less coordination-efficient than certainty-equivalent policies, despite providing better expected utility. This counter-intuitive result establishes the existence of a nontrivial trade-off between coordination efficiency and expected utility in coordination games.

Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning

  • Authors: Ariyan Bighashdel, Daan de Geus, Pavol Jancura, Gijs Dubbelman
  • Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.01447
  • Pdf link: https://arxiv.org/pdf/2304.01447
  • Abstract
    Learning anticipation in Multi-Agent Reinforcement Learning (MARL) is a reasoning paradigm where agents anticipate the learning steps of other agents to improve cooperation among themselves. As MARL uses gradient-based optimization, learning anticipation requires using Higher-Order Gradients (HOG), with so-called HOG methods. Existing HOG methods are based on policy parameter anticipation, i.e., agents anticipate the changes in policy parameters of other agents. Currently, however, these existing HOG methods have only been applied to differentiable games or games with small state spaces. In this work, we demonstrate that in the case of non-differentiable games with large state spaces, existing HOG methods do not perform well and are inefficient due to their inherent limitations related to policy parameter anticipation and multiple sampling stages. To overcome these problems, we propose Off-Policy Action Anticipation (OffPA2), a novel framework that approaches learning anticipation through action anticipation, i.e., agents anticipate the changes in actions of other agents, via off-policy sampling. We theoretically analyze our proposed OffPA2 and employ it to develop multiple HOG methods that are applicable to non-differentiable games with large state spaces. We conduct a large set of experiments and illustrate that our proposed HOG methods outperform the existing ones regarding efficiency and performance.

Signal Temporal Logic Meets Convex-Concave Programming: A Structure-Exploiting SQP Algorithm for STL Specifications

  • Authors: Yoshinari Takayama, Kazumune Hashimoto, Toshiyuki Ohtsuka
  • Subjects: Systems and Control (eess.SY); Formal Languages and Automata Theory (cs.FL); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.01475
  • Pdf link: https://arxiv.org/pdf/2304.01475
  • Abstract
    This study considers the control problem with signal temporal logic (STL) specifications. Prior works have adopted smoothing techniques to make this problem tractable within a feasible time frame and have solved it by naively applying sequential quadratic programming (SQP) methods. However, one of the drawbacks of this approach is that solutions can easily become trapped in local minima that do not satisfy the specification. In this study, we propose a new optimization method, termed CCP-based SQP, based on the convex-concave procedure (CCP). Our framework includes a new robustness decomposition method that decomposes the robustness function into a set of constraints, resulting in a difference-of-convex (DC) program that can be solved efficiently. We solve this DC program sequentially as a quadratic program by approximating only the disjunctive parts of the specifications. Our experimental results demonstrate that our method achieves superior performance compared to state-of-the-art SQP methods in terms of both robustness and computational time.

Blockwise Compression of Transformer-based Models without Retraining

  • Authors: Gaochen Dong, Wei Chen
  • Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.01483
  • Pdf link: https://arxiv.org/pdf/2304.01483
  • Abstract
    Transformer-based models, represented by GPT-3, ChatGPT, and GPT-4, have recently attracted increasing interest, research enthusiasm, and business demand. However, their massive computation resources and huge memory footprint are inevitable challenges. To tackle this issue, we propose BCT, a framework of blockwise compression for transformers without retraining, to lower deployment thresholds. BCT achieves more fine-grained compression of the whole transformer, including embedding, matrix multiplication, GELU, Softmax, layer normalization, and all the intermediate results. As a case study, we compress an efficient model with BCT and evaluate it on several General Language Understanding Evaluation (GLUE) datasets. The results show that BCT can achieve a less than 0.90% accuracy drop in most tasks.

OneShotSTL: One-Shot Seasonal-Trend Decomposition For Online Time Series Anomaly Detection And Forecasting

  • Authors: Xiao He, Ye Li, Jian Tan, Bin Wu, Feifei Li
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.01506
  • Pdf link: https://arxiv.org/pdf/2304.01506
  • Abstract
    Seasonal-trend decomposition is one of the most fundamental concepts in time series analysis that supports various downstream tasks, including time series anomaly detection and forecasting. However, existing decomposition methods rely on batch processing with a time complexity of O(W), where W is the number of data points within a time window. Therefore, they cannot always efficiently support real-time analysis that demands low processing delay. To address this challenge, we propose OneShotSTL, an efficient and accurate algorithm that can decompose time series online with an update time complexity of O(1). OneShotSTL is more than $1,000$ times faster than the batch methods, with accuracy comparable to the best counterparts. Extensive experiments on real-world benchmark datasets for downstream time series anomaly detection and forecasting tasks demonstrate that OneShotSTL is from 10 to over 1,000 times faster than the state-of-the-art methods, while still providing comparable or even better accuracy.
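
To make the O(1)-per-point idea concrete, here is a toy additive Holt-Winters-style online decomposer whose update cost is constant in the window size. It only illustrates constant-time streaming updates; it is not the OneShotSTL algorithm, and the smoothing constants are arbitrary.

```python
def make_online_decomposer(period, alpha=0.1, gamma=0.1):
    """Toy O(1)-per-point trend/seasonal/residual decomposition (not OneShotSTL)."""
    state = {"trend": 0.0, "season": [0.0] * period, "t": 0}

    def update(x):
        i = state["t"] % period
        state["trend"] += alpha * ((x - state["season"][i]) - state["trend"])
        state["season"][i] += gamma * (x - state["trend"] - state["season"][i])
        state["t"] += 1
        residual = x - state["trend"] - state["season"][i]
        return state["trend"], state["season"][i], residual

    return update

update = make_online_decomposer(period=24)
for x in [10.0, 12.0, 11.0, 13.0]:            # streaming observations
    trend, season, resid = update(x)
```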

LiDAR-Based 3D Object Detection via Hybrid 2D Semantic Scene Generation

  • Authors: Haitao Yang, Zaiwei Zhang, Xiangru Huang, Min Bai, Chen Song, Bo Sun, Li Erran Li, Qixing Huang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01519
  • Pdf link: https://arxiv.org/pdf/2304.01519
  • Abstract
    Bird's-Eye View (BEV) features are popular intermediate scene representations shared by the 3D backbone and the detector head in LiDAR-based object detectors. However, little research has been done to investigate how to incorporate additional supervision on the BEV features to improve proposal generation in the detector head, while still balancing the number of powerful 3D layers and efficient 2D network operations. This paper proposes a novel scene representation that encodes both the semantics and geometry of the 3D environment in 2D, which serves as a dense supervision signal for better BEV feature learning. The key idea is to use auxiliary networks to predict a combination of explicit and implicit semantic probabilities by exploiting their complementary properties. Extensive experiments show that our simple yet effective design can be easily integrated into most state-of-the-art 3D object detectors and consistently improves upon baseline models.

FisHook -- An Optimized Approach to Marine Specie Classification using MobileNetV2

  • Authors: Kohav Dey, Krishna Bajaj, K S Ramalakshmi, Samuel Thomas, Sriram Radhakrishna
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01524
  • Pdf link: https://arxiv.org/pdf/2304.01524
  • Abstract
    Marine ecosystems are vital for the planet's health, but human activities such as climate change, pollution, and overfishing pose a constant threat to marine species. Accurate classification and monitoring of these species can aid in understanding their distribution, population dynamics, and the impact of human activities on them. However, classifying marine species can be challenging due to their vast diversity and the complex underwater environment. With advancements in computer performance and GPU-based computing, deep-learning algorithms can now efficiently classify marine species, making it easier to monitor and manage marine ecosystems. In this paper, we propose an optimization to the MobileNetV2 model to achieve a 99.83% average validation accuracy by highlighting specific guidelines for creating a dataset and augmenting marine species images. This transfer learning algorithm can be deployed successfully on a mobile application for on-site classification at fisheries.

How Regional Wind Characteristics Affect CNN-based wind predictions: Insights from Spatiotemporal Correlation Analysis

  • Authors: Heesoo Shin, Mario Rüttgers, Sangseung Lee
  • Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
  • Arxiv link: https://arxiv.org/abs/2304.01545
  • Pdf link: https://arxiv.org/pdf/2304.01545
  • Abstract
    This study investigates the impact of spatiotemporal data dimensions on the precision of a wind forecasting model developed using an artificial neural network. Although previous studies have shown that incorporating spatial data can enhance the accuracy of wind forecasting models, few investigations have explored the extent of the improvement owing to different spatial scales in neural network-based predictive models. Additionally, there are limited studies on the optimal temporal length of the input data for these models. To address this gap, this study employs data with various spatiotemporal dimensions as inputs when forecasting wind using 3D-Convolutional Neural Networks (3D-CNN) and assesses their predictive performance. The results indicate that using spatial data of the surrounding area for 3D-CNN training can achieve better predictive performance than using only single-point information. Additionally, multi-time data had a more positive effect on the predictive performance than single-time data. To determine the reasons for this, correlation analyses were used to determine the impact of the spatial and temporal sizes of the training data on the prediction performance. The study found that as the autocorrelation coefficient (ACC) decreased, meaning that there was less similarity over time, the prediction performance decreased. Furthermore, the spatial standard deviation of the ACC also affects the prediction performance. A Pearson correlation coefficient (PCC) analysis was conducted to examine the effect of space on the prediction performance. Through the PCC analysis, we show that local geometric and seasonal wind conditions can influence the forecast capability of a predictive model.

Meta-Learning with a Geometry-Adaptive Preconditioner

  • Authors: Suhyun Kang, Duhun Hwang, Moonjung Eo, Taesup Kim, Wonjong Rhee
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.01552
  • Pdf link: https://arxiv.org/pdf/2304.01552
  • Abstract
    Model-agnostic meta-learning (MAML) is one of the most successful meta-learning algorithms. It has a bi-level optimization structure where the outer-loop process learns a shared initialization and the inner-loop process optimizes task-specific weights. Although MAML relies on the standard gradient descent in the inner-loop, recent studies have shown that controlling the inner-loop's gradient descent with a meta-learned preconditioner can be beneficial. Existing preconditioners, however, cannot simultaneously adapt in a task-specific and path-dependent way. Additionally, they do not satisfy the Riemannian metric condition, which can enable the steepest descent learning with preconditioned gradient. In this study, we propose Geometry-Adaptive Preconditioned gradient descent (GAP) that can overcome the limitations in MAML; GAP can efficiently meta-learn a preconditioner that is dependent on task-specific parameters, and its preconditioner can be shown to be a Riemannian metric. Thanks to the two properties, the geometry-adaptive preconditioner is effective for improving the inner-loop optimization. Experiment results show that GAP outperforms the state-of-the-art MAML family and preconditioned gradient descent-MAML (PGD-MAML) family in a variety of few-shot learning tasks. Code is available at: https://github.com/Suhyun777/CVPR23-GAP.

Information and Energy Transmission with Wavelet-Reconstructed Harvesting Functions

  • Authors: Daewon Seo, Yongjune Kim
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2304.01560
  • Pdf link: https://arxiv.org/pdf/2304.01560
  • Abstract
    In practical simultaneous information and energy transmission (SIET), the exact energy harvesting function is usually unavailable because an energy harvesting circuit is nonlinear and nonideal. In this work, we consider a SIET problem where the harvesting function is accessible only at experimentally taken sample points and study how close we can design SIET to the optimal system with such sampled knowledge. Assuming that the harvesting function is of bounded variation and may have discontinuities, we separately consider two settings in which samples are taken without and with additive noise. For these settings, we propose to design a SIET system as if a wavelet-reconstructed harvesting function were the true one, and we study the asymptotic loss in energy and information delivery relative to the true optimal design. Specifically, for noiseless samples, it is shown that designing SIET as if the wavelet-reconstructed harvesting function were the truth incurs asymptotically vanishing energy and information delivery loss with the number of samples. For noisy samples, we propose to reconstruct wavelet coefficients via soft-thresholding estimation. Then, we not only obtain similar asymptotic losses to the noiseless case but also show that the energy loss by wavelets is asymptotically optimal up to a logarithmic factor.
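
The soft-thresholding step for noisy samples can be sketched with PyWavelets. This is generic wavelet denoising under an arbitrary fixed threshold, not the paper's estimator or its threshold choice.

```python
import numpy as np
import pywt

def denoise_samples(y, wavelet="db4", level=4, threshold=0.1):
    """Soft-threshold detail coefficients, keep the approximation, reconstruct."""
    coeffs = pywt.wavedec(y, wavelet, level=level)
    coeffs = [coeffs[0]] + [pywt.threshold(c, threshold, mode="soft")
                            for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)

# Noisy samples of a smooth harvesting-like curve.
y = np.sin(np.linspace(0, 4 * np.pi, 512)) + 0.1 * np.random.randn(512)
y_hat = denoise_samples(y)
```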

HALO: Hazard-Aware Landing Optimization for Autonomous Systems

  • Authors: Christopher R. Hayner, Samuel C. Buckner, Daniel Broyles, Evelyn Madewell, Karen Leung, Behcet Acikmese
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.01583
  • Pdf link: https://arxiv.org/pdf/2304.01583
  • Abstract
    With autonomous aerial vehicles enacting safety-critical missions, such as the Mars Science Laboratory Curiosity rover's landing on Mars, the task of automatically identifying and reasoning about potentially hazardous landing sites is paramount. This paper presents a coupled perception-planning solution which addresses the hazard detection, optimal landing trajectory generation, and contingency planning challenges encountered when landing in uncertain environments. Specifically, we develop and combine two novel algorithms, Hazard-Aware Landing Site Selection (HALSS) and Adaptive Deferred-Decision Trajectory Optimization (Adaptive-DDTO), to address the perception and planning challenges, respectively. The HALSS framework processes point cloud information to identify feasible safe landing zones, while Adaptive-DDTO is a multi-target contingency planner that adaptively replans as new perception information is received. We demonstrate the efficacy of our approach using a simulated Martian environment and show that our coupled perception-planning method achieves greater landing success while being more fuel-efficient compared to a nonadaptive DDTO approach.

MM-BSN: Self-Supervised Image Denoising for Real-World with Multi-Mask based on Blind-Spot Network

  • Authors: Dan Zhang, Fangfang Zhou, Yuwen Jiang, Zhengming Fu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01598
  • Pdf link: https://arxiv.org/pdf/2304.01598
  • Abstract
    Recent advances in deep learning have been pushing image denoising techniques to a new level. In self-supervised image denoising, the blind-spot network (BSN) is one of the most common methods. However, most existing BSN algorithms use a dot-based central mask, which is recognized as inefficient for images with large-scale spatially correlated noise. In this paper, we give a definition of large noise and propose a multi-mask strategy using multiple convolutional kernels masked in different shapes to further break the noise spatial correlation. Furthermore, we propose a novel self-supervised image denoising method that combines the multi-mask strategy with BSN (MM-BSN). We show that different masks can cause significant performance differences, and the proposed MM-BSN can efficiently fuse the features extracted by multi-masked layers, while recovering the texture structures destroyed by multi-masking and information transmission. Our MM-BSN can be used to address the problem of large-noise denoising, which cannot be efficiently handled by other BSN methods. Extensive experiments on public real-world datasets demonstrate that the proposed MM-BSN achieves state-of-the-art performance among self-supervised and even unpaired image denoising methods for sRGB image denoising, without any labelling effort or prior knowledge. Code can be found at https://github.com/dannie125/MM-BSN.

An interpretability framework for Similar case matching

  • Authors: Nankai Lin, Haonan Liu, Jiajun Fang, Dong Zhou, Aimin Yang
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.01622
  • Pdf link: https://arxiv.org/pdf/2304.01622
  • Abstract
    Similar Case Matching (SCM) is designed to determine whether two cases are similar. The task has an essential role in the legal system, helping legal professionals find relevant cases quickly and thus deal with them more efficiently. Existing research has focused on improving the model's performance but not on its interpretability. Therefore, this paper proposes a pipeline framework for interpretable SCM, which consists of four modules: a judicial feature sentence identification module, a case matching module, a feature sentence alignment module, and a conflict disambiguation module. Unlike existing SCM methods, our framework identifies feature sentences in a case that contain essential information, performs similar case matching based on the extracted feature sentence results, and aligns the feature sentences in the two cases to provide evidence for the similarity of the cases. SCM results may conflict with feature sentence alignment results, and our framework further resolves such inconsistencies. The experimental results show the effectiveness of our framework, and our work provides a new benchmark for interpretable SCM.

On a family of low-rank algorithms for large-scale algebraic Riccati equations

  • Authors: Christian Bertram, Heike Faßbender
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.01624
  • Pdf link: https://arxiv.org/pdf/2304.01624
  • Abstract
    In [3] it was shown that four seemingly different algorithms for computing low-rank approximate solutions $X_j$ to the solution $X$ of large-scale continuous-time algebraic Riccati equations (CAREs) $0 = \mathcal{R}(X) := A^HX+XA+C^HC-XBB^HX $ generate the same sequence $X_j$ when used with the same parameters. The Hermitian low-rank approximations $X_j$ are of the form $X_j = Z_jY_jZ_j^H,$ where $Z_j$ is a matrix with only few columns and $Y_j$ is a small square Hermitian matrix. Each $X_j$ generates a low-rank Riccati residual $\mathcal{R}(X_j)$ such that the norm of the residual can be evaluated easily allowing for an efficient termination criterion. Here a new family of methods to generate such low-rank approximate solutions $X_j$ of CAREs is proposed. Each member of this family of algorithms proposed generates the same sequence of $X_j$ as the four previously known algorithms. The approach is based on a block rational Arnoldi decomposition and an associated block rational Krylov subspace spanned by $A^H$ and $C^H.$ Two specific versions of the general algorithm will be considered; one will turn out to be equivalent to the RADI algorithm, the other one allows for a slightly more efficient implementation compared to the RADI algorithm. Moreover, our approach allows for adding more than one shift at a time.

Equivariant Networks for Porous Crystalline Materials

  • Authors: Marko Petković, Pablo Romero-Marimon, Vlado Menkovski, Sofia Calero
  • Subjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci)
  • Arxiv link: https://arxiv.org/abs/2304.01628
  • Pdf link: https://arxiv.org/pdf/2304.01628
  • Abstract
    Efficiently predicting properties of porous crystalline materials has great potential to accelerate the high-throughput screening process for developing new materials, as simulations carried out using first-principles models are often computationally expensive. To effectively make use of deep learning methods to model these materials, we need to utilize the symmetries present in the crystals, which are defined by their space group. Existing methods for crystal property prediction either have symmetry constraints that are too restrictive or only incorporate symmetries between unit cells. In addition, these models do not explicitly model the porous structure of the crystal. In this paper, we develop a model which incorporates the symmetries of the unit cell of a crystal in its architecture and explicitly models the porous structure. We evaluate our model by predicting the heat of adsorption of CO$_2$ for different configurations of the mordenite zeolite. Our results confirm that our method performs better than existing methods for crystal property prediction and that the inclusion of pores results in a more efficient model.

Moving Obstacle Collision Avoidance via Chance-Constrained MPC with CBF

  • Authors: Ming Li, Zhiyong Sun, Zirui Liao, Siep Weiland
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.01639
  • Pdf link: https://arxiv.org/pdf/2304.01639
  • Abstract
    Model predictive control (MPC) with control barrier functions (CBF) is a promising solution to address the moving obstacle collision avoidance (MOCA) problem. Unlike MPC with distance constraints (MPC-DC), this approach facilitates early obstacle avoidance without the need to increase prediction horizons. However, the existing MPC-CBF method is deterministic and fails to account for perception uncertainties. This paper proposes a generalized MPC-CBF approach for stochastic scenarios, which maintains the advantages of the deterministic method for addressing the MOCA problem. Specifically, the chance-constrained MPC-CBF (CC-MPC-CBF) technique is introduced to ensure that a user-defined collision avoidance probability is met by utilizing probabilistic CBFs. However, due to the potential empty intersection between the reachable set and the safe region confined by CBF constraints, the CC-MPC-CBF problem can pose challenges in achieving feasibility. To address this issue, we propose a sequential implementation approach that involves solving a standard MPC optimization problem followed by a predictive safety filter optimization, which leads to improved feasibility. Furthermore, we introduce an iterative convex optimization scheme to further expedite the resolution of the predictive safety filter, which results in an efficient approach to tackling the non-convex CC-MPC-CBF problem. We apply our proposed algorithm to a 2-D integrator system for MOCA, and we showcase its resilience to obstacle measurement uncertainties and favorable feasibility properties.

Adaptive Image Compression via Optimal Mesh Refinement

  • Authors: Michael Feischl, Hubert Hackl
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.01640
  • Pdf link: https://arxiv.org/pdf/2304.01640
  • Abstract
    The JPEG algorithm is a de facto standard for image compression. We investigate whether adaptive mesh refinement can be used to optimize the compression ratio and propose a new adaptive image compression algorithm. We prove that it produces a quasi-optimal subdivision grid for a given error norm with high probability. This subdivision can be stored with very little overhead and thus leads to an efficient compression algorithm. We demonstrate experimentally that the new algorithm can achieve better compression ratios than standard JPEG compression with no visible loss of quality on many images. The mathematical core of this work shows that Binev's optimal tree approximation algorithm is applicable to image compression with high probability, when we assume small additive Gaussian noise on the pixels of the image.
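
The flavor of error-driven subdivision can be shown with a greedy quadtree split: refine a block whenever a constant fit exceeds a tolerance. This is only an illustration; the paper relies on Binev's optimal tree approximation algorithm, which is more subtle than this greedy rule, and the tolerance here is arbitrary.

```python
import numpy as np

def refine(img, x, y, w, h, tol, leaves):
    """Split a block into four children until a constant fit is within tol."""
    block = img[y:y + h, x:x + w]
    err = np.abs(block - block.mean()).max()      # max error of constant fit
    if err <= tol or min(w, h) <= 2:
        leaves.append((x, y, w, h, float(block.mean())))
        return
    hw, hh = w // 2, h // 2
    for dx, dy in [(0, 0), (hw, 0), (0, hh), (hw, hh)]:
        refine(img, x + dx, y + dy, hw, hh, tol, leaves)

img = np.random.rand(64, 64)
leaves = []
refine(img, 0, 0, 64, 64, tol=0.3, leaves=leaves)  # adaptive subdivision grid
```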

Controller Synthesis for Local and Global Specifications in Multi-Agent Systems

  • Authors: David Smith Sundarsingh, Jay Bhagiya, Saharsh, Jeel Chatrola, Adnane Saoud, Pushpak Jagtap
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.01652
  • Pdf link: https://arxiv.org/pdf/2304.01652
  • Abstract
    In this paper, we propose a computationally efficient symbolic controller synthesis technique for multi-agent systems. The paper focuses on synthesizing distributed controllers enforcing local temporal logic specifications along with global safety specifications for multi-agent systems. To solve the problem in a computationally efficient way we leverage the concept of control barrier functions. In particular, we use a three-step bottom-up approach: first, the symbolic controllers for individual agents are synthesized to enforce local temporal logic specifications, then we use a notion of control barrier functions for symbolic models to compose controlled agent systems by removing unsafe transitions, and finally, we synthesize controller for the reduced composed system to ensure the satisfaction of local temporal logic specifications while ensuring global safety specification. The effectiveness of our approach is demonstrated on a multi-robot system by comparing it with the conventional monolithic symbolic control approach.

High-performance Time Series Anomaly Discovery on Graphics Processors

  • Authors: Mikhail Zymbler, Yana Kraeva
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.01660
  • Pdf link: https://arxiv.org/pdf/2304.01660
  • Abstract
    Currently, discovering subsequence anomalies in time series remains one of the most topical research problems. A subsequence anomaly refers to successive points in time that are collectively abnormal, although each point is not necessarily an outlier. Among a large number of approaches to discovering subsequence anomalies, the discord concept is considered one of the best. A time series discord is intuitively defined as a subsequence of a given length that is maximally far away from its non-overlapping nearest neighbor. The recently introduced MERLIN algorithm discovers time series discords of every possible length in a specified range, thereby eliminating the need to set even that sole parameter. However, MERLIN is serial, and parallelization could increase the performance of discord discovery. In this article, we introduce a novel parallelization scheme for GPUs called PALMAD (Parallel Arbitrary Length MERLIN-based Anomaly Discovery). As opposed to its serial predecessor, PALMAD employs recurrent formulas we have derived to avoid redundant calculations, and advanced data structures for the efficient implementation of parallel processing. Experimental evaluation over real-world and synthetic time series shows that our algorithm outperforms parallel analogs. We also apply PALMAD to discover anomalies in a real-world time series, employing our proposed discord heatmap technique to illustrate the results.
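
The discord definition used above is easy to state in code. Below is a brute-force O(n^2) reference implementation for a single, fixed subsequence length; MERLIN and PALMAD compute the same quantity over a range of lengths far more efficiently.

```python
import numpy as np

def find_discord(ts, m):
    """Index and distance of the subsequence farthest from its nearest
    non-overlapping neighbor (brute force, fixed length m)."""
    n = len(ts) - m + 1
    subs = np.stack([ts[i:i + m] for i in range(n)])
    best_idx, best_dist = -1, -np.inf
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        d[max(0, i - m + 1):i + m] = np.inf       # exclude overlapping neighbors
        if d.min() > best_dist:
            best_idx, best_dist = i, d.min()
    return best_idx, best_dist

ts = np.sin(np.linspace(0, 20 * np.pi, 1000))
ts[500:520] += 1.5                                 # injected anomaly
print(find_discord(ts, m=20))                      # discord near index 500
```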

Reduced-Precision Floating-Point Arithmetic in Systolic Arrays with Skewed Pipelines

  • Authors: Dionysios Filippas, Christodoulos Peltekis, Giorgos Dimitrakopoulos, Chrysostomos Nicopoulos
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.01668
  • Pdf link: https://arxiv.org/pdf/2304.01668
  • Abstract
    The acceleration of deep-learning kernels in hardware relies on matrix multiplications that are executed efficiently on Systolic Arrays (SA). To effectively trade off deep-learning training/inference quality with hardware cost, SA accelerators employ reduced-precision Floating-Point (FP) arithmetic. In this work, we demonstrate the need for new pipeline organizations to reduce latency and improve energy efficiency of reduced-precision FP operators for the chained multiply-add operation imposed by the structure of the SA. The proposed skewed pipeline design reorganizes the pipelined operation of the FP multiply-add units to enable new forwarding paths for the exponent logic, which allow for parallel execution of the pipeline stages of consecutive PEs. As a result, the latency of the matrix multiplication operation within the SA is significantly reduced with minimal hardware cost, thereby yielding an energy reduction of 8% and 11% for the examined state-of-the-art CNNs.

Comparison of Two Search Criteria for Lattice-based Kernel Approximation

  • Authors: Frances Y. Kuo, Weiwen Mo, Dirk Nuyens, Ian H. Sloan, Abirami Srikumar
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.01685
  • Pdf link: https://arxiv.org/pdf/2304.01685
  • Abstract
    The kernel interpolant in a reproducing kernel Hilbert space is optimal in the worst-case sense among all approximations of a function using the same set of function values. In this paper, we compare two search criteria to construct lattice point sets for use in lattice-based kernel approximation. The first candidate, $\mathcal{P}_n^*$, is based on the power function that appears in machine learning literature. The second, $\mathcal{S}_n^*$, is a search criterion used for generating lattices for approximation using truncated Fourier series. We find that the empirical difference in error between the lattices constructed using $\mathcal{P}_n^*$ and $\mathcal{S}_n^*$ is marginal. The criterion $\mathcal{S}_n^*$ is preferred as it is computationally more efficient and has a proven error bound.
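
For context, a rank-1 lattice point set is generated from a single integer vector $z$ as $\{\mathrm{frac}(i z / n)\}_{i=0}^{n-1}$. The sketch below builds such a set; the generating vector used here is arbitrary, not one produced by the $\mathcal{P}_n^*$ or $\mathcal{S}_n^*$ search criteria.

```python
import numpy as np

def rank1_lattice(n, z):
    """n points of the rank-1 lattice {frac(i * z / n)} in [0, 1)^d."""
    i = np.arange(n).reshape(-1, 1)
    return np.mod(i * np.asarray(z) / n, 1.0)

points = rank1_lattice(n=64, z=[1, 19, 27])   # 64 points in 3 dimensions
```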

Towards Open-Vocabulary Video Instance Segmentation

  • Authors: Haochen Wang, Shuai Wang, Cilin Yan, Xiaolong Jiang, XU Tang, Yao Hu, Weidi Xie, Efstratios Gavves
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.01715
  • Pdf link: https://arxiv.org/pdf/2304.01715
  • Abstract
    Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos. To address this limitation, we make the following three contributions. First, we introduce the novel task of Open-Vocabulary Video Instance Segmentation, which aims to simultaneously segment, track, and classify objects in videos from open-set categories, including novel categories unseen during training. Second, to benchmark Open-Vocabulary VIS, we collect a Large-Vocabulary Video Instance Segmentation dataset (LV-VIS), that contains well-annotated objects from 1,212 diverse categories, significantly surpassing the category size of existing datasets by more than one order of magnitude. Third, we propose an efficient Memory-Induced Vision-Language Transformer, MindVLT, to first achieve Open-Vocabulary VIS in an end-to-end manner with near real-time inference speed. Extensive experiments on LV-VIS and four existing VIS datasets demonstrate the strong zero-shot generalization ability of MindVLT on novel categories. We will release the dataset and code to facilitate future endeavors.

Virtio-FPGA: a virtualization solution for SoC-attached FPGAs

  • Authors: Anna Panagopoulou, Michele Paolino, Daniel Raho
  • Subjects: Operating Systems (cs.OS)
  • Arxiv link: https://arxiv.org/abs/2304.01721
  • Pdf link: https://arxiv.org/pdf/2304.01721
  • Abstract
    Recently, FPGA accelerators have risen in popularity as they present a suitable way of satisfying the high-computation and low-power demands of real-time applications. Modern electric transportation systems (such as aircraft and road vehicles) can greatly profit from embedded FPGAs, which incorporate both high-performance and flexibility features into a single SoC. At the same time, the virtualization of FPGA resources aims to reinforce these systems with strong isolation, consolidation, and security. In this paper, we present a novel virtualization framework aimed at SoC-attached FPGA devices, in a Linux and QEMU/KVM setup. We use Virtio as a means to enable the configuration of FPGA resources from guest systems in an efficient way. Also, we employ the Linux VFIO and Device Tree Overlays technologies in order to render the FPGA resources dynamically accessible to guest systems. The ability to dynamically configure and utilize the FPGA resources from a virtualization environment is described in detail. The evaluation procedure of the solution is presented, and the virtualization overhead is benchmarked as minimal (around 10%) when accessing the FPGA devices from guest systems.

Learning quantities of interest from parametric PDEs: An efficient neural-weighted Minimal Residual approach

  • Authors: Ignacio Brevis, Ignacio Muga, David Pardo, Oscar Rodríguez, Kristoffer G. van der Zee
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.01722
  • Pdf link: https://arxiv.org/pdf/2304.01722
  • Abstract
    The efficient approximation of parametric PDEs is of tremendous importance in science and engineering. In this paper, we show how one can train Galerkin discretizations to efficiently learn quantities of interest of solutions to a parametric PDE. The central component in our approach is an efficient neural-network-weighted Minimal-Residual formulation, which, after training, provides Galerkin-based approximations in standard discrete spaces that have accurate quantities of interest, regardless of the coarseness of the discrete space.

Black Box Few-Shot Adaptation for Vision-Language models

  • Authors: Yassine Ouali, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.01752
  • Pdf link: https://arxiv.org/pdf/2304.01752
  • Abstract
    Vision-Language (V-L) models trained with contrastive learning to align the visual and language modalities have been shown to be strong few-shot learners. Soft prompt learning is the method of choice for few-shot downstream adaptation, aiming to bridge the modality gap caused by the distribution shift induced by the new domain. While parameter-efficient, prompt learning still requires access to the model weights and can be computationally infeasible for large models with billions of parameters. To address these shortcomings, in this work, we describe a black-box method for V-L few-shot adaptation that (a) operates on pre-computed image and text features and hence works without access to the model's weights, (b) is orders of magnitude faster at training time, (c) is amenable to both supervised and unsupervised training, and (d) can even be used to align image and text features computed from uni-modal models. To achieve this, we propose Linear Feature Alignment (LFA), a simple linear approach for V-L re-alignment in the target domain. LFA is initialized from a closed-form solution to a least-squares problem and then iteratively updated by minimizing a re-ranking loss. Despite its simplicity, our approach can even surpass soft-prompt learning methods, as shown by extensive experiments on 11 image and 2 video datasets.

Efficient Quotients Using Exact Arithmetic

  • Authors: Stephen M. Watt
  • Subjects: Symbolic Computation (cs.SC)
  • Arxiv link: https://arxiv.org/abs/2304.01753
  • Pdf link: https://arxiv.org/pdf/2304.01753
  • Abstract
    One method to compute multiple precision integer quotients is to use a Newton iteration with multiple precision fixed point or floating point values. On one hand, this allows quotients to be calculated efficiently by employing an efficient multiplication method. On the other hand, this leads to a library structure where exact and approximate arithmetic are interdependent. This paper develops the concept of a shifted inverse and modified Newton iteration to compute quotients efficiently using whole numbers only. The method is equally applicable to computing polynomial quotients efficiently.
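
The idea of a shifted inverse can be sketched with the textbook integer Newton iteration for $\lfloor 2^k / d \rfloor$, followed by an approximate quotient and a small correction. This is a generic scheme in the spirit of the abstract, not the paper's exact algorithm; the choice of $k$ and the correction loops are illustrative.

```python
def shifted_inverse(d: int, k: int) -> int:
    """floor(2**k / d) via the integer Newton iteration x <- x*(2*2**k - d*x) >> k."""
    x = 1 << (k - d.bit_length())            # lower bound on 2**k / d
    while True:
        x_new = (x * ((2 << k) - d * x)) >> k
        if x_new <= x:                       # iteration has converged/stalled
            break
        x = x_new
    while (x + 1) * d <= (1 << k):           # nudge up to the exact floor
        x += 1
    return x

def quotient(a: int, d: int) -> int:
    k = 2 * a.bit_length()
    q = (a * shifted_inverse(d, k)) >> k     # approximate quotient, never too large
    while (q + 1) * d <= a:                  # at most a couple of corrections
        q += 1
    return q

assert quotient(10**40 + 123, 10**7 + 9) == (10**40 + 123) // (10**7 + 9)
```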

Incorporating Unlabelled Data into Bayesian Neural Networks

  • Authors: Mrinank Sharma, Tom Rainforth, Yee Whye Teh, Vincent Fortuin
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.01762
  • Pdf link: https://arxiv.org/pdf/2304.01762
  • Abstract
    We develop a contrastive framework for learning better prior distributions for Bayesian Neural Networks (BNNs) using unlabelled data. With this framework, we propose a practical BNN algorithm that offers the label-efficiency of self-supervised learning and the principled uncertainty estimates of Bayesian methods. Finally, we demonstrate the advantages of our approach for data-efficient learning in semi-supervised and low-budget active learning problems.

Neural Field Convolutions by Repeated Differentiation

  • Authors: Ntumba Elie Nsampi, Adarsh Djeacoumar, Hans-Peter Seidel, Tobias Ritschel, Thomas Leimkühler
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.01834
  • Pdf link: https://arxiv.org/pdf/2304.01834
  • Abstract
    Neural fields are evolving towards a general-purpose continuous representation for visual computing. Yet, despite their numerous appealing properties, they are hardly amenable to signal processing. As a remedy, we present a method to perform general continuous convolutions with general continuous signals such as neural fields. Observing that piecewise polynomial kernels reduce to a sparse set of Dirac deltas after repeated differentiation, we leverage convolution identities and train a repeated integral field to efficiently execute large-scale convolutions. We demonstrate our approach on a variety of data modalities and spatially-varying kernels.
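
The core identity is that convolution with a piecewise polynomial kernel reduces to a few taps into a repeated integral of the signal. A discrete 1D sanity check, assuming a box kernel whose derivative is two Dirac deltas:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(1000)
w = 51                                        # box kernel width

# Direct convolution with a unit-height box kernel.
direct = np.convolve(f, np.ones(w), mode="valid")

# Repeated-differentiation route: one cumulative integral, then two
# "delta taps" (the derivative of the box is +delta and -delta at its edges).
F = np.concatenate([[0.0], np.cumsum(f)])     # discrete antiderivative of f
sparse = F[w:] - F[:-w]

assert np.allclose(direct, sparse)
```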

FAST: Fidelity-Adjustable Semantic Transmission over Heterogeneous Wireless Networks

  • Authors: Peichun Li, Guoliang Cheng, Jiawen Kang, Rong Yu, Liping Qian, Yuan Wu, Dusit Niyato
  • Subjects: Networking and Internet Architecture (cs.NI); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.01857
  • Pdf link: https://arxiv.org/pdf/2304.01857
  • Abstract
    In this work, we investigate the challenging problem of on-demand semantic communication over heterogeneous wireless networks. We propose a fidelity-adjustable semantic transmission framework (FAST) that empowers wireless devices to send data efficiently under different application scenarios and resource conditions. To this end, we first design a dynamic sub-model training scheme to learn the flexible semantic model, which enables edge devices to customize the transmission fidelity with different widths of the semantic model. After that, we focus on the FAST optimization problem of minimizing the system energy consumption under latency and fidelity constraints. Following that, the optimal transmission strategies, including the scaling factor of the semantic model, computing frequency, and transmitting power, are derived for the devices. Experiment results indicate that, compared to the baseline transmission schemes, the proposed framework can reduce the system energy consumption and data size by up to one order of magnitude while maintaining reasonable data fidelity.

Incremental Verification of Neural Networks

  • Authors: Shubham Ugare, Debangshu Banerjee, Sasa Misailovic, Gagandeep Singh
  • Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.01874
  • Pdf link: https://arxiv.org/pdf/2304.01874
  • Abstract
    Complete verification of deep neural networks (DNNs) can exactly determine whether the DNN satisfies a desired trustworthy property (e.g., robustness, fairness) on an infinite set of inputs or not. Despite the tremendous progress to improve the scalability of complete verifiers over the years on individual DNNs, they are inherently inefficient when a deployed DNN is updated to improve its inference speed or accuracy. The inefficiency is because the expensive verifier needs to be run from scratch on the updated DNN. To improve efficiency, we propose a new, general framework for incremental and complete DNN verification based on the design of novel theory, data structure, and algorithms. Our contributions implemented in a tool named IVAN yield an overall geometric mean speedup of 2.4x for verifying challenging MNIST and CIFAR10 classifiers and a geometric mean speedup of 3.8x for the ACAS-XU classifiers over the state-of-the-art baselines.

Geometric Particle-In-Cell discretizations of a plasma hybrid model with kinetic ions and mass-less fluid electrons

  • Authors: Yingzhe Li, Martin Campos Pinto, Florian Holderied, Stefan Possanner, Eric Sonnendrücker
  • Subjects: Numerical Analysis (math.NA); Plasma Physics (physics.plasm-ph)
  • Arxiv link: https://arxiv.org/abs/2304.01891
  • Pdf link: https://arxiv.org/pdf/2304.01891
  • Abstract
    We explore the possibilities of applying structure-preserving numerical methods to a plasma hybrid model with kinetic ions and mass-less fluid electrons satisfying the quasi-neutrality relation. The numerical schemes are derived by finite element methods in the framework of finite element exterior calculus (FEEC) for field variables, particle-in-cell (PIC) methods for the Vlasov equation, and splitting methods in time based on a proposed anti-symmetric bracket. Conservation of energy, the quasi-neutrality relation, positivity of density, and the divergence-free property of the magnetic field hold irrespective of the resolution and metric used. Local quasi-interpolation is used for dealing with the current terms in order to make the proposed methods more efficient. The implementation has been done in the framework of the Python package STRUPHY [1], and has been verified by extensive numerical experiments.

Uncertainty Quantification for Recursive Estimation in Adaptive Safety-Critical Control

  • Authors: Max H. Cohen, Makai Mann, Kevin Leahy, Calin Belta
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.01901
  • Pdf link: https://arxiv.org/pdf/2304.01901
  • Abstract
    In this paper, we present a framework for online parameter estimation and uncertainty quantification in the context of adaptive safety-critical control. First, we demonstrate how incorporating a history stack of data into the classic recursive least squares algorithm facilitates parameter convergence under relaxed excitation conditions. Our key observation is that the estimate generated by this algorithm at any point in time is an affine transformation of the initial estimate. This property allows for parameterizing the uncertainty associated with such estimates using objects that are closed under affine transformation, such as zonotopes and Gaussian distributions, and enables the efficient propagation of such uncertainty metrics along the trajectory of the parameter estimates. We illustrate how such an approach facilitates the synthesis of safety-critical controllers for systems with parametric uncertainty using control barrier functions. Finally, we demonstrate the advantages of online adaptation and uncertainty quantification via numerical examples.
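
A plain recursive least squares update, as a baseline for the discussion above. The history-stack variant replays stored informative regressors through the same update to relax excitation requirements; this sketch shows only the standard recursion and does not implement the paper's uncertainty propagation.

```python
import numpy as np

class RLS:
    def __init__(self, theta0, P0):
        self.theta, self.P = theta0.astype(float).copy(), P0.copy()

    def update(self, phi, y):
        """One RLS step for a scalar measurement y = phi @ theta_true + noise."""
        P_phi = self.P @ phi
        k = P_phi / (1.0 + phi @ P_phi)       # gain vector
        self.theta = self.theta + k * (y - phi @ self.theta)
        self.P = self.P - np.outer(k, P_phi)
        return self.theta

rng = np.random.default_rng(1)
theta_true = np.array([2.0, -1.0])
rls = RLS(np.zeros(2), 100.0 * np.eye(2))
for _ in range(200):                          # stream (or replayed stack) of data
    phi = rng.standard_normal(2)
    rls.update(phi, phi @ theta_true + 0.01 * rng.standard_normal())
```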

Torch-Choice: A PyTorch Package for Large-Scale Choice Modelling with Python

  • Authors: Tianyu Du, Ayush Kanodia, Susan Athey
  • Subjects: Machine Learning (cs.LG); Mathematical Software (cs.MS); Econometrics (econ.EM)
  • Arxiv link: https://arxiv.org/abs/2304.01906
  • Pdf link: https://arxiv.org/pdf/2304.01906
  • Abstract
    $\texttt{torch-choice}$ is an open-source library for flexible, fast choice modeling with Python and PyTorch. $\texttt{torch-choice}$ provides a $\texttt{ChoiceDataset}$ data structure to manage databases flexibly and memory-efficiently. The paper demonstrates constructing a $\texttt{ChoiceDataset}$ from databases of various formats and the functionalities of $\texttt{ChoiceDataset}$. The package implements two widely used models, namely the multinomial logit and nested logit models, and supports regularization during model estimation. The package incorporates the option to take advantage of GPUs for estimation, allowing it to scale to massive datasets while being computationally efficient. Models can be initialized using either R-style formula strings or Python dictionaries. We conclude with a comparison of the computational efficiencies of $\texttt{torch-choice}$ and $\texttt{mlogit}$ in R as (1) the number of observations increases, (2) the number of covariates increases, and (3) the item set expands. Finally, we demonstrate the scalability of $\texttt{torch-choice}$ on large-scale datasets.

Inverting the SerDes Link Design Flow Process

  • Authors: Michael J. Degerstrom, Chad M. Smutzer, Patrick J. Zabinski, Barry K. Gilbert
  • Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.01911
  • Pdf link: https://arxiv.org/pdf/2304.01911
  • Abstract
    The traditional SerDes link simulation process begins with the extraction of printed circuit board (PCB) physical stripline and via models, followed by channel modeling and link simulation. We invert this simulation flow by first creating link performance curves across an array of hypothetical channels defined with specially developed, high-level, equation-based models; limited physical extraction is later undertaken to relate PCB channel implementation to these performance curves. These curves allow us to determine the system-level SerDes channel requirements and to become better informed in choosing PCB technologies for lower cost and easier manufacturability. The inverted modeling process is very efficient, allowing for the rapid identification and avoidance of problematic channel topologies and the study of other potentially useful channel designs.

Accelerating and Compressing Deep Neural Networks for Massive MIMO CSI Feedback

  • Authors: Omar Erak, Hatem Abou-Zeid
  • Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.01914
  • Pdf link: https://arxiv.org/pdf/2304.01914
  • Abstract
    The recent advances in machine learning and deep neural networks have made them attractive candidates for wireless communications functions such as channel estimation, decoding, and downlink channel state information (CSI) compression. However, most of these neural networks are large and inefficient, which is a barrier to deployment in practical wireless systems that require low latency and low memory footprints for individual network functions. To mitigate these limitations, we propose accelerated and compressed efficient neural networks for massive MIMO CSI feedback. Specifically, we have thoroughly investigated the adoption of network pruning, post-training dynamic range quantization, and weight clustering to optimize CSI feedback compression for massive MIMO systems. Furthermore, we have deployed the proposed model compression techniques on commodity hardware and demonstrated that in order to achieve inference gains, specialized libraries that accelerate computations for sparse neural networks are required. Our findings indicate that there is remarkable value in applying these model compression techniques, and the proposed joint pruning and quantization approach reduced model size by 86.5% and inference time by 76.2% with minimal impact on model accuracy. These compression methods are crucial to pave the way for practical adoption and deployment of deep learning-based techniques in commercial wireless systems.
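
To make the two core techniques concrete, here is a NumPy sketch of unstructured magnitude pruning followed by symmetric int8 dynamic-range quantization of a weight matrix. The 80% sparsity target and per-tensor scale are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def magnitude_prune(w, sparsity=0.8):
    """Zero out the smallest-magnitude weights (unstructured pruning)."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < thresh, 0.0, w)

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization (dynamic-range style)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
w_sparse = magnitude_prune(w)
q, scale = quantize_int8(w_sparse)
w_dequant = q.astype(np.float32) * scale  # what inference effectively sees
print(f"sparsity: {np.mean(w_sparse == 0):.2f}, "
      f"max abs error: {np.abs(w_sparse - w_dequant).max():.4f}")
```

As the abstract points out, the int8 weights shrink storage immediately, but the zeros only translate into inference speedups when the runtime uses sparse kernels.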

Strong Baselines for Parameter Efficient Few-Shot Fine-tuning

  • Authors: Samyadeep Basu, Daniela Massiceti, Shell Xu Hu, Soheil Feizi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01917
  • Pdf link: https://arxiv.org/pdf/2304.01917
  • Abstract
    Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase on a set of base classes. Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC. Fine-tuning ViTs, however, is expensive in time, compute and storage. This has motivated the design of parameter efficient fine-tuning (PEFT) methods which fine-tune only a fraction of the Transformer's parameters. While these methods have shown promise, inconsistencies in experimental conditions make it difficult to disentangle their advantage from other experimental factors including the feature extractor architecture, pre-trained initialization and fine-tuning algorithm, amongst others. In our paper, we conduct a large-scale, experimentally consistent, empirical analysis to study PEFTs for few-shot image classification. Through a battery of over 1.8k controlled experiments on large-scale few-shot benchmarks including Meta-Dataset (MD) and ORBIT, we uncover novel insights on PEFTs that cast light on their efficacy in fine-tuning ViTs for few-shot classification. Through our controlled empirical study, we have two main findings: (i) Fine-tuning just the LayerNorm parameters (which we call LN-Tune) during few-shot adaptation is an extremely strong baseline across ViTs pre-trained with both self-supervised and supervised objectives, (ii) For self-supervised ViTs, we find that simply learning a set of scaling parameters for each attention matrix (which we call AttnScale) along with a domain-residual adapter (DRA) module leads to state-of-the-art performance (while being $\sim$9$\times$ more parameter-efficient) on MD. Our extensive empirical findings set strong baselines and call for rethinking the current design of PEFT methods for FSC.
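
The LN-Tune baseline is simple enough to sketch directly: freeze a pre-trained ViT and leave only its LayerNorm affine parameters trainable. A minimal PyTorch version, using a torchvision ViT as a stand-in backbone (the paper's checkpoints and training loop are omitted):

```python
import torch.nn as nn
import torchvision

model = torchvision.models.vit_b_16(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False            # freeze everything...
for m in model.modules():
    if isinstance(m, nn.LayerNorm):
        for p in m.parameters():
            p.requires_grad = True     # ...except LayerNorm weight and bias

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable fraction: {trainable / total:.4%}")  # well under 1%
```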

High-Throughput Vector Similarity Search in Knowledge Graphs

  • Authors: Jason Mohoney, Anil Pacaci, Shihabur Rahman Chowdhury, Ali Mousavi, Ihab F. Ilyas, Umar Farooq Minhas, Jeffrey Pound, Theodoros Rekatsinas
  • Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.01926
  • Pdf link: https://arxiv.org/pdf/2304.01926
  • Abstract
    There is an increasing adoption of machine learning for encoding data into vectors to serve online recommendation and search use cases. As a result, recent data management systems propose augmenting query processing with online vector similarity search. In this work, we explore vector similarity search in the context of Knowledge Graphs (KGs). Motivated by the tasks of finding related KG queries and entities for past KG query workloads, we focus on hybrid vector similarity search (hybrid queries for short) where part of the query corresponds to vector similarity search and part of the query corresponds to predicates over relational attributes associated with the underlying data vectors. For example, given past KG queries for a song entity, we want to construct new queries for new song entities whose vector representations are close to the vector representation of the entity in the past KG query. But entities in a KG also have non-vector attributes such as a song associated with an artist, a genre, and a release date. Therefore, suggested entities must also satisfy query predicates over non-vector attributes beyond a vector-based similarity predicate. While these tasks are central to KGs, our contributions are generally applicable to hybrid queries. In contrast to prior works that optimize online queries, we focus on enabling efficient batch processing of past hybrid query workloads. We present our system, HQI, for high-throughput batch processing of hybrid queries. We introduce a workload-aware vector data partitioning scheme to tailor the vector index layout to the given workload and describe a multi-query optimization technique to reduce the overhead of vector similarity computations. We evaluate our methods on industrial workloads and demonstrate that HQI yields a 31x improvement in throughput for finding related KG queries compared to existing hybrid query processing approaches.
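
A hybrid query combines a relational predicate with a vector top-k. The naive per-query version below (NumPy, with made-up attributes) shows the semantics that HQI accelerates with workload-aware partitioning and multi-query optimization:

```python
import numpy as np

def hybrid_topk(vectors, attrs, query_vec, predicate, k=5):
    """Filter by the relational predicate, then rank survivors by cosine."""
    cand = np.array([i for i, a in enumerate(attrs) if predicate(a)])
    sims = vectors[cand] @ query_vec / (
        np.linalg.norm(vectors[cand], axis=1) * np.linalg.norm(query_vec))
    return cand[np.argsort(-sims)[:k]]

rng = np.random.default_rng(0)
vectors = rng.normal(size=(10_000, 64))          # entity embeddings
attrs = [{"genre": str(rng.choice(["rock", "jazz"])),
          "year": int(rng.integers(1980, 2023))} for _ in range(10_000)]
hits = hybrid_topk(vectors, attrs, rng.normal(size=64),
                   lambda a: a["genre"] == "jazz" and a["year"] >= 2000)
print(hits)  # indices of the top-5 jazz entities from 2000 onward
```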

LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models

  • Authors: Zhiqiang Hu, Yihuai Lan, Lei Wang, Wanyu Xu, Ee-Peng Lim, Roy Ka-Wei Lee, Lidong Bing, Soujanya Poria
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.01933
  • Pdf link: https://arxiv.org/pdf/2304.01933
  • Abstract
    The success of large language models (LLMs), like GPT-3 and ChatGPT, has led to the development of numerous cost-effective and accessible alternatives that are created by fine-tuning open-access LLMs with task-specific data (e.g., ChatDoctor) or instruction data (e.g., Alpaca). Among the various fine-tuning methods, adapter-based parameter-efficient fine-tuning (PEFT) is undoubtedly one of the most attractive topics, as it only requires fine-tuning a few external parameters instead of the entire LLMs while achieving comparable or even better performance. To enable further research on PEFT methods of LLMs, this paper presents LLM-Adapters, an easy-to-use framework that integrates various adapters into LLMs and can execute these adapter-based PEFT methods of LLMs for different tasks. The framework includes state-of-the-art open-access LLMs such as LLaMA, BLOOM, OPT, and GPT-J, as well as widely used adapters such as Series adapter, Parallel adapter, and LoRA. The framework is designed to be research-friendly, efficient, modular, and extendable, allowing the integration of new adapters and their evaluation with new and larger-scale LLMs. Furthermore, to evaluate the effectiveness of adapters in LLM-Adapters, we conduct experiments on six math reasoning datasets. The results demonstrate that using adapter-based PEFT in smaller-scale LLMs (7B) with few extra trainable parameters yields comparable, and in some cases superior, performance to that of powerful LLMs (175B) in zero-shot inference on simple math reasoning datasets. Overall, we provide a promising framework for fine-tuning large LLMs on downstream tasks. We believe the proposed LLM-Adapters will advance adapter-based PEFT research, facilitate the deployment of research pipelines, and enable practical applications to real-world systems.
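
Among the adapters listed, LoRA is the easiest to show in miniature: keep the base linear layer frozen and learn a low-rank residual on top of it. The sketch below is a generic LoRA layer in PyTorch, not the framework's actual implementation; the rank and alpha values are illustrative defaults:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (LoRA)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # base stays frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path + scaled low-rank path; only A and B receive gradients.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
print(layer(torch.randn(2, 512)).shape)  # torch.Size([2, 512])
```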

Scenario-Game ADMM: A Parallelized Scenario-Based Solver for Stochastic Noncooperative Games

  • Authors: Jingqi Li, Chih-Yuan Chiu, Lasse Peters, Fernando Palafox, Mustafa Karabag, Javier Alonso-Mora, Somayeh Sojoudi, Claire Tomlin, David Fridovich-Keil
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.01945
  • Pdf link: https://arxiv.org/pdf/2304.01945
  • Abstract
    Decision making in multi-agent games can be extremely challenging, particularly under uncertainty. In this work, we propose a new sample-based approximation to a class of stochastic, general-sum, pure Nash games, where each player has an expected-value objective and a set of chance constraints. This new approximation scheme inherits the accuracy of objective approximation from the established sample average approximation (SAA) method and enjoys a feasibility guarantee derived from the scenario optimization literature. We characterize the sample complexity of this new game-theoretic approximation scheme, and observe that high accuracy usually requires a large number of samples, which results in a large number of sampled constraints. To accommodate this, we decompose the approximated game into a set of smaller games with few constraints for each sampled scenario, and propose a decentralized, consensus ADMM algorithm to efficiently compute a generalized Nash equilibrium of the approximated game. We prove the convergence of our algorithm and empirically demonstrate superior performance relative to a recent baseline.
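
The decomposition idea, solving a small problem per sampled scenario and coupling them through a consensus variable, can be shown on a plain least-squares analogue. The toy consensus ADMM below is generic; the paper's subproblems are game-theoretic (generalized Nash) rather than least squares:

```python
import numpy as np

def consensus_admm(As, bs, rho=1.0, iters=100):
    """min_x sum_i ||A_i x - b_i||^2 via consensus ADMM over scenarios i."""
    n = As[0].shape[1]
    xs = [np.zeros(n) for _ in As]       # local copies, one per scenario
    us = [np.zeros(n) for _ in As]       # scaled dual variables
    z = np.zeros(n)                      # consensus variable
    for _ in range(iters):
        for i, (A, b) in enumerate(zip(As, bs)):
            xs[i] = np.linalg.solve(A.T @ A + rho * np.eye(n),
                                    A.T @ b + rho * (z - us[i]))
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)
        us = [u + x - z for x, u in zip(xs, us)]
    return z

rng = np.random.default_rng(0)
x_true = rng.normal(size=4)
As = [rng.normal(size=(10, 4)) for _ in range(5)]   # 5 sampled scenarios
bs = [A @ x_true + 0.01 * rng.normal(size=10) for A in As]
print(consensus_admm(As, bs))  # close to x_true
```

The per-scenario updates are independent, which is what makes the scheme parallelizable across the sampled constraints.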

Strong spatial mixing for colorings on trees and its algorithmic applications

  • Authors: Zongchen Chen, Kuikui Liu, Nitya Mani, Ankur Moitra
  • Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Combinatorics (math.CO); Probability (math.PR)
  • Arxiv link: https://arxiv.org/abs/2304.01954
  • Pdf link: https://arxiv.org/pdf/2304.01954
  • Abstract
    Strong spatial mixing (SSM) is an important quantitative notion of correlation decay for Gibbs distributions arising in statistical physics, probability theory, and theoretical computer science. A longstanding conjecture is that the uniform distribution on proper $q$-colorings on a $\Delta$-regular tree exhibits SSM whenever $q \ge \Delta+1$. Moreover, it is widely believed that as long as SSM holds on bounded-degree trees with $q$ colors, one would obtain an efficient sampler for $q$-colorings on all bounded-degree graphs via simple Markov chain algorithms. It is surprising that such a basic question is still open, even on trees, but then again it also highlights how much we still have to learn about random colorings. In this paper, we show the following: (1) For any $\Delta \ge 3$, SSM holds for random $q$-colorings on trees of maximum degree $\Delta$ whenever $q \ge \Delta + 3$. Thus we almost fully resolve the aforementioned conjecture. Our result substantially improves upon the previous best bound, which requires $q \ge 1.59\Delta+\gamma^*$ for an absolute constant $\gamma^* > 0$. (2) For any $\Delta\ge 3$ and girth $g = \Omega_\Delta(1)$, we establish optimal mixing of the Glauber dynamics for $q$-colorings on graphs of maximum degree $\Delta$ and girth $g$ whenever $q \ge \Delta+3$. Our approach is based on a new general reduction from spectral independence on large-girth graphs to SSM on trees that is of independent interest. Using the same techniques, we also prove near-optimal bounds on weak spatial mixing (WSM), a closely-related notion to SSM, for the antiferromagnetic Potts model on trees.
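
For readers less familiar with the chain in question, here is a minimal Glauber dynamics sampler for proper $q$-colorings; SSM-type results are exactly what underpin rapid-mixing guarantees for this kind of chain. The greedy initialization assumes $q \ge \Delta + 1$:

```python
import random

def glauber_colorings(adj, q, steps, seed=0):
    """Resample a random vertex's color uniformly from those unused by its
    neighbors; the stationary distribution is uniform over proper q-colorings."""
    rng = random.Random(seed)
    n = len(adj)
    col = [0] * n
    for v in range(n):  # greedy proper initial coloring (q >= max degree + 1)
        used = {col[u] for u in adj[v] if u < v}
        col[v] = next(c for c in range(q) if c not in used)
    for _ in range(steps):
        v = rng.randrange(n)
        forbidden = {col[u] for u in adj[v]}
        col[v] = rng.choice([c for c in range(q) if c not in forbidden])
    return col

cycle = [[(i - 1) % 6, (i + 1) % 6] for i in range(6)]  # 6-cycle, degree 2
print(glauber_colorings(cycle, q=5, steps=1000))
```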

DWA: Differential Wavelet Amplifier for Image Super-Resolution

  • Authors: Brian Moser, Stanislav Frolov, Federico Raue, Sebastian Palacio, Andreas Dengel
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.01994
  • Pdf link: https://arxiv.org/pdf/2304.01994
  • Abstract
    This work introduces the Differential Wavelet Amplifier (DWA), a drop-in module for wavelet-based image Super-Resolution (SR). DWA invigorates an approach that has recently received less attention, namely the Discrete Wavelet Transformation (DWT). DWT enables an efficient image representation for SR and reduces the spatial area of its input by a factor of 4, and with it the overall model size and computation cost, framing it as an attractive approach for sustainable ML. Our proposed DWA model improves wavelet-based SR models by leveraging the difference between two convolutional filters to refine relevant feature extraction in the wavelet domain, emphasizing local contrasts and suppressing common noise in the input signals. We show its effectiveness by integrating it into existing SR models, e.g., DWSR and MWCNN, and demonstrate a clear improvement in classical SR tasks. Moreover, DWA enables a direct application of DWSR and MWCNN to the input image space, reducing the DWT representation channel-wise since it omits the traditional DWT.
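
The core mechanism, taking the difference of two parallel convolutions so that noise common to both paths cancels while local contrasts are amplified, fits in a few lines of PyTorch. The channel counts and the bare two-conv composition below are assumptions for illustration, not the paper's exact block:

```python
import torch
import torch.nn as nn

class DifferentialConv(nn.Module):
    """Output the difference of two parallel 3x3 convolutions."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv_a = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_b = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        # Signal common to both learned paths cancels; differences are kept.
        return self.conv_a(x) - self.conv_b(x)

block = DifferentialConv(16)
print(block(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```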

Towards Optimal Human-Robot Interface Design Applied to Underwater Robotics Teleoperation

  • Authors: Paulo Padrao, Jose Fuentes, Tero Kaarlela, Alfredo Bayuelo, Leonardo Bobadilla
  • Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.02002
  • Pdf link: https://arxiv.org/pdf/2304.02002
  • Abstract
    Efficient and intuitive human-robot interfaces are crucial for expanding the user base of operators and enabling new applications in critical areas such as precision agriculture, automated construction, rehabilitation, and environmental monitoring. In this paper, we investigate the design of human-robot interfaces for the teleoperation of dynamical systems. The proposed framework seeks to find an optimal interface that complies with key concepts such as user comfort, efficiency, continuity, and consistency. As a proof of concept, we introduce an innovative approach to teleoperating underwater vehicles, translating human body movements into vehicle control commands. This method eliminates the need for divers to work in harsh underwater environments while taking into account comfort and communication constraints. We conducted a study with human subjects using a head-mounted display attached to a smartphone to control a simulated ROV. Numerical experiments have also demonstrated that the optimal translation is often the most intuitive and natural one, aligning with users' expectations.

Multi-Level Contrastive Learning for Dense Prediction Task

  • Authors: Qiushan Guo, Yizhou Yu, Yi Jiang, Jiannan Wu, Zehuan Yuan, Ping Luo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.02010
  • Pdf link: https://arxiv.org/pdf/2304.02010
  • Abstract
    In this work, we present Multi-Level Contrastive Learning for Dense Prediction Task (MCL), an efficient self-supervised method for learning region-level feature representation for dense prediction tasks. Our method is motivated by the three key factors in detection: localization, scale consistency and recognition. To explicitly encode absolute position and scale information, we propose a novel pretext task that assembles multi-scale images in a montage manner to mimic multi-object scenarios. Unlike the existing image-level self-supervised methods, our method constructs a multi-level contrastive loss that considers each sub-region of the montage image as a singleton. Our method enables the neural network to learn regional semantic representations for translation and scale consistency while reducing pre-training epochs to the same as supervised pre-training. Extensive experiments demonstrate that MCL consistently outperforms the recent state-of-the-art methods on various datasets with significant margins. In particular, MCL obtains 42.5 AP$^\mathrm{bb}$ and 38.3 AP$^\mathrm{mk}$ on COCO with the 1x schedule fine-tuning, when using Mask R-CNN with R50-FPN backbone pre-trained with 100 epochs. In comparison to MoCo, our method surpasses its performance by 4.0 AP$^\mathrm{bb}$ and 3.1 AP$^\mathrm{mk}$. Furthermore, we explore the alignment between pretext task and downstream tasks. We extend our pretext task to supervised pre-training, which achieves a similar performance to self-supervised learning. This result demonstrates the importance of the alignment between pretext task and downstream tasks, indicating the potential for wider applicability of our method beyond self-supervised settings.
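
The montage pretext input is straightforward to mock up: resize several images to a common tile size and lay them out in a grid so each sub-region carries its own instance. The 2x2 layout and sizes below are illustrative assumptions, not MCL's exact assembly:

```python
import torch
import torch.nn.functional as F

def montage_2x2(imgs, tile=112):
    """Tile four multi-scale images into one 2x2 montage image."""
    tiles = [F.interpolate(im.unsqueeze(0), size=(tile, tile),
                           mode="bilinear", align_corners=False)[0]
             for im in imgs]
    top = torch.cat(tiles[:2], dim=2)       # concatenate along width
    bottom = torch.cat(tiles[2:], dim=2)
    return torch.cat([top, bottom], dim=1)  # concatenate along height

imgs = [torch.randn(3, s, s) for s in (224, 160, 96, 128)]
print(montage_2x2(imgs).shape)  # torch.Size([3, 224, 224])
```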

NPC: Neural Point Characters from Video

  • Authors: Shih-Yang Su, Timur Bagautdinov, Helge Rhodin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.02013
  • Pdf link: https://arxiv.org/pdf/2304.02013
  • Abstract
    High-fidelity human 3D models can now be learned directly from videos, typically by combining a template-based surface model with neural representations. However, obtaining a template surface requires expensive multi-view capture systems, laser scans, or strictly controlled conditions. Previous methods avoid using a template but rely on a costly or ill-posed mapping from observation to canonical space. We propose a hybrid point-based representation for reconstructing animatable characters that does not require an explicit surface model, while being generalizable to novel poses. For a given video, our method automatically produces an explicit set of 3D points representing approximate canonical geometry, and learns an articulated deformation model that produces pose-dependent point transformations. The points serve both as a scaffold for high-frequency neural features and an anchor for efficiently mapping between observation and canonical space. We demonstrate on established benchmarks that our representation overcomes limitations of prior work operating in either canonical or in observation space. Moreover, our automatic point extraction approach enables learning models of human and animal characters alike, matching the performance of the methods using rigged surface templates despite being more general. Project website: https://lemonatsu.github.io/npc/

Keyword: faster

LTM: Scalable and Black-box Similarity-based Test Suite Minimization based on Language Models

  • Authors: Rongqi Pan, Taher A. Ghaleb, Lionel Briand
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.01397
  • Pdf link: https://arxiv.org/pdf/2304.01397
  • Abstract
    Test suite minimization (TSM) is typically used to improve the efficiency of software testing by removing redundant test cases, thus reducing testing time and resources, while maintaining the fault detection capability of the test suite. Though many TSM approaches exist, most of them rely on code coverage (white-box) or model-based features, which are not always available for test engineers. Recent TSM approaches that rely only on test code (black-box) have been proposed, such as ATM and FAST-R. Though ATM achieves a better trade-off between effectiveness and efficiency than FAST-R, it suffers from scalability issues for large software systems as its execution time increases rapidly with test suite size. To address scalability, we propose LTM, a scalable and black-box similarity-based TSM approach based on language models. To support similarity measurement, we investigated three different pre-trained language models: CodeBERT, GraphCodeBERT, and UniXcoder, to extract embeddings of test code (Java test methods), on which we computed two similarity measures: Cosine Similarity and Euclidean Distance. Our goal is to find similarity measures that are not only computationally more efficient but can also better guide a Genetic Algorithm (GA), which is used for minimizing test suites, thus reducing minimization time. Experimental results showed that the best configuration of LTM (using UniXcoder with Cosine similarity) outperformed the best two configurations of ATM by achieving significantly higher fault detection rates (0.84 versus 0.81, on average) and, more importantly, running much faster (26.73 minutes versus 72.75 minutes, on average) than ATM, in terms of both preparation time (up to two orders of magnitude faster) and minimization time (one order of magnitude faster).
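
The similarity computation at the heart of LTM reduces to pairwise cosine similarities over test-method embeddings. A NumPy sketch with placeholder embeddings follows (extracting them from UniXcoder is omitted here):

```python
import numpy as np

def cosine_sim_matrix(E):
    """Pairwise cosine similarities between embeddings, one row per test."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    return En @ En.T

E = np.random.default_rng(0).normal(size=(100, 768))  # stand-in embeddings
S = cosine_sim_matrix(E)
# A GA can then score a candidate subset, e.g., by the maximum pairwise
# similarity it retains (lower means a more diverse minimized suite).
print(S.shape, round(float(S[0, 0]), 2))  # (100, 100) 1.0
```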

TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings

  • Authors: Norman P. Jouppi, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng Nai, Nishant Patil, Suvinay Subramanian, Andy Swing, Brian Towles, Cliff Young, Xiang Zhou, Zongwei Zhou, David Patterson
  • Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.01433
  • Pdf link: https://arxiv.org/pdf/2304.01433
  • Abstract
    In response to innovations in machine learning (ML) models, production workloads changed radically and rapidly. TPU v4 is the fifth Google domain-specific architecture (DSA) and its third supercomputer for such ML models. Optical circuit switches (OCSes) dynamically reconfigure its interconnect topology to improve scale, availability, utilization, modularity, deployment, security, power, and performance; users can pick a twisted 3D torus topology if desired. Much cheaper, lower power, and faster than Infiniband, OCSes and underlying optical components are <5% of system cost and <3% of system power. Each TPU v4 includes SparseCores, dataflow processors that accelerate models that rely on embeddings by 5x-7x yet use only 5% of die area and power. Deployed since 2020, TPU v4 outperforms TPU v3 by 2.1x and improves performance/Watt by 2.7x. The TPU v4 supercomputer is 4x larger at 4096 chips and thus ~10x faster overall, which along with OCS flexibility helps large language models. For similar-sized systems, it is ~4.3x-4.5x faster than the Graphcore IPU Bow and is 1.2x-1.7x faster and uses 1.3x-1.9x less power than the Nvidia A100. TPU v4s inside the energy-optimized warehouse-scale computers of Google Cloud use ~3x less energy and produce ~20x less CO2e than contemporary DSAs in a typical on-premise data center.

OneShotSTL: One-Shot Seasonal-Trend Decomposition For Online Time Series Anomaly Detection And Forecasting

  • Authors: Xiao He, Ye Li, Jian Tan, Bin Wu, Feifei Li
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.01506
  • Pdf link: https://arxiv.org/pdf/2304.01506
  • Abstract
    Seasonal-trend decomposition is one of the most fundamental concepts in time series analysis that supports various downstream tasks, including time series anomaly detection and forecasting. However, existing decomposition methods rely on batch processing with a time complexity of O(W), where W is the number of data points within a time window. Therefore, they cannot always efficiently support real-time analysis that demands low processing delay. To address this challenge, we propose OneShotSTL, an efficient and accurate algorithm that can decompose time series online with an update time complexity of O(1). OneShotSTL is more than $1,000$ times faster than the batch methods, with accuracy comparable to the best counterparts. Extensive experiments on real-world benchmark datasets for downstream time series anomaly detection and forecasting tasks demonstrate that OneShotSTL is from 10 to over 1,000 times faster than the state-of-the-art methods, while still providing comparable or even better accuracy.
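
To give a feel for what an O(1)-per-sample update looks like, here is a deliberately simplified online seasonal-trend tracker using exponential smoothing with one seasonal slot per phase. It shares only the constant-time streaming property with OneShotSTL, not its algorithm:

```python
import math

def online_decompose(x_t, state, alpha=0.05, period=24):
    """Constant-time update of a running trend and per-phase seasonal terms."""
    trend, seasonal, t = state
    phase = t % period
    seasonal[phase] += alpha * (x_t - trend - seasonal[phase])
    trend += alpha * (x_t - seasonal[phase] - trend)
    residual = x_t - trend - seasonal[phase]
    return (trend, seasonal, t + 1), (trend, seasonal[phase], residual)

state = (0.0, [0.0] * 24, 0)
for t in range(500):
    x = 0.01 * t + math.sin(2 * math.pi * t / 24)  # trend + daily season
    state, (trend, season, resid) = online_decompose(x, state)
print(round(trend, 2), round(season, 2), round(resid, 2))
```

Anomaly detection on the stream then amounts to thresholding the residual, and forecasting to extrapolating the trend plus the seasonal slot of the target phase.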

IterativePFN: True Iterative Point Cloud Filtering

  • Authors: Dasith de Silva Edirimuni, Xuequan Lu, Zhiwen Shao, Gang Li, Antonio Robles-Kelly, Ying He
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01529
  • Pdf link: https://arxiv.org/pdf/2304.01529
  • Abstract
    The quality of point clouds is often limited by noise introduced during their capture process. Consequently, a fundamental 3D vision task is the removal of noise, known as point cloud filtering or denoising. State-of-the-art learning based methods focus on training neural networks to infer filtered displacements and directly shift noisy points onto the underlying clean surfaces. In high noise conditions, they iterate the filtering process. However, this iterative filtering is only done at test time and is less effective at ensuring points converge quickly onto the clean surfaces. We propose IterativePFN (iterative point cloud filtering network), which consists of multiple IterationModules that model the true iterative filtering process internally, within a single network. We train our IterativePFN network using a novel loss function that utilizes an adaptive ground truth target at each iteration to capture the relationship between intermediate filtering results during training. This ensures that the filtered results converge faster to the clean surfaces. Our method is able to obtain better performance compared to state-of-the-art methods. The source code can be found at: https://github.com/ddsediri/IterativePFN.

Black Box Few-Shot Adaptation for Vision-Language models

  • Authors: Yassine Ouali, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.01752
  • Pdf link: https://arxiv.org/pdf/2304.01752
  • Abstract
    Vision-Language (V-L) models trained with contrastive learning to align the visual and language modalities have been shown to be strong few-shot learners. Soft prompt learning is the method of choice for few-shot downstream adaptation, aiming to bridge the modality gap caused by the distribution shift induced by the new domain. While parameter-efficient, prompt learning still requires access to the model weights and can be computationally infeasible for large models with billions of parameters. To address these shortcomings, in this work, we describe a black-box method for V-L few-shot adaptation that (a) operates on pre-computed image and text features and hence works without access to the model's weights, (b) is orders of magnitude faster at training time, (c) is amenable to both supervised and unsupervised training, and (d) can even be used to align image and text features computed from uni-modal models. To achieve this, we propose Linear Feature Alignment (LFA), a simple linear approach for V-L re-alignment in the target domain. LFA is initialized from a closed-form solution to a least-squares problem and is then iteratively updated by minimizing a re-ranking loss. Despite its simplicity, our approach can even surpass soft-prompt learning methods as shown by extensive experiments on 11 image and 2 video datasets.
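
The closed-form initialization of LFA is just a least-squares problem over pre-computed features. The NumPy sketch below shows that step with synthetic paired features (the iterative re-ranking refinement is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 512))                       # frozen image features
W_true = rng.normal(size=(512, 512)) / 512 ** 0.5
Y = X @ W_true + 0.01 * rng.normal(size=(200, 512))   # paired text features

# Closed-form least-squares alignment: W = argmin ||X W - Y||_F.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.linalg.norm(X @ W - Y) / np.linalg.norm(Y))  # small relative residual
```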

Imitation Learning from Nonlinear MPC via the Exact Q-Loss and its Gauss-Newton Approximation

  • Authors: Andrea Ghezzi, Jasper Hoffman, Jonathan Frey, Joschka Boedecker, Moritz Diehl
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.01782
  • Pdf link: https://arxiv.org/pdf/2304.01782
  • Abstract
    This work presents a novel loss function for learning nonlinear Model Predictive Control policies via Imitation Learning. Standard approaches to Imitation Learning neglect information about the expert and generally adopt a loss function based on the distance between expert and learned controls. In this work, we present a loss based on the Q-function directly embedding the performance objectives and constraint satisfaction of the associated Optimal Control Problem (OCP). However, training a Neural Network with the Q-loss requires solving the associated OCP for each new sample. To alleviate the computational burden, we derive a second Q-loss based on the Gauss-Newton approximation of the OCP resulting in a faster training time. We validate our losses against Behavioral Cloning, the standard approach to Imitation Learning, on the control of a nonlinear system with constraints. The final results show that the Q-function-based losses significantly reduce the amount of constraint violations while achieving comparable or better closed-loop costs.

Keyword: mobile

Learned Tree Search for Long-Horizon Social Robot Navigation in Shared Airspace

  • Authors: Ingrid Navarro, Jay Patrikar, Joao P. A. Dantas, Rohan Baijal, Ian Higgins, Sebastian Scherer, Jean Oh
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2304.01428
  • Pdf link: https://arxiv.org/pdf/2304.01428
  • Abstract
    The fast-growing demand for fully autonomous aerial operations in shared spaces necessitates developing trustworthy agents that can safely and seamlessly navigate in crowded, dynamic spaces. In this work, we propose Social Robot Tree Search (SoRTS), an algorithm for the safe navigation of mobile robots in social domains. SoRTS aims to augment existing socially-aware trajectory prediction policies with a Monte Carlo Tree Search planner for improved downstream navigation of mobile robots. To evaluate the performance of our method, we choose the use case of social navigation for general aviation. To aid this evaluation, within this work, we also introduce X-PlaneROS, a high-fidelity aerial simulator, to enable more research in full-scale aerial autonomy. By conducting a user study based on the assessments of 26 FAA certified pilots, we show that SoRTS performs comparably to a competent human pilot, significantly outperforming our baseline algorithm. We further complement these results with self-play experiments in scenarios with increasing complexity.

End-to-End Latency Optimization of Multi-view 3D Reconstruction for Disaster Response

  • Authors: Xiaojie Zhang, Mingjun Li, Andrew Hilton, Amitangshu Pal, Soumyabrata Dey, Saptarshi Debroy
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01488
  • Pdf link: https://arxiv.org/pdf/2304.01488
  • Abstract
    In order to plan rapid response during disasters, first responder agencies often adopt a 'bring your own device' (BYOD) model with inexpensive mobile edge devices (e.g., drones, robots, tablets) for complex video analytics applications, e.g., 3D reconstruction of a disaster scene. Unlike simpler video applications, widely used Multi-view Stereo (MVS) based 3D reconstruction applications (e.g., openMVG/openMVS) are exceedingly time-consuming, especially when run on such computationally constrained mobile edge devices. Additionally, reducing the reconstruction latency of such inherently sequential algorithms is challenging as unintelligent, application-agnostic strategies can drastically degrade the reconstruction (i.e., application outcome) quality, making them useless. In this paper, we aim to design a latency-optimized MVS algorithm pipeline, with the objective to best balance the end-to-end latency and reconstruction quality by running the pipeline on a collaborative mobile edge environment. The overall optimization approach is two-pronged: (a) application optimizations introduce data-level parallelism by splitting the pipeline into high frequency and low frequency reconstruction components, and (b) system optimizations incorporate task-level parallelism to the pipelines by running them opportunistically on available resources with online quality control in order to balance both latency and quality. Our evaluation on a hardware testbed using publicly available datasets shows up to ~54% reduction in latency with negligible loss (~4-7%) in reconstruction quality.

FisHook -- An Optimized Approach to Marine Specie Classification using MobileNetV2

  • Authors: Kohav Dey, Krishna Bajaj, K S Ramalakshmi, Samuel Thomas, Sriram Radhakrishna
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01524
  • Pdf link: https://arxiv.org/pdf/2304.01524
  • Abstract
    Marine ecosystems are vital for the planet's health, but human activities such as climate change, pollution, and overfishing pose a constant threat to marine species. Accurate classification and monitoring of these species can aid in understanding their distribution, population dynamics, and the impact of human activities on them. However, classifying marine species can be challenging due to their vast diversity and the complex underwater environment. With advancements in computer performance and GPU-based computing, deep-learning algorithms can now efficiently classify marine species, making it easier to monitor and manage marine ecosystems. In this paper, we propose an optimization to the MobileNetV2 model to achieve a 99.83% average validation accuracy by highlighting specific guidelines for creating a dataset and augmenting marine species images. This transfer learning algorithm can be deployed successfully on a mobile application for on-site classification at fisheries.
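
The transfer-learning recipe the abstract describes amounts to freezing a pre-trained MobileNetV2 backbone and training a fresh classification head. A minimal torchvision sketch, with the number of marine-species classes as a placeholder:

```python
import torch.nn as nn
import torchvision

num_classes = 20  # placeholder for the number of marine species
model = torchvision.models.mobilenet_v2(weights="IMAGENET1K_V2")
for p in model.features.parameters():
    p.requires_grad = False                       # freeze the feature extractor
model.classifier[1] = nn.Linear(model.last_channel, num_classes)  # new head
print(model.classifier)  # only this head (plus dropout) is trained
```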

Energy-Saving Strategies for Mobile Web Apps and their Measurement: Results from a Decade of Research (Preprint)

  • Authors: Benedikt Dornauer, Michael Felderer
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.01646
  • Pdf link: https://arxiv.org/pdf/2304.01646
  • Abstract
    In 2022, over half of the web traffic was accessed through mobile devices. By reducing the energy consumption of mobile web apps, we can not only extend the battery life of our devices, but also make a significant contribution to energy conservation efforts. For example, if we could save only 5% of the energy used by web apps, we estimate that it would be enough to shut down one of the nuclear reactors in Fukushima. This paper presents a comprehensive overview of energy-saving experiments and related approaches for mobile web apps, relevant for researchers and practitioners. To achieve this objective, we conducted a systematic literature review and identified 44 primary studies for inclusion. Through the mapping and analysis of scientific papers, this work contributes: (1) an overview of the energy-draining aspects of mobile web apps, (2) a comprehensive description of the methodology used for the energy-saving experiments, and (3) a categorization and synthesis of various energy-saving approaches.

Model Predictive Control for Multi-Agent Systems under Limited Communication and Time-Varying Network Topology

  • Authors: Danilo Saccani, Lorenzo Fagiano, Melanie N. Zeilinger, Andrea Carron
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.01649
  • Pdf link: https://arxiv.org/pdf/2304.01649
  • Abstract
    In control system networks, reconfiguration of the controller when agents are leaving or joining the network is still an open challenge, in particular when operation constraints that depend on each agent's behavior must be met. Drawing our motivation from mobile robot swarms, in this paper, we address this problem by optimizing individual agent performance while guaranteeing persistent constraint satisfaction in presence of bounded communication range and time-varying network topology. The approach we propose is a model predictive control (MPC) formulation, building on multi-trajectory MPC (mt-MPC) concepts. To enable plug and play operations when the system is in closed-loop without the need of a request, the proposed MPC scheme predicts two different state trajectories in the same finite horizon optimal control problem. One trajectory drives the system to the desired target, assuming that the network topology will not change in the prediction horizon, while the second one ensures constraint satisfaction assuming a worst-case scenario in terms of new agents joining the network in the planning horizon. Recursive feasibility and stability of the closed-loop system during plug and play operations are shown. The approach effectiveness is illustrated with a numerical simulation.

Keyword: pruning

PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching

  • Authors: Pedro Castro, Tae-Kyun Kim
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01382
  • Pdf link: https://arxiv.org/pdf/2304.01382
  • Abstract
    Estimating the pose of an unseen object is the goal of the challenging one-shot pose estimation task. Previous methods have heavily relied on feature matching with great success. However, these methods are often inefficient and limited by their reliance on pre-trained models that have not been designed specifically for pose estimation. In this paper, we propose PoseMatcher, an accurate, model-free one-shot object pose estimator that overcomes these limitations. We create a new training pipeline for object-to-image matching based on a three-view system: a query with positive and negative templates. This simple yet effective approach emulates test-time scenarios by cheaply constructing an approximation of the full object point cloud during training. To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer, a new attention layer that efficiently accommodates self and cross attention between the inputs. Moreover, we propose a pruning strategy where we iteratively remove redundant regions of the target object to further reduce the complexity and noise of the network while maintaining accuracy. Finally, we redesign commonly used pose refinement strategies, zoom and 2D offset refinements, and adapt them to the one-shot paradigm. We outperform all prior one-shot pose estimation methods on the Linemod and YCB-V datasets and achieve results rivaling recent instance-level methods. The source code and models are available at https://github.com/PedroCastro/PoseMatcher.

Attention Map Guided Transformer Pruning for Edge Device

  • Authors: Junzhu Mao, Yazhou Yao, Zeren Sun, Xingguo Huang, Fumin Shen, Heng-Tao Shen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01452
  • Pdf link: https://arxiv.org/pdf/2304.01452
  • Abstract
    Due to its significant capability of modeling long-range dependencies, the vision transformer (ViT) has achieved promising success in both holistic and occluded person re-identification (Re-ID) tasks. However, the inherent problems of transformers, such as the huge computational cost and memory footprint, are still two unsolved issues that will block the deployment of ViT-based person Re-ID models on resource-limited edge devices. Our goal is to reduce both the inference complexity and the model size while maintaining comparable accuracy on person Re-ID, especially for tasks with occlusion. To this end, we propose a novel attention map guided (AMG) transformer pruning method, which removes both redundant tokens and heads with the guidance of the attention map in a hardware-friendly way. We first calculate the entropy in the key dimension and sum it up for the whole map, and the corresponding head parameters of maps with high entropy will be removed for model size reduction. Then we combine the similarity and first-order gradients of key tokens along the query dimension for token importance estimation and remove redundant key and value tokens to further reduce the inference complexity. Comprehensive experiments on Occluded DukeMTMC and Market-1501 demonstrate the effectiveness of our proposals. For example, our proposed pruning strategy on ViT-Base achieves 29.4% FLOPs savings with a 0.2% drop on Rank-1 and a 0.4% improvement on mAP.
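
The head-scoring step can be illustrated directly: compute the entropy of each head's attention rows and rank heads by it, treating high-entropy (diffuse) maps as removal candidates. Tensor shapes and the keep count below are illustrative:

```python
import torch

def head_entropy(attn):
    """Per-head score from softmax attention maps of shape
    (batch, heads, queries, keys): sum row entropies, average over batch."""
    ent = -(attn.clamp_min(1e-9).log() * attn).sum(-1)  # (batch, heads, queries)
    return ent.sum(-1).mean(0)                          # (heads,)

attn = torch.softmax(torch.randn(8, 12, 197, 197), dim=-1)
scores = head_entropy(attn)
keep = torch.argsort(scores)[:9]  # e.g., keep the 9 lowest-entropy heads
print(scores.shape, keep.tolist())
```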

Accelerating and Compressing Deep Neural Networks for Massive MIMO CSI Feedback

  • Authors: Omar Erak, Hatem Abou-Zeid
  • Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.01914
  • Pdf link: https://arxiv.org/pdf/2304.01914
  • Abstract
    The recent advances in machine learning and deep neural networks have made them attractive candidates for wireless communications functions such as channel estimation, decoding, and downlink channel state information (CSI) compression. However, most of these neural networks are large and inefficient, which is a barrier to deployment in practical wireless systems that require low latency and low memory footprints for individual network functions. To mitigate these limitations, we propose accelerated and compressed efficient neural networks for massive MIMO CSI feedback. Specifically, we have thoroughly investigated the adoption of network pruning, post-training dynamic range quantization, and weight clustering to optimize CSI feedback compression for massive MIMO systems. Furthermore, we have deployed the proposed model compression techniques on commodity hardware and demonstrated that in order to achieve inference gains, specialized libraries that accelerate computations for sparse neural networks are required. Our findings indicate that there is remarkable value in applying these model compression techniques, and the proposed joint pruning and quantization approach reduced model size by 86.5% and inference time by 76.2% with minimal impact on model accuracy. These compression methods are crucial to pave the way for practical adoption and deployment of deep learning-based techniques in commercial wireless systems.

Keyword: voxel

Unsupervised Brain Tumor Segmentation with Image-based Prompts

  • Authors: Xinru Zhang, Ni Ou, Chenghao Liu, Zhizheng Zhuo, Yaou Liu, Chuyang Ye
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01472
  • Pdf link: https://arxiv.org/pdf/2304.01472
  • Abstract
    Automated brain tumor segmentation based on deep learning (DL) has achieved promising performance. However, it generally relies on annotated images for model training, which is not always feasible in clinical settings. Therefore, the development of unsupervised DL-based brain tumor segmentation approaches without expert annotations is desired. Motivated by the success of prompt learning (PL) in natural language processing, we propose an approach to unsupervised brain tumor segmentation by designing image-based prompts that allow indication of brain tumors, and this approach is dubbed PL-based Brain Tumor Segmentation (PL-BTS). Specifically, instead of directly training a model for brain tumor segmentation with a large amount of annotated data, we seek to train a model that can answer the question: is a voxel in the input image associated with tumor-like hyper-/hypo-intensity? Such a model can be trained by artificially generating tumor-like hyper-/hypo-intensity on images without tumors with hand-crafted designs. Since the hand-crafted designs may be too simplistic to represent all kinds of real tumors, the trained model may overfit the simplistic hand-crafted task rather than actually answer the question of abnormality. To address this problem, we propose the use of a validation task, where we generate a different hand-crafted task to monitor overfitting. In addition, we propose PL-BTS+ that further improves PL-BTS by exploiting unannotated images with brain tumors. Compared with competing unsupervised methods, the proposed method has achieved marked improvements on both public and in-house datasets, and we have also demonstrated its possible extension to other brain lesion segmentation tasks.

FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction

  • Authors: Noah Stier, Anurag Ranjan, Alex Colburn, Yajie Yan, Liang Yang, Fangchang Ma, Baptiste Angles
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01480
  • Pdf link: https://arxiv.org/pdf/2304.01480
  • Abstract
    Recent works on 3D reconstruction from posed images have demonstrated that direct inference of scene-level 3D geometry without iterative optimization is feasible using a deep neural network, showing remarkable promise and high efficiency. However, the reconstructed geometries, typically represented as a 3D truncated signed distance function (TSDF), are often coarse without fine geometric details. To address this problem, we propose three effective solutions for improving the fidelity of inference-based 3D reconstructions. We first present a resolution-agnostic TSDF supervision strategy to provide the network with a more accurate learning signal during training, avoiding the pitfalls of TSDF interpolation seen in previous work. We then introduce a depth guidance strategy using multi-view depth estimates to enhance the scene representation and recover more accurate surfaces. Finally, we develop a novel architecture for the final layers of the network, conditioning the output TSDF prediction on high-resolution image features in addition to coarse voxel features, enabling sharper reconstruction of fine details. Our method produces smooth and highly accurate reconstructions, showing significant improvements across multiple depth and 3D reconstruction metrics.

Keyword: lidar

LiDAR-Based 3D Object Detection via Hybrid 2D Semantic Scene Generation

  • Authors: Haitao Yang, Zaiwei Zhang, Xiangru Huang, Min Bai, Chen Song, Bo Sun, Li Erran Li, Qixing Huang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01519
  • Pdf link: https://arxiv.org/pdf/2304.01519
  • Abstract
    Bird's-Eye View (BEV) features are popular intermediate scene representations shared by the 3D backbone and the detector head in LiDAR-based object detectors. However, little research has been done to investigate how to incorporate additional supervision on the BEV features to improve proposal generation in the detector head, while still balancing the number of powerful 3D layers and efficient 2D network operations. This paper proposes a novel scene representation that encodes both the semantics and geometry of the 3D environment in 2D, which serves as a dense supervision signal for better BEV feature learning. The key idea is to use auxiliary networks to predict a combination of explicit and implicit semantic probabilities by exploiting their complementary properties. Extensive experiments show that our simple yet effective design can be easily integrated into most state-of-the-art 3D object detectors and consistently improves upon baseline models.

USTC FLICAR: A Multisensor Fusion Dataset of LiDAR-Inertial-Camera for Heavy-duty Autonomous Aerial Work Robots

  • Authors: Ziming Wang, Yujiang Liu, Yifan Duan, Xingchen Li, Xinran Zhang, Jianmin Ji, Erbao Dong, Yanyong Zhang
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01986
  • Pdf link: https://arxiv.org/pdf/2304.01986
  • Abstract
    In this paper, we present the USTC FLICAR Dataset, which is dedicated to the development of simultaneous localization and mapping and precise 3D reconstruction of the workspace for heavy-duty autonomous aerial work robots. In recent years, numerous public datasets have played significant roles in the advancement of autonomous cars and unmanned aerial vehicles (UAVs). However, these two platforms differ from aerial work robots: UAVs are limited in their payload capacity, while cars are restricted to two-dimensional movements. To fill this gap, we create the Giraffe mapping robot based on a bucket truck, which is equipped with a variety of well-calibrated and synchronized sensors: four 3D LiDARs, two stereo cameras, two monocular cameras, Inertial Measurement Units (IMUs), and a GNSS/INS system. A laser tracker is used to record the millimeter-level ground truth positions. We also make its ground twin, the Okapi mapping robot, to gather data for comparison. The proposed dataset extends the typical autonomous driving sensing suite to aerial scenes. Therefore, the dataset is named FLICAR to denote flying cars. We believe this dataset can also represent the flying car scenarios, specifically the takeoff and landing of VTOL (Vertical Takeoff and Landing) flying cars. The dataset is available for download at: https://ustc-flicar.github.io.

Keyword: diffusion

NeuroDAVIS: A neural network model for data visualization

  • Authors: Chayan Maitra, Dibyendu B. Seal, Rajat K. De
  • Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.01222
  • Pdf link: https://arxiv.org/pdf/2304.01222
  • Abstract
    The task of dimensionality reduction and visualization of high-dimensional datasets has long remained a challenging problem. Modern high-throughput technologies produce newer high-dimensional datasets having multiple views with relatively new data types. Visualization of these datasets requires proper methodology that can uncover hidden patterns in the data without affecting the local and global structures within the data. However, very few methodologies exist that can accomplish this task. In this work, we have introduced a novel unsupervised deep neural network model, called NeuroDAVIS, for data visualization. NeuroDAVIS is capable of extracting important features from the data, without assuming any data distribution, and visualizing them effectively in lower dimensions. It has been shown theoretically that the neighbourhood relationships of the data in high dimensions remain preserved in lower dimensions. The performance of NeuroDAVIS has been evaluated on a wide variety of synthetic and real high-dimensional datasets including numeric, textual, image and biological data. NeuroDAVIS has been highly competitive against both t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) with respect to visualization quality, and preservation of data size, shape, and both local and global structure. It has outperformed Fast interpolation-based t-SNE (Fit-SNE), a variant of t-SNE, for most of the high-dimensional datasets as well. For the biological datasets, besides t-SNE, UMAP and Fit-SNE, NeuroDAVIS has also performed well compared to other state-of-the-art algorithms, like Potential of Heat-diffusion for Affinity-based Trajectory Embedding (PHATE) and the siamese neural network-based method, called IVIS. Downstream classification and clustering analyses have also revealed favourable results for NeuroDAVIS-generated embeddings.

Generative Diffusion Prior for Unified Image Restoration and Enhancement

  • Authors: Ben Fei, Zhaoyang Lyu, Liang Pan, Junzhe Zhang, Weidong Yang, Tianyue Luo, Bo Zhang, Bo Dai
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01247
  • Pdf link: https://arxiv.org/pdf/2304.01247
  • Abstract
    Existing image restoration methods mostly leverage the posterior distribution of natural images. However, they often assume known degradation and also require supervised training, which restricts their adaptation to complex real applications. In this work, we propose the Generative Diffusion Prior (GDP) to effectively model the posterior distributions in an unsupervised sampling manner. GDP utilizes a pre-trained denoising diffusion probabilistic model (DDPM) for solving linear inverse, non-linear, or blind problems. Specifically, GDP systematically explores a protocol of conditional guidance, which we verify to be more practical than commonly used guidance schemes. Furthermore, GDP is effective at optimizing the parameters of the degradation model during the denoising process, achieving blind image restoration. Besides, we devise hierarchical guidance and patch-based methods, enabling GDP to generate images of arbitrary resolutions. Experimentally, we demonstrate GDP's versatility on several image datasets for linear problems, such as super-resolution, deblurring, inpainting, and colorization, as well as non-linear and blind issues, such as low-light enhancement and HDR image recovery. GDP outperforms the current leading unsupervised methods on the diverse benchmarks in reconstruction quality and perceptual quality. Moreover, GDP also generalizes well for natural images or synthesized images with arbitrary sizes from various tasks out of the distribution of the ImageNet training set.
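
The guidance idea, nudging the sample toward consistency with the degraded observation between denoising steps, can be sketched as a single gradient step on a data-fidelity loss. The degradation (4x average pooling) and single-step form below are illustrative, not GDP's full conditional-guidance protocol:

```python
import torch
import torch.nn.functional as F

def guidance_update(x_t, y, degrade_fn, scale=1.0):
    """Move x_t toward consistency with observation y under degrade_fn."""
    x_t = x_t.detach().requires_grad_(True)
    loss = F.mse_loss(degrade_fn(x_t), y)        # data-fidelity term
    grad, = torch.autograd.grad(loss, x_t)
    return (x_t - scale * grad).detach()

x = torch.randn(1, 3, 64, 64)                    # current diffusion sample
y = F.avg_pool2d(torch.randn(1, 3, 64, 64), 4)   # degraded observation
x = guidance_update(x, y, lambda z: F.avg_pool2d(z, 4))
print(x.shape)  # torch.Size([1, 3, 64, 64])
```

In a full sampler this update would be interleaved with the pre-trained DDPM's denoising steps, so the diffusion prior keeps the sample on the natural-image manifold while the guidance pulls it toward the observation.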

The Interconnected Nature of Online Harm and Moderation: Investigating the Cross-Platform Spread of Harmful Content between YouTube and Twitter

  • Authors: Valerio La Gatta, Luca Luceri, Francesco Fabbri, Emilio Ferrara
  • Subjects: Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.01371
  • Pdf link: https://arxiv.org/pdf/2304.01371
  • Abstract
    The proliferation of harmful content shared online poses a threat to online information integrity and the integrity of discussion across platforms. Despite various moderation interventions adopted by social media platforms, researchers and policymakers are calling for holistic solutions. This study explores how a target platform could leverage content that has been deemed harmful on a source platform by investigating the behavior and characteristics of Twitter users responsible for sharing moderated YouTube videos. Using a large-scale dataset of 600M tweets related to the 2020 U.S. election, we find that moderated YouTube videos are extensively shared on Twitter and that users who share these videos also endorse extreme and conspiratorial ideologies. A fraction of these users are eventually suspended by Twitter, but they do not appear to be involved in state-backed information operations. The findings of this study highlight the complex and interconnected nature of harmful cross-platform information diffusion, raising the need for cross-platform moderation strategies.

Text-Conditioned Sampling Framework for Text-to-Image Generation with Masked Generative Models

  • Authors: Jaewoong Lee, Sangwon Jang, Jaehyeong Jo, Jaehong Yoon, Yunji Kim, Jin-Hwa Kim, Jung-Woo Ha, Sung Ju Hwang
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01515
  • Pdf link: https://arxiv.org/pdf/2304.01515
  • Abstract
    Token-based masked generative models are gaining popularity for their fast inference time with parallel decoding. While recent token-based approaches achieve competitive performance to diffusion-based models, their generation performance is still suboptimal as they sample multiple tokens simultaneously without considering the dependence among them. We empirically investigate this problem and propose a learnable sampling model, Text-Conditioned Token Selection (TCTS), to select optimal tokens via localized supervision with text information. TCTS improves not only the image quality but also the semantic alignment of the generated images with the given texts. To further improve the image quality, we introduce a cohesive sampling strategy, Frequency Adaptive Sampling (FAS), to each group of tokens divided according to the self-attention maps. We validate the efficacy of TCTS combined with FAS with various generative tasks, demonstrating that it significantly outperforms the baselines in image-text alignment and image quality. Our text-conditioned sampling framework further reduces the original inference time by more than 50% without modifying the original generative model.

A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material

  • Authors: Mengchun Zhang, Maryam Qamar, Taegoo Kang, Yuna Jung, Chenshuang Zhang, Sung-Ho Bae, Chaoning Zhang
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01565
  • Pdf link: https://arxiv.org/pdf/2304.01565
  • Abstract
    Diffusion models have become the new SOTA generative modeling method in various fields, and multiple survey works already provide an overall review of them. With the number of articles on diffusion models increasing exponentially in the past few years, there is an increasing need for surveys of diffusion models in specific fields. In this work, we conduct a survey on graph diffusion models. Even though our focus is to cover the progress of diffusion models in graphs, we first briefly summarize how other generative modeling methods are used for graphs. After that, we introduce the mechanism of diffusion models in various forms, which facilitates the discussion on graph diffusion models. The applications of graph diffusion models mainly fall into the category of AI-generated content (AIGC) in science; we mainly focus on how graph diffusion models are utilized for generating molecules and proteins, but also cover other cases, including materials design. Moreover, we discuss the issue of evaluating diffusion models in the graph domain and the existing challenges.

Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion

  • Authors: Davis Rempe, Zhengyi Luo, Xue Bin Peng, Ye Yuan, Kris Kitani, Karsten Kreis, Sanja Fidler, Or Litany
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.01893
  • Pdf link: https://arxiv.org/pdf/2304.01893
  • Abstract
    We introduce a method for generating realistic pedestrian trajectories and full-body animations that can be controlled to meet user-defined goals. We draw on recent advances in guided diffusion modeling to achieve test-time controllability of trajectories, which is normally only associated with rule-based systems. Our guided diffusion model allows users to constrain trajectories through target waypoints, speed, and specified social groups while accounting for the surrounding environment context. This trajectory diffusion model is integrated with a novel physics-based humanoid controller to form a closed-loop, full-body pedestrian animation system capable of placing large crowds in a simulated environment with varying terrains. We further propose utilizing the value function learned during RL training of the animation controller to guide diffusion to produce trajectories better suited for particular scenarios such as collision avoidance and traversing uneven terrain. Video results are available on the project page at https://nv-tlabs.github.io/trace-pace .

PODIA-3D: Domain Adaptation of 3D Generative Model Across Large Domain Gap Using Pose-Preserved Text-to-Image Diffusion

  • Authors: Gwanghyun Kim, Ji Ha Jang, Se Young Chun
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.01900
  • Pdf link: https://arxiv.org/pdf/2304.01900
  • Abstract
    Recently, significant advancements have been made in 3D generative models; however, training these models across diverse domains is challenging and requires a huge amount of training data and knowledge of pose distributions. Text-guided domain adaptation methods have allowed the generator to be adapted to the target domains using text prompts, thereby obviating the need for assembling numerous data. Recently, DATID-3D has presented impressive sample quality in text-guided domains, preserving text-driven diversity by leveraging text-to-image diffusion. However, adapting 3D generators to domains with significant gaps from the source domain still remains challenging due to the following issues in current text-to-image diffusion models: 1) the shape-pose trade-off in diffusion-based translation, 2) pose bias, and 3) instance bias in the target domain, resulting in inferior 3D shapes, low text-image correspondence, and low intra-domain diversity in the generated samples. To address these issues, we propose a novel pipeline called PODIA-3D, which uses pose-preserved text-to-image diffusion-based domain adaptation for 3D generative models. We construct a pose-preserved text-to-image diffusion model that allows the use of extremely high-level noise for significant domain changes. We also propose specialized-to-general sampling strategies to improve the details of the generated samples. Moreover, to overcome instance bias, we introduce a text-guided debiasing method that improves intra-domain diversity. Consequently, our method successfully adapts 3D generators across significant domain gaps. Our qualitative results and user study demonstrate that our approach outperforms existing 3D text-guided domain adaptation methods in terms of text-image correspondence, realism, diversity of rendered images, and sense of depth of the 3D shapes in the generated samples.

Keyword: dynamic

SEENN: Towards Temporal Spiking Early-Exit Neural Networks

  • Authors: Yuhang Li, Tamar Geller, Youngeun Kim, Priyadarshini Panda
  • Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.01230
  • Pdf link: https://arxiv.org/pdf/2304.01230
  • Abstract
    Spiking Neural Networks (SNNs) have recently become more popular as a biologically plausible substitute for traditional Artificial Neural Networks (ANNs). SNNs are cost-efficient and deployment-friendly because they process input in both spatial and temporal manners using binary spikes. However, we observe that the information capacity in SNNs is affected by the number of timesteps, leading to an accuracy-efficiency tradeoff. In this work, we study a fine-grained adjustment of the number of timesteps in SNNs. Specifically, we treat the number of timesteps as a variable conditioned on different input samples to reduce redundant timesteps for certain data. We call our method Spiking Early-Exit Neural Networks (SEENNs). To determine the appropriate number of timesteps, we propose SEENN-I which uses a confidence score thresholding to filter out the uncertain predictions, and SEENN-II which determines the number of timesteps by reinforcement learning. Moreover, we demonstrate that SEENN is compatible with both the directly trained SNN and the ANN-SNN conversion. By dynamically adjusting the number of timesteps, our SEENN achieves a remarkable reduction in the average number of timesteps during inference. For example, our SEENN-II ResNet-19 can achieve 96.1% accuracy with an average of 1.08 timesteps on the CIFAR-10 test dataset.
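
A minimal sketch of the confidence-thresholding variant (SEENN-I), assuming a hypothetical `snn_step` function that returns logits for one timestep and a batch of one sample; the reinforcement-learning variant (SEENN-II) is not shown.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def early_exit_predict(snn_step, x, max_timesteps=6, threshold=0.9):
    """Accumulate logits over timesteps and stop once the prediction is confident."""
    logits_sum = torch.zeros(1)
    for t in range(1, max_timesteps + 1):
        logits_sum = logits_sum + snn_step(x)          # one more timestep of spikes
        probs = F.softmax(logits_sum / t, dim=-1)      # time-averaged prediction
        conf, pred = probs.max(dim=-1)
        if conf.item() > threshold:                    # confident enough -> exit early
            return pred.item(), t
    return pred.item(), max_timesteps
```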

Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation

  • Authors: Yan Jin, Mengke Li, Yang Lu, Yiu-ming Cheung, Hanzi Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01279
  • Pdf link: https://arxiv.org/pdf/2304.01279
  • Abstract
    Deep neural networks have made huge progress in the last few decades. However, as real-world data often exhibits a long-tailed distribution, vanilla deep models tend to be heavily biased toward the majority classes. To address this problem, state-of-the-art methods usually adopt a mixture of experts (MoE) to focus on different parts of the long-tailed distribution. Experts in these methods all share the same model depth, which neglects the fact that different classes may prefer to be fit by models of different depths. To this end, we propose a novel MoE-based method called Self-Heterogeneous Integration with Knowledge Excavation (SHIKE). We first propose Depth-wise Knowledge Fusion (DKF) to fuse features between different shallow parts and the deep part in one network for each expert, which makes experts more diverse in terms of representation. Based on DKF, we further propose Dynamic Knowledge Transfer (DKT) to reduce the influence of the hardest negative class, which has a non-negligible impact on the tail classes in our MoE framework. As a result, the classification accuracy of long-tailed data can be significantly improved, especially for the tail classes. SHIKE achieves state-of-the-art performance of 56.3%, 60.3%, 75.4%, and 41.9% on CIFAR100-LT (IF100), ImageNet-LT, iNaturalist 2018, and Places-LT, respectively.

Non-Generative Energy Based Models

  • Authors: Jacob Piland, Christopher Sweet, Priscila Saboia, Charles Vardeman II, Adam Czajka
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01297
  • Pdf link: https://arxiv.org/pdf/2304.01297
  • Abstract
    Energy-based models (EBMs) have become increasingly popular within computer vision. EBMs bring a probabilistic approach to training deep neural networks (DNNs) and have been shown to enhance performance in areas such as calibration, out-of-distribution detection, and adversarial resistance. However, these advantages come at the cost of estimating input data probabilities, usually using a Langevin-based method such as Stochastic Gradient Langevin Dynamics (SGLD), which brings additional computational costs, requires parameterization and caching methods for efficiency, and can run into stability and scaling issues. EBMs use dynamical methods to draw samples from the probability density function (PDF) defined by the current state of the network and compare them to the training data using a maximum log-likelihood approach to learn the correct PDF. We propose a non-generative training approach, Non-Generative EBM (NG-EBM), that utilizes the *Approximate Mass*, identified by Grathwohl et al., as a loss term to direct the training. We show that our NG-EBM training strategy retains many of the benefits of EBMs in calibration, out-of-distribution detection, and adversarial resistance, but without the computational complexity and overhead of the traditional approaches. In particular, the NG-EBM approach improves the Expected Calibration Error by a factor of 2.5 for CIFAR10 and 7.5 for CIFAR100, compared to traditionally trained models.

Lilac: a Modal Separation Logic for Conditional Probability

  • Authors: John M. Li, Amal Ahmed, Steven Holtzen
  • Subjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2304.01339
  • Pdf link: https://arxiv.org/pdf/2304.01339
  • Abstract
    We present Lilac, a separation logic for reasoning about probabilistic programs where separating conjunction captures probabilistic independence. Inspired by an analogy with mutable state where sampling corresponds to dynamic allocation, we show how probability spaces over a fixed, ambient sample space appear to be the natural analogue of heap fragments, and present a new combining operation on them such that probability spaces behave like heaps and measurability of random variables behaves like ownership. This combining operation forms the basis for our model of separation, and produces a logic with many pleasant properties. In particular, Lilac has a frame rule identical to the ordinary one, and naturally accommodates advanced features like continuous random variables and reasoning about quantitative properties of programs. Then we propose a new modality based on disintegration theory for reasoning about conditional probability. We show how the resulting modal logic validates examples from prior work, and give a formal verification of an intricate weighted sampling algorithm whose correctness depends crucially on conditional independence structure.

Accelerated parallel MRI using memory efficient and robust monotone operator learning (MOL)

  • Authors: Aniket Pramanik, Mathews Jacob
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.01351
  • Pdf link: https://arxiv.org/pdf/2304.01351
  • Abstract
    Model-based deep learning methods that combine imaging physics with learned regularization priors have been emerging as powerful tools for parallel MRI acceleration. The main focus of this paper is to determine the utility of the monotone operator learning (MOL) framework in the parallel MRI setting. The MOL algorithm alternates between a gradient descent step using a monotone convolutional neural network (CNN) and a conjugate gradient algorithm to encourage data consistency. The benefits of this approach include similar guarantees as compressive sensing algorithms including uniqueness, convergence, and stability, while being significantly more memory efficient than unrolled methods. We validate the proposed scheme by comparing it with different unrolled algorithms in the context of accelerated parallel MRI for static and dynamic settings.

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

  • Authors: Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.01373
  • Pdf link: https://arxiv.org/pdf/2304.01373
  • Abstract
    How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce *Pythia*, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend *Pythia* to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at https://github.com/EleutherAI/pythia.
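
The checkpoints are hosted on the Hugging Face Hub, and intermediate training steps are exposed as git revisions; loading one with `transformers` looks roughly like this (model name and step are examples from the suite's documentation).

```python
from transformers import AutoTokenizer, GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m", revision="step3000")       # an intermediate checkpoint
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0]))
```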

Lidar based 3D Tracking and State Estimation of Dynamic Objects

  • Authors: Patil Shubham Suresh, Gautham Narayan Narasimhan
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01396
  • Pdf link: https://arxiv.org/pdf/2304.01396
  • Abstract
    State estimation of oncoming vehicles: earlier research has focused on determining the states of the ego-vehicle, such as position, velocity, orientation, and angular velocity. Our approach focuses on estimating the states of non-ego vehicles, which is crucial for motion planning and decision-making. Dynamic scene based localization: our project works on dynamic scenes with both moving ego (self) and non-ego vehicles, whereas previous methods focused on static environments.

Learned Tree Search for Long-Horizon Social Robot Navigation in Shared Airspace

  • Authors: Ingrid Navarro, Jay Patrikar, Joao P. A. Dantas, Rohan Baijal, Ian Higgins, Sebastian Scherer, Jean Oh
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2304.01428
  • Pdf link: https://arxiv.org/pdf/2304.01428
  • Abstract
    The fast-growing demand for fully autonomous aerial operations in shared spaces necessitates developing trustworthy agents that can safely and seamlessly navigate in crowded, dynamic spaces. In this work, we propose Social Robot Tree Search (SoRTS), an algorithm for the safe navigation of mobile robots in social domains. SoRTS aims to augment existing socially-aware trajectory prediction policies with a Monte Carlo Tree Search planner for improved downstream navigation of mobile robots. To evaluate the performance of our method, we choose the use case of social navigation for general aviation. To aid this evaluation, within this work, we also introduce X-PlaneROS, a high-fidelity aerial simulator, to enable more research in full-scale aerial autonomy. By conducting a user study based on the assessments of 26 FAA certified pilots, we show that SoRTS performs comparably to a competent human pilot, significantly outperforming our baseline algorithm. We further complement these results with self-play experiments in scenarios with increasing complexity.

TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings

  • Authors: Norman P. Jouppi, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng Nai, Nishant Patil, Suvinay Subramanian, Andy Swing, Brian Towles, Cliff Young, Xiang Zhou, Zongwei Zhou, David Patterson
  • Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2304.01433
  • Pdf link: https://arxiv.org/pdf/2304.01433
  • Abstract
    In response to innovations in machine learning (ML) models, production workloads changed radically and rapidly. TPU v4 is the fifth Google domain specific architecture (DSA) and its third supercomputer for such ML models. Optical circuit switches (OCSes) dynamically reconfigure its interconnect topology to improve scale, availability, utilization, modularity, deployment, security, power, and performance; users can pick a twisted 3D torus topology if desired. Much cheaper, lower power, and faster than Infiniband, OCSes and underlying optical components are <5% of system cost and <3% of system power. Each TPU v4 includes SparseCores, dataflow processors that accelerate models that rely on embeddings by 5x-7x yet use only 5% of die area and power. Deployed since 2020, TPU v4 outperforms TPU v3 by 2.1x and improves performance/Watt by 2.7x. The TPU v4 supercomputer is 4x larger at 4096 chips and thus ~10x faster overall, which along with OCS flexibility helps large language models. For similar sized systems, it is ~4.3x-4.5x faster than the Graphcore IPU Bow and is 1.2x-1.7x faster and uses 1.3x-1.9x less power than the Nvidia A100. TPU v4s inside the energy-optimized warehouse scale computers of Google Cloud use ~3x less energy and produce ~20x less CO2e than contemporary DSAs in a typical on-premise data center.

Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos

  • Authors: Ziqian Bai, Feitong Tan, Zeng Huang, Kripasindhu Sarkar, Danhang Tang, Di Qiu, Abhimitra Meka, Ruofei Du, Mingsong Dou, Sergio Orts-Escolano, Rohit Pandey, Ping Tan, Thabo Beeler, Sean Fanello, Yinda Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.01436
  • Pdf link: https://arxiv.org/pdf/2304.01436
  • Abstract
    We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild. The learnt avatar is driven by a parametric face model to achieve user-controlled facial expressions and head poses. Our hybrid pipeline combines the geometry prior and dynamic tracking of a 3DMM with a neural radiance field to achieve fine-grained control and photorealism. To reduce over-smoothing and improve out-of-model expressions synthesis, we propose to predict local features anchored on the 3DMM geometry. These learnt features are driven by 3DMM deformation and interpolated in 3D space to yield the volumetric radiance at a designated query point. We further show that using a Convolutional Neural Network in the UV space is critical in incorporating spatial context and producing representative local features. Extensive experiments show that we are able to reconstruct high-quality avatars, with more accurate expression-dependent details, good generalization to out-of-training expressions, and quantitatively superior renderings compared to other state-of-the-art approaches.

Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection

  • Authors: Chuandong Liu, Chenqiang Gao, Fangcen Liu, Pengcheng Li, Deyu Meng, Xinbo Gao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01464
  • Pdf link: https://arxiv.org/pdf/2304.01464
  • Abstract
    State-of-the-art 3D object detectors are usually trained on large-scale datasets with high-quality 3D annotations. However, such 3D annotations are often expensive and time-consuming, which may not be practical for real applications. A natural remedy is to adopt semi-supervised learning (SSL) by leveraging a limited amount of labeled samples and abundant unlabeled samples. Current pseudo-labeling-based SSL object detection methods mainly adopt a teacher-student framework with a single fixed-threshold strategy to generate supervision signals, which inevitably brings confusing supervision when guiding the training of the student network. Besides, the data augmentation of point clouds in the typical teacher-student framework is too weak, containing only basic down-sampling and flip-and-shift (i.e., rotation and scaling), which hinders the effective learning of feature information. Hence, we address these issues by introducing a novel approach, Hierarchical Supervision and Shuffle Data Augmentation (HSSDA), which is a simple yet effective teacher-student framework. The teacher network generates more reasonable supervision for the student network by designing a dynamic dual-threshold strategy. Besides, the shuffle data augmentation strategy is designed to strengthen the feature representation ability of the student network. Extensive experiments show that HSSDA consistently outperforms recent state-of-the-art methods on different datasets. The code will be released at https://github.com/azhuantou/HSSDA.
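
To make the dual-threshold idea concrete, the toy sketch below splits teacher confidences into strong, soft, and discarded pseudo-labels; the fixed thresholds here stand in for the dynamically estimated ones in the paper.

```python
import numpy as np

def split_pseudo_labels(scores, t_high=0.9, t_low=0.5):
    """High-confidence boxes supervise directly; mid-range ones are kept
    as softer supervision; everything below t_low is discarded."""
    scores = np.asarray(scores)
    strong = scores >= t_high
    soft = (scores >= t_low) & (scores < t_high)
    return strong, soft

strong, soft = split_pseudo_labels([0.95, 0.7, 0.3, 0.88])
print(strong, soft)  # [ True False False False] [False  True False  True]
```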

DLRover: An Elastic Deep Training Extension with Auto Job Resource Recommendation

  • Authors: Qinlong Wang, Bo Sang, Haitao Zhang, Mingjie Tang, Ke Zhang
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.01468
  • Pdf link: https://arxiv.org/pdf/2304.01468
  • Abstract
    The cloud is still a popular platform for distributed deep learning (DL) training jobs since resource sharing in the cloud can improve resource utilization and reduce overall costs. However, such sharing also brings multiple challenges for DL training jobs, e.g., high-priority jobs could impact, even interrupt, low-priority jobs. Meanwhile, most existing distributed DL training systems require users to manually configure the resources of a job (i.e., the number of nodes and resources like CPU and memory allocated to each node) before job submission and cannot adjust the job's resources during runtime. The resource configuration of a job deeply affects its performance (e.g., training throughput, resource utilization, and completion rate), yet users fail to provide optimal configurations in most cases, which usually leads to poor job performance. DLRover is a distributed DL framework that can auto-configure a DL job's initial resources and dynamically tune them for better performance. With this elastic capability, DLRover can effectively adjust the resources of a job when performance issues are detected or a job fails because of faults or eviction. Evaluation results show DLRover can outperform manually well-tuned resource configurations. Furthermore, in our production Kubernetes cluster, DLRover reduces the median job completion time by 31%, and improves the job completion rate by 6%, CPU utilization by 15%, and memory utilization by 20% compared with manual configuration.

Multi model LSTM architecture for Track Association based on Automatic Identification System Data

  • Authors: Md Asif Bin Syed, Imtiaz Ahmed
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.01491
  • Pdf link: https://arxiv.org/pdf/2304.01491
  • Abstract
    For decades, track association has been a challenging problem in marine surveillance, which involves the identification and association of vessel observations over time. However, the Automatic Identification System (AIS) has provided a new opportunity for researchers to tackle this problem by offering a large database of dynamic and geo-spatial information of marine vessels. With the availability of such large databases, researchers can now develop sophisticated models and algorithms that leverage the increased availability of data to address the track association challenge effectively. Furthermore, with the advent of deep learning, track association can now be approached as a data-intensive problem. In this study, we propose a Long Short-Term Memory (LSTM) based multi-model framework for track association. LSTM is a recurrent neural network architecture that is capable of processing multivariate temporal data collected over time in a sequential manner, enabling it to predict current vessel locations from historical observations. Based on these predictions, a geodesic distance based similarity metric is then utilized to associate the unclassified observations to their true tracks (vessels). We evaluate the performance of our approach using standard performance metrics, such as precision, recall, and F1 score, which provide a comprehensive summary of the accuracy of the proposed framework.
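
A minimal sketch of the association step, assuming an LSTM (not shown) has already produced a predicted position for each known track; every new AIS observation is then assigned to the track with the smallest geodesic distance.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def associate(observation, predicted_positions):
    """Assign an observation (lat, lon) to the nearest predicted track."""
    lat, lon = observation
    return min(predicted_positions,
               key=lambda tid: haversine_km(lat, lon, *predicted_positions[tid]))

tracks = {"vessel_a": (40.64, -74.07), "vessel_b": (40.70, -74.01)}
print(associate((40.69, -74.02), tracks))  # vessel_b
```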

Multimodal Neural Processes for Uncertainty Estimation

  • Authors: Myong Chol Jung, He Zhao, Joanna Dipnall, Belinda Gabbe, Lan Du
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.01518
  • Pdf link: https://arxiv.org/pdf/2304.01518
  • Abstract
    Neural processes (NPs) have brought the representation power of parametric deep neural networks and the reliable uncertainty estimation of non-parametric Gaussian processes together. Although recent development of NPs has shown success in both regression and classification, how to adapt NPs to multimodal data has not been carefully studied. For the first time, we propose a new model of the NP family for multimodal uncertainty estimation, namely Multimodal Neural Processes. In a holistic and principled way, we develop a dynamic context memory updated by the classification error, a multimodal Bayesian aggregation mechanism to aggregate multimodal representations, and a new attention mechanism for calibrated predictions. In extensive empirical evaluation, our method achieves state-of-the-art multimodal uncertainty estimation performance, showing its appealing ability to be robust against noisy samples and reliable in out-of-domain detection.

FisHook -- An Optimized Approach to Marine Specie Classification using MobileNetV2

  • Authors: Kohav Dey, Krishna Bajaj, K S Ramalakshmi, Samuel Thomas, Sriram Radhakrishna
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01524
  • Pdf link: https://arxiv.org/pdf/2304.01524
  • Abstract
    Marine ecosystems are vital for the planet's health, but human activities such as climate change, pollution, and overfishing pose a constant threat to marine species. Accurate classification and monitoring of these species can aid in understanding their distribution, population dynamics, and the impact of human activities on them. However, classifying marine species can be challenging due to their vast diversity and the complex underwater environment. With advancements in computer performance and GPU-based computing, deep-learning algorithms can now efficiently classify marine species, making it easier to monitor and manage marine ecosystems. In this paper, we propose an optimization to the MobileNetV2 model to achieve a 99.83% average validation accuracy by highlighting specific guidelines for creating a dataset and augmenting marine species images. This transfer learning algorithm can be deployed successfully on a mobile application for on-site classification at fisheries.
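
A hedged sketch of the kind of MobileNetV2 transfer-learning setup the abstract describes; the input size, class count, and training recipe are placeholders rather than the paper's exact configuration.

```python
import tensorflow as tf

NUM_CLASSES = 10  # hypothetical number of marine species

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pre-trained backbone

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```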

Online Learning with Adversaries: A Differential Inclusion Analysis

  • Authors: Swetha Ganesh, Alexandre Reiffers-Masson, Gugan Thoppe
  • Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.01525
  • Pdf link: https://arxiv.org/pdf/2304.01525
  • Abstract
    We consider the measurement model $Y = AX,$ where $X$ and, hence, $Y$ are random variables and $A$ is an a priori known tall matrix. At each time instance, a sample of one of $Y$'s coordinates is available, and the goal is to estimate $\mu := \mathbb{E}[X]$ via these samples. However, the challenge is that a small but unknown subset of $Y$'s coordinates are controlled by adversaries with infinite power: they can return any real number each time they are queried for a sample. For such an adversarial setting, we propose the first asynchronous online algorithm that converges to $\mu$ almost surely. We prove this result using a novel differential inclusion based two-timescale analysis. Two key highlights of our proof include: (a) the use of a novel Lyapunov function for showing that $\mu$ is the unique global attractor for our algorithm's limiting dynamics, and (b) the use of martingale and stopping time theory to show that our algorithm's iterates are almost surely bounded.

Proving the Convergence to Limit Cycles using Periodically Decreasing Jacobian Matrix Measures

  • Authors: Jawher Jerray, Laurent Fribourg
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.01691
  • Pdf link: https://arxiv.org/pdf/2304.01691
  • Abstract
    Methods based on "(Jacobian) matrix measure" to show the convergence of a dynamical system to a limit cycle (LC), generally assume that the measure is negative everywhere on the LC. We relax this assumption by assuming that the matrix measure is negative "on average" over one period of LC. Using an approximate Euler trajectory, we thus present a method that guarantees the LC existence, and allows us to construct a basin of attraction. This is illustrated on the example of the Van der Pol system.

Decoupling Dynamic Monocular Videos for Dynamic View Synthesis

  • Authors: Meng You, Junhui Hou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.01716
  • Pdf link: https://arxiv.org/pdf/2304.01716
  • Abstract
    The challenge of dynamic view synthesis from dynamic monocular videos, i.e., synthesizing novel views for free viewpoints given a monocular video of a dynamic scene captured by a moving camera, mainly lies in accurately modeling the dynamic objects of a scene using limited 2D frames, each with a varying timestamp and viewpoint. Existing methods usually require pre-processed 2D optical flow and depth maps by additional methods to supervise the network, making them suffer from the inaccuracy of the pre-processed supervision and the ambiguity when lifting the 2D information to 3D. In this paper, we tackle this challenge in an unsupervised fashion. Specifically, we decouple the motion of the dynamic objects into object motion and camera motion, respectively regularized by proposed unsupervised surface consistency and patch-based multi-view constraints. The former enforces the 3D geometric surfaces of moving objects to be consistent over time, while the latter regularizes their appearances to be consistent across different viewpoints. Such a fine-grained motion formulation can alleviate the learning difficulty for the network, thus enabling it to produce not only novel views with higher quality but also more accurate scene flows and depth than existing methods requiring extra supervision. We will make the code publicly available.

Virtio-FPGA: a virtualization solution for SoC-attached FPGAs

  • Authors: Anna Panagopoulou, Michele Paolino, Daniel Raho
  • Subjects: Operating Systems (cs.OS)
  • Arxiv link: https://arxiv.org/abs/2304.01721
  • Pdf link: https://arxiv.org/pdf/2304.01721
  • Abstract
    Recently, FPGA accelerators have risen in popularity as they present a suitable way of satisfying the high-computation and low-power demands of real-time applications. Modern electric transportation systems (such as aircraft and road vehicles) can greatly profit from embedded FPGAs, which incorporate both high-performance and flexibility features into a single SoC. At the same time, the virtualization of FPGA resources aims to reinforce these systems with strong isolation, consolidation, and security. In this paper, we present a novel virtualization framework aimed at SoC-attached FPGA devices, in a Linux and QEMU/KVM setup. We use Virtio as a means to enable the configuration of FPGA resources from guest systems in an efficient way. Also, we employ the Linux VFIO and Device Tree Overlays technologies in order to render the FPGA resources dynamically accessible to guest systems. The ability to dynamically configure and utilize the FPGA resources from a virtualization environment is described in detail. The evaluation procedure of the solution is presented, and the virtualization overhead is benchmarked as minimal (around 10%) when accessing the FPGA devices from guest systems.

Adaptive parallelization of multi-agent simulations with localized dynamics

  • Authors: Alexandru-Ionuţ Băbeanu, Tatiana Filatova, Jan H. Kwakkel, Neil Yorke-Smith
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computational Engineering, Finance, and Science (cs.CE); Multiagent Systems (cs.MA); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2304.01724
  • Pdf link: https://arxiv.org/pdf/2304.01724
  • Abstract
    Agent-based modelling constitutes a versatile approach to representing and simulating complex systems. Studying large-scale systems is challenging because of the computational time required for the simulation runs: scaling is at least linear in system size (number of agents). Given the inherently modular nature of multi-agent-based simulations (MABSs), parallel computing is a natural approach to overcoming this challenge. However, because of the shared information and communication between agents, parallelization is not simple. We present a protocol for shared-memory, parallel execution of MABSs. This approach is useful for models that can be formulated in terms of sequential computations and that involve updates that are localized, in the sense of involving small numbers of agents. The protocol has a bottom-up and asynchronous nature, allowing it to deal with heterogeneous computation in an adaptive, yet graceful manner. We illustrate the potential performance gains on exemplar cultural dynamics and disease spreading MABSs.

Dynamic treewidth

  • Authors: Tuukka Korhonen, Konrad Majewski, Wojciech Nadara, Michał Pilipczuk, Marek Sokołowski
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.01744
  • Pdf link: https://arxiv.org/pdf/2304.01744
  • Abstract
    We present a data structure that for a dynamic graph $G$ that is updated by edge insertions and deletions, maintains a tree decomposition of $G$ of width at most $6k+5$ under the promise that the treewidth of $G$ never grows above $k$. The amortized update time is ${\cal O}_k(2^{\sqrt{\log n}\log\log n})$, where $n$ is the vertex count of $G$ and the ${\cal O}_k(\cdot)$ notation hides factors depending on $k$. In addition, we also obtain the dynamic variant of Courcelle's Theorem: for any fixed property $\varphi$ expressible in the $\mathsf{CMSO}_2$ logic, the data structure can maintain whether $G$ satisfies $\varphi$ within the same time complexity bounds. To a large extent, this answers a question posed by Bodlaender [WG 1993].

Operator splitting for port-Hamiltonian systems

  • Authors: Andreas Frommer, Michael Günther, Björn Liljegren-Sailer, Nicole Marheineke
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.01766
  • Pdf link: https://arxiv.org/pdf/2304.01766
  • Abstract
    The port-Hamiltonian approach presents an energy-based modeling of dynamical systems with energy-conservative and energy-dissipative parts as well as an interconnection over the so-called ports. In this paper, we apply an operator splitting that treats the energy-conservative and energy-dissipative parts separately. This paves the way for linear equation solvers to exploit the respective special structures of the iteration matrices as well as the multirate potential in the different right-hand sides. We illustrate the approach using test examples from coupled multibody system dynamics.

Mixing predictions for online metric algorithms

  • Authors: Antonios Antoniadis, Christian Coester, Marek Eliáš, Adam Polak, Bertrand Simon
  • Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.01781
  • Pdf link: https://arxiv.org/pdf/2304.01781
  • Abstract
    A major technique in learning-augmented online algorithms is combining multiple algorithms or predictors. Since the performance of each predictor may vary over time, it is desirable to use not the single best predictor as a benchmark, but rather a dynamic combination which follows different predictors at different times. We design algorithms that combine predictions and are competitive against such dynamic combinations for a wide class of online problems, namely, metrical task systems. Against the best (in hindsight) unconstrained combination of $\ell$ predictors, we obtain a competitive ratio of $O(\ell^2)$, and show that this is best possible. However, for a benchmark with slightly constrained number of switches between different predictors, we can get a $(1+\epsilon)$-competitive algorithm. Moreover, our algorithms can be adapted to access predictors in a bandit-like fashion, querying only one predictor at a time. An unexpected implication of one of our lower bounds is a new structural insight about covering formulations for the $k$-server problem.

Machine Learning Discovery of Optimal Quadrature Rules for Isogeometric Analysis

  • Authors: Tomas Teijeiro, Jamie M. Taylor, Ali Hashemian, David Pardo
  • Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.01802
  • Pdf link: https://arxiv.org/pdf/2304.01802
  • Abstract
    We propose the use of machine learning techniques to find optimal quadrature rules for the construction of stiffness and mass matrices in isogeometric analysis (IGA). We initially consider 1D spline spaces of arbitrary degree spanned over uniform and non-uniform knot sequences, and then the generated optimal rules are used for integration over higher-dimensional spaces in a tensor-product sense. The quadrature rule search is posed as an optimization problem and solved by a machine learning strategy based on gradient-descent. However, since the optimization space is highly non-convex, the success of the search strongly depends on the number of quadrature points and the parameter initialization. Thus, we use a dynamic programming strategy that initializes the parameters from the optimal solution over the spline space with a lower number of knots. With this method, we found optimal quadrature rules for spline spaces when using IGA discretizations with up to 50 uniform elements and polynomial degrees up to 8, showing the generality of the approach in this scenario. For non-uniform partitions, the method also finds an optimal rule in a reasonable number of test cases. We also assess the generated optimal rules in two practical case studies, namely, the eigenvalue problem of the Laplace operator and the eigenfrequency analysis of freeform curved beams, where the latter problem shows the applicability of the method to curved geometries. In particular, the proposed method results in savings with respect to traditional Gaussian integration of up to 44% in 1D, 68% in 2D, and 82% in 3D spaces.
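
A toy version of the search: gradient descent on quadrature nodes and weights so that monomials up to a fixed degree are integrated exactly on [-1, 1]. The paper targets spline bases and initializes via dynamic programming; this sketch only illustrates the loss and its gradients.

```python
import numpy as np

def exact_moment(k):                        # integral of x^k over [-1, 1]
    return 0.0 if k % 2 else 2.0 / (k + 1)

def fit_rule(n_points=3, max_degree=5, lr=0.01, steps=20000):
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(-1, 1, n_points))   # quadrature nodes
    w = np.full(n_points, 2.0 / n_points)       # quadrature weights
    for _ in range(steps):
        gx, gw = np.zeros_like(x), np.zeros_like(w)
        for k in range(max_degree + 1):
            r = w @ x**k - exact_moment(k)       # residual on monomial x^k
            gw += 2 * r * x**k                   # d(loss)/d(w)
            if k > 0:
                gx += 2 * r * w * k * x**(k - 1) # d(loss)/d(x)
        x, w = x - lr * gx, w - lr * gw
    return x, w

x, w = fit_rule()
print(np.round(x, 4), np.round(w, 4))  # should approach the 3-point Gauss rule
```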

FAST: Fidelity-Adjustable Semantic Transmission over Heterogeneous Wireless Networks

  • Authors: Peichun Li, Guoliang Cheng, Jiawen Kang, Rong Yu, Liping Qian, Yuan Wu, Dusit Niyato
  • Subjects: Networking and Internet Architecture (cs.NI); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2304.01857
  • Pdf link: https://arxiv.org/pdf/2304.01857
  • Abstract
    In this work, we investigate the challenging problem of on-demand semantic communication over heterogeneous wireless networks. We propose a fidelity-adjustable semantic transmission framework (FAST) that empowers wireless devices to send data efficiently under different application scenarios and resource conditions. To this end, we first design a dynamic sub-model training scheme to learn the flexible semantic model, which enables edge devices to customize the transmission fidelity with different widths of the semantic model. After that, we focus on the FAST optimization problem to minimize the system energy consumption under latency and fidelity constraints. Following that, the optimal transmission strategies, including the scaling factor of the semantic model, computing frequency, and transmitting power, are derived for the devices. Experimental results indicate that, compared to the baseline transmission schemes, the proposed framework can reduce the system energy consumption and data size by up to an order of magnitude while maintaining reasonable data fidelity.
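
A sketch of the width-adjustable ("slimmable") layer idea behind the dynamic sub-model scheme, where a scaling factor selects the fraction of channels that participate, trading fidelity against compute and energy; this is a generic illustration, not the paper's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlimmableLinear(nn.Linear):
    """A linear layer that can run at a reduced output width."""
    def forward(self, x, width=1.0):
        out_dim = max(1, int(self.out_features * width))
        return F.linear(x, self.weight[:out_dim], self.bias[:out_dim])

layer = SlimmableLinear(64, 32)
h = layer(torch.randn(8, 64), width=0.5)  # only 16 of 32 output features computed
```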

Unified Behavioral Data-Driven Performance Analysis: A Generalized Plant Approach

  • Authors: L. M. Spin, C. Verhoek, W. P. M. H. Heemels, N. van de Wouw, R. Tóth
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.01859
  • Pdf link: https://arxiv.org/pdf/2304.01859
  • Abstract
    In this paper, we present a novel approach to combine data-driven non-parametric representations with model-based representations of dynamical systems. Based on a data-driven form of linear fractional transformations, we introduce a data-driven form of generalized plants. This form can be leveraged to accomplish performance characterizations, e.g., in the form of a mixed-sensitivity approach, and LMI-based conditions to verify finite-horizon dissipativity. In particular, we show how finite-horizon $\ell_2$-gain performance under weighting-filter-based general performance specifications is verified for controllers implemented on systems for which only input-output data is available. The overall effectiveness of the proposed method is demonstrated by simulation examples.

Rolling the Dice: Imagining Generative AI as a Dungeons & Dragons Storytelling Companion

  • Authors: Jose Ma. Santiago III, Richard Lance Parayno, Jordan Aiko Deja, Briane Paul V. Samson
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.01860
  • Pdf link: https://arxiv.org/pdf/2304.01860
  • Abstract
    AI advancements have augmented casual writing and story generation, but their usage poses challenges in collaborative storytelling. In role-playing games such as Dungeons & Dragons (D&D), composing prompts using generative AI requires a technical understanding to generate ideal results, which is difficult for novices. Thus, emergent narratives organically developed based on player actions and decisions have yet to be fully utilized. This paper envisions the use of generative AI in transforming storytelling into an interactive drama using dynamic and immersive narratives. First, we describe scenarios where narratives are created and character conversations are designed within an overarching fantasy disposition. Then, we recommend design guidelines to help create tools using generative AI in interactive storytelling. Lastly, we raise questions on its potential impact on player immersion and cognitive load. Our contributions may be expanded within the broader interactive storytelling domain, such as speech-conversational AI and persona-driven chatbots.

SportsPose -- A Dynamic 3D sports pose dataset

  • Authors: Christian Keilstrup Ingwersen, Christian Mikkelstrup, Janus Nørtoft Jensen, Morten Rieger Hannemose, Anders Bjorholm Dahl
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.01865
  • Pdf link: https://arxiv.org/pdf/2304.01865
  • Abstract
    Accurate 3D human pose estimation is essential for sports analytics, coaching, and injury prevention. However, existing datasets for monocular pose estimation do not adequately capture the challenging and dynamic nature of sports movements. In response, we introduce SportsPose, a large-scale 3D human pose dataset consisting of highly dynamic sports movements. With more than 176,000 3D poses from 24 different subjects performing 5 different sports activities, SportsPose provides a diverse and comprehensive set of 3D poses that reflect the complex and dynamic nature of sports movements. In contrast to other markerless datasets, we have quantitatively evaluated the precision of SportsPose by comparing our poses with a commercial marker-based system, achieving a mean error of 34.5 mm across all evaluation sequences. This is comparable to the error reported on the commonly used 3DPW dataset. We further introduce a new metric, local movement, which describes the movement of the wrist and ankle joints in relation to the body. With this, we show that SportsPose contains more movement than the Human3.6M and 3DPW datasets in these extremum joints, indicating that our movements are more dynamic. The dataset with accompanying code can be downloaded from our website. We hope that SportsPose will allow researchers and practitioners to develop and evaluate more effective models for the analysis of sports performance and injury prevention. With its realistic and diverse dataset, SportsPose provides a valuable resource for advancing the state-of-the-art in pose estimation in sports.
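
A sketch of a local-movement style statistic, assuming the metric measures joint displacement in body-centered (here, pelvis-relative) coordinates; the paper's exact definition may differ.

```python
import numpy as np

def local_movement(joint_xyz, pelvis_xyz):
    """Mean per-frame displacement of a joint in pelvis-relative
    coordinates, for (T, 3) trajectories in millimeters."""
    local = np.asarray(joint_xyz) - np.asarray(pelvis_xyz)
    return np.linalg.norm(np.diff(local, axis=0), axis=1).mean()
```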

Chasing Positive Bodies

  • Authors: Sayan Bhattacharya, Niv Buchbinder, Roie Levin, Thatchaphol Saranurak
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.01889
  • Pdf link: https://arxiv.org/pdf/2304.01889
  • Abstract
    We study the problem of chasing positive bodies in $\ell_1$: given a sequence of bodies $K_{t}=\{x^{t}\in\mathbb{R}_{+}^{n}\mid C^{t}x^{t}\geq 1,\ P^{t}x^{t}\leq 1\}$ revealed online, where $C^{t}$ and $P^{t}$ are nonnegative matrices, the goal is to (approximately) maintain a point $x_t \in K_t$ such that $\sum_t \|x_t - x_{t-1}\|_1$ is minimized. This captures the fully-dynamic low-recourse variant of any problem that can be expressed as a mixed packing-covering linear program and thus also the fractional version of many central problems in dynamic algorithms such as set cover, load balancing, hyperedge orientation, minimum spanning tree, and matching. We give an $O(\log d)$-competitive algorithm for this problem, where $d$ is the maximum row sparsity of any matrix $C^t$. This bypasses and improves exponentially over the lower bound of $\sqrt{n}$ known for general convex bodies. Our algorithm is based on iterated information projections, and, in contrast to general convex body chasing algorithms, is entirely memoryless. We also show how to round our solution dynamically to obtain the first fully dynamic algorithms with competitive recourse for all the stated problems above; i.e. their recourse is less than the recourse of every other algorithm on every update sequence, up to polylogarithmic factors. This is a significantly stronger notion than the notion of absolute recourse in the dynamic algorithms literature.

InfluencerRank: Discovering Effective Influencers via Graph Convolutional Attentive Recurrent Neural Networks

  • Authors: Seungbae Kim, Jyun-Yu Jiang, Jinyoung Han, Wei Wang
  • Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.01897
  • Pdf link: https://arxiv.org/pdf/2304.01897
  • Abstract
    As influencers play considerable roles in social media marketing, companies increase the budget for influencer marketing. Hiring effective influencers is crucial in social influencer marketing, but it is challenging to find the right influencers among hundreds of millions of social media users. In this paper, we propose InfluencerRank that ranks influencers by their effectiveness based on their posting behaviors and social relations over time. To represent the posting behaviors and social relations, the graph convolutional neural networks are applied to model influencers with heterogeneous networks during different historical periods. By learning the network structure with the embedded node features, InfluencerRank can derive informative representations for influencers at each period. An attentive recurrent neural network finally distinguishes highly effective influencers from other influencers by capturing the knowledge of the dynamics of influencer representations over time. Extensive experiments have been conducted on an Instagram dataset that consists of 18,397 influencers with their 2,952,075 posts published within 12 months. The experimental results demonstrate that InfluencerRank outperforms existing baseline methods. An in-depth analysis further reveals that all of our proposed features and model components are beneficial to discover effective influencers.

Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition

  • Authors: Saumya Y. Sahai, Jing Liu, Thejaswi Muniyappa, Kanthashree M. Sathyendra, Anastasios Alexandridis, Grant P. Strimel, Ross McGowan, Ariya Rastrow, Feng-Ju Chang, Athanasios Mouchtaris, Siegfried Kunzmann
  • Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2304.01905
  • Pdf link: https://arxiv.org/pdf/2304.01905
  • Abstract
    We present dual-attention neural biasing, an architecture designed to boost Wake Words (WW) recognition and improve inference time latency on speech recognition tasks. This architecture enables a dynamic switch for its runtime compute paths by exploiting WW spotting to select which branch of its attention networks to execute for an input audio frame. With this approach, we effectively improve WW spotting accuracy while saving runtime compute cost as defined by floating point operations (FLOPs). Using an in-house de-identified dataset, we demonstrate that the proposed dual-attention network can reduce the compute cost by 90% for WW audio frames, with only a 1% increase in the number of parameters. This architecture improves the WW F1 score by 16% relative and improves the generic rare word error rate by 3% relative compared to the baselines.
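
Structurally, the runtime switch can be pictured as below, where a wake-word decision gates which attention branch runs for a given frame so that only one branch's FLOPs are spent; the actual biasing networks and decision rule are the paper's own.

```python
import torch.nn as nn

class DualAttentionBiasing(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.ww_branch = nn.MultiheadAttention(dim, heads)       # wake-word path
        self.generic_branch = nn.MultiheadAttention(dim, heads)  # everything else

    def forward(self, frame, context, is_ww_frame):
        # Only one branch executes per input frame, saving runtime compute.
        branch = self.ww_branch if is_ww_frame else self.generic_branch
        out, _ = branch(frame, context, context)
        return out
```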

Accelerating and Compressing Deep Neural Networks for Massive MIMO CSI Feedback

  • Authors: Omar Erak, Hatem Abou-Zeid
  • Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.01914
  • Pdf link: https://arxiv.org/pdf/2304.01914
  • Abstract
    The recent advances in machine learning and deep neural networks have made them attractive candidates for wireless communications functions such as channel estimation, decoding, and downlink channel state information (CSI) compression. However, most of these neural networks are large and inefficient, which is a barrier to deployment in practical wireless systems that require low latency and low memory footprints for individual network functions. To mitigate these limitations, we propose accelerated and compressed efficient neural networks for massive MIMO CSI feedback. Specifically, we have thoroughly investigated the adoption of network pruning, post-training dynamic range quantization, and weight clustering to optimize CSI feedback compression for massive MIMO systems. Furthermore, we have deployed the proposed model compression techniques on commodity hardware and demonstrated that, in order to achieve inference gains, specialized libraries that accelerate computations for sparse neural networks are required. Our findings indicate that there is remarkable value in applying these model compression techniques, and the proposed joint pruning and quantization approach reduced model size by 86.5% and inference time by 76.2% with minimal impact on model accuracy. These compression methods are crucial to pave the way for practical adoption and deployment of deep learning-based techniques in commercial wireless systems.
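
The two compression steps named above map onto standard tooling; a sketch using the TensorFlow Model Optimization Toolkit and TFLite is shown below, with `model` standing in for any trained Keras CSI-feedback network (the paper's exact pipeline may differ).

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

def compress(model, train_ds, epochs=2):
    # Step 1: magnitude-based weight pruning with a short fine-tune.
    pruned = tfmot.sparsity.keras.prune_low_magnitude(model)
    pruned.compile(optimizer="adam", loss="mse")
    pruned.fit(train_ds, epochs=epochs,
               callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
    pruned = tfmot.sparsity.keras.strip_pruning(pruned)

    # Step 2: post-training dynamic range quantization via TFLite.
    converter = tf.lite.TFLiteConverter.from_keras_model(pruned)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()  # compact .tflite flatbuffer
```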

The Rise of Disappearing Frameworks in Web Development

  • Authors: Juho Vepsäläinen, Arto Hellas, Petri Vuorimaa
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2304.01947
  • Pdf link: https://arxiv.org/pdf/2304.01947
  • Abstract
    The evolution of the web can be characterized as an emergence of frameworks paving the way from static websites to dynamic web applications. As the scope of web applications has grown, new technical challenges have emerged, leading to the need for new solutions. The latest of these developments is the rise of so-called disappearing web frameworks that question the axioms of earlier generations of web frameworks, providing benefits of the early web and simple static sites.

Strong spatial mixing for colorings on trees and its algorithmic applications

  • Authors: Zongchen Chen, Kuikui Liu, Nitya Mani, Ankur Moitra
  • Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Combinatorics (math.CO); Probability (math.PR)
  • Arxiv link: https://arxiv.org/abs/2304.01954
  • Pdf link: https://arxiv.org/pdf/2304.01954
  • Abstract
    Strong spatial mixing (SSM) is an important quantitative notion of correlation decay for Gibbs distributions arising in statistical physics, probability theory, and theoretical computer science. A longstanding conjecture is that the uniform distribution on proper $q$-colorings on a $\Delta$-regular tree exhibits SSM whenever $q \ge \Delta+1$. Moreover, it is widely believed that as long as SSM holds on bounded-degree trees with $q$ colors, one would obtain an efficient sampler for $q$-colorings on all bounded-degree graphs via simple Markov chain algorithms. It is surprising that such a basic question is still open, even on trees, but then again it also highlights how much we still have to learn about random colorings. In this paper, we show the following: (1) For any $\Delta \ge 3$, SSM holds for random $q$-colorings on trees of maximum degree $\Delta$ whenever $q \ge \Delta + 3$. Thus we almost fully resolve the aforementioned conjecture. Our result substantially improves upon the previously best bound which requires $q \ge 1.59\Delta+\gamma^*$ for an absolute constant $\gamma^* > 0$. (2) For any $\Delta\ge 3$ and girth $g = \Omega_\Delta(1)$, we establish optimal mixing of the Glauber dynamics for $q$-colorings on graphs of maximum degree $\Delta$ and girth $g$ whenever $q \ge \Delta+3$. Our approach is based on a new general reduction from spectral independence on large-girth graphs to SSM on trees that is of independent interest. Using the same techniques, we also prove near-optimal bounds on weak spatial mixing (WSM), a closely-related notion to SSM, for the antiferromagnetic Potts model on trees.

MEGClass: Text Classification with Extremely Weak Supervision via Mutually-Enhancing Text Granularities

  • Authors: Priyanka Kargupta, Tanay Komarlu, Susik Yoon, Xuan Wang, Jiawei Han
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.01969
  • Pdf link: https://arxiv.org/pdf/2304.01969
  • Abstract
    Text classification typically requires a substantial amount of human-annotated data to serve as supervision, which is costly to obtain in dynamic emerging domains. Certain methods seek to address this problem by solely relying on the surface text of class names to serve as extremely weak supervision. However, existing methods fail to account for single-class documents discussing multiple topics. Both topic diversity and vague sentences may introduce noise into the document's underlying representation and consequently the precision of the predicted class. Furthermore, current work focuses on text granularities (documents, sentences, or words) independently, which limits the degree of coarse- or fine-grained context that we can jointly extract from all three to identify significant subtext for classification. In order to address this problem, we propose MEGClass, an extremely weakly-supervised text classification method to exploit Mutually-Enhancing Text Granularities. Specifically, MEGClass constructs class-oriented sentence and class representations based on keywords for performing a sentence-level confidence-weighted label ensemble in order to estimate a document's initial class distribution. This serves as the target distribution for a multi-head attention network with a class-weighted contrastive loss. This network learns contextualized sentence representations and weights to form document representations that reflect its original document and sentence-level topic diversity. Retaining this heterogeneity allows MEGClass to select the most class-indicative documents to serve as iterative feedback for enhancing the class representations. Finally, these top documents are used to fine-tune a pre-trained text classifier. As demonstrated through extensive experiments on six benchmark datasets, MEGClass outperforms other weakly and extremely weakly supervised methods.

Side Channel-Assisted Inference Leakage from Machine Learning-based ECG Classification

  • Authors: Jialin Liu, Ning Miao, Chongzhou Fang, Houman Homayoun, Han Wang
  • Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.01990
  • Pdf link: https://arxiv.org/pdf/2304.01990
  • Abstract
    The Electrocardiogram (ECG) measures the electrical cardiac activity generated by the heart to detect abnormal heartbeats and heart attacks. However, the irregular occurrence of the abnormalities demands continuous monitoring of heartbeats. Machine learning techniques are leveraged to automate the task and reduce the labor needed during monitoring. In recent years, many companies have launched products with ECG monitoring and irregular heartbeat alerts. Among all classification algorithms, the time series-based algorithm dynamic time warping (DTW) is widely adopted for the ECG classification task. Though progress has been achieved, DTW-based ECG classification also introduces a new attack vector: leaking the patients' diagnosis results. This paper shows that the labels of ECG input samples can be stolen via a side-channel attack, Flush+Reload. In particular, we first identify the vulnerability of DTW for ECG classification, i.e., the correlation between warping path choice and prediction results. Then we implement an attack that leverages Flush+Reload to monitor the warping path selection with known ECG data and then build a predictor for constructing the relation between warping path selection and the labels of input ECG samples. Based on experiments, we find that the Flush+Reload-based inference leakage can achieve an 84.0% attack success rate in identifying the labels of the two samples in DTW.
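
For context on the attack surface: the warping path is the data-dependent by-product of the standard DTW dynamic program, and its shape is exactly what a cache side channel like Flush+Reload can observe. A minimal sketch of plain DTW with path backtracking (illustrative reference code, not the paper's attack; all names are ours):

```python
import numpy as np

def dtw_with_path(x, y):
    """Classic DTW distance plus the warping path. The sequence of (i, j)
    steps taken here is the data-dependent control flow that leaks."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1],   # match
                                 cost[i - 1, j],       # insertion
                                 cost[i, j - 1])       # deletion
    path, i, j = [], n, m                              # backtrack from the end
    while (i, j) != (0, 0):
        path.append((i, j))
        steps = []
        if i > 0 and j > 0:
            steps.append((cost[i - 1, j - 1], (i - 1, j - 1)))
        if i > 0:
            steps.append((cost[i - 1, j], (i - 1, j)))
        if j > 0:
            steps.append((cost[i, j - 1], (i, j - 1)))
        _, (i, j) = min(steps)
    return cost[n, m], path[::-1]

dist, path = dtw_with_path([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
print(dist, path)
```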

MonoHuman: Animatable Human Neural Field from Monocular Video

  • Authors: Zhengming Yu, Wei Cheng, Xian Liu, Wayne Wu, Kwan-Yee Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.02001
  • Pdf link: https://arxiv.org/pdf/2304.02001
  • Abstract
    Animating virtual avatars with free-view control is crucial for various applications like virtual reality and digital entertainment. Previous studies have attempted to utilize the representation power of the neural radiance field (NeRF) to reconstruct the human body from monocular videos. Recent works propose to graft a deformation network into the NeRF to further model the dynamics of the human neural field for animating vivid human motions. However, such pipelines either rely on pose-dependent representations or fall short of motion coherency due to frame-independent optimization, making it difficult to generalize to unseen pose sequences realistically. In this paper, we propose a novel framework MonoHuman, which robustly renders view-consistent and high-fidelity avatars under arbitrary novel poses. Our key insight is to model the deformation field with bi-directional constraints and explicitly leverage the off-the-peg keyframe information to reason the feature correlations for coherent results. Specifically, we first propose a Shared Bidirectional Deformation module, which creates a pose-independent generalizable deformation field by disentangling backward and forward deformation correspondences into shared skeletal motion weight and separate non-rigid motions. Then, we devise a Forward Correspondence Search module, which queries the correspondence feature of keyframes to guide the rendering network. The rendered results are thus multi-view consistent with high fidelity, even under challenging novel pose settings. Extensive experiments demonstrate the superiority of our proposed MonoHuman over state-of-the-art methods.

Towards Optimal Human-Robot Interface Design Applied to Underwater Robotics Teleoperation

  • Authors: Paulo Padrao, Jose Fuentes, Tero Kaarlela, Alfredo Bayuelo, Leonardo Bobadilla
  • Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.02002
  • Pdf link: https://arxiv.org/pdf/2304.02002
  • Abstract
    Efficient and intuitive Human-Robot interfaces are crucial for expanding the user base of operators and enabling new applications in critical areas such as precision agriculture, automated construction, rehabilitation, and environmental monitoring. In this paper, we investigate the design of human-robot interfaces for the teleoperation of dynamical systems. The proposed framework seeks to find an optimal interface that complies with key concepts such as user comfort, efficiency, continuity, and consistency. As a proof of concept, we introduce an innovative approach to teleoperating underwater vehicles that translates human body movements into vehicle control commands. This method eliminates the need for divers to work in harsh underwater environments while taking into account comfort and communication constraints. We conducted a study with human subjects using a head-mounted display attached to a smartphone to control a simulated ROV. Numerical experiments have also demonstrated that the optimal translation is often the most intuitive and natural one, aligning with users' expectations.

New submissions for Fri, 24 Mar 23

Keyword: pruning

Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation

  • Authors: Bingyi Zhang, Viktor Prasanna
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2303.12901
  • Pdf link: https://arxiv.org/pdf/2303.12901
  • Abstract
    Graph Neural Network (GNN) inference is used in many real-world applications. Data sparsity in GNN inference, including sparsity in the input graph and the GNN model, offers opportunities to further speed up inference. Also, many pruning techniques have been proposed for model compression that increase the data sparsity of GNNs. We propose Dynasparse, a comprehensive hardware-software codesign on FPGA to accelerate GNN inference through dynamic sparsity exploitation. For this, we decouple the GNN computation kernels from the basic computation primitives, and explore hardware-software codesign as follows: 1) Hardware design: We propose a novel unified accelerator design on FPGA to efficiently execute various computation primitives. We develop a customized soft processor that is tightly coupled with the accelerator to execute a runtime system. Moreover, we develop efficient hardware mechanisms to profile the data sparsity and perform on-the-fly data format transformation to prepare the input data for various computation primitives; 2) Software design: We develop a runtime system that works synergistically with the accelerator to perform dynamic kernel-to-primitive mapping based on data sparsity. We implement Dynasparse on a state-of-the-art FPGA platform, Xilinx Alveo U250, and evaluate the design using widely used GNN models (GCN, GraphSAGE, GIN and SGC). For the above GNN models and various input graphs, the proposed accelerator and dynamic kernel-to-primitive mapping reduce the inference latency by $3.73\times$ on average compared with the static mapping strategies employed in the state-of-the-art GNN accelerators. Compared with state-of-the-art CPU (GPU) implementations, Dynasparse achieves up to $56.9\times$ ($2.37\times$) speedup in end-to-end latency.
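
The software half of the idea, profiling sparsity at runtime and mapping the computation to the best-fitting primitive, has a simple host-side analogue (a CPU toy with an invented threshold, standing in for the FPGA runtime described above):

```python
import numpy as np
from scipy import sparse

def density(mat):
    """Runtime sparsity profiling: fraction of nonzero entries."""
    return np.count_nonzero(mat) / mat.size

def matmul_dispatch(a, b, sparse_threshold=0.1):
    """Dynamic kernel-to-primitive mapping in miniature: route the
    multiplication to a sparse or dense primitive based on observed
    density, converting the data format on the fly."""
    if min(density(a), density(b)) < sparse_threshold:
        return (sparse.csr_matrix(a) @ sparse.csr_matrix(b)).toarray()
    return a @ b
```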

CP$^3$: Channel Pruning Plug-in for Point-based Networks

  • Authors: Yaomin Huang, Ning Liu, Zhengping Che, Zhiyuan Xu, Chaomin Shen, Yaxin Peng, Guixu Zhang, Xinmei Liu, Feifei Feng, Jian Tang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2303.13097
  • Pdf link: https://arxiv.org/pdf/2303.13097
  • Abstract
    Channel pruning can effectively reduce both the computational cost and memory footprint of the original network while keeping a comparable accuracy performance. Though great success has been achieved in channel pruning for 2D image-based convolutional networks (CNNs), existing works seldom extend channel pruning methods to 3D point-based neural networks (PNNs). Directly applying 2D CNN channel pruning methods to PNNs undermines their performance because of the different representations of 2D images and 3D point clouds as well as the network architecture disparity. In this paper, we propose CP$^3$, a Channel Pruning Plug-in for Point-based networks. CP$^3$ is elaborately designed to leverage the characteristics of point clouds and PNNs in order to enable 2D channel pruning methods for PNNs. Specifically, it presents a coordinate-enhanced channel importance metric to reflect the correlation between dimensional information and individual channel features, and it recycles the points discarded in the PNN's sampling process, reconsidering their potentially exclusive information to enhance the robustness of channel pruning. Experiments on various PNN architectures show that CP$^3$ consistently improves state-of-the-art 2D CNN pruning approaches on different point cloud tasks. For instance, our compressed PointNeXt-S on ScanObjectNN achieves an accuracy of 88.52% with a pruning rate of 57.8%, outperforming the baseline pruning methods with an accuracy gain of 1.94%.
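
For orientation only: the coordinate-enhanced metric refines the standard notion of channel importance that 2D pruning methods rank on. The common magnitude baseline that such plug-ins extend looks roughly like this (a generic sketch, not the paper's metric):

```python
import torch

def channel_importance(conv_weight):
    """L1-norm importance per output channel, the plain 2D baseline
    that coordinate-enhanced metrics build upon.
    conv_weight: (out_channels, in_channels, *kernel_dims)."""
    return conv_weight.abs().flatten(start_dim=1).sum(dim=1)

def channels_to_prune(conv_weight, prune_rate=0.5):
    scores = channel_importance(conv_weight)
    k = int(len(scores) * prune_rate)
    return torch.argsort(scores)[:k]   # least-important channel indices

w = torch.randn(64, 32, 3, 3)
print(channels_to_prune(w).shape)      # torch.Size([32])
```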

DetOFA: Efficient Training of Once-for-All Networks for Object Detection by Using Pre-trained Supernet and Path Filter

  • Authors: Yuiko Sakuma, Masato Ishii, Takuya Narihira
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13121
  • Pdf link: https://arxiv.org/pdf/2303.13121
  • Abstract
    We address the challenge of training a large supernet for the object detection task, using a relatively small amount of training data. Specifically, we propose an efficient supernet-based neural architecture search (NAS) method that uses transfer learning and search space pruning. First, the supernet is pre-trained on a classification task, for which large datasets are available. Second, the search space defined by the supernet is pruned by removing candidate models that are predicted to perform poorly. To effectively remove the candidates over a wide range of resource constraints, we particularly design a performance predictor, called path filter, which can accurately predict the relative performance of the models that satisfy similar resource constraints. Hence, supernet training is more focused on the best-performing candidates. Our path filter handles prediction for paths with different resource budgets. Compared to once-for-all, our proposed method reduces the computational cost of obtaining the optimal network architecture by 30% and 63%, while yielding a better accuracy-floating point operations Pareto front (0.85 and 0.45 points of improvement on average precision for Pascal VOC and COCO, respectively).

Keyword: neural\ architecture\ search

There is no result

Keyword: 3d object detection

MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer

  • Authors: Yunsong Zhou, Hongzi Zhu, Quan Liu, Shan Chang, Minyi Guo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13018
  • Pdf link: https://arxiv.org/pdf/2303.13018
  • Abstract
    Mobile monocular 3D object detection (Mono3D) (e.g., on a vehicle, a drone, or a robot) is an important yet challenging task. Existing transformer-based offline Mono3D models adopt grid-based vision tokens, which is suboptimal when using coarse tokens due to the limited available computational power. In this paper, we propose an online Mono3D framework, called MonoATT, which leverages a novel vision transformer with heterogeneous tokens of varying shapes and sizes to facilitate mobile Mono3D. The core idea of MonoATT is to adaptively assign finer tokens to areas of more significance before utilizing a transformer to enhance Mono3D. To this end, we first use prior knowledge to design a scoring network for selecting the most important areas of the image, and then propose a token clustering and merging network with an attention mechanism to gradually merge tokens around the selected areas in multiple stages. Finally, a pixel-level feature map is reconstructed from heterogeneous tokens before employing a SOTA Mono3D detector as the underlying detection core. Experiment results on the real-world KITTI dataset demonstrate that MonoATT can effectively improve the Mono3D accuracy for both near and far objects and guarantee low latency. MonoATT yields the best performance compared with the state-of-the-art methods by a large margin and is ranked number one on the KITTI 3D benchmark.

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

  • Authors: Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13510
  • Pdf link: https://arxiv.org/pdf/2303.13510
  • Abstract
    This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset. Inspired by the scene-voxel-point hierarchy in downstream 3D object detectors, we design masking and reconstruction strategies accounting for voxel distributions in the scene and local point distributions within the voxel. We employ a Reversed-Furthest-Voxel-Sampling strategy to address the uneven distribution of LiDAR points and propose MV-JAR, which combines two techniques for modeling the aforementioned distributions, resulting in superior performance. Our experiments reveal limitations in previous data-efficient experiments, which uniformly sample fine-tuning splits with varying data proportions from each LiDAR sequence, leading to similar data diversity across splits. To address this, we propose a new benchmark that samples scene sequences for diverse fine-tuning splits, ensuring adequate model convergence and providing a more accurate evaluation of pre-training methods. Experiments on our Waymo benchmark and the KITTI dataset demonstrate that MV-JAR consistently and significantly improves 3D detection performance across various data scales, achieving up to a 6.3% increase in mAPH compared to training from scratch. Codes and the benchmark will be available at https://github.com/SmartBot-PJLab/MV-JAR .

Keyword: voxel

Marching-Primitives: Shape Abstraction from Signed Distance Function

  • Authors: Weixiao Liu, Yuwei Wu, Sipu Ruan, Gregory S. Chirikjian
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13190
  • Pdf link: https://arxiv.org/pdf/2303.13190
  • Abstract
    Representing complex objects with basic geometric primitives has long been a topic in computer vision. Primitive-based representations have the merits of compactness and computational efficiency in higher-level tasks such as physics simulation, collision checking, and robotic manipulation. Unlike previous works which extract polygonal meshes from a signed distance function (SDF), in this paper, we present a novel method, named Marching-Primitives, to obtain a primitive-based abstraction directly from an SDF. Our method grows geometric primitives (such as superquadrics) iteratively by analyzing the connectivity of voxels while marching at different levels of signed distance. For each valid connected volume of interest, we march on the scope of voxels from which a primitive is able to be extracted in a probabilistic sense and simultaneously solve for the parameters of the primitive to capture the underlying local geometry. We evaluate the performance of our method on both synthetic and real-world datasets. The results show that the proposed method outperforms the state-of-the-art in terms of accuracy, and is directly generalizable among different categories and scales. The code is open-sourced at https://github.com/ChirikjianLab/Marching-Primitives.git.
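
The marching itself, analyzing voxel connectivity at successively shallower signed-distance levels, can be pictured with off-the-shelf connected-component labeling (a toy version; the probabilistic superquadric fitting applied to each volume is omitted):

```python
import numpy as np
from scipy import ndimage

def connected_volumes(sdf, levels):
    """Yield connected voxel regions whose SDF lies below each threshold,
    marching from deep inside the shape toward its surface."""
    for level in sorted(levels):                 # most negative first
        labeled, num = ndimage.label(sdf < level)
        for k in range(1, num + 1):
            yield level, np.argwhere(labeled == k)

# Toy SDF of a sphere of radius 0.5 sampled on a 32^3 grid
g = np.linspace(-1, 1, 32)
x, y, z = np.meshgrid(g, g, g, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 0.5
for level, voxels in connected_volumes(sdf, [-0.3, -0.1]):
    print(level, len(voxels))
```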

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

  • Authors: Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13510
  • Pdf link: https://arxiv.org/pdf/2303.13510
  • Abstract
    This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset. Inspired by the scene-voxel-point hierarchy in downstream 3D object detectors, we design masking and reconstruction strategies accounting for voxel distributions in the scene and local point distributions within the voxel. We employ a Reversed-Furthest-Voxel-Sampling strategy to address the uneven distribution of LiDAR points and propose MV-JAR, which combines two techniques for modeling the aforementioned distributions, resulting in superior performance. Our experiments reveal limitations in previous data-efficient experiments, which uniformly sample fine-tuning splits with varying data proportions from each LiDAR sequence, leading to similar data diversity across splits. To address this, we propose a new benchmark that samples scene sequences for diverse fine-tuning splits, ensuring adequate model convergence and providing a more accurate evaluation of pre-training methods. Experiments on our Waymo benchmark and the KITTI dataset demonstrate that MV-JAR consistently and significantly improves 3D detection performance across various data scales, achieving up to a 6.3% increase in mAPH compared to training from scratch. Codes and the benchmark will be available at https://github.com/SmartBot-PJLab/MV-JAR .

Keyword: lidar

MMFormer: Multimodal Transformer Using Multiscale Self-Attention for Remote Sensing Image Classification

  • Authors: Bo Zhang, Zuheng Ming, Wei Feng, Yaqian Liu, Liang He, Kaixing Zhao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13101
  • Pdf link: https://arxiv.org/pdf/2303.13101
  • Abstract
    To benefit the complementary information between heterogeneous data, we introduce a new Multimodal Transformer (MMFormer) for Remote Sensing (RS) image classification using Hyperspectral Image (HSI) accompanied by another source of data such as Light Detection and Ranging (LiDAR). Compared with traditional Vision Transformer (ViT) lacking inductive biases of convolutions, we first introduce convolutional layers to our MMFormer to tokenize patches from multimodal data of HSI and LiDAR. Then we propose a Multi-scale Multi-head Self-Attention (MSMHSA) module to address the problem of compatibility which often limits to fuse HSI with high spectral resolution and LiDAR with relatively low spatial resolution. The proposed MSMHSA module can incorporate HSI to LiDAR data in a coarse-to-fine manner enabling us to learn a fine-grained representation. Extensive experiments on widely used benchmarks (e.g., Trento and MUUFL) demonstrate the effectiveness and superiority of our proposed MMFormer for RS image classification.

Position-Guided Point Cloud Panoptic Segmentation Transformer

  • Authors: Zeqi Xiao, Wenwei Zhang, Tai Wang, Chen Change Loy, Dahua Lin, Jiangmiao Pang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13509
  • Pdf link: https://arxiv.org/pdf/2303.13509
  • Abstract
    DEtection TRansformer (DETR) started a trend that uses a group of learnable queries for unified visual perception. This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline. Although the naive adaptation obtains fair results, the instance segmentation performance is noticeably inferior to previous works. By diving into the details, we observe that instances in the sparse point clouds are relatively small to the whole scene and often have similar geometry but lack distinctive appearance for segmentation, which are rare in the image domain. Considering instances in 3D are more featured by their positional information, we emphasize their roles during the modeling and design a robust Mixed-parameterized Positional Embedding (MPE) to guide the segmentation process. It is embedded into backbone features and later guides the mask prediction and query update processes iteratively, leading to Position-Aware Segmentation (PA-Seg) and Masked Focal Attention (MFA). All these designs impel the queries to attend to specific regions and identify various instances. The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 3.4% and 1.2% PQ on SemanticKITTI and nuScenes benchmark, respectively. The source code and models are available at https://github.com/SmartBot-PJLab/P3Former .

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

  • Authors: Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2303.13510
  • Pdf link: https://arxiv.org/pdf/2303.13510
  • Abstract
    This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset. Inspired by the scene-voxel-point hierarchy in downstream 3D object detectors, we design masking and reconstruction strategies accounting for voxel distributions in the scene and local point distributions within the voxel. We employ a Reversed-Furthest-Voxel-Sampling strategy to address the uneven distribution of LiDAR points and propose MV-JAR, which combines two techniques for modeling the aforementioned distributions, resulting in superior performance. Our experiments reveal limitations in previous data-efficient experiments, which uniformly sample fine-tuning splits with varying data proportions from each LiDAR sequence, leading to similar data diversity across splits. To address this, we propose a new benchmark that samples scene sequences for diverse fine-tuning splits, ensuring adequate model convergence and providing a more accurate evaluation of pre-training methods. Experiments on our Waymo benchmark and the KITTI dataset demonstrate that MV-JAR consistently and significantly improves 3D detection performance across various data scales, achieving up to a 6.3% increase in mAPH compared to training from scratch. Codes and the benchmark will be available at https://github.com/SmartBot-PJLab/MV-JAR .

New submissions for Fri, 21 Apr 23

Keyword: efficient

Evolving Constrained Reinforcement Learning Policy

  • Authors: Chengpeng Hu, Jiyuan Pei, Jialin Liu, Xin Yao
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09869
  • Pdf link: https://arxiv.org/pdf/2304.09869
  • Abstract
    Evolutionary algorithms have been used to evolve a population of actors to generate diverse experiences for training reinforcement learning agents, which helps to tackle the temporal credit assignment problem and improves the exploration efficiency. However, when adapting this approach to address constrained problems, balancing the trade-off between the reward and constraint violation is hard. In this paper, we propose a novel evolutionary constrained reinforcement learning (ECRL) algorithm, which adaptively balances the reward and constraint violation with stochastic ranking, and at the same time, restricts the policy's behaviour by maintaining a set of Lagrange relaxation coefficients with a constraint buffer. Extensive experiments on robotic control benchmarks show that our ECRL achieves outstanding performance compared to state-of-the-art algorithms. Ablation analysis shows the benefits of introducing stochastic ranking and constraint buffer.
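
Stochastic ranking, the balancing mechanism named here, is a bubble-sort-like procedure (Runarsson and Yao) in which adjacent individuals are compared by objective value with some probability and by constraint violation otherwise. A minimal sketch for a minimization setting (parameter values are illustrative):

```python
import random

def stochastic_rank(fitness, violation, p_f=0.45):
    """Rank individuals (best first) while balancing objective value and
    constraint violation: adjacent pairs are compared by fitness with
    probability p_f, or whenever both are feasible; otherwise by
    violation. Assumes minimization of both quantities."""
    idx = list(range(len(fitness)))
    for _ in range(len(idx)):              # at most n bubble-sort sweeps
        swapped = False
        for i in range(len(idx) - 1):
            a, b = idx[i], idx[i + 1]
            by_fitness = (violation[a] == 0 and violation[b] == 0) \
                         or random.random() < p_f
            key = fitness if by_fitness else violation
            if key[a] > key[b]:
                idx[i], idx[i + 1] = b, a
                swapped = True
        if not swapped:
            break
    return idx

print(stochastic_rank([3.0, 1.0, 2.0], [0.0, 0.5, 0.0]))
# e.g. [2, 0, 1]: feasible solutions tend to rank first
```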

GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models

  • Authors: Li Zaitang, Pin-Yu Chen, Tsung-Yi Ho
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09875
  • Pdf link: https://arxiv.org/pdf/2304.09875
  • Abstract
    Current studies on adversarial robustness mainly focus on aggregating local robustness results from a set of data samples to evaluate and rank different models. However, the local statistics may not well represent the true global robustness of the underlying unknown data distribution. To address this challenge, this paper makes the first attempt to present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models. Formally, GREAT Score carries the physical meaning of a global statistic capturing a mean certified attack-proof perturbation level over all samples drawn from a generative model. For finite-sample evaluation, we also derive a probabilistic guarantee on the sample complexity and the difference between the sample mean and the true mean. GREAT Score has several advantages: (1) Robustness evaluations using GREAT Score are efficient and scalable to large models, by sparing the need of running adversarial attacks. In particular, we show high correlation and significantly reduced computation cost of GREAT Score when compared to the attack-based model ranking on RobustBench (Croce et al., 2021). (2) The use of generative models facilitates the approximation of the unknown data distribution. In our ablation study with different generative adversarial networks (GANs), we observe consistency between global robustness evaluation and the quality of GANs. (3) GREAT Score can be used for remote auditing of privacy-sensitive black-box models, as demonstrated by our robustness evaluation on several online facial recognition services.
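
Stripped of the certification machinery, the statistic itself is a Monte Carlo mean over generator samples. A schematic in which generate and local_score are placeholders for the generative model and a per-sample certified-robustness score:

```python
import numpy as np

def great_score(generate, local_score, n_samples=1000):
    """Monte Carlo estimate of the global statistic: the mean certified
    attack-proof perturbation level over samples from a generative model,
    with a standard error for the finite-sample flavor of the guarantee."""
    scores = np.array([local_score(generate()) for _ in range(n_samples)])
    return scores.mean(), scores.std(ddof=1) / np.sqrt(n_samples)

# e.g. great_score(lambda: G.sample(), certified_radius) with your own
# generator G and per-sample certified-robustness function.
```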

The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages

  • Authors: Vesa Akerman, David Baines, Damien Daspit, Ulf Hermjakob, Taeho Jang, Colin Leong, Michael Martin, Joel Mathew, Jonathan Robie, Marcus Schwarting
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09919
  • Pdf link: https://arxiv.org/pdf/2304.09919
  • Abstract
    Efficiently and accurately translating a corpus into a low-resource language remains a challenge, regardless of the strategies employed, whether manual, automated, or a combination of the two. Many Christian organizations are dedicated to the task of translating the Holy Bible into languages that lack a modern translation. Bible translation (BT) work is currently underway for over 3000 extremely low resource languages. We introduce the eBible corpus: a dataset containing 1009 translations of portions of the Bible with data in 833 different languages across 75 language families. In addition to a BT benchmarking dataset, we introduce model performance benchmarks built on the No Language Left Behind (NLLB) neural machine translation (NMT) models. Finally, we describe several problems specific to the domain of BT and consider how the established data and model benchmarks might be used for future translation efforts. For a BT task trained with NLLB, Austronesian and Trans-New Guinea language families achieve 35.1 and 31.6 BLEU scores respectively, which spurs future innovations for NMT for low-resource languages in Papua New Guinea.

A robust and interpretable deep learning framework for multi-modal registration via keypoints

  • Authors: Alan Q. Wang, Evan M. Yu, Adrian V. Dalca, Mert R. Sabuncu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.09941
  • Pdf link: https://arxiv.org/pdf/2304.09941
  • Abstract
    We present KeyMorph, a deep learning-based image registration framework that relies on automatically detecting corresponding keypoints. State-of-the-art deep learning methods for registration often are not robust to large misalignments, are not interpretable, and do not incorporate the symmetries of the problem. In addition, most models produce only a single prediction at test-time. Our core insight which addresses these shortcomings is that corresponding keypoints between images can be used to obtain the optimal transformation via a differentiable closed-form expression. We use this observation to drive the end-to-end learning of keypoints tailored for the registration task, and without knowledge of ground-truth keypoints. This framework not only leads to substantially more robust registration but also yields better interpretability, since the keypoints reveal which parts of the image are driving the final alignment. Moreover, KeyMorph can be designed to be equivariant under image translations and/or symmetric with respect to the input image ordering. Finally, we show how multiple deformation fields can be computed efficiently and in closed-form at test time corresponding to different transformation variants. We demonstrate the proposed framework in solving 3D affine and spline-based registration of multi-modal brain MRI scans. In particular, we show registration accuracy that surpasses current state-of-the-art methods, especially in the context of large displacements. Our code is available at https://github.com/evanmy/keymorph.
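
The "differentiable closed-form expression" for the affine case is ordinary least squares on keypoint correspondences; a NumPy sketch (keypoints assumed already detected, so this shows only the final solve):

```python
import numpy as np

def affine_from_keypoints(src, dst):
    """Closed-form affine transform mapping src keypoints to dst keypoints.
    src, dst: (N, 3) arrays of corresponding 3D points, N >= 4.
    Returns A (3x3) and t (3,) minimizing ||src @ A.T + t - dst||^2."""
    src_h = np.hstack([src, np.ones((len(src), 1))])      # homogeneous coords
    params, *_ = np.linalg.lstsq(src_h, dst, rcond=None)  # (4, 3) solution
    return params[:3].T, params[3]

src = np.random.rand(10, 3)
A_true = np.eye(3) + 0.1 * np.random.rand(3, 3)
dst = src @ A_true.T + np.array([1.0, 2.0, 3.0])
A, t = affine_from_keypoints(src, dst)
print(np.allclose(A, A_true), np.allclose(t, [1, 2, 3]))  # True True
```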

Baugh-Wooley Multiplication for the RISCV Processor

  • Authors: Franc Grootjen, Nikolai Schauer
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.09952
  • Pdf link: https://arxiv.org/pdf/2304.09952
  • Abstract
    This article describes an efficient way to implement the multiplication instructions for a RISC-V processor. Instead of using three predefined IP blocks for signed, unsigned and mixed multiplication, this article presents a novel extension to the Baugh-Wooley multiplication algorithm which reduces area and power consumption by roughly a factor of three.
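
For readers unfamiliar with the trick: Baugh-Wooley rewrites the negatively weighted sign-bit partial products of a two's-complement multiply as complemented AND terms plus constant corrections, so a single unsigned-style array handles signed operands. A behavioral bit-level model (a software check of the arithmetic, obviously not the hardware design):

```python
def baugh_wooley_multiply(a, b, n):
    """Multiply two n-bit two's-complement integers using the Baugh-Wooley
    partial-product scheme; returns the 2n-bit two's-complement product."""
    abits = [(a >> i) & 1 for i in range(n)]
    bbits = [(b >> j) & 1 for j in range(n)]
    p = 0
    for i in range(n - 1):                     # plain AND partial products
        for j in range(n - 1):
            p += (abits[i] & bbits[j]) << (i + j)
    p += (abits[n - 1] & bbits[n - 1]) << (2 * n - 2)
    for j in range(n - 1):                     # complemented sign-row terms
        p += (1 - (abits[n - 1] & bbits[j])) << (n - 1 + j)
    for i in range(n - 1):                     # complemented sign-column terms
        p += (1 - (abits[i] & bbits[n - 1])) << (n - 1 + i)
    p += (1 << n) + (1 << (2 * n - 1))         # constant correction terms
    p &= (1 << (2 * n)) - 1                    # keep 2n result bits
    return p - (1 << (2 * n)) if p >> (2 * n - 1) else p

# Exhaustive behavioral check for 4-bit operands
assert all(baugh_wooley_multiply(a, b, 4) == a * b
           for a in range(-8, 8) for b in range(-8, 8))
```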

MasakhaNEWS: News Topic Classification for African languages

  • Authors: David Ifeoluwa Adelani, Marek Masiak, Israel Abebe Azime, Jesujoba Oluwadara Alabi, Atnafu Lambebo Tonja, Christine Mwase, Odunayo Ogundepo, Bonaventure F. P. Dossou, Akintunde Oladipo, Doreen Nixdorf, Chris Chinenye Emezue, Sana Sabah al-azzawi, Blessing K. Sibanda, Davis David, Lolwethu Ndolela, Jonathan Mukiibi, Tunde Oluwaseyi Ajayi, Tatiana Moteu Ngoli, Brian Odhiambo, Abraham Toluwase Owodunni, Nnaemeka C. Obiefuna, Shamsuddeen Hassan Muhammad, Saheed Salahudeen Abdullahi, Mesay Gemeda Yigezu, Tajuddeen Gwadabe, Idris Abdulmumin, Mahlet Taye Bame, Oluwabusayo Olufunke Awoyomi, Iyanuoluwa Shode, Tolulope Anu Adelani, Habiba Abdulganiy Kailani, Abdul-Hakeem Omotayo, Adetola Adeeko, Afolabi Abeeb, Anuoluwapo Aremu, Olanrewaju Samuel, Clemencia Siro, Wangari Kimotho, Onyekachi Raphael Ogbu, et al. (23 additional authors not shown)
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.09972
  • Pdf link: https://arxiv.org/pdf/2304.09972
  • Abstract
    African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS -- a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited for zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as few as 10 examples per label, we achieve more than 90% (i.e. 86.0 F1 points) of the performance of full supervised training (92.6 F1 points) leveraging the PET approach.

Equilibrium-Invariant Embedding, Metric Space, and Fundamental Set of $2\times2$ Normal-Form Games

  • Authors: Luke Marris, Ian Gemp, Georgios Piliouras
  • Subjects: Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Theoretical Economics (econ.TH); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.09978
  • Pdf link: https://arxiv.org/pdf/2304.09978
  • Abstract
    Equilibrium solution concepts of normal-form games, such as Nash equilibria, correlated equilibria, and coarse correlated equilibria, describe the joint strategy profiles from which no player has incentive to unilaterally deviate. They are widely studied in game theory, economics, and multiagent systems. Equilibrium concepts are invariant under certain transforms of the payoffs. We define an equilibrium-inspired distance metric for the space of all normal-form games and uncover a distance-preserving equilibrium-invariant embedding. Furthermore, we propose an additional transform which defines a better-response-invariant distance metric and embedding. To demonstrate these metric spaces we study $2\times2$ games. The equilibrium-invariant embedding of $2\times2$ games has an efficient two variable parameterization (a reduction from eight), where each variable geometrically describes an angle on a unit circle. Interesting properties can be spatially inferred from the embedding, including: equilibrium support, cycles, competition, coordination, distances, best-responses, and symmetries. The best-response-invariant embedding of $2\times2$ games, after considering symmetries, rediscovers a set of 15 games, and their respective equivalence classes. We propose that this set of game classes is fundamental and captures all possible interesting strategic interactions in $2\times2$ games. We introduce a directed graph representation and name for each class. Finally, we leverage the tools developed for $2\times2$ games to develop game theoretic visualizations of large normal-form and extensive-form games that aim to fingerprint the strategic interactions that occur within.

Tetra-NeRF: Representing Neural Radiance Fields Using Tetrahedra

  • Authors: Jonas Kulhanek, Torsten Sattler
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09987
  • Pdf link: https://arxiv.org/pdf/2304.09987
  • Abstract
    Neural Radiance Fields (NeRFs) are a very recent and very popular approach for the problems of novel view synthesis and 3D reconstruction. A popular scene representation used by NeRFs is to combine a uniform, voxel-based subdivision of the scene with an MLP. Based on the observation that a (sparse) point cloud of the scene is often available, this paper proposes to use an adaptive representation based on tetrahedra and a Delaunay representation instead of the uniform subdivision or point-based representations. We show that such a representation enables efficient training and leads to state-of-the-art results. Our approach elegantly combines concepts from 3D geometry processing, triangle-based rendering, and modern neural radiance fields. Compared to voxel-based representations, ours provides more detail around parts of the scene likely to be close to the surface. Compared to point-based representations, our approach achieves better performance.

AI-coherent data-driven forecasting model for a combined cycle power plant

  • Authors: Mir Sayed Shah Danish, Zahra Nazari, Tomonobu Senjyu
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10009
  • Pdf link: https://arxiv.org/pdf/2304.10009
  • Abstract
    This study investigates the transformation of energy models to align with machine learning requirements as a promising tool for optimizing the operation of combined cycle power plants (CCPPs). By modeling energy production as a function of environmental and control variables, this methodology offers an innovative way to achieve energy-efficient power generation in the context of data-driven applications. This study focuses on developing a thorough AI-coherent modeling approach for CCPP optimization, preferring an interdisciplinary perspective and providing a comprehensive, insightful analysis. The proposed numerical model, using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, enhances efficiency by simulating various operating scenarios and adjusting optimal parameters, leading to a 2.23% increase in power generation, from 452 MW to 462.1 MW, by optimizing the environmental factors. This study deals with data-driven modeling based on historical data to make predictions without prior knowledge of the system's parameters, demonstrating several merits: identifying patterns that can be difficult for human analysts to detect, high accuracy when trained on large datasets, and the potential to improve over time with new data. The proposed modeling approach and methodology can be expanded as a valuable tool for forecasting and decision-making in complex energy systems.
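
The optimization step amounts to maximizing a fitted power model over the controllable variables with a quasi-Newton solver. A sketch with an entirely made-up quadratic surrogate (the paper fits its model to plant data; the variable names and numbers here are invented):

```python
import numpy as np
from scipy.optimize import minimize

def predicted_power(v):
    """Made-up quadratic surrogate: predicted output (MW) as a function of
    [ambient_temperature, pressure, relative_humidity]."""
    t, p, h = v
    return (460.0 - 0.04 * (t - 12.0) ** 2
                  - 0.001 * (p - 1013.0) ** 2
                  - 0.005 * (h - 40.0) ** 2)

res = minimize(lambda v: -predicted_power(v),       # maximize via negation
               x0=np.array([25.0, 1005.0, 70.0]), method="BFGS")
print(res.x, -res.fun)                              # ~[12, 1013, 40], 460.0
```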

Dynablox: Real-time Detection of Diverse Dynamic Objects in Complex Environments

  • Authors: Lukas Schmid, Olov Andersson, Aurelio Sulser, Patrick Pfreundschuh, Roland Siegwart
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.10049
  • Pdf link: https://arxiv.org/pdf/2304.10049
  • Abstract
    Real-time detection of moving objects is an essential capability for robots acting autonomously in dynamic environments. We thus propose Dynablox, a novel online mapping-based approach for robust moving object detection in complex environments. The central idea of our approach is to incrementally estimate high confidence free-space areas by modeling and accounting for sensing, state estimation, and mapping limitations during online robot operation. The spatio-temporally conservative free space estimate enables robust detection of moving objects without making any assumptions on the appearance of objects or environments. This allows deployment in complex scenes such as multi-storied buildings or staircases, and for diverse moving objects such as people carrying various items, doors swinging or even balls rolling around. We thoroughly evaluate our approach on real-world data sets, achieving 86% IoU at 17 FPS in typical robotic settings. The method outperforms a recent appearance-based classifier and approaches the performance of offline methods. We demonstrate its generality on a novel data set with rare moving objects in complex environments. We make our efficient implementation and the novel data set available as open-source.

Maximize the Long-term Average Revenue of Network Slice Provider via Admission Control Among Heterogeneous Slices

  • Authors: Miao Dai, Gang Sun, Hongfang Yu, Dusit Niyato
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.10057
  • Pdf link: https://arxiv.org/pdf/2304.10057
  • Abstract
    Network slicing endows 5G/B5G with differentiated and customized capabilities to cope with the proliferation of diversified services, whereas limited physical network resources may not be able to support all service requests. Slice admission control is regarded as an essential means to ensure service quality and service isolation when the network is under burden. Herein, we adopt a scenario where rational tenants coexist with partially competitive network slice providers. We aim to maximize the long-term average revenue of the network operators through slice admission control, with the feasibility of multidimensional resource requirements, the priority differences among heterogeneous slices, and the admission fairness within each slice taken into account concurrently. We prove the intractability of our problem by a reduction from the Multidimensional Knapsack Problem (MKP), and propose a two-stage algorithm called MPSAC to compute a sub-optimal solution efficiently. The principle of MPSAC is to split the original problem into two sub-problems, inter-slice decision-making and intra-slice quota allocation, which are solved using a heuristic method and a tailored auction mechanism respectively. Extensive simulations are carried out to demonstrate the efficacy of our algorithm; the results show that the long-term average revenue of our method is at least 9.6% higher than that of the comparison methods while maintaining better priority relations and achieving improved fairness performance.

High-Performance and Flexible Parallel Algorithms for Semisort and Related Problems

  • Authors: Xiaojun Dong, Yunshu Wu, Zhongqi Wang, Laxman Dhulipala, Yan Gu, Yihan Sun
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.10078
  • Pdf link: https://arxiv.org/pdf/2304.10078
  • Abstract
    Semisort is a fundamental algorithmic primitive widely used in the design and analysis of efficient parallel algorithms. It takes as input an array of records and a function extracting a \emph{key} per record, and reorders them so that records with equal keys are contiguous. Since many applications only require collecting equal values, but not fully sorting the input, semisort is broadly applicable, e.g., in string algorithms, graph analytics, and geometry processing, among many other domains. However, despite dozens of recent papers that use semisort in their theoretical analysis and the existence of an asymptotically optimal parallel semisort algorithm, most implementations of these parallel algorithms choose to implement semisort by using comparison or integer sorting in practice, due to potential performance issues in existing semisort implementations. In this paper, we revisit the semisort problem, with the goal of achieving a high-performance parallel semisort implementation with a flexible interface. Our approach easily extends to two related problems, \emph{histogram} and \emph{collect-reduce}. Our algorithms achieve strong speedups in practice, and importantly, outperform state-of-the-art parallel sorting and semisorting methods for almost all settings we tested, with varying input sizes, distributions, and key types. We also test two important applications with real-world data, and show that our algorithms improve the performance over existing approaches. We believe that many other parallel algorithm implementations can be accelerated using our results.
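
For concreteness, the semisort contract (equal keys made contiguous, no ordering promised across groups) is what a sequential reference implementation exhibits; the paper's contribution is the fast parallel algorithm, which this sketch does not attempt to capture:

```python
from collections import defaultdict

def semisort(records, key):
    """Reorder records so that records with equal keys are contiguous.
    Unlike a full sort, no ordering across distinct keys is promised."""
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r)
    return [r for group in groups.values() for r in group]

pairs = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]
print(semisort(pairs, key=lambda r: r[0]))
# [('b', 1), ('b', 3), ('a', 2), ('a', 4)] -- grouped, not sorted
```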

Transmit Power Minimization for STAR-RIS Empowered Symbiotic Radio Communications

  • Authors: Chao Zhou, Bin Lyu, Youhong Feng, Dinh Thai Hoang
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.10095
  • Pdf link: https://arxiv.org/pdf/2304.10095
  • Abstract
    In this paper, we propose a simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) empowered transmission scheme for symbiotic radio (SR) systems to provide more flexibility for network deployment and enhance system performance. The STAR-RIS is utilized not only to beam the primary signals from the base station (BS) towards multiple primary users on the same side of the STAR-RIS, but also to achieve the secondary transmission to the secondary users on the other side. We consider both a broadcasting signal model and a unicasting signal model at the BS. For each model, we aim to minimize the transmit power of the BS by designing the active beamforming and the simultaneous reflection and transmission coefficients under the practical phase correlation constraint. To address the challenge of solving the formulated problem, we propose a block coordinate descent based algorithm with semidefinite relaxation, penalty dual decomposition and successive convex approximation methods, which decomposes the original problem into one sub-problem about active beamforming and another sub-problem about the simultaneous reflection and transmission coefficients, and iteratively solves them until convergence is achieved. Numerical results indicate that the proposed scheme can reduce the transmit power by up to 150.6% compared to the backscattering device enabled scheme.

Two-Memory Reinforcement Learning

  • Authors: Zhao Yang, Thomas. M. Moerland, Mike Preuss, Aske Plaat
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10098
  • Pdf link: https://arxiv.org/pdf/2304.10098
  • Abstract
    While deep reinforcement learning has shown important empirical success, it tends to learn relatively slowly due to the slow propagation of reward information and the slow update of parametric neural networks. Non-parametric episodic memory, on the other hand, provides a faster learning alternative that does not require representation learning and uses the maximum episodic return as state-action values for action selection. Episodic memory and reinforcement learning both have their own strengths and weaknesses. Notably, humans can leverage multiple memory systems concurrently during learning and benefit from all of them. In this work, we propose a method called Two-Memory reinforcement learning agent (2M) that combines episodic memory and reinforcement learning to distill both of their strengths. The 2M agent exploits the speed of the episodic memory part and the optimality and generalization capacity of the reinforcement learning part so that they complement each other. Our experiments demonstrate that the 2M agent is more data efficient and outperforms both pure episodic memory and pure reinforcement learning, as well as a state-of-the-art memory-augmented RL agent. Moreover, the proposed approach provides a general framework that can be used to combine any episodic memory agent with other off-policy reinforcement learning algorithms.

Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One

  • Authors: Hongyuan Zhang, Yanan Zhu, Xuelong Li
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10126
  • Pdf link: https://arxiv.org/pdf/2304.10126
  • Abstract
    Graph neural networks (GNNs) suffer from severe inefficiency, mainly caused by the exponential growth of node dependency with the number of layers. This severely limits the application of stochastic optimization algorithms, so that training a GNN is usually time-consuming. To address this problem, we propose to decouple a multi-layer GNN into multiple simple modules for more efficient training, comprising classical forward training (FT) and a designed backward training (BT). Under the proposed framework, each module can be trained efficiently in FT by stochastic algorithms without distortion of graph information, owing to its simplicity. To avoid the purely unidirectional information delivery of FT and to sufficiently train shallow modules together with the deeper ones, we develop a backward training mechanism that makes the former modules perceive the latter ones. The backward training introduces reversed information delivery into the decoupled modules in addition to the forward information delivery. To investigate how the decoupling and greedy training affect the representational capacity, we theoretically prove that the error produced by linear modules will not accumulate on unsupervised tasks in most cases. The theoretical and experimental results show that the proposed framework is highly efficient with reasonable performance.

Securing Semantic Communications with Physical-layer Semantic Encryption and Obfuscation

  • Authors: Qi Qin, Yankai Rong, Guoshun Nan, Shaokang Wu, Xuefei Zhang, Qimei Cui, Xiaofeng Tao
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.10147
  • Pdf link: https://arxiv.org/pdf/2304.10147
  • Abstract
    Deep learning based semantic communication (DLSC) systems have shown great potential for making wireless networks significantly more efficient by only transmitting the semantics of the data. However, the open nature of the wireless channel and the fragility of neural models make DLSC systems extremely vulnerable to various attacks. The traditional wireless physical layer key (PLK), which relies on the reciprocal channel and randomness characteristics between two legitimate users, holds the promise of securing DLSC. The main challenge lies in generating secret keys in static environments with ultra-low/zero rate. Different from prior efforts that use relays or reconfigurable intelligent surfaces (RIS) to manipulate wireless channels, this paper proposes a novel physical layer semantic encryption scheme by exploring the randomness of bilingual evaluation understudy (BLEU) scores in the field of machine translation, and additionally presents a novel semantic obfuscation mechanism to provide further physical layer protection. Specifically, 1) we calculate the BLEU scores and corresponding weights of the DLSC system, then generate semantic keys (SKey) by feeding the weighted sum of the scores into a hash function. 2) Equipped with the SKey, our proposed subcarrier obfuscation further secures semantic communications with a dynamic dummy data insertion mechanism. Experiments show the effectiveness of our method, especially in static wireless environments.
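
The key-generation step, as described, is a weighted sum of BLEU scores fed through a hash. A schematic reading of that sentence (the quantization, weighting, and hash choice below are our assumptions, not the paper's exact design):

```python
import hashlib

def semantic_key(bleu_scores, weights, nbytes=16):
    """Derive a shared secret (SKey) from BLEU scores observed by the
    semantic decoder. Quantizing the weighted sum before hashing lets
    both legitimate ends derive an identical digest."""
    weighted = sum(w * s for w, s in zip(weights, bleu_scores))
    digest = hashlib.sha256(f"{weighted:.6f}".encode()).digest()
    return digest[:nbytes]

print(semantic_key([0.41, 0.38, 0.52], [0.5, 0.3, 0.2]).hex())
```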

Is ChatGPT a Good Recommender? A Preliminary Study

  • Authors: Junling Liu, Chao Liu, Renjie Lv, Kang Zhou, Yan Zhang
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.10149
  • Pdf link: https://arxiv.org/pdf/2304.10149
  • Abstract
    Recommendation systems have witnessed significant advancements and have been widely used over the past decades. However, most traditional recommendation methods are task-specific and therefore lack efficient generalization ability. Recently, the emergence of ChatGPT has significantly advanced NLP tasks by enhancing the capabilities of conversational models. Nonetheless, the application of ChatGPT in the recommendation domain has not been thoroughly investigated. In this paper, we employ ChatGPT as a general-purpose recommendation model to explore its potential for transferring extensive linguistic and world knowledge acquired from large-scale corpora to recommendation scenarios. Specifically, we design a set of prompts and evaluate ChatGPT's performance on five recommendation scenarios. Unlike traditional recommendation methods, we do not fine-tune ChatGPT during the entire evaluation process, relying only on the prompts themselves to convert recommendation tasks into natural language tasks. Further, we explore the use of few-shot prompting to inject interaction information that contains users' potential interests, to help ChatGPT better understand user needs and preferences. Comprehensive experimental results on the Amazon Beauty dataset show that ChatGPT achieves promising results on certain tasks and is capable of reaching the baseline level on others. We conduct human evaluations on two explainability-oriented tasks to more accurately evaluate the quality of content generated by different models. The human evaluations show that ChatGPT can truly understand the provided information and generate clearer and more reasonable results. We hope that our study can inspire researchers to further explore the potential of language models like ChatGPT to improve recommendation performance and contribute to the advancement of the recommendation systems field.

Automated Dynamic Bayesian Networks for Predicting Acute Kidney Injury Before Onset

  • Authors: David Gordon, Panayiotis Petousis, Anders O. Garlid, Keith Norris, Katherine Tuttle, Susanne B. Nicholas, Alex A.T. Bui (on behalf of CURE-CKD)
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10175
  • Pdf link: https://arxiv.org/pdf/2304.10175
  • Abstract
    Several algorithms for learning the structure of dynamic Bayesian networks (DBNs) require an a priori ordering of variables, which influences the determined graph topology. However, it is often unclear how to determine this order if feature importance is unknown, especially as an exhaustive search is usually impractical. In this paper, we introduce Ranking Approaches for Unknown Structures (RAUS), an automated framework to systematically inform variable ordering and learn networks end-to-end. RAUS leverages existing statistical methods (Cramér's V, the chi-squared test, and information gain) to compare variable orderings, the resultant generated network topologies, and DBN performance. RAUS enables end-users with limited DBN expertise to implement models via a command line interface. We evaluate RAUS on the task of predicting impending acute kidney injury (AKI) from inpatient clinical laboratory data. Longitudinal observations from 67,460 patients were collected from our electronic health record (EHR) and Kidney Disease Improving Global Outcomes (KDIGO) criteria were then applied to define AKI events. RAUS learns multiple DBNs simultaneously to predict a future AKI event at different time points (i.e., 24, 48, and 72 hours in advance of AKI). We also compared the results of the learned AKI prediction models and variable orderings to baseline techniques (logistic regression, random forests, and extreme gradient boosting). The DBNs generated by RAUS achieved 73-83% area under the receiver operating characteristic curve (AUCROC) within 24 hours before AKI, and 71-79% AUCROC within 48 hours before AKI of any stage in a 7-day observation window. Insights from this automated framework can help efficiently implement and interpret DBNs for clinical decision support. The source code for RAUS is available on GitHub at https://github.com/dgrdn08/RAUS .
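
Of the three ranking statistics, Cramér's V is the least standard-library-friendly; a small sketch of computing it and using it to order variables (a toy illustration of the idea, not the RAUS codebase):

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    """Cramer's V association between two categorical variables
    (x, y: 1-D numpy arrays of category labels)."""
    table = np.array([[np.sum((x == a) & (y == b)) for b in np.unique(y)]
                      for a in np.unique(x)])
    chi2 = chi2_contingency(table)[0]
    n = table.sum()
    return np.sqrt(chi2 / (n * (min(table.shape) - 1)))

def rank_variables(features, target):
    """Order candidate variables by association with the outcome, strongest
    first, the kind of data-driven ordering RAUS feeds to DBN learning.
    features: dict mapping variable name -> 1-D array of values."""
    scores = {name: cramers_v(col, target) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)
```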

Robust Deep Reinforcement Learning Scheduling via Weight Anchoring

  • Authors: Steffen Gracla, Edgar Beck, Carsten Bockelmann, Armin Dekorsy
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10176
  • Pdf link: https://arxiv.org/pdf/2304.10176
  • Abstract
    Questions remain on the robustness of data-driven learning methods when crossing the gap from simulation to reality. We utilize weight anchoring, a method known from continual learning, to cultivate and fixate desired behavior in Neural Networks. Weight anchoring may be used to find a solution to a learning problem that is nearby the solution of another learning problem. Thereby, learning can be carried out in optimal environments without neglecting or unlearning desired behavior. We demonstrate this approach on the example of learning mixed QoS-efficient discrete resource scheduling with infrequent priority messages. Results show that this method provides performance comparable to the state of the art of augmenting a simulation environment, alongside significantly increased robustness and steerability.
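
The core of weight anchoring is a penalty pulling new weights toward those of a previously trained solution, so the scheduler keeps its cultivated behavior while adapting. A minimal PyTorch reading (the exact penalty form used in the paper may differ):

```python
import torch

def anchored_loss(task_loss, model, anchor_params, strength=1.0):
    """Task loss plus an L2 pull toward anchor weights, so further training
    stays near a solution whose behavior we want to keep."""
    penalty = sum(((p - a) ** 2).sum()
                  for p, a in zip(model.parameters(), anchor_params))
    return task_loss + strength * penalty

# Typical use: snapshot the trusted solution once, then anchor to it.
# anchor = [p.detach().clone() for p in model.parameters()]
# loss = anchored_loss(criterion(model(x), y), model, anchor, strength=0.1)
```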

Regularizing Second-Order Influences for Continual Learning

  • Authors: Zhicheng Sun, Yadong Mu, Gang Hua
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10177
  • Pdf link: https://arxiv.org/pdf/2304.10177
  • Abstract
    Continual learning aims to learn on non-stationary data streams without catastrophically forgetting previous knowledge. Prevalent replay-based methods address this challenge by rehearsing on a small buffer holding the seen data, for which a delicate sample selection strategy is required. However, existing selection schemes typically seek only to maximize the utility of the ongoing selection, overlooking the interference between successive rounds of selection. Motivated by this, we dissect the interaction of sequential selection steps within a framework built on influence functions. We manage to identify a new class of second-order influences that will gradually amplify incidental bias in the replay buffer and compromise the selection process. To regularize the second-order effects, a novel selection objective is proposed, which also has clear connections to two widely adopted criteria. Furthermore, we present an efficient implementation for optimizing the proposed criterion. Experiments on multiple continual learning benchmarks demonstrate the advantage of our approach over state-of-the-art methods. Code is available at https://github.com/feifeiobama/InfluenceCL.

Efficient Uncertainty Estimation in Spiking Neural Networks via MC-dropout

  • Authors: Tao Sun, Bojian Yin, Sander Bohte
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10191
  • Pdf link: https://arxiv.org/pdf/2304.10191
  • Abstract
    Spiking neural networks (SNNs) have gained attention as models of sparse and event-driven communication of biological neurons, and as such have shown increasing promise for energy-efficient applications in neuromorphic hardware. As with classical artificial neural networks (ANNs), predictive uncertainties are important for decision making in high-stakes applications, such as autonomous vehicles, medical diagnosis, and high-frequency trading. Yet, discussion of uncertainty estimation in SNNs is limited, and approaches for uncertainty estimation in ANNs are not directly applicable to SNNs. Here, we propose an efficient Monte Carlo (MC) dropout-based approach for uncertainty estimation in SNNs. Our approach exploits the time-step mechanism of SNNs to enable MC-dropout in a computationally efficient manner, without introducing significant overhead during training and inference, while demonstrating high accuracy and uncertainty quality.
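
    For illustration, here is the standard ANN variant of MC-dropout that the approach builds on: dropout stays active at inference and repeated stochastic forward passes yield a predictive distribution. The paper's contribution, amortizing the passes over SNN time steps, is not reproduced here.

```python
# Standard MC-dropout sketch (ANN variant); the SNN time-step trick is not shown.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 5))

def mc_dropout_predict(model, x, passes=30):
    model.train()  # keep dropout stochastic at inference time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(passes)])
    mean = probs.mean(0)
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(-1)  # predictive uncertainty
    return mean, entropy

x = torch.randn(4, 20)
mean, unc = mc_dropout_predict(model, x)
print(unc)  # higher entropy = less confident prediction
```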

Selective and Collaborative Influence Function for Efficient Recommendation Unlearning

  • Authors: Yuyuan Li, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Biao Gong, Jun Wang
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.10199
  • Pdf link: https://arxiv.org/pdf/2304.10199
  • Abstract
    Recent regulations on the Right to be Forgotten have greatly influenced the way of running a recommender system, because users now have the right to withdraw their private data. Besides simply deleting the target data in the database, unlearning the associated data lineage, e.g., the learned personal features and preferences in the model, is also necessary for data withdrawal. Existing unlearning methods are mainly devised for generalized machine learning models in classification tasks. In this paper, we first identify two main disadvantages of directly applying existing unlearning methods in the context of recommendation, i.e., (i) unsatisfactory efficiency for large-scale recommendation models and (ii) destruction of collaboration across users and items. To tackle the above issues, we propose an extra-efficient recommendation unlearning method based on Selective and Collaborative Influence Function (SCIF). Our proposed method can (i) avoid any kind of retraining which is computationally prohibitive for large-scale systems, (ii) further enhance efficiency by selectively updating user embeddings and (iii) preserve the collaboration across the remaining users and items. Furthermore, in order to evaluate the unlearning completeness, we define a Membership Inference Oracle (MIO), which can determine whether the unlearned data points were in the training set of the model, i.e., whether a data point was completely unlearned. Extensive experiments on two benchmark datasets demonstrate that our proposed method can not only greatly enhance unlearning efficiency, but also achieve adequate unlearning completeness. More importantly, our proposed method outperforms the state-of-the-art unlearning method on comprehensive recommendation metrics.

Spiking-Fer: Spiking Neural Network for Facial Expression Recognition With Event Cameras

  • Authors: Sami Barchid, Benjamin Allaert, Amel Aissaoui, José Mennesson, Chaabane Djéraba
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10211
  • Pdf link: https://arxiv.org/pdf/2304.10211
  • Abstract
    Facial Expression Recognition (FER) is an active research domain that has shown great progress recently, notably thanks to the use of large deep learning models. However, such approaches are particularly energy intensive, which makes their deployment difficult for edge devices. To address this issue, Spiking Neural Networks (SNNs) coupled with event cameras are a promising alternative, capable of processing sparse and asynchronous events with lower energy consumption. In this paper, we establish the first use of event cameras for FER, named "Event-based FER", and propose the first related benchmarks by converting popular video FER datasets to event streams. To deal with this new task, we propose "Spiking-FER", a deep convolutional SNN model, and compare it against a similar Artificial Neural Network (ANN). Experiments show that the proposed approach achieves comparable performance to the ANN architecture, while consuming less energy by orders of magnitude (up to 65.39x). In addition, an experimental study of various event-based data augmentation techniques is performed to provide insights into the efficient transformations specific to event-based FER.

Dynamic Security Region of Natural Gas Systems in Integrated Electricity-Gas Systems

  • Authors: Han Gao, Peiyao Zhao, Zhengshuo Li
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10215
  • Pdf link: https://arxiv.org/pdf/2304.10215
  • Abstract
    In an integrated electricity-gas system (IEGS), the tight coupling of power and natural gas systems is embodied by frequent changes in gas withdrawal from gas-fired units to provide regulation services for the power system to handle uncertainty, which may in turn endanger the secure operation of the natural gas system and ultimately affect the safety of the whole IEGS. Hence, it is necessary to accurately and efficiently evaluate the dynamic security region (DSR) of the natural gas system in the IEGS by considering the real-time dynamic characteristics of natural gas systems, which are not satisfactorily handled in state-of-the-art works. To bridge this gap, this paper first conceptually verifies the necessity of the DSR and establishes its mathematical model. Then, a dimensionality reduction method is proposed for the efficient solution and visualization of the high-dimensional DSR evaluation model. A fast evaluation (FE) algorithm is developed to address the difficulties of the nonconvex dynamic constraints in the reduced DSR model. Finally, the necessity and notable advantages of the proposed DSR model and FE are verified based on small and relatively large test systems in comparison with common security region models and algorithms. To the best of our knowledge, this is the first paper that comprehensively presents models and efficient algorithms regarding the DSR of natural gas systems in an IEGS.

An Analysis of the Completion Time of the BB84 Protocol

  • Authors: Sounak Kar, Jean-Yves Le Boudec
  • Subjects: Performance (cs.PF); Quantum Physics (quant-ph)
  • Arxiv link: https://arxiv.org/abs/2304.10218
  • Pdf link: https://arxiv.org/pdf/2304.10218
  • Abstract
    The BB84 QKD protocol is based on the idea that the sender and the receiver can reconcile a certain fraction of the teleported qubits to detect eavesdropping or noise and decode the rest to use as a private key. Under the present hardware infrastructure, decoherence of quantum states poses a significant challenge to performing perfect or efficient teleportation, meaning that a teleportation-based protocol must be run multiple times to observe success. Thus, performance analyses of such protocols usually consider the completion time, i.e., the time until success, rather than the duration of a single attempt. Moreover, due to decoherence, the success of an attempt is in general dependent on the duration of individual phases of that attempt, as quantum states must wait in memory while the success or failure of a generation phase is communicated to the relevant parties. In this work, we analyze the completion time of the BB84 protocol in a setting where the sender and the receiver are connected via a single quantum repeater and the only quantum channel between them does not see any adversarial attack. Assuming certain distributional forms for the generation and communication phases of teleportation, we provide a method to compute the moment-generating function (MGF) of the completion time and subsequently derive an estimate of the cumulative distribution function (CDF) and a bound on the tail probability. This result helps us gauge the (tail) behaviour of the completion time in terms of the parameters characterising the elementary phases of teleportation, without having to run the protocol multiple times. We also provide an efficient simulation scheme to generate the completion time, which relies on expressing the completion time in terms of aggregated teleportation times. We numerically compare our approach with a full-scale simulation and observe good agreement between them.
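
    A toy Monte Carlo sketch of the quantity being analyzed: attempts repeat until success, and the completion time aggregates per-attempt phase durations. The distributions and parameters below are assumptions for demonstration, not the paper's calibrated model.

```python
# Toy simulation of a repeat-until-success completion time; parameters are made up.
import random

def one_run(p_success=0.3, mean_generation=1.0, comm_delay=0.5):
    t = 0.0
    while True:
        t += random.expovariate(1.0 / mean_generation)  # generation phase
        t += comm_delay                                  # heralding/communication
        if random.random() < p_success:
            return t

samples = [one_run() for _ in range(100_000)]
cdf_at = lambda x: sum(s <= x for s in samples) / len(samples)
print("P(T <= 10) ~", cdf_at(10.0))
print("tail P(T > 25) ~", 1 - cdf_at(25.0))
```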

PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces

  • Authors: Shuhei Watanabe, Archit Bansal, Frank Hutter
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.10255
  • Pdf link: https://arxiv.org/pdf/2304.10255
  • Abstract
    The recent rise in popularity of Hyperparameter Optimization (HPO) for deep learning has highlighted the role that good hyperparameter (HP) space design can play in training strong models. In turn, designing a good HP space is critically dependent on understanding the role of different HPs. This motivates research on HP Importance (HPI), e.g., with the popular method of functional ANOVA (f-ANOVA). However, the original f-ANOVA formulation is inapplicable to the subspaces most relevant to algorithm designers, such as those defined by top performance. To overcome this problem, we derive a novel formulation of f-ANOVA for arbitrary subspaces and propose an algorithm that uses Pearson divergence (PED) to enable a closed-form computation of HPI. We demonstrate that this new algorithm, dubbed PED-ANOVA, is able to successfully identify important HPs in different subspaces while also being extremely computationally efficient.

PREIM3D: 3D Consistent Precise Image Attribute Editing from a Single Image

  • Authors: Jianhui Li, Jianmin Li, Haoji Zhang, Shilong Liu, Zhengyi Wang, Zihao Xiao, Kaiwen Zheng, Jun Zhu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10263
  • Pdf link: https://arxiv.org/pdf/2304.10263
  • Abstract
    We study the 3D-aware image attribute editing problem in this paper, which has wide applications in practice. Recent methods solve the problem by training a shared encoder to map images into a 3D generator's latent space or by per-image latent code optimization, and then edit images in the latent space. Despite their promising results near the input view, they still suffer from 3D inconsistency of the produced images at large camera poses and from imprecise image attribute editing, such as affecting unspecified attributes during editing. For more efficient image inversion, we train a shared encoder for all images. To alleviate 3D inconsistency at large camera poses, we propose two novel methods, an alternating training scheme and a multi-view identity loss, to maintain 3D consistency and subject identity. As for imprecise image editing, we attribute the problem to the gap between the latent space of real images and that of generated images. We compare the latent space and inversion manifold of GAN models and demonstrate that editing in the inversion manifold can achieve better results in both quantitative and qualitative evaluations. Extensive experiments show that our method produces more 3D consistent images and achieves more precise image editing than previous work. Source code and pretrained models can be found on our project page: https://mybabyyh.github.io/Preim3D/

Robust nonlinear set-point control with reinforcement learning

  • Authors: Ruoqi Zhang, Per Mattsson, Torbjörn Wigren
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10277
  • Pdf link: https://arxiv.org/pdf/2304.10277
  • Abstract
    There has recently been an increased interest in reinforcement learning for nonlinear control problems. However, standard reinforcement learning algorithms can often struggle even on seemingly simple set-point control problems. This paper argues that three ideas can improve reinforcement learning methods even for highly nonlinear set-point control problems: 1) make use of a prior feedback controller to aid amplitude exploration; 2) use integrated errors; 3) train on model ensembles. Together, these ideas lead to more efficient training, and a trained set-point controller that is more robust to modelling errors and thus can be directly deployed to real-world nonlinear systems. The claim is supported by experiments with a real-world nonlinear cascaded tank process and a simulated strongly nonlinear pH-control system.
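
    As a rough sketch of idea 2), a gym-style wrapper can expose the integrated set-point error as an extra observation component so the learned policy obtains integral action. The wrapper API and the choice of obs[0] as the controlled variable are assumptions, not the authors' code.

```python
# Sketch: augment observations with a running integral of the set-point error.
import numpy as np

class IntegratedErrorWrapper:
    """Appends the integrated set-point error to each observation."""
    def __init__(self, env, setpoint, dt=0.1):
        self.env, self.setpoint, self.dt = env, setpoint, dt
        self.integral = 0.0

    def reset(self):
        self.integral = 0.0
        obs = self.env.reset()
        return np.append(obs, self.integral)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.integral += (self.setpoint - obs[0]) * self.dt  # accumulate error
        return np.append(obs, self.integral), reward, done, info
```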

A baseline on continual learning methods for video action recognition

  • Authors: Giulia Castagnolo, Concetto Spampinato, Francesco Rundo, Daniela Giordano, Simone Palazzo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10335
  • Pdf link: https://arxiv.org/pdf/2304.10335
  • Abstract
    Continual learning has recently attracted attention from the research community, as it aims to solve long-standing limitations of classic supervised models. However, most research on this subject has tackled continual learning in simple image classification scenarios. In this paper, we present a benchmark of state-of-the-art continual learning methods on video action recognition. Besides the increased complexity due to the temporal dimension, the video setting imposes stronger requirements on computing resources for top-performing rehearsal methods. To counteract the increased memory requirements, we present two method-agnostic variants for rehearsal methods, exploiting measures of either model confidence or data information to select memorable samples. Our experiments show that, as expected from the literature, rehearsal methods outperform other approaches; moreover, the proposed memory-efficient variants are shown to be effective at retaining a certain level of performance with a smaller buffer size.

Engel's theorem in Mathlib

  • Authors: Oliver Nash
  • Subjects: Logic in Computer Science (cs.LO); Representation Theory (math.RT)
  • Arxiv link: https://arxiv.org/abs/2304.10424
  • Pdf link: https://arxiv.org/pdf/2304.10424
  • Abstract
    We discuss the theory of Lie algebras in Lean's Mathlib library. Using nilpotency as the theme, we outline a computer formalisation of Engel's theorem and an application to root space theory. We emphasise that all arguments work with coefficients in any commutative ring.

GPT-NER: Named Entity Recognition via Large Language Models

  • Authors: Shuhe Wang, Xiaofei Sun, Xiaoya Li, Rongbin Ouyang, Fei Wu, Tianwei Zhang, Jiwei Li, Guoyin Wang
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2304.10428
  • Pdf link: https://arxiv.org/pdf/2304.10428
  • Abstract
    Despite the fact that large-scale Language Models (LLMs) have achieved SOTA performance on a variety of NLP tasks, their performance on NER is still significantly below supervised baselines. This is due to the gap between the two tasks: NER is a sequence labeling task in nature, while LLMs are text-generation models. In this paper, we propose GPT-NER to resolve this issue. GPT-NER bridges the gap by transforming the sequence labeling task into a generation task that can be easily adapted by LLMs, e.g., the task of finding location entities in the input text "Columbus is a city" is transformed to generate the text sequence "@@Columbus## is a city", where the special tokens @@ and ## mark the entity to extract. To efficiently address the "hallucination" issue of LLMs, where LLMs have a strong inclination to over-confidently label NULL inputs as entities, we propose a self-verification strategy by prompting LLMs to ask themselves whether the extracted entities belong to a labeled entity tag. We conduct experiments on five widely adopted NER datasets, and GPT-NER achieves performance comparable to fully supervised baselines, which, to the best of our knowledge, is the first time this has been shown. More importantly, we find that GPT-NER exhibits a greater ability in low-resource and few-shot setups: when the amount of training data is extremely scarce, GPT-NER performs significantly better than supervised models. This demonstrates the capability of GPT-NER in real-world NER applications where the number of labeled examples is limited.
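
    A small sketch of the @@...## transformation described above: build a generation-style prompt for one entity type and recover spans from the model output with a regex. The prompt wording is an assumption; only the marker convention comes from the abstract.

```python
# Sketch of the NER-as-generation transformation; prompt wording is assumed.
import re

def build_prompt(sentence, entity_type="location", demos=()):
    lines = [f"Mark every {entity_type} entity with @@ and ##."]
    lines += [f"Input: {x}\nOutput: {y}" for x, y in demos]
    lines.append(f"Input: {sentence}\nOutput:")
    return "\n".join(lines)

def extract_entities(generated):
    """Recover the marked spans from the model's generated text."""
    return re.findall(r"@@(.+?)##", generated)

demo = [("Columbus is a city", "@@Columbus## is a city")]
print(build_prompt("I flew from Paris to Tokyo", demos=demo))
print(extract_entities("I flew from @@Paris## to @@Tokyo##"))  # ['Paris', 'Tokyo']
```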

Securing Neural Networks with Knapsack Optimization

  • Authors: Yakir Gorski, Shai Avidan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10442
  • Pdf link: https://arxiv.org/pdf/2304.10442
  • Abstract
    Deep learning inference brings together the data and the Convolutional Neural Network (CNN). This is problematic in case the user wants to preserve the privacy of the data and the service provider does not want to reveal the weights of its CNN. Secure Inference allows the two parties to engage in a protocol that preserves their respective privacy concerns, while revealing only the inference result to the user. This is known as Multi-Party Computation (MPC). A major bottleneck of MPC algorithms is communication, as the parties must send data back and forth. The linear component of a CNN (i.e., convolutions) can be done efficiently with minimal communication, but the non-linear part (i.e., ReLU) requires the bulk of communication bandwidth. We propose two ways to accelerate Secure Inference. The first is based on the observation that the ReLU outcome of many convolutions is highly correlated. Therefore, we replace the per-pixel ReLU operation by a ReLU operation per patch. Each layer in the network will benefit from a patch of a different size, and we devise an algorithm to choose the optimal set of patch sizes through a novel reduction of the problem to a knapsack problem. The second way to accelerate Secure Inference is based on cutting the number of bit comparisons required for a secure ReLU operation. We demonstrate the cumulative effect of these tools in the semi-honest secure 3-party setting for four problems: classifying ImageNet using a ResNet50 backbone, classifying CIFAR100 using a ResNet18 backbone, semantic segmentation of ADE20K using a MobileNetV2 backbone, and semantic segmentation of Pascal VOC 2012 using a ResNet50 backbone. Our source code is publicly available at https://github.com/yg320/secure_inference
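
    The per-layer patch-size choice under a communication budget is a multiple-choice knapsack; a generic dynamic-programming sketch follows. The cost and value numbers are made up; the paper derives its own quantities from the network.

```python
# Multiple-choice knapsack sketch: one ReLU patch size per layer, maximize an
# accuracy proxy under a total communication budget. Numbers are illustrative.
def choose_patch_sizes(layers, budget):
    """layers: per-layer option lists of (patch_size, comm_cost, value)."""
    best = {0: (0.0, [])}  # total cost -> (best total value, chosen sizes)
    for options in layers:
        nxt = {}
        for cost, (val, picks) in best.items():
            for size, c, v in options:
                nc = cost + c
                if nc <= budget and (nc not in nxt or nxt[nc][0] < val + v):
                    nxt[nc] = (val + v, picks + [size])
        best = nxt
    return max(best.values())  # (total value, patch size per layer)

layers = [[(1, 10, 1.00), (2, 6, 0.97), (4, 3, 0.90)] for _ in range(3)]
print(choose_patch_sizes(layers, budget=20))
```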

Angle based dynamic learning rate for gradient descent

  • Authors: Neel Mishra, Pawan Kumar
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10457
  • Pdf link: https://arxiv.org/pdf/2304.10457
  • Abstract
    In our work, we propose a novel yet simple approach to obtain an adaptive learning rate for gradient-based descent methods on classification tasks. Instead of the traditional approach of selecting adaptive learning rates via the decayed expectation of gradient-based terms, we use the angle between the current gradient and the new gradient: this new gradient is computed from the direction orthogonal to the current gradient, which further helps us in determining a better adaptive learning rate based on the angle history, thereby leading to better accuracy compared to existing state-of-the-art optimizers. On a wide variety of benchmark datasets with prominent image classification architectures such as ResNet, DenseNet, EfficientNet, and VGG, we find that our method leads to the highest accuracy on most of the datasets. Moreover, we prove that our method is convergent.
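
    A loose illustration of an angle-driven step size: scale the learning rate by the angle between successive gradients (aligned gradients grow the step, opposing gradients shrink it). This simplifies the paper's rule, which derives the second gradient from the orthogonal direction.

```python
# Simplified angle-based learning-rate scaling; not the paper's exact update.
import numpy as np

def angle_lr_step(w, grad, prev_grad, lr, lo=0.5, hi=1.5):
    cos = grad @ prev_grad / (np.linalg.norm(grad) * np.linalg.norm(prev_grad) + 1e-12)
    theta = np.arccos(np.clip(cos, -1.0, 1.0))  # angle in [0, pi]
    scale = hi - (hi - lo) * theta / np.pi      # map angle to [lo, hi]
    return w - lr * scale * grad

# One quadratic-loss step as a smoke test.
w, prev_g = np.ones(3), np.array([1.0, 0.5, 0.0])
g = 2 * w  # gradient of ||w||^2
print(angle_lr_step(w, g, prev_g, lr=0.1))
```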

Reducing Aggregate Electric Vehicle Battery Capacity through Sharing

  • Authors: Polina Alexeenko, Vasileios Charisopoulos
  • Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.10461
  • Pdf link: https://arxiv.org/pdf/2304.10461
  • Abstract
    Meeting growing demand for automotive battery resources is predicted to be costly from both economic and environmental perspectives. To minimize these costs, battery resources should be deployed as efficiently as possible. A potential source of inefficiency in battery deployment is the fact that the batteries of personal vehicles are typically much larger than needed to meet most daily mobility needs. In this paper, we consider whether battery resources can be used more efficiently in a setting where drivers, in addition to having personal vehicle batteries, have access to a shared battery resource. More precisely, we consider the problem of minimizing aggregate battery capacity in settings with and without a shared resource, subject to the requirement that driver commuting needs are met with high reliability. To assess the potential for reductions in deployed battery capacity with the addition of a shared resource, we quantify the difference in deployed battery capacity with and without a shared resource in a case study using real-world longitudinal mobility data from Puget Sound, Washington. We find that giving drivers access to a shared battery resource can substantially reduce deployed battery capacity. Furthermore, relative reductions in battery capacity increase with the number of drivers and the desired level of reliability.

Efficient Deep Reinforcement Learning Requires Regulating Overfitting

  • Authors: Qiyang Li, Aviral Kumar, Ilya Kostrikov, Sergey Levine
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.10466
  • Pdf link: https://arxiv.org/pdf/2304.10466
  • Abstract
    Deep reinforcement learning algorithms that learn policies by trial-and-error must learn from limited amounts of data collected by actively interacting with the environment. While many prior works have shown that proper regularization techniques are crucial for enabling data-efficient RL, a general understanding of the bottlenecks in data-efficient RL has remained unclear. Consequently, it has been difficult to devise a universal technique that works well across all domains. In this paper, we attempt to understand the primary bottleneck in sample-efficient deep RL by examining several potential hypotheses such as non-stationarity, excessive action distribution shift, and overfitting. We perform thorough empirical analysis on state-based DeepMind control suite (DMC) tasks in a controlled and systematic way to show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms, and that prior methods leading to good performance do, in fact, control the validation TD error to be low. This observation gives us a robust principle for making deep RL efficient: we can hill-climb on the validation TD error by utilizing any form of regularization techniques from supervised learning. We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
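
    The paper's diagnostic is easy to state in code: measure the TD error of a Q-network on a held-out validation set of transitions and use it as a model-selection signal. Tensor shapes and the discount below are assumptions.

```python
# Sketch: validation TD error of a Q-network on held-out transitions.
import torch
import torch.nn as nn

def validation_td_error(q_net, target_net, batch, gamma=0.99):
    s, a, r, s2, done = batch  # held-out (state, action, reward, next state, done)
    with torch.no_grad():
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        target = r + gamma * (1 - done) * target_net(s2).max(dim=1).values
    return ((q - target) ** 2).mean().item()

q = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
batch = (torch.randn(64, 4), torch.randint(0, 2, (64,)),
         torch.randn(64), torch.randn(64, 4), torch.zeros(64))
print(validation_td_error(q, q, batch))  # hill-climb regularizers on this value
```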

A primal dual mixed finite element method for inverse identification of the diffusion coefficient and its relation to the Kohn-Vogelius penalty method

  • Authors: Erik Burman
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.10467
  • Pdf link: https://arxiv.org/pdf/2304.10467
  • Abstract
    We revisit the celebrated Kohn-Vogelius penalty method and discuss how to use it for the unique continuation problem where data is given in the bulk of the domain. We then show that the primal-dual mixed finite element methods for the elliptic Cauchy problem introduced in (E. Burman, M. Larson, L. Oksanen, Primal-dual mixed finite element methods for the elliptic Cauchy problem, SIAM J. Numer. Anal., 56(6), 2018) can be interpreted as a Kohn-Vogelius penalty method and modify it to allow for unique continuation using data in the bulk. We prove that the resulting linear system is invertible for all data. Then we show that by introducing a singularly perturbed Robin condition on the discrete level, sufficient regularization is obtained so that error estimates can be shown using conditional stability. Finally, we show how the method can be used for the identification of the diffusivity coefficient in a second-order elliptic operator with partial data. Some numerical examples are presented showing the performance of the method for unique continuation and for impedance computed tomography with partial data.

New Closed-Form ASER Expressions for Dual-Hop Mixed THz-RF Cooperative Relay Networks

  • Authors: Soumendu Das, Nagendra Kumar, Dharmendra Dixit
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2304.10504
  • Pdf link: https://arxiv.org/pdf/2304.10504
  • Abstract
    In this paper, we consider a dual-hop mixed THz-RF system model for backhaul-fronthaul applications, where the link between source and destination is established only through a relay node that uses the decode-and-forward relaying protocol. The THz link suffers from the joint impact of antenna misalignment and the stochastic characteristics of wireless channels, including the effect of environmental conditions such as pressure, humidity, and temperature. The envelope of the THz link in the first hop follows a generalized $\alpha-\mu$ distribution, and for the RF end, the Nakagami-$m$ distribution is considered. In this context, we obtain new closed-form expressions for the cumulative distribution function and the moment-generating function of the end-to-end signal-to-noise ratio. Further, we derive average symbol error rate expressions for coherent rectangular quadrature amplitude modulation (RQAM) and coherent hexagonal QAM (HQAM), as well as for a non-coherent modulation scheme. The asymptotic behavior is also discussed to examine the system's diversity. Furthermore, the impact of several parameters, such as the fading coefficients of the individual links and antenna misalignment, as well as the distance between nodes, is also highlighted in the system's performance. Moreover, Monte Carlo simulations are used to validate the presented analytical framework. Finally, the presented numerical insights aid in the extraction of practical design principles.
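
    For reference, the $\alpha-\mu$ envelope PDF in Yacoub's standard form (which analyses of this kind typically start from; normalization conventions vary across papers) is

```latex
f_R(r) = \frac{\alpha\, \mu^{\mu}\, r^{\alpha\mu - 1}}{\hat{r}^{\alpha\mu}\, \Gamma(\mu)}
         \exp\!\left( -\mu\, \frac{r^{\alpha}}{\hat{r}^{\alpha}} \right), \qquad r \ge 0,
```

    where $\hat{r}$ is the $\alpha$-root mean value of the envelope $R$.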

Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget

  • Authors: Johannes Lehner, Benedikt Alkin, Andreas Fürst, Elisabeth Rumetshofer, Lukas Miklautz, Sepp Hochreiter
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10520
  • Pdf link: https://arxiv.org/pdf/2304.10520
  • Abstract
    Masked Image Modeling (MIM) methods, like Masked Autoencoders (MAE), efficiently learn a rich representation of the input. However, for adapting to downstream tasks, they require a sufficient amount of labeled data since their rich features capture not only objects but also less relevant image background. In contrast, Instance Discrimination (ID) methods focus on objects. In this work, we study how to combine the efficiency and scalability of MIM with the ability of ID to perform downstream classification in the absence of large amounts of labeled data. To this end, we introduce Masked Autoencoder Contrastive Tuning (MAE-CT), a sequential approach that applies Nearest Neighbor Contrastive Learning (NNCLR) to a pre-trained MAE. MAE-CT tunes the rich features such that they form semantic clusters of objects without using any labels. Applied to large and huge Vision Transformer (ViT) models, MAE-CT matches or exceeds previous self-supervised methods trained on ImageNet in linear probing, k-NN and low-shot classification accuracy as well as in unsupervised clustering accuracy. Notably, similar results can be achieved without additional image augmentations. While ID methods generally rely on hand-crafted augmentations to avoid shortcut learning, we find that nearest neighbor lookup is sufficient and that this data-driven augmentation effect improves with model size. MAE-CT is compute-efficient. For instance, starting from an MAE pre-trained ViT-L/16, MAE-CT increases the ImageNet 1% low-shot accuracy from 67.7% to 72.6%, linear probing accuracy from 76.0% to 80.2% and k-NN accuracy from 60.6% to 79.1% in just five hours using eight A100 GPUs.

Learning Narrow One-Hidden-Layer ReLU Networks

  • Authors: Sitan Chen, Zehao Dou, Surbhi Goel, Adam R Klivans, Raghu Meka
  • Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2304.10524
  • Pdf link: https://arxiv.org/pdf/2304.10524
  • Abstract
    We consider the well-studied problem of learning a linear combination of $k$ ReLU activations with respect to a Gaussian distribution on inputs in $d$ dimensions. We give the first polynomial-time algorithm that succeeds whenever $k$ is a constant. All prior polynomial-time learners require additional assumptions on the network, such as positive combining coefficients or the matrix of hidden weight vectors being well-conditioned. Our approach is based on analyzing random contractions of higher-order moment tensors. We use a multi-scale analysis to argue that sufficiently close neurons can be collapsed together, sidestepping the conditioning issues present in prior work. This allows us to design an iterative procedure to discover individual neurons.

Learning Sparse and Low-Rank Priors for Image Recovery via Iterative Reweighted Least Squares Minimization

  • Authors: Stamatios Lefkimmiatis, Iaroslav Koshelev
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2304.10536
  • Pdf link: https://arxiv.org/pdf/2304.10536
  • Abstract
    We introduce a novel optimization algorithm for image recovery under learned sparse and low-rank constraints, which we parameterize as weighted extensions of the $\ell_p^p$-vector and $\mathcal S_p^p$ Schatten-matrix quasi-norms for $0 < p \le 1$, respectively. Our proposed algorithm generalizes the Iteratively Reweighted Least Squares (IRLS) method, used for signal recovery under $\ell_1$ and nuclear-norm constrained minimization. Further, we interpret our overall minimization approach as a recurrent network that we then employ to deal with inverse low-level computer vision problems. Thanks to the convergence guarantees that our IRLS strategy offers, we are able to train the derived reconstruction networks using a memory-efficient implicit back-propagation scheme, which does not pose any restrictions on their effective depth. To assess our networks' performance, we compare them against other existing reconstruction methods on several inverse problems, namely image deblurring, super-resolution, demosaicking and sparse recovery. Our reconstruction results are shown to be very competitive and in many cases outperform those of existing unrolled networks, whose number of parameters is orders of magnitude higher than that of our learned models.
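
    A minimal IRLS sketch for the classical special case the paper generalizes: minimize $\|x\|_p^p$ subject to $Ax = b$ by solving a reweighted least-squares problem at each iteration, with the smoothing parameter annealed toward zero.

```python
# Classical IRLS for lp-constrained sparse recovery (the paper's starting point).
import numpy as np

def irls_lp(A, b, p=0.8, iters=40):
    x = np.linalg.lstsq(A, b, rcond=None)[0]  # least-squares initialization
    eps = 1.0                                  # smoothing, annealed toward 0
    for _ in range(iters):
        d = (x ** 2 + eps) ** (1 - p / 2)      # inverse of the IRLS weights
        AD = A * d                             # scales column j of A by d[j]
        x = d * (A.T @ np.linalg.solve(AD @ A.T, b))
        eps = max(eps / 2, 1e-10)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 60))
x_true = np.zeros(60); x_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
x_hat = irls_lp(A, A @ x_true)
print(np.round(x_hat[[3, 17, 42]], 3))  # should be near the planted values
```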

Learning Neural Duplex Radiance Fields for Real-Time View Synthesis

  • Authors: Ziyu Wan, Christian Richardt, Aljaž Božič, Chao Li, Vijay Rengarajan, Seonghyeon Nam, Xiaoyu Xiang, Tuotuo Li, Bo Zhu, Rakesh Ranjan, Jing Liao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.10537
  • Pdf link: https://arxiv.org/pdf/2304.10537
  • Abstract
    Neural radiance fields (NeRFs) enable novel view synthesis with unprecedented visual quality. However, to render photorealistic images, NeRFs require hundreds of deep multilayer perceptron (MLP) evaluations for each pixel. This is prohibitively expensive and makes real-time rendering infeasible, even on powerful modern GPUs. In this paper, we propose a novel approach to distill and bake NeRFs into highly efficient mesh-based neural representations that are fully compatible with the massively parallel graphics rendering pipeline. We represent scenes as neural radiance features encoded on a two-layer duplex mesh, which effectively overcomes the inherent inaccuracies in 3D surface reconstruction by learning the aggregated radiance information from a reliable interval of ray-surface intersections. To exploit local geometric relationships of nearby pixels, we leverage screen-space convolutions instead of the MLPs used in NeRFs to achieve high-quality appearance. Finally, the performance of the whole framework is further boosted by a novel multi-view distillation optimization strategy. We demonstrate the effectiveness and superiority of our approach via extensive experiments on a range of standard datasets.

Keyword: faster

An Intent-based Framework for Vehicular Edge Computing

  • Authors: TianZhang He, Adel N. Toosi, Negin Akbari, Muhammed Tawfiqul Islam, Muhammad Aamir Cheema
  • Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.09916
  • Pdf link: https://arxiv.org/pdf/2304.09916
  • Abstract
    The rapid development of emerging vehicular edge computing (VEC) brings new opportunities and challenges for dynamic resource management. The increasing number of edge data centers, roadside units (RSUs), and network devices, however, makes resource management a complex task in VEC. On the other hand, the exponential growth of service applications and end-users makes corresponding QoS hard to maintain. Intent-Based Networking (IBN), based on Software-Defined Networking, was introduced to provide the ability to automatically handle and manage the networking requirements of different applications. Motivated by the IBN concept, in this paper, we propose a novel approach to jointly orchestrate networking and computing resources based on user requirements. The proposed solution constantly monitors user requirements and dynamically re-configures the system to satisfy desired states of the application. We compared our proposed solution with the state-of-the-art networking embedding algorithms using real-world taxi GPS traces. Results show that our proposed method is significantly faster (up to 95%) and can improve resource utilization (up to 76%) and the acceptance ratio of computing and networking requests with various priorities (up to 71%). We also present a small-scale prototype of the proposed intent management framework to validate our solution.

Speed Me up if You Can: Conditional Lower Bounds on Opacity Verification

  • Authors: Jiří Balun, Tomáš Masopust, Petr Osička
  • Subjects: Formal Languages and Automata Theory (cs.FL); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.09920
  • Pdf link: https://arxiv.org/pdf/2304.09920
  • Abstract
    Opacity is a property of privacy and security applications asking whether, given a system model, a passive intruder that makes online observations of system's behaviour can ascertain some "secret" information of the system. Deciding opacity is a PSpace-complete problem, and hence there are no polynomial-time algorithms to verify opacity under the assumption that PSpace differs from PTime. This assumption, however, gives rise to a question whether the existing exponential-time algorithms are the best possible or whether there are faster, sub-exponential-time algorithms. We show that under the (Strong) Exponential Time Hypothesis, there are no algorithms that would be significantly faster than the existing algorithms. As a by-product, we obtained a new conditional lower bound on the time complexity of deciding universality (and therefore also inclusion and equivalence) for nondeterministic finite automata.

Two-Memory Reinforcement Learning

  • Authors: Zhao Yang, Thomas. M. Moerland, Mike Preuss, Aske Plaat
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10098
  • Pdf link: https://arxiv.org/pdf/2304.10098
  • Abstract
    While deep reinforcement learning has shown important empirical success, it tends to learn relatively slowly due to slow propagation of reward information and slow updates of parametric neural networks. Non-parametric episodic memory, on the other hand, provides a faster learning alternative that does not require representation learning and uses the maximum episodic return as state-action values for action selection. Episodic memory and reinforcement learning both have their own strengths and weaknesses. Notably, humans can leverage multiple memory systems concurrently during learning and benefit from all of them. In this work, we propose a method called Two-Memory reinforcement learning agent (2M) that combines episodic memory and reinforcement learning to distill both of their strengths. The 2M agent exploits the speed of the episodic memory part and the optimality and the generalization capacity of the reinforcement learning part to complement each other. Our experiments demonstrate that the 2M agent is more data efficient and outperforms both pure episodic memory and pure reinforcement learning, as well as a state-of-the-art memory-augmented RL agent. Moreover, the proposed approach provides a general framework that can be used to combine any episodic memory agent with other off-policy reinforcement learning algorithms.
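
    A toy sketch of the combination: keep the maximum episodic return ever observed for each state-action pair and act on whichever of the memory value and the parametric Q-value is larger. All structures are illustrative; the paper's actual combination rule may differ.

```python
# Toy episodic memory combined with parametric Q-values for action selection.
from collections import defaultdict

episodic = defaultdict(lambda: float("-inf"))  # (state, action) -> best return seen

def remember(trajectory):
    """trajectory: list of (state, action, return-to-go) tuples."""
    for s, a, g in trajectory:
        episodic[(s, a)] = max(episodic[(s, a)], g)

def act(state, q_values, n_actions):
    def value(a):
        return max(q_values[a], episodic[(state, a)])
    return max(range(n_actions), key=value)

remember([("s0", 1, 5.0), ("s1", 0, 3.0)])
print(act("s0", q_values=[2.0, 1.0], n_actions=2))  # memory favors action 1
```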

ZEBRA: Z-order Curve-based Event Retrieval Approach to Efficiently Explore Automotive Data

  • Authors: Christian Berger, Lukas Birkemeyer
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2304.10232
  • Pdf link: https://arxiv.org/pdf/2304.10232
  • Abstract
    Evaluating the performance of software for automated vehicles is predominantly driven by data collected from the real world. While professional test drivers are supported with technical means to semi-automatically annotate driving maneuvers to allow better event identification, simple data loggers in large vehicle fleets typically lack automatic and detailed event classification and hence, extra effort is needed when post-processing such data. Yet, the data quality from professional test drivers is apparently higher than the one from large fleets where labels are missing, but the non-annotated data set from large vehicle fleets is much more representative for typical, realistic driving scenarios to be handled by automated vehicles. However, while growing the data from large fleets is relatively simple, adding valuable annotations during post-processing has become increasingly expensive. In this paper, we leverage Z-order space-filling curves to systematically reduce data dimensionality while preserving domain-specific data properties, which allows us to explore even large-scale field data sets to spot interesting events orders of magnitude faster than processing time-series data directly. Furthermore, the proposed concept is based on an analytical approach, which preserves explainability for the identified events.
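
    A minimal Morton (Z-order) encoding sketch: interleave the bits of quantized signal dimensions so that multi-dimensional samples map to a single sortable key while nearby points tend to stay nearby. ZEBRA's actual quantization and key layout are not reproduced here.

```python
# Morton (Z-order) encoding by bit interleaving; quantization scheme is assumed.
def morton_encode(coords, bits=16):
    """Interleave the bits of several non-negative integer coordinates."""
    key = 0
    for bit in range(bits):
        for dim, c in enumerate(coords):
            key |= ((c >> bit) & 1) << (bit * len(coords) + dim)
    return key

# Two quantized channels (e.g., speed and acceleration) -> one Z-order key.
print(morton_encode((3, 5)))               # 0b100111 = 39
samples = [(3, 5), (3, 6), (40, 2)]
print(sorted(samples, key=morton_encode))  # similar samples sort together
```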

Observer-Feedback-Feedforward Controller Structures in Reinforcement Learning

  • Authors: Ruoqi Zhang, Per Mattsson, Torbjörn Wigren
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10276
  • Pdf link: https://arxiv.org/pdf/2304.10276
  • Abstract
    The paper proposes the use of structured neural networks for reinforcement learning based nonlinear adaptive control. The focus is on partially observable systems, with separate neural networks for the state and feedforward observer and the state feedback and feedforward controller. The observer dynamics are modelled by recurrent neural networks while a standard network is used for the controller. As discussed in the paper, this leads to a separation of the observer dynamics to the recurrent neural network part, and the state feedback to the feedback and feedforward network. The structured approach reduces the computational complexity and gives the reinforcement learning based controller an understandable structure as compared to when one single neural network is used. As shown by simulation the proposed structure has the additional and main advantage that the training becomes significantly faster. Two ways to include feedforward structure are presented, one related to state feedback control and one related to classical feedforward control. The latter method introduces further structure with a separate recurrent neural network that processes only the measured disturbance. When evaluated with simulation on a nonlinear cascaded double tank process, the method with most structure performs the best, with excellent feedforward disturbance rejection gains.

Regret-Minimizing Double Oracle for Extensive-Form Games

  • Authors: Xiaohang Tang, Le Cong Dinh, Stephen Marcus McAleer, Yaodong Yang
  • Subjects: Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2304.10498
  • Pdf link: https://arxiv.org/pdf/2304.10498
  • Abstract
    By incorporating regret minimization, double oracle methods have demonstrated rapid convergence to Nash Equilibrium (NE) in normal-form games and extensive-form games, through algorithms such as online double oracle (ODO) and extensive-form double oracle (XDO), respectively. In this study, we further examine the theoretical convergence rate and sample complexity of such regret minimization-based double oracle methods, utilizing a unified framework called Regret-Minimizing Double Oracle. Based on this framework, we extend ODO to extensive-form games and determine its sample complexity. Moreover, we demonstrate that the sample complexity of XDO can be exponential in the number of information sets $|S|$, owing to the exponentially decaying stopping threshold of restricted games. To solve this problem, we propose the Periodic Double Oracle (PDO) method, which has the lowest sample complexity among all existing double oracle methods, being only polynomial in $|S|$. Empirical evaluations on multiple poker and board games show that PDO achieves significantly faster convergence than previous double oracle algorithms and reaches a competitive level with state-of-the-art regret minimization methods.

Transformer Models for Type Inference in the Simply Typed Lambda Calculus: A Case Study in Deep Learning for Code

  • Authors: Brando Miranda, Avi Shinnar, Vasily Pestun, Barry Trager
  • Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO); Symbolic Computation (cs.SC)
  • Arxiv link: https://arxiv.org/abs/2304.10500
  • Pdf link: https://arxiv.org/pdf/2304.10500
  • Abstract
    Despite a growing body of work at the intersection of deep learning and formal languages, there has been relatively little systematic exploration of transformer models for reasoning about typed lambda calculi. This is an interesting area of inquiry for two reasons. First, typed lambda calculi are the lingua franca of programming languages. A set of heuristics that relate various typed lambda calculi to effective neural architectures would provide a systematic method for mapping language features (e.g., polymorphism, subtyping, inheritance, etc.) to architecture choices. Second, transformer models are widely used in deep learning architectures applied to code, but the design and hyperparameter space for them is large and relatively unexplored in programming language applications. Therefore, we suggest a benchmark that allows us to explore exactly this through perhaps the simplest and most fundamental property of a programming language: the relationship between terms and types. Consequently, we begin this inquiry of transformer architectures for typed lambda calculi by exploring the effect of transformer warm-up and optimizer selection in the task of type inference: i.e., predicting the types of lambda calculus terms using only transformers. We find that the optimization landscape is difficult even in this simple setting. One particular experimental finding is that optimization by Adafactor converges much faster compared to the optimization by Adam and RAdam. We conjecture that such different performance of optimizers might be related to the difficulties of generalization over formally generated datasets.

Autonomic Architecture for Big Data Performance Optimization

  • Authors: Mikhail Genkin, Frank Dehne, Anousheh Shahmirza, Pablo Navarro, Siyu Zhou
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10503
  • Pdf link: https://arxiv.org/pdf/2304.10503
  • Abstract
    The big data software stack based on Apache Spark and Hadoop has become mission critical in many enterprises. Performance of Spark and Hadoop jobs depends on a large number of configuration settings. Manual tuning is expensive and brittle. There have been prior efforts to develop on-line and off-line automatic tuning approaches to make the big data stack less dependent on manual tuning. These, however, demonstrated only modest performance improvements with very simple, single-user workloads on small data sets. This paper presents KERMIT, an autonomic architecture for big data capable of automatically tuning Apache Spark and Hadoop on-line, achieving performance results 30% faster than rule-of-thumb tuning by a human administrator and up to 92% as fast as the fastest possible tuning established by performing an exhaustive search of the tuning parameter space. KERMIT can detect important workload changes with up to 99% accuracy, and predict future workload types with up to 96% accuracy. It is capable of identifying and classifying complex multi-user workloads without being explicitly trained on examples of these workloads. It does not rely on the past workload history to predict the future workload classes and their associated performance. KERMIT can identify and learn new workload classes, and adapt to workload drift, without human intervention.

Generalizing Neural Human Fitting to Unseen Poses With Articulated SE(3) Equivariance

  • Authors: Haiwen Feng, Peter Kulits, Shichen Liu, Michael J. Black, Victoria Abrevaya
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10528
  • Pdf link: https://arxiv.org/pdf/2304.10528
  • Abstract
    We address the problem of fitting a parametric human body model (SMPL) to point cloud data. Optimization-based methods require careful initialization and are prone to becoming trapped in local optima. Learning-based methods address this but do not generalize well when the input pose is far from those seen during training. For rigid point clouds, remarkable generalization has been achieved by leveraging SE(3)-equivariant networks, but these methods do not work on articulated objects. In this work we extend this idea to human bodies and propose ArtEq, a novel part-based SE(3)-equivariant neural architecture for SMPL model estimation from point clouds. Specifically, we learn a part detection network by leveraging local SO(3) invariance, and regress shape and pose using articulated SE(3) shape-invariant and pose-equivariant networks, all trained end-to-end. Our novel equivariant pose regression module leverages the permutation-equivariant property of self-attention layers to preserve rotational equivariance. Experimental results show that ArtEq can generalize to poses not seen during training, outperforming state-of-the-art methods by 74.5%, without requiring an optimization refinement step. Further, compared with competing works, our method is more than three orders of magnitude faster during inference and has 97.3% fewer parameters. The code and model will be available for research purposes at https://arteq.is.tue.mpg.de.

Keyword: mobile

NRTS: A Client-Server architecture for supporting data recording, transmission and evaluation of multidisciplinary teams during the neonatal resuscitation simulation scenario

  • Authors: Manuel Striani
  • Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.09860
  • Pdf link: https://arxiv.org/pdf/2304.09860
  • Abstract
    In this technical report, we describe the Neonatal Resuscitation Training Simulator (NRTS), an Android mobile app designed to support medical experts in inputting, transmitting and recording data during a High-Fidelity Simulation course for neonatal resuscitation. This mobile app allows one to automatically send all the recorded data from the Neonatal Intensive Care Unit (NICU) of Casale Monferrato Children's Hospital (Italy) to a server located at the Department of Science and Technological Innovation (DiSIT), University of Piemonte Orientale (Italy). Finally, the medical instructor can view statistics on a simulation exercise that may be used during the de-briefing phase for the evaluation of the multidisciplinary teams involved in the simulation scenarios.

Scheduling DNNs on Edge Servers

  • Authors: Jian He, Chenxi Yang, Zhaoyuan He, Ghufran Baig, Lili Qiu
  • Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09961
  • Pdf link: https://arxiv.org/pdf/2304.09961
  • Abstract
    Deep neural networks (DNNs) have been widely used in various video analytic tasks. These tasks demand real-time responses. Due to the limited processing power on mobile devices, a common way to support such real-time analytics is to offload the processing to an edge server. This paper examines how to speed up the edge server DNN processing for multiple clients. In particular, we observe that batching multiple DNN requests significantly speeds up the processing time. Based on this observation, we first design a novel scheduling algorithm to exploit the batching benefits of all requests that run the same DNN. This is compelling since there are only a handful of DNNs and many requests tend to use the same DNN. Our algorithms are general and can support different objectives, such as minimizing the completion time or maximizing the on-time ratio. We then extend our algorithm to handle requests that use different DNNs with or without shared layers. Finally, we develop a collaborative approach to further improve performance by adaptively processing some of the requests or portions of the requests locally at the clients. This is especially useful when the network and/or server is congested. Our implementation shows the effectiveness of our approach under different request distributions (e.g., Poisson, Pareto, and Constant inter-arrivals).

Availability Model of a 5G-MEC System

  • Authors: Thilina Pathirana, Gianfranco Nencioni
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.09992
  • Pdf link: https://arxiv.org/pdf/2304.09992
  • Abstract
    Multi-access Edge Computing (MEC) is one of the enabling technologies of the fifth generation (5G) of mobile networks. MEC enables services with strict latency requirements by bringing computing capabilities close to the users. As with any new technology, the dependability of MEC is one of the aspects that need to be carefully studied. In this paper, we propose a two-level model to compute the availability of a 5G-MEC system. We then use the model to evaluate the availability of a 5G-MEC system under various configurations. The results show that single redundancy of the 5G-MEC elements leads to acceptable availability. To reach high availability, the software failure intensity of the 5G and MEC management elements should be reduced.
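
    A back-of-the-envelope sketch of the kind of composition a two-level availability model rests on: elements required in series multiply, and k redundant replicas fail together only if all replicas fail. The element set and availability numbers below are made up.

```python
# Series/parallel availability composition; element set and numbers are assumed.
def redundant(a, k):
    """Availability of k parallel replicas of an element with availability a."""
    return 1 - (1 - a) ** k

def series(*avails):
    """Availability of a chain where every element must be up."""
    out = 1.0
    for a in avails:
        out *= a
    return out

gnb, upf, mec_host, mec_mgmt = 0.999, 0.999, 0.995, 0.99
no_redundancy = series(gnb, upf, mec_host, mec_mgmt)
single_redundancy = series(*(redundant(a, 2) for a in (gnb, upf, mec_host, mec_mgmt)))
print(f"{no_redundancy:.5f} -> {single_redundancy:.7f}")
```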

Robust Route Planning with Distributional Reinforcement Learning in a Stochastic Road Network Environment

  • Authors: Xi Lin, Paul Szenher, John D. Martin, Brendan Englot
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.09996
  • Pdf link: https://arxiv.org/pdf/2304.09996
  • Abstract
    Route planning is essential to mobile robot navigation problems. In recent years, deep reinforcement learning (DRL) has been applied to learning optimal planning policies in stochastic environments without prior knowledge. However, existing works focus on learning policies that maximize the expected return, the performance of which can vary greatly when the level of stochasticity in the environment is high. In this work, we propose a distributional reinforcement learning based framework that learns return distributions which explicitly reflect environmental stochasticity. Policies based on the second-order stochastic dominance (SSD) relation can be used to make adjustable route decisions according to user preference on performance robustness. Our proposed method is evaluated in a simulated road network environment, and experimental results show that our method is able to plan the shortest routes that minimize stochasticity in travel time when robustness is preferred, while other state-of-the-art DRL methods are agnostic to environmental stochasticity.
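
    A sketch of an empirical SSD check: X second-order stochastically dominates Y if the expected shortfall below every threshold is never worse. Thresholds are taken from the pooled samples; the travel-time reward numbers are made up.

```python
# Empirical second-order stochastic dominance (SSD) check on return samples.
import numpy as np

def ssd_dominates(x, y):
    """True if samples x SSD-dominate samples y (risk-averse preference)."""
    thresholds = np.union1d(x, y)
    shortfall = lambda s, t: np.maximum(t - s[:, None], 0.0).mean(axis=0)
    return bool(np.all(shortfall(x, thresholds) <= shortfall(y, thresholds) + 1e-9))

rng = np.random.default_rng(0)
safe = rng.normal(10.5, 1.0, 1000)   # slightly better mean, low variance
risky = rng.normal(10.0, 4.0, 1000)  # more stochastic travel-time reward
print(ssd_dominates(safe, risky))    # True: a risk-averse planner prefers `safe`
```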

FTMRate: Collision-Immune Distance-based Data Rate Selection for IEEE 802.11 Networks

  • Authors: Wojciech Ciezobka, Maksymilian Wojnar, Katarzyna Kosek-Szott, Szymon Szott, Krzysztof Rusek
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.10140
  • Pdf link: https://arxiv.org/pdf/2304.10140
  • Abstract
    Data rate selection algorithms for Wi-Fi devices are an important area of research because they directly impact performance. Most of the proposals are based on measuring the transmission success probability for a given data rate. In dense scenarios, however, this probing approach will fail because frame collisions are misinterpreted as erroneous data rate selection. We propose FTMRate, which uses the fine timing measurement (FTM) feature recently introduced in IEEE 802.11. FTM allows stations to measure their distance from the AP. We argue that knowledge of the distance from the receiver can be useful in determining which data rate to use. We apply statistical learning (a form of machine learning) to estimate the distance based on measurements, estimate channel quality from the distance, and select data rates based on channel quality. We evaluate three distinct estimation approaches: exponential smoothing, Kalman filter, and particle filter. We present a performance evaluation of the three variants of FTMRate and show, in several dense and mobile (though line-of-sight only) scenarios, that it can outperform two benchmarks and provide close to optimal results in IEEE 802.11ax networks.
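
    A minimal one-dimensional Kalman filter for smoothing noisy FTM distance readings, one of the three estimators the abstract compares. The random-walk process model and noise values are illustrative assumptions.

```python
# 1-D Kalman filter sketch for FTM distance smoothing; noise values are assumed.
class DistanceKalman:
    def __init__(self, d0=0.0, p0=100.0, q=0.05, r=4.0):
        self.d, self.p, self.q, self.r = d0, p0, q, r  # state, variance, noises

    def update(self, measurement):
        self.p += self.q                       # predict: distance ~ random walk
        k = self.p / (self.p + self.r)         # Kalman gain
        self.d += k * (measurement - self.d)   # correct with the FTM reading
        self.p *= (1 - k)
        return self.d

kf = DistanceKalman()
for z in [12.3, 11.8, 12.9, 12.1, 55.0, 12.4]:  # one outlier-like reading
    print(round(kf.update(z), 2))
```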

A Large-scale Examination of "Socioeconomic" Fairness in Mobile Networks

  • Authors: Souneil Park, Pavol Mulinka, Diego Perino
  • Subjects: Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2304.10190
  • Pdf link: https://arxiv.org/pdf/2304.10190
  • Abstract
    Internet access is a special resource: the need for it has become universal across the public, while the service is operated by the private sector. Mobile Network Operators (MNOs) put effort into management, planning, and optimization; however, they do not link such activities to socioeconomic fairness. In this paper, we take a first step towards understanding the relation between the socioeconomic status of customers and network performance, and investigate potential discrimination in network deployment and management. The scope of our study spans various aspects, including urban geography, network resource deployment, data consumption, and device distribution. A novel methodology that enables a geo-socioeconomic perspective on mobile networks is developed for the study. The results are based on an actual infrastructure in multiple cities, covering millions of users densely covering the socioeconomic scale. We report a thorough examination of the fairness status, its relationship with various structural factors, and potential class-specific solutions.

Breast cancer detection using deep learning

  • Authors: Gayathri Girish, Ponnathota Spandana, Badrish Vasu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10386
  • Pdf link: https://arxiv.org/pdf/2304.10386
  • Abstract
    Objective: This paper proposes a deep learning model for breast cancer detection from reconstructed images of microwave imaging scan data and aims to improve the accuracy and efficiency of breast tumor detection, which could have a significant impact on breast cancer diagnosis and treatment. Methods: Our framework consists of different convolutional neural network (CNN) architectures for feature extraction and a region-based CNN for tumor detection. We use 7 different architectures: DenseNet201, ResNet50, InceptionV3, InceptionResNetV3, MobileNetV2, NASNetMobile and NASNetLarge, and compare their performance to find the best architecture out of the seven. An experimental dataset of MRI-derived breast phantoms was used. Results: NASNetLarge is the best architecture for the CNN model, with an accuracy of 88.41% and a loss of 27.82%. Given that the model's AUC is 0.786, it can be concluded that it is suitable for use in its present form, while it could be improved upon and trained on other comparable datasets. Impact: One of the main causes of death in women is breast cancer, and early identification is essential for enhancing outcomes for patients. Due to its non-invasiveness and capacity to produce high-resolution images, microwave imaging is a promising tool for breast cancer screening. The complexity of tumors makes it difficult to adequately detect them in microwave images. The results of this research show that deep learning has a lot of potential for breast cancer detection in microwave images.

Securing Neural Networks with Knapsack Optimization

  • Authors: Yakir Gorski, Shai Avidan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10442
  • Pdf link: https://arxiv.org/pdf/2304.10442
  • Abstract
    Deep learning inference brings together the data and the Convolutional Neural Network (CNN). This is problematic in case the user wants to preserve the privacy of the data and the service provider does not want to reveal the weights of his CNN. Secure Inference allows the two parties to engage in a protocol that preserves their respective privacy concerns, while revealing only the inference result to the user. This is known as Multi-Party Computation (MPC). A major bottleneck of MPC algorithms is communication, as the parties must send data back and forth. The linear component of a CNN (i.e. convolutions) can be done efficiently with minimal communication, but the non-linear part (i.e., ReLU) requires the bulk of communication bandwidth. We propose two ways to accelerate Secure Inference. The first is based on the observation that the ReLU outcome of many convolutions is highly correlated. Therefore, we replace the per pixel ReLU operation by a ReLU operation per patch. Each layer in the network will benefit from a patch of a different size and we devise an algorithm to choose the optimal set of patch sizes through a novel reduction of the problem to a knapsack problem. The second way to accelerate Secure Inference is based on cutting the number of bit comparisons required for a secure ReLU operation. We demonstrate the cumulative effect of these tools in the semi-honest secure 3-party setting for four problems: Classifying ImageNet using ResNet50 backbone, classifying CIFAR100 using ResNet18 backbone, semantic segmentation of ADE20K using MobileNetV2 backbone and semantic segmentation of Pascal VOC 2012 using ResNet50 backbone. Our source code is publicly available: $\href{https://github.com/yg320/secure_inference}{\text{https://github.com/yg320/secure_inference}}$
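
The per-layer patch-size choice described above is a multiple-choice knapsack: each layer contributes one (cost, gain) option per patch size, under a global budget of secure ReLU comparisons. A hedged sketch of the dynamic program; the cost/gain encoding is our reading of the abstract, not the authors' exact formulation:

```python
def choose_patch_sizes(layers, budget):
    """Multiple-choice knapsack: pick one patch size per layer.

    `layers` is a list of option lists; each option is a tuple
    (patch_size, relu_cost, accuracy_gain) with integer relu_cost.
    Returns the accuracy-maximizing patch size per layer within
    the total ReLU budget.
    """
    # dp[c] = (best accumulated gain at total cost c, picks so far)
    dp = {0: (0.0, [])}
    for options in layers:
        nxt = {}
        for cost, (val, picks) in dp.items():
            for size, c, gain in options:
                nc = cost + c
                if nc > budget:
                    continue
                cand = (val + gain, picks + [size])
                if nc not in nxt or cand[0] > nxt[nc][0]:
                    nxt[nc] = cand
        dp = nxt
    best_val, best_picks = max(dp.values(), key=lambda t: t[0])
    return best_picks, best_val
```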

Keyword: pruning

Model Pruning Enables Localized and Efficient Federated Learning for Yield Forecasting and Data Sharing

  • Authors: Andy Li, Milan Markovic, Peter Edwards, Georgios Leontidis
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09876
  • Pdf link: https://arxiv.org/pdf/2304.09876
  • Abstract
    Federated Learning (FL) presents a decentralized approach to model training in the agri-food sector and offers the potential for improved machine learning performance, while ensuring the safety and privacy of individual farms or data silos. However, the conventional FL approach has two major limitations. First, the heterogeneous data on individual silos can cause the global model to perform well for some clients but not all, as the update direction on some clients may hinder others after they are aggregated. Second, it is inefficient with respect to communication costs during FL, given the large model sizes involved. This paper proposes a new technical solution that utilizes network pruning on client models and aggregates the pruned models. This method enables local models to be tailored to their respective data distributions and mitigates the data heterogeneity present in agri-food data. Moreover, it allows for more compact models that consume less bandwidth during transmission. We experiment with a soybean yield forecasting dataset and find that this approach can improve inference performance by 15.5% to 20% compared to FedAvg, while reducing local model sizes by up to 84% and the data volume communicated between the clients and the server by 57.1% to 64.7%.
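
The core recipe — prune each client's model locally, then aggregate only the weights that survive — can be sketched in a few lines. All names below are our placeholders; the paper's pruning schedule and aggregation details may differ:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights; return
    the pruned weights and the boolean keep-mask."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights, np.ones_like(weights, dtype=bool)
    thresh = np.partition(flat, k)[k]
    mask = np.abs(weights) >= thresh
    return weights * mask, mask

def aggregate_pruned(client_weights, client_masks):
    """FedAvg-style aggregation that averages each weight only over
    the clients whose pruned model actually kept it."""
    stacked = np.stack(client_weights)
    masks = np.stack(client_masks).astype(float)
    counts = masks.sum(axis=0)
    summed = (stacked * masks).sum(axis=0)
    return np.divide(summed, counts,
                     out=np.zeros_like(summed), where=counts > 0)
```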

Keyword: voxel

Tetra-NeRF: Representing Neural Radiance Fields Using Tetrahedra

  • Authors: Jonas Kulhanek, Torsten Sattler
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.09987
  • Pdf link: https://arxiv.org/pdf/2304.09987
  • Abstract
    Neural Radiance Fields (NeRFs) are a very recent and very popular approach for the problems of novel view synthesis and 3D reconstruction. A popular scene representation used by NeRFs is to combine a uniform, voxel-based subdivision of the scene with an MLP. Based on the observation that a (sparse) point cloud of the scene is often available, this paper proposes to use an adaptive representation based on tetrahedra and a Delaunay representation instead of the uniform subdivision or point-based representations. We show that such a representation enables efficient training and leads to state-of-the-art results. Our approach elegantly combines concepts from 3D geometry processing, triangle-based rendering, and modern neural radiance fields. Compared to voxel-based representations, ours provides more detail around parts of the scene likely to be close to the surface. Compared to point-based representations, our approach achieves better performance.

Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering

  • Authors: Dongting Hu, Zhenkai Zhang, Tingbo Hou, Tongliang Liu, Huan Fu, Mingming Gong
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10075
  • Pdf link: https://arxiv.org/pdf/2304.10075
  • Abstract
    The rendering scheme in neural radiance field (NeRF) is effective in rendering a pixel by casting a ray into the scene. However, NeRF yields blurred rendering results when the training images are captured at non-uniform scales, and produces aliasing artifacts if the test images are taken in distant views. To address this issue, Mip-NeRF proposes a multiscale representation as a conical frustum to encode scale information. Nevertheless, this approach is only suitable for offline rendering since it relies on integrated positional encoding (IPE) to query a multilayer perceptron (MLP). To overcome this limitation, we propose mip voxel grids (Mip-VoG), an explicit multiscale representation with a deferred architecture for real-time anti-aliasing rendering. Our approach includes a density Mip-VoG for scene geometry and a feature Mip-VoG with a small MLP for view-dependent color. Mip-VoG encodes scene scale using the level of detail (LOD) derived from ray differentials and uses quadrilinear interpolation to map a queried 3D location to its features and density from two neighboring downsampled voxel grids. To our knowledge, our approach is the first to offer multiscale training and real-time anti-aliasing rendering simultaneously. We conducted experiments on multiscale datasets, and the results show that our approach outperforms state-of-the-art real-time rendering baselines.
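
The "quadrilinear" lookup amounts to trilinear interpolation in the two neighboring mip levels followed by a linear blend across levels. Below is a hedged sketch of ours; the LOD formula and the grid layout are assumptions, not the paper's exact scheme:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def mip_sample(grids, xyz, pixel_footprint):
    """Quadrilinear lookup: trilinear within two mip levels of a
    voxel grid pyramid, then linear across levels.

    `grids[l]` has shape (D//2**l, H//2**l, W//2**l, C); `xyz` is in
    level-0 voxel coordinates; `pixel_footprint` is the extent of the
    ray differential at this sample, in level-0 voxel units (assumed).
    """
    lod = np.clip(np.log2(max(pixel_footprint, 1e-8)),
                  0, len(grids) - 1)
    lo, hi = int(np.floor(lod)), int(np.ceil(lod))
    t = lod - lo

    def trilinear(level):
        g = grids[level]
        coords = (np.asarray(xyz) / 2**level)[:, None]   # shape (3, 1)
        # order-1 spline == trilinear interpolation, per channel
        return np.stack([map_coordinates(g[..., c], coords, order=1)[0]
                         for c in range(g.shape[-1])])

    return (1 - t) * trilinear(lo) + t * trilinear(hi)
```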

Keyword: lidar

LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields

  • Authors: Tang Tao, Longfei Gao, Guangrun Wang, Peng Chen, Dayang Hao, Xiaodan Liang, Mathieu Salzmann, Kaicheng Yu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10406
  • Pdf link: https://arxiv.org/pdf/2304.10406
  • Abstract
    We introduce a new task, novel view synthesis for LiDAR sensors. While traditional model-based LiDAR simulators with style-transfer neural networks can be applied to render novel views, they fall short in producing accurate and realistic LiDAR patterns, because the renderers they rely on exploit game engines, which are not differentiable. We address this by formulating, to the best of our knowledge, the first differentiable LiDAR renderer, and propose an end-to-end framework, LiDAR-NeRF, leveraging a neural radiance field (NeRF) to enable jointly learning the geometry and the attributes of 3D points. To evaluate the effectiveness of our approach, we establish an object-centric multi-view LiDAR dataset, dubbed NeRF-MVL. It contains observations of objects from 9 categories seen from 360-degree viewpoints captured with multiple LiDAR sensors. Our extensive experiments on the scene-level KITTI-360 dataset, and on our object-level NeRF-MVL, show that our LiDAR-NeRF surpasses the model-based algorithms significantly.

Keyword: diffusion

Using Text-to-Image Generation for Architectural Design Ideation

  • Authors: Ville Paananen, Jonas Oppenlaender, Aku Visuri
  • Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10182
  • Pdf link: https://arxiv.org/pdf/2304.10182
  • Abstract
    The recent progress of text-to-image generation has been recognized in architectural design. Our study is the first to investigate the potential of text-to-image generators in supporting creativity during the early stages of the architectural design process. We conducted a laboratory study with 17 architecture students, who developed a concept for a culture center using three popular text-to-image generators: Midjourney, Stable Diffusion, and DALL-E. Through standardized questionnaires and group interviews, we found that image generation could be a meaningful part of the design process when design constraints are carefully considered. Generative tools support serendipitous discovery of ideas and an imaginative mindset, enriching the design process. We identified several challenges of image generators and provided considerations for software development and educators to support creativity and emphasize designers' imaginative mindset. By understanding the limitations and potential of text-to-image generators, architects and designers can leverage this technology in their design process and education, facilitating innovation and effective communication of concepts.

A data augmentation perspective on diffusion models and retrieval

  • Authors: Max F. Burg, Florian Wenzel, Dominik Zietlow, Max Horn, Osama Makansi, Francesco Locatello, Chris Russell
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10253
  • Pdf link: https://arxiv.org/pdf/2304.10253
  • Abstract
    Diffusion models excel at generating photorealistic images from text-queries. Naturally, many approaches have been proposed to use these generative abilities to augment training datasets for downstream tasks, such as classification. However, diffusion models are themselves trained on large, noisily supervised, but nonetheless annotated, datasets. It is an open question whether the generalization capabilities of diffusion models beyond using the additional data of the pre-training process for augmentation lead to improved downstream performance. We perform a systematic evaluation of existing methods to generate images from diffusion models and study new extensions to assess their benefit for data augmentation. While we find that personalizing diffusion models towards the target data outperforms simpler prompting strategies, we also show that using the training data of the diffusion model alone, via a simple nearest neighbor retrieval procedure, leads to even stronger downstream performance. Overall, our study probes the limitations of diffusion models for data augmentation but also highlights its potential in generating new training data to improve performance on simple downstream vision tasks.

Anything-3D: Towards Single-view Anything Reconstruction in the Wild

  • Authors: Qiuhong Shen, Xingyi Yang, Xinchao Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10261
  • Pdf link: https://arxiv.org/pdf/2304.10261
  • Abstract
    3D reconstruction from a single RGB image in unconstrained real-world scenarios presents numerous challenges due to the inherent diversity and complexity of objects and environments. In this paper, we introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model to elevate objects to 3D, yielding a reliable and versatile system for the single-view conditioned 3D reconstruction task. Our approach employs a BLIP model to generate textural descriptions, utilizes the Segment-Anything model for the effective extraction of objects of interest, and leverages a text-to-image diffusion model to lift the object into a neural radiance field. Demonstrating its ability to produce accurate and detailed 3D reconstructions for a wide array of objects, \emph{Anything-3D} shows promise in addressing the limitations of existing methodologies. Through comprehensive experiments and evaluations on various datasets, we showcase the merits of our approach, underscoring its potential to contribute meaningfully to the field of 3D reconstruction. Demos and code will be available at \href{https://github.com/Anything-of-anything/Anything-3D}{https://github.com/Anything-of-anything/Anything-3D}.

Prediction of the evolution of the nuclear reactor core parameters using artificial neural network

  • Authors: Krzysztof Palmi, Wojciech Kubinski, Piotr Darnowski
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10337
  • Pdf link: https://arxiv.org/pdf/2304.10337
  • Abstract
    A nuclear reactor based on the MIT BEAVRS benchmark was used as a typical power-generating Pressurized Water Reactor (PWR). The PARCS v3.2 nodal-diffusion core simulator was used as a full-core reactor physics solver to emulate the operation of a reactor and to generate training and validation data for the ANN. The ANN was implemented with dedicated Python 3.8 code with Google's TensorFlow 2.0 library. The effort relied to a large extent on the automatic transformation of data generated by the PARCS simulator, which was later used in the ANN development process. Various methods for improving the accuracy of the ANN predictions were studied, such as trying different ANN architectures to find the optimal number of neurons in the hidden layers of the network. Results were later compared with the architectures proposed in the literature. For the selected best architecture, predictions were made for different core parameters and their dependence on core loading patterns. In this study, a special focus was put on the prediction of the fuel cycle length for a given core loading pattern, as it can be considered one of the targets for economic plant operation. For instance, the length of a single fuel cycle depending on the initial core loading pattern was predicted with very good accuracy (>99%). This work contributes to the exploration of the usefulness of neural networks in solving nuclear reactor design problems. Thanks to the application of ANNs, designers can avoid an excessive number of core simulator runs and more rapidly explore the space of possible solutions before performing more detailed design considerations.
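
Since the abstract names Python and TensorFlow 2.0, a minimal regressor of the kind the study searches over might look as follows. The layer sizes and the flattened loading-pattern encoding are illustrative guesses, not the architecture selected in the paper:

```python
import tensorflow as tf

def build_cycle_length_model(n_features, hidden=(128, 64)):
    """Small fully connected regressor: encoded core loading pattern
    -> predicted fuel cycle length. Layer sizes are illustrative."""
    inputs = tf.keras.Input(shape=(n_features,))
    x = inputs
    for units in hidden:
        x = tf.keras.layers.Dense(units, activation="relu")(x)
    outputs = tf.keras.layers.Dense(1)(x)  # cycle length (e.g. in days)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model

# Hypothetical usage: one categorical code per fuel assembly position.
# model = build_cycle_length_model(n_features=193)
# model.fit(x_train, y_train, validation_split=0.1, epochs=200)
```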

Collaborative Diffusion for Multi-Modal Face Generation and Editing

  • Authors: Ziqi Huang, Kelvin C.K. Chan, Yuming Jiang, Ziwei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10530
  • Pdf link: https://arxiv.org/pdf/2304.10530
  • Abstract
    Diffusion models have recently emerged as a powerful generative tool. Despite the great progress, existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is driven by only one modality of condition. To further unleash the users' creativity, it is desirable for the model to be controllable by multiple modalities simultaneously, e.g., generating and editing faces by describing the age (text-driven) while drawing the face shape (mask-driven). In this work, we present Collaborative Diffusion, where pre-trained uni-modal diffusion models collaborate to achieve multi-modal face generation and editing without re-training. Our key insight is that diffusion models driven by different modalities are inherently complementary regarding the latent denoising steps, upon which bilateral connections can be established. Specifically, we propose the dynamic diffuser, a meta-network that adaptively hallucinates multi-modal denoising steps by predicting the spatial-temporal influence functions for each pre-trained uni-modal model. Collaborative Diffusion not only collaborates generation capabilities from uni-modal diffusion models, but also integrates multiple uni-modal manipulations to perform multi-modal editing. Extensive qualitative and quantitative experiments demonstrate the superiority of our framework in both image quality and condition consistency.
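
The dynamic-diffuser idea can be sketched as a per-pixel, per-timestep weighting of the uni-modal noise predictions. All shapes and callables below are our assumptions about the method, not the authors' released code:

```python
import torch

def collaborative_denoise_step(x_t, t, unimodal_models, conditions,
                               dynamic_diffuser):
    """One reverse-diffusion step combining pre-trained uni-modal
    noise predictions with predicted spatial-temporal influence maps."""
    eps_preds = [m(x_t, t, c) for m, c in zip(unimodal_models, conditions)]
    # influence maps: one weight per model, per pixel, at this timestep
    weights = dynamic_diffuser(x_t, t)        # assumed shape (M, B, 1, H, W)
    weights = torch.softmax(weights, dim=0)   # normalize across models
    eps = sum(w * e for w, e in zip(weights, eps_preds))
    return eps  # plug into the usual DDPM/DDIM update for x_{t-1}
```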

Nerfbusters: Removing Ghostly Artifacts from Casually Captured NeRFs

  • Authors: Frederik Warburg, Ethan Weber, Matthew Tancik, Aleksander Holynski, Angjoo Kanazawa
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2304.10532
  • Pdf link: https://arxiv.org/pdf/2304.10532
  • Abstract
    Casually captured Neural Radiance Fields (NeRFs) suffer from artifacts such as floaters or flawed geometry when rendered outside the camera trajectory. Existing evaluation protocols often do not capture these effects, since they usually only assess image quality at every 8th frame of the training capture. To push forward progress in novel-view synthesis, we propose a new dataset and evaluation procedure, where two camera trajectories are recorded of the scene: one used for training, and the other for evaluation. In this more challenging in-the-wild setting, we find that existing hand-crafted regularizers do not remove floaters nor improve scene geometry. Thus, we propose a 3D diffusion-based method that leverages local 3D priors and a novel density-based score distillation sampling loss to discourage artifacts during NeRF optimization. We show that this data-driven prior removes floaters and improves scene geometry for casual captures.

Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion

  • Authors: Tomas Jakab, Ruining Li, Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10535
  • Pdf link: https://arxiv.org/pdf/2304.10535
  • Abstract
    We present Farm3D, a method to learn category-specific 3D reconstructors for articulated objects entirely from "free" virtual supervision from a pre-trained 2D diffusion-based image generator. Recent approaches can learn, given a collection of single-view images of an object category, a monocular network to predict the 3D shape, albedo, illumination and viewpoint of any object occurrence. We propose a framework using an image generator like Stable Diffusion to generate virtual training data for learning such a reconstruction network from scratch. Furthermore, we include the diffusion model as a score to further improve learning. The idea is to randomise some aspects of the reconstruction, such as viewpoint and illumination, generating synthetic views of the reconstructed 3D object, and have the 2D network assess the quality of the resulting image, providing feedback to the reconstructor. Different from work based on distillation which produces a single 3D asset for each textual prompt in hours, our approach produces a monocular reconstruction network that can output a controllable 3D asset from a given image, real or generated, in only seconds. Our network can be used for analysis, including monocular reconstruction, or for synthesis, generating articulated assets for real-time applications such as video games.

Keyword: dynamic

GeoGraphViz: Geographically Constrained 3D Force-Directed Graph for Knowledge Graph Visualization

  • Authors: Sizhe Wang, Wenwen Li, Zhining Gu
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2304.09864
  • Pdf link: https://arxiv.org/pdf/2304.09864
  • Abstract
    Knowledge graphs are a key technique for linking and integrating cross-domain data, concepts, tools, and knowledge to enable data-driven analytics. As much of the world's data has become massive in size, visualizing graph entities and their interrelationships intuitively and interactively has become a crucial task for ingesting and better utilizing graph content to support semantic reasoning, discovering hidden knowledge, and better scientific understanding of geophysical and social phenomena. Despite the fact that many such phenomena (e.g., disasters) have clear spatial footprints and geographical properties, their location information is considered only as a textual label in existing graph visualization tools, limiting their capability to reveal the geospatial distribution patterns of the graph nodes. In addition, most graph visualization techniques rely on 2D graph visualization, which constrains the dimensions of information that can be presented and lacks support for graph structure examination from multiple angles. To tackle the above challenges, we developed a novel 3D map-based graph visualization algorithm to enable interactive exploration of graph content and patterns in a spatially explicit manner. The algorithm extends a 3D force-directed graph by integrating a web map, an additional geolocational force, and a force-balancing variable that allows for the dynamic adjustment of the 3D graph structure and layout. This mechanism helps create a balanced graph view between the semantic forces among the graph nodes and the attractive force from a geolocation to a graph node. Our solution offers a new perspective in visualizing and understanding spatial entities and events in a knowledge graph.

Robust trajectory tracking for underactuated mechanical systems without velocity measurements

  • Authors: N. Javanmardi, P. Borja, M. J. Yazdanpanah, J. M. A. Scherpen
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.09910
  • Pdf link: https://arxiv.org/pdf/2304.09910
  • Abstract
    In this paper, the notion of contraction is used to solve the trajectory-tracking problem for a class of mechanical systems. Additionally, we propose a dynamic extension to remove velocity measurements from the controller while rejecting matched disturbances. In particular, we propose three control designs stemming from the Interconnection and Damping Assignment Passivity-Based Control approach. The first controller is a tracker that does not require velocity measurements. The second control design solves the trajectory-tracking problem while guaranteeing robustness with respect to matched disturbances. Then, the third approach is a combination of both mentioned controllers. It is shown that all proposed design methods guarantee exponential convergence of the mechanical system to the desired (feasible) trajectory due to the contraction property of the closed-loop system. The applicability of this method is illustrated via the design of a controller for an underactuated mechanical system.

An Intent-based Framework for Vehicular Edge Computing

  • Authors: TianZhang He, Adel N. Toosi, Negin Akbari, Muhammed Tawfiqul Islam, Muhammad Aamir Cheema
  • Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2304.09916
  • Pdf link: https://arxiv.org/pdf/2304.09916
  • Abstract
    The rapid development of emerging vehicular edge computing (VEC) brings new opportunities and challenges for dynamic resource management. The increasing number of edge data centers, roadside units (RSUs), and network devices, however, makes resource management a complex task in VEC. On the other hand, the exponential growth of service applications and end-users makes the corresponding QoS hard to maintain. Intent-Based Networking (IBN), based on Software-Defined Networking, was introduced to provide the ability to automatically handle and manage the networking requirements of different applications. Motivated by the IBN concept, in this paper, we propose a novel approach to jointly orchestrate networking and computing resources based on user requirements. The proposed solution constantly monitors user requirements and dynamically re-configures the system to satisfy the desired states of the application. We compared our proposed solution with state-of-the-art network embedding algorithms using real-world taxi GPS traces. Results show that our proposed method is significantly faster (up to 95%) and can improve resource utilization (up to 76%) and the acceptance ratio of computing and networking requests with various priorities (up to 71%). We also present a small-scale prototype of the proposed intent management framework to validate our solution.

Improving Urban Flood Prediction using LSTM-DeepLabv3+ and Bayesian Optimization with Spatiotemporal feature fusion

  • Authors: Zuxiang Situ, Qi Wang, Shuai Teng, Wanen Feng, Gongfa Chen, Qianqian Zhou, Guangtao Fu
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.09994
  • Pdf link: https://arxiv.org/pdf/2304.09994
  • Abstract
    Deep learning models have become increasingly popular for flood prediction due to their superior accuracy and efficiency compared to traditional methods. However, current machine learning methods often rely on separate spatial or temporal feature analysis and have limitations on the types, number, and dimensions of input data. This study presented a CNN-RNN hybrid feature fusion modelling approach for urban flood prediction, which integrated the strengths of CNNs in processing spatial features and RNNs in analyzing different dimensions of time sequences. This approach allowed for both static and dynamic flood predictions. Bayesian optimization was applied to identify the seven most influential flood-driven factors and determine the best combination strategy. By combining four CNNs (FCN, UNet, SegNet, DeepLabv3+) and three RNNs (LSTM, BiLSTM, GRU), the optimal hybrid model was identified as LSTM-DeepLabv3+. This model achieved the highest prediction accuracy (MAE, RMSE, NSE, and KGE were 0.007, 0.025, 0.973 and 0.755, respectively) under various rainfall input conditions. Additionally, the processing speed was significantly improved, with an inference time of 1.158s (approximately 1/125 of the traditional computation time) compared to the physically-based models.

HTNet: Dynamic WLAN Performance Prediction using Heterogenous Temporal GNN

  • Authors: Hongkuan Zhou, Rajgopal Kannan, Ananthram Swami, Viktor Prasanna
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2304.10013
  • Pdf link: https://arxiv.org/pdf/2304.10013
  • Abstract
    Predicting the throughput of WLAN deployments is a classic problem that occurs in the design of robust and high performance WLAN systems. However, due to the increasingly complex communication protocols and the increase in interference between devices in denser and denser WLAN deployments, traditional methods either have substantial runtime or enormous prediction error and hence cannot be applied in downstream tasks. Recently, Graph Neural Networks have been proven to be powerful graph analytic models and have been broadly applied to various networking problems such as link scheduling and power allocation. In this work, we propose HTNet, a specialized Heterogeneous Temporal Graph Neural Network that extracts features from dynamic WLAN deployments. Analyzing the unique graph structure of WLAN deployment graphs, we show that HTNet achieves the maximum expressive power on each snapshot. Based on a powerful message passing scheme, HTNet requires fewer layers than other GNN-based methods, which entails less supporting data and runtime. To evaluate the performance of HTNet, we prepare six different setups with more than five thousand dense dynamic WLAN deployments that cover a wide range of real-world scenarios. HTNet achieves the lowest prediction error on all six setups with an average improvement of 25.3% over the state-of-the-art methods.

Topological Guided Actor-Critic Modular Learning of Continuous Systems with Temporal Objectives

  • Authors: Lening Li, Zhentian Qian
  • Subjects: Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2304.10041
  • Pdf link: https://arxiv.org/pdf/2304.10041
  • Abstract
    This work investigates the formal policy synthesis of continuous-state stochastic dynamic systems given high-level specifications in linear temporal logic. To learn an optimal policy that maximizes the satisfaction probability, we take a product between a dynamic system and the translated automaton to construct a product system on which we solve an optimal planning problem. Since this product system has a hybrid product state space that results in reward sparsity, we introduce a generalized optimal backup order, in reverse to the topological order, to guide the value backups and accelerate the learning process. We provide the optimality proof for using the generalized optimal backup order in this optimal planning problem. Further, this paper presents an actor-critic reinforcement learning algorithm when topological order applies. This algorithm leverages advanced mathematical techniques and enjoys the property of hyperparameter self-tuning. We provide proof of the optimality and convergence of our proposed reinforcement learning algorithm. We use neural networks to approximate the value function and policy function for hybrid product state space. Furthermore, we observe that assigning integer numbers to automaton states can rank the value or policy function approximated by neural networks. To break the ordinal relationship, we use an individual neural network for each automaton state's value (policy) function, termed modular learning. We conduct two experiments. First, to show the efficacy of our reinforcement learning algorithm, we compare it with baselines on a classic control task, CartPole. Second, we demonstrate the empirical performance of our formal policy synthesis framework on motion planning of a Dubins car with a temporal specification.

Dynablox: Real-time Detection of Diverse Dynamic Objects in Complex Environments

  • Authors: Lukas Schmid, Olov Andersson, Aurelio Sulser, Patrick Pfreundschuh, Roland Siegwart
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.10049
  • Pdf link: https://arxiv.org/pdf/2304.10049
  • Abstract
    Real-time detection of moving objects is an essential capability for robots acting autonomously in dynamic environments. We thus propose Dynablox, a novel online mapping-based approach for robust moving object detection in complex environments. The central idea of our approach is to incrementally estimate high confidence free-space areas by modeling and accounting for sensing, state estimation, and mapping limitations during online robot operation. The spatio-temporally conservative free space estimate enables robust detection of moving objects without making any assumptions on the appearance of objects or environments. This allows deployment in complex scenes such as multi-storied buildings or staircases, and for diverse moving objects such as people carrying various items, doors swinging or even balls rolling around. We thoroughly evaluate our approach on real-world data sets, achieving 86% IoU at 17 FPS in typical robotic settings. The method outperforms a recent appearance-based classifier and approaches the performance of offline methods. We demonstrate its generality on a novel data set with rare moving objects in complex environments. We make our efficient implementation and the novel data set available as open-source.

Recurrent Transformer for Dynamic Graph Representation Learning with Edge Temporal States

  • Authors: Shengxiang Hu, Guobing Zou, Shiyi Lin, Liangrui Wu, Chenyang Zhou, Bofeng Zhang, Yixin Chen
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10079
  • Pdf link: https://arxiv.org/pdf/2304.10079
  • Abstract
    Dynamic graph representation learning is a trending yet challenging research task owing to the widespread demand for graph data analysis in real-world applications. Despite the encouraging performance of many recent works that build upon recurrent neural networks (RNNs) and graph neural networks (GNNs), they fail to explicitly model the impact of edge temporal states on node features over time slices. Additionally, they struggle to extract global structural features because of the inherent over-smoothing disadvantage of GNNs, which further restricts performance. In this paper, we propose a recurrent difference graph transformer (RDGT) framework, which first assigns various types and weights to the edges in each snapshot to illustrate their specific temporal states explicitly, then employs a structure-reinforced graph transformer to capture the temporal node representations through a recurrent learning paradigm. Experimental results on four real-world datasets demonstrate the superiority of RDGT for discrete dynamic graph representation learning, as it consistently outperforms competing methods in dynamic link prediction tasks.

Securing Semantic Communications with Physical-layer Semantic Encryption and Obfuscation

  • Authors: Qi Qin, Yankai Rong, Guoshun Nan, Shaokang Wu, Xuefei Zhang, Qimei Cui, Xiaofeng Tao
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.10147
  • Pdf link: https://arxiv.org/pdf/2304.10147
  • Abstract
    Deep learning based semantic communication (DLSC) systems have shown great potential for making wireless networks significantly more efficient by transmitting only the semantics of the data. However, the open nature of the wireless channel and the fragility of neural models leave DLSC systems extremely vulnerable to various attacks. Traditional wireless physical layer keys (PLK), which rely on the reciprocity and randomness characteristics of the channel between two legitimate users, hold the promise of securing DLSC. The main challenge lies in generating secret keys in static environments, where the key rate is ultra-low or zero. Different from prior efforts that use relays or reconfigurable intelligent surfaces (RIS) to manipulate wireless channels, this paper proposes a novel physical layer semantic encryption scheme by exploring the randomness of bilingual evaluation understudy (BLEU) scores in the field of machine translation, and additionally presents a novel semantic obfuscation mechanism to provide further physical layer protection. Specifically, 1) we calculate the BLEU scores and corresponding weights of the DLSC system. Then, we generate semantic keys (SKey) by feeding the weighted sum of the scores into a hash function. 2) Equipped with the SKey, our proposed subcarrier obfuscation is able to further secure semantic communications with a dynamic dummy data insertion mechanism. Experiments show the effectiveness of our method, especially in static wireless environments.

Automated Dynamic Bayesian Networks for Predicting Acute Kidney Injury Before Onset

  • Authors: David Gordon, Panayiotis Petousis, Anders O. Garlid, Keith Norris, Katherine Tuttle, Susanne B. Nicholas, Alex A.T. Bui (on behalf of CURE-CKD)
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10175
  • Pdf link: https://arxiv.org/pdf/2304.10175
  • Abstract
    Several algorithms for learning the structure of dynamic Bayesian networks (DBNs) require an a priori ordering of variables, which influences the determined graph topology. However, it is often unclear how to determine this order if feature importance is unknown, especially as an exhaustive search is usually impractical. In this paper, we introduce Ranking Approaches for Unknown Structures (RAUS), an automated framework to systematically inform variable ordering and learn networks end-to-end. RAUS leverages existing statistical methods (Cramér's V, the chi-squared test, and information gain) to compare variable orderings, the resultant generated network topologies, and DBN performance. RAUS enables end-users with limited DBN expertise to implement models via a command line interface. We evaluate RAUS on the task of predicting impending acute kidney injury (AKI) from inpatient clinical laboratory data. Longitudinal observations from 67,460 patients were collected from our electronic health record (EHR), and Kidney Disease Improving Global Outcomes (KDIGO) criteria were then applied to define AKI events. RAUS learns multiple DBNs simultaneously to predict a future AKI event at different time points (i.e., 24-, 48-, 72-hours in advance of AKI). We also compared the results of the learned AKI prediction models and variable orderings to baseline techniques (logistic regression, random forests, and extreme gradient boosting). The DBNs generated by RAUS achieved 73-83% area under the receiver operating characteristic curve (AUCROC) within 24-hours before AKI; and 71-79% AUCROC within 48-hours before AKI of any stage in a 7-day observation window. Insights from this automated framework can help efficiently implement and interpret DBNs for clinical decision support. The source code for RAUS is available in GitHub at https://github.com/dgrdn08/RAUS .
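
Of the three ranking statistics, Cramér's V is the least common in ML toolkits. A hedged sketch of how it could rank candidate variables before DBN structure learning (variable and function names are ours):

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    """Cramér's V association between two categorical variables."""
    table = np.asarray([[np.sum((x == a) & (y == b))
                         for b in np.unique(y)] for a in np.unique(x)])
    chi2, _, _, _ = chi2_contingency(table)
    n = table.sum()
    r, k = table.shape
    return np.sqrt(chi2 / (n * (min(r, k) - 1)))

def rank_variables(features, label):
    """Order candidate variables by association with the target label;
    the ordering is then fed to the DBN structure learner."""
    scores = {name: cramers_v(col, label) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)
```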

UAV-based Receding Horizon Control for 3D Inspection Planning

  • Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10201
  • Pdf link: https://arxiv.org/pdf/2304.10201
  • Abstract
    Nowadays, unmanned aerial vehicles or UAVs are being used for a wide range of tasks, including infrastructure inspection, automated monitoring and coverage. This paper investigates the problem of 3D inspection planning with an autonomous UAV agent which is subject to dynamical and sensing constraints. We propose a receding horizon 3D inspection planning control approach for generating optimal trajectories which enable an autonomous UAV agent to inspect a finite number of feature-points scattered on the surface of a cuboid-like structure of interest. The inspection planning problem is formulated as a constrained open-loop optimal control problem and is solved using mixed integer programming (MIP) optimization. Quantitative and qualitative evaluation demonstrates the effectiveness of the proposed approach.

Dynamic Security Region of Natural Gas Systems in Integrated Electricity-Gas Systems

  • Authors: Han Gao, Peiyao Zhao, Zhengshuo Li
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10215
  • Pdf link: https://arxiv.org/pdf/2304.10215
  • Abstract
    In an integrated electricity-gas system (IEGS), the tight coupling of power and natural gas systems is embodied by frequent changes in gas withdrawal from gas-fired units to provide regulation services for the power system to handle uncertainty, which may in turn endanger the secure operation of the natural gas system and ultimately affect the safety of the whole IEGS. Hence, it is necessary to accurately and efficiently evaluate the dynamic security region (DSR) of the natural gas system in the IEGS by considering the real-time dynamic characteristics of natural gas systems, which are not satisfactorily handled in state-of-the-art works. To bridge this gap, this paper first conceptionally verifies the necessity of the DSR and establishes its mathematical model. Then, a dimensionality reduction method is proposed for the efficient solution and visualization of the high-dimensional DSR evaluation model. A fast evaluation (FE) algorithm is developed to address the difficulties of the nonconvex dynamic constraints in the reduced DSR model. Finally, the necessity and notable advantages of the proposed DSR model and FE are verified based on small and relatively large test systems in comparison with common security region models and algorithms. To the best of our knowledge, this is the first paper that comprehensively presents models and efficient algorithms regarding the DSR of natural gas systems in an IEGS.

Filter-Aware Model-Predictive Control

  • Authors: Baris Kayalibay, Atanas Mirchev, Ahmed Agha, Patrick van der Smagt, Justin Bayer
  • Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2304.10246
  • Pdf link: https://arxiv.org/pdf/2304.10246
  • Abstract
    Partially-observable problems pose a trade-off between reducing costs and gathering information. They can be solved optimally by planning in belief space, but that is often prohibitively expensive. Model-predictive control (MPC) takes the alternative approach of using a state estimator to form a belief over the state, and then planning in state space. This ignores potential future observations during planning and, as a result, cannot actively increase or preserve the certainty of its own state estimate. We find a middle ground between planning in belief space and completely ignoring the belief dynamics by reasoning only about the estimator's future accuracy. Our approach, filter-aware MPC, penalises the loss of information by what we call "trackability", the expected error of the state estimator. We show that model-based simulation allows condensing trackability into a neural network, which enables fast planning. In experiments involving visual navigation, realistic every-day environments and a two-link robot arm, we show that filter-aware MPC vastly improves regular MPC.
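
The filter-aware objective can be sketched as an ordinary MPC rollout whose stage cost is augmented with the learned trackability penalty. Every callable below is a placeholder of ours:

```python
def filter_aware_cost(actions, x0, dynamics, task_cost,
                      trackability_net, lam=1.0):
    """MPC objective = task cost + lambda * predicted estimator error.

    `trackability_net(x, u)` stands in for a learned model (distilled
    from simulated filter rollouts) of the expected state-estimation
    error at state x under action u.
    """
    x, total = x0, 0.0
    for u in actions:
        total += task_cost(x, u) + lam * trackability_net(x, u)
        x = dynamics(x, u)
    return total

# Minimize over candidate action sequences with any standard MPC
# optimizer (random shooting, CEM, ...), exactly as in regular MPC.
```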

Learning Representative Trajectories of Dynamical Systems via Domain-Adaptive Imitation

  • Authors: Edgardo Solano-Carrillo, Jannis Stoppe
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2304.10260
  • Pdf link: https://arxiv.org/pdf/2304.10260
  • Abstract
    Domain-adaptive trajectory imitation is a skill that some predators learn for survival, by mapping dynamic information from one domain (their speed and steering direction) to a different domain (current position of the moving prey). An intelligent agent with this skill could be exploited for a diversity of tasks, including the recognition of abnormal motion in traffic once it has learned to imitate representative trajectories. Towards this direction, we propose DATI, a deep reinforcement learning agent designed for domain-adaptive trajectory imitation using a cycle-consistent generative adversarial method. Our experiments on a variety of synthetic families of reference trajectories show that DATI outperforms baseline methods for imitation learning and optimal control in this setting, keeping the same per-task hyperparameters. Its generalization to a real-world scenario is shown through the discovery of abnormal motion patterns in maritime traffic, opening the door for the use of deep reinforcement learning methods for spatially-unconstrained trajectory data mining.

BackCache: Mitigating Contention-Based Cache Timing Attacks by Hiding Cache Line Evictions

  • Authors: Quancheng Wang, Ming Tang, Han Wang, Yuzhe Gu
  • Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2304.10268
  • Pdf link: https://arxiv.org/pdf/2304.10268
  • Abstract
    Caches are used to reduce the speed differential between the CPU and memory to improve the performance of modern processors. However, attackers can use contention-based cache timing attacks to steal sensitive information from victim processes through carefully designed cache eviction sets. And L1 data cache attacks are widely exploited and pose a significant privacy and confidentiality threat. Existing hardware-based countermeasures mainly focus on cache partitioning, randomization, and cache line flushing, which unfortunately either incur high overhead or can be circumvented by sophisticated attacks. In this paper, we propose a novel hardware-software co-design called BackCache with the idea of always achieving cache hits instead of cache misses to mitigate contention-based cache timing attacks on the L1 data cache. BackCache places the evicted cache lines from the L1 data cache into a fully-associative backup cache to hide the evictions. To improve the security of BackCache, we introduce a randomly used replacement policy (RURP) and a dynamic backup cache resizing mechanism. We also present a theoretical security analysis to demonstrate the effectiveness of BackCache. Our evaluation on the gem5 simulator shows that BackCache degrades performance by 1.33%, 7.34%, and 7.59% for OS kernel, single-thread, and multi-thread benchmarks, respectively.
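
A toy model of the BackCache idea, hiding L1 evictions in a small fully-associative backup so that an attacker's probes still hit, might look as follows. The cache geometry and the replacement handling are simplified assumptions of ours, not the hardware design:

```python
import random

class BackCacheSim:
    """Toy model: lines evicted from a direct-mapped L1 move into a
    fully-associative backup cache, so a later probe still hits and
    leaks no eviction-timing signal."""
    def __init__(self, l1_sets=64, backup_size=32):
        self.l1 = {}                 # set index -> resident tag
        self.l1_sets = l1_sets
        self.backup = {}             # tag -> present (fully associative)
        self.backup_size = backup_size

    def access(self, addr):
        s, tag = addr % self.l1_sets, addr
        if self.l1.get(s) == tag or tag in self.backup:
            return "hit"             # backup hides the earlier eviction
        evicted = self.l1.get(s)
        self.l1[s] = tag
        if evicted is not None:
            if len(self.backup) >= self.backup_size:
                # random victim, loosely mimicking the paper's RURP
                self.backup.pop(random.choice(list(self.backup)))
            self.backup[evicted] = True
        return "miss"
```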

Observer-Feedback-Feedforward Controller Structures in Reinforcement Learning

  • Authors: Ruoqi Zhang, Per Mattson, Torbjörn Wigren
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10276
  • Pdf link: https://arxiv.org/pdf/2304.10276
  • Abstract
    The paper proposes the use of structured neural networks for reinforcement learning based nonlinear adaptive control. The focus is on partially observable systems, with separate neural networks for the state and feedforward observer and the state feedback and feedforward controller. The observer dynamics are modelled by recurrent neural networks while a standard network is used for the controller. As discussed in the paper, this leads to a separation of the observer dynamics to the recurrent neural network part, and the state feedback to the feedback and feedforward network. The structured approach reduces the computational complexity and gives the reinforcement learning based controller an {\em understandable} structure as compared to when one single neural network is used. As shown by simulation the proposed structure has the additional and main advantage that the training becomes significantly faster. Two ways to include feedforward structure are presented, one related to state feedback control and one related to classical feedforward control. The latter method introduces further structure with a separate recurrent neural network that processes only the measured disturbance. When evaluated with simulation on a nonlinear cascaded double tank process, the method with most structure performs the best, with excellent feedforward disturbance rejection gains.

Aiding reinforcement learning for set point control

  • Authors: Ruoqi Zhang, Per Mattsson, Torbjörn Wigren
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10289
  • Pdf link: https://arxiv.org/pdf/2304.10289
  • Abstract
    While reinforcement learning has made great improvements, state-of-the-art algorithms can still struggle with seemingly simple set-point feedback control problems. One reason for this is that the learned controller may not be able to excite the system dynamics well enough initially, and therefore it can take a long time to get data that is informative enough to learn good control. The paper contributes by augmenting reinforcement learning with a simple guiding feedback controller, for example, a proportional controller. The key advantage in set-point control is a much-improved excitation that significantly improves the convergence properties of the reinforcement learning controller. This can be very important in real-world control, where quick and accurate convergence is needed. The proposed method is evaluated with simulation and on a real-world double tank process with promising results.
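
The augmentation can be as simple as blending a proportional guide with the learned action. A hedged sketch of ours; the blending schedule is a placeholder, and the paper may combine the two differently:

```python
def guided_action(rl_action, setpoint, measurement, kp=1.0, beta=0.5):
    """Blend a proportional guide with the learned policy's action.

    Early in training the P-term excites the plant toward the set
    point; `beta` can be annealed toward 0 as the policy improves.
    """
    p_action = kp * (setpoint - measurement)
    return beta * p_action + (1.0 - beta) * rl_action
```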

FIANCEE: Faster Inference of Adversarial Networks via Conditional Early Exits

  • Authors: Polina Karpikova (1 and 2), Radionova Ekaterina (1), Anastasia Yaschenko (1 and 2), Andrei Spiridonov (1), Leonid Kostyushko (3), Riccardo Fabbricatore (1), Aleksei Ivakhnenko (1) ((1) Samsung AI Center, (2) Higher School of Economics, (3) Lomonosov Moscow State University)
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10306
  • Pdf link: https://arxiv.org/pdf/2304.10306
  • Abstract
    Generative DNNs are a powerful tool for image synthesis, but they are limited by their computational load. On the other hand, given a trained model and a task, e.g. face generation within a range of characteristics, the output image quality will be unevenly distributed among images with different characteristics. It follows that we can restrain the model's complexity on some instances while maintaining high quality. We propose a method for diminishing computations by adding so-called early exit branches to the original architecture, and dynamically switching the computational path depending on how difficult it will be to render the output. We apply our method to two different SOTA models performing generative tasks: generation from a semantic map, and cross-reenactment of face expressions; showing that it is able to output images with custom lower-quality thresholds. For a threshold of LPIPS <= 0.1, we diminish their computations by up to a half. This is especially relevant for real-time applications such as the synthesis of faces, where quality loss needs to be contained, but most of the inputs need fewer computations than the complex instances.
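
A minimal sketch of the early-exit control flow for a generator: run backbone stages until a learned difficulty head predicts the rendering already meets the quality budget. The module names and the LPIPS-proxy head are our assumptions, not the paper's architecture:

```python
import torch.nn as nn

class EarlyExitGenerator(nn.Module):
    """Backbone with exit branches; stop once a difficulty predictor
    says the current features can already be decoded within budget."""
    def __init__(self, stages, exits, difficulty_heads, threshold=0.1):
        super().__init__()
        self.stages = nn.ModuleList(stages)
        self.exits = nn.ModuleList(exits)             # cheap decoders
        self.heads = nn.ModuleList(difficulty_heads)  # predict LPIPS proxy
        self.threshold = threshold

    def forward(self, x):
        h = x
        for stage, exit_dec, head in zip(self.stages, self.exits,
                                         self.heads):
            h = stage(h)
            if head(h).mean() <= self.threshold:      # easy instance
                return exit_dec(h)                    # early exit
        return self.exits[-1](h)                      # full-depth path
```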

ORIGAMI: A flexible state channels design for public blockchain systems

  • Authors: Lydia Negka, Angeliki Katsika, Georgios Spathoulas, Vassilis Plagianakos
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2304.10313
  • Pdf link: https://arxiv.org/pdf/2304.10313
  • Abstract
    Public blockchain systems offer security guarantees that cannot be matched by any centralised system. This offering has attracted a lot of interest and has exposed a significant limitation of most blockchain designs with regards to scalability. One of the scaling solutions proposed is state channels, which enable serving given applications with a minimum number of transactions. Existing state channel designs set multiple compatibility requirements for applications to be deployed. Origami is a novel state channels design which removes most of the requirements of existing approaches, while also offering a number of new features. Origami enables dynamic groups of users to interact in an unordered way completely off-chain after an initial on-boarding on-chain transaction. The proposed design is analysed in detail and compared to existing schemes, while a formal security analysis validates the security properties it offers.

Polylog-Competitive Algorithms for Dynamic Balanced Graph Partitioning for Ring Demands

  • Authors: Harald Räcke, Stefan Schmid, Ruslan Zabrodin
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2304.10350
  • Pdf link: https://arxiv.org/pdf/2304.10350
  • Abstract
    The performance of many large-scale and data-intensive distributed systems critically depends on the capacity of the interconnecting network. This paper is motivated by the vision of self-adjusting infrastructures whose resources can be adjusted according to the workload they currently serve, in a demand-aware manner. Such dynamic adjustments can be exploited to improve network utilization and hence performance, by dynamically moving frequently interacting communication partners closer, e.g., collocating them in the same server or datacenter rack. In particular, we revisit the online balanced graph partitioning problem which captures the fundamental tradeoff between the benefits and costs of dynamically collocating communication partners. The demand is modelled as a sequence $\sigma$ (revealed in an online manner) of communication requests between $n$ processes, each of which is running on one of the $\ell$ servers. Each server has capacity $k=n/\ell$, hence, the processes have to be scheduled in a balanced manner across the servers. A request incurs cost $1$, if the requested processes are located on different servers, otherwise the cost is 0. A process can be migrated to a different server at cost $1$. This paper presents the first online algorithm for online balanced graph partitioning achieving a polylogarithmic competitive ratio for the fundamental case of ring communication patterns. Specifically, our main contribution is a $O(\log^3 n)$-competitive randomized online algorithm for this problem. We further present a randomized online algorithm which is $O(\log^2 n)$-competitive when compared to a static optimal solution. Our two results rely on different algorithms and techniques and hence are of independent interest.
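
The cost model in the problem statement is easy to make concrete: a split request costs 1 and every migration costs 1, subject to server capacity. A sketch of ours for scoring any online algorithm, with `migrate` as an abstract callback:

```python
def serve_requests(requests, placement, capacity, migrate):
    """Toy cost accounting for online balanced graph partitioning:
    a request (u, v) costs 1 when u and v sit on different servers,
    and every migration performed by the online algorithm costs 1.
    `migrate(u, v, placement)` yields (process, target_server) moves
    and is expected to keep every server at most `capacity` full.
    """
    cost = 0
    for u, v in requests:
        if placement[u] != placement[v]:
            cost += 1                      # communication cost
            for proc, server in migrate(u, v, placement):
                placement[proc] = server   # migration cost
                cost += 1
    return cost
```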

PDL on Steroids: on Expressive Extensions of PDL with Intersection and Converse

  • Authors: Diego Figueira, Santiago Figueira, Edwin Pin
  • Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2304.10381
  • Pdf link: https://arxiv.org/pdf/2304.10381
  • Abstract
    We introduce CPDL+, a family of expressive logics rooted in Propositional Dynamic Logic (PDL). In terms of expressive power, CPDL+ strictly contains PDL extended with intersection and converse (a.k.a. ICPDL) as well as Conjunctive Queries (CQ), Conjunctive Regular Path Queries (CRPQ), or some known extensions thereof (Regular Queries and CQPDL). We investigate the expressive power, characterization of bisimulation, satisfiability, and model checking for CPDL+. We argue that natural subclasses of CPDL+ can be defined in terms of the tree-width of the underlying graphs of the formulas. We show that the class of CPDL+ formulas of tree-width 2 is equivalent to ICPDL, and that it also coincides with CPDL+ formulas of tree-width 1. However, beyond tree-width 2, incrementing the tree-width strictly increases the expressive power. We characterize the expressive power for every class of fixed tree-width formulas in terms of a bisimulation game with pebbles. Based on this characterization, we show that CPDL+ has a tree-like model property. We prove that the satisfiability problem is decidable in 2ExpTime on fixed tree-width formulas, coinciding with the complexity of ICPDL. We also exhibit classes for which satisfiability is reduced to ExpTime. Finally, we establish that the model checking problem for fixed tree-width formulas is in PTime, contrary to the full class CPDL+.

Multi-label Node Classification On Graph-Structured Data

  • Authors: Tianqi Zhao, Ngan Thi Dong, Alan Hanjalic, Megha Khosla
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2304.10398
  • Pdf link: https://arxiv.org/pdf/2304.10398
  • Abstract
    Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While high label similarity (high homophily) is usually attributed to the success of GNNs, we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, besides defining homophily for the multi-label scenario, we develop a new approach that dynamically fuses the feature and label correlation information to learn label-informed representations. Finally, we perform a large-scale comparative study with $10$ methods and $9$ datasets, which also showcases the effectiveness of our approach. We release our benchmark at \url{https://anonymous.4open.science/r/LFLF-5D8C/}.

Distributed Neural Representation for Reactive in situ Visualization

  • Authors: Qi Wu, Joseph A. Insley, Victor A. Mateevitsi, Silvio Rizzi, Michael E. Papka, Kwan-Liu Ma
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2304.10516
  • Pdf link: https://arxiv.org/pdf/2304.10516
  • Abstract
    In situ visualization and steering of computational modeling can be effectively achieved using reactive programming, which leverages temporal abstraction and data caching mechanisms to create dynamic workflows. However, implementing a temporal cache for large-scale simulations can be challenging. Implicit neural networks have proven effective in compressing large volume data. However, their application to distributed data has yet to be fully explored. In this work, we develop an implicit neural representation for distributed volume data and incorporate it into the DIVA reactive programming system. This implementation enables us to build an in situ temporal caching system with a capacity 100 times larger than previously achieved. We integrate our implementation into the Ascent infrastructure and evaluate its performance using real-world simulations.
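
As a rough picture of the core idea, the sketch below fits a small coordinate MLP to a synthetic local volume block, so that the trained weights act as a compressed, decodable cache of the block. It assumes PyTorch and a toy scalar field; the paper's distributed encoding and DIVA/Ascent integration are not reproduced here.

```python
import torch
import torch.nn as nn

# Coordinate MLP: maps (x, y, z) in [0, 1]^3 to a scalar field value.
model = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Synthetic local block: a scalar field sampled on a 16^3 grid.
g = torch.linspace(0, 1, 16)
coords = torch.cartesian_prod(g, g, g)                        # (4096, 3)
values = torch.sin(6.28 * coords).prod(dim=1, keepdim=True)   # toy field

for step in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(coords), values)
    loss.backward()
    opt.step()

# The trained weights now serve as a compressed cache of the block:
# any coordinate inside it can be decoded on demand.
print(float(loss))
```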

A class of mesh-free algorithms for some problems arising in finance and machine learning

  • Authors: Philippe G. LeFloch, Jean-Marc Mercier
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2304.10521
  • Pdf link: https://arxiv.org/pdf/2304.10521
  • Abstract
    We introduce a numerical methodology, referred to as the transport-based mesh-free method, which allows us to deal with continuous, discrete, or statistical models in the same unified framework, and leads us to a broad class of numerical algorithms recently implemented in a Python library (namely, CodPy). Specifically, we propose a mesh-free discretization technique based on the theory of reproducing kernels and the theory of transport mappings, in a way that is reminiscent of Lagrangian methods in computational fluid dynamics. We introduce kernel-based discretizations of a variety of differential and discrete operators (gradient, divergence, Laplacian, Leray projection, extrapolation, interpolation, polar factorization). The proposed algorithms are nonlinear in nature and enjoy quantitative error estimates based on the notion of discrepancy error, which allows one to evaluate the relevance and accuracy of both the given data and the numerical solutions. Our strategy is relevant when a large number of degrees of freedom are present, as is the case in mathematical finance and machine learning. We consider the Fokker-Planck-Kolmogorov system (relevant for problems arising in finance and material dynamics) and a class of neural networks based on support vector machines.
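
One building block of such kernel-based discretizations is plain reproducing-kernel interpolation, sketched below with a Gaussian kernel. The kernel scale and regularization are illustrative assumptions; CodPy's differential operators and transport mappings build on the same Gram-matrix machinery but are not shown here.

```python
import numpy as np

def gaussian_kernel(X, Y, scale=1.0):
    """Gaussian reproducing kernel evaluated on two point clouds."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * scale**2))

# Mesh-free interpolation: given f on scattered nodes X, evaluate on Z.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))       # scattered mesh-free nodes
f = np.sin(np.pi * X[:, 0]) * X[:, 1]      # sampled function values

K = gaussian_kernel(X, X) + 1e-8 * np.eye(len(X))  # regularized Gram matrix
alpha = np.linalg.solve(K, f)              # kernel coefficients

Z = rng.uniform(-1, 1, size=(5, 2))        # evaluation points
f_hat = gaussian_kernel(Z, X) @ alpha      # interpolated values
print(f_hat)
```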

Collaborative Diffusion for Multi-Modal Face Generation and Editing

  • Authors: Ziqi Huang, Kelvin C.K. Chan, Yuming Jiang, Ziwei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2304.10530
  • Pdf link: https://arxiv.org/pdf/2304.10530
  • Abstract
    Diffusion models have recently emerged as a powerful generative tool. Despite this great progress, existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is driven by only one modality of condition. To further unleash users' creativity, it is desirable for the model to be controllable by multiple modalities simultaneously, e.g., generating and editing faces by describing the age (text-driven) while drawing the face shape (mask-driven). In this work, we present Collaborative Diffusion, where pre-trained uni-modal diffusion models collaborate to achieve multi-modal face generation and editing without re-training. Our key insight is that diffusion models driven by different modalities are inherently complementary regarding the latent denoising steps, upon which bilateral connections can be established. Specifically, we propose the dynamic diffuser, a meta-network that adaptively hallucinates multi-modal denoising steps by predicting the spatial-temporal influence functions for each pre-trained uni-modal model. Collaborative Diffusion not only combines the generation capabilities of uni-modal diffusion models, but also integrates multiple uni-modal manipulations to perform multi-modal editing. Extensive qualitative and quantitative experiments demonstrate the superiority of our framework in both image quality and condition consistency.
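
The fusion mechanism can be pictured in a few lines. The sketch below uses stand-in convolutions for the two frozen uni-modal denoisers and a toy meta-network that predicts per-pixel mixing weights; every module here is a hypothetical placeholder rather than the paper's dynamic diffuser architecture.

```python
import torch
import torch.nn as nn

class DynamicDiffuser(nn.Module):
    """Toy meta-network: predicts per-pixel weights for two frozen
    uni-modal denoisers. Purely illustrative of the idea described
    in the abstract."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 2, kernel_size=3, padding=1)

    def forward(self, x_t):
        return torch.softmax(self.net(x_t), dim=1)  # (B, 2, H, W)

# Stand-ins for pre-trained text- and mask-driven denoisers.
text_denoiser = nn.Conv2d(3, 3, 3, padding=1)
mask_denoiser = nn.Conv2d(3, 3, 3, padding=1)
diffuser = DynamicDiffuser()

x_t = torch.randn(1, 3, 64, 64)    # noisy latent at some timestep t
w = diffuser(x_t)                  # spatial influence functions
eps = (w[:, 0:1] * text_denoiser(x_t)
       + w[:, 1:2] * mask_denoiser(x_t))  # fused noise prediction
print(eps.shape)  # torch.Size([1, 3, 64, 64])
```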
