Demos and Presentations

ATLAS: Adaptive Landmark Acquisition using LLM-Guided Navigation

Autonomous navigation agents traditionally rely on predefined maps and landmarks, limiting their ability to adapt to dynamic and unfamiliar environments. ATLAS presents a novel system that continuously expands its navigable landmark set and performs complex natural language-guided navigation tasks. ATLAS integrates three key components: a path planning module for navigating to known landmarks, an object detection module for identifying and localizing objects in the environment, and a large language model (LLM) for high-level reasoning and natural language understanding.

RAFT: Robust Augmentation of FeaTures for Image Segmentation

RAFT is a novel framework for adapting image segmentation models to real-world environments using minimal labeled data. It addresses the Syn2Real problem, where models trained on synthetic data perform poorly once deployed on real images, by combining data and feature augmentations with active learning to close the gap. RAFT is validated on standard synthetic-to-real benchmarks (SYNTHIA to Cityscapes and GTAV to Cityscapes) as well as the real-to-real Cityscapes to ACDC benchmark. Across all three, it surpasses the previous state of the art, HALO, improving segmentation accuracy while keeping the annotation budget low.

E2AR: Energy-Efficient Augmented Reality Framework for Collaborative Multi-Drone Systems

Smart drones are increasingly used for autonomous, vision-based tasks, but the neural networks that power them are computationally heavy and hard to run on the resource-constrained edge devices drones carry. E2AR is an energy-efficient framework that lets multiple edge devices stream video to an augmented reality display while running ML-based navigation onboard. A YOLO detector runs directly on each edge device, keeping energy use low, and the AR layer enhances human-machine teaming by giving operators a clearer, augmented view of what the drones see. The result is a practical setup for collaborative multi-drone systems where efficiency and real-time human oversight both matter.

Vision-Based Autonomous Drone Navigation with Sim2Real Transfer

Training drone navigation policies in simulation and transferring them to the real world usually fails, because real camera images, especially on low-power drones, look too different from simulated ones. This work proposes a framework that sidesteps the gap by feeding object-detection features, rather than raw images, into a reinforcement learning agent, since detectors transfer from sim to real far more reliably. A customized YOLOv5 model detects the target and its bounding box, and those values plus range sensor readings drive a DQN policy trained to reach a goal object while avoiding obstacles. Deployed on a low-power Crazyflie drone with an AI-Deck, the approach reaches the target far more often than raw vision-based navigation while cutting training time dramatically.
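
As a rough illustration of feeding detector outputs rather than raw images to the policy, the sketch below packs a bounding box and range readings into one normalized state vector. The field names, the vector layout, the image size, and the maximum range are all assumptions made for this example, not details taken from the work itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Detection:
    """Bounding box in pixel coordinates (hypothetical layout)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def build_state(det: Optional[Detection], ranges_m: Sequence[float],
                img_w: int, img_h: int, max_range_m: float) -> List[float]:
    """Return [seen, cx, cy, w, h, *ranges], with every entry scaled to [0, 1]."""
    if det is None:
        # No target in view: flag it and zero out the box features.
        box = [0.0, 0.0, 0.0, 0.0, 0.0]
    else:
        cx = (det.x_min + det.x_max) / 2 / img_w
        cy = (det.y_min + det.y_max) / 2 / img_h
        w = (det.x_max - det.x_min) / img_w
        h = (det.y_max - det.y_min) / img_h
        box = [1.0, cx, cy, w, h]
    # Clip each range reading to [0, max_range_m] before scaling.
    clipped = [min(max(r, 0.0), max_range_m) / max_range_m for r in ranges_m]
    return box + clipped


# Illustrative values only: a 400x200 image and two range sensors capped at 2 m.
state = build_state(Detection(100, 50, 200, 150), [0.5, 4.0], 400, 200, 2.0)
print(state)
```

A vector like this is far smaller than an image, which is one plausible reason a policy trained on it would need less training time.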

GenAI on the Edge

The rise of large-scale foundational models built on transformer architectures has revolutionized AI capabilities across image recognition (Vision Transformers, ViTs) and natural language processing (e.g., ChatGPT). While these models demonstrate remarkable performance, their massive size and computational requirements present a fundamental obstacle to their deployment on resource-constrained edge devices. For instance, ViT-base contains 86 million parameters, resulting in a 344 MB model, which is far too large for embedded systems. Our goal is to develop innovative compression techniques that drastically reduce the footprint of foundational transformer models, enabling their widespread adoption in edge and tinyML applications without compromising their breakthrough capabilities.
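
The 344 MB figure follows from the parameter count if each weight is stored as a 32-bit float (4 bytes), which is an assumption about the storage format. The short sketch below reproduces that arithmetic. The 8-bit line is only an illustration of how precision affects footprint and does not represent the compression techniques this project develops.

```python
def model_size_mb(num_params: int, bytes_per_param: float) -> float:
    """Weight storage in megabytes (1 MB = 10**6 bytes), ignoring any file overhead."""
    return num_params * bytes_per_param / 1e6


VIT_BASE_PARAMS = 86_000_000  # parameter count quoted in the text

print(model_size_mb(VIT_BASE_PARAMS, 4))  # 32-bit floats: 344.0
print(model_size_mb(VIT_BASE_PARAMS, 1))  # hypothetical 8-bit weights: 86.0
```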

Metareasoning and LLM Planner on the Edge

Cutting-edge Large Language Models (LLMs) play a crucial role in improving autonomous navigation. Because LLMs require powerful computers to operate, they are usually served from the cloud, but security concerns and the difficulty of maintaining a stable cloud connection make that approach fragile. To address this issue, we propose a metareasoning approach for edge-cloud collaborative LLM planning that leads to efficient autonomous navigation. The proposed approach allows the system to seamlessly switch between cloud and edge devices to fulfill the mission even when the connection is lost or the robot enters a GPS-denied environment. Moreover, we deploy state-of-the-art LLMs on resource-constrained systems such as the NVIDIA Jetson Orin Nano, integrated with the ROSMASTER X3.
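
One way to picture the cloud-edge switch is a small decision rule wrapped around two planners. The sketch below is a deliberately simplified stand-in for the metareasoning step: the function names, the latency budget, and the use of `ConnectionError` to signal a dropped link are all assumptions made for illustration, not the project's actual policy.

```python
from typing import Callable

Planner = Callable[[str], str]  # maps an instruction to a plan (hypothetical interface)


def choose_planner(cloud_reachable: bool, cloud_latency_s: float,
                   latency_budget_s: float) -> str:
    """Pick "cloud" only when the link is up and fast enough, otherwise "edge"."""
    if cloud_reachable and cloud_latency_s <= latency_budget_s:
        return "cloud"
    return "edge"


def plan(instruction: str, cloud_planner: Planner, edge_planner: Planner,
         cloud_reachable: bool, cloud_latency_s: float,
         latency_budget_s: float = 2.0) -> str:
    """Try the cloud planner when selected, falling back to the edge planner."""
    if choose_planner(cloud_reachable, cloud_latency_s, latency_budget_s) == "cloud":
        try:
            return cloud_planner(instruction)
        except ConnectionError:
            pass  # the link dropped mid-request, so continue on the edge device
    return edge_planner(instruction)


def demo_cloud(instruction: str) -> str:
    """Stand-in cloud planner that simulates a lost connection."""
    raise ConnectionError("link lost")


def demo_edge(instruction: str) -> str:
    """Stand-in for an LLM running on the edge device."""
    return f"edge plan for: {instruction}"


print(plan("go to the door", demo_cloud, demo_edge,
           cloud_reachable=True, cloud_latency_s=0.5))
```

The point of the fallback path is that the mission continues on the edge planner even when the cloud was chosen and then became unavailable.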

Multi-Agent Resiliency

Multi-agent networks are crucial for completing tasks efficiently. When agents share information, data integrity must be maintained. By implementing a reasoning engine that draws on the individual agents' reports, an informed decision can be made for completing tasks. Multiple UGVs working together can share information with a local server that reinforces the final decision and identifies compromised agents. This demonstration focuses on how autonomously moving vehicles can be trained to move towards a common goal and reinforce decision making by communicating with one another. Here, the first JetBot detects the blue object as red and reports it back to the server. The second JetBot detects the blue object as blue and reports it back to the server. The server detects the difference in decisions and directs JetBot 3 to be deployed. Once JetBot 3 reports back, the server makes the final decision based on JetBot 3's output and identifies which JetBot has been compromised. Since decision making goes through the local server, the communication cost is reduced for individual agents. To make the navigation more resource-efficient, we can apply network exploration and compression approaches to pinpoint suitable configurations for deployment.
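
The server-side logic in this demonstration can be sketched as a small arbitration function. This is a minimal illustration under assumptions: the function name, the dictionary of reports, and the callback that stands in for deploying JetBot 3 are hypothetical, and the real system's reasoning engine may weigh evidence differently.

```python
from typing import Callable, Dict, List, Tuple


def resolve(reports: Dict[str, str],
            deploy_tiebreaker: Callable[[], str]) -> Tuple[str, List[str]]:
    """Return (final_label, suspected_compromised_agents).

    If all agents agree, their shared label is final and nobody is flagged.
    Otherwise a further agent is deployed, its label is taken as final, and
    every agent that disagreed with it is flagged.
    """
    if not reports:
        raise ValueError("at least one report is required")
    labels = set(reports.values())
    if len(labels) == 1:
        return labels.pop(), []
    final = deploy_tiebreaker()  # e.g. send JetBot 3 to look at the object
    compromised = sorted(name for name, label in reports.items() if label != final)
    return final, compromised


# Scenario from the text: JetBot 1 reports red, JetBot 2 reports blue,
# and JetBot 3 (the tiebreaker) reports blue.
decision, suspects = resolve({"jetbot1": "red", "jetbot2": "blue"}, lambda: "blue")
print(decision, suspects)  # blue ['jetbot1']
```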

Human and Drone Teaming

We proposed an energy-efficient architecture to enable multi-drone video streaming to a HoloLens while applying augmented reality to enhance human-machine teaming. In this video, the drone is following the wall with the help of a lidar sensor and streams the drone’s view to HoloLens.

Energy-Efficient Edge Computing

Artificial Intelligence (AI) and Deep Neural Networks (DNNs) have attracted attention in autonomous systems because they enable applications such as visual perception and navigation. Although cloud-based approaches have already been widely studied, there is growing interest in running AI and DNNs on the edge, as this allows for lower latency and avoids the potential security concerns of transmitting data to a remote server. However, deploying DNNs on edge devices is challenging because computational power is limited and energy efficiency is of the utmost importance. In this work, we introduce an approach named E2EdgeAI for Energy-Efficient Edge computing that takes advantage of AI for autonomous tiny drones. This approach optimizes the energy efficiency of DNNs by considering the effects of memory access and core utilization on the energy consumption of tiny UAVs. For the experiments, we used a tiny drone named Crazyflie with the AI-deck expansion, which includes an octa-core RISC-V processor. The experimental results show the proposed approach reduces the model size by up to 14.4x, improves energy per inference by 78%, and increases energy efficiency by 5.6x.

Low Power Multi-Agent Reinforcement Learning for UAVs and UGVs and Language Guided Reinforcement Learning for Human-Agent Teaming

This recently funded ARL ArtIAMAS project aims to develop energy-efficient AI-driven approaches with heterogeneous autonomous edge devices for teaming, scene understanding, and decision making in adversarial settings. Reinforcement Learning (RL) has shown great benefits in command and control. However, training becomes significantly more challenging when we scale to multi-agent and/or real-world environments. In this project we proposed to divide the tasks hierarchically, and to organize the learning of the multiple agents hierarchically as well, which can significantly improve training. The second part of the presentation shows our project on Language Guided Reinforcement Learning for Human-Agent Teaming. We proposed a framework to train RL agents conditioned on constraints expressed in structured language, thus reducing the effort needed to design and integrate specialized rewards into the environment. In our experiments, we show that this method can be used to ground the language to behaviors and enable the agent to solve tasks while following the constraints. We also show how the agent can transfer these skills to other tasks.

An Energy Efficient and Flexible Multichannel Electroencephalogram (EEG) Artifact Detection

This project aims to develop energy-efficient and flexible multichannel electroencephalogram (EEG) artifact detection and identification networks, together with their reconfigurable hardware implementations. EEG signals are recordings of brain activity. Components of EEG recordings that do not originate from cerebral activity are termed artifacts. Our proposed models do not need expert knowledge for feature extraction or pre-processing of EEG data and have very efficient architectures that can be implemented on mobile devices. The proposed networks can be reconfigured for any number of EEG channels and artifact classes. Experiments were done with different deep learning models (i.e., CNN, Depthwise Separable CNN, LSTM, Conv-LSTM) with the goal of maximizing the detection/identification accuracy while minimizing the weight parameters and required number of operations.

Reinforcement Learning with Highly Reduced Input Size and Model Size

This work presents a reinforcement learning demo with highly reduced input size and model size, built on the DonkeyCar simulator. Instead of using image observations, this work detects the lane lines in the images and uses the line endpoint coordinates as observations (an 8-element array). The final input size is 8 by 8, formed by stacking the 8 most recent observations. With this highly reduced input size, we can also reduce the model size to 1 convolution layer and 2 fully connected layers. Furthermore, the extracted line coordinate features cancel out irrelevant background features.
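
The 8 by 8 input can be pictured as a rolling buffer of the 8 most recent endpoint arrays. The sketch below shows one way to build it with a fixed-length deque. The class name and the zero-filled start-up rows are assumptions for illustration, and the text does not say how the first few steps of an episode are padded.

```python
from collections import deque
from typing import Deque, List, Sequence


class ObservationStack:
    """Keep the most recent `depth` observations as a depth-by-width grid."""

    def __init__(self, depth: int = 8, width: int = 8) -> None:
        self.width = width
        # Assumption: pad with zeros until enough real observations arrive.
        self.frames: Deque[List[float]] = deque(
            ([0.0] * width for _ in range(depth)), maxlen=depth
        )

    def push(self, endpoints: Sequence[float]) -> List[List[float]]:
        """Add one endpoint array and return the stacked input, oldest row first."""
        if len(endpoints) != self.width:
            raise ValueError(f"expected {self.width} values, got {len(endpoints)}")
        self.frames.append(list(endpoints))  # the oldest row drops out automatically
        return [list(row) for row in self.frames]


stack = ObservationStack()
grid = stack.push([0.1, 0.9, 0.2, 0.5, 0.8, 0.9, 0.7, 0.5])  # illustrative values
print(len(grid), len(grid[0]))  # 8 8
```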

A 0.9 TOP/S/W Accelerator for Structurally Compressed DNNs Featuring Cyclic Sparsely Connected Layers @ ISSCC 2020

Sketching based Big Data Acceleration on Low Power Cores

Wireless medical technologies have created opportunities for new methods of preventive care using biomedical implanted and body-worn devices. The design of the technologies that will enable these applications requires correct delivery of the vital physiological signs of the patient along with the energy management in power-constrained devices. The high cost and even higher risk of battery replacement require that these devices be designed and developed for minimum energy consumption.

Deep Neural Nets for Embedded Big Data Applications

We explore the use of deep neural networks (DNNs) for embedded big data applications. Deep neural networks have been demonstrated to outperform state-of-the-art solutions for a variety of complex classification tasks, such as image recognition. The ability to train networks to perform both feature abstraction and classification provides a number of key benefits. One key benefit is that it reduces the burden on the developer to produce efficient, optimal feature engineering, which typically requires expert domain knowledge and significant time. A second key benefit is that the network's complexity can be adjusted to achieve the desired accuracy. Despite these benefits, DNNs have yet to be fully realized in an embedded setting. In this research, we explore novel architecture optimizations and develop optimal static mappings for neural networks onto highly parallel, highly granular hardware processors such as many-cores and embedded GPUs.

A Low Power Wearable Tongue Drive System for People with Severe Disabilities

This work demonstrates an ultra-low-power multi-sensor Tongue Drive System (TDS) that lets individuals with severe disabilities control their environment using tongue movement. An ultra-low-power local processor is proposed that performs all signal processing at the sensor side, rather than sending out all raw data. The proposed TDS significantly reduces the transmission power consumption and consequently increases the battery life. Assuming the TDS user issues one command per second, the proposed local processing reduces the data volume that needs to be wirelessly transmitted to a PC or smartphone by a factor of 1500, from 12 kbit/s to approximately 8 bit/s. The proposed processor consists of three blocks: I2C protocol for communication, External Magnetic Field (EMF) Attenuation, and Logistic Regression machine learning for command classification. The processor is implemented in 65-nm CMOS technology, occupies 0.016 mm², and consumes 3.9 nJ energy, which is 41 times smaller than the implementation in the previous work. For demonstration, the complete TDS on a headset with FPGA, Bluetooth, battery, and sensors has been tested. The detection accuracy is 90.12%.
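
The quoted reduction factor can be checked directly from the two data rates. In the sketch below, the 8 bits per command is an inference from "approximately 8 bit/s" at one command per second, not a stated specification.

```python
def reduction_factor(raw_bps: float, processed_bps: float) -> float:
    """How many times less data is transmitted after local processing."""
    return raw_bps / processed_bps


RAW_BPS = 12_000          # raw sensor stream: 12 kbit/s
COMMANDS_PER_SECOND = 1   # usage assumption stated in the text
BITS_PER_COMMAND = 8      # inferred from "approximately 8 bit/s"

processed_bps = COMMANDS_PER_SECOND * BITS_PER_COMMAND
print(reduction_factor(RAW_BPS, processed_bps))  # 1500.0
```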