GenAI on the Edge

The rise of large-scale foundational models built on transformer architectures has revolutionized AI capabilities across image recognition (e.g., Vision Transformers, ViTs) and natural language processing (e.g., ChatGPT). While these models demonstrate remarkable performance, their massive size and computational requirements present a fundamental obstacle to deployment on resource-constrained edge devices. For instance, ViT-base contains 86 million parameters, resulting in a 344 MB model when each parameter is stored as a 32-bit value, far too large for embedded systems. Our goal is to develop innovative compression techniques that drastically reduce the footprint of foundational transformer models, enabling their widespread adoption in edge and tinyML applications without compromising their breakthrough capabilities.
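The 344 MB figure follows from the parameter count multiplied by the storage per parameter. The sketch below reproduces that arithmetic, assuming 32-bit weights and 1 MB = 10^6 bytes. The lower bit widths are illustrative only, not results reported by this project, and the function name is hypothetical.

```python
# Back-of-the-envelope model footprint: parameters x bits per parameter.
PARAMS_VIT_BASE = 86_000_000  # parameter count quoted in the text


def model_size_mb(num_params: int, bits_per_param: int) -> float:
    """Return the weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bits_per_param / 8 / 1e6


# 32-bit matches the 344 MB quoted above; the other widths are
# illustrative assumptions showing how storage scales with precision.
for bits in (32, 16, 8, 4):
    size = model_size_mb(PARAMS_VIT_BASE, bits)
    print(f"{bits:>2}-bit weights: {size:.0f} MB")
```

This counts weight storage only. Activations and runtime buffers add to the memory actually needed on a device.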

Multi-Agent Resiliency

Multi-agent networks are crucial for completing tasks efficiently. When agents share information, data integrity must be maintained. By implementing a reasoning engine over the outputs of the individual agents, an informed decision can be made about how to complete a task. Multiple UGVs working together can share information with a local server, which can reinforce the final decision and identify compromised agents. This demonstration focuses on how autonomous vehicles can be trained to move towards a common goal and reinforce decision making by communicating with one another. Here, the first JetBot detects the blue object as red and reports it to the server. The second JetBot detects the blue object as blue and reports it to the server. The server detects the disagreement and directs JetBot 3 to be deployed. Once JetBot 3 reports back, the server makes the final decision based on JetBot 3’s output and identifies which JetBot has been compromised. Since decision making goes through the local server, the communication cost for individual agents is reduced. To make navigation more resource-efficient, we can apply network exploration and compression approaches to pinpoint suitable configurations for deployment.
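The server-side arbitration in this demonstration can be sketched as follows. This is a minimal illustration, not the project's implementation: it swaps in a simple majority vote, which in the three-robot case gives the same outcome as siding with JetBot 3 whenever its report matches one of the first two. The function name, agent names, and labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def arbitrate(reports: Dict[str, str]) -> Tuple[Optional[str], List[str]]:
    """Majority vote over agent reports (an assumed arbitration rule).

    Returns (decision, suspects). The decision is None when no label has
    a strict majority, signalling that another agent should be deployed.
    """
    counts = Counter(reports.values())
    label, votes = counts.most_common(1)[0]
    if votes * 2 <= len(reports):
        return None, []
    # Agents that disagree with the majority are flagged as suspects.
    suspects = [agent for agent, seen in reports.items() if seen != label]
    return label, suspects


# Two robots disagree about the colour of the same object.
reports = {"jetbot1": "red", "jetbot2": "blue"}
decision, suspects = arbitrate(reports)

if decision is None:
    # Hypothetical report from the third robot, used as a tie-breaker.
    reports["jetbot3"] = "blue"
    decision, suspects = arbitrate(reports)

print(decision, suspects)
```

Keeping the vote on the server means each robot only sends its own label, which is consistent with the reduced per-agent communication cost described above.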

Human and Drone Teaming

We proposed an energy-efficient architecture to enable multi-drone video streaming to a HoloLens while applying augmented reality to enhance human-machine teaming. In this video, the drone follows a wall with the help of a lidar sensor and streams its view to the HoloLens.

Energy-Efficient Edge Computing

Artificial Intelligence (AI) and Deep Neural Networks (DNNs) have attracted attention in autonomous systems because they enable applications such as visual perception and navigation. Although cloud-based approaches have already been widely studied, there is growing interest in running AI and DNNs on the edge, as this allows for lower latency and avoids the potential security concerns of transmitting data to a remote server. However, deploying DNNs on edge devices is challenging because computational power is limited and energy efficiency is of the utmost importance. In this work, we introduce an approach named E2EdgeAI for Energy-Efficient Edge computing that takes advantage of AI for autonomous tiny drones. This approach optimizes the energy efficiency of DNNs by considering the effects of memory access and core utilization on the energy consumption of tiny UAVs. For the experiments, we used a tiny drone named Crazyflie with the AI-deck expansion, which includes an octa-core RISC-V processor. The experimental results show that the proposed approach reduces the model size by up to 14.4x, improves energy per inference by 78%, and increases energy efficiency by 5.6x.

Human-Machine Teaming with Crazyflie and HoloLens: Person Detection by Crazyflie and Location Projection on HoloLens

Sim2Real Reinforcement Learning: Crazyflie Reaching Goals and Avoiding Obstacles

Low Power Multi-Agent Reinforcement Learning for UAVs and UGVs and Language Guided Reinforcement Learning for Human-Agent Teaming

This recently funded ARL ArtIAMAS project aims to develop energy-efficient AI-driven approaches with heterogeneous autonomous edge devices for teaming, scene understanding, and decision making in adversarial settings. Reinforcement Learning (RL) has shown great benefits in command and control. However, training becomes significantly more challenging when we scale to multi-agent and/or real-world environments. In this project, we proposed to divide the tasks hierarchically and to organize the learning of the multiple agents hierarchically as well, which can significantly improve training. The second part of the presentation shows our project on Language Guided Reinforcement Learning for Human-Agent Teaming. We proposed a framework to train RL agents conditioned on constraints expressed in structured language, thus reducing the effort needed to design and integrate specialized rewards into the environment. In our experiments, we show that this method can be used to ground the language to behaviors and enable the agent to solve tasks while following the constraints. We also show how the agent can transfer these skills to other tasks.

An Energy Efficient and Flexible Multichannel Electroencephalogram (EEG) Artifact Detection

This project aims to develop energy-efficient and flexible multichannel electroencephalogram (EEG) artifact detection and identification networks and their reconfigurable hardware implementations. EEG signals are recordings of brain activity. Components of EEG recordings that do not originate from cerebral activity are termed artifacts. Our proposed models do not need expert knowledge for feature extraction or pre-processing of EEG data and have very efficient architectures that can be implemented on mobile devices. The proposed networks can be reconfigured for any number of EEG channels and artifact classes. Experiments were done with different deep learning models (i.e., CNN, Depthwise Separable CNN, LSTM, Conv-LSTM) with the goal of maximizing the detection/identification accuracy while minimizing the number of weight parameters and required operations.
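To illustrate why a depthwise separable CNN helps minimize weight parameters, the sketch below compares the standard parameter-count formulas for a regular 2D convolution and a depthwise separable one. The layer dimensions are hypothetical and are not taken from the project's networks. Biases are ignored and square kernels are assumed.

```python
def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 32 input channels, 64 output channels, 3 x 3 kernel.
standard = conv2d_params(32, 64, 3)
separable = depthwise_separable_params(32, 64, 3)
print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"reduction: {standard / separable:.1f}x")
```

The saving grows with the kernel size and the number of output channels, which is one reason this layer type is attractive for mobile and reconfigurable hardware targets.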

Reinforcement Learning with Highly Reduced Input Size and Model Size

This work demonstrates reinforcement learning with highly reduced input size and model size using the DonkeyCar simulator. Instead of using image observations, this work detects the lane lines in the images and uses the line endpoint coordinates as observations (an 8-element array). The final input size is 8 by 8, obtained by stacking the 8 most recent observations. With this highly reduced input size, we can also reduce the model size to 1 convolution layer and 2 fully connected layers. Furthermore, the extracted line coordinate features cancel out irrelevant background features.
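The 8 by 8 input described above can be assembled with a fixed-length buffer of recent observations. The following is a minimal sketch, not the project's code: the class name is hypothetical, and padding the buffer with zeros before 8 real observations have arrived is an assumption.

```python
from collections import deque
from typing import List, Sequence

OBS_DIM = 8  # line endpoint coordinates per frame (from the text)
STACK = 8    # number of most recent observations kept (from the text)


class ObservationStack:
    """Keeps the most recent observations as a STACK x OBS_DIM input."""

    def __init__(self, obs_dim: int = OBS_DIM, depth: int = STACK) -> None:
        self.obs_dim = obs_dim
        # Assumption: zero-fill so the input shape is fixed from step one.
        self.frames = deque(
            ([0.0] * obs_dim for _ in range(depth)), maxlen=depth
        )

    def push(self, obs: Sequence[float]) -> List[List[float]]:
        """Add one observation and return the stacked input, oldest first."""
        if len(obs) != self.obs_dim:
            raise ValueError(f"expected {self.obs_dim} values, got {len(obs)}")
        self.frames.append(list(obs))  # deque drops the oldest row itself
        return [row[:] for row in self.frames]


stack = ObservationStack()
state = stack.push([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
print(len(state), len(state[0]))
```

The resulting input has 64 values per step, which is what makes the small network of 1 convolution layer and 2 fully connected layers feasible.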

A 0.9 TOP/S/W Accelerator for Structurally Compressed DNNs Featuring Cyclic Sparsely Connected Layers @ ISSCC 2020

Sketching based Big Data Acceleration on Low Power Cores

Wireless medical technologies have created opportunities for new methods of preventive care using biomedical implanted and body-worn devices. The design of the technologies that will enable these applications requires correct delivery of the vital physiological signs of the patient along with the energy management in power-constrained devices. The high cost and even higher risk of battery replacement require that these devices be designed and developed for minimum energy consumption.

Deep Neural Nets for Embedded Big Data Applications

We explore the use of deep neural networks (DNNs) for embedded big data applications. Deep neural networks have been demonstrated to outperform state-of-the-art solutions for a variety of complex classification tasks, such as image recognition. The ability to train networks to perform both feature abstraction and classification provides a number of key benefits. One key benefit is that it reduces the burden on the developer to produce efficient, optimal feature engineering, which typically requires expert domain knowledge and significant time. A second key benefit is that the network’s complexity can be adjusted to achieve the desired accuracy. Despite these benefits, DNNs have yet to be fully realized in an embedded setting. In this research, we explore novel architecture optimizations and develop optimal static mappings for neural networks onto highly parallel, highly granular hardware processors such as many-cores and embedded GPUs.

A Low Power Wearable Tongue Drive System for People with Severe Disabilities

This work demonstrates an ultra-low-power multi-sensor Tongue Drive System (TDS) that individuals with severe disabilities can use to control their environment with tongue movements. An ultra-low-power local processor is proposed that performs all signal processing at the sensor side rather than sending out all raw data. The proposed TDS significantly reduces the transmission power consumption and consequently increases battery life. Assuming the TDS user issues one command per second, the proposed local processing reduces the data volume that needs to be wirelessly transmitted to a PC or smartphone by a factor of 1500x, from 12 kbit/s to approximately 8 bit/s. The proposed processor consists of three blocks: an I2C protocol block for communication, External Magnetic Field (EMF) attenuation, and logistic regression machine learning for command classification. The processor is implemented in 65-nm CMOS technology, occupies 0.016 mm² and consumes 3.9 nJ of energy, which is 41 times smaller than the implementation in the previous work. For demonstration, the complete TDS on a headset with FPGA, Bluetooth, battery, and sensors has been tested. The detection accuracy is 90.12%.
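The data-volume figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text. The variable and function names are hypothetical, and the bits-per-command value is derived from those numbers rather than stated by the authors.

```python
RAW_RATE_BPS = 12_000       # raw sensor stream, 12 kbit/s (from the text)
PROCESSED_RATE_BPS = 8      # after local processing, about 8 bit/s (from the text)
COMMANDS_PER_SECOND = 1     # usage assumption stated in the text


def reduction_factor(raw_bps: float, processed_bps: float) -> float:
    """Ratio of raw to processed wireless data rate."""
    return raw_bps / processed_bps


factor = reduction_factor(RAW_RATE_BPS, PROCESSED_RATE_BPS)
bits_per_command = PROCESSED_RATE_BPS / COMMANDS_PER_SECOND

print(f"reduction factor: {factor:.0f}x")
print(f"bits sent per command: {bits_per_command:.0f}")
```

At one command per second, the link only has to carry roughly one byte per command instead of the full raw sensor stream.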