Paper summary. Distributed Deep Neural Networks over the Cloud, the Edge, and End Devices

This paper is by Surat Teerapittayanon, Bradley McDanel, and H.T. Kung at Harvard University and appeared in ICDCS'17. 

The paper is about partitioning a DNN for inference between the edge and the cloud. There has been other work on edge-partitioning of DNNs, most recently the Neurosurgeon paper. The goal there was to find the most energy-efficient or fastest-response-time partitioning of a DNN model for inference between the edge device and the cloud.

This paper adds a very nice twist to the problem. It adds an exit/output layer at the edge, so that if there is high confidence in the classification output, the DNN replies early with the result, without going all the way to the cloud and processing the entire DNN. In other words, samples are classified and exited locally at the edge when the system is confident, and offloaded to the cloud when additional processing is required.
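To make the control flow concrete, here is a minimal sketch of inference with one edge exit. The module names (edge_layers, edge_exit, cloud_layers, cloud_exit) and the is_confident test are hypothetical placeholders, not from the paper; the paper's actual confidence criterion (a normalized entropy threshold) is described below.

```python
import torch.nn.functional as F

def ddnn_infer(x, edge_layers, edge_exit, cloud_layers, cloud_exit, is_confident):
    """Early-exit inference sketch: answer at the edge when confident,
    otherwise offload the intermediate features to the cloud."""
    z = edge_layers(x)                        # lower DNN layers, run on the edge
    edge_probs = F.softmax(edge_exit(z), dim=-1)
    if is_confident(edge_probs):              # e.g., normalized entropy below T
        return edge_probs.argmax(dim=-1)      # early exit: no cloud round-trip
    z = cloud_layers(z)                       # offload intermediate features
    return F.softmax(cloud_exit(z), dim=-1).argmax(dim=-1)
```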

This early exit at the edge is achieved by jointly training a single DNN with an edge exit point inserted. During training, the loss is calculated by taking into account both the edge exit point and the ultimate exit/output point at the cloud end. The joint training does not need to be done via device/cloud collaboration; it can be done centrally, e.g., at a datacenter.

DDNN architecture


The paper splits the serving/inference hierarchy into local, edge, and cloud tiers, but I think the local-versus-edge distinction is somewhat superfluous for now. So in my summary, I use only two tiers: edge and cloud.

Another thing this paper does differently from existing edge/cloud partitioning work is its support for horizontal model partitioning across devices. Horizontal model partitioning for inference is useful when each device is a sensor capturing only one sensing modality (or one viewpoint) of the same phenomenon.
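With horizontal partitioning, each device computes features locally, and the per-device features are aggregated before the shared layers above them. A minimal sketch of that aggregation step; the scheme names here reflect my understanding that the paper considers pooling and concatenation variants:

```python
import torch

def aggregate(device_feats, scheme="max"):
    """Combine feature vectors from multiple end devices (e.g., one per
    camera) into a single input for the shared layers above them."""
    stacked = torch.stack(device_feats)        # (num_devices, batch, feat_dim)
    if scheme == "max":
        return stacked.max(dim=0).values       # elementwise max pooling
    if scheme == "avg":
        return stacked.mean(dim=0)             # elementwise average pooling
    return torch.cat(device_feats, dim=-1)     # concatenation
```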

DDNN uses binary neural networks to reduce the memory cost of the NN layers that run on resource-constrained end devices. A shortcoming of the paper is that the NN layers that end up in the cloud are also binarized. That is unnecessary, and it may hurt accuracy.
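For context, binary neural networks quantize full-precision weights (and often activations) to ±1, cutting memory per weight from 32 bits to 1. A generic sketch of the standard sign/straight-through-estimator trick behind such networks, not code from the paper:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator: forward uses
    +1/-1 weights, backward passes gradients through near-zero weights."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)                   # each weight becomes +1 or -1
    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        return grad_out * (w.abs() <= 1)       # clip gradient outside [-1, 1]
```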

DDNN training

At training time, the losses from the exits are combined during backpropagation so that the entire network can be jointly trained, and each exit point achieves good accuracy relative to its depth. The idea is to optimize the lower parts of the DNN to create feature representations good enough to support both the samples exited locally and those processed further in the cloud.

The training is done by forming a joint optimization problem that minimizes a weighted sum of the loss functions of the exits. The paper's experiments weight all exits equally.
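A minimal sketch of this weighted multi-exit loss, assuming cross-entropy as the per-exit classification loss (the structure, a weighted sum over exits, is what matters):

```python
import torch.nn.functional as F

def joint_loss(exit_logits, target, weights=None):
    """Weighted sum of per-exit classification losses; backpropagating it
    trains the shared lower layers against all exits jointly. The paper's
    experiments use equal weights."""
    if weights is None:
        weights = [1.0] * len(exit_logits)
    return sum(w * F.cross_entropy(logits, target)
               for w, logits in zip(weights, exit_logits))
```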

They use a normalized entropy threshold as the confidence criterion that determines whether to classify (exit) a sample at a particular exit point. Normalized entropy close to 0 means the DDNN is confident about its prediction for the sample. At each exit point, the normalized entropy is computed and compared against a threshold T to determine whether the sample should exit there.
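Concretely, for a probability vector x over |C| classes, the normalized entropy is η(x) = −(Σ_i x_i log x_i) / log|C|, which falls in [0, 1]. A small sketch:

```python
import math
import torch

def normalized_entropy(probs):
    """Entropy of the exit's softmax output, scaled by log |C| so it lies
    in [0, 1]; values near 0 indicate a confident prediction."""
    num_classes = probs.shape[-1]
    ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return ent / math.log(num_classes)

def should_exit(probs, threshold):
    """Exit at this point if the prediction is confident enough."""
    return normalized_entropy(probs) < threshold
```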

The exit points are determined and inserted manually before training. Future research should look into automating and optimizing the insertion of exit points.

DDNN evaluation

The evaluation uses a dataset of 6 cameras capturing the same object (person, car, bus) from different perspectives. The task is to classify each sample into three categories: person, car, and bus. This task may be overly simple, and the evaluation is weak overall: the multicamera dataset has only 680 training samples and 171 testing samples.
