[PerfNA] Performance of host-based Network Applications

With the advancement of highly network-powered paradigms such as 5G and microservices, which are typically deployed as containers/VMs, there is a growing imperative for host nodes to perform specialized network tasks like monitoring, filtering, tunneling, and load-balancing. While these tasks were traditionally performed by switches and specialized middleboxes in the network, there is now a demand to perform them on commodity hardware comprising COTS servers. However, a major challenge is to perform these tasks with low overhead and high reliability while maintaining low latency, high throughput, and flexibility.

Recently, several approaches have emerged from academia and industry for building such high-performance network applications. The proposed solutions are built on different high-speed packet processing targets such as DPDK (Data Plane Development Kit), eBPF (Extended Berkeley Packet Filter), P4-based smart NICs, DPUs, IPUs, OVS, RDMA, FPGA, etc. Each has its own compiler toolchain, accelerators, and programming interface for developers. Some prior work also leverages these solutions to optimize the TCP stack, which benefits applications deployed on COTS servers.

This workshop focuses on understanding the performance aspects of network applications built over such tools and targets. With several heterogeneous approaches available at the hosts, there is a need for dedicated discussions in this area to understand the pros and cons of using different approaches to solve different network issues, and how these approaches can converge into a unified framework at the host. We hope this workshop can serve as a catalyst to initiate discussion across industry and academia on how host-based tools can be leveraged to solve complex network issues, and encourage their wide adoption.

Workshop Program on June 6th

9:00 AM IST Welcome Address by Organizers
Venue: Seminar Room 2, Victor Menezes Convention Centre (IIT Bombay)
9:15 AM - 9:40 AM IST GranularNF: Granular Decomposition of Stateful NFV at 100 Gbps Line Speed and Beyond
Ziyan Wu, Tianming Cui, Arvind Narayanan, Yang Zhang, Kangjie Lu, Antonia Zhai, Zhi-Li Zhang (University of Minnesota – Twin Cities)
Abstract: In this paper, we consider the challenges that arise from the need to scale virtualized network functions (VNFs) at 100 Gbps line speed and beyond. Traditional VNF designs are monolithic in state management and scheduling: they internally maintain all states and the operations associated with them. Without proper design considerations, such designs suffer from limitations when scaling at 100 Gbps link speed and beyond: inefficient cache utilization because of contention from frequent control plane activities, computation- and memory-intensive tasks taking up CPU time, and shared state that forces synchronization among the cores. We address these limitations by arguing for the need to granularly decompose a VNF into data/control components that are co-located within a server but can be independently scaled among the cores. To realize the approach, we design a "serverless" programming framework with novel abstractions to optimize the data components that must process packets at line speed, reduce contention on the data states, and enable run-time scheduling of different components for improved resource utilization. The abstractions, combined with the runtime system that we design, help NFV developers focus on the logic and correctness of VNF programming without worrying about how VNFs may be scaled in or out. We evaluate our platform by comparing it with monolithic approaches using different workloads and by analyzing the advantages of this separation for scalability, performance determinism, and feature velocity.
9:45 AM - 10:15 AM IST Invited Talk: Cooperative Network Functions
Ganesh C. Sankaran (Information Sciences Institute/University of Southern California)
Abstract: When a packet traverses the network towards a destination, it encounters several network switches. These network switches execute a series of functions on the packet. We observe that these functions are repetitive. We then introduce the notion of cooperation among consecutive switches to reduce repetition. We formulate this as an optimization of cooperative network functions. The rest of the talk will present our cooperative parsing and routing works in detail.
Bio: Ganesh is currently a computer scientist at USC's Information Sciences Institute. He received his Masters and Doctoral degrees in Computer Science and Engineering from IIT Madras. He has rich industry experience working for Cisco and Dell EMC. He has more than 20 issued patents and another 7 in various stages in Indian and US patent offices. He has served on the TPCs of ICC ONS and NCC. He is an expert reviewer for INFOCOM and ToN. His interest lies at the intersection of high-speed networks and computing, specifically focusing on cross-layer concerns such as security and power.
10:15 AM - 10:30 AM IST Coffee & Tea Break
10:30 AM - 11:00 AM IST Invited Talk: Software-hardware Co-design for Network Functions in Cloud
Rohan Gandhi (Microsoft)
Abstract: Network Functions (NFs) such as firewalls, IDSes, and load balancers demand low latency and high throughput to improve application performance and lower costs. While the status quo has been to buy specialized devices to implement NFs, there is a growing trend to implement these NFs on commodity servers. Such designs offer high "programmability" and faster development cycles, but suffer from high latency and low throughput per instance because packets are processed in software. On the other hand, P4 and Broadcom based switches offer low latency and ultra-high throughput. We believe that hardware-software co-design offers a sweet spot to get the best of switches and software. In this work, we show how to build such a hybrid design using a load balancer as an example. We show that such a design can substantially improve latency, throughput, and availability at a fraction of the cost.
Bio: Rohan Gandhi is a research engineer at Microsoft Research India. He previously completed his PhD at Purdue University, advised by Prof. Y. Charlie Hu. His research work spans many aspects of networking and systems, including network functions, SDN, cluster scheduling, and key-value stores. His research work is published in many top conferences including ACM SIGCOMM, ACM CoNEXT, USENIX ATC, and ACM EuroSys.
11:00 AM - 11:30 AM IST Invited Talk: Building programmable networks
Rinku Shah (IIIT Delhi)
Abstract: Recent advancements in networking have resulted in high-speed, programmable network cards and switches, with speeds ranging from 100 Gbps to tens of Tbps. Server application designers have offloaded their computations to programmable network hardware to increase application throughput, reduce latency, and save server CPUs. This talk will introduce programmable networking concepts and describe state-of-the-art application offloads. Such application offloads have given rise to newer research domains, such as the design of frameworks for programming abstractions, network management, and security. This talk will introduce these research domains along with their open problems.
Bio: Rinku Shah is currently an Assistant Professor in the CSE department at Indraprastha Institute of Information Technology, Delhi (IIITD). Her research interests lie at the intersection of Software-Defined Networking (SDN), programmable data planes, and 4G/5G mobile networks. Her research is published at reputed conference venues such as SOSR, ICNP, and APNet.
11:30 AM - 12:00 PM IST Invited Talk: NFV and Edge Cloud for Efficient Machine Learning and Security Services
Sameer G Kulkarni (IIT Gandhinagar)
Abstract: Edge clouds are deployed to provide very responsive services to end users. In particular, the increasing demand for cloud-based inference services requires the use of Graphics Processing Units (GPUs). However, edge resources such as CPUs and GPUs are limited and must be shared across multiple concurrently running clients. It is highly desirable to multiplex different inference tasks on the GPU and utilize the GPUs efficiently. Batched processing, CUDA streams, and Multi-Process Service (MPS) help. However, we find that these are not adequate for achieving scalability and do not guarantee predictable performance. Further, edge servers must process considerable amounts of streaming data and account for load variations. In this talk, I will present our NFV platform GSLICE. GSLICE incorporates a dynamic GPU resource allocation and management framework to maximize performance and resource utilization. We virtualize the GPU by apportioning the GPU resources across different Inference Functions (IFs), thus providing isolation and guaranteeing performance. We develop controlled spatial sharing of the GPU, along with self-learning and adaptive GPU resource allocation and batching schemes that account for network traffic characteristics while keeping inference latencies below service level objectives. GSLICE adapts quickly to the streaming data’s workload intensity and the variability of GPU processing costs. GSLICE provides scalability of the GPU for IF processing through efficient and controlled spatial multiplexing, coupled with a GPU resource reallocation scheme with minimal (< 100 microseconds) downtime. Compared to TensorRT and default MPS, GSLICE significantly improves GPU utilization efficiency and achieves 2-13x improvement in aggregate throughput.
Bio: Sameer G Kulkarni is an Assistant Professor in Computer Sciences at the Indian Institute of Technology, Gandhinagar. Prior to this, he worked as a postdoctoral researcher at the University of California, Riverside. He received a Ph.D. degree in Computer Science from the University of Göttingen, Germany in July 2018. He received his M.S. degree in Computer Engineering from the University of Southern California in 2010, and B.E. degree in Computer Science and Engineering from National Institute of Engineering, Mysore, in 2004. He is the recipient of the IEEE TCSC Best PhD Dissertation Award 2019. His work focuses on the resource management aspects of building efficient, scalable, and resilient NFV/Edge platforms. His research interests include Software Defined Networking, Network Function Virtualization, Edge Cloud Platforms, Distributed Systems, and Disaster Management.
12:00 PM - 1:30 PM IST Lunch
1:30 PM - 2:00 PM IST Invited Talk: Domain Specific Run Time Optimization for Software Data Planes
Sebastiano Miano (Queen Mary University of London, UK)
Abstract: State-of-the-art approaches to design, develop, and optimize software packet-processing programs are based on static compilation: the compiler’s input is a description of the forwarding plane semantics and the output is a binary that can accommodate any control plane configuration or input traffic. In this talk, I will demonstrate that tracking control plane actions and packet-level traffic dynamics at run time opens up new opportunities for code specialization. I will present Morpheus, a system working alongside static compilers that continuously optimizes the targeted networking code. Morpheus introduces a number of new techniques, from static code analysis to adaptive code instrumentation, together with a toolbox of domain-specific optimizations that are not restricted to a specific data plane framework or programming language. We applied Morpheus to several eBPF and DPDK programs including Katran, Facebook’s production-grade load balancer, showing considerable improvement in terms of both throughput and latency.
Bio: Sebastiano Miano is a postdoctoral researcher at the School of Electronic Engineering and Computer Science of Queen Mary University of London. He received his Ph.D. and M.Sc. from the Computer Engineering department of Politecnico di Torino, in Italy. During his PhD, he was a visiting researcher at the EIT Digital Silicon Valley Hub in San Francisco and at the University of Cambridge, UK. His research interests include programmable data planes and high-speed network function virtualization with a focus on eBPF and XDP.
2:00 PM - 2:30 PM IST Invited Talk: Composition Framework for Packet-processing Programs
Hardik Soni (NEC Laboratories Europe GmbH)
Abstract: State-of-the-art dataplane languages, like P4, and hardware architectures use domain-specific primitives such as programmable parsers and match-action tables to enable flexible and efficient processing of packets. Unfortunately, P4 programs tend to be monolithic and tightly coupled to the hardware architecture, which makes it hard to write programs in a portable and modular way, e.g., by composing reusable libraries of standard protocols. First, we will discuss various research challenges in writing library modules and target-architecture independent programs. Next, I will present the design and implementation of a novel framework (μP4) comprising a lightweight logical architecture that abstracts away from the structure of the underlying targets and naturally supports powerful forms of program composition. Using examples, we show how μP4 enables modular programming. I will then present a prototype of the compiler that generates code for multiple lower-level architectures, including Barefoot’s Tofino Native Architecture. We evaluate the overheads induced by our compiler on realistic examples and briefly discuss possible optimizations. Finally, we will revisit the same research challenges for SmartNICs and the control plane to draw a roadmap for future research opportunities.
Bio: Hardik is a senior researcher at NEC Laboratories Europe GmbH. He started his research career as a PhD student at INRIA Sophia Antipolis, France, followed by a post-doc at the Computer Science department of Cornell University, USA. Earlier, he worked as a Senior Software Engineer at Alcatel-Lucent, India for a couple of years.
2:30 PM - 3:30 PM IST Graduate Forum
1. Towards an implementation of a Time-Sensitive Networking switch, Joydeep Pal, IISc Bangalore
2. Extracting Network Flow Features Efficiently in P4 Data plane, Sankalp Mittal, IIT Hyderabad
3. Tandem Queue Decomposition: An Optimal Policy for QKD Networks, Krishnakumar G, IIT Madras
4. Detecting adversarial attacks on Bloom filters in P4 data plane systems, K. Shiv Kumar, IIT Hyderabad
5. P4 Programmable Data Plane based MUD Enforcement for IoT Security, Harish S A, IIT Hyderabad
3:30 PM - 3:45 PM IST Tea Break
3:45 PM - 5:15 PM IST eBPF Hands-on
Details of the hands-on session are available Here