Observing and Mitigating Micro Burst Traffic in Data Center Networks in Java

Abstract:

Micro-burst traffic is not uncommon in data centers. It can cause packet drops, which may result in serious performance degradation (e.g., the Incast problem). However, current approaches to mitigating micro-bursts are usually ad hoc and not based on a principled understanding of the underlying behaviors. Moreover, traditional studies focus on traffic burstiness within a single flow, whereas micro-burst traffic in data centers can arise from high fan-in communication patterns, and its dynamic behavior is still unclear. To this end, in this paper, we re-examine micro-burst traffic in typical data center scenarios. We find that the evolution of a micro-burst is determined by both TCP's self-clocking mechanism and its congestion control algorithm. In addition, the dynamic behaviors of micro-bursts under various scenarios can all be described by the time derivative of the queue length evolution. Our observations also imply that conventional solutions such as absorbing and pacing are ineffective at mitigating micro-burst traffic. Instead, senders need to respond rapidly to explicit signals of the queue buildup caused by micro-burst traffic, rather than pacing themselves independently and in isolation. Inspired by the findings and insights from our experimental observations, we propose the Micro-burst-Aware Transport Control Protocol (MATCP), which leverages characteristic behaviors of micro-burst traffic derived from the time derivative of the queue occupancy. MATCP can suppress the sharp queue length increment by over 2x and reduce the tail query completion time by up to 84.4%.
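
To make the idea of a queue-length time derivative concrete, the following is a minimal Java sketch that estimates the slope of queue occupancy from timestamped samples and flags intervals where the slope exceeds a threshold. It is only an illustration and not the MATCP algorithm: the class name `QueueSlopeSketch`, the methods `slopes` and `burstIntervals`, the sample values, and the threshold are all assumptions introduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of MATCP: estimates dq/dt from queue samples.
final class QueueSlopeSketch {

    // Finite-difference estimate of the queue-length derivative.
    // queueBytes[i] is the queue occupancy observed at timeMicros[i];
    // the result has one entry per pair of consecutive samples (bytes per microsecond).
    static double[] slopes(long[] timeMicros, long[] queueBytes) {
        if (timeMicros.length != queueBytes.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        if (timeMicros.length < 2) {
            return new double[0];
        }
        double[] out = new double[timeMicros.length - 1];
        for (int i = 1; i < timeMicros.length; i++) {
            long dt = timeMicros[i] - timeMicros[i - 1];
            if (dt <= 0) {
                throw new IllegalArgumentException("timestamps must be strictly increasing");
            }
            out[i - 1] = (double) (queueBytes[i] - queueBytes[i - 1]) / dt;
        }
        return out;
    }

    // Indices of the sampling intervals whose slope exceeds the (assumed) threshold.
    static List<Integer> burstIntervals(double[] slopes, double threshold) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < slopes.length; i++) {
            if (slopes[i] > threshold) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up samples: the queue grows slowly, then jumps between 200 us and 300 us.
        long[] t = {0, 100, 200, 300, 400};
        long[] q = {0, 1_500, 3_000, 48_000, 49_500};
        double[] s = slopes(t, q);
        System.out.println(burstIntervals(s, 100.0)); // prints [2]
    }
}
```

In this sketch a sharp positive slope stands in for the "explicit signal of queue buildup" mentioned above; how such a signal would be measured at a switch and conveyed to senders is outside what the abstract specifies.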