Low Latency Virtualization-based Fault Tolerance

2016-07-07

Abstract

Virtualization technology has been widely adopted to reduce IT cost, improve management, and increase service reliability by consolidating hardware servers and providing automated virtual infrastructure. However, the reliability of virtual machines running on virtualized servers is threatened by hardware failures beneath the whole virtual infrastructure, and the nosy hypervisors that support the virtual machines cannot be trusted. To protect a virtual machine from hardware failures, we design, implement, and evaluate a virtualization-based fault tolerance system for an individual virtual machine. We choose the epoch-based fault tolerance method because it supports multi-core platforms and imposes less performance overhead on the backup machine than the log-replay method. However, the epoch-based method introduces long latency, so we need to optimize processor usage and save backup bandwidth. We propose two optimizations: tracking dirty virtual device states to reduce processor usage, and fine-grained dirty region tracking to reduce backup bandwidth. Furthermore, we address unexpectedly long snapshot times with a pending-list method, and we resolve the TCP performance problems caused by holding the output buffer with a fake-ACK optimization. Finally, we present experiments that measure the effect of these optimizations and show that the resulting virtualization-based fault tolerance system achieves low overhead and low latency.
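To make the idea of fine-grained dirty region tracking more concrete, the following is a minimal Python sketch, not the system's actual implementation. It assumes that the primary keeps the previous epoch's copy of each dirty page and compares it with the current copy at sub-page granularity, so that only the changed regions are sent to the backup. The names `PAGE_SIZE`, `REGION_SIZE`, `dirty_regions`, and `apply_regions`, as well as the 256-byte region size, are hypothetical and chosen only for illustration.

```python
PAGE_SIZE = 4096    # assumed page size in bytes
REGION_SIZE = 256   # hypothetical sub-page tracking granularity


def dirty_regions(old_page, new_page, region_size=REGION_SIZE):
    """Return (offset, data) pairs for the regions that differ between
    the previous-epoch copy and the current copy of one page."""
    if len(old_page) != len(new_page):
        raise ValueError("page copies must have the same length")
    changed = []
    for offset in range(0, len(new_page), region_size):
        chunk = bytes(new_page[offset:offset + region_size])
        if chunk != bytes(old_page[offset:offset + region_size]):
            changed.append((offset, chunk))
    return changed


def apply_regions(backup_page, regions):
    """Patch the backup's copy of the page (a bytearray) in place."""
    for offset, data in regions:
        backup_page[offset:offset + len(data)] = data


# Example: two bytes change within one epoch, in two different regions.
old = bytes(PAGE_SIZE)
new = bytearray(old)
new[10] = 0xFF
new[3000] = 0x01

regions = dirty_regions(old, new)
sent_bytes = sum(len(data) for _, data in regions)

backup = bytearray(old)
apply_regions(backup, regions)
```

In this example only two 256-byte regions (512 bytes) would be transferred instead of the full 4096-byte page. The trade-off in such a scheme is that smaller regions save more bandwidth but cost more comparison work per page on the primary.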