Reliability For Network Swapping Systems That Support Migration Of Remotely Swapped Pages

Document Type

Conference Proceeding

Publication Date

2004

Published In

Proceedings Of The 16th IASTED International Conference On Parallel And Distributed Computing And Systems

Abstract

Network swapping systems allow individual cluster nodes with over-committed memory to use the idle memory of remote nodes as their backing store, and to swap their pages over the network. As the number of nodes in a cluster increases, it becomes more likely that a node will fail or become unreachable, making it important that such a system provide reliability support. Without reliability, a single node crash can affect programs running on other cluster nodes by losing remotely swapped page data that was stored on the crashed node. Our network swapping system, Nswap, has design features that complicate reliability: swapped pages can migrate from one node to another in response to changes in a node's local memory needs. As a result, reliability schemes that rely on fixed placement of page and reliability data are not applicable to our system. Our reliability solutions solve the unique challenge of providing reliability to network swapping systems that both support dynamic changes to the size of remote RAM swap space and support migration of remotely swapped page data. Results show that even though our Mirroring reliability scheme adds time and space overhead to Nswap, it still outperforms swapping to disk by a factor of up to 8.2. Our dynamic Parity scheme will provide reliability with minimal time and space overhead.

Keywords

Cluster computing, Network RAM, reliability

Published By

ACTA Press

Conference

16th IASTED International Conference On Parallel And Distributed Computing And Systems

Conference Dates

November 9-11, 2004

Conference Location

Cambridge, MA

