Title

PeerMon: A Peer-To-Peer Network Monitoring System

Document Type

Conference Proceeding

Publication Date

2010

Published In

Proceedings Of The 24th International Conference On Large Installation System Administration

Abstract

We present PeerMon, a peer-to-peer resource monitoring system for general-purpose Unix local area network (LAN) systems. PeerMon is designed to monitor system resources on a single LAN, but it could also be deployed on several LANs where some inter-LAN resource sharing is supported. Its peer-to-peer design makes PeerMon a scalable and fault-tolerant monitoring system for efficiently collecting system-wide resource usage information. Experiments evaluating PeerMon's performance show that it adds little additional overhead to the system and that it scales well to large-sized LANs. PeerMon was initially designed to be used by system services that provide load balancing and job placement; however, it can be easily extended to provide monitoring data for other system-wide services. We present three tools (smarterSSH, autoMPIgen, and a dynamic DNS binding system) that use PeerMon data to pick "good" nodes for job or process placement in a LAN. Tools using PeerMon data for job placement can greatly improve the performance of applications running on general-purpose LANs. We present results showing application speed-ups of up to 4.6 using our tools.

Conference

24th International Conference On Large Installation System Administration

Conference Dates

November 7-12, 2010

Conference Location

San Jose, CA

Comments

A recording of this presentation is freely available courtesy of USENIX.