PeerMon: A Peer-To-Peer Network Monitoring System
Proceedings Of The 24th International Conference On Large Installation System Administration
We present PeerMon, a peer-to-peer resource monitoring system for general-purpose Unix local area network (LAN) systems. PeerMon is designed to monitor system resources on a single LAN, but it could also be deployed on several LANs where some inter-LAN resource sharing is supported. Its peer-to-peer design makes PeerMon a scalable and fault-tolerant monitoring system for efficiently collecting system-wide resource usage information. Experiments evaluating PeerMon's performance show that it adds little overhead to the system and that it scales well to large LANs. PeerMon was initially designed to be used by system services that provide load balancing and job placement; however, it can easily be extended to provide monitoring data for other system-wide services. We present three tools (smarterSSH, autoMPIgen, and a dynamic DNS binding system) that use PeerMon data to pick "good" nodes for job or process placement in a LAN. Tools using PeerMon data for job placement can greatly improve the performance of applications running on general-purpose LANs. We present results showing application speed-ups of up to a factor of 4.6 using our tools.
24th International Conference On Large Installation System Administration
November 7-12, 2010
San Jose, CA
Tia Newhall; Janis Libeks, '10; Ross K. Greenwood, '11; and Jeff Knerr.
"PeerMon: A Peer-To-Peer Network Monitoring System".
Proceedings Of The 24th International Conference On Large Installation System Administration.