Reducing Network Latency for Low-cost Beowulf Clusters

Carver, Eric R

2014, MS, University of Cincinnati, Engineering and Applied Science: Computer Engineering.
Parallel Discrete Event Simulation (PDES) is a fine-grained parallel application that can be difficult to optimize on distributed Beowulf clusters. A significant challenge on these compute platforms is the relatively high network latency compared to the high CPU performance of each node. Because communication is frequent and network latency is high, event information sent between nodes can arrive after a significant delay, during which the processing node is either waiting for the event to arrive (conservatively synchronized solutions) or prematurely processing events while the transmitted event is in transit (optimistically synchronized solutions). Thus, solutions that reduce network latency are crucial to the deployment of PDES. Conventional approaches to reducing network latency in cluster environments are to use high-priced hardware such as InfiniBand and/or lightweight messaging layers other than TCP/IP. However, clusters are generally high-cost systems (tens to hundreds of thousands of dollars) that, by necessity, must be shared. Lower-latency hardware such as InfiniBand can nearly double the hardware cost, and replacing the TCP/IP network stack on a shared platform is generally infeasible: other users of the platform (running coarse-grained parallel computations) are well served by the TCP/IP stack and unwilling to rewrite their applications for the APIs of alternate network stacks. Furthermore, configuring the hardware to support multiple messaging transport layers is difficult and not generally supported. Low-cost, small-form-factor compute nodes with multi-core processors are becoming widely available. These nodes have lower-performing processors yet often still support 100 Mb/s or 1 Gb/s Ethernet, which narrows the disparity between network latency and processor performance.
The much lower per-node cost (on the order of $200 per node) can enable the deployment of non-shared, dedicated clusters and thus may make such clusters an attractive platform for customizing the network to support PDES applications. This thesis explores this option using ODROID compute nodes for the cluster. The conventional TCP/IP networking stack is replaced with the (publicly available) RDMA over Converged Ethernet (RoCE) networking layer, which has significantly lower latency costs. We find that the RoCE solution can reduce end-to-end small-message latency by more than 30%. This translates to a performance improvement of greater than 10% (compared to the TCP/IP solution) for PDES applications using Rensselaer's Optimistic Simulation System (ROSS). However, when the ODROID-based cluster's performance is weighed against its cost, both in terms of operations per second and PDES performance, we find that its performance does not justify its price for either application.
Philip Wilsey, Ph.D. (Committee Chair)
Wen Ben Jone, Ph.D. (Committee Member)
Carla Purdy, Ph.D. (Committee Member)
78 p.

Recommended Citations

  • APA Style (7th edition): Carver, E. R. (2014). Reducing Network Latency for Low-cost Beowulf Clusters [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1406880971
