Skip to Main Content
 

Global Search Box

 
 
 
 

ETD Abstract Container

Abstract Header

Parallel Solution of the Subset-sum Problem: An Empirical Study

Abstract Details

2011, Master of Science, Ohio State University, Computer Science and Engineering.

We investigate the parallelization of an algorithm on three very different architectures. These are: a 128-processor Cray XMT massively multithreaded machine, a 16-processor IBM x3755 shared memory machine and a 240-core NVIDIA FX 5800 graphics processor unit (GPU). The problem we use in our investigation is the well-known subset-sum problem. While this is known to be NP-complete, it is solvable in pseudo-polynomial time, i.e., time proportional to the number of input objects multiplied by the sum of their sizes. This product defines the size of the dynamic programming table used to solve the problem. The hypothesis that we wish to test is that the Cray, with its specialized hardware and large uniform shared memory, is suitable for very large problems, the IBM x3755 is suitable for intermediate sized problems and the NVIDIA FX 5800 can give superior performance only for problems that fit within its modest internal memory.

We show that it is straightforward to parallelize this algorithm on the Cray XMT primarily because of the word-level locking that is available on this architecture. For the other two machines we present an alternating word algorithm that can implement an efficient solution. The timings of our respective codes were carefully measured over a comprehensive range of problem sizes. On the Cray XMT we observe very good scaling for large problems and see sustained performance as the problem size increases. However this machine has poor scaling for small problem sizes; it performs best for problem sizes of 1012 bits or more. The IBM x3755 performs very well on medium sized problems, but has poor scalability as the number of processors increases and is unable to sustain performance as the problem size increases. This machine tends to saturate for problem sizes of 1011 bits. The NVIDIA GPU performs well for problems whose tables t within its 4GB device memory. This corresponds to tables of size approximately 1010/.

The experimental measurements support our hypothesis and show that each machine gives superior performance for distinct ranges of problem sizes and demonstrate the strengths and weaknesses of the three architectures.

Ten H. Lai, PhD (Advisor)
Dong Xuan, PhD (Committee Member)
54 p.

Recommended Citations

Citations

  • Bokhari, S. S. (2011). Parallel Solution of the Subset-sum Problem: An Empirical Study [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1305898281

    APA Style (7th edition)

  • Bokhari, Saniyah. Parallel Solution of the Subset-sum Problem: An Empirical Study. 2011. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1305898281.

    MLA Style (8th edition)

  • Bokhari, Saniyah. "Parallel Solution of the Subset-sum Problem: An Empirical Study." Master's thesis, Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1305898281

    Chicago Manual of Style (17th edition)