Skip to Main Content
 

Global Search Box

 
 
 
 

ETD Abstract Container

Abstract Header

A Hybrid Non-Clustered Bitmap Index for Supporting High Cardinality Attributes

Abstract Details

2009, Master of Science, Ohio State University, Computer Science and Engineering.

Bitmap indexes are used in a variety of applications, particularly data warehouse systems and scientific databases. In their simplest form, Bitmap indexes are an array of bit vectors for each attribute in the data set. They provide a faster query response time by performing fast logical operations across attribute bitmaps, which are efficiently supported by modern hardware. Bitmap indexes are shown to work well with high-cardinality datasets [1]. However, high cardinality attributes present in the dataset affect the space efficiency of bitmap indexes. There are specialized compression schemes available to overcome these problems by utilizing the standard run length compression algorithms. Byte-aligned bitmap code (BBC) and word-aligned hybrid (WAH) compression techniques are two popular examples.

Sorting can be enforced to keep the data organized so that longer and fewer number of runs are achieved, which leads to better run-length based compression [2]. However, for large dynamic data sets, it becomes infeasible to keep the data always sorted. Also, when we utilize sorting, the initial columns in the data set are sorted more effectively than the rest of the columns. This produces better query response times for the queries which involve only those initial attributes but not other attributes, on which sorting has little impact. The aim of this research is to achieve significant performance gains for queries involving all the attributes in the data set.

Canahuate et al. overcome this problem in their recently proposed Non-Clustered Bitmap (NCB) index [3]. In NCB, data is split into smaller and more manageable chunks using horizontal and vertical partitioning. Horizontal partitioning ensures the partition fits into main memory, whereas vertical partitioning ensures better compression and query response time.

Although NCB has proven to work significantly faster with large datasets with a large number of rows and attributes, the performance gains are low in the presence of high cardinality attributes. We aim to extend NCB to achieve better performance even for high cardinality attributes. We observe in our experimental results that WAH-compressed bitmaps perform slightly better than NCB with high cardinality attributes. Hence we follow a hybrid approach utilizing NCB and WAH with their complementary strengths, and develop a query processing algorithm that combines the partial results from both. Our approach, called NCB-W, applies NCB for low to medium cardinality attributes and WAH compression scheme to generate bitmaps for high cardinality attributes. As query execution time in NCB depends on the size of the EB, NCB-W ensures faster execution time by controlling the size of EB in NCB. Depending on the cardinalities of the attributes queried, NCB-W controls the execution by dynamically selecting the implementation approach. To analyze the performance of NCB-W, we implemented the Star Schema Benchmark (SSB) queries and compare the results with FastBit [4].

Hakan Ferhatosmanoglu (Advisor)
Gagan Agrawal (Committee Member)
57 p.

Recommended Citations

Citations

  • Pendharkar, Y. (2009). A Hybrid Non-Clustered Bitmap Index for Supporting High Cardinality Attributes [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1262203195

    APA Style (7th edition)

  • Pendharkar, Yogesh. A Hybrid Non-Clustered Bitmap Index for Supporting High Cardinality Attributes. 2009. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1262203195.

    MLA Style (8th edition)

  • Pendharkar, Yogesh. "A Hybrid Non-Clustered Bitmap Index for Supporting High Cardinality Attributes." Master's thesis, Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1262203195

    Chicago Manual of Style (17th edition)