Business Affiliate Programs · SEO · Personals · Advertising · Resources

Annotated References on Bitmap Index

Synopses by John Wu unless specified otherwise

Email a paper for inclusion
Suggesting your own paper is encouraged


Research Articles

[2013] • [2012] • [2011] • [2010] • [2009] • [2008] • [2007] • [2006] • [2005] • [2004] • [2003] • [2002] • [2001] • [2000] • [1990s] • [1980s] • [related]

2013

Indexing Million of Packets per Second using GPUs. F Fusco, M Vlachos, X Dimitropoulos, L Deri. IMC'13
Source: NTOP.org IMC'13
Synopsis: In this work, the authors propose to intelligently offload indexing to commodity General Processing Units (GPUs). They introduce algorithms for building compressed bitmap indexes in real time on GPUs and achieve indexing throughputs of up to 185 millions records per second, which is an improvement by one order of magnitude compared to the state-of-the-art.
Dr. Fusco has worked on a number of projects using bitmap indexes.

2012

Efficient Iceberg Query Evaluation Using Compressed Bitmap Index. B He, Hui-I Hsiao, Z Liu, H Yu. Y Chen. IEEE Transactions on Knowledge and Data Engineering
Source: IEEE TKDE
Synopsis: In this paper, the authors exploited the property of bitmap index and developed a very effective bitmap pruning strategy for processing iceberg queries. Their index-pruning-based approach eliminated the need of scanning and processing the entire data set (table) and thus speeds up the iceberg query processing significantly. Experiments showed that their approach was much more efficient than existing algorithms commonly used in row-oriented and column-oriented databases.

2011

The SB-index and the HSB-index: efficient indices for spatial data warehouses. T. Siqueira, R. Ciferri, V. Times, C. Ciferri. Geoinformatica
Source: Geoinformatica
Synopsis: This journal paper describes the SB-index and HSB-index for spatial data warehouses (SDW) developed on top of FastBit indexes. Tests show the the new methods are able to reduce the query response time by 68 - 99% while adding less than 1% to the overall space requirement.
This is the journal version of a conference paper published in 2009.

2010

Analyses of Multi-Level and Multi-Component Compressed Bitmap Indexes. K. Wu, A. Shoshani, and K. Stockinger. 2010. ACM Transactions on Database Systems, v 35, Article 2, 2010.
Source: [ACM] [Preprint as LBNL-60891]
Synopsis: This article presents an indepth analysis of a variety of bitmap indexing methods. Among the popular bitmap indexing methods, the analysis showed that the bit-sliced index and the compressed equality index occupy two extreme points on the space-time trade-off curve. The analysis also shows a way to construct multi-level bitmap indexes that offer better space-time trade-off than known methods.
The analysis concentrates on the worst-case scenario with the random data. It considers only one compression method, the Word-Aligned Hybrid code.
Position list word aligned hybrid: optimizing space and performance for compressed bitmaps. F. Deliege, and T. B. Pedersen. 2010. EDBT '10, Pages 228--239, 2010.
Source: [ACM] [EDBT]
Synopsis: This paper presents a compact structure for bitmaps. It extends a trick from Byte-aligned Bitmap Code to make Word-Aligned Hybrid (WAH) code more compact. A WAH fill word represents a group of identical bits (called a fill), however, there many cases where such a fill is followed by a literal word with only a single bit that is different. With 32-bit words, PLWAH uses 5 bits to represent the position of this bit that is different. In both BBC and PLWAH, the group of mixed bits following a fill is combined into the data representing the fill.
For certain types of data, PLWAH can reduce the space requirement of WAH by a factor of two.
Concise: Compressed `n' Composable Integer Set. A. Colantonio, and R. Di Pietro. 2010. Information Processing Letters, v 110, Pages 644--650, 2010.
Source: [ScienceDirect] [Arxiv]
Synopsis: This paper presents a compact structure for bitmaps that is very similar to PLWAH. Both of them modifies the WAH fill word to absorb an adjacent literal word. The key difference is that this technique combines the literal word before the fill word to form a new fill word, while PLWAH absorbs the literal word following the fill word.
NET-FLi: On-the-fly Compression, Archiving and Indexing of Streaming Network Traffic. F. Fusco, M. Ph. Stoecklin, and M. Vlachos. VLDB. 2010.
Source: [VLDB]
Synopsis: This paper presents a system for capturing and indexing streaming network traffic data. What is really intriguing is that it is the third article on modifying the WAH compression method in 2010, following PLWAH and Concise. This variation, called COMPAX, goes one step further than both PLWAH and Concise, it introduces more ways of combining neighboring fills and literal words. It has four different types of compressed words, which is two more than the other WAH variations.
Sorting improves word-aligned bitmap indexes. D. Lemire, O. Kaser, and K. Aouiche. Data & Knowledge Engineering 69(1):3-28. 2010.
Source: ScienceDirect
Synopsis: This paper presents a formal analysis of how reordering the rows affects the Run-Length compressed bitmap indexes. Despite what the title says, this article actually covers a lot more material than sorting. For example, it started with a discussion on bitmap compression and bitwise logical operations, both of which are critical building blocks for a practical bitmap index implementation. It also has an extensive discussion on k-of-N encoding, an idea that was mentioned in an earlier work by Wong et al. (1985), but has not been used much.

2009

Secondary indexing in one dimension: beyond B-trees and bitmap indexes. R. Pagh and S. Satti. PODS 2009
Source: ACM
Synopsis (By Rasmus Pagh): This paper considers range queries whose result is given as a compressed bitmap. A static data structure is given that consists of a number of compressed bitmaps, from which any range query can be answered. Assuming that each bitmap is optimally compressed, the amount of data read is within a constant factor of the size of the result of the range query. This asymptotically improves previous results in cases where the query result is highly compressible. However, no experimental results are provided. The paper also presents some results on a similar, but much more complex, data structure that allows updates.
Secondary Bitmap Indexes with Vertical and Horizontal Partitioning Guadalupe Canahuate, Tan Apaydin, Ahmet Sacan, and Hakan Ferhatosmanoglu. In EDBT 2009.
Source: [ACM]
Synopsis:This paper presents a strategy to index all possible combinations of different column values. It demonstrates that for data with very low attribute cardinalities, the proposed scheme (called ranked Non-clustered Bitmaps) can outperform a method called Secondary Sorted Bitmaps and an open-source software called FastBit.
New Binning Strategy for Bitmap Indices on High Cardinality Attributes. Navneet Goyal and Yashvardhan Sharma. In Compute 2009, Jan 9,10, Bangalore, Karnataka, India.
Source: [ACM]
Synopsis: This paper proposes to use overlapped bins to reduce the query processing time. To reduce the possible overlapping choices to consider, they assume the queries to be processed are known ahead of time.
The impact of spatial data redundancy on SOLAP query performance. T. Siqueira, C. Ciferri, V. Times, A. de Oliveira and R. Ciferri. Journal of the Brazilian Computer Society 15(1)19-34. 2009.
Source: Brazilian Computer Society
Synopsis: This is an extension of an earlier paper. It adds a more formal description of the problem being addressed, and provides additional performance evaluations as well. In the ealier version, the take-home message was that the new SB-Index was much more compact; in this paper, the emphasis is on the reduction of query response time. The new index can reduce the query response time by 80 - 95%. Very impressive!
A spatial bitmap-based index for geographical data warehouses. T. Siqueira, R. Ciferri, V. Times, C. Ciferri. 2009 Symposium on Applied Computing
Source: ACM
Synopsis: The Spatial Bitmap Index introduced in this paper uses compressed bitmaps to represent Minimal Bounding Rectangles (MBR) for complex objects in geographical data warehouses. Tests on two versions of TPC-H benchmark (called Geographical Redundant Star Schema Benchmark and Geographical Hybrid Star Schema Benchmark) show that using the new index can reduce query answering time by 25 - 95%. Between the two benchmarks, the Geographical Redundant Star Schema Benchmark seems to have denormalized the underlying data unnecessarily, which make it taking 10 times the space as the Geographical Hybrid Star Schema Benchmark. Using the later benchmark, the reductions in query answering time are between 90 - 95%, which can be expressed as speedup of 10 - 20 times.

2008

An Adequate Design for Large Data Warehouse Systems: Bitmap Index versus B-tree Index. M. Zaker, S. Phon-Amnuaisuk and S. Haw. International Journal of Computer and Communication, 2:2, 39--46, 2008.
Source: [University Press]
Synopsis: This article aims to provide a up-to-date guidelines for choosing between the bitmap index and the B-tree index. Rather than giving general advices such as those based on attribute cardinality, it provides some concrete measurements with Oracle 11G. One key observation is that bitmap indexes are more efficient in answering queries than B-trees even on high cardinality data. This agrees with some recent observations from others, e.g. Wu et al., 2006 and Sharma 2005. At the same time, the authors also cast some doubts on how the cardinality dominates the performance characteristics of bitmap indexes.
Another noteworthy observation is that the bitmap indexes can be constructed 8 to 10 times faster than the B-trees in six out of the seven test cases (shown in Table IV). In the last case, the bitmap index was constructed 3 times as fast as the B-tree on the same set of data.
Investigating the Effects of Spatial Data Redundancy in Query Performance over Geographical Data Warehouses. T. Siqueira, R. Ciferri, V. Times, C. Ciferri. GEOINFO 2008
Source: GEOINFO 2008
Synopsis: This paper investigates how the data redundancy affects the query performance in Geographical Data Warehousing applications. For a specialist in bitmap indexing, there are two take-home messages. (1) Compressed bitmap indexes are very effective in this type of application. The compression used in this work is the Word-Aligned Hybrid (WAH) code. (2) Join indexes are much large than the regular column indexes. Since a join index is built on the cross product of two tables, we would expect it to take more space than the index built for a single column. However, we were surprised by the difference. The examples given in this paper showed that the join indexes are 1000 times larger than the regular column indexes.
Application of Bitmap Index to Information Retrieval. K. Fujioka, Y. Uematsu, and M. Onizuka. WWW 2008
Source: [ACM]
Synopsis: This paper proposes a hierarchical structure called HS-bitmap index to represent document-term matrix. The authors implemented their data structure on PostgreSQL and observed it to perform better than an inverted index. A short-coming might be that HS-bitmap index takes more space than the inverted index even after compression.
Note this work makes use of PostgreSQL but is unrelated to the on-going work of implementing bitmap index in PostgreSQL.

2007

Bitmap Index Design Choices and Their Performance Implications. E. O'Neil, P. O'Neil and K. Wu. IDEAS 2007
Source: [IEEE] [preprint as LBNL-62756]
Synopsis: By far the most important thing to get out of this paper is that packing the bitmaps as tightly as possible is a good approach. This is a big part of the improvements on the RIDBit code during the project -- the text of the paper alluded to the improvements twice. The second point to note is that it demonstrated again that vertical partitioning is a good thing for read-only data.
Multi-resolution bitmap indexes for scientific data. Rishi R. Sinha and Marianne Winslett. ACM Trans. Datab. Syst., v32:3 article #16. 2007.
Source: [ACM]
Synopsis: This paper presents a set of multi-level binning strategies. In their tests, the multi-level indexes were shown to be about 10 times faster than the single-level indexes. Potentially, most of the performance gain is from reduction in the number of bitmaps accessed during query processing. The same authors have published some timing measurements before [IPDPS 2006], the key contribution of this paper is a systematic analysis of the multi-level indexes. This is also a big part of Sinha's PhD thesis on indexing scientific data.
The approach of using multi-level bitmap indexes was also reported earlier in a tech report in 2001.

2006

Optimizing bitmap indices with efficient compression. K. Wu, E. Otoo, and A. Shoshani. 2006. ACM Transactions on Database Systems, v 31, pages 1-38, 2006.
Source: [ACM] [Preprint as LBNL-49626]
Synopsis: This paper presents a formal analysis of the WAH compressed (basic) bitmap index. It formally establishes the WAH compressed index as one of the optimal indexing methods, an exclusive club including the best of B-tree variants such as B*-tree and B+tree. This puts the WAH compressed bitmap index at the same theoretical standing as the best of the B-trees. However, compressed bitmap indexes have an unique advantage. Because it is much easier to combine the results from multiple bitmap indexes, it is more efficient to answer a query using multiple bitmap indexes than using multiple B-Tree indexes. Therefore, bitmap indexes are more appropriate in query-intensive applications, such as data warehousing and business intelligence.
Scatter Bitmap: Space-Time Efficient Bitmap Indexing for Equality and Membership Queries. S. Vanichayobon, J. Manfuekphan, and L. Gruenwald. 2006 IEEE Conference on Cybernetics and Intelligent Systems.
Source: IEEE
Synopsis: This paper presents interesting example of 2-out-N encoding postulated earlier by Wong et al.. Compared with one-component equality, range, and interval encoding, the Scatter Bitmaps are more compact. For equality and membership queries, it is more efficient than these one-component encodings. Compared with the binary encoding, it takes more space, but can answer queries faster.
Approximate encoding for direct access and query processing over compressed bitmaps. Tan Apaydin, Guadalupe Canahuate, Hakan Ferhatosmanoglu, and Ali Saman Tosun. In Proceedings of the 32nd international conference on Very large data bases, Seoul, Korea. Pages: 846 - 857. 2006.
Source: [Tan's copy] [Hakan's copy]
Synopsis: This paper presents an indexing technique based on multiple hashes. This indexing method can reduce the index sizes as well as the query response time. The trade-off is that it answers queries approximately. The paper also shows a way to predict the rate of false positives. Note that the technique presented is similar to Bloom filters.

Original published versions: [VLDB] [ACM]

On-Disk Bitmap Index In Bizgres. A. Parashar and J. Zhang. 2006.
Source: [GreenPlum]
Synopsis: This presentation describes a bitmap index implementation effort at GreenPlum. Since Bizgres is based on Postgres and bitmap is making some headway into PostgreSQL, we may see full-fledged bitmap index in PostgreSQL yet.
A bitmap index implementation is available as a patch to PostgresSQL as of May 2007.

2005

Supporting RFID-based item tracking applications in Oracle DBMS using a bitmap datatype. Y. Hu, S. Sundara T. Chorma J. Srinivasan. VLDB 2005. Pages: 1140-1151.
Source: [VLDB] [ACM]
Synopsis: This paper introduces a new data type in ORACLE DBMS called bitmap and a set of techniques to work with these bitmaps efficiently. The Summary Bitmap Tree described in this paper is similar to certain multi-level bitmap indexes with equality encoding.

2004

On the Performance of Bitmap Indices for High Cardinality Attributes. K. Wu, E. Otoo, and A. Shoshani. 2004. VLDB 2004, pages 24 - 35.
Source: [VLDB] [Preprint as LBNL-54673]
Synopsis: This paper describe the critical algorithm to make compressed bitmap index answers queries in time that is no worse than a linear function of the number of hits (i.e., the number of records selected by the query condition). In terms of computational complexity, this places WAH compressed bitmap index in the same category of optimal indexing methods as the best of the B-tree indexes.
The critical algorithm is to perform bitwise logical OR operations on a large number of small compressed bitmap in time that proportional to the total size of all the bitmaps involved.

2003

Hierarchical Bitmap Index: An Efficient and Scalable Indexing Technique for Set-Valued Attributes. M. Morzy, T. Morzy, A. Nanopoulos, Y. Manolopoulos. Advances in Databases and Information Systems, 2003. Pages: 236-252.
Source: [Springer] [CiteSeer]
Synopsis: The Hierarchical Bitmap Index proposed in this paper is in fact of hierarchy of signatures that are designed to answer the set-containment queries. It is a version of a signature tree. The signatures are organized into a tree structure with the higher level nodes indicating the non-empty nodes at the next level. The leaf nodes contains information about the presence of a value in a set.

2001

A performance comparison of bitmap indexes. K. Wu, E. Otoo and A. Shoshani. 2001. CIKM 2001.
Source: [ACM] [Preprint of as tech report LBNL-48975]
Synopsis: This is a brief write up about WAH compression and its performance on a small subset of data from a high-energy project called STAR. The work with STAR eventually lead to the development of Grid Collector.
Bit-sliced index arithmetic. D. Rinfret, P. O'Neil, and E. O'Neil. 2001. SIGMOD Rec. 30, 2 (Jun. 2001), 47-57.
Source: [ACM] [PDF from authors.]
Synopsis: This paper describes a number of algorithms for operating on the bitmaps of a bit-sliced index to compute various qualities such as SUM, MAX and so on. These operations are useful for aggregate queries. This paper also addresses the issue of Bit-Sliced Term-Match for Information Retrieval.
BitCube: A Three-Dimensional Bitmap Indexing for XML Documents. J. P. Yoon, V. Raghavan, V. Chakilam, and L. Kerschberg. Journal of Intelligent Information Systems, Vol. 17, November 2001.
Source: [Springer] [CoverPages]
Synopsis: In this paper, the authors describe a bitmap indexing technique for clustering XML documents. The BitCube is formed from three dimensions of documents (d), XML-elements (p), and terms (w). The BitCube can be used to efficiently locate documents containing XML-elements and terms. In addition, it can also be used to computer aggregate values such as averages and standard deviations.
There are quite a few follow-on work using BitCube.

2000

The spatial bit-sliced: An index structure for spatial databases. A. F. Al-Badarneh. 2000. PhD Thesis. Wayne State University.
Source: [UMI]
Synopsis: This dissertation presents Spatial Bit-Sliced (SBS) index based on Bit-Sliced indexing method. The main conclusion from their studies is that the performance of SBS index structure is “immune” to the query size, regardless of the query size, while other indexing strategies such as R*-tree exhibit significant variations depending on the query size.

1990s

Performance Measurements of Compressed Bitmap Indices. T. Johnson. 1999. VLDB 1999. Pages 278-289. Morgan Kaufmann Publishers, San Francisco, CA.
Source: [VLDB] [ACM]
Synopsis: This paper describes a set of performance measurements comparing a number of different bitmap compression techniques. In the tests, no compression technique was the clear winner, even though the Byte-aligned Bitmap Code (BBC) was the most efficient in many cases. The author eventually developed a strategy to choose the most appropriate compression method based characteristics of the bitmaps.
An efficient bitmap encoding scheme for selection queries. C. Chan and Y. Ioannidis. 1999. SIGMOD '99. ACM Press, New York, NY, 215-226.
Source: [ACM] [SIGMOD]
Synopsis: This paper introduces Interval Encoding and compares its performance characteristics with Equality Encoding and Range Encoding.
Group Bitmap Index: A Structure for Association Rules Retrieval. T. Morzy, M. Zakrzewicz. KDD 1998. Pages: 284-288.
Source: [Posnan]
Synopsis: The Group Bitmap Index was designed to address the association rule retrieval problem. Two variations are proposed in this paper, the Simple Group Bitmap Index and Hash Group Bitmap Index . The Simple Group Bitmap Index is essentially a bitmap join index between item number and the group number (transaction ID). The Hash Group Bitmap Index can be thought of as a one-component Bloom filter, thus the same analysis apply.
Bitmap index design and evaluation. C. Chan and Y. Ioannidis. 1998. SIGMOD '98. ACM Press, New York, NY, 355-366.
Source: [ACM] [Prof. Chan's PDF version (from National University of Singapore)] [PS file from U. Wisc.]
Synopsis: This paper proposes Range Encoding and multi-component encodings. From their analysis, they conclude that using two-component encodings provides the optimal space-time trade-off among the multi-component encodings.
Improved query performance with variant indexes. P. O'Neil and D. Quass. 1997. SIGMOD '97. ACM Press, New York, NY, 38-49.
Source: [ACM]
Synopsis: This paper presents a short review Bitmap indexes, and Bit-Sliced index and Projection index. A Projection index materializes all values of a column in RID order, and a Bit-Sliced index essentially takes an orthogonal bit-by-bit view of the same data, both of which are available in Sybase IQ. This paper also present analysis that demonstrates important performance advantages for variant indexes in some types of SQL aggregation, predicate evaluation, and grouping.
Multi-table joins through bitmapped join indices. P. O'Neil and D. Quass. ACM SIGMOD Record. Volume 24, Issue 3, Pages: 8-11. 1995.
Source: [ACM]
Synopsis: This is the first paper to fully explain the bitmap join indexes for multi-table joins based on foreign keys. Such indexes are used in popular database management systems and are very effective for data warehousing applications. This is an important reason why some people claim that "the star schema and bitmap indexes are a marriage made in heaven."

1980s

Model 204 Architecture and Performance. P. O'Neil. 1987. In Proceedings of the 2nd international Workshop on High Performance Transaction Systems (September 28 - 30, 1987). Lecture Notes In Computer Science, vol. 359. Pages 40-59, Springer-Verlag, London.
Synopsis: This paper contains a full description of the first commercial implementation of bitmap index in a system called Model 204 by Computer Corporation of America.
Bit Transposed Files. H. K. T. Wong, H. Liu, F. Olken, D. Rotem, Linda Wong. 1985. VLDB 1985, Pages: 448-457.
Source: [VLDB] [SIGMOD]
Synopsis: This paper introduces the Bit Transposed File as "an extreme version of the (attribute) transposed file." It stores the bulk of the data as compressed bit vectors and answers queries with bitwise logical operations. This establishes two key characteristics of a bitmap index: using bit vectors to store information and use bitwise logical operations to answer queries. They did not present an implementation of bitmap indexes.

Guides for Practitioners

Indexing Goes a New Direction. R. Winter. 1999. Intelligent Enterprise January 1999
Source: [wintercorp]
Synopsis: This article describes the Encoded Vector Index (EVI) implemented in IBM DB2. EVI can be considered as a version of the binary encoded bitmap index. Another name used in literature is the bit-sliced index.
Bitmap Index vs. B-tree Index: Which and When? V. Sharma. 2005.
Source: [Oracle Technology Network]
Synopsis: This article has detailed explanations of how to use Auricles bitmap index feature. In the opening paragraph, it states that "a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems." In other word, the usefulness of a bitmap index is no longer limited to low cardinality attributes any more! Zaker et al. (2008) gave the concrete measurements to affirm this general observation.
Bitmap Indexes. J. Lewis. 2006. DBAZine.com.
Source: [Part 1: Understanding Bitmap Indexes] [Part 2: Star Transformations] [Part 3: Bitmap Join Indexes]
Synopsis: This series of articles explains the bitmap index in ORACLE and how to structure the database to maximize the advantage of using bitmap indexes.

General Database References

Data Warehousing, Decision Support & OLAP. M. Stonebraker and J. Hellerstein, eds. 1998.
Source: [redbook.cs.berkeley.edu]
Synopsis: A broad review of database technologies with references to bitmap indexing.
Part of the web supplement of "Readings in Database Systems."

Related Lists

Professor Daniel Lemire maintains a list of references on Data Warehousing and OLAP
Wikipedia has an entry on bitmap index
Top 100 results on bitmap index from Google Scholar
Top results on bitmap index from Scientific Commons
John Wu © John Wu
Disclaimer

The views and opinions expressed in this page are strictly those of the page author.
The contents of this page have not been reviewed or approved by the University of Minnesota.