Kesheng "John" Wu

PhD graduate of the Computer Science Department
Professional associations
ACM - Distinguished Scientist
IEEE - Senior Member
Current work (at Berkeley Lab, Berkeley Lab on youTube)
Bitmap Index: [FastBit source code] [FastBit documentation]
Our WAH compressed bitmap index, was shown to be significantly faster than the best available commercial DBMS product [SSDBM 2002], and can speed up visualization tasks as well [IEEE VIS 2005]. Extensive analyses show that our method is theoretical optimal in the same sense that B*tree and B+tree are optimal for query processing [ACM TODS 2006, ACM TODS 2010]. Best of all, the source code that implements this indexing method and many others are available for free in a package called FastBit. For its capability in solving big data problems, FastBit has received the R&D 100 Award in 2008 [Photo of developers at R&D 100 Award event].
[Annotated References on Bitmap Index]
Connected Component Labeling: This grows out of our work on feature tracking for combustion data analysis. The key insight is that there is a way to make use of an implicit union-find data structure to speed up the connected component labeling algorithms, which in turn leads to faster algorithms for finding regions of interest. In particular, using compressed bitmaps as a representation of points in the regions of interest, we can find these regions in time that is proportional to the the number of points on the boundary of the regions. This is shown to be faster than the best iso-contouring algorithms and other region finding algorithms. This is also the basis of some of the work on visualization and visual analytics, e.g., finding flame front in combustion data, find distributed attacks in network traffic, and tracking particle in Laser Wakefield Accelerator data.
Eigenvalue Computation: Thick-restart Lanczos method for symmetric eigenvalue problems is a continuation of the work for solving eigenvalue problems. A Fortran 90 implementation for symmetric eigenvalue problems and a C implement for hermitian and symmetric eigenvalue problems are available under a BSD-like license.
Thesis work [PDF version of thesis] [Publications]
Title: Preconditioned Techniques for Large Eigenvalue Problems
Abstract This research focuses on finding a large number of eigenvalues and eigenvectors of a sparse symmetric or Hermitian matrix, for example, finding 1000 eigenpairs of a 100,000 100,000 matrix. These eigenvalue problems are challenging because the matrix size is too large for traditional QR based algorithms and the number of desired eigenpairs is too large for most common sparse eigenvalue algorithms. In this thesis, we approach this problem in two steps. First, we identify a sound preconditioned eigenvalue procedure for computing multiple eigenpairs. Second, we improve the basic algorithm through new preconditioning schemes and spectrum transformations.
Through careful analysis, we see that both the Arnoldi and Davidson methods have an appropriate structure for computing a large number of eigenpairs with preconditioning. We also study three variations of these two basic algorithms. Without preconditioning, these methods are mathematically equivalent but they differ in numerical stability and complexity. However, the Davidson method is much more successful when preconditioned. Despite its success, the preconditioning scheme in the Davidson method is seen as flawed because the preconditioner becomes ill-conditioned near convergence. After comparison with other methods, we find that the effectiveness of the Davidson method is due to its preconditioning step being an inexact Newton method. We proceed to explore other Newton methods for eigenvalue problems to develop preconditioning schemes without the same flaws. We found that the simplest and most effective preconditioner is to use the Conjugate Gradient method to approximately solve equations generated by the Newton methods. Also, a different strategy of enhancing the performance of the Davidson method is to alternate between the regular Davidson iteration and a polynomial method for eigenvalue problems. To use these polynomials, the user must decide which intervals of the spectrum the polynomial should suppress. We studied different schemes of selecting these intervals, and found that these hybrid methods with polynomials can be effective as well. Overall, the Davidson method with the CG preconditioner was the most successful method the eigenvalue problems we tested.
Inforamtion available elsewhere on the web
Google Scholar, ResearchGate, ACM Author Profile, DBLP
John Wu
Rant and Rave
Commentary about work in the belly of the beast called IT.
Pick a random rave.

Current Work