Information visualization is becoming increasingly important as researchers discover different domains of multidimensional data. Various techniques have been developed to map multidimensional data to three-dimensional scenes, since data from different fields often require dramatically different visualization techniques. Recently, we presented a new technique for visualizing biological sequence similarity information.
Sequence similarity analysis is the comparison of a single sequence against known sequences kept in databases. Similarity analysis provides possible protein functions for the unknown input sequences, reducing the need for painstaking lab work. Often, similarity reports include hundreds or thousands of alignments (matching segments between the input sequence and one of the database sequences); included with each alignment are measurements of how well the input sequence segment matches the database sequence segment. The entire report can be hundreds or thousands of pages long.
Our earlier efforts produced AlignmentViewer (AV), which greatly improved biologists' ability to discover features in this information space. Our group has used AV daily for the past 18 months. However, seeing the possibilities offered by information visualization, the biologists in our group became interested in many unrepresented variables that are either in the similarity report or related to it. Examples of such variables include different similarity measures and the submission date of the matching database sequence.
These previously unrepresented variables exacerbate the intrinsic mismatch in dimensionality between the data and the visualization. Since the data are multidimensional and the graphical system is three-dimensional, we inevitably overload the dimensionality of the screen when we map the data to it. How should we choose which variables to map onto the dimensions we can display? We need to incorporate substantial new capabilities into AV that allow exploration of previously unrepresented variables. Because biological sequence similarity analysis does not analyze a physical model, our imagination is not constrained by an underlying physical layout. The data are abstract, and we are free to decouple the variables and remap them to any of the spatial axes. Moreover, we add a time axis to support an additional variable. Certain variables, such as the submission date, map naturally onto the time axis, but the time axis can support animation over any variable. Together, the three spatial axes and the time axis map four variables to the screen at a time.
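As a minimal sketch of this remapping (not the AV implementation), the Python fragment below assigns one variable to each of the three spatial axes and one to the time axis, then groups the resulting points into animation frames. The record fields (`query_start`, `length`, `score`, `submission_year`), the sample values, and the names `AxisMapping` and `project` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical alignment records; field names and values are illustrative only.
ALIGNMENTS = [
    {"query_start": 10, "length": 120, "score": 250.0, "submission_year": 1994},
    {"query_start": 45, "length": 80, "score": 90.5, "submission_year": 1996},
    {"query_start": 200, "length": 60, "score": 130.0, "submission_year": 1995},
]


@dataclass(frozen=True)
class AxisMapping:
    """Names the variable shown on each spatial axis and on the time axis."""
    x: str
    y: str
    z: str
    t: str


def project(records, mapping):
    """Return a sorted list of (time value, [(x, y, z), ...]) frames."""
    frames = {}
    for rec in records:
        point = (rec[mapping.x], rec[mapping.y], rec[mapping.z])
        frames.setdefault(rec[mapping.t], []).append(point)
    # Sorting by the time variable gives the order in which frames are animated.
    return sorted(frames.items())


mapping = AxisMapping(x="query_start", y="score", z="length", t="submission_year")
frames = project(ALIGNMENTS, mapping)
```

Because the mapping is plain data, any variable can be moved to any axis, including the time axis, without touching the records: for example, `AxisMapping(x="length", y="query_start", z="submission_year", t="score")` animates over the score instead of the submission year.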
An additional problem is that the data set is not just multidimensional but also large, making screen real-estate precious.
To reduce screen clutter, we introduce filters on each of the variables to further support analysis. Users can construct queries based on a range for each variable. The visual query filters provide an easy-to-use query interface for the information in the report.
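A range query of this kind can be sketched as a conjunction of per-variable interval tests. The fragment below is an illustration under assumed names, not the AV query interface: the record fields, the sample values, and the helper `make_range_filter` are hypothetical.

```python
# Hypothetical alignment records; field names and values are illustrative only.
records = [
    {"score": 250.0, "submission_year": 1994},
    {"score": 90.5, "submission_year": 1996},
    {"score": 130.0, "submission_year": 1995},
]


def make_range_filter(ranges):
    """Build a predicate from {variable: (low, high)} with inclusive bounds."""
    def keep(record):
        # A record stays visible only if every filtered variable is in range.
        return all(low <= record[name] <= high
                   for name, (low, high) in ranges.items())
    return keep


query = make_range_filter({"score": (100.0, 300.0),
                           "submission_year": (1995, 1996)})
visible = [rec for rec in records if query(rec)]
```

Variables that do not appear in the query are left unconstrained, so narrowing one range never requires the user to specify the others.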
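In this paper, we present an implementation of the above techniques within the framework of AV. The contributions of this work are the application of a set of techniques that, although not entirely new when considered separately, provide a powerful interface to our multivariate data when combined: remapping of variables onto the three spatial axes, animation over an additional variable along a time axis, and visual query filters on each variable.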
The remainder of the paper is structured as follows. In the next section, we present related work. In Section 3 we discuss the design of our new visualization system. Section 4 is devoted to case studies illustrating the features of the new technique. Finally, Section 5 contains concluding remarks.