
Molecular Biology

 

Biologists exploring DNA sequences often compare a given sequence against a database of known sequences. Similarity search algorithms produce reports indicating regions of similarity, and other information useful to biologists. These reports can be tens or hundreds of pages long for one sequence.

Previously, we developed a system called AlignmentViewer that visualizes the most prominent data in such reports [4, 5]. The basic 3D visual representation consists of comb-like glyphs that show the regions of similarity, how similar they are, and where they occur along the input sequence. (For example, see cell A1 in Figure 1.) The user can explore the data further by interactively rotating, translating, or scaling the representation; following a hyperlink to the textual report; mapping the data into a different geometric representation; animating the information over a variable; and filtering the data. The report data has many variables, and only a few of them can appear in a single 3D visualization. AlignmentViewer mitigates this problem by letting the user selectively map the data dimensions onto 3D space and dynamically filter the data. In addition to dynamic queries, we support several types of animation along any of the dimensions, extending the display to 4D.
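As a rough illustration of selective dimension mapping combined with dynamic filtering, the sketch below projects multivariate records onto three chosen axes and drops records that fail a filter. It is not AlignmentViewer's implementation: the function name `project`, the field names (`start`, `length`, `score`, `identity`), and the sample values are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical report record: one alignment with several numeric variables.
# The field names are illustrative and not taken from any real report format.
Alignment = Dict[str, float]


def project(
    records: List[Alignment],
    axes: Tuple[str, str, str],
    keep: Callable[[Alignment], bool] = lambda r: True,
) -> List[Tuple[float, float, float]]:
    """Map three chosen variables onto (x, y, z) for records passing the filter."""
    return [(r[axes[0]], r[axes[1]], r[axes[2]]) for r in records if keep(r)]


# Made-up sample data, for illustration only.
records = [
    {"start": 10.0, "length": 120.0, "score": 85.0, "identity": 0.92},
    {"start": 300.0, "length": 45.0, "score": 30.0, "identity": 0.61},
    {"start": 520.0, "length": 200.0, "score": 140.0, "identity": 0.98},
]

# One view: position, score, and identity, keeping only higher-scoring alignments.
points = project(
    records,
    ("start", "score", "identity"),
    keep=lambda r: r["score"] >= 50.0,
)

# A second view of the same records with a different axis mapping and no filter.
remapped = project(records, ("length", "identity", "start"))
```

Changing the `axes` tuple or the `keep` predicate yields a different view of the same underlying records, which is the idea behind exploring more variables than a single 3D display can show at once.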

We chose this data domain for two reasons. First, it shares properties with many datasets we encounter in information visualization: (1) the similarity reports are highly textual, and (2) visualizing the similarity relationships between items is an important problem. Second, we are collaborating closely with molecular biologists who can interact with us on a day-to-day basis. This allows us to directly support their information analysis tasks. For example, they have found it useful to compare visualizations of related sequences, and have specifically requested the ability to apply several different operations to the visualizations simultaneously. This close collaboration has allowed us to capture many of the requirements for building a visualization spreadsheet.

Using AlignmentViewer as a basis, we built our first prototype to support the analysis tasks carried out by molecular biologists. Figure 1 shows a snapshot of an example session. The cells are loaded with similarity data between genetic sequences. Each comb glyph within a cell represents an alignment, which is a region of similarity between the input sequence and a sequence from the database [4, 5]. The figure is the result of a three-step operation:

Step 1
Each column is loaded with a different dataset generated from the same input sequence by varying one parameter of the algorithm. Here, we change the parameter that specifies the sensitivity of the algorithm to distantly related versus closely related sequences. We decrease the distance from far to near in columns 1, 2, and 3, respectively.
Step 2
We select Row B and then subtract cell A3 from each cell in that row. Thus, B1=B1-A3, B2=B2-A3, B3=B3-A3.
Step 3
At this point, cells in Rows C and D still contain the same datasets as the corresponding cells in Row A. We change the variables that are represented on the X, Y, and Z axes, resulting in different views of the datasets.
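The three steps can be sketched in code as follows. This is a minimal model, not the prototype's implementation: it assumes each cell holds a set of alignments plus an axis mapping, and that cell subtraction is a set difference (consistent with B3 becoming the empty set in Figure 1). The `Cell` class, the alignment identifiers, and the axis names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Cell:
    """Assumed cell model: a set of alignment identifiers plus an axis mapping."""
    alignments: FrozenSet[str]
    axes: Tuple[str, str, str] = ("position", "score", "length")


# Step 1: each column gets a dataset produced with a different parameter value.
# The identifiers are made up; column 1 is the "far" setting, column 3 the "near" one.
datasets: Dict[int, FrozenSet[str]] = {
    1: frozenset({"a", "b", "c", "d"}),
    2: frozenset({"a", "b", "c"}),
    3: frozenset({"a", "b"}),
}
sheet: Dict[str, Cell] = {
    f"{row}{col}": Cell(data) for row in "ABCD" for col, data in datasets.items()
}

# Step 2: select Row B and subtract cell A3 from each cell in that row.
a3 = sheet["A3"].alignments
for col in datasets:
    key = f"B{col}"
    sheet[key] = replace(sheet[key], alignments=sheet[key].alignments - a3)

# Step 3: Rows C and D keep Row A's data but map different variables to X, Y, Z.
new_axes = {
    "C": ("position", "identity", "score"),
    "D": ("length", "score", "identity"),
}
for row, axes in new_axes.items():
    for col in datasets:
        key = f"{row}{col}"
        sheet[key] = replace(sheet[key], axes=axes)
```

Under this model, B3 ends up empty because it started as a copy of A3, while B1 and B2 retain only the alignments that the "near" setting did not find.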

 



Figure 1: A screen snapshot of the first system (SSR) after performing three operations. (Step 1) Initially, we loaded each column with a slightly different, but related, dataset (A1=B1=C1=D1, A2=B2=C2=D2, A3=B3=C3=D3). (Step 2) We selected Row B, and then subtracted cell A3 from it (B1=B1-A3, B2=B2-A3, B3=B3-A3). Cell B3 contains the empty set, as expected. (Step 3) We changed Rows C and D to show different views of Row A. The views show different sets of variables using a different representation, thus increasing our ability to see other dimensions of the multivariate datasets simultaneously.



Ed Chi
Tue Jul 22 19:31:52 PDT 1997