
Time-series Matrices

 

Besides similarity data, a time-series of matrices is another type of data that presents challenges commonly encountered in information visualization. Two major difficulties arise in dealing with time-series matrices. The first is identifying differences in the matrix values between successive matrices. The second is that many visual representations can be applied. For example, the "cityscape" representation shows the matrix values as 3D bars, whereas the "heatmap" representation shows the values as colored tiles [28]. Different representations bring out different features, so an easy way to view and explore several representations simultaneously is needed. Fortunately, the spreadsheet environment is well suited to dealing with both difficulties.

We encountered two matrix series while working on problems with molecular biologists, who are interested in studying the effect of mutation and natural selection on genetic sequences. Natural selection accepts certain mutations, which result in the substitution of one protein residue by another. For a mutation to be accepted, the new protein usually must function in a similar way to the old one, presumably because the two residues are chemically and physically similar. PAM and BLOSUM are two series of matrices, with each matrix representing substitution probabilities at a given evolutionary distance [7, 11]. The two matrix series were calculated from different sets of information sources. An element $M_{ij}$ of a matrix specifies the relative probability that amino acids i and j will be substituted for each other after a given evolutionary interval. A positive entry specifies an accepted mutation that is more likely than random, whereas a negative entry specifies one that is less likely than random.
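The sign convention described above can be illustrated with a minimal sketch. It assumes the entries are scaled, rounded log-odds scores, which is how PAM and BLOSUM matrices are commonly published. The function name `log_odds_score`, the `scale` parameter, and all frequencies below are hypothetical and chosen only for illustration; they are not taken from either matrix series.

```python
import math

def log_odds_score(pair_freq, freq_i, freq_j, scale=2.0):
    """Return a scaled, rounded log-odds score for one residue pair.

    pair_freq      -- observed frequency of the aligned pair (i, j)
    freq_i, freq_j -- background frequencies of residues i and j
    scale          -- hypothetical scaling factor (score units per bit)
    """
    expected = freq_i * freq_j  # pair frequency expected by chance alone
    return round(scale * math.log2(pair_freq / expected))

# Toy numbers, invented purely for illustration.
favored = log_odds_score(0.02, 0.1, 0.1)      # pair seen more often than chance
disfavored = log_odds_score(0.005, 0.1, 0.1)  # pair seen less often than chance
print(favored, disfavored)
```

A pair observed more often than chance yields a positive score, and a pair observed less often than chance yields a negative one, matching the interpretation of positive and negative entries given in the text.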

The detailed nature of this series of matrices results in a large amount of information [7]. For example, these matrices are used in the calculation of similarity between sequences. Unfortunately, the computational molecular biology community has not applied visualization techniques to these matrices. Biologists are nonetheless very interested in understanding these matrix series, both because of their mathematical and biological complexity and because the choice of which matrix to employ depends on the situation.

We have used the SIV system (the second prototype) to try to gain a better understanding of these matrices. We used our system to compare the two matrix series (PAM and BLOSUM), and found the ability to quickly bring in data and lay them out in different ways extremely useful. For example, after 7 lines of commands, the last row shows the BLOSUM62 matrix. To understand the differences between the matrices, it is important to be able to visually compare a number of different matrices simultaneously. In Figure 2, the first, second, third, and fourth rows of cells visualize the PAM40, PAM120, PAM250, and BLOSUM62 matrices, respectively. The first column uses a cube representation that maps positive matrix values to the volume, height, and color attributes of the cubes. The second column uses a carpet plot that maps values to the height and color of a 3D surface (using a rainbow colormap with negative entries mapped to red). The third column uses a bar representation that maps values to the length, height, and color attributes of the bars. The fourth column shows various representations in different rotational configurations.
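The mapping from matrix values to visual attributes can be sketched as follows. This is not the SIV implementation; it is a hypothetical helper, `bar_attributes`, that assumes a linear mapping from value to height and a simple rainbow colormap running from red at the lowest value to blue at the highest. The value range used in the example is invented.

```python
import colorsys

def bar_attributes(value, vmin, vmax, max_height=1.0):
    """Map one matrix value to a (height, rgb) pair for a 3D bar.

    The value is normalized linearly into [0, 1]; the lowest value maps
    to red and the highest to blue (an assumed rainbow colormap).
    """
    if vmax == vmin:
        t = 0.0  # degenerate range: treat every value as the minimum
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values outside [vmin, vmax]
    height = t * max_height
    hue = t * (2.0 / 3.0)  # hue 0 is red, hue 2/3 is blue
    rgb = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return height, rgb

# Hypothetical value range, chosen only for illustration.
lowest = bar_attributes(-4, -4, 11)
highest = bar_attributes(11, -4, 11)
print(lowest, highest)
```

Driving both height and color from the same normalized value makes the two attributes redundant encodings of one quantity, which is the kind of mapping the cube, carpet, and bar representations in the text describe.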

In Figure 2, by vertically scanning the spreadsheet, the user can detect differences between matrices quickly. As we can see from all the columns, the diagonals of these matrices have strong values, which makes sense since the identity substitution (no mutation) is favored by evolution. From the second column we see that the matrices are quite different because the colors get brighter and brighter from top to bottom. The last row shows the BLOSUM62 matrix, and we see its values are clearly different from any of the PAM matrices shown.
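The two observations above, that successive matrices differ and that diagonals are strong, also have simple numerical counterparts. The sketch below is a minimal illustration under stated assumptions: the helper names are hypothetical, matrices are plain lists of rows with at least two rows, and the 3x3 values are invented rather than taken from PAM or BLOSUM.

```python
def elementwise_difference(a, b):
    """Return b - a for two equally sized matrices given as lists of rows."""
    return [[y - x for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

def diagonal_vs_off_diagonal(m):
    """Return (mean diagonal entry, mean off-diagonal entry) of a square matrix."""
    n = len(m)
    diag = [m[i][i] for i in range(n)]
    off = [m[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(diag) / len(diag), sum(off) / len(off)

# Toy 3x3 series, invented for illustration; not real PAM values.
series = [
    [[5, -1, -2], [-1, 6, -3], [-2, -3, 7]],
    [[3, 0, -1], [0, 4, -2], [-1, -2, 5]],
]

delta = elementwise_difference(series[0], series[1])
diag_mean, off_mean = diagonal_vs_off_diagonal(series[0])
print(delta)
print(diag_mean, off_mean)
```

Such summaries complement the visual comparison rather than replace it: the difference matrix can itself be fed to any of the representations in the spreadsheet.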

 



Figure 2: Visualization of time-series matrices. The visualization is built using the second system (SIV). The screen snapshot shows visualizations of protein residue substitution probability matrices at various evolutionary distances. The first, second, and third rows visualize matrices 40, 120, and 250 from the PAM matrix series. The fourth row visualizes matrix 62 from the BLOSUM matrix series. The first column uses a cube representation that maps positive matrix values to the volume, height, and color attributes of the cubes. The second column uses a carpet plot that maps values to the height and color of a 3D surface. The third column uses a bar representation that maps values to the length, height, and color attributes of the bars. The fourth column shows various representations in different rotational configurations.



Ed Chi
Tue Jul 22 19:31:52 PDT 1997