If it is your job to modify spmath, to fix bugs, or to change or extend its functionality, this document will attempt to outline for you the basics of modifying the SPMath package.
SPMath is supported by many tools, some of the used at compile-time and others at run-time. These tools allow SPMath to parse commands, present data visually, perform matrix operations, and be recompiled on other systems with minimum hassle.
Lex is a tool for generating lexicographical analyzers. It takes an input file spmath.l and generates a "lexer" in the form of a c-file which is compiled into the project.
Yacc, similar to lex, starts as a definition file (spmath.y) and ends up as a C file that is comiled in with the final product. Its input is a "grammar" defining legal command constructs and defining the code executed for each command. The result is a "parser," and this along with the lexer constitute the command interpretter for SPMath.
Tcl is a scripting language often used to easily develop visual interfaces for C programs. It is also cross-platform. The toolkit that encapsulates the host environment's individual widgets (buttons, images, etc) is called tk. SPMath uses tcl/tk for its windowed command interpretter and also its sparse matrix viewer.
Autoconf is a tool that creates the "./configure" file distributed with SPMath. "configure" will ensure that all the compilers needed are present, that they work, and that libraries, headers files, and other things needed to compile SPMath math are also there. "configure" is created by "configure.in" by running "autoconf" in the SPMath root directory. "configure" takes "Makefile.in," performs a few substitutions, and creates the "Makefile"s in each of the source directories.
Sparse Kit is a library of FORTRAN subroutines that perform operations on sparse matrices, written by Yousef Saad. Its subroutines are called extensively throughout SPMath.
SPMath commands use a yacc grammar in order to simplify the process of understanding the user's commands. This section summarizes the structure of the grammar so that the reader can gain a better understanding of how to read and modify the grammar.
Lex (the GNU version is called flex), is a tool for creating lexacagraphical analyzers from definition files. A lexacagraphical analyzer (or "lexer") takes a string and outputs a "token stream." A token stream is a sequence of constants that translate the input into a sentence of constants so that the parser can better understand the nature of the input. For example, for the input:
myfunc(-2 + -4)
Lex might translate this into the following token stream:
FUNCTION '(' NUMBER PLUS NUMBER ')'
Somewhere in the grammar definition there might be a "rule" for this particular token stream whose action is to call the function myfunc with input -6.
Yacc (GNU version: bison) takes over at this point, and examines the structure of the command to decide what to do. If the structure given to it by lex is not valid, for example:
PLUS NUMBER PLUS
...would be invalid becuase it is not well-defined in our yacc grammar, then yacc will output a syntax error. The grammar for SPMath tries to emulate the grammar for matlab, octave, and scilab.
The files that act as input to lex and yacc are spmath.l and spmath.y respectively. When flex and bison are run on these files (the Makefile does this for you) then output files containing the C code for the parser are produced which are further processed with sed and finally named "lex.sed.C," "y.sed.C," and "spmath.tab.h." These are what are compiled into the final product.
The file is divided into three pieces. The first section contains C code to be copied into the header of the resultant file as well as some lex directives. The second section contains the actual "rules" which consist of regular expressions, and the corresponding code that is to be executed when those expressions are encountered by lex. The final section contains more C code that is copied into the bottom of the resultant file. For more information on lex definition files, Try O'Reilly's "lex & yacc" by John R. Levine, Tony Mason & Doug Brown.
For every token that is detected by the lexer, there are two pieces of data that are passed to yacc. The first is the token itself, which is simply a unique constant given to that token so that it can be distinguished from other tokens in the grammar. These values are assigned by yacc and are stored in the yacc-generated header file "spmath.tab.h", discussed later. The second piece of information acts as a catch-all for any additional information that is convenient for the parser to have. Take the following rule:
\"[^"\n]*["\n] { yylval->val = new value (strdup (yytext)); return STR; }
This rule uses a regular expression to catch quoted strings and pass them to the grammar. The token is passed to the grammar via the return statement "return STR," but an extra piece of information, the string itself, is passed via yylval. This is so that when yacc parses the statement, it can have access to the string itself for performing the function of the statement. Value is always used to wrap the values passed via yylval. See "The Value Class" for more details about this wrapper.
spmath.y is by far the largest file in the project. It contains the grammar rules for all of the SPMath commands, as well as all the code that handles the execution of the commands. Like spmath.l, the rules are defined in the second section, and the first and third sections are mostly for C code.
Yacc rules consist of a left-hand-side followed by one or more right-hand-sides, each with its own block of C code. For example, here is the rule that defines the "TYPE" command:
| TYPE '(' expression ')' { if(text_flag) { char *temp = new char[4 + 1 + strlen($3->str) + 1]; sprintf(temp, "type(%s)", $3->str); $$ = new value(temp); $3->free(); delete $3; } else { $$ = spit_file($3->str); $3->free(); delete $3; } }
Now when yacc comes across an expression that matches the pattern "TYPE '(' expression ')'" (where expression is defined by the lhs "expression") it will execute the code that is encased in the curly braces. This rule itself is a rhs of the lhs "function_call."
As mentioned before, a yacc grammar is like a context-free grammar. The back-end of the parser is somewhat like a push-down automaton that recognizes the grmmar that is defined. Whenever the parser encouters a new token, it first pushes that token onto its stack, then tries to match the top of the stack with a right-hand-side of one of its rules. When a match occurs, all of the tokens in the match are popped back off the stack, then the action is executed, and then the symbol "$$" is pushed back onto the stack. If this happens, then the parser will check again, now that the stack has changed, to see if the top of the stack matches a right-hand-side of a rule. If so, it repeats this process. When no right-hand-side rules can be matched by the tokens on the top of the stack, another token is taken from the token stream and pushed onto the stack and the process repeats.
For more information on yacc grammars, see O'Reilly's "lex & yacc" by John R. Levine, Tony Mason & Doug Brown.
SPMath contains a heirarchy of classes that facilitate the handling of all the primitive data types. SPMath's primitives are numbers, strings, dense matrices, sparse matrices, and user-defined functions.
Value is the class that encapsulates all the primitives. Its definition is found in value.C. It is essentially a union of of pointers to all the data types it encapsulates:
union { char *str; double num; dense *dmat; sparse *smat; funct_def *funct; }
Value is used as the type for many of the functions defined in functions.C.
Symtab represents a symbol table. It is implemented as a hash with 107 buckets. Variables stored in the sumbol tables are of type "entry," which is a simple class that encapsulates a (name,value) pair. For every execution of SPMath, a global symbol table is created, and for every user-defined function that is called, a second local symbol is created for the symbols within that function.
The dense class encapsulates the functionality of a dense matrix. The values in the matrix are stored in a one-dimensional array of doubles. To access row i and column j, use a statement like this:
matrix[(i-1)*ncol + (j-1)]
Many of the member functions of the dense class are analogous to those in the sparse class.
Vectors are usually represented as vertically-oriented dense matrices.
The sparse class encapsulates the functionality of a Compressed Sparse Row sparse matrix. The values are stored in the three arrays, ia, ja, and a - where ia is the array of row pointers, ja is the array of column indices, and a is the array of nonzero values. These arrays can be accessed directly with calls to the get_ia(), get_ja(), and get_a() member functions.
Sparse matrix objects can be read from Harwell-Boeing style sparse matrix files using the load_matrix_from_file() function in io.C.
At the time of this writing, there is no way to store any additional vectors in the sparse object. Right-hand-sides, guess vectors, and exact value vectors are not read by the file I/O routines, and are intended to be stored and manuipulated separately by the user.
Many of the sparse class's member functions make calls to SPARSKIT routines. See csparse.c for the C definitions of these functions.
The funct_def class encapsulates a user-defined function. The class simply consists of three strings, args, body, and ret where args is a string containing the argument names, body is a string containing the function's code, and ret is a string containing the name of the variable that is to hold the function's return value.
Adding a new built-in command to sparse kit consists of a few steps. First, the token representing that command must be added to the list of tokens.
Open spmath.y and look for the token list. It will look like this:
%token TYPE %token WHO %token WHOS %token BREAK %token CONTINUE %token ELSE %token ELSEIF %token END etc...
Add a new token to this list by choosing a constant name that is not in the list, and putting it on a new line after a %token directive. Let's pretend we're adding a token called "ADD":
%token ADD %token TYPE %token WHO %token WHOS %token BREAK %token CONTINUE %token ELSE %token ELSEIF %token END etc...
This token can be added anywhere in the list.
Now open spmath.l and look for the list of rules for the built-in-functions. The list will look like this:
type { return TYPE; } who { return WHO; } whos { return WHOS; } demo { return DEMO; } load { return LOAD; } hbout { return PRTMT; } hbin { return READMT; } fdmat { return GEN57PT; } etc...
Add a new rule that maps all the strings that should be turned into the "ADD" token to "ADD" like this:
// I added this token add { return ADD; } etc...
If we wanted both the strings "add" and "plus" to map to the token ADD, the statement would look like this instead:
// I added this token add|plus { return ADD; } etc...
Now go back to spmath.y. To add this rule to the already existing list, scroll until you find a rule with a left-hand-side of "function_call:" The rule we will add should be added to the list of right-hand-side rules somewhere after the right-hand-side NAME '(' ')'. Assuming our new built-in function takes one paremeter, the grammar rule should look something like this:
| ADD '(' expression ')' { if (text_flag) { char *temp = new char[3 + 1 + strlen ($3->str) + 1]; strcpy (temp, "add("); strcat (temp, $3->str); strcat (temp, ")"); $$ = new value (temp); $3->free (); delete $3; } else { err = 0; $$ = add ($3); $3->free (); delete $3; if (err) { YYERROR; } } }
As you can see, the rule is divided into two sections. One is executed if text_flag is true and the other if text_flag is false. text_flag is a global flag that indicates if the parser is in "text capture mode." The parser is in "text capture mode" when it is reading in the body of some nested construct. In this case, instead of executing the command right away, the parser will store the command as a string and return that as the result of the command. Later, the string can be parsed as usual to execute that body.
When text_flag is false, then the command is actively being executed, and so the else {} block is the more important one. In this case, our action will make a call to a function called "add()" which takes one parameter. The variable $1 is another way of saying "The third token in this token stream" which is "expression." Expression is a token that has type "value *" so what is really being passed to add() is a "value *" that represents whatever expression was entered between the parentheses by the user. The add() is expected to do all of the error-checking on the parameters and output any error messages that might be appropriate. If an error does occur, it should set err to be nonzero so that the parser knows to stop parsing this command.
Note that the return of the "add()" function is assigned to a variable called "$$". The "$$" symbol represents something like the "return value" of the command. The grammar is processed using a stack, and essentailly what happens is each token is pushed onto the stack and then, when the contents of the stack matches some rule in the list, those tokens are popped back off and a single token is pushed back on in their place. "$$" represents the token that is pushed back on, and can be used later as a right-hand-side token in some other rule in the grammar. This stack-recursion ability is what makes yacc so powerful (and confusing). The type for the "$$" variable is determined by the type of the left-hand-side of the rule. In this case "$$" is a "value *".
The last few lines:
$3->free (); delete $3; if (err) { YYERROR; }
...simply free the memory taken by the expression and notify the parser if there was an error in the function add(). It is important that all paths through the handler code free all the variables from the right-hand-side portion of the rule.
We have assumed here that the functionality of the add() command is complex enough to justify putting it in a separate function. Sometimes the command's action will be simple enough that it can be performed inside the rule without cluttering the grammar. It is recommended, however, that these should be separate, as the grammar file is by far the largest and most confusing in the SPMath project.
Spmath's C++ calls FORTRAN routines via C wrappers which are located in csparse.c. This method has a couple advantages over linking directly from FORTRAN to C++. First, the wrappers can be named whatever the programmer wants, which eliminates the messy naming that results from FORTRAN name mangling at compile-time. And second, the wrappers can handle the extra steps needed to be compatible with FORTRAN's pass-by-reference convention.
When the FORTRAN routine is written, be sure to copy its source file to the "libraries/" subdirectory in the SPMATH directory. Edit the "files" variable in the root SPMATH directory's Makefile.in to include this source's .o file in the linking of SPMATH. The edited Makefile.in may look like this:
files = output.o \ main.o \ tcl.o \ (...) ${librdir}/mynewfunc.o
..then edit the Makefile.in in the "libraries/" subdirectory to include this file in the "all" target:
(...) all: functns.o blas1.o (...) mynewfunc.o mynewfunc.o: mynewfunc.f ${F77} ${fortran_flags} mynewfunc.f ${inc} (...)
Now open csparse.c. This file contains all the wrappers for the FORTRAN functions used in SPMATH. Create a new function with a name that resembles the name of the FORTRAN subroutine we're wrapping. Its argument list should correspond to the argument list of the FORTRAN routine (real's become double's, integer's become int's, integer arrays become int *'s, etc). In the body of the wrapper, call the FORTRAN soubroutine by invoking the name of the subroutine followed by an underscore. When passing arguments to a FORTRAN subroutine all non-pointer arguments should be passed by address.
Here is an example of a wrapper in SPMATH:
void frobnorm(n, sym, a, ja, ia, Fnorm) int n; int sym; double *a; int *ja; int *ia; double *Fnorm; { frobnorm_ (&n, &sym, a, ja, ia, Fnorm); }
Note that "n" and "sym" are passed by address, since they are not pointers.
Now open csparse.h. We need to add a prototype for the wrapper AND the FORTRAN routine in this header file.
The first list of prototypes in csparse.h contains the FORTRAN prototypes. In this section, add a new prototype for the FORTRAN subroutine without specifying a return type or any arguments:
void frobnorm_();
In the second list, add the wrapper's prototype, again specifiying neither return type nor arguments:
void frobnorm();
Now for every source file from which you would like to call this wrapper, add a regular prototype for the C wrapper function. Don't forget the 'extern "C"' modifier:
extern "C" { void frobnorm(int, int, double *, int *, int *, double *); }
You may now call this wrapper from anywhere in that source file.
At the time of this writing, all such prototypes are in sparse.h, since sparse.C is the only source file from which they are called.
If the procondionner is implemented in an external FORTRAN routine, be sure that you have followed the instructions in section 5 for how to add it to the project.
Now open functions.C. In order for the new proconditionner to be usable for the user, we must add it to the list of valid preconditioners in the solve_system function. In the 10-argument solve_system function, there is a section that looks like this:
if ( (p !=*(new string("ilut"))) && (p !=*(new string("iluk"))) && (p !=*(new string("ilud"))) && (p !=*(new string("ilu0"))) && (p !=*(new string("ilutp")))) { sp_error("Error, preconditionner \""); sp_error(p); sp_error("\" doesn\'t exist\n"); (...)
Add your preconditioner's name to this list of tests to keep SPMATH from rejecting it.
if ( (p !=*(new string("ilut"))) && (p !=*(new string("iluk"))) && (p !=*(new string("ilud"))) && (p !=*(new string("ilu0"))) && (p !=*(new string("ilutp"))) && (p !=*(new string("mynewprec")))) { (...)
This function also checks to ensure that the correct number of arguments is being passed to the preconditionner via the slvr_prec_args variable (or via whatever the user inputted). Find the section in solve_system that looks like this:
if (p ==*(new string("ilut"))) { (...) } else if (p ==*(new string("iluk"))) { (...) } else if (p ==*(new string("ilu0"))) { (...) } (...)
Add a test to check for your preconditionner, and then check "(arg_prec->dmat)->get_ncol()" to ensure that the number of columns in this vector equals the number of arguments for this preconditionner. If not, print an error message and return. If so, fill arg_prec2 with the arguments. Example:
(...) else if (p ==*(new string("mynewprec"))) { if ( (arg_prec->dmat)->get_ncol() != 3 ) { yyerror ("Error, mynewprec preconditionner takes 3 args"); err = 1; return mat; } else { arg_prec2 = new double[3]; arg_prec2[0]=(arg_prec->dmat)->get_element(1,1); arg_prec2[1]=(arg_prec->dmat)->get_element(1,2); arg_prec2[2]=(arg_prec->dmat)->get_element(1,3); } } (...)
Now open sparse.C and find the solve_system member function. Look for the section where it checks which precontionner has been specified:
(...) if (p == *(new string("ilut"))) { (...) } else if (p == *(new string("ilutp"))) { (...) } else if (p == *(new string("iluk"))) { (...) } (...)
Add your preconditionner to these tests and then add code that peels off your arguments into their own variables and calls another member function (which we will write). For example:
(...) else if (p == *(new string("mynewprec"))) { int lfil = arg_prec[0]; double drptol = arg_prec[1]; double permtol = arg_prec[2]; ierr = b->mynewprec_msr(lfil, drptol, permtol); } (...)
Finally, we must write the member function mynewprec_msr() which sets up the scratch space and calls the wrapper. For example:
int sparse::mynewprec_msr(int lfil, double drptol, double permtol) { sparse *input = new sparse(*this); free(); nrow=input->get_nrow(); ncol=input->get_ncol(); nnz=input->get_nnz(); a = new double[nrow+1+2*nrow*lfil]; ja = new int[nrow + 1 + 2*nrow*lfil]; ia = new int[nrow+1]; int iwk = nrow+1+2*nrow*lfil; int *jw = new int[2*nrow]; int ierr[1]; ierr[0]=0; double *w = new double[nrow+1]; // Call the wrapper mynewprec_prec(nrow, input->get_a(), input->get_ja(), input->get_ia(), lfil, drptol, permtol, a, ja, ia, iwk, w, jw, ierr); delete [] jw; delete [] w; input->free(); return *ierr; }
Adding a solver is very similar to adding a preconditioner. If the solver is written in FORTRAN, it must be added to the project (see instructions in section 5).
Open functions.C and find the 10-argument solve_system() function. Find the list of solver tests that looks like this:
if ( (s !=*(new string("cg"))) && (s !=*(new string("bcg"))) && (...) (s !=*(new string("fgmres"))) && (s !=*(new string("dqgmres"))) ) { sp_error("Error, solver \""); sp_error(s); sp_error("\" doesn\'t exist\n"); (...) }
...and add your solver to that list:
if ( (s !=*(new string("cg"))) && (s !=*(new string("bcg"))) && (...) (s !=*(new string("fgmres"))) && (s !=*(new string("dqgmres"))) && (s !=*(new string("mysolve"))) ) { sp_error("Error, solver \""); sp_error(s); sp_error("\" doesn\'t exist\n"); (...) }
Now open up sparse.C and find the solve_system() member function. Find the section where solver names are searched again:
if (s == *(new string("bcg"))) { (...) } if (s == *(new string("cg"))) { (...) } if (s == *(new string("dbcg"))) { (...) }
...and add your solver to this list. Inside the if statement, get any extra solver arguments ready and allocate your scratch space before calling the wrapper. For example:
(...) if (s == *(new string("mysolve"))) { wk = new double[7*nrow]; int m = ipar[4]; // Call the wrapper runmysolve_solver(nrow, rhs, sol, ipar, fpar, wk, guessvec, a, ja, ia, b->get_a(), b->get_ja(), b->get_ia() ); }
SPMath's manpages are stored as flat text files in the "manpage/" subdirectory. To edit an existing manpage, locate it in the "manpage/" directory and simply edit it using emacs or a similar tool. To add a new manpage, create a new file with the name
if ((top == *(new string("add"))) || (top == *(new string("plus")))) { strcat (temp, "/add.manpage"); spit_file(temp); return new value (); }
This page was last modified August 14th, 2001 by Ben Langmead