Before getting to the specific questions, it is important to reiterate that there is a protocol for submissions that must be followed in this course. Specific points to note: Please make sure to write a separate, clearly distinguished answer to each question. We also need space for comments and feedback. You should format and space your submissions in a way that makes this possible. In particular, leave a margin and leave space between questions; ideally answer each question on a separate page. You must provide legible, properly formatted writeups.
One other thing I should comment on specifically. This course is different from others that you might have had that deal with programming in that it is about programming languages. You will therefore have to think abstractly often about programming related issues and you will need to discuss things. Problems 5 and 6 required you to do this in this homework, for example. In these cases, we are interested in seeing your considered opinions. For example, for Problem 6, we wanted to see what you got from the two papers and where you agreed and disagreed with them. We have read the papers so a long, point by point and page by page description of their contents was not that useful. This is perhaps different from what you have seen in other courses and so possibly a little difficult in the beginning. But it can be interesting once you get used to it! Grading is also perhaps `tougher', given the subjective nature of the questions, but don't get worried by this: in the end, translation to a letter grade will be based on a curve.
The average of the scores in this homework was 19.06, the standard deviation was 3.68, the median was 20.15 and the highest score was 23.75. I realize that this was a `tough' assignment since many of you were seeing the languages for the first time, possibly also in a new environment. The important thing, though, is that at this point we have shaken out the mechanics of writing and running programs in the Unix environment and we can assume that this will not be the difficult part of future assignments.
Notice that picking the type of element was the only important choice in data structure in ML, Prolog and Scheme and so should really have carried most of the credit. We have, however, typically deducted half the credit if you did not get this right.
In C and Java (without using APIs), there is no predefined structure for lists and you have to build one yourself. There are two obvious approaches. Under one, you represent a list as an array of cells of the right type, in this case characters. Since the sequence may be of arbitrary length, the array should be allocated dynamically. For example, malloc or calloc can be used in C, and new in Java. Some of you chose to define a maximum length for the array in C, and used a special character to denote the end of the actual array. This is not a good choice: a large maximum length wastes storage space, while a small one complicates manipulation of the array, and adds cost, once the array grows beyond that maximum. The other approach is to represent a list by a pointer: this points to nothing in the case of an empty list and otherwise to a cell that contains the first item of the list and (a pointer to) the tail of the list. Each approach has its advantages. With an array representation, you can get to any point in the list (such as the end) very quickly, and with the pointer representation the sizes of lists are not artificially limited. A discussion of such issues and a presentation of the type declarations in C and Java was what was expected of an ideal answer here.
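As a rough illustration of the two representations discussed above, here is a minimal sketch in C. The names char_array, cell, char_list, array_append and list_push are hypothetical and chosen only for this example; they are not taken from the assignment or from any submission. The array version keeps an explicit length and grows with realloc, so it needs neither a fixed maximum size nor a special end-of-list character.

```c
#include <assert.h>
#include <stdlib.h>

/* Representation 1 (assumed names): a dynamically allocated array.
   The length is stored explicitly, so no sentinel character is needed. */
struct char_array {
    char  *items;    /* heap block, NULL while the array is empty */
    size_t length;   /* number of characters currently stored */
    size_t capacity; /* number of characters the block can hold */
};

/* Append one character, doubling the capacity when the block is full.
   Returns 0 on success and -1 if memory could not be allocated. */
int array_append(struct char_array *a, char c) {
    assert(a != NULL);
    if (a->length == a->capacity) {
        size_t new_capacity = a->capacity ? a->capacity * 2 : 8;
        char *p = realloc(a->items, new_capacity);
        if (p == NULL) return -1;
        a->items = p;
        a->capacity = new_capacity;
    }
    a->items[a->length++] = c;
    return 0;
}

/* Representation 2 (assumed names): a pointer to a cell holding the
   first item and a pointer to the tail. NULL is the empty list. */
struct cell {
    char item;
    struct cell *tail;
};
typedef struct cell *char_list;

/* Put a character at the front of the list.
   Returns 0 on success and -1 if memory could not be allocated. */
int list_push(char_list *list, char c) {
    assert(list != NULL);
    struct cell *cell = malloc(sizeof *cell);
    if (cell == NULL) return -1;
    cell->item = c;
    cell->tail = *list;
    *list = cell;
    return 0;
}
```

The sketch makes the tradeoff visible: reaching the last element of the array is a single index operation (items[length - 1]), whereas reaching the end of the linked list means following every tail pointer; on the other hand, the linked list never has to copy its contents in order to grow.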
Some final points to note. First, in Prolog and ML you could conceivably use an alternative to the built-in notion of lists, but I cannot think of any natural representation that does not simply redefine this one. Second, you might choose to use the string type in Scheme, ML and Prolog as a means of representing lists of characters, but this is not a good choice: if it is lists that you want to deal with conceptually, it is likely you will have to translate the string representation to an explicit list representation internally and, so, why not use the latter in the first place? Also, observe that a list of characters is amenable to generalizations, such as to a list of integers or a list of strings, that a string is not.
One or two of you had difficulties with the structure of lists in Scheme, ML and Prolog. For example, the expression
(((1) . 2) . 3) is not a list in Scheme and the expression
[[['a'],'b'],'c'] is not a char list in Prolog. If you do not understand why by now, post something to HyperNews and we will discuss it again.
In designing tests, the most important thing to keep in mind is that they are attempting to convince someone of the correctness of your program. Thus, the tests must pay attention to the special cases that arise out of the structure of your program. A typical program structure is one that deals with an empty list in a special way and that deals with the other cases by reducing the length of the list to be handled by one. (This could happen with either a recursive program or one that uses a loop.) Thus suitable tests might be ones that check that the program works properly in the special case and in a couple of the `normal' cases. Once again, it is not enough to just say that the empty list is a special case and a list with one or more elements is a normal case; you have to explain this with reference to your program. For instance, consider a reverse function in C that works inwards from the two ends and, at each stage, swaps the items at the ends. Here the two really important cases to check are those when the list is of even and odd length, as opposed to the cases mentioned earlier.
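To make the last example concrete, here is a minimal sketch of such an inward-working reverse in C, together with a small checking helper. The names reverse_in_place and check_reverse, and the 64-character scratch buffer, are assumptions made for this illustration only. The loop stops when the two indices meet or cross, which is exactly where even and odd lengths behave differently: for an odd length the middle item is never touched, while for an even length the last swap exchanges the two middle items.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reverse the first n characters of s in place by swapping the items
   at the two ends and moving inwards. */
void reverse_in_place(char *s, size_t n) {
    if (n < 2) return;            /* empty and one-item lists are unchanged */
    size_t lo = 0, hi = n - 1;
    while (lo < hi) {             /* stops when the indices meet or cross */
        char tmp = s[lo];
        s[lo] = s[hi];
        s[hi] = tmp;
        lo++;
        hi--;
    }
}

/* Hypothetical test helper: returns 1 if reversing input gives expected,
   and 0 otherwise. Inputs must be shorter than 64 characters. */
int check_reverse(const char *input, const char *expected) {
    char buf[64];
    size_t n = strlen(input);
    assert(n < sizeof buf);
    memcpy(buf, input, n + 1);    /* copy including the terminating '\0' */
    reverse_in_place(buf, n);
    return strcmp(buf, expected) == 0;
}
```

For this program structure, a convincing test set would include at least one even-length input such as "abcd" and one odd-length input such as "abc", in addition to the empty and one-item inputs that the early return handles.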
Some people said that the C and Java programs were simpler, despite their greater length, due to their iterative nature. Please keep in mind that the only reason that iteration seems simpler to you now is that it's what you're used to. Once you've spent some time with the other languages, you may change your mind. Some of you made observations regarding efficiency that are difficult to support. One kind of observation was that the C solution was obviously more efficient than those in Java, Scheme, ML or Prolog. These kinds of obvious statements are ones to watch out for in this course since they turn out to be false or difficult to substantiate in many cases! Even if your C program is more efficient in this case, it would not be noticeably more so. Another kind of observation was that the Prolog, ML and Scheme solutions were more efficient because they resulted in shorter fragments of code. This also is not correct because, ultimately, the compiler would translate this code as well into something that created linked lists and it is the translated code that eventually determines speed. Some of you made the comment that Scheme, ML and Prolog are interpreted. As we have noted in class, this is not true; all the language implementations you are using in this course are compiler based.
From my own perspective, I think the papers make good arguments in support of issues such as simplicity, security, efficiency and readability. I may have some differences in details. For instance, Hoare makes an argument in favour of leaving things like array bounds checking in on `production' versions of programs. This may be useful sometimes, especially in fixing bugs, but to my mind finding errors in programs becomes less useful after they have been deployed in the field unless it is accompanied by a requirement that the program code also anticipate such possibilities and include corrective action. Java does the latter to some extent with its requirement to identify and handle exceptions, but Hoare's discussion does not address this requirement specifically. A more substantive difference with the two papers is that I feel that we should not lose sight of the fact that we are ultimately interested in a convenient use of the underlying machine. Thus, no matter how much we hate references from the point of view of security, I would not know how else to deal with data structures that grow during computation. Wirth makes a milder suggestion for taming pointers that is a more palatable one to me and also indicates a way that language and feature designers should be thinking, in my opinion. There is a tradeoff between what is referred to as flexibility and security, and language design involves finding a good via media. Pascal, one of the languages we will see later in the course, had problems because it was too strict and, from the other direction, the C people eventually realized that typing was important. Progress in programming languages often involves removing impediments to flexibility while maintaining security. A good example of this kind is the introduction of polymorphic typing that we will study later in the course.
Another issue that I feel both Hoare and Wirth play their hands too hard on is efficiency. (Wirth even suggests that language design is compiler construction.) I think it is important also to focus on new and convenient ways to represent solutions. Sometimes it is difficult to do this if one is bogged down by efficiency construed in a conventional manner. Languages such as Prolog, ML, Scheme, etc., bear witness to this. It was only after the expressive elegance of these languages was well understood that good implementations were devised for them. Of course, there was always an underlying feeling that these languages could be implemented well, but I am not sure this would have been sufficient from the perspective of the articles in question.
There was some confusion concerning Hoare's views on modularity. Some of you read him as being opposed to modular development. My interpretation is that what he is saying is that separate compilation should not become an excuse for ignoring the space and time efficiency of compilers themselves. Some people also seem to have misunderstood his comments relating to efficiency and took exception to his point that increasing hardware capabilities (both speed and space) cannot supplant a concern for efficiency. Hoare's point is that not paying attention to this leads to an under-utilization of the capabilities and this is not a Good Thing since our computing demands also grow as the capabilities of hardware do.
Some of you were also critical of Wirth's comment that language design should be done by a single individual, observing that software engineering involves teams these days. I am not sure I agree. First of all, software construction is quite a different thing from programming language design. Wirth is mainly noting the fact that different aspects of language design interact and so different `features' cannot be added in a decentralised fashion. By analogy, even if different pieces of a large software system are constructed by different individuals in a team, there still must be a single person, or a very small group of people, telling them what functionality each of their pieces must provide.
Finally, some of you did not seem to quite understand the point Wirth was making with regard to transparency. His observation here is mainly that it should be easy to visualize the cost of a programming language feature after it is translated to something that can actually be run on hardware. This sounds reasonable to me. There are choices to be made in writing any program and I would like to make informed decisions at these points. How would I be able to make such decisions well if I did not have transparency? Of course, this is not the first thing I want to think about when writing a program, and a particular feature can be quite useful in programming even if it is costly to execute, but eventually I need to know about such things and I need also to have a way of estimating the cost sensibly.
Last updated on Feb 7, 2006 by gopalan@cs.umn.edu and xqi@cs.umn.edu.