
CSci 5106: Programming Languages
Spring 2006, University of Minnesota
Assignment 1
Comments on Grading


General Remarks

This assignment was graded out of 25 points. The points distribution was 5, 10, 3, 2, 3 and 2, respectively. The points for the first, second and fourth problems were split evenly between each of the languages. In the third problem, the tests devised and the rationale provided were expected to have a substantial common character across languages. For this reason, the answers were graded as one although, when this was necessary or relevant, it was assumed that each language contributed an even share of the total points. In the fifth and sixth problems, we were looking for insightfulness in your comments. There was no `set' answer to these questions, so grading was based on how relevant the comments you made were to the question asked.

Before getting to the specific questions, it is important to reiterate that there is a protocol for submissions that must be followed in this course. Specific points to note: Please make sure to write a separate, clearly distinguished answer to each question. We also need space for comments and feedback. You should format and space your submissions in a way that makes this possible. In particular, leave a margin and leave space between questions; ideally answer each question on a separate page. You must provide legible, properly formatted writeups.

One other thing I should comment on specifically. This course is different from others that you might have had that deal with programming in that it is about programming languages. You will therefore often have to think abstractly about programming-related issues and to discuss them. Problems 5 and 6 required you to do this in this homework, for example. In these cases, we are interested in seeing your considered opinions. For example, for Problem 6, we wanted to see what you got from the two papers and where you agreed and disagreed with them. We have read the papers, so a long, point by point and page by page description of their contents was not that useful. This is perhaps different from what you have seen in other courses and so possibly a little difficult in the beginning. But it can be interesting once you get used to it! Grading is also perhaps `tougher', given the subjective nature of the questions, but don't get worried by this: in the end, translation to a letter grade will be based on a curve.

The average of the scores in this homework was 19.06, the standard deviation was 3.68, the median was 20.15 and the highest score was 23.75. I realize that this was a `tough' assignment since many of you were seeing the languages for the first time, possibly also in a new environment. The important thing, though, is that at this point we have shaken out the mechanics of writing and running programs in the Unix environment and we can assume that this will not be the difficult part of future assignments.


Problem 1

Lists are provided as built-in data structures in Prolog, ML and Scheme and it is sensible to use them directly. One aspect that was missed by many people was that a choice has to be made even in these languages: you have to pick the type for elements of such lists. The best choice is the character type. Several people did not understand the relevance of using the character type for elements and they used a representation of the form '(a b c) in Scheme and [a,b,c] in Prolog instead. Notice that the a, b and c in such expressions are ``names'' (Scheme) or constants (Prolog). Using these is a bad idea since you run into all kinds of problems when you want to represent special characters or even uppercase ones in Prolog. The ML type system forces you to think about such matters indirectly and so you should have ended up making a better (implicit) choice for all the languages once you got around to testing your ML program. If you did encounter the problem relative to ML, this might have been a cue for the other languages. Try to remember that the basic issues are generally problem, not language, dependent, and hence the same regardless of the language chosen for implementation. Differences between languages will be manifest in whether or not they make you think of these issues clearly and also in whether or not they force you to think of unnecessary additional aspects. In this particular case, I think the type system of ML works to the programmer's advantage.

Notice that picking the type of element was the only important choice in data structure in ML, Prolog and Scheme and so should really have carried most of the credit. We have, however, typically deducted half the credit if you did not get this right.

In C and Java (without using APIs), there is no predefined structure for lists and you have to build one yourself. There are two obvious approaches. Under one, you represent a list as an array of cells of the right type, in this case characters. Since the sequence may be of arbitrary length, the array should be allocated dynamically. For example, malloc or calloc can be used in C, and new in Java. Some of you chose to define a maximum length for the array in C, and used a special character to denote the end of the actual array. This is not a good choice: a large maximum length wastes storage space, while a small one complicates manipulation of the array, and so involves extra cost, when the array grows beyond this maximum. The other approach is to represent a list by a pointer: this would point to nothing in the case of an empty list and, otherwise, to a cell that contains the first item of the list and (a pointer to) the tail of the list. Each approach has its advantages. With an array representation, you can get to any point in the list (such as the end) very quickly and with the pointer representation the sizes of lists are not artificially limited. A discussion of such issues and a presentation of the type declaration in C and Java was what was expected of an ideal answer here.
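
As a rough illustration of the two representations described above, the declarations in C might look something like the following. This is only a sketch under assumptions: the names CharArrayList, CharNode, CharList, array_list_new, array_list_get and cons are invented for this example and are not taken from the assignment or from any submission.

```c
#include <assert.h>
#include <stdlib.h>

/* Array representation: a dynamically allocated block of characters
   together with its length. (All names here are hypothetical.) */
typedef struct {
    char   *items;   /* heap storage; NULL when the list is empty */
    size_t  length;  /* number of characters in the list */
} CharArrayList;

/* Pointer representation: NULL is the empty list; otherwise a cell
   holding the first item and a pointer to the tail. */
typedef struct CharNode {
    char             head;
    struct CharNode *tail;
} CharNode;
typedef CharNode *CharList;

/* Allocate an array list of exactly n characters, so that neither a
   fixed maximum length nor an end-marker character is needed.
   On allocation failure the result is the empty list. */
CharArrayList array_list_new(size_t n) {
    CharArrayList list;
    list.items = (n > 0) ? calloc(n, sizeof(char)) : NULL;
    list.length = (list.items != NULL) ? n : 0;
    return list;
}

/* With the array representation any position is reachable in one step. */
char array_list_get(const CharArrayList *list, size_t i) {
    assert(i < list->length);  /* bounds check */
    return list->items[i];
}

/* Put a new cell in front of an existing list.
   Returns NULL if the cell cannot be allocated. */
CharList cons(char head, CharList tail) {
    CharList cell = malloc(sizeof *cell);
    if (cell == NULL) return NULL;
    cell->head = head;
    cell->tail = tail;
    return cell;
}
```

In this sketch the array version carries its length with it, while the pointer version grows one cell at a time and has no length limit other than available memory.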

Some final points to note. First, in Prolog and ML you could conceivably use an alternative to the built-in notion of lists, but I cannot think of any natural representation that does not simply redefine this one. Second, you might choose to use the string type in Scheme, ML and Prolog as a means of representing lists of characters, but this is not a good choice: if it is lists that you want to deal with conceptually, it is likely you will have to translate the string representation to an explicit list representation internally and, so, why not use the latter in the first place? Also, observe that a list of characters is amenable to generalizations, such as to a list of integers or a list of strings, that a string is not.


Problem 2

The most important thing in this problem is to note that an explicit reverse procedure or function was required: if you turned in one long sequence of code that included lines for reading in lists, reversing them and printing them out, you would have got very little credit for this part. In Scheme and ML, you needed to provide a function that took exactly one argument and returned a reversed list as a result. In Prolog, this becomes a two place relation. In C and Java, it is possible to write a function (method) with the relevant input/output behavior but that clobbers the input list. This, once again, violates the required functionality and so would cost points. Moreover, some of you wrote a C function with two parameters, a character array and its length. This design leads to poor data encapsulation, which in turn increases coupling between the reverse function and the code that calls it. In general, note that programming methodology is an important component of this course, and the reason the questions were posed the way they were was to focus attention on this aspect. If you have read the paper by C.A.R. Hoare, you should already appreciate the reason why we should take programming methodology so seriously here.
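
To make the point about not clobbering the input concrete, here is one possible shape for such a function in C, using a pointer-based list. It is a sketch only: the names CharNode, cons and reverse are assumptions made for this example, and allocation failure is treated as fatal to keep the code short.

```c
#include <assert.h>
#include <stdlib.h>

/* Pointer-based list of characters; NULL is the empty list.
   (Hypothetical names, chosen for this example.) */
typedef struct CharNode {
    char             head;
    struct CharNode *tail;
} CharNode;

/* Allocate one cell. For this sketch an allocation failure is fatal. */
CharNode *cons(char head, CharNode *tail) {
    CharNode *cell = malloc(sizeof *cell);
    assert(cell != NULL);
    cell->head = head;
    cell->tail = tail;
    return cell;
}

/* Return a newly allocated list holding the items of `list` in reverse
   order. The input is only read (note the const), so the caller's list
   is left exactly as it was. */
CharNode *reverse(const CharNode *list) {
    CharNode *result = NULL;
    for (const CharNode *p = list; p != NULL; p = p->tail) {
        result = cons(p->head, result);
    }
    return result;
}
```

The function takes exactly one argument and returns the reversed list, mirroring the one-argument functions expected in Scheme and ML. The caller is responsible for freeing both lists.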

One or two of you had difficulties with the structure of lists in Scheme, ML and Prolog. For example, the expression

(((1) . 2) . 3)
is not a list in Scheme and the expression
[[['a'],'b'],'c']
is not a char list in Prolog. If you do not understand why by now, post something to HyperNews and we will discuss it again.

Problem 3

For this part you had to describe a complete set of tests and present the rationale for the choice of this set in addition to actually testing the program. For some reason several people did not pay heed to what exactly was asked and lost points for this.

In designing tests, the most important thing to keep in mind is that they are attempting to convince someone of the correctness of your program. Thus, the tests must pay attention to the special cases that arise out of the structure of your program. A typical program structure is one that deals with an empty list in a special way and that deals with the other cases by reducing the length of the list to be handled by one. (This could happen with either a recursive program or one that uses a loop.) Thus suitable tests might be ones that check that the program works properly in the special case and in a couple of the `normal' cases. Once again, it is not enough to just say that the empty list is a special case and a list with one or more elements is a normal case; you have to explain this with reference to your program. For instance, consider a reverse function in C that works inwards from the two ends and, at each stage, swaps the items at the ends. Here the two really important cases to check are those when the list is of even and odd length, as opposed to the cases mentioned earlier.
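
The last example can be sketched in C as follows. The function name reverse_in_place and its signature are assumptions made for illustration. Note also that it reverses the array in place and takes the array and its length as separate parameters, which is not the design Problem 2 asked for; the sketch is meant only to show why the structure of the code determines the interesting tests.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse items[0..length-1] in place by swapping the two ends and
   working inwards. (Hypothetical name and signature.) */
void reverse_in_place(char *items, size_t length) {
    assert(items != NULL || length == 0);
    if (length < 2) return;            /* nothing to swap */
    size_t left = 0, right = length - 1;
    while (left < right) {
        char tmp = items[left];
        items[left] = items[right];
        items[right] = tmp;
        left++;
        right--;
    }
}
```

With an even length the two indices cross without ever being equal, and with an odd length they meet at the middle item, which must be left alone. A test set for this function would therefore include at least one list of each parity, for example "abcd" and "abc", in addition to the empty and one-element lists that skip the loop entirely.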


Problem 4

This part of the assignment involved demonstrating that your programs worked. You should have got a substantial portion of the credit on this question (even if your tests were not well chosen) so long as you provided some evidence for the fact that your programs run. It is also important, of course, to take the results of your test runs seriously and especially to check that the output actually has the required structure.


Problem 5

This is an open-ended question and any collection of important points (or, at least, ones that your discussion showed to be important) was accepted. Of course, the grade was dependent on how clearly you made these points and on the relative merits of the observations themselves. From my perspective, the built-in mechanisms for dealing with lists present in Prolog, ML and Scheme make it possible to develop solutions in these languages quickly. Another plus point for these languages is that they are interaction oriented and this saves some effort in developing input/output routines (as you would have realized in Problem 4). Finally, these languages focus to a large extent on describing a problem (they are known as declarative programming languages for this reason) and so it is much easier to develop a correct program in them than in C. The advantage of C is that you have finer grained (and lower level) control over the representation. For instance, there is a choice of significance in this problem between (at least) an array representation and a pointer based representation of lists: this choice could make a difference to the efficiency of the solution. For reasons such as this, I might use one of the first three languages at the experimentation stage (especially when dealing with structures like lists) and might think of using C at a later stage when careful attention has to be paid to efficiency. This is really not an unusual choice.

Some people said that the C and Java programs were simpler, despite their greater length, due to their iterative nature. Please keep in mind that the only reason that iteration seems simpler to you now is that it's what you're used to. Once you've spent some time with the other languages, you may change your mind. Some of you made observations regarding efficiency that are difficult to support. One kind of observation was that the C solution was obviously more efficient than those in Java, Scheme, ML or Prolog. These kinds of obvious statements are ones to watch out for in this course since they turn out to be false or difficult to substantiate in many cases! Even if your C program is more efficient in this case, it would not be noticeably more so. Another kind of observation was that the Prolog, ML and Scheme solutions were more efficient because they resulted in shorter fragments of code. This also is not correct because, ultimately, the compiler would translate this code as well into something that created linked lists and it is the translated code that eventually determines speed. Some of you made the comment that Scheme, ML and Prolog are interpreted. As we have noted in class, this is not true; all the language implementations you are using in this course are compiler based.


Problem 6

This was, once again, an open-ended question and any well made arguments were accepted. One point to note is that in questions of this kind a sequential summary of what is said in the articles is not what is desired. Rather, what you needed to do was to assimilate the important points that the authors are making, to present these as such and then to discuss them. With regard to the last aspect, it is a real discussion that is needed: it is not enough to simply say "I agree/disagree with this" or "this is/has turned out to be false;" you need also to substantiate such comments carefully.

From my own perspective, I think the papers make good arguments in support of issues such as simplicity, security, efficiency and readability. I may have some differences in details. For instance, Hoare makes an argument in favour of leaving things like array bounds checking in on `production' versions of programs. This may be useful sometimes especially in fixing bugs, but to my mind finding errors in programs becomes less useful after they have been deployed in the field unless it is accompanied by a requirement that the program code also anticipate such possibilities and include corrective action. Java does the latter to some extent with its requirement to identify and handle exceptions but Hoare's discussion does not address this requirement specifically. A more substantive difference with the two papers is that I feel that we should not lose sight of the fact that we are ultimately interested in a convenient use of the underlying machine. Thus, no matter how much we hate references from the point of view of security, I would not know how else to deal with data structures that grow during computation. Wirth makes a milder suggestion for taming pointers that is a more palatable one to me and also indicates a way that language and feature designers should be thinking in my opinion. There is a tradeoff between what is referred to as flexibility and security and language design involves finding a good via media. Pascal, one of the languages we will see later in the course, had problems because it was too strict and, from the other direction, the C people eventually realized that typing was important. Progress in programming languages often involves removing impediments to flexibility while maintaining security. A good example of this kind is the introduction of polymorphic typing that we will study later in the course.

Another issue that I feel both Hoare and Wirth play their hands too hard on is efficiency. (Wirth even suggests that language design is compiler construction.) I think it is important also to focus on new and convenient ways to represent solutions. Sometimes it is difficult to do this if one is bogged down by efficiency construed in a conventional manner. Languages such as Prolog, ML, Scheme, etc. bear witness to this. It was only after the expressive elegance of these languages was well understood that good implementations were devised for them. Of course, there was always an underlying feeling that these languages could be implemented well, but I am not sure this would have been sufficient from the perspective of the articles in question.

There was some confusion concerning Hoare's views on modularity. Some of you read him as being opposed to modular development. My interpretation is that what he is saying is that separate compilation should not become an excuse for ignoring the space and time efficiency of compilers themselves. Some people also seem to have misunderstood his comments relating to efficiency and took exception to his point that increasing hardware capabilities (both speed and space) cannot supplant a concern for efficiency. Hoare's point is that not paying attention to this leads to an under-utilization of the capabilities and this is not a Good Thing since our computing demands also grow as the capabilities of hardware do.

Some of you were also critical of Wirth's comment that language design should be done by a single individual, observing that software engineering involves teams these days. I am not sure I agree. First of all, software construction is quite a different thing from programming language design. Wirth is mainly noting the fact that different aspects of language design interact and so different `features' cannot be added in a decentralised fashion. By analogy, even if different pieces of a large software system are constructed by different individuals in a team, there still must be a single person, or a very small group of people, telling them what functionality each of their pieces must provide.

Finally, some of you did not seem to quite understand the point Wirth was making with regard to transparency. His observation here is mainly that it should be easy to visualize the cost of a programming language feature after it is translated to something that can actually be run on hardware. This sounds reasonable to me. There are choices to be made in writing any program and I would like to make informed decisions at these points. How would I be able to make such decisions well if I did not have transparency? Of course, this is not the first thing I want to think about when writing a program and a particular feature can be quite useful in programming even if it is costly to execute, but eventually I need to know about such things and I need also to have a way of estimating the cost sensibly.


Last updated on Feb 7, 2006 by gopalan@cs.umn.edu and xqi@cs.umn.edu.