ProgrammingNotes - Tech

ProgrammingBookmarks

Design Patterns - Design patterns are a way of describing recurring problems and their solutions in a language-neutral way, focussing on how the problem presents itself and how to solve it. The best site I've seen for explaining patterns is the Portland Pattern Repository at http://c2.com/ppr/. You will not regret spending an hour browsing it. The "Pattern Language Catalog" is more strategic and the "People, Projects, & Patterns" is more tactical, but both are worth the time.

BNF Grammar - BNF grammars are used for describing things like programming language syntax and message formats. Here's a brief introduction: http://www.garshol.priv.no/download/text/bnf.html

Code Reviews - Code reviews are extremely important because they really do find bugs, but I've been looking for a way to speed them up. This might be it: a tool that lets people enter their comments ahead of time. http://codestriker.sourceforge.net/

Go To Statement Considered Harmful - Edsger Dijkstra's classic. http://www.acm.org/classics/oct95/

Managing Concurrency

If you're building a system that has a database and different processes operating on the objects in that database, then you'll probably want a strategy for dealing with concurrent access to specific objects. There are three common approaches: hard lock, soft lock, and merge.

Hard Lock (often called pessimistic locking) means that you take a database lock when you read an object and hold it until your update commits. If everyone uses the same logic, then the 2nd person who tries to update the object will either fail or wait until you've finished and committed your changes.
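Here's a minimal Python sketch of the hard-lock idea. It assumes an in-memory stand-in for the database: the `HardLockStore` class and its `edit` method are made up for illustration, and one `threading.Lock` per row plays the part of the database's row lock. The lock is taken before the read and released only after the write-back, so a second editor either waits or gives up after `timeout` seconds.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class HardLockStore:
    """Hypothetical in-memory stand-in for a database that supports row locks."""

    def __init__(self) -> None:
        self._rows = {}                 # row key -> dict of column values
        self._locks = {}                # row key -> threading.Lock
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, key: str):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    @contextmanager
    def edit(self, key: str, timeout: float = -1) -> Iterator[dict]:
        """Lock the row on read and hold the lock until the update is committed.

        timeout=-1 waits indefinitely; a positive timeout gives up after that many seconds.
        """
        lock = self._lock_for(key)
        if not lock.acquire(timeout=timeout):
            raise TimeoutError(f"{key} is locked by another editor")
        try:
            row = dict(self._rows.get(key, {}))  # read a private copy while holding the lock
            yield row                            # caller modifies the copy
            self._rows[key] = row                # "commit" before releasing the lock
        finally:
            lock.release()                       # an exception skips the commit (rollback)

    def get(self, key: str) -> dict:
        return dict(self._rows.get(key, {}))


if __name__ == "__main__":
    store = HardLockStore()
    with store.edit("acct-1") as row:
        row["balance"] = 100
        try:
            with store.edit("acct-1", timeout=0.1):  # second editor while the first holds the lock
                pass
        except TimeoutError as err:
            print("second editor blocked:", err)
    print(store.get("acct-1"))
```

In many SQL databases the same idea is expressed as `SELECT ... FOR UPDATE` inside a transaction, with the lock released at `COMMIT`.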

Soft Lock means that you mark the record as being in the process of being updated and report this status to anyone else who opens it. The 2nd person who tries to update the object will get a message that the object is being updated, but can still proceed (at the risk of one of the users losing their changes).
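A soft lock is just an advisory marker on the record. The sketch below is a hypothetical in-memory version (the `SoftLockStore` class, the `editing_by` field and the method names are all made up; in a real database the marker would typically be a column). The second editor gets a warning but nothing stops them, which is exactly how one user's changes can get lost.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Record:
    value: str
    editing_by: Optional[str] = None  # advisory marker: who says they are editing


class SoftLockStore:
    """Hypothetical in-memory store illustrating a soft (advisory) lock."""

    def __init__(self) -> None:
        self._rows: Dict[str, Record] = {}

    def put(self, key: str, value: str) -> None:
        self._rows[key] = Record(value)

    def get(self, key: str) -> str:
        return self._rows[key].value

    def begin_edit(self, key: str, user: str) -> Optional[str]:
        """Mark the record as being edited; return a warning if someone else already is."""
        rec = self._rows[key]
        warning = None
        if rec.editing_by is not None and rec.editing_by != user:
            warning = f"{key} is being edited by {rec.editing_by}"
        rec.editing_by = user  # nothing stops the second user from proceeding
        return warning

    def save(self, key: str, user: str, value: str) -> None:
        rec = self._rows[key]
        rec.value = value  # last writer wins, so an earlier save can be overwritten
        if rec.editing_by == user:
            rec.editing_by = None  # clear the marker only if it is still ours
```

For example, if Alice starts editing, Bob is warned but carries on and saves, and then Alice saves, Bob's change is silently replaced by Alice's.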

Merge (often called optimistic locking) means that you note the version of the row when you read it and fail the write if that version has changed in the meantime. This is similar to CVS's approach: it works well when collisions are rare, but it needs some conflict-resolution logic to handle the ones that do happen.
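A minimal sketch of the version check using Python's built-in `sqlite3` module. The `docs` table, its `version` column, and the `read`/`write`/`StaleUpdateError` names are hypothetical. The key is the `WHERE ... AND version = ?` clause: if another writer got there first, no row matches, and the update fails instead of silently overwriting.

```python
import sqlite3
from typing import Tuple


class StaleUpdateError(Exception):
    """Raised when the row changed after it was read (hypothetical error type)."""


def open_demo_db() -> sqlite3.Connection:
    """Hypothetical 'docs' table with a version column, in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL, version INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO docs (id, body, version) VALUES (1, 'draft', 1)")
    conn.commit()
    return conn


def read(conn: sqlite3.Connection, doc_id: int) -> Tuple[str, int]:
    """Return (body, version); the caller keeps the version for the later write."""
    return conn.execute("SELECT body, version FROM docs WHERE id = ?", (doc_id,)).fetchone()


def write(conn: sqlite3.Connection, doc_id: int, body: str, version_read: int) -> None:
    """Update only if the row still has the version we read, and bump the version."""
    cur = conn.execute(
        "UPDATE docs SET body = ?, version = version + 1 WHERE id = ? AND version = ?",
        (body, doc_id, version_read),
    )
    conn.commit()
    if cur.rowcount == 0:
        # No row matched: it was changed (or deleted) since we read it.
        # This is where conflict-resolution logic would go.
        raise StaleUpdateError(f"doc {doc_id} changed since version {version_read}")
```

If Alice and Bob both read version 1 and Alice writes first, the row moves to version 2 and Bob's write raises `StaleUpdateError`; Bob then has to re-read and merge his change.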

Normalization - It Isn't Just For Databases Anymore

In database terms, normalization is the process of removing duplicate information from schemas so that they are more internally consistent. After all, if the same piece of information appears twice in a database, there's some chance that the two copies will drift out of sync, whereas a value stored only once is always consistent with itself. Normalization applies to other areas of programming as well.

One project I worked on had a data structure that held a person's birthday, age, and a code representing the person's age category (adult, child, infant). Each of these items had its own member field with getter and setter methods. No one was quite sure how this came about, since the fields had evidently been added by different people at different times, but the result was a lot of unnecessary confusion over which item was set, which wasn't, and who was using which. Only very rarely were all three values in sync; usually one or two had been set and the others were empty. This was a nightmare for programmers: because you never knew which fields you could rely on, a lot of time was wasted.

The solution seems obvious in retrospect, but wasn't at the time: store one field and have the getters and setters "normalize" to that field. For example, the category field never needed to be set, since it could be derived from the age, and the age could in turn be derived from the birthday. This might have made the class itself more complex, but it would have made using it much, much simpler.
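Here's a minimal Python sketch of that idea, assuming the birthday is the one stored field. The `Person` class is hypothetical, and so are the category cut-offs (infant up to age 1, child up to 17); the original project's rules aren't known. Age and category are computed on demand, so they can never disagree with the birthday. The `on` parameter only exists to make the results reproducible; it defaults to today.

```python
from datetime import date
from typing import Optional


class Person:
    """Stores only the birthday; age and category are derived from it."""

    # Hypothetical cut-offs for the age categories (assumption, not from the original project).
    INFANT_MAX_AGE = 1   # ages 0-1 count as infant
    CHILD_MAX_AGE = 17   # ages 2-17 count as child

    def __init__(self, birthday: date) -> None:
        self._birthday = birthday  # the single source of truth

    @property
    def birthday(self) -> date:
        return self._birthday

    @birthday.setter
    def birthday(self, value: date) -> None:
        self._birthday = value  # changing it automatically "updates" age and category

    def age(self, on: Optional[date] = None) -> int:
        """Whole years between the birthday and `on` (defaults to today)."""
        on = on or date.today()
        had_birthday_this_year = (on.month, on.day) >= (self._birthday.month, self._birthday.day)
        return on.year - self._birthday.year - (0 if had_birthday_this_year else 1)

    def category(self, on: Optional[date] = None) -> str:
        """Derived from age(), so it can never disagree with the birthday."""
        years = self.age(on)
        if years <= self.INFANT_MAX_AGE:
            return "infant"
        if years <= self.CHILD_MAX_AGE:
            return "child"
        return "adult"
```

For example, `Person(date(2015, 6, 1)).age(date(2024, 5, 31))` is 8 and the category is `"child"`. This sketch leaves out age and category setters, since an age alone can't be turned back into an exact birthday.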

Here's another example from an XML document:

<people>
 <num_of_people>2</num_of_people>
 <person>Bob</person>
 <person>Mary</person>
</people>

What would happen if the same fragment had a num_of_people whose value was 3? Which value should a reader trust? Besides, num_of_people adds no information that isn't already in the document, because the count is implicit in the number of person elements. The document would be clearer and more robust without the num_of_people element.
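To make that concrete, here's a small Python sketch using the standard `xml.etree.ElementTree` module. The function names are hypothetical. It shows that the count can always be derived from the person elements, and how a redundant num_of_people element can contradict them.

```python
import xml.etree.ElementTree as ET

REDUNDANT = """<people>
 <num_of_people>3</num_of_people>
 <person>Bob</person>
 <person>Mary</person>
</people>"""

NORMALIZED = """<people>
 <person>Bob</person>
 <person>Mary</person>
</people>"""


def count_people(xml_text: str) -> int:
    """The count is implicit: just count the <person> children."""
    return len(ET.fromstring(xml_text).findall("person"))


def declared_count_mismatch(xml_text: str) -> bool:
    """True when a redundant <num_of_people> disagrees with the actual list."""
    root = ET.fromstring(xml_text)
    declared = root.findtext("num_of_people")  # None if the element is absent
    return declared is not None and int(declared) != len(root.findall("person"))
```

With the normalized document there is nothing to check: `count_people(NORMALIZED)` is 2, and there is no second copy of the count to go wrong.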

In general, look for opportunities to eliminate duplicate data. It will pay off over time in clearer code that's easier to use.