The Caboteria / Tech Web / TechNotes / ProgrammingNotes (revision 23)

Design Patterns are a way of describing recurring problems and their solutions in a language-neutral way, focussing on how the problem presents itself and how to solve it. The best site that I've seen to explain patterns is the Portland Pattern Repository at http://c2.com/ppr/. You will not regret spending an hour browsing it. The "Pattern Language Catalog" is more strategic and the "People, Projects, & Patterns" is more tactical but both are worth the investment in time.

BNF Grammar - BNF grammars are used for describing things like programming language syntax and message formats. Here's a brief introduction: http://www.garshol.priv.no/download/text/bnf.html

Code Reviews - Code reviews are extremely important because they actually find bugs, but I've been looking for a way to speed them up. This might be it: a tool that lets people enter their comments ahead of time. http://codestriker.sourceforge.net/

Go To Statement Considered Harmful - Edsger Dijkstra's classic. http://www.acm.org/classics/oct95/

Managing Concurrency

If you're building a system that has a database and different processes operating on the objects in that database then you'll probably want a strategy for dealing with concurrent access to specific objects. There are three common approaches: hard lock, soft lock, and merge.

Hard Lock means that you take a database lock when you read an object and hold it until your update completes. If everyone uses the same logic then the 2nd person who tries to update the object will fail, or wait until you've finished and committed your changes.
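Here is a minimal sketch of the hard-lock idea, assuming SQLite through Python's standard sqlite3 module. The account table and the demo_hard_lock function are made up for illustration. SQLite locks the whole database file rather than a single row, so this only approximates what a row lock (e.g. SELECT ... FOR UPDATE in other databases) would do.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    # timeout=0: a blocked writer fails at once instead of waiting.
    return sqlite3.connect(path, timeout=0, isolation_level=None)


def demo_hard_lock():
    """Return (second_user_was_blocked, balance_seen_after_commit)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        first = open_db(path)
        second = open_db(path)
        first.execute(
            "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)"
        )
        first.execute("INSERT INTO account VALUES (1, 100)")

        # The first user takes the write lock *before* reading.
        first.execute("BEGIN IMMEDIATE")
        (balance,) = first.execute(
            "SELECT balance FROM account WHERE id = 1"
        ).fetchone()

        # The second user follows the same logic and is refused.
        try:
            second.execute("BEGIN IMMEDIATE")
            blocked = False
        except sqlite3.OperationalError:
            blocked = True

        # The first user finishes the update and releases the lock.
        first.execute(
            "UPDATE account SET balance = ? WHERE id = 1", (balance - 30,)
        )
        first.execute("COMMIT")

        # Now the second user can take the lock and sees the new value.
        second.execute("BEGIN IMMEDIATE")
        (seen,) = second.execute(
            "SELECT balance FROM account WHERE id = 1"
        ).fetchone()
        second.execute("COMMIT")

        first.close()
        second.close()
        return blocked, seen
```

With a non-zero timeout the second user would wait for the lock instead of failing immediately, which is the "wait until you've finished" variant.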

Soft Lock means that you'll mark the record as being in the process of being updated, and report this status back to the user. The 2nd person who tries to update the object will get a message that the object is being updated, but can proceed (at the risk of one of the users losing their changes).
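A soft lock could be sketched like this, again assuming SQLite via Python's sqlite3 module. The doc table, the editing_by column, and the function names are all hypothetical. The marker is advisory only: nothing stops the second writer, which is exactly the risk described above.

```python
import sqlite3


def make_soft_lock_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT, editing_by TEXT)"
    )
    conn.execute("INSERT INTO doc VALUES (1, 'draft', NULL)")
    conn.commit()
    return conn


def start_edit(conn, doc_id, user):
    """Mark the record as being edited.

    Returns None if the marker was free, or a warning string to show
    the user if someone else already holds it.
    """
    (current,) = conn.execute(
        "SELECT editing_by FROM doc WHERE id = ?", (doc_id,)
    ).fetchone()
    if current is not None and current != user:
        return f"{current} is already editing this record"
    conn.execute("UPDATE doc SET editing_by = ? WHERE id = ?", (user, doc_id))
    conn.commit()
    return None


def finish_edit(conn, doc_id, user, body):
    # The write always goes through; the marker is only advisory.
    conn.execute("UPDATE doc SET body = ? WHERE id = ?", (body, doc_id))
    # Clear the marker only if it is ours.
    conn.execute(
        "UPDATE doc SET editing_by = NULL WHERE id = ? AND editing_by = ?",
        (doc_id, user),
    )
    conn.commit()
```

If a second user ignores the warning and saves anyway, whoever saves last wins and the other user's changes are silently lost.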

Merge means that you note the version of the row that you read and fail on the write if that version has changed in the meantime. This is something like CVS's approach: it works well when there are few collisions, but it needs some conflict-resolution logic to work really well.
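The version check can be sketched with a version column and a conditional UPDATE. This assumes SQLite via Python's sqlite3 module; the doc table, the StaleVersionError exception, and the function names are invented for the example. It shows only the "fail on write" half; the conflict-resolution logic is left out.

```python
import sqlite3


class StaleVersionError(Exception):
    """The row changed after we read it."""


def make_versioned_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT, version INTEGER)"
    )
    conn.execute("INSERT INTO doc VALUES (1, 'draft', 1)")
    conn.commit()
    return conn


def read_row(conn, doc_id):
    """Return (body, version); the caller keeps the version for the write."""
    return conn.execute(
        "SELECT body, version FROM doc WHERE id = ?", (doc_id,)
    ).fetchone()


def write_row(conn, doc_id, body, version_read):
    # The WHERE clause matches only if nobody bumped the version since
    # we read the row, so the check and the write happen in one statement.
    cur = conn.execute(
        "UPDATE doc SET body = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (body, doc_id, version_read),
    )
    conn.commit()
    if cur.rowcount == 0:
        raise StaleVersionError(f"doc {doc_id} changed since version {version_read}")
```

A caller that catches StaleVersionError would typically re-read the row, merge its changes with the new contents, and try again.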

Normalization - It Isn't Just For Databases Anymore

In database terms, normalization is the process of removing duplicate information from schemas so that they are more internally consistent. After all, if the same piece of information appears twice in a database there's some chance that the two values will become inconsistent with one another, whereas if there's only one it's always consistent with itself.

Normalization applies to other areas of programming as well. One project I worked on had a data structure that contained a person's birthday, age, and a code that represented the person's age category (adult, child, infant). Each of these items had a member field and getter and setter methods. No one was really sure how they had come about, because they had evidently been added by different people at different times, but the result was a lot of unnecessary confusion over which item was set, which wasn't, and who was using which. Only very rarely were all three values in sync; usually one or two had been set and the others were empty. This was a nightmare for programmers, who never knew which values could be trusted and wasted a lot of time finding out.

The solution seems obvious in retrospect, but wasn't at the time: store one field and have the getters and setters "normalize" to that field. For example, the category field didn't ever need to be set since it could be inferred from the age, and the age could be inferred from the birthday. This might have made the class itself more complex but it would have made the use of it much, much simpler.
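A rough sketch of what the normalized class might look like, in Python. The class name, method names, and the age thresholds for "infant" and "child" are assumptions for illustration, not the original project's rules. Only the birthday is stored; age and category are always derived from it.

```python
from datetime import date


class Person:
    def __init__(self, birthday: date):
        # The single stored field. Everything else is computed from it.
        self.birthday = birthday

    def age_on(self, today: date) -> int:
        """Whole years completed as of `today`."""
        years = today.year - self.birthday.year
        # Subtract one if this year's birthday hasn't happened yet.
        if (today.month, today.day) < (self.birthday.month, self.birthday.day):
            years -= 1
        return years

    def category_on(self, today: date) -> str:
        age = self.age_on(today)
        if age < 2:        # assumed cutoff for "infant"
            return "infant"
        if age < 18:       # assumed cutoff for "child"
            return "child"
        return "adult"
```

Because there is no age or category field to set, the three values can never disagree with one another.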

Here's another example from an XML document:

<people>
 <num_of_people>2</num_of_people>
 <person>Bob</person>
 <person>Mary</person>
</people>

What would happen if the same fragment had a num_of_people whose value was 3? On the other hand, the num_of_people doesn't add any more information than is already contained in the document, because it's implicit from the number of person elements. The document would be far clearer and more robust if the num_of_people element were eliminated.
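For instance, a consumer can count the person elements itself instead of trusting a separate counter. This is a small sketch using Python's standard xml.etree.ElementTree module; the count_people function name is made up.

```python
import xml.etree.ElementTree as ET

# The same document without the redundant num_of_people element.
DOC = """<people>
 <person>Bob</person>
 <person>Mary</person>
</people>"""


def count_people(xml_text):
    """Derive the count from the person elements themselves."""
    root = ET.fromstring(xml_text)
    return len(root.findall("person"))
```

The count is computed from the data, so it stays correct even if a stale num_of_people element is still present in the input.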

In general, look for opportunities to eliminate duplicate data. It will pay off over time in clearer code that's easier to use.

Saying "No" vs. Saying Nothing

I've noticed that many systems seem to confuse saying "no" with saying nothing. For example, if you make a database query and it returns zero rows, can you tell the difference between that case and it not returning at all? This matters most when you're working with external systems, because you need to know whether the other system responded properly, but with an empty result set, or whether it crashed or timed out. Programmers rarely care about this difference, but it's vitally important to the operations staff who run online systems, since they should be able to tell at a glance which part of the system to spend their time diagnosing.

My rule of thumb is that you shouldn't throw when you get empty responses. So if the database returns 0 rows for your query you should return an array or collection with zero elements, but there should be no exception since everything was processed properly. Save the exception for when things actually go wrong, such as timeouts, communications failures, etc.
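In code, the rule of thumb might look something like this. The sketch assumes SQLite via Python's sqlite3 module; the person table, the find_names function, and the LookupFailed exception are hypothetical names.

```python
import sqlite3


class LookupFailed(Exception):
    """The database could not answer at all (as opposed to answering 'none')."""


def find_names(conn, prefix):
    """Return matching names; an empty list means 'no matches', not failure."""
    try:
        rows = conn.execute(
            "SELECT name FROM person WHERE name LIKE ? ORDER BY name",
            (prefix + "%",),
        ).fetchall()
    except sqlite3.Error as exc:
        # Something actually went wrong: say so loudly.
        raise LookupFailed(f"person lookup failed: {exc}") from exc
    # Zero rows is a perfectly good answer, so no exception here.
    return [name for (name,) in rows]
```

A caller can then treat an empty list as ordinary data and leave LookupFailed to whatever reports problems to the operations staff.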

This affects protocols as well. Let's say you're trying to design a message that can say "I know that the value of foo is 27" but it also needs to express "I don't know what the value of foo is". You've got a few possibilities, but the most important question is: do you assign a specific value of the parameter (e.g. -1) to mean "don't know", or do you indicate that in some other way, such as a flag or a different message type? I prefer the latter approach since it eliminates the problem of having to find a "magic" value of your parameter that means something different from all of the other possible values.
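One way the flag approach could look, sketched in Python with a JSON encoding. The FooReport class and the foo_known and foo field names are invented for the example; a real protocol would pick its own encoding.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FooReport:
    known: bool                  # explicit flag instead of a magic value
    value: Optional[int] = None  # only meaningful when known is True


def encode(report: FooReport) -> str:
    if report.known:
        return json.dumps({"foo_known": True, "foo": report.value})
    # "Don't know" carries no value at all, so there's nothing to misread.
    return json.dumps({"foo_known": False})


def decode(text: str) -> FooReport:
    data = json.loads(text)
    if data["foo_known"]:
        return FooReport(True, data["foo"])
    return FooReport(False)
```

With this shape, -1 is just another legal value of foo and never has to double as "don't know".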

A guy that I once worked with wrote a research paper that went into extreme detail about the differences between different types of uncertainty in software development, i.e. the difference between "I don't know" and "I know that there's no answer". Actually, I didn't understand much beyond the abstract but my co-worker spent some time explaining it to me.

A related topic is the difference in relational dbms terms between null and 0, but that's a different discussion.

Other People are Smart, too

I've noticed the same antipattern on several projects that I've worked on. It manifests itself in code that's intended to "simplify" standard mechanisms, usually by wrapping the standard mechanism (e.g. HTML, JDBC, JMS) in something that's "easier to use". These wrappers are built with good intentions, but they're based on a fallacy that's very similar to the "noblesse oblige" of previous centuries. In those times, people of a higher station were required by noblesse oblige to care for those whom they believed to be inferior to them. Now, people who write these libraries try to care for others who they believe aren't as smart as they are. In most cases they're doing themselves a disservice by hiding the mechanism from other users. A better alternative is to write code that simplifies the use of the mechanism while exposing its operation. This allows everyone to learn how it works.
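As a small illustration of "simplify but expose", here is a sketch using Python's standard sqlite3 module as the stand-in for the standard mechanism. The open_app_db helper and its chosen defaults are hypothetical. The helper takes care of the repetitive setup but hands back the ordinary sqlite3.Connection, so callers keep using the standard API they already know.

```python
import sqlite3


def open_app_db(path=":memory:"):
    """Open a connection with the project's usual settings applied.

    Returns a plain sqlite3.Connection, not a wrapper object, so
    everything in the standard documentation still applies to it.
    """
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row          # rows addressable by column name
    conn.execute("PRAGMA foreign_keys = ON")  # enforce foreign keys
    return conn
```

Someone reading the helper learns which settings matter and why, and nothing about the underlying library is hidden from them.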

The benefits of this approach are many, but the primary one is simplicity. Wrapper code, like all code, will have bugs. Even after those bugs have been found and fixed, the wrapper is another layer of code that has to be learned and executed, and it can be grossly inefficient at run time, especially if it uses mechanisms like reflection to do its work.

Another benefit is that when you're hiring people, you can interview them to find out if they understand the standard mechanism, and if they do they can jump in and write code with very little learning. It's pointless to interview them to see if they know your proprietary wrapper, because of course they don't. They'll have to spend time learning how it works, even if they already know how the standard mechanism works.

So next time you're about to write some code that "simplifies" a standard technology, ask yourself if maybe your time wouldn't be better spent writing some documentation that will help people learn how the technology works. Sure, it's more satisfying to think that you're smarter than other people, but in the long run it's better to swallow your pride.

Tech.ProgrammingNotes moved from Tech.ProgrammingTips on 17 Feb 2004 - 18:59 by Main.guest
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.