Saturday, April 7, 2007

Getting religion: XML

XML is out of control.

The insidious little creature has ingratiated itself with the Java standards committees, which themselves were long ago bought and sold by the unholy behemoths that stuff the latest JEE turd down our throats. Under their tutelage, XML has become the configuration file format of the 21st century; we can't get away from it in Java development. The two major build systems use it. The entire JEE spec is predicated on it. Spring, which has always been billed as the solution to the mistakes of JEE, prides itself on its XML configuration. Hibernate, which despite its flaws is worlds better than raw JDBC access, requires us to do our ORM in glorious, repetitive XML.

This has got to stop.

XML is a markup language; that means it is supposed to contain human-understandable text and information about that text. It was designed to be flexible so that adapting it to arbitrary document formats is easy. Whichever side of the OOXML/Office XML debate you're on, the fact that the document is represented in XML is a win for all developers.

That flexibility, which has been the key to XML's wild success, has also let eager Java beavers contort it into a general-purpose configuration language. Now, it's debatable whether an XML representation of, say, a properties file is any improvement over a simple key/value listing, but at least it's no worse. However, when you start mapping database table schemas to XML, inserting namespaces for different kinds of constructs, and attempting to integrate those configurations with other programs, you wind up in a nasty place quick.

On top of that, some people have even begun to hack procedural logic into XML (see the ant-contrib tasks and the JSP tag libraries). Suddenly your XML has become a crappy approximation of a programming language. At that point, why aren't you better off using an actual programming language? As the XML gets hairier, the parser grows hairy right along with it -- just so you can map your XML back into Java. And why are you trying so hard to keep your configuration in XML anyway? So it can be portable? (What other app is going to read Spring's application-context.xml?) So other languages can use it? (Ruby doesn't need a Hibernate XML file for ActiveRecord and never will.) Even if you do think of a good answer, is it worth the ugly creeping horror that is your configuration parser?
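
To make that concrete, here's the flavor of logic the ant-contrib tasks encourage (a representative sketch, not lifted from any real build):

<if>
  <equals arg1="${db.vendor}" arg2="oracle"/>
  <then>
    <property name="jdbc.driver" value="oracle.jdbc.OracleDriver"/>
  </then>
  <else>
    <property name="jdbc.driver" value="org.postgresql.Driver"/>
  </else>
</if>

That's an if statement with clumsier syntax, no types, no debugger, and no IDE support. One line of Java does the same job.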

If you need a programming language, don't be afraid to use one.


All right, so maybe Hibernate can't read the table metadata from the database for some reason (though I'm still not convinced it shouldn't at least try). What's wrong with using Java to describe the schema instead of XML? The configuration is going to wind up as Java objects anyway, and the extra level of indirection doesn't buy you anything. At the very least, use Java (or Ruby, or something useful) to generate the XML if you insist on having it; at that point, reconfiguration becomes the same thing as refactoring, and a good Java IDE is immensely helpful with that.
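
In fact, Hibernate's annotation support (by way of the JPA annotations) already lets you put the mapping right on the class. A minimal sketch, with illustrative table and column names:

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

// The same mapping an hbm.xml file would express, but refactorable in the IDE.
@Entity
@Table(name = "users")
public class User {

    @Id
    @Column(name = "user_id")
    private Long id;

    @Column(name = "user_name", nullable = false, length = 40)
    private String name;

    public Long getId() { return id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

Rename the field and every reference moves with it; try that with a stale XML file.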

(Thankfully, Spring has finally started work on a Java configuration option, and it is tasty.)
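
A sketch of the style as I understand it from the early milestones -- the annotation packages may well shift before release, and the bean shown here is illustrative:

import javax.sql.DataSource;
import org.apache.commons.dbcp.BasicDataSource;
// Package names as of the early JavaConfig milestones; these may move.
import org.springframework.config.java.annotation.Bean;
import org.springframework.config.java.annotation.Configuration;

@Configuration
public class AppConfig {

    // Replaces <bean id="dataSource" class="..."/> in application-context.xml.
    @Bean
    public DataSource dataSource() {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("org.hsqldb.jdbcDriver");
        ds.setUrl("jdbc:hsqldb:mem:app");
        return ds;
    }
}

Wiring is just method calls, so reconfiguration really does become refactoring.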

Bottom line, guys and gals: the right tool for the right job. XML is not always the right tool, so don't use it when it isn't.

Friday, April 6, 2007

Getting religion: Testing

I've spent the past week at a Spring training session -- of which more later -- but the first thing I want to say has to do with testing.

We have maybe 20 people in here; most are on ancient Java tech, including 1.3 and raw JDBC, and Spring naturally sounds compelling to them because it hits right where they need the most help. The subject of transactions came up, and our instructors pushed three points hard: 1) transactions are necessary; 2) Spring makes them easy; 3) we should be using them. I was prepared for 2 and 3, but I thought arguing for 1 was unnecessary. Of COURSE you use transactions when you're doing database development. All kinds of crap can go wrong if your operations aren't atomic by logical unit rather than by connection. While I wasn't surprised to hear people say transactions were a pain to implement in Java, I was shocked to hear them say that because of that, they didn't bother with them -- or, even worse, that transactions were unnecessary. One guy claimed to know his database "well enough" that the kind of conflict the instructor described "shouldn't be allowed to happen."

I had to bite my tongue to keep from laying into this guy.

So the instructor followed up with, "Well, how do you know you aren't having any problems in your code due to not using transactions?" "Well, we haven't seen any problems, and our users haven't reported any." Um, genius, if your web site fails on users for no apparent reason, they're not going to use it. They're not going to compose a detailed error report and help you track down the bug; that's not their job. The web is not an extension of your testing department. It's not my mom's job to find the bugs in your site. If users can't use your site, they'll just leave and not come back.
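
And point 2 is real, for what it's worth: with Spring's TransactionTemplate, making a logical unit of work atomic takes a handful of lines. A sketch, assuming a wired-up PlatformTransactionManager (the Account type here is a hypothetical stand-in):

import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.TransactionStatus;
import org.springframework.transaction.support.TransactionCallbackWithoutResult;
import org.springframework.transaction.support.TransactionTemplate;

// Hypothetical stand-in for whatever your data layer calls an account.
interface Account {
    void debit(int amount);
    void credit(int amount);
}

public class TransferService {
    private final TransactionTemplate tx;

    public TransferService(PlatformTransactionManager txManager) {
        this.tx = new TransactionTemplate(txManager);
    }

    public void transfer(final Account from, final Account to, final int amount) {
        // Both updates commit together or roll back together; a
        // RuntimeException anywhere inside aborts the whole unit.
        tx.execute(new TransactionCallbackWithoutResult() {
            protected void doInTransactionWithoutResult(TransactionStatus status) {
                from.debit(amount);
                to.credit(amount);
            }
        });
    }
}

That's the entire cost of doing it right.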

Some rant-ish points:

  • I don't care how well you think you know the database you're using; you're wrong. First, you don't know it completely; no one person does. No one person even wrote it, so it's ridiculous to claim you understand it completely. Second, it's not just the database you need to worry about: it's also the JDBC driver, the VM, the native platform, the versions of each, and a hundred other factors. The system is way bigger than you, and it's time to humble yourself before that fact; the days when a programmer understood the machine completely are long gone and aren't ever coming back.
  • One day your database may change. I guarantee you won't understand the new one nearly as well as your current one -- and that understanding is incomplete at best.
  • It is not okay to write some code, run it through a couple of use cases, fix the obvious errors, and declare it production-ready. Your job is not to show that your code might work given the right conditions; you must show that your code cannot break in the wrong conditions.
  • Your test cases (the programmer's, I mean, not the QA department's) must test not only for correct results but for sensible, well-defined behavior when given incorrect input. Your code doesn't just have to work; it also has to not break.
  • Treat your users with respect. They aren't programmers, and they don't get paid to use your code. You get paid to give your users an easy, non-confusing experience. Don't try to lazy your way out of it by saying that bulletproof code isn't worth your time -- you have no other function but to produce that bulletproof code.
Remember this at all times:
This is computer science, not computer religion; we don't have faith that things work, we have proof that they do.
One of the bedrocks of science is the idea of falsifiability: a scientific theory must make claims that can, in principle, be proven false. (This is why creationism and intelligent design aren't science; they can't meet this standard.) In our case, a falsifiable statement would be something like: if this operation is interrupted by the user, the database update will fail. You then write a unit test that creates exactly that interruption and checks whether the update failed (either an exception was thrown or the data violates some integrity constraint).
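
Here's that statement turned into a test, in miniature -- a self-contained JUnit 4 sketch, with a toy Accounts class standing in for a real data layer:

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;
import java.util.HashMap;
import java.util.Map;
import org.junit.Test;

public class TransferRollbackTest {

    /** Toy stand-in for a real data layer, with all-or-nothing transfers. */
    static class Accounts {
        private final Map<String, Integer> balances = new HashMap<String, Integer>();
        boolean failOnCredit; // test hook that simulates an interruption

        void set(String id, int amount) { balances.put(id, amount); }
        int get(String id) { return balances.get(id); }

        void transfer(String from, String to, int amount) {
            int oldFrom = get(from), oldTo = get(to);
            try {
                set(from, oldFrom - amount);                      // debit
                if (failOnCredit) throw new RuntimeException("interrupted");
                set(to, oldTo + amount);                          // credit
            } catch (RuntimeException e) {
                set(from, oldFrom);                               // roll back
                set(to, oldTo);
                throw e;
            }
        }
    }

    @Test
    public void interruptedTransferChangesNothing() {
        Accounts accounts = new Accounts();
        accounts.set("A", 100);
        accounts.set("B", 0);
        accounts.failOnCredit = true;                 // create the interruption

        try {
            accounts.transfer("A", "B", 50);
            fail("expected the transfer to abort");
        } catch (RuntimeException expected) {
            // the operation was interrupted, as the statement requires
        }

        // The falsifiable claim, asserted: an aborted update changes no data.
        assertEquals(100, accounts.get("A"));
        assertEquals(0, accounts.get("B"));
    }
}

If the rollback logic is broken, the assertions fail; that's falsification doing its job.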

A unit test is really a small proof that settles a falsifiable statement one way or the other; when you assert something at the end of a test, you are claiming that the lines of the test form a chain of statements that logically demonstrates the truth (or falsity) of your falsifiable statement. This is exactly the same thing as a geometric proof, and that's no coincidence. A good unit test exercises only one part of your class, just as a good proof tries to establish only one fact.

You know that your code is ready when it has been proven to not fail. The trick is figuring out all the ways your code might fail, and that's impossible except for the most trivial programs. However, you should be able to predict the vast majority of failure scenarios and prove through unit and integration testing that they will not break your code. If you're not doing that, you're not practicing computer science.

It's time to get religion about computer science.

Saturday, February 24, 2007

Elitism can suck it.

Slashdot | Raymond Knocks Fedora, Switches to Ubuntu

I'm not an open source "guru", "expert", or "leading voice", as ESR has variously been labeled. I think ESR has lately started believing his own press releases and become a creature of his own ego (of which, let's admit, all programmers have more than the average share). He's done some good things for open source, for free software, and for the community of computer users at large. I own The Cathedral and the Bazaar, and I'm not rushing to the nearest used bookstore to get rid of it. However, he seems to think he still speaks from some high position of authority; he believes he's above the petty and meaningless squabbles he thinks pervade the open source community, and that his opinions are therefore well-considered and unbiased, when they're nothing of the sort.

I don't care if he switches to Ubuntu. I did it myself, and I couldn't be happier with it. I used Red Hat, and then Fedora, for 8 years, and it's where I learned how to use Linux. I'm not happy with the quality of recent Fedora Core releases, but they don't exist to please me. My personal preferences are not an objective set of criteria for evaluating the worthiness of a distribution, and neither are ESR's.

The fact that he now has a financial interest in Ubuntu/Linspire -- and thus a conflict of interest in trashing Red Hat the company and promoting Canonical -- turns this from an egomaniacal explosion into something akin to FUD.

The very things ESR has spent the past year decrying -- elitism and a lack of concern for the users -- are now his stock in trade. He thinks he's more important than the average hacker, and he's not. He thinks his opinions matter more than the average hacker's, and despite the mainstream media prostrating themselves at his feet out of remembered glory, they don't. He thinks he knows what users need. Only the users know what the users need, and the developer's job is to try his damnedest to give it to them.

Thursday, February 22, 2007

Back in touch with tech; or, How about a real programming language?

There's a singular joy in learning a new programming language, especially when you've been in the one- or one-and-a-half language rut for three years.

Over the past few days I tore through The Pragmatic Programmer, which is just as good as advertised. While much of the book was already obvious to me and had long been part of my programming practice, just as much was cleverly and brightly presented. "Wow, I can write my own code generators and source analyzers?" Not that it's beyond my ability; it had somehow just never occurred to me to actually try it. Or I never had the need.

Anyway, they also suggest keeping your "knowledge portfolio" up to date. It's become clear to me over the past few months that a lot of my knowledge is aging rapidly. A new language sounds like just the fix. And while I'm at it, why not pick one that can help me write those whizzy-bang code generators and analyzers?

I've done Perl in the past, but in my semi-regular check-ins I've grown impatient with the interminable Perl 6 gestation. I'm not interested in spending a lot of time learning something experimental, and Perl 5 is more than a decade old at this point with few fundamental changes in that time. Perl is a really neat hack and a cool thing to work with, and I'm certainly not opposed to it, but as I find my own identity as an engineer and develop my own style, I feel it's not the direction I want to go in technically.

Next you're saying, "how about Python?" Well, I looked. Cool community, neat ideas, huge install base, pretty mature. Yet... there's something about it that just sounds foreign to my ears. While my company does a lot of internal work in Python, little of it is in spaces that scratch my itches. I think I'll pass for now. Maybe one day I'll check it out again, possibly even learn it, but right now I'll focus on the other alternative: Ruby, the other language we use internally.

This Ruby thing is neat. It works for me because:

  1. it's different enough from Java and C# and D to be a good mind-stretcher;
  2. it kicks ass at text processing;
  3. it's dynamically typed and object-oriented to its core;
  4. Rails is the new hotness in web architecting.
My Ruby education is just beginning, but already there are some ideas in it that have made me sit back and clap excitedly. Here's one neat nugget of its exception-handling mechanism (the snippet I saw started mid-construct, so I've stubbed in a hypothetical body for the begin block):


begin
  attempt_delivery        # hypothetical stand-in for the protected operation
rescue ProtocolError
  if @esmtp then
    @esmtp = false        # downgrade from ESMTP to plain SMTP...
    retry                 # ...and re-run the begin block from the top
  else
    raise                 # already plain SMTP, so propagate the error
  end
end


See what that does there? rescue is an exception handler, like catch in the C++ family. But the retry keyword sends control back to the top of the begin block after you've attempted to repair the error. Are you freaking kidding?! That kicks ass! That one little bit blew my mind enough to make me completely interested in what else Ruby has to say.

Wednesday, February 14, 2007

So, you want to do Maven? Good luck.

Ick.

Maven is the Next Big Thing in Java build management. Its goals and philosophy are pretty far from Ant's, which in all honesty is a step in the right direction. Ant scales poorly, uses some ridiculous syntax, and becomes harder to understand the more you try to do with it. For slapping together a simple .war or .jar, it works fine. For controlling a vast array of slightly differing builds in an intelligent and maintainable way... well, let's just not mention that.

Maven has caught on among the Movers and Shakers for one reason only: dependency management. Maven has "repositories" that store versioned "artifacts" (a fancy name for jars, mostly). You specify what version of something you want, and it does the Right AND Intelligent Thing: downloads it to a well-defined, organized location, automatically sets up classpaths, and makes it available to other Maven builds. This 1) flat-out rocks; 2) solves one of the biggest obstacles to making Ant scalable; 3) makes version conflicts a manageable problem instead of a hair-pulling fit of frustration. "Dammit, I said commons-lang 2.1, not 2.2! Users couldn't download jars properly if it guaranteed them a second life as Jessica Alba's Barcalounger!" Such days are now but a painful memory.
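
And the declaration itself is as simple as it should be; the commons-lang example above comes down to a few lines in the POM:

<dependency>
  <groupId>commons-lang</groupId>
  <artifactId>commons-lang</artifactId>
  <version>2.1</version>
</dependency>

If only the rest of the POM stayed that tidy.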

Unfortunately, they just HAD to go and cover this juicy nugget of awesome with a ponderous load of wtf. The pom.xml file, which is to Maven what build.xml is to Ant, is stunningly complex. It's true that you can get a lot more mileage out of a lot less XML in the simplest case, but that assumes you've laid out everything in your project according to the Maven "standard directory layout" -- and don't ask, because you haven't. "No problem, I'll just specify the location of everything!" Sure, do that. But even if you can figure out how without any documentation (of which more later), your nice, cute, simple POM just doubled or tripled in size. Worse, it now looks oddly familiar... I know: it's that huge list of Ant properties that every medium-to-large project has, the one that has nothing in common with the similar list from any other medium-to-large project!

So if I want to use Maven, my first option is to move everything relating to my project into its arbitrary directory structure, thus:

  1. screwing up my working Ant scripts, thus requiring me to rewrite them;
  2. screwing me on all the taskdefs I'm using that have no corresponding Maven plugin;
  3. giving me yet another moving-crap-around-subversion headache;
  4. randomly hosing other useful things like the build machine's triggers, my cool Ruby code coverage scripts I'm prototyping, or proprietary analyzers that the Manager insists we waste otherwise useful CPU time on;
  5. hoping that the next version of Maven doesn't change its mind about what the arbitrary directory structure should be (like v2 did after v1), or I'll have to repeat this entire process.
If that sounds unpalatable, no problem: make a nice big ugly POM telling Maven where everything is. How is this better than what I already have? I get the same finished builds out of this POM -- once I've got it debugged -- that I do with Ant. Now I have two ugly sets of build scripts to maintain instead of one.

This is progress?

Recently I migrated a little Java library project of mine from Ant to Maven 2 in order to learn how it worked. I had a simple 150-line Ant script that compiled, ran tests, created javadocs, checked versions, and threw the whole mess into a neatly packaged jar with a bowtie on top. I'm not that dense, I like to think, but the conversion took around 8 hours, and we're talking about maybe 75 classes in two packages with one external dependency. Now I have a 200-line POM that does the same thing in a completely different way, and it's only that short because I bit the bullet and converted to The One True Project Organization. Why would I attempt this on a big, important project where screwups or delays might mean my job?

One last item on the rant parade: documentation. Maven 2's is pathetic, and Maven 1's -- a project well-nigh abandoned only two years after a huge effort to convert people to it -- is only slightly better. Tons of important docs on the live site are still "coming soon", and the plugin documentation is long on parameters and short on examples. I don't care how to use esoteric feature X of weird tool Y; I just want a drop-in for generating javadocs and creating shippable packages. This shouldn't be hard, and once you figure it out, it really isn't, but I should not be forced to spend valuable time ferreting out the solution through trial and error. (No, mailing lists and bug reports are not the answer. I should NEVER have to Ask The Expert When Simple Things Break.) Please, Maven guys, think about the users here.

There is a bit of good news. Maven 2 comes with a set of Ant tasks that let you do Maven-ish things from Ant, including dependency resolution. My shiny new Ant script (after the Conversion) downloads one small jar of Maven tasks, feeds it a few simple bits of info -- package names and version numbers -- and the whole shebang gets downloaded and built in one go, without restarting the build. And if a user ever wants to try the Maven build, it reuses the previously downloaded dependencies without a single tag of additional configuration. Freaking awesome. Groin-grabbingly transcendent. The Real Build Solution, when it comes, will work in all respects the way this little combination does.
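
If you want to try the same trick, the Ant side looks roughly like this -- paths and versions are illustrative, and the artifact namespace has to be declared on your project element:

<project name="mylib" default="compile"
         xmlns:artifact="antlib:org.apache.maven.artifact.ant">

  <!-- Load the Maven tasks from the one small jar mentioned above. -->
  <typedef resource="org/apache/maven/artifact/ant/antlib.xml"
           uri="antlib:org.apache.maven.artifact.ant"
           classpath="lib/maven-ant-tasks.jar"/>

  <!-- Resolve dependencies into a classpath, downloading as needed. -->
  <artifact:dependencies pathId="dependency.classpath">
    <dependency groupId="commons-lang" artifactId="commons-lang" version="2.1"/>
  </artifact:dependencies>

  <target name="compile">
    <javac srcdir="src" destdir="build/classes"
           classpathref="dependency.classpath"/>
  </target>
</project>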

Wednesday, February 7, 2007

Mac Java considered harmful

For the past two months, I've been doing daily Java development on a PowerBook G4. It sucks. Why?

First of all, it's slow. "Well, duh, Java is always slow," squawks the ill-informed peanut gallery. No, as a matter of fact, it's not; runtime performance has gotten to the point where it's comparable with native code on almost every server platform, and I'm not hacking 3D game engines here.

What I mean is that Apple's implementation doesn't compare favorably with Sun's or IBM's. The latest released version of Apple's SDK is still stuck on 1.5.0_06, which is getting long in the tooth, and the beta download pushes it all the way to 1.5.0_07; the latest from Sun is _11. Meanwhile, 1.6 final has been out for over a month, while Apple's 1.6 beta is months old with no word on updates.

Finally, there's the widely-reported Steve Jobs quote from just after the iPhone presentation (which -- you heard it here first -- will be a huge bomb):

Java’s not worth building in. Nobody uses Java anymore. It’s this big heavyweight ball and chain.
Gee, thanks, Steve. I guess I'll just go shut down those servers running my company's websites.

I know several people are saying he was only talking about the iPhone, but that's a pretty blanket condemnation when he could have said something like "Java doesn't make sense on the iPhone for us." I might even have accepted that at face value, since client-side Java (outside of J2ME) has been more or less a bust. But combined with the apparent neglect of their Java implementation after extravagant promises about its performance on OS X, it sounds ominous.

Saturday, January 27, 2007

MIXing it up

Is there any value to reading The Art of Computer Programming anymore?

The series is a classic, and that means it carries loads of baggage along with its merits. I don't know of any universally accepted definition for a classic other than the tongue-in-cheek one:

A classic is something that everybody wants to have read and nobody wants to read. -- Mark Twain
It sure strikes close to home for someone with a liberal arts education. Most software developers I know speak of the book with a mix of reverent respect and pitying affection: "it was an amazing achievement for its time, but it's so outdated now that it's more a historical document than a useful guide for modern developers." My training has conditioned me to distrust this kind of attitude, if for no other reason than that I spent my entire college career reading books that fit exactly this description. Besides, when I want to understand something through a book, I want the source; I hate textbooks and Reader's Digest-style summaries. I don't need intermediaries between me and the original idea. I can handle it on its own merits. (Well, at least I SHOULD be able to.)

So I decided I wanted to read it. I got it for Christmas, and I'm up to about page 150. The first volume starts with some really intense math, and although I was always good at math, I'll admit frankly that I didn't understand most of it. Most "classics" are like that, though; the first time through your reaction is "Huh?", and the juicy nuggets only reveal themselves through repeated readings. So I pressed on and was treated to MIX, the ideal machine Knuth designed to illustrate algorithms in code.

MIX strongly resembles the computers of the '60s, and its guts are unlike those of any modern machine. It's got registers and memory cells but no operating system; programs are written directly against the hardware in raw five-byte (plus sign) words. Bytes in MIX are not eight binary digits: a MIX machine may be binary or decimal(!), and the only guarantee the programmer gets is that a byte holds at least 64 distinct values (0-63) and at most 100. This is weird enough on its own, as I've been thinking in binary forever. Any value that might exceed 63 has to be spread across two adjacent bytes, or the results are machine-dependent: a decimal MIX can hold 90 in a single byte, but a binary MIX, whose bytes top out at 63, cannot.
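
A toy model of that portability rule, in Java since MIXAL is still a chapter away: a two-byte value only round-trips if you interpret it with the machine's actual byte size.

public class MixBytes {

    // Split a value into two MIX-style bytes: [high, low].
    // byteSize is 64 on a binary MIX and 100 on a decimal one.
    static int[] pack(int value, int byteSize) {
        return new int[] { value / byteSize, value % byteSize };
    }

    static int unpack(int[] bytes, int byteSize) {
        return bytes[0] * byteSize + bytes[1];
    }

    public static void main(String[] args) {
        for (int byteSize : new int[] { 64, 100 }) {
            int[] b = pack(90, byteSize);
            // 90 packs as [1, 26] on a binary MIX but [0, 90] on a decimal
            // one; each decodes correctly only with its own byte size.
            System.out.println(byteSize + ": [" + b[0] + ", " + b[1] + "] -> "
                    + unpack(b, byteSize));
        }
    }
}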

I haven't gotten to program this thing seriously yet (there's an assembly language called MIXAL that comes next), but it's radically different from any higher-level language. The machine really does next to nothing for you. You get to implement algorithms at the lowest level, which of course is the point; I've never implemented a linked list in assembly before.

Anyway, is implementing basic algorithms on an ideal machine really going to make me a better programmer? I don't know. I need to get a little further.