Saturday, April 7, 2007

Getting religion: XML

XML is out of control.

The insidious little creature has ingratiated itself with the Java standards committees, which themselves were long ago bought and sold to the unholy behemoths that stuff the latest JEE turd down our throats. Under their tutelage, XML has become the configuration file format of the 21st century; we can't get away from it in Java development. The two major build systems use it. The entire JEE spec is predicated on it. Spring, which has always been billed as the solution to the mistakes of JEE, prides itself on its XML configuration. Hibernate, which despite its flaws is worlds better than JDBC access, requires us to do our ORM in glorious, repetitive XML.

This has got to stop.

XML is a markup language; that means it is supposed to contain human-understandable text and information about that text. It was designed to be flexible so that it can adapt easily to arbitrary formats. Whichever side of the OOXML/Office XML debate you're on, the fact that the document is represented in XML is a win for all developers.

The flexibility, which has been the key to XML's wild success, has also been contorted by eager Java beavers to twist it into a general-purpose configuration language. Now it's debatable whether an XML representation of, say, a properties file is any better than a simple key/value listing, but I would argue that at least it's not worse. However, when you start mapping database table schemas to XML, inserting namespaces for different kinds of constructs, and attempting to integrate those configurations with other programs, you wind up in a nasty place quickly. On top of that, some people have even begun to hack procedural logic into XML (see the ant-contrib tasks and the JSP tag library). Suddenly your XML has become a crappy approximation of a programming language. At this point, why aren't you better off using a programming language? As the XML gets more and more hairy, the parser grows similarly hairy -- just so you can map your XML into Java. But why are you trying so hard to keep your configuration in XML anyway? So it can be portable? (What other app is going to use Spring's application-context.xml?) So other languages can read it? (Ruby doesn't need a Hibernate XML file for ActiveRecord and never will.) Even if you do think of a good answer to that question, is it worth the ugly creeping horror that is your configuration parser?

If you need a programming language, don't be afraid to use one.


All right, so maybe Hibernate can't read the database for the table metadata for some reason (though I'm still not convinced that it shouldn't at least try). What's wrong with using Java to describe the schema instead of XML? It's not like it's not going to wind up in Java anyway, and the extra level of indirection doesn't buy you anything. At least use Java (or Ruby, or something useful) to generate the XML if you insist on having it; at this point, reconfiguration becomes the same as refactoring, and a good Java IDE is immensely helpful with that.
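To make that concrete, here is a minimal sketch of what "describe it in Java, generate the XML only if you must" could look like. Everything in it is an assumption for illustration: `TableMapping`, `MappingDemo`, and the `Customer`/`CUSTOMERS` names are hypothetical and not part of Hibernate, and the XML it emits is a simplified imitation of a mapping file rather than a valid one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mapping description; none of these names come from Hibernate.
final class TableMapping {
    private final String className;
    private final String tableName;
    // LinkedHashMap keeps columns in the order they were declared.
    private final Map<String, String> columns = new LinkedHashMap<>();

    TableMapping(String className, String tableName) {
        this.className = className;
        this.tableName = tableName;
    }

    // Maps a Java property to a column; returns this so calls can be chained.
    TableMapping column(String property, String column) {
        columns.put(property, column);
        return this;
    }

    // Emits a simplified, illustrative XML form of the mapping.
    // Assumes names contain no characters that would need XML escaping.
    String toXml() {
        StringBuilder xml = new StringBuilder();
        xml.append("<class name=\"").append(className)
           .append("\" table=\"").append(tableName).append("\">\n");
        for (Map.Entry<String, String> e : columns.entrySet()) {
            xml.append("  <property name=\"").append(e.getKey())
               .append("\" column=\"").append(e.getValue()).append("\"/>\n");
        }
        return xml.append("</class>\n").toString();
    }
}

final class MappingDemo {
    // The "configuration" is ordinary Java, so an IDE can rename and refactor it.
    static TableMapping customerMapping() {
        return new TableMapping("Customer", "CUSTOMERS")
                .column("id", "CUSTOMER_ID")
                .column("name", "FULL_NAME");
    }

    public static void main(String[] args) {
        System.out.print(customerMapping().toXml());
    }
}
```

The point of the sketch is only that the mapping lives in code the compiler and IDE can see; the XML, if anyone still wants it, becomes a build artifact instead of something you edit by hand.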

(Thankfully, Spring has finally started work on a Java configuration option, and it is tasty.)

Bottom line, guys and gals: the right tool for the right job. XML is not always the right tool, so don't use it when it isn't.

Friday, April 6, 2007

Getting religion: Testing

I've spent the past week at a Spring training session -- of which more later -- but the first thing I want to say has to do with testing.

We have maybe 20 people in here; most are on ancient Java tech, including 1.3 and JDBC, and Spring of course sounds compelling to them as it hits right where they need the most help. The subject of transactions came up, and naturally our instructors pushed hard on us that 1) they were necessary; 2) Spring made them easy to do; 3) we should be using them. Now I was prepared for 2 and 3, but I thought 1 went without saying. Of COURSE you do transactions when you're doing database development. All kinds of crap can go wrong if your operations aren't atomic by logical unit rather than by connection. Now while I wasn't surprised to hear people say transactions were a pain to implement in Java, I was shocked to hear them say that because of that, they didn't bother with them -- or, even worse, that they were unnecessary. One guy claimed to know his database "well enough" that the kind of conflict described as arising out of a failed transaction "shouldn't be allowed to happen".

I had to bite my tongue to keep from laying into this guy.

So the instructor followed up with, "Well, how do you know you aren't having any problems with your code due to not using transactions?" "Well, we haven't seen any problems and our users haven't reported any." Um, genius, if your web site doesn't work for users for no apparent reason, they're not going to use it. They're not going to compose a detailed error report and help you track down the bug; that's not their job. The web is not an expanded testing department. It's not my mom's job to help you find the bugs in your site. If a user can't use your site, they'll just leave it and not come back.

Some rant-ish points:

  • I don't care how well you think you know the database you're using; you're wrong. First, you don't know it completely; no one person does. No one person even wrote it, so it's ridiculous to say you understand it completely. Second, it's not just the database you need to worry about; it's also the JDBC driver, the VM, the native platform, the versions of each, and a hundred other factors. It is way bigger than you, and it's time to humble yourself to that; the days when a programmer understood the machine completely are long gone and aren't ever coming back.
  • One day your database may change. I guarantee you won't understand the new one nearly as well as you do your current one, and your understanding of the current one is incomplete at best.
  • It is not okay to write some code, run it through a couple of use cases, fix the obvious errors, and declare it production-ready. Your job is not to show that your code might work given the right conditions; you must show that your code cannot break in the wrong conditions.
  • Your test cases (the programmer's, I mean, not the QA department's) must test not only for correct results but for sensible and well-defined behavior when given incorrect input. It doesn't just have to work; it also has to not break.
  • Treat your users with respect. They aren't programmers, and they don't get paid to use your code. You get paid to give your users an easy, non-confusing experience. Don't try to lazy your way out of it by saying that bulletproof code isn't worth your time -- you have no other function but to produce that bulletproof code.

Remember this at all times:

This is computer science, not computer religion; we don't have faith that things work, we have proof that they do.

One of the bedrocks of scientific theory is the idea of falsifiability; a scientific theory must make statements that could, in principle, be shown to be false. (This is why creationism and intelligent design aren't science, since they can't meet this standard.) In this case, a falsifiable statement would be something like: If this operation is interrupted by the user, the database update will fail. You then write a unit test that creates such an interruption and checks whether the update failed (either an exception was thrown or the data violates some integrity constraint).
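Here is roughly what such a test could look like, as a sketch under loud assumptions: `AccountStore` is a hypothetical in-memory stand-in for a transactional database (no JDBC, no Spring), and the `betweenSteps` hook exists only so the test can simulate the user interrupting the operation halfway through. The falsifiable statement is "an interrupted transfer fails and leaves no partial update behind."

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a transactional store; not a real database API.
final class AccountStore {
    private Map<String, Integer> balances = new HashMap<>();

    void open(String account, int balance) { balances.put(account, balance); }

    int balance(String account) { return balances.get(account); }

    // Moves money as one logical unit: work on a copy, publish it only if every step succeeds.
    void transfer(String from, String to, int amount, Runnable betweenSteps) {
        Map<String, Integer> working = new HashMap<>(balances);
        working.put(from, working.get(from) - amount);
        betweenSteps.run();                 // hook where a test can simulate an interruption
        working.put(to, working.get(to) + amount);
        balances = working;                 // "commit": reached only when nothing threw
    }
}

final class InterruptedTransferTest {
    // Returns true if an interrupted transfer fails AND leaves both balances untouched.
    static boolean interruptedTransferLeavesNoPartialUpdate() {
        AccountStore store = new AccountStore();
        store.open("alice", 100);
        store.open("bob", 50);

        boolean failed = false;
        try {
            store.transfer("alice", "bob", 30, () -> {
                throw new IllegalStateException("simulated user interruption");
            });
        } catch (IllegalStateException expected) {
            failed = true;
        }
        return failed && store.balance("alice") == 100 && store.balance("bob") == 50;
    }

    public static void main(String[] args) {
        if (!interruptedTransferLeavesNoPartialUpdate()) {
            throw new AssertionError("interrupted transfer left a partial update");
        }
        System.out.println("ok");
    }
}
```

Against a real database you would run the same claim through your actual transaction boundary; the shape of the test (cause the interruption, then check both the failure and the data) stays the same.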

A unit test is really a small proof that a falsifiable statement is in fact true (or false); when you assert something at the end of a test, you are basically claiming that the lines of your code are a list of statements that logically demonstrate the truth (or falsity) of your falsifiable statement. This is exactly the same thing as a geometric proof, and that's not a coincidence. A good unit test only exercises one part of your class, just as a good proof only tries to establish one fact.
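As a sketch of "one test, one fact", including the facts about bad input: the `Quantity` parser below is a hypothetical example invented for illustration, and the tests are plain static methods rather than any particular test framework's API. Each method tries to establish exactly one claim.

```java
// Hypothetical parser used only for illustration.
final class Quantity {
    // Parses a positive order quantity; rejects anything else with a clear error.
    static int parse(String text) {
        if (text == null || text.trim().isEmpty()) {
            throw new IllegalArgumentException("quantity is required");
        }
        int value;
        try {
            value = Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("quantity must be a whole number: " + text, e);
        }
        if (value <= 0) {
            throw new IllegalArgumentException("quantity must be positive: " + value);
        }
        return value;
    }
}

final class QuantityTests {
    // Fact 1: well-formed input yields the expected value.
    static boolean parsesValidInput() {
        return Quantity.parse(" 12 ") == 12;
    }

    // Fact 2: malformed or missing input is rejected in a defined way.
    static boolean rejectsGarbage() {
        return rejects("twelve") && rejects("") && rejects(null);
    }

    // Fact 3: well-formed but nonsensical values are rejected too.
    static boolean rejectsNonPositive() {
        return rejects("0") && rejects("-3");
    }

    // Helper: true only if parsing the input throws IllegalArgumentException.
    private static boolean rejects(String input) {
        try {
            Quantity.parse(input);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }
}
```

Two of the three facts here are about input that is wrong, which is the part the "it ran through a couple of use cases" crowd never writes down.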

You know that your code is ready when it has been proven to not fail. The trick is figuring out all the ways your code might fail, and that's impossible except for the most trivial programs. However, you should be able to predict the vast majority of failure scenarios and prove through unit and integration testing that they will not break your code. If you're not doing that, you're not practicing computer science.

It's time to get religion about computer science.