The Art(?) Of Software Engineering

Monday, August 20, 2007

Sun is misusing the GPL for Java: Part 2

Slashdot | Sun Lowers Barriers to Open-Source Java

Now for Part 2.

Last time I ran down what I see as the differences between the FSF's and the ASF's handling of their communities, as expressed in their respective licenses and practices. I feel they start from very similar places yet end up somewhere that, in a purely practical sense, works very differently, even to the point that the two cannot collaborate. This is a shame, but both groups have sound reasons for doing things their way. It does make things hard for those of us who want to participate in both, however.

When Sun announced the release of Java under the GPL, I was naturally delighted. At last a free software system could include Java, the gray cloud hanging over Java developers with a conscience could dissolve, and we could inspect (and even fix) core Java classes without fear of taint or lawsuit. It was a good day all around.

Then came the recent news that Sun was changing the license terms of the Java Compatibility Kit (JCK). That much was expected, since the JCK's previous license terms conflicted with the GPL, but Sun went a step further and declared that only Java implementations "substantially based" on Sun's GPL implementation would get access to the JCK. In one sense this is a good thing, since people who patch Java, including those porting it to different platforms and architectures, can still get their work certified as Java. But it is also a big middle finger to every existing independent Java implementation, especially Harmony and Classpath. (Several projects trying to offer a free software JVM, like Kaffe and Japhar, are affected too.) Sun has delegitimized these other efforts, in effect saying, "They're not Java, they never were Java, and they never will be. There is only one place to get real Java, and it's from us."

The most important thing about Java in Sun's eyes has always been control: control of its brand, control of its implementation, and control of its uses. Java has never been nearly as open as some people would have you believe: the first version was licensed on frankly despicable terms, turning off a lot of interested developers who had been willing to look past the ubiquitous and insipid hype. It's also easy to forget that for several years, Java was crippled by poor performance and buggy internals. Microsoft's JVM was preferred not because it shipped with the OS (applets were already a dud) but because it was faster and more stable than Sun's own. Naturally Microsoft sought to embrace and extend Java into something it could control, and of course Sun fought back and eventually won, but the damage to Java's reputation had been done. The development and later uptake of C# matters here too, because C# offered something developers needed: a Java-like language that performed very well and hooked deeply into Windows.

Finally, Sun's obsessive need to control all things Java stunted its development. Sun doesn't have the resources to push Java along nearly as effectively as a group of interested developers passionate about the technology could. If Java had been GPLed in, say, 1997, it would be very different from what it is now. I'm not sure it would be the same thing at all, but I am sure it would be a better thing. With developer interest, a free software license, and years of time to research and experiment, we could have had things like a real native code compiler, streamlined and less bloated VMs, tighter class libraries, and myriad other pipe dreams we're probably never going to get now. The determination to add new language features like generics while preserving very strict backward compatibility (generics are implemented by type erasure so that existing code and bytecode keep working) has resulted in a clunky and difficult development environment. If the technology were controlled by a group that tried to do what was best for the technology instead of what was best for a tiny minority of its users, there would be at least a fighting chance of avoiding situations like this.

Sun has found a way to control Java by ostensibly setting it free, and it is doing so by using the GPL as a club. The GPL's copyleft was designed to preserve the Four Freedoms after the software leaves the hands of the original developer. Sun has turned it into a weapon against people who can't or won't develop Java according to its rules. Sun plans to "open" the Java community not by joining the existing ones but by creating a new one and marginalizing every other.

When I say "misusing the GPL", I mean that Sun is going against its spirit, not its letter. Sun has every right to take this action, and as a member of the FSF I find it hard to fault; it's what I've wanted for years. But in my time at the ASF, I've come to appreciate the work it does, and the ASF has a legitimate beef with Sun over Sun's conduct in the JCP and its habit of talking out of both sides of its mouth about the ASF's access to other TCKs. (See Geir's letter for more.) This, apparently, is Sun's response.

While the new Java may be a GPL world, there is reason to wonder why anyone at the ASF should spend further time working on Java. Sun has left the ASF behind for good, and until the licensing of the existing Java code is resolved, it will be harder to build a system that combines Sun and ASF code. At the same time, Classpath stands to benefit hugely from this, again at Harmony's expense: Classpath and Sun's Java are both under the GPL, so fixes can flow freely between them, while Harmony, under the ASL, cannot take in that GPL code.

I'll have to think about what this means for the ASF.

Thursday, August 16, 2007

Sun is misusing the GPL for Java: Part 1

Slashdot | Sun Lowers Barriers to Open-Source Java

I'm a proud member of the FSF as well as the ASF. For years, the incompatibility between the GPL and the ASL drove me nuts, as it did many other developers. Building a complete free software Java platform is a Herculean task, and the duplication of effort between Classpath and Harmony struck me as needless. Of course the developers involved are free to do as they please, and I happily support any effort producing free software, but from the practical standpoint of just wanting one to use, neither had achieved parity with the Sun implementation. As a Java developer who wanted to make a living, I needed a rock-solid, stable Java, and there just wasn't a free software option.

I don't believe there's any significant ethical difference between the FSF philosophy of free software and the ASF philosophy of open, community-driven collaborative development. I see them as two viewpoints on the same principle. Several developers work on projects for both organizations, and I know of no ethical conflict between doing so from anyone's perspective. However, the two organizations' different emphases, as expressed in their licenses, give rise to some annoying and probably unintended results.

Both the GPL and the ASL are free software licenses because they protect the Four Freedoms, but the GPL uses copyleft to prevent anyone else from restricting the Four Freedoms once the code leaves the copyright holder's control. The ASL doesn't care about that beyond some patent-related language that is not in and of itself a bad idea. The GPL is both a shield and a club: the shield protects the user from legal responsibility for the code and preserves the Four Freedoms for that user; the club is used to beat around the head and neck those who, having had the freedoms extended to them, would then seek to deny them to others. The GPL is the summation and distillation of everything the FSF believes in.

The ASL is emphatically NOT a distillation of everything the ASF believes. The ASF has tons of rules regarding how projects must be managed, handled, advanced, promoted, demoted, etc. These procedures are designed to ensure that any ASF project is developed in the open, that the community of users and developers can always be heard from, and that no project can ever be taken over by any individual or group hostile to the spirit of openness.

I think this is the key distinction between the two groups: the FSF uses the GPL to control its community; the ASF uses its culture and members to control its community.

I'm not trying to spark a debate about which is "better" or "more ethical"; again, these are different outgrowths of what I believe is the same thing, the love of what the FSF calls "free software" and what the ASF calls "open, collaborative software development". However, this doesn't mean that the different approaches work equally well in all possible situations.

Consider a software project whose copyright is held by the FSF. Anyone contributing to an FSF project must legally assign the copyright on their contributed code to the FSF; this is done because the GPL's copyleft is much easier to enforce if a single entity holds the copyright. The FSF cares about nothing more deeply than the Four Freedoms and will do anything it thinks is necessary to protect and defend them through the GPL. Unsurprisingly, many individuals who would like to contribute to an FSF-controlled project cannot do so because of this requirement. For example, a programmer's employer usually claims copyright on any code written while in its employ, sometimes even code written on the programmer's own time and equipment. Such a programmer may be unable to participate in the project through no fault of his own.

Also, consider a software fork; the canonical example here is the Emacs/XEmacs division. The GPL protects the right to fork, so XEmacs is just as legal as Emacs. Without getting into the history of this rather ugly story, the FSF and Lucid, both acting in good faith, were unable to reach consensus on the technical merits of Lucid's Emacs patches. The project forked, each branch developed its own community of users and developers, and the two went their separate ways. I'm not criticizing either program or either community; I only want to point out that the result of the dispute was a fork.

Contrast this with the ASF approach. ASF committers are not required to assign copyright in their work; they license it to the ASF instead. (ASF members, for their part, must be individuals, not companies.) Many contributors insist on keeping their own copyrights, and the ASF allows this. ASF projects are controlled by a project management committee, whose chair reports to the Apache Board and answers to it. Releases require a majority vote of the committee, with at least three votes in favor, while any code change can be blocked by a single justified veto. Because the ASF puts so much effort into its community, forks and major disputes are, in my experience, much less common than in FSF-controlled projects; when they do arise, they are resolved, not necessarily to everyone's satisfaction, but to the point that the project can continue. For getting things done, this approach has obvious advantages. Again, the ASF cares about open development for its own sake and is less concerned with guaranteeing the FSF's Four Freedoms.

So what does all of this have to do with Java? My next post will discuss that, as well as ripping Sun a new one for misusing the GPL.

Thursday, May 24, 2007

Shiny rails and testing

Man, Rails kicks ass.

After getting hold of Ruby a few months ago, I've been looking deeply into Rails. There are a couple of things at work that use it, and I'll probably wind up maintaining them if we don't replace them altogether.

There's little point in my talking about how great Rails is, since you can find Rails worship sites just by typing random letters into a Google search, but I am going to emphasize the thing that really kicks unseemly amounts of ass, the one thing that no one else has done nearly as well: testing.

Rails supports model (unit) testing and controller (functional) testing better than any other framework I know, including the one I support professionally. While unit tests have been around forever, there hasn't been a concerted push for a comprehensive functional testing strategy in Java. Part of the problem, of course, is that Java has more web frameworks than France has baguette shops, and several of them seem to make a point of being as different as possible from every other framework. Rails, being an all-in-one solution for your web needs, can offer integrated tests easily, but with the Java offerings you more or less have to reinvent the testing axle every time you reinvent the wheel.

Case in point: our framework includes Struts 1.3.8, which for purposes of this discussion means we support it and give people a number to call if it breaks. We also ship a sample application that uses our framework, and I volunteered to maintain/update/rewrite it for the next release. The guy who did the first version is no longer with the company, and he never bothered to write unit tests for his models or functional tests for his controllers. It's no indictment of the technology if a developer is too lazy to test properly, but once I'd filled in the missing model tests, I got to the Struts layer. I'm not willing to ship an app without all necessary test coverage, and controller routing and request processing fall within that box. (If you disagree with this, you are wrong.) So I poked around for some test help and came across StrutsTestCase, which seemed to be just what the doctor ordered. Unfortunately, it has apparently been abandoned: the latest release is three years old, and it doesn't even compile under Java 5. So I've been working on something similar for our needs.

All told, the googling, studying, problem solving, and coding I'll have done by the time this is finished will add up to a decent chunk of effort. Remember that Struts 1 is still the most widely used MVC web framework for Java, so one would expect to find solid testing support for it. Well, you won't, because there isn't any.

The comparison to Rails, where a test case for your controller can be generated with a simple script and the entire test written in minutes, reflects badly on the Java world. I can only conclude that Struts shops write their action mappings and then laboriously test each one by hand through a running browser. Besides being inefficient and boring, that takes forever, which is another way of saying that it won't get done. This is the kind of thing the computer should be doing for you, people. This isn't a Java vs. Ruby issue, and it's not even an issue particular to Struts; there is a general reluctance to spend coding effort automating tests that can and should be automated.
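To make the point concrete, here is a minimal Ruby sketch of an automated functional test that needs no browser at all. The ROUTES table and the dispatch method are hypothetical stand-ins for a framework's action mappings; they are not the Struts, StrutsTestCase, or Rails API. The tests use Minitest, which ships with modern Ruby.

```ruby
require "minitest/autorun"

# Hypothetical action mapping table, loosely in the spirit of struts-config.xml:
# request path => handler that returns the name of the forward to follow.
ROUTES = {
  "/login"  => ->(params) { params["user"].to_s.empty? ? "failure" : "success" },
  "/logout" => ->(_params) { "success" }
}.freeze

# Hypothetical dispatcher: look up the mapping and run it, or report a miss.
def dispatch(path, params = {})
  handler = ROUTES[path]
  return "not_found" unless handler
  handler.call(params)
end

# Functional tests: each one drives a request through the dispatcher and
# checks where it ends up, in milliseconds, with no browser involved.
class RoutingTest < Minitest::Test
  def test_login_with_user_forwards_to_success
    assert_equal "success", dispatch("/login", "user" => "alice")
  end

  def test_login_without_user_forwards_to_failure
    assert_equal "failure", dispatch("/login")
  end

  def test_logout_forwards_to_success
    assert_equal "success", dispatch("/logout")
  end

  def test_unknown_path_is_not_found
    assert_equal "not_found", dispatch("/nope")
  end
end
```

In a real Struts or Rails application the dispatcher would be the framework itself, driven through its own test harness; the sketch only shows that every mapping can get a test that the machine runs for you.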

We need to make managers aware of the need for automated functional testing so time for it gets built into schedules. We need to improve our testing tools so they'll be ready to handle new technologies. We must spend our time working on difficult problems, not fighting the machine.

Guess I'll start by writing a test library.

Saturday, April 7, 2007

Getting religion: XML

XML is out of control.

The insidious little creature has ingratiated itself with the Java standards committees, which themselves were long ago bought and sold to the unholy behemoths that stuff the latest JEE turd down our throats. Under their tutelage, XML has become the configuration file format of the 21st century; we can't get away from it in Java development. The two major build systems (Ant and Maven) use it. The entire JEE spec is predicated on it. Spring, which has always been billed as the solution to the mistakes of JEE, prides itself on its XML configuration. Hibernate, which despite its flaws is worlds better than raw JDBC access, requires us to do our ORM in glorious, repetitive XML.

This has got to stop.

XML is a markup language; that means it is supposed to contain human-understandable text and information about that text. It was designed to be flexible, so that adapting it to arbitrary formats is easy. Whichever side of the OOXML/ODF debate you're on, the fact that the document is represented in XML is a win for all developers.

That flexibility, which has been the key to XML's wild success, has also let eager Java beavers twist it into a general-purpose configuration language. Now, it's debatable whether an XML representation of, say, a properties file is any better than a simple key/value listing, but at least it's not worse. However, when you start mapping database table schemas to XML, inserting namespaces for different kinds of constructs, and attempting to integrate those configurations with other programs, you wind up in a nasty place quickly. On top of that, some people have even begun to hack procedural logic into XML (see the ant-contrib tasks and the JSP Standard Tag Library). Suddenly your XML has become a crappy approximation of a programming language. At that point, why aren't you using a programming language? As the XML gets hairier, the parser grows similarly hairy, just so you can map your XML into Java. But why are you trying so hard to keep your configuration in XML anyway? So it can be portable? (What other app is going to use Spring's application-context.xml?) So other languages can read it? (Ruby doesn't need a Hibernate XML file for ActiveRecord and never will.) Even if you do think of a good answer to that question, is it worth the ugly creeping horror that is your configuration parser?

If you need a programming language, don't be afraid to use one.

All right, so maybe Hibernate can't read the table metadata from the database for some reason (though I'm still not convinced that it shouldn't at least try). What's wrong with using Java to describe the schema instead of XML? It's going to wind up in Java anyway, and the extra level of indirection doesn't buy you anything. If you insist on having the XML, at least use Java (or Ruby, or something useful) to generate it; at that point, reconfiguration becomes the same as refactoring, and a good Java IDE is immensely helpful with that.
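As a rough illustration of generating the XML from code instead of hand-writing it, here is a Ruby sketch. The Column and Table structs, the USERS description, and the to_mapping_xml method are all hypothetical, and the output is only loosely modeled on a Hibernate mapping file; it is not a complete or validated one.

```ruby
# Hypothetical schema description written as plain Ruby data instead of XML.
Column = Struct.new(:name, :type, :primary_key)
Table  = Struct.new(:name, :entity, :columns)

USERS = Table.new("users", "User", [
  Column.new("id",    "long",   true),
  Column.new("login", "string", false),
  Column.new("email", "string", false)
])

# Generate an XML mapping from the Ruby description. Element names are loosely
# modeled on Hibernate's hbm.xml format; this is an illustration only.
def to_mapping_xml(table)
  lines = []
  lines << %(<hibernate-mapping>)
  lines << %(  <class name="#{table.entity}" table="#{table.name}">)
  table.columns.each do |col|
    tag = col.primary_key ? "id" : "property"
    lines << %(    <#{tag} name="#{col.name}" column="#{col.name}" type="#{col.type}"/>)
  end
  lines << %(  </class>)
  lines << %(</hibernate-mapping>)
  lines.join("\n")
end

puts to_mapping_xml(USERS)
```

With the schema held in code, renaming a column means editing one Ruby literal and regenerating, rather than hunting through hand-maintained XML.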

(Thankfully, Spring has finally started work on a Java configuration option, and it is tasty.)

Bottom line, guys and gals: the right tool for the right job. XML is not always the right tool, so don't use it when it isn't.

Friday, April 6, 2007

Getting religion: Testing

I've spent the past week at a Spring training session -- of which more later -- but the first thing I want to say has to do with testing.

We have maybe 20 people in here; most are on ancient Java tech, including Java 1.3 and raw JDBC, and Spring of course sounds compelling to them, since it hits right where they need the most help. The subject of transactions came up, and naturally our instructors pushed hard on three points: 1) transactions are necessary; 2) Spring makes them easy; 3) we should be using them. I was prepared for 2 and 3, but I thought 1 went without saying. Of COURSE you use transactions when you're doing database development. All kinds of crap can go wrong if your operations aren't atomic per logical unit of work rather than per connection or per statement (which is what JDBC's default auto-commit mode gives you). While I wasn't surprised to hear people say transactions were a pain to implement in Java, I was shocked to hear them say that because of that, they didn't bother with them, or, even worse, that they were unnecessary. One guy claimed to know his database "well enough" that the kind of conflict described as arising out of a failed transaction "shouldn't be allowed to happen".
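Here is a minimal sketch of what "atomic per logical unit of work" means, using a hypothetical in-memory Store in place of a real database. A transfer is two updates; if the second one fails, the first must be undone. The transaction method here just snapshots and restores a Hash. A real database does this with commit and rollback, and this is not Spring's or JDBC's API.

```ruby
# Hypothetical in-memory stand-in for a database: account name => balance.
class Store
  attr_reader :balances

  def initialize(balances)
    @balances = balances.dup
  end

  # Run the block as one logical unit: if anything inside raises, restore
  # the snapshot taken before it started, then re-raise the error.
  def transaction
    snapshot = @balances.dup
    yield
  rescue StandardError
    @balances = snapshot
    raise
  end

  def withdraw(account, amount)
    raise ArgumentError, "insufficient funds" if @balances.fetch(account) < amount
    @balances[account] -= amount
  end

  def deposit(account, amount)
    @balances[account] = @balances.fetch(account) + amount # KeyError if unknown
  end
end

# A transfer is correct only if both updates happen or neither does.
def transfer(store, from, to, amount)
  store.transaction do
    store.withdraw(from, amount)
    store.deposit(to, amount)
  end
end

store = Store.new("alice" => 100, "bob" => 50)
begin
  transfer(store, "alice", "carol", 30) # "carol" does not exist
rescue KeyError => e
  puts "transfer failed: #{e.message}"
end
p store.balances # alice still has 100: the withdrawal was rolled back
```

Without the transaction wrapper, alice would be left with 70 and the missing 30 would simply vanish, which is exactly the kind of bug users never report.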

I had to bite my tongue to keep from laying into this guy.

So the instructor followed up with, "Well, how do you know you aren't having any problems with your code due to not using transactions?" "Well, we haven't seen any problems and our users haven't reported any." Um, genius, if your web site doesn't work for users for no apparent reason, they're not going to use it. They're not going to compose a detailed error report and help you track down the bug; that's not their job. The web is not an expanded testing department. It's not my mom's job to help you find the bugs in your site. If a user can't use your site, they'll just leave it and not come back.

Some rant-ish points:

I don't care how well you think you know the database you're using; you're wrong. You don't know it completely; no one person does. No one person even wrote it, so it's ridiculous to say you understand it completely. And it's not just the database you need to worry about; it's also the JDBC driver, the VM, the native platform, the versions of each, and a hundred other factors. It is way bigger than you, and it's time to humble yourself to that; the days when a programmer understood the machine completely are long gone and aren't ever coming back.
One day your database may change. I guarantee you won't understand it nearly as well as you do your current one, and that's incomplete at best.
It is not okay to write some code, run it through a couple of use cases, fix the obvious errors, and declare it production-ready. Your job is not to show that your code might work given the right conditions; you must show that your code cannot break in the wrong conditions.
Your test cases (the programmer's, I mean, not the QA department's) must test not only for correct results but for sensible and well-defined behavior when given incorrect input. It doesn't just have to work; it also has to not break.
Treat your users with respect. They aren't programmers, and they don't get paid to use your code. You get paid to give your users an easy, non-confusing experience. Don't try to lazy your way out of it by saying that bulletproof code isn't worth your time -- you have no other function but to produce that bulletproof code.
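Here is one way the point about incorrect input might look in Ruby with Minitest. parse_quantity is a hypothetical input parser; the tests check both that good input gives the right answer and that bad input fails in one defined way (an ArgumentError) rather than producing garbage.

```ruby
require "minitest/autorun"

# Hypothetical input parser: accepts a positive whole number as text and
# raises ArgumentError for anything else instead of guessing.
def parse_quantity(text)
  value = Integer(text.to_s.strip, 10) # raises ArgumentError on non-numeric text
  raise ArgumentError, "quantity must be positive: #{value}" unless value.positive?
  value
end

class ParseQuantityTest < Minitest::Test
  # Correct input gives the correct result...
  def test_parses_plain_number
    assert_equal 3, parse_quantity("3")
  end

  def test_ignores_surrounding_whitespace
    assert_equal 12, parse_quantity("  12 ")
  end

  # ...and incorrect input fails in a well-defined way.
  def test_rejects_non_numeric_text
    assert_raises(ArgumentError) { parse_quantity("three") }
  end

  def test_rejects_zero_and_negatives
    assert_raises(ArgumentError) { parse_quantity("0") }
    assert_raises(ArgumentError) { parse_quantity("-5") }
  end

  def test_rejects_missing_input
    assert_raises(ArgumentError) { parse_quantity(nil) }
  end
end
```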

Remember this at all times:

This is computer science, not computer religion; we don't have faith that things work, we have proof that they do.

One of the bedrocks of scientific theory is the idea of falsifiability: a scientific claim must be capable of being shown false by some observation or experiment. (This is why creationism and intelligent design aren't science, since they can't meet this standard.) In this case, a falsifiable statement would be something like: "If this operation is interrupted by the user, the database update will fail." You then write a unit test that creates such an interruption and checks whether the update failed (either an exception was thrown or the data violates some integrity constraint).
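A sketch of that exact test, assuming a hypothetical in-memory Inventory class and a hypothetical Cancelled error standing in for the user's interruption. The test first asserts that the interrupted update raises, then asserts that the data is unchanged; if either assertion ever fails, the statement has been falsified.

```ruby
require "minitest/autorun"

# Hypothetical error a UI layer might raise when the user cancels.
class Cancelled < StandardError; end

# Hypothetical in-memory table that applies a batch of updates atomically.
class Inventory
  attr_reader :stock

  def initialize(stock)
    @stock = stock.dup
  end

  # Applies each change in turn, calling the optional block after every step.
  # Any error, including a cancellation, restores the original data.
  def update_all(changes)
    snapshot = @stock.dup
    changes.each do |item, delta|
      @stock[item] = @stock.fetch(item) + delta
      yield item if block_given?
    end
  rescue StandardError
    @stock = snapshot
    raise
  end
end

# Falsifiable statement: "If the update is interrupted by the user,
# it fails and leaves the data exactly as it was."
class InterruptedUpdateTest < Minitest::Test
  def test_interruption_fails_and_leaves_data_unchanged
    inventory = Inventory.new("bolts" => 10, "nuts" => 5)

    assert_raises(Cancelled) do
      inventory.update_all("bolts" => -3, "nuts" => -1) do |item|
        raise Cancelled if item == "bolts" # user cancels after the first step
      end
    end

    assert_equal({ "bolts" => 10, "nuts" => 5 }, inventory.stock)
  end
end
```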

A unit test is really a small proof about a falsifiable statement: when you assert something at the end of a test, you are claiming that the lines of your code form a list of statements that logically demonstrates the truth (or falsity) of that statement. This is exactly the same thing as a geometric proof, and that's not a coincidence. A good unit test exercises only one part of your class, just as a good proof tries to establish only one fact.

You know that your code is ready when it has been proven to not fail. The trick is figuring out all the ways your code might fail, and that's impossible except for the most trivial programs. However, you should be able to predict the vast majority of failure scenarios and prove through unit and integration testing that they will not break your code. If you're not doing that, you're not practicing computer science.

It's time to get religion about computer science.

Saturday, February 24, 2007

Elitism can suck it.

Slashdot | Raymond Knocks Fedora, Switches to Ubuntu

I'm not an open source "guru", "expert", or "leading voice" as ESR has been variously labeled. I think ESR has started believing his own press releases recently and become a creature of his own ego (which, let's admit, all programmers have more than the average share of). He's done some good things for open source, free software, and for the community of computer users at large. I own The Cathedral and the Bazaar, and I'm not rushing to the nearest used bookstore to get rid of it. However, he seems to think he still speaks from some high position of authority; he seems to believe he's above the petty and meaningless squabbles he thinks pervade the open source community, and therefore his opinions are well-considered and without bias when they're nothing of the sort.

I don't care if he switches to Ubuntu. I did it myself, and I couldn't be happier with it. I used Red Hat, and then Fedora, for 8 years, and it's where I learned how to use Linux. I'm not happy with the quality of recent Fedora Core releases, but they don't exist to please me. My personal preferences are not an objective set of criteria for evaluating the worthiness of a distribution, and neither are ESR's.

The fact that he now has a financial interest in Ubuntu/Linspire, and thus a conflict of interest in trashing Red Hat the company and promoting Canonical, turns this from an egomaniacal explosion into something akin to FUD.

The very things ESR has spent the past year decrying, elitism and a lack of concern for the users, are now his stock in trade. He thinks he's more important than the average hacker, and he's not. He thinks his opinions matter more than the average hacker's, and despite the mainstream media lying prostrate at his feet because of his remembered glory, they don't. He thinks he knows what users need. Only the users know what the users need, and the developer's job is to try his damnedest to give it to them.

Thursday, February 22, 2007

Back in touch with tech; or, How about a real programming language?

There's a singular joy in learning a new programming language, especially when you've been in the one- or one-and-a-half language rut for three years.

Over the past few days I tore through The Pragmatic Programmer, which is just as good as advertised. While much of the book was already obvious to me and had been part of my programming practice for a while, just as much was cleverly and brightly presented. "Wow, I can write my own code generators and source analyzers?" Not that it's beyond my ability to do so, but it had somehow never occurred to me to actually try it. Or I just never had a need.

Anyway, they also suggest keeping your "knowledge portfolio" up to date. It's become clear to me in the past few months that a lot of my knowledge is dating rapidly. A new language sounds like just the fix. While I'm at it, why not pick one that can help me write whizzy-bang code generators and analyzers?
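For a taste of the code-generator idea, here is a tiny Ruby sketch that turns a field list into Java getters and setters. The FIELDS table and the java_accessors method are hypothetical illustrations, not something taken from the book.

```ruby
# Hypothetical field list: Java field name => Java type.
FIELDS = { "firstName" => "String", "age" => "int" }.freeze

# Generate Java getter/setter boilerplate from the field list.
def java_accessors(fields)
  fields.map do |name, type|
    cap = name[0].upcase + name[1..-1] # firstName -> FirstName
    <<~JAVA
      public #{type} get#{cap}() { return #{name}; }
      public void set#{cap}(#{type} #{name}) { this.#{name} = #{name}; }
    JAVA
  end.join("\n")
end

puts java_accessors(FIELDS)
```

The same approach scales to anything repetitive: keep the description in one place and let the script write the boring part.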

I've done Perl in the past, but in my semi-regular check-ins I've become impatient with the interminable Perl 6 gestation period. I'm not really interested in spending a lot of time learning something experimental, and Perl 5 is more than a decade old at this point, with comparatively little change in that time. Perl is a really neat hack and a cool thing to work with, and I'm certainly not opposed to it, but as I find my own identity as an engineer and develop my own style, I feel that it's not the direction I want to go in technically.

Now you're saying, "How about Python?" Well, I looked. Cool community, neat ideas, huge install base, pretty mature. Yet something about it just sounds foreign to my ears. While my company does a lot of internal work in Python, little of it is in spaces that scratch my itches. I think I'll pass for now. Maybe one day I'll check it out again, possibly even learn it, but right now I'll focus on the other alternative: Ruby, the other language we use internally.

This Ruby thing is neat. It works for me because:

it's different enough from Java and C# and D to be a good mind-stretcher;
it kicks ass at text processing;
it's dynamically typed and purely object-oriented;
Rails is the new hotness in web architecting.

My Ruby education is just beginning, but already there are some ideas in it that have made me sit back and clap excitedly. Here's one neat nugget of exception handling:


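begin
  # ...attempt the SMTP greeting: EHLO when @esmtp is true, HELO otherwise...
rescue ProtocolError
  if @esmtp then
    @esmtp = false  # repair: drop back to plain SMTP
    retry           # run the begin block again from the top
  else
    raise           # nothing left to try: re-raise the error
  end
end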

See what that does there? rescue is an exception handler, like catch in the C++ family. But the retry keyword sends control back to the top of the begin block, so the failed operation runs again once the handler has repaired the cause of the error (here, by turning off ESMTP). Are you freaking kidding?! That kicks ass! That one little bit just blew my mind enough to make me completely interested in what Ruby has to say.
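Here is a self-contained version of the same pattern that actually runs. ProtocolError and greet are hypothetical stand-ins for a real SMTP connection: the simulated server refuses the extended greeting, so the first attempt fails, the handler turns off extended mode, and retry runs the begin block a second time.

```ruby
# Hypothetical error class standing in for a real SMTP protocol error.
class ProtocolError < StandardError; end

# Hypothetical server that refuses the extended (ESMTP) greeting.
def greet(extended)
  raise ProtocolError, "EHLO not supported" if extended
  "HELO accepted"
end

def connect
  extended = true
  attempts = 0
  begin
    attempts += 1
    reply = greet(extended)
  rescue ProtocolError
    if extended
      extended = false # repair the cause of the failure...
      retry            # ...and run the begin block again from the top
    else
      raise            # no fallback left: re-raise the original error
    end
  end
  [reply, attempts]
end

reply, attempts = connect
puts "#{reply} after #{attempts} attempt(s)" # prints "HELO accepted after 2 attempt(s)"
```

Note that the code before begin runs only once; retry re-enters the begin block, not the whole method, which is why the attempt counter keeps its value across the retry.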