Simplicity and Strictness

As a general rule, the stricter your application is, the simpler it is to write.

For example, imagine a program that accepts only the numbers 1 and 2 as input and rejects everything else. Even a tiny variation in the input, like adding a space before or after “1”, would cause the program to throw an error. That would be very “strict” and extremely simple to write. All you’d have to do is check, “Did they enter exactly 1 or exactly 2? If not, throw an error.”
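
Just to make that concrete, here’s a minimal sketch of that check in Python (the function name and the error message are just made up for illustration):

    def read_choice(text):
        # Strict: accept exactly "1" or exactly "2" and nothing else.
        if text in ("1", "2"):
            return int(text)
        raise ValueError("Enter exactly 1 or 2.")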

In most situations, though, such a program would be so strict as to be impractical. If the user doesn’t know the exact format you expect the input in, or if they accidentally hit the spacebar or some other key when entering a number, the program will frustrate them by not “doing what they mean.”

That’s a case where there is a trade-off between simplicity (strictness) and usability. Not all cases of strictness have that trade-off, but many do. If I allow the user to type in 1, One, or " 1" as input, that allows for a lot more user mistakes and makes life easier for them, but also adds code and complexity to my program. Less-strict programs often take more code than strict ones, and that extra code is where the complexity really comes from.
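
Here’s a rough sketch of where that leads, continuing the made-up example above. Every variation I decide to tolerate is another branch of code:

    WORDS = {"one": 1, "two": 2}

    def read_choice_lenient(text):
        # Less strict: forgive surrounding spaces and spelled-out numbers.
        cleaned = text.strip().lower()
        if cleaned in ("1", "2"):
            return int(cleaned)
        if cleaned in WORDS:
            return WORDS[cleaned]
        raise ValueError("Enter 1 or 2.")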

(By the way, if you’re writing frameworks or languages for programmers, one of the best things you can do is make this type of “non-strictness” as simple as possible, to eliminate the trade-off between usability and complexity, and let them have the best of both worlds.)

Of course, on the other side of things, if I allowed the user to type in O1n1e1 and still accepted that as “1”, that would just add needless complexity to my code. We have to be more strict than that.

Strictness is mostly about what input you allow, like the examples above. I suppose in some applications (like, say, a SOAP library), you could have output strictness, too–output that always conforms to a particular, exact standard. But usually, it’s about what input you accept and what input causes an error.

Probably the best-known strictness disaster is HTML. It wasn’t designed to be very strict in the beginning, and as it grew over the years, processing it became a nightmare for the designers of web browsers. Of course, it was eventually standardized, but by that time most of the HTML out there was pretty horrific, and still is. And because it wasn’t strict from the beginning, now nobody can break backwards compatibility and make it strict.

Some people argue that HTML is commonly used because it’s not strict. That the non-strictness of its design makes it popular. That if web browsers had always just thrown an error instead of accepting invalid HTML, somehow people would not have used HTML.

That is a patently ridiculous argument. Imagine a restaurant where the waiter could never say, “Oh, we don’t have that.” So I ask for a “fresh chicken salad”, and I get a live chicken, because that’s “the closest they have.” I would get pretty frustrated with that restaurant. Similarly, if I tell the web browser to do something, and instead of throwing an error it tries to guess what I meant, I get frustrated with the web browser. And now it can be pretty hard to figure out why my page “doesn’t look right.”

So why didn’t the browser just tell me I’d done something wrong, and make life easy for me? Well, because HTML is so un-strict that it’s impossible for the web browser to know that I have done something wrong! It just goes ahead and drops a live chicken on my table without any lettuce.

Granted, I know that at this point you can’t make HTML strict without “breaking the web.” My point is that we got into that situation because HTML wasn’t strict from the beginning. I’m not saying that it should suddenly become strict now, when it would be almost impossible. (Though there’s nothing wrong with slowly taking incremental steps in that direction.)

In general, I am strongly of the opinion that computers should never “guess” or “try to do their best” with input. That leads to a nightmare of complexity that can easily spiral out of control. The only good guessing is in things like Google’s spelling suggestions–where it offers you an option, but doesn’t just go ahead and act on the guess for you. This is an important part of what I mean by strictness–input is either right or wrong; it’s never a “maybe.” If one input could possibly have two meanings, then you should either present the user with a choice or throw an error.
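
For example, here’s a hypothetical sketch of that rule applied to a command-line tool that lets users abbreviate command names. When an abbreviation could mean two things, the strict response is to refuse to guess:

    COMMANDS = ["delete", "deploy", "describe"]

    def resolve_command(prefix):
        # Exactly one match: fine. Zero or several matches: report it,
        # never silently pick one.
        matches = [c for c in COMMANDS if c.startswith(prefix)]
        if len(matches) == 1:
            return matches[0]
        if not matches:
            raise ValueError("Unknown command: %r" % prefix)
        raise ValueError("Ambiguous command %r: could be %s"
                         % (prefix, ", ".join(matches)))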

I could go on about this all day–the world of computers is full of things that should have been strict from the beginning, and became ridiculously complex because they weren’t.

Now, some applications are forced to be non-strict. For example, anything that takes voice commands has to be pretty un-strict about how people talk, or it just won’t work at all. But those sorts of applications are the exception. Keyboards are very accurate input devices; mice are slightly less so, but still pretty good. You can require input from those to be in a certain format, as long as you aren’t making life too difficult for the user.

Of course, it’s still important to strive for usability–after all, computers are here to help humans do things. But you don’t necessarily have to accept every input under the sun just to be usable. All that does is get you into a maze of complexity, and good luck finding your way out–they never strictly standardized on any way to write maps for the maze. 🙂

-Max

13 Comments

  1. I think it would be very hard for a browser to gain market share if it was stricter than other browsers and simply refused to display content that other browsers would display, just because that content wasn’t 100% valid. Even if all browser vendors agreed to reject invalid content, it would be unlikely that all browsers would manage to determine with 100% certainty whether a given page was valid or not.

    I once had a mail account at a mail server that checked for compliance with all kinds of RFCs before accepting mail (e.g. it called back to the origin domain to see whether it had a valid postmaster@domain account). Many senders happened to use a mail server with some kind of (small) misconfiguration that prevented their mail from reaching me because of these checks. I complained to the senders and the administrators of their mail servers, but they usually didn’t care or didn’t know what to do about it. In terms of interoperability, this was a pretty bad solution. Of course, if all mail servers did exactly the same RFC checks, the problems would be resolved quickly, but I doubt that all mail server vendors or mail server administrators would spend CPU cycles on stuff like that.

    Personally, I subscribe to the “be liberal in what you accept, and conservative in what you send” strategy, but with the important addition that an implementation should report any errors it detects (e.g. in an error console like the one in Firefox) even if it is able to work around them. In this way the user can continue using the current version of the software, and the technically savvy user will be aware that there is a problem and can report the bug to the software vendor.
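
    A rough sketch of what I mean, with a made-up header parser in Python: work around the sloppy input, but still surface a warning about it.

        import warnings

        def parse_header(line):
            # Tolerate a missing space after the colon, but warn about it
            # so the problem stays visible and can be reported upstream.
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError("Malformed header: %r" % line)
            if not value.startswith(" "):
                warnings.warn("Header %r is missing the space after the colon" % name)
            return name.strip(), value.strip()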

    • Yes, the market share bit is absolutely true–look at Opera and older versions of iCab. And yes, you re-make my point–it is impossible to tell whether or not HTML is valid with any certainty because it lacks strictness (and also because the specs are very complicated, particularly when you throw in CSS). My point was that they all should have been strict from the beginning.

      As far as the mail servers go, I think you’re right to some degree, but coding with strictness does avoid those sorts of problems. Once again, the SMTP RFCs are so complicated in some aspects (and were probably, unfortunately, un-strict in certain ways in older editions) that errors like this are possible, and now people who receive mail have to account for them. (Processing the received emails suffers from a similar problem.)

      In the current environment we have with computers, for the usability reasons that I mention in the article, yes, it’s definitely important to be somewhat liberal with what you receive. But if you’re designing your own protocol, your own language, or something that works only internally to your own system, there’s nothing wrong with being as strict as possible in your own implementation so that you don’t end up being like Email, SMTP, HTML, Perl, etc.

      -Max

    • Oh, and I agree with the “throw an error about it even if you do accept it” bit–that’s a good point.

      -Max

    • That’s a pretty interesting article, and I think he’s made some good observations but come up with the wrong reasons behind them. He’s targeted sloppy standards as successful because of their sloppiness, which I’d disagree with. I think the standards he cites have been successful because they are simple, not because they are sloppy. SOAP, for example, is a nightmare because it’s just way too complex, not because it’s too strict. HTML is successful because it’s easy to understand, not because it’s sloppy. CSS is somewhat “sloppy” but is not as successful as it should be, due to the fact that it’s insanely complex to do certain things with it that should be very simple.

      So generally, I’d argue that the division, for popularity, is simple/complex rather than strict/sloppy.

      -Max

      • SOAP _is_ very simple (excluding SOAP section 5, which even Don Box says is a travesty) – it’s a very simple message passing system. The strictness of web services comes from the usual schema system – WSDL. In addition to the schema strictness, it turns out WSDL has a lot of complexity of its own.

        HTML definitely was successful due to the sloppiness – I remember when I first created a popular public website (early 1994), and if I’d been required to read a book (in 1994!!) or a spec on HTML, I’d have given up. Instead, I took a look at sun.com and guessed. I sometimes wish that web browsers, starting with the earliest versions of Cello and Mosaic, had alerted the developer to a coding error by highlighting the error in the View Source output – this would have tidied up a lot of the web without having to make HTML parsers strict.

        Look at true XHTML parsers – they are required to stop parsing at the first XML error, resulting in a cryptic error message to the user. The sloppiness of HTML allows a parser to carry on processing and give a “best attempt” at rendering.
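
        You can see the same contrast with Python’s standard-library parsers, for example: the strict XML parser gives up at the first error, while the lenient HTML parser just keeps going.

            import xml.etree.ElementTree as ET
            from html.parser import HTMLParser

            bad_markup = "<p>unclosed <b>tag</p>"

            try:
                ET.fromstring(bad_markup)   # strict: raises ParseError
            except ET.ParseError as err:
                print("XML parser gave up:", err)

            HTMLParser().feed(bad_markup)   # lenient: silently does its best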

        CSS has just the right amount of sloppiness, and it is successful because of it. Unfortunately, as you say, CSS is unable to do things that many designers want – which is why the abomination of the CSS3 Advanced Layouts module has been created. Plus, there is no modularization in HTML styling, so CSS tends to have unfortunate global effects where only local effects were intended (CSS block formatting context, anyone?).

        • Yeah, perhaps I’m more concerned about the complexity of WSDL than I am about SOAP itself. But if it’s anything about SOAP, yeah, it’s Section 5. 🙂

          Sure, if anybody had been required to read the HTML spec, they would have given up. But what you did to learn HTML is what I did, too–copy somebody else’s. If HTML had been strict from the beginning, that copied HTML would be strict and we’d be fine. If we wanted to do something that wasn’t in the original, we could go read the spec or a good, simple reference. This all works fine for Python. It works for almost every templating system I’ve used. I think it would have worked fine for HTML.

          My problem with CSS is that I think it’s somewhat overengineered, which leads to complexities in the design that don’t need to be there. Those complexities layer on top of the complexities and confusions of parsing HTML, creating implementations so complex that until we had something like Acid2 (written by one of the few people who actually understands the darn spec–and I am not one of those people), there were no actually compliant implementations.

          When nobody can easily create a compliant implementation of a spec, there’s definitely some problem relating to complexity, whether it’s unstrictness or something totally different. 🙂 Having an un-implementable spec is only slightly better than having no spec, and they both lead to implementation-defined languages, which are disastrous if there’s more than one implementation.

          -Max

  2. This WordPress page fails validation horribly. There are escaped quotes, missing quotes, and single quotes.

    Is there a good validator for Firefox (one that properly supports both HTML and XHTML) that displays the result briefly in the status bar?

    • Yeah, thanks for the note on validation. When you’re using WordPress, you’re sometimes at the mercy of plugins. I’m fairly sure that my actual base template validates, but there might be some things I have to fix in some of the plugins.

      I’m not aware of a FF extension for validation like that, although it would be a good idea, and I’d be surprised if it didn’t exist.

      -Max

  3. It seems like an SGML parser has been added to “Html Validator” without me knowing. So it works for XHTML now (mostly). Thanks for the hint.

    PS: If I try to leave a comment with JavaScript disabled, I get a misleading error message.

    • Ah, interesting. It’s almost certainly something related to Brian’s Threaded Comments. I could check it out if I have some free cycles some time. (I’m not sure how often people actually leave a comment with JS disabled, though.)

      -Max
