En Garde – Untapped Type Power

I was teaching an introductory Java class the other day. We were covering some of the basic methods available to handle some common tasks. Among the items I showed them was the Integer.valueOf() method. In Java, valueOf() simply takes a String representation of an integer and returns the strongly-type value of that integer. I wanted my students to know it existed, and I, of course, pointed out a common pitfall. If you pass valueOf() a String that is not a valid representation of an integer, you will see a runtime exception. I paused after I pointed it out in class for a second. I asked, “Why on earth are we allowed to pass in a String value that doesn’t represent an integer in the first place? I have no idea.”

I don’t think my students quite took my meaning, but I do think the question is valid. Java and C# style object oriented programming (OOP) includes a way to make sure function (or method) parameters meet requirements: the type system. We’re told we should use it from the time we’re beginners at these languages.

So why does almost no one actually follow the best practices for preventing runtime failures at compile time?

Backing Up a Step

The problem we’re addressing here is a simple one. How can we force people using a function (or method) to respect our parameter requirements? When writing a function (or method and from here on I’ll [try to] just say function) we want to make some assumptions about our parameters, but it can be tough to decide how to handle the situation where the parameters provided are wrong. What can we do?

  • In the olden days, we’d maybe put a note in the documentation for our function and it’d be the caller’s responsibility to make sure they met the requirements; “Not my job.”. If they missed our note or ignored it, then the outcome would be “undetermined” which would usually mean some sort of spectacular crash and errors bringing our whole system down. Even worse, we could be creating a ticking time bomb of a security hole waiting to be exploited.
  • The next step was to at least add exceptions to our languages so the caller would have a chance to catch the error when they don’t follow the rules, but that wasn’t really solving the problem, the caller could ignore the exceptions*, and it could leave unclear the source of the problem.
  • We can add guard clauses in our method and throw an exception that’s at least meaningful and also to be 100% sure my code will only execute on values that meet my assumptions. But that’s still not totally helpful. We’re not forcing the caller to use correct parameters, and we have no idea if they’re actually handling the exception we’re throwing back at them. Exceptions are for truly exceptional situations, and “bad input” happens all the time, right?

I find none of these solutions to be satisfactory. Again, what would be ideal is getting the caller to pre-validate the parameters before using our function. I want to only pass a String value to valueOf() if I know it’s a proper integer representation.

Functional Inspiration

For a proper solution, we can look to functional programming. In functional programming (at least the ML/F# style of functional programming I’m most familiar with [and not at all a expert in]), we’d have access to a flexible type system as well as powerful pattern matching. Now we can simply define a new type that expresses our expectations, and give a function to weed out the pieces that don’t meet them. Then anyone using our function knows they have to get that type we defined first, and pattern matching makes it very easy to decide what to do when the input isn’t valid. Boom, now we know every case is handle and that our code will execute as expected.

In F#, a possible type to solve our Integer.valueOf() problem from my Java class would be:

type IntConvertableString =
| Invalid of value : string * reason : string
| Valid of string

We split the possibilities along our function’s needs. Then we just need a function that can take a “regular” string and return an IntConvertableString.

let result value = CheckConvertible(value)
| Invalid -> 0
| Valid v -> Convert(v)

The Solution – Purposeful Types

Here is the kicker. Java and C# style OOP can already do this, too. Their types systems aren’t quite as flexible as F#, but they’re definitely powerful enough.

public abstract class IntConvertableString
{
    internal IntConvertableString(string value)
    {
        this.value = value;
    }
    private readonly string value;
    public string Value { get { return this.value; } }

    public sealed class Valid : IntConvertableString
    {
        internal Valid(string value) : base(value) { }
    }

    public sealed class Invalid : IntConvertableString
    {
        internal Invalid(string value, string message) : base(value) {
            this.message = message;
        }
        private readonly string message;
        public string Message { get { return this.message; } }
    }
}

There we go! Not quite as terse as F#, right? But it expresses the same idea. We use internal constructors to make sure (or kind-of sure) that only this class can instantiate these objects. I could put the Valid and Invalid classes in their own files, but when I put them like this, when we use them we get a nice indicator of where the Validate() function might live.

public static int ConvertToInt(IntConvertableString.Valid s) {
    return int.Parse(s.Value);
}

And here is a simple validator for this particular problem, which we put in the IntConvertableString class:

public static IntConvertableString Validate(string value)
{
    if (value.Any((c) => !Char.IsNumber(c)))
    {
        return new Invalid(value, "All characters of the string must be numbers!");
    }
    return new Valid(value);
}

And here is how we can actually use our new process (seen here in a test method):

public void ConvertWithType_works_with_valid_string()
{
    var someInput = "105012";
    var result = IntConvertableString.Validate(someInput);
    if (result.GetType() == typeof(IntConvertableString.Valid))
    {
        var val = Examples.ConvertToInt((IntConvertableString.Valid)result);
        // now val has a type of int and a value of 105012 - ready to use!
    } else {
        // here we have a natural place to handle bad input
    }
}

Notice the GetType() call and cast to the property type when using ConvertToInt(). It’s there that we force our caller to make an effort to pre-validate the input.

Downsides

You could argue this looks suspiciously like a try/catch block, and I’d agree. I think that’s unavoidable. Even the functional F# equivalent using a match expression is vaguely try/catch-ish. I don’t think, however, that that means we should just fall back to just using an ordinary try/catch. I think the if/else program flow is much more explicit about what is going on, and is easier to read and follow than try/catch. The if/then certainly doesn’t involve any bizarre jumps around that a finally block would involve. We also don’t have any chance of catching an error we don’t mean to. Finally, try/catch was always explained to me to be meant to catch truly exceptional cases, things that are not ordinary. Poorly formatted input is a very ordinary issue that should be handled without exceptions.

You could argue that we’re “wasting” resources instantiating our validation objects and doing a type check. I don’t find this a reasonable objection, since doing things the “right” way would involve a try/catch block with exception objects and stack unwinding, which are not resource-efficient operations. I think this solution would be better or no worse. I certainly would not prematurely optimize around it.

One real problem is that the caller could still do an unchecked cast to the valid type and pass it in. This can (obviously) cause a failure at runtime. This is unavoidable in Java or C#. But, we are at lease forcing the caller to make that conscious decision to improperly use the pattern.

This pattern doesn’t prevent mistakes from getting into my code completely. If I setup the validator incorrectly, then my code can still bomb out. We still need to write unit tests. But at least now we can test the validator code separately from the actual execution code. To me that’s actually a plus.

It does seem more indirect and confusing. There is a disconnect on where the validator is supposed to come from if you’re not aware of the pattern, but I think the parameter type helps know where to look. There is no language / tool support for this pattern.

Null values still need to be guarded against (more on that in a different post).

Upsides

The caller has no excuses for not handling the negative case where parameters are incorrect. They have to make a very purposeful bad coding decision to avoid dealing with the invalid value.

We have no excess guard clauses in our function code!  ConvertToInt() does one thing and one thing only!

We can use the validator function in other parts of the code base, as a cross-cutting concern for UI, web services input, or whatever.

We avoid exceptions altogether! I do not like exceptions; at least not for ordinary code like this.

Not Just Conversion

Our example here is converting a string to a typed integer value, but there are many, many more. The humble division operator is one that’s ripe to have this pattern applied to avoid all those nasty DivideByZero exceptions.  At the fundamental utility-function level, this sort of explicit case-handling is really important, as these bugs can creep up so quickly, and slip by without notice.

Even getting into domain modeling, the potential for security errors still abound, as this XKCD comic demonstrates with a person’s name unexpectedly containing punctuation (which wouldn’t happen with a PersonName validated type).

More on Wrapper Types

We’ve also always been meant to do this. I remember hearing over and over in my introductions to OOP meant years ago a phrase similar to this one: “When designing your real classes, you’ll want to create wrappers for the primitive types in case you want to change your underlying implementation later on. But since this is an introduction, we’re not going to do that.” We’re actually not supposed to use int, double, char, or String. I remember reading and hearing this many times (and I apologize that my memory is bad enough that I can’t find a reference).

Eric Lippert (and I swear this isn’t the only blog I read) has been posting about an OCAML ZORK project he’s working on, and he’s careful to always wrap his primitive value types in user-defined types to avoid confusion and conflicts in function parameters. Even in a functional language we’re supposed to wrap our types!

Yet no one in OOP-land does this. Not even the Java or C# libraries do this. They expose primitive value types in classes all over the place. They use unverified Strings as input to functions.

The reason usually given for needing type wrappers was (as above), “in case you want to change your underlying implementation type later on.”  We need them to make refactoring easier.  This might make sense for, say, Student IDs or something where we could change from a String value to an actual Identifier class later on, but it’s less obvious that my valueOf() method is going to change its implementation at a future date. I think a better reason to use wrapper types whenever possible is as a remedy for this guarding issue.

The code above, while pretty clear (in my opinion), and relatively self-contained is still bulky compared to not validating the input, and it also isn’t directly supported by the language and tooling. It’s at best a programming pattern, meaning it’s something that’s ignorable.

Conclusion

If I’ve convinced you at all that this is the proper way to handle this situation, then you, now, too, are asking that same question from the start of the article:

Why does almost no one actually follow the best practices for preventing runtime failures at compile time?

People follow this pattern almost unfailingly in functional languages. I think the brevity of the type system makes it seem downright silly not to follow it. The Java and C# OOP style is just too heavy, and the types feel too big to be used for such a small, simple task a validation. It may also be that Java and C# steered a little too close to their non-OOP roots in C; the implementors and early users had trouble kicking old habits. That’s my guess, but I certainly wasn’t around during the early days.

For my Java students, it’s worrisome, since they’re probably going to pick up the same habits every other Java programmer has and be very lax in checking inputs, they’re not going to use the full power of the type system, and they aren’t going to realize just how awful null values are until one or two giant production bugs pop up as a result of it. The end result is more buggy software, more security holes, and I feel it’s for no good reason.

The solution is right there, waiting for us to use it.

 

One thought on “En Garde – Untapped Type Power

  1. When it comes to atoi-type stuff, I tend to roll my own. I find that less work than reading the manual long and hard enough to gain a comprehensive understanding of what the library method’s policies might be for everything from leading spaces to digit separators to exponential notation. If my source of allegedly numeric strings is interactive input, I syntax check literally following every keystroke (or paste).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s