Language Features And Library Design

Over the past few years, I’ve come to believe that when learning programming languages, library design is one of the best ways to test your understanding of the language. Writing a library makes you more familiar with the ergonomics of the language and the mechanics of implementing a friendly user interface. It also helps you understand the limitations of a language, and what improvements can be made to it.

While it’s been time-consuming to maintain some of the libraries I’ve published, I do try to find applications that are small enough in scope and unlikely to change much over a long period of time to minimize the need for maintenance. Thanks to human nature, often anything that requires the consensus of more than three people fits the bill.

With Rust, I initially started with C library bindings, but I think the best example for my library thesis lies with rust-jwt, a JSON Web Token parsing library. If you’re unfamiliar with JWTs, you can read about them at jwt.io. If you don’t feel like reading that, you can just think of a JWT as a secure JSON payload accompanied by metadata. It has three parts: the header is the metadata, the payload (or claims) is any arbitrary serializable object, and the signature is the assurance of security.

I won’t really go too deeply into the implementation of the library, but I wanted to surface re-usable patterns in API design. To follow along, you should know that the token is represented by the following struct:

struct Token<H, C, S> {
    header: H,
    claims: C,
    signature: S,
}

Using Explicit Lifetimes, Sometimes

While enums are usually used to represent different states of the same logical entity, variants with different lifetime needs can create an additional burden for the end user. To see a concrete case, look at the problem of token verification.

When verifying a JWT, the end user often only cares for the payload. The header could be useful, but the signature exists only to guarantee the security of the header and payload. After verification, the signature is unneeded. If the user changes either the header or the payload, the signature would be invalid anyway. If the user changes neither of those, then the original token string should suffice for re-use.

From this we have two general observations:

Splitting a borrowed string is an operation that can be performed without allocations:

fn split_components(token: &str) -> Option<[&str; 3]> {
    let mut components = token.split('.');
    let header = components.next()?;
    let claims = components.next()?;
    let signature = components.next()?;

    Some([header, claims, signature])
}
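As a quick check of the claim above, the sketch below repeats the function so it compiles on its own and adds a stricter variant. The name `split_components_strict` is hypothetical and not part of rust-jwt; it only illustrates that the function as written does not reject tokens with extra segments.

```rust
/// Same function as above, repeated so this sketch is self-contained.
/// Every returned slice borrows from `token`; nothing is allocated.
fn split_components(token: &str) -> Option<[&str; 3]> {
    let mut components = token.split('.');
    let header = components.next()?;
    let claims = components.next()?;
    let signature = components.next()?;

    Some([header, claims, signature])
}

/// Hypothetical stricter variant (an assumption, not the library's API):
/// returns `None` if anything follows the third segment.
fn split_components_strict(token: &str) -> Option<[&str; 3]> {
    let mut components = token.split('.');
    let header = components.next()?;
    let claims = components.next()?;
    let signature = components.next()?;

    // A fourth segment means the input is not a three-part token.
    if components.next().is_some() {
        return None;
    }

    Some([header, claims, signature])
}
```

Because the slices point into the original string, the first slice starts at the same address as the token itself, which is an easy way to convince yourself that no copy was made.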

We now have a borrowed header, borrowed claims, and a borrowed signature. Since the header and payload have to be decoded from Base64 and then JSON, there is no way to avoid ownership and allocations. For the signature, however, there are a few options.

If we immediately verify the signature, the signature type only needs to signal that the header and claims are valid. Verified signatures are therefore represented by this empty struct:

struct Verified;

If for some reason the user doesn’t want to immediately verify the token, the parsing instead returns a signature of:

struct Unverified<'a> {
    header_str: &'a str,
    claims_str: &'a str,
    signature_str: &'a str,
}

Since the signature verification depends on the original header and claims strings, we need to keep them around. However, we don’t want to pay the cost of allocation in order to do so. Since this intermediary costs no allocations to maintain, we can re-use the code in the verification workflow at low cost:

let unverified: Token<H, C, Unverified<'_>> = Token::parse_unverified(&token_str)?;
unverified.verify_with_key(&key)

Since we only use explicit lifetimes for this one edge case (I hope you aren’t regularly using unverified tokens), we don’t want to pollute user code with them. If we had used an enum instead of separate types, we would be stuck carrying around the lifetime for verified tokens too:

enum Signature<'a> {
    Verified, // doesn't use the lifetime at all
    Unverified {
        header_str: &'a str,
        claims_str: &'a str,
        signature_str: &'a str,
    },
}

In using a parameter in the struct instead of an enum, we avoid ownership of the original token string, have the option to use explicit lifetimes when needed, and avoid burdening the end user with unnecessary implementation details made to save on allocations.
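To make the payoff concrete, here is a minimal sketch of the pattern under simplifying assumptions: the header and claims are plain `String`s rather than decoded JSON, and `verify_with` takes a closure as a stand-in for a real key. Both `verify_with` and `parse_and_verify` are hypothetical names for illustration, not the library's API.

```rust
// Simplified stand-ins for the types discussed above.
struct Token<H, C, S> {
    header: H,
    claims: C,
    signature: S,
}

struct Verified;

struct Unverified<'a> {
    header_str: &'a str,
    claims_str: &'a str,
    signature_str: &'a str,
}

impl<'a> Token<String, String, Unverified<'a>> {
    fn parse_unverified(token: &'a str) -> Option<Self> {
        let mut parts = token.split('.');
        let header_str = parts.next()?;
        let claims_str = parts.next()?;
        let signature_str = parts.next()?;
        Some(Token {
            // A real implementation would Base64- and JSON-decode here.
            header: header_str.to_owned(),
            claims: claims_str.to_owned(),
            signature: Unverified { header_str, claims_str, signature_str },
        })
    }

    /// Consumes the unverified token; `check` is a placeholder for a
    /// real signature check (assumption, not real cryptography).
    fn verify_with(
        self,
        check: impl Fn(&str, &str, &str) -> bool,
    ) -> Option<Token<String, String, Verified>> {
        let Unverified { header_str, claims_str, signature_str } = self.signature;
        if check(header_str, claims_str, signature_str) {
            Some(Token { header: self.header, claims: self.claims, signature: Verified })
        } else {
            None
        }
    }
}

/// The return type mentions no lifetime, so the verified token can
/// outlive the token string it was parsed from.
fn parse_and_verify(token: String) -> Option<Token<String, String, Verified>> {
    let unverified: Token<String, String, Unverified<'_>> =
        Token::parse_unverified(&token)?;
    unverified.verify_with(|_header, _claims, signature| signature == "good")
}
```

Note that `parse_and_verify` takes ownership of the token string and drops it before returning. With the enum version, the returned token would still be tied to that string's lifetime, so this function could not be written this way.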

Trait Inheritance Tax

Now for an example of where the language falls a bit short when designing a user-friendly interface. It’s also a justification for something that I personally see as somewhat unsightly.

There are multiple traits in the library that have a method called algorithm_type with exactly the same signature. I wish this were avoidable, but unfortunately it seems that this issue needs to be resolved before doing so.

The problem appears when we try to store keys inside any container that requires ownership over the key. If the keys are different types then we need to cast them into boxed trait objects.

An HS512-only key store could look like this:

let mut key_store = BTreeMap::new();
let key1: Hmac<Sha512> = Hmac::new_varkey(b"first")?;
let key2: Hmac<Sha512> = Hmac::new_varkey(b"second")?;
key_store.insert("first_key".to_owned(), key1);
key_store.insert("second_key".to_owned(), key2);

No extra boxing is necessary and the types are maintained so all the traits the keys implement are available.

If we add even just different key lengths, then we have to start boxing the keys:

let mut key_store = BTreeMap::new();
let key1: Hmac<Sha256> = Hmac::new_varkey(b"first")?;
let key2: Hmac<Sha512> = Hmac::new_varkey(b"second")?;
key_store.insert("first_key", Box::new(key1) as Box<dyn VerifyingAlgorithm>);
key_store.insert("second_key", Box::new(key2) as Box<dyn VerifyingAlgorithm>);

Now we have to implement traits on Box<dyn VerifyingAlgorithm> types. We can do this through AsRef for more flexibility:

impl<T: AsRef<dyn VerifyingAlgorithm>> VerifyingAlgorithm for T {
    fn algorithm_type(&self) -> AlgorithmType {
        self.as_ref().algorithm_type()
    }

    fn verify_bytes(&self, header: &str, claims: &str, signature: &[u8]) -> Result<bool, Error> {
        self.as_ref().verify_bytes(header, claims, signature)
    }
}

Suppose we decide to pull out the algorithm_type method into a different trait. It’s the same for both signing and verifying, so this seems logical.

pub trait HasAlgorithmType {
    fn algorithm_type(&self) -> AlgorithmType;
}

pub trait VerifyingAlgorithm: HasAlgorithmType {
    // ...
}

However, now the blanket implementation for AsRef is no longer valid. AsRef<dyn VerifyingAlgorithm> can only refer to one trait. Even though all VerifyingAlgorithm types have to implement HasAlgorithmType, we can’t specify that this is the case. The easiest workaround was to just redefine the method in each trait.
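A self-contained sketch of that workaround follows. Everything in it is a simplified assumption: `Toy256` and `Toy512` are hypothetical stand-ins for real HMAC keys and do no real cryptography, and `verify` takes a string signature and returns a `bool` instead of the `verify_bytes` signature shown earlier.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AlgorithmType {
    Hs256,
    Hs512,
}

// Each trait declares `algorithm_type` itself instead of sharing a supertrait.
pub trait SigningAlgorithm {
    fn algorithm_type(&self) -> AlgorithmType;
    fn sign(&self, header: &str, claims: &str) -> String;
}

pub trait VerifyingAlgorithm {
    fn algorithm_type(&self) -> AlgorithmType;
    fn verify(&self, header: &str, claims: &str, signature: &str) -> bool;
}

// The blanket impl stays valid because every method it forwards
// lives on the one trait named inside `AsRef<dyn ...>`.
impl<T: AsRef<dyn VerifyingAlgorithm>> VerifyingAlgorithm for T {
    fn algorithm_type(&self) -> AlgorithmType {
        self.as_ref().algorithm_type()
    }

    fn verify(&self, header: &str, claims: &str, signature: &str) -> bool {
        self.as_ref().verify(header, claims, signature)
    }
}

// Hypothetical toy keys: the "signature" is just the inputs joined with the secret.
pub struct Toy256(pub &'static str);
pub struct Toy512(pub &'static str);

impl SigningAlgorithm for Toy256 {
    fn algorithm_type(&self) -> AlgorithmType {
        AlgorithmType::Hs256
    }

    fn sign(&self, header: &str, claims: &str) -> String {
        format!("{}.{}.{}", header, claims, self.0)
    }
}

impl VerifyingAlgorithm for Toy256 {
    fn algorithm_type(&self) -> AlgorithmType {
        AlgorithmType::Hs256
    }

    fn verify(&self, header: &str, claims: &str, signature: &str) -> bool {
        signature == self.sign(header, claims)
    }
}

impl VerifyingAlgorithm for Toy512 {
    fn algorithm_type(&self) -> AlgorithmType {
        AlgorithmType::Hs512
    }

    fn verify(&self, header: &str, claims: &str, signature: &str) -> bool {
        signature == format!("{}.{}.{}", header, claims, self.0)
    }
}

/// A store holding keys of different concrete types behind trait objects.
pub fn key_store() -> BTreeMap<&'static str, Box<dyn VerifyingAlgorithm>> {
    let mut store: BTreeMap<&'static str, Box<dyn VerifyingAlgorithm>> = BTreeMap::new();
    store.insert("first_key", Box::new(Toy256("first")));
    store.insert("second_key", Box::new(Toy512("second")));
    store
}

/// Generic code can accept a boxed key thanks to the blanket impl.
pub fn verify_generic<A: VerifyingAlgorithm>(
    alg: &A,
    header: &str,
    claims: &str,
    signature: &str,
) -> bool {
    alg.verify(header, claims, signature)
}
```

One visible cost of the duplication: on a concrete type that implements both traits, such as `Toy256` here, a plain `key.algorithm_type()` call is ambiguous when both traits are in scope, so callers have to write `VerifyingAlgorithm::algorithm_type(&key)` or `SigningAlgorithm::algorithm_type(&key)`.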