How many 32-bit types might we want?

January 30, 2014 jonskeet 22 Comments

I was recently directed to an article on "tiny types" – an approach to static typing which introduces distinct types for the sake of code clarity, rather than to add particular behaviour to each type. As I understand it, they’re like type aliases with no conversions between the various types. (Unlike plain aliases, an object is genuinely an instance of the relevant tiny type – it doesn’t have "alias erasure" as a language-based solution could easily do.)

I like the idea, and wish it were better supported in languages – but it led me to thinking more about the existing numeric types that we’ve got and how they’re used. In particular, I was considering how in C# the "byte" type is relatively rarely used as a number, with a useful magnitude which has arithmetic performed on it. That does happen, but more often it’s used either as part of other types (e.g. converting 4 bytes from a stream into an integer) or as a sequence of 8 bits.

It then struck me that the situations where we perform bitwise operations and the situations where we perform arithmetic operations are reasonably distinct. I wonder whether it would be worth having five separate types – which could be purely at the language level, if we wanted:

Float32 – regular IEEE-754 32-bit binary floating point type with arithmetic operations but no bitwise operations
Int32 – regular signed integer with arithmetic operations but no bitwise operations
UInt32 – unsigned integer with arithmetic operations but no bitwise operations
Bit32 – a bit sequence with bitwise operations but no arithmetic operations
Identity32 – a 32-bit value which only defines equality

The last type would be used for identities which happened to consist of 32 bits but where the actual bit values were only useful in terms of comparison with other identities. (In this case, an Identity64 might be more useful in practice.)

Explicit conversions which preserved the underlying bit pattern would be available, so you could easily generate a sequence of Identity32 values by having a simple Int32 counter, for example.
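
As a rough illustration of two of these types, here is a minimal Java sketch. It is only an approximation under stated assumptions: Java has no operator overloading, so methods such as and/or stand in for operators, and the names Bit32, Identity32, IdentityGenerator, fromInt32 and toInt32 are hypothetical rather than part of any real library.

```java
/** A 32-bit pattern offering bitwise operations only; no arithmetic methods are defined. */
final class Bit32 {
    private final int bits;

    Bit32(int bits) { this.bits = bits; }

    Bit32 and(Bit32 other) { return new Bit32(bits & other.bits); }
    Bit32 or(Bit32 other) { return new Bit32(bits | other.bits); }
    Bit32 shiftLeft(int count) { return new Bit32(bits << count); }

    /** Explicit conversion that preserves the underlying bit pattern. */
    int toInt32() { return bits; }
}

/** A 32-bit identity: the only meaningful operation is equality. */
final class Identity32 {
    private final int bits;

    private Identity32(int bits) { this.bits = bits; }

    /** Explicit, bit-preserving conversion from a plain int. */
    static Identity32 fromInt32(int value) { return new Identity32(value); }

    @Override
    public boolean equals(Object o) {
        return o instanceof Identity32 && ((Identity32) o).bits == bits;
    }

    // Java requires hashCode to be consistent with equals.
    @Override
    public int hashCode() { return bits; }
}

/** Produces a sequence of identities from a simple int counter. */
final class IdentityGenerator {
    private int counter = 0;

    Identity32 next() { return Identity32.fromInt32(counter++); }
}
```

Because Identity32 exposes no way back to an int and no ordering, callers can only compare identities with each other, which is the friction the proposal is after.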

At the same time, I’d want to introduce bitwise operations for Bit8 and Bit16 values, rather than the "everything is promoted to 32 bits" that we currently have in Java and C#, reducing the number of pointless casts for code that performs bitwise operations.
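
To make the promotion issue concrete, here is a small Java sketch. The first method shows the cast that the built-in byte type needs today; the hypothetical Bit8 class (not a real library type) shows how a dedicated type could keep results at 8 bits so that callers never write the cast themselves.

```java
final class PromotionDemo {
    /**
     * Sets the top bit of a byte. The expression value | 0x80 has type int,
     * because both operands are promoted, so the cast back to byte is required.
     */
    static byte setTopBit(byte value) {
        return (byte) (value | 0x80);
    }
}

/** Hypothetical 8-bit pattern whose bitwise operations stay 8 bits wide. */
final class Bit8 {
    private final byte bits;

    Bit8(byte bits) { this.bits = bits; }

    // The narrowing cast is written once here instead of at every call site.
    Bit8 or(Bit8 other) { return new Bit8((byte) (bits | other.bits)); }
    Bit8 and(Bit8 other) { return new Bit8((byte) (bits & other.bits)); }

    byte toByte() { return bits; }
}
```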

The expected benefit would be increased friction when using a type in an "unexpected" way for the value being represented – you’d need to explicitly cast to the right type – while maintaining a low-friction path when using a value in the expected way.

I haven’t yet figured out what I’d want a Stream.Read(…) call to use. Probably Bit32, but I’m not sure yet.

Anyway, I figured I’d just throw it out there as a thought experiment… any reactions?

22 thoughts on “How many 32-bit types might we want?”

porges says:

January 30, 2014 at 6:47 pm

I feel like the types as proposed don’t make much sense – they are very restrictive without being generally useful.

Consider e.g. an EmailAddress type which wraps a string; this is useful because subscripting is a ‘non-useful’ operation on email addresses.

A type like CustomerId which wraps an int has purpose (you don’t mix up IDs), but Identity32 requires explicit casting to create, and represents … what?

In general I’m a big fan of these ‘small’ types, but C# makes it difficult (consider how much code you have to write to implement a ‘mere’ wrapper around Int32). Haskell has language-level support for “strong-typedefs” with `newtype`, which is very useful.

I’d much rather have easier-to-create tinytypes than a set of predefined ones.

For numeric types, an alternate but interesting distinction is to separate ordinal and cardinal uses, and index them over element types, e.g. Index or Count. C(++) has some conception of this with ptrdiff_t, but with stronger languages we can extend this to things like Index + Count resulting in Index. (The definition of quicksort on Wikipedia typechecks nicely under this regime!)
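
One way to picture the ordinal/cardinal split is the Java sketch below. It is an assumption-laden illustration: the type parameter T is a phantom marker for the element type, and the method names plus and minus are hypothetical stand-ins for operators.

```java
/** A cardinal value: how many elements of type T. */
final class Count<T> {
    private final int value;

    Count(int value) { this.value = value; }

    int value() { return value; }
}

/** An ordinal value: a position within a sequence of T. */
final class Index<T> {
    private final int value;

    Index(int value) { this.value = value; }

    int value() { return value; }

    /** Index + Count results in Index. */
    Index<T> plus(Count<T> count) { return new Index<T>(value + count.value()); }

    /** Index - Index results in Count (the distance between two positions). */
    Count<T> minus(Index<T> other) { return new Count<T>(value - other.value); }
}
```

There is deliberately no method for adding two Index values, since that has no meaning for positions.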

> At the same time, I’d want to introduce bitwise operations for Bit8 and Bit16 values, rather than the “everything is promoted to 32 bits” that we currently have in Java and C#, reducing the number of pointless casts for code that performs bitwise operations.

+1, I do wish integer promotion was not the default in C# :)

Cheetah says:

January 30, 2014 at 7:19 pm

My initial reaction is that the most useful form of this would be for the language to provide the infrastructure for me to define my own domain tiny types. Core library code doesn’t seem like it can anticipate enough possible use cases. Sure, making GetHashCode return an Identity32 has advantages, but letting me define my own FooIdentity32 in some simple, coherent fashion is vastly more useful.

This reminds me of a talk I saw at Hope about a library that created a type system separation between SQL strings and user provided data (and generalizations on that theme) as a security tool.

Virtlink says:

January 30, 2014 at 8:07 pm

Modern compilers are so smart, there should be a general `int` type with no specific size for arithmetic operations. Its size would be determined by the runtime. The idea is that I should not have to (and often do not) care about the size of my ints.

There are two reasons why I would care about the size of my ints. The first is bitwise operations, already covered by your `Bit32` type. The second is arithmetic overflow, which is often an unrecoverable exception (when checked) or a hidden bug (when unchecked). The runtime should simply be able to choose the size of the integer such that its arithmetic operations are as fast as possible. And then, if overflow occurs, the runtime should handle this somehow. Perhaps similar to how you can’t overflow an integer in Python.

By the way, I really like tiny types. The fact that all my fields are expressed as `int` doesn’t mean they all contain the same data. Using tiny types I can’t make mistakes such as adding temperature to a length.

Barry Kelly says:

January 30, 2014 at 9:14 pm

Pascal, and by inheritance Delphi, has had this feature for a long time.

type
  // nobody will be over 200, right?
  TAge = 0..200;
  // strong alias
  TAgeAlias = type TAge;

The problem that crops up is composition. You don’t want to have to have separate definitions for the same function that does the same logical thing to a multiplicity of different types.

The article you link to gives “dollars” as a passing example. How do you define your interest calculation function, if you want it to apply not only to dollars, but yen, euros, pounds, etc.?

For that specific problem, the correct answer is units-of-measure types like those in F# and Fortress. It can be approximated with templates / generics and a bit of hackery, but proper analysis is best achieved with a compiler that knows specifically about units.

For your example of Stream.Read, it’s not clear whether you mean Bit32 in the input position as a quantity of bytes to read, or as the return value as number of bytes read. Either way, think about composition. Think about the operations you’d probably want to do in order to calculate that value / calculate with that value – typically, figuring out how much is left in a buffer, how much to forward to another layer, whatever. The custom type you define would need to flow through the system, not just dynamically like Perl’s idea of taint, but statically. The bigger the spread, the higher the chances you’re going to run into conflicts between what different bits of code think the respective integer types should be.

You’re better off with fewer general types, and generic algorithms that can compose them. For more complicated types, type functions that compose types out of other types are a better approach. You’re already familiar with some type functions – ?/Nullable is one, as is [], * – and generics are how you write new type functions in C# and Java. Unfortunately, composed types get very verbose in most non-Hindley-Milner type systems, as most don’t have global type inference.

tl;dr: units-of-measure solve many of your issues and are preferable if available; custom types (possibly generic) are acceptable otherwise. But try and minimize the number of distinct types in your system to make it more composable with general, generic algorithms. Parameterize types instead.
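
The "parameterize types instead" advice can be approximated with generics, as in this Java sketch. It is not a real units-of-measure system: Currency, Usd, Jpy, Money and Interest are all hypothetical names, amounts are held in minor units (such as cents) as a long, and the interest rate is given in basis points with integer division truncating the result.

```java
/** Marker interface for phantom currency types. */
interface Currency {}

final class Usd implements Currency {}
final class Jpy implements Currency {}

/** An amount of money in currency C, stored in minor units. */
final class Money<C extends Currency> {
    private final long minorUnits;

    Money(long minorUnits) { this.minorUnits = minorUnits; }

    long minorUnits() { return minorUnits; }

    /** Only amounts in the same currency can be added. */
    Money<C> plus(Money<C> other) { return new Money<C>(minorUnits + other.minorUnits); }
}

final class Interest {
    /** One definition of simple interest that works for every currency. */
    static <C extends Currency> Money<C> simpleInterest(Money<C> principal, int basisPoints) {
        return new Money<C>(principal.minorUnits() * basisPoints / 10_000);
    }
}
```

With this shape, adding a Money<Usd> to a Money<Jpy> is rejected at compile time, while the interest calculation is written once.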

Nikolay says:

January 30, 2014 at 11:18 pm

F# has a somewhat similar feature called Units of Measure (http://en.wikibooks.org/wiki/F_Sharp_Programming/Units_of_Measure). There you can say that this double is “kilo”, and you cannot assign “meter” to it; this is statically checked.

Adam Robinson says:

January 31, 2014 at 1:42 am

I really want a similar facility in C++, a “hard” type alias ( type b acts exactly as type a, but is a distinct type, i.e. a redefinition with a compiler generated sub-name or something ).

Anyway, this is what proper Hungarian notation was originally designed for, so there’s always been a recognised need for it.

Kawa says:

January 31, 2014 at 3:43 am

This makes perfect sense.

Samuel says:

January 31, 2014 at 4:28 am

I suppose comparison would only be implemented on Float32, Int32 and UInt32?

Martijn Hoekstra says:

January 31, 2014 at 4:57 am

Yes, yes, we want this. As a programmer, I would want this through parametricity. Have a few “plain” bytearray types with a type alias you can do nothing(tm) at all with, apart from use as a type parameter.

Say we have a nice way to have a type alias for a sized byte array:

type _32 = Byte[4]

then we could have your types parameterised:

Float[_32]
Int[_32]
UInt[_32]
Bit[_32]
Identity[_32]

Stream.Read should probably expose the raw bytearray alias type. If you want to do something with it afterwards, you are going to have to map it to the type you want to use. Interesting ideas Mr. Skeet!

Kanitatlan says:

January 31, 2014 at 9:57 am

I have done a lot of work in Delphi, and through its Pascal inheritance it already supports a great deal of what you are suggesting. I use specific types extensively for exactly the reasons suggested, but would immediately react to the “32” in all your types. Strict typing should hide the size of the storage for your type. The type name should reflect what the type is for.

In Delphi you can extend the functionality of a “descendant” type but one thing that is missing is the ability to hide functionality. I could define a type Identity as an ordinal type with a specified value range but I can’t hide the now inappropriate arithmetic operators (in particular the non-equality comparisons).

MV says:

January 31, 2014 at 2:54 pm

Yes, I would love it if languages had this. I’ve done this with typedef in C++ – it gives the expressiveness but unfortunately doesn’t enforce the no-crossing rules.

I could however see a couple of concessions to practicality, such as:

1) Identity32 might want to have less-than and greater-than comparisons so that they can be put into dictionaries and/or sorted (the order doesn’t matter, but bringing identical values together might matter).

2) You might be able to assign an Identity32 from a UInt32 without a cast, so as to simplify creation of the Identity32 instances. But not the other way, as there’s no valid use case.

ian says:

February 10, 2014 at 5:34 am

I created tiny types for a project that has “weeks” and “hours”; both were integers, and there was no consistency in the order they were passed to methods.

The tiny types along with explicit conversion to and from int (but not between “weeks” and “hours”) took time to create.

I then changed one method to use them and fixed all the resulting compile errors, changing most methods to use them in the process. The next step was to get the arguments in a consistent order, with the compiler providing a 100% safety check while I was doing it.

I did not have one unit test fail, or any bugs get introduced by this refactoring. If I had just swapped the arguments round, I would have introduced some bugs. I often introduced bugs due to the inconsistent argument order before doing this refactoring.
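
A refactoring along these lines might look like the following Java sketch. The names Weeks, Hours, Planner and totalHours are hypothetical, chosen only to illustrate the idea; the original project's code is not shown in this comment.

```java
/** A number of weeks, with explicit conversion to and from int. */
final class Weeks {
    private final int value;

    Weeks(int value) { this.value = value; }

    int toInt() { return value; }
}

/** A number of hours; there is deliberately no conversion to or from Weeks. */
final class Hours {
    private final int value;

    Hours(int value) { this.value = value; }

    int toInt() { return value; }
}

final class Planner {
    /** With two int parameters, swapping the arguments would compile silently. */
    static Hours totalHours(Weeks weeks, Hours hoursPerWeek) {
        return new Hours(weeks.toInt() * hoursPerWeek.toInt());
    }
}
```

A call such as Planner.totalHours(new Hours(40), new Weeks(3)) fails to compile, so the compiler flags every call site whose argument order is wrong.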

- says:

February 10, 2014 at 8:11 am

Hmmm… Here’s a question to perhaps get people thinking differently:

What datatype would allow the programmer maximum ease of use, so that the programmer can just “get stuff done” with the least amount of thinking with regards to data types?

skeet says:

February 10, 2014 at 8:14 am

@-: That’s certainly a different question, but one I’m not sure *I’d* find helpful. I think that it’s important to think about data types, in terms of not just “getting stuff done” but “getting stuff done *correctly*”.

There are plentiful examples of situations where APIs, languages etc make it really easy to write code which is broken but not obviously so, simply because the data types have been chosen poorly.

Al says:

February 10, 2014 at 10:16 am

I love the concept of TinyTypes, and it looks to me to be what Hungarian notation always wanted to be. Static types have saved me so many times, at compile time instead of in production, and this appears to be a conceptually simple way to exploit them further.

As you’ve pointed out before (answer #3):
http://www.yoda.arachsys.com/csharp/teasers-answers.html

I would warn everybody that they’re playing with fire to put Money into ‘double’ instead of a ‘decimal’ value, as the article casually mentions.

Binary Worrier says:

February 10, 2014 at 10:17 am

Yes, but not built in types. I would love this if we could easily define our own types. Define an “Identity” to be an int, but without arithmetic operators.
Define CustomerIdType that extends Identity, a CustomerIdType cannot be assigned to a ProductIdType.

Agree with previous posts that building a struct around these is far too involved and error prone. A language facility to do this declaratively would be a beautiful thing.
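
For comparison, this is roughly what the hand-written version looks like in Java today. It is a sketch only: Identity, CustomerId and ProductId are hypothetical names, and the base class compares the runtime class as well as the value so that identities of different kinds are never equal.

```java
/** Base for identity types: wraps an int but exposes equality only. */
abstract class Identity {
    private final int value;

    protected Identity(int value) { this.value = value; }

    @Override
    public final boolean equals(Object o) {
        // Different subclasses are never equal, even with the same underlying value.
        return o != null && o.getClass() == getClass() && ((Identity) o).value == value;
    }

    @Override
    public final int hashCode() { return value; }
}

final class CustomerId extends Identity {
    CustomerId(int value) { super(value); }
}

final class ProductId extends Identity {
    ProductId(int value) { super(value); }
}
```

Assigning a CustomerId to a ProductId variable is a compile-time error, but each new identity type still needs its own class declaration, which is the boilerplate a declarative language feature would remove.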

Kris Vandermotten says:

February 11, 2014 at 4:46 am

Tiny Types are great, and I’d love to have them in C#. Support by the CLR would allow language interoperability, and would ease DLL consumption.

On the other hand, languages like SQL have had this for many years, but I don’t see it being used a lot. Language designers should study why that is.

Also, don’t forget that we have had tiny types in C# since version 1.0, but only for integral types. They’re called enums…

Kudos says:

February 12, 2014 at 3:25 am

What about Boolean32?

Aik says:

February 18, 2014 at 7:11 am

Why separate arithmetic and logical operators? I would like to use both on one data type. For example, if I want to decide whether some arithmetic result is odd, then a bitwise operation on an Int32 is useful. The current value types concept makes perfect sense to me, and if you want to name the types by meaning (I will be sad in code review), then you can define your own with a using alias.

Joseph N. Musser II says:

February 19, 2014 at 9:15 am

An equality-only Identity32 is a cool idea; however, what purpose would it serve apart from a fully numeric type? If you are restricted to equality comparison, at some point you may need to do a value comparison, especially during a self join to find duplicates without finding them in both directions (on a.Name = b.Name and a.ID < b.ID).

David Sherret says:

February 28, 2014 at 11:34 am

I’ve been using this base class to sort of create TinyTypes in C#: https://gist.github.com/dhsto/9275577

Obviously it’s a bit more expensive to do this, but it’s proven to be extremely useful in situations where I really don’t want to mix anything up. It has also helped with code readability.

Harold says:

March 7, 2014 at 4:15 am

In general, I like this idea. But not having arithmetic on bitwise types is a severe restriction, as is not having bitwise operations on arithmetic types.
For example, addition and subtraction are very important in many operations on a “rightmost 1” (or 0), it would be a shame to have to cast back and forth just to make something like “reset rightmost set bit”.
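
The well-known bit tricks referred to here combine both kinds of operation in a single expression, as this small Java sketch on plain int values shows. The class and method names are hypothetical; the expressions themselves are the standard idioms.

```java
final class BitTricks {
    /** Clears the lowest set bit: subtraction followed by a bitwise AND. */
    static int resetRightmostSetBit(int x) {
        return x & (x - 1);
    }

    /** Keeps only the lowest set bit: arithmetic negation followed by a bitwise AND. */
    static int isolateRightmostSetBit(int x) {
        return x & -x;
    }
}
```

Under a strict split between Int32 and Bit32, each of these one-liners would need a conversion in each direction.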
