I was recently directed to an article on "tiny types" – an approach to static typing which introduces distinct types for the sake of code clarity, rather than to add particular behaviour to each type. As I understand it, they’re like type aliases with no conversions between the various types. (Unlike plain aliases, an object is genuinely an instance of the relevant tiny type – it doesn’t have "alias erasure" as a language-based solution could easily do.)
I like the idea, and wish it were better supported in languages – but it led me to thinking more about the existing numeric types that we’ve got and how they’re used. In particular, I was considering how in C# the "byte" type is relatively rarely used as a number, with a useful magnitude which has arithmetic performed on it. That does happen, but more often it’s used either as part of other types (e.g. converting 4 bytes from a stream into an integer) or as a sequence of 8 bits.
It then struck me that the situations where we perform bitwise operations and the situations where we perform arithmetic operations are reasonably distinct. I wonder whether it would be worth having five separate types – which could be purely at the language level, if we wanted:
- Float32 – regular IEEE-754 32-bit binary floating point type with arithmetic operations but no bitwise operations
- Int32 – regular signed integer with arithmetic operations but no bitwise operations
- UInt32 – unsigned integer with arithmetic operations but no bitwise operations
- Bit32 – a bit sequence with bitwise operations but no arithmetic operations
- Identity32 – a 32-bit value which only defines equality
The last type would be used for identities which happened to consist of 32 bits but where the actual bit values were only useful in terms of comparison with other identities. (In this case, an Identity64 might be more useful in practice.)
Explicit conversions which preserved the underlying bit pattern would be available, so you could easily generate a sequence of Identity32 values by having a simple Int32 counter, for example.
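To make that concrete, here's a rough sketch of what such an equality-only type might look like as a wrapper class. I've written it in Java for illustration (the type name and factory method are hypothetical; a real language-level feature would avoid this boilerplate):

```java
// Hypothetical Identity32: wraps 32 bits but exposes only equality.
// No arithmetic, bitwise or ordering operations are defined on it.
public final class Identity32 {
    private final int bits;

    private Identity32(int bits) { this.bits = bits; }

    // Explicit conversion preserving the underlying bit pattern,
    // e.g. for generating identities from a simple int counter.
    public static Identity32 fromInt(int value) { return new Identity32(value); }

    @Override
    public boolean equals(Object o) {
        return o instanceof Identity32 && ((Identity32) o).bits == bits;
    }

    @Override
    public int hashCode() { return bits; }
}
```

Generating a sequence of identities is then just `Identity32.fromInt(counter++)`.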
At the same time, I’d want to introduce bitwise operations for Bit8 and Bit16 values, rather than the "everything is promoted to 32 bits" that we currently have in Java and C#, reducing the number of pointless casts for code that performs bitwise operations.
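For illustration, here's the kind of cast noise that promotion forces on byte-level code today, sketched in Java (the helper method is hypothetical):

```java
public class PromotionDemo {
    // In Java (and similarly in C#), operands of & and ~ narrower than
    // int are promoted to int, so the result must be cast back down.
    static byte clearMask(byte flags, byte mask) {
        // With a hypothetical Bit8 type, flags & ~mask would stay
        // 8 bits wide and need no cast at all.
        return (byte) (flags & ~mask);
    }
}
```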
The expected benefits would be the increased friction when using a type in an "unexpected" way for the value being represented – you’d need to explicitly cast to the right type – but maintaining a low-friction path when using a value in the expected way.
I haven’t yet figured out what I’d want a Stream.Read(…) call to use. Probably Bit32, but I’m not sure yet.
Anyway, I figured I’d just throw it out there as a thought experiment… any reactions?
22 thoughts on “How many 32-bit types might we want?”
I feel like the types as proposed don’t make much sense – they are very restrictive without being generally useful.
Consider e.g. an EmailAddress type which wraps a string; this is useful because subscripting is a ‘non-useful’ operation on email addresses.
A type like CustomerId which wraps an int has purpose (you don’t mix up IDs), but Identity32 requires explicit casting to create, and represents … what?
In general I’m a big fan of these ‘small’ types, but C# makes it difficult (consider how much code you have to write to implement a ‘mere’ wrapper around Int32). Haskell has language-level support for “strong-typedefs” with `newtype`, which is very useful.
I’d much rather have easier-to-create tinytypes than a set of predefined ones.
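For comparison, modern Java records already remove much of the boilerplate this comment describes, since they get value-based equality for free (the type names here are hypothetical):

```java
// One line each: value equality comes for free with records.
record CustomerId(int value) { }
record ProductId(int value) { }
// A CustomerId and a ProductId can no longer be mixed up, even though
// both wrap a plain int; arithmetic simply isn't defined on either.
```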
For numeric types, an alternate but interesting distinction is to separate ordinal and cardinal uses, and index them over element types, e.g. Index or Count. C(++) has some conception of this with ptrdiff_t, but with stronger languages we can extend this to things like Index + Count resulting in Index. (The definition of quicksort on Wikipedia typechecks nicely under this regime!)
> At the same time, I’d want to introduce bitwise operations for Bit8 and Bit16 values, rather than the “everything is promoted to 32 bits” that we currently have in Java and C#, reducing the number of pointless casts for code that performs bitwise operations.
+1, I do wish integer promotion weren't the default in C# :)
My initial reaction is that the most useful form of this would be for the language to provide the infrastructure for me to define my own domain tiny types. Core library code can't be expected to imagine every possible use case. Sure, making GetHashCode return an Identity32 has advantages, but letting me define my own FooIdentity32 in some simple, coherent fashion is vastly more useful.
This reminds me of a talk I saw at Hope about a library that created a type system separation between SQL strings and user provided data (and generalizations on that theme) as a security tool.
Modern compilers are so smart, there should be a general `int` type with no specific size for arithmetic operations. Its size would be determined by the runtime. The idea is that I should not have to (and often do not) care about the size of my ints.
There are two reasons why I would care about the size of my ints. The first is bitwise operations, already covered by your `Bit32` type. The second is arithmetic overflow, which is often an unrecoverable exception (when checked) or a hidden bug (when unchecked). The runtime should simply be able to choose the size of the integer such that its arithmetic operations are as fast as possible. And then, if overflow occurs, the runtime should handle this somehow. Perhaps similar to how you can’t overflow an integer in Python.
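Java's BigInteger is close to what's described here, at a library level rather than as the default integer type; a small sketch:

```java
import java.math.BigInteger;

public class NoOverflow {
    // BigInteger behaves like the integers the commenter describes:
    // storage grows as needed, so addition can never overflow,
    // much as with Python's int.
    static BigInteger increment(BigInteger n) {
        return n.add(BigInteger.ONE);
    }
}
```

The trade-off, of course, is that the runtime can no longer pick a single fast fixed-size representation.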
By the way, I really like tiny types. The fact that all my fields are expressed as `int` doesn’t mean they all contain the same data. Using tiny types I can’t make mistakes such as adding temperature to a length.
Pascal, and by inheritance Delphi, has had this feature for a long time.
type
  // nobody will be over 200, right?
  TAge = 0..200;
  // strong alias
  TAgeAlias = type TAge;
The problem that crops up is composition. You don’t want to have to have separate definitions for the same function that does the same logical thing to a multiplicity of different types.
The article you link to gives “dollars” as a passing example. How do you define your interest calculation function, if you want it to apply not only to dollars, but yen, euros, pounds, etc.?
For that specific problem, the correct answer is units-of-measure types like F# and Fortress. It can be approximated with templates / generics and a bit of hackery, but proper analysis is best achieved with a compiler that knows specifically about units.
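The "templates / generics and a bit of hackery" approximation can be sketched with a phantom type parameter; here's a Java version (all names hypothetical, and as the comment says, proper unit analysis really wants compiler support):

```java
// A phantom type parameter C tags the quantity with its currency.
public class Money<C> {
    final long cents;
    public Money(long cents) { this.cents = cents; }

    // Money<Dollars>.plus(Money<Yen>) will not compile;
    // same-currency addition does.
    public Money<C> plus(Money<C> other) { return new Money<>(cents + other.cents); }

    // One generic interest calculation works for every currency.
    public static <C> Money<C> addInterest(Money<C> principal, double rate) {
        return new Money<>(Math.round(principal.cents * (1 + rate)));
    }
}
class Dollars { }
class Yen { }
```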
For your example of Stream.Read, it's not clear whether you mean Bit32 in the input position, as the quantity of bytes to read, or as the return value, the number of bytes read. Either way, think about composition. Think about the operations you'd probably want in order to calculate that value / calculate with that value – typically figuring out how much is left in a buffer, how much to forward to another layer, and so on. The custom type you define would need to flow through the system, not just dynamically like Perl's idea of taint, but statically. The bigger the spread, the higher the chances you're going to run into conflicts between what different bits of code think the respective integer types should be.
You're better off with fewer general types, and generic algorithms that can compose them. For more complicated types, type functions that compose types out of other types are a better approach. You're already familiar with some type functions – ?/Nullable is one, as is the tuple (product) type – and generics are how you write new type functions in C# and Java. Unfortunately, composed types get very verbose in most non-Hindley-Milner type systems, as most don't have global type inference.
tl/dr: units-of-measure solve many of your issues and are preferable if available; custom types (possibly generic) are acceptable otherwise. But try and minimize the number of distinct types in your system to make it more composable with general, generic algorithms. Parameterize types instead.
F# has somewhat similar feature called Unit of Measures (http://en.wikibooks.org/wiki/F_Sharp_Programming/Units_of_Measure). There you can say, that this double is “kilo” and you can not assign “meter” to it and it will be statically checked.
I really want a similar facility in C++, a “hard” type alias ( type b acts exactly as type a, but is a distinct type, i.e. a redefinition with a compiler generated sub-name or something ).
Anyway, this is what proper Hungarian notation was originally designed for, so there's always been a recognised need for it.
This makes perfect sense.
I suppose comparison would only be implemented on Float32, Int32 and UInt32 ?
Yes, yes, we want this. As a programmer, I would want this through parametricity. Have a few "plain" bytearray types with a type alias you can do nothing(tm) at all with, apart from use as a type parameter. Say we have a nice way to have a type alias for a sized byte array:
type _32 = Byte
then we could have your types parameterised over it. Stream.Read should probably expose the raw bytearray alias type. If you want to do something with it afterwards, you are going to have to map it to the type you want to use it as. Interesting ideas Mr. Skeet!
I have done a lot of work in Delphi, and through its Pascal inheritance it already supports a great deal of what you are suggesting. I use specific types extensively for exactly the reasons suggested, but would immediately react to the "32" in all your types. Strict typing should hide the size of the storage for your type; the type name should reflect what the type is for.
In Delphi you can extend the functionality of a “descendant” type but one thing that is missing is the ability to hide functionality. I could define a type Identity as an ordinal type with a specified value range but I can’t hide the now inappropriate arithmetic operators (in particular the non-equality comparisons).
Yes, I would love it if languages had this. I’ve done this with typedef in C++ – it gives the expressiveness but unfortunately doesn’t enforce the no-crossing rules.
I could however see a couple of concessions to practicality, such as:
1) Identity32 might want to have less-than and greater-than comparisons so that values can be put into dictionaries and/or sorted (the order doesn't matter, but bringing identical values together might).
2) You might be able to assign an Identity32 from a UInt32 without a cast, so as to simplify creation of the Identity32 instances. But not the other way, as there’s no valid use case.
I created tiny types for a project that had "weeks" and "hours"; both were integers, and there was no consistency in the order they were passed to methods.
The tiny types, along with explicit conversions to and from int (but not between "weeks" and "hours"), took time to create.
I then changed one method to use them and fixed all the resulting compile errors, changing most methods to use them in the process. The next step was to get the arguments into a consistent order, with the compiler providing a 100% safety check while I was doing it.
I did not have one unit test fail, or any bugs introduced by this refactoring; if I had just swapped the arguments round, I would have introduced some. Before this refactoring, I often introduced bugs due to the inconsistent argument order.
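A sketch of the kind of types this comment describes, in Java (the names and the helper method are hypothetical):

```java
public class Scheduling {
    // Each tiny type wraps an int, with no conversion between the
    // two, so swapped arguments become a compile error, not a bug.
    record Weeks(int value) { }
    record Hours(int value) { }

    static int totalHours(Weeks weeks, Hours extra) {
        return weeks.value() * 7 * 24 + extra.value();
    }
    // totalHours(new Hours(4), new Weeks(1)) no longer compiles.
}
```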
Hmmm… Here’s a question to perhaps get people thinking differently:
What datatype would allow the programmer maximum ease of use, so that the programmer can just “get stuff done” with the least amount of thinking with regards to data types?
@-: That’s certainly a different question, but one I’m not sure *I’d* find helpful. I think that it’s important to think about data types, in terms of not just “getting stuff done” but “getting stuff done *correctly*”.
There are plentiful examples of situations where APIs, languages etc make it really easy to write code which is broken but not obviously so, simply because the data types have been chosen poorly.
I love the concept of TinyTypes, and it looks to me like what Hungarian notation always wanted to be. Static types have saved me so many times – at compile time instead of in production – and this appears to be a conceptually simple way to exploit them further.
As you’ve pointed out before (answer#3)-
I would warn everybody that they’re playing with fire to put Money into ‘double’ instead of a ‘decimal’ value, as the article casually mentions.
Yes, but not built in types. I would love this if we could easily define our own types. Define an “Identity” to be an int, but without arithmetic operators.
Define CustomerIdType that extends Identity, a CustomerIdType cannot be assigned to a ProductIdType.
Agree with previous posts that building a struct around these is far too involved and error prone. A language facility to do this declaratively would be a beautiful thing.
Tiny Types are great, and I’d love to have them in C#. Support by the CLR would allow language interoperability, and would ease DLL consumption.
On the other hand, languages like SQL have had this for many years, but I don't see it being used a lot. Language designers should study why that is.
Also, don’t forget that we have tiny types in C# since version 1.0, but only for integral types. They’re called enums…
What about Boolean32?
Why separate the arithmetic and bitwise operators? I'd like to use both on one data type. For example, if I want to decide whether some arithmetic result is odd, a bitwise operation on Int32 is useful. The current value-type concept makes perfect sense to me, and if you want to name types by their meaning (I'd be sad to see that in a code review), you can define your own with a using statement.
An equality-only Identity32 is a cool idea; however, what purpose would it serve apart from being a fully numeric type? If you are restricted to equality comparison, at some point you may need an ordering comparison – especially during a self join to find duplicates without finding them in both directions (on a.Name = b.Name and a.ID < b.ID).
I’ve been using this base class to sort of create TinyTypes in C#: https://gist.github.com/dhsto/9275577
Obviously it’s a bit more expensive to do this, but it’s proven to be extremely useful in situations where I really don’t want to mix anything up. It has also helped with code readability.
In general, I like this idea. But not having arithmetic on bitwise types is a severe restriction, as is not having bitwise operations on arithmetic types.
For example, addition and subtraction are very important in many operations on the "rightmost 1" (or 0); it would be a shame to have to cast back and forth just to implement something like "reset rightmost set bit".
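The classic example of that mix, sketched in Java:

```java
public class BitTricks {
    // Clearing the rightmost set bit mixes arithmetic and bitwise
    // operations: subtracting 1 flips the lowest set bit and the
    // zeros below it, so ANDing with the original clears just that bit.
    static int clearLowestSetBit(int x) {
        return x & (x - 1);
    }
}
```

Under the proposal, x would have to be cast from Bit32 to Int32 for the subtraction and back again for the AND.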