Category Archives: Wacky Ideas

Displaying NDI sources on Stream Decks

In the course of my work on our local church A/V system, I’ve spent quite a lot of time playing with Elgato Stream Decks and NDI cameras. It only occurred to me a week or so ago that it would be fun to combine them.

The Stream Deck screens are remarkably capable – they’re 96×96 or 72×72 pixels (depending on the model) and appear to have a decent refresh rate. I tend to think of them just in terms of displaying simple icons or text, but I’d already seen a video of a Stream Deck displaying a video so it wasn’t too much of a leap to display live camera outputs instead.

I should make it clear that I had absolutely no practical reason for doing this – although as it happens, the local tweaks I’ve made to the C# NDI SDK have proved useful in At Your Service for enabling a preview camera view without opening a second NDI network stream.

Tweaking the NDI SDK

The NDI SDK comes with a demo C# sample library. It’s not clear how much this is intended to be for production use, but we’ve been using it in At Your Service for quite a while with very few issues. (We’re pinned at a slightly old version of the runtime due to some smoothness issues with later versions, which I’m hoping the NDI folks will sort out at some point – I’ve given them a diagnostic program to show the issues.)

The SDK comes with a WPF “receive view” which displays a camera in a WPF control, as you’d expect. Now I don’t know enough about WPF to try to use the Stream Deck as an output device for WPF directly. It may be feasible, but it’s beyond my current skillset. However, I was able to reuse parts of the receive view to create a new “display-agnostic” class, effectively just dealing with the fiddly bits of the NDI interop layer and providing video frames (already marshalled to the dispatcher thread) as raw data via an event handler. Multiple consumers can then subscribe to that event and process the frames as they wish – in this case, drawing them to the Stream Deck, but potentially delivering them to multiple displays at the same time.

The simplest image processing imaginable

So, the frames are being delivered to my app in a raw 4-bytes-per-pixel format. How do we draw them on the Stream Deck? I’ve been using StreamDeckSharp which is has been pretty simple and reliable. It has multiple ways of drawing on buttons, but the one we’re interested in here is passing in raw byte arrays in a 3-bytes-per-pixel format. (As it happens, the Stream Deck SDK then needs to encode each frame on each button as a JPEG. It’s a pity that the data can’t be transferred to the Stream Deck in raw form, but hey.) All we need to do is scale the image appropriately.

Again, there may be better ways of doing this, such as asking WPF to write the full frame to a Graphics and performing appropriate resizing along the way. But I figured I’d start off with something simpler – and it turned out to be perfectly adequate. I’m not massively concerned with obtaining the absolute best quality image possible, so rather than using all the pixels and blending them appropriately, I’m just taking a sampling approach: if I’m displaying the original 1920×1080 image on a 96×96 button, I’ll just take 96×96 pixel values with as simple code as I could work out.

In the end, I decided to stick to integer scaling factors and just center the image. So to take the above example, 1080/96 is 11.25, so I take samples that are 11 pixels apart in the original image, trimming the top/bottom borders a little bit (because of the truncation from 11.25 to 11) and the left/right borders a lot (because the buttons are square, and it’s better to crop the image than to letter-box it). What happens if you’ve got an image which is just white dots that are 11 pixels apart, and black everywhere else? You end up with a white image (assuming the dots are aligned with the sampling) – but in a camera image, that doesn’t really happen.

The maths ends up being slightly more fiddly when displaying a frame across several buttons. There are are gaps between the buttons, and while we could just ignore them, it looks very weird if you do – if a straight line crosses multiple buttons, it ends up looking “broken” due to the gap, for example. Instead, if we take the gaps into account as if we were drawing them, it looks fine. The arithmetic here isn’t actually hard in any sense of the word – but it still took me a few goes to get right.

Putting it together

Of course, Stream Deck buttons aren’t just screens: they’re pressable buttons as well. I haven’t gone to down on the user interface here: the left-most column just buttons just displays the NDI sources, and pressing one of those buttons displays that source on the rest of the buttons. That part was really simple – as you’d probably expect it to be.

The initial prototype took about two or three hours to write (bearing in mind I already had experience with both the NDI code and the Stream Deck SDK), and then it took another couple of hours to polish it up a bit and get it ready for publication. The application code is all available on my DemoCode GitHub repository; unfortunately even if you happen to have an NDI camera and a Stream Deck, you won’t be able to use it without my tweaks to the NDI C# code. (If the C# part of the NDI SDK is ever made open source, I’ll happily fork it and include my tweaks there.)

I’ve put a YouTube video of the results. Hope you enjoy them. I’ve had a lot of fun with this project, despite the lack of practical applications for it.

Ultimate Man Cave: voice automation for my shed

Source code for everything is on Github. It probably won’t be useful to you unless you’ve got very similar hardware to mine, but you may want to just have a look.

Background

Near the end of 2015, we had a new shed built at the back of our garden. The term “shed” is downplaying it somewhat – it’s a garden building, about 7m x 2.5m, with heating, lighting and an ethernet connection from the house.

It’s divided in half, with one half being a normal shed (lawnmower, wheelbarrow, tools etc) and one half being my office for working from home. Both sides are also used for general storage – we have a lot of stuff to sort out from a loft conversion a few years ago.

The shed

It only took about three days of using the shed for me to work out that I wanted remote-controlled lighting. If I’m going out there at 6.30am in winter, it’s pretty dark – so it’s really useful to be able to turn the lights on from the house first, so I can negotiate the muddier bits of the garden, see the keyhole to unlock it etc.

After a little research, this turned out to be pretty easy: MiLight is simple and relatively cheap. The equivalent of $100 got me four lights and a wifi controller box. It only took me a few minutes to configure it to talk to my wifi, install the Light Controller android app, and I could easily turn my lights on and off from my phone from the house, before stepping outside. Yay. First steps to home automation.

I won’t go into all the details of the rest of the tech in my shed, but the important parts for the purposes of this post are:

Command-line automation

Sometimes, I’m too lazy to reach for my phone when I want to turn on the lights. Very much a first world problem, I realize. And not so much a problem, as an opportunity to see what’s feasible.

So, I looked around the net for code related to MiLight / EasyBulb, and found (amongst other things) Andy Scott’s MiLight.NET library on Github. A small amount of tweaking, and I had a short console app allowing me to run “lights on” or “lights off” which did the obvious thing. Amongst other things, copying this onto an Intel NUC allowed me to turn the lights off via remote desktop when Holly messaged me at the (Google) office to tell me that I’d left them on. It also meant I could schedule a task to turn the lights off at 10.30pm automatically, in case I forgot when I came in.

For a few months, that kept me satisfied… but it was never going to be the final solution.

The next step was to look at other aspects I could automate, and both the amplifier/receiver and the Sonos unit were obvious targets. I knew both had network support, as I already had apps for both on my phone, but I had no idea what the protocols involved were. The amplifier lives in an A/V cabinet, and I normally keep the doors of that shut – so just turning it on, setting the source, and changing the volume either involved getting the phone out or opening the cabinet. Again, could do better.

Sonos supports UPnP/SOAP for control. An old blog post got me started, and then I used Intel Device Spy to work out what else I could easily do. (I don’t have very demanding requirements – just play/pause, set volume, next/previous track is fine.)

It turns out that Onkyo has its own protocol called ISCP (Integra Serial Control Protocol) which has a network binding called eISCP. There’s remarkably good documentation in the form of an Excel spreadsheet, providing more information than I’m ever likely to need.

Implementing both of these was slightly faffy. The eISCP code didn’t work for some time, then started working – presumably with some minor tweak, but it wasn’t clear to me which of the many tweaks I made actually fixed it. The Sonos code worked fairly soon, but was very inelegant for quite a while.

Initially, this was all driven from the command line. I introduced a very simple sort of discovery, separating out controllers from their commands:

public interface IController
{
    string Name { get; }
    IImmutableList<ICommand> Commands { get; }
}

public interface ICommand
{
    string Name { get; }
    string Description { get; }

    void Execute(params string[] arguments);
}

There’s then a Factory class with a static AllControllers property. (I’m not keen on the naming here, but we’ll come to that later.)

The fact that Execute takes a string array is indicative of its use for a command line application – although looking at it now, I might have made it IEnumerable given that I’ll always be skipping the first actual argument which identifies the controller.

Anyway, this allows a very simple command line app which doesn’t know anything about lights, music etc – it just offers you the controllers and commands it finds.

There’s only actually one implementation of IController, calledReflectiveController. You pass it the real controller to wrap, which can be any instance of a type with a description and with public methods which also have descriptions. These descriptions are provided with an attribute. The arguments passed to Execute are then converted to the method parameter types using Convert.ChangeType. Crude but effective.

With this in place, adding a new command to an existing controller is just a matter of adding a public method. Adding a new controller is just a matter of creating a new class with a description, and adding it to the list of controllers in Factory. It’s all really, really simple.

Deploy to the Pi!

This was the aim all along, of course – I’ve been wanting to try out Windows IoT edition, and put my Raspberry Pi to good use, and try out Windows UAP to get a feeling for it. (In particular, I want to learn about some of the constraints I’ll run into with Noda Time 2.0.) This project was a fantastic excuse to do all three.

I started off by building the application just on my laptop. This is one of the lovely benefits of universal apps – you can get them working in a convenient environment first, then deploy elsewhere when you’re ready.

In fact, the very first version of the app didn’t have any speech recognition – it just had buttons to turn the lights on or off. I checked that this worked on both my laptop and the Raspberry Pi – it was nice to see that Windows IoT still supports a UI over HDMI, and it all worked fine, first time. A few years ago, this would have been absolutely stunning in itself – but I think we’re starting to take portability for granted.

Voice automation

On to the final steps: adding speech recognition.

I had a bit of a false start, as there are multiple approaches to speech recognition in Windows UAP. Initially I tried using Cortana, but never got that to work. Instead, I went with the Windows.Media.SpeechRecognition library, which worked pretty much immediately. Again, my initial attempt was more complicated than it needed to be, using an SRGS grammar file. This worked, but it was fiddly. When I discovered the SpeechRecognitionListConstraint class, it was beautiful… it’s literally just a list of strings, and the speech recognizer raises an event when any of those strings is recognized.

The code required to start the speech recognition is trivial:

private async void RegisterVoiceActivation(object sender, RoutedEventArgs e)
{
    recognizer = new SpeechRecognizer
    {
        Constraints = { new SpeechRecognitionListConstraint(handlers.Keys) }
    };
    recognizer.ContinuousRecognitionSession.ResultGenerated += HandleVoiceCommand;
    recognizer.StateChanged += HandleStateChange;

        SpeechRecognitionCompilationResult compilationResult = await recognizer.CompileConstraintsAsync();

    if (compilationResult.Status == SpeechRecognitionResultStatus.Success)
    {
        await recognizer.ContinuousRecognitionSession.StartAsync();
    }
    else
    {
        await Dispatcher.RunIdleAsync(_ => lastState.Text = $"Compilation failed: {compilationResult.Status}");
    }
}

Given the way we’re compiling the constraints, I’d be reasonably happy not checking the compilation result, but I just never took that code away after using it for SRGS (where it was very much required).

The HandleVoiceCommand method just checks whether the recognition confidence is above a certain threshold (0.6 at the moment, but I may tweak it down a bit), and if so, it consults a dictionary to find out a delegate to invoke. It also updates the UI for diagnostic purposes. The dictionary itself is the only code that knows about the shed controllers, using import static to avoid having Factory. everywhere:

private const string Prefix = "shed ";

private static readonly Dictionary<string, Action> handlers = new Dictionary<string, Action>
{
    { "lights on", Lighting.On },
    { "lights off", Lighting.Off },
    { "music play", Sonos.Play },
    { "music pause", Sonos.Pause },
    { "music mute", () => Sonos.SetVolume(0) },
    { "music quiet", () => Sonos.SetVolume(30) },
    { "music medium", () => Sonos.SetVolume(60) },
    { "music loud", () => Sonos.SetVolume(90) },
    { "music next", Sonos.Next },
    { "music previous", Sonos.Previous },
    { "music restart", Sonos.Restart },
    { "amplifier on", Amplifier.On },
    { "amplifier off", Amplifier.Off },
    { "amplifier mute", () => Amplifier.SetVolume(0) },
    { "amplifier quiet", () => Amplifier.SetVolume(30) },
    { "amplifier medium", () => Amplifier.SetVolume(50) },
    { "amplifier loud", () => Amplifier.SetVolume(60) },
    { "amplifier source pie", () => Amplifier.Source("pi") },
    { "amplifier source sonos", () => Amplifier.Source("sonos") },
    { "amplifier source playstation", () => Amplifier.Source("ps4") }
}.WithKeyPrefix(Prefix);

Here, WithKeyPrefix is just a small extension method to create a new dictionary with a specified prefix to each key.

Just like with the command line version, adding a command is now simply a matter of adding a single entry in this dictionary.

Deploy that on my Raspberry Pi, and as if by magic, I can say “shed lights on” and the lights come on, etc. Admittedly after saying “shed music play” it can be quite tricky to launch further actions, as the music interferes with the speed recognition for obvious reasons.

Simple code for the win

I’d like to take a few moments to talk about the code. At this point, you may want to have Github open in another tab to follow along.

There are lots of things about the code which I’d deem pretty unacceptable at work:

  • It uses the service locator pattern instead of dependency injection. I’m not a fan of this in general.
  • I really hate the name Factory – but I haven’t found anything significantly better, yet. (ControllerProvider? I’d call it just Controllers, but that’s the final part of the namespace name…)
  • There are no tests. At all. Not even a test project.
  • There are only a few comments.
  • The IP addresses are hard-coded into Factory. No config files, no discovery, not even names – just IP addresses.
  • There’s no abstraction beyond IController and ICommand. I could potentially have an IVolumeController, IMusicController, ISourceController etc.

None of these bother me, even though the code is “in production” and I’m expecting to use it for a long time. It’s never going to grow large enough for the service locator pattern to be a problem. With so few types involved, a few non-ideal names isn’t going to cause much of a problem. The only tests that matter are the ones involving me saying “shed amplifier on” and the amplifier either turning on or not… there’s very little code here that’s really testable anyway. My device IP addresses are all fixed by my router, so I’d only have to change them if I change that – and I’d still end up changing it in just one place. Extra abstraction wouldn’t actually give me any benefits at the moment.

So yes, basically I’m happy with the code now. It provides me value, and it’s easy to maintain. In particular, adding extra controllers or commands is trivial. I guess what I’m saying is that this is a reminder that not all code is “enterprise software” and even “best practice” rules such as writing no code without tests have their limitations. Context is king.

What next?

My Raspberry Pi 3 has a small touchscreen display on it, which uses the Rasperry Pi SPI for communication. I haven’t yet managed to get this working, but obviously that would be a lovely next step. It’s a bit of a pain changing from Displayport to HDMI to see the UI and check what phrases have been recognized, for example. The display part will definitely be useful – I might use the touch part just for a very few key commands, such as “stop the music, you can’t hear me any more!”

The device I’d most like to control next is the heater. I keep leaving the heating on accidentally, then having to put my shoes on again to go out and just turn the heating off. If the heater plugged in via a regular socket, it would be easy enough to sort out – but unfortunately the power cable goes straight into a box in the wall. I may try to sort this out at some point, but it’s going to be a pain.

The other thing I’d like to do is add the ability to switch monitor inputs using DDC/CI. That could be tricky in terms of getting access to such a low-level API, and also it requires a permanent “live” connection to the monitor – whereas both my HDMI and Displayport connections are switched (by the Onkyo for HDMI, and a KVM for Displayport). I’m still thinking about that one. I could potentially have a secondary output from the NUC to a DVI input on the monitor, then make the NUC listen as a server that the Pi could talk to…

Conclusion

Home automation is fun and simple – but it really, really helps to have a project which will actually be useful to you. I’ve had a few Raspberry Pis sitting around for ages waiting to be used. They’ve always been fun to play with, but now there’s a purpose, and that makes a huge difference…

C# 7 feature request #1… extension attributes

Last week I learned that using static is going to be the syntax for importing static members (including extension methods) in C# 6. That fulfils a feature request I made in September 2005 (my fourth ever blog post, as it happens). With a feature request turnaround of 10 years, I figure I should get put everything I could ever want out there now… (Just kidding really – more seriously, I’m really pleased to see this change in C# 6, relative to both C# 5 and the earlier designs of C# 6.)

Last week at CodeMash, Dustin Campbell demonstrated one Roslyn DiagnosticAnalyzer for ArgumentNullException and friends, and I demonstrated one I’d written for the JetBrains InvokerParameterName attribute. The two diagnostics check largely the same thing: some parameters are “special” in that the argument for them should be a parameter in the current context, and should use the C# 6 nameof operator.

The most significant difference between them (beyond quality – it’s not hard to spot which is written by a C# team member…) is that my diagnostic spots these special parameters because they are decorated with a particular attribute, whereas Dustin’s has the relevant information hard-coded within the diagnostic. Mine can’t spot the ArgumentNullException constructor, and Dustin’s can’t spot the Preconditions.CheckNotNull method in Noda Time. It would be really nice if I could pretend that certain members of existing types had particular attributes applied to them. For example, wouldn’t it be nice if I could write something like:

using JetBrains.Annotations;

namespace System
{
    public extern partial class ArgumentNullException
    {
        public ArgumentNullException([InvokerParameterName] string paramName);
    }
}

The rules would be roughly like this:

  • A type declared with public extern partial modifiers would indicate that attributes should be applied to an existing type.
  • Each member listed would be checked for existence, and any attributes present in the declaration would be noted in the IL for the “extending” assembly.
  • A new call on Type, MethodInfo etc would be created to allow all attributes to be fetched for a member, whether “extended” or not. This would find all attributes contributed by all assemblies loaded in the current AppDomain (so you’d need to make sure that you did something to initialize any assembly containing extension attributes).
  • A similar new call would be available within Roslyn – we’d need to think carefully about which assemblies that examined, of course.

By making this an “opt-in” mechanism from the examiner perspective, it would be safe – anything dealing in security attributes would want to use the existing mechanism, for example.

This wouldn’t just be useful for diagnostic purposes, mind you. There are a number of situations where you might want to augment existing types with more metadata, whether those attributes are your own custom ones or existing ones in the framework. Want to apply an attribute-based serialization framework to a different third-party type? Sure, just use those extensions. Want to give system enums localization-oriented attributes for your own framework? No problem.

I’m not actually expecting this feature to go very far – it’s relatively niche, as well as possibly requiring CLR changes (rather than just framework and language changes). Still, having thought of it, I decided it would be odd to keep it to myself. And hey, it’s a good excuse to create the C# 7 category on my blog…

How many 32-bit types might we want?

I was recently directed to an article on "tiny types" – an approach to static typing which introduces distinct types for the sake of code clarity, rather than to add particular behaviour to each type. As I understand it, they’re like type aliases with no conversions between the various types. (Unlike plain aliases, an object is genuinely an instance of the relevant tiny type – it doesn’t have "alias erasure" as a language-based solution could easily do.)

I like the idea, and wish it were better supported in languages – but it led me to thinking more about the existing numeric types that we’ve got and how they’re used. In particular, I was considering how in C# the "byte" type is relatively rarely used as a number, with a useful magnitude which has arithmetic performed on it. That does happen, but more often it’s used either as part of other types (e.g. converting 4 bytes from a stream into an integer) or as a sequence of 8 bits.

It then struck me that the situations where we perform bitwise operations and the situations where we perform arithmetic operations are reasonably distinct. I wonder whether it would be worth having five separate types – which could be purely at the language level, if we wanted:

  • Float32 – regular IEEE-754 32-bit binary floating point type with arithmetic operations but no bitwise operations
  • Int32 – regular signed integer with arithmetic operations but no bitwise operations
  • UInt32 – unsigned integer with arithmetic operations but no bitwise operations
  • Bit32 – a bit sequence with bitwise operations but no arithmetic operations
  • Identity32 – a 32-bit value which only defines equality

The last type would be used for identities which happened to consist of 32 bits but where the actual bit values were only useful in terms of comparison with other identities. (In this case, an Identity64 might be more useful in practice.)

Explicit conversions which preserved the underlying bit pattern would be available, so you could easily generate a sequence of Identity32 values by having a simple Int32 counter, for example.

At the same time, I’d want to introduce bitwise operations for Bit8 and Bit16 values, rather than the "everything is promoted to 32 bits" that we currently have in Java and C#, reducing the number of pointless casts for code that performs bitwise operations.

The expected benefits would be the increased friction when using a type in an "unexpected" way for the value being represented – you’d need to explicitly cast to the right type – but maintaining a low-friction path when using a value in the expected way.

I haven’t yet figured out what I’d want a Stream.Read(…) call to use. Probably Bit32, but I’m not sure yet.

Anyway, I figured I’d just throw it out there as a thought experiment… any reactions?

Array covariance: not just ugly, but slow too

It seems to be quite a long time since I’ve written a genuine "code" blog post. Time to fix that.

This material may well be covered elsewhere – it’s certainly not terrifically original, and I’ve been meaning to post about it for a long time. In particular, I remember mentioning it at CodeMash in 2012. Anyway, the time has now come.

Refresher on array covariance

Just as a bit of background before we delve into the performance aspect, let me remind you what array covariance is, and when it applies. The basic idea is that C# allows a reference conversion from type TDerived[] to type TBase[], so long as:

  • TDerived and TBase are both reference types (potentially interfaces)
  • There’s a reference conversion from TDerived to TBase (so either TDerived is the same as TBase, or a subclass, or an implementing class etc)

Just to remind you about reference conversions, those are conversions from one reference type to another, where the result (on success) is never a reference to a different object. To quote section 6.1.6 of the C# 5 spec:

Reference conversions, implicit or explicit, never change the referential identity of the object being converted. In other words, while a reference conversion may change the type of the reference, it never changes the type or value of the object being referred to.

So as a simple example, there’s a reference conversion from string to object, therefore there’s a reference conversion from string[] to object[]:

string[] strings = new string[10];
object[] objects = strings;

// strings and objects now refer to the same object

There is not a reference conversion between value type arrays, so you can’t use the same code to conver from int[] to object[].

The nasty part is that every store operation into a reference type array now has to be checked at execution time for type safety. So to extend our sample code very slightly:

string[] strings = new string[10]; 
object[] objects = strings;

objects[0] = "string"; // This is fine
objects[0] = new Button(); // This will fail

The last line here will fail with an ArrayTypeMismatchException, to avoid storing a Button reference in a String[] object. When I said that every store operation has to be checked, that’s a slight exaggeration: in theory, if the compile-time type is an array with an element type which is a sealed class, the check can be avoided as it can’t fail.

Avoiding array covariance

I would rather arrays weren’t covariant in the first place, but there’s not a lot that can be done about that now. However, we can work around this, if we really need to. We know that value type arrays are not covariant… so how about we use a value type array instead, even if we want to store reference types?

All we need is a value type which can store the reference type we need – which is dead easy with a wrapper type:

public struct Wrapper<T> where T : class
{
    private readonly T value;
    public T Value { get { return value; } }
    
    public Wrapper(T value)
    {
        this.value = value;
    }
    
    public static implicit operator Wrapper<T>(T value)
    {
        return new Wrapper<T>(value);
    }
}

Now if we have a Wrapper<string>[], we can’t assign that to a Wrapper<object>[] variable – the types are incompatible. If that feels a bit clunky, we can put the array into its own type:

public sealed class InvariantArray<T> where T : class
{
    private readonly Wrapper<T>[] array;
    
    public InvariantArray(int size)
    {
        array = new Wrapper<T>[size];
    }
    
    public T this[int index]
    {
        get { return array[index].Value; }
        set { array[index] = value; }
    }
}

Just to clarify, we now only have value type arrays, but ones where each value is a plain wrapper for a reference. We now can’t accidentally violate type-safety at compile-time, and the CLR doesn’t need to validate write operations.

There’s no memory overhead here – aside from the type information at the start, I’d actually expect the contents of a Wrapper<T>[] to be indistinguishable from a T[] in memory.

Benchmarking

So, how does it perform? I’ve written a small console app to test it. You can download the full code, but the gist of it is that we use a stopwatch to measure how long it takes to either repeatedly write to an array, or repeatedly read from an array (validating that the value read is non-null, just to prove that we’ve really read something). I’m hoping I haven’t fallen foul of any of the various mistakes in benchmarking which are so easy to make.

The test tries four scenarios:

  • object[] (but still storing strings)
  • string[]
  • Wrapper<string>[]
  • InvariantArray<string>

Running against an array size of 100, with 100 million iterations per test, I get the following results on my Thinkpad Twist :

Array type Read time (ms) Write time
object[] 11842 44827
string[] 12000 40865
Wrapper<string>[] 11843 29338
InvariantArray<string> 11825 32973

That’s just one run, but the results are fairly consistent across runs. The one interesting deviation is the write time for object[] – I’ve observed it sometimes being the same as for string[], but not consistently. I don’t understand this, but it does seem that the JIT isn’t performing the optimization for string[] that it could if it spotted that string is sealed.

Both of the workarounds to avoid array covariance make a noticeable difference to the performance of writing to the array, without affecting read performance. Hooray!

Conclusion

I think it would be a very rare application which noticed a significant performance boost here, but I do like the fact that this is one of those situations where a cleaner design also leads to better performance, without many obvious technical downsides.

That said, I doubt that I’ll actually be using this in real code any time soon – the fact that it’s just "different" to normal C# code is a big downside in itself. Hope you found it interesting though :)

Coding in the style of Glee

As previously mentioned, at CodeMash 2012 I gave a very silly Pecha Kucha talk entitled "Coding in the style of Glee". The video is on YouTube, or can be seen embedded below:

(There’s also another YouTube video from a different angle.)

This post gives the 20 slides (which were just text; no fancy pictures unlike my competitors) and what I meant to say about them. (Edited very slightly to remove a couple of CodeMash-specific in-jokes.) Don’t forget that each slide was only up for 20 seconds.

Coding in the style of Glee

As you may know, I’m from the UK, and it’s wonderful to be here. This is my first US conference, so it’s great to be in the country which has shared with the world its most marvellous cultural output: the Fox show, Glee.

At first I watched it just for surface story – but now I know better – I know that really, the songs are all about the culture and practice of coding.

(It isn’t easy) Bein’ Green

When I started coding, it was on a ZX Spectrum, in Basic. It was hard, but the computer came with a great manual. I later learned C from a ringbinder of some course or other – and entirely failed to understand half the language. Of course, this was before Stack Overflow, when it was really hard being a newbie – where could you turn for information?

Getting to know you

Over time I became semi-competent in C, with the help of friends. But learning is a constant process, of course – getting to know new languages and platforms is just part of a good dev’s life every day.

Learning itself is a skill – how similar it is to getting to know small children, I leave to your imagination.

Man in the Mirror

Glee doesn’t just talk about the coding experience, of course – it talks about specific technologies. This Michael Jackson song is talking about reflection, of course. Although the idea wasn’t new in Java, it was new to me – and now it would be almost unthinkable to come up with a new platform which didn’t let you find out about what was in the code.

Bridge over Troubled Water

Another technology covered beautifully by Glee is the interop. We’re in a big world, and we always need to talk to other systems. Whether it’s over JNI (heaven help you), P/Invoke, SOAP, REST – whatever, I hope next time you connect to another system, you’ll hear this haunting Simon and Garfunkel melody in the background.

I will survive

And who could forget persistence frameworks. I’m not sure whether Gloria Gaynor had Hibernate and the Entity Framework in mind when she sang this, but I’m utterly convinced that the Glee writers did. When you submit your data, it’s just got to survive – what else would you want?

You can’t always get what you want

We’d all like perfect specifications, reliable libraries, ideal languages, etc – but that’s just not going to happen. It’s possible that of course you won’t get what you need – even if you try real hard. But hey, you might just.

Lean on me (or Agile on me)

(I didn’t actually have notes written for this one. Copied from the video.)

Glee sympathizes with you – but it also have a bit of an answer: lean on me. Lean and agile development, so we can adapt to constantly changing specifications, and eventually we will have something that is useful. Maybe nothing like what we initially envisaged, but it will be something useful.

Losing my Religion

Of course, we don’t always stay in love with a platform. I’d like to dedicate this slight to Enterprise Java. Fortunately I never had to deal with Enterprise Java Beans, but I “enjoyed” enough other J2EE APIs to make me yearn for a world without BeanProcessorFactoryFactories.

Anything Goes

Now I’m pretty conservative – only in terms of coding, mind you. I’m a statically typed language guy. But Glee celebrates dynamic languages too – languages where really, anything goes until you try to execute it. Even though I haven’t gone down the dynamic route myself much, it’s important that we all welcome the diversity of languages and platforms we have.

Get Happy

Along with the rise of dynamic languages, we seem to have seen a rise of happy developers. We’ve always had enthusiastic developers, but there’s a certain air about your typical Ruby on Rails developer which feels new to me. Again, I’m not a Ruby fan myself – but it’s always nice to see other happy people, and maybe one day I’ll see the light.

Bust your Windows

I don’t know what I can say about this song. Do the Glee writers have it in for Microsoft? I don’t remember “Bust your OSX” or “Bust your Linux” for example. Only Windows is targeted here.

The Safety Dance

One big change for me since joining Google is increased awareness of the need for redundancy – the intricate dance we need to perform to create a service which will stay up no matter what. Where redundancy is a dirty word in most of industry, as developers we celebrate it – and will do anything we can to avoid…

The Only Exception

… a single point of failure.

(Yes, that really is all I’d prepared for that slide. Hence the need for improvisation.)

Telephone

(From video.) Glee celebrates the rise of phone apps. Who these days could be unaware of the importance of the development of mobile applications? And obviously, we can credit the iPhone for that, but since the iPhone, and just smart phone apps, we’ve also started…

U Can’t Touch This

(From video.) Tablets! And touch screen devices of all kinds. So Windows 8 – very touch-based, and sooner or later we’re all going to have to get with it. I don’t do UIs, I’ve never done a touch UI in my life, I have no idea how it works. But clearly it’s one of the ways forward.

Forget You

As smart phones and tablets become more ubiquitous and more bound to us as people, privacy has become more important. Glee gave us a timely reminder of the reverse of the persistence early on: we need to be able to forget about users, as well as remember them.

(A)waiting for a girl like you

(From video.) I’d like to leave on an up-note, so: I’m clearly very, very excited (really, really excited) about C# 5 and its await keyword so I ask you – I beg you – be excited about development. And always bear in mind your users.

My life would suck without you

Users rule our world. Can’t live without them, can’t shoot ‘em.

Celebrate – we do stuff to make users really happy! This is awesome! We should be thrilled!

(Even for enterprise apps, we’re doing useful stuff. Honest.)

Don’t stop believing

(From video.) So to sum up: have fun, keep learning, really, really enjoy what you’re doing, and… don’t stop!

Eduasync part 13: first look at coroutines with async

(This part covers project 18 in the source code.)

As I mentioned in earlier parts, the "awaiting" part of async methods is in no way limited to tasks. So long as we have a suitable GetAwaiter() method which returns a value of a type which in turn has suitable methods on it, the compiler doesn’t really care what’s going on. It’s time to exploit that to implement some form of coroutines in C#.

Introduction to coroutines

The fundamental idea of coroutines is to have multiple methods executing cooperatively, each of them maintaining their position within the coroutine when they yield to another. You can almost think of them as executing in multiple threads, with only one thread actually running at a time, and signalling between the different threads to control flow. However, we don’t really need multiple threads once we’ve got continuations – we can have a single thread with a complex flow of continuations, and still only a very short "real" stack. (The control flow is stored in normal collections instead of being implicit on the thread’s stack.)

Coroutines were already feasible in C# through the use of iterator blocks, but the async feature of C# allows a slightly more natural way of expressing them, in my view. (The linked Wikipedia page gives a sketch of how coroutines can be built on top of generators, which in the general concept that iterator blocks implement in C#.)

I have implemented various flavours of coroutines in Eduasync. It’s possible that some (all?) of them shouldn’t strictly be called coroutines, but they’re close enough to the real thing in feeling. This is far from an exhaustive set of approaches. Once you’ve got the basic idea of what I’m doing, you may well want to experiment with your own implementations.

I’m not going to claim that the use of coroutines in any of my examples really makes any sense in terms of making real tasks easier. This is purely for the sake of interest and twisting the async feature for fun.

Round-robin independent coroutines

Our first implementation of coroutines is relatively simple. A coordinator effectively "schedules" the coroutines it’s set up with in a round-robin fashion: when one of the coroutines yields control to the coordinator, the coordinator remembers where the coroutine had got to, and then starts the next one. When each coroutine has executed its first piece of code and yielded control, the coordinator will go back to the first coroutine to continue execution, and so on until all coroutines have completed.

The coroutines don’t know about each other, and no data is being passed between them.

Hopefully it’s reasonably obvious that the coordinator contains all the smarts here – the coroutines themselves can be relatively dumb. Let’s look at what the client code looks like (along with the results) before we get to the coordinator code.

Client code

The sample code contains three coroutines, all of which take a Coordinator parameter and have a void return type. These are passed to a new coordinator using a collection initializer and method group conversions; the coordinator is then started. Here’s the entry point code for this:

private static void Main(string[] args)
{
    var coordinator = new Coordinator { 
        FirstCoroutine,
        SecondCoroutine,
        ThirdCoroutine
    };
    coordinator.Start();
}

When each coroutine is initially started, the coordinator passes a reference to itself as the argument to the coroutine. That’s how we solve the chicken-and-egg problem of the coroutine and coordinator having to know about each other. The way a coroutine yields control is simply by awaiting the coordinator. The result type of this await expression is void – it’s just a way of "pausing" the coroutine.

We’re not doing anything interesting in the actual coroutines – just tracing the execution flow. Of course we could do anything we wanted, within reason. We could even await a genuinely asynchronous task such as fetching a web page asynchronously. In that case the whole coroutine collection would be "paused" until the fetch returned.

Here’s the code for the first coroutine – the second and third ones are similar, but use different indentation for clarity. The third coroutine is also shorter, just for fun – it only awaits the coordinator once.

private static async void FirstCoroutine(Coordinator coordinator)
{
    Console.WriteLine("Starting FirstCoroutine");
    Console.WriteLine("Yielding from FirstCoroutine…");

    await coordinator;

    Console.WriteLine("Returned to FirstCoroutine");
    Console.WriteLine("Yielding from FirstCoroutine again…");

    await coordinator;

    Console.WriteLine("Returned to FirstCoroutine again");
    Console.WriteLine("Finished FirstCoroutine");
}

And here’s the output…

Starting FirstCoroutine
Yielding from FirstCoroutine…
    Starting SecondCoroutine
    Yielding from SecondCoroutine…
        Starting ThirdCoroutine
        Yielding from ThirdCoroutine…
Returned to FirstCoroutine
Yielding from FirstCoroutine again…
    Returned to SecondCoroutine
    Yielding from SecondCoroutine again…
        Returned to ThirdCoroutine
        Finished ThirdCoroutine…
Returned to FirstCoroutine again
Finished FirstCoroutine
    Returned to SecondCoroutine again
    Finished SecondCoroutine

Hopefully that’s the output you expected, given the earlier description. Again it may help if you think of the coroutines as running in separate pseudo-threads: the execution within each pseudo-thread is just linear, and the timing is controlled by our explicit "await" expressions. All of this would actually be pretty easy to implement using multiple threads which really did just block on each await expression – but the fun part is keeping it all in one real thread. Let’s have a look at the coordinator.

The Coordinator class

Some of the later coroutine examples end up being slightly brainbusting, at least for me. This one is relatively straightforward though, once you’ve got the basic idea. All we need is a queue of actions to execute. After initialization, we want our queue to contain the coroutine starting points.

When a coroutine yields control, we just need to add the remainder of it to the end of the queue, and move on to the next item. Obviously the async infrastructure will provide "the remainder of the coroutine" as a continuation via the OnContinue method.

When a coroutine just returns, we continue with the next item in the queue as before – it’s just that we won’t add a continuation to the end of the queue. Eventually (well, hopefully) we’ll end up with an empty queue, at which point we can stop.

Initialization and a choice of data structures

We’ll represent our queue using Queue<T> where the T is a delegate type. We have two choices here though, because we have two kinds of delegate – one which takes the Coordinator as a parameter (for the initial coroutine setup) and one which has no parameters (for the continuations). Fortunately we can convert between the two in either direction very simply, bearing in mind that all of this is within the context of a coordinator. For example:

// If we’re given a coroutine and want a plain Action
Action<Coordinator> coroutine = …; 
Action action = () => coroutine(this);

// If we’re given a plain Action and want an Action<Continuation>:
Action continuation = …; 
Action<Coordinator> coroutine = ignored => continuation();

I’ve arbitrarily chosen to use the first option, so there’s a Queue<Action> internally.

Now we need to get the collection initializer working. The C# compiler requires an appropriate Add method (which is easy) and also checks that the type implements IEnumerable. We don’t really need to be able to iterate over the queue of actions, so I’ve use explicit interface implementation to reduce the likelihood of GetEnumerator() being called inappropriately, and made the method throw an exception for good measure. That gives us the skeleton of the class required for setting up:

public sealed class Coordinator : IEnumerable
{
    private readonly Queue<Action> actions = new Queue<Action>();

    // Used by collection initializer to specify the coroutines to run
    public void Add(Action<Coordinator> coroutine)
    {
        actions.Enqueue(() => coroutine(this));
    }

    // Required for collection initializers, but we don’t really want
    // to expose anything.
    IEnumerator IEnumerable.GetEnumerator()
    {
        throw new NotSupportedException("IEnumerable only supported to enable collection initializers");
    }
}

(Note that I haven’t used XML documentation anywhere here – it’s great for real code, but adds clutter in blog posts.)

For production code I’d probably prevent Add from being called after the coordinator had been started, but there’s no need to do it in our well-behaved sample code. We’re only going to add extra actions to the queue via continuations, which will be added due to await expressions.

The main execution loop and async infrastructure

So far we’ve got code to register coroutines in the queue – so now we need to execute them. Bearing in mind that the actions themselves will be responsible for adding continuations, the main loop of the coordinator is embarrassingly simple:

// Execute actions in the queue until it’s empty. Actions add *more*
// actions (continuations) to the queue by awaiting this coordinator.
public void Start()
{
    while (actions.Count > 0)
    {
        actions.Dequeue().Invoke();
    }
}

Of course, the interesting bit is the code which supports the async methods and await expressions. We know we need to provide a GetAwaiter() method, but what should that return? Well, we’re just going to use the awaiter to add a continuation to the coordinator’s queue. It’s got no other state than that – so we might as well return the coordinator itself, and put the other infrastructure methods directly in the coordinator.

Again, this is slightly ugly, as the extra methods don’t really make sense on the coordinator – we wouldn’t want to call them directly from client code, for example. However, they’re fairly irrelevant – we could always create a nested type which just had a reference to its "parent" coordinator if we wanted to. For simplicity, I haven’t bothered with this – I’ve just implemented GetAwaiter() trivially:

// Used by await expressions to get an awaiter
public Coordinator GetAwaiter()
{
    return this;
}

So, that leaves just three members still to implement: IsCompleted, OnCompleted and GetResult. We always want the IsCompleted property to return false, as otherwise the coroutine will just continue executing immediately without returning to cede control; the await expression would be pointless. OnCompleted just needs to add the continuation to the end of the queue – we don’t need to attach it to a task, or anything like that. Finally, GetResult is a no-op – we have no results, no exceptions, and basically nothing to do. You might want to add a bit of logging here, if you were so inclined, but there’s no real need.

So, here are the final three members of Coordinator:

// Force await to yield control
public bool IsCompleted { get { return false; } }

public void OnCompleted(Action continuation)
{
    // Put the continuation at the end of the queue, ready to
    // execute when the other coroutines have had a go.
    actions.Enqueue(continuation);
}

public void GetResult()
{
    // Our await expressions are void, and we never need to throw
    // an exception, so this is a no-op.
}

And that’s it! Fewer than 50 lines of code required, and nothing complicated at all. The interesting behaviour is all due to the way the C# compiler uses the coordinator when awaiting it.

We need AsyncVoidMethodBuilder as before, as we have some async void methods – but that doesn’t need to do anything significant. That’s basically all the code required to implement these basic round-robin coroutines.

Conclusion

Our first foray into the weird and wonderful world of coroutines was relatively tame. The basic idea of a coordinator keeping track of the state of all the different coroutines in one sense or another will keep coming back to us, but with different ways of controlling the execution flow.

Next time we’ll see some coroutines which can pass data to each other.

Of memory and strings

This post was provoked by a recent Stack Overflow question which asked whether there was an efficient representation of ASCII strings in .NET.

In particular, the questioner wanted to store hundreds of thousands – possibly millions – of strings in memory, and knowing (or assuming) that they all consisted of ASCII characters, he wanted to avoid the waste of space that comes from storing each character in a .NET string as a UTF-16 code unit.

My answer to the question mostly consisted of saying that I didn’t think it would be worth the effort, and giving some reasons. But the more reasons I gave, the more I thought about the subtleties involved, and that it’s actually quite an interesting case study into memory use in .NET.

How are we going to measure object size?

If we’re going to work out any sort of benefit from a more compact string representation, we’ll need to be able to calculate how much memory our objects are taking to start with. Rather than work this out in a purely theoretical way, I’ve been running tests using code like this:

using System;

class Program
{
    static void Main(string[] args)
    {
        int size = 10000000;
        object[] array = new object[size];

        long before = GC.GetTotalMemory(true);
        for (int i = 0; i < size; i++)
        {
            array[i] = new object();
        }
        long after = GC.GetTotalMemory(true);

        double diff = after – before;

        Console.WriteLine(“Per object: “ + diff / size);

        // Stop the GC from messing up our measurements
        GC.KeepAlive(array);
    }
}

Obviously that doesn’t take into account factors such as memory used by JITting, or anything that could be going on in other threads, but by using a suitably large number of objects, and by performing the division in floating point arithmetic (to avoid a slight variation making an 11.99999 come out as an 11, when it’s really just a “12 with something else going on”, we can work out the size of objects pretty clearly. The sample above measures the size of a vanilla object, but the code can be adapted very easily.

The first important thing to point out is that C# doesn’t guarantee the results of this – it isn’t responsible for determining how all of an object is laid out in memory; that’s the runtime’s job. While there are attributes to affect the layout and padding of the data members of a type in memory, there are other aspects that are out of your control. In this post I won’t use any of the layout attributes – we’ll just use the defaults.

Not all runtimes are created equal, either. On my laptop I’ve got Mono 2.8, .NET 3.5 and .NET 4, with the two versions of .NET each having the 32-bit (x86) and 64-bit (x64) CLRs. For the sake of simplicity, I’m going to stick with .NET 4 for this post, but I’ll give results for both the x64 and x86 CLRs. To test each of them, I’m compiling with “/platform:x64” or “/platform:x86”.

Some simple starter measurements

Before I start creating my own types, let’s try a few built-in types, including strings:

Type x86 size x64 size
object 12 24
object[] 16 + length * 4 32 + length * 8
int[] 12 + length * 4 28 + length * 4
byte[] 12 + length 24 + length
string 14 + length * 2 26 + length * 2

Note that all the x86 sizes are rounded up to the nearest 4 bytes, and all x64 sizes are rounded up to the nearest 8 bytes.

The string numbers are interesting, because strings are the only non-array types in .NET which vary in size. A long string consists of a single large object in memory. Compare this with Java, where a String is a “normal” type in terms of memory consumption, containing an offset and length into a char array – so a long string consists of a small object referring to a large char array. This distinction will be very important when we come to build an AsciiString type. Another point about measuring string sizes is that it’s relatively tricky to measure the size of an empty string – because even if you use the “new string(‘x’, 0)” constructor, the result is still cached – the same reference is returned each time.

You might be forgiven for looking at the numbers above and thinking that the “overhead” of an object is 12 bytes in x86 and 24 in x64… but that’s not quite right. Let’s build our own straightforward classes and measure them…

Custom classes containing primitives

Here are six classes, all of which are measured with the same simple test code:

class Empty {}
class OneInt32 { int x; }
class TwoInt32 { int x, y; }
class ThreeInt32 { int x, y, z; }

class Mixed1
{
    int x;
    byte b1, b2, b3, b4;
    int y, z;
}

class Mixed2
{
    int x;
    byte b1;
    int y, z;
    byte b2, b3, b4;
}

The last case is mostly to check how the CLR handles an “awkward” class declaration, where the int variables won’t naturally be aligned on 4-byte boundaries. The results look odd at first, but we’ll make sense of them in a minute:

Type x86 size x64 size
Empty 12 24
OneInt32 12 24
TwoInt32s 16 24
ThreeInt32s 20 32
Mixed1 24 32
Mixed2 24 32

A few interesting things to note here:

  • There’s a “base” overhead of 8 bytes per object in x86 and 16 per object in x64… given that we can store an Int32 of “real” data in x86 and still have an object size of 12, and likewise we can store two Int32s of real data in x64 and still have an object of x64.
  • There’s a “minimum” size of 12 bytes and 24 bytes respectively. In other words, you can’t have a type which is just the overhead. Note how the “Empty” class takes up the same size as creating instances of Object… there’s effectively some spare room, because the CLR doesn’t like operating on an object with no data. (Note that a struct with no fields takes up space too, even for local variables.)
  • The x86 objects are padded to 4 byte boundaries; on x64 it’s 8 bytes (just as before)
  • By default, the CLR is happy to pack fields pretty densely – Mixed2 only took as much space as ThreeInt32. My guess is that it reorganized the in-memory representation so that the bytes all came after the ints… and that’s what a quick bit of playing around with unsafe pointers suggests too… but I’m not sufficiently comfortable with this sort of thing to say for sure. Frankly, I don’t care… so long as it all works, what we’re interested in is the overall size, not the precise layout.

So what does an ASCII string look like?

In this blog post I’m not actually going to implement an ASCII string at all (well, not much). I’m merely pointing out what the data structures would look like. However, it’s worth working out what desirable qualities it should have. As far as possible, it should feel like System.String. In particular:

  • It should be immutable.
  • It should have fast access to individual characters, and the length.
  • It should mostly “feel” like an immutable reference type, in that passing a value of type AsciiString around should be cheap, like copying a reference.
  • It should use as little memory as possible… less than the equivalent string, or it’s pointless.
    • One caveat to this: in theory that could mean storing 8 characters in every 7 bytes, as ASCII really only uses 7 bits per character. I’m not going to those extremes, but you can think about them if you want.

We’re going to store the characters as a byte array. We have three options as to exactly how we handle that byte array:

  • We could go the Java way, where several strings share references to the same array. Each string then has an offset and a length to say which bit of the array they’re interested in.
    • Pros: Substring becomes really cheap
    • Cons: You can end up having just a tiny substring responsible for keeping a huge character array alive
  • We could go the .NET way, where each string has its own character data, but the buffer may be longer than necessary… so it stores the length too. (A bit like a List<T>.)
    • Pros: Can potentially make building strings cheap, if you just keep whatever potentially oversized buffer you’ve already got.
    • Cons: Wasted space for the unused part of the array, and a field for the length.
  • We could just have a byte array of exactly the right size – and it already knows its size.

I’m going to assume the third option here. So all the data our type needs is a byte array. That’s going to be pretty cheap… we hope. Let’s look at what we can build.

Option 1: A simple class

To give a flavour of the implementation, I’ve decided to implement four members for each option:

  • A way of creating an AsciiString from a regular string
  • The Substring overload with both a start and length
  • The Length property
  • The indexer returning a char

Hopefully that will give enough of an idea of what’s going on to be useful. Note that these aren’t production-quality implementations at all… none of the code has ever been run at all. I have made sure it compiles, so just be grateful for that :)

using System;
using System.Text;

public sealed class AsciiString
{
    private readonly byte[] data;

    public AsciiString(string text)
    {
        // TODO: Rather more data validation etc!
        data = Encoding.ASCII.GetBytes(text);
    }

    private AsciiString(byte[] data)
    {
        // This constructor is private: we can trust that it’s been called
        // by code which isn’t going to modify the contents of the array
        // afterwards.
        this.data = data;
    }

    public AsciiString Substring(int startIndex, int length)
    {
        if (startIndex < 0 || startIndex > data.Length)
        {
            throw new ArgumentOutOfRangeException(“startIndex”);
        }
        if (startIndex + length > data.Length)
        {
            throw new ArgumentOutOfRangeException(“length”);
        }
        byte[] newData = new byte[length];
        Buffer.BlockCopy(data, startIndex, newData, 0, length);
        return new AsciiString(newData);
    }

    public int Length { get { return data.Length; } }

    public char this[int position] { get { return (char) data[position]; } }
    // etc…
}

Hopefully this is pretty straightforward – it’s meant to be the most “obvious” solution. Note that we’ve not got the nice locality of reference which the real String class has – it’s possible that the an AsciiString could end up with its backing array a long way away in memory, so a indexer operation for a single character could end up with three cache misses – one for the AsciiString object, one for part of the data array storing the length (for argument validation) and one for the part of the data array containing the character we’re looking for. That may be unlikely, and it’s not the kind of thing I normally think about – but it’s probably the kind of thing the BCL team pay a lot of attention to.

We get the same “immutable reference type” behaviour present in the normal string type, however – you can have a null AsciiString reference just as normal, any assignments will just be reference assignments, etc.

What about the size though? There are two objects to consider:

  • The array, of size 12 + length or 24 + length (x86 and x64 respectively; rounded up to 4 or 8 bytes as well)
  • The object itself, of size 12 or 24

So we’ve got a total size of 24 + length or 48 + length, depending on architecture. To show how the break-even point works, here’s a little table showing the sizes of string and AsciiString for various sizes on both architectures:

Length string-x86 string-x64 AsciiString-x86 AsciiString-x64
0 16 32 24 48
1 16 32 28 56
2 20 32 28 56
3 20 32 28 56
4 24 40 28 56
5 24 40 32 56
6 28 40 32 56
7 28 40 32 56
8 32 48 32 56
9 32 48 36 64
10 36 48 36 64
..
16 48 64 40 64
24 64 80 48 72
32 80 96 56 80

As you can see, the break-even point in x86 is at length 10; in x64 it’s at length 16. After that, we start winning – as we’d expect. The penalty for very small strings is quite hefty though – you’d really better hope you didn’t have lots of single-character strings, taking 56 bytes each in x64.

Let’s see if we can do better…

Option 2: A simple struct

A lot of the overhead here has come from the fact that we’ve got an object which only has a single field. The field is all we’re interested in… why are we bothering with all the overhead of the object? Let’s make it a struct instead, effectively inlining that field wherever we use the type. Assignment, passing arguments to methods etc will still only be copying a reference – it’s just the reference will be the byte array rather than a wrapper object.

It all sounds good, but there are two snags:

  • The value can never be null; that at least diverges from the familiar string behaviour
  • We won’t be able to prevent code from creating an instance of our struct with new AsciiString() – and that won’t be good.

We can actually pit these two downsides against each other by making the “default” value a pseudo-null value… we can even throw NullReferenceException just as if it were a reference type. We don’t even need to do any work in order to get that NullReferenceException – every member is going to use the data array anyway, and dereferencing that will automatically throw an exception. We might want to change things around a bit to make that the very first thing that can throw an exception, but that’s all.

It’s nasty, but it appeals very slightly. In an evil kind of way. It makes things slightly more familiar, but at the cost of being generally weird in other ways.

We still need to be able to check whether an AsciiString value is the logical null value. I’ll add an IsNull property for that purpose. (An alternative would be HasValue, but that would be confusing with Nullable<T>.)

Most of the code remains untouched – it looks like this:

public struct AsciiString
{
    private readonly byte[] data;
    public bool IsNull { get { return data == null; } }

    // Remainder of code as before
}

Now let’s look at the sizes, which should be a lot more favourable than before. Note that I had to change the size-checking code to create an array of type AsciiStruct[] instead of object[] to avoid boxing. Should we take the size of the array itself into consideration when computing the size of the AsciiString? We haven’t when working out the size of string… in each case the size of any individual element will be the size of a reference. For the table below, I haven’t included it… but bear in mind that this form of measurement would count the size of most value types (int etc) as 0. It just goes to show that when you talk about the size of a data type, you really need to be very precise in what you mean.

Length string-x86 string-x64 AsciiString-x86 AsciiString-x64
0 16 32 12 24
1 16 32 16 32
2 20 32 16 32
3 20 32 16 32
4 24 40 16 32
5 24 40 20 32
6 28 40 20 32
7 28 40 20 32
8 32 48 20 32
9 32 48 24 40
10 36 48 24 40
..
16 48 64 28 40
24 64 80 36 48
32 80 96 44 56

This time, unsurprisingly, AsciiString is always more space-efficient than the normal string. It just takes a certain amount of holding our noses. Speaking of which…

Option 3: Extension methods on byte[]

Suppose we really, really want to have “proper” null references. We don’t really need the struct. We could treat any byte array as an array of ASCII characters, with extension methods like this:

public static class ByteArrayExtensions
{
    public static byte[] Substring(this byte[] data, int startIndex, int length)
    {
        if (startIndex < 0 || startIndex > data.Length)
        {
            throw new ArgumentOutOfRangeException(“startIndex”);
        }
        if (startIndex + length > data.Length)
        {
            throw new ArgumentOutOfRangeException(“length”);
        }
        byte[] newData = new byte[length];
        Buffer.BlockCopy(data, startIndex, newData, 0, length);
        return newData;
    }
}

The size is the same as with option 2 – in both cases there’s just the byte array, basically. This option is truly horrible in many ways though:

  • You can no longer tell in your code (just through typing) what’s meant to be an AsciiString and what isn’t
  • Kiss immutability goodbye
  • We can’t guarantee that all the characters will be valid ASCII any more
  • We can’t add extension properties or extension indexers
  • We can’t make it implement the interfaces we want it to

Obviously, we’d never take this route. I just thought I’d include it for a regular dose of evil-ness.

Conclusion

Implementing an ASCII-only string representation sounds like it should be an obvious win in terms of memory, at the cost of doing a lot of work that’s already been done for us in String. However, the most obvious implementation takes a while to break even in memory usage, compared with the normal string type, due to the “special” nature of string within the CLR. We can’t mimic the “stretchiness” of string ourselves. The BCL/CLR teams could, of course, if they really wanted to. I’m not holding my breath though.

If we’re dead keen on saving space at the cost of some design wonkiness, we can use a value type instead of a reference type. Other than nullity, it works pretty well… but you have all the disadvantages which go with value types, such as the unavoidable parameterless constructor and the need to watch out for boxing.

Aside from anything else, I hope this was useful as a delve into how much space objects actually take up in .NET – and as a way of highlighting the extra memory used when running in the x64 CLR.

A Model/View to a Kill (Naked came the null delegate, part 5)

(I suggest you read the earlier parts of the story first. I’m not claiming it’ll make any more sense afterwards, mind you.)

Even though Seymour Sharpton’s brain was in a spinlock, a low-level interrupt brought him out of his stupor – namely, an enormous motorcycle bursting through the floor near the daemon. It was impossible to tell the form of the rider under the leather and helmet. When the biker spoke, the voice was digitally disguised but its authority was clear:

"Sharpton. Here, now. The rest of you: you know me. Follow us, and there’ll be trouble."

Algol hissed sharply, and then started cajoling Seymour: "Don’t go. Stay with us. My master didn’t mean what he said about, um, deletion. It was just a little joke. You’re safe here…"

But Seymour was already running towards the motorcycle. The biker had never stopped revving the engine, and the moment Seymour jumped on the rear seat, the wheels span briefly, kicking up clouds of dust before the bike raced through the warehouse and through the (fortunately still open) door.

The ride was fast, bumpy, and terrifying. Seymour hadn’t felt like this since he’d given up C, years ago. The roar of the engine drowned out any conversation, and he was left holding on for dear life until the bike came to a screeching halt in a deserted street. The biker dismounted and offered a hand to Seymour, who shook it nervously.

"Who are you?" he murmured, almost afraid to find out.

"My name is unimportant," replied the metallic voice, still masked by the helmet.

"It’s not Slartibartfast, is it?" Seymour had visions of being whisked away to see fjords. That would just about fit in with the rest of his strange evening.

"No, it’s… unspeakable."

"Ah, so you’re an anonymous type of guy, huh?"

"Anonymous, yes… guy, no." The biker removed her helmet and shook her head. Her long blonde hair swooshed from side to side, and time seemed to slow for Seymour as he watched her. She was a model he could view forever… although the idea of trying to control her seemed out of the question. Then several strands of hair were caught in the anonymous girl’s gently pouting mouth, and she spat them out hurriedly. "Damn it, I hate it when that happens. Anyway, you are lucky to be alive. You have no idea what our shady underworld contains… those zombies aren’t the worst of it by a long chalk."

"There’s more?" Seymour gulped.

"Worse than you can imagine. We’re lucky it’s a new moon tonight, for example. Otherwise this place would be heaving with were-clauses. Most of the month they just filter as normal, but come the full moon… urgh." She shuddered, and Seymour didn’t want to ask any further. The biker paused, and then continued.

"Then there’s the mutants. They’re harmless enough, but not for want of trying. They’ll lope after you, but they mutate almost without thinking about it. Totally dysfunctional. A quick kick to the monads will usually despatch them… But tonight, we have something else to fear." She looked around, cautiously. "The word on the street is that the Vimpires are in town. Every few years we think we’ve got rid of them… and then they come back, with their long and surprisingly dexterous fingers. You know how you can tell when the Vimpires are around?"

Seymour was spellbound. "How?"

"The mice all leave, in droves. The rats don’t care, but a Vimpires will torture a mouse just for the fun of it. But this time, there are rumours. There’s talk of a bigger plan afoot. The one thing the Vimpires are still afraid of is bright light. During the day, we’re safe… but imagine if there were no more days? Perpetual twilight – like "Breaking Dawn part 1" but forever."

"They wouldn’t!" Seymour gasped. He remembered that long night in the cinema only too well.

"They would. And they have allies… for the first time, the Eclipse posse and the Vimpires are joining forces. So we have to fight them. You’re not the first innocent man I’ve rescued tonight, and you won’t be the last. But I need to be sure of you… do you have references?"

"Um, what kind of references?"

"Anything to prove your identity. It’s a class war out there, Seymour… now what type of man are you? Where do your values lie? Oh, never mind… I’ll trust you for now. But Seymour, you need to be ready. Brace yourself. Are you in The Zone?"

"I don’t know what you mean… what zone are you talking about?"

"Ah, true, that was ambiguous. UTC or not UTC, that is the question. Whether ’tis nobler in in the mind to suffer the leap seconds and missing hours of outrageous chronology, or to take ARM against a sea of doubles, and by opposing end them?"

"What on earth are you babbling about?"

"No matter. All you need to know is this… the Vimpires are trying to extinguish the sun, but we’re going to stop them. It’s daylight saving time."

Continued in part 6 – The Great Destructor

The curious case of the publicity-seeking interface and the shy abstract class

Noda Time has a guilty secret, and I’m not just talking about the fact that there’s been very little progress on it recently. (It’s not dead as a project – I have high hopes, when I can put some quality time into it.) This secret is called LocalInstant, and it’s a pain in the neck.

One of the nice things about giving talks about an API you’re currently writing is that you can see which concepts make sense to people, and which don’t – as well as seeing which concepts you’re able to explain and which you can’t. LocalInstant has been an awkward type to explain right from day 1, and I don’t think it’s improved much since then. For the purpose of this blog post, you don’t actually need to know what it means, but if you’re really interested, imagine that it’s like a time-zone-less date and time (such as "10:58 on July 2nd 2015" but also missing a calendar system, so you can’t really tell what the month is etc. The important point is that it’s not just time-zone-less, but it’s actually local – so it doesn’t represent a single instant in time. Unlike every other concept in Noda Time, I haven’t thought of any good analogy between LocalInstant and the real world.

Now, I don’t like having types I can’t describe easily, and I’d love to just get rid of it completely… but it’s actually an incredibly powerful concept to have in the library. Not for users of course, but for the implementation. It’s spattered all over the place. Okay, the next best step to removing it is to hide it away from consumers: let’s make it internal. Unfortunately, that doesn’t work either, because it’s referred to in interfaces all the time too. For example, almost every member of ICalendarSystem has LocalInstant as one of its parameters.

The rules around interfaces

Just to recap, every member of an interface – even an internal interface – is implicitly public. That causes some interesting restrictions. Firstly, every type referred to in a public interface must be public. So this would be invalid:

internal struct LocalInstant {}

// Doesn’t compile: Inconsistent accessibility
public interface ICalendarSystem

    LocalInstant GetLocalInstant(int year, int month, int day);
}

So far, so good. It’s entirely reasonable that a public member’s declaration shouldn’t refer to an internal type. Calling code wouldn’t understand what LocalInstant was, so how could it possibly use ICalendarSystem sensibly? But suppose we only wanted to declare the interface internally. That should be okay, right? Indeed, the compiler allows the following code:

internal struct LocalInstant {}

// Compiles with no problems
internal interface ICalendarSystem
{
    LocalInstant GetLocalInstant(int year, int month, int day);
}

But hang on… isn’t GetLocalInstant public? That’s what I said earlier, right? So we’re declaring a public member using an internal type… which we thought wasn’t allowed. Is this a compiler bug?

Well, no. My earlier claim that "a public member’s declaration shouldn’t refer to an internal type" isn’t nearly precise enough. The important aspect isn’t just whether the member is declared public – but its accessibility domain. In this case, the accessibility domain of ICalendarSystem.GetLocalInstant is only the assembly, which is why it’s a valid declaration.

However, life becomes fun when we try to implement ICalendarSystem in a public class. It’s perfectly valid for a public class to implement an internal interface, but we have some problems declaring the method implementing GetLocalInstant. We can’t make it a public method, because at that point its accessibility domain would be anything referring to the assembly, but the accessibility domain of LocalInstant itself would still only be the assembly. We can’t make it internal, because it’s implementing an interface member, which is public.

There is an alternative though: explicit interface implementation. That comes with all kinds of other interesting points, but it does at least compile:

internal struct LocalInstant {}

internal interface ICalendarSystem
{
    LocalInstant GetLocalInstant(int year, int month, int day);
}

public class GregorianCalendarSystem : ICalendarSystem
{
    // Has to be implemented explicitly
    LocalInstant ICalendarSystem.GetLocalInstant(int year, int month, int day);
    {
        // Implementation
    }
}

So, we’ve got somewhere at this point. We’ve managed to make a type used within an interface internal, but at the cost of making the interface itself internal, and requiring explicit interface implementation within any public classes implementing the interface.

That could potentially be useful in Noda Time, but it doesn’t solve our real LocalInstant / ICalendarSystem problem. We need ICalendarSystem to be public, because consumers need to be able to specify a calendar when they create an instance of ZonedDateTime or something similar. Interfaces are just too demanding in terms of publicity.

Fortunately, we have another option up our sleeves…

Abstract classes to the rescue!

I should come clean at this point and say that generally speaking, I’m an interface weenie. Whenever I need a reusable and testable abstraction, I reach for interfaces by default. I have a general bias against concrete inheritance, including abstract classes. I’m probably a little too harsh on them though… particularly as in this case they do everything I need them to.

In Noda Time, I definitely don’t need the ability to implement ICalendarSystem and derive from another concrete class… so making it a purely abstract class will be okay in those terms. Let’s see what happens when we try:

internal struct LocalInstant {} 

public abstract class CalendarSystem

    internal abstract LocalInstant GetLocalInstant(int year, int month, int day);

internal class GregorianCalendarSystem : CalendarSystem
{  
    internal override LocalInstant GetLocalInstant(int year, int month, int day)
    { 
        // Implementation
    } 
}

Hoorah! Now we’ve hidden away LocalInstant but left CalendarSystem public, just as we wanted to. We could make GregorianCalendarSystem public or not, as we felt like it. If we want to make any of CalendarSystem‘s abstract methods public, then we can do so provided they don’t require any internal types. There’s on interesting point though: types outside the assembly can’t derive from CalendarSystem. It’s a little bit as if the class only provided an internal constructor, but with a little bit more of an air of mystery… you can override every method you can actually see, and still get a compile-time error message like this:

OutsideCalendar.cs(1,14): error CS0534: ‘OutsideCalendar’ does not implement inherited abstract member
        ‘CalendarSystem.GetLocalInstant(int, int, int)’

I can just imagine the author of the other assembly thinking, "But I can’t even see that method! What is it? Where is it coming from?" Certainly a case where the documentation needs to be clear. Whereas it’s impossible to create an interface which is visible to the outside world but can’t be implemented externally, that’s precisely the situation we’ve reached here.

The abstract class is a little bit like an authentication token given by a single-sign-on system. From the outside, it’s an opaque item: you don’t know what’s in it or how it does its job… all you know is that you need to obtain it, and then you can use it to do other things. On the inside, it’s much richer – full of useful data and members.

Conclusion

Until recently, I hadn’t thought of using abstract classes like this. It would possibly be nice if we could use interfaces in the same way, effectively limiting the implementation to be in the declaring assembly, but letting the interface itself (and some members) be visible externally.

A bigger question is whether this is a good idea in terms of design anyway. If I do make LocalInstant internal, there will be a lot of interfaces which go the same way… or become completely internal. For example, the whole "fields" API of Noda Time could become an implementation detail, with suitable helper methods to fetch things like "how many days are there in the given month." The fields API is an elegant overall design, but it’s quite complicated considering the very limited situations in which most callers will use it.

I suspect I will try to go for this "reduced API" for v1, knowing that we can always make things more public later on… that way we give ourselves a bit more flexibility in terms of not having to get everything right first time within those APIs, too.

Part of me still feels uncomfortable with the level of hiding involved – I know other developers I respect deeply who hide as little as possible, for maximum flexibility – but I do like the idea of an API which is really simple to browse.

Aside from the concrete use case of Noda Time, this has proved an interesting exercise in terms of revisiting accessibility and the rules on what C# allows.