Ultimate Man Cave: voice automation for my shed

Source code for everything is on Github. It probably won’t be useful to you unless you’ve got very similar hardware to mine, but you may want to just have a look.

Background

Near the end of 2015, we had a new shed built at the back of our garden. The term “shed” is downplaying it somewhat – it’s a garden building, about 7m x 2.5m, with heating, lighting and an ethernet connection from the house.

It’s divided in half, with one half being a normal shed (lawnmower, wheelbarrow, tools etc) and one half being my office for working from home. Both sides are also used for general storage – we have a lot of stuff to sort out from a loft conversion a few years ago.

The shed

It only took about three days of using the shed for me to work out that I wanted remote-controlled lighting. If I’m going out there at 6.30am in winter, it’s pretty dark – so it’s really useful to be able to turn the lights on from the house first, so I can negotiate the muddier bits of the garden, see the keyhole to unlock it etc.

After a little research, this turned out to be pretty easy: MiLight is simple and relatively cheap. The equivalent of $100 got me four lights and a wifi controller box. It only took me a few minutes to configure it to talk to my wifi, install the Light Controller android app, and I could easily turn my lights on and off from my phone from the house, before stepping outside. Yay. First steps to home automation.

I won’t go into all the details of the rest of the tech in my shed, but the important parts for the purposes of this post are:

A Sonos Connect
An Onkyo TX-NR616 receiver/amplifier
A Raspberry Pi 3 with a Kinobo USB microphone

Command-line automation

Sometimes, I’m too lazy to reach for my phone when I want to turn on the lights. Very much a first world problem, I realize. And not so much a problem, as an opportunity to see what’s feasible.

So, I looked around the net for code related to MiLight / EasyBulb, and found (amongst other things) Andy Scott’s MiLight.NET library on Github. A small amount of tweaking, and I had a short console app allowing me to run “lights on” or “lights off” which did the obvious thing. Amongst other things, copying this onto an Intel NUC allowed me to turn the lights off via remote desktop when Holly messaged me at the (Google) office to tell me that I’d left them on. It also meant I could schedule a task to turn the lights off at 10.30pm automatically, in case I forgot when I came in.

For a few months, that kept me satisfied… but it was never going to be the final solution.

The next step was to look at other aspects I could automate, and both the amplifier/receiver and the Sonos unit were obvious targets. I knew both had network support, as I already had apps for both on my phone, but I had no idea what the protocols involved were. The amplifier lives in an A/V cabinet, and I normally keep the doors of that shut – so just turning it on, setting the source, and changing the volume either involved getting the phone out or opening the cabinet. Again, could do better.

Sonos supports UPnP/SOAP for control. An old blog post got me started, and then I used Intel Device Spy to work out what else I could easily do. (I don’t have very demanding requirements – just play/pause, set volume, next/previous track is fine.)

It turns out that Onkyo has its own protocol called ISCP (Integra Serial Control Protocol) which has a network binding called eISCP. There’s remarkably good documentation in the form of an Excel spreadsheet, providing more information than I’m ever likely to need.

Implementing both of these was slightly faffy. The eISCP code didn’t work for some time, then started working – presumably with some minor tweak, but it wasn’t clear to me which of the many tweaks I made actually fixed it. The Sonos code worked fairly soon, but was very inelegant for quite a while.

Initially, this was all driven from the command line. I introduced a very simple sort of discovery, separating out controllers from their commands:

public interface IController
{
    string Name { get; }
    IImmutableList<ICommand> Commands { get; }
}

public interface ICommand
{
    string Name { get; }
    string Description { get; }

    void Execute(params string[] arguments);
}

There’s then a Factory class with a static AllControllers property. (I’m not keen on the naming here, but we’ll come to that later.)

The fact that Execute takes a string array is indicative of its use for a command line application – although looking at it now, I might have made it IEnumerable given that I’ll always be skipping the first actual argument which identifies the controller.

Anyway, this allows a very simple command line app which doesn’t know anything about lights, music etc – it just offers you the controllers and commands it finds.

There’s only actually one implementation of IController, calledReflectiveController. You pass it the real controller to wrap, which can be any instance of a type with a description and with public methods which also have descriptions. These descriptions are provided with an attribute. The arguments passed to Execute are then converted to the method parameter types using Convert.ChangeType. Crude but effective.

With this in place, adding a new command to an existing controller is just a matter of adding a public method. Adding a new controller is just a matter of creating a new class with a description, and adding it to the list of controllers in Factory. It’s all really, really simple.

Deploy to the Pi!

This was the aim all along, of course – I’ve been wanting to try out Windows IoT edition, and put my Raspberry Pi to good use, and try out Windows UAP to get a feeling for it. (In particular, I want to learn about some of the constraints I’ll run into with Noda Time 2.0.) This project was a fantastic excuse to do all three.

I started off by building the application just on my laptop. This is one of the lovely benefits of universal apps – you can get them working in a convenient environment first, then deploy elsewhere when you’re ready.

In fact, the very first version of the app didn’t have any speech recognition – it just had buttons to turn the lights on or off. I checked that this worked on both my laptop and the Raspberry Pi – it was nice to see that Windows IoT still supports a UI over HDMI, and it all worked fine, first time. A few years ago, this would have been absolutely stunning in itself – but I think we’re starting to take portability for granted.

Voice automation

On to the final steps: adding speech recognition.

I had a bit of a false start, as there are multiple approaches to speech recognition in Windows UAP. Initially I tried using Cortana, but never got that to work. Instead, I went with the Windows.Media.SpeechRecognition library, which worked pretty much immediately. Again, my initial attempt was more complicated than it needed to be, using an SRGS grammar file. This worked, but it was fiddly. When I discovered the SpeechRecognitionListConstraint class, it was beautiful… it’s literally just a list of strings, and the speech recognizer raises an event when any of those strings is recognized.

The code required to start the speech recognition is trivial:

private async void RegisterVoiceActivation(object sender, RoutedEventArgs e)
{
    recognizer = new SpeechRecognizer
    {
        Constraints = { new SpeechRecognitionListConstraint(handlers.Keys) }
    };
    recognizer.ContinuousRecognitionSession.ResultGenerated += HandleVoiceCommand;
    recognizer.StateChanged += HandleStateChange;

        SpeechRecognitionCompilationResult compilationResult = await recognizer.CompileConstraintsAsync();

    if (compilationResult.Status == SpeechRecognitionResultStatus.Success)
    {
        await recognizer.ContinuousRecognitionSession.StartAsync();
    }
    else
    {
        await Dispatcher.RunIdleAsync(_ => lastState.Text = $"Compilation failed: {compilationResult.Status}");
    }
}

Given the way we’re compiling the constraints, I’d be reasonably happy not checking the compilation result, but I just never took that code away after using it for SRGS (where it was very much required).

The HandleVoiceCommand method just checks whether the recognition confidence is above a certain threshold (0.6 at the moment, but I may tweak it down a bit), and if so, it consults a dictionary to find out a delegate to invoke. It also updates the UI for diagnostic purposes. The dictionary itself is the only code that knows about the shed controllers, using import static to avoid having Factory. everywhere:

private const string Prefix = "shed ";

private static readonly Dictionary<string, Action> handlers = new Dictionary<string, Action>
{
    { "lights on", Lighting.On },
    { "lights off", Lighting.Off },
    { "music play", Sonos.Play },
    { "music pause", Sonos.Pause },
    { "music mute", () => Sonos.SetVolume(0) },
    { "music quiet", () => Sonos.SetVolume(30) },
    { "music medium", () => Sonos.SetVolume(60) },
    { "music loud", () => Sonos.SetVolume(90) },
    { "music next", Sonos.Next },
    { "music previous", Sonos.Previous },
    { "music restart", Sonos.Restart },
    { "amplifier on", Amplifier.On },
    { "amplifier off", Amplifier.Off },
    { "amplifier mute", () => Amplifier.SetVolume(0) },
    { "amplifier quiet", () => Amplifier.SetVolume(30) },
    { "amplifier medium", () => Amplifier.SetVolume(50) },
    { "amplifier loud", () => Amplifier.SetVolume(60) },
    { "amplifier source pie", () => Amplifier.Source("pi") },
    { "amplifier source sonos", () => Amplifier.Source("sonos") },
    { "amplifier source playstation", () => Amplifier.Source("ps4") }
}.WithKeyPrefix(Prefix);

Here, WithKeyPrefix is just a small extension method to create a new dictionary with a specified prefix to each key.

Just like with the command line version, adding a command is now simply a matter of adding a single entry in this dictionary.

Deploy that on my Raspberry Pi, and as if by magic, I can say “shed lights on” and the lights come on, etc. Admittedly after saying “shed music play” it can be quite tricky to launch further actions, as the music interferes with the speed recognition for obvious reasons.

Simple code for the win

I’d like to take a few moments to talk about the code. At this point, you may want to have Github open in another tab to follow along.

There are lots of things about the code which I’d deem pretty unacceptable at work:

It uses the service locator pattern instead of dependency injection. I’m not a fan of this in general.
I really hate the name Factory – but I haven’t found anything significantly better, yet. (ControllerProvider? I’d call it just Controllers, but that’s the final part of the namespace name…)
There are no tests. At all. Not even a test project.
There are only a few comments.
The IP addresses are hard-coded into Factory. No config files, no discovery, not even names – just IP addresses.
There’s no abstraction beyond IController and ICommand. I could potentially have an IVolumeController, IMusicController, ISourceController etc.

None of these bother me, even though the code is “in production” and I’m expecting to use it for a long time. It’s never going to grow large enough for the service locator pattern to be a problem. With so few types involved, a few non-ideal names isn’t going to cause much of a problem. The only tests that matter are the ones involving me saying “shed amplifier on” and the amplifier either turning on or not… there’s very little code here that’s really testable anyway. My device IP addresses are all fixed by my router, so I’d only have to change them if I change that – and I’d still end up changing it in just one place. Extra abstraction wouldn’t actually give me any benefits at the moment.

So yes, basically I’m happy with the code now. It provides me value, and it’s easy to maintain. In particular, adding extra controllers or commands is trivial. I guess what I’m saying is that this is a reminder that not all code is “enterprise software” and even “best practice” rules such as writing no code without tests have their limitations. Context is king.

What next?

My Raspberry Pi 3 has a small touchscreen display on it, which uses the Rasperry Pi SPI for communication. I haven’t yet managed to get this working, but obviously that would be a lovely next step. It’s a bit of a pain changing from Displayport to HDMI to see the UI and check what phrases have been recognized, for example. The display part will definitely be useful – I might use the touch part just for a very few key commands, such as “stop the music, you can’t hear me any more!”

The device I’d most like to control next is the heater. I keep leaving the heating on accidentally, then having to put my shoes on again to go out and just turn the heating off. If the heater plugged in via a regular socket, it would be easy enough to sort out – but unfortunately the power cable goes straight into a box in the wall. I may try to sort this out at some point, but it’s going to be a pain.

The other thing I’d like to do is add the ability to switch monitor inputs using DDC/CI. That could be tricky in terms of getting access to such a low-level API, and also it requires a permanent “live” connection to the monitor – whereas both my HDMI and Displayport connections are switched (by the Onkyo for HDMI, and a KVM for Displayport). I’m still thinking about that one. I could potentially have a secondary output from the NUC to a DVI input on the monitor, then make the NUC listen as a server that the Pi could talk to…

Conclusion

Home automation is fun and simple – but it really, really helps to have a project which will actually be useful to you. I’ve had a few Raspberry Pis sitting around for ages waiting to be used. They’ve always been fun to play with, but now there’s a purpose, and that makes a huge difference…

11 thoughts on “Ultimate Man Cave: voice automation for my shed”

AlSki says:

March 27, 2016 at 10:26 am

So, if I want to top a Jon Skeet C# project, now I have to build a shed too!!!! ;-)

LikeLiked by 1 person

1. Brian Ogletree says:
  
  March 28, 2016 at 3:45 am
  
  I’ll have to build a shed too, cause my wife will definitely NOT let me turn the house into StarTrek land. :)
  Jon; how goes the implementation of “Earl Grey, Hot!”?
  
  LikeLiked by 2 people
  
Elmo says:

May 10, 2016 at 2:48 pm

I would love to hear what microphone you were using as this seemed to be the largest problem for me. With a mobile phone talking to, i this it is quite easy to use the voice recognition. But how do you solve new List(){“Amplifier On”, “Amplifier Loud”, “Amplifier Off”}

LikeLike

1. jonskeet says:
  
  May 10, 2016 at 2:50 pm
  
  I linked to the microphone I’m using in the hardware list. It works fine, other than also listening to video conferences, music etc…
  
  LikeLike
  
Edwinem says:

May 18, 2016 at 10:18 am

You are a boss. I’m starting my own “voice command’s tray box” (I accept suggestions about the name…). I’m seeing (hearing actually) my self saying “steam open ” or “shutdown the-farthest-computer-you-can-imagine/mate-pc”

LikeLike

Mattsi says:

June 1, 2016 at 12:36 pm

What happens if you’re sitting in the dark listening to Roxanne by The Police?

LikeLike

No Comprende says:

July 21, 2016 at 5:53 pm

Aiming a bit lower than you, I am trying to find a use for my old 2G “Feature Phone” (Motorola EX124g), it has Bluetooth but not WiFi. There must be some kind of useful apps for controlling things using Bluetooth with an old phone, but I have not found anything. I had an amusing time walking around my neighborhood “discovering” lots of Bluetooth devices – I don’t have any and had never turned it on before (security concerns, you know).
Those English windows, fences and gardens look strangely familiar, haven’t been over there for a while. Regards –

LikeLike

Andy Marshall says:

June 14, 2017 at 8:45 am

I’m not sure you’ve applied the right solution here. I’m guessing that you’d spend the vast majority of your shed time sitting in front of your computer, right? Surely designing a dashboard to control all of this would be more appropriate?

LikeLike

1. jonskeet says:
  
  June 14, 2017 at 10:05 pm
  
  Often turning the lights off is the last thing I want to do – after I’ve already turned off the monitor, removed my laptop etc. Voice control is very handy for that (likewise first thing in the morning). To be honest though, this is more for the joy of messing with Win10IOT on Raspberry Pi…
  
  LikeLike
  
Chad McCune says:

December 21, 2018 at 12:38 am

I’m trying to get my code working for talking to my Integra receiver working, and, as you said…it’s…faffy :) Would you mind sending me the code you got working for talking to the receiver? Once I get it working in C#, I’m going to try and convert and integrate it into my SmartThings controller so I can have my receiver turn on, switch the correct input, etc along with my lights dimming etc, projector turning on etc when I say “hey google, lets play xbox”

LikeLike

1. jonskeet says:
  
  December 21, 2018 at 9:15 am
  
  All the code is at https://github.com/jskeet/DemoCode/tree/master/Shed – but I don’t know whether the code for an Integra receiver would be anything like that for my Onkyo.
  
  LikeLike