During lockdown, I’ve been doing quite a lot of tech work for my local church… mostly acting in a sort of “producer” role for our Zoom services, but also working out how we can enable “hybrid” services when some of us are back in our church buildings, with others still at home. (This is partly a long term plan. I never want to go back to letting down the housebound.)
This has involved sourcing a decent pan/tilt/zoom (PTZ) camera… and then having some fun with it. We’ve ended up using a PTZOptics NDI camera with 30x optical zoom. Now it’s one thing to have a PTZ camera, but then you need to work out what to do with it. There are lots of options on the “how do you broadcast” side of things, which I won’t go into here, but I was interested in the PTZ control part.
Before buying the camera, I knew that PTZOptics cameras exposed an HTTP port which provides a reasonable set of controls, so I was reasonably confident I’d be able to do something. I was also aware of the VISCA protocol and that PTZOptics cameras exposed that over the network as well as the more traditional RS-232 port… but I didn’t have much idea about what the network version of the protocol was.
The manual for the camera is quite detailed, including a complete list of VISCA commands in terms of “these are the bytes you send, and these are the bytes you receive” but without any sort of “envelope” description. It turns out, that’s because there is no envelope when working with VISCA over the network, apparently… you just send the bytes for the command packet (with TCP no-delay enabled of course), and read data until you see an FF that indicates the end of a response packet.
It took me longer to understand this “lack of an envelope” than to actually write the code to use it… once I’d worked out how to send a single command, I was able to write a reasonably complete camera control library quite easily. The code lacks documentation, tests, and decent encapsulation. (I have some ideas about the third of those, which will enable the second, but I need to find time to do the job properly.)
Today I’ve made that code available on GitHub. I’m hoping to refactor it towards decent encapsulation, potentially writing blog posts about that as I go, but until then it might prove useful to others even in its current form. Aside from anything else, it’s proof that I write scrappy code when I’m not focusing on dotting the Is and crossing the Ts, which might help to relieve imposter syndrome in others (while exacerbating it in myself.) I haven’t yet published a package on NuGet, and may never do so, but we’ll see. (It’s easy enough to clone and build yourself though.)
The library comes with a WPF demo app – which is even more scrappily written, without any view models etc. The demo app uses the WebEye WPF RTSP library to show “what the camera sees”. This is really easy to integrate, with one downside that it uses the default FFmpeg buffer size, so there’s a ~2s delay when you move the camera around. That means you wouldn’t want to use this for any kind of production purposes, but that’s not what it’s for :)
Here’s a screenshot of the demo app, focusing on the wind sculpture that Holly bought me as a present a few years ago, and which is the subject of many questions in meetings. (The vertical bar on the left of the sculpture is the door frame of my shed.) As you can see, the controls (top right) are pretty basic. It would be entirely possible to use the library for a more analog approach to panning and tilting, e.g. a rectangle where holding down the mouse button near the corners would move the camera quickly, whereas clicking nearer the middle would move it more slowly.
One of the natural questions when implementing a protocol is how portable it is. Does this work with other VISCA cameras? Well, I know it works with the SMTAV camera that I bought for home, but I don’t know beyond that. If you have a VISCA-compatible camera and could test it (either via the demo app or your own code) I’d be really interested to hear how you get on with it. I believe the VISCA protocol is fairly well standardized, but I wouldn’t be surprised if there were some corner cases such as maximum pan/tilt/zoom values that need to be queried rather than hard-coded.