Quick Guide to moving from the Kinect SDK beta 2 to v1

If you had been working with the beta 2 of the Kinect SDK prior to February 1st, you may have felt dismay at the number of API changes that were introduced in v1.

After porting several Kinect applications from the beta 2 to v1, however, I finally started to see a pattern to the changes.  For the most part, it is simply a matter of swapping one set of boilerplate code for another.  The unique portions of your code can, for the most part, be left alone.

In this post, I want to demonstrate five simple code transformations that will ease your way from the beta 2 to the Kinect SDK v1.  I’ll do it boilerplate fragment by boilerplate fragment.

1. Namespaces have been shifted around.  Microsoft.Research.Kinect.Nui is now just Microsoft.Kinect.  Fortunately Visual Studio makes resolving namespaces relatively easy, so we can just move on.
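
In most files this is simply a matter of swapping a single using directive:

// beta 2
using Microsoft.Research.Kinect.Nui;

// v1
using Microsoft.Kinect;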

2. The Runtime type, the controller object for working with data streams from the Kinect, is now called a KinectSensor type.  Grabbing an instance of it has also changed.  You used to just new up an instance like this:

Runtime nui = new Runtime();

Now you instead grab an instance of KinectSensor from a static collection of all the Kinect sensors attached to your PC:

KinectSensor sensor = KinectSensor.KinectSensors[0];
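
Indexing blindly into the collection works for a quick demo, but if you want to be a little more defensive you can instead grab the first sensor that reports itself as connected – a small variation of my own that uses System.Linq:

KinectSensor sensor = KinectSensor.KinectSensors
    .FirstOrDefault(s => s.Status == KinectStatus.Connected);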

3. Initializing a KinectSensor object to start reading the color stream, depth stream or skeleton stream has also changed.  In the beta 2, the initialization procedure just didn’t look very .NET-y.  In v1, this has been cleaned up dramatically.  The beta 2 code for initializing a depth and skeleton stream looked like this:

_nui.SkeletonFrameReady += 
    new EventHandler<SkeletonFrameReadyEventArgs>(
        _nui_SkeletonFrameReady
        );
_nui.DepthFrameReady += 
    new EventHandler<ImageFrameReadyEventArgs>(
        _nui_DepthFrameReady
        );
_nui.Initialize(RuntimeOptions.UseDepthAndPlayerIndex | RuntimeOptions.UseSkeletalTracking);
_nui.DepthStream.Open(ImageStreamType.Depth
    , 2
    , ImageResolution.Resolution320x240
    , ImageType.DepthAndPlayerIndex);

In v1, this boilerplate code has been altered so the Initialize method goes away, roughly replaced by a Start method.  The Open methods on the streams, in turn, have been replaced by Enable.  The DepthAndPlayerIndex data is made available simply by having the skeleton stream enabled.  Also note that the event argument types for the depth and color streams are now different.  Here is the same code in v1:

sensor.SkeletonFrameReady += 
    new EventHandler<SkeletonFrameReadyEventArgs>(
        sensor_SkeletonFrameReady
        );
sensor.DepthFrameReady += 
    new EventHandler<DepthImageFrameReadyEventArgs>(
        sensor_DepthFrameReady
        );
sensor.SkeletonStream.Enable();
sensor.DepthStream.Enable(
    DepthImageFormat.Resolution320x240Fps30
    );
sensor.Start();
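
And when you shut the application down (in a Closing handler, for instance), the counterpart to Start is just as terse:

sensor.Stop();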

4. Transform Smoothing: it used to be really easy to smooth out the skeleton stream in beta 2.  You simply turned it on.

nui.SkeletonStream.TransformSmooth = true;

In v1, you have to create a new TransformSmoothParameters object and pass it to the skeleton stream’s Enable method.  Unlike the beta 2, you also have to initialize the values yourself, since they all default to zero.

sensor.SkeletonStream.Enable(
    new TransformSmoothParameters() 
    {   Correction = 0.5f
    , JitterRadius = 0.05f
    , MaxDeviationRadius = 0.04f
    , Smoothing = 0.5f });

5. Stream event handling: handling the ready events from the depth stream, the video stream and the skeleton stream also used to be much easier.  Here’s how you handled the DepthFrameReady event in beta 2 (skeleton and video followed the same pattern):

void _nui_DepthFrameReady(object sender
    , ImageFrameReadyEventArgs e)
{
    var frame = e.ImageFrame;
    var planarImage = frame.Image;
    var bits = planarImage.Bits;
    // your code goes here
}

For performance reasons, the newer v1 code looks very different and the underlying C++ API leaks through a bit.  In v1, we are required to open the image frame and check to make sure something was returned.  Additionally, we create our own array of bytes (for the depth stream this has become an array of shorts) and populate it from the frame object.  The PlanarImage type which you may have gotten cozy with in beta 2 has disappeared altogether.  Also note the using keyword to dispose of the ImageFrame object. The transliteration of the code above now looks like this:

void sensor_DepthFrameReady(object sender
    , DepthImageFrameReadyEventArgs e)
{
    using (var depthFrame = e.OpenDepthImageFrame())
    {
        if (depthFrame != null)
        {
            var bits =
                new short[depthFrame.PixelDataLength];
            depthFrame.CopyPixelDataTo(bits);
            // your code goes here
        }
    }
}
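
As a quick illustration of what can go where that comment sits: each short in the bits array packs the player index into its low bits and the depth in millimeters into the rest, and the SDK exposes constants for pulling them apart (the index variable i below is hypothetical, standing in for whatever loop you write over the array):

int playerIndex = bits[i] & DepthImageFrame.PlayerIndexBitmask;
int depthInMillimeters = bits[i] >> DepthImageFrame.PlayerIndexBitmaskWidth;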

 

I have noticed that many sites and libraries that were using the Kinect SDK beta 2 still have not been ported to Kinect SDK v1.  I certainly understand the hesitation given how much the API seems to have changed.

If you follow these five simple translation rules, however, you’ll be able to convert approximately 80% of your code very quickly.

The right way to do Background Subtraction with the Kinect SDK v1

[Image: green screen example]

MapDepthFrameToColorFrame is a beautiful method introduced rather late into the Kinect SDK v1.  As far as I know, it primarily has one purpose: to make background subtraction operations easier and more performant.

Background subtraction is a technique for removing any pixels in an image that do not belong to the primary actors.  Green screening – which, if you are old enough to have seen the original Star Wars when it came out, you will know as blue screening – is a particular implementation of background subtraction in the movies: actors perform in front of a green background, the green background is then subtracted from the final film, and another background image is inserted in its place.

With the Kinect, background subtraction is accomplished by comparing the data streams rendered by the depth camera and the color camera.  The depth camera will actually tell us which pixels of the depth image belong to a human being (with the pre-condition that skeleton tracking must be enabled for this to work).  The pixels represented in the depth stream must then be compared to the pixels in the color stream in order to subtract out any pixels that do not belong to a player.  The big trick is that each pixel in the depth stream must be mapped to an equivalent pixel in the color stream in order to make this comparison possible.

I’m going to first show you how this was traditionally done (and by “traditionally” I really mean in a three to four month period before the SDK v1 was released) as well as a better way to do it.  In both techniques, we are working with three images: the image encoded in the color stream, the image encoded in the depth stream, and the resultant “output” bitmap we are trying to reconstruct pixel by pixel.

The traditional technique goes through the depth stream pixel by pixel and maps each pixel location to its equivalent location in the color stream, one point at a time, using the MapDepthToColorImagePoint method.

var pixelFormat = PixelFormats.Bgra32;
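// depthWidth and depthHeight are assumed to be defined elsewhere to match
// the enabled depth format (e.g. 320 and 240 for Resolution320x240Fps30)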
WriteableBitmap target = new WriteableBitmap(depthWidth
    , depthHeight
    , 96, 96
    , pixelFormat
    , null);
var targetRect = new System.Windows.Int32Rect(0, 0
    , depthWidth
    , depthHeight);
var outputBytesPerPixel = pixelFormat.BitsPerPixel / 8;
sensor.AllFramesReady += (s, e) =>
{
 
    using (var depthFrame = e.OpenDepthImageFrame())
    using (var colorFrame = e.OpenColorImageFrame())
    {
        if (depthFrame != null && colorFrame != null)
        {
            var depthBits = 
                new short[depthFrame.PixelDataLength];
            depthFrame.CopyPixelDataTo(depthBits);
 
            var colorBits = 
                new byte[colorFrame.PixelDataLength];
            colorFrame.CopyPixelDataTo(colorBits);
            int colorStride = 
                colorFrame.BytesPerPixel * colorFrame.Width;
 
            byte[] output =
                new byte[depthWidth * depthHeight
                    * outputBytesPerPixel];
 
            int outputIndex = 0;
 
            for (int depthY = 0; depthY < depthFrame.Height
                ; depthY++)
            {
                for (int depthX = 0; depthX < depthFrame.Width
                    ; depthX++
                    , outputIndex += outputBytesPerPixel)
                {
                    var depthIndex = 
                        depthX + (depthY * depthFrame.Width);
 
                    var playerIndex = 
                        depthBits[depthIndex] &
                        DepthImageFrame.PlayerIndexBitmask;
 
                    var colorPoint = 
                        sensor.MapDepthToColorImagePoint(
                        depthFrame.Format
                        , depthX
                        , depthY
                        , depthBits[depthIndex]
                        , colorFrame.Format);
 
                    var colorPixelIndex = (colorPoint.X 
                        * colorFrame.BytesPerPixel) 
                        + (colorPoint.Y * colorStride);
 
                    output[outputIndex] = 
                        colorBits[colorPixelIndex + 0];
                    output[outputIndex + 1] = 
                        colorBits[colorPixelIndex + 1];
                    output[outputIndex + 2] = 
                        colorBits[colorPixelIndex + 2];
                    output[outputIndex + 3] = 
                        playerIndex > 0 ? (byte)255 : (byte)0;
 
                }
            }
            target.WritePixels(targetRect
                , output
                , depthFrame.Width * outputBytesPerPixel
                , 0);
 
 
        }
 
    }
 
};

You’ll notice that we are traversing the depth image by going across pixel by pixel (the inner loop) and then down row by row (the outer loop).  The width of a bitmap row in bytes, for reference, is known as its stride; for example, a 640-pixel-wide frame at four bytes per pixel has a stride of 2,560 bytes.  Then, inside the inner loop, we map each depth pixel to its equivalent color pixel in the color stream by using the MapDepthToColorImagePoint method.

It turns out that these calls to MapDepthToColorImagePoint are rather expensive.  It is much more efficient to simply create an array of ColorImagePoints and populate it in one go before doing any looping.  This is exactly what MapDepthFrameToColorFrame does.  The following example uses it in place of the iterative MapDepthToColorImagePoint method.  It has an added advantage in that, instead of having to iterate through the depth stream column by column and row by row, I can simply go through the depth stream pixel by pixel, removing the need for nested loops.

var pixelFormat = PixelFormats.Bgra32;
WriteableBitmap target = new WriteableBitmap(depthWidth
    , depthHeight
    , 96, 96
    , pixelFormat
    , null);
var targetRect = new System.Windows.Int32Rect(0, 0
    , depthWidth
    , depthHeight);
var outputBytesPerPixel = pixelFormat.BitsPerPixel / 8;
 
sensor.AllFramesReady += (s, e) =>
{
 
    using (var depthFrame = e.OpenDepthImageFrame())
    using (var colorFrame = e.OpenColorImageFrame())
    {
        if (depthFrame != null && colorFrame != null)
        {
            var depthBits = 
                new short[depthFrame.PixelDataLength];
            depthFrame.CopyPixelDataTo(depthBits);
 
            var colorBits = 
                new byte[colorFrame.PixelDataLength];
            colorFrame.CopyPixelDataTo(colorBits);
            int colorStride = 
                colorFrame.BytesPerPixel * colorFrame.Width;
 
            byte[] output =
                new byte[depthWidth * depthHeight
                    * outputBytesPerPixel];
 
            int outputIndex = 0;
 
            var colorCoordinates =
                new ColorImagePoint[depthFrame.PixelDataLength];
            sensor.MapDepthFrameToColorFrame(depthFrame.Format
                , depthBits
                , colorFrame.Format
                , colorCoordinates);
 
            for (int depthIndex = 0;
                depthIndex < depthBits.Length;
                depthIndex++, outputIndex += outputBytesPerPixel)
            {
                var playerIndex = depthBits[depthIndex] &
                    DepthImageFrame.PlayerIndexBitmask;
 
                var colorPoint = colorCoordinates[depthIndex];
 
                var colorPixelIndex = 
                    (colorPoint.X * colorFrame.BytesPerPixel) +
                                    (colorPoint.Y * colorStride);
 
                output[outputIndex] = 
                    colorBits[colorPixelIndex + 0];
                output[outputIndex + 1] = 
                    colorBits[colorPixelIndex + 1];
                output[outputIndex + 2] = 
                    colorBits[colorPixelIndex + 2];
                output[outputIndex + 3] = 
                    playerIndex > 0 ? (byte)255 : (byte)0;
 
            }
            target.WritePixels(targetRect
                , output
                , depthFrame.Width * outputBytesPerPixel
                , 0);
 
        }
 
    }
 
};
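
To actually see the result, the WriteableBitmap just needs to be handed to something that can display it – assuming (my assumption, not shown above) a WPF Image element named targetImage, one line outside the event handler is enough:

targetImage.Source = target;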

Why the Kinect for Windows Sensor Costs $249.99

This post is purely speculative.  I have no particular insight into Microsoft strategy.  Now that I’ve disqualified myself as any sort of authority on this matter, let me explain why the $249.99 price tag for the new Kinect for Windows sensor makes sense.

The new Kinect for Windows sensor went on the market earlier this month  for $249.99.  This has caused some consternation and confusion since the Kinect for Xbox sensor only costs $150 and sometimes less when bundled with other Xbox products.

Officially the Kinect for Windows sensor is the sensor you should use with the Kinect for Windows SDK – the libraries that Microsoft provides for writing programs that take advantage of the Kinect.  Prior to the release of the v1 of the SDK, there was the Kinect SDK beta and then the beta 2.  These could be used in non-commercial products and research projects with the original Xbox sensor.

By license, if you want to use the Kinect for Windows SDK publicly, however, you must use the Kinect for Windows hardware.  If you previously had a non-commercial product running with the Kinect for Xbox sensor and the beta SDK and want to upgrade to the v1 SDK, you will also need to upgrade your hardware to the more expensive model.  In other words, you will need to pay an additional $249.99 to get the correct hardware.  The one exception is for development.  You can still use the less expensive version of the sensor for development.  Your users must use the more expensive version of the sensor once the application is deployed.

I can make this even more complicated.  If you want to use one of the non-Microsoft frameworks and drivers for writing Kinect-enabled applications, such as OpenNI, you are not required to use the new Kinect for Windows hardware.  Shortly after the release of the original Kinect for Xbox sensor in 2010, Microsoft acknowledged that efforts to create drivers and APIs for the sensor were okay, and they have not gone back on that.  You are only required to purchase the more expensive hardware if you are using the official Microsoft drivers and SDK.

So what is physically different between the new sensor and the old one?  Not much, actually.  The newer hardware has different firmware, for one thing.  The newer firmware allows depth detection as near as 40 cm, while the older firmware only allowed depth detection from 80 cm.  The closer depth detection can only be used when the near mode flag is turned on: near mode covers 40 cm to 300 cm, while the default mode covers 80 cm to 400 cm.  In v1 of the SDK, turning near mode on has the unfortunate side-effect of disabling skeleton tracking for the entire 40 cm to 300 cm range.
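
For reference, here is a minimal sketch of how near mode is switched on in the v1 SDK, assuming a started KinectSensor named sensor with its depth stream enabled (the try/catch is my own defensive addition, since the older Xbox hardware rejects the setting):

try
{
    sensor.DepthStream.Range = DepthRange.Near;    // 40 cm to 300 cm
}
catch (InvalidOperationException)
{
    // the Kinect for Xbox sensor does not support near mode;
    // it stays in the default range (80 cm to 400 cm)
}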

Additionally, the newer firmware identifies the hardware as Kinect for Windows hardware.  The Kinect for Windows SDK checks for this.  For now, the only real effect this has is that if the full SDK is not installed on a machine (i.e. a non-development machine) a Kinect for Windows application will not work with the old Xbox hardware.  If you do have the full SDK installed, then you can continue to develop using the Xbox sensor.  For completeness, if a Kinect for Windows application is running on a machine with the Kinect for Windows hardware and the full SDK is not installed on that machine, the application will still work.

The other difference between the Kinect for Windows sensor and the Kinect for Xbox sensor is that the USB/power cord is slightly different.  It is shorter and, more importantly, is designed for the peculiarities of a PC.  The Kinect for Xbox sensor’s USB/power cord was designed for the peculiarities of the Xbox USB ports.  Potentially, then, the Kinect for Windows sensor will simply operate better with a PC than the Kinect for Xbox sensor will.

Oh.  And by the way, you can’t create Xbox games using the Kinect for Windows SDK and XNA.  That’s not what it is for.  It is for building PC applications running on Windows 7 and, eventually, Windows 8.

So, knowing all of this, why is Microsoft forcing people to dish out extra money for a new sensor when the old one seems to work fine?

Microsoft is pouring resources into developing the Kinect SDK.  The hacker community has asked them to do this for a while, actually, because they 1) understand the technologies behind the Kinect and 2) have experience building APIs.  This is completely in their wheelhouse.

The new team they have built up to develop the Kinect SDK is substantial and – according to rumor – is now even larger than the WPF and Silverlight teams put together.  They have now put out an SDK that provides pretty much all the features provided by projects like OpenNI but have also surpassed them with superior skeleton recognition and speech recognition.  Their plans for future deliverables, from what I’ve seen, will take all of this much further.  Over the next year, OpenNI will be left in the dust.

How should Microsoft pay for all of this?  A case can be made that they ought to do it for free.  The Kinect came along at a time when people no longer considered Microsoft to be a technology innovator.  Their profits come from Windows and then Office, while their internal politics revolve around protecting these two cash cows.  The Kinect proved to the public at large (and to investors) not only that all that R&D money over the years had been well spent but also that Microsoft could still surprise us.  It could still do cool stuff and hadn’t completely abdicated technology and experience leadership to the likes of Apple and Google.  Why not pour money into the Kinect simply for the sake of goodwill?  How do you put a price on a Microsoft product that actually makes people smile?

Yeah, well.  Being a technology innovator doesn’t mean much to investors if those innovations don’t also make money.  The prestige of a product internally at Microsoft also depends on how much money your team wields.  To the extent that money is power, the success of the Kinect for non-gaming purposes depends on the ability of the new SDK to generate revenue.  Do you remember the inversion from the musical Camelot, when King Arthur says that Might Makes Right should be turned around into Right Makes Might?  The same sort of inversion occurs here.  We’ve grown used to the notion that Money can make anything Cool.  The Kinect will test the notion, within Microsoft, that Cool can also make Money.

So how should Microsoft make that money?  They could have opted to charge developers for a license to build on their SDK.  I’m grateful they didn’t, though.  This would have ended up being a tax on community innovation.  Instead, developers are allowed to develop on the Kinects they already have if they want to (the $150 Kinect).

Microsoft opted to invest in innovation.  They are giving the SDK away for free.  And now we all wait for someone to build a killer Kinect for Windows app.  Whoever does that will make a killing.  This isn’t anything like building phone apps or even Metro apps for the Windows 8 tablet.  We’re talking serious money.  And Microsoft is betting on someone coming along and building that killer app in order to recoup its investment since Microsoft won’t start making money until there is an overriding reason for people to start buying the Kinect for Windows hardware (e.g. that killer app).

This may not happen, of course.  There may never be a killer app to use with the Kinect for Windows sensor.  But in that case Microsoft can’t be blamed for hampering developers in any way.  They aren’t even charging us a developer fee the way the Windows Phone Marketplace or the iOS Developer Program does.  Instead, with the Kinect for Windows pricing, they’ve put their full faith in the developer community.  And by doing this, Microsoft shows me that they can, in fact, occasionally be pretty cool.

2011: The Year in Review


2011 was an extremely busy and exciting year.  I had the chance to go to more conferences than I ever have before: MIX11, An Event Apart and BUILD were highlights for me.

Blog: I wrote several blog posts I was rather proud of – much as a doting father would be.  The most popular was Windows Phone 7 at a Crossroads which received more comments than I typically get as well as extremely flattering outside attention from codeproject.com and the Windows Phone Dev Podcast.  My two personal favorites, however, were one on Delight and another called The Kinect’s Past which received a comment from Bill Buxton.

Speaking Engagements: I also had a busy year speaking at the Greater Gwinnett Microsoft User Group, the Atlanta .NET User Group, CodeStock 2011, MADExpo 2011, Web Visions and SIEGE 2011, as well as giving a private presentation on UX for Microsoft and Bank of America.  I also did a podcast interview for IEEE Spectrum about Windows 8.

Keynote: I was invited to give one of the two keynotes at the Mid Atlantic Developer Expo conference.  It was a distinct honor and an extremely fun event.

User Group: I spent another year running the Silverlight Atlanta User Group.  Corey Schuman and I have been organizing and maintaining the Silverlight User Group for two years now, and only recently changed the name to the Atlanta XAML group after what was frankly a very tough year for Silverlight.

Conference: It was the second year I led the organizing of the ReMIX conference in Atlanta.  Our attendance was up to 450 this year.  More importantly, we were able to get just about every speaker we wanted including Rick Barraza, August de los Reyes, Josh Blake, Arturo Toledo and Albert Shum.  We also had a single track devoted just to the Kinect.  I want to thank the other organizers of ReMIX for indulging me in this: Cliff Jacobson, Sean Gerety, Wells Caughey, Dennis Estanislao and Farhan Rabbi.

Book: I spent the last quarter of this year working on a Kinect SDK book for Apress with my colleague Jarrett Webb.  This was a good outcome, since I spent the first part of the year writing several chapters of a Windows Phone book that never got to see the light of day.  Expect to see the Kinect book towards the beginning of February.

My most impressive achievement, however, was catching up on five seasons of The Wire.  There are lots of blog posts going up around the web right now purporting to give advice about what you should and should not do in 2012.  My advice is short and sweet: you need to watch The Wire.  If you don’t, you are a horrible person, hate America, are aiding and abetting terrorists and are preventing George R. R. Martin from completing his next novel.