Multi-modal User Input for Spatial Computing


One of the fights that I thought we had put behind us is over the question ‘which interface is better?’ For instance, this question was frequently raised in comparisons of the mouse to the keyboard, its putative precursor. The same disputes resurfaced with the rise of natural user interfaces (NUI), when people began to ask whether touch would put the mouse out of business. The answer has always been no. Instead, we use all of these input modes side-by-side.

As Bill Buxton famously said, every technology is the best at something and the worst at something else. We use the interface best adapted to the goal we have in mind. In the case of data entry, the keyboard has always been the best tool. For password entry, on the other hand, while we have many options, including face and speech recognition, it is remarkable how often we turn to the standard keyboard or keypad.

Yet I’ve found myself sucked into arguments about which is the best interaction model: the HoloLens v1’s simple gestures, the Magic Leap One’s magnetic 6DOF controller, or the HoloLens v2’s direct manipulation with hand tracking (albeit without haptics).

Ideally we would use them all. A controller can’t be beat for precision control. Direct hand manipulation is intuitive and fun. To each of these I can add a Bluetooth Xbox controller for additional freedom. And the best replacement for a keyboard turns out to be a keyboard (this is known as Qwerty’s universal constant).

It was over two years ago at the Magic Leap conference that James Powderly, a spatial computing UX guru, set us on the path of figuring out ways to use multiple input modalities at the same time. Instead of thinking in terms of an XOR scenario (this or that, but not both), we started considering an AND scenario for inputs. We had a project at the time to try it out with: VIM, an architectural visualization and data reporting tool for spatial computing. Our main rule was that it couldn’t be forced. We wanted to find a natural way to do multi-modal input that made sense and, hopefully, would also be intuitive.

We found a good opportunity as we attempted to refine the ability to move building models around on a table-top. This is a fairly universal UX issue in spatial computing, which made it even more fascinating to us. For ease of interaction, there is usually a combination of transformations that can be performed on a 3D object at the same time: translation (moving it from position x1 to position x2), scaling it, and rotating it. A common solution is to make each of these a separate interaction mode, triggered by clicking a virtual button or something similar.


But we went a different way. As you move a model in space by pointing the Magic Leap controller in different directions like a laser pointer with the building hanging off the end, you can also push it away by pressing on the top of the touch pad or rotate it by spinning your thumb around the edge of the touch pad.
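To make the touchpad mapping concrete, here is a minimal Python sketch of how per-frame touchpad samples could be turned into a push distance and a rotation. It is not the VIM code and does not use the Magic Leap SDK: it assumes touch positions arrive as normalized (x, y) pairs in [-1, 1] plus a pressed flag, and the thresholds and speed (`RIM_RADIUS`, `TOP_ZONE_Y`, `PUSH_SPEED`) are hypothetical values chosen for illustration.

```python
import math
from typing import Optional, Tuple

# (x, y) on the pad in [-1, 1]; None means the thumb is not touching the pad.
Touch = Optional[Tuple[float, float]]

RIM_RADIUS = 0.6   # hypothetical: touches farther out than this count as "on the rim"
TOP_ZONE_Y = 0.7   # hypothetical: presses above this y count as "top of the pad"
PUSH_SPEED = 0.5   # hypothetical: metres per second along the pointing ray


def wrap_angle(a: float) -> float:
    """Wrap an angle in radians into [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def rim_rotation_delta(prev: Touch, cur: Touch) -> float:
    """Yaw change (radians) from a thumb sliding around the rim between two frames."""
    if prev is None or cur is None:
        return 0.0
    # Ignore touches in the middle of the pad; only rim motion rotates the model.
    if math.hypot(*prev) < RIM_RADIUS or math.hypot(*cur) < RIM_RADIUS:
        return 0.0
    a0 = math.atan2(prev[1], prev[0])
    a1 = math.atan2(cur[1], cur[0])
    # Wrapping stops a slide across the +/-180 degree seam from reading as a near-full turn.
    return wrap_angle(a1 - a0)


def push_delta(cur: Touch, pressed: bool, dt: float) -> float:
    """Distance (metres) to push the model away along the ray during a frame of length dt."""
    if not pressed or cur is None:
        return 0.0
    x, y = cur
    return PUSH_SPEED * dt if y > TOP_ZONE_Y and abs(x) < 0.3 else 0.0
```

Using the change in thumb angle between frames, rather than the absolute angle, means the model only turns while the thumb is actually moving, and both results can be applied in the same frame as the controller's pointing direction, which is the "AND" rather than "XOR" idea.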

This works great for accomplishing many tasks at once. A side effect, though, is that while users rotated a 3D building with their thumbs, they also tended to shake the controller wildly, so that the building seemed to get tossed around the room. It took an amazing amount of dexterity and practice to rotate the model while keeping it in one spot.


To fix this, we added a hand gesture to hold the model in place while the user rotated it. We called this the “halt” gesture because it just required the user to put up their off hand with the palm facing out. (Luke Hamilton, our Head of Design, also called this the “stop in the name of love” gesture.)
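The halt gesture amounts to a gate on translation. The Python sketch below shows one way it could be expressed, assuming hand tracking supplies a palm normal, a head-forward vector, and a fingers-extended flag; the function names, the 35° tolerance, and the `ModelState` type are hypothetical and not part of any Magic Leap API.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def is_halt_pose(palm_normal: Vec3, head_forward: Vec3,
                 fingers_extended: bool, max_angle_deg: float = 35.0) -> bool:
    """True when an open palm faces away from the user (normal roughly along the gaze)."""
    if not fingers_extended:
        return False
    lengths = math.sqrt(_dot(palm_normal, palm_normal)) * math.sqrt(_dot(head_forward, head_forward))
    if lengths == 0.0:
        return False  # degenerate tracking data: treat as "no gesture"
    cos_angle = _dot(palm_normal, head_forward) / lengths
    return cos_angle >= math.cos(math.radians(max_angle_deg))


@dataclass
class ModelState:
    position: Vec3
    yaw: float  # radians


def step(state: ModelState, ray_target: Vec3, yaw_delta: float, halt: bool) -> ModelState:
    """Advance one frame: rotation always applies, translation freezes while halted."""
    position = state.position if halt else ray_target
    return ModelState(position=position, yaw=state.yaw + yaw_delta)
```

Because rotation is applied whether or not the gate is closed, the user can keep spinning the model with the controller hand while the off hand pins it in place.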

But we were on a gesture-inventing roll and didn’t want to stop. We started thinking about how the keyboard is more accurate and faster than a mouse in data entry scenarios, while the mouse is much more accurate than a game controller or hand tracking for pointing and selecting.

We had a similar situation here: the rotation gesture on the Magic Leap controller was intended to make it easy to spin the model in a full 360-degree circle, but as a consequence it was not so good for very slight rotations (for instance, the kind of rotation needed to correctly orient a life-size digital twin of a building).


We got on the phone with Brian Schwab and Jay Juneau at Magic Leap, and they suggested that we try using the controller in a different way. Rather than relying only on the thumb pad, we could rotate the controller about its Z-axis (a bit like a screwdriver) as an alternative rotational gesture. That is what we did, making it a secondary rotation method for fine-tuning.
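The screwdriver gesture can be thought of as reading the controller’s twist about its own long axis and scaling it down. Here is a minimal Python sketch, assuming the tracking system reports the controller orientation as a unit quaternion (w, x, y, z) relative to a rest pose whose long axis is `axis`; the class name and the 0.1 gain are hypothetical. It extracts the twist angle with the standard swing-twist formula, 2·atan2(v·axis, w).

```python
import math
from typing import Optional, Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit length
Vec3 = Tuple[float, float, float]


def twist_angle(q: Quat, axis: Vec3) -> float:
    """Angle (radians) of the twist component of q about a unit axis (swing-twist)."""
    w, x, y, z = q
    proj = x * axis[0] + y * axis[1] + z * axis[2]
    return 2.0 * math.atan2(proj, w)


def wrap_angle(a: float) -> float:
    """Wrap an angle in radians into [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


class ScrewdriverRotation:
    """Turns controller roll about its long axis into a scaled-down model rotation."""

    def __init__(self, gain: float = 0.1, axis: Vec3 = (0.0, 0.0, 1.0)) -> None:
        self.gain = gain  # hypothetical: < 1 so a wrist twist gives a small, precise turn
        self.axis = axis
        self._last: Optional[float] = None

    def update(self, controller_q: Quat) -> float:
        """Return the model yaw change (radians) for this frame."""
        angle = twist_angle(controller_q, self.axis)
        if self._last is None:
            self._last = angle  # first frame only establishes the reference
            return 0.0
        delta = wrap_angle(angle - self._last)
        self._last = angle
        return self.gain * delta
```

A gain below 1 is what makes this a fine-tuning control: with the hypothetical gain of 0.1, a quarter turn of the wrist rotates the model by only 9 degrees, while the touchpad rim remains available for large, fast spins.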

And of course we combined the “halt / stop in the name of love” gesture with this “screwdriver” gesture, too. We did it because we could, but more importantly because it made sense, and most importantly because it allows the user to accomplish her goals with the least amount of friction.

The Image Book: A Review


Jean-Luc Godard released Le livre d’image in 2018. It is a montage film that stitches together brief film images – from the history of cinema, from the news and from his own films.

While it uses the same strange score editing as Goodbye to Language, the overall effect is much more hypnotic, and beautiful.

The quick edits are overlaid with narration by Godard himself. At a certain point, his reflections turn to France’s relationship to the Middle East, and there is some original footage that Godard shot.

Montage films are kind of wonderful. There’s Wong Kar-wai’s film Hua yang de nian hua, which stitches together clips from Asian cinema totally unfamiliar to Western audiences. Watching it feels like glimpsing into a secret world. The beautiful scene in Cinema Paradiso that collects all the scenes deemed too explicit by the Catholic censors is a celebration of life, sexuality and cinema all at once. A recent discovery for me was The Road Movie from 2016, which basically takes dashcam footage uploaded by Russians and serializes it. Most of the footage involves car crashes. The best parts of it, though, are the moments before the sudden crashes, where you listen in on friends chatting, spouses fighting or the Russian version of AM talk radio. There’s a strange feeling of normalcy to those moments that makes one feel that we are all the same, wherever we are, whatever language we speak.

And then the crash happens.

The Image Book doesn’t contain any crashes. Instead it feels like a journey through Jean-Luc Godard’s mind while looking at the world through the eyes of one of cinema’s great masters.

Goodbye to Language: A Review


I’ve watched Godard’s Goodbye to Language (“Adieu au Langage”, 2014) once so far. It deserves, and requires, multiple viewings. It is a montage film, shot with multiple cameras (including a GoPro) and covering multiple overlapping and unrelated story lines. There are also lots of shots of Jean-Luc Godard’s dog.

The movie is purposefully annoying. Take, for instance, the use of fast cutting. Fast cutting comes from music video editing and is used to convey forceful action. Even so, fast editing is normally tied together by an underlying soundtrack that provides a sense of continuity and brackets a series of related footage. Godard, on the other hand, undermines this by starting a piece of the score and then chopping it off unceremoniously, like a record player losing its groove. And then he does this over and over, with the same piece of unsatisfyingly broken music, in different places throughout the film.

If there’s a clue to what the film is “about” (and does it really need to be about anything?), it’s in a line in the last third of the film, about a couple on the verge of breaking up, where both characters say that they understand what their partner is saying but cannot understand what they themselves are saying. It’s a kind of reverse gaslighting. Which, to be fair, is what marital fights feel like.

Other parts of the film include ponderous philosophical monologues and dialogues about the “tyranny of the image”: the tendency of myth and magical thinking to displace discursive reason. Godard also has lots of scenes of people interacting with their phones in book stalls and standing next to other people, highlighting something that has become so common that we no longer comment on it, but which can still shock when we see it on film. Smartphones and internet culture are, in their own way, manifestations of the tyranny of the image, since they replace long-form thinking with easily digestible memes. This has gone so far that we now take for granted that long-form is a waste of time, and assume that it is normal (or even possible) to absorb complex thoughts in a few minutes.

Naturally, there is irony in the title and concept of the film, since film itself replaces discursive thought with images, and replaces syllogistic reasoning with a musical score that moves us from one narrative moment to the next. Except that in Godard’s hands, the film resists us and makes even the simplest things hard. It fucks up the score. It limits the beautiful long shots. It uses gritty camera footage at a time when high-quality digital images are cheap and easy. The handling of the sex scene is bleehhhh. Worst of all, the central story is anti-ship in a medium that requires sexually appropriate relationship building to ensure commercial success.

The overall effect of the movie on me is that I struggled to watch it but can’t stop thinking about it even weeks later. And parts of the movie that I thought were pretentious, with less there than met the eye, I now think contain infinite depth.

An extra feature of the film is that it was originally shot in 3D and was exhibited at Cannes in 3D. I watched it in 2D but will now try to track down the 3D DVD. Like other amazing films, such as Bi Gan’s masterpiece Long Day’s Journey Into Night, it uses a cinematic medium that has since fallen out of favor. Fortunately, I still have an old 3D flatscreen and a 3D DVD player to watch it on.

Some movies, like Ang Lee’s Gemini Man, which is ultimately a technical master’s experiment in 3D cinema, aren’t even available in 3D DVD format. Given the current death of the movie theater in America, there’s even a chance that we won’t ever be able to see Gemini Man in its intended form again.