The IG BCE’s magazine Kompakt interviewed me about IoT, AI and why simple solutions so often are inappropriate for complex issues: The interview is in German, available as an e-paper. (5 November 2019)
About a year ago, we changed our home audio setup to “smart” speakers: We wanted to be able to stream directly from Spotify, as well as from our phones. We also wanted to avoid introducing yet another microphone into our living room. (The kitchen is, hands down, the only place where a voice assistant makes real sense to me personally. Your mileage may vary.) Preferably, there should be a line-in as well; I’m old school that way.
During my research I learned that the overlap of circles in this Venn diagram of speakers that are (a) connected (“smart”) for streaming, (b) have good sound and (c) don’t have a microphone is… very thin indeed.
The Sonos range looked best to me; except for those pesky microphones. Our household is largely voice assistant free, minus the phones, where we just deactivated the assistants to whatever degree we could.
In the end, we settled for a set of Bang & Olufsen Beoplay speakers for living room and kitchen: Solid brand, good reputation. High end. Should do just fine — and it better, given the price tag!
This is just for context. I don’t want to turn this into a product review. But let’s just say that we ran into some issues with one of the speakers. These issues appeared to be software related. And while from the outside they looked like they should be easy to fix, it turned out the mechanisms to deliver the fixes were somewhat broken themselves.
Long story short: I’m now trying to return the speakers. Which made me realize that completely different rules apply than I’m used to. In Germany, where we are based, consumer protection laws are reasonably strong, so if something doesn’t work you can usually return it without too much hassle.
But with a set of connected speakers, we have an edge case. Or more accurately, a whole stack of edge cases.
This is going to be an interesting process, I’m afraid. Can we return the whole family and switch on over to a different make of speakers? Or are we stuck with an expensive set of speakers that while not quite broken, is very much unusable in our context?
If so, then at least I know never to buy connected speakers again. Rather, I guess I’d recycle these and go back to a high-end analog speaker set with some external streaming connector – knowing full well that the connector will be useless in a few years’ time but that the speakers and amp would be around and working flawlessly for 15-20 years, like my old ones did.
And that is the key insight here for our peers in the industry: If your product in this nascent field fails because of lacking quality management, then you leave scorched earth. Consumers aren’t going to trust your products any more, sure. But they are unlikely to trust anyone else’s, either.
The falling tide lowers all boats.
So let’s get not just the products right: In consumer IoT, all too often we think in families/ecosystems. So we have to consider long-term software updates (and mechanisms to deliver them, and fall-backs to those mechanisms…) as well as return policies in case something goes wrong with one of the products. And while we’re at it, we need to equally upgrade consumer protection regulation to deal with these issues of ecosystems and software updates.
This is the only way to ensure consumer trust. So we can reap the benefits of innovation without suffering all the externalized costs as well as unintended consequences of a job sloppily done.
Update (Oct 2019): Turns out other companies also start recognizing that there’s a demand for mic-free speakers: Sonos just launched a speaker that they market specifically for it being mic-free, and it’s otherwise identical to one of their staples. (It’s called the One SL; I imagine the “SL” stands for “streamlined” or “stop listening” but I might be projecting.)
At SimplySecure’s excellent Underexposed conference we discussed the importance of making it easier for those involved in making connected products and services to make safe, secure, and privacy-conscious products. After all, they might be experts, but not necessarily security experts, for example. So, toolkit time!
I asked participants in the room as well as publicly on Twitter which toolkits and resources are worth knowing. This is what this looked like in the room:
“Which toolkits should we all know? Ethics, privacy, security”
Here’s the tweet that went with it:
Which toolkits for designers, devs etc for ethics, privacy, security should we all know? #underexposed — Peter Bihr (@peterbihr) November 9, 2017
So what are the toolkit recommendations? Given the privacy-sensitive nature of the event, I’m linking to the source only where people sent the recommendations on public Twitter. Also, please note I’m including them without much background, and unchecked. So here goes:
This list can by no means claim to be complete, but hopefully it will still be useful to some of you.
Doing some research-related reading this morning had me go down a bit of a rabbit hole that led to this Twitter thread. The points hold up, I think, so here it is in easier-to-read-and-reference format:
Smart Cities are often framed as part of industrial #iot. I think we need to frame it as empowerment tech for citizens instead.
This industrial #iot framing is only natural: Most vendors of smart city tech come from that background. But I think it’s not healthy. A technology that impacts, by definition, all citizens needs to be framed, regulated & designed accordingly. Meaning: If there’s no opt-out (and there isn’t, in public space), we need to make sure this works for everyone, can be understood & queried.
We need strong democratic oversight on smart city technologies and the algorithms, processes, vendors powering them. Which is why we need to follow the principles that made the early open web so strong & resilient: decentralization, open source, etc.
Only if we reframe our thinking of smart cities from industrial to citizen centric can these technologies unfold their positive potential.
This echoes the position we developed for a report for the German federal government a while ago as part of research into how to best make smart cities work for citizens. The findings of that report are summarized here.
Executive Summary: We went to Shenzhen to explore opportunities for collaboration between European Internet of Things practitioners and the Shenzhen hardware ecosystem—and how to promote the creation of a responsible Internet of Things. We documented our experience and insights in View Source: Shenzhen.
View Source is the initiative of an alliance of organizations that promote the creation of a responsible Internet of Things:
Along for part of the ride were two other value-aligned organizations:
What unites us in our efforts is great optimism about the Internet of Things (IoT), but also a deep concern about the implications of this technology being embedded in anything ranging from our household appliances to our cities.
This document was written as part of a larger research effort that included, among other things, two trips to Shenzhen, a video documentary, and lots of workshops, meetings, and events over a period of about a year. It’s part of the documentation of these efforts. Links to the other parts are interspersed throughout this document.
This research was a collaborative effort undertaken with the Dutch design consultancy The Incredible Machine, and our delegations to China included many Dutch designers, developers, entrepreneurs and innovators: One of the over-arching goals of this collaboration was to build bridges between Shenzhen and the Netherlands specifically—and Europe more generally—in order to learn from one another and identify business opportunities and future collaborations.
We thank the Creative Industry Fund NL for their support.
*Please note: While I happen to be the one to write this text as my contribution to documenting our group’s experiences, I cannot speak for the group, and don’t want to put words in anyone’s mouth. In fact, I use the “we” loosely; depending on context it refers to either one of the two delegations, our loose alliance for responsible IoT, or is a collective “we”. I hope that it’s clear in the context. Needless to say, all factual errors in this text are mine, and mine alone. If you discover any errors, please let me know.
TL;DR: Machine learning and artificial intelligence (AI) are beginning to govern ever-greater parts of our lives. If we want to trust their analyses and recommendations, it’s crucial that we understand how they reach their conclusions, how they work, which biases are at play. Alas, that’s pretty tricky. This article explores why.
As machine learning and AI gain importance and manifest in many ways large and small wherever we look, we face some hard questions: Do we understand how algorithms make decisions? Do we trust them? How do we want to deploy them? Do we trust the output, or focus on process?
Please note that this post explores some of these questions, connecting dots from a wide range of recent articles. Some are quoted heavily (like Will Knight’s, Jeff Bezos’s, Dan Hon’s) and linked multiple times over for easier source verification rather than going with endnotes. The post is quite exploratory in that I’m essentially thinking out loud, and asking more questions than I have answers to: tread gently.
In his very good and very interesting 2017 shareholder letter, Jeff Bezos makes a point about not over-valuing process: “The process is not the thing. It’s always worth asking, do we own the process or does the process own us?” This, of course, he writes in the context of management: His point is about optimizing for innovation. About not blindly trusting process over human judgement. About not mistaking existing processes for unbreakable rules that are worth following at any price and to be followed unquestioned.
Bezos also briefly touches on machine learning and AI. He notes that Amazon is both an avid user of machine learning as well as building extensive infrastructure for machine learning—and Amazon being Amazon, making it available to third parties as a cloud-based service. The core point is this (emphasis mine): “Over the past decades computers have broadly automated tasks that programmers could describe with clear rules and algorithms. Modern machine learning techniques now allow us to do the same for tasks where describing the precise rules is much harder.”
Algorithms as a black box: Hard to tell what’s going on inside (Image: ThinkGeek)
That’s right: With machine learning, we can learn to get desirable results but without necessarily knowing how to describe the rules that get us there. It’s pure output. No—or hardly any—process in the sense that we can interrogate or clearly understand it. Maybe not even instruct it, exactly.
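To make this point concrete, here is a minimal sketch in Python. It is an illustration I am adding under stated assumptions, not anything taken from Bezos's letter or from Amazon: the `hidden_rule` function, the data and all other names are hypothetical, and the learner is a plain perceptron. The learner only ever sees labeled examples, never the rule itself, and what it ends up with is three numbers rather than a rule anyone could read.

```python
import random


def hidden_rule(x1, x2):
    # Hypothetical stand-in for a rule that nobody spells out for the learner.
    return 1 if x1 + x2 > 1.0 else 0


def make_examples(n=200, seed=0):
    rng = random.Random(seed)
    examples = []
    while len(examples) < n:
        x1, x2 = rng.random(), rng.random()
        # Skip borderline points so that a clean separation exists.
        if abs(x1 + x2 - 1.0) < 0.1:
            continue
        examples.append(((x1, x2), hidden_rule(x1, x2)))
    return examples


def predict(weights, point):
    w1, w2, b = weights
    return 1 if w1 * point[0] + w2 * point[1] + b > 0 else 0


def train_perceptron(examples, max_epochs=1000):
    # Classic perceptron updates: nudge the weights after every mistake.
    w1 = w2 = b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), label in examples:
            error = label - predict((w1, w2, b), (x1, x2))
            if error != 0:
                w1 += error * x1
                w2 += error * x2
                b += error
                mistakes += 1
        if mistakes == 0:
            break  # every training example is now classified correctly
    return (w1, w2, b)


examples = make_examples()
weights = train_perceptron(examples)
accuracy = sum(predict(weights, p) == y for p, y in examples) / len(examples)
print("learned weights:", weights)
print("training accuracy:", accuracy)
```

The output is "desirable" in the sense that it matches the examples, but the learned weights are just numbers. Even in this tiny case you have to do extra work to read a rule back out of them, and real networks have millions of such numbers.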
Let’s keep this at the back of our minds now, we’ll come back to it later. Exhibit A.
“Machine learning techniques – most recently and commonly, neural networks – are getting pretty unreasonably good at achieving outcomes opaquely. In that: we really wouldn’t know where to start in terms of prescribing and describing the precise rules that would allow you to distinguish a cat from a dog. But it turns out that neural networks are unreasonably effective (…) at doing these kinds of things. (…) We’re at the stage where we can throw a bunch of images to a network and also throw a bunch of images of cars at a network and then magic happens and we suddenly get a thing that can recognize cars.”
Dan goes on to speculate:
“If my intuition’s right, this means that the promise of machine learning is something like this: for any process you can think of where there are a bunch of rules and humans make decisions, substitute a machine learning API. (…) machine learning doesn’t necessarily threaten jobs like “write a contract between two parties that accomplishes x, y and z” but instead threatens jobs where management people make decisions.”
“Neural networks work the other way around: we tell them the outcome and then they say, “forget about the process!”. There doesn’t need to be one. The process is inside the network, encoded in the weights of connections between neurons. It’s a unit that can be cloned, repeated and so on that just does the job of “should this insurance claim be approved”. If we don’t have to worry about process anymore, then that lets us concentrate on the outcome. Does this mean that the promise of machine learning is that, with sufficient data, all we have to do is tell it what outcome we want?”
Now if we answered Dan’s question with YES, then this is where things get tricky, isn’t it? It opens the door to a potentially pretty slippery slope.
In political science, a classic question is what the best form of government looks like. While a discussion about what “best” means—freedom? wealth? health? agency? for all or for most? what are the tradeoffs?—is fully legitimate and should be revisited every so often, it boils down to this long-standing conflict:
Can a benevolent dictator, unfettered by external restraints, provide a better life for their subjects?
Does the protection of rights, freedom and agency offered by democracy outweigh the often slow and messy decision-making processes it requires?
Spoiler alert: Generally speaking, democracy won this debate a long time ago.
(Of course there are regions where societies have held on to the benevolent dictatorship model; and the recent rise of the populist right demonstrates that populations around the globe can be attracted to this line of argument.)
The reason democracy—a form of government defined by process!—has surpassed dictatorships both benevolent and malicious is that overall it seems a human endeavor to have agency and freely express it, rather than be governed by an unfettered, unrestricted ruler of any sorts.
Every country that chooses democracy over a dictator sacrifices efficiency for process: A process that can be interrogated, understood, adapted. Because, simply stated, a process understood is a process preferred. Being able to understand something gives us power to shape it, to make it work for us: This is true both on the individual and the societal level.
Messy transparency and agency trump blackbox efficiency.
Let’s keep that in mind, too. Exhibit B.
Andrew Ng, who was heavily involved in Baidu’s (and before that, Google’s) AI efforts, emphasizes the potential impact of AI to transform society: “Just as electricity transformed many industries roughly 100 years ago, AI will also now change nearly every major industry — healthcare, transportation, entertainment, manufacturing — enriching the lives of countless people.”
“I want all of us to have self-driving cars; conversational computers that we can talk to naturally; and healthcare robots that understand what ails us. The industrial revolution freed humanity from much repetitive physical drudgery; I now want AI to free humanity from repetitive mental drudgery, such as driving in traffic. This work cannot be done by any single company — it will be done by the global AI community of researchers and engineers.”
While I share Ng’s assessment of AI’s potential impact, I’ve got to be honest: His raw enthusiasm for AI sounds a little scary to me. Free humanity from mental drudgery? Not to wax overly nostalgic, but mental drudgery—even boredom!—has proven really quite important for humankind’s evolution and played a major role in its achievements. Plus, the idea that engineers are the driving force seems risky at least: It’s a pure form of stereotypical Silicon Valley think, almost a cliché. I’m willing to give him the benefit of the doubt and assume that by “researchers” he also meant to include anthropologists, philosophers, political scientists, and all the other valuable perspectives of social sciences, humanities, and other related fields.
Don’t leave something as important as AI to a bunch of tech bros (Image: Giphy)
Something as transformative as this should not, in the 21st century, be driven by a tiny group of people with very homogenous backgrounds. Diversity is key, in professional backgrounds and ways of thinking as much as in gender, ethnic, regional and cultural backgrounds. Otherwise, algorithms are bound to encode and help enforce unhealthy policies.
Engineering-driven, tech-deterministic, non-diverse expansionist thinking delivers sub-optimum results. File under exhibit C.
Bezos writes about the importance of making decisions fast, which often requires making them with incomplete information: “most decisions should probably be made with somewhere around 70% of the information you wish you had. If you wait for 90%, in most cases, you’re probably being slow. Plus, either way, you need to be good at quickly recognizing and correcting bad decisions. If you’re good at course correcting, being wrong may be less costly than you think, whereas being slow is going to be expensive for sure.”
This, again, he writes in the context of management—presumably by and through humans. How will algorithmic decision-making fit into this picture? Will we want our algorithms to start deciding—or issuing recommendations—based on 100 percent of information? 90? 70? Maybe there’s an algorithm that figures out through machine learning how much information is just enough to be good enough?
Who is responsible for algorithmically made decisions? Who bears the responsibility for enforcing them?
If the algorithmic load-optimizing (read: overbooking) tells airline staff to remove a passenger from a plane and it ends up in a dehumanizing debacle, whose fault is that?
Teacher of Algorithm by Simone Rebaudengo and Daniel Prost
More Dan Hon! Dan takes this to its logical conclusion (s4e11): “We’ve outsourced deciding things, and computers – through their ability to diligently enact policy, rules and procedures (surprise! algorithms!) give us a get out of jail free card that we’re all too happy to employ.” It is, by extension, a jacked-up version of “it’s policy, it’s always been our policy, nothing I can do about it.” Which is, of course, the oldest and laziest cop-out there ever was.
He continues: “Algorithms make decisions and we implement them in software. The easy way out is to design them in such a way as to remove the human from the loop. A perfect system. But, there is no such thing. The universe is complicated, and Things Happen. While software can deal with that (…) we can take a step back and say: that is not the outcome we want. It is not the outcome that conscious beings that experience suffering deserve. We can do better.”
I wholeheartedly agree.
To get back to the airline example: In this case I’d argue the algorithm was not at fault. What was at fault is that corporate policy said this procedure has priority, and this was backed up by an organizational culture that made it seem acceptable (or even required) for staff to have police drag a paying passenger off a plane with a bloody lip.
Algorithms blindly followed, backed up by corporate policies and an unhealthy organizational culture: Exhibit D.
In the realm of computer vision, there have been a lot of advances through (and for) machine learning lately. Generative adversarial networks (GANs), in which one network tries to fool another, seem particularly promising. I won’t pretend to understand the math behind GANs, but Quora has us covered:
“Imagine an aspiring painter who wants to do art forgery (G), and someone who wants to earn his living by judging paintings (D). You start by showing D some examples of work by Picasso. Then G produces paintings in an attempt to fool D every time, making him believe they are Picasso originals. Sometimes it succeeds; however as D starts learning more about Picasso style (looking at more examples), G has a harder time fooling D, so he has to do better. As this process continues, not only D gets really good in telling apart what is Picasso and what is not, but also G gets really good at forging Picasso paintings. This is the idea behind GANs.”
So we’ve got two algorithmic networks sparring with one another. Both of them learn a lot, fast.
Impressive, if maybe not lifesaving, results include so-called style transfer. You’ve probably seen it online: This is when you can upload a photo and it’s rendered in the style of a famous painter:
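For readers who prefer code to paintings, the sparring can be sketched in a few lines of Python. This is a deliberately tiny toy under loud assumptions, not the setup used in the research referenced here: the "paintings" are single numbers drawn around 4.0, the forger G controls only one parameter `mu`, and the judge D is a logistic function with parameters `w` and `c`. All function names and settings are hypothetical.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def judge(x, w, c):
    """D: estimated probability that sample x is a real one."""
    return sigmoid(w * x + c)


def d_step(w, c, reals, fakes, lr):
    # Gradient ascent on mean log D(real) + mean log(1 - D(fake)).
    grad_w = (sum((1 - judge(x, w, c)) * x for x in reals) / len(reals)
              - sum(judge(x, w, c) * x for x in fakes) / len(fakes))
    grad_c = (sum(1 - judge(x, w, c) for x in reals) / len(reals)
              - sum(judge(x, w, c) for x in fakes) / len(fakes))
    return w + lr * grad_w, c + lr * grad_c


def g_step(mu, w, c, noise, lr):
    # Gradient ascent on mean log D(G(z)), where G(z) = mu + z.
    grad_mu = sum((1 - judge(mu + z, w, c)) * w for z in noise) / len(noise)
    return mu + lr * grad_mu


def train(steps, seed=0, batch=64, lr=0.05, real_mean=4.0):
    rng = random.Random(seed)
    mu, w, c = 0.0, 0.0, 0.0  # forger starts far away from the real data
    for _ in range(steps):
        reals = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fakes = [mu + z for z in noise]
        w, c = d_step(w, c, reals, fakes, lr)  # the judge studies both piles
        mu = g_step(mu, w, c, noise, lr)       # the forger adapts to the judge
    return mu, w, c


mu, w, c = train(steps=1)
print("after one round: forger mean", mu, "judge weight", w)
```

After a single round the judge has already learned that larger numbers look more real, and the forger has shifted its output in that direction. Run for many rounds, a toy like this tends to circle around the real data rather than settle neatly, which hints at why training real GANs is considered finicky.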
Collection Style Transfer refers to transferring images into artistic styles. Here: Monet, Van Gogh, Ukiyo-e, and Cezanne. (Image source: Jun-Yan Zhu)
Maybe more intuitively impressive, this type of machine learning can also be applied to changing parts of images, or even videos:
Sometimes, failure modes are not just interesting but also look hilarious (Image source: Jun-Yan Zhu)
Wait, how did we get here? Oh yes, output v process!
What about skill sets required to work with machine learning, and to make machines learn in interesting, promising ways?
Google has been remaking itself as a machine-learning-first company. As Christine Robson, who works on Google’s internal machine learning efforts, puts it: “It feels like a living, breathing thing. It’s a different kind of engineering.”
Technology Review features a stunning article absolutely worth reading in full: In The Dark Secret at the Heart of AI, author Will Knight interviews MIT professor Tommi Jaakkola who says:
“Deep learning, the most common of these approaches, represents a fundamentally different way to program computers. ‘It is a problem that is already relevant, and it’s going to be much more relevant in the future. (…) Whether it’s an investment decision, a medical decision, or maybe a military decision, you don’t want to just rely on a ‘black box’ method.’”
And machine learning doesn’t just require different engineering. It requires a different kind of design, too. From Machine Learning for Designers (ebook, free O’Reilly account required): “These technologies will give rise to new design challenges and require new ways of thinking about the design of user interfaces and interactions.”
Machine learning means that algorithms learn from—and increasingly will adapt to—their own performance, user behaviors, and external factors. Processes (however opaque) will change, as will outputs. Quite likely, the interface and experience will also adapt over time. There is no end state but constant evolution.
Technologist & researcher Greg Borenstein argues that “while AI systems have made rapid progress, they are nowhere near being able to autonomously solve any substantive human problem. What they have become is powerful tools that could lead to radically better technology if, and only if, we successfully harness them for human use.”
Borenstein concludes: “What’s needed for AI’s wide adoption is an understanding of how to build interfaces that put the power of these systems in the hands of their human users.”
Future-oriented designers seem to be at least open to this idea. As Fabien Girardin of the Near Future Laboratory argues: “That type of design of system behavior represents a future in the evolution of human-centered design.”
Computers beating the best human Chess and Go players have given us Centaur Chess in which humans and computers play side-by-side in a team: While computers beat humans in chess, these hybrid teams of humans and computers playing in tandem beat computers hands-down.
In centaur chess, software provides analysis and recommendations, a human expert makes the final call. (I’d be interested in seeing the reverse being tested, too: What if human experts gave recommendations for the algorithms to consider?)
Now, all of this isn’t particularly well understood today. Or more concretely, the algorithms hatched that way aren’t understood, and hence their decisions and recommendations can’t be interrogated easily.
Will Knight shares the story of a self-driving experimental vehicle that was “unlike anything demonstrated by Google, Tesla, or General Motors, and it showed the rising power of artificial intelligence. The car didn’t follow a single instruction provided by an engineer or programmer. Instead, it relied entirely on an algorithm that had taught itself to drive by watching a human do it.”
What makes this really interesting is that it’s not entirely clear how the algorithms learned:
“The system is so complicated that even the engineers who designed it may struggle to isolate the reason for any single action. And you can’t ask it: there is no obvious way to design such a system so that it could always explain why it did what it did (…) It isn’t completely clear how the car makes its decisions.”
Knight stresses just how novel this is: “We’ve never before built machines that operate in ways their creators don’t understand.”
We know that it’s possible to attack machine learning with adversarial examples: So-called adversarial examples are intentionally designed to cause the model to make a mistake, to train the algorithm incorrectly. Even without a malicious attack, algorithms also simply don’t always get the full—or right—picture: “Google researchers noted that when its [Deep Dream] algorithm generated images of a dumbbell, it also generated a human arm holding it. The machine had concluded that an arm was part of the thing.”
This—and this type of failure mode—seems relevant. We need to understand how algorithms work in order to adapt, improve, and eventually trust them.
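The adversarial examples mentioned above can be illustrated with a small Python sketch. It rests on assumptions of my own: the classifier is a hand-made linear model with made-up `WEIGHTS` and `BIAS`, and the attack nudges every input feature by at most `eps` against the sign of its weight, which is the idea behind the fast gradient sign method applied to the simplest possible model. None of this is taken from the systems discussed in this post.

```python
def score(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def adversarial(weights, bias, x, eps):
    # For a linear model the gradient of the score with respect to the input
    # is simply the weight vector, so stepping against sign(w) lowers the
    # score as fast as possible for a given per-feature budget eps.
    direction = -1 if predict(weights, bias, x) == 1 else 1
    return [xi + direction * eps * sign(w) for xi, w in zip(x, weights)]


# Hypothetical model and input, chosen only for illustration.
WEIGHTS = [2.0, -3.0, 1.0]
BIAS = 0.5
x = [1.0, 0.2, 0.5]

x_adv = adversarial(WEIGHTS, BIAS, x, eps=0.5)
print("original: ", x, "-> class", predict(WEIGHTS, BIAS, x))
print("perturbed:", x_adv, "-> class", predict(WEIGHTS, BIAS, x_adv))
```

No feature moves by more than `eps`, yet the predicted class flips. With images, the same trick is spread across thousands of pixels, so each individual change can be too small for a human to notice.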
Consider for example two areas where algorithmic decision-making could directly decide about life or death: Military and medicine. Speaking of military use cases, David Gunning of DARPA’s Explainable Artificial Intelligence program explains: “It’s often the nature of these machine-learning systems that they produce a lot of false alarms, so an intel analyst really needs extra help to understand why a recommendation was made.” Life or death might literally depend on it. What’s more, if a human operator doesn’t fully trust the AI output then that output is rendered useless.
We need to understand how algorithms work (Image: Giphy)
Should we have a legal right to interrogate AI decision making? Again, Knight in Technology Review: “Starting in the summer of 2018, the European Union may require that companies be able to give users an explanation for decisions that automated systems reach. This might be impossible, even for systems that seem relatively simple on the surface, such as the apps and websites that use deep learning to serve ads or recommend songs. The computers that run those services have programmed themselves, and they have done it in ways we cannot understand. Even the engineers who build these apps cannot fully explain their behavior.”
It seems likely that this could currently not even be enforced, that the creators of these algorithmic decision-making systems might not even be able to find out what exactly is going on.
There have been numerous attempts at exploring this, usually through visualizations. This works, to a degree, for machine learning and even other areas. However, machine learning is often used to crunch multi-dimensional data sets. We simply have no great way of visualizing this in a way that makes it easy to analyze (yet).
This is worrisome to say the least.
But let me play devil’s advocate for a moment: What if the outcomes are really so good, so much better than human-powered analysis or decision-making? Might it not simply be irresponsible not to use them? Knight gives the example of a program at Mount Sinai Hospital in New York called Deep Patient that was “just way better” at predicting certain diseases from patient records.
If this prediction algorithm has a solid track record of successful analysis, but neither developers nor doctors understand how it reaches its conclusions, is it responsible to prescribe medication based on its recommendation? Would it be responsible not to?
Philosopher Daniel Dennett, who studies consciousness and the mind, takes it a step further. An explanation by an algorithm might not be good enough. Humans aren’t great at explaining themselves, so if an AI “can’t do better than us at explaining what it’s doing, then don’t trust it.”
It follows that an AI would need to provide a much better explanation than a human in order for it to be trustworthy. Exhibit E.
Let’s assume that the impact of machine learning, algorithmic decision-making and AI will keep increasing. A lot. Then we need to understand how algorithms work in order to adapt, improve, and eventually trust them.
Machine learning allows us to get desirable results, but without necessarily knowing how (exhibit A). It’s essential for a society to be able to understand and shape its governance, and to have agency in doing so. So in AI just like in governance: Transparent messiness is more desirable than opaque efficiency. Black boxes simply won’t do. We cannot have black boxes govern our lives (exhibit B). Something as transformative as this should not, in the 21st century, be driven by a tiny group of people with very homogeneous backgrounds. Diversity is key, in professional backgrounds and ways of thinking as much as in gender, ethnic, regional and cultural backgrounds. Engineering-driven, tech-deterministic, non-diverse, expansionist thinking delivers sub-optimum results (exhibit C). Otherwise, algorithms are bound to encode and help enforce unhealthy policies. Blindly followed, backed up by corporate policies and an unhealthy organizational culture, this is bound to deliver horrible results (exhibit D). Hence we need to be able to interrogate algorithmic decision-making. And if in doubt, an AI should provide a much better explanation than a human in order for it to be trustworthy (exhibit E).
Machine learning and AI hold great potential to improve our lives. Let’s embrace it, but deliberately and cautiously. And let’s not hand over the keys to software that’s too complex for us to interrogate, understand, and hence shape to serve us. We must apply the same kinds of checks and balances to tech-based governance as to human or legal forms of governance—accountability & democratic oversight and all.
I’d like to make a case for being careful with spreading second- or third-hand stories and for focusing instead on gathering first-hand experience of interesting products and services. I believe it’s the best way to feel our way into a future shaped by emerging technologies, and to make informed decisions about them. So in the name of science, I lived with Amazon Echo/Alexa for a week. Here’s my experience.
We talk a lot about smart homes, about connected domestic devices, about conversational interfaces and artificial intelligence. A surprising amount of what’s talked about and what’s reported on is word of mouth: I heard somewhere that Amazon Echo ordered a thousand doll houses and boxes of cookies after someone mentioned it on TV! The makers of the doll houses couldn’t believe their luck, and consumers are screwed!
(For the record: In reality, it was likely “a handful” of dollhouse orders; it’s not trivially simple to order—let alone unknowingly—via the device; and Amazon has a full refund policy for physical products ordered this way.)
This word-of-mouth information is bad for all kinds of reasons. (One could cynically argue that it perfectly fits our times of so-called “post-factual” news and politics.) I believe there’s plenty of reason to be critical of connected services, and I’m even more convinced that consumers of (or everyone exposed to) connected services should be able to make informed decisions about their use.
For that reason, I think we should expect both journalists and everyone in the tech scene (expert peer group!) to be careful about what information and narrative we spread: Instead of rumors we should focus on facts and first-hand experience.
This is why I make a point of frequently testing emerging technologies that are misunderstood, or heavily discussed with little informational basis, even when I’m not convinced they’ll be a good fit for my life. This way I’ve kickstarted smart watches, worn fitness trackers, and spit in tubes to have my DNA analyzed. None of it killed me; a lot of it was bland and boring; every time I learned a lot, even if it was only that these technologies offered a lot less risk & reward than the hype suggested.
So we lived for a week with an Amazon Echo and its voice-controlled assistant Alexa.
First, for clarification: Amazon Echo is the physical full-size device; Dot is a smaller version; Alexa is the software backend that’s also available as a platform to build apps (in Amazon speak, skills) on through an API.
Second, I’d like to acknowledge that this isn’t exactly pioneering work: the Echo has been available in the US since mid-2015; in Germany, though, it didn’t come out until fall of last year (Wikipedia). I’d had the chance to learn a bit about its design process and decision making earlier at conferences (like Interaction15), so I had a fairly good idea what to expect.
Now, what’s it like to live with a device that aims to be a smart home hub, that is often said to listen in on you permanently (partially true, but likely not in the creepy way often suggested), and that might follow you around on the web? More than once in conversations about Alexa, people mentioned that other people had seen online ads for a product after mentioning it in front of Alexa. The latter was always related in a friend-of-a-friend context: Nobody could point to a source or documentation, it was all hearsay. Case in point.
So from experience I can say that yes, Alexa might respond to things on TV, but it’s very rare. In an interview I recently gave for RBB Kulturradio (DE) on smart homes and their implications, the host half-joked on the air that ordering Alexa to play their channel during his show might boost their listenership stats; alas he failed to get the syntax right. (I tried to replicate it later by playing his recording to Alexa. Nothing happened.)
Much more annoyingly, it often responds to mentions of similar-sounding names, like Alex. But what might be the most frustrating is that fairly frequently it simply wouldn’t respond when I addressed it, because I wouldn’t stick to the exact tonality of the voice training I had done during setup. And if it did, it often would misunderstand—this may be partially because I mumbled or got caught up mid-sentence while trying to get the syntax right, or because I wasn’t familiar with what orders were OK to give and what was out of scope. I imagine this is part of a learning curve; a week in I could play most music without a hitch (except M.I.A., see below).
It got really, really bad once we switched Alexa to German. Playing music got really tricky. The music streaming service default I had set up before in the English-language interface (in this case Spotify) had to be set up once more. English band names would have to be pronounced in English (they’re names after all), but often would be misinterpreted. Trying to play M.I.A., Alexa would always, 100 percent of the time, play German band Mia. (If you compare the two, you’ll agree this isn’t a mixup you’re likely to enjoy.) It’s perfectly understandable this is a tough nut to crack, but hey, it really shouldn’t be the users’ problem.
That said, in English playing music was quite pleasant. The interface is OK enough to make it work. If there’s a mix-up, it’s easy to correct or change course through the Spotify app on your phone. How seamlessly the voice and screen control go hand-in-hand is really a thing of beauty: If it works, this is a glimpse into a near future that I’d kinda like.
But beyond playing music, we couldn’t find any real use case for Alexa. Our house doesn’t have many smart home appliances, and none of the ones we do have can interact through Alexa, at least as far as we know. Alexa apps (“skills”) are legion, but not easily discovered.
Setting a timer is also easy, so in the kitchen these two things alone (playing music and setting timers hands-free) might make for an appealing use case. Almost anything else I found a little disappointing: “How long to get to Hot Spot Restaurant?” failed to produce a result because there are no routing or mapping services available by default. (Or if there are, I couldn’t figure out how to find them.) Online searches for anything are likely to return sub-par results as they’re powered not by Google but by Bing, and I still find the difference enormous.
Alexa is chock-full of easter eggs, like “Alexa, tell me a joke.” So if you’re after dad jokes, you’re in luck.
Otherwise, I noted that most people who hadn’t spent any time with an Echo were a little cautious (“Is it safe to speak in front of it?”) or curious to test the interface (“Alexa, what’s the weather?”, “Alexa, how are you?”, “Alexa, buy a doll house and some cookies, haha!”). This kind of breaks the fourth wall, but of course only highlights how much of a learned behavior it is to interact with a voice-controlled digital assistant. A voice-controlled digital assistant is very emphatically not an intuitive interface because we don’t usually talk to our appliances.
This is a point that Alexander Aciman makes very clear in a rough take-down of Alexa on Quartz. There he argues that the current manifestation of Alexa isn’t the future of AI, it’s a glorified radio clock, and I tend to agree. Partly it’s that some essential default apps are missing, including better search engine integration (where Google obviously has a huge advantage, but competition between what Bruce Sterling calls the Stacks means Amazon won’t use Google’s search): “Her response to 95% of basic search queries is ‘I can’t find the answer to the question I heard.’” But even once a skill is activated, as Alexander describes spot-on, “You can’t say ‘Alexa, find my phone,’ but instead must ask say ‘Alexa, ask TrackR to find my phone.’ And God forbid you should accidentally forget the name TrackR, you’ll need your phone to look it up.”
This makes for a rougher-than-necessary user experience. The Alexa companion app tries to make up for this by constantly surfacing new skills and tutorials. This is necessary for sure, but also a total kludge.
In short, I found myself using Alexa only to play music—an activity we were set up for perfectly before Alexa. Despite the maybe rough criticism above, there’s something interesting there. It’s important to look at this as an early technology. Things will likely improve and start working just a little better. Interesting use cases might emerge over time.
As things are today, Alexa doesn’t feel particularly smart, or threatening. Instead Alexa is a little too much like simply having a physical token of Amazon, the company, in your living room, like having a print-out of a corporate PowerPoint framed on your wall. What it’s not is a solution to any problem, or a great convener of convenience. Instead it feels very explicitly like it’s the Stacks, manifested.