The tech press have had mixed opinions about Apple’s iPhone 4S since it was announced in early October and went on sale last week. While journalists have been busy debating whether people would actually buy it (turns out, they did), there hasn’t been much for software enthusiasts to get their teeth into.
While a dual-core chip is welcome, it’s no longer mind-blowing in a mobile phone: the Galaxy Nexus, due to be announced tomorrow, is expected to have a 1.2GHz dual-core processor. Far and away the most interesting innovation is Siri, a built-in feature of iOS 5 (but only on the iPhone 4S) that many are claiming to be next-generation AI for the masses (and that is providing laughs as an internet meme).
Apple are tight-lipped about Siri’s technology, as you might expect. However, some inferences can be made about how it probably works, and it’s not the AI marvel that Apple fans have been proclaiming.
According to Apple’s own site about Siri, “Siri on iPhone 4S lets you use your voice to send messages, schedule meetings, place phone calls, and more. Ask Siri to do things just by talking the way you talk. Siri understands what you say, knows what you mean, and even talks back. Siri is so easy to use and does so much, you’ll keep finding more and more ways to use it.” Once you take a look at this list (which is actually pretty much complete, despite the “does so much” marketing line), you start to realise that Siri’s architecture can be broken down quite easily, and that it is not as smart as some make it out to be.
First of all, Siri requires speech-to-text translation. Despite decades of work in the area, speech recognition in computers is still horribly unreliable. Consider Siri itself: Gizmodo’s Australian contributor had real trouble getting Siri to recognise what he said when it was set to US English. Siri’s Tom Gruber, speaking in early 2010 before the company was acquired by Apple in April of that year, said in an interview that “[Siri] uses third party speech systems, and are architected so we can swap them out and experiment.” Whether it still uses third parties remains to be seen, but one year doesn’t seem like long enough for Apple to build up its own speech corpus. The recognition task is also made easier because Siri’s corpus of knowledge may be limited to the actions and people it is expected to recognise.
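To see why a limited corpus helps, here is a toy illustration (not Apple’s actual approach, and the phrase list is invented): when the recogniser only ever expects a small grammar of commands, a noisy transcription can simply be snapped to the closest expected phrase.

```python
# Toy sketch: a closed vocabulary makes recognition errors recoverable,
# because a garbled transcription can be matched to the nearest phrase
# the system expects to hear. All phrases here are hypothetical.
import difflib

# Hypothetical set of phrases a constrained recogniser expects.
EXPECTED_PHRASES = [
    "call mom",
    "set a reminder",
    "schedule a meeting",
    "send a message",
]

def snap_to_grammar(noisy_transcription):
    """Return the closest expected phrase, or None if nothing is close."""
    matches = difflib.get_close_matches(
        noisy_transcription.lower(), EXPECTED_PHRASES, n=1, cutoff=0.6
    )
    return matches[0] if matches else None

print(snap_to_grammar("shedule a meating"))          # -> "schedule a meeting"
print(snap_to_grammar("what is the meaning of life"))  # -> None
```

An open-vocabulary system has no such safety net, which is part of why “in the wild” dictation remains so much harder than command recognition.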
It’s also worth noting that the speech recognition is most likely not done on the device itself. Apple’s servers are involved in translating speech to text, and Siri has been shown to struggle when those servers are under heavy load.
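The client/server split implied above can be sketched as follows; everything here is hypothetical (the real protocol is unknown), with a stub standing in for the remote recogniser. The point is simply that a network hop means the feature degrades when the server does.

```python
# Minimal sketch (all names invented) of the client/server split:
# the handset ships audio to a remote recogniser and must cope with
# the server being overloaded or unreachable.

def remote_transcribe(audio_bytes, server_busy=False):
    """Stub standing in for the server-side recogniser."""
    if server_busy:
        raise TimeoutError("recognition service under heavy load")
    return "schedule a meeting"  # pretend transcription

def recognise(audio_bytes, server_busy=False):
    """Client-side wrapper: fail gracefully when the server is down."""
    try:
        return remote_transcribe(audio_bytes, server_busy)
    except TimeoutError:
        # Mirrors the observed behaviour: Siri simply apologises
        # when Apple's servers cannot be reached.
        return None

print(recognise(b"...audio..."))                     # -> "schedule a meeting"
print(recognise(b"...audio...", server_busy=True))   # -> None
```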
Once the speech-to-text is complete, Siri needs to perform lexical analysis of the text to work out what you want to do. For an “in the wild” AI this would be extremely challenging: even if you can break down the structure of the sentence (quite easy to do with a good dictionary), there are so many possible actions, objects, and permutations that it would be very difficult to act upon them. Siri, however, has the advantage that (people asking it stupid things aside) the set of actions it can be asked to perform is fairly limited: essentially the phone actions outlined above. Some cleverness is required to map oddly-phrased sentences onto those actions, and some knowledge of relationships between people is needed (e.g. “call Mom”, which allegedly draws on the “Relationship” field in the phone’s contacts list), but there are actually not that many ways of asking for a reminder to be set or a calendar entry to be added.
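A toy, pattern-based parser makes the point concrete. This is emphatically not Siri’s real grammar; the templates, intent names, and the “Relationship” lookup are all invented for illustration. With a closed set of actions, each one needs only a handful of phrase templates rather than full natural-language understanding.

```python
# Toy intent parser: a few regex templates per action suffice when the
# set of possible actions is small. All patterns and names are invented.
import re

INTENT_PATTERNS = [
    ("call",     re.compile(r"^(?:call|phone|ring) (?P<who>.+)$")),
    ("remind",   re.compile(r"^remind me to (?P<what>.+)$")),
    ("schedule", re.compile(r"^(?:schedule|set up) a meeting (?P<when>.+)$")),
]

# Stand-in for the contacts "Relationship" field mentioned above.
RELATIONSHIPS = {"mom": "Jane Appleseed"}

def parse_intent(text):
    """Return (intent, slots) for recognised phrasings, else (None, {})."""
    text = text.lower().strip()
    for intent, pattern in INTENT_PATTERNS:
        m = pattern.match(text)
        if m:
            slots = m.groupdict()
            # Resolve "mom" etc. to an actual contact where possible.
            if "who" in slots:
                slots["who"] = RELATIONSHIPS.get(slots["who"], slots["who"])
            return intent, slots
    return None, {}

print(parse_intent("Call Mom"))               # -> ('call', {'who': 'Jane Appleseed'})
print(parse_intent("remind me to buy milk"))  # -> ('remind', {'what': 'buy milk'})
```

Real systems use statistical models rather than bare regexes, but the constraint is the same: a small action set bounds the language the parser must cope with.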
After Siri has determined what you want to do (or returned a cynical comment in response to your crude joke), it is a simple step to perform the action through one of the APIs it has access to. It is the number of on-device APIs that Siri can use (and nobody else can), plus the number of clever web services it can call on for answers (see Wolfram Alpha), that set it apart and make it seem much cleverer to the end user than it really is.
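That final step really is just function dispatch, which a short sketch can show. The handler names and messages below are hypothetical, not Apple’s private APIs.

```python
# Sketch of the last stage: a parsed intent is routed to the relevant
# (here invented) device API via a plain dispatch table, not AI.

def place_call(who):
    return f"dialling {who}"

def create_reminder(what):
    return f"reminder set: {what}"

# Dispatch table mapping parsed intents to hypothetical device APIs.
HANDLERS = {
    "call": place_call,
    "remind": create_reminder,
}

def perform(intent, slots):
    handler = HANDLERS.get(intent)
    if handler is None:
        # Fallback for requests outside the supported action set.
        return "Sorry, I can't help you with that."
    return handler(**slots)

print(perform("call", {"who": "Jane Appleseed"}))  # -> "dialling Jane Appleseed"
print(perform("dance", {}))                        # -> fallback message
```

Seen this way, the magic lives in the privileged hooks into the phone and the third-party services behind them, not in the dispatch itself.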
So while Siri really is a big step towards making AI commonplace on people’s devices, it has unfortunately suffered from the over-hype that a lot of Apple launches get in the media, and its abilities have been somewhat blown out of proportion.