Amazon Alexa

I have a love & hate relationship with Alexa! Don't get fooled by her sweet-talking flirty voice, she is one complicated lady and that's why I have built ChipChop to deal with her (ChipChop doesn't mind, it's young, naive and most likely a boy ;-)

Joking aside, building an interface with Alexa wasn't a weekend job, it's genuinely a complex ecosystem that requires a thorough understanding of a number of technologies...and the patience of a saint!
If you've ever had to refer to the Apple iOS API documentation, then Amazon's Alexa docs are a 5 star luxury hotel...well, they could be, if it weren't for one tiny-tiney-miney f%#@* glitch that made me want to chew my own arms and hope that in the next life I return as something without a brain, maybe a turnip or a bottle of ketchup.

I will give you a really simplified background on how the interplay between your devices, Alexa & ChipChop works. It's still a bit to read, so if you don't have time or are not interested, jump straight to the Linking page in the menu.
I will of course taint all objective facts with little personal grumpy remarks and rude language as building this was an emotional journey for me.

Implementation

The ChipChop implementation of the communication protocols between your devices and Amazon Alexa is done properly: there are no hacks, third-party services, half-baked GitHub libraries or whatever lazy ass shortcuts people try to take so they can get things done quickly and most likely badly.



ChipChop has its own Amazon Lambda function living on AWS and its own Smart Home Skill for Alexa, approved and published with Amazon. It operates as an IoT Device Cloud and provides all the functionality that Amazon expects, including its own OAuth 2.0 server and lightning fast response times well below the allowed margins.

To give you an idea of how quickly things happen in the ChipChop world, when you press that 'Discover New Devices' button in the Alexa app the maximum allowed time to fully respond back is 8 seconds and that's including all travel time through the wires, servers, satellites, swamps and whatnot...twice!
ChipChop does it, even on a very very bad day, in under 200 milliseconds.
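
For the curious, here is roughly what a discovery reply looks like on the wire. This is a minimal Python sketch based on the public Alexa Smart Home API (version 3) message format, not the actual ChipChop Lambda. The function name, the endpoint id and the description text are made up for illustration.

```python
import time
import uuid

ALEXA_DEADLINE_SECONDS = 8.0  # the limit quoted above


def build_discovery_response(devices):
    """Build a Discover.Response event from (endpoint_id, friendly_name) pairs."""
    endpoints = []
    for endpoint_id, friendly_name in devices:
        endpoints.append({
            "endpointId": endpoint_id,            # hypothetical id
            "manufacturerName": "ChipChop",
            "friendlyName": friendly_name,
            "description": "Light connected through a device cloud",
            "displayCategories": ["LIGHT"],
            "capabilities": [
                {"type": "AlexaInterface", "interface": "Alexa", "version": "3"},
                {
                    "type": "AlexaInterface",
                    "interface": "Alexa.PowerController",
                    "version": "3",
                    "properties": {
                        "supported": [{"name": "powerState"}],
                        "proactivelyReported": False,
                        "retrievable": True,
                    },
                },
            ],
        })
    return {
        "event": {
            "header": {
                "namespace": "Alexa.Discovery",
                "name": "Discover.Response",
                "payloadVersion": "3",
                "messageId": str(uuid.uuid4()),
            },
            "payload": {"endpoints": endpoints},
        }
    }


# Time the build step only; a real round trip also includes the network.
started = time.monotonic()
response = build_discovery_response([("bedroom-light-01", "Bedroom Light")])
elapsed = time.monotonic() - started
print(f"built in {elapsed * 1000:.2f} ms, deadline is {ALEXA_DEADLINE_SECONDS:.0f} s")
```

Building the document is the cheap part. The real time goes into looking up your devices and shipping the answer back, which is where that 8 second limit comes in.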


Security

All communication has to use the OAuth 2.0 protocol so there is a fuck ton of security tokens being exchanged, renewed, encrypted and decrypted every second. The word "token" has become a swear word in my vocabulary, seriously, you should see these things and how they pollute poor little ChipChop.

Also, with any love correspondence with lady Alexa the primary requirement is of course the use of HTTPS (protection kids, protection...no old school touching over dirty plain HTTP)

So, yeah, your little Arduinos, ESPs, TinyMinyMinos and whatevers are well taken care of.
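
To make the "renewed" part of the token circus a bit more concrete, here is a small Python sketch of the usual pattern: keep the current access token and fetch a fresh one shortly before it expires. This is an illustration only, not ChipChop's code. The class name, the 60 second safety margin, the 3600 second lifetime and the fake fetch function are all assumptions.

```python
import time


class TokenCache:
    """Keep one access token and renew it shortly before it expires."""

    def __init__(self, fetch_token, refresh_margin_seconds=60, clock=time.monotonic):
        self._fetch_token = fetch_token   # callable returning (token, lifetime_seconds)
        self._margin = refresh_margin_seconds
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        # Renew when there is no token yet or the current one is about to expire.
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch_token()
            self._token = token
            self._expires_at = now + lifetime
        return self._token


# Stand-in for a real call to an OAuth 2.0 token endpoint (hypothetical).
issued = []


def fake_fetch_token():
    issued.append(f"token-{len(issued) + 1}")
    return issued[-1], 3600


# A fake clock so the example does not have to wait an hour.
fake_now = [0.0]
cache = TokenCache(fake_fetch_token, clock=lambda: fake_now[0])

first = cache.get()     # nothing cached yet, so a token is fetched
second = cache.get()    # still fresh, so the cached token is reused
fake_now[0] = 3590.0    # now inside the 60 second safety margin
third = cache.get()     # close to expiry, so a new token is fetched
```

Renewing a little early means a request never goes out carrying a token that expires while it is still in flight.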


Communication

In a nutshell, and to shatter any illusions, Alexa does not talk to your devices directly...it all works through various device clouds.
Note: There are a couple of exceptions to this, see the comment at the end of the page

There are a bunch of different scenarios for how the communication between your Arduino smart doodah and Alexa can flow.

I will describe just one scenario, switching a light ON on your device by telling Alexa.
Disclaimer: To get the point across the terminology is heavily simplified and some steps are skipped, inaccurate or slightly dramatised
  1. You say: 'Alexa, turn the Bedroom Light ON' ('Bedroom Light' is your little device)
  2. Alexa sends your voice to Amazon to be processed
  3. Amazon's mega-brain voice AI (or could be some dude with very good hearing) figures out what you've said and sends that command to some Alexa Server (where Alexa actually lives...not in your house)
  4. Alexa Server fires up the ChipChop Lambda and passes the command with a bunch of security tokens
  5. ChipChop Lambda connects to the real ChipChop server and passes again all the data
  6. ChipChop figures out which device this is all about, checks if it's connected, active etc.
  7. Finds the socket connection with your device and sends the command
  8. Your device figures out what it needs to do
  9. Your device turns its little light ON
  10. ChipChop checks if you have any Actions that need to be run and sends any commands to other devices
  11. Compiles a response document to send back to the correct Alexa gateway for your location
  12. Requests a new security token
  13. Goes for a pee whilst waiting for the new token to arrive
  14. Signs the response document and sends the response back to Amazon
  15. Amazon gateway sends the response to the Alexa Server
  16. Alexa Server checks if there are any routines that need to be triggered
  17. If there are, finds any other devices and sends commands to their manufacturer clouds
  18. Alexa Server tells the Alexa App to update the device status to ON and tells the Alexa in your house that the job is done
  19. Alexa in your house makes a little 'ting' sound and flashes its lights to let you know it's all done

From Step 1 when you've finished saying the word 'ON' to Step 9 with my devices it takes around 300-400 milliseconds...
to reach Step 18 who knows, maybe a second or so, depends how complicated you've made things and if you are also firing devices that are not yours so information has to travel all across the world and back.
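
If you'd like to see what actually travels between Alexa and a device cloud in the middle of that journey (roughly Steps 4 to 14), here is a trimmed Python sketch. The message layout follows the public Alexa Smart Home API (version 3) for the PowerController interface, but it is only a sketch and not the real ChipChop code. The handler name, the send_to_device callback, the endpoint id and all token values are placeholders, and the signing and gateway steps are left out.

```python
from datetime import datetime, timezone
import uuid


def handle_turn_on(directive, send_to_device):
    """Forward a TurnOn directive to the device and build the reply for Alexa."""
    header = directive["directive"]["header"]
    endpoint = directive["directive"]["endpoint"]

    # Steps 6 to 9: find the device and tell it to switch its light on.
    send_to_device(endpoint["endpointId"], "ON")

    # Step 11: compile the response document reporting the new state.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "event": {
            "header": {
                "namespace": "Alexa",
                "name": "Response",
                "payloadVersion": "3",
                "messageId": str(uuid.uuid4()),
                # Echoed back so Alexa can match the reply to the request.
                "correlationToken": header["correlationToken"],
            },
            "endpoint": {"endpointId": endpoint["endpointId"]},
            "payload": {},
        },
        "context": {
            "properties": [{
                "namespace": "Alexa.PowerController",
                "name": "powerState",
                "value": "ON",
                "timeOfSample": now,
                "uncertaintyInMilliseconds": 500,
            }]
        },
    }


# A made-up incoming directive, as it might arrive in Step 4.
directive = {
    "directive": {
        "header": {
            "namespace": "Alexa.PowerController",
            "name": "TurnOn",
            "payloadVersion": "3",
            "messageId": "example-message-id",
            "correlationToken": "example-correlation-token",
        },
        "endpoint": {
            "scope": {"type": "BearerToken", "token": "example-access-token"},
            "endpointId": "bedroom-light-01",
        },
        "payload": {},
    }
}

sent = []  # stands in for the socket connection to the device
reply = handle_turn_on(directive, lambda device_id, command: sent.append((device_id, command)))
```

The correlationToken is the bit that lets Alexa tie the reply back to the original 'turn the Bedroom Light ON' request.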


That's it, you know all the secrets now!... It's all clear as mud and you can safely proceed to the next section Linking


p.s. For the sake of completeness, at the time of writing this it has just become possible to use Zigbee-enabled hardware or Bluetooth Low Energy and use an Echo device as a hub, but it does come with its own limitations.
If you are happy with basic "switch On"/"switch Off" stuff then go for it, if you want freedom to tinker and evolve then stick with ChipChop.