Amazon Alexa Integration


Alexa Client Implementation on Loomo

Get access to the Alexa Voice Service within the robot. Result: we want to use Loomo as an Alexa client and use all the skills that are available on any other Alexa device.

Control the Robot Using Alexa

We want to be able to talk to the Alexa service on Loomo and trigger commands that are fulfilled by Loomo, e.g. navigation tasks.

Alexa Client Implementation on Loomo

Project Documentation

In general, the integration of Alexa is basically the same as developing an Android app with Alexa functionality, based on Loomo's Android device (built in at the top of the unit). But you have to take care of two things: 1. Use the correct API level (22) for building your app. 2. Import the latest Loomo SDK dependencies, which you can find here: Segway Robotics Developer IDE Setup

The general documentation on the Amazon Developer Website is pretty good, but there are no sample code snippets for any voice interaction; you only see the JSON files.

Create a test App with the option to use LWA (Login with Amazon) libraries

A detailed description of this step can be found here: Amazon Developer Install SDK for Android Documentation


Step 3: Amazon's documentation about the key extraction process for your local keys is not completely correct. It is easier to use the built-in Gradle signingReport task; you can find it in the Gradle pane of Android Studio in the folder Run Configurations.
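Assuming a standard Android Studio project, the same report can also be run from the command line:

```shell
# Prints the MD5/SHA1 fingerprints of the signing certificate for every build
# variant; the SHA1 value is what the LWA security profile registration asks for.
./gradlew signingReport
```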

Step 5: It is not necessary to use an ImageButton with different sizes, colours and formats. It is easier to use a standard Button or, for recording purposes, an OnTouchListener.
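For a push-to-talk button, the touch wiring might look roughly like this (a sketch; the recorder field and button name are illustrative, assuming a RawAudioRecorder like the one shown later):

```java
// Hold-to-record: start on press, stop and ship the audio on release.
recordButton.setOnTouchListener(new View.OnTouchListener() {
    @Override
    public boolean onTouch(View v, MotionEvent event) {
        switch (event.getAction()) {
            case MotionEvent.ACTION_DOWN:
                mRecorder.start();   // begin filling the raw PCM buffer
                return true;
            case MotionEvent.ACTION_UP:
                mRecorder.stop();    // stop and send the recording to AVS
                return true;
        }
        return false;
    }
});
```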

Enhance this app to use AVS (Alexa Voice Service)

Detailed documentation about this topic: Amazon Developer Documentation – Authorize on Product

You have to set up a PRODUCT_ID and a PRODUCT_DSN (Device Serial Number). These two strings can be chosen as you like, but you have to create both of them.
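As a sketch of where the two identifiers end up (the values below are placeholders, not the project's real ones), they are embedded in the scope data passed to the LWA authorize call, roughly as described in Amazon's companion-app authorization documentation:

```java
public class AvsScope {
    // Placeholder identifiers: PRODUCT_ID must match the product registered in
    // the Amazon developer console; the DSN can be any string unique per device.
    static final String PRODUCT_ID  = "my_loomo_product";
    static final String PRODUCT_DSN = "loomo-dsn-0001";

    // Builds the "alexa:all" scope data JSON sent with the LWA authorize request.
    static String scopeData() {
        return "{\"alexa:all\":{"
                + "\"productID\":\"" + PRODUCT_ID + "\","
                + "\"productInstanceAttributes\":{"
                + "\"deviceSerialNumber\":\"" + PRODUCT_DSN + "\"}}}";
    }
}
```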

More information about AVS can be found here: Amazon Developer – Get started with AVS

Implementation of a “Raw Audio Recorder”

To communicate with Alexa it is necessary to send a raw audio stream to the Alexa backend, therefore we need a raw audio recorder. The major difference from the integrated Android recording options, which are much easier to handle, is the file compression: raw audio material sent to AVS cannot be MP3, because that is compressed audio. The implementation can be found below:

package tudresden.loomo_tud_v1;

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import android.util.Log;

/**
 * Created by danielschneegast on 26.02.18.
 *
 * Records raw audio using AudioRecord and stores it into a byte array as
 * - signed
 * - 16-bit
 * - little endian
 * - mono
 * - 16kHz (recommended, but a different sample rate can be specified in the constructor)
 *
 * For example, the corresponding arecord settings are
 * arecord --file-type raw --format=S16_LE --channels 1 --rate 16000
 *
 * TODO: maybe use: ByteArrayOutputStream
 *
 * @author Kaarel Kaljurand
 */
public class RawAudioRecorder {

    private static final String LOG_TAG = RawAudioRecorder.class.getName();

    private static final int DEFAULT_AUDIO_SOURCE = MediaRecorder.AudioSource.VOICE_RECOGNITION;
    private static final int DEFAULT_SAMPLE_RATE = 16000;

    private static final int RESOLUTION = AudioFormat.ENCODING_PCM_16BIT;
    private static final short RESOLUTION_IN_BYTES = 2;

    // Number of channels (MONO = 1, STEREO = 2)
    private static final short CHANNELS = 1;

    public enum State {
        // recorder is ready, but not yet recording
        READY,
        // recorder recording
        RECORDING,
        // error occurred, reconstruction needed
        ERROR,
        // recorder stopped
        STOPPED
    }

    private AudioRecord mRecorder = null;

    private double mAvgEnergy = 0;

    private final int mSampleRate;
    private final int mOneSec;

    // Recorder state
    private State mState;

    // Buffer size
    private int mBufferSize;

    // Number of frames written to byte array on each output
    private int mFramePeriod;

    // The complete space into which the recording in written.
    // Its maximum length is about:
    // 2 (bytes) * 1 (channels) * 30 (max rec time in seconds) * 44100 (times per second) = 2 646 000 bytes
    // but typically is:
    // 2 (bytes) * 1 (channels) * 20 (max rec time in seconds) * 16000 (times per second) = 640 000 bytes
    private final byte[] mRecording;

    // TODO: use: mRecording.length instead
    private int mRecordedLength = 0;

    // The number of bytes the client has already consumed
    private int mConsumedLength = 0;

    // Buffer for output
    private byte[] mBuffer;

    /**
     * Instantiates a new recorder and sets the state to READY.
     * In case of errors, no exception is thrown, but the state is set to ERROR.
     *
     * Android docs say: 44100Hz is currently the only rate that is guaranteed to work on all devices,
     * but other rates such as 22050, 16000, and 11025 may work on some devices.
     *
     * @param audioSource Identifier of the audio source (e.g. microphone)
     * @param sampleRate Sample rate (e.g. 16000)
     */
    public RawAudioRecorder(int audioSource, int sampleRate) {
        mSampleRate = sampleRate;
        // E.g. 1 second of 16kHz 16-bit mono audio takes 32000 bytes.
        mOneSec = RESOLUTION_IN_BYTES * CHANNELS * mSampleRate;
        // TODO: replace 35 with the max length of the recording (as specified in the settings)
        mRecording = new byte[mOneSec * 35];
        try {
            setBufferSizeAndFramePeriod();
            mRecorder = new AudioRecord(audioSource, mSampleRate, AudioFormat.CHANNEL_IN_MONO, RESOLUTION, mBufferSize);
            if (getAudioRecordState() != AudioRecord.STATE_INITIALIZED) {
                throw new Exception("AudioRecord initialization failed");
            }
            mBuffer = new byte[mFramePeriod * RESOLUTION_IN_BYTES * CHANNELS];
            setState(State.READY);
        } catch (Exception e) {
            if (e.getMessage() == null) {
                Log.e(LOG_TAG, "Unknown error occurred while initializing recording");
            } else {
                Log.e(LOG_TAG, e.getMessage());
            }
            setState(State.ERROR);
        }
    }

    public RawAudioRecorder(int sampleRate) {
        this(DEFAULT_AUDIO_SOURCE, sampleRate);
    }

    public RawAudioRecorder() {
        this(DEFAULT_AUDIO_SOURCE, DEFAULT_SAMPLE_RATE);
    }

    private int read(AudioRecord recorder) {
        // public int read (byte[] audioData, int offsetInBytes, int sizeInBytes)
        int numberOfBytes =, 0, mBuffer.length); // Fill buffer

        // Some error checking
        if (numberOfBytes == AudioRecord.ERROR_INVALID_OPERATION) {
            Log.e(LOG_TAG, "The AudioRecord object was not properly initialized");
            return -1;
        } else if (numberOfBytes == AudioRecord.ERROR_BAD_VALUE) {
            Log.e(LOG_TAG, "The parameters do not resolve to valid data and indexes.");
            return -2;
        } else if (numberOfBytes > mBuffer.length) {
            Log.e(LOG_TAG, "Read more bytes than is buffer length:" + numberOfBytes + ": " + mBuffer.length);
            return -3;
        } else if (numberOfBytes == 0) {
            Log.e(LOG_TAG, "Read zero bytes");
            return -4;
        }
        // Everything seems to be OK, adding the buffer to the recording.
        add(mBuffer);
        return 0;
    }

    private void setBufferSizeAndFramePeriod() {
        int minBufferSizeInBytes = AudioRecord.getMinBufferSize(mSampleRate, AudioFormat.CHANNEL_IN_MONO, RESOLUTION);
        if (minBufferSizeInBytes == AudioRecord.ERROR_BAD_VALUE) {
            throw new IllegalArgumentException("AudioRecord.getMinBufferSize: parameters not supported by hardware");
        } else if (minBufferSizeInBytes == AudioRecord.ERROR) {
            Log.e(LOG_TAG, "AudioRecord.getMinBufferSize: unable to query hardware for output properties");
            // ~120 ms fallback; written as rate * 120 / 1000 so integer division does not truncate to 0
            minBufferSizeInBytes = mSampleRate * 120 / 1000 * RESOLUTION_IN_BYTES * CHANNELS;
        }
        mBufferSize = 2 * minBufferSizeInBytes;
        mFramePeriod = mBufferSize / (2 * RESOLUTION_IN_BYTES * CHANNELS);
        Log.i(LOG_TAG, "AudioRecord buffer size: " + mBufferSize + ", min size = " + minBufferSizeInBytes);
    }

    /**
     * @return recorder state
     */
    public State getState() {
        return mState;
    }

    private void setState(State state) {
        mState = state;
    }

    /**
     * @return bytes that have been recorded since the beginning
     */
    public byte[] getCompleteRecording() {
        return getCurrentRecording(0);
    }

    /**
     * @return bytes that have been recorded since the beginning, with wav-header
     */
    public byte[] getCompleteRecordingAsWav() {
        return getRecordingAsWav(getCompleteRecording(), mSampleRate);
    }

    public static byte[] getRecordingAsWav(byte[] pcm, int sampleRate) {
        int headerLen = 44;
        int byteRate = sampleRate * RESOLUTION_IN_BYTES; // mSampleRate*(16/8)*1 ???
        int totalAudioLen = pcm.length;
        int totalDataLen = totalAudioLen + headerLen;

        byte[] header = new byte[headerLen];

        header[0] = 'R';  // RIFF/WAVE header
        header[1] = 'I';
        header[2] = 'F';
        header[3] = 'F';
        header[4] = (byte) (totalDataLen & 0xff);
        header[5] = (byte) ((totalDataLen >> 8) & 0xff);
        header[6] = (byte) ((totalDataLen >> 16) & 0xff);
        header[7] = (byte) ((totalDataLen >> 24) & 0xff);
        header[8] = 'W';
        header[9] = 'A';
        header[10] = 'V';
        header[11] = 'E';
        header[12] = 'f';  // 'fmt ' chunk
        header[13] = 'm';
        header[14] = 't';
        header[15] = ' ';
        header[16] = 16;  // 4 bytes: size of 'fmt ' chunk
        header[17] = 0;
        header[18] = 0;
        header[19] = 0;
        header[20] = 1;  // format = 1
        header[21] = 0;
        header[22] = (byte) CHANNELS;
        header[23] = 0;
        header[24] = (byte) (sampleRate & 0xff);
        header[25] = (byte) ((sampleRate >> 8) & 0xff);
        header[26] = (byte) ((sampleRate >> 16) & 0xff);
        header[27] = (byte) ((sampleRate >> 24) & 0xff);
        header[28] = (byte) (byteRate & 0xff);
        header[29] = (byte) ((byteRate >> 8) & 0xff);
        header[30] = (byte) ((byteRate >> 16) & 0xff);
        header[31] = (byte) ((byteRate >> 24) & 0xff);
        header[32] = (byte) (CHANNELS * RESOLUTION_IN_BYTES);  // block align = channels * bytes per sample
        header[33] = 0;
        header[34] = (byte) (8 * RESOLUTION_IN_BYTES);  // bits per sample
        header[35] = 0;
        header[36] = 'd';
        header[37] = 'a';
        header[38] = 't';
        header[39] = 'a';
        header[40] = (byte) (totalAudioLen & 0xff);
        header[41] = (byte) ((totalAudioLen >> 8) & 0xff);
        header[42] = (byte) ((totalAudioLen >> 16) & 0xff);
        header[43] = (byte) ((totalAudioLen >> 24) & 0xff);

        byte[] wav = new byte[header.length + pcm.length];
        System.arraycopy(header, 0, wav, 0, header.length);
        System.arraycopy(pcm, 0, wav, header.length, pcm.length);
        return wav;
    }

    /**
     * @return bytes that have been recorded since this method was last called
     */
    public synchronized byte[] consumeRecording() {
        byte[] bytes = getCurrentRecording(mConsumedLength);
        Log.i(LOG_TAG, "Copied from: " + mConsumedLength + ": " + bytes.length + " bytes");
        mConsumedLength = mRecordedLength;
        return bytes;
    }

    /**
     * Returns the recorded bytes since the last call, and resets the recording.
     *
     * @return bytes that have been recorded since this method was last called
     */
    public synchronized byte[] consumeRecordingAndTruncate() {
        byte[] bytes = getCurrentRecording(mConsumedLength);
        Log.i(LOG_TAG, "Copied from position: " + mConsumedLength + ": " + bytes.length + " bytes");
        mRecordedLength = 0;
        mConsumedLength = mRecordedLength;
        return bytes;
    }

    private byte[] getCurrentRecording(int startPos) {
        int len = getLength() - startPos;
        byte[] bytes = new byte[len];
        System.arraycopy(mRecording, startPos, bytes, 0, len);
        return bytes;
    }

    public int getLength() {
        return mRecordedLength;
    }

    /**
     * @return true iff a speech-ending pause has occurred at the end of the recorded data
     */
    public boolean isPausing() {
        double pauseScore = getPauseScore();
        Log.i(LOG_TAG, "Pause score: " + pauseScore);
        return pauseScore > 7;
    }

    /**
     * @return volume indicator that shows the average volume of the last read buffer
     */
    public float getRmsdb() {
        long sumOfSquares = getRms(mRecordedLength, mBuffer.length);
        double rootMeanSquare = Math.sqrt(sumOfSquares / (mBuffer.length / 2));
        if (rootMeanSquare > 1) {
            // TODO: why 10?
            return (float) (10 * Math.log10(rootMeanSquare));
        }
        return 0;
    }

    /**
     * In order to calculate if the user has stopped speaking we take the
     * data from the last second of the recording, map it to a number
     * and compare this number to the numbers obtained previously. We
     * return a confidence score (0-INF) of a longer pause having occurred in the
     * speech input.
     *
     * TODO: base the implementation on some well-known technique.
     *
     * @return positive value which the caller can use to determine if there is a pause
     */
    private double getPauseScore() {
        long t2 = getRms(mRecordedLength, mOneSec);
        if (t2 == 0) {
            return 0;
        }
        double t = mAvgEnergy / t2;
        mAvgEnergy = (2 * mAvgEnergy + t2) / 3;
        return t;
    }

    /**
     * Sum of squares of the 16-bit samples in the given span of the recording.
     * (Helper used by getRmsdb() and getPauseScore(); missing from the original
     * listing, reconstructed here.)
     */
    private long getRms(int end, int span) {
        int begin = end - span;
        if (begin < 0) {
            begin = 0;
        }
        // make sure begin is even, so that samples stay aligned
        if (0 != (begin % 2)) {
            begin++;
        }
        long sum = 0;
        for (int i = begin; i + 1 < end; i += 2) {
            short curSample = getShort(mRecording[i], mRecording[i + 1]);
            sum += curSample * curSample;
        }
        return sum;
    }

    /**
     * Stops the recording (if needed) and releases the resources.
     * The object can no longer be used and the reference should be
     * set to null after a call to release().
     */
    public synchronized void release() {
        if (mRecorder != null) {
            if (mRecorder.getRecordingState() == AudioRecord.RECORDSTATE_RECORDING) {
                stop();
            }
            mRecorder.release();
            mRecorder = null;
        }
    }

    /**
     * Starts the recording, and sets the state to RECORDING.
     */
    public void start() {
        if (getAudioRecordState() == AudioRecord.STATE_INITIALIZED) {
            mRecorder.startRecording();
            if (mRecorder.getRecordingState() == AudioRecord.RECORDSTATE_RECORDING) {
                setState(State.RECORDING);
                new Thread() {
                    public void run() {
                        while (mRecorder != null && mRecorder.getRecordingState() == AudioRecord.RECORDSTATE_RECORDING) {
                            int status = read(mRecorder);
                            if (status < 0) {
                                handleError();
                                break;
                            }
                        }
                    }
                }.start();
            } else {
                Log.e(LOG_TAG, "startRecording() failed");
                handleError();
            }
        } else {
            Log.e(LOG_TAG, "start() called on illegal state");
            handleError();
        }
    }

    /**
     * Stops the recording, and sets the state to STOPPED.
     * If stopping fails then sets the state to ERROR.
     */
    public void stop() {
        // We check the underlying AudioRecord state trying to avoid IllegalStateException.
        // If it still occurs then we catch it.
        if (getAudioRecordState() == AudioRecord.STATE_INITIALIZED
                && mRecorder.getRecordingState() == AudioRecord.RECORDSTATE_RECORDING) {
            try {
                mRecorder.stop();
                setState(State.STOPPED);
            } catch (IllegalStateException e) {
                Log.e(LOG_TAG, "native stop() called in illegal state: " + e.getMessage());
                handleError();
            }
        } else {
            Log.e(LOG_TAG, "stop() called in illegal state");
            handleError();
        }
    }

    /**
     * Copy the given byte array into the total recording array.
     *
     * The total recording array has been pre-allocated (e.g. for 35 seconds of audio).
     * If it gets full then the recording is stopped.
     *
     * @param buffer audio buffer
     */
    private void add(byte[] buffer) {
        if (mRecording.length >= mRecordedLength + buffer.length) {
            // arraycopy(Object src, int srcPos, Object dest, int destPos, int length)
            System.arraycopy(buffer, 0, mRecording, mRecordedLength, buffer.length);
            mRecordedLength += buffer.length;
        } else {
            // This also happens on the emulator for some reason
            Log.e(LOG_TAG, "Recorder buffer overflow: " + mRecordedLength);
            stop();
        }
    }

    /**
     * Converts two bytes to a short, assuming that the 2nd byte is
     * more significant (LITTLE_ENDIAN format).
     * E.g. 255 | (255 << 8) == 65535
     */
    private static short getShort(byte argB1, byte argB2) {
        // mask the low byte to avoid sign extension
        return (short) ((argB1 & 0xff) | (argB2 << 8));
    }

    private void handleError() {
        // assumption: on error we release the recorder and flag the state
        release();
        setState(State.ERROR);
    }

    private int getAudioRecordState() {
        if (mRecorder == null) {
            return AudioRecord.STATE_UNINITIALIZED;
        }
        return mRecorder.getState();
    }
}

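The 44-byte WAV header written by getRecordingAsWav() can be sanity-checked with a small standalone sketch (mono/16-bit assumed; note that the canonical RIFF size field is the total file length minus 8, while the class above writes the full data length, which most players tolerate):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WavHeaderCheck {
    // Canonical 44-byte WAV header for 16-bit mono PCM of the given length.
    static byte[] header(int pcmLen, int sampleRate) {
        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put(new byte[]{'R', 'I', 'F', 'F'});
        b.putInt(36 + pcmLen);              // RIFF chunk size = file length - 8
        b.put(new byte[]{'W', 'A', 'V', 'E'});
        b.put(new byte[]{'f', 'm', 't', ' '});
        b.putInt(16);                       // size of the 'fmt ' chunk
        b.putShort((short) 1);              // audio format 1 = PCM
        b.putShort((short) 1);              // channels = mono
        b.putInt(sampleRate);
        b.putInt(sampleRate * 2);           // byte rate = rate * channels * 2 bytes
        b.putShort((short) 2);              // block align = channels * 2 bytes
        b.putShort((short) 16);             // bits per sample
        b.put(new byte[]{'d', 'a', 't', 'a'});
        b.putInt(pcmLen);                   // size of the raw PCM data
        return b.array();
    }
}
```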
Integrate a Library for stable communication

Using this library it is possible to communicate with Alexa if you follow the readme instructions: GitHub Repo

API Documentation for this repository: Willblaschko API Documentation

After testing with the LWA and AVS projects and descriptions mentioned above, please delete the LWA jar file or other links from your application-level build.gradle file! If you forget this, you will receive a multiple-dex error from the Java framework.

Furthermore, you need to write your own getRequestCallback() function and add a few lines to the checkQueue() function for correct functionality.

source code getRequestCallback()

public AsyncCallback<AvsResponse, Exception> getRequestCallback() {

    AsyncCallback<AvsResponse, Exception> requestCallback = new AsyncCallback<AvsResponse, Exception>() {
        @Override
        public void start() {
        }

        @Override
        public void success(AvsResponse result) {
            Log.i(TAG, "Voice Success");
        }

        @Override
        public void failure(Exception error) {
        }

        @Override
        public void complete() {
        }
    };

    return requestCallback;
}

Change Alexa’s language

Changing Alexa’s language is not supported by the imported GitHub repo mentioned above, but it can be implemented manually. Supported languages are English (US, IN, AUS, GB), German and Japanese. Note: these are not the only languages supported by Alexa in general, but they are the only languages you can choose using your AVS JSON requests.

The complete documentation for the “SettingsUpdated” method can be found here: Amazon AVS SettingsInterface

package tudresden.loomo_tud_v1;

import java.util.UUID;

/**
 * Created by danielschneegast on 19.03.18.
 */
public class JSONinputDE {
    private String jSONString = "{\n" +
            "  \"event\": {\n" +
            "    \"header\": {\n" +
            "      \"namespace\": \"Settings\", \n" +
            "      \"name\": \"SettingsUpdated\", \n" +
            "      \"messageId\": \"18f2ff20-824a-4823-bbf9-5e7d4975fdf4\"\n" +
            "    }, \n" +
            "    \"payload\": {\n" +
            "      \"settings\": [\n" +
            "        {\n" +
            "          \"value\": \"de-DE\", \n" +
            "          \"key\": \"locale\"\n" +
            "        }\n" +
            "      ]\n" +
            "    }\n" +
            "  }, \n" +
            "  \"context\": []\n" +
            "}";
    //for testing purposes
    //only works once, messageID has to be unique

    public String getjSONString() {
        return stringFirst + messageID + lastPart;
        //return jSONString;
    }

    private String stringFirst = "{\n" +
            "  \"event\": {\n" +
            "    \"header\": {\n" +
            "      \"namespace\": \"Settings\", \n" +
            "      \"name\": \"SettingsUpdated\", \n" +
            "      \"messageId\": \"";

    private String messageID = UUID.randomUUID().toString();

    private String lastPart = "\"\n" +
            "    }, \n" +
            "    \"payload\": {\n" +
            "      \"settings\": [\n" +
            "        {\n" +
            "          \"value\": \"de-DE\", \n" +
            "          \"key\": \"locale\"\n" +
            "        }\n" +
            "      ]\n" +
            "    }\n" +
            "  }, \n" +
            "  \"context\": []\n" +
            "}";
}
Received Payload from AVS

If you receive an AvsItem (JSON response) from the AVS backend, it contains a header and a payload. The header itself contains the following information:

header={Directive$Header@5943} for instance

The structure of the payload looks like this:

payload={Directive$Payload@5944} for instance 
(The item id of the payload is one higher than that of the header.)

For more information about the structure, click here: structure of an HTTP/2 request
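For orientation, a directive delivered over the AVS HTTP/2 downchannel has roughly this shape (a Speak directive is shown; the placeholder values are elided):

```json
{
  "directive": {
    "header": {
      "namespace": "SpeechSynthesizer",
      "name": "Speak",
      "messageId": "…",
      "dialogRequestId": "…"
    },
    "payload": {
      "url": "cid:…",
      "format": "AUDIO_MPEG",
      "token": "…"
    }
  }
}
```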

Alexa Skills

General usage

For further capabilities of the voice service by Amazon you can use Alexa Skills. To enable these skills, go to the developer console: Developer Amazon, click on “Skills”, and create a new skill. The video tutorial gives a brief introduction to creating your first skill. For more information please visit this page: Alexa Skills Kit

Further links: Here you can find very good documentation about Alexa skill creation with AWS: YoutubeLink

Usage of Skills with bespoken instead of AWS

Normally, it is Amazon's aim that AWS is used for any skill interaction (you need to enter a link to a web service where the lambda function is stored). As an endpoint you can use the recommended AWS Lambda ARN or enter your own https link. (It is not necessary to install an SSL certificate on your local instance if you only use your skill for testing purposes. The certificate is only needed if you want to publish your skill; in that case you have to take care of several technical issues, for instance the non-acceptance of unsecured http requests and so on.)

To avoid AWS you can deploy your own Node.js server locally using bespoken:

1. Install a node package manager on your system if you don’t already have one.

2. Start the bespoken installation following this guideline: installation

3. Now install the Amazon voice SDK into the same folder where you installed the project, using the following CLI command: $ git clone … (on macOS the project is stored here: /Users//.bst/)

4. Now copy the skill you created online (using the developer console, navigate to “Interaction Model” – “JSON Editor”) to your local server. But the online skill is a JSON file and you need a JavaScript file, so it has to be converted.

5. Please insert this code into the following file: /Users//.bst/skill-sample-nodejs-hello-world/src/index.js (Be careful! In the …./.bst/ directory you can find more index.js files. Only use the one at the path given above.)

6. Now you can start your local server with the following prompt: $ bst proxy lambda index.js  For more information, please have a look at the documentation: documentation

7. When you execute the prompt from point 6, you can see directly on your CLI the link you need to copy into the Alexa Skills Kit online as the “https endpoint”.

8. That’s all. Now you can test your skill.
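Condensed, the local workflow from the steps above looks roughly like this (paths are illustrative; the bst CLI comes with the bespoken-tools package):

```shell
npm install -g bespoken-tools          # step 2: installs the bst CLI
cd ~/.bst/skill-sample-nodejs-hello-world/src
bst proxy lambda index.js              # step 6: prints the public https endpoint
                                       # to paste into the Alexa Skills Kit console
```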

Project Skill for Loomo

For the university project, I’ve added a skill with the following communication structure: “Drive Skill” – Answer: “Welcome to Daniels Drive Skill. You can say drive forward to move Loomo.” – “drive forward” – Answer: “Loomo drives now. Skill ends automatically.”
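The interaction model behind such a dialogue, as entered in the JSON Editor, might look roughly like this (invocation and intent names are illustrative, not the project’s actual model):

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "drive skill",
      "intents": [
        { "name": "DriveForwardIntent", "samples": ["drive forward"] },
        { "name": "AMAZON.StopIntent", "samples": [] },
        { "name": "AMAZON.HelpIntent", "samples": [] }
      ]
    }
  }
}
```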

The last answer includes a card response. For testing purposes I’ve tried to read this card from the response. Unfortunately, this information is not stored in the payload of the AVS response but is sent separately.

Control the Robot Using Alexa


Voice assistance systems can be found in more and more areas of life today, be it in the private sphere, on motorways, in shops for consultation, or in cars. Thanks to better networking and high-performance communication with other systems located in the cloud, the capabilities of voice assistants go far beyond carrying out simple pre-programmed tasks on one and the same device. This thesis deals with the assessment and investigation of use cases in the care environment to support staff in their daily tasks. The Loomo produced by Segway Robotics [Inc18] is used as the hardware. Communication takes place via Amazon's voice assistant Alexa as well as through various modules of Amazon Web Services such as Lambda and IoT Core. Fast communication via MQTT is helpful here, allowing commands to be sent to end devices, so-called Things. The range of further application areas revealed by the concrete use case presented here is very large, and the potential for further research topics in the field of care assistance should be explored as quickly as possible, in order to give back a piece of quality of life to the nursing staff and, of course, also to the people in need of care.

Download Research Field Analysis Thesis

You can download the full submission as a PDF file (1.5 MB) here.