Speech Synthesis Markup Language (SSML)

(Adapted from the Amazon Alexa Developer’s Website)

Alexa converts text to speech. Alexa automatically handles normal punctuation, such as pausing after a period or speaking a sentence that ends in a question mark as a question.

However, in some cases you may want additional control over how Alexa speaks. For example, you may want a longer pause within the speech, or you may want a string of digits read back as a standard telephone number. The Alexa Skills Kit provides this type of control with Speech Synthesis Markup Language (SSML) support.
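
As a rough illustration of those two cases, the snippet below uses the break tag for a longer pause and the say-as tag for a telephone number. Both tags are on Alexa's list of supported SSML tags; the sentence wording and the phone number are made up for this example.

```xml
I'll give you a moment to find a pen. <break time="2s"/> Ready?
Call us at <say-as interpret-as="telephone">2065550100</say-as>.
```

Without the say-as tag, a string of digits like this may be read as one large number rather than digit by digit.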

A Markup Language for Speech

A markup language is a way to mark ordinary text with tags to make it look different, or in our case, sound different when read aloud. HTML is a well-known markup language: it uses tags like “<b>” (for bold) to format text and graphics.

SSML is a markup language that provides a standard way to mark up text for the generation of synthetic speech. SSML looks a lot like HTML, in that it uses tags of this form:

<tag attribute="value">My text is here.</tag>

While SSML has a lot of tags, the Alexa Skills Kit supports a subset of the tags defined in the SSML specification (only 12 right now, but you can do a lot with these!). The specific tags supported are listed in Supported SSML Tags.

To use these tags in your text on WordPress, you must view your post entry with the “Text” tab selected (it is at the top-right of the text editing box).

When you select the “Text” tab, you can see the special tags (or codes) that WordPress uses to format text. These are mostly HTML tags.

To use the SSML tags, you must type them before and after the text you want to modify. For example, if you want a word spoken in a high pitch, you must put a “prosody” tag before it, and a “/prosody” tag after it to indicate the end.
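
A minimal sketch of that pattern, using a made-up sentence: only the word between the opening and closing tags changes pitch, and the rest is spoken normally.

```xml
I am <prosody pitch="high">really</prosody> excited about this.
```

Every opening tag needs a matching closing tag, so it is worth checking each pair before you publish the post.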

To see a sample post with SSML markup, search the posts for “Speech Synthesis Markup Language”. Edit the post, then click the “Text” tab to see the SSML tags.

Here is what the text of that post looks like, with SSML.

<prosody pitch="x-high">Hi, I'm Missy.</prosody>
<prosody pitch="x-low">And I'm Biff.</prosody>
<prosody pitch="x-high">Don't you wish you could bring Alexa to life with just a little more emotion?</prosody>
<prosody pitch="x-low">Well, you can, with Speech Synthesis Markup Language. Here is an example of a marked-up text that demonstrates how to use SSML to modify how Alexa speaks:</prosody>
<prosody pitch="x-high">Check it out on the class website!</prosody>

To begin, here is normal volume for the first sentence.
<prosody volume="x-loud">Louder volume for the second sentence</prosody>.
When I wake up, <prosody rate="x-slow">I speak quite slowly</prosody>.
I can speak with my normal pitch,
<prosody pitch="x-high"> but also with a much higher pitch </prosody>,
and also <prosody pitch="low">with a lower pitch</prosody>.
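
The examples above change one property at a time. The SSML specification also allows several prosody attributes on the same tag, so a sketch like the following should slow down, lower, and soften a sentence all at once. The sentence is invented for illustration, and the assumption is that the Radio David skill passes the combined attributes through to Alexa unchanged.

```xml
<prosody rate="slow" pitch="low" volume="soft">This part is slow, low, and quiet.</prosody>
```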

The following link has examples of different SSML tags that Alexa supports: https://developer.amazon.com/docs/custom-skills/speech-synthesis-markup-language-ssml-reference.html#ssml-supported

For more kicks, try speechcons, which are special words and phrases that Alexa speaks with more emotion, like “Aha,” “Achoo,” “Abracadabra” and more: https://developer.amazon.com/docs/custom-skills/speechcon-reference-interjections-english-us.html
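
These special words, which Amazon calls speechcons, are marked up with the say-as tag and an interpret-as value of "interjection". Here is a small sketch using one of the words mentioned above; the rest of the sentence is invented for illustration.

```xml
<say-as interpret-as="interjection">abracadabra</say-as>, your post is now a podcast.
```

Only words and phrases on the speechcon list get the expressive reading; other text inside the tag is spoken in Alexa's normal voice.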

The “Radio David” Skill

Flash Briefings don’t use SSML tags! So, to try it out, you’ll have to use our custom “Radio David” skill. And to try that skill, you’ll have to:

  • Get an Amazon developer account
  • Be invited to test the Radio David skill
  • Make new posts marked up with SSML

Or, you can try it in class.


You can learn more about SSML here: https://developer.amazon.com/docs/custom-skills/speech-synthesis-markup-language-ssml-reference.html