Improving the quality of spoken text on the web

The Spoken Presentation Task Force of the W3C Accessible Platform Architectures Working Group is developing a standard mechanism that lets authors control how web content is presented by the text-to-speech (TTS) synthesizers used by assistive technologies such as screen readers. Voice assistants, such as Amazon Alexa and Google Home, would also benefit from this work.

One simple example can be seen with the following sentence:

According to the 2010 US Census, the population of 90274 increased to 25209 from 24976 over the past 10 years.

Without presentation control, the zip code 90274 is read as ninety thousand two hundred and seventy four.

With SSML markup, the author could specify that the zip code be read as digits.

According to the 2010 US Census, the population of <say-as interpret-as="digits">90274</say-as> increased to 25209 from 24976 over the past 10 years.
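To make the difference concrete, here is a minimal Python sketch of what reading a token "as digits" amounts to. The function name say_as_digits is hypothetical and is not part of SSML or of any speech engine; real synthesizers handle this internally.

```python
# Map each ASCII digit to the word a synthesizer would speak for it.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def say_as_digits(token: str) -> str:
    """Spell out a token one digit at a time, skipping non-digit characters."""
    return " ".join(DIGIT_WORDS[ch] for ch in token if ch in DIGIT_WORDS)


print(say_as_digits("90274"))  # nine zero two seven four
```

Without the say-as hint, the same token would be treated as a cardinal number, which is how the zip code ends up as "ninety thousand...".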

I will repeat the sentence once again, below, using in-line SSML. This version has the actual SSML markup shown above embedded in the HTML. Consider it an experiment as we explore methods by which the SSML can be consumed.

According to the 2010 US Census, the population of 90274 increased to 25209 from 24976 over the past 10 years.

Now, once more, with the SSML say-as embedded in the HTML using a data attribute, which looks like this: <span data-ssml='{"say-as" : {"interpret-as":"digits"}}'>90274</span>

According to the 2010 US Census, the population of 90274 increased to 25209 from 24976 over the past 10 years.
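One way a consumer of the data-ssml attribute might turn it back into SSML is sketched below in Python. This is only an illustration: it assumes the attribute value is a JSON object mapping an SSML element name to an object of its attributes, as in the span above, and the function name data_ssml_to_ssml is hypothetical.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def data_ssml_to_ssml(data_ssml: str, text: str) -> str:
    """Wrap `text` in the SSML element(s) described by a data-ssml JSON value.

    Assumption of this sketch: if several elements are listed, they are
    nested in the order given, with the first one outermost.
    """
    spec = json.loads(data_ssml)
    result = escape(text)
    # Build from the innermost element outward.
    for element, attributes in reversed(list(spec.items())):
        attrs = "".join(
            f" {name}={quoteattr(str(value))}"
            for name, value in attributes.items()
        )
        result = f"<{element}{attrs}>{result}</{element}>"
    return result


print(data_ssml_to_ssml('{"say-as" : {"interpret-as":"digits"}}', "90274"))
# <say-as interpret-as="digits">90274</say-as>
```

Keeping the SSML in a data attribute leaves the HTML valid, at the cost of requiring a conversion step like this one before the text reaches a synthesizer.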

IDRC 2010 @ Davos

The Global Risk Forum’s International Disaster and Risk Conference (IDRC2010) is on this week in Davos, Switzerland. I’m joined by colleagues from Finland and the US in both a workshop and special session focused on Mobile Technology and ICT for Disaster Preparedness, Warning, and Response.

IDRC has become a favorite conference of mine; I previously attended IDRC 2007 in Harbin, China, and IDRC 2008 in Davos. The Global Risk Forum and IDRC spun out of the UN International Strategy for Disaster Reduction under the leadership of Walter Ammann.

The conference itself brings together  academics, researchers, governments, industry, and NGOs to focus on the key issues in disaster risk reduction.  This year has already seen significant events, ranging from the Haitian and Chilean earthquakes to the ongoing BP oil spill catastrophe in the Gulf of Mexico.  Unfortunately, there is never a shortage of tragedy for discussion.

Our workshop Sunday morning presaged a great conference. As we walked into the meeting room just before the 9 am start (after a late night arrival from Zurich), we were greeted by a fantastic group of attendees ready to get started. The three hours went by like lightning, and I look forward to continuing the discussion in person this week and online beyond the IDRC conference.

Will the N900 see Meego?

N900 Smartphone with text Me Go Away?

In my initial post regarding the Nokia N900, I expressed some hope that the product would see improvement through software updates to the Maemo operating system. Unfortunately, today’s announcement by Nokia and Intel makes it likely that the Maemo-based N900 is at a dead end. The announcement introduces a new mobile (and device) OS called Meego, which supposedly combines the best of Intel’s Moblin with Nokia’s Maemo. For the N900 user, the future is quite uncertain. My own interest in experimenting with N900 app development is now over (iPhone SDK here I come). There may be hope for current N900 owners being able to install Meego, but it is troubling that Nokia neglected to say anything about the N900 future as part of today’s announcement.

I’ve had a growing number of N900 freezes, typically when I have the Web browser and email apps open, though the overall experience with the phone isn’t that unpleasant. However, with no updates likely, my days with the N900 may be numbered.

Update: February 16, 2010… Nokia releases new firmware update for N900.  I had to remove my N900 battery as the phone was locked up this morning, but when it did finally start up, I saw the blinking update icon and performed the update.  The update process on the N900 is well done, and within a few minutes my phone had restarted.  Unfortunately, while exploring the settings for anything obviously new, the phone locked up again! Well, it wasn’t fully dead… I tried calling the phone and though it didn’t ring, the missed call notification light started blinking.  Holding the power button down eventually worked to shut the phone down.  What exactly did this update fix? Not too much, it appears.

Is that Linux calling? Experiencing the N900 Smart Phone

My research involves emergency notifications on mobile devices, which, as you might guess, gives our team here the opportunity to try out the latest in mobile technology.  Yes, the iPhone is great, and so are most Android devices, but when it came time to pick my own new phone, I decided to have a look at the Nokia N900. Why the N900? Most intriguing to me was the fact that it has an open source operating system called Maemo, which is based on Linux.  And with the full physical QWERTY keyboard, this phone addresses some of the very few concerns I have had about the iPhone.

Screen of Nokia N900, showing app icons, weather widget.
Nokia N900 Phone

The N900 is typical Nokia with good, solid construction, though I don’t think it will be as durable as the E51 I’ve had (and dropped repeatedly) for two years. The N900 screen is plastic, and uses a resistive touch screen technology. The screen is one of the many things that sets the N900 apart from the Apple products. Resistive technology has been, for the most part, only single touch, though multi touch support is becoming possible. The iPhone/iPod Touch/iPad products have glass screens and highly responsive, multi touch capacitive touch screen technology. There is no comparison between the Apple and Nokia products… Apple wins on the basis of fluid, responsive, natural interaction with the on-screen user interface.

At times, on the N900, I will swipe my finger repeatedly in an attempt to scroll, selecting items I did not intend to select, and end up using the included stylus for more accurate selection. The stylus is included for a reason, as the interface works more smoothly with it than with a fingertip.

I do appear to be improving my touch skills with the N900 over time,  though some negative transfer of training occurs as I shift between it and my iPod Touch during the course of the day.  Where the N900 excels with touch technology is that when it is -26 degrees Celsius outside, as it is currently here in Jyväskylä, and I need to answer the phone, I can actually use my gloved finger on the resistive touch screen.  Capacitive touch screens, like on the iPhone, will require you to remove your glove.  Not sure that feature alone would entice the typical consumer south of the 60th parallel to choose an N900 over an iPhone, however.

Excellent Skype Integration

What is there to like about the N900? The number one feature for me is the seamless integration of Skype with the base cellular telephone functionality of the device. When you first set up your Skype account, all of your Skype contacts are imported into the phone book on the N900. If your contacts have both Skype and regular phone numbers, you have the choice of making calls using Skype to Skype, Phone to Skype, Skype to Phone, or Phone to Phone. Great when roaming internationally, and in a Wifi zone. Receiving calls, I can’t tell the difference between an incoming Skype call and a call directly to my phone number. In the phone book, a small icon indicates whether your contacts are online or not. If I used GoogleTalk, I would also have that option for calls, and for text messaging, I can set up AIM, Skype, SMS, Live Messenger, etc. No need for separate apps for each service. Unified messaging takes on new meaning on the N900. Wifi support has been good so far.

But let me not paint too rosy a picture of the N900. Nokia got some things right, but the overall user experience can be quite frustrating. I found myself at a loss as to how to answer an incoming call when I was in the middle of writing email… I heard the ringing and struggled to touch my way to the phone function (I missed the call, by the way). I eventually learned how, but the overall lack of intuitiveness in the user interface is a big negative. Did the N900 user interface designers spend any time at all actually using an iPhone???

Limited App Availability

There are apps, most for free and the majority from the Maemo Developer Community. Are any of the apps really compelling? No, nothing new. If you’ve seen the Apple or Google app stores, you would immediately be asking, “Where are the N900 apps I really need?” Sure, there are some unique things, a Linux command shell and DOS emulator for example, but those will only appeal to the developers. The message is clear: this is not a consumer phone. The Ovi Store, Nokia’s answer to the Apple App Store, has a long way to go. By the way, ovi, in Finnish, means “door”… not sure I’d buy my apps from the Door Store.

No Caller Specific Ringtones?

I’m a big fan of using different ring tones for identifying callers, and setting multiple profiles (Outdoors, Silent, Meeting, etc).  Can I do this on the N900?    There are only two profiles, General and Silent and no apparent way to define new ones. You can select a ringtone for the General profile, a volume level, and whether or not the phone vibrates, but you have none of the other options normally found on Nokia phones.  Unique ringtones for your contacts? No!  I’m sure Nokia didn’t forget these features in such a high end phone, but do I really need to recompile my Linux kernel to enable them?  Seriously, there must be a way to do this, as it ships with multiple ringtone files in the phone’s ringtone directory. Maybe some kind reader will let me in on the secret.

User Interface Frustrations

The N900 desktop certainly offers more flexibility over that of the iPhone, closer to the Android model.  I have multiple pages, or views, in the desktop, and I can use a finger swipe to switch between views, and can place widgets and shortcuts to apps and functions on the different views. But, in my daily use, I find the touch screen interface behavior is inconsistent. Sometimes the transitions between views are smooth, and other times you feel like Sisyphus, sliding your finger forcefully to try and move to the next view, only to have the original view bounce back into place.

I find the access to the installed apps panel another less than intuitive feature. It is a mind boggling two step of sequential touches to the upper right of the screen, sometimes getting you to the app view, sometimes back to where you started. And the app view itself requires precision in touch selection. If you don’t quite hit the desired icon, and instead hit the background, you sometimes end up going back a level, sometimes to an app you did not want to select. In fact, the user interface’s mixed use of “cancel” and “close” methods is strange. Some dialogs (if that is the correct term) require you to press the “blurred” background of the overlaid app to cancel the action and close the dialog. Other times, there is an X icon button in the top right corner, and other times a left pointing arrow to take you back. I’ve been lost a number of times, which seems a good segue to the map and navigation features of the N900.

Ovi Maps, integrated, it seems, with the N900’s GPS, is not ready for prime time. I tried to use it, and will have to say nothing more for now. I’ll wait for an update before I give it serious examination.

The N900 has a camera, 5 megapixel, with a Carl Zeiss lens and flash. I’m relatively pleased with the quality, compared to my prior Nokia phones. Even seems a little better than the N85’s camera I used briefly.

Firefox and Flash

Web browsing is via Firefox. Not a bad experience, but only landscape mode is enabled. However, you can activate portrait mode support, thanks to the great information found on the Maemo community forums. The accelerometer’s rotation detection is responsive.  Unfortunately,  the N900 user interface is designed primarily for landscape mode operation. Few functions actually take advantage of the accelerometer.

Another advantage for the N900 over the iPhone is Adobe Flash support. Works well, thus far. However, I like the overall user experience of Safari on the iPhone/iPod Touch. By the way, zooming is supported, but no pinching or squeezing, please, on the N900. There is an interesting single touch gesture, using clockwise/counterclockwise swirls. Unfortunately, I can only get it to reliably work with the stylus, and instead resort to using the physical volume control buttons that change mode to become zoom in and out controls when Web browsing or using the camera. This overloading of modes leads to confusion when one wants to use the physical volume buttons to turn down the sound of a playing YouTube video only to result in the screen zooming. Well, indirectly, those volume buttons do work, allowing you to zoom in to the tiny YouTube volume control icon, which you can then attempt to adjust with finger, and more effectively, with stylus. The overall experience is, how do you say, fail?

Physical Keyboard

Lower right corner of N900 keyboard showing cursor key arrangement.
N900 Cursor Key Arrangement

The Nokia N900 has a QWERTY keyboard, which slides out from the unit. The keyboard has a relatively good feel, and I found myself able to type as quickly as with the touch screen keyboard without much effort. I will likely improve over time. The keyboard requires use of a modifier key to get to numbers and symbols. I have the Finnish/Swedish keyboard, which has a disadvantage compared with the US keyboard: the cursor key arrangement. The US keyboard has the now common four-key inverted T arrangement at the lower right part of the keyboard. The Finnish/Swedish layout maps the four cursor arrows to two keys. Great if you only scroll left or right, but a challenge when you are trying to go up or down.

Battery Life

Battery life is not impressive, but not unexpected.  I managed a maximum of 14 hours of use before the phone died.  In practice, I would keep it plugged in and charging whenever at my desk, and overnight.  Once the battery indicator is below half, I know that I better get to a charger.  As the battery level gets well below half, I might assume I have enough for that last email or phone call, only to find my screen suddenly blank and phone dead. When travelling, and likely not able to charge regularly, I will be prepared with a backup phone, like my E51, which has great battery life.

Accessibility

And finally, for those expecting me to talk about N900 accessibility features: there isn’t much to say. The only published feature is “Hearing Aid Compatibility Rating: M3.” For such a capable device, shame on you, Nokia. I’ll wager Nokia is waiting for the open source community to get Orca running, or emacspeak.

Unlike other Nokia devices, there is no built in speech synthesis, no text to speech reading of text messages, or basic phone functions.  Apple’s VoiceOver, and Android’s accessibility features are where the mobile device manufacturers of the world need to be.

I did manage to locate a text to speech program, eSpeak, via the Maemo community and installed it successfully.  Being a command line utility, it has little immediate practical value other than demonstration. If Nokia is serious about Maemo, I hope, no, make that expect, that they will port their excellent set of multi-lingual text to speech engines to the N900.

Conclusion

In conclusion, I think the N900 is a device with lots of great potential. It should have been a great phone, and still can be. I have this idea that my N900 is dreaming of being an iPhone when it grows up. A first release of any product will have its shortcomings, but the N900 has far too many. Surely, Nokia sees this, and is working on software fixes and enhancements. The Maemo community will certainly be doing its part, too, but, historically, open source projects have given low priority to usability.

I think many of the usability problems can be corrected in software, and especially through enhancements to the touch interface processing and improvements in  consistency of the user interface.  But until Nokia releases a significant software upgrade, I would strongly urge you to stay away from the N900 unless you thrive on frustration or are interested in learning how to program for the Maemo platform. Personally, I will probably give the N900 another chance, and wait for that upgrade.  In the meantime, anyone see my copy of Understanding the Linux Kernel?

Oh, wait a second, what’s this I hear about Android running on the N900? Stay tuned.

Comments Welcome on W3C User Agent Accessibility Guidelines 2.0 Draft

The W3C WAI User Agent Accessibility Working Group, in which I participate, is looking for your comments on the current draft of UAAG 2.0.  Within the document, you will find specific areas that we are seeking comment on.

User agent, if you are not familiar with it, is the term used by W3C to describe what we more commonly call a Web browser, but it can also apply to media players and other software that provides a user interface to Web content.

UAAG 2.0 will eventually replace the UAAG 1.0 guidelines, which became a W3C Recommendation in 2002. Much has changed on the Web since the release of 1.0 and the current working draft is intended to bring the accessibility guidelines for user agent developers up to date.

Comments should be sent to the WAI UA public comment email address, public-uaag2-comments@w3.org by April 22, 2009.

Spring Flowers Bloom Early: DAISY and Buttercup

No, I am not changing the blog theme to Horticulture.  Instead, I wanted to write about some digital talking book developments announced two days before today’s official, and snowy (here in New Jersey) start of Spring.

On March 18, the DAISY Consortium announced the second release of Save as DAISY for Microsoft Word, and in the same announcement introduced ButtercupReader, a Web-based DAISY player implemented in Microsoft’s Silverlight by a firm called Intergen.  These are exciting developments, and worth taking a look at.

I will admit that I generally don’t rush out to praise efforts by Microsoft, but in this case, their support of DAISY is to be commended, albeit late in coming to fruition.

Save as DAISY, or in short, SAD,  is not a new concept, as products from Dolphin Computer Access have provided similar functionality for some time. What distinguishes SAD is its open source approach, and the fact that it is freely available.  It promises to provide every user of Microsoft Word the capability to generate a digital talking book publication based on the DAISY open standard.

So, does it work?  Yes, but not without problems.  Within a few minutes after installation, I was producing, or as SAD calls it, translating,  full text and audio DAISY books from my Word documents.  I was even able to listen to these books using Buttercup Reader (more on that below).  Feeling confident, I made a few, what I would consider, minor editing changes to the first document I had successfully translated, and then started the translation process again. Unfortunately, this time I wound up with no book and a corrupted Word document.  I dutifully reported this to the DAISY SAD forum, and await their response. In the meantime, backing up your source documents before using SAD is a good idea, or you may find yourself in a sad state.

In spite of the problem, I was impressed by the fact that I could start with a Word document and in minutes, have a working DAISY publication, with the audio narration automatically generated using the default Microsoft Speech API Sam voice installed with Windows XP.  I am not that enamored with Sam’s narration of my fine prose, so I’ll be exploring how to change the speech synthesizer SAD uses when time permits.

Can Save as DAISY convert every Word document to the DAISY format? Short answer, no. DAISY is based upon a model of structured, semantic markup, with guidelines available for authors to aid in correctly structuring their documents. The current version of DAISY (DAISY 3 or ANSI/NISO Z39.86-2005) uses an XML language called DTBook, designed specifically for representing the structure of books and other publications.

Converting any document to DAISY requires a process of mapping the structure in the source document to DTBook.  Microsoft’s Word format is notorious for the junk that results when converting to HTML, but Office Open XML has attempted, with mixed success, to bring some order to what was often chaos.  The key to creating a well structured DAISY book in Word is to apply the same rules we generally use for any accessible document authoring.  Extra care has to be taken, though, when using headings (e.g., Word’s Heading 1, Heading 2, etc.) to maintain proper nesting of levels, a key to creating the navigation structure for a DAISY publication.
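The heading nesting rule is easy to check mechanically. The Python sketch below assumes you have already extracted the heading levels from a document as a list of integers (1 for Heading 1, 2 for Heading 2, and so on); the function name find_heading_skips is hypothetical, and whether Save as DAISY applies exactly this rule internally is an assumption.

```python
from typing import List, Tuple


def find_heading_skips(levels: List[int]) -> List[Tuple[int, int, int]]:
    """Return (index, previous_level, level) for each heading that skips a level.

    A heading may go at most one level deeper than the heading before it.
    The start of the document counts as level 0, so the first heading
    is expected to be a Heading 1.
    """
    problems = []
    previous = 0
    for index, level in enumerate(levels):
        if level > previous + 1:
            problems.append((index, previous, level))
        previous = level
    return problems


outline = [1, 2, 3, 2, 4, 1]
print(find_heading_skips(outline))  # [(4, 2, 4)]
```

In the sample outline, the fifth heading jumps from Heading 2 straight to Heading 4, which is the kind of gap that can break the navigation structure of the resulting book.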

Save as DAISY is certainly promising, but apparently not without flaws. I would also comment that in Microsoft Word 2007, SAD installs as the Accessibility ribbon, which I think is a little confusing. Though SAD and DAISY are great developments for accessibility, the use of the label Accessibility may promise the Word user more than it delivers, especially for those already at a loss on where to find common accessibility features such as adding alt text to an image. Why not just label the ribbon DAISY? Unless, of course, SAD’s goal is to add more accessibility related features for Word documents in the future. SAD’s documentation also leaves something to be desired, and seems more beta than a release 2 product would suggest.

As a concept, being able to produce a DAISY talking book from a mainstream product such as Word is a significant step for accessibility.  Adobe should do the same across their products,  as has been suggested to them in the past.  It should be noted that Adobe has added support for saving as DTBook (or ePub) from within InDesign, a good first step, and FrameMaker’s XML capabilities can support DTBook. And, there is even a Save as DAISY plug-in for OpenOffice Writer.  Let’s see if other mainstream products will follow this lead.

Buttercup Reader was a pleasant surprise. This isn’t the first time that Silverlight has been used to build a DAISY player, as Tanakom Talawat has had a beta of his DAISY Now project available since last year.  And it joins other Web-based DAISY players such as Charles Chen’s cool Dandelion prototype.

As a Web-based player,  ButtercupReader comes across as slick, functional, and well designed, especially given that it is termed an early demo.  I have had my concerns about Silverlight, but perhaps now I will be a bit more open minded.  The developers of Buttercup, Intergen, state that they will be using the player as a vehicle for demonstrating how to create accessible, rich internet applications using Silverlight, and have a presentation and demo at MIX09 to get the ball rolling.

I tested Buttercup using Firefox on both Windows XP and Mac OS 10.4, and was able to read the supplied samples as well as my own local DAISY books. Buttercup requires locally stored DAISY books to be within a zip archive, which meant that I needed to zip up the books I wanted to read. Given that it is only a demo, we can ignore that the bookmark feature is not functional, and that a common DAISY player feature, speed up and slow down of playback, is not present. However, Buttercup is a great start, and I hope the developers follow through and turn it into a full product.
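If you have more than a couple of books, the zipping step can be scripted. Here is a minimal Python sketch using only the standard library. The function name zip_daisy_book and the folder names in the usage comment are hypothetical, and placing the book's files at the root of the archive (rather than inside a folder) is an assumption about what the player expects.

```python
import zipfile
from pathlib import Path


def zip_daisy_book(book_dir: str, archive_path: str) -> int:
    """Zip every file under book_dir, with paths stored relative to book_dir.

    Returns the number of files added to the archive.
    """
    root = Path(book_dir)
    archive = Path(archive_path).resolve()
    count = 0
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as bundle:
        for path in sorted(root.rglob("*")):
            # Skip directories, and the archive itself if it lives in book_dir.
            if path.is_file() and path.resolve() != archive:
                bundle.write(path, path.relative_to(root).as_posix())
                count += 1
    return count


# Example call (hypothetical folder and file names):
# zip_daisy_book("my_daisy_book", "my_daisy_book.zip")
```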

Artifact: pwWebSpeak

In the brief history of the World Wide Web, we have seen a number of technologies and ideas come and go. The non-visual, or self-voicing, Web browser is one technology that emerged in 1995-1996 and sought to solve the challenges that people with visual impairments were facing when trying to access the Web. pwWebSpeak was one such product, joined by IBM’s Home Page Reader, and a number of research oriented systems. These products arose because screen readers of the time had significant challenges in presenting and interacting with Web content. Fortunately, with the advent of the W3C Web Accessibility Initiative and improvements in Web content, accessibility APIs, and screen readers, these specialized browsers became largely obsolete.

Having faded into the sunset, the legacy of these pioneers is worth noting, as they served as a proving ground for Web accessibility and influenced how users of current screen readers interact with the Web today.

If you are interested in learning more about one of these pioneers, I have added an article on pwWebSpeak to the TakingInterfaces Artifacts section.

And, if you want to experience a self-voicing browser without stepping back into the 1990s, Charles Chen’s FireVox add-on for Firefox is highly recommended. For those on Linux, T.V. Raman’s Emacspeak, now in its 29th release, is still the best and just about only implementation of Aural CSS (Opera includes ACSS support).

Alt-erations

The section on the alt attribute in the current HTML5 working draft that begins with “What an img element represents depends on the src attribute and the alt attribute” really seems to miss the point.  This is the semantic Web era, correct?  Isn’t the conditional logic of the current draft really trying to affix a meaning or purpose to an image in all the wrong ways? Ambiguity is not the way.

I can just see the HTML5 alt attribute logic working so well in other contexts:

The Product Safety Agency announces a new toxicity labeling system.  All poisons are clearly labeled, except when they are not poisons, as indicated by a blank label. For products whose toxicity has yet to be determined, no label will be affixed.

Other applications of this logic come to mind, all equally frightening.

The very idea that alt be changed to a non-required attribute troubles me  much more than the complete departure of longdesc.  There have been ongoing discussions on this issue for some time, often heated, and WAI-PF is working on a response to the HTML WG.

Having been an early supporter and implementer of alt, I see only one real solution.

The role attribute is a key foundation in the WAI-ARIA efforts. Within ARIA, a role of presentation can be applied to an image,  indicating that the image is not part of the page structure and should be ignored by assistive technology.

Explicit indication of the authored intent of the image goes a long way toward reduction in ambiguity.  Rather than trying to infer the role of an image from the presence or absence of the alt attribute content, or of the attribute itself, role would explicitly define the authored intent of the image and its purpose in the interface.

I think, however, the current ARIA roles are insufficient (and yes, I will be responding to the request for comments on the Working Draft).

Following are two example cases of images with blank alternate text and the role attribute.

A purely decorative color bar:

<img src="colorbar.png" alt="" role="presentation" />

An image in a photo library, for which no alt text has been specified:

<img src="kif001293.jpg" alt="" role="content" />

I am not suggesting content as the ideal role for images which have a need for alternate text or description.  One could go off the deep-end and suggest photo, figure, map, etc. This needs to be simple, and content is a starting point, but not the end point.

Of course, we can still have an accessibility problem if alt is still left blank, but we won’t be guessing as to whether the image is decorative or the author just hasn’t gotten around to specifying the alternate content.

And as someone who once developed a non-visual browser, creating rules for dealing with alt (and its absence), and likely being among the first to implement longdesc support (for what it was worth), I see the ability to know that an image is really content as an important plus. Further, knowing that an image defined as content happens to have a blank alt value could motivate the user agent and assistive technology to look for meaning where it can be found. ARIA’s aria-describedby would be an obvious choice.
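The decision logic I have in mind can be sketched in a few lines of Python. Keep in mind that role="content" is only my proposal and not a defined ARIA role, and that the function name and the returned labels are hypothetical; this illustrates the idea and does not describe any existing browser or screen reader.

```python
from typing import Optional


def image_handling(role: Optional[str], alt: Optional[str]) -> str:
    """Suggest how a non-visual user agent might treat an img element.

    `alt` is None when the attribute is absent and "" when it is blank.
    """
    has_alt_text = bool(alt and alt.strip())
    if role == "presentation":
        # Explicitly decorative: nothing to present to the user.
        return "ignore"
    if role == "content":
        # Explicitly meaningful: use alt, or go looking for a description
        # (for example via aria-describedby) when alt is blank.
        return "speak-alt" if has_alt_text else "seek-description"
    # No explicit role: we are back to inferring intent from alt alone.
    return "speak-alt" if has_alt_text else "ambiguous"


print(image_handling("presentation", ""))  # ignore
print(image_handling("content", ""))       # seek-description
print(image_handling(None, ""))            # ambiguous
```

The last case is the one that worries me: without a role, a blank alt tells the user agent nothing about whether the image matters.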

As an occasional optimist, I do have hope that technology will help mitigate the problematic assumption that content authors will neglect specifying alt (for a variety of sometimes understandable reasons). Authoring tools can become a bit more clever (or manipulative) in their approach to gleaning image alt text from authors.  And, embedded metadata, such as EXIF, may give us some clue about an image, where it was taken and when. It is not unlikely that we will see more useful metadata available, either in the image itself or queryable. And research is obviously looking for ways to solve the large photo library problem, and that can only help accessibility.

Keep alt required in HTML5, add role as a required attribute, encourage use of aria-describedby. Problem solved?