Here is what I would like to accomplish:

On a new website that I am building, I want to feature about 30 "specific images" per page. When my users click on a "specific image", they should see 4 or 5 more shots of that subject from different angles, and hear one brief description that is associated with that encounter/slide show.

Then when they are done with that encounter/slide show/ "specific image", they can go to the next encounter/slide show/ "specific image" (let's say the second of the 30 specific images) and so on until they have seen all 30. So for each encounter/slide show/ "specific image" I would have a specific audio file (which I would have to create using some text to speech software).

So in the above example, I would have 30 encounters/slide shows/ "specific images" with each one having its own speech clip. Please Note: the images would be still images, the speech clip would never be more than 50 words, the encounter/slide show duration would be less than a minute, there would be no music, and I would also want to have one title of descriptive text for each "specific image".

Thank you in advance for reviewing and responding to my request.

Larry Stoller

Nothing you need is extremely difficult. There are a few pieces that can be a little challenging, but I'm sure there's a way to make it work.

Generally I'd recommend Flash since you want images and audio to interact. There's really no other way to do this effectively, unless you're willing to accept a little uncertainty in the timing. In that case you could use JavaScript for the slideshow and something like SoundManager2 to embed the audio, and just hope they line up correctly (leaving a few extra seconds for loading times).
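If you go the JavaScript route, here's a rough sketch of the timing logic in TypeScript. Everything named in it (Encounter, slideStartTimes, slideIndexAt, the 2-second default buffer, the file names) is made up for illustration and is not part of SoundManager2 or any other library. It shows two options: a fixed schedule padded for loading time, and picking the slide from the audio's playback position so the two can't drift apart.

```typescript
// Hypothetical shape for one encounter; the field names are assumptions.
interface Encounter {
  title: string;       // the one line of descriptive text
  audioUrl: string;    // the spoken description (under a minute)
  imageUrls: string[]; // the 4 or 5 still images
}

// Fixed-schedule approach: spread the slides evenly over the clip and delay
// everything by a loading buffer. Returns each slide's start time in seconds.
function slideStartTimes(
  imageCount: number,
  clipSeconds: number,
  bufferSeconds: number = 2
): number[] {
  if (imageCount <= 0 || clipSeconds <= 0) {
    return [];
  }
  const perSlide = clipSeconds / imageCount;
  const starts: number[] = [];
  for (let i = 0; i < imageCount; i++) {
    starts.push(bufferSeconds + i * perSlide);
  }
  return starts;
}

// Audio-driven approach: choose the slide from the clip's current playback
// position, clamped to the valid range of slide indexes.
function slideIndexAt(
  currentSeconds: number,
  imageCount: number,
  clipSeconds: number
): number {
  if (imageCount <= 0 || clipSeconds <= 0) {
    return 0;
  }
  const index = Math.floor((currentSeconds / clipSeconds) * imageCount);
  return Math.min(Math.max(index, 0), imageCount - 1);
}

// Placeholder data for one of the 30 encounters.
const example: Encounter = {
  title: "Example encounter",
  audioUrl: "audio/encounter-01.mp3",
  imageUrls: ["img/01-a.jpg", "img/01-b.jpg", "img/01-c.jpg", "img/01-d.jpg"],
};

// Schedule for a 40-second clip with the default 2-second buffer.
const schedule = slideStartTimes(example.imageUrls.length, 40);
```

In a browser, one way to use the second function would be to call it from an audio element's timeupdate handler, passing the element's currentTime and duration, so the slides follow the audio instead of a separate timer.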

For information on SoundManager2, see the following (it's free):

However, the main reason I wanted to reply is that I think using text to speech is a really bad plan. Even the very best speech synthesizers don't sound very good and no one will want to sit and listen to them on your site for very long. If this is part of your content (part of why your site is good), then I highly recommend using real recordings.
At the very least you should invest in a high-quality program to do this, and definitely not rely on freeware voices such as the ones that come bundled with Mac OS X. They sound like a bad robot.

Note: I'm a PhD student in Linguistics, so I'm generally familiar with the topic. There are some fairly reasonable synthesizers out there, but there's nothing that will effectively replace a human voice for a long passage, or for many short passages in a row. Speech synthesizers at this point have mastered being understandable (at least the best ones), but that's far from being enjoyable. The main problem is that even if the details of the individual sounds are correct, it's all but impossible to get the intonation of phrases and sentences to sound right, so you end up with something that is either completely monotone and boring or wrong and awkward. Neither one would be tolerated by the ear for any considerable length of time.

Also, there are some computer voices that are almost passable. But those are almost never fully capable of text to speech. Instead of working at the level of sounds and building up to any word from there, they use a specific set of recorded words, often each with various pronunciations (for example, for the beginning or end of a sentence, or for a question), and those are then put together to form something that sounds pretty good within a limited domain of output. For example, it would be pretty easy to create a voice synthesizer for reporting the weather, because there is a very limited set of things that will ever be said. But those programs won't be able to output everything you'll need for creative (unlimited) writing like you'll be using.
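To make the limited-domain idea concrete, here's a toy sketch in TypeScript. It is not how any particular product works; the word list, the position labels, and the clip file naming are all invented for illustration. Each word only exists as a recording for certain sentence positions, and anything outside that inventory simply can't be spoken.

```typescript
// Where in the sentence a recorded variant is meant to be used.
type Position = "start" | "middle" | "end";

// Hypothetical inventory for a tiny weather domain: for each word, the
// sentence positions for which a recording exists.
const recordedClips: Record<string, Position[]> = {
  today: ["start"],
  sunny: ["middle", "end"],
  rainy: ["middle", "end"],
  and: ["middle"],
  warm: ["end"],
  cold: ["end"],
};

// Map a sentence to the list of clip files to play back to back, or return
// null if any word has no recording for the position it appears in.
function clipsFor(words: string[]): string[] | null {
  const clips: string[] = [];
  let missing = false;
  words.forEach((word, i) => {
    const position: Position =
      i === 0 ? "start" : i === words.length - 1 ? "end" : "middle";
    const known = Object.prototype.hasOwnProperty.call(recordedClips, word);
    const variants: Position[] = known ? recordedClips[word] || [] : [];
    if (variants.indexOf(position) === -1) {
      missing = true;
    } else {
      clips.push(`${word}_${position}.wav`);
    }
  });
  return missing ? null : clips;
}

// Inside the domain: every word has a recording for its position.
const inDomain = clipsFor(["today", "sunny", "and", "warm"]);

// Outside the domain: "foggy" was never recorded, so nothing can be output.
const outOfDomain = clipsFor(["today", "foggy", "and", "cold"]);
```

The first call yields four clip names in order, while the second returns null. That is the limitation in miniature: the voice sounds good only as long as the text stays inside what was recorded.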

Of course if you happen to be building a voice synthesizer and this website is to demonstrate it, you have a good reason for using it (and I'd be interested to see what you have). Otherwise, it will honestly just annoy your visitors.

(One exception to all of this is for visitors with difficulty seeing the page. But they already have screen readers and they have spent a lot of time adjusting to the robot voices so they [probably; I hope] don't mind any more.)

Anyway, I completely understand why you'd want to use a synthesizer, but you'll be better off in the end if you can record each section. Only if you have a HUGE amount of material that is constantly changing would you want to rely on text to speech.

