The problem with this idea is that if you're listening to playback on speakers, that audio is going to conflict with and perhaps confuse the speech recognition software. A workaround is to wear a headset microphone, which would eliminate most of the room noise and pick up just your speech. Many people have tried to advance the editing interface beyond the keyboard and mouse (think hand gestures), but practically speaking, it's not quite ready for prime time.
By comparison, the "voice recognition" capability in my car is actually pretty good. When I ask it for the nearest gas station, it usually displays the nearest ATM. Now that's advanced thinking!!!
Mark