Look at it this way:
1. Even spy agencies still use human monitoring of wiretaps.
2. If you trust automated transcription software that mistakes "let us entertain you" for "lettuce in continue", you could end up with a documentary about salads. It'll take a hell of a lot more time to correct the mistakes than it would have taken to transcribe it right in the first place.
3. Accents alone will probably throw any machine for a loop. Let's not forget human speech nuances like stutters, pauses, drawls, slurs, flat vowels, UK vs. US, Canada vs. US...