Microsoft’s voice AI tool, called Vall-E, is trained on “discrete codes derived from an off-the-shelf neural audio codec model” as well as 60,000 hours of speech from more than 7,000 speakers, 100 times more training data than existing systems. Most of that audio comes from LibriVox public domain audiobooks.