Generative Artificial Intelligence (AI) describes algorithms that generate different types of data, such as text, audio and video. Examples include OpenAI’s DALL-E (which creates images from text captions), GitHub’s Copilot (which speeds up the writing of code with help from an AI programmer) and, most recently, OpenAI’s ChatGPT.
The latter has been trending, recording about 100 million users as of January this year. This year has also seen another generative AI tool, Microsoft’s Vall-E text-to-speech (TTS) system, join the fray.
Going by the trends, AI-powered tools and services could transform how many things work. Text-to-speech tools of this kind can mimic a voice after hearing a person talk for as little as three seconds. After that short sample, the tool only needs written words to turn them into speech in the user’s voice. It can also preserve the speaker’s emotions, meaning that the generated speech will come off as “angry” if the speaker in the sample was, and vice versa.
Apart from giving a voice to those who have lost the ability to speak, the technology could in time make AI voice commands more accurate, especially at picking up tone.
While the AI model seems to be doing well in some areas, it is not the only tool of its kind. French DJ and producer David Guetta used an unspecified generative AI tool to successfully clone the voice of famous rapper Marshall Mathers (alias Eminem) without his knowledge. The cloned vocal was part of a song the DJ played recently during one of his shows, and few of his fans realized that the voice on that tune was AI-generated.
The DJ has no plans to release the remix commercially, but the episode sparked a debate on the ethical use of AI and how it can enable impersonation. It raises the question of whether AI technology is advancing faster than the world can keep up with, or not.
ChatGPT has been used for various purposes, from churning out professional articles to writing theses. Not only does it have the potential to take away opportunities from, say, journalists and programmers, but it might also render the current generation lazy. It is better to avoid such tools when working on serious projects.
Furthermore, text-to-speech tools such as Vall-E are likely to affect the originality, reputation and livelihoods of many. With the technology now highly accessible, artists who rely on their innate talent might be disadvantaged. Emerging artists are even more at risk, as their voices can easily be replicated without their knowledge and used for commercial gain.
The technology also presents a security challenge. Where voice commands are used to gain access, someone could use such generative AI to reach information without permission. Those who unlock their smartphones by voice could also be hacked with nothing more than a clone of their voice.
Think about a feature such as Safaricom’s ‘Jitambulishe’, a voice biometrics identification system launched in 2017 to allow the firm’s post-pay, prepay and hybrid customers to access services, from resetting their M-PESA PIN to making PUK requests, simply by using their voices.
In such a scenario, Vall-E could be used to mimic a customer’s voice and gain access, thanks to its accuracy. However, experts disagree with that view. A linguistics instructor at GEMS Cambridge International School, Dr Luka Wandanje, says: “[The] vocal tract cannot be altered, and because of the permanent characteristics a biometric tool measures, it cannot be fooled into thinking it is another person. Not even a cold or flu can alter this.” The accuracy of the likes of Vall-E, though, suggests a different story.
For now, generative AI may not be usable against systems that rely on languages other than English. But the technology is advancing, and anything can happen. Good, bad or ugly? It will depend greatly on who uses the technology and how.