Tech & Sourcing @ Morgan Lewis

BLOG POST

TECHNOLOGY TRANSACTIONS, OUTSOURCING, AND COMMERCIAL CONTRACTS NEWS FOR LAWYERS AND SOURCING PROFESSIONALS

Rise of Text-to-Speech AI Models Part 1: Intellectual Property Issues

Text-to-speech AI models (TTS Models) are rapidly evolving within the broader spectrum of AI solutions, offering tremendous potential for businesses. These models can analyze text and speech and generate anything from simple sounds to high-quality, natural-sounding speech. That capability makes TTS Models highly appealing for commercial use, including in virtual assistants, audiobooks, e-learning platforms, and customer service functions.

However, as with most modern technologies, new opportunities can introduce heightened legal risks. The use of TTS Models could run afoul of an individual’s publicity rights. Recent news from around the globe highlights instances of unauthorized use of the voices of actors, singers, and even private citizens. For example, in April 2023 a TikTok user created and released a song titled “Heart on My Sleeve” with vocals made to sound like singers Drake and The Weeknd through the use of AI.

General Legal Considerations

Businesses developing TTS Models and related solutions, as well as those outsourcing such models and solutions, should be particularly vigilant in complying with applicable laws and adhering to ethical standards. They should also be aware of the various legal issues inherent to all AI solutions (see our previous thought leadership Cracking AI and Outsourcing Conundrums (Parts 1, 2, 3, and 4) and Ensuring IP Provisions Are Fit for GenAI).

In addition to the general issues surrounding the use of AI solutions, the development and use of TTS Models raise legal issues of their own because of the specific nature of the product's source and output: voice. In this series, we will briefly highlight potential intellectual property and right of publicity issues (Part 1) and data protection issues (Part 2).

Peculiarities Associated with TTS Models

When contracting for AI solutions, it is typically advisable to consider and delineate the intellectual property rights (IPR) that might be implicated by the relevant data input, data output, and prompts. Such issues are typically dealt with as a contractual matter. However, with TTS Models, the IPR considerations are further complicated by a number of regulatory and ethical considerations.

Laws of many jurisdictions do not currently specifically regulate the use of voice or synthesized voice. That said, such use might fall within the scope of more general regulations relating to IPR, publicity rights, and personal data protection.

Intellectual Property Rights

The fixed recording of a person’s voice is generally protectable under copyright and neighboring rights as a sound recording. Further, in some cases, the words spoken in the sound recording (such as a poem or short story) may also be covered by copyright. In the United States, it is still unclear whether the use of a sound recording as a data input for TTS Model training would infringe the copyright of the owner of the sound recording (let alone the underlying work that was recorded). However, in the United States a person’s voice itself is generally not considered to be protectable by copyright.

It is important to note that some jurisdictions recognize “moral rights,” which are personal rights held by the author of a work (who may not be the owner of the copyright in such work), including the right to attribution and the right to integrity. For example, a user may be required to identify or credit the author of the work, or may be prohibited from modifying the work in a way that would prejudice or damage the author’s reputation. In the United States, moral rights are generally only recognized in works of visual art under the Visual Artists Rights Act of 1990.

Therefore, these IPRs must be carefully reviewed and addressed when licensing training data sets for TTS Models, outsourcing such models, or contracting with a performer for further synthesis of their voice.

Other Limitations

As voice is generally considered to be an attribute of an individual’s persona, its use also raises right of publicity issues.

In several countries (e.g., Russia, Uzbekistan, Kazakhstan), voice is regarded as an intangible asset that belongs to a person from birth (similar to dignity and name) and is inalienable and imprescriptible. In these countries, the use of a person’s voice would require a properly drafted consent or license from that person or, if the person is deceased, from their heirs.

In the EU, there is no harmonized law on image rights; the national laws of EU member states therefore impose differing restrictions on the assignment and use of image rights. Accordingly, it is crucial to ensure that any authorization to use an individual’s voice complies with applicable laws and is carefully drafted, clearly outlining the conditions and terms of use (e.g., duration, nature of use, authorized media, any excluded contexts).

In the United States, the right of publicity prevents the unauthorized commercial use of an individual’s name, likeness, or other recognizable aspect of the individual’s persona, such as their voice. There is currently no comprehensive federal right of publicity, but a proposed federal bill, the “Nurture Originals, Foster Art, and Keep Entertainment Safe Act of 2023” (otherwise known as the NO FAKES Act), seeks to create the first federal right of publicity and regulate the creation and use of digital replicas of human beings, including replicas of their voices.

Presently, there is only a patchwork of state laws that govern and restrict the use of publicity rights, and the scope and strength of the legal protections that they provide differ.

One of the most comprehensive laws in this area is California Civil Code § 3344. According to this statute, businesses can obtain a license from a performer for use of their voice for commercial purposes. However, broad exclusive licenses or assignments of a person’s right of publicity are generally not allowed as they would restrict the individual from using their own voice or likeness. In Tennessee, the “Ensuring Likeness, Voice, and Image Security Act” (or ELVIS Act) went into effect on July 1, 2024. The ELVIS Act explicitly treats a person’s voice as a protected property right, covering both an individual’s actual voice and a simulation of that voice. The law, which includes both civil and criminal penalties, is expected to be used by music labels to seek remedies against bad actors who make unauthorized use of their Tennessee artists’ voices.

Conclusion

Dealing with the legal and ethical challenges associated with TTS Models requires careful consideration. By understanding and addressing the issues set forth in this article, businesses can responsibly leverage the transformative power of TTS Models while managing legal risk, safeguarding individual rights, and maintaining ethical standards.