Tech & Sourcing @ Morgan Lewis

TECHNOLOGY TRANSACTIONS, OUTSOURCING, AND COMMERCIAL CONTRACTS NEWS FOR LAWYERS AND SOURCING PROFESSIONALS

Rise of Text-to-Speech AI Models Part 2: Data Protection Issues

In Part 1 of our series on Text-to-Speech AI Models (TTS Models), we highlighted questions that should be considered from an intellectual property perspective. In this Part 2, we provide a high-level overview of the data protection concerns to keep in mind. Because voice is the core element of both the input and output data for TTS Models, it is crucial to be mindful of data protection requirements, especially in an era of heightened attention to privacy rights.

Voice as Personal Data

Many regulators around the globe have implemented a broad definition of personal data similar to that provided in Article 4(1) of the EU General Data Protection Regulation (GDPR):

[A]ny information relating to an identified or identifiable natural person (“data subject”); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.

Following the GDPR’s approach, many jurisdictions classify a person’s voice as personal data. In contrast to classic data protection scenarios, where two or more identifiers (e.g., a name and an address) are required before information is treated as personal data, a person’s voice can qualify on its own. This is particularly true where the voice belongs to a famous person or celebrity, as such a voice is distinctive due to the person’s reputation and fame.

Recognizing a voice as personal data imposes specific requirements on how voice data can be collected, the purposes for which it can be processed, the duration for which it can be stored, and how it can be processed and transferred to third parties.

Arguably, a synthesized voice that fully reproduces the performer’s original voice and could allow the performer to be identified (especially if the performer’s speech patterns and other characteristics are widely known) will itself fall within the definition of personal data.

Voice as Biometric Data

Furthermore, because voice relates to the physical and physiological characteristics of individuals and can be used to uniquely identify them, it could be classified as biometric personal data. This is the case, for example, under the GDPR. Similarly, the California Consumer Privacy Act (Cal. Civ. Code § 1798.140(c)) explicitly lists voice recordings among its examples of biometric information:

Biometric information includes, but is not limited to, imagery of the iris, retina, fingerprint, face, hand, palm, vein patterns, and voice recordings, from which an identifier template, such as a faceprint, a minutiae template, or a voiceprint, can be extracted, and keystroke patterns or rhythms, gait patterns or rhythms, and sleep, health, or exercise data that contain identifying information.

If voice is treated as biometric data in a relevant jurisdiction, additional obligations and restrictions often apply. These must be carefully addressed in agreements for licensing voice data sets for training purposes, engaging performers, or outsourcing TTS Models.

In closing, because TTS Models rely heavily on voice data, it is essential to navigate the complexities of data protection regulations. Recognizing voice as personal and potentially biometric data requires strict adherence to legal requirements regarding data collection, storage, processing, and transfer.