Tech & Sourcing @ Morgan Lewis

TECHNOLOGY TRANSACTIONS, OUTSOURCING, AND COMMERCIAL CONTRACTS NEWS FOR LAWYERS AND SOURCING PROFESSIONALS

Use of Aggregated Data in Artificial Intelligence Solutions

The use of aggregated data by technology service providers is quite common in today’s landscape, and even traditionally cautious customers have become amenable to it in the right circumstances and subject to proper limitations. As widespread adoption of artificial intelligence (AI) technology continues, providers and customers of AI solutions should carefully consider the proper scope of aggregated data use in the design and implementation of those solutions.

We previously discussed recommendations for customers considering rights to use aggregated data in service relationships. Although that post was tailored to more traditional service arrangements, the principles we identified there are a useful starting point for aggregated data issues involved in AI solutions; namely:

  • Identifying the types of data that may be aggregated
  • Describing in detail the manner of de-identification and aggregation
  • Specifying the limited permitted use cases for the aggregated data
  • Ensuring that the party authorizing the aggregation and use has sufficient rights in the underlying data to do so
  • Contractually allocating ownership of the aggregated data

In addition to those generally applicable considerations, the nature of AI technology presents some unique challenges relating to aggregated data usage. While providers of traditional services, SaaS, or other technology solutions often try to present aggregated data usage as a necessary and inherent component of their offerings, the reality is that the benefits derived from aggregated data are often relatively distinct from their core offerings.

For example, customers of consulting service providers may gain added utility when the consultants specifically leverage datasets collected and aggregated from prior customers, but the primary value add of the consulting services is the provider’s expertise and human capital. Likewise, users of cloud services may realize indirect benefits from the cloud provider’s use of aggregated input data and usage statistics, but those benefits are not always intertwined with the core functionality of the services. In the typical AI solution, by contrast, the technology necessarily needs the ability to leverage input data from its users as part of implementing and improving the underlying machine learning algorithms, both for the customer that provided the input data and for future users of the solution.

As a result, it is often not practical for the customer of an AI solution to insist that use of its data, including aggregated and de-identified data, be limited solely to the purpose of providing services to that customer. However, that reality does not in and of itself justify the provider of the AI solution requesting broad, sweeping use or ownership rights for aggregated data “because they need it for the algorithm.” In these scenarios, the parties should ensure that they both have a reasonable understanding of how the technology works and how data is used in the implementation of the solution. All of the principles discussed above should still be addressed in the contract, along with some of the following additional considerations.

Use of Aggregated Data

The permitted use cases should reflect the reality of AI solutions without using that reality to justify unfettered use of aggregated data. A good, reasonable starting point is typically to permit use of the aggregated data to improve the output and results of the provider’s machine learning technology as additional data from a wide range of sources is collected, analyzed, and integrated, in order to develop, maintain, and improve the provider’s solution as provided to the customer and to the provider’s other customers.

Manner of De-Identification and Aggregation

Based on the nature of the particular product, is it sufficient to include the typical requirement that the aggregation and de-identification be done in a manner such that the data does not identify, or permit identification of, the customer or any of its users? Or are additional or different restrictions necessary to protect the customer’s commercial or privacy interests? One additional restriction we commonly see is intended to ensure that the data is de-identified and used in a manner such that the customer is treated the same as, or substantially similarly to, all other customers of the provider, and that the aggregated data collected from the customer is not used in a manner based on or specific to the customer’s business or industry. The driver here is usually commercial in nature: the customer does not want the provider to gain an advantage through aggregated data as a result of the customer’s unique standing in the industry and then specifically use that data to target the customer’s competitors.

If the provider offers one standardized solution across its customer base, these types of restrictions may be easy to agree to, but if there are multiple products or solutions that utilize different datasets or the same datasets in different ways, the provider should be mindful of that when considering any such restrictions.
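For readers who want a concrete picture of what “de-identification and aggregation” can look like in practice, the following is a minimal, purely illustrative sketch in Python. It assumes a simple tabular usage dataset with hypothetical column names; actual de-identification methods vary widely by provider and product, and the contract should describe the provider’s actual process rather than rely on a generic example like this one.

```python
import hashlib

import pandas as pd

# Hypothetical raw usage records collected from a customer's users.
raw = pd.DataFrame(
    {
        "customer_id": ["acme", "acme", "acme"],
        "user_email": ["a@acme.com", "b@acme.com", "a@acme.com"],
        "feature_used": ["forecasting", "forecasting", "reporting"],
        "duration_min": [12, 7, 3],
    }
)

# Step 1: de-identify by dropping direct identifiers and replacing the
# customer identifier with a one-way salted hash (illustrative only; real
# de-identification standards are considerably more involved).
SALT = "example-salt"
deidentified = raw.drop(columns=["user_email"]).assign(
    customer_id=lambda df: df["customer_id"].map(
        lambda c: hashlib.sha256((SALT + c).encode()).hexdigest()[:12]
    )
)

# Step 2: aggregate across records so the output reflects usage patterns
# rather than any individual user's activity.
aggregated = deidentified.groupby("feature_used", as_index=False).agg(
    sessions=("duration_min", "count"),
    avg_duration_min=("duration_min", "mean"),
)

print(aggregated)
```

Even in this simplified form, the sketch highlights the two questions the contract should answer: what is stripped or transformed before the data leaves the customer-identifiable form, and at what level of aggregation the resulting data is used.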

Bias Considerations

As discussed in a recent post, bias issues in AI decisionmaking have become increasingly problematic in recent years. Terms intended to ensure that data is collected and used by the provider and its technology in a nondiscriminatory manner are one avenue for customers seeking contractual protections relating to bias issues.

Providers should be mindful to take responsibility only for the design and implementation of their technology, and not to be held responsible for issues resulting from biases or other problems in customers’ input data (i.e., “bad data in, bad data out”) or issues resulting from customers’ own decisionmaking rather than from the data presented to them through the AI solution.

This is a complicated issue without a straightforward solution, but the manner in which these technologies are designed to weight and utilize data is one avenue for protecting against these issues.