OpenAI Unlikely To Incorporate Its For-Profit Arm Within California After Gavin Newsom Signs Into Law A Bill That Mandates Training Data Disclosure For AI Models

Sep 29, 2024 at 01:52pm EDT
This is not investment advice. The author has no position in any of the stocks mentioned. Wccftech.com has a disclosure and ethics policy.

The State of California has just taken a significant step in enforcing some much-needed transparency vis-a-vis the troves of data that voracious generative AI models have been consuming as they become ever more elaborate in their responses. This development, however, poses competitive risks for the increasingly mercantile OpenAI, making it quite unlikely that the current apex predator of the AI world would choose to incorporate its for-profit arm within the Golden State.

OpenAI: To Profit Or Not To Profit

OpenAI's CEO, Sam Altman, has been increasingly pushing his organization toward a more commercial pathway over the past few months. In fact, it was Altman's increasingly mercantile bent that had allegedly prompted a full-on revolt at the non-profit towards the end of 2023.

Recently, things took a decidedly firmer turn towards Altman's desired commercial direction when reports emerged that OpenAI's CEO was in talks to acquire a 7 percent ownership stake in his company as part of a new $6.5 billion funding round that would value the AI-focused enterprise at $150 billion. This means that the funding round would value Altman's stake at around $10.5 billion.

Critically, Altman is proposing that OpenAI be converted from a non-profit to a public benefit corporation, where the company would supposedly utilize its profits to benefit humanity at large. Meanwhile, OpenAI has asserted that its non-profit arm would continue to exist under the new structure and even receive a minority stake in its for-profit arm.

California's New AI Training Data Transparency Bill

This brings us to the crux of the matter. On September 28, California's Governor, Gavin Newsom, signed into law a bill (AB 2013) that requires developers of generative AI models made available for use within the state to publish a "high-level summary of the datasets used in the development" of those models, covering the following (an illustrative sketch of such a summary appears after the list):

  1. The sources and owners of the datasets.
  2. A description of how the datasets were employed to fulfill the stated purpose of a given AI model.
  3. The number of data points included in the set, disclosed in general or estimated ranges, as well as a description of those data points.
  4. Whether the datasets contain data that is "protected by copyright, trademark, or patent, or whether the datasets are entirely in the public domain."
  5. Whether the datasets were purchased or licensed, or include personal or aggregate consumer information.
  6. Whether the datasets were subjected to any "cleaning, processing, or other modification."
  7. The time period during which the training datasets were collected, including an indication of whether data collection is ongoing.
  8. The dates when the sets were first employed to train the AI models.
  9. Whether the AI models use any synthetic data in their training.
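To make the compliance burden concrete, here is a minimal, purely illustrative sketch (in Python) of the kind of high-level summary a developer might publish to cover the nine items above. AB 2013 does not prescribe a machine-readable format, and every field name and value below is an invented assumption, not language from the bill.

```python
# Hypothetical, illustrative only: AB 2013 does not mandate any particular
# format for the disclosure. This sketch simply maps the nine statutory items
# onto fields a developer might publish. All names and values are invented.
import json

training_data_disclosure = {
    "model": "ExampleLLM-1",  # hypothetical model name
    "datasets": [
        {
            "source_and_owner": "Example Web Crawl Corpus (Example Corp.)",
            "purpose": "General language-modeling pretraining",
            "datapoint_count_estimate": "1-2 billion documents",
            "datapoint_description": "Web pages, articles, and forum posts",
            "contains_copyrighted_material": True,
            "public_domain_only": False,
            "purchased_or_licensed": False,
            "contains_personal_or_aggregate_consumer_info": True,
            "cleaning_or_modification": "Deduplication and quality filtering",
            "collection_period": "2019-2023 (collection ongoing)",
            "first_used_for_training": "2024-01",
            "includes_synthetic_data": False,
        },
    ],
}

if __name__ == "__main__":
    # Print the disclosure as formatted JSON, as one might publish it.
    print(json.dumps(training_data_disclosure, indent=2))
```

Even in this toy form, it is easy to see why a company that treats the composition of its training data as a trade secret would balk at publishing such a document for every model it offers in California.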

So, why does this new California law matter to OpenAI? Well, the company has been quite reticent to divulge such details recently. For instance, the GPT-4 technical report did not include much information about the Large Language Model's (LLM's) training methods and dataset construction. More recently, OpenAI has warned users of a blanket ban should they try to decipher the o1 model's chain-of-thought reasoning process.

OpenAI is likely to view California's new AI Training Data Transparency Bill in a negative light, given the firm's still-evolving commercial pivot and the added compliance costs the bill entails. This makes it quite unlikely that OpenAI would opt to incorporate its for-profit arm within the Golden State.

Of course, AB 2013 is not the first AI-focused bill that California has taken up recently. AB 3211, which would require AI-generated content to be clearly labeled and watermarked, has received support from OpenAI. However, SB 1047, which would require developers to conduct safety testing on their own models, has drawn a severe backlash from the AI-focused enterprise as well as other tech giants.

For the benefit of those who might not be aware, OpenAI has released two major models recently: GPT-4o, which excels at handling general tasks and is capable of understanding textual, audio, and visual inputs, and the o1 model, which employs a chain-of-thought approach similar to how humans work through complex problems. While the latter is much more accurate, it is up to 30x slower than the former and a lot more expensive to use.
