Are Large Language Models Like GPT Secure? EDRM Webinar U.S.

Sept. 13 Webinar: Are Large Language Models Like GPT Secure?

A Look at the Technology and the Law

September 13, 2023

John Tredennick

The legal industry is increasingly thinking about using AI and Large Language Models (LLMs) like GPT for document review, legal research, and even writing legal briefs. Yet, in our discussions, legal professionals regularly express concern about LLM security.

Do we risk a waiver of attorney-client or work-product privileges by sending our data to OpenAI?
What if that data includes confidential client information?
Can LLMs learn from and share the information I send?
How do large language model providers like Microsoft assure data security and confidentiality?
What is the law governing these questions and how will it be applied?

Download a PDF of the slides here.

During the webinar, we had additional insightful questions raised. Please take a look at our answers to those questions below and read our article, “Are LLMs Like GPT Secure? Or Do I Risk Waiving Attorney-Client or Work-Product Privileges?”, that inspired this webinar:

1. Does anonymizing input moot a/c concerns?

That is always an option but would be difficult to accomplish. For reasons stated during the program, we don’t think this is either necessary or practical with a commercial license.

2. ChatGPT states that “user feedback may be considered in shaping the direction of future updates,” so is it that isolated?

The open beta license does say this. Commercial licenses typically prohibit using prompts for training.

3. Is the fact that it can’t learn true for all GPT, including BARD and other LLMs?

To our knowledge, yes. It is a function of the LLM architecture.

4. It is helpful to know that it does not “learn” from current user prompts. But will the developers of GPT-4 incorporate user prompts entered into GPT-4 as training material for GPT-5?

There may never be a GPT 5 (surprising as that may seem). But, the answer is no for commercial licenses.

5. I was under the impression that Chat GPT was learning at an exponential rate because of the millions of users. Is that not correct?

No, it does not learn from your prompts, and particularly not prompts submitted under a commercial license, rather than the beta.

6. I read in the ‘Atlantic’ in an article about the founder of ChatGPT that he did not understand how ChatGPT learned how to code. Is it possible it can learn in some ways?

Anything is possible with these LLMs but they don’t learn from user prompts, at least not those submitted via a commercial license…

7. If the whiteboard is wiped, how does ChatGPT do successive prompts and save them so you can return to the session?

For ChatGPT, that information can be saved for reuse. For commercial licenses accessing GPT through the API, you have to resend the information each time.

8. When the next version of GPT is created, won’t that incorporate millions of user interactions from the prior version?

It may include training from the open ChatGPT beta. Not from commercial license users.

9. Does Chat GPT keep the info you put into the whiteboard after you are done with the session?

Yes for ChatGPT. No for access through commercial licenses.

10. On the user feedback point, does the feedback include the conversation/gpt response or just what you put in that feedback box?

To our knowledge just the information you provide. This may vary depending on the nature of the license.

11. Is it true that some vendors are building models or applications that place a different interface over a foundational model such as Chat GPT? What guarantee to users have that these overlay interfaces won’t be recording the prompts or data that are supplied by the user? And are there any circumstances in which the user wouldn’t know that they were interacting with or being exposed to an interface that collects data in this manner?

ChatGPT is not a foundational module. It is an application that you access through a web browser. To our knowledge commercial applications like we showed only access LLMs through an API. We suggest that commercial users review the license agreements and deal with reputable vendors.

12. Is confidentiality maintained in BARD, Claude, Falcon, etc. as it is in ChatGPT?

In general yes, but each is governed by its own license.

13. Unfortunately there has been a great deal of chatter about not putting private info in the context window when prompting Chat GPT or its equivalents. Would that make expectation of privacy not be customary? Would that mean that it is not reasonable to expect privacy until we get a court decision to say it’s OK?

Not in our view. We didn’t wait for the courts to approve email or cell phones. And confusion or misinformation does not negate a reasonable, objective expectation of privacy.

14. I asked GPT-4 whether it was trained on prompts from GPT-3. This was its response: “As of my last update in September 2021, OpenAI has not publicly disclosed specific details about whether GPT-4 was trained using user prompts made in GPT-3.” Doesn’t that indicate that there is not a reasonable expectation of privacy in prompts submitted to Open AI?

Not in our view. The ChatGPT 3.5 beta expressly stated that OpenAI would use prompt information for training the algorithm. However, OpenAI and Microsoft commercial licenses specifically exclude that right.

15. What is the cost for GPT enterprise as opposed to prior versions?

We don’t know. I would check with OpenAI.

16. Thoughts on how data retention policies might need to be modified to address prompts, etc.

That is not our area of expertise. But with a commercial license, prompts are maintained by your organization or software provider.

17. Restating earlier question: Using the hypothetical at hand re settlement letter, are Chat GPT concerns alleviated by anonymising names and amounts? Reminds me of using translation tools.

We don’t think that is either practical or necessary. However, we wouldn’t recommend using the free ChatGPT beta for settlement letters. Better to use the Enterprise version with the proper data restrictions.

Our Speakers:

John Tredennick is the CEO and founder of Merlin Search Technologies, a cloud technology company that has developed a revolutionary new machine learning search algorithm called Sherlock® to help people find information in large document sets–without having to master keyword search.

Tredennick began his career as a trial lawyer and litigation partner at a national law firm. In 2000, he founded and served as CEO of Catalyst, an international e-discovery search technology company that was sold to a large public company in 2019. Over the past four decades he has written or edited eight books and countless articles on legal technology topics, spoken on five continents and served as Chair of the ABA’s Law Practice Management Section.

Dr. William Webber is the Chief Data Scientist of Merlin Search Technologies. He completed his PhD in Measurement in Information Retrieval Evaluation at the University of Melbourne under Professors Alistair Moffat and Justin Zobel, and his post-doctoral research at the E-Discovery Lab of the University of Maryland under Professor Doug Oard.

With over 30 peer-reviewed scientific publications in the areas of information retrieval, statistical evaluation, and machine learning, he is a world expert in AI and statistical measurement for information retrieval and ediscovery. He has almost a decade of industry experience as a consulting data scientist to ediscovery software vendors, service providers, and law firms.

University of Florida Levin College of Law | Senior Legal Skills Professor, CEDS

Professor William Hamilton: Bill Hamilton is the Senior Legal Skills Professor at the University of Florida Levin College of Law, where he teaches electronic discovery, complex litigation and civil procedure. He has taught electronic discovery for the past 10 years and is an author of the LexisNexis Practice Guide Florida e-Discovery and Evidence and A Student Electronic Discovery Primer: An Essential Companion for Civil Procedure Courses. He is also the General Editor of the LexisNexis Practice Guide: Florida Contract Litigation.

Hamilton is a neutral arbitrator and mediator for the World Intellectual Property Organization and the author of more than 100 domain name dispute decisions. Prior to academia, Hamilton served as the electronic discovery partner for a national law firm. During his 30-year litigation career, he has been recognized in Chambers USA, Florida Legal Elite, Best Lawyers in America, and Florida Super Lawyers.

John Tredennick is the CEO and founder of Merlin Search Technologies.
JT@Merlin.Tech

Transforming Discovery with GenAI

Take a look at our research and GenAI platform integration work on our GenAI page.

If you want to learn more about Merlin, our research on ChatGPT or our software, reach out here:

Thanks for reaching out!

Meet Merlin

Meet DiscoveryPartner

Latest News

Sept. 13 Webinar: Are Large Language Models Like GPT Secure?

A Look at the Technology and the Law

Our Speakers:

Contact Us to Learn More

Quick Links

Software