Claude 3: A First Look at this Exciting New Technology

By John Tredennick and Dr. William Webber

On March 5, 2024, Anthropic announced the release of three powerful versions of Claude 3. Early reports here and elsewhere suggest these new large language models may give GPT 4 a run for its money both in terms of performance and cost. 

Here is a rundown of the three versions:

  • Claude 3 Opus is the most powerful of the three models, with top-level performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. 
  • Claude 3 Sonnet strikes a balance between intelligence and speed. It delivers reliable, high performance at a lower price than its competitors, and is engineered to be the dependable workhorse for scaled AI deployments.
  • Claude 3 Haiku is their fastest, most compact model for near-instant responsiveness. It answers simple queries and requests with unmatched speed, and also includes image-to-text vision capabilities that are comparable with other state-of-the-art vision models. Haiku is not yet available for testing. 

We loaded Claude Opus and Claude Sonnet into the lab version of DiscoveryPartner for initial testing prior to rolling them out for production. To put it mildly, we were impressed by the results. 

Here is a look at what we found. 

Topic Creation

We loaded more than 3,000 documents into our test bed and then created the following topic:

What are the key issues around this topic: Slot Machines — All documents concerning the definition, legality, and licensing of “slot machines” in Florida.

List the main points mentioned about definition, legality and licensing.

This topic relates to Jeb Bush’s emails, which he made public when running for President. The program managers for the NIST annual text retrieval conferences created a list of more than 30 topics to test the ability of different machine learning algorithms to quickly find (but not analyze) relevant documents. We now use the set against the full two million emails Bush made available to show speed and scalability. Our system not only finds relevant documents instantly but also uses integrated GenAI algorithms to analyze and report on them. 

Running AI Powered Searches

The first step with our system was to run several types of searches. One used a Natural Language Processing methodology, which created embeddings for each of the document sections and then for the topic query. 
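For readers unfamiliar with embedding search, here is a minimal sketch of the idea. It assumes each document section and the topic query have already been turned into vectors by some embedding model; the toy three-dimensional vectors, the section ids, and the function names are illustrative assumptions, not the ones our system uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def rank_by_embedding(query_vec, section_vecs):
    """Return section ids ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), sec_id)
              for sec_id, vec in section_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sec_id for _, sec_id in scored]

# Toy vectors (hypothetical); real embeddings have hundreds or thousands of dimensions.
sections = {
    "sec-a": [0.9, 0.1, 0.0],
    "sec-b": [0.0, 1.0, 0.0],
    "sec-c": [0.7, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]
embedding_ranking = rank_by_embedding(query, sections)
```

Sections whose vectors point in nearly the same direction as the query vector rise to the top, whether or not they share exact keywords with it.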

The second was syntactical, using AI to analyze for keyword similarity. We combined the rankings and promoted the top 100 documents for analysis, summarization and ranking. Let me note that all of this was done automatically in a few seconds by the system after I entered the prompt. 
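We don't describe here how the two rankings are combined, so the following is only a sketch of one common approach, reciprocal rank fusion, and not necessarily what our system does. The function name, the constant k=60, and the document ids are assumptions chosen for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=100):
    """Merge several ranked lists of document ids into one list.

    Each document earns 1 / (k + position) from every list it appears in,
    so documents ranked highly by both searches float to the top.
    """
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    # Sort by descending score; break ties by id so the output is deterministic.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused[:top_n]

# Hypothetical outputs of the embedding search and the keyword search.
semantic = ["doc-3", "doc-1", "doc-2"]
keyword = ["doc-1", "doc-4", "doc-3"]
fused = reciprocal_rank_fusion([semantic, keyword], top_n=2)
```

With top_n left at 100, the same call would promote the top 100 documents for summarization, as described above.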

You can see a list of the documents we retrieved here: 

In this case Claude created the titles based on its summary of the document. It also provided a relevance score for the document. For comparison purposes, we show the results of our initial search (Search rank) as well. In this case the search ranking matched Claude's judgments fairly well (at least for these documents). Think of Relscore as the virtual human judgment and Search rank as the initial sorting of the documents by the AI-based search engines. 

Document Summaries

Here is a look at an AI-generated summary for the top ranked document.

We used the mid-grade Claude 3 Sonnet model for this work. Sonnet offers a mix of intelligence and speed at a reasonable cost, and initial testing suggests it is a good, cost-effective choice for summarization work. 

We now include paragraph references within the summary. These references tie to the actual text of the underlying document, allowing a reader to quickly match the summary statement to the relevant text in the document section. 

Here is the link to paragraph 18.

Our users find these linked references a helpful timesaver when reviewing source links for synthesis reports and the underlying summaries. Of course they always have quick access to the full text of the document either in text or native format. 

Claude 3 Opus for Reporting

The Opus model for Claude 3 is their top-of-the-line GPT 4 competitor. Like the GPT 4 series, it is more expensive (likely owing to the additional compute power required) and slower than the lesser models. While we could use it for summarization, it has felt, and continues to feel, like overkill for that part of the work. Certainly the summaries would take longer to create. They would be five times more expensive as well!

So, when it comes to the synthesis and reporting function, we use the top model, whether it is Claude Opus or GPT 4 Turbo. For this work, Claude Opus was required to read 100 summaries, analyze the information they contained and write a comprehensive report. 
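The five-fold cost difference mentioned above follows from Anthropic's list prices at launch: $15 per million input tokens and $75 per million output tokens for Opus, versus $3 and $15 for Sonnet. Here is a rough sketch of the arithmetic. The token counts per document are hypothetical round numbers, not measurements from our test bed.

```python
# Launch list prices in US dollars per million tokens.
PRICES_PER_MILLION_TOKENS = {
    "claude-3-opus": {"input": 15.00, "output": 75.00},
    "claude-3-sonnet": {"input": 3.00, "output": 15.00},
}

def job_cost(model, input_tokens, output_tokens):
    """Dollar cost of a job given its total input and output token counts."""
    price = PRICES_PER_MILLION_TOKENS[model]
    return (input_tokens * price["input"]
            + output_tokens * price["output"]) / 1_000_000

# Assumed workload: 3,000 documents, about 2,000 tokens read and
# 300 tokens written per summary (illustrative figures only).
input_tokens = 3_000 * 2_000
output_tokens = 3_000 * 300

sonnet_cost = job_cost("claude-3-sonnet", input_tokens, output_tokens)
opus_cost = job_cost("claude-3-opus", input_tokens, output_tokens)
ratio = opus_cost / sonnet_cost
```

Because both the input and the output price differ by the same factor, the ratio comes out to five regardless of the token counts assumed.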

Let’s see how Opus did:

What are the key issues around this topic: Slot Machines — All documents concerning the definition, legality, and licensing of “slot machines” in Florida.

List the main points mentioned about definition, legality and licensing.


DP

2024-03-06 16:49:11

claude-3-opus

Summaries used: 100

Based on the document summaries provided, the key issues around the definition, legality, and licensing of slot machines in Florida are as follows:

Definition:

  • Florida law broadly defines slot machines, potentially including both Class II and Class III machines (Document 256022, Document 256099).
  • Class II machines are computerized bingo games, while Class III machines use random number generators to determine outcomes (Document 203072, Document 203076).
  • Slot machines can have features like bonus rounds, multiple win lines, embedded bill acceptors, and recognizable game themes (Document 203072, Document 203076).
  • There is debate over whether machines that involve an element of skill, such as allowing players to stop spinning reels, qualify as slot machines or “amusement machines” (Document 253557, Document 253570, Document 253575).

Legality:

Licensing and Regulation:

Key people and organizations involved in this issue include:

The main time period covered in these documents is from 2004 to 2006, with a focus on the aftermath of the 2004 constitutional amendment allowing slot machines in Broward and Miami-Dade counties.

For our money, this was a pretty good report, one generated in a couple of minutes. We found Claude Opus to be quick in providing its answer, certainly on par with GPT 4 Turbo, which was our past favorite. The writing was first rate and the answer comprehensive. 

In our system, we provide links from report statements to the underlying summary and, if desired, to the underlying source text. You can see an example here for Document ID 250887. 

As you can see, Claude reported the following:

Broward County drafted regulations to monitor slot machines, audit revenues, and prevent criminal involvement in the absence of state regulations (Document 250887).

It was based on the underlying summary which reported: 

1. Broward County released draft guidelines outlining how the county’s two racetracks, a dog track, and a jai-alai fronton can operate casinos with slot machines. [¶3]

2. The draft regulations aim to monitor the fairness of slot machines, audit the casinos’ revenue for taxation, and ensure no one with a criminal background is involved. [¶10-¶11]

3. The regulations were drafted after the Florida Legislature failed to pass slot machine rules during its regular session, despite voters approving a constitutional amendment allowing slot machines at Broward and Miami-Dade parimutuels. [¶4-¶5, ¶8, ¶14-¶15]

4. A Broward County Circuit Court judge ordered the County Commission to create operating regulations after parimutuels sued due to the lack of state guidelines. [¶8-¶9]

5. While parimutuel owners prefer state-issued regulations, they appreciate Broward County’s effort to move the process forward. [¶12-¶13]
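The citation formats shown above, "(Document 250887)" in the report and "[¶4-¶5, ¶8, ¶14-¶15]" in the summary, are regular enough to be picked out mechanically. The sketch below shows one way such references could be extracted so they can be turned into links. The patterns and function names are our illustration for this article, not the code DiscoveryPartner runs.

```python
import re

DOC_CITATION = re.compile(r"Document (\d+)")
PARA_BLOCK = re.compile(r"\[([^\]]*¶[^\]]*)\]")   # e.g. [¶4-¶5, ¶8]
PARA_RANGE = re.compile(r"¶(\d+)(?:-¶(\d+))?")    # ¶8 or ¶4-¶5

def document_ids(report_text):
    """Return the document ids cited in a report statement, as strings."""
    return DOC_CITATION.findall(report_text)

def paragraph_numbers(summary_line):
    """Expand the bracketed paragraph references into a list of integers."""
    numbers = []
    for block in PARA_BLOCK.findall(summary_line):
        for start, end in PARA_RANGE.findall(block):
            first = int(start)
            last = int(end) if end else first
            numbers.extend(range(first, last + 1))
    return numbers

report_line = ("Broward County drafted regulations to monitor slot machines, "
               "audit revenues, and prevent criminal involvement in the "
               "absence of state regulations (Document 250887).")
summary_line = ("The regulations were drafted after the Florida Legislature "
                "failed to pass slot machine rules during its regular "
                "session. [¶4-¶5, ¶8, ¶14-¶15]")

cited_docs = document_ids(report_line)
cited_paragraphs = paragraph_numbers(summary_line)
```

Each extracted document id can then point to its summary, and each paragraph number to the matching passage in the source text.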

Claude Opus also gave me a nice timeline of events. 

While a good associate could do this work, they might take a bit longer than the 41 seconds it took Claude Opus to read the 100 summaries and provide this timeline. 

What Do We Think?

Color me excited. GPT 4 now has a viable competitor at the top end of the LLM market. I have no doubt that Sam Altman and his crew are working hard on their response to these powerful new LLMs. I also don’t doubt that OpenAI or maybe even Google will come back with even more powerful and cheaper models to regain the throne. 

Are the Claude 3 models better than the GPT 4 line? I can’t say, at least not yet. My work with Claude Opus tells me it is every bit as good as GPT 4 and is offered at a better price. I am particularly pleased with their decision to offer three versions of the model, allowing our users to choose which combination of capabilities and pricing matches the task at hand. Up until now, we were offering models from different vendors to provide that flexibility. 

John Tredennick, CEO and founder of Merlin Search Technologies
JT@Merlin.Tech

Dr. William Webber, Merlin Chief Data Scientist
WWebber@Merlin.Tech
