AWS is continually expanding the suite of large language models (LLMs) available on its Bedrock platform. At present, eight text-generation LLMs are accessible to users. To compare their response latency, I conducted an exploratory study.
For these tests, I used the on-demand versions of the models; I did not use provisioned throughput, which is impractical for sparse traffic.
I tested two main experimental conditions:
Text Generation Workload
For the text generation requests, I use the following prompt:
Write a 500 word long story about the tortoise and the hare.
I set the max tokens to 200 to ensure each model generates the same number of tokens. The same request was made to each model 20 times. The boxplot below shows the response times.
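The benchmark loop can be sketched with boto3's `bedrock-runtime` client. This is a minimal illustration, not the author's exact harness: the model ID and the request-body schema below follow AI21's Jurassic-2 conventions and are assumptions that vary by model family on Bedrock.

```python
import json
import statistics
import time

# Prompt and token cap from the experiment described above.
PROMPT = "Write a 500 word long story about the tortoise and the hare."

def build_body(prompt: str, max_tokens: int = 200) -> str:
    """Build an AI21 Jurassic-2 style request body (assumed schema;
    other Bedrock model families use different field names)."""
    return json.dumps({"prompt": prompt, "maxTokens": max_tokens})

def time_model(model_id: str, runs: int = 20) -> list:
    """Invoke an on-demand Bedrock model `runs` times and
    return the wall-clock latency of each call in seconds."""
    import boto3  # requires AWS credentials with Bedrock access
    client = boto3.client("bedrock-runtime")
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        client.invoke_model(modelId=model_id, body=build_body(PROMPT))
        latencies.append(time.perf_counter() - start)
    return latencies

if __name__ == "__main__":
    # "ai21.j2-mid-v1" is an assumed on-demand model ID for Jurassic-2 Mid.
    times = time_model("ai21.j2-mid-v1")
    print(f"median latency: {statistics.median(times):.2f}s")
```

The per-call latencies collected this way are what the boxplot summarizes; swapping the model ID and body schema repeats the measurement for each of the other models.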
Findings:
- Jurassic-2 Mid delivered the quickest median response, albeit with considerable variability.