The Loop

Generative AI and large language models (LLMs) play an increasingly central role in powering enterprise software, and that role will only continue to grow.

With this comes a barrage of services, frameworks, toolkits, SDKs, APIs — all looking to carve out their role in the coming wave of LLM-based software stacks sprouting up across engineering orgs.

Following its public release in Fall 2023, Amazon’s Bedrock has quickly emerged as an easy and powerful option for building and scaling your LLM-powered applications.

At Econify, we decided to take the new AWS service for a test drive. Here’s what we learned.

A place to house and prompt your large language models

Bedrock allows you to quickly stand up a serverless API and begin interacting with top LLMs from Amazon, Meta, and leading AI startups. As a fully managed service, it handles the underlying infrastructure for you — no need to slog through configuring compute resources.

A great option if you’re already building within the AWS ecosystem

Bedrock fits seamlessly into your existing AWS service landscape, allowing you to easily connect with other services and leverage the cloud provider’s robust security and data privacy capabilities.

Our PoC application — an article taxonomy utility — was built fully within the AWS ecosystem (S3, API Gateway, Lambda, Bedrock), which allowed us to hook everything up quickly and securely. Enabling Lambda to hit Bedrock required simply setting an “Invoke Bedrock” policy on our Lambda function.
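
To illustrate, a policy statement along the following lines grants a Lambda function that permission. This is a sketch rather than our exact configuration (shown here as a plain object you could attach however you manage IAM); the important piece is the bedrock:InvokeModel action, and ideally you scope Resource to specific model ARNs rather than a wildcard.

```typescript
// Sketch of an IAM policy document allowing a Lambda execution role to call
// Bedrock's InvokeModel API. Not our exact policy; scope Resource down where possible.
const invokeBedrockPolicy = {
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      Action: ["bedrock:InvokeModel"],
      Resource: "*", // prefer specific foundation-model ARNs in production
    },
  ],
};
```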

Wide array of foundation models to choose from

All the heavy hitters are here — Anthropic, Cohere, Meta, Mistral, and more. 32 models are available at the time of writing.

Timely releases of new models

We logged in to the AWS console one day to find that Llama 3 8B and 70B had been added to the list of available models, just five days after Meta's public release. Although this is only a single data point, it's a positive sign at the very least.

Bonus points: Bedrock does a good job surfacing new releases via a helpful tooltip.

Bring your own model — currently in preview

While out of scope for our project, the ability to import your own models from S3 or SageMaker is in preview release at the time of writing. This is sure to be a welcome addition for orgs with ML and data science teams tinkering with model customization as they tackle more complex/hyper-specific use cases.

Requesting access at the model level

Before you can interact with any given model, you must first request access to it through the Model Access view in the AWS console. The good news: in our experience, access requests were consistently approved within a minute or two.

Rejoice — your newly enabled models are ready to use

Start by jumping into the playground environments Bedrock offers via its console interface. Simply choose any of your enabled models and send your first prompt to see it in action.

Now do it programmatically

The playground UI is a great way to get your feet wet, but we’re here to build software after all.

Proceed to your codebase and ensure you have your favorite HTTP library or AWS SDK imported and ready to send requests. We opted for aws4 to sign our requests and fetch() to send them.
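
To make that concrete, here is a minimal sketch of how such a signed request might look. The region, model ID, and Llama-style prompt body are illustrative assumptions, not a prescription; note that the SigV4 signing name is "bedrock" even though the host is bedrock-runtime.

```typescript
import aws4 from "aws4";

// Minimal sketch: sign a Bedrock InvokeModel request with aws4 and send it with fetch().
// Region, model ID, and the Llama-style body are illustrative assumptions.
async function invokeLlama3(prompt: string) {
  const region = "us-east-1";
  const modelId = "meta.llama3-70b-instruct-v1:0";
  const body = JSON.stringify({ prompt, max_gen_len: 256, temperature: 0.5 });

  const signed = aws4.sign(
    {
      host: `bedrock-runtime.${region}.amazonaws.com`,
      path: `/model/${encodeURIComponent(modelId)}/invoke`,
      service: "bedrock", // SigV4 signing name, despite the bedrock-runtime host
      region,
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body,
    },
    {
      accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
      secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
    }
  );

  const res = await fetch(`https://${signed.host}${signed.path}`, {
    method: "POST",
    headers: signed.headers as Record<string, string>,
    body,
  });
  return res.json();
}
```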

But wait — how do I switch among the various models I’ve enabled?

In theory, switching between models is simple. You tell Bedrock which model you're prompting by passing a modelId (e.g., meta.llama3-70b-instruct-v1:0) with your request. Refer to Bedrock's developer docs for a full list of model IDs.

In practice, there's a catch. Each model defines its own request and response format, meaning that in addition to switching modelId, you must ensure your prompting and response-handling logic accounts for each model's unique data shape.
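
As a sketch of what that looks like in practice, the snippet below contrasts the request and response shapes we'd expect from a Llama 3 model versus an Anthropic Claude model, hidden behind a tiny adapter. The field names follow each provider's documented format as we understand it at the time of writing; double-check before relying on them.

```typescript
// Sketch: each model family expects a different body and returns a different shape,
// so we hide the differences behind a small per-model adapter.
type BedrockAdapter = {
  modelId: string;
  buildBody: (prompt: string) => string;
  parseText: (responseBody: any) => string;
};

const llama3: BedrockAdapter = {
  modelId: "meta.llama3-70b-instruct-v1:0",
  buildBody: (prompt) =>
    JSON.stringify({ prompt, max_gen_len: 256, temperature: 0.5 }),
  parseText: (body) => body.generation, // Llama models return { generation: "..." }
};

const claude3Haiku: BedrockAdapter = {
  modelId: "anthropic.claude-3-haiku-20240307-v1:0",
  buildBody: (prompt) =>
    JSON.stringify({
      anthropic_version: "bedrock-2023-05-31",
      max_tokens: 256,
      messages: [{ role: "user", content: prompt }],
    }),
  parseText: (body) => body.content[0].text, // Claude returns { content: [{ type: "text", text: "..." }] }
};
```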

Bedrock's pricing structure boils down to two options: token-based pricing and Provisioned Throughput.

Token-based

For the vast majority of users, token-based pricing is the place to start. The cost you incur is a function of the number of input and output tokens. Take Command R+, Cohere's latest flagship text model, as an example: $0.003 per one thousand input tokens and $0.015 per one thousand output tokens.
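
As a quick back-of-the-envelope sketch at those Command R+ rates, the per-request cost works out like this:

```typescript
// Back-of-the-envelope cost estimate at Command R+ rates:
// $0.003 per 1K input tokens, $0.015 per 1K output tokens.
const commandRPlusCostUsd = (inputTokens: number, outputTokens: number) =>
  (inputTokens / 1000) * 0.003 + (outputTokens / 1000) * 0.015;

// e.g. 10,000 input tokens and 2,000 output tokens:
// 10 * $0.003 + 2 * $0.015 = $0.06
console.log(commandRPlusCostUsd(10_000, 2_000).toFixed(2)); // "0.06"
```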

Provisioned Throughput

Provisioned Throughput, on the other hand, offers certain throughput guarantees in exchange for an hourly usage rate over a selected commitment term. Most models offer 1-month and 6-month commitment terms; note that a small subset of models support Provisioned Throughput mode with no commitment period.

There are two primary use cases that lend themselves to Provisioned Throughput:

  1. Large ongoing inference workloads needing consistent guaranteed throughput
  2. Organizations looking to train and leverage their own custom models to power their apps

To give you a sense of scale, a one-month commitment will run you on the order of several thousand dollars.

Pricing at a glance

Below is a snapshot we assembled comparing pricing across select Bedrock models as well as OpenAI. To avoid having to deal with fractions of pennies, we express token-based pricing as cost per one million tokens, rather than the AWS convention of one thousand tokens.

Pricing in action

We opted for token-based pricing for our PoC app. So how much did we rack up in 6 weeks of near-daily model interactions as we built and tested our LLM-powered app? A whopping $0.26!

While this may not prove a useful indicator of costs in a public app with many users, what this does tell you is that Bedrock offers a safe environment to experiment with LLM app development. You can tinker to your heart’s content without worrying about breaking the bank.

Aside from giving your models a place to live, Bedrock offers a few neat bells and whistles to enhance your experience building LLM-powered apps.

Model Evaluation

Your app’s user experience is only as good as the answers provided by the underlying LLM. An important step in building LLM-powered apps is evaluating the effectiveness of the model’s responses.

AWS offers both automated and manual model evaluation utilities. Automated evaluation pits a given model against a test dataset, using various statistical methods (F1, BERTScore, etc.) to produce a model effectiveness score. Manual evaluation, on the other hand, facilitates human evaluation: evaluators are presented with responses from two different models and asked to select the “better” one.
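
To make the automated metrics a little more concrete, here is a toy token-overlap F1 score between a model answer and a reference answer. This is purely illustrative and not how Bedrock computes its scores; the evaluation jobs handle the real metrics (F1, BERTScore, and friends) for you.

```typescript
// Toy illustration of an F1-style metric: token overlap between a candidate
// answer and a reference answer (precision/recall harmonic mean).
function tokenF1(candidate: string, reference: string): number {
  const cand = candidate.toLowerCase().split(/\s+/).filter(Boolean);
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean);

  // Count reference tokens so repeated words are only matched as often as they appear.
  const refCounts = new Map<string, number>();
  for (const t of ref) refCounts.set(t, (refCounts.get(t) ?? 0) + 1);

  let overlap = 0;
  for (const t of cand) {
    const n = refCounts.get(t) ?? 0;
    if (n > 0) {
      overlap++;
      refCounts.set(t, n - 1);
    }
  }
  if (overlap === 0) return 0;

  const precision = overlap / cand.length;
  const recall = overlap / ref.length;
  return (2 * precision * recall) / (precision + recall);
}
```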

Note that model evaluation has its own separate pricing structure beyond the aforementioned usage pricing options.

Fine Tuning

Users may opt to improve upon foundation model performance through fine-tuning; Bedrock makes it easy to do so through its own customization interface as well as the ability to import models trained via Amazon SageMaker.

One frustrating limitation is that token-based pricing is unavailable for custom models; you are forced to use Provisioned Throughput mode if leveraging a fine-tuned model. Depending on which foundation model you are using, this may require a costly minimum 30-day commitment. While we initially considered experimenting with fine-tuning for our use case, this limitation ultimately prevented us from doing so, since Provisioned Throughput was a non-starter.

Also keep in mind that fine-tuning itself incurs additional costs, based on the number of tokens in the training dataset.

Bedrock is still a work in progress, with updates and new features being added seemingly weekly, but our nearly two-month foray left us feeling bullish about Amazon's budding GenAI service. Quirks like inconsistent prompt/response data shapes across models are outweighed by how easy it was to get our application up and interfacing with the latest and greatest LLMs.

Stay tuned for a future post where we’ll do a deep dive comparison of three leading LLMs through the lens of our Bedrock-powered application.

Guest Post by Aleks Fiuk, Econify's Director of Engineering in New York

Stay ahead of the curve with Econify's newsletter, "The Loop." Designed to keep employees, clients, and our valued external audience up to date with the latest developments in software news and innovation, this newsletter is your go-to source for all things cutting-edge in the tech industry.

The Loop is written and edited by Aleks Fiuk, Victoria Lebel, Nick Barrameda, and Marie Stotz.

Have questions? Hit reply to this email and we'll help out!

Econify.com

Published on Wed Jul 10 2024