
The Loop


It’s one thing to stand up a GraphQL API that serves every web, mobile, and CTV client application for the largest Spanish-language streaming app. Having that same API stand up to World Cup viewership is a whole different beast. Response times not only had to be fast – they had to hold up through the sudden spikes of traffic each time two teams hit the pitch for kickoff.

In the months leading up to the tournament, Econify and Univision engineers worked together to meticulously optimize the ViX backend, ensuring it would withstand anything and everything the Copa Mundial threw its way. The result was a robust API that could handle nearly a million requests per minute without degradation.

Looking for this type of speed and resilience for your API? Here’s Part 1 of our GraphQL optimization playbook.

Right-Sized Field Selection

One of the beauties of GraphQL relative to REST is the ability for clients querying your API to select exactly the fields they want to see come back in the response, and nothing more.

Work with your frontend teams to ensure they’re fetching data the “GraphQL way” – only requesting the fields they need. Preventing needless over-fetching means less wasted effort by your field resolvers.
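For example, a client rendering a simple list screen might request only the two fields it actually displays (the query and field names below are hypothetical):

```graphql
# Request only what the screen renders – title and poster –
# rather than the full Movie entity.
query SavedMovies {
  savedMovies {
    title
    posterUrl
  }
}
```

Every field left out of the query is a resolver that never has to run.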

Pagination from the Get-Go

There are multiple viable approaches for your pagination implementation. We like a cursor-based approach coupled with edges, nodes, and pageInfo (as outlined in the GraphQL docs here) to help client callers navigate fetching of pages.

Build pagination into your client-facing queries from the start, and work with your frontend teams to ensure they’re using it properly. The added development time of implementing the cursor handling logic on the frontend is well worth it to avoid the gratuitous over-fetching that inevitably will occur if left unabated.
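To make the mechanics concrete, here is a minimal sketch of cursor-based slicing over an in-memory list, returning the edges/nodes/pageInfo shape. All names (`paginate`, `encodeCursor`) are illustrative, and real cursors would typically be opaque (e.g. base64-encoded) rather than human-readable:

```typescript
// Cursor-based pagination sketch following the GraphQL connection pattern.
interface Edge<T> { node: T; cursor: string }
interface PageInfo { endCursor: string | null; hasNextPage: boolean }
interface Connection<T> { edges: Edge<T>[]; pageInfo: PageInfo }

// Illustrative cursors: encode the item's position in the list.
const encodeCursor = (index: number): string => `cursor:${index}`;
const decodeCursor = (cursor: string): number => Number(cursor.split(":")[1]);

function paginate<T>(items: T[], first: number, after?: string): Connection<T> {
  // Resume one past the last item the client saw.
  const start = after === undefined ? 0 : decodeCursor(after) + 1;
  const slice = items.slice(start, start + first);
  const edges = slice.map((node, i) => ({ node, cursor: encodeCursor(start + i) }));
  return {
    edges,
    pageInfo: {
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null,
      hasNextPage: start + first < items.length,
    },
  };
}
```

The client passes `pageInfo.endCursor` back as `after` to fetch the next page, and stops when `hasNextPage` is false.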

Caching

Chances are your data doesn’t change all that frequently, so there’s no reason to have your API do all the work of fetching, transforming, and assembling responses for every single request that comes in.

Here comes caching to the rescue! A well-designed, multi-tiered cache architecture is crucial for achieving lightning fast API responses.

There are many options for where to focus your caching efforts. Part 1 of our playbook introduces a few server-side cache strategies to ensure your GraphQL layer is sensible about reusing previously resolved data – look forward to Part 2, where we’ll go even deeper on more caching strategies and optimizations.

Response-Level Caching

For successive requests with the exact same inputs, the overall response can be cached and reused.

Parse requests to extract all inputs that inform the shape of the output – query name, input parameters, pagination parameters, user data in headers, etc. These get hashed to become your cache lookup key. Once the response is fully assembled, persist it in your in-memory cache behind that key, using a sensible TTL that balances having up-to-date data with the realities of how often it’s changing.
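A minimal sketch of the idea, with an injected clock for testability – `ResponseCache` and `makeKey` are illustrative names, and a production cache would typically hash the key and live in a shared store rather than a per-process `Map`:

```typescript
// Response-level cache sketch: every input that shapes the output is folded
// into one lookup key; fully assembled responses sit behind it with a TTL.
interface CacheEntry { value: unknown; expiresAt: number }

// Serialize with sorted keys so the same inputs always yield the same key,
// regardless of the order the caller supplied them in.
function makeKey(parts: Record<string, unknown>): string {
  return JSON.stringify(parts, Object.keys(parts).sort());
}

class ResponseCache {
  private store = new Map<string, CacheEntry>();
  // The clock is injectable so tests can control expiry deterministically.
  constructor(private now: () => number = Date.now) {}

  get(key: string): unknown | undefined {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt <= this.now()) return undefined; // miss or expired
    return entry.value;
  }

  set(key: string, value: unknown, ttlMs: number): void {
    this.store.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}
```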

Resolver-Level Caching

Take your caching strategy a step further with a second, more granular layer of caching – at the individual field resolver level. While a query-level response may need more frequent refresh as a whole, some of its contributing field entity data may prove to have a longer cache shelf life.

Caching at the resolver level further reduces superfluous fetching. Even when a response-level cache entry needs refreshing, the refresh operation can still take advantage of cached field-resolver outputs that haven’t yet expired.

Stale While Revalidate

While response-level and resolver-level caching greatly reduce the compute and network requests needed to rehydrate data, we still have to deal with the occasional revalidation of data as our TTLs expire. In a naive implementation, this revalidation blocks our main thread while data is assembled from upstream sources.

This is where stale-while-revalidate and revalidation workers save the day. When your TTL expires and data needs to be refreshed, instead of blocking the response to revalidate the data, serve stale and kick off a revalidation worker to asynchronously prepare the updated data. Once the revalidation task has completed its work, subsequent requests begin to benefit from the fresh data with a reset TTL.

And by sequestering these revalidation workers into a separate process, you keep your main thread undistracted, focusing on its goal of fielding high volumes of requests and assembling the corresponding responses as quickly as possible.
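A simplified single-process sketch of the pattern – the separate worker process is elided here, and the class and method names are illustrative:

```typescript
// Stale-while-revalidate sketch: an expired entry is served immediately
// while a background refresh replaces it for later requests.
interface SwrEntry<T> { value: T; expiresAt: number; refreshing: boolean }

class SwrCache<T> {
  private store = new Map<string, SwrEntry<T>>();
  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  set(key: string, value: T): void {
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs, refreshing: false });
  }

  // Returns the cached value (possibly stale) and, when stale, kicks off
  // exactly one async revalidation that resets the TTL on completion.
  get(key: string, revalidate: () => Promise<T>): T | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now() && !entry.refreshing) {
      entry.refreshing = true; // dedupe concurrent refreshes of the same key
      void revalidate().then((fresh) => this.set(key, fresh));
    }
    return entry.value; // serve stale without blocking the request
  }
}
```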

Batch Requests with DataLoader

GraphQL’s execution model presents some unique challenges with respect to fetching source data. A single GraphQL query is likely to span several entities and entity types, quickly racking up excessive outbound requests in order to hydrate all the data.

For example, a user’s list of saved movies might involve fetching data about the user, a list of movies associated with that user, and, for each of those movies, additional data around the talent, production studios, related movies, and more! In a naive implementation, as resolvers fire for each individual movie, so too will the upstream data fetching calls.

Luckily, the DataLoader utility solves this n+1 problem by batching together upstream data calls stemming from resolvers doing their work. Implementing DataLoader in your GraphQL API is a must in order to minimize outbound requests.
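To illustrate the batching mechanism – this is a stripped-down sketch of the idea, not the real DataLoader API – individual `load()` calls made while resolvers fire in the same tick can be coalesced into one upstream batch call:

```typescript
// Minimal batching loader: load() calls enqueued during one tick are
// flushed together as a single call to the batch function.
class MiniLoader<K, V> {
  private queue: { key: K; resolve: (value: V) => void }[] = [];
  private scheduled = false;

  constructor(private batchFn: (keys: K[]) => Promise<V[]>) {}

  load(key: K): Promise<V> {
    return new Promise<V>((resolve) => {
      this.queue.push({ key, resolve });
      if (!this.scheduled) {
        this.scheduled = true;
        // Flush on the microtask queue, after this tick's resolvers enqueue.
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    this.scheduled = false;
    // One upstream call for the whole batch; results map back by position.
    const values = await this.batchFn(batch.map((entry) => entry.key));
    batch.forEach((entry, i) => entry.resolve(values[i]));
  }
}
```

The real DataLoader adds per-request caching and error handling on top of this scheduling trick.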

Circuit Breakers

Suppose one of your upstream services runs into problems and is in the middle of attempting to recover. Your GraphQL server makes its initial attempt to fetch from that service, only to receive an error instead of the desired upstream data.

A naive approach may involve rapid subsequent retries in hopes that the first failure was a fluke. Unfortunately, it's more likely that the upstream service will continue to error, and you'll only end up slowing down your API's responses as well as potentially hindering the upstream's efforts to recover.

Circuit breakers provide an answer. When upstream data fetch failures exceed certain thresholds, the breaker opens (trips), preventing superfluous calls from hammering your upstream service and giving it a chance to get itself back into a good state.

The strategies outlined in this piece will get you much of the way toward an incredibly performant GraphQL API. Stay tuned for Part 2, where we explore additional techniques and optimizations that will take your API to the next level.

This was a special guest post by Aleks Fiuk and Evan Wang.

Stay ahead of the curve with Econify's newsletter, "The Loop." Designed to keep employees, clients, and our valued external audience up to date with the latest developments in software news and innovation, this newsletter is your go-to source for all things cutting-edge in the tech industry.

The Loop is written and edited by Victoria Lebel, Alex Levine, Christian Clarke, Nick Barrameda, and Marie Stotz.

Have questions? Hit reply to this email and we'll help out!

Econify.com

Published on Wed Sep 11 2024