
apollo-datasource-http's People

Contributors

chenzn1, dwrbad, jasplin, jay-depot, jsudhakar-gdrx, nelsliu9121, nicolaslt, nmokkenstorm, rerinho, shkaper, starptech, yelyousfi


apollo-datasource-http's Issues

[Proposal] Make the maxTtlIfError optional

In some cases we might not need, or even want, maxTtlIfError to apply; it would be nice if we could omit it from the configuration and thus make sure it's not used by the code.

A temporary workaround might be to use the same TTL value for both maxTtlIfError and maxTtl, but that just wastes cache memory.

How to intercept HTTP request with jest?

I'm trying to mock the response by intercepting the undici request, but with no success!

I've implemented the data source and it works fine against the real API, but we are trying to intercept the HTTP request with jest for testing purposes.

I used the following mock configuration, found here: https://github.com/nodejs/undici/blob/main/docs/api/MockPool.md

import {MockAgent, setGlobalDispatcher} from 'undici';

process.env.BASE_API = 'http://localhost';

const mockAgent = new MockAgent();
setGlobalDispatcher(mockAgent);

// MockPool
const mockHttpClient = mockAgent.get('http://localhost');



// Then later, I call this in my jest test
mockHttpClient
  .intercept({
    path: '/api/v3/ui/searchTags',
    method: 'GET',
    headers: {
      accept:
        MAP_SMART_SEARCH_CONTENT_TYPE.listAllTagsQuery.willSendAccept,
    },
  })
  .reply(200, JSON.stringify({data: res.allTags}));

Not sure what's missing.
I also tried msw, but it seems they don't support undici yet: mswjs/interceptors#159

UPDATE:

So after digging a bit into the undici library, I found that they use a mock agent which can intercept the request. I applied the following in my testing setupTest.ts.

import {MockAgent, setGlobalDispatcher} from 'undici';

const mockAgent = new MockAgent({connections: 1});
setGlobalDispatcher(mockAgent);

[...]

export {mockAgent}

And in my test code

mockAgent
  .get('http://localhost')
  .intercept({
    path: '/api/v3/ui/searchTags',
    method: 'GET',
    headers: {
      accept:
        MAP_SMART_SEARCH_CONTENT_TYPE.listAllTagsQuery.willSendAccept,
    },
  })
  .reply(200, {data: res.allTags});

The mock only works when using the request client from undici directly. So the only intercepted request was the one made by calling the module function, not the one made from the class context via this.METHOD.

[...]
const httpResponse = await req(this.baseUrl + SMART_SEARCH_TAGS_RESOURCE_PATH, {
  method: 'GET',
  headers: {
    accept: MAP_SMART_SEARCH_CONTENT_TYPE.listAllTagsQuery.willSendAccept,
  },
});
console.log(await httpResponse.body.json());
[...]

Seems something is wrong with how the request is wired up, but I'm not sure what the issue is exactly!

Any idea of how to intercept the request?
Thanks in advance.
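For anyone else hitting this: a likely cause is that the data source builds its own undici Pool internally, and requests made through that pool never flow through the global dispatcher that setGlobalDispatcher installs. Assuming the constructor options accept a `pool` (as another snippet in this tracker suggests), one workaround sketch would be to hand the mock pool to the data source directly; `MyDataSource` is a placeholder for your own subclass, and I have not verified this against the library's API:

```typescript
// setupTest.ts — untested sketch. MockAgent only intercepts requests that
// go through undici's dispatcher; a Pool created internally by the data
// source bypasses it. If the options accept a `pool`, pass the mock pool:
import { MockAgent, setGlobalDispatcher } from 'undici';

const mockAgent = new MockAgent();
setGlobalDispatcher(mockAgent);
mockAgent.disableNetConnect();

const mockPool = mockAgent.get('http://localhost');

mockPool
  .intercept({ path: '/api/v3/ui/searchTags', method: 'GET' })
  .reply(200, { data: [] });

// `MyDataSource` is hypothetical — substitute your HTTPDataSource subclass.
const api = new MyDataSource('http://localhost', { pool: mockPool });
```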

Connection pool cache

First of all, nice work with the library ;).

I encountered a case where I have a lot of open subscriptions, and for each subscriber I create a new data source instance.
So in this case there is a connection pool per subscriber, with some memory usage, let's say 2 MB, and that memory multiplies with every subscriber.
Summarising: with the implementation currently in place, the connection pool is not reused for the same host.

Do you think it will make sense to do a connection pool cache?

In my service I end up doing something like this:

const poolCache = new Map<string, Pool>()

class HTTPDataSource extends ApolloHTTPDataSource<Context> {
  public globalOptions?: HTTPDataSourceOptions
  constructor(baseUrl: string, options: HTTPDataSourceOptions = {}) {
    if (!options.pool) {
      options.pool = poolCache.get(baseUrl)
    }
    if (!options.pool || options.pool.closed || options.pool.destroyed) {
      options.pool = new Pool(baseUrl, options?.clientOptions)
      poolCache.set(baseUrl, options.pool)
    }
    super(baseUrl, options)
  }
}

`maxTtlIfError` behavior clarification.

This line makes it appear that maxTtlIfError is simply the value provided, in seconds:

maxTtlIfError: 30 * 60, // 30min, will respond with the cached response in case of an error (for further 20min)

This line makes it appear as though the effective TTL actually ends up being maxTtl + maxTtlIfError:

ttl: request.requestCache.maxTtl + request.requestCache.maxTtlIfError,

I'm happy to PR a clarification once I have it. Just need to know what that is 🙃.

Support ETag

Hey, first of all, thanks for the work on this alternative implementation of HTTP/REST-based Apollo data sources. Neither the official version nor this one currently supports ETag or other related complex caching headers. I'd be interested in figuring out how to make this work, but I thought I'd first check in to see whether that's something that would be appreciated.

Mocking of the communication

I'm trying to replace apollo-datasource-rest with your library, but I'm struggling with mocking of the external communication:

import { MockAgent, setGlobalDispatcher } from 'undici';

const mockAgent = new MockAgent();
setGlobalDispatcher(mockAgent);
mockAgent.disableNetConnect();
...
const mockPool = mockAgent.get(BASE_URL);
...
mockPool.intercept({ path: '/v1/something', method: 'GET' }).reply(404, '');
...
const api = new MyAPI();
const result = await api.doSomething('aaa', 'bbb');

The above fails because the DataSource successfully reaches the unmocked REST resource and gets rejected with HTTP 401. I don't find the mocking you use in your own tests to be the best approach, and would like to make undici mocking work with apollo-datasource-http, though I might need some help with that. Any suggestions?

Error upon receiving a compressed response.

When you pass an "Accept-Encoding" header to a supporting endpoint and it responds with compressed data, it errors during JSON parsing, here.

I'm having trouble finding a mechanism to decorate with this decompression. Any suggestion?

Request deduplication (memoization) does not occur across concurrently executing frames

This may be more of a feature request than a bug, as 'memoization' is admittedly currently 'working as designed'.

However, my understanding of memoization is that multiple independent tasks should be able to share a single common method invocation and its corresponding promised result. For example, if 6 threads in an execution frame call into a function with the same expectation for results (e.g. the same provided parameters), they should all get the shared result without the function executing 6 times.

However, it looks as though the HTTPDataSource.performRequest() method does not memoize the Promise until after the result is already received. This means that concurrent, memoized access to an HTTPDataSource API method will still result in multiple simultaneous and redundant outbound calls to the HTTP endpoint. If one of the outstanding goals of memoization is request de-duplication, then this should not be so.

~Line 313 of HTTPDataSource.ts:
const responseData = await this.pool.request(requestOptions)

Then, ~ line 336:

      if (this.isRequestMemoizable(request)) {
        this.memoizedResults.set(cacheKey, response)
      }

In Apollo Server, this limitation is easily observable in the FIRST query returning a requested entity when multiple fields in the same schema object explicitly use the same resolver (and subsequently, the same DataSource API method).

// Query
query getMovie {
  movie(id: 12) {
    name,
    director,
    year
  }
}


// Schema
type Query {
  movie(id: Int!): [Movie]
}

type Movie {
  name: String
  director: String
  year: Int
}


// Resolver
// For this resolver, assume that the getMovie(id, ctx) method uses an HTTPDataSource implementation
// to fetch a single Movie object
const resolver: Resolvers = {
  Query: {
    movie: (root, { id }, ctx) => ({ id })
  },
  Movie: {
    name: async ({ id }, args, ctx) => {
      const { name } = await getMovie(id, ctx);
      return name;
    },
    director: async ({ id }, args, ctx) => {
      const { director } = await getMovie(id, ctx);
      return director;
    },
    year: async ({ id }, args, ctx) => {
      const { year } = await getMovie(id, ctx);
      return year;
    }
  }
};
In the above circumstance, we have proper, independent field-level resolvers. However, each field resolver will in fact make its own outbound HTTP call (with the same input parameters and expected result), missing both the memoized-results LRU map and the cache, even though the extra calls are duplicative and unnecessary.

If feasible, sharing function invocations - not just function results - would improve memoization 'hits' in concurrent-access scenarios. I believe a working approach to this could be to memoize the Promised result before it is actually received, rather than just the finished result itself. Like this:

    const responsePromise = this.pool.request(requestOptions)
    if (this.isRequestMemoizable(request)) {
        this.memoizedResults.set(cacheKey, responsePromise)
    }

    // Now we can await the result and finish the rest of performRequest() that requires an actual result.
    const response = await responsePromise;

If further explanation or an example is required, let me know.
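To illustrate the idea outside the library, here is a self-contained sketch (names like fetchMovie and inFlight are illustrative, not the library's internals). Because the in-flight Promise is stored synchronously, before it settles, concurrent callers for the same key share a single invocation:

```javascript
// Promise-level memoization: store the pending Promise in the cache
// before awaiting it, so concurrent callers share one invocation.
const inFlight = new Map();
let invocations = 0;

function fetchMovie(id) {
  const key = String(id);
  if (inFlight.has(key)) return inFlight.get(key); // share the pending call
  const promise = (async () => {
    invocations += 1; // stands in for the real outbound HTTP request
    return { id, name: 'Example' };
  })();
  inFlight.set(key, promise); // memoize BEFORE the result settles
  return promise;
}

async function demo() {
  // Three concurrent callers, one underlying invocation.
  const results = await Promise.all([fetchMovie(12), fetchMovie(12), fetchMovie(12)]);
  return { calls: invocations, results };
}
```

A production version would also need to evict the entry once the Promise rejects, so a failed call is not cached forever.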

[Performance] Using x2 to x3 more cache memory and bandwidth than apollo-datasource-rest

Looks like, due to the really simple implementation of maxTtlIfError in http-data-source.ts, the cached data is sent over the wire and stored in the cache twice.

On these graphs you can see that this implementation (plus probably some other code that is less optimised than in apollo-datasource-rest) is causing a huge, unexpected influx of data in our Redis instance.

Additionally, it looks like we're not able to stop using the maxTtlIfError functionality, due to the way the option has been defined and used in the code.

[QUESTION]: onRequest and query

Hi,

I had a question in regards to this library.
In some cases I need to pass an authentication or query param that I have to fetch from another source.
I tried to pass it in request.query, and I noticed that it does not get added to the path, since the query string is built prior to calling onRequest.

Is this intended behavior?

How do you guys handle such scenarios?
I just want to make sure I am using it properly.

Handling of application/*+json response content-type

As of now, this data source strictly checks against the application/json HTTP content-type header. This is perfectly fine for most applications; however, some APIs respond with JSON content types such as ...

in which case the data source does not parse the response body. It would be great if this package could extend its list, or improve the detection via a RegExp, to ensure all JSON-like content types are parsed.
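A RegExp along these lines could cover the application/*+json family; this is a sketch, not the package's actual check:

```javascript
// Matches application/json plus any application/*+json suffix type
// (e.g. application/hal+json), with or without parameters like charset.
const JSON_CONTENT_TYPE = /^application\/(?:[\w.+-]*\+)?json(?:\s*;.*)?$/i;

const isJson = (contentType) => JSON_CONTENT_TYPE.test(contentType.trim());
```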

LRU max age

export interface LRUOptions {
  readonly maxAge?: number
  readonly maxSize: number
}

this.memoizedResults = new QuickLRU({
  maxSize: this.options?.lru?.maxSize ? this.options.lru.maxSize : 100,
})

Thanks for your work. I recently noticed that you provide LRUOptions with maxAge, but it is not passed on to QuickLRU.

FormData body is being coerced to JSON

On this line, the typeof request.body === 'object' check returns true for FormData from undici and form-data, resulting in a body of "{}" being sent with content-type: application/json; charset=utf-8.

I think there are a few potential solutions here:

  • Check the content-type, and if it's set, do not attempt to coerce to JSON.
  • Check that the json property in Request is exactly false. (This property doesn't appear to be used?)
  • Check instanceof FormData from undici; I don't think this is a great option, but it's an option.

Return response Headers in Apollo response

Hi Guys,

Maybe I'm on the wrong path here, but I need to return proper set-cookie headers after a call to our API. We are transitioning from cookie-session authentication to OCC, and therefore need this to stop making a call to the backend on the client side just to make sure our session is alive.

Is this possible in apollo-datasource-http?

Regards,

Dynamic baseURL

I need to change the baseURL on my data source dynamically. In the REST data source this is possible using a getter; is this also possible with the HTTP data source?

requestCache.maxTtl unit

The comments in the readme suggest that maxTtl and maxTtlIfError are in milliseconds:

      requestCache: {
        maxTtl: 1000 * 60 * 10, // 10min, will respond for 10min with the cached result (updated every 10min)
        maxTtlIfError: 1000 * 60 * 30, // 30min, will respond with the cached response in case of an error (for further 20min)
      },

while the comments in the code say they're in seconds. I think the latter is correct, especially since apollo-server-caching mentions seconds, too.
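If seconds are indeed the correct unit, the readme example would presumably need to be adjusted along these lines (a suggested correction only; the maintainers should confirm):

```
      requestCache: {
        maxTtl: 60 * 10, // 10 minutes, in seconds
        maxTtlIfError: 60 * 30, // 30 minutes, in seconds
      },
```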

Intended usage from resolver functions?

Hi, I read the docs for this repo and also the Apollo docs for data sources, and I'm just wondering if this is the intended usage from a resolver function:

    async getArtwork({ id }) {
        return (await this.get('/artwork/' + id)).body
    }

This is, of course, the simple case where only the body of the response matters, and not the response headers, etc. It seems a bit inconvenient to have to access .body on the result of every method, so I'm thinking about creating a wrapper get() method:

    async get(path, requestOptions) {
        const response = await super.get(path, requestOptions)
        if (response.statusCode !== 200) {
            // handle errors
        }
        return response.body
    }

Would that be consistent with how this library is meant to be used, or am I missing something?

[Discussion] How to dedupe concurrent memoizable requests?

The Problem

Concurrent requests usually happen:

  • When resolving multiple fields of the same object.
  • When one object appears in multiple locations of a query.

Multiple requests occur and these requests should be deduped if marked as memoizable.

Examples

const resolvers: Resolvers = {
  UserInfo: {
    name: async ({ mid }, _, { dataSources }) =>
      (await dataSources.example.getUserInfo(mid)).name,
    gender: async ({ mid }, _, { dataSources }) =>
      (await dataSources.example.getUserInfo(mid)).gender,
    avatar: async ({ mid }, _, { dataSources }) =>
      (await dataSources.example.getUserInfo(mid)).avatar,
  },
}
query Users {
  currentUser {
    mid
    name
    avatar
  }
  userInfo(mid: 123) {
    name
    avatar
  }
}

My suggestion

We should use a lazy loader (like graphql/dataloader but much simpler as we don't need batching and caching).

See #29 for my rough workaround.

Minor question

How should response.memoize be defined if the response comes from a deduped result?

[Proposal]: Support caching responses without JSON.stringify

By default, apollo-datasource-http stringifies responses when setting cached values: https://github.com/StarpTech/apollo-datasource-http/blob/main/src/http-data-source.ts#L349. (Responses are not stringified when caching for memoization: https://github.com/StarpTech/apollo-datasource-http/blob/main/src/http-data-source.ts#L343.)

The underlying KeyValueCache from Apollo Server allows non-string types for the cached value: https://github.com/apollographql/apollo-server/blob/0aa0e4b20ef97576ce92733698a7842b61d8280e/packages/apollo-server-caching/src/KeyValueCache.ts#L10-L14.

Could an additional requestCache option (https://github.com/StarpTech/apollo-datasource-http/blob/main/src/http-data-source.ts#L28) be added to control whether or not responses are stringified and parsed when setting and getting from the cache? I have a use case where I am caching a lot of large responses, and the stringify and parse steps can add significant overhead at scale.

Simplified proposal

const cachedResponse = request.requestCache.stringifyResponse ? JSON.stringify(response) : response;
...
const response: Response<TResult> = request.requestCache.stringifyResponse ? JSON.parse(cacheItem) : cacheItem;
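A tiny self-contained sketch of the proposed behaviour; `stringifyResponse` is the hypothetical option name from the proposal, and the helper names are illustrative:

```javascript
// When stringifyResponse is false, the raw object goes into (and comes
// out of) the cache as-is, skipping the serialize/parse round-trip.
function toCacheValue(response, stringifyResponse) {
  return stringifyResponse ? JSON.stringify(response) : response;
}

function fromCacheValue(cacheItem, stringifyResponse) {
  return stringifyResponse ? JSON.parse(cacheItem) : cacheItem;
}

const response = { body: { items: [1, 2, 3] }, statusCode: 200 };

// With stringification (current behaviour): a string goes into the cache.
const asString = toCacheValue(response, true);

// Without: the object itself is cached, skipping both conversions.
const asObject = toCacheValue(response, false);
```

Note this only makes sense for cache backends (like the in-memory one) whose KeyValueCache accepts non-string values.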

Support the same Node.js versions as Apollo Server 3.0

With Apollo Server 3.0 on the horizon, and with the great work being done in this codebase, the package is poised to become the de facto HTTP data source. As such, I believe it should support the same Node.js versions that AS3 supports, which is Node.js 12 and up. I believe this will facilitate adoption.

More info can be found here: apollographql/apollo-server#5262

And for ease of reference, here is AS3's tsconfig: https://github.com/apollographql/apollo-server/blob/release-3.0/tsconfig.base.json

Happy to make a PR/discuss :)


Vote on this request

This is an invitation to the community to vote on issues, to help us prioritize our backlog. Use the "smiley face" up to the right of this comment to vote.

👍 "I would like to see this addressed as soon as possible"
👎 "There are other more important things to focus on right now"

We are also happy to receive and review pull requests if you want to help work on this issue.
