memoize-fs's Introduction

memoize-fs

Node.js solution for memoizing/caching function results on the file system

Motivation

Sometimes you have to persist cached function calls, but you do not want to deal with an extra process (e.g. managing a Redis store).

Memoization is a technique that caches the results of repeated operations so that they do not have to be recomputed, which can save CPU cycles and other expensive work. For detailed insight see: http://en.wikipedia.org/wiki/Memoization

Features

Installation

In your project path:

npm install memoize-fs --save

Usage

import memoizeFs from 'memoize-fs'
import assert from 'node:assert'

const memoizer = memoizeFs({ cachePath: './some-cache' })

;(async () => {
  let idx = 0
  const func = function foo(a, b) {
    idx += a + b
    return idx
  }

  const memoizedFn = await memoizer.fn(func)
  const resultOne = await memoizedFn(1, 2)

  assert.strictEqual(resultOne, 3)
  assert.strictEqual(idx, 3)

  const resultTwo = await memoizedFn(1, 2) // cache hit
  assert.strictEqual(resultTwo, 3)
  assert.strictEqual(idx, 3)
})()

Note: A memoized function is always an async function, so it returns a Promise (which you can await, as seen in the example above).

Signature

See Types and Options sections for more info.

const memoizer = memoizeFs(options)

console.log(memoizer)
// => {
//  fn: [AsyncFunction: fn],
//  getCacheFilePath: [Function: t],
//  invalidate: [AsyncFunction: e]
// }

const memoizedFn = await memoizer.fn(functionToMemoize, options)

Memoizing asynchronous functions

memoize-fs treats a function as asynchronous (callback-style) if the last argument it accepts is a function and that function itself accepts at least one argument. So you don't have to do anything differently than when memoizing synchronous functions. Just make sure the above condition is fulfilled. Here is an example of memoizing a function with a callback:

const funAsync = function (a, b, cb) {
  setTimeout(function () {
    cb(null, a + b);
  }, 100);
};

const memFn = await memoizer.fn(funAsync)

await memFn(1, 2, function (err, sum) { if (err) { throw err; } console.log(sum); })
await memFn(1, 2, function (err, sum) { if (err) { throw err; } console.log(sum); }) // cache hit
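The detection rule described above can be sketched as a small predicate. This is only an illustration of the rule as stated in this section, not the library's actual code; looksCallbackStyle is a hypothetical name.

```javascript
// Hypothetical helper, not part of the memoize-fs API: it mirrors the rule
// "the last argument is a function that accepts at least one argument".
function looksCallbackStyle(args) {
  const last = args[args.length - 1]
  // Function.prototype.length is the number of declared parameters.
  return typeof last === 'function' && last.length > 0
}

console.log(looksCallbackStyle([1, 2, (err, sum) => {}])) // true
console.log(looksCallbackStyle([1, 2, () => {}]))         // false: the callback declares no parameters
console.log(looksCallbackStyle([1, 2]))                   // false: the last argument is not a function
```

A consequence of this rule is that any trailing function argument with declared parameters makes a call look callback-style, even if the function is used for another purpose.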

Memoizing promisified functions

You can also memoize a promisified function. memoize-fs treats a function as promisified if its result is thenable, that is, an object with a property then of type function. So again it's the same as with memoizing synchronous functions. Here is an example of memoizing a promisified function:

const memoizer = memoizeFs({ cachePath: './some-cache' })

// Returns a promise instead of taking a callback
const funPromisified = function (a, b) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve(a + b)
    }, 100)
  })
}

;(async () => {
  const memFn = await memoizer.fn(funPromisified)

  console.log(await memFn(1, 2))
  console.log(await memFn(1, 2)) // cache hit
})()
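The thenable condition mentioned above can be written as a short check. This is a sketch of the definition given in this section, with isThenable as a hypothetical helper name; it is not taken from the memoize-fs source.

```javascript
// Hypothetical helper: a value is "thenable" if it has a `then` property
// that is a function. Real promises satisfy this, and so do look-alikes.
function isThenable(value) {
  return value != null &&
    (typeof value === 'object' || typeof value === 'function') &&
    typeof value.then === 'function'
}

console.log(isThenable(Promise.resolve(3)))     // true
console.log(isThenable({ then() {} }))          // true: any object with a then method
console.log(isThenable({ then: 'not a fn' }))   // false
console.log(isThenable(3))                      // false
```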

Types

export interface MemoizerOptions {
  cacheId: string
  cachePath: string
  salt: string
  maxAge: number
  force: boolean
  astBody: boolean
  noBody: boolean
  throwError: boolean
  retryOnInvalidCache: boolean
  serialize: (val: unknown) => string
  deserialize: (val: string) => unknown
}

export declare function getCacheFilePath(
  fn: unknown,
  args: unknown[],
  opt: Partial<MemoizerOptions>
): string

export default function buildMemoizer(
  memoizerOptions: Partial<MemoizerOptions>
): {
  fn: <FN extends (...args: never) => unknown>(
    fn: FN,
    opt?: Partial<MemoizerOptions>
  ) => Promise<(...args: Parameters<FN>) => Promise<Awaited<ReturnType<FN>>>>
  getCacheFilePath: (
    fn: (...args: never) => unknown,
    args: unknown[],
    opt: Partial<MemoizerOptions>
  ) => string
  invalidate: (cacheId?: string) => Promise<void>
}

Options

When memoizing a function, all of the options below can be applied in any combination. The only required option is cachePath.

cachePath

Path to the location of the cache on the disk. This option is always required.

cacheId

By default all cache files are saved into the root cache which is the folder specified by the cachePath option:

const path = require('path')
const memoizer = require('memoize-fs')({ cachePath: path.join(__dirname, '../../cache') })

The cacheId option, which you can specify when memoizing a function, names a subfolder created inside the root cache folder. Results of that function's calls are cached inside that subfolder:

memoizer.fn(fnToMemoize, { cacheId: 'foobar' })

salt

Functions may reference variables outside their own scope. As a consequence, two functions that look exactly the same (same signature and same body) can return different results even when called with identical arguments. To keep two such functions from sharing a cache, use the salt option. It changes the hash key created for the memoized function, which in turn determines the name of the cache file:

memoizer.fn(fnToMemoize, { salt: 'foobar' })

maxAge

You can make the cache expire once the lifetime defined by the maxAge option (in milliseconds) has passed. memoize-fs uses stats.mtimeMs (last modification time) when checking the age of the cache.

memoizer.fn(fnToMemoize, { maxAge: 10000 })
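A minimal sketch of an age check based on the last modification time, assuming maxAge is a duration in milliseconds. The helper isFresh and the file names are hypothetical and only illustrate the idea; they are not the library's implementation.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// Hypothetical helper (not a memoize-fs export): a cache file counts as fresh
// while its last modification lies at most maxAge milliseconds before `now`.
function isFresh(filePath, maxAge, now = Date.now()) {
  const { mtimeMs } = fs.statSync(filePath)
  return now - mtimeMs <= maxAge
}

// Write a throwaway cache file into a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'maxage-sketch-'))
const file = path.join(dir, 'entry.json')
fs.writeFileSync(file, JSON.stringify({ data: 3 }))

// Pass `now` explicitly so the check does not depend on wall-clock timing.
const writtenAt = fs.statSync(file).mtimeMs
const freshAfter5s = isFresh(file, 10000, writtenAt + 5000)
const freshAfter15s = isFresh(file, 10000, writtenAt + 15000)
console.log(freshAfter5s, freshAfter15s) // true false

// Clean up the temporary directory.
fs.rmSync(dir, { recursive: true, force: true })
```

Because the age is derived from the file itself, a check of this kind gives the same answer across separate process runs.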

force

The force option forces the re-execution of an already memoized function and the re-caching of its outcome:

memoizer.fn(fnToMemoize, { force: true })

NOTE that the force option invalidates only a single cached outcome: the one for the arguments of the first call after memoization. All other previously cached results for that function are kept in the cache. If you need to invalidate the entire cache for a function, use manual cache invalidation.

astBody

If you want to use the function AST instead of the function body when generating the hash (see serialization), set the option astBody to true. This allows the function source code to be reformatted without busting the cache. See #6 for details.

memoizer.fn(fnToMemoize, { astBody: true })
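To see why hashing the raw function body is sensitive to formatting, compare the source text of two functions that behave identically. This is a plain JavaScript illustration and does not call memoize-fs; the function names are made up for the example.

```javascript
// Two functions with the same behavior but different formatting and comments.
const compact = function add(a, b) { return a + b }
const spaced = function add(a, b) {
  // same logic, laid out differently
  return a + b
}

// String(fn) returns the exact source text, so the two strings differ,
// and any hash computed over that text differs as well.
console.log(String(compact) === String(spaced)) // false
console.log(compact(1, 2) === spaced(1, 2))     // true
```

An AST-based key ignores whitespace and comments, which is what lets the cache survive a reformat.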

noBody

If for some reason you want to omit the function body when generating the hash (see serialization), set the option noBody to true.

memoizer.fn(fnToMemoize, { noBody: true })

retryOnInvalidCache

By default, undefined is returned when an invalid cache file is read, for example when JSON.parse fails on an empty file. With retryOnInvalidCache enabled, the memoized function is called again and a new cache file is written.

memoizer.fn(fnToMemoize, { retryOnInvalidCache: true })
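The difference between the two behaviors can be sketched with a small stand-in. Everything here is hypothetical: readCached is not a memoize-fs function, and the { data: ... } envelope is an assumption made for the example.

```javascript
// Hypothetical sketch, not the library's code. The { data: ... } envelope is
// an assumption made for this example.
function readCached(rawText, recompute, { retryOnInvalidCache = false } = {}) {
  try {
    return JSON.parse(rawText).data
  } catch (err) {
    // JSON.parse throws a SyntaxError on an empty or truncated file.
    return retryOnInvalidCache ? recompute() : undefined
  }
}

const recompute = () => 3
console.log(readCached('{"data":3}', recompute))                      // 3
console.log(readCached('', recompute))                                // undefined
console.log(readCached('', recompute, { retryOnInvalidCache: true })) // 3
```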

serialize and deserialize

These two options allow you to control how serialization and deserialization work. By default, plain JSON.stringify and JSON.parse are used, but you may need something more capable.

In the following example we use Yahoo's serialize-javascript so that the result of a memoized function is cached properly even though it contains a function.

import memoizeFs from 'memoize-fs'
import serialize from 'serialize-javascript'

// Note: For the sake of the example we use eval in the next line of code. eval is dangerous
// in most cases. Don't do this at home, or anywhere else, unless you know what you are doing.
const deserialize = (serializedJsString) => eval(`(() => (${serializedJsString}))()`).data

const memoizer = memoizeFs({ cachePath: './cache', serialize, deserialize })

function someFn (a) {
  const bar = 123

  setTimeout(() => {}, a * 10)

  return {
    bar,
    getBar() { return a + bar }
  }
}

memoizer.fn(someFn)

Manual cache invalidation

You can delete the root cache (all cache files inside the folder specified by the cachePath option):

memoizer.invalidate().then(() => { console.log('cache cleared') })

You can also pass the cacheId argument to the invalidate method. This way you delete only the cache inside the subfolder with the given id.

memoizer.invalidate('foobar').then(() => { console.log('cache for "foobar" cleared') })

Serialization

See also the serialize and deserialize options.

By default, memoize-fs uses JSON to serialize the results of a memoized function. It also uses JSON to serialize the arguments of the memoized function in order to create a hash, which is used as the name of the cache file to be stored or retrieved. The hash is created from the serialized arguments, the function body and the salt (if provided as an option).

You can generate the cache file path, which includes this hash, using memoizer.getCacheFilePath:

const memoizer = require('memoize-fs')({ cachePath: './' })
memoizer.getCacheFilePath(function () {}, ['arg', 'arg'], { cacheId: 'foobar' })
// -> './foobar/06f254...'
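As a rough sketch of how such a key can be derived from the function body, the serialized arguments and the salt, assuming an MD5 digest over their concatenation. sketchCacheKey is a hypothetical helper; the real library may combine or hash these inputs differently.

```javascript
const crypto = require('node:crypto')

// Hypothetical helper: derives a hex key from the function source text,
// the JSON-serialized arguments and an optional salt.
function sketchCacheKey(fn, args, salt = '') {
  const fnStr = String(fn)
  const argsStr = JSON.stringify(args)
  return crypto.createHash('md5').update(fnStr + argsStr + salt).digest('hex')
}

const add = function (a, b) { return a + b }
const plainKey = sketchCacheKey(add, [1, 2])
const sameKey = sketchCacheKey(add, [1, 2])
const saltedKey = sketchCacheKey(add, [1, 2], 'foobar')
const otherArgsKey = sketchCacheKey(add, [2, 1])

console.log(plainKey === sameKey)      // true: same inputs, same key
console.log(plainKey === saltedKey)    // false: the salt changes the key
console.log(plainKey === otherArgsKey) // false: different arguments, different key
```

Since the function source text is part of the input, editing the function body also changes the key.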

Since memoize-fs is using JSON for serialization, you should know how it works around some of its "limitations":

  • It ignores circular references silently
  • It ignores arguments and attributes of type function silently
  • It converts NaN to undefined silently
  • It converts all objects, no matter what class they were an instance of, to objects with prototype Object (see #16)
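For comparison, this is how plain JSON.stringify and JSON.parse behave in these cases without any workaround. The snippet uses only built-in JavaScript; the Hello class is made up for the example. Note that plain JSON throws on circular references and turns NaN into null, which differs from the library's handling listed above.

```javascript
class Hello {
  constructor(sound) { this.sound = sound }
}

const roundTrip = (value) => JSON.parse(JSON.stringify(value))

// Class instances come back as plain objects.
const revived = roundTrip(new Hello('meow'))
console.log(revived instanceof Hello) // false
console.log(revived.sound)            // meow

// Function-valued properties are dropped.
const withoutFn = roundTrip({ n: 1, f() {} })
console.log(Object.keys(withoutFn))   // [ 'n' ]

// NaN is written as null.
const nanText = JSON.stringify({ x: NaN })
console.log(nanText)                  // {"x":null}

// Circular references make JSON.stringify throw a TypeError.
const circular = {}
circular.self = circular
let threwTypeError = false
try {
  JSON.stringify(circular)
} catch (err) {
  threwTypeError = err instanceof TypeError
}
console.log(threwTypeError)           // true
```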

Some "limitations" cannot (yet?) be worked around:

  • Serializing huge objects will fail with one of the following two error messages
RangeError: Invalid string length
  at Object.stringify (native)
  at stringifyResult (node_modules/memoize-fs/index.js:x:y) -> line where memoize-fs uses JSON.stringify
FATAL ERROR: JS Allocation failed - process out of memory

Common pitfalls

  • Be careful when memoizing a function which uses variables from the outer scope. The value of these variables may change at runtime, but the cached result stays the same when the memoized function is called with the same arguments as when the result was first cached.

  • You should know about how memoize-fs handles serialization under the hood.
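The first pitfall can be reproduced with a tiny in-memory stand-in for a cache keyed by arguments. memoizeInMemory is a hypothetical helper written for this illustration; it is not memoize-fs and keeps nothing on disk, but it follows the same "same arguments, same cached result" rule.

```javascript
// In-memory stand-in for a cache keyed by serialized arguments.
// This is NOT memoize-fs; it only mimics the lookup rule.
function memoizeInMemory(fn) {
  const cache = new Map()
  return (...args) => {
    const key = JSON.stringify(args)
    if (!cache.has(key)) cache.set(key, fn(...args))
    return cache.get(key)
  }
}

let offset = 1 // outer-scope variable the function depends on
const addOffset = (n) => n + offset
const memoized = memoizeInMemory(addOffset)

const first = memoized(10)   // computed while offset is 1
offset = 5
const second = memoized(10)  // cache hit: the change to offset is not seen
const direct = addOffset(10) // the unmemoized function sees the new value

console.log(first, second, direct) // 11 11 15
```

With a file-based cache the stale value also survives process restarts, so the effect can be harder to notice. Passing such values as arguments avoids it, since they then become part of the cache key.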

Contributing

Issues and Pull-requests are absolutely welcome. If you want to submit a patch, please make sure that you follow this simple rule:

"All code in any code-base should look like a single person typed it, no matter how many people contributed." (idiomatic.js)

Then please commit with a detailed commit message.

memoize-fs's People

Contributors

alfred-nsh, borisdiakur, christianscott, dependabot[bot], gburtini, joliss, josephfrazier, tunnckocore

memoize-fs's Issues

Promisified function that also accepts callback

Awesome tool first of all!

I have an unusual case: a promisified function (it returns a promise) that also takes a callback (although the callback serves a different purpose). Because of the callback, memoize-fs incorrectly identifies it as an async function.

As a workaround for now I'm just passing an extra non-function last argument, but it'd be better if this could be set manually, like an option {promisified}, or if it were in the API, like memoize.async/promisified.

Serialize function AST instead of source

Hey @borisdiakur, thanks for the module! I was wondering if you'd be willing to accept a patch that would allow the comments/formatting of a function to change, without invalidating the cache. The idea is that instead of hashing the source code of the function, we would use a parser like babylon to build an AST (abstract syntax tree) of the function that represents its semantic behavior while disregarding comments/formatting/etc. The AST would be serialized into JSON and then included in the hash.

Note about storing instances of objects

async function fun(a) {
  return new Hello(a);
}

async function fun_(a) {
  return memoize.fn(fun, {}).then((memFn) => {
    return memFn(a);
  });
}

fun('meow').then(console.log);
fun_('meow').then(console.log);

First time this runs:

Hello { sound: 'meow' }
Hello { sound: 'meow' }

Second time this runs:

Hello { sound: 'meow' }
{ sound: 'meow' }

{ force: true } only works once.

I have fixed it by removing line 249 in my own branch, but I'm not sure what the intent of this was, so I haven't suggested a PR.

delete optExt.force

So, as a pseudo-example:

let callCount = 0;
const f = (_) => {
  callCount++;
}

const example = async () => {
  const memoizedF = await memoizer.fn(f, { force: true });
  
  // two calls.
  memoizedF(); 
  memoizedF();
}

example(); // call count is 1.
example(); // call count is 2.

This behavior gets weirder when you consider how it plays with argument level caching: only the first call after creating the memoizedF seems to have its cache renewed, that is, in the following code:

const memoizedF = await memoizer.fn(f);
memoizedF();
memoizedF(2);

const forceMemoizedF = await memoizer.fn(f, { force: true });
forceMemoizedF(); 
forceMemoizedF(2); // this will not have cache renewed.

As far as I can tell, removing line 249 fixes this with no side effects, but I haven't run your tests or otherwise figured out why you may wish for this run-once type behavior.

Make cache file be regular .js file

This will fix most of the "limitations", and also resolve my current problem.

I'm using it to cache configuration (which may include functions). Basically I memoize the function that loads a config file which returns some configuration that can have functions.

// tryLoadConfig returns an async function
const loadConfig = tryLoadConfig({
  ...ctxRest,
  testPath,
  config,
  start,
  runnerName,
});
const cfgFunc = await memoize(loadConfig, {
  cacheId: 'load-runner-config',
  astBody: true,
  salt: runnerName
});
const runnerConfig = (await cfgFunc()) || {};

On a fresh run, without a cache, it writes a cache file containing only the non-function parts of this configuration object. Later I read a function from this config (e.g. runnerConfig.postHook), but on the second run the config comes from the cache, so the function is missing.

That's clear from the readme. But if we switch the cache files to be regular js files, we'll be able to store whatever js the memoized function returns.

There's no problem for generating a hash in this way too.

Timed cache invalidation

Hi @borisdiakur, thanks for your work. Have you thought of adding timed cache invalidation? It would be really useful for avoiding repeated calls to APIs whose results don't naturally change that often, particularly if a quota is set, too. Thanks!

Changing the body of the memoized function invalidates cache

I'm using memoize-fs in a crawler to store downloaded files. When the cache of 43k files (2GiB in total) got invalidated, I've found these two rows:

fnStr = String(fn),
hash = crypto.createHash('md5').update(fnStr + argsStr + salt).digest('hex');

I think that such behaviour should at least be mentioned in the manual, if not to say that there should be a way to turn it off.

Importing from TypeScript fails (because of problem with `exports` field?)

I'm having trouble importing memoize-fs using TypeScript, and I've been struggling for the past hour to figure out why it's failing. I made a minimal test case at https://github.com/joliss/import-problem so you can run it yourself if you like.

I have the following index.ts:

import memoizeFs from "memoize-fs";
memoizeFs({});

TypeScript can't find the import (both with tsc and in VS Code), even though memoize-fs is clearly installed:

~/src/import-problem $ tsc
index.ts:1:23 - error TS2307: Cannot find module 'memoize-fs' or its corresponding type declarations.

1 import memoizeFs from 'memoize-fs'
                        ~~~~~~~~~~~~
~/src/import-problem $ ls -l node_modules/memoize-fs/
total 20
-rw-r--r-- 1 ubuntu ubuntu  1079 Feb 29 15:44 LICENSE
-rwxr-xr-x 1 ubuntu ubuntu 12026 Feb 29 15:44 README.md
drwxr-xr-x 2 ubuntu ubuntu    41 Feb 29 15:44 dist
-rw-r--r-- 1 ubuntu ubuntu  2034 Feb 29 15:44 package.json

tsc nonetheless produces a compiled index.js file. If I run it, I get the following error from Node's module resolver:

~/src/import-problem $ node index.js
node:internal/modules/cjs/loader:598
      throw e;
      ^

Error [ERR_PACKAGE_PATH_NOT_EXPORTED]: No "exports" main defined in /home/ubuntu/src/import-problem/node_modules/memoize-fs/package.json
    at exportsNotFound (node:internal/modules/esm/resolve:303:10)
    at packageExportsResolve (node:internal/modules/esm/resolve:593:13)
    at resolveExports (node:internal/modules/cjs/loader:591:36)
    at Module._findPath (node:internal/modules/cjs/loader:668:31)
    at Module._resolveFilename (node:internal/modules/cjs/loader:1130:27)
    at Module._load (node:internal/modules/cjs/loader:985:27)
    at Module.require (node:internal/modules/cjs/loader:1235:19)
    at require (node:internal/modules/helpers:176:18)
    at Object.<anonymous> (/home/ubuntu/src/import-problem/index.js:6:38)
    at Module._compile (node:internal/modules/cjs/loader:1376:14) {
  code: 'ERR_PACKAGE_PATH_NOT_EXPORTED'
}

Node.js v20.11.1

I can see the following exports declaration in the package.json:

~/src/import-problem $ cat node_modules/memoize-fs/package.json
{
   ...
  "module": "./dist/index.mjs",
  "types": "./dist/index.d.ts",
  "exports": {
    ".": {
      "import": "./dist/index.mjs"
    }
  },
  ...
}

The dist/index.d.ts and dist/index.mjs files do exist in the distribution.

I'm not sure why exactly the exports field isn't being picked up properly by Node.

I'm running the latest stable Node and TypeScript versions:

~/src/import-problem $ node --version
v20.11.1
~/src/import-problem $ tsc --version
Version 5.3.3

For what it's worth, my actual workflow doesn't involve transpiling with tsc but rather running with ts-node, but it's showing the same error (ts-node index.ts and ts-node --transpile-only index.ts).

maxAge should be stored in file instead of setTimeout, so cache is properly invalidated across several runs

My use case and my confusion: I'm using memoize-fs to acquire JWT tokens for tests. Tokens live for 1 hour and acquiring them takes about 20 seconds, so memoize-fs effectively cuts those 20 seconds off every test run.

maxAge is a timer-based cache invalidation. However, because I'm using it to cache results across runs, I need invalidation to happen based on the age of the cached data, not on when the memoized function was created.

This is, at the very least, a point of confusion, as the documentation doesn't say anything about it. I'd say it makes sense to store each record's age along with the data and to verify this age when reading from disk. In that case maxAge would behave as expected (the expectation being the age of the data, not of the memoized function instance in a process).
