icaros-usc / pyribs

A bare-bones Python library for quality diversity optimization.

Home Page: https://pyribs.org

License: MIT License

Makefile 0.44% Python 98.72% Shell 0.83%
quality-diversity artificial-intelligence python optimization openai-gym evolutionary-computation deep-learning evolutionary-algorithms cma-es map-elites

pyribs's Issues

[FEATURE REQUEST] Make Colab links more flexible

Description

Currently, Colab links are created in docs/_templates/sourcelink.html. However, this assumes the notebooks are either 1) under the docs directory or 2) under the examples/tutorials directory. Furthermore, the links only work for notebooks on the master branch. In the future, we may want to make this more flexible.
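One way to make the links more flexible would be to build them from the notebook's path relative to the repository root plus a branch name, rather than hard-coding both. The sketch below assumes the standard Colab GitHub URL pattern (`https://colab.research.google.com/github/<org>/<repo>/blob/<branch>/<path>`); `colab_link` and the example notebook path are hypothetical, and this is not what `sourcelink.html` currently does.

```python
def colab_link(path, branch="master", repo="icaros-usc/pyribs"):
    """Builds a Colab URL for a notebook at any path on any branch.

    Hypothetical helper: the docs template would need to supply the
    notebook's path relative to the repository root.
    """
    path = path.lstrip("/")  # Tolerate paths given with a leading slash.
    return f"https://colab.research.google.com/github/{repo}/blob/{branch}/{path}"


# Any notebook location and any branch now work, not just docs/ or
# examples/tutorials/ on master.
print(colab_link("examples/tutorials/some_notebook.ipynb", branch="my-branch"))
```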

Make archive as_pandas method use snake_case in column names

Description

Currently, we use kebab-case for the column names in as_pandas. When we iterate over the rows of the dataframe with itertuples(), Pandas tries to turn these names into namedtuple attributes. Since dashes are not allowed in identifiers, it falls back to generic positional names like _1, _2, etc. We need to switch to snake_case so that the fields produced by itertuples() are meaningful.

This will involve:

  • Fixing the naming in ArchiveBase and other archive components
  • Fixing examples, tutorials, and other documentation pages that use as_pandas()
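The renaming problem can be reproduced with a small dataframe. Pandas builds the rows from itertuples() with `collections.namedtuple(..., rename=True)`, so any column name that is not a valid identifier is replaced by an underscore followed by its field position. The column names below are illustrative, not necessarily the exact ones as_pandas produces.

```python
import pandas as pd

# Illustrative kebab-case columns, loosely modeled on the archive dataframe.
kebab = pd.DataFrame({"index-0": [3], "behavior-0": [0.5], "objective": [1.0]})
# The proposed fix: the same columns in snake_case.
snake = kebab.rename(columns=lambda c: c.replace("-", "_"))

kebab_row = next(kebab.itertuples())
snake_row = next(snake.itertuples())

print(kebab_row._fields)  # ('Index', '_1', '_2', 'objective')
print(snake_row._fields)  # ('Index', 'index_0', 'behavior_0', 'objective')
print(snake_row.behavior_0)  # 0.5
```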

[FEATURE REQUEST] Make emitters easy to extend

Description

Basically, this is a clean-up of EmitterBase and its children to make it as easy as possible to extend. We should also add a tutorial on how to extend it, or at least refer to existing code. We can link to this tutorial or add some notes in the EmitterBase docstring.

[FEATURE REQUEST] Multi-dim heatmaps

Description

Following the plots in Mouret and Clune 2015 (the MAP-Elites paper), we could extend the heatmap functions to plot multi-dimensional heatmaps. Users would specify a dimension ordering, and we would plot all the heatmaps accordingly. This should work for all heatmaps, though the sliding boundaries heatmap may be trickier.
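One possible layout tiles a 4D grid of objective values into a single 2D image: two dimensions select an outer cell, and the other two index a small heatmap inside it. This sketch assumes the objective values are already available as a dense 4D NumPy array (NaN for empty cells) and only covers evenly spaced grids. `nested_heatmap` is a hypothetical helper, not part of pyribs.

```python
import numpy as np


def nested_heatmap(values, order=(0, 1, 2, 3)):
    """Tiles a 4D grid of values into one 2D image of nested heatmaps.

    Args:
        values: Array of shape (d0, d1, d2, d3); NaN marks empty cells.
        order: Dimension ordering. order[0] and order[1] pick the row and
            column of the outer grid; order[2] and order[3] pick the row
            and column inside each outer cell.
    Returns:
        2D array that can be passed to, e.g., matplotlib's imshow.
    """
    v = np.transpose(values, order)  # (outer_r, outer_c, inner_r, inner_c)
    outer_r, outer_c, inner_r, inner_c = v.shape
    # Reorder to (outer_r, inner_r, outer_c, inner_c) so the reshape places
    # element [i, j, k, l] at row i * inner_r + k and column j * inner_c + l.
    return v.transpose(0, 2, 1, 3).reshape(outer_r * inner_r, outer_c * inner_c)


values = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)
image = nested_heatmap(values, order=(0, 1, 2, 3))
print(image.shape)  # (8, 15)
```

Changing `order` changes which behavior dimensions form the outer grid, which is the "dimension ordering" the request describes.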

[BUG] Lunar Lander Example does not work

  • pyribs version: 0.2.1
  • Python version: 3.8
  • Operating System: Ubuntu 20.04

Description

The seaborn dependency is not installed when running pip install -e .[examples].

This leads to the following error:

Traceback (most recent call last):
  File "lunar_lander.py", line 15, in <module>
    import seaborn as sns
ModuleNotFoundError: No module named 'seaborn'

Steps to Reproduce

  1. Clone the repository
  2. Run pip install -e .[examples] to install the dependencies for the examples
  3. cd examples
  4. python lunar_lander.py

[INFO] scipy cKDTree is faster than sklearn NN for single queries

In CVTArchive, we need to use a k-D tree for nearest-neighbor searches when there are many centroids. There are many k-D tree implementations to choose from; two notable ones are scipy.spatial.cKDTree and sklearn.neighbors.NearestNeighbors. Both are optimized for batched nearest-neighbor queries, but in _get_index, we query the nearest neighbor of a single point. Running the following code compares the performance of each implementation on single and batched queries.

import time

import numpy as np
from scipy.spatial import cKDTree
from sklearn.neighbors import NearestNeighbors
from tqdm import trange

samples = np.random.uniform(
    np.array([-1, -1]),
    np.array([1, 1]),
    size=(100_000, 2),
)

points = np.random.uniform(
    np.array([-1, -1]),
    np.array([1, 1]),
    size=(100, 2),
)

print("scipy cKDTree (batch)")
nn = cKDTree(points)
start = time.time()
nn.query(samples)
print(time.time() - start)

print("scipy cKDTree (single)")
nn = cKDTree(points)
start = time.time()
for i in trange(len(samples)):
    nn.query(samples[i])
print(time.time() - start)

print("sklearn NN with kd_tree (batch)")
nn = NearestNeighbors(n_neighbors=1, algorithm="kd_tree").fit(points)
start = time.time()
nn.kneighbors(samples)
print(time.time() - start)

print("sklearn NN with kd_tree (single)")
nn = NearestNeighbors(n_neighbors=1, algorithm="kd_tree").fit(points)
start = time.time()
for i in trange(len(samples)):
    nn.kneighbors(np.expand_dims(samples[i], axis=0))
print(time.time() - start)

I got the following output, which shows that NearestNeighbors is ~10x slower on single queries.

scipy cKDTree (batch)
0.036180734634399414
scipy cKDTree (single)
100%|██████████████████████████████████████████████████████████| 100000/100000 [00:03<00:00, 27718.94it/s]
3.6090946197509766   # cKDTree is fast
sklearn NN with kd_tree (batch)
0.052004098892211914
sklearn NN with kd_tree (single)
100%|███████████████████████████████████████████████████████████| 100000/100000 [00:41<00:00, 2397.46it/s]
41.71117091178894    # NearestNeighbors is slow

In short, we should definitely use cKDTree.
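A single-point lookup built on cKDTree could look roughly like the sketch below. It is a standalone approximation, not the actual CVTArchive code. The class name `CentroidIndex` and the `use_kd_tree` switch are hypothetical. The brute-force branch is included only to show that both paths return the same index.

```python
import numpy as np
from scipy.spatial import cKDTree


class CentroidIndex:
    """Maps a behavior vector to the index of its nearest centroid.

    Hypothetical stand-in for the lookup done in CVTArchive._get_index.
    """

    def __init__(self, centroids, use_kd_tree=True):
        self.centroids = np.asarray(centroids, dtype=float)
        # Build the tree once; each lookup is then a single-point query.
        self.tree = cKDTree(self.centroids) if use_kd_tree else None

    def get_index(self, behavior_values):
        behavior_values = np.asarray(behavior_values, dtype=float)
        if self.tree is not None:
            # For a 1D query point, query() returns a scalar (distance, index).
            _, index = self.tree.query(behavior_values)
            return int(index)
        # Brute force: squared Euclidean distance to every centroid.
        distances = np.sum((self.centroids - behavior_values)**2, axis=1)
        return int(np.argmin(distances))


rng = np.random.default_rng(42)
centroids = rng.uniform(-1, 1, size=(100, 2))
lookup = CentroidIndex(centroids)
print(lookup.get_index([0.1, -0.3]))
```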

Set up GitHub Actions for CI/CD

Currently, we are using Travis CI, which does not give free CI/CD for private repos. GitHub Actions will give us a limited amount of compute per month. If we run out, we will have to test locally.

Create Lunar Lander CMA-ME Example

Typical QD algorithms have to be run on large clusters; it is very hard to run them locally and then see their results. To help solve this, we will create a CMA-ME example that runs in the OpenAI Gym LunarLander-v2 environment. This example will show what QD algorithms are able to accomplish; furthermore, since it will only take a few hours to train (perhaps even less if we use multiprocessing and other optimizations), it will be highly accessible.

Default to stable version for documentation

The current default version is latest. This may be confusing as latest has some features that stable does not. Switching the default will require:

  • grep existing documentation and documentation configuration for "latest"
  • check the website for instances of "latest"
  • switch the default on readthedocs

[FEATURE REQUEST] Logging Capabilities

Description

Similar to pycma, we can add a logger that, when given an archive, records several metrics about it. I don't think we would want to tie it in too closely to the archives, as people may want to calculate their own metrics.

This may require adding some methods to the archives (preferably ArchiveBase) to compute metrics on them.

Metrics include archive size, archive fitness (mean, max, min, and maybe median), and QD score.
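A logger along these lines could compute the listed metrics from nothing more than the objective values of the elites, which keeps it independent of the archive internals. This is a sketch under that assumption. `archive_metrics` and its `qd_offset` parameter are hypothetical. The offset is there because a plain sum of objectives can shrink when solutions with negative objectives are added, so subtracting a lower bound on the objective keeps every term non-negative.

```python
import numpy as np


def archive_metrics(objective_values, qd_offset=0.0):
    """Computes summary metrics from the objectives of occupied cells.

    Hypothetical helper; it only needs the elites' objective values, so it
    stays decoupled from any particular archive class.

    Args:
        objective_values: 1D array with one objective per occupied cell.
        qd_offset: Subtracted from every objective before summing, e.g. a
            lower bound on the objective so the QD score stays non-negative.
    """
    objs = np.asarray(objective_values, dtype=float)
    if objs.size == 0:
        return {"size": 0, "qd_score": 0.0}
    return {
        "size": int(objs.size),
        "mean": float(np.mean(objs)),
        "max": float(np.max(objs)),
        "min": float(np.min(objs)),
        "median": float(np.median(objs)),
        "qd_score": float(np.sum(objs - qd_offset)),
    }


print(archive_metrics([1.0, 2.0, 3.0]))
```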

[FEATURE REQUEST] Archive metadata

Description

Sometimes, we would like to store information about solutions in the archive, and this data is not necessarily a BC or objective. For instance, in the Lunar Lander example, it would be nice if we could store the total number of timesteps of the run, or the final x position. To do this, we can modify the archive API to support storing metadata. For simplicity, metadata will take the form of arbitrary objects (often a dict). Metadata will always be present in the archive, but it will default to None, such that users do not need to provide it. The following API changes are proposed:

  • __init__ gains a use_metadata parameter where the user can specify whether or not to use metadata. This defaults to False for backward compatibility and simplicity.
  • initialize allocates an object array for storing the metadata
  • add gains a parameter for passing in a metadata object. This defaults to None and is ignored unless metadata is turned on.
  • elite_with_behavior and get_random_elite additionally return metadata (only if applicable)
  • as_pandas gets a parameter for including metadata (defaults to False) and adds a metadata column to the dataframe. The resulting dataframe cannot be saved with to_csv, but that is something we can leave to users. Users may also just save it with to_pickle.

The ask and tell methods in the emitters and Optimizer must also be modified to support the metadata by taking in an array/list of objects.

Usage example:

import numpy as np
from ribs.archives import GridArchive

archive = GridArchive([10, 10], [(-1, 1)] * 2)
archive.initialize(10)  # In addition to the regular arrays, allocates a metadata array
archive.add(
    solution=np.ones(10),
    objective_value=1.0,
    behavior_values=np.ones(2),
    metadata={  # An arbitrary metadata object (in this case a dict).
        "metadata1": np.ones(12),
        "metadata2": 1,
    },
)
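Internally, an object array is one way to let each cell hold an arbitrary Python object while the rest of the per-cell data stays in numeric arrays. The toy class below sketches the proposed use_metadata behavior on a flat list of bins. It is a hypothetical illustration, not the GridArchive implementation, and `ToyMetadataArchive`, `add`, and `elite` are made-up names.

```python
import numpy as np


class ToyMetadataArchive:
    """Hypothetical flat archive illustrating per-cell metadata storage."""

    def __init__(self, n_bins, use_metadata=False):
        self.use_metadata = use_metadata
        self.objectives = np.full(n_bins, -np.inf)
        # dtype=object lets every cell hold any Python object (e.g. a dict);
        # cells default to None so users never have to supply metadata.
        self.metadata = np.full(n_bins, None, dtype=object)

    def add(self, index, objective_value, metadata=None):
        """Stores the entry if it beats the current elite in the cell."""
        if objective_value <= self.objectives[index]:
            return False
        self.objectives[index] = objective_value
        if self.use_metadata:  # Metadata is ignored unless turned on.
            self.metadata[index] = metadata
        return True

    def elite(self, index):
        """Returns the objective, plus the metadata only if it is enabled."""
        if self.use_metadata:
            return self.objectives[index], self.metadata[index]
        return self.objectives[index]


archive = ToyMetadataArchive(n_bins=100, use_metadata=True)
archive.add(7, 1.0, metadata={"timesteps": 250, "final_x": 0.3})
print(archive.elite(7))
```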

[FEATURE REQUEST] Add data parameter to heat map functions

Description

A data parameter would contain an earlier snapshot of the archive as a Pandas dataframe. This parameter would allow one to plot an earlier version of the archive -- this is useful if one has multiple snapshots of an archive lying around and wants to plot heatmaps of them after the fact.

Change Optimizer to Scheduler

"Optimizer" can be a bit confusing, especially since we have optimizers in the emitters. Hence, we would like to rename everything to "Scheduler". The current "Optimizer" would be renamed "BasicScheduler"

Changes Required

  • Rename Optimizer in the library
  • Fix ribs.factory
  • Fix tutorials
  • Fix examples

[FEATURE REQUEST] Ability to save and reload optimizer state

Description

In some cases, QD algorithms may run for a certain amount of time but then need to be paused. We should add the ability to save the state of a running algorithm's components along with its config, and to then reload that algorithm.

This could be implemented in several ways. We could try to make the algorithm components pickle-able, and then save the whole algorithm to a pickle file. Perhaps a simpler way would be to have a function that, given an Optimizer, saves all the optimizer's variables to a file along with all the archive and emitter variables. Then a separate function would recreate everything from this file.

A challenge would be dealing with changes to our API -- some arguments may not always be compatible across versions. To solve this, we can ensure that we have good versioning practices and only allow reconstructing certain files with certain versions of the library.
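The pickle route combined with the versioning idea could be sketched as follows. It assumes the optimizer, archive, and emitters are pickle-able, and it refuses to load a file written by an unsupported library version. `save_state`, `load_state`, and the hard-coded `LIBRARY_VERSION` are hypothetical placeholders; a real implementation would read the installed package version.

```python
import os
import pickle
import tempfile

LIBRARY_VERSION = "0.2.1"  # Hypothetical; would come from the package itself.


def save_state(obj, path):
    """Pickles obj together with the library version that wrote it."""
    with open(path, "wb") as file:
        pickle.dump({"version": LIBRARY_VERSION, "state": obj}, file)


def load_state(path, compatible_versions=(LIBRARY_VERSION,)):
    """Restores a pickled object, refusing files from unsupported versions.

    Only load files you trust: unpickling can execute arbitrary code.
    """
    with open(path, "rb") as file:
        payload = pickle.load(file)
    if payload["version"] not in compatible_versions:
        raise ValueError(
            f"State was saved with version {payload['version']}, "
            f"but only {compatible_versions} can be loaded.")
    return payload["state"]


path = os.path.join(tempfile.mkdtemp(), "state.pkl")
save_state({"iteration": 500, "sigma": [0.1, 0.2]}, path)
print(load_state(path))
```

Storing the version next to the state, rather than relying on the pickle alone, lets the loader give a clear error instead of failing somewhere inside a changed class.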

[FEATURE REQUEST] Test Mac and Windows in CI/CD

Description

Many users use Mac or Windows (not just Linux), so we should start testing on those platforms too. GitHub Actions is completely free for public repos, so this would not cost anything on our end.

Develop a logo and social image

Description

A logo would be great for making things look professional. We should also have a "Social Preview image" so that the repository has a nice image when shared on places like Facebook. Also, the logo should be made into a favicon.

[FEATURE REQUEST] Add full support for Python 3.9

Description

Currently, numba does not work with Python 3.9 (numba/numba#6345), so we cannot support it yet.

This is due to numba/llvmlite#669. Since Python 3.9 wheels are not available for llvmlite, CI/CD tries to build the wheels but fails because LLVM is not installed. This should be resolved in a few weeks. For now, Python 3.9 will only work if users build the llvmlite wheel themselves, which requires having the LLVM libraries installed.

[FEATURE REQUEST] Archive size attributes

Description

Currently, the number of bins occupied in the archive is found by taking the length of archive.as_pandas(include_solutions=False). This is very indirect and slow. We should add a length method (either overload __len__ or add a property like occupied) that makes it easy to check how many bins are occupied.

We could also add a bins property to all archives with the total number of bins in the archive.
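Keeping a running count that is updated whenever a new bin is filled makes the size check constant-time instead of requiring a dataframe to be built. A toy sketch, assuming a grid-shaped archive; the class and its internals are hypothetical. It uses `__len__` for the number of occupied bins and a `bins` property for the total.

```python
import numpy as np


class ToyArchive:
    """Hypothetical grid archive showing the proposed size attributes."""

    def __init__(self, dims):
        self.dims = tuple(dims)
        # True where a bin holds an elite.
        self._occupied = np.zeros(self.dims, dtype=bool)
        self._num_occupied = 0

    def add(self, index):
        """Marks the bin at index (a tuple of ints) as occupied."""
        if not self._occupied[index]:
            self._occupied[index] = True
            self._num_occupied += 1  # Only count bins that were newly filled.

    @property
    def bins(self):
        """Total number of bins in the archive."""
        return int(np.prod(self.dims))

    def __len__(self):
        """Number of occupied bins; O(1) because the count is kept current."""
        return self._num_occupied


archive = ToyArchive([10, 10])
archive.add((3, 4))
archive.add((3, 4))  # Same bin again, so the count does not change.
archive.add((0, 0))
print(len(archive), archive.bins)  # 2 100
```

One side effect of defining `__len__` is that an empty archive becomes falsy in `if archive:` checks, which may be an argument for a separately named property like `occupied`.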

[FEATURE REQUEST] Extended ask-tell interface

Description

To allow multiple ask-tell calls within a single loop iteration, add a mode parameter to ask and tell. Different types of solutions are returned depending on the mode. Example usage:

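for i in range(n):
    sols1 = opt.ask("mode1")
    ...  # Evaluate sols1 to obtain objs1 and bcs1.
    opt.tell(objs1, bcs1, "mode1")

    sols2 = opt.ask("mode2")
    ...  # Evaluate sols2 to obtain objs2 and bcs2.
    opt.tell(objs2, bcs2, "mode2")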

Notes:

  • This change would involve the Optimizer as well as all emitters.
  • For simplicity, the emitter should throw an error if it is passed an unknown mode.
  • The default mode would be called "default".
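A toy version of the mode-aware interface could look like the sketch below. It remembers the last batch asked for each mode so that the matching tell can pair results with solutions, and it raises on unknown modes as the notes suggest. `ToyModeOptimizer`, the mode names, and the sampling rule are all hypothetical; a real implementation would dispatch to emitters and update the archive.

```python
import numpy as np


class ToyModeOptimizer:
    """Hypothetical optimizer whose ask/tell behavior depends on a mode."""

    MODES = ("default", "explore")  # Illustrative mode names.

    def __init__(self, x0, sigma0, batch_size=4, seed=None):
        self.x0 = np.asarray(x0, dtype=float)
        self.sigma0 = sigma0
        self.batch_size = batch_size
        self.rng = np.random.default_rng(seed)
        self._last_asked = {}  # Solutions awaiting tell(), keyed by mode.
        self.told = {mode: 0 for mode in self.MODES}

    def _check_mode(self, mode):
        # Throw an error on unknown modes instead of silently ignoring them.
        if mode not in self.MODES:
            raise ValueError(f"Unknown mode {mode!r}; expected one of {self.MODES}")

    def ask(self, mode="default"):
        self._check_mode(mode)
        # "explore" samples with a wider Gaussian than "default".
        scale = self.sigma0 * (5.0 if mode == "explore" else 1.0)
        solutions = self.x0 + self.rng.normal(
            scale=scale, size=(self.batch_size, self.x0.size))
        self._last_asked[mode] = solutions
        return solutions

    def tell(self, objective_values, behavior_values, mode="default"):
        self._check_mode(mode)
        if mode not in self._last_asked:
            raise RuntimeError(f"tell({mode!r}) was called before ask({mode!r})")
        solutions = self._last_asked.pop(mode)
        # A real implementation would insert the results into the archive and
        # update the emitters handling this mode; here we only count them.
        self.told[mode] += len(solutions)


opt = ToyModeOptimizer(x0=[0.0, 0.0], sigma0=0.1, seed=0)
for _ in range(3):
    sols1 = opt.ask("default")
    opt.tell(np.zeros(len(sols1)), np.zeros((len(sols1), 2)), "default")
    sols2 = opt.ask("explore")
    opt.tell(np.zeros(len(sols2)), np.zeros((len(sols2), 2)), "explore")
print(opt.told)  # {'default': 12, 'explore': 12}
```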

[FEATURE REQUEST] Grid archive heatmap

Description

Just as we have a heatmap function for CVTArchive and SlidingBoundaryArchive, we should have one for GridArchive. The current method of using Seaborn's heatmap directly does not work well because not all cells are shown.

In order to implement this method, we could leverage Seaborn's heatmap, though that would mean adding another dependency. It would be better to use a method like imshow or pcolormesh from matplotlib.
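The core of such a function is turning the dataframe, which only contains occupied cells, into a dense grid in which empty cells are explicitly marked. That way every cell appears when the result is passed to imshow or pcolormesh. In the sketch below, the column names `index_0`, `index_1`, and `objective` and the helper `grid_heatmap_data` are assumptions. Because it works from a dataframe, the same approach could also serve the idea of plotting from an earlier archive snapshot.

```python
import numpy as np
import pandas as pd


def grid_heatmap_data(data, dims):
    """Builds a dense 2D array of objectives from an archive dataframe.

    Args:
        data: DataFrame with integer bin indices in "index_0" and "index_1"
            and an "objective" column (column names are assumptions).
        dims: (n_bins_0, n_bins_1), the archive's grid shape.
    Returns:
        Array of shape (n_bins_1, n_bins_0). Rows follow dimension 1 to match
        the row/column convention of imshow and pcolormesh. Empty cells are
        NaN, which matplotlib treats as invalid (masked) values.
    """
    grid = np.full((dims[1], dims[0]), np.nan)
    rows = data["index_1"].to_numpy(dtype=int)
    cols = data["index_0"].to_numpy(dtype=int)
    grid[rows, cols] = data["objective"].to_numpy(dtype=float)
    return grid


snapshot = pd.DataFrame(
    {"index_0": [0, 2], "index_1": [1, 0], "objective": [0.5, 1.5]})
grid = grid_heatmap_data(snapshot, dims=(3, 2))
# plt.pcolormesh(grid) would then draw all 3 x 2 cells, with empty ones blank.
```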

Make archive be the first parameter to all emitters

Currently, we have emitter constructors like:

IsoLineEmitter(x0, archive, ...)
GaussianEmitter(x0, sigma0, archive, ...)
ImprovementEmitter(x0, sigma0, archive, ...)

It would be more consistent if archive were always the first parameter, since it is the one thing that all emitters must have. This would require changing the library code as well as fixing the tests and examples.

Decide on and set up project styles and tools.

Styles:

  • flake8 vs pylint
  • numpy style vs google style vs pep8 style

Tools:

  • restview (for automatically viewing reST files)
  • yapf
  • sphinx-rtd-theme (nice Sphinx theme for ReadTheDocs)
  • sphinx-autobuild (automatically reloads Sphinx documentation)
  • pytest-cov
  • LGTM? (https://lgtm.io)
