The box-embeddings from iesl

Demo models

Embed WordNet, as in the Gumbel box paper.
2-class NLI using Multi-NLI.

Implement each one in TensorFlow and PyTorch, in separate notebooks, in an examples folder at the root level.

Add more labels and a workflow to update the changelog on dev/main on every push.

Add document for steps before release.

Add a RELEASE.md describing steps that need to be taken before making a release.

Update the version in setup.py.
Check if the test coverage is above a threshold.
Check if the CHANGELOG is up-to-date.
Create a tag matching the version in the setup.py. Copy the "unreleased" section of the CHANGELOG into the description for the tag and release.

Pytorch Docs Theme missing certain elements

Contributors
Branches
Tags

Usage Documentation

Examples of the following:

Train shallow box representations (can/should be on a simple toy dataset, e.g., mammals, birds, etc.; maybe do both BCE and max-margin)
How to use it on the output of BERT (maybe NLI?)
(Maybe more in the future)
Probably do this in notebooks: https://mybinder.org/

How to run the examples?

Hello, what command is used to run the "examples" and what settings are required?

Change to "temperature" everywhere

Currently we have gumbel_beta and also beta in softplus, which is confusing.

We should change to using intersection_temperature (taking the place of current gumbel_beta) and volume_temperature (where 1/volume_temperature takes the place of current beta in softplus calculation).
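The proposed renaming can be sanity-checked numerically. The following is a minimal sketch in plain Python (no PyTorch needed), assuming softplus with `beta` is defined as `(1 / beta) * log(1 + exp(beta * x))`, which is how `torch.nn.functional.softplus` documents it. The function names `softplus_beta` and `softplus_temperature` are illustrative only and are not part of the library.

```python
import math


def softplus_beta(x: float, beta: float = 1.0) -> float:
    """Softplus as parameterized today: (1 / beta) * log(1 + exp(beta * x))."""
    return math.log1p(math.exp(beta * x)) / beta


def softplus_temperature(x: float, volume_temperature: float = 1.0) -> float:
    """The same function written with a temperature: T * log(1 + exp(x / T))."""
    return volume_temperature * math.log1p(math.exp(x / volume_temperature))


side_length = 0.3          # illustrative value of Z - z in one dimension
volume_temperature = 0.5   # illustrative temperature

# Old call style (beta) and proposed call style (temperature) give the same value.
old = softplus_beta(side_length, beta=1.0 / volume_temperature)
new = softplus_temperature(side_length, volume_temperature)

# As the temperature goes to zero, softplus approaches the hard max(x, 0).
nearly_hard = softplus_temperature(side_length, 0.01)
```

With this reading, a lower `volume_temperature` means a sharper volume, which matches how `intersection_temperature` would behave for the intersection.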

Instead of clamping the input to `log` to a minimum eps, add an eps to the input.

box-embeddings/box_embeddings/modules/volume/soft_volume.py

Line 63 in 76a5268

softplus(box_tensor.Z - box_tensor.z, beta=beta).clamp_min(eps)

Clamping will kill the gradient. Adding a minimum eps will not.
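To illustrate the difference, here is a minimal sketch in plain Python. It uses a central finite difference in place of autograd, and the value of `eps` is chosen for illustration; it is not the library's default. The helper names are hypothetical.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_side_clamped(x: float, eps: float) -> float:
    # Current behaviour: clamp the softplus output from below, then take the log.
    return math.log(max(softplus(x), eps))


def log_side_shifted(x: float, eps: float) -> float:
    # Proposed behaviour: add eps to the softplus output, then take the log.
    return math.log(softplus(x) + eps)


def numerical_grad(f, x: float, h: float = 1e-4) -> float:
    # Central finite difference, standing in for autograd in this sketch.
    return (f(x + h) - f(x - h)) / (2.0 * h)


eps = 1e-6   # illustrative value only
x = -20.0    # Z - z for a badly inverted box; softplus(x) is about 2e-9, below eps

grad_clamped = numerical_grad(lambda v: log_side_clamped(v, eps), x)
grad_shifted = numerical_grad(lambda v: log_side_shifted(v, eps), x)
```

Where the clamp is active the output is the constant `log(eps)`, so the gradient is exactly zero and the box cannot recover. The shifted version keeps a small but nonzero gradient in the same region.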

Exact Bessel Volume

Currently, the volume function in this repo is an approximation of the Bessel volume. I tried to implement an exact version of the Bessel volume in the past, but it was not numerically stable. Could you have a look at the code snippet below and see how it could be added to this repo?

The bessel function wrapper

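import torch
from scipy import special
from torch import Tensor
from typing import Union

class Bessel(torch.autograd.Function):
    """Modified Bessel function of the second kind, order 0 (K0), evaluated with SciPy on the CPU."""

    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        # SciPy works on NumPy arrays, so round-trip through the CPU.
        k0 = special.k0(input.detach().cpu().numpy())
        return torch.as_tensor(k0, dtype=input.dtype, device=input.device)

    @staticmethod
    def backward(ctx, grad_output):
        (input,) = ctx.saved_tensors
        # d/dx K0(x) = -K1(x)
        k1 = special.k1(input.detach().cpu().numpy())
        k1 = torch.as_tensor(k1, dtype=grad_output.dtype, device=grad_output.device)
        return -grad_output * k1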

The volume function

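    @classmethod
    def _log_bessel_volume(
        cls,
        z: Tensor,
        Z: Tensor,
        gumbel_beta: float = 1.0,
        scale: Union[float, Tensor] = 1.0,
    ) -> Tensor:
        # Smallest positive normal number for this dtype; keeps log() finite.
        eps = torch.finfo(z.dtype).tiny
        s = torch.tensor(scale) if isinstance(scale, float) else scale
        # Argument of K0, capped at 100 (presumably to keep K0 away from underflow).
        element = (2 * torch.exp((z - Z) / (2 * gumbel_beta))).clamp_max(100)
        # Sum of per-dimension log side lengths, plus the log of the scale.
        return torch.sum(
            torch.log(2 * gumbel_beta * Bessel.apply(element).clamp_min(eps)),
            dim=-1,
        ) + torch.log(s)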

Should we be calling it an NLP library?

It is a general library that can be used for NLP or graph learning. Calling it an NLP library might be too restrictive. @mboratko @ssdasgupta What do you think?

Tensorflow docs not appearing.

Add regularization for tensorflow.

Regularization tests missing for tensorflow

Efficient computation of intersection and volume.

Once we have minus_Z and a parameterization that stores minus_Z instead of Z directly, we can use z and minus_Z to compute intersection and volume.
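A rough sketch of the idea, in plain Python on lists rather than tensors. It assumes the hard (non-Gumbel) intersection, and the names `hard_intersection` and `hard_volume` are hypothetical. The point is that with Z stored negated, both corners of the intersection come from the same elementwise max, so a tensor version could presumably handle z and minus_Z in one call.

```python
from typing import List, Tuple

Vector = List[float]


def hard_intersection(
    z1: Vector, minus_Z1: Vector, z2: Vector, minus_Z2: Vector
) -> Tuple[Vector, Vector]:
    # Lower corner: max(z1, z2). Upper corner, negated: max(-Z1, -Z2) = -min(Z1, Z2).
    z = [max(a, b) for a, b in zip(z1, z2)]
    minus_Z = [max(a, b) for a, b in zip(minus_Z1, minus_Z2)]
    return z, minus_Z


def hard_volume(z: Vector, minus_Z: Vector) -> float:
    # Side length is Z - z = -(z + minus_Z), clipped at zero for empty intersections.
    volume = 1.0
    for lo, neg_hi in zip(z, minus_Z):
        volume *= max(-(lo + neg_hi), 0.0)
    return volume


# Box A = [0, 2] x [0, 2] and box B = [1, 3] x [1, 4], with upper corners negated.
z, minus_Z = hard_intersection([0.0, 0.0], [-2.0, -2.0], [1.0, 1.0], [-3.0, -4.0])
volume = hard_volume(z, minus_Z)
```

The intersection here is the unit square [1, 2] x [1, 2], so the volume is 1.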

Add volume and side-length based regularizers.

Side-length based

L2 on side-length
L1 on side-length
Huber (smooth L1) on side lengths.

Volume-based

L1, L2 on box volumes.
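As a starting point for discussion, the penalties listed above could look roughly like this. It is a plain-Python sketch for a single box; the function names, the sum reduction over dimensions, the hard volume, and the Huber threshold `delta` are all assumptions rather than existing library API.

```python
import math
from typing import List


def side_lengths(z: List[float], Z: List[float]) -> List[float]:
    # Per-dimension side length Z - z of a single box.
    return [hi - lo for lo, hi in zip(z, Z)]


def l1_side(z: List[float], Z: List[float]) -> float:
    return sum(abs(s) for s in side_lengths(z, Z))


def l2_side(z: List[float], Z: List[float]) -> float:
    return sum(s * s for s in side_lengths(z, Z))


def huber_side(z: List[float], Z: List[float], delta: float = 1.0) -> float:
    # Quadratic for |s| <= delta, linear beyond it (smooth L1 when delta = 1).
    total = 0.0
    for s in side_lengths(z, Z):
        a = abs(s)
        total += 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)
    return total


def hard_volume(z: List[float], Z: List[float]) -> float:
    # Product of side lengths, with inverted sides clipped to zero.
    return math.prod(max(s, 0.0) for s in side_lengths(z, Z))


def l1_volume(z: List[float], Z: List[float]) -> float:
    return abs(hard_volume(z, Z))


def l2_volume(z: List[float], Z: List[float]) -> float:
    return hard_volume(z, Z) ** 2


z, Z = [0.0, 0.0], [0.5, 3.0]  # side lengths 0.5 and 3.0
penalties = {
    "l1_side": l1_side(z, Z),
    "l2_side": l2_side(z, Z),
    "huber_side": huber_side(z, Z),
    "l1_volume": l1_volume(z, Z),
    "l2_volume": l2_volume(z, Z),
}
```

The Huber penalty treats the short side (0.5) quadratically and the long side (3.0) linearly, which is why it sits between the L1 and L2 values for this box.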

Restructure imports to not depend on PyTorch

Change the top level imports so that the tensorflow version can be imported without pytorch

Denominator in bag of boxes pooler can be zero.

The denominator can be zero and result in NaN output. Adding a small epsilon will help.

box-embeddings/box_embeddings/modules/pooling/bag_of_boxes.py

Line 23 in 444e3e8

z = torch.sum(box_z * weights, dim=dim, keepdim=keepdim) / denominator
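A minimal sketch of the suggested fix, written in plain Python for one coordinate of one bag. It assumes the weights are non-negative and that the denominator is their sum; `weighted_mean` and the default `eps` are illustrative, not the library's code.

```python
from typing import List


def weighted_mean(values: List[float], weights: List[float], eps: float = 1e-12) -> float:
    numerator = sum(v * w for v, w in zip(values, weights))
    denominator = sum(weights)
    # eps keeps the division defined when every weight is zero
    # (for example, a bag in which every box is masked out).
    return numerator / (denominator + eps)


all_masked = weighted_mean([1.0, 3.0], [0.0, 0.0])
regular = weighted_mean([1.0, 3.0], [1.0, 1.0])
```

Without `eps`, the all-zero case is 0 / 0, which raises in plain Python and yields NaN in a tensor library. With `eps`, it returns 0 and the ordinary case is changed only negligibly.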

Use jupytext instead of raw ipynb with Binder

Storing raw notebooks in VCS is messy and not recommended. Can we use jupytext instead? Will jupytext work with MyBinder?

The .ipynb_checkpoints directory should not be present in VCS.

Update the usage doc (ipynb) with updated function calls and put it on mybinder

https://mybinder.org/

Documentation issues

Change theme (PyTorch? FastAPI? https://fastapi.tiangolo.com/)
Look into why inherited methods are appearing in subclasses (maybe a flag can turn this off)

Subclass from `torch.Tensor`

One issue with subclassing will be the creation from zZ. We want to avoid the inverse-forward roundtrips whenever possible.

tf.Variable is not an instance of tf.Tensor

`isinstance(tf.Variable([1]), tf.Tensor)` returns False, whereas `isinstance(tf.constant([1]), tf.Tensor)` returns True. Variables are required for computing gradients, because constants cannot be modified in TensorFlow. Currently the code only works with tf.constant inputs.

Cannot commit to personal branches due to precommit hook

Personal branches (other than main and dev/main) seem to still require precommit hooks to pass.

Return torch.sum instead of torch.mean for l2_side_regularizer

def l2_side_regularizer(
    box_tensor: BoxTensor, log_scale: bool = False
) -> Union[float, torch.Tensor]:
    z = box_tensor.z  # (..., box_dim)
    Z = box_tensor.Z  # (..., box_dim)
    if not log_scale:
        return torch.mean((Z - z) ** 2)
    else:
        return torch.mean(torch.log(torch.abs(Z - z) + eps))

Move the project into a `src` subdirectory structure

Embedding module

Currently we need a separate embedding layer whose output we then pass to some box parameterization. It would be nice to wrap this so that box embeddings can be created directly.

Update description

The current description mentions "Pytorch implementation for box..". It should be updated to mention TensorFlow too.

Create a wrapper for Conditional probability

We need this because of the numerical issues with pytorch's softplus. While creating conditional probability we need to concatenate, take volume, and split again.

Optional step: Find the failure points: on what shapes and values does this happen?
First step: Profile the difference between applying softplus on two tensors or applying softplus on their concatenation and split it. Profile both runtime and memory usage.

Fixed-width boxes

Capability to model boxes with fixed side-lengths.

Ref: https://github.com/dhruvdcoder/box-embeddings/blob/master/box_embeddings/parameterizations/center_fixed_delta_tensor.py

BoxTensor Indexing

Hi,

I'm new to the box embeddings. I'm wondering if the current implementation has some sort of indexing method? E.g. if a box tensor contains 64 16-dimensional boxes, what's the best way to access specific boxes in this tensor? I think constructing a new box tensor with .from_zZ() could work but just wondering if we can do this more efficiently.

Thanks!

Make `gumbel_intersection` work with broadcasting semantics in box_shape.

Currently, we stack up the two BoxTensors to compute intersection using logsumexp.

box-embeddings/box_embeddings/modules/intersection/gumbel_intersection.py

Line 34 in e70e283

torch.stack((t1.z / gumbel_beta, t2.z / gumbel_beta)), 0

This does not support broadcasting. For instance, if we have two BoxTensors of box_shape (batch1, 1, box_dim) and (1, batch2, box_dim), all the other intersection ops can produce intersection box with box_shape (batch1, batch2, box_dim) but gumbel_intersection cannot. This operation is required for multilabel classification using boxes.

Plan:

Implement a logsumexp that works with two tensors and that is API consistent with torch.maximum(). Call this operator real_softmax because it is essentially a differentiable max operation.
Update the current gumbel_intersection to use real_softmax instead of stack + logsumexp.
Write boundary test-cases to check for numerical issues (underflow as well as overflow).
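The plan above could start from something like the following scalar sketch in plain Python. The name `real_softmax` comes from the plan; the `temperature` argument, the max-shifted formula, and the `pairwise_real_softmax` helper (which only mimics broadcasting (batch1, 1) against (1, batch2) with nested lists) are assumptions made for illustration.

```python
import math
from typing import List


def real_softmax(a: float, b: float, temperature: float = 1.0) -> float:
    """Two-argument smooth maximum: T * log(exp(a / T) + exp(b / T)).

    Computed as max(a, b) + T * log1p(exp(-|a - b| / T)), so the only
    exponential taken is of a non-positive number and cannot overflow.
    """
    return max(a, b) + temperature * math.log1p(math.exp(-abs(a - b) / temperature))


def pairwise_real_softmax(
    xs: List[float], ys: List[float], temperature: float = 1.0
) -> List[List[float]]:
    # Stand-in for broadcasting: every x is combined with every y,
    # giving a (len(xs), len(ys)) grid.
    return [[real_softmax(x, y, temperature) for y in ys] for x in xs]


grid = pairwise_real_softmax([0.0, 1.0, 2.0], [0.5, 1.5])

# Boundary cases: a naive log(exp(a) + exp(b)) overflows for the first
# and hits log(0) for the second.
huge = real_softmax(1e4, 1e4 - 1.0)
tiny = real_softmax(-1e4, -1e4 - 1.0)
```

In a tensor version the same formula uses only elementwise operations (maximum, absolute value, exp, log1p), each of which broadcasts, so no stacking is needed. PyTorch also provides `torch.logaddexp`, which may already cover the two-tensor case and would be worth comparing against.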

Create a CHANGELOG.md file

Follow the format from AllenNLP: https://github.com/allenai/allennlp/blob/main/CHANGELOG.md

Image missing on the doc website

https://www.iesl.cs.umass.edu/box-embeddings/main/index.html

Documentation is not being generated for the new tag v0.1.0

Check if the extra max is needed, i.e. whether `logsumexp > max` can fail to hold due to numerical issues.

Write a test either confirming the numerical issue or confirming the absence of it.

Line:

box-embeddings/box_embeddings/modules/intersection/gumbel_intersection.py

Line 36 in e70e283

z = torch.max(z, torch.max(t1.z, t2.z))
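One way to begin such a test is sketched below in plain Python with scalar floats. It assumes the intersection corner is computed as `gumbel_beta * logsumexp(z / gumbel_beta)`, which is inferred from the quoted line rather than confirmed, and it uses a max-shifted two-argument logsumexp. The grids of values and betas are arbitrary.

```python
import itertools
import math


def logaddexp(a: float, b: float) -> float:
    # Max-shifted log(exp(a) + exp(b)); the log1p term is never negative.
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def gumbel_min_corner(z1: float, z2: float, gumbel_beta: float) -> float:
    # Assumed pattern: divide by beta, logsumexp, multiply back by beta.
    return gumbel_beta * logaddexp(z1 / gumbel_beta, z2 / gumbel_beta)


values = [-50.0, -1.0, -0.1, 0.0, 0.1, 0.3, 1.0, 7.7, 50.0]
betas = [1e-3, 0.1, 0.3, 1.0, 3.0]
pairs = list(itertools.product(values, repeat=2))

# Without temperature scaling, the bound holds by construction.
unscaled_violations = [(a, b) for a, b in pairs if logaddexp(a, b) < max(a, b)]

# With scaling, the (z / beta) * beta round trip may lose the last bit.
scaled_violations = [
    (a, b, beta) for a, b in pairs for beta in betas
    if gumbel_min_corner(a, b, beta) < max(a, b)
]

# The extra max restores the bound regardless of rounding.
guarded_violations = [
    (a, b, beta) for a, b in pairs for beta in betas
    if max(gumbel_min_corner(a, b, beta), a, b) < max(a, b)
]

print(len(unscaled_violations), len(scaled_violations), len(guarded_violations))
```

If `scaled_violations` turns out non-empty on a given platform, the likely cause is rounding in the divide-and-multiply round trip rather than in logsumexp itself, and the extra max would be justified. If it stays empty across a wider grid and in float32, the extra max may be removable.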

Issue while installing on windows

box-embeddings/setup.py

Line 6 in ea401d5

with open("README.md", "r") as fh:

Change this to:

with open("README.md", "r", encoding="utf8") as fh:

One which can plot boxes in 2d
One which plots boxes in n dimensions, using a horizontal / vertical stack of one dimensional boxes

For the API, we should look into "grammar of graphics".

Put example usage into docstrings

Add code examples in the docstrings for functions.
Use the tests as an example.
