con-kitty / categorifier Goto Github PK

View Code? Open in Web Editor NEW

56.0 5.0 2.0 776 KB

Interpret Haskell programs into any cartesian closed category.

License: BSD 3-Clause "New" or "Revised" License

Haskell 98.27% Nix 1.71% Emacs Lisp 0.02%

haskell category-theory plugin

categorifier's Introduction

Categorifier

Defining novel interpretations of Haskell programs

You probably want to look at the plugin README.

Building

A Nix flake is provided, so if you are familiar with Nix, that’s the most reliable way to build the project.

If you‘re not using Nix, the cabal.project file requires at least Cabal 3.8, but the individual projects should work with older versions.

Contributing

There are compatible direnv and Nix environments in the repository to make it easy to build, test, etc. everything with consistent versions to help replicate issues.

This repository is all formatted using Ormolu. Currently CI runs Ormolu 0.4.0.0, which can be installed by cabal install ormolu-0.4.0.0. See the usage notes for how to best integrate it with your workflow. But don't let Ormolu get in the way of contributing - CI will catch the formatting, and we can help clean up anything.

categorifier's People

Contributors

Stargazers

Watchers

Forkers

carnotweat martyall

categorifier's Issues

Use thNameToGhcName

There's a GHC function, GhcPlugins.thNameToGhcName. This should be used instead of Categorifier.Hierarchy.findName.

support more coercions

The plugin supports only a handful of coercions in categorizeFun. One in particular that crops up is AxiomInstCo.

So far, AxiomInstCo has always been the result of a type class that has a single method and no super-classes. There is a workaround for this case.

make the type class "more complex" by either adding a dummy method or adding a super-class, like class EmptyClass a => SimpleClass a where ...
include a comment referencing this ticket.

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-3013)

report when we try to convert a mutually-recursive top-level definition

Right now categorize simply gets stuck in a loop when we try to categorize a mutually-recursive function.

I think the problem is tied to inlining, so it would be great if we could keep track of what we've inlined, and error if we try to inline a function that we're in the middle of converting already.

The problem with this approach currently is that we usually inline the original expression within a larger unconverted expression, then convert the larger expression, so it's possible that if we just track it in the larger expression that we're noticing parallel invocations of the same function, rather than nested ones.

Perhaps we could manually introduce a Tick node whenever we inline, so then we can notice that we hit the inlined identifier within the larger expression.

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-2733)

Support `show`

Currently use of show may lead to primitive type errors, for instance

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-3366)

simplify unsupported interpretations

In Categorifier, we may try to interpret / @(Native Double) to a category that doesn't support the Native newtype. This will fail, but if we catch that, we can specalize and inline / so that it ends up using the underlying / @Double, which is supported.

However, if we blindly catch and inline any interpretation that fails, we can easily find ourselves in a loop rather than reporting a real error. This needs to be handled delicately to avoid loops.

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-3106)

Hold meta-information for functions in Maker

Currently Maker is just designed for generate CoreExpr for specified functions in target category, but does not contain meta-information (module name for example). So we cannot use such information for checking equality of a function under investigation and a target function. This case happens in f . id = f simplification when checking id-ness of a function. Safer way in the case was to use temporarily made function using mkId but that's an expensive operation.
So this will make our custom simplification rules safer and more principled.

AC:
Make meta-information fields in Maker and make use of it in custom simplification like f . id = f.

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-3434)

Add a test for AxiomInstCo

I don't think we have a test case for this part of the code

Plugins.Cast from0 Plugins.AxiomInstCo {}
  | Plugins.isPredTy (Plugins.exprType from0) -> do
      inlined <- flip loopM from0 $ \from -> do
        ( \case
            Plugins.Cast from' _ ->
              if Plugins.isPredTy (Plugins.exprType from')
                then Left from'
                else Right from'
            other -> Left other
          )
          <$> (simplifyFun dflags logger [] =<\< mkInline from)
      categorifyLambda name
        =<\< simplifyFun dflags logger [] (Plugins.mkCoreApps inlined args)

outside of Heavisoft.

Fix OccInfo

In the Plugins.Let (Plugins.NonRec v rhs) expr case, currently the v's OccInfo is always ManyOccs, probably because we forget to zapIdOccInfo somewhere. If the OccInfo is accurate, we can obtain isManyOccs from it rather than manually counting.

Support `rem` and `quot` in PrimOp.hs

rem and quot should be added to mkNoInlinePrimIntFunctions.

Unfortunately there's no Plugins.remIntName or Plugins.quotIntName which makes things slightly more complicated.

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-4259)

Principled subst determination with re-exported Dict module

substituting let variables need some analysis on the nature of the target expressions. Especially, if the expression is Dict type, then we need to have special cares. Unfortunately, Dict can come from different modules from re-export and there are no easy ways to check the equality of those definitions. So in 15465, we have hard-coded two modules Data.Constraint and Barbies.Internal,Dicts, but some other cases may happen in the future, so this should be handled correctly.This may require some change in the ordering of our categorization steps and further module export analysis. So we want to postpone the work until when the approach becomes more clear.

AC:

No more hard-coded re-exported module check in isDictOrBarbiesDictTyCon but detecting such re-export in general way.

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-3418)

eliminate `CoerceCat`

We use coerceC to model Core's casts. However, it introduces some problems. For one, it basically duplicates the functionality of RepCat, but for different types (Coercible ones). The other problem is that it means all newtypes that are ever coerced need to have their constructors in scope whenever categorize is called. This is especially problematic for private constructors.

Conal has brought up this issue in both

If we had something along the lines of

instance {-# overlappable #-} Newtype a b => HasRep a where
  type Rep a = b
  abst = Newtype.pack
  repr = Newtype.unpack

could we avoid the newtype constructor scoping issues? (At the expense of using DeriveAnyClass and deriving (Newtype) on all newtypes) If we manage to switch to Generics, this could be even simpler.

The complicated part being that Coercions get combined, so a coercion like TyConAppCo Representational sumTyCon [TyConAppCo Representational productTyCo []] would become abstC . abstC, not just a single abstC. I think we can approach this by inspecting the coercion and converting it into a composition of HasRep operations. Could be complicated to implement, though, because I'm not sure how to destructure Coercions, since they're all private.

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-2657)

Errors with Natural

I was using to the plug-in, and part of my program involved an (==) expression on this type. The plug-in threw the following exception:

ghc: panic! (the 'impossible' happened)
  (GHC version 9.0.2:
	Categorifier failed to categorify the following expression:
\ (ds_dIjG :: (F_BN128, F_BN128)) ->
  case ds_dIjG of { (x, y) ->
  let {
    x_sIkK
      :: Prime
           21888242871839275222246405745257275088548364400416034343698204186575808495617
    [LclId,
     Unf=Unf{Src=<vanilla>, TopLvl=False, Value=False, ConLike=False,
             WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 350 0}]
    x_sIkK
      = let {
          $dKnownNat_sIkE :: Natural
          [LclId,
           Unf=Unf{Src=<vanilla>, TopLvl=False, Value=True, ConLike=True,
                   WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 100 0}]
          $dKnownNat_sIkE
            = 21888242871839275222246405745257275088548364400416034343698204186575808495617 } in
        let {
          $dNum_aIiq :: Num F_BN128
          [LclId,
           Unf=Unf{Src=<vanilla>, TopLvl=False, Value=False, ConLike=True,
                   WorkFree=False, Expandable=True, Guidance=IF_ARGS [] 20 0}]
          $dNum_aIiq
            = $fNumPrime
                @21888242871839275222246405745257275088548364400416034343698204186575808495617
                ($dKnownNat_sIkE
                 `cast` (Sym (N:SNat[0]
                                  <21888242871839275222246405745257275088548364400416034343698204186575808495617>_P) ; Sym (N:KnownNat[0]) <21888242871839275222246405745257275088548364400416034343698204186575808495617>_N
                         :: Coercible
                              Natural
                              (KnownNat
                                 21888242871839275222246405745257275088548364400416034343698204186575808495617))) } in
        - @F_BN128
          $dNum_aIiq
          (+ @F_BN128 $dNum_aIiq x (fromInteger @F_BN128 $dNum_aIiq 7))
          y } in
  let {
    y_sIkL
      :: Prime
           21888242871839275222246405745257275088548364400416034343698204186575808495617
    [LclId,
     Unf=Unf{Src=<vanilla>, TopLvl=False, Value=False, ConLike=False,
             WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 130 0}]
    y_sIkL = fromInteger @F_BN128 $dNum_aIhK 0 } in
  case case x_sIkK
            `cast` (N:Prime[0] <21888242871839275222246405745257275088548364400416034343698204186575808495617>_N ; N:Mod[0]
                                                                                                                       <21888242871839275222246405745257275088548364400416034343698204186575808495617>_P
                    :: Coercible
                         (Prime
                            21888242871839275222246405745257275088548364400416034343698204186575808495617)
                         Natural)
       of x1
       { __DEFAULT ->
       case y_sIkL
            `cast` (N:Prime[0] <21888242871839275222246405745257275088548364400416034343698204186575808495617>_N ; N:Mod[0]
                                                                                                                       <21888242871839275222246405745257275088548364400416034343698204186575808495617>_P
                    :: Coercible
                         (Prime
                            21888242871839275222246405745257275088548364400416034343698204186575808495617)
                         Natural)
       of y1
       { __DEFAULT ->
       case naturalEq# x1 y1 of wild { __DEFAULT ->
       tagToEnum# @Bool wild
       }
       }
       }
  of {
    False -> fromInteger @F_BN128 $dNum_aIhK 0;
    True -> fromInteger @F_BN128 $dNum_aIhK 42
  }
  }
  - The Categorifier plugin was unable to inline naturalEq#

My understanding from the discord thread is that the expression had been optimized to the underlying Natural type by the time it hit the plug-in, and there is apparently no support for Natural.

I guess I either need (1) to add support for Natural to the plugin or (2) better understand what classes need to be implemented in order to compile the Prime p type. It is also a supported object in my target category.

You can find the offending program here:
https://github.com/martyall/straw/blob/snarkl-json-update/examples/Examples/Arithmetic.hs#L50-L64

Handle unboxed tuples caused by `-funbox-strict-fields`

Handle unboxed tuples caused by -funbox-strict-fields

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-3652)

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Lock file maintenance

Detected dependencies

github-actions

.github/workflows/build.yml

.github/workflows/flakehub-publish.yml

actions/checkout v4

nix

flake.nix

nixpkgs release-23.11

Check this box to trigger a request for Renovate to run again on this repository

Avoid requiring `-fno-ignore-interface-pragmas` on Categorifier-using targets

With GHC 8.10, there is a new dynflagsPlugin. We should use that to set the flags we require for categorification. We should also 1. warn when we had to enable it (in case it is being explicitly disabled by the user) and 2. make sure it's clear that the flag is still required pre-8.10.

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-2423)

support tracing of the plugin's conversion

There is a change up with a crude version of this, but it needs to be more comprehensive and more controllable

a number of improvements we should make:

control these via plugin flags (already done)
convert to general levels and/or categories
distinguish between plugin-user debugging (what instance am I missing?) and plugin-dev debugging (why didn't it find the instance that exists?)
optionally only output on a failure
replace trace with something Writer-y (which also makes output-on- failure feasible)
take advantage of trampolining to do this in a single place
trampolining (which we already need for the conversion) gives us a middle ground between an opaque fold and recursion-scheme level introspection.

Use nix-build for CI

This external-world-dependency will give us continuing breaking-build troubles:
https://github.com/con-kitty/categorifier/blob/master/.github/workflows/ci.yml#L34

For CI process, it is desired to have a fully deterministic build.

Cache inlined and categorified expressions

Once we inline a variable and categorify its unfolding, we should cache it so that the next time we see it, we don't need to re-do the categorification work.

A/C: implement caching, benchmark it and the determine whether it should be adopted.

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-3892)

add explicit functor from Hask to target category

allows us to have distinct existential and product types, for example

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-1672)

Run `categorify` in `CoreM`

Currently categorify runs in CategoryStack, which doesn't include CoreM. This is because categorify is used as a rewrite rule, and the type of ru_try doesn't permit using CoreM.

This means we can't do CoreM things in it, like findId.

It should be possible to use categorify to modify the CoreProgram directly (see bindsOnlyPass), i.e., we just look for applications of Categorifier.Categorify.expression and rewrite them. This way it can run in CoreM, giving us more flexibility.

Perform Prim.checkForUnboxedVars only when appropriate

This happened way back is no longer reproducible, but this is what happened back then:

[Ziyang Liu] [Jan 15th, 2021 at 1:57 PM]
I think I found the cause of the "unexpected Word#" error in Biquad. Fortunately it seems to be fixable

[Ziyang Liu] [1 year ago]
When biquad2T is inlined, clampMaybe has not yet been inlined (possibly because it is in a different module; if I move clampMaybe to the same module then the error goes away), and as a result, its Typeable (Native Bool) instance becomes

case $tcBool of { TyCon ww1 ww2 ww3 ww4 ww5 ww6 ->
             case $wmkTrCon ww1 ww2 ww3 ww4 ww5 ww6 [] of
             { (# ww8, ww9, ww10, ww11, ww12 #) ->
             (mkTrApp $dTypeable_slKI (TrTyCon ww8 ww9 ww10 ww11 ww12))
             `cast` <Co:6>
             }

Those wws are primitive variables, and can't be eliminated by replacePrimOps.

[Ziyang Liu] [1 year ago]
So, I guess the simpliest (but not necessarily desirable) solution is to just ignore that error, because things will be OK once clampMaybe is inlined

[Ziyang Liu] [1 year ago]
I just added a FIXME for now
Rather than simply disabling checkForUnboxedVars altogether, a potentially better approach is to only ignore primitives in arguments to a not-yet-inlined function.

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-3416)

Preprocessing step to make sure `categorify` calls are in the right form

We use presimplifier to support partial applications of Categorify.expression hidden inside a wrapper function, like this:

preApply :: (a -> b) -> a `c` b
preApply = Categorify.expression
{-# INLINE preApply #-}

(see https://github.com/con-kitty/categorifier/blob/d72020b7f2c4534f94d7694b477e0dc8a3cb1f77/plugin/Categorifier/Test/PartialApplication.hs)

The presimplifier performs inlining, which inlines preApply and exposes Categorify.expression, whose applications are now fully saturated.

However, this approach doesn't work for slightly non-trivial cases, for example:

preApply :: (a -> b) -> a `c` b
preApply = if True then Categorify.expression else error "oops"
{-# INLINE preApply #-}

or direct partial application:

Categorify.expression . foo

Also, running the presimplifier has the disadvantage of unwanted inlining, e.g., sometimes fromIntegral is inlined and becomes fromInteger . toInteger, which may lead to categorification failures (e.g., the target category C.Cat does not support categorifying toInteger).

We should use a different approach to support (at least some) partial applications. At least Categorify.expression . foo seems easy to support.

Reorganize custom categorize exp simplification rules

We introduce some category-theoretic rules like f . id = f when constructing expression in target category like mkCompose'. By nature, such expressions can be scattered here and there in conversion, so we want to have some centralized rule map and apply them systematically.

Acceptance Criteria:

have a consensus inside the team about the strategy

make a refactored function and rules map for the application of the rules

clear documentation of this step

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-3371)

Categorifier imposes a `ClosedCat` constraint where not theoretically necessary

Hi there. I've been trying to use Conal's original plugin for automatic differentiation and was excited to find you had created this alternative! I've had no luck using it for AD so far though... when I try to categorify my expression into Conal's RAD category I get the error:

...
  - couldn't build dictionary for constraint

ClosedCat (GD (Dual (-+>)))

required by curry @ (GD (Dual (-+>))).
...

The same expression works fine with Conal's plugin. I have a very shaky understanding of these plugins, but as I understand it Conal's plugin does considerable work to avoid imposing the ClosedCat constraint (i.e. generating a curry or uncurry) because many of the categories he's working with aren't actually Cartesian closed (like RAD aka GD (Dual (-+>))). So I'm wondering if you decided to forget about all that complexity in favor of just requiring that the destination category be Cartesian closed. If so I suppose I'll just go back to Conal's plugin (with some regret - I love the quality of your error messages!)

If you think this is a bug and automatic differentiation should actually work then I'll provide more detail, I just thought I should ask the philosophical question first. Thanks!

separate trace from fix in TracedCat

fix requires a Cartesian category whereas trace can be implemented for weaker categories. Currently, we have them in the same class because we only ever generate fix in the plugin. However, a user could use ArrowLoop.loop and have it translated directly to trace, allowing us to target weaker traced categories.

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-3207)

improve support for weaker categories

Currently, something like uncurry (+) requires a CCC, but, it actually expands to uncurry . curry . plusC, which should be simplifiable to plusC, which only requires a Cartesian category (and appropriate NumCat instance).

The existing implementation converts to the new category too eagerly, which means we need to resolve the CCC instance before we get around to simplifying. Instead, we should categorize in Hask, allowing simplifications like uncurry . curry = id to be performed, then convert to the target category.

This would also reduce the reliance on inlining/rules in the target category in order to get efficient representations.

Greg Pfeil
September 16, 2020, 9:19 AM

The reason the current implementation converts to the target early is because it means we get errors earlier. If we went from \x -> foo (bar x) → foo . bar (staying in (->) as our category), we wouldn’t get a failure about skipping some component of categorization until after everything had been categorized, because both the first and second forms have the same type. By switching categories, we switch the type of the arrows, and so GHC complains to us immediately when they don’t line up.

We could potentially get the best of both worlds if we converted to Hask (as used for the plugin tests) in the first phase, then substitute Hask → cat in the second phase. This should work since Hask is isomorphic to (->), but still a distinct type.

Then, as long as the Hask instances have good rules (like curry . uncurry == id), we should get the simplified representation out the other end.

Greg Pfeil
May 19, 2020, 12:37 PM

This would probably also require enabling sm_rules in the simplifier in Categorize.hs (so that things like uncurry . curry get rewritten away … but maybe inlining will take care of that).

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-1940)

Implement listed cat-theory simplification rules

as listed in https://github.com/conal/concat/blob/master/classes/src/ConCat/AltCat.hs#L558,
there are many cat-theory law-based rules which should be used by our simplifier.
When custom rule simplifier is implemented, we need to install those rules into the system.

AC:

implement the rules
make test cases
apply this to actual control code and report the difference and check the improvement

(Extricated from https://kitty-hawk.atlassian.net/browse/SW-3379)

Flatten associative composition application and simplification

In some cases, we can have complex composition operator application like
((f . g) . (h . (i. j)) which is equivalent to f . g . h . i . j, which can have some opportunity of simplification via g . h = someexp. The original form is obscure to apply such simplification. Therefore, common procedure is to apply flattening simplification to have a normal reducible form and apply further simplification.