paulbricman / dual-obsidian-client
240 stars · 13 forks · 7 open issues · 28.03 MB

A skilled virtual assistant for Obsidian.

Home Page: https://paulbricman.com/thoughtware/dual

License: Mozilla Public License 2.0

JavaScript 10.20% TypeScript 43.51% CSS 46.29%
tools-for-thought chatbot zettelkasten second-brain non-linear-note-taking

dual-obsidian-client's People

Contributors

bensleveritt, paulbricman


dual-obsidian-client's Issues

Implement generation in new backend

Just like #48 and #49, this calls for a barebones endpoint which receives a prompt and returns text. The query itself is not interpreted on the backend side, as per #45. It should use a pretrained GPT-Neo 350M, with customization possibilities in the future.

Implement embedding cache

In order for subsequent operations to be tractable on light hardware, a caching strategy should be used. The goal is to maintain a dictionary of precomputed embeddings for each file. This way, subsequent operations can simply load the embeddings without computing them again.

This cache manager should:

  • remove the embeddings of the files which have been deleted in the meantime
  • add new dictionary entries for files which have been recently created
  • update dictionary entries for files which have been recently updated

Embeddings should be based on the sentence-bert module, like in MemNav. The dictionary can be stored in a pickle in a hidden local folder.

Front matter should be ignored. And perhaps also headings. Some Markdown-specific module might already be able to sort this out.
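The refresh logic above can be sketched in Python. The embed function here is a deterministic stand-in for the sentence-bert encoder, and the (mtime, embedding) layout of the pickled dictionary is an assumption, not the plugin's actual format:

```python
import hashlib
import os
import pickle

def embed(text):
    """Stand-in for a sentence-bert encoder: returns a deterministic
    pseudo-embedding so the cache logic can be exercised in isolation."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:8]]

def refresh_cache(vault_dir, cache_path):
    """Sync a {filename: (mtime, embedding)} dict with the vault on disk."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            cache = pickle.load(f)

    current = {
        name: os.path.getmtime(os.path.join(vault_dir, name))
        for name in os.listdir(vault_dir)
        if name.endswith(".md")
    }

    # Remove the embeddings of files which have been deleted in the meantime.
    for name in list(cache):
        if name not in current:
            del cache[name]

    # Add entries for new files, update entries for modified files.
    for name, mtime in current.items():
        if name not in cache or cache[name][0] < mtime:
            with open(os.path.join(vault_dir, name), encoding="utf-8") as f:
                cache[name] = (mtime, embed(f.read()))

    with open(cache_path, "wb") as f:
        pickle.dump(cache, f)
    return cache
```

Keying updates on mtime keeps the whole pass cheap on light hardware, since unchanged files are never re-embedded.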

Implement rule-based query parsing

A function should wrap around Persona's functionality and deliver results based on a parsed query. Should be similar to the lists of commands used by virtual assistants. Probably using simple regex rules.
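A minimal sketch of such a rule table, with hypothetical handler names and command phrasings:

```python
import re

# Hypothetical command table: each rule maps a regex to a handler name.
RULES = [
    (re.compile(r"^[Ff]ind notes about (?P<topic>.+?)$"), "fluid_search"),
    (re.compile(r"^[Ww]hat do I know about (?P<topic>.+?)\??$"), "question_answering"),
]

def parse_query(query):
    """Return (handler, groupdict) for the first matching rule, or (None, {})."""
    for pattern, handler in RULES:
        match = pattern.match(query)
        if match:
            return handler, match.groupdict()
    return None, {}
```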

Implement question answering

On top of files selected through a fluid search (#2) based on the query, a straightforward question-answering pipeline is used here to answer user questions.

Implement recipe matching in new frontend

Based on a call to the backend (#48) containing bundled example commands from all recipes, the first task of the new recipe engine is to determine which recipe has to be followed given a query. This can be a user query in the chat, or a query made by another recipe when composed. This behavior should be contained in a function which simply returns the recipe filename based on a given query.
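A sketch of the selection logic, with difflib's string similarity standing in for the embedding-based scoring the backend call (#48) would provide; the recipes-as-a-dict shape is an assumption:

```python
from difflib import SequenceMatcher

def match_recipe(query, recipes):
    """Return the filename of the recipe whose example commands are most
    similar to the query.

    `recipes` maps a recipe filename to a list of example commands. The real
    engine would score similarity with sentence embeddings via the backend;
    SequenceMatcher stands in here so the selection logic is testable alone.
    """
    def score(recipe):
        return max(
            SequenceMatcher(None, query.lower(), example.lower()).ratio()
            for example in recipes[recipe]
        )
    return max(recipes, key=score)
```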

Issue with executing server

[screenshot]

python 3.9
MacOS Catalina 10.15.7

Not sure if this is related, but I couldn't find an essence.zip folder. I looked in the sample vault on the repo and made a copy of config.json to the same directory on my other machine.

[screenshot]

Error deriving the essence

Google Colab keeps giving me this error:

ValueError                                Traceback (most recent call last)
<ipython-input-16-3fe4c666eb07> in <module>()
----> 1 output = trainer.train()

3 frames
/usr/local/lib/python3.7/dist-packages/torch/utils/data/sampler.py in __init__(self, data_source, replacement, num_samples, generator)
   102         if not isinstance(self.num_samples, int) or self.num_samples <= 0:
   103             raise ValueError("num_samples should be a positive integer "
--> 104                              "value, but got num_samples={}".format(self.num_samples))
   105 
   106     @property

ValueError: num_samples should be a positive integer value, but got num_samples=0

Implement fluid search in new backend

The new backend component (roughly housing the original skeleton and essence) will simply be a web server exposing a few generic endpoints for the NLP tasks involved, as described here. The most basic of them is fluid search: based on a query and a collection of documents, retrieve the closest ones in terms of meaning. No regex matching is involved; something related will be taken up by the frontend as described in #45. Caching would simply become additive, and only for the documents, not for the query.
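A toy sketch of the ranking step, with a bag-of-words vector standing in for the real sentence embeddings; only the cosine-ranking logic is meant to carry over:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; the real backend would use sentence-bert."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def fluid_search(query, documents, top_k=3):
    """Return the names of the documents closest in meaning to the query.

    `documents` maps a document name to its contents; with the cache in
    place, the document embeddings would be precomputed rather than built
    here on every call.
    """
    q = embed(query)
    ranked = sorted(documents,
                    key=lambda name: cosine(q, embed(documents[name])),
                    reverse=True)
    return ranked[:top_k]
```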

Implement question generation

Question generation can enable the user's persona to quiz them on a subject. Mixed with knowledge probes, it can support reflection. Core module should simply return a batch of questions, leaving the dialogue up to the future wrapper interface.

Proposal for functionality changes and recipe framework design

Progress towards solving existing issues and setting up a proper roadmap has been slowed in the past days by the fear of prematurely settling on an architecture and API design, given that this space of conversational interfaces over personal knowledge bases is quite unexplored.

The following describes a suggestion for heavily restructuring the functionality and the codebase, tentatively something in between a spec and a user story.

Architecture

Dual is based on two components: the backend and the frontend. The backend is a server which exposes two main endpoints:

  • /extract, which returns entries from one's knowledge base based on a natural language description, with some options
  • /generate, which generates text given a prompt, with some options

However, the user doesn't usually interact with the endpoints directly. Rather, they use recipes. Recipes tell Dual how to answer certain commands. They can be predefined, user defined, or contributed by some other user. Recipes are simple Markdown files with the following structure:

---
tags: "#dualrecipe"
pattern: "What is the answer to the ultimate question of life, the universe, and everything?"
---

42, naturally.

If the user has this recipe in their vault as a note, then whenever they ask their Dual that question, they'll get the contents of the note as an answer.

The pattern field of a recipe is a regex pattern. It can also house groups, which can then be referenced in the content.

---
tags: "#dualrecipe"
pattern: "My name is (.*)"
---

Hi there, \1!

With this recipe, if the user tells their Dual My name is John, it'll reply with Hi there, John!.
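The pattern matching and group substitution can be reproduced with Python's re module; apply_recipe is a hypothetical helper for illustration, not the plugin's actual code:

```python
import re

def apply_recipe(pattern, body, query):
    r"""Fill \1, \2, ... in the recipe body with the groups captured from
    the query, or return None if the pattern doesn't match."""
    match = re.fullmatch(pattern, query)
    if not match:
        return None
    # Match.expand treats the body as a replacement template, so \1 in the
    # note's contents behaves exactly like a regex backreference.
    return match.expand(body)
```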

All this is cute, but not all that useful or interesting. Among the recipes there's also this predefined recipe:

---
tags: "#dualrecipe"
pattern: "Find a note which (.*)"
---

```dual
GET "/extract/This text \1"
```

Now, this is good old descriptive search, expressed as a recipe which makes use of the /extract endpoint. When you ask Find a note which describes a metaphor between machine learning and sociology, it'll answer with a list of results based on the GET HTTP call made behind the scenes to the endpoint.

But if you wanted to customize the command triggers even for this predefined command, you could just wrap a new recipe around it, or change the original one. Here's a wrapper recipe:

---
tags: "#dualrecipe"
pattern: "Yo show me a thing which (.*)"
---

Here ya go:

```dual
ASK "Find a note which \1"
```

Cool, you just made your Dual a bit edgier.

So this is how you can express good old descriptive search and fluid search as recipes. What about good old open dialogue?

---
tags: "#dualrecipe"
pattern: "^(([Ww]hy|[Ww]hat|[Ww]hen|[Ww]here|[Ww]ho|[Hh]ow).*)"
---

```dual
GET "/extract/This text is about \1"
```

Q: \1
A:

```dual
GET "/generate/"
```

Now, when you ask it a question with that structure, Dual assembles the relevant notes in there, composes the prompt further with your query, and then generates the response. Good old open dialogue, but expressed as a recipe. Every command becomes a customizable recipe.

Now you want to teach your Dual to come up with writing prompts, you create this recipe:

---
tags: "#dualrecipe"
pattern: "^[Cc]ome up with a writing prompt\.?"
---

prompt: A sentient being has landed on your planet and your civilization's military has confronted it at the landing site of its ship. You are sent closer as a mediator and encounter a mass of energy that has no form but communicates with you in your language.

prompt: Your spaceship has landed on an unknown planet and there is data showing lifeforms who have created artistic structures. There is an artist in your group who wants to make first contact with the beings through art.

prompt: We discover that beneath its seemingly uninhabitable appearance, Mars has an entire race of subterranean alien lifeforms living on it. You are part of the team sent to explore this civilization.

prompt: 

```dual
GET "/generate/"
```

You ask it Come up with a writing prompt and you get some in return.

Sure, there are technicalities still to settle on:

  • The note contents up to the generate call should be piped into it as the prompt.
  • The endpoints are shorthand for localhost:5000/..., but you could perhaps change them to refer to a hosted instance at some point in the future. You could make calls to other people's instances through recipes, or tap into any API through a recipe, turning Dual into a sort of conversational hub.
  • Regex groups have to be entered when making calls.
  • URLs have to be encoded properly because they contain text.
  • Extract calls should know whether to supply filenames or contents, probably through parameters.
  • What should a recipe return, the entire contents or the result of the last call? Perhaps a metadata setting.
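The URL-encoding point, at least, is straightforward with the standard library; extract_url is a hypothetical helper assuming the localhost:5000 shorthand:

```python
from urllib.parse import quote

def extract_url(description, base="http://localhost:5000"):
    """Build a percent-encoded /extract call from a natural language
    description, so spaces and punctuation survive the HTTP request."""
    return base + "/extract/" + quote(description)
```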

Remove code snippets from export

Pretty self-explanatory. A regex like the one for front matter should help. The same goes for bullet points and subheadings. Are those tackled by beautifulsoup?

Bundle skeleton in a self-contained binary

Not sure what the best way to go is.

  • Docker + PyInstaller + Wine spitting out clean binaries for Linux/Windows sounds somewhat doable.
  • Or somehow turning Docker containers into binaries themselves? Those would be huge.
  • Several users contributing their binaries using PyInstaller on their own OS?

Implement topic search

Fluid search would perform a semantic search as in MemNav using the precomputed embeddings in the cache. Some parameters should define how many items are selected in each of the two passes. Filenames should be returned.

torch.embedding IndexError: index out of range in self

File "...\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\nn\functional.py", line 1916, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

This happens sometimes on Windows in open dialogue. It only pops up with specific questions. My working hypothesis is that nasty characters in retrieved notes trip up the generation process.

Implement related search

Very similar functionality to fluid search (#2). However, instead of having a search query, this would take in a filename, strip away front matter and perhaps headings, compute its embedding, and treat that as the search query embedding. Should return filenames, just like fluid search.
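The front matter (and heading) stripping could look like this; the exact regexes are assumptions, and real-world YAML front matter may warrant a proper parser:

```python
import re

# YAML front matter: a "---" fence at the very start of the note, up to the
# next "---" line.
FRONT_MATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)
# Markdown headings: lines starting with one or more "#" characters.
HEADING = re.compile(r"^#+ .*$", re.MULTILINE)

def strip_note(text):
    """Remove front matter and headings before computing an embedding."""
    text = FRONT_MATTER.sub("", text)
    text = HEADING.sub("", text)
    return text.strip()
```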

Switch models to GPT-Neo versions

Similar models but with higher performance as they've been trained on more data. Hopefully they're still fine-tunable in a Colab notebook, at least the medium one.

Implement code block detection in new frontend

In order to actually follow recipes, the engine needs to be able to pick up on:

```dual
```

and

```js
```

code blocks and interpret them accordingly, as described in #45. Regex patterns should do, and a function would return an array of the beginning and end character numbers of all such blocks for later interpretation.
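A sketch of the detection function; the regex assumes well-formed, newline-terminated fences:

```python
import re

# Fenced code blocks tagged "dual" or "js", matched non-greedily so
# adjacent blocks aren't merged.
BLOCK = re.compile(r"```(dual|js)\n.*?```", re.DOTALL)

def find_code_blocks(text):
    """Return (start, end) character offsets of all dual/js code blocks,
    for later interpretation by the recipe engine."""
    return [(m.start(), m.end()) for m in BLOCK.finditer(text)]
```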

Add option to persist convo

Reloading Obsidian currently clears the conversation; an option could persist it. That would require an additional local store, perhaps a text file which would also be human-readable.

Implement sentence-bert NLI

The sentence-bert NLI implementation enables easy access to individual NLI logits. This can be used for taking into consideration the neutral logit, in contrast with the HuggingFace pipeline implementation. Additionally, some prompt engineering is required.

Implement argument parsing in new frontend

Based on arguments detected in #52, such as *person* or *topic*, the values have to be extracted from the user query using text generation, as described here. Argument names and the query should go into a function, and a dictionary with the proper value attributions should come out.

Implement local server exposing API

A Flask-based API should expose an endpoint for receiving text-based commands and delivering the results via the response. Makes use of the rule-based query parser (#7).
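A sketch of the same shape using only the standard library (the issue calls for Flask; http.server stands in here so the example is self-contained), with a stubbed command handler in place of the rule-based parser (#7):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote

def handle_command(command):
    """Stand-in for the rule-based query parser (#7)."""
    return {"command": command, "result": "not implemented yet"}

class DualHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Paths look like /command/<percent-encoded text>.
        if self.path.startswith("/command/"):
            command = unquote(self.path[len("/command/"):])
            body = json.dumps(handle_command(command)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep request logging quiet

def serve(port=5000):
    """Block forever, serving commands on localhost."""
    HTTPServer(("localhost", port), DualHandler).serve_forever()
```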

Prepare backend for being deployed as binary

The backend component should eventually be a self-contained binary with the following behavior:

  1. It should first expose a limited API which can be used to get a snapshot.
  2. After managing that, it should download and load the two auxiliary models used, the ones for fluid and descriptive search, respectively (#2, #5). The point is that while working behind the scenes to download those, the user can start using the alignment notebook.
  3. The aligned model can be loaded if present, otherwise not. But this should change the behavior of the open dialogue function. It should essentially return more instructions, although those would have been presented initially, too.

Implement argument detection in new frontend

As described here, recipes contain fields such as *topic* or *person*. Those should simply be extracted into an array for later processing by the recipe engine, following #51. The contents of a recipe should go in, and out comes a list of such argument names.
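Assuming arguments are always written as *name*, the extraction is a short regex pass; returning first-occurrence order with duplicates dropped is an assumption:

```python
import re

# Argument fields in a recipe look like *topic* or *person*.
ARGUMENT = re.compile(r"\*(\w+)\*")

def detect_arguments(recipe_contents):
    """Return the list of argument names appearing in a recipe, in order
    of first occurrence, without duplicates."""
    seen = []
    for name in ARGUMENT.findall(recipe_contents):
        if name not in seen:
            seen.append(name)
    return seen
```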

Implement descriptive search in new backend

Similar to the refactoring for #48, descriptive search would need a barebones endpoint which receives a query and a collection of documents. No regex involved, as that is taken up by the frontend in #45. Highest scores of document-query entailment are returned.
