due's Issues, from dariowho

Include Toy corpus for easier testing

AS A Due user
IN ORDER TO try out the software as easily as possible
I WANT TO import a toy corpus that needs no extra resources

Rationale
Currently there are no pre-trained agents to be imported, and the size of the available corpora is a barrier to a quick setup of the agent. We want a user to be able to try out an agent with as few external dependencies as possible.

TODO

Change serialization format to JSON/YAML
Add a built-in toy corpus
Update README to use the toy corpus

Implement Due as a Jabber agent

Make Events, Episodes, Brains and Agents serializable

As of now, most entities in Due can be "saved", meaning that they implement a save() method that returns the entity itself as a Python object.

We want saved objects to be serializable (pickle? JSON?) and we need to implement the deserialization and loading counterpart of the process.
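A minimal sketch of the save/serialize/load round trip using JSON, assuming a hypothetical `Event` class whose `save()` returns a plain dict tagged with its class name. The `load()` class method, the field names, and the helpers `to_json` and `event_from_json` are illustrative, not Due's actual API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Event:
    """Hypothetical stand-in for a Due entity; the fields are illustrative."""
    event_type: str
    agent: str
    payload: str

    def save(self) -> dict:
        # save() returns the entity as a plain Python object (a dict here),
        # tagged with its class name so the loader knows what to rebuild
        return {"class": "Event", "data": asdict(self)}

    @classmethod
    def load(cls, saved: dict) -> "Event":
        # Deserialization counterpart of save()
        if saved.get("class") != "Event":
            raise ValueError(f"not a saved Event: {saved.get('class')!r}")
        return cls(**saved["data"])


def to_json(entity) -> str:
    """Serialize any entity exposing save() to a JSON string (hypothetical helper)."""
    return json.dumps(entity.save())


def event_from_json(text: str) -> Event:
    """Parse a JSON string back into an Event (hypothetical helper)."""
    return Event.load(json.loads(text))


original = Event(event_type="utterance", agent="alice", payload="Hello!")
restored = event_from_json(to_json(original))
print(restored == original)  # True
```

JSON keeps saved files human-readable and language-neutral. Pickle would need no extra code, but loading a pickle can execute arbitrary code, so it is unsafe for files from untrusted sources.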

Allow for tokenized input in `due.nlp` module

Current NLP functions are meant to receive strings where tokens are delimited by spaces. These can either be raw inputs, or normalized strings where tokens are properly split (e.g. "It's raining" could be normalized into "it 's raining").

This approach is not sufficient to handle tokens that contain multiple words.

As a solution, NLP methods should be allowed to receive either a string (which will be split on spaces, as happens now) or a list of strings (already split).
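A minimal sketch of the proposed input handling: a single normalization helper that every NLP function calls first, so that a string is split as today while a pre-tokenized list passes through untouched. The names `to_tokens` and `bag_of_words` are hypothetical, not functions from `due.nlp`.

```python
from typing import Dict, List, Sequence, Union

# Either a space-delimited string or an already tokenized sequence
Tokens = Union[str, Sequence[str]]


def to_tokens(text: Tokens) -> List[str]:
    """Normalize NLP input to a list of tokens (hypothetical helper).

    A string is split on whitespace (the current behaviour); a list of
    strings is assumed to be already tokenized and is returned as a copy,
    so multi-word tokens such as "New York" survive intact.
    """
    if isinstance(text, str):
        return text.split()
    return list(text)


def bag_of_words(text: Tokens) -> Dict[str, int]:
    """Example NLP function accepting either input form (hypothetical)."""
    counts: Dict[str, int] = {}
    for token in to_tokens(text):
        counts[token] = counts.get(token, 0) + 1
    return counts


print(to_tokens("it 's raining"))                  # ['it', "'s", 'raining']
print(to_tokens(["I", "live", "in", "New York"]))  # ['I', 'live', 'in', 'New York']
```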

Implement Telegram serving module

AS A user
I WANT TO serve Due agents as Telegram bots
SO THAT I can access the bot like any other contact

Test Python2 compatibility

So far Due has only been tested in a Python 3 environment. We want it to be compatible with 2.7 as well, through compatibility libraries such as python-future and six.

Acceptance criteria: run the test suite with Python 2.7

Finalize basic readme

A first version of README.md must have:

Implement cross-Brain evaluation metrics

Add support for Webhook Actions

We want an Action to wrap a call to an API resource. This allows Due to be used as an interface for existing applications exposing a REST API.
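A minimal sketch of a webhook Action using only the standard library's `urllib.request`. The `WebhookAction` class, its constructor parameters, and the JSON-in/JSON-out convention are assumptions for illustration; the usage below only builds the request, so nothing is sent over the network.

```python
import json
import urllib.request
from typing import Optional


class WebhookAction:
    """Hypothetical Action that wraps a call to a REST API resource."""

    def __init__(self, url: str, method: str = "POST", timeout: float = 10.0):
        self.url = url
        self.method = method
        self.timeout = timeout

    def build_request(self, payload: Optional[dict] = None) -> urllib.request.Request:
        # Encode the optional payload as a JSON request body
        data = json.dumps(payload).encode("utf-8") if payload is not None else None
        headers = {"Content-Type": "application/json"} if data is not None else {}
        return urllib.request.Request(self.url, data=data, headers=headers, method=self.method)

    def run(self, payload: Optional[dict] = None) -> dict:
        # Perform the HTTP call and decode a JSON response body (assumed format)
        request = self.build_request(payload)
        with urllib.request.urlopen(request, timeout=self.timeout) as response:
            body = response.read().decode("utf-8")
        return json.loads(body) if body else {}


# Usage: example.com is a placeholder endpoint; build the request without sending it
action = WebhookAction("https://example.com/api/orders")
req = action.build_request({"item": "pizza"})
print(req.get_method(), req.full_url)  # POST https://example.com/api/orders
```

Keeping `build_request` separate from `run` makes the Action testable without a live server.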

Add support for multiple languages

Move CosineBrain to the model library

Currently, there is a baseline CosineBrain in the core brain.py module.

We want to create a models package, and move the CosineBrain there.

Introduce interactive CLI for easier testing

AS A Due user
IN ORDER TO try out the software as easily as possible
I WANT TO have a console based agent that needs no extra resources

Rationale
Currently, the only way to deploy Due is through its XMPP interface, which requires access to an external chat server. We want a user to be able to try out an agent with as few external dependencies as possible.

This is also an opportunity to refactor the role of the Agent class, which currently does little more than pass information to and from a Brain class.

TODO

Refactor the role of the Agent interface
Introduce serve package porting current XMPP agent
Add a serve.cli for dialog over CLI
Update README to use ConsoleAgent (move XMPP example to docs)

Implement asynchronous notifications for events

Agents in an episode are notified every time an Event happens; each notification triggers a callback mechanism that is currently synchronous, which means an agent can process only one event at a time.

We want notifications to be asynchronous, to make it possible for agents to suspend or modify their reasoning activities after events are triggered. It may also be useful to include "typing" events in the flow.
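A minimal sketch of asynchronous notifications with `asyncio`, assuming events are plain strings and that `Agent.notify` is a hypothetical hook. Notifying schedules the agent's reasoning as a background task and returns immediately, so a newer event can cancel reasoning that is still in progress.

```python
import asyncio
from typing import List, Optional


class Agent:
    """Hypothetical agent whose reasoning can be interrupted by new events."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.handled: List[str] = []
        self._thinking: Optional[asyncio.Task] = None

    def notify(self, event: str) -> None:
        # Returns immediately: reasoning runs in a background task.
        # A newer event cancels reasoning that has not finished yet.
        if self._thinking is not None and not self._thinking.done():
            self._thinking.cancel()
        self._thinking = asyncio.create_task(self._reason(event))

    async def _reason(self, event: str) -> None:
        await asyncio.sleep(0.01)  # stand-in for slow reasoning
        self.handled.append(event)


async def main() -> List[str]:
    agent = Agent("bot")
    # Two events arrive back to back: the first reasoning task is cancelled
    # before it completes, so only the latest event gets handled
    agent.notify("hello")
    agent.notify("are you there?")
    await asyncio.sleep(0.05)
    return agent.handled


print(asyncio.run(main()))  # ['are you there?']
```

"Typing" events could be delivered through the same `notify` hook, letting an agent postpone its reply while the other party is still writing.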

Use Pipenv and read dependencies from Pipfile in setup.py

Implement a baseline CosineBrain model

We want a baseline Brain model that decides on Events to issue in Episodes based on a trivial vector similarity measure between sentences, and/or whole Event sequences.
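A minimal sketch of the sentence-level half of this idea, assuming episodes are lists of utterance strings and a bag-of-words vector per sentence. `CosineBrainSketch`, `learn` and `predict` are hypothetical names, not Due's CosineBrain; comparing whole Event sequences is left out.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


class CosineBrainSketch:
    """Hypothetical baseline: answer with whatever followed the most
    similar utterance seen in past episodes."""

    def __init__(self) -> None:
        self.pairs: List[Tuple[str, str]] = []

    def learn(self, episode: List[str]) -> None:
        # Store every (utterance, next utterance) pair of the episode
        self.pairs.extend(zip(episode, episode[1:]))

    def predict(self, utterance: str) -> str:
        if not self.pairs:
            return ""
        best = max(self.pairs, key=lambda pair: cosine(utterance, pair[0]))
        return best[1]


brain = CosineBrainSketch()
brain.learn(["hello", "hi there", "how are you?", "fine, thanks"])
print(brain.predict("how are you"))  # fine, thanks
```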

Implement a Brain interface

The Brain is responsible for predicting the most appropriate Events to issue in the current Episode, based on the memory of the previous ones.

We want to define an interface to allow different implementations to be integrated in the Agents.
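A minimal sketch of such an interface using the standard `abc` module, assuming Episodes are lists of utterance strings and that a Brain predicts a (possibly empty) list of next Events. The method names `learn_episodes` and `predict`, and the toy `EchoBrain`, are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class Brain(ABC):
    """Hypothetical Brain interface: learns from past Episodes and predicts
    the Events to issue next in the current one."""

    @abstractmethod
    def learn_episodes(self, episodes: Sequence[List[str]]) -> None:
        """Update the Brain's memory with a collection of past Episodes."""

    @abstractmethod
    def predict(self, episode: List[str]) -> List[str]:
        """Return the Events (here: utterances) to issue next, possibly none."""


class EchoBrain(Brain):
    """Trivial implementation, only to show how an Agent would plug in a Brain."""

    def learn_episodes(self, episodes: Sequence[List[str]]) -> None:
        pass  # nothing to learn

    def predict(self, episode: List[str]) -> List[str]:
        return [episode[-1]] if episode else []


print(EchoBrain().predict(["hello"]))  # ['hello']
```

Because `Brain` is abstract, instantiating it directly (or a subclass missing a method) raises `TypeError`, which catches incomplete implementations early.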

Introduce Rewards and/or other feedback option

It's realistic to think that Agents will implement some form of reinforcement learning to achieve good results; explicit feedback from the user would improve this. Possibly, the reward mechanisms should be intertwined with language understanding, so that regular sentences can be associated with "hardcoded" rewards.

Implement HTTP serving module

AS a user
I WANT TO have an agent exposed on HTTP as a REST API
SO THAT I can easily integrate it with my software

Replace Friends corpus source

AS A data scientist
I WANT TO load the Friends corpus from a published source
SO THAT I can expect a higher level of curation and features

Technical details
This one fits: https://github.com/emorynlp/character-mining

Remove Agent references in Brain classes

Create a module to load the Cornell Movie Dialog Corpus

The Cornell corpus is a well-known resource in the dialog manager literature. We want a module to read the corpus and return its content as a collection of Episodes.
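A minimal loader sketch, assuming the layout of the public distribution as we understand it: `movie_lines.txt` rows hold line ID, character ID, movie ID, character name and text, `movie_conversations.txt` rows hold two character IDs, a movie ID and a list of line IDs, with fields separated by ` +++$+++ `. Episodes are represented here as lists of `(speaker, text)` tuples rather than Due's Episode class, and the function names are hypothetical.

```python
import ast
from typing import Dict, Iterable, List, Tuple

SEP = " +++$+++ "
Line = Tuple[str, str]  # (character name, text)


def parse_lines(rows: Iterable[str]) -> Dict[str, Line]:
    """Map line ID -> (character name, text) from movie_lines.txt rows."""
    lines: Dict[str, Line] = {}
    for row in rows:
        fields = row.rstrip("\r\n").split(SEP)
        if len(fields) == 5:
            line_id, _char_id, _movie_id, name, text = fields
            lines[line_id] = (name, text)
    return lines


def parse_episodes(rows: Iterable[str], lines: Dict[str, Line]) -> List[List[Line]]:
    """Turn movie_conversations.txt rows into Episodes of (speaker, text)."""
    episodes = []
    for row in rows:
        fields = row.rstrip("\r\n").split(SEP)
        if len(fields) != 4:
            continue
        # The last field is a Python list literal, e.g. "['L194', 'L195']"
        line_ids = ast.literal_eval(fields[3])
        episodes.append([lines[i] for i in line_ids if i in lines])
    return episodes


def load_cornell(lines_path: str, conversations_path: str) -> List[List[Line]]:
    # Assumption: the files are read as ISO-8859-1, which decodes any byte sequence
    with open(lines_path, encoding="iso-8859-1") as f:
        lines = parse_lines(f)
    with open(conversations_path, encoding="iso-8859-1") as f:
        return parse_episodes(f, lines)
```

`ast.literal_eval` parses the list of line IDs without the risks of `eval`.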

Write tests

Code written so far is untested. We want to catch up with tests before moving to the next steps.

Switch from Pipenv to Poetry

AS a developer
IN ORDER TO improve package lock speed and drop maintenance of setup.py
I WANT TO use Poetry for dependency management
INSTEAD of Pipenv

Rationale
Poetry (https://poetry.eustace.io/) is an alternative to Pipenv that is supposed to speed up package locking, and it integrates a framework to build packages without maintaining a separate setup.py. We want that.

Implement a neural Dialogue Management model

We want to create a library of Dialogue Management models (i.e. Brains), and kick it off with a proper neural model.

More details TBD

Create Sphinx documentation project

sphinx-quickstart
Check and update docstrings

Refine support for multi-agent episodes

Even though many parts of the framework are written with multi-agent support in mind, the "2 agents" assumption was made here and there to ease development.

Someday, even though not in the foreseeable future, this assumption needs to be relaxed.

Publish documentation on GitHub Pages

Remove 'python-magic' dependency

AS a developer
I WANT TO have Due running without python-magic
SO THAT I can install Due easily, and on many different platforms

Technical details
We use python-magic to detect file types during serialization. This is inconvenient, because the package requires libmagic to be installed at OS level. We want to find a replacement or, failing that, change the deserialization flow.
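One possible replacement, sketched under the assumption that Due only needs to tell apart a handful of formats: read the first bytes of the file and compare them with well-known magic numbers (gzip starts with `1f 8b`, zip with `PK\x03\x04`), falling back to a JSON check. `sniff_type` and the chosen format list are hypothetical; the standard `mimetypes` module is an alternative if guessing from the file extension is enough.

```python
import io
from typing import BinaryIO, List, Tuple

# Hypothetical subset of formats Due might need to recognize
MAGIC_NUMBERS: List[Tuple[bytes, str]] = [
    (b"\x1f\x8b", "application/gzip"),
    (b"PK\x03\x04", "application/zip"),
]


def sniff_type(stream: BinaryIO) -> str:
    """Guess a file type from its leading bytes, without libmagic."""
    head = stream.read(8)
    stream.seek(0)  # rewind so the caller can read the whole file afterwards
    for magic, mime in MAGIC_NUMBERS:
        if head.startswith(magic):
            return mime
    # Text formats have no magic number; a leading '{' or '[' suggests JSON
    if head.lstrip()[:1] in (b"{", b"["):
        return "application/json"
    return "application/octet-stream"


print(sniff_type(io.BytesIO(b'{"episodes": []}')))  # application/json
```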

Add support for basic actions

An Event in an Episode can be an Utterance. It should also be possible to issue Action Events, possibly supporting dynamic loading from a user-supplied library.

In their basic implementation, Actions take no parameters.
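A minimal sketch of parameterless Actions loaded dynamically by dotted path with `importlib.import_module`. The `Action` base class, `load_action`, and the `user_actions` module are hypothetical; to keep the example self-contained, the "user-supplied library" is simulated by registering an in-memory module in `sys.modules`.

```python
import importlib
import sys
import types


class Action:
    """Hypothetical base class; in their basic form, Actions take no parameters."""

    def run(self) -> str:
        raise NotImplementedError


def load_action(path: str) -> Action:
    """Instantiate an Action from a dotted path such as "user_actions.Greet"."""
    module_name, _, class_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {path!r}")
    cls = getattr(importlib.import_module(module_name), class_name)
    if not (isinstance(cls, type) and issubclass(cls, Action)):
        raise TypeError(f"{path} is not an Action subclass")
    return cls()


# Simulated user library: a real one would be installed or placed on sys.path
class Greet(Action):
    def run(self) -> str:
        return "Hello!"


user_actions = types.ModuleType("user_actions")
user_actions.Greet = Greet
sys.modules["user_actions"] = user_actions

print(load_action("user_actions.Greet").run())  # Hello!
```

Checking `issubclass` before instantiating keeps arbitrary classes from the user library out of the Episode.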

Pass last Event along with Episode in Agent callbacks

The Event that triggers a callback is currently inferred by the Agent as the last Event of the Episode. To make it explicit, and to prepare for asynchronous notifications, we want to pass the Event as an argument of callback functions.
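A minimal sketch of the proposed callback signature, with hypothetical `Event` and `Episode` dataclasses standing in for Due's classes. Each listener receives the triggering Event explicitly, alongside the Episode. This matters once notifications become asynchronous, because by the time a callback runs, the last Event of the Episode may no longer be the one that triggered it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    agent: str
    text: str


@dataclass
class Episode:
    """Hypothetical Episode that notifies listeners with (event, episode)."""
    events: List[Event] = field(default_factory=list)
    listeners: List[Callable[[Event, "Episode"], None]] = field(default_factory=list)

    def add_event(self, event: Event) -> None:
        self.events.append(event)
        for callback in self.listeners:
            # The triggering Event is passed explicitly
            callback(event, self)


received: List[str] = []


def on_event(event: Event, episode: Episode) -> None:
    # No need to infer the trigger from episode.events[-1]
    received.append(event.text)


ep = Episode(listeners=[on_event])
ep.add_event(Event("alice", "hi"))
print(received)  # ['hi']
```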

Create a Resource Loading framework

Due should have its own library of resources.

A Resource Manager should define the folder where Resources are located, and provide easy access to the other components of the application.
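A minimal sketch of a Resource Manager built on `pathlib`. The class, its methods, the `DUE_RESOURCES` environment variable, and the `~/.due/resources` default are all hypothetical choices for illustration. Components ask for resources by name and never handle absolute paths; names that would escape the resource folder are rejected.

```python
import os
from pathlib import Path
from typing import Optional, Union


class ResourceManager:
    """Hypothetical Resource Manager: owns the folder where Resources live
    and resolves resource names for other components."""

    def __init__(self, root: Optional[Union[str, Path]] = None) -> None:
        # Folder from the argument, else DUE_RESOURCES (hypothetical variable),
        # else a hypothetical default under the user's home directory
        if root is None:
            root = os.environ.get("DUE_RESOURCES") or Path.home() / ".due" / "resources"
        self.root = Path(root).resolve()

    def path(self, name: str) -> Path:
        """Return the absolute path of a resource inside the root folder."""
        candidate = (self.root / name).resolve()
        if candidate != self.root and self.root not in candidate.parents:
            raise ValueError(f"resource {name!r} is outside {self.root}")
        return candidate

    def exists(self, name: str) -> bool:
        return self.path(name).exists()


rm = ResourceManager("/tmp/due_resources_demo")
print(rm.path("corpora/toy.json").name)  # toy.json
```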

Create interface for creating/reviewing episodes

An Agent (more precisely, its Brain module) should record all the episodes it was involved in. Some of these episodes may be successful and valuable for learning, while others may contain non-ideal answers on the machine side (and, occasionally, on the human side as well).

It will be useful to have a user-friendly interface to filter, amend or just visualize episodes in an Agent's memory. Such an interface should also support the creation of new episodes, and cover the basic I/O operations on Episode files.

Create Dockerfile

AS a user who wants to try Due
IN ORDER TO get Due running as quickly as possible
I WANT TO run Due as a Docker container

Rationale
There is currently some ambiguity on whether Due should be imported as a library or run as an application. We want Due to expose its packages and classes to external applications, but we also want to provide a stand-alone application that loads an agent and serves it on a given channel (e.g. XMPP). Docker seems to be the most user-friendly option to do this.

Technical details
There are some issues to solve when it comes to making a batteries-included Docker image for Due. Ideally, Due's container should be able to:

Be configurable with respect to the channel where to expose the agent (XMPP, REST, ...)
The start script could read an env variable to configure the channel
Load an arbitrary agent
Docker Compose could mount a folder by default, where optional agent files can be placed
Download resources into the container
See above
Load Action classes that are provided by external packages
We could initially support only default actions. This may even be a long-term solution, if we decide that a single RESTAction type is the only interface between Due and the world
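A minimal sketch of the container's start script for the first two points above, written in Python. The environment variables `DUE_CHANNEL` and `DUE_AGENT_PATH`, the `/agents` mount point, and the placeholder serving functions are hypothetical; a real script would start the XMPP or REST server instead of returning a string.

```python
import os
from typing import Callable, Dict, Mapping


def serve_xmpp() -> str:
    return "serving over XMPP"  # placeholder for starting the XMPP agent


def serve_rest() -> str:
    return "serving over REST"  # placeholder for a future HTTP server


# Hypothetical mapping from channel names to serving functions
CHANNELS: Dict[str, Callable[[], str]] = {"xmpp": serve_xmpp, "rest": serve_rest}


def main(environ: Mapping[str, str] = os.environ) -> str:
    # DUE_CHANNEL selects the channel; DUE_AGENT_PATH points at the folder
    # mounted by Docker Compose (both names are hypothetical)
    channel = environ.get("DUE_CHANNEL", "xmpp").lower()
    agent_path = environ.get("DUE_AGENT_PATH", "/agents")
    if channel not in CHANNELS:
        raise SystemExit(f"unknown channel {channel!r}; choose from {sorted(CHANNELS)}")
    print(f"loading agent from {agent_path}")
    return CHANNELS[channel]()


if __name__ == "__main__":
    main()
```

With such a script as the image's entry point, `docker run -e DUE_CHANNEL=rest ...` would switch channels without rebuilding the image.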

Separate core framework from NLU/NLG modules

AS a developer
IN ORDER TO avoid installing dependencies that are not necessary
I WANT TO install the Brain modules I need separately from Due's core framework

Rationale
Due is made to integrate a collection of ready-made NLU/NLG modules, which we call "Brains"; a Brain can learn from Episodes, and can predict the agent's answer in a conversation. Brains may be implemented with different technologies (PyTorch, TensorFlow, pure Python, ...), and including a model library in the core Due package would mean carrying the burden of many heavy dependencies in a single core package. This would penalize users who only want to try one of them out, as well as developers who want to build new ones. Because of this, we want to include only a couple of example Brains in the core package, and move the more sophisticated ones to external packages.

Implement Episode.add_event() with an asynchronous queue

AS a user
IN ORDER TO avoid recursion and receive Events fairly
I WANT TO use an asynchronous queue to handle Events from Agents

Rationale
Currently, each time an Episode receives an event through Episode.add_event(), Agent callbacks are triggered to produce responses. The Agents receiving the callbacks produce new Events and add them to the Episode by calling Episode.add_event() themselves. As this is a synchronous method, the process becomes recursive. This has two effects:

Two bots talking together produce a stack of recursive calls when generating replies, and there's no protection against stack overflow
When more than two bots are talking together, only the first two will be engaged in the conversation

As a solution, we propose to implement Episode.add_event() as a simple method that enqueues the event. The queue is consumed in parallel, so that event handling is more controlled and fair.
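A minimal sketch of the proposal with `asyncio.Queue`, under simplifying assumptions: agents are hypothetical `(name, reply function)` pairs, events are `(agent, text)` tuples, and this `Episode` is not Due's class. `add_event()` only enqueues, so replies never recurse; a consumer coroutine drains the queue in FIFO order and notifies every other agent, so a third bot is engaged too. The `max_events` cap is an added safeguard against two bots chatting forever.

```python
import asyncio
from typing import Callable, List, Optional, Tuple

Reply = Callable[[str], Optional[str]]  # hypothetical agent reply function


class Episode:
    """Sketch: add_event() only enqueues; run() consumes events fairly."""

    def __init__(self, agents: List[Tuple[str, Reply]], max_events: int = 10) -> None:
        self.agents = agents
        self.events: List[Tuple[str, str]] = []
        self.queue: "asyncio.Queue[Tuple[str, str]]" = asyncio.Queue()
        self.max_events = max_events

    def add_event(self, agent: str, text: str) -> None:
        # No recursion: enqueue and return immediately
        self.queue.put_nowait((agent, text))

    async def run(self) -> None:
        # Consume events in arrival order until the cap or a quiet period
        while len(self.events) < self.max_events:
            try:
                agent, text = await asyncio.wait_for(self.queue.get(), timeout=0.1)
            except asyncio.TimeoutError:
                break
            self.events.append((agent, text))
            for name, reply in self.agents:
                if name != agent:
                    answer = reply(text)
                    if answer is not None:
                        self.add_event(name, answer)


async def main() -> List[Tuple[str, str]]:
    # Three bots that echo what they heard; the queue is created inside the loop
    bots = [(n, lambda text, n=n: f"{n} heard '{text}'") for n in ("a", "b", "c")]
    episode = Episode(bots, max_events=5)
    episode.add_event("user", "hi")
    await episode.run()
    return episode.events


print([speaker for speaker, _ in asyncio.run(main())])  # ['user', 'a', 'b', 'c', 'b']
```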
