dariowho / due
An episodic, multi-model, servable framework for Dialog Systems
License: GNU General Public License v3.0
AS A Due user
IN ORDER TO try out the software as easily as possible
I WANT TO import a toy corpus that needs no extra resources
Rationale
Currently there are no pre-trained agents to import, and the size of the available corpora is a barrier to a quick setup of the agent. We want a user to be able to try out an agent with as few external dependencies as possible.
TODO
As of now most of the entities in Due can be "saved", meaning that they implement a save() method returning the entity itself as a Python object.
We want saved objects to be serializable (pickle? JSON?) and we need to implement the deserialization and loading counterpart of the process.
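A minimal sketch of what the round trip could look like, assuming save() only returns JSON-compatible values. ToyEntity, dump_entity and load_entity are hypothetical names used for illustration; they are not part of Due.

```python
import json


class ToyEntity:
    """Hypothetical stand-in for a Due entity; not part of the actual codebase."""

    def __init__(self, name, utterances):
        self.name = name
        self.utterances = list(utterances)

    def save(self):
        # Return a plain, JSON-compatible representation of the entity.
        return {"name": self.name, "utterances": self.utterances}

    @classmethod
    def load(cls, saved):
        # Rebuild the entity from the object produced by save().
        return cls(saved["name"], saved["utterances"])


def dump_entity(entity):
    """Serialize a saved entity to a JSON string."""
    return json.dumps(entity.save())


def load_entity(cls, serialized):
    """Deserialize a JSON string and load it back into an entity of type cls."""
    return cls.load(json.loads(serialized))


original = ToyEntity("alice", ["hi", "how are you?"])
restored = load_entity(ToyEntity, dump_entity(original))
```

JSON keeps the files human-readable and does not execute code on load, but it only covers plain data types; pickle handles arbitrary Python objects at the cost of being unsafe on untrusted input.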
Current NLP functions are meant to receive strings where tokens are delimited by spaces. These can either be raw inputs, or normalized strings where tokens are properly split (e.g. "It's raining" could be normalized into "it 's raining").
This approach is not sufficient to handle tokens that contain multiple words.
As a solution, NLP methods should accept both a string (which will be split on spaces, as happens now) and a list of strings (already split).
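The proposed behavior could be sketched with a small helper like the one below. The name to_tokens is an assumption made for this example and does not come from Due's code.

```python
def to_tokens(sentence):
    """Return a list of tokens from either a string or a list of strings."""
    if isinstance(sentence, str):
        # Current behavior: a plain string is split on whitespace.
        return sentence.split()
    # Already tokenized: tokens may contain spaces and are kept untouched.
    return list(sentence)
```

With this helper, to_tokens("it 's raining") yields three tokens, while to_tokens(["New York", "is", "big"]) keeps "New York" as a single token.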
AS A user
I WANT TO serve Due agents as telegram bots
SO THAT I can access the bot like any other contact
So far Due has only been tested in a Python 3 environment. We want it to be compatible with Python 2.7 as well, through compatibility libraries such as python-future and six.
Acceptance criteria: run the test suite with Python 2.7
A first version of README.md must have:
We want an Action to wrap a call around an API resource. This allows Due to be used as an interface for existing applications exposing a REST API.
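As a rough illustration, such an Action could look like the sketch below. The class name RestAction, its constructor arguments and the assumption that the resource returns JSON are all hypothetical; the opener is injectable only so that the example can be exercised without network access.

```python
import json
import urllib.request


class RestAction:
    """Hypothetical Action wrapping a call to a REST resource."""

    def __init__(self, url, method="GET", opener=urllib.request.urlopen):
        self.url = url
        self.method = method
        # The opener is injectable so the action can be tested offline.
        self._opener = opener

    def build_request(self):
        # Describe the HTTP call without sending it.
        return urllib.request.Request(self.url, method=self.method)

    def run(self):
        # Perform the call and decode the JSON body of the response.
        with self._opener(self.build_request()) as response:
            return json.loads(response.read().decode("utf-8"))
```

The run() method takes no parameters, so everything the call needs is fixed when the Action is created.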
Currently, there is a baseline CosineBrain in the core brain.py module.
We want to create a models package, and move the CosineBrain there.
AS A Due user
IN ORDER TO try out the software as easily as possible
I WANT TO have a console based agent that needs no extra resources
Rationale
Currently, the only way to deploy Due is through its XMPP interface, which requires access to an external chat server. We want a user to be able to try out an agent with as few external dependencies as possible.
This is also an opportunity to refactor the role of the Agent class, which currently does little more than passing information to/from a Brain class.
TODO
serve package porting current XMPP agent
serve.cli for dialog over CLI
Agents in an episode are notified every time an Event happens; each notification triggers a callback mechanism that is currently synchronous, which allows an agent to process only one event at a time.
We want notifications to be asynchronous, to make it possible for agents to suspend or modify their reasoning activities after events are triggered. It may be useful to include "typing" events in the flow.
We want a baseline Brain model that decides on Events to issue in Episodes based on a trivial vector similarity measure between sentences, and/or whole Event sequences.
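A minimal sketch of the sentence-level variant, assuming a bag-of-words representation and a memory made of (prompt, reply) pairs. Both function names and the memory layout are assumptions for illustration, not the actual CosineBrain implementation.

```python
import math
from collections import Counter


def cosine_similarity(sentence_a, sentence_b):
    """Cosine similarity between bag-of-words vectors of two sentences."""
    a = Counter(sentence_a.lower().split())
    b = Counter(sentence_b.lower().split())
    # Counter returns 0 for missing tokens, so only shared tokens contribute.
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_reply(utterance, memory):
    """Return the remembered reply whose prompt is most similar to the utterance.

    memory is a non-empty list of (prompt, reply) pairs taken from past Episodes.
    """
    best_prompt, best_reply = max(
        memory, key=lambda pair: cosine_similarity(utterance, pair[0])
    )
    return best_reply
```

Comparing whole Event sequences would follow the same pattern, with the vectors built from several consecutive Events rather than a single sentence.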
The Brain is responsible for predicting the most appropriate Events to issue in the current Episode, based on the memory of the previous ones.
We want to define an interface to allow different implementations to be integrated in the Agents.
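One way to express such an interface is an abstract base class, sketched below. The method names learn_episodes and predict are assumptions made for this example, and Episodes are modeled as plain lists of Events to keep it self-contained.

```python
from abc import ABC, abstractmethod


class Brain(ABC):
    """Hypothetical interface; method names are assumptions, not Due's API."""

    @abstractmethod
    def learn_episodes(self, episodes):
        """Update the model with a collection of past Episodes."""

    @abstractmethod
    def predict(self, episode):
        """Return the list of Events to issue next in the given Episode."""


class EchoBrain(Brain):
    """Trivial implementation that repeats the last Event of the Episode."""

    def learn_episodes(self, episodes):
        pass  # nothing to learn

    def predict(self, episode):
        # Slicing returns an empty list when the Episode has no Events yet.
        return episode[-1:]
```

An Agent written against Brain can then swap EchoBrain for any other implementation without changing its own code.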
It's realistic to think that Agents will implement some form of reinforcement learning to achieve good results; explicit feedback from the user would improve this. Possibly, the reward mechanisms should be intertwined with language understanding, so that regular sentences can be associated with "hardcoded" rewards.
AS a user
I WANT TO have an agent exposed on HTTP as a REST API
SO THAT I can easily integrate it with my software
AS A data scientist
I WANT TO load the Friends corpus from a published source
SO THAT I can expect a higher level of curation and features
Technical details
This one fits: https://github.com/emorynlp/character-mining
The Cornell corpus is a well known resource in Dialog Managers literature. We want a module to read the corpus, and return its content in the form of a collection of Episodes.
See also:
Code written so far is untested. We want to catch up with tests before moving to the next steps.
AS a developer
IN ORDER TO improve package lock speed and drop maintenance of setup.py
I WANT TO use Poetry for dependency management
INSTEAD of Pipenv
Rationale
Poetry (https://poetry.eustace.io/) is an alternative to Pipenv that is supposed to speed up package locking, and that integrates a framework to build packages without maintaining a separate setup.py. We want both.
We want to create a library of Dialogue Management models (i.e. Brains), and kick it off with a proper neural model.
More details TBD
Even though many parts of the framework are written with multi-agent support in mind, the "2 agents" assumption was made here and there to ease development.
Someday, even though not in the foreseeable future, this assumption needs to be relaxed.
AS a developer
I WANT TO have due running without python-magic
SO THAT I can install Due easily, and on many different platforms
Technical details
We use python-magic to detect file types during serialization. This is inconvenient, because the package requires libmagic to be installed at OS level. We want to find a replacement or, failing that, change the deserialization flow.
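One dependency-free option is to check the leading bytes of the file against a few known signatures. The sketch below is only an illustration: which formats Due's serialization actually needs to recognize is not stated here, so the table is an assumption, as is the name detect_file_type.

```python
# Leading byte signatures of a few common formats (assumed to be relevant).
SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"PK\x03\x04": "zip",
}


def detect_file_type(data):
    """Guess the file type from the first bytes of its content."""
    for signature, name in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return "unknown"
```

This covers far fewer formats than libmagic, but it runs on any platform with no OS-level packages.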
An Event in an Episode can be an Utterance. It should also be possible to issue Action Events, possibly supporting dynamic loading from a user-supplied library.
In their basic implementation, Actions take no parameters.
The Event that triggers a callback is currently inferred by the Agent as the last Event of the Episode. To make it explicit, and to prepare for asynchronous notifications, we want to pass the Event as an argument of callback functions.
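The change could look like the toy example below, where the Episode hands the triggering Event to each callback. The Episode class shown here is a hypothetical, heavily simplified stand-in, and the (episode, event) callback signature is an assumption.

```python
class Episode:
    """Toy Episode, hypothetical and much simpler than Due's own class."""

    def __init__(self):
        self.events = []
        self.listeners = []

    def add_event(self, event):
        self.events.append(event)
        for callback in self.listeners:
            # The triggering Event is passed explicitly, so listeners
            # no longer need to inspect the end of the Episode.
            callback(self, event)


received = []
episode = Episode()
episode.listeners.append(lambda ep, ev: received.append(ev))
episode.add_event("hello")
```

After the last line, received contains the single Event "hello", taken from the callback argument rather than inferred from episode.events.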
Due should have its own library of resources.
A Resource Manager should define the folder where Resources are located, and provide easy access to the other components of the application.
An Agent (more precisely, its Brain module) should record all the episodes it was involved in. Some of these episodes may be successful and valuable for learning, while others may contain non-ideal answers on the machine side (and, occasionally, on the human side as well).
It will be useful to have a user-friendly interface to filter, amend or just visualize episodes in an Agent's memory. Such an interface should also support the creation of new episodes, and cover the basic I/O operations on Episode files.
AS a user who wants to try Due
IN ORDER TO get Due running as quickly as possible
I WANT TO run Due as a Docker container
Rationale
There is currently some ambiguity on whether Due should be imported as a library or run as an application. We want Due to expose its packages and classes to external applications, but we also want to provide a stand-alone application that loads an agent and serves it on a given channel (e.g. XMPP). Docker seems to be the most user-friendly option to do this.
Technical details
There are some issues to solve when it comes to making a batteries-included Docker image for Due. Ideally, Due's container should be able to:
Action classes that are provided by external packages
RESTAction type is the only interface between Due and the world
AS a developer
IN ORDER TO avoid installing dependencies that are not necessary
I WANT TO install the Brain modules I need separately from Due's core framework
Rationale
Due is made to integrate a collection of ready-made NLU/NLG modules, that we call "Brains"; a Brain can learn from Episodes, and can predict the agent's answer in a conversation. Brains may be implemented with different technologies (PyTorch, TensorFlow, pure Python, ...), and including a model library in the core Due package would mean carrying the burden of many heavy dependencies in the single core package. This would penalize users who only want to try one of them out, as well as developers who want to develop new ones. Because of this, we want to include only a couple of example Brains in the core package, and move the more sophisticated ones to external packages.
AS a user
IN ORDER TO avoid recursion and receive Events fairly
I WANT TO use an asynchronous queue to handle Events from Agents
Rationale
Currently, each time an event is received by an Episode with Episode.add_event(), Agent callbacks are triggered to produce responses. The Agents receiving the callbacks will produce new Events and add them to the Episode. Currently, the agents call Episode.add_event() to add response Events. As this is a synchronous method, we introduce recursion in the process. This has two effects:
As a solution, we propose to implement Episode.add_event() as a simple method that enqueues the event. The queue is consumed in parallel, so that event handling is more controlled and fair.
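A minimal sketch of the proposal, assuming a single consumer thread and the standard library's queue module. QueuedEpisode, its listeners list and the join() helper are hypothetical names, and error handling in callbacks is left out.

```python
import queue
import threading


class QueuedEpisode:
    """Toy Episode whose add_event() only enqueues (hypothetical sketch)."""

    def __init__(self):
        self.events = []
        self.listeners = []
        self._queue = queue.Queue()
        # A daemon thread consumes the queue in parallel with the callers.
        self._worker = threading.Thread(target=self._consume, daemon=True)
        self._worker.start()

    def add_event(self, event):
        # Only enqueue: no callback runs inside this call, so no recursion.
        self._queue.put(event)

    def _consume(self):
        while True:
            event = self._queue.get()
            self.events.append(event)
            for callback in self.listeners:
                callback(self, event)
            self._queue.task_done()

    def join(self):
        """Block until every queued event has been handled."""
        self._queue.join()


def replier(episode, event):
    # Answering from inside a callback just enqueues another event.
    if event == "hello":
        episode.add_event("hi there")


episode = QueuedEpisode()
episode.listeners.append(replier)
episode.add_event("hello")
episode.join()
```

Because responses are appended to the same FIFO queue, Events are handled in the order they were issued, and the call stack stays flat no matter how long the exchange gets.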