alice-benchmarks's People

Contributors

kirsle

alice-benchmarks's Issues

Your begin.rive file is missing?

I noticed that your begin.rive file is missing from this repo: "include a document named begin.rive that contains some configuration settings for your bot's brain. The most useful settings that would be set here include substitutions, which are able to make changes to the user's message before a reply is looked for." Would you be kind enough to point me to the right file with all the bot settings? And fantastic work so far, by the way. Any plans to benchmark rosie?

ProgramV-style Reply Sorting

Converted from: https://www.kirsle.net/wiki/Optimize-RiveScript

Problem

None of the RiveScript modules can effectively handle a brain the size of Alice's. The Golang version loads Alice the fastest (< 1 second), whereas the others take 20 seconds or more. However, when actually fetching a reply, they all take about 15 seconds.

The root problem is probably in the sorted reply structure, which looks generally like this:

sorted = {
    "random": [ // topic name
        ["how are you", pointer ], // triggers ordered by priority
        ["hello bot", pointer ],
        ["*", pointer]
    ]
};

Under a topic, all triggers are sorted in their optimal sort order, which is generally: atomic triggers with the greatest number of words first, less specific triggers later, and the least specific last. But triggers with custom priorities ({weight} tags, or from a topic that inherits other topics, etc.) always come before lower-priority sets of triggers.

In the Alice reply set this means there are about 68,000 triggers in one giant array under the "random" topic, so the code has to scan through several tens of thousands of triggers when finding a match.

Alicebot Program V

Alicebot Program V is an AIML bot and it stores patterns in a more efficient way: it separates the first word of the pattern away from the rest. When looking up a response for the user, it can then use the first word as a dictionary key (there's a relatively small set of distinct first words), and then have a much simpler array of triggers to look at. Example:

# The following patterns are represented here:
# ITS *
# ITS BORING
# ITS FUN
# ITS GOOD *

$data = {
   aiml => {
      matches => {
         'ITS' => [
            '* <that> * <topic> * <pos> 17818',
            'BORING <that> * <topic> * <pos> 17819',
            'FUN <that> * <topic> * <pos> 17820',
            'GOOD * <that> * <topic> * <pos> 17821',
         ],
      },
   },
};

My blog entry has more details. The <pos> refers to an array index where the pattern's details are; in the more recent RiveScript implementations (CoffeeScript and Go) we keep pointers with the triggers in the sorted structure so we don't need to worry about that.
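
As a rough illustration of the first-word idea, here is a minimal Python sketch. It is not Program V's actual code: the function name build_first_word_index is hypothetical, the <that> and <topic> parts are left out, and the stored position is simply the pattern's index in the input list rather than a real <pos> value.

```python
from collections import defaultdict


def build_first_word_index(patterns):
    """Group patterns by first word, keeping their original order.

    Each entry stores the rest of the pattern plus its position in
    the input list (a stand-in for the <pos> value shown above).
    """
    index = defaultdict(list)
    for pos, pattern in enumerate(patterns):
        first, _, rest = pattern.partition(" ")
        index[first].append((rest, pos))
    return dict(index)


patterns = ["ITS *", "ITS BORING", "ITS FUN", "ITS GOOD *", "HELLO"]
index = build_first_word_index(patterns)

# Only the patterns filed under "ITS" need to be scanned for a
# message such as "ITS FUN"; everything else is skipped entirely.
candidates = index.get("ITS", [])
```

The dictionary lookup replaces a scan over every pattern with a scan over only those sharing the message's first word, which is where the saving would come from.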

Complex Triggers

At first glance a Program V style way of sorting triggers looks good, but in RiveScript triggers are much more complicated and "regexp-y", for example:

(what is|what was) your name

These would still need to be taken into account, as would the relative priority of each trigger via {weight} tags and topic inheritance.

Possible Solution

Change the sort structure to look more like this:

sorted = {
    "random": [ // topic name
        [ // these arrays are for priority level, higher on top
            [
                "hello", // first word
                [ // list of triggers under that word
                    ["hello bot", pointer]
                ]
            ],
            ["how", [ ["how are you", pointer] ]],
            ["*", [ ["*", pointer] ]]
        ]
    ]
}

So the logic for matching a trigger would be along these lines:

import re

user_first_word = re.split(r'\s+', message.strip())[0]
matched = None

for priority in self._sorted.topics[topic]:
    for first_word in priority:
        # this next line would actually be a regexp for * triggers, etc.
        if user_first_word != first_word[0]:
            continue

        # Their first word matches! Look through all the triggers for this word.
        for trigger in first_word[1]:
            # Again this would be a regexp in reality
            if message == trigger[0]:
                # Have a match! Now `matched` points to the trigger's
                # details for the replies, conditions, etc.
                matched = trigger[1]
                break

        if matched is not None:
            break  # stop scanning the other first words

    if matched is not None:
        break  # stop scanning the lower priority levels
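
The loop above can be turned into a self-contained sketch that also covers the wildcard case. This is only an illustration under stated assumptions: trigger_to_regex and find_match are hypothetical names, only the * wildcard is handled, and the "pointers" are plain strings standing in for the real trigger details.

```python
import re


def trigger_to_regex(trigger):
    """Very rough conversion: each '*' matches one or more characters."""
    parts = [re.escape(part) for part in trigger.split("*")]
    return re.compile("^" + "(.+?)".join(parts) + "$")


def find_match(sorted_topic, message):
    """Return the pointer of the first matching trigger, or None."""
    words = message.split()
    if not words:
        return None
    user_first_word = words[0]

    for priority in sorted_topic:  # highest priority level first
        for first_word, triggers in priority:
            # A "*" bucket can match any first word.
            if first_word != "*" and first_word != user_first_word:
                continue
            for trigger, pointer in triggers:
                if trigger_to_regex(trigger).match(message):
                    return pointer
    return None


# One priority level, laid out like the proposed structure.
sorted_topic = [
    [
        ("hello", [("hello bot", "reply-hello")]),
        ("how", [("how are you", "reply-how")]),
        ("*", [("*", "reply-catchall")]),
    ],
]
```

Returning from a function as soon as a trigger matches avoids the need to break out of three nested loops by hand.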

For finding the first words, a function like getFirstWords(trigger) could be added that returns one or multiple first words.

  • If the trigger begins with [ or (, return the first words of all the regexp-y parts.
    • Example: (what time|when) is it would return ["what", "when"]
    • Example: how are you would return ["how"]
  • The first words would be sorted by length, with words like * at the bottom.
  • All triggers that share a first word get placed in an array under that word, sorted in the normal order (most optimal matching first).
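
The rules in this list could be sketched in Python as follows. The names get_first_words and sort_first_words are hypothetical, nested groups are not handled, and two points are assumptions rather than anything stated above: an optional [...] group also contributes the first word that follows it (since the optional part may be absent), and "sorted by length" is taken to mean longest first.

```python
def get_first_words(trigger):
    """Return the possible first words of a trigger, without duplicates."""
    trigger = trigger.strip()
    if trigger and trigger[0] in "([":
        closer = ")" if trigger[0] == "(" else "]"
        end = trigger.find(closer)
        if end != -1:
            alternatives = trigger[1:end].split("|")
            words = [alt.split()[0] for alt in alternatives if alt.split()]
            if trigger[0] == "[":
                # Assumption: the optional group may be missing, so the
                # word after it can also be the first word.
                words += get_first_words(trigger[end + 1:])
            # dict.fromkeys drops duplicates but keeps the order.
            return list(dict.fromkeys(words))
    return trigger.split()[:1]


def sort_first_words(words):
    """Longest words first (assumed direction), with '*' at the bottom."""
    return sorted(set(words), key=lambda w: (w == "*", -len(w), w))
```

With this sketch, (what time|when) is it yields ["what", "when"] and how are you yields ["how"], matching the examples in the list.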

Missing Previous Responses

Hi Kirsle,

I tried to find your email address and contact you, but the mailer daemon fails in Gmail. I have been working on this brain set for some days now and am building a Java RiveScript manager of sorts. It takes all the brain files, crunches them into one massive file, and sorts them alphabetically from top to bottom. As of now it can find all the triggers that depend on a particular trigger, as I am about to implement cascade deletes.
The master file has roughly 284,915 lines of RiveScript, and my Java program manages them easily. Here is a screenshot of it in action:

[Screenshot: capture]
In the screenshot above, I am trying to find all the rules related to the trigger "+ what languages do you speak".
Outgoing links are the redirects a trigger has to other triggers, followed recursively, while incoming links are the other triggers that refer to yours. So when you delete a trigger, you don't have to worry about missing redirects; that is the purpose of this program. I noticed that the Alice brain files have tons of triggers with a % previous in them, but I cannot find any bot response for most of the % lines that I encountered. Could you let me know whether that is really the case? I will be more than willing to give this program to you after I add a few more features to it. Currently I am doing what I feel is necessary to delete unwanted triggers quickly. The idea is to remove facts from rules.
Facts like "who was the first president of xyz" can be fetched from a factual database, whereas a master script containing only pure RiveScript rules will load much faster and be far better.
