w3c / cogai Goto Github PK

View Code? Open in Web Editor NEW

52.0 17.0 24.0 13.59 MB

for work by the Cognitive AI community group

License: Other

JavaScript 0.98% HTML 97.30% Python 1.72%

cogai's Introduction

W3C Cognitive AI Community Group

This repository is for work by the W3C Cognitive AI Community Group.

Introduction
Webinars
Background materials
Program of work
Positioning relative to existing approaches to AI
Historical context
Cognitive Architecture
Long Term Aims

Introduction

According to wikipedia:

Cognitive science is the interdisciplinary, scientific study of the mind and its processes. It examines the nature, the tasks, and the functions of cognition. Cognitive scientists study intelligence and behavior, with a focus on how nervous systems represent, process, and transform information.

Cognitive AI can be defined as AI based upon insights from the cognitive sciences, including cognitive neuroscience, and cognitive sociology. To put it another way, the brain has evolved over hundreds of millions of years, and we would do well to borrow from nature when it comes to building AI systems.

The W3C Cognitive AI Community Group seeks to demonstrate the potential of Cognitive AI through:

Collaboration on defining use cases, requirements and datasets for use in demonstrators
Work on open source implementations and scaling experiments
Work on identifying and analysing application areas, e.g.
- Helping non-programmers to work with data
- Cognitive agents in support of customer services
- Smart chatbots for personal healthcare
- Assistants for detecting and responding to cyberattacks
- Teaching assistants for self-paced online learning
- Autonomous vehicles
- Smart manufacturing
- Smart web search
Outreach to explain the huge opportunities for Cognitive AI
Participation is open to all, free of charge: join group

We are using GitHub for documents, issue tracking and open source components. We have a public mailing list, and an IRC channel #cogai.

Current Work

Webinars

This is a series of recorded webinars, see webinar planning

7 March 2024, The role of symbolic knowledge at the dawn of AGI, as presented to the Estes Park Group: slides, and video recording

Background materials

Talks
- 04 July 2023 Combining Digital Twins with Cognitive Agents, talk at ETSI IoT Week 2023
- 14 June 2023 Human-like AI: from logic to argumentation, reasoning with imperfect knowledge in the era of AGI, see video recording talk for Darmstadt Ontology Group
- 18 April 2023 Human-Like AI for Artificial General Intelligence, as part of the Special day on Human-AI interaction, DATE 2023
- 20 February 2023 - Human-like AI for Artificial General Intelligence - research challenges for the Intelligent Web of Things)
- 14 November 2022 - Cognitive Agents, Plausible Reasoning and Scalable Knowledge Engineering, Workshop on Computing across the Continuum for the US NSF and the European Commission
- 12 September 2022 - Plausible reasoning and artificial minds, Workshop on Analogies: from Theory to Applications
- 16 February 2022 - Imperfect Knowledge, Distributed Knowledge Graphs workshop
- 06 September 2021 - Digital Transformation using chunks as a simple abstraction above triples and property graphs - talk at the Semantics2021 workshop on squaring the circle on graphs
- 17 March 2021 - Digital Transformation and the Knowledge Economy, an invited keynote for the Siemens conference on the IoT
- 17 March 2021 - Roadmap for realising general purpose AI
- 14-19 February 2021 - Human-like AI and the Sentient Web for Dagstuhl seminar on autonomous agents on the Web - slides, paper
- 11 January 2021 - Presentation and discussion of work on Cognitive Natural Language Processing
- 25 November 2020 - Seminar on Cognitive AI for Centre for Artificial Intelligence, Robotics and Human-Machine Systems, Cardiff University: slides, video
- 06 November 2020 - Seminar on Cognitive AI for Knowledge Media Institute, Open University - video and slides
- 08 June 2020 - Cognitive AI and the Sentient Web for ISO/TC 211 50th Plenary meeting, WG4 Geospatial Services
Chunks format for declarative and procedural knowledge
Common Sense Reasoning
Demonstrators
Frequently asked questions
Longer treatise on Cognitive AI
Contributing to the Cognitive AI Community Group

Program of work

The initial focus is to describe the aims for a sequence of demonstrators, to collaborate on the scenarios, detailed use cases, and associated datasets, and to identify and discuss questions that arise in this work. In relation to this, we are working on a formal specification of the chunk data and rules format with a view to its standardisation.

A further line of work deals with the means to express and reason with imperfect knowledge, that is, everyday knowledge subject to uncertainty, imprecision, incompleteness and inconsistencies. See the draft specifcation for the plausible knowledge notation (PKN), and the web-based PKN demonstrator. This is based upon work on guidelines for effective argumentation by a long line of philosophers since the days of Ancient Greece. In place of logical proof, we have multiple lines of argument for and against the premise in question just like in courtrooms and everyday reasoning.

Both PKN and chunks & rules can be considered in relation to RDF. RDF is a framework for symbolic graphs based upon labelled directed graph edges (aka triples). Compared to RDF, PKN and chunks & rules are higher level with additional semantics, and designed for use in human-like AI applications. Furthermore, both notations are designed to be easy to read and author compared with RDF serialisations such as RDF/XML, Turtle and even JSON-LD. See also the Notation3 (N3) Language which is an assertion and logic language defined as a superset of RDF.

Work is now underway on vector-space representations of knowledge using artificial neural networks. Advances with generative AI have shown the huge potential of vector-space representations in combination with deep learning. However, there is a long way to go to better model many aspects of human cognition, e.g. continual learning using a blend of type 1 and type 2 cognition, episodic memory, and the role of emotions and feelings in directing cognition. Symbolic models will continue to serve an important role for semantic interoperability. Neurosymbolic systems combine the complementary strengths of vector space and symbolic approaches.

Positioning relative to existing approaches to AI

Traditional AI focuses on symbolic representations of knowledge and on mathematical logic, e.g. Expert Systems and the Semantic Web. Deep Learning, by contrast, focuses on statistical models implemented as multi-layer neural networks. Both approaches have their weaknesses. Symbolic AI has difficulties with the ambiguities, uncertainties and inconsistencies commonplace in everyday situations. Furthermore, the reliance on manual knowledge engineering is a big bottleneck. Deep Learning has problems with recognising what’s salient, providing transparent explanations, the need for very large data sets for training, and difficulties with generalisation. Symbolic AI and Deep Learning are associated with siloed communities that typify modern science in which researchers are discouraged from interdisciplinary studies and the breadth of views that those encompass.

Cognitive AI seeks to address these weaknesses through mimicking human thought, taking inspiration from over 500 million years of neural evolution and decades of work across the cognitive sciences. This involves the combination of symbolic and statistical approaches using functional models of the human brain, including the cortex, basal ganglia, cerebellum and limbic system. Human memory is modelled in terms of symbolic graphs with embedded statistics reflecting prior knowledge and past experience. Human reasoning is not based upon logic, nor on the laws of probability, but rather on mental models of what is possible, along with the use of metaphors and analogies.

Research challenges include mimicry, emotional and social intelligence, natural language and common sense reasoning. Mimicry is key to social interaction, e.g. a baby learning to smile at its mother, and young children learning to speak. As a social species, we pay attention to social interaction and models of ourselves and others, including beliefs, desires, judgements and behaviours. Emotional control of cognition determines what is important, and plays a key role in how we learn, reason and act. Natural language is important for both communication and for learning and the means to break free from the manual programming bottleneck. Common sense is everyday knowledge and key to natural language understanding, and is learned through experience.

Historical context

AI lacks a precise agreed definition, but loosely speaking, it is about replicating intelligent behaviour, including perception, reasoning and action. There are many sub-fields of AI, e.g. logic and formal semantics, artificial neural networks, rule-based approaches including expert systems, statistical approaches including Bayesian networks and Markov chains, and a wide variety of approaches to search, pattern recognition and machine learning. Cognitive AI seeks to exploit work across the cognitive sciences on the organising principles of the human mind.

Chunk rules are a form of production rules as introduced by Allen Newell in 1973 in his production system theory of human cognition, which he subsequently developed as the SOAR project. John Anderson published his theory of human associative memory (HAM) in 1973, and inspired by Newell, went on to combine it with a production system to form the ACT system in 1976, and developed it further into ACT-R in 1993. ACT-R stands for adaptive control of thought - rational and has been widely applied to cognitive science experiments as a theory for simulating and understanding human cognition. For more details see An Integrated Theory of the Mind. Chunks, in turn, was inspired by ACT-R, and the realisation that the approach could be adapted for general use in artificial intelligence as the combination of graphs, statistics, rules and graph algorithms.

Credit is also due to Marvin Minsky for his work on frames, metacognition, self-awareness and appreciation of the importance of emotions for controlling cognition, to Philip Johnson-Laird for his work on mental models and demonstrating that humans don't reason using logic and probability, but rather by thinking about what is possible, to George Lakoff for his work on metaphors, Dedre Gentner for her work on reasoning with analogies, and to Allan Collins for his work on plausible reasoning. Cognitive AI has a broader scope than ACT-R and seeks to mimic the human brain as a whole at a functional level, inspired by advances across the cognitive sciences. As such, Cognitive AI can be contrasted with approaches that focus on logic and formal semantics. Cognitive AI can likewise be decoupled from the underlying implementation, as the phenomenological requirements are essentially independent of whether they are realised as explicit graphs, vector spaces or pulsed neural networks, see David Marr's three levels of analysis.

Cognitive Architecture

The following diagram depicts how cognitive agents can be built as a collection of different building blocks that connect via the cortex, which functions as a collection of cognitive databases and associated algorithms. Semantic integration across the senses mimics the Anterior Temporal Lobe's role as a hub for the unimodal spokes. The initial focus of work was on a chunk rule engine inspired by John Anderson's ACT-R. Current work is focusing on plausible reasoning and belief revision. Future work will look at the other building blocks.

Image of cognitive architecture as a set of modules connected via the cortex

Perception interprets sensory data and places the resulting models into the cortex, e.g. scene graphs. Cognitive rules can set the context for perception, and direct attention as needed. Events are signalled by queuing chunks to cognitive buffers to trigger rules describing the appropriate behaviour. A prioritised first-in first-out queue is used to manage events that are closely spaced in time.
Emotion is about cognitive control and prioritising what’s important. The limbic system provides rapid assessment of past, present and imagined situations without the delays incurred in deliberative thought. Emotions are perceived as positive or negative, and associated with passive or active responses, involving actual and perceived threats, goal-directed drives and soothing/nurturing behaviours.
Cognition is slower and more deliberate thought, involving sequential execution of rules to carry out particular tasks, including the means to invoke graph algorithms in the cortex, and to invoke operations involving other cognitive systems. Thought can be expressed at many different levels of abstraction, and is subject to control through metacognition, emotional drives, internal and external threats.
Action is about carrying out actions initiated under conscious control, leaving the mind free to work on other things. An example is playing a musical instrument where muscle memory is needed to control your finger placements as thinking explicitly about each finger would be far too slow. The Cerebellum provides real-time coordination of muscle activation guided by perception, computing smooth functions over time.

Zooming in on cognition and the role of the basal ganglia as a sequential rule engine, the architecture looks like:

Image of cognitive architecture for deliberative reasoning (System 2)

This has been implemented as an open source JavaScript library and used as the basis for an evolving suite of demos.

New work is underway on vector space approaches inspired by human cognition and the advances in generative AI. This will mimic human language processing (sequential, hierarchical and predictive), implicit and explicit memory, continual learning, and Type 1 and 2 cognition. This is being implemented in Python and PyTorch. Language processing uses retained feedback connections in conjunction with a small sliding window to mimic the buffering limitations of the phonological loop. Type 2 cognition features a vector based implementation of chunks and rules. Explicit memory (episodic and encyclopaedic) is based upon a vector database designed to mimic characteristics of human memory (forgetting curve, spreading activation and spacing effect). The different modules are integrated through shared access to the latent semantics (loosely equivalent to the buffers in the above diagram).

Long Term Aims

In the long run, the mission of the Cognitive AI Community Group is to enable cognitive agents that:

Are knowledgeable, general purpose, creative, collaborative, empathic, sociable and trustworthy
Can apply metacognition and past experience to reason about new situations
Support continual learning based upon curiosity about the unexpected
Have a level of self awareness in respect to current state, goals and actions
Have an awareness of others in respect to their beliefs, desires and intents
Are multilingual and can interact with people using their own language

These topics can be divided into areas for study and exploration with an emphasis on identifying use cases and building demonstrators that advance the overall mission. There is plenty to exploit along the way, with many opportunities to spin off practical applications as work proceeds.

p.s. useful tips on using GitHub for W3C projects

cogai's People

Contributors

Stargazers

Watchers

cogai's Issues

Rule conditional tests

To test that a property has a different value, use ~ as a prefix on the value. In the following rule, the second condition chunk the ~ prefix is used with a variable to ensure that this rule won't apply if the from and to properties have the same value.

# count up one at a time
count {@module goal; state counting; from ?num1; to ?num2},
count {@module goal; state counting; from ?num1; to ~?num1},
increment {@module facts; number ?num1; successor ?num3}
   =>
     count {@module goal; @do update; from ?num3},
     increment {@module facts; @do get; number ?num3},
     console {@module output; @do log; value ?num3}

You can also test that a given property is undefined by using ~ for the value on its own. This can also be used in action chunks when you want to set properties to be undefined .

Sometimes it may be necessary to test whether a variable holds a boolean, number, name, date, or string literal. This suggests the need for properties like @boolean, @Number, @name, @Date, and @string that test that their values have the given type.

Complex string operations would seem to be beyond the scope of a simple rule language, and something that could be better handled via invoking operations implemented by a module. This suggests that we don't need built in operators for string literals.

There is also a need for simple numerical operations, e.g. comparisons, such as @LTeq which would be used with two variables to test that the value of the first is less than or equal to the value of the second. Likewise, @gt tests that the first value is greater than the second. Both tests fail if there is only one value, or more than two, or when either of the two values is not a number.

Note that @gt and @LTeq involve the use of a comma separated list of values. That isn't supported in the minimalist version of chunks which limits properties to names. An alternative is to use an application defined action that passes the values as separate properties.

K-line model

Hi community,

Thanks for the fantastic repo for Cognitive AI.
I'm an incoming CS PhD student at The University of Maryland, College Park. In preparation for my PhD research topic, I am launching into various directions of nonsense, among which is the age-old topic of K-line.

Regarding this post (https://github.com/w3c/cogai/blob/master/demos/memory/README.md), do you have any plan to model a K-line model or how important it could be to the AI society?
I'm curious how important it could be compared to artificial neural networks. Or maybe we can simply combine k-line with cnn or Transformers.

Chunks syntax: characters allowed for types, names, and ids

The chunk.js implementation suggests that names are composed of letters and digits, as well as a restricted set of punctuation characters.

However, the description of @rdfmap suggests that chunk property values could be IRIs:

@rdfmap {
  dog http://example.com/ns/dog
  cat http://example.com/ns/cat
}

In practice, I wonder what are allowed characters for types, names, and ids. It seems to me that allowing IRIs (as done in JSON-LD) could also help mapping with the semantic world, and that it would allow reasoning about things. For instance, I could have

website https://example.org/ {
  name "An example page"
}

One problem is that commas are allowed in IRIs, which makes them problematic for use in a comma separated list of property values. A solution is to simply use space as a separator between values, or to mandate excaping of commas in IRIs.

Clarify that lists are ordered

A value is either an atomic value or a list of atomic values. The spec should clarify that lists of atomic values are ordered.

How to match lists of atomic values

The matching chunk algorithm describes conditions that must hold true for a chunk to match a condition/action chunk.

Pending PR #29 adds context matching and re-words the algorithm for it to be clearer and more understandable. It also adds an editor's note that the algorithm needs to be updated to explain how to match list of atomic values. There are two main possibilities to handle a list of atomic values:

"All of": All atomic values in the condition/action need to exist in the chunk being considered. The match can be strict (the chunk being considered must not have another atomic value) or loose (the chunk being considered may have an atomic value that is not in the list to match)
"One of": Consider that at least one atomic value in the condition/action needs to match in the chunk being considered.

Which one should we apply?

memory activation and decay

Unlike conventional database systems, cognitive agents just want to recall what is most important based upon experience. This is similar to web search engines which seek to provide the results that are likely to be most relevant given the words in the search query.

Forgetting as intrinsic memory decay or as interference from other memories, or some combination of the two?
Exponential decay over time provides good fit to lab data
But can also be ascribed to interference from new memories
Priming effect on related memories as spreading activation
We can recall memories from many years ago given the right cues
New memories lack strong evidence for their lasting value
Such evidence has to be acquired with experience
What’s the most effective model for all of these points?

Underwood (1957) showed that memory loss is largely attributable to interference with other memories. Memories can thus be recalled after an interval of many years provided that the interference is small. This reflects experience in selecting memories that have been more valuable.

For ACT-R, the decay of activation is only one component of the activation equation. There is also a context component to activation which works to increase the activation of items based on the current context. Thus, even chunks which have decayed significantly over time can have activations above the threshold if they are strongly related to the current context.

Proposed approach for the chunks specification

Chunks have parameters for an activation level and a timestamp
Activation decays over time like a leaky capacity losing its charge
Recalling or updating a chunk boosts its activation level
Boost is weaker for closely spaced rehearsals – aka the spacing effect – and is based on the Logistic function
Decaying wave spreads through linked chunks to boost related concepts
Stochastic recall - chunks with higher activation levels are more likely to be recalled, but sometimes weaker chunks are recalled in place of stronger chunks

Spreading Activation

Why is it easier to remember items in a group for groups with fewer items?
A wave of spreading activation provides one possible explanation
Activation of one item in the group spreads to other items in the same group following property links in both directions
The amount of wave activation for each item is inversely related to the number of items in the group
What is the underlying computational model for pulsed neural networks?

Here is an example:

# items belonging to group animals
item {word dog; group animals}
item {word horse; group animals}
item {word cat; group animals}

Remembering the item for dog boosts the chunk for the group (animals) and spreads out to boost the other items in that group
Does this depend on the property (in this case group) being the same?
How can we implement this efficiently on conventional computers?

One implementation strategy is to have one index mapping from chunk IDs to chunks, and another index from chunk IDs to the set of chunk IDs for chunks that have the given ID as a property value. A further index maps chunk types to the set of IDs for chunks with that type. This requires care to ensure that the indexes are kept up to date in respect to adding and removing chunks from a graph, as well as when the chunk type or chunk properties are updated.

Here is an implementation in JavaScript:

	// To mimic human learning and forgetting, the activation
	// of a chunk is modelled as a leaky capacitor where charge
	// is injected each time the chunk is recalled or updated,
	// and then decays over time. Related chunks are primed with
	// a fraction of the injected charge being divided across
	// linked chunks in a wave of spreading activation until a
	// cut-off threshold is reached.
	
	// This algorithm follows links unidirectionally from
	// properties to values, and needs to be extended to work
	// bidirectionally using an new index that lists chunks
	// with a type or property value equal to the given ID
	
	graph.activate = function (chunk) {
		// parameters for spreading activation
		const base = 1.0;
		const fraction = 0.5;
		const cutoff = 1E-5;
		const tau = 60000;  // arbitrarily 1 minute as mS
		
		// The spacing effect is that massed presentations have
		// reduced novelty, and are less effective for learning.
		// The logistic function is used to mimic the effect,
		// mapping the time interval since the chunk was last
		// recalled or updated to the boost in its activation,
		// see: https://en.wikipedia.org/wiki/Logistic_function
		function logistic () {
			return (1 + Math.tanh(x/2))/2;
		};
		
		function prime (chunk, boost) {
			chunk.activation += boost;
			
			// spread activation through linked chunks
			if (boost > cutoff) {
				// determine list of linked chunks
				let chunks = [];
				let props = chunk.properties;
				for (let name in props) {
					if (props.hasOwnProperty(name)) {
						let id = props[name];
						if (typeof (id) === "string" && id[0] !== '"') {
							let c = graph.chunks[id];
							if (c)
								chunks.push(c)
						}
					}
				}
			
				// prime the linked chunks
				if (chunks.length) {
					boost = boost*fraction/chunks.length;
			
					for (let i = 0; i < chunks.length; ++i) {
						prime (chunks[i], boost);
					}
				}
			}
		}
		
		let now = Date.now()
		let boost = base;
		
		if (chunk.timestamp)
			boost *= logistic(Math.log((now - chunk.timestamp)/tau));
			
		chunk.timestamp = now;
		prime(chunk, boost);
	}
	
	// used as part of stochastic recall of chunks where
	// where stronger chunks are more likely to be selected
	// This implementation uses the Box–Muller algorithm
	graph.gaussian = function (stdev) {
		const epsilon = 1E-20;
		const TwoPI = 2 * Math.PI;
		let u1, u2;
		do {
			u1 = Math.random();
			u2 = Math.random();
		} while (u1 < epsilon);
		return stdev*Math.sqrt(-2*Math.log(u1))*Math.cos(TwoPI*u2);
	};

Chunk recall first identifies matching chunks and for each match, applies gaussian noise to the chunk's activation level, and selects the matching chunk with the highest resulting score. The selected chunk is activated as above. Selection fails if the score is below a given threshold.

The gaussian distribution is centred around zero and drops off for negative and positive numbers. The graph.gaussian function above on average returns values close to zero, and more rarely large negative or positive numbers.

To apply gaussian noise to an activation level, multiply the level by e raised to the power of the noise value computed from graph.gaussian. The standard deviation should be a system wide constant.

For the memory test task, the successfully recalled items in the test are treated as an iteration (see @do next). Rules then have access to the number of items recalled as well as to the sequence of items. Items may failed to be recalled if their activation level is low, or if the stochastic noise depresses the score below the threshold.

Summary

Human memory is functionally modelled in terms of a graph of chunks where each chunk is associated with an activation level and a timestamp. Activation decays exponentially with time (like a leaky capacitor), but is boosted by recall or update, and via spreading activation in both directions through links between chunks. Recall is stochastic with noise being applied to the chunk activation level before comparison with a cut-off threshold.

Clarify "kindof" and prefix it with "@"

See related comment from @ngcharithperera:

(4) I got a bit confused by the use of the term 'KindOf'. what would be the difference between (Type, Type of or Is-a) Vs 'KindOf'. As a non-native to English speaker, I initially interpreted 'KindOf' as '…used when you are trying to explain or describe something, but you cannot be exact…' (https://dictionary.cambridge.org/dictionary/english/kind-of).

I wasn't sure whether you are trying to use 'KindOf' as an alternative to 'Is-a' or do you expect 'KindOf' to be able to handle ambiguity (some sort of probability)

@draggett noted that the intent is to distinguish between an instance of some type and a subtype of some type, and to provide built-in support in the rule language for convenience in matching along chains of subtypes. Names with built-in, i.e. reserved, semantics should start with an @, hence we could here use @isa and @kindof or similar names.

ISO8601 normative subset

ISO8061 has kinds of complex features in the full spec, e.g. week numbers and the use of commas for fractions, etc.

Chunks uses the subset defined by Misha Wolf and Charles Wicksteed in http://www.w3.org/TR/1998/NOTE-datetime-19980827

Chunk Grammar and railroad diagrams

The ABNF grammar format is hard to understand, so it would make sense to include the much easier railroad diagrams for productions. The following were produced using Gunther Rademacher's https://bottlecaps.de/rr/ui online generator with the following grammar:

chunksDoc ::= ws* (statement (sep statement)* ws*)?
sep ::= ws* (";" | #xA) ws* /* xA is linefeed or \n */
statement ::= link | rule | chunk
link ::= name ws+ name ws+ name
rule ::= chunklist ws* "=>" ws* chunklist
chunklist ::= chunk ( ws* ',' ws* chunk )*
chunk ::= type ws+ ID? ws* "{" property* "}"
type ::= name
ID ::= name
property ::= ws* name ws+ valuelist sep
valuelist ::= value (ws* "," ws* valuelist)*
value ::= boolean | number | name | ISO8601 | string
name ::= '@'? [a-zA-Z.-_:/]+ digit*
boolean ::= "true" | "false"
number ::= integer | decimal | double
integer ::= [+-]? digit+
decimal ::= [+-]? digit+ '.' digit+
double ::= [+-]? (digit+ '.' digit* exp | '.' digit+ exp | digit+ exp)
exp ::= [eE] [+-]? digit+
string ::= '"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"'
/* string literals exclude #x22=" #x5C=\ #xA=new line #xD=carriage return */
ws ::= #x20 | #x9 | #xD | #xA
/* #x20=space #x9=character tabulation #xD=carriage return #xA=new line */
ISO8601 ::= year ("-" month ("-" day ("T" hour ":" minute second? timezone?)?)?)?
/* a commonly used subset of the full ISO 8601 standard */
year ::= digit digit digit digit
month ::= digit digit
day ::= digit digit
hour ::= digit digit
minute ::= digit digit
second ::= ":" digit digit ("." digit+)?
timeszone ::= "Z" | ([+-] h h (":" m m)?)
h ::= digit
m ::= digit
digit ::= [0-9]

See PNG diagrams in chunk-grammar.zip
n.b. Github wouldn't accept the XHTML file with embedded SVG, and doesn't appear to accept SVG either, sigh.

Questions in respect to ACT-R

Chunks and rules are modelled after the design of ACT-R. The following describes feedback from the ACT-R community on some core design questions in respect to the operation of the rule engine, long last memories, and competition between the cortico basal-ganglia and cortico cerebellar circuits.

One such question relates to when the rule engine should for look for the next rule to execute, after executing a rule for which one of the actions is taking an extended period of time to complete, e.g. recall of a chunk with a low activation value. One possibility would be to wait until all of the rule?s actions have completed, but I wonder if that is cognitively plausible, given that the agent may need to attend to fresh input from say the visual module. Such an interrupt may leave the module buffers in a state that interferes with successful resumption of the rules for the task underway when the interrupt occurred.

The modular nature of the ACT-R system separates the firing of the rules from
the actions which are performed. The actions themselves are handled by the
appropriate modules which frees up the production system to immediately look
for another rule to apply. The rules are able to test the status of all of
the other modules and thus can be sensitive to whether they are busy or not
and can make new action requests that are appropriate e.g. stopping an action
that is currently busy but no longer needed or starting an alternate approach
given new input.

Comment: That may work for rules for a single task, e.g. when the first rule changes the goal state to counting, then no rules will match until the facts buffer is updated with the recalled value. It would imply the need to check for matching rules when any buffer is updated. I am concerned that when multiple tasks are interleaved, e.g. when responding to alerts, then the goal buffer may be left in the wrong state for the rule designed to handle a recalled value for the facts module.

A more robust architecture would make the updating of buffers into a transaction that occurs once all of modules has completed their action for the given rule. We can then interleave tasks safely. This would still allow for cancelling ongoing actions, although that would require a new term e.g. "@cancel invoke".

Another question relates to long lasting memories. ACT-R emulates the Ebbinghaus forgetting curve in which the probability of recall drops off exponentially over time, eventually dropping below a minimum threshold, and effectively being forgotten. However, human memory also supports successful recall over very long time scales for memories that were originally strong. Underwood (1957) showed that memory loss is largely attributable to interference with other memories. Strong memories can thus be recalled after an interval of many years provided that the interference is small. Has any work been done on revising ACTR's recall model to reflect interference theories of forgetting, and allowing for long term memories?

There are many papers written about modelling experiments over longer time
frames with ACT-R, and generally the existing activation equation works
well. When only the base-level learning component is used for long term
tasks the typical finding is that the exponent of the decay equation needs
to be decreased over the longer time period relative to the time spent on-
task, and here is a link to a paper on that:

Practice and retention: a unifying analysis (1999)

There has been discussion at ACT-R workshops as to why that should be, and
one of the important issues that comes up is that for tasks like that the
models aren't really doing anything in the intervening time. Thus using
a different decay rate may just be a modeling shortcut to avoid the need to
model all of the other things that happen during that time, and perhaps
there is additional practice of such items either explicitly (which are
not being modeled) or implicitly (by mechanisms not covered by the current
ACT-R theory) which would also account for the same data with a fixed
decay rate.

Comment: I don't really understand the above. One possibility is that old memories can become easier to retrieve if in the meantime, related memories are practiced, and these in turn help to boost the old memories.

Another issue is that the decay of activation is only one component of the
ACT-R activation equation. There is also a context component to activation
which works to increase the activation of items based on the current
context. Thus, even chunks which have decayed significantly over time
can have activations above the threshold if they are strongly related to
the current context. I don't know any specific papers to point you to in
that regard with respect to long term retrieval, but here is a paper from
early in the development of ACT-R which discusses context with respect to
interference:

A partial resolution of the paradox of interference: the role of integrating knowledge (1980)

Comment: The point about the role of spreading activation from the context is well taken, and a strong recent memory could boost a weak old memory sufficient to make it readily recallable. This suggests that old memories are only recallable via a suitable context to boost them. I want to stick to the leaky capacity model together with spreading activationg. In principle, I could, however, change the capacitance as a function of its charge, in order to allow for slower rate of decay for old memories.

For the ACT-R integer counting tutorial example, the first rule sets the goal buffer state to counting, and initiates a recall on the facts module. At this point there are no matching rules until the recall updates the facts buffer. Is there an explicit way for conditions to test that a module is still processing an action?

The query condition in a production tests the internal features that a module
makes available. All modules are required to provide a general state feature
which is either busy or free, and that is the most common test that is used.
They're described in the unit 1 tutorial text, and most tutorial models after
unit 1 use a lot of them to avoid issuing overlapping actions.

Comment: I could easily support @State with value free or busy for use in conditions.

Likewise, how do actions cancel or replace ongoing actions?

There is no general answer to that because there are no requirements
on the operation of a module with respect to the actions it takes or
how it processes them, and there are several different approaches among
the current ACT-R modules. Because of that, overlapping action requests
are typically avoided (it's referred to as jamming a module), but if
one knows the details of the module involved it may occasionally be a
useful thing to do.

If successive rules both initiate a recall action on the same module, are these pipelined so that both results are delivered to the module's buffer when ready, or does the second override and cancel the first?

For the declarative module, the operation is to cancel an ongoing request
and override it when a new one is received. It does not provide a way to
just cancel an ongoing request.

The imaginal module ignores new requests and continues with the ongoing
one until it is complete. It does not provide a way to cancel one.

The perceptual modules (vision and auditory) also ignore new requests
until an ongoing one completes, but do accept a request which will
cancel an ongoing action.

The motor modules (speech and motor) will pipeline new requests if
possible and provide additional queries to test the separate stages
of the pipeline. They also accept requests which can stop an ongoing
action if it hasn't yet passed the point of no return.

Comment: the above suggests that this is a matter for further exploration involving multitasking and reasoning at multiple levels. My idea for deferring buffer updates so that all buffers can be updated in a single transaction for each rule, seems worth investigation, as does the approach Dan describes for ACT-R.

A further question relates to models of attention and the potentially competing needs of the cortico basal-ganglia circuit and the cortico cerebellar circuit that is responsible for actions devolved to it by conscious control. Both circuits have access to the visual area of the cerebral cortex. This is neat in that it frees up the cortico basal-ganglia circuit to reason about the current situation, whilst the cortico basal-ganglia circuit gets on with real-time control of myriad muscles using visual input passed to it via the cortex. It falls down, when conscious awareness shifts the visual focus, denying the cerebellum of the visual information it needs for a current task.
Can you please point me at any work on ACT-R that looks at this dual circuit
model and the implications for models of attention in visual processing?

I don't know of any work in that regard. Most ACT-R work that I know about
is focused on the basal ganglia driven circuit. If anyone is looking into
that from an ACT-R perspective however, I would guess that it's Dr. Andrea
Stocco at the University of Washington: https://sites.uw.edu/ccdl/andrea-stocco/.

Comment: ACT-R has a limited scope, and further work is needed on visual processing, machine learning for motor skills, and emotional appraisal, all of which involve separate systems that complement the cortico basal-ganglia circuit.

Handling errors

Chunk operations such as get and put update the module's buffer status. The status is one of pending, okay, nomatch, failed, or forbidden. An example is where @do get is used to request a matching chunk, and no such match is found. In this case, the buffer is set to undefined, and the status to nomatch.

Should the buffer be cleared on such errors?

On the one hand, clearing the buffer makes sense given the failure to complete the operation. On the other hand, perhaps rules could be designed to use the previous value of the buffer before the operation was initiated.

Note that the JavaScript implementation of chunks currently doesn't implement failed or forbidden, but these would make sense for access controlled chunk graphs and for application specific actions. Rule conditions can (in principle) test the status with @status.

@context

The spec should define the required treatment of @context - in particular, if a rule names a context using @context, that rule will only match chunks that do likewise, and won't match chunks without @context. Contexts allow you to describe facts that only hold in a given context rather than generally. Contexts are needed for episodic memory, beliefs, stories, reported speech, lessons, abductive reasoning, and so forth.

See: https://github.com/w3c/cogai/blob/master/chunks-and-rules.md#statements-about-statements

Are chunk types needed and what if we were to make them optional?

Chunk types are useful for efficient indexing of chunks, and for better readability. Types are used in rule conditions and for some rule actions, e.g. to recall a matching chunk. For external actions, the type is just one piece of information to be passed to the action's implementation, and may be redundant, e.g. console is redundant in the following action as the implementation is selected using the name provided with @Do.

console {@do log; message "hello world"}

Making the type optional would introduce an ambiguity between the chunk type and the already optional chunk identifier. We could perhaps modify the syntax to remove the ambiguity, e.g. requiring a colon before the chunk identifier if it is present without the chunk type. e.g.

dog dog1 {name fido}
:p23 {name mia}
{name smokey}

But that feels rather awkward. Another idea would be to place the chunk type as a property, so that the example becomes:

dog1 {type dog; name fido}
p23 {name mia}
{name smokey}

That's better, but reduces readability compared to the existing syntax:

dog dog1 {name fido}
cat p23 {name mia}
cat {name smokey}

For now at least, I would recommend retaining the existing syntax where the chunk type is required, and can be replaced by an asterisk if it is unknown.

Tasks

Tasks allow you to write rules that are only applicable to specific tasks. Tasks are associated with modules, and a given module can have multiple active tasks at the same time. You can use @task to name a task in a rule condition. This will succeed if the named task is currently active for the module for that condition. The set of active tasks are held independently of the module's buffer. Clearing the buffer doesn't clear the tasks. In rule actions you can use @enter with the name of a task to enter, and @leave with the name of a task to leave. You can enter or leave multiple tasks by using comma separated lists of task names.

Tasks and contexts are complementary. You use @context to name a particular event/situation, e.g. having dinner at a restaurant, and @task to segregate rules for different tasks within the overall plan for having dinner (finding a table, reviewing the menu, ordering, paying the bill).

Note: it might be convenient to automatically leave sub-tasks when leaving a task. For this we would need a way to enter a sub-task, e.g. @subtask task1.1. This is part of the general challenge for reasoning about plans at multiple levels of abstraction, and being able to cope with the variety of different ways that natural language allows you to say more or less the same thing.

See https://github.com/w3c/cogai/blob/master/chunks-and-rules.md#tasks

Add example of special chunk type "*" to section 4.1

See related comment by @ngcharithperera:

(2) In Section 4.1 you talk about a special case. Is it possible to give example? (I could imagine how it looks like based on the information you provided, but I felt it would be better if you provide just in case if we misinterpret).

Scaling and indexing

This is in respect to enabling large scale cognitive databases and large rule sets. For now, I am focusing on RAM based databases, but I also want to understand the design choices for secondary storage, which could be measured in terabytes.

Right now chunks and rules are implemented as an open source JavaScript library with the following maps:

From chunk ID to the corresponding chunk, where IDs uniquely identify chunks within a given chunk graph
From chunk type to the set of chunks with that type
From chunk type and non-literal property values to the set of chunks in which they appear

This is just a start and needs improvement. The latter two maps currently use simple arrays for the set of chunks. If you want to get chunks with given values for the type and multiple properties, this involves the need to compute set intersection given multiple arrays where you are looking for the IDs that occur in each of the arrays.

What are good algorithms for that? One idea is to use a JavaScript associative array that maps an ID to an integer. You then iterate through each list and use the integer to count the number of arrays it appears in. When the count equals the number of arrays, you push the ID to an array for the results. The map can then be discarded.

This relies on JavaScript's implementation of maps. Would Bloom filters be better for really large databases? These sometimes give false positives, but never any false negatives. You thus need to verify the results, but the storage required is limited and lends itself to use in RAM even for sets that are too large to hold in RAM.

Another challenge is how to efficiently find which rules match the chunks held in the cognitive module buffers. Charles Forgy’s Rete algorithm is promising. A related idea is to compile rule conditions into a discrimination network. Whenever a module buffer is updated, this triggers the discrimination network to update the set of matching rules.

Forgy’s OPS5 rule language applied to the entire state of a database. For the cognitive rule language, rule conditions operate on module buffers that each hold a single chunk. There are only a few such modules, so in principle, we should be able to scale to much larger databases than OPS5. One complication is the need to update the discrimination network when adding, removing or revising rules.

I think that there are opportunities to improve performance using information about the expected utility of rules. This can be computed using reinforcement learning following the work done by John Anderson and colleagues at CMU.

Any advice you have on indexing algorithms would be much appreciated.

Chunks syntax: equivalence of links definitions

There are three main ways to describe a link between chunks:

A chunk property references another chunk identifier as in:

person John { likes Mary }
person Mary {}

The compact form is used:

person John {}
person Mary {}
John likes Mary

An explicit link gets created:

person John {}
person Mary {}
likes {
  @subject John
  @object Mary
}

The intro to chunks suggests that these syntaxes are equivalent, but I'm not sure what that means in practice:

If 1. is used, would a chunks parser automatically create the chunk in 3 (unless it already exists)? Or does that only happen when 2. is used?
If 2. or 3. is used, would a chunks parser automatically add a likes property to the chunk whose identifier is John (unless it already exists)?

If not, what does "equivalent" mean?

Chunks syntax: "@" prefix in links and reserved chunk types

Looking at Introduction to chunks and rules, I see that a @ prefix is envisioned for terms with special meanings.

The syntax allows links to be expressed as a compact form, meaning that:

dog kindof mammal

... is the same as

kindof {
  subject dog
  object mammal
}

This attaches special meanings to the subject and object properties. Shouldn't they be prefixed with @? In other words, shouldn't that rather be

kindof {
  @subject dog
  @object mammal
}

Same question with ISO 8601 date-time strings. The iso8601 chunk type seems to be automatically created. Shouldn't it be prefixed with @ so as not to override another iso8601 chunk type that people may want to create with their own meaning? What about individual properties of that chunk type?

Chunks syntax: Is "rule" a specific chunk type?

Looking at rule definitions in Introduction to chunks and rules, I see that all examples use a rule chunk type. Is this how rules are identified? Or are rules identified because they have @condition and @action properties?

If the former, same question as in #2, shouldn't rule be prefixed with @ since it's specific to the rule language itself?

Testing

This is a general reminder to update the existing test suite to ensure that it correctly handles all of the cases described in the chunks specification.

Chunks syntax: is "null" a possible property value?

Current text says chunk property values can be booleans, numbers, names, string literals enclosed in double quote marks, or a list thereof. Can a chunk property value also be null?

How can one update chunk properties prefixed with an @?

Added as an editor's note in PR #29: how can one update a chunk property prefixed with an @? Current @do update rules exclude @ properties from the list of properties updated in the module buffer.

However, it seems useful to update properties such as @subject, @object or @context. How can one do that?

Add example of chunk without identifier in section 4.2

See related comment from @ngcharithperera::

(3) In Section 4.2, you mentioned that identifier is optional. Is it possible to give an example to show how it looks like?

@draggett replied that:

The spec should further say something to the effect that the chunk ID is optional, and if missing, will be automatically assigned when adding the chunk to a graph. If the graph already has a chunk with the same ID, it will be replaced by this one.

Re. automatically assigning an identifier, where would that info be exposed? That is, if someone loads a chunk that was defined without identifier in a module buffer, will they see the automatically assigned identifier? If they don't, why mention it?

Built-in actions

Earlier versions of chunks use @do recall and @do remember, but these terms have multiple meanings in English. To avoid the potential for misunderstanding, and to benefit from a familiarity with HTTP, the names have been switched:

@do update to synchronously update the module buffer (this is the default action)
@do clear to synchronously clear the module buffer
@do get to request a matching chunk which is loaded into the module buffer
@do put to create a new chunk or overwrite one with the same ID
@do patch similar to put but only updates the properties passed in the action
@do delete delete matching chunks

In addition, you can use @status in a condition to check the status of the response to a previous request. It is set to pending, okay, forbidden, nomatch and failed as appropriate. To relate particular pairs of requests and responses, you can pass an identifier in an action with @tag and test for it with @tag in a subsequent rule condition.

compiling chunk rules

Chunk rules can be used to manipulate and update other rules. The built-in meaning of terms that start with '@' gets in the way. To work around that you can map a chunk rule in the rule model to a set of chunks in another module, where the '@' terms are mapped to terms you can readily manipulate. The reverse process maps set of chunks in another module to a rule in the rule model, reversing the mapping.

This needs to be documented in the chunks and rules specification in terms of @compile, @uncompile, @Map and @source

Here is what I wrote some years back:

The @compile property can be used with a chunk identifier to compile a set of chunks into a rule. This is needed as the use of @ terms in goals and rules interferes with retrieving or storing chunks involving these terms. The compilation process maps to these terms when copying chunks to the rule module. The default mapping simply inserts an @ character before the name, e.g. mapping do to @do. If the application needs to use the reserved terms for other purposes, you can reference your own map to the standard terms by using @map to reference a chunk with the map, e.g. if you wanted to use m instead of module, and diff instead of distinct:

@map {
    m module
    diff distinct
}

Note that for compile, @source identifies the module for the chunk referenced by @compile. In principle, there could be an @uncompile property which takes a chunk identifier for a rule in the rule module, and puts the mapped rule chunks into the module referenced by @source, and at the same time, placing the corresponding rule chunk into that module's buffer. This would provide an opportunity for inspection over procedural knowledge. Further work is needed to check whether this capability is really needed. See below for a brief discussion of the potential for declarative reasoning over rules as part of the process of learning how to address new tasks.

Issue: perhaps we should use @module instead of @source given that the rule module is implicit, on the assumption that there is only one rule module.

A rule is a chunk with the properties @condition and @action. The property values are the list of chunks for the conditions and actions respectively. There is a suite of @do actions that can be used to manipulate and update chunks. A rule could have an action with an @compile property whose value is the chunk ID for a rule to be compiled to the rule module. The same action could have an @map property to reference one or more @map chunks as explained above.

Negation operator "!"

The informal document that introduces chunks describes a negation operator ! for conditions:

!present {@module facts; person Mary; room room1}

... which will match the facts module buffer after a failure to get a chunk of type present with the corresponding properties.

This operator needs to be formalized. Things to resolve:

There needs to be a way to express the same thing in expanded form, for instance
```
rule r1 {
  @condition !c1
  @action a1
}
```
The above example shows that we can probably reuse the ~ operator in the expanded form. Which begs the question as to whether we need a distinct character for the compact form, which could rather be:
```
~present {@module facts; person Mary; room room1}
```
Actually, if we reuse the same operator, wouldn't ! be preferrable to ~ as a more universal negation operator?

Priority Queues

In a real-time environment, events can be notified by setting the goal module's buffer. If two processes independently set the goal buffer in a very short time interval, the first event will be overwritten before the rule engine has a chance to respond to it. To avoid this, the rule engine supports a priority queue for module buffers. Chunks can be pushed to the queue and are placed in the queue according to their priority. The priority is not exposed in the rule language. The rule engine pops the queue after executing a rule provided that none of the actions in that rule explicitly updated the buffer. In the JavaScript library (chunks.js) the priority is an integer in the range 1 to 10 with 1 as the lowest priority and 10 the highest.

Better visualize modules in introduction

@ngcharithperera commented:

(1) I wasn't sure how to visualise 'modules' in related to Figure 1. In your figure 1, are modules represented in blue colour?

First step would be to update the figure to:

It would probably be good to update the diagram to clearly delineate the modules: a module both encompasses a graph of chunks (the blue/yellow box) and a module buffer (the green box).

In any case, explanatory text should be added.

Mapping between Chunks and RDF

The chunks and rules spec has an empty section on the mapping with RDF. This needs to be fleshed out using the information at: chunks and rules - mapping with RDF

The essential idea is to map names to RDF URIs in a manner similar to that used by JSON-LD. This includes support for base URLs, the use of prefixes for brevity, and links to external definitions.

Which modules get created by default?

See related comment from @ngcharithperera:

(5) You have identified 'goal module' as a special module which will be created by the rule engine. Based on your diagram (Figure 1) I assume there will be 1 goals module and 1 rules module. Is this correct? Is there anything special about 'rules module'? Will 'rules module' get created automatically (similar to 'goal module')? Is it correct to assume long term memory (local and remote) are 'facts modules'?

@draggett noted that:

The rule engine assumes the “goal” module for a condition or action chunk if the module name is not given explicitly with @module. This is for convenience in authoring rules, and based upon experience in writing demos.

The rule engine assumes that the rules are held as chunks in the “rules” module following a similar approach in ACT-R. Another design choice would be to allow applications to register modules as containing rules, and for modules to contain a mix of facts and rules.

The specification needs to clarify:

which modules get created by default
whether it is expected that additional modules can be created
whether modules can mix chunks of different types (facts and rules) and can be targeted interchangeably as facts or rules buffers
whether modules can be read-only
...

The Rule engine execution section would be the perfect place to write these rules down.

Avoiding loops

Looping is a problem for badly designed rule sets. For machine generated rules, the system can compare how long a rule set takes compared to what is expected, and to forcefully abandon execution if it is taking too long. This can be combined with reinforcement learning to evolve effective rule sets.

In principle, a reasoning system can help to propose good rules that follow proven design patterns. This can involve some concept of recipes that constrain random choices. Reinforcement learning should be applicable to such recipes!

Rules need to be written to avoid the same rule being repeatedly fired without any progress. In the counting demo, the goal is updated to drive progress on the succeeding digit. The rule's conditions ensure that the rule isn't reapplied after processing the last digit.

By contrast, some other demos invoke external actions that push new goals to the goal queue, and the current goal needs to be cleared to prevent the current rule looping, and to make way for acting on the new goals in the queue. Ideally, you wouldn't need to explicitly clear the current goal, but it is proving hard to figure out how to do that implicitly.

The robot demo requires clearing the goal buffer which will then pop the queue if there is a pending goal. For the start rule, the actions initiate waits that in some cases are immediately satisfied, resulting in pushing goals to the goal queue.

If there is @do clear at start of the actions, the start goal is cleared. A subsequent call to pushBuffer then triggers the rule engine. If @do clear is at the end of the actions, it will result in a call to popBuffer which likewise will trigger the rule engine.

How could we make the @do clear implicit?

Looping on the same goal doesn't make any sense for this demo as there are no changes to it, and the same rule would re-apply. In other words, if at the end of this rule, its conditions are still all true, we need to make a change.

If the rule only involves a single buffer, it is clear that we should clear that buffer, but if multiple buffers are involved, what should we do? This involves deducing which buffer needs to be cleared/popped, how? What are some cases of interest?

Changing a property is usually a means to prepare for the next execution of this rule, e.g. as in the counting demo. However, not all such changes will be correct and an indefinite loop may still be possible. Another scenario is where one buffer stays the same but another changes, so that the same rule no longer applies.

I suspect the only way to be sure is to re-evaluate the rule's conditions. However, that isn't sufficient, as we still don't know what to do. Doing nothing will keep the old goal in the buffer despite there being new goals in the queue. The only idea I have right now is to introduce a syntax to clear the buffer for a given condition. However, that isn't a real advantage over the explicit @do clear.

associative search across multiple modules

Quoting from the Wikipedia article on Semantic Memory:

Some believe semantic memory lives in temporal neocortex. Others believe that semantic knowledge is widely distributed across all brain areas. To illustrate this latter view, consider your knowledge of dogs. Researchers holding the 'distributed semantic knowledge' view believe that your knowledge of the sound a dog makes exists in your auditory cortex, whilst your ability to recognize and imagine the visual features of a dog resides in your visual cortex. Recent evidence supports the idea that the temporal pole bilaterally is the convergence zone for unimodal semantic representations into a multimodal representation. These regions are particularly vulnerable to damage in semantic dementia, which is characterised by a global semantic deficit.

This raises the question of how to support associations that span multiple modules. You could define rules that invoke separate queries for each of the modules, but that has performance and scalability issues. We therefore should explore ideas for how to support efficient associative search that combines unimodal semantic representations into a multimodal representation.

The same Wikipedia page intriguingly says:

A new idea that is still at the early stages of development is that semantic memory, like perception, can be subdivided into types of visual information—color, size, form, and motion. Thompson-Schill (2003) found that the left or bilateral ventral temporal cortex appears to be involved in retrieval of knowledge of color and form, the left lateral temporal cortex in knowledge of motion, and the parietal cortex in knowledge of size.

Here are some related quotes, from Creating Concepts from Converging Features in Human Cortex, Marc N. Coutanche and Sharon L. Thompson-Schill

We encounter millions of objects during our lifetime that we recognize effortlessly. We know that a lime is green, round, and tart, whereas a carrot is orange, elongated, and sweet, helping us to never confuse the wedge on our margarita glass with our rabbit's favorite treat. One property (feature) alone is typically insufficient: Celery can also be green; tangerines are orange. Instead, we use the unique convergence of features that defines an object. How does our brain bind these sensorimotor features to form a unique memory representation?

and

Specifically, the “hub-and-spoke” model proposes that while sensory and verbal information is processed in modality-specific regions, a hub, based in the anterior temporal lobe (ATL), contains a high-dimensional modality-independent semantic space that allows computations to be based on semantic information rather than purely sensory similarities.This is analogous to a “hidden layer” in neural network models, which enables computation of nonlinear relationships between the information coded in sensory layers.

How are these different parts of the cortex connected and what functions are involved? What is the impact of the communication costs between different parts of the cortex?

To put that into context, the rule-engine buffers are analogous to HTTP clients, and the cognitive modules to HTTP servers. Efficient associative search across cognitive modules would seem to involve some form of module to module communication that works with sets of chunks to activate sub-graphs in the different modules to form a graph that spans modules. This imposes functional requirements on the inter module messaging, and should be explored via building a series of demonstrators for appropriately chosen use cases.

computational models of unconscious thought

Conscious thought is sequential and open to inspection, leaving a trail (autobiographical memory). However, conscious thought is constrained by extremely limited working memory. Unconscious thought, by contrast, can handle lots of information efficiently in relatively simple ways, but is not open to inspection. This issue gathers together ideas on unconscious thought and how it can be functionally modelled in terms of graph algorithms.

The Limbic system supports emotional processing providing for emotional control over cognition. Can this be modelled using feed-forward networks with some form of back propagation for training them?
There is Sharon L. Thompson-Schill's hub and spoke model for integration of information across different cortical regions. This could perhaps be implemented as graph query algorithms with message passing across cortical modules. These algorithms should also relate to the kinds of processing needed for statistical predictions and emotional evaluations.
Ap Dijksterhuis has investigated the role of unconscious and conscious thought in decision making. He showed that people are able to rank choices involving multiple criteria more effectively using subconscious reasoning than using conscious reasoning, which imposes processing constraints. It should be possible to functionally emulate this for examples used in his paper. In particular, he describes an experiment in which different apartments are described with a set of positive and negative attributes. This suggests the use of limbic system to evaluate the different choices from an emotional perspective.
Shahram Heshmat has likened the brain to a prediction machine that is continuously trying to predict incoming information based on past experiences. The discrepancy between the predictions made by the brain and the actual sensory input is a source of surprise, drawing conscious attention, and stimulating learning. This suggests the use of a system for statistical predictions of behaviour that is constantly being updated by observations. This needs to be able to work with sparse data, avoiding the need for vast numbers of observations. This relates to n-grams and Markov models, e.g. for predicting the next word based on the preceding words. Another approach is to use LSTM neural networks.
Natural language understanding involves a process for selecting the word sense and grammatical role for a word given the preceding words, the dialogue history, episodic and semantic memory. Other processes are needed to resolve references from nouns and pronouns, and for selecting between different ways to attach prepositions. An open question is what can be implemented effectively using chunk rules and what needs to be implemented as graph algorithms. Experimental work has shown that people are able to unconsciously learn patterns in artificial languages, see e.g. The role of familiarity in implicit learning.

The next step will be to identify some scenarios for building demos as a means to explore different algorithms. For natural language, I am working on a dialogue with a waiter at a restaurant, as the language usage and meaning are well defined. The Dijksterhuis task of ranking apartments could be a good choice for similar reasons. Further work is needed to identify practical scenarios for emotional reasoning in a social context, and for learning to spot anomalous behaviours.