The cluedin from romaklimenko

Rule Actions

Rules

The scope of this issue is three features to support CluedIn Rules:

filters evaluation
actions execution
code generation

In all cases, we need to consider that a rule consists of:

conditions
actions

Where each action can have its own conditions.

Hence, to decide whether to execute an action, we must first evaluate the rule's conditions and then the action's own conditions.
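The evaluation order described above can be sketched as follows. This is only an illustration: the `Rule` and `Action` classes, the `execute_rule` function, and the idea that conditions are plain predicates combined with a logical AND are all assumptions, not the library's actual model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Entity = Dict[str, Any]
Condition = Callable[[Entity], bool]


@dataclass
class Action:
    # Hypothetical action: a transformation plus its own conditions.
    name: str
    apply: Callable[[Entity], Entity]
    conditions: List[Condition] = field(default_factory=list)


@dataclass
class Rule:
    # Hypothetical rule: rule-level conditions plus a list of actions.
    name: str
    conditions: List[Condition] = field(default_factory=list)
    actions: List[Action] = field(default_factory=list)


def execute_rule(rule: Rule, entity: Entity) -> Entity:
    # Rule-level conditions gate every action of the rule.
    if not all(condition(entity) for condition in rule.conditions):
        return entity
    for action in rule.actions:
        # Action-level conditions are checked only after the rule matched.
        # They see the entity as left by the previous actions (an assumption).
        if all(condition(entity) for condition in action.conditions):
            entity = action.apply(entity)
    return entity
```

An action with no conditions of its own runs whenever the rule's conditions hold, because `all()` over an empty list is true.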

For rules evaluation, we need to map raw entities' properties to data frame column names.
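A minimal sketch of such a mapping, assuming the same naming convention as the flatten_properties helper in this document (drop the `property-` prefix, replace dots with underscores). The function name `to_column_name` is hypothetical.

```python
def to_column_name(property_name: str) -> str:
    # Assumed convention: raw keys may carry a 'property-' prefix.
    prefix = "property-"
    if property_name.startswith(prefix):
        property_name = property_name[len(prefix):]
    # Replace dots so the name works as a data frame column name.
    return property_name.replace(".", "_")
```

With this helper, a rule condition that refers to a raw property name can be translated to the matching column before it is evaluated against a data frame.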

We also need methods to get all rules (of a given type: data part, golden record, or survivorship).
And we need to get one rule.

Hence, a draft plan for this scope is:

Set up GitHub Actions

Read GitHub Actions docs.
Check what semver tells us about prerelease and PR versions.
Check the PyPI policy about prereleases and PR versions.
Generate a PyPI token and set it in GitHub.
Push a PR package when a PR is created (why? I code alone 🤔)
Push a pre-release package when there's a new commit in the main branch (what if I don't change code?)
Push a release package when a release is created.

Rules Code Generation

Flatten entity properties

We already have the entries method. Instead of writing this every time:

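import pandas as pd

import cluedin

def flatten_properties(d):
    # Copy each property to the top level of the entry (mutates d in place).
    for k, v in d['properties'].items():
        if k == 'attribute-type':
            continue

        # Strip the 'property-' prefix.
        if k.startswith('property-'):
            k = k[len('property-'):]

        # Replace dots so the key can be used as a data frame column name.
        k = k.replace('.', '_')

        d[k] = v

    del d['properties']

    return d

# ctx and query are assumed to be defined beforehand.
df_titles = pd.DataFrame(
    map(
        flatten_properties,
        cluedin.gql.entries(ctx, query, { 'query': 'entityType:/IMDb/Title', 'pageSize': 10_000 })))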

Add a cluedin.utils.flatten_properties method and use it by default in cluedin.gql.entries:

df_titles = pd.DataFrame(cluedin.gql.entries(ctx, query, { 'query': 'entityType:/IMDb/Title', 'pageSize': 10_000 }))

Which is equivalent to:

df_titles = pd.DataFrame(
  cluedin.gql.entries(
    context=ctx,
    query=query,
    variables={ 'query': 'entityType:/IMDb/Title', 'pageSize': 10_000 },
    flat=True))

Paged results

For search GraphQL queries, implement an iterator (or a generator) to return paged results.

The API should look like:

cluedin.gql.entries(context, query, variables)

Inside the entries method, send a GraphQL request. If the response contains a cursor and entries, yield the entries and repeat the same GraphQL request, this time with the cursor.
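The paging loop could be sketched as a generator like the one below. It is an illustration under assumptions: `iter_entries` and `fetch_page` are hypothetical names (`fetch_page` stands in for a call such as cluedin.gql.gql with a fixed context and query), and the response is assumed to have the shape `{"data": {"search": {"cursor": ..., "entries": [...]}}}`.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_entries(
    fetch_page: Callable[[Dict[str, Any]], Dict[str, Any]],
    variables: Dict[str, Any],
) -> Iterator[Dict[str, Any]]:
    cursor: Optional[str] = None
    while True:
        # Copy the variables so the caller's dict is never modified.
        page_variables = dict(variables)
        if cursor is not None:
            page_variables["cursor"] = cursor

        response = fetch_page(page_variables)
        search = response["data"]["search"]  # assumed response shape

        entries = search.get("entries") or []
        if not entries:
            return  # an empty page ends the iteration

        yield from entries

        cursor = search.get("cursor")
        if not cursor:
            return  # no cursor means there is nothing more to request
```

Because it is a generator, the next page is only requested when the caller consumes past the current one, so `pd.DataFrame(iter_entries(...))` and an early `break` both work without extra code.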

Custom Operators

Get Rules

import os

import cluedin

ctx = cluedin.Context.from_json_file(os.environ['CLUEDIN_CONTEXT'])
ctx.get_token()

rules = cluedin.rules.get_rules(ctx, scope=cluedin.rules.scope.DATA_PART)

GraphQL:

query getRules($searchName: String, $isActive: Boolean, $pageNumber: Int, $sortBy: String, $sortDirection: String, $scope: String) {
  management {
    id
    rules(
      searchName: $searchName
      isActive: $isActive
      pageNumber: $pageNumber
      sortBy: $sortBy
      sortDirection: $sortDirection
      scope: $scope
    ) {
      total
      data {
        id
        name
        order
        description
        isActive
        createdBy
        modifiedBy
        ownedBy
        createdAt
        modifiedAt
        author {
          id
          username
          __typename
        }
        scope
        isReprocessing
        __typename
      }
      __typename
    }
    __typename
  }
}

with variables

{
  "scope": "DataPart"
}

Property Mapper

GraphQL

import os

import cluedin

context = cluedin.utils.load(os.environ['CLUEDIN_CONTEXT'])

cluedin.load_token_into_context(context)

query = """
    query searchEntities($cursor: PagingCursor, $query: String, $pageSize: Int) {
      search(query: $query, sort: DATE, cursor: $cursor, pageSize: $pageSize) {
        totalResults
        cursor
        entries {
          id
          name
          entityType
        }
      }
    }
"""

variables = {
  "query": "entityType:/Infrastructure/User",
  "pageSize": 1
}

response = cluedin.gql.gql(context, query, variables)

1.0.0 preparations

README.md
Publish the package
Jupyter Notebook
Postman Collection

Evaluate a Rule

evaluator = cluedin.rules.get_evaluator(rule)

filtered_entities = evaluator.get_matching_entities(rule, entities)

list of built-in operators (extendable)
[ ]
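One possible shape for an extendable list of operators is a plain registry keyed by operator name. Everything here is hypothetical: the operator names, `OPERATORS`, `register_operator`, and `evaluate` are illustrations, not CluedIn's operator set or this package's API.

```python
from typing import Any, Callable, Dict

Operator = Callable[[Any, Any], bool]

# Hypothetical built-in operators, keyed by name.
OPERATORS: Dict[str, Operator] = {
    "equals": lambda actual, expected: actual == expected,
    "not_equals": lambda actual, expected: actual != expected,
    "contains": lambda actual, expected: actual is not None and expected in actual,
    "exists": lambda actual, expected: actual is not None,
}


def register_operator(name: str, operator: Operator) -> None:
    # Extension point: add a custom operator or override a built-in one.
    OPERATORS[name] = operator


def evaluate(operator_name: str, actual: Any, expected: Any = None) -> bool:
    try:
        operator = OPERATORS[operator_name]
    except KeyError:
        raise ValueError(f"Unknown operator: {operator_name}") from None
    return operator(actual, expected)
```

A custom operator is then a single call, for example `register_operator("starts_with", lambda actual, expected: str(actual).startswith(expected))`.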

Postman collection

Commit the Postman collection with CluedIn API calls and keep it up to date.

Context must be a class

It's better late than never: the context passed to each method is a dict now, but it would be better if it were an object of a class.

This may be considered a breaking change, but in practice we only need to change load_token_into_context so that it returns a Context instead of a dict, and then deprecate it.
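A rough sketch of that migration, assuming the context keeps the keys from the "Basic usage" example (protocol, domain, organization, user, password). The `access_token` field and the `from_dict` helper are hypothetical, and the actual token request is left out.

```python
import json
import warnings
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class Context:
    domain: str
    organization: str
    user: str
    password: str
    protocol: str = "https"
    access_token: Optional[str] = None  # hypothetical field for the token

    @classmethod
    def from_dict(cls, values: Dict[str, Any]) -> "Context":
        # Ignore unknown keys so older context files still load.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in values.items() if k in known})

    @classmethod
    def from_json_file(cls, path: str) -> "Context":
        with open(path, encoding="utf-8") as file:
            return cls.from_dict(json.load(file))


def load_token_into_context(context: Dict[str, Any]) -> Context:
    # Deprecated shim: accepts the old dict and returns a Context.
    # Fetching the token itself is omitted in this sketch.
    warnings.warn(
        "load_token_into_context is deprecated; use Context instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return Context.from_dict(context)
```

Callers that index the result like a dict would still break, so the deprecation warning only softens the change rather than removing it.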

Post a Clue to restore a User

Delete a /Infrastructure/User entity.
Log in with this user. The UI will be broken because the user is still in AspNetUsers, but there's no corresponding entity.

The goal is to post a Clue to restore this entity.

GraphQL for `/graphql`

cluedin.gql.gql calls {api_url}/graphql, but CluedIn provides another GraphQL endpoint at {org_url}/graphql, and this is the one we need to support.
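One way to support both endpoints is to let the caller choose which base URL to use. The sketch below assumes the context exposes `api_url` and `org_url` keys matching the placeholders above; the `graphql_url` function and its `endpoint` parameter are hypothetical.

```python
from typing import Any, Dict


def graphql_url(context: Dict[str, Any], endpoint: str = "api") -> str:
    # Map the endpoint choice to the (assumed) context key holding its base URL.
    keys = {"api": "api_url", "org": "org_url"}
    if endpoint not in keys:
        raise ValueError(f"Unknown endpoint: {endpoint}")
    # Avoid a double slash when the base URL ends with '/'.
    return context[keys[endpoint]].rstrip("/") + "/graphql"
```

Keeping "api" as the default would leave existing cluedin.gql.gql calls unchanged.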

`{{api}}/entity` API

/entity/blob
/entity/schema
/entity/clue

`{{auth}}/api/account/` API

Account:

{{auth}}/api/account/accounts
{{auth}}/api/account/accounts?organizationId={{organization_id}}

Availability:

{{auth}}/api/account/available?clientId={{organization}}
{{auth}}/api/account/username?username={{user}}&clientId={{organization}}

Register:

InvitationCode
new
Register

User:

user
user?id={}

Get a Rule

import os

import cluedin

ctx = cluedin.Context.from_json_file(os.environ['CLUEDIN_CONTEXT'])
ctx.get_token()

rule = cluedin.rules.get_rule(ctx, id='...')

Search

import cluedin

context = {
    "org": "http://foobar.cluedin.local",
    "user": "[email protected]",
    "password": "Foobar23!"
}

cluedin.get_token(context)

search_results = cluedin.graphql.search(
    context=context,
    query="*",
    sort="DATE",
    payload=[ "totalResults", "cursor" ],
    entries=[ "id", "name" ])

Post a Clue
Restore accidentally deleted User entities

Save and Load

Add the following helper methods:

cluedin.utils.load_json(file)

cluedin.utils.save_json(obj, file)
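A minimal sketch of the two helpers using only the standard library. It assumes `file` is a path; the UTF-8 encoding and the indentation are choices made for this example.

```python
import json
from typing import Any


def load_json(file: str) -> Any:
    # Read and parse a JSON document from the given path.
    with open(file, encoding="utf-8") as f:
        return json.load(f)


def save_json(obj: Any, file: str) -> None:
    # Write obj as indented JSON, keeping non-ASCII characters readable.
    with open(file, "w", encoding="utf-8") as f:
        json.dump(obj, f, indent=2, ensure_ascii=False)
```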

Publish a PyPI package for CluedIn API

Name: cluedin

Basic usage:

import cluedin

context = {
    "protocol": "http", # default - `https`
    "domain": "cluedin.local",
    "organization": "foobar",
    "user": "[email protected]",
    "password": "Foobar23!"
}

cluedin.load_token_into_context(context)

cluedin.graphql.search(context, ...)

The search API is to be designed and implemented in #3.
