cluedin's People

Contributors

dependabot[bot], romaklimenko

cluedin's Issues

Rules

The scope of this issue is three features to support CluedIn Rules:

  • filters evaluation
  • actions execution
  • code generation

In every case, we need to consider that a rule consists of:

  • conditions
  • actions

Where each action can have its own conditions.

Hence, to decide whether to execute an action, we must first evaluate the rule's conditions and then the action's own conditions.
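The evaluation order described above can be sketched as follows. This is a minimal sketch, not CluedIn's implementation: the rule shape (dicts with `conditions`, `actions` and `execute` keys), the representation of a condition as a Python predicate, and the assumption that conditions are combined with AND are all made up for illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical shapes: a condition is a predicate over an entity (a dict);
# a rule is a dict with 'conditions' and 'actions'; each action is a dict
# with its own 'conditions' and an 'execute' callable.
Entity = Dict[str, Any]
Condition = Callable[[Entity], bool]


def all_match(conditions: List[Condition], entity: Entity) -> bool:
    """Return True when every condition holds (an empty list matches)."""
    return all(condition(entity) for condition in conditions)


def apply_rule(rule: Dict[str, Any], entity: Entity) -> Entity:
    """Evaluate the rule's conditions first, then each action's conditions."""
    if not all_match(rule.get("conditions", []), entity):
        return entity  # the rule does not apply, so no action runs
    for action in rule.get("actions", []):
        if all_match(action.get("conditions", []), entity):
            entity = action["execute"](entity)
    return entity


rule = {
    "conditions": [lambda e: e.get("entityType") == "/IMDb/Title"],
    "actions": [
        {
            "conditions": [lambda e: e.get("name") is not None],
            "execute": lambda e: {**e, "name": e["name"].strip()},
        }
    ],
}

title = apply_rule(rule, {"entityType": "/IMDb/Title", "name": "  Alien "})
person = apply_rule(rule, {"entityType": "/IMDb/Person", "name": "  Ripley "})
```

Here `title` has its name trimmed, while `person` is returned unchanged because the rule-level condition fails before any action is considered.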

For rules evaluation, we need to map raw entities' properties to data frame column names.

We also need methods to get all rules of a given type (data part, golden record, or survivorship) and to get a single rule.

Hence, a draft plan for this scope is:

  • get all rules
  • get a rule
  • entity to data frame (properties) converters
  • convert a rule to an evaluator
  • actions execution
  • code generation (stretch goal)

Set up GitHub Actions

  • Read GitHub Actions docs.
  • Check what semver tells us about prerelease and PR versions.
  • Check the PyPI policy about prereleases and PR versions.
  • Generate a PyPI token and set it in GitHub.
  • Push a PR package when a PR is created (why? I code alone 🤔)
  • Push a pre-release package when there's a new commit in the main branch (what if I don't change code?)
  • Push a release package when a release is created.

Flatten entity properties

We already have the entries method. Instead of writing this every time:

import pandas as pd
import cluedin

def flatten_properties(d):
    for k, v in d['properties'].items():
        if k == 'attribute-type':
            continue
        
        if k.startswith('property-'):
            k = k[9:] # len('property-') == 9
        
        k = k.replace('.', '_')

        d[k] = v

    del d['properties']

    return d

df_titles = pd.DataFrame(
    map(
        flatten_properties,
        cluedin.gql.entries(ctx, query, { 'query': 'entityType:/IMDb/Title', 'pageSize': 10_000 })))

Add a cluedin.utils.flatten_properties method and use it by default in cluedin.gql.entries:

df_titles = pd.DataFrame(cluedin.gql.entries(ctx, query, { 'query': 'entityType:/IMDb/Title', 'pageSize': 10_000 }))

Which is equivalent to:

df_titles = pd.DataFrame(
  cluedin.gql.entries(
    context=ctx,
    query=query,
    variables={ 'query': 'entityType:/IMDb/Title', 'pageSize': 10_000 },
    flat=True))
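To make the effect of flattening concrete, here is a small sketch run on a hand-made entry. The entry shape and its values are invented for illustration (real search results may carry different keys), and unlike the helper above this variant returns a new dict instead of modifying its argument.

```python
def flatten_properties(entry):
    """Lift entry['properties'] to top-level keys without mutating entry."""
    # Keep every top-level key except the nested 'properties' dict.
    flat = {k: v for k, v in entry.items() if k != "properties"}
    for key, value in entry.get("properties", {}).items():
        if key == "attribute-type":
            continue  # metadata, not a real property
        if key.startswith("property-"):
            key = key[len("property-"):]
        # Dots are awkward in data frame column names, so use underscores.
        flat[key.replace(".", "_")] = value
    return flat


# Made-up sample entry; only the key naming pattern follows the issue text.
entry = {
    "id": "1",
    "name": "Alien",
    "properties": {
        "attribute-type": "/Metadata/KeyValue",
        "property-imdb.title.startYear": "1979",
    },
}

flat = flatten_properties(entry)
```

The result has `id`, `name` and `imdb_title_startYear` as top-level keys, which is the shape `pd.DataFrame` expects for one row.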

Paged results

For search GraphQL queries, implement an iterator (or a generator) to return paged results.

The API should look like:

cluedin.gql.entries(context, query, variables)

Inside the entries method, send a GraphQL request. If the response contains both a cursor and entries, yield the entries and send the same GraphQL request again, this time with the cursor.
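The loop described above could look roughly like this generator. It is a sketch under assumptions: the response shape (`data.search.cursor` and `data.search.entries`) is inferred from the search query used elsewhere in these issues, and the extra `send` parameter is a hypothetical stand-in for `cluedin.gql.gql` so that the example runs without a server.

```python
from typing import Any, Callable, Dict, Iterator


def entries(
    context: Dict[str, Any],
    query: str,
    variables: Dict[str, Any],
    send: Callable[[Dict[str, Any], str, Dict[str, Any]], Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Yield entries page by page, following the cursor until it runs dry."""
    variables = dict(variables)  # do not modify the caller's dict
    while True:
        response = send(context, query, variables)
        search = response["data"]["search"]  # assumed response shape
        page = search.get("entries") or []
        if not page:
            return  # an empty page means there is nothing left
        yield from page
        cursor = search.get("cursor")
        if not cursor:
            return  # no cursor means this was the last page
        variables["cursor"] = cursor


def fake_send(context, query, variables):
    """In-memory stand-in for the GraphQL call: three pages keyed by cursor."""
    pages = {
        None: {"cursor": "page-2", "entries": [{"id": 1}, {"id": 2}]},
        "page-2": {"cursor": "page-3", "entries": [{"id": 3}]},
        "page-3": {"cursor": "page-4", "entries": []},
    }
    return {"data": {"search": pages[variables.get("cursor")]}}


ids = [e["id"] for e in entries({}, "query { ... }", {"pageSize": 2}, fake_send)]
```

Because it is a generator, callers can wrap it directly in `pd.DataFrame(...)` or stop early without fetching the remaining pages.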

Get Rules

import os
import cluedin

ctx = cluedin.Context.from_json_file(os.environ['CLUEDIN_CONTEXT'])
ctx.get_token()

rules = cluedin.rules.get_rules(ctx, scope=cluedin.rules.scope.DATA_PART)

GraphQL:

query getRules($searchName: String, $isActive: Boolean, $pageNumber: Int, $sortBy: String, $sortDirection: String, $scope: String) {
  management {
    id
    rules(
      searchName: $searchName
      isActive: $isActive
      pageNumber: $pageNumber
      sortBy: $sortBy
      sortDirection: $sortDirection
      scope: $scope
    ) {
      total
      data {
        id
        name
        order
        description
        isActive
        createdBy
        modifiedBy
        ownedBy
        createdAt
        modifiedAt
        author {
          id
          username
          __typename
        }
        scope
        isReprocessing
        __typename
      }
      __typename
    }
    __typename
  }
}

With variables:

{
  "scope": "DataPart"
}

GraphQL

import os
import cluedin

context = cluedin.utils.load(os.environ['CLUEDIN_CONTEXT'])

cluedin.load_token_into_context(context)

query = """
    query searchEntities($cursor: PagingCursor, $query: String, $pageSize: Int) {
      search(query: $query, sort: DATE, cursor: $cursor, pageSize: $pageSize) {
        totalResults
        cursor
        entries {
          id
          name
          entityType
        }
      }
    }
"""

variables = {
  "query": "entityType:/Infrastructure/User",
  "pageSize": 1
}

response = cluedin.gql.gql(context, query, variables)

Evaluate a Rule

evaluator = cluedin.rules.get_evaluator(rule)

filtered_entities = evaluator.get_matching_entities(rule, entities)

  • list of built-in operators (extendable)
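One way to keep the list of built-in operators extendable is a registry that maps operator names to functions. The sketch below is hypothetical: the `operator` decorator, the `(property, operator name, operand)` condition tuples and the free function `get_matching_entities` are illustration only and do not reflect the final `cluedin.rules` API.

```python
from typing import Any, Callable, Dict, List, Tuple

# Registry of operator name -> function(value, operand) -> bool.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {}


def operator(name: str):
    """Decorator that registers a function under an operator name."""
    def decorator(func):
        OPERATORS[name] = func
        return func
    return decorator


@operator("Equals")
def equals(value, operand):
    return value == operand


@operator("Begins With")
def begins_with(value, operand):
    return isinstance(value, str) and value.startswith(operand)


@operator("Is Null")
def is_null(value, operand=None):
    return value is None


# Extending the built-in list is just another registration.
@operator("Longer Than")
def longer_than(value, operand):
    return isinstance(value, str) and len(value) > operand


Condition = Tuple[str, str, Any]  # (property, operator name, operand)


def get_matching_entities(conditions: List[Condition], entities):
    """Keep the entities for which every condition holds."""
    return [
        entity
        for entity in entities
        if all(OPERATORS[op](entity.get(prop), operand)
               for prop, op, operand in conditions)
    ]


entities = [{"name": "Alien"}, {"name": "Ali"}, {"name": "Blade Runner"}]
matched = get_matching_entities(
    [("name", "Begins With", "A"), ("name", "Longer Than", 4)], entities)
```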

Postman collection

Commit the Postman collection with CluedIn API calls and keep it up to date.

Context must be a class

Better late than never: the context passed to each method is currently a dict, but it would be better if it were an instance of a class.

This may be considered a breaking change, but in practice we only need to change load_token_into_context so that it returns a Context instead of a dict, and then deprecate it.
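A possible shape for this change is sketched below. The field names follow the context dict used in the "Publish a PyPi package" issue; the `from_dict` helper, the `org_url` property, the `access_token` field and the warning text are assumptions, and the sketch does not request a real token.

```python
import warnings
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional, Union


@dataclass
class Context:
    domain: str
    organization: str
    user: str
    password: str
    protocol: str = "https"
    access_token: Optional[str] = None  # hypothetical field for the token

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Context":
        """Build a Context from the old dict form, ignoring unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

    @property
    def org_url(self) -> str:
        # Assumed URL layout: <protocol>://<organization>.<domain>
        return f"{self.protocol}://{self.organization}.{self.domain}"


def load_token_into_context(context: Union[Context, Dict[str, Any]]) -> Context:
    """Deprecated entry point kept for callers that still pass a dict."""
    warnings.warn(
        "load_token_into_context is deprecated; use Context instead",
        DeprecationWarning,
        stacklevel=2,
    )
    ctx = context if isinstance(context, Context) else Context.from_dict(context)
    # A real implementation would fetch a token here and store it on ctx.
    return ctx


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ctx = load_token_into_context({
        "protocol": "http",
        "domain": "cluedin.local",
        "organization": "foobar",
        "user": "user@example.com",  # placeholder credentials
        "password": "secret",
    })
```

Existing callers keep working and see a `DeprecationWarning`, while new code can construct `Context` directly.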

Post a Clue to restore a User

  1. Delete a /Infrastructure/User entity.
  2. Log in with this user. The UI will be broken because the user is still in AspNetUsers, but there's no corresponding entity.

The goal is to post a Clue to restore this entity.

GraphQL for `/graphql`

cluedin.gql.gql calls {api_url}/graphql, but CluedIn provides another GraphQL endpoint at {org_url}/graphql. This is what we need to support.

`{{auth}}/api/account/` API

Account:

  • {{auth}}/api/account/accounts
  • {{auth}}/api/account/accounts?organizationId={{organization_id}}

Availability:

  • {{auth}}/api/account/available?clientId={{organization}}
  • {{auth}}/api/account/username?username={{user}}&clientId={{organization}}

Register:

  • InvitationCode
  • new
  • Register

User:

  • user
  • user?id={}

Get a Rule

import os
import cluedin

ctx = cluedin.Context.from_json_file(os.environ['CLUEDIN_CONTEXT'])
ctx.get_token()

rule = cluedin.rules.get_rule(ctx, id='...')

Search

import cluedin

context = {
    "org": "http://foobar.cluedin.local",
    "user": "[email protected]",
    "password": "Foobar23!"
}

cluedin.get_token(context)

search_results = cluedin.graphql.search(
    context=context,
    query="*",
    sort="DATE",
    payload=[ "totalResults", "cursor" ],
    entries=[ "id", "name" ])
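Under the hood, a call like this could assemble the GraphQL query text from the `payload` and `entries` field lists. The following is only a sketch of that idea: `build_search_query` is a hypothetical helper, and the query skeleton is copied from the searchEntities query shown in the GraphQL issue.

```python
from typing import Iterable


def build_search_query(payload: Iterable[str], entries: Iterable[str],
                       sort: str = "DATE") -> str:
    """Assemble a searchEntities query from lists of field names."""
    payload, entries = list(payload), list(entries)
    for field in [*payload, *entries, sort]:
        # Reject anything that is not a plain name, so that user input
        # cannot inject extra GraphQL into the query text.
        if not field.isidentifier():
            raise ValueError(f"invalid name: {field!r}")
    lines = [
        "query searchEntities($cursor: PagingCursor, $query: String, $pageSize: Int) {",
        f"  search(query: $query, sort: {sort}, cursor: $cursor, pageSize: $pageSize) {{",
    ]
    lines += [f"    {field}" for field in payload]
    lines.append("    entries {")
    lines += [f"      {field}" for field in entries]
    lines += ["    }", "  }", "}"]
    return "\n".join(lines)


search_query = build_search_query(
    payload=["totalResults", "cursor"],
    entries=["id", "name"])
```

The search string itself (`"*"` in the call above) would still travel as the `$query` variable rather than being interpolated into the text.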

Implement Data Parts Rules Operators

  1. Begins With
  2. Contains
  3. Ends With
  4. Equals
  5. Exists
  6. In
  7. Is Not Null
  8. Is Null
  9. Matches Pattern
  10. Not Begins With
  11. Not Contains
  12. Not Ends With
  13. Not Equal
  14. Does Not Exist
  15. Not In
  16. Does Not Match Pattern
  17. Greater
  18. Greater or Equal
  19. Less
  20. Less or Equal
  21. Between
  22. Not Between
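Most operators in this list come in positive/negative pairs, so the negative ones can be derived from the positive ones rather than written by hand. The sketch below shows the idea for three pairs. It rests on assumptions: `Between` is treated as inclusive, and a missing (None) value fails every positive operator and therefore passes its negation, which may not match how CluedIn itself treats nulls.

```python
import re


def negate(op):
    """Build the 'Not ...' counterpart of a positive operator."""
    def negated(value, *operands):
        return not op(value, *operands)
    return negated


def contains(value, operand):
    return isinstance(value, str) and operand in value


def matches_pattern(value, pattern):
    return isinstance(value, str) and re.search(pattern, value) is not None


def between(value, low, high):
    # Assumed to be inclusive on both ends.
    return value is not None and low <= value <= high


POSITIVE = {
    "Contains": contains,
    "Matches Pattern": matches_pattern,
    "Between": between,
}

NEGATIVE = {
    "Not Contains": negate(contains),
    "Does Not Match Pattern": negate(matches_pattern),
    "Not Between": negate(between),
}

in_range = POSITIVE["Between"](3, 1, 3)
out_of_range = NEGATIVE["Not Between"](5, 1, 3)
```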

Public API

  • Post a Clue
  • Restore accidentally deleted User entities

Save and Load

Add the following helper methods:

cluedin.utils.load_json(file)
cluedin.utils.save_json(obj, file)
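A minimal sketch of the two helpers, assuming `file` is a file system path (the issue does not say whether it is a path or an open file object) and that UTF-8 with two-space indentation is acceptable:

```python
import json
import os
import tempfile


def save_json(obj, file):
    """Write obj to the given path as UTF-8 JSON."""
    with open(file, "w", encoding="utf-8") as f:
        json.dump(obj, f, indent=2, ensure_ascii=False)


def load_json(file):
    """Read JSON from the given path and return the parsed object."""
    with open(file, "r", encoding="utf-8") as f:
        return json.load(f)


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "context.json")
    save_json({"organization": "foobar", "protocol": "http"}, path)
    restored = load_json(path)
```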

Publish a PyPI package for CluedIn API

Name: cluedin

Basic usage:

import cluedin

context = {
    "protocol": "http", # default - `https`
    "domain": "cluedin.local",
    "organization": "foobar",
    "user": "[email protected]",
    "password": "Foobar23!"
}

cluedin.load_token_into_context(context)

cluedin.graphql.search(context, ...)

The search API is to be designed and implemented in #3.
