cncf / landscape-graph

CNCF Landscape Graph, data model, and applications.

Home Page: https://github.com/orgs/cncf/projects/7/views/6

License: Other

TypeScript 0.01% Shell 0.23% Dockerfile 0.01% JavaScript 0.06% Cypher 0.05% Python 0.76% Jupyter Notebook 91.36% Batchfile 0.01% Roff 7.51%
cloudnative cncf cypher graph-data-science graphql javafx knowledge-graph landscape neo4j flat-data

landscape-graph's Introduction

CNCF Landscape Graph

Initial, open, active development.

Join us @ #landscape-graph. Here are our current activities. A formal plan and roadmap are in progress.


Often, we need to understand how an open source project interacts with others, how it's changing over time, and who's enabling its continued success. We want to understand what alternatives exist, or how complementary projects might be combined in purpose-fit or novel ways. We might want to dive in and contribute! This is how projects and ecosystems grow to meet the business challenges facing modern organizations.

Landscape Graph Data Model

Graphs can facilitate rich analysis of our vibrant and dynamic communities, the humans they comprise, and the clusters of contribution and thought leadership they produce.

Using the data underlying the existing landscape as input, a Labeled Property Graph (LPG) is constructed using Cypher (SQL for Graphs), resulting in a Neo4j graph database.
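As a rough illustration of that construction step, the sketch below turns one landscape entry into a parameterized Cypher MERGE statement. The `Card` and `Category` labels, the `IN_CATEGORY` relationship, and the input field names are assumptions made for this example, not the project's actual schema.

```python
from typing import Any

# Hypothetical labels and relationship type, chosen only for illustration.
MERGE_CARD = (
    "MERGE (c:Card {name: $name}) "
    "SET c.homepage = $homepage "
    "MERGE (g:Category {name: $category}) "
    "MERGE (c)-[:IN_CATEGORY]->(g)"
)


def card_to_statement(item: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return a (query, parameters) pair for one landscape entry.

    Values travel as parameters rather than being spliced into the query
    text, so names containing quotes cannot break the statement.
    """
    params = {
        "name": item["name"],
        "homepage": item.get("homepage_url"),
        "category": item["category"],
    }
    return MERGE_CARD, params


query, params = card_to_statement(
    {"name": "Neo4j", "homepage_url": "https://neo4j.com", "category": "Database"}
)
print(query)
print(params)
```

The resulting pair can be handed to any Neo4j client that accepts parameterized queries. MERGE is used rather than CREATE so that re-running a daily import would not duplicate nodes.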

Here's the schema:

landscape-graph-data-model


"Origin Story"

In November of 2018 there were 25 CNCF projects.

At the time, Ayrat Khayretdinov published the "Beginner's Guide to the CNCF Landscape." It opened with:

The cloud native landscape can be complicated and confusing. Its myriad of open source projects are supported by the constant contributions of a vibrant and expansive community. The Cloud Native Computing Foundation (CNCF) has a landscape map that shows the full extent of cloud native solutions, many of which are under their umbrella.

It described the CNCF Mission in these terms:

The CNCF fosters this landscape of open source projects by helping provide end-user communities with viable options for building cloud native applications. By encouraging projects to collaborate with each other, the CNCF hopes to enable fully-fledged technology stacks comprised solely of CNCF member projects. This is one way that organizations can own their destinies in the cloud.

We. Have. Grown.

Today there are 5.4 million humans using Kubernetes and the landscape continues to expand.

2022 Q2 | Cards | Stars | Market cap | Funding
projects | 111 | 614,394 | $291.4 M | $29.6 M
ecosystem | 1,061 | 3,066,372 | $15.7 T | $29.1 B

We have a "good" problem

The CNCF Landscape aggregates summary data from GitHub, Crunchbase, Yahoo Finance, Twitter, and other sources while providing the ability to quickly find, filter, and group the more than 1000 Cards across numerous dimensions. It is automagically updated daily. It continues to work as designed.

landscape-all

With a single well-placed click, a wealth of data can be summoned. Here's the "Card" for Neo4j:

neo4j-card

This is perfect when we know what we're looking for (specifically).

Technical TLDR

source: grandstack.io/docs/...

GRANDstack is a combination of technologies that work together to enable developers to build data intensive full stack applications. The components of GRANDstack are:

  • GraphQL - A new paradigm for building APIs, GraphQL is a way of describing data and enabling clients to query it.
  • React - A JavaScript library for building component based reusable user interfaces.
  • Apollo - A suite of tools that work together to create great GraphQL workflows.
  • Neo4j Database - The native graph database that allows you to model, store, and query your data the same way you think about it: as a graph.

Here's how it all fits together in the context of a movie search app:

grand-arch

Additional tools and frameworks

TODO: #27

Component | What it is
Neo4j GraphQL Library | {neo}/product/graphql-library, (dev blog)
Neo4j Streams | {neo}/labs/kafka, {gh}/neo4j-contrib/neo4j-streams
gitbase | Git history as MySQL, src-d/gitbase
JavaFX | UI, 3d, openjfx.io
Quarkus | AoT, minify, Dev UX, quarkus.io

Graph Data Science Algorithms ("Why Neo4j?")

https://neo4j.com/developer/graph-data-science/graph-algorithms

Graph databases “perform the join on insert” instead of at query time. No joins or table scans are required when traversing relationships.

A graph data model (vs. rectangular relational) can bring to bear all that we’ve learned from ad/fin/security tech, big data, ml, etc.
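The "join on insert" idea can be sketched in plain Python. This is a toy contrast, not how Neo4j stores data internally; the people and project pairs are made-up sample values.

```python
from collections import defaultdict

# Relational style: relationships live in a separate table and are
# resolved by scanning it at query time.
contributions = [("alice", "kubernetes"), ("bob", "kubernetes"), ("alice", "helm")]


def projects_by_scan(person: str) -> list[str]:
    return [proj for who, proj in contributions if who == person]


# Graph style: each node keeps direct references to its neighbours,
# built once when the relationship is written.
adjacency: dict[str, list[str]] = defaultdict(list)
for who, proj in contributions:
    adjacency[who].append(proj)  # the "join" happens here, on insert


def projects_by_traversal(person: str) -> list[str]:
    # Following stored references touches only this person's edges.
    return list(adjacency.get(person, []))


print(projects_by_scan("alice"))
print(projects_by_traversal("alice"))
```

Both functions return the same answer; the difference is that the scan touches every row, while the traversal touches only the edges attached to the starting node.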

graph-data-science-pic

Graph Data Science Algorithm Types

Docs --> https://neo4j.com/docs/graph-data-science/current

Type | Definition
Path Finding | Help find the shortest path or evaluate the availability and quality of routes
Centrality | Determine the importance of distinct nodes in a network
Community Detection | Evaluate how a group is clustered or partitioned, as well as its tendency to strengthen or break apart
Similarity | Help calculate the similarity of nodes
Topological link prediction | Determine the closeness of pairs of nodes
Node Embeddings | Compute vector representations of nodes in a graph
Node Classification | Use machine learning to predict the classification of nodes
Link prediction | Use machine learning to predict new links between pairs of nodes
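To make two of these categories concrete, here is a small plain-Python sketch of path finding (breadth-first search) and centrality (degree centrality) on a toy graph. It does not use the Graph Data Science library, which provides these algorithms at scale; the node names are placeholders.

```python
from collections import deque

# Toy undirected graph; the node names are placeholders.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}


def shortest_path(start, goal):
    """Breadth-first search: returns one shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def degree_centrality():
    """Degree of each node divided by the maximum possible degree."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


print(shortest_path("a", "e"))
print(degree_centrality())
```

In this toy graph, node `d` has the highest degree centrality because it connects to three of the four other nodes.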

Cypher ("SQL for Graphs")

https://github.com/opencypher/openCypher

Cypher is a declarative graph query language that allows for expressive and efficient querying, updating and administering of the graph. It is designed to be suitable for both developers and operations professionals. Cypher is designed to be simple, yet powerful; highly complicated database queries can be easily expressed, enabling you to focus on your domain, instead of getting lost in database access.

On its influences and roots:

Cypher is inspired by a number of different approaches and builds on established practices for expressive querying. Many of the keywords, such as WHERE and ORDER BY, are inspired by SQL. Pattern matching borrows expression approaches from SPARQL. Some of the list semantics are borrowed from languages such as Haskell and Python. Cypher’s constructs, based on English prose and neat iconography, make queries easy, both to write and to read.

How to Contribute

License

This repository contains data received from Crunchbase. This data is not licensed pursuant to the Apache License. It is subject to Crunchbase’s Data Access Terms, available at https://data.crunchbase.com/docs/terms, and is only permitted to be used with Linux Foundation landscape projects.

Everything else is under the Apache License, Version 2.0, except for project and product logos, which are generally copyrighted by the company that created them, and are simply cached here for reliability.

landscape-graph's People

Contributors

dependabot[bot], flat-data, halcyondude, jeefy, kumarankit999, rohankmr414, xonx4l


landscape-graph's Issues

GitHub Issue { labels, templates}

  • reach out to contributor strategy for guidance
  • put in place sane/rational templates as a starting point
  • put in place sane/rational labels as a starting point
  • push learnings/feedback/our templates back to tag-contributor-strategy

MVP: base data model (graph) + app

Scope for Minimum Viable Product (MVP)

  • #38
  • import --> Neo4j database (via cypher), base entities in place
  • #7
  • minimal web based UI to explore graph
  • JavaFX native app (running Neo4j in embedded mode)

Stretch Goals

architecture: per-component overviews

For each component of landscape-graph create a .md file articulating:

  • what it is
  • how it works
  • what it provides
  • why it was chosen

This forms the precursor to proper docs.

Scope

  • Neo4j
  • Neo4j ETL Tool
  • gitbase
  • JavaFX
  • (to be determined): JavaScript library for front door (neoviz.js, ...)

Create GraphQL endpoint, leveraging neo4j graphql library v3

GraphQL schema --> source of truth

Tasks

  • #52

  • MVP CNCF Schema

    • export data model from arrows.app --> landscape.graphql
    • generate full schema from types
    • srg-xyz type hierarchy
    • LandscapeEntity (base type)
      • Card (base)
        • Member
        • Project
        • TAG
        • TOC
        • EUG
        • Person
  • use schema to drive data model instantiation --> neo
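One way to picture the "generate full schema from types" task is a small generator that emits GraphQL SDL for the hierarchy listed above. This is a sketch only: the `id` and `name` fields are invented for illustration, and whether a given GraphQL server library accepts an interface that implements another interface would need to be checked.

```python
# Hypothetical fields; only the type names come from the task list above.
BASE_FIELDS = {"id": "ID!", "name": "String!"}
CARD_TYPES = ["Member", "Project", "TAG", "TOC", "EUG", "Person"]


def render_block(keyword: str, name: str, fields: dict[str, str], implements: str = "") -> str:
    """Render one SDL definition such as `type Project implements Card { ... }`."""
    header = f"{keyword} {name}"
    if implements:
        header += f" implements {implements}"
    body = "\n".join(f"    {field}: {gql_type}" for field, gql_type in fields.items())
    return f"{header} {{\n{body}\n}}"


def render_schema() -> str:
    blocks = [render_block("interface", "LandscapeEntity", BASE_FIELDS)]
    blocks.append(render_block("interface", "Card", BASE_FIELDS, implements="LandscapeEntity"))
    for type_name in CARD_TYPES:
        # GraphQL requires transitive interfaces to be listed explicitly.
        blocks.append(
            render_block("type", type_name, BASE_FIELDS, implements="LandscapeEntity & Card")
        )
    return "\n\n".join(blocks)


print(render_schema())
```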

Moved to new/other issue(s):

  • #51
  • #96
  • constraints and indexes, full text where it makes sense(#73, #20)
  • use graphql to load data via mutations. (#63)

More Info


resources

[spike] Interactive Extensibility: "sub-graph packs (sgp)"

.
├── blogs
│   └── sgp-blogcncf
├── boards
│   ├── sgp-ghdiscuss
│   └── sgp-stackoverflow
├── core
│   └── generated
├── corp
│   ├── sgp-crunchbase
│   └── sgp-yahoofinance
├── email
├── packages
│   ├── sgp-brew
│   ├── sgp-choco
│   ├── sgp-crate
│   ├── sgp-deb
│   ├── sgp-deno
│   ├── sgp-go
│   ├── sgp-maven
│   ├── sgp-npm
│   ├── sgp-pip
│   └── sgp-rpm
├── rtc
│   ├── sgp-discord
│   └── sgp-slack
├── social
│   ├── sgp-linkedin
│   └── sgp-twitter
├── threats
│   └── sgp-nist
└── videos
    └── sgp-youtube

cnab.io is a great fit.

https://github.com/cnabio/cnab-spec#cloud-native-application-bundle-specifications

Cloud Native Application Bundles (CNAB) are a package format specification that describes a technology for bundling, installing, and managing distributed applications, that are by design, cloud agnostic.

The community has created implementations of the CNAB spec with
opinionated takes on authoring bundles. Some even use Duffle's
libraries to handle the CNAB implementation. If you want to make your own CNAB tooling, that is a great place to start!

design: Sub-Graph Modules (sgm)

Sub-Graph Modules

Goals

  • facilitate interactive, dynamic expansion of the graph using the core model as a nucleus and/or seed.
  • learn from k8s! Don't make "special" kind:'s of things (e.g. Pod, Deployment, Ingress) part of a "built-in" data model, then a custom mechanism for extensibility (CRD's). Instead make the core data model structured with the same compositional mechanisms.
  • work with a broad, arbitrary set of targets, environments, toolchains, and compositional frameworks
  • Enable a community to gel around this project such that work can happen safely in parallel
  • facilitate self-service and automated workflows
    • modern CI (GitHub Actions) to validate SGM's work at PR level
    • autogenerate comprehensive documentation
    • allow exploration and composition
  • able to be easily distributed on existing transports.

Tasks

  • implement inheritance model (see below)

  • implement dependency mechanism

  • implement core data model as { sgm-base, sgm-cncf, sgm-xyz, ... }

    • base: (e.g. Object)
    • landscape: for all Linux Foundation Landscapes (see https://landscapes.dev)
    • sgm-cncf: (Card, Member., Project., TOC, TAG, EUG, Licenses, GitRepo)
      • sgm-cncf-docs (DD's, Charters, governance)
    • sgm-git (Commit, Author, Branch, ...)
    • sgm-github (Issue, PR, Comment, Workflow/Action, Teams (nested))
  • Sub-Graph Modules: Design and Architecture

    • Create Summary info, slides, and a blog post ("How to extend our graph")
    • reach out to neo4j for feedback on sub-graph module approach
    • reach out to CNAB.io project to assess viability
    • sgm:blog + template + examples
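The dependency mechanism named in the tasks above could, for instance, resolve a load order with a topological sort. The sketch below uses Python's standard `graphlib`; the dependency edges between modules are assumptions made for the example, since the issue does not specify them.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency edges: module -> modules it depends on.
DEPENDS_ON = {
    "sgm-base": set(),
    "landscape": {"sgm-base"},
    "sgm-cncf": {"landscape"},
    "sgm-cncf-docs": {"sgm-cncf"},
    "sgm-git": {"sgm-base"},
    "sgm-github": {"sgm-git"},
}


def load_order(depends_on):
    """Return modules ordered so each appears after its dependencies.

    graphlib raises CycleError if the modules depend on each other in a loop.
    """
    return list(TopologicalSorter(depends_on).static_order())


order = load_order(DEPENDS_ON)
print(order)
```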

Types of Sub-Graph Modules (SGM)

Each of these is an Interface, acting as a base class with shared properties. Reasons to structure in this way include:

  1. enables treating classes of things polymorphically while leaving concrete instances' portion of state undisturbed.
  2. lowers the barrier to entry for new contributions
  3. provide blast radii for the model as a whole
  4. facilitate pruning and cardinality reduction of test surface requisite to validate changes in CI. As even casual data sets have the potential to be non-trivial in size, and potential cost, an intentional & structured approach is warranted.

base types | derived types
blogs | CNCF, thenewstack, medium.*, LinkedIn Posts, ...
boards | GH Discuss, StackOverflow
corp | crunchbase, yahoofinance
email | cncf project lists, k8s lists
packages | brew, choco, crate, deb, deno, go, maven, npm, pip, rpm
rtc | slack, discord, gitter
social | twitter, linkedin
threats | nist
learning | youtube, books, online courses (public / open only!)

Each module shall have:

  • base metadata (name, version, ...)
  • GraphQL Schema fragment
  • cypher, javascript / other expression of orchestrating growing/pruning/mutating/refactoring/... the graph
  • Description / Documentation covering entities
  • png, svg,
    • portion of the model (from arrows.app or similar) <-- used for visual diff later
    • (optional, preferred): SVG/png used for Bloom and other front ends to annotate nodes
  • sample data, patterns, and queries
  • (optional) label map providing an association between the module's own names/terms and what they might be called in the broader data model that the SGM is being loaded into. This will reduce fragility and mitigate the inevitable label name mismatches that could happen as a result of parallel development. It will also make modules more portable.
  • CI
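A module manifest covering a few of the items above might look like the following sketch. The class and field names are hypothetical; the point is to show how an optional label map could translate a module's own labels into those of the host graph.

```python
from dataclasses import dataclass, field


@dataclass
class SubGraphModule:
    """Hypothetical manifest; field names are illustrative only."""

    name: str
    version: str
    schema_fragment: str  # GraphQL SDL contributed by the module
    label_map: dict[str, str] = field(default_factory=dict)

    def host_label(self, module_label: str) -> str:
        """Translate a module-local label to the host graph's label.

        Labels without an entry in the map pass through unchanged.
        """
        return self.label_map.get(module_label, module_label)


sgm = SubGraphModule(
    name="sgm-git",
    version="0.1.0",
    schema_fragment="type Commit { sha: String! }",
    label_map={"Author": "Person"},
)
print(sgm.host_label("Author"))
print(sgm.host_label("Commit"))
```

Here a module that calls its contributor nodes `Author` can be loaded into a graph that already models them as `Person`, without editing the module itself.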

Taking this approach facilitates creation of a rich set of capabilities impacting model training, CI, and developer experience.

By using snapshots of the graph (Graph Projections TODO doc link) in a manner similar to virtual machine snapshot trees (esx, hyper-v, ...), CI can:

  • quickly set up base cases and test variations as a matrix
  • enable smart cross-SGM dependency-aware CI to be used, such as https://zuul-ci.org or similar workflows
  • enable automated ML model experimentation and training at scale
  • per-PR live instances
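Dependency-aware CI of this kind needs to know which modules a change can affect. A minimal sketch, assuming a hypothetical dependency map between modules, computes the changed module plus everything that depends on it, so only that subset is validated.

```python
# Hypothetical edges: module -> modules it depends on.
DEPS = {
    "sgm-base": set(),
    "sgm-git": {"sgm-base"},
    "sgm-github": {"sgm-git"},
    "sgm-crunchbase": {"sgm-base"},
}


def affected_by(changed, depends_on):
    """Return the changed module plus everything that depends on it, transitively."""
    # Invert the map: module -> modules that depend on it.
    dependents = {}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    affected, stack = set(), [changed]
    while stack:
        module = stack.pop()
        if module not in affected:
            affected.add(module)
            stack.extend(dependents.get(module, ()))
    return affected


print(sorted(affected_by("sgm-git", DEPS)))
```

A change to `sgm-git` would then trigger tests for `sgm-git` and `sgm-github` only, leaving `sgm-crunchbase` out of the matrix.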

We'll also benefit from a sustainable, portable, useable data model that is documented.

(TODO: update w/ final set)

.
├── blogs
│   └── sgm-blogcncf
├── boards
│   ├── sgm-ghdiscuss
│   └── sgm-stackoverflow
├── core
│   └── generated
├── corp
│   ├── sgm-crunchbase
│   └── sgm-yahoofinance
├── email
├── packages
│   ├── sgm-brew
│   ├── sgm-choco
│   ├── sgm-crate
│   ├── sgm-deb
│   ├── sgm-deno
│   ├── sgm-go
│   ├── sgm-maven
│   ├── sgm-npm
│   ├── sgm-pip
│   └── sgm-rpm
├── rtc
│   ├── sgm-discord
│   └── sgm-slack
├── social
│   ├── sgm-linkedin
│   └── sgm-twitter
├── threats
│   └── sgm-nist
└── learning
    └── sgm-youtube

ACTIVE DEVELOPMENT

Closely related to this issue is: #4 (branch)

How GraphQL Interfaces Work

https://neo4j.com/docs/graphql-manual/current/type-definitions/interfaces/#_directive_inheritance

Any directives present on an interface or its fields will be "inherited" by any object types implementing it. For example, the type definitions above could be refactored to have the @relationship directive on the actors field in the Production interface instead of on each implementing type as it is currently:

interface Production {
    title: String!
    actors: [Actor!]! @relationship(type: "ACTED_IN", direction: IN, properties: "ActedIn")
}

type Movie implements Production {
    title: String!
    actors: [Actor!]!
    runtime: Int!
}

type Series implements Production {
    title: String!
    actors: [Actor!]!
    episodes: Int!
}

interface ActedIn @relationshipProperties {
    role: String!
}

type Actor {
    name: String!
    actedIn: [Production!]! @relationship(type: "ACTED_IN", direction: OUT, properties: "ActedIn")
}

https://neo4j.com/docs/graphql-manual/current/type-definitions/interfaces/#_overriding

In addition to inheritance, directives can be overridden on a per-implementation basis. Say you had an interface defining some Content, with some basic authorization rules:

interface Content
    @auth(rules: [{ operations: [CREATE, UPDATE, DELETE], allow: { author: { username: "$jwt.sub" } } }]) {
    title: String!
    # The author is a User; this schema defines no separate Author type.
    author: [User!]! @relationship(type: "HAS_CONTENT", direction: IN)
}

type User {
    username: String!
    content: [Content!]! @relationship(type: "HAS_CONTENT", direction: OUT)
}

# Inherits the @auth rules declared on Content.
type PublicContent implements Content {
    title: String!
    author: [User!]!
}

# Overrides the inherited rules so that READ is restricted as well.
type PrivateContent implements Content
    @auth(rules: [{ operations: [CREATE, READ, UPDATE, DELETE], allow: { author: { username: "$jwt.sub" } } }]) {
    title: String!
    author: [User!]!
}
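The inherit-then-override behaviour described in the two quoted passages can be modelled in a few lines. This is a toy model of the documented semantics, not the library's implementation; directives are reduced to a name and a list of operations.

```python
def effective_directives(interface_directives, type_directives):
    """Merge directives for an implementing type.

    Directives are keyed by name; a directive declared on the type
    replaces the inherited one of the same name.
    """
    merged = dict(interface_directives)
    merged.update(type_directives)
    return merged


# Directive set on the Content interface in the example above.
content = {"auth": ["CREATE", "UPDATE", "DELETE"]}

# PublicContent declares nothing, so it inherits.
public_content = effective_directives(content, {})

# PrivateContent declares its own @auth, which replaces the inherited one.
private_content = effective_directives(
    content, {"auth": ["CREATE", "READ", "UPDATE", "DELETE"]}
)

print(public_content)
print(private_content)
```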

Core Data Model

core-png

Augment existing data model w/ learnings from github.com/community-graph

Here's the model we have today:

db-model

...and how it relates to the community-graph's model:

image

image

Here's the community-graph model:

cg

Note that the community graph data import uses the GH GraphQL api. In this project (landscape-graph) the bulk of git info will be coming from src-d/gitbase, however having the interactive / GraphQL mechanism is also useful.

  • The git model is already quite close, however the current landscape-graph model doesn't model Issues, and it probably should.

  • Add Issue to the Git Model
  • Determine what else we should take (now), and what to do in the future (new issues)
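For the interactive path mentioned above, issues can be fetched through GitHub's GraphQL API. The sketch below only builds the request body and sends nothing; the selected fields are one plausible starting set for an Issue node, not a decided model.

```python
import json

# Query against GitHub's GraphQL schema; the field selection is illustrative.
ISSUES_QUERY = """
query($owner: String!, $name: String!, $first: Int!) {
  repository(owner: $owner, name: $name) {
    issues(first: $first, states: OPEN) {
      nodes { number title createdAt author { login } }
    }
  }
}
"""


def issues_request_body(owner: str, name: str, first: int = 20) -> str:
    """Build the JSON body for a POST to GitHub's GraphQL endpoint."""
    return json.dumps(
        {
            "query": ISSUES_QUERY,
            "variables": {"owner": owner, "name": name, "first": first},
        }
    )


print(issues_request_body("cncf", "landscape-graph"))
```

The body would be sent as a POST to `https://api.github.com/graphql` with an authorization token; pagination and rate limiting are left out of this sketch.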

Create a Project Roadmap

Creating a project roadmap will give a high-level understanding to would-be contributors of where the project is and where it is going. This should be visualized and easily digestible.

Create Issue Template: "Questions we want answered"

Questions that landscape-graph can help answer (seed) are below.

Definition of Done

  • create GitHub Issue Template

  • Use it to enter these (post word smithing), and document the others we've talked to in various meetings.

    • The Graph Model (Neo4j based) I’m implementing aims to help to quantify this.
    • What companies / Vendors are employing which projects?
    • What are the resourcing trends across the LF landscapes?
    • What momentum / velocity / growth / correlations can be found?
    • (chris) Can we correlate project popularity with market cap?

CODEOWNERS strategy/design for landscape-graph

cncf/landscape-graph is home to a variety of things.

  • applications (both utility and end-user facing)
  • daily update of landscape.cncf.io, sync'd and cleaned for import to the graph
  • graphql schema definitions (source of truth)
  • data model artifacts meant for both humans and robot consumption
  • Sub-Graph Modules

CODEOWNERS to the rescue!

Tasks

  • design w/ artifact (markdown in this repo) for using it for the project
  • Generate task breakdown
  • DoIt();
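As a starting point for the design task, a CODEOWNERS file could be generated from a small rule table. The paths and team handles below are placeholders; the repository's real layout and teams may differ.

```python
# Placeholder paths and handles, not the project's actual ones.
RULES = [
    ("*", ["@example-org/maintainers"]),
    ("/schema/", ["@example-org/data-model"]),
    ("/apps/", ["@example-org/apps"]),
]


def render_codeowners(rules):
    """Render (pattern, owners) pairs as CODEOWNERS lines.

    Order matters: when several patterns match a file, the last one wins,
    so the catch-all pattern goes first.
    """
    width = max(len(pattern) for pattern, _ in rules)
    lines = [f"{pattern.ljust(width)} {' '.join(owners)}" for pattern, owners in rules]
    return "\n".join(lines) + "\n"


print(render_codeowners(RULES), end="")
```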

Create gource visualizations for all cncf project related repos

  • Determine if iterative CDC/ETL style workflow is possible.
  • POC Render the following using Envisaged Redux
    • single repo ( anything )
    • multiple repos in the same org (cncf/*)
    • multiple repos across orgs
  • Create automated mechanism (script) to drive containerized video creation at scale in matrixed GitHub Actions
  • Use landscape-graph to create videos for each project - one per repo, and one per aggregated project (across repos)
  • Create mechanism to create one YouTube playlist per project, uploading videos. Comments should have the local commands needed to run it interactively via docker.
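Rendering several repositories in one video generally means merging their Gource custom logs into a single timeline. The sketch below assumes each repository's log was already exported in Gource's pipe-delimited custom format (`timestamp|user|type|path`); the sample log lines are made up, and an optional trailing colour field is dropped.

```python
def merge_gource_logs(logs: dict[str, list[str]]) -> list[str]:
    """Merge per-repo Gource custom logs into one timeline.

    The repo name is prefixed to each path so files from different
    repos stay distinct, then all lines are sorted by timestamp.
    """
    merged = []
    for repo, lines in logs.items():
        for line in lines:
            timestamp, user, action, path = line.strip().split("|")[:4]
            new_path = f"/{repo}/{path.lstrip('/')}"
            merged.append((int(timestamp), f"{timestamp}|{user}|{action}|{new_path}"))
    merged.sort(key=lambda item: item[0])
    return [line for _, line in merged]


# Made-up sample entries for two repositories.
sample = {
    "repo-one": ["20|bob|M|/a.txt"],
    "repo-two": ["10|amy|A|/b.txt"],
}
for line in merge_gource_logs(sample):
    print(line)
```

The merged lines can be written to a file and passed to Gource as a custom log, which is what would let one render cover a whole project across repos.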

Resources for https://gource.io

Envisaged Redux

Other / Old

Envisaged
