cncf / landscape-graph Goto Github PK

CNCF Landscape Graph, data model, and applications.

Home Page: https://github.com/orgs/cncf/projects/7/views/6

License: Other

TypeScript 0.01% Shell 0.23% Dockerfile 0.01% JavaScript 0.06% Cypher 0.05% Python 0.76% Jupyter Notebook 91.36% Batchfile 0.01% Roff 7.51%

cloudnative cncf cypher graph-data-science graphql javafx knowledge-graph landscape neo4j flat-data

landscape-graph's Introduction

CNCF Landscape Graph

Initial, open, active development.

Join us @ #landscape-graph. Here are our current activities. A formal plan and roadmap are in progress.

Often, we need to understand how an open source project interacts with others, how it's changing over time, and who's enabling its continued success. We want to understand what alternatives exist, or how complementary projects might be combined in purpose-fit or novel ways. We might want to dive in and contribute! This is how projects and ecosystems grow to meet business challenges facing modern organizations.

Landscape Graph Data Model

Graphs can facilitate rich analysis of our vibrant and dynamic communities, the humans they comprise, and the clusters of contribution and thought leadership they produce.

Using the data underlying the existing landscape as input, we construct a Labeled Property Graph (LPG) with Cypher (SQL for Graphs), resulting in a Neo4j graph database.
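A minimal sketch of that construction step in Python, assuming landscape.yml has already been parsed into nested dicts (categories containing subcategories containing items, each with a `name`). The node labels (`Category`, `Subcategory`, `Item`), relationship types (`IN_CATEGORY`, `IN_SUBCATEGORY`), and the function name are hypothetical placeholders, not the project's actual schema. Each statement is parameterized so it could be handed to any Neo4j client.

```python
"""Turn (an assumed shape of) parsed landscape.yml into parameterized Cypher."""

# Hypothetical, trimmed-down example of landscape.yml after YAML parsing.
SAMPLE_LANDSCAPE = {
    "landscape": [
        {
            "name": "App Definition and Development",
            "subcategories": [
                {
                    "name": "Database",
                    "items": [
                        {"name": "Neo4j", "homepage_url": "https://neo4j.com"},
                        {"name": "Example DB", "homepage_url": "https://example.com",
                         "repo_url": "https://github.com/example/db"},
                    ],
                }
            ],
        }
    ]
}


def landscape_to_cypher(landscape):
    """Return (cypher, params) tuples; MERGE keeps re-imports idempotent."""
    statements = []
    for category in landscape["landscape"]:
        cat = category["name"]
        statements.append(("MERGE (:Category {name: $name})", {"name": cat}))
        for sub in category.get("subcategories", []):
            # Subcategories are keyed by (name, category) in case names repeat.
            statements.append((
                "MATCH (c:Category {name: $category}) "
                "MERGE (s:Subcategory {name: $name, category: $category}) "
                "MERGE (s)-[:IN_CATEGORY]->(c)",
                {"category": cat, "name": sub["name"]},
            ))
            for item in sub.get("items", []):
                statements.append((
                    "MATCH (s:Subcategory {name: $subcategory, category: $category}) "
                    "MERGE (i:Item {name: $name}) "
                    "SET i.homepage_url = $homepage_url, i.repo_url = $repo_url "
                    "MERGE (i)-[:IN_SUBCATEGORY]->(s)",
                    {
                        "category": cat,
                        "subcategory": sub["name"],
                        "name": item["name"],
                        # Missing keys become null; SET with null removes the property.
                        "homepage_url": item.get("homepage_url"),
                        "repo_url": item.get("repo_url"),
                    },
                ))
    return statements


if __name__ == "__main__":
    for cypher, params in landscape_to_cypher(SAMPLE_LANDSCAPE):
        print(cypher, params)
```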

Here's the schema:

"Origin Story"

In November of 2018 there were 25 CNCF projects.

At the time, Ayrat Khayretdinov published the "Beginner's Guide to the CNCF Landscape." It opened with:

The cloud native landscape can be complicated and confusing. Its myriad of open source projects are supported by the constant contributions of a vibrant and expansive community. The Cloud Native Computing Foundation (CNCF) has a landscape map that shows the full extent of cloud native solutions, many of which are under their umbrella.

It described the CNCF Mission in these terms:

The CNCF fosters this landscape of open source projects by helping provide end-user communities with viable options for building cloud native applications. By encouraging projects to collaborate with each other, the CNCF hopes to enable fully-fledged technology stacks comprised solely of CNCF member projects. This is one way that organizations can own their destinies in the cloud.

We. Have. Grown.

Today there are 5.4 million humans using Kubernetes and the landscape continues to expand.

2022 Q2	Cards	GitHub stars (⭐)	Market cap	Funding
projects	111	614,394	$291.4 M	$29.6 M
ecosystem	1,061	3,066,372	$15.7 T	$29.1 B

We have a "good" problem

The CNCF Landscape aggregates summary data from GitHub, Crunchbase, Yahoo Finance, Twitter, and other sources, while providing the ability to quickly find, filter, and group the more than 1,000 Cards across numerous dimensions. It is automagically updated daily, and it continues to work as designed.

With a single, well-placed click, a wealth of data can be summoned. Here's the "Card" for Neo4j:

This is perfect when we know what we're looking for (specifically).

Technical TLDR

GRANDstack (https://grandstack.io)

source: grandstack.io/docs/...

GRANDstack is a combination of technologies that work together to enable developers to build data intensive full stack applications. The components of GRANDstack are:

GraphQL - A new paradigm for building APIs, GraphQL is a way of describing data and enabling clients to query it.
React - A JavaScript library for building component based reusable user interfaces.
Apollo - A suite of tools that work together to create great GraphQL workflows.
Neo4j Database - The native graph database that allows you to model, store, and query your data the same way you think about it: as a graph.

Here's how it all fits together in the context of a movie search app:

Additional tools and frameworks

TODO: #27

Component	What it is
Neo4j GraphQL Library	{neo}/product/graphql-library, (dev blog)
Neo4j Streams	{neo}/labs/kafka, {gh}/neo4j-contrib/neo4j-streams
gitbase	Git history as MySQL, src-d/gitbase
JavaFX	UI, 3d, openjfx.io
Quarkus	AoT, minify, Dev UX, quarkus.io

Graph Data Science Algorithms ("Why Neo4j?")

https://neo4j.com/developer/graph-data-science/graph-algorithms

Graph databases “perform the join on insert” rather than at query time: relationships are stored when they are created, so traversals require no joins or table scans.
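To make the idea concrete, here is a toy Python contrast that assumes nothing about Neo4j's actual storage engine: the relational-style lookup scans a relationship table at query time, while the graph-style structure attaches each relationship to its start node when it is inserted, so a traversal is a direct lookup. Names such as `FOLLOWS_ROWS` and `AdjacencyGraph` are illustrative only.

```python
from collections import defaultdict

# Relational style: relationships live in their own table of (src, dst) rows.
FOLLOWS_ROWS = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]


def followees_relational(user):
    # Work happens at query time: with no index, every row is examined.
    return [dst for src, dst in FOLLOWS_ROWS if src == user]


class AdjacencyGraph:
    """Graph style: each node keeps direct references to its neighbors."""

    def __init__(self):
        self._out = defaultdict(list)

    def add_relationship(self, src, dst):
        # Work happens at insert time: the "join" is stored with the node.
        self._out[src].append(dst)

    def followees(self, user):
        # Query time is one lookup plus walking this node's own list.
        return list(self._out.get(user, []))


graph = AdjacencyGraph()
for src, dst in FOLLOWS_ROWS:
    graph.add_relationship(src, dst)
```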

A graph data model (vs. a rectangular, relational one) lets us bring to bear all that we’ve learned from ad tech, fintech, security tech, big data, ML, etc.

Graph Data Science Algorithm Types

Docs --> https://neo4j.com/docs/graph-data-science/current

Type	Definition
Path Finding	Help find the shortest path or evaluate the availability and quality of routes
Centrality	Determine the importance of distinct nodes in a network
Community Detection	Evaluate how a group is clustered or partitioned, as well as its tendency to strengthen or break apart
Similarity	Help calculate the similarity of nodes
Topological link prediction	Determine the closeness of pairs of nodes
Node Embeddings	Compute vector representations of nodes in a graph
Node Classification	Use machine learning to predict the classification of nodes
Link Prediction	Use machine learning to predict new links between pairs of nodes
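The plain-Python sketch below illustrates three of these families on a tiny hypothetical graph (nodes `A` through `E`): breadth-first shortest path (Path Finding), normalized degree centrality (Centrality), and Jaccard neighbor similarity (Similarity). It only shows what each family computes; it is not the Neo4j GDS API, whose procedures run these algorithms at scale on in-memory graph projections.

```python
from collections import deque

# Hypothetical undirected graph, e.g. projects linked by shared contributors.
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path(adj, src, dst):
    """Path Finding: unweighted shortest path via breadth-first search."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):  # sorted for deterministic output
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # dst is unreachable


def degree_centrality(adj):
    """Centrality: each node's degree divided by the maximum possible (n - 1)."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def jaccard(adj, a, b):
    """Similarity: |N(a) & N(b)| / |N(a) | N(b)| over neighbor sets."""
    na, nb = adj.get(a, set()), adj.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


ADJ = build_adjacency(EDGES)
```

Here `shortest_path(ADJ, "A", "E")` returns `["A", "B", "D", "E"]`, `degree_centrality(ADJ)["D"]` is `0.75`, and `jaccard(ADJ, "B", "C")` is `1.0` because both nodes have exactly the neighbors `{A, D}`.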

Cypher ("SQL for Graphs")

https://github.com/opencypher/openCypher

Cypher is a declarative graph query language that allows for expressive and efficient querying, updating and administering of the graph. It is designed to be suitable for both developers and operations professionals. Cypher is designed to be simple, yet powerful; highly complicated database queries can be easily expressed, enabling you to focus on your domain, instead of getting lost in database access.

On its influences and roots:

Cypher is inspired by a number of different approaches and builds on established practices for expressive querying. Many of the keywords, such as WHERE and ORDER BY, are inspired by SQL. Pattern matching borrows expression approaches from SPARQL. Some of the list semantics are borrowed from languages such as Haskell and Python. Cypher’s constructs, based on English prose and neat iconography, make queries easy, both to write and to read.

How to Contribute

Join us @ #landscape-graph.
Review our current plan.
Help us to make this better :)

License

This repository contains data received from Crunchbase. This data is not licensed pursuant to the Apache License. It is subject to Crunchbase’s Data Access Terms, available at https://data.crunchbase.com/docs/terms, and is only permitted to be used with Linux Foundation landscape projects.

Everything else is under the Apache License, Version 2.0, except for project and product logos, which are generally copyrighted by the company that created them, and are simply cached here for reliability.

landscape-graph's People

Contributors

Stargazers

Watchers

Forkers

isabella232 lianghuiyuan alexxnica rohankmr414 atharva-shinde halcyondude sbdtu5498 solo-daemon federicobucchi kumarankit999 xonx4l

landscape-graph's Issues

GitHub Issue { labels, templates}

reach out to contributor strategy for guidance
put in place sane/rational templates as a starting point
put in place sane/rational labels as a starting point
push learnings/feedback/our templates back to tag-contributor-strategy

Enable GitHub Pages Site

cncf.github.io/landscape-graph

MVP: base data model (graph) + app

Scope for Minimum Viable Product (MVP)

#38
import --> Neo4j database (via cypher), base entities in place
#7
minimal web based UI to explore graph
JavaFX native app (running Neo4j in embedded mode)

Stretch Goals

architecture: per-component overviews

For each component of landscape-graph create a .md file articulating:

what it is
how it works
what it provides
why it was chosen

This forms the precursor to proper docs.

Scope

Create GraphQL endpoint, leveraging neo4j graphql library v3

GraphQL schema --> source of truth

Tasks

Moved to new/other issue(s):

#51
#96
constraints and indexes, full text where it makes sense (#73, #20)
use graphql to load data via mutations. (#63)

More Info

resources

App: emit maven package dependencies as build artifact

https://github.com/ferstl/depgraph-maven-plugin

Establish project governance, use good practices.

[spike] Interactive Extensibility: "sub-graph packs (sgp)"

.
├── blogs
│   └── sgp-blogcncf
├── boards
│   ├── sgp-ghdiscuss
│   └── sgp-stackoverflow
├── core
│   └── generated
├── corp
│   ├── sgp-crunchbase
│   └── sgp-yahoofinance
├── email
├── packages
│   ├── sgp-brew
│   ├── sgp-choco
│   ├── sgp-crate
│   ├── sgp-deb
│   ├── sgp-deno
│   ├── sgp-go
│   ├── sgp-maven
│   ├── sgp-npm
│   ├── sgp-pip
│   └── sgp-rpm
├── rtc
│   ├── sgp-discord
│   └── sgp-slack
├── social
│   ├── sgp-linkedin
│   └── sgp-twitter
├── threats
│   └── sgp-nist
└── videos
    └── sgp-youtube

cnab.io is a great fit.

https://github.com/cnabio/cnab-spec#cloud-native-application-bundle-specifications

Cloud Native Application Bundles (CNAB) are a package format specification that describes a technology for bundling, installing, and managing distributed applications, that are by design, cloud agnostic.

The community has created implementations of the CNCF spec with opinionated takes on authoring bundles. Some even use Duffle's libraries to handle the CNAB implementation. If you want to make your own CNAB tooling, that is a great place to start!

What companies are using which projects? What vendors support that?

Evaluate final data model for supernodes & chain locks. Implement join hints as necessary

https://neo4j.com/developer/kb/how-to-avoid-costly-traversals-with-join-hints

https://medium.com/neo4j/relationship-chain-locks-dont-block-the-rock-e8db75254b63

Infra: Engage w/ CNCF Community Infra Lab

Resources to review:

TLDR: Ask

k8s cluster(s), either bare metal (BM) or cloud hosted (EKS, GKE, AKS, ...)
GitOps workflows assumed (flux, argo, etc)

Start here: https://github.com/cncf/cluster/issues/new

This issue is something of a "spike" and should result in subsequent issues / tasks identified as Next Steps, and some understanding of timeline.

For a set of projects' contributors, who employed them whilst they contributed? Who funded those organizations? Who owns them? What else did they invest in?

Create GraphQL API endpoint

design: Sub-Graph Modules (sgm)

Sub-Graph Modules

Goals

facilitate interactive, dynamic expansion of the graph, using the core model as a nucleus and/or seed.
learn from k8s! Don't make "special" kinds of things (e.g. Pod, Deployment, Ingress) part of a "built-in" data model and then provide a separate, custom mechanism for extensibility (CRDs). Instead, structure the core data model with the same compositional mechanisms.
work with a broad, arbitrary set of targets, environments, toolchains, and compositional frameworks
enable a community to gel around this project such that work can happen safely in parallel
facilitate self-service and automation:
- modern CI (GitHub Actions) to validate that SGMs work, at the PR level
- autogenerate comprehensive documentation
- allow exploration and composition
be easily distributed using existing transports.
- OCI images are a great fit. https://github.com/cnabio/cnab-to-oci

Tasks

Types of Sub-Graph Modules (SGM)

Each of these is an Interface, acting as a base class with shared properties. Reasons to structure it this way include:

enables treating classes of things polymorphically while leaving the state of concrete instances undisturbed.
lowers the barrier to entry for new contributions
limits the blast radius of changes to the model as a whole
facilitates pruning the test surface (and reducing its cardinality) needed to validate changes in CI. Because even casual data sets can be non-trivial in size and potential cost, an intentional and structured approach is warranted.

base types	derived types
blogs	CNCF, thenewstack, medium.*, LinkedIn Posts, ...
boards	GH Discuss, StackOverflow
corp	crunchbase, yahoofinance
email	cncf project lists, k8s lists
packages	brew, choco, crate, deb, deno, go, maven, npm, pip, rpm
rtc	slack, discord, gitter
social	twitter, linkedin
threats	nist
learning	youtube, books, online courses (public / open only!)

Each module shall have:

base metadata (name, version, ...)
GraphQL Schema fragment
Cypher, JavaScript, or another means of orchestrating growing/pruning/mutating/refactoring/... the graph
Description / Documentation covering entities
png, svg:
- portion of the model (from arrows.app or similar) <-- used for visual diff later
- (optional, preferred): SVG/png used for Bloom and other front ends to annotate nodes
sample data, patterns, and queries
(optional) label map providing association between the module's own names/terms, and what they might be called in the broader data model that the SGM is being loaded into. This will reduce fragility and mitigate the inevitable label name mismatches that could result from parallel development. It'll also make modules more portable.
CI
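As a rough illustration of how a module's metadata and optional label map might fit together, here is a Python sketch. The `SubGraphModule` class, its fields, and the example labels (`Package`, `Artifact`, `Maintainer`, `Person`) are hypothetical. The only idea shown is that records arriving from a module have their labels translated into the host model's vocabulary before loading, while unmapped labels pass through unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class SubGraphModule:
    """Hypothetical in-memory form of an SGM's metadata."""

    name: str
    version: str
    base_type: str                 # e.g. "packages", "social", "threats"
    graphql_fragment: str = ""     # SDL this module contributes
    label_map: dict = field(default_factory=dict)  # module label -> host label

    def remap_labels(self, labels):
        """Translate module labels to host labels; unknown labels pass through."""
        return [self.label_map.get(label, label) for label in labels]

    def remap_node(self, node):
        """Copy a {'labels': [...], 'props': {...}} record with remapped labels."""
        return {"labels": self.remap_labels(node["labels"]), "props": dict(node["props"])}


npm = SubGraphModule(
    name="sgm-npm",
    version="0.1.0",
    base_type="packages",
    graphql_fragment="type Package { name: String! }",
    label_map={"Package": "Artifact", "Maintainer": "Person"},
)
```

Keeping the map as plain data (rather than rewriting the module's Cypher) means the module can stay unaware of whichever host model it is loaded into.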

Taking this approach facilitates creation of a rich set of capabilities impacting model training, CI, and developer experience.

By using snapshots of the graph (Graph Projections TODO doc link) in a manner similar to virtual machine snapshot trees (esx, hyper-v, ...), CI can:

quickly set up base cases and test variations as a matrix
enable smart cross-SGM dependency-aware CI to be used, such as https://zuul-ci.org or similar workflows
enable automated ML model experimentation and training at scale
per-PR live instances
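One way the dependency-aware part could work is sketched below in Python (3.9+, for the standard-library `graphlib` module). The `DEPENDS_ON` declarations are hypothetical; in practice they might come from each module's metadata. `load_order` gives an order in which SGMs could be layered onto a base snapshot, and `affected_by` selects the modules whose tests should rerun when a PR touches some of them.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency declarations: module -> modules it builds on.
DEPENDS_ON = {
    "core": set(),
    "sgm-crunchbase": {"core"},
    "sgm-yahoofinance": {"core", "sgm-crunchbase"},
    "sgm-npm": {"core"},
    "sgm-twitter": {"core"},
}


def load_order(depends_on):
    """An order in which modules can be layered onto a base snapshot.

    TopologicalSorter raises graphlib.CycleError if the declarations form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


def affected_by(changed, depends_on):
    """Changed modules plus everything that (transitively) depends on them."""
    dependents = {module: set() for module in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)
    result, stack = set(changed), list(changed)
    while stack:
        for dependent in dependents.get(stack.pop(), ()):
            if dependent not in result:
                result.add(dependent)
                stack.append(dependent)
    return result
```

With these declarations, a PR touching only `sgm-crunchbase` would rerun `sgm-crunchbase` and `sgm-yahoofinance`; each affected module could then become one entry in a CI matrix.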

We'll also benefit from a sustainable, portable, usable data model that is documented.

(TODO: update w/ final set)

.
├── blogs
│   └── sgm-blogcncf
├── boards
│   ├── sgm-ghdiscuss
│   └── sgm-stackoverflow
├── core
│   └── generated
├── corp
│   ├── sgm-crunchbase
│   └── sgm-yahoofinance
├── email
├── packages
│   ├── sgm-brew
│   ├── sgm-choco
│   ├── sgm-crate
│   ├── sgm-deb
│   ├── sgm-deno
│   ├── sgm-go
│   ├── sgm-maven
│   ├── sgm-npm
│   ├── sgm-pip
│   └── sgm-rpm
├── rtc
│   ├── sgm-discord
│   └── sgm-slack
├── social
│   ├── sgm-linkedin
│   └── sgm-twitter
├── threats
│   └── sgm-nist
└── learning
    └── sgm-youtube

ACTIVE DEVELOPMENT

Closely related to this issue is: #4 (branch)

How GraphQL Interfaces Work

https://neo4j.com/docs/graphql-manual/current/type-definitions/interfaces/#_directive_inheritance

Any directives present on an interface or its fields will be "inherited" by any object types implementing it. For example, the type definitions above could be refactored to have the @relationship directive on the actors field in the Production interface instead of on each implementing type as it is currently:

interface Production {
    title: String!
    actors: [Actor!]! @relationship(type: "ACTED_IN", direction: IN, properties: "ActedIn")
}

type Movie implements Production {
    title: String!
    actors: [Actor!]!
    runtime: Int!
}

type Series implements Production {
    title: String!
    actors: [Actor!]!
    episodes: Int!
}

interface ActedIn @relationshipProperties {
    role: String!
}

type Actor {
    name: String!
    actedIn: [Production!]! @relationship(type: "ACTED_IN", direction: OUT, properties: "ActedIn")
}

https://neo4j.com/docs/graphql-manual/current/type-definitions/interfaces/#_overriding

In addition to inheritance, directives can be overridden on a per-implementation basis. Say you had an interface defining some Content, with some basic authorization rules:

interface Content
    @auth(rules: [{ operations: [CREATE, UPDATE, DELETE], allow: { author: { username: "$jwt.sub" } } }]) {
    title: String!
    author: [User!]! @relationship(type: "HAS_CONTENT", direction: IN)
}

type User {
    username: String!
    content: [Content!]! @relationship(type: "HAS_CONTENT", direction: OUT)
}

type PublicContent implements Content {
    title: String!
    author: [User!]!
}

type PrivateContent implements Content
    @auth(rules: [{ operations: [CREATE, READ, UPDATE, DELETE], allow: { author: { username: "$jwt.sub" } } }]) {
    title: String!
    author: [User!]!
}

Core Data Model

Create homebrew formula and/or keg for graph app

How does investment flow through the Landscape? Who maintains what? Who uses it?

MVP: src-d/gitbase

Augment existing data model w/ learnings from github.com/community-graph

Here's the model we have today:

...and how it relates to the community-graph's model:

Here's the community-graph model:

Note that the community-graph data import uses the GitHub GraphQL API. In this project (landscape-graph), the bulk of git info will come from src-d/gitbase; however, having the interactive / GraphQL mechanism is also useful.

The git model is already quite close; however, the current landscape-graph model doesn't model Issues, and it probably should.

Add Issue to the Git Model
Determine what else we should take (now), and what to do in the future (new issues)
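A minimal sketch of what adding `Issue` could look like on the import side, assuming issues arrive as JSON objects from GitHub's REST issues endpoint (which uses fields such as `number`, `title`, `state`, `created_at`, and `user.login`, and also lists pull requests, marked by a `pull_request` key). The labels `Repo`, `Issue`, `User` and the relationship types `OPENED` and `IN_REPO` are hypothetical, not the current model.

```python
# Parameterized Cypher; labels and relationship types are hypothetical.
ISSUE_CYPHER = (
    "MERGE (r:Repo {full_name: $repo}) "
    "MERGE (i:Issue {repo: $repo, number: $number}) "
    "SET i.title = $title, i.state = $state, i.created_at = $created_at "
    "MERGE (u:User {login: $login}) "
    "MERGE (u)-[:OPENED]->(i) "
    "MERGE (i)-[:IN_REPO]->(r)"
)


def issue_params(repo_full_name, issue):
    """Map one GitHub REST issue object to ISSUE_CYPHER parameters.

    Returns None for pull requests, which the issues endpoint also lists.
    """
    if "pull_request" in issue:
        return None
    return {
        "repo": repo_full_name,
        "number": issue["number"],
        "title": issue.get("title"),
        "state": issue.get("state"),
        "created_at": issue.get("created_at"),
        "login": issue["user"]["login"],  # MERGE on a null login would fail
    }
```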

Make it easy for Practitioners and Ambassadors to add to our questions and scenarios

Identify communities. Understand how they interact. Comprehend how they collaborate with each other.

Reach out to Neo4j re: Enterprise License for OSS, github auth, live backups, clustering

Create project README.md

Are popularity and market cap correlated?
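A quick way to start answering this is a rank correlation. The Python sketch below computes Spearman's rho (the Pearson correlation of average ranks, so ties are handled). Choosing Spearman is an assumption: both GitHub stars and market cap are heavily skewed (compare the projects and ecosystem rows of the 2022 Q2 table), and rank correlation is less sensitive to a few extreme values than Pearson on raw numbers. Inputs would be per-Card star counts and market caps; no real data is included here.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sy = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho between, e.g., GitHub stars and market cap per Card."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    return pearson(average_ranks(x), average_ranks(y))
```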

Determine what Project Badges are appropriate and implement

For a set of projects, for all repos by release, show package dependency trees, overlaid with current CVE announcements w/ reporting and alerting as necessary.
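The core of that overlay is a reachability question: which dependency paths from a project release lead to a package with an open advisory? Below is a small Python sketch, assuming the dependency tree and advisory list are already available as dicts. Both are hypothetical here, and the advisory IDs are placeholders, not real CVEs.

```python
# Hypothetical release dependency graph: package -> direct dependencies.
DEPS = {
    "project@1.0": ["lib-a@2.1", "lib-b@0.9"],
    "lib-a@2.1": ["lib-c@1.4"],
    "lib-b@0.9": ["lib-c@1.4"],
    "lib-c@1.4": [],
}
# Placeholder advisory IDs, not real CVEs.
ADVISORIES = {"lib-c@1.4": ["CVE-PLACEHOLDER-1"]}


def vulnerable_paths(root, deps, advisories):
    """Return (path, advisory_ids) for each dependency path from root to an advised package."""
    results = []

    def walk(node, path):
        path = path + [node]
        if node in advisories:
            results.append((path, advisories[node]))
        for child in deps.get(node, []):
            if child not in path:  # skip dependency cycles
                walk(child, path)

    walk(root, [])
    return results
```

Reporting every path shows *how* a vulnerable package is pulled in (here, via both `lib-a` and `lib-b`). The number of paths can grow quickly in deep trees, so alerting would probably deduplicate per advised package.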

MVP: JavaFX landscape-graph application

https://github.com/quarkiverse/quarkus-neo4j

mvn io.quarkus.platform:quarkus-maven-plugin:2.7.5.Final:create \
    -DprojectGroupId=io.cncf \
    -DprojectArtifactId=panorama \
    -DclassName="org.acme.datasource.GreetingResource" \
    -Dextensions="neo4j,resteasy-reactive-jackson"

Quarkus has an AWESOME dev experience... it just works (IJW).

local docker dev env

Grok groupings of frequent code review <-> author interactions across projects.
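One possible starting point, sketched in Python: count reviewer/author pairs across all projects, keep the pairs that recur, and group people into weakly connected components using union-find. WCC is one of the simplest members of the Community Detection family listed above. The review events and the `min_count` threshold are hypothetical; real input would be review records pulled from GitHub for each repo.

```python
from collections import Counter

# Hypothetical review events: (project, reviewer, author).
REVIEWS = [
    ("proj-1", "u1", "u2"),
    ("proj-2", "u2", "u1"),
    ("proj-1", "u3", "u4"),
    ("proj-3", "u3", "u4"),
    ("proj-3", "u5", "u1"),
    ("proj-2", "u6", "u6"),  # self-review, ignored
]


def interaction_counts(reviews):
    """Count undirected reviewer<->author pairs, summed across all projects."""
    return Counter(frozenset((r, a)) for _, r, a in reviews if r != a)


def interaction_groups(pair_counts, min_count=2):
    """Weakly connected components over pairs seen at least min_count times (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair, count in pair_counts.items():
        if count >= min_count:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    components = {}
    for person in list(parent):
        components.setdefault(find(person), set()).add(person)
    return sorted(sorted(c) for c in components.values())
```

Raising `min_count` keeps only repeated collaborations, which trades recall for groupings that are less likely to be one-off reviews.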

Create contributors guide

Document Architecture (high level)

10k' diagram(s)
Explain diagram (summary for each component)

resources to incorporate into new contrib docs and/or landscape-graph library

videos

Create full-text indices (Lucene) for relationships' properties

...that are...

frequently used for queries
description fields (repos, orgs, user profiles, financial disclosures, etc.)

https://neo4j.com/docs/cypher-manual/current/indexes-for-full-text-search

CALL db.index.fulltext.createRelationshipIndex("taggedByRelationshipIndex",["FOLLOWS"],["date"], { eventually_consistent: "true" })
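For intuition about what such an index provides, here is a toy Python inverted index over description fields: text is tokenized, each token maps to the set of documents containing it, and a query intersects those sets. This is a deliberate simplification. Lucene-backed Neo4j full-text indexes add configurable analyzers, relevance scoring, and a richer query syntax. The document IDs and text below are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """AND semantics: ids of documents containing every query token."""
    postings = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*postings) if postings else set()


# Made-up description fields keyed by hypothetical ids.
DOCS = {
    "repo:1": "A graph database for connected data",
    "repo:2": "Cloud native service mesh",
    "org:1": "Cloud native graph analytics",
}
INDEX = build_index(DOCS)
```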

obtain Neo4j Enterprise License for OSS (clustering, bloom, metrics, ...)

Create Issue Template: "Questions we want answered"

Questions that landscape-graph can help answer (seed) are below.

Definition of Done

create GitHub Issue Template
Use it to enter these (post word smithing), and document the others we've talked about in various meetings.
- The Graph Model (Neo4j based) I’m implementing aims to help quantify this.
- What companies / Vendors are employing which projects?
- What are the resourcing trends across the LF landscapes?
- What momentum / velocity / growth / correlations can be found?
- (chris) Can we correlate project popularity with market cap?

https://medium.com/@halcyondude/on-measuring-developer-productivity-9a81a50175da

Add issue templates + PR templates

examples here https://github.com/devspace/awesome-github-templates
reach out to https://github.com/cncf/tag-contributor-strategy

CODEOWNERS strategy/design for landscape-graph

cncf/landscape-graph is home to a variety of things.

applications (both utility and end-user facing)
daily update of landscape.cncf.io, sync'd and cleaned for import to the graph
graphql schema definitions (source of truth)
data model artifacts meant for both humans and robot consumption
Sub-Graph Modules

CODEOWNERS to the rescue!

Tasks

design w/ artifact (markdown in this repo) for using it for the project
Generate task breakdown
DoIt();

Implement importing data --> model (post POC)

Create gource visualizations for all cncf project related repos

Determine if iterative CDC/ETL style workflow is possible.
POC Render the following using Envisaged Redux
- single repo ( anything )
- multiple repos in the same org (cncf/*)
- multiple repos across orgs
Create automated mechanism (script) to drive containerized video creation at scale in matrixed GitHub Actions
Use landscape-graph to create videos for each project: one per repo, and one per aggregated project (across repos)
Create mechanism to create one YouTube playlist per project, uploading videos. Comments should have the local commands needed to run it interactively via docker.


Resources for https://gource.io

Envisaged Redux

Other / Old

Envisaged
