bite_sized_data_science's Introduction

Bite-Sized Neo4j for Data Scientists

Written by: Dr. Clair J. Sullivan, Data Science Advocate, Neo4j

Twitter: @CJLovesData1

Last updated: April 13, 2022

All notebooks can be found in notebooks/. Some videos are based strictly on Cypher queries, which can be found in cypher/.

THIS SERIES IS ON HIATUS FOR A WHILE!!!

Stay tuned to the Neo4j YouTube channel for new episodes coming soon!

Note:

The notebooks in this repository are not meant to be stand-alone and thus are not commented. They go with the videos, so you are encouraged to watch the videos first and then consult the notebooks should you wish to look at the actual code in depth.

Videos

✨ ✨ Find this video series on its own page on the Neo4j website!!! ✨ ✨

Part 1: Connect from Jupyter to a Neo4j Sandbox

Part 2: Using the py2neo Python Driver

Part 3: Using the Neo4j Python Driver

Part 4: Basic Cypher Queries (and with Google Colab)

  • This video uses a Google Colab notebook, which can be found here

Part 5: Populating the Database from Pandas

  • This video refers to a YouTube video on how to create efficient Cypher queries, which is linked in the references below.

Part 6: Populating the Database with LOAD CSV

Part 7: Populating the Database with the neo4j-admin tool

  • This video works from the command line using Docker. The shell commands are provided in GitHub gists, which can be found here.
  • The data for this part can be found in data/ (the files are got-s1-nodes.csv and got-s1-edges.csv).

Part 8: Populating the Database from a JSON file

  • This video references a JSON file I created for my NODES 2021 tutorial, "Creating a Knowledge Graph with Neo4j: A Simple Machine Learning Approach."

Part 9: Cypher Queries 2

Part 10: Creating In-Memory Graphs with Cypher Projections

Part 11: Import RDF Data from Wikidata

  • To query Wikidata, it is helpful to know how to use SPARQL. The query builder that I showed (which has several great example queries) can be found here. Wikidata also provides a good SPARQL tutorial.
  • This video shows the use of Neosemantics for importing the RDF data. See below in the References for docs on how to use it.
  • This video also very quickly demonstrates Neo4j Bloom for visualization and queries. For an in-depth look at how to use Bloom, see this video.

Part 12: Creating In-Memory Graphs with Native Projections

  • This is the sister video for Part 10, which explored the other method for creating in-memory graphs.

Part 13: Calculating Centrality

Part 14: Community Detection with the Louvain Method

Part 15: Community Detection via Weakly Connected Components

Part 16: Using Strongly Connected Components to Detect Communities

Part 17: Creating FastRP Graph Embeddings

Part 18: Putting Graph Embeddings into a Machine Learning Model

  • This video moves quickly! It will be important to read this blog post, particularly for understanding how to get the embeddings into a format for the machine learning model.
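
As a rough illustration of what "getting the embeddings into a format for the machine learning model" can involve, here is a minimal sketch in plain Python. It is not the code from the video or the blog post. The row shape (`node_id`, `embedding`, `label`) and the helper name `to_feature_matrix` are hypothetical, chosen only to show one embedding list per node being turned into a feature matrix and a label list.

```python
# Hypothetical rows, shaped like what a query returning one embedding
# per node might yield. The keys are assumptions for illustration only.
rows = [
    {"node_id": 0, "embedding": [0.1, -0.2, 0.3], "label": "a"},
    {"node_id": 1, "embedding": [0.0, 0.5, -0.1], "label": "b"},
    {"node_id": 2, "embedding": [0.4, 0.4, 0.2], "label": "a"},
]


def to_feature_matrix(rows):
    """Split query rows into a feature matrix X and a label list y."""
    X = [[float(value) for value in row["embedding"]] for row in rows]
    y = [row["label"] for row in rows]
    # Every embedding must have the same dimension to form a matrix.
    if len({len(vector) for vector in X}) > 1:
        raise ValueError("embeddings have inconsistent dimensions")
    return X, y


X, y = to_feature_matrix(rows)
```

The resulting `X` (one row per node, one column per embedding dimension) and `y` are in the list-of-rows layout that most machine learning libraries accept as input.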

Part 19: Starting with a SQL table...

  • This video is the start of a series looking at why we might want to go from SQL to a graph database
  • It is based on the graph data that can be found here
  • I use PostgreSQL for my demonstrations, but you can use your SQL of choice
  • All queries to populate your database are in ./sql_queries/part19

Part 20: ...And compare it to a graph... (2/n)

  • This video builds off of Part 19, using the same data imported into Neo4j
  • To create the CSV files used for this graph, I exported each of the tables in Part 19 directly from Postgres via pgAdmin
    • I made some tweaks of the headers to get them into Neo4j via LOAD CSV easily
    • The data files can be found in ./data
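
The header tweaks mentioned above could be done by hand or with a few lines of code. The sketch below is one possible way to do it and is not taken from the repository. The column names (`person_id`, `full_name`) and the helper `rename_headers` are hypothetical. The actual headers are in the files in ./data.

```python
import csv
import io


def rename_headers(csv_text, mapping):
    """Return csv_text with header names replaced according to mapping."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    # Only the first row (the header) is changed; data rows are untouched.
    rows[0] = [mapping.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical export with headers as a SQL tool might write them.
exported = "person_id,full_name\n1,Alice\n2,Bob\n"
cleaned = rename_headers(exported, {"person_id": "id", "full_name": "name"})
```

Short, consistent header names make the `row.name` style references in a LOAD CSV WITH HEADERS query easier to read.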

Part 21: An example of when querying a graph can be easier than SQL (3/n)

  • This video builds off of Parts 19 and 20 of this series
  • If you do not already have a Neo4j database populated with this data, follow the instructions in Part 20 or run the script ./cypher_queries/part20.cql to populate the database

Part 22: A side-by-side calculation of degree using SQL and Neo4j (4/n)

  • This video builds off of Parts 19-21 of this series
  • If you do not already have a SQL database populated with this data, use the queries in ./sql_queries/part19/
  • If you do not already have a Neo4j database populated with this data, follow the instructions in Part 20 or run the script ./cypher_queries/part20.cql to populate the database
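
For readers who want a feel for the SQL side of a degree calculation before watching, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than PostgreSQL so that it runs without a server, and it is not the query from ./sql_queries. The `edges` table, its columns, and the sample data are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (an assumption
# made so the sketch is runnable anywhere).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (source TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")],
)

# Undirected degree: count how often each node appears at either end
# of an edge.
query = """
SELECT node, COUNT(*) AS degree
FROM (
    SELECT source AS node FROM edges
    UNION ALL
    SELECT target AS node FROM edges
) AS endpoints
GROUP BY node
ORDER BY degree DESC, node
"""
degrees = dict(conn.execute(query).fetchall())
conn.close()
```

The SQL version has to stack both edge columns before counting because a node can appear as either endpoint of a relationship.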

Part 23: PageRank done two ways (5/n)

  • This video builds off of Parts 19-22 of this series
  • We will be using a very simplistic graph for this demonstration
  • The PageRank SQL query was taken from this Stack Overflow post, which was originally written for T-SQL and has been modified in this repo to work in PostgreSQL
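
To make the idea behind PageRank concrete, here is a minimal sketch of the textbook power-iteration formulation in plain Python. It is neither the SQL query from the Stack Overflow post nor the Neo4j implementation, and their scores may be scaled differently. The function name, the damping factor of 0.85, the iteration count, and the tiny example graph are all assumptions for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank on a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing edges spreads its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# A very simple directed graph: a cycle a -> b -> c -> a, plus d -> c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
scores = pagerank(edges)
```

In this example `c` ends up with the highest score because it is the only node with two incoming edges, while `d`, which nothing points to, keeps only the teleport share.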

Part 24: Why graphs? (6/6)

  • This video builds off of Parts 19-23 of this series
  • This is the final video in the mini series-within-a-series for the SQL vs. Neo4j comparisons

Part 25: Creating a graph for a Kaggle competition

Part 26: Creating a graph model of the Kaggle competition (2/n)

Part 27: Node similarity of Kaggle competition graph (3/n)

Part 28: Using KNN to identify similar items of Kaggle competition graph (4/n)

  • This video is based off of Parts 25-27
  • If you need a refresher on how to create an in-memory graph projection as is done in this video, please consult Part 12
  • In this video we will do some very basic feature engineering to explore the K-Nearest Neighbors for each article of clothing to obtain similar articles
  • (The next video will also do KNN, but using some much more sophisticated features!)
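
The idea of finding similar articles from simple feature vectors can be sketched without a database. The example below is a brute-force nearest-neighbor search using cosine similarity in plain Python. It is not the approach used in the video, and the article names, feature values, and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_neighbors(features, k):
    """Map each item to its k most similar other items (brute force)."""
    result = {}
    for item, vector in features.items():
        scored = [
            (cosine(vector, other_vector), other)
            for other, other_vector in features.items()
            if other != item
        ]
        # Highest similarity first; ties broken by name for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[item] = [name for _, name in scored[:k]]
    return result


# Hypothetical feature vectors for four articles of clothing.
features = {
    "shirt_1": [1.0, 0.0, 1.0],
    "shirt_2": [1.0, 0.1, 0.9],
    "trousers_1": [0.0, 1.0, 0.0],
    "trousers_2": [0.1, 1.0, 0.0],
}
neighbors = top_k_neighbors(features, k=1)
```

Comparing every pair like this scales quadratically with the number of items, which is fine for a toy example but is one reason dedicated KNN implementations exist for larger graphs.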

Part 29: Using KNN with more sophisticated feature vectors (5/n)

  • This video is based off of Parts 25-28

Part 30: Introducing GDS 2.0!

  • This video just scratches the surface of all of the new offerings within GDS 2.0, but focuses on the new GDS Python Client

References
