A hackathon on entity resolution using Splink (developed by MoJ) on Azure Databricks, run by NICD, MoJ, Microsoft and Databricks.
- Add the repository to your Databricks Workspace via Repos. Paste this link when required: https://github.com/databricks-industry-solutions/splink-public-sector-hackathon.git
- Walk through the end-to-end example notebook in the `notebooks` directory to understand how Splink works.
- Use the exercise notebook as a starting point for your own work. Consider exploring different parameters or feature engineering to understand the impact on linking. Visualise the results using Python, R, or the built-in visualisation tools in Databricks Notebooks or Databricks SQL.
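
To build intuition for why parameter choices affect linking, here is a minimal sketch of Fellegi-Sunter style scoring, the probabilistic model that Splink is based on. It is plain Python and does not call Splink. The column names, the `m`/`u` values, the `prior` argument and the helper function names are all hypothetical, chosen only for illustration; the notebooks in this repository remain the reference for actual Splink usage.

```python
import math

# Hypothetical m/u probabilities per comparison column (illustrative values only).
# m = P(fields agree | the two records are a true match)
# u = P(fields agree | the two records are not a match)
PARAMS = {
    "first_name": {"m": 0.90, "u": 0.010},
    "surname": {"m": 0.95, "u": 0.005},
    "dob": {"m": 0.98, "u": 0.001},
}


def match_weight(agreements, params, prior=0.001):
    """Return the total log2 match weight for one record pair.

    `prior` is the assumed probability that two random records match.
    """
    # Start from the prior odds, expressed as a log2 weight.
    weight = math.log2(prior / (1 - prior))
    for column, agrees in agreements.items():
        m, u = params[column]["m"], params[column]["u"]
        if agrees:
            # Agreement is evidence for a match when m > u.
            weight += math.log2(m / u)
        else:
            # Disagreement is evidence against a match when m > u.
            weight += math.log2((1 - m) / (1 - u))
    return weight


def match_probability(weight):
    """Convert a log2 match weight into a match probability."""
    bayes_factor = 2 ** weight
    return bayes_factor / (1 + bayes_factor)


# One candidate pair: names agree, date of birth does not.
pair = {"first_name": True, "surname": True, "dob": False}
w = match_weight(pair, PARAMS)
print(f"match weight: {w:.2f}, match probability: {match_probability(w):.3f}")
```

Changing an `m` or `u` value, or adding an engineered column to `PARAMS`, and re-running the last lines shows how a single parameter can move a pair across a match threshold. This is the kind of effect worth looking for when experimenting in the exercise notebook.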