datopian / markdowndb

JS library to turn markdown files into structured, queryable data. Build markdown-powered docs, blogs, sites and more quickly and reliably.

Home Page: https://markdowndb.com

License: MIT License

TypeScript 97.35% JavaScript 1.37% Shell 0.07% MDX 1.21%
awesomeness catalog database markdown contentlayer contentlayer-nextjs contentlayer-typescript headless-cms jamstack

markdowndb's Introduction

MarkdownDB

MarkdownDB is a JavaScript library that turns markdown files into a structured, queryable database (SQL-based or simple JSON). It helps you build rich markdown-powered sites easily and reliably. Specifically it:

  • Parses your markdown files to extract structured data (frontmatter, tags etc) and builds a queryable index either in JSON files or a local SQLite database
  • Provides a lightweight JavaScript API for querying the index and using the data in your application

Features and Roadmap

  • Index a folder of files - create a db index given a folder of markdown and other files
    • Command line tool for indexing: Create a markdowndb (index) on the command line v0.1
    • SQL(ite) index v0.2
    • JSON index v0.6
    • BONUS Index multiple folders (with support for configuring e.g. prefixing in some way e.g. i have all my blog files in this separate folder over here)
    • Configuration for Including/Excluding Files in the folder

Extract structured data like:

  • Frontmatter metadata: Extract markdown frontmatter and add in a metadata field
    • deal with casting types e.g. string, number so that we can query in useful ways e.g. find me all blog posts before date X
  • Tags: Extracts tags in markdown pages
    • Extract tags in frontmatter v0.1
    • Extract tags in body like #abc v0.5
  • Links: links between files like [hello](abc.md) or wikilink style [[xyz]] so we can compute backlinks or deadlinks etc (see #4) v0.2
  • Tasks: extract tasks like this - [ ] this is a task (See obsidian data view) v0.4

Data enhancement and validation

  • Computed fields: add new metadata properties based on existing metadata e.g. a slug field computed from title field; or, adding a title based on the first h1 heading in a doc; or, a type field based on the folder of the file (e.g. these are blog posts). cf https://www.contentlayer.dev/docs/reference/source-files/define-document-type#computedfields.
  • 🚧 Data validation and Document Types: validate metadata against a schema/type so that I know the data in the database is "valid" #55
    • BYOT (bring your own types): i want to create my own types ... so that when i get an object out it is cast to the right typescript type

Quick start

Have a folder of markdown content

For example, your blog posts. Each file can have a YAML frontmatter header with metadata like title, date, tags, etc.

---
title: My first blog post
date: 2021-01-01
tags: [a, b, c]
author: John Doe
---

# My first blog post

This is my first blog post.
I'm using MarkdownDB to manage my blog posts.

Index the files with MarkdownDB

Use the npm mddb package to index Markdown files into an SQLite database. This will create a markdown.db file in the current directory. You can preview it with any SQLite viewer, e.g. https://sqlitebrowser.org/.

# npx mddb <path-to-folder-with-your-md-files>
npx mddb ./blog

Watching for Changes

To monitor files for changes and update the database accordingly, simply add the --watch flag to the command:

npx mddb ./blog --watch

This command will continuously watch for any modifications in the specified folder (./blog), automatically rebuilding the database whenever a change is detected.

Query your files with SQL...

E.g. get all the files with tag a.

SELECT files.*
FROM files
INNER JOIN file_tags ON files._id = file_tags.file
WHERE file_tags.tag = 'a'

...or using MarkdownDB Node.js API in a framework of your choice!

Use our Node API to query your data for your blog, wiki, docs, digital garden, or anything you want!

Install mddb package in your project:

npm install mddb

Now, once the data is in the database, you can add the following script to your project (e.g. in /lib folder). It will allow you to establish a single connection to the database and use it across your app.

// @/lib/mddb.mjs
import { MarkdownDB } from "mddb";

const dbPath = "markdown.db";

const client = new MarkdownDB({
  client: "sqlite3",
  connection: {
    filename: dbPath,
  },
});

const clientPromise = client.init();

export default clientPromise;

Now, you can import it across your project to query the database, e.g.:

import clientPromise from "@/lib/mddb";

const mddb = await clientPromise;
const blogs = await mddb.getFiles({
  folder: "blog",
  extensions: ["md", "mdx"],
});

Computed Fields

This feature helps you define functions that compute additional fields you want to include.

Step 1: Define the Computed Field Function

First, define a function that computes the additional field you want to include. In this example, we have a function named addTitle that extracts the title from the first heading in the AST (Abstract Syntax Tree) of a Markdown file.

const addTitle = (fileInfo, ast) => {
  // Find the first header node in the AST
  const headerNode = ast.children.find((node) => node.type === "heading");

  // Extract the text content from the header node
  const title = headerNode
    ? headerNode.children.map((child) => child.value).join("")
    : "";

  // Add the title property to the fileInfo
  fileInfo.title = title;
};

Step 2: Indexing the Folder with Computed Fields

Now, use the client.indexFolder method to scan and index the folder containing your Markdown files. Pass the addTitle function in the computedFields option array to include the computed title in the database.

client.indexFolder({
  folderPath: "PATH_TO_FOLDER",
  customConfig: { computedFields: [addTitle] },
});
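
As a further illustration, here is a minimal sketch of a second computed field with the same `(fileInfo, ast)` shape. It assumes the AST is an mdast-style tree (nodes with `type`, optional `value`, optional `children`). The names `collectText`, `addWordCount` and `wordCount` are hypothetical and not part of MarkdownDB; the tree is built by hand so the snippet runs on its own.

```javascript
// Hypothetical helper: recursively gather the text of a node and its descendants.
// Assumes mdast-style nodes: { type, value?, children? }.
const collectText = (node) => {
  if (typeof node.value === "string") return node.value;
  if (Array.isArray(node.children)) {
    return node.children.map(collectText).join(" ");
  }
  return "";
};

// Hypothetical computed field: stores a rough word count on fileInfo.
const addWordCount = (fileInfo, ast) => {
  const words = collectText(ast).split(/\s+/).filter(Boolean);
  fileInfo.wordCount = words.length;
};

// Hand-built tree standing in for "# My *first* post" followed by "Hello world."
const ast = {
  type: "root",
  children: [
    {
      type: "heading",
      depth: 1,
      children: [
        { type: "text", value: "My " },
        { type: "emphasis", children: [{ type: "text", value: "first" }] },
        { type: "text", value: " post" },
      ],
    },
    { type: "paragraph", children: [{ type: "text", value: "Hello world." }] },
  ],
};

const fileInfo = {};
addWordCount(fileInfo, ast);
console.log(fileInfo.wordCount); // 5
```

Because the helper recurses into children, text nested inside emphasis or links is counted too. Joining with a space is a simplification: a word split by inline formatting would be counted as several words.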

Configuring markdowndb.config.js

  • Implement computed fields to dynamically calculate values based on specified logic or dependencies.
  • Specify the patterns for including or excluding files in MarkdownDB.

Example Configuration

Here's an example markdowndb.config.js with custom configurations:

export default {
  computedFields: [
    (fileInfo, ast) => {
      // Your custom logic here
    },
  ],
  include: ["docs/**/*.md"], // Include only files matching this pattern
  exclude: ["drafts/**/*.md"], // Exclude those files matching this pattern
};

(Optional) Index your files in a prebuild script

{
  "name": "my-mddb-app",
  "scripts": {
    ...
    "mddb": "mddb <path-to-your-content-folder>",
    "prebuild": "npm run mddb"
  },
  ...
}

With Next.js project

For example, in your Next.js project's pages, you could do:

// @/pages/blog/index.js
import React from "react";
import clientPromise from "@/lib/mddb.mjs";

export default function Blog({ blogs }) {
  return (
    <div>
      <h1>Blog</h1>
      <ul>
        {blogs.map((blog) => (
          <li key={blog.id}>
            <a href={blog.url_path}>{blog.title}</a>
          </li>
        ))}
      </ul>
    </div>
  );
}

export const getStaticProps = async () => {
  const mddb = await clientPromise;
  // get all files that are not marked as draft in the frontmatter
  const blogFiles = await mddb.getFiles({
    frontmatter: {
      draft: false,
    },
  });

  const blogsList = blogFiles.map(({ metadata, url_path }) => ({
    ...metadata,
    url_path,
  }));

  return {
    props: {
      blogs: blogsList,
    },
  };
};

API reference

Queries

Retrieve a file by URL path:

mddb.getFileByUrl("urlPath");

Currently used file path -> url resolver function:

const defaultFilePathToUrl = (filePath: string) => {
  let url = filePath
    .replace(/\.(mdx|md)/, "") // remove file extension
    .replace(/\\/g, "/") // replace windows backslash with forward slash
    .replace(/(\/)?index$/, ""); // remove index at the end for index.md files
  url = url.length > 0 ? url : "/"; // for home page
  return encodeURI(url);
};

🚧 The resolver function will be configurable in the future.
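
To see what the resolver shown above produces, the snippet below runs a plain JavaScript copy of it (type annotation removed) on a few sample paths. The sample paths are made up for illustration.

```javascript
// JavaScript copy of the resolver shown above.
const defaultFilePathToUrl = (filePath) => {
  let url = filePath
    .replace(/\.(mdx|md)/, "") // remove file extension
    .replace(/\\/g, "/") // replace windows backslash with forward slash
    .replace(/(\/)?index$/, ""); // remove index at the end for index.md files
  url = url.length > 0 ? url : "/"; // for home page
  return encodeURI(url);
};

// Made-up sample paths, relative to the content folder.
const samples = [
  "index.md",
  "blog/index.md",
  "blog\\my-post.mdx",
  "notes/my post.md",
];

const urls = samples.map(defaultFilePathToUrl);
console.log(urls); // [ '/', 'blog', 'blog/my-post', 'notes/my%20post' ]
```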

Retrieve a file by its database ID:

mddb.getFileById("fileID");

Get all indexed files:

mddb.getFiles();

By file types:

You can specify type of the document in its frontmatter. You can then get all the files of this type, e.g. all blog type documents.

mddb.getFiles({ filetypes: ["blog", "article"] }); // files of either blog or article type

By tags:

mddb.getFiles({ tags: ["tag1", "tag2"] }); // files tagged with either tag1 or tag2

By file extensions:

mddb.getFiles({ extensions: ["mdx", "md"] }); // all md and mdx files

By frontmatter fields:

You can query by multiple frontmatter fields at once.

At the moment, only exact matches are supported. However, false values do not need to be set explicitly. I.e. if you set draft: true on some blog posts and want to get all the posts that are not drafts, you don't have to explicitly set draft: false on them.

mddb.getFiles({
  frontmatter: {
    key1: "value1",
    key2: true,
    key3: 123,
    key4: ["a", "b", "c"], // this will match exactly ["a", "b", "c"]
  },
});
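
The matching rule described above (exact matches, with a missing field treated like `false`) can be pictured with a small standalone sketch. This is one reading of the rule, not MarkdownDB's implementation. `sameValue`, `matchesFrontmatter` and the sample data are hypothetical.

```javascript
// Illustrative only: one possible reading of "exact match, missing counts as false".
const sameValue = (a, b) => JSON.stringify(a) === JSON.stringify(b);

const matchesFrontmatter = (metadata, query) =>
  Object.entries(query).every(([key, expected]) => {
    const actual = metadata[key];
    // A field that is absent is treated like an explicit `false`.
    if (expected === false) return actual === undefined || actual === false;
    return sameValue(actual, expected);
  });

// Made-up frontmatter of two posts.
const posts = [
  { title: "Published", tags: ["a", "b", "c"] },
  { title: "Draft", draft: true, tags: ["a", "b"] },
];

const published = posts.filter((p) => matchesFrontmatter(p, { draft: false }));
const exactTags = posts.filter((p) =>
  matchesFrontmatter(p, { tags: ["a", "b", "c"] })
);

console.log(published.map((p) => p.title)); // [ 'Published' ]
console.log(exactTags.map((p) => p.title)); // [ 'Published' ]
```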

By folder:

Get all files in a subfolder (path relative to your content folder).

mddb.getFiles({ folder: "path" });

Combined conditions:

mddb.getFiles({ tags: ["tag1"], filetypes: ["blog"], extensions: ["md"] });

Retrieve all tags:

mddb.getTags();

Get links (forward or backward) related to a file:

mddb.getLinks({ fileId: "ID", direction: "forward" });

Architecture

graph TD

markdown --remark-parse--> st[syntax tree]
st --extract features--> jsobj1[TS Object eg. File plus Metadata plus Tags plus Links]
jsobj1 --computing--> jsobj[TS Objects]
jsobj --convert to sql--> sqlite[SQLite markdown.db]
jsobj --write to disk--> json[JSON on disk in .markdowndb folder]
jsobj --tests--> testoutput[Test results]

markdowndb's People

Contributors

camargomau, github-actions[bot], mohamedsalem401, olayway, rufuspollock

markdowndb's Issues

Module not found: Can't resolve 'better-sqlite3'

Running a NextJs app on a Turborepo gives Module not found issues from knex

Module not found: Can't resolve 'better-sqlite3'

adding better-sqlite3 to package.json will keep asking for the next missing packages (tedious, mysql, mysql2, oracle, etc)

Import trace for requested module:
../../node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/knex/lib/dialects/index.js
../../node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/knex/lib/knex-builder/internal/config-resolver.js
../../node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/knex/lib/knex-builder/Knex.js
../../node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/knex/lib/index.js
../../node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/knex/knex.js
../../node_modules/.pnpm/[email protected][email protected]/node_modules/mddb/dist/src/lib/markdowndb.js
../../node_modules/.pnpm/[email protected][email protected]/node_modules/mddb/dist/src/index.js
./src/providers/markdowndb.ts
./src/app/page.tsx

Computed metadata fields

Computed fields (or just any operations on incoming records)

cf https://www.contentlayer.dev/docs/reference/source-files/define-document-type#computedfields

When loading a file i want to create new fields based on some computation so that i can have additional metadata

  • Add a type based on the folder of the file so that i can label blog posts
  • Add layout based on the folder so i can change layouts based on folder
  • compute title frontmatter from first heading

More Examples:

  • Generate keywords based on the content of the file.
  • Label blog posts based on the folder structure.
  • Count the number of words in the content.
  • Estimate the reading time based on the content.
  • Detect the language of the content.

Acceptance

  • Design sketch with API/UX for users of lib in README or similar
  • Simple example in examples folder just a single js file like index.js
import indexFolder from markdowndb

files = indexFolder(../../__mock__, computedFields=...)

console.log(files[0].readingTime)

Tasks

  • sketch out the design before implementing
  • ...

Design

We can achieve this by allowing users to define custom fields using JavaScript functions. Each function takes the following information as parameters:

  • File path
  • File metadata (e.g., tags, title, ...)
  • File body content
  • File type (e.g., blog, ...)

The custom field generated by the function can then be added to the document scheme.

Example Implementation

// User-defined function to generate keywords based on content
function generateKeywords(filePath, metadata, content, fileType) {
  const keywords = ....
  return { keywords };
}

// Document scheme with custom field added
const documentScheme = {
  keywords: { value: generateKeywords }, // this will be executed
};

In this example, the generateKeywords function is a placeholder for the user-defined function. Users can implement similar functions to automate the generation of custom fields based on their specific requirements. The custom field is then added to the document scheme for validation and type declaration.

Notes

Config file with basic content include/exclude options

Acceptance ✅19-12-2023 #97

A config file that allows for setting:

  • path to content directory (defaults to ./content)
  • files/directories to include
  • files/directories to exclude
  • 🏆 npx markdowndb init CLI script that will bootstrap the config file (this is not needed)

[parse] Extract Links

I want to query for forward links and back links to files both markdown and non-markdown so that i can display back links, a network graph, deadlinks etc

  • standard markdown links
  • obsidian wiki links
  • embeds of files e.g. ![...]
    • wiki link embeds of files

So all of these

[...](...)
![...](...)
[[...]]   # wiki link style
[[...|my title]]   # wiki link style including with title
![[...]]  # for images and other embeds
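
As a rough illustration of telling these five forms apart, here is a regex-based sketch. The issue itself leans towards a remark plugin or the remark AST; regexes are swapped in here only to keep the example self-contained. They ignore code blocks, escaped brackets and link titles. `extractLinks` and the `style` field are hypothetical names.

```javascript
// Simplified patterns; a real implementation would more likely walk the remark AST.
const MD_LINK = /(!?)\[([^\]]*)\]\(([^)\s]+)\)/g; // [text](href) and ![alt](src)
const WIKI_LINK = /(!?)\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/g; // [[x]], [[x|title]], ![[x]]

// Hypothetical helper: returns one object per link found in the text.
function extractLinks(markdown) {
  const links = [];
  for (const m of markdown.matchAll(MD_LINK)) {
    links.push({ toRaw: m[3], text: m[2], embed: m[1] === "!", style: "markdown" });
  }
  for (const m of markdown.matchAll(WIKI_LINK)) {
    links.push({ toRaw: m[2], text: m[3] ?? "", embed: m[1] === "!", style: "wiki" });
  }
  return links;
}

const sample = [
  "[hello](world)",
  "![diagram](img/arch.png)",
  "[[xyz]]",
  "[[xyz|my title]]",
  "![[photo.jpg]]",
].join("\n");

const found = extractLinks(sample);
console.log(found.map((l) => `${l.style}:${l.toRaw}:${l.embed}`));
```

Markdown-style links are listed first and wiki-style links second, so the result is not in document order.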

Acceptance

  • Support internal links (i.e. ../abc or /abc/ or abc)
  • Supports all link types
    • markdown
    • wikilinks
    • markdown embeds
    • wikilink embeds

Bonus

  • convenience query functions (or at least a tutorial on select functions to run)
  • Support both with markdown file extension and without ✅2023-11-17 ❌ does not make sense

Tasks

  • first breaking test
  • code for link extraction from a markdown file (remark plugin or at least uses remark ast?)
  • table in database of links
  • query functions

Design - new

Why do i want links?

  • Backlinks: list all files that link to this file
  • Network graph: do a network graph
  • Pages that don't yet exist (or deadlinks): list all files that are linked to but don't exist yet ...
  • Deadlinks: is this image used anywhere?
getNonExistentFiles
getAllFilesThatLinkTo(fileOrUrl)
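
A minimal in-memory sketch of what these two helpers could look like, assuming each link record carries `from` and `to` paths and that a list of indexed file paths is available. The data and implementations are hypothetical and only illustrate the idea.

```javascript
// Hypothetical indexed files and link records.
const files = ["abc/foobar.md", "abc/world.md"];
const links = [
  { from: "abc/foobar.md", to: "abc/world.md" },
  { from: "abc/foobar.md", to: "abc/missing.md" },
  { from: "abc/world.md", to: "abc/foobar.md" },
];

// Backlinks: every distinct file that has a link pointing at `target`.
const getAllFilesThatLinkTo = (target) => [
  ...new Set(links.filter((l) => l.to === target).map((l) => l.from)),
];

// Dead links: every distinct link target that is not an indexed file.
const getNonExistentFiles = () => {
  const known = new Set(files);
  return [...new Set(links.map((l) => l.to).filter((to) => !known.has(to)))];
};

console.log(getAllFilesThatLinkTo("abc/world.md")); // [ 'abc/foobar.md' ]
console.log(getNonExistentFiles()); // [ 'abc/missing.md' ]
```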

Architecture

Design

Crudely: [hello](world) becomes ...

// note we implicitly 
interface Link {
  from: "abc/foobar"; // File path for the source file
  to: "abc/world.md"; // points to the end of the link (can be an external)
  toRaw: "world"; // the raw link href
  text: "hello";
  embed: false; // is it an embed link (default: false)
  internal: true; // default true (external means http etc - not inside the contentbase)
}

Analysis

For each of the following markdown in document abc/foobar.md what is the output link object

[hello](world)

=>

{
  toRaw: 'world', // raw link to
  text: 'hello',
  embed: false, // is it an embed link (default: false)
  internal: true, // default true (external means http etc - not inside the contentbase)
  from: "abc/foobar", // File path for the source file
  to: "abc/world.md" // points to the end of the link (can be an external)
}

This markdown

[hello](world.png)

=>

{
  toRaw: 'world.png', // raw link to
  text: 'hello',
  embed: false, // is it an embed link (default: false)
  internal: true, // default true (external means http etc - not inside the contentbase)
  from: "abc/foobar", // File path for the source file
  to: "abc/world.png" // points to the end of the link (can be an external)
}

This markdown

[hello](world.mdx)

=>

{
  toRaw: 'world.mdx',
  text: 'hello',
  embed: false,
  internal: true,
  from: "abc/foobar",
  to: "abc/world.mdx"
}

This markdown

[hello](/world)

=>

{
  toRaw: '/world', // raw link to
  text: 'hello',
  embed: false,
  internal: true,
  from: "abc/foobar",
  to: "abc/world"
}

This markdown

![hello](world.png)

=>

{
  toRaw: 'world.png', // raw link to
  text: 'hello',
  embed: true, 
  internal: true, 
  from: "abc/foobar",
  to: "abc/world.png"
}

![](world.png)

=>

{
  toRaw: 'world.png', // raw link to
  text: '', // no alt text was given
  embed: true, // is it an embed link (default: false)
  internal: true, // default true (external means http etc - not inside the contentbase)
  from: "abc/foobar", // File path for the source file
  to: "abc/world.png" // points to the end of the link (can be an external)
}

Question:

  • how do we reference another file? what is its primary key
    • obvious answer is path ...
      • but path relative to what? (relative to the root of the tree ... )
        • but what is the root of the tree? ... we know that when parsing ...

Please add "db", "sql" and "database" to about.

I was so excited to find this 2 days ago and forgot to star it and had more trouble than I should have finding it again. github search is terrible and these keywords would likely help this project get more visibility.

[inbox] MarkdownDB vision stuff (Rufus)

2023-09-23

MarkdownDB wiki vision stuff

Subject 1: MarkdownDB vision

Subject 2: How to build a directory of stuff in markdown (as a MarkdownDB) with Obsidian (and git)

Subject 3: Why build a directory using markdown vs e.g. google sheets?

Subject 2a: How to use markdown-based approach for our DDP or Erasmus directory

Question: how do i create and contribute to a markdown-based directory

  • How to edit
  • How to collaborate (drive sync + git backups as extra)
  • How to publish?

Outflow

  • MarkdownDB as a pattern vs as a tool
  • Why better than X e.g. google sheets, airtable etc
    • And why is it worse?
    • => when to use markdowndb vs a full spreadsheet
  • For Life Itself mapping projects do we have "one big" database or several smaller databases?
    • Depends on whether there is overlap ... and in what way
      • Is there overlap?
      • What overlap is there?

Terminology

  • Markdown: a simple format for writing markup e.g. bold, italics, headings in a raw text file
  • A raw text file: a file made of simple ascii or unicode characters.
  • Document: a single markdown file e.g. my-file.md
  • Frontmatter: a convention for storing structured metadata in a Document (at the start hence "front"). Usually uses a format called yaml (common alternatives are json or toml)
  • Vault: Collection of files

MarkdownDB fails with more than 500 files in a top directory

Steps to reproduce

Try indexing this in markdowndb, this is the outcome
image

I've also tried indexing this directory on a fresh flowershow app, following this guide https://flowershow.app/docs/publish-tutorial and on the export section it will silently fail, but it will build again if i remove that dir

Possible problem

I'm pretty sure the problem lies somewhere here. From what I could gather on Stack Overflow, sqlite has a limit on the batch insert of 500, so a possible fix would be to slice the filesToInsert variable in batches of 499 and then insert each batch separately.
https://github.com/datopian/markdowndb/blob/main/src/lib/markdowndb.ts#LL181C4-L181C4
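
The slicing idea can be sketched as below. `chunk` is a hypothetical helper and the 1,200 fake records only stand in for `filesToInsert`; the actual insert call is left out because it depends on the library internals.

```javascript
// Hypothetical helper: split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Stand-in for the rows to insert: 1,200 fake file records.
const filesToInsert = Array.from({ length: 1200 }, (_, i) => ({ _id: String(i) }));

const batches = chunk(filesToInsert, 499);
console.log(batches.map((b) => b.length)); // [ 499, 499, 202 ]
// Each batch would then be inserted on its own, one after another.
```

Since the import trace in the first issue shows the library is built on Knex, its `batchInsert` utility, which takes a chunk size, may be an alternative to hand-rolled slicing.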

Publish to markdowndb on npm

We are already publishing to @flowershow/markdowndb ... and would like to publish to markdowndb. However, markdown-db is in use ...

Tasks

  • Ping current owner of markdown-db to see if there is a way we can use markdowndb ✅2023-05-02 pinged him
  • See what happens

Pages not generated for files with specific names in content folder

Adding a markdown file or page in the content directory starting with the word "obsidian" does not generate a path in markdown.db thus rendering a 404 page.

Acceptance

  • pages with word "obsidian" generate and render

Tasks

  • fix regex for ignore path patterns
  • ...

Notes:

The ignore regex pattern in scripts/mddb may be the underlying issue -

const ignorePatterns = [/Excalidraw/, /.obsidian/, /DS_Store/];
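
A quick check of why this pattern could swallow such files: in a regular expression an unescaped `.` matches any character, so `/.obsidian/` also matches `/obsidian` in the middle of a path. The sketch below compares the original list with one possible fix. The sample paths, the `isIgnored` helper and the escaped pattern are assumptions for illustration.

```javascript
// Patterns as quoted in the note above.
const original = [/Excalidraw/, /.obsidian/, /DS_Store/];

// Assumed fix: escape the dot and only match ".obsidian" as a whole path segment.
const escaped = [/Excalidraw/, /(^|\/)\.obsidian(\/|$)/, /DS_Store/];

// Hypothetical helper: a path is ignored if any pattern matches it.
const isIgnored = (path, patterns) => patterns.some((re) => re.test(path));

const paths = ["content/obsidian-tips.md", "content/.obsidian/app.json"];

const before = paths.map((p) => isIgnored(p, original));
const after = paths.map((p) => isIgnored(p, escaped));

console.log(before); // [ true, true ]   the regular page is wrongly ignored
console.log(after); // [ false, true ]  only the .obsidian folder is ignored
```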

[epic] MarkdownDB Index and Library v1

A database of markdown files so that you can quickly access the metadata and content you want.

  • All metadata including frontmatter, links, tags, tasks etc
  • Auto-reloading
  • Super simple javascript API

Bonus

  • Can generate sqlite so you get full sql access (if you want)

Non-features

  • Does not index the full-text content

Re Flowershow: Use this to replace contentlayer.dev.

See https://datahub.io/notes/markdowndb

Acceptance aka Roadmap

  • POC covering basic extraction etc #6
  • #5
  • #2 - specifically parser plugins

Feature list

Marketing

Features

Index a folder of files - create a "DB" index from a folder of markdown files (and other files including images)

  • Index a folder and get JS/TS objects
  • Index a folder and get json output
  • BONUS Index multiple folders (with support for configuring e.g. prefixing in some way e.g. i have all my blog files in this separate folder over here)
  • Command line tool for indexing: Create a markdowndb (index) on the command line
  • Index a folder and get SQLite

Extract structured data like:

  • Frontmatter metadata: Extract markdown frontmatter and add in a metadata field
  • Tags: Extracts tags in markdown pages
    • Extract tags in frontmatter
    • Extract tags in body like #abc #49
  • Links: links between files like [hello](abc.md) or wikilink style [[xyz]] so we can compute backlinks or deadlinks etc (see #4)
  • #60

Data types, data enhancement and validation

  • Computed fields: add new metadata properties based on existing metadata e.g. a slug field computed from title field; or, adding a title based on the first h1 heading in a doc; or, a type field based on the folder of the file (e.g. these are blog posts). cf https://www.contentlayer.dev/docs/reference/source-files/define-document-type#computedfields. #54
  • Data validation and Document Types: validate metadata against a schema/type so that I know the data in the database is "valid" #55
    • deal with casting types e.g. string, number so that we can query in useful ways e.g. find me all blog posts before date X
    • BYOT (bring your own types): i want to create my own types ... so that when i get an object out it is cast to the right typescript type

Inbox

Marketing

Sections on front page about major features

  • Have a section on front page about links feature
  • Have a section for tags
  • etc

💤

  • Refactor: improve our interfaces, do something similar to CachedMetadata and CachedFile
  • "multi-thread" support for fast indexing

Misc

  • ➕ 2023-03-15 Add layout e.g. layout: blog as a rule in markdown db loading rather than in getStaticPaths for rendering blogs (follow up to work in datopian/datahub-next#51) ⛔2023-03-17 on having markdowndb support for rules

Rufus random notes

  • how can we get type stuff like contentlayer has e.g. a given type in markdown frontmatter leads to use of X typescript type/interface
  • check out astro-build - how do they do type stuff?

Notes

Questions

  • What is a ContentBase / ContentDB? ✅2023-03-07 a database (index) of content e.g. of text files on disk, images etc. DB need not store content of files but it "indexes" them i.e. has a list of them, with associated metadata etc.
  • Why do we need one? ✅2023-03-07 a) to replace this (basic) functionality in ContentLayer.dev so we can replace ContentLayer.dev b) so we can do richer things like get files with all tags etc
    • What contentlayer.dev API calls do we need to replace? ✅2023-03-07 ~8 of them. quite simple. see below.
  • What is the difference between a Content Layer (API) and a ContentBase?
  • What are the key technical components of a ContentBase? ✅2023-03-07 see diagram
  • What is MarkdownDB? ✅2023-03-07 It is a ContentBase whose text files are in markdown format
  • What information do we index about markdown files in ContentBase? ✅2023-03-07
    • frontmatter
    • list of all blocks and their types?
    • tags?
  • What is the unique identifier for files?
  • What are the job stories that the MarkdownDB needs to support? 🔥
  • What about assets other than markdown files? e.g. images and pngs? ✅2023-03-07 these should also get processed.
  • Does something like this already exist and how does it work?
  • How big will the sqlite db get? (i.e. per 1k documents indexed) NB: we aren't storing the text ... (though perhaps we could ...) 🚧2023-03-07 guess metadata is ~1kb per file. so 1k files = 1MB and 100k files = 100MB so seems ok for memory
  • What happens if the sqlite file gets really big? ✅2023-03-07 we'd probably have to store it somewhere in cloud etc
  • What DB should we use e.g. IndexedDB or sqlite? ✅2023-03-07 propose sqlite3 b/c you get sql etc and now pretty much supported in browser if we ever need that
  • How do we handle the indexing of remote files, such as files in GitHub repos? ✅2023-03-07 ❌ kind of invalid question. we can index the remote files easily and then cache that locally. We aren't indexing on the fly.
    • Do we just store a reference to that file?
  • What's a minimal viable API? 🚧2023-03-08 see section below

Notes on obsidian dataview API

blacksmithgu/obsidian-dataview#1811

How to handle document types 2023-03-09

I'm not sure how we want to handle types, since having it as a frontmatter field might not be the most ideal way because if we had a blog folder we'd have to add the type metadata to all the files individually.

On contentlayer.dev it uses a filePathPattern for that:

const Blog = defineDocumentType(() => ({
  name: "Blog",
  filePathPattern: `${siteConfig.blogDir}/!(index)*.md*`,
  contentType: "mdx",
  fields: {
  ...

I believe that's a good way of handling this. The caveat is that the path of a file is now determining its type and therefore folders with mixed types are impossible, although we could apply the pattern as something like *.blog.md*.

The use case I'm imagining is something like (there are probably better examples than blog):

blogs
  my-first-post.blog.mdx    // Blog type
  my-second-post.blog.mdx     // Blog type 
  index.mdx    // Generic page type 
  about-our-authors.mdx    // Generic page type
  write-for-us.contact.mdx    // Generic contact type                   
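
A small sketch of how such a naming convention could be turned into a type, assuming a fixed list of known type names so that ordinary dotted file names are not misread. `inferType`, `KNOWN_TYPES` and the `"page"` fallback are hypothetical and not part of MarkdownDB or contentlayer.

```javascript
// Hypothetical list of type suffixes recognised in file names.
const KNOWN_TYPES = new Set(["blog", "contact"]);

// Hypothetical helper: infer a document type from "<name>.<type>.md" or ".mdx".
function inferType(filePath, fallback = "page") {
  const fileName = filePath.split("/").pop();
  const match = /\.([a-z0-9-]+)\.mdx?$/i.exec(fileName);
  if (match && KNOWN_TYPES.has(match[1].toLowerCase())) {
    return match[1].toLowerCase();
  }
  return fallback;
}

const examples = [
  "blogs/my-first-post.blog.mdx",
  "blogs/index.mdx",
  "blogs/write-for-us.contact.mdx",
  "blogs/release-1.2.md",
];

const types = examples.map((p) => inferType(p));
console.log(types); // [ 'blog', 'page', 'contact', 'page' ]
```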

How could we index frontmatter into our db? 2023-03-09

My idea is to have another table for frontmatter, something like:

| file_id | field | value | (maybe) type: array or string |
| --- | --- | --- | --- |
| d9fc09 | title | My new post | string |

file_id should be a foreign key pointing to file._id.

To increase performance, since we are going to have many more rows now, we can create a DB index on this table (using the file_id field)
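
To make the proposed table concrete, the sketch below flattens one file's frontmatter into rows of that shape. `frontmatterToRows` is a hypothetical helper. Storing arrays as JSON text, and recording types other than "array" and "string", are assumptions beyond what the note proposes.

```javascript
// Hypothetical helper: one row per frontmatter field, shaped like the table above.
function frontmatterToRows(fileId, frontmatter) {
  return Object.entries(frontmatter).map(([field, value]) => ({
    file_id: fileId,
    field,
    // Assumption: arrays are serialised to JSON so they fit in one text column.
    value: Array.isArray(value) ? JSON.stringify(value) : String(value),
    type: Array.isArray(value) ? "array" : typeof value,
  }));
}

const rows = frontmatterToRows("d9fc09", {
  title: "My new post",
  tags: ["a", "b"],
  draft: false,
});

console.log(rows);
```

With rows in this shape, a lookup by frontmatter field becomes a filter on `field` and `value`, joined back to the files table through `file_id`.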

If done this way we are going to be able to query mdx files using frontmatter fields. E.g: (may not be exactly this)

MyMdDb.query({ tags: [economy], frontmatter: { author: 'João' } })

Could not find a declaration file for module 'mddb'

I'm having this TypeScript issue using the client on a NextJs app with type: module set in package.json

// @ts-expect-error Could not find a declaration file for module 'mddb'
import { MarkdownDB } from "mddb";

MarkdownDB README with features, motivation, brief tutorial

New user: When coming to MarkdownDB I want to know (briefly) what it is, what features it has (existing or future) and a brief tutorial on use.

We could port material from https://datahub.io/notes/markdowndb especially the job stories (they could become features section).

  • Note one clarification re https://datahub.io/notes/markdowndb is that there is an ambiguity in use between markdowndb as the database itself and markdowndb as a library (index and API) for accessing a markdowndb. Our package is the latter and so for us the description of "markdowndb" is something like: "A javascript library for treating a directory of markdown files as a database" (or a better version thereof).

Emphasize: fine if this is a rough cut, we will rapidly iterate. better to get something down ...

Also, I find that starting from a tutorial is easier (a short example is worth a thousand descriptions), e.g.

On disk you have:

```
my-blog-folder-of-markdown-files/
  blog-1.md
  blog-2.md
```

Then do ...

```
import db from "markdowndb";

db.create("path-to-folder");
db.query("all blogs written by x author");
```

Behind the scenes we are creating an SQL(ite) database so you can do everything sql can! Here are some examples ...

etc

Acceptance

Feature List of MarkdownDB on website

Acceptance

Questions / blockers

  • apart from posting on markdowndb.com, do we post the blog post about MarkdownDB new features anywhere else? we don't post at markdowndb.com at all, post on datahub.io instead

Notes

Content

Why use mddb?

  • Tag Querying:

    • Retrieve tags from all files using the library.
  • Backward/Forward Links:

    • Establish backward and forward links for a file, enhancing file interconnectivity.
  • Custom Field Calculation:

    • Automatically calculate custom fields based on the content of a file.
  • Schema Validation:

    • Ensure that your files adhere to a predefined schema through built-in validation.
  • Comprehensive Feature Set:

    • The library offers a range of features to enhance file management and organization.
  • Content to Data Transformation:

    • Convert content into a format that is usable by your code.

Later Features:

  • Next.js Watch Integration:

    • Next.js doesn't watch for md file changes...
    • Utilize the library's Hot-reloading for seamless file change detection in Next.js.
  • Task Extraction:

    • Extract tasks directly from files for improved task management and organization.
  • Extensibility and Plugins:

    • Extend the functionality of mddb through plugins, allowing you to tailor the library to meet evolving needs and incorporate additional features seamlessly.
  • Searching capabilities:

    • Search through all of your files by a simple query.
    • Integrate a search functionality by leveraging mddb's out of the box search capabilities.

X Thread

MarkdownDB Tweet thread

๐Ÿš€ Announcing #MarkdownDB cool new features: export to JSON, task extraction, and computed fields!

๐Ÿงต๐Ÿ‘‡


(1/4) ๐Ÿ“ค Export to JSON files ๐Ÿ“ค

MarkdownDB now supports seamless export to #JSON!
Check out the example output in JSON format! ๐Ÿš€

json


(2/4) ๐Ÿ“‹ Task extraction ๐Ÿ“‹

Streamline task management with MarkdownDB! Extract tasks using robust queries.

tasks


(3/4) ๐Ÿค– Computed fields ๐Ÿค–

Enrich your Markdown with additional metadata computed on the fly using custom functions.

computed fields


(4/4) 🚀 Getting Started 🚀

Excited? Dive into our documentation for detailed instructions on how to use the new features today!

Tasks: extract tasks like this `- [ ] this is a task` (See obsidian data view)

extract tasks like this - [ ] this is a task (See obsidian data view)

Acceptance

  • File object has a tasks property with list of tasks ✅2023-11-27 in PR #71
  • Each task has a description (the full text) and checked (true/false) ✅2023-11-27 in PR #71
  • BONUS: summary of what dataview does in detail below
    • short intro text
    • copy/paste the full interface definitions
    • highlight what a task is versus a list item
    • describe how we differ (may be obvious but still briefly spell out)

Design

This is the ListItem interface for DataView

interface ListItem {
    /** The symbol ('*', '-', '1.') used to define this list item. */
    symbol: string;
    /** A link which points to this task, or to the closest block that this task is contained in. */
    link: Link;
    /** A link to the section that contains this list element; could be a file if this is not in a section. */
    section: Link;
    /** The text of this list item. This may be multiple lines of markdown. */
    text: string;
    /** The line that this list item starts on in the file. */
    line: number;
    /** The number of lines that define this list item. */
    lineCount: number;
    /** The line number for the first list item in the list this item belongs to. */
    list: number;
    /** Any links contained within this list item. */
    links: Link[];
    /** The tags contained within this list item. */
    tags: Set<string>;
    /** The raw Obsidian-provided position for where this task is. */
    position: Pos;
    /** The line number of the parent list item, if present; if this is undefined, this is a root item. */
    parent?: number;
    /** The line numbers of children of this list item. */
    children: number[];
    /** The block ID for this item, if one is present. */
    blockId?: string;
    /** Any fields defined in this list item. For tasks, this includes fields underneath the task. */
    fields: Map<string, Literal[]>;

    task?: {
        /** The text in between the brackets of the '[ ]' task indicator ('[X]' would yield 'X', for example.) */
        status: string;
        /** Whether or not this task has been checked in any way (its status is not empty/space). */
        checked: boolean;
        /** Whether or not this task was completed; derived from 'status' by checking if the field is 'X' or 'x'. */
        completed: boolean;
        /** Whether or not this task and all of its subtasks are completed. */
        fullyCompleted: boolean;
    };
}

What might be beneficial for us:

  1. Symbol: It could be '*', '-', '1.', or another character.

  2. Link:

    • Represents a link pointing to the task or the closest block containing the task.
  3. Text:

    • Represents the text of the list item, which may consist of multiple lines of Markdown.
  4. Tags:

    • Represents the tags contained within the list item.
    • Scenario in which we might need it: Tags could be utilized to categorize tasks or list items. This allows users to easily filter and search for specific types of tasks, such as those related to a particular project or with a specific priority level.
  5. Parent and Children Information:

  • parent?: number;
    Represents the line number of the parent list item, if present. If undefined, this is a root item.
  • children: number[];
    Represents the line numbers of children of this list item.
    Scenario in which we might need it: When organizing tasks in a hierarchical manner, the parent and children information becomes crucial. For instance, in a project management tool, a parent task could represent a project, and its children could be individual tasks or subtasks. This hierarchy helps in visualizing and managing the project structure.
  6. Fields:
  • Represents any fields defined in this list item. For tasks, this includes fields underneath the task.
    • created: This field could be used to show when a task was initially created.
    • due: It provides the due date for a task.
    • start: Represents the start date of a task.
    • scheduled: Indicates the scheduled date for a task.

Distinguishing Tasks from List Items:

In DataView, tasks and list items are distinct entities. Tasks are characterized by the presence of checkboxes and associated status indicators.

Notes

interface Task {
  description: string; // the full text of the task
  checked: boolean; // true/false
}

And then give an example e.g.

- [ ] publish hello world

turn into ...
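
A minimal sketch of that transformation, assuming a regex-based line scan rather than a walk over the markdown syntax tree. `extractTasks` is a hypothetical helper name; only the `description` and `checked` fields come from the acceptance criteria above.

```typescript
interface Task {
  description: string; // the full text of the task
  checked: boolean; // true when the box holds anything other than a space
}

// Hypothetical helper: scans markdown line by line for "- [ ] text" items.
function extractTasks(markdown: string): Task[] {
  const taskPattern = /^\s*[-*+]\s+\[(.)\]\s+(.*)$/;
  const tasks: Task[] = [];
  for (const line of markdown.split(/\r?\n/)) {
    const match = taskPattern.exec(line);
    if (match) {
      tasks.push({ description: match[2].trim(), checked: match[1] !== " " });
    }
  }
  return tasks;
}

const exampleTasks = extractTasks("- [ ] publish hello world\n- [x] write draft\n- plain item");
// exampleTasks[0] is { description: "publish hello world", checked: false }
```

Plain list items (no checkbox) are skipped, which is where this differs from a full list-item index like DataView's.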

Unique Features That DataView has and we don't:

  • Comprehensive Task Completion Status:
    DataView surpasses basic completion tracking by assessing not only the task's completion status but also examining whether all subtasks associated with it are fully completed.

  • Selective Property Extraction:

    In addition to task completion analysis, DataView offers the functionality to selectively extract specific properties related to time management. The properties considered for extraction are:

    • created: If available, the 'created' property is extracted and incorporated into the DataView result.
    • due: If present, the 'due' property is extracted and included in the DataView result.
    • start: If the 'start' property exists, DataView extracts and integrates it into the result.
    • scheduled: DataView also considers the 'scheduled' property, extracting and including it if available.
  • List of Children:

    This might be needed to check for subtasks.

Commented line within code block incorrectly parsed as tag

Hey, happy new year and kudos for involving yourself into such a nice project !

I gave it a try today and ran into quite a few problems, which I'll try to report here. Here's the first one.

Using latest mddb on Ubuntu 23.04.

One of my notes contains

```bash
#---------------------------------------------------------------------------------------
# Install MySQL and Dependencies
#---------------------------------------------------------------------------------------
echo -e "\\n\\n######### Installing mysql Server #########\\n\\n"
...
```

and caused mddb to try and insert '---------------------------------------------------------------------------------------' as a tag.
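
One way to avoid this class of false positive is to ignore fenced code blocks before looking for tags. The sketch below assumes a line-based scan and a simple tag pattern (`#` followed by a letter, then letters, digits, `_`, `-` or `/`). `extractBodyTags` is a hypothetical name and this is not mddb's actual parser.

```typescript
// Hypothetical helper: collects #tags from a markdown body, skipping fenced code.
function extractBodyTags(markdown: string): string[] {
  const fence = /^\s*(```|~~~)/;
  // A tag must follow start-of-line or whitespace and begin with a letter,
  // so "# Heading" and "#-----" are not treated as tags.
  const tagPattern = /(^|\s)#([A-Za-z][\w\/-]*)/g;
  const tags: string[] = [];
  let insideFence = false;
  for (const line of markdown.split(/\r?\n/)) {
    if (fence.test(line)) {
      insideFence = !insideFence; // toggle on opening and closing fences
      continue;
    }
    if (insideFence) continue;
    tagPattern.lastIndex = 0;
    let match: RegExpExecArray | null;
    while ((match = tagPattern.exec(line)) !== null) {
      if (tags.indexOf(match[2]) === -1) tags.push(match[2]);
    }
  }
  return tags;
}
```

The fence toggle is deliberately naive (it does not track fence length or nesting), so it is only meant to illustrate the idea.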

extract headings to db

How about extracting headings (marked with #, ... ######) to the db?
Headings are the main structure of a markdown file.
This helps structure the file and makes a local knowledge db more valuable.
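
A rough sketch of what such extraction could produce, assuming ATX headings only (`#` to `######`) and a flat list of `{ level, text }` objects. The `Heading` shape and the `extractHeadings` name are illustrative, not an existing mddb API.

```typescript
interface Heading {
  level: number; // 1 for "#", up to 6 for "######"
  text: string;
}

// Illustrative helper: returns headings in document order, ignoring fenced code.
function extractHeadings(markdown: string): Heading[] {
  const headingPattern = /^ {0,3}(#{1,6})\s+(.+?)\s*$/;
  const fence = /^\s*(```|~~~)/;
  const headings: Heading[] = [];
  let insideFence = false;
  for (const line of markdown.split(/\r?\n/)) {
    if (fence.test(line)) {
      insideFence = !insideFence;
      continue;
    }
    if (insideFence) continue;
    const match = headingPattern.exec(line);
    if (match) headings.push({ level: match[1].length, text: match[2] });
  }
  return headings;
}
```

Because the pattern requires whitespace after the `#` run, body tags such as `#abc` are not mistaken for headings.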

[epic] MarkdownDB v0.1

Spike solution of an index of our markdown files so that I can quickly access the metadata and content I want.

See parent epic for details: #3

Acceptance

  • We have a new lib/markdowndb.js
  • We are in a position to replace contentlayer.dev query points with our new system (though doing this is a separate issue - see datopian/datahub-next#32) ✅ 2023-03-11 MarkdownDB is capable of indexing a folder and retrieving files using the Query function (which replaces the contentlayer.dev getters), so we should be able to pipe that into next-mdx-remote and replace contentlayer.dev
  • Tests ✅ 2023-03-11 added unit test for indexing and querying

Tasks

Design

API sketch

Minimal viable API

lib/db.ts

indexFolder(folderPath, sqliteDb)

interface Database {

  getFileInfo()

  getTags
  
  query(query: DatabaseQuery)
}

interface File {
  filetype
}

interface MarkdownFile extends File {
  frontmatter: // raw frontmatter
  // metadata // someday or even we just have specific objects that 
}

What's the db schema?


CREATE TABLE files (
  "_id" TEXT PRIMARY KEY,  -- hashlib.sha1(path.encode("utf8")).hexdigest()
  "_path" TEXT NOT NULL,   -- path
  "frontmatter" TEXT,      -- json version of frontmatter
  "filetype" TEXT,         -- "markdown" | "csv" | "png" | ... (by extension?)
  -- "fileclass": "text" | "image" | "data"
  "type" TEXT              -- type field in frontmatter if it exists -- ? do we want this
);

[inbox] What is a MarkdownDB?

Inbox of other material

Intro

A MarkdownDB is a pattern for treating markdown files as a lightweight database along with an API for accessing them.

More specifically it is a simple way to turn a collection of markdown files and their structured data (frontmatter, tags etc) into a queryable SQL database and API. Extracted metadata includes simple things like frontmatter, tags like #mytag and much more.

MarkdownDB is especially appropriate for ...

  • Collections where the individual records include both rich text and structured metadata
  • "Micro" collections with less than 10k records

The Pattern: markdown files are records

Database consists of two things:

  • The data itself - in markdowndb these are markdown files
  • An index and API for accessing and querying those files - in markdowndb this is an sqlite database

Roughly:

  • Each markdown file corresponds to a record
  • Each directory corresponds to a table (if you want it to)

Let's have an example:

my-random-file.md
movies
  return-of-the-jedi.md
  ...

return-of-the-jedi.md

---
date: 1983
budget: 32.7
---

# Return of the Jedi

Return of the Jedi (also known as Star Wars: Episode VI – Return of the Jedi) is a 1983 American epic space opera film directed by Richard Marquand. The screenplay is by Lawrence Kasdan and George Lucas from a story by Lucas, who was also the executive producer. The sequel to Star Wars (1977) and The Empire Strikes Back (1980), it is the third installment in the original Star Wars trilogy.
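
To make the "file is a record" idea concrete, here is a minimal sketch that splits a file like the one above into structured metadata and body. It assumes flat `key: value` frontmatter only (a real implementation would use a proper YAML parser), and `parseRecord` and `ParsedRecord` are hypothetical names.

```typescript
interface ParsedRecord {
  metadata: { [key: string]: string | number };
  body: string;
}

// Simplified stand-in for a YAML frontmatter parser: flat "key: value" pairs only.
function parseRecord(source: string): ParsedRecord {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) return { metadata: {}, body: source };
  const metadata: { [key: string]: string | number } = {};
  for (const line of match[1].split(/\r?\n/)) {
    const separator = line.indexOf(":");
    if (separator === -1) continue;
    const key = line.slice(0, separator).trim();
    const raw = line.slice(separator + 1).trim();
    // Cast numeric-looking values so they can be compared in queries.
    metadata[key] = raw !== "" && !isNaN(Number(raw)) ? Number(raw) : raw;
  }
  return { metadata: metadata, body: match[2].trim() };
}

const jedi = parseRecord("---\ndate: 1983\nbudget: 32.7\n---\n\n# Return of the Jedi\n");
// jedi.metadata is { date: 1983, budget: 32.7 }
```

Casting `date` and `budget` to numbers is what makes queries like "all movies before 1990" possible once the records are indexed.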

Architecture

Related efforts

What's distinctive about markdowndb approach (🚩 this is where pattern and library intermingle a bit - in the most general sense markdowndb would just be a way of treating markdown files as data)

  • Do one thing only (and well): focused on building the index and providing an API
    • Does not get involved in render pipeline (as contentlayer.dev etc. do)
  • extract structured content beyond frontmatter
  • sql(ite) oriented (why reinvent the wheel!)
  • not tied to a particular stack (in contrast to nuxt content, astro etc)
  • open source (in contrast to tina content layer)

Inspirations etc

  • contentlayer.dev
  • https://content.nuxtjs.org
    • really nice
    • use sqlite (rather than mongo syntax)
    • tied to nuxt
    • gets involved in the rendering pipeline in a variety of ways

JSON output option

Write output objects to .markdowndb/ e.g.

.markdowndb/
  files.json  # array of file objects

  # BONUS - don't need to do these yet
  tags.json  # tags indexed by tag names with a list of files
  links.json   # array of link objects

Why do this? Simplest thing that a web dev could use ... may also solve the hot reloading need in #45 ✅28-11-2023 #73
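
A sketch of how the bonus `tags.json` could be derived from the file objects, assuming each file object carries at least a `path` and a `tags` array. The `FileEntry` shape and `buildTagsIndex` name are assumptions made for illustration; writing the strings to disk is left out.

```typescript
interface FileEntry {
  path: string;
  tags: string[];
}

// tags.json: tag name -> list of file paths carrying that tag
type TagsIndex = { [tag: string]: string[] };

function buildTagsIndex(files: FileEntry[]): TagsIndex {
  // Null-prototype object so tag names like "constructor" cannot collide with built-ins.
  const index: TagsIndex = Object.create(null);
  for (const file of files) {
    for (const tag of file.tags) {
      if (!index[tag]) index[tag] = [];
      index[tag].push(file.path);
    }
  }
  return index;
}

const sampleFiles: FileEntry[] = [
  { path: "blog/a.md", tags: ["news", "release"] },
  { path: "blog/b.md", tags: ["news"] },
];

// Strings that would be written to .markdowndb/files.json and .markdowndb/tags.json
const filesJson = JSON.stringify(sampleFiles, null, 2);
const tagsJson = JSON.stringify(buildTagsIndex(sampleFiles), null, 2);
```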

Strict Mode, Cross-Platform Compatibility, and JSDoc Documentation

This is done ✅ 30/11/2023

  1. Include strict: true, as it is considered a best practice. This modification will necessitate some updates in the code but will substantially reduce the occurrence of bugs.
  2. Initiate a discussion on how the library should behave on Windows versus macOS concerning file_path. Additionally, provide a fix for unit tests (as they currently fail on Windows due to path library differences).
  3. Add JSDoc documentation for important functions in the library

MarkdownDB tutorial

Tutorial sketch

Let's make a list of the cool projects we've built over the years.

[TODO: do we use datopian projects or make something up a bit like tailwindui!]

Let's create a markdown file for each project.

Let's start with simplest possible

# My Cool Project 1

All about my cool project.

Maybe a picture ...

Let's run markdowndb:

mddb .

Now we have an sqlite file:

sqlite markdowndb.sqlite

Ok, our file is in there:

> SELECT * FROM TABLE X ...

Turn our projects list into something nice on the terminal

Shall we write some javascript ...

import xxx

// get the list of files 

// show them (bonus to use chalk and do something nice but that is later)

console.log(`### ${project.title}`)
console.log(project.description)

e.g. list the projects by line:

### Project Title 1
description

### Project Title 2
description

or even

| filename | title | description |
| filename | title | description |
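
Both output styles can be sketched as small formatting functions. This assumes the project objects have already been loaded and expose `filename`, `title` and `description`; how they are loaded from the index is left open here, and the names below are placeholders.

```typescript
interface Project {
  filename: string;
  title: string;
  description: string;
}

// "### Title" followed by the description, one block per project.
function renderAsHeadings(projects: Project[]): string {
  return projects.map((p) => `### ${p.title}\n${p.description}`).join("\n\n");
}

// A markdown table with one row per project.
function renderAsTable(projects: Project[]): string {
  const header = "| filename | title | description |\n| --- | --- | --- |";
  const rows = projects.map((p) => `| ${p.filename} | ${p.title} | ${p.description} |`);
  return [header].concat(rows).join("\n");
}

const demoProjects: Project[] = [
  { filename: "project-1.md", title: "My Cool Project 1", description: "All about my cool project." },
];
```

Calling `console.log(renderAsHeadings(demoProjects))` would give the first layout and `console.log(renderAsTable(demoProjects))` the second.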

Let's create some metadata

---
date:
stars: ...
---

Rebuild markdowndb ...

ASIDE: would be nice to have watch functionality built in to mddb

Cool thing would be extracting title

Move job story and feature content from https://datahub.io/notes/markdowndb

https://datahub.io/notes/markdowndb#job-stories

Acceptance

  • Move job stories ✅2023-11-18 Have moved job stories into #3 plus specific issues for the features.
  • See if any other content to move (and move it) ✅2023-12-19 no other content really to move
  • Move notes item to a blog post with an update that this got turned into markdowndb ✅2023-12-19 ❌ i think at some point we could turn https://datahub.io/notes/markdowndb into a back-post as "markdowndb - a first sketch" or something like that

Research Obsidian dataview approach to a markdown db

Obsidian dataview contains a sophisticated markdowndb index. It's open source and we could learn from it or even reuse some of it.

In progress notes about obsidian where we could include these: https://datahub.io/notes/obsidian

Acceptance

The core question we want to answer:

  • Can we directly reuse obsidian-dataview or parts of it directly e.g. via javascript import etc or is it somehow dependent on obsidian (NB: this is the preferred option if possible. let's not reinvent the wheel) 💬2023-11-09 preliminary research back in February here blacksmithgu/obsidian-dataview#1811 ✅2023-11-10 i don't think this is possible - see #5 (comment)
  • If we can't directly reuse it, can we indirectly reuse e.g. by copying code/patterns etc ✅2023-11-10 yes i think so. see all the notes below.

We have researched how dataview (https://github.com/blacksmithgu/obsidian-dataview) works, specifically ...

  • What is the "database" structure?
  • What is indexed? e.g. tags, tasks (where is code for this - see next item)
  • What code does the indexing/parsing?
    • tags extraction e.g. #tag-name
    • tasks extraction
    • ...
  • What is the query API?
  • What is the query language?
  • What is the code for converting queries to db access?

Tags extraction from body

Currently, only frontmatter tags are being extracted.

Acceptance

  • tags are also extracted from the markdown body ✅2023-11-17 ( see PR )
  • tests ✅2023-11-17 ( see PR )

Refactor code to have clean interface between parse layer and write to database and focus tests on parse layer

Refactor to focus the code on the first steps to get JS/TS objects representing essential "data". Create clear separation of concerns so that we can rapidly add new well tested features e.g. tag parsing, computed fields.

graph TD

markdown --remark-parse--> st[syntax tree]
st --extract features--> jsobj1[TS Object eg. File plus Metadata plus Tags plus Links]
jsobj1 --computing--> jsobj[TS Objects]
jsobj --convert to sql--> sqlite[SQLite markdown.db]
jsobj --write to disk--> json[JSON on disk in .markdowndb folder]
jsobj --tests--> testoutput[Test results]

Comment: we should write most of our tests at the js object level. Not at sqlite level.

Acceptance

See design sketch below.

FUTURE

  • Bonus: Split out the sqlite wrapper to write to sqlite in a separate file and refactored current code to use the processFile function

Comment: we can ignore sqlite right now in this branch and not worry about refactoring that code to write to sqlite from JS objects. This would simplify code and allow us to use typescript for everything for now. It is easy to convert json to sqlite later if we want.

Notes

  • Use micromark to parse the markdown file? ✅2023-11-10 ❌ just use remark-parse for now

Sample code

import fs from "fs"

// FUTURE - to show how code path will run
function buildMarkdownSQLDb(folder) {
  // walkFolder is assumed to return the paths of all files in the folder
  for (const path of walkFolder(folder)) {
    const file = parseFile(path) // => File
    storeFileInDb(file) // or storeAllFiles at once later
  }
}

// what we want to add ...
function parseFile(path) { // returns File
  const fileInfo = getKeyInfoOnFile(path)
  if (isMarkdown(path)) {
    const source = fs.readFileSync(path, "utf8")
    const parsed = parseMarkdownFile(source) // returns Metadata, Links, Tags, Tasks
    // add this info to the FileInfo ...
    return { ...fileInfo, ...parsed }
  } else {
    return fileInfo
  }
}

interface FileInfo {
  path: string
}

interface MarkdownFile extends File {
  // adds Metadata
  // adds Tags?
}

Schema

interface File {
  path: string; // relative path on disk, also will be primary key
  extension: string;
  metadata: Metadata | null; // frontmatter
}

interface Metadata {
  [key: string]: unknown; // key: value
}

interface Link {
  link_type: "normal" | "embed";
  from: string; // path
  to: string; // to path or url
}
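
As an illustration of a parse-layer function that produces these objects and can be tested without sqlite, here is a sketch of link extraction. It assumes a regex scan of the raw body (the refactor itself targets the remark syntax tree) and treats `![...](...)` and `![[...]]` as embeds. `extractLinks` is a hypothetical name.

```typescript
interface Link {
  link_type: "normal" | "embed";
  from: string; // path of the file containing the link
  to: string; // target path or url
}

// Hypothetical helper covering [text](target), ![alt](target), [[target]] and ![[target]].
function extractLinks(from: string, body: string): Link[] {
  const pattern = /(!?)\[\[([^\]|#]+)[^\]]*\]\]|(!?)\[[^\]]*\]\(([^)\s]+)[^)]*\)/g;
  const links: Link[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(body)) !== null) {
    const isWiki = match[2] !== undefined; // first alternative matched
    const bang = isWiki ? match[1] : match[3];
    const to = isWiki ? match[2].trim() : match[4];
    links.push({ link_type: bang === "!" ? "embed" : "normal", from: from, to: to });
  }
  return links;
}
```

For wikilinks the alias (`[[xyz|label]]`) and heading (`[[xyz#section]]`) parts are dropped so that `to` holds only the target, which keeps backlink computation simple.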

Design sketch for JSON on disk

NB: ❌ we don't need this atm

.markdowndb/
  files.json
  links.json

Define the schema for that in typescript

NextJS blog example using markdowndb

Create a tutorial and example in examples folder of using markdowndb to make a simple nextjs based blog.

Reference: https://nextjs.org/learn-pages-router/basics/data-fetching/blog-data - note we are replacing their hand-coded get the blog posts with our quicker version and explaining the extra stuff we can do.

Acceptance

  • Tutorial written 🚧2023-11-22 see sketch below ✅2023-11-30 #76
  • Example in examples folder showing finished state (in perfect world we have a branch with each major change in tutorial being a commit so we can point to that) ✅2023-11-30 #76

Tasks

  • Draft an outline of tutorial ✅2023-11-30 #76
  • Roughly fill it in and write the code as you go ...
    • Review together in a pull request (you can submit even at first step as a draft PR and we keep reviewing)
  • Fill it out ...
  • Done ...

Notes

Something like this:

  • we're going to create a nextjs based blog with help of markdowndb
  • here's a simple nextjs project
  • here's a folder with 3 blog posts
  • here's some code to create the blog listing page
  • we need to load those blog posts and get their titles etc
  • let's use markdowndb
    • install
    • use: here's the code snippet
  • ok, so we could have done that by hand ...
  • but here are some cool features
    • look ma, no front matter: auto-extract title for us from the first heading in the file. auto-extract an image
    • validation! suppose we add a blog post and forget to add an image field. now our site will error
      • we could check for this easily in markdowndb ... here's how
    • tags: just set tags with hashtags

Migrate markdowndb to its own repo

We want markdowndb split out from Flowershow so that it is more visible and can be more cleanly reused and contributed to

Acceptance

NB: removing from Flowershow is not part of this work (will be a follow up in flowershow and once we have a release setup)

  • Has own repo at datopian/markdowndb ✅2023-04-28
    • Includes description
  • README and code migrated from https://github.com/datopian/flowershow/tree/main/packages/markdowndb
  • Code working locally e.g. tests, etc. installation works
  • Migrate existing issues and discussions from flowershow, datahub-next etc 🚧2023-05-02 have migrated all discussions so far
    • Flowershow discussions
    • Flowershow issues no open issues to migrate
    • DataHub-next issues no open issues to migrate

Tasks

[epic] MarkdownDB site (landing page) and "launch"

Landing page / launch announce

Objective: can announce the "idea" and prototype on e.g. hacker news

I'm imagining we have 3 pieces of core content (which could be posts/pages in themselves) which then inform the main front page

  • Why: what's cool about markdowndb
  • What: the vision of markdowndb, how it works at a high level and where it's going
  • How: some concrete detail (a demo, or short video etc)

🚩 I'm not convinced doing this so thoroughly is necessary. Maybe we can just have a rough landing page and flesh out detailed post later 😉 /cc @olayway

Acceptance

  • Core content
    • Why
    • What
    • How
  • Landing page
    • Vision statement
    • Roadmap (maybe in form of features "coming soon")
    • CTA (e.g. sign up / follow us / get in touch / contribute on github)

Announce

Tasks

  • Brainstorm the main content
  • Turn into mini posts (?)
  • Draft landing page

2023-10-11

Minor additions

  • Reduce whitespace at top of hero (there's a lot and "Built with ..." is right at bottom of screen)
  • Quickstart link in hero and navbar should link to quickstart not tutorial
  • Built with ❤️ by Datopian in footer and at bottom of Hero
  • Centering the quickstart titles
  • Fix-up Github README
  • Review Roadmap

2023-10-04

  • Update hero section as per discussion in excalidraw
  • Create how it works sequence from above story
    • Bonus: turn gif into video and put on youtube and then embed in the hero on the right (can have this and the "how it works" section)
  • Leave out "How MarkdownDB fits your world" (for now - we'll add back later)
  • Features section ... hmmm (leave for now and we can revisit - what i note here are just thoughts)
    • Can we simplify to have fewer features and

2023-10-07

  • Fix theme color bug
  • Hero section
    • have new tagline and summary i.e. "A rich SQL API to your markdown files in seconds." and "An open library to transform markdown content into sql-queryable data. Build rich markdown-powered sites easily and reliably."
    • Add 3 key features
    • Image placeholder for video
  • Move features up below hero
  • Then unified vision
  • Then quickstart

Quickstart text

  1. You have a folder of markdown content e.g. some blog posts

    1. Each file has some frontmatter
  2. Install markdowndb (optional)

  3. Index the files using markdowndb mddb

  4. Query our files: get a list of all our blog posts with their titles

    • sql query
    • js examples (nextjs)
  5. A bit more interesting: query just featured blog posts

  6. Use in your application with the framework of your choice [show js code using this in e.g. getStaticProps]

  7. [optional] Running app with blog posts or list of projects (reuse our screenshot ...)

Notes

Inspirations perhaps for landing pages

These aren't all as relevant (some we just like for layout or approach vs actual info architecture)

  • editable.website (like the way it explains the need)
  • dub.sh (a nice simple product landing page with clear explanation of what it is)

Landing page v1 copy

Hero section

Title 1: The missing API/interface from your markdown files to a blog/digital garden/notion alternative/...

Title A:
Welcome to MarkdownDB - your next-level content base. 🔥🔥

Title B:
Unlock the potential of Markdown as data.

Title C:
Reimagining Markdown's potential with MarkdownDB.
...


Subtitle A:
Combine the simplicity of Markdown with the capabilities of a database. 🔥

Subtitle B:
From Markdown files to a rich, queryable database in a snap. 🔥🔥 👈👈

Subtitle C:
Elevate your markdown files. Create, extend, and extract with MarkdownDB.

[DEMO gif or side-by-side screenshots of md files opened in e.g. Obsidian and indexed in the db]

Features / Why MarkdownDB

  • Power of plain text: Combination of unstructured content and structured data in simple Markdown files. With MarkdownDB, you no longer have to compromise between the ease of writing in Markdown and the functionality of a full-fledged database.

  • Simplicity at core: Turn your Markdown files into a queryable, lightweight SQL database.

  • Flexible and extensible: Bring your own document types, extend your frontmatter with computed fields and check for errors with custom validations.

  • Simple API: Get a list of all or some Markdown files, filter them by frontmatter fields, and more.

  • Do one thing well 😉 markdowndb just gives you a database, an API and a super-powerful and extensible way to create those from markdown. We don't provide a UI, live editing of values etc ... (though others may do!)

  • Open source: Your content isn’t locked away in proprietary platforms. It’s open, it's free, it’s yours.

  • Not tied to any stack: Use anywhere you want - NextJS, SvelteKit, from the command line etc etc.

  • Images and other assets as well as text

MarkdownDB the Pattern - why use markdown to create a database/collections

  • Power of plain text: Combination of unstructured content and structured data in simple Markdown files. With MarkdownDB, you no longer have to compromise between the ease of writing in Markdown and the functionality of a full-fledged database.

  • Open source/formats

  • Combined Structured and unstructured information

  • Images and other assets as well as text

Roadmap

Phase 1 - Basic Implementation:

  • Indexing
    • Conversion of Markdown files into database records
  • Structured Data Extraction
    • Extraction of frontmatter data
    • Automatic type casting for simplified queries
  • Links Extraction:
    • Extraction of forward links and backlinks
  • Basic API:
    • Get a list of all or some of the Markdown files
    • Get a list of forward and/or backlinks to a Markdown file
    • Filter by metadata fields

Phase 2 - Advanced Features:

  • BYOT (Bring Your Own Types) System:
    • Ability for users to define their own types for more customized data validation and retrieval.
  • Plugins system:
    • ...

Phase 3 - Optimization:

  • Database Optimization:
    • Refinement of database operations for enhanced speed and data integrity.

The vision

Unified Content Management

Imagine a world where Markdown isn’t just text - it’s an entry in a database, it's a source of structured and unstructured data. With MarkdownDB, we aim to balance the simplicity and accessibility of writing in Markdown with the ability to treat your collection of markdown files like a database (think Notion). That allows, for example, presenting each markdown file in a folder as a row in a sheet (e.g. for a project list or any other kind of collection). Think of querying your markdown files like a database, e.g. show me documents created in the last week with "hello world" in the title, or show me all tasks in all documents with the "⭐️" emoji in the task (indicating it's next up!)

How does it work?

  1. Extract: From frontmatter, links, tasks, and more - data extraction is comprehensive and intuitive.

  2. Index: Effortlessly index a folder of markdown files and transform them into structured database records.

  3. Query: Utilize the simple API to query your content. Whether you’re creating an individual page, generating a tag list, or referencing backlinks, MarkdownDB has got you covered.

[See It in Action]: Dive into our demo video that walks you through how MarkdownDB transforms the familiar markdown syntax into a rich, interactive database. You'll witness how effortlessly you can extend, embed, and extract content, all with the foundational simplicity of Markdown.

CRF

Old taglines used on github:

MarkdownDB is a pattern and toolkit for using markdown to store collections of stuff. Think a simple open source Airtable or spreadsheet alternative.

Improvements to task extraction

Follow on to #60

Stuff like parsing out (like dataview):

  • created: If available, the 'created' property is extracted and incorporated into the DataView result.
  • due: If present, the 'due' property is extracted and included in the DataView result.
  • start: If the 'start' property exists, DataView extracts and integrates it into the result.
  • scheduled: DataView also considers the 'scheduled' property, extracting and including it if available.

Also maybe pulling all list items and being like dataview ...
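
A possible starting point for the date properties, assuming they are written as Dataview-style bracketed inline fields (for example `[due:: 2023-12-01]`) inside the task text. `extractTaskDates` and `TaskDates` are hypothetical names, and values are kept as raw strings rather than parsed dates.

```typescript
interface TaskDates {
  created?: string;
  due?: string;
  start?: string;
  scheduled?: string;
}

// Assumes bracketed inline fields such as "[due:: 2023-12-01]" in the task text.
function extractTaskDates(taskText: string): TaskDates {
  const fieldPattern = /\[(created|due|start|scheduled)::\s*([^\]]+)\]/g;
  const dates: TaskDates = {};
  let match: RegExpExecArray | null;
  while ((match = fieldPattern.exec(taskText)) !== null) {
    const key = match[1] as keyof TaskDates;
    dates[key] = match[2].trim();
  }
  return dates;
}

const datedTask = extractTaskDates("publish hello world [due:: 2023-12-01] [created:: 2023-11-27]");
// datedTask is { due: "2023-12-01", created: "2023-11-27" }
```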

[epic] MarkdownDB plugin system

We want a plugin system in MarkdownDB so people can easily extend the core functionality, for example to extract additional metadata, so that not all functionality has to be in core and people can rapidly add functionality

Sketch (April 2023)

https://link.excalidraw.com/l/9u8crB2ZmUo/9hkrQmVl9QX

image

Acceptance

  • Identify the different types of plugins ✅2023-11-19 roughly: parsing, computing, validating (and maybe serializing ...)
  • Research how remark works to see if we can reuse it 🚧2023-11-19 see notes in comment below
  • Design of MarkdownDB and especially the plugin system.
    • extract first heading as title metadata
    • add a metadata field

Notes

MarkdownDB vs Contentlayer

Contentlayer supported:

  • document types with
    • frontmatter schema definition and validation
    • assigning document types based on glob patterns
    • computed fields, e.g. description auto-extracted from the document content
  • excluding/including some content folders (we kinda already have this but it's not configurable)
  • ...

What we need:

  • probably config file similar to Contentlayer one, with:
    • custom document types,
    • content include/exclude option
    • plugins
    • ...
  • ...

(Meta)Data Validation and Document Types

When loading a file I want to validate it against a schema/type so that I know the data/content in my contentbase/"database" is valid

  • When validation fails what happens?
  • Error messages should be super helpful
  • Follow the principle of erroring early

When loading a file I want to allow "extra" metadata by default so that I don't get endless warnings about extra fields that are not defined for document type X.

When accessing a File I want to cast it to a proper typescript type so that I can use it from code with all the benefits of typescript.

BYOT (bring your own types) When working with markdowndb i want to create my own types ... so that when I get an object out it is cast to the right typescript type

Acceptance

  • Design sketch in README of API for users
  • Example just simple usage from JS (no need for nextjs) e.g. like this
import indexFolder from "markdowndb"

// setup zod stuff and configure markdowndb with

const [files, errors] = indexFolder(withZodStuff)
console.log(errors)
console.log(errors)

Tasks

  • Sketch out a simple example concrete use case (write a short tutorial illustrating this hypothetical feature)
  • Research how astro does it with its zod stuff
  • sketch out the design before implementing
  • Implement

Design

  1. Should the user define the schema using Zod, or should we build a more intuitive way that doesn't require knowledge of external libraries?

I believe we should make the use of Zod optional, as it may be overkill for users. Most users likely want to validate if a specific front matter field is provided, rather than engaging in complex validation.

The best approach would be to allow users to:

  1. Make a field required or optional.
  2. Specify whether it's a string or a number.
  3. Provide their own validation logic (giving users flexibility without imposing unnecessary complexity).
  4. Choose whether to use Zod for validation.

Example:

If a user wants to validate that a field named dates with two or more dates is provided, the schema could be defined as follows:

dates: {
    type: "string",
    required: true,
    validate: (fileObject) => {
        if (/* field 'dates' is incorrect */) {
            return {
                status: false, // Error
                message: ""     // Error message
            };
        } else {
            return {
                status: true,  // Correct
            };
        }
    }
}
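
To show how a schema of this shape might be applied, here is a sketch of a validator that checks `required`, then `type`, then the custom `validate` function. The names `FieldSchema`, `FileSchema` and `validateFile` are hypothetical, and `type` is assumed to be given as the string literal `"string"` or `"number"`.

```typescript
interface ValidationResult {
  status: boolean; // true when the field is valid
  message?: string; // error message when status is false
}

type FileObject = { [field: string]: unknown };

interface FieldSchema {
  type: "string" | "number";
  required: boolean;
  validate?: (fileObject: FileObject) => ValidationResult;
}

type FileSchema = { [field: string]: FieldSchema };

// Returns a list of error messages; an empty list means the file passed.
function validateFile(fileObject: FileObject, schema: FileSchema): string[] {
  const errors: string[] = [];
  for (const field of Object.keys(schema)) {
    const rule = schema[field];
    const value = fileObject[field];
    if (value === undefined || value === null) {
      if (rule.required) errors.push(`${field}: required field is missing`);
      continue;
    }
    if (typeof value !== rule.type) {
      errors.push(`${field}: expected ${rule.type}, got ${typeof value}`);
      continue;
    }
    if (rule.validate) {
      const result = rule.validate(fileObject);
      if (!result.status) errors.push(`${field}: ${result.message || "custom validation failed"}`);
    }
  }
  return errors;
}

// The 'dates' rule from the example above, read as two or more comma-separated dates.
const postSchema: FileSchema = {
  dates: {
    type: "string",
    required: true,
    validate: (fileObject) => {
      const count = String(fileObject["dates"]).split(",").length;
      return count >= 2 ? { status: true } : { status: false, message: "need two or more dates" };
    },
  },
};
```

Fields that are not mentioned in the schema are ignored, in line with the "allow extra metadata by default" job story.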

  2. Should all the file schemas be defined in the config file?
    Yes, it's cleaner, and we should allow the users to import schemas from other js files for organization.

  3. Determining Files Matching a Schema:

Option 1: Continue with type Property

  • If no pattern field is provided, default to using the type property (e.g., type: "post"). This caters to simplicity for basic users and use cases.

Option 2: Introduce a pattern Property

  • If a pattern field is provided, utilize it to match files based on the specified pattern (e.g., pattern: "post/**"). This offers flexibility for handling diverse and complex use cases.

Consideration:
The goal is to strike a balance between providing essential functionality and accommodating complex use cases. The defaulting to type for simplicity while allowing the use of a custom pattern enables users to tailor the file matching process to their specific needs.
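
The two options combine naturally into a single matching function. The sketch below assumes a deliberately small glob dialect (`**` across folders, `*` within one path segment) instead of a glob library, and the names `SchemaSelector`, `globToRegExp` and `matchesSchema` are made up for illustration.

```typescript
interface SchemaSelector {
  type: string; // e.g. "post"
  pattern?: string; // e.g. "post/**"
}

// Minimal glob support for this sketch: "**" matches across folders, "*" within one segment.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const source = escaped
    .replace(/\*\*/g, "\u0000") // protect "**" while single "*" is rewritten
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp("^" + source + "$");
}

// Use the pattern when one is provided, otherwise fall back to the type property.
function matchesSchema(file: { path: string; type?: string }, selector: SchemaSelector): boolean {
  if (selector.pattern !== undefined) {
    return globToRegExp(selector.pattern).test(file.path);
  }
  return file.type === selector.type;
}
```

With this, `{ type: "post" }` matches any file whose frontmatter has `type: post`, while `{ type: "post", pattern: "post/**" }` matches by location regardless of frontmatter.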


  1. Adding Computed Fields:

To incorporate computed fields into the schema, a compute or value field can be introduced that accepts a function. This allows users to perform computations or set values dynamically.

Example:

Suppose you have a scenario where the dates field needs to be computed based on some dynamic logic. You can define it as follows:

dates: {
    type: "string",
    required: true,
    compute: (fileObject, ast) => {
        // Perform dynamic computation based on fileObject or ast
        // ...

        // Return the updated fileObject
        return fileObject;
    },
    validate: (fileObject) => {
        // Validation logic for the 'dates' field
        // ...
    }
}

In this example, the compute function takes the fileObject (representing the file's metadata) and ast (abstract syntax tree) as parameters. Users can apply custom logic within the compute function to dynamically calculate or set the value of the dates field.

This approach lets users declare computed fields in the same schema as their other fields, so validation and computation live in one place.
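
As a rough illustration of how an indexer might run these `compute` functions, the sketch below applies each one in turn and passes along the returned file object. The names `applyComputedFields`, `ComputedFieldSpec` and `computedSchema` are hypothetical, the AST is left as an opaque value, and the example rule (fall back to a `created` field when `dates` is absent) is invented purely for demonstration.

```typescript
// Hypothetical types for illustration; 'ast' is left opaque on purpose.
type FileObject = Record<string, unknown>;
type ComputeFn = (fileObject: FileObject, ast: unknown) => FileObject;

interface ComputedFieldSpec {
  compute?: ComputeFn;
}

// Runs every 'compute' function in schema order and returns the final object.
// The input object is copied first so the caller's data is not mutated here.
function applyComputedFields(
  schema: Record<string, ComputedFieldSpec>,
  fileObject: FileObject,
  ast: unknown
): FileObject {
  let current: FileObject = { ...fileObject };
  for (const spec of Object.values(schema)) {
    if (spec.compute) {
      current = spec.compute(current, ast);
    }
  }
  return current;
}

// Example rule (an assumption of this sketch): default 'dates' to 'created'.
const computedSchema: Record<string, ComputedFieldSpec> = {
  dates: {
    compute: (fileObject) => ({
      ...fileObject,
      dates: fileObject["dates"] ?? fileObject["created"],
    }),
  },
};
```

Running computed fields before validation would let `validate` see the computed value; whether that ordering is wanted is an open design question.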


  1. Maybe we could also publish npm packages with a pre-defined library of schemas, or provide a comprehensive list of examples, since schemas will not be simple for every user.
    I think a few examples covering what users are likely to want would be enough.

Notes

Architecture

I think we should use Zod and TypeScript types here (cf. the Astro approach).

How does zod work?

  1. Defining Schemas:

    • You create a schema using the various methods provided by Zod, such as z.string(), z.number(), z.object(), etc.
    • Example:
      import { z } from "zod";
      
      const userSchema = z.object({
        username: z.string(),
        age: z.number().min(18),
      });
  2. Validation:

    • You can then use the parse method to validate and parse data according to the defined schema.

    • Example:

      const validUserData = userSchema.parse({
        username: "john_doe",
        age: 25,
      });
    • If the provided data doesn't match the schema, a ZodError is thrown with details about the validation errors.

  3. Custom Error Messages:

    • You can customize error messages to provide meaningful feedback to users.
    • Example:
      const userSchema = z.object({
        username: z.string().min(3, { message: "Username must be at least 3 characters" }),
        age: z.number().min(18, { message: "Age must be at least 18" }),
      });

MarkdownDB v0.2: link extraction

Original issue in datahub-next repo: #4

I want to query for forward links and back links to files, both markdown and non-markdown, so that I can display back links, a network graph, dead links, etc.

  • standard markdown links
  • obsidian wiki links
  • embeds of files e.g. ![...]
    • wiki link embeds of files

So all of these

[...](...)
![...](...)
[[...]]   # wiki link style
[[...|my title]]   # wiki link style with title
![[...]]  # for images and other embeds
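
To show what recognising these five forms involves, here is a small regex-based sketch. It is deliberately not the remark/AST approach that the tasks below point to, and it ignores edge cases such as links inside code blocks. `RawLink` and `extractRawLinks` are names invented for this example.

```typescript
// Hypothetical result shape for this sketch.
interface RawLink {
  target: string;
  text: string;
  embed: boolean; // true when the link is prefixed with "!"
  syntax: "markdown" | "wikilink";
}

function extractRawLinks(source: string): RawLink[] {
  const links: RawLink[] = [];
  // Optional "!" marks an embed; then either [[target|title]] or [text](target).
  const pattern =
    /(!?)\[\[([^\]|]+)(?:\|([^\]]*))?\]\]|(!?)\[([^\]]*)\]\(([^)\s]+)\)/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(source)) !== null) {
    if (m[2] !== undefined) {
      // Wiki link branch: groups 1 to 3 are populated.
      links.push({
        target: m[2].trim(),
        text: (m[3] ?? "").trim(),
        embed: m[1] === "!",
        syntax: "wikilink",
      });
    } else {
      // Standard markdown branch: groups 4 to 6 are populated.
      links.push({ target: m[6], text: m[5], embed: m[4] === "!", syntax: "markdown" });
    }
  }
  return links;
}
```

A regex pass like this is handy for a first breaking test, but an AST-based extractor is the safer choice once code spans, escapes and nested brackets matter.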

Design

Crudely:

// note we implicitly 
interface Link {
  src: FileID; // file containing the link
  dest: FileID; // file the link points to
  text: string; // link text, if any
  type: "normal" | "embed";
  // raw: string; // the raw text of the original link
}

// functions like the following
// (or these could be attributes on the File object)
declare function getLinks(fileId: FileID): Array<Link>;
declare function getBackLinks(fileId: FileID): Array<Link>;
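
A minimal in-memory sketch of these two query functions follows. It assumes `FileID` is a plain string and takes the list of links as an explicit argument, standing in for the links table; the `getDeadLinks` helper is an extra added for illustration, since dead links are mentioned as a goal above.

```typescript
// Assumption: a FileID is a plain string in this sketch.
type FileID = string;

interface Link {
  src: FileID;
  dest: FileID;
  text: string;
  type: "normal" | "embed";
}

// Forward links: every link that originates in the given file.
function getLinks(links: Link[], fileId: FileID): Link[] {
  return links.filter((link) => link.src === fileId);
}

// Back links: every link that points at the given file.
function getBackLinks(links: Link[], fileId: FileID): Link[] {
  return links.filter((link) => link.dest === fileId);
}

// Dead links: links whose destination is not among the known files.
function getDeadLinks(links: Link[], knownFiles: FileID[]): Link[] {
  return links.filter((link) => !knownFiles.includes(link.dest));
}
```

With a database-backed index the same three queries would become simple selects on the `src` and `dest` columns.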

Acceptance

  • Support internal links (e.g. ../abc or /abc/ or abc)
  • Supports all link types
    • markdown
    • wikilinks
    • markdown embeds
    • wikilink embeds
  • convenience query functions (or at least a tutorial on select functions to run)

Bonus

  • Support both with markdown file extension and without

Tasks

  • first breaking test
  • code for link extraction from a markdown file (remark plugin or at least uses remark ast?)
  • table in database of links
  • query functions

Support Obsidian-style tags list in frontmatter

Standard YAML frontmatter lists can be written in either of the following two formats:

  1. Single line:
     tags: [typora, basic, export]
  2. Multiple lines:
     tags:
       - typora
       - basic
       - export

Obsidian tags frontmatter field also supports this format:

tags: typora, basic, export

Currently, MarkdownDB extracts it as the string "typora, basic, export", which causes errors when MarkdownDB later tries to iterate over it as a list.
See related issue reported in Flowershow repo: datopian/flowershow#543
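
One possible way to handle all three formats is to normalise the parsed frontmatter value into an array before it is used. The helper below is a sketch under that assumption; `normalizeTags` is a hypothetical name, not an existing MarkdownDB function.

```typescript
// Accepts whatever the YAML parser produced for 'tags' and returns a clean list.
function normalizeTags(raw: unknown): string[] {
  if (Array.isArray(raw)) {
    // Standard YAML lists (single-line or multi-line) already arrive as arrays.
    return raw.map((tag) => String(tag).trim()).filter((tag) => tag.length > 0);
  }
  if (typeof raw === "string") {
    // Obsidian-style "typora, basic, export" arrives as one string.
    return raw
      .split(",")
      .map((tag) => tag.trim())
      .filter((tag) => tag.length > 0);
  }
  // Missing or unsupported values yield no tags instead of throwing.
  return [];
}
```

Calling `normalizeTags("typora, basic, export")` gives the same three-element array as the two standard YAML forms, so downstream code can always iterate.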

Is there a program to update the database in real time for real time previews

We want a way to rebuild the database in real time so that we get real-time previews.

The simplest approach is probably to use https://github.com/paulmillr/chokidar to watch the markdown folders and rebuild some JSON files (with these files being watched by Next.js or whatever tool you are using). This is at least a first pass.

This probably depends on the JSON output option.

Acceptance

Refactor part II: refactor mddb code to use new code (process.ts)

Next step in refactoring started in #47, specifically

  • indexFolder function to index a folder and give back File objects ✅ 2023-03-07 ( #59 )
  • refactor sqlite generation code to run off this function (i.e. code in markdowndb.ts) ✅ 2023-03-07 ( #59 )
    • no change in API compared to the past ✅ 2023-11-22 ( #63 )

Acceptance

  • sql code is running off the core extraction code ✅ 2023-03-07 ( #59 )
  • Merged to main with changeset and v0.4.0 tag ✅ 2023-11-22 ( #63 )

Notes

graph TD

markdown --remark-parse--> st[syntax tree]
st --extract features--> jsobj1[TS Object eg. File plus Metadata plus Tags plus Links]
jsobj1 --computing--> jsobj[TS Objects]
jsobj --convert to sql--> sqlite[SQLite markdown.db]
jsobj --write to disk--> json[JSON on disk in .markdowndb folder]
jsobj --tests--> testoutput[Test results]

Link improvements

Links with ./world.md should be treated like world.md, and /world should default to the root of the content system.
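
The rule above can be sketched as a small path resolver. It assumes that paths are expressed relative to the content root with "/" separators, and that a relative target is resolved against the folder of the file containing the link; `resolveLink` is a hypothetical name used only here.

```typescript
// Resolve a link target against the file that contains it.
// Paths are content-root-relative, use "/" separators, and have no leading slash.
function resolveLink(fromFile: string, target: string): string {
  // "/world" starts from the content root; anything else from the linking file's folder.
  const baseSegments = target.startsWith("/")
    ? []
    : fromFile.split("/").slice(0, -1);
  const resolved = [...baseSegments];
  for (const segment of target.split("/")) {
    if (segment === "" || segment === ".") continue; // "./world.md" behaves like "world.md"
    if (segment === "..") {
      resolved.pop(); // popping an empty list is a no-op, so we never climb above the root
    } else {
      resolved.push(segment);
    }
  }
  return resolved.join("/");
}
```

Under these assumptions, `./world.md` and `world.md` linked from `blog/hello.md` both resolve to `blog/world.md`, while `/world` resolves to `world` at the content root.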

✅ 1-12-2023 22dcad9

Create a documentation page for features

Features to add:

  • What does mddb extract? (I don't know if this is needed)
  • Document types (I will add this after we finish the configuration file)
  • Computed fields

Refactor AST Processing for tags, links and tasks

We will move the conversion of source to AST out of ExtractlinksFromBody and into a separate processAST function:

// This is parsing the AST for the link extraction
 const processor = unified()
   .use(markdown)
   .use([
     gfm,
     [
       remarkWikiLink,
       { pathFormat: "obsidian-short", permalinks: options?.permalinks },
     ],
     ...userRemarkPlugins,
   ]);

 const ast = processor.parse(source);

then instead of

  const bodyTags = extractTagsFromBody(source);
  const links = extractWikiLinks(options?.from || "", source, {
    permalinks: options?.permalinks,
  });

we will do:

  const ast = processAST(source);
  const bodyTags = extractTagsFromAST(ast);
  const links = extractWikiLinks(options?.from || "", ast, {
    permalinks: options?.permalinks,
  });
  // TODO
  const tasks = extractTasks(ast);
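
For the `extractTasks` step still marked TODO, a sketch of an AST walk could look like the following. It assumes an mdast-style tree in which GFM task list items are `listItem` nodes carrying a boolean `checked` property; the `AstNode` and `Task` shapes are hand-written for this example instead of being imported, and the function body is only one possible implementation.

```typescript
// Minimal mdast-like node shape, hand-written for this sketch.
interface AstNode {
  type: string;
  value?: string;
  checked?: boolean | null;
  children?: AstNode[];
}

interface Task {
  description: string;
  checked: boolean;
}

// Concatenate the text of a node and all of its descendants.
function nodeText(node: AstNode): string {
  if (typeof node.value === "string") return node.value;
  return (node.children ?? []).map((child) => nodeText(child)).join("");
}

function extractTasks(ast: AstNode): Task[] {
  const tasks: Task[] = [];
  const visit = (node: AstNode): void => {
    // Only list items with a boolean 'checked' are tasks; plain items are skipped.
    if (node.type === "listItem" && typeof node.checked === "boolean") {
      // Use only the item's own paragraphs so nested list items are not folded in.
      const description = (node.children ?? [])
        .filter((child) => child.type === "paragraph")
        .map((child) => nodeText(child))
        .join(" ")
        .trim();
      tasks.push({ description, checked: node.checked });
    }
    (node.children ?? []).forEach(visit);
  };
  visit(ast);
  return tasks;
}
```

Because the function only reads the tree, it can share the single parsed AST with the tag and link extractors, which is the point of the refactor.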

Custom document types

Acceptance

  • a user can define custom document types in a config file, e.g. markdowndb.config.js
  • files table has a new column: type
  • document types can be assigned to files based on glob patterns (e.g. filePathPattern option of document types)
  • document types can be assigned to files based on type frontmatter field (takes precedence over filePathPattern)
  • document types can have their frontmatter fields defined
  • fields validation
  • computed fields
  • option to set front matter fields validation as strict, i.e. no more and no fewer fields than defined
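
Two of the criteria above, the precedence of the type frontmatter field over the path pattern and strict field validation, can be sketched as follows. All names (`DocumentType`, `resolveDocumentType`, `checkStrictFields`) are hypothetical, a `RegExp` stands in for the glob in `filePathPattern` to keep the example dependency-free, and excluding the `type` key itself from the strict check is an assumption of this sketch.

```typescript
interface DocumentType {
  name: string;
  filePathPattern?: RegExp; // stands in for a glob pattern in this sketch
  fields: string[];
  strict?: boolean;
}

// The frontmatter 'type' wins; the path pattern is only a fallback.
function resolveDocumentType(
  types: DocumentType[],
  filePath: string,
  frontmatter: Record<string, unknown>
): DocumentType | undefined {
  const declared = frontmatter["type"];
  if (typeof declared === "string") {
    const byName = types.find((t) => t.name === declared);
    if (byName) return byName;
  }
  return types.find((t) => t.filePathPattern?.test(filePath) === true);
}

// Strict mode: report fields that are missing and fields that are not declared.
function checkStrictFields(
  docType: DocumentType,
  frontmatter: Record<string, unknown>
): string[] {
  if (!docType.strict) return [];
  // Assumption: the 'type' key selects the document type and is not itself a field.
  const present = Object.keys(frontmatter).filter((key) => key !== "type");
  const missing = docType.fields.filter((field) => !present.includes(field));
  const extra = present.filter((key) => !docType.fields.includes(key));
  return [
    ...missing.map((field) => `missing field: ${field}`),
    ...extra.map((key) => `unexpected field: ${key}`),
  ];
}
```

The resolved type name is what would be written to the new type column of the files table.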

Publish to npm under `mddb`

@olayway let's go with mddb for now.

Originally posted by @rufuspollock in #9 (comment)

Tasks
