Module not found: Can't resolve 'better-sqlite3'

Running a Next.js app in a Turborepo gives Module not found errors from knex:

Module not found: Can't resolve 'better-sqlite3'

Adding better-sqlite3 to package.json does not help: the build then asks for the next missing package (tedious, mysql, mysql2, oracle, etc.).

Import trace for requested module:
../../node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/knex/lib/dialects/index.js
../../node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/knex/lib/knex-builder/internal/config-resolver.js
../../node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/knex/lib/knex-builder/Knex.js
../../node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/knex/lib/index.js
../../node_modules/.pnpm/[email protected][email protected][email protected]/node_modules/knex/knex.js
../../node_modules/.pnpm/[email protected][email protected]/node_modules/mddb/dist/src/lib/markdowndb.js
../../node_modules/.pnpm/[email protected][email protected]/node_modules/mddb/dist/src/index.js
./src/providers/markdowndb.ts
./src/app/page.tsx

Computed metadata fields

Computed fields (or just any operations on incoming records)

cf https://www.contentlayer.dev/docs/reference/source-files/define-document-type#computedfields

When loading a file I want to create new fields based on some computation so that I can have additional metadata

Add a type based on the folder of the file so that I can label blog posts
Add layout based on the folder so I can change layouts based on folder
Compute title frontmatter from first heading

More Examples:

Generate keywords based on the content of the file.
Label blog posts based on the folder structure.
Count the number of words in the content.
Estimate the reading time based on the content.
Detect the language of the content.

Acceptance

Design sketch with API/UX for users of lib in README or similar
Simple example in examples folder just a single js file like index.js

import indexFolder from "markdowndb";

const files = indexFolder("../../__mock__", { computedFields: ... });

console.log(files[0].readingTime);

Tasks

sketch out the design before implementing
...

Design

We can achieve this by allowing users to define custom fields using JavaScript functions. Each function takes the following information as parameters:

File path
File metadata (e.g., tags, title, ...)
File body content
File type (e.g., blog, ...)

The custom field generated by the function can then be added to the document scheme.

Example Implementation

// User-defined function to generate keywords based on content
function generateKeywords(filePath, metadata, content, fileType) {
  const keywords = ....
  return { keywords };
}

// Document scheme with custom field added
const documentScheme = {
  keywords: { value: generateKeywords }, // this will be executed
};

In this example, the generateKeywords function is a placeholder for the user-defined function. Users can implement similar functions to automate the generation of custom fields based on their specific requirements. The custom field is then added to the document scheme for validation and type declaration.
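
To make the design above concrete, here is a minimal sketch of how computed fields might be applied to one file record. The `applyComputedFields` helper, the shape of the `file` object and the 200 words per minute reading speed are all assumptions for illustration, not an existing MarkdownDB API.

```javascript
// Hypothetical computed-field functions. Each takes the four parameters
// listed in the design and returns an object holding the new field(s).
function wordCount(filePath, metadata, content, fileType) {
  const words = content.trim().split(/\s+/).filter(Boolean);
  return { wordCount: words.length };
}

function readingTime(filePath, metadata, content, fileType) {
  const { wordCount: count } = wordCount(filePath, metadata, content, fileType);
  // Assumption: an average reading speed of 200 words per minute.
  return { readingTime: Math.ceil(count / 200) };
}

function titleFromHeading(filePath, metadata, content, fileType) {
  if (metadata.title) return {}; // keep an explicit frontmatter title
  const match = content.match(/^#\s+(.+)$/m);
  return match ? { title: match[1].trim() } : {};
}

// Hypothetical helper: runs every computed field and merges the results
// into a copy of the file's metadata.
function applyComputedFields(file, computedFields) {
  const extra = computedFields.map((fn) =>
    fn(file.filePath, file.metadata, file.content, file.fileType)
  );
  return Object.assign({}, file.metadata, ...extra);
}

const file = {
  filePath: "blog/hello.md",
  metadata: { tags: ["intro"] },
  content: "# Hello World\n\nThis is my first post.",
  fileType: "blog",
};

const result = applyComputedFields(file, [wordCount, readingTime, titleFromHeading]);
console.log(result);
```

Returning an object from each function (rather than a bare value) matches the `return { keywords }` shape in the example above and lets one function contribute several fields.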

Notes

Announce MarkdownDB on Twitter

Acceptance

draft tweets thread
publish

Config file with basic content include/exclude options

Acceptance ✅19-12-2023 #97

A config file that allows for setting:

path to content directory (defaults to ./content)
files/directories to include
files/directories to exclude
🏆 npx markdowndb init CLI script that will bootstrap the config file ( This is not needed )

[parse] Extract Links

I want to query for forward links and backlinks to files, both markdown and non-markdown, so that I can display backlinks, a network graph, deadlinks etc.

standard markdown links
obsidian wiki links
embeds of files e.g. ![...]
- wiki link embeds of files

So all of these

[...](...)
![...](...)
[[...]]   # wiki link style
[[...|my title]]   # wiki link style with title
![[...]]  # for images and other embeds

Acceptance

Bonus

convenience query functions (or at least a tutorial on select functions to run)
~~Support both with markdown file extension and without~~ ✅2023-11-17 ❌ does not make sense

Tasks

first breaking test
code for link extraction from a markdown file (remark plugin or at least uses remark ast?)
table in database of links
query functions

Design - new

Why do i want links?

Backlinks: list all files that link to this file
Network graph: draw a network graph
Pages that don't yet exist (or deadlinks): list all files that are linked to but don't exist yet ...
Deadlinks: is this image used anywhere?

getNonExistentFiles
getAllFilesThatLinkTo(fileOrUrl)
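
A rough sketch of what these two query functions could look like over an in-memory list of links. It assumes each link is an object with `from`, `to` and `internal` properties and that `files` is the list of paths that exist in the content folder; the signatures are hypothetical.

```javascript
// Backlinks: every file that links to the given file or URL.
function getAllFilesThatLinkTo(links, fileOrUrl) {
  const sources = links.filter((l) => l.to === fileOrUrl).map((l) => l.from);
  return [...new Set(sources)]; // de-duplicate repeated links
}

// Deadlinks: internal link targets that do not exist as files.
function getNonExistentFiles(links, files) {
  const existing = new Set(files);
  const missing = links
    .filter((l) => l.internal && !existing.has(l.to))
    .map((l) => l.to);
  return [...new Set(missing)];
}

const files = ["abc/foobar.md", "abc/world.md"];
const links = [
  { from: "abc/foobar.md", to: "abc/world.md", internal: true },
  { from: "abc/foobar.md", to: "abc/missing.md", internal: true },
  { from: "abc/world.md", to: "https://example.com", internal: false },
];

const backlinks = getAllFilesThatLinkTo(links, "abc/world.md");
const deadlinks = getNonExistentFiles(links, files);
console.log(backlinks, deadlinks);
```

With a links table in the database, both would more likely become SQL queries; the in-memory version is only meant to pin down the expected behaviour.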

Architecture

Design

Crudely: [hello](world) becomes ...

// note we implicitly 
interface Link {
  from: "abc/foobar" // File path for the source file
  to: "abc/world.md" // points to the end of the link ( can be an external )  
  toRaw: 'world' // the raw link href
  text: 'hello'
  embed: false, // is it an embed link (default: false)
  internal: true // default true (external means http etc - not inside the contentbase) 
}

Analysis

For each of the following markdown in document abc/foobar.md what is the output link object

[hello](world)

=>

{
  toRaw: 'world' // raw link to
  text: 'hello'
  embed: false, // is it an embed link (default: false)
  internal: true // default true (external means http etc - not inside the contentbase) 
  from: "abc/foobar" // File path for the source file
  to: "abc/world.md" // points to the end of the link ( can be an external )  
}

This markdown

[hello](world.png)

=>

{
  toRaw: 'world.png' // raw link to
  text: 'hello'
  embed: false, // is it an embed link (default: false)
  internal: true // default true (external means http etc - not inside the contentbase) 
  from: "abc/foobar" // File path for the source file
  to: "abc/world.png" // points to the end of the link ( can be an external )  
}

This markdown

[hello](world.mdx)

=>

{
  toRaw: 'world.mdx',
  text: 'hello',
  embed: false,
  internal: true, 
  from: "abc/foobar",
  to: "abc/world.mdx" 
}

This markdown

[hello](/world)

=>

{
  toRaw: '/world', // raw link to
  text: 'hello',
  embed: false, 
  internal: true, 
  from: "abc/foobar",
  to: "abc/world"
}

This markdown

![hello](world.png)

=>

{
  toRaw: 'world.png', // raw link to
  text: 'hello',
  embed: true, 
  internal: true, 
  from: "abc/foobar",
  to: "abc/world.png"
}

![](world.png)

=>

{
  toRaw: 'world.png', // raw link to
  text: '', // no alt text in the source
  embed: true, // is it an embed link (default: false)
  internal: true, // default true (external means http etc - not inside the contentbase)
  from: "abc/foobar", // File path for the source file
  to: "abc/world.png" // points to the end of the link ( can be an external )
}
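
The cases above can be sketched as a small extractor. This is a regex-based illustration only (the tasks list suggests a remark plugin or AST, which would also skip links inside code blocks and handle link titles). The `extractLinks` name is hypothetical, and resolving `to` is left out because it depends on the primary key question below.

```javascript
// Extracts standard and wiki-style links from a markdown string.
function extractLinks(from, markdown) {
  const links = [];
  // [text](href) and ![text](href)
  const standard = /(!?)\[([^\]]*)\]\(([^)\s]+)\)/g;
  // [[target]], [[target|title]] and ![[target]]
  const wiki = /(!?)\[\[([^\]|]+)(?:\|([^\]]*))?\]\]/g;
  let m;
  while ((m = standard.exec(markdown)) !== null) {
    links.push({
      from,
      toRaw: m[3],
      text: m[2],
      embed: m[1] === "!",
      // Assumption: anything with a URL scheme (http:, mailto:, ...) is external.
      internal: !/^[a-z][a-z0-9+.-]*:/i.test(m[3]),
    });
  }
  while ((m = wiki.exec(markdown)) !== null) {
    links.push({
      from,
      toRaw: m[2],
      text: m[3] || "",
      embed: m[1] === "!",
      internal: true,
    });
  }
  return links;
}

const sample =
  "See [hello](world), ![pic](world.png), [[page|my title]] and ![[diagram.png]].";
const extracted = extractLinks("abc/foobar", sample);
console.log(extracted);
```

Standard links are collected first and wiki links second, so the result is not in document order; sorting by match index would fix that if order matters.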

Question:

how do we reference another file? what is its primary key
- obvious answer is path ...
  - but path relative to what? (relative to the root of the tree ... )
    - but what is the root of the tree? ... we know that when parsing ...

Please add "db", "sql" and "database" to about.

I was so excited to find this 2 days ago and forgot to star it and had more trouble than I should have finding it again. github search is terrible and these keywords would likely help this project get more visibility.

[inbox] MarkdownDB vision stuff (Rufus)

2023-09-23

MarkdownDB wiki vision stuff

Subject 1: MarkdownDB vision

Subject 2: How to build a directory of stuff in markdown (as a MarkdownDB) with Obsidian (and git)

Subject 3: Why build a directory using markdown vs e.g. google sheets?

Subject 2a: How to use markdown-based approach for our DDP or Erasmus directory

Question: how do i create and contribute to a markdown-based directory

How to edit
How to collaborate (drive sync + git backups as extra)
How to publish?

Outflow

MarkdownDB as a pattern vs as a tool
Why better than X e.g. google sheets, airtable etc
- And why is it worse?
- => when to use markdowndb vs a full spreadsheet
For Life Itself mapping projects do we have "one big" database or several smaller databases?
- Depends on whether there is overlap ... and in what way
  - Is there overlap?
  - What overlap is there?

Terminology

Markdown: a simple format for writing markup e.g. bold, italics, headings in a raw text file
A raw text file: a file made of simple ascii or unicode characters.
Document: a single markdown file e.g. my-file.md
Frontmatter: a convention for storing structured metadata in a Document (at the start hence "front"). Usually uses a format called yaml (common alternatives are json or toml)
Vault: Collection of files

Add discord link (portaljs channel) to markdowndb website and github readme

This is the discord link https://discord.gg/EeyfGrGu4U

Add to readme
Add to website ✅Done

MarkdownDB fails with more than 500 files in a top directory

Steps to reproduce

Copy a directory with more than 500 markdown files inside to your content folder eg: https://github.com/openspending/community.openspending.org/tree/gh-pages/resources

Try indexing this in markdowndb, this is the outcome

I've also tried indexing this directory on a fresh Flowershow app, following this guide https://flowershow.app/docs/publish-tutorial. It fails silently at the export step, but builds again if I remove that directory.

Possible problem

I'm pretty sure the problem lies somewhere here. From what I could gather on Stack Overflow, SQLite has a limit of 500 on batch inserts, so a possible fix would be to slice the filesToInsert variable into batches of 499 and then insert each batch separately.
https://github.com/datopian/markdowndb/blob/main/src/lib/markdowndb.ts#LL181C4-L181C4
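
A minimal sketch of the proposed slicing, assuming `filesToInsert` is a plain array of row objects. The `chunk` helper is hypothetical and the 499 batch size is taken from the suggestion above rather than verified against SQLite's actual limits.

```javascript
// Splits an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Stand-in for the real list of file records.
const filesToInsert = Array.from({ length: 1200 }, (_, i) => ({
  _path: `file-${i}.md`,
}));

const batches = chunk(filesToInsert, 499);
// Three batches: 499, 499 and 202 rows.
console.log(batches.map((b) => b.length));
```

Each batch would then be inserted in turn inside the existing indexing code. Knex also documents a `batchInsert` utility that takes a chunk size, which may be worth checking as an alternative to hand-rolled slicing.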

Publish to markdowndb on npm

We are already publishing to @flowershow/markdowndb ... and would like to publish to markdowndb. However, markdown-db is in use ...

Tasks

Ping current owner of markdown-db to see if there is a way we can use markdowndb ✅2023-05-02 pinged him
See what happens

Pages not generated for files with specific names in content folder

Adding a markdown file or page in the content directory starting with the word "obsidian" does not generate a path in markdown.db thus rendering a 404 page.

Acceptance

pages with word "obsidian" generate and render

Tasks

fix regex for ignore path patterns
...

Notes:

The ignore regex pattern in scripts/mddb may be the underlying issue -

const ignorePatterns = [/Excalidraw/, /.obsidian/, /DS_Store/];
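
If the regex is indeed the cause, the likely culprit is the unescaped dot in `/.obsidian/`, which matches any character followed by "obsidian". A small sketch of the difference, assuming the patterns are tested against paths that include a leading directory; the `strict` pattern is one possible fix, not the one actually adopted:

```javascript
// Pattern from the notes: "." matches any character, so a "/" followed
// by "obsidian" is enough to trigger it.
const loose = /.obsidian/;
// Escaped dot, anchored to a whole path segment named ".obsidian".
const strict = /(^|\/)\.obsidian(\/|$)/;

const paths = [
  "content/obsidian-tips.md",
  "content/.obsidian/workspace.json",
  ".obsidian/app.json",
];

for (const p of paths) {
  console.log(p, "loose:", loose.test(p), "strict:", strict.test(p));
}
```

Only the first path differs: the loose pattern ignores `content/obsidian-tips.md`, while the strict one keeps it and still ignores the `.obsidian` config directory.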

[epic] MarkdownDB Index and Library v1

A database of markdown files so that you can quickly access the metadata and content you want.

All metadata including frontmatter, links, tags, tasks etc
Auto-reloading
Super simple javascript API

Bonus

Can generate sqlite so you get full sql access (if you want)

Non-features

Does not index the full-text content

Re Flowershow: Use this to replace contentlayer.dev.

See https://datahub.io/notes/markdowndb

Acceptance aka Roadmap

POC covering basic extraction etc #6
#5
#2 - specifically parser plugins

Feature list

Marketing

#1

Features

Index a folder of files - create a "DB" index from a folder of markdown files (and other files including images)

Index a folder and get JS/TS objects
Index a folder and get json output
BONUS Index multiple folders (with support for configuring e.g. prefixing in some way e.g. i have all my blog files in this separate folder over here)
Command line tool for indexing: Create a markdowndb (index) on the command line
Index a folder and get SQLite

Extract structured data like:

Frontmatter metadata: Extract markdown frontmatter and add in a metadata field
Tags: Extracts tags in markdown pages
- Extract tags in frontmatter
- Extract tags in body like #abc #49
Links: links between files like [hello](abc.md) or wikilink style [[xyz]] so we can compute backlinks or deadlinks etc (see #4)
#60

Data types, data enhancement and validation

Computed fields: add new metadata properties based on existing metadata e.g. a slug field computed from title field; or, adding a title based on the first h1 heading in a doc; or, a type field based on the folder of the file (e.g. these are blog posts). cf https://www.contentlayer.dev/docs/reference/source-files/define-document-type#computedfields. #54
Data validation and Document Types: validate metadata against a schema/type so that I know the data in the database is "valid" #55
- deal with casting types e.g. string, number so that we can query in useful ways e.g. find me all blog posts before date X
- BYOT (bring your own types): i want to create my own types ... so that when i get an object out it is cast to the right typescript type

Inbox

Marketing

Sections on front page about major features

Have a section on front page about links feature
Have a section for tags
etc

💤

Refactor: improve our interfaces, do something similar to CachedMetadata and CachedFile
"multi-thread" support for fast indexing

Misc

➕ 2023-03-15 Add layout e.g. layout: blog as a rule in markdown db loading rather than in getStaticPaths for rendering blogs (follow up to work in datopian/datahub-next#51) ⛔2023-03-17 on having markdowndb support for rules

Rufus random notes

how can we get type stuff like contentlayer has e.g. a given type in markdown frontmatter leads to use of X typescript type/interface
check out astro-build - how do they do type stuff?

Notes

Questions

Notes on obsidian dataview API

blacksmithgu/obsidian-dataview#1811

How to handle document types 2023-03-09

I'm not sure how we want to handle types, since having it as a frontmatter field might not be the most ideal way because if we had a blog folder we'd have to add the type metadata to all the files individually.

On contentlayer.dev it uses a filePathPattern for that:

const Blog = defineDocumentType(() => ({
  name: "Blog",
  filePathPattern: `${siteConfig.blogDir}/!(index)*.md*`,
  contentType: "mdx",
  fields: {
  ...

I believe that's a good way of handling this. The caveat is that the path of a file is now determining its type and therefore folders with mixed types are impossible, although we could apply the pattern as something like *.blog.md*.

The use case I'm imagining is something like (there are probably better examples than blog):

blogs
  my-first-post.blog.mdx    // Blog type
  my-second-post.blog.mdx     // Blog type 
  index.mdx    // Generic page type 
  about-our-authors.mdx    // Generic page type
  write-for-us.contact.mdx    // Generic contact type
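
A sketch of how the `*.blog.md*` idea could work, inferring the type from an infix in the file name. The `typeByInfix` mapping, the `inferType` name and the "Page" fallback are assumptions for illustration.

```javascript
// Hypothetical mapping from a file name infix to a document type.
const typeByInfix = { blog: "Blog", contact: "Contact" };

// Infers a type from names like "my-first-post.blog.mdx".
// Files without a recognised infix fall back to the generic "Page" type.
function inferType(filePath) {
  const fileName = filePath.split("/").pop();
  const match = fileName.match(/\.([^.]+)\.mdx?$/);
  if (match && Object.prototype.hasOwnProperty.call(typeByInfix, match[1])) {
    return typeByInfix[match[1]];
  }
  return "Page";
}

console.log(inferType("blogs/my-first-post.blog.mdx"));
console.log(inferType("blogs/index.mdx"));
console.log(inferType("blogs/write-for-us.contact.mdx"));
```

This keeps mixed-type folders possible, at the cost of encoding the type in every file name.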

How could we index frontmatter into our db? 2023-03-09

My idea is to have another table for frontmatter, something like:

| file_id | field | value | (maybe) type: array or string |
| --- | --- | --- | --- |
| d9fc09 | title | My new post | string |

file_id should be a foreign key pointing to file._id.

To increase performance, since we are going to have many more rows now, we can create a DB index on this table (using the file_id field)

If done this way we are going to be able to query mdx files using frontmatter fields. E.g: (may not be exactly this)

MyMdDb.query({ tags: ['economy'], frontmatter: { author: 'João' } })
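
To illustrate the table idea, here is a sketch that flattens one file's frontmatter into rows of (file_id, field, value, type). The `frontmatterToRows` helper is hypothetical, and storing arrays as JSON text is an assumption about how the "array" type would be kept in a single column.

```javascript
// Flattens one file's frontmatter into rows for the proposed table.
function frontmatterToRows(fileId, frontmatter) {
  return Object.entries(frontmatter).map(([field, value]) => {
    const isArray = Array.isArray(value);
    return {
      file_id: fileId,
      field,
      // Arrays are stored as JSON text so they fit in one column.
      value: isArray ? JSON.stringify(value) : String(value),
      type: isArray ? "array" : "string",
    };
  });
}

const rows = frontmatterToRows("d9fc09", {
  title: "My new post",
  tags: ["economy"],
});
console.log(rows);
```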

Make tests for folders with 500+ files

We need to add tests for large folders: #22

I think we should make a new repository for that

Could not find a declaration file for module 'mddb'

I'm having this TypeScript issue using the client on a NextJs app with type: module set in package.json

// @ts-expect-error Could not find a declaration file for module 'mddb'
import { MarkdownDB } from "mddb";

MarkdownDB README with features, motivation, brief tutorial

New user: When coming to MarkdownDB I want to know (briefly) what it is, what features it has (existing or future) and a brief tutorial on use.

We could port material from https://datahub.io/notes/markdowndb especially the job stories (they could become features section).

Note one clarification re https://datahub.io/notes/markdowndb is that there is an ambiguity in use between markdowndb as the database itself and markdowndb as a library (index and API) for accessing a markdowndb. Our package is the latter and so for us the description of "markdowndb" is something like: "A javascript library for treating a directory of markdown files as a database" (or a better version thereof).

Emphasize: fine if this is a rough cut, we will rapidly iterate. better to get something down ...

Also, I find that starting from a tutorial is easier (a short example is worth a thousand descriptions), e.g.

On disk you have:

```
my-blog-folder-of-markdown-files/
  blog-1.md
  blog-2.md
```

Then do ...

```
import db from 'markdowndb'

db.create('path-to-folder')
db.query('all blogs written by x author')
```

Behind the scenes we are creating an SQL(ite) database so you can do everything sql can! Here are some examples ...

etc

Acceptance

Brief description of what markdowndb is and why you might use it (e.g. why we will use it in Flowershow etc) see https://github.com/flowershow/flowershow/tree/main/packages/markdowndb#markdowndb
Brief description (e.g. bullet list) of features existing or forthcoming (emojis are a bonus!) see https://github.com/flowershow/flowershow/tree/main/packages/markdowndb#features
Quick start tutorial see https://github.com/flowershow/flowershow/tree/main/packages/markdowndb#quick-start

Feature List of MarkdownDB on website

Acceptance

README.md updated #85
features list added on the MarkdownDB website 🛑 do this first - and fast
- probably one section for each significant feature - use code like we have in the tutorial for the image. Probably a layout a bit like https://tailwindui.com/components/marketing/sections/feature-sections#component-11e5dbce11b8c462441792503ea864fc
write a blog post about new features - pretty much copy and paste with headings of the above
prepare a Tweet thread (e.g. in HackMD) announcing new features (link to the blog post) - the post in chunked form ✅18/12/2023
ask @popovayoana to tweet ✅18/12/2023

Questions / blockers

Apart from posting on markdowndb.com, do we post the blog post about MarkdownDB new features anywhere else? Answer: we don't post at markdowndb.com at all; post on datahub.io instead.

Notes

examples of some old threads on Datopian Twitter (inspiration for the Tweet thread about MarkdownDB new features)
- https://twitter.com/datopian/status/1712755357368336797
- https://twitter.com/datopian/status/1661012395324739586

Content

Why use mddb?

Tag Querying:
- Retrieve tags from all files using the library.
Backward/Forward Links:
- Establish backward and forward links for a file, enhancing file interconnectivity.
Custom Field Calculation:
- Automatically calculate custom fields based on the content of a file.
Schema Validation:
- Ensure that your files adhere to a predefined schema through built-in validation.
Comprehensive Feature Set:
- The library offers a range of features to enhance file management and organization.
Content to Data Transformation:
- Convert content into a format that is usable by your code.

Later Features:

Next.js Watch Integration:
- Next.js doesn't watch for md file changes...
- Utilize the library's Hot-reloading for seamless file change detection in Next.js.
Task Extraction:
- Extract tasks directly from files for improved task management and organization.
Extensibility and Plugins:
- Extend the functionality of mddb through plugins, allowing you to tailor the library to meet evolving needs and incorporate additional features seamlessly.
Searching capabilities:
- Search through all of your files by a simple query.
- Integrate a search functionality by leveraging mddb's out of the box search capabilities.

X Thread

MarkdownDB Tweet thread

🚀 Announcing #MarkdownDB cool new features: export to JSON, task extraction, and computed fields!

🧵👇

(1/4) 📤 Export to JSON files 📤

MarkdownDB now supports seamless export to #JSON!
Check out the example output in JSON format! 🚀

(2/4) 📋 Task extraction 📋

Streamline task management with MarkdownDB! Extract tasks using robust queries.

(3/4) 🤖 Computed fields 🤖

Enrich your Markdown with additional metadata computed on the fly using custom functions.

(4/4) 🚀 Getting Started 🚀

Excited? Dive into our documentation for detailed instructions on how to use the new features today!

Tasks: extract tasks like this `- [ ] this is a task` (See obsidian data view)

extract tasks like this - [ ] this is a task (See obsidian data view)

Acceptance

File object has a tasks property with list of tasks ✅2023-11-27 in PR #71
Each task has a description (the full text) and checked (true/false) ✅2023-11-27 in PR #71
BONUS: summary of what dataview does in detail below
- short intro text
- copy/paste the full interface definitions
- highlight what a task is versus a list item
- describe how we differ (may be obvious but still briefly spell out)

Design

This is the ListItem interface for DataView

ListItem {
    /** The symbol ('*', '-', '1.') used to define this list item. */
    symbol: string;
    /** A link which points to this task, or to the closest block that this task is contained in. */
    link: Link;
    /** A link to the section that contains this list element; could be a file if this is not in a section. */
    section: Link;
    /** The text of this list item. This may be multiple lines of markdown. */
    text: string;
    /** The line that this list item starts on in the file. */
    line: number;
    /** The number of lines that define this list item. */
    lineCount: number;
    /** The line number for the first list item in the list this item belongs to. */
    list: number;
    /** Any links contained within this list item. */
    links: Link[];
    /** The tags contained within this list item. */
    tags: Set<string>;
    /** The raw Obsidian-provided position for where this task is. */
    position: Pos;
    /** The line number of the parent list item, if present; if this is undefined, this is a root item. */
    parent?: number;
    /** The line numbers of children of this list item. */
    children: number[];
    /** The block ID for this item, if one is present. */
    blockId?: string;
    /** Any fields defined in this list item. For tasks, this includes fields underneath the task. */
    fields: Map<string, Literal[]>;

    task?: {
        /** The text in between the brackets of the '[ ]' task indicator ('[X]' would yield 'X', for example.) */
        status: string;
        /** Whether or not this task has been checked in any way (it's status is not empty/space). */
        checked: boolean;
        /** Whether or not this task was completed; derived from 'status' by checking if the field 'X' or 'x'. */
        completed: boolean;
        /** Whether or not this task and all of it's subtasks are completed. */
        fullyCompleted: boolean;
    };
}

What might be beneficial for us:

Symbol: It could be '*', '-', '1.', or another character.
Link:
- Represents a link pointing to the task or the closest block containing the task.
Text:
- Represents the text of the list item, which may consist of multiple lines of Markdown.
Tags:
- Represents the tags contained within the list item.
- Scenario in which we might need it: Tags could be utilized to categorize tasks or list items. This allows users to easily filter and search for specific types of tasks, such as those related to a particular project or with a specific priority level.
Parent and Children Information:

parent?: number;
Represents the line number of the parent list item, if present. If undefined, this is a root item.
children: number[];
Represents the line numbers of children of this list item.
Scenario in which we might need it: When organizing tasks in a hierarchical manner, the parent and children information becomes crucial. For instance, in a project management tool, a parent task could represent a project, and its children could be individual tasks or subtasks. This hierarchy helps in visualizing and managing the project structure.

Fields:

Represents any fields defined in this list item. For tasks, this includes fields underneath the task.
- created: This field could be used to show when a task was initially created.
- due: It provides the due date for a task.
- start: Represents the start date of a task.
- scheduled: Indicates the scheduled date for a task.

Distinguishing Tasks from List Items:

In DataView, tasks and list items are distinct entities. Tasks are characterized by the presence of checkboxes and associated status indicators.

Notes

interface Task {
  description: string; // the full text of the task
  checked: boolean;
}

And then give an example e.g.

- [ ] publish hello world

turn into ...
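
For illustration, a minimal sketch of extracting objects with the `description` and `checked` properties from the acceptance criteria. The `extractTasks` name and the regex are assumptions; the actual implementation in PR #71 may differ, and this version treats any `x` or `X` in the brackets as checked.

```javascript
// Extracts tasks of the form "- [ ] text" or "- [x] text" from markdown.
// Plain list items without a checkbox are not tasks and are skipped.
function extractTasks(markdown) {
  const tasks = [];
  const pattern = /^[ \t]*[-*+][ \t]+\[([ xX])\][ \t]+(.*)$/gm;
  let m;
  while ((m = pattern.exec(markdown)) !== null) {
    tasks.push({ description: m[2].trim(), checked: m[1] !== " " });
  }
  return tasks;
}

const tasks = extractTasks(
  "- [ ] publish hello world\n- [x] write draft\n- plain list item"
);
console.log(tasks);
```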

Unique Features That `DataView` has and we don't:

Comprehensive Task Completion Status:
DataView surpasses basic completion tracking by assessing not only the task's completion status but also examining whether all subtasks associated with it are fully completed.
Selective Property Extraction:

In addition to task completion analysis, DataView offers the functionality to selectively extract specific properties related to time management. The properties considered for extraction are:
- created: If available, the 'created' property is extracted and incorporated into the DataView result.
- due: If present, the 'due' property is extracted and included in the DataView result.
- start: If the 'start' property exists, DataView extracts and integrates it into the result.
- scheduled: DataView also considers the 'scheduled' property, extracting and including it if available.
List of Children:

This might be needed to check for subtasks.

Commented line within code block incorrectly parsed as tag

Hey, happy new year and kudos for involving yourself into such a nice project !

I gave it a try today and ran into quite a few problems, which I'll try to report here. Here's the first one.

Using latest mddb on Ubuntu 23.04.

One of my notes contains

```bash
#---------------------------------------------------------------------------------------
# Install MySQL and Dependencies
#---------------------------------------------------------------------------------------
echo -e "\\n\\n######### Installing mysql Server #########\\n\\n"
...
```

and caused mddb to try and insert '---------------------------------------------------------------------------------------' as a tag.
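
One way to avoid this class of bug is to drop code before looking for tags. The sketch below is an assumption about a possible fix, not how mddb extracts tags: it strips fenced and inline code with regexes (an AST-based parser would be more robust) and assumes a tag starts with a letter.

```javascript
// Removes fenced and inline code, then collects #tags from what is left,
// so that shell comments such as "#-----" are never considered.
function extractTags(markdown) {
  const withoutCode = markdown
    .replace(/^(`{3,}|~{3,})[^\n]*\n[\s\S]*?^\1[ \t]*$/gm, "")
    .replace(/`[^`\n]*`/g, "");
  const tags = new Set();
  // "#" at the start of the text or after whitespace, then a letter,
  // then letters, digits, "_", "-" or "/".
  const pattern = /(^|\s)#([A-Za-z][\w\/-]*)/g;
  let m;
  while ((m = pattern.exec(withoutCode)) !== null) {
    tags.add(m[2]);
  }
  return [...tags];
}

const fence = "`".repeat(3);
const note = [
  "Setup notes #mysql",
  "",
  fence + "bash",
  "#---------------------------",
  "# Install MySQL and Dependencies",
  "echo #notatag",
  fence,
  "",
  "More text #install/linux",
].join("\n");

const tags = extractTags(note);
console.log(tags);
```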

extract headings to db

How about extracting headings (marked as #, ..., ######) to the db?
Headings are the main structure of a markdown file.
This helps structure the file and makes a local knowledge db valuable.
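
A rough sketch of what heading extraction could produce, assuming ATX headings only (`#` to `######`) and a simple toggle to skip fenced code blocks. The `extractHeadings` name and the `{ level, text }` shape are hypothetical.

```javascript
// Collects ATX headings as { level, text }, ignoring fenced code blocks.
function extractHeadings(markdown) {
  const headings = [];
  let inFence = false;
  for (const line of markdown.split(/\r?\n/)) {
    // Toggle on fence lines so "# comment" lines inside code are skipped.
    if (/^\s*(`{3,}|~{3,})/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;
    // 1-6 "#" characters, whitespace, the text, optional closing "#"s.
    const m = line.match(/^(#{1,6})[ \t]+(.*?)(?:[ \t]+#+)?[ \t]*$/);
    if (m) headings.push({ level: m[1].length, text: m[2] });
  }
  return headings;
}

const fence = "`".repeat(3);
const doc = [
  "# Title",
  "",
  fence + "bash",
  "# Install MySQL",
  fence,
  "",
  "## Section one ##",
  "#tag",
].join("\n");

const headings = extractHeadings(doc);
console.log(headings);
```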

Add instructions for developers to README

E.g. how to version, tag and publish

[epic] MarkdownDB v0.1

Spike solution of an index of our markdown files so that I can quickly access the metadata and content I want.

See parent epic for details: #3

Acceptance

We have a new lib/markdowndb.js
- We can index a folder of markdown files into sqlite3
- Query function to get all files
- Query to get files from folder e.g. all blog posts ✅ 2023-03-08 PR: https://github.com/datopian/datahub-next/pull/39
- Index non markdown assets e.g. images, csv ... so can query for all images using same kind of options for markdown files e.g. by folder etc ✅ 2023-03-08 PR: https://github.com/datopian/datahub-next/pull/39, added querying by filetype
We are in a position to replace contentlayer.dev query points with our new system (though doing this is a separate issue - see datopian/datahub-next#32) ✅ 2023-03-11 MarkdownDB is capable of indexing a folder and retrieving files using the Query function (which replaces the contentlayer.dev getters), we should be able to pipe that into next-mdx-remote and replace contentlayer.dev
Tests ✅ 2023-03-11 added unit test for indexing and querying

Tasks

Define basic schema ✅2023-03-07 see below
Write a test and create a fixture folder of files ✅ 2023-03-08 PR: https://github.com/datopian/datahub-next/pull/36
Walk content folder ✅ 2023-03-08 PR: https://github.com/datopian/datahub-next/pull/36
- For each markdown file extract frontmatter etc ✅ 2023-03-08 PR: https://github.com/datopian/datahub-next/pull/36
...

Design

API sketch

Minimal viable API

lib/db.ts

indexFolder(folderPath, sqliteDb)

interface Database {

  getFileInfo()

  getTags()
  
  query(query: DatabaseQuery)
}

interface File {
  filetype
}

interface MarkdownFile extends File {
  frontmatter: // raw frontmatter
  // metadata // someday or even we just have specific objects that 
}

What's the db schema?


-- column types below are assumed; the sketch only fixed the column names
CREATE TABLE files (
  "_id" TEXT PRIMARY KEY,  -- sha1 hex digest of the utf8-encoded path
  "_path" TEXT NOT NULL,   -- path
  "frontmatter" TEXT,      -- json version of frontmatter
  "filetype" TEXT,         -- "markdown" | "csv" | "png" | ... (by extension?)
  -- "fileclass": "text" | "image" | "data"
  "type" TEXT              -- type field in frontmatter if it exists -- ? do we want this
);

[inbox] What is a MarkdownDB?

Inbox of other material

https://next.datahub.io/notes/markdowndb

Intro

A MarkdownDB is a pattern for treating markdown files as a lightweight database along with an API for accessing them.

More specifically it is a simple way to turn a collection of markdown files and their structured data (frontmatter, tags etc) into a queryable SQL database and API. Extracted metadata includes simple things like frontmatter, tags like #mytag and much more.

MarkdownDB is especially appropriate for ...

Collections where the individual records include both rich text and structured metadata
"Micro" collections with less than 10k records

The Pattern: markdown files are records

Database consists of two things:

The data itself - in markdowndb these are markdown files
An index and API for accessing and querying those files - in markdowndb this is an sqlite database

Roughly:

Each markdown file corresponds to a record
Each directory corresponds to a table (if you want it to)

Let's have an example:

my-random-file.md
movies
  return-of-the-jedi.md
  ...

return-of-the-jedi.md

---
date: 1983
budget: 32.7
---

# Return of the Jedi

Return of the Jedi (also known as Star Wars: Episode VI – Return of the Jedi) is a 1983 American epic space opera film directed by Richard Marquand. The screenplay is by Lawrence Kasdan and George Lucas from a story by Lucas, who was also the executive producer. The sequel to Star Wars (1977) and The Empire Strikes Back (1980), it is the third installment in the original Star Wars trilogy.

Architecture

Related efforts

What's distinctive about markdowndb approach (🚩 this is where pattern and library intermingle a bit - in the most general sense markdowndb would just be a way of treating markdown files as data)

Do one thing only (and well): focused on building the index and providing an API
- Does not get involved in the render pipeline (as contentlayer.dev etc. do)
extract structured content beyond frontmatter
sql(ite) oriented (why reinvent the wheel!)
not tied to a particular stack (in contrast to nuxt content, astro etc)
open source (in contrast to tina content layer)

Inspirations etc

contentlayer.dev
https://content.nuxtjs.org
- really nice
- use sqlite (rather than mongo syntax)
- tied to nuxt
- gets involved in the rendering pipeline in a variety of ways

JSON output option

Write output objects to .markdowndb/ e.g.

.markdowndb/
  files.json  # array of file objects

  # BONUS - don't need to do these yet
  tags.json  # tags indexed by tag names with a list of files
  links.json   # array of link objects

Why do this? Simplest thing that a web dev could use ... may also solve the hot reloading need in #45 ✅28-11-2023 #73
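
As a sketch of the bonus `tags.json` shape ("tags indexed by tag names with a list of files"), the index could be derived from the same file objects that go into `files.json`. The `file_path` and `tags` property names are assumptions about the file object.

```javascript
// Builds an object mapping each tag name to the files that carry it.
function buildTagsIndex(files) {
  // No prototype, so tag names like "constructor" cannot clash with
  // inherited properties.
  const index = Object.create(null);
  for (const file of files) {
    for (const tag of file.tags || []) {
      if (!index[tag]) index[tag] = [];
      index[tag].push(file.file_path);
    }
  }
  return index;
}

const fileObjects = [
  { file_path: "blog/a.md", tags: ["economy", "data"] },
  { file_path: "blog/b.md", tags: ["economy"] },
  { file_path: "images/c.png" },
];

const tagsIndex = buildTagsIndex(fileObjects);
// This string is what would be written to .markdowndb/tags.json.
console.log(JSON.stringify(tagsIndex, null, 2));
```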

Strict Mode, Cross-Platform Compatibility, and JSDoc Documentation

This is done ✅ 30/11/2023

Include strict: true, as it is considered a best practice. This modification will necessitate some updates in the code but will substantially reduce the occurrence of bugs.
Initiate a discussion on how the library should behave on Windows versus macOS concerning file_path. Additionally, provide a fix for unit tests (as they currently fail on Windows due to path library differences).
Add JSDoc documentation for important functions in the library

Auto release doesn't publish build files

Related to #29

MarkdownDB tutorial

Tutorial sketch

Let's make a list of the cool projects we've built over the years.

[TODO: do we use datopian projects or make something up a bit like tailwindui!]

Let's create a markdown file for each project.

Let's start with simplest possible

# My Cool Project 1

All about my cool project.

Maybe a picture ...

Let's run markdowndb:

mddb .

Now we have an sqlite file:

sqlite markdowndb.sqlite

Ok, our file is in there:

> SELECT * FROM TABLE X ...

Turn our projects list into something nice on the terminal

Shall we write some javascript ...

import xxx

// get the list of files 

// show them (bonus to use chalk and do something nice but that is later)

console.log(`### ${project.title}`)
console.log(project.description)

e.g. list the projects by line:

### Project Title 1
description

### Project Title 2
description

or even

| filename | title | description |
| filename | title | description |

Let's create some metadata

---
date:
stars: ...
---

Rebuild markdowndb ...

ASIDE: would be nice to have watch functionality built in to mddb

Cool thing would be extracting title

Move job story and feature content from https://datahub.io/notes/markdowndb

https://datahub.io/notes/markdowndb#job-stories

Acceptance

Move job stories ✅2023-11-18 Have moved job stories into #3 plus specific issues for the features.
See if any other content to move (and move it) ✅2023-12-19 no other content really to move
Move notes item to a blog post with an update this got turned into markdowndb ✅2023-12-19 ❌ i think at some point we could turn https://datahub.io/notes/markdowndb in a back-post as "markdowndb - a first sketch" or something like that

Setup auto-publishing to npm

Copy over solution implemented in portaljs repo.
datopian/datahub#922

Research Obsidian dataview approach to a markdown db

Obsidian dataview contains a sophisticated markdowndb index. It's open source and we could learn from it or even reuse some of it.

In progress notes about obsidian where we could include these: https://datahub.io/notes/obsidian

Acceptance

The core question we want to answer:

Can we directly reuse obsidian-dataview or parts of it directly e.g. via javascript import etc or is it somehow dependent on obsidian (NB: this is the preferred option if possible. let's not reinvent the wheel) 💬2023-11-09 preliminary research back in February here blacksmithgu/obsidian-dataview#1811 ✅2023-11-10 i don't think this is possible - see #5 (comment)
If we can't directly reuse it, can we indirectly reuse e.g. by copying code/patterns etc ✅2023-11-10 yes i think so. see all the notes below.

We have researched how dataview (https://github.com/blacksmithgu/obsidian-dataview) works, specifically ...

What is the "database" structure?
What is indexed? e.g. tags, tasks (where is code for this - see next item)
What code does the indexing/parsing?
- tags extraction e.g. #tag-name
- tasks extraction
- ...
What is the query API?
What is the query language?
What is the code for converting queries to db access?

Tags extraction from body

Currently, only frontmatter tags are being extracted.

Acceptance

tags are also extracted from the markdown body ✅2023-11-17 ( see PR )
tests ✅2023-11-17 ( see PR )
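As an illustration only, body tags of the form `#tag-name` could be picked up with a regular expression like the one below. This is a rough sketch, not the implementation from the PR: the `findBodyTags` name is hypothetical, and it does not skip code blocks or handle non-ASCII tag names.

```typescript
// Find #tags in a markdown body. A tag must follow the start of the text or
// whitespace, so headings ("# Title") are not matched. Purely numeric
// matches such as "#45" are skipped (assumption: those are issue references).
function findBodyTags(body: string): string[] {
  const tagPattern = /(?:^|\s)#([A-Za-z0-9_\/-]+)/g;
  const tags: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(body)) !== null) {
    const tag = match[1];
    const isNumeric = /^\d+$/.test(tag);
    if (!isNumeric && tags.indexOf(tag) === -1) {
      tags.push(tag);
    }
  }
  return tags;
}
```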

Refactor code to have clean interface between parse layer and write to database and focus tests on parse layer

Refactor to focus the code on the first steps to get JS/TS objects representing essential "data". Create clear separation of concerns so that we can rapidly add new well tested features e.g. tag parsing, computed fields.

graph TD

markdown --remark-parse--> st[syntax tree]
st --extract features--> jsobj1[TS Object eg. File plus Metadata plus Tags plus Links]
jsobj1 --computing--> jsobj[TS Objects]
jsobj --convert to sql--> sqlite[SQLite markdown.db]
jsobj --write to disk--> json[JSON on disk in .markdowndb folder]
jsobj --tests--> testoutput[Test results]

Comment: we should write most of our tests at the js object level. Not at sqlite level.

Acceptance

See design sketch below.

process.ts with a processFile function exists which given a path returns a File object or a derivative thereof ✅2023-11-15 this is done - see https://github.com/datopian/markdowndb/blob/8fcda1db4fc2b19eade703c25f91438eb6e69d30/src/lib/process.ts
- supports frontmatter extraction ✅2023-11-15 see tests https://github.com/datopian/markdowndb/blob/8fcda1db4fc2b19eade703c25f91438eb6e69d30/src/lib/process.spec.ts
- supports links ✅2023-11-15 ditto
has tests ✅2023-11-15 see https://github.com/datopian/markdowndb/blob/8fcda1db4fc2b19eade703c25f91438eb6e69d30/src/lib/process.spec.ts

FUTURE

Bonus: Split out the sqlite wrapper to write to sqlite in a separate file and refactored current code to use the processFile function

Comment: we can ignore sqlite right now in this branch and not worry about refactoring that code to write to sqlite from JS objects. This would simplify code and allow us to use typescript for everything for now. It is easy to convert json to sqlite later if we want.

Notes

Use micromark to parse the markdown file? ✅2023-11-10 ❌ just use remark-parse for now

Sample code

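// FUTURE - to show how code path will run
// (walkFolder, storeFileInDb, getKeyInfoOnFile, isMarkdown and
// parseMarkdownFile are helpers still to be written)
function buildMarkdownSQLDb(folder: string): void {
  for (const path of walkFolder(folder)) {
    const file = parseFile(path);
    storeFileInDb(file); // or storeAllFiles at once later
  }
}

// what we want to add ...
function parseFile(path: string): File {
  const fileInfo = getKeyInfoOnFile(path);
  if (isMarkdown(path)) {
    // parseMarkdownFile takes the file contents as a string
    // and returns Metadata, Links, Tags, Tasks
    const parsed = parseMarkdownFile(readFileSync(path, "utf8"));
    // add this info to the FileInfo ...
    return { ...fileInfo, ...parsed };
  } else {
    return fileInfo;
  }
}

interface FileInfo {
  path: string;
}

interface MarkdownFile extends File {
  // adds Metadata
  // adds Tags?
}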

Schema

interface File {
  path: string; // relative path on disk, also will be primary key
  extension: string;
  metadata: Metadata | null; // frontmatter
}

interface Metadata {
  [key: string]: unknown; // arbitrary frontmatter key/value pairs
}

interface Link {
  link_type: "normal" | "embed";
  from: string; // path
  to: string; // to path or url
}

Design sketch for JSON on disk

NB: ❌ we don't need this atm

.markdowndb/
  files.json
  links.json

Define the schema for that in typescript

NextJS blog example using markdowndb

Create a tutorial and example in examples folder of using markdowndb to make a simple nextjs based blog.

Reference: https://nextjs.org/learn-pages-router/basics/data-fetching/blog-data - note we are replacing their hand-coded code for getting the blog posts with our quicker version and explaining the extra stuff we can do.

Acceptance

Tutorial written 🚧2023-11-22 see sketch below ✅2023-11-30 #76
Example in examples folder showing finished state (in perfect world we have a branch with each major change in tutorial being a commit so we can point to that) ✅2023-11-30 #76

Tasks

Draft an outline of tutorial ✅2023-11-30 #76
Roughly fill it in and write the code as you go ...
- Review together in a pull request (you can submit even at first step as a draft PR and we keep reviewing)
Fill it out ...
Done ...

Notes

Something like this:

we're going to create a nextjs based blog with help of markdowndb
here's a simple nextjs project
here's a folder with 3 blog posts
here's some code to create the blog listing page
we need to load those blog posts and get their titles etc
let's use markdowndb
- install
- use: here's the code snippet
ok, so we could have done that by hand ...
but here are some cool features
- look ma, no front matter: auto-extract title for us from the first heading in the file. auto-extract an image
- validation! suppose we add a blog post and forget to add an image field. now our site will error
  - we could check for this easily in markdowndb ... here's how
- tags: just set tags with hashtags (e.g. #tag-name)

Announce our MarkdownDB tutorial

Acceptance

Twitter
- draft a tweet
- tweet
...

Migrate markdowndb to its own repo

We want markdowndb split out from Flowershow so that it is more visible and can be more cleanly reused and contributed to

Acceptance

NB: removing from Flowershow is not part of this work (will be a follow up in flowershow and once we have a release setup)

Has own repo at datopian/markdowndb ✅2023-04-28
- Includes description
README and code migrated from https://github.com/datopian/flowershow/tree/main/packages/markdowndb
Code working locally e.g. tests, etc. installation works
Migrate existing issues and discussions from flowershow, datahub-next etc 🚧2023-05-02 have migrated all discussions so far
- Flowershow discussions
- Flowershow issues no open issues to migrate
- DataHub-next issues no open issues to migrate

Tasks

Create the repo
Migrate current README and code there from https://github.com/datopian/flowershow/tree/main/packages/markdowndb
Make any necessary corrections
Migrate existing discussions and material there

[epic] MarkdownDB site (landing page) and "launch"

Landing page / launch announce

Objective: can announce the "idea" and prototype on e.g. hacker news

I'm imagining we have 3 pieces of core content (which could be posts/pages in themselves) which then inform the main front page

Why: what's cool about markdowndb
What: the vision of markdowndb, how it works at a high level and where it's going
How: some concrete detail (a demo, or short video etc)

🚩 I'm not convinced doing this so thoroughly is necessary. Maybe we can just have a rough landing page and flesh out detailed post later 😉 /cc @olayway

Acceptance

Announce

Tasks

Brainstorm the main content
- Collect content - perhaps merge from #7 and also see datopian/datahub#899
Turn into mini posts (?)
Draft landing page

2023-10-11

Minor additions

Reduce whitespace at top of hero (there's a lot and "Built with ..." is right at bottom of screen)
Quickstart link in hero and navbar should link to quickstart not tutorial
Built with ❤️ by Datopian in footer and at bottom of Hero
Centering the quickstart titles
Fix-up Github README
Review Roadmap

2023-10-04

Update hero section as per discussion in excalidraw
Create how it works sequence from above story
- Bonus: turn gif into video and put on youtube and then embed in the hero on the right (can have this and the "how it works" section)
Leave out "How MarkdownDB fits your world" (for now - we'll add back later)
Features section ... hmmm (leave for now and we can revisit - what i note here are just thoughts)
- Can we simplify to have fewer features and

2023-10-07

Fix theme color bug
Hero section
- have new tagline and summary i.e. "A rich SQL API to your markdown files in seconds." and "An open library to transform markdown content into sql-queryable data. Build rich markdown-powered sites easily and reliably."
- Add 3 key features
- Image placeholder for video
Move features up below hero
Then unified vision
Then quickstart

Quickstart text

You have a folder of markdown content e.g. some blog posts
1. Each file has some frontmatter
Install markdowndb (optional)
Index the files using markdowndb mddb
Query our files: get a list of all our blog posts with their titles

sql query
js examples (nextjs)

A bit more interesting: query just featured blog posts
Use in your application with the framework of your choice [show js code using this in e.g. getStaticProps]
[optional] Running app with blog posts or list of projects (reuse our screenshot ...)

Notes

Inspirations perhaps for landing pages

These aren't all as relevant (for some we just like the layout or approach vs the actual info architecture)

editable.website (like the way it explains the need)
dub.sh (a nice simple product landing page with clear explanation of what it is)

Landing page v1 copy

Hero section

Title 1: The missing API/interface from your markdown files to a blog/digital garden/notion alternative/...

Title A:
Welcome to MarkdownDB - your next-level content base. 🔥🔥

Title B:
Unlock the potential of Markdown as data.

Title C:
Reimagining Markdown's potential with MarkdownDB.
...

Subtitle A:
Combine the simplicity of Markdown with the capabilities of a database. 🔥

Subtitle B:
From Markdown files to a rich, queryable database in a snap. 🔥🔥 👈👈

Subtitle C:
Elevate your markdown files. Create, extend, and extract with MarkdownDB.

[DEMO gif or side-by-side screenshots of md files opened in e.g. Obsidian and indexed in the db]

Features / Why MarkdownDB

Power of plain text: Combination of unstructured content and structured data in simple Markdown files. With MarkdownDB, you no longer have to compromise between the ease of writing in Markdown and the functionality of a full-fledged database.
Simplicity at core: Turn your Markdown files into a queryable, lightweight SQL database.
Flexible and extensible: Bring your own document types, extend your frontmatter with computed fields and check for errors with custom validations.
Simple API: Get a list of all or some Markdown files, filter them by frontmatter fields, and more.
Do one thing well 😉 markdowndb just gives you a database, an API and a super-powerful and extensible way to create those from markdown. We don't provide a UI, live editing of values etc ... (though others may do!)
Open source: Your content isn’t locked away in proprietary platforms. It’s open, it's free, it’s yours.
Not tied to any stack: Use anywhere you want - NextJS, SvelteKit, from the command line etc etc.
Images and other assets as well as text

MarkdownDB the Pattern - why use markdown to create a database/collections

Power of plain text: Combination of unstructured content and structured data in simple Markdown files. With MarkdownDB, you no longer have to compromise between the ease of writing in Markdown and the functionality of a full-fledged database.
Open source/formats
Combined Structured and unstructured information
Images and other assets as well as text

Roadmap

Phase 1 - Basic Implementation:

Indexing
- Conversion of Markdown files into database records
Structured Data Extraction
- Extraction of frontmatter data
- Automatic type casting for simplified queries
Links Extraction:
- Extraction of forward links and backlinks
Basic API:
- Get a list of all or some of the Markdown files
- Get a list of forward and/or backlinks to a Markdown file
- Filter by metadata fields

Phase 2 - Advanced Features:

BYOT (Bring Your Own Types) System:
- Ability for users to define their own types for more customized data validation and retrieval.
Plugins system:
- ...

Phase 3 - Optimization:

Database Optimization:
- Refinement of database operations for enhanced speed and data integrity.

The vision

Unified Content Management

Imagine a world where Markdown isn’t just text - it’s an entry in a database, it's a source of structured and unstructured data. With MarkdownDB, we aim to balance the simplicity and accessibility of writing in Markdown with the ability to treat your collection of markdown files like a database (think Notion). For example, you could present each markdown file in a folder as a row in a sheet (e.g. for a project list or any other kind of collection), or query your markdown files like a database, e.g. "show me documents created in the last week with 'hello world' in the title" or "show me all tasks in all documents with the '⏭️' emoji in the task" (indicating it's next up!).

How does it work?

Extract: From frontmatter, links, tasks, and more - data extraction is comprehensive and intuitive.
Index: Effortlessly index a folder of markdown files and transform them into structured database records.
Query: Utilize the simple API to query your content. Whether you’re creating an individual page, generating a tag list, or referencing backlinks, MarkdownDB has got you covered.

[See It in Action]: Dive into our demo video that walks you through how MarkdownDB transforms the familiar markdown syntax into a rich, interactive database. You'll witness how effortlessly you can extend, embed, and extract content, all with the foundational simplicity of Markdown.

CRF

Old taglines used on GitHub:

MarkdownDB is a pattern and toolkit for using markdown to store collections of stuff. Think a simple open source Airtable or spreadsheet alternative.

Improvements to task extraction

Follow on to #60

Stuff like parsing out (like dataview):

created: If available, the 'created' property is extracted and incorporated into the DataView result.
due: If present, the 'due' property is extracted and included in the DataView result.
start: If the 'start' property exists, DataView extracts and integrates it into the result.
scheduled: DataView also considers the 'scheduled' property, extracting and including it if available.

Also maybe pulling all list items and being like dataview ...
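To make the idea concrete, here is a rough sketch of pulling those four date properties out of a task line written with Dataview-style bracketed inline fields such as `[due:: 2023-12-01]`. The `Task` shape and `parseTaskLine` name are hypothetical; emoji shorthands and parenthesised fields are not handled.

```typescript
type DateField = "created" | "due" | "start" | "scheduled";
const DATE_FIELDS: DateField[] = ["created", "due", "start", "scheduled"];

// Hypothetical task shape; the library's real task object may differ.
interface Task {
  description: string;
  checked: boolean;
  created: string | null;
  due: string | null;
  start: string | null;
  scheduled: string | null;
}

// Parse a single markdown line; returns null if it is not a task item.
function parseTaskLine(line: string): Task | null {
  const taskMatch = /^\s*[-*] \[( |x|X)\] (.*)$/.exec(line);
  if (taskMatch === null) return null;

  const task: Task = {
    description: "",
    checked: taskMatch[1] !== " ",
    created: null,
    due: null,
    start: null,
    scheduled: null,
  };

  // Collect recognised [key:: value] fields and strip them from the text.
  const fieldPattern = /\[(\w+)::\s*([^\]]*)\]/g;
  let description = taskMatch[2];
  let field: RegExpExecArray | null;
  while ((field = fieldPattern.exec(taskMatch[2])) !== null) {
    const key = field[1] as DateField;
    if (DATE_FIELDS.indexOf(key) !== -1) {
      task[key] = field[2].trim();
      description = description.replace(field[0], "");
    }
  }
  task.description = description.replace(/\s+/g, " ").trim();
  return task;
}
```

Dates are kept as raw strings here; converting them to `Date` values would be a separate decision.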

[epic] MarkdownDB plugin system

We want a plugin system in MarkdownDB so people can easily extend the core functionality, for example to extract additional metadata, so that not all functionality has to be in core and people can rapidly add functionality

Sketch (April 2023)

https://link.excalidraw.com/l/9u8crB2ZmUo/9hkrQmVl9QX

Acceptance

Identify the different types of plugins ✅2023-11-19 roughly: parsing, computing, validating (and maybe serializing ...)
Research how remark works to see if we can reuse it 🚧2023-11-19 see notes in comment below
Design of MarkdownDB and especially the plugin system.
- extract first heading as title metadata
- add a metadata field
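As a sketch of the "extract first heading as title" case, a computing-style plugin could be as small as the function below. It works on the raw source with a regular expression rather than the remark syntax tree, so it is only an approximation (a `#` line inside a code block would be picked up). The `FileObject` shape and function names are hypothetical.

```typescript
// Hypothetical minimal file object: parsed frontmatter plus raw source.
interface FileObject {
  metadata: Record<string, unknown>;
  source: string;
}

// Return the text of the first ATX heading ("# ...") after any frontmatter.
function titleFromFirstHeading(source: string): string | null {
  const body = source.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "");
  const match = /^#{1,6}[ \t]+(.+?)[ \t]*$/m.exec(body);
  return match ? match[1] : null;
}

// Add a title only when frontmatter does not already provide one.
function withComputedTitle(file: FileObject): FileObject {
  if (typeof file.metadata["title"] === "string") return file;
  const title = titleFromFirstHeading(file.source);
  if (title === null) return file;
  return { ...file, metadata: { ...file.metadata, title } };
}
```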

Notes

MarkdownDB vs Contentlayer

Contentlayer supported:

document types with
- frontmatter schema definition and validation
- assigning document types based on glob patterns
- computed fields, e.g. description auto-extracted from the document content
excluding/including some content folders (we kinda already have this but it's not configurable)
...

What we need:

probably config file similar to Contentlayer one, with:
- custom document types,
- content include/exclude option
- plugins
- ...
...

Support extracting wiki links with Obsidian-style shortest paths

Extract wikilinks with Obsidian style shortest-path wiki links, i.e.

wiki link: ![[Some Page]]
file at /some/folder/Some Page.md
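One possible way to resolve such a link is to match the link target against the file name (without extension) of every indexed file. The sketch below does that; the case-insensitive match and the tie-break (fewest path segments, then alphabetical) are assumptions, not confirmed Obsidian behaviour, and `resolveShortestPath` is a hypothetical name.

```typescript
// Resolve a wiki link target such as "Some Page" to a full file path.
function resolveShortestPath(target: string, filePaths: string[]): string | null {
  const wanted = target.toLowerCase();
  const candidates = filePaths.filter((filePath) => {
    const base = filePath.slice(filePath.lastIndexOf("/") + 1); // "Some Page.md"
    const dot = base.lastIndexOf(".");
    const name = dot > 0 ? base.slice(0, dot) : base; // "Some Page"
    return name.toLowerCase() === wanted || base.toLowerCase() === wanted;
  });
  if (candidates.length === 0) return null;
  // Assumption: when several files share a name, prefer the least nested one.
  candidates.sort(
    (a, b) => a.split("/").length - b.split("/").length || a.localeCompare(b)
  );
  return candidates[0];
}
```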

(Meta)Data Validation and Document Types

When loading a file I want to validate it against a schema/type so that I know the data/content in my contentbase/"database" is valid

When validation fails what happens?
Error messages should be super helpful
Follow the principle of erroring early

When loading a file I want to allow "extra" metadata by default so that I don't get endless warnings about extra fields that are not defined for document type X.

When accessing a File I want to cast it to a proper typescript type so that I can use it from code with all the benefits of typescript.

BYOT (bring your own types) When working with markdowndb i want to create my own types ... so that when I get an object out it is cast to the right typescript type

Acceptance

Design sketch in README of API for users
Example just simple usage from JS (no need for nextjs) e.g. like this

import indexFolder from "markdowndb";

// setup zod stuff and configure markdowndb with

const [files, errors] = indexFolder(withZodStuff);
console.log(errors)

Tasks

Sketch out a simple example concrete use case (write a short tutorial illustrating this hypothetical feature)
Research how astro does it with its zod stuff
sketch out the design before implementing
Implement

Design

Should the user define the schema using Zod, or should we build a more intuitive way that doesn't require knowledge of external libraries?

I believe we should make the use of Zod optional, as it may be overkill for users. Most users likely want to validate if a specific front matter field is provided, rather than engaging in complex validation.

The best approach would be to allow users to:

Make a field required or optional.
Specify whether it's a string or a number.
Provide their own validation logic (giving users flexibility without imposing unnecessary complexity).
Choose whether to use Zod for validation.

Example:

If a user wants to validate that a field named dates with two or more dates is provided, the schema could be defined as follows:

dates: {
    type: "string",
    required: true,
    validate: (fileObject) => {
        if (/* field 'dates' is incorrect */) {
            return {
                status: false, // Error
                message: ""     // Error message
            };
        } else {
            return {
                status: true,  // Correct
            };
        }
    }
}
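To show how a schema of this shape might be applied, here is a small sketch of a validator that checks `required`, `type` and the custom `validate` hook in that order and collects error messages. All names (`FieldSchema`, `validateFile`, `datesSchema`) are hypothetical, and the comma-separated format for `dates` is an assumption made for the example.

```typescript
type FieldType = "string" | "number";

interface ValidationResult {
  status: boolean;
  message?: string;
}

interface FileObject {
  path: string;
  metadata: Record<string, unknown>;
}

interface FieldSchema {
  type: FieldType;
  required: boolean;
  validate?: (fileObject: FileObject) => ValidationResult;
}

// Returns a list of human-readable errors; an empty list means the file is valid.
function validateFile(file: FileObject, schema: Record<string, FieldSchema>): string[] {
  const errors: string[] = [];
  for (const field of Object.keys(schema)) {
    const rule = schema[field];
    const value = file.metadata[field];
    if (value === undefined || value === null) {
      if (rule.required) errors.push(`${file.path}: missing required field "${field}"`);
      continue;
    }
    if (typeof value !== rule.type) {
      errors.push(`${file.path}: field "${field}" should be a ${rule.type}`);
      continue;
    }
    if (rule.validate) {
      const result = rule.validate(file);
      const fallback = `field "${field}" is invalid`;
      if (!result.status) errors.push(`${file.path}: ${result.message ?? fallback}`);
    }
  }
  return errors;
}

// Example: "dates" must hold two or more comma-separated dates (assumed format).
const datesSchema: Record<string, FieldSchema> = {
  dates: {
    type: "string",
    required: true,
    validate: (fileObject) => {
      const count = String(fileObject.metadata["dates"]).split(",").length;
      return count >= 2
        ? { status: true }
        : { status: false, message: '"dates" needs two or more dates' };
    },
  },
};
```

Extra metadata fields that are not in the schema are ignored, which matches the "allow extra metadata by default" story above.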

Should all the file schemas be defined in the config file?
Yes, it's cleaner, and we should allow users to import schemas from other js files for organization.

Determining Files Matching a Schema:

Option 1: Continue with type Property

If no pattern field is provided, default to using the type property (e.g., type: "post"). This caters to simplicity for basic users and use cases.

Option 2: Introduce a pattern Property

If a pattern field is provided, utilize it to match files based on the specified pattern (e.g., pattern: "post/**"). This offers flexibility for handling diverse and complex use cases.

Consideration:
The goal is to strike a balance between providing essential functionality and accommodating complex use cases. Defaulting to type for simplicity, while allowing a custom pattern, lets users tailor the file matching process to their specific needs.
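The two options can be combined in a few lines. In this sketch a document type with a `pattern` matches on file path, and one without falls back to the frontmatter `type` property. The glob support is deliberately minimal (`**` and `*` only) and hand-rolled for illustration; a real implementation would more likely use an existing glob library. `DocumentType` and `resolveDocumentType` are hypothetical names.

```typescript
interface DocumentType {
  name: string;
  pattern?: string; // e.g. "post/**"
}

interface FileObject {
  path: string;
  metadata: Record<string, unknown>;
}

// Minimal glob support: "**" matches across folders, "*" within one segment.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const source = escaped
    .replace(/\*\*/g, "\u0000") // placeholder so "*" handling skips "**"
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${source}$`);
}

// Return the name of the first matching document type, or null.
function resolveDocumentType(file: FileObject, types: DocumentType[]): string | null {
  for (const docType of types) {
    const matches =
      docType.pattern !== undefined
        ? globToRegExp(docType.pattern).test(file.path)
        : file.metadata["type"] === docType.name;
    if (matches) return docType.name;
  }
  return null;
}
```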

Adding Computed Fields:

To incorporate computed fields into the schema, a compute or value field can be introduced, accepting a function. This allows users to perform computations or set values dynamically.

Example:

Suppose you have a scenario where the dates field needs to be computed based on some dynamic logic. You can define it as follows:

dates: {
    type: "string",
    required: true,
    compute: (fileObject, ast) => {
        // Perform dynamic computation based on fileObject or ast
        // ...

        // Return the updated fileObject
        return fileObject;
    },
    validate: (fileObject) => {
        // Validation logic for the 'dates' field
        // ...
    }
}

In this example, the compute function takes the fileObject (representing the file's metadata) and ast (abstract syntax tree) as parameters. Users can apply custom logic within the compute function to dynamically calculate or set the value of the dates field.

This approach enhances flexibility by allowing users to include computed fields seamlessly within the overall schema, contributing to a more versatile and adaptable file structure.
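A sketch of how a list of such compute functions could be applied, assuming each one takes the file object and the syntax tree and returns an updated file object. The `Ast` type is left as a placeholder, and `applyComputedFields` and `addTypeFromFolder` are hypothetical names.

```typescript
interface FileObject {
  path: string;
  metadata: Record<string, unknown>;
}

type Ast = unknown; // placeholder for the markdown syntax tree
type ComputeFn = (fileObject: FileObject, ast: Ast) => FileObject;

// Run each compute function in order, feeding the result into the next one.
function applyComputedFields(file: FileObject, ast: Ast, computes: ComputeFn[]): FileObject {
  return computes.reduce((current, compute) => compute(current, ast), file);
}

// Example computed field: label a file with its top-level folder name.
const addTypeFromFolder: ComputeFn = (fileObject) => {
  const slash = fileObject.path.indexOf("/");
  if (slash === -1) return fileObject; // file sits at the content root
  const folder = fileObject.path.slice(0, slash);
  return { ...fileObject, metadata: { ...fileObject.metadata, type: folder } };
};
```

Returning a new object rather than mutating the input keeps each compute function easy to test on its own.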

Also, maybe we could publish npm packages with a pre-defined library of schemas or provide a comprehensive list of examples, as schemas are not that simple for all users.
I think a few examples that cover what the user might want to do would be enough.

Notes

Architecture

I think we use zod here and typescript types. (cf astro approach)

How does zod work?

Defining Schemas:
- You create a schema using the various methods provided by Zod, such as z.string(), z.number(), z.object(), etc.
- Example:
```
import { z } from "zod";

const userSchema = z.object({
  username: z.string(),
  age: z.number().min(18),
});
```
Validation:
- You can then use the parse method to validate and parse data according to the defined schema.
- Example:
```
const validUserData = userSchema.parse({
  username: "john_doe",
  age: 25,
});
```
- If the provided data doesn't match the schema, a ZodError is thrown with details about the validation errors.

Custom Error Messages:

You can customize error messages to provide meaningful feedback to users.

const userSchema = z.object({
  username: z.string().min(3, { message: "Username must be at least 3 characters" }),
  age: z.number().min(18, { message: "Age must be at least 18" }),
});

MarkdownDB v0.2: link extraction

Original issue in datahub-next repo: #4

I want to query for forward links and back links to files both markdown and non-markdown so that i can display back links, a network graph, deadlinks etc

standard markdown links
obsidian wiki links
embeds of files e.g. ![...]
- wiki link embeds of files

So all of these

[...](...)
![...](...)
[[...]]   # wiki link style
[[...|my title]]   # wiki link style including with title
![[...]]  # for images and other embeds
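For illustration, the five syntaxes above can be recognised with a single regular expression. This is only a sketch on raw text; the tasks below suggest working from the remark AST instead, which would also avoid false matches inside code blocks. `ExtractedLink` and `extractLinks` are hypothetical names.

```typescript
interface ExtractedLink {
  to: string;
  text: string;
  type: "normal" | "embed";
}

// Recognise [text](url), ![alt](url), [[page]], [[page|title]] and ![[page]].
function extractLinks(source: string): ExtractedLink[] {
  const pattern =
    /(!?)\[\[([^\]|]+)(?:\|([^\]]*))?\]\]|(!?)\[([^\]]*)\]\(([^)\s]+)\)/g;
  const links: ExtractedLink[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(source)) !== null) {
    if (m[2] !== undefined) {
      // Wiki link, optionally with "|title" and a leading "!" for embeds.
      links.push({
        to: m[2].trim(),
        text: (m[3] ?? "").trim(),
        type: m[1] === "!" ? "embed" : "normal",
      });
    } else {
      // Standard markdown link or image.
      links.push({ to: m[6], text: m[5], type: m[4] === "!" ? "embed" : "normal" });
    }
  }
  return links;
}
```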

Design

Crudely:

// note we implicitly 
interface Link {
  src: FileID;
  dest: FileID;
  text?: string; // link text if any
  type: "normal" | "embed";
  // raw: string; // the raw text of the original link
}

// functions like the following
// or these are attributes on File object
declare function getLinks(fileId1: FileID): Array<Link>;
declare function getBackLinks(fileId1: FileID): Array<Link>;
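As a rough in-memory illustration of these query functions (the real ones would presumably query the links table), forward links, backlinks and dead links are all simple filters over an array of link records. Here the functions take the link array explicitly, and `FileID` is assumed to be a string; both are choices made for this sketch only.

```typescript
type FileID = string; // assumption: file ids are strings

interface Link {
  src: FileID;
  dest: FileID;
  text?: string;
  type: "normal" | "embed";
}

// Links that start at the given file.
function getLinks(links: Link[], fileId: FileID): Link[] {
  return links.filter((link) => link.src === fileId);
}

// Links that point to the given file.
function getBackLinks(links: Link[], fileId: FileID): Link[] {
  return links.filter((link) => link.dest === fileId);
}

// Links whose destination is not a known file.
function getDeadLinks(links: Link[], knownFiles: FileID[]): Link[] {
  return links.filter((link) => knownFiles.indexOf(link.dest) === -1);
}
```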

Acceptance

Support internal links (i.e. ../abc or /abc/ or abc)
Supports all link types
- markdown
- wikilinks
- markdown embeds
- wikilink embeds
convenience query functions (or at least a tutorial on select functions to run)

Bonus

Support both with markdown file extension and without

Tasks

first breaking test
code for link extraction from a markdown file (remark plugin or at least uses remark ast?)
table in database of links
query functions

Test for 500+ files

Aside: Probably in future we want a test of some kind - for now we could just open an issue about adding a test so we don't forget.

Originally posted by @rufuspollock in #24 (comment)

Support Obsidian-style tags list in frontmatter

Standard YAML frontmatter lists can be written in either of the following two formats:

Single line:

tags: [typora, basic, export]

Multiple lines:

tags:
  - typora
  - basic
  - export

Obsidian tags frontmatter field also supports this format:

tags: typora, basic, export

Currently MarkdownDB extracts it as a string "typora, basic, export" which results in errors when MarkdownDB tries to iterate over it.
See related issue reported in Flowershow repo: datopian/flowershow#543
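One way to avoid the iteration error is to normalise the `tags` value right after frontmatter parsing, so downstream code always sees an array. A minimal sketch, with `normalizeTags` as a hypothetical helper name:

```typescript
// Accept a YAML list, an Obsidian-style comma-separated string, or nothing,
// and always return an array of trimmed, non-empty tag names.
function normalizeTags(raw: unknown): string[] {
  if (Array.isArray(raw)) {
    return raw.map((tag) => String(tag).trim()).filter((tag) => tag.length > 0);
  }
  if (typeof raw === "string") {
    return raw
      .split(",")
      .map((tag) => tag.trim())
      .filter((tag) => tag.length > 0);
  }
  return [];
}
```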

Is there a program to update the database in real time for real time previews

We want a way so that you can rebuild the database in real time so that we get real time previews.

Simplest approach is probably to use https://github.com/paulmillr/chokidar to watch the markdown folders and rebuild some json files (with these files being watched by nextjs or whatever tool you are using). this at least is a first pass.

This probably depends on the JSON output option.

Acceptance

Instructions for how to run this in watch mode so that the markdowndb files are regularly rebuilt
You can use the CLI flag --watch to toggle the watch feature on or off. #90
Documentation https://github.com/datopian/markdowndb?tab=readme-ov-file#watching-for-changes

Move all the tests into a root test folder

Some further refactoring.

Move all tests to root test folder
merge utils with lib folder!
- remove utils/index.ts (no need)
merge indexFolder.ts into process.ts

Refactor part II: refactor mddb code to use new code (process.ts)

Next step in refactoring started in #47, specifically

indexFolder function to index a folder and give back File objects ✅2023-03-07 ( #59 )
refactor sqlite generation code to run off this function (i.e. code in markdowndb.ts) ✅2023-03-07 ( #59 )
- no change in API compared to the past ✅2023-11-22 ( #63 )

Acceptance

sql code is running off the core extraction code ✅2023-03-07 ( #59 )
Merged to main with changeset and v0.4.0 tag ✅2023-11-22 ( #63 )

Notes

graph TD

markdown --remark-parse--> st[syntax tree]
st --extract features--> jsobj1[TS Object eg. File plus Metadata plus Tags plus Links]
jsobj1 --computing--> jsobj[TS Objects]
jsobj --convert to sql--> sqlite[SQLite markdown.db]
jsobj --write to disk--> json[JSON on disk in .markdowndb folder]
jsobj --tests--> testoutput[Test results]

Link improvements

Links with ./world.md should be treated like world.md, and /world should default to the root of the content system.

✅1-12-2023 22dcad9
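The rule can be sketched as a small path normaliser: a leading `/` resolves from the content root, and everything else resolves relative to the folder of the linking file. This is an illustration rather than the code from the commit; `normalizeLinkPath` and its `fromDir` parameter are hypothetical.

```typescript
// Normalise a link target to a path relative to the content root.
// fromDir is the folder of the file containing the link ("" for the root).
function normalizeLinkPath(link: string, fromDir = ""): string {
  const isRootRelative = link.charAt(0) === "/";
  const segments = isRootRelative || fromDir === "" ? [] : fromDir.split("/");
  for (const part of link.split("/")) {
    if (part === "" || part === ".") continue; // skip empty and "." segments
    if (part === "..") {
      segments.pop(); // step up one folder
    } else {
      segments.push(part);
    }
  }
  return segments.join("/");
}
```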

Create a documentation page for features

Features to add:

What does mddb extract? ( I don't know if this is needed )
Document types ( I will add this after we finish configuration file )
Computed fields

Refactor AST Processing for tags, links and tasks

We will move the conversion of source to AST out of ExtractlinksFromBody and into a processAST function

// This is parsing the AST for the link extraction
 const processor = unified()
   .use(markdown)
   .use([
     gfm,
     [
       remarkWikiLink,
       { pathFormat: "obsidian-short", permalinks: options?.permalinks },
     ],
     ...userRemarkPlugins,
   ]);

 const ast = processor.parse(source);

then instead of

  const bodyTags = extractTagsFromBody(source);
  const links = extractWikiLinks(options?.from || "", source, {
    permalinks: options?.permalinks,
  });

we will do:

  const AST = processAST(source)
  const bodyTags = extractTagsFromAST(AST);
  const links = extractWikiLinks(options?.from || "", AST, {
    permalinks: options?.permalinks,
  });
  // TODO
  const tasks = extractTasks(AST)

Custom document types

Acceptance

a user can define custom document types in a config file, e.g. markdowndb.config.js
files table has a new column: type
document types can be assigned to files based on glob patterns (e.g. filePathPattern option of document types)
document types can be assigned to files based on type frontmatter field (takes precedence over filePathPattern)
document types can have their frontmatter fields defined
fields validation
computed fields
option to set front matter fields validation as strict, i.e. no more, no fewer fields

Publish to npm under `mddb`

@olayway let's go with mddb for now.

Originally posted by @rufuspollock in #9 (comment)

Tasks

publish to npm under mddb
deprecate @flowershow/markdowndb
replace @flowershow/markdowndb with mddb in our other projects
update README, docs and examples