
scalpel-ts's Introduction

scalpel-ts

Coming soon!

scalpel-ts's People

Contributors: imax153

Stargazers: Andrejs Agejevs, Mepuka Kessy, Dan Minshew

Watchers: James Cloos

scalpel-ts's Issues

Add tests

We have no tests at all at the moment. This is a big problem.

At the very minimum, we need to build a comprehensive test suite that unit tests all modules.

Add documentation

While I was (relatively) good about documenting things as I wrote the library, it would be helpful to go back through the modules, add documentation where it is missing, and fill in descriptions of the types so that doc-ts (once added) can generate more useful output.

[Internal]: Move monad stack definitions into their own modules

Currently, the constructors, destructors, combinators, and utilities for the Scraper monad stacks used by the library are co-located with the definitions of the instances that they back. The current monad stacks are:

type Scraper<A> = Reader<TagSpec, Option<A>>

type SerialScraper<A> = State<SpecZipper, Option<A>>

It would be cleaner if we separated concerns by splitting the monad stacks into their own modules. This way the Scraper and Serial modules can focus on their primary function instead of also defining all the function boilerplate for the monad stacks.

The type definitions for these monad stacks would be as follows:

type ReaderOption<R, A> = Reader<R, Option<A>>

type StateOption<S, A> = State<S, Option<A>>
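As a sketch of what a standalone ReaderOption module might export, here is a minimal version of the core operations. The Option and Reader types are inlined so the example is self-contained; the real module would build on fp-ts's own types and instances.

```typescript
// Minimal inlined stand-ins for fp-ts's Option and Reader, for illustration only.
type Option<A> = { _tag: 'None' } | { _tag: 'Some'; value: A }
const some = <A>(value: A): Option<A> => ({ _tag: 'Some', value })
const none: Option<never> = { _tag: 'None' }

type ReaderOption<R, A> = (env: R) => Option<A>

// Pointed: lift a pure value into the stack.
const of = <R, A>(a: A): ReaderOption<R, A> => () => some(a)

// Functor: map over the inner Option, leaving the environment untouched.
const map = <A, B>(f: (a: A) => B) => <R>(fa: ReaderOption<R, A>): ReaderOption<R, B> =>
  (env) => {
    const oa = fa(env)
    return oa._tag === 'Some' ? some(f(oa.value)) : none
  }

// Monad: sequence two computations, threading the same environment through both
// and short-circuiting on None.
const chain = <R, A, B>(f: (a: A) => ReaderOption<R, B>) =>
  (fa: ReaderOption<R, A>): ReaderOption<R, B> =>
  (env) => {
    const oa = fa(env)
    return oa._tag === 'Some' ? f(oa.value)(env) : none
  }
```

A StateOption module would follow the same pattern, threading the `[value, state]` pair through the Option layer instead of a read-only environment.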

Fix: Scraper does not backtrack with chroots

Description

A Scraper should be able to backtrack when failing on a selected node, and continue searching for all nodes matched by a selector. Currently, this is not the case.

For example, a scraper built with chroots should be able to isolate and return the only comment containing the word "cat". The current implementation of chroots is:

export const chroots = (selector: Selector) => <A>(
  scraper: Scraper<A>
): Scraper<ReadonlyArray<A>> => flow(select(selector), RA.traverse(O.Applicative)(scraper))

However, the current behavior is to fail the entire scraper and return None if the scraper fails on even a single selected node.

Solution

This can be solved by modifying the behavior of the chroots function in the Scraper module. Currently we are executing the scraper action for every element in the list of selected nodes, and accumulating the results of the scraper into an Option<ReadonlyArray<A>>. The default behavior of traverse is to return None if the scraper returns None for any of the selected nodes.

Instead, the scraper should be executed on each element in the list of selected nodes to produce a ReadonlyArray<Option<A>>, which can then be compacted into a ReadonlyArray<A>. This allows scrapers to backtrack and evaluate all selected nodes, retaining only the values that evaluate to Some<A>.

export const chroots = (selector: Selector) => <A>(
  scraper: Scraper<A>
): Scraper<ReadonlyArray<A>> =>
  pipe(
    ask(),
    map((spec) => pipe(spec, select(selector), RA.map(scraper), RA.compact))
  )
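The key difference is the compacting step: a None result for one node no longer poisons the whole array, as it does under traverse. A self-contained illustration of that behavior, with a minimal Option type standing in for fp-ts's:

```typescript
type Option<A> = { _tag: 'None' } | { _tag: 'Some'; value: A }
const some = <A>(value: A): Option<A> => ({ _tag: 'Some', value })
const none: Option<never> = { _tag: 'None' }

// compact: drop the None entries and unwrap the Some entries,
// mirroring fp-ts's ReadonlyArray.compact.
const compact = <A>(as: ReadonlyArray<Option<A>>): ReadonlyArray<A> =>
  as.flatMap((oa) => (oa._tag === 'Some' ? [oa.value] : []))

// A scraper that fails on one selected node no longer fails the whole batch:
const results: ReadonlyArray<Option<string>> = [
  some('I love my cat'),
  none, // this node did not match; under traverse, it would sink everything
  some('cats are great')
]
// compact(results) → ['I love my cat', 'cats are great']
```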

Add Filterable instance to Scraper

The addition of a Filterable instance to the Scraper module would allow filtering scraped web content within the context of the Scraper monad.

For example, if we wanted to parse all <h1 /> tags and only keep those whose text starts with the word "hello", we could do something along the lines of:

pipe(
  Scraper.text(Select.tag('h1')),
  Scraper.filter((text) => text.startsWith('hello'))
)
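Since a Scraper<A> is a Reader returning an Option<A>, filter can be implemented by filtering the inner Option. A minimal self-contained sketch of that idea follows, with the Reader/Option stack inlined for illustration; the real instance would be defined against fp-ts's Filterable type class.

```typescript
type Option<A> = { _tag: 'None' } | { _tag: 'Some'; value: A }
const some = <A>(value: A): Option<A> => ({ _tag: 'Some', value })
const none: Option<never> = { _tag: 'None' }

// Scraper<A> specialized here as a Reader from some environment to Option<A>.
type Scraper<R, A> = (env: R) => Option<A>

// filter: keep the scraped value only when it satisfies the predicate,
// turning a Some into a None otherwise.
const filter = <A>(predicate: (a: A) => boolean) =>
  <R>(scraper: Scraper<R, A>): Scraper<R, A> =>
  (env) => {
    const oa = scraper(env)
    return oa._tag === 'Some' && predicate(oa.value) ? oa : none
  }
```

With something like this in place, the example above would retain only those <h1 /> tags whose text starts with "hello".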

Add http interface

It would be helpful to add helpers that query a webpage, parse it, and scrape the contents with a provided Scraper. The main Haskell library, scalpel, provides this functionality.
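One possible shape for such a helper, sketched with a hypothetical scrapeURL function: the parseAndScrape argument stands in for the library's parse-then-scrape pipeline (returning undefined on failure here, where the real API would return an Option), and the injectable fetchFn, defaulting to the global fetch, keeps the helper testable.

```typescript
// Hypothetical helper: fetch a page and run a scrape over its raw HTML.
// `parseAndScrape` stands in for parsing the HTML into a TagSpec and
// running a Scraper against it.
const scrapeURL = async <A>(
  url: string,
  parseAndScrape: (html: string) => A | undefined,
  fetchFn: (url: string) => Promise<{ text(): Promise<string> }> = fetch
): Promise<A | undefined> => {
  const response = await fetchFn(url)
  const html = await response.text()
  return parseAndScrape(html)
}
```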

Add examples

Haskell's scalpel has a number of excellent examples which showcase various ways to use the library.

We should consider porting several of these examples to scalpel-ts to help users better understand how to use the library.

Add Semigroup and Monoid instances for Scraper

import { Monoid } from 'fp-ts/Monoid'
import * as O from 'fp-ts/Option'
import * as R from 'fp-ts/Reader'
import { Semigroup } from 'fp-ts/Semigroup'

const getSemigroup = <A>(S: Semigroup<A>): Semigroup<Scraper<A>> =>
  R.getSemigroup(O.getApplySemigroup(S))

const getMonoid = <A>(M: Monoid<A>): Monoid<Scraper<A>> => R.getMonoid(O.getMonoid(M))
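These instances inherit apply semantics from Option: concat succeeds only when both scrapers succeed. A self-contained sketch of the resulting behavior, with minimal types inlined for illustration (the fp-ts definitions above are the actual proposal):

```typescript
type Option<A> = { _tag: 'None' } | { _tag: 'Some'; value: A }
const some = <A>(value: A): Option<A> => ({ _tag: 'Some', value })
const none: Option<never> = { _tag: 'None' }

type Scraper<R, A> = (env: R) => Option<A>

// Lift a concat on A through Option (both sides must be Some) and then
// through Reader (both scrapers read the same environment).
const concatScrapers = <R, A>(concat: (x: A, y: A) => A) =>
  (fx: Scraper<R, A>, fy: Scraper<R, A>): Scraper<R, A> =>
  (env) => {
    const ox = fx(env)
    const oy = fy(env)
    return ox._tag === 'Some' && oy._tag === 'Some'
      ? some(concat(ox.value, oy.value))
      : none
  }
```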
