bogrep's Introduction

Bogrep – Grep your bookmarks

Bogrep downloads and caches your bookmarks in plaintext without images or videos. Use the Bogrep CLI to run full-text searches over your cached bookmarks.

bogrep -i "reed-solomon code"

Install Bogrep

Install Bogrep from crates.io

# Build and install bogrep binary to ~/.cargo/bin
cargo install bogrep

To update bogrep to a new version, run cargo install bogrep again. Versions 0.x will not be backwards compatible and breaking changes are expected. Remove Bogrep's configuration directory (see Supported operating systems) if you experience an issue when running Bogrep.

Install Bogrep from github.com

git clone git@github.com:quambene/bogrep.git
cd bogrep

# Build and install bogrep binary to ~/.cargo/bin
cargo install --path .

Usage

Settings and cache are created in the configuration path after Bogrep has been run for the first time. The configuration path depends on your operating system (see Supported operating systems).

# Import bookmarks from selected sources
bogrep import

# Fetch and cache bookmarks
bogrep fetch

# Search your bookmarks in full-text search
bogrep <pattern>

To simulate the import of bookmarks, use bogrep import --dry-run.

Search

bogrep [OPTIONS] [PATTERN]
Options:
  -v, --verbose...          
  -m, --mode <MODE>         Search the cached bookmarks in HTML or plaintext format [possible values: html, text]
  -i, --ignore-case         Ignore case distinctions in patterns
  -l, --files-with-matches  Print only URLs of bookmarks with matched lines
  -h, --help                Print help
  -V, --version             Print version

Getting help

# Check version
bogrep --version

# Print help
bogrep --help

# Print help for subcommands
bogrep config --help
bogrep import --help
bogrep fetch --help

Import bookmarks

Import of bookmarks is supported from the following browsers:

  • Firefox (in .json and .jsonlz4 format)
  • Chromium (in .json format)
  • Chrome (in .json format)
  • Edge (in .json format)
  • Safari (in .plist format)

If bookmark files are not detected by bogrep import, you can configure them manually using:

bogrep config --source ~/path/to/bookmarks/file

Filter bookmark folders

Restrict the import to selected bookmark folders. Separate multiple folders with commas:

bogrep config --source "my/path/to/bookmarks_file.json" --folders dev,science,articles

Ignore urls

Ignore specific urls. The content for these urls will not be fetched and cached.

It can be useful to ignore urls for video or music platforms which usually don't include relevant text to grep.

# Ignore one or more urls
bogrep config --ignore <url1> <url2> ...

Fetch underlying urls

Fetch the underlying urls of supported websites:

bogrep config --underlying <url1> <url2> ...

For example, if a specific url like https://news.ycombinator.com/item?id=00000000 is bookmarked, the underlying article will be fetched and cached.

Supported domains are:

  • news.ycombinator.com
  • reddit.com

Diff websites

Fetch one or more URLs, compare each fetched website with its cached version, and display the changes:

bogrep fetch --diff <url1> <url2> ...

Manage internal bookmarks

If you need to add specific URLs to the search index, use the bogrep add subcommand.

# Add URLs to search index
bogrep add <url1> <url2> ...

# Remove URLs from search index
bogrep remove <url1> <url2> ...

# Add URLs to search index and fetch content from URLs
bogrep fetch <url1> <url2> ...

Request throttling

Requests to the same host are throttled conservatively by default. You can adjust the throttling in settings.json, which is usually located at ~/.config/bogrep in your home directory:

{
    "cache_mode": "text",
    "max_concurrent_requests": 100,
    "request_timeout": 60000,
    "request_throttling": 3000,
    "max_idle_connections_per_host": 10,
    "idle_connections_timeout": 5000
}

where request_throttling is the waiting time between requests for the same host in milliseconds.
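As an illustration of what a per-host waiting time means, here is a minimal sketch in Rust. It is not Bogrep's implementation: the type HostThrottle and its method reserve are hypothetical names, and the only thing taken from the settings above is that request_throttling is a delay in milliseconds between requests to the same host.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Per-host throttle (hypothetical helper, not part of Bogrep's API).
struct HostThrottle {
    /// Minimum waiting time between two requests to the same host.
    min_interval: Duration,
    /// Earliest instant at which the next request to each host may start.
    next_allowed: HashMap<String, Instant>,
}

impl HostThrottle {
    /// `request_throttling_ms` mirrors the `request_throttling` setting.
    fn new(request_throttling_ms: u64) -> Self {
        Self {
            min_interval: Duration::from_millis(request_throttling_ms),
            next_allowed: HashMap::new(),
        }
    }

    /// Reserve the next request slot for `host` and return how long the
    /// caller should wait, measured from `now`, before sending the request.
    fn reserve(&mut self, host: &str, now: Instant) -> Duration {
        // Start immediately unless an earlier reservation pushes us back.
        let start = match self.next_allowed.get(host) {
            Some(&next) if next > now => next,
            _ => now,
        };
        self.next_allowed
            .insert(host.to_owned(), start + self.min_interval);
        start.duration_since(now)
    }
}
```

With a throttling value of 3000, three requests to the same host reserved at the same instant would wait 0, 3 and 6 seconds respectively, while a request to a different host would not wait at all.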

To speed up fetching, set max_concurrent_requests to e.g. 1000. The maximum number of available sockets depends on your operating system. Run ulimit -n to show the maximum number of open sockets allowed on your system.

For the available settings see https://docs.rs/bogrep/latest/bogrep/struct.Settings.html.

Supported operating systems

Bogrep assumes and creates a configuration path at

  • $HOME/.config/bogrep for Linux,
  • $HOME/Library/Application Support/bogrep for macOS,
  • C:\Users\<Username>\AppData\Roaming\bogrep for Windows,

in your home directory for storing the settings.json, bookmarks.json, and cache folder.

You can configure the configuration path via the environment variable BOGREP_HOME.
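The lookup order described above can be sketched as follows. This is an assumption-laden illustration, not Bogrep's code: the function config_dir and its parameters are hypothetical, it assumes that BOGREP_HOME replaces the whole default path, and it treats any unknown OS name like Linux.

```rust
use std::path::PathBuf;

/// Resolve the configuration directory (hypothetical helper, not Bogrep's API).
///
/// * `bogrep_home`: value of the BOGREP_HOME environment variable, if set
/// * `home`: the user's home directory
/// * `os`: an OS name as in `std::env::consts::OS` ("linux", "macos", "windows")
fn config_dir(bogrep_home: Option<&str>, home: &str, os: &str) -> PathBuf {
    // Assumption: an explicit BOGREP_HOME overrides the per-OS default.
    if let Some(dir) = bogrep_home {
        return PathBuf::from(dir);
    }
    let base = PathBuf::from(home);
    match os {
        "macos" => base.join("Library").join("Application Support").join("bogrep"),
        "windows" => base.join("AppData").join("Roaming").join("bogrep"),
        // Assumption: everything else follows the Linux layout.
        _ => base.join(".config").join("bogrep"),
    }
}
```

A caller could pass std::env::var("BOGREP_HOME").ok().as_deref() and std::env::consts::OS to evaluate this for the current process.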

Troubleshooting

Missing file permissions on macOS

If file permissions are missing for Bogrep, allow CLI applications to access the filesystem in the System Preferences.

Missing search results

The default bogrep search is case-sensitive. Try bogrep -i for case-insensitive search.

Testing

# Run unit tests and integration tests
cargo test

# Run unit tests
cargo test --lib

# Run integration tests
cargo test --test '*'

bogrep's People

Contributors

quambene

bogrep's Issues

Improve line breaks for plaintext parsing

The plaintext of fetched bookmarks is cached as a single line. Instead, each paragraph in the HTML should translate to a new line in the parsed plaintext. This will lead to better grepability.

Spacing errors in cached files

If the text contains a link or inline formatting such as italics, the spaces between the linked or formatted text and the adjacent words are omitted.

In the example below, in popularCookie Clicker, "Cookie Clicker" is a link, but in the cache there is no space between it and the preceding word "popular".

In apaperclip maximizer, "paperclip maximizer" is italicized, and in the cache there is no space between it and the preceding word "a".

In Thissoundslike, the italicized "sounds" is run together with the words before and after it.

Also, the paragraphs are run together instead of having a newline between them.

The example URL is active so you can see the original document.

$ bogrep paperclips
Match in bookmark: https://www.vice.com/en/article/xwgnxq/this-game-about-paperclips-will-make-you-ponder-the-apocalypse Ever since the wildly popularCookie Clicker,idle clicker games have been about hockey stick curves, about exponential growth unleashed by multiplicative advances in productivity. InCookie Clicker,that was employed in service of an absurdist joke about cookies.Universal Paperclips,a new free game from designer Frank Lantz, instead takes this to its darkly literalistic conclusion.It's a clicker game where you play as apaperclip maximizer,an AI that, once tasked with making paperclips, proceeds to turn the entire universe into paperclips.Thissoundslike a premise arrived at specifically to spoof clicker games, but it harkens back to a thought experiment proposed by Nick Bostrom, an Oxford philosophy professor, in a2003 paper:The risks in developing superintelligence include the risk of failure to give it the supergoal of philanthropy. […] Another way for it to happen is that a well-meaning team of programmers make a big mistake in designing its goal system. This could resul
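One way such output can arise is when each text node is trimmed before the nodes are concatenated; whether that is the actual cause in Bogrep is an assumption. The sketch below shows the alternative order of operations: concatenate the raw text nodes first, collapse whitespace afterwards, and put each paragraph on its own line. The functions join_text_nodes and join_paragraphs are hypothetical, and the text nodes are assumed to come from an HTML parser that is not shown here.

```rust
/// Join the raw text nodes of one paragraph, then collapse whitespace.
/// (Hypothetical helper; the nodes would come from an HTML parser.)
fn join_text_nodes(nodes: &[&str]) -> String {
    // Concatenate first so that whitespace at node edges survives.
    let raw: String = nodes.concat();
    // Then collapse every whitespace run into one space and trim the ends.
    raw.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Render each paragraph on its own line.
fn join_paragraphs(paragraphs: &[Vec<&str>]) -> String {
    paragraphs
        .iter()
        .map(|nodes| join_text_nodes(nodes))
        .collect::<Vec<_>>()
        .join("\n")
}
```

For example, the nodes "wildly popular ", "Cookie Clicker" and ", idle clicker games" are joined with the space before the link text preserved, because trimming happens only after concatenation.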

Feature Request: Flag to Only list URLs

As in grep or ripgrep, there should be a -l flag that does not show the text of the match, but only the URLs that have a match. This can be approximated with bogrep <SEARCH> | grep 'Match in bookmark'

Cache Mode ignored in settings

With a clean import (and cache cleaned), using bogrep import & bogrep fetch the cache is filled with .txt files. Again after cleaning, bogrep import & bogrep fetch --mode markdown fills the cache with .md files. These are expected behaviors.

After cleaning again, setting "cache_mode": "markdown" in settings.json, and running bogrep import & bogrep fetch, the cache is filled with .txt files. The expected behavior is to download .md files.

Fetch error: Too many open files

When trying to do a bogrep fetch I am getting the error below.

$ bogrep fetch
Error: Can't create file at /Users/USERNAME/Library/Application Support/bogrep/cache/78aa542f-52c1-4b5e-b475-15293854996a.txt: Too many open files (os error 24)
$ (140/8005)

I tried setting "max_concurrent_requests": 50, and still get this issue.

OS: Darwin 23.1.0 - macOS 14.1.1 (Sonoma)
version: bogrep 0.5.0

GIF files fetched

It seems that GIF files are not excluded when fetching and are put into the cache.
I had http://sirocco.accuweather.com/sat_mosaic_640x480_public/rs/isarNE.gif in my bookmarks, and when I ran a bogrep search I got a Match in bookmark with that URL and binary junk. The query I made happened to match some of the binary data.
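A possible guard, sketched here as an assumption rather than as Bogrep's behavior, is to check the Content-Type header of a response before caching it. The function is_text_content_type is a hypothetical name, and the set of accepted types is only an example.

```rust
/// Decide whether a response looks like text worth caching, based on the
/// value of its Content-Type header (hypothetical helper, not from Bogrep).
fn is_text_content_type(content_type: &str) -> bool {
    // Drop parameters such as "; charset=utf-8" and normalize the case.
    let mime = content_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    // Assumption: only text types and XHTML are useful for grepping.
    mime.starts_with("text/") || mime == "application/xhtml+xml"
}
```

With this check, a response served as image/gif would be skipped, while text/html; charset=utf-8 would be cached.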

Format not supported for bookmark file

When I try to import my bookmarks I get this error message:
Error: Format not supported for bookmark file

I tried using a source that points to my Firefox profile directory, my Chrome bookmarks, a JSON export of my Firefox bookmarks, and an export file of my Chrome bookmarks. I deleted settings.json between each bogrep config --source command.

I am running on Mac OS 11.7.10 (Big Sur). The unit and integration tests all pass.

How can I troubleshoot this issue further?

thread 'main' panicked

A search panicked the main thread. Possibly a pdf was included in the cache?

RUST_BACKTRACE=1 bogrep  fugitive
thread 'main' panicked at 'byte index 7369 is not a char boundary; it is inside '\u{a0}' (bytes 7368..7370) of `#REF!Far From The Maddening Crowdby David Nicholls (Based on the novel by Thomas Hardy)   2015september 2013 final shooting     x kb   pdf formatimdbFargoby Joel & Ethan Coen   1996undated, unspecified draft   106 kb   html formatimdbFarg`[...]', src/cmd/search.rs:86:35
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::str::slice_error_fail_rt
   3: core::str::slice_error_fail
   4: bogrep::cmd::search::search_bookmarks
   5: bogrep::cmd::search::search
   6: bogrep::main::{{closure}}
   7: tokio::runtime::park::CachedParkThread::block_on
   8: tokio::runtime::context::runtime::enter_runtime
   9: tokio::runtime::runtime::Runtime::block_on
  10: bogrep::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Here is the full backtrace:

RUST_BACKTRACE=full bogrep fugitive
thread 'main' panicked at 'byte index 7369 is not a char boundary; it is inside '\u{a0}' (bytes 7368..7370) of `#REF!Far From The Maddening Crowdby David Nicholls (Based on the novel by Thomas Hardy)   2015september 2013 final shooting     x kb   pdf formatimdbFargoby Joel & Ethan Coen   1996undated, unspecified draft   106 kb   html formatimdbFarg`[...]', src/cmd/search.rs:86:35
stack backtrace:
   0:        0x10224f8c8 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::he69c0e17cb41f255
   1:        0x10227a90b - core::fmt::write::h66293df4c7dd941a
   2:        0x102233206 - std::io::Write::write_fmt::h2f5a7ea5f48a0b56
   3:        0x10224f690 - std::sys_common::backtrace::print::h71fd332624ce1826
   4:        0x1022506f5 - std::panicking::default_hook::{{closure}}::ha2a0e70fb3678142
   5:        0x102250471 - std::panicking::default_hook::hb166cd42dec7ff92
   6:        0x102250cb8 - std::panicking::rust_panic_with_hook::h2b924837648ff0c0
   7:        0x102250bf3 - std::panicking::begin_panic_handler::{{closure}}::h04e24a68d30d9f5c
   8:        0x10224faf9 - std::sys_common::backtrace::__rust_end_short_backtrace::hd45b5152c8265971
   9:        0x10225096d - _rust_begin_unwind
  10:        0x1022a2003 - core::panicking::panic_fmt::h9302663e63786640
  11:        0x102275f22 - core::str::slice_error_fail_rt::h16947361fdce3fc4
  12:        0x1022a1ee9 - core::str::slice_error_fail::hc7dbb20721e2925b
  13:        0x101df3cee - bogrep::cmd::search::search_bookmarks::he87011c56d994e70
  14:        0x101df10bb - bogrep::cmd::search::search::h7c9b200747f586e2
  15:        0x101d9f7f0 - bogrep::main::{{closure}}::h9bf999662de91d64
  16:        0x101d9f08b - tokio::runtime::park::CachedParkThread::block_on::h475f5be8938b1cdf
  17:        0x101d832ef - tokio::runtime::context::runtime::enter_runtime::h0c39744fbd9d979e
  18:        0x101d8ea91 - tokio::runtime::runtime::Runtime::block_on::hcd0b6f794fbc0b1a
  19:        0x101d7ff7b - bogrep::main::h2c2d8feaaf0c8766
  20:        0x101d6da36 - std::sys_common::backtrace::__rust_begin_short_backtrace::h3044435b1b36dee6
  21:        0x101d6da51 - std::rt::lang_start::{{closure}}::h849919968bfedf8b
  22:        0x102250854 - std::panicking::try::hb5cb29dbfee1dcfc
  23:        0x10223bd6e - std::rt::lang_start_internal::h634e63ff6023f727
  24:        0x101d8005c - _main
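The panic message says that a string was sliced at byte index 7369, which falls inside the two-byte character '\u{a0}' (a non-breaking space). The code at src/cmd/search.rs:86 is not shown here, so the following is only a sketch of a general remedy: move a byte index back to the nearest character boundary before slicing. The function truncate_at_char_boundary is a hypothetical name.

```rust
/// Return the longest prefix of `text` that is at most `max_bytes` long
/// and ends on a UTF-8 character boundary.
fn truncate_at_char_boundary(text: &str, max_bytes: usize) -> &str {
    if max_bytes >= text.len() {
        return text;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}
```

Slicing with &text[..max_bytes] directly would panic whenever max_bytes lands inside a multi-byte character, which matches the reported error.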
