xarantolus / collect


A server to collect & archive websites that also supports video downloads

Home Page: https://010.one/Collect/

License: MIT License

TypeScript 54.28% JavaScript 32.59% Shell 0.26% Pug 12.87%
self-hosted webinterface archive website-archive video-downloader website-scraper web-archiving

collect's Issues

Content file not being created

I am not sure what you want to know, so:

I get this:

Error 500
ENOENT: no such file or directory, open 'public/s/content.json'

I have this installed:

npm -v
5.8.0
nodejs -v
v8.11.4

and I run this to start:

sudo npm start production

Set proper content-types

When visiting a downloaded website whose files contain HTML but have a .php or .md extension, the server sends the wrong Content-Type header. As a result, browsers usually render the file incorrectly.
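One possible approach, sketched here under assumptions, is to look at the start of the file whenever the extension is ambiguous and override the header only if the content looks like HTML. The function names and the marker list are hypothetical and not taken from the project's code.

```typescript
// Extensions named in this issue whose files may really contain HTML.
const AMBIGUOUS_EXTENSIONS = ["php", "md"];

// Markers that commonly open an HTML document (a heuristic, not exhaustive).
const HTML_MARKERS = ["<!doctype html", "<html", "<head", "<body"];

function looksLikeHtml(head: string): boolean {
    // Drop leading whitespace; JavaScript's \s also matches a byte order mark.
    const start = head.replace(/^\s+/, "").toLowerCase();
    return HTML_MARKERS.some((marker) => start.indexOf(marker) === 0);
}

// Returns a Content-Type override, or undefined to keep the server's default.
// `head` is assumed to be the first few hundred characters of the file.
function contentTypeOverride(fileName: string, head: string): string | undefined {
    const ext = fileName.slice(fileName.lastIndexOf(".") + 1).toLowerCase();
    if (AMBIGUOUS_EXTENSIONS.indexOf(ext) !== -1 && looksLikeHtml(head)) {
        return "text/html";
    }
    return undefined;
}
```

A real Markdown or PHP source file does not start with an HTML marker, so it keeps whatever type the server would have sent anyway.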

Allow public access

It would be great to have an option (either global or per site) to allow unauthenticated access to archived websites (and potentially also to the list of websites).

The main problem is that this would require moving the call to the auth middleware into the individual URL handlers, or another equivalent solution.
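As a sketch of one such "equivalent solution", a single global middleware could stay in place and consult a predicate like the one below instead of the check being spread across handlers. Everything here is an assumption made for illustration: the /s/<site-id>/ URL layout, the AccessConfig shape and the function name are hypothetical, and the path is assumed to be already normalized.

```typescript
// Hypothetical configuration: one global switch plus a per-site allow list.
interface AccessConfig {
    publicAll: boolean;
    publicSites: string[];
}

// Decides whether a request path still needs authentication.
function requiresAuth(urlPath: string, cfg: AccessConfig): boolean {
    // Assumed layout: archived sites are served below /s/<site-id>/.
    const match = /^\/s\/([^\/]+)(\/|$)/.exec(urlPath);
    if (match === null) {
        // Everything that is not an archived site (API, admin pages) stays protected.
        return true;
    }
    if (cfg.publicAll) {
        return false;
    }
    return cfg.publicSites.indexOf(match[1]) === -1;
}
```

The existing middleware would then skip its login check whenever requiresAuth returns false, so the URL handlers themselves need no changes.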

Support Video Downloads

As this server is for archiving all types of websites, it should also support downloading only videos. To specify whether to download only the video or the website with the video, the url could be given as video:https://youtube.com/watch?v=..... or with another option on the new page.

This feature would use youtube-dl because it supports a wide range of sites.
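A minimal sketch of how the proposed video: prefix could be recognized. The type and function names are hypothetical, and handing the resulting URL to youtube-dl is deliberately left out.

```typescript
// Hypothetical result type: what to download and in which mode.
interface DownloadRequest {
    kind: "video" | "site";
    url: string;
}

const VIDEO_PREFIX = "video:";

// Splits user input into a download mode and the actual URL.
function parseDownloadInput(input: string): DownloadRequest {
    const trimmed = input.replace(/^\s+|\s+$/g, "");
    // The prefix is matched case-insensitively; the URL itself is left untouched.
    if (trimmed.slice(0, VIDEO_PREFIX.length).toLowerCase() === VIDEO_PREFIX) {
        return { kind: "video", url: trimmed.slice(VIDEO_PREFIX.length) };
    }
    return { kind: "site", url: trimmed };
}
```

Input without the prefix falls through to the normal website download, so existing URLs keep working unchanged.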

Redirect after login

After logging in, a user should be redirected to the page they originally requested.
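If the requested page is passed along to the login form (for example in a query parameter, which is an assumption here), the target should be validated before redirecting so that the login page cannot be abused to send users to other sites. A sketch with a hypothetical function name:

```typescript
// Returns `requested` only if it is a local path on this server,
// otherwise the fallback. Absolute and protocol-relative URLs are rejected.
function safeRedirectTarget(requested: string | undefined, fallback: string = "/"): string {
    if (typeof requested !== "string") {
        return fallback;
    }
    // Must start with exactly one "/" ("//host" and "/\host" can leave the site).
    if (!/^\/(?![\/\\])/.test(requested)) {
        return fallback;
    }
    // Reject control characters such as CR/LF that could corrupt the header.
    if (/[\u0000-\u001f]/.test(requested)) {
        return fallback;
    }
    return requested;
}
```

The login handler would then redirect to the value this function returns rather than to the raw parameter.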

Archive Format

Have you considered using the webarchive format and/or PDF for archiving? .war is pretty standard now.

Embed sites in Iframe

When clicking on the title in the table view, redirect to a page that embeds the saved site in an iframe so that the header is still displayed.

Add option for using PhantomJS

Add an option to use the website-scraper-phantom module to download pages.

This might be accomplished by using the module only if it is installed: users who want it just install the module, and the normal install doesn't fail if their platform is not supported by PhantomJS.

Details page improvements

Improve the "details" page:

  • Link to the original url instead of displaying it in an input box
  • Link to the saved page instead of just displaying it

Modernize

This software is

  1. quite old (especially regarding TypeScript/JS coding conventions)
  2. not well suited for larger collections (no search, no tagging etc.)
  3. barely updated (dependencies are out of date, some of them no longer supported etc)
    • I think this server doesn't work with newer versions of the website-scraper node module, so it's still on an older version

While it still works (I've had it running for 4+ years), it feels like it's time for an overhaul. The following could be good steps:

  1. Creating a better frontend (the code in browser.js works, but is a mess)
  2. Creating a docker (compose) configuration that just works to simplify installation
  3. Maybe switch to a database instead of just storing everything in a JSON file (not sure I'll do this, as it complicates setup etc.)
  4. Updating the downloader code to use newer website-scraper versions

2 & 4 should be possible without major rewrites, the others are somewhat more involved.

Maybe it also makes sense to rewrite the server in Go because I like it better, but I'm not sure if there's a good website-scraper-like module for Go. I actually tried writing something like that, which started a Chromium browser and used SingleFile to download pages, but it didn't work that well.

I'm not sure when (or if) I'll work on this.

Add an editor

Add an editor (linked from the details page) for HTML pages so they can be edited from the web interface.

Fix mobile layout

The layout of the details page looks horrible in mobile browsers.

Socket.IO authentication

Right now, any client can connect to the server to receive events. This should be restricted to people who have an api token or the right cookie.

How to reproduce: register events and enter io.connect() in your browser console
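A sketch of the check itself, kept independent of Socket.IO so it can be tested on its own. The interfaces and names are hypothetical. How the token or session id is extracted from the handshake is not shown.

```typescript
// Hypothetical view of what a connecting client presented.
interface HandshakeInfo {
    apiToken?: string;
    sessionId?: string;
}

// Hypothetical view of what the server currently accepts.
interface AuthState {
    apiTokens: string[];
    sessions: string[];
}

// True only if the client presented a known API token or a known session id.
// Plain comparison is used for brevity; a constant-time comparison is safer for secrets.
function mayReceiveEvents(handshake: HandshakeInfo, auth: AuthState): boolean {
    if (handshake.apiToken && auth.apiTokens.indexOf(handshake.apiToken) !== -1) {
        return true;
    }
    if (handshake.sessionId && auth.sessions.indexOf(handshake.sessionId) !== -1) {
        return true;
    }
    return false;
}
```

In Socket.IO, a check like this would typically run in a connection middleware registered with io.use, which rejects the connection by passing an error to next.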
