iemejia / catho

A file catalog utility inspired by the awesome Robert Vasicek's Cathy project. Or my excuse to hack something that I really need.

License: GNU Lesser General Public License v3.0

Python 100.00%

catho's Introduction

Catho

Catho is a catalog utility inspired by Robert Vasicek's awesome Cathy project. The idea is to have a utility that saves a catalog of the files you keep on different media, volumes, network locations, etc., so that you can search and update it locally without having to insert or connect to that media.

Or, put in other words, it's my excuse to hack some Python. Yes, yes, I promised to do it in Haskell, but I don't have time now :P.

Installation

TODO

Requirements

For Python < 2.7, install argparse first: pip install argparse

Use

# Prepare the catalog folder
catho init

# Add a catalog with an alias
catho add name path

# Remove a catalog
catho rm name

# Search for filenames matching a pattern (e.g. *.zip, c*.*) in the given catalogs, or in all of them if none is given
catho find pattern [catalog1] [catalog2] [catalogn]

# List all catalogs
catho ls

# Find files from dir that apparently already exist in the catalog
catho scan name dir

# Import existing Cathy catalogs into the catho format
catho import file.cat

Developing

Each catalog is a simple sqlite3 database. For more information about the catalog structure, see the docs/catalog.sql file. Catalogs are saved automatically in the ~/.catho folder.
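
As a rough illustration of how another tool could build on such a catalog, the sketch below searches filenames with SQLite's GLOB operator, which understands the same * and ? wildcards used by catho find. The table and column names (files, path, name) and the find_files helper are assumptions made up for this example; the real layout is the one described in docs/catalog.sql.

```python
import sqlite3


def find_files(conn, pattern):
    """Return (path, name) rows whose name matches a shell-style pattern."""
    # Hypothetical schema: the real one lives in docs/catalog.sql.
    query = "SELECT path, name FROM files WHERE name GLOB ? ORDER BY path, name"
    return conn.execute(query, (pattern,)).fetchall()


# In-memory database standing in for a catalog file under ~/.catho.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [("docs", "catalog.sql"), ("backup", "photos.zip"), ("backup", "code.tar")],
)

matches = find_files(conn, "*.zip")
```

Note that GLOB is case-sensitive, so a real tool might want to normalize names or offer a case-insensitive option.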

We would like to create a sort of simple minimalist catalog system so people can create nicer tools based on the catho system (e.g. GUI, web, etc).

Continuous integration runs automatically via Travis CI.

Collaboration

Contributions are welcome, but please add new unit tests that cover your changes and/or features. Also, please try to keep changes platform-independent and backward-compatible.

catho's People

Contributors

iemejia, rgamez

catho's Issues

Save only relative paths

When saving a catalog, store the full path of its root in the metadata and only relative paths in the catalog table.
Currently this works for ., but it fails with paths auto-completed by the system (e.g. ~).
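
One possible way to approach this, sketched under the assumption that the root is normalized once and every entry is then made relative to it. The helper names normalize_root and relative_entry are hypothetical and are not part of catho.

```python
import os


def normalize_root(root):
    """Expand ~ and return an absolute, normalized catalog root."""
    return os.path.abspath(os.path.expanduser(root))


def relative_entry(root, fullpath):
    """Return fullpath relative to the (already normalized) catalog root."""
    return os.path.relpath(fullpath, root)


# The full path would be kept in the metadata...
root = normalize_root("~/Dropbox")
# ...and only the relative part in the catalog table.
entry = relative_entry(root, os.path.join(root, "iPad", "test.pdf"))
```

Normalizing before computing relative paths means ., ~ and already absolute roots all end up in the same form.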

Write docs, tutorial

Tutorials:

  • How to use catho
  • Encrypting your catalog
  • Synchronizing with Dropbox

Error parsing hardlinks

A "No such file or directory" error is raised while parsing a link (note that the listing below shows a symbolic link rather than a hard link):

$ ls -al ~/Dropbox/iPad/
lrwxr-xr-x 1 rgamez staff 38 18 feb 2012 test.pdf -> /Users/rgamez/Documents/test.pdf

$ ./catho.py add iPad ~/Dropbox/iPad
Creating catalog: iPad
An error occurred: [Errno 2] No such file or directory: '/Users/rgamez/Dropbox/iPad/test.pdf'
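
A minimal sketch of one way to avoid this, assuming the error comes from a symbolic link whose target cannot be opened (the report does not confirm the cause). The hashable_files helper is hypothetical; it simply filters out dangling links before any file is read.

```python
import os


def hashable_files(paths):
    """Yield paths that are safe to open, skipping dangling symlinks."""
    for path in paths:
        # os.path.exists follows symlinks, so it is False for a dangling link.
        if os.path.islink(path) and not os.path.exists(path):
            continue
        yield path
```

Whether such links should be skipped, or catalogued without a hash, is a design decision left open here.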

Write results to stdout

Writing results with the logger to stderr by default prevents the use of Unix tools like grep for filtering. For example, $ catho/catho.py find \* media | grep Animal just prints all 133652 records from the test catalog.

There is a workaround, redirecting stderr to stdout, but it is not really straightforward:
$ catho/catho.py find \* media 2>&1 >/dev/null | grep Animal
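
A sketch of the separation being asked for, assuming results are plain strings: diagnostics stay on the logger (stderr) while results go to stdout. The print_results helper and the sample rows are made up for illustration.

```python
import logging
import sys

# Diagnostics keep going to stderr through the logger.
logging.basicConfig(stream=sys.stderr, level=logging.INFO, format="%(message)s")
logger = logging.getLogger("catho")


def print_results(rows, out=None):
    """Write one result per line to stdout (or to the given stream)."""
    if out is None:
        out = sys.stdout
    for row in rows:
        out.write(row + "\n")


logger.info("Searching catalog: media")                 # goes to stderr
print_results(["Animals/cat.jpg", "Music/song.mp3"])    # goes to stdout
```

With this split, a plain pipe into grep would only see the result lines.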

add before init is not fatal error

If ~/.catho doesn't exist because catho init hasn't been executed, errors are displayed but the parsing of the directory continues.

$ ~/catho/catho/catho.py add Home ~
Creating catalog: Home
An error occurred: unable to open database file
An error occurred: unable to open database file
An error occurred: unable to open database file
$ ls ~/.catho
ls: cannot access ~/.catho: No such file or directory

Two options/suggestions, @iemejia:

  • Treat it as a fatal error that ends with a suggestion to run catho init.
  • Create ("touch") ~/.catho before adding any catalog, and perhaps before any other operation.
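
Both options can be sketched in a few lines. The function names and the CATALOG_DIR constant below are assumptions for illustration, not catho's actual code.

```python
import os
import sys

CATALOG_DIR = os.path.join(os.path.expanduser("~"), ".catho")


def require_catalog_dir(path=CATALOG_DIR):
    """Option 1: abort early when the catalog folder is missing."""
    if not os.path.isdir(path):
        sys.exit("Catalog folder %s not found; run 'catho init' first." % path)


def ensure_catalog_dir(path=CATALOG_DIR):
    """Option 2: create the catalog folder on demand."""
    if not os.path.isdir(path):
        os.makedirs(path)
    return path
```

Calling either helper at the start of add would stop the repeated "unable to open database file" errors; the first keeps init explicit, the second makes it implicit.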

Encrypt database

For security reasons, in case the catalog is used on cloud servers.
Proposed option: -e, --encrypt
We still have to decide on a scheme.

List catalog information

With the ls command plus the name of a catalog, it should display the catalog's metadata and eventually its contents.

cathod service

A runtime service that automatically indexes modified files and updates the catalog.
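
One simple way such a service could detect work to do is to poll modification times. This is only a sketch under that assumption (a real service might use filesystem notifications instead), and modified_since is a hypothetical helper.

```python
import os


def modified_since(root, last_indexed):
    """Yield files under root whose mtime is newer than last_indexed."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            fullpath = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(fullpath) > last_indexed:
                    yield fullpath
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
```

Here last_indexed is a Unix timestamp, for example the time of the previous indexing run stored in the catalog metadata.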

TypeError on listing the catalogs

$ catho/catho.py ls
Traceback (most recent call last):
  File "catho/catho.py", line 334, in <module>
    logger.info(catalogs_str())
  File "catho/catho.py", line 226, in catalogs_str
    date = str(datetime.fromtimestamp(timestamp))
TypeError: a float is required
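
A defensive sketch, assuming the stored timestamp was missing (None) or saved as text; the report does not say which. The format_timestamp helper is hypothetical and would wrap the datetime.fromtimestamp call shown in the traceback.

```python
from datetime import datetime


def format_timestamp(timestamp):
    """Return a printable date, or 'unknown' when the value is unusable."""
    try:
        return str(datetime.fromtimestamp(float(timestamp)))
    except (TypeError, ValueError):
        # None or non-numeric text: no valid date was stored.
        return "unknown"
```

This only hides the symptom when listing; the underlying fix would be to make sure a numeric timestamp is always written when a catalog is created.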

Add tag support

We have to discuss what to support:

  • filesystem tags ?
  • user generated tags ?
  • inferred tags (e.g. .mp4 = video, or .mp3 = audio)?
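
For the inferred-tags option, a minimal sketch could map file extensions to tags. The EXTENSION_TAGS table and infer_tag helper are illustrative assumptions; only the two mappings mentioned above are included.

```python
import os

# Illustrative mapping only; a real table would be much larger.
EXTENSION_TAGS = {
    ".mp4": "video",
    ".mp3": "audio",
}


def infer_tag(filename):
    """Return the tag inferred from the file extension, or None."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_TAGS.get(ext)
```

The extension is lowercased first, so infer_tag("clip.MP4") and infer_tag("clip.mp4") give the same result.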

Error when calculating hash of BIG files

I found this while trying to index a VM image that's bigger than the memory of my local machine:

Python(15742) malloc: *** mmap(size=140645843243008) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "catho/catho.py", line 303, in <module>
    for files in filesubsets:
  File "catho/catho.py", line 77, in file_get_filelist
    hash = file_hash(fullpath, hash_type)
  File "catho/catho.py", line 50, in file_hash
    sha1.update(f.read())
MemoryError
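
The traceback points at f.read() loading the whole file at once. A common remedy is to feed the hash in fixed-size chunks, sketched below. The function name and its first two parameters follow the traceback; the chunk_size parameter, its default and the hex-string return value are assumptions.

```python
import hashlib


def file_hash(path, hash_type="sha1", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so memory use stays bounded."""
    digest = hashlib.new(hash_type)
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Memory use then depends on chunk_size rather than on the size of the file, so a VM image larger than RAM can still be hashed.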
