Coder Social home page Coder Social logo

python-file-set-sorter's Introduction

Python File Set Analysis

This tool provides a simple analysis of files passed as inputs of the program. The user divides files with strings into two categories:

  • Set A - list of files in the first set
  • Set B - list of files in the second set

After that, the tool runs the analysis where it takes all elements in the set A (e.g. file contents) and compares them with the set B, where the record from A is added to the:

  • Unique set - there is no such record in the set A
  • Non-unique set - there is a match in the second B

This tool is just a quickly implemented because I needed to solve this as fast as possible and I didn't know about any suitable tool which is available online.

WARNING: The tool is very simple and it was written in 30 minutes. Therefore, I am actually missing some unit tests and the tool is not guarded against all user inputs :-).

Input File Format

The file format contains paths to files (one per a line). There is also a possibilty to specify the STRIP value which defines the number of removed path elements from the prefix. For example, the STRIP=2 applied on /home/user/abc results to abc. The STRIP value can be passed as the first line of the file or you can specify the strip value as the parameter of passed file like files.txt:2. You can also specify the default strip value using the --strip parameter. Empty lines are filtered out.

So, the example of file can be:

STRIP=2
/home/user/a/b/c/d.txt
/home/user/a/b/c/x.txt
/home/user/c/d.txt
/home/user/files.txt
/home/user/file.txt

Without the STRIP:

/home/user/a/b/c/d.txt
/home/user/a/b/c/x.txt
/home/user/c/d.txt
/home/user/files.txt
/home/user/file.txt

Example of Usage

You can use this scrip to run the analysis:

#!/usr/bin/env bash

python3 sorter.py --unique-files data/input/* --other-files data/others/* $@ | tee output.txt

You can get more inforamtion about the tool using the --help option:

Process two sets of files and search for a unique files in the start group.
The program tries to find a unique files in Set A that are not in the Set B.
The format of input data is a text file where each file is on a line ending
with a \n character. Empty lines are filtered out. The format of unique files
can contain the strip value which removes the first n paths from each analyzed
files. The value can be defined globally or localy per file in the form
FILE:STRIP, where the file is a file path and STRIP is the strip value.The
default value (passed via the --strip argumnet) is assined iff the STRIP value
is not defined. Another possibility is to define a strip value directly in a
file as the first line of the file where the format is STRIP=value, where the
value is a positive integer. The value inside the file has a bigger priority
than value passed by in the argument.

optional arguments:
  -h, --help            show this help message and exit
  --unique-files UNIQUE_FILES [UNIQUE_FILES ...]
                        The set of files which will be analyzed in the unique
                        set (= Set A)
  --other-files OTHER_FILES [OTHER_FILES ...]
                        The set of files which will be analyzed in the non-
                        unique set (= Set B)
  --strip STRIP         Default strip value for all passed files. The default
                        value is 4.
  --debug               Enable the debug mode for the analysis.             Enable the debug mode for the analysis.

The tool also contains a stupid code which can be used for the generation of test files. It generes one file with inputs and second file with randomly selected files (from the first file). I suggest you to edit the file directly to customize it to your needs.

python-file-set-sorter's People

Contributors

benycze avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.