
alexlamson / datawrangler


Makes quick and dirty data mining easier in Sublime Text

License: MIT License

Python 100.00%
data-cleaning data-cleansing data-munging data-wrangling sublime-text-plugin text-manipulation

datawrangler's Introduction

alexlamson.com

Code for alexlamson.com

datawrangler's People

Contributors

alexlamson

Forkers

ptzagk ffdsouza

datawrangler's Issues

Align columns fails with this exact string

String to reproduce error:

ReportId, ActivityId, Name, GroupId, CommonGroupId, GroupListId, CommonGroupListId, IsActive, IsBillable, StartLocalTime, StartUtcTime, EndLocalTime, EndUtcTime, Notes, RelatedActivityId, SourceId, CurrentChangeSequence, CurrentChangeRandomValue, Other
(3, 611, 'ReportId, ActivityId, Name, GroupId, CommonGroupId • (Quantified Self, manictime sample) - Sublime Text (UNREGISTERED)', 62, 12, None, None, 1, None, '2018-06-27 13:39:31', '2018-06-27 17:39:31', '2018-06-27 13:39:36', '2018-06-27 17:39:36', None, None, None, 1500, 1218390469, '{}')

Stack trace:

  File "C:\Program Files\Sublime Text 3\sublime_plugin.py", line 1072, in run_
    return self.run(edit)
  File "C:\Users\Alex\AppData\Roaming\Sublime Text 3\Packages\User\DataWrangler.py", line 205, in run
    column_widths = detect_col_widths(self, sep, num_columns)
  File "C:\Users\Alex\AppData\Roaming\Sublime Text 3\Packages\User\DataWrangler.py", line 59, in detect_col_widths
    column_widths[i] = max(len(cell_string), column_widths[i])
IndexError: list index out of range
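
One plausible reading of the trace above: the quoted cell in the second row contains commas, so a plain split on the separator yields more cells than the header row, and a width list sized from the header is indexed past its end. The sketch below shows a width detector that grows the list instead of assuming a fixed column count. The standalone signature (a list of lines plus a separator) is an assumption for illustration; the plugin's actual `detect_col_widths` takes different arguments and its body is not shown in the issue.

```python
def detect_col_widths(lines, sep):
    """Return the widest cell length per column, tolerating ragged rows.

    Hypothetical standalone version: it does not read from a Sublime view.
    """
    column_widths = []
    for line in lines:
        for i, cell in enumerate(line.split(sep)):
            cell = cell.strip()
            # Grow the list when a row has more cells than any row seen so far.
            if i == len(column_widths):
                column_widths.append(0)
            column_widths[i] = max(len(cell), column_widths[i])
    return column_widths
```

This avoids the IndexError but still treats commas inside quotes as separators; a quote-aware parser such as the standard `csv` module would be needed to keep such a cell in one column.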

align columns visually

 * add spaces to columns so that no two columns overlap vertically
 * maybe also re-order the columns so that the narrow columns come first?

Example:

aaaa bb ccc
dd eeeeee ff

Becomes:

aaaa bb     ccc
dd   eeeeee ff
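
A minimal sketch of the padding step, assuming cells are separated by whitespace and never contain spaces themselves. The function name `align_columns` is made up for this example, and the column re-ordering idea is left out.

```python
def align_columns(text, sep=" "):
    """Pad every cell to its column's widest entry so columns line up."""
    rows = [line.split() for line in text.split("\n")]
    num_columns = max(len(row) for row in rows)
    widths = [0] * num_columns
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))
    aligned = []
    for row in rows:
        padded = [cell.ljust(widths[i]) for i, cell in enumerate(row)]
        # rstrip drops the padding added to the last cell of each row.
        aligned.append(sep.join(padded).rstrip())
    return "\n".join(aligned)
```

Applied to the two example lines above, this yields the aligned form shown under "Becomes".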

outsource hard stuff to pandas

Automatically detecting dtypes and column separators could potentially be done with pandas. If that's possible, it should be done.
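
For context, pandas documents that `read_csv(sep=None, engine="python")` detects the separator with Python's built-in `csv.Sniffer`, and pandas infers column dtypes on its own when reading. The sketch below swaps in the standard library to show the same two ideas without the pandas dependency; `sniff_table` and `guess_type` are hypothetical helper names, and the type guess is far cruder than pandas' inference.

```python
import csv
import io


def sniff_table(text):
    """Guess the delimiter with csv.Sniffer and parse the text into rows."""
    dialect = csv.Sniffer().sniff(text)
    rows = list(csv.reader(io.StringIO(text), dialect))
    return dialect.delimiter, rows


def guess_type(values):
    """Return int, float, or str: the first type that parses every value."""
    for candidate in (int, float):
        try:
            for value in values:
                candidate(value)
            return candidate
        except ValueError:
            continue
    return str
```

Sniffing is heuristic and can fail or guess wrong on short or irregular input, so a fallback to a user-chosen separator is probably still needed.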

Automatic merging of similar lines

Not sure how it should be implemented, but take all lines that are uncommon but within a certain edit distance of a commonly occurring line, and replace each uncommon line with the common line.

Also, it may be wise to have some human oversight into this process while it happens.
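
One way this could look, as a sketch only: count how often each line occurs, treat lines at or above a threshold as common, and propose replacing each rarer line with its nearest common line when the Levenshtein distance is small. The thresholds `min_common` and `max_distance` and both function names are assumptions. The function returns proposals rather than editing anything, which leaves room for the human review mentioned above.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance using a two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def propose_merges(lines, min_common=3, max_distance=2):
    """Map each uncommon line to the closest common line, if close enough."""
    counts = Counter(lines)
    common = [line for line, n in counts.items() if n >= min_common]
    proposals = {}
    for line, n in counts.items():
        if n >= min_common or not common:
            continue
        distance, best = min((edit_distance(line, c), c) for c in common)
        if distance <= max_distance:
            proposals[line] = best
    return proposals
```

A command built on this could show each proposed pair and apply only the ones the user accepts.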

standardize timestamps

convert dates and times to the format: YYYY-MM-DD HH:MM:SS.SS (where the hour is zero-padded 24-hour time)
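
A minimal sketch of the conversion, assuming the input matches one of a small, fixed list of formats. The list `KNOWN_FORMATS` and the function name are illustrative, not taken from the plugin, and the `%b` and `%p` codes depend on the current locale.

```python
from datetime import datetime

# Hypothetical list of input formats to try, in order.
KNOWN_FORMATS = [
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S",
    "%m/%d/%Y %I:%M:%S %p",
    "%d %b %Y %H:%M",
]


def standardize_timestamp(text):
    """Return text as YYYY-MM-DD HH:MM:SS.SS, or None if no format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
        # Truncate microseconds to two decimal places of a second.
        hundredths = parsed.microsecond // 10000
        return parsed.strftime("%Y-%m-%d %H:%M:%S") + ".{:02d}".format(hundredths)
    return None
```

Returning None for unrecognized input lets the caller leave that text untouched instead of guessing.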

Remove trailing tabs

When copying from Google Sheets, there are lines at the end of the document that are empty except for tabs. This command will remove those lines.
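
A short sketch of that cleanup, assuming the whole document is available as one string. The function name is made up for this example. Only lines at the end are dropped, so tab-only lines in the middle of the table are kept.

```python
def remove_trailing_tab_lines(text):
    """Drop lines at the end of text that are empty or contain only tabs."""
    lines = text.split("\n")
    while lines and lines[-1].strip("\t") == "":
        lines.pop()
    return "\n".join(lines)
```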

would recommend creating a small screencast that captures the plugin in action

For instance,
https://github.com/recite/re-cite.org/

The plugin seems useful, btw.

delete column

Assuming that the document being viewed is a table, this command should delete the column that the cursor is currently in.
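
A sketch of the idea, assuming the cursor position is known as a row index plus a character offset within that row, and that cells never contain the separator. Both function names and that cursor representation are assumptions; inside Sublime Text the position would come from the view's selection.

```python
def column_at(line, offset, sep=","):
    """Column index under the cursor: separators to the left of the offset."""
    return line[:offset].count(sep)


def delete_column(text, row, offset, sep=","):
    """Remove the column under the cursor from every row of the table."""
    lines = text.split("\n")
    target = column_at(lines[row], offset, sep)
    result = []
    for line in lines:
        cells = line.split(sep)
        # Short rows that lack the target column are left unchanged.
        if target < len(cells):
            del cells[target]
        result.append(sep.join(cells))
    return "\n".join(result)
```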

All pairs

Given a list of elements, list all pairs of elements from that list

ex.
Input
a,b,c,d
Output
a b
a c
a d
b c
b d
c d

Can be done with the following code, which returns the pairs as a list of tuples:

import itertools
list(itertools.combinations(['a','b','c','d'], 2))
