mito-ds / monorepo

The mitosheet package, trymito.io, and other public Mito code.

Home Page: https://trymito.io

License: Other

CSS 2.50% TypeScript 44.66% JavaScript 0.15% Python 52.52% Jupyter Notebook 0.04% Shell 0.11% HTML 0.01%
data-science python data data-visualization data-analysis jupyter pandas

monorepo's Introduction

Mito Monorepo

Mito is a spreadsheet that lives inside your JupyterLab notebooks. It allows you to edit Pandas dataframes like an Excel file, and generates Python code that corresponds to each of your edits.

Mito aims to be the first tool in your data science toolkit and supports:

  • Point-and-click CSV and XLSX import
  • Excel-style pivot tables
  • Graph generation
  • Filtering and sorting
  • Merge (lookups)
  • Excel-Style formulas
  • Column summary statistics
  • And much more!

Mito is an open source tool (look around...), and will always be built by and for our community. See our plans page for more detail about our features, and consider purchasing Mito Pro to help fund development.

⚡️ Quick start

To get started, open a terminal, command prompt, or Anaconda Prompt. Then, download the Mito installer:

python -m pip install mitoinstaller

Then, run the installer. This command may take a few moments to run:

python -m mitoinstaller install

This will install Mito for classic Jupyter Notebooks and JupyterLab 3.0. More detailed installation instructions can also be found here.

If you're interested in Mito Pro, see our plans page.

Documentation

You can find all Mito documentation available here.

Getting Help

To get support, join our Discord or Slack.

Docker Quick Start

Coming soon!

MyBinder

MyBinder link for the main branch: Binder

Contributing

This repo is the monorepo for the Mito project. It contains the mitosheet package, the trymito.io website, and our documentation.

Mitosheet

To see the code for the mitosheet package, see the mitosheet folder.

Testing

To test the current version of mitosheet that is deployed on Test PyPI, create an empty venv and run the commands:

python3 -m pip install mitoinstaller
python3 -m mitoinstaller install --test-pypi

Then, launch JupyterLab to test the current version of the mitosheet package on Test PyPI.

Mitoinstaller

To see the mitoinstaller package, see the mitoinstaller folder.

Trymito.io

To see the code for our website, see the trymito.io folder.

Docs

Our docs are hosted on Gitbooks here. You can see and edit the docs in the /docs folder. PRs are greatly appreciated!

monorepo's People

Contributors

aarondr77, cirotix, jake-stack, marthacryan, naterush, tylerjrichards

monorepo's Issues

Unable to change the data type on multi-select columns

Describe the bug
If you select multiple columns and try to assign them all the same dtype, the dtype is only assigned to the last column selected.

To Reproduce
Steps to reproduce the behavior:

  1. Select multiple columns
  2. In the taskpane change the type to an int or whatever
  3. Note that only the last of the selected columns has its type changed

This happens no matter what dataset you are using. I was trying to change 20 str columns to int. In the end, I had to complete the action 20 times instead of selecting the 20 columns and carrying out the action once.

Expected behavior
Select multiple columns and in the task pane change all the columns to the same dtype
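
In pandas terms, the expected behavior amounts to casting every selected column in one step. A minimal sketch, assuming the selection arrives as a list of column headers (cast_columns is a hypothetical helper, not part of mitosheet):

```python
import pandas as pd


def cast_columns(df: pd.DataFrame, column_headers: list, dtype: str) -> pd.DataFrame:
    """Return a copy of df with every selected column cast to dtype."""
    # astype accepts a {column: dtype} mapping, so one call covers the whole selection
    return df.astype({header: dtype for header in column_headers})


df = pd.DataFrame({"a": ["1", "2"], "b": ["3", "4"], "c": ["x", "y"]})
converted = cast_columns(df, ["a", "b"], "int64")
```

Columns outside the selection (here, c) are left untouched, and the original dataframe is not modified.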

Screenshots
image

Desktop (please complete the following information):

  • OS: Windows 10 Pro, Version 21H1, Build: 19043.1526
  • Browser Chrome Version 98.0.4758.82 (Official Build) (64-bit)
  • Mito Version 0.3.170

Better error messages when trying to display a sheet in the wrong location

Is your feature request related to a problem? Please describe.

#113 introduces useful utilities for checking where Mito is running. We should use these to provide better error messages to users when they use Mito in the wrong location.

As I note in the PR, I don't want to do this until we're a bit more secure in the utils + feel they aren't gonna lead to sheet crashing or non-rendering issues...

Describe the solution you'd like
I'd like to print error messages that guide users to the installation instructions when they use the mitosheet in the wrong location.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Make state management more intuitive

Describe the bug
How we manage state in /mitosheet is confusing and not well explained. For example, whether graph_data_json should be part of the step class or the state class is not initially obvious.

Furthermore, how to create the property on the state class is unclear. We initially agreed on using self.final_defined_state.graph_data_json, but other properties like column_format_types are accessed via self.post_state.column_format_types. When to use final_defined_state and when to use post_state is not clear.

Log dataframe name defaulting to df1, df2 ...

Describe the bug
Occasionally, Mito doesn't read in the dataframe names of passed arguments correctly. It defaults the dataframe names to df1, df2, etc. This breaks all of the generated code because the dataframes that it references do not exist.

I believe that it's just a timing issue with the active code cell.

To Reproduce
I'm unable to reproduce this bug.

Expected behavior
Use the correct dataframe names, please!

Additional context
When this occurs, it usually goes away if you just rerun the mitosheet.sheet() call.

Make mitosheet launchable on specific data frames from the command line

Is your feature request related to a problem? Please describe.

I wanted to use Mito for a quick analysis - and doing so required getting data into Mito. I didn't have a mitosheet instance started up.

I want to be able to really quickly launch mitosheet from the command line with some file paths as arguments, and get them in a mitosheet immediately.

Describe the solution you'd like

python -m mitosheet analyze path/to/file.csv

This should launch a JupyterLab instance with a notebook with this file in a cell, ready to be analyzed. I should just have to run it. This seems very doable!
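
A rough sketch of the notebook-generation half of this idea, using only the standard library. It assumes mitosheet.sheet accepts file paths (the read_csv error quoted in another issue suggests it does for CSVs); build_notebook and the notebook layout are illustrative, not existing mitosheet code:

```python
import json


def build_notebook(file_paths):
    """Build a notebook (nbformat 4) with one code cell that opens the files in Mito."""
    args = ", ".join(repr(path) for path in file_paths)
    source = f"import mitosheet\nmitosheet.sheet({args})"
    return {
        "cells": [{
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": source,
        }],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def write_notebook(file_paths, notebook_path="mito-analysis.ipynb"):
    """Write the generated notebook to disk and return its path."""
    with open(notebook_path, "w") as f:
        json.dump(build_notebook(file_paths), f, indent=1)
    return notebook_path
```

The written notebook could then be opened with jupyter lab followed by the notebook path, leaving the user to run the single cell.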

Describe alternatives you've considered
We could do more complex things like adding icons, etc., but for my own usage of Mito (as a real, genuine user in this case), I think this solution would have been totally cool and really satisfactory. It's also pretty easy to do: we just need to reuse the infrastructure from the installer!

Additional context
No.

Arrow keys and the enter button should import a file in the import menu

Is your feature request related to a problem? Please describe.

One expects to scroll up and down with the arrow keys, and press enter to import.

Describe the solution you'd like

Arrow keys up and down. Enter to import.

Describe alternatives you've considered
No alternatives. This is clearly the way!

Additional context
Nope!

Pivot table search does not autofocus

Describe the bug
The pivot table select dropdowns have a search input field in them. When opening a dropdown, the search field should be autofocused so it is easy for the user to begin searching!

Screen Shot 2022-02-10 at 1 15 30 PM

When you create and then rename a column the original column name persists in the auto-generated comment

Describe the bug
When you add a new column in mitosheet.sheet(), it is given a random name.
When you change that random name to something more meaningful, the column header updates, but the auto-generated comment uses the ORIGINAL column name instead of the NEW one.

To Reproduce
Steps to reproduce the behavior:

  1. Add a column
  2. Rename the column
  3. Note the comment above the rename statement. It knows what the old (circled in black in the screenshot) and new (circled in red in the screenshot) names are.
  4. Carry out another task in the renamed column. Note that it is still referencing the old column name.

This is not dataset specific.

Expected behavior
I would like the new column name to be persisted in the comments from the column rename forward.

Screenshots
image

Desktop (please complete the following information):

  • OS: Windows 10 Pro, Version 21H1, Build: 19043.1526
  • Browser Chrome Version 98.0.4758.82 (Official Build) (64-bit)
  • Mito Version 0.3.157

Investigate size of `mitosheet/labextension`

Currently, an install of the mitosheet package will place a folder of 3.5 MB on disk, as of Monday, Feb 7th. 1.5 MB of this is from the labextension folder itself.

It's worth understanding where this size comes from, and if we can optimize it. The smaller we can keep this package, the better installs will be (although 3.5 MB is pretty small).

This also might not be worth optimizing, given the size of the other packages we need to install!

Investigate unifying JLab 2.0 and JLab 3.0 in a single package

Currently, to support both JLab 2.0 and JLab 3.0, we deploy multiple packages. It would be preferable if we could just deploy a single package, and have it all work.

To do this, we might be able to build directly into the mitosheet/labextension folder, and just provide both a ZIP file, and also the rest of the files for the prebuilt extension. This would be very sweet.

This might increase the size of the mitosheet package, which we should investigate...

CI doesn't catch bug with setting cell value

Describe the bug
The test test_set_cell_value_convert_datetime should be failing on dev, but all tests on dev are passing.

If you go to the CI report for dev (commit: 4b533c4), you will see that most of the set_cell_value tests are being skipped.

Screen Shot 2022-02-16 at 2 48 54 PM

In particular, this test is skipped because sys.version_info.minor <= 6, reason="requires 3.7 or greater".

This caused us to merge broken code into dev. Specifically, the code in dev does not support datetime or timedelta handling.
Screen Shot 2022-02-16 at 2 52 00 PM

Questions:

  • Does the CI in dev use a different version of pandas than CI in the PRs?

Mito doesn't generate code if there is no code cell below Mito

Describe the bug
I just watched a user create a mitosheet and because there was no code cell below Mito, Mito didn't create any generated code for the user. When we added a new blank cell below Mito and used Mito, all of the generated code appeared.

To Reproduce

  1. Create a blank Mitosheet
  2. Delete the code cell below Mito
  3. Use Mito
  4. Notice that no code is generated

Proposals
In plugin.tsx, we should check whether there is a code cell below Mito. If there isn't, we should create one!

API.tsx cleanup

Describe the bug
The api.tsx file is huge! It's very difficult to find specific functions in it. We should break it up into several files, one per event type: one for edit events, one for API calls, etc.

Update development commands

Currently, there are a variety of workarounds that we need to use to make the development commands run consistently, including dealing with outdated peer dependencies, and more. This sucks.

We should take the most common issues, and fix them up so these commands are more robust!

Delete Row

Is your feature request related to a problem? Please describe.
Several users want to be able to delete rows without using a filter. Oftentimes, they want to do this because import did not work as they expected. See the below for more details on the relationship with import.

Describe the solution you'd like
I want to be able to click on rows in the index column to select the entire row, and then press either the backspace key or a delete button in the toolbar to delete all of the selected rows.
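
For the generated code, deleting the selected rows maps naturally onto a pandas drop by index label. A minimal sketch, assuming the selection is available as a list of index labels (delete_rows is a hypothetical helper, not mitosheet's implementation):

```python
import pandas as pd


def delete_rows(df: pd.DataFrame, index_labels: list) -> pd.DataFrame:
    """Return a copy of df without the rows whose index labels were selected."""
    return df.drop(index=index_labels)


# A stray first row, like the ones a badly formatted import can leave behind
df = pd.DataFrame({"name": ["junk", "Ann", "Bob"], "age": [None, 31, 45]})
cleaned = delete_rows(df, [0])
```

The remaining rows keep their original index labels, so later steps that reference labels still point at the same rows.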

Describe alternatives you've considered
There are two alternatives to this: 1) using a filter to remove specific rows from the dataset, or 2) using the skip rows configuration while importing a .xlsx file.

Additional context
Deleting rows is often something we have seen users want to do directly after importing a file, specifically because the import didn't handle their file as they expected. Because users are trying to fix these issues in the sheet, we should give them the tools to do it. Yay for discoverability!

Take the following import for example.

Screen Shot 2022-02-14 at 6 01 24 PM

Screen Shot 2022-02-14 at 6 01 45 PM

In this example, Mito/pandas was tripped up by the format of the original dataset, and the user ended up wanting to delete the row in Mito once they saw it. They didn't realize that they could go back and skip the rows in the import step.

Some other features that address related issues:

  • Select a row and promote that row to be the column headers
  • Add a row
  • Transpose a data frame

Non Adjacent Column selection in Mitosheet uses the wrong key combination for Windows

Describe the bug
As described in [Deleting Columns](https://docs.trymito.io/how-to/deleting-columns) to select non-adjacent columns in Mitosheet you use ...

cmd + click to add the column you just clicked on to your selection.

This is true for Mac keyboards, but on a Windows PC it translates to "Windows key + click". That is wrong: the correct key sequence for multi-selecting non-adjacent columns on Windows is "CTRL + click". (I always use right CTRL rather than left CTRL, but both should work.)

As it stands, Windows users are forced to use the Windows key instead of the CTRL + click combination, which has been the Windows standard for as long as I can remember.

To Reproduce
Steps to reproduce the behavior:

  1. On a Windows PC load up Mitosheet in Jupyter Lab
  2. Click on a column, hold down the CTRL key
  3. Select other columns. You will note none are selected.

Expected behavior
On a Windows PC, I would expect the CTRL + click combination to allow non-adjacent column or row selection.

Desktop (please complete the following information):

  • OS: Windows 10 Pro, Version 21H1, Build: 19043.1526
  • Browser Chrome Version 98.0.4758.82 (Official Build) (64-bit)
  • Mito Version 0.3.157

Import excel files from within mitosheet.sheet() like you can with CSV files

Is your feature request related to a problem? Please describe.
I work with a variety of Excel spreadsheets. Some of these have multiple sheets. I am currently unable to read them in from Python and must use the Mitosheet Import function to select/deselect the sheets that I want.

If I try to read them in with mitosheet.sheet('spreadsheet.xlsx'), I get the following error, which points to read_csv:

Invalid argument passed to sheet: DroneWarsData.xlsx. This path could not be read with a pd.read_csv call. Please pass in the parsed dataframe directly.

Describe the solution you'd like
I would like to be able to load all of the sheets, or specific sheets, into Mito from Python using pandas read_excel or something similar. I found the following article helpful.
https://pythoninoffice.com/read-multiple-excel-sheets-with-python-pandas/
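
As an illustration of what this could look like on the Python side, here is a sketch built on pandas read_excel. Both helpers are hypothetical; the name cleanup is only inferred from the variable names in the generated code shown under Additional context:

```python
import re

import pandas as pd


def to_variable_name(sheet_name: str) -> str:
    """Turn a sheet name into a Python-friendly dataframe name."""
    # Collapse every run of non-word characters into a single underscore
    return re.sub(r"\W+", "_", sheet_name).strip("_")


def read_excel_sheets(path: str, sheet_names=None) -> dict:
    """Read the requested sheets (all of them when sheet_names is None)."""
    # With sheet_name=None or a list, read_excel returns a {sheet name: DataFrame} dict
    sheets = pd.read_excel(path, sheet_name=sheet_names)
    return {to_variable_name(name): df for name, df in sheets.items()}
```

The resulting dataframes could then be passed to mitosheet.sheet directly, which keeps the flow automatic instead of routing through the import taskpane.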

Describe alternatives you've considered
The only other alternative seems to be using Mitosheet's Import function. While this works it interrupts what could be an automatic flow.

Additional context
Importing all the sheets via the Mitosheet Import function results in the following code being created:

# Imported DroneWarsData.xlsx
import pandas as pd
sheet_df_dictonary = pd.read_excel('DroneWarsData.xlsx', sheet_name=['Afghanistan', 'Somalia', 'Yemen', 'Pakistan', 'Variables', 'All', 'Unknown_Locations', 'All_WithoutUnknown', 'All_WithUnknown', 'Copy of All_WithoutUnknown', 'US Confirmed', 'US Confirmed (updated)', 'Confirmed vs Unconfirmed', 'Copy of Confirmed vs Unconfirme', 'Afghanistan(2)', 'Somalia(2)', 'Yemen(2)', 'All(2)'], skiprows=0)
Afghanistan = sheet_df_dictonary['Afghanistan']
Somalia = sheet_df_dictonary['Somalia']
Yemen = sheet_df_dictonary['Yemen']
Pakistan = sheet_df_dictonary['Pakistan']
Variables = sheet_df_dictonary['Variables']
All = sheet_df_dictonary['All']
Unknown_Locations = sheet_df_dictonary['Unknown_Locations']
All_WithoutUnknown = sheet_df_dictonary['All_WithoutUnknown']
All_WithUnknown = sheet_df_dictonary['All_WithUnknown']
Copy_of_All_WithoutUnknown = sheet_df_dictonary['Copy of All_WithoutUnknown']
US_Confirmed = sheet_df_dictonary['US Confirmed']
US_Confirmed_updated = sheet_df_dictonary['US Confirmed (updated)']
Confirmed_vs_Unconfirmed = sheet_df_dictonary['Confirmed vs Unconfirmed']
Copy_of_Confirmed_vs_Unconfirme = sheet_df_dictonary['Copy of Confirmed vs Unconfirme']
Afghanistan_2 = sheet_df_dictonary['Afghanistan(2)']
Somalia_2 = sheet_df_dictonary['Somalia(2)']
Yemen_2 = sheet_df_dictonary['Yemen(2)']
All_2 = sheet_df_dictonary['All(2)']

DroneWarsData.xlsx

Pivot tables should return 0 instead of NaN on count, count unique, and sum aggregations

Is your feature request related to a problem? Please describe.
When creating a pivot table with a count aggregation, Mito returns NaN instead of 0 when no records fall into a specific group. Returning NaN is reasonable for other aggregation types (e.g., mean), but not for count.

Describe the solution you'd like
I'd like Mito to automatically convert NaN values to 0 when that is clearly the expected behavior in pivot tables. This should occur when the aggregation type is count, count unique, or sum.

This can be accomplished by passing aggfunc='count', fill_value=0 to the pandas pivot_table call.
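
A small pandas example of the difference, on made-up data (the column names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["East", "East", "West"],
    "product": ["A", "B", "A"],
    "units": [3, 5, 2],
})

# Without fill_value, the empty West/B group shows up as NaN
raw_counts = pd.pivot_table(
    df, index="region", columns="product", values="units", aggfunc="count"
)

# With fill_value=0, the same group is reported as 0
counts = pd.pivot_table(
    df, index="region", columns="product", values="units", aggfunc="count", fill_value=0
)
```

The same fill_value argument applies to sum and nunique aggregations, which are the other two cases named above.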

Describe alternatives you've considered
Alternatively, I can use the pandas fillna(0) function. However, this takes me out of Mito, which breaks up my analysis across multiple Mitosheets for no good reason.

Autoformat with black

We should start auto formatting with black (as a precommit hook?), so we don't have to ever think about formatting. This would be nice!

Create an opt-in, Unity like tour for the app

Is your feature request related to a problem? Please describe.

I was messing about with Unity tonight for the first time, and I took the tutorial my first time through. It looks like this:

Screen Shot 2022-03-01 at 9 09 33 PM

A few things:

  1. It was on a sample screen (getting data into the tool would literally be impossible).
  2. It was opt-in very seriously; they asked me if I wanted to take a tutorial or just get to work explicitly. I realized I didn't know shit, and so took the tutorial.
  3. It blocks out all but the relevant parts of the screen. This is really awesome in helping me focus; I am sure we could do this with Mito (including blocking out all of JupyterLab!)
  4. It has very detailed instructions that explains and highlights every button I click. It disables everything else.

The net result of all this is that I felt super guided. This is amazing.

Describe the solution you'd like
Do we want to take another shot at a tour? I think that if we do, we should go for:

  1. Sample data, so we can guide users through an explicit process.
  2. Being very clear about what to click, and how to click it.
  3. Allowing users to opt into more tours, if they want to.

I know most users never really did tours, but I bet at least 5% of users would totally "fully opt in" if we gave them the option. For the other users, getting rid of any semblance of a tour might be good too - if it's just an annoyance.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

No way to sign up for Mito Pro after signed in

Currently, there is no way to sign up for Mito Pro after you finish the signup process without using the installer. We should make it easy to sign up for Mito Pro.

Probably the easiest solution is a simple modal with an input that allows you to sign up for Pro; that's it!

Telemetry still gets sent when mitosheet.sheet() is NOT loaded

Describe the bug
When running Mito auto-generated code outside of JupyterLab, the telemetry upload (I think) fails and a corresponding error message is displayed.

To Reproduce
Steps to reproduce the behaviour:

  1. In Jupyter Notebook
import mitosheet as mito
  2. Read in some data. I used a local excel file
import pandas as pd
sheet_df_dictonary = pd.read_excel('risks.xlsx', sheet_name=['Sheet1'], skiprows=0)
Sheet1 = sheet_df_dictonary['Sheet1']
  3. Do some column work
# Added column new-column-rmkw to df
df.insert(4, 'new-column-rmkw', 0)

# Renamed new-column-rmkw to Risk and Control ID in df
df.rename(columns={'new-column-rmkw': 'Risk and Control ID'}, inplace=True)
  4. Execute 'df' in a new cell then wait about 5 minutes or whatever polling freq you have set and you will receive the following error which will be displayed beneath the dataframe.
    error uploading: HTTPSConnectionPool(host='api.segment.io', port=443): Max retries exceeded with url: /v1/batch (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available."))
    risks.xlsx

I do not think this is dataset dependent.

Here is my full code

import mitosheet as mito

# Imported risks.xlsx
import pandas as pd
sheet_df_dictonary = pd.read_excel('risks.xlsx', sheet_name=['Sheet1'], skiprows=0)
Sheet1 = sheet_df_dictonary['Sheet1']

# Renamed Sheet1 to df
df = Sheet1

# Added column new-column-rmkw to df
df.insert(4, 'new-column-rmkw', 0)

# Renamed new-column-rmkw to Risk and Control ID in df
df.rename(columns={'new-column-rmkw': 'Risk and Control ID'}, inplace=True)

# Set new-column-rmkw in df to =CONCAT(Process ID, '-', Risk ID, '-', control ID, '-', Random)
df['Risk and Control ID'] = mito.CONCAT(df['Process ID'], '-', df['Risk ID'], '-', df['control ID'], '-', df['Random'])

Expected behaviour
I don't expect to see this error if I am using Mitosheet-generated code outside of JupyterLab.
Maybe something like the following pseudocode would take care of it:

if mitosheet.sheet() exists:
    gather telemetry
else:
    don't gather telemetry
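
One way to flesh out that pseudocode is a small gate object that drops events until a sheet has been created in the session. This is a sketch only: TelemetryGate, its methods, and the event names are hypothetical and do not reflect how mitosheet actually structures its logging.

```python
class TelemetryGate:
    """Forwards events only after a sheet has been created in this session."""

    def __init__(self, send):
        self._send = send  # callable that actually uploads an event
        self.sheet_created = False

    def mark_sheet_created(self):
        # Would be called from the code path that renders a sheet
        self.sheet_created = True

    def log(self, event, **properties):
        if not self.sheet_created:
            # Sheet functions used as a plain library: stay silent
            return False
        self._send(event, properties)
        return True


sent = []
gate = TelemetryGate(lambda event, props: sent.append((event, props)))
gate.log("sheet_function_called", name="CONCAT")  # dropped, no sheet yet
gate.mark_sheet_created()
gate.log("sheet_function_called", name="CONCAT")  # forwarded
```

With a gate like this, importing mitosheet only for functions such as CONCAT would never trigger an upload attempt.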

Screenshots
image

Desktop (please complete the following information):

  • OS: Windows 10 Pro, Version 21H1, Build: 19043.1526
  • Browser Chrome Version 98.0.4758.82 (Official Build) (64-bit)
  • Mito Version 0.3.157

Additional context
Add any other context about the problem here.

More granular deployment

Describe the request
Currently, some of our PRs are big technical changes that have a higher probability of introducing bugs. For these PRs, we do thorough testing that takes almost a week to complete: sanity checks, hiring Upwork users, reviewing the videos, doing workflows ourselves, and addressing bugs.

At the same time, we have a lot of high impact, small bugs that we don't need to test so thoroughly. For example, handling new file encodings and fixing sheet crashing errors.

I suspect that this trend will continue as we continue to focus on new features + special deployments while also consistently staying on top of robustness. In fact, it might intensify as we want to iterate on final UI implementations with the feedback of active users and Upwork users testing Mito through test-pypi.

With the new retention tracking, it's really important that we get as many users as possible each month on the best version of Mito. Waiting a week to deploy bug fixes means that hundreds more users can be impacted by a bug, and those users are part of our retention goal.

All of that is to say, it would be nice if we designed our deployment procedure to allow us to deploy low-risk PRs quickly, while also allowing us the time required to thoroughly test high-risk PRs.

Potential Solutions

  • Add feature flags that we add to features that are merged into dev at the time of deployment that we don't want to be accessible by users
  • Have two different branches that we merge into main for deployment 1) dev-low - small fixes that we can test in <1 day and deploy 2) dev-high - risky changes that require usability testing, etc. dev-low would be consistently merged into dev-high to reduce merge conflicts
  • For small fixes that we can test in <1 day, do all of the testing in the PR and then merge them directly into main instead of merging into dev first.

Formatting is limited in Formula Columns

Describe the bug
The potential format options for a number series depend on whether the column is a float or an int. For most of the format options this isn't an issue. For example, the Accounting format option treats both floats and ints in the same manner. However, for the default and plain text formatting options, floats and ints are treated differently. Specifically, in the plain text and default formatting options, a float has a trailing .0 and an int does not.

In a data column, it's easy to convert the dtype from float to int in order to get rid of the trailing .0. But there is no way to convert a formula column from a float to an int. The only dtype manipulation available is the VALUE formula, which converts a series to a number series.

To Reproduce

  1. Create a dataframe in Mito that has a column, Date, that is a datetime series.
  2. Create a new column, Year, and use the formula =YEAR(Date)
  3. Notice that the values in the year column are formatted as 2,022.0
  4. Convert the format of the Year column from Default to Plain Text in order to get rid of the ,
  5. Notice that there is still a trailing .0 and no way to remove it.

Expected behavior
I expect to be able to remove the .0 from my data. Preferably, I want to change the dtype of the column through the column control panel, even for formula columns. In lieu of that, I want to be able to use formulas to specify the dtype of my column with more granularity. For example, I expect there to be INT and FLOAT Mito formulas.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: macOS Monterey
  • Browser: Firefox
  • Mito Version: 0.3.165

Move installer to `mitoinstaller` folder

Is your feature request related to a problem? Please describe.

Currently, the mitosheet tests run when we change the installer. This doesn't make much sense, and would be easily sorted if we just moved the installer to a top-level mitoinstaller folder. This would also communicate the package hierarchy better.

Bug with filtering + selection

When you filter down to an empty dataset, and then undo the filter, the column you are filtering on might not be visible anymore. This was confusing!

To reproduce

  1. Select a column that is not in the first 10 columns (so that you have to scroll to the right in the dataframe to see it)
  2. Filter that column so that there are no rows remaining in the dataset
  3. Remove the filter
  4. Notice that the selected column is not visible.
Screen.Recording.2022-02-13.at.3.17.10.PM.mov

Multi Column Merge

Is your feature request related to a problem? Please describe.
Select multiple columns from each sheet to use as the merge key.

Describe the solution you'd like
In the merge taskpane, the user should be required to select at least one column to use as the merge key for each sheet. However, they should have the ability to select multiple columns as well.

Although the backend changes are really simple, this requires a design spec. The biggest complexity in the design is how to match columns for comparison in an intuitive way.
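
On the backend side, pandas already supports merging on several key columns by passing lists. A small example on made-up data (the dataframes and column names are illustrative):

```python
import pandas as pd

orders = pd.DataFrame({
    "region": ["East", "East", "West"],
    "sku": ["A", "B", "A"],
    "units": [3, 5, 2],
})
prices = pd.DataFrame({
    "area": ["East", "West"],
    "item": ["A", "A"],
    "price": [10.0, 12.0],
})

# Keys are matched pairwise by position: region with area, sku with item
merged = orders.merge(
    prices, left_on=["region", "sku"], right_on=["area", "item"], how="left"
)
```

Because the keys pair up by position in the two lists, the taskpane design needs to make that pairing visible, which is the hard part noted above.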

Initial thinking would be to change the order of the taskpane so that there are two sections. In the first section, maybe laid out as two columns, the user selects the sheets to merge and the merge keys. Using columns could be a great way of making it obvious which columns are going to be compared. In the second section, the user selects which columns to keep from each sheet.

Describe alternatives you've considered
The alternative is using the concat function to combine multiple columns into one and then merging.

Additional context
This is one of the most commonly requested feature updates.

Move to storing analysis metadata in the notebook so users can share notebooks

Is your feature request related to a problem? Please describe.

Currently, I cannot send a notebook with a mitosheet to Aaron and have it work, as he doesn't have the analysis file saved on his local machine.

I'd like to be able to send him my analysis sometimes. I think it would make our dogfooding more interesting and useful if we were able to engage with each other's analyses.

Describe the solution you'd like

We need to do some experiments, but it seems we can store extra metadata with the notebook as noted here.

If we simply moved to storing the analysis like this, and doing our lookup for the new analysis on the front-end rather than the backend, we'd pretty much solve this problem.
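
To make the idea concrete, here is a sketch that works on the notebook file as plain JSON. The notebook-level metadata field of an .ipynb file accepts arbitrary extra keys; the "mitosheet" key and the helper names below are assumptions, and the real lookup would live in the front-end rather than in Python:

```python
import json


def save_analysis(notebook: dict, analysis_name: str, steps: list) -> dict:
    """Store an analysis under a 'mitosheet' key in the notebook-level metadata."""
    mito = notebook.setdefault("metadata", {}).setdefault("mitosheet", {})
    mito[analysis_name] = {"steps": steps}
    return notebook


def load_analysis(notebook: dict, analysis_name: str):
    """Return the stored analysis, or None if the notebook does not carry it."""
    return notebook.get("metadata", {}).get("mitosheet", {}).get(analysis_name)


notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
save_analysis(notebook, "analysis-1", [{"step_type": "filter"}])

# The analysis survives a save/load round trip of the notebook file
restored = json.loads(json.dumps(notebook))
```

Anyone who receives the .ipynb file receives the analysis with it, with no separate analysis file to send.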

We need to do some heavy investigation to make sure this is gonna work across JupyterLab versions, etc.

Note that this solution:

  1. Might have an annoying upgrade process; we would have to support both the front-end and the backend for a long time (until users moved to the other solution).

Describe alternatives you've considered
Another option would be having a server; this is a much more annoying, complex, and bad solution, IMO. It requires a lot more infrastructure and maintenance. Our best bet is to store this with the notebook file!

Additional context
Add any other context or screenshots about the feature request here.

Support more date formats

Describe the bug
Mito should detect the date format 17.14.11 (YY.DD.MM) and successfully convert it to a pandas datetime.

We should figure out all of the most popular date formats and make sure Mito supports all of them.
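
One possible approach is to try a list of explicit formats in order with pandas to_datetime. This is a sketch, not Mito's current detection logic; the candidate list is a placeholder, and parse_dates is a hypothetical helper:

```python
import pandas as pd

# Placeholder list; the real one would come from surveying popular formats
CANDIDATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%y.%d.%m"]


def parse_dates(series: pd.Series) -> pd.Series:
    """Parse a string series with the first candidate format that fits every value."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return pd.to_datetime(series, format=fmt)
        except (ValueError, TypeError):
            continue  # this format does not match, try the next one
    raise ValueError("No candidate date format matched the series")


dates = parse_dates(pd.Series(["17.14.11", "18.02.01"]))
```

Ordering matters here: when a value is valid under several formats, the first match wins, so ambiguous formats need a deliberate priority.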

Toggle All in the Filter by Value Taskpane

Is your feature request related to a problem? Please describe.
Users want to filter their dataset to only contain a few values.

Describe the solution you'd like
In the values taskpane, there should be a Toggle All button. Using it should generate an inclusive filter in the generated code, instead of generating a large quantity of exclusive filters.
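
A sketch of how the code generation could choose between the two filter styles, using pandas isin in the emitted code. The filter_code function is hypothetical and only illustrates the choice:

```python
def filter_code(df_name, column, all_values, kept_values):
    """Return generated pandas code for a filter-by-value selection."""
    removed = [value for value in all_values if value not in kept_values]
    if len(kept_values) <= len(removed):
        # Few values kept: emit one inclusive isin() filter
        return f"{df_name} = {df_name}[{df_name}[{column!r}].isin({list(kept_values)!r})]"
    # Few values removed: exclude just those values
    return f"{df_name} = {df_name}[~{df_name}[{column!r}].isin({removed!r})]"


inclusive = filter_code("df", "state", ["NY", "CA", "TX"], ["NY"])
exclusive = filter_code("df", "state", ["NY", "CA", "TX"], ["NY", "CA"])
```

Either way, the generated code stays a single line regardless of how many boxes were toggled.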

Describe alternatives you've considered
They can either use the values taskpane to uncheck a ton of boxes or they can identify values in the value taskpane and then set the filter in the filter/sort taskpane. Neither option is great.

Fully separate floats and integers

Is your feature request related to a problem? Please describe.
Currently, users have trouble working with floats and ints (see #54). This is for a bunch of reasons, but generally we need to stop treating them as the same thing anywhere. They are not the same thing!

Describe the solution you'd like
We should take a few specific steps to fully separate them:

  • Separate icons for floats and ints
  • INT and FLOAT functions that allow users to cast to a specific type.

There might be other places as well!

Bug with error traceback

The traceback for the column dependent error is not working.

To reproduce:

  1. Import data
  2. Add a new column and set its formula equal to another column, ex: =A
  3. Delete column A
  4. Click to display the error traceback and notice that it doesn't provide any information

Formula column changes

Issues with formula columns

Currently, there is a lot of complexity that results from our specific notion of a formula column - both complexity for us developers, and complexity from a user perspective.

TODO: this should be a spec?

Add types to remaining functions

We still have bugs in our Python code because many of our functions have no types. This is a bigger issue in the API, where there is no type on the step manager (which we need to fix; it is untyped because of circular imports).
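
One common way to type a parameter without triggering a circular import is the `typing.TYPE_CHECKING` guard, sketched below. The module path `mitosheet.step_manager`, the class name `StepManager`, and the `steps` attribute are assumptions for illustration; the guarded import is only seen by type checkers and never runs.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated by type checkers such as mypy, but skipped at runtime,
    # so it cannot create an import cycle. Names here are hypothetical.
    from mitosheet.step_manager import StepManager


def get_step_count(step_manager: "StepManager") -> int:
    """Return the number of steps; the quoted annotation is not evaluated at runtime."""
    return len(step_manager.steps)
```

The annotation is written as a string so that the name does not need to exist when the module is imported.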

Add Automated Confirmation Email after Pro Sign-up

Is your feature request related to a problem? Please describe.
A Pro user was confused about how to get the Pro access code and whether their payment went through.

Describe the solution you'd like
An automated email with a link to the access code and confirmation of the Pro version.

Generated filter code errors with mixed data types

Describe the bug
When a series contains strings and numbers, applying two contains functions generates code that errors.

To Reproduce

  1. Create a mitosheet using the following code
    import mitosheet
    import pandas as pd
    df = pd.DataFrame({'A': ['abc', 'def', 2, 'aaron', 'nate']})
    mitosheet.sheet(df)
  2. Add a filter for contains 'a' and a filter for contains 'b'
  3. Run the generated code and see that it errors.

graph and pivot table taskpane configuration doesn't update on Redo

Describe the bug
The graph and pivot table taskpane configuration doesn't update when a redo occurs, but does update when an undo occurs. The method that we use for recognizing that an undo occurred does not work for recognizing redo events. We recognize undo events by checking that prevLastStepIndex !== props.lastStepIndex - 1. However, using just these variables, there is no way to differentiate between redo events and regular events. In both cases, prevLastStepIndex is one less than props.lastStepIndex.

To Reproduce

  1. Create a pivot table
  2. Undo and notice that the pivot table taskpane updates
  3. Press redo and notice that the data in the pivot table updated, but the taskpane did not.
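
One possible way to tell a redo from a regular step is to track step ids rather than only the last step index. The sketch below is written in Python for illustration (the real check lives in the frontend) and assumes every step carries a stable unique id, which is not confirmed by this issue; `StepChangeTracker` is a hypothetical name.

```python
from typing import List


class StepChangeTracker:
    """Classify each update as 'undo', 'redo', or 'new_step' from the list of step ids."""

    def __init__(self) -> None:
        self.step_ids: List[str] = []    # ids of the steps currently applied
        self.undone_ids: List[str] = []  # ids removed by undo, most recent last

    def observe(self, new_step_ids: List[str]) -> str:
        old = self.step_ids
        self.step_ids = list(new_step_ids)
        if len(new_step_ids) < len(old):
            # Steps disappeared: remember them so a later redo can be recognized.
            self.undone_ids.extend(reversed(old[len(new_step_ids):]))
            return "undo"
        if (
            len(new_step_ids) > len(old)
            and self.undone_ids
            and new_step_ids[-1] == self.undone_ids[-1]
        ):
            # The step that came back is the one most recently undone.
            self.undone_ids.pop()
            return "redo"
        # Any other change invalidates the redo history.
        self.undone_ids.clear()
        return "new_step"


tracker = StepChangeTracker()
events = [
    tracker.observe(["import"]),
    tracker.observe(["import", "pivot"]),
    tracker.observe(["import"]),
    tracker.observe(["import", "pivot"]),
]
```

An alternative with less bookkeeping would be for the backend to send an explicit update type alongside the state.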

Move from sheet index -> sheet id

Is your feature request related to a problem? Please describe.
Currently, we reference sheets by sheet indexes. This is nice insofar as it allows us to store sheets as an array, and terrible insofar as it complicates a lot of things:

  1. It makes it harder to optimize code out after deletes: #40, because it is hard to track which data frames are what.
  2. It makes it harder to do things like reorder data frames in the sheet.
  3. It makes it harder to remove certain types of metadata from our steps: #584.

It also leads to a few bugs that currently exist in the app, that are very hard to fix up otherwise:

  1. Editing pivot tables whose source dataframes sit after a deleted sheet (one with a lower sheet index) leads to the params not being retrieved correctly (aka, you cannot edit them).

Describe the solution you'd like
We'd like to move from sheet indexes -> sheet ids. Doing so requires a few things:

  1. Refactoring the frontend to think about sheet ids, rather than sheet indexes.
  2. Refactoring the steps to take sheet ids, rather than sheet indexes.
  3. Refactoring the apis to take sheet ids, rather than sheet indexes.
  4. Upgrading all previous analyses that take sheet indexes to use sheet ids. Note that this is a bit of an undertaking, as we need a good algorithm for building/maintaining sheet ids over time (which needs some more infrastructure).

All-in-all, I expect this to be a 3-5 day task to complete. It will unlock a considerable amount of new functionality and solutions to problems, and generally seems like something we should have done a long time ago.
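
To make the difference concrete, here is a minimal sketch of a sheet container keyed by stable ids, with the display order kept separately. The class `SheetStore` and its methods are hypothetical and only illustrate why ids survive deletes and reorders while indexes do not.

```python
import itertools
from typing import Dict, List


class SheetStore:
    """Store sheets under ids that never change, and keep display order separately."""

    def __init__(self) -> None:
        self._next_id = itertools.count()
        self._order: List[int] = []       # sheet ids in display order
        self._names: Dict[int, str] = {}  # sheet id -> dataframe name

    def add(self, name: str) -> int:
        sheet_id = next(self._next_id)
        self._order.append(sheet_id)
        self._names[sheet_id] = name
        return sheet_id

    def delete(self, sheet_id: int) -> None:
        self._order.remove(sheet_id)
        del self._names[sheet_id]

    def index_of(self, sheet_id: int) -> int:
        """Current display position; this shifts when earlier sheets are deleted."""
        return self._order.index(sheet_id)

    def name_of(self, sheet_id: int) -> str:
        return self._names[sheet_id]


store = SheetStore()
source_id = store.add("df1")
other_id = store.add("df2")
pivot_id = store.add("df1_pivot")
store.delete(source_id)
```

After the delete, the pivot sheet's index changes from 2 to 1, but `pivot_id` still resolves to the same sheet, so a step that stored the id can still find its parameters.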

Describe alternatives you've considered
I haven't thought much about alternatives. Some form of ID seems necessary, although I guess we could layer it on top of the current system. Hmm.

View in Dataframe Button creates Mitosheet with invalid sheet names

To Reproduce

Screen.Recording.2022-02-28.at.1.42.58.PM.mov

Proposals
There are a few different ways we can resolve this issue:

  1. Try to not display the View in Mito button if the dataframe is not a variable already
  2. Detect that the dataframe is not already saved as a variable, and add a new line of code that creates the variable and then passes it to Mito
  3. When a user does this, either through the View in Mito button or otherwise, give an error
  4. Detect when this happens, and create the actual dataframe as the first line of the generated code, and then update the name of the sheet so that all future code is valid.

If this is a common bug, then we should prioritize this!

Undo key presses should work on Mito

Describe the bug

On Mac, command + z in Mito does not undo the most recent operation.

Expected behavior

On Mac: command + z in Mito should undo the most recent operation
On Windows: control + z should undo the most recent operation.

Checklist for Feb 20th Deployment

Motivation

We currently have four big new changes that are WIP:

  1. New packages. mitosheet goes to JLab 3.0.
  2. State refactor. Removing unnecessary metadata.
  3. Dependency flexibility. Allowing pandas to be more flexible (old versions).
  4. New graphing code.

This is a lot of big things to test. We aim to get them all merged and deployed on test PyPi by the end of the week, so that we can explicitly test them very thoroughly, and then deploy. Below, we lay out what the timeline is for testing, and making sure that everything works.

Overall Timeline

  • Merge in all open PRs by EOD Thursday. This includes state refactor, dependency flexibility, and new graphing code.
  • Go on Upwork and set up 3-5 usability tests (aiming for 5) before the weekend. Give them instructions for installing the TestPyPi versions of Mito. You can see the job posting here: https://www.upwork.com/jobs/~015e9ac34042570db8
  • Get the usability tests back before EOD Tuesday next week.
  • Aim for deployment by EOD Wednesday next week.

Specific test cases

For a few of these changes, there are explicit test cases we want to run. I list them here:

Test cases for the new packages

NOTE: run all these inside a virtual environment:
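
For reference, a virtual environment can also be created from Python with the standard-library `venv` module. This is only a sketch; the helper name and the directory name in the usage comment are placeholders.

```python
import venv
from pathlib import Path


def create_test_env(path: str, with_pip: bool = True) -> Path:
    """Create a fresh virtual environment at `path`, wiping any existing one there."""
    venv.create(path, with_pip=with_pip, clear=True)
    return Path(path)


# Example (placeholder directory name):
# create_test_env("mito-test-env")
```

Using a fresh environment per test case keeps a previous install of mitosheet, mitosheet2, or mitosheet3 from affecting the next one.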

Test the installer

Command:

python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ mitoinstaller
python -m mitoinstaller install --test-pypi

Check:

  1. The installer should work, and launch Mito (which should render and work)
  2. Check you're on JLab 3.0, and the package installed is mitosheet
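
The "which JLab version and which package" checks in this checklist can be scripted with the standard-library `importlib.metadata` module (Python 3.8+). This is a sketch; the helper names are hypothetical, and the package list simply mirrors the names used in these test cases.

```python
from importlib import metadata
from typing import Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def mito_install_report() -> Dict[str, Optional[str]]:
    """Report the versions of JupyterLab and of each Mito package variant."""
    names = ("jupyterlab", "mitosheet", "mitosheet2", "mitosheet3")
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for name, version in mito_install_report().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the environment under test; exactly one of the Mito packages should report a version, and the jupyterlab major version should match the one expected for that test case.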

Test upgrading with the installer

After the above test case, run the command:

python3 -m mitoinstaller upgrade --test-pypi

Check:

  1. Mito still works, same packages are still installed.

Then, try the same with the command:

python3 -m mitoinstaller install --test-pypi

Check:

  1. Mito still works, same packages are still installed.

Direct mitosheet installation

python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ mitosheet

Check:

  1. You can launch jlab and render a mitosheet
  2. Check you're on JLab 3.0, and the package installed is mitosheet

Test installing mitosheet3 raw

python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ mitosheet3

Check:

  1. You can launch jlab and render a mitosheet
  2. Check you're on JLab 3.0, and the package installed is mitosheet3

Test upgrading mitosheet3

After the above test case, run the command:

python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ mitosheet3 --upgrade

Everything should work as before.

Test mitosheet2 install works with correct commands

Commands:

python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ mitosheet2
jupyter labextension install @jupyter-widgets/jupyterlab-manager@2

NOTE: on the labextension command, you might need to do export NODE_OPTIONS=--openssl-legacy-provider if it fails. This is a JupyterLab bug, not ours.

Check:

  1. You can launch jlab and render a mitosheet
  2. Check you're on JLab 2.0, and the package installed is mitosheet2

Test existing mitosheet2 users will not have a breaking upgrade process

Commands:

pip install mitosheet
jupyter labextension install @jupyter-widgets/jupyterlab-manager@2
pip uninstall mitosheet
python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ mitosheet2 --upgrade
jupyter labextension install @jupyter-widgets/jupyterlab-manager@2

NOTE: on the labextension command, you might need to do export NODE_OPTIONS=--openssl-legacy-provider if it fails. This is a JupyterLab bug, not ours.

Check:

  1. On launching JLab, you get no errors (one popped up in the past)
  2. You can render a mitosheet
  3. Check you're on JLab 2.0, and the package installed is mitosheet2

Test existing mitosheet3 users will not have a breaking upgrade process

Commands:

python3 -m pip install mitosheet3
python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ mitoinstaller
python3 -m mitoinstaller install --test-pypi

Check:

  1. On launching JLab, you get no errors (one popped up in the past)
  2. You can render a mitosheet
  3. Check you're on JLab 3.0, and the package installed is mitosheet

Checklist for the new package tests

  • Nate ran these tests on Mac
  • Aaron ran these tests on Mac
  • Someone ran these tests on Windows (TODO: figure out how to run these...)
  • Communicated to relevant users, see the doc in product on Notion called Changes in Package Hierarchy Communication
  • Bumped versions correctly (medium version change) for mitosheet2

Tests for dependency flexibility

We simply want to try to use Mito on a workflow with a bunch of different versions of Pandas. After we have a workflow selected (see below), I want to test on pandas versions: 0.24.2, 0.25.4, 1.0.0, and 1.4.5.

Now, the main problem here is that I cannot personally install these earlier pandas versions. I'm thinking though... maybe we can use MyBinder instances? This will be annoying, as it's not a dev version, but I think we might have to.

I'll think on this one, but I think 4 workflows on each of these would be great! I don't think we should do these with end users, b/c they very likely will just lead to installation errors.
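
Supporting several pandas versions usually means branching on the installed version somewhere. Below is a minimal sketch of a version comparison helper using only the standard library; the names `parse_version` and `is_older_than` are hypothetical, and pre-release suffixes such as `rc1` are simply ignored.

```python
import itertools
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a version like '1.0.0rc1' into (1, 0, 0), padding to three parts."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_older_than(current: str, target: str) -> bool:
    """True if `current` is an earlier release than `target`."""
    return parse_version(current) < parse_version(target)


# Typical use (assumes pandas is importable):
# if is_older_than(pd.__version__, "1.0.0"): use the fallback code path
```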

Tests for state refactor and new graphing code

These are best tested by a) doing workflows ourselves, and b) having users do workflows. @aarondr77 can we take one of the graphing workflows and adapt it (so it has a few more transformations, or something), so we can send it out to users? Thoughts on what's best here?

EDIT: for now, I posted an Upwork job with the stack overflow analysis. I asked users to heavily use merge, pivot, and graphing, and we'll see what they end up doing here. They are also testing the new installer (they are installing locally).

Save scroll location in sheets

Describe the bug

Currently, we don't save scroll when switching between sheets. Pretty much every application on earth other than ours saves scroll. We should save this!

To Reproduce

  1. Import two data frames with enough data size to scroll in.
  2. Scroll to the end of the first sheet. Switch to a different sheet, and you'll be at the end of that too.

Expected behavior

Each sheet should maintain its scroll position when you switch away from it and back.
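
The expected behavior amounts to remembering one scroll offset per sheet. The sketch below shows the idea in Python for illustration only (the real state would live in the frontend); the class and method names are hypothetical.

```python
from typing import Dict, Hashable, Tuple


class ScrollMemory:
    """Remember the (top, left) scroll offset of each sheet."""

    def __init__(self) -> None:
        self._positions: Dict[Hashable, Tuple[int, int]] = {}

    def save(self, sheet_key: Hashable, top: int, left: int) -> None:
        """Call just before switching away from a sheet."""
        self._positions[sheet_key] = (top, left)

    def restore(self, sheet_key: Hashable) -> Tuple[int, int]:
        """Call when a sheet becomes active; unseen sheets start at the origin."""
        return self._positions.get(sheet_key, (0, 0))

    def forget(self, sheet_key: Hashable) -> None:
        """Drop the saved position when a sheet is deleted."""
        self._positions.pop(sheet_key, None)


memory = ScrollMemory()
memory.save("sheet-a", top=12000, left=0)
```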

Docs improvements specification

Overview and motivation

Good documentation is commonly cited as one of the main areas to invest in if you want to improve how users use one's product. We currently have documentation, but it is not very good - it's pretty much just a document that explains what features we have.

Also

Two methods of doc usage

Documentation structure

Other documentation changes

Ability to copy and paste Mito cells (or whole DF) to Excel or Gsheets

Is your feature request related to a problem? Please describe.
A user requested this feature.

Describe the solution you'd like
The ability to copy and paste Mito cells (or whole DF) to Excel or Gsheets

Describe alternatives you've considered
A button that connects the mitosheet to gsheets through the authentication package.

Additional context
I think this would be especially useful for business team users (finance, operations, marketing, etc.)

Copy and Paste Out of Mito

Is your feature request related to a problem? Please describe.
One of the primary ways that users get data into and out of other spreadsheet tools is by copying / pasting the data. The process of getting data into Mito is more formal. It's a multi-step process, and becomes even more burdensome if the file is not in the same directory that you launched Jupyter from.

Describe the solution you'd like
Users should be able to copy data from an Excel or CSV file into Mito, and they should be able to copy data from Mito into an Excel or CSV file.

Describe alternatives you've considered
The alternatives are using Mito's import and export features, which work, but take more steps + time.
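
Spreadsheet applications generally accept tab-separated text on paste, so copying out of Mito could come down to serializing the selected cells that way. The sketch below uses only the standard library and a hypothetical helper name; for a whole dataframe, pandas also offers `DataFrame.to_clipboard`.

```python
import csv
import io
from typing import Iterable, Sequence


def rows_to_clipboard_text(header: Sequence[object], rows: Iterable[Sequence[object]]) -> str:
    """Serialize a header and rows as tab-separated text suitable for pasting."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


text = rows_to_clipboard_text(["A", "B"], [[1, "x"], [2, "y"]])
```

Going through `csv.writer` rather than joining strings by hand means cells that contain tabs, quotes, or newlines are quoted instead of breaking the column layout.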

Remove all unnecessary metadata from steps

Currently, we track all sorts of metadata in steps that is duplicated or can otherwise be derived in other ways. We should do our best to remove this data, as it:

  1. Requires extra work when adding any new step.
  2. Has led to multiple bugs as it gets out of date.

Ideally, we'd be able to come up with an algorithm that maintains this metadata automatically, but I figure this is not possible unless we subscribe to changes in pandas.
