Comments (3)

semcogli avatar semcogli commented on July 18, 2024

I am adding a request for a MultiIndex test. I think this is definitely needed.

Our database has tables like "annual_employment_control_totals", which have a multilevel index made up of "year", "large_area_id" and "sector_id". The primary key test does not work in this situation. A simple solution is to add a MultiIndex assertion. Some sample code follows (it does not check the order of the index levels, but that is probably good enough).

Code snippets for the test:

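# Needs `import orca`; OrcaAssertionError is the exception used by orca_test.

# Fragment for the loop over each column spec's (k, v) properties.
multiindex_cols = []
if (k, v) == ('multiindex', True):
    # Collect the column's name, not the key k (always 'multiindex' here).
    # c_spec.name is assumed to hold the column name in the enclosing loop.
    multiindex_cols.append(c_spec.name)

# After all column specs for the table have been read:
if len(multiindex_cols) > 0:
    assert_columns_are_multiindex(table_name, multiindex_cols)


def assert_columns_are_multiindex(table_name, multiindex_cols):
    """
    Assert that the given columns are the index levels of the table, and
    that the index entries are unique and free of missing values. The order
    of the levels is not checked.
    """
    idx = orca.get_table(table_name).index

    if set(idx.names) != set(multiindex_cols):
        msg = "Columns %s are not set as the index of table '%s'" \
                % (multiindex_cols, table_name)
        raise OrcaAssertionError(msg)

    if not idx.is_unique:
        msg = "Columns %s are the index of table '%s' but its values are not unique" \
                % (multiindex_cols, table_name)
        raise OrcaAssertionError(msg)

    # pd.isnull() is not implemented for a MultiIndex, so inspect the levels
    # as a DataFrame instead.
    if idx.to_frame(index=False).isnull().values.any():
        msg = "Columns %s are the index of table '%s' but it contains missing values" \
                % (multiindex_cols, table_name)
        raise OrcaAssertionError(msg)

    return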
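
To try the three checks outside Orca, here is a minimal sketch on a plain pandas DataFrame. The function name `check_multiindex` and the sample values are made up for illustration; only the table and column names come from the description above. Missing values are looked up through `to_frame()` because `pd.isnull()` is not defined for a MultiIndex.

```python
import pandas as pd


def check_multiindex(df, expected_names):
    """Return a list of problems found with df's index (empty if none)."""
    idx = df.index
    # Level order is ignored, as in the snippet above
    if set(idx.names) != set(expected_names):
        return ["index names differ from the expected columns"]

    problems = []
    if not idx.is_unique:
        problems.append("index entries are not unique")
    # Turn the index levels into columns so that isnull() can be applied
    if idx.to_frame(index=False).isnull().values.any():
        problems.append("index contains missing values")
    return problems


# Hypothetical sample data shaped like annual_employment_control_totals
controls = pd.DataFrame({
    "year": [2020, 2020, 2021],
    "large_area_id": [1, 1, 1],
    "sector_id": [1, 2, 1],
    "total": [100, 50, 110],
}).set_index(["year", "large_area_id", "sector_id"])

print(check_multiindex(controls, ["year", "large_area_id", "sector_id"]))
```

Returning a list of problems instead of raising is only a convenience for experimenting; inside orca_test each problem would presumably raise an OrcaAssertionError as in the snippet above.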

from orca_test.

smmaurer avatar smmaurer commented on July 18, 2024

Thank you, this is good feedback! I agree that the YAML syntax will be helpful. I don't think we have any code for that yet, but it should be fairly straightforward to implement. We might be able to borrow code from the UrbanSim functions for working with YAML-based settings and model specs.

Regarding the indexes, we should probably come up with a unified approach for how Orca_test treats them. Here are some potential cases, to get us started:

  1. Column is an index of underlying DataFrame
  2. Column is an index, plus its values are unique and non-missing
  3. Column's values correspond to index of another table
  4. Column's values correspond to index of another table, and are non-missing
  5. Columns are a multi-index of underlying DataFrame
  6. Additional multi-index uniqueness and missing-ness cases?
  7. Others?

Currently, the primary_key spec represents the 2nd case, and the foreign_key spec represents the 4th case. How many permutations do we want to handle with Orca_test?
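
To make the 2nd and 4th cases concrete, here is a rough sketch of the two conditions on plain pandas DataFrames. This is not the Orca_test implementation; the function names `is_primary_key` and `is_foreign_key` and the `buildings`/`zones` sample tables are assumptions chosen for illustration.

```python
import pandas as pd


def is_primary_key(df, col):
    """Case 2: col is the index, and its values are unique and non-missing."""
    idx = df.index
    return bool(idx.name == col and idx.is_unique and not idx.isnull().any())


def is_foreign_key(df, col, parent):
    """Case 4: every value of col is non-missing and found in parent's index."""
    values = df[col]
    return bool(values.notnull().all() and values.isin(parent.index).all())


# Hypothetical sample tables
zones = pd.DataFrame({"zone_id": [10, 20], "area": [1.5, 2.0]}).set_index("zone_id")
buildings = pd.DataFrame({
    "building_id": [1, 2, 3],
    "zone_id": [10, 10, 20],
}).set_index("building_id")

print(is_primary_key(buildings, "building_id"))
print(is_foreign_key(buildings, "zone_id", zones))
```

Dropping the `is_unique` and `isnull` conditions from the first function gives case 1, and dropping the `notnull` condition from the second gives case 3, which is one way to see how the permutations relate.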

Some criteria might be: (a) it's a plausible and intended use of Orca, and (b) any missing piece would potentially break model step logic.

What do you think? I'll have to read up a bit on Orca and on DataFrame indexes to get a better idea of what the plausible and intended use cases are. Let's leave this issue open and use it for discussion of how we want to handle this.

semcogli avatar semcogli commented on July 18, 2024

I strongly agree with the two criteria you proposed. Since the test is intended to work with UrbanSim code, it should follow the standards and expectations of the model.

Additional index checks could include column values that correspond to the multi-index of another table, plus uniqueness, missing values and so on, so we may end up with many more tests. I am wondering whether we can simplify the cases by focusing only on index tests and value-to-index tests, and letting the user choose the additional uniqueness and non-missing checks as options of each test. What do you think?
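
As a rough sketch of that idea, a value-to-index test against another table's multi-index could take the extra check as an option. Everything here is hypothetical: the function name `values_in_index`, the `no_missing` option and the sample tables are for illustration only, and `cols` is assumed to be listed in the same order as the parent's index levels.

```python
import pandas as pd


def values_in_index(df, cols, parent, no_missing=False):
    """Check that each row's values in cols appear in parent's multi-index.

    Rows with a missing value in any of cols are skipped unless
    no_missing=True, in which case they make the check fail.
    """
    values = df[cols]
    has_missing = values.isnull().any(axis=1)
    if no_missing and has_missing.any():
        return False
    # Compare plain tuples against the tuples of the parent's MultiIndex
    keys = set(values[~has_missing].itertuples(index=False, name=None))
    return keys <= set(parent.index)


# Hypothetical sample tables
controls = pd.DataFrame({
    "year": [2020, 2020, 2021],
    "sector_id": [1, 2, 1],
    "total": [100, 50, 110],
}).set_index(["year", "sector_id"])
jobs = pd.DataFrame({"year": [2020, 2021], "sector_id": [2, 1]})

print(values_in_index(jobs, ["year", "sector_id"], controls))
```

With this shape, the number of named tests stays small (index, value-to-index), and uniqueness or non-missing requirements become keyword options rather than separate test types.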
