I am adding a request for a MultiIndex test. I think this is definitely needed.
Our database has tables like "annual_employment_control_totals", which has a multi-level index made of "year", "large_area_id" and "sector_id". The primary key test does not work in this situation. A simple solution is to add a MultiIndex assertion. Some sample code follows (it does not check the order of the index levels, but that is probably good enough).
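Code snippets for the test. First, collect the columns flagged as `multiindex` in the table's spec:

```python
# Collect every column whose spec sets `multiindex: True`.
# Assumed context: t_spec is the table's spec, and each column spec exposes
# .name and a .properties dict of spec keys and values.
multiindex_cols = []
for c_spec in t_spec.columns:
    for k, v in c_spec.properties.items():
        if (k, v) == ('multiindex', True):
            # Append the column's name, not the spec key k ('multiindex').
            multiindex_cols.append(c_spec.name)
if len(multiindex_cols) > 0:
    assert_columns_are_multiindex(t_spec.name, multiindex_cols)
```

Then the assertion itself:

```python
import orca

# OrcaAssertionError is the exception class defined in orca_test.


def assert_columns_are_multiindex(table_name, multiindex_cols):
    """
    Assert that the columns in `multiindex_cols` together form the index of
    the table, and that the index values are unique and non-missing.
    The order of the index levels is not checked.
    """
    idx = orca.get_table(table_name).index
    names = list(idx.names)
    if set(names) != set(multiindex_cols) or len(names) != len(multiindex_cols):
        msg = "Columns %s are not set as the index of table '%s'" \
            % (multiindex_cols, table_name)
        raise OrcaAssertionError(msg)
    if not idx.is_unique:
        msg = "Columns %s are the index of table '%s' but their values are not unique" \
            % (multiindex_cols, table_name)
        raise OrcaAssertionError(msg)
    # pd.isnull() raises NotImplementedError for a MultiIndex (which a bare
    # except would misreport as missing values), so check each level instead.
    if any(idx.get_level_values(i).isnull().any() for i in range(idx.nlevels)):
        msg = "Columns %s are the index of table '%s' but contain missing values" \
            % (multiindex_cols, table_name)
        raise OrcaAssertionError(msg)
```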
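For trying the idea without registering an Orca table, here is a pandas-only sketch of the same check. It adds an optional `check_order` flag to cover the level-order caveat mentioned above. The names `check_multiindex` and `IndexCheckError` are hypothetical (not part of orca_test), and the data is a made-up miniature of `annual_employment_control_totals` with an invented `jobs` column.

```python
import pandas as pd


class IndexCheckError(Exception):
    """Hypothetical stand-in for orca_test's OrcaAssertionError."""


def check_multiindex(df, cols, check_order=False):
    """Raise IndexCheckError unless `cols` form a unique, non-missing index of `df`.

    With check_order=True the index levels must appear in exactly the order given.
    """
    idx = df.index
    names = list(idx.names)
    if check_order:
        same = names == list(cols)
    else:
        same = set(names) == set(cols) and len(names) == len(cols)
    if not same:
        raise IndexCheckError("Columns %s are not the index (index levels: %s)"
                              % (list(cols), names))
    if not idx.is_unique:
        raise IndexCheckError("Index %s has duplicate entries" % names)
    # pd.isnull() is not defined for a MultiIndex, so test each level on its own.
    for level, name in enumerate(names):
        if idx.get_level_values(level).isna().any():
            raise IndexCheckError("Index level '%s' contains missing values" % name)


# Made-up miniature of annual_employment_control_totals.
totals = pd.DataFrame({
    "year": [2015, 2015, 2020, 2020],
    "large_area_id": [3, 3, 3, 3],
    "sector_id": [1, 2, 1, 2],
    "jobs": [100, 250, 110, 270],
}).set_index(["year", "large_area_id", "sector_id"])

check_multiindex(totals, ["sector_id", "year", "large_area_id"])  # passes: order ignored
```

Keeping the order check optional means the default behaves like the snippet above, while a spec that cares about level order (for example, when other code slices by the first level) can opt in.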
Thank you, this is good feedback! I agree that the YAML syntax will be helpful. I don't think we have any code for that yet, but it should be fairly straightforward to implement. We might be able to borrow code from the UrbanSim functions for working with YAML-based settings and model specs.
Regarding the indexes, we should probably come up with a unified approach for how Orca_test treats them. Here are some potential cases, to get us started:
- Column is the index of the underlying DataFrame
- Column is the index, and its values are unique and non-missing
- Column's values correspond to the index of another table
- Column's values correspond to the index of another table, and are non-missing
- Columns are a MultiIndex of the underlying DataFrame
- Additional MultiIndex uniqueness and missing-value cases?
- Others?
Currently, the `primary_key` spec represents the 2nd case, and the `foreign_key` spec represents the 4th case. How many permutations do we want to handle with Orca_test?
Some criteria might be: (a) it's a plausible and intended use of Orca, and (b) any missing piece would potentially break model step logic.
What do you think? I'll have to read up a bit on Orca and on DataFrame indexes to get a better idea of what the plausible and intended use cases are. Let's leave this issue open and use it for discussion of how we want to handle this.
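To make the list of cases concrete, here is a sketch of them as pandas predicates, one per case, which may help when deciding which permutations orca_test should cover. The helper names are hypothetical and are not orca_test's API. It reads case 1 as a single-column index and lets case 3 ignore missing values, since case 4 is the one that adds the non-missing requirement; `zones` and `households` are toy tables.

```python
import pandas as pd

# Hypothetical helpers, one per case in the list above (not orca_test's API).


def is_index(df, col):
    """Case 1: `col` is the (single-level) index of df."""
    return list(df.index.names) == [col]


def is_unique_nonmissing_index(df, col):
    """Case 2 (what the primary_key spec covers): index + unique + no missing."""
    return is_index(df, col) and df.index.is_unique and not df.index.isna().any()


def values_in_index(values, other):
    """Case 3: every non-missing value in `values` appears in other's index."""
    return bool(values.dropna().isin(other.index).all())


def values_in_index_nonmissing(values, other):
    """Case 4 (what the foreign_key spec covers): case 3 + no missing values."""
    return not values.isna().any() and values_in_index(values, other)


def is_multiindex(df, cols):
    """Case 5: `cols` are the levels of df's MultiIndex, in any order."""
    names = list(df.index.names)
    return (isinstance(df.index, pd.MultiIndex)
            and set(names) == set(cols) and len(names) == len(cols))


zones = pd.DataFrame({"pop": [10, 20]}, index=pd.Index([1, 2], name="zone_id"))
households = pd.DataFrame({"zone_id": [1, 2, 2, None]})

print(is_unique_nonmissing_index(zones, "zone_id"))              # True
print(values_in_index(households["zone_id"], zones))             # True
print(values_in_index_nonmissing(households["zone_id"], zones))  # False
```

Seen this way, the cases are compositions of a few building blocks (index membership, value lookup, uniqueness, missingness), which is one way to keep the number of distinct specs small.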
I strongly agree with the two criteria you proposed. Since the tests are intended to work with UrbanSim code, they should follow the standards and expectations of the model.
Additional index checks could include column values that correspond to the MultiIndex of another table, plus the uniqueness and missing-value variants, and so on. So we may end up with many more tests. I wonder whether we can simplify this by focusing on index and value-to-index combinations only, and letting the user choose the additional uniqueness and non-missing tests as options of the index tests. What do you think?
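A sketch of the simplification proposed above: two base checks (the index check, and the values-in-index check) that accept either one column or a list of columns, with uniqueness and non-missing checks as opt-in flags. Function names, flags, and the toy `totals`/`jobs` tables are hypothetical. For multi-column keys it assumes `cols` are listed in the same order as the other table's index levels.

```python
import pandas as pd


class IndexCheckError(Exception):
    """Hypothetical error type for this sketch."""


def _as_list(cols):
    return [cols] if isinstance(cols, str) else list(cols)


def _has_missing(idx):
    # Works for Index and MultiIndex alike (pd.isnull is not defined for MultiIndex).
    return any(idx.get_level_values(i).isna().any() for i in range(idx.nlevels))


def assert_index(df, cols, unique=False, non_missing=False):
    """Base check: `cols` form df's index (any level order); extras are opt-in."""
    cols = _as_list(cols)
    names = list(df.index.names)
    if set(names) != set(cols) or len(names) != len(cols):
        raise IndexCheckError("%s is not the index (index levels: %s)" % (cols, names))
    if unique and not df.index.is_unique:
        raise IndexCheckError("Index %s has duplicate entries" % cols)
    if non_missing and _has_missing(df.index):
        raise IndexCheckError("Index %s contains missing values" % cols)


def assert_values_in_index(df, cols, other, non_missing=False):
    """Base check: each row of df[cols] appears in other's (multi-)index.

    `cols` must follow the same order as other's index levels. Rows with a
    missing component are skipped unless non_missing=True, which rejects them.
    """
    cols = _as_list(cols)
    values = df[cols]
    if non_missing and values.isna().any().any():
        raise IndexCheckError("Columns %s contain missing values" % cols)
    values = values.dropna()
    if len(cols) == 1:
        found = values[cols[0]].isin(other.index).to_numpy()
    else:
        # Compare whole tuples against the other table's MultiIndex.
        found = pd.MultiIndex.from_frame(values).isin(other.index)
    if not found.all():
        raise IndexCheckError("%d row(s) of %s are not in the other table's index"
                              % ((~found).sum(), cols))


key = ["year", "large_area_id", "sector_id"]
totals = pd.DataFrame({"year": [2015, 2015, 2020], "large_area_id": [3, 3, 3],
                       "sector_id": [1, 2, 1], "total": [100, 250, 110]}).set_index(key)
jobs = pd.DataFrame({"year": [2015, 2020], "large_area_id": [3, 3], "sector_id": [2, 1]})

assert_index(totals, key, unique=True, non_missing=True)
assert_values_in_index(jobs, key, totals)
```

With this shape, the existing `primary_key` and `foreign_key` specs would correspond to `assert_index(..., unique=True, non_missing=True)` and `assert_values_in_index(..., non_missing=True)` on a single column, and the MultiIndex cases come for free by passing a list.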