nickderobertis / data-code
License: MIT License
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/loader.py:331. It will automatically be closed when the TODO comment is removed from the default branch (master).
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/loader.py:339. It will automatically be closed when the TODO comment is removed from the default branch (master).
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/pipeline/operations/generate.py:37. It will automatically be closed when the TODO comment is removed from the default branch (master).
Some initial work is done in _daily_multipler, but need to add
for other functions, and be more flexible for custom calendars
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/portfolio/cumret.py:179. It will automatically be closed when the TODO comment is removed from the default branch (master).
This issue has been automatically created by todo-actions based on a TODO comment found in .github/workflows/automerge.yml:51. It will automatically be closed when the TODO comment is removed from the default branch (master).
Already hashing Transform
base class. But was getting unhashable type: 'AppliedTransform'
even with that.
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/transform/applied.py:62. It will automatically be closed when the TODO comment is removed from the default branch (master).
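One possible cause of the `unhashable type` error described above, offered only as a guess: in Python, a class that defines `__eq__` without also defining `__hash__` has its `__hash__` set to `None`, even if the base class is hashable. The classes below are hypothetical stand-ins, not the actual datacode `Transform` and `AppliedTransform`.

```python
class BaseTransform:
    """Hypothetical stand-in for a hashable base class."""

    def __init__(self, key: str):
        self.key = key

    def __hash__(self) -> int:
        return hash(self.key)

    def __eq__(self, other) -> bool:
        return isinstance(other, BaseTransform) and self.key == other.key


class BrokenApplied(BaseTransform):
    """Overrides __eq__ only, so Python implicitly sets __hash__ to None."""

    def __eq__(self, other) -> bool:
        return super().__eq__(other)


class FixedApplied(BaseTransform):
    """Overrides __eq__ and explicitly restores the inherited __hash__."""

    def __eq__(self, other) -> bool:
        return super().__eq__(other)

    __hash__ = BaseTransform.__hash__


def is_hashable(obj) -> bool:
    """Return True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True
```

If this is the cause, re-declaring `__hash__` in the subclass (as in `FixedApplied`) would be enough; if the subclass does not override `__eq__`, the problem lies elsewhere.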
Currently the code assumes the same index in the existing source and loaded source.
Need to add code to change the index. But if this was due to a desired aggregation,
how should the user select what aggregation would be applied?
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/loader.py:68. It will automatically be closed when the TODO comment is removed from the default branch (master).
Needs to be after adding data types to variables. Then can use data types to optimize
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/loader.py:260. It will automatically be closed when the TODO comment is removed from the default branch (master).
The DataLoader checks what variables are needed for calculations that are not
included in load_variables, and if a variable requires multiple transformations,
it copies that series once for each transformation needed.
It would be better to have an implementation that doesn't require carrying copies
through everything.
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/loader.py:203. It will automatically be closed when the TODO comment is removed from the default branch (master).
The workflow docs.yml references the action peaceiris/actions-gh-pages at v2.5.0. However, this reference is missing the commit d2178821cb5968f5b7c818210297f3dbeea3114c, which may contain a fix for a vulnerability.
The vulnerability fix missing from this version of the action could be related to:
(1) a CVE fix
(2) an upgrade of a vulnerable dependency
(3) a fix for a secret leak, among others.
Please consider updating the reference to the action.
Loader should be able to take a DataFrame instead of just a filepath, then use
that here. Will need to handle columns, variables, transformations correctly
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/source.py:181. It will automatically be closed when the TODO comment is removed from the default branch (master).
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/pipeline/operations/transform.py:43. It will automatically be closed when the TODO comment is removed from the default branch (master).
This issue has been automatically created by todo-actions based on a TODO comment found in Pipfile:34. It will automatically be closed when the TODO comment is removed from the default branch (master).
This issue has been automatically created by todo-actions based on a TODO comment found in tests/test_data.py:233. It will automatically be closed when the TODO comment is removed from the default branch (master).
Currently the same from_str method is in the subclasses because they have a different init
Only int and float have different from_str methods, and both of those are the same. Create
mixin or intermediate classes to eliminate repeated code.
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/dtypes/base.py:43. It will automatically be closed when the TODO comment is removed from the default branch (master).
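A minimal sketch of the mixin idea above. Everything here is an assumption made for illustration: the class names, the `names` attribute, the `bit_length` parameter, and the parsing rule (a type name optionally followed by a bit size) are hypothetical and may not match what datacode's `from_str` actually does.

```python
from typing import Optional, Tuple


class DataType:
    """Hypothetical base class with a from_str for types that take no arguments."""

    names: Tuple[str, ...] = ()

    @classmethod
    def from_str(cls, dtype: str):
        if dtype.lower() not in cls.names:
            raise ValueError(f"{dtype!r} is not a valid name for {cls.__name__}")
        return cls()


class BitSizeFromStrMixin:
    """Shared from_str for types whose name may carry a bit size, e.g. 'int32'."""

    names: Tuple[str, ...] = ()

    @classmethod
    def from_str(cls, dtype: str):
        dtype = dtype.lower()
        # Try longer names first so 'integer' is not mistaken for 'int' + 'eger'.
        for name in sorted(cls.names, key=len, reverse=True):
            if dtype.startswith(name):
                suffix = dtype[len(name):]
                if suffix == "":
                    return cls()
                if suffix.isdigit():
                    return cls(bit_length=int(suffix))
        raise ValueError(f"{dtype!r} is not a valid name for {cls.__name__}")


class IntType(BitSizeFromStrMixin, DataType):
    names = ("int", "integer")

    def __init__(self, bit_length: Optional[int] = None):
        self.bit_length = bit_length


class FloatType(BitSizeFromStrMixin, DataType):
    names = ("float",)

    def __init__(self, bit_length: Optional[int] = None):
        self.bit_length = bit_length


class StringType(DataType):
    names = ("str", "string")
```

Listing the mixin before `DataType` in the bases puts its `from_str` first in the method resolution order, so `IntType` and `FloatType` share one implementation while `StringType` keeps the base one.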
This issue has been automatically created by todo-actions based on a TODO comment found in tests/test_data.py:167. It will automatically be closed when the TODO comment is removed from the default branch (master).
last_modified is calculated a lot and goes through the
entire pipeline each time. Caching the result of the
calculations will give a significant speed up, especially
in DataExplorer.graph. Need to handle updating the cache
whenever data sources or operations change, and somehow
also when OS modified time of file changes (fs events?).
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/pipeline/base.py:229. It will automatically be closed when the TODO comment is removed from the default branch (master).
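The last_modified caching idea above could be sketched roughly as follows. The `Node` class, its attributes, and the `touch` method are hypothetical and stand in for pipelines and sources; this only covers invalidation when a node is explicitly changed, and does not address file modification times changing on disk.

```python
import datetime
from typing import List, Optional


class Node:
    """Hypothetical pipeline or source node with a cached last_modified."""

    def __init__(self, name: str, own_modified: datetime.datetime,
                 inputs: Optional[List["Node"]] = None):
        self.name = name
        self._own_modified = own_modified
        self.inputs = list(inputs or [])
        self.dependents: List["Node"] = []
        for node in self.inputs:
            node.dependents.append(self)
        self._cached: Optional[datetime.datetime] = None
        self.compute_count = 0  # only here to show how often work is done

    @property
    def last_modified(self) -> datetime.datetime:
        # Walk the inputs only when there is no cached value.
        if self._cached is None:
            self.compute_count += 1
            candidates = [self._own_modified]
            candidates.extend(node.last_modified for node in self.inputs)
            self._cached = max(candidates)
        return self._cached

    def touch(self, when: datetime.datetime) -> None:
        """Record a change to this node and invalidate everything downstream."""
        self._own_modified = when
        self._invalidate()

    def _invalidate(self) -> None:
        self._cached = None
        for dependent in self.dependents:
            dependent._invalidate()
```

Repeated reads of `last_modified` then cost a single attribute lookup until some upstream node is touched.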
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/outputter.py:111. It will automatically be closed when the TODO comment is removed from the default branch (master).
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/compareids/models/datasets.py:106. It will automatically be closed when the TODO comment is removed from the default branch (master).
Entire jobs are getting copied between workflow files due to limitations in GitHub Actions.
The only difference in these jobs is that they check out master instead of requiring master.
Possible changes to Github Actions that would allow the automerge workflow to be refactored:
This issue has been automatically created by todo-actions based on a TODO comment found in .github/workflows/automerge.yml:89. It will automatically be closed when the TODO comment is removed from the default branch (master).
Need to wait for pd_utils to support it
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/transform/specific/lag.py:157. It will automatically be closed when the TODO comment is removed from the default branch (master).
This issue has been automatically created by todo-actions based on a TODO comment found in tests/test_source.py:381. It will automatically be closed when the TODO comment is removed from the default branch (master).
This issue has been automatically created by todo-actions based on a TODO comment found in tests/test_data.py:95. It will automatically be closed when the TODO comment is removed from the default branch (master).
class TestLoadAndMergeCompustat(DataFrameTest):

    def test_freq_a(self):
        expect_df = pd.DataFrame(data=[
            ('001076', Timestamp('1995-03-01 00:00:00'), Timestamp('1994-03-31 00:00:00'),
             185.18400000000003, 112.70299999999999),
            ('001076', Timestamp('1995-04-01 00:00:00'), Timestamp('1995-03-31 00:00:00'),
             228.892, 113.575),
            ('001722', Timestamp('2012-01-01 00:00:00'), Timestamp('2011-06-30 00:00:00'),
             80676.0, 1247.0),
            ('001722', Timestamp('2012-07-01 00:00:00'), Timestamp('2012-06-30 00:00:00'),
             89038.0, 1477.0),
            ('001722', numpy.timedelta64('NaT','ns'), numpy.timedelta64('NaT','ns'),
             numpy.timedelta64('NaT','ns'), numpy.timedelta64('NaT','ns')),
            (numpy.datetime64('NaT'), numpy.datetime64('2012-01-01T00:00:00.000000000'), numpy.datetime64('NaT'),
             numpy.datetime64('NaT'), numpy.datetime64('NaT')),
        ], columns=['GVKEY', 'Date', 'datadate', 'sale', 'capx'])

        c_str = datacode.load_and_merge_compustat(self.df_gvkey_str, get=['sale','capx'], freq='a',
                                                  gvkeyvar='GVKEY', debug=True)
        c_num = datacode.load_and_merge_compustat(self.df_gvkey_num, get=['sale','capx'], freq='a',
                                                  gvkeyvar='GVKEY', debug=True)

        assert_frame_equal(expect_df, c_str, check_dtype=False)
        assert_frame_equal(expect_df, c_num, check_dtype=False)

    def test_freq_q(self):
        expect_df = pd.DataFrame(data=[
            ('001076', Timestamp('1995-03-01 00:00:00'), Timestamp('1994-12-31 00:00:00'),
             56.511, 21.96799999999999),
            ('001076', Timestamp('1995-04-01 00:00:00'), Timestamp('1995-03-31 00:00:00'),
             59.551, 29.421000000000006),
            ('001722', Timestamp('2012-01-01 00:00:00'), Timestamp('2011-12-31 00:00:00'),
             23306.0, 409.0),
            ('001722', Timestamp('2012-07-01 00:00:00'), Timestamp('2012-06-30 00:00:00'),
             22675.0, 284.0),
            ('001722', numpy.timedelta64('NaT','ns'), numpy.timedelta64('NaT','ns'),
             numpy.timedelta64('NaT','ns'), numpy.timedelta64('NaT','ns')),
            (numpy.datetime64('NaT'), numpy.datetime64('2012-01-01T00:00:00.000000000'), numpy.datetime64('NaT'),
             numpy.datetime64('NaT'), numpy.datetime64('NaT')),
        ], columns=['GVKEY', 'Date', 'datadate', 'saleq', 'capxq'])

        c_str = datacode.load_and_merge_compustat(self.df_gvkey_str, get=['sale','capx'], freq='q',
                                                  gvkeyvar='GVKEY', debug=True)
        c_num = datacode.load_and_merge_compustat(self.df_gvkey_num, get=['sale','capx'], freq='q',
                                                  gvkeyvar='GVKEY', debug=True)

        assert_frame_equal(expect_df, c_str, check_dtype=False)
        assert_frame_equal(expect_df, c_num, check_dtype=False)
This issue has been automatically created by todo-actions based on a TODO comment found in tests/test_data.py:493. It will automatically be closed when the TODO comment is removed from the default branch (master).
Currently just checking to make sure they can be generated with no errors.
Should also check the contents of the graphs. Also see TestCreateSource.test_graph
This issue has been automatically created by todo-actions based on a TODO comment found in tests/pipeline/test_data_merge.py:96. It will automatically be closed when the TODO comment is removed from the default branch (master).
We are adding extra columns here for calculated variables which require variables not
included in load_variables. Currently, it will load extra variables even if
the calculation could just be done before variable transforms. For example, the
test TestLoadSource.test_load_with_calculate_on_transformed_before_transform should be able
to complete without adding any extra columns.
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/source.py:97. It will automatically be closed when the TODO comment is removed from the default branch (master).
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/dtypes/str_type.py:12. It will automatically be closed when the TODO comment is removed from the default branch (master).
This issue has been automatically created by todo-actions based on a TODO comment found in tests/test_data.py:12. It will automatically be closed when the TODO comment is removed from the default branch (master).
The dictionary of columns has keys as names in the original source and values as columns.
A calculated column is not in the original source, so uuid was used for now just to ensure
that these columns can be in the dictionary, but they should be tracked separately.
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/loader.py:154. It will automatically be closed when the TODO comment is removed from the default branch (master).
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/portfolio/cumret.py:146. It will automatically be closed when the TODO comment is removed from the default branch (master).
This issue has been automatically created by todo-actions based on a TODO comment found in tests/__init__.py:3. It will automatically be closed when the TODO comment is removed from the default branch (master).
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/portfolio/resample.py:11. It will automatically be closed when the TODO comment is removed from the default branch (master).
Examining last_modified or pipeline_last_modified on
a large pipeline structure is extremely slow. Performance
of DataExplorer graphing could be improved if it first found
only the terminal pipelines and sources and used only those,
as the nested ones are included anyway.
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/explorer.py:118. It will automatically be closed when the TODO comment is removed from the default branch (master).
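Finding the terminal items mentioned above amounts to selecting the nodes that nothing else consumes. A minimal sketch, assuming the structure can be expressed as a mapping from each item's name to the names of its inputs (the function name and this mapping are hypothetical, not part of datacode):

```python
from typing import Dict, List, Set


def terminal_nodes(inputs_by_node: Dict[str, List[str]]) -> Set[str]:
    """Return the nodes that are not used as an input to any other node."""
    used_as_input = {
        name for inputs in inputs_by_node.values() for name in inputs
    }
    all_nodes = set(inputs_by_node) | used_as_input
    return all_nodes - used_as_input


# Example structure: two sources feed a merge, which feeds an analysis;
# one source stands alone.
example = {
    "source_a": [],
    "source_b": [],
    "merge": ["source_a", "source_b"],
    "analysis": ["merge"],
    "standalone_source": [],
}
terminals = terminal_nodes(example)
```

Only the terminal items would then need their last_modified examined, since each one already accounts for everything nested beneath it.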
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/pipeline/operations/combine.py:39. It will automatically be closed when the TODO comment is removed from the default branch (master).
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/pipeline/operations/analysis.py:40. It will automatically be closed when the TODO comment is removed from the default branch (master).
E.g. MatchComparisonBarData(100000000, 100000, 100000, name='Unbalanced')
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/compareids/models/bars.py:44. It will automatically be closed when the TODO comment is removed from the default branch (master).
class TestGetGvkeyOrPermno(DataFrameTest):

    def test_get_gvkey_with_nan(self):
        expect_df = pd.DataFrame(data=[
            ('a', Timestamp('2000-01-01 00:00:00'), 10516.0, 1722),
            ('a', Timestamp('2000-01-02 00:00:00'), 10516.0, 1722),
            ('a', Timestamp('2000-01-03 00:00:00'), 10516.0, 1722),
            ('a', Timestamp('2000-01-04 00:00:00'), 10516.0, 1722),
            ('b', Timestamp('2000-01-01 00:00:00'), 10516.0, 1722),
            ('b', Timestamp('2000-01-02 00:00:00'), 10516.0, 1722),
            ('b', Timestamp('2000-01-03 00:00:00'), 10516.0, 1722),
            ('b', Timestamp('2000-01-04 00:00:00'), 10516.0, 1722),
            ('a', Timestamp('2008-01-01 00:00:00'), nan, nan),
            ('a', Timestamp('2009-01-02 00:00:00'), nan, nan),
            ('a', Timestamp('2010-01-03 00:00:00'), 78049.0, 1076),
            ('a', Timestamp('2011-01-04 00:00:00'), 10517.0, 1076),
        ], columns=['byvar', 'Date', 'PERMNO', 'GVKEY'])

        ggop = datacode.get_gvkey_or_permno(self.permno_df_with_nan, datevar='Date',
                                            other_byvars='byvar')  # default is on permno get gvkey
        assert_frame_equal(expect_df, ggop)
This issue has been automatically created by todo-actions based on a TODO comment found in tests/test_data.py:141. It will automatically be closed when the TODO comment is removed from the default branch (master).
Currently calling self._set_variables_and_collections() before self._create_variable_map()
as variables need to have the custom name attributes created. But then still calling after to
set the variables attributes correctly. Could reorganize this.
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/variables/collection.py:41. It will automatically be closed when the TODO comment is removed from the default branch (master).
Had to put safe=False in merge pipeline output to make it happen
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/pipeline/operations/merge.py:53. It will automatically be closed when the TODO comment is removed from the default branch (master).
When variables are calculated there is no corresponding column
being passed to DataSource so it does not have a consistent load_key
for saving purposes. Passing the column results in an error for it not
existing in the original data. Need to be able to pass columns which are
from calculations and not to load from existing data
This issue has been automatically created by todo-actions based on a TODO comment found in tests/pipeline/test_auto_cache.py:163. It will automatically be closed when the TODO comment is removed from the default branch (master).
This code is supposed to prevent that but is not working as expected.
The original variables are still being modified. The problem occurs with both
SourceTransform.apply and Transform.apply_to_source. A test has been added which
catches this issue in test_lags_as_source_transform_with_subset but it has been
commented out for now.
This issue has been automatically created by todo-actions based on a TODO comment found in datacode/models/transform/source.py:52. It will automatically be closed when the TODO comment is removed from the default branch (master).
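One generic way to keep the original variables from being modified is to deep-copy them before applying a transform. The sketch below uses a hypothetical `Variable` dataclass and helper function; it is not the datacode implementation, and whether a deep copy is appropriate there depends on what the variables reference.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Variable:
    """Hypothetical variable that records the transforms applied to it."""

    key: str
    applied: List[str] = field(default_factory=list)


def apply_without_mutating(variables: List[Variable],
                           transform_name: str) -> List[Variable]:
    """Return transformed copies, leaving the passed variables untouched."""
    new_variables = copy.deepcopy(variables)
    for variable in new_variables:
        variable.applied.append(transform_name)
    return new_variables
```

A shallow copy of the list would not be enough here, because the copied list would still hold the same `Variable` objects and their `applied` lists.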