
python-record-api's People

Contributors

aktech, biogeek, github-actions[bot], mreid-moz, rgommers, saulshanabrook, steff456, ueshin, vfdev-5

python-record-api's Issues

We don't know where ufuncs are from!

During the tracing, it's helpful to know not only which methods on the ufunc class are called (__call__, reduce, etc.) but also which ufuncs themselves are used (add, multiply, etc.).

Currently, we are presenting the results of those, not as the product of those two features, but as their union. That is, we show stats for the reduce method on the ufunc class, but we don't show how many times reduce was called on add vs. multiply. That's one "issue", but the other, more pressing one is that we don't know where ufuncs come from!

All we know is their names. Up until now, I had been assuming they are all defined in the numpy module. However, scipy for example has many that are not.

We should somehow figure out how to understand where they were defined, or what module they were imported from.

I guess to do this, we would have to do some kind of traversal of imported modules to understand where they are defined? This could also be helpful for the related problem of recording which module exports a certain type, instead of which module it was defined in.
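
Something like the following could work as a starting point (a rough sketch; the helper name is made up, and the scipy example assumes scipy is installed):

import sys
import types
from typing import Optional

import numpy as np

def find_exporting_module(obj) -> Optional[str]:
    # Walk the already-imported modules; sorting means parent packages (e.g.
    # "numpy") are checked before their submodules (e.g. "numpy.core.umath").
    for name, module in sorted(sys.modules.items()):
        if not isinstance(module, types.ModuleType) or name.startswith("_"):
            continue
        for attr, value in list(vars(module).items()):
            if value is obj:
                return f"{name}.{attr}"
    return None

print(find_exporting_module(np.add))  # expected: "numpy.add"
# A scipy-defined ufunc would resolve outside numpy, e.g.:
#   from scipy import special; find_exporting_module(special.erf)  -> "scipy.special.erf"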

ufunc data seems to be missing

I was looking for def sin and other such functions in typing/numpy.py, and they're missing completely. It's unclear why.

The actual question I was trying to figure out is: how often is the dtype keyword used for unary ufuncs. I thought the data I needed would be here, but it looks like it's not.

Record return types and determine exposed modules for objects

One thing we should add is the ability to record the return type from a function.

This would help us understand the signatures better.

Also, it could help with another issue, which is that currently we output functions/classes from the modules they are defined in, not the module they are imported from.

We should instead try to output them where they are imported from. However, if multiple libraries import from different locations, we should choose one and have the others be aliases.

One way to do that would be to record the return types from the getattr calls, so if you did something like import numpy; numpy.arange we would know the return type of the getattr is the arange function.

We currently record those getattr calls, but don't have the return type.

If we did, we could infer that any getattr from a module, which returns a class/fn from a different module represents an import alias.
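
A sketch of that inference (the helper name and record layout below are illustrative, not the repo's actual schema): compare the module the returned object says it was defined in against the module the getattr was performed on.

def classify_getattr(module_name: str, attr: str, returned) -> dict:
    # Where the returned function/class was defined.
    defined_in = getattr(returned, "__module__", None)
    if defined_in and defined_in != module_name:
        # The object was imported into `module_name` from elsewhere, so
        # `module_name.attr` is effectively an import alias for it.
        return {"alias": f"{module_name}.{attr}", "defined_in": defined_in}
    return {"canonical": f"{module_name}.{attr}"}

import pandas

# pandas.read_csv is defined in a pandas.io.parsers submodule (see the jsonl
# sample further down), so this getattr would be classified as an alias.
print(classify_getattr("pandas", "read_csv", getattr(pandas, "read_csv")))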

Don't include long dask strings

We are currently recording some long dask strings like: qr-stack-getitem-b2984ed7b79fa1d55e4f99a321ae62ae-

We are currently trimming them. Instead, we should record them as an arbitrary string type rather than as a literal.
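
One possible heuristic, sketched under the assumption that these keys always embed a 32-character hex hash (the regex and length threshold are guesses, not the repo's actual rule):

import re

DASK_KEY = re.compile(r"-[0-9a-f]{32}\b")

def classify_string(s: str):
    # Record auto-generated dask keys (and any very long string) as a plain
    # `str` type; keep short, meaningful strings as literals.
    if len(s) > 50 or DASK_KEY.search(s):
        return "str"
    return ("Literal", s)

print(classify_string("qr-stack-getitem-b2984ed7b79fa1d55e4f99a321ae62ae-"))  # "str"
print(classify_string("mean"))                                                # ("Literal", "mean")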

Add more downstream libraries

Look at libraries.io for other usage, like: https://libraries.io/pypi/numpy/usage

NumPy (over ~5k stars):

  • django: Doesn't use pytest https://code.djangoproject.com/ticket/30415
  • matplotlib
  • scipy
  • keras: doesn't use pytest
  • networkx
  • xgboost
  • tensorflow: doesn't use pytest
  • spaCy
  • statsmodels
  • autograd
  • pytorch
  • mxnet
  • airflow
  • lightgbm
  • allennlp

Pandas (excluding above)

  • hypothesis
  • pint
  • h2o-3
  • tablib
  • pandas-datareader
  • arrow
  • google/grr
  • scikit-optimize
  • influxdb-python
  • fastai
  • pymc3
  • tsfresh
  • vaex
  • modin
  • feather
  • dagster
  • ray
  • zipline
  • prophet
  • plotly
  • tpot
  • onnxruntime

Add profiling for C calls

Currently, if a library calls another library through their C API we are unable to trace it. This includes calling anything in Cython. This is too bad, because a lot of calls to NumPy are from Cython or C libraries.

One idea on how to achieve this, from talking to @scopatz, was to use lldb's Python API. It now builds on conda-forge for macOS, so I can get started exploring this.
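
A very rough sketch of what that could look like (purely exploratory, not working tracing code; it assumes the lldb Python bindings from conda-forge are importable and that the NumPy C-API symbols are visible to the debugger):

import sys

import lldb  # provided by the lldb package, e.g. from conda-forge

debugger = lldb.SBDebugger.Create()
debugger.SetAsync(False)

# Run the target Python under the debugger and break on NumPy C-API entry points.
target = debugger.CreateTarget(sys.executable)
target.BreakpointCreateByRegex("PyArray_.*")

process = target.LaunchSimple(["-c", "import numpy; numpy.arange(3)"], None, ".")
while process.GetState() == lldb.eStateStopped:
    frame = process.GetSelectedThread().GetFrameAtIndex(0)
    print("C call:", frame.GetFunctionName())
    process.Continue()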

Version parts separately

It would be nice if we could re-run the API generation without having to re-run the test suite recording.

To do this, we could split the python package into three:

  • jsonl: common functionality for reading/writing jsonl files
  • record-api: recording the calls made from an API and grouping them by line
  • analyze-api: taking records of API calls and creating modules from them

Consider adding support for tracking subclass method invocations

An issue arose when analyzing GeoPandas and its consumption of Pandas APIs.

GeoPandas subclasses Pandas (see here), so, in principle, a subclassed method should correspond to an equivalent Pandas DataFrame method. However, this is not the case.

Based on the analysis, GeoPandas appears to consume only 3 Pandas APIs, but this is presumably not a fair representation, given that many GeoPandas DataFrame methods are really Pandas DataFrame methods.

Accordingly, it may be worth investigating whether we can track subclass method invocations.

In conversation with @saulshanabrook, he suggested updating the Tracer to look at method resolution order (MRO).
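
A sketch of the MRO idea (it assumes geopandas is installed; the helper is illustrative, not the Tracer's actual implementation):

import geopandas

def owning_base_class(obj, method_name: str, target_prefix: str = "pandas"):
    # Walk the method resolution order and return the pandas base class that
    # actually defines `method_name`, so the call can be attributed to pandas.
    for cls in type(obj).__mro__:
        if method_name in vars(cls) and cls.__module__.startswith(target_prefix):
            return cls
    return None

gdf = geopandas.GeoDataFrame({"a": [1, 2]})
# `merge` is inherited from pandas.DataFrame, so a call to gdf.merge(...) should
# count as pandas API usage even though it is invoked on a GeoDataFrame.
print(owning_base_class(gdf, "merge"))  # <class 'pandas.core.frame.DataFrame'>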

Cleanup of Python Stack Access

There were some comments here by @Caagr98 on how we can clean up our ability to access the Python stack that we could try to incorporate. I haven't looked into them yet.

Cannot pull images locally

The images are currently hosted on a private Digital Ocean registry. We should move them to a public registry instead, now that this repo is public. Maybe GitHub?

Record metadata for properties

Currently I am throwing away the usage metadata for properties when generating the API. We should preserve these and add them as comments.
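
For example, the generator could emit the counts it already has as comments in the stub, along these lines (a sketch; the helper name and record shape are made up):

def emit_property(name: str, annotation: str, usage: dict) -> str:
    # Render a property stub, keeping the per-source usage counts as comments
    # instead of dropping them.
    lines = ["@property", f"def {name}(self) -> {annotation}:"]
    for source, count in sorted(usage.items()):
        lines.append(f"    # usage.{source}: {count}")
    lines.append("    ...")
    return "\n".join(lines)

print(emit_property("shape", "Tuple[int, int]", {"pandas": 42, "dask": 7}))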

Create better view of results

Currently, to see the results in some human-readable way you can navigate the data/typing directory here. However, this is a bit cumbersome, especially now that some files have gotten so long that GitHub will no longer allow you to link to a line or view them.

We could possibly use Sphinx to generate a site based on the type definitions? I see that Sphinx does have support for type overloads: sphinx-doc/sphinx@fb2f777

BUG: record_api.line_counts raises TypeError on unexpected keyword arguments to dump()

I'm getting the following error when calling record_api.line_counts. I'm using version 1.1.1.

/tmp/record_api_results.jsonl contains the output generated by python -m record_api; here is a sample:

$ head -n 5 record_api_results.jsonl
{"location":"/api_stats/scripts/10002306.py:22","function":{"t":"builtin_function_or_method","v":"getattr"},"params":{"args":[{"t":"module","v":"pandas"},"read_csv"]}}
{"location":"/api_stats/scripts/10002306.py:22","function":{"t":"function","v":{"module":"pandas.io.parsers","name":"_make_parser_function.<locals>.parser_f"}},"bound_params":{"pos_or_kw":[["filepath_or_buffer","../input/train.csv"],["index_col",null]]}}
{"location":"/api_stats/scripts/10002306.py:23","function":{"t":"builtin_function_or_method","v":"getattr"},"params":{"args":[{"t":{"module":"pandas.core.frame","name":"DataFrame"}},"shape"]}}
{"location":"/api_stats/scripts/10002306.py:25","function":{"t":"method","v":{"self":{"t":{"module":"pandas.core.frame","name":"DataFrame"}},"name":"head"}},"bound_params":{}}
{"location":"/api_stats/scripts/10002306.py:27","function":{"t":"builtin_function_or_method","v":"getattr"},"params":{"args":[{"t":"module","v":"pandas"},"read_csv"]}}

Call to line_counts:

export PYTHON_RECORD_API_INPUT=/tmp/record_api_results.jsonl
export PYTHON_RECORD_API_OUTPUT=/tmp/record_api_results_line_counts.jsonl
python -m record_api.line_counts

Result:

Counting lines...
reading /tmp/record_api_results.jsonl: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 478985/478985 [00:02<00:00, 217632.24it/s]
writing:   0%|                                                                                                                                                                                                      | 0/13169 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/line_counts.py", line 42, in <module>
    __main__()
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/line_counts.py", line 38, in __main__
    write(row_)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/jsonl.py", line 45, in write_line
    buffer.write(orjson.dumps(o, **kwargs))
TypeError: dumps() got an unexpected keyword argument
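
For context, orjson.dumps() only accepts the default and option keyword arguments, so forwarding a keyword meant for another JSON library (sort_keys=, indent=, ...) raises exactly this TypeError. A minimal defensive sketch (illustrative, not necessarily the actual fix):

import orjson

ORJSON_KWARGS = {"default", "option"}

def safe_dumps(obj, **kwargs) -> bytes:
    # Drop any keyword arguments orjson does not understand before serializing.
    supported = {k: v for k, v in kwargs.items() if k in ORJSON_KWARGS}
    return orjson.dumps(obj, **supported)

print(safe_dumps({"a": 1}, sort_keys=True))  # the unsupported kwarg is simply dropped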

AST for Literals is not right

Currently it looks like:

Literal[("The kernel, DotProduct(sigma_0=1), is not returnin",)]

We should remove the inner parentheses there.
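
A sketch of the fix (an illustrative helper, not the generator's real code): unwrap one-element tuples before rendering the Literal annotation.

def format_literal(value) -> str:
    # A single recorded value sometimes arrives wrapped in a one-element tuple;
    # unwrap it so we render Literal['...'] rather than Literal[('...',)].
    if isinstance(value, tuple) and len(value) == 1:
        value = value[0]
    return f"Literal[{value!r}]"

print(format_literal(("The kernel, DotProduct(sigma_0=1), is not returnin",)))
# Literal['The kernel, DotProduct(sigma_0=1), is not returnin']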

Don't serialize empty fields

We should ignore all the empty fields in the JSON and not serialize them, to save space and make the output easier to browse.
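
A minimal sketch of that pruning, assuming we drop None and empty containers recursively before writing each record (the helper name is made up):

import orjson

def prune_empty(obj):
    # Recursively remove keys whose values are None or empty containers so the
    # serialized record only carries fields that are actually populated.
    if isinstance(obj, dict):
        cleaned = {k: prune_empty(v) for k, v in obj.items()}
        return {k: v for k, v in cleaned.items() if v not in (None, {}, [], "")}
    if isinstance(obj, list):
        return [prune_empty(v) for v in obj]
    return obj

record = {"location": "a.py:1", "params": {}, "bound_params": None, "function": {"t": "function"}}
print(orjson.dumps(prune_empty(record)))  # b'{"location":"a.py:1","function":{"t":"function"}}'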

Fix Koalas Runs

#86 fixed the GitHub Actions workflow, which caused the Koalas image (added in #82) to be built and run.

It built properly, but failed to run. Here is the copied output:

Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/usr/local/lib/python3.8/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-2e6e8f27-d3fa-4533-bc4e-38938ded36f3;1.0
	confs: [default]
	found io.delta#delta-core_2.12;0.7.0 in central
	found org.antlr#antlr4;4.7 in central
	found org.antlr#antlr4-runtime;4.7 in central
	found org.antlr#antlr-runtime;3.5.2 in central
	found org.antlr#ST4;4.0.8 in central
	found org.abego.treelayout#org.abego.treelayout.core;1.0.3 in central
	found org.glassfish#javax.json;1.0.4 in central
	found com.ibm.icu#icu4j;58.2 in central
downloading https://repo1.maven.org/maven2/io/delta/delta-core_2.12/0.7.0/delta-core_2.12-0.7.0.jar ...
	[SUCCESSFUL ] io.delta#delta-core_2.12;0.7.0!delta-core_2.12.jar (96ms)
downloading https://repo1.maven.org/maven2/org/antlr/antlr4/4.7/antlr4-4.7.jar ...
	[SUCCESSFUL ] org.antlr#antlr4;4.7!antlr4.jar (28ms)
downloading https://repo1.maven.org/maven2/org/antlr/antlr4-runtime/4.7/antlr4-runtime-4.7.jar ...
	[SUCCESSFUL ] org.antlr#antlr4-runtime;4.7!antlr4-runtime.jar (6ms)
downloading https://repo1.maven.org/maven2/org/antlr/antlr-runtime/3.5.2/antlr-runtime-3.5.2.jar ...
	[SUCCESSFUL ] org.antlr#antlr-runtime;3.5.2!antlr-runtime.jar (5ms)
downloading https://repo1.maven.org/maven2/org/antlr/ST4/4.0.8/ST4-4.0.8.jar ...
	[SUCCESSFUL ] org.antlr#ST4;4.0.8!ST4.jar (7ms)
downloading https://repo1.maven.org/maven2/org/abego/treelayout/org.abego.treelayout.core/1.0.3/org.abego.treelayout.core-1.0.3.jar ...
	[SUCCESSFUL ] org.abego.treelayout#org.abego.treelayout.core;1.0.3!org.abego.treelayout.core.jar(bundle) (4ms)
downloading https://repo1.maven.org/maven2/org/glassfish/javax.json/1.0.4/javax.json-1.0.4.jar ...
	[SUCCESSFUL ] org.glassfish#javax.json;1.0.4!javax.json.jar(bundle) (3ms)
downloading https://repo1.maven.org/maven2/com/ibm/icu/icu4j/58.2/icu4j-58.2.jar ...
	[SUCCESSFUL ] com.ibm.icu#icu4j;58.2!icu4j.jar (118ms)
:: resolution report :: resolve 1696ms :: artifacts dl 278ms
	:: modules in use:
	com.ibm.icu#icu4j;58.2 from central in [default]
	io.delta#delta-core_2.12;0.7.0 from central in [default]
	org.abego.treelayout#org.abego.treelayout.core;1.0.3 from central in [default]
	org.antlr#ST4;4.0.8 from central in [default]
	org.antlr#antlr-runtime;3.5.2 from central in [default]
	org.antlr#antlr4;4.7 from central in [default]
	org.antlr#antlr4-runtime;4.7 from central in [default]
	org.glassfish#javax.json;1.0.4 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   8   |   8   |   8   |   0   ||   8   |   8   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-2e6e8f27-d3fa-4533-bc4e-38938ded36f3
	confs: [default]
	8 artifacts copied, 0 already retrieved (15071kB/48ms)
20/10/21 12:58:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/10/21 12:58:31 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://[email protected]:37093
	at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
	at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:140)
	at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
	at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
	at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:34)
	at org.apache.spark.executor.Executor.<init>(Executor.scala:206)
	at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
	at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:201)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
20/10/21 12:58:31 ERROR Utils: Uncaught exception in thread Thread-5
java.lang.NullPointerException
	at org.apache.spark.scheduler.local.LocalSchedulerBackend.org$apache$spark$scheduler$local$LocalSchedulerBackend$$stop(LocalSchedulerBackend.scala:168)
	at org.apache.spark.scheduler.local.LocalSchedulerBackend.stop(LocalSchedulerBackend.scala:144)
	at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:734)
	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2171)
	at org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:1973)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1357)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1973)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:641)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
20/10/21 12:58:31 WARN MetricsSystem: Stopping a MetricsSystem that is not running
ImportError while loading conftest '/usr/src/app/databricks/conftest.py'.
databricks/conftest.py:41: in <module>
    session = utils.default_session(shared_conf)
databricks/koalas/utils.py:384: in default_session
    session = builder.getOrCreate()
/usr/local/lib/python3.8/site-packages/pyspark/sql/session.py:186: in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
/usr/local/lib/python3.8/site-packages/pyspark/context.py:376: in getOrCreate
    SparkContext(conf=conf or SparkConf())
/usr/local/lib/python3.8/site-packages/pyspark/context.py:135: in __init__
    self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
/usr/local/lib/python3.8/site-packages/pyspark/context.py:198: in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
/usr/local/lib/python3.8/site-packages/pyspark/context.py:315: in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
/usr/local/lib/python3.8/site-packages/py4j/java_gateway.py:1568: in __call__
    return_value = get_return_value(
/usr/local/lib/python3.8/site-packages/py4j/protocol.py:326: in get_return_value
    raise Py4JJavaError(
E   py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
E   : org.apache.spark.SparkException: Invalid Spark URL: spark://[email protected]:37093
E   	at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
E   	at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:140)
E   	at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
E   	at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
E   	at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:34)
E   	at org.apache.spark.executor.Executor.<init>(Executor.scala:206)
E   	at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
E   	at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
E   	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:201)
E   	at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
E   	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
E   	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
E   	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
E   	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
E   	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
E   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
E   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E   	at py4j.Gateway.invoke(Gateway.java:238)
E   	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
E   	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
E   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
E   	at java.lang.Thread.run(Thread.java:748)

@ueshin were you able to test it locally? Would you be able to help debug this failure? I just added you as a maintainer of this repo as well.

Put types in parent modules

Currently, when we record a type, we list the module where it is defined as its module. However, many libraries define types in some sub-module but only expose them in a parent module.

So instead, when recording a type, we should look in all parent modules to see if they export the same type. If so, we should use that parent module as the module name instead of the module where the type is defined.
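
A sketch of that lookup, assuming we prefer the shortest parent module that re-exports the type under the same name (the helper is illustrative):

import importlib
import sys

def public_module(tp: type) -> str:
    # Given a type defined in e.g. pandas.core.frame, walk its parent packages
    # ("pandas", "pandas.core", ...) and return the first one that re-exports it.
    defined_in = tp.__module__
    parts = defined_in.split(".")
    for i in range(1, len(parts)):
        candidate = ".".join(parts[:i])
        module = sys.modules.get(candidate) or importlib.import_module(candidate)
        if getattr(module, tp.__name__, None) is tp:
            return candidate
    return defined_in

import pandas

print(public_module(pandas.DataFrame))  # "pandas", not "pandas.core.frame"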

Pandas Ran out of memory again!

So the pandas test suite ran out of memory again in Kubernetes. It used up ~13 GB and then was killed, because the pods only have that much available.

I am a bit hesitant to just raise the pod memory limit again... If anyone knows whether this is a reasonable amount of memory for Pandas to use when testing (cc @datapythonista), that would be helpful! It's also possible that the tracing has some sort of memory leak which is blowing things up for pandas, although none of the other test suites seem to have the same problem.

Maybe I can run the Pandas test suite with some flags to skip high-memory tests? These are my current ones:

CMD [ "pytest", "pandas", "--skip-slow", "--skip-network", "--skip-db", "-m", "not single", "-r", "sxX", "--strict", "--suppress-tests-failed-exit-code" ]

I copied it from the test-fast script, or whatever that is, in the Pandas repo.

Failure on pandas commands

I've got this script:

import pandas

df = pandas.DataFrame({'col': ['foo bar']})
df['col'].map(lambda x: len(x.split(' ')))

When I run it with the Python interpreter, it works without problems.

But when I run it with PYTHON_RECORD_API_TO_MODULES="pandas" python -m record_api, I get the following error:

Traceback (most recent call last):
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/__main__.py", line 12, in <module>
    tracer.calls_from_modules[0], run_name="__main__", alter_sys=True
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 205, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mgarcia/quansight/dataframe_tools/kaggle/mutable/scripts/9996822.py", line 4, in <module>
    df['col'].map(lambda x: len(x.split(' ')))
  File "/home/mgarcia/quansight/dataframe_tools/kaggle/mutable/scripts/9996822.py", line 4, in <module>
    df['col'].map(lambda x: len(x.split(' ')))
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 564, in __call__
    Stack(self, frame)()
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 372, in __call__
    getattr(self, method_name)()
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 477, in op_CALL_METHOD
    self.process((function,), function, args)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 354, in process
    log_call(f"{filename}:{line}", fn, *args, **kwargs)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 262, in log_call
    bound = Bound.create(fn, args, kwargs)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 239, in create
    sig = signature(fn)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/pandas/core/generic.py", line 1799, in __hash__
    f"{repr(type(self).__name__)} objects are mutable, "
TypeError: 'Series' objects are mutable, thus they cannot be hashed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/__main__.py", line 15, in <module>
    raise Exception(f"Error running {tracer.calls_from_modules}")
Exception: Error running ['9996822']

Not sure what the exact pattern is, but I'd say I get an error like this in almost every script that uses pandas. Let me know if you need more information; I can find other examples, but I guess it should be obvious to you what's wrong.
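
From the traceback, the signature(fn) call in record_api ends up hashing the bound method's Series receiver, and pandas forbids hashing because Series is mutable. One possible guard (a sketch, not necessarily the right fix) is to treat un-introspectable callables as having no signature:

import inspect
from typing import Optional

def safe_signature(fn) -> Optional[inspect.Signature]:
    # Some callables cannot be introspected: many C functions raise ValueError,
    # and bound methods of unhashable objects (like the pandas Series here) can
    # raise TypeError. Fall back to recording the call without bound parameters.
    try:
        return inspect.signature(fn)
    except (ValueError, TypeError):
        return None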

Error: editable mode currently requires a setup.py based build

At the moment the README.md suggests installing the package via pip install -e .

I get the following error when I try to install in editable mode:

ERROR: File "setup.py" not found. Directory cannot be installed in editable mode: /Users/aktech/quansight/python-record-api
(A "pyproject.toml" file was found, but editable mode currently requires a setup.py based build.)

I believe the fix is to change that to:

pip install .

This worked for me.

Fix ufunc call

This doesn't look right; the real function signature doesn't include one of the overloads:

class ufunc:
    @overload
    def __call__(self, _0: numpy.ndarray, _1: numpy.ndarray, /):
        """
        usage.sample-usage: 1
        """
        ...

    @overload
    def __call__(self, _0: numpy.ndarray, /):
        """
        usage.sample-usage: 2
        """
        ...

    def __call__(self, _0: numpy.ndarray, /):
        """
        usage.sample-usage: 3
        """

__contains__ bug again

The current typings still have __contains__ on the wrong object... We should make sure we are tracing this properly now.

Improvements from internal call

  • See if we can get signatures for C functions which we currently fail to get
  • Record the position of named args as well
  • Separate function name, if it is a method or classmethod
  • Separate functions passed in (see if there are any)
  • Group by lines first, and record that to minimize length
  • Ignore calls from test files
  • Add Docker images
  • Add some of the top NumPy-using repos
