
python-record-api's People

Contributors

aktech, biogeek, github-actions[bot], mreid-moz, rgommers, saulshanabrook, steff456, ueshin, vfdev-5

python-record-api's Issues

We don't know where ufuncs are from!

During the tracing, it's helpful to know not only which methods on the ufunc class are called (__call__, reduce, etc.) but also which ufuncs themselves are used (add, multiply, etc.).

Currently, we are presenting the results of those, not as the product of those two features, but as their union. That is, we show stats for the reduce method on the ufunc class, but we don't show how many times reduce was called on add vs. multiply. That's one "issue", but the other, more pressing one is that we don't know where ufuncs come from!

All we know is their names. Up until now, I had been assuming they are all defined in the numpy module. However, scipy for example has many that are not.

We should somehow figure out how to understand where they were defined, or what module they were imported from.

I guess to do this, we would have to do some kind of traversal of imported modules to understand where they are defined? This could also be helpful for the related problem of recording which module exports a certain type, instead of which module it was defined in.
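
Something like the following could work as a starting point (a rough sketch; the helper name is made up, and the scipy example assumes scipy is installed):

import sys
import types
from typing import Optional

import numpy as np

def find_exporting_module(obj) -> Optional[str]:
    # Walk the already-imported modules; sorting means parent packages (e.g.
    # "numpy") are checked before their submodules (e.g. "numpy.core.umath").
    for name, module in sorted(sys.modules.items()):
        if not isinstance(module, types.ModuleType) or name.startswith("_"):
            continue
        for attr, value in list(vars(module).items()):
            if value is obj:
                return f"{name}.{attr}"
    return None

print(find_exporting_module(np.add))  # expected: "numpy.add"
# A scipy-defined ufunc would resolve outside numpy, e.g.:
#   from scipy import special; find_exporting_module(special.erf)  -> "scipy.special.erf"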

ufunc data seems to be missing

I was looking for def sin and other such functions in typing/numpy.py, and they're missing completely. It's unclear why.

The actual question I was trying to figure out is: how often is the dtype keyword used for unary ufuncs. I thought the data I needed would be here, but it looks like it's not.

Record return types and determine exposed modules for objects

One thing we should add is the ability to record the return type from a function.

This would help us understand the signatures better.

Also, it could help with another issue, which is that currently we output functions/classes from the modules they are defined in, not the module they are imported from.

We should instead try to output them where they are imported from. However, if multiple libraries import from different locations, we should choose one and have the others be aliases.

One way to do that would be to record the return types from the getattr calls, so if you did something like import numpy; numpy.arange we would know the return type of the getattr is the arange function.

We currently record those getattr calls, but don't have the return type.

If we did, we could infer that any getattr from a module, which returns a class/fn from a different module represents an import alias.
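
A sketch of that inference (the helper name and record layout below are illustrative, not the repo's actual schema): compare the module the returned object says it was defined in against the module the getattr was performed on.

def classify_getattr(module_name: str, attr: str, returned) -> dict:
    # Where the returned function/class was defined.
    defined_in = getattr(returned, "__module__", None)
    if defined_in and defined_in != module_name:
        # The object was imported into `module_name` from elsewhere, so
        # `module_name.attr` is effectively an import alias for it.
        return {"alias": f"{module_name}.{attr}", "defined_in": defined_in}
    return {"canonical": f"{module_name}.{attr}"}

import pandas

# pandas.read_csv is defined in a pandas.io.parsers submodule (see the jsonl
# sample further down), so this getattr would be classified as an alias.
print(classify_getattr("pandas", "read_csv", getattr(pandas, "read_csv")))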

Don't include long dask strings

We are currently recording some long dask strings like: qr-stack-getitem-b2984ed7b79fa1d55e4f99a321ae62ae-

We are currently trimming them. Instead, we should record them as an arbitrary string type rather than as a literal.
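
One possible heuristic, sketched under the assumption that these keys always embed a 32-character hex hash (the regex and length threshold are guesses, not the repo's actual rule):

import re

DASK_KEY = re.compile(r"-[0-9a-f]{32}\b")

def classify_string(s: str):
    # Record auto-generated dask keys (and any very long string) as a plain
    # `str` type; keep short, meaningful strings as literals.
    if len(s) > 50 or DASK_KEY.search(s):
        return "str"
    return ("Literal", s)

print(classify_string("qr-stack-getitem-b2984ed7b79fa1d55e4f99a321ae62ae-"))  # "str"
print(classify_string("mean"))                                                # ("Literal", "mean")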

Add more downstream libraries

Look at libraries.io for other usage, like: https://libraries.io/pypi/numpy/usage

NumPy (over ~5k stars):

  • django: Doesn't use pytest https://code.djangoproject.com/ticket/30415
  • matplotlib
  • scipy
  • keras: doesn't use pytest
  • networkx
  • xgboost
  • tensorflow: doesn't use pytest
  • spaCy
  • statsmodels
  • autograd
  • pytorch
  • mxnet
  • airflow
  • lightgbm
  • allennlp

Pandas (excluding above)

  • hypothesis
  • pint
  • h2o-3
  • tablib
  • pandas-datareader
  • arrow
  • google/grr
  • scikit-optimize
  • influxdb-python
  • fastai
  • pymc3
  • tsfresh
  • vaex
  • modin
  • feather
  • dagster
  • ray
  • zipline
  • prophet
  • plotly
  • tpot
  • onnxruntime

Add profiling for C calls

Currently, if a library calls another library through their C API we are unable to trace it. This includes calling anything in Cython. This is too bad, because a lot of calls to NumPy are from Cython or C libraries.

One idea on how to achieve this, from talking to @scopatz, was to use lldb's Python API. It now builds on conda-forge for macOS, so I can get started exploring this.
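
A very rough sketch of what that could look like (purely exploratory, not working tracing code; it assumes the lldb Python bindings from conda-forge are importable and that the NumPy C-API symbols are visible to the debugger):

import sys

import lldb  # provided by the lldb package, e.g. from conda-forge

debugger = lldb.SBDebugger.Create()
debugger.SetAsync(False)

# Run the target Python under the debugger and break on NumPy C-API entry points.
target = debugger.CreateTarget(sys.executable)
target.BreakpointCreateByRegex("PyArray_.*")

process = target.LaunchSimple(["-c", "import numpy; numpy.arange(3)"], None, ".")
while process.GetState() == lldb.eStateStopped:
    frame = process.GetSelectedThread().GetFrameAtIndex(0)
    print("C call:", frame.GetFunctionName())
    process.Continue()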

Version parts separately

It would be nice if we could re-run the API generation without having to re-run the test suite recording.

To do this, we could split the python package into three:

  • jsonl: common functionality for reading/writing jsonl files
  • record-api: recording the calls made from an API and grouping them by line
  • analyze-api: taking records of API calls and creating modules from them

Consider adding support for tracking subclass method invocations

An issue arose when analyzing GeoPandas and its consumption of Pandas APIs.

GeoPandas subclasses Pandas (see here), so, in principle, a subclassed method should correspond to an equivalent Pandas DataFrame method. However, this is not the case.

Based on the analysis, GeoPandas appears to consume only 3 Pandas APIs, but this is presumably not a fair representation, given that many GeoPandas DataFrame methods are really Pandas DataFrame methods.

Accordingly, it may be worth investigating whether we can track subclass method invocations.

In conversation with @saulshanabrook, he suggested updating the Tracer to look at method resolution order (MRO).
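
A sketch of the MRO idea (it assumes geopandas is installed; the helper is illustrative, not the Tracer's actual implementation):

import geopandas

def owning_base_class(obj, method_name: str, target_prefix: str = "pandas"):
    # Walk the method resolution order and return the pandas base class that
    # actually defines `method_name`, so the call can be attributed to pandas.
    for cls in type(obj).__mro__:
        if method_name in vars(cls) and cls.__module__.startswith(target_prefix):
            return cls
    return None

gdf = geopandas.GeoDataFrame({"a": [1, 2]})
# `merge` is inherited from pandas.DataFrame, so a call to gdf.merge(...) should
# count as pandas API usage even though it is invoked on a GeoDataFrame.
print(owning_base_class(gdf, "merge"))  # <class 'pandas.core.frame.DataFrame'>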

Cleanup of Python Stack Access

There were some comments here by @Caagr98 on how we can clean up our ability to access the Python stack that we could try to incorporate. I haven't looked into them yet.

Cannot pull images locally

The images are currently hosted on a private Digital Ocean registry. We should move them to a public registry instead, now that this repo is public. Maybe GitHub?

Record metadata for properties

Currently I am throwing away the usage metadata for properties when generating the API. We should preserve these and add them as comments.
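
For example, the generator could emit the counts it already has as comments in the stub, along these lines (a sketch; the helper name and record shape are made up):

def emit_property(name: str, annotation: str, usage: dict) -> str:
    # Render a property stub, keeping the per-source usage counts as comments
    # instead of dropping them.
    lines = ["@property", f"def {name}(self) -> {annotation}:"]
    for source, count in sorted(usage.items()):
        lines.append(f"    # usage.{source}: {count}")
    lines.append("    ...")
    return "\n".join(lines)

print(emit_property("shape", "Tuple[int, int]", {"pandas": 42, "dask": 7}))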

Create better view of results

Currently, to see the results in some human-readable way you can navigate the data/typing directory here. However, this is a bit cumbersome, especially now that some files have gotten so long that GitHub will no longer allow you to link to a line or view them.

We could possibly use Sphinx to generate a site based on the type definitions? I see that Sphinx does have support for type overloads: sphinx-doc/sphinx@fb2f777

BUG: record_api.line_counts raises TypeError on unexpected keyword arguments to dump()

I'm getting the following error when calling record_api.line_counts. I'm using version 1.1.1.

/tmp/record_api_results.jsonl contains the output generated by python -m record_api; here is a sample:

$ head -n 5 record_api_results.jsonl
{"location":"/api_stats/scripts/10002306.py:22","function":{"t":"builtin_function_or_method","v":"getattr"},"params":{"args":[{"t":"module","v":"pandas"},"read_csv"]}}
{"location":"/api_stats/scripts/10002306.py:22","function":{"t":"function","v":{"module":"pandas.io.parsers","name":"_make_parser_function.<locals>.parser_f"}},"bound_params":{"pos_or_kw":[["filepath_or_buffer","../input/train.csv"],["index_col",null]]}}
{"location":"/api_stats/scripts/10002306.py:23","function":{"t":"builtin_function_or_method","v":"getattr"},"params":{"args":[{"t":{"module":"pandas.core.frame","name":"DataFrame"}},"shape"]}}
{"location":"/api_stats/scripts/10002306.py:25","function":{"t":"method","v":{"self":{"t":{"module":"pandas.core.frame","name":"DataFrame"}},"name":"head"}},"bound_params":{}}
{"location":"/api_stats/scripts/10002306.py:27","function":{"t":"builtin_function_or_method","v":"getattr"},"params":{"args":[{"t":"module","v":"pandas"},"read_csv"]}}

Call to line_counts:

export PYTHON_RECORD_API_INPUT=/tmp/record_api_results.jsonl
export PYTHON_RECORD_API_OUTPUT=/tmp/record_api_results_line_counts.jsonl
python -m record_api.line_counts

Result:

Counting lines...
reading /tmp/record_api_results.jsonl: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 478985/478985 [00:02<00:00, 217632.24it/s]
writing:   0%|                                                                                                                                                                                                      | 0/13169 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/line_counts.py", line 42, in <module>
    __main__()
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/line_counts.py", line 38, in __main__
    write(row_)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/jsonl.py", line 45, in write_line
    buffer.write(orjson.dumps(o, **kwargs))
TypeError: dumps() got an unexpected keyword argument
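
For context, orjson.dumps() only accepts the default and option keyword arguments, so forwarding a keyword meant for another JSON library (sort_keys=, indent=, ...) raises exactly this TypeError. A minimal defensive sketch (illustrative, not necessarily the actual fix):

import orjson

ORJSON_KWARGS = {"default", "option"}

def safe_dumps(obj, **kwargs) -> bytes:
    # Drop any keyword arguments orjson does not understand before serializing.
    supported = {k: v for k, v in kwargs.items() if k in ORJSON_KWARGS}
    return orjson.dumps(obj, **supported)

print(safe_dumps({"a": 1}, sort_keys=True))  # the unsupported kwarg is simply dropped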

AST for Literals is not right

Currently it looks like:

Literal[("The kernel, DotProduct(sigma_0=1), is not returnin",)]

We should remove the inner parentheses there.
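
A sketch of the fix (an illustrative helper, not the generator's real code): unwrap one-element tuples before rendering the Literal annotation.

def format_literal(value) -> str:
    # A single recorded value sometimes arrives wrapped in a one-element tuple;
    # unwrap it so we render Literal['...'] rather than Literal[('...',)].
    if isinstance(value, tuple) and len(value) == 1:
        value = value[0]
    return f"Literal[{value!r}]"

print(format_literal(("The kernel, DotProduct(sigma_0=1), is not returnin",)))
# Literal['The kernel, DotProduct(sigma_0=1), is not returnin']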

Don't serialize empty fields

We should ignore all the empty fields in the JSON and not serialize them, to save space and make the output easier to browse.
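
A minimal sketch of that pruning, assuming we drop None and empty containers recursively before writing each record (the helper name is made up):

import orjson

def prune_empty(obj):
    # Recursively remove keys whose values are None or empty containers so the
    # serialized record only carries fields that are actually populated.
    if isinstance(obj, dict):
        cleaned = {k: prune_empty(v) for k, v in obj.items()}
        return {k: v for k, v in cleaned.items() if v not in (None, {}, [], "")}
    if isinstance(obj, list):
        return [prune_empty(v) for v in obj]
    return obj

record = {"location": "a.py:1", "params": {}, "bound_params": None, "function": {"t": "function"}}
print(orjson.dumps(prune_empty(record)))  # b'{"location":"a.py:1","function":{"t":"function"}}'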

Fix Koalas Runs

#86 fixed the GitHub Actions workflow, which caused the Koalas image (added in #82) to be built and run.

It built properly, but failed to run. Here is the copied output:

Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/usr/local/lib/python3.8/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-2e6e8f27-d3fa-4533-bc4e-38938ded36f3;1.0
	confs: [default]
	found io.delta#delta-core_2.12;0.7.0 in central
	found org.antlr#antlr4;4.7 in central
	found org.antlr#antlr4-runtime;4.7 in central
	found org.antlr#antlr-runtime;3.5.2 in central
	found org.antlr#ST4;4.0.8 in central
	found org.abego.treelayout#org.abego.treelayout.core;1.0.3 in central
	found org.glassfish#javax.json;1.0.4 in central
	found com.ibm.icu#icu4j;58.2 in central
downloading https://repo1.maven.org/maven2/io/delta/delta-core_2.12/0.7.0/delta-core_2.12-0.7.0.jar ...
	[SUCCESSFUL ] io.delta#delta-core_2.12;0.7.0!delta-core_2.12.jar (96ms)
downloading https://repo1.maven.org/maven2/org/antlr/antlr4/4.7/antlr4-4.7.jar ...
	[SUCCESSFUL ] org.antlr#antlr4;4.7!antlr4.jar (28ms)
downloading https://repo1.maven.org/maven2/org/antlr/antlr4-runtime/4.7/antlr4-runtime-4.7.jar ...
	[SUCCESSFUL ] org.antlr#antlr4-runtime;4.7!antlr4-runtime.jar (6ms)
downloading https://repo1.maven.org/maven2/org/antlr/antlr-runtime/3.5.2/antlr-runtime-3.5.2.jar ...
	[SUCCESSFUL ] org.antlr#antlr-runtime;3.5.2!antlr-runtime.jar (5ms)
downloading https://repo1.maven.org/maven2/org/antlr/ST4/4.0.8/ST4-4.0.8.jar ...
	[SUCCESSFUL ] org.antlr#ST4;4.0.8!ST4.jar (7ms)
downloading https://repo1.maven.org/maven2/org/abego/treelayout/org.abego.treelayout.core/1.0.3/org.abego.treelayout.core-1.0.3.jar ...
	[SUCCESSFUL ] org.abego.treelayout#org.abego.treelayout.core;1.0.3!org.abego.treelayout.core.jar(bundle) (4ms)
downloading https://repo1.maven.org/maven2/org/glassfish/javax.json/1.0.4/javax.json-1.0.4.jar ...
	[SUCCESSFUL ] org.glassfish#javax.json;1.0.4!javax.json.jar(bundle) (3ms)
downloading https://repo1.maven.org/maven2/com/ibm/icu/icu4j/58.2/icu4j-58.2.jar ...
	[SUCCESSFUL ] com.ibm.icu#icu4j;58.2!icu4j.jar (118ms)
:: resolution report :: resolve 1696ms :: artifacts dl 278ms
	:: modules in use:
	com.ibm.icu#icu4j;58.2 from central in [default]
	io.delta#delta-core_2.12;0.7.0 from central in [default]
	org.abego.treelayout#org.abego.treelayout.core;1.0.3 from central in [default]
	org.antlr#ST4;4.0.8 from central in [default]
	org.antlr#antlr-runtime;3.5.2 from central in [default]
	org.antlr#antlr4;4.7 from central in [default]
	org.antlr#antlr4-runtime;4.7 from central in [default]
	org.glassfish#javax.json;1.0.4 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   8   |   8   |   8   |   0   ||   8   |   8   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-2e6e8f27-d3fa-4533-bc4e-38938ded36f3
	confs: [default]
	8 artifacts copied, 0 already retrieved (15071kB/48ms)
20/10/21 12:58:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/10/21 12:58:31 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://[email protected]:37093
	at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
	at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:140)
	at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
	at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
	at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:34)
	at org.apache.spark.executor.Executor.<init>(Executor.scala:206)
	at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
	at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:201)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
20/10/21 12:58:31 ERROR Utils: Uncaught exception in thread Thread-5
java.lang.NullPointerException
	at org.apache.spark.scheduler.local.LocalSchedulerBackend.org$apache$spark$scheduler$local$LocalSchedulerBackend$$stop(LocalSchedulerBackend.scala:168)
	at org.apache.spark.scheduler.local.LocalSchedulerBackend.stop(LocalSchedulerBackend.scala:144)
	at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:734)
	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2171)
	at org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:1973)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1357)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1973)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:641)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
20/10/21 12:58:31 WARN MetricsSystem: Stopping a MetricsSystem that is not running
ImportError while loading conftest '/usr/src/app/databricks/conftest.py'.
databricks/conftest.py:41: in <module>
    session = utils.default_session(shared_conf)
databricks/koalas/utils.py:384: in default_session
    session = builder.getOrCreate()
/usr/local/lib/python3.8/site-packages/pyspark/sql/session.py:186: in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
/usr/local/lib/python3.8/site-packages/pyspark/context.py:376: in getOrCreate
    SparkContext(conf=conf or SparkConf())
/usr/local/lib/python3.8/site-packages/pyspark/context.py:135: in __init__
    self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
/usr/local/lib/python3.8/site-packages/pyspark/context.py:198: in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
/usr/local/lib/python3.8/site-packages/pyspark/context.py:315: in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
/usr/local/lib/python3.8/site-packages/py4j/java_gateway.py:1568: in __call__
    return_value = get_return_value(
/usr/local/lib/python3.8/site-packages/py4j/protocol.py:326: in get_return_value
    raise Py4JJavaError(
E   py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
E   : org.apache.spark.SparkException: Invalid Spark URL: spark://[email protected]:37093
E   	at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
E   	at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:140)
E   	at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
E   	at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
E   	at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:34)
E   	at org.apache.spark.executor.Executor.<init>(Executor.scala:206)
E   	at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
E   	at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
E   	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:201)
E   	at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
E   	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
E   	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
E   	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
E   	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
E   	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
E   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
E   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E   	at py4j.Gateway.invoke(Gateway.java:238)
E   	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
E   	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
E   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
E   	at java.lang.Thread.run(Thread.java:748)

@ueshin were you able to test it locally? Would you be able to help debug this failure? I just added you as a maintainer of this repo as well.

Put types in parent modules

Currently, when we record a type, we list the module where it is defined as its module. However, many libraries define types in some sub-module but only expose them in a parent module.

So instead, when recording a type, we should look in all parent modules to see if they export the same type. If so, we should use that parent module as the module name instead of the module where the type is defined.
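
A sketch of that lookup, assuming we prefer the shortest parent module that re-exports the type under the same name (the helper is illustrative):

import importlib
import sys

def public_module(tp: type) -> str:
    # Given a type defined in e.g. pandas.core.frame, walk its parent packages
    # ("pandas", "pandas.core", ...) and return the first one that re-exports it.
    defined_in = tp.__module__
    parts = defined_in.split(".")
    for i in range(1, len(parts)):
        candidate = ".".join(parts[:i])
        module = sys.modules.get(candidate) or importlib.import_module(candidate)
        if getattr(module, tp.__name__, None) is tp:
            return candidate
    return defined_in

import pandas

print(public_module(pandas.DataFrame))  # "pandas", not "pandas.core.frame"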

Pandas Ran out of memory again!

So the pandas test suite ran out of memory again in Kubernetes. It used up ~13 GB and then was killed, because the pods only have that much available.

I am a bit hesitant to just raise the pod memory limit again... If anyone knows whether this is a reasonable amount of memory for Pandas to use when testing (cc @datapythonista), that would be helpful! It's also possible that the tracing has some sort of memory leak which is blowing things up for pandas, although none of the other test suites seem to have the same problem.

Maybe I can run the Pandas test suite with some flags to skip high-memory tests? These are my current ones:

CMD [ "pytest", "pandas", "--skip-slow", "--skip-network", "--skip-db", "-m", "not single", "-r", "sxX", "--strict", "--suppress-tests-failed-exit-code" ]

I copied it from the test-fast script, or whatever that is, in the Pandas repo.

Failure on pandas commands

I've got this script:

import pandas

df = pandas.DataFrame({'col': ['foo bar']})
df['col'].map(lambda x: len(x.split(' ')))

When I run it with the Python interpreter, it works without problems.

But when I run it with PYTHON_RECORD_API_TO_MODULES="pandas" python -m record_api, I get the following error:

Traceback (most recent call last):
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/__main__.py", line 12, in <module>
    tracer.calls_from_modules[0], run_name="__main__", alter_sys=True
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 205, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mgarcia/quansight/dataframe_tools/kaggle/mutable/scripts/9996822.py", line 4, in <module>
    df['col'].map(lambda x: len(x.split(' ')))
  File "/home/mgarcia/quansight/dataframe_tools/kaggle/mutable/scripts/9996822.py", line 4, in <module>
    df['col'].map(lambda x: len(x.split(' ')))
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 564, in __call__
    Stack(self, frame)()
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 372, in __call__
    getattr(self, method_name)()
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 477, in op_CALL_METHOD
    self.process((function,), function, args)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 354, in process
    log_call(f"{filename}:{line}", fn, *args, **kwargs)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 262, in log_call
    bound = Bound.create(fn, args, kwargs)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 239, in create
    sig = signature(fn)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/pandas/core/generic.py", line 1799, in __hash__
    f"{repr(type(self).__name__)} objects are mutable, "
TypeError: 'Series' objects are mutable, thus they cannot be hashed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/__main__.py", line 15, in <module>
    raise Exception(f"Error running {tracer.calls_from_modules}")
Exception: Error running ['9996822']

Not sure what the exact pattern is, but I'd say I get an error like this in almost every script that uses pandas. Let me know if you need more information; I can find other examples, but I guess it should be obvious to you what's wrong.
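
From the traceback, the signature(fn) call in record_api ends up hashing the bound method's Series receiver, and pandas forbids hashing because Series is mutable. One possible guard (a sketch, not necessarily the right fix) is to treat un-introspectable callables as having no signature:

import inspect
from typing import Optional

def safe_signature(fn) -> Optional[inspect.Signature]:
    # Some callables cannot be introspected: many C functions raise ValueError,
    # and bound methods of unhashable objects (like the pandas Series here) can
    # raise TypeError. Fall back to recording the call without bound parameters.
    try:
        return inspect.signature(fn)
    except (ValueError, TypeError):
        return None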

Error: editable mode currently requires a setup.py based build

At the moment the README.md suggests installing the package via pip install -e .

I get the following error when I try to install in editable mode:

ERROR: File "setup.py" not found. Directory cannot be installed in editable mode: /Users/aktech/quansight/python-record-api
(A "pyproject.toml" file was found, but editable mode currently requires a setup.py based build.)

I believe the fix is to change that to:

pip install .

This worked for me.

Fix ufunc call

This doesn't look right; the real function signature doesn't include one of the overloads:

class ufunc:
    @overload
    def __call__(self, _0: numpy.ndarray, _1: numpy.ndarray, /):
        """
        usage.sample-usage: 1
        """
        ...

    @overload
    def __call__(self, _0: numpy.ndarray, /):
        """
        usage.sample-usage: 2
        """
        ...

    def __call__(self, _0: numpy.ndarray, /):
        """
        usage.sample-usage: 3
        """

__contains__ bug again

The current typings still have __contains__ on the wrong object... We should make sure we are tracing this properly now.

Improvements from internal call

  • See if we can get signatures for C functions which we currently fail to get
  • Record the position of named args as well
  • Separate function name, if it is a method or classmethod
  • Separate functions passed in (see if there are any)
  • Group by lines first, and record that to minimize length
  • Ignore calls from test files
  • Add Docker images
  • Add some of the top NumPy-using repos
