wearepal / data-science-types
Mypy stubs, i.e., type information, for numpy, pandas and matplotlib
License: Apache License 2.0
Hi,
I'm using pyright in combination with the stubs provided by this repo. I'm getting a problem when I check the following code:
import pandas as pd
d = {'a': [1, 2]}
df = pd.DataFrame(data=d)
df_merged: pd.DataFrame = df.merge(right=df)
By running pyright against this code I get the following errors:
3:19 - error: Argument of type "Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[int]]" cannot be assigned to parameter "data" of type "Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[TypeVar('_T_co')] | DataFrame | Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[TypeVar('_T_co')]] | None" in function "__init__"
Type "Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[int]]" cannot be assigned to type "Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[TypeVar('_T_co')] | DataFrame | Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[TypeVar('_T_co')]] | None"
"Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[int]]" is incompatible with "Series[TypeVar('_DType')]"
"Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[int]]" is incompatible with "Index[TypeVar('_T')]"
"Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[int]]" is incompatible with "ndarray[TypeVar('_DType')]"
"Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[int]]" is incompatible with "Sequence[TypeVar('_T_co')]"
"Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[int]]" is incompatible with "DataFrame"
Cannot assign to "None"
TypeVar "_VT" is invariant
... (reportGeneralTypeIssues)
4:13 - error: "df.merge(right=df)" has type "Series[Unknown]" and is not callable (reportGeneralTypeIssues)
4:1 - error: Type of "df_merged" is unknown (reportUnknownVariableType)
Ignoring error 3:19 (which is not entirely clear to me either), I would like to focus on error 4:13. Why is the function merge said to have type Series[Unknown]?
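A likely cause (my reading of the situation, not verified against the stubs' source) is a `__getattr__` fallback on `DataFrame` that returns `Series`, used to model column access. Any method missing from the stub then resolves through that fallback. A minimal sketch of the mechanism:

```python
class Series:
    """Stand-in for the stubs' Series class."""

class DataFrame:
    # Pandas stubs often model column access (df.some_column) with this fallback:
    def __getattr__(self, name: str) -> Series:
        return Series()

df = DataFrame()
col = df.some_column  # resolved via __getattr__, typed as Series
# If `merge` is absent from the stub, a checker resolves df.merge the same
# way, sees a Series, and reports "Series[Unknown] ... is not callable".
```

If that is what is happening, adding a proper `merge` signature to the stub would shadow the fallback and fix the error.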
To Reproduce
install data-science-types stubs:
pip install data-science-types
install pyright:
sudo npm install -g pyright
run the file with the code above:
pyright test.py
I've already posted the issue on pyright and was advised to submit an issue here, because pyright is behaving in accordance with the information provided by the stubs.
I have these two calls:
master_df[DF_VAR_COLUMN] = np.empty(shape=master_df.shape[0], dtype=str)
master_df[DF_VAR_IDX_COLUMN] = np.empty(shape=master_df.shape[0], dtype=int)
With the numpy stubs in place, mypy does not like this, claiming that:
eval.py:809: error: Value of type variable "_DType" of "empty" cannot be "str"
eval.py:810: error: Value of type variable "_DType" of "empty" cannot be "int"
But these are legitimate values, AFAICT. I'll try to see about a PR.
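In the meantime, a workaround that may satisfy the stubs (an assumption on my part: the `_DType` TypeVar presumably ranges over numpy scalar types only) is to spell the dtypes with numpy's scalar types, which are equivalent at runtime:

```python
import numpy as np

n = 4
# dtype=str and dtype=int work at runtime, but if the stubs constrain _DType
# to numpy scalar types, these spellings may type-check where the builtins don't:
col_str = np.empty(shape=n, dtype=np.str_)   # equivalent to dtype=str
col_int = np.empty(shape=n, dtype=np.int64)  # equivalent to dtype=int on most platforms
```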
pandas.to_datetime is not in the type stubs
Also pandas.Timestamp is missing
The concat method for joining multiple DataFrames appears to be missing several arguments, such as join, keys, levels, and more.
Compare to the Pandas docs:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
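For reference, a hedged sketch of what a fuller stub entry could look like; the parameter names come from the pandas docs, while the types here are my assumptions, not a finished design:

```python
from typing import Mapping, Optional, Sequence, Union

class DataFrame:
    """Stand-in for the stubs' DataFrame class."""

def concat(
    objs: Union[Sequence[DataFrame], Mapping[str, DataFrame]],
    axis: int = 0,
    join: str = "outer",
    ignore_index: bool = False,
    keys: Optional[Sequence[object]] = None,
    levels: Optional[Sequence[Sequence[object]]] = None,
    names: Optional[Sequence[str]] = None,
    sort: bool = False,
) -> DataFrame: ...
```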
Script used:
import numpy as np
arr: np.ndarray[np.float32] = np.array([0, 1, np.inf], dtype=np.float32)
print(np.isfinite(arr))
tests/pandas_test.py line 92 fails on Pandas 1.2
Extracting the relevant code
import pandas as pd
df: pd.DataFrame = pd.DataFrame(
    [[1.0, 2.0], [4.0, 5.0], [7.0, 8.0]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)
s: "pd.Series[float]" = df["shield"].copy()
df.iloc[0] = s
Results in
ValueError: could not broadcast input array from shape (3) into shape (2)
This runs fine on Pandas 1.1.5
I think I've found a bug in the pyplot.pyi generation scripts.
I cloned this repository's master branch and set up a virtualenv from Python 3.7.3 (virtualenv -p python3 .venv). Running pip install -e . inside the virtual environment generated a pyplot.pyi with missing commas at line 229 and below.
222 def plot(
223 x: Data,
224 y: Data,
225 fmt: Optional[str] = ...,
226 *,
227 scalex: bool = ...,
228 scaley: bool = ...,
229 agg_filter: Callable[[_NumericArray, int], _NumericArray] = ... # <-- comma missing here and at end of lines below
230 alpha: Optional[float] = ...
231 animated: Optional[bool] = ...
232 antialiased: Optional[bool] = ...
233 aa: Optional[bool] = ..., #alias of antialiased
234 clip_box: Optional[Bbox] = ...
235 clip_on: Optional[bool] = ...
236 clip_path: Optional[Callable[[Path, Transform], None]] = ...
237 color: Optional[str] = ...
238 c: Optional[str] = ...
239 contains: Optional[Callable[[Artist, MouseEvent], Tuple[bool, dict]]] = ...
240 dash_capstyle: Optional[Literal['butt', 'round', 'projecting']] = ...
241 dash_jointstyle: Optional[Literal['miter', 'round', 'bevel']] = ...
242 dashes: Optional[[Sequence[float], Tuple[None, None]]] = ...
243 drawstyle: Literal['default', 'steps', 'steps-pre', 'steps-mid', 'steps-post'] = ...
244 ds: Literal['default', 'steps', 'steps-pre', 'steps-mid', 'steps-post'] = ...
245 figure: Optional[Figure] = ...
246 fillstyle: Literal['full', 'left', 'right', 'bottom', 'top', 'none'] = ...
There are many types that are unions of List and np.ndarray and Series. These should probably all be transformed to use Sequence instead (which would also cover legitimate uses of Tuple).
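A small illustration of why `Sequence` is the better choice (generic code, not the stubs' actual definitions): a parameter typed `Sequence` accepts lists and tuples alike, whereas `List` rejects tuples:

```python
from typing import List, Sequence

def mean_seq(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)

def mean_list(xs: List[float]) -> float:
    return sum(xs) / len(xs)

mean_seq([1.0, 2.0, 3.0])   # OK
mean_seq((1.0, 2.0, 3.0))   # OK: tuples are Sequences
# mean_list((1.0, 2.0, 3.0))  # mypy error: expected List[float], got a tuple
```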
API reference: pandas.isna and pandas.Index.isna
There is something about numpy dtypes and stubs that I don't understand, and it is keeping me from fixing some stubs. I hope someone can correct me.
After extending the type stubs for DataFrame's __init__ and astype as follows:
class DataFrame:
    def __init__(
        self,
        data: Optional[Union[_ListLike, DataFrame, Dict[_str, _np.ndarray]]] = ...,
        columns: Optional[_ListLike] = ...,
        index: Optional[_ListLike] = ...,
        dtype: Optional[_np.dtype] = ...,
    ): ...
    ...
    def astype(self, dtype: Union[_str, Dict[str, _np.dtype]], copy: bool = True, errors: _ErrorType = 'raise') -> DataFrame: ...
I have the following which does not type-check properly:
query_df = pd.DataFrame(
    columns=[
        TEMPERATURE_COL,
        OD_COL,
        "od_log",
        "media",
        "gate",
        "input",
        "mean_log_gfp_live",
        "mean_log_gfp_",
    ],
    dtype=np.float64,
)
with the error eval.py:210: error: Argument "dtype" to "DataFrame" has incompatible type "Type[float64]"; expected "Optional[dtype]"
and this also:
query_df = query_df.astype(dtype={"input": np.str_, "gate": np.str_}, copy=False)
mypy3: Dict entry 0 has incompatible type "str": "Type[str_]"; expected "str": "dtype"
and mypy3: Dict entry 1 has incompatible type "str": "Type[str_]"; expected "str": "dtype"
I looked at numpy_stubs/__init__.pyi, and it looks like np.float64 and np.str_ are both defined there:
class floating(number, float): ...
class float64(floating): ...
...
class str_(dtype, str): ...
but it seems like mypy is seeing the actual values from numpy instead of the values from the numpy stubs.
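I think what is happening (my reading, not verified against mypy internals) is not that mypy sees the wrong values, but that `np.float64` and `np.str_` are scalar *types*, i.e. statically `Type[float64]`, while the stub asks for a `dtype` *instance*. numpy coerces one to the other at runtime, which the annotation doesn't reflect:

```python
import numpy as np

# np.float64 is a scalar type (a class), not a dtype instance:
assert isinstance(np.float64, type)
assert not isinstance(np.float64, np.dtype)

# numpy coerces the class to a dtype at runtime, which is why
# dtype=np.float64 works even though its static type is Type[float64]:
assert np.dtype(np.float64) == np.float64
```

So widening the stub's `dtype` parameters to also accept the scalar types (something like `Type[_np.generic]`) may be the fix, if I'm reading this right.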
This PR: wearepal/EthicML#236 and this PR: wearepal/EthicML#246 added the use of a lot of functions to EthicML that aren't in data-science-types yet.
Running mypy with data-science-types on the following
import pandas as pd
df = pd.DataFrame({'a': [1]})
df.to_pickle('output.pkl')
Produces an error:
error: "Series[Any]" not callable
I would expect it to pass, since DataFrame.to_pickle exists.
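For what it's worth, a minimal stub entry along these lines might fix it; the parameter set below is my assumption based on the pandas docs, not the repo's code:

```python
# Hedged sketch of a to_pickle stub entry for DataFrame:
class DataFrame:
    def to_pickle(self, path: str, compression: str = ..., protocol: int = ...) -> None: ...
```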
I am new to type-hinting, but I thought I'd give it a go :)
I noticed that quite a few types are missing:
https://numpy.org/doc/stable/user/basics.types.html
https://github.com/predictive-analytics-lab/data-science-types/blob/master/numpy-stubs/__init__.pyi#L43
Is there a reason for these values missing (except that it is a lot of work to migrate all at once)? I might be able to free up some time to add these in a PR if you are interested.
matplotlib.pyplot.close has None as a default argument, however, the stub does not specify None.
https://matplotlib.org/api/_as_gen/matplotlib.pyplot.close.html#matplotlib.pyplot.close
Shouldn't line 217 in https://github.com/predictive-analytics-lab/data-science-types/blob/master/matplotlib-stubs/pyplot.pyi.in
be changed from the first to the second?
def close(fig: Union[Figure, Literal["all"]]) -> None: ...
def close(fig: Union[Figure, Literal["all"], None]) -> None: ...
First of all, thanks so much for doing this typing library. I use nptyping, and we were trying out your data-science-types. Here are the types that are missing. I'm no .pyi expert, but happy to help, so here is what is not working in our project but should all exist, looking at the .pyi files:
I can see why the errors occur: index is typed as just an array, but it actually has the name property.
Pandas.DataFrame.index.name
Pandas.DataFrame.columns.name
Panda.read_hdf, read_html, read_excel, to_hdf
Pandas dataframe can't accept assignment
Pandas.columns can't be assigned
Pandas.dropna missing
Pandas.to_replace missing
Pandas.replace
Pandas.startswith
Pandas.string.startswith
Pandas dataframe cannot be used as a left operand
Numpy.ones_like
Numpy.einsum
numpy.array doesn't handle the pass of a dataframe as an input (which works btw)
numpy.any not available
Would you consider removing your numpy types and letting nptyping handle that? There does not seem to be a good way to make conflicting .pyi files interoperate.
Hi,
in my opinion it's universally considered a best practice (and pythonic) to save memory by using a generator rather than a list, e.g. when concatenating dataframes. Sadly, for now that raises an error in mypy, since concat only accepts Union[Sequence[DataFrame], Mapping[str, DataFrame]]. Notice that generators are not considered sequences.
I would very much appreciate it if somebody could add iterables to the accepted input types of pandas.concat.
Thank you for your time.
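To illustrate the distinction (a generic sketch, not pandas code): generators satisfy `Iterable` but not `Sequence`, so widening `concat`'s first parameter to `Iterable[DataFrame]` would accept them:

```python
from typing import Iterable, Sequence

def takes_sequence(objs: Sequence[int]) -> int:
    return sum(objs)

def takes_iterable(objs: Iterable[int]) -> int:
    return sum(objs)

total = takes_iterable(i for i in range(4))  # OK: a generator is an Iterable
# takes_sequence(i for i in range(4))  # mypy error: a generator is not a Sequence
```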
Looks like the stubs only allow inplace=True.
@overload
def reset_index(self, drop: bool = ...) -> DataFrame: ...
@overload
def reset_index(self, inplace: Literal[True], drop: bool = ...) -> None: ...
inplace=False is indeed the default behavior, and there is no need to specify it, but it should still be allowed.
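A hedged sketch of overloads that would also accept an explicit inplace=False (making inplace keyword-only here is my choice for the sketch, not necessarily what the stubs should do):

```python
from typing import Literal, Optional, overload

class DataFrame:
    @overload
    def reset_index(self, drop: bool = ..., *, inplace: Literal[False] = ...) -> "DataFrame": ...
    @overload
    def reset_index(self, drop: bool = ..., *, inplace: Literal[True]) -> None: ...
    def reset_index(self, drop: bool = False, *, inplace: bool = False) -> Optional["DataFrame"]:
        # Toy implementation so the sketch runs; the real stub bodies are `...`.
        return None if inplace else self
```

With `Literal[False]` as the default in the first overload, both `reset_index()` and `reset_index(inplace=False)` type-check and return DataFrame, while `reset_index(inplace=True)` still returns None.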
Hi! Cool project, and great to finally be able to not just ignore missing imports in mypy.ini!
I've just been getting started today trying out the library with some pre-existing code that heavily uses pandas. I know it's a work-in-progress so was not too surprised to get a few errors.
Some of these were just missing bits of functionality that would, I think, be straightforward additions: things like pd.date_range, pd.to_datetime, pd.tseries and the like. I'm hoping to find some time to contribute, since I see you encourage it. :)
I was a bit less sure about the results I got for assignments of the form df.loc[:, "column_name"] = ..., for which mypy threw this:
error: Invalid index type "Tuple[slice, str]" for "_LocIndexerFrame"; expected type "Tuple[Union[str, str_], Union[str, str_]]"
This was solvable with a minor refactor, but I had to just type: ignore this one:
error: No overload variant of "__getitem__" of "_LocIndexerFrame" matches argument type "slice"
which appeared when doing df.loc["2018-10":, "column_name"] or df.loc[datetime_object:, "column_name"] (using the date-time index slicing functionality).
I wondered whether supporting slices is unfeasible, or something you'd hope to include? .loc's behaviour is pretty complex, so I appreciate that type-annotating it fully would be painful!
A related issue was this mypy error:
error: Invalid index type "Tuple[Series[bool], str]" for "_LocIndexerFrame"; expected type "Tuple[Union[Union[Series[bool], ndarray[bool_], List[bool]], List[str]], Union[Union[Series[bool], ndarray[bool_], List[bool]], List[str]]]"
This one comes about from df.loc[boolean_series, "column_name"], and my examination of the expected type showed I could refactor to df.loc[boolean_series, ["column_name"]] to get the same functionality. It looks to me as though, as implemented, the type annotations allow you to pass either two collections or two labels to .loc, but not a mixture?
Just want to check what's in-scope for this project before slinging PRs around!
The pandas stubs are missing pandas.to_numeric.
I would like to do a PR, but I'm not really sure where to start or how to write proper type hints for this, as I've only just started learning about Python typing in the last few days. Any help would be much appreciated.
Top level function documented here: https://numpy.org/doc/stable/reference/generated/numpy.savetxt.html
This is here as a reminder to look at this after the NeurIPS deadline.
There is currently no type information for pandas.DataFrame.transpose.
data-science-types==0.2.12
mypy==0.770
typing==3.6.4
Code to reproduce:
import numpy as np
from typing import List, NamedTuple
class Ok(NamedTuple):
    x: List[int]

class Problem(NamedTuple):
    x: np.ndarray[np.int64]  # TypeError: 'type' object is not subscriptable
It seems there's a missing stub for the pandas module (or at least I can't find it). In any case, this code:
pd.to_datetime(...)
Throws
error: Module has no attribute "to_datetime"
There is currently type information for read_csv, read_feather, and read_sql, but no information for read_json.
I forked the repo and ran the tests with ./check_all.sh; this resulted in 152 errors found in 4 files. How do I get started?
I think there is a bug in the ndarray type hint.
In the case of an np.array that has one row with only integers (type int or int64) and one row with at least one float (type float), an error is produced: the rows have different inferred types, so a conflict like "np.ndarray[float] != np.ndarray[int]" occurs.
# bug.py
import numpy as np
arr = np.array([[4.2, 2, 3.5], [12, 3, 6]])
and run mypy:
$ mypy bug.py
bug.py:3: error: Argument 1 to "array" has incompatible type "List[object]"; expected "Union[List[bool], List[List[bool]], List[List[List[bool]]], List[List[List[List[bool]]]]]"
1.19.2 / 3.8.5 / 0.2.20
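A workaround that seems to satisfy the stubs (my assumption: writing both rows as literal floats lets the checker infer List[List[float]] instead of joining the two differently-typed rows to List[object]):

```python
import numpy as np

# Both rows are lists of floats, so mypy infers List[List[float]] rather
# than List[object]; the runtime result is identical to the mixed version:
arr = np.array([[4.2, 2.0, 3.5], [12.0, 3.0, 6.0]])
```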
Should there be a type that captures everything that can be given as a dtype spec? I believe this is Union[_str, Type[_np.dtype]], but I could be wrong. If we can identify a reasonable type for this, it might make a lot of typing smoother and more consistent.
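At runtime, numpy's `np.dtype` constructor accepts strings, scalar types, and dtype instances alike, so one candidate alias (an assumption of mine, not a settled design) would be:

```python
from typing import Type, Union
import numpy as np

# Candidate alias for "anything accepted as a dtype spec":
DtypeSpec = Union[str, np.dtype, Type[np.generic]]

# All three spellings name the same dtype at runtime:
a = np.dtype("float64")
b = np.dtype(np.float64)
c = np.dtype(a)
```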
Are you writing these by hand, or in some other way?
Script used:
import pandas as pd
df: pd.DataFrame = pd.DataFrame([[1, 2], [1, 4]], columns=["a", "b"], index=["c", "d"])
df.drop_duplicates(subset=["a"], inplace=True)
print(df)
Numpy has finally merged the stubs from numpy-stubs into the main numpy project.
numpy/numpy-stubs#88
numpy/numpy#16515
Will the numpy stubs in this project be removed when numpy 1.20.0 is released?
Example :
def upper_cased_header(df: pd.DataFrame) -> pd.DataFrame:
    df.columns = [header.upper() for header in df.columns]
    return df
mypy will return
error: Unexpected keyword argument "inplace" for "fillna" of "Series"
Script used:
import pandas as pd
x: pd.DataFrame = pd.read_hdf("your_hdf_here.hdf")
I'm working on a PR for this.
pandas.Index takes a few optional parameters in the init after data, such as dtype, copy, name and tupleize_cols.
The current type stubs only have data:
https://github.com/predictive-analytics-lab/data-science-types/blob/3990a8f876a6e36afa53cc044b77d0448a5c468c/pandas-stubs/core/indexes/base.pyi#L19
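A hedged sketch of the fuller signature; the parameter names come from the pandas docs, while the types here are my guesses:

```python
from typing import Optional, Sequence

class Index:
    def __init__(
        self,
        data: Optional[Sequence[object]] = None,
        dtype: Optional[object] = None,
        copy: bool = False,
        name: Optional[object] = None,
        tupleize_cols: bool = True,
    ) -> None: ...
```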
In VSCode with pyright, I'm trying to call savefig with a BytesIO object where the fname would be. I'm ending up with:
Argument of type "BytesIO" cannot be assigned to parameter "fname" of type "str | Path" in function "savefig"
Type "BytesIO" cannot be assigned to type "str | Path"
"BytesIO" is incompatible with "str"
"BytesIO" is incompatible with "Path" (reportGeneralTypeIssues)
Since BytesIO (and also, I think, files opened in wb mode? I'm not sure about that part) is definitely a valid target, I'd like to PR the type in. Would making the fname type Union[str, Path, BytesIO] be sufficient? Or would you prefer more types that could technically fit into the fname slot?
I should add: this is great work. A co-worker of mine and I were looking around for numpy typings, and this repo offers such an improvement over the default typing.
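One option (an assumption on my part, not the project's decided design) that covers both BytesIO and files opened in binary mode without enumerating concrete classes is `IO[bytes]`:

```python
from io import BytesIO
from pathlib import Path
from typing import IO, Union

# IO[bytes] covers BytesIO as well as files opened in "wb" mode:
FnameType = Union[str, Path, IO[bytes]]

def savefig(fname: FnameType) -> None:
    """Toy stand-in with the proposed parameter type."""

savefig("plot.png")
savefig(Path("plot.png"))
savefig(BytesIO())
```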
If you build a DataFrame with header=None, the axes are [RangeIndex(start=0, stop=4, step=1), RangeIndex(start=0, stop=9, step=1)], so you can't access elements with df.at[123, 'xyz']; you need to use df.at[123, 456] instead.
According to https://github.com/predictive-analytics-lab/data-science-types/blob/7dab8238df9e93d00be6d683d8efabbdf95fc958/pandas-stubs/core/indexing.pyi#L88, however, _AtIndexerFrame currently only allows the second index to be a _StrLike.
Ideally this would be a runtime check, but as I think that is not possible with mypy, there should at least not be a false positive.
Since PEP 589 in Python 3.8, we can create a class that inherits from TypedDict and specify which type every key in a dictionary takes, but such classes are not included in the DataFrame data parameter's type, so mypy raises an error.
I fixed this locally by adding an alias:
_TypedDictLike = TypedDict
and adding it to the DataFrame data parameter's type:
data: Optional[Union[_ListLike, DataFrame, Dict[_str, _ListLike], _TypedDictLike]]
I would appreciate it if you could implement this, or I could push a branch with it.
Thank you for your time!
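For what it's worth (a sketch under my own assumptions, not the stubs' code): per PEP 589, every TypedDict is consistent with `Mapping[str, object]`, so accepting a `Mapping` in the data parameter may be a more general fix than a `_TypedDictLike` alias:

```python
from typing import List, Mapping, TypedDict

class Data(TypedDict):
    a: List[int]
    b: List[int]

def frame_like(data: Mapping[str, object]) -> int:
    # Toy stand-in for a DataFrame-style constructor parameter.
    return len(data)

d: Data = {"a": [1, 2], "b": [3, 4]}
n = frame_like(d)  # TypedDicts are consistent with Mapping[str, object]
```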
https://matplotlib.org/api/_as_gen/matplotlib.pyplot.gca.html?highlight=gca#matplotlib.pyplot.gca
https://matplotlib.org/api/_as_gen/matplotlib.pyplot.vlines.html?highlight=vlines#matplotlib.pyplot.vlines
https://matplotlib.org/api/_as_gen/matplotlib.pyplot.hlines.html?highlight=hlines#matplotlib.pyplot.hlines
Three missing stubs from matplotlib.pyplot are gca, hlines, and vlines.
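Hedged sketches of the three signatures; the parameter names come from the matplotlib docs, while the types here are my assumptions:

```python
from typing import Optional, Sequence, Union

class Axes: ...
class LineCollection: ...

def gca() -> Axes: ...

def hlines(
    y: Union[float, Sequence[float]],
    xmin: float,
    xmax: float,
    colors: Optional[str] = ...,
    linestyles: str = ...,
    label: str = ...,
) -> LineCollection: ...

def vlines(
    x: Union[float, Sequence[float]],
    ymin: float,
    ymax: float,
    colors: Optional[str] = ...,
    linestyles: str = ...,
    label: str = ...,
) -> LineCollection: ...
```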
I am in the process of fleshing out a few pyi files with the definitions from Pandas.
My normal process for python development is to create a virtual environment on the root level of each project (to keep code segregated), like so:
python -m venv venv && . venv/bin/activate && pip install --upgrade pip && pip install -e .[dev]
After updating the pyi files and adding tests, everything looks okay right up to the end of check_all.sh. When it runs the line && mypy tests \, it finds a LOT (>900 on my machine) of errors from packages in the venv folder. Sample output:
venv/lib/python3.8/site-packages/mypy/typeshed/stdlib/3/typing.pyi:675: error: Return type becomes "Union[bool, Any]" due to an unfollowed import
venv/lib/python3.8/site-packages/mypy/typeshed/stdlib/3/tkinter/commondialog.pyi:7: error: Function is missing a type annotation for one or more arguments
venv/lib/python3.8/site-packages/mypy/typeshed/stdlib/3/tkinter/commondialog.pyi:8: error: Function is missing a type annotation for one or more arguments
venv/lib/python3.8/site-packages/mypy/typeshed/stdlib/3/_thread.pyi:43: error: Function is missing a type annotation for one or more arguments
venv/lib/python3.8/site-packages/packaging/_typing.py:34: error: Statement is unreachable
Similar lines to those continue for many more lines.
I did notice that if I deleted no_silence_site_packages = True, this goes away, but I'm not sure of the intention behind that setting, so I didn't want to delete it and cause downstream issues.
This is of course silly, but it's also fun.
Script used:
import pandas as pd
df: pd.DataFrame = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"], index=["c", "d"])
grouped = df.groupby("a")["b"]
grouped_list = grouped.apply(list)
print(df)
print(grouped)
print(grouped_list)
print(grouped.groups)
print(grouped.get_group(1))
I'm running this code:
import pandas
d = {"c": [1,2,3], "d": [4,5,6]}
df = pandas.DataFrame(data=d)
I was expecting no errors. However, I get this message:
Argument of type "Dict[str, List[int]]" cannot be assigned to parameter "data" of type "Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[TypeVar('_T_co')] | DataFrame | Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[TypeVar('_T_co')]] | None" in function "__init__"
Type "Dict[str, List[int]]" cannot be assigned to type "Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[TypeVar('_T_co')] | DataFrame | Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[TypeVar('_T_co')]] | None"
"Dict[str, List[int]]" is incompatible with "Series[TypeVar('_DType')]"
"Dict[str, List[int]]" is incompatible with "Index[TypeVar('_T')]"
"Dict[str, List[int]]" is incompatible with "ndarray[TypeVar('_DType')]"
"Dict[str, List[int]]" is incompatible with "Sequence[TypeVar('_T_co')]"
"Dict[str, List[int]]" is incompatible with "DataFrame"
Cannot assign to "None"
TypeVar "_VT" is invariant
I was expecting that Dict[str, List[int]] is compatible with Dict[_str, Sequence[TypeVar('_T_co')]], which is listed among the possible types of data. Probably I am missing what TypeVar('_T_co') means.
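The `TypeVar "_VT" is invariant` note in the error is the key: `Dict` is invariant in its value type, so `Dict[str, List[int]]` is not a `Dict[str, Sequence[int]]`, while a `Mapping`, which is covariant in its value type, would accept it. A small demonstration (generic code, not the stubs'):

```python
from typing import Dict, List, Mapping, Sequence

d: Dict[str, List[int]] = {"c": [1, 2, 3], "d": [4, 5, 6]}

def needs_dict(m: Dict[str, Sequence[int]]) -> int:
    return len(m)

def needs_mapping(m: Mapping[str, Sequence[int]]) -> int:
    return len(m)

# needs_dict(d)       # mypy error: Dict's value type is invariant
n = needs_mapping(d)  # OK: Mapping is covariant in its value type
```

So changing the stub's `Dict[_str, ...]` to `Mapping[_str, ...]` may be the fix here.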
MWE:
wget https://pypi.io/packages/source/d/data-science-types/data-science-types-0.2.21.tar.gz
tar -xvf data-science-types-0.2.21.tar.gz
tree
and you can see that gen_pyi.py is not there. However, setup.py calls it, so it fails.
We should probably use this: https://github.com/typeddjango/pytest-mypy-plugins .
One particular disadvantage of the current way of testing is that you can't do "negative tests" by which I mean you can't specify that something should throw an error.
Is there a workaround for this when using the where function in numpy?
error: No overload variant of "where" matches argument types "Any", "int", "int"
Many numpy functions accept not only an _ArrayLike (List or ndarray) argument but also a simple primitive value.
This is currently often not allowed by the type hints.
Two examples of code that results in errors when checked with mypy:
myarr = np.array(1.0)
-> No overload variant of "array" matches argument type "float"
np.append(myarr, 1.0)
-> Argument 2 to "append" has incompatible type "float"; expected "Union[Array[Any], Sequence[Any]]"
This could be fixed by using Union[_ArrayLike, _DType] instead of just _ArrayLike.
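A small sketch of the idea (the alias names mirror the issue text; the actual stub names may differ):

```python
from typing import Sequence, Union

_ArrayLike = Sequence[float]              # stand-in for the stubs' array-like union
ScalarOrArray = Union[_ArrayLike, float]  # the proposed widening

def total(x: ScalarOrArray) -> float:
    # Accepts a bare scalar as well as a sequence, mirroring numpy's behaviour:
    return float(x) if isinstance(x, (int, float)) else float(sum(x))

total(1.0)         # OK with the widened type
total([1.0, 2.0])  # still OK
```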
Because this might be done for a lot of functions, I have refrained from partially changing the code where I know numpy accepts primitive types and submitting a pull request. I think this is better implemented on a general scale with more overview than I currently have.
(sorry if I open a lot of issues, I have tried to add this to my project and report every thing that fails)