data-science-types's Introduction

Mypy type stubs for NumPy, pandas, and Matplotlib

Join the chat at https://gitter.im/data-science-types/community

⚠️ this project has mostly stopped development ⚠️

The pandas team and the numpy team are both in the process of integrating type stubs into their codebases, and we don't see the point of competing with them.


This is a PEP-561-compliant stub-only package which provides type information for matplotlib, numpy, and pandas. Once it is installed, the mypy type checker (as well as pytype and PyCharm) can recognize the types in these packages.

NOTE: This is a work in progress

Many functions are already typed, but a lot is still missing (NumPy and pandas are huge libraries). Chances are, you will see a message from Mypy claiming that a function does not exist when it does exist. If you encounter missing functions, we would be delighted for you to send a PR. If you are unsure of how to type a function, we can discuss it.
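
For orientation, a stub is just the function's signature with "..." as the body. A hypothetical entry in numpy-stubs/__init__.pyi might look like the following (illustrative only; it assumes the ndarray class and the _DType type variable already defined in the stubs):

from typing import Optional

def cumsum(a: ndarray[_DType], axis: Optional[int] = ...) -> ndarray[_DType]: ...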

Installing

You can get this package from PyPI:

pip install data-science-types

To get the most up-to-date version, install it directly from GitHub:

pip install git+https://github.com/predictive-analytics-lab/data-science-types

Or clone the repository somewhere and run pip install -e . from inside it.

Examples

These are the kinds of things that can be checked:

Array creation

import numpy as np

arr1: np.ndarray[np.int64] = np.array([3, 7, 39, -3])  # OK
arr2: np.ndarray[np.int32] = np.array([3, 7, 39, -3])  # Type error
arr3: np.ndarray[np.int32] = np.array([3, 7, 39, -3], dtype=np.int32)  # OK
arr4: np.ndarray[float] = np.array([3, 7, 39, -3], dtype=float)  # Type error: the type parameter of ndarray cannot be just "float"
arr5: np.ndarray[np.float64] = np.array([3, 7, 39, -3], dtype=float)  # OK

Operations

import numpy as np

arr1: np.ndarray[np.int64] = np.array([3, 7, 39, -3])
arr2: np.ndarray[np.int64] = np.array([4, 12, 9, -1])

result1: np.ndarray[np.int64] = np.divide(arr1, arr2)  # Type error
result2: np.ndarray[np.float64] = np.divide(arr1, arr2)  # OK

compare: np.ndarray[np.bool_] = (arr1 == arr2)

Reductions

import numpy as np

arr: np.ndarray[np.float64] = np.array([[1.3, 0.7], [-43.0, 5.6]])

sum1: int = np.sum(arr)  # Type error
sum2: np.float64 = np.sum(arr)  # OK
sum3: float = np.sum(arr)  # Also OK: np.float64 is a subclass of float
sum4: np.ndarray[np.float64] = np.sum(arr, axis=0)  # OK

# the same works with np.max, np.min and np.prod

Philosophy

The goal is not to recreate the APIs exactly. The main goal is to have useful checks on our code. Often the actual APIs in the libraries are more permissive than the type signatures in our stubs, but this is (usually) a feature and not a bug.

Contributing

We always welcome contributions. All pull requests are subject to CI checks. We check for compliance with Mypy and that the file formatting conforms to our Black specification.

You can install these dev dependencies via

pip install -e '.[dev]'

This also installs NumPy, pandas, and Matplotlib so that the tests can be run.

Running CI locally (recommended)

We include a script for running the CI checks that are triggered when a PR is opened. To test these out locally, you need to install the type stubs in your environment. Typically, you would do this with

pip install -e .

Then use the check_all.sh script to run all tests:

./check_all.sh

Below we describe how to run the various checks individually, but check_all.sh should be easier to use.

Checking compliance with Mypy

The settings for Mypy are specified in the mypy.ini file in the repository. Just running

mypy tests

from the base directory should take these settings into account. We enforce 0 Mypy errors.

Formatting with Black

We use Black to format the stub files. First, install black and then run

black .

from the base directory.

Pytest

python -m pytest -vv tests/

Flake8

flake8 *-stubs

License

Apache 2.0

data-science-types's People

Contributors

adimyth, bradley-butcher, clouds56, dnaaun, dvarrazzo, edwardjross, eganjs, fabiencelier, hvlot, ickc, jeremiq, jmargeta, krassowski, loganamcnichols, maarten-vd-sande, melentye, mylesbartlett, nicoddemus, olliethomas, patriktrelsmo-izettle, pmav99, rpgoldman, sarar-1, skydetulliov, sukuldhoka, thecleric, tmke8, wwuck, zhsimon, zsimoncentene


data-science-types's Issues

Tests failing on forking

I forked the repo and ran the tests with ./check_all.sh, which resulted in 152 errors found in 4 files. How do I get started?

Problems with dtypes

There is something about numpy dtypes and the stubs that I don't understand, and it is keeping me from fixing some of them. I hope someone can correct me.

After extending the type stubs for DataFrame's __init__ and astype as follows:

class DataFrame:
    def __init__(
        self,
        data: Optional[Union[_ListLike, DataFrame, Dict[_str, _np.ndarray]]] = ...,
        columns: Optional[_ListLike] = ...,
        index: Optional[_ListLike] = ...,
        dtype: Optional[_np.dtype] = ...,
    ): ...
...
    def astype(self, dtype: Union[_str, Dict[str, _np.dtype]], copy: bool=True, errors: _ErrorType = 'raise') -> DataFrame: ...

I have the following which does not type-check properly:

    query_df = pd.DataFrame(
        columns=[
            TEMPERATURE_COL,
            OD_COL,
            "od_log",
            "media",
            "gate",
            "input",
            "mean_log_gfp_live",
            "mean_log_gfp_",
        ],
        dtype=np.float64,
    )

with the error eval.py:210: error: Argument "dtype" to "DataFrame" has incompatible type "Type[float64]"; expected "Optional[dtype]"

and this also:

    query_df = query_df.astype(dtype={"input": np.str_, "gate": np.str_}, copy=False)

mypy3: Dict entry 0 has incompatible type "str": "Type[str_]"; expected str?: "dtype" and mypy3: Dict entry 1 has incompatible type "str": "Type[str_]"; expected str?: "dtype"

I look at numpy_stubs/__init__.pyi, and it looks like np.float64 and np.str_ are both defined there:

class floating(number, float): ...
class float64(floating): ...
 ...
class str_(dtype, str): ...

but it seems like mypy is seeing the actual values from numpy instead of the values from the numpy stubs.
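
One possible widening, borrowing the Union[_str, Type[_np.dtype]] spelling that the "_DtypeSpec type" issue further down proposes (a sketch; whether Type[_np.dtype] really covers Type[float64] depends on how the stubs declare the scalar classes):

class DataFrame:
    def __init__(
        self,
        data: Optional[Union[_ListLike, DataFrame, Dict[_str, _np.ndarray]]] = ...,
        columns: Optional[_ListLike] = ...,
        index: Optional[_ListLike] = ...,
        dtype: Optional[Union[_str, Type[_np.dtype]]] = ...,  # widened from Optional[_np.dtype]
    ): ...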

slicing with pandas .loc

Hi! Cool project, and great to finally be able to not just ignore missing imports in mypy.ini!

I've just been getting started today trying out the library with some pre-existing code that heavily uses pandas. I know it's a work-in-progress so was not too surprised to get a few errors.

Some of these were just missing bits of functionality that would, I think, be straightforward additions - things like pd.date_range, pd.to_datetime, pd.tseries and the like. I'm hoping to find some time to contribute, since I see you encourage it. :)

I was a bit less sure about the results I got for this pattern: df.loc[:, "column_name"] = , for which mypy threw this:

error: Invalid index type "Tuple[slice, str]" for "_LocIndexerFrame"; expected type "Tuple[Union[str, str_], Union[str, str_]]"

This was solvable with a minor refactor, but I had to just type: ignore this one:

error: No overload variant of "__getitem__" of "_LocIndexerFrame" matches argument type "slice"

which appeared when doing df.loc["2018-10":, "column_name"]/df.loc[datetime_object:, "column_name"] (using the date-time index slicing functionality)

I wondered whether supporting slices is infeasible or something you'd hope to include? .loc's behaviour is pretty complex, so I appreciate that type-annotating it fully would be painful!

A related issue was this mypy error:

error: Invalid index type "Tuple[Series[bool], str]" for "_LocIndexerFrame"; expected type "Tuple[Union[Union[Series[bool], ndarray[bool_], List[bool]], List[str]], Union[Union[Series[bool], ndarray[bool_], List[bool]], List[str]]]"

This one comes about from df.loc[boolean_series, "column_name"], and my examination of the expected type showed I could refactor to df.loc[boolean_series, ["column_name"]] to get the same functionality. It looks as though, as implemented, the type annotations let you pass either two collections or two labels to .loc, but not a mixture?

Just want to check what's in-scope for this project before slinging PRs around!
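
For concreteness, a minimal sketch of the kind of extra signatures that would cover the patterns above, assuming the _LocIndexerFrame, _StrLike, and Series names from the existing stubs (the Any for the assigned value is a simplification):

# hypothetical additions inside _LocIndexerFrame, alongside the existing overloads
@overload
def __getitem__(self, idx: Tuple[slice, _StrLike]) -> Series: ...
def __setitem__(self, idx: Tuple[slice, _StrLike], value: Any) -> None: ...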

matplotlib.pyplot.close optional argument

matplotlib.pyplot.close has None as its default argument; however, the stub does not allow None.

https://matplotlib.org/api/_as_gen/matplotlib.pyplot.close.html#matplotlib.pyplot.close

Shouldn't line 217 in https://github.com/predictive-analytics-lab/data-science-types/blob/master/matplotlib-stubs/pyplot.pyi.in
be changed from the first to the second?

def close(fig: Union[Figure, Literal["all"]]) -> None: ...
def close(fig: Union[Figure, Literal["all"], None]) -> None: ...

Missing pandas.to_numeric

The pandas stubs are missing pandas.to_numeric.

I would like to do a PR, but I'm not really sure where to start or how to write proper type hints for this, as I've only started learning about Python typing in the last few days. Any help would be much appreciated.
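
For what it's worth, a rough sketch of what the stub could look like, reusing the _ListLike and _ErrorType aliases that appear elsewhere in these stubs (the return types are a simplification; the real function can also return scalars):

@overload
def to_numeric(arg: Series, errors: _ErrorType = ..., downcast: Optional[_str] = ...) -> Series: ...
@overload
def to_numeric(arg: _ListLike, errors: _ErrorType = ..., downcast: Optional[_str] = ...) -> _np.ndarray: ...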

False positives on `np.empty`

I have these two calls:

    master_df[DF_VAR_COLUMN] = np.empty(shape=master_df.shape[0], dtype=str)
    master_df[DF_VAR_IDX_COLUMN] = np.empty(shape=master_df.shape[0], dtype=int)

With the numpy stubs in place, mypy does not like this, claiming that:

eval.py:809: error: Value of type variable "_DType" of "empty" cannot be "str"
eval.py:810: error: Value of type variable "_DType" of "empty" cannot be "int"

But these are legitimate values, AFAICT. I'll try to see about a PR.

Missing Pandas types: dataframe.index.names, columns.names, read_*, assignments, starts with

First of all, thanks so much for doing this typing library. I use nptyping, and we were trying out your data-science-types. Here are the types that are missing. I'm no .pyi expert, but I'm happy to help, so here is what is not working in our project but should all exist, judging from the .pyi files.

I can see why the errors occur: index is typed as just an array, but it actually has the name property.

Pandas.DataFrame.index.name
Pandas.DataFrame.columns.name
Panda.read_hdf, read_html, read_excel, to_hdf
Pandas dataframe can't accept assignment
Pandas.columns can't be assigned
Pandas.dropna missing
Pandas.to_replace missing
Pandas.replace
Pandas.startwith
Pandas.string.startswith
Pandas dataframe cannot be used as a left operand

Numpy problems

Numpy.ones_like=
Numpy.einsum
numpy.array doesn't handle the pass of a dataframe as an input (which works btw)
numpy.any not available

Would you consider removing your numpy types and letting nptyping handle that? There does not seem to be a good way to make conflicting .pyi files interoperate.

Dataframe.reset_index only allows inplace=True

Looks like the stubs only allow an explicit inplace=True.

    @overload
    def reset_index(self, drop: bool = ...) -> DataFrame: ...
    @overload
    def reset_index(self, inplace: Literal[True], drop: bool = ...) -> None: ...

inplace=False is indeed the default behavior, and there is no need to specify it, but it should still be allowed.
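
A sketch of how the first overload could be widened so that an explicit inplace=False type-checks (a common stub pattern; treat the parameter order as an assumption):

@overload
def reset_index(self, drop: bool = ..., inplace: Literal[False] = ...) -> DataFrame: ...
@overload
def reset_index(self, inplace: Literal[True], drop: bool = ...) -> None: ...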

Numpy has no method 'isfinite'

Script used:

import numpy as np

arr: np.ndarray[np.float32] = np.array([0, 1, np.inf], dtype=np.float32)

print(np.isfinite(arr))
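
Presumably the fix is a one-line stub along these lines (a sketch assuming the ndarray and bool_ classes declared in numpy-stubs; the real numpy.isfinite also accepts scalars):

def isfinite(x: ndarray[Any]) -> ndarray[bool_]: ...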

gen_pyi.py not in tarball from PyPI

MWE:

wget https://pypi.io/packages/source/d/data-science-types/data-science-types-0.2.21.tar.gz
tar -xvf data-science-types-0.2.21.tar.gz
tree

and you can see that gen_pyi.py is not there. However, setup.py calls it, so installation fails.

[pandas] `_AtIndexerFrame` should support indexing by [int, int]

If you build a DataFrame with header=None, the axes are [RangeIndex(start=0, stop=4, step=1), RangeIndex(start=0, stop=9, step=1)], so you can't access elements with df.at[123, 'xyz']; you need to use df.at[123, 456] instead.

According to https://github.com/predictive-analytics-lab/data-science-types/blob/7dab8238df9e93d00be6d683d8efabbdf95fc958/pandas-stubs/core/indexing.pyi#L88, however, _AtIndexerFrame currently only allows the second index to be a _StrLike.

Ideally this would be a runtime check, but as I think that is not possible with mypy, there should at least not be a false positive.

DataFrame __init__ complains if data is a dictionary

I'm running this code:

import pandas

d = {"c": [1,2,3], "d": [4,5,6]}
df = pandas.DataFrame(data=d)

I was expecting no errors. However, I get this message:

Argument of type "Dict[str, List[int]]" cannot be assigned to parameter "data" of type "Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[TypeVar('_T_co')] | DataFrame | Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[TypeVar('_T_co')]] | None" in function "__init__"
  Type "Dict[str, List[int]]" cannot be assigned to type "Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[TypeVar('_T_co')] | DataFrame | Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[TypeVar('_T_co')]] | None"
    "Dict[str, List[int]]" is incompatible with "Series[TypeVar('_DType')]"
    "Dict[str, List[int]]" is incompatible with "Index[TypeVar('_T')]"
    "Dict[str, List[int]]" is incompatible with "ndarray[TypeVar('_DType')]"
    "Dict[str, List[int]]" is incompatible with "Sequence[TypeVar('_T_co')]"
    "Dict[str, List[int]]" is incompatible with "DataFrame"
    Cannot assign to "None"
      TypeVar "_VT" is invariant

I was expecting that Dict[str, List[int]] is compatible with Dict[_str, Sequence[TypeVar('_T_co')]] which is listed in the possible types of data. Probably I am missing what TypeVar('_T_co') means.
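
The last point is likely the key (my reading, not stated in the thread): Dict is invariant in its value type, so Dict[str, List[int]] is not a Dict[str, Sequence[int]], even though List[int] is a Sequence[int]. A minimal demonstration:

from typing import Dict, List, Sequence

d: Dict[str, List[int]] = {"c": [1, 2, 3], "d": [4, 5, 6]}
s: Dict[str, Sequence[int]] = d  # error: Dict is invariant in its value type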

Module has no attribute "to_datetime"

It seems there's a missing stub for the pandas module (or at least I can't find it). In any case, this code:

pd.to_datetime(...)

Throws
error: Module has no attribute "to_datetime"

Missing commas in generated pyplot.pyi

I think I've found a bug in the pyplot.pyi generation scripts.

I cloned this repository's master branch and set up a virtualenv from Python 3.7.3 (virtualenv -p python3 .venv).

Running pip install -e . inside the virtual environment, generated a pyplot.pyi with missing commas at line 229 and below.

   222	def plot(
   223	    x: Data,
   224	    y: Data,
   225	    fmt: Optional[str] = ...,
   226	    *,
   227	    scalex: bool = ...,
   228	    scaley: bool = ...,
   229	    agg_filter: Callable[[_NumericArray, int], _NumericArray] = ...    # <-- comma missing here and at end of lines below
   230	    alpha: Optional[float] = ...
   231	    animated: Optional[bool] = ...
   232	    antialiased: Optional[bool] = ...
   233	    aa: Optional[bool] = ..., #alias of antialiased
   234	    clip_box: Optional[Bbox] = ...
   235	    clip_on: Optional[bool] = ...
   236	    clip_path: Optional[Callable[[Path, Transform], None]] = ...
   237	    color: Optional[str] = ...
   238	    c: Optional[str] = ...
   239	    contains: Optional[Callable[[Artist, MouseEvent], Tuple[bool, dict]]] = ...
   240	    dash_capstyle: Optional[Literal['butt', 'round', 'projecting']] = ...
   241	    dash_jointstyle: Optional[Literal['miter', 'round', 'bevel']] = ...
   242	    dashes: Optional[[Sequence[float], Tuple[None, None]]] = ...
   243	    drawstyle: Literal['default', 'steps', 'steps-pre', 'steps-mid', 'steps-post'] = ...
   244	    ds: Literal['default', 'steps', 'steps-pre', 'steps-mid', 'steps-post'] = ...
   245	    figure: Optional[Figure] = ...
   246	    fillstyle: Literal['full', 'left', 'right', 'bottom', 'top', 'none'] = ...

test_frame_iloc fails on Pandas 1.2

tests/pandas_test.py line 92 fails on Pandas 1.2

Extracting the relevant code

import pandas as pd
df: pd.DataFrame = pd.DataFrame(
    [[1.0, 2.0], [4.0, 5.0], [7.0, 8.0]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)
s: "pd.Series[float]" = df["shield"].copy()
df.iloc[0] = s

Results in

ValueError: could not broadcast input array from shape (3) into shape (2)

This runs fine on Pandas 1.1.5

add iterable to possible input types of pandas.concat

Hi,
in my opinion it's widely considered a best practice (and Pythonic) to save memory by using a generator rather than a list, e.g. when concatenating dataframes. Sadly, for now that raises an error in mypy, since concat only accepts Union[Sequence[DataFrame], Mapping[str, DataFrame]]. Note that generators are not considered Sequences.

I would very much appreciate it if somebody could add iterables to the accepted input types of pandas.concat.

Thank you for your time.
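
A sketch of the proposed widening, with all other parameters omitted (Iterable covers generators as well as lists and tuples):

from typing import Iterable, Mapping, Union

def concat(objs: Union[Iterable[DataFrame], Mapping[_str, DataFrame]]) -> DataFrame: ...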

check_all.sh fails when using a project level virtual environment

I am in the process of fleshing out a few pyi files with the definitions from Pandas.

My normal process for python development is to create a virtual environment on the root level of each project (to keep code segregated), like so:

python -m venv venv && . venv/bin/activate && pip install --upgrade pip && pip install -e .[dev]

After updating the pyi files and adding tests, everything looks okay right up to the end of check_all.sh. When it runs the line && mypy tests \, mypy finds a LOT (> 900 on my machine) of errors from packages in the venv folder. Sample output:

venv/lib/python3.8/site-packages/mypy/typeshed/stdlib/3/typing.pyi:675: error: Return type becomes "Union[bool, Any]" due to an unfollowed import
venv/lib/python3.8/site-packages/mypy/typeshed/stdlib/3/tkinter/commondialog.pyi:7: error: Function is missing a type annotation for one or more arguments
venv/lib/python3.8/site-packages/mypy/typeshed/stdlib/3/tkinter/commondialog.pyi:8: error: Function is missing a type annotation for one or more arguments
venv/lib/python3.8/site-packages/mypy/typeshed/stdlib/3/_thread.pyi:43: error: Function is missing a type annotation for one or more arguments
venv/lib/python3.8/site-packages/packaging/_typing.py:34: error: Statement is unreachable

Similar lines to those continue for many more lines.

I did notice that if I delete no_silence_site_packages = True this goes away, but I'm not sure of the intention behind that setting, so I didn't want to remove it and cause downstream issues.

Missing Union with primitive (_DType) types in numpy functions

Many numpy functions accept not only an _ArrayLike (List or ndarray) argument but also a simple primitive value.
The type hints currently often disallow this.

Two examples of code that results in errors when checked with mypy:

  • myarr = np.array(1.0) -> No overload variant of "array" matches argument type "float"
  • np.append(myarr, 1.0) -> Argument 2 to "append" has incompatible type "float"; expected "Union[Array[Any], Sequence[Any]]"

This could be fixed by using Union[_ArrayLike, _DType] instead of just _ArrayLike.
Because this might need to be done for a lot of functions, I have refrained from piecemeal changes to the places where I know numpy accepts primitive types and from submitting a pull request; I think this is better implemented on a general scale, with more overview than I currently have.
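
To illustrate the shape of the change on a single function, the append stub might go from the first line to the second (a sketch using the aliases named above):

def append(arr: _ArrayLike, values: _ArrayLike, axis: Optional[int] = ...) -> ndarray: ...
def append(arr: _ArrayLike, values: Union[_ArrayLike, _DType], axis: Optional[int] = ...) -> ndarray: ...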

Error when typing Numpy arrays within NamedTuples

data-science-types==0.2.12
mypy==0.770
typing==3.6.4

Code to reproduce:

import numpy as np
from typing import List, NamedTuple

class Ok(NamedTuple):
    x: List[int]

class Problem(NamedTuple):
    x: np.ndarray[np.int64]  # TypeError: 'type' object is not subscriptable
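
A common workaround (my suggestion, not from the original report) is to defer annotation evaluation via PEP 563, so the subscript is never evaluated at runtime while mypy still checks it:

from __future__ import annotations  # annotations stay as strings at runtime

import numpy as np
from typing import NamedTuple

class NoLongerAProblem(NamedTuple):
    x: np.ndarray[np.int64]  # no runtime TypeError; mypy still checks this

Quoting the annotation, x: "np.ndarray[np.int64]", has the same effect without the future import.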

pyplot savefig type too narrow

In VSCode with pyright, I'm trying to call savefig with a BytesIO object where the fname would be. I'm ending up with:

Argument of type "BytesIO" cannot be assigned to parameter "fname" of type "str | Path" in function "savefig"
  Type "BytesIO" cannot be assigned to type "str | Path"
    "BytesIO" is incompatible with "str"
    "BytesIO" is incompatible with "Path"Pyright (reportGeneralTypeIssues)

Since BytesIO (also I think files opened in wb mode? I'm not sure about that part) is definitely a valid target, I'd like to PR the type in. Would making the fname type Union[str, Path, BytesIO] be sufficient? Or would you prefer more types that could technically fit into the fname slot?

I should add: this is great work. A co-worker of mine and I were looking around for numpy typings, and this repo offers such an improvement over the default typing.
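
A sketch of the widened signature (fname only; IO[bytes] is my suggestion, and since BytesIO is itself an IO[bytes], it would also cover files opened in binary mode, which the report mentions):

from pathlib import Path
from typing import IO, Union

def savefig(fname: Union[str, Path, IO[bytes]]) -> None: ...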

Generalize many types to Sequence

There are many types that are unions of List and np.ndarray and Series. These should probably all be transformed to use Sequence instead (which also would cover legitimate uses of Tuple).

No overload variant of "where"

Is there a workaround for this when using the where function in numpy? error: No overload variant of "where" matches argument types "Any", "int", "int"

bug in ndarray: incompatible type error

I think there is a bug in the ndarray type hint.
If an np.array has one row containing only integers (type int or int64) and another row with at least one float, it produces an error: the rows have different element types, so a conflict like "np.ndarray[float] != np.ndarray[int]" occurs.

Reproducing code example:

# bug.py
import numpy as np 
arr = np.array([[4.2, 2, 3.5], [12, 3, 6]])   

and run mypy:

$ mypy bug.py 

Error message:

bug.py:3: error: Argument 1 to "array" has incompatible type "List[object]"; expected "Union[List[bool], List[List[bool]], List[List[List[bool]]], List[List[List[List[bool]]]]]"

NumPy/Python/data-science-types versions information:

1.19.2 / 3.8.5 / 0.2.20

_DtypeSpec type

Should there be a type that captures the type of thing that can be given as a dtype spec? I believe that this is Union[_str, Type[_np.dtype]], but I could be wrong. If we can identify a reasonable type for this, that might make a lot of typing smoother and more consistent.
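
A sketch of what that could look like, applied to the astype signature quoted in the dtypes issue above (the exact Union is an assumption):

_DtypeSpec = Union[_str, Type[_np.dtype]]

def astype(self, dtype: Union[_DtypeSpec, Dict[_str, _DtypeSpec]], copy: bool = ..., errors: _ErrorType = ...) -> DataFrame: ...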

Add TypedDict to DataFrame class Optional

Since PEP 589 in Python 3.8, we can create a class that inherits from TypedDict and specify which type every key of a dictionary takes, but TypedDicts are not included in the DataFrame data parameter's Optional union, so mypy raises an error.

I fixed this locally by adding an alias:

_TypedDictLike = TypedDict

and adding it to the DataFrame data parameter's union:

data: Optional[Union[_ListLike, DataFrame, Dict[_str, _ListLike], _TypedDictLike]]

I would appreciate it if you could implement this, or I could push a branch with it.

Thank you for your time!

Error on merge function of pandas data frames

Hi,
I'm using pyright in combination with the stubs provided by this repo. I'm getting a problem when I check the following code:

import pandas as pd

d = {'a': [1, 2]}
df = pd.DataFrame(data=d)

df_merged: pd.DataFrame = df.merge(right=df)

By running pyright against this code I get the following errors:

 3:19 - error: Argument of type "Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[int]]" cannot be assigned to parameter "data" of type "Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[TypeVar('_T_co')] | DataFrame | Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[TypeVar('_T_co')]] | None" in function "__init__"
  Type "Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[int]]" cannot be assigned to type "Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[TypeVar('_T_co')] | DataFrame | Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[TypeVar('_T_co')]] | None"
    "Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[int]]" is incompatible with "Series[TypeVar('_DType')]"
    "Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[int]]" is incompatible with "Index[TypeVar('_T')]"
    "Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[int]]" is incompatible with "ndarray[TypeVar('_DType')]"
    "Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[int]]" is incompatible with "Sequence[TypeVar('_T_co')]"
    "Dict[_str, Series[TypeVar('_DType')] | Index[TypeVar('_T')] | ndarray[TypeVar('_DType')] | Sequence[int]]" is incompatible with "DataFrame"
    Cannot assign to "None"
      TypeVar "_VT" is invariant
... (reportGeneralTypeIssues)
  4:13 - error: "df.merge(right=df)" has type "Series[Unknown]" and is not callable (reportGeneralTypeIssues)
  4:1 - error: Type of "df_merged" is unknown (reportUnknownVariableType)

Ignoring error 3:19 (which is not clear to me either), I would like to focus on error 4:13. Why is the merge function said to have type Series[Unknown]?

To Reproduce
install data-science-types stubs:
pip install data-science-types

install pyright:
sudo npm install -g pyright

run the file with the code above:
pyright test.py

I've already posted the issue on pyright and was advised to submit an issue here, because pyright is behaving in accordance with the information provided by the stubs.

Pandas 'SeriesGroupBy' has no method 'apply', 'groups', or 'get_group'

Script used:

import pandas as pd

df: pd.DataFrame = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"], index=["c", "d"])

grouped = df.groupby("a")["b"]
grouped_list = grouped.apply(list)

print(df)
print(grouped)
print(grouped_list)
print(grouped.groups)
print(grouped.get_group(1))
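
A rough sketch of the missing members (the return types are assumptions; apply in particular is hard to type precisely):

from typing import Any, Callable, Dict

class SeriesGroupBy:
    @property
    def groups(self) -> Dict[Any, Any]: ...
    def apply(self, func: Callable[..., Any]) -> Series: ...
    def get_group(self, name: Any) -> Series: ...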
