pathwaycom / pathway

Pathway is a high-throughput, low-latency data processing framework that handles live data & streaming for you. Made with ❤️ for Python & ML/AI developers.

Home Page: https://pathway.com

License: Other

Languages: Python 69.31%, Rust 30.69%
Topics: batch-processing, kafka, machine-learning-algorithms, pathway, python, real-time, streaming

pathway's People

Contributors

avriiil, berkecanrizai, dependabot[bot], dxtrous, embe-pw, gitfoxcode, izulin, janchorowski, kamilpiechowiak, krzysiek-pathway, lewymati, olruas, pathway-dev, pw-ppodhajski, szymondudycz, voodoo11, xgendre, zxqfd555-pw

pathway's Issues

I want to convert a dict inside a list into a table with keys as column names

What is your question or problem? Please describe.
How do I convert a dict inside a list into a table, with the dict keys as column names?

Describe what you would like to happen

response = [
    {
        "key 1": "abcc",
        "key 2": "efgh",
    }
]

I want to convert response into a table:

key 1  | key 2
"abcc" | "efgh"

Please help me achieve this.
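One way to do this (a plain-Python sketch, not Pathway-specific; the variable names follow the snippet above) is to reshape the list of dicts into column-major form, whose keys become the column names:

```python
# Plain-Python sketch: reshape a list of dicts into column-major data
# whose keys become column names.
response = [
    {"key 1": "abcc", "key 2": "efgh"},
]

columns = {key: [row[key] for row in response] for key in response[0]}
print(columns)  # {'key 1': ['abcc'], 'key 2': ['efgh']}
```

From there, a dict of columns like this can be wrapped in a pandas DataFrame and loaded with pw.debug.table_from_pandas (assuming pandas is available).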

[Bug]: Error in loading documents AttributeError: 'ParseUnstructured' object has no attribute 'executor'

Steps to reproduce

  1. Clone the Pathway repo: https://github.com/pathway-labs/dropbox-ai-chat
  2. Follow the README to set up the app and the .env file.
  3. Run docker compose up to reproduce the error, or install the dependencies with pip and run python main.py.

Relevant log output

2024-03-21 11:10:11 /usr/local/lib/python3.11/site-packages/beartype/_util/hint/pep/utilpeptest.py:311: BeartypeDecorHintPep585DeprecationWarning: PEP 484 type hint typing.Sequence[str] deprecated by PEP 585. This hint is scheduled for removal in the first Python version released after October 5th, 2025. To resolve this, import this hint from "beartype.typing" rather than "typing". For further commentary and alternatives, see also:
2024-03-21 11:10:11     https://beartype.readthedocs.io/en/latest/api_roar/#pep-585-deprecations
2024-03-21 11:10:11   warn(
2024-03-21 11:10:11 /usr/local/lib/python3.11/site-packages/pathway/io/http/_server.py:669: UserWarning: delete_completed_queries arg of rest_connector should be set explicitly. It will soon be required.
2024-03-21 11:10:11   warn(
2024-03-21 11:10:13 Traceback (most recent call last):
2024-03-21 11:10:13   File "/app/main.py", line 12, in <module>
2024-03-21 11:10:13     app_api.run(host=host, port=port)
2024-03-21 11:10:13   File "/app/api.py", line 32, in run
2024-03-21 11:10:13     documents = input_data.select(texts=parser(pw.this.data))
2024-03-21 11:10:13                                         ^^^^^^^^^^^^^^^^^^^^
2024-03-21 11:10:13   File "/usr/local/lib/python3.11/site-packages/pathway/internals/udfs/__init__.py", line 194, in __call__
2024-03-21 11:10:13     return self.executor._apply_expression_type(
2024-03-21 11:10:13            ^^^^^^^^^^^^^
2024-03-21 11:10:13 AttributeError: 'ParseUnstructured' object has no attribute 'executor'

What did you expect to happen?

The main pipeline was expected to start.

Version

0.2.1

Docker Versions (if used)

4.28.0

OS

Linux

On which CPU architecture did you run Pathway?

x86-64

RESTful APIs exposed by Pathway - should they tolerate a trailing slash in the endpoint URI?

Is your feature request related to a problem? Please describe.

Many (most?) RESTful API endpoints out there work both with and without a trailing slash, e.g.:
https://www.codever.dev/api/public/bookmarks - 200 OK
https://www.codever.dev/api/public/bookmarks/ - 200 OK
Returning a 30X is a possibility, but usually both variants work identically.

Pathway's RESTful APIs don't seem to accept a trailing slash at all.

https://demo-document-indexing.pathway.stream/v1/statistics - 200 OK
https://demo-document-indexing.pathway.stream/v1/statistics/ - 404 Not found

Describe the solution you'd like
Decide whether the trailing-slash variant should also work, and if so, whether it should return a 30X redirect or a transparent 200. Take implications for route caching, etc., into account. If not, leave the current behavior as is. Either way, document the decision.

Describe alternatives you've considered
None

Additional context
Originally reported by user Hemant on Discord: https://discord.com/channels/1042405378304004156/1047451777940852736/1217801242760314893
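If the trailing-slash variant were supported as a transparent 200, one approach would be to normalize the request path before route lookup. A minimal sketch (hypothetical helper, not part of Pathway):

```python
def normalize_path(path: str) -> str:
    """Strip trailing slashes so /v1/statistics/ routes like /v1/statistics."""
    stripped = path.rstrip("/")
    return stripped or "/"  # keep the root path intact

print(normalize_path("/v1/statistics/"))  # /v1/statistics
print(normalize_path("/"))                # /
```

Applying this once, at request-dispatch time, makes both URI forms hit the same handler without touching individual route definitions.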

Add syntax to be able to reduce over multiple windows

I wonder if it's possible to add some syntax that makes working with many windows easier?

For example, with ClickHouse SQL it's possible to work across many windows like this:

WITH sales AS (
    -- Your dataset source here
)
SELECT
    date,
    salesperson_id,
    region,
    amount,
    product_id,
    ROW_NUMBER() OVER w_region_amount AS row_number_region,
    RANK() OVER w_salesperson_amount AS rank_salesperson,
    DENSE_RANK() OVER w_product_amount AS dense_rank_product,
    SUM(amount) OVER w_region AS sum_sales_region,
    AVG(amount) OVER w_region AS avg_sales_region,
    MAX(amount) OVER w_salesperson AS max_sales_salesperson,
    MIN(amount) OVER w_product AS min_sales_product,
    LEAD(amount, 1) OVER w_salesperson_date AS lead_amount,
    LAG(amount, 1) OVER w_salesperson_date AS lag_amount,
    NTILE(10) OVER w_global_amount AS decile_rank_by_amount,
    FIRST_VALUE(salesperson_id) OVER w_region_amount AS top_salesperson_region,
    LAST_VALUE(salesperson_id) OVER w_region_amount_rows AS last_salesperson_region,
    COUNT(*) OVER w_region AS count_sales_region,
    PERCENT_RANK() OVER w_region_amount AS percent_rank_region,
    CUME_DIST() OVER w_region_amount AS cume_dist_region
FROM sales
WINDOW
    w_region AS (PARTITION BY region),
    w_salesperson AS (PARTITION BY salesperson_id),
    w_product AS (PARTITION BY product_id),
    w_region_amount AS (PARTITION BY region ORDER BY amount DESC),
    w_salesperson_amount AS (PARTITION BY salesperson_id ORDER BY amount DESC),
    w_product_amount AS (PARTITION BY product_id ORDER BY amount DESC),
    w_salesperson_date AS (PARTITION BY salesperson_id ORDER BY date),
    w_global_amount AS (ORDER BY amount DESC),
    w_region_amount_rows AS (PARTITION BY region ORDER BY amount DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY region, amount DESC;

With Pathway, I found myself having to define each window as a separate table and then join them back together. The resulting code was quite verbose.

Unless I am missing something and it's possible to do it succinctly with Pathway?
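For reference, the per-partition aggregates in a query like the one above amount to grouping the same rows by different keys. A plain-Python sketch of two of them, over hypothetical sample data:

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the "sales" dataset.
sales = [
    {"region": "N", "salesperson_id": 1, "amount": 10.0},
    {"region": "N", "salesperson_id": 2, "amount": 20.0},
    {"region": "S", "salesperson_id": 1, "amount": 5.0},
]

# SUM(amount) OVER (PARTITION BY region)
sum_by_region = defaultdict(float)
for row in sales:
    sum_by_region[row["region"]] += row["amount"]

# MAX(amount) OVER (PARTITION BY salesperson_id)
max_by_salesperson = {}
for row in sales:
    sp = row["salesperson_id"]
    max_by_salesperson[sp] = max(max_by_salesperson.get(sp, float("-inf")), row["amount"])

print(dict(sum_by_region))   # {'N': 30.0, 'S': 5.0}
print(max_by_salesperson)    # {1: 10.0, 2: 20.0}
```

Each distinct PARTITION BY key is one grouping pass over the same rows, which is exactly why expressing many windows as separate tables gets verbose.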

Static table import (table_from_...) should not be under `pw.debug`

Is your feature request related to a problem? Please describe.
A number of table_from_... import functions, such as those for Pandas, Markdown, etc., are hidden away in the pw.debug module. They should not be.

I am not sure to what extent this is also true for table export functions (table_to_...).

Describe the solution you'd like
Move these functions to some other, more natural module for static data import, or perhaps allow pw.Table(dataframe_like_object) constructors directly.

Describe alternatives you've considered
N/A

Additional context
This is a topic that comes up regularly in Discord user questions and also in issues like #30.

Pathway Python package on conda-forge

Dear Pathway Team,

Thank you very much for this amazing project. Can you please also make it available on conda-forge?

Thank you very much in advance

Set up GitHub action to run tests

Set up a GitHub Action to run automated tests on the Pathway codebase whenever a PR is submitted against the main branch of the repository.

  • Rust tests (via Cargo)
  • Python tests (via Pytest)

Later, codecov and other linting checks can be added for better visibility.

I would love to work on this.

[Bug]: "AttributeError: 'DataFrame' object has no attribute 'map'" when running Live Data Jupyter notebook.

Steps to reproduce

Run the Live Data Jupyter notebook.
The graph will get updated but at some point it will crash.

Relevant log output

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

[<ipython-input-9-09a87c3beb30>](https://localhost:8080/#) in <cell line: 1>()
----> 1 pw.run()

7 frames

[/usr/local/lib/python3.10/dist-packages/pathway/internals/runtime_type_check.py](https://localhost:8080/#) in with_type_validation(*args, **kwargs)
     17         """
     18         try:
---> 19             return beartype.beartype(f)(*args, **kwargs)
     20         except beartype.roar.BeartypeCallHintParamViolation as e:
     21             raise TypeError(e) from None

<@beartype(pathway.internals.run.run) at 0x79e30a1a7880> in run(__beartype_func, __beartype_conf, __beartype_get_violation, __beartype_object_96379127938720, __beartype_object_134015854386560, __beartype_object_134015838820288, __beartype_object_134015997294208, __beartype_object_96379063192384, *args, **kwargs)

[/usr/local/lib/python3.10/dist-packages/pathway/internals/run.py](https://localhost:8080/#) in run(debug, monitoring_level, with_http_server, default_logging, persistence_config, runtime_typechecking, license_key, terminate_on_error)
     49         terminate_on_error=terminate_on_error,
     50         _stacklevel=4,
---> 51     ).run_outputs()
     52 
     53 

[/usr/local/lib/python3.10/dist-packages/pathway/internals/graph_runner/__init__.py](https://localhost:8080/#) in run_outputs(self, after_build)
    116         after_build: Callable[[ScopeState, OperatorStorageGraph], None] | None = None,
    117     ) -> None:
--> 118         self.run_nodes(self._graph.global_scope.output_nodes, after_build=after_build)
    119 
    120     def has_bounded_input(self, table: table.Table) -> bool:

[/usr/local/lib/python3.10/dist-packages/pathway/internals/graph_runner/__init__.py](https://localhost:8080/#) in run_nodes(self, nodes, after_build)
     92     ):
     93         all_nodes = self._tree_shake(self._graph.global_scope, nodes)
---> 94         self._run(all_nodes, after_build=after_build)
     95 
     96     def run_tables(

[/usr/local/lib/python3.10/dist-packages/pathway/internals/graph_runner/__init__.py](https://localhost:8080/#) in _run(self, nodes, output_tables, after_build, run_all)
    196             ):
    197                 try:
--> 198                     return api.run_with_new_graph(
    199                         logic,
    200                         event_loop=event_loop,

[/usr/local/lib/python3.10/dist-packages/pathway/internals/table_subscription.py](https://localhost:8080/#) in on_change_wrapper(key, values, time, diff)
    132             row[field_name] = field_value
    133 
--> 134         return on_change(key=key, row=row, time=time, is_addition=(diff == 1))
    135 
    136     table_to_datasink(

[/usr/local/lib/python3.10/dist-packages/pathway/stdlib/viz/table_viz.py](https://localhost:8080/#) in update(key, row, time, is_addition)
    137                 df = df[col_names]
    138 
--> 139                 df = df.map(_format_types)
    140 
    141                 dynamic_table.value = df

[/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py](https://localhost:8080/#) in __getattr__(self, name)
   5987         ):
   5988             return self[name]
-> 5989         return object.__getattribute__(self, name)
   5990 
   5991     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'map'

What did you expect to happen?

Running until the end without error.

Version

latest with pip install

Docker Versions (if used)

No response

OS

Linux

On which CPU architecture did you run Pathway?

None

Deprecating `pw.DATE_TIME_UTC`

Is your feature request related to a problem? Please describe.
The Pathway column type pw.DATE_TIME_UTC and the data type pw.DateTimeUtc can be unified into the latter.
Similarly, pw.DATE_TIME_NAIVE and pw.DateTimeNaive.

Describe the solution you'd like
Deprecation of pw.DATE_TIME_UTC and pw.DATE_TIME_NAIVE.

encountering a ValueError when executing simple pw.io.airbyte.write

This is the code I am trying to run:

  import pathway as pw
  
  issues_table = pw.io.airbyte.read(
      "./connections/github.yaml",
      streams=["issues"],
  )
  
  pw.io.jsonlines.write(issues_table, "issues.jsonlines")
  pw.run()

This is what's in ./connections/github.yaml:

    source:
      docker_image: "airbyte/source-github" 
      config: 
        credentials:
         
          option_title: "PAT Credentials" 
          personal_access_token:  
        repositories: 
          - pathwaycom/pathway 
        api_url: "https://api.github.com/" 
      streams: issues 

The error I am getting when executing this is:

ValueError: Could not read json file /mnt/temp/catalog.json: Expecting value: line 1 column 1 (char 0). Please ensure that it is a valid JSON.\n", "failure_type": "system_error", "stream_descriptor": null}

[Bug]: Error when running the debezium postgres example

Steps to reproduce

Hello everyone, I am trying to run the debezium-postgres-example under the pathway-examples repo, but I get the error shown in the log below.
I am just running the make command under debezium-postgres-example; all the containers start correctly, but for some reason Pathway is not able to parse the message received from Kafka.

Any idea of how this can be solved?

Relevant log output

Imports OK!
Starting Pathway:
<pathway.Table schema={'value': <class 'int'>}>
[2024-01-27T17:48:28]:INFO:Preparing Pathway computation
[2024-01-27T17:48:28]:ERROR:librdkafka: Global error: UnknownTopicOrPartition (Broker: Unknown topic or partition): Subscribed topic not available: dbserver1.public.values: Broker: Unknown topic or partition
[2024-01-27T17:48:28]:ERROR:There had been an error processing the row read result: Message consumption error: UnknownTopicOrPartition (Broker: Unknown topic or partition)
[2024-01-27T17:48:28]:INFO:KafkaReader-0: 0 entries (1 minibatch(es)) have been sent to the engine
[2024-01-27T17:48:28]:INFO:PsqlWriter-1: Done writing 0 entries, time 1706377708732. Current batch writes took: 0 ms. All writes so far took: 0 ms.
[2024-01-27T17:48:28]:INFO:FileWriter-0: Done writing 0 entries, time 1706377708732. Current batch writes took: 0 ms. All writes so far took: 0 ms.
[2024-01-27T17:48:41]:ERROR:Read data parsed unsuccessfully. received message doesn't have payload
(the line above repeats continuously until 17:48:53)

What did you expect to happen?

I would expect to see the same output as the one shown in the tutorial page:
https://pathway.com/developers/user-guide/exploring-pathway/realtime-analytics-with-cdc

Version

0.7.10

Docker Versions (if used)

24.0.7

OS

Linux

On which CPU architecture did you run Pathway?

x86-64

[Bug]: rdkafka consumer queue not available

Steps to reproduce

When I use the Kafka connector, the program crashes and tells me rdkafka consumer queue not available.

Relevant log output

Traceback (most recent call last):
  File "/home/ubuntu/Desktop/syscall_ids/pp.py", line 34, in <module>
    pw.run()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pathway/internals/runtime_type_check.py", line 19, in with_type_validation
    return beartype.beartype(f)(*args, **kwargs)
  File "<@beartype(pathway.internals.run.run) at 0x7fc66df8fc70>", line 107, in run
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pathway/internals/run.py", line 41, in run
    ).run_outputs()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pathway/internals/graph_runner/__init__.py", line 91, in run_outputs
    return self._run(context, after_build=after_build)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pathway/internals/graph_runner/__init__.py", line 139, in _run
    return api.run_with_new_graph(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pathway/internals/graph_runner/__init__.py", line 117, in logic
    storage_graph.build_scope(scope, state, self)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pathway/internals/graph_runner/storage_graph.py", line 326, in build_scope
    handler.run(operator, self.output_storages[operator])
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pathway/internals/graph_runner/operator_handler.py", line 81, in run
    self._run(operator, output_storages)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pathway/internals/graph_runner/operator_handler.py", line 132, in _run
    materialized_table = self.scope.connector_table(
ValueError: Creating Kafka consumer failed: Client creation error: rdkafka consumer queue not available
Occurred here:
    Line: t = pw.io.kafka.read(
    File: /home/ubuntu/Desktop/syscall_ids/pp.py:17

What did you expect to happen?

I expected it to work fine.

Version

0.7.9

Docker Versions (if used)

No response

OS

Linux

On which CPU architecture did you run Pathway?

x86-64

My Kafka deployment

Kafka runs in Docker:

version: '2'

networks:
  app-tier:
    driver: bridge

services:
  kafka:
    image: 'bitnami/kafka:latest'
    networks:
      - app-tier
    ports:
      - 9094:9094
    container_name: kafka
    hostname: kafka
    environment:
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093,EXTERNAL://:9094
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092,EXTERNAL://:9094
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,EXTERNAL:PLAINTEXT,PLAINTEXT:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    container_name: kafka-ui
    hostname: kafka-ui
    networks:
      - app-tier
    ports: 
      - 19094:8080
    depends_on:
      - kafka
    environment:
      - DYNAMIC_CONFIG_ENABLED=true
      - KAFKA_CLUSTERS_0_NAME=aptcapture-syscall-channel
      - KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=kafka:9092

rdkafka config

kafka_settings = {"bootstrap.servers": "10.8.56.233:9094", "group.id": ""}

[QUESTION] How to create accumulated batch for processing?

Hello,

I have a use case where I need to process 10 million rows. First, I want to process 1M rows when they arrive, then 2M rows (the previous 1M plus 1M new), then 3M rows, and so on. How can I do this with Pathway?

I was able to do something like this with AsyncTransformer by accumulating the rows inside of it, but I find this solution clunky; perhaps there is a more Pathway-like approach I could take here?

Thank you very much in advance for your help
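Independent of the Pathway API, the batching pattern described above can be sketched in plain Python as a generator that yields ever-growing prefixes (a toy illustration, not a Pathway solution):

```python
from itertools import islice

def cumulative_batches(rows, batch_size):
    """Yield growing prefixes: first batch_size rows, then 2*batch_size, ..."""
    seen = []
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            break
        seen.extend(chunk)
        yield list(seen)  # copy, so callers can't mutate the accumulator

batches = list(cumulative_batches(range(6), 2))
# [[0, 1], [0, 1, 2, 3], [0, 1, 2, 3, 4, 5]]
```

The accumulate-inside-AsyncTransformer approach the question mentions is essentially this `seen` list maintained by hand.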

[QUESTION]: Pathway support for custom Airbyte connector?

Question:
If we were to build a custom Airbyte source connector, would it be automatically supported in Pathway via Airbyte Serverless? If not, what would be the turnaround time for support via Pathway? Also, would there be additional work on our side for Airbyte Serverless / PyAirbyte, or to use Pathway?

Jsonlines connector issue with mapping

Nested data structures in a JSON Lines file cannot be mapped to structured schemas automatically.

For example, mapping list_price and current_price to the schema fails:

{"position": 1, "link": "https://www.amazon.com/Avia-Resistant-Restaurant-Service-Sneakers/dp/B0BJY1FN8F", "asin": "B0BJXSKK9L", "is_lightning_deal": false, "deal_type": "BEST_DEAL", "is_prime_exclusive": false, "starts_at": "2023-08-14T07:00:08.270Z", "ends_at": "2023-08-21T06:45:08.270Z", "type": "multi_item", "title": "Avia Anchor SR Mesh Slip On Black Non Slip Shoes for Women, Comfortable Water Resistant Womens Food Service Sneakers - Black, Blue, or White Med or Wide Restaurant, Slip Resistant Work Shoes Women", "image": "https://m.media-amazon.com/images/I/3195IpEIRpL._SY500_.jpg", "deal_price": 39.98, "list_price": {"value": 59.98, "currency": "USD", "symbol": "$", "raw": "59.98", "name": "List Price"}, "current_price": {"value": 39.98, "currency": "USD", "symbol": "$", "raw": "39.98", "name": "Current Price"}, "merchant_name": "Galaxy Active", "free_shipping": false, "is_prime": true, "is_map": false, "deal_id": "34f3da97", "seller_id": "A3GMJQO0HY62S", "description": "Avia Anchor SR Mesh Slip On Black Non Slip Shoes for Women, Comfortable Water Resistant Womens Food Service Sneakers - Black, Blue, or White Med or Wide Restaurant, Slip Resistant Work Shoes Women", "rating": 4.16, "ratings_total": 1148, "old_price": 59.98, "currency": "USD"}

In this data schema:

class Price(pw.Schema):
    value: float
    currency: str
    symbol: str
    raw: str
    name: str

class DealResult(pw.Schema):
    position: int
    link: str
    asin: str
    is_lightning_deal: bool
    deal_type: str
    is_prime_exclusive: bool
    starts_at: str
    ends_at: str
    type: str
    title: str
    image: str
    deal_price: Price
    list_price: Price
    current_price: Price
    merchant_name: str
    free_shipping: bool
    is_prime: bool
    is_map: bool

I tried to read the above jsonlines:

sales_data = pw.io.jsonlines.read(
    data_dir,
    schema=schema,
    mode="streaming",
    autocommit_duration_ms=50,
)

The error I got:

Read data parsed unsuccessfully. field deal_price with no JsonPointer path specified is absent in

If you do not specify the schema parameter at all, Pathway does not automatically map all the fields present in the JSON Lines file.

Expected outcome:

Ideally, the connector would map everything automatically to table columns when I do not specify any specific fields to extract. If I specify only list_price, then the LLM App should create a table with only list_price.
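One possible workaround, pending native support, is to preprocess each JSON line and flatten nested objects into dotted column names so that a flat schema applies (a plain-Python sketch; the flatten helper is hypothetical, not a Pathway API):

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts into a single level with dotted keys."""
    out = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, prefix=name + "."))
        else:
            out[name] = value
    return out

line = '{"deal_price": 39.98, "list_price": {"value": 59.98, "currency": "USD"}}'
print(flatten(json.loads(line)))
# {'deal_price': 39.98, 'list_price.value': 59.98, 'list_price.currency': 'USD'}
```

Flattened this way, each nested price field becomes an ordinary top-level column (list_price.value: float, etc.) in the schema.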

how to change sync mode from incremental to full refresh

This is the code I am trying to execute:

import pathway as pw

issues_table = pw.io.airbyte.read(
    "./connections/github.yaml",
    streams=["tags"],
)

pw.io.jsonlines.write(issues_table, "issues.jsonlines")

pw.run()

And this is the error I am receiving:

  Traceback (most recent call last):
    File "/home/hisham/Documents/GitHub/github-service/github.py", line 3, in <module>
      issues_table = pw.io.airbyte.read(
    File "/usr/local/lib/python3.10/dist-packages/pathway/io/airbyte/__init__.py", line 280, in read
      raise ValueError(f"Stream {name} doesn't support 'incremental' sync mode")
  ValueError: Stream tags doesn't support 'incremental' sync mode

How do I change the sync mode?

Groupby / ixref tutorial enhancement

Is your feature request related to a problem? Please describe.
The examples in the tutorial at https://pathway.com/developers/user-guide/data-transformation/indexing-grouped-tables don't mention the name / identifier of the employee in any column in the source data. This makes it harder to tell apart groupby results (aggregates) from source data.

Describe the solution you'd like
Update the tutorial, e.g., add first names of employees as an extra column in input example tables (Alice, Bob, Charlie, etc.). Leaving decisions / priorities to @olruas.

Describe alternatives you've considered
n/a.

Additional context
This could also make the tutorial more immediately relatable when comparing to SQL-windowing syntax like in #19.

support datetime + formatting in connectors

Is your feature request related to a problem? Please describe.
Connectors (python connector, jsonlines, kafka...) pass datetime values as str.

Describe the solution you'd like
Add support for:

  • formatting auxiliary info in schemas and/or pathway types
  • connectors doing basic preprocessing using schema auxiliary info

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Add specialized Pointer type.

Is your feature request related to a problem? Please describe.
Right now Pathway tracks just the type Pointer, without the types of the keys the pointer was built from.

Describe the solution you'd like
It should at least track the types used to construct the pointer, e.g. Pointer[int, str]. This way it will have (at least) a poor man's type verification of key columns, and it would catch errors such as:

  • concatenating/update_cells between tables with incompatible keys;
  • ix_ref'ing into tables using the wrong columns as keys.
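For illustration only, the proposed Pointer[int, str] could be modeled with standard typing generics along these lines (a hypothetical sketch, not the Pathway API):

```python
from typing import Generic, Tuple, TypeVar

# KeyTypes stands for the tuple of column types the pointer was built from.
KeyTypes = TypeVar("KeyTypes", bound=tuple)

class TypedPointer(Generic[KeyTypes]):
    """Hypothetical pointer that remembers the key values it was built from."""
    def __init__(self, *key_values):
        self.key_values = key_values

# A pointer keyed by an (int, str) pair, i.e. Pointer[int, str] in the proposal:
p: TypedPointer[Tuple[int, str]] = TypedPointer(1, "a")
```

With the key types in the annotation, a static checker could flag joins or ix_ref calls whose key columns disagree.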

[QUESTION] Why does the `asof_join` line take so long to spin up?

I noticed that when running in Colab building up a compute graph for an asof_join takes several seconds, regardless of table size.

In the example below, taken from API documentation, it takes 4s, then if you duplicate the line in the cell, it takes 8s, etc.

Why is this the case? Does this only happen in interactive mode?


# -*- coding: utf-8 -*-
"""Colab_test_asof_join.ipynb

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/drive/1pWox7vvDoSuohRZ1EpWR2QDGSRSOmgsV
"""

!pip install pathway

import pathway as pw

t1 = pw.debug.table_from_markdown(
    '''
    value | event_time | __time__
      2   |      2     |     4
      3   |      5     |     6
      4   |      1     |     8
      5   |      7     |    14
'''
)
t2 = pw.debug.table_from_markdown(
    '''
    value | event_time | __time__
      42  |      1     |     2
       8  |      4     |    10
'''
)

result_join = t1.join(t2, t1.event_time == t2.event_time, how=pw.JoinMode.LEFT).select(event_time=t1.event_time)

result_asof_join = t1.asof_join(
    t2, t1.event_time, t2.event_time, how=pw.JoinMode.LEFT
).select(
    left_value=t1.value,
    right_value=t2.value,
    left_time=t1.event_time,
    right_time=t2.event_time,
)

[Bug]: RuntimeError: exception in Python subject: KeyError: 'data'

Steps to reproduce

Getting an error when trying to run the Airbyte showcase example from here: https://pathway.com/developers/showcases/etl-python-airbyte

Relevant log output

Traceback (most recent call last):
  File "/Users/vikassinghvi/Documents/GitHub/veloraapp/services/github_issues.py", line 53, in <module>
    pw.run()
  File "/Users/vikassinghvi/Documents/GitHub/veloraapp/services/venv/lib/python3.11/site-packages/pathway/internals/runtime_type_check.py", line 19, in with_type_validation
    return beartype.beartype(f)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<@beartype(pathway.internals.run.run) at 0x1478f8360>", line 129, in run
  File "/Users/vikassinghvi/Documents/GitHub/veloraapp/services/venv/lib/python3.11/site-packages/pathway/internals/run.py", line 47, in run
    ).run_outputs()
      ^^^^^^^^^^^^^
  File "/Users/vikassinghvi/Documents/GitHub/veloraapp/services/venv/lib/python3.11/site-packages/pathway/internals/graph_runner/__init__.py", line 103, in run_outputs
    self.run_nodes(self._graph.global_scope.output_nodes, after_build=after_build)
  File "/Users/vikassinghvi/Documents/GitHub/veloraapp/services/venv/lib/python3.11/site-packages/pathway/internals/graph_runner/__init__.py", line 79, in run_nodes
    self._run(all_nodes, after_build=after_build)
  File "/Users/vikassinghvi/Documents/GitHub/veloraapp/services/venv/lib/python3.11/site-packages/pathway/internals/graph_runner/__init__.py", line 179, in _run
    return api.run_with_new_graph(
           ^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: exception in Python subject: KeyError: 'data'

What did you expect to happen?

Expected to pull the commits into a jsonlines file, as demonstrated.

Version

0.8.3

Docker Versions (if used)

No response

OS

MacOS

On which CPU architecture did you run Pathway?

None

Add strict key checking mode

Is your feature request related to a problem? Please describe.
It is easy to make a promise that is later broken. This might lead to a runtime error further down the line.

Describe the solution you'd like
Add a mode that would check at runtime:

  • every universe equality/subset/disjointness promise
  • that keys produced by with_id_from are in fact disjoint
  • each -1 in a row is matched with a preceding +1
  • ... (probably more usages)

Allow sending columns with raw bytes in the python connector

Is your feature request related to a problem? Please describe.
Cannot stream a table with multiple columns via a Python connector, when one of them is of type bytes. For example:

message = {"col1": "foo", "col2": some_bytes}
self.next_json(message)

doesn't work, as bytes are not JSON serializable.
self.next_bytes is not useful either, as it expects a JSON string encoded as bytes.

Some workaround is to use base64 to serialize/deserialize, but it's not ideal.
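The base64 workaround mentioned above could look like this (a sketch; `message` and the column names follow the snippet above, and the producer/consumer split is hypothetical):

```python
import base64
import json

some_bytes = b"\x00\x01binary"

# Producer side: base64-encode the bytes column so the row is JSON-serializable.
message = {"col1": "foo", "col2": base64.b64encode(some_bytes).decode("ascii")}
payload = json.dumps(message)  # now safe to hand to a JSON-based sender

# Consumer side: decode the column back to raw bytes.
row = json.loads(payload)
restored = base64.b64decode(row["col2"])
assert restored == some_bytes
```

The cost is the ~33% size overhead of base64 and an extra decode step downstream, which is why the issue asks for native bytes support instead.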

[Bug]: Adding schemas loses column properties.

Steps to reproduce

The schema resulting from the operation:

schema1 | schema2

where both these schemas have nontrivial properties, loses those properties.

Relevant log output

n/a

What did you expect to happen?

Properties preserved

Version

0.8.0

Docker Versions (if used)

No response

OS

MacOS

On which CPU architecture did you run Pathway?

None

[Bug?]: Missing Sharepoint support?

Steps to reproduce

  1. Run in Python: from pathway.xpacks.connectors import sharepoint
  2. Observe error: ModuleNotFoundError: No module named 'pathway.xpacks.connectors'

What did you expect to happen?

According to the changelog (here and here), there should be support for Sharepoint via pathway.xpacks.connectors.sharepoint.read. However, pw.xpacks.connectors does not exist (in 0.8.5). Sharepoint support is advertised in multiple places, but I cannot find it anywhere (also not in pathway.io). Did I overlook something? Or is it a documentation bug or a code bug?

Version

0.8.5

OS

Linux

On which CPU architecture did you run Pathway?

x86-64

Code clean-up: prompt template string elements repeated across code locations

Is your feature request related to a problem? Please describe.
The following lines of code:
https://github.com/pathwaycom/pathway/blob/main/python/pathway/xpacks/llm/prompts.py#L121
https://github.com/pathwaycom/pathway/blob/main/python/pathway/xpacks/llm/question_answering.py#L15
https://github.com/pathwaycom/pathway/blob/main/python/pathway/xpacks/llm/prompts.py#L35
define string constants representing no-answer-found in separate ways. I am wondering if there is a purpose in having a default argument value which is overridden by its primary wrapper.

Describe the solution you'd like
This could merit clean-up.

Describe alternatives you've considered
Leave as is.

Additional context
N/A

[Bug]: Reproduction of Twitter's Custom Python Connector code encounters Auth issues

Steps to reproduce

I've used the code directly from the Python Custom Connector example, as given below:
Code

There's a constant authentication issue when I use the BEARER_TOKEN generated directly from Twitter's API dashboard.

I'm using the default project and the default app generated by Twitter upon signing up for the free tier.

Relevant log output

When using the BEARER TOKEN, the following is the log output on the dashboard:

Stream encountered HTTP error: 403                                                                                                                    
                      ERROR    HTTP error response text: {"client_id":"28689517","detail":"When authenticating requests to the Twitter API v2 endpoints, you must use keys and       
                               tokens from a Twitter developer App that is attached to a Project. You can create a project via the developer                                         
                               portal.","registration_url":"https://developer.twitter.com/en/docs/projects/overview","title":"Client Forbidden","required_enrollment":"Appropriate   
                               Level of API Access","reason":"client-not-enrolled","type":"https://api.twitter.com/2/problems/client-forbidden"}

What did you expect to happen?

Expected to run without any AUTH issues

Version

0.9.0

Docker Versions (if used)

No response

OS

Linux

On which CPU architecture did you run Pathway?

x86-64

Document how to run Airbyte connectors (via Airbyte Serverless) with Pathway inside Docker

Is your feature request related to a problem? Please describe.
Airbyte Serverless which is used inside https://pathway.com/developers/api-docs/pathway-io/airbyte relies on Docker.
Dockerizing an app that itself uses Docker means running Docker in Docker ("DinD"), which needs specific setup flags: https://medium.com/@shivam77kushwah/docker-inside-docker-e0483c51cc2c
These are not explained in the Pathway documentation, causing questions and confusion.

Describe the solution you'd like
Choose the preferred resolution for DinD.
Mention that dockerization needs special attention (inside the pw.io.airbyte docs).
Link to a tutorial with this resolution, e.g. include it in a guide or add it as a subsection to https://pathway.com/developers/showcases/etl-python-airbyte.

Describe alternatives you've considered
n/a

Additional context
This might also help workaround situations like #13.

Warn about missing `pw.run()`

Is your feature request related to a problem? Please describe.
It's easy to forget to call pw.run(), which results in no computation taking place and can be confusing.

Describe the solution you'd like
There should be a warning on exit if nodes were added to the computation graph after the last pw.run() call (including the case where there was no call at all).

[Bug]: Error with subclassing behavior in pw.Schema

Steps to reproduce

If you have a schema that has an optional parameter, and you create a subclass of that schema, the optional parameter becomes non-optional.

Example:

class SummarizeQuerySchema(pw.Schema):
    text_list: list[str]
    model: str | None = pw.column_definition(default_value="gpt-3.5-turbo")
class SummarizeQuerySchemaAPI(SummarizeQuerySchema):
    openai_api_key: str

In this case, model, which was optional, becomes non-optional in the second schema.

Relevant log output

I get an error message stating `model` is needed during runtime.

What did you expect to happen?

It should keep the initial behavior after inheriting.

Version

0.8.2

Docker Versions (if used)

No response

OS

Linux

On which CPU architecture did you run Pathway?

x86-64

Add lead / lag / first_value / last_value - like reducers.

I tried to implement some functionality that would give me the previously seen value inside a window, and I realised that it is currently quite cumbersome to do in Pathway.

I think this feature will be useful for anyone trying to do not-so-trivial aggregations.

[Bug]: Json type does not behave like a transparent wrapper for the value

Steps to reproduce

When passing Json-unpacked float values to the geopy distance function, I received the following error:

    return distance.distance([lat, lon], [lat2, lon2]).meters
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/przemek/miniconda3/envs/202306-pathway/lib/python3.11/site-packages/geopy/distance.py", line 540, in __init__
    super().__init__(*args, **kwargs)
  File "/home/przemek/miniconda3/envs/202306-pathway/lib/python3.11/site-packages/geopy/distance.py", line 276, in __init__
    kilometers += self.measure(a, b)
                  ^^^^^^^^^^^^^^^^^^
  File "/home/przemek/miniconda3/envs/202306-pathway/lib/python3.11/site-packages/geopy/distance.py", line 556, in measure
    a, b = Point(a), Point(b)
           ^^^^^^^^
  File "/home/przemek/miniconda3/envs/202306-pathway/lib/python3.11/site-packages/geopy/point.py", line 175, in __new__
    return cls.from_sequence(seq)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/przemek/miniconda3/envs/202306-pathway/lib/python3.11/site-packages/geopy/point.py", line 472, in from_sequence
    return cls(*args)
           ^^^^^^^^^^
  File "/home/przemek/miniconda3/envs/202306-pathway/lib/python3.11/site-packages/geopy/point.py", line 188, in __new__
    _normalize_coordinates(latitude, longitude, altitude)
  File "/home/przemek/miniconda3/envs/202306-pathway/lib/python3.11/site-packages/geopy/point.py", line 57, in _normalize_coordinates
    latitude = float(latitude or 0.0)
               ^^^^^^^^^^^^^^^^^^^^^^
TypeError: float() argument must be a string or a real number, not 'Json'

Relevant log output

-

What did you expect to happen?

I expected the Json-unpacked float to be a transparent wrapper for the float value.
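What "transparent wrapper" would mean here can be illustrated with a toy stand-in (ToyJson is not Pathway's Json class; the point is that a wrapper defining __float__ makes calls like geopy's float(latitude) succeed):

```python
class ToyJson:
    """Toy value wrapper; wrappers without __float__ break float() calls."""
    def __init__(self, value):
        self.value = value

    def __float__(self):
        # Delegating to the wrapped value makes the wrapper transparent
        # to code that calls float() on it.
        return float(self.value)

lat = ToyJson(52.23)
assert float(lat) == 52.23
```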

Version

0.8.0

Docker Versions (if used)

No response

OS

Linux

On which CPU architecture did you run Pathway?

None

[Bug]: pw.debug.compute_and_print() + pathway processes

Steps to reproduce

When run under pathway spawn -n 8, tables are printed weirdly, e.g.:

 | cnt
 | cnt
 | cnt
            | cnt
^PWSRT42... | 4216540
 | cnt
 | cnt
 | cnt
 | cnt

Relevant log output

-

What did you expect to happen?

 | cnt
^PWSRT42... | 4216540

Version

0.8.0

Docker Versions (if used)

No response

OS

Linux

On which CPU architecture did you run Pathway?

None

I am facing an issue while getting column names

What is your question or problem? Please describe.
I am facing a problem when I try to get column names using pw.debug.compute_and_print(T1.column_names()).

Describe what you would like to happen
Actually, I want to check whether a specific column exists in the Table; if it exists, I want to carry out some logic.

[Bug]: Simple processing of 1M rows takes more than 10 minutes

Steps to reproduce

I have a simple Pathway function that takes more than 30 seconds to run. I am curious if this is expected or there is something wrong? How long does this take for you?

The code is:

import pathway as pw

class MySchema(pw.Schema):
    datetime: str
    flag1: bool
    val1: float
    val2: str
    val3: int
    
def run():
    data = pw.io.csv.read('data.csv', schema=MySchema, mode='static')
    clean_data = data.select(val1=pw.this.val1, val2=pw.this.val2, datetime=pw.this.datetime.dt.strptime(fmt='%Y-%m-%dT%H:%M:%S.%f'))
    pw.debug.compute_and_print(clean_data, n_rows=5)
  
run()

Just create a CSV file with 1M rows to test this.

Now, I am trying to understand why this is taking so long. What is the best way to profile Pathway performance? Also, what is the best way to load data with the DateTimeNaive datatype from CSV? The logs from previous runs tell me that parsing DateTimeNaive from an external data source is not supported.
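For reference, the strptime format used in the select above parses timestamps of the following shape (a plain-Python check with a made-up sample value, independent of Pathway):

```python
from datetime import datetime

# Sample timestamp matching the '%Y-%m-%dT%H:%M:%S.%f' format from the snippet.
sample = "2024-03-01T12:34:56.789000"
parsed = datetime.strptime(sample, "%Y-%m-%dT%H:%M:%S.%f")
assert parsed.year == 2024 and parsed.microsecond == 789000
```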

Relevant log output

There are no errors, just a Pathway run to completion

What did you expect to happen?

I expected that this operation would take a few hundred ms at best? Or maybe a second.

Version

0.8.3

Docker Versions (if used)

No response

OS

Linux

On which CPU architecture did you run Pathway?

x86-64

Make external dependencies optional

Hello, do you think it's possible to make some of the external dependencies optional for Pathway?

Things like:

  • google-api-python-client
  • Office365-REST-Python-Client

or anything else that the end users might not actually need to run Pathway? In my personal case, I never intend to use google-api or Office365 and I don't want to bring these dependencies in my Python project. I think it makes sense if these become optional / installable separately?
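One common packaging pattern for this is optional-dependency extras in pyproject.toml. A sketch (the extras names `gdrive` and `sharepoint` are hypothetical, not existing Pathway extras):

```toml
[project.optional-dependencies]
gdrive = ["google-api-python-client"]
sharepoint = ["Office365-REST-Python-Client"]
```

Users would then opt in with e.g. `pip install pathway[sharepoint]`, while a bare `pip install pathway` would skip these dependencies.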

[Bug]: pathway monitoring + pathway processes

Steps to reproduce

When run under pathway spawn -n 8, the Pathway monitoring feature doesn't work properly.

It runs 8 parallel printing processes at once, which obfuscates the output.

Relevant log output

-

What did you expect to happen?

Version

0.8.0

Docker Versions (if used)

No response

OS

Linux

On which CPU architecture did you run Pathway?

None

[Bug]: Filewriter doesn't seem to write any data to json while using custom connector

Steps to reproduce

I'm trying to connect Reddit with Pathway.

This is the code:

```python
class RedditClient:
    def __init__(self, keyword, client_id=<Reddit_client_id>, client_secret=<Reddit_secret>, user_agent='pathway::1.0 (by /u/)'):
        self.reddit = praw.Reddit(client_id=client_id, client_secret=client_secret, user_agent=user_agent)
        self.keyword = keyword

    def get_top_comments(self, limit=10):
        submissions = self.reddit.subreddit('all').search(self.keyword, sort='top', limit=limit)
        top_comments = []
        for submission in submissions:
            submission.comments.replace_more(limit=0)
            for comment in submission.comments:
                if isinstance(comment, praw.models.Comment):
                    top_comments.append(comment.body)
        return top_comments

    def read(self):
        top_comments = self.get_top_comments()
        for idx, comment in enumerate(top_comments, start=1):
            yield {'key': idx, 'text': comment}

    def write(self, data):
        raise NotImplementedError("This connector does not support writing")

    def delete(self, key):
        raise NotImplementedError("This connector does not support deleting")

    def update(self, key, data):
        raise NotImplementedError("This connector does not support updating")


class RedditSubject(pw.io.python.ConnectorSubject):
    _reddit_client: RedditClient

    def __init__(self, keyword) -> None:
        super().__init__()
        self._reddit_client = RedditClient(keyword)

    def run(self) -> None:
        top_comments = self._reddit_client.get_top_comments()
        for comment in top_comments:
            print("Emitting comment:", comment)  # Debug statement
            self.next_json({"text": comment})

    def on_stop(self) -> None:
        pass  # RedditClient doesn't have a disconnect method


class InputSchema(pw.Schema):
    key: int = pw.column_definition(primary_key=True)
    text: str


input = pw.io.python.read(
    RedditSubject("python"),
    schema=InputSchema,
    autocommit_duration_ms=1000,
)
pw.io.jsonlines.write(input, "table.jsonlines")
pw.run()
```

Reddit client_id and secret can be obtained from here: https://www.reddit.com/prefs/apps

Relevant log output

LOGS                                                                                                                                                                                                              
                      ERROR    Read data parsed unsuccessfully. field key with no JsonPointer path specified is absent in {"text":"[deleted]"}                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                    
  [04/02/24 17:20:06] ERROR    Read data parsed unsuccessfully. field key with no JsonPointer path specified is absent in {"text":"What is this? A reddit thread that's being generally pro-vegan? \n\nWhat's next, civil discourse with people adjusting their views when presented with new info?!?"}                                                                                                                        
                      ERROR    Read data parsed unsuccessfully. field key with no JsonPointer path specified is absent in {"text":"Vegans

What did you expect to happen?

The data should've been written to the json file
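The parser errors above suggest that every emitted message must contain each field declared in InputSchema, including the primary key. A pure-Python sketch of messages that would satisfy the schema (illustrative only; `build_messages` is a hypothetical helper):

```python
# InputSchema declares both `key` (primary key) and `text`, so every
# emitted JSON message must carry both fields.
def build_messages(comments):
    return [{"key": idx, "text": text} for idx, text in enumerate(comments, start=1)]

msgs = build_messages(["first comment", "second comment"])
assert msgs[0] == {"key": 1, "text": "first comment"}
```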


Version

0.8.5

Docker Versions (if used)

No response

OS

Linux

On which CPU architecture did you run Pathway?

x86-64

[Bug]: Slow pip dependencies - wheels building for python-sat and others

Steps to reproduce

Today, fresh Pathway installations from pip take long minutes rather than the usual ~35 seconds (Colab timings).
I would propose fixing this as a priority.

Relevant log output

Pip takes most time on: `Building wheels for collected packages: python-sat`.

What did you expect to happen?

No wheels built from sources

Version

0.8.4

Docker Versions (if used)

No response

OS

Linux

On which CPU architecture did you run Pathway?

None

[Bug]: Tutorial on Defining behaviors for streaming windows

Steps to reproduce

I tried to reproduce your tutorial: https://pathway.com/developers/user-guide/exploring-pathway/from-jupyter-to-deploy/#part-2-from-static-data-exploration-to-interactive-dashboard-prototyping

I only copied and pasted your code, and I obtain an input-type error when I execute the cell below "Please add the behavior argument to window definition as in the code snippet below".

Python version: 3.11.8
Pathway version: 0.8.2

Relevant log output

~~~python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[23], line 3
      1 minute_20_stats = (
      2     data
----> 3     .windowby(
      4         pw.this.t, 
      5         window=pw.temporal.sliding(
      6             hop=datetime.timedelta(minutes=1),
      7             duration=datetime.timedelta(minutes=20)
      8         ),
      9         # Wait until the window collected all data before producing a result
     10         behavior=pw.temporal.exactly_once_behavior(),
     11         instance=pw.this.ticker
     12     )
     13     .reduce(
     14         ticker=pw.this._pw_instance,
     15         t=pw.this._pw_window_end,
     16         volume=pw.reducers.sum(pw.this.volume),
     17         transact_total=pw.reducers.sum(pw.this.volume * pw.this.vwap),
     18         transact_total2=pw.reducers.sum(pw.this.volume * pw.this.vwap**2)
     19     )
     20     .with_columns(
     21         vwap=pw.this.transact_total / pw.this.volume
     22     )
     23     .with_columns(
     24         vwstd=(pw.this.transact_total2 / pw.this.volume - pw.this.vwap**2)**0.5
     25     ).with_columns(
     26         bollinger_upper=pw.this.vwap + 2 * pw.this.vwstd,
     27         bollinger_lower=pw.this.vwap - 2 * pw.this.vwstd
     28     )
     29 )
     31 minute_1_stats = (
     32     data.windowby(
     33         pw.this.t,
   (...)
     44     .with_columns(vwap=pw.this.transact_total / pw.this.volume)
     45 )

File ~/miniconda3/envs/mixtral_ollama/lib/python3.11/site-packages/pathway/internals/trace.py:129, in trace_user_frame.<locals>._pathway_trace_marker(*args, **kwargs)
    127     return func(*args, **kwargs)
    128 except Exception as e:
--> 129     _reraise_with_user_frame(e)

File ~/miniconda3/envs/mixtral_ollama/lib/python3.11/site-packages/pathway/internals/trace.py:109, in _reraise_with_user_frame(e, trace)
    106 if user_frame is not None:
    107     add_pathway_trace_note(e, user_frame)
--> 109 raise e

File ~/miniconda3/envs/mixtral_ollama/lib/python3.11/site-packages/pathway/internals/desugaring.py:341, in desugar.<locals>.wrapper(*args, **kwargs)
    334     args = tuple(
    335         desugaring_context._desugaring.eval_expression(arg) for arg in args
    336     )
    337     kwargs = {
    338         key: desugaring_context._desugaring.eval_expression(value)
    339         for key, value in kwargs.items()
    340     }
--> 341 return func(*args, **kwargs)

File ~/miniconda3/envs/mixtral_ollama/lib/python3.11/site-packages/pathway/internals/arg_handlers.py:20, in arg_handler.<locals>.wrapper.<locals>.inner(*args, **kwargs)
     17 @wraps(func)
     18 def inner(*args, **kwargs):
     19     args, kwargs = handler(*args, **kwargs)
---> 20     return func(*args, **kwargs)

File ~/miniconda3/envs/mixtral_ollama/lib/python3.11/site-packages/pathway/internals/arg_handlers.py:20, in arg_handler.<locals>.wrapper.<locals>.inner(*args, **kwargs)
     17 @wraps(func)
     18 def inner(*args, **kwargs):
     19     args, kwargs = handler(*args, **kwargs)
---> 20     return func(*args, **kwargs)

File ~/miniconda3/envs/mixtral_ollama/lib/python3.11/site-packages/pathway/internals/runtime_type_check.py:19, in check_arg_types.<locals>.with_type_validation(*args, **kwargs)
     14 """Hides beartype dependency by reraising beartype exception as TypeError.
     15 
     16 Should not be needed after resolving https://github.com/beartype/beartype/issues/234
     17 """
     18 try:
---> 19     return beartype.beartype(f)(*args, **kwargs)
     20 except beartype.roar.BeartypeCallHintParamViolation as e:
     21     raise TypeError(e) from None

File <@beartype(pathway.stdlib.temporal._window.windowby) at 0x14cdad6c0>:108, in windowby(__beartype_func, __beartype_conf, __beartype_get_violation, __beartype_object_140551755614240, __beartype_object_140551766098816, __beartype_object_140551790108176, __beartype_object_5625875136, __beartype_object_5638154688, __beartype_object_140551766513840, *args, **kwargs)

File ~/miniconda3/envs/mixtral_ollama/lib/python3.11/site-packages/pathway/stdlib/temporal/_window.py:910, in windowby(self, time_expr, window, behavior, instance)
    858 @trace_user_frame
    859 @desugar
    860 @arg_handler(handler=shard_deprecation)
   (...)
    869     instance: pw.ColumnExpression | None = None,
    870 ) -> pw.GroupedTable:
    871     """
    872     Create a GroupedTable by windowing the table (based on `expr` and `window`),
    873     optionally with `instance` argument.
   (...)
    908     1        | 1     | 16    | 2
    909     """
--> 910     return window._apply(self, time_expr, behavior, instance)

File ~/miniconda3/envs/mixtral_ollama/lib/python3.11/site-packages/pathway/internals/runtime_type_check.py:19, in check_arg_types.<locals>.with_type_validation(*args, **kwargs)
     14 """Hides beartype dependency by reraising beartype exception as TypeError.
     15 
     16 Should not be needed after resolving https://github.com/beartype/beartype/issues/234
     17 """
     18 try:
---> 19     return beartype.beartype(f)(*args, **kwargs)
     20 except beartype.roar.BeartypeCallHintParamViolation as e:
     21     raise TypeError(e) from None

File <@beartype(pathway.stdlib.temporal._window._SlidingWindow._apply) at 0x14cdac180>:98, in _apply(__beartype_func, __beartype_conf, __beartype_get_violation, __beartype_object_140551755614240, __beartype_object_140551766098816, __beartype_object_5625875136, __beartype_object_5638154688, __beartype_object_140551766513840, *args, **kwargs)

File ~/miniconda3/envs/mixtral_ollama/lib/python3.11/site-packages/pathway/stdlib/temporal/_window.py:340, in _SlidingWindow._apply(self, table, key, behavior, instance)
    332 @check_arg_types
    333 def _apply(
    334     self,
   (...)
    338     instance: pw.ColumnExpression | None,
    339 ) -> pw.GroupedTable:
--> 340     check_joint_types(
    341         {
    342             "time_expr": (key, TimeEventType),
    343             "window.hop": (self.hop, IntervalType),
    344             "window.duration": (self.duration, IntervalType),
    345             "window.origin": (self.origin, TimeEventType),
    346         }
    347     )
    349     key_dtype = eval_type(key)
    350     assign_windows = self._window_assignment_function(key_dtype)

File ~/miniconda3/envs/mixtral_ollama/lib/python3.11/site-packages/pathway/stdlib/temporal/utils.py:79, in check_joint_types(parameters)
     75 else:
     76     expected_types_string = " or ".join(
     77         repr(tuple(ex_types.values())) for ex_types in expected_types
     78     )
---> 79     raise TypeError(
     80         f"Arguments ({', '.join(parameters.keys())}) have to be of types "
     81         + f"{expected_types_string} but are of types {tuple(types.values())}."
     82     )

TypeError: Arguments (time_expr, window.hop, window.duration) have to be of types (INT, INT, INT) or (FLOAT, FLOAT, FLOAT) or (DATE_TIME_NAIVE, DURATION, DURATION) or (DATE_TIME_UTC, DURATION, DURATION) but are of types (INT, DURATION, DURATION).
Occurred here:
    Line: .windowby(
    File: /var/folders/2l/p4vvj_3j3fq2h1l38s2dgtq00000gn/T/ipykernel_9280/4248455004.py:3
~~~

What did you expect to happen?

A tutorial up to date ;)

Version

0.8.2

Docker Versions (if used)

No response

OS

MacOS

On which CPU architecture did you run Pathway?

x86-64
