Comments (10)
Looks like one of the pipeline steps has introduced spurious UTF escaping of your characters. Easily fixable but likely to be an annoyance.
To help pinpoint the cause, could you replace the pw.io.csv.write line with pw.debug.compute_and_print(documents), look at your terminal output, and see if the problem persists?
from pathway.
Yeah I will try and update you
from pathway.
ﻣﻣﻠﻛﺔ\n\nﺗﻛون\n\nاﻟﺗﻲ\n\nاﻟدوﻟﯾﺔ\n\nواﻻﺗﻔﺎﻗﯾﺎت\n\nواﻟﻣﻌﺎھدات\n\nاﻷﻧظﻣﺔ\n\nﺑﮫ\n\nﺗﻘﺿﻲ\n\nﻣﺎ\n\nﻣراﻋﺎة\n\nﻣﻊ\n\nﻋﻠﯾﮭم.\n\nأو\n\nﻣﻧﮭم\n\nﻛﺎﻧت ﺳواءً\n\nﺗﻘﺎم\n\nاﻟﺗﻲ\n\nاﻟدﻋﺎوى\n\nﻓﻲ ،\n\nﺟﻧﺎﺋﯾﺔ\n\nﻏﯾر\n\nﻣﺎﻟﯾﺔ\n\nﻗﺿﺎﯾﺎ\n\nﻓﻲ\n\nاﻟﻘﺿﺎﺋﯾﺔ\n\nاﻟﺗﻛﺎﻟﯾف\n\nاﺳﺗﺣﻘﺎق\n\nوﻗت\n\nواﻟﻣوﻗوﻓون\n\nاﻟﻣﺳﺟوﻧون\n\n.1\n\nﻋﻣل.\n\nﻋﻘود\n\nﻋن\n\nاﻟﻧﺎﺷﺋﺔ\n\nﺑﻣﺳﺗﺣﻘﺎﺗﮭم\n\n؛ ﻟﻠﻣطﺎﻟﺑﺔ\n\nﻋﻧﮭم\n\nواﻟﻣﺳﺗﺣﻘون\n\nﻣﻧﮫ\n\nواﻟﻣﺳﺗﺛﻧون\n\nاﻟﻌﻣل\n\nﺑﻧظﺎم\n\nاﻟﻣﺷﻣوﻟون\n\nاﻟﻌﻣﺎل\n\n.2\n\nاﻟﺣﻛوﻣﯾﺔ.\n\nواﻷﺟﮭزة\n\nاﻟوزارات\n\n.3\n\nﺑذﻟك.\n\nاﻟﺧﺎﺻﺔ\n\nواﻟﻘواﻋد\n\nاﻹﺟراءات\n\nاﻟﻼﺋﺣﺔ\n\nوﺗﺣدد\n\nﻋﺷرة\n\nاﻟﺛﺎﻣﻧﺔ\n\nاﻟﻘﺿﺎﺋﯾﺔ.\n\nاﻟﺗﻛﺎﻟﯾف\n\nﺑدﻓﻊ ﻋﻠﯾﮫ\n\nاﻟﻣﺣﻛوم\n\nﻓﯾﻠزم\n\nاﻟﻘﺿﺎﺋﯾﺔ\n\nاﻟﺗﻛﺎﻟﯾف\n\nﻣن\n\nاﻟﻣُﻌﻔﻰ\n\nﻟﻣﺻﻠﺣﺔ\n\nاﻟدﻋوى\n\nﻓﻲ ﺣﻛم\n\nﺻدر\n\nإذا\n\n(،\n\nﻋﺷرة\n\nاﻟﺳﺎﺑﻌﺔ\n\nاﻟﻣﺎدة )\n\nﺑﮫ\n\nﺗﻘﺿﻲ\n\nﻣﺎ\n\nﻣراﻋﺎة\n\nﻣﻊ\n\nاﻟرد.\n\nﻣﺳوﻏﺎت\n\nﺗواﻓرت\n\nإذا\n\nِھﺎ\n\nوردّ\n\nﻋﻠﯾﮫ.\n\nواﻹﺷراف\n\nﻋﻣﻠﮫ\n\nاﻟﻘﺿﺎﺋﯾﺔ ،\n\nإﺟراءات\n\nﻋﺷرة\n\nاﻟﺗﺎﺳﻌﺔ\n\nاﻟﺳﻌودي.\n\nاﻟﻣرﻛزي\n\nاﻟﺑﻧك\n\nﻟدى\n\nاﻟﻣﺎﻟﯾﺔ\n\nوزارة\n\nﺟﺎري\n\nﺣﺳﺎب ﻓﻲ\n\nاﻟﻣﺣﺻﻠﺔ\n\nاﻟﻘﺿﺎﺋﯾﺔ\n\nاﻟﺗﻛﺎﻟﯾف\n\nﻣﺑﺎﻟﻎ\n\nﺗودع\n\nاﻟﻌﺷرون\n\nاﻟﺗﻛﺎﻟﯾف\n\nﺑﺗﺣﺻﯾل\n\nاﻟطﻠب -\n\nإﻟﯾﮭﺎ\n\nاﻟﻣﻘدم\n\nأو ،\n\nاﻟدﻋوى\n\nإﻟﯾﮭﺎ\n\nاﻟﻣرﻓوع\n\nاﻟﻣﺣﻛﻣﺔ\n\nﻓﻲ -\n\nاﻟﻣﺧﺗﺻﺔ\n\nاﻹدارة\n\nﻣﻧﮫ\n\nﺑﻘرار\n\nاﻟﻌدل\n\nوزﯾر\n\nﯾﺣدد\n\nواﻟﻌﺷرون\n\nاﻟﺣﺎدﯾﺔ\n\nوﻗواﻋد\n\nﻟﮫ\n\nاﻟﺗراﺧﯾص\n\nأﺣﻛﺎم\n\nاﻟﻼﺋﺣﺔ\n\nوﺗﺣدد\n\nاﻟﻧظﺎم.\n\nﻟﺗطﺑﯾﻖ\n\nاﻟﻣﺳﺎﻧدة\n\nﺑﺎﻷﻋﻣﺎل\n\nﺑﺎﻟﻘﯾﺎم\n\nاﻟﺧﺎص\n\nﻟﻠﻘطﺎع\n\nاﻟﺗرﺧﯾص\n\nاﻟﻌدل\n\nﻟوزﯾر\n\nواﻟﻌﺷرون\n\nاﻟﺛﺎﻧﯾﺔ\n\nاﻟوزراء.\n\nﻣﺟﻠس ﻣن\n\nﺑﻘرار\n\nوﺗﺻدر\n\nاﻟﻧظﺎم ،\n\nﺻدور\n\nﺗﺎرﯾﺦ\n\nﻣن\n\nﯾوﻣﺎً\n\nﺳﺗﯾن (\n\nﺧﻼل )\n\nاﻟﻼﺋﺣﺔ\n\nاﻟﻌدل\n\nوزارة\n\nﺗﻌد\n\nواﻟﻌﺷرون\n\nاﻟﺛﺎﻟﺛﺔ\n\nاﻟرﺳﻣﯾﺔ.\n\nاﻟﺟرﯾدة\n\nﻓﻲ\n\nﻧﺷره\n\nﺗﺎرﯾﺦ\n\nﻣن\n\nﯾوﻣﺎً\n\nوﺛﻣﺎﻧﯾن (\n\nﻣﺎﺋﺔ\n\nﺑﻌد )\n\nﺑﺎﻟﻧظﺎم\n\nﯾﻌﻣل\n\n2021 2021\n\nاﻟﻌﺪل اﻟﻌﺪل\n\nﻟﻮزارة ﻟﻮزارة\n\n© ©\n\nﻣﺤﻔﻮﻇﺔ ﻣﺤﻔﻮﻇﺔ\n\nاﻟﺤﻘﻮق اﻟﺤﻘﻮق\n\nﺟﻤﻴﻊ ﺟﻤﻴﻊ', pw.Json({'filetype': 'application/pdf', 'languages': ['eng'], 'links': [], 'page_number': 4})),)
When I set the mode to static and used pw.debug.compute_and_print(documents), the terminal output shows the Arabic text correctly. So how can I fix this in streaming mode when using pw.io.csv.write?
from pathway.
Hey @abdul756, streaming mode is only available when you run the app with pw.run().
In this case, it seems the parser is working correctly. If you want to dump the text into a file and keep the app running (so that when new content or a new file arrives, the new data is written to your CSV file), run your code in streaming mode by adding pw.run() at the end of the code.
So it will look like this:
documents = folder.select(text=parser(pw.this.data))
pw.io.csv.write(documents, "output_stream_en_7.csv")
pw.run()
You can run this in a notebook or in a regular Python file.
This will start the pipeline, which keeps running until you stop it. Once pw.run() is running, you will see the output file being created.
If you only want a one-time dump in static mode, you can run the following:
df = pw.debug.table_to_pandas(documents)
df.to_csv("outputs_en.csv")
This writes the content to a CSV file: the data is first converted to a Pandas DataFrame and then written to the file. It doesn't need pw.run(), since it runs statically, a single time.
from pathway.
In streaming mode the output does not come out correctly; it gives only escaped Unicode characters. You can refer to the file I attached to the issue.
from pathway.
In streaming mode the output does not come out correctly; it gives only escaped Unicode characters. You can refer to the file I attached to the issue.
Yes, I just replicated the issue with another file. Static mode works fine (refer to the df.to_csv snippet above), and internally the Pathway table also stores the data correctly in Arabic characters. However, writing it with the CSV writer turns the characters into Unicode escapes.
So you can use it in your app without any issues. In the meantime, we are investigating and will keep you updated.
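Until a fix lands, escaped text in a file that has already been written can usually be turned back into readable characters. The following is a minimal sketch using only the Python standard library. It assumes the CSV contains literal \uXXXX escape sequences (the attached file is not shown here, so the exact escape format is an assumption), and unescape_unicode is a hypothetical helper name, not part of Pathway.

```python
import re

# Matches a literal backslash-u escape such as \u0645 (assumed escape format).
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_unicode(text: str) -> str:
    """Replace literal \\uXXXX sequences with the characters they encode."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# Example: five escaped code points that spell an Arabic word.
escaped = r"\u0645\u0645\u0644\u0643\u0629"
print(unescape_unicode(escaped))
```

Text that contains no escapes passes through unchanged, so the helper can be applied to every cell. It only handles four-digit escapes, which covers the Arabic blocks but not characters outside the Basic Multilingual Plane.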
from pathway.
Sure thanks
from pathway.
@abdul756 thanks for reporting this. The problem will be fixed in the next release (it'll be released this week).
from pathway.
Hey @abdul756,
Pathway v0.11.0 was released today, and your problem should now be fixed. However, nobody on the team knows Arabic. Could you update your Pathway version and confirm that this is a satisfactory solution to your problem?
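One way to confirm the fix without reading Arabic is to check the output file mechanically. This is a minimal sketch, assuming the output is a UTF-8 CSV and that the earlier symptom was literal \uXXXX sequences in the file; check_csv_text is a hypothetical helper, not a Pathway function.

```python
import re
from pathlib import Path

# Literal backslash-u escapes left in the file (assumed symptom of the bug).
ESCAPE = re.compile(r"\\u[0-9a-fA-F]{4}")
# Arabic block plus Arabic Presentation Forms A and B (BOM excluded).
ARABIC = re.compile(r"[\u0600-\u06FF\uFB50-\uFDFF\uFE70-\uFEFC]")


def check_csv_text(path):
    """Count leftover escape sequences and real Arabic characters in a file."""
    text = Path(path).read_text(encoding="utf-8")
    return {
        "escape_sequences": len(ESCAPE.findall(text)),
        "arabic_characters": len(ARABIC.findall(text)),
    }
```

Calling check_csv_text("output_stream_en_7.csv") on the file written by the streaming snippet above should report zero escape sequences and a non-zero count of Arabic characters if the text was written correctly.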
from pathway.
I will test and update
from pathway.