
Comments (10)

dxtrous commented on June 9, 2024

Looks like one of the pipeline steps has introduced spurious UTF escaping of your characters. Easily fixable but likely to be an annoyance.

To help pinpoint the cause, could you replace the pw.io.csv.write line with pw.debug.compute_and_print(documents), look at your terminal output, and see if the problem persists?
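For anyone unsure what "spurious escaping" looks like, here is a minimal sketch using only the Python standard library. It is an illustration of the symptom, not a claim about what Pathway does internally; the sample word and the use of json are assumptions made for the example.

```python
import json

# Sample Arabic word ("مملكة"), written with explicit code points; chosen only for illustration.
text = "\u0645\u0645\u0644\u0643\u0629"

# ensure_ascii=True (the json default) replaces non-ASCII characters
# with literal backslash-u escape sequences.
escaped = json.dumps(text)

# ensure_ascii=False keeps the original characters.
unescaped = json.dumps(text, ensure_ascii=False)

print(escaped)    # pure ASCII, full of backslash-u sequences
print(unescaped)  # readable Arabic
```

If the terminal output looks like the second line while the CSV looks like the first, the escaping is happening in the write step rather than in the parser.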

from pathway.

abdul756 commented on June 9, 2024

Yeah I will try and update you


abdul756 commented on June 9, 2024

ﻣﻣﻠﻛﺔ\n\nﺗﻛون\n\nاﻟﺗﻲ\n\nاﻟدوﻟﯾﺔ\n\nواﻻﺗﻔﺎﻗﯾﺎت\n\nواﻟﻣﻌﺎھدات\n\nاﻷﻧظﻣﺔ\n\nﺑﮫ\n\nﺗﻘﺿﻲ\n\nﻣﺎ\n\nﻣراﻋﺎة\n\nﻣﻊ\n\nﻋﻠﯾﮭم.\n\nأو\n\nﻣﻧﮭم\n\nﻛﺎﻧت ﺳواءً\n\nﺗﻘﺎم\n\nاﻟﺗﻲ\n\nاﻟدﻋﺎوى\n\nﻓﻲ ،\n\nﺟﻧﺎﺋﯾﺔ\n\nﻏﯾر\n\nﻣﺎﻟﯾﺔ\n\nﻗﺿﺎﯾﺎ\n\nﻓﻲ\n\nاﻟﻘﺿﺎﺋﯾﺔ\n\nاﻟﺗﻛﺎﻟﯾف\n\nاﺳﺗﺣﻘﺎق\n\nوﻗت\n\nواﻟﻣوﻗوﻓون\n\nاﻟﻣﺳﺟوﻧون\n\n.1\n\nﻋﻣل.\n\nﻋﻘود\n\nﻋن\n\nاﻟﻧﺎﺷﺋﺔ\n\nﺑﻣﺳﺗﺣﻘﺎﺗﮭم\n\n؛ ﻟﻠﻣطﺎﻟﺑﺔ\n\nﻋﻧﮭم\n\nواﻟﻣﺳﺗﺣﻘون\n\nﻣﻧﮫ\n\nواﻟﻣﺳﺗﺛﻧون\n\nاﻟﻌﻣل\n\nﺑﻧظﺎم\n\nاﻟﻣﺷﻣوﻟون\n\nاﻟﻌﻣﺎل\n\n.2\n\nاﻟﺣﻛوﻣﯾﺔ.\n\nواﻷﺟﮭزة\n\nاﻟوزارات\n\n.3\n\nﺑذﻟك.\n\nاﻟﺧﺎﺻﺔ\n\nواﻟﻘواﻋد\n\nاﻹﺟراءات\n\nاﻟﻼﺋﺣﺔ\n\nوﺗﺣدد\n\nﻋﺷرة\n\nاﻟﺛﺎﻣﻧﺔ\n\nاﻟﻘﺿﺎﺋﯾﺔ.\n\nاﻟﺗﻛﺎﻟﯾف\n\nﺑدﻓﻊ ﻋﻠﯾﮫ\n\nاﻟﻣﺣﻛوم\n\nﻓﯾﻠزم\n\nاﻟﻘﺿﺎﺋﯾﺔ\n\nاﻟﺗﻛﺎﻟﯾف\n\nﻣن\n\nاﻟﻣُﻌﻔﻰ\n\nﻟﻣﺻﻠﺣﺔ\n\nاﻟدﻋوى\n\nﻓﻲ ﺣﻛم\n\nﺻدر\n\nإذا\n\n(،\n\nﻋﺷرة\n\nاﻟﺳﺎﺑﻌﺔ\n\nاﻟﻣﺎدة )\n\nﺑﮫ\n\nﺗﻘﺿﻲ\n\nﻣﺎ\n\nﻣراﻋﺎة\n\nﻣﻊ\n\nاﻟرد.\n\nﻣﺳوﻏﺎت\n\nﺗواﻓرت\n\nإذا\n\nِھﺎ\n\nوردّ\n\nﻋﻠﯾﮫ.\n\nواﻹﺷراف\n\nﻋﻣﻠﮫ\n\nاﻟﻘﺿﺎﺋﯾﺔ ،\n\nإﺟراءات\n\nﻋﺷرة\n\nاﻟﺗﺎﺳﻌﺔ\n\nاﻟﺳﻌودي.\n\nاﻟﻣرﻛزي\n\nاﻟﺑﻧك\n\nﻟدى\n\nاﻟﻣﺎﻟﯾﺔ\n\nوزارة\n\nﺟﺎري\n\nﺣﺳﺎب ﻓﻲ\n\nاﻟﻣﺣﺻﻠﺔ\n\nاﻟﻘﺿﺎﺋﯾﺔ\n\nاﻟﺗﻛﺎﻟﯾف\n\nﻣﺑﺎﻟﻎ\n\nﺗودع\n\nاﻟﻌﺷرون\n\nاﻟﺗﻛﺎﻟﯾف\n\nﺑﺗﺣﺻﯾل\n\nاﻟطﻠب -\n\nإﻟﯾﮭﺎ\n\nاﻟﻣﻘدم\n\nأو ،\n\nاﻟدﻋوى\n\nإﻟﯾﮭﺎ\n\nاﻟﻣرﻓوع\n\nاﻟﻣﺣﻛﻣﺔ\n\nﻓﻲ -\n\nاﻟﻣﺧﺗﺻﺔ\n\nاﻹدارة\n\nﻣﻧﮫ\n\nﺑﻘرار\n\nاﻟﻌدل\n\nوزﯾر\n\nﯾﺣدد\n\nواﻟﻌﺷرون\n\nاﻟﺣﺎدﯾﺔ\n\nوﻗواﻋد\n\nﻟﮫ\n\nاﻟﺗراﺧﯾص\n\nأﺣﻛﺎم\n\nاﻟﻼﺋﺣﺔ\n\nوﺗﺣدد\n\nاﻟﻧظﺎم.\n\nﻟﺗطﺑﯾﻖ\n\nاﻟﻣﺳﺎﻧدة\n\nﺑﺎﻷﻋﻣﺎل\n\nﺑﺎﻟﻘﯾﺎم\n\nاﻟﺧﺎص\n\nﻟﻠﻘطﺎع\n\nاﻟﺗرﺧﯾص\n\nاﻟﻌدل\n\nﻟوزﯾر\n\nواﻟﻌﺷرون\n\nاﻟﺛﺎﻧﯾﺔ\n\nاﻟوزراء.\n\nﻣﺟﻠس ﻣن\n\nﺑﻘرار\n\nوﺗﺻدر\n\nاﻟﻧظﺎم ،\n\nﺻدور\n\nﺗﺎرﯾﺦ\n\nﻣن\n\nﯾوﻣﺎً\n\nﺳﺗﯾن (\n\nﺧﻼل )\n\nاﻟﻼﺋﺣﺔ\n\nاﻟﻌدل\n\nوزارة\n\nﺗﻌد\n\nواﻟﻌﺷرون\n\nاﻟﺛﺎﻟﺛﺔ\n\nاﻟرﺳﻣﯾﺔ.\n\nاﻟﺟرﯾدة\n\nﻓﻲ\n\nﻧﺷره\n\nﺗﺎرﯾﺦ\n\nﻣن\n\nﯾوﻣﺎً\n\nوﺛﻣﺎﻧﯾن (\n\nﻣﺎﺋﺔ\n\nﺑﻌد )\n\nﺑﺎﻟﻧظﺎم\n\nﯾﻌﻣل\n\n2021 2021\n\nاﻟﻌﺪل اﻟﻌﺪل\n\nﻟﻮزارة ﻟﻮزارة\n\n© ©\n\nﻣﺤﻔﻮﻇﺔ ﻣﺤﻔﻮﻇﺔ\n\nاﻟﺤﻘﻮق اﻟﺤﻘﻮق\n\nﺟﻤﻴﻊ ﺟﻤﻴﻊ', pw.Json({'filetype': 'application/pdf', 'languages': ['eng'], 'links': [], 'page_number': 4})),)

When I switched to static mode and used pw.debug.compute_and_print(documents), the terminal output shows the Arabic text correctly. So how do I fix this in streaming mode when using pw.io.csv.write?


berkecanrizai commented on June 9, 2024

Hey @abdul756, streaming mode is only used when you run the app with pw.run().

In this case, it seems the parser is working correctly. If you want to dump the text into a file and keep the app running (so that when new content or a new file arrives, the new data is written to your CSV file), run your code in streaming mode by adding pw.run() at the end of the code.

So it will look like this:

documents = folder.select(text=parser(pw.this.data))
pw.io.csv.write(documents, "output_stream_en_7.csv")

pw.run()

You can run this in a notebook or in a regular Python file.

This starts the pipeline, which keeps running until you stop it. Once pw.run() is running, you will see the output file being created.

If you are interested in a one-time dump in a static manner, you can run the following:

df = pw.debug.table_to_pandas(documents)
df.to_csv("outputs_en.csv")

This writes the content to a CSV file. Here we load the data into a pandas DataFrame and then write it to a file. This version doesn't need pw.run(), since it runs statically, a single time.


abdul756 commented on June 9, 2024

In streaming mode the output does not come out correctly; it contains only Unicode escape sequences. You can refer to the file I attached to the issue.


berkecanrizai commented on June 9, 2024

In streaming mode the output does not come out correctly; it contains only Unicode escape sequences. You can refer to the file I attached to the issue.

Yes, I just reproduced the issue with another file. Static mode works fine (see the df.to_csv snippet above), and internally the Pathway table also stores the data correctly as Arabic characters. However, writing it with pw.io.csv.write converts the characters to Unicode escape sequences.
So you can use the data in your app without any issues. In the meantime, we are investigating and will keep you updated.
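If you already have an affected CSV and need readable text from it before a fix lands, a stopgap along these lines may help. It is only a sketch: it assumes the file contains literal backslash-u sequences with four hex digits, which I have not verified against the attached file, and unescape_unicode is a hypothetical helper, not part of Pathway.

```python
import re

# Matches a literal backslash, the letter u, and four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_unicode(s: str) -> str:
    """Replace literal backslash-u escapes with the characters they name.

    Hypothetical helper; handles only four-digit escapes (enough for Arabic).
    """
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


# Raw string: the backslashes here are real characters, as they would be in the file.
sample = r"\u0645\u0645\u0644\u0643\u0629 2021"
print(unescape_unicode(sample))
```

Text without escapes passes through unchanged, so it should be safe to apply to every cell of the text column.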


abdul756 commented on June 9, 2024

Sure thanks


KamilPiechowiak commented on June 9, 2024

@abdul756 thanks for reporting this. The problem will be fixed in the next release (it'll be released this week).


KamilPiechowiak commented on June 9, 2024

Hey @abdul756,
Pathway v0.11.0 was released today, and your problem should now be fixed. However, nobody on the team knows Arabic. Could you update your Pathway version and confirm that this solves your problem?
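A quick way to check the new output without reading Arabic is to count real Arabic characters against leftover escape sequences. This is a rough sketch using only the standard library; summarize is a hypothetical helper, and the character ranges (the basic Arabic block plus the two presentation-form blocks) are my assumption about what the parsed PDF text contains.

```python
import re

# Basic Arabic block plus Arabic Presentation Forms A and B.
ARABIC = re.compile(r"[\u0600-\u06FF\uFB50-\uFDFF\uFE70-\uFEFC]")
# A literal backslash, the letter u, and four hex digits.
ESCAPE = re.compile(r"\\u[0-9a-fA-F]{4}")


def summarize(text: str) -> dict:
    """Count Arabic characters and leftover backslash-u escapes in text."""
    return {
        "arabic_chars": len(ARABIC.findall(text)),
        "escapes": len(ESCAPE.findall(text)),
    }


# To check a real file, read it as UTF-8 and pass the contents in, e.g.:
#   with open("output_stream_en_7.csv", encoding="utf-8") as f:
#       print(summarize(f.read()))
print(summarize("\u0645\u0645\u0644\u0643\u0629"))   # real characters
print(summarize(r"\u0645\u0645\u0644\u0643\u0629"))  # escaped text
```

A fixed file should report a large arabic_chars count and zero escapes.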


abdul756 commented on June 9, 2024

I will test and update
