justext's Introduction

jusText
=======

Prerequisites:
    lxml (>=2.2.4)

To install the package, run:
    python setup.py install

For usage information see:
    justext --help

More information can be found online at:
    <http://code.google.com/p/justext/>

justext's Issues

The online demo throws a Unicode error on some pages

Steps to reproduce:

1. Go to http://nlp.fi.muni.cz/projekty/justext/
2. Enter http://clawfinger.tym.sk/indexx.php into the URL field and submit

This results in the following error:

Traceback (most recent call last):
  File "/nlp/projekty/justext/public_html/index.cgi", line 106, in <module>
    no_headings, enc_errors='ignore')
  File "/nlp/projekty/justext/justext-svn/justext/core.py", line 448, in justext
    default_encoding=default_encoding, enc_errors=enc_errors)
  File "/nlp/projekty/justext/justext-svn/justext/core.py", line 179, in preprocess
    add_kw_tags(root)
  File "/nlp/projekty/justext/justext-svn/justext/core.py", line 136, in add_kw_tags
    if node.text and node.tag not in (lxml.etree.Comment, lxml.etree.ProcessingInstruction):
  File "lxml.etree.pyx", line 897, in lxml.etree._Element.text.__get__ (src/lxml/lxml.etree.c:37022)
  File "apihelpers.pxi", line 691, in lxml.etree._collectText (src/lxml/lxml.etree.c:16626)
  File "apihelpers.pxi", line 1344, in lxml.etree.funicode (src/lxml/lxml.etree.c:21864)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 14: invalid continuation byte

Attaching the HTML source of http://clawfinger.tym.sk/indexx.php
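
The byte-level failure named in the traceback can be illustrated in isolation with plain Python. This is only a sketch: the sample bytes and the helper name decode_html are hypothetical, the ISO-8859-2 fallback is an assumption about the page's real encoding, and the code does not go through lxml or jusText, so it shows the kind of error rather than the exact code path of the demo.

```python
# Hypothetical sample: 0xE1 is "á" in ISO-8859-2, but in UTF-8 it starts a
# three-byte sequence, so the space that follows is an invalid continuation.
RAW = b"<title>Domovsk\xe1 str\xe1nka</title>"


def decode_html(raw, fallback="iso-8859-2"):
    """Try UTF-8 first; fall back to an assumed legacy encoding on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the page is really in a single-byte Central European
        # encoding. "replace" keeps the call from raising on stray bytes.
        return raw.decode(fallback, errors="replace")


def strict_error(raw):
    """Return the UnicodeDecodeError raised by strict UTF-8 decoding, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None


if __name__ == "__main__":
    err = strict_error(RAW)
    print(err.start, err.reason)            # offset and reason of the failure
    print(RAW.decode("utf-8", errors="ignore"))  # offending bytes are dropped
    print(decode_html(RAW))                 # decoded with the fallback
```

Decoding with errors="ignore" avoids the exception but silently drops the accented characters, whereas decoding with the correct legacy encoding preserves them.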

Original issue reported on code.google.com by [email protected] on 11 Nov 2012 at 10:41

Future of jusText

Hi,
I have a fork of your library on GitHub (https://github.com/miso-belica/jusText)
with some changes. For example, I solved issue #2 by removing the whole
add_kw_tags function, which is useless in my version of the code:
https://github.com/miso-belica/jusText/commit/e3cdc04b4599d8af08ef08aeea764ba5629e67c3
There are more changes listed at
https://github.com/miso-belica/jusText/commits/master

Yesterday I was asked whether I could publish my version on PyPI
(https://github.com/miso-belica/jusText/issues/2). I can, but I don't want to
block this possibility for you if you decide to publish your version. Maybe you
can check out my version and merge the changes into your repo. Or, the better
option IMHO, you can move development of this project to GitHub, and I can
transfer my version of the project to your GitHub account if you want.

Original issue reported on code.google.com by [email protected] on 18 Aug 2013 at 10:00
