bgweb's Issues

title html not unescaped properly

What steps will reproduce the problem?
1. some stupid titles cause the table to display incorrectly
2. some stupid titles are very long; their length should be limited
3.

What is the expected output? What do you see instead?
stupid titles like this:
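         厦门大学2010年研究生迎新晚会10月17日晚七点隆重上演,让我们.....
         (in English: "Xiamen University 2010 graduate welcome party opens in grand style at 7 pm on 17 October, let us.....")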
stupid titles which contain html tags :
         <a>what a ,<!--...

Please use labels and text to provide additional information.
title html should be unescaped properly
title length should be limited.
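
A minimal sketch of both requests, assuming the goal is that markup inside a title must not be interpreted by the browser and that a fixed character limit is acceptable. The function name `clean_title` and the limit of 30 characters are hypothetical, and the sketch uses Python 3's `html.escape` rather than whatever the project currently calls.

```python
import html

MAX_TITLE_CHARS = 30  # hypothetical limit, not taken from the issue


def clean_title(raw_title, max_chars=MAX_TITLE_CHARS):
    """Shorten a title, then escape it for insertion into an HTML table cell."""
    title = " ".join(raw_title.split())  # collapse newlines and repeated spaces
    if len(title) > max_chars:
        # Truncate before escaping so an entity such as &lt; is never cut in half.
        title = title[: max_chars - 3] + "..."
    return html.escape(title)
```

Truncating first and escaping second keeps the visible length predictable; the escaped string may be longer than the limit, but the rendered text is not.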


Original issue reported on code.google.com by zinking3 on 14 Oct 2010 at 2:17

status in_normal sites

What steps will reproduce the problem?
1. some sites with status code 3 will not be spidered
2. caused by historical reasons
3.

What is the expected output? What do you see instead?
all sites correctly displayed.

Please use labels and text to provide additional information.
all parsing engines are being switched to regular expression parsers,
which should fix these problems.

Original issue reported on code.google.com by zinking3 on 19 Oct 2010 at 1:20

GBK encoding is still a problem.

What steps will reproduce the problem?
1. parsing a gb2312 encoded page
2. render the page using corresponding GBK
3. generally OK, but it still fails on some special characters.

What is the expected output? What do you see instead?
no more errors on the detail page
currently 5.2% of requests result in an error.


Please use labels and text to provide additional information.
I have seen a GBK character filter package in the micro-blog project;
use this filter on all input and convert every encoding to UTF-8
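
A minimal sketch of the "convert everything to UTF-8" idea, written for Python 3 and using only the standard codecs rather than the filter package mentioned above (which is not shown in this issue). It relies on the fact that GB18030 is a superset of GBK, which in turn extends GB2312, so pages that declare gb2312 but contain GBK-only characters still decode. The function name `to_utf8` is hypothetical.

```python
def to_utf8(raw_bytes, declared_charset="gb2312"):
    """Decode bytes from a Chinese-encoded page and return UTF-8 bytes."""
    charset = declared_charset.lower()
    # Pages often declare gb2312 while using characters that only exist in
    # GBK or GB18030, so decode the whole family with the widest codec.
    if charset in ("gb2312", "gbk", "gb18030"):
        charset = "gb18030"
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising,
    # so one bad character no longer fails the whole request.
    text = raw_bytes.decode(charset, errors="replace")
    return text.encode("utf-8")
```

Replacing bad bytes hides the error rather than fixing the data, so it may be worth logging how often the replacement character appears.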



Original issue reported on code.google.com by zinking3 on 14 Oct 2010 at 1:54

site size needs to be reduced for faster viewing.

What steps will reproduce the problem?
1. I guess the script size is too large now
2.
3.

What is the expected output? What do you see instead?


Please use labels and text to provide additional information.

Combining combined-en.js...
  Running yuicompressor... 72791 bytes
Combining combined-content.css...
  Running yuicompressor... 14959 bytes
Combining combined-content.js...
  Running yuicompressor... 114784 bytes
Combining combined-frame-style.css...
  Running yuicompressor... 9070 bytes
Combining combined-jquery-toolkit.js...
  Running yuicompressor... 96676 bytes
Combining content-xlayout.css...
  Running yuicompressor... 14411 bytes
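
To check the guess that scripts dominate the page weight, the build output above can be totalled per file type. This is a small sketch that parses the log text as pasted here; the function name `sizes_by_extension` is hypothetical.

```python
import re

BUILD_LOG = """\
Combining combined-en.js...
  Running yuicompressor... 72791 bytes
Combining combined-content.css...
  Running yuicompressor... 14959 bytes
Combining combined-content.js...
  Running yuicompressor... 114784 bytes
Combining combined-frame-style.css...
  Running yuicompressor... 9070 bytes
Combining combined-jquery-toolkit.js...
  Running yuicompressor... 96676 bytes
Combining content-xlayout.css...
  Running yuicompressor... 14411 bytes
"""


def sizes_by_extension(log_text):
    """Sum the compressed sizes in the log, grouped by file extension."""
    totals = {}
    current_ext = None
    for line in log_text.splitlines():
        name = re.match(r"Combining (\S+?)\.{2,}$", line.strip())
        size = re.search(r"yuicompressor\.\.\. (\d+) byte", line)
        if name:
            current_ext = name.group(1).rsplit(".", 1)[-1]
        elif size and current_ext:
            totals[current_ext] = totals.get(current_ext, 0) + int(size.group(1))
    return totals


totals = sizes_by_extension(BUILD_LOG)
print(totals)  # {'js': 284251, 'css': 38440}
```

JavaScript accounts for 284251 of the 322691 compressed bytes, roughly 88%, so the scripts are the place to look first.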

Original issue reported on code.google.com by zinking3 on 20 Oct 2010 at 12:59

link for detailed view is too long

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?
should be shortened

Please use labels and text to provide additional information.


Original issue reported on code.google.com by zinking3 on 14 Oct 2010 at 2:27

Revision to current version

1. Ad System
2. Ad design
3. debug views

4. A rewrite of the current version using nonrel


Original issue reported on code.google.com by zinking3 on 1 Sep 2010 at 6:31

parse engine defect

What steps will reproduce the problem?
1. buaa parse configuration
2. harvestlink
3. no content actually generated

What is the expected output? What do you see instead?
parsing actually failed, but the engine did not produce an error log; it logged success instead:

DEBUG    2010-10-19 22:39:37,743 bbs_parser.py:238] Successfully parsing school:buaa costing 309 milliseconds;
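
One way to avoid a "success" line for an empty result is to check the parser output before logging. The sketch below is an assumption about how that check could look, not the code in bbs_parser.py; `report_parse_result` and its `links` parameter are hypothetical names.

```python
import logging

logger = logging.getLogger("bbs_parser_sketch")


def report_parse_result(school, links, elapsed_ms):
    """Log success only when the parser actually produced content.

    `links` stands for whatever collection the parser returns.
    """
    if not links:
        # An empty result is treated as a failure, even if no exception was raised.
        logger.error("parsing school:%s produced no content after %d ms",
                     school, elapsed_ms)
        return False
    logger.debug("Successfully parsed school:%s, %d links in %d ms",
                 school, len(links), elapsed_ms)
    return True
```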


Original issue reported on code.google.com by zinking3 on 19 Oct 2010 at 2:52

CRON debug report is not consistent

What steps will reproduce the problem?
1. the error log reports that the ZJU site structure changed
2. but the site is still being updated; it was merely temporarily unavailable
3.
not reproduced

What is the expected output? What do you see instead?

could not open ZJU....

failed to parse bbs by RE parser; schoolname= zju
failed to parse required content SITE structure changed; schoolname= zju
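
The two log lines above blame the site structure even though the page could not be opened. A minimal sketch of keeping the two causes apart, assuming a fetch step that raises `OSError` when the site is unreachable; `classify_fetch`, `fetch` and `pattern` are hypothetical names, not part of the existing parser.

```python
import re


def classify_fetch(fetch, url, pattern):
    """Return 'unavailable', 'structure_changed' or 'ok' for one site.

    fetch(url) is a hypothetical callable that returns the page text and
    raises OSError when the site cannot be reached.
    """
    try:
        page = fetch(url)
    except OSError:
        # Network trouble says nothing about the page layout.
        return "unavailable"
    if not re.search(pattern, page):
        # The page loaded, but the expected content is missing.
        return "structure_changed"
    return "ok"
```

With this split, "SITE structure changed" is only reported when a page was actually downloaded and the expression did not match it.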




Please use labels and text to provide additional information.


Original issue reported on code.google.com by zinking3 on 20 Oct 2010 at 8:11

SpiderCron is still failing

What steps will reproduce the problem?
1. running the cron job that harvests links from various BBSes
2. not always reproducible, because some BBS systems go down occasionally
3. mostly deadline exceeded errors, plus occasional aborted requests.

What is the expected output? What do you see instead?
no more cron failures, links can be updated in time.
29% of cron runs currently fail.

Please use labels and text to provide additional information.
find an algorithm to test the health of the target server, and harvest links accordingly
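
One possible health heuristic, sketched under stated assumptions: count consecutive failures per site and skip a site for a few cron runs once it has failed too often. The class name `SiteHealth` and both thresholds are hypothetical, and the state is kept in plain dictionaries here; a real cron job would need to persist it between runs.

```python
class SiteHealth(object):
    """Track consecutive failures per site and skip sites that keep failing."""

    def __init__(self, max_failures=3, skip_runs=2):
        self.max_failures = max_failures  # failures in a row before backing off
        self.skip_runs = skip_runs        # cron runs to skip after backing off
        self.failures = {}
        self.skips_left = {}

    def should_harvest(self, site):
        """Return False while a site is in its back-off period."""
        remaining = self.skips_left.get(site, 0)
        if remaining > 0:
            self.skips_left[site] = remaining - 1
            return False
        return True

    def record(self, site, ok):
        """Record the outcome of one harvest attempt."""
        if ok:
            self.failures[site] = 0
            return
        self.failures[site] = self.failures.get(site, 0) + 1
        if self.failures[site] >= self.max_failures:
            self.skips_left[site] = self.skip_runs
            self.failures[site] = 0
```

Skipping an unhealthy site avoids spending the request deadline on a server that is known to be down, at the cost of picking up its links a few runs late once it recovers.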


Original issue reported on code.google.com by zinking3 on 14 Oct 2010 at 2:01

mblog engine problems

What steps will reproduce the problem?
1. failed to log in to a specific mblog
2.
3.

What is the expected output? What do you see instead?
AttributeError: 'Microblog' object has no attribute 'sinahttp'

Please use labels and text to provide additional information.

  File "/base/data/home/apps/bbstop10/2.344140536702112663/pageharvest/bbs_parser.py", line 123, in login_to_sina_microblog
    self.sinacookie = Cookie.SimpleCookie(self.sinahttp.headers.get('set-cookie', ''));
AttributeError: 'Microblog' object has no attribute 'sinahttp'
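
The traceback shows `self.sinahttp` being read before it was ever assigned. Why it was not assigned is not visible here; one plausible cause is that the login request failed before the assignment. A defensive sketch in Python 3, where `http.cookies.SimpleCookie` is the counterpart of `Cookie.SimpleCookie`; the `response` parameter is hypothetical.

```python
from http.cookies import SimpleCookie  # Python 3 home of Cookie.SimpleCookie


class Microblog(object):
    def __init__(self):
        # Give both attributes a defined value from the start.
        self.sinahttp = None
        self.sinacookie = SimpleCookie()

    def login_to_sina_microblog(self, response):
        """Store the login response and its cookies; return True on success.

        `response` is a hypothetical parameter standing for the HTTP response
        of the login request, or None if that request failed.
        """
        self.sinahttp = response
        if self.sinahttp is None:
            return False  # nothing to read cookies from
        self.sinacookie = SimpleCookie(self.sinahttp.headers.get("set-cookie", ""))
        return True
```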

Original issue reported on code.google.com by zinking3 on 20 Oct 2010 at 1:32

int object not iterable

What steps will reproduce the problem?
1. probably while parsing the zju pic board
2.
3.

What is the expected output? What do you see instead?
I do not understand where the problem is

Traceback (most recent call last):
  File "F:\WorkSpace\zihigh\imgwall\pageharvest\img_parser.py", li
    tn = self.saveParsedImagePage( id,purl,title,c,cc);
  File "F:\WorkSpace\zihigh\imgwall\pageharvest\img_parser.py", li
    (imglist, tn ) = self.parseImageConifgedPage( c, cc, purl );
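
The traceback is cut off before the exception line, so the cause can only be guessed. One common way to get this error on a line that unpacks two values, such as `(imglist, tn) = ...`, is a callee that returns a bare int on some path. The functions below are hypothetical stand-ins written in Python 3, not the code in img_parser.py.

```python
def parse_image_page_buggy(page):
    """Hypothetical parser that returns a bare int when nothing is found."""
    if not page:
        return 0  # unpacking this as (imglist, tn) raises TypeError
    return (["a.jpg"], 1)


def parse_image_page_fixed(page):
    """Same parser, but every path returns an (imglist, tn) pair."""
    if not page:
        return ([], 0)
    return (["a.jpg"], 1)


imglist, tn = parse_image_page_fixed("")  # unpacks cleanly: [] and 0
```

If this guess matches the real code, checking every `return` in the called function for a consistent shape would locate the failing path.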

Please use labels and text to provide additional information.


Original issue reported on code.google.com by zinking3 on 8 Nov 2010 at 12:57
