The judgmental from frabcus

Improve layout on smaller screens

From fjmd1 (Francis Davey):
@Judgmentals It would be great to add a contact email so I can ask you to remove the annoying white space at the LHS

What's the name of the technique for adding conditionals into your CSS so that the format can change depending on the page width? We can use that to remove the left nav column (which is a bit empty right now anyway) on narrower displays.

Missing indexes

These indexes don't exist right now:

NICC
UKFTT-HESC
UKFTT-TC
UKIAT
UKSSCSC

Link to legislation.gov.uk

Similar to issue 1, we could identify all mentions of Acts of Parliament and link to them on legislation.gov.uk.

Tricky enhancement: if "clause X" is mentioned in the same sentence or nearby, link to the correct clause, not just the whole Act.
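
A rough sketch of the detection step only, assuming Acts are cited in the usual "Short Title Act YYYY" form, optionally preceded by "section N of the". The function name find_act_mentions is hypothetical. It does not build legislation.gov.uk URLs, since those need a lookup from Act title to its identifier, and it will over-match when a capitalised word (such as "The" at the start of a sentence) precedes the title.

```python
import re

# Optional "section N of the " prefix, then one or more capitalised words
# ending in "Act YYYY". A heuristic, not a complete grammar of Act titles.
_ACT_MENTION = re.compile(
    r"(?:[Ss](?:ection|\.)\s*(\d+[A-Z]?)\s+of\s+the\s+)?"
    r"((?:[A-Z][a-z]+\s+)+Act\s+\d{4})"
)

def find_act_mentions(text):
    """Return a list of (section_or_None, act_title) pairs found in text."""
    return [m.groups() for m in _ACT_MENTION.finditer(text)]
```

The section number, when present, is what the "tricky enhancement" would use to link to the specific provision rather than the whole Act.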

Analyse and interpret BAILII links

Bailii's files contain lots of links. We should read them and process them:

if it is a link to another judgment on bailii, we should try to promote it to a link to the relevant judgmental page (without duplicating the efforts of the crossreference linker);
if it is a link to some piece of locally stored legislation, we should try to promote it to a link to our preferred relevant legislation page;
if it is something else, it should probably be preserved.
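
The three cases above could be separated with something like the following sketch. It assumes that BAILII judgment URLs contain "/cases/" and that its locally stored legislation lives under "/legis/"; both path markers, and the name classify_link, are assumptions to check against real files.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2

def classify_link(href):
    """Sort a link found in a BAILII file into one of three rough kinds."""
    parsed = urlparse(href)
    host = parsed.netloc.lower()
    path = parsed.path.lower()
    # Relative links (empty host) are treated as pointing at BAILII itself.
    on_bailii = host in ("", "www.bailii.org", "bailii.org")
    if on_bailii and "/cases/" in path:      # assumed marker for judgments
        return "judgment"
    if on_bailii and "/legis/" in path:      # assumed marker for legislation
        return "legislation"
    return "other"                           # preserve these untouched
```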

Install feedparser

feedparser is needed for legislation.py. It's kludged in at the moment but should be properly installed.

best_filename fails for many files

Many (approx 1300) judgments aren't being processed since best_filename complains that there is 'no good citation'.

Remove control codes

Many judgments have random ASCII control codes in them. This causes problems for lxml (more precisely for the lxml version on the server) and results in judgments without any text (for example, http://www.judgmental.org.uk/judgments/_uk_cases_CAT_2005_11.html).
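
One way to do this before handing the text to lxml is a single regex pass. This is a minimal sketch: the function name strip_control_codes is hypothetical, and it assumes the text has already been decoded to a string.

```python
import re

# ASCII control characters other than tab (0x09), newline (0x0A) and
# carriage return (0x0D). DEL (0x7F) is stripped as well, since it is
# never wanted in judgment text.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

def strip_control_codes(text):
    """Return text with stray ASCII control characters removed."""
    return _CONTROL_CHARS.sub("", text)
```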

Deal with duplicate citation errors

A few judgments raise errors during the analysis stage complaining that the extracted citation is not unique.

This may be because of duplicated judgments, or because of bugs still remaining in the citation extraction code.
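
To tell the two causes apart it may help to list which files share a citation. A small sketch, assuming the analysis stage can supply (citation, filename) pairs; the function name is hypothetical.

```python
from collections import defaultdict

def find_duplicate_citations(pairs):
    """Map each citation seen more than once to the files that claim it."""
    by_citation = defaultdict(list)
    for citation, filename in pairs:
        by_citation[citation].append(filename)
    # Keep only citations claimed by two or more files.
    return dict((c, files) for c, files in by_citation.items() if len(files) > 1)
```

If the listed files are byte-for-byte identical, the judgment is duplicated; if they differ, the citation extraction is the more likely culprit.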

Tagging & categorising

It would be extremely useful if judgments were tagged with keywords and categories, as has been done with PCC complaints data here: http://complaints.pccwatch.co.uk/

So a user could, with ease, view all 'defamation' or 'copyright' cases and filter by judge name, date, court, party name etc.

Fix 'Cite as:' extraction

The extraction of neutral citations from a judgment is a little buggy and also needs a better strategy for extracting citations from titles of judgments where necessary.
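
As a starting point for extracting from titles, a regex along these lines picks out the common "[year] COURT number (Division)" shape. It is a sketch, not the existing extraction code: the pattern covers only forms like "[2010] UKSC 20", "[2010] EWCA Civ 123" and "[2010] EWHC 1632 (TCC)", and the function name is hypothetical.

```python
import re

# [year] court number, with an optional "Civ"/"Crim" suffix on the court
# and an optional alphabetic division in brackets such as "(TCC)".
_NEUTRAL_CITATION = re.compile(
    r"\[(\d{4})\]\s+([A-Z][A-Za-z]*(?:\s+(?:Civ|Crim))?)\s+(\d+)"
    r"(?:\s+\(([A-Za-z]+)\))?"
)

def find_neutral_citations(text):
    """Return (year, court, number, division_or_None) for each citation."""
    return [m.groups() for m in _NEUTRAL_CITATION.finditer(text)]
```

Because the division must be alphabetic, a trailing date such as "(12 May 2010)" is not mistaken for one.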

Mark up ids of companies named in cases

Lots of cases mention corporations in their title, and perhaps in their body.

Sainsbury's Supermarkets Ltd, R (on the application of) v Wolverhampton City Council & Anor [2010] UKSC 20 (12 May 2010)
Gold Group Properties Ltd v BDW Trading Ltd [2010] EWHC 1632 (TCC) (01 July 2010)

We should get together with OpenCorporates to name match those, so you can easily find a list of cases about one company, and so you can hyperlink to more info about the company the other way.

Specify the open source license of the code

We don't have a license file at the moment, so the code isn't licensed. GitHub doesn't apply a license by default; it even allows hosting of non-open-source but "source visible" code.

Deal with really badly encoded files

Although we've gone to some length to deal with character encoding problems, some files simply declare their character encoding incorrectly and/or are not encoded with an encoding known to humans.

Annotations on each judgment using Disqus

We could just be lazy, and throw annotations on every judgment using Disqus.

Very quick and easy to do, some people will already have accounts, user experience is good. Will see if we get valuable comments.

Make case name obvious <h1> title on case pages

Raised by frabcus on-list: https://groups.google.com/d/msg/judgmental/esZDc1VEGv8/rUTVLMnfJLYJ

The obvious <h1> on each case page is currently the name of the court, not the case, which isn't very intuitive.

Investigate using annotateit.org for comments

From @okfn (okfn)
Mentioned to Nick Bull of @Judgmentals - http://annotateit.org/ (Still in beta). Could be interesting tool for [Judgmental].

Be a little bit dynamic

At present we need to rebuild the entire site every time we want to make a small modification to pages but not judgments, eg. the menu, title, footer, google analytics etc. This is obviously rubbish and we should probably instead generate a marked up html fragment for each judgment and use some server side inclusion.

Make a sitemap.xml for Google

Worth doing perhaps to encourage it to index everything.
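
A minimal sketch of generating one with the standard library, assuming we can produce a list of absolute page URLs at build time; build_sitemap is a hypothetical name. Note that the sitemap protocol caps a single file at 50,000 URLs, so a large site needs several files plus a sitemap index.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return sitemap XML (as bytes) listing each URL in a <loc> element."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="utf-8")
```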

Put something in robots.txt

Would seem a shame not to be sarky there.

Incorrect dates for ScotHC

http://www.judgemental.org.uk/judgments/ScotHC/

Latest judgments listed as 2012. Definitely wrong, don't know why!

RSS feeds

Francis D really wants good RSS feeds of judgments.

So really ideally by court, by search term, and that kind of thing.

Full content RSS vital - so can actually read it all in news reader.

"How can I help" section

From @jpstacey (J-P Stacey)
@judgmentals worth a "How can I help?" section on your homepage?

Deal with changing filenames

Changes to the code/original judgments could result in some judgments being moved to different locations. We should add links or redirects in this case since someone may have linked to the old location.

This is probably unlikely to occur in practice, but it should be easy to take care of.
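
Since the site is static, one easy option is to leave a stub page at the old location. A sketch, assuming we know the old and new paths at build time; redirect_page is a hypothetical helper, and a real server-side redirect would be better if the server configuration allows it.

```python
from xml.sax.saxutils import escape

def redirect_page(new_url):
    """Return a tiny HTML page that forwards visitors to new_url."""
    # Escape &, <, > and double quotes so the URL is safe inside attributes.
    target = escape(new_url, {'"': "&quot;"})
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        '<meta http-equiv="refresh" content="0; url=%s">\n'
        '<link rel="canonical" href="%s">\n'
        "<title>Moved</title>\n"
        "</head><body>\n"
        '<p>This judgment has moved to <a href="%s">%s</a>.</p>\n'
        "</body></html>\n"
    ) % (target, target, target, target)
```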

Design a better interface to the formatter

By now, the interface to the formatter provided by run.py has a set of command-line options that is becoming bewilderingly numerous and also annoying to use. But on the other hand, these options actually form only a small part of all the options one could want.

Chris and I feel that an interactive launcher would be more helpful. But what technology should it use? Here are the possibilities I can think of:

Simple text-based question-and-answer. Advantages: easy to code. Disadvantages: hard to navigate, annoying, ad hoc.
Curses or X-based thing. Advantages: nice. Disadvantages: hard to code, possibly restrictive for platforms.
HTML form. Advantages: easy to code the form itself, nice. Disadvantages: can't think of a really simple way of interfacing that with Python.

Any suggestions?

Scrape HUDOC

HUDOC is the database of judgments from the European Court of Human Rights.

http://www.echr.coe.int/ECHR/EN/Header/Case-Law/HUDOC/HUDOC+database/

There is no robots.txt, and no conditions of use that I can find.

Update: there is one: "The information and texts available on the Court’s site may be reproduced provided the source is acknowledged. Users should nevertheless be aware that certain information and texts may be protected under intellectual property law, in particular by copyright."

Register with Google Webmaster Tools

And up the crawl rate to the maximum. This will get us indexed well sooner.

Restore paragraph numbers

From @A_Ecclestone (Andrew Ecclestone)
@Judgmentals Why don't judgments in Judgmental have paragraph numbers? Isn't that needed for citation and intra-judgment references?

This is because the paragraph numbers are inside LI tags, which get stripped as part of our cleanup operation.

Browser toolbar to add judgments to judgmental

You'd go to a judgment on BAILII and press a button to add it to judgmental.

That wouldn't need screen scraping, but we need to consider the legal issues a bit more before doing it.

Add links to Civil Procedure Rules

From @gwire (Lee Maguire):
@judgmentals any plans to add relevant hyperlinks to [legislation.gov.uk or] the CPR references? http://bit.ly/iKQcno

Support citation capture (eg Zotero)

From @adreagui (Guillaume Adreani )
@Judgmentals Are you compatible with Zotero ? http://www.zotero.org

Usability feature: Zotero allows users to capture the citation of the page in one click. Not sure what's needed on our side to support this.

Judgments in Comic Sans

The Scottish Sheriff Court has issued some judgments in Comic Sans, eg:

http://judgmental.org.uk/judgments/ScotSC/2008/[2008]%20ScotSC%2034.html

This choice of font perhaps doesn't reflect the full might and majesty of the judicial process. Or at any rate looks a bit messy on our site.

It's probably easier to use !important in the CSS to override it than to find and remove every instance of the font markup in the judgment HTML.

Offer download of all the HTML of all the case law

May as well, for people who want to use it locally.

Search (using Google)

Add a search, use a Google Custom site: search.

That's easier and quicker to implement, and also makes sure we do all we can to get every judgment in Google.

Generate valid HTML!

Currently, the HTML we generate does not validate; validation is clearly a desirable aim.

Improve logging on failures

If we have to abort processing any file, for example in many cases if exceptions of various sorts are raised, a message is written to a log.

It is well worth increasing the quality of those messages to provide more information about what's going wrong.
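
For instance, the log entry could always carry the file name, the exception type and the full traceback. A sketch using the standard logging module; the logger name and the process_safely wrapper are hypothetical, not the existing code.

```python
import logging

log = logging.getLogger("judgmental.convert")

def process_safely(path, process):
    """Run process(path); on failure, log what broke and carry on."""
    try:
        process(path)
        return True
    except Exception as exc:
        # log.exception records the message at ERROR level and appends
        # the traceback, so the failing line is visible in the log.
        log.exception("aborted %s: %s: %s", path, type(exc).__name__, exc)
        return False
```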

Upgrade to Python 2.6?

Obviously, we're limited by what runs on the server. But it would be nice to take advantage of more modern Python features. Benefits include the following:

It would give us use of the "multiprocessing" library for running conversions in parallel across several processes, thus speeding up on-server conversions;
We could employ the "with" idiom to safely and transparently open and close the database and the multiprocessing pool;
Um, anything else?
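
As an illustration of the "with" idiom for the database, here is a sketch that assumes the database is SQLite (an assumption; substitute whatever we actually use). contextlib.closing guarantees the connection is closed even if the work raises. The helper name with_database is hypothetical.

```python
import sqlite3
from contextlib import closing

def with_database(db_path, work):
    """Open the database, run work(conn), commit, and always close."""
    with closing(sqlite3.connect(db_path)) as conn:
        result = work(conn)
        conn.commit()
        return result
```

Usage might look like with_database("judgmental.db", lambda conn: conn.execute("SELECT count(*) FROM courts").fetchone()), where the file name is made up for the example.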

Speedup: best_filename

In convert.py, we (I) rather dumbly pass the court name to best_filename and then use Levenshtein to find the abbreviated name. This is done for every judgment.

We should put the abbreviated names into the courts table instead.

Google links case name to court index page, not case text

Raised by frabcus on-list: https://groups.google.com/d/msg/judgmental/esZDc1VEGv8/rUTVLMnfJLYJ

Not sure exactly why this happens, but might be the result of the crawler's logic given that we have the court name, not the case name, as the page's main <h1>.

Will have to see if the situation improves upon solving gh#39 (the title thing) as well as gh#21 (sitemap.xml).

Improve output page formats

We need to better utilise the metadata we have collected.

There are several things we have which we are not using at all:

crossreferences in to each page
crossreferences out of each page
the original bailii URL

And there are probably prettier and better ways to display the things we are using.

Once we have nice indexing pages (an index for each court, etc), we can use the metadata to generate URLs to the appropriate page.

What we need, really, is an improved HTML template for the page.
