justicehub-in / justice-hub-docs

Documentation platform for the Justice Hub

Home Page: https://docs.justicehub.in/

CSS 5.23% HTML 73.57% JavaScript 11.73% TeX 0.22% Jupyter Notebook 1.61% Shell 0.08% Python 0.20% R 3.61% SCSS 3.75%
docs hugo-academic justice legal open-data platform

Contributors

apoorv74, hackmd-deploy

justice-hub-docs's Issues

Create a format to curate datasets within an organisation

How to curate datasets within an organisation

Organisations that work across sectors and teams, and that don't necessarily maintain a data catalog, will have to collect information on which of their datasets could be made available on the Justice Hub. A few important fields for identifying such datasets are:

  • Organisation
  • Dataset Title
  • Dataset description
  • Sector
  • How was the data sourced (RTIs, web scraping, etc.)
  • Availability of raw data (different from processed data)
  • Availability of data dictionary
  • Importance (How frequently is the dataset used for research use-cases)
  • Is the data still maintained
  • Maintainer email
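
The fields above could be collected as one row per dataset in a shared CSV sheet. The following is only a minimal sketch: the column identifiers (`organisation`, `dataset_title`, and so on) and the helper `curation_template` are hypothetical names chosen for illustration, not a format defined by the Justice Hub.

```python
import csv
import io

# Hypothetical column identifiers, one per field in the list above.
CURATION_FIELDS = [
    "organisation",
    "dataset_title",
    "dataset_description",
    "sector",
    "data_source",  # e.g. RTI, web scraping
    "raw_data_available",
    "data_dictionary_available",
    "importance",
    "is_maintained",
    "maintainer_email",
]


def curation_template(rows=()):
    """Return CSV text with the curation header and any given rows."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CURATION_FIELDS)
    writer.writeheader()
    for row in rows:
        # Fields left out of a row are written as empty cells.
        writer.writerow({name: row.get(name, "") for name in CURATION_FIELDS})
    return buffer.getvalue()


if __name__ == "__main__":
    # "Example Org" is a placeholder, not a real partner.
    print(curation_template([{"organisation": "Example Org"}]))
```

Calling `curation_template()` with no rows gives an empty sheet that can be sent to an organisation to fill in.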

Cleanup Docs

To track the status of things to do to clean-up the docs website before the alpha launch:

Homepage

  • Change language - Text should match the narrative at JusticeHub
  • Change notification content
  • Redirect Get in touch to JH Contact Page
  • Way to include substack

Nav Bar

  • Remove About
  • Remove Data
  • Remove Blog
  • Add substack
  • Redirect Contact to JH Contact

Resources

  • Update content

Contract Enforcement Litigation Data from District Courts | NIPFP

Data Accessibility Report

Links:


Data standardisation

  • Can the columns with type as HTML string be stored as separate tables
  • Mark columns in the data dictionary as either directly sourced from the source or user-generated. E.g. for court_code, complexcode, day_pending, etc.
  • Only 76 out of the total 86 columns are present in the data dictionary
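
A gap such as the one in the last point (columns present in the data but absent from the data dictionary) can be listed mechanically. This is a rough sketch that assumes both files are CSV and that the dictionary names each column in a field called `variable`; that field name and the function `undocumented_columns` are assumptions made for illustration.

```python
import csv
import io


def undocumented_columns(dataset_csv, dictionary_csv, name_field="variable"):
    """Return dataset columns that have no entry in the data dictionary."""
    # The first row of the dataset holds the column names.
    header = next(csv.reader(io.StringIO(dataset_csv)))
    documented = {
        row[name_field].strip()
        for row in csv.DictReader(io.StringIO(dictionary_csv))
    }
    return [column for column in header if column.strip() not in documented]


if __name__ == "__main__":
    # Made-up two-line files; only the column names come from the report.
    dataset = "court_code,complexcode,day_pending\n1,2,30\n"
    dictionary = "variable,definition\ncourt_code,placeholder definition\n"
    print(undocumented_columns(dataset, dictionary))
```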

PII variables present in file

  • petNameAdd
  • pet_adv
  • pet_name
  • petnameadArr
  • petparty_name
  • resNameAdd
  • res_adv
  • res_name
  • resparty_name

Data License

  • Mention license[s] under which the data is to be released on the Justice Hub

Other files required (if available)

  • Raw Data
  • Data processing document

Questions

  • How does the dataset deal with empty values?

Update partner curation page

  • Change URL from partner-curation to partners
  • Add status of onboarding
  • Add status of open data pledge
  • Share links to open data pledge
  • Add quotes from partners
  • Add social media links

Data points to curate

  • Parliament session wise questions for Law and Justice
  • High Court data released by Daksh

Data report | Judiciary Expenditure | CBGA

Data Accessibility Report

Links:


Files available

  • Data dictionary (Human readable dictionary of data contents)
  • Data License (How to use and share the data)
  • Raw Dataset (The original/first data provided)
  • Processed Dataset (Final data used in analysis)
  • Dataset README (A Human readable description of the data)
  • Citation (How you want your data to be cited)

Data Cleaning & Standardisation Report

  • Presence of PII's (Personally Identifiable Information)
  • Data to be uploaded is in a machine-readable format (CSV, JSON)

Other details

  • Data maintainer details

Comments

Data report | Judicial Vacancies in India | Vidhi

Data Accessibility Report

Links
Sample Dataset
Data Documentation
Data Dictionary

Files available

Files Status
Data dictionary (Human readable dictionary of data contents)
Data License (How to use and share the data)
Raw Dataset (The original/first data provided)
Processed Dataset (Final data used in analysis)
Dataset README (A Human readable description of the data)
Citation (How you want your data to be cited)

Data Cleaning & Standardisation Report

Issue Status
Data does not have any PII's (Personally Identifiable Information)
Data to be uploaded is in a machine-readable format (CSV, JSON)

Other details

  • Data maintainer details

Comments/Next Steps:

  • Please mention whether the raw data reports (the PDFs scraped from the DoJ) are available
  • Mention the date of data collection/publication. When was this dataset released on the JALDI portal?
  • Share the process of updating the datasets for all levels of the Judiciary, with details such as frequency, methodology, etc.
    • The district court dataset is available for 2017 and 2019, while the datasets for Supreme Court and High Court are only present for 2019. Are there any specific reasons for this?
    • Would you like to share the datasets for every year and maintain them as individual files, or will there be one master file for each court that is updated periodically?
  • Are there any variables that are not directly available from the reports but were calculated by the team (derived variables)?
  • Some files are shared as XLS(X) files while others are shared as CSV. Please share all files as CSV files to make this dataset more accessible

❗ Important:

  • Please share a link to the data dictionary (This is a CSV file which contains information about the columns present in all files under a dataset). Learn more
  • Mention the license under which this dataset is to be released on the JusticeHub. Please refer to this link for learning more about open data licenses

📈 Improving data accessibility:

  • If possible, share all files listed under the Files available section above.
  • Share the data as CSV files.
  • Include a README file, which is a short description of the dataset. Refer here to know more

Data report | Supreme Court workload | Nick Robinson

Data Accessibility Report

Links
Sample Dataset
Data Documentation
Data Dictionary

Files available

Files Status
Data dictionary (Human readable dictionary of data contents)
Data License (How to use and share the data)
Raw Dataset (The original/first data provided)
Processed Dataset (Final data used in analysis)
Dataset README (A Human readable description of the data)
Citation (How you want your data to be cited)

Data Cleaning & Standardisation Report

Issue Status
Data does not have any PII's (Personally Identifiable Information)
Data to be uploaded is in a machine-readable format (CSV, JSON)

Other details

  • Data maintainer details

Comments/Next Steps:

❗ Important:

  • Please share a link to the data dictionary (This is a CSV file which contains information about the columns present in all files under a dataset). Learn more
  • Mention the license under which this dataset is to be released on the JusticeHub. Please refer to this link for learning more about open data licenses

📈 Improving data accessibility:

  • If possible, share all files listed under the Files available section above.
  • Share the data as CSV files.
  • Include a README file, which is a short description of the dataset. Refer here to know more

Data report | Contract Enforcement Litigation Data from District Courts | NIPFP

Data Accessibility Report

Links
Sample Dataset
Data Documentation
Data Dictionary

Files available

Files Status
Data dictionary (Human readable dictionary of data contents)
Data License (How to use and share the data)
Raw Dataset (The original/first data provided)
Processed Dataset (Final data used in analysis)
Dataset README (A Human readable description of the data)
Citation (How you want your data to be cited)

Data Cleaning & Standardisation Report

Issue Status
Data does not have any PII's (Personally Identifiable Information)
Data to be uploaded is in a machine-readable format (CSV, JSON)

Other details

  • Data maintainer details

Comments/Next Steps:

  • Can the columns with type as HTML string be stored as separate tables
  • Mark columns as either directly sourced from the source (raw/original) or derived/user-generated in the data dictionary. E.g. columns court_code, complexcode, day_pending, etc. can be marked as derived
  • Only 76 out of the total 86 columns are present in the data dictionary
  • How does the dataset deal with empty values? Is the treatment different for individual columns? This information can be included for each column in the data dictionary as well
  • Variables with personally identifiable information (PII's) (As per our data sharing policy, we are not uploading any datasets with sensitive information either about communities (CII's) or individuals):
File Variable
sample_dataframe petNameAdd
sample_dataframe pet_adv
sample_dataframe pet_name
sample_dataframe petnameadArr
sample_dataframe petparty_name
sample_dataframe resNameAdd
sample_dataframe res_adv
sample_dataframe res_name
sample_dataframe resparty_name

❗ Important:

  • Anonymise sensitive information. To do this, columns with PII's listed above can be removed from the original dataset
  • Mention the license under which this dataset is to be released on the JusticeHub. Please refer to this link for learning more about open data licenses
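
Removing the PII columns, as suggested in the first point above, could look like the sketch below. It assumes the dataset is available as CSV text; the function name `drop_columns` is hypothetical, while the column names are the ones listed in the PII table of this report.

```python
import csv
import io

# Column names taken from the PII table in this report.
PII_COLUMNS = {
    "petNameAdd", "pet_adv", "pet_name", "petnameadArr", "petparty_name",
    "resNameAdd", "res_adv", "res_name", "resparty_name",
}


def drop_columns(csv_text, columns_to_drop):
    """Return CSV text without the given columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name not in columns_to_drop]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped columns in each row.
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Made-up row for illustration only.
    sample = "court_code,pet_name,res_adv\n12,A,B\n"
    print(drop_columns(sample, PII_COLUMNS))
```

Dropping columns only removes direct identifiers; free-text columns may still contain names and would need a separate review.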

📈 Improving data accessibility:

  • If possible, share all files listed under the Files available section above.
  • Include a README file, which is a short description of the dataset. Refer here to know more

Data report | Company Registration Data | Veratech

Data Accessibility Report

Links
Sample Dataset
Data Documentation
Data Dictionary

Files available

Files Status
Data dictionary (Human readable dictionary of data contents)
Data License (How to use and share the data)
Raw Dataset (The original/first data provided)
Processed Dataset (Final data used in analysis)
Dataset README (A Human readable description of the data)
Citation (How you want your data to be cited)

Data Cleaning & Standardisation Status

Issue Status
Data does not have any PII's (Personally Identifiable Information)
Data to be uploaded is in a machine-readable format (CSV, JSON)

Other details

  • Data maintainer details

Comments/Next Steps:

  • Data is shared as SQL reports. We can upload this directly to the hub or convert it to CSV files to make it more accessible to our users
  • A schema/architecture map of the database will help users to navigate the dataset
  • Variables with personally identifiable information (PII's) (As per our data sharing policy, we are not uploading any datasets with sensitive information either about communities (CII's) or individuals):
File Variable
charge_dtls charge_holder_name
charge_dtls address
company_dtls company_name
company_dtls reg_add
company_dtls email
signatory_dtls name
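
For the CSV conversion mentioned in the comments above, one possible route is to load the SQL data into a database and export each table. The sketch below assumes the data can be loaded into SQLite, which may not match how the SQL reports were actually produced; the function `table_to_csv` and the example columns `company_id` and `state` are hypothetical.

```python
import csv
import io
import sqlite3


def table_to_csv(connection, table):
    """Return the contents of one table as CSV text, header first."""
    # Table names cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = connection.execute(f"SELECT * FROM {quoted}")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # cursor.description holds one tuple per column; item 0 is the name.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


if __name__ == "__main__":
    # In-memory database with a made-up row, standing in for the shared data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE company_dtls (company_id INTEGER, state TEXT)")
    conn.execute("INSERT INTO company_dtls VALUES (1, 'Assam')")
    print(table_to_csv(conn, "company_dtls"))
```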

❗ Important:

  • Anonymise sensitive information. To do this, columns with PII's listed above can be removed from the original dataset
  • Please share a link to the data dictionary (This is a CSV file which contains information about the columns present in all files under a dataset). Learn more
  • Mention the license under which this dataset is to be released on the JusticeHub. Please refer to this link for learning more about open data licenses

📈 Improving data accessibility:

  • If possible, share all files listed under the Files available section above.
  • Share the data as CSV files.
  • Include a README file, which is a short description of the dataset. Refer here to know more

Data report | India Justice Report | Tata Trusts

Data Accessibility Report

Links
Sample Dataset
Data Documentation
Data Dictionary

Files available

Files Status
Data dictionary (Human readable dictionary of data contents)
Data License (How to use and share the data)
Raw Dataset (The original/first data provided)
Processed Dataset (Final data used in analysis)
Dataset README (A Human readable description of the data)
Citation (How you want your data to be cited)

Data Cleaning & Standardisation Report

Issue Status
Data does not have any PII's (Personally Identifiable Information)
Data to be uploaded is in a machine-readable format (CSV, JSON)

Other details

  • Data maintainer details

Comments/Next Steps:

❗ Important:

  • Mention the license under which this dataset is to be released on the JusticeHub. Please refer to this link for learning more about open data licenses

📈 Improving data accessibility:

  • If possible, share all files listed under the Files available section above.

Data report | Death Penalty - Annual Statistics 2019 | Project 39A

Data Accessibility Report

Links
Sample Dataset
Data Documentation
Data Dictionary

Files available

Files Status
Data dictionary (Human readable dictionary of data contents)
Data License (How to use and share the data)
Raw Dataset (The original/first data provided)
Processed Dataset (Final data used in analysis)
Dataset README (A Human readable description of the data)
Citation (How you want your data to be cited)

Data Cleaning & Standardisation Report

Issue Status
Data does not have any PII's (Personally Identifiable Information)
Data to be uploaded is in a machine-readable format (CSV, JSON)

Other details

  • Data maintainer details

Comments

  • Remove cell formatting (colors, etc)
  • Remove column summaries from the end (Row 105 and beyond, in worksheet Persons sentenced to death)
  • Worksheet titled Other is not in a standard format (No columns found)
  • Variables with personally identifiable information (PII's) (As per our data sharing policy, we are not uploading any datasets with sensitive information either about communities (CII's) or individuals):
Worksheet/File | Variable | Status
Persons sentenced to death | Name of person | Deleted. Added an ID column
Movements in HC and SC | Name of person | Deleted. Added an ID column
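
The Status column above records that the name column was deleted and an ID column added. A minimal sketch of that step follows, assuming the worksheet has been exported as CSV; the function `replace_with_id` and the default column name `person_id` are hypothetical, and this is not necessarily how the change was actually made.

```python
import csv
import io


def replace_with_id(csv_text, name_column, id_column="person_id"):
    """Drop name_column and add id_column; the same name gets the same ID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = [id_column] + [f for f in reader.fieldnames if f != name_column]
    ids = {}
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        name = row.pop(name_column)
        # IDs are assigned in order of first appearance, starting at 1.
        row[id_column] = ids.setdefault(name, len(ids) + 1)
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Placeholder values; "Court" is a made-up second column.
    sample = "Name of person,Court\nA,HC\nB,SC\nA,SC\n"
    print(replace_with_id(sample, "Name of person"))
```

Two different people with the same name would receive the same ID under this scheme, so the result needs a manual check. To keep IDs consistent across worksheets, the same name-to-ID mapping would have to be reused for each file and kept private.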

❗ Important:

  • Anonymise sensitive information. To do this, columns with PII's listed above can be removed from the original dataset - Done
  • Please share a link to the data dictionary (This is a CSV file which contains information about the columns present in all files under a dataset). Learn more - Done
  • Mention the license under which this dataset is to be released on the JusticeHub. Please refer to this link for learning more about open data licenses - Done

📈 Improving data accessibility:

  • If possible, share all files listed under the Files available section above.
  • Share the data as CSV files. We can have three individual files i.e. Persons sentenced to death, Movements in HC and SC & Statistics under the Annual Death Penalty Report dataset
  • Include a README file, which is a short description of the dataset. Refer here to know more

Create a data dictionary format

A standard format to be shared with all data contributors. This is required because most data contributors maintain their own formats and some don't maintain any form of data dictionary. Data moderators, while creating reports from these datasets, would want to know specific details about variables, such as:

  • Name
  • Data type (Numeric, Character, etc)
  • Definition (How a variable is defined)
  • Variable Type (Categorical or Continuous)
  • Variable codes (possible values, if data is categorical)
  • Missing values code (How are missing values treated in a dataset)
  • File present in (If a dataset has multiple files)
  • Variable source (Origin, Created by the user)
    • Calculated fields
  • Mathematical formulas used to calculate a field
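
One way to turn the list above into a shareable template is a CSV with one row per variable. This is only a sketch: the header names in `DICTIONARY_FIELDS` and the function `dictionary_csv` are hypothetical, and the final format may well differ.

```python
import csv
import io

# Hypothetical header names, one per item in the list above.
DICTIONARY_FIELDS = [
    "name", "data_type", "definition", "variable_type", "variable_codes",
    "missing_values_code", "file_present_in", "variable_source", "formula",
]


def dictionary_csv(entries):
    """Return data dictionary CSV text; every entry must have a name."""
    out = io.StringIO()
    # restval fills fields that an entry leaves out with an empty cell.
    writer = csv.DictWriter(
        out, fieldnames=DICTIONARY_FIELDS, restval="", lineterminator="\n"
    )
    writer.writeheader()
    for entry in entries:
        if not entry.get("name"):
            raise ValueError("each data dictionary entry needs a 'name'")
        # DictWriter raises ValueError for keys outside DICTIONARY_FIELDS.
        writer.writerow(entry)
    return out.getvalue()


if __name__ == "__main__":
    print(dictionary_csv([{"name": "day_pending", "variable_source": "derived"}]))
```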

Check this - https://karthik.github.io/ddd/#minimal

Data report | Correctional Facilities in Assam | Studio Nilima

Data Accessibility Report

Links
Sample Dataset
Data Documentation
Data Dictionary

Files available

Files Status
Data dictionary (Human readable dictionary of data contents)
Data License (How to use and share the data)
Raw Dataset (The original/first data provided)
Processed Dataset (Dataset used for analysis)
Dataset README (A Human readable description of the data)
Citation (How you want your data to be cited)

Data Cleaning & Standardisation Report

Issue Status
Data does not have any PII's (Personally Identifiable Information)
Data to be uploaded is in a machine-readable format (CSV, JSON, XLS*)

Other details

  • Data maintainer details

Comments/Next Steps:

Worksheet: Sheet 1

  • RTI responses under each indicator can be shared as a separate worksheet. E.g. each of Women and Child Health, MEDICAL FACILITIES, MEDICAL STAFF, EDUCATION AND HEALTH, DETENTION MANUAL + FORTNIGHTLY PRISON REPORT can be a separate sheet, as they all have information under different heads (columns)
  • All date columns, such as RTI dated on, Reply received on, etc., should contain only valid date values in a single date format, e.g. YYYY-MM-DD
  • Remove cell formats (Colors, Bold, Italics, etc)
  • Indicators with an RTI IGP official no. should be a separate worksheet/file. E.g.:
Group | RTI Details
MEDICAL FACILITIES | RTI IGP official no. - 34
Women and Child Health | RTI IGP official no. - 35
EDUCATION AND HEALTH | RTI IGP official no. - 44
  • All column names should be standardised (lower case, and an identifier rather than a description)
  • Every column name should just be a label, with its description available in the data dictionary. E.g. a column name can be gynaecologists_available and its description can be How many gynaecologists are appointed or available for visits in the correctional homes of Assam? Please provide the number of such doctors and frequency of visit (of last 3 years) along with institution/hospital where they are appointed or available. (which is the actual column name in the file shared).
  • A few RTI responses can be converted to quantitative data as well. E.g. responses mentioning nil can be converted to 0, etc. (This depends on the use-case; sometimes it is not feasible to assign numbers to text, but it should be done where possible)
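
Three of the clean-up steps above (identifier-style column names, a single date format, and nil converted to 0) can be sketched as small helpers. The function names and the list of accepted input date layouts are assumptions; in particular, the sketch assumes day-first dates, which should be checked against the actual sheet.

```python
import re
from datetime import datetime

# Assumed input date layouts (day first); extend to match the sheet.
DATE_FORMATS = ("%Y-%m-%d", "%d-%m-%Y", "%d/%m/%Y", "%d.%m.%Y")


def to_identifier(label):
    """Turn a free-text column title into a lower-case identifier."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


def to_iso_date(value):
    """Return the date as YYYY-MM-DD, or None when no known layout matches."""
    for layout in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), layout).date().isoformat()
        except ValueError:
            continue
    return None


def nil_to_zero(value):
    """Map a 'nil' response to 0 and leave every other response unchanged."""
    return 0 if value.strip().lower() == "nil" else value


if __name__ == "__main__":
    print(to_identifier("RTI dated on"))
    print(to_iso_date("05/03/2019"))  # made-up date for illustration
    print(nil_to_zero("Nil"))
```

`to_identifier` only normalises spelling; a short label such as gynaecologists_available for a question-length column title still has to be chosen by hand. Cells for which `to_iso_date` returns None are best reviewed manually rather than guessed.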

Worksheet: Nature of illness - details

  • Share this as a CSV file
  • Remove Nature of Illness from Cell 1 as this is the title of the file/worksheet
  • Include geographic details as a separate column
  • Each column should contain values of a single type. E.g. the column titled Monthly Average (approx) should only have numbers and not dates, e.g. 2019
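
A check for the last point could flag cells that do not fit a numeric column. A plain type check cannot tell a year such as 2019 from a genuine number, so this sketch also flags whole numbers in an assumed year range for manual review. The function name `suspicious_cells` and the default range are assumptions.

```python
def suspicious_cells(values, year_range=(1900, 2100)):
    """Return (position, text, reason) for cells that need a manual look."""
    flagged = []
    for position, value in enumerate(values, start=1):
        text = str(value).strip()
        if not text:
            continue  # empty cells are a missing-value question, not a type error
        try:
            number = float(text)
        except ValueError:
            flagged.append((position, text, "not a number"))
            continue
        # Heuristic only: a whole number in this range may be a year.
        if number.is_integer() and year_range[0] <= number <= year_range[1]:
            flagged.append((position, text, "looks like a year"))
    return flagged


if __name__ == "__main__":
    # Made-up cell values for illustration.
    print(suspicious_cells(["12.5", "2019", "Jan 2019", "", "40"]))
```

A real monthly average that happens to fall inside the year range would also be flagged, so the output is a review list rather than a list of errors.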

❗ Important:

  • Please share a link to the data dictionary (This is a CSV file which contains information about the columns present in all files under a dataset). Learn more
  • Mention the license under which this dataset is to be released on the JusticeHub. Please refer to this link for learning more about open data licenses

📈 Improving data accessibility:

  • If possible, share all files listed under the Files available section above.
  • Share the data as CSV files.
  • Include a README file, which is a short description of the dataset. Refer here to know more

Document the JusticeHub - Terms of Service

Justice Hub - Terms of Service

  • The purpose of the Justice Hub (A collaborative legal data platform) is to enable the sharing of data across the legal and justice sector. We collaborate with our partners, including researchers, practitioners, and government agencies to curate and understand data relevant to the operations of the justice ecosystem.
  • Only approved organisations are able to share data through the platform.
  • JusticeHub will work with its partners to assess the quality of the datasets before the data can be shared on the platform. Any dataset that does not fit the data quality framework of the JusticeHub will not be uploaded on the JusticeHub
  • The data moderators at JusticeHub will work with the data contributors to identify gaps in the datasets and document the processes required before the data can be shared on the platform
  • JusticeHub does not allow data that includes personally identifiable information (PII) to be shared publicly through the site. All data shared publicly must be sufficiently aggregated or anonymized so as to prevent identification of people or other harm to affected people and the community.
  • JusticeHub endeavors not to allow publicly shared data that includes community identifiable information (CII) or demographically identifiable information (DII) that may put affected people at risk. However, this type of data is more challenging to identify within datasets during our quality assurance process without deeper analysis. We invite users of the JusticeHub to notify us should they become aware of this type of data being shared through the site.
  • Organisations sharing data through the platform should ensure, to the extent possible, that all data was collected in a legal, ethical and responsible manner
  • Datasets on the platform can be shared under a user-selected Creative Commons license or as public domain.
  • Should a user become aware of data shared through the JusticeHub platform that could cause harm by being shared openly, the user should contact [email protected] immediately to request that the data be removed.
  • Data shared through the JusticeHub platform will be held indefinitely or until such a time that the data contributor deletes it, or there is a request from a user for it to be deleted. In the latter case, the user would have to provide a convincing reason (e.g. privacy) and possibly also supporting evidence for the claim.
  • If a data source becomes aware of data that has been shared through the JusticeHub platform by a third party and disagrees with it being shared, the data source should contact [email protected] to request that the data be removed.
  • If CivicDataLab as system administrator of the JusticeHub platform becomes aware of any data that is in violation of these Terms of Service, CivicDataLab will contact the individual or organization to notify them.

About these Terms

The JusticeHub core team (CivicDataLab & Agami) may modify these terms or any additional terms that apply to the JusticeHub platform to reflect changes to our services. Users should look at the terms regularly. We will post notice of modifications to these terms. If you do not agree to these terms, you should discontinue use of the JusticeHub platform.

Data points (or datasets) from the IndianKanoon portal

ID | Dataset | Category | Columns | Use-Case | Status
1 | Aggregated list of cases (judgements) under all IPC acts/sections by bench and time-period (month/year) | Case-Law | bench,month,year,act,section,case-count | Trends of disposed cases under various IPC acts and sections over time |
2 | Year/Month and Bench wise aggregation of cases | Case-Law | bench,month,year,total-cases | Temporal view of total disposed cases by bench (Last n year) |
3 | Author/Bench/Year wise count of cases | Case-Law | author,bench,month,year,total-cases | Judge (Author) wise analysis of cases over time |
4 | Number of citations per act/section. This can further be aggregated by court bench | Citations | act,section,total-citations,bench | Total act specific judgement citations categorised by Bench |
5 | Number of citations per author (judge), all courts | Citations | author,bench,total-citations | Total citations categorised by Judge and Bench |
6 | Top n (100) most cited judgements of each court/bench | Citations | bench,judgement,total-citations,rank | Top most cited judgements of each court |
7 | A few interesting data points from IndianKanoon Website Analytics (Search requests over time) | Analytics | | To analyse search trends over time and other user-specific website usage behavior on IndianKanoon |
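
As an illustration of dataset 2 in the table above (columns bench, month, year, total-cases), the aggregation could be computed from per-judgement records roughly as follows. The input shape (one dict per judgement with `bench`, `year` and `month` keys) is an assumption and not the IndianKanoon data format; `count_cases` is a hypothetical name.

```python
from collections import Counter


def count_cases(cases):
    """Count judgements per (bench, year, month), sorted by that key."""
    counts = Counter((c["bench"], c["year"], c["month"]) for c in cases)
    return [
        {"bench": bench, "year": year, "month": month, "total-cases": total}
        for (bench, year, month), total in sorted(counts.items())
    ]


if __name__ == "__main__":
    # Made-up records for illustration only.
    records = [
        {"bench": "delhi", "year": 2019, "month": 1},
        {"bench": "delhi", "year": 2019, "month": 1},
        {"bench": "bombay", "year": 2019, "month": 2},
    ]
    for row in count_cases(records):
        print(row)
```

Adding `act` and `section` (dataset 1) or `author` (dataset 3) to the grouping key would follow the same pattern.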
