
pezmc / biblatex-check

Stars: 168, Watchers: 5, Forks: 33, Size: 456 KB

A python script for checking BibLatex .bib files for common referencing mistakes!

Home Page: https://github.com/Pezmc/BibLatex-Linter

License: MIT License

Languages: Python 84.31%, TeX 15.69%
Topics: biblatex, bib-files, linting, validating, python-script

biblatex-check's Introduction

This project is no longer under active development.

It remains fully functional and is still maintained.

Assistance with enhancements, feature requests and bug fixes is very welcome!

Any PRs will be reviewed promptly.


BibLatex-Check

A web based version of this checker is now available: https://github.com/Pezmc/BibLatex-Linter

A Python 2/3 script for checking BibLaTeX .bib files.

BibLaTeX-Check is a small Python script that goes through a list of references and checks whether certain required fields are present, for instance, whether each publication is assigned a year or whether a journal article has a volume and issue number.

Additionally, it checks the names of conference proceedings for consistency and could easily be extended to other needs.
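
The required-field check can be pictured as a lookup from entry type to a list of fields. The sketch below is a minimal illustration, assuming entries are already parsed into dicts with lower-cased field names; the `REQUIRED_FIELDS` table and `missing_fields` function are hypothetical and do not reproduce the script's actual rules.

```python
# Hypothetical rule table; the real script's required-field lists may differ.
REQUIRED_FIELDS = {
    "article": ["author", "title", "journaltitle", "year", "volume", "number"],
    "book": ["author", "title", "year"],
    "inproceedings": ["author", "title", "booktitle", "year"],
}


def missing_fields(entry_type, fields):
    """Return the required fields that are absent or empty, in rule order.

    entry_type: the entry type without '@', e.g. "article".
    fields: dict mapping lower-cased field names to their string values.
    """
    required = REQUIRED_FIELDS.get(entry_type.lower(), [])
    return [name for name in required if not fields.get(name, "").strip()]


entry = {
    "author": "Bratko, Ivan and Kopec, Danny and Michie, Donald",
    "title": "Pattern-based representation of chess end-game knowledge",
    "year": "1978",
}
print(missing_fields("article", entry))  # ['journaltitle', 'volume', 'number']
```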

The results of the check are written to an HTML file, which includes links to Google Scholar, DBLP, etc. for each flawed reference. These links help with retrieving missing information and correcting the entries efficiently.
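
Links like these can be generated by URL-encoding an entry's title into a search query. This sketch is for Python 3 only (it uses `urllib.parse.quote_plus`); the Google Scholar and DBLP URL patterns and the `search_links` helper are assumptions for illustration, not necessarily what the script emits.

```python
from urllib.parse import quote_plus

# Assumed query URL patterns; "{}" is replaced by the encoded search terms.
SEARCH_URLS = {
    "Google Scholar": "https://scholar.google.com/scholar?q={}",
    "DBLP": "https://dblp.org/search?q={}",
}


def search_links(title):
    """Map each search engine name to a query URL for the given title."""
    query = quote_plus(title)  # spaces become '+', special characters are %-escaped
    return {name: pattern.format(query) for name, pattern in SEARCH_URLS.items()}


for name, url in search_links("Virtual Reality: How Much Immersion Is Enough?").items():
    print("<a href='{}'>{}</a>".format(url, name))
```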

Please note that it is not a BibLaTeX validator, and the current version might not yet be able to parse every valid .bib file. The software targets the specific needs of computer scientists, but may be applicable in other fields as well.

For use in automated environments, BibLaTeX-Check prints errors to the console (this can be disabled) and returns an exit code indicating whether problems were found.
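
A minimal sketch of that console and exit-code behaviour, assuming problems are collected as message strings. The `PROBLEM:` prefix mirrors console output quoted in one of the issues on this page; the `report` helper and the exit status of 1 are assumptions.

```python
def report(problems, print_to_console=True):
    """Optionally print each problem and return a process exit code.

    Hypothetical helper: 0 means no problems were found, 1 means at least one was.
    """
    if print_to_console:
        for message in problems:
            print("PROBLEM: " + message)
    return 1 if problems else 0


problems = ["Blatov2010 - non-unique id: 'Blatov2010'"]
exit_code = report(problems)  # prints: PROBLEM: Blatov2010 - non-unique id: 'Blatov2010'
# A command-line entry point would typically finish with sys.exit(exit_code).
```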

The HTML output is tested with Firefox and Chrome; the current version does not work properly with Internet Explorer.

Getting Started

Copy the script into a directory with write permission, then run it:

./biblatex_check.py <-b input.bib> [-a input.aux] [-o output.html]

If you provide the additional .aux file (created when compiling a .tex document), the check is restricted to the entries that are actually cited in the document.
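
Restricting the check to cited entries amounts to collecting the citation keys from the .aux file. The sketch below assumes citations appear as `\citation{key1,key2}` lines, which is what LaTeX writes when BibTeX is the backend; biblatex with Biber records citations with different commands, so treat the pattern as an assumption. `cited_keys` is a hypothetical name.

```python
import re

# Matches \citation{...}; group 1 holds the comma-separated keys.
CITATION_RE = re.compile(r"\\citation\{([^}]*)\}")


def cited_keys(aux_text):
    """Return the set of citation keys found in the text of an .aux file."""
    keys = set()
    for match in CITATION_RE.finditer(aux_text):
        keys.update(key.strip() for key in match.group(1).split(",") if key.strip())
    return keys


aux = "\\relax\n\\citation{Bra78b,Str70}\n\\citation{LOM18}\n"
print(sorted(cited_keys(aux)))  # ['Bra78b', 'LOM18', 'Str70']
```

Parsed entries could then be kept only if their key is in the returned set.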

Options

Specify these when calling the script.

  • -b (--bib=file.bib) Set the input .bib file.
  • -a (--aux=file.aux) Set the input .aux file.
  • -o (--output=file.html) Write results to an HTML output file.
  • -v (--view) Open the results in a browser. Use together with -o.
  • -N (--no-console) Do not print problems to the console. An exit code is always returned.
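
For reference, the options above map naturally onto Python's standard `argparse` module. This is a sketch of an equivalent interface, not the script's own argument handling, which may be implemented differently; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Check BibLaTeX .bib files for common mistakes.")
    parser.add_argument("-b", "--bib", required=True, help="input .bib file")
    parser.add_argument("-a", "--aux", help="restrict checks to entries cited in this .aux file")
    parser.add_argument("-o", "--output", help="write results to this HTML file")
    parser.add_argument("-v", "--view", action="store_true", help="open the HTML output in a browser (use with -o)")
    parser.add_argument("-N", "--no-console", action="store_true", help="do not print problems to the console")
    return parser


args = build_parser().parse_args(["-b", "input.bib", "-o", "output.html", "-v"])
print(args.bib, args.output, args.view, args.no_console)  # input.bib output.html True False
```

`argparse` also accepts the long `--bib=file.bib` form, matching the option syntax listed above.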

Help

See ./biblatex_check.py -h for basic help.

If you're getting an environment error, try python ./biblatex_check.py or python3 ./biblatex_check.py, depending on your OS.

Alternatives

BibLatex check is adapted from BibTex Check by Fabian Beck, which can be used to validate BibTex files.

See BibTex vs BibLaTex vs NatBib for a comparison of different referencing packages.

Screenshot

Screenshots of the BibLatex check screen

Development

The checker is a single Python script that takes .bib files as input and prints to the console and/or an HTML file.

It maintains compatibility with Python 2, so any changes should be tested against both Python 2.7 and Python 3.

Any bug fix should be paired with a new test case in tests/input.bib.

"Running" the tests

python3 ./biblatex_check.py -b tests/input.bib
python2 ./biblatex_check.py -b tests/input.bib

Then manually confirm that the number of errors matches the details at the top of tests/input.bib.

License

MIT license

biblatex-check's People

Contributors

alexanderwillner, auge, dtzwill, geritwagner, ikkebr, johannjacobsohn, lekonjak, machawk1, nikosavola, pezmc, rindphi, vvanbeveren

biblatex-check's Issues

Skipping Last Citation, Comput J invalid, school= considered invalid

  1. The last citation in a file isn't processed at all (last one should otherwise get a complaint of missing author) (#38)
  2. Comput. J. isn't recognized as journal,
  3. school=... in phdthesis gives an error if citation is NOT standalone.
@article{Bra78b,
   title={Pattern-based representation of chess end-game knowledge},
   author={Bratko, Ivan and Kopec, Danny and Michie, Donald},
   journal={Comput. J.},
   volume={21},
   number={2},
   pages={149--153},
   year={1978}
}

@phdthesis{Str70,
   title={Untersuchungen \"{u}ber kombinatorische Spiele},
   author={Str\"{o}hlein, Thomas},
   year={1970},
   school={TU M\"{u}nchen}
}

@misc{LOM18,
title={Lomonosov tablebases},
year={2018},
url={http://tb7.chessok.com/},
howpublished={ChessOK}
}

Search entry ID omits first letter of the citation handle

Hi, thanks for developing this great tool! I just cloned the repo this morning and all seemed to work fine. However, when I wanted to search for a specific entry using the search field in the generated html file, I noticed that the search seems to omit the first letter of the search term. For example, one of my .bib-entries is this entry here:

@article{Abraham2014FN,
Abstract = {Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.},
Author = {Abraham, Alexandre and Pedregosa, Fabian and Eickenberg, Michael and Gervais, Philippe and Mueller, Andreas and Kossaifi, Jean and Gramfort, Alexandre and Thirion, Bertrand and Varoquaux, Ga{\"e}l},
Date-Added = {2019-11-06 17:20:24 +0100},
Date-Modified = {2019-11-06 17:21:27 +0100},
Doi = {10.3389/fninf.2014.00014},
Issn = {1662-5196},
Journal = {Frontiers in Neuroinformatics},
Month = {Feb},
Publisher = {Frontiers Media SA},
Title = {Machine learning for neuroimaging with scikit-learn},
Url = {http://dx.doi.org/10.3389/fninf.2014.00014},
Volume = {8},
Year = {2014}}

Now, if I start typing Abraham, the entry is not displayed. However, if I type braham (omitting the first letter), it finds the corresponding entry. Therefore, I assume that the ID search somehow omits the first letter. I can try to fix it myself but wanted to point out this issue here. Thanks!

Support for alias fields

The linter flags an issue if a @phdthesis doesn't use institution, but everything I can find says school is in fact the right entry.
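
BibLaTeX treats several BibTeX-era field names as aliases, including `school` for `institution`, `journal` for `journaltitle` and `address` for `location`. A checker could accept both spellings by normalizing field names before looking for required fields. A minimal sketch, assuming fields are held in a dict; `normalize_fields` is a hypothetical helper, and the alias table is only a subset.

```python
# Alias -> canonical BibLaTeX field name (a subset of BibLaTeX's field aliases).
FIELD_ALIASES = {"school": "institution", "journal": "journaltitle", "address": "location"}


def normalize_fields(fields):
    """Rename aliased fields to their BibLaTeX names; explicit BibLaTeX fields win."""
    normalized = {name.lower(): value for name, value in fields.items()
                  if name.lower() not in FIELD_ALIASES}
    for name, value in fields.items():
        canonical = FIELD_ALIASES.get(name.lower())
        if canonical is not None:
            normalized.setdefault(canonical, value)  # keep an existing canonical value
    return normalized


thesis = {"title": r'Untersuchungen \"{u}ber kombinatorische Spiele', "school": r'TU M\"{u}nchen'}
print(sorted(normalize_fields(thesis)))  # ['institution', 'title']
```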

Add support for xref

In my .bib file, I've tried to extract journals to @xref entries to save typing and reduce errors. An example of this:

@xref{computer,
  journaltitle = {Computer},
  publisher = {{IEEE}},
  issn = {0018-9162},
}

@article{Bowman2007a,
  title = {Virtual Reality},
  subtitle = {How Much Immersion Is Enough?},
  author = {Doug A. Bowman and Ryan P. McMahan},
  crossref = {computer},
  volume = 40,
  number = 7,
  date = {2007-07},
  doi = {10.1109/MC.2007.257},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4287241&sortType%3Dasc_p_Sequence%26filter%3DAND%28p_IS_Number%3A4287226%29}
}

It would be nice if the checker would merge these (as per the BibLaTeX documentation) before complaining about missing fields.
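
A simplified sketch of such a merge, assuming each entry is a dict of fields and all entries are indexed by key. `resolve_crossref` is a hypothetical helper that copies the parent's fields one-to-one and lets the child's own fields take precedence; BibLaTeX's real inheritance rules also remap some fields (for instance, a parent's title can become a child's booktitle), which this sketch does not model.

```python
def resolve_crossref(entry, entries):
    """Return a copy of `entry` with fields inherited from its crossref parent.

    entry: dict of one entry's fields; entries: dict mapping keys to field dicts.
    Fields set on the child override those inherited from the parent.
    """
    parent = entries.get(entry.get("crossref", ""), {})
    merged = dict(parent)
    merged.update(entry)
    return merged


entries = {
    "computer": {"journaltitle": "Computer", "publisher": "{IEEE}", "issn": "0018-9162"},
}
article = {"title": "Virtual Reality", "crossref": "computer", "volume": "40", "number": "7"}
print(resolve_crossref(article, entries)["journaltitle"])  # Computer
```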

accept journal instead of journaltitle

It would be great if I could have the script either accept journal as a substitute for journaltitle or, like pep8, ignore certain types of errors. But the script was useful for bringing my bib file into shape. Thanks.

Python error when trying to run the script

I'm trying to use your script on my thesis and bibliography and I get this error:

Traceback (most recent call last):
  File "./biblatex_check.py", line 201, in <module>
    cleanedTitle = currentTitle.translate(removePunctuationMap)
TypeError: expected a string or other character buffer object

I had to change the line endings as I am running Linux (see this issue on Stack Overflow); perhaps it is related?
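
This message is typical of Python 2 when `str.translate` receives a dict: Python 2's byte-string `translate` expects a 256-character table, while `unicode.translate` and Python 3's `str.translate` accept a dict of code points. If, as its name suggests, `removePunctuationMap` is such a dict, the line endings are likely unrelated. One version-independent way to strip punctuation, sketched with a hypothetical `remove_punctuation` helper:

```python
import string

PUNCTUATION = set(string.punctuation)


def remove_punctuation(text):
    """Drop ASCII punctuation; behaves the same on Python 2 and Python 3 strings."""
    return "".join(ch for ch in text if ch not in PUNCTUATION)


print(remove_punctuation("Virtual Reality: How Much Immersion Is Enough?"))
# Virtual Reality How Much Immersion Is Enough
```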

No check for last entry

The current entry is checked when the next '@' is found. Thus the last entry is NOT CHECKED.
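
A common remedy is to flush whatever entry is still open once the input ends. A simplified sketch, assuming each entry begins on a line whose first non-blank character is '@'; `split_entries` is a hypothetical helper, not the script's parser.

```python
def split_entries(lines):
    """Group .bib lines into entries, one list of lines per entry.

    A line starting with '@' closes the previous entry and opens a new one.
    Lines before the first '@' (e.g. comments) are ignored.
    """
    current = []
    for line in lines:
        if line.lstrip().startswith("@"):
            if current:
                yield current
            current = [line]
        elif current:
            current.append(line)
    if current:
        yield current  # the final entry: without this it would never be checked


bib = ["% my references\n", "@article{a,\n", "  year={2000}\n", "}\n", "@misc{b,\n", "}\n"]
print([entry[0].strip() for entry in split_entries(bib)])  # ['@article{a,', '@misc{b,']
```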

Error

Describe the bug
I just tried the script with a small BibTeX file and it gives the following error message:

INFO: Reading references from 'test.bib'
INFO: Filtering by references found in 'references.aux'
WARNING: Aux file 'references.aux' doesn't exist -> not restricting entries
Traceback (most recent call last):
  File "/home/jonas/shared/bin/biblatex_check.py", line 473, in <module>
    handleEntryEnding(bibLineNumber, bibLine)
  File "/home/jonas/shared/bin/biblatex_check.py", line 355, in handleEntryEnding
    entryProblemsHTML = generateEntryProblemsHTML(
  File "/home/jonas/shared/bin/biblatex_check.py", line 243, in generateEntryProblemsHTML
    html += "<div class='reference'>" + title + " (" + author + ")"
TypeError: can only concatenate str (not "filter") to str

I'm using Ubuntu 21.04 and replaced python in the first line of the script with python3.

To Reproduce
Execute the script with the command line

biblatex_check.py -b test.bib

where the file test.bib contains the following:

@book {hartshorne1977algebraic,
    AUTHOR = {Hartshorne, R.},
     TITLE = {Algebraic geometry},
      NOTE = {Graduate Texts in Mathematics, No. 52},
 PUBLISHER = {Springer-Verlag, New York-Heidelberg},
      YEAR = {1977},
     PAGES = {xvi+496},
      ISBN = {0-387-90244-9},
   MRCLASS = {14-01},
  MRNUMBER = {0463157 (57 \#3116)},
MRREVIEWER = {Robert Speiser},
}

Expected behavior

It should tell me that the BibTeX file is correct.
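
The traceback is consistent with a Python 2/3 difference: in Python 2, `filter()` applied to a string returns a string, whereas in Python 3 it returns a lazy `filter` object, which cannot be concatenated with `+`. Wrapping the result in `"".join(...)` gives a string on both versions. A minimal sketch; the predicate shown (dropping braces) is hypothetical, since the script's actual filter is not quoted here.

```python
raw_author = "{Hartshorne}, R."
raw_title = "Algebraic geometry"

# Python 3: filter() yields characters lazily; join them back into a str.
author = "".join(filter(lambda ch: ch not in "{}", raw_author))
title = "".join(filter(lambda ch: ch not in "{}", raw_title))

html = "<div class='reference'>" + title + " (" + author + ")"
print(html)  # <div class='reference'>Algebraic geometry (Hartshorne, R.)
```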

BibLatex-Check as a linter: missing line numbers in messages

To use biblatex-check as a linter in text editors, the messages should contain line numbers of the offending line.

For example, $ bibclean my.bib > /dev/null follows the style filename:line number: message (https://ctan.org/pkg/bibclean):

❯ bibclean  my.bib > /dev/null
%% my.bib:6:Expected http://dx.doi.org/ prefix in DOI value ``"10.1098/rstl.1856.0022"''.
%% my.bib:10:Unexpected value in ``month = "1"''.

However, biblatex-check returns

❯ biblatex-check -b my.bib -a my.aux
INFO: Reading references from 'my.bib'
INFO: Filtering by references found in 'my.aux'
PROBLEM: Blatov2010 - non-unique id: 'Blatov2010'

Could biblatex-check be extended to provide line numbers as well?
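
One approach is to remember the line on which each entry starts while reading the file (for example with `enumerate(..., start=1)`) and prefix each message with `filename:line:`. A sketch under that assumption; `entry_start_lines` and `format_problem` are hypothetical helpers, and the message text follows the output shown above.

```python
def entry_start_lines(lines):
    """Return (line_number, key) for each line that opens an entry, numbering from 1."""
    starts = []
    for number, line in enumerate(lines, start=1):
        stripped = line.lstrip()
        if stripped.startswith("@") and "{" in stripped:
            key = stripped.split("{", 1)[1].split(",", 1)[0].strip()
            starts.append((number, key))
    return starts


def format_problem(filename, line_number, message):
    """Format a message as filename:line: message, a layout many editors can parse."""
    return "{}:{}: {}".format(filename, line_number, message)


bib = ["@article{Blatov2010,\n", "  year = {2010},\n", "}\n", "\n", "@book{Blatov2010,\n", "}\n"]
seen = set()
for number, key in entry_start_lines(bib):
    if key in seen:
        print(format_problem("my.bib", number, "{} - non-unique id: '{}'".format(key, key)))
    seen.add(key)
# my.bib:5: Blatov2010 - non-unique id: 'Blatov2010'
```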

Autofix

Some issues, such as journal instead of journaltitle, are autofixable. An autofix option could replace them with the correct version.
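
A minimal sketch of such an autofix, assuming each field starts on its own line as in the examples on this page; `autofix_journal` is a hypothetical helper, and a real implementation would need a proper parser to handle fields that share a line.

```python
import re

# A "journal" field name at the start of a line, followed by "=" (case-insensitive).
# "journaltitle =" does not match because "title" sits between "journal" and "=".
JOURNAL_FIELD = re.compile(r"^([ \t]*)journal([ \t]*=)", re.IGNORECASE | re.MULTILINE)


def autofix_journal(bib_text):
    """Rename BibTeX-style journal fields to BibLaTeX's journaltitle."""
    return JOURNAL_FIELD.sub(r"\1journaltitle\2", bib_text)


before = "@article{Bra78b,\n   title={Pattern-based representation},\n   journal={Comput. J.},\n}\n"
print(autofix_journal(before))
```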

Command line mode

Could we just have the warnings on the command line, instead of the HTML?

Of course, keep the HTML rendering optionally :)

List index out of range

I'm getting this with Python 3, using the latest version from master:

Traceback (most recent call last):
  File "biblatex_check.py", line 232, in <module>
    currentId = line.split("{")[1].rstrip(",\n")
IndexError: list index out of range
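
`line.split("{")[1]` raises IndexError whenever a line treated as an entry start contains no `{`. One plausible trigger is an entry written with parentheses, which BibTeX also accepts (e.g. `@article(key, ...)`). A more defensive sketch using a regular expression; `parse_entry_start` is a hypothetical helper that returns None instead of raising.

```python
import re

# "@type{key," or "@type(key,", tolerating spaces as in "@book {hartshorne1977algebraic,".
ENTRY_START = re.compile(r"^\s*@\s*(\w+)\s*[{(]\s*([^,\s]*)")


def parse_entry_start(line):
    """Return (entry_type, key) for an entry's first line, or None if it does not match."""
    match = ENTRY_START.match(line)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2)


print(parse_entry_start("@book {hartshorne1977algebraic,\n"))  # ('book', 'hartshorne1977algebraic')
print(parse_entry_start("@article(Bra78b,\n"))  # ('article', 'Bra78b')
print(parse_entry_start("@\n"))  # None
```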

No errors

I just used this to validate the following:

@techreport{StrumShafferEbersoleVitale2005,
	author = "Strum, Lindsey Marie and Shaffer, Jeanne Angela and Ebersole, Garrett P. and Vitale, Daniel F."
	title = "3rd grade engineering and technology curriculum"
	institution = "Worcester Polytechnic Institute"
	%address = "100 Institute Road, Worcester MA 01609-2280 USA"
	year = "2005"
	%month = "January"
}

and got

Info
# entries: 0
# problems: 0
# missing fields: 0
# flawed names: 0
# wrong types: 0
# non-unique id: 0
# wrong field: 0

Linting in CI

Is your feature request related to a problem? Please describe.
Similarly to #59, code linting could be implemented in CI/CD.

Describe the solution you'd like
pylint is easy to set up with GitHub Actions.

Book entry with editor incorrectly flagged as warning

BibTeX requires a book entry to have an author or editor field. However, BibLaTeX-Check warns about correct entries that have an editor instead of an author, such as:

@Book{TestBook,
    editor = "John Doe",
    title = "An Correct Entry Creating a Warning",
    publisher = "The Publisher",
    year = "2020",
}
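
Requirements such as "author or editor" can be modelled as groups of alternative fields, where any one non-empty field satisfies the group. A sketch, assuming fields are stored in a dict with lower-cased names; the `REQUIRED_GROUPS` table and `unmet_requirements` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical rules: each tuple lists alternatives, any one of which satisfies it.
REQUIRED_GROUPS = {
    "book": [("author", "editor"), ("title",), ("publisher",), ("year", "date")],
}


def unmet_requirements(entry_type, fields):
    """Return a description of every requirement group with no non-empty field."""
    groups = REQUIRED_GROUPS.get(entry_type.lower(), [])
    return [" or ".join(group) for group in groups
            if not any(fields.get(name, "").strip() for name in group)]


test_book = {"editor": "John Doe", "title": "An Correct Entry Creating a Warning",
             "publisher": "The Publisher", "year": "2020"}
print(unmet_requirements("book", test_book))  # []
```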

Problem opening .bib file with custom name

Contrary to what is currently written in the Readme, calling ./biblatex_check.py customname.bib did not work. I had to specify the bib argument explicitly with ./biblatex_check.py -b customname.bib.

Great work though!

Run script as CLI command anywhere

Is your feature request related to a problem? Please describe.
It would be more accessible to be able to just run something like

biblatex_check -b input.bib

from anywhere.

Describe the solution you'd like
This should be possible by setting up a setup.py accordingly, see https://stackoverflow.com/questions/56534678/how-to-create-a-cli-in-python-that-can-be-installed-with-pip

Describe alternatives you've considered
Google has a library https://github.com/google/python-fire for generating CLIs from Python but this is a bit overkill for a single script.

Run tests in CI

Is your feature request related to a problem? Please describe.
There is currently a 'test' that could be run for all MRs in CI/CD.

Describe the solution you'd like
GitHub Actions can be set up freely and easily to install the necessary dependencies and run the shell command for the tests.

Describe alternatives you've considered
Writing the test in terms of a real pytest setup is also possible.
