petermr / ami3 Goto Github PK

Integration of cephis and normami code into a single base. Tests will be slimmed down

License: Apache License 2.0

Java 3.62% HTML 77.87% Python 0.01% XSLT 0.13% CSS 0.02% JavaScript 0.04% Shell 0.01% Dockerfile 0.01% PHP 0.04% Batchfile 0.01% PowerShell 0.01% Jupyter Notebook 18.28% Hack 0.01%

ami3's Introduction

petermr repositories

Many of these repos are widely used in collaborative projects and include:

code
data
projects

This special repo is to coordinate navigation and discussion

discussion lists

The "Discussions" for this repo https://github.com/petermr/petermr/discussions include discussions for the other repos and are of indicated by their name. They may replace our (private) Slack for all public-facing material (private project management will remain on Slack).

active repos

https://github.com/petermr/pygetpapers. automatic downloading of articles and preprints in bulk. Pioneered by Rik Smith-Unna and ported to Python by @ayush garg. CLI. PyPi: https://pypi.org/project/pygetpapers/
https://github.com/petermr/pyami. Port of Java https://github.com/petermr/ami3 to Python (@petermr). CLI. Includes a prototype GUI in tkinter. PyPi: https://pypi.org/project/py4ami/ (Note there is already an unrelated pyAMI in PyPi so in that namespace we are py4ami, but on Github it's pyami)
https://github.com/petermr/pyamiimage. Analysis of scientific duagrams (@petermr, @anuvc). No CLI, or PyPI yet.
https://github.com/petermr/docanalysis. Text-based analysis of scientific articles (@shweatahegde). CLI under dev. No PyPi yet.
https://github.com/petermr/cevopen. Projects, dictionaries and outreach for analysing articles in plant sciences.
https://github.com/petermr/openvirus. Projects, dictionaries and outreach for analysing articles on viral epidemics.
https://github.com/petermr/crops. NIPGR-intern projects on crops. Minicorpora and dictionaries for terpene synthases
https://github.com/petermr/opendiagram. Adaptation of pyamiimage to extract data from diagrams, especially materials science/batteries
https://github.com/petermr/dictionary. Software for distributed dictionaries and many dictionaries

active Python projects:

For context: We have 4 packages (if that's the right word). They are largely standalone but can have useful library routines. They all share a common data structure on disk (simply named directories). This means that state is less important and often held on the filesystem. It also means that data can be further manipulated by Unix tools and other utilities. This is very fluid as we are constantly adding new data substructures. (I developed much of this in Java - https://github.com/petermr/ami3/blob/master/README.md) . The top directory is a CProject and its document children are called CTrees as they are useful split into many subdirectory trees.

Each package has a maintainer. These are all volunteers. Their Python is all self-taught . There are also interns - mixture of compsci/engineers/plant_sci who have a 3-month stay. They test the tools, develop resources, explore text-mining, NLP, image analysis, machine-learning, etc. They are encouraged to use the packages, link them into Python scripts or Notebooks but don't have time for serious development. (They might add readers or exporters).

pygetpapers , Ayush Garg. https://github.com/petermr/pygetpapers . Searches and downloads articles from repositories. Standalone, but the results may be used by docanalysis or possibly imageanalysis. Can be called from other tools.
docanalysis. Shweata Hegde. https://github.com/petermr/docanalysis . Ingests CProjects and carries out text-analysis of documents, including sectioning, NLP/text-mining, vocabulary generation. Uses NLTK and other Python tools for many operations, and spaCy, scispaCy for annotation of entities. Outputs summary data, correlations, word-dictionaries. Links entities to Wikidata.
pyamiimage, Anuv Chakroborty + PMR. https://github.com/petermr/pyamiimage . Ingests Figures/images, applies many image processing techniques (erode-dilate, colour quantization, skeletons, etc.), extracts words (Tesseract) , extracts lines and symbols (uses sknw/NetworkX) and recreates semantic diagrams (not finished)
py4ami . PMR. https://github.com/petermr/pyami . Translation of ami3(J) to Python. Processes CProjects to extract and combine primitives into semantic objects. Some functionality overlaps with docanalysis and imageanalysis. Includes libraries (e.g. for Wikimedia) and includes prototype GUI in tkinter, and a complex structure of word-dictionaries covering science and related disciplines. (Note the project is called pyami locally but there is already a PyAMI project so there it is called py4ami)

All packages aim to have a common commandline approach, use config files, generate and process CProjects (e.g. iterating over CTrees and applying filters, transformers, map/reduce, etc.). All 4 packages have been uploaded to PyPI

basicTest

Checks that the Python environment works (independently of the applications) https://github.com/petermr/basicTest/blob/main/README.md

presentations

Some presentations about the software, many from collaborators/interns

pygetpapers

https://youtu.be/pUjiNzLVHLY (@ayushGarg) 5 min

notebook

https://github.com/petermr/docanalysis/blob/main/resources/docanalysis_demo.ipynb

docanalysis

docanalysis slides (MADICES): https://github.com/petermr/CEVOpen/blob/master/outreach/docanalysis_demo_madices.pdf

wikidata

WikidataCon Presentation slides and recording: https://github.com/petermr/crops/tree/main/outreach/WikidataCon2021

ami3's People

Contributors

Stargazers

Watchers

Forkers

nuest anjackson remkop l-hawizy ziflex

ami3's Issues

We need a try/catch for bad pdf files

get_papers got a bad pdf file and it kills ami when it hits it. The bad pdf is attached to this issue.
The command is
ami -p ./results pdfbox

The exception it returns is

java.lang.RuntimeException: Cannot load PDF ./results/PMC2718502/fulltext.pdf
	at org.contentmine.cproject.files.CTree.processPDFTree(CTree.java:1753)
	at org.contentmine.ami.tools.AMIPDFTool.docProcRunPDF(AMIPDFTool.java:270)
	at org.contentmine.ami.tools.AMIPDFTool.processTree(AMIPDFTool.java:190)
	at org.contentmine.ami.tools.AbstractAMITool.processTrees(AbstractAMITool.java:588)
	at org.contentmine.ami.tools.AMIPDFTool.runSpecifics(AMIPDFTool.java:159)
	at org.contentmine.ami.tools.AbstractAMITool.runCommands(AbstractAMITool.java:193)
	at org.contentmine.ami.tools.AbstractAMITool.call(AbstractAMITool.java:173)
	at org.contentmine.ami.tools.AbstractAMITool.call(AbstractAMITool.java:41)
	at picocli.CommandLine.executeUserObject(CommandLine.java:1853)
	at picocli.CommandLine.access$1100(CommandLine.java:145)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2255)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2249)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2213)
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2080)
	at org.contentmine.ami.tools.AMI.enhancedLoggingExecutionStrategy(AMI.java:176)
	at picocli.CommandLine.execute(CommandLine.java:1978)
	at org.contentmine.ami.tools.AMI.main(AMI.java:113)
Caused by: java.io.IOException: Error: End-of-File, expected line
	at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1124)
	at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2589)
	at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2560)
	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1099)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1082)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1041)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:989)
	at org.contentmine.cproject.files.CTree.processPDFTree(CTree.java:1751)
	... 16 more

fulltext.pdf

Cache test errors on my side

Compononent Cache Test has 1 failure returning this message "49880 [main] ERROR

org.contentmine.graphics.svg.cache.ComponentCacheTest  - No file: \Users\pm286\workspace\cm-ucliijkb\corpus-oa-uclii-01"

I don't have this directory but I can't update it to work on my end since the corpus isn't on ami3 so I don't have that folder either

Document Cache Test has 1 failure returning the following messages

528  [main] WARN  org.contentmine.graphics.svg.cache.DocumentCache  - no files found in: dir: src\test\resources\org\contentmine\graphics\svg\page\varga1
 with .*/svg/fulltext\-page(\d+)(\.svg\.compact)?\.svg
3900 [main] WARN  org.contentmine.graphics.svg.cache.DocumentCache  - no files found in: dir: target\document\bmc\12936_2017_Article_1948
4387 [main] WARN  org.contentmine.graphics.svg.cache.DocumentCache  - no files found in: dir: target\document\varga1
 with .*/svg/fulltext\-page(\d+)(\.svg\.compact)?\.svg

I can confirm all these 3 directories contain files

ami makeproject does not produce folders with fulltext.pdf file

I have just run ami makeproject, but folders with fulltext.pdf file are not created.
This is the output:

ls test_small_pdfs/*.pdf
test_small_pdfs/test_1.pdf  test_small_pdfs/test_1-1.pdf  test_small_pdfs/test_2-1.pdf

Usuario@Medium /cygdrive/J/N/java/Aplicaciones/_new/tmp/projects/chess
$ ami -p test_small_pdfs/ makeproject --rawfiletypes pdf

Generic values (AMIMakeProjectTool)
================================
-v to see generic values

Specific values (AMIMakeProjectTool)
================================
compress            25
omit                [template\.xml, log\.txt, summary\.json]
file types          [pdf]
logfile             null

Usuario@Medium /cygdrive/J/N/java/Aplicaciones/_new/tmp/projects/chess
$ tree test_small_pdfs/
test_small_pdfs/
├── make_project.json
├── test_1.pdf
├── test_1-1.pdf
└── test_2-1.pdf

0 directories, 4 files

I suppose that is not the expected behaviour

`amidict search --help` does not work

Currently amidict search --help does not work.

Unify logging and tool output in AMI

From the Slack chat:

@remko Popma I need to get a Logging strategy for AMI. It's currently a mess. Most of the debugging is log4j with TRACE->ERROR all used.
You have introduced -v/vv in picocli and I've used this fairly simplistically:
if (verbosity().length >= 2) System.out.println(+"readTerms");
12:31
or to mimic log4j
if (verbosity().length >= 2) System.out.println(this.getClass()+";" + LocalDate.now()+"; "enter readTerms");
12:33
(I find it useful to have the class for finding the debug statements).
Since I want to try to clean some of this up , do you have a recommended strategy? in particular is it a good idea to combine this with log4j or do something completely separate.
12:36
I also have a problem communicating between the Tool class that "calls" the operations, and communication to legacy libraries. log4j is added to every class using a static variable:
public static final Logger LOG = Logger.getLogger(DictionaryCreationTool.class);
static {
LOG.setLevel(Level.DEBUG);
}

12:38
For the time being I'm using the verbosity() with sysout but it feels there could be more...
12:38
Many thanks.
12:43
Also can there be different levels of verbosity for different subcommands, e.g.
ami -p foo makeproject pdfbox -v image -vv
12:44
This would have no debug for makeproject , level 1 for pdfbox and level 2 for image`

Peter Murray-Rust 1:01 AM
As an example I have a class HTMLFactory which creates HTMLElements and I want to debug it. It's completely separate from FooTool so I'm reluctant to add implements Verbosity because that means potentially altering all libraries. I could allow verbosity() to set the Level in the library classes but again it doesn't feel quite right.

Remko Popma 2:37 PM
I am working on a growing command line toolset at work and I faced the same issue with logging and user-facing messages.
2:38
One strategy (which I no longer use) is to completely separate the user-facing messages from the logging; the thinking being that they serve different purposes and have a different audience. In practice however, to me this meant coding the same message twice: once to display it to the user and once to make sure it is also captured in the log file for troubleshooting purposes. I did not like this duplication. (edited)

Remko Popma 2:45 PM
So now I am trying something else.
2:47
What I am currently doing is I removed all explicit use of System.out.println and System.err.println from the code and instead use the logging library (Log4j2 in my case) everywhere.

Remko Popma 2:54 PM
In the log4j configuration, I have 2 appenders: a log file and the console. Everything (TRACE level and up) is logged to the file, prefixed by the log level, timestamp and the logging class.
The console appender shows the user-facing messages. By default, only WARN, ERROR and FATAL log messages are logged to the console - and these are not prefixed with the log level, timestamp and the logging class - only the message is shown.
If the user specifies one -v option (--verbose), then the config is changed such that INFO level messages are also shown on the console. -vv includes DEBUG level and -vvv includes TRACE level messages. (edited)

Remko Popma 3:16 PM
The picocli project has two examples (1 and 2) of how to code this; but I can help if you want to try this. (edited)
3:17
It may need some extra work if we want to cleanly separate System.out for command output from System.err for diagnostics; this becomes important when we want to pipe command output to other tools.

Remko Popma 5:33 PM
Should I assume that we want to pipe output of tools to other tools? (So strict segregation between STDOUT (for data) and STDERR (for diagnostics)?)

Peter Murray-Rust 5:34 PM
The user needs the following:
the core data from a Tool. For example running ami search they want the results/ directories. These (probably) should not be under logging control as we all want the same output for scientific purposes.
5:34
At present I don't use stderr and stdout at all as I don't expect users to use them in this way
5:35
(This may change). The present intern users are much more competent than (say) hack days I did 3+ years ago.
5:37
But piping the output of search into bash (sed, awk ) etc only works for geeks. For some users (not the current group) commandlines are terrifying.
5:38
all essential data is communicated through the local filestore.

Remko Popma 5:58 PM
Regarding controlling output with --verbose: I suggest we make this option an inherited option (with scope = INHERIT) to make it available on all commands/subcommands. We then install an execution strategy that programmatically modifies the log4j configuration based on the verbosity before running the actual command.

`ami section` fails with valid `--summary` argument

ami [...] section

runs correctly without options. and with

ami [...] section --sections

but fails with

ami [...] section --sections ALL --summary figure

see: https://github.com/petermr/ami3/blob/master/src/test/java/org/contentmine/ami/tools/AMISectionToolTest.java

	public void testSectionsSummaryBug() {
		String args;
		args = ""
				+ "-p " + AMIFixtures.TEST_ZIKA10_DIR
				+ " --forcemake"
				+ " section"
				+ " --sections ALL"
			;
		AMI.execute(AMISectionTool.class, args);
		System.err.println("=======end 1 runs OK ========");
		args = ""
				+ "-p " + AMIFixtures.TEST_ZIKA10_DIR
				+ " --forcemake"
				+ " section"
				+ " --sections ALL"
				+ " --summary foo"
			;
		AMI.execute(AMISectionTool.class, args);
		System.err.println("=======end 2 runs OK, detects bad arg 'foo' ========\n"
				+ "Invalid value for option '--summary' at index 0 (<summaryList>): "
				+ "    expected one of [figure, results, supplementary, table] (case-sensitive) but was 'foo'\n" + 
				"Usage: ami section [OPTIONS]\n" + 
				"Try 'ami section --help' for more information.\n"
				+ "===============================" + 
				"");
		args = ""
				+ "-p " + AMIFixtures.TEST_ZIKA10_DIR
				+ " --forcemake"
				+ " section"
				+ " --sections ALL"
				+ " --summary figure"
			;
		AMI.execute(AMISectionTool.class, args);
		System.err.println("=======end 3 fails========\n"
				+ "Expected parameter for option '--summary' but found 'figure'\n" + 
				"Usage: ami section [OPTIONS]\n" + 
				"Try 'ami section --help' for more information.\n" + 
				"==============================");
		
	}

Bump picocli version to 4.3.2

AMIDictionaryTest NPE in createDictionary: missing `--wptype`. Can we assign a default?

Many AMIDictionaryTests fail with a NullPointerException in AMIDictionaryTool::createDictionary because the --wptype option is not specified, so the WikiFormat wptype field is null.

Should the dictionary tool require a --wptype value, or can we assign a reasonable default?

Also:

AMIDictionaryTest::testWikipediaTables - "Unknown options: '--urlcol', 'Link'"
testWikipediaConservation1 - Unknown option: '--hreftext'; Possible solutions: --hrefcols
testReadMammalsCSV - Unmatched argument at index 9: 'names'

What to do with these? Looks like these tests are old.

Ensure that `parseSpecifics` prints option values for all ami commands

This is the remaining follow-up item from #46 (the logging epic).

One issue remains to complete this topic:
many tools print the values of the @Option-annotated fields in the parseSpecifics method, for troubleshooting purposes.

The majority of the tools use System.out.println for this. (...) it is better to use the logging library for this so that the values are also captured in the log file instead of only on the console.

The log level to use for this is INFO (only visible if users specify --verbose on the console).

Consider converting Operations in `ami dictionary` to subcommands

Currently, the ami dictionary command allows the following operations:

create
display
help
search
translate

Looking at the usage in the test, these "feel like" subcommands. It may make sense to implement them this way. (Perhaps with @Command-annotated methods in AMIDictionaryTool.)

That said, some analysis is required on the options:
some of the options in ami dictionary may apply to all these operations, and some may be specific to certain operations.

The next version of picocli will support "inherited" options, which would allow users to specify options on either the parent command or the subcommand. That may be useful here: it would allow end users to specify options either before or after the operation. Consider postponing this task until picocli 4.3.0 is released.

AMIDownload Test fails

From Lezan Hawizy:
have had a look at AMIDownloadTest, these are the errors I found:
A few seem to have an IllegalThreadStateException which i am not sure how to go about testBiorxivSmall() only fails because there is no variable for the landingpage argument on line 67 and its missing a comma between html and pdf in line 68

testBiorxivClimate could be a false assert statement, it's looking for a folder called metadata and file called page1.html, but it's created a __metadata folders instead with under it called resultSetX.html are those the same thing?

Reduce shared `ami` options

Problem Description

(Raised by @petermr in this comment on #13)

All current ami tools share about 20 common options. Additionally some commands have some command-specific options. This is too many: it makes the usage help message difficult to grasp. Not all common options apply to all commands.

Analysis

I believe there are two aspects to this (@petermr, correct me if I'm wrong):

Some options are "shared" but do not really apply to all commands. This may be because all commands inherit from AbstractAMITool, where these options are defined.
Some commands actually do use all these options, but casual users only use a few of them.

Solutions

Idea 1: move real shared options to the `ami` top-level command

This requires some analysis on which options are really applicable to/used by all (or the vast majority of) commands. These options could then be moved to the ami top-level command.

Invoking ami --help would show these shared options, while ami <cmd> --help would only show the command-specific options.

Command invocations would then look like this:

ami --shared-option1 --shared-option2 <cmd> --cmd-specific-option1 ...

This would hook nicely into our idea of creating workflows, because shared options would only need to be specified once on the top-level command instead of on each command:

ami --shared-option1 --shared-option2 \
  <cmd1> --cmd1-specific-option \
  <cmd2> --cmd2-specific-option \
  <cmd3> --cmd3-specific-option1 --cmd3-specific-option2

Idea 2: use mixins instead of inheritance

Picocli offers mixins as an alternative reuse mechanism to Java inheritance. I we find there are some groups of options that apply to several, but not all, commands, then these options could be split off into a separate class, and "mixed in" to the commands where they are actually used with the @Mixin annotation.

Again, this requires some analysis on which options are applicable to/used by each command.

Idea 3: custom help

If there are some commands that still have too many options, we can give these commands a custom usage help message, where ami <cmd> --help would only show the "often used" options, and ami <cmd> --help-details would show the full list of all options.

This would require some guidance from experienced users on which options fall into the "often used" category, and which are "rare use case" options.

Document AMI tools

`parent` not injected into `AMI`

The test org.contentmine.cproject.files.CProjectTest.testPicocliInjectparentOK()
runs AMI over a CProject with the command pdfbox and completes. (It's largely a no-op as the data are already maked)

	@Test
	public void testPicocliInjectparentOK() {
		CProject project = new CProject(new File(NAConstants.TEST_AMI_DIR, "battery10"));
		List<String> treeNames = Arrays.asList(new String[] {
				"PMC3776197",
				"PMC4062906",
				});

		String treeNamesString = String.join(" ", treeNames);
		String cmd = "-p " + project
				+ " -v"
				+ " --includetree " + treeNamesString
				+ " pdfbox";
		AMI.execute(cmd);
				
	}

The same setup with a different command


	@Test
	public void testPicocliInjectParentNPEFail() {
		CProject project = new CProject(new File(NAConstants.TEST_AMI_DIR, "battery10"));
		List<String> treeNames = Arrays.asList(new String[] {
				"PMC3776197",
				"PMC4062906",
				});

		String treeNamesString = String.join(" ", treeNames);
		String cmd = "-p " + project
				+ " -v"
				+ " --includetree " + String.join(" ", treeNamesString)
				+ " image";
		AMI.execute(cmd);
				
	}

fails with NPE because parent is not set in abstractAMITool.

	Generic values (AMIImageTool)
================================
input basename      null
input basename list null
cproject            /Users/pm286/workspace/cmdev/ami3/src/test/resources/org/contentmine/ami/battery10
ctree               
cTreeList           [src/test/resources/org/contentmine/ami/battery10/PMC3776197, src/test/resources/org/contentmine/ami/battery10/PMC4062906]
excludeBase         null
excludeTrees        null
forceMake           false
includeBase         null
includeTrees        2 [PMC3776197, PMC4062906]
log4j               
verbose             1

Specific values (AMIImageTool)
================================
minHeight           100
minWidth            100
smalldir            small
monochromeDir       monochrome
duplicateDir        duplicate
borders             null
binarize            null
despeckle           false
erodeDilate         false
maxheight           1000
maxwidth            1000
posterize           false
priority            RAW
rotate              null
scalefactor         null
sharpen             none
template            null
threshold           null


Generic values (AMIFilterTool)
================================
java.lang.NullPointerException
	at org.contentmine.ami.tools.AbstractAMITool.getCProjectDirectory(AbstractAMITool.java:427)
	at org.contentmine.ami.tools.AbstractAMITool.validateCProject(AbstractAMITool.java:247)
	at org.contentmine.ami.tools.AbstractAMITool.parseGenerics(AbstractAMITool.java:219)
	at org.contentmine.ami.tools.AbstractAMITool.runCommands(AbstractAMITool.java:196)
	at org.contentmine.ami.tools.AMIImageTool.runPrevious(AMIImageTool.java:373)
	at org.contentmine.ami.tools.AbstractAMITool.runCommands(AbstractAMITool.java:204)
	at org.contentmine.ami.tools.AbstractAMITool.call(AbstractAMITool.java:186)
	at org.contentmine.ami.tools.AbstractAMITool.call(AbstractAMITool.java:1)
	at picocli.CommandLine.executeUserObject(CommandLine.java:1783)
	at picocli.CommandLine.access$900(CommandLine.java:145)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2150)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2144)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2108)
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:1975)
	at picocli.CommandLine.execute(CommandLine.java:1904)
	at org.contentmine.ami.tools.AMI.execute(AMI.java:147)
	at org.contentmine.ami.tools.AMI.execute(AMI.java:144)
	at org.contentmine.cproject.files.CProjectTest.testPicocliInjectParentNPEFail(CProjectTest.java:675)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:89)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:41)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:541)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:763)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:463)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:209)

Note that it fails in parseGenerics() which is before any main business logic.

(It also causes the NPE reported in the previous distinct issue)

Image bounds displaced from real position

Hello,

The previous request was because I wanted to use that ami3 function, because mine (based also on the one of ami3) with some books adds an offset to position of images.
But I have checked, linking from my library to ami3 and using ami3 getBoundingRect().
That function gives also the displaced location on images of some books.

You can check with that pdf example:
https://www.frojasg1.com/Test-1.pdf

It gives an offset to the position of the image at the bottom.

Improve ami deployment pipeline

With #20 the ami3 project now has continuous integration: the project is built and the tests are run every time someone pushes changes to master. This ticket is about improving this further.

Note that this ticket is work in progress - some of it is just me thinking out loud.

Current Deployment Mechanism

The current mechanism for doing an ami3 release is to build a binary and use the ami-jars.sh script to publish the ami jar to a separate petermr/ami-jars project.

This involves manually editing the VERSION in the ami-jars.sh script before running it. It also means there is a separate ami-jars project that contains all past binaries.

For users to get an ami binary, they need to clone the ami-jars project (including all older versions) before they can use the ami version they need.

Goals

Publish ami jars (potentially including SNAPSHOT jars) to a location where users can get it simply by updating the version number of the ami dependency in their project pom. This means making our ami jar available on Maven Central, Bintray JCenter or GitHub Packages.

Ideally, we want to make this process fully automatic, where the only manual steps are tagging a commit with the release version string and running a command like mvn deploy or mvn publish.

Note on version management:

Ideally, this tag is the only place where we provide the version number, but this may not be feasible; we may need to put this version string in a file as well, so that it can be displayed when users request version information with ami --version and we also want to use this version string when building the ami jar (so the build produces a ami-0.1-20200406.jar file).

Required Changes

The ami3 project now has two CI jobs in GitHub actions (see #20):

one ("Java CI with Maven", defined in .github/workflows/maven.yml) that just does the build, runs the test and creates a package (jar)
another one ("Maven Package", defined in .github/workflows/mavenpublish.yml) that "build a package using Maven and then publish it to GitHub packages when a release is created" (details)

Not quite sure yet, but perhaps the latter can do what we want, if we make some minor changes.

Maven settings config for authentication
Configuring a distributionManagement section in the project pom.xml

Alternatively, we can deploy from our local workstation by running mvn deploy (requires similar authentication and distributionManagement setup).

Proposed New Release Workflow

The new release workflow will likely be something like this:

update version.properties in the root of the project (this file does not exist currently, we will reference it in the pom.xml to get the jar version number, and we will include it in the ami.jar for printing version info with ami --version)
commit the change, tag this commit with the same version string, and push this commit
go to the ami project page Releases tab and create and publish a release for this tag.
- this will trigger the "Maven Package" CI job set up in GitHub Actions, and should result in the binaries being made available in the GitHub Packages package hosting service. This may allow us to publish and install packages in GitHub Packages without needing to store and manage a personal access token.
- alternatively, we can deploy from our local workstation by running mvn deploy

I will need to experiment a bit. Once I worked out all the steps for the new release workflow, I will update BUILDING.md.

Improve ami packaging

Investigate the jpackage tool that is included in Java 14.

From the documentation:

The jpackage tool packages a Java application into a platform-specific package that includes all of the necessary dependencies. The application may be provided as a collection of ordinary JAR files or as a collection of modules. The supported platform-specific package formats are:

Linux: deb and rpm

macOS: pkg and dmg

Windows: msi and exe

It may be possible to add this to the Maven build, with plugins like jlink-jpackager-maven-plugin.

Streamline ami logging

Summary

AMI currently uses a combination of technologies for logging, including Log4j-1.2.16, System.out/err.println and a combination of the two. Below details a number of ways on how the design and implementation can be simplified.

1. User-specified verbosity

Application classes that are aware of the --verbose option use the user-specified verbosity level to control whether to show certain messages or not.

Problem

This requires application classes to be aware of the existence of, and follow the convention of, the AbstractAMITool::verbosity() method. Unfortunately this mechanism is not easy to use for application classes that do not extend or have a reference to AbstractAMITool.

Solution

One idea is to configure Log4j at startup time based on the user-specified --verbose value. Application classes could then simply use the Log4j API to log messages, without checking log levels. The picocli wiki has an example of configuring logging with a --verbose option.

2. Logging vs. console messages

Problem

Logging and showing a message to the user are two different things.

However, it is desirable to have a single mechanism in the application to accomplish both. That way, the application does not need to make two calls with the same message to ensure the message is both shown to the user and is written to the log. At the same time, we don't always want to do both.

Solution

One idea is to configure log4j with a FileAppender as well as a ConsoleAppender. The ConsoleAppender could be configured so that it does not display a timestamp, log level or the logging class, and only shows the log message. The threshold of the ConsoleAppender could be modified at startup based on the --verbosity option (see previous section).

Additionally, it may be nice to have a mechanism for command classes to have a way to show messages to the user regardless of the --verbosity level. (Note to self: Hm... Why would this be necessary?) One idea to accomplish this is to use "markers" (supported by SLF4J and Log4j2, but not by Log4j-1.2) or an urgent log level like FATAL (or a custom log level in Log4j2).

Drawback

The separate FileAppender works well when running locally but may not be useful when running in a CI (Continuous Integration) environment where only console logs are recorded and accessible via the provider's web interface. This may need more thought.

3. Interrupting execution

If an application class calls AbstractAMITool::addLoggingLevel with a log level of WARN or ERROR during invocations of printGenericHeader, parseGenerics, printSpecificHeader or parseSpecifics, this results in the runCommands method to abort and not invoke runPrevious, runGenerics(), and runSpecifics`.

This is currently used when some preconditions are not met: missing required arguments, or specified arguments point to invalid directories or files, etc.

Problem

Unconventional, it is not intuitively obvious to future maintainers that a call to addLoggingLevel will impact future processing.

Solution

The standard mechanism in Java to abort execution is to throw an exception. Investigate whether calls to addLoggingLevel cannot be replaced with throwing an exception.

Assuming the entry point was a call to picocli's CommandLine.execute, we are already using picocli's exception handling mechanism. A ParameterException will result in the error message being shown together with a short usage message. Any other exception will result in a stack trace being shown.

Improve `ami clean`

The AMICleanTool was not working correctly.
@petermr identified the cause: there was missing deletion logic.
This was addressed by adding --fileglob and --fileregex options.

During discussion, we found that the ideal way to invoke this tool would be like this:

ami -p some/dir clean foo.html bar/*.xml plugh/**/*.txt log*data/text.xml

That is:

no need for options like --fileglob
globbing by default
accept multiple positional parameters as globbing patterns

I will proceed to change the AMICleanTool implementation remove the existing options and accept only positional parameters. At least one positional parameter must be specified, or an error "missing parameter" is shown to the user.

Set up continuous integration

Set up continuous integration to run the build (and the tests) automatically on every commit.
This is especially useful because there are many slow-running tests.

Candidates:

GitHub Actions
Travis
AppVeyor

Configure `picocli-codegen` for compile-time validation of picocli annotations

@petermr indicated in the openVirus ami chat channel that having validation on the picocli annotations would be helpful.

By configuring the picocli-codegen module as annotation processor, we can get some compile-time validation of the picocli annotations. It is not perfect and does not contain all checks that are done at runtime, but it is still better than nothing.

Provide ability to specify GROBID installation directory

From Slack conversation:

Andy Jackson (...)
GROBID is currently a bit of a pain and has to be in {users home folder}/workspace/grobid to actually work, which I couldn't do in the Dockerfile for reasons related to how MyBinder works.

Peter Murray-Rust
Agreed. That's because GROBID didn't have an installer, I think. If we create a distro we can probably copy GROBID into it??

Andy Jackson (...)
The one thing that would make it easier is being able to point ami at a user-defined folder containing whatever version of GROBID.

Peter Murray-Rust
There's a wider issue of where AMI finds its resources. Will post.

Bugs

Some current bugs

bug list

These can be seen on page 2 of /ami3/src/test/resources/org/contentmine/ami/omar/test/lichtenburg19a/svg/fulltext-page.1.svg

The current release has several serious bugs since migrating to PDFBox2.0 . This include:

stroke not being added to path
font-size defaulting to 1.0
spaces at end of strings - causes problems with display if arrays are used
spaces in middle of text string instead of splitting into phrases
maths equations not properly displayed
subscripts at wrong level
final text line on LHS has wrong y-coordinate
some graphics are inverted in the page (i.e. y => 800-y) often mirroring the correct half. (page6/10)
codepoints not reported for CMEX10 and CMSY10 - appear as "?"

Documentation: create man pages for ami commands

Background

The ami toolset is used in the https://github.com/petermr/openVirus project to convert scientific papers from PDF and other formats into machine-readable and searchable formats. The toolset is very large, being the result of many years of work; we want to make it more accessible and lower the learning curve for collaborators.

Proposed Change

I propose that we add a step to the Maven build to generate man pages (in unix man page format and HTML). Picocli can auto-generate AsciiDoc pages for all picocli commands; these pages can then be converted to various other formats with the AsciiDoctor tool.

TBD: should the generated HTML pages be hosted somewhere in the ami3 project for easy reference?

Benefits

Make it easier for ami users to find commands that meet their needs, and how to use these commands.

Man page documentation can be generated automatically for all commands.

Drawbacks

Adds dependencies to the project and complicates the build.

short commands in `picocli` not working when concatenated

Create correct SVG from ML papers

PMR is currently debugging the PDF2SVG conversion. An alternative (a) for a small number of papers is to use an online servjce such as https://cloudconvert.com/pdf-to-svg and convert the files manually (b) find another OpenSource package.

The cloudconvert acts as a reference for the debugging.

Caveat. The SVG produced may need come transformations to nnormalize the coordinates (e.g. to screen units)

cloudconvert

Creates codepoint-oriented output, but includes some transformation matrices. Not open source.
Not a long-term solution.
(Could write a compacter routine to make the SVG more tractable).

pdf2svg

http://www.cityinthesky.co.uk/opensource/pdf2svg/ creates transformed paths but creates characters as paths not codepoints. The paths could be a useful reference.

PDFBox-AMI

A little while to go. Not picking up graphics state at right places. Might combine with pdf2svg (very messy)

Null pointer when invoking ami

I have just downloaded the latest master branch and compiled.
After doing that, I invoked ami getting that error:

$ ./appassembler/bin/ami -p ../../../_new/tmp/projects/chess/test.with.all.pdfs/ makeproject --rawfiletypes pdf

Generic values (AMIMakeProjectTool)
================================
java.lang.NullPointerException
		at org.contentmine.ami.tools.AMIMakeProjectTool.validateRawFormats(AMIMakeProjectTool.java:153)
		at org.contentmine.ami.tools.AbstractAMITool.parseGenerics(AbstractAMITool.java:223)
		at org.contentmine.ami.tools.AbstractAMITool.runCommands(AbstractAMITool.java:198)
		at org.contentmine.ami.tools.AbstractAMITool.call(AbstractAMITool.java:188)
		at org.contentmine.ami.tools.AbstractAMITool.call(AbstractAMITool.java:33)
		at picocli.CommandLine.executeUserObject(CommandLine.java:1783)
		at picocli.CommandLine.access$900(CommandLine.java:145)
		at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2150)
		at picocli.CommandLine$RunLast.handle(CommandLine.java:2144)
		at picocli.CommandLine$RunLast.handle(CommandLine.java:2108)
		at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:1975)
		at picocli.CommandLine.execute(CommandLine.java:1904)
		at org.contentmine.ami.tools.AMI.main(AMI.java:102)

Disable `ami getpapers` in favor or contentmine/getpapers

Disable the ami getpapers subcommand.
Users should use contentmine/getpapers instead.

@petermr I assume that this is because the contentmine/getpapers tool is more mature, and the ami getpapers tool is still work in progress?

Separate slow tests from fast tests

When running the maven build with mvn clean package, the tests to run for several hours and at some point seem to get stuck.

It would make sense to me to separate fast tests ("unit tests") from slow-running tests ("integration", "acceptance" or "system" tests).

On the one hand, developers need fast feedback to make sure their changes did not break anything, and on the other hand we need a set of thorough tests that require a complete system setup.

There are several ways to accomplish this with Maven:

class or package naming conventions
separate module for slow-running tests
JUnit Categories

Optionally, we can use the Maven failsafe plugin for integration tests. With the failsafe plugin enabled, we can run the following:

mvn test (will run only the basic unit tests, and will stop the build if any of them fails),
mvn integration-test (will run integration tests, and will not stop the build if any of them fails), and
mvn verify (will stop the build if an integration test fails).

Resources:

The main problem will be going through the existing tests and filtering the slow ones from the fast ones. Perhaps we can automate this from the test log.

copy tests from `normami`

normami had a huge number of tests, often including large PDFs in test/resource. Total size > 4 GB. So the tests have all been removed and we will gradually introduce them. The tests are, of course, all still there in normami. It may even be possible to link to them.
Tasks:
add regression tests (there should be one test per class that tests that the main function runs.
add tutorial tests that demo the functionality.

test plot extraction routines and document the useful ones

The package org.contentmine.graphics.svg.plot contains many plot extraction tests. Run them, list failures and errors in Issues and document briefly those useful for data extraction. @OmarBani

Consolidate ami regex tools

This is a follow-up item from discussion on #15:

@petermr I noticed there are two classes named AMIRegexTool and they both extend AbstractAMISearchTool. Which one should be the subcommand for ami? Or do you want both?

I picked org.contentmine.ami.tools.AMIRegexTool but it looks like that was the wrong one...

Note that org.contentmine.ami.plugins.regex.RegexPlugin is still available as a top-level command with a separate ami-regex launcher script. I can make that one the subcommand for ami if you want, but then what to do with org.contentmine.ami.tools.AMIRegexTool? (That one did not have a launcher script so perhaps you don't care too much about that one...)

Peter's reply:

What happened was a primitive pre-picocli command line which
supported something I called Plugins (they weren't actually Plugins as the
links were hardcoded but they were designed to be if and when I worked out
how! I even looked at OSGI at one stage).
AMIRegex is currently "broken" - i.e. it isn't linked in, but it should be.
It would be great to have the following:

AMIRegex which is very useful for lots of things, such as identifiers.
(There's a separate AMIIdentifier which is just a specialisation of regex.
AMISpecies which uses style (...) to detect a possible species and
then regex ([A-Z][a-z]+\s+[a-z]*) to pick up Tyrannosaurus rex . (The
regex is more complex but ..). There's also more lookup logic - not just
lexical.
Gene, Sequence (biological), are also useful.
and
word frequencies also suffer from this mess.

If you look at org.contentmine.ami.plugins you can see these all had
pre-picocli commands (you can see why Picocli saved the project!!)

I have forgotten exactly how the commands linked in, but It should be
relatively easy to reconstruct a prototype. I started doing this but stuck
about halfway through (I think when I broke the leg). Somewhere in there I
have used a Bloom filter for rapid searching (I think it's still linked in).

But it may be that for general word searching and frequency it's better to
use Lucene/Solr and write results back into the tree.

migrate PDFRenderer and PageDrawer from `pdfbox` to `ami`

migrate PDFRendering and PageDrawer to ami

The examples in pdfbox are the basis for extraction of graphic primitives from the PDFStream and demonstrate subclassing. This more-or-less worked with PDFBox1 but the transition to PDFBox 2 has only partially worked. see Issue #7

Almost certainly because I rushed it.

This is now hopefully a careful, documented re-engineering.

Investigate memory leak in `ami pdf`

This is a follow-up item from discussion on #15:

We should mend ami-pdf . It's got a memory leak and crashes/hangs after
~~100 documents. One idea is to put a loop in:

// set chunk size 50; CTree chunk doesn't yet exist
for (CTreeChunk chunk : cProject.getCTreeList()) {
    for (CTree cTree : chunk) {
        process(cTree)
    }
}

process(cTree) itself iterates over chunk of pages. One page might have
200,000 vectors so they have to be written as we go. Big documents can
cause problems and it's not easy to spot them in advance.

ami-pdf has a certain amount of make. If it sees pdfimages/ or svg/ it will
skip. Ideally this should be definable by the user but that's another day.

Default values not working in `ami` commands

Isolation of function to be used in other projects easily

Hello,
I have a request:

I have located the function that extracts the bounds of an image in the pdf:
The function is that:

AbstractPageParser -> getBoundingRect()

I have tried to isolate it in one single class, so I can use it in a project of mine without the need to add the whole dependency of ami3 to my project.
But I have found many dependencies, I finally I was not able to isolate it.

Would it be easy to isolate it?
Would you do that for me if it is not difficult?

Thank you very much!

Missing resources causing tests to fail

It looks like a lot of the tests fail because the resource files they rely on are not present in the project under src/test/resources. If these resources are no longer available these tests can/should be removed. If they do exist but are too large, perhaps they can be made available online and the test be renamed to a (slow-running) IT test. Otherwise, if the resources exist and are small enough, can they be committed to the ami3 project under src/test/resources?

I will update this list as I find more occurrences. Last update: 2020-04-03T08:16+0900

Adding PDFs to exisiting project

How to add more PDFs to an existing project.

Null values in Options Chains

Methods such as org.contentmine.ami.tools.AbstractAMITool.getCProjectDirectory()

protected String getCProjectDirectory() {
	return parent.projectOrTreeOptions.cProjectOptions.cProjectDirectory;
}

can crash on nulls:

Generic values (AMIFilterTool)
================================
java.lang.NullPointerException
	at org.contentmine.ami.tools.AbstractAMITool.getCProjectDirectory(AbstractAMITool.java:427)
	at org.contentmine.ami.tools.AbstractAMITool.validateCProject(AbstractAMITool.java:247)
	at org.contentmine.ami.tools.AbstractAMITool.parseGenerics(AbstractAMITool.java:219)
...

it's difficult to know which of the objects in the chain are null.
(a) Is'nt there something in later Java Versions which helps detect where?
(b) should we recurse down trapping each level?
(c) is <Optional> useful here?

Add usage examples to ami commands usage help message

Follow-up item coming out of discussion on #15 :

Examples

There are not enough examples in the subcommands. In the @Command section of each class we should have some examples. These should be runnable by copy paste so newcomers can run them to help understand how they work.

Change pdfbox options --pdfimages and --svgpages

Looking at pdfbox, this command has two options (--pdfimages and --svgpages) defined with arity="0..1" so they optionally take a parameter true or false. The default is true, I assume that is why the parameters are necessary.

I propose we change these to "negatable" options without parameters. So, --[no-]pdfimages and --[no-]svgpages respectively. Users would specify --no-pdfimages to stop the command from outputting pdf images. The default is unchanged.

What is the current command for ami image?

What is the current syntax for running ami image? I tried:

Matthews-MBP-5:open-battery matthew$ ami -p liion image

Exception in thread "main" java.lang.NoSuchMethodError: 'picocli.CommandLine picocli.CommandLine.setUnmatchedOptionsAllowedAsOptionParameters(boolean)'
	at org.contentmine.ami.tools.AMI.createCommandLine(AMI.java:170)
	at org.contentmine.ami.tools.AMI.main(AMI.java:113)

Document how to create a release and publish a package

As a result of the work done for #24 and #51 we now have a way to create releases and publish packages.

Document:

the steps for creating releases and publishing packages
how this works under the hood (with links to relevant docs for further detail)

Bump picocli to 4.4.0

This version has some parser improvements that allow applications to be more strict and reject unknown options as option values ( a feature request @petermr made about a year ago finally landed).

Also has support for abbreviated commands and/or options if we want to use this in AMI.

upload latest jars

Some resource file names too long for Windows

There are 2 resource files with names that are very long.
These files cause issues in my Windows working environment.

Would it be possible to rename or remove them?

The files are:

src/test/resources/org/contentmine/ami/tools/download/redalyc/Building an Infrastructure to Support Researchers - An Interview with Redalyc's Arianna Becerril _ ORCID_files/css__BJ6Ou6QsBRtnFTmxaakamOIS8n4QswDP2XnnZ1sxtaM__NBuvkP6eInGIkb1aJvUHx5PX79XApuxBDkk_77W5tYk__W4wyxLgayc6FNI5lRMpGt5NrINyeZ7VVotqXDSmjiKw.css
src/test/resources/org/contentmine/ami/tools/download/redalyc/Building an Infrastructure to Support Researchers - An Interview with Redalyc's Arianna Becerril _ ORCID_files/css__srZBnwQ7-NfW9Wb3hrFIsa6jwF1ImQeTmCM-iG2pQ7A__zbwffEQw83GIj6DPf4tkLMnDLuKxIdVpNn_Syxakyo0__W4wyxLgayc6FNI5lRMpGt5NrINyeZ7VVotqXDSmjiKw.css

Manage ami version

Currently, ami --version does not show any useful output.
There is a need for better output to facilitate support and troubleshooting.

From the OpenVirus Slack:

Peter Murray-Rust 6:05 PM
@remko Popma I need to make ami --version give a meaningful response. I don't use major/minor releases, though perhaps we should have a release strategy. Are there any tools to help automate this. (Currently I refer to commits by their git-hash and jar files by their date).

Remko Popma 6:12 PM
We can include version information as part of the build. But the version number/string needs to come from somewhere (perhaps a version.properties file next to pom.xml ) and that is usually manually maintained - meaning the version number is bumped up when some meaningful increment in functionality was added. (edited)
6:13
In our case, since our users build from source, we may need to think about some variation of that... We don't want to automatically change this version with every build, because then ami users would see a different version with each rebuild... (edited)
6:15
One simple solution for now is to modify AMI.java and have @Command(name = "ami", version = "SOME NUMBER/STRING" - and update this as we add functionality. (NUMBER/STRING could be a date/timestamp). We can migrate to a version provider that reads from a version.properties file later. (edited)

Peter Murray-Rust 6:46 PM
Thanks @remko Popma.
The immediate reason was so that we could ask users what version they were running. So we can diagnose the --validate problems as only occurring before a certain version.

Publish GitHub Release from `release.bash` script

See if it is feasible/what is involved to publish a GitHub Release from the release.bash script.

That would make the release.bash script a true "one push" button that does the full release all the way up to publishing the distribution zip with ami binaries to GitHub Packages.

This likely involves making a curl call to the GitHub REST API.

Release Notes are an open issue here.

Now that we have more formal releases, I think it makes sense for each release to have release notes. This could be as simple as a summary and a list of issues fixed.

In the picocli project I track all changes in GitHub issues and group them in "Milestones" that correspond to releases. I have a separate RELEASE_NOTES.md file in the project where I write a post for each release. This is quite a lot of work, we can be less formal, but some release notes would be nice and we should think about where to store them.

Update Log4j from 1.2.x to 2.x

Migration from Log4j 1.2 to Log4j 2:
This will change in many files :

// current
import org.apache.log4j.Logger;
...
public static final Logger LOG = Logger.getLogger(DictionaryCreationTool.class);

becomes:

// after
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.LogManager;
...
public static final Logger LOG = LogManager.getLogger(DictionaryCreationTool.class);

so mostly a package change. The Logger API then has some new methods that we may use later on.

Introduce top-level `ami` command

Background

Proposed Change

I propose that we introduce a top-level ami command and make the existing commands subcommands of that top-level command.

Benefits

This presents newcomers with a single command (ami) instead of the 28 or so top-level commands that currently exist. Similar to git, the top-level command becomes the entry point from where users can browse the documentation and find commonly used subcommands.

This opens possibilities for grouping subcommands in the usage help, perhaps by workflow (commands that are commonly used together), or by the type of work they perform.

The top-level command can provide global options, like a directory for processing documents.

Also, it would let us leverage picocli's repeatable subcommands feature for running multiple commands sequentially in a single JVM (without starting separate processes).

Drawbacks

Potentially this would break existing scripts that rely on the ability to invoke ami-xxx tasks as top-level commands.

Potentially, the impact can be reduced by keeping the old command name as an alias, but the introduction of a global option on the parent command especially may introduce a dependency on the top-level command that could break existing scripts.

petermr / ami3 Goto Github PK

ami3's Introduction

petermr repositories

discussion lists

active repos

active Python projects:

basicTest

presentations

pygetpapers

notebook

docanalysis

wikidata

ami3's People

Contributors

Stargazers

Watchers

Forkers

ami3's Issues

Problem Description

Analysis

Solutions

Idea 1: move real shared options to the ami top-level command

Idea 2: use mixins instead of inheritance

Idea 3: custom help

Current Deployment Mechanism

Goals

Note on version management:

Required Changes

Proposed New Release Workflow

Summary

1. User-specified verbosity

Problem

Solution

2. Logging vs. console messages

Problem

Solution

Drawback

3. Interrupting execution

Problem

Solution

Some current bugs

bug list

Background

Proposed Change

Benefits

Drawbacks

cloudconvert

pdf2svg

PDFBox-AMI

migrate PDFRendering and PageDrawer to ami

Examples

Background

Proposed Change

Benefits

Drawbacks

Recommend Projects

Recommend Topics

Recommend Org

Idea 1: move real shared options to the `ami` top-level command