Coder Social home page Coder Social logo

cicirello / javapermutationtools Goto Github PK

View Code? Open in Web Editor NEW
8.0 2.0 11.0 6.5 MB

A Java library for computation on permutations and sequences

Home Page: https://jpt.cicirello.org/

License: GNU General Public License v3.0

Java 100.00%
permutations sequences permutation-distance-metrics string-distance permutation-distance edit-distance

javapermutationtools's Introduction

JavaPermutationTools (JPT): A Java library for computation on permutations and sequences

JavaPermutationTools - A Java library for computation on permutations and sequences

Copyright (C) 2018-2024 Vincent A. Cicirello.

Website: https://jpt.cicirello.org/

API documentation: https://jpt.cicirello.org/api

Publications About the Library DOI
Packages and Releases Maven Central GitHub release (latest by date)
Build Status build docs CodeQL
JaCoCo Test Coverage coverage branch coverage
Security Snyk security score Snyk Known Vulnerabilities
DOI DOI
Other Information GitHub style
Support GitHub Sponsors Liberapay Ko-Fi

How to Cite

If you use this library in your research, please cite the following paper:

Cicirello, Vincent A (2018). JavaPermutationTools: A Java Library of Permutation Distance Metrics. Journal of Open Source Software, 3(31), 950. https://doi.org/10.21105/joss.00950 .

Overview

The JavaPermutationTools (JPT) library provides Java classes and interfaces, etc that enable representing and generating permutations and sequences, as well as performing computation on permutations and sequences. It includes implementations of a variety of permutation distance metrics as well as distance metrics on sequences (i.e., Strings, arrays, and other ordered data types).

Java 17+

We currently support Java 17+. See the following table for mapping between library version and minimum supported Java version.

version Java requirements
4.w.x to 5.y.z Java 17+
3.x.y Java 11+
1.x.y to 2.x.y Java 8+

The jar files of the library are released via Maven Central, GitHub Packages, and GitHub Releases.

Versioning Scheme

The JPT uses Semantic Versioning with version numbers of the form: MAJOR.MINOR.PATCH, where differences in MAJOR correspond to incompatible API changes, differences in MINOR correspond to introduction of backwards compatible new functionality, and PATCH corresponds to backwards compatible bug fixes.

Building the Library (with Maven)

The JavaPermutationTools library is built using Maven. The root of the repository contains a Maven pom.xml. To build the library, execute mvn package at the root of the repository, which will compile all classes, run all tests, run javadoc, and generate jar files of the library, the sources, and the javadocs. All build outputs will then be found in the directory target.

To include generation of a code coverage report during the build, execute mvn package -Pcoverage at the root of the repository to enable a Maven profile that executes JaCoCo during the test phase. The JaCoCo report will also be found in the target directory.

To run all static analysis tools (i.e., SpotBugs, Find Security Bugs, refactor-first), execute mvn package -Panalysis to enable a Maven profile that executes the various static analysis tools that we are using. The SpotBugs html report will be found in the target directory, or you can use the SpotBugs GUI with: mvn spotbugs:gui -Panalysis. The refactor-first report will be found in the target/site directory.

To run all of the above: mvn package -P "analysis,coverage".

Example Programs

There are several example programs available in a separate repository: cicirello/jpt-examples. The examples repository contains example usage of several of the classes of the library. Each of the examples contains detailed comments within the source code explaining the example. Running the examples without reading the source comments is not advised. Some of the example in the examples repository are based on the experiments from published papers that have either used the library directly, or which led to some of the code in the library.

Java Modules

This library provides a Java module, org.cicirello.jpt. To use in your project, add the following to your module-info.java:

module your.module.name.here {
	requires org.cicirello.jpt;
}

This module includes the org.cicirello.permutations and org.cicirello.sequences packages as well as their subpackages. See the API documentation for details of all packages included in this module.

Beginning with version 3.0.0, randomization and other math utilities, and some generic utilities, have been moved to a pair of new libraries ρμ and org.cicirello.core, which are now dependencies of JavaPermutationTools. Your dependency manager (see next section) will handle downloading these for you.

If you are directly utilizing the functionality of the dependencies, then you may instead need the following:

module your.module.name.here {
	requires org.cicirello.jpt;
	requires org.cicirello.rho_mu;
	requires org.cicirello.core;
}

Importing the Library from Maven Central

Add this to the dependencies section of your pom.xml, replacing the version number with the version you want to use.

<dependency>
  <groupId>org.cicirello</groupId>
  <artifactId>jpt</artifactId>
  <version>5.0.0</version>
</dependency>

Importing the Library from Github Packages

If you'd prefer to import from Github Packages, rather than Maven Central, then: (1) add the dependency as indicated in previous section above, and (2) add the following to the repositories section of your pom.xml:

<repository>
  <id>github</id>
  <name>GitHub cicirello Apache Maven Packages</name>
  <url>https://maven.pkg.github.com/cicirello/JavaPermutationTools</url>
  <releases><enabled>true</enabled></releases>
  <snapshots><enabled>true</enabled></snapshots>
</repository>

Downloading Jar Files

If you don't use a dependency manager that supports importing from Maven Central, or if you simply prefer to download manually, prebuilt jars are also attached to each GitHub Release. If you manually download jar files, make sure you also get the relevant versions of the dependencies. The simplest way to do this is to import from Maven Central, which will obtain the relevant dependencies automatically.

License

The JPT library is licensed under the GNU General Public License 3.0.

Contribute

If you would like to contribute in any way, such as reporting bugs, suggesting new functionality, or code contributions such as bug fixes or implementations of new functionality, then start by reading the contribution guidelines. This project has adopted the Contributor Covenant Code of Conduct.

javapermutationtools's People

Contributors

cicirello avatar dependabot[bot] avatar github-actions[bot] avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

javapermutationtools's Issues

Version of the Permutation.apply methods with validation

Summary

The Permutation.apply methods apply custom operators to the Permutation or a pair of Permutations depending upon which of these apply methods is called. They are clearly documented that it is the responsibility of the custom operators to ensure that the result is a valid permutation, with a warning that otherwise the permutation may become unstable.

Some users of the library may be more comfortable with an apply that also validates the result. Thus, this request is for a counterpart to each of the apply methods that validates the result, throwing an exception if it produces an invalid permutation.

Refactor SequenceDistanceMeasurer and SequenceDistanceMeasurerDouble into hierarchy

The SequenceDistanceMeasurer and SequenceDistanceMeasurerDouble interfaces are closely related. Any implementation of the former (distances computed as ints) have a natural equivalent of the latter (computed as doubles). This is why there is an abstract base class (with private access) that implements both and is the base class for several of the distance metrics currently in the library. The only thing that abstract base class does, however, is implement the distancef methods of SequenceDistanceMeasurerDouble by delegating to the distance methods of SequenceDistanceMeasurer. i.e., it is really just providing default implementations of the distancef methods.

This Issue proposes to have SequenceDistanceMeasurer extend SequenceDistanceMeasurerDouble, and to move all of the distancef implementations from the existing abstract base class into SequenceDistanceMeasurer as default implementations. Once this is done the abstract base class can be removed. This is a non-breaking change because the abstract base class is currently private so only the library is affected. The classes currently extending that abstract base class will instead simply implement SequenceDistanceMeasurer.

Fix ReversalDistance.max for general case

The max method of ReversalDistance is currently implemented for specific length permutation configured in constructor. Remove exception currently thrown in other cases and replace with actual max. Found old paper in SIAM Journal on Computing that proved max reversal distance between permutations of length n is n - 1. It had apparently been an open conjecture for some time previously.

Investigate publishing a jar-with-dependencies

Summary
Investigate feasibility of publishing a jar-with-dependencies to ease transition for those who manually download the library's jar file. This may not be feasible since we are now utilizing Java modules, which require no more than 1 module per jar file. A jar-with-dependencies will violate that, but should still work if used from class path rather than module path (which is likely what such a user is probably doing).

Feature Req: A method for creating a permutation cycle

Is your feature request related to a problem? Please describe.
This feature request is related to planned functionality in the Chips-n-Salsa library, which depends upon JavaPermutationTools for permutation representation.

Describe the solution you'd like
A method in the Permutation class that generates a permutation cycle from a set of indexes. A permutation cycle is a similar concept to a swap, but generalized to k elements. A possible method signature might be something like: cycle(int[] indexes).

Describe alternatives you've considered
A permutation cycle can be created by using a sequence of calls to the swap method. However, to create a k-cycle, you would need k-1 calls to swap, each swap call executes 3 assignments, for a total of 3k-3 assignments. A dedicated cycle method can accomplish the same thing with only k+1 assignments, cutting the number of assignments down to a third. But this would need to be within the Permutation class.

Update Sonatype Lift configuration

Summary

The configuration file for Sonatype Lift includes an unnecessary exclusion for the source code of the test cases. The default exclusions should cover that. Additionally, it includes some obsolete exclusions that were previously needed when the javadocs were served via GitHub Pages from the docs directory of the repository. Now that it is in a separate branch those exclusions are not needed.

BUG: Test cases for RandomIndexer

Describe the bug
Boundary case in RandomIndexer.nextWindowedIntTriple is not always covered in all runs of test cases. This is not strictly a bug as there is currently no reason to believe that the method itself is buggy.

To Reproduce
Steps to reproduce the behavior:

  1. Run the build.
  2. Examine JaCoCo test coverage report.
  3. Go back to step 1 above if full coverage.
  4. Exhibits itself approximately once every 2 runs of the test cases.

Expected behavior
Test cases should consistently provide full coverage.

Feature Req: Enhanced functionality in the Permutation.Mechanic class

Is your feature request related to a problem? Please describe.
Planned new mutation operators for Permutations that are planned for the Chips-n-Salsa library, which depends upon JavaPermutationTools, can be implemented more efficiently if the requested feature was added to the Permutation.Mechanic class.

Describe the solution you'd like
In the Permutation.Mechanic class, a method with the following signature (or something similar): protected final void set(Permutation p, int[] source, int fromIndex, int toIndex, int numElementsToSet). The behavior of this method copy numElementsToSet elements from source beginning at fromIndex into the Permutation p beginning at index toIndex.

Describe alternatives you've considered
This can be accomplished with several calls to the existing set methods, but can be done more efficiently internally within the Permutation.Mechanic class.

Move package org.cicirello.util to separate library

Summary
The org.cicirello.util package isn't strictly related to the core content of the JPT library. Other of our own projects will need this package soon, but don't actually need any of the permutation and sequence metrics or other relevant functionality. Thus, if kept where it is, we'll end up with weird dependencies.

Create another library, org.cicirello.core, where commonly required utilities can reside, and then include that library as a dependency of jpt. Remove org.cicirello.util from jpt once this is complete.

Move general math related classes to separate library

Summary
Package org.cicirello.math, and all of its subpackages, contain classes related to various math concepts (e.g., basic statistics, linear algebra, etc). None of it is actually needed or used (with the exception of the random number generation classes of package org.cicirello.math.rand) by the core functionality of the JPT library (i.e., by the permutation distance metrics, sequence distance metrics, etc). Most of this was included in this library in support of the sample programs.

There is another issue (#150) related to creating a library for the random number generation classes. Either include the rest of the org.cicirello.math hierarchy in that proposed library, or extract it to a different new library.

The benefit to a different new library is that JPT will need the random number lib as a dependency, but not the rest of it. The benefit to just rolling this into #150 is that there would be a single new library to maintain just with slightly broader scope (math rather than random number algs).

Either way this will be a breaking change, so schedule this with release 3.0.0. Less severe of a break if integrated into #150 since that lib will become a dependency of JPT 3.0.0.

Utilize Java 17's RandomGenerator interface

Summary
The Permutation class has multiple constructors that differ only in the specific source of randomness (one that takes a Random and another that takes a SplittableRandom). It also has multiple versions of methods with a similar structure. In Java 17, a new interface, RandomGenerator was introduced, which all of the random number generator classes implement. Duplicated methods and constructors that differ only in specific random number generator should be consolidated using this new interface. It will reduce redundant code. And it will also enable users of the library to use some of the newer random number generators that were added to Java. This depends on #192.

Fix broken image link in README

Summary
Repo banner image recently moved to gh-pages branch with documentation website. The README references it in its old location. Fix link to image.

Refactor AbstractPermutationDistanceMeasurer

Refactor AbstractPermutationDistanceMeasurer to move overridden interface methods of PermutationDistanceMeasurerDouble into PermutationDistanceMeasurer as default methods, and to move overriden intrerface methods of NormalizedPermutationDistanceMeasurerDouble into NormalizedPermutationDistanceMeasurer as default methods.

Move the classes of the org.cicirello.math.rand package to a separate library

Summary
The classes of the org.cicirello.math.rand package provide efficient random number generation to JPT. They are, however, of more general use than the JPT library, potentially to other projects that don't actually need the rest of JPT. It is awkward to depend on JPT for its random number generation related utility classes if you don't actually need the permutation and sequence related functionality.

Extract a separate library of random number related utilities, and then add that new library as a dependency.

weighted kendall tau distance

Implement weighted kendall tau distance as described in: "Failure proximity: a fault localization-based approach" (Liu and Han, SIGSOFT 2006, pages 46-56).

Transition to Java 11

Summary
Library currently supports Java 8+. Transition to Java 11 to gain access to newer language features.

Cycle Edit Distance

Summary

Add an implementation of Cycle Edit Distance as described in the article: Vincent A. Cicirello. Cycle Mutation: Evolving Permutations via Cycle Induction. Applied Sciences, 2022 (in press).

Documentation website: bug in amp cache

Google's amp cache doesn't seem to recognize &percnt; as the percent sign. Looks fine served from gh pages, but amp cache leaves character encoding as if verbatim text.

Revise docs.yml workflow to deploy to web on releases

Summary
Revise docs.yml workflow to deploy API docs changes to the web on releases and workflow dispatches, and NOT on pushes (just build docs to confirm they build on pushes). Push API changes directly to gh-pages rather than current PR. This depends on #162 .

Investigate and correct (if necessary) warnings from initial run of MuseDev

Describe the bug
We recently enabled MuseDev's static analysis tool on the repository. Investigate whether any of the identified issues are actual bugs, and correct if necessary. Additionally, make any code improvements motivated by the MuseDev report (e.g., eliminating any warnings generated by ErrorProne).

Cycle distance

Summary

Add an implementation of Cycle Distance as described in the article: Vincent A. Cicirello. Cycle Mutation: Evolving Permutations via Cycle Induction. Applied Sciences, 2022 (in press).

Refactor SequenceSampler to reduce redundancy

Summary

While editing the SequenceSampler class for other reasons, I discovered that there are several places where code is repeated, providing opportunities to refactor by extracting the common code to methods.

Update documentation for dependency changes and Java version changes

Summary
With the randomization and other math utilities moved to a separate library, and the util package likewise moved to a separate library, the documentation (website, repo README,md) should be updated to explicitly notify users of the JPT library. Users of dependency management tools shouldn't need to do much since their tool, such as mvn will just pull in the new libraries transitively. But anyone manually downloading jars might be surprised.

Must also clarify in docs that minimum supported Java is now 11.

Restructure directory hierarchy to maven default

Summary
We originally used ant as the build tool. When we switched to maven, we kept existing directory structure altering maven's directories via the pom.xml. Reorganize directories to use typical maven structure for consistency with other projects.

Feature Req: A non-contiguous scramble method in Permutation class

Describe the problem
This issue relates to a library that uses JPT as a dependency. It requires the ability to randomize the ordering of a non-contiguous set of permutation elements. The existing scramble methods either randomize the entire permutation, or randomize a contiguous set of elements (between a pair of indexes).

Describe the solution you'd like
A method in Permutation for randomizing a non-contiguous set of elements. Should have a parameter that indicates the indexes of the set of elements to reorder (e.g., an array of ints). For consistency with the existing scramble methods, should include 3 versions of method, one that has a Random parameter, one that has a SplittableRandom parameter, and one that uses a default source of randomness.

Describe alternatives you've considered
This can be done externally from the Permutation class with several calls to the swap method, but if done within the Permutation class can be done more efficiently.

k-Cycle distance

Summary

Add an implementation of k-Cycle Distance as described in the article: Vincent A. Cicirello. Cycle Mutation: Evolving Permutations via Cycle Induction. Applied Sciences, 2022 (in press).

Inefficiency in WeightedKendallTauDistance

Describe the bug
The way in which WeightedKendallTauDistance currently computes the weighted inversions within the modified mergesort potentially impacts worst case runtime. When an inversion is detected, it currently iterates over remaining left half to sum weights. This can lead to redundant work if the next element in right half is also part of a sequence of inversions. This can be done more efficiently if sum of left half weights is determined first, and modified as left half elements are chosen during the modified mergesort.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.