Coder Social home page Coder Social logo

seriate's Introduction

seriate

Optimal ordering of elements in a set given their distance matrix.

Travis build status Code coverage PyPi package status stability: stable Apache 2.0 license

example

OverviewHow To UseContributionsLicense

Overview

This is a Python implementation of Seriation algorithm. Seriation is an approach for ordering elements in a set so that the sum of the sequential pairwise distances is minimal. We state this task as a Travelling Salesman Problem (TSP) and leverage the powerful Google's or-tools to do heavy-lifting. Since TSP is NP-hard, it is not possible to calculate the precise solution for a big number of elements. However, the or-tools' heuristics work very well in practice, and they are used in e.g. Google Maps.

Any numpy.roll-ed result is equivalent.

How To Use

import numpy
from scipy.spatial.distance import pdist
from seriate import seriate

elements = numpy.array([
    [3, 3, 3],
    [5, 5, 5],
    [4, 4, 4],
    [2, 2, 2],
    [1, 1, 1]
])

print(seriate(pdist(elements)))

# Output: [4, 3, 0, 2, 1]

The example above shows how we order 5 elements: [3, 3, 3], [5, 5, 5], [4, 4, 4], [2, 2, 2] and [1, 1, 1]. The result is expected:

  1. [1, 1, 1]
  2. [2, 2, 2]
  3. [3, 3, 3]
  4. [4, 4, 4]
  5. [5, 5, 5]

pdist from scipy.spatial.distance uses Euclidean (L2) dstance metric by default, so the distance between [x, x, x] and [x + 1, x + 1, x + 1] is constant: √3. Any other distance is bigger, so the optimal ordering is to list our elements in the increasing norm order.

Contributions

Contributions are very welcome and desired! Please follow the code of conduct and read the contribution guidelines.

License

Apache-2.0, see LICENSE.md.

seriate's People

Contributors

vmarkovtsev avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

seriate's Issues

Input to function seriate

Hi,

In the set of comments given in seritae.py, it is mentioned that
:param dists: Either a condensed pdist-like or a symmetric square distance matrix.

Does that mean a correlation matrix shouldn't be used as input? Should the correlation matrix be
converted to a distance matrix?

Segmentation error

When I try import seriate within a conda env, I am getting the error below:

Fatal Python error: Segmentation fault

Current thread 0x00007fec5a5a7740 (most recent call first):
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 1043 in create_module
File "<frozen importlib._bootstrap>", line 583 in module_from_spec
File "<frozen importlib._bootstrap>", line 670 in _load_unlocked
File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 983 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1035 in _handle_fromlist
File "/nas/longleaf/home/lobanov/anaconda3/envs/python7bucket/lib/python3.7/site-packages/google/protobuf/descriptor.py", line 47 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 728 in exec_module
File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 983 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1035 in _handle_fromlist
File "/nas/longleaf/home/lobanov/anaconda3/envs/python7bucket/lib/python3.7/site-packages/ortools/constraint_solver/routing_enums_pb2.py", line 6 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 728 in exec_module
File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 983 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1035 in _handle_fromlist
File "/proj/dschridelab/ddray/seriate/seriate.py", line 6 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 728 in exec_module
File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 983 in _find_and_load
File "<stdin>", line 1 in <module>
Segmentation fault (core dumped)

I meet all the requirements listed on the requirements.txt and was previously using the package about a week ago

Attribute error while trying to seriate data

Hi,
I am using the following code

import os
import numpy
import pickle
from seriate import seriate
from scipy.spatial.distance import pdist

def serialize_data(f_input):

    if os.path.exists(f_input):
        with open(f_input, "rb") as f:
            df = pickle.load(f)
            pprint(df.head())
            elements = df.values
            pprint(elements)
            print(seriate(pdist(elements)))


if __name__ == '__main__':
    f_input = #input_file,
    serialize_data(f_input)

I get the following error while trying to seriate data using this input file,

index = assignment.Value(routing.NextVar(index))
AttributeError: 'NoneType' object has no attribute 'Value'

I am not sure what's causing the error, I could use the same input and plot a heatmap.
Could you please look into this?

Errors and warnings

Just wondering aloud if these errors, which I believe ultimately come from seriate, could be obviated. I'll try to put together reproducible example but perhaps this is already known to you?

    E   TypeError: Descriptors cannot not be created directly.
    E   If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
    E   If you cannot immediately regenerate your protos, some other possible workarounds are:
    E    1. Downgrade the protobuf package to 3.20.x or lower.
    E    2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
    E   
    E   More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

ortools 8

Just as FYI this fails for ortools-8.0.8283 and above, seemingly

       File "/Users/petercotton/virtual-envs/parity/lib/python3.9/site-packages/seriate.py", line 35, in seriate
          routing = pywrapcp.RoutingModel(size + 1, 1, size)
        File "/Users/petercotton/virtual-envs/parity/lib/python3.9/site-packages/ortools/constraint_solver/pywrapcp.py", line 4714, in __init__
          _pywrapcp.RoutingModel_swiginit(self, _pywrapcp.new_RoutingModel(*args))
      TypeError: Wrong number or type of arguments for overloaded function 'new_RoutingModel'.
        Possible C/C++ prototypes are:
          operations_research::RoutingModel::RoutingModel(operations_research::RoutingIndexManager const &)
          operations_research::RoutingModel::RoutingModel(operations_research::RoutingIndexManager const &,operations_research::RoutingModelParameters const &)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.