The Playground – My Optimized Solution

This is an informal description of my solution to an internal programming competition in my algorithms and data structures course. The competition has ended, but at the time of writing winners have not been announced. I hold second place based on test cases behind an implementation in Python, so this can’t be optimal, but not that bad either. It turns out the first place, which deserves to win, combined setting up the graph and running the flow algorithm. This way the flow network is constructed lazily and a lot of the work I spend on setting up the expensive A3-activities is saved.

The code is a single file and should built easily with both GCC and Clang, I use the following with varying degrees of optimization:

clang -g -O2 -Weverything -std=c11 playground.c -o playground

Unfortunately GitHub doesn’t support org files fully, so the references don’t work. I hope the it’s clear from context what is meant. If not, the document is also available as HTML, but with different styling.

This commit describes the solution I handed in. In this past week since that, I’ve made some improvements; it is this improved solution that I will describe below as it also has the benefit of being clearer.

1 The Problem

The original problem statement is available online, I will describe the problem here as well.

The program is given a list of platforms on standard input. I will use the input in listing text:platforms as a running example.

8
0 1 4 0 2 0
2 2 1 2 3 4
4 1 3 1 1 0
4 3 1 0 0 1
6 4 0 0 1 1
9 1 1 0 0 2
9 4 2 0 0 1
9 3 0 0 0 0

The first line is the number of platforms, $P$, the next $P$ lines are the platforms specified by six parameters: x- and y-coordinate and four capacities. The platforms are implicitly numbered by their position in the input and their positions are distinct. We’re given at least two platforms and at most 100.000. Both coordinates and capacities are between 0 and 100.000. The input is illustrated in figure fig:platforms.

Now the thing is that we need to turn this set of platforms into a playground, where you enter at platform 0 and leave from platform $P-1$. To do this we will build activities from one platform to another. There are four kinds of activities, and the four capacities specify how many of each of these can be build from a specific platform. A trail is a path via a unique set of activities from the start-platform to the end-platform and the task is to maximize the number of trails. The only output produced is the maximum number of trails; we don’t need to print how activities are placed etc.

The four different activities are (taken almost verbatim from the problem description):

NEWS^{:North, East, West, South} Slide (A1, red): Can be built from a platform A to any other platform with the same x- or y-coordinate as A. That is, any platform on the same horizontal or vertical line as A.
Human Cannon (A2, blue): Can be built from a platform A to the platform furthest away from A in euclidean distance. In case of a tie, the platform that appeared first in the input wins.
Platform Trampoline (A3, green): Can be built from a platform A to a platform B if there are at least 2 platforms on the line segment between A and B.
EOF^:End-Of-Fun Wormhole (A4, orange): Can be built from any platform directly to the end-platform.

These are illustrated (without capacities) one by one on figures fig:a1, fig:a2, fig:a3, fig:a4 and together on figure fig:a1234.

2 Modeling the Problem

We are going to solve this a maximum flow problem, and I’ll assume the reader is familiar with these.

If one can follow activities from the start-platform to the end-platform then this corresponds to flow from the source to the sink of the corresponding flow network. The maximum flow through this network will be the maximum number of trails. Actually figure fig:a1234 is exactly this flow network for the sample input, so the correspondence is very direct.

The problem with the graph in figure fig:a1234 is that it has vertex capacities instead of edge capacities that an off-the-shelf maximum flow algorithm expects. So we need to transform the graph a bit before we can use it.

The technique for converting vertex capacities into edge capacities is well known. The trick is to insert a dummy vertex as in figure fig:vertexcaps.

Any incoming edges will still go to the original vertex, but any outgoing will start at the new vertex instead. Now to get from the old vertex to the new vertex, the flow must cross an edge with the capacity of the old vertex. This is equivalent to the vertex having the capacity.

So it’s easy to convert a graph where vertices have a single capacity to one with edge capacities instead, but our graph has four vertex capacities. It turns out that we can just insert four dummy vertices instead of one and it works. Our platforms from figure fig:platforms now look like in figure fig:edgecaps. Notice how the original capacities can now be read off the edges instead of the vertices.

It turns out that because only one A2- and A4-activity per platform can be constructed, the dummy vertices for these are redundant: There will just be one edge to it and one from it with the same capacity. Therefore we only inserts dummmys for A1 and A3 as in fig:edgecaps2.

The numbering scheme is that the A1-dummy for a platform $x$ will be numbered $x+P$ and the A3-dummy numbered $x+2⋅ P$.

The final graph which we can run a standard flow algorithm on is given in figure fig:a1234all. The original vertices are black, as are their edges to the new vertices. The new vertices are colored by the activity whose capacity they match. Compare this to one of the first four graphs: All edges are still there, they just start at one of the new nodes, but still go to an original node. The outgoing edges of the new dummy vertices without labels can have unlimited capacity or the same capacity as the corresponding platform.

I’m using Edmonds-Karp to find the actual maximum flow, a possible solution with maximum flow 5 can be seen on figure fig:sol. One trail consists of going from 0 to 4 with an A3-activity, of which 2 can be built from 0, and then from 4 to 7 with an A4-activity. Another trail uses A1-activities to go from 0 to 5 to 7, and so on.

3 Setting up the Graph

Now we know how to model the problem as something we can solve with an off-the-shelf algorithm, so we just need to set up this flow network efficiently.

I will simply discuss constructing an activity between two platforms, so just remember that this actually means from the proper dummy vertex of the first platform to the other platform as described in the previous section.

3.1 NEWS Slide (A1, red)

We need to construct edges between all platforms sharing either an x- or y-coordinate.

By sorting the platforms twice, by x- and y-coordinate separately, all platforms on the same vertical or horizontal line, respectively, will be placed next to each other. Then we can iterate through the array and construct the activities in $O(P^2)$ time worst-case, but $Ω(P)$ in the best when no platforms share a coordinate (excluding the sorting).

We can do this with the following algorithm assuming sorted platforms in the array ps with size P.

for i = 0..P:
    p = ps[i];
    for j = i+1..P while ps[j].x == p.x:
        insert edge from from p to ps[j] and from ps[j] to p

So first we sort by y and run the above for the vertical activities, then by x (with y breaking ties for later) and run the above for the horizontal ones. We sort twice, but need the sorted array for both A2- and A3-activities, so the work is well spent. We could use a hash map with direct addressing for a better best case, but that would involve allocating memory for all the platforms again. Quicksort sorts in-place, which I think is nicer.

3.2 Human Cannon (A2, blue)

Because the platform furthest away from any other will always be part of the convex hull, it’s enough to only check the platforms that are part of the convex hull as candidates for the A2-activity. This is faster since the amount of platforms in the convex hull is likely to be significantly smaller than $P$. Of course the worst case is still $O(P^2)$ if the platforms are all part of the convex hull.

Constructing the convex hull can be done in linear time given sorted points and we can take advantage of this, because the points were sorted to solve A1-activities.

Then for each platform we see which platform on the convex hull is furthest away and construct an A2-activity to that.

As a detail $\sqrt{x}> \sqrt{y}$ implies $x> y$, so we can save the square root and compare manhattan instead of euclidean distances.

3.3 Platform Trampoline (A3, green)

The algorithm for constructing these activities is actually quite simple, but it took me a while to figure it out. A partial run is animated on figure fig:slopes.

What we want to do is consider each platform in order. Then when considering a platform, we want to consider a different one and as efficiently as possible, determine whether it is legal to make an A3-activity between the two.

The trick is to look at the platforms, not in the order they’re given, but from left to right, and from the bottom up. The order we sorted them in previously. And furthermore to only look at platforms to the right and up, when already considering one. This guarantees that we look at platforms on the same line in order by their distance. That is, when multiple platforms lie on the same line given an origin platform, we will see the closest one first, then the second-closest etc.

The algorithm then becomes

(Sort the platforms by x- then y-coordinate (left to right, bottom up))
Initialize an empty hashtable
For each platform p
- For each platform q ahead of p in the sorted order
  - Calculate the slope between p and q
  - Look up the slope in our hash table
  - If the value associated with the slope is 2 (or more), draw an A3-activity from $p$ to $q$ and from q to p
  - Otherwise, increase the count
- Clear the hash table

We have to construct the activity in both directions because we only look ahead in the sorted order.

Looking at figure fig:slopes, the count associated with each slope is noted next to its blue line. Platforms are marked red when visited but no activity is built and green when one is. We see that the only platform to which an A3-activity can be built from 0 is 4 as expected.

With expected constant time lookup in the hash table, this runs in $Θ(P^2)$ time. That’s the best upper bound we can hope for, as there might be upwards of $P^2$ legal A3-activities; consider the case where all platforms lie on a single line. Unfortunately this is also the lower bound of the algorithm: we always spend $O(P^2)$ time, even if no A3-activities can be constructed. I would love to solve this with a lower bound of $Ω(Plog P)$ or something instead, but can’t figure out how.

This is where my competitors method of constructing these lazily really matters.

3.4 EOF Wormhole (A4, orange)

Here we just construct an A4-activity from each platform to the last one. Also we insert the dummy vertices while we’re iterating through the platforms anyway.

This of course takes linear time in the number of platforms.

4 Annotated Code

After setting up the graph as described above, it really is just a matter of running Edmonds-Karp or another maximum flow algorithm. I won’t go into details with that, instead I have annotated the source code below, so the above discussion becomes a bit more concrete.

I’ve chosen to include the entirety of the code, just under 600 lines, so feel free to skip a section or two.

Because the code has been split up by my commentary, the indentation is somewhat lost; I hope it’s readable regardless. If not, the entire source file is included with the repository.

4.1 Includes

CodeJudge is the online system used, among other things, to test the submissions. First I disable assertions when running on CodeJudge for performance. Also, it’s a Linux box so the time header has a different path than on my Mac. This checking should really be more robust (ie. using __APPLE__) but it doesn’t really matter.

#ifdef CODEJUDGE
#define NDEBUG
#endif

#include <stdlib.h>
#include <stdbool.h>
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>
#include <math.h>
#include <stdio.h>

#ifdef CODEJUDGE
#include <time.h>
#else
#include <sys/time.h>
#endif

The definitions are, in order: the maximum number of characters on a given line of input rounded to a nice number, when to switch from quicksort to insertion sort, and how many vertices are in the flow network per platform (see above).

Finally there’s a macro for calculating the difference in milliseconds between two timeval s. This is used for timing different parts of the code.

#define MAX_LINE 42
#define SORT_CUTOFF 16
#define VERTEX_FACTOR 3

#define MS(S, T) (T.tv_sec - S.tv_sec) * 1000.0 + (T.tv_usec - S.tv_usec) / 1000.0

4.2 Structs

What’s nice about C is that data and functionality is separated. These five types will be used throughout the rest of the program.

One thing to note is the bit-flag for an Edge to see if it points forwards or backwards. We can afford this, since capacity is at most 100.000 by the problem description which fits easily in 31 bits and I’ve found this to be the easiest way to represent the residual flow network.

Also note that I’m using adjacency tables for my graph representation instead of linked lists, to improve cache performance. It made a surprising difference for the breadth-first search.

x, y for a Platform is int32_t even though they’re never negative, to save a cast later.

Some of these members could actually be marked const according to both Clang and myself, but then GCC won’t compile it…

/*
 * STRUCTS
 */

typedef struct Edge {
  uint32_t from;
  uint32_t to;
  uint32_t flow;
  uint32_t capacity : 31;
  bool forwards : 1;
} Edge;

typedef struct Vertex {
  Edge * parent_edge;
  Edge * edge_list;
  size_t capacity;
  size_t size;
} Vertex;

typedef struct {
  Vertex * vertices;
  size_t const size;
} Graph;

typedef struct {
  uint32_t * const data;
  size_t head;
  size_t tail;
  size_t const capacity;
} Queue;

typedef struct {
  int32_t x, y;
  uint32_t n, a1, a2, a3, a4;
} Platform;

4.3 Prototypes

I like to mark as many things const as possible. That way, I opt in to mutation and get an error if I change anything accidentally.

The _alloc functions ended up taking pointers to pre-allocated memory instead of allocating themselves. This makes it easier to control the allocation and I get to share some of it between the graph setup and flow algorithm, but makes the name a bit odd.

LESS is an abbreviation for a pointer to a function comparing two platforms and specifying which should come first in the sorted order. Quicksort and insertion sort take this as a parameter, so we can sort by both x and y with the same function definitions.

/*
 * PROTOTYPES
 */

typedef bool (*LESS)(Platform, Platform);

Graph graph_alloc(Vertex * const vertices, size_t const V);
void graph_free(Graph * const G);
void insert_edge(Graph * const G, uint32_t const from, uint32_t const to, uint32_t const capacity, bool const forwards);

Queue queue_alloc(uint32_t * const data, size_t const capacity);
void enqueue(Queue * const Q, uint32_t const x);
uint32_t dequeue(Queue * const Q);
bool queue_is_empty(Queue const * const Q);
void queue_clear(Queue * const Q);

uint32_t min(uint32_t const a, uint32_t const b);
uint32_t max(uint32_t const a, uint32_t const b);

void insert_flow_edge(Graph * const  G, uint32_t const from, uint32_t const, uint32_t const capacity);
uint32_t edmonds_karp(Graph * const G, uint32_t const source, uint32_t const sink, uint32_t * const queue_data, uint8_t * const marked, uint32_t * const caps);

bool lessxy(Platform const a, Platform const b);
bool lessy(Platform const a, Platform const b);
void swap_platform(Platform * const a, Platform * const b);
void quicksort(Platform * const xs, int const lo, int const hi, LESS less);
int partition(Platform * const xs, int const lo, int const hi, LESS less);
void insertion_sort(Platform * const xs, size_t const len, LESS less);
void sort(Platform * const ps, size_t const P, LESS less);

uint32_t next_prime(uint32_t const a);
bool update_slope_count(uint32_t * const slopes, uint8_t * const counts, uint32_t const slopes_len, uint32_t const key);

int ccw(Platform * p1, Platform * p2, Platform * p3);
void convex_hull(Platform * points, size_t npoints, Platform ** const out_hull, size_t * out_hullsize);

4.4 Graph

The initial capacity for the adjacency tables is 32. This is found experimentally to be the fastest and doesn’t seem excessive in terms of memory use.

Inserting an edge is really easy with realloc, insert it, check if the size is equal to the capacity and if so double the capacity and reallocate.

/*
 * GRAPH
 */

Graph graph_alloc(Vertex * const vertices, size_t const V) {
  for (size_t i = 0; i < V; i++) {
    Edge * const edge_list = malloc(32 * sizeof *edge_list);
    vertices[i] = (Vertex) {.edge_list = edge_list, .size = 0, .capacity = 32, .parent_edge = NULL};
  }

  return (Graph) {.vertices = vertices, .size = V};
}

void graph_free(Graph * const G) {
  assert(G != NULL);
  for (size_t i = 0; i < G->size; i++)
    free(G->vertices[i].edge_list);
}

void insert_edge(Graph * const G, uint32_t const from, uint32_t const to, uint32_t capacity, bool forwards) {
  assert(G != NULL);
  Vertex * v = &G->vertices[from];

  v->edge_list[v->size++] = (Edge) {.from = from, .to = to, .flow = 0, .capacity = capacity, .forwards = forwards};

  if (v->size == v->capacity) {
    v->capacity *= 2;
    v->edge_list = realloc(v->edge_list, v->capacity * sizeof *v->edge_list);
  }
}

4.5 Queue

This queue is taken straight from CLRS. It’s used for the breadth-first search in Edmonds-Karp where we know the upper bound, 3P, of vertices enqueued and we know this doesn’t get too large. So we just pre-allocate a big enough chunk of memory and keep two indices into it: one for the head and one for the tail. Enqueuing and dequeuing is just a read/write and a modulo operation and more importantly, we can clear the queue in constant time by just setting these to 0.

/*
 * QUEUE
 */

Queue queue_alloc(uint32_t * const data, size_t const capacity) {
  return (Queue) {.capacity = capacity, .data = data, .head = 0, .tail = 0};
}

void enqueue(Queue * const Q, uint32_t const x) {
  assert(Q != NULL);
  assert(Q->head != (Q->tail+1) % Q->capacity);

  Q->data[Q->tail] = x;
  Q->tail = (Q->tail + 1) % Q->capacity;
}

uint32_t dequeue(Queue * const Q) {
  assert(Q != NULL);
  assert(!queue_is_empty(Q));

  uint32_t const x = Q->data[Q->head];
  Q->head = (Q->head+1) % Q->capacity;

  return x;
}

bool queue_is_empty(Queue const * const Q) {
  assert(Q != NULL);
  return Q->head == Q->tail;
}

void queue_clear(Queue * const Q) {
  Q->head = 0;
  Q->tail = 0;
}

4.6 Flow

Now the Edmonds-Karp algorithm. First two helper functions; it wouldn’t be C if you didn’t have to write everything yourself.

/*
 * FLOW
 */

uint32_t min(uint32_t const a, uint32_t const b) {
  if (a < b) return a; else return b;
}

uint32_t max(uint32_t const a, uint32_t const b) {
  if (a > b) return a; else return b;
}

Then the actual algorithm where I have chosen to inline the breadth-first search. We see how the forwards bit-flag is used to determine which way an edge goes, and thus whether we’re adding flow to it or letting some of the flow take a different route. Again, see Wikipedia for a description of the algorithm.

uint32_t edmonds_karp(Graph * const G, uint32_t const source, uint32_t const sink, uint32_t * const queue_data, uint8_t * const marked, uint32_t * const caps) {
  assert(G != NULL);

  Edge * head;
  Queue q = queue_alloc(queue_data, G->size);

  do {
    queue_clear(&q);
    enqueue(&q, source);
    caps[source] = UINT_MAX;
    marked[source] = 1;
    G->vertices[sink].parent_edge = NULL;

    while (!queue_is_empty(&q)) {
      uint32_t v = dequeue(&q);

      for (size_t i = 0; i < G->vertices[v].size; i++) {
        uint32_t u;
        uint32_t residual;
        Edge * cur = &G->vertices[v].edge_list[i];

        if (cur->forwards) {
          u = cur->to;
          residual = cur->capacity - cur->flow;
        } else {
          u = cur->from;
          residual = cur->flow;
        }

        if (residual > 0 && !marked[u]) {
          marked[u] = 1;
          G->vertices[u].parent_edge = cur;
          caps[u] = min(caps[v], residual);

          if (u == sink)
            goto done;

          enqueue(&q, u);
        }
      }
    }

  done:
    head = G->vertices[sink].parent_edge;

    while (head != NULL) {
      size_t idx;

      if (head->forwards) {
        head->flow += caps[sink];
        idx = head->from;
      } else {
        head->flow -= caps[sink];
        idx = head->to;
      }

      head = G->vertices[idx].parent_edge;
    }

    memset(marked, 0, G->size * sizeof *marked);

  } while (G->vertices[sink].parent_edge != NULL);

  int32_t sum = 0;
  for (size_t i = 0; i < G->vertices[source].size; i++) {
    Edge cur = G->vertices[source].edge_list[i];
    sum += cur.forwards ? cur.flow : -cur.flow;
  }

  return (uint32_t) sum;
}

Another little helper for adding edges in both directions and avoiding adding useless ones.

void insert_flow_edge(Graph * const  G, uint32_t const from, uint32_t const to, uint32_t const capacity) {
  assert(G != NULL);

  if (capacity == 0) return;

  insert_edge(G, from, to, capacity, true);
  insert_edge(G, to, from, capacity, false);
}

4.7 Sorting

I started out sorting a lot more than the one time I do now, so the sorting is optimized more than turned out to be necessary.

First we quicksort down to buckets of SORT_CUTOFF, 16, and then a single insertion sort is run over the entire array to put these buckets into order. The asymptotic running time is the same, but because insertion sort has lower constants, this is faster.

I’m using the Hoare partitioning scheme as described by Sedgewick and Wayne. The code is almost an exact replica of their implementation in Java.

We see how nicely the function pointer works.

/*
 * SORTING
 */

void swap_platform(Platform * a, Platform * b) {
  Platform const t = *a;
  *a = *b;
  *b = t;
}

bool lessxy(Platform const a, Platform const b) {
  return a.x < b.x || (a.x == b.x && a.y < b.y);
}

bool lessy(Platform const a, Platform const b) {
  return a.y < b.y;
}

int partition(Platform * const xs, int const lo, int const hi, LESS less) {
  int const idx = rand() % (hi-lo+1) + lo;
  swap_platform(xs+idx, xs+lo);

  int i = lo;
  int j = hi + 1;
  Platform const x = xs[lo];

  while (true) {
    while (less(xs[++i], x))
      if (i == hi) break;

    while (less(x, xs[--j]))
      if (j == lo) break;

    if (i >= j) break;

    swap_platform(xs+i, xs+j);
  }

  swap_platform(xs+lo, xs+j);

  return j;
}

void quicksort(Platform * const xs, int const lo, int const hi, LESS less) {
  if (hi - lo > SORT_CUTOFF) {
    int const p = partition(xs, lo, hi, less);
    quicksort(xs, lo, p-1, less);
    quicksort(xs, p+1, hi, less);
  }
}

void insertion_sort(Platform * const xs, size_t const len, LESS less) {
  for (size_t i = 1; i < len; i++) {
    Platform x = xs[i];
    size_t j = i;

    while(j > 0 && less(x, xs[j-1])) {
      xs[j] = xs[j-1];
      j--;
    }

    xs[j] = x;
  }
}

void sort(Platform * const ps, size_t const P, LESS less) {
  quicksort(ps, 0, (int) P-1, less);
  insertion_sort(ps, P, less);
}

4.8 Hashing

For the hash table I’m just using a simple linear probing technique. The slopes turn out to be really well distributed and I’m using a load factor of at most $\frac15$, so clustering is minimal and this turns out to work nicely.

Because the counts are stored as single byte we don’t want to increment it unnecessarily and risk an overflow. Fortunately the compiler seems to optimize the very straight-forward code really well.

/*
 * HASHING
 */

bool update_slope_count(uint32_t * const slopes, uint8_t * const counts, uint32_t const slopes_len, uint32_t const key) {
  assert(slopes != NULL);

  for (uint32_t idx = key % slopes_len;; idx++, idx %= slopes_len) {
    if (counts[idx] == 0) {
      counts[idx] = 1;
      slopes[idx] = key;
      return false;
    }

    if (slopes[idx] == key) {
      if (counts[idx] >= 2) {
        return true;
      } else {
        counts[idx]++;
        return false;
      }
    }
  }

  return false;
}

Because we’re hashing so much, it’s worth spending a tiny amount of time finding a prime to hash against for better distribution.

uint32_t next_prime(uint32_t const a) {
  for (uint32_t x = a;; x++) {
    if (x % 2 == 0) continue;
    bool is_prime = true;
    double const limit = sqrt(x);
    for (uint32_t i = 3; i <= limit; i += 2) {
      if (x % i == 0) {
        is_prime = false;
        break;
      }
    }
    if (is_prime) return x;
  }
}

4.9 Convex Hull

Create the convex hull in linear time given sorted input by finding upper and lower hulls. This is very nice since we already sort the input to do the A1-activities so constructing this is cheap as described.

/*
 * CONVEX HULL
 */

int ccw(Platform * p1, Platform * p2, Platform * p3) {
  return (p2->x - p1->x)*(p3->y - p1->y) - (p2->y - p1->y)*(p3->x - p1->x);
}

void convex_hull(Platform * points, size_t npoints, Platform ** const out_hull, size_t * out_hullsize) {
  Platform * hull;
  size_t t, k = 0;
  int64_t i;

  hull = *out_hull;

  for (i = 0; i < (int64_t) npoints; ++i) {
    while (k >= 2 && ccw(&hull[k-2], &hull[k-1], &points[i]) <= 0) --k;
    hull[k++] = points[i];
  }

  for (i = (int64_t) npoints-2, t = k+1; i >= 0; i--) {
    while (k >= t && ccw(&hull[k-2], &hull[k-1], &points[i]) <= 0) --k;
    hull[k++] = points[i];
  }

  *out_hull = hull;
  *out_hullsize = k;
}

4.10 Main

Finally the main function. There’s some benchmarking stuff in there behind the ifndef.

The random number generator is seeded for the quicksort, which uses a random pivot.

Then I do all the allocation aside from the adjacency lists up front.

/*
 * MAIN
 */

int main() {
#ifndef CODEJUDGE
  struct timeval t1, t2, t3, t4, t5, t6, t7;
  gettimeofday(&t1, NULL);
#endif

  srand((unsigned int) time(NULL));

  uint32_t P;
  char line[MAX_LINE];

  fgets(line, MAX_LINE, stdin);
  sscanf(line, "%d", &P);

  uint32_t const n_slopes = next_prime(5*P);

  Platform * const ps = malloc(P * sizeof *ps);
  Vertex * const vertices = malloc(VERTEX_FACTOR*P * sizeof *vertices);
  uint32_t * const slopes = malloc(n_slopes * sizeof *slopes);
  uint8_t * const counts = calloc(n_slopes, sizeof *counts);
  uint32_t * const caps = malloc(VERTEX_FACTOR*P * sizeof *caps);
  Platform * hull = malloc((P+1) * sizeof *hull);

  Graph G = graph_alloc(vertices, VERTEX_FACTOR*P);

Read in the input.

  for (uint32_t i = 0; i < P; i++) {
    fgets(line, MAX_LINE, stdin);
    sscanf(line, "%d %d %u %u %u %u", &ps[i].x, &ps[i].y, &ps[i].a1, &ps[i].a2, &ps[i].a3, &ps[i].a4);
    ps[i].n = i;
  }

#ifndef CODEJUDGE
  gettimeofday(&t2, NULL);
#endif

Construct the A1-activities as described: Sort the platforms so candidates are next to each other and then run through and construct them as long as the coordinates match.

  /* Vertex schema:
   * i: ingoing (original vertex)
   * i+1*P: A1
   * i+2*P: A3
   * There can only be one of each of A2 and A4,
   * so we don't need the dummy vertices for these
   */

  /*
   * A1 (NEWS)
   */

  sort(ps, P, lessy);

  for (size_t i = 0; i < P; i++) {
    Platform const p = ps[i];
    for (size_t j = i+1; j < P && ps[j].y == p.y; j++) {
      insert_flow_edge(&G, p.n+P, ps[j].n, p.a1);
      insert_flow_edge(&G, ps[j].n+P, p.n, ps[j].a1);
    }
   }

  sort(ps, P, lessxy);

  for (size_t i = 0; i < P; i++) {
    Platform const p = ps[i];
    for (size_t j = i+1; j < P && ps[j].x == p.x; j++) {
      insert_flow_edge(&G, p.n+P, ps[j].n, p.a1);
      insert_flow_edge(&G, ps[j].n+P, p.n, ps[j].a1);
    }
   }

#ifndef CODEJUDGE
  gettimeofday(&t3, NULL);
#endif

Now the platforms are sorted so we can create the convex hull and use this for constructing A2-activities to the platforms furthest away.

  /*
   * A2 (Human Cannon)
   */

  size_t hull_size;
  convex_hull(ps, P, &hull, &hull_size);

  for (size_t i = 0; i < P; i++) {
    Platform const p = ps[i];
    uint32_t furthest = UINT_MAX;
    int64_t furthest_dist = 0;

    for (size_t j = 0; j < (size_t) hull_size; j++) {
      Platform const q = hull[j];
      if (p.n == q.n) continue;

      int64_t const dx = p.x - q.x, dy = p.y - q.y;
      int64_t const dist = dx*dx + dy*dy;

      if (dist > furthest_dist || (dist == furthest_dist && q.n < furthest)) {
        furthest = q.n;
        furthest_dist = dist;
      }
    }

    insert_flow_edge(&G, p.n, furthest, p.a2);
   }

#ifndef CODEJUDGE
  gettimeofday(&t4, NULL);
#endif

I was very curious how to hash floating point values, but simply copying the bits verbatim into an unsigned integer works really well. I use this as the key also instead of the float, because they are identical anyway and this avoids a warning about unsafe comparison of floating point values.

Note that this is probably unsafe since floating point operations aren’t exact. Instead we should reduce the fraction and use the numerator and denominator as keys.

Two arrays comprise the hash table, the slopes/keys and counts/values and it is enough to clear the counts to clear the hash table as I’ve implemented it. This saves a constant factor because the byte array of counts takes up a quarter of the space of the 4 byte slopes.

/*
 * A3 (Platform Trampoline)
 */

for (uint32_t i = 0; i < P; i++) {
  Platform const p = ps[i];
  if (p.a3 == 0) continue;

  for (uint32_t j = i+1; j < P; j++) {
    Platform const q = ps[j];

    int64_t const dx = p.x - q.x, dy = p.y - q.y;
    float const fslope = (float) dy / dx;
    uint32_t slope;
    memcpy(&slope, &fslope, sizeof slope);

    if (update_slope_count(slopes, counts, n_slopes, slope)) {
      insert_flow_edge(&G, p.n+2*P, q.n, p.a3);
      insert_flow_edge(&G, q.n+2*P, p.n, q.a3);
    }
  }

  memset(counts, 0, n_slopes * sizeof *counts);
 }

Finally the simplest of activities; so simple that we can easily manage to connect the dummy vertices as well.

  /*
   * A4 (EOF)
   * vertex -> edge capacities
   */

  for (uint32_t i = 0; i < P; i++) {
    Platform const p = ps[i];

    insert_flow_edge(&G, p.n, p.n+  P, p.a1);
    insert_flow_edge(&G, p.n, p.n+2*P, p.a3);

    insert_flow_edge(&G, p.n, P-1, p.a4);
  }

#ifndef CODEJUDGE
  gettimeofday(&t6, NULL);
#endif

Run the flow algorithm, print the result and free the memory. Freeing the adjacency tables takes too long because we need to iterate over all the vertices, so I only do that locally.

In the end I print some running times for testing.

  /*
   * FLOW
   */

  uint32_t const flow = edmonds_karp(&G, 0, P-1, slopes, counts, caps);
  printf("%u\n", flow);

  free(ps);
  free(slopes);
  free(counts);
  free(caps);
  free(hull);

#ifndef CODEJUDGE
  gettimeofday(&t7, NULL);

  graph_free(&G);
  free(vertices);

  double const
    input_time = MS(t1, t2),
    a1_time = MS(t2, t3),
    a2_time = MS(t3, t4),
    a3_time = MS(t4, t5),
    a4_time = MS(t5, t6),
    flow_time = MS(t6, t7),
    total_time = MS(t1, t7);

  printf("IN: %4.2f, A1: %5.2f, A2: %5.2f, A3: %6.2f, A4: %3.2f, FLOW: %4.2f, TOTAL: %6.2f\n", input_time, a1_time, a2_time, a3_time, a4_time, flow_time, total_time);
#endif

  return 0;
}

And we’re done.

5 Sample running times

I’ve included the input size (in parenthesis) and running times for the largest input given, in table tab:times, generated by the implementation. The time taken to solve different inputs of the same size can vary greatly based on how many platforms are on line, their capacities etc. This is just to get an idea about where the time is spent.

2680 (2000)
IN: 1.74, A1:  6.17, A2:  0.28, A3:  48.77, A4: 0.38, FLOW: 0.26, TOTAL:  57.60
7632 (4000)
IN: 2.88, A1: 32.82, A2:  0.40, A3: 252.31, A4: 1.01, FLOW: 0.72, TOTAL: 290.14
199880 (5000)
IN: 5.80, A1:  6.22, A2:  0.61, A3: 135.04, A4: 0.94, FLOW: 0.19, TOTAL: 148.80
2420 (6000)
IN: 4.91, A1:  3.30, A2:  1.30, A3: 167.06, A4: 2.39, FLOW: 0.06, TOTAL: 179.02
70404 (8000)
IN: 6.50, A1:  8.21, A2:  0.74, A3: 339.12, A4: 2.34, FLOW: 0.13, TOTAL: 357.03

A3 dominates greatly.

6 License

Distributed under the MIT License.

This applies of course only to my code, text and figures, not necessarily anything linked from this page.

astahfrom / theplayground Goto Github PK

theplayground's Introduction