
sdp-q2-project's Introduction

presentation.sh

My university course projects

Team projects I've launched or contributed to


👇 Scroll down for everything else 👇

sdp-q2-project's People

Contributors

bennygiu, e-caste, enrico-git


sdp-q2-project's Issues

Add time benchmarks

As per project specifications, we need to provide CPU time statistics:

  • labeling generation phase
  • reachability query phase

See page 2 of Q2 General Project Definition.pdf (screenshot attached in the issue).
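
A minimal sketch of how these CPU time statistics could be collected with the POSIX clock_gettime; the helper name and the phase calls are placeholders, not code that exists in the repo:

#include <stdio.h>
#include <time.h>

/* Hypothetical helper: process CPU time in seconds. */
static double cpu_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* In main, around the two phases:
 *     double t0 = cpu_seconds();
 *     ...labeling generation phase...
 *     double t1 = cpu_seconds();
 *     ...reachability query phase...
 *     double t2 = cpu_seconds();
 *     printf("Labeling CPU time: %.3f s\nQuery CPU time: %.3f s\n", t1 - t0, t2 - t1);
 */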

Use linked lists instead of arrays to speed up file read

Since the current implementation uses arrays, we have to know in advance how much memory we're going to need for the DAG struct.
Using linked lists (see online for the implementation) we could store the data for each vertex while reading the file, cutting in half the number of file reads required to store the DAG.
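
A minimal sketch of what the per-vertex edge list could look like (type and function names are illustrative, not the ones in the current code):

#include <stdlib.h>

/* One outgoing edge of a vertex, stored as a singly linked list node. */
typedef struct edge_node {
    int dest;                 /* destination vertex id */
    struct edge_node *next;
} edge_node_t;

/* Prepend an edge while the file is being read, so no pre-count of edges is needed. */
static edge_node_t *edge_push(edge_node_t *head, int dest) {
    edge_node_t *n = malloc(sizeof(*n));
    if (n == NULL) exit(1);
    n->dest = dest;
    n->next = head;
    return n;
}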

This should be done before #3

Merge "common" part thread ScanFile, ScanRoots

To avoid re-creating the same threads, we could extend scanFile by adding the code of scanRoots inside it.

SCAN FILE INIT

    for(j=0; j < num_threads; j++) {
        args[j].filename = argv[1];
        args[j].graph = rows;
        args[j].total_vertex = num_vertex;
        args[j].total_threads = num_threads;
        args[j].size_file = size;
        args[j].roots_num = &roots_num;             // pointer to 'shared' variable
        args[j].roots_mutex = roots_mutex;          // protection for 'shared' variable
    }
    printf("Starting the reading of the DAG file...\n");

    for(i=0; i<num_threads; i++) {
        args[i].id = i;
        err_code = pthread_create(&threads[i], NULL, scanFile, (void *)&args[i]);
        if(err_code) {
            printf("Error number %i in thread %i creation.\n", err_code, i);
            exit(1);
        }
    }

    // Wait until all threads end
    for(j=0; j < num_threads; j++) {
        err_code = pthread_join(threads[j], NULL);
        if(err_code) {
            printf("Error number %i in thread %i joining.\n", err_code, j);
            exit(1);
        }
    }

SCAN ROOTS INIT

    for(i=0; i<num_threads; i++) {
        args[i].id = i;
        args[i].roots = roots;
        args[i].root_index = &root_index;
        err_code = pthread_create(&threads[i], NULL, scanRoots, (void *)&args[i]);
        if(err_code) {
            printf("Error number %i in thread %i creation.\n", err_code, i);
            exit(1);
        }
    }

    for(j=0; j < num_threads; j++) {
        err_code = pthread_join(threads[j], NULL);
        if(err_code) {
            printf("Error number %i in thread %i joining.\n", err_code, j);
            exit(1);
        }
    }

NOTE: a barrier could be useful for thread synchronization.
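
A rough sketch of the merged thread body using a pthread barrier; scanFileAndRoots and the chunk comments are placeholders for the existing scanFile/scanRoots code:

#include <pthread.h>

pthread_barrier_t barrier;   /* initialized once in main: pthread_barrier_init(&barrier, NULL, num_threads); */

void *scanFileAndRoots(void *arg) {
    /* ...existing scanFile work on this thread's portion of the file... */

    /* Every thread must finish reading before any of them scans for roots. */
    pthread_barrier_wait(&barrier);

    /* ...existing scanRoots work on this thread's portion of the vertices... */
    pthread_exit(NULL);
}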

Read DAG file

Sequentially read the DAG file and store it in a list of C structures.

Also see if we can reverse engineer dag_stq_gen.c

Use array instead of list for roots

Replace fake_push with a counter of the number of roots, protected by a semaphore.

In main, allocate a roots array sized by that counter, then protect appending each root to the array with a semaphore.
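
A minimal sketch of the idea; the helper names are placeholders and a mutex stands in for the semaphore, matching the roots_mutex already used elsewhere:

#include <pthread.h>
#include <stdlib.h>

static int roots_num = 0;                 /* counted during the file scan      */
static int root_index = 0;                /* next free slot in the roots array */
static int *roots = NULL;
static pthread_mutex_t roots_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Called by the scanning threads instead of fake_push: just count the root. */
void count_root(void) {
    pthread_mutex_lock(&roots_mutex);
    roots_num++;
    pthread_mutex_unlock(&roots_mutex);
}

/* In main, after the scan: roots = malloc(roots_num * sizeof(int)); */

/* Called by the threads that fill the array, one slot per root. */
void append_root(int vertex) {
    pthread_mutex_lock(&roots_mutex);
    roots[root_index++] = vertex;
    pthread_mutex_unlock(&roots_mutex);
}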

Add bash launcher

Add launch.sh to add the ability to:

  • download original graph files from https://code.google.com/archive/p/grail/downloads (since we won't be able to upload many GBs of data to PoliTo's portal)
  • generate Quer's benchmark graphs locally
  • [optional] auto-run / select which graphs to run our algorithm implementation against

Bash is a prerequisite, but I think the professor will have no problem using it.

Deallocate resources before exit(1)

Write a function that takes all the pointers (labels, ...) and deallocates everything:

// Deallocate all resources

    for(i=0; i<num_vertex; i++) {
        free_list(rows[i].edges_pointer);
        pthread_mutex_destroy(rows[i].node_mutex);
        free(rows[i].node_mutex);
    }

    pthread_mutex_destroy(roots_mutex);
    free(roots_mutex);
    free(roots);

    for(i=0; i<num_vertex; i++){
        free(labels[i].lbl_start);
        free(labels[i].lbl_end);
        free(labels[i].visited);
    }
    free(labels);
    free(rows);
    free_list_query(head_query);
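A possible signature for such a function, wrapping the block above; the struct type names are guesses, not necessarily the ones actually defined in q2.c:

void free_all(row_t *rows, label_t *labels, int num_vertex,
              int *roots, pthread_mutex_t *roots_mutex,
              query_t *head_query) {
    /* body = the deallocation code above, operating on the parameters,
       so it can be called both before a normal return and before exit(1) */
}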

Add memory usage benchmarks

As per project specifications, we need to provide memory usage statistics:

  • labeling generation phase
  • reachability query phase

See page 2 of Q2 General Project Definition.pdf (screenshot attached in the issue).

Completing #15
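
If peak resident memory is enough for these statistics, a minimal sketch with the POSIX getrusage call (the phase label is just a caller-supplied string):

#include <stdio.h>
#include <sys/resource.h>

/* Print the peak resident set size seen so far (KB on Linux, bytes on macOS). */
static void print_peak_memory(const char *phase) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    printf("%s: max RSS = %ld\n", phase, ru.ru_maxrss);
}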

Replace fscanf with read

From the following section of Quer's email, it's evident that we should:

  1. read the input files sequentially instead of in parallel
  2. use read instead of fscanf for better performance

(screenshot of the email attached in the issue)
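
A rough sketch of the read-based loop, assuming we keep the read sequential; the buffer size and the manual parsing are only illustrative:

#include <fcntl.h>
#include <unistd.h>

#define BUF_SIZE (1 << 16)       /* 64 KiB read buffer (illustrative size) */

/* Sequentially read the whole file with read(); number parsing is done by hand. */
static void read_dag_file(const char *filename) {
    char buf[BUF_SIZE];
    ssize_t n;
    int fd = open(filename, O_RDONLY);
    if (fd < 0) return;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        /* parse the vertex/edge numbers contained in buf[0..n-1] by hand,
           accumulating digits, instead of calling fscanf for every token */
    }
    close(fd);
}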

Detect number of system threads

Instead of using a constant N for the number of threads, try to detect the number of hardware threads of the physical/virtual machine the program is running on, e.g. a quad-core CPU with hyperthreading -> N=8.

This was already mentioned in #3 but deserves its own issue
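
On Linux and macOS the count can be queried with sysconf (a minimal sketch; the fallback value is arbitrary):

#include <unistd.h>

/* Number of logical processors currently online, falling back to an arbitrary 4. */
static long detect_num_threads(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n > 0) ? n : 4;
}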

Make RandomizedVisit parallel

Try to use threads to implement the Index Construction (paper, par. 3.1).

REMINDER: use only one thread per label for the whole label generation (1 thread for each label); more than that could be a problem for scheduling!

NOTE: we could do better. If we have to generate 4 labels, we create 4 threads, but if our processor supports 8 threads, 4 of them do nothing...

Parallelize DAG file read

Use the number of the system's hardware threads (e.g. a CPU with 4 cores and 8 threads -> use 8 threads).

We could use the lseek function to have each thread read a buffer of N lines/characters at a time.
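
A rough sketch of the per-thread chunking; here pread is used instead of lseek+read so threads don't share a file offset, and thread_args_t with its fields is an assumption, not the project's actual struct:

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-thread arguments: file descriptor, total file size, thread id/count. */
typedef struct { int fd; off_t size_file; int id; int total_threads; } thread_args_t;

void *read_chunk(void *arg) {
    thread_args_t *a = arg;
    off_t chunk = a->size_file / a->total_threads;
    off_t start = chunk * a->id;
    off_t len   = (a->id == a->total_threads - 1) ? a->size_file - start : chunk;

    char *buf = malloc(len);
    if (buf != NULL) {
        /* pread takes the offset explicitly, so no shared file position is involved. */
        pread(a->fd, buf, len, start);
        /* ...parse only the lines fully contained in this chunk (boundaries need care)... */
        free(buf);
    }
    pthread_exit(NULL);
}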

Split q2.c into multiple files

Use at least the following files, with their corresponding ".h" headers:

DAG reading
label creation
reachability queries

Randomize roots order

See line 4 of RandomizedLabeling in the paper.

Possible implementation:

for (int i=0; i<roots_len; ++i) {
   indexes[i] = i;
}
// TODO implement index randomization
// then do
for (int i=0; i<roots_len; ++i) {
    RandomizedVisit(x, roots[indexes[i]], G); 
}

This code is already parallelized over d, which is the number of dimensions (labels), equal to NUM_THREADS.
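
The index randomization left as a TODO above could be a standard Fisher-Yates shuffle, e.g. (assuming indexes has already been filled with 0..roots_len-1 as in the first loop):

// Fisher-Yates shuffle of indexes; rand() from <stdlib.h> is enough here, seeded once with srand() in main
for (int i = roots_len - 1; i > 0; --i) {
    int j = rand() % (i + 1);
    int tmp = indexes[i];
    indexes[i] = indexes[j];
    indexes[j] = tmp;
}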

Optimize the parallelization

Currently we are parallelizing the DAG read from file.
We also need to parallelize as much code as possible of the GRAIL algorithm implementation.

Possible approaches:

  • parallelize each for loop - e.g. line 1, instantiate a thread for each label | possible issue: NUM_THREADS > d --> (NUM_THREADS - d) threads do nothing
  • parallelize dynamically: parallelize other for loops only if the issue above presents itself | possible issue: hard to implement
  • only parallelize compute-heavy code segments | issue: determine which are these segments
  • other?

See the algorithm in the screenshot attached to the issue.

Replace ISO C I/O functions with corresponding POSIX functions

We are currently using fopen, fseek and the like; see slide 27 of u02s02-fileSystem.pdf (screenshot attached in the issue).

Since we have chosen the "UNIX-like POSIX system" flavor, I suggest a refactor. See end of page 2 of Q2 General Project Definition.pdf for project specifications.
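
A minimal sketch of how the calls map, wrapped in an illustrative read-at-offset helper:

#include <fcntl.h>
#include <unistd.h>

/* Same pattern as fopen/fseek/fread/fclose, using the POSIX calls instead. */
static ssize_t read_at(const char *path, off_t offset, char *buf, size_t len) {
    int fd = open(path, O_RDONLY);       /* was: FILE *f = fopen(path, "r"); */
    if (fd < 0)
        return -1;
    lseek(fd, offset, SEEK_SET);         /* was: fseek(f, offset, SEEK_SET); */
    ssize_t n = read(fd, buf, len);      /* was: fread(buf, 1, len, f);      */
    close(fd);                           /* was: fclose(f);                  */
    return n;
}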

Parallelize reachability query output write

Example:
NUM_THREADS=4
NUM_QUERIES=1000
-> for each thread, we have 1000/4=250 queries.

Since we are using threads, they all reference the same data structures, so each thread can save in a boolean whether each of its queries is reachable or not.

Then, to avoid a lot of semaphore interactions, we wait for all the threads to finish and write sequentially to file2 (.que), appending at the end of each line whether the query is reachable or not.

Note: remember to write at the end of the output where the results are stored so the professor knows where to look.
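
A sketch of the split; results, query_src/query_dst and is_reachable are placeholders for whatever the project actually uses:

#include <pthread.h>

/* Hypothetical per-thread slice: queries [first, last) out of NUM_QUERIES. */
typedef struct { int first; int last; } query_args_t;

extern int *results;                        /* one slot per query, allocated in main      */
extern int query_src[], query_dst[];        /* placeholder storage for the parsed queries */
extern int is_reachable(int from, int to);  /* placeholder for the GRAIL query check      */

void *solve_queries(void *arg) {
    query_args_t *a = arg;
    /* No lock needed: each thread writes only its own slice of results. */
    for (int q = a->first; q < a->last; q++)
        results[q] = is_reachable(query_src[q], query_dst[q]);
    pthread_exit(NULL);
}

/* In main, after joining all the threads: append results[q] to each line of the
   .que output file sequentially, and print the output path at the end. */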

Multiple Index (3.1 Paper)

For the moment the sequential code works for only 1 label.

Try to implement it for multiple labels (in general D < 5).

Issue when NUM_THREADS > number of vertices

The problem starts at this line.

When NUM_THREADS is greater than the number of vertices (in my case, 24 threads vs 10 vertices), the threads will wait indefinitely. This should be fixed by something similar to the following block:

if (NUM_THREADS > num_vertex)
    num_threads = num_vertex;
// then use num_threads instead of NUM_THREADS

The last node has [0, 0] as labels with big graphs

Tested with citPatents.scc.gra (download here).

The label for the last node is [0, 0], while in reality it is connected to other nodes.

This is probably because the last vertex/node is always the root node in patents citations.

Add README

  1. add usage of run.sh
  2. compare times with C++ implementation
  3. ...

Segfault when allocating a big graph

Tested with v10e3.gra (working) and v1000000e200.gra (segfault).

Output of CLion: Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Add benchmark data for different numbers of threads

In the email Quer said to try with 1,2,3,4 threads. Since we can, we could add some insights for 1,2,8,32 threads.
Keep labels=2 for small DAGs (dense and sparse) and labels=5 for large DAGs.

  • table like the one we have now
  • histogram to compare results (x=DAGs, y=TT for different number of threads)

Randomize the label construction

For the moment the single label is built following the order of the rows (0 to N).

Implement random order in RandomizedVisit() and RandomizedLabeling()

Try to optimize Label generation

For the moment, just one thread for each label is created (a high number of unused threads).

We could try to create (NUM_THREADS - NUM_LABELS) extra threads. Each of these would handle the roots in a range inf < ... < sup, in "random order".


We need to add protection (a mutex) to prevent multiple threads from accessing the same node concurrently.
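
A sketch of how the extra threads could each take a slice of the roots; the argument struct is hypothetical, and roots, indexes, G and the RandomizedVisit call are taken from the "Randomize roots order" snippet above, so they are assumptions about the real code:

#include <pthread.h>

/* Hypothetical per-thread arguments: the label index d and a slice [inf, sup) of the shuffled roots. */
typedef struct { int d; int inf; int sup; } range_args_t;

void *visit_roots_range(void *arg) {
    range_args_t *a = arg;
    for (int i = a->inf; i < a->sup; i++) {
        /* Inside RandomizedVisit, rows[v].node_mutex must be taken before a node's
           lbl_start/lbl_end/visited are read or updated, since two threads working
           on the same label can now reach the same node concurrently. */
        RandomizedVisit(a->d, roots[indexes[i]], G);
    }
    pthread_exit(NULL);
}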
