
sdp-q2-project's Introduction

presentation.sh

My university course projects

Team projects I've launched or contributed to


👇 Scroll down for everything else 👇

sdp-q2-project's People

Contributors

bennygiu, e-caste, enrico-git


sdp-q2-project's Issues

Add time benchmarks

As per project specifications, we need to provide CPU time statistics:

  • labeling generation phase
  • reachability query phase

See page 2 of Q2 General Project Definition.pdf (screenshot attached in the issue).
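
A minimal sketch of how these CPU time statistics could be collected with the POSIX clock_gettime; the helper name and the phase calls are placeholders, not code that exists in the repo:

#include <stdio.h>
#include <time.h>

/* Hypothetical helper: process CPU time in seconds. */
static double cpu_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* In main, around the two phases:
 *     double t0 = cpu_seconds();
 *     ...labeling generation phase...
 *     double t1 = cpu_seconds();
 *     ...reachability query phase...
 *     double t2 = cpu_seconds();
 *     printf("Labeling CPU time: %.3f s\nQuery CPU time: %.3f s\n", t1 - t0, t2 - t1);
 */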

Use linked lists instead of arrays to speed up file read

Since the current implementation uses arrays, we have to know in advance how much memory we're going to need for the DAG struct.
Using linked lists (see online for the implementation) we could store the data for each vertex while reading the file, cutting in half the number of file reads required to store the DAG.
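
A minimal sketch of what the per-vertex edge list could look like (type and function names are illustrative, not the ones in the current code):

#include <stdlib.h>

/* One outgoing edge of a vertex, stored as a singly linked list node. */
typedef struct edge_node {
    int dest;                 /* destination vertex id */
    struct edge_node *next;
} edge_node_t;

/* Prepend an edge while the file is being read, so no pre-count of edges is needed. */
static edge_node_t *edge_push(edge_node_t *head, int dest) {
    edge_node_t *n = malloc(sizeof(*n));
    if (n == NULL) exit(1);
    n->dest = dest;
    n->next = head;
    return n;
}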

This should be done before #3

Merge "common" part thread ScanFile, ScanRoots

To avoid re-creating the same threads, we could extend scanFile by adding the code of scanRoots inside it.

SCAN FILE INIT

    for(j=0; j < num_threads; j++) {
        args[j].filename = argv[1];
        args[j].graph = rows;
        args[j].total_vertex = num_vertex;
        args[j].total_threads = num_threads;
        args[j].size_file = size;
        args[j].roots_num = &roots_num;             // pointer to 'shared' variable
        args[j].roots_mutex = roots_mutex;          // protection for 'shared' variable
    }
    printf("Starting the reading of the DAG file...\n");

    for(i=0; i<num_threads; i++) {
        args[i].id = i;
        err_code = pthread_create(&threads[i], NULL, scanFile, (void *)&args[i]);
        if(err_code) {
            printf("Error number %i in thread %i creation.\n", err_code, i);
            exit(1);
        }
    }

    // Wait until all threads end
    for(j=0; j < num_threads; j++) {
        err_code = pthread_join(threads[j], NULL);
        if(err_code) {
            printf("Error number %i in thread %i joining.\n", err_code, j);
            exit(1);
        }
    }

SCAN ROOTS INIT

    for(i=0; i<num_threads; i++) {
        args[i].id = i;
        args[i].roots = roots;
        args[i].root_index = &root_index;
        err_code = pthread_create(&threads[i], NULL, scanRoots, (void *)&args[i]);
        if(err_code) {
            printf("Error number %i in thread %i creation.\n", err_code, i);
            exit(1);
        }
    }

    for(j=0; j < num_threads; j++) {
        err_code = pthread_join(threads[j], NULL);
        if(err_code) {
            printf("Error number %i in thread %i joining.\n", err_code, j);
            exit(1);
        }
    }

NOTE: a barrier could be useful for thread synchronization.
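
A rough sketch of the merged thread body using a pthread barrier; scanFileAndRoots and the chunk comments are placeholders for the existing scanFile/scanRoots code:

#include <pthread.h>

pthread_barrier_t barrier;   /* initialized once in main: pthread_barrier_init(&barrier, NULL, num_threads); */

void *scanFileAndRoots(void *arg) {
    /* ...existing scanFile work on this thread's portion of the file... */

    /* Every thread must finish reading before any of them scans for roots. */
    pthread_barrier_wait(&barrier);

    /* ...existing scanRoots work on this thread's portion of the vertices... */
    pthread_exit(NULL);
}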

Read DAG file

Sequentially read the DAG file and store it in a list of C structures.

Also see if we can reverse engineer dag_stq_gen.c

Use array instead of list for roots

Replace fake_push with a counter of the number of roots, protected by a semaphore.

In main, allocate a roots array sized by that counter, then protect appending each root to the array with a semaphore.
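
A minimal sketch of the idea; the helper names are placeholders and a mutex stands in for the semaphore, matching the roots_mutex already used elsewhere:

#include <pthread.h>
#include <stdlib.h>

static int roots_num = 0;                 /* counted during the file scan      */
static int root_index = 0;                /* next free slot in the roots array */
static int *roots = NULL;
static pthread_mutex_t roots_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Called by the scanning threads instead of fake_push: just count the root. */
void count_root(void) {
    pthread_mutex_lock(&roots_mutex);
    roots_num++;
    pthread_mutex_unlock(&roots_mutex);
}

/* In main, after the scan: roots = malloc(roots_num * sizeof(int)); */

/* Called by the threads that fill the array, one slot per root. */
void append_root(int vertex) {
    pthread_mutex_lock(&roots_mutex);
    roots[root_index++] = vertex;
    pthread_mutex_unlock(&roots_mutex);
}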

Add bash launcher

Add launch.sh to add the ability to:

  • download original graph files from https://code.google.com/archive/p/grail/downloads (since we won't be able to upload many GBs of data to PoliTo's portal)
  • generate Quer's benchmark graphs locally
  • [optional] auto-run / select which graphs to run our algorithm implementation against

Bash is a prerequisite, but I think the professor will have no problem using it.

Deallocate resources before exit(1)

Write a function that takes all the pointers (labels, ...) and deallocates everything:

// Deallocate all resources

    for(i=0; i<num_vertex; i++) {
        free_list(rows[i].edges_pointer);
        pthread_mutex_destroy(rows[i].node_mutex);
        free(rows[i].node_mutex);
    }

    pthread_mutex_destroy(roots_mutex);
    free(roots_mutex);
    free(roots);

    for(i=0; i<num_vertex; i++){
        free(labels[i].lbl_start);
        free(labels[i].lbl_end);
        free(labels[i].visited);
    }
    free(labels);
    free(rows);
    free_list_query(head_query);
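A possible signature for such a function, wrapping the block above; the struct type names are guesses, not necessarily the ones actually defined in q2.c:

void free_all(row_t *rows, label_t *labels, int num_vertex,
              int *roots, pthread_mutex_t *roots_mutex,
              query_t *head_query) {
    /* body = the deallocation code above, operating on the parameters,
       so it can be called both before a normal return and before exit(1) */
}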

Add memory usage benchmarks

As per project specifications, we need to provide memory usage statistics:

  • labeling generation phase
  • reachability query phase

See page 2 of Q2 General Project Definition.pdf (screenshot attached in the issue).

Completing #15
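
If peak resident memory is enough for these statistics, a minimal sketch with the POSIX getrusage call (the phase label is just a caller-supplied string):

#include <stdio.h>
#include <sys/resource.h>

/* Print the peak resident set size seen so far (KB on Linux, bytes on macOS). */
static void print_peak_memory(const char *phase) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    printf("%s: max RSS = %ld\n", phase, ru.ru_maxrss);
}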

Replace fscanf with read

From the following section of Quer's email, it's evident that we should:

  1. read the input files sequentially instead of in parallel
  2. use read instead of fscanf for better performance

(screenshot of the email attached in the issue)
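
A rough sketch of the read-based loop, assuming we keep the read sequential; the buffer size and the manual parsing are only illustrative:

#include <fcntl.h>
#include <unistd.h>

#define BUF_SIZE (1 << 16)       /* 64 KiB read buffer (illustrative size) */

/* Sequentially read the whole file with read(); number parsing is done by hand. */
static void read_dag_file(const char *filename) {
    char buf[BUF_SIZE];
    ssize_t n;
    int fd = open(filename, O_RDONLY);
    if (fd < 0) return;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        /* parse the vertex/edge numbers contained in buf[0..n-1] by hand,
           accumulating digits, instead of calling fscanf for every token */
    }
    close(fd);
}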

Detect number of system threads

Instead of using a constant N for the number of threads, try to detect the number of hardware threads of the physical/virtual machine the program is running on, e.g. a quad-core CPU with hyperthreading -> N=8.

This was already mentioned in #3 but deserves its own issue
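
On Linux and macOS the count can be queried with sysconf (a minimal sketch; the fallback value is arbitrary):

#include <unistd.h>

/* Number of logical processors currently online, falling back to an arbitrary 4. */
static long detect_num_threads(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n > 0) ? n : 4;
}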

Make RandomizedVisit parallel

Try to use threads to implement the Index Construction (paper, par. 3.1).

REMINDER: use only one thread per label for the whole label generation (1 thread for each label); more than that could be a problem for scheduling!

NOTE: we could do better. If we have to generate 4 labels, we create 4 threads, but if our processor supports 8 threads, 4 of them do nothing...

Parallelize DAG file read

Use the number of the system's hardware threads (e.g. a CPU with 4 cores and 8 threads -> use 8 threads).

We could use the lseek function to have each thread read a buffer of N lines/characters at a time.
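
A rough sketch of the per-thread chunking; here pread is used instead of lseek+read so threads don't share a file offset, and thread_args_t with its fields is an assumption, not the project's actual struct:

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-thread arguments: file descriptor, total file size, thread id/count. */
typedef struct { int fd; off_t size_file; int id; int total_threads; } thread_args_t;

void *read_chunk(void *arg) {
    thread_args_t *a = arg;
    off_t chunk = a->size_file / a->total_threads;
    off_t start = chunk * a->id;
    off_t len   = (a->id == a->total_threads - 1) ? a->size_file - start : chunk;

    char *buf = malloc(len);
    if (buf != NULL) {
        /* pread takes the offset explicitly, so no shared file position is involved. */
        pread(a->fd, buf, len, start);
        /* ...parse only the lines fully contained in this chunk (boundaries need care)... */
        free(buf);
    }
    pthread_exit(NULL);
}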

Split q2.c into multiple files

Use at least the following files, with their corresponding ".h" headers:

DAG reading
label creation
reachability queries

Randomize roots order

See line 4 of RandomizedLabeling in the paper.

Possible implementation:

for (int i=0; i<roots_len; ++i) {
   indexes[i] = i;
}
// TODO implement index randomization
// then do
for (int i=0; i<roots_len; ++i) {
    RandomizedVisit(x, roots[indexes[i]], G); 
}

This code is already parallelized over d, which is the number of dimensions (labels), equal to NUM_THREADS.
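
The index randomization left as a TODO above could be a standard Fisher-Yates shuffle, e.g. (assuming indexes has already been filled with 0..roots_len-1 as in the first loop):

// Fisher-Yates shuffle of indexes; rand() from <stdlib.h> is enough here, seeded once with srand() in main
for (int i = roots_len - 1; i > 0; --i) {
    int j = rand() % (i + 1);
    int tmp = indexes[i];
    indexes[i] = indexes[j];
    indexes[j] = tmp;
}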

Optimize the parallelization

Currently we are parallelizing the DAG read from file.
We also need to parallelize as much code as possible of the GRAIL algorithm implementation.

Possible approaches:

  • parallelize each for loop - e.g. line 1, instantiate a thread for each label | possible issue: NUM_THREADS > d --> (NUM_THREADS - d) threads do nothing
  • parallelize dynamically: parallelize other for loops only if the issue above presents itself | possible issue: hard to implement
  • only parallelize compute-heavy code segments | issue: determine which are these segments
  • other?

See the algorithm in the screenshot attached to the issue.

Replace ISO C I/O functions with corresponding POSIX functions

We are currently using fopen, fseek and the like; see slide 27 of u02s02-fileSystem.pdf (screenshot attached in the issue).

Since we have chosen the "UNIX-like POSIX system" flavor, I suggest a refactor. See end of page 2 of Q2 General Project Definition.pdf for project specifications.
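
A minimal sketch of how the calls map, wrapped in an illustrative read-at-offset helper:

#include <fcntl.h>
#include <unistd.h>

/* Same pattern as fopen/fseek/fread/fclose, using the POSIX calls instead. */
static ssize_t read_at(const char *path, off_t offset, char *buf, size_t len) {
    int fd = open(path, O_RDONLY);       /* was: FILE *f = fopen(path, "r"); */
    if (fd < 0)
        return -1;
    lseek(fd, offset, SEEK_SET);         /* was: fseek(f, offset, SEEK_SET); */
    ssize_t n = read(fd, buf, len);      /* was: fread(buf, 1, len, f);      */
    close(fd);                           /* was: fclose(f);                  */
    return n;
}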

Parallelize reachability query output write

Example:
NUM_THREADS=4
NUM_QUERIES=1000
-> for each thread, we have 1000/4=250 queries.

Since we are using threads, they all reference the same data structures, so each thread can save in a boolean whether each of its queries is reachable or not.

Then, to avoid a lot of semaphore interactions, we wait for all the threads to finish and write sequentially to file2 (.que), appending at the end of each line whether the query is reachable or not.

Note: remember to write at the end of the output where the results are stored so the professor knows where to look.
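
A sketch of the split; results, query_src/query_dst and is_reachable are placeholders for whatever the project actually uses:

#include <pthread.h>

/* Hypothetical per-thread slice: queries [first, last) out of NUM_QUERIES. */
typedef struct { int first; int last; } query_args_t;

extern int *results;                        /* one slot per query, allocated in main      */
extern int query_src[], query_dst[];        /* placeholder storage for the parsed queries */
extern int is_reachable(int from, int to);  /* placeholder for the GRAIL query check      */

void *solve_queries(void *arg) {
    query_args_t *a = arg;
    /* No lock needed: each thread writes only its own slice of results. */
    for (int q = a->first; q < a->last; q++)
        results[q] = is_reachable(query_src[q], query_dst[q]);
    pthread_exit(NULL);
}

/* In main, after joining all the threads: append results[q] to each line of the
   .que output file sequentially, and print the output path at the end. */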

Multiple Index (3.1 Paper)

For the moment the sequential code works for only 1 label.

Try to implement it for multiple labels (in general D < 5).

Issue when NUM_THREADS > number of vertices

The problem starts at this line.

When NUM_THREADS is greater than the number of vertices (in my case, 24 threads vs 10 vertices), the threads will wait indefinitely. This should be fixed by something similar to the following block:

if (NUM_THREADS > num_vertex)
    num_threads = num_vertex;
// then use num_threads instead of NUM_THREADS

The last node has [0, 0] as labels with big graphs

Tested with citPatents.scc.gra (download here).

The label for the last node is [0, 0], while in reality it is connected to other nodes.

This is probably because the last vertex/node is always the root node in patents citations.

Add README

  1. add usage of run.sh
  2. compare times with C++ implementation
  3. ...

Segfault when allocating a big graph

Tested with v10e3.gra (working) and v1000000e200.gra (segfault).

Output of CLion: Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Add benchmark data for different numbers of threads

In the email Quer said to try with 1,2,3,4 threads. Since we can, we could add some insights for 1,2,8,32 threads.
Keep labels=2 for small DAGs (dense and sparse) and labels=5 for large DAGs.

  • table like the one we have now
  • histogram to compare results (x=DAGs, y=TT for different number of threads)

Randomize the label construction

For the moment the single label is built following the order of the rows (0 to N).

Implement random order in RandomizedVisit() and RandomizedLabeling()

Try to optimize Label generation

For the moment, just one thread for each label is created (a high number of unused threads).

We could try to create (NUM_THREADS - NUM_LABELS) extra threads. Each of these would handle the roots in a range inf < ... < sup, in "random order".


We need to add protection (a mutex) to prevent multiple threads from accessing the same node concurrently.
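
A sketch of how the extra threads could each take a slice of the roots; the argument struct is hypothetical, and roots, indexes, G and the RandomizedVisit call are taken from the "Randomize roots order" snippet above, so they are assumptions about the real code:

#include <pthread.h>

/* Hypothetical per-thread arguments: the label index d and a slice [inf, sup) of the shuffled roots. */
typedef struct { int d; int inf; int sup; } range_args_t;

void *visit_roots_range(void *arg) {
    range_args_t *a = arg;
    for (int i = a->inf; i < a->sup; i++) {
        /* Inside RandomizedVisit, rows[v].node_mutex must be taken before a node's
           lbl_start/lbl_end/visited are read or updated, since two threads working
           on the same label can now reach the same node concurrently. */
        RandomizedVisit(a->d, roots[indexes[i]], G);
    }
    pthread_exit(NULL);
}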
