ADM-HW3

This GitHub repository contains the files written for the third homework of the ADM course.

Team Members (Group 19)

  • Clara Lecce
  • Giulia Luciani
  • Luca Mattei
  • Zeeshan Asghar

Files and scripts description

  1. main.ipynb:

    This is the notebook containing the executed solutions to the following points of the homework:

      1. Data collection
         1.1 Get the list of animes
         1.2 Crawl animes
         1.3 Parse downloaded pages
    
      2. Search Engine
         2.1 Conjunctive query
            2.1.1 Create your index!
            2.1.2 Execute the query
         2.2 Conjunctive query & Ranking score
            2.2.1 Inverted index
            2.2.2 Execute the query
    
      3. Define a new score!
    
      5. Algorithmic question
    
  2. functions.py:

The Python script containing the helper functions we wrote to solve the questions.

  # 1 DATA COLLECTION
     # 1.1 Get the list of animes
     def get_link(url_link, file_txt)
     # 1.2 Crawl animes
     def crawl_html(start_index, stop_index=0)
     # 1.3 Parse downloaded pages
        # 1. Anime Name, String
        def get_title(soup)
        # 2. Anime Type, String
        def get_type(soup)
        # 3. Number of episodes, Integer
        def get_num_ep(soup)
        # 4. Release and End Dates of the anime, datetime format
        def get_dates(soup)
        # 5. Number of members, Integer
        def get_memb(soup)
        # 6. Score, Float
        def get_score(soup)
        # 7. Users, Integer
        def get_users(soup)
        # 8. Rank, Integer
        def get_rank(soup)
        # 9. Popularity, Integer
        def get_pop(soup)
        # 10. Synopsis, String
        def get_descr(soup)
        # 11. Related animes, List of strings
        def get_rel_an(soup)
        # 12. Characters, List of strings
        def get_char(soup)
        # 13. Voices, List of strings
        def get_voices(soup)
        # 14. Staff, List of strings
        def get_staff(soup)

  # 2 SEARCH ENGINE
     # 2.1 Conjunctive query
        def download()
        # stems the given string
        def text_mining(string)
        # creates the vocabulary
        def create_vocab()

        # 2.1.1 Create your index!
           # creates the inverted index and stores it in a JSON file
           def invertedIndex()

     # 2.2 Conjunctive query & Ranking score
        # 2.2.1 Inverted index (tf-idf)
           # creates the tf-idf inverted index and stores it in a JSON file
           def invertedIndex_tfidf(vocabulary, inverted_index)
        # 2.2.2 Execute the query
           # returns the top k documents
           def top_k_documents(query, k, inverted_index, inverted_index_tfidf, inverted_doc, vocabulary)
           # computes the cosine similarity
           def search_similarity(query, inverted_index, inverted_index_tfidf, inverted_doc, vocabulary)

  # 3. DEFINE A NEW SCORE!
     # stems the given string
     def text_mining_score(string)
     # computes the new score
     def new_score(query)

  # 5. ALGORITHMIC QUESTION
     # implementation of the algorithm
     def MyAlg(seq, query)
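Section 2.1.2 executes a conjunctive query, which returns only the anime whose synopsis contains every query term. The following is a minimal sketch of that step, not the code in `functions.py`: it assumes a vocabulary mapping each preprocessed term to an integer id and an inverted index mapping each term id to the ids of the documents that contain it, and it assumes the query terms have already been stemmed (as `text_mining` does). The function `conjunctive_query` and the toy data are hypothetical.

```python
# Hypothetical toy data (not the project's real files):
# vocabulary: term -> term id; inverted_index: term id -> ids of documents containing it.
vocabulary = {"saiyan": 0, "warrior": 1, "planet": 2}
inverted_index = {
    0: [1, 2, 5],
    1: [2, 3, 5],
    2: [5, 7],
}


def conjunctive_query(query_terms, vocabulary, inverted_index):
    """Return the sorted ids of the documents that contain every query term."""
    postings = []
    for term in query_terms:
        term_id = vocabulary.get(term)
        if term_id is None:
            # A term that is not in the vocabulary matches no document,
            # so the conjunction (AND) of all terms is empty.
            return []
        postings.append(set(inverted_index[term_id]))
    if not postings:
        return []
    # Intersect starting from the shortest posting list.
    postings.sort(key=len)
    result = postings[0]
    for posting in postings[1:]:
        result &= posting
    return sorted(result)


print(conjunctive_query(["saiyan", "warrior"], vocabulary, inverted_index))  # [2, 5]
```

Intersecting from the shortest posting list keeps the intermediate sets small, and an unknown term ends the search immediately because no document can contain it.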

The files below are only used to exchange data between us:

anime_links.txt: contains the links to the anime pages.

vocabulary.json: contains the vocabulary of all the words in the anime descriptions, preprocessed with the nltk library.
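A vocabulary like the one in vocabulary.json can be built by giving every distinct preprocessed token an integer id. This is a minimal sketch under assumptions of ours: the synopses are already tokenized and stemmed, and ids are assigned in order of first appearance. `create_vocabulary` and the toy documents are hypothetical; the project's `create_vocab()` takes no arguments, so its input handling may differ.

```python
import json


def create_vocabulary(documents):
    """Map every distinct token to an integer id, in order of first appearance."""
    vocabulary = {}
    for tokens in documents:
        for token in tokens:
            if token not in vocabulary:
                vocabulary[token] = len(vocabulary)
    return vocabulary


# Hypothetical, already preprocessed synopses (one token list per anime).
docs = [["saiyan", "warrior", "planet"], ["warrior", "sword"]]
vocab = create_vocabulary(docs)
print(vocab)  # {'saiyan': 0, 'warrior': 1, 'planet': 2, 'sword': 3}

# The same dict can be written to a file with json.dump(vocab, f).
serialized = json.dumps(vocab)
print(serialized)
```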

inverted_index.json: contains the inverted index used by the search engine of section 2.1.
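An inverted index of this kind maps each term id to the documents containing that term. The sketch below is illustrative only: it assumes that document ids are the positions of the synopses in a list and that the vocabulary maps terms to integer ids; `build_inverted_index` and the data are hypothetical, not the `invertedIndex()` function itself.

```python
import json
from collections import defaultdict


def build_inverted_index(documents, vocabulary):
    """Map each term id to the sorted list of ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, tokens in enumerate(documents):
        for token in tokens:
            postings[vocabulary[token]].add(doc_id)
    return {term_id: sorted(doc_ids) for term_id, doc_ids in postings.items()}


# Hypothetical toy data: preprocessed synopses and their vocabulary.
docs = [["saiyan", "warrior", "planet"], ["warrior", "sword"], ["planet", "warrior"]]
vocabulary = {"saiyan": 0, "warrior": 1, "planet": 2, "sword": 3}

index = build_inverted_index(docs, vocabulary)
print(index)  # {0: [0], 1: [0, 1, 2], 2: [0, 2], 3: [1]}

# JSON object keys are always strings, so integer ids must be converted back on load.
restored = {int(k): v for k, v in json.loads(json.dumps(index)).items()}
```

The last line matters when reading the index back from a JSON file: after `json.load`, the term ids are strings such as `"0"` unless they are converted.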

inverted_index_tfidf.json: contains the tf-idf inverted index used by the search engine of section 2.2.
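The README does not say which tf and idf variants the project uses. As an assumption, the sketch below takes one common choice, tf = (occurrences of the term in the document) / (number of tokens in the document) and idf = ln(N / number of documents containing the term), and stores a list of (doc_id, weight) pairs per term. For brevity it keys the index by term rather than by term id; `build_tfidf_index` and the data are hypothetical.

```python
import math
from collections import Counter


def build_tfidf_index(documents):
    """Map each term to a list of (doc_id, tf-idf) pairs.

    Assumed weighting (one common variant):
      tf  = occurrences of the term in the document / number of tokens in it
      idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    index = {}
    for doc_id, tokens in enumerate(documents):
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            index.setdefault(term, []).append((doc_id, tf * idf))
    return index


# Hypothetical preprocessed synopses.
docs = [["saiyan", "warrior", "warrior"], ["warrior", "sword"]]
for term, postings in build_tfidf_index(docs).items():
    print(term, [(d, round(w, 4)) for d, w in postings])
```

With this weighting a term that appears in every document (here "warrior") gets idf 0, so it contributes nothing to the ranking.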

inverted_doc.json: contains the tf-idf vector of every document, used to compute the cosine similarity.
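With per-document tf-idf vectors like those in inverted_doc.json, ranking reduces to the cosine similarity between the query vector and each document vector, cos(q, d) = (q · d) / (‖q‖ ‖d‖). A minimal sketch, assuming sparse vectors stored as {term: weight} dicts and a query in which every term has weight 1 (our assumption; `search_similarity` may weight the query differently). The names and vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical per-document tf-idf vectors, keyed by document id.
doc_vectors = {
    0: {"saiyan": 0.5, "planet": 0.5},
    1: {"sword": 1.0},
}
# Assumption: every (already stemmed) query term gets weight 1.
query_vector = {term: 1.0 for term in ["saiyan"]}

for doc_id, vector in doc_vectors.items():
    print(doc_id, round(cosine_similarity(query_vector, vector), 4))
# 0 0.7071
# 1 0.0
```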

score_dict.json: contains the new documents used to compute the new score.

heap.json: contains an unordered list of scores only.
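Returning the top k documents from unordered scores is a classic use of a heap. The sketch below keeps a min-heap of at most k (score, doc_id) pairs with Python's `heapq`, so each new score only has to beat the current k-th best. It assumes the scores arrive as a dict from document id to similarity; `top_k` and the data are hypothetical and not necessarily how `top_k_documents` works.

```python
import heapq


def top_k(scores, k):
    """Return the k (score, doc_id) pairs with the highest scores, best first."""
    if k <= 0:
        return []
    heap = []  # min-heap: heap[0] is the worst of the k best candidates so far
    for doc_id, score in scores.items():
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))  # pop the worst, push the new one
    return sorted(heap, reverse=True)


# Hypothetical similarity scores, keyed by document id.
scores = {0: 0.71, 1: 0.0, 2: 0.35, 3: 0.9}
print(top_k(scores, 2))  # [(0.9, 3), (0.71, 0)]
```

This runs in O(n log k) time for n documents; `heapq.nlargest(k, ((s, d) for d, s in scores.items()))` gives the same result in one call.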

