html-crawler's Introduction

HTML Crawler

Purpose

This program was created for educational purposes only, as coursework for my "Algorithms and Data Structures" course. It uses no external libraries and none of the built-in data structures. All string-manipulation methods (such as split) were implemented manually, as were all data structures used in the project apart from the standard array. The project demonstrates an understanding of the common algorithms and data structures studied in the course.
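
As an illustration of what implementing a string method by hand can look like, here is a minimal sketch of a split routine that relies only on arrays and loops. The name `ManualSplit` and its two-pass design are assumptions made for this example, not the project's actual code.

```csharp
using System;

// Hypothetical helper: splits text on a single separator character
// without calling string.Split or using any collection classes.
static string[] ManualSplit(string text, char separator)
{
    // First pass: count separators so the result array can be sized exactly.
    int parts = 1;
    for (int i = 0; i < text.Length; i++)
    {
        if (text[i] == separator) parts++;
    }

    // Second pass: cut out each segment and store it in the array.
    // Substring is used for brevity; a stricter version would copy
    // the characters by hand as well.
    string[] result = new string[parts];
    int index = 0;
    int start = 0;
    for (int i = 0; i <= text.Length; i++)
    {
        if (i == text.Length || text[i] == separator)
        {
            result[index++] = text.Substring(start, i - start);
            start = i + 1;
        }
    }
    return result;
}

string[] steps = ManualSplit("html/body/p", '/');
Console.WriteLine(steps.Length);   // 3
Console.WriteLine(steps[2]);       // p
```

Counting first and copying second avoids the need for a growable list, which fits the "standard array only" constraint described above.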

Features

  • Loads HTML documents from a file and builds a general tree in memory
  • Lets the user edit, copy and paste nodes and save the edited document as an HTML document
  • Lets the user visualize the document in the integrated picture box window
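
A general tree, in which each node may have any number of children, can be built without collection classes by linking every node to its first child and its next sibling. The sketch below shows one possible shape for such a node. `HtmlNode`, `AddChild` and `Render` are hypothetical names chosen for this example and are not taken from the project.

```csharp
using System;

// Build <div><div>Text4</div><p>Text5</p></div> by hand.
HtmlNode outer = new HtmlNode("div");
HtmlNode inner = new HtmlNode("div");
inner.Text = "Text4";
HtmlNode para = new HtmlNode("p");
para.Text = "Text5";
outer.AddChild(inner);
outer.AddChild(para);
Console.WriteLine(outer.Render());   // <div><div>Text4</div><p>Text5</p></div>

// Hypothetical node type using the first-child / next-sibling representation.
class HtmlNode
{
    public string Tag;
    public string Text = "";
    public HtmlNode? FirstChild;
    public HtmlNode? NextSibling;

    public HtmlNode(string tag) { Tag = tag; }

    // Appends a child at the end of the sibling chain.
    public void AddChild(HtmlNode child)
    {
        if (FirstChild == null) { FirstChild = child; return; }
        HtmlNode last = FirstChild;
        while (last.NextSibling != null) last = last.NextSibling;
        last.NextSibling = child;
    }

    // Serializes the subtree back to HTML with a depth-first walk.
    public string Render()
    {
        string html = "<" + Tag + ">" + Text;
        for (HtmlNode? c = FirstChild; c != null; c = c.NextSibling)
            html += c.Render();
        return html + "</" + Tag + ">";
    }
}
```

The same depth-first walk that serializes the tree here is the kind of traversal a save or print operation could reuse.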

Supported commands

The examples below use the sample HTML document sample.html.

  • PRINT

    Prints part of the document or the entire document

    Examples:

    PRINT "//" -> prints the entire document

    PRINT "//html/body/p" -> prints <p> Text1 </p> <p> Text2 </p> <p id='p3'> Text3 </p>

    PRINT "//html/body/table/tr/td" -> prints <td> 11 </td> <td> 22 </td> <td> 33 </td> <td> 44 </td>

    PRINT "//html/body/p[2]" -> prints <p> Text2 </p>

    PRINT "//html/body/div/*" -> prints <div>Text4</div><p>Text5</p>

    PRINT "//html/body/div" -> prints <div><div>Text4</div><p>Text5</p></div>

    PRINT "//html/body/p[@id='p3']" -> prints <p id='p3'> Text3 </p>

    PRINT "//html/body/table[@id='table2']/tr[2]/td" -> prints <td> 22 </td>

  • SET

    Replaces the content of every node matched by the path with the text the user has entered

    Examples:

    SET "//html/body/p" "AAA" -> <p> AAA </p> <p> AAA </p> <p id='p3'> AAA </p>

    SET "//html/body/div/div" "<b>Text4</b>" -> <div><b>Text4</b></div>

  • COPY

    Copies one node to another

    Example:

    COPY "//html/body/div/div" "//html/body/table[@id='table2']/tr[2]/td"

  • SAVE

    Saves the tree as an HTML document

  • VISUALIZE

    Renders the content of the document to a BMP image using System.Drawing
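
The path expressions in the PRINT, SET and COPY examples combine a tag name with an optional filter: a 1-based position such as `p[2]`, an attribute test such as `p[@id='p3']`, or the wildcard `*`. Below is a rough sketch of how a single path step might be parsed. `ParseStep` and its tuple result are hypothetical and say nothing about how the project itself does this. The sketch also leans on built-in string helpers that the coursework would have replaced with hand-written ones.

```csharp
using System;

// Hypothetical parser for one path step such as "p", "*", "p[2]" or "p[@id='p3']".
// Returns the tag, a 1-based position (0 = none) and an attribute name/value (null = none).
static (string Tag, int Position, string? AttrName, string? AttrValue) ParseStep(string step)
{
    int open = step.IndexOf('[');
    if (open < 0 || !step.EndsWith(']'))
        return (step, 0, null, null);            // plain tag or "*"

    string tag = step.Substring(0, open);
    string filter = step.Substring(open + 1, step.Length - open - 2);

    if (filter.StartsWith('@'))
    {
        // Attribute filter of the form @name='value'
        int eq = filter.IndexOf('=');
        if (eq < 0) throw new FormatException("Expected @name='value' in: " + step);
        string name = filter.Substring(1, eq - 1);
        string value = filter.Substring(eq + 1).Trim('\'');
        return (tag, 0, name, value);
    }

    // Positional filter: a 1-based index
    return (tag, int.Parse(filter), null, null);
}

var a = ParseStep("p[2]");
var b = ParseStep("table[@id='table2']");
Console.WriteLine(a.Tag + " " + a.Position);                      // p 2
Console.WriteLine(b.Tag + " " + b.AttrName + "=" + b.AttrValue);  // table id=table2
```

A full path such as `//html/body/p[2]` could then be handled by splitting on `/` and applying each parsed step to the children of the nodes matched so far.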

Limitations

  • The HTML document should start with the <html> tag
  • The crawler does not support <head>, <!DOCTYPE> or comments
  • The crawler does not support script tags and style tags

Tech Stack

The HTML Crawler is written entirely in C# on .NET 6. Windows Forms is used for the user interface.

Main look of the program
