This program was created for educational purposes only, as coursework for my "Algorithms and Data Structures" course. It uses no external libraries and none of the built-in data structures: string-processing methods such as Split were implemented manually, and every data structure in the project apart from the standard array was written by hand. The project demonstrates an understanding of the common algorithms and data structures studied in the course.
Features:
- Loading HTML documents from a file and building a general tree in memory
- Editing, copying and pasting nodes, and saving the edited document as an HTML document
- Visualizing the document in the integrated PictureBox window
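The loading step in the first bullet can be sketched as a single left-to-right scan: an opening tag creates a child of the current node and descends into it, a closing tag climbs back to the parent, and text between tags becomes a leaf. The `HtmlNode` and `HtmlTreeBuilder` names are hypothetical, not the project's actual classes. The sketch stores children as first-child/next-sibling links, one common way to hold a general tree without a built-in list, and for brevity it calls .NET string methods such as `IndexOf` and `Substring`, whereas the project implemented its string handling manually.

```csharp
using System;

// Hypothetical node type: a general tree stored as first-child / next-sibling links,
// so any number of children can be held without a built-in collection.
public class HtmlNode
{
    public string Tag;          // element name, or null for a text node
    public string Attributes;   // raw attribute text, e.g. "id='p3'"
    public string Text;         // content of a text node
    public HtmlNode Parent, FirstChild, LastChild, NextSibling;

    public void AppendChild(HtmlNode child)
    {
        child.Parent = this;
        if (FirstChild == null) FirstChild = child;
        else LastChild.NextSibling = child;
        LastChild = child;
    }
}

public static class HtmlTreeBuilder
{
    // Assumes well-formed input where every opening tag has a closing tag
    // (no void elements, comments, <script> or <style>).
    public static HtmlNode Parse(string html)
    {
        var root = new HtmlNode { Tag = "#root" };   // synthetic root above <html>
        HtmlNode current = root;
        int i = 0;
        while (i < html.Length)
        {
            if (html[i] == '<')
            {
                int end = html.IndexOf('>', i);
                if (end < 0) break;                    // unterminated tag: stop parsing
                string inside = html.Substring(i + 1, end - i - 1).Trim();
                if (inside.StartsWith('/'))
                {
                    // Closing tag: climb back up to the parent element.
                    if (current.Parent != null) current = current.Parent;
                }
                else
                {
                    // Opening tag: split the name from the raw attribute text.
                    int space = inside.IndexOf(' ');
                    var element = new HtmlNode
                    {
                        Tag = space < 0 ? inside : inside.Substring(0, space),
                        Attributes = space < 0 ? "" : inside.Substring(space + 1).Trim()
                    };
                    current.AppendChild(element);
                    current = element;                 // descend into the new element
                }
                i = end + 1;
            }
            else
            {
                // Text run: everything up to the next '<', trimmed.
                int next = html.IndexOf('<', i);
                if (next < 0) next = html.Length;
                string text = html.Substring(i, next - i).Trim();
                if (text.Length > 0) current.AppendChild(new HtmlNode { Text = text });
                i = next;
            }
        }
        return root;
    }
}
```

Parsing `<html><body><p> Text1 </p></body></html>` this way yields root, html, body, p and a "Text1" leaf. The parent pointer is what lets a closing tag return to the enclosing element without a separate stack.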
PRINT prints part of the document or the entire document.
Example:
PRINT "//" -> prints the entire document
PRINT "//html/body/p" -> prints <p> Text1 </p> <p> Text2 </p> <p id='p3'> Text3 </p>
PRINT "//html/body/table/tr/td" -> prints <td> 11 </td> <td> 22 </td> <td> 33 </td> <td> 44 </td>
PRINT "//html/body/p[2]" -> prints <p> Text2 </p>
PRINT "//html/body/div/*" -> prints <div>Text4</div><p>Text5</p>
PRINT "//html/body/div" -> prints <div><div>Text4</div><p>Text5</p></div>
PRINT "//html/body/p[@id='p3']" -> prints <p id='p3'> Text3 </p>
PRINT "//html/body/table[@id='table2']/tr[2]/td" -> prints <td> 22 </td>
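The PRINT examples above imply that each path step maps the current set of nodes to their matching children: a plain name keeps every child with that tag, `*` keeps every element child, `[n]` keeps the n-th match, and `[@id='...']` filters by id. A minimal sketch of that evaluation follows; `PathQuery`, `SplitPath`, `Select` and the `Id` field are hypothetical names, and counting `[n]` separately under each parent is an assumption borrowed from XPath. In the spirit of the project, the split on `/` is written by hand and node sets are plain arrays.

```csharp
using System;

// Hypothetical minimal node type (first-child / next-sibling general tree).
public class HtmlNode
{
    public string Tag;    // element name, or null for a text node
    public string Id;     // value of the id attribute, or null
    public HtmlNode FirstChild, NextSibling;
}

public static class PathQuery
{
    // Hand-written split on '/' that drops empty segments: "//html/body" -> { "html", "body" }.
    public static string[] SplitPath(string path)
    {
        var parts = new string[path.Length];   // never more segments than characters
        int count = 0, start = 0;
        for (int i = 0; i <= path.Length; i++)
        {
            if (i == path.Length || path[i] == '/')
            {
                if (i > start) parts[count++] = path.Substring(start, i - start);
                start = i + 1;
            }
        }
        Array.Resize(ref parts, count);
        return parts;
    }

    // Evaluates "name", "*", "name[n]" (1-based, counted per parent) and "name[@id='value']" steps.
    public static HtmlNode[] Select(HtmlNode root, string path)
    {
        HtmlNode[] current = { root };           // "//" alone selects the whole document
        foreach (string step in SplitPath(path))
        {
            string name = step, id = null;
            int index = 0;                       // 0 = no positional filter
            int bracket = step.IndexOf('[');
            if (bracket >= 0)
            {
                name = step.Substring(0, bracket);
                string filter = step.Substring(bracket + 1, step.Length - bracket - 2);
                if (filter.StartsWith("@id=", StringComparison.Ordinal))
                    id = filter.Substring(5, filter.Length - 6);   // strip "@id='" and "'"
                else
                    index = int.Parse(filter);
            }

            var next = new HtmlNode[4];
            int count = 0;
            foreach (HtmlNode parent in current)
            {
                int position = 0;
                for (HtmlNode child = parent.FirstChild; child != null; child = child.NextSibling)
                {
                    if (child.Tag == null) continue;                    // skip text nodes
                    if (name != "*" && child.Tag != name) continue;
                    if (id != null && child.Id != id) continue;
                    position++;
                    if (index != 0 && position != index) continue;
                    if (count == next.Length) Array.Resize(ref next, count * 2);
                    next[count++] = child;
                }
            }
            Array.Resize(ref next, count);
            current = next;
        }
        return current;
    }
}
```

Because every step works on a set rather than a single node, `//html/body/table/tr/td` naturally collects the cells of every row of every table, matching the fourth example.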
SET replaces the content of the selected node(s) with the text the user enters.
Example:
SET "//html/body/p" "AAA" -> <p> AAA </p> <p> AAA </p> <p id='p3'> AAA </p>
SET "//html/body/div/div" "<b>Text4</b>" -> <div><b>Text4</b></div>
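Because a path can match several nodes (the first SET example rewrites all three `<p>` elements), SET can be sketched as a loop over the selected nodes that drops each node's existing children and attaches a new text child. The names are hypothetical, and the sketch handles plain text only; content containing markup, such as `<b>Text4</b>`, would presumably have to go through the same parser used when loading the file, which is not shown here.

```csharp
using System;

// Hypothetical minimal node type (first-child / next-sibling general tree).
public class HtmlNode
{
    public string Tag;     // element name, or null for a text node
    public string Text;    // content of a text node
    public HtmlNode Parent, FirstChild, NextSibling;
}

public static class SetCommand
{
    // Replaces the children of every selected node with a single new text node.
    // A fresh node is created per target, so no node ends up with two parents.
    public static void SetText(HtmlNode[] targets, string content)
    {
        foreach (HtmlNode target in targets)
        {
            // The old children become unreachable and are reclaimed by the garbage collector.
            target.FirstChild = new HtmlNode { Text = content, Parent = target };
        }
    }
}
```

Creating a separate text node for every target matters: if all three paragraphs shared one child object, a later edit to one paragraph would silently change the others.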
COPY copies one node to another.
Example:
COPY "//html/body/div/div" "//html/body/table[@id='table2']/tr[2]/td"
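COPY needs a deep copy: attaching the original source node to a second parent would leave one node in two places in the tree, so edits made through one path would appear in the other. The sketch below uses hypothetical names and assumes the copy is appended as the last child of each target, since the exact placement is not stated above. It duplicates the subtree recursively, rebuilding the sibling chain as it goes.

```csharp
using System;

// Hypothetical minimal node type (first-child / next-sibling general tree).
public class HtmlNode
{
    public string Tag, Attributes, Text;   // Tag is null for text nodes
    public HtmlNode Parent, FirstChild, NextSibling;
}

public static class CopyCommand
{
    // Recursively duplicates a subtree so that edits to the copy never affect the original.
    public static HtmlNode DeepCopy(HtmlNode source, HtmlNode newParent)
    {
        var copy = new HtmlNode
        {
            Tag = source.Tag, Attributes = source.Attributes, Text = source.Text, Parent = newParent
        };
        HtmlNode lastCopied = null;
        for (HtmlNode child = source.FirstChild; child != null; child = child.NextSibling)
        {
            HtmlNode childCopy = DeepCopy(child, copy);
            if (lastCopied == null) copy.FirstChild = childCopy;
            else lastCopied.NextSibling = childCopy;
            lastCopied = childCopy;
        }
        return copy;
    }

    // Assumed COPY semantics: append a separate copy of `source` to each target.
    public static void Copy(HtmlNode source, HtmlNode[] targets)
    {
        foreach (HtmlNode target in targets)
        {
            HtmlNode copy = DeepCopy(source, target);
            if (target.FirstChild == null) { target.FirstChild = copy; continue; }
            HtmlNode last = target.FirstChild;
            while (last.NextSibling != null) last = last.NextSibling;
            last.NextSibling = copy;
        }
    }
}
```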
Saving writes the tree out as an HTML document.
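Saving is the inverse of parsing: a pre-order (depth-first) traversal emits each element's opening tag, then its children in order, then its closing tag, while text nodes are written as-is. This is a sketch with hypothetical names; it writes everything on one line, does not escape characters such as `&` or `<` in text, and uses `StringBuilder` for brevity.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical minimal node type (first-child / next-sibling general tree).
public class HtmlNode
{
    public string Tag, Attributes, Text;   // Tag is null for text nodes
    public HtmlNode FirstChild, NextSibling;
}

public static class HtmlWriter
{
    // Pre-order traversal: opening tag, then each child in order, then closing tag.
    public static void Write(HtmlNode node, StringBuilder output)
    {
        if (node.Tag == null) { output.Append(node.Text); return; }   // text node
        output.Append('<').Append(node.Tag);
        if (!string.IsNullOrEmpty(node.Attributes)) output.Append(' ').Append(node.Attributes);
        output.Append('>');
        for (HtmlNode child = node.FirstChild; child != null; child = child.NextSibling)
            Write(child, output);
        output.Append("</").Append(node.Tag).Append('>');
    }

    // Serializes the tree starting at the <html> element and writes it to `path`.
    public static void Save(HtmlNode documentElement, string path)
    {
        var output = new StringBuilder();
        Write(documentElement, output);
        File.WriteAllText(path, output.ToString());
    }
}
```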
Rendering draws the content of the document into a BMP image using System.Drawing.
Limitations:
- The HTML document must start with the <html> tag
- The crawler does not support <head>, <!DOCTYPE> or comments
- The crawler does not support <script> or <style> tags
The HTML Crawler is written entirely in C# on .NET 6 and uses Windows Forms for the user interface.