robertobrandini / scraping-toolkit

This project forked from otavioalfenas/scraping-toolkit

Scraping-Toolkit is a fast framework for capturing information from web pages

License: GNU General Public License v3.0

C# 100.00%

Scraping-Toolkit

Read this in other languages: English, Portuguese

Overview

Scraping-Toolkit is a fast framework for capturing information from web pages. It can be used to track websites and to extract data from, or insert data into, web pages. It suits a wide range of goals, from data mining to website monitoring and automated testing.

Prerequisites

HTML Agility Pack or higher

Framework or higher

Framework or higher

How to use

.NET Framework

To install the component, run the Install-Package command shown below, or get the package from https://www.nuget.org/packages/Scraping/

Install-Package Scraping

.NET Core

To install the component, run the Install-Package command shown below, or get the package from https://www.nuget.org/packages/Scraping.Core/

Install-Package Scraping.Core

To load a page, you must provide its URL (FromUrl). One option is to let the tool try to identify the components on the page (TryGetComponents).

public void LoadComponents()
{
	var ret = new HttpRequestFluent(true)
		.FromUrl("https://github.com/otavioalfenas/Scraping-Toolkit")
		.TryGetComponents(Scraping.Enums.TypeComponent.LinkButton | Scraping.Enums.TypeComponent.InputHidden)
		.Load();
}

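Or use the asynchronous method:

// Requires: using System.Threading.Tasks;
public async Task LoadComponentsAsync()
{
	// Returning Task instead of void lets callers await the load and observe exceptions.
	var ret = await new HttpRequestFluent(true)
		.FromUrl("https://github.com/otavioalfenas/Scraping-Toolkit")
		.TryGetComponents(Scraping.Enums.TypeComponent.LinkButton | Scraping.Enums.TypeComponent.InputHidden)
		.LoadAsync();
}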
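
The samples above pass several TypeComponent values to TryGetComponents joined with `|`. That usage suggests a flags-style enum, although the toolkit's actual definition is not shown here. The following is a minimal sketch of how such a combination works in C#; `TypeComponentSketch`, its member values, and the `Wants` helper are hypothetical stand-ins, not part of the toolkit.

```csharp
using System;

// Hypothetical stand-in for Scraping.Enums.TypeComponent.
// The real enum's members and numeric values may differ.
[Flags]
public enum TypeComponentSketch
{
    None = 0,
    LinkButton = 1 << 0,
    InputHidden = 1 << 1,
    InputText = 1 << 2,
}

public static class FlagsDemo
{
    // Returns true when the combined 'requested' value includes 'candidate'.
    public static bool Wants(TypeComponentSketch requested, TypeComponentSketch candidate)
        => (requested & candidate) == candidate;

    public static void Main()
    {
        // Each member occupies its own bit, so '|' merges them into one value.
        var requested = TypeComponentSketch.LinkButton | TypeComponentSketch.InputHidden;

        Console.WriteLine(requested);                                        // LinkButton, InputHidden
        Console.WriteLine(Wants(requested, TypeComponentSketch.InputHidden)); // True
        Console.WriteLine(Wants(requested, TypeComponentSketch.InputText));   // False
    }
}
```

Under this assumption, a single argument can carry any subset of component types, which is why the calls above need only one TryGetComponents parameter.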

The tool also includes many extension methods that make parsing easier.

public void AllTags()
{
	var ret = new HttpRequestFluent(true)
		.FromUrl("https://github.com/otavioalfenas/Scraping-Toolkit")
		.Load();
	var byClassContain = ret.HtmlPage.GetByClassNameContains("Box mb-3 Box--");
	var byClassEquals = ret.HtmlPage.GetByClassNameEquals("Box mb-3 Box--condensed");
	var byId = ret.HtmlPage.GetById("readme");
}

Examples

The example below shows all the methods that can be chained before Load. The "test" folder contains many more examples of Load usage and of the extensions. If you have any questions or suggestions, contact us or open an issue so we can improve the tool together.

public void LoadPageFull()
{
	var ret = new HttpRequestFluent(true);
	ret.OnLoad += Ret_OnLoad;
	NameValueCollection parameters = new NameValueCollection();
	parameters.Add("Name", "Value");

	ret.FromUrl("https://github.com/otavioalfenas/Scraping-Toolkit")
		.TryGetComponents(Enums.TypeComponent.ComboBox | Enums.TypeComponent.DataGrid |
						Enums.TypeComponent.Image | Enums.TypeComponent.InputCheckbox |
						Enums.TypeComponent.InputHidden | Enums.TypeComponent.InputText |
						Enums.TypeComponent.LinkButton)
		.RemoveHeader("name")
		.AddHeader("name", "value")
		.KeepAlive(true)
		.WithAccept("Accept")
		.WithAcceptEncoding("Accept-Encoding")
		.WithAcceptLanguage("Accept-Language")
		.WithAutoRedirect(true)
		.WithContentType("ContentType")
		.WithMaxRedirect(2)
		.WithParameters(parameters)
		.WithPreAuthenticate(true)
		.WithReferer("Referer")
		.WithRequestedWith("WithRequestedWidth")
		.WithTimeoutRequest(100)
		.WithUserAgent("User-Agent")
		.Load();
}

private void Ret_OnLoad(object sender, RequestHttpEventArgs e)
{
	// The event args expose the loaded page and the HTTP response.
	var htmlPage = e.HtmlPage;
	var response = e.ResponseHttp;
}
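
The OnLoad subscription above follows the standard .NET event pattern: a handler is attached with `+=` and runs when the load completes. As a minimal sketch of that pattern, the following uses only the base class library; `PageLoaderSketch` and `PageLoadedEventArgs` are hypothetical types for illustration and do not reflect how HttpRequestFluent or RequestHttpEventArgs are implemented.

```csharp
using System;

// Hypothetical event args carrying the loaded markup.
public class PageLoadedEventArgs : EventArgs
{
    public PageLoadedEventArgs(string html) { Html = html; }
    public string Html { get; }
}

// Hypothetical loader that raises OnLoad once a "page" is available.
public class PageLoaderSketch
{
    public event EventHandler<PageLoadedEventArgs> OnLoad;

    public void Load(string html)
    {
        // Invoke every subscribed handler; does nothing when there are none.
        OnLoad?.Invoke(this, new PageLoadedEventArgs(html));
    }
}

public static class EventDemo
{
    public static void Main()
    {
        var loader = new PageLoaderSketch();

        // Subscribe before calling Load so the handler is in place when the event fires.
        loader.OnLoad += (sender, e) => Console.WriteLine("Loaded " + e.Html.Length + " characters");

        loader.Load("<html></html>"); // Loaded 13 characters
    }
}
```

In this sketch, a handler attached after Load has already run would never be called, which is a reason to subscribe first, as the example above does.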

Contribution

Contributions to the project are welcome. Any advice, suggestion, or fix is appreciated. Follow these steps to submit your changes:

  1. Fork the Project;
  2. Create your Feature Branch (git checkout -b branch/Example);
  3. Commit your updates (git commit -m 'Message of any updates that were made to the program');
  4. Request permission to send your branch, then send to your Branch (git push --set-upstream origin branch/Example);
  5. Open a Pull Request.

License

Distributed under the GNU General Public License v3.0. See the LICENSE file for more information.

Contact

Otavio Alfenas: @otavioalfenas

Leandro Klaiber: @leandroklaiber

Acknowledgement

Eduardo Chen - https://www.linkedin.com/in/EduardoChen
Edgard Yamashita - https://www.linkedin.com/in/eguilherme
