Leagify - complete with new name!
Most of the code is in the repository with the old name Leagueify
Scraping information on consensus big boards.
Currently, the program reads this line in scraper.conf to determine what draft year should be scraped:
The year is loaded in the program here:
prospect-scraper-mddb-2022/Program.cs
Lines 27 to 40 in 4b5436b
It would be nice to get this information as an array, then run this scraping job for any values in the array.
An example of how an array is stored in the configuration file is right here:
The configuration uses SharpConfig to read values.
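The array idea above could look roughly like the sketch below. It assumes the years have already been read from scraper.conf as a string array (the SharpConfig call that does this is deliberately left out), and `ScrapeYearRunner`, `NormalizeYears`, and `RunAll` are hypothetical names, not code from the repository.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: runs one scrape per configured draft year.
public static class ScrapeYearRunner
{
    // Trims whitespace, keeps only four-digit numeric years, and removes duplicates.
    public static List<string> NormalizeYears(IEnumerable<string> rawYears)
    {
        return rawYears
            .Select(y => y.Trim())
            .Where(y => y.Length == 4 && y.All(char.IsDigit))
            .Distinct()
            .ToList();
    }

    // Calls the supplied scrape action once per valid year and
    // returns how many years were scraped.
    public static int RunAll(IEnumerable<string> rawYears, Action<string> scrapeOneYear)
    {
        var years = NormalizeYears(rawYears);
        foreach (var year in years)
        {
            scrapeOneYear(year);
        }
        return years.Count;
    }
}
```

The existing single-year scrape would be passed in as `scrapeOneYear`, so the current code path would not need to change much.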
.NET is getting updated more often. This repo currently uses .NET 6 for the runtime and for the GitHub Action that verifies that everything builds properly in pull requests.
Example of the PR action:
Here's the GitHub Actions code:
prospect-scraper-mddb-2022/.github/workflows/dotnet5.yml
Lines 1 to 23 in 82aff16
Here's the commit that made changes from .NET 5 to 6: 38dd342
It would be nice to change the devcontainer as well.
Here's the commit that was used to change from .NET 5 to .NET 6: 7662f93
It seems that .NET 8 will come soon. Maybe November 13-14?
https://devblogs.microsoft.com/dotnet/announcing-dotnet-8-rc1/
We should move to .NET 7 for now, then make similar changes for .NET 8 once version 8 is officially released.
An exception of type 'System.NullReferenceException' occurred in prospect-scraper-mddb-2022.dll but was not handled in user code: 'Object reference not set to an instance of an object.'
at prospect_scraper_mddb_2022.Extensions.HtmlNodeCollectionExtensions.FindProspects(HtmlNodeCollection nodes, String todayString, Dictionary`2& schoolImages) in /workspaces/prospect-scraper-mddb-2022/Extensions/HtmlNodeCollectionExtensions.cs:line 56
at prospect_scraper_mddb_2022.Extensions.StatusContextExtensions.ScrapeYear(StatusContext ctx, HtmlWeb webGet, String scrapeYear, String urlToScrape) in /workspaces/prospect-scraper-mddb-2022/Extensions/StatusContextExtensions.cs:line 77
at prospect_scraper_mddb_2022.Program.<>c__DisplayClass0_0.
I'm not sure why one state was missing a label. It's possible that a school didn't have a state.
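If the cause does turn out to be a school with no state, one defensive option is a lookup that falls back to a placeholder instead of throwing. This is only a sketch: the school-to-state dictionary, `StateLookup`, and `GetStateOrUnknown` are assumptions for illustration, and it has not been confirmed that this is what fails at line 56 of `FindProspects`.

```csharp
using System.Collections.Generic;

// Hypothetical helper: null-safe state lookup for a school.
public static class StateLookup
{
    public const string UnknownState = "Unknown";

    // Returns the state for a school, or a placeholder when the school is null,
    // absent from the map, or mapped to a null or blank value.
    public static string GetStateOrUnknown(
        IReadOnlyDictionary<string, string> schoolToState, string school)
    {
        if (school != null
            && schoolToState.TryGetValue(school, out var state)
            && !string.IsNullOrWhiteSpace(state))
        {
            return state;
        }
        return UnknownState;
    }
}
```

A placeholder such as "Unknown" would also make the affected schools easy to find in the output afterwards.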
prospect-scraper-mddb-2022/Program.cs
Line 117 in 9f9b683
Now that records are being written about prospects, it would be nice to regroup the results by state.
The information that I would like to be present is this:
State, Region, LeagifyPoints, SchoolCount, ProspectCount, Date
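A rough sketch of the regrouping with LINQ follows. The `ProspectRow` shape is a guess at what the prospect records contain, `LeagifyPoints` is assumed to be an integer, and all type and method names here are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shape of one prospect record; the real record may differ.
public record ProspectRow(string School, string State, string Region, int LeagifyPoints, string Date);

// One output row per state, with the columns listed above.
public record StateSummary(string State, string Region, int LeagifyPoints,
                           int SchoolCount, int ProspectCount, string Date);

public static class StateGrouping
{
    // Groups prospects by state (plus region and scrape date), then sums points
    // and counts distinct schools and prospects within each group.
    public static List<StateSummary> GroupByState(IEnumerable<ProspectRow> prospects)
    {
        return prospects
            .GroupBy(p => (p.State, p.Region, p.Date))
            .Select(g => new StateSummary(
                g.Key.State,
                g.Key.Region,
                g.Sum(p => p.LeagifyPoints),
                g.Select(p => p.School).Distinct().Count(),
                g.Count(),
                g.Key.Date))
            .OrderByDescending(s => s.LeagifyPoints)
            .ToList();
    }
}
```

Grouping on the date as well keeps rows from different scrape dates apart if several runs are ever combined into one input.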
The previous version of this program was https://github.com/Leagify/prospect-scraper-dt2021.
In that program, console output for "chatty" stuff was handled better, although it was done by manually writing to files outside of any official logging framework, I think.
The goal here: log the chatty output better, to a file that is overwritten on each run, so the most recent log contains only the info from the last run.
Manual writing to files is probably OK, but if you're looking for a logging framework to use, here is an article discussing possible options.
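For the manual route, a minimal sketch is below. It opens the file with `append: false`, so each run starts from an empty log. `RunLog` and the timestamp format are assumptions for illustration, not part of either scraper.

```csharp
using System;
using System.IO;

// Hypothetical per-run log: the file is truncated each time a RunLog is created.
public sealed class RunLog : IDisposable
{
    private readonly StreamWriter _writer;

    // append: false overwrites any existing file, so the log
    // only ever holds output from the current run.
    public RunLog(string path)
    {
        _writer = new StreamWriter(path, append: false);
    }

    // Writes one timestamped line of chatty output.
    public void Info(string message)
    {
        _writer.WriteLine($"{DateTime.Now:yyyy-MM-dd HH:mm:ss} {message}");
    }

    // Flushes and closes the underlying file.
    public void Dispose()
    {
        _writer.Dispose();
    }
}
```

Wrapping the run in `using var log = new RunLog("scrape.log");` would flush and close the file when the run ends. The file name is only an example.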
Ben Brown of Ole Miss appears to be one of the offenders.
Darnell Jefferies appears to be the other.
I'm not sure if this issue is happening at the scraping level, or at what point the issue was introduced.
Looking at Ben Brown, I think the issue might just be in the 12-31 ranks (Line 410):
Darnell Jefferies's problems seem to go back potentially all the way to the beginning.
Looking at the older ranks, these may not be the only players with this positional issue.
This makes me think that when I'm scraping, I'm getting an element that sometimes contains more than one value. I'll need to debug to verify.
There are two steps to this:
prospect-scraper-mddb-2022/Program.cs
Line 117 in 9f9b683
Now that records are being written about prospects, it would be nice to regroup the results by school.
The information that I would like to be present is this:
School, Conference, LeagifyPoints, ProspectCount, Date
Learn to scrape results from existing records:
https://www.nflmockdraftdatabase.com/nfl-draft-results-2021
Determine whether the draft scraper should be separate or a part of this tool.
One issue that I've noticed is that when scraping, sometimes I'll end up with a date that's in the future.
I assume that at some point I'm using GMT for the time, and I'm working in the evening, so GMT shows a date that is technically tomorrow.
For example, it's still 10/2 where I am, but the scraper assigned the date of 10/3 to the most recent scrape from #60.
It would be nice not to have the future date, and to instead have the date from the local time, if possible.
My thought is that it may be happening here:
prospect-scraper-mddb-2022/Extensions/StatusContextExtensions.cs
Lines 46 to 50 in 029a645
I'm not 100% sure, though.
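If UTC does turn out to be the cause, converting to a specific time zone before formatting the date would avoid the "tomorrow" problem. The sketch below assumes the date is stored as a `yyyy-MM-dd` string. `ScrapeDate` and its methods are hypothetical names, and this is not a copy of what lines 46 to 50 currently do.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: derives the scrape date from local time rather than UTC.
public static class ScrapeDate
{
    // Converts a UTC instant to the given zone and formats only the date part.
    public static string ToLocalDateString(DateTime utcNow, TimeZoneInfo zone)
    {
        var local = TimeZoneInfo.ConvertTimeFromUtc(utcNow, zone);
        return local.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    }

    // Uses the time zone of the machine running the scraper.
    public static string Today()
    {
        return ToLocalDateString(DateTime.UtcNow, TimeZoneInfo.Local);
    }
}
```

One caveat: if the scraper runs in a container or in GitHub Actions, the machine's local zone is often UTC, so passing an explicit `TimeZoneInfo` may be more reliable than `TimeZoneInfo.Local`.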