Comments (7)

serve-and-volley commented on August 22, 2024

@hkakkar04: I'm currently finishing a complete revision of the Python scripts, as well as adding some new ones, and I'm close to scraping all the data from the ATP website from 1877-2016, a total of 187,906 matches.

As for the error, can you provide me the entire command line output? I need to see the exact Federer match where the script broke down. I have a feeling it may be a match where the ATP website has denoted the match score as "W/O", i.e. a "walkover".
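
To illustrate the kind of failure suspected here, below is a minimal sketch of score parsing that treats a walkover as a special case instead of crashing. The helper name `parse_score` and the hyphenated score format ("6-4 7-6(5)") are assumptions made for illustration; the actual scripts and the ATP pages may represent scores differently.

```python
import re

# Hypothetical helper; the "6-4 7-6(5)" score format is an assumption,
# not necessarily what the ATP pages or the repository's scripts use.
SET_PATTERN = re.compile(r"^(\d+)-(\d+)(?:\((\d+)\))?$")


def parse_score(raw):
    """Return a list of (winner_games, loser_games) per set, or None for a walkover."""
    text = raw.strip()
    if text.upper() in {"W/O", "WO"}:
        return None  # walkover: no sets were played, so there is nothing to parse
    sets = []
    for token in text.split():
        match = SET_PATTERN.match(token)
        if match is None:
            # Fail loudly, naming the token, so the offending match is easy to find
            raise ValueError(f"unrecognized set score: {token!r}")
        sets.append((int(match.group(1)), int(match.group(2))))
    return sets
```

Returning `None` for a walkover lets the caller decide whether to skip the match or record it with empty set scores.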

from atp-world-tour-tennis-data.

hkakkar04 commented on August 22, 2024

@serve-and-volley: Hi Kevin, thanks for your reply. Wow, scraping all that data would be a wonderful addition. If you can make that data available once you are done collecting it, that would be great.

Here's the command line output for the error.

image

I commented out the error line (line 270 in this case) and ran the script. It ran, but then stopped on its own after scraping a few matches. Here's the command line output for that; I'm also attaching the CSV that was generated.

image

It looks like the host is dropping the connection after a few seconds of activity. Maybe they have a rate limit, and a sleep command might be handy?
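
Something along these lines is what I mean by a sleep: a minimal sketch that pauses between consecutive requests. The name `paced` and the 2-second default are hypothetical, and whether the host actually enforces a rate limit is only a guess.

```python
import time


def paced(items, delay_seconds=2.0, sleep=time.sleep):
    """Yield items one at a time, pausing before every item except the first."""
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_seconds)  # give the server a break between requests
        yield item


# Usage idea (hypothetical names): wrap the list of match URLs before fetching.
# for url in paced(match_urls, delay_seconds=2.0):
#     scrape(url)
```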

More importantly, though, it appears that the generated data does not cover all of Federer's matches. For instance, there are only 6 entries in 2000 and 5 in 2001.

If you are already developing a script that will include data from all matches, then it's probably not necessary to debug this. I was planning to generate data for all players from 1990 by running the script for each player, but if you can make the data available from 1877, that should take care of everything.
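
A quick way to check coverage like this is to count rows per year in the generated file. The sketch below assumes a CSV whose first column holds the year; that layout and the name `matches_per_year` are assumptions, so adjust `year_column` to match the real file.

```python
import csv
from collections import Counter


def matches_per_year(csv_path, year_column=0):
    """Count rows per year, skipping header or malformed rows."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            # Only count rows whose year cell is present and numeric
            if len(row) > year_column and row[year_column].strip().isdigit():
                counts[int(row[year_column])] += 1
    return dict(sorted(counts.items()))
```

Comparing these counts against the player's activity page for a given season shows which years are incomplete.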

Thanks a lot for your help. Appreciate it totally!

Hemant.
roger-federer_1998-2016.xlsx


serve-and-volley commented on August 22, 2024

@hkakkar04: As for the first error, I'm not sure what's causing it, because when I run the script, I'm not getting the error.

In addition, no matches are being skipped for me, as you can see in this output (I stopped the script at a certain point):

$ python atp_match_data_player.py roger-federer f324 1998 2016
1998 | Basel | Round of 32 | Andre Agassi
1998 | Toulouse | Quarter-Finals | Jan Siemerink
1998 | Toulouse | Round of 16 | Richard Fromberg
1998 | Toulouse | Round of 32 | Guillaume Raoux
1998 | Geneva | Round of 32 | Orlin Stanoytchev
1998 | Gstaad | Round of 32 | Lucas Arnold Ker
1999 | Brest | Finals | Max Mirnyi
1999 | Brest | Semi-Finals | Martin Damm
1999 | Brest | Quarter-Finals | Michael Llodra
1999 | Brest | Round of 16 | Rodolphe Gilbert
1999 | Brest | Round of 32 | Lionel Roux
1999 | Lyon | Round of 32 | Lleyton Hewitt
1999 | Lyon | Round of 64 | Daniel Vacek
1999 | Vienna | Semi-Finals | Greg Rusedski
1999 | Vienna | Quarter-Finals | Karol Kucera
1999 | Vienna | Round of 16 | Jiri Novak
1999 | Vienna | Round of 32 | Vincent Spadea
1999 | Basel | Quarter-Finals | Tim Henman
1999 | Basel | Round of 16 | Alexander Popp
1999 | Basel | Round of 32 | Martin Damm
1999 | Toulouse | Round of 16 | Fabrice Santoro
1999 | Toulouse | Round of 32 | Rainer Schuettler
1999 | Tashkent | Round of 16 | Peter Wessels
1999 | Tashkent | Round of 32 | Cedric Pioline
1999 | Washington | Round of 64 | Bjorn Phau
1999 | Segovia | Round of 16 | Nicolas Escude
1999 | Segovia | Round of 32 | Joaquin Munoz-Hernandez
1999 | SUI V BEL QF | Round Robin | Xavier Malisse
1999 | SUI V BEL QF | Round Robin | Christophe Van Garsse
1999 | Gstaad | Round of 32 | Younes El Aynaoui
1999 | Wimbledon | Round of 128 | Jiri Novak
1999 | London / Queen's Club | Round of 64 | Byron Black
1999 | Surbiton | Semi-Finals | Sargis Sargsian
1999 | Surbiton | Quarter-Finals | Alex O'Brien
1999 | Surbiton | Round of 16 | John van Lottum
1999 | Surbiton | Round of 32 | Maurice Ruah
1999 | Roland Garros | Round of 128 | Patrick Rafter
1999 | Ljubljana | Semi-Finals | Dinu Pescariu
1999 | Ljubljana | Quarter-Finals | Juan Albert Viloca-Puig
1999 | Ljubljana | Round of 16 | Radomir Vasek
1999 | Ljubljana | Round of 32 | Eduardo Medica
1999 | Espinho | Round of 32 | Juan Balcells
1999 | ATP Masters Series Monte Carlo | Round of 64 | Vincent Spadea
1999 | ITA V SUI 1RD | Round Robin | Gianluca Pozzi
1999 | ITA V SUI 1RD | Round Robin | Davide Sanguinetti
1999 | ATP Masters Series Miami | Round of 128 | Kenneth Carlsen
1999 | Grenoble | Round of 16 | Julien Boutter
1999 | Grenoble | Round of 32 | Rodolphe Gilbert
^Z
[1]+  Stopped                 python atp_match_data_player.py roger-federer f324 1998 2016

Also, having the connection dropped by the host is just one of the headaches of scraping data from websites. The web servers typically don't allow web requests to hammer them from a single IP address, which means one has to run the scraping scripts strategically.
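
One common way to cope with dropped connections is to retry with growing pauses. The following is a minimal sketch using only the standard library; `fetch_with_retry`, its parameters, and the delay values are hypothetical and are not taken from the scripts in this repository.

```python
import time
import urllib.request
from urllib.error import URLError


def fetch_page(url, timeout=30):
    """Download one page and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retry(url, fetch=fetch_page, retries=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponentially growing pauses on connection errors."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except (URLError, ConnectionError, TimeoutError):
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the last error
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ... with the defaults
```

The `fetch` and `sleep` parameters are injectable so the retry logic can be exercised without touching the network.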

In any case, I'm finalizing the documentation for the scripts, and I'm currently uploading a bunch of new CSV files.


hkakkar04 commented on August 22, 2024

@serve-and-volley: Thanks for the quick reply, Kevin. It's strange that you aren't getting that error and that the script is downloading all matches for you. I am not sure why I am getting that error. It's completely puzzling. Is there any module that I need to install apart from lxml and maybe Beautiful Soup to run the script?

Anyway, I'm not sure what the problem is. Looking forward to the new files and data from 1877. Thanks for your help.
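
For what it's worth, this is roughly how I'd check which modules are importable on my side. The module list is only my guess at what the script needs, not its confirmed requirements, and `missing_modules` is a made-up helper name.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


# "bs4" is the import name of Beautiful Soup; this list is a guess,
# not the scripts' actual requirements.
print(missing_modules(["lxml", "bs4"]))
```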

Hemant.


serve-and-volley commented on August 22, 2024

@hkakkar04: I have added new Python scripts and all the CSV files from 1877-2017 for the following:

  • Tournament data
  • Match scores data
  • Match stats data

I'll be adding more documentation soon.

I have also been analyzing the data after loading it into a Postgres database, and will be adding details on my results soon as well.


hkakkar04 commented on August 22, 2024

@serve-and-volley: Thanks Kevin! This is great! I was also able to run these scripts without any errors.

I'll try to use these scripts to download data for women's singles matches as well. Thanks again!

Hemant


serve-and-volley commented on August 22, 2024

@hkakkar04: Sounds good! Let me know if you have any other questions 😄
