
bertrandmartel / tableau-scraping


Tableau scraper python library. R and Python scripts to scrape data from Tableau viz

License: MIT License

R 1.26% Python 98.71% Shell 0.03%
tableau r python dataframe web-scraping pandas

tableau-scraping's Introduction

Tableau Scraper


Python library to scrape data from Tableau viz

The R library is under development, but a script is available to get the worksheets (see the R section).

Python

Install

pip install TableauScraper

Usage

Get worksheets data

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/views/PlayerStats-Top5Leagues20192020/OnePlayerSummary"

ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

for t in workbook.worksheets:
    print(f"worksheet name : {t.name}") #show worksheet name
    print(t.data) #show dataframe for this worksheet

Get a specific worksheet

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/views/PlayerStats-Top5Leagues20192020/OnePlayerSummary"

ts = TS()
ts.loads(url)

ws = ts.getWorksheet("ATT MID CREATIVE COMP")
print(ws.data)

Select a selectable item

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/views/PlayerStats-Top5Leagues20192020/OnePlayerSummary"

ts = TS()
ts.loads(url)

ws = ts.getWorksheet("ATT MID CREATIVE COMP")

# show selectable values
selections = ws.getSelectableItems()
print(selections)

# select that value
dashboard = ws.select("ATTR(Player)", "Vinicius Júnior")

# display worksheets
for t in dashboard.worksheets:
    print(t.data)

Set parameter

Get the list of parameters with workbook.getParameters() and set a parameter value using workbook.setParameter("column_name", "value"):

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/views/PlayerStats-Top5Leagues20192020/OnePlayerSummary"

ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

# show parameters values / column
parameters = workbook.getParameters()
print(parameters)

# set parameters column / value
workbook = workbook.setParameter("P.League 2", "Ligue 1")

# display worksheets
for t in workbook.worksheets:
    print(t.data)

It's possible to override the parameter name used in the API requests using inputParameter, which is different from the input name:

wb = wb.setParameter(inputName=None, value="Ligue 1",
                     inputParameter="[Parameters].[P.League (copy)_1642969456470679625]")

Set filter

Get the list of filters with worksheet.getFilters() and set a filter value using worksheet.setFilter("column_name", "value"):

from tableauscraper import TableauScraper as TS

url = 'https://public.tableau.com/views/WomenInOlympics/Dashboard1'
ts = TS()
ts.loads(url)

# show original data for worksheet
ws = ts.getWorksheet("Bar Chart")
print(ws.data)

# get filters columns and values
filters = ws.getFilters()
print(filters)

# set filter value
wb = ws.setFilter('Olympics', 'Winter')

# show the new data for worksheet
countyWs = wb.getWorksheet("Bar Chart")
print(countyWs.data)

More advanced filtering options

  • You can specify dashboardFilter=True to use the dashboard-categorical-filter API instead of the categorical-filter-by-index API.

  • When using dashboardFilter=True, you can skip the filter value check using noCheck=True.

  • You can prevent the membershipTarget property from being sent in setFilter using setFilter('COLUMN','VALUE', membershipTarget=False).

  • You can specify multiple values for filters that support multi-selection using setFilter('COLUMN', ['VALUE1','VALUE2']).

  • You can specify a "filter-delta" filter type by adding the parameter filterDelta=True, as in setFilter('COLUMN','VALUE', filterDelta=True). This discards all filters and adds the one corresponding to ['VALUE'] in this case. This is helpful when all or some filters are selected by default and you want to unselect them. The default behaviour (filterDelta=False) is filter-replace, which sometimes doesn't work when filter multi-selection is possible in the dashboard.

  • As a last resort, you can use the indexValues property to specify the indices directly (if there is a bug in the library or anything else comes up): setFilter('COLUMN', [], indexValues=[0,1,2])

Story points

Some Tableau dashboards have story points that you can navigate. To list the story points and go to a specific one:

from tableauscraper import TableauScraper as TS

url = 'https://public.tableau.com/views/EarthquakeTrendStory2/Finished-Earthquakestory'
ts = TS()
ts.loads(url)
wb = ts.getWorkbook()

print(wb.getStoryPoints())
print("go to specific storypoint")
sp = wb.goToStoryPoint(storyPointId=10)

print(sp.getWorksheetNames())
print(sp.getWorksheet("Timeline").data)
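
The story point listing is nested, so finding an ID from its caption takes a little unpacking. Here is a minimal sketch, assuming the result of getStoryPoints() is a dict whose 'storyPoints' entry is a list of lists of dicts with 'storyPointId' and 'storyPointCaption' keys (the shape printed in one of this project's issue reports). find_story_point_id is a hypothetical helper, not part of the library, and the sample values are illustrative:

```python
def find_story_point_id(story_points, caption):
    """Return the storyPointId whose caption matches (case-insensitive), or None."""
    wanted = caption.strip().lower()
    for group in story_points.get("storyPoints", []):
        for point in group:
            # Captions may carry trailing whitespace or newlines, so normalize both sides.
            if point["storyPointCaption"].strip().lower() == wanted:
                return point["storyPointId"]
    return None


# Sample data in the assumed shape (illustrative values only)
sample = {
    "storyBoard": "DataBehindtheDashboards",
    "storyPoints": [[
        {"storyPointId": 5, "storyPointCaption": "Demographics"},
        {"storyPointId": 6, "storyPointCaption": "Cases Demographics"},
        {"storyPointId": 17, "storyPointCaption": "County Cases and Deaths \n"},
    ]],
}

print(find_story_point_id(sample, "Cases Demographics"))
```

The returned ID could then be passed on as wb.goToStoryPoint(storyPointId=...).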

Level drill Up/Down

On some graphs/tables, there is a drill up/down feature used to zoom in or out of the data:

from tableauscraper import TableauScraper as TS

url = 'https://tableau.azdhs.gov/views/ELRv2testlevelandpeopletested/PeopleTested'
ts = TS()
ts.loads(url)
wb = ts.getWorkbook()

sheetName = "P1 - Tests by Day W/ % Positivity (Both) (2)"

drillDown1 = wb.getWorksheet(sheetName).levelDrill(drillDown=True, position=1)
drillDown2 = drillDown1.getWorksheet(sheetName).levelDrill(drillDown=True, position=1)
drillDown3 = drillDown2.getWorksheet(sheetName).levelDrill(drillDown=True, position=1)

print(drillDown1.getWorksheet(sheetName).data)
print(drillDown2.getWorksheet(sheetName).data)
print(drillDown3.getWorksheet(sheetName).data)

The position parameter defaults to 0. It doesn't seem to be present in the JSON configuration. If the default doesn't work, try incrementing it or check the network tab in Chrome DevTools.

Download CSV data

For Tableau URLs that have the download feature enabled, you can download the full data using:

from tableauscraper import TableauScraper as TS

url = 'https://public.tableau.com/views/WYCOVID-19Dashboard/WyomingCOVID-19CaseDashboard'
ts = TS()
ts.loads(url)
wb = ts.getWorkbook()
data = wb.getCsvData(sheetName='case map')

print(data)

Note that on some Tableau servers, the prefix used in the API URL is different. As it's set in the JavaScript, it must be set manually if it's not the same as on public.tableau.com, like:

wb.getCsvData(sheetName='worksheet1', prefix="vud")

The prefix values I've encountered are vud and vudcsv. The default is vudcsv.
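
When it isn't known which prefix a server uses, one option is to try the known values in turn. The sketch below is an illustration under stated assumptions: it assumes a wrong prefix either raises an exception or returns None or an empty result, which the text above does not confirm. get_csv_with_prefixes is a hypothetical helper, not part of the library; wb is a workbook as returned by ts.getWorkbook().

```python
def get_csv_with_prefixes(wb, sheet_name, prefixes=("vudcsv", "vud")):
    """Try each prefix in turn; return (prefix, data) for the first that yields rows."""
    last_error = None
    for prefix in prefixes:
        try:
            data = wb.getCsvData(sheetName=sheet_name, prefix=prefix)
        except Exception as exc:  # assumption: a wrong prefix may raise
            last_error = exc
            continue
        # assumption: a wrong prefix may also come back as None or empty
        if data is not None and len(data) > 0:
            return prefix, data
    raise RuntimeError(f"no prefix in {prefixes!r} returned data") from last_error
```

A call would look like prefix, data = get_csv_with_prefixes(wb, 'worksheet1'), after which the working prefix can be reused for later downloads from the same server.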

Download Cross Tab data

For Tableau URLs that have the crosstab feature enabled, you can download the crosstab using:

from tableauscraper import TableauScraper as TS

url = "https://tableau.soa.org/t/soa-public/views/USPostLevelTermMortalityExperienceInteractiveTool/DataTable2"

ts = TS()
ts.loads(url)
wb = ts.getWorkbook()

wb.setParameter(inputName="Count or Amount", value="Amount")

data = wb.getCrossTabData(
    sheetName="Data Table 2 - Premium Jump & PLT Duration")

print(data)

Go to sheet

Get the list of all sheets with their subsheets, visible or invisible, and send a go-to-sheet command (dashboard button):

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/views/COVID-19VaccineTrackerDashboard_16153822244270/Dosesadministered"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

sheets = workbook.getSheets()
print(sheets)

nycAdults = workbook.goToSheet("NYC Adults")
for t in nycAdults.worksheets:
    print(f"worksheet name : {t.name}")  # show worksheet name
    print(t.data)  # show dataframe for this worksheet

Render tooltip

Get the tooltip HTML output returned when the render-tooltip-server API is called. This is particularly useful when dealing with server-side-rendered dashboards:

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/views/CMI-2_0/CMI"

ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
ws = workbook.getWorksheet("US Map - State - CMI")

tooltipHtml = ws.renderTooltip(x=387, y=196)
print(tooltipHtml)
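
Since the tooltip comes back as HTML, the text usually has to be pulled out of the markup before it can be used as data. A minimal sketch using only the standard library follows. The sample HTML is made up for illustration (the real tooltip markup may differ), and tooltip_text is a hypothetical helper, not part of the library:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the non-empty text fragments found between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)


def tooltip_text(tooltip_html):
    """Return the visible text fragments of a tooltip HTML string, in order."""
    parser = _TextExtractor()
    parser.feed(tooltip_html)
    parser.close()
    return parser.parts


# Illustrative markup only; inspect the real output of renderTooltip() first.
sample_html = "<div><span>State:</span> <span>Ohio</span><br><span>CMI:</span> <span>1.2</span></div>"
print(tooltip_text(sample_html))
```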

Sample usecases

Server side rendering

If the Tableau URL you're working on uses server-side rendering, data can't be extracted as is.

You can check whether your Tableau URL uses server-side rendering by opening the network tab of the Chrome developer console. You would notice that the API calls have the renderMode property set to render-mode-server.

Server-side rendering means that no data is sent to the browser. Instead, the server renders the Tableau chart using images only and detects selection using mouse coordinates.

To extract the data, one thing that has worked with some Tableau URLs is to trigger a specific filter that is not server-side rendered. You can check the network tab in the Chrome developer console to see whether the filter call uses server-side or client-side rendering (the renderMode property).

If the filter only uses client-side rendering, you can list all filters and apply the filter for each value. This technique only works if the Tableau dashboard has the filter "cleared" by default; otherwise the data is already cached when the dashboard is loaded and, since it's using server-side rendering, you can't access this data.
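
The "apply the filter for each value" step can be sketched as a small loop. This is an illustration rather than a guaranteed recipe: it assumes ws.getFilters() returns dicts with 'column' and 'values' keys and that ws.setFilter() returns a workbook exposing getWorksheet(), as in the filter examples earlier in this README. scrape_each_filter_value is a hypothetical helper name.

```python
def scrape_each_filter_value(ws, column, sheet_name):
    """Apply every value of one filter in turn and collect the resulting data.

    Returns a dict mapping each filter value to the data of `sheet_name`
    after that value has been applied.
    """
    values = next(
        (f["values"] for f in ws.getFilters() if f["column"] == column),
        None,
    )
    if values is None:
        raise ValueError(f"filter column {column!r} not found")

    results = {}
    for value in values:
        wb = ws.setFilter(column, value)  # one API call per filter value
        results[value] = wb.getWorksheet(sheet_name).data
    return results
```

Each iteration triggers a request, so the delayMs setting of the scraper determines how long a full pass over the filter values takes.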

Check out the following repl.it for examples with Tableau URLs using server-side rendering:

Testing Python script

To discover all worksheets, selectable columns and dropdowns, run the prompt.py script under the scripts directory:

git clone git@github.com:bertrandmartel/tableau-scraping.git
cd tableau-scraping/scripts

#get worksheets data
python3 prompt.py -get workbook -url "https://public.tableau.com/views/PlayerStats-Top5Leagues20192020/OnePlayerSummary"

#select a selectable item
python3 prompt.py -get select -url "https://public.tableau.com/views/MKTScoredeisolamentosocial/VisoGeral"

#set a parameter
python3 prompt.py -get parameter -url "https://public.tableau.com/views/COVID-19DailyDashboard_15960160643010/Casesbyneighbourhood"

Settings

The TableauScraper class has the following optional parameters:

Parameter | Default value | Description
logLevel | logging.INFO | log level
delayMs | 500 | minimum delay in milliseconds between API calls
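
To make "minimum delay between API calls" concrete, here is a small throttle sketch. It is not the library's implementation, only an illustration of the idea; MinDelay and its arguments are hypothetical names. The clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time


class MinDelay:
    """Enforce a minimum delay between successive calls (illustrative only)."""

    def __init__(self, delay_ms=500, clock=time.monotonic, sleep=time.sleep):
        self.delay_s = delay_ms / 1000.0
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        """Sleep just long enough that calls are at least delay_ms apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = MinDelay(delay_ms=50)
for i in range(3):
    throttle.wait()  # the first call returns at once, later calls are spaced out
    print("call", i)
```

With the library itself, the settings from the table are passed to the constructor, for example TS(logLevel=logging.DEBUG, delayMs=1000).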

R

Under the R directory:

Rscript tableau.R

R library is under development

Dependencies

requirements.txt

  • pandas
  • requests
  • beautifulsoup4

Stackoverflow Questions

See these Stack Overflow posts about this topic

tableau-scraping's People

Contributors

bertrandmartel


tableau-scraping's Issues

getTupleIds fail to run when `presModel` is None

error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_22296/2734251275.py in <module>
----> 1 province_ws.getTupleIds()

~\.virtualenvs\icu-ZFp6mLxe\lib\site-packages\tableauscraper\TableauWorksheet.py in getTupleIds(self)
    324             columnObj = [
    325                 t
--> 326                 for t in utils.getIndicesInfo(
    327                     presModel, self.name, noSelectFilter=True, noFieldCaption=True
    328                 )

~\.virtualenvs\icu-ZFp6mLxe\lib\site-packages\tableauscraper\utils.py in getIndicesInfo(presModelMap, worksheet, noSelectFilter, noFieldCaption)
    126 
    127 def getIndicesInfo(presModelMap, worksheet, noSelectFilter=True, noFieldCaption=False):
--> 128     genVizDataPresModel = presModelMap["vizData"][
    129         "presModelHolder"
    130     ]["genPresModelMapPresModel"]["presModelMap"][worksheet]["presModelHolder"][

TypeError: 'NoneType' object is not subscriptable

reproduce

from tableauscraper import TableauScraper as TS

url= "https://public.tableau.com/views/moph_covid_v3/Story1"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

province_ws = workbook.getWorksheet("province_total")
province_ws.getTupleIds()

For more, please look at https://github.com/CircleOnCircles/tableau-scraping/tree/fail-getTupleIds

pipenv install
pipenv run python issue.py

notes

  1. The column Prov Name has duplicate values, and they are in the Thai alphabet.
  2. The worksheet.select() API fails as a consequence, e.g. province_ws.select("Prov Name", "สมุทรสาคร")

No data dictionary present in response

Hey Bertrand,

AZDHS just recently changed their dashboards, and the testing Tableau now requires that I take two actions on it to get to the data I need. When I try that, I get a warning that no data dictionary is present in the response.

This could just be my weakness with python.

Thanks for the help.

from tableauscraper import TableauScraper as TS
import pandas as pd
import time
import numpy as np


url = 'https://tableau.azdhs.gov/views/ELR/TestsConducted?%3Aembed=y&'

# In[]:
    
data_list = []
value = 0
df = pd.DataFrame( columns = ["date", "tests"])
sheetName = "P1 - Tests by Day W/ % Positivity (Both)"

ts = TS()
ts.loads(url)
dashboard = ts.getWorkbook().getWorksheet('Date Range Filter').select("Time Frame", "All Time")

d = dashboard.getWorksheet(sheetName).levelDrill(drillDown=False, position=1)

enhancement: workbook_iterate and workbook_flatten

I wrote two higher level functions that could be useful to others if included in your library?

def workbook_iterate(url, **selects):
    "generates combinations of workbooks from combinations of parameters, selects or filters"
def workbook_flatten(wb, date=None, **mappings):
    """return a single DataFrame from a workbook flattened according to mappings
    mappings is worksheetname=columns
    if columns is type str puts a single value into column
    if columns is type dict will map worksheet columns to defined dataframe columns
    if those column names are in turn dicts then the worksheet will be pivoted and the values mapped to columns
    e.g.
    worksheet1="Address", 
    worksheet2=dict(ws_phone="phone", ws_state="State"), 
    worksheet3=dict(ws_state=dict(NSW="State: New South Wales", ...))
    """
    # TODO: generalise what to index by and default value for index

The code and examples of how I'm using it

Used in combination, you can reliably scrape lots of data with not too much code, at least in cases similar to what I've used it for.
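
The "combinations" part of workbook_iterate can be illustrated on its own. The sketch below is not the issue author's code (which is only linked above). It shows just one plausible way to enumerate every combination of keyword choices, with iter_combinations and the sample keywords being hypothetical names:

```python
from itertools import product


def iter_combinations(**choices):
    """Yield one dict per combination of the given keyword -> list of values."""
    names = list(choices)
    for combo in product(*(choices[name] for name in names)):
        yield dict(zip(names, combo))


# Illustrative keywords and values only
for combo in iter_combinations(league=["Ligue 1", "Serie A"], season=["2019", "2020"]):
    print(combo)
```

Each yielded dict could then drive one round of setParameter, select or setFilter calls on a freshly loaded workbook.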

KeyError: 'cstring' when calling setDropdown

Hi! Not sure how much you're maintaining this, but I was playing around with the library for some tableau scraping of COVID data and came across an issue.

Everything worked fantastic until I tried to use setDropdown() on this dashboard. When I do that, I get:

    201 
    202 def getWorksheetCmdResponse(selectedZone, dataFull):
--> 203     cstring = dataFull["cstring"]
    204     details = selectedZone["presModelHolder"]["visual"]["vizData"]
    205 

KeyError: 'cstring'

I think this is happening because the data in question is integer data instead of strings, but I'm not entirely sure what's going on.

I have a testcase that demonstrates the issue at https://colab.research.google.com/drive/14nUWYCvJB1ERsROIwY-AL_ZkIE73GrMU

Persist parameters info payload when calling consecutive commands

From this url https://tableau.ons.org.br/t/ONS_Publico/views/DemandaMxima/HistricoDemandaMxima?:embed=y&:display_count=y&:showAppBanner=true&:showVizHome=y the parameter list is lost when calling consecutive commands for example:

from tableauscraper import TableauScraper as TS

url = 'https://tableau.ons.org.br/t/ONS_Publico/views/DemandaMxima/HistricoDemandaMxima'
ts = TS()
ts.loads(url)
wb = ts.getWorkbook()
print(wb.getParameters())

# Set units
wb = wb.setParameter("Selecione DM Simp 4", "Demanda Máxima Instântanea (MW)")
print(wb.getParameters())

# Set to daily resolution
wb = wb.setParameter("Escala de Tempo DM Simp 4", "Dia")

Output

[{'column': 'Fim Primeiro Período DM Simp 4', 'values': [], 'parameterName': '[Parameters].[Fim Primeiro Período DM Simp 4]'}, {'column': 'Início Primeiro Período DM Simp 4', 'values': [], 'parameterName': '[Parameters].[Início Primeiro Período DM Simp 4]'}, {'column': 'Escala de Tempo DM Simp 4', 'values': ['Ano', 'Mês', 'Semana Operativa', 'Dia'], 'parameterName': '[Parameters].[Escala do Tempo DM Simp 4]'}, {'column': 'Selecione DM Simp 4', 'values': ['Demanda Máxima Horária (MWh/h)', 'Demanda Máxima Instântanea (MW)'], 'parameterName': '[Parameters].[Parameter 1]'}]

[{'column': 'Selecione DM Simp 4', 'values': ['Demanda Máxima Horária (MWh/h)', 'Demanda Máxima Instântanea (MW)'], 'parameterName': '[Parameters].[Parameter 1]'}]

2021-10-03 06:12:45,754 - tableauScraper - ERROR - column Escala de Tempo DM Simp 4 not found

One solution is to keep original info payload along with the info of the cmd response
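
A sketch of that idea, purely as an illustration and not the fix that was actually implemented: keep the parameter list from the initial load and overlay whatever the command response returns, keyed by 'column'. merge_parameters is a hypothetical helper, and the sample lists are trimmed from the output above.

```python
def merge_parameters(original, updated):
    """Overlay parameters from a command response onto the original list.

    Entries are matched on their 'column' key; the original order is kept
    and any new columns are appended at the end.
    """
    merged = {p["column"]: p for p in original}
    for p in updated:
        merged[p["column"]] = p
    return list(merged.values())


original = [
    {"column": "Escala de Tempo DM Simp 4", "values": ["Ano", "Mês", "Semana Operativa", "Dia"]},
    {"column": "Selecione DM Simp 4", "values": ["Demanda Máxima Horária (MWh/h)", "Demanda Máxima Instântanea (MW)"]},
]
from_command = [
    {"column": "Selecione DM Simp 4", "values": ["Demanda Máxima Horária (MWh/h)", "Demanda Máxima Instântanea (MW)"]},
]

merged = merge_parameters(original, from_command)
print([p["column"] for p in merged])
```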

Get selectable items from storypoints items in command response

Storypoints items are not parsed when getting selectable items from a command response

Example:

from tableauscraper import TableauScraper as TS

url = 'https://tableau.ons.org.br/t/ONS_Publico/views/DemandaMxima/HistricoDemandaMxima'
ts = TS()
ts.loads(url)

wb = ts.getWorkbook()
print(wb.getParameters())
ws = wb.getWorksheet("Simples Demanda Máxima Semana Dia")

# Set units
wb = wb.setParameter("Selecione DM Simp 4", "Demanda Máxima Instântanea (MW)")

# Set to daily resolution
wb = wb.setParameter("Escala de Tempo DM Simp 4", "Dia")

# # Set the start date
wb = wb.setParameter("Início Primeiro Período DM Simp 4", "01/01/2017")

# Set the end date
wb = wb.setParameter("Fim Primeiro Período DM Simp 4", "31/12/2017")

# Retrieve daily worksheet
ws = wb.getWorksheet("Simples Demanda Máxima Semana Dia")

print(ws.data[['Data Escala de Tempo 1 DM Simp 4-value',
               'SOMA(Selecione Tipo de DM Simp 4)-value', 'ATRIB(Subsistema)-alias']])

print(ws.getSelectableItems())

Filter data from dropdown

Actually I'm not sure what I should name this topic, but this is what I want: I would like to scrape COVID-19 data for Thailand from this website: https://ddc.moph.go.th/covid19-dashboard/?dashboard=province

You can see that the website hosting the embedded Tableau dashboard can filter by both date (the calendar one) and province (เลือกจังหวัด, "select province" in Thai), but I cannot filter it using Python code.

I use this code to scrape the vaccinated number for a specific date, but I have no idea how to specify the province. When I call wb.getParameters(), there is no province option.

I might not know your library well.
Is there a way to specify both province and date? Please help.

This is my code you can try

from tableauscraper import TableauScraper as TS
province = 'กรุงเทพมหานคร' # this is Bangkok in Thai
url = "https://public.tableau.com/views/SATCOVIDDashboard/2-dash-tiles-province?:showVizHome=no&province=" + province
#url = "https://ddc.moph.go.th/covid19-dashboard/?dashboard=province"
print(url)
print(f"getting data from province : {province}")
ts = TS()
ts.loads(url)
wb = ts.getWorkbook()

parameters = wb.getParameters()
print(parameters)

sheetName = 'D2_Vac1Today'  # sheet that represent vaccinated people on the specific day

# show dataframe with yearly data
ws = wb.getWorksheet(sheetName)
print(ws.data)

# change date
wb = wb.setParameter('param_date', '2021-08-16')
ws = wb.getWorksheet(sheetName)
print(ws.data)

# change date
wb = wb.setParameter('param_date', '2021-08-15')

ws = wb.getWorksheet(sheetName)
print(ws.data)

dashboard.setDropdown() is not working.

dashboard = dashboard.setDropdown("P.League 2", "Ligue 1")

I was testing the setDropdown script and found that the example script is not working. Parser works perfectly on dashboard first loaded data, but it doesn't work on the setDropdown response.

To reproduce the error just execute the script provided in the library (and referenced in this issue).

Hope you can find the issue and fix it.

Thank you very much

How to collect fully vaccinated data?

Hello,

I want to collect COVID vaccination data from Wisconsin (https://www.dhs.wisconsin.gov/covid-19/vaccine-data.htm#day) and Ohio (https://coronavirus.ohio.gov/wps/portal/gov/covid-19/dashboards/covid-19-vaccine/covid-19-vaccination-dashboard). More specifically, I am interested in how many people are fully vaccinated in each county for each race. Using your package, I could successfully collect vaccination data which is at least one dose, but cannot get fully vaccinated data. Can you help me to fix this problem, thanks!

Issue selecting Parameter in Story Point

I'm trying to scrape the data here. In the browser I scroll to the "Cases Demographics" item at the top, which appears to be a Story Point. After selecting that I then select the "Age" radio button. Finally I am able to click Download -> Crosstab -> Excel and get the data I need.

I'm trying to use your library to automate this process. I can see the "Cases Demographics" in the storypoints list that comes back, and I'm able to select that. After selecting the storypoint I can see the different parameters available. But as soon as I select the "Age" parameter (or any of the available parameters) I just get this error and no data:

tableauScraper - WARNING - no data dictionary present in response

Also after selecting the story point, if I call getWorksheetNames() on the storypoint I just get an empty list.

Here is my code:

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/views/NCDHHS_COVID-19_DataDownload_Story_16220681778000/DataBehindtheDashboards"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

print(workbook.getStoryPoints())
sp = workbook.goToStoryPoint(storyPointId=6)

# show parameters values / column
parameters = workbook.getParameters()
print(parameters)
workbook = workbook.setParameter(inputName="Select Demographic", value="Age")
print(sp.getWorksheetNames())

for ws in workbook.worksheets:
    print(f"worksheet name : {ws.name}")
    print(ws.data)

And the output:

{'storyBoard': 'DataBehindtheDashboards', 'storyPoints': [[{'storyPointId': 1, 'storyPointCaption': 'Daily Cases and Deaths Metrics'}, {'storyPointId': 16, 'storyPointCaption': 'Daily Cases and Deaths Metrics by County \n'}, {'storyPointId': 2, 'storyPointCaption': 'Daily Testing Metrics'}, {'storyPointId': 15, 'storyPointCaption': 'Percent Positive by County'}, {'storyPointId': 17, 'storyPointCaption': 'County Cases and Deaths \n'}, {'storyPointId': 4, 'storyPointCaption': 'ZIP Code Cases and Deaths'}, {'storyPointId': 5, 'storyPointCaption': 'Demographics'}, {'storyPointId': 6, 'storyPointCaption': 'Cases Demographics'}, {'storyPointId': 7, 'storyPointCaption': 'Outbreaks and Clusters'}, {'storyPointId': 8, 'storyPointCaption': 'Personal Protective Equipment (PPE)'}, {'storyPointId': 9, 'storyPointCaption': 'Hospital Patient Data'}, {'storyPointId': 10, 'storyPointCaption': 'Hospital Beds and Ventilators'}, {'storyPointId': 11, 'storyPointCaption': 'Hospital Demographics'}, {'storyPointId': 12, 'storyPointCaption': 'Vaccinations - Doses by County '}, {'storyPointId': 13, 'storyPointCaption': 'People Vaccinated by County '}, {'storyPointId': 14, 'storyPointCaption': 'People Vaccinated Demographics'}, {'storyPointId': 18, 'storyPointCaption': 'Wastewater Monitoring'}]]}
2021-12-08 17:14:01,366 - tableauScraper - WARNING - no data dictionary present in response
[{'column': 'Select Demographic', 'values': ['Race', 'Ethnicity', 'Age', 'Gender', 'Birth through K-12'], 'parameterName': '[Parameters].[Parameter 2 1]'}]
2021-12-08 17:14:03,302 - tableauScraper - WARNING - no data dictionary present in response
[]

setFilter() returning unexpected data

Hi Bertrand

I am continuing with the scrape you kindly helped with on stack overflow but have come across some strange behavior. The following code sets parameters and a filter to display hourly generation for a power plant named 'ABAÚNA' but after setting the filter the data for 'AIMORÉS' is returned. Do I have to load an intermediate worksheet before applying the filter?

from tableauscraper import TableauScraper as TS

url = "https://tableau.ons.org.br/t/ONS_Publico/views/GeraodeEnergia/HistricoGeraodeEnergia"
ts = TS()

ts.loads(url)
wb = ts.getWorkbook()
ws = wb.getWorksheet("Simples Geração de Energia Barra Semana")

# Set resolution
print(f"Set resolution to Hora")
wb.setParameter("Escala de Tempo GE Simp 4", "Hora")

# Set the start date
print(f"Set start date to 01/01/2016")
wb.setParameter("Início Primeiro Período GE Simp 4",f"01/01/2016")

# Set the end date
print(f"Set end date to 01/01/2017")
wb = wb.setParameter("Fim Primeiro Período GE Simp 4",f"01/01/2017")

# Get plant names
usina = [
    t["values"]
    for t in ws.getFilters()
    if t["column"] == "USINACEG"
][0]

# Set plant name to 'ABAÚNA'
plantName=" ABAÚNA (CEG: CGH.PH.RS.000015-9.01)"
print(f"Set plant to {plantName}")
wb = ws.setFilter("USINACEG",plantName,filterDelta=True)

# Retrieve daily worksheet
ws = wb.getWorksheet("Simples Geração de Energia Dia")

# Show plants
print(ws.data)  # Contains data for 'AIMORÉS'

add missing method for command `select-region-no-return-server`

this is the example curl of it.

curl 'https://public.tableau.com/vizql/w/moph_covid_v3/v/Story1/sessions/42D45C5A5CE64D568F6A9FC5E4B49538-0:0/commands/tabsrv/select-region-no-return-server' \
  -H 'authority: public.tableau.com' \
  -H 'sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Microsoft Edge";v="96"' \
  -H 'x-tsi-active-tab: Story%201' \
  -H 'x-newrelic-id: XA4CV19WGwIBV1RVBQQBUA==' \
  -H 'x-xsrf-token: VKwCIrLfvwR3CEuxIjzKRObYjUdnikog' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62' \
  -H 'content-type: multipart/form-data; boundary=fHyp2ZJG' \
  -H 'accept: text/javascript' \
  -H 'x-requested-with: XMLHttpRequest' \
  -H 'sec-ch-ua-platform: "Windows"' \
  -H 'origin: https://public.tableau.com' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://public.tableau.com/views/moph_covid_v3/Story1?%3Adisplay_static_image=y&%3AbootstrapWhenNotified=true&%3Aembed=true&%3Alanguage=en-US&:embed=y&:showVizHome=n&:apiID=host0' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'cookie: _ga=GA1.2.1052995761.1640085245; _ga=GA1.3.1052995761.1640085245; _gcl_au=1.1.1745605423.1640093530; _fbp=fb.1.1640093531449.1204649223; seerid=3063ccc8-baa5-438d-8b18-3e590ecba6dc; ELOQUA=GUID=3A902EAE177F48259C6773185FBA384F; tableau_public_negotiated_locale=en-us; _gid=GA1.2.1491619847.1640332454; _gid=GA1.3.1491619847.1640332454; tableau_locale=en; tableau_xsrf_token=yWJgBuoqlShwu1eBrV0qMyZksxxH7ew4; tableau_client_id=2b11338d-0ee2-40df-980c-ffc7b1bc48ec; tableau_access_token=heh8YOUFSt2k0LtQxXikyQ|NC3bh1ap1NvqxvBbPurVV3GWrzUuU8eQ; tableau_refresh_token="3PtDoccuQYSeq2+QE1UQfQ==:OukH9ADVg9lhCkWvgw0qApSHlyYoPMC0"; workgroup_session_id=null; XSRF-TOKEN=VKwCIrLfvwR3CEuxIjzKRObYjUdnikog' \
  --data-raw $'--fHyp2ZJG\r\nContent-Disposition: form-data; name="visualIdPresModel"\r\n\r\n{"worksheet":"province_total","dashboard":"Dashboard_Province_index_new_v3","storyboard":"Story 1","storyPointId":12}\r\n--fHyp2ZJG\r\nContent-Disposition: form-data; name="vizRegionRect"\r\n\r\n{"x":120,"y":130,"w":0,"h":0,"r":"yheader"}\r\n--fHyp2ZJG\r\nContent-Disposition: form-data; name="mouseAction"\r\n\r\nsimple\r\n--fHyp2ZJG\r\nContent-Disposition: form-data; name="zoneSelectionType"\r\n\r\nreplace\r\n--fHyp2ZJG\r\nContent-Disposition: form-data; name="dashboardPm"\r\n\r\n{"sheetName":"Dashboard_Province_index_new_v3","isDashboard":true,"storyboard":"Story 1","storyPointId":12}\r\n--fHyp2ZJG\r\nContent-Disposition: form-data; name="zoneId"\r\n\r\n96\r\n--fHyp2ZJG--\r\n' \
  --compressed

this selection happens when I click on the list. It will filter data.

reproduce

on https://public.tableau.com/app/profile/karon5500/viz/moph_covid_v3/Story1
click on any items on the rightmost list.

KeyError: 'zones' returned by getWorkbook()

This is occurring with: https://public.tableau.com/views/StatebasedbuildingapprovalsABSNatHERS/NatHERSCertificatesvs_ABSBuildingApprovals

I attempted to fix the issue by checking whether zones existed and returning an empty string where zones is looked up. I imagine this causes other issues, though, as the only data I could find afterwards was what getSheets() returned.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)

/tmp/ipykernel_78053/2712717284.py in describe_tableau(ts, lines, url)
     10     workbook = ""
     11 
---> 12     workbook = ts.getWorkbook()
     13 
     14 

~/.local/lib/python3.9/site-packages/tableauscraper/TableauScraper.py in getWorkbook(self)
     98 
     99     def getWorkbook(self) -> TableauWorkbook:
--> 100         return dashboard.getWorksheets(self, self.data, self.info)
    101 
    102     def getWorksheet(self, worksheetName) -> TableauWorksheet:

~/.local/lib/python3.9/site-packages/tableauscraper/dashboard.py in getWorksheets(TS, data, info)
     53         worksheets = utils.listWorksheet(presModelMapVizData)
     54     elif presModelMapVizInfo is not None:
---> 55         worksheets = utils.listWorksheetInfo(presModelMapVizInfo)
     56         if len(worksheets) == 0:
     57             worksheets = utils.listStoryPointsInfo(presModelMapVizInfo)

~/.local/lib/python3.9/site-packages/tableauscraper/utils.py in listWorksheetInfo(presModel)
     42 
     43 def listWorksheetInfo(presModel):
---> 44     zones = presModel["workbookPresModel"]["dashboardPresModel"]["zones"]
     45     return [
     46         zones[z]["worksheet"]

KeyError: 'zones'
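
The guard described in this report (checking whether zones exists before reading it) can be sketched as below. This only avoids the exception; as the reporter notes, it does not by itself recover any worksheet data. It is a hypothetical stand-alone function, not the library's listWorksheetInfo, and it assumes zones is a dict of zone dicts as the traceback suggests:

```python
def list_worksheet_names(pres_model):
    """Return worksheet names from dashboard zones, or [] if 'zones' is absent."""
    dashboard = pres_model.get("workbookPresModel", {}).get("dashboardPresModel", {})
    zones = dashboard.get("zones") or {}
    return [
        zone["worksheet"]
        for zone in zones.values()
        if isinstance(zone, dict) and "worksheet" in zone
    ]


# Illustrative inputs in the assumed shape
with_zones = {"workbookPresModel": {"dashboardPresModel": {"zones": {
    "1": {"worksheet": "Sheet A"},
    "2": {"zoneType": "text"},
}}}}
without_zones = {"workbookPresModel": {"dashboardPresModel": {}}}

print(list_worksheet_names(with_zones))
print(list_worksheet_names(without_zones))
```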

Scraping tableau data based on data filtered with dropdown boxes in non existent worksheet columns

I'm trying to scrape data from a tableau dashboard on this website

The package itself works amazingly, but I'd also like to filter the data from the map (the data appears in the worksheet Mapa) by year (Año) and sex (Sexo), based on the dropdown boxes on the dashboard. I suspect that such filtering can be done through the setFilter() function with the argument dashboardFilter=True, but I am having trouble implementing it and/or am not quite understanding how it works.

I think the problem is because the names of the filters on the dashboard differ from the names of the columns in the actual worksheets and in some cases, do not even exist in the specific worksheet I'm looking for. Is there a way around this? I would be very grateful for any assistance or insight.

EDIT: The issue is very similar to #7 but I'm struggling to find the right fields to filter the data I would like (or am completely missing something)

Thank you

from tableauscraper import TableauScraper as TS
import pandas as pd

url = "https://public.tableau.com/views/DashboardRegional_15811027307400/DashboardRegional?:embed=y&:showVizHome=no&:host_url=https%3A%2F%2Fpublic.tableau.com%2F&:embed_code_version=3&:tabs=no&:toolbar=no&:animate_transition=yes&:display_static_image=no&:display_spinner=no&:display_overlay=yes&:display_count=yes&publish=yes&:loadOrderID=0"

ts = TS()
ts.loads(url)
wb=ts.getWorkbook()
sheetName = "Mapa"

ws = wb.getWorksheet(sheetName)
print(ws.data)
# I cannot filter by these categories (the drop down boxes in the dashboard)
# wb = ws.setFilter("Sexo", "Hombres", dashboardFilter=True)
# ws = wb.getWorksheet(sheetName)
# wb = ws.setFilter("Año", 2020, dashboardFilter=True)
# ws = wb.getWorksheet(sheetName)
# print(ws.data)

Can't set a filter to a value that's not in the defined list for that filter

I wrote the code below to get around this issue, but it would be useful if this were a flag I could pass to setFilter.
It also doesn't use the ordinal value, which would be another useful feature to have.

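# imports assumed by the snippet (not shown in the original post)
import json
import time

import requests
import tableauscraper

def force_setFilter(wb, ws_name, columnName, values):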
    "setFilter but ignore the listed filter options. also gets around wrong ordinal value which makes index value incorrect"

    scraper = wb._scraper
    tableauscraper.api.delayExecution(scraper)
    ws = next(ws for ws in wb.worksheets if ws.name == ws_name)

    filter = next(
        {
            "globalFieldName": t["globalFieldName"],
        }
        for t in ws.getFilters()
        if t["column"] == columnName
    )

    payload = (
        ("dashboard", scraper.dashboard),
        ("globalFieldName", (None, filter["globalFieldName"])),
        ("qualifiedFieldCaption", (None, columnName)),
        ("membershipTarget", (None, "filter")),
        ("exclude", (None, "false")),
        ("filterValues", (None, json.dumps(values))),
        ("filterUpdateType", (None, "filter-replace"))
    )
    try:
        r = scraper.session.post(
            f'{scraper.host}{scraper.tableauData["vizql_root"]}/sessions/{scraper.tableauData["sessionid"]}/commands/tabdoc/dashboard-categorical-filter',
            files=payload,
            verify=scraper.verify
        )
        scraper.lastActionTime = time.time()

        if r.status_code >= 400:
            raise requests.exceptions.RequestException(r.content)
        resp = r.json()
        errors = [
            res['commandReturn']['commandValidationPresModel']['errorMessage']
            for res in resp['vqlCmdResponse']['cmdResultList']
            if not res['commandReturn'].get('commandValidationPresModel', {}).get('valid', True)
        ]
        if errors:
            wb._scraper.logger.error(str(", ".join(errors)))
            raise tableauscraper.api.APIResponseException(", ".join(errors))

        wb.updateFullData(resp)
        return tableauscraper.dashboard.getWorksheetsCmdResponse(scraper, resp)
    except ValueError as e:
        scraper.logger.error(str(e))
        return tableauscraper.TableauWorkbook(
            scraper=scraper, originalData={}, originalInfo={}, data=[]
        )
    except tableauscraper.api.APIResponseException as e:
        wb._scraper.logger.error(str(e))
        return tableauscraper.TableauWorkbook(
            scraper=scraper, originalData={}, originalInfo={}, data=[]
        )
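
The multipart payload built in force_setFilter can be factored into a small helper, which makes the request body easy to inspect before it is posted. This is only a sketch: `build_filter_payload` is a hypothetical name, not part of tableauscraper, and the field names are simply copied from the snippet above. The argument values in the usage lines are placeholders.

```python
import json


def build_filter_payload(dashboard, global_field_name, column_name, values):
    """Build the multipart fields used by the snippet above (hypothetical helper)."""
    return (
        ("dashboard", dashboard),
        ("globalFieldName", (None, global_field_name)),
        ("qualifiedFieldCaption", (None, column_name)),
        ("membershipTarget", (None, "filter")),
        ("exclude", (None, "false")),
        # the filter values are sent as a JSON-encoded list
        ("filterValues", (None, json.dumps(values))),
        ("filterUpdateType", (None, "filter-replace")),
    )


# Placeholder arguments, for illustration only.
payload = build_filter_payload("My Dashboard", "[ds].[field]", "Region", ["2 - Baton Rouge"])
fields = dict(payload)
print(fields["filterValues"][1])
```

Keeping the payload separate from the HTTP call also makes it possible to check the encoded values without opening a session.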

Specific worksheets only?

Hey @bertrandmartel , this is such an awesome library! Easy to use and super robust.

Quick question: is there a way to get only a specific worksheet when getting a dashboard (rather than looking through dashboard.worksheets afterwards)?

How to scrape the interactive map?

Hi, thanks for this awesome toolkit. I used your library to scrape data from an interactive map, but the data I get back is incomplete.

I want to scrape every county's vaccine demographic information shown on this page: https://www.dhs.wisconsin.gov/covid-19/vaccine-data.htm#day

I used the following code to get the sheet.

from tableauscraper import TableauScraper as TS
url = 'https://bi.wisconsin.gov/t/DHS/views/VaccinesAdministeredtoWIResidents/VaccinatedWisconsin-County'
ts = TS()
ts.loads(url)
dashboard = ts.getDashboard()
state = 'Wisconsin'
for t in dashboard.worksheets:
    #show worksheet name
    print(f"WORKSHEET NAME : {t.name}")
    #show dataframe for this worksheet
    # print(t.data)
    t.data.to_csv('TMP/'+state + '_' + t.name.replace('/', '_') + '.csv')

It returned a sheet named Wisconsin_Race vax_unvax county.csv. However, this sheet contains only the state-level totals. How can I get each county's information with your package?

Thanks.

Filter grabs incorrect data (Similar to Issue #6)

First, thanks for putting this together. It's likely this problem is either user error on my part or weirdness in how the worksheet I'm trying to scrape is set up.

I'm trying to get filtered data from https://analytics.la.gov/t/LDH/views/covid19_hosp_vent_reg/Hosp_vent_c. The options listed in the dropdown on the visualization are "(All)" and then the nine regions. getFilters() returns just the nine regions, and when using setFilter() the data returned is for the prior region in the index.

For example. This code:

from tableauscraper import TableauScraper as TS

url = 'https://analytics.la.gov/t/LDH/views/covid19_hosp_vent_reg/Hosp_vent_c'
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

sheets = workbook.getSheets()
ws = ts.getWorksheet('Hospitalization and Ventilator Usage')
wb = ws.setFilter('Region', '2 - Baton Rouge')
regionWs = wb.getWorksheet('Hospitalization and Ventilator Usage')
print(regionWs.data)

Returns the results for '1 - New Orleans' which is in the prior index position.

Setting the filter to the first region ('1 - New Orleans') results in KeyError: 'dataDictionary' when trying to access its data.

The issue appears to be similar to one addressed in Issue #6, except with filters rather than selections. In this case, getSelectableItems() returns the data displayed on the chart when "(All)" is selected.

I'm using tableauscraper version 0.1.8.
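
One way to picture the symptom described above is an off-by-one between the dropdown, which lists "(All)" first, and the list returned by getFilters(), which does not. The sketch below is purely illustrative and is not the library's code: `dropdown` and `filter_values` are made-up lists (only the first two region names come from the report), and whether this is the actual cause is an assumption.

```python
# Hypothetical data: "3 - Region C" is a placeholder name.
dropdown = ["(All)", "1 - New Orleans", "2 - Baton Rouge", "3 - Region C"]
filter_values = dropdown[1:]  # what getFilters() reportedly returns (no "(All)")


def lookup_with_offset_bug(value):
    """Take the index from filter_values but apply it to the dropdown list."""
    index = filter_values.index(value)
    return dropdown[index]


print(lookup_with_offset_bug("2 - Baton Rouge"))
print(lookup_with_offset_bug("1 - New Orleans"))
```

Under this assumption, asking for the second region yields the first, and asking for the first region lands on "(All)", which is at least consistent with the behaviour reported here.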

scraping workbook, NOT worksheet with selectables

Hi everyone,
I think I have found a Tableau viz whose selectables are not in a specific worksheet but in the workbook instead, and I do not know how to scrape all the years. I tried this code to select the different years, but when I look for the filters it gives back an empty list. Probably the filters are set at the workbook level and not at the worksheet level; however, 'TableauWorkbook' object has no attribute 'getFilters'.

Could you please help me with that?

This is the table I'm referring to: https://public.tableau.com/views/SocialistPartyScandinavianFederation/Story1

And this is the code I used to scrape the data (as you can see, I can only scrape the data from 1916, as it is set as the default year in the table).

from tableauscraper import TableauScraper as TS

url= "https://public.tableau.com/views/SocialistPartyScandinavianFederation/Story1"

ts = TS()
ts.loads(url)
wb = ts.getWorkbook()

for t in wb.worksheets:
    print(f"worksheet name : {t.name}") #show worksheet name
    print(t.data) #show dataframe for this worksheet
    
ws = ts.getWorksheet("tab1")

# show selectable values
print(ws.getSelectableItems())  # does not show years
print(ws.getFilters()) #empty
print(wb.getParameters())  #empty

How to deal with server side rendered tableau data ?

Thanks for the work to put the R and python scripts together for this. I'm relatively experienced in R but less so in Python - apologies if the below is relatively simple user error.

OS: Mac 11.0.1

Target dashboard:
https://public.tableau.com/profile/football.observatory#!/vizhome/InstatIndexRanking/Instatindex

Alternative URL (redirects to the above, but is similar in structure to the examples you provided and did offer different results in R):

https://public.tableau.com/views/InstatIndexRanking/Instatindex

Issues in R:

Using the primary URL:

data <- body %>%
    html_nodes("textarea#tsConfigContainer") %>%
    html_text()

returns

character(0)

and nothing below that works as a result.

Using the alternate URL above, step by step the script seems to work ok until:

data <- fromJSON(extract[1,3])

Which results in:

> data
$secondaryInfo
list()

FWIW, data <- fromJSON(extract[1,2]) has tons of info in it (e.g. worksheet names, IDs, etc), but I couldn't find anything to fully satisfy needs lower down in the script.

In Python, unfortunately I can't offer much in the way of debugging, but with the alternate URL I get the below error.

Traceback (most recent call last):
  File "/Users/chris/Documents/tableau-scraping-master/scripts/tableau_specific_sheet.py", line 6, in <module>
    ts.loads(url)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/tableauscraper/TableauScraper.py", line 81, in loads
    presModelMap = self.data["secondaryInfo"]["presModelMap"]
KeyError: 'presModelMap'

Thanks so much for any insight you can provide.

measured values column is not extracted

Hi @bertrandmartel, I ran into a new problem when using TS to scrape data from a New York State Tableau dashboard. Its URL is: https://covid19vaccine.health.ny.gov/vaccine-demographic-data

The original data look like this:

image

I tried the following code to scrape the data:

from tableauscraper import TableauScraper as TS

url = "https://covid19tracker.health.ny.gov/views/Race_Ethnicity_Public/RacebyCounty"

ts = TS()
ts.loads(url)

workbook = ts.getWorkbook()

parameters = workbook.getParameters()
print(parameters)

ts = TS()
ts.loads(url)

# set parameters column / value
workbook = workbook.setParameter('Show Value as', "Number")
# display worksheets
workbook.getWorksheet('Race').data

But what I get is this:

image

So the detailed values are replaced with %all%. I don't understand why. Do you have any suggestions?

Thanks very much!

Received error in TableauScraper.loads() method when scraping public tableau site

code:

from tableauscraper import TableauScraper as TS

url = "https://covid.cdc.gov/covid-data-tracker/#nationwide-blood-donor-seroprevalence"
ts = TS()
ts.loads(url)

error:

File "/usr/local/lib/python3.9/site-packages/tableauscraper/TableauScraper.py", line 78, in loads
    soup.find("textarea", {"id": "tsConfigContainer"}).text
AttributeError: 'NoneType' object has no attribute 'text'

Thanks!
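
The AttributeError above means the fetched page contained no textarea with id tsConfigContainer, which is the element the traceback shows loads() reading. A quick standard-library check like the following can confirm that before calling the scraper. `has_ts_config` is a hypothetical helper, not part of tableauscraper, and the two HTML strings are made up for illustration.

```python
from html.parser import HTMLParser


class _ConfigFinder(HTMLParser):
    """Record whether a <textarea id="tsConfigContainer"> start tag is seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "textarea" and dict(attrs).get("id") == "tsConfigContainer":
            self.found = True


def has_ts_config(html):
    """Return True if the HTML contains the textarea that loads() looks for."""
    finder = _ConfigFinder()
    finder.feed(html)
    return finder.found


# Made-up pages: one shaped like a Tableau viz page, one like a wrapper page.
viz_page = '<html><body><textarea id="tsConfigContainer">{}</textarea></body></html>'
other_page = "<html><body><div id='app'></div></body></html>"
print(has_ts_config(viz_page), has_ts_config(other_page))
```

If the check fails, the URL probably points at a page that embeds or wraps the viz rather than at the viz itself.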

forcing client rendering with :render=true

👋

It looks like the tableau client application will do something like detect the :render=true query param and then make requests to a different command endpoint (select-region-no-return-server) to get the data back in a format for client rendering.

Not sure how best to incorporate this but maybe a clientRender option in TableauScraper that adds the query param to the initial request and then uses select-region-no-return-server instead of select.

Example url:

https://dashboards.doh.nj.gov/vizql/w/DailyConfirmedCaseSummary7_22_2020/v/ConfirmedCases/sessions/98290EE86BE74F8A99259334C27502E2-1:0/commands/tabsrv/select-region-no-return-server

I found this while toying with a node.js port of this library and using some of the functions with playwright to observe how things were working. I'd be happy to collaborate! Getting a scraper working for the dashboard I'm currently focused on has been a wild ride, and am quite grateful for your work on this project.
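
Adding the query parameter itself is straightforward. The sketch below appends :render=true to a URL with the standard library; `with_client_render` is a hypothetical helper, the example.com URL is a placeholder, and switching to the select-region-no-return-server endpoint (the second half of the suggestion above) is not covered here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_client_render(url):
    """Return url with :render=true set in its query string."""
    parts = urlsplit(url)
    # keep existing parameters, dropping any previous :render value
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != ":render"
    ]
    query.append((":render", "true"))
    # safe=":" keeps the leading colon of Tableau-style parameters unescaped
    return urlunsplit(parts._replace(query=urlencode(query, safe=":")))


# Placeholder URL, for illustration only.
print(with_client_render("https://example.com/views/Workbook/Sheet?:embed=y"))
```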

Preferred citation?

Absolutely zero rush on this, but if you have a preferred citation we'd like to credit this package in published work.

Can't set a Parameter if not listed in getParameters()

I wrote this code to get around it, but it would be better if this were a flag or something similar.

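# imports assumed by the snippet (not shown in the original post)
import time

import requests
import tableauscraper

def force_setParameter(wb, parameterName, value):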
    "Allow for setting a parameter even if it's not present in getParameters"
    scraper = wb._scraper
    tableauscraper.api.delayExecution(scraper)
    payload = (
        ("fieldCaption", (None, parameterName)),
        ("valueString", (None, value)),
    )
    r = scraper.session.post(
        f'{scraper.host}{scraper.tableauData["vizql_root"]}/sessions/{scraper.tableauData["sessionid"]}/commands/tabdoc/set-parameter-value',
        files=payload,
        verify=scraper.verify
    )
    scraper.lastActionTime = time.time()
    if r.status_code >= 400:
        raise requests.exceptions.RequestException(r.content)
    resp = r.json()

    wb.updateFullData(resp)
    return tableauscraper.dashboard.getWorksheetsCmdResponse(scraper, resp)
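
Both workaround functions on this page build the command URL with the same f-string pattern, differing only in the command name. A tiny helper makes that pattern explicit. `command_url` is a hypothetical name, and the argument values are placeholders; in the snippets the real values come from scraper.host and scraper.tableauData.

```python
def command_url(host, vizql_root, session_id, command):
    """Join the pieces the same way the f-strings in the snippets do."""
    return f"{host}{vizql_root}/sessions/{session_id}/commands/{command}"


# Placeholder values, for illustration only.
url = command_url(
    "https://example.com",
    "/vizql/w/Workbook/v/Sheet",
    "SESSIONID-0:0",
    "tabdoc/set-parameter-value",
)
print(url)
```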

soup.find fails to find Tableau data

I ran this under WSL on Windows 10, using an Ubuntu distribution.

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/app/profile/epidemiology.immunization.services.branch/viz/COVID-19DailyHighlights/DailyHighlights"
ts = TS()
ts.loads(url)

Then, we see this error:
python scrape_tableau.py
Traceback (most recent call last):
  File "scrape_tableau.py", line 9, in <module>
    ts.loads(url)
  File "/mnt/c/Users/stepa8/Projects/tableau-scraping/tab-env/lib/python3.8/site-packages/tableauscraper/TableauScraper.py", line 80, in loads
    soup.find("textarea", {"id": "tsConfigContainer"}).text
AttributeError: 'NoneType' object has no attribute 'text'

It appears soup.find cannot find: "textarea", {"id": "tsConfigContainer"}

Is there a workaround?
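
Elsewhere on this page, a reporter whose link has the public.tableau.com/app/profile/.../viz/... form passes the /views/... form of the same workbook to loads(). Assuming that pattern holds for other public.tableau.com profile links (an assumption, not something confirmed in this thread), the rewrite can be sketched as follows. `to_views_url` is a hypothetical helper, not part of tableauscraper.

```python
import re


def to_views_url(url):
    """Rewrite .../app/profile/<user>/viz/<workbook>/<sheet> to .../views/<workbook>/<sheet>."""
    match = re.match(r"^(https?://[^/]+)/app/profile/[^/]+/viz/([^/]+)/([^/?#]+)", url)
    if match is None:
        return url  # not a profile-style URL; leave it unchanged
    host, workbook, sheet = match.groups()
    return f"{host}/views/{workbook}/{sheet}"


profile_url = (
    "https://public.tableau.com/app/profile/"
    "epidemiology.immunization.services.branch/viz/COVID-19DailyHighlights/DailyHighlights"
)
print(to_views_url(profile_url))
```

URLs that do not match the profile pattern are returned unchanged, so the helper can be applied to any input before calling loads().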

getFilters() only returns the first 200 values

Hi @bertrandmartel, first of all thank you so much for your work, it's amazing!
I'm facing issues on the same dashboard as @khrusco (issue #29). getFilters() only returns the first 200 values of the values field, and I would like to build a database with all "agentes" values.
Here's my code:

import pandas as pd
from tableauscraper import TableauScraper as TS

url = "https://tableaupub.ccee.org.br/t/PIDM/views/IndicadoresdeSeguranadoMercado/ConcentraodeNegociao"

ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()


print("parametros:")
print(workbook.getParameters())
print("------------------------")

workbook = workbook.setParameter('parDimensão', 'Agente')
ws = workbook.getWorksheet('Classe Contraparte')

print("filtros:")
print(ws.getFilters())
print("------------------------")


filtros = ws.getFilters()
agentes = filtros[0]['values']


frames = []
for x in agentes:
    print(x)
    wb = ws.setFilter('Filtro Dimensão', x , membershipTarget=False, filterDelta=True, dashboardFilter=True)
    df_aux = wb.getWorksheet('Hist Concentração de Negociação (2)').data
    df_aux['agente'] = x
    frames.append(df_aux)
# DataFrame.append was removed in pandas 2.0; concatenate once at the end instead
appended_data = pd.concat(frames) if frames else pd.DataFrame()

Thanks in advance for your help! :)

Multiple parameters in single call?

Hello

I have not found a way in .getParameters() to set multiple parameters. Is this something your code is not able to do yet, or is it a Tableau issue?

I am trying to extract Ohio's Covid dashboard info using the multiple parameters from the GUI, but have not figured out how to do it with your scraper.

Any help would be appreciated!

Issue getting filters, loading some sheets with server side rendering

I am attempting to get the underlying data (if possible) behind the summary tables and graphs presented in the workbook [here](https://tableau.soa.org/t/soa-public/views/USPostLevelTermMortalityExperienceInteractiveTool/3_PLTDuration?%3AisGuestRedirectFromVizportal=y&%3Aembed=y).

Loosely following your TableauCIESFootball code, I start like this:

from tableauscraper import TableauScraper as TS

url='https://tableau.soa.org/t/soa-public/views/USPostLevelTermMortalityExperienceInteractiveTool/DataTable3?%3AisGuestRedirectFromVizportal=y&%3Aembed=y'
wb = TS()
wb.loads(url)

workbook = wb.getWorkbook()
sheets = workbook.getSheets()
print(sheets)

#data2=s.goToSheet("Data Table 2")
data3=workbook.goToSheet("3. PLT Duration")
filters3 = data3.getFilters() 


data3V2=workbook.getWorksheet("3. PLT Duration")
filters3V2 = data3V2.getFilters() 

Issues:

  1. The line: #data2=s.goToSheet("Data Table 2") doesn't work, despite there being a worksheet with exactly that name.
  2. I can run the .goToSheet("3. PLT Duration") command but not the getWorksheet("3. PLT Duration").
  3. After running goToSheet, the getFilters() returns an error.

So I haven't even been able to get started with the data. My overall objective is to somehow get the granular underlying data that the worksheets summarize.

Thanks for your help!

How to set multiple filters

I'm trying to pull down CBP data from this url. This works fine for pulling the data unfiltered or with a single filter, but I'm struggling to find a way to set multiple filters at once.

When I try to set multiple filters I get:

2021-10-07 09:11:12,519 - tableauScraper - WARNING - no data dictionary present in response
2021-10-07 09:11:12,519 - tableauScraper - WARNING - no data dictionary present in response
2021-10-07 09:11:12,519 - tableauScraper - WARNING - no data dictionary present in response
2021-10-07 09:11:12,519 - tableauScraper - WARNING - no data dictionary present in response
2021-10-07 09:11:12,519 - tableauScraper - WARNING - no data dictionary present in response
2021-10-07 09:11:12,519 - tableauScraper - WARNING - no data dictionary present in response
2021-10-07 09:11:12,519 - tableauScraper - WARNING - no data dictionary present in response

The code I'm using:

from tableauscraper import TableauScraper as TS

url = (
    "https://publicstats.cbp.gov/t/PublicFacing/views/"
    "CBPSBOEnforcementActionsDashboardsAUGFY21/"
    "SBOEncounters8076?:isGuestRedirectFromVizportal=y&:embed=y"
)
ts = TS()
ts.loads(url)

ws = ts.getWorksheet("SBO Line Graph")
# Set First Filter 
wb = ws.setFilter("Citizenship Grouping", 'El Salvador')
# Grabbing Sheet Again 
ws = wb.getWorksheet("SBO Line Graph")
# Setting Second Filter 
wb = ws.setFilter("Demographic",'Single Adults')

#Grab the data
ws = wb.getWorksheet("SBO Line Graph")
print(ws.data)

Appreciate any help here - many thanks

Radio Button filtering does not work

When attempting to use the setFilter() method for a worksheet with radio buttons, no dictionary is returned.

The rendering seems to be happening on the client side so I don't think that's the issue.

Here's my sample code:

from tableauscraper import TableauScraper as TS
import requests

url = "https://public.tableau.com/shared/99WD3TBJK"

ts = TS(delayMs=500)
ts.loads(url)
workbook = ts.getWorkbook()

storypoint = workbook.goToStoryPoint(storyPointId=3)
worksheet = storypoint.getWorksheet("Sources map")
wf = worksheet.setFilter("Scenario","Long term")


get Integer value from _dict instead of real

Considering this code block:
from tableauscraper import TableauScraper as TS
import pandas as pd
import time

data_list = []
url = "https://tableau.azdhs.gov/views/EMResourceBeds/InpatientBedUsageAvailability?%3Aembed=y&"

ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

for t in workbook.worksheets:
    data_list.append(t.data)
print(data_list)

TableauScraper only returns the real (in this case, percentage) value. How can I get the integer instead?

Thanks as always BM

Scraping from a private-access dashboard

Hello everyone! I have access to a private dashboard (login/password) on a self-hosted Tableau server. I would like to know if I can use this library to scrape data that requires a login and password.

Zones don't always update when there are story points

There's a bug with the updateFullData method of the TableauWorkbook class. Starting at line 82, it tries to update the zones with any new zone data in the cmdResponse:

if ("applicationPresModel" in cmdResponse["vqlCmdResponse"]["layoutStatus"]):
    presModel = cmdResponse["vqlCmdResponse"]["layoutStatus"]["applicationPresModel"]
    newZones = utils.getZones(presModel)
    newZonesStorage = {}
    for zone in newZones.keys():
        if newZones[zone] is not None:
            zoneHasVizdata = utils.hasVizData(newZones[zone])
            if (not zoneHasVizdata) and (zone in self._scraper.zones):
                newZonesStorage[zone] = copy.deepcopy(
                    self._scraper.zones[zone])
            else:
                newZonesStorage[zone] = copy.deepcopy(newZones[zone])
    self._scraper.zones = newZonesStorage
else:
    self._scraper.zones = {}

However, it doesn't update zones if there is no vizData field present and the zones already exist. Story points never have a vizData field, so the workbook will never update the zones despite the worksheet zones within the story point changing.

This bug arises when iterating over 2 or more parameters within a story point. The code should probably also check whether the underlying worksheet zones have the vizData field and update the zones if they do.

(For those interested in a quick fix, I changed the if statement on line 89 to check if the zone is a story point)

if (not zoneHasVizdata) and (zone in self._scraper.zones) and not ("presModelHolder" in newZones[zone] and "flipboard" in newZones[zone]["presModelHolder"] and "storyPoints" in newZones[zone]["presModelHolder"]["flipboard"]):
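
The story point test in the condition above can be pulled out into a small predicate, which keeps the if statement readable. This is only a sketch of the same check: `is_story_point_zone` is a hypothetical name, not part of tableauscraper, and it assumes the nested values are dicts as in the one-liner.

```python
def is_story_point_zone(zone):
    """True when zone["presModelHolder"]["flipboard"] contains "storyPoints"."""
    flipboard = zone.get("presModelHolder", {}).get("flipboard", {})
    return "storyPoints" in flipboard


# Made-up zones, for illustration only.
story_zone = {"presModelHolder": {"flipboard": {"storyPoints": {}}}}
plain_zone = {"presModelHolder": {"visual": {}}}
print(is_story_point_zone(story_zone), is_story_point_zone(plain_zone))
```

With such a helper, the patched condition would read `(not zoneHasVizdata) and (zone in self._scraper.zones) and not is_story_point_zone(newZones[zone])`.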

Adding range filters

I'm scraping from Redfin's data center Tableau, and the date range filter doesn't show up in the ws.getFilters() output. In Chrome's dev console, I can see that range filters are distinct from categorical filters, so it likely requires a different worksheet or workbook method.

Filtering doesn't work when filter-delta property is specified

Hi Bertrand,

First of all, thanks for the great work behind this package!

I'm trying to retrieve the data from the USCIS Southwest Land Border Encounters by looping over all possible filter combinations in the worksheet. The problem is that I keep running into "no data dictionary present in response" warning messages when trying to set the individual filters to the desired combination even though the dashboard displays valid responses for the same filters on the web page. After a few warning messages, the scraper just dies with KeyError: 'workbookPresModel'.

My Python's a bit rusty, so I'm calling your package from R using reticulate. Here's the code to reproduce the issue:

library(tidyverse)
library(reticulate)
library(rvest)

# py_install("tableauscraper", pip = TRUE)
baseurl <- "https://publicstats.cbp.gov/t/PublicFacing/views/"
ts <- import("tableauscraper")$TableauScraper()

dashboards <-
  session("https://www.cbp.gov/newsroom/stats/southwest-land-border-encounters") |>
  html_elements("param[value*='CBPSBOEnforcementActionsDashboardsJULFY21']") |>
  html_attr("value")

encountersdb <- str_c(baseurl, dashboards[1], "?:embed=y&:showVizHome=no")

ts$loads(encountersdb)
wb <- ts$getWorkbook()
ws <- wb$getWorksheet("SBO Line Graph")

combs <- ws$getFilters() |> (\(x) set_names(map(x, ~.$values), map_chr(x, ~.$column)))() |> cross_df()

data <-
  combs |>
  mutate(data =
           pmap(combs,
                function(...) {
                  params <- list(...)
                  wb <-
                    reduce2(names(params), params,
                            function(x, column, value) {
                              x$getWorksheet("SBO Line Graph")$setFilter(column, value)
                            },
                            .init = wb)
                  wb$getWorksheet("SBO Line Graph")$data |> as_tibble()
                }))

Would be appreciated if you could take the time to look into it. Or let me know if there's another way to accomplish the same result.
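
For readers working in Python rather than R, the "all possible filter combinations" step can be sketched with itertools.product. The input is assumed to have the shape printed by getFilters() elsewhere on this page (a list of dicts with 'column' and 'values' keys). `filter_combinations` is a hypothetical helper, and the example values "Mexico" and "FMUA" are made up.

```python
from itertools import product


def filter_combinations(filters):
    """Yield one {column: value} dict per combination of filter values."""
    columns = [f["column"] for f in filters]
    for combo in product(*(f["values"] for f in filters)):
        yield dict(zip(columns, combo))


# Made-up filters in the assumed getFilters() shape.
example_filters = [
    {"column": "Citizenship Grouping", "values": ["El Salvador", "Mexico"]},
    {"column": "Demographic", "values": ["Single Adults", "FMUA"]},
]
combos = list(filter_combinations(example_filters))
print(len(combos), combos[0])
```

This only enumerates the combinations; applying each one still means one setFilter() call per column, which is where the warnings described above appear.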

Selection producing log errors of "tableauScraper - ERROR - Expecting value: line 5 column 1 (char 8)"

New Jersey's Covid Dashboard is available here
https://dashboards.doh.nj.gov/views/DailyConfirmedCaseSummary7_22_2020/PCRandAntigenPositives?%3AisGuestRedirectFromVizportal=y&%3Aembed=y

Its system is bizarre, you can get county-day counts but you have to click on the date in the time series bar graph, and it will subset the county breakdown to that day. So we click each day on the time series in the middle, and read off the counts on the county breakdown on the left.

Using ws.select on the appropriate selectable with a valid value produces a series of logged errors

2021-03-31 20:54:15,322 - tableauScraper - ERROR - Expecting value: line 5 column 1 (char 8)
2021-03-31 20:54:15,322 - tableauScraper - ERROR - Expecting value: line 5 column 1 (char 8)
2021-03-31 20:54:15,322 - tableauScraper - ERROR - Expecting value: line 5 column 1 (char 8)
...

I believe the error is thrown at this location inside the try

r = tableauscraper.api.select(self._scraper, self.name, [index])

A reproducible example is here

import pandas as pd  
from tableauscraper import TableauScraper as TS
url = "https://dashboards.doh.nj.gov/views/DailyConfirmedCaseSummary7_22_2020/PCRandAntigenPositives?%3AshowAppBanner=false&%3Adisplay_count=n&%3AshowVizHome=n&%3Aorigin=viz_share_link&%3AisGuestRedirectFromVizportal=y&%3Aembed=y"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

ws = ts.getWorksheet("EPI CURVE") #this one is weird, we have to select the date, and then grab the counties list
selections = ws.getSelectableItems()
print(selections)
dates=ws.getSelectableItems()[3]['values']
date=dates[0]
key=ws.getSelectableItems()[3]['column']
wb = ws.select(key, date) #throws errors "tableauScraper - ERROR - Expecting value: line 5 column 1 (char 8)"

data coming back different than selected item

Hi,

I am using this library to scrape a state tableau site for COVID-19 vaccine data. My goal is to eventually obtain all of the relevant county level data. However, when I select a county a different county's data comes back, with one county "Santa Cruz" not selecting anything. This may be an issue with the worksheet.

If it is an issue with the worksheet itself, is there any workaround I can use to select the county via the map?

Thank you!

from tableauscraper import TableauScraper as TS
import pandas as pd
import time

counties = ["Apache", "Coconino", "Cochise", "Graham", "Greenlee", "Gila", "La Paz", "Maricopa", "Mohave", "Navajo", "Pinal", "Pima",  "Yavapai", "Yuma"]


for county in counties:
    print(county)
    
    while True:
        try:
            data_list = []

            url = "https://tableau.azdhs.gov/views/VaccineDashboard/Vaccineadministrationdata?%3Aembed=y&"

            #initialize scraper

            ts = TS()
            ts.loads(url)

            #select that value
            dashboard = ts.getWorksheet(" County using admin county").select("Admin Address County W/ State Pod", county)

            for t in dashboard.worksheets:
                data_list.append(t.data)


            res = [int(i) for i in str(data_list[0]).split() if i.isdigit()]
            one_dose = res[1]
            print(one_dose)
            print(data_list[0])  
        except:
            continue
        break
    
    
    
    time.sleep(30)
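
The while True loop with a bare except in the code above retries forever and hides whatever error occurs, which makes a mismatched selection harder to diagnose. A bounded retry is one alternative. `retry` is a hypothetical helper, not part of tableauscraper, and `flaky` only simulates a call that fails twice before succeeding.

```python
import time


def retry(func, attempts=3, delay_seconds=0.0):
    """Call func until it succeeds or attempts run out; re-raise the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except Exception as error:  # narrow this to the errors you expect
            last_error = error
            time.sleep(delay_seconds)
    raise last_error


calls = {"count": 0}


def flaky():
    """Simulated scrape step: fails on the first two calls, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


result = retry(flaky, attempts=5)
print(result, calls["count"])
```

Because the last exception is re-raised once the attempts are exhausted, a persistent failure surfaces with its traceback instead of looping silently.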

Can't filter using dropdown

Hi! First of all, I'm really impressed about the lib, such a great job!

Could you help me with this code? I'm trying to get data using some filters, but it seems the code can't read the dropdown. I already tried the force_setFilter function mentioned in another issue, but I still can't get the data from the dashboard.

from tableauscraper import TableauScraper as TS

url = "https://tableaupub.ccee.org.br/t/PIDM/views/IndicadoresdeSeguranadoMercado/ConcentraodeNegociao"

ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

#the code returns the parameter values used below
workbook.getParameters()

workbook.setParameter('parDimensão', 'CNPJ') #select "CNPJ" option

#this code returns the name of worksheet used below
#workbook.getWorksheetNames()

ws = ts.getWorksheet('Hist Concentração de Negociação (2)') #reaching dashboard data

#by using this, i got the the filters, previously 'Agente', but now 'CNPJ'
#as mentioned on setParameters
#ws.getFilters()

#the code return the data without any filter previously applied
ws.setFilter('SG_PERF_AGEN (Consulta_SQL_personalizada)','00.095.840/0001-85') #can't use the dropdown filter
ws.data

Thanks in advance :)

Scraping of data rendered at the server-side

Hello,

thanks a lot for the scraper! Good job!

I am trying to select all options in the dropdown filter following your docs.

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/views/SknadertilForskningsrdet-oppsummering/Enkeltsknader"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
ws = ts.getWorksheet("Prosjektoversikt")

filters = ws.getFilters()
print(filters)

Here I can find the relevant filter and the values:
{'column': 'MDY(Søknadsfrist dato)', 'ordinal': 0, 'values': [20200212, 20200331, 20200422, 20200525, 20200527, 20200902, 20200916, 20201111, 20201118, 20210210, 20210217, 20210317, 20210512, 20210521, 20210915], 'globalFieldName': '[federated.1cjq0fj17kfn4b19erg02187mlqp].[md:Søknadsfrist dato:ok]'}

However, when I try to use these values for filtering, I keep getting the Error: value not in list.

wb = ws.setFilter('MDY(Søknadsfrist dato)', 20200212)

I have also tested feeding the value as a string '20200212' and even transcribing the value in the dropdown 'February 12, 2020'.

The only thing that has worked is the force_setFilter function from djay presented here #26
However, I cannot choose 'May 25, 2020' from the dropdown menu (not sure why it is only this one value).

Any suggestions how to set the filter would be much appreciated!

Dashboards with dropdown box filters

New Hampshire's COVID-19 dashboard has a dropdown box for filtering results to "County"
https://www.nh.gov/covid19/dashboard/testing.htm#dash

It does not appear under selectables, parameters, or filters as best as I can find.
Here's the output (https://pastebin.com/raw/BNuVQrHn) for
python prompt.py -get workbook -url "https://www.nh.gov/t/DHHS/views/COVID19TestingDashboard/TestingDashboard?:iid=4&:isGuestRedirectFromVizportal=y&:display_count=n&:showVizHome=n&:origin=viz_share_link"

Is this a use case you are aware of, and is there a part of tableau-scraper I should be using to handle it, perhaps in the "dashboards" part of the code?

Attempting to fetch data from sheet but can't get worksheets using getWorksheets(), most requests are returning empty arrays.

I'm trying to scrape data from this tableau:
https://public.tableau.com/app/profile/salma1413/viz/MOP_2021_v1/MOP

However, I can't seem to get any information from any of the methods except workbook.getSheets(), which returns a list of objects with the sheet name and the windowId. I'm still unsure what to do next with this info.

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/views/MOP_2021_v1/MOP"

ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

# ⚠ This is returning empty arrays [] []
wsNames = workbook.getWorksheetNames()
wss = workbook.getWorksheets()
print(wsNames, wss)

# This is returning a list of objects with IDs and names
sheetsList = workbook.getSheets()

for sheetDict in sheetsList:
    print(f"sheet : {sheetDict['sheet']} - {sheetDict['windowId']}") 
   
    ws = workbook.getWorksheet(sheetDict['sheet'])

    # ⚠ This is unfortunately returning an empty array as well  -> []
    filters = ws.getFilters()
    print("worksheet filters: ", filters) 
    
