
targets-test's Introduction

targets-test

Project to practise creating analytical pipelines to run models using the {targets} library.

Important:

  • Each pipeline has its own "_targets.R" file, and each pipeline contains a specific set of tar_target() and tar_group_by() calls used to configure the pipeline structure for that project.

  • As this _targets.R file must retain its original name, I save each pipeline's _targets.R file in a separate folder in this GitHub project.

  • Each pipeline folder must be run in its own dedicated R project, so that the targets list from its _targets.R file matches the related set of ad hoc R functions stored in the \R folder.

  • This ensures each pipeline works for the purpose stated in the pipeline folder created at the top of this project.

  • When you download a pipeline folder, it contains the "_targets.R" file and related functions saved in the \R folder. All required input files are sourced from the \data folder.

Pipeline_01_populate_markdown_with_targets files:

  • _targets.R (Specific Pipeline setup file)
  • populate_markdown_with_targets_functions.R (Specific functions to populate this pipeline)

Pipeline_02_to_render_markdown files:

  • _targets.R (Specific Pipeline setup file)
  • pipeline_render_markdown_functions.R (Specific function to populate this pipeline)

Pipeline_03_dynamic_branching files:

  • _targets.R (Specific Pipeline setup file)
  • dynamic_pipeline_functions.R (Specific functions to populate this pipeline)

1. Targets quick start guide

After installing the package, we load targets with library(targets). Our first step is then to run the use_targets() function. This creates a new file called _targets.R that is used to configure and set up the pipeline.
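As a minimal sketch, the quick-start sequence in the console looks like this:

```r
# Install {targets} once, then load it and generate the skeleton _targets.R.
install.packages("targets")
library(targets)

# use_targets() writes a commented _targets.R template into the working
# directory; this file is then edited by hand to define the pipeline.
use_targets()
```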

Then follow these steps, detailed in the R documentation for the use_targets() function:

After you call use_targets(), there is still configuration left to do:

Open _targets.R and edit by hand. Follow the comments to write any options, packages, and target definitions that your pipeline requires.

Edit run.R and choose which pipeline function to execute (tar_make(), tar_make_clustermq(), or tar_make_future()).

If applicable, configure job.sh, clustermq.tmpl, and/or future.tmpl for your resource manager.

1.1 Create single scripts for each analysis step

  • In this example I have started by creating one script to load the data and another to create a plot from that data.

  • See script: before_targets/code_pre_targets.R
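A minimal sketch of what such a pre-targets script might contain (the input file is one of the A&E CSVs used later in this project; the column names Period and Attendances are assumptions, not the exact contents of code_pre_targets.R):

```r
# before_targets/code_pre_targets.R (illustrative sketch)
library(readr)
library(ggplot2)

# Script 1: load the data from the \data folder.
attendances <- read_csv("data/AE_TYPE_1_ATT_AUG10_JAN24.csv")

# Script 2: create a plot from that data (column names are assumed).
ggplot(attendances, aes(x = Period, y = Attendances)) +
  geom_line() +
  labs(title = "Type 1 A&E attendances over time")
```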

1.2 Turn these single scripts into functions

There is a folder called "before_targets" containing an individual R script called "code_pre_targets.R"; this script allows me to plan the analysis. The second script, "scripts_into_functions_targets_prep.R", contains new functions based on the initial scripts to work with the Targets package.

  • See script: before_targets/scripts_to_functions.R
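The general pattern of the conversion looks like this (function and column names are illustrative, not the exact contents of the script):

```r
# before_targets/scripts_to_functions.R (illustrative sketch)
library(readr)
library(ggplot2)

# Wrap the data-loading step in a function so {targets} can track its output.
load_attendances <- function(file) {
  read_csv(file)
}

# Wrap the plotting step in a function that takes the loaded data as input.
plot_attendances <- function(data) {
  ggplot(data, aes(x = Period, y = Attendances)) +
    geom_line()
}
```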

1.3 Functions used by Targets saved in R folder

  • The set of functions we want to run as part of our pipeline is saved in the R folder, so that Targets can use them when executing the pipeline.

  • See script "study_functions.R": the initial scripts for each analysis step turned into functions to be used in the targets pipeline.

  • See script: R/study_functions.R

Screenshot: 2024-03-24_18-17_study_function_scripts_updated

1.4 Pipeline defined in the _targets.R file

The pipeline covers these steps:

  • 1-4 Read in data
  • 2-4 Clean data
  • 3-4 Merge files
  • 4-4 Plot data
  • After the pipeline run we can render the report.Rmd Markdown report and populate it with objects created by the Targets pipeline.
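A minimal sketch of a _targets.R file covering these four steps, assuming helper functions clean_data(), merge_files() and plot_data() are defined in the \R folder (the function and file names are illustrative):

```r
# _targets.R (illustrative sketch of the four-step pipeline)
library(targets)

# Load the helper functions saved in the R folder.
tar_source("R")

list(
  tar_target(raw_file, "data/AE_TYPE_1_ATT_AUG10_JAN24.csv", format = "file"),
  tar_target(data, read.csv(raw_file)),          # 1-4 read in data
  tar_target(clean, clean_data(data)),           # 2-4 clean data
  tar_target(merged, merge_files(clean)),        # 3-4 merge files
  tar_target(plot, plot_data(merged))            # 4-4 plot data
)
```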

Screenshot: 2024-03-28_09-27_all_three_new_targets_plots

All required files to run this pipeline are saved in folder: Pipeline_04_data_wrangling_union_merge

1.5 Specific {targets} functions used to execute the pipeline

Load the targets library: library(targets)

  • First, check for errors in the pipeline using the tar_manifest() function: tar_manifest(fields = command)

Screenshot: 2024-03-24_18-38_tar_manifest_output

  • Then check the pipeline dependency graph using the tar_visnetwork() function: tar_visnetwork()

  • Finally, we run the pipeline we just built using the tar_make() function: tar_make()
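Put together, the interactive sequence is:

```r
library(targets)

# Inspect the planned targets and their commands before running anything.
tar_manifest(fields = command)

# Visualise the dependency graph of the pipeline.
tar_visnetwork()

# Run the pipeline: targets are built in the correct order and results are
# stored in the _targets/ data store.
tar_make()
```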

Screenshot: Pipeline_functions

The plot created from our pipeline is now saved as an individual .png chart

1.6 Run pipeline

Finally, we run the pipeline we just built using the tar_make() function. This function runs the correct targets in the correct order and saves the results to files: tar_make()

Screenshot: 05_Pipeline_completed_merged_files

Pipeline 01. Populate Markdown with targets

Every time we update something in the pipeline, we use tar_make() to re-run the entire pipeline. If some of the targets have not changed since the last time we ran the pipeline, {targets} will skip those nodes of the pipeline (called targets).

With the tar_read() function we collect pipeline output objects to use in specific sections of the Markdown report. For example, to use the data frame created by the first target we call tar_read(data); to use the plot created by the second target in the Markdown report we call tar_read(plot). This allows us to populate the Markdown report with specific objects created by the pipeline we just built and ran.
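For example, a code chunk in report.Rmd can read the pipeline objects like this (the target names data and plot follow the sketch above):

```r
# Inside a report.Rmd code chunk:
library(targets)

# Data frame created by the first target.
tar_read(data)

# Plot created by the second target.
tar_read(plot)
```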

The final output of this pipeline is a fully rendered Markdown report produced from the report.Rmd file, which has been created and published in this repo: Markdown_report_output

The last step of this project has been building and rendering a Markdown report called report.Rmd, populated with the objects created in the pipeline by Targets. The aim is to automate report-creation tasks by running a pipeline, making it easier to maintain and update this report in the future. When rendering report.Rmd we obtain a document populated with tables and content from the pipeline. This could be expanded to automate reports while ensuring reproducibility, following RAP principles.

So now we have an initial pipeline that we can start to modify and expand to include extra analytical steps in the form of new targets

Screenshot: rendered_markdown_report_from_targets_pipeline

Pipeline 01. General pipeline structure using visnetwork

First we merge all incoming .csv files, then we combine them into a single file and use this new combined data frame to populate our Markdown report.

This is the output of using the tar_visnetwork() function to check the pipeline dependency graph:

Screenshot: 2024-03-28_tar_visnetwork_plots_output

This forms part of the data preparation stage for a future modelling pipeline.

Pipeline 01. Completed pipeline final output

This is the output of the completed pipeline run, with data frames saved and required .csv files saved in the \objects folder.

After using the tar_make() function we get the complete report of which sections of the pipeline have run.

Screenshot: 2024-03-28_09-35_tar_manifest_all_charts_created

All required files to run this pipeline are saved in folder: Pipeline_01_populate_markdown_with_targets

Pipeline 02. Render Markdown in pipeline

We can render a Markdown document in the Targets pipeline by using the {tarchetypes} library, which provides the tar_render() function. By adding a new target to our pipeline, we can render the report after the pipeline has run and populated our Markdown report.
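A sketch of a pipeline with the extra rendering target, assuming the report file is report.Rmd and reusing the illustrative helper functions from earlier (the data file and helpers are assumptions):

```r
# _targets.R (illustrative sketch): add a rendering target with {tarchetypes}.
library(targets)
library(tarchetypes)

tar_source("R")

list(
  tar_target(data, load_attendances("data/AE_TYPE_1_ATT_AUG10_JAN24.csv")),
  tar_target(plot, plot_attendances(data)),
  # tar_render() re-renders report.Rmd whenever the targets the report reads
  # with tar_read() or tar_load() change.
  tar_render(report, "report.Rmd")
)
```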

Screenshot: TARGETS_file_render_markdown

The rendering target is now included in the pipeline:

Screenshot: VISNETWORK_render_report_targets

After running the _targets.R file from this folder, we can automate the creation and rendering of a Markdown document inside the Targets pipeline.

All required files to run this pipeline are saved in folder: Pipeline_02_to_render_markdown

Pipeline 03. Dynamic branching and time series model forecasts

Once the pipeline has run, and before implementing a new feature (including a simple ARIMA model) defined in issue '#6', I have run fs::dir_tree("targets-test") to check the whole set of objects created by Targets. The Markdown report has been populated with the three plots created in the pipeline.

In the coming week, I will be using dynamic branching alongside the Modeltime package to introduce a couple of predictive models (ARIMA, Prophet) into the existing pipeline. The aim is to predict the next 5 months of Manufacturers' Value of Shipments for the set of shipment categories described below:

2.1 Dynamic branching

Dynamic branching is a way to define new targets while the pipeline is running, as opposed to declaring several targets up front. It is useful when you want to iterate over what is in the data, for example a target that iterates by region. See "Dynamic branching" in the {targets} manual: https://books.ropensci.org/targets/dynamic.html
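A minimal sketch of dynamic branching for this project, creating one branch per FRED series (the helper functions read_series() and fit_model() are hypothetical):

```r
# _targets.R (illustrative sketch of dynamic branching)
library(targets)

tar_source("R")

list(
  tar_target(series_ids, c("AMTMVS", "A34SVS", "AMDMVS", "ANXAVS")),
  tar_target(
    series_data,
    read_series(series_ids),      # hypothetical helper: load one series
    pattern = map(series_ids)     # one dynamic branch per series id
  ),
  tar_target(
    forecast,
    fit_model(series_data),       # hypothetical helper: fit a TS model
    pattern = map(series_data)    # branch structure carried downstream
  )
)
```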

I will be using dynamic branching to iterate over these four economic indicators downloaded from FRED, Federal Reserve Economic Data:

Categories > Production & Business Activity > Manufacturing https://fred.stlouisfed.org/

Monthly time series indicators downloaded from FRED Economic Data. St Louis:

  • Manufacturers' Value of Shipments: Total Manufacturing [AMTMVS], 2000-2024. U.S. Census Bureau, retrieved from FRED, Federal Reserve Bank of St. Louis, April 2, 2024. Frequency: Monthly. Units: Millions of Dollars, Seasonally Adjusted. URL: https://fred.stlouisfed.org/series/AMTMVS
  • Manufacturers' Value of Shipments: Computers and Electronic Products [A34SVS], 2000-2024. U.S. Census Bureau, retrieved from FRED, Federal Reserve Bank of St. Louis, April 3, 2024. Frequency: Monthly. Units: Millions of Dollars, Seasonally Adjusted. URL: https://fred.stlouisfed.org/series/A34SVS
  • Manufacturers' Value of Shipments: Durable Goods [AMDMVS], 2000-2024. U.S. Census Bureau, retrieved from FRED, Federal Reserve Bank of St. Louis, April 2, 2024. Frequency: Monthly. Units: Millions of Dollars, Seasonally Adjusted. URL: https://fred.stlouisfed.org/series/AMDMVS
  • Manufacturers' Value of Shipments: Nondefense Capital Goods Excluding Aircraft [ANXAVS], 2000-2024. U.S. Census Bureau, retrieved from FRED, Federal Reserve Bank of St. Louis, April 2, 2024. Frequency: Monthly. Units: Millions of Dollars, Seasonally Adjusted. URL: https://fred.stlouisfed.org/series/ANXAVS

This is an example of dynamic branching using the tarchetypes package based on a Metric variable, creating 2 branches for the two metrics included in this workflow. tarchetypes package GitHub repo: https://github.com/ropensci/tarchetypes/tree/main
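Sketched below with tarchetypes::tar_group_by(): rows are grouped by a Metric column and a downstream target is mapped over each group, giving one branch per metric (the input file and plotting helper are illustrative assumptions):

```r
library(targets)
library(tarchetypes)

tar_source("R")

list(
  tar_target(raw, read.csv("data/combined_metrics.csv")),   # assumed input file
  # Split the data into one group (branch) per value of the Metric column.
  tar_group_by(by_metric, raw, Metric),
  tar_target(
    metric_plot,
    plot_metric(by_metric),       # hypothetical per-metric plotting helper
    pattern = map(by_metric)      # one dynamic branch per metric group
  )
)
```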

Screenshot: VISNETWORK_tarchetypes_by_metric

Screenshot: TARGETS_TEST_ISSUE_17_DYNAMIC_BRANCHING_ARIMA_01

Screenshot: TARGETS_TEST_ISSUE_17_DYNAMIC_BRANCHING_Tarchetypes

Visnetwork from the above workflow, including branching:

Screenshot: VISNETWORK_graph_branch_by_metric

All required files to run this pipeline are saved in folder: Pipeline_03_dynamic_branching_files

  • This will allow me, using Modeltime, to apply each model to the different branches created by Targets, so the model will run for each metric in the pipeline.

Pipeline 05. Dynamic branching including ARIMA and Prophet models

This pipeline is completed and all required files to run it can be found in the "Pipeline_05_ARIMA_Prophet_models" folder:

Screenshot: 2024-04-30_17-47_VISNETWORK_ARIMA_MODEL_final

  • Specific files to replicate this pipeline: "_targets.R", "using_dynamic_predictive_pipeline.R" and "dynamic_predictive_pipeline_functions.R"; this last one must be run from the \R folder.

Screenshot: VISNETWORK_PROPHET_model

Using the Modeltime package to combine Prophet and ARIMA models in the previous Targets pipeline. Modeltime package: https://business-science.github.io/modeltime/
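A minimal sketch of combining both models with {modeltime}, assuming a single series in a data frame series_df with date and value columns (all names here are assumptions):

```r
library(modeltime)
library(parsnip)
library(rsample)

# Hold out the end of the series for calibration.
splits <- initial_time_split(series_df, prop = 0.9)

# Fit an auto-ARIMA model and a Prophet model on the training window.
model_arima <- arima_reg() |>
  set_engine("auto_arima") |>
  fit(value ~ date, data = training(splits))

model_prophet <- prophet_reg() |>
  set_engine("prophet") |>
  fit(value ~ date, data = training(splits))

# Combine the models, calibrate on the hold-out set and forecast 5 months ahead.
modeltime_table(model_arima, model_prophet) |>
  modeltime_calibrate(new_data = testing(splits)) |>
  modeltime_forecast(h = "5 months", actual_data = series_df)
```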


targets-test's Issues

Update TARGETS package, combining the two scripts I created on my work laptop

Perform this set of actions in a Targets pipeline:

  • To tidy up existing Targets example

    • Merge three .csv files: AE_TYPE_1_ATT_AUG10_JAN24.csv, AE_TYPE_2_ATT_AUG10_JAN24.csv, AE_TYPE_3_ATT_AUG10_JAN24.csv
    • Create a GGPLOT chart using merged file
    • Save ggplot2 output into the objects folder
    • Force targets to invalidate some targets so that the next time we run tar_make() the whole pipeline will be rebuilt and the output (.csv files) we produce in those targets will be re-created, as shown below.
    • tar_invalidate(one_two_combined)
    tar_invalidate(one_two_three_combined)
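Run interactively, the invalidate-and-rebuild sequence looks like this:

```r
library(targets)

# Invalidate these targets so the next tar_make() rebuilds them and
# re-creates their .csv outputs.
tar_invalidate(one_two_combined)
tar_invalidate(one_two_three_combined)

# Re-run the pipeline; the invalidated targets and anything downstream
# of them are rebuilt.
tar_make()
```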

Create two TS models (ARIMA and Prophet) using dynamic branching

First I will combine the following four files for the new TS economic indicators below into a new pipeline, to run separate TS models on each of them.

A34SVS_Shipments_value_Computer_products.csv
AMDMVS_Shipments_value_Durable_goods.csv
AMTMVS_Shipments_Total_Manufacturing.csv
ANXAVS_Shipments_value_Nondefense_Capital_Goods.csv

This time I will be using dynamic branching from {targets} alongside the {timetk} and {modeltime} packages to automate the creation of different models on different branches of the pipeline.

Render markdown report from pipeline including text with inline figures

Once we have created the pipeline, we need to include all three plots created in the pipeline for Type I, Type II, and Type III attendances in the Markdown report.

Also, create one paragraph for each attendance-type section, populated with figures and calculations from the dataset created by running the pipeline, describing the latest attendance figures for the most recent complete year of data in the original ingested dataset.
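One way to do this in report.Rmd is to compute the figures from the pipeline output in a chunk and then reference them inline in the paragraph text (the target name attendances_data and the column names are assumptions):

```r
# Inside a report.Rmd code chunk:
library(targets)
library(dplyr)

attendances <- tar_read(attendances_data)        # assumed target name

# Figures for the latest complete year of data (column names are assumed).
latest <- attendances |>
  filter(Year == max(Year)) |>
  summarise(total_type1 = sum(Type_1_Attendances))
```

The computed values can then be dropped into the prose with inline code such as `r latest$total_type1`.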

Include three new charts in Markdown report

Create three new line charts in the Targets pipeline to populate the Markdown report, one each for Type_1_Attendances, Type_2_Attendances and Type_3_Attendances over time.

Then update the existing Markdown report with these three new plots.

Populate Markdown report using Targets pipeline objects

The aim of this issue is to populate a Markdown report with objects (datasets and plots) produced by the targets pipeline created for this project.

  • I will use A&E Attendances data from the NHS England website to build both the pipeline and the Markdown report

We start by loading the required A&E Attendances Type I, Type II and Type III data from the NHS England website. Downloaded file: "Monthly A&E Time Series March 2024 (XLS, 403K)". Link to the data: https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2024/04/Monthly-AE-Time-Series-March-2024.xls

I will format this downloaded file to build a Markdown report, processing the required data for it using the Targets pipeline.
