
cookbook's People

Contributors

0xflotus, andkret, asyne, baky0905, benjenkinsiv, brootware, bsamanvitha, christophe-williams, darek, derek-baker, dvincent1337, eponkratova, goodoldneon, insigh1, itsderek23, kakaru1331, kalebcoberly, koenbal, lucky7323, maduxi, marcosvpj, mattmacs, maxwellarrigona, php1ic, repodevs, ricardocalleja, rodrigobressan, sabinbajracharya, sharad-vm, team-data-science


cookbook's Issues

Thank you and quick suggestion

Just wanted to say thank you for the time you have put into providing this info.

A quick suggestion: you might want to add some tags to your repo so it shows up better in searches.

Link Checker Report

Errors were reported while checking the availability of links.


๐Ÿ“ Summary
---------------------
๐Ÿ” Total..........454
โœ… Successful.....451
โณ Timeouts.........0
๐Ÿ”€ Redirected.......0
๐Ÿ‘ป Excluded.........0
๐Ÿšซ Errors...........3

Errors in sections/03-AdvancedSkills.md
โœ— https://github.com/gschmutz/stream-processing-workshop/tree/master/04-twitter-data-ingestion-with-streamsets (GitHub token not specified. To check GitHub links reliably, use `--github-token` flag / `GITHUB_TOKEN` env var.)

Errors in sections/07-DataSources.md
โœ— http://www.bjs.gov/index.cfm?ty=dca (error sending request for url (https://www.bjs.gov/index.cfm?ty=dca): error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:../ssl/statem/statem_clnt.c:1915: (Hostname mismatch))

Errors in sections/05-CaseStudies.md
โœ— https://www.youtube.com/channel/UCxwul7aBm2LybbpKGbCOYNA/playlists (404 Not Found)

Full Github Actions output
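The reports on this page come from an automated link checker. As a rough illustration of what such a check does, here is a minimal Python sketch that fetches each URL and sorts the outcome into the same buckets as the summary. It is not the tool the GitHub Action actually runs: `check_url` and `summarize` are hypothetical names, redirects are simply followed rather than counted, and the User-Agent string is an arbitrary assumption.

```python
import socket
import urllib.error
import urllib.request
from collections import Counter


def check_url(url: str, timeout: float = 10.0) -> str:
    """Return 'ok', 'timeout' or 'error' for one URL (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"User-Agent": "link-check-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # urlopen follows redirects; a final 2xx status counts as success.
            return "ok" if 200 <= response.status < 300 else "error"
    except urllib.error.HTTPError:
        # 4xx/5xx answers such as 404 Not Found, 410 Gone, 429 Too Many Requests.
        return "error"
    except urllib.error.URLError as exc:
        # DNS failures and TLS certificate problems also surface as URLError.
        return "timeout" if isinstance(exc.reason, socket.timeout) else "error"
    except (socket.timeout, TimeoutError):
        # A timeout while reading the response body.
        return "timeout"


def summarize(statuses: list[str]) -> dict[str, int]:
    """Count outcomes using the categories of the report's summary block."""
    counts = Counter(statuses)
    return {
        "Total": len(statuses),
        "Successful": counts["ok"],
        "Timeouts": counts["timeout"],
        "Errors": counts["error"],
    }
```

Typical usage would be `summarize([check_url(u) for u in urls])`. Sites that throttle bots (429) or reject unknown clients (LinkedIn's 999) show up as errors here as well, which is one reason such reports contain false positives.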

Link Checker Report

Errors were reported while checking the availability of links:
๐Ÿ“ Summary

๐Ÿ” Total............9
โœ… Successful.......8
โณ Timeouts.........1
๐Ÿ”€ Redirected.......0
๐Ÿ‘ป Excluded.........0
๐Ÿšซ Errors...........0

Errors in README.md
โง– https://andreaskretz.com/

Error in graphic "What makes Hadoop so popular"

In this section: https://github.com/andkret/Cookbook/blob/master/AdvancedSkills.md#what-makes-hadoop-so-popular

I think you mean the BI software Tableau, not "Tableu", don't you?

I get that this version is oversimplified on purpose, but if you want to add a more detailed version of the big data landscape, you can reference the work of Matt Turck (2019): http://mattturck.com/wp-content/uploads/2019/07/2019_Matt_Turck_Big_Data_Landscape_Final_Fullsize.png

Link Checker Report

Errors were reported while checking the availability of links:
๐Ÿ“ Summary

๐Ÿ” Total..........456
โœ… Successful.....422
โณ Timeouts.........2
๐Ÿ”€ Redirected.......0
๐Ÿ‘ป Excluded.........0
๐Ÿšซ Errors..........32

Errors in sections/02-BasicSkills.md
โ†ฏ https://medium.com/@saswat.sipun/shell-scripting-cheat-sheet-c0ecfb80391 (Invalid mail address: https://medium.com/@saswat.sipun/shell-scripting-cheat-sheet-c0ecfb80391)

Errors in sections/09-BooksAndCourses.md
โœ— https://click.linksynergy.com/deeplink?id=uyxOZI9fN/M&mid=40328&murl=https%3A%2F%2Fwww.coursera.org%2Fspecializations%2Fios-development [400 Bad Request]
โœ— https://click.linksynergy.com/deeplink?id=uyxOZI9fN/M&mid=40328&murl=https%3A%2F%2Fwww.coursera.org%2Flearn%2Fcloud-services-java-spring-framework [400 Bad Request]
โœ— https://click.linksynergy.com/deeplink?id=uyxOZI9fN/M&mid=40328&murl=https%3A%2F%2Fwww.coursera.org%2Flearn%2Fmachine-learning [400 Bad Request]
โœ— https://click.linksynergy.com/deeplink?id=uyxOZI9fN/M&mid=40328&murl=https%3A%2F%2Fwww.coursera.org%2Flearn%2Fcomputer-networking%3F [400 Bad Request]

Errors in sections/05-CaseStudies.md
โœ— https://towardsdatascience.com/building-machine-learning-at-linkedin-scale-f08bd9a63f0a [410 Gone]
โ†ฏ https://medium.com/@Pinterest_Engineering/building-pin-stats-25ec8460e924 (Invalid mail address: https://medium.com/@Pinterest_Engineering/building-pin-stats-25ec8460e924)
โœ— https://www.uber.com/us/en/uberai/ [406 Not Acceptable]
โœ— https://engineering.linkedin.com/teams/data/projects/pinot [404 Not Found]
โ†ฏ https://medium.com/@Pinterest_Engineering/pinterest-joins-the-cloud-native-computing-foundation-e3b3e66cb4f (Invalid mail address: https://medium.com/@Pinterest_Engineering/pinterest-joins-the-cloud-native-computing-foundation-e3b3e66cb4f)
โ†ฏ https://medium.com/@Pinterest_Engineering/building-a-real-time-user-action-counting-system-for-ads-88a60d9c9a (Invalid mail address: https://medium.com/@Pinterest_Engineering/building-a-real-time-user-action-counting-system-for-ads-88a60d9c9a)
โ†ฏ https://medium.com/@Pinterest_Engineering/using-kafka-streams-api-for-predictive-budgeting-9f58d206c996 (Invalid mail address: https://medium.com/@Pinterest_Engineering/using-kafka-streams-api-for-predictive-budgeting-9f58d206c996)
โœ— https://www.linkedin.com/in/michalgancarski/ [999 ]
โ†ฏ https://medium.com/@Pinterest_Engineering/improving-hbase-backup-efficiency-at-pinterest-86159da4b954 (Invalid mail address: https://medium.com/@Pinterest_Engineering/improving-hbase-backup-efficiency-at-pinterest-86159da4b954)
โœ— https://www.linkedin.com/in/max-schultze-b11996110/ [999 ]
โœ— https://pinot.readthedocs.io/en/latest/intro.html# [404 Not Found]
โง– https://streaml.io/blog/intro-to-heron
โ†ฏ https://medium.com/@Pinterest_Engineering/auto-scaling-pinterest-df1d2beb4d64 (Invalid mail address: https://medium.com/@Pinterest_Engineering/auto-scaling-pinterest-df1d2beb4d64)

Errors in sections/07-DataSources.md
โง– http://www.socialmention.com/
โ†ฏ http://www.rita.dot.gov/bts/home (error sending request for url (http://www.rita.dot.gov/bts/home): error trying to connect: dns error: failed to lookup address information: No address associated with hostname)
โœ— http://www.ucrdatatool.gov/ [504 Gateway Timeout]
โœ— http://www.epa.gov/open/data-inventory-and-activities [403 Forbidden]
โœ— http://energyatlas.iea.org/?subject=-1118783123 [502 Bad Gateway]
โœ— http://fisher.osu.edu/fin/fdf/osudata.htm [404 Not Found]
โœ— https://www.cia.gov/library/publications/the-world-factbook/ [404 Not Found]
โœ— http://data.imf.org/?sk=7CB6619C-CF87-48DC-9443-2973E161ABEB [404 Not Found]
โ†ฏ https://medium.com/@Infogram/15-great-free-data-sources-for-2016-25cb455db257 (Invalid mail address: https://medium.com/@Infogram/15-great-free-data-sources-for-2016-25cb455db257)
โœ— http://www.google.com/trends/explore [429 Too Many Requests]
โœ— http://comtrade.un.org/labs/BIS-trade-in-goods/ [404 Not Found]

Errors in sections/03-AdvancedSkills.md
โœ— https://trends.google.com/trends/explore?geo=US&q=%2Fg%2F11fy132gmf,%2Fg%2F11cknd0blr [429 Too Many Requests]
โ†ฏ https://medium.com/@xaviergeerinck/building-a-real-time-streaming-dashboard-with-spark-grafana-chronograf-and-influxdb-e262b68087de (Invalid mail address: https://medium.com/@xaviergeerinck/building-a-real-time-streaming-dashboard-with-spark-grafana-chronograf-and-influxdb-e262b68087de)
โ†ฏ https://github.com/gschmutz/stream-processing-workshop/tree/master/04-twitter-data-ingestion-with-streamsets (GitHub token not specified. To check GitHub links reliably, use --github-token flag / GITHUB_TOKEN env var.)
โœ— https://jersey.github.io/documentation/latest/getting-started.html [404 Not Found]

Errors in sections/06-BestPracticesCloud.md
โœ— https://towardsdatascience.com/how-to-deploy-a-docker-container-python-on-amazon-ecs-using-amazon-ecr-9c52922b738f [410 Gone]

Link Checker Report

Errors were reported while checking the availability of links.


๐Ÿ“ Summary
---------------------
๐Ÿ” Total..........454
โœ… Successful.....452
โณ Timeouts.........0
๐Ÿ”€ Redirected.......0
๐Ÿ‘ป Excluded.........0
๐Ÿšซ Errors...........2

Errors in sections/03-AdvancedSkills.md
โ†ฏ https://github.com/gschmutz/stream-processing-workshop/tree/master/04-twitter-data-ingestion-with-streamsets (GitHub token not specified. To check GitHub links reliably, use `--github-token` flag / `GITHUB_TOKEN` env var.)

Errors in sections/06-BestPracticesCloud.md
โœ— https://towardsdatascience.com/how-to-deploy-a-docker-container-python-on-amazon-ecs-using-amazon-ecr-9c52922b738f [410 Gone]

Full Github Actions output

Link Checker Report

Errors were reported while checking the availability of links.


๐Ÿ“ Summary
---------------------
๐Ÿ” Total..........456
โœ… Successful.....454
โณ Timeouts.........1
๐Ÿ”€ Redirected.......0
๐Ÿ‘ป Excluded.........0
๐Ÿšซ Errors...........1

Errors in sections/03-AdvancedSkills.md
โœ— https://github.com/gschmutz/stream-processing-workshop/tree/master/04-twitter-data-ingestion-with-streamsets (GitHub token not specified. To check GitHub links reliably, use `--github-token` flag / `GITHUB_TOKEN` env var.)

Errors in sections/07-DataSources.md
โง– https://archive.ics.uci.edu/ml/index.php

Full Github Actions output

Generate ePub and Mobi files.

Hello,

Thanks for this interesting content. Besides the PDF file, I'd like to suggest generating ePub and Mobi files as well.

The README doesn't make clear how the PDF is generated, but I'd be happy to add this if the PDF generation steps are documented in the README.

Thanks.
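One possible way to do this, sketched in Python under the assumption that Pandoc and Calibre are installed: Pandoc writes the ePub and Calibre's `ebook-convert` turns it into Mobi. `ebook_commands` and `build_ebooks` are hypothetical helpers and the file names are placeholders. Pandoc's LaTeX reader may not understand every custom macro in the book, so the Markdown files under `sections/` might be a better input.

```python
import shutil
import subprocess


def ebook_commands(source: str, stem: str) -> list[list[str]]:
    """Build the argv lists for an ePub (Pandoc) and a Mobi (Calibre) conversion."""
    epub = f"{stem}.epub"
    return [
        # Pandoc infers the output format from the .epub extension.
        ["pandoc", source, "-o", epub],
        # Calibre's ebook-convert turns the ePub into a Mobi file.
        ["ebook-convert", epub, f"{stem}.mobi"],
    ]


def build_ebooks(source: str, stem: str) -> None:
    """Run both conversions, failing early if a tool is missing (hypothetical helper)."""
    for command in ebook_commands(source, stem):
        if shutil.which(command[0]) is None:
            raise RuntimeError(f"{command[0]} is not installed")
        subprocess.run(command, check=True)
```

For example, `build_ebooks("Data Engineering Cookbook.tex", "Data-Engineering-Cookbook")` would produce both files next to the source, assuming Pandoc can parse it.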

Invalid Link to Course

The link to Building Cloud Services with the Java Spring Framework (Coursera) no longer works.

Gender-neutral or balanced text

Thanks for this great resource!

While it is always a challenge to write in a gender-neutral or balanced manner, please consider this in your book. For example, in the conclusion of the introduction, both the data engineer and the data scientist are referred to with the masculine pronoun. This could easily be fixed.

Link Checker Report

Errors were reported while checking the availability of links.


๐Ÿ“ Summary
---------------------
๐Ÿ” Total..........456
โœ… Successful.....454
โณ Timeouts.........0
๐Ÿ”€ Redirected.......0
๐Ÿ‘ป Excluded.........0
๐Ÿšซ Errors...........2

Errors in sections/07-DataSources.md
โœ— http://www.oecd.org/dac/financing-sustainable-development/development-finance-data/ (error sending request for url (http://www.oecd.org/dac/financing-sustainable-development/development-finance-data/): error trying to connect: dns error: failed to lookup address information: Name or service not known)

Errors in sections/03-AdvancedSkills.md
โœ— https://github.com/gschmutz/stream-processing-workshop/tree/master/04-twitter-data-ingestion-with-streamsets (GitHub token not specified. To check GitHub links reliably, use `--github-token` flag / `GITHUB_TOKEN` env var.)

Full Github Actions output

Link Checker Report

Errors were reported while checking the availability of links.


๐Ÿ“ Summary
---------------------
๐Ÿ” Total..........456
โœ… Successful.....455
โณ Timeouts.........0
๐Ÿ”€ Redirected.......0
๐Ÿ‘ป Excluded.........0
๐Ÿšซ Errors...........1

Errors in sections/03-AdvancedSkills.md
โœ— https://github.com/gschmutz/stream-processing-workshop/tree/master/04-twitter-data-ingestion-with-streamsets (GitHub token not specified. To check GitHub links reliably, use `--github-token` flag / `GITHUB_TOKEN` env var.)

Full Github Actions output

Link Checker Report

Errors were reported while checking the availability of links:
๐Ÿ“ Summary

๐Ÿ” Total............9
โœ… Successful.......8
โณ Timeouts.........1
๐Ÿ”€ Redirected.......0
๐Ÿ‘ป Excluded.........0
๐Ÿšซ Errors...........0

Errors in README.md
โง– https://andreaskretz.com/

Compilation error on case sensitive filesystems

In the main file, when including figures, you use "images/", but the directory is actually called "Images". This prevents compilation on Ubuntu (and, I assume, all *nix systems).

! LaTeX Error: File `Images/Machine-Learning-Pipeline' not found.

See the LaTeX manual or LaTeX Companion for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.112 ...twidth]{Images/Machine-Learning-Pipeline}
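To catch this class of problem before it breaks a Linux build, a small script can compare every `\includegraphics` path against the real directory entries, character for character, which works even on a case-insensitive filesystem. This is a minimal Python sketch: the regular expression only handles the simple `\includegraphics[...]{path}` form, and the list of extensions tried is an assumption about what pdflatex will look for.

```python
import re
from pathlib import Path

# Matches \includegraphics[...]{path}; the optional [...] block is skipped.
INCLUDE_RE = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]+)\}")
# Extensions assumed to be tried when the path has none.
EXTENSIONS = ("", ".png", ".jpg", ".jpeg", ".pdf")


def exists_case_sensitive(root: Path, relative: str) -> bool:
    """Walk the path one component at a time, comparing names exactly."""
    current = root
    parts = Path(relative).parts
    for index, part in enumerate(parts):
        names = {child.name for child in current.iterdir()} if current.is_dir() else set()
        if index == len(parts) - 1:
            # Last component: the file itself, possibly without its extension.
            return any(part + ext in names for ext in EXTENSIONS)
        if part not in names:
            return False
        current = current / part
    return False


def find_case_mismatches(tex_file: Path) -> list[str]:
    """Return graphics paths that would not resolve on a case-sensitive filesystem."""
    text = tex_file.read_text(encoding="utf-8")
    root = tex_file.parent
    return [p for p in INCLUDE_RE.findall(text) if not exists_case_sensitive(root, p)]
```

Running `find_case_mismatches(Path("Data Engineering Cookbook.tex"))` from the repository root would list every include that only works thanks to case-insensitive lookups.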

Missing link in Scaling Out section

On what is currently page 37, section 12.3.4 Scaling Out, the text mentions a link to a Microsoft MSDN page, but the link itself is missing:

This Link to a Microsoft MSDN page has more options of scaling out an SQL database for you

Link Checker Report

Errors were reported while checking the availability of links.


๐Ÿ“ Summary
---------------------
๐Ÿ” Total..........456
โœ… Successful.....453
โณ Timeouts.........1
๐Ÿ”€ Redirected.......0
๐Ÿ‘ป Excluded.........0
๐Ÿšซ Errors...........2

Errors in sections/03-AdvancedSkills.md
โœ— https://github.com/gschmutz/stream-processing-workshop/tree/master/04-twitter-data-ingestion-with-streamsets (GitHub token not specified. To check GitHub links reliably, use `--github-token` flag / `GITHUB_TOKEN` env var.)

Errors in sections/07-DataSources.md
โœ— http://www.oecd.org/dac/financing-sustainable-development/development-finance-data/ (error sending request for url (https://www.oecd.org/dac/financing-sustainable-development/development-finance-data/): error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:../ssl/statem/statem_clnt.c:1915: (unable to get local issuer certificate))
โง– http://www.the-numbers.com/

Full Github Actions output

Kubernetes and Docker

I would highly recommend that the heading be Containers and Pods, or something to that effect, rather than Docker. Although Docker has been the most recognizable containerization technology, many communities want containerization technology that serves the open-source community rather than the business model of one popular vendor. The Kubernetes community seems to have been moving towards the Container Runtime Interface (CRI), which is not locked into depending on Docker. That's where CRI-O comes into play.

It's the concept behind containers and pods that is important to understand and not necessarily the technology used to deliver those concepts. At least IMHO. These types of technologies will always change.

Link Checker Report

Errors were reported while checking the availability of links.


๐Ÿ“ Summary
---------------------
๐Ÿ” Total..........456
โœ… Successful.....455
โณ Timeouts.........0
๐Ÿ”€ Redirected.......0
๐Ÿ‘ป Excluded.........0
๐Ÿšซ Errors...........1

Errors in sections/03-AdvancedSkills.md
โœ— https://github.com/gschmutz/stream-processing-workshop/tree/master/04-twitter-data-ingestion-with-streamsets (GitHub token not specified. To check GitHub links reliably, use `--github-token` flag / `GITHUB_TOKEN` env var.)

Full Github Actions output

Don't track build artefacts

A lot of the files in the repo are created or modified when you compile the main document. There is no need for them to be tracked: either they create very noisy commits, or you have to do extra work to revert them to their original state before committing, in which case they no longer match the document in its latest state.

For example, if I change the title text of the first \part{} and recompile, I have changed 7 files:

$ git status 
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   Data Engineering Cookbook.aux
	modified:   Data Engineering Cookbook.log
	modified:   Data Engineering Cookbook.out
	modified:   Data Engineering Cookbook.pdf
	deleted:    Data Engineering Cookbook.synctex.gz
	modified:   Data Engineering Cookbook.tex
	modified:   Data Engineering Cookbook.toc

no changes added to commit (use "git add" and/or "git commit -a")

The important detail could easily be lost here.

TeX Live installations delete the .synctex.gz file by default, unless you pass -synctex=1 as an option to pdflatex.

I think a .gitignore with the following content would be a good place to start:

# Ignore build artefacts
*.aux
*.log
*.lof
*.lot
*.toc
*.out
*.synctex.gz

This will keep commits, and PRs, clean and easier to follow.
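Note that adding a `.gitignore` does not untrack files that are already committed; they also have to be removed from the index, while their working copies stay on disk. The Python sketch below picks the tracked files that match the proposed patterns and builds the corresponding `git rm --cached` command. `build_artefacts` and `untrack_command` are hypothetical helpers, and `fnmatchcase` only approximates git's own ignore-pattern rules.

```python
from fnmatch import fnmatchcase

# The patterns from the suggested .gitignore.
BUILD_PATTERNS = ["*.aux", "*.log", "*.lof", "*.lot", "*.toc", "*.out", "*.synctex.gz"]


def build_artefacts(tracked_files: list[str], patterns: list[str] = BUILD_PATTERNS) -> list[str]:
    """Return the tracked files that match any of the ignore patterns."""
    return [f for f in tracked_files if any(fnmatchcase(f, p) for p in patterns)]


def untrack_command(files: list[str]) -> list[str]:
    """Build a `git rm --cached` argv: stop tracking the files but keep them on disk."""
    return ["git", "rm", "--cached", "--", *files]
```

With the seven files from the `git status` output, this selects the `.aux`, `.log`, `.out`, `.synctex.gz` and `.toc` files. The `.pdf` and `.tex` are not covered by these patterns; whether the generated PDF belongs in the repository is a separate decision.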

Link Checker Report

Errors were reported while checking the availability of links:
๐Ÿ“ Summary

๐Ÿ” Total..........452
โœ… Successful.....440
โณ Timeouts.........0
๐Ÿ”€ Redirected.......0
๐Ÿ‘ป Excluded.........0
๐Ÿšซ Errors..........12

Errors in sections/05-CaseStudies.md
โ†ฏ https://medium.com/@Pinterest_Engineering/building-pin-stats-25ec8460e924 (Invalid mail address: https://medium.com/@Pinterest_Engineering/building-pin-stats-25ec8460e924)
โ†ฏ https://medium.com/@Pinterest_Engineering/auto-scaling-pinterest-df1d2beb4d64 (Invalid mail address: https://medium.com/@Pinterest_Engineering/auto-scaling-pinterest-df1d2beb4d64)
โ†ฏ https://medium.com/@Pinterest_Engineering/building-a-real-time-user-action-counting-system-for-ads-88a60d9c9a (Invalid mail address: https://medium.com/@Pinterest_Engineering/building-a-real-time-user-action-counting-system-for-ads-88a60d9c9a)
โ†ฏ https://medium.com/@Pinterest_Engineering/improving-hbase-backup-efficiency-at-pinterest-86159da4b954 (Invalid mail address: https://medium.com/@Pinterest_Engineering/improving-hbase-backup-efficiency-at-pinterest-86159da4b954)
โ†ฏ https://medium.com/@Pinterest_Engineering/using-kafka-streams-api-for-predictive-budgeting-9f58d206c996 (Invalid mail address: https://medium.com/@Pinterest_Engineering/using-kafka-streams-api-for-predictive-budgeting-9f58d206c996)
โ†ฏ https://medium.com/@Pinterest_Engineering/pinterest-joins-the-cloud-native-computing-foundation-e3b3e66cb4f (Invalid mail address: https://medium.com/@Pinterest_Engineering/pinterest-joins-the-cloud-native-computing-foundation-e3b3e66cb4f)
โ†ฏ https://medium.com/@kramasamy/introduction-to-apache-heron-c64f8c7c0956 (Invalid mail address: https://medium.com/@kramasamy/introduction-to-apache-heron-c64f8c7c0956)

Errors in sections/03-AdvancedSkills.md
โ†ฏ https://github.com/gschmutz/stream-processing-workshop/tree/master/04-twitter-data-ingestion-with-streamsets (GitHub token not specified. To check GitHub links reliably, use --github-token flag / GITHUB_TOKEN env var.)
โ†ฏ https://medium.com/@xaviergeerinck/building-a-real-time-streaming-dashboard-with-spark-grafana-chronograf-and-influxdb-e262b68087de (Invalid mail address: https://medium.com/@xaviergeerinck/building-a-real-time-streaming-dashboard-with-spark-grafana-chronograf-and-influxdb-e262b68087de)

Errors in sections/07-DataSources.md
โ†ฏ https://medium.com/@Infogram/15-great-free-data-sources-for-2016-25cb455db257 (Invalid mail address: https://medium.com/@Infogram/15-great-free-data-sources-for-2016-25cb455db257)

Errors in sections/02-BasicSkills.md
โ†ฏ https://medium.com/@saswat.sipun/shell-scripting-cheat-sheet-c0ecfb80391 (Invalid mail address: https://medium.com/@saswat.sipun/shell-scripting-cheat-sheet-c0ecfb80391)

Errors in sections/06-BestPracticesCloud.md
โœ— https://towardsdatascience.com/how-to-deploy-a-docker-container-python-on-amazon-ecs-using-amazon-ecr-9c52922b738f [410 Gone]

Link Checker Report

Errors were reported while checking the availability of links.


๐Ÿ“ Summary
---------------------
๐Ÿ” Total..........454
โœ… Successful.....452
โณ Timeouts.........0
๐Ÿ”€ Redirected.......0
๐Ÿ‘ป Excluded.........0
๐Ÿšซ Errors...........2

Errors in sections/03-AdvancedSkills.md
โ†ฏ https://github.com/gschmutz/stream-processing-workshop/tree/master/04-twitter-data-ingestion-with-streamsets (GitHub token not specified. To check GitHub links reliably, use `--github-token` flag / `GITHUB_TOKEN` env var.)

Errors in sections/06-BestPracticesCloud.md
โœ— https://towardsdatascience.com/how-to-deploy-a-docker-container-python-on-amazon-ecs-using-amazon-ecr-9c52922b738f [410 Gone]

Full Github Actions output

Missing DevOps reference

Hi @andkret, great idea, this data engineering cookbook! If possible, please add some references about DevOps, i.e. pipelines and CI/CD, to give the big picture of the toolchain loops before and after go-live, that is, going live from the very first iteration. Books: Gene Kim et al.

General comments

We use BigQuery extensively, both for production real-time queries of our data and for the batch processing that builds those tables in the first place. BigQuery was a major enabling technology for us and remains almost magical in how powerful and affordable it is. We also use BQML extensively, as it eliminates much of the work of doing basic machine learning: it runs in the same environment as the database queries, with no code to write and no data transfer. We also use Matillion for BigQuery for all of our workflows. Because this is ELT rather than ETL, SQL is an essential skill set, but Matillion's visual programming environment provides a common touch point between all parties and a very productive environment to work in.
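For readers who haven't used BigQuery ML: a model is trained with a SQL statement that runs inside BigQuery itself. The Python sketch below only assembles such a `CREATE MODEL` statement and, optionally, submits it with the `google-cloud-bigquery` client. `create_model_sql` and `train` are hypothetical helpers, the dataset, table and column names are placeholders, logistic regression is just one example `model_type`, and identifiers are interpolated as-is, so they must come from a trusted source.

```python
def create_model_sql(model: str, label: str, source_table: str,
                     model_type: str = "logistic_reg") -> str:
    """Build a BigQuery ML CREATE MODEL statement (all names are placeholders)."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


def train(sql: str) -> None:
    """Submit the statement; needs google-cloud-bigquery and GCP credentials."""
    # Imported lazily so that building the SQL has no third-party dependency.
    from google.cloud import bigquery

    client = bigquery.Client()
    client.query(sql).result()  # blocks until the training job finishes
```

For example, `train(create_model_sql("my_dataset.churn_model", "churned", "my_dataset.customers"))`; predictions then come from `ML.PREDICT` in an ordinary query, so the data never leaves BigQuery.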

Confusion between GitHub and git

First of all, great work with this really nice document and resource!

I have a small comment on the Get Familiar With Github chapter, about the confusion between GitHub and git itself. Although GitHub is currently the biggest hosting service for git repositories, it's very important to distinguish between git and the service that is GitHub.

Link Checker Report

Errors were reported while checking the availability of links.


๐Ÿ“ Summary
---------------------
๐Ÿ” Total..........456
โœ… Successful.....454
โณ Timeouts.........0
๐Ÿ”€ Redirected.......0
๐Ÿ‘ป Excluded.........0
๐Ÿšซ Errors...........2

Errors in sections/03-AdvancedSkills.md
โœ— https://github.com/gschmutz/stream-processing-workshop/tree/master/04-twitter-data-ingestion-with-streamsets (GitHub token not specified. To check GitHub links reliably, use `--github-token` flag / `GITHUB_TOKEN` env var.)

Errors in sections/07-DataSources.md
โœ— https://wiki.dbpedia.org/ (error sending request for url (https://wiki.dbpedia.org/): error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:../ssl/statem/statem_clnt.c:1915: (certificate has expired))

Full Github Actions output

Link Checker Report

Errors were reported while checking the availability of links.


๐Ÿ“ Summary
---------------------
๐Ÿ” Total..........454
โœ… Successful.....452
โณ Timeouts.........1
๐Ÿ”€ Redirected.......0
๐Ÿ‘ป Excluded.........0
๐Ÿšซ Errors...........1

Errors in sections/03-AdvancedSkills.md
โ†ฏ https://github.com/gschmutz/stream-processing-workshop/tree/master/04-twitter-data-ingestion-with-streamsets (GitHub token not specified. To check GitHub links reliably, use `--github-token` flag / `GITHUB_TOKEN` env var.)

Errors in sections/07-DataSources.md
โง– http://visualizingeconomics.com/

Full Github Actions output

Add Graph Databases to Cookbook

Hello @andkret, I appreciate your work here very much. For completeness, I would like to see graph databases, especially Neo4j. I saw some very cool stuff about Neo4j and Spark at their GraphConnect conference.
By the way, as an improvement, I would suggest adding some descriptive text wherever there is currently only a podcast or YouTube link, so that the cookbook doesn't become just a link collection.
BR Moe

Enhancement to Data Engineer vs Data Scientist

I feel we could enhance this by working on:

Data Scientist

  1. Add link to Interview with Data Scientists {Or any other link to interviews with Data Scientists}
  2. Improve the formatting of the steps on how ML works {right now it's flat text}
  3. Talk specifically about overfitting and underfitting {I've seen a lot of folks struggle with this: they get 99% accuracy on test, but fail horribly in production}. There is also a recent paper, "Machine Learning Testing: Survey, Landscapes and Horizons".
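A toy illustration of point 3 in plain Python: a 1-nearest-neighbour model memorises its noisy training labels, so it scores perfectly on the data it has seen and clearly worse on fresh data, while a simple rule generalises. The data generator, the 20% label noise, and the threshold rule (which happens to match how the toy data is generated) are all assumptions made for this sketch.

```python
import random


def make_data(n: int, noise: float, rng: random.Random) -> list[tuple[float, int]]:
    """Points in [0, 1); the true label is x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data


def nearest_neighbour(train: list[tuple[float, int]], x: float) -> int:
    """1-NN: copy the label of the closest training point, noise included."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def threshold_rule(x: float) -> int:
    """A deliberately simple model that cannot memorise individual points."""
    return int(x > 0.5)


def accuracy(predict, data: list[tuple[float, int]]) -> float:
    return sum(predict(x) == y for x, y in data) / len(data)


rng = random.Random(42)
train = make_data(200, noise=0.2, rng=rng)
held_out = make_data(2000, noise=0.2, rng=rng)


def knn(x: float) -> int:
    return nearest_neighbour(train, x)


knn_train, knn_test = accuracy(knn, train), accuracy(knn, held_out)
simple_train, simple_test = accuracy(threshold_rule, train), accuracy(threshold_rule, held_out)
print(f"1-NN:      train {knn_train:.2f}  held-out {knn_test:.2f}")
print(f"threshold: train {simple_train:.2f}  held-out {simple_test:.2f}")
```

The perfect training score of 1-NN is exactly the misleading "99% accuracy" situation: the gap only becomes visible on data the model has never seen.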

Data Engineering

  1. Talk about CRISP DM method
  2. Talk about EDA {Who performs it, what are general steps}
  3. Give historical context on why certain tools were created {e.g. Hadoop, Spark, Cassandra, etc.}

Let me know if these sound good; I can work on submitting a PR.
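As a starting point for the EDA item, here is a minimal Python sketch of the first pass many people make over a new dataset: per column, count missing and distinct values, and compute the range and mean of numeric columns. `profile` is a hypothetical helper and the tiny CSV is made up for illustration; a real EDA would go on to distributions, outliers, and correlations.

```python
import csv
import io
import statistics


def profile(rows: list[dict[str, str]]) -> dict[str, dict[str, object]]:
    """First-pass EDA: missing values, distinct values and numeric ranges per column."""
    if not rows:
        return {}
    report = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        present = [v for v in values if v not in ("", None)]
        summary = {
            "count": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = []  # not numeric: treat the column as categorical
        if numbers:
            summary.update(min=min(numbers), max=max(numbers), mean=statistics.fmean(numbers))
        report[column] = summary
    return report


sample = io.StringIO("city,temp\nBerlin,21.5\nParis,\nRome,30.5\n")
report = profile(list(csv.DictReader(sample)))
print(report)
```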

Request: add iptables to cookbook

Under 7.5 Firewalls, in the talk here, you said roughly: "I'm not sure where this topic can be learned, maybe Udacity. If anyone knows...?" (heavy paraphrasing here)

A link to an iptables tutorial, or a talk about it would be a great addition to the firewall section of the guide. :)
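Until a tutorial link is added, a sketch of a minimal iptables setup may help: allow loopback traffic and replies to connections the host started, open explicitly listed TCP ports, and drop everything else inbound. The Python below only builds and prints the commands; it does not run them. `basic_ruleset` is a hypothetical helper, the rules cover IPv4 only and are not persisted across reboots, and applying them over SSH without listing port 22 would lock you out.

```python
import shlex


def basic_ruleset(allowed_tcp_ports: list[int]) -> list[list[str]]:
    """Default-deny inbound ruleset as iptables argv lists (IPv4 only)."""
    rules = [
        # Keep loopback traffic and replies to connections this host initiated.
        ["iptables", "-A", "INPUT", "-i", "lo", "-j", "ACCEPT"],
        ["iptables", "-A", "INPUT", "-m", "conntrack",
         "--ctstate", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
    ]
    for port in allowed_tcp_ports:
        rules.append(["iptables", "-A", "INPUT", "-p", "tcp",
                      "--dport", str(port), "-j", "ACCEPT"])
    # Set the DROP policy last, so the allow rules exist before anything is dropped.
    rules.append(["iptables", "-P", "INPUT", "DROP"])
    return rules


if __name__ == "__main__":
    for rule in basic_ruleset([22]):
        print(shlex.join(rule))
```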
