This project forked from packtpublishing/building-big-data-pipelines-with-apache-beam

Building Big Data Pipelines with Apache Beam, published by Packt

License: Apache License 2.0

Shell 0.11% Python 11.31% Java 88.51% Dockerfile 0.06%

Building Big Data Pipelines with Apache Beam

This is the code repository for Building Big Data Pipelines with Apache Beam, published by Packt.

Use a single programming model for both batch and stream data processing

What is this book about?

This book describes both batch processing and real-time processing pipelines. You’ll learn how to implement basic and advanced big data use cases with ease and develop a deep understanding of the Apache Beam model. In addition to this, you’ll discover how the portability layer works and the building blocks of an Apache Beam runner.

This book covers the following exciting features:

  • Understand the core concepts and architecture of Apache Beam
  • Implement stateless and stateful data processing pipelines
  • Use state and timers for real-time event processing
  • Structure your code for reusability
  • Use streaming SQL to process real-time data, increasing productivity and data accessibility
  • Run a pipeline using a portable runner and implement data processing using the Apache Beam Python SDK
  • Implement Apache Beam I/O connectors using the Splittable DoFn API

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders.

The code will look like the following:

ClassLoader loader = FirstPipeline.class.getClassLoader();
String file = loader.getResource("lorem.txt").getFile();
List<String> lines = Files.readAllLines(Paths.get(file), StandardCharsets.UTF_8);
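
For readers who want to try the file-reading step without any Apache Beam dependency, the following is a minimal plain-Java sketch. It is not taken from the repository: the class name ReadLinesSketch, its two helper methods, and the word-counting step are illustrative assumptions, and a temporary file stands in for the lorem.txt classpath resource. It only shows that Files.readAllLines returns the file's lines as a List<String> that later code can process.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class: the name and methods are illustrative,
// not part of the book's repository.
class ReadLinesSketch {

  // Reads every line of a UTF-8 text file into memory.
  static List<String> readLines(Path file) throws IOException {
    return Files.readAllLines(file, StandardCharsets.UTF_8);
  }

  // Counts occurrences of each whitespace-separated, lower-cased word.
  // A TreeMap keeps the words in alphabetical order.
  static Map<String, Long> countWords(List<String> lines) {
    Map<String, Long> counts = new TreeMap<>();
    for (String line : lines) {
      for (String word : line.trim().split("\\s+")) {
        if (!word.isEmpty()) {
          counts.merge(word.toLowerCase(Locale.ROOT), 1L, Long::sum);
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) throws IOException {
    // Assumption: a temporary file replaces the lorem.txt resource.
    Path file = Files.createTempFile("lorem", ".txt");
    Files.write(
        file,
        Arrays.asList("lorem ipsum dolor", "ipsum lorem lorem"),
        StandardCharsets.UTF_8);
    try {
      // prints {dolor=1, ipsum=2, lorem=3}
      System.out.println(countWords(readLines(file)));
    } finally {
      Files.delete(file);
    }
  }
}
```

The sketch takes a Path directly rather than resolving a resource through a ClassLoader, so it runs from any directory. In a real project, the resource lookup shown above would supply the path instead.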

Who this book is for: data engineers, data scientists, and data analysts who want to learn how Apache Beam works. Intermediate-level knowledge of the Java programming language is assumed.

With the following software and hardware, you can run all of the code files in the book (Chapters 1-7).

Software and Hardware List

Chapter | Software required | OS required
1-7     | Java 11, Python 3 | Windows, Mac OS X, and Linux (Any)
1-7     | Bash              | Windows, Mac OS X, and Linux (Any)
1-7     | Docker            | Windows, Mac OS X, and Linux (Any)

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Get to Know the Author

Jan Lukavský is a freelance big data architect and engineer who is also a committer of Apache Beam. He is a certified Apache Hadoop professional. He is working on open source big data systems combining batch and streaming data pipelines in a unified model, enabling the rise of real-time, data-driven applications.

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.

https://packt.link/free-ebook/9781800564930
