rr-shell-data-processing

Big data: shell data processing using PowerShell and GitBash commands

PowerShell

  • mkdir - creates a new directory
  • cd - changes the directory
  • cd .. - changes to the parent directory
  • ni - creates a new item (alias for New-Item)
  • rm - removes an item
  • ls - lists the items in the current directory
  • ALT SPACE C - closes the window

GitBash

  • git clone repoUrl - clones the remote repo to the local machine
  • git pull origin branchName - pulls the latest changes from branchName on the remote
  • git remote add origin repoUrl - registers repoUrl as a remote named origin
  • git add . - stages all changes in the current directory
  • git commit -m "initial commit" - commits the staged changes with a message
  • git push origin branchName - pushes the local commits to branchName on the remote
  • cat - concatenates files and prints them on the standard output
  • head -10 filename.txt - displays the first 10 lines of the file
  • tail -2 filename.txt - displays the last 2 lines of the file
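
The file-viewing commands in this list can be tried on a small throwaway file. This is a minimal sketch: the file name sample.txt and its contents are made up for illustration, and it uses the `-n` form of head and tail, which is the current spelling of the older `-10` / `-2` style shown above.

```shell
# Create a hypothetical five-line sample file (name and contents are illustrative).
printf 'line1\nline2\nline3\nline4\nline5\n' > sample.txt

# Print the whole file to standard output.
cat sample.txt

# Capture and show the first 2 lines (same as the older form: head -2 sample.txt).
first_two=$(head -n 2 sample.txt)
echo "$first_two"

# Capture and show the last 2 lines (same as the older form: tail -2 sample.txt).
last_two=$(tail -n 2 sample.txt)
echo "$last_two"
```

Storing the output in variables is only there to make the result easy to inspect; running head or tail directly prints the same lines.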

Creating a project

  • Start a new project: right-click the folder and select "Open PowerShell window here as administrator".
  • Create a new subfolder by running the command "mkdir rr-shell-data-processing", where rr-shell-data-processing is the subfolder name.
  • Change directory to the subfolder with "cd rr-shell-data-processing".
  • Create two empty items named "README.md" and ".gitignore" using the commands "ni README.md" and "ni .gitignore".
  • Find an interesting web page (http://shakespeare.mit.edu/julius_caesar/full.html) and copy its address.
  • Use curl to return the page text. Hint: curl "http://shakespeare.mit.edu/julius_caesar/full.html"
  • To save the page text to data.txt instead of printing it, name the output file explicitly. In Windows PowerShell, where curl is an alias for Invoke-WebRequest:
   curl "http://shakespeare.mit.edu/julius_caesar/full.html" -OutFile "data.txt"
  • With the standalone curl program (for example in GitBash), use the lowercase -o option instead:
   curl "http://shakespeare.mit.edu/julius_caesar/full.html" -o "data.txt"

To request content from an HTTPS URL, first enable TLS 1.2 for the PowerShell session with the command:

[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
  • Then close the window in PowerShell.

To process the text data using Bash commands

  • Transform each space ' ' into a newline character '\12' (the octal escape for the ASCII line feed, equivalent to '\n') [2]
   tr ' ' '\12' < data.txt
  • Functionally, this "flat maps" each line into individual words. Pipe the output to sort (a pipe sends the results of one command as input into another command) so that identical words end up on adjacent lines
   tr ' ' '\12' < data.txt | sort
  • Pipe the sorted output to uniq -c, which collapses adjacent duplicate lines and prefixes each one with its count
   tr ' ' '\12' < data.txt | sort | uniq -c
  • Pipe the counted output to sort with the -nr flags to order it numerically, highest count first
   tr ' ' '\12' < data.txt | sort | uniq -c | sort -nr
  • Redirect the output to result.txt
   tr ' ' '\12' < data.txt | sort | uniq -c | sort -nr > result.txt
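
To see what the full pipeline produces, it can be run on a tiny input where the counts are easy to check by hand. This is a minimal sketch, assuming a hypothetical file words.txt with made-up contents rather than the downloaded data.txt; awk is used only to pick fields out of the result.

```shell
# Hypothetical one-line input; the name words.txt and its text are illustrative.
printf 'the cat and the dog and the bird\n' > words.txt

# Split into one word per line, sort, count duplicates, order by count (highest first).
counts=$(tr ' ' '\12' < words.txt | sort | uniq -c | sort -nr)
echo "$counts"

# The first line of the result holds the most frequent word.
top_word=$(echo "$counts" | head -n 1 | awk '{print $2}')
echo "Most frequent word: $top_word"

# Look up the count for one specific word.
and_count=$(echo "$counts" | awk '$2 == "and" {print $1}')
echo "Occurrences of 'and': $and_count"
```

Words with the same count (here cat, dog, and bird) may appear in any relative order, since `sort -nr` only compares the leading number.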

Text files:-
