
bayunova28 / spotify_lyrics


This repository contains my personal project that runs a MapReduce word count on Spotify lyrics using Apache Hadoop.

Shell 59.84% HiveQL 26.07% Python 14.10%
apache-derby apache-hadoop apache-hive hadoop-mapreduce mapreduce-python spotify

Spotify Lyrics

Background

Spotify is an audio streaming and media services provider founded on 23 April 2006 by Daniel Ek and Martin Lorentzon. It is one of the largest music streaming service providers, with over 456 million monthly active users, including 195 million paying subscribers, as of September 2022. Spotify is listed (through a Luxembourg City-domiciled holding company, Spotify Technology S.A.) on the New York Stock Exchange in the form of American depositary receipts. Spotify offers digital copyright-restricted recorded music and podcasts, including more than 82 million songs, from record labels and media companies. As a freemium service, basic features are free with advertisements and limited control, while additional features, such as offline listening and commercial-free listening, are offered via paid subscriptions. Users can search for music by artist, album, or genre, and can create, edit, and share playlists. Spotify is available in most of Europe, as well as Africa, the Americas, Asia and Oceania, for a total of 184 markets. The service is available on most devices, including Windows, macOS, and Linux computers, iOS and Android smartphones and tablets, smart home devices such as the Amazon Echo and Google Nest lines of products, and digital media players like Roku.

Requirement

Mapper.py

# import python library
import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    # emit one (word, 1) pair per word
    for word in words:
        # write the results to STDOUT (standard output)
        # tab-delimited; the trivial word count is 1
        print('%s\t%s' % (word, 1))
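
To make the mapper's output format concrete, here is a minimal sketch that wraps the same split-and-emit logic in functions so it can be tried without Hadoop. The names `map_line` and `format_pair` and the sample line are hypothetical and are not part of the repository.

```python
def map_line(line):
    """Return the (word, 1) pairs that the mapper logic produces for one line."""
    # same steps as mapper.py: strip whitespace, split on whitespace
    return [(word, 1) for word in line.strip().split()]


def format_pair(word, count):
    """Format a pair the way mapper.py prints it: tab-delimited."""
    return '%s\t%s' % (word, count)


if __name__ == '__main__':
    # hypothetical sample line, not taken from spotify/samples.txt
    for word, count in map_line('la la land'):
        print(format_pair(word, count))
```

Note that the mapper does not aggregate anything: a repeated word is emitted once per occurrence, and the summing is left to the reducer.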

Reducer.py

# import python library
import sys

# track the word currently being counted and its running total
current_word = None
current_count = 0
word = None

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()

    # parse the input we got from mapper.py and convert the
    # count (currently a string) to int
    try:
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        # the line had no tab or the count was not a number,
        # so silently ignore/discard this line
        continue

    # this IF-switch only works because Hadoop sorts map output
    # by key (here: word) before it is passed to the reducer
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # write result to STDOUT
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# do not forget to output the last word if needed!
if current_word == word:
    print('%s\t%s' % (current_word, current_count))
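
The reducer relies on its input being sorted by word. As a rough illustration of that dependency, the sketch below does the same summing with `itertools.groupby`, which also only merges adjacent equal keys. The function name `reduce_sorted` and the sample lines are hypothetical; this is an alternative formulation for local experiments, not the repository's reducer.

```python
from itertools import groupby


def reduce_sorted(lines):
    """Sum counts per word from tab-delimited lines sorted by word."""
    pairs = []
    for line in lines:
        parts = line.strip().split('\t', 1)
        if len(parts) != 2:
            # no tab on this line, skip it
            continue
        word, count = parts
        try:
            pairs.append((word, int(count)))
        except ValueError:
            # count was not a number, skip it
            continue
    totals = []
    # groupby merges only adjacent pairs that share the same word
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        totals.append((word, sum(count for _, count in group)))
    return totals


if __name__ == '__main__':
    # hypothetical mapper output, already sorted by word
    print(reduce_sorted(['la\t1', 'la\t1', 'land\t1']))
    # the same lines unsorted: 'la' is reported twice
    print(reduce_sorted(['la\t1', 'land\t1', 'la\t1']))
```

With unsorted input the same word shows up in several partial totals, which is exactly what would happen to Reducer.py if Hadoop did not sort the map output by key first.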

Run mapper and reducer program

EXEC=$(hadoop jar c:\hadoop-2.8.0\share\hadoop\tools\lib\hadoop-streaming-*.jar -file "D:\mapper.py" -mapper "python D:\mapper.py" -file "D:\reducer.py" -reducer "python D:\reducer.py" -input spotify/samples.txt -output spotify/output/)
echo "$EXEC"
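
Once the job has finished, the reducer's tab-delimited lines can be inspected with a few lines of Python. The sketch below assumes the output lines have the `word<TAB>count` shape printed by Reducer.py. The function name `top_words` and the sample data are hypothetical, and how the lines are fetched from `spotify/output/` is left open.

```python
def top_words(lines, n=3):
    """Return the n most frequent (word, count) pairs from reducer output."""
    counts = []
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue
        # split on the first tab only, as Reducer.py writes word<TAB>count
        word, _, count = line.partition('\t')
        try:
            counts.append((word, int(count)))
        except ValueError:
            # malformed line (no tab or non-numeric count), skip it
            continue
    # highest count first; ties are broken alphabetically by word
    return sorted(counts, key=lambda item: (-item[1], item[0]))[:n]


if __name__ == '__main__':
    # hypothetical reducer output, not real results from this project
    sample = ['love\t5\n', 'baby\t7\n', 'the\t5\n', 'bad line\n']
    for word, count in top_words(sample, n=2):
        print('%s\t%s' % (word, count))
```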

Hadoop Web UI

Acknowledgement
