# Code-Challenge---Data-Engineer---Machinemax

Code Challenge - Data Engineer - Machinemax

Challenge - News Content Collect and Store: Create a solution that crawls articles from a news website, cleanses the response, stores the result in a Mongo database, and makes it searchable via an API.

Instructions

• Candidates should put their test results in a public code repository hosted on GitHub.
• Once the test is completed, please share the GitHub repository URL or a zip file.
• You are building a backend application and no UI is required; input can be provided using a configuration file or the command line.
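As one way to satisfy the "configuration file or command line" requirement, the sketch below merges an optional JSON config file with command-line overrides. The flag names (`--config`, `--start-url`, `--mongo-uri`), the setting keys, and the default start URL are illustrative assumptions, not part of the challenge brief.

```python
import argparse
import json


def load_settings(argv=None):
    """Merge optional JSON config file values with command-line overrides."""
    parser = argparse.ArgumentParser(description="News crawler settings (illustrative)")
    # All flag names below are hypothetical; the brief does not prescribe any.
    parser.add_argument("--config", help="path to a JSON config file")
    parser.add_argument("--start-url", help="news site to crawl")
    parser.add_argument("--mongo-uri", help="MongoDB connection string")
    args = parser.parse_args(argv)

    # Assumed defaults; www.bbc.com is one of the example sites in the brief.
    settings = {"start_url": "https://www.bbc.com", "mongo_uri": None}
    if args.config:
        with open(args.config, encoding="utf-8") as fh:
            settings.update(json.load(fh))

    # Values given on the command line win over the config file.
    for key in ("start_url", "mongo_uri"):
        value = getattr(args, key)
        if value is not None:
            settings[key] = value
    return settings


if __name__ == "__main__":
    print(load_settings())
```

Keeping the connection string out of the source and in a config file or flag also avoids committing credentials to the public repository.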

Details

• Write an application to crawl an online news website, e.g. www.theguardian.com/au or www.bbc.com, using a crawler framework such as [Scrapy](http://scrapy.org/). You can use a crawl framework of your choice and build the application in Python.
• The application should cleanse the articles to obtain only information relevant to the news story, e.g. article text, author, headline, article URL, etc. Use a framework such as Readability to cleanse the page of superfluous content such as advertising and HTML.
• Store the data in a hosted Mongo database, e.g. compose.io/mongo, for subsequent search and retrieval. Ensure the URL of the article is included to enable comparison to the original.
• Write an API that provides access to the content, preferably served from the database. The user should be able to search for articles by keyword.
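The cleanse and search steps can be sketched with the standard library alone. This is a minimal illustration, not the Readability or Scrapy API: `ArticleExtractor`, `clean_article`, and `search` are hypothetical names, the parser is a simplified stand-in that keeps only `<h1>` and `<p>` text, and the in-memory keyword filter stands in for a database query.

```python
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Collect <h1> and <p> text, ignoring script, style, and page furniture."""

    SKIP = {"script", "style", "nav", "aside", "footer"}

    def __init__(self):
        super().__init__()
        self.headline_parts = []
        self.paragraphs = []
        self._skip_depth = 0  # > 0 while inside any tag listed in SKIP
        self._in_h1 = False
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "h1":
            self._in_h1 = True
        elif tag == "p" and not self._skip_depth:
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "h1":
            self._in_h1 = False
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_h1:
            self.headline_parts.append(data)
        elif self._in_p:
            self.paragraphs[-1] += data


def clean_article(url, html):
    """Return a document dict; the URL is kept for comparison with the original."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    headline = " ".join("".join(parser.headline_parts).split())
    text = "\n".join(" ".join(p.split()) for p in parser.paragraphs if p.strip())
    return {"url": url, "headline": headline, "text": text}


def search(articles, keyword):
    """Case-insensitive keyword match over headline and text (in-memory stand-in)."""
    needle = keyword.lower()
    return [
        a for a in articles
        if needle in a["headline"].lower() or needle in a["text"].lower()
    ]
```

Each dict returned by `clean_article` has the shape of a document that could be inserted into a Mongo collection. With a real database, `search` would typically be replaced by a query against a text index rather than a scan in Python.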

Evaluation:

• Coding standard, comments and style
• Unit testing strategy
• Overall solution design
• Code is easily understood and communicative (e.g. comments, variable names, etc.)
• Git commit history is consistent, easy to follow and understand

Contributors

vinodworks