Coder Social home page Coder Social logo

terapeak-ex1's Introduction

Ex 1

  • The DB used is postgresql.
  • Changed the password of the postgres user to 123. I should have created a new user, but I think for the exercise it is good enough.
  • Again, for the sake of the simplicity the password and the user are hard coded in the source code.
  • Created exdb database using the mkdb.sh script.
  • Using the Public schema.
  • Use the reset.sql script to create the table before running the application - ./psql.sh -f reset.sql.
  • Running the script erases the old table, so beware!
  • I did not spend much time doing error handling, something I would have surely done in a real application.
  • The code comes with an ant build script, just run ant without any arguments and find ex1.jar under bin\deploy.
  • java -jar bin\deploy\ex1.jar -f ../input_data_sample.json.gz

Ex 2

  • The keywords are calculated by splitting the description using the following regex: "\\s+|\\)|\\(|\\||\\?|\\!|\\\"|\\.|;|,|\\s+[':&-]|[':&-]\\s+"
  • Some keywords are ignored, namely these (case insensitive): "a", "the", "of", "per", "to", "this", "an", "with", "for", "and", "in", "any", "not", "only", "does", "do", "but", "you", "can", "or", "on", "is", "from", "your", "may", "so".
  • The proper way of handling such filtering is by creating a component which would receive a string and spit out keywords. Then this component should be injected into the relevant place using some DI. However, this is too much for an exercise.
  • Some keywords are too long, in which case they are truncated and terminated with an ellipsis. An alternative would be to add another boolean column explicitly indicating which keywords were truncated. Again, the proper handling should be implemented with an injectable component. The max size of the keyword is determined by reading the database metadata - no magic numbers in the code (only magic strings). Or we could simply give the keyword column the same size as that of the description (10,000).
  • The keywords table is de-normalized. Meaning it contains the name in full rather than a nameId foreign key into the items table.
  • Make sure to reset the database - ./psql.sh -f reset.sql
  • The query uses exact match on the keyword field. If a partial match is needed we can use the LIKE SQL statement. This could be a command line argument...
  • One db insertion thread is not enough - too slow. Use the following command line: java -jar bin\deploy\ex1.jar -f ../input_data_sample.json.gz --dbThreadCount 5
  • To query: java -jar bin/deploy/ex1.jar --query visual

terapeak-ex1's People

Contributors

markkharitonov avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.