Coder Social home page Coder Social logo

taipan_cc_lisi_search_app's Introduction

Running application

  • To run the application you need to have ruby installed.
  • run ruby lib/search_app.rb in the current directory (or the correct relative directory from whereever the command is being run).
  • It will automatically load the data from the /data directory.
  • It will then prompt for the table you wish to search, followed by the field, followed by the value.
  • The response is a pretty formatted json string, where the associations have been merged in to the base items. eg:
[
  {
    "_id": 32,
    "url": "http://initech.zendesk.com/api/v2/users/32.json",
    ...
    "organization_id": 105,
    "organization": {
      "_id": 105,
      "url": "http://initech.zendesk.com/api/v2/organizations/105.json",
      "external_id": "52f12203-6112-4fb9-aadc-70a6c816d605",
      ...
    },
    "tickets": [
      {
        "_id": "cb3b726e-9ba0-4e35-b4d6-ee41c29a7185",
        "url": "http://initech.zendesk.com/api/v2/tickets/cb3b726e-9ba0-4e35-b4d6-ee41c29a7185.json",
        "external_id": "19376875-49a7-4540-847b-b25f093b1635",
        ...
      },
      {
        ...
      }
    ]
  },
  ...other users
}

Testing

  • To run the specs need to run bundle and then rspec.

Assumptions

  • id's are unique
  • This will work more easily if the data is not changed. Can easily add more data and keep things indexed, but if had to update or delete would be a bit trickier to implement (though definitely doable).
  • A prettified json string is considered 'Human readable' ie. the user is comfortable reading json.
  • It is ok to restrict each search to a specific data type and that you can't search for a field across multiple data types at once.

Description and reasoning

Database class

  • There is one Database class that contains one "table" for each of the object types.
    • The database is created from json strings of the data, with a class method that creates a new instance by reading the json from files.
    • All external search calls come through this class and if we would add update, or delete functions this class would handle those as well.

Tables and indexing

  • Each table has the original hash saved which is used for arbitrary queries which need to scan the whole table.
  • #add_datais a separate method outside of theinitialize because all the tables have to be init'ed before they can be stored.
    • It returns self so it can be chained with other methods in the future if need be (eg. .add_data(data).add_association(associations).create_indexes)
  • Have added a primary in-memory-index to every table which is the '_id' => object so each hash is stored twice.
    • This doubles the memory requirements but makes lookups much easier. We could remove the original hash but the orgiinal hash is nicer for doing full scans on and 2x memory won't be the bottleneck for storage.
  • An additional in-memory-index is also created for each foreign key as they are the most commonly referenced fields.
    • As opposed to the primary index the values here are the primary keys of the object eg. on the users table there is an index {[org_key] => [array of user _ids]}. This allows for multiple additional indexes while limiting the extra memory.
    • The foreign key indexes could be combined into a single hash that would have all indexed fields as keys, but prefer having them as seperate hashes to provide more flexibility. For example if it's decided it's no longer necessary to keep one specific one it's easy to remove, or if they are very large having multiple will make it easier to store them on separate machines or in separate files.
  • Could add indexes on every field but don't think the space-performance ratio would be worthwhile, so unless there are specific performance issues is not needed.
  • Currently these indexes won't necessarily provide a large perforance boost as all the find_by queries still require a full table scan, but this logic can be extended to any field that is found to be frequently queried. If we decide to write indexes to files we could also then easily index all columns as the memory usage would no longer be a concern (though could depend on the memory-to-disk availability ratio if this would make sense).

Scalability and performance

  • Currently the main bottleneck to scalability is the memory usage, since it stores everything in memory. Though have tested it locally with up to 100,000 users and 100,000 tickets and have had no issues. If you would want it to scale then would have to either start writing portions to disk or could keep it all in memory and start splitting it across machines.
  • Currently all searches need to do a full scan of the table which is O(n) time. The additional association lookups is (near) constant time as all associations are indexed. But being that it is all in memory the performance is really fast and even full scans of 100,000 items are near instantaneous.

taipan_cc_lisi_search_app's People

Contributors

levsero avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.