locationIntelligencePipeline

1. linkCompanyAndLocation_v2_focus_on_location.py

Function: Build the relationship between companies and locations when only the geographic position of each is given. A company may be assigned to a single location or to multiple locations within range.

Usage: --run_root [path of data] --ls_card [name of location_scorecard (location feature)] --apps [suffix appended to the original name of the output file] --geobit [precision used for geohash matching] --dist_thresh [distance threshold used to discard companies that are far from the location within the same geohash]

Inputs:

1.1. location_scorecard:

	The features of buildings, including building class, size, city, wework_belongs, and so on, keyed by 'atlas_location_uuid'.

1.2. company features, one file per city:

	E.g.: ['dnb_pa.csv', 'dnb_sf.csv', 'dnb_sj.csv', 'dnb_Los_Angeles.csv', 'dnb_New_York.csv']

	Each file contains company information keyed by 'duns_number'.

Outputs:

A file containing company-location pairs.
Let's call this file the 'company-location relationship file'.
It is named ['PA', 'SF', 'SJ', 'LA', 'NY'] + app_date + '.csv'.
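
As an illustration of the matching described in section 1, here is a minimal Python sketch, not the script's actual code. It assumes a standard geohash encoding, a haversine distance, and a threshold expressed in kilometres (the unit of --dist_thresh is not stated above). The function names and the (id, lat, lon) tuple layout are hypothetical.

```python
import math
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def link_companies_to_locations(companies, locations, geobit, dist_thresh_km):
    """Pair each company with every location that shares its geohash cell
    and lies within dist_thresh_km. Inputs are lists of (id, lat, lon)."""
    buckets = defaultdict(list)
    for loc_id, lat, lon in locations:
        buckets[geohash_encode(lat, lon, geobit)].append((loc_id, lat, lon))

    pairs = []
    for duns, lat, lon in companies:
        cell = geohash_encode(lat, lon, geobit)
        for loc_id, llat, llon in buckets.get(cell, []):
            if haversine_km(lat, lon, llat, llon) <= dist_thresh_km:
                pairs.append((duns, loc_id))
    return pairs
```

Note that matching on the same geohash cell alone can miss a nearby location that falls just across a cell boundary; whether the script also checks neighbouring cells is not stated above.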

2. get_csv_for_training_and_testing.py

Function: Generate the training/testing file for ML.

Usage: --run_root [path of data] --ls_card [name of location_scorecard (location feature)] --app_date [date appended to the original name of the output file] --ratio [ratio of training samples to testing samples]

Inputs:

2.1. location_scorecard:

	The features of buildings, including building class, size, city, wework_belongs, and so on, keyed by 'atlas_location_uuid'.

2.2. company features, one file per city:

	E.g.: ['dnb_pa.csv', 'dnb_sf.csv', 'dnb_sj.csv', 'dnb_Los_Angeles.csv', 'dnb_New_York.csv']

	Each file contains company information keyed by 'duns_number'.

2.3. company-location relationship file:

	E.g.: ['PA', 'SF', 'SJ', 'LA', 'NY'] + apps.

	It is generated beforehand by linkCompanyAndLocation_v2_focus_on_location.py.

Outputs:

Files that can be used to train a model:

'train_val_test_location_company_82split' + apps: Train/test pairs. The training set contains only positive (P) pairs, while the test set contains both positive and negative (P/N) pairs.

'company_feat' + apps: Normalized features of each company.

'location_feat' + apps: Normalized features of each location.

'comp_feat_norm_param' + app_date + '.pkl': Normalization parameters for the continuous company features: mean/std and column names.

'loc_feat_norm_param' + app_date + '.pkl': Normalization parameters for the continuous location features: mean/std and column names.

'comp_feat_dummy_param' + app_date + '.pkl': Parameters for the dummy (categorical) company features: {key: original column name, value: category list}.

'loc_feat_dummy_param' + app_date + '.pkl': Parameters for the dummy (categorical) location features: {key: original column name, value: category list}.
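
To make the role of the '.pkl' parameter files concrete, here is a minimal sketch of how such parameters could be computed and saved. It is an assumption-laden illustration, not the script's code: the dict layout ("mean", "std", "columns"), the use of the population standard deviation, the function names, and the example columns 'emp_total' and 'industry' are all hypothetical.

```python
import pickle
import statistics


def fit_norm_params(rows, cont_cols):
    """Compute mean/std per continuous column.
    `rows` is a list of dicts, one per company or location."""
    mean, std = {}, {}
    for col in cont_cols:
        values = [float(r[col]) for r in rows]
        mean[col] = statistics.fmean(values)
        # Fall back to 1.0 for constant columns to avoid division by zero later.
        std[col] = statistics.pstdev(values) or 1.0
    return {"mean": mean, "std": std, "columns": list(cont_cols)}


def fit_dummy_params(rows, cat_cols):
    """Record the category list of each categorical column:
    {original column name: sorted list of categories}."""
    return {col: sorted({r[col] for r in rows}) for col in cat_cols}


def save_params(params, path):
    """Persist parameters so that a later run can reuse them unchanged."""
    with open(path, "wb") as f:
        pickle.dump(params, f)
```

Storing the category list alongside the mean/std means a later run can rebuild exactly the same feature columns, in the same order, even if the new data lacks some categories.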

3. get_csv_for_new_city_addtionally.py

Function: Generate additional normalized feature files for ML, reusing the normalization parameters generated by the previous run.

Usage: --run_root [path of data] --ls_card [name of location_scorecard (location feature)] --app_date [date appended to the original name of the output file] --ratio [ratio of training samples to testing samples]

Inputs:

3.1. location_scorecard:

	The features of buildings, including building class, size, city, wework_belongs, and so on, keyed by 'atlas_location_uuid'.

3.2. company features, one file per city:

	E.g.: ['dnb_pa.csv']

	Each file contains company information keyed by 'duns_number'.

3.3. company-location relationship file:

	E.g.: ['PA'] + apps.

	It is generated beforehand by linkCompanyAndLocation_v2_focus_on_location.py.

3.4. normalization files:

	'comp_feat_norm_param' + app_date + '.pkl': Normalization parameters for the continuous company features: mean/std and column names.

	'loc_feat_norm_param' + app_date + '.pkl': Normalization parameters for the continuous location features: mean/std and column names.

	'comp_feat_dummy_param' + app_date + '.pkl': Parameters for the dummy (categorical) company features: {key: original column name, value: category list}.

	'loc_feat_dummy_param' + app_date + '.pkl': Parameters for the dummy (categorical) location features: {key: original column name, value: category list}.

Outputs:

'train_val_test_location_company_82split' + appsadd: Train/test pairs. The training set contains only positive (P) pairs, while the test set contains both positive and negative (P/N) pairs.

'company_feat' + appsadd: Normalized features of each company.

'location_feat' + appsadd: Normalized features of each location.
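
The reuse of saved parameters for a new city can be sketched as follows. This is a hedged illustration rather than the script's implementation: it assumes the '.pkl' files hold a dict with "mean", "std", and "columns" for continuous features and a {column: category list} dict for dummy features, and the function names and example columns are hypothetical.

```python
import pickle


def load_params(path):
    """Load parameters that an earlier run saved with pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)


def apply_params(rows, key_col, norm, dummy):
    """Normalize new rows with previously fitted parameters.

    Continuous columns are standardized with the saved mean/std.
    Categorical columns are expanded against the saved category list,
    so a category unseen in the earlier run yields all zeros.
    """
    out = []
    for r in rows:
        feat = {key_col: r[key_col]}
        for col in norm["columns"]:
            feat[col] = (float(r[col]) - norm["mean"][col]) / norm["std"][col]
        for col, categories in dummy.items():
            for cat in categories:
                feat[f"{col}_{cat}"] = 1 if r[col] == cat else 0
        out.append(feat)
    return out
```

Applying the old parameters instead of refitting keeps the new city's features on the same scale and with the same columns as the data the model was trained on.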

4. get_sub_recommend_reason_after_similarity.py

Function: Generate a reason for each <cid,bid,score> record.

Usage: --run_root [path of data] --ls_card [name of location_scorecard (location feature)] --apps [suffix matching the output files from the previous stage of the pipeline] --sampled [if True, 'sampled_' is added] --ww [if True, 'ww_' is added]

Inputs:

4.1. location_scorecard:

	The features of buildings, including building class, size, city, wework_belongs, and so on, keyed by 'atlas_location_uuid'.

4.2. company features, one file per city:

	E.g.: ['dnb_pa.csv']

	Each file contains company information keyed by 'duns_number'.

4.3. company-location similarity score file:

	E.g.: ['sampled_ww_PA_similarity'] + apps.

	Each file contains the similarity scores between companies and buildings.

Outputs:

'dlsub_sampled_ww_PA_similarity' + apps: The column named 'note' stores the reason.

This file is used for uploading.
