- Preparation of the Datasets (dataset_TSMC2014_NYC.csv, dataset_TSMC2014_TKY.csv)
- Converted the string values of the “utcTimestamp” column to datetime64[ns] and split them into the columns “Year”, “Month”, “Hour”, and “Minute”.
- Encoded each latitude/longitude pair as a geohash with precision 5, using the pygeohash module, and stored the result in a new “Geohash” column. This divides the coordinates into “buckets” (zones) whose size depends on the number of characters in the hash (the precision).
- Encoding data types: the last step of data wrangling before I began model evaluation was encoding binary IDs and string values as integers with the Label Encoder module, specifically the target value (Venue Category) and the feature (Geohash).
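
The three steps above can be sketched roughly as follows. This is a dependency-free illustration, not the project's actual code: it uses only the Python standard library in place of pandas, pygeohash, and Label Encoder so that the logic of each step is visible. The timestamp format string, the sample rows, the column names other than “utcTimestamp” and “Geohash”, and the helper names (`split_timestamp`, `geohash_encode`, `label_encode`) are all assumptions made for the example.

```python
from datetime import datetime

# Standard geohash base-32 alphabet (it omits a, i, l and o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def split_timestamp(raw, fmt="%a %b %d %H:%M:%S %z %Y"):
    """Parse a timestamp string and split it into separate fields.

    The format string is an assumption about how "utcTimestamp" is stored.
    """
    ts = datetime.strptime(raw, fmt)
    return {"Year": ts.year, "Month": ts.month,
            "Hour": ts.hour, "Minute": ts.minute}


def geohash_encode(lat, lon, precision=5):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def label_encode(values):
    """Map each distinct value to an integer, using sorted order of the values."""
    classes = sorted(set(values))
    index = {value: i for i, value in enumerate(classes)}
    return [index[value] for value in values], classes


# Hypothetical sample rows; the values are made up for illustration.
rows = [
    {"venueCategory": "Coffee Shop", "latitude": 40.7420, "longitude": -73.9930,
     "utcTimestamp": "Tue Apr 03 18:00:09 +0000 2012"},
    {"venueCategory": "Bar", "latitude": 40.7300, "longitude": -73.9900,
     "utcTimestamp": "Tue Apr 03 18:05:41 +0000 2012"},
]

for row in rows:
    row.update(split_timestamp(row["utcTimestamp"]))
    row["Geohash"] = geohash_encode(row["latitude"], row["longitude"], precision=5)

# Integer-encode the target (venue category) and the Geohash feature.
target_codes, target_classes = label_encode([r["venueCategory"] for r in rows])
geohash_codes, geohash_classes = label_encode([r["Geohash"] for r in rows])
```

A shorter hash covers a larger zone, so lowering `precision` merges neighbouring buckets, while raising it separates venues that are close together.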