tsanalysis's People

Contributors

despinozas, leocamus

tsanalysis's Issues

'perfiles' changed when running ADATRAP

The purpose of this issue is to document how the 'perfiles' databases are evolving and what was found when comparing two different ADATRAP runs for the same days.

  1. Differences were found between the Aguila information and the 'etapas' information (in particular, in the number of transactions for 2017-04-10, 2017-04-17 and 2017-06-05).
  2. We were notified that ADATRAP had been run again for all the available dates.
  3. A new perfil database for 2017-04-10 was uploaded.
  4. @despinozas found differences between the old and new perfil for 2017-04-10. These differences were reported to the ADATRAP team.
  5. New perfiles for 2017-03-14 and 2017-05-25 were uploaded.
  6. @despinozas found differences between the old and new perfiles for 2017-03-14 and 2017-05-25.
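
One way to make such old-versus-new comparisons reproducible is an outer merge with an indicator column. The following is only a sketch on toy data: every column name except idExpedicion is hypothetical, and the real 'perfiles' schema may need a different set of comparison columns.

```python
import pandas as pd

# Toy stand-ins for two ADATRAP runs of the same day.
# Column names other than idExpedicion are hypothetical.
old_perfil = pd.DataFrame({
    "idExpedicion": [1, 1, 2],
    "paradero": ["P1", "P2", "P1"],
    "subidas": [10, 4, 7],
})
new_perfil = pd.DataFrame({
    "idExpedicion": [1, 1, 2],
    "paradero": ["P1", "P2", "P1"],
    "subidas": [10, 5, 7],
})

# With no `on` argument, merge joins on every shared column, so a row
# only counts as "both" when all of its values are identical.
diff = old_perfil.merge(new_perfil, how="outer", indicator=True)

only_old = diff[diff["_merge"] == "left_only"]    # rows that disappeared
only_new = diff[diff["_merge"] == "right_only"]   # rows that appeared
print(len(only_old), "rows only in old run;", len(only_new), "rows only in new run")
```

Rows that changed a single value show up once in each group, which makes it easy to list them side by side when reporting to the ADATRAP team.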

Rows with the same TIEMPO in the evasion database

Consider the complete evasion database filtered by door == 1. There are rows with exactly the same attributes except for the number of people boarding and not validating.

Check example.loc[1140:1141] in TEST_MatchingUsers: the problem comes from the original database. Consider removing the duplicated rows and keeping only rows with useful information.
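
A minimal sketch of how these rows could be flagged and reduced with pandas. Only TIEMPO and SERVICIO are column names taken from the issues; PATENTE, PUERTA and NO_VALIDAN are hypothetical, and keeping the row with the largest count is just an illustrative rule, not a decision made in this issue.

```python
import pandas as pd

# Toy evasion data; PATENTE, PUERTA and NO_VALIDAN are hypothetical names.
evasion = pd.DataFrame({
    "PATENTE": ["AB1234", "AB1234", "AB1234", "CD5678"],
    "SERVICIO": ["506", "506", "506", "210"],
    "TIEMPO": ["08:00:00", "08:00:00", "08:05:00", "08:00:00"],
    "PUERTA": [1, 1, 1, 1],
    "NO_VALIDAN": [0, 2, 1, 0],
})

count_col = "NO_VALIDAN"
key_cols = [c for c in evasion.columns if c != count_col]

# Rows that agree on every attribute except the count column.
conflicts = evasion[evasion.duplicated(subset=key_cols, keep=False)]

# Illustrative rule (an assumption): keep the row with the largest count.
deduplicated = (
    evasion.sort_values(count_col)
    .drop_duplicates(subset=key_cols, keep="last")
    .sort_index()
)
print(conflicts)
print(deduplicated)
```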

Order in filtering

Consider checking the order in which the DataFrames are filtered and how it affects the results.

Matching user by times - Methodological

It was found that taking mid-times in time windows built with TIEMPO from the evasion database does not correctly assign transactions (trx) from etapas. Check example.head(10) and clean_filtered_df.loc[182388:182395] in TEST_MatchingUsers.
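
To reproduce the behaviour on a small case, the mid-time rule can be read as "assign each transaction to the nearest TIEMPO", since the window boundaries fall at the midpoints between consecutive observations. The sketch below follows that reading, which is an assumption about the method; t_subida and obs_id are hypothetical names.

```python
import pandas as pd

# Evasion observations; obs_id is a hypothetical identifier.
evasion = pd.DataFrame({
    "TIEMPO": pd.to_datetime([
        "2017-07-19 08:00:00", "2017-07-19 08:04:00", "2017-07-19 08:10:00",
    ]),
    "obs_id": [0, 1, 2],
})

# Transactions from etapas; t_subida is a hypothetical column name.
etapas = pd.DataFrame({
    "t_subida": pd.to_datetime([
        "2017-07-19 08:01:30", "2017-07-19 08:02:30", "2017-07-19 08:08:00",
    ]),
})

# direction="nearest" places the boundary between two observations
# exactly at their midpoint. Both frames must be sorted by the time key.
matched = pd.merge_asof(
    etapas.sort_values("t_subida"),
    evasion.sort_values("TIEMPO"),
    left_on="t_subida",
    right_on="TIEMPO",
    direction="nearest",
)
print(matched[["t_subida", "TIEMPO", "obs_id"]])
```

In this toy case, two transactions one minute apart (08:01:30 and 08:02:30) land in different windows only because they sit on opposite sides of a midpoint, which is the kind of assignment worth inspecting in the rows cited above.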

Consider merging idExpedicion into the etapas database

Merging idExpedicion from 'perfiles' into 'etapas' is a critical step if information from 'paraderos' needs to be gathered. Since a bus operating a particular service passes through the same stops many times a day, the different expeditions must be distinguished when computing, at least, t_min and t_max of the transactions for a particular 'paradero'.
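
A sketch of one possible merge, assuming 'perfiles' exposes a start time per expedition. Apart from idExpedicion and servicio_subida, all column names (patente, inicio_expedicion, par_subida, t_subida) are hypothetical, and the as-of join is an assumed technique rather than the project's method.

```python
import pandas as pd

# One row per expedition; patente and inicio_expedicion are hypothetical.
perfiles = pd.DataFrame({
    "patente": ["AB1234", "AB1234"],
    "servicio_subida": ["T506 00I", "T506 00I"],
    "idExpedicion": [1, 2],
    "inicio_expedicion": pd.to_datetime([
        "2017-07-19 07:00:00", "2017-07-19 09:00:00",
    ]),
})

# Transactions at the same stop during two different expeditions.
etapas = pd.DataFrame({
    "patente": ["AB1234"] * 4,
    "servicio_subida": ["T506 00I"] * 4,
    "par_subida": ["P1"] * 4,
    "t_subida": pd.to_datetime([
        "2017-07-19 07:10:00", "2017-07-19 07:10:40",
        "2017-07-19 09:12:00", "2017-07-19 09:12:30",
    ]),
})

# Attach to each transaction the latest expedition of the same bus and
# service that started at or before the boarding time.
merged = pd.merge_asof(
    etapas.sort_values("t_subida"),
    perfiles.sort_values("inicio_expedicion"),
    left_on="t_subida",
    right_on="inicio_expedicion",
    by=["patente", "servicio_subida"],
    direction="backward",
)

# t_min and t_max per stop are now computed per expedition.
summary = (
    merged.groupby(["patente", "servicio_subida", "idExpedicion", "par_subida"])["t_subida"]
    .agg(t_min="min", t_max="max")
    .reset_index()
)
print(summary)
```

Without idExpedicion in the grouping keys, the four transactions would collapse into a single row whose t_min and t_max are two hours apart.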

Services names

There is an issue regarding service names:
The 'servicio_subida' column of the etapas database contains the TS names of the services.
The 'SERVICIO' column of the evasion database contains the USER names of the services.

Consider converting all the names to a common naming scheme before merging in EvasionPlotter_ByBusByService.ipynb.
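
A minimal sketch of the conversion, assuming a lookup table from TS names to user names is available. The mapping entries and the n_obs column below are invented for illustration and are not real service codes or data.

```python
import pandas as pd

# Hypothetical TS-name -> user-name lookup (invented example values).
ts_to_user = {"T506 00I": "506", "T210 00R": "210"}

etapas = pd.DataFrame({"servicio_subida": ["T506 00I", "T210 00R", "T999 00I"]})
evasion = pd.DataFrame({"SERVICIO": ["506", "210"], "n_obs": [12, 7]})

# Translate TS names to user names; unknown names become NaN.
etapas["servicio_usuario"] = etapas["servicio_subida"].map(ts_to_user)

# Surface unmapped services before merging instead of losing them silently.
unmapped = etapas[etapas["servicio_usuario"].isna()]

merged = etapas.merge(
    evasion, left_on="servicio_usuario", right_on="SERVICIO", how="inner"
)
print(unmapped)
print(merged)
```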

Correct turnstile installation dates from July 15th

This is a problem for:
(1) Time intervals between transactions on July 19th, 2017, since some buses with turnstiles already installed may be omitted (complete-day and by-period analysis, see https://www.overleaf.com/read/qzkqjmckxztw). Low priority.
(2) Evasion vs. mean intervals, since some buses with turnstiles already installed may be omitted. Low priority.
(3) Number of transactions by day. High priority.

This issue should be analyzed in depth when new information about turnstile installation dates is available.

Outliers in datasets

Some outliers need further analysis:

  1. 2017-04-10: high means in both the with-turnstiles and without-turnstiles conditions, a low number of observations in the with-turnstiles condition, and a low number of transactions even though it is a normal working day (see Out[10]). SOLVED.

  2. 2017-04-17: extremely high means in both the with-turnstiles and without-turnstiles conditions, and an extremely low number of transactions (see Out[11]). SOLVED.
    Items 1 and 2 were detected when plotting means.

  3. 2017-06-05: low number of transactions (see Out[24]). SOLVED.

  4. 2017-06-13: low number of transactions (see Out[24]). VERIFIED IN AGUILA.

  5. 2017-06-14: low number of transactions. VERIFIED IN AGUILA.
    Items 3 to 5 were detected when plotting the total number of transactions over time.

Enhance RunSilentlyDailyEtapasBuilder

  1. Refactor RunSilentlyDailyEtapasBuilder based on the knowledge gained while developing TemporalDescriptivesBuilder.
  2. Refactor the Jupyter notebooks that import RunSilentlyDailyEtapasBuilder.

idExpedicion revisited

When the 'perfiles' databases were analyzed, it was noticed that the minimum time between different expeditions of the same bus and service is about 30 seconds. This mainly affects the computation of mean time intervals for the same bus, service, and stop (i.e. in MatchingUsers.py, function groupByEtapasDatabase).

The questions are:

  • Why is this not important when computing mean intervals by bus, or by bus and service, but becomes an issue when computing mean intervals by bus, service, and stop?
  • How can we recognize whether two transactions on the same bus, service, and stop actually belong to different expeditions?
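
One heuristic that could be tested for the second question is to start a new block whenever the gap between consecutive transactions at the same bus, service, and stop exceeds a threshold. This is a sketch only: the 10-minute threshold and the column names patente, par_subida and t_subida are assumptions, and with expeditions as little as about 30 seconds apart a pure gap rule can merge distinct expeditions, so idExpedicion from 'perfiles' remains the more reliable source.

```python
import pandas as pd

keys = ["patente", "servicio_subida", "par_subida"]
threshold = pd.Timedelta(minutes=10)  # hypothetical value, needs calibration

etapas = pd.DataFrame({
    "patente": ["AB1234"] * 4,
    "servicio_subida": ["T506 00I"] * 4,
    "par_subida": ["P1"] * 4,
    "t_subida": pd.to_datetime([
        "2017-07-19 07:10:00", "2017-07-19 07:10:40",
        "2017-07-19 09:12:00", "2017-07-19 09:12:30",
    ]),
}).sort_values(keys + ["t_subida"]).reset_index(drop=True)

# Time since the previous transaction within the same bus/service/stop.
gap = etapas.groupby(keys)["t_subida"].diff()

# A block starts at the first transaction of a group or after a long gap.
starts_block = (gap.isna() | (gap > threshold)).astype(int)

# Running count of block starts per group gives a block number.
etapas["bloque"] = starts_block.groupby([etapas[k] for k in keys]).cumsum()
print(etapas[["t_subida", "bloque"]])
```

Comparing these block numbers against idExpedicion on days where both are available would show how often the heuristic disagrees with the 'perfiles' data.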

Outliers in 'etapas' dataset

The purpose of this issue is to document the current state of the etapas datasets in terms of number of transactions.

  1. 2017-06-05:
  • The number of observations is 704132. It does not match the Aguila number of observations.
  2. 2017-06-13:
  • The number of observations is 33896. It matches the Aguila number of observations.
  3. 2017-06-14:
  • The number of observations is 816872. It matches the Aguila number of observations.
