flight-safety-analysis's People

Contributors

alexs12, martosc

flight-safety-analysis's Issues

Phase 0: repository launching

  • Repo creation
  • Write a good readme
  • How to contribute
  • Function docstrings and documentation
  • Split slides into Notebooks:
    • Phases of flight
    • Occurrences
    • Age
    • Evolution
  • Others?

Gain visibility and collaboration

@AeroPython (and just in case that did not work: @AeroPython/instructors) I invoke you all.

This repository gathers some of the analyses made for the PyData presentations. These materials are potentially useful and attractive to new aeropythoners. Conversations with @Juanlu001 suggest that in the medium term they could be used for a pandas/dask tutorial. There are many pandas tutorials and courses, but using a domain-specific dataset is always a plus for gaining attention from specific sectors. On the dask side, there are few tutorials, all of them really basic, and a lot of information scattered across blogs, GitHub issues and Stack Overflow questions, so we may have a chance to contribute a relevant tutorial.

@martosc and I would be grateful if you could have a look at the notebooks and/or source code and suggest some improvements (or even make a pull request) before we start advertising the repo.

Database access

Currently, some functions under flight_safety/queries access the database, filter the records according to given criteria, and return a DataFrame with suitable column types.

Issues:

  • The filtering strategy is not very flexible, as it only covers the case studied for the talks: accidents for far_parts 121 and 125. Writing more and more functions is not an option, and passing the filtering options as arguments would also lead to tons of code. We need a viable alternative here.
  • It seems that pandas dtype inference is not working well (probably due to data quality) and, in any case, some object columns must be converted to categorical. Is there a better way than declaring every type at the beginning of the script?
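
On the typing question, one possible approach is to keep a single column-to-dtype mapping next to the query functions and apply it after every read. This is a minimal sketch, assuming an sqlite3 connection. The helper read_typed, the mapping COLUMN_DTYPES and the column names in it are hypothetical and not part of flight_safety:

```python
import sqlite3

import pandas as pd

# Hypothetical mapping: the column names are placeholders, not the real schema.
COLUMN_DTYPES = {
    "far_part": "category",  # low-cardinality text column
    "inj_tot_f": "Int64",    # nullable integer, tolerates missing values
}


def read_typed(query, con, dtypes=COLUMN_DTYPES, params=None):
    """Run `query` and cast only the columns listed in `dtypes`."""
    df = pd.read_sql_query(query, con, params=params)
    # Ignore mapping entries for columns the query did not return.
    present = {col: dtype for col, dtype in dtypes.items() if col in df.columns}
    return df.astype(present)
```

With this layout the types are declared once, and every query function that goes through read_typed returns consistent dtypes regardless of what pandas infers.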

Would something like sqlalchemy (https://www.sqlalchemy.org/) or pypika (https://github.com/kayak/pypika) help here?
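
Short of adopting either library, a single generic function that builds a parameterized WHERE clause from keyword arguments might already replace the per-case functions. The sketch below uses only sqlite3 and pandas, not sqlalchemy or pypika. The function query_table and the table and column names in the usage note are assumptions made for illustration:

```python
import sqlite3

import pandas as pd


def query_table(con, table, **filters):
    """Return the rows of `table` that match every filter, as a DataFrame.

    Each keyword is a column name. A scalar value means equality;
    a list, tuple or set means SQL IN.
    """
    # Identifiers cannot be bound as parameters, so validate them
    # against the schema before interpolating them into the SQL text.
    tables = {row[0] for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in tables:
        raise ValueError(f"unknown table: {table!r}")
    columns = {row[1] for row in con.execute(f'PRAGMA table_info("{table}")')}

    clauses, params = [], []
    for column, value in filters.items():
        if column not in columns:
            raise ValueError(f"unknown column: {column!r}")
        if isinstance(value, (list, tuple, set)):
            values = list(value)
            if not values:
                raise ValueError(f"empty filter for column {column!r}")
            placeholders = ", ".join("?" for _ in values)
            clauses.append(f'"{column}" IN ({placeholders})')
            params.extend(values)
        else:
            clauses.append(f'"{column}" = ?')
            params.append(value)

    sql = f'SELECT * FROM "{table}"'
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return pd.read_sql_query(sql, con, params=params)
```

Assuming a table named events with columns far_part and ev_type, the case studied for the talks would then read query_table(con, "events", far_part=["121", "125"], ev_type="ACC"). Range filters, joins and OR conditions are not covered, and a query builder may be the better fit for those.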

Database storage and updating

Currently the database is not hosted in the repository due to GitHub storage restrictions. Users are encouraged to download it with the script flight_safety/get_data.py, which downloads it from Dropbox and places it in the data folder, or to do the process themselves.

Issues with this approach:

  • The original database is a Microsoft Access (mdb) file.
  • The database we use is that file converted to sqlite3, so that Linux users can work with it comfortably.
  • The sqlite3 database is stored in a personal Dropbox account. If the original file is updated, we need to convert it and upload it again, and only @AlexS12 can do that.
  • This workflow could lead to different users having different versions of the database.
  • We are forced to maintain the script that downloads the data and places it in the right location.

We should radically change this approach. It would be great if we could find somewhere to host the data in sqlite format, with an automatic process converting the original mdb files each month (they are supposed to be updated monthly). When flight_safety is imported, it could check whether the user is working with the latest version of the database and warn them otherwise.
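
The import-time check could be sketched as a checksum comparison, assuming the conversion process publishes the SHA-256 digest of the latest sqlite file somewhere the package can read it. How that digest is published and fetched is left open here. The function names and the latest_sha256 argument are hypothetical:

```python
import hashlib
import warnings
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_database_version(db_path, latest_sha256):
    """Warn if the local database differs from the published one.

    Returns True when the local file matches `latest_sha256`.
    """
    db_path = Path(db_path)
    if not db_path.exists():
        warnings.warn(f"Database not found at {db_path}; download it first.")
        return False
    if file_sha256(db_path) != latest_sha256:
        warnings.warn(
            "The local database does not match the latest published version."
        )
        return False
    return True
```

A digest mismatch only shows that the files differ, not which one is newer. Publishing a release date alongside the digest would allow a more informative warning.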

Available options? Other approaches?
