
qpc-database's Projects

pai

Resource scheduling and cluster management for AI

pandas-challenge

# Pandas Homework - Pandas, Pandas, Pandas

## Background

The data dive continues! Now, it's time to take what you've learned about Python Pandas and apply it to new situations. For this assignment, you'll need to complete **one of two** (not both) Data Challenges. Once again, which challenge you take on is your choice. Just be sure to give it your all, as the skills you hone will become powerful tools in your data analytics tool belt.

### Before You Begin

1. Create a new repository for this project called `pandas-challenge`. **Do not add this homework to an existing repository**.
2. Clone the new repository to your computer.
3. Inside your local git repository, create a directory for the Pandas Challenge you choose. Use folder names corresponding to the challenges: **HeroesOfPymoli** or **PyCitySchools**.
4. Add your Jupyter notebook to this folder. This will be the main script to run for analysis.
5. Push the above changes to GitHub or GitLab.

## Option 1: Heroes of Pymoli

![Fantasy](Images/Fantasy.png)

Congratulations! After a lot of hard work in the data munging mines, you've landed a job as Lead Analyst for an independent gaming company.

You've been assigned the task of analyzing the data for their most recent fantasy game, Heroes of Pymoli.

Like many others in its genre, the game is free-to-play, but players are encouraged to purchase optional items that enhance their playing experience. As a first task, the company would like you to generate a report that breaks down the game's purchasing data into meaningful insights.

Your final report should include each of the following:

### Player Count

* Total Number of Players

### Purchasing Analysis (Total)

* Number of Unique Items
* Average Purchase Price
* Total Number of Purchases
* Total Revenue

### Gender Demographics

* Percentage and Count of Male Players
* Percentage and Count of Female Players
* Percentage and Count of Other / Non-Disclosed

### Purchasing Analysis (Gender)

* Each of the following, broken down by gender:
  * Purchase Count
  * Average Purchase Price
  * Total Purchase Value
  * Average Purchase Total per Person by Gender

### Age Demographics

* Each of the following, broken into bins of 5 years (e.g. <10, 10-14, 15-19, etc.):
  * Purchase Count
  * Average Purchase Price
  * Total Purchase Value
  * Average Purchase Total per Person by Age Group

### Top Spenders

* Identify the top 5 spenders in the game by total purchase value, then list (in a table):
  * SN
  * Purchase Count
  * Average Purchase Price
  * Total Purchase Value

### Most Popular Items

* Identify the 5 most popular items by purchase count, then list (in a table):
  * Item ID
  * Item Name
  * Purchase Count
  * Item Price
  * Total Purchase Value

### Most Profitable Items

* Identify the 5 most profitable items by total purchase value, then list (in a table):
  * Item ID
  * Item Name
  * Purchase Count
  * Item Price
  * Total Purchase Value

As final considerations:

* You must use the Pandas Library and the Jupyter Notebook.
* You must submit a link to your Jupyter Notebook with the viewable Data Frames.
* You must include a written description of three observable trends based on the data.
* See [Example Solution](HeroesOfPymoli/HeroesOfPymoli_starter.ipynb) for a reference on expected format.

## Option 2: PyCitySchools

![Education](Images/education.png)

Well done! Having spent years analyzing financial records for big banks, you've finally scratched your idealistic itch and joined the education sector. In your latest role, you've become the Chief Data Scientist for your city's school district. In this capacity, you'll be helping the school board and mayor make strategic decisions regarding future school budgets and priorities.

As a first task, you've been asked to analyze the district-wide standardized test results. You'll be given access to every student's math and reading scores, as well as various information on the schools they attend. Your responsibility is to aggregate the data and showcase obvious trends in school performance.

Your final report should include each of the following:

### District Summary

* Create a high-level snapshot (in table form) of the district's key metrics, including:
  * Total Schools
  * Total Students
  * Total Budget
  * Average Math Score
  * Average Reading Score
  * % Passing Math (The percentage of students that passed math.)
  * % Passing Reading (The percentage of students that passed reading.)
  * % Overall Passing (The percentage of students that passed math **and** reading.)

### School Summary

* Create an overview table that summarizes key metrics about each school, including:
  * School Name
  * School Type
  * Total Students
  * Total School Budget
  * Per Student Budget
  * Average Math Score
  * Average Reading Score
  * % Passing Math (The percentage of students that passed math.)
  * % Passing Reading (The percentage of students that passed reading.)
  * % Overall Passing (The percentage of students that passed math **and** reading.)

### Top Performing Schools (By % Overall Passing)

* Create a table that highlights the top 5 performing schools based on % Overall Passing. Include:
  * School Name
  * School Type
  * Total Students
  * Total School Budget
  * Per Student Budget
  * Average Math Score
  * Average Reading Score
  * % Passing Math (The percentage of students that passed math.)
  * % Passing Reading (The percentage of students that passed reading.)
  * % Overall Passing (The percentage of students that passed math **and** reading.)
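
The per-school metrics listed above can be sketched with a `groupby` followed by a join. This is only an illustration under stated assumptions, not the expected solution: the sample data, the column names (`school_name`, `type`, `budget`, `math_score`, `reading_score`), and the passing threshold of 70 are all hypothetical, since the assignment text does not specify them.

```python
import pandas as pd

# Hypothetical sample data; the real assignment supplies its own files,
# and the column names used here are assumptions.
schools = pd.DataFrame({
    "school_name": ["North High", "South High"],
    "type": ["District", "Charter"],
    "budget": [300000, 120000],
})
students = pd.DataFrame({
    "school_name": ["North High"] * 3 + ["South High"] * 2,
    "math_score": [65, 80, 90, 75, 95],
    "reading_score": [72, 68, 85, 88, 91],
})

PASSING_SCORE = 70  # assumed threshold; the assignment text does not state one


def school_summary(schools: pd.DataFrame, students: pd.DataFrame) -> pd.DataFrame:
    """Build a per-school summary table from school and student records."""
    scored = students.assign(
        passed_math=students["math_score"] >= PASSING_SCORE,
        passed_reading=students["reading_score"] >= PASSING_SCORE,
    )
    scored["passed_both"] = scored["passed_math"] & scored["passed_reading"]

    per_school = scored.groupby("school_name").agg(
        total_students=("math_score", "size"),
        avg_math=("math_score", "mean"),
        avg_reading=("reading_score", "mean"),
        pct_math=("passed_math", "mean"),
        pct_reading=("passed_reading", "mean"),
        pct_overall=("passed_both", "mean"),
    )
    # The mean of a boolean column is a fraction; scale it to a percentage.
    pct_cols = ["pct_math", "pct_reading", "pct_overall"]
    per_school[pct_cols] = per_school[pct_cols] * 100

    summary = schools.set_index("school_name").join(per_school)
    summary["per_student_budget"] = summary["budget"] / summary["total_students"]
    return summary


summary = school_summary(schools, students)

# Top performers: sort by overall passing rate and keep the first five rows.
top_schools = summary.sort_values("pct_overall", ascending=False).head(5)
print(top_schools)
```

Sorting with `ascending=True` instead would give the bottom performers from the same summary table, so the summary only needs to be built once.
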

### Bottom Performing Schools (By % Overall Passing)

* Create a table that highlights the bottom 5 performing schools based on % Overall Passing. Include all of the same metrics as above.

### Math Scores by Grade

* Create a table that lists the average Math Score for students of each grade level (9th, 10th, 11th, 12th) at each school.

### Reading Scores by Grade

* Create a table that lists the average Reading Score for students of each grade level (9th, 10th, 11th, 12th) at each school.

### Scores by School Spending

* Create a table that breaks down school performances based on average Spending Ranges (Per Student). Use 4 reasonable bins to group school spending. Include in the table each of the following:
  * Average Math Score
  * Average Reading Score
  * % Passing Math (The percentage of students that passed math.)
  * % Passing Reading (The percentage of students that passed reading.)
  * % Overall Passing (The percentage of students that passed math **and** reading.)

### Scores by School Size

* Repeat the above breakdown, but this time group schools based on a reasonable approximation of school size (Small, Medium, Large).

### Scores by School Type

* Repeat the above breakdown, but this time group schools based on school type (Charter vs. District).

As final considerations:

* Use the pandas library and Jupyter Notebook.
* You must submit a link to your Jupyter Notebook with the viewable Data Frames.
* You must include a written description of at least two observable trends based on the data.
* See [Example Solution](PyCitySchools/PyCitySchools_starter.ipynb) for a reference on the expected format.

## Hints and Considerations

* These are challenging activities for a number of reasons. For one, these activities will require you to analyze thousands of records. Hacking through the data to look for obvious trends in Excel is just not a feasible option. The size of the data may seem daunting, but pandas will allow you to efficiently parse through it.
* Second, these activities will also challenge you by requiring you to learn on your feet. Don't fool yourself into thinking: "I need to study pandas more closely before diving in." Get the basic gist of the library and then _immediately_ get to work. When facing a daunting task, it's easy to think: "I'm just not ready to tackle it yet." But that's the surest way to never succeed. Learning to program requires one to constantly tinker, experiment, and learn on the fly. You are doing exactly the _right_ thing if you find yourself constantly practicing Google-Fu and diving into documentation. There is just no way (or reason) to try and memorize it all. Online references are available for you to use when you need them. So use them!
* Take each of these tasks one at a time. Begin your work by answering the basic questions: "How do I import the data?" "How do I convert the data into a DataFrame?" "How do I build the first table?" Don't get intimidated by the number of asks. Many of them are repetitive in nature with just a few tweaks. Be persistent and creative!
* Expect these exercises to take time! Don't get discouraged if you find yourself spending hours initially with little progress. Force yourself to deal with the discomfort of not knowing and forge ahead. Consider these hours an investment in your future!
* As always, feel encouraged to work in groups and get help from your TAs and Instructor. Just remember, true success comes from mastery and _not_ a completed homework assignment. So challenge yourself to truly succeed!

### Copyright

Trilogy Education Services © 2019. All Rights Reserved.
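
Several of the tables requested in this assignment group records into ranges (age groups in Option 1; spending ranges and school sizes in Option 2). One way to approach this is `pandas.cut`, sketched below for the example age ranges <10, 10-14, and 15-19. The sample records, the player names, the column names (`SN`, `Age`, `Price`), the bin edges, and the `20+` label are all assumptions for illustration, not the provided dataset.

```python
import pandas as pd

# Hypothetical purchase records; column names ("SN", "Age", "Price") are
# assumptions standing in for whatever the provided dataset uses.
purchases = pd.DataFrame({
    "SN": ["Ayla", "Ayla", "Bren", "Cato", "Dara"],
    "Age": [8, 8, 12, 17, 14],
    "Price": [1.00, 3.00, 2.50, 4.00, 1.50],
})

# Bin edges are right-inclusive by default, so the interval (9, 14]
# holds ages 10 through 14.
edges = [0, 9, 14, 19, 200]
labels = ["<10", "10-14", "15-19", "20+"]
purchases["Age Group"] = pd.cut(purchases["Age"], bins=edges, labels=labels)

# observed=True keeps only the age groups that actually appear in the data.
by_age = purchases.groupby("Age Group", observed=True).agg(
    purchase_count=("Price", "size"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Divide by unique players, not purchases, to get the per-person total.
by_age["avg_total_per_person"] = by_age["total_value"] / by_age["players"]
print(by_age)
```

The per-person average divides by the number of unique players rather than the number of purchases, which matters whenever a player buys more than once. The same pattern applies to spending ranges or school sizes by swapping in different edges and labels.
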

pdfium

The PDF library used by the Chromium project

pecos

PECOS - Prediction for Enormous and Correlated Spaces

pensum-extractor

The Pensum-extractor tool distills a load-test scenario out of historical Google Analytics data

perf-track

Tracking framework performance and usage at scale

perfdoc

A cross-platform Vulkan layer which checks Vulkan applications for best practices on Arm Mali devices.

perfetto

Read-only mirror of https://android.googlesource.com/platform/external/perfetto/

permission.site

A site to test the interaction of web APIs and browser permissions.

php-enum

The enum PHP is missing, inspired by SplEnum

php-ga-tools

Google Analytics helper class for sending hits using the Measurement protocol

php-qrcode-detector-decoder

This is a PHP library to detect and decode QR codes. This is the first and only QR code reader that works without extensions.

php-src

Security backports for EOL versions of PHP.

phpuseragent

Lightning Fast, Minimalist PHP User Agent String Parser.
