PROJECT1:
- Find a data visualization tool and learn to use it;
- Find a publicly available data set and use the data visualization tool to present the data clearly and attractively.
Using Tableau for Data Visualization:
Step 1: Go to http://www.tableau.com/public/ and download the free version of Tableau.
Step 2: Work through the tutorials on the training page, which offers more material than you will need to build a good visualization project: http://www.tableau.com/public/training
Step 3: Look for inspiration: the gallery is a good place to look for ideas if you do not know where to begin: http://www.tableau.com/public/gallery
Step 4: Look for data: sample data sets can be found here: http://www.tableau.com/public/community/sample-data-sets. Many other data sets are available online at sites such as Kaggle. You can also use your own data set if you have an interesting problem of your own to work on.
DELIVERABLE:
- Is the use of visual components effective and intuitive?
- Is the dashboard INTERACTIVE? Does it allow users to drill up and down (or across) through the use of various widgets?
- Is the message clear thanks to the visualization?
PROJECT2:
In this project, each team will use a publicly available data set, define a mining problem, then use at least TWO different mining algorithms to mine the data set. Compare the performance of the two models, choose the better one, and interpret your findings. Teams will write a report summarizing their findings. The report is due at the end of the semester.
To look for a data set, you can go to kaggle.com or the UC Irvine Machine Learning Repository. There are plenty of data sets for you to choose from. Make sure you read the files that describe the data before you start mining.
To mine, you can use RapidMiner. If you do use it, make sure to include a screenshot of EACH mining model in your presentation.
The resulting confusion matrix and performance measures need to be shown in your presentation.
In your discussion, focus on your findings and their significance, both mathematically and practically.
STRUCTURE:
Part I: Introduction. Discuss the problem at hand and the background of the data set: who collected or created it, for what purpose, etc.
Part II: Data. Show summary statistics for your data: number of rows and columns, and the median, mean, standard deviation, etc. of each attribute. RapidMiner can help with this task.
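For teams that want to double-check the summary statistics outside RapidMiner, here is a minimal sketch in Python using only the standard library. Python is not required by the assignment; the table, the column names "age" and "balance", and the helper summarize are all hypothetical and only illustrate what Part II asks for.

```python
import statistics

# Hypothetical toy table: each dict is one row, each key is one column.
rows = [
    {"age": 25, "balance": 1200.0},
    {"age": 32, "balance": 560.0},
    {"age": 47, "balance": 3100.0},
    {"age": 51, "balance": 870.0},
]


def summarize(rows):
    """Return row/column counts plus mean, median and sample
    standard deviation for every (numeric) column."""
    columns = list(rows[0].keys())
    summary = {"n_rows": len(rows), "n_columns": len(columns), "columns": {}}
    for name in columns:
        values = [row[name] for row in rows]
        summary["columns"][name] = {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values),  # sample standard deviation
        }
    return summary


summary = summarize(rows)
print("rows:", summary["n_rows"], "columns:", summary["n_columns"])
for name, stats in summary["columns"].items():
    print(name, stats)
```

The sketch assumes every column is numeric and has at least two values; a real data set would need missing values and categorical attributes handled separately.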
Part III: Mining Algorithms. Introduce at least TWO CLASSIFICATION / PREDICTION algorithms covered in our class. Show a screenshot of the mining workflow.
Part IV: Evaluation. The most important part. You will address the following issues:
a. Do you choose precision or recall as the main measure for your task? Why?
b. Show the confusion matrix for the two algorithms. Which one is better?
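To make questions a and b concrete, the following is a small sketch in Python (not part of the required RapidMiner workflow) that builds a binary confusion matrix and computes precision and recall for two models. The labels and the predictions of "model A" and "model B" are made-up toy values, and the function names are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary task."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["tp"] += 1
        elif p == positive:
            counts["fp"] += 1
        elif a == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision(counts):
    """Fraction of predicted positives that are truly positive."""
    predicted_pos = counts["tp"] + counts["fp"]
    return counts["tp"] / predicted_pos if predicted_pos else 0.0


def recall(counts):
    """Fraction of actual positives that the model found."""
    actual_pos = counts["tp"] + counts["fn"]
    return counts["tp"] / actual_pos if actual_pos else 0.0


# Made-up labels and predictions, for illustration only.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # cautious: few positive calls
model_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # eager: many positive calls

counts_a = confusion_counts(actual, model_a)
counts_b = confusion_counts(actual, model_b)

for name, counts in (("A", counts_a), ("B", counts_b)):
    print(name, counts,
          "precision=%.2f" % precision(counts),
          "recall=%.2f" % recall(counts))
```

In this toy case model A has the higher precision and model B the higher recall, so which one is "better" depends on the measure chosen in question a. That choice should follow from the cost of a false positive versus a false negative in your problem.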
Project Link - https://public.tableau.com/profile/sirisha.bojjireddy#!/vizhome/BankData_15736791328020/Story1