Scientific Methodology and Performance Evaluation for Computer Scientists

Reporting errors: Although I do my best, there are bound to be typos and broken links. This is GitHub, so please report everything you find so that I can improve the material for others. :)

This website gathers the series of lectures on applied performance evaluation that I have been invited to give on various occasions. I gave this course with Jean-Marc Vincent for several years (roughly from 2011 to 2015) in the second year of the Master of Science in Informatics at Grenoble, and more recently at the Federal University of Rio Grande do Sul.

Content

The aim of this course is to provide the fundamental basis for a sound scientific methodology of performance evaluation of computer systems. The lectures emphasize the methodological aspects of measurement and the statistics needed to analyze computer systems. I first raise the audience's awareness of the reproducibility issue for experiments and analyses, in particular in computer science. Then I present tools that help address the analysis problem and may also prove useful for managing the experimental process through notebooks. The audience is given the basics of probability and statistics required to develop sound experimental designs. Unlike some other lectures, my goal is not to provide analysis recipes that people can readily apply, but to make people really understand a few simple tools so that they can dig deeper later on.

The course is organized in 5 very dense lectures of 3 hours each:

  1. Reproducible research. A video of a similar presentation (actually a mixture of lectures 1 and 2) is available on canal-u (Part 1 and Part 2) and graal (Part 1 and Part 2).
  2. Data visualization/presentation.
  3. Introduction to probabilities/statistics.
  4. Linear regression.
  5. Design of Experiments.
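
To give a flavour of what lectures 4 and 5 deal with, here is a minimal sketch in base R of a randomized full factorial design followed by a linear regression. The factors (`threads`, `size`), the cost model and the noise level are all made up for illustration; in a real study the `time` column would come from actual measurements.

```r
set.seed(2015)

# Full factorial design: every combination of two hypothetical factors,
# each combination replicated 5 times (4 * 3 * 5 = 60 runs)
design <- expand.grid(
  threads = c(1, 2, 4, 8),
  size = c(1000, 2000, 4000),
  replicate = 1:5
)

# Randomize the run order so that disturbances drifting over time
# do not systematically affect one configuration
design <- design[sample(nrow(design)), ]

# Stand-in for a real measurement: a made-up linear cost model plus noise
design$work <- design$size / design$threads
design$time <- 0.5 + 0.002 * design$work + rnorm(nrow(design), sd = 0.05)

# Linear regression of time on the work per thread,
# with 95% confidence intervals on the estimated coefficients
fit <- lm(time ~ work, data = design)
print(summary(fit)$coefficients)
print(confint(fit, level = 0.95))
```

Because the data are simulated from a known model, the estimated slope can be compared with the value used to generate them (0.002), which is a convenient way to check that an analysis script behaves as expected before applying it to real data.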

All the examples given in this series of lectures use the R language, and the sources are provided so that people can reuse them. The slides are composed with org-mode/beamer.

More precisely, I introduce the audience to the following tools:

  • R and ggplot2, which provide a standard, efficient and flexible mechanism for data management and graph generation. Although R is quite cumbersome at first for computer scientists, it quickly proves to be an incredible asset compared to spreadsheets, gnuplot or graphical libraries like matplotlib or tikz.
  • knitr, a tool that integrates R commands within a LaTeX or Markdown document. It makes it possible to fully automate data post-processing, analysis and figure generation, down to their integration into a report. Beyond the gains in ease of generation, page layout and uniformity, such integration allows anyone to easily check what has been done during the analysis and possibly to improve the graphs or the analysis.
  • Rstudio, a multi-platform and easy-to-use IDE for R, with which I explain how to use these tools. For example, using R+Markdown (Rmd files) in Rstudio, it is extremely easy to export the output to Rpubs and hence make the results of your research available to others in no more than two clicks.
  • I also mention alternatives such as org-mode and babel or the IPython notebook, which allow a day-to-day practice of reproducible research in a somewhat more fluid way than knitr, but I am probably not fully objective here. :)
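
As an illustration of the R and ggplot2 workflow mentioned above, the following sketch builds a small data frame and plots it. It is not taken from the lectures: the data are simulated, and the column names (`size`, `algorithm`, `time`) and the output file name are assumptions chosen for the example.

```r
library(ggplot2)

set.seed(42)

# Hypothetical measurements: 10 runs per input size for two made-up algorithms
timings <- data.frame(
  size = rep(c(100, 200, 400, 800), each = 10, times = 2),
  algorithm = rep(c("A", "B"), each = 40)
)
cost_per_item <- ifelse(timings$algorithm == "A", 0.002, 0.003)
timings$time <- cost_per_item * timings$size + rnorm(nrow(timings), sd = 0.05)

# One layer for the raw points, one for a per-algorithm linear fit
p <- ggplot(timings, aes(x = size, y = time, colour = algorithm)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Input size", y = "Execution time (s)", colour = "Algorithm") +
  theme_bw()

# Save the figure to a temporary file; use print(p) in an interactive session
out_file <- file.path(tempdir(), "timings.pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```

Keeping the data in a single "long" data frame (one row per measurement) is what lets ggplot2 map `algorithm` to a colour without any manual looping.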

Using R

Installing R and Rstudio

Here is how to proceed on debian-based distributions:

sudo apt-get install r-base r-cran-ggplot2 r-cran-reshape 

Rstudio and knitr are unfortunately not packaged within debian, so the easiest way is to download the corresponding debian package from the Rstudio webpage and then install it manually (depending on when you do this, you should obviously change the version number).

# This archive is likely to be outdated by now, so get the most recent one.
wget http://download1.rstudio.org/rstudio-0.97.551-amd64.deb
sudo dpkg -i rstudio-0.97.551-amd64.deb
sudo apt-get -f install # to fix possibly missing dependencies

You will also need to install knitr. To this end, simply run R (or Rstudio) and use the following command:

install.packages("knitr")

If r-cran-ggplot2 or r-cran-reshape could not be installed for some reason, you can also install them through R by doing:

install.packages("ggplot2")
install.packages("reshape")
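
To check which of these packages are still missing before installing anything, something along the following lines can be run from R. The helper name `missing_packages` is hypothetical (it is not part of R or of any package); only base R functions are used.

```r
# Return the subset of 'pkgs' that cannot be loaded in this R installation.
# requireNamespace() returns FALSE instead of failing when a package is absent.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Packages mentioned in the installation instructions above
needed <- c("knitr", "ggplot2", "reshape")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(to_install)
} else {
  message("All required packages are available.")
}
```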

Producing documents

The easiest way to go is probably to use R+Markdown (Rmd files) in Rstudio and to export them via Rpubs to make whatever you want available.

We can roughly distinguish between three kinds of documents:

  1. Lab notebook (everything you try, meant mainly for yourself)
  2. Experimental report (selected results and explanations, with enough details to discuss with your advisor)
  3. Result description (rather short, with only the main points, and which could be embedded in an article)

We expect you to provide the last two and to make them publicly available so that others can comment on them.
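
For readers who have never seen an Rmd file, here is a minimal sketch of what such a document contains and how knitr processes it outside Rstudio. The file names and the report content are made up for the example, and the chunk delimiter is built with `strrep()` only to keep this listing readable. In Rstudio, the "Knit" button does the equivalent work.

```r
# Build a minimal R Markdown report: plain Markdown text plus one R chunk
fence <- strrep("`", 3)  # the three backticks that delimit a chunk
rmd_lines <- c(
  "# Experimental report",
  "",
  "Mean of ten simulated response times:",
  "",
  paste0(fence, "{r}"),
  "set.seed(1)",
  "mean(rexp(10, rate = 2))",
  fence
)
rmd_file <- file.path(tempdir(), "report.Rmd")
writeLines(rmd_lines, rmd_file)

# knit() runs the chunks and writes a Markdown file that contains
# both the code and its output
if (requireNamespace("knitr", quietly = TRUE)) {
  md_file <- knitr::knit(rmd_file,
                         output = file.path(tempdir(), "report.md"),
                         quiet = TRUE)
  cat(readLines(md_file), sep = "\n")
}
```

Since the numbers in the report are recomputed every time the document is knitted, the text and the results cannot drift apart, which is the main point of this way of working.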

Learning R

For a quick start, you may want to look at R for Beginners. A probably more entertaining way to go is to follow a good online course providing an introduction to R and to data analysis, such as this one: https://www.coursera.org/course/compdata.

A quite effective way is to use SWIRL, an interactive learning environment that will guide you through self-paced lessons.

install.packages("swirl")
library(swirl)
install_from_swirl("R Programming")
swirl()

References

  • R. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, Wiley-Interscience, New York, NY, April 1991. A new edition will be available in September 2015.

    This is an easy-to-read, self-contained book on practical performance evaluation. The numerous checklists make it a great book for engineers, and every experimental computer scientist should have read it.

  • David J. Lilja, Measuring Computer Performance: A Practitioner’s Guide, Cambridge University Press, 2005.

    I like the organization, although I really don’t like the content, which provides very little insight into why the theory applies or not. I also think it is too general and lacks practical examples. It may be interesting for those wanting a quick and broad presentation of the main concepts and “recipes” to apply.

  • Jean-Yves Le Boudec. Methods, practice and theory for the performance evaluation of computer and communication systems, 2006. EPFL electronic book.

    A very good book, with a much more theoretical treatment than Jain's. It goes much further on many aspects and I can only recommend it.

  • R. Nelson, Probability, Stochastic Processes, and Queueing Theory: The Mathematics of Computer Performance Modeling, Springer-Verlag, 1995.

    For those wanting to know more about queuing theory.
