I just learned that the 10th printing of the 2nd edition of “The Elements of Statistical Learning: Data Mining, Inference, and Prediction” by Hastie, Tibshirani, and Friedman can be officially and legally downloaded for free as a PDF from http://www-stat.stanford.edu/~tibs/ElemStatLearn/. The book deals mostly with supervised methods for learning from data.

The hardcover and Kindle versions can be obtained, for example, from Amazon for a nontrivial price.

Happy learning,
Christian

# Ask a Mathematician

In my opinion it is better than any math book because you are directly confronted with an interesting question and can follow the answers step by step, with time to form your own solution or opinion on the topic. This creates much more involvement than just consuming a book’s content. As a consequence, you have really learned something after working (and I really mean working, not just reading) through the answers.

# Advanced R Programming Workshop Available on Bioconductor Website

The folks from Bioconductor, the “open source software for bioinformatics” project based on R, generously publish materials from their conferences and workshops on their website, where you can download them free of charge. Even if you’re not into genetics, you should check out the general-purpose workshop on “Advanced R Programming”. The available materials include slides, papers, and self-study exercises.

If you’re interested in bioinformatics, don’t forget to have a look at their other courses.

# ROI of Acquiring New Skills – GNU R as an Example

In his YouTube video, Courtney Brown, Ph.D., gives some reasons why learning R is worth the effort. His set of reasons is far from comprehensive, but I think he covers some important aspects. In my opinion the return-on-investment argument is his most important one for convincing people to learn R, especially potential business users and academics. The former are often dissatisfied with their current software (or its price); the latter are often disillusioned by how few of the theoretical and software skills they have acquired so far can actually be applied. Learning relevant methods and software to solve relevant problems is very satisfying.

# Competing product analysis: NBA 2K12 vs. EA Sports

My newest finding is a small but nice example of a competitor’s product analysis:

GameSpot author Marko Djordjevic took EA Sports’ point of view and analyzed 2K Sports’ current basketball video game “NBA 2K12”: What must EA do to outperform 2K’s successful franchise? Check out his article. It’s instructive even if you’re interested in neither video games nor basketball: http://www.gamespot.com/features/ea-on-the-rebound-6347523/

# The variance of the arithmetic average

Why is the standard deviation of the arithmetic average equal to $\sigma/\sqrt{n}$?

For $X_1,\dots,X_n \mathrm{\;i.i.d.}\; \sim N(\mu,\sigma^2)$ (“i.i.d.” means independent and identically distributed):

$\begin{array}{l l}Var(\overline{X}) &= Var\left((1/n)\sum_{i=1}^n X_i\right)\\&=(1/n^2)\, Var\left(\sum_{i=1}^n X_i\right) \quad \mathrm{since\;} Var(aX)=a^2Var(X) \mathrm{\;for\;constant\;} a\in \mathbb{R}\\&= (1/n^2) \sum_{i=1}^n Var(X_i) \quad \mathrm{since\;the\;} X_i \mathrm{\;are\;independent}\\&=(1/n^2) \sum_{i=1}^n \sigma^2 \quad \mathrm{since\;the\;} X_i \mathrm{\;are\;identically\;distributed}\\&=(1/n^2)\cdot n\cdot \sigma^2\\&=\sigma^2/n\end{array}$

Hence, the standard deviation of $\overline{X}$ is $\sqrt{Var(\overline{X})}=\sqrt{\sigma^2/n}=\sigma/\sqrt{n}$.
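If you prefer to see the result numerically, here is a minimal simulation sketch in base R. The sample size, the value of sigma, and the number of repetitions are arbitrary choices for illustration only.

```r
# Empirical check of sd(mean) = sigma / sqrt(n); all values are illustrative.
set.seed(1)
n     <- 25      # sample size
sigma <- 2       # true standard deviation of each observation
reps  <- 20000   # number of simulated samples

# Each column is one sample of size n; colMeans gives one average per sample.
samples <- matrix(rnorm(n * reps, mean = 0, sd = sigma), nrow = n)
xbar    <- colMeans(samples)

sd_empirical   <- sd(xbar)          # standard deviation of the simulated averages
sd_theoretical <- sigma / sqrt(n)   # value predicted by the derivation

print(c(empirical = sd_empirical, theoretical = sd_theoretical))
```

The two printed numbers should be close to each other, and the gap should shrink as `reps` grows.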

Voilà. The world is safe again.

# Harvard Citation Style and LaTeX – Problem Solution

Short note: I just wanted to share a blog post with everyone experiencing problems with the Harvard LaTeX package:

If you are having issues citing URLs with the Harvard package (which is very likely) follow the link to theseekersquill and let the posted solution save your day: http://theseekersquill.wordpress.com/2010/04/01/latex-harvard-citations/

# Expectation and variance of a binary random variable

If you start dealing with generalized linear models (GLMs), you will come across sentences like “Obviously the variance of the binary dependent variable is $\mu(1-\mu)$.” Well, for everybody who does not find it that obvious, the following derivation may help in understanding the mathematical reasoning behind GLMs, especially logit and probit models.
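A short version of the argument, assuming the binary variable $Y$ takes the value 1 with probability $\mu$ and the value 0 with probability $1-\mu$ (the symbol $Y$ is introduced here only for illustration):

$\begin{array}{l l}E(Y) &= 1\cdot\mu + 0\cdot(1-\mu) = \mu\\E(Y^2) &= 1^2\cdot\mu + 0^2\cdot(1-\mu) = \mu\\Var(Y) &= E(Y^2) - (E(Y))^2 = \mu - \mu^2 = \mu(1-\mu)\end{array}$

So the mean $\mu$ alone determines the variance, which is largest at $\mu = 1/2$ and shrinks to zero as $\mu$ approaches 0 or 1.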

# Correlation and causality: An everyday life example of causal analysis

While digging a little into Java, I found a post on javaworld.com that is interesting, at least for people into statistics: Dustin Marx on “Correlation Between Typing Speed and Programming Competence”. From a statistician’s point of view, the article is a nice example of a small “everyday life” causal analysis.

Mr. Marx informally analyzes the causes of the correlation between the attributes “typing speed” and “programming skill”. If you are short on time, just read the conclusion to get the idea (a habit I cannot recommend for scientific papers!). Such examples are imho very useful for beginners to grasp “correlation vs. causality” and for professionals to look at their sophisticated mathematical analysis tools from a refreshingly basic, everyday-life perspective.

# Finger Exercise: Throwing two Dice in R using the rpanel Package

After a period of examinations I needed to brush up on some R vocabulary (the exact syntax) because I had started to mix it up with the syntax of other programming languages. And here is my result: a dice game simulation. Not very innovative, not very difficult, but I suppose it could be quite useful for people new to R as an easy example of how programming in R may work. Furthermore, it is an application of the nice rpanel package.

Because the program is very simple, I skipped most comments on the code, but I will add some more in the near future. The variables should be quite self-explanatory. If not, feel free to write a comment. Of course, more experienced programmers are welcome to improve the code.

Usage: Run the code in R, use the sliders of the panel to choose the number of dice to throw and the number of throws, then hit the Throw! button.

This small program lets you investigate or illustrate some aspects of convergence, or simply get a feeling for your chances of winning your next dice game. Feel free to use the program for teaching purposes if you find it useful (see the license in the footer of this page). If you want reproducible results, set a random number seed of your choice with R’s built-in set.seed() function.
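For readers who just want the core idea without the panel, here is a minimal sketch in base R. It is not the original program: it leaves out the rpanel sliders and button entirely, and the function and argument names are made up for illustration.

```r
# Throw `n_dice` fair dice `n_throws` times and return the sum of each throw.
# Names are illustrative and not taken from the original rpanel program.
throw_dice <- function(n_dice = 2, n_throws = 1000, sides = 6) {
  rolls <- matrix(sample.int(sides, n_dice * n_throws, replace = TRUE),
                  nrow = n_throws, ncol = n_dice)
  rowSums(rolls)
}

set.seed(42)   # fixed seed so that the run is reproducible
sums <- throw_dice(n_dice = 2, n_throws = 10000)

# Relative frequency of each possible sum of two dice
rel_freq <- table(factor(sums, levels = 2:12)) / length(sums)
print(round(rel_freq, 3))

# Running mean of the sums; for two fair dice it should settle near 7
running_mean <- cumsum(sums) / seq_along(sums)
plot(running_mean, type = "l", xlab = "Throw", ylab = "Running mean of sum")
abline(h = 7, lty = 2)
```

Changing `n_throws` shows how quickly (or slowly) the relative frequencies and the running mean stabilize.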

Happy R-ing.

# How to generate bivariate pdfs given a copula and the margins in R and MATLAB

After finding a few unanswered requests for a solution to this problem on the web (including my own…), I’d like to share the final results of my work.

The problem:

Suppose you have two random variables, Z and T.

Z is N(0,1) distributed.
T is t(3) distributed.

Now you are supposed to produce four contour plots of the random variables’ joint pdf for the cases that the variables’ dependence structure is given by the

1. Gaussian,
2. Clayton,
3. Frank- and
4. Gumbel copula.

With the copula and the marginal distributions given, the (bivariate) joint distribution of Z and T can be constructed. This post is about doing exactly that in R and MATLAB (and drawing the corresponding contour plots).
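The construction rests on the fact that the joint density is the copula density evaluated at the marginal cdfs, multiplied by the marginal pdfs: $f(z,t) = c(F_Z(z), F_T(t))\, f_Z(z)\, f_T(t)$. As a rough sketch of this idea in base R, covering only the Gaussian and Clayton cases: the parameter values (rho = 0.5, theta = 2), the plotting grid, and all function names are assumptions for illustration, and the copula densities are coded by hand rather than taken from a copula package.

```r
# Gaussian copula density c(u, v) with correlation parameter rho
gauss_copula_density <- function(u, v, rho) {
  x <- qnorm(u)
  y <- qnorm(v)
  exp(-(rho^2 * (x^2 + y^2) - 2 * rho * x * y) / (2 * (1 - rho^2))) /
    sqrt(1 - rho^2)
}

# Clayton copula density c(u, v) with parameter theta > 0
clayton_copula_density <- function(u, v, theta) {
  (1 + theta) * (u * v)^(-theta - 1) *
    (u^(-theta) + v^(-theta) - 1)^(-2 - 1 / theta)
}

# Joint pdf of Z ~ N(0,1) and T ~ t(3): copula density times both marginal pdfs
joint_pdf <- function(z, tval, copula_density, ...) {
  copula_density(pnorm(z), pt(tval, df = 3), ...) * dnorm(z) * dt(tval, df = 3)
}

# Evaluate both joint densities on a grid (grid limits chosen for illustration)
grid <- seq(-3, 3, length.out = 101)
dens_gauss   <- outer(grid, grid, joint_pdf,
                      copula_density = gauss_copula_density, rho = 0.5)
dens_clayton <- outer(grid, grid, joint_pdf,
                      copula_density = clayton_copula_density, theta = 2)

# Contour plots side by side
op <- par(mfrow = c(1, 2))
contour(grid, grid, dens_gauss, xlab = "z", ylab = "t", main = "Gaussian copula")
contour(grid, grid, dens_clayton, xlab = "z", ylab = "t", main = "Clayton copula")
par(op)
```

The Frank and Gumbel cases would follow the same pattern: only the copula density function passed to `joint_pdf` changes, while the margins stay the same.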