This is the last post in a three-part series on investment performance evaluation. The series explores: 1) Performance Measurement, 2) Performance Attribution, and 3) Performance Appraisal. In other words, how much we made, how much we made compared to a benchmark, and how much we made adjusted for the amount of risk we took on.

Performance evaluation allows us to examine the effectiveness of our investment process. It provides us with a systematic way of judging our decision-making process and improving on it, which is what investment theory is all about.

Today’s post deals with performance appraisal, which is a technique used to compare returns with an account’s corresponding risk. Risk is a hard thing to pin down, but for our purposes we will stick with two typical measures: systematic risk, measured by beta, and total risk, measured by the standard deviation of returns.

### Jensen’s Alpha

The first risk-adjusted appraisal metric that we will look at is Jensen’s alpha (also known as ex post alpha or just alpha). This measure is concerned with the average return on a portfolio above or below that predicted by the capital asset pricing model (CAPM). In other words, it tells us how much of our return is in excess of what the systematic risk of our portfolio would suggest. Mathematically:

$$\text{Alpha} = R_i - [R_f + \beta (R_m - R_f)]$$

where:

- $R_i$ is the realized return of the portfolio
- $R_f$ is the risk free rate
- $R_m$ is the return of the market
- $\beta$ is the beta of the portfolio in relation to the market

For example, suppose our portfolio returned 20%, the risk free rate is 3%, the market returned 15%, and our measured beta is 1.5 (for every one point the market moves, our portfolio moves by 1.5). Our alpha would be:

Alpha = 20 – [3 + 1.5(15 – 3)]

Alpha = -1%

In other words, we under-performed by 1% given the amount of systematic risk we took on.
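
The arithmetic above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `jensens_alpha` and its parameter names are made up for illustration, and all inputs are assumed to be percentages measured over the same period.

```python
def jensens_alpha(portfolio_return, risk_free_rate, market_return, beta):
    """Realized return minus the return CAPM predicts for this beta (in percent)."""
    capm_expected = risk_free_rate + beta * (market_return - risk_free_rate)
    return portfolio_return - capm_expected


# Numbers from the example in the text.
alpha = jensens_alpha(
    portfolio_return=20.0,
    risk_free_rate=3.0,
    market_return=15.0,
    beta=1.5,
)
print(f"Alpha: {alpha:.1f}%")  # Alpha: -1.0%
```

A negative result means the portfolio earned less than CAPM would predict for its beta, and a positive result means it earned more.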

### Treynor Measure

The Treynor measure (also known as the reward-to-volatility ratio) is similar to alpha in that it is also concerned with systematic risk. It is defined as the return in excess of the risk free rate per unit of systematic risk. Mathematically:

$$\text{Treynor} = \frac{R_a - R_f}{\beta}$$

where:

- $R_a$ is the average return of the portfolio
- $R_f$ is the average risk free rate
- $\beta$ is the beta of the portfolio

Taking the same example numbers from above, and assuming they are averages, we get:

Treynor = (20 – 3) / 1.5

Treynor = 11.3

Since 11.3 is below the market’s own Treynor measure of 12 (the market’s excess return of 15 – 3 = 12%, divided by its beta of 1), we can conclude that we under-performed.
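
The same comparison can be sketched in Python. The helper name `treynor_measure` is hypothetical. The only fact added beyond the example numbers is that the market has a beta of 1 with respect to itself, so its Treynor measure is simply its excess return.

```python
def treynor_measure(avg_return, avg_risk_free, beta):
    """Excess return over the risk free rate per unit of beta."""
    if beta == 0:
        raise ValueError("beta must be non-zero")
    return (avg_return - avg_risk_free) / beta


# Example numbers from the text, treated as averages (in percent).
portfolio_treynor = treynor_measure(avg_return=20.0, avg_risk_free=3.0, beta=1.5)

# The market's beta relative to itself is 1 by definition.
market_treynor = treynor_measure(avg_return=15.0, avg_risk_free=3.0, beta=1.0)

print(f"Portfolio: {portfolio_treynor:.1f}")  # Portfolio: 11.3
print(f"Market:    {market_treynor:.1f}")     # Market:    12.0
print("Under-performed" if portfolio_treynor < market_treynor else "Out-performed")
```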

### Sharpe Ratio

The Sharpe ratio is the appraisal metric that most people are familiar with. It aims to compare the average excess returns of a portfolio to the total risk of the portfolio (the standard deviation of returns). Mathematically:

$$\text{Sharpe} = \frac{R_a - R_f}{\sigma}$$

Let’s add to the above example and say that the standard deviation of our past returns is 2%. We get:

Sharpe = (20 – 3) / 2

Sharpe = 8.5

Now let’s say that the market as a whole had a standard deviation of 3% during that time. The Sharpe ratio of the market would be (15 – 3) / 3 = 4.0. In this case, we outperformed the market on a total-risk-adjusted basis. Instead of using returns in excess of the risk free rate, we could use returns in excess of a given benchmark, divided by the standard deviation of the difference between the portfolio’s returns and the benchmark’s returns. In this case, the metric is called the information ratio.
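
Here is a rough Python sketch of both ratios. The function names are illustrative, and the two return series used for the information ratio are invented purely to show the mechanics; they are not part of the example above. The sketch also assumes the sample standard deviation (`statistics.stdev`), which is one common choice.

```python
from statistics import mean, stdev


def sharpe_ratio(avg_return, avg_risk_free, std_dev):
    """Average excess return per unit of total risk (standard deviation)."""
    return (avg_return - avg_risk_free) / std_dev


def information_ratio(portfolio_returns, benchmark_returns):
    """Average active return divided by the standard deviation of active returns."""
    active = [p - b for p, b in zip(portfolio_returns, benchmark_returns)]
    return mean(active) / stdev(active)


# Portfolio numbers from the example in the text (in percent).
portfolio_sharpe = sharpe_ratio(avg_return=20.0, avg_risk_free=3.0, std_dev=2.0)
print(f"Portfolio Sharpe: {portfolio_sharpe:.1f}")  # Portfolio Sharpe: 8.5

# Hypothetical per-period returns (in percent), made up for illustration.
portfolio_returns = [2.0, 1.0, 3.0, 0.5]
benchmark_returns = [1.5, 1.0, 2.0, 1.0]
ir = information_ratio(portfolio_returns, benchmark_returns)
print(f"Information ratio: {ir:.2f}")
```

The only structural difference between the two is the reference point: the Sharpe ratio measures returns against the risk free rate, while the information ratio measures them against a benchmark.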

### Issues with risk-adjusted performance

These metrics are in no way the be-all and end-all of effective performance evaluation. For one, alpha and the Treynor measure depend on the validity of CAPM, which has its own issues. Another issue is that the inputs are all backward looking and only approximate the true underlying variables. Over time, performance evaluated in these ways will fluctuate greatly.

The key problem with a risk-adjusted performance appraisal is the fundamental problem of defining risk. Here we used standard deviation (volatility) as a proxy for total risk, but to paraphrase Howard Marks:

- Volatility is extrapolated from historic values, and the past is no guarantee of the future.
- A stock that meanders from $50 to $80 is likely to have the same statistical volatility as one that goes from $50 to $20 even though it would be hard to argue the former was as risky as the latter.
- A stock that goes from $20 to $80 in a straight line will be described as low risk, but if it suddenly declines from $80 to $50 it will be said to have become more risky. It’s hard to argue the stock is riskier at $50 than it was at $80.
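
The second point can be checked numerically. The sketch below builds two hypothetical price paths, one drifting from $50 to $80 and one from $50 to $20, using the same made-up sequence of shocks around each trend. Working in log returns and the specific shock values are assumptions chosen for illustration. Because standard deviation ignores the average direction of the returns, both paths come out with the same measured volatility.

```python
import math
from statistics import stdev


def path_log_returns(start, end, shocks):
    """Log returns that move a price from start to end, with de-meaned shocks added."""
    n = len(shocks)
    drift = math.log(end / start) / n   # average log return needed to reach `end`
    mean_shock = sum(shocks) / n        # remove any net drift from the shocks
    return [drift + (s - mean_shock) for s in shocks]


# Hypothetical shocks, identical for both paths.
shocks = [0.02, -0.03, 0.01, 0.04, -0.02, -0.01, 0.03, -0.04]

up = path_log_returns(50.0, 80.0, shocks)
down = path_log_returns(50.0, 20.0, shocks)

end_up = 50.0 * math.exp(sum(up))
end_down = 50.0 * math.exp(sum(down))
vol_up = stdev(up)
vol_down = stdev(down)

print(f"Rising path ends at  ${end_up:.2f}, volatility {vol_up:.4f}")
print(f"Falling path ends at ${end_down:.2f}, volatility {vol_down:.4f}")
```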

Some efforts have been made to address these issues. For example, the Sortino ratio is a variation of the Sharpe ratio that is only concerned with downside volatility. The idea is that a portfolio shouldn’t be punished for massive spikes to the upside.
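
As a rough illustration, the sketch below computes a downside deviation and a Sortino ratio for two invented return series that differ only by one large upside spike. The `target` parameter (the minimum acceptable return, set to 0 here) and the choice to average squared shortfalls over all periods are assumptions; conventions for the Sortino ratio vary.

```python
import math
from statistics import stdev


def downside_deviation(returns, target=0.0):
    """Root mean square of the shortfalls below the target return."""
    shortfalls = [min(0.0, r - target) for r in returns]
    return math.sqrt(sum(s * s for s in shortfalls) / len(returns))


def sortino_ratio(returns, target=0.0):
    """Average return above the target per unit of downside deviation."""
    dd = downside_deviation(returns, target)
    if dd == 0:
        raise ValueError("no returns fell below the target")
    return (sum(returns) / len(returns) - target) / dd


# Hypothetical per-period returns (in percent); `spiky` adds one upside spike.
steady = [2.0, -1.0, 2.0, -1.0]
spiky = [2.0, -1.0, 10.0, -1.0]

for name, series in (("steady", steady), ("spiky", spiky)):
    print(
        f"{name}: stdev={stdev(series):.2f}, "
        f"downside deviation={downside_deviation(series):.2f}, "
        f"Sortino={sortino_ratio(series):.2f}"
    )
```

The spike raises the ordinary standard deviation of the second series but leaves its downside deviation unchanged, which is the behavior the Sortino ratio is designed to capture.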

Seth Klarman has pointed out that these measures are solely concerned with the price action of investments and not with the fundamentals of the underlying companies. For example, the price action of the dotcom bubble made a company selling at 1000x earnings look less risky than a company that had been around for decades. The price action wasn’t telling an accurate story.

### Conclusion

We have talked through a complete performance evaluation, starting from measuring performance, to comparing that performance to a benchmark, to adjusting that performance for the amount of risk taken on. Given a long enough time frame, a great manager is bound to under-perform for extended periods, and a horrible manager is bound to over-perform for extended periods. This makes it difficult to tell which strategies are working because of their efficacy, and which are working because of luck.

The key to separating luck from skill is time. Warren Buffett has often said you don’t see who is swimming naked until the tide goes out. Similarly, we can’t judge the effectiveness of an investment process until it has been tested through multiple market cycles. Whether the investment process produces a few massive winners (venture capital), many modest winners (passive investing), or something in between, the main evaluation metric will always be whether the process is able to outperform an appropriate benchmark over the long run.