Reply to "Poll - Trout, Cabrera or other for AL MVP"

Here are some observations from a sportswriter on this subject. This is by no means meant to bash or support weighted stats. I just want to bring these out so we can educate ourselves about what is behind some of the numbers:


Take Trout’s base running, for example. The “Moneyball” paradigm is sometimes associated with de-emphasizing the value of the stolen base. In large part, this is because being caught stealing hurts a team about twice as much as a successful stolen base attempt helps it. Thus, a player who steals 20 bases, but who was caught stealing 10 times, provides little added benefit to his club.

comment ==> weighted stat - one failed stolen base attempt cancels out about 2 successful stolen bases
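
To make that arithmetic concrete, here is a minimal sketch. The weights are my assumption: a successful steal counts as 1 unit and a caught stealing as 2 units, only because the article says getting caught hurts about twice as much. Actual run values are not given in the article.

```python
def net_steal_value(stolen, caught, sb_weight=1.0, cs_weight=2.0):
    """Net base-stealing value, in units of one successful steal.

    sb_weight and cs_weight are assumed weights (caught stealing
    hurts about twice as much as a steal helps), not official values.
    """
    return stolen * sb_weight - caught * cs_weight


def break_even_rate(sb_weight=1.0, cs_weight=2.0):
    """Success rate at which stealing neither helps nor hurts."""
    return cs_weight / (sb_weight + cs_weight)


# The article's example: 20 steals, 10 times caught.
print(net_steal_value(20, 10))
print(round(break_even_rate(), 3))
```

Under these assumed weights, the 20-for-30 base stealer nets out to zero, and a runner has to succeed about two times out of three just to break even.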


One of these systems, Ultimate Zone Rating, estimates that Trout saved the Angels 11 runs with his defense in the outfield. Cabrera, a clumsy defender at third base who is more naturally suited to play first base, cost the Tigers 10 runs with his.

comment --> Weighted stat - translating another raw stat into "effective" runs

Between his defense and his base running, therefore, Trout was about 35 runs more valuable to the Angels than Cabrera was to the Tigers. By contrast, the 14 additional home runs that Cabrera hit (44 against Trout’s 30) were worth about 22 extra runs for the Tigers, based on measures that convert players’ contributions to a common scale.

comment ==> another weighted stat - translating 14 HR into 22 extra runs via some "common scale"
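
A quick sketch of what that conversion implies, using only the figures quoted above. The variable names are mine, and the "net" line covers only these two items. It ignores batting average, on-base percentage and park effects.

```python
# Figures quoted in the article.
cabrera_hr = 44
trout_hr = 30
hr_runs = 22               # runs credited to Cabrera's extra home runs
defense_baserunning = 35   # Trout's edge from defense plus base running

extra_hr = cabrera_hr - trout_hr
runs_per_hr = hr_runs / extra_hr          # implied weight per home run
net_for_trout = defense_baserunning - hr_runs

print(extra_hr)
print(round(runs_per_hr, 2))
print(net_for_trout)
```

So the "common scale" is treating each extra home run as a bit more than one and a half runs, and on these two items alone Trout comes out 13 runs ahead.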


Didn’t Cabrera also hit for a higher batting average? Yes, but barely: he hit .330 against Trout’s .326. And Trout had the slight edge in on-base percentage, .399 to .393.

comment ---> Good raw stat


Although there are statistical formulas to adjust for these “park effects,” it is now also possible to measure the impact of ballpark dimensions through a visual inspection of the data.

comment ===> is a visual inspection of the data being translated into a weighted stat?

Of the 159 home runs hit at Comerica Park this season, for example, about 20 or 25 were not hit deep enough to leave the field at Angel Stadium, according to ESPN’s Home Run Tracker. Another 15 or 20 would have been borderline cases.

comment ===> Did they take into account the trajectory of the HR ball, the spin of the ball, the wind speed, the direction of the wind, the temperature of the air, the humidity of the air, and the gravitational force at the stadium? These would definitely influence the flight path and how long the ball stays in the air.


If all these interpretations and translations of weighted stats into a common scale for comparison are 100% correct (for example, translating passed balls into dropped fly balls, HRs into effective runs, defensive catches into negative runs scored), I would argue that one can create a computerized model that predicts all past MLB games by playing the effective runs of one team against those of the other team, and the defense-converted-to-runs stat of one team against that of the other, and comes out with a predicted winner of each game. Time is just another variable in the model. If we can predict the outcomes of past baseball games, then we can extend the model forward to predict future games. Wouldn't it be nice if we could say, at an 80% confidence level, that the Giants with this starting lineup playing at home will beat the Tigers 8-6 on May 14th in a night game by the bay, with the temperature at 55 degrees and a wind speed of 20 knots?

If a model cannot predict, it is an absolutely useless model. It's just something to look at to see how pretty the model is, and how pretty it is lies in the eye of the beholder. So beware if someone says this is a 100% proven model whose only weakness is that it cannot predict.
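
Here is a rough sketch of the kind of test I mean: run the model over past games and count how often its predicted winner matches the actual winner. Everything in it is hypothetical. The function names are mine, the estimated run totals stand in for whatever the weighted stats produce, and the sample numbers are made up for illustration, not real games.

```python
def predict_winner(home_runs_est, away_runs_est):
    """Pick the side with the higher estimated run total (ties go to home)."""
    return "home" if home_runs_est >= away_runs_est else "away"


def hit_rate(games):
    """Fraction of past games where the predicted winner was the actual winner.

    Each game is a tuple: (home_runs_est, away_runs_est, actual_winner).
    """
    if not games:
        raise ValueError("need at least one game")
    hits = sum(predict_winner(h, a) == w for h, a, w in games)
    return hits / len(games)


# Made-up numbers for illustration only.
sample = [
    (5.1, 4.2, "home"),
    (3.8, 4.5, "home"),
    (4.0, 4.9, "away"),
    (6.2, 3.0, "home"),
]
print(hit_rate(sample))
```

If a model built on the weighted stats cannot beat a coin flip on a check like this over past seasons, there is no reason to trust it on future games.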