
 

I'm finishing up my Master's in Predictive Analytics from Northwestern University and am looking for a predictive statistics project. I work in commercial banking and could stick with that realm, but I would love to find an opportunity to work with a baseball team or personal coach. I'm not interested in helping a coach just win games. I could easily mine historical data to help a coach identify the weaknesses of their own team or an opponent's, but I have no interest in that. I'm looking for an opportunity to help advance player and team development, particularly for youth players, even though I'm not a coach myself.

If you are a coach and would be interested in exploring player development with me, I'd love to talk with you. I'm not looking for anything in exchange, just some interesting ideas that, if explored, might help coaches and players advance in the game they love.

BTW - I'm a long-time follower of HSBaseball; however, this is my first post. You have a great group here. I have two young boys (12 & 6) who play ball, so I just lurk and learn from you all. Maybe this could be a way I can contribute back to the community.

Thanks in advance! 


I’m guessing you’re gonna have one Devil of a time finding many HS teams that have much in the way of in depth statistical data on themselves let alone their opponents, and even fewer at levels below that. Amateur ball isn’t like professional baseball where there is a common database for every game played, with meticulous attention to making sure the data is valid.

 

If you’d like, I’d be glad to share the data I have if you think it would help you.

 

Best of luck!

 

BRAD C - My guess is that Stats is the best source of data of the type you seem to be searching for. 

Not only have we seen a number of his data sets, but he knows what he's doing in scoring, which means that you will have a pretty good interpretation of what actually happened and not a skewed evaluation. That will make the data better and thereby give any analysis you make more validity than it might otherwise have.

FWIW - IMO.

Thanks! STATS4GNATS, thanks for the offer. What's the best way to reach out to you?

Many amateur teams now use GameChanger or a similar app, so data isn't lacking. However, quality data is much more elusive, given that the people entering the events are commonly a player's parent or someone whose interests are focused on a dozen other things besides data entry. Hence my desire to work with data that is much less prone to input error. One analysis I've considered is how youth pitchers perform on # days or # hours of rest. I commonly read concerns about youth pitch counts and, in contrast, about how many coaches focus solely on tournament restrictions. Maybe if we can show the performance impact of pitching on consecutive days, coaches will be more open to the conversation.

Just as an example... Tournament XYZ allows 12-year-olds to pitch a total of 6 innings per pitcher. Some coaches will pitch Pitcher A two innings on Friday and then start him on Sunday. Maybe Pitcher A throws a gem both days, maybe he doesn't. My curiosity is how Pitcher A pitches in next weekend's tournament.
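A minimal Python sketch of how the rest question could be framed, assuming an appearance log of (pitcher, date, outs recorded, earned runs). The log, the pitcher names, and the helper names are all hypothetical. "Days of rest" here means full calendar days between outings (Friday then Sunday = 1), which is an assumption; measuring hours of rest would need game start times. ERA is scaled to an assumed 6-inning youth game.

```python
from collections import defaultdict
from datetime import date

# Hypothetical appearance log, NOT real data:
# (pitcher, game date, outs recorded, earned runs allowed)
appearances = [
    ("Pitcher A", date(2017, 6, 2), 6, 0),    # Friday: two innings in relief
    ("Pitcher A", date(2017, 6, 4), 12, 1),   # Sunday: four-inning start
    ("Pitcher A", date(2017, 6, 10), 9, 4),   # following weekend
    ("Pitcher B", date(2017, 6, 3), 9, 1),
    ("Pitcher B", date(2017, 6, 10), 9, 1),
]

def add_days_of_rest(log):
    """Tag each outing with full days off since that pitcher's previous outing (None = first)."""
    last_outing = {}
    tagged = []
    for pitcher, day, outs, er in sorted(log, key=lambda r: (r[0], r[1])):
        prev = last_outing.get(pitcher)
        rest = (day - prev).days - 1 if prev is not None else None
        tagged.append((pitcher, day, rest, outs, er))
        last_outing[pitcher] = day
    return tagged

def era_by_rest(tagged, innings_per_game=6):
    """Pool outs and earned runs per rest bucket; ERA scaled to an assumed 6-inning game."""
    totals = defaultdict(lambda: [0, 0])  # rest bucket -> [outs, earned runs]
    for _, _, rest, outs, er in tagged:
        bucket = "first outing" if rest is None else rest
        totals[bucket][0] += outs
        totals[bucket][1] += er
    # outs / 3 = innings pitched, so ERA = innings_per_game * er / (outs / 3)
    return {b: innings_per_game * 3 * er / outs
            for b, (outs, er) in totals.items() if outs > 0}

rows = add_days_of_rest(appearances)
print(era_by_rest(rows))
```

With five made-up outings this shows only the mechanics; a real version would need many pitchers and some control for opponent strength before reading anything into the buckets.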

Another idea would be the impact of batting order... in youth baseball, does batting order become a self-fulfilling prophecy? Moreover, since batting success is so reliant on a positive psyche, do kids who move down in the batting order ever move up?
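The "do kids who move down ever move up?" part can be counted directly once you have each player's lineup slot by game. A minimal sketch, assuming a hypothetical mapping of player to batting-order slot in date order (1 = leadoff); the data and function names are illustrative only.

```python
# Hypothetical lineup history, NOT real data: player -> batting-order slot
# in each game, in date order (1 = leadoff, 9 = last).
lineups = {
    "P1": [2, 2, 6, 7, 7, 3],   # dropped, later moved back up
    "P2": [5, 8, 9, 9, 9, 9],   # dropped and stayed down
    "P3": [1, 1, 1, 1, 1, 1],   # never moved
}

def was_dropped(slots):
    """True if the player was ever moved to a later (higher-numbered) slot."""
    return any(cur > prev for prev, cur in zip(slots, slots[1:]))

def recovered_after_drop(slots):
    """True if a move down was later followed by a move up."""
    dropped = False
    for prev, cur in zip(slots, slots[1:]):
        if cur > prev:
            dropped = True
        elif cur < prev and dropped:
            return True
    return False

def recovery_rate(lineups):
    """Share of dropped players who were later moved back up (None if nobody was dropped)."""
    dropped = [p for p, slots in lineups.items() if was_dropped(slots)]
    if not dropped:
        return None
    return sum(recovered_after_drop(lineups[p]) for p in dropped) / len(dropped)

print(recovery_rate(lineups))
```

A slot change can also reflect absences or other lineup context, so testing the self-fulfilling-prophecy idea would presumably also mean comparing performance before and after a move against similar players who weren't moved.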

Please don't get me wrong: I realize there are numerous influences on these variables, so an indisputable conclusion isn't probable. However, youth baseball in particular seems to be managed via numerous rules of thumb that could potentially be tested, and I have the designated time and incentive to try.

I'm looking to the HSBaseball family for any ideas I might chase down. There are no bad ideas, and don't worry about the data; that's my problem.

Thanks again!

Look for a PM.

 

I like your thoughts about performance impact. I only dabble a little bit, nothing serious.

 

I do quite a bit of looking at performance by batting order position.

 

I’ve also looked at what impacts the final outcome of games, such as strikeouts, free passes, OBP, etc.

 

Looking forward to communicating with you.

Warning, so no one faints, falls down and hurts themselves: I'm going to compliment Stats.

To the original poster ...

Stats is a stats junkie. It's his life. He has the ability to turn stats inside out and upside down and see things others wouldn't even think to look for. I will bet he has more data massaged in ways you would never imagine. He has ideas and then uses stats to see if the idea even exists.

RJM,

Thank you for the recommendation.........Stats seems to have a very positive reputation around the site.  He and I have shared contact info.  

 

Everyone else,

I have received some great feedback here thus far, and some fantastic leads via PM already.  However, if anyone has an idea that they are sitting on please share.  The more the merrier as I consider options.  

 

My intuition is that on top of the difficulty of finding reliable/useful data sets, the predictive value of the data is going to be lower the further you get away from adult baseball, probably for all kinds of reasons for which it will be difficult/impossible to control.  If you're really looking for youth data sets, the only ones you're likely to find in the 12yo range might need to come from PG or a similar organization, and I'm not sure if they even have much for kids that young. I used to have (and might still be able to recover) a small data set for some <14yo teams, but I can already tell you anecdotally that it wasn't very predictive of future outcomes for the players involved, and almost certainly wasn't a significant enough sample for making intra-year predictions.

If you wanted to work with HS age data, Stats surely has a bunch, and PG or similar will have as well, if they're willing to share. I would expect that a ton of work has been done on that in various professional baseball organizations as well, so it might be worth reaching out to some of those, or the baseballprospectus.com/fangraphs.com sites of the world.

If you have the ability to do significant data crunching for a professional organization, it's likely to be a labor of love. Based on previous reports I've seen, they don't pay remotely well enough for the skills required. I recall a posting a couple of years ago where the Phillies were looking for someone to run their data team, and based on the requirements, they were offering less than half of what that kind of person could trivially get on the open market in other industries (per a headhunter for the financial industry).

JACJACATK,

Thanks for the insight!    I completely agree with your thoughts on the data for younger players........ it's going to be messy at best.   The more I explore this with folks the more HS data seems like a realistic starting point.   PG is an intriguing data source.  I'll dig into that.  

Regarding a career in MLB: after I watched Moneyball a few dozen times when the movie first came out, MLB seemed like the perfect venue to match my passion for baseball with my interest in predictive modeling. Shortly thereafter I came to the realization you mentioned: subpar pay. With that revelation I stayed in banking, and I now work for the country's largest auto lender doing predictive analytics and helping executives interpret model output.

The project I'm looking for will help me complete my Masters degree while hopefully allowing me to contribute back to the sport I love.    It would be fun to work with MLB or one of the teams on this project, and maybe since I'm working for free I can find a way to make that happen.   After that, who knows.   MLB teams don't pay much, but maybe I could find a contracted side project every now and then.  

Thanks again for the feedback!  I appreciate it.  

Here is an interesting success story. Years ago we partnered with Baseball America to do the scouting service called Prospects Plus. Prospects Plus no longer exists, and we do our own called PG CrossChecker. Anyway, back then we shared the cost of developing a special website for Prospects Plus. We hired a tech guy from Chicago to develop the site. When we went on our own, Baseball America hired the computer guy full time to help with Web-related things. In addition to his regular job, he showed a passion for baseball as well as quite a lot of knowledge. From there he joined forces with a project called Baseball Prospectus and became a well-known and respected analytics guy.

He is now a front office scouting executive for an MLB organization. He is making a lot more money in baseball than in his original career.

jacjacatk posted:

My intuition is that on top of the difficulty of finding reliable/useful data sets, the predictive value of the data is going to be lower the further you get away from adult baseball, probably for all kinds of reasons for which it will be difficult/impossible to control....

If you wanted to work with HS age data, Stats surely has a bunch, and PG or similar will have as well, if they're willing to share. 

There is a ton of HS data on MaxPreps. A programmer could make a fairly simple bot that could crawl their website and pull that data down team by team for reassembly. Or one could see if they would give it to you.

Another idea would be to use JC data.  It's probably more reliable. There is a lot here, for example. http://www.cccbca.com/sports/bsb/2016-17/stats

Looking at how Frosh results predict, or don't predict, Soph results might be interesting.
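One simple way to ask whether freshman results predict sophomore results is the Pearson correlation of a stat across the two seasons for players who appear in both. A minimal sketch; OBP is just an example stat, and the player names and values are hypothetical, not JC data.

```python
from math import sqrt

# Hypothetical season OBP by player, NOT real data.
frosh_obp = {"Player A": 0.310, "Player B": 0.402, "Player C": 0.355, "Player D": 0.288}
soph_obp = {"Player A": 0.345, "Player B": 0.390, "Player C": 0.371, "Player E": 0.300}

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (sx * sy)

def year_over_year(first, second):
    """Correlate a stat across two seasons for players present in both; returns (r, n)."""
    common = sorted(first.keys() & second.keys())
    r = pearson([first[p] for p in common], [second[p] for p in common])
    return r, len(common)

r, n = year_over_year(frosh_obp, soph_obp)
print(f"r = {r:.2f} across {n} returning players")
```

Players who appear in only one season (D and E above) drop out of the comparison, and with only a handful of returning players the coefficient is very noisy, so a real study would want many players and probably a minimum number of plate appearances.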

Last edited by JCG

Kyle Boddy and Driveline Baseball are doing very cool R&D - might be a way you can contribute with them.

As a strength & conditioning coach I'd like to see a biomechanical leading indicator for UCL damage. Could be beneficial for determining potential preventative measures or corrections, but I don't know where historical data would come from or if there is enough of it. 

A problem with data from a place like MaxPreps is that not a lot of predictions can be made from the standard metrics available there: date, R, RBI, Hits, D, T, HR, SacF, SacB, BB, K, HBP, ROE, ROFC, LOB, SBA and SB are all they have for offensive data points. Even though that isn’t a great deal of information to work with, a sharp guy could glean a lot of information from it, except that there’s no way to tell how valid it is.

 

When I say that I don’t mean whether or not the SK knows what s/he’s doing, but rather whether or not whoever is putting in the stats puts in all the possible data points for every game. Nothing requires a coach to put in everything they can, nor is there a requirement for making entries for every game.

 

Here’s an example of what could happen. Let’s say someone wanted to see how many runners are left on base compared to how many runners reach. If you wanted to look at an entire league, for example, and only 1 coach didn’t enter LOBs, or didn’t enter all of them, the final number would be virtually worthless.

 

Here’s a real example. A team has 41 hits, 25 BBs, 8 HBPs, 0 ROEs, and 2 ROFCs. That’s 76 runners. But that coach doesn’t put in LOBs. Another team has 59 hits, 27 BBs, 16 HBPs, 16 ROEs, 10 ROFCs and had 54 LOB. When you put the 2 together to see how many runners were left on base compared with how many got on, you get 100 Hits, 52 BBs, 24 HBPs, 16 ROEs, 12 ROFCs for 204 runners with 54 LOB, a rate of 26.5% of the runners who reached being left on base. Trouble is, since the one coach didn’t enter any LOBs, the percentage is useless.
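The two-team example can be computed both ways to show how much one missing field moves the number. A minimal sketch using only the totals quoted above; the dictionary layout and function names are illustrative, and "complete-case" pooling (simply dropping teams that didn't report LOB) is a basic stand-in, not a cure for the underlying data problem.

```python
# Season totals from the two-team example; None means the coach never entered that stat.
teams = [
    {"H": 41, "BB": 25, "HBP": 8, "ROE": 0, "ROFC": 2, "LOB": None},
    {"H": 59, "BB": 27, "HBP": 16, "ROE": 16, "ROFC": 10, "LOB": 54},
]
REACH_FIELDS = ("H", "BB", "HBP", "ROE", "ROFC")

def runners_reached(team):
    """Runners who reached base; assumes these five fields are themselves complete."""
    return sum(team[f] for f in REACH_FIELDS)

def naive_lob_rate(teams):
    """Pools every team and silently treats a missing LOB as zero."""
    lob = sum(t["LOB"] or 0 for t in teams)
    return lob / sum(runners_reached(t) for t in teams)

def complete_case_lob_rate(teams):
    """Pools only the teams that actually reported LOB."""
    reported = [t for t in teams if t["LOB"] is not None]
    if not reported:
        return None
    return sum(t["LOB"] for t in reported) / sum(runners_reached(t) for t in reported)

print(f"naive: {naive_lob_rate(teams):.1%}, complete-case: {complete_case_lob_rate(teams):.1%}")
```

The naive figure is 54 / 204 (the 26.5% above), while the second team alone is 54 / 128, roughly 42%. Even the complete-case number is only meaningful if the teams that record LOB are representative of the ones that don't.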

 

Then we’ve got another “issue”. Let’s say the leadoff batter in an inning walks. The next batter hits a ball to f6 and they get a force at 2nd, so the batter reaches on a FC. Let’s say the next batter does exactly the same thing and there’s another force at 2nd, and the next batter does exactly the same thing for another force. How many runners reached on fielder’s choices, and how many runners were LOB? The correct answer is 3 and 1, but how many scorers have it as 2 and 0?
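If event-level data is available, counts like FC and LOB can be derived by replaying the inning instead of trusting whatever the scorer typed into the summary. A minimal sketch that models only the situation in this example; the event codes "BB" and "FC2" are hypothetical, not any scoring app's format.

```python
def replay_half_inning(events):
    """Derive reach, FC and LOB counts by replaying a half-inning event by event.

    Only two hypothetical event codes are modelled, enough for the example above:
      "BB"  - walk; runners advance only when forced
      "FC2" - ground ball, lone runner on first forced at second, batter safe at first
    """
    bases = [False, False, False]  # first, second, third
    outs = runs = reached = fc = 0
    for event in events:
        if event == "BB":
            reached += 1
            if bases[0]:
                if bases[1]:
                    if bases[2]:
                        runs += 1      # bases loaded: runner from third forced home
                    bases[2] = True
                bases[1] = True
            bases[0] = True
        elif event == "FC2":
            if bases != [True, False, False]:
                raise ValueError("FC2 is only modelled with a lone runner on first")
            outs += 1        # the runner from first is forced at second
            reached += 1     # the batter reaches first...
            fc += 1          # ...on a fielder's choice
        else:
            raise ValueError(f"unsupported event code: {event!r}")
        if outs == 3:
            break
    return {"reached": reached, "FC": fc, "LOB": sum(bases), "runs": runs, "outs": outs}

print(replay_half_inning(["BB", "FC2", "FC2", "FC2"]))
```

Replaying the walk plus three forces gives 3 FC and 1 LOB, and the totals balance: 4 runners reached = 0 runs + 3 runners retired on the bases + 1 left on base. Checking that identity per inning is one way to flag summary entries like "2 and 0".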

JCG posted:

Don't most teams upload their stats from iScore and GameChanger? For those that do aren't those numbers generated automatically by the apps?

 

If you go to a MP page and go all the way to the bottom and go to “Stat Partners”, then go to “Baseball”, you’ll see all the different registered scoring programs that can upload stats into MP. I don’t use any of those programs, am not sure what they generate by default, or if those defaults can be changed. I can’t say how many of those programs generate all of those data points, and I certainly can’t say how many use valid algorithms to compute things like ERs or RBIs.

 

Having said that, I honestly don’t know what percentage of HS teams use a scoring program, but for sure it isn’t 100%, and therein lies the rub. Even if 80% used a scoring program, that would still leave one Hell of a lot of teams that don’t. The last I checked, MP had close to 20% of all HS teams using their services, which is by far the largest share in the country, and I’m sure that share has grown in the last few years. But even at that, there are big holes in the data. It’s getting better and better, but so far it’s very untrustworthy for what Brad is looking to do.

 

Of course the numbers can be run for whatever anyone wants to do with them, but when we’re talking about a thesis, I assume someone will ask about the reliability of the data. And keep in mind I’m just about as big a fan and supporter of MP as there is.

 

 

Thanks again for everyone's feedback and ideas! I'm very appreciative of all the responses I have received in the forum and by PM.

I agree with Stats: many of the scorekeeping apps do not provide event-level data. They do a great job of providing game- and season-level summary data, but I'm not sure how the event-level data entry leads to the summary stats they make available. Moreover, the data weaknesses can be significant due to variability in how individuals score games. As an example, I know defensive statistics are an area of interest for many youth coaches; however, if you pay attention to the way their teams' game activities are input into GameChanger, players' positions are never changed throughout the game. Commonly, players are assigned the position they start the game in, and it never changes as kids get moved around the field.

I do believe there are some analyses that could isolate most input errors, but some substantial assumptions will have to be made early on.

BradC posted:

… As an example, I know defensive statistics is an area of interest for many youth coaches; however, if you pay attention to the way their team's game activities are input into GameChanger players positions are never changed throughout the game.  Commonly players are assigned the position they start the game in and then never change as kids get moved around the field.  …

 

It isn’t just youth coaches, unless you consider HSB to be youth ball as well. One of the biggest bones of contention between me and the coaches I score for is defensive movements. Coaches have gotten used to only worrying about pitching lineup substitutions because that’s all the umpires care about.

 

Few seem to understand that in MLB there’s not only an OSK, there are others in the booth who act as spotters. SKs at the amateur level very seldom have that luxury. When you’re scoring for a team on a field smaller than 60/90, in general the teams have fewer players and that makes a heck of a difference. Chances are also that you know the players on much more of a personal level which helps identifying them. But even at that, it’s not “normal” for scorers to annotate every defensive change. Now move to the big field with larger rosters and the job becomes even harder.

 

I know a lot of you are used to seeing players in spankin’ clean unis with numbers on the front and both names and numbers on the back, but that isn’t true everywhere. In fact, even though I might know the players really well, I seldom do defensive stats for summer or fall seasons. The main reason is, HS teams won’t wear game jerseys if they wear a jersey at all, so it isn’t at all strange for players to have no numbers at all. Throw in that teams usually play fast and loose with the rules, often batting 10 or 11 and having players sliding in and out of defensive positions because no one cares about re-entry rules.

 

 
