Talk Sox
Posted
But we don't know to what extent unintentional human error plays a role.

 

Some unintentional human error is almost always to be expected in data collection.

Posted
All the data falsifying in the world wouldn't have made Masterson less hittable this year.

 

Good point, but it also makes me wonder what data or stats they were looking at to think that he could be successful on the heels of his horrible 2014 performance.

Posted
We are talking about a radar gun... they have always been jacked with. Not how many hits someone has. Not how many strikeouts.

 

What crusade are you on tonight, Don Quixote?

 

It's simple, the next time someone like Kimmi slaps him around with advanced stats, he needs the "fake velocity" argument to justify ignoring it.

Posted
Teams can't meddle with the guns and the official data today? Is that your opinion or a fact?

 

Edit: Fangraphs lists pitcher velocity back to 2004. Is that data unofficial? My point is that if the data is corrupted, it is worthless.

 

I am saying that the job of measuring that stuff has been outsourced to Pitch F/X, which has been in charge of standardizing it. Now, could Pitch F/X be corrupted? Since another company is actually handling the data collection and distribution, that seems unlikely.

 

I think pitching metrics from before Pitch F/X-type standardization have all the limitations you'd think they would.

Posted
I am saying that the job of measuring that stuff has been outsourced to Pitch F/X, which has been in charge of standardizing it. Now, could Pitch F/X be corrupted? Since another company is actually handling the data collection and distribution, that seems unlikely.

 

I think pitching metrics from before Pitch F/X-type standardization have all the limitations you'd think they would.

It should be standardized as should all data collection that is used to compile official stats. The possibility of team meddling should be eliminated otherwise the data doesn't have integrity.
Posted
It should be standardized as should all data collection that is used to compile official stats. The possibility of team meddling should be eliminated otherwise the data doesn't have integrity.

 

I think with an outside company dealing with it that is fine now. It's like Sport VU data with the NBA - more information from a standard data source. Now if only MLB provided the Pitch F/X data as richly as the NBA does with its Sport VU stuff. It's interesting stuff to know - even if the conclusions are questionable.

Posted
I think with an outside company dealing with it that is fine now. It's like Sport VU data with the NBA - more information from a standard data source. Now if only MLB provided the Pitch F/X data as richly as the NBA does with its Sport VU stuff. It's interesting stuff to know - even if the conclusions are questionable.
It is interesting stuff, and I do like to look at the stuff for teams and players that I don't watch very often.
Posted
Pitch velocity is the only stat I can think of that is subject to this, though. Every piece of data in baseball is pretty cut-and-dried. You can't fake the number of at-bats someone has, or the number of hits or RBI. You can't fudge batting average or ERA, because they're based on physical acts that are plain for all to see. A GM or a scout or an agent can't claim a player has a higher average than he actually does. You can't mess with the data on things that happen in a game, because they're recorded.

 

The only two things I can think of, besides pitch velocity (which is meaningless in the grand scheme of things), that are subject to human interpretation are balls and strikes and scoring decisions on errors vs. hits. Balls and strikes don't factor enough into any statistic to be noteworthy; they have been subject to human error for over a century with little global impact. Scoring decisions are also negligible, I'd say, since the percentage of them that could go either way is low (most errors are quite obviously errors, even to those of us who are not players or personnel). Small changes or ripples tend to factor out over time. That is a principle of many branches of science and history, and I think it is an excellent principle when applied to baseball.

 

Statistics are real, at least in baseball. Whereas statistics in most fields, like politics or other demographic minutiae, are subject to errors and malicious interpretation, sports statistics are, by and large, pure and unadulterated. If a player has 1000 at-bats and gets 500 hits, he (in addition to being the best batter ever) is batting .500. If a pitcher gives up 3 runs in 9 innings, he has an ERA of 3.00. It's all math, and math is incorruptible.
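The two formulas in this post can be written out as a minimal Python sketch. It assumes the 3 runs in the ERA example are all earned runs, and that innings are passed as true fractions (in box-score notation, 6.2 innings means 6 2/3, so it would be passed as 20 / 3).

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by at-bats."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


def era(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs allowed per nine innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


print(f"{batting_average(500, 1000):.3f}")  # 0.500
print(f"{era(3, 9):.2f}")                   # 3.00
```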

 

Uh, I think that you forgot about UZR. That data is subjectively compiled by the human eye. So Ted does have a point.

Posted
Uh, I think that you forgot about UZR. That data is subjectively compiled by the human eye. So Ted does have a point.
LOL!! I usually try to have a point. You are one of the people that tries to understand my point (whether you agree with it or not). Much appreciated. :)
Posted
LOL!! I usually try to have a point. You are one of the people that tries to understand my point (whether you agree with it or not). Much appreciated. :)

 

Maybe you remember the discussion we all had regarding UZR about 6 years ago. Not one person could explain how the data was collected. I did my own research. I do understand that the field is divided into 72 or 78 zones or something. That's wonderful. But who determines the plotting of the ball's trajectory, etc., and how?

 

The best that we all could come up with is the human eye. Maybe that has changed?

 

I don't really care about gorilla suit antics. But I do question the accuracy and validity of UZR and any other metric formed by subjective assessment.

Posted

Yeah, defense is one of the areas where I do think teams have much better information than the public ... UZR is helpful, but its limitations are obvious and good to know, and I think even the UZR makers have pointed that out. UZR's predictive value for a player really needs a couple of seasons of data (because of the noise that comes with human measurement and such), although the data over a short time frame does have value for portraying what happened.

 

(i.e., going 0 for 10 does not mean a guy is a .000 hitter, but it does mean he had a crappy 10 PAs)
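The 0-for-10 point can be made concrete with a simple binomial model. This sketch assumes every plate appearance is an independent trial with the same hit probability (real plate appearances also include walks, and batting average counts only at-bats). Under that assumption, a hypothetical true .250 hitter goes 0 for 10 about 5.6% of the time.

```python
from math import comb


def prob_exact_hits(p: float, trials: int, hits: int) -> float:
    """Binomial probability of exactly `hits` successes in `trials` independent tries."""
    return comb(trials, hits) * p**hits * (1 - p) ** (trials - hits)


p_true = 0.250  # hypothetical true hit rate per trial
print(f"P(0 for 10 | true .250 hitter) = {prob_exact_hits(p_true, 10, 0):.3f}")  # 0.056
```

So a slump of that length says little about a .250 hitter's true talent; it is roughly a 1-in-18 event for him.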

Posted
Maybe you remember the discussion we all had regarding UZR about 6 years ago. Not one person could explain how the data was collected. I did my own research. I do understand that the field is divided into 72 or 78 zones or something. That's wonderful. But who determines the plotting of the ball's trajectory, etc., and how?

 

The best that we all could come up with is the human eye. Maybe that has changed?

 

I don't really care about gorilla suit antics. But I do question the accuracy and validity of UZR and any other metric formed by subjective assessment.

 

There is truth here, and I do wonder if UZR is confounded by weird-looking OFs, which happen regularly (cough, cough). I mean, look at how consistently Red Sox LFs have been measured poorly over time (whether it be Greenwell, Rice, Nava, Crawford, Manny, or Hanley). I have not checked Yankee RFs, though Paul O'Neill graded badly. Is it possible that those sorts of OFs fool the measurement system?

Posted
DD recently said that he didn't need to look at any metrics to know that Jose Iglesias is a very good fielder. I think I will stick with my own eye test rather than a stat compiled by individuals plotting data points. I trust my eyes more than I trust theirs.
Posted
DD recently said that he didn't need to look at any metrics to know that Jose Iglesias is a very good fielder. I think I will stick with my own eye test rather than a stat compiled by individuals plotting data points. I trust my eyes more than I trust theirs.

 

I'm with you on this.

 

Of course anyone watching should be able to see that JBJ is one very good fielder. Brooks Robinson was a Hoover.

 

The outstanding players are easy to qualify. It's the bulk of the average type players that may be tricky to evaluate.

Posted
I'm with you on this.

 

Of course anyone watching should be able to see that JBJ is one very good fielder. Brooks Robinson was a Hoover.

 

The outstanding players are easy to qualify. It's the bulk of the average type players that may be tricky to evaluate.

And the Hanleys are easy to identify.
Posted

Also, UZR's (or whatever's) contribution is less about identifying good vs. bad defenders than about being able to turn that into some assessment of value. While one did not need metrics to see that Victorino was a spectacular RF in 2013, his defense was a crucial part of the very reasonable (albeit down-ballot) MVP candidacy he had. It combines measuring what a fielder did with how much it actually mattered.

 

One of my favourite examples was Pedroia winning the MVP in 2008 - now voters picked him for silly reasons (Scrappy McTougherson, Red Sox brand etc) - but he really actually was among the very best players in the league that season. The UZR stuff's value I think is being able to put something in the soup which is an improvement from just checking the triple slash (or, yuck, RBIs).

Posted
UZR ratings will probably be done strictly by computer within the near future. The technology just keeps advancing.

 

Maybe so.

 

However, will that make UZR a bona fide valid stat?

 

I still wonder about other aspects of evaluating a fielder with UZR.

 

Are things like wind speed / direction, temperature, humidity, field condition, and trajectory accounted for?

 

In science, they would be.

Posted
Maybe so.

 

However, will that make UZR a bona fide valid stat?

 

I still wonder about other aspects of evaluating a fielder with UZR.

 

Are things like wind speed / direction, temperature, humidity, field condition, and trajectory accounted for?

 

In science, they would be.

 

Trajectory yes. But I think you can only go so far. Most of the things you mentioned also affect hitters and pitchers, and they tend to even out over the season.

Posted
Maybe so.

 

However, will that make UZR a bona fide valid stat?

 

I still wonder about other aspects of evaluating a fielder with UZR.

 

Are things like wind speed / direction, temperature, humidity, field condition, and trajectory accounted for?

 

In science, they would be.

 

There are park adjustments, and trajectory is certainly accounted for somewhat ... and over that many trials (just estimating, 25 balls in play per game x 2,430 games), a lot of the noise can be sampled away. Personally, I think it is an improvement over what was done before (largely guesswork; just look at the ghastly Gold Glove voting) ... but like anything, it's not perfect on its own.
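The back-of-the-envelope volume in this post, and the idea of noise being "sampled away," can be sketched as below. The 25 balls in play per game is the poster's estimate; 2,430 is the number of games in a 30-team, 162-game regular season. The sketch assumes each charted play carries independent, unbiased random error with a hypothetical standard deviation; under that assumption the error of an average shrinks with the square root of the number of plays.

```python
import math

BALLS_IN_PLAY_PER_GAME = 25  # the poster's rough estimate, not an official figure
GAMES_PER_SEASON = 2430      # 30 teams x 162 games / 2


def standard_error(per_play_sd: float, n_plays: int) -> float:
    """Standard error of a mean of n independent, unbiased measurements."""
    return per_play_sd / math.sqrt(n_plays)


season_total = BALLS_IN_PLAY_PER_GAME * GAMES_PER_SEASON
print(season_total)  # 60750

# Hypothetical: each charted play has random error with SD 1.0 (arbitrary units).
for n in (10, 100, 1000):
    print(n, round(standard_error(1.0, n), 3))
```

Averaging only helps with random noise. A systematic bias (for example, a charter who always nudges balls toward one zone) is not reduced by adding more plays.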

 

http://www.fangraphs.com/blogs/the-fangraphs-uzr-primer/

Posted
If the integrity of the data collected is compromised, it is worthless in any statistical projections. I have similar concerns about a lot of the data collection used in advanced sabermetrics and projections. If human error and bias enter the data collection process, the stats built upon that data are not very reliable.

 

Statisticians do everything possible to eliminate human error and bias. They will never be 100% free from error or bias, but implying that the data is corrupt is nonsense. First, statisticians will readily admit when there is any kind of noise in their data. Second, they frequently test how reliable their data is and make improvements based on those tests. If the data is not reliable, they typically don't publish or use it. And if one statistician or group publishes some data, you can be darn sure that it is being cross-checked and re-tested by other statisticians.

No stat is perfect, but seriously, the newer stats like UZR are very good, despite the flaws. UZR and +/- are head and shoulders better than fielding percentage or errors. I can also guarantee you, though I have no proof, that the stats are less biased than the scouts are.

Posted


Statisticians do everything possible to eliminate human error and bias. They will never be 100% free from error or bias, but implying that the data is corrupt is nonsense. First, statisticians will readily admit when there is any kind of noise in their data. Second, they frequently test how reliable their data is and make improvements based on those tests. If the data is not reliable, they typically don't publish or use it. And if one statistician or group publishes some data, you can be darn sure that it is being cross-checked and re-tested by other statisticians.

No stat is perfect, but seriously, the newer stats like UZR are very good, despite the flaws. UZR and +/- are head and shoulders better than fielding percentage or errors. I can also guarantee you, though I have no proof, that the stats are less biased than the scouts are.

Hundreds of thousands of data items are collected each year. There is no possible way to cross-check that. I am not implying that the data is corrupt. I am restating what Pedro said was Theo's practice regarding pitch velocity. That data was corrupted; I am not merely implying it. If that data is official data, it is worthless. I know that you are a stat head and that any questioning of the stats is viewed as an unfathomable attack. If I were as big a stat head as you, I would be concerned about the data collection process and data integrity. Double-checking? Do you know if there is even a process for that and how it works? As I said, I think it would be impossible to double-check given the volume of data. I felt much better about sk's explanation that much of it is done by independent third parties. That's what would need to be done to preserve the integrity and promote consistency in the data collection process. Without consistency and integrity, the data isn't worth spit. I'd like to know more about how the data is collected and who collects it. You may take the information at face value, but I don't. As a tax attorney for more than 30 years, I have seen government tax revenue projections of law changes that I knew were so inaccurate as to be worthless. I know that they didn't pick the numbers out of the air, but they will never share their assumptions and calculations when making those projections, and they rarely if ever turn to the business community to collect relevant data. So excuse me if I am skeptical of things that others accept as true and accurate. I am not trashing your beloved stats, just questioning whether an aspect of the science might need attention.
Posted
Hundreds of thousands of data items are collected each year. There is no possible way to cross-check that. I am not implying that the data is corrupt. I am restating what Pedro said was Theo's practice regarding pitch velocity. That data was corrupted; I am not merely implying it. If that data is official data, it is worthless. I know that you are a stat head and that any questioning of the stats is viewed as an unfathomable attack. If I were as big a stat head as you, I would be concerned about the data collection process and data integrity. Double-checking? Do you know if there is even a process for that and how it works? As I said, I think it would be impossible to double-check given the volume of data. I felt much better about sk's explanation that much of it is done by independent third parties. That's what would need to be done to preserve the integrity and promote consistency in the data collection process. Without consistency and integrity, the data isn't worth spit. I'd like to know more about how the data is collected and who collects it. You may take the information at face value, but I don't. As a tax attorney for more than 30 years, I have seen government tax revenue projections of law changes that I knew were so inaccurate as to be worthless. I know that they didn't pick the numbers out of the air, but they will never share their assumptions and calculations when making those projections, and they rarely if ever turn to the business community to collect relevant data. So excuse me if I am skeptical of things that others accept as true and accurate. I am not trashing your beloved stats, just questioning whether an aspect of the science might need attention.

 

What SK posted about independent 3rd parties collecting data and what I posted about statisticians cross-checking the data of others are not mutually exclusive.

 

We are talking about all kinds of data and stats here. Most of the standard stats are pretty straightforward and easily double-checked by simply reviewing the game, e.g., batting average, OBP, Ks.

 

The data collected by pitch/FX is extremely accurate. It is recorded by 2 or 3 cameras (not radar guns) in each stadium. The cameras are equipped with very sophisticated software that tracks the pitch from release point until it hits the catcher's mitt. Pitch/FX revolutionized the world of pitching data, most notably the pitch framing stats.
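The camera-tracking idea can be illustrated with a common modeling approach: fit a constant-acceleration (quadratic) path to the tracked positions and read the initial velocity off the fit. This is a hedged sketch, not PITCHf/x's actual software. The pitch values, the 0.02 s sampling interval, the coordinate convention (feet, with y decreasing toward home plate), and the ±0.01 ft "noise" are all assumptions made up for illustration.

```python
import math


def fit_quadratic(ts: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of y = c0 + c1*t + c2*t**2; returns (c0, c1, c2)."""
    # Normal equations (X^T X) c = X^T y for the basis [1, t, t^2].
    s = [sum(t**k for t in ts) for k in range(5)]  # sums of t^0 .. t^4
    r = [sum(y * t**k for t, y in zip(ts, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [r[i]] for i in range(3)]  # augmented matrix
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    c = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        c[i] = (m[i][3] - sum(m[i][j] * c[j] for j in range(i + 1, 3))) / m[i][i]
    return c[0], c[1], c[2]


FT_PER_S_TO_MPH = 3600 / 5280

# Hypothetical "true" pitch: start position (ft), velocity (ft/s), acceleration (ft/s^2).
start = {"x": 0.0, "y": 50.0, "z": 6.0}
velocity = {"x": 5.0, "y": -135.0, "z": -5.0}
accel = {"x": -10.0, "y": 28.0, "z": -20.0}

# Synthetic samples every 0.02 s; a small alternating wobble stands in for measurement noise.
times = [0.02 * i for i in range(19)]
samples = {
    axis: [start[axis] + velocity[axis] * t + 0.5 * accel[axis] * t**2 + 0.01 * (-1) ** i
           for i, t in enumerate(times)]
    for axis in ("x", "y", "z")
}

# The fitted linear coefficient is the velocity at t = 0 along each axis.
fitted_v = {axis: fit_quadratic(times, samples[axis])[1] for axis in samples}
speed_mph = math.sqrt(sum(v**2 for v in fitted_v.values())) * FT_PER_S_TO_MPH
true_mph = math.sqrt(sum(v**2 for v in velocity.values())) * FT_PER_S_TO_MPH
print(f"fitted {speed_mph:.1f} mph vs true {true_mph:.1f} mph")
```

Fitting a smooth path to many samples, rather than timing two points, is what lets small per-frame measurement errors largely cancel out.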

 

BIS and STATS INC are two major third-party baseball data collectors. BIS is the one that supplies the data for UZR and plus/minus. There are several lesser-known third parties that collect their own data, and I'm sure many MLB teams have their own teams of data collectors. If one company posts data that seems out of line with another company's, it will not go unnoticed.

 

Additionally, within any particular third party, they have taken every measure possible to cross-check their own data and to eliminate bias and human error. For instance, BIS has at minimum two highly trained video scouts independently watching and charting every play. If their data does not match on a particular play, additional scouts review that play. They also rotate their scouts regularly, so that no scout is always charting the same team or even the same division. This helps avoid biases.
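The double-charting process described here can be sketched as a simple reconciliation step. This is a hypothetical illustration, not BIS's actual system: the `PlayChart` fields, the zone labels, and the rule of flagging any mismatch (or any play only one scout charted) for additional review are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayChart:
    """One scout's coding of one play (fields are hypothetical)."""
    play_id: str
    zone: str
    outcome: str


def reconcile(
    scout_a: list[PlayChart], scout_b: list[PlayChart]
) -> tuple[list[PlayChart], list[str]]:
    """Accept plays both scouts coded identically; flag everything else for review."""
    a_by_id = {c.play_id: c for c in scout_a}
    b_by_id = {c.play_id: c for c in scout_b}
    accepted: list[PlayChart] = []
    flagged: list[str] = []
    for play_id in sorted(a_by_id.keys() | b_by_id.keys()):
        a, b = a_by_id.get(play_id), b_by_id.get(play_id)
        if a is not None and a == b:
            accepted.append(a)
        else:
            # Disagreement, or only one scout charted the play.
            flagged.append(play_id)
    return accepted, flagged


a = [PlayChart("p1", "zone 7", "out"), PlayChart("p2", "zone 12", "hit")]
b = [PlayChart("p1", "zone 7", "out"), PlayChart("p2", "zone 13", "hit")]
accepted, flagged = reconcile(a, b)
print(len(accepted), flagged)  # 1 ['p2']
```

The point of the design is that no single charter's judgment reaches the final data unchecked; only agreement passes straight through.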

 

Even with all the steps taken to eliminate human error, it will never be 100% eliminated. It can't be. BIS readily acknowledges where its shortcomings lie (with catcher defense, for instance) and these statisticians are constantly reviewing data and finding ways to improve upon it. The company's sole purpose is collecting, reviewing, and analyzing data in an effort to better understand baseball. A big part of that is testing the reliability of their data, which they also do constantly.

 

So yes, I feel very confident about the integrity and the consistency of the data collection process. Certainly, I would say it's far better than the home team's biased official scorekeeper's ruling on whether something is a hit or an error. For that matter, it's even far better than an umpire's ruling on whether a pitch is a ball or a strike.

Posted
Hundreds of thousands of data items are collected each year. There is no possible way to cross-check that. I am not implying that the data is corrupt. I am restating what Pedro said was Theo's practice regarding pitch velocity. That data was corrupted; I am not merely implying it. If that data is official data, it is worthless. I know that you are a stat head and that any questioning of the stats is viewed as an unfathomable attack. If I were as big a stat head as you, I would be concerned about the data collection process and data integrity. Double-checking? Do you know if there is even a process for that and how it works? As I said, I think it would be impossible to double-check given the volume of data. I felt much better about sk's explanation that much of it is done by independent third parties. That's what would need to be done to preserve the integrity and promote consistency in the data collection process. Without consistency and integrity, the data isn't worth spit. I'd like to know more about how the data is collected and who collects it. You may take the information at face value, but I don't. As a tax attorney for more than 30 years, I have seen government tax revenue projections of law changes that I knew were so inaccurate as to be worthless. I know that they didn't pick the numbers out of the air, but they will never share their assumptions and calculations when making those projections, and they rarely if ever turn to the business community to collect relevant data. So excuse me if I am skeptical of things that others accept as true and accurate. I am not trashing your beloved stats, just questioning whether an aspect of the science might need attention.

 

FTR, any questioning of stats is not viewed as an unfathomable attack. However, when there is questioning of stats, I am going to offer my defense, especially if I feel the questioning is unwarranted.

 

That said, I will never buy into anyone's belief that he/she does not need stats to fully understand baseball. I don't care how long you've been watching and how well you know the game. Not pointing to anyone in particular here, since as far as I know, everyone here understands the importance of stats, just to varying degrees. And that also goes the other way - you can't fully understand baseball by stats alone.

Posted
What SK posted about independent 3rd parties collecting data and what I posted about statisticians cross-checking the data of others are not mutually exclusive.

 

We are talking about all kinds of data and stats here. Most of the standard stats are pretty straightforward and easily double-checked by simply reviewing the game, e.g., batting average, OBP, Ks.

I clearly wasn't talking about these stats. I was talking about data used in advanced sabermetrics, not old-school stats.

 

Additionally, within any particular third party, they have taken every measure possible to cross-check their own data and to eliminate bias and human error. For instance, BIS has at minimum two highly trained video scouts independently watching and charting every play. If their data does not match on a particular play, additional scouts review that play. They also rotate their scouts regularly, so that no scout is always charting the same team or even the same division. This helps avoid biases.
This is good information -- something that I had not previously heard or read.

 

The company's sole purpose is collecting, reviewing, and analyzing data in an effort to better understand baseball. A big part of that is testing the reliability of their data, which they also do constantly.
This means nothing to me. Volkswagen's sole purpose was to make automobiles that comply with local regulations.
Posted
Hundreds of thousands of data items are collected each year. There is no possible way to cross-check that. I am not implying that the data is corrupt. I am restating what Pedro said was Theo's practice regarding pitch velocity. That data was corrupted; I am not merely implying it. If that data is official data, it is worthless. I know that you are a stat head and that any questioning of the stats is viewed as an unfathomable attack. If I were as big a stat head as you, I would be concerned about the data collection process and data integrity. Double-checking? Do you know if there is even a process for that and how it works? As I said, I think it would be impossible to double-check given the volume of data. I felt much better about sk's explanation that much of it is done by independent third parties. That's what would need to be done to preserve the integrity and promote consistency in the data collection process. Without consistency and integrity, the data isn't worth spit. I'd like to know more about how the data is collected and who collects it. You may take the information at face value, but I don't. As a tax attorney for more than 30 years, I have seen government tax revenue projections of law changes that I knew were so inaccurate as to be worthless. I know that they didn't pick the numbers out of the air, but they will never share their assumptions and calculations when making those projections, and they rarely if ever turn to the business community to collect relevant data. So excuse me if I am skeptical of things that others accept as true and accurate. I am not trashing your beloved stats, just questioning whether an aspect of the science might need attention.

 

As far as the radar gun thing goes, it has never been a secret that different guns will give you different readings. The location of the gun in the stadium will also affect the reading. That said, over the course of enough innings and enough pitches in different stadiums, even that data will tend to become fairly reliable. You will have a good idea of how hard a pitcher can throw.
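One way to act on "different guns give different readings" is a crude per-stadium offset, similar in spirit to a park factor: compare each stadium's average reading to the overall average and subtract the difference. This is a hypothetical sketch with made-up readings, not a method MLB or any team is known to use. It assumes each stadium sees a comparable mix of pitchers, which real data would need far more pitches to approximate.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings: (pitcher, stadium, mph). Stadium "B"'s gun is assumed to read hot.
readings = [
    ("Ace", "A", 94.0), ("Ace", "B", 96.0),
    ("Soft", "A", 88.0), ("Soft", "B", 90.0),
]


def stadium_offsets(rows):
    """Each stadium's mean reading minus the overall mean (a crude gun 'park factor')."""
    overall = mean(mph for _, _, mph in rows)
    by_stadium = defaultdict(list)
    for _, stadium, mph in rows:
        by_stadium[stadium].append(mph)
    return {s: mean(v) - overall for s, v in by_stadium.items()}


def adjusted_velocity(rows, pitcher):
    """A pitcher's average reading after removing each stadium's estimated offset."""
    offsets = stadium_offsets(rows)
    return mean(mph - offsets[s] for p, s, mph in rows if p == pitcher)


print(stadium_offsets(readings))              # {'A': -1.0, 'B': 1.0}
print(adjusted_velocity(readings, "Ace"))     # 95.0
```

Under these assumptions, a home gun that reads hot shows up as a positive offset for that stadium, while a gun error shared by every stadium would not be detected.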

 

As YOTN said, a pitcher's velocity is a stat that is more or less used in and of itself. A pitcher's velocity is not used in determining pitch framing, ERA, K rate, WAR, or anything else where a faulty reading will skew that data.

 

This is an area where a scout would be a better source than the data anyway. One pitcher's 90 mph fastball might look like 95 mph, depending on deception and differential with the offspeed stuff.

Posted
FTR, any questioning of stats is not viewed as an unfathomable attack. However, when there is questioning of stats, I am going to offer my defense, especially if I feel the questioning is unwarranted.
Questioning is never unwarranted. If there is an answer that addresses the concerns, it should be offered, but that never invalidates the question. The way you phrased this (^) indicates that you think any questioning of the process is an attack.

 

That said, I will never buy into anyone's belief that he/she does not need stats to fully understand baseball. I don't care how long you've been watching and how well you know the game.
I don't know who said this. I certainly did not. I have been poring over stats since before your parents met. LOL! Because I look at stats, I want to be able to rely on them, which is why I was concerned about a team's ability to manipulate them. I don't need stats to tell me about the Red Sox, because I watch almost every single inning of every single game. That is not true of the other 29 teams and 725+ players.
Posted
As far as the radar gun thing goes, it has never been a secret that different guns will give you different readings. The location of the gun in the stadium will also affect the reading. That said, over the course of enough innings and enough pitches in different stadiums, even that data will tend to become fairly reliable. You will have a good idea of how hard a pitcher can throw.
This I already knew. What was news to me was that team personnel like GMs could intentionally manipulate the readings.
