I do think watching nearly every game of one team gives the TV viewer a pretty good idea of who is good, bad, great or crappy, but just like stats and metrics, the eye test is flawed, for many of the reasons you mention.
My biggest beef with "eye test" proponents is that horrible, bad, okay, good and great are all relative terms in MLB. I'll use the defense of Bogey as the example.
Bogey is a damn good fielder compared to everyone who ever played or tried to play baseball, but in MLB he can be viewed as anywhere from near average to bottom tier. While it might be true, in our own minds, that Bogey is a "good fielder," what good is that label if 20-28 SSs are better than him? Is that really "good"?
Now, the next point: no single person watches nearly every play of every team, so how can we determine just how Bogey compares to other SSs?
We simply cannot by using only the eye test. If we watch nearly every Sox game and just a few other games, we may not see some SSs play at all. Others we may see for 3-6 games each, and a few others for a dozen or so games, tops.
While metrics are highly flawed, they do inspect every play of every game in a way that attempts to be impartial and calibrated to be as consistent as possible, unlike home-field scorers, who routinely assign errors and non-errors in seemingly haphazard ways.
In defense of watching games on TV, a replay will often show a play from another angle that does capture the jump a player gets and how much ground he covers to make the play. This works better for IF'ers than OF'ers, because the angles taken, the jumps off the crack of the bat and the fielder's speed are harder to judge for an OF'er than for an IF'er.
The fact that DRS, OAA and UZR/150 often vary by wide degrees shows none of them is perfect, but to me, looking at all three together has to be better than the eye test, when a blindfold is over our eyes for 28 out of 30 teams on most days.