Tournament Scoring Discussion

jjscud

Active member
Aug 18, 2004

Ok, rather than take over bassano's area 85 tournament thread, I am going to keep this separate.

tldr version: There are inconsistencies in the scoring system used in MF tournaments. What could be done to improve it? Note: This discussion focuses on a number of nuances; if you only read the tldr version, you're probably missing the point.


So what is the goal of this discussion? Unfortunately, that's a tough question. I'm leaning more and more towards a refresh of the scoring system, even in the MFOs, over the traditional scoring, but I also don't have plans to run any tournament in the near future, and it's debatable whether D2 has anything other than a near future. I do by and large agree with Nightfish's thread, and it's somewhat unlikely that any game made in the current game-making mindset will be able to displace D2.

Ok, so when it comes to scoring there are a few basic feelings that people have:

1. If it's rare, the player should be rewarded.

2. If the probability says it's rarer than item x, it should score more (possibly proportionately more).

3. It should take more than 1 extremely lucky find to win the tournament, that's why we take the top 5.

Those are the math-free reasonings. Here are a few of the arguments that affect them.

[highlight]Double Point vs. Half point Scoring[/highlight]

There are three elite base items that each have two unique versions dropping with equal rarity: Spired Helms, Thundermauls, and Mighty Scepters. The double point scoring argument is that the ATMA/GoMule odds represent the probability of getting any one of those items, and so their score should be based on that probability. Half point scoring argues that the base item drops with a certain probability and after that it's a coin flip as to which one you get. Awarding the double points is basically saying the player wins the coin flip whether they get heads or tails. It's a guarantee, so it's silly to award extra points for it.
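To put numbers on the two positions, here is a rough sketch. The base score of 9.3 is only an illustrative figure (it is what the 111.6 example further down in this post implies for a unique Thundermaul of either kind), and the function names are made up for this example.

```python
def double_point_score(base_score: float, variants: int) -> float:
    """Score each unique by its own drop odds. The base odds are split across
    `variants` equally likely uniques, so each one counts as `variants` times rarer."""
    return base_score * variants


def half_point_score(base_score: float, variants: int) -> float:
    """Score each unique by the odds of the base item alone. Which variant
    you get is treated as a coin flip that earns nothing extra."""
    return base_score


def average_score_per_unique_drop(score_per_variant: float) -> float:
    """With equally likely variants, every unique drop of the base scores the same,
    so the average per drop is simply the per-variant score."""
    return score_per_variant


base = 9.3  # assumed score for "a unique Thundermaul, either one"
print(double_point_score(base, 2))  # each Thundermaul unique under double point scoring
print(half_point_score(base, 2))    # each Thundermaul unique under half point scoring
```

Under double point scoring every unique Thundermaul drop pays twice the base figure no matter how the coin lands, which is exactly the objection the half point side raises.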

[highlight]Items with a single base but uneven odds[/highlight]

These items (Lightsaber, Legendary Mallet, Scourge and Sacred Armor) also have two uniques for a single base item, but the odds are not split evenly. In the case of Tyrael's Might, these odds are very disproportionate and so it is ridiculously rare. There are several problems with this. First, based on pure probability, the common item of the pair is still rarer than other items from its TC, but should you really be rewarded for getting the likely, easy outcome?
Second, while this appears to be different from the items with equal probabilities above, over the long term it's not: the expected score from getting 9 sacred armors is within a few points of the expected score of getting 9 thundermauls with double-point scoring.
Third, the resulting score for Tyrael's Might is so high that it basically can't be beaten, changing the "top 5" into a "top 1" in all but one instance.

[highlight]Item Domination[/highlight]

This is a theoretical pet peeve of mine. What if there were 12 different thundermauls instead of 2? Because of the commonness of the base item, many would be found each tournament and the winner would always be the one who found the most (111.6 points each); even Tyrael's could only break ties. Now, this is theoretical, and will almost certainly never happen, but it emphasizes the weakness of probability-based scoring. The points awarded for items don't actually represent the probability of getting those points, which has been the goal. This presents itself in a couple of other places. A jewel, for instance, scores 2.7 in the traditional system, but if each of the 8 facets is counted separately, then they would each score 21.6, and if they were included, scoring would come down to finding the top 4 or 5 items plus jewels, and silly stuff like Death's Fathom would be completely irrelevant. Also, if you look at the traditional scoring system you'll see that the tc 78 weapons outscore the tc 84 weapons. Why? tc 78 is more likely to be selected, but there are more weapons to divide those odds amongst, so each item is rarer, though getting one is more likely.
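A small sketch of why the points stop tracking the odds of scoring them. Both probabilities below are hypothetical, picked only so that a single-variant Thundermaul would score 9.3; the helper names are made up for the example.

```python
from fractions import Fraction

P_REFERENCE = Fraction(1, 1000)  # hypothetical odds of the 1.0-point reference item
P_BASE = Fraction(1, 9300)       # hypothetical odds of a unique Thundermaul of any kind


def item_score(p_item: Fraction) -> Fraction:
    """Probability-based score: how many times rarer than the reference item."""
    return P_REFERENCE / p_item


def points_per_kill(variants: int) -> Fraction:
    """Expected points per kill from this base item when it has
    `variants` equally likely unique versions."""
    p_item = P_BASE / variants
    # Each variant drops with p_item and scores item_score(p_item).
    return variants * p_item * item_score(p_item)


for n in (1, 2, 12):
    print(n, float(item_score(P_BASE / n)), float(points_per_kill(n)))
```

The base item drops exactly as often in every row, yet with 12 variants each find is worth 111.6 and the expected points per kill are 12 times higher than with one variant. The same arithmetic turns the 2.7 jewel into eight facets at 21.6 each.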


[highlight]So... What to do?[/highlight]

I don't exactly know. Probability of a given item dropping doesn't accurately represent the probability of scoring points, but what does? I suppose we could try to answer that directly, but for once I'm not sure where to begin.

We could make a purely arbitrary scoring system with all the qualities we want, but that could be difficult to agree on and justify.

Beyond that, I'm interested in suggestions and other general discussion.


Greebo said:
Maybe, if you feel like it, you could start a thread asking for feedback and then make a decision?

jjscud said:


mmm, looks like it's words for lunch today.
 
Re: Tournament Scoring Discussion

[highlight]Addendum: Mathematical Errors[/highlight]

So far as I understand, the current score sheet has a simple error in calculation for a handful of items. These are not high-scoring items, so correcting this would probably not impact tournament results (at least among the high-scoring competitors).

Greebo said:
At the very minimum I think it'd be good and non-controversial to change Templar's to 12.0, Horizon's to 6.8, Stone Crusher to 5.8 and Lightsabre to 3.1.

Greebo's spreadsheet showing this (note the values in red):
tc87.jpg

I motion that these be fixed, regardless of what we do with the overall issues presented above.
 
Re: Tournament Scoring Discussion

[highlight]Addendum: Mathematical Errors[/highlight]

So far as I understand, the current score sheet has a simple error in calculation for a handful of items. These are not high-scoring items.

Actually, my error was with the high scoring items. I meant to deduct the high and low pairs so that the low item of each pair scored similarly to other items in its tc. Of course that was an arbitrary decision and could be corrected either way, but it was deliberate, not an error.



 
Re: Tournament Scoring Discussion

How about a scoring system based on TC?

I'm leaning towards that more than anything else right now. But I would still like to reward the rareness of class specific and staff type items, and possibly a few other considerations.

I'm thinking about pulling together the mfo scores from the last few years and applying different schemes to them to see what happens. Perhaps I will give this a shot.


Another thing I've thought about is increasing the number of scored items. This would lessen the effect of a lucky find and reward effort, but still not eliminate the luck factor.



 
Re: Tournament Scoring Discussion

Allow me to use the thing about amulets to make a more general point:
amulets.jpg


Now, if we were looking just at the probabilities, pretty much all the scores in any competition would have a minimum of 5*15.6, as we'd find five of some cool amulets together. Not good.

Finding a unique amulet takes 1:11496, only a tad less likely than the basis of our scoring, which is IKSC. If there were only one unique amulet, it'd be worth 1.04. I kept one more digit, seeing as rounded down you'd not see the difference.

My idea: finding a unique amulet should give EXPECTED SCORE of 1.04. The scores of each amulet should be adjusted, so that Metalgrid is worth 10 times more than Nokozan. That gives values on the right.

Seems reasonable? It is, I think.
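The adjustment described above can be sketched like this. The drop weights are hypothetical stand-ins (the real ones are in the amulet table, which I am not reproducing here), and only three of the twelve amulets are included to keep it short.

```python
def expected_value_scores(weights: dict, target: float) -> dict:
    """Give each variant a score inversely proportional to its drop weight,
    scaled so that the average score of one unique drop equals `target`."""
    total = sum(weights.values())
    count = len(weights)
    # p_i = w_i / total, and score_i = target / (count * p_i)
    return {name: target * total / (count * w) for name, w in weights.items()}


# Hypothetical relative weights, chosen so Metalgrid is 10x rarer than Nokozan.
weights = {"Nokozan Relic": 10, "Mara's Kaleidoscope": 2, "Metalgrid": 1}
scores = expected_value_scores(weights, target=1.04)

total_weight = sum(weights.values())
expected = sum(weights[name] / total_weight * scores[name] for name in weights)

for name, value in scores.items():
    print(name, round(value, 2))
print("expected score of one unique amulet:", round(expected, 2))
```

Each amulet ends up contributing the same share of the 1.04, so the rare ones score more per find without raising what a unique amulet is worth on average.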

My main argument in the discussion is that the same should have been done to the 4 pairs that jjscud mentioned, including Tyrael's. A similar table:
tc87.jpg


I think the 4 items in green are given too many points, in the same way that giving 15.6 for Mara's would. It's just that there are 12 amulets but only 2 Sacred Armors, so the inflation is by a factor of 2 instead of 12.

-------
My thoughts:
(1) Add jewels, rings, amulets. Adjusted as presented above, so Metalgrid = 3.2.
(2) Keep the rest the same, except for 8 items, which we adjust according to the principle that if you find 9 Sacred Armors, they should be worth 9*10.7 points on average. So Tyrael's = 48.2, Templar's = 6.0. Etc. for the other 6 items.
(3) Other stuff, as you see fit.
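The figures in (2) can be checked with a few lines. The 1-in-9 split between Tyrael's and Templar's is an assumption inferred from the "9 Sacred Armors" wording, not a number quoted from the drop tables, and `split_score` is a made-up helper name.

```python
P_TYRAELS = 1 / 9   # assumption: 1 in 9 unique Sacred Armors is Tyrael's Might
P_TEMPLARS = 8 / 9  # assumption: the other 8 are Templar's Might
BASE_SCORE = 10.7   # score for "a unique Sacred Armor", from the post


def split_score(base_score: float, p_variant: float, variants: int = 2) -> float:
    """Score for one variant so that every variant contributes an equal
    share of the base score on average."""
    return base_score / (variants * p_variant)


tyraels = split_score(BASE_SCORE, P_TYRAELS)
templars = split_score(BASE_SCORE, P_TEMPLARS)
average = P_TYRAELS * tyraels + P_TEMPLARS * templars

print(f"Tyrael's {tyraels:.2f}, Templar's {templars:.2f}, average {average:.2f}")
```

Under that assumed split the two scores come out at about 48.15 and 6.02, which matches the 48.2 and 6.0 above after rounding, and the average of a unique Sacred Armor stays at 10.7.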
 
Re: Tournament Scoring Discussion

My opinion is that Greebo has it exactly right. The Expected Value of an item should be the chance of the base item dropping. This is the only thing that makes sense to me. Finding a Unique Sacred Armor should be worth exactly 10.7 (well, rounding errors etc) points on average, but when you get lucky and find Tyrael's, you obviously should get more (48.2).

If I was running an MF competition tomorrow, that's what I would do. Adding Jewelry seems unnecessary to me, as does TC3s (as discussed in the other thread), but isn't a big deal either way to me.
 
Re: Tournament Scoring Discussion

But why draw the line at unique item base types?

As I mentioned above, the same thing is happening with overly populated TCs. Ultimately I think you either draw an arbitrary line or end up at the tc based scores.
 
Re: Tournament Scoring Discussion

I'm not sure I understand your point here. Can you give me an example using real items? If certain TCs are overpopulated, does it not mean those items will drop less frequently, and thus award more points when they do drop? Is this not exactly what we want?
 
Re: Tournament Scoring Discussion

Well... I kinda see what you mean jjscud, I think. I may be wrong.

What lies at the other end of this spectrum?

Not an actual proposal, just a hypothetical scoring system:
- We score all items
- For every item we look at the probability that it drops with and assign points accordingly
- Sum the above

What do we get? We get a competition where we measure how many monsters we killed. Essentially it boils down to that.

MFO scoring system is a measurement of luck. You improve your chances by whatever means, but at the end of the day the luckiest one wins. It's always been that way, and I think it should stay that way. It's the spirit of the competition. We WILL HAVE TO draw the line somewhere, as you put it. We never checked for how lucky the roll was, that was one of the lines we drew.

My argument was based on something else, and I'm pretty sure everyone who was following this discussion got it perfectly well.

-----
What would you rather count if not actual unique items? TC classes they come from? That's moving more towards scoring Tyrael's and Templar's equal, and way beyond that actually. I don't think that's what MFO is about.

Perhaps you should explain more clearly what would be your exact proposal, so that we can respond to some actual system rather than general statement. I fear there's confusion going around.
 
Re: Tournament Scoring Discussion

I'm not sure I understand your point here. Can you give me an example using real items? If certain TCs are overpopulated, does it not mean those items will drop less frequently, and thus award more points when they do drop? Is this not exactly what we want?

If a TC is overpopulated, it means that a given item will drop less frequently, but it doesn't mean that the TC will be selected less frequently. I don't have enough base item information, so this will have to be semi-hypothetical.

Let's assume that tc78 weapon is selected with 1/10 odds from a given monster. From the same monster, tc84 weapon is selected with 1/15 odds. However, if there are 6 items in tc84 weapon and 13 in tc78 weapon, then the odds of a given item are 1/90 and 1/130 respectively. The tc78 weapon is notably rarer and scores more points, but it's easier (more common) to get a tc84 weapon.

The numbers are made up but the situation is real: Doombringer (tc78) scores 5.7 points while Frostwind (tc84) scores 4.6 and Tombreaver (tc81) scores 4.3. This is basically an identical situation to the items with a common base. The scores for some items are getting a boost not because they drop less often, but because there are more of them to split up the odds.
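The made-up example works out as follows. All odds and item counts are the invented ones from this post, not real drop data.

```python
from fractions import Fraction

# name: (odds that the TC is selected, number of base items in the TC)
treasure_classes = {
    "tc78 weapon": (Fraction(1, 10), 13),
    "tc84 weapon": (Fraction(1, 15), 6),
}

per_item_odds = {}
for name, (p_selected, item_count) in treasure_classes.items():
    # The TC's odds are shared equally between the items inside it.
    per_item_odds[name] = p_selected / item_count
    print(f"{name}: TC picked {p_selected}, one specific item {per_item_odds[name]}")
```

The tc78 TC is picked more often (1/10 against 1/15), but any one specific tc78 item is rarer (1/130 against 1/90), so per-item probability scoring rewards it more.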

Greebo said:
MFO scoring system is a measurement of luck. You improve your chances by whatever means, but at the end of the day the luckiest one wins.

I disagree. The luckiest one is, as often as not, the player who does one run and finds a medium point qualifier and still ends up in last place. For something as long as the MFO, only a Tyrael's can win it with one find, and most of the time (even for most of those who have found Tyrael's) the finds show that the winner still put forth a substantial effort. We certainly don't want it to end up a monster count, but we don't want the other end of the spectrum either, a simple dice roll.

Greebo said:
Perhaps you should explain more clearly what would be your exact proposal, so that we can respond to some actual system rather than general statement. I fear there's confusion going around.

Sorry if you think I'm trying to lead this all to some precalculated point but I'm not. I see lots of imperfections and am trying to find the best way to deal with it.



 
Re: Tournament Scoring Discussion

I meant more that it's much harder to understand your point if you don't give an example, even if said example is not an exact proposal. I never meant to imply that you're leading this discussion somewhere.

------

OK, thinking some about this, I arrived at some thoughts that might be good if shared:

(1) Perhaps the starting point of the scoring system should be TC system, not Elite S/U, or whatever. Perhaps we should assign score for items from TC69 or higher (the TC with IKSC). TC60 or higher. Whatever. Corpsemourn is TC66 and Exceptional. I see no reason it shouldn't be worth _some_ points.

(2) We could adjust the score of all items in a TC in a similar manner to the score adjustment I suggested for amulets. Essentially, if tc87weapon has 4 items, divide scores by 4. If tc84weapon has 17 items, divide the scores by 17. After all this is done, readjust so that the lowest scoring item is worth 1.0.

The problems I have with this are
- I don't know how to do this and I don't feel like learning how to do this
- How do we count items that have only set version, or no set/unique version?

Just some thoughts I felt might be good to put out there.
--Greebo
 
Re: Tournament Scoring Discussion

Very interesting discussion, thanks again. I like Greebo's earlier "expected value" theory for the uniques... and if I'm understanding, the previous post hypothesizes expanding that to create an "expected value" for each TC, then weighting the item drops based on rarity within that?

The way it sounds in my head makes sense to me and I like it... but I don't know how well that comes across in writing.

As something toward an example with very rough figures, let's say TC 75 has an expected value of 4.3 (Andariel's Visage in A85). Let's say for the sake of argument there are 14 S/U in TC75 and we could plot out, for 1000 S/U drops from that TC, how many of each item we would find (or do you need a larger sample of ALL drops, since the proportion of items with S/U versions vs failed S/U drops varies by TC?). We use those ratios to weight the scoring such that the average TC 75 drop is worth 4.3: somewhat lower for Andariel's, Verdungo's, etc., and quite a bit higher for Azurewrath (I don't have enough of the figures in front of me to complete the example)...
 
Re: Tournament Scoring Discussion

I propose the following process:

1) Calculate the probability for each scoring item to drop as a decimal;

2) Invert this number to derive a raw score;

3) Divide this score by the raw score for Immortal King's Stone Crusher;

4) Divide by the total number of unique/set items for the given item type (i.e. "half scoring");

5) Divide by the ratio between the number of base items in the treasure class and the number of base items in the smallest relevant treasure class.

These last two steps will remove the distorting factor of base items with multiple uniques or sets, and treasure classes with more base items than others. As near as I can tell, these are the two things causing all the bugbears that led to this discussion.
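The five steps could be sketched as below. The function name and every number in the example call are hypothetical; none of them are real drop odds.

```python
from fractions import Fraction


def proposed_score(p_item, p_reference, uniques_on_base, tc_size, smallest_tc_size):
    """Apply the five proposed steps to one item.

    p_item: drop probability of the item (step 1)
    p_reference: drop probability of Immortal King's Stone Crusher
    uniques_on_base: number of unique/set items sharing the item type
    tc_size / smallest_tc_size: base item counts used in step 5
    """
    raw = 1 / p_item                          # step 2: invert the probability
    relative = raw / (1 / p_reference)        # step 3: divide by IKSC's raw score
    per_base = relative / uniques_on_base     # step 4: "half scoring"
    return per_base / Fraction(tc_size, smallest_tc_size)  # step 5


# Hypothetical item: 20x rarer than IKSC, 2 uniques on its base, in a TC of 12
# base items where the smallest relevant TC has 4.
score = proposed_score(Fraction(1, 200000), Fraction(1, 10000), 2, 12, 4)
print(score, float(score))
```

Here the raw 20 points are halved for the shared base and then cut to a third for the crowded TC, leaving 10/3.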
 
Re: Tournament Scoring Discussion

I think Jason's approach is the most mathematically sound, although it seems possible that the TC based approach would arrive at the same spot.

Greebo brought up two good points. First, sets. I think these are easily dealt with by looking at the basic odds of rolling a set rather than rolling a unique and applying that to the set item scores. However, this has always been done with 0% MF as the baseline, and nobody actually plays that way. MF's diminishing returns don't affect sets as badly as they do uniques, and most people MF with a good bit of MF gear on, so set items' points have always been artificially high (probably just a little bit). I think this would be easily improved by basing it on an arbitrary MF amount (probably around 400%).

The second point is a bit more difficult to address.

Greebo said:
How do we count items that have ... no set/unique version?

The obvious answer is we don't, but more to the point, should the chances of hitting a unique be considered? Going back to my made up TC example: TC 78 had a 1/10 chance of being selected and tc 84 had a 1/15 chance of being selected. I could simply assign 10 points to tc 78 and 15 points to tc 84. However, let's further refine that and say that 4 of the 6 tc 84 items have a unique version and 5 of the 13 tc 78 items have a unique version.

Now let's say we found 30 items: 3 would be expected to be tc 78 and 2 should be tc 84. A quick look may indicate that my expected scores are 3 * 10 = 30 points from tc78s and 2 * 15 = 30 points from tc84s. This is what I would like to see, equal possibilities of scoring points. Of course, the standard top 5 scoring would favor the rarer tc 84s, as it should, because they are more points per item.

But when we consider the actual uniques in the tcs, this changes. The tc 78's score becomes 3 * (5/13) * 10 = 11.5 points and the tc 84's points are 2 * (4/6) * 15 = 20. So it's actually about twice as easy to score points off of tc 84 items (remember, this is my made up example, not actual numbers).

To me it makes sense to take that into consideration too, multiplying by the inverse of the ratio of uniques to base items in a tc, though this does look funny on paper. Using the above we would get 10*(13/5) = 26 points per tc 78 unique found and 15*(6/4) = 22.5 points per tc 84 unique found.
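Here is the same made-up example as a sketch, using only the invented numbers above. The last column is an extra check that is not in the post: it shows what the expected points become once the adjusted per-unique scores are used.

```python
from fractions import Fraction

DROPS = 30  # items found in the example

# Made-up figures from the example: selection odds, naive points,
# base items in the TC, and how many of those have a unique version.
treasure_classes = {
    "tc78": {"p": Fraction(1, 10), "points": 10, "items": 13, "with_unique": 5},
    "tc84": {"p": Fraction(1, 15), "points": 15, "items": 6, "with_unique": 4},
}

results = {}
for name, tc in treasure_classes.items():
    expected_drops = DROPS * tc["p"]
    unique_share = Fraction(tc["with_unique"], tc["items"])
    naive = expected_drops * tc["points"]          # ignores items without a unique
    actual = naive * unique_share                  # only bases with a unique can score
    adjusted_points = tc["points"] / unique_share  # multiply by the inverse ratio
    rebalanced = expected_drops * unique_share * adjusted_points
    results[name] = {
        "naive": naive,
        "actual": actual,
        "adjusted_points": adjusted_points,
        "rebalanced": rebalanced,
    }
    print(name, float(naive), round(float(actual), 1),
          float(adjusted_points), float(rebalanced))
```

With the adjusted 26 and 22.5 points per unique, both TCs are back to 30 expected points over the 30 drops, which is the "equal possibilities of scoring points" outcome.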


Unfortunately, I think I've hit about the end of what I feel comfortable with until I carefully go over the basics of item generation again.
 
Re: Tournament Scoring Discussion

Aw, shucks, I got referenced here. :)

As for the matter at hand, even though I can't see myself participating in any such tournaments, I'd suggest to keep things as simple as possible while still making sense. Since you asked for opinions, here's one. You did ask for opinions, right? I didn't actually read your post because I was much more interested in what I have to say than in what anyone else has to say. :badteeth:

Maybe that's just me, but if I had to read a huge wall of text just to figure out how scoring works in your MF tournament, I probably wouldn't care that much anymore. Personally, if I had to do it, I'd probably start from Greebo's table and assign each item an integer number based on its rarity. I guess one thing you'll have to decide is what your goal is. Do you want a highly accurate representation of items' rarity in your scoring? You probably don't, since you said you don't want Tyrael's to dominate everything.

Personally, what I'd like to see as a final result if it were my tournament is something like this:

Item 1: 475 Points
Item 2: 450 Points
Item 3: 445 Points
Item 4: 445 Points
Item 5: 435 Points

You know, something that's easy to do the math with and that I can take one look at and see what it's about. I'd start with the least rare item that counts in my competition and then work my way up from there. Give it a baseline score and then add 5, 10 or 25 points as you go up, depending on how big the gap to the next item is. If it's very small, like, 5% or whatever, I'd give both items the same score.
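Something like the following would do it. The 5% "same score" cutoff is from the description above; the baseline of 400, the other two gap cutoffs, and the item list are placeholders I made up, so treat this as a sketch of the idea only.

```python
def tiered_scores(rarity, baseline=400, same_below=0.05,
                  small_below=0.25, medium_below=1.0):
    """Assign round scores by walking the items from most common to rarest.

    rarity: item name -> N in "1 in N" odds (bigger N means rarer).
    The step to the next item is 0, 5, 10 or 25 depending on the relative gap.
    """
    ordered = sorted(rarity.items(), key=lambda pair: pair[1])
    scores = {}
    score = baseline
    previous_odds = None
    for name, odds in ordered:
        if previous_odds is not None:
            gap = odds / previous_odds - 1.0
            if gap < same_below:
                step = 0      # close enough: same score as the previous item
            elif gap < small_below:
                step = 5
            elif gap < medium_below:
                step = 10
            else:
                step = 25     # hard cap on how much one step up can be worth
            score += step
        scores[name] = score
        previous_odds = odds
    return scores


# Hypothetical items and odds, purely for illustration.
example = {"Item A": 1000, "Item B": 1030, "Item C": 1200,
           "Item D": 2000, "Item E": 9000}
print(tiered_scores(example))
```

Item E is nine times rarer than Item A here but ends up only 40 points ahead, which is the point of the cap.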

Doing it like this sort of ensures that one Tyrael's Might does not win you the competition because you've got a hard cap on how much more valuable an item can be. Of course it does not accurately represent the true epicness of very rare finds, but if you did that you'd be right where you started, with once-in-a-lifetime finds being instant winners.

Note that I never aimed at being perfectly accurate. My goal was to get items in order according to rarity and weigh them with rare items being worth slightly more than less rare items.
 
Re: Tournament Scoring Discussion

The final result will be something like you mention, NF. This is a thread about getting there and why.

I think that 400 MF is a bit high. I'd choose 200-250 for a "neutral" number. That's easy to get on a non-sorceress.

The drop mechanics are tricky to understand for me. I need to read a lot. If I'm understanding this right, jjscud, you want to take probabilities of selecting each of weap87, armo87 and so on, apply them to items inside (with bonuses for class specific, etc.), readjust for #items in each "chest" (like weap87 is a "chest"), readjust for #uniques in the chest and then readjust for multiple uniques of the same item type.

Did I get it right?

--Greebo
 
Re: Tournament Scoring Discussion

[highlight]Set Items Issue[/highlight]

Let's treat everything as if it were unique. Then we apply some penalty to Set Items. With 200 MF:

IKSC is 1:4642, Windhammer is 1:13095. The ratio is 0.3545.

Let's just multiply the scores of all the set items by that at the end of whatever method we come up with, before normalization to IKSC, obviously. Or whatever comes out as the "easiest" item, we will normalize to that one. Though we might want to normalize to IKSC for tradition's sake.

With 0 MF that ratio is 0.4011.

With 400 MF that ratio would be 0.3211, so it would lower the scores of Set Items by a further 10% as compared to 200 MF.

My reasoning for sticking to 200 MF is that some of the non-sorceress runners will compete with such an amount. The "Rule of Thumb" developed by NF, I think, was to get to 200MF, and higher if it makes no difference to your kill speed.
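For anyone who wants to play with the set penalty, a small sketch. The odds and the 0 MF and 400 MF ratios are the ones quoted above; `set_item_score` and the 10.0 input are made up for illustration.

```python
IKSC_ODDS_200MF = 4642         # 1:4642 at 200 MF
WINDHAMMER_ODDS_200MF = 13095  # 1:13095 at 200 MF

RATIO_0MF = 0.4011    # quoted above
RATIO_400MF = 0.3211  # quoted above
ratio_200mf = IKSC_ODDS_200MF / WINDHAMMER_ODDS_200MF


def set_item_score(score_as_if_unique: float, set_ratio: float) -> float:
    """Score a set item as if it were unique, then apply the set penalty."""
    return score_as_if_unique * set_ratio


print(round(ratio_200mf, 4))
# How much lower a set item scores at 400 MF than at 200 MF, as a fraction.
print(round(1 - RATIO_400MF / ratio_200mf, 3))
# A set item that would score 10.0 as a unique (hypothetical figure):
print(round(set_item_score(10.0, ratio_200mf), 2))
```

The 200 MF ratio reproduces the 0.3545 figure, and the drop from 200 MF to 400 MF comes out a little over 9%, in line with the "further 10%" estimate.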

--------

[highlight]The Issue of Scoring[/highlight]
You can find the spreadsheet that I'm using here.

I went through TC69 and higher, which I think should be the base for scoring, but lower TC's can of course be added.
Atomic%20TC%2069-87.jpg


I did this by hand, based on tables like this one:
weap87%20n%20armo87.jpg


Both of those can be found in the excel sheet I linked to.

So, jjscud, could you please give a real life example on how you would score:
- Windforce and Steelrend
- Astreon's
- Gris' Honor and Gris' Valor
- Earthshifter
- Tyrael's & Templar's

I think the entire issue can be pretty much covered by discussing these 8 items. I also think with the table I've just made it shouldn't be very difficult, unless I misunderstood you.

At this point, let's not worry about normalization at all. We'll get some numbers and only care about ratios of them to each other. I think we can handle that.
 
Re: Tournament Scoring Discussion

I would like to add my personal opinion too.
I am with jjscud on all points because I think that the current scoring system is good and it has worked for a lot of years without problems.
Every MF tournament has some factor which involves luck, and not only MF tournaments but a lot of sports too, for example.

I do not agree that some of the scores should be changed according to exact drop probability, because Earth Shifter, Cranium Basher and the other TC87 weapons are all the same in this respect. I do not know why finding a unique Thunder Maul should automatically give you twice the points of a unique Hydra Bow or Berserker Axe or other weapons. The probabilities for drops like that are the same, and if you get more points for Thunder Mauls it will only be a factor that randomizes the score list more than now. It is the same with Templar's or Horizon's or the others, because finding a unique Sacred Armor gives you an advantage just in the sense that one of 10 drops like that will be Tyrael's, and an additional point benefit for Templar's is overpowered imho, because once again a Sacred Armor drop is as probable as a Myrmidon Greaves or Diadem drop.

If you want to add some factor which reduces luck more, I do not think that it should be something based on the number of monsters you killed or the number of runs you did. My opinion is to keep this system, because sometimes you are lucky and sometimes another player is lucky, and if you do enough MF competitions with a similar number of runs to the winner, you will get your deserved victory soon. Because statistics work :)
 