You're not wrong, but it's a touchy subject. Gender issues are always tinged with "discrimination" overtones.
And, just to be clear, there is _plenty_ of discrimination and bias against women. But that shouldn't detract from the fact that you're pointing out a major outlier in their data.
Unfortunately, without knowing the test, it's hard to even guess what happened there. Is it focusing on syntactic vs. semantic correctness? Does it require deep specialist knowledge, or broad generalist knowledge? How is it graded?
Or is it simply a function of the tiny sample size?
If you don't say what you conclude from that data, people will infer a conclusion for you. And since it's a loaded subject, they will often arrive at "what a misogynist jerk" without ever knowing what you were trying to say.
Corollary: When pointing out data on a loaded subject, it might be beneficial to point out what you conclude, and why. At least that way, only one of the two factions can flame you ;)
(Disclaimer: I am a woman, so I obviously disagree with the "women can't code" theory. I'd still love to find out _why_ women are filtered out by that test so disproportionately, simply because it might hold clues as to why women are not interested in CS.)
Also, the sample size is crap, and the deviation isn't large enough relative to that sample size, so you can't draw many truly meaningful conclusions from it.
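To make the sample-size point concrete, here is a rough sketch of how one might check whether a gap in pass rates between two groups is distinguishable from chance. It uses a two-sided Fisher exact test written from scratch. The counts below are entirely made up for illustration, since the real numbers from the test aren't given here, and the function name `fisher_exact_two_sided` is my own.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups, columns are (passed, failed).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x passes in group 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Add up every possible table that is at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# HYPOTHETICAL counts, not taken from the data under discussion:
# small group: 1 passed, 4 failed; large group: 20 passed, 20 failed.
p_value = fisher_exact_two_sided(1, 4, 20, 20)
print(f"p-value for the made-up table: {p_value:.3f}")
```

With a group of only a handful of people, even a pass rate that looks dramatically lower can produce a large p-value, which is the formal version of "you can't conclude much from this."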
diolpah, my grandparent here is why you were downvoted. Nowadays, the atmosphere around anything related to gender is super-charged, and people are hyper-super-extra-sensitive to anything that could even slightly be construed as debasing towards women, so even something as simple as observing measured data sounds to them like overt sexism.