Research Rebuttal Paper Uncovers Misuse of Holocaust Datasets

Selin Deniz Akdoğan

Melkior Ornik is a professor in the Department of Aerospace Engineering at the University of Illinois Urbana-Champaign. He is also a mathematician who believes in the integrity of using hard science in public discussions. His attention was caught by news that a pair of researchers had developed a statistical method for analyzing datasets and used it to purportedly refute the number of Holocaust victims at a concentration camp in Croatia, so he decided to study the research further. He re-analyzed the same data from the United States Holocaust Memorial Museum and wrote a rebuttal paper debunking the researchers' findings, which was published in the same journal as the original article.

"As scientists, as engineers, I think it's our duty to correct flawed and faulty science," Ornik said. "There is so much effort to get the public and policymakers to believe in science that when a math expert says they have proof, it brings credence to the argument. But when their claims are demonstrably not true, it's not good for science and it's not good for society. That's why it's especially important for scientists to challenge false findings when we discover them."

Ornik noted that some people spread the idea that concentration camps either did not exist or were not used to kill people, or that the recorded numbers of victims have been substantially inflated. Most historians do not take these claims seriously, given the vast body of available data and evidence.

"For the authors of the original paper to claim that they have found mathematical proof that the list of victims of that camp was fabricated has obvious historical implications," Ornik said. "I think, to some extent the damage has already been done, but I felt the need to go on record with the assumptions, inaccuracies, and misuse of the raw museum data I found in the original research."

 

Ornik stated that he does not dispute the merits of the method presented in the original paper, which identifies anomalies across a set of histograms, only its application to the Jasenovac concentration camp. In one example, the researchers indicated that a smaller list naturally had a lower outlier score, yet they compared scores across victim lists of different sizes to imply that the list connected to Jasenovac, one of the largest, was problematic. This made Ornik doubt the paper's conclusions.

 

"I started looking to see if there was some sort of a bias for the size and whether they were actually more likely to assign the flag of being problematic to a larger list or not. And it turns out, despite the authors' claims, they were," Ornik said. "The bigger lists are more likely to be computed to be problematic than the smaller lists when their method is applied to the data."
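
The size effect Ornik describes can be illustrated with a minimal sketch. The score below is a generic chi-square statistic against an assumed reference distribution, not the method from the original paper, and all counts are made up for illustration. Two hypothetical lists deviate from the reference by exactly the same proportions, yet the larger one receives a score 100 times higher.

```python
# Illustrative only: a generic chi-square score, NOT the original paper's method.
# The reference distribution and all counts are hypothetical.

def chi_square_score(counts, reference):
    """Chi-square statistic of observed bin counts against reference proportions."""
    total = sum(counts)
    return sum(
        (observed - total * p) ** 2 / (total * p)
        for observed, p in zip(counts, reference)
    )

REFERENCE = [0.25, 0.25, 0.25, 0.25]  # assumed reference proportions per bin
THRESHOLD = 7.815  # chi-square critical value for 3 degrees of freedom, 5% level

small_list = [28, 24, 24, 24]          # 100 entries
large_list = [2800, 2400, 2400, 2400]  # 10,000 entries, same proportions

for name, counts in [("small", small_list), ("large", large_list)]:
    score = chi_square_score(counts, REFERENCE)
    status = "flagged" if score > THRESHOLD else "not flagged"
    print(name, round(score, 2), status)
```

Under these assumptions the small list scores about 0.48 and is not flagged, while the large list scores 48 and is flagged, even though the shape of the two histograms is identical. Comparing raw scores of this kind across lists of very different sizes therefore says more about size than about anomaly.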

 

"When you look at data, a collection of anything, and you want to figure out an outlier—something that's different—you need to assume that all of the pieces of data come from the same source, the same distribution. Take a list of victims by birth year. It would yield a graph of the ages of each person. Say 10 percent are older than 70 years old. Now, that distribution wouldn't be true for a list of deported children, for example, because that list, by definition, is structurally different. It is also different from a list of everyone who has an identity card. Identity cards are issued only to people who are not children. Yet, the lists that these researchers worked with came from a multitude of sources and include lists of children, lists of people getting married, lists of prisoners of war—things that by definition cannot have come from the same distribution."
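
A minimal sketch of this point, using made-up age distributions: the only figure taken from the quote is the 10 percent share of people over 70, while the bin boundaries and all other proportions are assumptions. It computes the total variation distance between each list's age histogram and a general-population histogram.

```python
# Illustrative only: hypothetical age distributions over three bins
# (under 18, 18 to 70, over 70). Only the 10% over-70 share comes from the text.

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

general = [0.25, 0.65, 0.10]   # assumed general population
children = [1.00, 0.00, 0.00]  # a list of deported children
# Identity-card holders: the general population with children excluded.
adults_only = [0.00, 0.65 / 0.75, 0.10 / 0.75]

for name, dist in [("children", children), ("identity cards", adults_only)]:
    print(name, round(total_variation(general, dist), 2))
```

Both lists are genuine by construction, yet they sit at distances of roughly 0.75 and 0.25 from the general histogram. A detector that assumes every list comes from one common distribution would treat them as anomalous for structural reasons alone.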

 

According to Ornik, another key flaw in the original study was that certain duplicate listings were treated as independent lists. This meant that around 67 percent of their database was made up of sub-lists of the main list.

 

"The 7,000-plus lists published online by the Holocaust Museum are not curated," Ornik said. "For instance, there are two lists that contain exactly the same data; one is in Cyrillic and the other one uses the Latin alphabet. But they treated them as two separate lists. There are other lists that contain the same name, but there is no way of knowing if they are the same person or two different people born on the same day with identical names. They could have removed the very egregious errors in which a list is clearly duplicated but the rest, you would need access to the original historic data."
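
The alphabet-duplicate case lends itself to a simple check. The sketch below is one possible approach, not something attributed to either paper: it transliterates Serbian Cyrillic to Latin with a hand-written mapping and then compares two lists as sets of (name, birth year) records. The records are placeholder entries invented for illustration, and the helper names are hypothetical.

```python
# Illustrative only: placeholder records and hypothetical helper names.

CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

def normalize(name):
    """Lowercase a name and transliterate any Cyrillic letters to Latin."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in name.casefold())

def as_record_set(records):
    """Turn (name, birth year) pairs into a set of normalized records."""
    return {(normalize(name), year) for name, year in records}

def is_duplicate_list(list_a, list_b):
    """True when both lists hold exactly the same normalized records."""
    return as_record_set(list_a) == as_record_set(list_b)

# Placeholder entries, not taken from any real list.
cyrillic_list = [("Петар Петровић", 1901), ("Јована Јовановић", 1915)]
latin_list = [("Petar Petrović", 1901), ("Jovana Jovanović", 1915)]

print(is_duplicate_list(cyrillic_list, latin_list))
```

A check like this only catches whole-list duplicates. As Ornik notes, deciding whether two matching entries refer to the same person would require access to the original historical records.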

References

Larson, Debra. “Research rebuttal paper uncovers misuse of Holocaust datasets.” University of Illinois Grainger College of Engineering, 2021.

© 2024 by Math Club. 
