Matt Schumer, co-founder and CEO of OthersideAI, also known by the name of its flagship AI writing assistant product, HyperWrite, has broken nearly two days of silence after being accused of fraud when independent researchers were unable to reproduce the purported top performance of a new large language model (LLM) that he announced on Thursday, September 5.
On his account on the social network X, Schumer apologized and said he had gotten ahead of himself, adding: “I know many of you were excited about this possibility and are now skeptical.”
But his latest statement does not fully explain why his model, Reflection 70B, which he says is a variant of Meta’s Llama 3.1 trained using Glaive AI, a synthetic data generation platform, has not come close to matching the initial claims in any subsequent independent testing. Nor has Schumer revealed exactly what went wrong. Here’s the timeline:
Thursday, September 5, 2024: Initial lofty claims of Reflection 70B’s superior benchmark performance
Last week, Schumer released Reflection 70B on the open source AI community Hugging Face, calling it “the best open source model in the world” in a post on X. He then posted charts of what he said were state-of-the-art results on third-party benchmarks.
Schumer claimed that this impressive performance was achieved through a technique called “reflective tuning,” which allows the model to evaluate and refine the accuracy of its responses before outputting them to the user.
VentureBeat interviewed Schumer and accepted the benchmarks as he presented them, crediting them to him, since we don’t have the time or resources to run our own benchmarks, and most of the model providers we’ve covered have been forthright so far.
Friday, September 6 to Monday, September 9: Independent evaluations fail to replicate Reflection 70B’s impressive results, Schumer accused of fraud
However, just days after its debut, over the weekend, independent third-party evaluators and members of the open source AI community posting on Reddit and Hacker News began to question the model’s performance and were unable to reproduce it themselves. Some even found responses and data indicating that the model was related to, and perhaps merely a thin “wrapper” referencing, Anthropic’s Claude 3.5 Sonnet model.
Artificial Analysis, an independent AI evaluation organization, posted on X that its tests of Reflection 70B yielded significantly lower scores than HyperWrite originally claimed.
It also emerged that Schumer is invested in Glaive, the company whose synthetic data he said was used to train the model, which he did not disclose when Reflection 70B was released.
Schumer blamed the discrepancy on an issue with the process of uploading the model to Hugging Face, and promised last week to correct the model’s weights, but has yet to do so.
One X user, Shin Megami Boson, openly accused Schumer of “fraud in the AI research community” on Sunday, September 8. Schumer did not directly respond to the accusation.
After posting and reposting various X messages related to Reflection 70B, Schumer went silent on Sunday evening. He did not respond to VentureBeat’s request for comment, or post publicly on X, until the evening of Tuesday, September 10.
Moreover, AI researchers such as Nvidia’s Jim Fan pointed out that it is easy to train even less powerful models (those with fewer parameters and less complexity) to perform well on third-party benchmarks.
Tuesday, September 10: Schumer responds and apologizes, but does not explain the discrepancies
Schumer finally released a statement on X tonight at 5:30pm EST, apologizing and stating, in part: “We have a team working tirelessly to understand what happened, and we will determine what to do next once we know the truth. Once we have all the facts, we will continue to be transparent with the community about what happened and what we will do next.”
Schumer also linked to another X post by Sahil Chaudhary, founder of Glaive AI, the platform Schumer previously claimed was used to generate synthetic data for training Reflection 70B.
Interestingly, Chaudhary’s post noted that it is still a mystery to him why some of the responses from Reflection 70B say it is a variant of Anthropic’s Claude. He also acknowledged that “we have not been able to reproduce the benchmark scores shared with Matt to date.”
But Schumer and Chaudhary’s responses were not enough to appease skeptics and critics, including Yuchen Jin, co-founder and CTO of Hyperbolic Labs, an open-access AI cloud provider.
Jin wrote a long post on X detailing how hard he had worked to host a version of Reflection 70B on his own site and to troubleshoot the supposed errors, saying, “I spent so much time and energy on this that it took a toll on me mentally, so I tweeted what my face looked like over the weekend.”
He also responded to Schumer’s statement in a reply on X, writing: “Hi Matt, we’ve invested a lot of time, effort, and GPUs into hosting your models, and it’s unfortunate that you’ve stalled out on replies for the past 30+ hours. I think you could be more transparent about what happened (especially why your private API performs so much better).”
As of tonight, many people, including Shin Megami Boson, remain unconvinced by Schumer and Chaudhary’s version of events, which paints the story as one of mysterious and still-unexplained errors born out of enthusiasm.
“As far as I’m concerned, either you’re lying, Matt Schumer is lying, or of course you’re both lying,” Shin Megami Boson wrote in a post on X after posing a series of questions. Similarly, the Local Llama subreddit is not buying Schumer’s claims.
Only time will tell whether Schumer and Chaudhary can respond satisfactorily to their critics and skeptics, whose numbers are growing across the online generative AI community.