I've been out of the Linux world for several years, but if memory serves, the whole kernel updating process is very human-centric.
A user envisions some new piece of code to update, patch, or otherwise modify the Linux kernel; they write the code; they submit it to the parties responsible for maintaining that portion of the kernel; it gets reviewed by the maintainers and any other developers subscribed to that mailing list; after the review and comment period, the code is either incorporated or discarded.
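To make that loop concrete, here's a minimal sketch of what the submission side usually looks like, written as a small Python wrapper around the standard git tooling. It assumes a local clone of the kernel tree in ./linux and a working git send-email setup; the patch filename and the addresses are illustrative placeholders, not real recipients, and scripts/get_maintainer.pl is the kernel tree's own helper for finding who should be mailed.

    import subprocess

    def run(cmd):
        """Run a command inside the kernel tree and return its output.
        Assumes the kernel source is checked out in ./linux."""
        return subprocess.run(cmd, cwd="linux", check=True,
                              capture_output=True, text=True).stdout

    # 1. Turn the most recent local commit into an emailable patch file.
    run(["git", "format-patch", "-1", "HEAD", "-o", "outgoing"])

    # 2. Ask the kernel's own script which maintainers and lists cover
    #    the files the patch touches (filename here is a placeholder).
    maintainers = run(["./scripts/get_maintainer.pl",
                       "outgoing/0001-example.patch"])
    print(maintainers)

    # 3. Mail the patch to those maintainers and the relevant mailing
    #    list, where the human review and comment period happens.
    run(["git", "send-email",
         "--to", "maintainer@example.org",
         "--cc", "linux-kernel@vger.kernel.org",
         "outgoing/0001-example.patch"])

Every step after the email goes out is people: maintainers reading the diff, list subscribers commenting, and someone ultimately deciding to apply or drop it.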
With a free and open source package "maintained" by thousands of developers all over the world, I'm not sure how you remove people from the process. In fact, the scale of involvement is intended to dilute the human error component, the idea being that with SO many people looking at it, somebody is bound to find a problem if it exists. And that's what these researchers were trying to show: that the worldwide review process has a flaw. It's not just any flaw, either; they were showing that the fundamental principle on which so many Linux users operate is flawed. You don't just go to an amorphous group of people and say "Um, yes, hello...your entire operational concept is fundamentally flawed. You should fix it." and expect a meaningful response.
So I agree with IRstuff here that the only practical sandbox to test it in is THE sandbox. I suppose you could randomly select a cross-section of Linux developers of various skills and experience and have them review a piece of code, but how do you remove the bias introduced by those people knowing they're involved in research of some kind, which would likely heighten their awareness? And how do you prove to the rest of the ecosystem that slipping your malware past those 30 reviewers means it would slip past the 30,000 who might have an opportunity to look at it if you released it into the wild? (I'm making up numbers; a quick Google search didn't give me anything useful.)
There's another problem: if you go to the trouble of proving that this is a problem but fail to convince the ecosystem that it's real, all you've done is point malefactors to a vulnerability.
So was this a good idea? No, probably not. But was there a plausible end goal with good intentions, and no good alternative way to pursue it? I'd say yes.