ML4Sci #12: Thoughts on COVID-19, Scientific Gatekeeping, and Substack Newsletters

Challenges and Opportunities for Scientific Communication in the Information Era

May 25, 2020

Hi, I’m Charles Yang and I’m sharing (roughly) weekly issues about applications of artificial intelligence and machine learning to problems of interest for scientists and engineers.

If you enjoy reading ML4Sci, send us a ❤️. Or forward it to someone who you think might enjoy it!


As COVID-19 continues to spread, let’s all do our part to help protect those who are most vulnerable to this epidemic. Wash your hands frequently (maybe after reading this?), check in on someone (potentially virtually), and continue to practice social distancing.

This essay was inspired by Jeffrey Ding’s reflections on gatekeeping in his ChinAI Substack newsletter, which covers China-US tech policy.

Thoughts on COVID-19, Scientific Gatekeeping, and Substack newsletters

“There are decades where nothing happens, and there are weeks where decades happen.” -Vladimir Lenin

COVID-19: A Catalyst for Experimentation

COVID-19 has forced experiments with new technologies and tools and may have pushed many nascent technologies past the “valley of death” between early adopters and mass consumption. For instance, telemedicine, ecommerce (see below), and new scientific collaboration tools have all taken off during the global pandemic.

2PM (@2PMinc): “10 years vs. 8 weeks”

In addition to serving as an accelerant, COVID-19 has also laid bare many fault lines in the scientific world. For the first time, we are seeing the collision of a global pandemic with the (mis-)information age. Suddenly, the general public is intensely interested in the preprints coming out of bioRxiv, and scientists are finding that the scientific community is woefully unprepared to cogently communicate the pitfalls of the scientific process. As a result, we’ve seen the proliferation of “coronavirus influencers”, i.e. people with no medical background writing on Medium about coronavirus research and models, as well as the spread of misinformation online about fake cures and treatments. The deluge of papers is proving difficult even for scientists to keep up with, which has led to the creation of new tools to analyze and mine the flood of papers coming out.

Now, we see the inadequacies of our scientific communication infrastructure laid bare when quite literally millions of lives are at stake. There are several distinct problems:

lack of infrastructure for rapid, robust, yet high-quality and open-source peer review to filter preprints
lack of infrastructure to aggregate useful papers and for scientists to identify and discuss early trends
lack of infrastructure to communicate science, with all its nuances and pitfalls, to the broader public and news media [1]

Lessons learned from the Machine Learning Community

Of course, another field that has broad public interest, rampant misinformation and misconceptions, an overwhelming growth in papers, not to mention enormous financial incentives, is artificial intelligence, particularly where deep learning is concerned. [Somehow, all of these manage to be encapsulated in the persona of Elon Musk, who regularly tweets about Terminator-esque AI, has developed a highly valued software-driven car company with AI as a core competency, and co-founded an AI institute (OpenAI) that publishes a breathtaking number of high-quality papers.]

Elon Musk (@elonmusk): “Anonymous bot swarms deserve a closer examination. If they’re evolving rapidly, something’s up.”

The broader scientific community might be able to learn from how the ML community has used open science principles to streamline review processes, while avoiding its mistakes. For instance, OpenReview is a great open-source peer review platform for ML conferences that allows everyone to view the reviewer comments and author responses. The preprint and open-source culture of ML research has certainly begun permeating other fields, particularly those at the interface of ML4Sci.

However, the enormous financial incentives around publishing new state-of-the-art models and fame associated with incremental achievements on benchmark datasets are still altering the publishing and review landscape. See “Troubling Trends in ML Scholarship” or “Peer Review in NLP: reject-if-not-SOTA” for more.

Building digital gatekeeper communities

Fingerspitzengefühl: literally, "finger tips feeling"

It seems to me the problem is that we are missing a middle abstraction barrier. We have the very low barrier-to-entry of preprint publishing; basically anyone can put something on arXiv. On the other hand, we have the very high barrier-to-entry of peer-reviewed publication, which (as of now) happens behind closed doors and can take months. In both cases, the scientific writing is formal and difficult to parse for anyone not in the field. What we need is to build infrastructure that can help improve people’s situational awareness of published literature.

Currently, Twitter is the trending solution to this problem. Arxiv-sanity, an improved UI for arXiv, has a tab specifically for preprints trending on Twitter (as does Rxivist, the bioRxiv equivalent). Evidently, the alternative to peer review is a paper’s popularity on Twitter. Yet anyone who has spent any time on Twitter knows it’s a terrible platform for scientific discussion: a 280-character limit, complicated threading, and a tendency toward “wisdom of the crowd”/mob mentality and “rich get richer” network effects. The social media sphere is also easily infiltrated by those seeking to promote misinformation[@China][@Russia]. (Imagine if someone built Twitter bots to promote their own research and “game” the system... or, more likely, nation-states doing so to promote their own “national” research and increase prestige.)

What we need are gatekeepers of knowledge, who aggregate and identify high-value information and articles. They should work in a format that is more long-form and conducive to substantive discussion than Twitter, but also faster than peer review. We also need people to provide high-level overviews and opinions on the trajectory of a field. What might that look like?

Well, hopefully that somewhat reminds you of this newsletter! The point of this essay, though, is not just a circuitous plug for my newsletter, but rather to advocate for the development of an ecosystem of career-advancing, financially feasible scientific writing focused around what might be best called “mildly opinionated reviews and analysis” of a particular field. My hope is that newsletters like this (or potentially other forms of media content that have yet to be created) can help fill the middle gap by providing aggregation of both peer-reviewed publications and preprints, as well as analysis and opinions on emerging trends of research.

This would mirror similar shifts occurring throughout the digital information economy. For instance, A16Z, a prominent venture capital firm in Silicon Valley, has written about the growth of the “passion economy”, akin to the gig economy for creators, and has invested in Substack. [2]

To see the opportunity presented by these new business models, we need to look no further than journalism. Plummeting ad revenue due to COVID-19 has hollowed out newsroom journalism, which has traditionally been funded by classified ads. Yet many journalists who recently found themselves unemployed have found a new business structure for themselves on Substack. Perhaps it’s time for scientists to similarly evolve new “business models” and career paths?

In the past, the only way to make it in the research sciences was to grind your way to a tenured professorship or else be consigned to perpetual post-doc purgatory. The tightening funding pool and intense publish-or-perish culture have contributed to a mental health crisis among graduate students, who are supposed to be the next generation of our best and brightest minds.

One piece of the solution to all these problems may be to create new career paths. The resurgence of subscription-based content, driven by an era where trust is the new currency, makes such roles financially feasible for newly minted Ph.D.s looking to do something besides academia. Such gatekeepers can be outward-facing (providing an accessible point of contact for reporters looking to actually understand the science and direction of a field, communicating in times of crisis, combating misinformation put out by corporations or other groups, etc.) or inward-facing (providing high-level overviews of fields, resources for scientists looking to learn new skills e.g. ML/AI, high-level analysis of new techniques in recent literature, etc.).

The transition will be hard. The impact-factor metric is deeply entangled with academic prestige. I don’t know what the solution to this is (maybe someone will one day cite an issue of my newsletter?), but I am confident that our current publishing schemes, even with preprints and open peer review, are not sufficient for the challenges posed by the information era. [3]

Science and technology are becoming more deeply entangled with our society; we need scientists who can clearly communicate the nuances and pitfalls of their work to policymakers, news writers, and the general public. Science and technology are becoming more complex and specialized; we need scientists who can step back and study entire fields in their breadth, reflect on trends and identify salient papers, and share those with stakeholders.

Finally, we need diverse gatekeepers. Science always thrives in open discussion and debate. To have a single authoritative voice hemming in an entire field will lead to groupthink and stale science. In other words, gatekeepers need peer review too, but probably not the anonymized, closed-door, synchronous kind we have now. The term “gatekeeper” might conjure up the image of a walled garden, with a singular authority that determines who comes in and out. But what we really need is an ensemble of gatekeepers who debate and build off of each other, providing back and forth. [4,5]

This culture of open debates over blog posts is already embedded in ML, e.g. “the bitter lesson”, “Deep RL doesn’t work yet”, etc. We need more of that in the sciences, and preferably in a more structured format than random personal blogs, but until recently we have lacked the infrastructure or incentive to do so. Now, new technology services and business structures like Substack and Revue, as well as the growth of other media like podcasts, have made it more feasible both technologically and financially.

Conclusion

Science is becoming more complex, more important to public policy, and just bigger. We can’t just keep trying to drink from the flood of preprints. We need a diverse set of trusted voices who can summarize and aggregate trends, particularly in fields that are closest to the public’s mind e.g. epidemiology, environmental science, public health, AI, etc.

There is plenty of room in this growing field. We are still in the early stages of the creator-friendly economy. There are so many different ways of doing this: different audiences, verticals, monetization schemes, etc. The scientific community is desperately in need of new solutions to the age-old problem of communication and trust. For instance, some friends of mine recently started a Substack newsletter about battery research. What will you start?

[1] To be fair, much of the misinformation online is not the fault of the scientist: larger geostrategic considerations and domestic political polarization are dominant factors. With that said, we still have a responsibility to continue to try and identify new, better ways of communicating in the new information age.

[2] Some people might think of Medium as a salient example of subscription-based monetization of content creation. But I think writing on Medium is a little too close to being paid for making tweetable articles (one title that popped up on my feed: “How I went from zero coding skills to data scientist in 6 months”). Medium is definitely an improvement and some of its publications offer nice tutorials (I’ve also published on Medium, so I don’t want to be too unfair). But I think the incentive structure of Medium is too misaligned with the objectivity required of science, as I think anyone who has really perused Medium will agree.

[3] As this was going to press, I found a great Nature Index article that discussed several exciting new preprint projects aimed at filling this “middle abstraction barrier”. The ideas all look great and I’m definitely excited to continue tracking progress in these projects. However, these solutions are all some variant of “open-sourcing peer review for preprints”. This is undeniably an incredibly important step forward (I honestly can’t express how badly the journal-based peer review system is in need of an overhaul), but I still believe that, in conjunction with these initiatives, we need more casual, long-form, field-level analysis and discussion, rather than article-specific review (important as it is!).

[4] As an example, see Jeff Ding’s discussion on how gatekeepers shape policy debate in the China-US tech sphere.

[5] Some people might take issue with the mixing of opinionated gatekeepers and science. But when we consider something as broad as an entire field, say, ML4Sci, then naturally people will have different opinions, be excited about different things, and believe the future lies in different places. The important thing is that we acknowledge what backgrounds we come from and never cease to welcome open debate and rebuttal.

📰In the News

Science

New NLP, data mining, open-science tools to keep abreast of the flood of COVID-19 papers

Fermionic neural-network states for ab-initio electronic structure

Review of techniques for nowcasting rain forecasts from radar

A Survey of Deep Learning for Scientific Discovery (nice survey of deep learning overall, with some nice scientific examples and hints for starting ML projects + additional resources. Also co-authored by Eric Schmidt, former CEO of Google)

Google datacenters use carbon-intelligent computing

Recognizing Industrial Smoke Emission (with open-sourced dataset)

Technical Report from DOE on Basic Research Needs in SciML

ML/Industry

A nice, short essay on Google, Deep Learning, and Monopolies

Troubling trends in ML scholarship

Ever wonder why tech companies spend so much money on AI research? Facebook AI Research published a blog post on detecting hate speech that showcases all the state-of-the-art language models they’ve developed

AI Economist: developing equitable tax policies with AI (by Salesforce)

Thanks for Reading!

I hope you’re as excited as I am about the future of machine learning for solving exciting problems in science. You can find the archive of all past issues here and click here to subscribe to the newsletter.

Have any questions, feedback, or suggestions for articles? Contact me at ml4science@gmail.com or on Twitter @charlesxjyang
