UnMixed Motives
Building a more transparent, equitable, private, and useful search experience for today’s Internet
In 1998, two computer science PhD students published a paper discussing the state of web search. Alongside issues like scaling and search result quality, they discussed the impact of advertising-based business models. They expected such models to “be inherently biased towards the advertisers and away from the needs of the consumers,” resulting in a subpar user experience.
In the paper, the duo considered a hypothetical search for “cellular phone.” The first result returned by their prototype, whose ranking algorithm was designed to measure the objective quality of a site, was an oft-cited study detailing the risks of using a cell phone while driving. They questioned whether such a study would appear in the results of an advertising-funded search engine, and ended their discussion by stating that “the issue of advertising causes enough mixed incentives that it is crucial to have a competitive search engine that is transparent and in the academic realm.”
The authors of this paper, The Anatomy of a Large-Scale Hypertextual Web Search Engine, were Sergey Brin and Larry Page. They incorporated Google later that year, in September 1998.
Of course, the sentiments that Brin and Page expressed in this paper did not last long. Google began to sell ads in 1999 and then it was off to the races. [1] By 2006, Fred Wilson of Union Square Ventures was dissecting the impact of digital media’s disaggregation on content creation, editorial, distribution, and revenue generation. He noted “the first step in the content consumption process (finding the content you want) is being monetized by Google and Yahoo! and others, not by the companies that are producing the very content you want to find.” [2]
Today, Google’s advertising business model is so successful that it receives approximately 52% of global digital ad spend, drawing away revenue from the publishing industry. [3]
More recently, Australia passed a law that seeks to level the playing field between Google and local publishers. The News Media Bargaining Code aims to “address a bargaining power imbalance” between large technology platforms and news publishers by requiring the former to compensate the latter for news content made available or linked on their platforms.
While I disagree with this law’s execution (linking on the web should be free), the existential angst underlying the law is palpable. While Google has helped many smaller publishers increase their distribution, its domination of the advertising market has also put them at risk of extinction.
Fortunately, we are in the midst of a paradigm shift for how content is owned and managed on the web. I believe that new primitives introduced by web3 and decentralized networks will bring about a new kind of search engine that is more transparent, equitable, private, and useful to the web’s stakeholders.
The importance of transparency in search is perhaps best summed up by Brin and Page themselves (again, in that same paper):
Since it is very difficult even for experts to evaluate search engines, search engine bias is particularly insidious. A good example was OpenText, which was reported to be selling companies the right to be listed at the top of the search results for particular queries [Marchiori 97]. This type of bias is much more insidious than advertising, because it is not clear who "deserves" to be there, and who is willing to pay money to be listed. This business model resulted in an uproar, and OpenText has ceased to be a viable search engine. But less blatant bias are likely to be tolerated by the market. For example, a search engine could add a small factor to search results from "friendly" companies, and subtract a factor from results from competitors. This type of bias is very difficult to detect but could still have a significant effect on the market. [Emphasis is mine; 4]
Decentralized protocols are by definition open source and thus transparent. Ranking algorithms used by a decentralized search engine could be audited by stakeholders. If some bias is introduced, then at least its impact on results would be clear to its users.
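To make the auditing idea concrete, here is a minimal Python sketch. It is not any real search engine's algorithm: the site names, scores, and the `factor` parameter are hypothetical values chosen for illustration. It shows how a small hidden adjustment of the kind Brin and Page describe can reorder results, and how a stakeholder who can run the published ranking function can compare its output against what they actually observe.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical relevance scores for a single query; higher is better.
BASE_SCORES: Dict[str, float] = {
    "independent-review.example": 0.82,
    "competitor.example": 0.81,
    "friendly-partner.example": 0.80,
}


def rank(scores: Dict[str, float]) -> List[str]:
    """Return site names ordered from highest to lowest score."""
    return sorted(scores, key=lambda site: scores[site], reverse=True)


def apply_bias(
    scores: Dict[str, float],
    friendly: Iterable[str],
    competitors: Iterable[str],
    factor: float = 0.03,
) -> Dict[str, float]:
    """Add a small factor to 'friendly' sites and subtract it from competitors."""
    adjusted = dict(scores)
    for site in friendly:
        adjusted[site] += factor
    for site in competitors:
        adjusted[site] -= factor
    return adjusted


def audit(published: List[str], observed: List[str]) -> List[Tuple[str, int, int]]:
    """List every site whose observed position differs from the published ranking.

    Each entry is (site, published_position, observed_position), zero-indexed.
    """
    observed_pos = {site: i for i, site in enumerate(observed)}
    return [
        (site, i, observed_pos[site])
        for i, site in enumerate(published)
        if observed_pos[site] != i
    ]


if __name__ == "__main__":
    published = rank(BASE_SCORES)
    observed = rank(
        apply_bias(
            BASE_SCORES,
            friendly=["friendly-partner.example"],
            competitors=["competitor.example"],
        )
    )
    for site, expected, actual in audit(published, observed):
        print(f"{site}: expected position {expected}, observed {actual}")
```

With a closed-source engine, only the observed ranking is available, so the comparison in `audit` is impossible. An open ranking function is what makes the discrepancy visible.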
Regarding equity: above, I’ve discussed how Google’s ad-based business model evolved at the cost of user experience and publisher sustainability. What is most exciting about the paradigm shift brought about by decentralized networks is the movement towards stakeholder ownership of protocols. Ownership begets power, and aligns interests. [5]
Rarible, an NFT marketplace, rewards its most active creators and consumers with protocol ownership. Mirror is a publishing platform owned by its writers. The Graph, which “does for blockchains what Google does for search”, rewards its network stakeholders (a teeming ecosystem of indexers, curators, and delegators) with ownership as well.
The next wave of aggregators is owned by its stakeholders, and has the potential to create superior stakeholder experiences as a result. A decentralized search engine could be owned by searchers and publishers, who could thus prioritize their interests over those of advertisers or some detached corporate entity.
And on the topic of user interests: users are increasingly demanding search privacy. Two months ago, privacy-focused search engine DuckDuckGo hit an all-time high in queries per day. And just this week, privacy-focused browser Brave announced it was launching its own privacy-first search engine.
Also this week, Google announced that it would stop tracking individuals to sell ads, a response to regulatory scrutiny and continued critiques from data privacy advocates.
While the announcement at first seemed to mark a shift in alignment towards users, many were quick to point out that Google had simply found a more effective, and arguably more dangerous, way to sell advertising based on your movements across the web. In other words: no matter the PR spin, the interests of the corporate shareholders will always win out over those of the stakeholder.
For the stakeholder-owners of a decentralized search engine, enforcing a privacy-first experience would be a no-brainer.
Lastly, usefulness. Google Search is, of course, an extraordinarily useful product. But there is an increasing number of search use cases where Google falls short. If you’ve been in crypto for a while, you probably rely primarily on Twitter to get the day’s news, and on a mix of Twitter, Medium, Substack, and podcasts to learn about new technologies. You probably rarely use Google Search or News, and when you do, you’re disappointed.
There is so much high-quality content in crypto, so why is it so hard to find? This paradox has frustrated me for years. It would be much easier to do a single search than to spend hours weeding through various social media platforms.
There are probably a few reasons why Google Search for crypto is so bad. I believe they revolve around the fact that most content within crypto is user-generated. What’s interesting here is that most “user” publishers have some level of expertise: they are the developers who invented the technology, the engineers building upon it, or the researchers and investors who have thought deeply about it.
So when I’m searching for information in this space, I’m much less interested in asking “what is this thing?” than I am in asking “what do the people who know a lot about this thing think about it?” I want to read what Vitalik Buterin has recently proposed regarding Ethereum scalability, not rote definitions of Layer 2 scaling solutions. Google is extraordinarily good at answering the “what is this thing?” question. It’s less good at answering the “what do the people who know about the thing think about it?” question. Why?
First off, user-generated content (UGC) is unlikely to be optimized for search engines, so it’s less likely to rank highly in search results.
Second, the quality of UGC is measured in the same way as all other content when being ranked by Google. [6, 7] But the way we, the searchers, assess the quality of UGC is fundamentally different from how we assess the quality of a static website. The proxies for quality of UGC are follower count, likes, and shares. We’ll also likely consider the user’s proximity to our social circle. These metrics are not prioritized by Google’s ranking algorithm.
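As a rough illustration of those proxies, the following Python sketch scores a post from follower count, likes, shares, and social proximity. Everything in it is an assumption made for this example: the `Post` fields, the weights, and the logarithmic damping are invented, and none of it describes how Google or any social platform actually ranks content.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Post:
    author: str
    followers: int
    likes: int
    shares: int
    # Hypothetical social distance: 1 means the searcher follows the author
    # directly, 2 means a follow-of-a-follow, and so on.
    hops_from_searcher: int


def ugc_score(post: Post) -> float:
    """Combine audience, engagement, and social proximity into one score."""
    # log1p damps raw counts so that very large accounts do not swamp
    # everything else, and it handles zero counts safely.
    reach = math.log1p(post.followers)
    engagement = math.log1p(post.likes) + 2.0 * math.log1p(post.shares)
    # Closer authors get a larger multiplier; the weights are arbitrary.
    proximity = 1.0 / max(post.hops_from_searcher, 1)
    return (0.3 * reach + 0.5 * engagement) * (1.0 + proximity)


def rank_posts(posts: List[Post]) -> List[Post]:
    """Order posts from highest to lowest score."""
    return sorted(posts, key=ugc_score, reverse=True)


if __name__ == "__main__":
    posts = [
        Post("stranger", followers=200, likes=3, shares=0, hops_from_searcher=4),
        Post("expert", followers=50_000, likes=1_200, shares=300, hops_from_searcher=2),
    ]
    for post in rank_posts(posts):
        print(post.author, round(ugc_score(post), 2))
```

The point of the sketch is only that these signals are simple to combine once they are available. The harder problem is that they usually live inside each platform's own silo.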
Perhaps most importantly, Google is not necessarily incentivized to optimize ranking for UGC. Ranking a Tweet more highly will funnel more web traffic to Twitter, a direct competitor to Google in the market for ad spend.
Crypto is only one space in which this paradox (an abundance of high-quality UGC paired with a poor search experience) exists. More entrepreneurs and investors are open-sourcing their learnings and theses. [8] Writers, journalists, and creators are increasingly defecting from established media outlets to go out on their own or join loosely structured cooperatives. As we move closer to a decentralized media landscape, this issue will continue to grow.
I am hopeful that emerging primitives for web3 will allow us to better measure and rank the quality of content in response to a search. A creator’s identity (whether mapped to their real-world identity or not) is, for instance, always tied to what they produce. Web3-based content and its associated metadata are open and portable. This data can be read by anyone, not just those sitting within the silos of the corporate social media conglomerates.
The data might be more effectively scrutinized and ranked in response to a query. There are already protocols (The Graph, for instance) which have successfully incentivized the provision of human expertise on certain topics (protocol data) for the benefit of a larger audience. Competition is inherent - anyone can build a better subgraph if they see the opportunity to do so. I’m excited for the next iteration of this model, in which experts are incentivized to curate content on a topic for the benefit of a large, information-hungry audience.
In that paper, Brin and Page titled the appendix discussing ad-based business models “Advertising and Mixed Motives”. The beauty of stakeholder-ownership is that it clarifies motives. It puts stakeholders in control, and spotlights their interests. Had they been born 20 years later, the Stanford PhD students who condemned ads might have executed their vision upon web3 rails. While I don’t doubt Google will be around for quite a while, I think it’s time it faced some competition.
[1] Wikipedia. Google.
[2] AVC. Disaggregated Media (continued) - The Rise of the Ad Networks. Fred Wilson.
[3] Wall Street Journal. Google to Stop Selling Ads Based on Your Specific Web Browsing. Keach Hagey and Sam Schechner.
[4] Stanford University. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Sergey Brin and Larry Page.
[5] The Ownership Economy, written by Jesse Walden at Variant, is an excellent articulation of this idea.
[6] Google. How Search Algorithms Work.
[7] Search Engine Journal. Google Doesn’t Treat User Generated Content Different From Main Content. Matt Southern.
[8] ARK Invest has, for instance, open-sourced its financial models for Square and Tesla on GitHub.