Leon Yin is an award-winning data journalist at Bloomberg. He builds datasets and develops methods to investigate the impacts of technology. He writes Inspect Element, a practitioner's guide to auditing algorithms. His work has been cited by legislators, the academy, and popular media. In 2023, "Still Loading" was recognized by IRE with a Philip Meyer Award for the best use of social science research methods in journalism. Leon got his start in news at The Markup, and his start in research writing Fortran scripts at NASA.
Still Loading (2022)
Aaron Sankin and I found that four ISPs charged the same price for drastically different internet speeds, depending on where you live.
In cities across the U.S., neighborhoods that were historically redlined, lower-income, and had the highest concentration of people of color were disproportionately asked to overpay for slow speeds.
Building on a technique by Princeton researchers, I found and used undocumented APIs to collect more than 1 million internet plans. I merged socioeconomic data from the U.S. Census and digitized redlining maps. We wrote a story recipe to guide reporters to localized data, and a "Build Your Own Dataset" guide for others to reproduce our study with little-to-no coding.
The project was honored with Scripps Howard, SIGMA, NABJ, ONA, SABEW, and Philip Meyer awards.
Amazon Brands and Exclusives (2021)
My co-author Adrianne Jeffries and I found Amazon gave its own branded products an advantage over better-rated competitors in search results.
I trained a random forest to predict which product Amazon placed on top of thousands of popular searches.
The investigation was cited by the House antitrust committee in a letter to Amazon, and received a Gerald Loeb Award for personal finance and consumer reporting in 2022.
I was lead engineer on Amazon Brand Detector, a Firefox and Chrome extension that helps shoppers spot Amazon-branded products.
YouTube's Keyword Blocklist for Ad Targeting (2021)
I found an undocumented API and worked with civil rights groups to audit YouTube's in-house brand safety tools.
My co-author Aaron Sankin and I found racial justice phrases like Black Lives Matter were blocked, while hate terms like White Lives Matter were not.
In fact, we found that YouTube only blocked one-third of well-known hate terms from advertisers. Removing spaces from phrases circumvented the block in almost every instance.
In 2022, the series was part of a portfolio honored by NABJ for best practices reporting on algorithmic bias.
Counting pixels on Google Search (2020)
I developed a staining technique to audit Google search results.
My reporting partner Adrianne Jeffries and I developed a categorization scheme for all the things found on Google Search.
We found Google's own products and answers covered 41% of the first page.
Our research was cited in the congressional subcommittee hearing on Big Tech and antitrust.
In 2021, "Google the Giant" was a finalist for a Gerald Loeb Award in explanatory journalism.
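The pixel-counting idea can be illustrated with a minimal sketch. This is not the code used in the investigation: the function name, the category labels, and the page dimensions are all hypothetical, and it assumes each element on the page has already been measured and labeled, with no overlapping elements.

```python
from collections import defaultdict


def area_share(elements, page_width, page_height):
    """Return the fraction of the page area covered by each category.

    `elements` is a list of (category, width, height) tuples in pixels.
    Assumes the elements do not overlap, so areas can simply be summed.
    """
    page_area = page_width * page_height
    totals = defaultdict(int)
    for category, width, height in elements:
        totals[category] += width * height
    return {category: area / page_area for category, area in totals.items()}


# Hypothetical page: 400 x 1000 pixels with three measured elements.
elements = [
    ("google", 400, 300),
    ("non-google", 400, 500),
    ("google", 400, 100),
]
shares = area_share(elements, page_width=400, page_height=1000)
print(shares)
```

Area left unlabeled (here, 10% of the page) simply does not appear in the result, so the shares need not sum to one.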
Citizen Browser (2021)
I contributed to an ambitious project to distribute a privacy-preserving app to collect Facebook data from a national panel. I built the redaction system and data pipelines alongside Micha Gorelick. Alfred Ng and I debunked Facebook's promise to stop Political Group recommendations. Our story was cited by Senator Ed Markey, who demanded answers from Facebook for its broken commitments. The project received an Edward R. Murrow Award in Innovation.
Google Keyword Planner (2020)
Building on Safiya Noble's book "Algorithms of Oppression: How Search Engines Reinforce Racism" and Latanya Sweeney's "Discrimination in Online Ad Delivery", I developed an audit of Google Ads' Keyword Planner with my reporting partner Aaron Sankin. We found hundreds of pornographic keyword suggestions for Black, Latina, and Asian girls, but no results whatsoever for "White girls". The story was featured in a NOVA documentary.
The Internet Research Agency: Hyperlinks, News, and Marketing Tools (2018)
How impactful was "fake news" in foreign info ops during the 2016 U.S. Presidential Election? I analyzed hyperlinks to Junk, National, and Local news sources sent by accounts released by the Senate Intelligence Committee and Twitter's Elections Integrity initiative. My analysis reveals the surprising role of local news, group identity, and free marketing tools in info ops.
Disinfo Doppler (2018)
An open-source computer vision toolkit used to trace and measure image-based activity online. Designed to assist evidence-based reporting and reduce vicarious trauma when working in ephemeral spaces rife with coordinated hoaxes, harassment campaigns, and racist propaganda.
Reverse Image Search (2017)
A demonstration of a simple, robust, and scalable reverse image search engine that leverages features from convolutional neural networks and the distance returned from the K-nearest neighbors algorithm.
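A minimal sketch of the nearest-neighbor step, not the project's actual code: it assumes each image has already been turned into a feature vector by a convolutional neural network, and it uses brute-force cosine distance as one possible distance metric. The file names and three-dimensional vectors are made up for illustration; real CNN features typically have hundreds or thousands of dimensions.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_images(query, index, k=2):
    """Return the k (distance, name) pairs closest to the query vector."""
    scored = [(cosine_distance(query, vec), name) for name, vec in index.items()]
    return sorted(scored)[:k]


# Hypothetical index of precomputed feature vectors, keyed by file name.
index = {
    "cat_original.jpg": [0.9, 0.1, 0.0],
    "cat_cropped.jpg": [0.8, 0.2, 0.1],
    "skyline.jpg": [0.0, 0.2, 0.9],
}

matches = nearest_images([0.9, 0.1, 0.05], index, k=2)
for distance, name in matches:
    print(f"{name}: {distance:.4f}")
```

A small distance suggests a near-duplicate, while a large one suggests an unrelated image, so the distance itself can serve as a match threshold. At scale, the brute-force scan would be replaced by an indexed nearest-neighbor search.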
United States Place Sampler
Need a random sample of U.S. addresses? Partnered with Big Local News to simplify that process.
Now it's easier than ever to get receipts from streets to test for disparate outcomes.
Despite having the largest userbase amongst American adults, YouTube is a social media platform that is often overlooked in academic research. youtube-data-api is a Python client to make this data source more accessible, while introducing new applications and methods to analyze this platform.
urlExpander is a Python package for quickly and thoroughly expanding shortened URLs. Marketing and analytics services like bit.ly are great for tracking engagement. However, these services obfuscate the destination of URLs for social media analysts.
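To show what expanding a shortened URL involves, here is a minimal sketch using only the Python standard library. It is not urlExpander's API: `expand_url`, `http_redirect`, and the example domains are hypothetical names chosen for illustration. The sketch follows redirects one hop at a time until a page stops redirecting.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def http_redirect(url, timeout=10):
    """Return the URL that `url` redirects to, or None if it does not redirect.

    Network failures (urllib.error.URLError) are left to propagate.
    """
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout):
            return None  # 2xx response: this is the destination
    except urllib.error.HTTPError as err:
        location = err.headers.get("Location")
        if err.code in (301, 302, 303, 307, 308) and location:
            return urljoin(url, location)  # resolve relative redirects
        return None


def expand_url(url, get_redirect=http_redirect, max_hops=10):
    """Follow a chain of redirects and return the last URL reached."""
    seen = {url}
    for _ in range(max_hops):
        next_url = get_redirect(url)
        if next_url is None or next_url in seen:  # done, or a redirect loop
            break
        seen.add(next_url)
        url = next_url
    return url


# Offline usage example with a made-up redirect table instead of the network.
fake_redirects = {
    "https://short.example/a": "https://mid.example/b",
    "https://mid.example/b": "https://news.example/story",
}
expanded = expand_url("https://short.example/a", get_redirect=fake_redirects.get)
print(expanded)
```

Passing the redirect lookup in as a function keeps the chain-following logic testable without network access. Some link shorteners reject HEAD requests, so a real tool would likely need a GET fallback.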
If any of the projects are dated or contain inaccuracies, please let me know via email or an issue on GitHub :)