Leon Yin

Computer science is sudo science.


I'm an investigative data journalist at The Markup, a non-profit newsroom specializing in data, privacy, and technology. I gather datasets, run experiments, and build tools to investigate social and technical systems. Before joining The Markup, I was on a similar beat as a research scientist working alongside political scientists at NYU and STS scholars at Data & Society. I started my career writing Fortran software to analyze oceanographic data at NASA after receiving my BS from NYU in 2015.


The Internet Research Agency, Hyperlinks, News, and Marketing Tools

How impactful was "fake news" in foreign info ops during the 2016 U.S. Presidential Election? I analyzed hyperlinks to Junk, National, and Local news sources sent by accounts released by the Senate Intelligence Committee and Twitter's Elections Integrity initiative. My analysis reveals the surprising role of local news, group identity, and free marketing tools in info ops.
Read the Report
Identifying Local News Outlets

Language Models

How can we use text from vast quantities of unlabeled documents? Researchers from fast.ai and UW suggest that deep recurrent language models learn useful word representations for an array of NLP tasks. This project was my intro to PyTorch, with reusable code for pre-processing text, loading data, and initializing, training, and evaluating bidirectional LSTM neural networks.
Jupyter Notebook

Reverse Image Search

A demonstration of a simple, robust, and scalable reverse image search engine that leverages features from convolutional neural networks and the distance returned from the K-nearest neighbors algorithm.
Jupyter Notebook
Presentation at PyData 2017
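
A minimal sketch of the retrieval step described above, not the notebook's code. It assumes each image has already been reduced to a feature vector by a convolutional network; the three-dimensional vectors, the file names, and the choice of cosine distance are all hypothetical, and a brute-force scan stands in for a library K-nearest neighbors implementation.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_images(query, index, k=2):
    """Return the k (distance, image_id) pairs closest to the query vector."""
    scored = [(cosine_distance(query, vec), image_id)
              for image_id, vec in index.items()]
    return sorted(scored)[:k]


# Hypothetical features; real CNN features have hundreds or thousands of dimensions.
index = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "car_1.jpg": [0.0, 0.1, 0.9],
}

results = nearest_images([1.0, 0.0, 0.0], index, k=2)
for distance, image_id in results:
    print(f"{image_id}: {distance:.3f}")
```

The returned distance doubles as a confidence signal: a small distance suggests a near-duplicate, while a large one suggests the index holds nothing similar.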

Fwd: My Great New Friend

What cultural biases do ML algorithms pick up on? I trained a character-level recurrent neural network with one Long Short-Term Memory layer on 2,000 emails from the Enron corpus to finish lines in a love poem. The model picked up corporate culture and rambled endlessly about "the company" and "compensation". In collaboration with Constant Dullaart and Rhizome. Presented at the New Museum for The Making of Natural Language.
Jupyter Notebook

Are US Legislators Ideologically Polarized?

A time-series visualization of legislator voting history using DW-NOMINATE, a metric of the liberal-conservative spectrum.
Jupyter Notebook
Ideological Polarization of Congress JFK-2014

Who is on the Receiving End of Taxpayer Dollars?

Government contracts are available to the public on USASpending.gov. In this notebook I show how to download records and aggregate financial data from the largest private prison systems in the US. This may someday become a Twitter bot.
Jupyter Notebook 1 2 3
Plotly CoreCivic contracts by state

What Research Does the NSF Support (and How Much)?

NSF grants are available to the public and contain rich metadata. For this project, I ingest XML files into SQLite tables to power dashboards and word clouds. I look into the funding history and ongoing projects of several notable oceanographers.
Jupyter Notebook
Plot.ly 1 2
d3.js Network Graph (a welcome mistake)
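
A minimal sketch of the XML-to-SQLite step, using only the Python standard library. The element names (`Award`, `AwardTitle`, `AwardAmount`, `Investigator`), the table layout, and the sample records are made up for illustration; they are not taken from the NSF files or from the notebook.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up sample data with an assumed layout, for illustration only.
SAMPLE_XML = """
<awards>
  <Award>
    <AwardTitle>Hypothetical Ocean Mixing Study</AwardTitle>
    <AwardAmount>250000</AwardAmount>
    <Investigator>Jane Doe</Investigator>
  </Award>
  <Award>
    <AwardTitle>Hypothetical Coastal Sensor Array</AwardTitle>
    <AwardAmount>100000</AwardAmount>
    <Investigator>Jane Doe</Investigator>
  </Award>
  <Award>
    <AwardTitle>Hypothetical Tide Model</AwardTitle>
    <AwardAmount>50000</AwardAmount>
    <Investigator>John Roe</Investigator>
  </Award>
</awards>
"""


def load_awards(xml_text, conn):
    """Parse award records from XML and insert them into an `awards` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS awards "
        "(title TEXT, amount INTEGER, investigator TEXT)"
    )
    root = ET.fromstring(xml_text)
    rows = [
        (
            award.findtext("AwardTitle"),
            int(award.findtext("AwardAmount")),
            award.findtext("Investigator"),
        )
        for award in root.iter("Award")
    ]
    conn.executemany("INSERT INTO awards VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database for the example
inserted = load_awards(SAMPLE_XML, conn)

# Total funding per investigator, the kind of query a dashboard might run.
totals = conn.execute(
    "SELECT investigator, SUM(amount) FROM awards "
    "GROUP BY investigator ORDER BY investigator"
).fetchall()
print(inserted, totals)
```

Once the records sit in a table, funding history per investigator or per year becomes a single `GROUP BY` query rather than a pass over every XML file.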

Open Source Software


Despite having the largest userbase amongst American adults, YouTube is a social media platform that is often overlooked in academic research. youtube-data-api is a Python client to make this data source more accessible, while introducing new applications and methods to analyze this platform.
GitHub Repo
PyPI Page


urlExpander is a Python package for quickly and thoroughly expanding shortened URLs. Marketing and analytics services like bit.ly are great for tracking engagement. However, these services obfuscate the destination of URLs for social media analysts.
Jupyter Notebook Quickstart
GitHub Repo
PyPI Page
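
A minimal sketch of the idea behind URL expansion, not urlExpander's API. Each hop of a shortened link redirects to another URL until the destination is reached. Here `expand_url`, `fetch_location`, and the redirect table are hypothetical; in real use `fetch_location` would issue an HTTP request and return the redirect target, or `None` when there is none.

```python
from urllib.parse import urlparse


def expand_url(url, fetch_location, max_hops=10):
    """Follow redirects until none remain, a loop appears, or max_hops is hit."""
    seen = set()
    for _ in range(max_hops):
        if url in seen:
            break  # redirect loop: stop at the repeated URL
        seen.add(url)
        next_url = fetch_location(url)
        if next_url is None:
            break  # no further redirect: this is the destination
        url = next_url
    return url


def domain(url):
    """Return the lowercased network location of a URL."""
    return urlparse(url).netloc.lower()


# Hypothetical redirect chain standing in for live HTTP responses.
FAKE_REDIRECTS = {
    "https://short.example/abc": "https://tracker.example/r?id=1",
    "https://tracker.example/r?id=1": "https://news.example.org/story",
}

final_url = expand_url("https://short.example/abc", FAKE_REDIRECTS.get)
print(final_url, domain(final_url))
```

Passing the lookup in as a function keeps the redirect-following logic testable without network access, and the `seen` set guards against links that redirect in a circle.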

S3 Helper

A high-level Python wrapper around the AWS CLI to smooth workflows with private data stored on S3 cloud storage. This Jupyter notebook showcases the module's ability to stream CSV and JSON files to pandas DataFrames and save scikit-learn models to S3 buckets.
Jupyter Notebook Tutorial
GitHub Repo
PyPI Page

Data Pipes and Web Scrapers

Coming soon!

  • Local News Dataset
  • Linktree
  • Much more!

  • Software and analyses adopt historical mistakes and bias. If any of the projects are dated or contain inaccuracies, please let me know via email or an issue on GitHub :)



    Get in Touch

    hello [at] {this-domain}

    Especially if you're interested in:

    1. 🙈 Undocumented APIs and auditing algorithms.
    2. 🙊 {"metadata" : ["hidden-in-plain-sight.js"]}.
    3. 🙉 Methods of studying the information ecosystem.