Leon Yin

Solving puzzles across domains, byte-by-byte.

About

I'm a data scientist at the SMaPP lab at NYU and a research affiliate at Data & Society's Media Manipulation Initiative. I develop methodology and tools to leverage images, links, and text in a meaningful way. Usually for social scientists, but sometimes for fun, profit, or altruism. Previously, I wrote scientific software at NASA and wrangled data for Sony.

Research

Academic pursuits (by progress descending)
   Link Analysis on Social Media
   Detecting Political Polarity in Texts
   Tracing Images In Disinformation Campaigns
For research outputs and past projects...
   Scroll to the next section, fellow bot.

Projects

The Internet Research Agency, Hyperlinks, News, and Marketing Tools

How impactful was "fake news" in foreign information operations during the 2016 U.S. Presidential Election? I analyzed hyperlinks to Junk, National, and Local news sources sent by accounts released by the Senate Intelligence Committee and Twitter's Elections Integrity initiative. My analysis reveals a deliberate use of local news articles to build trust in fake local news accounts, and to highlight key social issues while masquerading as white conservatives and black activists. I also identify marketing tools used to automate, optimize, and measure the reach of the accounts' content.
Read the Report

Three Body Problem Language Model

How can we learn about text from vast quantities of unlabeled documents? Researchers from fast.ai and UW suggest that deep recurrent language models learn useful word representations for an array of NLP tasks. This project was my introduction to PyTorch, with reusable code for pre-processing text, loading data, and initializing, training, and evaluating bi-directional LSTM neural networks.
Jupyter Notebook

Reverse Image Search

A demonstration of a simple, robust, and scalable reverse image search engine that leverages features from convolutional neural networks and the distance returned from the K-nearest neighbors algorithm.
Jupyter Notebook
Presentation at PyData 2017
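
The idea can be sketched in a few lines, assuming the CNN features have already been extracted into plain vectors. This is a toy illustration, not the notebook's code: the feature vectors, the file names, the `nearest_images` helper, and the choice of cosine distance are all assumptions made for the example.

```python
import heapq
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an all-zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def nearest_images(query, index, k=3):
    """Brute-force k-nearest neighbors over {image_id: feature_vector}.

    Returns a list of (distance, image_id) pairs, closest first.
    """
    scored = ((cosine_distance(query, vec), image_id)
              for image_id, vec in index.items())
    return heapq.nsmallest(k, scored)


# Made-up 3-dimensional "features"; real CNN features have hundreds
# or thousands of dimensions.
toy_index = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "truck.jpg": [0.0, 0.1, 0.9],
}

matches = nearest_images([1.0, 0.0, 0.0], toy_index, k=2)
for distance, image_id in matches:
    print(f"{image_id}: {distance:.3f}")
```

The distance doubles as a confidence score: a small distance suggests a near-duplicate, while a large one suggests the closest match is not really a match.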

Fwd: My Great New Friend

What cultural biases do ML algorithms pick up on? I trained a character-level recurrent neural network with one Long Short-Term Memory layer on 2,000 emails from the Enron corpus to finish lines in a love poem. The model picked up corporate culture and rambled endlessly about "the company" and "compensation". In collaboration with Constant Dullaart and Rhizome. Presented at the New Museum for The Making of Natural Language.
Jupyter Notebook
Poem

Are US Legislators Ideologically Polarized?

A time-series visualization of legislator voting history using DW-NOMINATE, a metric of the liberal-conservative spectrum.
Jupyter Notebook
Ideological Polarization of Congress JFK-2014

Who is on the Receiving End of Tax-Payer Dollars?

Government contracts are available to the public on USASpending.gov. In this notebook I show how to download records and aggregate financial data from the largest private prison systems in the US. This may someday become a Twitter bot.
Jupyter Notebook 1 2
Plotly CoreCivic contracts by state

What Research Does the NSF Support (and How Much)?

NSF grants are available to the public and contain rich metadata. For this project, I ingest XML files into SQLite tables to power dashboards and wordclouds. I look into funding history and ongoing projects from several notable oceanographers.
Jupyter Notebook
Plot.ly 1 2
d3.js Network Graph (a welcome mistake)
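
The XML-to-SQLite step might look roughly like the sketch below. The element names (`Award`, `AwardID`, `AwardTitle`, `AwardAmount`), the `awards` table layout, and the sample record are assumptions made for illustration; they are not taken from the notebook.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A made-up award record in an assumed layout, for illustration only.
SAMPLE_XML = """<rootTag>
  <Award>
    <AwardID>0000001</AwardID>
    <AwardTitle>Hypothetical Ocean Mixing Study</AwardTitle>
    <AwardAmount>250000</AwardAmount>
  </Award>
</rootTag>"""


def parse_award(xml_text):
    """Extract (id, title, amount) from one award document."""
    award = ET.fromstring(xml_text).find("Award")
    return (
        award.findtext("AwardID"),
        award.findtext("AwardTitle"),
        int(award.findtext("AwardAmount")),
    )


def load_awards(conn, xml_documents):
    """Create the table if needed and insert one row per document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS awards "
        "(award_id TEXT PRIMARY KEY, title TEXT, amount INTEGER)"
    )
    rows = [parse_award(doc) for doc in xml_documents]
    conn.executemany("INSERT OR REPLACE INTO awards VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # use a file path to persist the table
loaded = load_awards(conn, [SAMPLE_XML])
total = conn.execute("SELECT SUM(amount) FROM awards").fetchone()[0]
print(loaded, total)
```

Once the records are in a table, funding history becomes a matter of SQL aggregates, which is what makes dashboards straightforward to build on top.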



Open Source Software

YouTube-Data-API

Despite having the largest userbase amongst American adults, YouTube is a social media platform that is often overlooked in academic research. youtube-data-api is a Python client to make this data source more accessible, while introducing new applications and methods to analyze this platform.
ReadTheDocs
GitHub Repo
PyPI Page

urlExpander

urlExpander is a Python package for quickly and thoroughly expanding shortened URLs. Marketing and analytics services like bit.ly are great for tracking engagement. However, these services obfuscate the destination of URLs for social media analysts.
Jupyter Notebook Quickstart
GitHub Repo
PyPI Page
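
To show what "expanding" means, here is a minimal sketch of following a redirect chain with a hop limit and loop detection. It is not urlExpander's API: `is_short`, `expand`, the `next_hop` callable, and the example URLs are all hypothetical, and the redirects are simulated with a dictionary so the example runs offline.

```python
from urllib.parse import urlsplit

# Only bit.ly is named in the text above; extend this set as needed.
KNOWN_SHORTENERS = {"bit.ly"}


def is_short(url, shorteners=KNOWN_SHORTENERS):
    """Return True if the URL's domain is a known link shortener."""
    return urlsplit(url).netloc.lower() in shorteners


def expand(url, next_hop, max_hops=10):
    """Follow redirects until they stop, loop, or exceed max_hops.

    next_hop(url) returns the redirect target, or None if there is none.
    """
    seen = {url}
    for _ in range(max_hops):
        target = next_hop(url)
        if target is None or target in seen:
            break  # reached the destination, or detected a redirect loop
        seen.add(target)
        url = target
    return url


# Simulated redirects standing in for real HTTP responses.
fake_redirects = {
    "https://bit.ly/abc": "https://example.com/a",
    "https://example.com/a": "https://example.com/article",
}

expanded = expand("https://bit.ly/abc", fake_redirects.get)
print(is_short("https://bit.ly/abc"), expanded)
```

In practice `next_hop` would issue an HTTP request and read the `Location` header of a redirect response; keeping it as a parameter makes the chain-following logic easy to test without a network.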

S3 Helper

A high-level Python wrapper around the AWS CLI to smooth workflows with private data stored on S3 cloud storage. This Jupyter notebook showcases the module's ability to stream CSV and JSON files to pandas DataFrames, and save scikit-learn models to S3 buckets.
Jupyter Notebook Tutorial
GitHub Repo
PyPI Page



Data Pipes and Web Scrapers

Coming soon!

  • Local News Dataset
  • YouTube Data API
  • Red Hen Yelp Campaign
  • Stormfront
  • And so much more!



  • Technology adopts historical mistakes and bias. If any of the projects have room to improve, please let me know via email or an issue on GitHub :)
    The next section contains a JavaScript app that cycles through a collection of quotes I like.

    Get in Touch

    @leonyin
    LeonLovesSpam@gmail.com

    Especially if you're interested in:

    1. The spread of images and memes across platforms.
    2. Machine learning for social science.
    3. Mis/disinformation on the web.


    Stay Updated

    I occasionally write about the burts and the bees.