I use data science to make sense of how policies, firms, and the environment interact. My work involves building and analysing large text, spatial, and unit-level datasets, employing tools from machine learning, NLP, geospatial analysis, and causal inference. Here are some of the things I’ve done.


Public Goods

  • CAG Scraper. The website of the Comptroller and Auditor General of India (CAG) hosts an exhaustive archive of the audit reports the body has produced. This is an excellent source of text data, now easily accessible with this scraper. GitHub

  • IGROdisha Land Values. I built a large high-resolution geocoded dataset (over 64,000,000 plots) of land market values for several states. Here’s the open-sourced code for Odisha. GitHub

  • Longview: Stata package for visualizing cumulative treatment effects. It plots cumulative treatment effects from event-study and difference-in-differences designs. GitHub

  • LaTeX templates for Master’s theses at MSE. GitHub GitHub


Projects

  • Deep Learning Pipelines for Judicial Text Classification

I built deep learning and pre-processing pipelines for a unique judicial text classification problem at XKDR Forum. My work involved figuring out how the pipeline would scale and comparing its cost against using LLMs for the same use case.

  • No Bridge Too Far: Unique Geospatial Bridges Database

I built a novel dataset of over 2,200 Indian bridges with geospatial and temporal markers using Google Earth, OpenStreetMap, and QGIS. The dataset supports a difference-in-differences study of the economic impact of bridge-building, using night-lights data from the SHRUG database as a proxy for economic growth.

  • Tracking the Tradeoff: High-Resolution Remote Sensing Dataset

I used NASA and ESA satellite data to construct a novel high-resolution (village/census town-level) panel dataset of economic and environmental indicators for the states of Uttarakhand and Himachal Pradesh over twenty years. Most of this involved converting raw raster data on night lights, built-up area, and land surface temperature into tabular indicators for each village and census town using R and QGIS. This was part of my work investigating the economy-ecology trade-off; a working draft is here.

  • In Progress

I scraped over 10,000,000 Extraordinary issues of the Gazette of India (used for communicating matters of special importance to the government) from different archives, and I am now using natural language processing to parse this important data source.