AI Governance • NLP Audit 2025
MIT Policy Hackathon 2025

DocScope Copilot

An automated auditing tool for AI documentation governance and policy analysis

The Crisis: Voluntary AI documentation standards are failing. My analysis of 22 major AI model cards (including GPT-4o) revealed a systematic lack of substantive governance data.

The Solution: DocScope Copilot is an automated auditing tool that processes technical documentation to reveal the gap between "framework recommendations" and "actual industry practice."

Key Audit Metrics

22 Documents Audited
1,303 Text Chunks NLP-Scored
81% Disclosure Gap
0.537 Avg. Quality Score
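For illustration, here is a minimal sketch of how a disclosure gap of this kind could be computed. It is not the DocScope implementation: the field names, trigger phrases and the blank-line chunking rule are all hypothetical, and simple phrase matching stands in for the NLP scoring.

```python
# Hypothetical governance checklist: field name -> phrases that count as disclosure.
# The real DocScope schema and scoring method are not described on this page.
REQUIRED_FIELDS = {
    "training_data": ("training data", "dataset"),
    "evaluation": ("benchmark", "evaluation"),
    "limitations": ("limitation", "known issue"),
    "intended_use": ("intended use", "out-of-scope"),
}


def split_into_chunks(text):
    """Split a document into paragraph-level chunks on blank lines."""
    return [chunk.strip() for chunk in text.split("\n\n") if chunk.strip()]


def disclosed_fields(chunks):
    """Return the checklist fields mentioned in at least one chunk."""
    found = set()
    for chunk in chunks:
        lowered = chunk.lower()
        for field, phrases in REQUIRED_FIELDS.items():
            if any(phrase in lowered for phrase in phrases):
                found.add(field)
    return found


def disclosure_gap(text):
    """Share of checklist fields the document never mentions (0.0 to 1.0)."""
    found = disclosed_fields(split_into_chunks(text))
    return (len(REQUIRED_FIELDS) - len(found)) / len(REQUIRED_FIELDS)


card = "Model overview.\n\nIntended use: chat assistants."
print(disclosure_gap(card))  # 0.75
```

Averaging `disclosure_gap` over a set of model cards would give a single headline percentage, under the assumption that every field is weighted equally.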

Tech Stack

🐍 Python 💬 NLP (spaCy/NLTK) 📋 JSON Schema 📝 Markdown Parsing 🔍 Automated Auditing
Personal Data • Cinema Analytics 2024–2025

CineScope: Personal Cinema Analytics

Years of curated watchlists turned into a full analytics pipeline for my own viewing history

CineScope takes my personal film collection (IMDb exports, local databases and hand-curated lists) and treats it like a research dataset. I built a pipeline that enriches every title with external sources, then visualises my patterns: which eras I over-index on, who I really watch, and where the gaps in my collection are.
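As a rough illustration of the "which eras" question, the sketch below groups a watchlist by decade and reports each decade's share. The four sample titles and the `decade_shares` helper are stand-ins chosen for the example, not the actual CineScope code or data.

```python
from collections import Counter


def decade_shares(films):
    """Map each decade to its share of the collection.

    films: iterable of (title, release_year) pairs.
    """
    decades = Counter((year // 10) * 10 for _, year in films)
    total = sum(decades.values())
    return {decade: count / total for decade, count in sorted(decades.items())}


# Tiny illustrative sample, not the real collection.
watchlist = [
    ("Seven Samurai", 1954),
    ("Vertigo", 1958),
    ("Pulp Fiction", 1994),
    ("Parasite", 2019),
]
print(decade_shares(watchlist))  # {1950: 0.5, 1990: 0.25, 2010: 0.25}
```

Deciding whether a decade is over-represented would also need a baseline, such as the decade distribution of all released films, which is left out here.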

The Dataset

2,000+ Films Tracked
5+ External Sources
30+ Visual Dashboards

Tech Stack

🐍 Python 🐼 Pandas 📊 Matplotlib 🗄️ SQLite 🔗 APIs (TMDb, OMDb, Wikidata)
Machine Learning • Data Mining 2024–2025

Cinema Through Data: Cultural Analysis (1915-1960)

Large-scale cultural data mining meets film history: 140,000+ movies, one massive pipeline

What can 140,000+ old movies tell you about culture, representation, and power dynamics? I built a full-stack data product to find out, from scraping IMDb datasets to training ML models on them.

The Scale

140,000+ Titles Processed
77,000 Graph Nodes
313,000 Graph Edges
98.71% Model Accuracy
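This page does not say what the graph's nodes and edges represent. Purely as an assumption for illustration, the sketch below treats people as nodes and links two people when they are credited on the same film. It uses plain dictionaries rather than NetworkX so it runs with the standard library alone, and the film and person names are placeholders.

```python
from collections import defaultdict
from itertools import combinations


def build_cocredit_graph(credits):
    """Build an undirected graph as an adjacency mapping.

    credits: mapping of film title -> list of credited people.
    Assumption: an edge means "appeared in the credits of the same film".
    """
    graph = defaultdict(set)
    for people in credits.values():
        unique_people = sorted(set(people))
        for person in unique_people:
            graph[person]  # touch the key so solo credits still become nodes
        for a, b in combinations(unique_people, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def graph_size(graph):
    """Return (node_count, edge_count) for an undirected adjacency mapping."""
    nodes = len(graph)
    edges = sum(len(neighbours) for neighbours in graph.values()) // 2
    return nodes, edges


credits = {
    "Film A": ["P1", "P2", "P3"],
    "Film B": ["P2", "P4"],
}
print(graph_size(build_cocredit_graph(credits)))  # (4, 4)
```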

Tech Stack

🐍 Python 🕸️ NetworkX 🚀 LightGBM 📱 Streamlit 🧠 TensorFlow
NLP & Infrastructure Summer 2025

AI Chatbot for Inter-American Development Bank

Enterprise-grade semantic search at scale

Built an AI-powered chatbot for the IDB's Office of the Secretary that goes beyond keyword search: it uses semantic search to match the meaning of a question to the right documents.
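The core idea behind semantic search can be sketched as ranking documents by the similarity of their embedding vectors to the query's vector. The example below is a toy version under stated assumptions: the three-dimensional vectors and document names are made up, and a brute-force cosine ranking stands in for the vector index (FAISS in the listed stack) and the embedding model a production system would use.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical embeddings; real ones would come from an embedding model.
documents = {
    "minutes": [0.9, 0.1, 0.0],
    "budget": [0.1, 0.9, 0.0],
    "agenda": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], documents))  # ['minutes', 'agenda']
```

Because ranking depends on vector direction rather than shared words, a query can match a document that uses entirely different vocabulary, which is what separates this approach from keyword search.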

Tech Stack

🐍 Python 💻 C# ☁️ Azure OpenAI 🔍 FAISS 📄 PyPDF2
Information Retrieval 2024

Search Engine Performance Analysis

Making search suck less through systematic evaluation

Built a framework to measure exactly how search engines fail to understand complex queries. Created gold-standard evaluation datasets and Python automation for precision/recall calculation.
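A minimal sketch of the precision/recall calculation for a single query, assuming the gold-standard set is a collection of relevant document IDs. The function name and the sample IDs are illustrative, not taken from the project.

```python
def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall for one query.

    retrieved: document IDs returned by the search engine.
    relevant:  document IDs marked relevant in the gold standard.
    """
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
print(precision)         # 0.5
print(round(recall, 3))  # 0.667
```

Here two of the four returned documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.667). Averaging these values across a query set gives a per-engine summary.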

Tech Stack

🐍 Python 🔎 Information Retrieval 📏 Evaluation Metrics

Want to See More?

I'm constantly working on new projects at the intersection of AI fairness, NLP, and ethical tech. Check out my GitHub for the latest code, or get in touch if you want to collaborate.