Personal Data • Cinema • ML Pipeline
CineScope: Personal Cinema Analytics
Years of curated watchlists and a carefully built movie collection, transformed into a data project to understand my own cinematic patterns.
I have been exporting IMDb lists, maintaining spreadsheets, and building databases of my films for years. CineScope is where I turned that private obsession into a structured analytics pipeline: who I watch, which stories I return to, how my taste evolves across decades, and where the hidden gaps in my collection are.
From Lifelong Lists to a Data Project
CineScope is built on top of years of curated lists: personal watch history, IMDb exports, local collection databases, and notes about what I love. I pulled all of this together, enriched it with external sources (IMDb, TMDb, OMDb, DoesTheDogDie, Wikidata), and created a pipeline that treats my own film life like a research dataset.
Instead of asking "what should I watch?", this project asks deeper questions: What do I actually watch? Where are my biases? Which genres and eras define my taste? Who shows up over and over again when I am not paying attention?
1. Single-Genre vs Hybrid Films
One of the most revealing patterns in my collection: the overwhelming preference for genre-blending over purity.
Key Finding: 91.7% Hybrid Preference
Only 189 films (8.3%) in my collection are "pure" single-genre entries. The remaining ~92% combine two or more genres.
Pure Drama
The Shawshank Redemption, Bicycle Thieves, All About Eve, Magnolia, Rebel Without a Cause
Pure Comedy
Bringing Up Baby, One Two Three, The Birdcage, Ferris Bueller's Day Off, Death at a Funeral
Pure Thriller
Wait Until Dark
Distribution Peak: 2-3 Genres
The bar chart shows the peak at 2-3 genres (~750 films with 2 genres, 800+ with 3). This represents my comfort zone:
- 2 genres: Forrest Gump (Drama/Romance), V for Vendetta (Action/Sci-Fi)
- 3 genres: Se7en (Crime/Drama/Mystery), Gremlins (Comedy/Fantasy/Horror)
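The hybrid share and the 2-3 genre peak both come from counting genre tags per film. Below is a minimal sketch. It assumes each film carries one merged genre string with "/" separators, which is an assumption because the page does not show the real schema. The genre strings are taken from the examples on this page, but the five rows are toy data, not the collection.

```python
from collections import Counter

# Toy rows (title, merged genre string). The "/" separator is an assumption
# that mirrors how this page writes genres, not the pipeline's real schema.
FILMS = [
    ("Wait Until Dark", "Thriller"),
    ("Forrest Gump", "Drama/Romance"),
    ("Se7en", "Crime/Drama/Mystery"),
    ("Gremlins", "Comedy/Fantasy/Horror"),
    ("The VelociPastor", "Action/Adventure/Comedy/Fantasy/Horror/Sci-Fi"),
]


def genre_count_distribution(rows):
    """Map 'number of genres on a film' -> 'number of films'."""
    dist = Counter()
    for _title, genre_string in rows:
        # A set drops duplicate tags contributed by different sources.
        tags = {g.strip() for g in genre_string.split("/") if g.strip()}
        dist[len(tags)] += 1
    return dist


def single_genre_share(dist):
    """Fraction of films tagged with exactly one genre."""
    total = sum(dist.values())
    return dist[1] / total if total else 0.0


dist = genre_count_distribution(FILMS)
print(sorted(dist.items()))                    # [(1, 1), (2, 1), (3, 2), (6, 1)]
print(f"{single_genre_share(dist):.1%} pure")  # 20.0% pure
```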
Long Tail: 5-8 Genre Films
The collection includes several "genre soup" films where marketing applies every possible tag:
- Xuxa Abracadabra: Action/Adventure/Family/Fantasy/Romance/Sci-Fi
- The VelociPastor: Action/Adventure/Comedy/Fantasy/Horror/Sci-Fi
- What Happened to Monday: Action/Crime/Fantasy/Mystery/Sci-Fi/Thriller
- Annihilation: Adventure/Drama/Horror/Mystery/Sci-Fi
Interpretation: This distribution suggests a preference for films that mix tones - funny but sad, romantic but tense, scary but playful. Pure single-genre films are the exception rather than the rule.
2. Genre Evolution Across Decades
A stacked-area timeline showing how different genres rise and fall throughout cinema history as reflected in my viewing patterns.
Temporal Genre Distribution
Classic Era (1930s-1950s): Drama & Romance Rule
Drama, Romance, and Comedy form the dominant stripes. Horror, Action, and Sci-Fi remain small.
- 1930s: City Lights, Modern Times, Bringing Up Baby, Gone with the Wind
- 1940s: Casablanca, It's a Wonderful Life, The Philadelphia Story, His Girl Friday, Double Indemnity
- 1950s: All About Eve, Sunset Blvd., Rebel Without a Cause
1960s-1970s: Tension and Horror Enter
Drama and Romance stay strong, but Thriller, Crime, and Horror start gaining thickness.
- Psycho, The Exorcist, Wait Until Dark, Taxi Driver, The Godfather, The Texas Chain Saw Massacre
1980s: Genre Cinema Boom
Significant expansion in Horror, Action, and Sci-Fi - video-store energy meets classic sensibilities.
- Gremlins, The Lost Boys, Hellraiser, Back to the Future, The Terminator, Ghostbusters
1990s-2010s: Maximum Volume
The tallest part of the graph. Drama and Comedy each reach roughly 170-180 titles in the 1990s and climb past 250 in the 2000s and 2010s. Thriller and Crime explode.
- Drama/Comedy: Forrest Gump, Fight Club, Magnolia, American Pie 2, Muriel's Wedding, Amelie
- Thriller/Crime: Se7en, Fargo, Reservoir Dogs, Run Lola Run, V for Vendetta, The Usual Suspects, Memento
- Horror: Saw, The Ring, Annabelle, Annihilation
2020s: Still Building
The decade is still young, so counts are lower across all genres and still accumulating in the streaming era.
Interpretation: Classic decades provide foundational dramas, romances, and comedies. More recent decades layer thriller, horror, and action onto that emotional base - but without abandoning the core genres.
3. Decade by Genre Heatmap Analysis
Granular view of genre distribution by decade, revealing specific comfort zones and exploration patterns.
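Both the stacked-area timeline and this heatmap reduce to one table: counts per (decade, genre) cell. The sketch below assumes that a film with several genres adds one to each of its genre cells. That is likely, given how hybrid the collection is, but the page does not state it. Under this assumption, a decade's genre counts can add up to more than its number of films. The rows are toy data.

```python
from collections import Counter

# Toy rows (release year, genres); not real collection data.
FILMS = [
    (1942, ["Drama", "Romance"]),
    (1944, ["Crime", "Drama"]),
    (1984, ["Comedy", "Fantasy", "Horror"]),
    (1995, ["Crime", "Drama", "Mystery"]),
    (1999, ["Drama"]),
]


def decade_of(year):
    return year // 10 * 10


def genre_by_decade(rows):
    """Count (decade, genre) cells; a 3-genre film adds 1 to three cells."""
    cells = Counter()
    for year, genres in rows:
        for genre in set(genres):
            cells[(decade_of(year), genre)] += 1
    return cells


cells = genre_by_decade(FILMS)
decades = sorted({d for d, _ in cells})
genres = sorted({g for _, g in cells})

# Tiny text heatmap: one row per genre, one column per decade.
print("genre".ljust(8), *(f"{d}s" for d in decades))
for g in genres:
    print(g.ljust(8), *(str(cells.get((d, g), 0)).rjust(5) for d in decades))
```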
Notable Patterns by Decade
1930s-1950s: Drama, Comedy, and Romance dominate (40-60 titles each). Horror is present but small (Cat People, early creature features).
City Lights, Modern Times, Casablanca, Double Indemnity, All About Eve
1980s: Brightest comedy spot. Horror, Action, and Sci-Fi gain significant space.
Ferris Bueller's Day Off, Hellraiser, Back to the Future, The Terminator, Firestarter
1990s: Drama and Comedy reach ~170-180 titles each. Thriller and Crime grow sharply.
The Shawshank Redemption, Forrest Gump, Magnolia, Se7en, Fight Club, Fargo, Heat
2000s: Drama (318), Comedy (329), Romance (286), Thriller (168). Widest viewing decade - every genre represented.
City of God, V for Vendetta, The Ring, Saw, Enchanted
2010s: Drama (252), Comedy (278), Romance (233), Thriller (205). Modern genre blends dominate.
Inception, Black Swan, Annihilation, What Happened to Monday, The VelociPastor, Elle
2020s: Thriller (91) and Action (61) already stand out. The streaming/exploration era.
Vertical Spine: Drama, Comedy, and Romance form consistent pillars across all decades. From the 1980s onward, Thriller, Horror, Action, and Sci-Fi layer additional dimensions onto this foundation.
4. Actor Eras: Classic vs Modern
Distribution of actors by their primary active era, revealing the temporal composition of my viewing network.
Actor Era Distribution
Modern (post-2000): Dominates the count. Modern films have large ensembles and many credited roles. Includes casts from Inception, Annihilation, What Happened to Monday, and more.
1980s-1990s: Stars and character actors from The Lost Boys, Gremlins, Fight Club, Se7en, Forrest Gump, Magnolia.
Classic (pre-1960): Foundational cinema faces, including the casts of City Lights, Modern Times, Casablanca, It's a Wonderful Life, Bicycle Thieves, Double Indemnity, Bringing Up Baby.
1960s-1970s: Performers from Psycho, The Exorcist, Taxi Driver, The Godfather, Kramer vs. Kramer.
Interpretation: On the film side, the collection is fairly balanced between classic and modern. On the actor side, the network skews heavily modern because: (1) more films are from 1980s onward, and (2) recent productions have bigger, more international casts. The viewing connects Charlie Chaplin and Ingrid Bergman-era figures through to modern ensembles, but numerically, most screen time is with post-2000 performers.
5. Top Actors: The Performers Who Define My Collection
This chart reveals the actors I have watched most frequently, regardless of rating. These are the faces that define my viewing identity.
My Signature Performer: Vincent Price
Vincent Price leads with 30 films (avg rating 6.61) - my definitive horror icon and the clearest signal of what "comfort viewing" means to me.
- The Abominable Dr. Phibes, Theatre of Blood, House of Wax
- House on Haunted Hill, The Fly (1958)
- The Masque of the Red Death (Poe adaptations)
- The Great Mouse Detective (even voice roles appear)
The 80s-90s Action Icons
Golden Age Royalty
Modern Prestige Performers
The Comfort Viewing Crew
Pattern: I balance "prestige cinema" with unabashed entertainment - Sandler, Price, and Denzel Washington coexist in the same dataset.
6. Top Actresses: The Women Who Shape My Viewing
A significant commitment to Hollywood's leading women, particularly in romance and comedy, spanning from Golden Age legends to contemporary performers.
Leading Presence: Julianne Moore (21 films)
Moore sits at the intersection of emotional drama, romance, and psychological tension - exactly where my favorite tones converge.
- Magnolia, The Hours, Children of Men
- Crazy, Stupid, Love, Still Alice
Classic Hollywood Legends
Modern Leading Women
Hidden Favorites Detected
Patricia Clarkson (14 films, very high avg rating 6.71) - prestige supporting characters signal a sophisticated viewing pattern.
Scarlett Johansson & Anne Hathaway - consistent engagement with top contemporary actresses.
Pattern: I watch women who define eras, emotionally and culturally - from screwball comedy pioneers to modern rom-com royalty.
7. Actor Diversity Across Decades
This metric estimates how many unique performers I encounter per 100 films across different decades.
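Normalizing per 100 films matters because decades differ hugely in how many films they contribute. Below is a minimal sketch. It assumes each film row carries a list of credited performers. The page does not say how many credits per film are kept, and that choice strongly affects the absolute numbers. All names and years are placeholders.

```python
from collections import defaultdict

# Toy films with placeholder cast names; not real data.
FILMS = [
    {"year": 1931, "cast": ["P1", "P2"]},
    {"year": 1936, "cast": ["P1", "P3", "P4"]},
    {"year": 2008, "cast": ["F1", "F2", "F3"]},  # franchise entry
    {"year": 2009, "cast": ["F1", "F2", "F3"]},  # sequel with the same cast
]


def unique_performers_per_100(films):
    """Unique performers per decade, scaled to 'per 100 films'."""
    performers = defaultdict(set)
    film_counts = defaultdict(int)
    for film in films:
        decade = film["year"] // 10 * 10
        performers[decade].update(film["cast"])
        film_counts[decade] += 1
    return {d: 100 * len(performers[d]) / film_counts[d] for d in sorted(film_counts)}


print(unique_performers_per_100(FILMS))  # {1930: 200.0, 2000: 150.0}
```

The value can exceed 100 because every film contributes several performers. The repeated franchise cast is what pulls the toy 2000s figure down, the same mechanism described for the real 2000s dip.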
Diversity Patterns by Era
1910s-1920s Spike
High diversity from silent-era and early talkie films where casts change frequently - Chaplin shorts, Laurel & Hardy, early horror experiments.
1960s Peak
Aligns with major ensemble works in my collection:
- The Godfather, The Great Gatsby, Rosemary's Baby, Bonnie and Clyde
2000s Dip
Heavy franchise viewing reduces cast diversity - repeated performers across:
- MCU, X-Men, Jurassic Park sequels
- Long-running comedy/action ensembles
2020s Rise
More streaming originals, international casts, and indie experiments - a growing curiosity about global cinema.
Pattern: Franchise loyalty in the 2000s created cast repetition, while recent streaming-era exploration has increased performer diversity.
8. Actors in My Highest-Rated Films
This chart focuses on performers whose films I rate highest on average - revealing which actors correlate with quality in my personal assessment.
Quality Indicators
Classic Theatre-Trained Ensemble Drama
Several top performers share a common source, the cast of 12 Angry Men:
Anime Presence
Japanese voice actors from highly-rated animated works:
UK Drama & Historical Pieces
Pattern: Strong character actors and tight ensembles correlate with my highest ratings - I reward acting craft over big-budget spectacle.
9. Career Arcs & Actor Trajectories
This timeline shows both career length and the years I have engaged with different performers, revealing intentional film history exploration versus nostalgia-driven viewing.
Career Span Analysis
Classic Legends I Actively Seek Out
I do not watch classics "by accident" - there is an intentional film history appetite.
Action Titans of the 80s to Present
These careers align with my teenage years and with nostalgia-driven comfort viewing.
Modern Dramatic Anchors
Core performers for contemporary taste - bridging prestige and accessibility.
Pattern: The timeline reveals a split between intentional classic exploration (Bogart, Cary Grant, Bess Flowers) and era-specific comfort viewing (80s-90s action stars).
10. Career Length Distribution by Gender
Prevalence of Brief Careers
The largest bars for both groups fall in the 0-5 years range. A significant number of performers in my films appear in one movie and disappear, or act intensively for a couple of years before leaving the industry.
Examples of these brief appearances:
- Background students in college slashers
- One-scene characters in rom-coms
- Child actors in family films who never act again
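Career length here is presumably the gap between a performer's first and last credited year. The sketch below uses assumed bin edges: 0-5, 5-10, 10-20, 20-30, 30-40, and 40+ years. If the years come only from films in the collection rather than from full filmographies, anyone seen once gets a zero-year span, which would inflate the 0-5 bar. The records are placeholders.

```python
from collections import Counter

# Placeholder performers: (name, gender label, years with a credit).
PERFORMERS = [
    ("Performer A", "male", [1938, 1953, 1964, 1986]),
    ("Performer B", "female", [2001]),
    ("Performer C", "female", [1995, 1997]),
    ("Performer D", "male", [1980, 1991, 2015]),
]

EDGES = [0, 5, 10, 20, 30, 40]  # assumed bin edges in years; last bin is open


def span_bin(years):
    """Label the career span (last year - first year) with its bin."""
    span = max(years) - min(years)
    for lo, hi in zip(EDGES, EDGES[1:]):
        if lo <= span < hi:
            return f"{lo}-{hi}"
    return f"{EDGES[-1]}+"


def span_distribution(performers):
    """Count performers per (gender label, span bin)."""
    return Counter((gender, span_bin(years)) for _name, gender, years in performers)


for key, count in sorted(span_distribution(PERFORMERS).items()):
    print(key, count)
```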
Long Careers Skew Male-Labeled
As bars move right (10, 20, 30, 40+ years), male-labeled performers maintain higher counts. When I watch very long-spanning careers, I typically encounter:
- Vincent Price: 1930s to 1980s in horror and thrillers
- Cary Grant: 1930s screwball comedies through 1940s-1950s Hitchcock thrillers, with his last films in the mid-1960s
- Morgan Freeman, Arnold Schwarzenegger: Active from 80s to 2010s
Female-labeled performers with truly long spans exist, but fewer reach 30-40+ year stretches. This reflects structural industry issues: narrower age windows for leading roles, fewer opportunities after a certain age, and less consistent pivots to character acting.
Pattern: I encounter Christopher Plummer or Anthony Hopkins in many later films, while older actresses appear less often or primarily in supporting "mother/grandmother" roles - a reflection of industry bias, not personal preference.
Actor Preferences Summary
Synthesizing what the performer data reveals about my viewing identity.
Vincent Price at #1 by a significant margin - classic horror is one of my deepest personal traditions.
Arnold, Stallone, JCVD, Willis - I enjoy action when the lead has icon energy, not just explosions.
Moore, Bullock, Thompson, both Hepburns, Davis - women who define eras emotionally and culturally.
Many 12 Angry Men actors have top average ratings - tight casts over spectacle.
Bogart, Cary Grant, Bess Flowers - intentional classic cinema exploration.
Sandler + Price + Denzel in the same dataset - I do not restrict viewing to "serious" cinema.
11. Directors: The Filmmakers Who Define My Collection
This chart shows directors by presence in my watch history - not quality, but commitment to following their work.
Alfred Hitchcock Dominates (31 films)
Hitchcock is far above everyone else - an auteur-level commitment spanning his entire career:
- Classics: Psycho, Rear Window, Vertigo, North by Northwest
- British period: The 39 Steps, The Lady Vanishes
- Psychological: Notorious, Rebecca, Dial M for Murder
If Hitchcock directed it, I am interested - even the minor works.
Classic Hollywood Cluster
Comedy & Feel-Good Directors
Hong Kong Martial Arts Cinema
Not just Western action - I also chase kung-fu craftsmanship.
B-Movies & Cult
Pattern: I follow directors across eras - from Chaplin to Hitchcock to Corman to Levy - treating them as recurring "authors" in my viewing habits.
12. Director Quality: Who Delivers My Highest Ratings
Moving from "who I watch most" to who scores best - directors with minimum 5 films, ranked by average rating.
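The minimum-film rule stops a director with one great film from topping the table. The stdlib sketch below uses the page's threshold of 5 films. The rows are toy (director, rating) pairs, and the tie-break by film count is an illustrative choice. A common alternative, which the page does not mention, is to shrink each average toward the collection mean so that directors with few films are pulled toward the middle.

```python
from collections import defaultdict
from statistics import mean

MIN_FILMS = 5  # threshold stated on this page


def director_quality(rows, min_films=MIN_FILMS):
    """Rank directors with >= min_films rated films by average rating."""
    ratings = defaultdict(list)
    for director, rating in rows:
        if rating is not None:  # unrated films do not count toward the minimum
            ratings[director].append(rating)
    table = [(d, len(r), mean(r)) for d, r in ratings.items() if len(r) >= min_films]
    # Highest average first; ties go to the director with more films.
    return sorted(table, key=lambda t: (-t[2], -t[1], t[0]))


# Toy data: Director Z has only two films, so it is excluded.
ROWS = (
    [("Director X", r) for r in (8.0, 7.5, 7.0, 8.5, 7.0, 7.0)]
    + [("Director Y", r) for r in (8.0, 8.0, 7.5, 7.5, 8.0)]
    + [("Director Z", r) for r in (9.0, 8.8)]
)

for director, n, avg in director_quality(ROWS):
    print(f"{director}: {avg:.2f} over {n} films")
```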
Top of the Quality Table
Classic Comfort Directors = Quality Guarantees
Billy Wilder, Howard Hawks, William Wyler, Frank Capra all cluster in the high-7s range with many films each. These are not "homework" watches - they consistently deliver.
Modern Storytellers
These cluster slightly below Nolan/Chaplin but still above 7 on average.
Pattern: My quality list balances silent-era genius (Chaplin), noir & classic comedy (Wilder, Hawks, Capra), 80s/90s mainstream (Zemeckis, Spielberg), and edgier voices (Coens, Fincher, Nolan). I reward strong authorial control whether in black-and-white or dream-within-a-dream blockbusters.
13. Director Diversity Over Decades
How many unique directors appear in my collection per decade - tracking the evolution from curated classics to exploratory viewing.
Decade-by-Decade Evolution
Classic decades: 14-60 directors per decade. Selected classics rather than exhaustive viewing. Studio systems meant the same names repeated (Hitchcock, Hawks, Capra, Chaplin).
1980s: VHS, cable, and global cinema spread. Big names (Ridley Scott, John Carpenter, James Cameron) plus smaller horror, comedy, and action directors.
1990s: Independent film boom; world cinema becomes more accessible. Much broader sampling.
2000s: DVD era + internet recommendations + massive increase in film production. Widest decade of exploration.
2010s: Slightly lower but still huge. Streaming era maintains breadth.
2020s: Early decade - number still increasing.
Pattern: My viewing evolved from curated classics to an exploratory, global mix. I am not stuck in a nostalgic bubble - I keep meeting new voices.
14. Studios: The Production Houses Shaping My Collection
Top 20 production companies by film count, along with average ratings - revealing which studios deliver consistent quality.
The Big Five Dominate
Each has dozens to hundreds of films in my history (avg scores ~6.5-6.8):
My watch history is essentially a tour through the canon shaped by major American studios - from classic musicals and film noir to modern franchises like The Matrix, Fast & Furious, Jurassic Park, and Harry Potter.
Mid-Budget & Specialty Players
Average ratings ~6.0-6.7. I use these as genre delivery systems rather than quality marks.
UK & International Standouts
Pattern: Within a mostly US-studio diet, I carve out a special place for UK/European co-productions, especially in romance and character-driven stories. Working Title delivers my highest studio average.
15. Oscar Timeline: How Award-Literate Is My Collection?
For each decade, tracking Best Picture winners, other Oscar-winning films, and nominees - revealing how I engage with the industry's official markers of importance.
Living in the World of Nominees
In most decades, the brown "nominee" bars tower over the winner bars. I am not just ticking off Titanic and Ben-Hur - I am also watching films like Saving Private Ryan, La La Land, and Star Wars: A New Hope, which piled up nominations but did not necessarily win Best Picture.
The 1990s & 2000s: My Big Awards Decades
These decades explode in height - heavy on nominees, healthy on winners:
Classic Oscar History Still Present
Earlier decades show smaller but very real bars - I travel back to the canonical milestones:
Pattern: I like to know what the industry considered important across many eras, but I do not restrict myself to only the official "winners." The nominees often capture more interesting creative risks than the safe consensus picks.
16. The Award Monsters in My Collection
The 20 most-awarded titles in my data - total wins across Oscars and other awards, nominations, and Best Picture status.
The Epic Event Films
I clearly have patience for big, sweeping cinema that demands commitment.
Prestige Musicals & Literary Adaptations
Emotional, character-driven, technically polished - I stack several of this type.
American Prestige Dramas
All About Eve, The Apartment, Out of Africa, Forrest Gump, and The Sting - that combination of Hollywood craft, recognizable stars, and just enough emotional punch to generate awards buzz.
Prestige TV in the Mix
I track prestige TV too - I care about storytelling, not just format.
Pattern: My most-awarded titles cluster around big, emotionally resonant, canonized works - epics, musicals, heavy dramas - plus prestige TV. I have built a very "award-literate" library.
17. How the Main Rating Sites See My Films
Comparing IMDb, TMDB, Rotten Tomatoes, and Metacritic across average rating, coverage, distribution, and consistency.
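Comparing four sites first requires one scale. The sketch below assumes scores arrive as strings in the formats OMDb uses in its Ratings array ("8.1/10", "94%", "74/100"). The sample values are made up. One caveat: the Rotten Tomatoes percentage is the Tomatometer, which is the share of positive reviews, not an average grade. Dividing it by 10 therefore puts a proportion on the same axis as averages, and that alone likely explains part of why RT's distribution looks so stretched.

```python
from typing import Optional


def to_ten_point(value: str) -> Optional[float]:
    """Convert '8.1/10', '94%', or '74/100' style strings to a 0-10 score."""
    v = value.strip()
    if not v or v.upper() == "N/A":
        return None  # keep missing as missing; never fill with 0
    if v.endswith("%"):
        return float(v[:-1]) / 10
    if "/" in v:
        num, den = v.split("/", 1)
        return float(num) * 10 / float(den)
    return float(v)  # already a bare 0-10 number (e.g. a TMDb vote average)


# Made-up values in the OMDb "Ratings" style.
ratings = [
    {"Source": "Internet Movie Database", "Value": "8.1/10"},
    {"Source": "Rotten Tomatoes", "Value": "94%"},
    {"Source": "Metacritic", "Value": "74/100"},
]
scores = {r["Source"]: to_ten_point(r["Value"]) for r in ratings}
print(scores)
```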
Coverage Gaps Reveal My Taste
IMDb/TMDB: Almost blanket coverage. Any random film I have watched is essentially guaranteed a score.
Rotten Tomatoes: Missing ~19% - likely older, obscure, exploitation, or non-US titles I dig up that mainstream critics do not log.
Metacritic: Missing ~30% - the most selective source, dropping off for anything outside the curated mainstream.
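Coverage is the share of films with a non-missing score from each source. The toy sketch below assumes missing scores are stored as None. Keeping them missing, rather than filling them with 0, matters: zero-filling would drag every average down and fake a "harsh critic" effect.

```python
SOURCES = ("imdb", "tmdb", "rt", "metacritic")

# Toy rows on a 0-10 scale; None means the source has no score.
FILMS = [
    {"imdb": 8.1, "tmdb": 7.9, "rt": 9.4, "metacritic": 7.4},
    {"imdb": 6.2, "tmdb": 6.0, "rt": None, "metacritic": None},
    {"imdb": 5.5, "tmdb": 5.8, "rt": 3.1, "metacritic": None},
    {"imdb": 7.0, "tmdb": None, "rt": 8.0, "metacritic": 6.5},
]


def coverage(films, sources=SOURCES):
    """Fraction of films that have a score from each source."""
    n = len(films)
    return {s: sum(f.get(s) is not None for f in films) / n for s in sources}


for source, share in coverage(FILMS).items():
    print(f"{source:<10} {share:.0%}")
```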
Average Scores: Comfortable Middle Ground
My watchlist is not full of universally adored masterpieces nor pure trash - it is a mixture of mid-tier, cult, and good-but-not-sacred titles. Pairing Titanic with Problem Child or So Undercover gives that comfortable 6-7 zone.
Distribution Behavior
IMDb/TMDB: Compact, smooth distributions with most scores in the 5-8 range. They behave like crowd averages.
Rotten Tomatoes: Very stretched violin, pushing films toward very high or very low scores (~2.8 std dev). Critics mark things clearly "fresh" or "rotten," exaggerating extremes.
Metacritic: Also harsh and spread out, though less extreme than RT.
Pattern: For something like La La Land, all four sources probably group in "good to excellent." But for Love, Weddings & Other Disasters or I Spit on Your Grave 2, IMDb/TMDB might say "5-6, meh" while RT/Metacritic could slam it or oddly elevate it - creating a messier picture.
18. Where Critics Completely Disagree
The top 20 films with the largest rating gap between highest and lowest source, plus the overall variance distribution.
The Hall of Chaos
Movies where critics and crowds cannot agree at all - gaps of 7-8+ points:
Light Comedies & Mid-Budget Genre Films
Cult / Trash / Exploitation Tension
Family / Kids Titles
Overall Pattern
The histogram shows most films have low variance (under ~2 points) - the sites broadly agree. A smaller tail stretches to very high variance (10+), occupied by those controversial films above. Mean variance ~1.40, median ~0.77 - disagreement is the exception, not the rule, but when it happens, it is dramatic.
Pattern: This list reveals a part of my taste that is not "Oscar canon" - I like wandering into messy, uneven, sometimes trashy, sometimes secretly fun territory where the critical world cannot decide.
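Both measures in this section can be computed per film from whichever scores exist. The sketch below assumes all scores are already on a 0-10 scale. It also assumes "variance" means population variance across the available sources, since the page does not say which variance is used. A mean variance of ~1.40 sitting well above a median of ~0.77 is typical of a right-skewed distribution: most films agree, while a few extreme ones pull the mean up. The data is toy data.

```python
from statistics import mean, median, pvariance

# Toy 0-10 scores; None means the source has no score for that film.
FILMS = {
    "Film A": {"imdb": 7.8, "tmdb": 7.6, "rt": 9.0, "metacritic": 8.1},
    "Film B": {"imdb": 5.9, "tmdb": 6.1, "rt": 0.8, "metacritic": 2.5},
    "Film C": {"imdb": 6.4, "tmdb": 6.5, "rt": None, "metacritic": None},
}


def disagreement(scores, min_sources=2):
    """Return (max - min gap, population variance), or None if too few sources."""
    vals = [v for v in scores.values() if v is not None]
    if len(vals) < min_sources:
        return None
    return max(vals) - min(vals), pvariance(vals)


stats = {title: disagreement(s) for title, s in FILMS.items()}
valid = {t: s for t, s in stats.items() if s is not None}

# Largest gaps first, as in the "Hall of Chaos" list.
for title, (gap, var) in sorted(valid.items(), key=lambda kv: -kv[1][0]):
    print(f"{title}: gap {gap:.1f}, variance {var:.2f}")

variances = [var for _gap, var in valid.values()]
print(f"mean variance {mean(variances):.2f}, median {median(variances):.2f}")
```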
19. My Relationship with Rotten Tomatoes
Zooming in specifically on RT's "freshness" system - score distribution, correlation with IMDb, fresh vs rotten split, and coverage.
Score Distribution
RT scores (converted 0-10) spread fairly widely with a mean around 5.93 - just below the "fresh" threshold (traditionally 6.0/60%). My collection leans slightly toward films that critics consider okay or good, but not universally beloved. Pairing Titanic or Ben-Hur with Problem Child or A Thousand Words gives that mixed picture.
Correlation with IMDb (r ~ 0.77)
A strong positive correlation:
- When IMDb audiences like a film (The Apartment, All About Eve), RT critics usually like it too
- When IMDb is lukewarm (The Contract, Two of Hearts), RT tends to be lukewarm or worse
Although outliers exist, I generally watch films where audience and critic sentiment move together.
Fresh vs Rotten Split
Fresh: A tiny tilt toward critic-approved titles like La La Land, The Sound of Music, Saving Private Ryan.
Rotten: Totally game for something RT considers "rotten," like So Undercover or A Thousand Words.
Coverage: ~81%
The missing ~19% are likely the very obscure, older, or niche films: pre-Code oddities, trash horror, early B-movies, or international titles that never got mainstream RT coverage. That gap is actually part of my identity - I am not confined to what current critics happen to track.
Pattern: My library splits almost in half between fresh and rotten, with a tiny tilt toward critic-approved titles. That still means nearly half of what I watch is something RT considers "rotten" - and I watched it anyway.
20. My Alignment with Metacritic
How "Meta-core" is my collection? Metacritic aggregates professional critics into a weighted score - here is how my viewing aligns with their consensus.
Score Distribution
Most of my films sit in the 4-8 Metacritic range (40-80/100), with the bulk around 5-7. Films like The English Patient or Saving Private Ryan fit right in the heart of my collection: neither obscure disaster nor ultra-niche masterpiece, but "serious critical cinema."
Correlation with IMDb
The scatterplot forms a tight diagonal cloud: when Metacritic goes up, IMDb usually goes up too. If I like a film with 8.5 on IMDb, chances are its Metascore is also solid - think The Shawshank Redemption or Cinema Paradiso.
Where points drift away from the diagonal are the "critics vs crowd" divergences - a comfort film that IMDb users rate 7.5 but Meta holds at 55, or a formally brilliant movie critics adore (Meta 85+) while the public is more lukewarm.
Metacritic Categories
City of God, Double Indemnity, 12 Angry Men, Modern Times, It's a Wonderful Life, Casablanca, Inception - the "critically bullet-proof" ones
A big chunk: Forrest Gump, Fight Club, The Sting, La La Land
The largest bar - "mid-Meta" cinema that critics see as imperfect or divisive. Romantic comedies and genre films that audiences like more than critics.
A non-tiny number: my willingness to watch So Undercover or Problem Child shows I actively explore trash/guilty-pleasure territory, not only "respectable" options.
Coverage: ~70%
About 30% of my films have no Metascore - often older classics that never got a formal Meta aggregation, TV movies, international titles, or niche genre releases. I regularly go beyond the "Metacritic canon."
Pattern: I am not a Metacritic snob. I watch across all four quality buckets, with a special tolerance for the "mixed" and "unfavorable" zones that critics dismiss.
21. Personalized Recommendations: What My Own Data Suggests
A prototype recommender built from my own films - each block takes a "reference" film and lists the most similar titles in my dataset based on genre, decade, tone, and ratings.
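The page does not show how similarity is computed. One common content-based approach is cosine similarity over a feature vector per film. The sketch below uses genre flags plus scaled year and rating, with arbitrarily chosen weights. "Tone" is left out because the page does not say how it is encoded. Titles and attributes are placeholders.

```python
import math

GENRES = ["Comedy", "Crime", "Drama", "Horror", "Romance", "Sci-Fi"]

# Placeholder films; attributes are illustrative, not real data.
CATALOGUE = {
    "Film A": {"genres": {"Drama", "Crime"}, "year": 1994, "rating": 9.3},
    "Film B": {"genres": {"Drama", "Crime"}, "year": 1999, "rating": 8.1},
    "Film C": {"genres": {"Comedy", "Romance"}, "year": 2008, "rating": 6.1},
    "Film D": {"genres": {"Horror", "Sci-Fi"}, "year": 2006, "rating": 7.1},
    "Film E": {"genres": {"Drama", "Romance"}, "year": 1997, "rating": 7.8},
}


def features(film, w_year=0.5, w_rating=0.5):
    """Genre one-hot flags plus year and rating scaled into [0, w]."""
    vec = [1.0 if g in film["genres"] else 0.0 for g in GENRES]
    vec.append(w_year * (film["year"] - 1910) / (2030 - 1910))
    vec.append(w_rating * film["rating"] / 10)
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(title, k=3):
    """Return the k films closest to `title` by cosine similarity."""
    ref = features(CATALOGUE[title])
    scored = [(cosine(ref, features(f)), t) for t, f in CATALOGUE.items() if t != title]
    scored.sort(reverse=True)
    return [t for _score, t in scored[:k]]


print(most_similar("Film A"))
```

Scaling year and rating keeps them from drowning out the genre flags. Raising w_year would push results toward same-era films, which resembles the same-era neighbors suggested for Where is Coletti?.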
If I Love The Shawshank Redemption
The system pairs it with:
When I am in a Shawshank mood, these are other character-focused, critically solid dramas waiting on my shelf.
If I Am Into The Host (2006)
I get a cluster of films that mix genre with emotional depth - thriller + family drama, fantastical elements + social commentary. The system says: "You like horror/sci-fi where the monster is not the only problem - here are others with that same layered structure."
If I Choose Den of Thieves 2: Pantera
Recommendations trend toward crime/action heist energy: films like Battleship or My Mom's New Boyfriend (less heist, more caper) echo my attraction to slick, slightly messy genre cinema.
If I Click Where is Coletti? (1915)
The list suggests early-cinema comedies with similar runtime and silent slapstick DNA. "If you are already venturing into 1910s comedy, here are the neighbors on that same dusty shelf."
For 27 Dresses
The system groups rom-coms with similar structure and mood - "On days when you rewatch 27 Dresses, these are your other wedding/identity/work-rom-com comfort picks."
Pattern: This figure is less about what critics think and more: "If I trained an algorithm on my own taste, what would it tell Future Me to watch next?"
22. Collection Gaps: Where to Expand Next
My "where to explore" map - highlighting genre-decade pockets with high potential based on what I already love but have barely touched.
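Here is one way to surface these pockets, sketched under assumptions: group rated films by (genre, decade) and keep cells that are both small and highly rated. The thresholds (at most 3 films, an average of at least 7.5) are illustrative, and the rows are toy data. A cell with zero watched films never appears with this method, so finding truly empty pockets would need a reference catalogue, such as IMDb ratings for unwatched titles.

```python
from collections import defaultdict
from statistics import mean


def find_gaps(rows, max_films=3, min_avg=7.5):
    """Return (genre, decade, n_films, avg) cells that are small but highly rated."""
    cells = defaultdict(list)
    for genre, decade, rating in rows:
        cells[(genre, decade)].append(rating)
    gaps = [
        (genre, decade, len(r), mean(r))
        for (genre, decade), r in cells.items()
        if len(r) <= max_films and mean(r) >= min_avg
    ]
    return sorted(gaps, key=lambda cell: -cell[3])


# Toy (genre, decade, rating) rows; one row per film-genre pair.
ROWS = [
    ("Film-Noir", 1940, 8.1), ("Film-Noir", 1940, 7.9),
    ("Comedy", 2000, 6.2), ("Comedy", 2000, 6.8),
    ("Comedy", 2000, 5.9), ("Comedy", 2000, 6.4),
    ("Western", 1960, 8.4),
]

for genre, decade, n, avg in find_gaps(ROWS):
    print(f"{genre} ({decade}s): {n} film(s), avg {avg:.1f}")
```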
High-Quality Gaps
Genre-decade pockets with high average ratings but very few films watched:
Era Gaps
Decades where certain genres are almost missing:
Directors to Explore Deeper
Directors I rated very highly but have only seen 1-3 films from:
Runtime Diversity Gaps
I might watch Fantasy (Standard length) a lot, but Fantasy (Short) or Crime (Epic-length) almost never. If I enjoy Once Upon a Time in the West or Seven Samurai, more epic-length genre cinema would suit me.
Pattern: This is a strategic watchlist generator - it tells me where the low-hanging fruit for future favorites is hiding.
23. Collection Diversity Score: The Dashboard on a Single Slide
Three dimensions of diversity - genre, era, and quality range - summarizing how varied my personal cinema universe really is.
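The page does not say how these percentages are computed. One standard measure of "how evenly spread" is normalized Shannon entropy (Pielou's evenness). It divides the entropy of the category shares by its maximum, ln k for k categories. The result is 0 when everything sits in one category and 1 when all categories are used equally. The sketch below works under that assumption, with toy labels. Dividing by the number of categories actually seen means a score is relative to the 23 genres or 12 decades present, not to every genre that exists.

```python
import math
from collections import Counter


def evenness(labels):
    """Normalized Shannon entropy: H / ln(k), where k = number of categories seen."""
    counts = Counter(labels)
    k = len(counts)
    if k < 2:
        return 0.0  # a single category has no spread
    n = sum(counts.values())
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


genre_labels = ["Drama"] * 5 + ["Comedy"] * 4 + ["Horror"] * 2 + ["Documentary"]
decade_labels = ["1940s", "1980s", "1990s", "2000s"]

print(f"genre evenness: {evenness(genre_labels):.0%}")
print(f"era evenness:   {evenness(decade_labels):.0%}")  # one film per decade
```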
Genre Diversity: ~77% (23 Unique Genres)
I do not just oscillate between "Drama, Comedy, Action." I also touch Documentary, Film-Noir, Biography, Animation, Short, and more. In practice, my viewing swings from Ben-Hur to Six Feet Under (series), from screwball comedies to war epics, from experimental horror to family films.
Era Diversity: ~82% (12 Decades)
My films stretch from the 1910s (e.g., Where is Coletti?) all the way to the 2020s, with substantial presence across silent era, Golden Age, New Hollywood, 80s/90s, and contemporary cinema. I am not just "a 90s kid who watches 90s movies" - I am actively traveling across film history.
Quality Range: ~76% (Spread of Ratings)
My collection purposely includes:
I explore the whole spectrum, not just the "Top 250" safety zone.
Pattern: My collection is broad in what I watch, when the films were made, and how "good" they are supposed to be, with the three dimensions roughly balanced.
24. Outlier Detection: My Hall of Anomalies
Films that are weird compared with the rest of my watch history - rating outliers, extreme runtimes, and unusual eras that defy normal patterns.
Outlier Types
Rating outliers: Films way higher or lower than my usual scores - giving 9+ to something most people treat as minor, or 3 to an accepted classic.
Runtime outliers: Ultra-long or ultra-short titles, from epics like Ben-Hur and Gone with the Wind to very short experimental or early films.
Era outliers: Films whose decade is rare in my collection - stray 1910s silent comedies or non-Western 1940s films.
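Below is a minimal sketch of the three checks. It assumes simple z-scores for rating and runtime (flagged when |z| >= 2) and a decade-share cutoff for era rarity. The page does not state its method or thresholds, and the ten films are toy data. With small samples or heavy tails, a median/MAD version of the z-score is the more robust choice.

```python
from collections import Counter
from statistics import mean, pstdev


def zscores(values):
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def flag_outliers(films, z_cut=2.0, rare_share=0.1):
    """Return {title: [reasons]} for rating, runtime, and era outliers."""
    rating_z = zscores([f["rating"] for f in films])
    runtime_z = zscores([f["runtime"] for f in films])
    decades = Counter(f["year"] // 10 * 10 for f in films)
    flagged = {}
    for film, zr, zt in zip(films, rating_z, runtime_z):
        reasons = []
        if abs(zr) >= z_cut:
            reasons.append("rating")
        if abs(zt) >= z_cut:
            reasons.append("runtime")
        if decades[film["year"] // 10 * 10] / len(films) <= rare_share:
            reasons.append("era")
        if reasons:
            flagged[film["title"]] = reasons
    return flagged


# Toy data: one very early film, one very long film, one unusually low rating.
YEARS = [1913, 1994, 1997, 1999, 2003, 2006, 2008, 2011, 2014, 2019]
RUNTIMES = [95, 100, 105, 98, 102, 99, 101, 97, 103, 212]
RATINGS = [6.5, 6.8, 7.0, 6.2, 6.6, 6.9, 6.4, 6.7, 3.0, 6.5]
FILMS = [
    {"title": f"Film {i + 1}", "year": y, "runtime": rt, "rating": r}
    for i, (y, rt, r) in enumerate(zip(YEARS, RUNTIMES, RATINGS))
]

print(flag_outliers(FILMS))
```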
Sample Outlier Films
High-impact films, often classics or foreign masterpieces:
Pattern: These are films that, according to the math, hit me (or the broader audience) in a way that normal genre/decade patterns cannot fully explain. They often become new favorites.
25. High-Level Patterns: What My Collection Looks Like If I Squint
A 4-in-1 panel zooming out to see genre-decade heatmaps, runtime-quality patterns, top combinations, and average quality by genre.
Genre x Decade Heatmap
Comedy in the 2000s is clearly my most populated cell - every rom-com, mainstream comedy, and light film I have watched from that decade: 27 Dresses, The Wedding Date, Because I Said So, etc.
Drama and Comedy in the 1990s-2010s also glow bright: Forrest Gump, Fight Club, Saving Private Ryan, La La Land, and lots of contemporary titles, with Star Wars: Episode IV lighting up the 1970s.
Earlier decades (1930s-1970s) show smaller but visible islands: classic comedies and dramas like It Happened One Night, All About Eve, Kramer vs. Kramer, The Sound of Music, Doctor Zhivago.
Runtime x Quality Pattern
Top 15 Genre-Decade Combinations
Comedy (2000s) is clearly #1, my comfort zone. Comedy (2010s) and Comedy (1990s) are still strong - "comedy from my own lifetime" is a huge pillar. Action (2000s/2010s) and Drama (2000s/2010s) also rank high, spanning superhero cinema to prestige dramas.
Average Quality by Genre
Pattern: My collection leans toward recent decades but with consistent roots in the classic canon. I reward documentaries and biographies most, and am slightly harsher (or more adventurous) with comedy.
26. Quality Distribution Per Genre: How Messy Is Each Genre?
Boxplots showing the spread of IMDb ratings for each genre - revealing where I find consistency and where I surf chaos.
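A boxplot is drawn from five numbers per genre, and the median and interquartile range (IQR) carry most of the "how messy" story. The stdlib sketch below assumes a multi-genre film's rating counts toward each of its genres, and it skips genres with fewer than two films. The ratings are toy values. statistics.quantiles uses its default "exclusive" method here, and plotting libraries may interpolate slightly differently.

```python
from collections import defaultdict
from statistics import median, quantiles

# Toy rows (genres, IMDb-style rating); not real collection data.
ROWS = [
    (["Drama"], 6.5), (["Drama", "Crime"], 6.8), (["Drama"], 7.0),
    (["Drama"], 6.7), (["Drama"], 9.3),
    (["Comedy"], 3.1), (["Comedy", "Romance"], 5.0), (["Comedy"], 6.4),
    (["Comedy"], 7.9), (["Comedy"], 8.2),
]


def genre_spread(rows, min_films=2):
    """Median and IQR of ratings per genre."""
    by_genre = defaultdict(list)
    for genres, rating in rows:
        for g in set(genres):
            by_genre[g].append(rating)
    spread = {}
    for g, ratings in by_genre.items():
        if len(ratings) < min_films:
            continue  # too few films for quartiles
        q1, _q2, q3 = quantiles(ratings, n=4)
        spread[g] = {"n": len(ratings), "median": median(ratings), "iqr": q3 - q1}
    return spread


for genre, s in sorted(genre_spread(ROWS).items()):
    print(f"{genre:<7} n={s['n']} median={s['median']:.1f} IQR={s['iqr']:.2f}")
```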
Drama
Median around 6.7-6.8 with a long tail up to 9+ (The Shawshank Redemption, 12 Angry Men, City of God, Forrest Gump). I consistently find strong dramas, but I also include weaker ones, so the lower whisker drops toward 4s and 3s.
Comedy & Romance
Slightly lower medians (~6.2-6.4). Very wide spread - from painful 3-ish romantic comedies or broad comedies (Problem Child, National Lampoon's Senior Trip) to well-loved pieces like La La Land, The Apartment, Some Like It Hot. I experiment a lot here, not only chasing "the best of the best."
Horror & Action
Broadest spreads: some very high peaks (classic horror such as The Exorcist, smart action like Die Hard) but many low-rated genre entries. I am clearly open to trash, cult, and B-movie horror/action, not just prestige entries.
Crime, Mystery, Sci-Fi, Fantasy, Adventure, Family
Medians in the 6.3-6.8 band. Crime and Mystery skew slightly higher, reflecting my love for tight narratives (noir, thrillers). Family films have a moderate spread - lots of average ones, with a few outstanding hits (Pixar, classic family adventure) pulling the upper tail.
Pattern: I am not a perfectionist. I am willing to watch across the quality spectrum, but different genres have different "risk profiles." Documentaries and biographies are safe bets; comedy, horror, and action are where I surf chaos.
Under the Hood
CineScope is built as a modular Python pipeline combining IMDb non-commercial datasets, APIs (TMDb, OMDb, DoesTheDogDie, Wikidata), and a dense network of CSV, Parquet, and SQLite layers. The same tools I use in research are pointed at something extremely personal: my own watch history and collection.
The repository shows the full pipeline: data ingestion, enrichment scripts, analysis batches, visual exports, and an early UI prototype exploring search and browsing on top of the enriched data.
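The page does not include the ingestion code. To illustrate one layer, here is a stdlib sketch that reads rows in the published IMDb non-commercial title.basics TSV layout into a SQLite table. That layout is tab-separated, uses "\N" for missing values, and stores genres as a comma-separated list. The table name, the column choice, and the three placeholder rows are assumptions for illustration, not the repository's schema.

```python
import csv
import io
import sqlite3

# Placeholder rows in the title.basics.tsv layout (IDs and titles are fake).
SAMPLE = (
    "tconst\ttitleType\tprimaryTitle\toriginalTitle\tisAdult\t"
    "startYear\tendYear\truntimeMinutes\tgenres\n"
    "ttTOY001\tmovie\tToy Drama\tToy Drama\t0\t1994\t\\N\t142\tDrama\n"
    "ttTOY002\tmovie\tToy Rom-Com\tToy Rom-Com\t0\t2004\t\\N\t\\N\tComedy,Romance\n"
    "ttTOY003\ttvEpisode\tToy Episode\tToy Episode\t0\t2010\t\\N\t30\tDrama\n"
)


def nullable(value):
    """IMDb marks missing fields with a literal backslash-N."""
    return None if value == "\\N" else value


def load_movies(fh, conn):
    """Insert movie rows from a title.basics-style TSV stream; return row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS titles ("
        "tconst TEXT PRIMARY KEY, title TEXT, year INTEGER, runtime INTEGER, genres TEXT)"
    )
    reader = csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for r in reader:
        if r["titleType"] != "movie":
            continue  # skip episodes, shorts, series, etc.
        year = nullable(r["startYear"])
        runtime = nullable(r["runtimeMinutes"])
        rows.append((
            r["tconst"],
            r["primaryTitle"],
            int(year) if year else None,
            int(runtime) if runtime else None,
            nullable(r["genres"]),
        ))
    conn.executemany("INSERT OR REPLACE INTO titles VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
print(load_movies(io.StringIO(SAMPLE), conn))  # 2
print(conn.execute("SELECT title, year, runtime FROM titles ORDER BY year").fetchall())
```

For the real gzipped file, the same function should accept gzip.open("title.basics.tsv.gz", "rt", encoding="utf-8", newline="") in place of the StringIO sample.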