
November 20, 2022 108 mins

We take a peek into some of the challenges Twitter has faced while solving data problems at large scale, while Michael challenges the audience, Joe speaks from experience, and Allen blindsides them both.

The full show notes for this episode are available at https://www.codingblocks.net/episode198.

News

  • Want to help us out? Leave us a review!
  • The 2023 Game Ja-Ja-Ja Jam is coming up!

Twitter has a Data Problem

Moving an Exabyte of Data

  • In 2019, over 100 million people per day would visit Twitter.
  • Every tweet and user action creates an event that is used by machine learning and employees for analytics.
  • Their goal was to democratize data analysis within Twitter to allow people with various skillsets to analyze and/or visualize the data.
  • At the time, various technologies were used for data analysis:
    • Scalding, which required programming knowledge, and
    • Presto and Vertica, which had performance issues at scale.
  • Another problem was having data spread across multiple systems without a simple way to access it.

Moving pieces to Google Cloud Platform

  • The Google Cloud big data tools at play:
    • BigQuery, a cost-effective, serverless, multicloud enterprise data warehouse to power your data-driven innovation.
    • Data Studio, unifying data in one place with the ability to explore, visualize, and tell stories with the data.

History of Data Warehousing at Twitter

  • 2011 – Data analysis was done with Vertica and Hadoop and data was ingested using Pig for MapReduce.
  • 2012 – Replaced Pig with Scalding using Scala APIs that were geared towards creating complex pipelines that were easy to test. However, it was difficult for people with SQL skills to pick up.
  • 2016 – Started using Presto to access Hadoop data using SQL and also used Spark for ad hoc data science and machine learning.
  • 2018 …
    • Scalding for production pipelines,
    • Scalding and Spark for ad hoc data science and machine learning,
    • Vertica and Presto for ad hoc, interactive SQL analysis,
    • Druid for interactive, exploratory access to time-series metrics, and
    • Tableau, Zeppelin, and Pivot for data visualization.
  • So why the change? To simplify analytical tools for Twitter employees.

BigQuery for Everyone

  • Challenges:
    • Needed to develop an infrastructure to reliably ingest large amounts of data,
    • Support company-wide data management,
    • Implement access controls,
    • Ensure customer privacy, and
    • Build systems for:
      • Resource allocation,
      • Monitoring, and
      • Charge-back.
  • In 2018, they rolled out an alpha release.
    • The most frequently used tables were offered with personal data removed.
      • Over 250 users from engineering, finance, and marketing used the alpha.
      • Sometime around June of 2019, they had a month where 8,000 queries were run that processed over 100 petabytes of data, not including scheduled reports.
      • The alpha turned out to be a big success, so they moved forward with using BigQuery more broadly.
  • They have a nice diagram that gives an overview of their processes at the time: they pushed data from on-premises Hadoop clusters into GCS, then used Airflow to move it into BigQuery, from which Data Studio pulled its data.
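
As a rough illustration of the flow described above (Hadoop to GCS, then into BigQuery, then Data Studio), here is a minimal sketch of a dependency-ordered pipeline in plain Python. It is not Twitter's code and it does not use the Airflow API; the stage names are hypothetical, and the standard library's graphlib stands in for a real scheduler only to show how each step waits on the one before it.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names (assumptions, not from the episode).
# Each stage maps to the set of stages that must finish before it can run.
PIPELINE = {
    "export_hadoop_to_gcs": set(),
    "load_gcs_into_bigquery": {"export_hadoop_to_gcs"},
    "refresh_data_studio_report": {"load_gcs_into_bigquery"},
}


def run_order(pipeline):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(pipeline).static_order())


if __name__ == "__main__":
    for stage in run_order(PIPELINE):
        print(stage)
```

A real orchestrator would also handle scheduling, retries, and monitoring for each stage; the sketch only captures the ordering.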

Ease of Use

  • BigQuery was easy to use because it didn’t require the installation of special tools and instead was easy to navigate via a web UI.
    • Users did need to become f