TextXD Schedule

All talks and training sessions are located at Chou Hall Spieker Forum (6th Floor).

  • 9:00am

    NLP Training Part 1

    Python Text Analysis, Part 1: Bag-of-Words Representations

    How do we convert text into a representation that we can operate on computationally? This requires developing a numerical representation of the text. In this part of the workshop, we study one of the foundational numerical representations of text data: the bag-of-words model. This model relies heavily on word frequencies to characterize text corpora. We build bag-of-words models and their variations (e.g., TF-IDF), and use these representations to perform classification on text. Chou Hall Spieker Forum (6th Floor). Hosted by D-Lab.

    Skill Level: The workshop is geared towards those with a basic familiarity with Python, but people without any familiarity with Python should be able to follow along with the conceptual presentation of the materials.
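
    For readers new to the idea, here is a minimal sketch of a bag-of-words count and one common TF-IDF variant (raw term frequency times log(N / document frequency)). It is not the workshop's own material: the toy corpus and the function names `bag_of_words` and `tf_idf` are assumptions made for illustration, and the classification step is not shown.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

def bag_of_words(text):
    """Count how often each whitespace-separated token occurs."""
    return Counter(text.lower().split())

def tf_idf(corpus):
    """Return one {term: weight} dict per document.

    Weight = raw term frequency * log(N / document frequency),
    one of several common TF-IDF variants (an assumption here).
    """
    counts = [bag_of_words(doc) for doc in corpus]
    n_docs = len(corpus)
    # Each Counter yields every distinct term once, so this counts
    # the number of documents that contain the term.
    doc_freq = Counter(term for c in counts for term in c)
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in c.items()}
        for c in counts
    ]

counts = [bag_of_words(d) for d in docs]
weights = tf_idf(docs)

# Inspect the first document: raw counts, then TF-IDF weights.
print(counts[0])
print(weights[0])
```

    In this sketch, a word that appears in only one document (such as "cat") ends up weighted above a more frequent word that is shared across documents (such as "the"), which is the intuition behind TF-IDF.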

  • 12:00pm


  • 1:00pm

    NLP Training Part 2

    Python Text Analysis, Part 2: Word Embeddings

    How can we use neural networks to create meaningful representations of words? The bag-of-words model is limited in its ability to characterize text because it does not use word context. In this part, we study word embeddings, which were among the first attempts to use neural networks to develop numerical representations of text that incorporate context. We learn how to use the package gensim to construct and explore word embeddings of text. Chou Hall Spieker Forum (6th Floor). Hosted by D-Lab.

    Skill Level: The workshop is geared towards those with a basic familiarity with Python, but people without any familiarity with Python should be able to follow along with the conceptual presentation of the materials.

  • 4:00pm

    Check In

    Check in begins at Chou Hall Spieker Forum (6th Floor)

  • 4:30pm

    Keynote: Emily M. Bender

    Meaning making with artificial interlocutors and risks of language technology

  • 5:30pm

    Opening Reception

  • 8:30am

    Start of the Day

    9:00am: Registration and Coffee

    9:30am: Welcome

  • 9:45am

    Session 1

    Rising Up and Tamping Down: Social Movements and Political Elites

    Talk 1.1: Ishita Gopal, Coverage of Protests on Telegram

    Talk 1.2: Yunus Emre Tapan, Investigation of elite opinion on Turkish-American relations

  • 11:00am

    Session 2

    Scientific Fictions and Scientific Truths

    Talk 2.1: Mark Algee-Hewitt, Truth in Fiction: the Discourse of Embedded Scientific Facts in Climate Fiction

    Talk 2.2: Shide Dehghani, ExtractFlora: a pipeline for transforming a floristic manual into a database for ecological and evolutionary study

  • 11:45am

    Poster Session

    Poster session begins and runs through lunch.

    Geoffrey Boushey, Contextualizing AI Generated Transcription Accuracy for Researchers

    Jee Young Bhan, Do Student Populations Matter for the Allocation of Additional Funds?: A Deeper Look at California’s Education Funding through LCAP Content

    Benjamin Lacar, Identifying and evaluating social risk associations with pediatric oncology outcomes using electronic health records and clinical notes

    Hongkai Mao, Sentiment Shift: Interplay between users and the community on a Chinese Movie-rating Website

    Anastasja Abraham, Quantifying the Rhetoric of Polarization: A Word Embeddings Approach to the American Ideological Divide

    Da Gong, Politicians, Media and COVID

  • 12:00pm


  • 1:30pm

    Session 3

    Lightning Talks

    Sub-Session: Scripting Culture, Smashing Politics

    Lightning 3.1: Jaren Haber, Does Website Text Signal Quality & Fit for Schools of Choice?: Evidence from Online Experiments

    Lightning 3.2: Pranathi Iyer, Decoding matrimonial advertisements: Individual preferences entrenched in socio cultural biases

    Lightning 3.3: Aleksi Sahala, Code-switching in Sumerian Emesal texts: A computational approach

    Q/A (15min)

    Sub-Session: The Hidden Structures of Science

    Lightning 3.4: Xiangyi Meng, Hidden Citations Obscure True Impact in Science

    Lightning 3.5: Viktoriia Baibakova, Dataset of BiFeO3 thin film sol-gel synthesis recipes extracted manually and with GPT-3

    Lightning 3.6: Karl Swanson, Hypernym Substitution for the Simplification of Biomedical Definitions

    Q/A (15min)

  • 2:00pm

    Coffee Break

  • 3:00pm

    Session 4

    Through New Eyes: Advances in Methods

    Talk 4.1: Jake Ryland Williams, How To Train Your Own Transformer From Scratch

    Talk 4.2: Emily Amspoker, A Gamified Approach to Frame Semantic Role Labeling

    Talk 4.3: Peter Leonard, Text & visual cultural heritage collections: evocative possibilities

  • 4:00pm

    Collaboration Session

    Themed Table Discussion

  • 8:30am - 9:40am

    Start of the Day

    9:00am - 9:30am: Registration and Coffee

    9:30am: Welcome back

  • 9:45am

    Session 5

    Reading the Fine Print: Social Determinants of Health

    Talk 5.1: Dmytro ‘Dima’ Lituiev, Automatic Extraction of Social Determinants of Health from Medical Notes of Chronic Lower Back Pain Patients

    Talk 5.2: Shenghuan Sun, Predicting the cancer therapy regimen from social work notes using natural language processing

  • 10:30am

    Coffee Break

  • 11:00am

    Session 6

    Friends and Enemies: Network Interaction, Group Formation, and Misinformation

    Talk 6.1: Monica Lee, Extracting text signals from social media movements with Information Corridors

    Talk 6.2: Lara Yang, Locally Ensconced and Globally Integrated: How Positions in Network Structure Relate to a Language-Based Model of Group Identification

  • 11:45am

    Poster Session

    Poster session begins and runs through lunch.

    Michael Ruiz, Lifting the Bar: A relationship-orienting intervention motivates teachers to build relationships with hypothetical formerly incarcerated students

    Julian Heid, Is Populism Contagious? Evidence from Parliamentary Speeches in Germany

    Zeyneb Kaya, Women in the Workplace: Analyzing Gender Biases in Corporate Email Communications

    Vitaly Meursault, Can innovations in NLP help us study innovation?

    Nikki Garlic, Using R to Wrangle and Analyze Unstructured Text in Federal Court Case Dockets

    Jay Gupta, A Topic Modeling Analysis on Historical US Mental Health Legislation

  • 12:15pm


  • 1:30pm

    Session 7

    Lightning Talks

    Sub-Session: Widening the Frame: New Methods and Resources

    Lightning 7.1: Laura Nixon, ReThink Media is working on a pipeline to analyze news coverage of the issues we work on, leveraging past work by researchers on quote extraction, news/opinion classification, NER and name-to-gender inference. We’ll discuss what we’ve learned, and the outstanding questions we’re grappling with now.

    Lightning 7.2: Lindsay Katz, Digitization of the Australian Hansard (1901-2022): A new, comprehensive database of all proceedings of the Australian Parliamentary Debates

    Lightning 7.3: Ayush Pancholy, Sister Help: Data Augmentation for Frame-Semantic Role Labeling


    Sub-Session: Tweeting Covid-19

    Lightning 7.4: Samuel R. Mendez, Performance of reading grade level estimators for public health communication on Twitter

    Lightning 7.5: Burak Ozturan, Polarization in elite discourse on the COVID-19 pandemic

    Lightning 7.6: Wan Nurul Naszeerah, A Preliminary Social Listening-based Evidence in Using a Language-based Approach to Enhancing Vaccine Confidence in the Underrepresented Malay-speaking Communities in Southeast Asia


  • 2:30pm

    Coffee Break

  • 3:00pm

    Session 8

    Money Talks: Finance and Marketing

    Talk 8.1: Dominik Jurek, Patent texts for identification in economic and innovation research

    Talk 8.2: Vincent Chen, Correcting Recency Bias in Retrospective Judgments Using Diachronic Word Embedding

    Talk 8.3: Summer Zhao, The Information Content of Credit Sentiments in Conference Calls

  • 4:00pm

    Keynote: David Blei

    Beyond Roll Call: Inferring Politics from Text

  • 5:00pm

    Closing Reception