event

TextXD: Text Analysis Across Domains

Text-focused data science conference at UC Berkeley. Three days of workshops, talks, posters, and discussion on current Natural Language Processing (NLP) research.

Recent Past Programs

See: 2020 | 2019 | 2018

Discover More

Community

Fostering a cross-disciplinary community of text processing experts from academia, research, and industry.

Research

Develop a shared understanding of each other’s use of text processing data, algorithms, and software.

Learn

Learn from one another, through tutorials and collaborative work sessions, about available tools and methods and their applicability to various discipline-specific problems.

schedule

TextXD Schedule

All talks and training sessions are located at Chou Hall Spieker Forum (6th Floor).

  • 9:00am

    NLP Training Part 1

    Python Text Analysis, Part 1: Bag-of-Words Representations

    How do we convert text into a representation that we can operate on computationally? This requires developing a numerical representation of the text. In this part of the workshop, we study one of the foundational numerical representations of text data: the bag-of-words model. This model relies heavily on word frequencies to characterize text corpora. We build bag-of-words models and their variations (e.g., TF-IDF) and use these representations to perform classification on text. Location: Chou Hall Spieker Forum (6th Floor). Hosted by D-Lab.

    Skill Level: The workshop is geared towards those with a basic familiarity with Python, but people without any familiarity with Python should be able to follow along with the conceptual presentation of the materials.
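For readers who want a feel for the bag-of-words idea described in this session, here is a minimal sketch using only the Python standard library. It is not the workshop's material: the toy corpus, the naive whitespace tokenizer, and the function names are all assumptions made for illustration, and the unsmoothed `log(N / df)` weighting is only one common TF-IDF variant (library implementations often use smoothed forms). The classification step mentioned above is not shown.

```python
import math
from collections import Counter

# Toy corpus (hypothetical documents, for illustration only).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(tokenized_docs):
    """Return the sorted vocabulary and one raw count vector per document."""
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    vectors = []
    for doc in tokenized_docs:
        term_counts = Counter(doc)
        vectors.append([term_counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(vectors):
    """Reweight raw counts by idf = log(N / df), an unsmoothed variant."""
    n_docs = len(vectors)
    n_terms = len(vectors[0])
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for vec in vectors if vec[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) for d in df]
    return [[count * w for count, w in zip(vec, idf)] for vec in vectors]


tokenized = [tokenize(d) for d in docs]
vocab, counts = bag_of_words(tokenized)
weighted = tf_idf(counts)
```

Each document becomes a vector with one entry per vocabulary term. Under this weighting, terms that occur in fewer documents receive a larger weight per occurrence than terms that occur in many.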

  • 12:00pm

    Lunch

  • 1:00pm

    NLP Training Part 2

    Python Text Analysis, Part 2: Word Embeddings

    How can we use neural networks to create meaningful representations of words? The bag-of-words model is limited in its ability to characterize text because it does not use word context. In this part, we study word embeddings, which were among the first attempts to use neural networks to develop numerical representations of text that incorporate context. We learn how to use the package gensim to construct and explore word embeddings of text. Location: Chou Hall Spieker Forum (6th Floor). Hosted by D-Lab.

    Skill Level: The workshop is geared towards those with a basic familiarity with Python, but people without any familiarity with Python should be able to follow along with the conceptual presentation of the materials.
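As a rough illustration of what "exploring" word embeddings can mean, the sketch below ranks words by cosine similarity between their vectors. It does not call gensim and is not the workshop's code: the three-dimensional vectors are hand-written toy values (an assumption, not trained embeddings), and the function names are hypothetical. Trained embeddings typically have many more dimensions, and gensim offers comparable similarity queries on trained vectors.

```python
import math

# Hand-written toy vectors (hypothetical values, NOT trained embeddings).
toy_embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = embeddings[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = most_similar("king", toy_embeddings)
```

With these toy values, "queen" ranks above "apple" for the query "king" because its vector points in nearly the same direction.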

  • 4:00pm

    Check In

    Check in begins at Chou Hall Spieker Forum (6th Floor)

  • 4:30pm

    Keynote: Emily M. Bender

    Meaning making with artificial interlocutors and risks of language technology

  • 5:30pm

    Opening Reception

  • 8:30am

    Start of the Day

    9:00am: Registration and Coffee

    9:30am: Welcome

  • 9:45am

    Session 1

    Rising Up and Tamping Down: Social Movements and Political Elites

    Talk 1.1: Ishita Gopal, Coverage of Protests on Telegram

    Talk 1.2: Yunus Emre Tapan, Investigation of elite opinion on Turkish-American relations

  • 11:00am

    Session 2

    Scientific Fictions and Scientific Truths

    Talk 2.1: Mark Algee-Hewitt, Truth in Fiction: the Discourse of Embedded Scientific Facts in Climate Fiction

    Talk 2.2: Shide Dehghani, ExtractFlora: a pipeline for transforming a floristic manual into a database for ecological and evolutionary study

  • 11:45am

    Poster Session

    Poster session begins and runs through lunch.

    Geoffrey Boushey Contextualizing AI Generated Transcription Accuracy for Researchers

    Jee Young Bhan Do Student Populations Matter for the Allocation of Additional Funds?: A Deeper Look at California’s Education Funding through LCAP Content

    Benjamin Lacar Identifying and evaluating social risk associations with pediatric oncology outcomes using electronic health records and clinical notes

    Hongkai Mao Sentiment Shift: Interplay between users and the community on a Chinese Movie-rating Website

    Anastasja Abraham Quantifying the Rhetoric of Polarization: A Word Embeddings Approach to the American Ideological Divide

    Da Gong Politicians, Media and COVID

  • 12:00pm

    Lunch

  • 1:30pm

    Session 3

    Lightning Talks

    Sub-Session: Scripting Culture, Smashing Politics

    Lightning 3.1: Jaren Haber, Does Website Text Signal Quality & Fit for Schools of Choice?: Evidence from Online Experiments

    Lightning 3.2: Pranathi Iyer, Decoding matrimonial advertisements: Individual preferences entrenched in socio cultural biases

    Lightning 3.3: Aleksi Sahala, Code-switching in Sumerian Emesal texts: A computational approach

    Q/A (15min)

    Sub-Session: The Hidden Structures of Science

    Lightning 3.4: Xiangyi Meng, Hidden Citations Obscure True Impact in Science

    Lightning 3.5: Viktoriia Baibakova, Dataset of BiFeO3 thin film sol-gel synthesis recipes extracted manually and with GPT-3

    Lightning 3.6: Karl Swanson, Hypernym Substitution for the Simplification of Biomedical Definitions

    Q/A (15min)

  • 2:00pm

    Coffee Break

  • 3:00pm

    Session 4

    Through New Eyes: Advances in Methods

    Talk 4.1: Jake Ryland Williams, How To Train Your Own Transformer From Scratch

    Talk 4.2: Emily Amspoker, A Gamified Approach to Frame Semantic Role Labeling

    Talk 4.3: Peter Leonard, Text & visual cultural heritage collections: evocative possibilities

  • 4:00pm

    Collaboration Session

    Themed Table Discussion

  • 8:30am - 9:40am

    Start of the Day

    9:00am - 9:30am: Registration and Coffee

    9:30am: Welcome back

  • 9:45am

    Session 5

    Reading the Fine Print: Social Determinants of Health

    Talk 5.1: Dmytro ‘Dima’ Lituiev, Automatic Extraction of Social Determinants of Health from Medical Notes of Chronic Lower Back Pain Patients

    Talk 5.2: Shenghuan Sun, Predicting the cancer therapy regimen from social work notes using natural language processing

  • 10:30am

    Coffee Break

  • 11:00am

    Session 6

    Friends and Enemies: Network Interaction, Group Formation, and Misinformation

    Talk 6.1: Monica Lee, Extracting text signals from social media movements with Information Corridors

    Talk 6.2: Lara Yang, Locally Ensconced and Globally Integrated: How Positions in Network Structure Relate to a Language-Based Model of Group Identification

  • 11:45am

    Poster Session

    Poster session begins and runs through lunch.

    Michael Ruiz Lifting the Bar: A relationship-orienting intervention motivates teachers to build relationships with hypothetical formerly incarcerated students

    Julian Heid Is Populism Contagious? Evidence from Parliamentary Speeches in Germany

    Zeyneb Kaya Women in the Workplace: Analyzing Gender Biases in Corporate Email Communications

    Vitaly Meursault Can innovations in NLP help us study innovation?

    Nikki Garlic Using R to Wrangle and Analyze Unstructured Text in Federal Court Case Dockets

    Jay Gupta A Topic Modeling Analysis on Historical US Mental Health Legislation

  • 12:15pm

    Lunch

  • 1:30pm

    Session 7

    Lightning Talks

    Sub-Session: Widening the Frame: New Methods and Resources

    Lightning 7.1: Laura Nixon, ReThink Media is working on a pipeline to analyze news coverage of the issues we work on, leveraging past work by researchers on quote extraction, news/opinion classification, NER and name-to-gender inference. We’ll discuss what we’ve learned, and the outstanding questions we’re grappling with now.

    Lightning 7.2: Lindsay Katz, Digitization of the Australian Hansard (1901-2022): A new, comprehensive database of all proceedings of the Australian Parliamentary Debates

    Lightning 7.3: Ayush Pancholy, Sister Help: Data Augmentation for Frame-Semantic Role Labeling

    Q/A

    Sub-Session: Tweeting Covid-19

    Lightning 7.4: Samuel R. Mendez, Performance of reading grade level estimators for public health communication on Twitter

    Lightning 7.5: Burak Ozturan, Polarization in elite discourse on the COVID-19 pandemic

    Lightning 7.6: Wan Nurul Naszeerah, A Preliminary Social Listening-based Evidence in Using a Language-based Approach to Enhancing Vaccine Confidence in the Underrepresented Malay-speaking Communities in Southeast Asia

    Q/A

  • 2:30pm

    Coffee Break

  • 3:00pm

    Session 8

    Money Talks: Finance and Marketing

    Talk 8.1: Dominik Jurek, Patent texts for identification in economic and innovation research

    Talk 8.2: Vincent Chen, Correcting Recency Bias in Retrospective Judgments Using Diachronic Word Embedding

    Talk 8.3: Summer Zhao, The Information Content of Credit Sentiments in Conference Calls

  • 4:00pm

    Keynote: David Blei

    Beyond Roll Call: Inferring Politics from Text

  • 5:00pm

    Closing Reception

Price

Get your Ticket

Ticket prices are listed below. If you are a student and cannot afford the price of a ticket, there will be an option to get financial assistance.

$25

Undergraduate Student
  • Three-Day Ticket
  • Breakfast & Lunch
  • Opening & Closing Receptions
  • Optional $10 Full Day Workshop
Buy Your Ticket

$40

Graduate Student and Postdoc
  • Three-Day Ticket
  • Breakfast & Lunch
  • Opening & Closing Receptions
  • Optional $15 Full Day Workshop
Buy Your Ticket

$100

Faculty and Industry
  • Three-Day Ticket
  • Breakfast & Lunch
  • Opening & Closing Receptions
  • Optional $20 Full Day Workshop
Buy Your Ticket
Speakers

Speakers & Presenters

Multidisciplinary NLP researchers from academia and industry.

    David Blei, Keynote
    Professor, Columbia University

    Emily M. Bender, Keynote
    Professor, Linguistics, University of Washington

    Aleksi Sahala
    University of Helsinki

    Anastasja Abraham
    Northeastern University

    Ayush Pancholy
    UC-Berkeley, International Computer Science Institute

    Benjamin Lacar
    UCSF Bakar Computational Health Sciences Institute / UC Berkeley BIDS

    Burak Ozturan
    Northeastern University

    Da Gong
    UC-Riverside

    Dmytro 'Dima' Lituiev
    Janssen Pharmaceuticals

    Dominik Jurek
    UC-Berkeley Haas School of Business

    Emily Amspoker
    UC Berkeley, International Computer Science Institute / Carnegie Mellon University

    Geoffrey Boushey
    UCSF Library

    Hongkai Mao
    University of Chicago

    Ishita Gopal
    Pennsylvania State University

    Jake Ryland Williams
    Drexel University

    Jaren Haber
    Dartmouth College

    Jay Gupta
    Stanford University and Archbishop Mitty High School

    Julian Heid
    LMU Munich

    Lara Yang
    Stanford Graduate School of Business

    Laura Nixon
    ReThink Media

    Lindsay Katz
    University of Toronto

    Mark Algee-Hewitt
    Stanford University

    Michael Ruiz
    UC-Berkeley

    Monica Lee
    Facebook

    Nikki Garlic
    Temple University and University of Nevada, Reno

    Peter Leonard
    Stanford Libraries

    Samuel R. Mendez
    Harvard T.H. Chan School of Public Health

    Shenghuan (Harry) Sun
    UCSF Bakar Computational Health Sciences Institute

    Shide Dehghani
    UC-Berkeley

    Summer Zhao
    UC-Berkeley Haas School of Business

    Viktoriia Baibakova
    UC-Berkeley / Lawrence Berkeley National Laboratory

    Vincent Chen
    UC-Berkeley Haas School of Business

    Vitaly Meursault
    Federal Reserve Bank of Philadelphia

    Xiangyi Meng
    Northeastern University

    Yunus Emre Tapan
    Northeastern University

    Zeyneb Kaya
    Saratoga High School

Sponsors

Our Partners & Sponsors

Want to be a sponsor?
Contact Us

Venue location

December 5-7, 2022

Spieker Forum at Chou Hall,
University of California, Berkeley

View Map location